# ***Text Classification***

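*Text classification assigns one label from a predefined set to a piece of text. Common applications include sentiment analysis, spam detection, and topic classification. The example below uses the `sentiment-analysis` pipeline, which labels the input as POSITIVE or NEGATIVE.*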

In [1]:
from transformers import pipeline

# Load a pre-trained text classification model
classifier = pipeline('sentiment-analysis')

# Example text for classification
text_to_classify = "I really enjoyed the movie. The plot was intriguing, and the acting was superb."

# Perform text classification
classification_result = classifier(text_to_classify)

# Display the result
print(f"Text: {text_to_classify}")
print(f"Predicted Sentiment: {classification_result[0]['label']} with confidence: {classification_result[0]['score']}")

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


Device set to use cpu


Text: I really enjoyed the movie. The plot was intriguing, and the acting was superb.
Predicted Sentiment: POSITIVE with confidence: 0.9998787641525269
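
The confidence printed above is a probability. For a two-label model like this one, the pipeline normally obtains it by applying a softmax to the model's raw output scores (logits). The following is a minimal sketch of that step in plain Python; the logit values are made up for illustration and are not taken from the model.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)
    # Subtract the maximum before exponentiating for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for one input; index 0 = NEGATIVE, index 1 = POSITIVE
labels = ["NEGATIVE", "POSITIVE"]
logits = [-4.3, 4.7]

probs = softmax(logits)
best = max(range(len(labels)), key=lambda i: probs[i])

# Same shape as one entry of the list returned by the pipeline
result = {"label": labels[best], "score": probs[best]}
print(result)
```

A large gap between the two logits produces a score close to 1, which is why clearly positive sentences such as the one above come back with very high confidence.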


# ***Token Classification (NER)***

*Token classification involves assigning a label to each token in a sequence. It is commonly used for Named Entity Recognition (NER) tasks.*


In [2]:
from transformers import pipeline

# Load the NER pipeline
ner_pipeline = pipeline('ner')

# Example text for named entity recognition
sample_text = "Hugging Face is a fantastic platform for NLP tasks. It was founded by researchers in 2016."

# Perform named entity recognition
ner_results = ner_pipeline(sample_text)

# Display the result
print(f"Sample Text: {sample_text}")
print("Named Entities:")
for entity in ner_results:
    print(f"Entity: {entity['word']}, Label: {entity['entity']}, Score: {entity['score']}")

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision 4c53496 (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Device set to use cpu


Sample Text: Hugging Face is a fantastic platform for NLP tasks. It was founded by researchers in 2016.
Named Entities:
Entity: Hu, Label: I-ORG, Score: 0.8556349277496338
Entity: ##gging, Label: I-ORG, Score: 0.5030511617660522
Entity: Face, Label: I-ORG, Score: 0.8385624289512634
Entity: NL, Label: I-MISC, Score: 0.706284761428833
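
The output above lists WordPiece sub-tokens (`Hu`, `##gging`) rather than whole words, because the pipeline returns one prediction per token by default. The `ner` pipeline accepts an `aggregation_strategy` argument (for example `"simple"`) that groups tokens for you. As a rough illustration of what such grouping involves, here is a hand-written sketch that merges `##` continuation pieces into the preceding word and averages their scores. The helper name `merge_wordpieces` is hypothetical, and the sketch assumes continuation pieces always directly follow the word they belong to.

```python
def merge_wordpieces(tokens):
    """Merge WordPiece continuation tokens ('##...') into the preceding word."""
    merged = []
    for tok in tokens:
        if tok["word"].startswith("##") and merged:
            # Continuation piece: append its text (without '##') to the previous word
            merged[-1]["word"] += tok["word"][2:]
            merged[-1]["scores"].append(tok["score"])
        else:
            merged.append(
                {"word": tok["word"], "entity": tok["entity"], "scores": [tok["score"]]}
            )
    # Report the mean score of the pieces that make up each word
    return [
        {"word": m["word"], "entity": m["entity"], "score": sum(m["scores"]) / len(m["scores"])}
        for m in merged
    ]

# Token-level predictions copied from the output above
ner_results = [
    {"word": "Hu", "entity": "I-ORG", "score": 0.8556349277496338},
    {"word": "##gging", "entity": "I-ORG", "score": 0.5030511617660522},
    {"word": "Face", "entity": "I-ORG", "score": 0.8385624289512634},
    {"word": "NL", "entity": "I-MISC", "score": 0.706284761428833},
]

for entity in merge_wordpieces(ner_results):
    print(f"Entity: {entity['word']}, Label: {entity['entity']}, Score: {entity['score']:.4f}")
```

Note that `NL` stays truncated: the model did not tag the remaining piece of `NLP` as an entity, so there is nothing to merge it with.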


# ***Table Question Answering***

*Table question answering involves extracting information from tabular data. The model takes a question and a table as input and provides an answer.*


In [3]:
from transformers import pipeline
import pandas as pd

# Prepare the table and the question (TAPAS expects every cell to be a string)
data = {"Actors": ["Brad Pitt", "Leonardo Di Caprio", "George Clooney"], "Number of movies": ["87", "53", "69"]}
table = pd.DataFrame.from_dict(data)
question = "How many movies does Leonardo Di Caprio have?"

# Load the table question answering pipeline with a TAPAS model
# Note: depending on your transformers version, TAPAS may also require the torch-scatter package.
tqa = pipeline(task="table-question-answering", model="google/tapas-large-finetuned-wtq")

# Print the first cell selected as the answer
print(tqa(table=table, query=question)['cells'][0])

TAPAS models are not usable since `tensorflow_probability` can't be loaded. It seems you have `tensorflow_probability` installed with the wrong tensorflow version. Please try to reinstall it following the instructions here: https://github.com/tensorflow/probability.


Device set to use cpu


53
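
Reading `['cells'][0]` works for a simple lookup like the one above, but TAPAS checkpoints fine-tuned on WTQ can also predict an aggregation operator over several cells. The sketch below shows one way to handle that case. The helper `resolve_answer` is hypothetical, and the two result dictionaries are hand-written to resemble the pipeline's output (keys `answer`, `coordinates`, `cells`, `aggregator`); check the actual keys and operator names against your installed version before relying on them.

```python
def resolve_answer(result):
    """Apply the predicted aggregation operator to the selected cells."""
    cells = result["cells"]
    aggregator = result.get("aggregator", "NONE")
    if aggregator == "NONE":
        return ", ".join(cells)
    if aggregator == "COUNT":
        return str(len(cells))
    values = [float(c) for c in cells]  # table cells are stored as strings
    if aggregator == "SUM":
        return str(sum(values))
    if aggregator == "AVERAGE":
        return str(sum(values) / len(values))
    raise ValueError(f"Unknown aggregator: {aggregator}")

# Hand-written examples (assumed shape, not real model output)
lookup = {"answer": "53", "coordinates": [(1, 1)], "cells": ["53"], "aggregator": "NONE"}
total = {
    "answer": "SUM > 87, 53, 69",
    "coordinates": [(0, 1), (1, 1), (2, 1)],
    "cells": ["87", "53", "69"],
    "aggregator": "SUM",
}

print(resolve_answer(lookup))  # 53
print(resolve_answer(total))   # 209.0
```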


# ***Question Answering***

*Question answering returns an answer to a question based on a given context. The `question-answering` pipeline is extractive: it takes a question and a context as input and returns the span of the context most likely to contain the answer.*

In [4]:
from transformers import pipeline

# Load the question answering pipeline
qa_pipeline = pipeline('question-answering')

# Example context and question for QA
context = "Hugging Face is a popular platform for natural language processing. It was founded in 2016."
question = "When was Hugging Face founded?"

# Perform question answering
qa_result = qa_pipeline(question=question, context=context)

# Display the answer
print(f"Question: {question}")
print(f"Answer: {qa_result['answer']}")

No model was supplied, defaulted to distilbert/distilbert-base-cased-distilled-squad and revision 564e9b5 (https://huggingface.co/distilbert/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.


Device set to use cpu


Question: When was Hugging Face founded?
Answer: 2016


# ***Zero-Shot Classification***

*Zero-shot classification assigns text to candidate labels that are supplied at inference time, without any training examples for those labels.*

In [5]:
from transformers import pipeline

# Load the zero-shot classification pipeline
zero_shot_classifier = pipeline('zero-shot-classification')

# Example text for zero-shot classification
text_to_classify = "The new smartphone has impressive camera features."

# Specify candidate labels
candidate_labels = ["Technology", "Fashion", "Sports"]

# Perform zero-shot classification
classification_result = zero_shot_classifier(text_to_classify, candidate_labels)

# Display the result
print(f"Text: {text_to_classify}")
print(f"Predicted Label: {classification_result['labels'][0]} with confidence: {classification_result['scores'][0]}")

No model was supplied, defaulted to facebook/bart-large-mnli and revision d7645e1 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.


Device set to use cpu


Text: The new smartphone has impressive camera features.
Predicted Label: Technology with confidence: 0.9338781833648682
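
The default model here is a natural language inference (NLI) model, as its name `bart-large-mnli` suggests. The pipeline turns each candidate label into a hypothesis sentence, by default with the template `"This example is {}."`, asks the model how strongly the text entails each hypothesis, and normalizes those scores across labels. The sketch below illustrates the idea with invented entailment scores; `build_hypotheses` and `rank_labels` are hypothetical helpers.

```python
import math

def build_hypotheses(labels, template="This example is {}."):
    """Turn each candidate label into an NLI hypothesis sentence."""
    return [template.format(label) for label in labels]

def rank_labels(labels, entailment_logits):
    """Softmax the entailment scores across labels and sort in descending order."""
    peak = max(entailment_logits)
    exps = [math.exp(x - peak) for x in entailment_logits]
    total = sum(exps)
    scored = sorted(
        zip(labels, (e / total for e in exps)), key=lambda pair: pair[1], reverse=True
    )
    return {"labels": [label for label, _ in scored], "scores": [s for _, s in scored]}

candidate_labels = ["Technology", "Fashion", "Sports"]
hypotheses = build_hypotheses(candidate_labels)
print(hypotheses)

# Made-up entailment logits, one per hypothesis (not real model output)
entailment_logits = [3.1, 0.2, -0.4]
result = rank_labels(candidate_labels, entailment_logits)
print(f"Predicted Label: {result['labels'][0]} with confidence: {result['scores'][0]:.4f}")
```

Because the scores are normalized across the candidate labels, they sum to 1, so adding or removing a label changes the confidence of all the others.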


# ***Language Translation***

*Translation involves converting text from one language to another. The model takes a piece of text in one language as input and provides the translated text.*

In [6]:
from transformers import pipeline

# Load the translation pipeline
translator = pipeline('translation', model='Helsinki-NLP/opus-mt-en-fr')

# Example text for translation
text_to_translate = "Hello, how are you?"

# Perform translation
translation_result = translator(text_to_translate)

# Display the translated text
print(f"Text to Translate: {text_to_translate}")
print(f"Translated Text: {translation_result[0]['translation_text']}")

Device set to use cpu


Text to Translate: Hello, how are you?
Translated Text: Bonjour, comment allez-vous ?


# ***Text Summarization***

*Summarization involves condensing a piece of text into a shorter version while retaining the essential information.*


In [7]:
from transformers import pipeline

# Load the summarization pipeline
summarizer = pipeline('summarization')

# Example text for summarization
text_to_summarize = "Hugging Face Transformers provide a versatile interface for various NLP tasks. The library simplifies the implementation of state-of-the-art models."

# Perform summarization
summary_result = summarizer(text_to_summarize, max_length=50, min_length=25)

# Display the summarized text
print(f"Original Text: {text_to_summarize}")
print(f"Summarized Text: {summary_result[0]['summary_text']}")

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.


Device set to use cpu
Your max_length is set to 50, but your input_length is only 32. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=16)


Original Text: Hugging Face Transformers provide a versatile interface for various NLP tasks. The library simplifies the implementation of state-of-the-art models.
Summarized Text:  Hugging Face Transformers provides a versatile interface for various NLP tasks . The library simplifies the implementation of state-of-the-art models .
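
The warning above explains why the "summary" is nearly identical to the input: `max_length=50` exceeds the input's 32 tokens, so the model has no pressure to shorten anything. One option is to derive the length limits from the input. The sketch below uses whitespace-separated words as a rough stand-in for tokens (the pipeline counts tokens, which are usually more numerous than words), and the helper `summary_length_bounds` and its ratios are illustrative choices, not library defaults.

```python
def summary_length_bounds(text, max_ratio=0.5, min_ratio=0.2):
    """Pick min/max summary lengths as fractions of the input length (in words)."""
    n_words = len(text.split())
    max_length = max(2, int(n_words * max_ratio))
    # Keep min_length strictly below max_length and at least 1
    min_length = max(1, min(int(n_words * min_ratio), max_length - 1))
    return min_length, max_length

text_to_summarize = (
    "Hugging Face Transformers provide a versatile interface for various NLP tasks. "
    "The library simplifies the implementation of state-of-the-art models."
)

min_length, max_length = summary_length_bounds(text_to_summarize)
print(f"min_length={min_length}, max_length={max_length}")
```

The resulting values could then be passed as `min_length` and `max_length` in the `summarizer(...)` call. For an input this short, though, there is little left to condense, and summarization is better demonstrated on a longer passage.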
