In [3]:
from transformers import pipeline
import pandas as pd

# Text Classification

In [2]:
text_classifier = pipeline("text-classification")

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu
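
The log above warns against relying on the default model. A minimal sketch of pinning it, assuming the model name and short revision hash printed in that log are accepted as-is by `pipeline`; the `PINNED_CLASSIFIER` name is illustrative and not part of the notebook. The actual call is left commented out because it downloads the model.

```python
# Model name and revision are copied from the "No model was supplied" log line.
PINNED_CLASSIFIER = {
    "task": "text-classification",
    "model": "distilbert/distilbert-base-uncased-finetuned-sst-2-english",
    "revision": "714eb0f",
}

# Uncomment to build the pipeline with explicit, reproducible settings:
# from transformers import pipeline
# text_classifier = pipeline(**PINNED_CLASSIFIER)
```

Pinning both values means a later change to the library's defaults should not silently change the results.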


In [5]:
# A neutral statement of origin is labeled NEGATIVE with about 0.97 confidence, which points to bias in the model.
output = text_classifier("I am from Iraq")

pd.DataFrame(output)

      label     score
0  NEGATIVE  0.970607
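
One hedged way to check whether the NEGATIVE label is tied to the country name is to keep the sentence template fixed and vary only the filler. The sketch below assumes `classifier` is any callable that returns a list of `{"label", "score"}` dicts, as the pipeline does; `probe_template` and the extra country names are illustrative and not from the notebook.

```python
def probe_template(classifier, template, fillers):
    """Score one sentence template with each filler substituted in."""
    rows = []
    for filler in fillers:
        text = template.format(filler)
        result = classifier(text)[0]  # one dict per input sentence
        rows.append({"text": text, "label": result["label"], "score": result["score"]})
    return rows

# Usage with the pipeline from this notebook (runs the model, so left commented out):
# rows = probe_template(text_classifier, "I am from {}", ["Iraq", "France", "Japan"])
# pd.DataFrame(rows)
```

If labels or scores change noticeably when only the country changes, that difference comes from the model and not from the sentiment of the sentence.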


# Named Entity Recognition

In [6]:
named_entity_tagger = pipeline("ner", aggregation_strategy="simple")

output = named_entity_tagger("I am from Iraq")  # LOC stands for location

pd.DataFrame(output)

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision 4c53496 (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Device set to use cpu


  entity_group     score  word  start  end
0          LOC  0.999822  Iraq     10   14
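
With `aggregation_strategy="simple"`, each row is one grouped entity, so picking out a single entity type is a short filter. A minimal sketch, using the row above as sample data; `filter_entities` and the 0.9 threshold are illustrative assumptions.

```python
def filter_entities(entities, group, min_score=0.9):
    """Return the words of entities in the given group at or above min_score."""
    return [
        entity["word"]
        for entity in entities
        if entity["entity_group"] == group and entity["score"] >= min_score
    ]

# Sample data copied from the output above.
entities = [
    {"entity_group": "LOC", "score": 0.999822, "word": "Iraq", "start": 10, "end": 14}
]
locations = filter_entities(entities, "LOC")
print(locations)  # ['Iraq']
```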


# Question Answering

In [8]:
# Extractive question answering: the answer is a span copied verbatim from the context.

answer_machine = pipeline("question-answering")

output = answer_machine(question="Where am I from?", context="I am from Iraq")

pd.DataFrame([output])

No model was supplied, defaulted to distilbert/distilbert-base-cased-distilled-squad and revision 564e9b5 (https://huggingface.co/distilbert/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu


      score  start  end answer
0  0.996927     10   14   Iraq
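
The `start` and `end` values are character offsets into the context, which is what makes the answer extractive. A short check, using the values from the output above as sample data:

```python
context = "I am from Iraq"
result = {"score": 0.996927, "start": 10, "end": 14, "answer": "Iraq"}

# Slicing the context with the returned offsets reproduces the answer.
extracted = context[result["start"]:result["end"]]
print(extracted == result["answer"])  # True
```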


# Summarization

In [11]:
summarizer = pipeline("summarization")

long_text = """
Fjord corgis wrestled trombones in Tokyo. Meanwhile, fluffy kittens juggled rainbow-colored rubber chickens under a full moon. The aroma of freshly baked croissants wafted through the air as a group of sleepy sloths played a game of chess on a giant puzzle board. Suddenly, a pack of wild elephants appeared out
of nowhere, trumpeting loudly and dancing to the tune of 'La Cucaracha'."
"""

output = summarizer(long_text, min_length=50, clean_up_tokenization_spaces=True)

print(output[0]["summary_text"])

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu
Your max_length is set to 142, but your input_length is only 97. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=48)


 Fjord corgis wrestled trombones in Tokyo. Meanwhile, fluffy kittens juggled rainbow-colored rubber chickens under a full moon. A pack of wild elephants appeared out of nowhere, trumpeting loudly and dancing to the tune of 'La Cucaracha'
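
The `max_length` warning in the log can be avoided by deriving the length bounds from the input length. The sketch below is one possible rule, not the library's own: `summary_length_bounds` and both ratios are assumptions, with the 0.5 ratio chosen only because it reproduces the log's example value of 48 for an input of 97 tokens.

```python
def summary_length_bounds(input_length, max_ratio=0.5, min_ratio=0.25):
    """Return (min_length, max_length) in tokens, scaled to the input length."""
    max_length = max(1, int(input_length * max_ratio))
    # Keep min_length at or below max_length so the two bounds never conflict.
    min_length = min(max_length, max(1, int(input_length * min_ratio)))
    return min_length, max_length

min_len, max_len = summary_length_bounds(97)
print(min_len, max_len)  # 24 48

# Possible use with the summarizer above (runs the model, so left commented out):
# output = summarizer(long_text, min_length=min_len, max_length=max_len)
```

Note that the notebook's `min_length=50` is already more than half of the 97-token input, which is part of why the summary repeats most of the source text.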


# Translation

In [18]:
# T5 treats every task as text-to-text and picks the task from a prompt prefix, so it can also handle several of the tasks above.

from transformers import T5Tokenizer, T5ForConditionalGeneration

translation_tokenizer = T5Tokenizer.from_pretrained("google-t5/t5-small")
translation_model = T5ForConditionalGeneration.from_pretrained("google-t5/t5-small")

# Named input_ids to avoid shadowing Python's built-in input().
input_ids = translation_tokenizer("translate English to German: Random stuff here so yeah", return_tensors="pt").input_ids
output = translation_model.generate(input_ids, max_length=100, num_beams=4, early_stopping=True)

print(translation_tokenizer.decode(output[0], skip_special_tokens=True))

Zufällige Dinge hier, ja
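
The task is selected entirely by the text prefix in the prompt. A small sketch of building such prompts; the dictionary keys and the `t5_prompt` helper are illustrative names, while the prefix strings are the ones T5 was trained with.

```python
# Hypothetical short names mapped to T5 task prefixes.
T5_PREFIXES = {
    "translate_en_de": "translate English to German: ",
    "translate_en_fr": "translate English to French: ",
    "summarize": "summarize: ",
}

def t5_prompt(task, text):
    """Prepend the task prefix so T5 knows which task to perform."""
    return T5_PREFIXES[task] + text

prompt = t5_prompt("translate_en_de", "Random stuff here so yeah")
print(prompt)  # translate English to German: Random stuff here so yeah
```

The resulting string can be passed to the tokenizer exactly like the hard-coded prompt in the cell above.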


# Text Generation

In [24]:
text_generator = pipeline("text-generation")

output = text_generator("My name is Optimus Prime and I am an AutoBot.", max_length=200, truncation=True)

print(output[0]["generated_text"])

No model was supplied, defaulted to openai-community/gpt2 and revision 607a30d (https://huggingface.co/openai-community/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


My name is Optimus Prime and I am an AutoBot. I specialize in auto repairing, repair, and repair on T7. This means I also have expertise with T7, the T9 Tengu, and T3 Cybertron. I am also a fan of D-class machines and have been very active in various D-class gaming groups during my T7 years.

T9 Tengu

What's your favorite machine that you've tried and still haven't tried?

Gigantic T-1

Lest you think a T9, just because we are in a different generation, is our most favorite to date. We are working on a whole new model of the T9 which includes the Giga T8. This model will not be possible without you. Our team at The Tech Guys do every day a tremendous job of taking care of all our customers and making sure that everybody gets the best possible results.

Lest we think a
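
Generation stops when `max_length` is reached, so the text above ends mid-sentence. One simple post-processing option is to cut the text after its last sentence-ending punctuation mark. `trim_to_last_sentence` is a hypothetical helper, and it is deliberately naive: it does not handle abbreviations or quoted punctuation.

```python
def trim_to_last_sentence(text):
    """Cut text after its final '.', '!' or '?'; return it unchanged if none is found."""
    cut = max(text.rfind(mark) for mark in ".!?")
    return text if cut == -1 else text[: cut + 1]

# Sample taken from the end of the generated text above.
sample = "everybody gets the best possible results.\n\nLest we think a"
print(trim_to_last_sentence(sample))  # everybody gets the best possible results.
```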
