# Comments

In [1]:
"""
pip install transformers
pip install tensorflow
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117

"""


# Import Zone

In [3]:
from transformers import pipeline



# Summarizer

The input to this task is a passage of text, and the model returns a summary whose length is bounded by the `min_length` and `max_length` parameters (both counted in tokens). Here we set the minimum length to 5 and the maximum length to 30.

In [4]:
summarizer = pipeline(
    "summarization", model="t5-base", tokenizer="t5-base", framework="tf"
)

All PyTorch model weights were used when initializing TFT5ForConditionalGeneration.

All the weights of TFT5ForConditionalGeneration were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFT5ForConditionalGeneration for predictions without further training.
For now, this behavior is kept to avoid breaking backwards compatibility when padding/encoding with `truncation is True`.
- Be aware that you SHOULD NOT rely on t5-base automatically truncating your input to 512 when padding/encoding.
- If you want to encode/pad to sequences longer than 512 you can either instantiate this tokenizer with `model_max_length` or pass `max_length` when encoding/padding.


In [14]:
# Named input_text rather than input to avoid shadowing the built-in input() function.
input_text = "Parents need to know that Top Gun is a blockbuster 1980s action thriller starring Tom Cruise that's chock full of narrow escapes, chases, and battles. But there are also violent and upsetting scenes, particularly the death of a main character, which make it too intense for younger kids. There's also one graphic-for-its-time sex scene (though no explicit nudity) and quite a few shirtless men in locker rooms and, in one iconic sequence, on a beach volleyball court. Winning is the most important thing to all the pilots, who try to intimidate one another with plenty of posturing and banter -- though when push comes to shove, loyalty and friendship have important roles to play, too. While sexism is noticeable and almost all characters are men, two strong women help keep some of the objectification in check."
output_summarizer = summarizer(input_text, min_length=5, max_length=30)[0]['summary_text']
print(f'The summarized text is: {output_summarizer}')

The summarized text is: 1980s action thriller starring Tom Cruise is chock full of chases and battles . there are also violent and upsetting scenes,
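
The summary printed above has a space before the period and a lowercase sentence start. The following is a minimal post-processing sketch, assuming a simple regular-expression cleanup is good enough for display purposes. `tidy_summary` is a hypothetical helper written for this example, not part of transformers.

```python
import re


def tidy_summary(text: str) -> str:
    """Clean up spacing and capitalization in a generated summary (hypothetical helper)."""
    # Remove whitespace that appears before punctuation, e.g. "battles ." -> "battles."
    text = re.sub(r"\s+([.,!?;:])", r"\1", text.strip())
    # Capitalize the first letter that follows sentence-ending punctuation.
    return re.sub(r"([.!?]\s+)([a-z])", lambda m: m.group(1) + m.group(2).upper(), text)


# Summary text copied from the output above.
raw_summary = (
    "1980s action thriller starring Tom Cruise is chock full of chases and battles . "
    "there are also violent and upsetting scenes,"
)
print(tidy_summary(raw_summary))
```

The helper only touches spacing and capitalization, so a summary that was cut off mid-sentence stays cut off.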


# Question Answering

In this task, we provide a question and a context. The model extracts the answer from the context, choosing the span with the highest probability score, and also returns the start and end positions of that span in the context.

In [17]:
question_answering = pipeline(model="deepset/roberta-base-squad2")

In [21]:
output_question_answering = question_answering(
    question="Where do I work?",
    context="I work as a Data Scientist at a lab in University of Montreal. I like to develop my own algorithms.",
)
answer_qa = output_question_answering['answer']
score_qa = output_question_answering['score']
print(f'The answer of the question is: {answer_qa} with a score of: {score_qa}')

The answer of the question is: University of Montreal with a score of: 0.6422632336616516
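
The start and end positions mentioned in this section are character offsets into the context, so slicing the context with them should give back the answer text. The sketch below shows this with a hand-built dictionary that mimics the shape of the pipeline result (`score`, `start`, `end`, `answer`). The offsets are computed with `str.find` for illustration and are not copied from a model run.

```python
context = (
    "I work as a Data Scientist at a lab in University of Montreal. "
    "I like to develop my own algorithms."
)

# Hand-built stand-in for the pipeline result; the offsets are computed here,
# not taken from an actual model run.
answer = "University of Montreal"
start = context.find(answer)
result = {
    "score": 0.6422632336616516,  # score printed by the cell above
    "start": start,
    "end": start + len(answer),
    "answer": answer,
}

# Slicing the context with the offsets recovers the answer text.
recovered = context[result["start"]:result["end"]]
print(recovered)
```

The offsets are handy when the answer needs to be highlighted in the original context rather than just printed.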


# Named Entity Recognition

Named Entity Recognition (NER) identifies the named entities in a text and classifies them into categories such as persons, organizations and locations. The input is a sentence, and the model returns each named entity along with its category and its position in the text.

In [22]:
entity_classifier = pipeline(
    model="dslim/bert-base-NER-uncased", aggregation_strategy="simple"
)

Some weights of the model checkpoint at dslim/bert-base-NER-uncased were not used when initializing BertForTokenClassification: ['bert.pooler.dense.weight', 'bert.pooler.dense.bias']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).

In [29]:
sentence = "John and Jane like to travel around Spain by train."
entities = entity_classifier(sentence)
for entity in entities:
    entity_group = entity['entity_group']
    word = entity['word']
    print(f'The entity group of the word {word} is {entity_group} ')


The entity group of the word john is PER 
The entity group of the word jane is PER 
The entity group of the word spain is LOC 
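
Because the model is uncased, the words come back in lowercase (`john`, `jane`, `spain`). One way to restore the original casing is to slice the input sentence with each entity's `start` and `end` positions. The sketch below does that and groups the entities by category. The list of entities is a hand-built stand-in for the pipeline output, with scores omitted and offsets computed locally rather than copied from a model run.

```python
from collections import defaultdict

sentence = "John and Jane like to travel around Spain by train."

# Hand-built stand-in for the pipeline output printed above (scores omitted).
# The character offsets are computed here rather than copied from a model run.
entities = []
for word, group in [("john", "PER"), ("jane", "PER"), ("spain", "LOC")]:
    start = sentence.lower().find(word)
    entities.append(
        {"entity_group": group, "word": word, "start": start, "end": start + len(word)}
    )

# Group the entities by category, slicing the sentence to restore the original casing.
grouped = defaultdict(list)
for entity in entities:
    grouped[entity["entity_group"]].append(sentence[entity["start"]:entity["end"]])

print(dict(grouped))
```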


# Part-of-Speech Tagging

PoS tagging labels each word with its part of speech, such as noun, pronoun or verb. The model returns the tagged words along with their probability scores and their positions in the text.

In [30]:
pos_tagger = pipeline(
    model="vblagoje/bert-english-uncased-finetuned-pos",
    aggregation_strategy="simple",
)

Some weights of the model checkpoint at vblagoje/bert-english-uncased-finetuned-pos were not used when initializing BertForTokenClassification: ['bert.pooler.dense.weight', 'bert.pooler.dense.bias']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).

In [34]:
taggings = pos_tagger("I am an artist and I live in Dublin")
for tagging in taggings:
    entity_group = tagging['entity_group']
    word = tagging['word']
    print(f'The entity group of the word {word} is {entity_group} ')

The entity group of the word i is PRON 
The entity group of the word am is AUX 
The entity group of the word an is DET 
The entity group of the word artist is NOUN 
The entity group of the word and is CCONJ 
The entity group of the word i is PRON 
The entity group of the word live is VERB 
The entity group of the word in is ADP 
The entity group of the word dublin is PROPN 
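
Once the words are tagged, the tags can be used to filter or count them. The sketch below works on the (word, tag) pairs copied from the output above. Treating nouns, proper nouns and verbs as "content words" is an assumption made for this example, not something the model defines.

```python
from collections import Counter

# (word, tag) pairs copied from the output printed above.
tagged = [
    ("i", "PRON"), ("am", "AUX"), ("an", "DET"), ("artist", "NOUN"),
    ("and", "CCONJ"), ("i", "PRON"), ("live", "VERB"), ("in", "ADP"),
    ("dublin", "PROPN"),
]

# Count how often each tag occurs.
tag_counts = Counter(tag for _, tag in tagged)

# Assumption for this example: these three tags mark the "content words".
CONTENT_TAGS = {"NOUN", "PROPN", "VERB"}
content_words = [word for word, tag in tagged if tag in CONTENT_TAGS]

print(tag_counts)
print(content_words)
```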


# Sentiment Analyzer

Here we perform sentiment analysis: the model classifies a text as positive or negative based on its tone and returns the label together with a confidence score.

In [3]:
text_classifier = pipeline(
    model="distilbert-base-uncased-finetuned-sst-2-english"
)

Xformers is not installed correctly. If you want to use memory_efficient_attention to accelerate training use the following command to install Xformers
pip install xformers.


In [17]:
output_classifier = text_classifier("This movie is my favorite!")
label_classifier = output_classifier[0]['label']
score_classifier = output_classifier[0]['score']
print(f'The text is classified as: {label_classifier.capitalize()} with a score of {score_classifier}')


The text is classified as: Positive with a score of 0.9997908473014832
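
The score can also be used to flag predictions the model is unsure about. The sketch below applies a cutoff to a prediction dictionary shaped like the pipeline output. The `verdict` helper and the 0.9 threshold are hypothetical choices for illustration, and the second prediction is a made-up low-confidence example.

```python
def verdict(prediction: dict, threshold: float = 0.9) -> str:
    """Return the predicted label, or 'Uncertain' when the score is below the threshold."""
    if prediction["score"] < threshold:
        return "Uncertain"
    return prediction["label"].capitalize()


# Label and score printed by the cell above.
prediction = {"label": "POSITIVE", "score": 0.9997908473014832}
print(verdict(prediction))

# Made-up low-confidence prediction, only to exercise the other branch.
print(verdict({"label": "NEGATIVE", "score": 0.55}))
```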


# Text Generation

In [35]:
text_generator = pipeline(model="gpt2")

Xformers is not installed correctly. If you want to use memory_efficient_attention to accelerate training use the following command to install Xformers
pip install xformers.


In [37]:
generated_text = text_generator("If it is sunny today then ", do_sample=False)[0]['generated_text']
print(f'The text generated is: {generated_text}')


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


The text generated is: If it is sunny today then  it will be cloudy tomorrow.
I have been using this for a while now and I am very happy with it. I have been using it for a while now and I am very happy with it. I


# Text Translation

Here we translate text from one language to another, in this case from English to French.

In [41]:
en_fr_translator = pipeline("translation_en_to_fr", model='t5-small')



In [43]:
output_translator = en_fr_translator("Hi, How are you?")
text_translated = output_translator[0]['translation_text']
print(f'The text translated is: {text_translated}')

The text translated is: Bonjour, Comment êtes-vous ?
