<a href="https://colab.research.google.com/github/Kaif10/NLP-with-HuggingFace/blob/main/NLP_tasks.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This notebook uses the pipeline API of Hugging Face Transformers to solve several NLP tasks.

The easiest way to use a pretrained model on a given task is pipeline(). 🤗 Transformers supports, among others, the following tasks out of the box:

Sentiment analysis: is a text positive or negative?

Question answering: provide the model with some context and a question, extract the answer from the context.

Summarization: generate a summary of a long text.

Translation: translate a text into another language.


In [5]:
# Transformers installation
! pip install transformers
# To install from source instead of the latest release, comment out the command above and uncomment the following one.
# ! pip install git+https://github.com/huggingface/transformers.git



### Sequence classification

Sequence classification is the task of assigning a sequence to one of a given set of classes. An example
of a sequence classification dataset is GLUE, which is built around that task.

Here is an example of using pipelines to do sentiment analysis: identifying whether a sequence is positive or negative.
It uses a model fine-tuned on SST-2, which is a GLUE task.

This returns a label ("POSITIVE" or "NEGATIVE") alongside a score, as follows:

In [16]:
from transformers import pipeline
nlp = pipeline("sentiment-analysis")

In [34]:
# nlp returns a list with one dictionary per input; each dictionary has the keys 'label' and 'score'
result = nlp("I hate you")
print(result)

result_2 = nlp("I love you")[0]
print(f"label: {result_2['label']}, with score: {round(result_2['score'], 4)}")

result_3 = nlp("I had a very frustrating day at work today")[0]
print(f"label: {result_3['label']}, with score: {round(result_3['score'], 4)}")

[{'label': 'NEGATIVE', 'score': 0.9991129040718079}]
label: POSITIVE, with score: 0.9999
label: NEGATIVE, with score: 0.9984
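
The score is a probability for the predicted label. As a minimal sketch of where such a number can come from, assume the classifier emits one raw logit per class and that these are turned into probabilities with a softmax. The logit values below are made up for illustration, and `softmax` and `to_prediction` are hypothetical helpers, not part of Transformers.

```python
import math

def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def to_prediction(logits, labels=("NEGATIVE", "POSITIVE")):
    """Mimic the shape of one pipeline result: a dict with 'label' and 'score'."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": labels[best], "score": probs[best]}

# Hypothetical logits for one sentence, ordered as [negative, positive]
prediction = to_prediction([3.5, -3.5])
print(f"label: {prediction['label']}, with score: {round(prediction['score'], 4)}")
```

A large gap between the two logits gives a score close to 1, which is why clearly polarized sentences such as the ones above score so high.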


### Text Translation

An example of a translation dataset is the WMT English to German dataset, which has sentences in English as the input data
and the corresponding sentences in German as the target data.


Here is an example of using pipelines to do translation.
It uses a T5 model that was only pre-trained on a multi-task mixture dataset (including WMT), yet it yields impressive
translation results.

In [13]:
translator = pipeline("translation_en_to_de")

Some weights of the model checkpoint at t5-base were not used when initializing T5Model: ['decoder.block.0.layer.1.EncDecAttention.relative_attention_bias.weight']
- This IS expected if you are initializing T5Model from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing T5Model from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of the model checkpoint at t5-base were not used when initializing T5ForConditionalGeneration: ['decoder.block.0.layer.1.EncDecAttention.relative_attention_bias.weight']
- This IS expected if you are initializing T5ForConditionalGeneration from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification



### English to German

In [20]:
print(translator("Football is the best sport in the world", max_length=40))
print(translator("It was a warm summer night in Greece", max_length=40))
print(translator("Always respect your elders and be kind to everyone", max_length=40))

[{'translation_text': 'Fußball ist der beste Sport der Welt'}]
[{'translation_text': 'Es war eine warme Sommernacht in Griechenland'}]
[{'translation_text': 'Respektieren Sie immer Ihre ltesten und seien Sie freundlich mit jedem'}]
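
A single T5 checkpoint can handle several tasks because T5 is a text-to-text model: the task is selected by a plain-text prefix placed in front of the input. The sketch below only builds such prefixed inputs and does not call the model. The prefix strings are the ones I understand the T5 checkpoints to have been trained with, so treat them as an assumption; `T5_PREFIXES` and `add_task_prefix` are hypothetical names, and the pipeline takes care of this step internally.

```python
# Task prefixes from T5's multi-task training mixture (assumed here for illustration).
T5_PREFIXES = {
    "translation_en_to_de": "translate English to German: ",
    "translation_en_to_fr": "translate English to French: ",
    "summarization": "summarize: ",
}

def add_task_prefix(task, text):
    """Return the text as a T5-style input for the given task name."""
    return T5_PREFIXES[task] + text.strip()

prompt = add_task_prefix("translation_en_to_de", "Football is the best sport in the world")
print(prompt)
```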


### Extractive Question Answering

Extractive Question Answering is the task of extracting an answer from a text given a question. An example of a
question answering dataset is the SQuAD dataset, which is entirely based on that task.

Here is an example of using pipelines to do question answering.
It uses a model fine-tuned on SQuAD.
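
To make "extracting an answer" concrete: an extractive model scores every token as a possible answer start and as a possible answer end, and the answer is the span with the best combined score. The following is a simplified sketch of that selection step with made-up scores and whole words as tokens. `best_span` and `max_answer_len` are hypothetical names, and the real pipeline works on subword tokens and does additional bookkeeping.

```python
def best_span(start_scores, end_scores, max_answer_len=15):
    """Return (start, end, score) maximizing start_scores[s] + end_scores[e] with s <= e."""
    best = None
    for s, s_score in enumerate(start_scores):
        # Only consider ends at or after the start, within the length limit.
        for e in range(s, min(s + max_answer_len, len(end_scores))):
            score = s_score + end_scores[e]
            if best is None or score > best[2]:
                best = (s, e, score)
    return best

# Toy example: one score per token for "start here" and "end here".
tokens = ["SQuAD", "is", "a", "question", "answering", "dataset", "."]
start_scores = [0.1, 0.0, 2.0, 1.5, 0.2, 0.1, 0.0]
end_scores = [0.0, 0.1, 0.1, 0.3, 0.5, 2.5, 0.2]

s, e, score = best_span(start_scores, end_scores)
answer = " ".join(tokens[s:e + 1])
print(answer)
```

Restricting the search to spans with the start before the end, and to a maximum length, keeps the model from returning an answer that is malformed or runs on too long.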

In [9]:
nlp = pipeline("question-answering")
context = r"""
Extractive Question Answering is the task of extracting an answer from a text given a question. An example of a
question answering dataset is the SQuAD dataset, which is entirely based on that task. If you would like to fine-tune
a model on a SQuAD task, you may leverage the examples/question-answering/run_squad.py script.
"""

In [10]:
result = nlp(question="What is extractive question answering?", context=context)
print(f"Answer: '{result['answer']}', score: {round(result['score'], 4)}, start: {result['start']}, end: {result['end']}")
result = nlp(question="What is a good example of a question answering dataset?", context=context)
print(f"Answer: '{result['answer']}', score: {round(result['score'], 4)}, start: {result['start']}, end: {result['end']}")

Answer: 'the task of extracting an answer from a text given a question', score: 0.6226, start: 34, end: 95
Answer: 'SQuAD dataset', score: 0.5053, start: 147, end: 160
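
The start and end values are character offsets into the context string, so slicing the context with them recovers the answer text. A small check using the offsets printed above (the context is repeated so the snippet runs on its own; the leading newline of the triple-quoted string counts as character 0):

```python
context = r"""
Extractive Question Answering is the task of extracting an answer from a text given a question. An example of a
question answering dataset is the SQuAD dataset, which is entirely based on that task. If you would like to fine-tune
a model on a SQuAD task, you may leverage the examples/question-answering/run_squad.py script.
"""

# (start, end) pairs copied from the two results printed above.
spans = [(34, 95), (147, 160)]
for start, end in spans:
    print(repr(context[start:end]))
```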


### Text Summarization
#### Using the built-in summarization pipeline of Hugging Face Transformers

Summarization is the task of condensing a document or an article into a shorter text.
Here is an example of using pipelines to do summarization. It uses a BART model that was fine-tuned on the CNN / Daily Mail dataset.

In [11]:
summarizer = pipeline("summarization")
ARTICLE = """ The color of animals is by no means a matter of chance; it depends on many considerations, 
but in the majority of cases tends to protect the animal from danger by rendering it less conspicuous. 
Perhaps it may be said that if coloring is mainly protective, there ought to be but few brightly colored animals. 
There are, however, not a few cases in which vivid colors are themselves protective. 
The kingfisher itself, though so brightly colored, is by no means easy to see. 
The blue harmonizes with the water, and the bird as it darts along the stream looks almost like a flash of sunlight.
Desert animals are generally the color of the desert. Thus, for instance, the lion, the antelope, and the wild donkey are all sand-colored. 
“Indeed,” says Canon Tristram, “in the desert, where neither trees, brushwood, nor even undulation of the surface afford the slightest protection to its foes, a modification of color assimilated to that of the surrounding country is absolutely necessary.
Hence, without exception, the upper plumage of every bird, and also the fur of all the smaller mammals and the skin of all the snakes and lizards, is of one uniform sand color.
The next point is the color of the mature caterpillars, some of which are brown. This probably makes the caterpillar even more conspicuous among the green leaves than would otherwise be the case. 
Let us see, then, whether the habits of the insect will throw any light upon the riddle.
What would you do if you were a big caterpillar? 
Why, like most other defenseless creatures, you would feed by night, and lie concealed by day. 
So do these caterpillars.
When the morning light comes, they creep down the stem of the food plant, and lie concealed among the thick herbage and dry sticks and leaves, near the ground, and it is obvious that under such circumstances the brown color really becomes a protection. 
It might indeed be argued that the caterpillars, having become brown, concealed themselves on the ground, and that we were reversing the state of things. 
But this is not so, because, while we may say as a general rule that large caterpillars feed by night and lie concealed by day, it is by no means always the case that they are brown; some of them still retaining the green color. 
We may then conclude that the habit of concealing themselves by day came first, and that the brown color is a later adaptation.
"""

#### Summary of the above text

In [12]:
print(summarizer(ARTICLE, max_length=130, min_length=30, do_sample=False))

[{'summary_text': ' The color of animals is by no means a matter of chance; it depends on many considerations . In the majority of cases, color tends to protect the animal from danger by rendering it less conspicuous . The lion, the antelope, and the wild donkey are all sand-colored .'}]
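
The max_length and min_length arguments bound the length of the generated summary in tokens, not words. As a rough sanity check, the sketch below splits the summary printed above into sentences and counts its words. The word count is only a proxy, since the model's tokenizer splits text into subword tokens, so the token count is usually somewhat higher.

```python
# Summary text copied from the output above.
summary = (
    " The color of animals is by no means a matter of chance; it depends on many considerations . "
    "In the majority of cases, color tends to protect the animal from danger by rendering it less conspicuous . "
    "The lion, the antelope, and the wild donkey are all sand-colored ."
)

# Split on the " ." separators the model emitted and drop empty pieces.
sentences = [s.strip() for s in summary.split(" .") if s.strip()]
word_count = len(summary.split())

for sentence in sentences:
    print("-", sentence)
print("words (rough proxy for tokens):", word_count)
```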

