# Pipelines with the Transformers library
This notebook demonstrates the `pipeline` feature of the Transformers library on several NLP tasks: text classification, named entity recognition, question answering, summarization, translation, and text generation. Let's go!

## Setup
### Imports

In [1]:
import pandas as pd
from transformers import pipeline, set_seed

### Sample Text
The following text sample will be used throughout this notebook.

In [2]:
text = """Dear Amazon, last week I ordered an Optimus Prime action figure \
from your online store in Germany. Unfortunately, when I opened the package, \
I discovered to my horror that I had been sent an action figure of Megatron \
instead! As a lifelong enemy of the Decepticons, I hope you can understand my \
dilemma. To resolve the issue, I demand an exchange of Megatron for the \
Optimus Prime figure I ordered. Enclosed are copies of my records concerning \
this purchase. I expect to hear from you soon. Sincerely, Bumblebee."""

print('')
print(text.replace(". ", ".\n").replace("! ", "!\n"))


Dear Amazon, last week I ordered an Optimus Prime action figure from your online store in Germany.
Unfortunately, when I opened the package, I discovered to my horror that I had been sent an action figure of Megatron instead!
As a lifelong enemy of the Decepticons, I hope you can understand my dilemma.
To resolve the issue, I demand an exchange of Megatron for the Optimus Prime figure I ordered.
Enclosed are copies of my records concerning this purchase.
I expect to hear from you soon.
Sincerely, Bumblebee.
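
The `replace` chain used above to print one sentence per line appears again in the translation and text generation cells. As a minimal sketch, it could be wrapped in a small helper. The name `one_sentence_per_line` is made up for this example, and unlike the original chain it also breaks after `? `.

```python
import re

def one_sentence_per_line(s: str) -> str:
    """Replace the space after '.', '!' or '?' with a newline."""
    # The lookbehind keeps the punctuation mark and consumes only the space.
    return re.sub(r"(?<=[.!?]) ", "\n", s)

sample = "I ordered a figure. I got the wrong one! Please help."
print(one_sentence_per_line(sample))
```

For text that contains only `. ` and `! ` boundaries, as our sample does, this gives the same result as `text.replace(". ", ".\n").replace("! ", "!\n")`.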


-------
## Text Classification
We initialize a ***text classification*** pipeline and use it to classify our sample text.

In [3]:
classifier = pipeline("text-classification")
outputs = classifier(text)
print('')
pd.DataFrame(outputs)   

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cuda:0





| | label | score |
|---|---|---|
| 0 | NEGATIVE | 0.901546 |


Since we did not specify a model, the pipeline fell back to its default:  
[`distilbert/distilbert-base-uncased-finetuned-sst-2-english (revision 714eb0f)`](https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english)

The default model classified the sentiment of the text sample as `NEGATIVE`, with a confidence score of about 0.90.
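
The log message warns against relying on the default model in production. One way to follow that advice is to pass `model` and `revision` to `pipeline` explicitly. The sketch below assumes the checkpoint name and revision reported in the log; `PINNED` and `load_pinned` are hypothetical names introduced only for this example.

```python
# Checkpoint and revision copied from the log message above.
PINNED = {
    "text-classification": (
        "distilbert/distilbert-base-uncased-finetuned-sst-2-english",
        "714eb0f",
    ),
}

def load_pinned(task, factory=None):
    """Create a pipeline for `task` with an explicit model and revision."""
    if factory is None:
        # Imported lazily so the mapping can be inspected without loading Transformers.
        from transformers import pipeline as factory
    model, revision = PINNED[task]
    return factory(task, model=model, revision=revision)

# Usage (downloads the checkpoint on first call):
# classifier = load_pinned("text-classification")
# classifier(text)
```

Pinning both values should keep the results stable even if the library later changes its default checkpoint for the task.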

-------
## Named Entity Recognition
Now let's try a pipeline for ***named entity recognition***.

In [4]:
ner_tagger = pipeline("ner", aggregation_strategy="simple")

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision 4c53496 (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use cuda:0


In [5]:
outputs = ner_tagger(text)
pd.DataFrame(outputs)

| | entity_group | score | word | start | end |
|---|---|---|---|---|---|
| 0 | ORG | 0.879009 | Amazon | 5 | 11 |
| 1 | MISC | 0.990859 | Optimus Prime | 36 | 49 |
| 2 | LOC | 0.999755 | Germany | 90 | 97 |
| 3 | MISC | 0.556567 | Mega | 208 | 212 |
| 4 | PER | 0.590257 | ##tron | 212 | 216 |
| 5 | ORG | 0.669692 | Decept | 253 | 259 |
| 6 | MISC | 0.498351 | ##icons | 259 | 264 |
| 7 | MISC | 0.775361 | Megatron | 350 | 358 |
| 8 | MISC | 0.987854 | Optimus Prime | 367 | 380 |
| 9 | PER | 0.812096 | Bumblebee | 502 | 511 |


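The pipeline found the entities in our sample text. Each row gives the entity type (`entity_group`), the model's confidence score for that type, the matched text, and its character offsets in the input. The result is not perfect, though: the first "Megatron" and "Decepticons" were each split into two word pieces (`Mega` / `##tron` and `Decept` / `##icons`) with conflicting labels and low scores.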
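
Rows 3 to 6 of the table show "Megatron" and "Decepticons" split into word pieces, where a `##` prefix marks a continuation of the previous piece. The sketch below is a simple post-processing heuristic and not part of the library: `merge_wordpieces` is a hypothetical helper that glues a `##` piece onto the entity ending exactly where the piece starts. Keeping the first piece's label and the lower of the two scores are assumptions made for this example.

```python
def merge_wordpieces(entities):
    """Glue '##' continuation pieces onto the entity that ends where they start."""
    merged = []
    for ent in entities:
        prev = merged[-1] if merged else None
        if prev and ent["word"].startswith("##") and ent["start"] == prev["end"]:
            prev["word"] += ent["word"][2:]   # drop the '##' marker
            prev["end"] = ent["end"]
            # Be conservative: keep the lower of the two confidence scores.
            prev["score"] = min(prev["score"], ent["score"])
        else:
            merged.append(dict(ent))  # copy so the input is left untouched
    return merged

# Rows 3 to 6 of the table above.
rows = [
    {"entity_group": "MISC", "score": 0.556567, "word": "Mega", "start": 208, "end": 212},
    {"entity_group": "PER", "score": 0.590257, "word": "##tron", "start": 212, "end": 216},
    {"entity_group": "ORG", "score": 0.669692, "word": "Decept", "start": 253, "end": 259},
    {"entity_group": "MISC", "score": 0.498351, "word": "##icons", "start": 259, "end": 264},
]
for ent in merge_wordpieces(rows):
    print(ent["entity_group"], ent["word"], ent["start"], ent["end"], ent["score"])
```

The token classification pipeline also accepts other `aggregation_strategy` values (`"first"`, `"average"`, `"max"`) that aggregate at word level, which may be worth trying before resorting to post-processing like this.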

Again, the default model was used:  
[`dbmdz/bert-large-cased-finetuned-conll03-english (revision 4c53496)`](https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english)

-------
## Question Answering
How about asking some questions about our sample text? Next up is the ***question answering*** pipeline.

In [6]:
reader = pipeline("question-answering")

No model was supplied, defaulted to distilbert/distilbert-base-cased-distilled-squad and revision 564e9b5 (https://huggingface.co/distilbert/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cuda:0


In [7]:
question = "What does the customer want?"
outputs = reader(question=question, context=text)
pd.DataFrame([outputs])

| | score | start | end | answer |
|---|---|---|---|---|
| 0 | 0.631292 | 335 | 358 | an exchange of Megatron |


In [8]:
question = "What is the name of the unhappy customer?"
outputs = reader(question=question, context=text)
pd.DataFrame([outputs])

| | score | start | end | answer |
|---|---|---|---|---|
| 0 | 0.448951 | 502 | 511 | Bumblebee |


In [9]:
question = "What is the customer's problem with Megatron?"
outputs = reader(question=question, context=text)
pd.DataFrame([outputs])

| | score | start | end | answer |
|---|---|---|---|---|
| 0 | 0.130827 | 295 | 302 | dilemma |


In [10]:
question = "What is the customer's problem with the item they received?"
outputs = reader(question=question, context=text)
pd.DataFrame([outputs])

| | score | start | end | answer |
|---|---|---|---|---|
| 0 | 0.233518 | 399 | 432 | Enclosed are copies of my records |


Well, it got 2 of our 4 questions right :/. For the last 2 questions the returned answers were wrong, but at least their confidence scores were low (0.13 and 0.23). This goes to show that we need to invest time in selecting a model that suits the task at hand.
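
Since the two wrong answers came with the lowest scores, one simple safeguard is to discard answers below a confidence threshold. This is only a sketch: `confident_answer` is a hypothetical helper, and the threshold of 0.4 is an assumption picked to fit these four results, not a recommended value.

```python
def confident_answer(result, threshold=0.4):
    """Return the answer text, or None when the score is below `threshold`."""
    return result["answer"] if result["score"] >= threshold else None

# Scores and answers copied from the four question cells above.
results = [
    {"score": 0.631292, "answer": "an exchange of Megatron"},
    {"score": 0.448951, "answer": "Bumblebee"},
    {"score": 0.130827, "answer": "dilemma"},
    {"score": 0.233518, "answer": "Enclosed are copies of my records"},
]
for result in results:
    print(confident_answer(result))
```

In practice a threshold like this would need to be tuned on a larger set of questions, because a score is not a guarantee of correctness.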

-------
## Summarization
Although our sample text is already quite short, let's try a ***summarization*** model to see if we can get a good summary of it.

In [11]:
summarizer = pipeline("summarization")

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cuda:0


In [12]:
outputs = summarizer(text, min_length=10, max_length=45, clean_up_tokenization_spaces=True)
print(outputs[0]['summary_text'])

 Bumblebee ordered an Optimus Prime action figure from your online store in Germany. Unfortunately, when I opened the package, I discovered to my horror that I had been sent an action figure of Megatron instead.


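A summary was obtained, but its quality leaves room for improvement: it mostly repeats the first two sentences of the letter and mixes the third person ("Bumblebee ordered") with the first person ("your online store", "I opened"). Again, specifying a particular model may help here.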
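
One rough way to make "room for improvement" concrete is to check how much of the summary was copied word for word from the input. The sketch below uses a hypothetical helper, `copied_sentences`, together with the first two sentences of the sample text and the summary printed above. It is a crude check for illustration, not an established summarization metric.

```python
import re

def copied_sentences(summary, source):
    """Return the summary sentences that occur word for word in `source`."""
    sentences = re.split(r"(?<=[.!?])\s+", summary.strip())
    # Drop the final punctuation mark so 'instead.' still matches 'instead!'.
    return [s for s in sentences if s.rstrip(".!?") in source]

# First two sentences of the sample text.
source = (
    "Dear Amazon, last week I ordered an Optimus Prime action figure "
    "from your online store in Germany. Unfortunately, when I opened the package, "
    "I discovered to my horror that I had been sent an action figure of Megatron "
    "instead!"
)
# Summary as printed by the cell above.
summary = (
    " Bumblebee ordered an Optimus Prime action figure from your online store "
    "in Germany. Unfortunately, when I opened the package, I discovered to my "
    "horror that I had been sent an action figure of Megatron instead."
)
print(copied_sentences(summary, source))
```

A summary made up largely of copied sentences is extractive rather than abstractive, which is one thing to compare when trying other summarization models.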

-------
## Translation
For fun, let's try a ***translation*** pipeline to translate our sample text to German.

In [13]:

translator = pipeline("translation_en_to_de", model="Helsinki-NLP/opus-mt-en-de")

Device set to use cuda:0


In [14]:
outputs = translator(text, clean_up_tokenization_spaces=True, min_length=100)
print(outputs[0]['translation_text'].replace(". ", ".\n").replace("! ", "!\n"))

Sehr geehrter Amazon, letzte Woche habe ich eine Optimus Prime Action Figur aus Ihrem Online-Shop in Deutschland bestellt.
Leider, als ich das Paket öffnete, entdeckte ich zu meinem Entsetzen, dass ich stattdessen eine Action Figur von Megatron geschickt worden war!
Als lebenslanger Feind der Decepticons, Ich hoffe, Sie können mein Dilemma verstehen.
Um das Problem zu lösen, Ich fordere einen Austausch von Megatron für die Optimus Prime Figur habe ich bestellt.
Eingeschlossen sind Kopien meiner Aufzeichnungen über diesen Kauf.
Ich erwarte, von Ihnen bald zu hören.
Aufrichtig, Bumblebee.


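Well, I don't speak German, but when I pasted the output into Google Translate and converted it back to English, the meaning matched the original. :) A round trip like this checks only the meaning, though, not the grammar: the capitalized `Ich` in the middle of two sentences suggests the German is not flawless.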

-------
## Text Generation
Finally, we will generate text via a ***text generation*** pipeline.

In [15]:
generator = pipeline("text-generation")

No model was supplied, defaulted to openai-community/gpt2 and revision 607a30d (https://huggingface.co/openai-community/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cuda:0


In [16]:
response = "Dear Bumblebee, I am sorry to hear that your order was mixed up."
prompt = text + "\n\nCustomer service response:\n" + response

set_seed(42) # Set the seed to get reproducible results
outputs = generator(prompt, max_length=200)
print('')
print(outputs[0]['generated_text'].replace(". ", ".\n").replace("! ", "!\n"))

Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.



Dear Amazon, last week I ordered an Optimus Prime action figure from your online store in Germany.
Unfortunately, when I opened the package, I discovered to my horror that I had been sent an action figure of Megatron instead!
As a lifelong enemy of the Decepticons, I hope you can understand my dilemma.
To resolve the issue, I demand an exchange of Megatron for the Optimus Prime figure I ordered.
Enclosed are copies of my records concerning this purchase.
I expect to hear from you soon.
Sincerely, Bumblebee.

Customer service response:
Dear Bumblebee, I am sorry to hear that your order was mixed up.
It was only that the Optimus Prime statue was not delivered.
When I checked it out online, it seemed to be a completely different figure, but this seems odd.
The Transformers are a different species.
The Optimus Prime from the movie is the latest incarnation of the Decepticon Decepticons, known as the Death Stars.
It had


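The generated customer service response reads fluently enough, although its content drifts away from the complaint and the output stops mid-sentence ("It had"), most likely because `max_length=200` also counts the prompt tokens.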
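
In the cell above, `max_length=200` limits the prompt and the continuation together, and the printed output repeats the whole prompt. As a sketch of an alternative, the text generation pipeline accepts `max_new_tokens`, which budgets only the continuation, and `return_full_text=False`, which leaves the prompt out of the result. The wrapper name `generate_reply` and the default of 60 new tokens are assumptions made for this example.

```python
def generate_reply(generator, prompt, max_new_tokens=60):
    """Return only the newly generated text for `prompt`."""
    outputs = generator(
        prompt,
        max_new_tokens=max_new_tokens,  # budget for the continuation only
        return_full_text=False,         # do not echo the prompt back
    )
    return outputs[0]["generated_text"]

# Usage with the `generator` and `prompt` defined in the cells above:
# set_seed(42)
# print(generate_reply(generator, prompt))
```

Because the seed and the length settings differ from the original cell, the generated text would not match the output shown above.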

-------
## THE END
And that was a quick tour of Transformers pipelines.

That's all, folks!