### A Tour of Transformer Applications

Text is everywhere, and the ability to understand and act on the information it contains is crucial for almost every company.

Start with some basic imports.

Note: You will also need to install PyTorch to run this notebook. Install it with `pip install torch torchvision`.

In [1]:
import pandas as pd

Let's start with some text: a piece of feedback from a customer.

In [2]:
text = """Dear Amazon, last week I ordered an Optimus Prime action figure \
from your online store in Germany. Unfortunately, when I opened the package, \
I discovered to my horror that I had been sent an action figure of Megatron \
instead! As a lifelong enemy of the Decepticons, I hope you can understand my \
dilemma. To resolve the issue, I demand an exchange of Megatron for the \
Optimus Prime figure I ordered. Enclosed are copies of my records concerning \
this purchase. I expect to hear from you soon. Sincerely, Bumblebee."""

We probably want to understand the feedback and then respond to it. We may also want to analyze all the feedback we receive, across reviews and letters, to find general trends or anomalies that we should act on.

*Hugging Face [Pipelines](https://huggingface.co/docs/transformers/main_classes/pipelines) are an easy way to use models for inference. These pipelines are objects that abstract away most of the complex code in the library and offer a simple API for several tasks, including Named Entity Recognition, Masked Language Modeling, Sentiment Analysis, Feature Extraction and Question Answering. See the Hugging Face pipeline [task summary](https://huggingface.co/docs/transformers/master/task_summary) for usage examples.*

*There are pipelines for:*
* Audio Classification
* Conversations
* Feature Extraction
* Image Classification
* Object Detection
* Question Answering
* Summarization
* Text Classification
* Text Generation
* Translation


In [3]:
from transformers import pipeline

Let's start by looking at the feedback and determining whether it is positive or negative. This task is called sentiment analysis.

In [4]:
classifier = pipeline("sentiment-analysis")

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english)


In [5]:
outputs = classifier(text)
pd.DataFrame.from_records(outputs)

|   | label | score |
|---|---|---|
| 0 | NEGATIVE | 0.901546 |
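
A single letter is only a start: as noted earlier, we may also want trends across all of our feedback. Passing a list of texts to `classifier` returns one dictionary with a `label` and a `score` per text. The sketch below tallies such records and sets aside low-confidence predictions for a human to read; the records themselves and the 0.6 threshold are hypothetical, chosen only for illustration.

```python
from collections import Counter

# Hypothetical classifier outputs for four pieces of feedback. Each record has the
# same shape as one item returned by the sentiment-analysis pipeline.
outputs_per_review = [
    {"label": "NEGATIVE", "score": 0.90},
    {"label": "POSITIVE", "score": 0.98},
    {"label": "NEGATIVE", "score": 0.55},
    {"label": "NEGATIVE", "score": 0.97},
]


def summarize_sentiment(records, min_score=0.6):
    """Count labels, keeping the indices of low-confidence predictions apart.

    min_score is an assumed, illustrative threshold, not a library default.
    """
    counts = Counter()
    uncertain = []
    for i, record in enumerate(records):
        if record["score"] < min_score:
            uncertain.append(i)  # remember which review needs a manual look
        else:
            counts[record["label"]] += 1
    return counts, uncertain


counts, uncertain = summarize_sentiment(outputs_per_review)
print(counts)     # Counter({'NEGATIVE': 2, 'POSITIVE': 1})
print(uncertain)  # [2]
```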


So the feedback looks negative. But what is it about?

We need to determine the *named entities* that are in the feedback text. This is called Named Entity Recognition (NER). We can apply NER by creating another pipeline and feeding our text to it.

In [6]:
ner_tagger = pipeline("ner", aggregation_strategy="simple")
outputs = ner_tagger(text)
pd.DataFrame.from_records(outputs)

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english)


|   | entity_group | score | word | start | end |
|---|---|---|---|---|---|
| 0 | ORG | 0.87901 | Amazon | 5 | 11 |
| 1 | MISC | 0.990859 | Optimus Prime | 36 | 49 |
| 2 | LOC | 0.999755 | Germany | 90 | 97 |
| 3 | MISC | 0.55657 | Mega | 208 | 212 |
| 4 | PER | 0.590256 | ##tron | 212 | 216 |
| 5 | ORG | 0.669692 | Decept | 253 | 259 |
| 6 | MISC | 0.498349 | ##icons | 259 | 264 |
| 7 | MISC | 0.775362 | Megatron | 350 | 358 |
| 8 | MISC | 0.987854 | Optimus Prime | 367 | 380 |
| 9 | PER | 0.812096 | Bumblebee | 502 | 511 |


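The pipeline detected the entities and assigned each one a category. Because we passed `aggregation_strategy="simple"`, it also grouped adjacent tokens that the model assigned to the same entity, so "Optimus Prime" is returned as a single entity. The grouping is not perfect, though: the first mention of "Megatron" and the word "Decepticons" are each split into two fragments ("Mega"/"##tron" and "Decept"/"##icons"), and each fragment received a different category.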
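
One way to tidy up the split words is to post-process the entity groups: a group whose word starts with the `##` continuation marker and begins exactly where the previous group ended is glued onto that previous group. This is a minimal sketch of our own heuristic, not something the pipeline does; it keeps the category of the first fragment and averages the fragment scores, both arbitrary choices. The library's word-level aggregation strategies (`"first"`, `"average"`, `"max"`) tackle the same problem inside the pipeline and may be worth trying instead.

```python
# Entity groups copied from the NER output above.
groups = [
    {"entity_group": "ORG", "score": 0.87901, "word": "Amazon", "start": 5, "end": 11},
    {"entity_group": "MISC", "score": 0.990859, "word": "Optimus Prime", "start": 36, "end": 49},
    {"entity_group": "LOC", "score": 0.999755, "word": "Germany", "start": 90, "end": 97},
    {"entity_group": "MISC", "score": 0.55657, "word": "Mega", "start": 208, "end": 212},
    {"entity_group": "PER", "score": 0.590256, "word": "##tron", "start": 212, "end": 216},
    {"entity_group": "ORG", "score": 0.669692, "word": "Decept", "start": 253, "end": 259},
    {"entity_group": "MISC", "score": 0.498349, "word": "##icons", "start": 259, "end": 264},
    {"entity_group": "MISC", "score": 0.775362, "word": "Megatron", "start": 350, "end": 358},
    {"entity_group": "MISC", "score": 0.987854, "word": "Optimus Prime", "start": 367, "end": 380},
    {"entity_group": "PER", "score": 0.812096, "word": "Bumblebee", "start": 502, "end": 511},
]


def merge_subword_fragments(groups):
    """Join '##' fragments onto the preceding, directly adjacent group (heuristic)."""
    merged = []
    for group in groups:
        if (
            merged
            and group["word"].startswith("##")
            and group["start"] == merged[-1]["end"]
        ):
            last = merged[-1]
            last["word"] += group["word"][2:]  # drop the '##' continuation marker
            last["end"] = group["end"]
            last["scores"].append(group["score"])
        else:
            # Shallow copy so the input list is left unchanged.
            merged.append({**group, "scores": [group["score"]]})
    for entity in merged:
        scores = entity.pop("scores")
        entity["score"] = sum(scores) / len(scores)  # mean score of the fragments
    return merged


entities = merge_subword_fragments(groups)
for e in entities:
    print(f'{e["entity_group"]:5} {e["score"]:.3f} {e["word"]}')
```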

This is useful when trying to extract the subject of feedback, especially in a large corpus of feedback. Sometimes, however, we want answers to more targeted questions. This is where *question answering* comes in. For now, we will use the default extractive question-answering model, which answers the posed question by extracting a span of text from the context.

In [7]:
reader = pipeline("question-answering")
question = "What does the customer want?"
outputs = reader(question=question, context=text)
pd.DataFrame.from_records([outputs])

No model was supplied, defaulted to distilbert-base-cased-distilled-squad (https://huggingface.co/distilbert-base-cased-distilled-squad)


|   | score | start | end | answer |
|---|---|---|---|---|
| 0 | 0.631292 | 335 | 358 | an exchange of Megatron |
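
Because the model is extractive, the `start` and `end` fields are character offsets into the context, so the answer can be located in the original letter. The helper below is a hypothetical sketch that marks the answer span in place; the shortened context and the output dictionary in the example are constructed by hand, not produced by the pipeline.

```python
def highlight_answer(context, qa_output, marker="**"):
    """Wrap the answer span [start, end) of the context in `marker`."""
    start, end = qa_output["start"], qa_output["end"]
    return context[:start] + marker + context[start:end] + marker + context[end:]


# Hand-built stand-ins for `text` and the question-answering output.
context = "To resolve the issue, I demand an exchange of Megatron for the Optimus Prime figure I ordered."
answer = "an exchange of Megatron"
start = context.find(answer)
qa_output = {"score": 0.63, "start": start, "end": start + len(answer), "answer": answer}

highlighted = highlight_answer(context, qa_output)
print(highlighted)
# To resolve the issue, I demand **an exchange of Megatron** for the Optimus Prime figure I ordered.
```

In the notebook itself, `highlight_answer(text, outputs)` would apply the same idea to the full letter and the answer returned above.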


We can also summarize long texts into shorter texts that keep the relevant facts. This is a much harder task, since the model has to generate new, coherent text rather than classify or extract parts of the input, and it is still an active area of research.

In [8]:
summarizer = pipeline("summarization")
outputs = summarizer(text, max_length=45, clean_up_tokenization_spaces=True)
print("Summary: ", outputs[0]['summary_text'])

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 (https://huggingface.co/sshleifer/distilbart-cnn-12-6)


Summary:   Bumblebee ordered an Optimus Prime action figure from your online store in Germany. Unfortunately, when I opened the package, I discovered to my horror that I had been sent an action figure of Megatron instead.
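
Apart from the opening "Bumblebee ordered", the summary above reuses the first two sentences of the letter almost word for word. A crude way to quantify how much a summary copies from its source is the fraction of the summary's word bigrams (pairs of consecutive words) that also appear in the source. This is a rough sketch of our own, not an established summarization metric from the library.

```python
import re

# The customer letter and the summary printed above.
source = (
    "Dear Amazon, last week I ordered an Optimus Prime action figure "
    "from your online store in Germany. Unfortunately, when I opened the package, "
    "I discovered to my horror that I had been sent an action figure of Megatron "
    "instead! As a lifelong enemy of the Decepticons, I hope you can understand my "
    "dilemma. To resolve the issue, I demand an exchange of Megatron for the "
    "Optimus Prime figure I ordered. Enclosed are copies of my records concerning "
    "this purchase. I expect to hear from you soon. Sincerely, Bumblebee."
)
summary = (
    "Bumblebee ordered an Optimus Prime action figure from your online store in "
    "Germany. Unfortunately, when I opened the package, I discovered to my horror "
    "that I had been sent an action figure of Megatron instead."
)


def word_bigrams(s):
    """Set of consecutive lowercase word pairs, ignoring punctuation."""
    words = re.findall(r"[a-z']+", s.lower())
    return set(zip(words, words[1:]))


def copied_fraction(summary, source):
    """Share of the summary's distinct word bigrams that also occur in the source."""
    summary_bigrams = word_bigrams(summary)
    if not summary_bigrams:
        return 0.0
    return len(summary_bigrams & word_bigrams(source)) / len(summary_bigrams)


print(f"{copied_fraction(summary, source):.2f}")  # 0.97
```

A value close to 1 suggests the model mostly copied text rather than rephrasing it.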


Hugging Face also provides a pipeline task to translate from one language to another.

For the generic `translation` task, you must also specify the model explicitly.

Go to [the Hugging Face Models](https://huggingface.co/models) page to find the model you want to use.
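
Both translation models that appear in the next cell follow the same naming pattern, `Helsinki-NLP/opus-mt-{source}-{target}`. The helper below is a hypothetical convenience that builds an id in that pattern; it does not check the Hub, and not every language pair exists, so the Models page remains the place to confirm a model is available.

```python
def opus_mt_model_id(source_lang, target_lang):
    """Build a model id following the Helsinki-NLP/opus-mt-{src}-{tgt} pattern.

    Assumption: the pattern is inferred from the two models used in this notebook;
    the returned id may not correspond to an existing model.
    """
    return f"Helsinki-NLP/opus-mt-{source_lang}-{target_lang}"


print(opus_mt_model_id("en", "fr"))  # Helsinki-NLP/opus-mt-en-fr
```

For example, `pipeline("translation", model=opus_mt_model_id("en", "es"))` would load the English-to-Spanish model that is commented out below.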

In [9]:
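import sentencepiece  # not used directly; the Marian tokenizer behind the Helsinki-NLP models needs it installed
#translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-es")  # English -> Spanish alternative
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-fr")  # English -> French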
outputs = translator(text, clean_up_tokenization_spaces=True, min_length=100)
print("Translation: ", outputs[0]['translation_text'])

Translation:  Cher Amazon, la semaine dernière j'ai commandé une figure d'action Optimus Prime de votre boutique en ligne en Allemagne. Malheureusement, quand j'ai ouvert le paquet, j'ai découvert à mon horreur que j'avais été envoyé une figure d'action de Megatron à la place! En tant qu'ennemi à vie des Decepticons, j'espère que vous pouvez comprendre mon dilemme. Pour résoudre le problème, j'exige un échange de Megatron contre la figure d'Optimus Prime que j'ai commandé.
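
The printed translation appears to stop after the sentence about the exchange: "Enclosed are copies...", "I expect to hear from you soon." and "Sincerely, Bumblebee." have no French counterpart. One possible workaround, which we have not verified here, is to translate the letter one sentence at a time. The naive splitter below is a sketch of that first step.

```python
import re

# The customer letter, copied from the notebook.
letter = """Dear Amazon, last week I ordered an Optimus Prime action figure \
from your online store in Germany. Unfortunately, when I opened the package, \
I discovered to my horror that I had been sent an action figure of Megatron \
instead! As a lifelong enemy of the Decepticons, I hope you can understand my \
dilemma. To resolve the issue, I demand an exchange of Megatron for the \
Optimus Prime figure I ordered. Enclosed are copies of my records concerning \
this purchase. I expect to hear from you soon. Sincerely, Bumblebee."""


def split_sentences(s):
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace.

    It would wrongly split after abbreviations such as "e.g.", which is fine for this letter.
    """
    return [part for part in re.split(r"(?<=[.!?])\s+", s.strip()) if part]


sentences = split_sentences(letter)
print(len(sentences))  # 7
```

Pipelines accept a list of inputs and return one result per input, so `translator(sentences)` followed by `" ".join(o["translation_text"] for o in ...)` would reassemble the letter. `min_length=100` should be dropped in that case, since it would push the model to generate at least 100 tokens for every short sentence.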


Now suppose we want some help drafting a response to the customer. We can use the text-generation pipeline: we give it some initial text (a prompt) and let it generate text that could plausibly follow what we have written.

In [10]:
from transformers import set_seed

set_seed(42)
generator = pipeline("text-generation")
response = "Dear Bumblebee, I am sorry to hear that your order was mixed up."
prompt = text + "\n\nCustomer service response:\n" + response
outputs = generator(prompt, max_length=200)
print(outputs[0]['generated_text'])

No model was supplied, defaulted to gpt2 (https://huggingface.co/gpt2)
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Dear Amazon, last week I ordered an Optimus Prime action figure from your online store in Germany. Unfortunately, when I opened the package, I discovered to my horror that I had been sent an action figure of Megatron instead! As a lifelong enemy of the Decepticons, I hope you can understand my dilemma. To resolve the issue, I demand an exchange of Megatron for the Optimus Prime figure I ordered. Enclosed are copies of my records concerning this purchase. I expect to hear from you soon. Sincerely, Bumblebee.

Customer service response:
Dear Bumblebee, I am sorry to hear that your order was mixed up. The order was completely mislabeled, which is very common in our online store, but I can appreciate it because it was my understanding from this site and our customer service of the previous day that your order was not made correct in our mind and that we are in a process of resolving this matter. We can assure you that your order
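
Two things stand out in this output. First, `generated_text` repeats the whole prompt before the new text. The pipeline's `return_full_text=False` option should return only the continuation, and the hypothetical helper below does the same by stripping the echoed prompt. Second, the reply is cut off mid-sentence, most likely because `max_length=200` counts the prompt tokens as well; `max_new_tokens` limits only the newly generated tokens. The strings in the example are hand-built stand-ins, not real pipeline output.

```python
def continuation_only(prompt, generated_text):
    """Strip the echoed prompt from a text-generation result."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):]
    # If the prompt is not echoed verbatim, return everything rather than guess.
    return generated_text


# Hand-built stand-ins for `prompt` and outputs[0]['generated_text'].
prompt = "Customer service response:\nDear Bumblebee, I am sorry to hear that your order was mixed up."
generated = prompt + " We can assure you that your order"
print(repr(continuation_only(prompt, generated)))  # ' We can assure you that your order'
```

In the notebook, `continuation_only(prompt, outputs[0]['generated_text'])` would print just the model's part of the reply.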
