## A Tour of Transformer Applications

In [1]:
text = """Dear Amazon, last week I ordered an Optimus Prime action figure
from your online store in Germany. Unfortunately, when I opened the package,
I discovered to my horror that I had been sent an action figure of Megatron
instead! As a lifelong enemy of the Decepticons, I hope you can understand my
dilemma. To resolve the issue, I demand an exchange of Megatron for the
Optimus Prime figure I ordered. Enclosed are copies of my records concerning
this purchase. I expect to hear from you soon. Sincerely, Bumblebee."""

### Text Classification

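The Transformers library has a layered API that lets you interact with it at various levels of abstraction.
In this chapter, we will start with pipelines, which bundle the preprocessing, model, and postprocessing steps needed to turn raw text into predictions from a fine-tuned model.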

In [3]:
from transformers import pipeline

In [4]:
classifier = pipeline("text-classification")

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.

In [6]:
import pandas as pd

outputs = classifier(text)
pd.DataFrame(outputs)

|   | label    | score    |
|---|----------|----------|
| 0 | NEGATIVE | 0.901546 |
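The `score` column is a probability. Here is a minimal sketch of how such a score can be produced, assuming the model emits one raw score (logit) per label and the pipeline applies a softmax and keeps the most likely label. The logits below are made-up illustrative values, not the model's actual output, and the `0: NEGATIVE, 1: POSITIVE` mapping is an assumption.

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_label(logits, id2label):
    """Return a pipeline-style dict holding the highest-probability label."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": id2label[best], "score": probs[best]}


# Illustrative logits only (not real model output); assumed label order
id2label = {0: "NEGATIVE", 1: "POSITIVE"}
print(top_label([1.5, -0.7], id2label))
```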


### Named Entity Recognition

In NLP, real-world objects like products, places, and people are called named entities, and extracting them from text is called named entity recognition (NER).
We can apply NER by loading the corresponding pipeline and feeding our customer review to it.

In [7]:
ner_tagger = pipeline("ner", aggregation_strategy="simple")
outputs = ner_tagger(text)
pd.DataFrame(outputs)

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision f2482bf (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.

Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.weight', 'bert.pooler.dense.bias']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).

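|   | entity_group | score    | word          | start | end |
|---|--------------|----------|---------------|-------|-----|
| 0 | ORG          | 0.879010 | Amazon        | 5     | 11  |
| 1 | MISC         | 0.990859 | Optimus Prime | 36    | 49  |
| 2 | LOC          | 0.999755 | Germany       | 90    | 97  |
| 3 | MISC         | 0.556567 | Mega          | 208   | 212 |
| 4 | PER          | 0.590256 | ##tron        | 212   | 216 |
| 5 | ORG          | 0.669691 | Decept        | 253   | 259 |
| 6 | MISC         | 0.498349 | ##icons       | 259   | 264 |
| 7 | MISC         | 0.775361 | Megatron      | 350   | 358 |
| 8 | MISC         | 0.987854 | Optimus Prime | 367   | 380 |
| 9 | PER          | 0.812097 | Bumblebee     | 502   | 511 |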
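The `##` prefix marks a WordPiece continuation: the tokenizer split "Megatron" and "Decepticons" into sub-word tokens, and because the fragments received different labels, the `simple` strategy reported them as separate entities. The pipeline also accepts `aggregation_strategy` values such as `"first"`, `"average"`, and `"max"`, which group sub-word tokens into whole words before labelling. As an alternative, here is a minimal post-processing sketch. `merge_wordpieces` is a hypothetical helper (not part of the library) that attaches a `##` fragment to the preceding entity when their character offsets touch, keeps the first fragment's label, and averages the scores. The rows are copied from the output above.

```python
def merge_wordpieces(entities):
    """Hypothetical helper: merge '##'-prefixed WordPiece fragments.

    Keeps the label of the first fragment and averages fragment scores.
    Input dicts are copied, so the original list is not modified.
    """
    merged = []
    for ent in entities:
        is_continuation = (
            bool(merged)
            and ent["word"].startswith("##")
            and ent["start"] == merged[-1]["end"]  # offsets must touch
        )
        if is_continuation:
            prev = merged[-1]
            prev["word"] += ent["word"][2:]  # drop the '##' marker
            prev["end"] = ent["end"]
            prev["_scores"].append(ent["score"])
        else:
            merged.append({**ent, "_scores": [ent["score"]]})
    for ent in merged:
        scores = ent.pop("_scores")
        ent["score"] = sum(scores) / len(scores)
    return merged


rows = [
    {"entity_group": "MISC", "score": 0.556567, "word": "Mega", "start": 208, "end": 212},
    {"entity_group": "PER", "score": 0.590256, "word": "##tron", "start": 212, "end": 216},
    {"entity_group": "ORG", "score": 0.669691, "word": "Decept", "start": 253, "end": 259},
    {"entity_group": "MISC", "score": 0.498349, "word": "##icons", "start": 259, "end": 264},
]
for ent in merge_wordpieces(rows):
    print(ent["entity_group"], ent["word"], round(ent["score"], 4))
```

Taking the first fragment's label is an arbitrary choice here; picking the label of the highest-scoring fragment would be an equally reasonable variant.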


### Question Answering

In question answering, we provide the model with a passage of text called the context, along with a question whose answer we would like to extract. The model then returns the span of text corresponding to the answer.
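Extractive QA models typically score every context token twice: once as a possible answer start and once as a possible answer end. Below is a simplified sketch of the span-selection step, assuming we already have those per-token start and end logits. The tokens and logit values are invented for illustration. The sketch picks the pair `(i, j)` with `i <= j` and a bounded length that maximizes `start_logits[i] + end_logits[j]`. The real pipeline also maps tokens back to character offsets, ignores question tokens, and handles contexts longer than the model's input limit, so treat this as an illustration of the idea rather than its implementation.

```python
def best_span(start_logits, end_logits, max_answer_len=15):
    """Return (start, end) token indices of the highest-scoring span.

    Only spans with start <= end and at most max_answer_len tokens count.
    """
    best, best_score = (0, 0), float("-inf")
    for i, s in enumerate(start_logits):
        for j in range(i, min(i + max_answer_len, len(end_logits))):
            score = s + end_logits[j]
            if score > best_score:
                best, best_score = (i, j), score
    return best


# Invented tokens and logits, loosely based on the review text
tokens = ["an", "exchange", "of", "Megatron", "for", "the", "Optimus", "Prime"]
start_logits = [0.1, 2.0, 0.3, 0.5, 0.0, 0.1, 0.2, 0.1]
end_logits = [0.0, 0.1, 0.2, 0.4, 0.3, 0.0, 0.5, 2.5]

i, j = best_span(start_logits, end_logits)
print(" ".join(tokens[i : j + 1]))
```

Adding logits ranks spans the same way as multiplying the softmax-normalized start and end probabilities, because each softmax only divides by a constant.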

In [None]:
reader = pipeline("question-answering")
question = "What does the customer want?"
outputs = reader(question=question, context=text)
pd.DataFrame([outputs])

### Summarization

The goal of summarization is to take a text as input and generate a shorter version that contains all the relevant facts.
This is much more complex than the previous tasks, because it requires the model to generate coherent text rather than classify or extract from its input.
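Generating text usually means producing it one token at a time, with each step conditioned on what has been generated so far. The `max_length` argument caps how long that sequence may grow, and it is measured in tokens rather than words. The following is a toy sketch of greedy autoregressive decoding, in which a hypothetical bigram table stands in for the model. At each step, the highest-scoring next token is appended until an end-of-sequence token appears or the length limit is reached. Real summarization checkpoints may be configured to use beam search or sampling instead of this greedy rule.

```python
def greedy_decode(next_token_scores, bos, eos, max_length):
    """Toy greedy decoding: repeatedly append the highest-scoring token.

    next_token_scores(prefix) returns a dict {candidate_token: score}.
    Stops after emitting eos or once the sequence has max_length tokens.
    """
    seq = [bos]
    while len(seq) < max_length:
        scores = next_token_scores(seq)
        token = max(scores, key=scores.get)
        seq.append(token)
        if token == eos:
            break
    return seq


# Hypothetical bigram "model": next-token scores depend only on the last token
table = {
    "<s>": {"Bumblebee": 0.6, "Amazon": 0.4},
    "Bumblebee": {"ordered": 0.7, "</s>": 0.3},
    "ordered": {"Optimus": 0.8, "Megatron": 0.2},
    "Optimus": {"Prime": 0.9, "</s>": 0.1},
    "Prime": {"</s>": 1.0},
}


def bigram_scores(prefix):
    return table[prefix[-1]]


print(greedy_decode(bigram_scores, "<s>", "</s>", max_length=10))
print(greedy_decode(bigram_scores, "<s>", "</s>", max_length=3))
```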

In [None]:
summarizer = pipeline("summarization")
outputs = summarizer(text, max_length=45, clean_up_tokenization_spaces=True)
print(outputs[0]['summary_text'])

### Translation

Like summarization, translation is a task where the output consists of generated text.

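In [None]:
# Assumed checkpoint: an English-to-German MarianMT model from the Hugging Face Hub
translator = pipeline("translation_en_to_de", model="Helsinki-NLP/opus-mt-en-de")
outputs = translator(text, clean_up_tokenization_spaces=True)
print(outputs[0]["translation_text"])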