# A tour of Transformers applications with Hugging Face

References:
- "Natural Language Processing with Transformers" by Lewis Tunstall, Leandro von Werra and Thomas Wolf.

## Dependencies

In [2]:
%pip install tf-keras
%pip install transformers[torch]
%pip install sentencepiece

Note: you may need to restart the kernel to use updated packages.

Import the dependencies. Note that the `pipeline` calls below download the model weights and configurations the first time they run.

In [3]:
import pandas as pd
from transformers import pipeline


## Application 1: Text Classification

We use the following customer complaint about a product order as input:

In [4]:
text = """Dear Amazon, last week I ordered an Optimus Prime figure from your online store in Germany. Unfortunately, when I opened the package, I discovered to my horror that I had been sent an action figure of Megatron instead! As a lifelong enemy of the Decepticons, I hope you can understand my dilemma. To resolve the issue, I demand an exchange of Megatron for the Optimus Prime figure I ordered. Enclosed are copies of my records concerning this purchase. I expect to hear from you soon. Sincerely, Bumblebee."""
text

'Dear Amazon, last week I ordered an Optimus Prime figure from your online store in Germany. Unfortunately, when I opened the package, I discovered to my horror that I had been sent an action figure of Megatron instead! As a lifelong enemy of the Decepticons, I hope you can understand my dilemma. To resolve the issue, I demand an exchange of Megatron for the Optimus Prime figure I ordered. Enclosed are copies of my records concerning this purchase. I expect to hear from you soon. Sincerely, Bumblebee.'

We can now use a pipeline to classify the sentiment of the text:

In [5]:
classifier = pipeline("text-classification")
outputs = classifier(text)

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.





Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.


In [6]:
pd.DataFrame(outputs)

| | label | score |
|---|---|---|
| 0 | NEGATIVE | 0.839062 |
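The log above warns that no model name, revision, or `device` was passed. The following is a minimal sketch of how those could be set explicitly, assuming the model ID and revision printed in the log are the ones you want to pin. The helper names `pick_device` and `build_classifier` are hypothetical and not part of the original notebook.

```python
# Model ID and revision copied from the pipeline's log message above.
MODEL_ID = "distilbert/distilbert-base-uncased-finetuned-sst-2-english"
REVISION = "af0f99b"


def pick_device(cuda_available: bool) -> int:
    """Return a pipeline device index: 0 for the first GPU, -1 for the CPU."""
    return 0 if cuda_available else -1


def build_classifier():
    """Create the classifier with an explicit model, revision, and device.

    The imports are done lazily so that defining this function does not
    load any library or download any weights (hypothetical helper).
    """
    import torch
    from transformers import pipeline

    return pipeline(
        "text-classification",
        model=MODEL_ID,
        revision=REVISION,
        device=pick_device(torch.cuda.is_available()),
    )
```

Calling `build_classifier()(text)` should then behave like the cell above, but without the warnings about the default model and the unused accelerator.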


## Application 2: Named Entity Recognition

In [7]:
ner_tagger = pipeline("ner", aggregation_strategy="simple")
outputs = ner_tagger(text)

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision f2482bf (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
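Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.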

In [8]:
pd.DataFrame(outputs)

| | entity_group | score | word | start | end |
|---|---|---|---|---|---|
| 0 | ORG | 0.862083 | Amazon | 5 | 11 |
| 1 | MISC | 0.992903 | Optimus Prime | 36 | 49 |
| 2 | LOC | 0.99976 | Germany | 83 | 90 |
| 3 | MISC | 0.549242 | Mega | 201 | 205 |
| 4 | PER | 0.582798 | ##tron | 205 | 209 |
| 5 | ORG | 0.671608 | Decept | 246 | 252 |
| 6 | MISC | 0.498508 | ##icons | 252 | 257 |
| 7 | MISC | 0.762837 | Megatron | 343 | 351 |
| 8 | MISC | 0.990934 | Optimus Prime | 360 | 373 |
| 9 | PER | 0.80459 | Bumblebee | 495 | 504 |


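The `##` prefixes in the `word` column come from the model's subword (WordPiece) tokenization: a piece that starts with `##` continues the previous piece, so `Mega` and `##tron` are two pieces of the single word `Megatron`. The pieces appear as separate rows here, most likely because the model assigned them different entity labels, which prevents the `simple` aggregation strategy from merging them.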
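As an illustration of how the `##` pieces relate to whole words, here is a minimal sketch that glues continuation pieces back onto the preceding piece. The rows are copied from the NER output above. The function `merge_word_pieces` is a hypothetical helper, and keeping the label of the first piece is an assumption made for this sketch, not the pipeline's own behavior.

```python
# Rows copied from the NER output above: (entity_group, word, start, end).
rows = [
    ("MISC", "Mega", 201, 205),
    ("PER", "##tron", 205, 209),
    ("ORG", "Decept", 246, 252),
    ("MISC", "##icons", 252, 257),
    ("MISC", "Megatron", 343, 351),
]


def merge_word_pieces(rows):
    """Glue each '##' continuation piece onto the piece right before it.

    A piece is merged only when it starts exactly where the previous
    piece ends. The merged span keeps the label of its first piece,
    which is a simplification chosen for this sketch.
    """
    merged = []
    for label, word, start, end in rows:
        if word.startswith("##") and merged and merged[-1][3] == start:
            prev_label, prev_word, prev_start, _ = merged[-1]
            merged[-1] = (prev_label, prev_word + word[2:], prev_start, end)
        else:
            merged.append((label, word, start, end))
    return merged


for row in merge_word_pieces(rows):
    print(row)
```

The pipeline also accepts other `aggregation_strategy` values, such as `"first"`, which may resolve this kind of split inside the pipeline itself.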

## Application 3: Question Answering

In [9]:
reader = pipeline("question-answering")
question = "What does the customer want?"
outputs = reader(question=question, context=text)

No model was supplied, defaulted to distilbert/distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.
Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.


In [10]:
pd.DataFrame([outputs])

| | score | start | end | answer |
|---|---|---|---|---|
| 0 | 0.665506 | 328 | 351 | an exchange of Megatron |
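The `start` and `end` fields are character offsets into the context, so the answer can be recovered by slicing the input text. The sketch below checks this with the values from the output above. The variable names are chosen for illustration only.

```python
# The same complaint text used throughout the notebook.
context = (
    "Dear Amazon, last week I ordered an Optimus Prime figure from your online "
    "store in Germany. Unfortunately, when I opened the package, I discovered to "
    "my horror that I had been sent an action figure of Megatron instead! As a "
    "lifelong enemy of the Decepticons, I hope you can understand my dilemma. To "
    "resolve the issue, I demand an exchange of Megatron for the Optimus Prime "
    "figure I ordered. Enclosed are copies of my records concerning this "
    "purchase. I expect to hear from you soon. Sincerely, Bumblebee."
)

# Values copied from the question-answering output above.
output = {
    "score": 0.665506,
    "start": 328,
    "end": 351,
    "answer": "an exchange of Megatron",
}

# Slicing the context with the offsets reproduces the extracted answer.
span = context[output["start"]:output["end"]]
print(span)  # an exchange of Megatron
```

This also shows that the model performs extractive question answering: the answer is a span copied from the context rather than newly written text.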


## Application 4: Summarization

In [11]:
summarizer = pipeline("summarization")
outputs = summarizer(text, max_length=45, clean_up_tokenization_spaces=True)

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.
Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.
Your min_length=56 must be inferior than your max_length=45.


In [12]:
outputs[0]["summary_text"]

' Bumblebee ordered an Optimus Prime figure from your online store in Germany. Unfortunately, when I opened the package, I discovered to my horror that I had been sent an action figure of Megatron instead. As'
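The log above reports that the model's default `min_length=56` is larger than the requested `max_length=45`. One way to avoid this is to pass both values explicitly. The sketch below wraps the call in a hypothetical `summarize` helper that takes the pipeline object as an argument. The value `min_length=10` is an assumption chosen for illustration, and `fake_summarizer` is a stand-in so that the example runs without downloading a model.

```python
def summarize(summarizer, text, max_length=45, min_length=10):
    """Call a summarization pipeline with a consistent length budget."""
    if min_length >= max_length:
        raise ValueError("min_length must be smaller than max_length")
    outputs = summarizer(
        text,
        max_length=max_length,
        min_length=min_length,
        clean_up_tokenization_spaces=True,
    )
    return outputs[0]["summary_text"]


def fake_summarizer(text, **kwargs):
    """Stand-in for the real pipeline: echoes the length settings it received."""
    summary = f"(summary, {kwargs['min_length']}-{kwargs['max_length']} tokens)"
    return [{"summary_text": summary}]


print(summarize(fake_summarizer, "some long text"))
```

With the real pipeline, the call would be `summarize(summarizer, text)`. Note that the summary shown above ends mid-sentence ("As") because generation stops once `max_length` tokens are reached.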

## Application 5: Translation

In [13]:
translator = pipeline("translation_en_to_de", model="Helsinki-NLP/opus-mt-en-de")
outputs = translator(text, clean_up_tokenization_spaces=True, min_length=100)

Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.


In [14]:
outputs[0]['translation_text']

'Liebe Amazon, letzte Woche habe ich eine Optimus Prime Figur von Ihrem Online-Shop in Deutschland bestellt. Leider, als ich das Paket öffnete, entdeckte ich zu meinem Entsetzen, dass ich stattdessen eine Actionfigur von Megatron geschickt worden war! Als lebenslanger Feind der Decepticons, Ich hoffe, Sie können mein Dilemma verstehen. Um das Problem zu lösen, Ich fordere einen Austausch von Megatron für die Optimus Prime Figur bestellte ich. Anbei sind Kopien meiner Aufzeichnungen über diesen Kauf. Ich erwarte, von Ihnen bald zu hören. Aufrichtig, Bumblebee.'

## Application 6: Text Generation

In [15]:
generator = pipeline("text-generation")
prompt = text
prompt += "\n\nCustomer service response:\n"
prompt += "Dear Bumblebee, I am sorry to hear that your order was mixed up."
outputs = generator(prompt, max_length=300)

No model was supplied, defaulted to openai-community/gpt2 and revision 6c0e608 (https://huggingface.co/openai-community/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.
Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


The complete response (prompt plus newly generated text):

In [16]:
full_text = outputs[0]['generated_text']
full_text

'Dear Amazon, last week I ordered an Optimus Prime figure from your online store in Germany. Unfortunately, when I opened the package, I discovered to my horror that I had been sent an action figure of Megatron instead! As a lifelong enemy of the Decepticons, I hope you can understand my dilemma. To resolve the issue, I demand an exchange of Megatron for the Optimus Prime figure I ordered. Enclosed are copies of my records concerning this purchase. I expect to hear from you soon. Sincerely, Bumblebee.\n\nCustomer service response:\nDear Bumblebee, I am sorry to hear that your order was mixed up. Please make a note of the order number that comes with your order so that we can all check the status. (Please remember to pay your credit card information with this form.) In the email in case you see any additional "lost" orders, please let me know so I can update you. Thanks.\n\nDelivery issue:\n\nThis company offers the Transformers in various shipping options, but they offer a limited shipping fee (1550 - 2500$) for all packages.\n\nWhat is this?\n\nThank you, you are all very welcome and very happy with my purchase. I wish I could have purchased the Transformers from you before and paid you more money for my purchase, but I have been struggling with different shipping options I have not had the chance to try before. As such, I have to give you this little'

The customer service response only (the seeded opening sentence plus the newly generated text):

In [18]:
full_text[full_text.find("Customer service response:"):]

'Customer service response:\nDear Bumblebee, I am sorry to hear that your order was mixed up. Please make a note of the order number that comes with your order so that we can all check the status. (Please remember to pay your credit card information with this form.) In the email in case you see any additional "lost" orders, please let me know so I can update you. Thanks.\n\nDelivery issue:\n\nThis company offers the Transformers in various shipping options, but they offer a limited shipping fee (1550 - 2500$) for all packages.\n\nWhat is this?\n\nThank you, you are all very welcome and very happy with my purchase. I wish I could have purchased the Transformers from you before and paid you more money for my purchase, but I have been struggling with different shipping options I have not had the chance to try before. As such, I have to give you this little'
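Because the text-generation pipeline returns the prompt followed by the continuation, the newly generated part can also be isolated by removing the prompt prefix instead of searching for a marker string. The sketch below uses a hypothetical `strip_prompt` helper and a shortened toy prompt, so it runs without a model.

```python
def strip_prompt(full_text: str, prompt: str) -> str:
    """Return only the text that follows the prompt in the generated output."""
    if not full_text.startswith(prompt):
        raise ValueError("full_text does not start with the prompt")
    return full_text[len(prompt):]


# Toy values standing in for the real prompt and pipeline output.
toy_prompt = "Customer service response:\n"
toy_full_text = toy_prompt + "Please make a note of the order number."

print(strip_prompt(toy_full_text, toy_prompt))
```

With the real objects, the call would be `strip_prompt(full_text, prompt)`. The text-generation pipeline also accepts `return_full_text=False`, which should return only the continuation directly.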