In [None]:
# Uncomment and run this cell if you're on Colab or Kaggle
# !git clone https://github.com/nlp-with-transformers/notebooks.git
# %cd notebooks
# from install import *
# install_requirements()

In [1]:
#hide
from utils import *
setup_chapter()

Using transformers v4.16.2
Using datasets v1.16.1


# Hello Transformers

<img alt="transformer-timeline" caption="The transformers timeline" src="images/chapter01_timeline.png" id="transformer-timeline"/>

## The Encoder-Decoder Framework

<img alt="rnn" caption="Unrolling an RNN in time." src="images/chapter01_rnn.png" id="rnn"/>

<img alt="enc-dec" caption="Encoder-decoder architecture with a pair of RNNs. In general, there are many more recurrent layers than those shown." src="images/chapter01_enc-dec.png" id="enc-dec"/>

## Attention Mechanisms

<img alt="enc-dec-attn" caption="Encoder-decoder architecture with an attention mechanism for a pair of RNNs." src="images/chapter01_enc-dec-attn.png" id="enc-dec-attn"/> 

<img alt="attention-alignment" width="500" caption="RNN encoder-decoder alignment of words in English and the generated translation in French (courtesy of Dzmitry Bahdanau)." src="images/chapter02_attention-alignment.png" id="attention-alignment"/> 

<img alt="transformer-self-attn" caption="Encoder-decoder architecture of the original Transformer." src="images/chapter01_self-attention.png" id="transformer-self-attn"/> 

## Transfer Learning in NLP

<img alt="transfer-learning" caption="Comparison of traditional supervised learning (left) and transfer learning (right)." src="images/chapter01_transfer-learning.png" id="transfer-learning"/>  

<img alt="ulmfit" width="500" caption="The ULMFiT process (courtesy of Jeremy Howard)." src="images/chapter01_ulmfit.png" id="ulmfit"/>

## Hugging Face Transformers: Bridging the Gap

## A Tour of Transformer Applications

In [2]:
text = """Dear Amazon, last week I ordered an Optimus Prime action figure \
from your online store in Germany. Unfortunately, when I opened the package, \
I discovered to my horror that I had been sent an action figure of Megatron \
instead! As a lifelong enemy of the Decepticons, I hope you can understand my \
dilemma. To resolve the issue, I demand an exchange of Megatron for the \
Optimus Prime figure I ordered. Enclosed are copies of my records concerning \
this purchase. I expect to hear from you soon. Sincerely, Bumblebee."""

### Text Classification

In [3]:
#hide_output
from transformers import pipeline

classifier = pipeline("text-classification")



**DJH:** You can specify the text-classification model or leave it at the default, which is the DistilBERT checkpoint `distilbert-base-uncased-finetuned-sst-2-english` (fine-tuned for sentiment analysis). Check out the list of [available text classification](https://huggingface.co/models?pipeline_tag=text-classification&sort=trending) models on Hugging Face for other options. The `pipeline` function provides a powerful layer of abstraction over the underlying transformer models: in two lines of code we can do meaningful and useful work. Create a _pipeline_ for a particular task, e.g. "summarization", then use it for inference.

In [2]:
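import pandas as pd

# `text` is defined in the cell at the start of this section; run that cell
# first, otherwise this call raises a NameError.
outputs = classifier(text)
pd.DataFrame(outputs)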

NameError: name 'text' is not defined

**DJH:** Let's try the "j-hartmann/emotion-english-distilroberta-base" model. The output classes will almost certainly be different, because the classes are determined by how the data in the training set was labelled.

In [6]:
classifier = pipeline("text-classification", model="j-hartmann/emotion-english-distilroberta-base")
outputs = classifier(text)
pd.DataFrame(outputs)

Unnamed: 0,label,score
0,fear,0.706279
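
The label names in the table above come from the model's label mapping, not from the input text. As a rough sketch of that idea (not the library's actual implementation), the snippet below applies a softmax to some made-up scores and looks up the winning class in two different label mappings. The function name `top_label`, the logits, and the two-entry emotion mapping are all illustrative assumptions.

```python
import math

def top_label(logits, id2label):
    """Softmax over raw class scores, then return the best (label, probability)."""
    highest = max(logits)
    exps = [math.exp(x - highest) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]

# Made-up scores; only the label mapping differs between the two calls.
logits = [0.2, 1.7]
sentiment_labels = {0: "NEGATIVE", 1: "POSITIVE"}  # a two-class sentiment scheme
emotion_labels = {0: "joy", 1: "fear"}             # hypothetical emotion scheme

print(top_label(logits, sentiment_labels))
print(top_label(logits, emotion_labels))
```

The same scores yield a different answer purely because the mapping changed, which is why swapping checkpoints changes the set of possible classes.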


### Named Entity Recognition

In [8]:
ner_tagger = pipeline("ner", aggregation_strategy="simple")
outputs = ner_tagger(text)
pd.DataFrame(outputs)
# print(ner_tagger.model)

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


Unnamed: 0,entity_group,score,word,start,end
0,ORG,0.879011,Amazon,5,11
1,MISC,0.990859,Optimus Prime,36,49
2,LOC,0.999755,Germany,90,97
3,MISC,0.556571,Mega,208,212
4,PER,0.590255,##tron,212,216
5,ORG,0.669692,Decept,253,259
6,MISC,0.498349,##icons,259,264
7,MISC,0.775362,Megatron,350,358
8,MISC,0.987854,Optimus Prime,367,380
9,PER,0.812096,Bumblebee,502,511
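
Rows 3 to 6 of the table above show words split into WordPiece fragments ("Mega" + "##tron", "Decept" + "##icons") with a different label on each fragment. One possible cleanup, sketched below on the rows copied from that table, is to glue a `##` fragment onto the previous span when their character offsets touch and keep the label of the higher-scoring fragment. The helper `merge_word_pieces` and the "keep the higher score" rule are assumptions made for illustration; the pipeline's other `aggregation_strategy` values (`"first"`, `"average"`, `"max"`) are the built-in way to deal with this.

```python
# Rows copied from the table above: (entity_group, score, word, start, end)
rows = [
    ("ORG", 0.879011, "Amazon", 5, 11),
    ("MISC", 0.990859, "Optimus Prime", 36, 49),
    ("LOC", 0.999755, "Germany", 90, 97),
    ("MISC", 0.556571, "Mega", 208, 212),
    ("PER", 0.590255, "##tron", 212, 216),
    ("ORG", 0.669692, "Decept", 253, 259),
    ("MISC", 0.498349, "##icons", 259, 264),
    ("MISC", 0.775362, "Megatron", 350, 358),
    ("MISC", 0.987854, "Optimus Prime", 367, 380),
    ("PER", 0.812096, "Bumblebee", 502, 511),
]

def merge_word_pieces(rows):
    """Join '##' continuation fragments onto the span they directly follow."""
    merged = []
    for group, score, word, start, end in rows:
        touches_previous = bool(merged) and merged[-1]["end"] == start
        if word.startswith("##") and touches_previous:
            prev = merged[-1]
            prev["word"] += word[2:]  # drop the '##' marker
            prev["end"] = end
            if score > prev["score"]:  # keep the more confident label
                prev["entity_group"], prev["score"] = group, score
        else:
            merged.append({"entity_group": group, "score": score,
                           "word": word, "start": start, "end": end})
    return merged

for entity in merge_word_pieces(rows):
    print(entity["entity_group"], entity["word"], entity["start"], entity["end"])
```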


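**DJH:** Let's try a different model, `distilbert/distilbert-base-uncased`, instead of the default `dbmdz/bert-large-cased-finetuned-conll03-english`. There's no expectation this will be better: it is a base checkpoint that has not been fine-tuned for token classification, so (as the warning below notes) its classifier weights are newly initialized. I'm simply exploring the different available options.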

In [6]:
ner_tagger = pipeline("ner", model="distilbert/distilbert-base-uncased", aggregation_strategy="simple")
outputs = ner_tagger(text)
pd.DataFrame(outputs)
# print(ner_tagger.model)

Some weights of the model checkpoint at distilbert/distilbert-base-uncased were not used when initializing DistilBertForTokenClassification: ['vocab_projector.weight', 'vocab_transform.weight', 'vocab_layer_norm.weight', 'vocab_projector.bias', 'vocab_transform.bias', 'vocab_layer_norm.bias']
- This IS expected if you are initializing DistilBertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of DistilBertForTokenClassification were not initialized from the model checkpoint at distilbert/distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You s

Unnamed: 0,entity_group,score,word,start,end
0,LABEL_0,0.549595,"dear amazon, last week i ordered an",0,35
1,LABEL_1,0.600996,optimus prime,36,49
2,LABEL_0,0.620313,action figure,50,63
3,LABEL_1,0.509519,from,64,68
4,LABEL_0,0.594764,your online store in germany,69,97
5,LABEL_1,0.502908,.,97,98
6,LABEL_0,0.556446,"unfortunately, when i opened the package, i di...",99,177
7,LABEL_1,0.505895,been,178,182
8,LABEL_0,0.573529,sent an action figure of,183,207
9,LABEL_1,0.568135,megatron,208,216


**DJH:** The default model is more useful. Its classes are more meaningful (ORG, LOC, PER, MISC rather than the generic LABEL_0 and LABEL_1) and its predictions typically have higher confidence scores.
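
To put a number on the confidence comparison, the snippet below averages the `score` column of the two NER tables above. It uses only the ten rows shown for each model, so treat it as a rough check rather than a proper evaluation.

```python
from statistics import mean

# Scores copied from the two NER tables above (first ten rows of each).
default_scores = [0.879011, 0.990859, 0.999755, 0.556571, 0.590255,
                  0.669692, 0.498349, 0.775362, 0.987854, 0.812096]
base_scores = [0.549595, 0.600996, 0.620313, 0.509519, 0.594764,
               0.502908, 0.556446, 0.505895, 0.573529, 0.568135]

print(f"default model mean score: {mean(default_scores):.3f}")
print(f"base checkpoint mean score: {mean(base_scores):.3f}")
```

With only two labels, a score of 0.5 is what an uninformative classifier would produce, and the base checkpoint's scores all sit close to that value.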

### Question Answering 

In [9]:
reader = pipeline("question-answering")
question = "What does the customer want?"
outputs = reader(question=question, context=text)
pd.DataFrame([outputs])    

No model was supplied, defaulted to distilbert-base-cased-distilled-squad (https://huggingface.co/distilbert-base-cased-distilled-squad)


Unnamed: 0,score,start,end,answer
0,0.631292,335,358,an exchange of Megatron


In [13]:
question = "What is the customer's name?"
outputs = reader(question=question, context=text)
pd.DataFrame([outputs]) 

Unnamed: 0,score,start,end,answer
0,0.159335,502,511,Bumblebee
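
The `start` and `end` columns in the two tables above are character offsets into the context string, so the answer can be recovered by slicing. A small check, with the complaint text rebuilt as one string and the offsets copied from the outputs above (the variable name `answers` is just for illustration):

```python
# Same text as the notebook's `text` variable, joined into a single line.
text = (
    "Dear Amazon, last week I ordered an Optimus Prime action figure "
    "from your online store in Germany. Unfortunately, when I opened the package, "
    "I discovered to my horror that I had been sent an action figure of Megatron "
    "instead! As a lifelong enemy of the Decepticons, I hope you can understand my "
    "dilemma. To resolve the issue, I demand an exchange of Megatron for the "
    "Optimus Prime figure I ordered. Enclosed are copies of my records concerning "
    "this purchase. I expect to hear from you soon. Sincerely, Bumblebee."
)

# (start, end, answer) copied from the two question-answering outputs above.
answers = [
    (335, 358, "an exchange of Megatron"),
    (502, 511, "Bumblebee"),
]

for start, end, answer in answers:
    span = text[start:end]  # `end` is exclusive, like a normal Python slice
    print(repr(span), span == answer)
```

This is extractive question answering: the model selects a span of the context rather than writing new text.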


### Summarization

In [17]:
summarizer = pipeline("summarization")
outputs = summarizer(text, max_length=12, clean_up_tokenization_spaces=True)
print(outputs[0]['summary_text'])

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 (https://huggingface.co/sshleifer/distilbart-cnn-12-6)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Your min_length=56 must be inferior than your max_length=12.


 Bumblebee demands an exchange of Megatron


### Translation

In [8]:
translator = pipeline("translation_en_to_de", 
                      model="Helsinki-NLP/opus-mt-en-de")
outputs = translator(text, clean_up_tokenization_spaces=True, min_length=100)
print(outputs[0]['translation_text'])

Sehr geehrter Amazon, letzte Woche habe ich eine Optimus Prime Action Figur aus
Ihrem Online-Shop in Deutschland bestellt. Leider, als ich das Paket öffnete,
entdeckte ich zu meinem Entsetzen, dass ich stattdessen eine Action Figur von
Megatron geschickt worden war! Als lebenslanger Feind der Decepticons, Ich
hoffe, Sie können mein Dilemma verstehen. Um das Problem zu lösen, Ich fordere
einen Austausch von Megatron für die Optimus Prime Figur habe ich bestellt.
Anbei sind Kopien meiner Aufzeichnungen über diesen Kauf. Ich erwarte, bald von
Ihnen zu hören. Aufrichtig, Bumblebee.


### Text Generation

In [18]:
#hide
from transformers import set_seed
set_seed(42) # Set the seed to get reproducible results

In [19]:
generator = pipeline("text-generation")
response = "Dear Bumblebee, I am sorry to hear that your order was mixed up."
prompt = text + "\n\nCustomer service response:\n" + response
outputs = generator(prompt, max_length=200)
print(outputs[0]['generated_text'])

No model was supplied, defaulted to gpt2 (https://huggingface.co/gpt2)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Dear Amazon, last week I ordered an Optimus Prime action figure from your online store in Germany. Unfortunately, when I opened the package, I discovered to my horror that I had been sent an action figure of Megatron instead! As a lifelong enemy of the Decepticons, I hope you can understand my dilemma. To resolve the issue, I demand an exchange of Megatron for the Optimus Prime figure I ordered. Enclosed are copies of my records concerning this purchase. I expect to hear from you soon. Sincerely, Bumblebee.

Customer service response:
Dear Bumblebee, I am sorry to hear that your order was mixed up. I am satisfied with your purchase. I will gladly see your product and the packaging again before you go out buying at your shop! I'm a small business owner and have not been able to deliver my product to customer before and after an Amazon purchase, so I apologize.

Customer service response:

All of my orders arrived


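**DJH:** The default model is GPT-2 (the `gpt2` checkpoint), so it's unsurprising that it performed relatively poorly. Ideally we'd try Mistral or Llama 3 instead; the cell below tries `openai-community/roberta-large-openai-detector`.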

In [6]:
generator = pipeline("text-generation", model="openai-community/roberta-large-openai-detector")
response = "Dear Bumblebee, I am sorry to hear that your order was mixed up."
prompt = text + "\n\nCustomer service response:\n" + response
outputs = generator(prompt, max_length=200)
print(outputs[0]['generated_text'])

Dear Amazon, last week I ordered an Optimus Prime action figure from your online
store in Germany. Unfortunately, when I opened the package, I discovered to my
horror that I had been sent an action figure of Megatron instead! As a lifelong
enemy of the Decepticons, I hope you can understand my dilemma. To resolve the
issue, I demand an exchange of Megatron for the Optimus Prime figure I ordered.
Enclosed are copies of my records concerning this purchase. I expect to hear
from you soon. Sincerely, Bumblebee.

Customer service response:
Dear Bumblebee, I am sorry to hear that your order was mixed up. robbed robbed
robbed robbed robbed robbed robbed robbed robbed robbed robbed smartest smartest
robbed smartestIndividual robbedIndividualIndividual smartestIndividual
Individual Individualurs Individualurs dispers smartest smartesturs smartest
Brend smartest Brend smartest Brend smartest Brend smartest Brend smartest Brend
smartest Brend smartest Brend smartest Brend smartest Brend smartest 

**DJH:** Note: It appears that authentication with Hugging Face is required for some published models (gated models). For example, attempting to download `mistralai/Mistral-7B-v0.1` fails with an authentication error message. I would expect Mistral and the recently released Llama 3 to perform much, much better at this text generation task. The model `openai-community/roberta-large-openai-detector` produces a completely unusable email body, which is not surprising in hindsight: it is a RoBERTa classifier fine-tuned to detect GPT-2 output, not a text generation model.

Here's what llama3 can produce using ollama/webui:

```
Dear Bumblebee,
 
Thank you for reaching out to us about the issue with your action figure order. We apologize for the mistake and understand how disappointing it must be for a lifelong enemy of the Decepticons like yourself.
 
We have located your original order and are happy to assist you in resolving this matter. However, we do need to clarify that Megatron is actually a popular character among many Transformers fans, and we're sure he's got his own loyal following out there!
 
That being said, we are more than happy to exchange the Megatron action figure for the Optimus Prime one you originally ordered. Please allow us 3-5 business days to process the return and send out the correct item.
 
As a token of our apology and appreciation for your understanding, we would like to offer you a complimentary gift with your replacement order. We will be sending you a set of exclusive Transformers trading cards featuring both Optimus Prime and Megatron!
 
Thank you again for bringing this issue to our attention, and we look forward to getting the correct action figure into your hands soon.
 
Best regards,
The Amazon Customer Service Team
```



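**DJH:** That's a 10/10 for Llama 3 and 3/10 for GPT-2. Llama 3 is a new 8B-parameter model, whereas the default `gpt2` checkpoint is the smallest GPT-2 variant at roughly 124M parameters (the 1.5B figure belongs to the largest variant, `gpt2-xl`).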
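
As a sanity check on the model sizes, the parameter count of a GPT-2 style model can be estimated from its published configuration. The sketch below assumes the standard GPT-2 layout (learned token and position embeddings, a fused QKV projection, a 4x-wide MLP, biases everywhere, and an output layer tied to the token embeddings). The function name `gpt2_param_count` is made up for this example.

```python
def gpt2_param_count(n_layer, d_model, vocab_size=50257, n_positions=1024):
    """Count weights and biases of a GPT-2 style decoder with tied output embeddings."""
    embeddings = vocab_size * d_model + n_positions * d_model  # token + position tables
    attention = (3 * d_model * d_model + 3 * d_model) + (d_model * d_model + d_model)  # QKV + output projection
    mlp = (4 * d_model * d_model + 4 * d_model) + (4 * d_model * d_model + d_model)    # up + down projection
    layer_norms = 2 * (2 * d_model)  # two LayerNorms per block, weight + bias each
    final_norm = 2 * d_model
    return embeddings + n_layer * (attention + mlp + layer_norms) + final_norm

# Published configurations: `gpt2` has 12 layers of width 768, `gpt2-xl` has 48 layers of width 1600.
print(f"gpt2:    {gpt2_param_count(12, 768):,}")
print(f"gpt2-xl: {gpt2_param_count(48, 1600):,}")
```

The `gpt2` checkpoint that the pipeline loads by default therefore sits at about 124M parameters, so the size gap to an 8B model is closer to 64x than to 5x.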

## The Hugging Face Ecosystem

<img alt="ecosystem" width="500" caption="An overview of the Hugging Face ecosystem of libraries and the Hub." src="images/chapter01_hf-ecosystem.png" id="ecosystem"/>

### The Hugging Face Hub

<img alt="hub-overview" width="1000" caption="The models page of the Hugging Face Hub, showing filters on the left and a list of models on the right." src="images/chapter01_hub-overview.png" id="hub-overview"/> 

<img alt="hub-model-card" width="1000" caption="An example model card from the Hugging Face Hub. The inference widget is shown on the right, where you can interact with the model." src="images/chapter01_hub-model-card.png" id="hub-model-card"/> 

### Hugging Face Tokenizers

### Hugging Face Datasets

### Hugging Face Accelerate

## Main Challenges with Transformers

## Conclusion