# Transformers
## An Intro:
Transformers were introduced in 2017 in the landmark paper "Attention Is All You Need". The Transformer architecture addresses a long-standing problem in NLP: processing sequential data.
Traditionally, RNNs were the go-to models, but because they process tokens one at a time they struggle to capture long-range dependencies in text. Transformers rely on a mechanism known as self-attention, which lets them weigh the importance of every word in a sequence relative to every other word. Self-attention also allows the computation to be parallelized, which makes Transformers more efficient to train than RNNs.
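
To make the idea of self-attention a little more concrete, here is a minimal NumPy sketch of scaled dot-product self-attention over a single sequence. The sizes, the random inputs, and the function names are assumptions chosen for illustration; a real Transformer adds multiple heads, learned weights, and further layers around this step.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum before exponentiating for numerical stability.
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over one sequence.

    x: (seq_len, d_model) token embeddings
    w_q, w_k, w_v: (d_model, d_k) projection matrices
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # Every token is compared with every other token in one matrix product,
    # so no position has to wait for the previous one to be processed.
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)   # (seq_len, seq_len), each row sums to 1
    return weights @ v, weights

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 5, 8, 4          # toy sizes, not taken from any real model
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))

output, weights = self_attention(x, w_q, w_k, w_v)
print(output.shape)    # (5, 4)
print(weights.shape)   # (5, 5)
```

Row `i` of `weights` says how much token `i` attends to each token in the sequence, which is the "importance of different words" mentioned above.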

In [1]:
# Uncomment and run this cell if you're on Colab
!git clone https://github.com/nlp-with-transformers/notebooks.git
%cd notebooks
from install import *
install_requirements()

Cloning into 'notebooks'...
remote: Enumerating objects: 526, done.
remote: Counting objects: 100% (173/173), done.
remote: Compressing objects: 100% (47/47), done.
remote: Total 526 (delta 143), reused 135 (delta 126), pack-reused 353
Receiving objects: 100% (526/526), 28.62 MiB | 18.21 MiB/s, done.
Resolving deltas: 100% (250/250), done.
/content/notebooks
⏳ Installing base requirements ...
✅ Base requirements installed!
⏳ Installing Git LFS ...
✅ Git LFS installed!


In [2]:
#hide
from utils import *
setup_chapter()

Using transformers v4.16.2
Using datasets v1.16.1


## Transformers Timeline:
1. ULMFiT: Showed that transfer learning works for NLP: an LSTM language model pretrained on a large corpus can be fine-tuned into a state-of-the-art classifier with very little labelled data.
2. GPT: Uses only the decoder part of the Transformer. It was pretrained on BookCorpus, which contains about 7,000 unpublished books.
3. BERT: Uses only the encoder part, trained with masked language modelling (MLM). It was pretrained on BookCorpus and English Wikipedia.

<img alt="transformer-timeline" caption="The transformers timeline" src="https://github.com/nlp-with-transformers/notebooks/blob/main/images/chapter01_timeline.png?raw=1" id="transformer-timeline"/>

## The Encoder-Decoder Framework
It consists of two main components: an encoder and a decoder.

1. **Encoder:**
   - The encoder is responsible for processing the input sequence and capturing its essential information.
   - It converts the input sequence into a fixed-dimensional representation or context vector.
   - Commonly, recurrent neural networks (RNNs), long short-term memory networks (LSTMs), or transformers are employed as encoder architectures.

2. **Decoder:**
   - The decoder takes the context vector produced by the encoder and generates the output sequence.
   - It processes the context vector and generates the output step by step, often autoregressively.
   - Like the encoder, RNNs, LSTMs, or transformers can be used as decoder architectures.

3. **Sequence-to-Sequence Tasks:**
   - The encoder-decoder framework is widely used for sequence-to-sequence tasks, where the input and output sequences can have different lengths.
   - Examples of such tasks include machine translation, text summarization, and speech-to-text conversion.

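Disadvantages of RNN-based encoder-decoders:
1. The single fixed-size context vector creates an information bottleneck, especially for long inputs.
2. RNNs are inherently sequential, so the computation cannot be parallelized across the time steps of a sequence.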
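
Both disadvantages can be seen in a small sketch of an RNN encoder. The code below is a toy NumPy example with random weights and made-up sizes, not a trained model: the loop has to run step by step, and a 3-token input and a 50-token input end up in a context vector of exactly the same size.

```python
import numpy as np

def rnn_encode(inputs, w_x, w_h):
    """Run a plain tanh RNN over a sequence and return its last hidden state."""
    hidden = np.zeros(w_h.shape[0])
    # Each step needs the hidden state of the previous step,
    # so the loop cannot be parallelized across time steps.
    for x_t in inputs:
        hidden = np.tanh(w_x @ x_t + w_h @ hidden)
    return hidden

rng = np.random.default_rng(0)
d_in, d_hidden = 6, 4                      # toy sizes chosen for illustration
w_x = rng.normal(size=(d_hidden, d_in))
w_h = rng.normal(size=(d_hidden, d_hidden))

short_seq = rng.normal(size=(3, d_in))     # 3 tokens
long_seq = rng.normal(size=(50, d_in))     # 50 tokens

short_ctx = rnn_encode(short_seq, w_x, w_h)
long_ctx = rnn_encode(long_seq, w_x, w_h)

# Both sequences are squeezed into a vector of the same size.
print(short_ctx.shape)   # (4,)
print(long_ctx.shape)    # (4,)
```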


<img alt="rnn" caption="Unrolling an RNN in time." src="https://github.com/nlp-with-transformers/notebooks/blob/main/images/chapter01_rnn.png?raw=1" id="rnn"/>

<img alt="enc-dec" caption="Encoder-decoder architecture with a pair of RNNs. In general, there are many more recurrent layers than those shown." src="https://github.com/nlp-with-transformers/notebooks/blob/main/images/chapter01_enc-dec.png?raw=1" id="enc-dec"/>

## Attention Mechanisms:
To improve the model's ability to handle long sequences and capture relevant information, attention mechanisms are often integrated into the encoder-decoder architecture.
Attention allows the model to focus on different parts of the input sequence dynamically during the decoding process.
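
As a rough sketch of this idea, the decoder can score every encoder hidden state against its current state and build a context vector as a weighted average. The example below uses a plain dot product as the scoring function and random toy vectors; both are simplifying assumptions, since a real model learns its states and often its scoring function too.

```python
import numpy as np

def attend(decoder_state, encoder_states):
    """Dot-product attention of one decoder state over all encoder states."""
    scores = encoder_states @ decoder_state     # one score per input token
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                    # softmax over input positions
    context = weights @ encoder_states          # weighted average of encoder states
    return context, weights

rng = np.random.default_rng(0)
encoder_states = rng.normal(size=(6, 4))   # 6 input tokens, hidden size 4 (toy values)
decoder_state = rng.normal(size=4)

context, weights = attend(decoder_state, encoder_states)
print(weights.round(3))   # how strongly this decoding step looks at each input token
print(context.shape)      # (4,)
```

Because the weights are recomputed at every decoding step, the decoder is no longer limited to a single fixed summary of the input.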

<img alt="enc-dec-attn" caption="Encoder-decoder architecture with an attention mechanism for a pair of RNNs." src="https://github.com/nlp-with-transformers/notebooks/blob/main/images/chapter01_enc-dec-attn.png?raw=1" id="enc-dec-attn"/>

<img alt="transformer-self-attn" caption="Encoder-decoder architecture of the original Transformer." src="https://github.com/nlp-with-transformers/notebooks/blob/main/images/chapter01_self-attention.png?raw=1" id="transformer-self-attn"/>

## Transfer Learning in NLP:
In NLP, we rarely have access to a large amount of labelled text for every language or every task. Transfer learning lets us fine-tune a pretrained model on our specific task instead of training from scratch.
Architecturally, this involves splitting the model into a body and a head, where the head is a task-specific network. During pretraining, the weights of the body learn broad features of the source domain, and these weights are then used to initialise a new model for the new task.
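
The body/head split can be sketched with a toy NumPy model. Everything here is hypothetical (the `TinyModel` class, the sizes, and the random "pretrained" weights); the point is only that the new model reuses the body weights and gets a freshly initialised head sized for the new task.

```python
import numpy as np

class TinyModel:
    """A toy model made of a shared body and a task-specific head."""

    def __init__(self, body_weights, num_labels, rng):
        self.body = body_weights                                          # reused features
        self.head = rng.normal(size=(body_weights.shape[1], num_labels))  # fresh head

    def forward(self, x):
        features = np.tanh(x @ self.body)    # body: general-purpose representation
        return features @ self.head          # head: task-specific scores

rng = np.random.default_rng(0)
d_in, d_hidden = 8, 16                       # toy sizes chosen for illustration

# Pretend these body weights came from training on a large source task.
pretrained_body = rng.normal(size=(d_in, d_hidden))

source_model = TinyModel(pretrained_body, num_labels=10, rng=rng)
# New task: initialise the body from the pretrained weights and attach a new head.
target_model = TinyModel(pretrained_body.copy(), num_labels=2, rng=rng)

x = rng.normal(size=(3, d_in))
source_scores = source_model.forward(x)
target_scores = target_model.forward(x)
print(source_scores.shape)   # (3, 10)
print(target_scores.shape)   # (3, 2)
print(np.array_equal(source_model.body, target_model.body))   # True
```

During fine-tuning, the head (and usually the body as well) would then be trained on the labelled data of the new task.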

<img alt="transfer-learning" caption="Comparison of traditional supervised learning (left) and transfer learning (right)." src="https://github.com/nlp-with-transformers/notebooks/blob/main/images/chapter01_transfer-learning.png?raw=1" id="transfer-learning"/>  

<img alt="ulmfit" width="500" caption="The ULMFiT process (courtesy of Jeremy Howard)." src="https://github.com/nlp-with-transformers/notebooks/blob/main/images/chapter01_ulmfit.png?raw=1" id="ulmfit"/>

Step 1: Pretraining
Train a language model on a large general text source such as Wikipedia.

Step 2: Domain adaptation
Adapt the language model to the in-domain corpus, for example from Wikipedia to the IMDb corpus.

Step 3: Fine-tuning
Fine-tune the language model with a classification layer for the specific task.

## Hugging Face Ecosystem:
Applying a new model architecture to a task usually involves the following steps, which the Hugging Face libraries standardise:
1. Implement the model architecture in code.
2. Load the pretrained weights.
3. Preprocess the inputs, pass them through the model, and apply some task-specific postprocessing.
4. Implement dataloaders and define the loss function and optimiser to train the model.



## A Tour of Transformer Applications

In [3]:
text = """Dear Amazon, last week I ordered an Optimus Prime action figure \
from your online store in Germany. Unfortunately, when I opened the package, \
I discovered to my horror that I had been sent an action figure of Megatron \
instead! As a lifelong enemy of the Decepticons, I hope you can understand my \
dilemma. To resolve the issue, I demand an exchange of Megatron for the \
Optimus Prime figure I ordered. Enclosed are copies of my records concerning \
this purchase. I expect to hear from you soon. Sincerely, Bumblebee."""

### Text Classification

In [4]:
#hide_output
from transformers import pipeline

classifier = pipeline("text-classification")

In [5]:
import pandas as pd

outputs = classifier(text)
pd.DataFrame(outputs)

| | label | score |
|---|---|---|
| 0 | NEGATIVE | 0.901546 |


In [6]:
# Let's look at a few more examples
# Example 1: Positive sentiment
text1 = "I absolutely loved the movie, it was so heartwarming and entertaining!"

# Example 2: Negative sentiment
text2 = "The service at the restaurant was terrible, and the food was cold."

# Example 3: Neutral sentiment
text3 = "The weather today is quite mild, not too hot nor too cold."

# Classify sentiments for each example
outputs1 = classifier(text1)
outputs2 = classifier(text2)
outputs3 = classifier(text3)

# Create a DataFrame to display the results
df = pd.DataFrame([outputs1[0], outputs2[0], outputs3[0]])

# Print the DataFrame
print(df)


      label     score
0  POSITIVE  0.999891
1  NEGATIVE  0.999731
2  POSITIVE  0.993080


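The neutral comment was classified as POSITIVE with high confidence. The model used here only outputs the labels POSITIVE and NEGATIVE, so it has no way to express a neutral sentiment.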
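
One way to see why a two-label classifier cannot stay neutral is to look at how a score is derived from raw model outputs. The sketch below assumes the score is a softmax over two logits, and the logit values are made up for illustration rather than taken from the real model.

```python
import math

def softmax(logits):
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

labels = ["NEGATIVE", "POSITIVE"]
hypothetical_logits = [-2.5, 2.5]   # made-up values, not real model outputs

probs = softmax(hypothetical_logits)
best = max(range(len(labels)), key=lambda i: probs[i])
print({"label": labels[best], "score": round(probs[best], 4)})
```

The probabilities always sum to 1 across the two labels, so one of them has to win even when the text carries no clear sentiment.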

### Named Entity Recognition:
In NLP, real-world objects such as products, places, and people are called named entities, and extracting them from text is called named entity recognition (NER).

In [7]:
ner_tagger = pipeline("ner", aggregation_strategy="simple")
outputs = ner_tagger(text)
pd.DataFrame(outputs)

| | entity_group | score | word | start | end |
|---|---|---|---|---|---|
| 0 | ORG | 0.87901 | Amazon | 5 | 11 |
| 1 | MISC | 0.990859 | Optimus Prime | 36 | 49 |
| 2 | LOC | 0.999755 | Germany | 90 | 97 |
| 3 | MISC | 0.556571 | Mega | 208 | 212 |
| 4 | PER | 0.590256 | ##tron | 212 | 216 |
| 5 | ORG | 0.669692 | Decept | 253 | 259 |
| 6 | MISC | 0.498349 | ##icons | 259 | 264 |
| 7 | MISC | 0.775362 | Megatron | 350 | 358 |
| 8 | MISC | 0.987854 | Optimus Prime | 367 | 380 |
| 9 | PER | 0.812096 | Bumblebee | 502 | 511 |
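
In the output above, "Megatron" and "Decepticons" were split into pieces such as `Mega` and `##tron`, and the pieces were given different entity groups. The helper below is a hypothetical post-processing sketch, not part of the `transformers` API: it glues a `##` continuation piece onto the preceding entity when their character offsets touch. Keeping the first piece's label and the lower of the two scores is a simplifying choice made for this example.

```python
def merge_word_pieces(entities):
    """Join '##' continuation pieces onto the entity that precedes them."""
    merged = []
    for ent in entities:
        is_continuation = (
            ent["word"].startswith("##")
            and len(merged) > 0
            and merged[-1]["end"] == ent["start"]
        )
        if is_continuation:
            prev = merged[-1]
            prev["word"] += ent["word"][2:]                    # drop the '##' marker
            prev["end"] = ent["end"]
            prev["score"] = min(prev["score"], ent["score"])   # stay conservative
        else:
            merged.append(dict(ent))                           # copy, keep input intact
    return merged

# Rows 3 to 6 of the NER output above.
entities = [
    {"entity_group": "MISC", "score": 0.556571, "word": "Mega", "start": 208, "end": 212},
    {"entity_group": "PER", "score": 0.590256, "word": "##tron", "start": 212, "end": 216},
    {"entity_group": "ORG", "score": 0.669692, "word": "Decept", "start": 253, "end": 259},
    {"entity_group": "MISC", "score": 0.498349, "word": "##icons", "start": 259, "end": 264},
]

merged = merge_word_pieces(entities)
for ent in merged:
    print(ent["word"], ent["entity_group"], ent["start"], ent["end"])
```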


### Question Answering:
In addition to the question, we pass an argument called `context`, which is the text from which the answer should be extracted. The pipeline also returns `start` and `end` integers that correspond to the character indices of the answer span in the context. This type of question answering is called extractive question answering.
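
The meaning of `start` and `end` can be illustrated without a model. In the sketch below the offsets are located with `str.find` on a shortened context purely for illustration; a real question-answering model predicts them. What matters is that slicing the context with the two indices gives back the answer.

```python
context = "I demand an exchange of Megatron for the Optimus Prime figure I ordered."
answer = "an exchange of Megatron"

# A real pipeline predicts these offsets; here they are looked up directly
# to show what the numbers mean.
start = context.find(answer)
end = start + len(answer)

result = {"start": start, "end": end, "answer": context[start:end]}
print(result)
```

Since the answer is always a slice of the context, an extractive model cannot produce words that do not appear in the text.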

In [8]:
reader = pipeline("question-answering")
question = "What does the customer want?"
outputs = reader(question=question, context=text)
pd.DataFrame([outputs])

| | score | start | end | answer |
|---|---|---|---|---|
| 0 | 0.631292 | 335 | 358 | an exchange of Megatron |


The model gave the right answer. Let's try another question.

In [13]:
question = "Why is the customer upset?"
outputs = reader(question=question, context=text)
pd.DataFrame([outputs])

| | score | start | end | answer |
|---|---|---|---|---|
| 0 | 0.085511 | 231 | 264 | lifelong enemy of the Decepticons |


This answer is less convincing, and the model's confidence score is low.

### Summarization:
Summarization condenses a long text into a shorter one.

In [9]:
summarizer = pipeline("summarization")
outputs = summarizer(text, max_length=45, clean_up_tokenization_spaces=True)
print(outputs[0]['summary_text'])

 Bumblebee ordered an Optimus Prime action figure from your online store in
Germany. Unfortunately, when I opened the package, I discovered to my horror
that I had been sent an action figure of Megatron instead.


### Translation:
Translation converts text from one language to another. Here the model is specified explicitly through the `model` argument.

In [10]:
translator = pipeline("translation_en_to_de",
                      model="Helsinki-NLP/opus-mt-en-de")
outputs = translator(text, clean_up_tokenization_spaces=True, min_length=100)
print(outputs[0]['translation_text'])

Sehr geehrter Amazon, letzte Woche habe ich eine Optimus Prime Action Figur aus
Ihrem Online-Shop in Deutschland bestellt. Leider, als ich das Paket öffnete,
entdeckte ich zu meinem Entsetzen, dass ich stattdessen eine Action Figur von
Megatron geschickt worden war! Als lebenslanger Feind der Decepticons, Ich
hoffe, Sie können mein Dilemma verstehen. Um das Problem zu lösen, Ich fordere
einen Austausch von Megatron für die Optimus Prime Figur habe ich bestellt.
Anbei sind Kopien meiner Aufzeichnungen über diesen Kauf. Ich erwarte, bald von
Ihnen zu hören. Aufrichtig, Bumblebee.


### Text Generation:
Text generation can be used in applications such as customer service, for example to draft a reply to a complaint.

In [11]:
#hide
from transformers import set_seed
set_seed(42) # Set the seed to get reproducible results

In [12]:
generator = pipeline("text-generation")
response = "Dear Bumblebee, I am sorry to hear that your order was mixed up."
prompt = text + "\n\nCustomer service response:\n" + response
outputs = generator(prompt, max_length=200)
print(outputs[0]['generated_text'])

Dear Amazon, last week I ordered an Optimus Prime action figure from your online
store in Germany. Unfortunately, when I opened the package, I discovered to my
horror that I had been sent an action figure of Megatron instead! As a lifelong
enemy of the Decepticons, I hope you can understand my dilemma. To resolve the
issue, I demand an exchange of Megatron for the Optimus Prime figure I ordered.
Enclosed are copies of my records concerning this purchase. I expect to hear
from you soon. Sincerely, Bumblebee.

Customer service response:
Dear Bumblebee, I am sorry to hear that your order was mixed up. I am satisfied
with your purchase. I will gladly see your product and the packaging again
before you go out buying at your shop! I'm a small business owner and have not
been able to deliver my product to customer before and after an Amazon purchase,
so I apologize.

Customer service response:

All of my orders arrived


## The Hugging Face Ecosystem

<img alt="ecosystem" width="500" caption="An overview of the Hugging Face ecosystem of libraries and the Hub." src="https://github.com/nlp-with-transformers/notebooks/blob/main/images/chapter01_hf-ecosystem.png?raw=1" id="ecosystem"/>

## Main Challenges with Transformers:
1. Language: It is harder to find pretrained models for rare and low-resource languages.
2. Data availability: Transfer learning reduces the amount of labelled data needed, but a task can still require a lot of it.
3. Working with long documents: Self-attention becomes expensive on longer texts such as whole documents.
4. Opacity: It is hard to explain why a model made a particular prediction, which makes it difficult to deploy for important decisions.
5. Bias: Because the models are pretrained on text from the internet, they pick up the biases present in that data, and removing them is difficult.
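
To give a feel for the long-document challenge, the sketch below counts the attention scores that standard self-attention computes for a single head in a single layer. The sequence lengths are arbitrary examples, and the helper name is made up for this illustration.

```python
def attention_matrix_entries(seq_len):
    # Self-attention compares every token with every token,
    # which gives a seq_len x seq_len matrix of scores.
    return seq_len * seq_len

for seq_len in (512, 4096, 32768):
    print(seq_len, attention_matrix_entries(seq_len))
```

Making the input 8 times longer multiplies the number of scores by 64, which is why whole documents are so much more costly than short passages.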