## Text Generation using GPT-2

### Attention Is All You Need ⚠️

In the previous notebook we used a basic RNN-based network to generate text. For **sequence-to-sequence** tasks, an **Encoder-Decoder** architecture is the typical go-to choice for better performance.

<img src="illustrations/encoder_decoder.png">

Let us consider the case of machine translation, e.g. translating English to Spanish (or any other language pair). In a typical **Encoder-Decoder** architecture, the _Encoder_ takes the English text as input and prepares a condensed vector representation of the whole input, typically termed _bottleneck_ features. The _Decoder_ then uses these features to generate the translated text in Spanish.

While this architecture and its variants worked wonders, they had issues, such as an inability to handle longer input sequences and cases where there is no one-to-one mapping between the input and output languages.


To handle these issues, Vaswani et al., in their now famously titled paper [Attention Is All You Need](https://arxiv.org/abs/1706.03762), built upon the concept of attention. Attention was introduced earlier in the work of [Bahdanau et al.](https://arxiv.org/abs/1409.0473) to handle the task of machine translation. The main highlight of the Vaswani et al. paper was the **Transformer** architecture. **Transformers** were shown to achieve state-of-the-art results on multiple benchmarks without using any recurrent or convolutional components.
<img src="illustrations/transformer.png" width="400">


The concept of **Attention** is a simple yet important one. In layman's terms, it helps the model focus not just on the current input but also on specific pieces of information from the past. This results in models that can handle long-range dependencies as well as scenarios where there is no one-to-one mapping between inputs and outputs. The following is a sample illustration from the paper demonstrating the focus/attention of the model on the words when _making_ is the input.

<img src="illustrations/attention.png" width="400">
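
To make the idea concrete, here is a minimal NumPy sketch of the scaled dot-product attention described in the paper, `softmax(Q Kᵀ / sqrt(d_k)) V`. The function names and the random toy input are assumptions made for illustration. A real transformer also applies learned projections and uses multiple heads, which this sketch leaves out.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row-wise max for numerical stability before exponentiating.
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)

def scaled_dot_product_attention(queries, keys, values):
    # Similarity of every query with every key, scaled by sqrt(d_k).
    d_k = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d_k)
    # Each row of `weights` says how much one token attends to every token.
    weights = softmax(scores, axis=-1)
    # The output is a weighted average of the value vectors.
    return weights @ values, weights

# Toy input (assumption): 4 tokens, each an 8-dimensional vector.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))

# Self-attention without learned projections: Q, K and V are all `x`.
output, weights = scaled_dot_product_attention(x, x, x)
print(output.shape, weights.shape)
```

Each row of `weights` sums to 1, so every output vector is a convex combination of the value vectors. This is the kind of per-token weighting visualised in the illustration.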

## Transformers

The transformer architecture presented by [Vaswani et al.](https://arxiv.org/abs/1706.03762) was just the beginning. With recurrent components no longer required, researchers went on to explore larger and more complex architectures. The following figure shows different architectures and their number of parameters:
<img src="illustrations/nlp_models.png">
[Source](https://miro.medium.com/max/2070/1*IFVX74cEe8U5D1GveL1uZA.png)

## Hugging Face 🤗

> On a mission to solve NLP,
one commit at a time.

As their tagline explains, Hugging Face is helping solve NLP problems. While the transformer revolution changed things for language-related tasks, using these models was not simple. With parameter counts running into the billions, they were out of reach for most researchers and application developers.

Hugging Face changed the scene by developing the ```transformers``` package, a one-stop shop for transformer-type architectures. The package provides standard interfaces for tasks such as ```tokenization```, ```decoding```, ```fine tuning```, ```vectorization``` and so on. It supports both ```pytorch``` and ```tensorflow``` backends for ease of use.


Let us get started with a quick hands-on session with Hugging Face ```transformers```.

### Install Transformers

In [1]:
!pip install transformers

Collecting transformers
  Downloading transformers-2.9.1-py3-none-any.whl (641 kB)
Collecting regex!=2019.12.17
  Downloading regex-2020.5.14.tar.gz (696 kB)
Collecting sentencepiece
  Downloading sentencepiece-0.1.90-cp37-cp37m-macosx_10_6_x86_64.whl (1.1 MB)
Collecting tokenizers==0.7.0
  Downloading tokenizers-0.7.0-cp37-cp37m-macosx_10_10_x86_64.whl (1.2 MB)
Collecting sacremoses
  Downloading sacremoses-0.0.43.tar.gz (883 kB)
Building wheels for collected packages: regex, sacremoses


### Import Libraries

In [12]:
import tensorflow as tf
import transformers
from numpy import random
from transformers import (TFGPT2LMHeadModel,
                          GPT2Tokenizer,
                          GPT2Config)

In [5]:
print("tf version={}".format(tf.__version__))
print("huggingface/transformer version={}".format(transformers.__version__))

tf version=2.0.0
huggingface/transformer version=2.9.1


### Model Setup

In [6]:
model_name = "gpt2-medium"
config = GPT2Config.from_pretrained(model_name)
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
model = TFGPT2LMHeadModel.from_pretrained(model_name, config=config)


### Generate Text

In [16]:
# encode context the generation is conditioned on
input_ids = tokenizer.encode('Watson you are', return_tensors='tf')
input_ids

<tf.Tensor: id=392285, shape=(1, 4), dtype=int32, numpy=array([[   54, 13506,   345,   389]], dtype=int32)>

#### Greedy Decoding

In [19]:
# generate text until the output length (which includes the context length) reaches 20
greedy_output = model.generate(input_ids, max_length=20)

print("Output:\n" + 110 * '-')
print(tokenizer.decode(greedy_output[0], skip_special_tokens=True))

Setting `pad_token_id` to 50256 (first `eos_token_id`) to generate sequence


Output:
--------------------------------------------------------------------------------------------------------------
Watson you are a great guy and I hope you are doing well. I am sorry to hear
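
Greedy decoding simply picks the highest-scoring next token at every step until a length limit or an end-of-sequence token is reached. The sketch below shows that loop on a toy model. The vocabulary, the `BIGRAM_LOGITS` table and the `greedy_decode` helper are all hypothetical stand-ins for illustration; this is not how `model.generate` is implemented internally.

```python
import numpy as np

# Hypothetical toy vocabulary and bigram scores (row = current token,
# column = candidate next token). A real model would compute these logits
# from the whole context instead of only the last token.
VOCAB = ["<eos>", "Watson", "you", "are", "right"]
BIGRAM_LOGITS = np.array([
    [5.0, 0.0, 0.0, 0.0, 0.0],  # after <eos>
    [0.0, 0.0, 3.0, 1.0, 0.5],  # after Watson
    [0.0, 0.0, 0.0, 3.0, 1.0],  # after you
    [0.5, 0.0, 0.0, 0.0, 3.0],  # after are
    [3.0, 0.0, 0.5, 0.0, 0.0],  # after right
])

def greedy_decode(input_ids, max_length, eos_id=0):
    ids = list(input_ids)
    # max_length counts the context tokens as well as the generated ones.
    while len(ids) < max_length:
        # Greedy step: always take the single most likely next token.
        next_id = int(np.argmax(BIGRAM_LOGITS[ids[-1]]))
        ids.append(next_id)
        if next_id == eos_id:
            break
    return ids

ids = greedy_decode([1], max_length=10)
print(" ".join(VOCAB[i] for i in ids))
```

Because every step is an `argmax`, the same context always produces the same continuation.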


#### Sampled Decoding

In [18]:
tf.random.set_seed(0)

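# Combine decoding techniques: sample from the top-k (k=50) tokens,
# further restricted to the top-p (p=0.95) nucleus, and return 3 sequences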
sample_outputs = model.generate(
    input_ids,
    do_sample=True, 
    max_length=20, 
    top_k=50, 
    top_p=0.95, 
    num_return_sequences=3
)

print("Output:\n" + 110 * '-')
for i, sample_output in enumerate(sample_outputs):
    print("{}: {}".format(i, tokenizer.decode(sample_output, skip_special_tokens=True)))
    print("-"*110)

Setting `pad_token_id` to 50256 (first `eos_token_id`) to generate sequence


Output:
--------------------------------------------------------------------------------------------------------------
0: Watson you are in an accident?

Pam: Oh yes, and I would tell
--------------------------------------------------------------------------------------------------------------
1: Watson you are in a great spot if the team wants to sign your kid. He will make
--------------------------------------------------------------------------------------------------------------
2: Watson you are no longer allowed to leave your job but your job does not matter. He tells
--------------------------------------------------------------------------------------------------------------
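
The `top_k` and `top_p` arguments restrict which tokens can be sampled at each step: top-k keeps only the k most probable tokens, while top-p (nucleus sampling) keeps the smallest set of tokens whose cumulative probability reaches p. The following is a simplified NumPy sketch of that idea. The helper name `top_k_top_p_filter` and the toy distribution are assumptions, and the sketch applies both filters to a single probability vector, which may differ in detail from what the library does on logits.

```python
import numpy as np

def top_k_top_p_filter(probs, top_k=50, top_p=0.95):
    """Zero out tokens outside the top-k set and the top-p nucleus, then renormalize."""
    order = np.argsort(probs)[::-1]       # token ids, most probable first
    sorted_probs = probs[order]

    keep = np.ones_like(sorted_probs, dtype=bool)
    keep[top_k:] = False                  # top-k: drop everything past rank k

    # top-p: keep a token only if the mass *before* it is still below top_p,
    # i.e. keep the smallest prefix whose cumulative mass reaches top_p.
    cumulative = np.cumsum(sorted_probs)
    keep &= (cumulative - sorted_probs) < top_p

    filtered = np.zeros_like(probs)
    filtered[order[keep]] = sorted_probs[keep]
    return filtered / filtered.sum()

# Toy next-token distribution over a 5-token vocabulary (assumption).
probs = np.array([0.5, 0.3, 0.1, 0.05, 0.05])
filtered = top_k_top_p_filter(probs, top_k=3, top_p=0.75)
print(filtered)

# Sample the next token from the filtered distribution.
rng = np.random.default_rng(0)
token = int(rng.choice(len(probs), p=filtered))
print(token)
```

With these toy numbers only the two most probable tokens survive, so sampling stays varied while the unlikely tail is never chosen.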


## References

+ [Illustrated Transformer](http://jalammar.github.io/illustrated-transformer/)