<a href="https://colab.research.google.com/github/bhargavyagnik/text_generation/blob/main/text_generation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Text Generation using Transformers library

This work is based on the Hugging Face tutorial titled "How to generate text: using different decoding methods for language generation with Transformers" by [Patrick von Platen](https://huggingface.co/patrickvonplaten).

I highly recommend checking it out! It is detailed and very well explained.

In [None]:
!pip install -q git+https://github.com/huggingface/transformers.git
!pip install -q tensorflow==2.1

In [None]:
!pip3 install --upgrade tensorflow-gpu

In [1]:
import tensorflow as tf
from transformers import TFGPT2LMHeadModel, GPT2Tokenizer


tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# add the EOS token as PAD token to avoid warnings
model = TFGPT2LMHeadModel.from_pretrained("gpt2", pad_token_id=tokenizer.eos_token_id)

All model checkpoint layers were used when initializing TFGPT2LMHeadModel.

All the layers of TFGPT2LMHeadModel were initialized from the model checkpoint at gpt2.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFGPT2LMHeadModel for predictions without further training.


### **Greedy Search**

Greedy search simply selects the word with the highest probability as its next word: $w_t = \operatorname{argmax}_{w} P(w \mid w_{1:t-1})$ at each timestep $t$. The following sketch shows greedy search.

![Greedy Search](https://raw.githubusercontent.com/patrickvonplaten/scientific_images/master/greedy_search.png)

Starting from the word $\text{"The"}$, the algorithm 
greedily chooses the next word of highest probability $\text{"nice"}$ and so on, so that the final generated word sequence is $\text{"The", "nice", "woman"}$ having an overall probability of $0.5 \times 0.4 = 0.2$.
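
The walk through the sketch can be reproduced with a few lines of plain Python. This is only a toy illustration, not how `transformers` implements greedy search: the table `NEXT_WORD_PROBS` and the function `greedy_search` are hypothetical names, and every probability that is not quoted in the text is a placeholder chosen for illustration.

```python
# Toy next-word distributions for the sketch.
# Assumption: values not quoted in the text are illustrative placeholders.
NEXT_WORD_PROBS = {
    ("The",): {"nice": 0.5, "dog": 0.4, "car": 0.1},
    ("The", "nice"): {"woman": 0.4, "house": 0.3, "guy": 0.3},
    ("The", "dog"): {"has": 0.9, "runs": 0.05, "and": 0.05},
    ("The", "car"): {"is": 0.3, "drives": 0.5, "turns": 0.2},
}


def greedy_search(start, steps):
    """Extend `start` by always picking the single most probable next word."""
    sequence = tuple(start)
    probability = 1.0
    for _ in range(steps):
        candidates = NEXT_WORD_PROBS[sequence]
        # argmax over the conditional distribution P(w | w_{1:t-1})
        word = max(candidates, key=candidates.get)
        probability *= candidates[word]
        sequence += (word,)
    return sequence, probability


sequence, probability = greedy_search(("The",), steps=2)
print(sequence, round(probability, 2))
```

The sequence probability is the product of the conditional probabilities picked at each step, which is where the $0.5 \times 0.4 = 0.2$ above comes from.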

In the following, we will generate word sequences using GPT2 on the context $\text{"I am living in 2020"}$. Let's see how greedy search can be used in `transformers`:

In [6]:
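# encode the context the generation is conditioned on
input_ids = tokenizer.encode('I am living in 2020', return_tensors='tf')

# generate until the output (context included) is 50 tokens long
greedy_output = model.generate(input_ids, max_length=50)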

print("Output:\n" + 100 * '-')
print(tokenizer.decode(greedy_output[0], skip_special_tokens=True))

Output:
----------------------------------------------------------------------------------------------------
I am living in 2020. I am not going to be able to afford to buy a car. I am not going to be able to afford to buy a house. I am not going to be able to afford to buy a house. I am


In [8]:
greedy_output

<tf.Tensor: shape=(1, 50), dtype=int32, numpy=
array([[   40,   716,  2877,   287, 12131,    13,   314,   716,   407,
         1016,   284,   307,  1498,   284,  5368,   284,  2822,   257,
         1097,    13,   314,   716,   407,  1016,   284,   307,  1498,
          284,  5368,   284,  2822,   257,  2156,    13,   314,   716,
          407,  1016,   284,   307,  1498,   284,  5368,   284,  2822,
          257,  2156,    13,   314,   716]], dtype=int32)>

Alright! We have generated our first short text with GPT2 😊. The generated words following the context are reasonable, but the model quickly starts repeating itself! This is a very common problem in language generation in general and seems to be even more so in greedy and beam search - check out [Vijayakumar et al., 2016](https://arxiv.org/abs/1610.02424) and [Shao et al., 2017](https://arxiv.org/abs/1701.03185).

The major drawback of greedy search, though, is that it misses high-probability words hidden behind a low-probability word, as can be seen in our sketch above:

The word $\text{"has"}$ with its high conditional probability of $0.9$ is hidden behind the word $\text{"dog"}$, which has only the second-highest conditional probability, so that greedy search misses the word sequence $\text{"The"}, \text{"dog"}, \text{"has"}$.

Thankfully, we have beam search to alleviate this problem!


### **Beam search**

Beam search reduces the risk of missing hidden high-probability word sequences by keeping the `num_beams` most likely hypotheses at each time step and eventually choosing the hypothesis that has the overall highest probability. Let's illustrate with `num_beams=2`:

![Beam search](https://raw.githubusercontent.com/patrickvonplaten/scientific_images/master/beam_search.png)

At time step $1$, besides the most likely hypothesis $\text{"The", "nice"}$, beam search also keeps track of the second most likely one $\text{"The", "dog"}$. At time step $2$, beam search finds that the word sequence $\text{"The", "dog", "has"}$ has a higher probability ($0.4 \times 0.9 = 0.36$) than $\text{"The", "nice", "woman"}$, which has $0.2$. Great, it has found the most likely word sequence in our toy example!

Beam search will usually find an output sequence with a probability at least as high as that of greedy search, but it is not guaranteed to find the most likely output.
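
The toy example can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the `transformers` implementation: `NEXT_WORD_PROBS` and `beam_search` are hypothetical names, probabilities not quoted in the text are placeholders, and details such as EOS handling and length normalization are left out.

```python
# Toy next-word distributions for the sketch.
# Assumption: values not quoted in the text are illustrative placeholders.
NEXT_WORD_PROBS = {
    ("The",): {"nice": 0.5, "dog": 0.4, "car": 0.1},
    ("The", "nice"): {"woman": 0.4, "house": 0.3, "guy": 0.3},
    ("The", "dog"): {"has": 0.9, "runs": 0.05, "and": 0.05},
    ("The", "car"): {"is": 0.3, "drives": 0.5, "turns": 0.2},
}


def beam_search(start, steps, num_beams):
    """Keep the `num_beams` most probable hypotheses at every time step."""
    beams = [(tuple(start), 1.0)]
    for _ in range(steps):
        candidates = []
        for sequence, probability in beams:
            # expand every surviving hypothesis by every possible next word
            for word, p in NEXT_WORD_PROBS[sequence].items():
                candidates.append((sequence + (word,), probability * p))
        # prune: only the best `num_beams` hypotheses survive
        candidates.sort(key=lambda item: item[1], reverse=True)
        beams = candidates[:num_beams]
    return beams


best_sequence, best_probability = beam_search(("The",), steps=2, num_beams=2)[0]
print(best_sequence, round(best_probability, 2))

# With a single beam the procedure reduces to greedy search.
greedy_sequence, _ = beam_search(("The",), steps=2, num_beams=1)[0]
print(greedy_sequence)
```

Because the hypothesis starting with "dog" survives the first pruning step, its strong continuation "has" is still available at the second step.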

Let's see how beam search can be used in `transformers`. We set `num_beams > 1` and `early_stopping=True` so that generation is finished when all beam hypotheses have reached the EOS token.

In [12]:
# activate beam search and early_stopping
beam_output = model.generate(
    input_ids,  
    max_length=50, 
    num_beams=5, 
    early_stopping=True
)

print("Output:\n" + 100 * '-')
print(tokenizer.decode(beam_output[0], skip_special_tokens=True))

Output:
----------------------------------------------------------------------------------------------------
I am living in 2020."

"I am living in 2020."

"I am living in 2020."

"I am living in 2020."

"I am living in 2020."

"I am living in 2020


In [13]:
import time

# time beam search with two different beam widths
n, m = 3, 7
t = time.time()
beam_output = model.generate(
    input_ids,  
    max_length=50, 
    num_beams=n, 
    early_stopping=True
)

print("Output:\n" + 100 * '-')
print(tokenizer.decode(beam_output[0], skip_special_tokens=True))
print("Time to run with ",n," Beams ",time.time()-t)

t = time.time()
beam_output = model.generate(
    input_ids,  
    max_length=50, 
    num_beams=m, 
    early_stopping=True
)

print("Output:\n" + 100 * '-')
print(tokenizer.decode(beam_output[0], skip_special_tokens=True))
print("Time to run with ",m," Beams ",time.time()-t)


Output:
----------------------------------------------------------------------------------------------------
I am living in 2020, and I am not going to be able to afford to live in the future," he said.

"I am not going to be able to afford to live in the future. I am not going to be able
Time to run with  3  Beams  22.940566062927246
Output:
----------------------------------------------------------------------------------------------------
I am living in 2020," he said.

"I am living in 2020. I am living in 2020. I am living in 2020. I am living in 2020. I am living in 2020. I am living in 2020. I am
Time to run with  7  Beams  23.795392513275146


While the result is arguably more fluent, the output still includes repetitions of the same word sequences.  
A simple remedy is to introduce *n-gram* penalties (an *n-gram* is a sequence of $n$ words), as introduced by [Paulus et al. (2017)](https://arxiv.org/abs/1705.04304) and [Klein et al. (2017)](https://arxiv.org/abs/1701.02810). The most common *n-gram* penalty makes sure that no *n-gram* appears twice by manually setting the probability of next words that could create an already seen *n-gram* to $0$.
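
As a rough illustration of this penalty, the sketch below finds the next words that would complete an already seen *n-gram* and sets their probability to $0$. The helper names `banned_next_tokens` and `apply_ngram_penalty` are hypothetical, whole words stand in for GPT2's subword tokens, and the probabilities are made up; the actual implementation in `transformers` may differ.

```python
def banned_next_tokens(tokens, ngram_size):
    """Return the tokens that would complete an n-gram already present in `tokens`."""
    if ngram_size < 1 or len(tokens) < ngram_size:
        return set()
    # the last (ngram_size - 1) tokens form the start of the next n-gram
    prefix = tuple(tokens[len(tokens) - (ngram_size - 1):])
    banned = set()
    for i in range(len(tokens) - ngram_size + 1):
        ngram = tuple(tokens[i:i + ngram_size])
        if ngram[:-1] == prefix:
            banned.add(ngram[-1])
    return banned


def apply_ngram_penalty(next_word_probs, tokens, ngram_size):
    """Set the probability of every banned next word to 0 (no renormalization)."""
    banned = banned_next_tokens(tokens, ngram_size)
    return {word: 0.0 if word in banned else p for word, p in next_word_probs.items()}


# Words generated so far; the 2-gram ("am", "living") has already been seen.
tokens = ["I", "am", "living", "in", "2020", ".", "I", "am"]
# Hypothetical next-word distribution, for illustration only.
next_word_probs = {"living": 0.6, "not": 0.3, "happy": 0.1}

penalized = apply_ngram_penalty(next_word_probs, tokens, ngram_size=2)
print(banned_next_tokens(tokens, 2), penalized)
```

Here "living" is ruled out because "am living" already occurred, so the generation has to continue with a different word.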

Let's try it out by setting `no_repeat_ngram_size=2` so that no *2-gram* appears twice:

In [14]:
# set no_repeat_ngram_size to 2
beam_output = model.generate(
    input_ids, 
    max_length=50, 
    num_beams=5, 
    no_repeat_ngram_size=2, 
    early_stopping=True
)

print("Output:\n" + 100 * '-')
print(tokenizer.decode(beam_output[0], skip_special_tokens=True))

Output:
----------------------------------------------------------------------------------------------------
I am living in 2020, and I am not going to be able to afford to live in the future," he said.

"I don't know if I can afford it, but it's not a problem for me. It's a


Nice, that looks much better! We can see that the repetition does not appear anymore. Nevertheless, *n-gram* penalties have to be used with care. An article generated about the city *New York* should not use a *2-gram* penalty; otherwise, the name of the city would only appear once in the whole text!

Another important feature of beam search is that we can compare the top beams after generation and choose the generated beam that fits our purpose best.

In `transformers`, we simply set the parameter `num_return_sequences` to the number of highest scoring beams that should be returned. Make sure though that `num_return_sequences <= num_beams`!
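
The constraint follows from the fact that only `num_beams` hypotheses exist when generation ends. A minimal sketch of the selection step, assuming a hypothetical helper `select_top_beams` and made-up hypotheses with log-probability scores (this is not the library's code):

```python
import math


def select_top_beams(hypotheses, num_beams, num_return_sequences):
    """Return the `num_return_sequences` highest-scoring finished hypotheses."""
    if num_return_sequences > num_beams:
        # there are only `num_beams` hypotheses to choose from
        raise ValueError("num_return_sequences must be <= num_beams")
    ranked = sorted(hypotheses, key=lambda item: item[1], reverse=True)
    return ranked[:num_return_sequences]


# Hypothetical finished beams as (text, log-probability) pairs.
final_beams = [
    ("The nice woman", math.log(0.20)),
    ("The dog has", math.log(0.36)),
]

top = select_top_beams(final_beams, num_beams=2, num_return_sequences=2)
for rank, (text, score) in enumerate(top):
    print(rank, text, round(score, 3))
```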

In [15]:
# set num_return_sequences > 1
beam_outputs = model.generate(
    input_ids, 
    max_length=50, 
    num_beams=5, 
    no_repeat_ngram_size=2, 
    num_return_sequences=5, 
    early_stopping=True
)

# now we have 5 output sequences
print("Output:\n" + 100 * '-')
for i, beam_output in enumerate(beam_outputs):
  print("{}: {}".format(i, tokenizer.decode(beam_output, skip_special_tokens=True)))

Output:
----------------------------------------------------------------------------------------------------
0: I am living in 2020, and I am not going to be able to afford to live in the future," he said.

"I don't know if I can afford it, but it's not a problem for me. It's a
1: I am living in 2020, and I am not going to be able to afford to live in the future," he said.

"I don't know if I can afford it, but it's not a problem for me. I'm not
2: I am living in 2020, and I am not going to be able to afford to live in the future," he said.

"I don't know if I can afford it, but it's not a problem for me. It's just
3: I am living in 2020, and I am not going to be able to afford to live in the future," he said.

"I don't know if I can afford it, but it's not a problem for me. I'm going
4: I am living in 2020, and I am not going to be able to afford to live in the future," he said.

"I don't know if I can afford it, but it's not a problem for me. I have a
