<a href="https://colab.research.google.com/github/Kaif10/NLP-with-HuggingFace/blob/main/Text_generation_with_GPT2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

I have recently decided to explore the ins and outs of the 😊 Transformers library, and this is the next chapter in that journey. In this notebook, I will explore text generation using a GPT-2 model, which was trained to predict the next word on 40GB of Internet text. The creators initially withheld the fully trained model because they were concerned about 'malicious applications of the technology' and released smaller versions first for enthusiasts to play with; we will use one of those smaller versions here.

In this notebook, we will explore different decoding methods, including greedy search, beam search, Top-K sampling and Top-P sampling, and look at the text each one produces along the way.



## Intro
A language model is a machine learning model that can look at part of a sentence and predict the next word or sequence of words. Much like the autocomplete feature on your iPhone/Android, GPT-2 does next-word prediction, but on a much larger and more sophisticated scale. For reference, the smallest available GPT-2 has 117 million parameters, whereas the largest one has over 1.5 billion parameters. The gpt2-large model used in this notebook has about 774 million parameters, roughly half the size of the main GPT-2 model.

😊 Transformers makes it very easy to import this model with both PyTorch and TensorFlow - in this notebook we will be using TensorFlow. Both the model and its tokenizer can be imported from the transformers library, which you can install by running !pip install transformers. Let's see just how simple it is to generate text with a neural network.

In [2]:
!pip install transformers

Collecting transformers
  Downloading https://files.pythonhosted.org/packages/ed/db/98c3ea1a78190dac41c0127a063abf92bd01b4b0b6970a6db1c2f5b66fa0/transformers-4.0.1-py3-none-any.whl (1.4MB)

### We will use the large GPT-2 model (gpt2-large), but it is easy to load the other sizes if you want to experiment with them:

In [3]:
#get transformers
from transformers import TFGPT2LMHeadModel, GPT2Tokenizer

#get large GPT2 tokenizer and GPT2 model
tokenizer = GPT2Tokenizer.from_pretrained("gpt2-large")
GPT2 = TFGPT2LMHeadModel.from_pretrained("gpt2-large", pad_token_id=tokenizer.eos_token_id)

#tokenizer = GPT2Tokenizer.from_pretrained("gpt2-medium")
#GPT2 = TFGPT2LMHeadModel.from_pretrained("gpt2-medium", pad_token_id=tokenizer.eos_token_id)

#tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
#GPT2 = TFGPT2LMHeadModel.from_pretrained("gpt2", pad_token_id=tokenizer.eos_token_id)

#view model parameters
GPT2.summary()

All model checkpoint layers were used when initializing TFGPT2LMHeadModel.

All the layers of TFGPT2LMHeadModel were initialized from the model checkpoint at gpt2-large.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFGPT2LMHeadModel for predictions without further training.


Model: "tfgp_t2lm_head_model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
transformer (TFGPT2MainLayer multiple                  774030080 
Total params: 774,030,080
Trainable params: 774,030,080
Non-trainable params: 0
_________________________________________________________________


In [4]:
#This is our starting text and we will use GPT2 model to generate new ideas continuing the below sentence

input_sequence = "They both fell in love"

In [5]:
#for reproducibility
SEED = 34

#maximum number of tokens in output text (the prompt counts towards this limit)
MAX_LEN = 70

In [6]:
#get deep learning basics
import tensorflow as tf
tf.random.set_seed(SEED)

In [7]:
input_ids = tokenizer.encode(input_sequence, return_tensors='tf')

# generate text until the output length (which includes the context length) reaches MAX_LEN tokens
greedy_output = GPT2.generate(input_ids, max_length = MAX_LEN)

print("Output:\n" + 100 * '-')
print(tokenizer.decode(greedy_output[0], skip_special_tokens = True))

Output:
----------------------------------------------------------------------------------------------------
They both fell in love with the same girl, and they were both in love with the same guy. They both fell in love with the same girl, and they were both in love with the same guy. They both fell in love with the same girl, and they were both in love with the same guy. They both fell in love with the
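
With no sampling or beam arguments, the call above does greedy search: at every step it appends the single most likely next token. The sketch below is not GPT-2. It uses a hypothetical hand-written table of next-word probabilities (`NEXT_WORD_PROBS`, invented for illustration) to show why always picking the most likely word can trap the output in a loop like the one above.

```python
# Hypothetical next-word probabilities, invented purely for illustration.
NEXT_WORD_PROBS = {
    "they": {"fell": 0.6, "were": 0.4},
    "fell": {"in": 0.9, "down": 0.1},
    "in": {"love": 0.7, "time": 0.3},
    "love": {"and": 0.5, "with": 0.3, ".": 0.2},
    "and": {"they": 0.8, "then": 0.2},
}

def greedy_decode(start, max_len):
    """Repeatedly append the most probable next word until max_len words."""
    words = [start]
    while len(words) < max_len and words[-1] in NEXT_WORD_PROBS:
        candidates = NEXT_WORD_PROBS[words[-1]]
        words.append(max(candidates, key=candidates.get))
    return words

toy_greedy = greedy_decode("they", 11)
print(" ".join(toy_greedy))
# prints: they fell in love and they fell in love and they
```

Because the choice at each step is deterministic, once the sequence returns to a word it has already seen it repeats the same path forever.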


In [8]:
# beam search: set num_return_sequences > 1 to get several candidate outputs
beam_outputs = GPT2.generate(
    input_ids, 
    max_length = MAX_LEN, 
    num_beams = 5, 
    no_repeat_ngram_size = 2, 
    num_return_sequences = 5, 
    early_stopping = True
)

print('')
print("Output:\n" + 100 * '-')

# now we have 5 output sequences
for i, beam_output in enumerate(beam_outputs):
    print("{}: {}".format(i, tokenizer.decode(beam_output, skip_special_tokens=True)))


Output:
----------------------------------------------------------------------------------------------------
0: They both fell in love with each other, and they were married. They had two children, a boy and a girl.

"They were very happy together," she said. "They had a great life together."
1: They both fell in love with each other, and they were married. They had two children, a boy and a girl.

"They were very happy together," he said. "They had a great life together."
2: They both fell in love with each other, and they were married. They had two children, a boy and a girl.

"They were very happy together," she said. "They had a great life together."


The couple had been living together for about a year and were planning to move in together when they got the news that
3: They both fell in love with each other, and they were married. They had two children, a boy and a girl.

"They were very happy together," she said. "They had a great life together."


The couple had been living t
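
Beam search keeps the `num_beams` best partial sequences at each step instead of only one, and ranks them by their total probability. Here is a minimal sketch of that idea on a hypothetical toy table (`TOY_PROBS`, invented for illustration). It leaves out `no_repeat_ngram_size` and `early_stopping`, so it is not what `generate` does internally, only the core idea.

```python
import math

# Hypothetical next-word probabilities, invented purely for illustration.
TOY_PROBS = {
    "<s>": {"the": 0.5, "a": 0.4, "one": 0.1},
    "the": {"dog": 0.4, "cat": 0.3, "end": 0.3},
    "a": {"dog": 0.9, "cat": 0.1},
    "one": {"dog": 0.5, "cat": 0.5},
}

def beam_search(start, steps, num_beams):
    """Keep the num_beams highest-scoring partial sequences at every step."""
    beams = [([start], 0.0)]  # (words, cumulative log-probability)
    for _ in range(steps):
        candidates = []
        for words, score in beams:
            for word, prob in TOY_PROBS.get(words[-1], {}).items():
                candidates.append((words + [word], score + math.log(prob)))
        if not candidates:  # no beam can be extended any further
            break
        candidates.sort(key=lambda item: item[1], reverse=True)
        beams = candidates[:num_beams]
    return beams

greedy_like = beam_search("<s>", steps=2, num_beams=1)
wider = beam_search("<s>", steps=2, num_beams=2)
print(greedy_like[0][0], round(math.exp(greedy_like[0][1]), 2))
print(wider[0][0], round(math.exp(wider[0][1]), 2))
```

With one beam the search commits to "the" (0.5) and ends with a sequence of probability 0.2. With two beams it also keeps "a" alive and finds "a dog", whose probability is 0.36.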

In [9]:
# use temperature to decrease the sensitivity to low probability candidates
sample_output = GPT2.generate(
                             input_ids, 
                             do_sample = True, 
                             max_length = MAX_LEN, 
                             top_k = 0, 
                             temperature = 0.8
)

print("Output:\n" + 100 * '-')
print(tokenizer.decode(sample_output[0], skip_special_tokens = True))

Output:
----------------------------------------------------------------------------------------------------
They both fell in love. He gave her a hand-made book, while she brought him the book of poetry. He is the kind of man who reads the books of the poets, doesn't he? You know, a lot of the time it's for young girls. And i'm telling you, i don't have a right to give
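
Temperature divides the model's raw scores (logits) before they are turned into probabilities, so a value below 1 makes likely words even more likely. The sketch below shows this on three hypothetical logits (`toy_logits`, invented for illustration); the helper name `softmax_with_temperature` is ours, not part of the library.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

toy_logits = [2.0, 1.0, 0.0]  # hypothetical scores for three candidate words
for t in (1.0, 0.8, 0.5):
    probs = softmax_with_temperature(toy_logits, t)
    print(t, [round(p, 3) for p in probs])
```

As the temperature drops, the share of the top-scoring word grows and the low-probability words are sampled less often.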


### Top-K Sampling
In Top-K sampling, the top k most likely next words are selected and the entire probability mass is shifted to these k words. So instead of using temperature to increase the chances of high probability words occurring and decrease the chances of low probability words, we just remove low probability words altogether.
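
To make "shifting the probability mass" concrete, here is a minimal sketch on a hypothetical next-word distribution (`toy_probs`, invented for illustration). The helper `top_k_filter` is our own name; it keeps the k most likely words and rescales them so they sum to 1.

```python
def top_k_filter(probs, k):
    """Keep the k most probable words and renormalise their probabilities."""
    top = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]
    total = sum(p for _, p in top)
    return {word: p / total for word, p in top}

# Hypothetical next-word distribution, invented for illustration.
toy_probs = {"love": 0.5, "line": 0.2, "step": 0.15, "debt": 0.1, "lava": 0.05}
top2 = top_k_filter(toy_probs, k=2)
print(top2)
```

With k = 2 only "love" and "line" survive, and their probabilities are scaled up from 0.5 and 0.2 to roughly 0.71 and 0.29.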

We just need to set top_k to however many of the top words we want to consider for our conditional probability distribution:

In [10]:
#sample from only top_k most likely words
sample_output = GPT2.generate(
                             input_ids, 
                             do_sample = True, 
                             max_length = MAX_LEN, 
                             top_k = 50
)

print("Output:\n" + 100 * '-')
print(tokenizer.decode(sample_output[0], skip_special_tokens = True), '...')

Output:
----------------------------------------------------------------------------------------------------
They both fell in love and started a family. The next year, the father of their child, his best friend, was killed in a car accident…

And since then, when people ask me how that story got back into the public conversation, I don't feel like I can really tell… I feel like it's more about people trying ...


### Top-P Sampling
Top-P sampling (also known as nucleus sampling) is similar to Top-K, but instead of choosing the top k most likely words, we choose the smallest set of words whose total probability is larger than p, and then the entire probability mass is shifted to the words in this set.
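
A minimal sketch of that rule, again on hypothetical distributions (`peaked` and `flat`, invented for illustration) and with our own helper name `top_p_filter`:

```python
def top_p_filter(probs, p):
    """Keep the smallest set of top words whose cumulative probability exceeds p."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept, cumulative = [], 0.0
    for word, prob in ranked:
        kept.append((word, prob))
        cumulative += prob
        if cumulative > p:
            break
    # shift the whole probability mass onto the kept words
    return {word: prob / cumulative for word, prob in kept}

# Two hypothetical distributions, invented for illustration.
peaked = {"love": 0.9, "line": 0.05, "step": 0.03, "debt": 0.02}
flat = {"love": 0.3, "line": 0.25, "step": 0.2, "debt": 0.15, "lava": 0.1}

peaked_kept = top_p_filter(peaked, p=0.8)
flat_kept = top_p_filter(flat, p=0.8)
print(len(peaked_kept), len(flat_kept))
```

When the model is confident the set shrinks to a single word, and when the distribution is flat it grows to four words, with the same p in both cases.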

The main difference here is that with Top-K sampling, the size of the set of words is static (obviously) whereas in Top-P sampling, the size of the set can change. To use this sampling method, we just set top_k = 0 and choose a value top_p:

In [11]:
#sample only from the smallest set of words whose total probability exceeds 0.8
sample_output = GPT2.generate(
                             input_ids, 
                             do_sample = True, 
                             max_length = MAX_LEN, 
                             top_p = 0.8, 
                             top_k = 0
)

print("Output:\n" + 100 * '-')
print(tokenizer.decode(sample_output[0], skip_special_tokens = True), '...')

Output:
----------------------------------------------------------------------------------------------------
They both fell in love with it," Linta told the Associated Press. "She said, 'I want to see if it's a relationship.' "

The couple met while attending Dartmouth College in New Hampshire, where Linta was a freshman. They eventually moved to Chicago, where the couple would stay together for the next three years ...


#### Combining both Top-K and Top-P sampling for best results

In [12]:
#combine both sampling techniques
sample_outputs = GPT2.generate(
                              input_ids,
                              do_sample = True, 
                              max_length = 2*MAX_LEN,                              #test how long the output can get while staying coherent
                              #temperature = .7,
                              top_k = 50, 
                              top_p = 0.85, 
                              num_return_sequences = 5
)

print("Output:\n" + 100 * '-')
for i, sample_output in enumerate(sample_outputs):
    print("{}: {}...".format(i, tokenizer.decode(sample_output, skip_special_tokens = True)))
    print('')

Output:
----------------------------------------------------------------------------------------------------
0: They both fell in love with the game, so when we decided to do something about it we just took it to the next level.

"I'm so happy I got to play with her. I'm so excited to play with her. I feel like she's an athlete and a natural performer. She's the one who's going to be there on the field and on the screen with me."

Hood will play on Saturday when the Vikings host the Chicago Bears at U.S. Bank Stadium in Minneapolis. She has been preparing for the occasion ever since the Vikings placed her on waivers this week.

"I've been practicing a little bit since I was released on...

1: They both fell in love, and they married," she said. "It was a happy marriage, and they had three children together."

And then there was the boy, the one who was the first one to learn the truth about his mother's death.

"I don't know why, but I found out about my mother's death after I had been

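Here is a rough sketch of how the two filters can be chained before sampling. It assumes Top-K is applied first and that Top-P is then measured on the renormalised Top-K probabilities; that order is our assumption for illustration. The distribution `toy_probs` and the helper `filter_top_k_top_p` are hypothetical as well.

```python
import random

def filter_top_k_top_p(probs, k, p):
    """Apply top-k, then top-p, and renormalise what is left."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]
    total_k = sum(prob for _, prob in ranked)
    kept, cumulative = [], 0.0
    for word, prob in ranked:
        kept.append((word, prob))
        cumulative += prob / total_k  # top-p is measured on the renormalised top-k mass
        if cumulative > p:
            break
    total = sum(prob for _, prob in kept)
    return {word: prob / total for word, prob in kept}

# Hypothetical next-word distribution, invented for illustration.
toy_probs = {"love": 0.4, "line": 0.25, "step": 0.15,
             "debt": 0.1, "lava": 0.06, "zinc": 0.04}
filtered = filter_top_k_top_p(toy_probs, k=4, p=0.85)

rng = random.Random(34)  # fixed seed so the draw is repeatable
samples = rng.choices(list(filtered), weights=list(filtered.values()), k=5)
print(filtered)
print(samples)
```

Top-K first cuts the six words down to four, then Top-P trims that to "love", "line" and "step", and every sampled word comes from those three.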
### Can we use GPT-2 to do our homework?
Let's try.

In [18]:
#max limit of text generated by our GPT-2 model.
MAX_LEN = 300

In [19]:
prompt = "The benefits of drinking 8 glasses of water are"

input_ids = tokenizer.encode(prompt, return_tensors='tf')
sample_outputs = GPT2.generate(
                              input_ids,
                              do_sample = True, 
                              max_length = MAX_LEN,                                #test how long the output can get while staying coherent
                              #temperature = .8,
                              top_k = 50, 
                              top_p = 0.85,
                              num_return_sequences = 5
)

print("Output:\n" + 100 * '-')
for i, sample_output in enumerate(sample_outputs):
    print("{}: {}...".format(i, tokenizer.decode(sample_output, skip_special_tokens = True)))
    print('')

Output:
----------------------------------------------------------------------------------------------------
0: The benefits of drinking 8 glasses of water are numerous and far-reaching. For starters, water is much better for you than the various other electrolytes commonly found in many other drinks. This is because water is alkaline. Water contains about 20-25% water and 10% minerals.

While water may seem like a simple solution to many of our most common problems, the truth is that water is also an excellent food. One teaspoon of water can provide as much as 60 calories, which is much more than most people need.

As you consume more water, your body will use more of it. In a few weeks, your body will require a higher level of water than it normally does, so it will need to get a greater supply of it from other sources. As a result, your body will begin to produce more water, leading to an increase in body temperature, fluid retention, increased heart rate, increased heart rate varia

### We got some cool and surprising results with this GPT-2 model. You can play with this notebook by tuning the parameters to generate new text.