# Generating text with a pre-trained GPT2 in PyTorch

This notebook was adapted from the blog post [Fine-tuning large Transformer models on a single GPU in PyTorch - Teaching GPT-2 a sense of humor](https://mf1024.github.io/2019/11/12/Fun-With-GPT-2/).

In this notebook, I will use a pre-trained medium-sized GPT2 model from the [huggingface](https://github.com/huggingface/transformers) transformers library to generate some text.

The easiest way to use the huggingface transformers library is to install its pip package *transformers*.

In [2]:
!pip install transformers

Defaulting to user installation because normal site-packages is not writeable


In [3]:
import logging
logging.getLogger().setLevel(logging.CRITICAL)

import torch
import numpy as np

from transformers import GPT2Tokenizer, GPT2LMHeadModel

device = 'cpu'
if torch.cuda.is_available():
    device = 'cuda'

### Models and classes

I use the [GPT2LMHeadModel](https://github.com/huggingface/transformers/blob/master/transformers/modeling_gpt2.py#L491) module for the language model. It is a [GPT2Model](https://github.com/huggingface/transformers/blob/master/transformers/modeling_gpt2.py#L326) with an additional linear layer on top. That layer reuses the weights of the input embedding layer to perform the inverse of the embedding operation: it maps the outputs of GPT2 to a vector of logits over the vocabulary.
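
To make the weight sharing concrete, here is a minimal NumPy sketch of the idea. It is not the huggingface implementation: the sizes, the random matrix, and the helper names `embed` and `lm_head` are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, hidden_size = 6, 4  # toy sizes, not GPT2's real dimensions

# Token embedding matrix: one row per vocabulary entry.
embedding = rng.normal(size=(vocab_size, hidden_size))

def embed(token_id):
    """Input side: look up the embedding row for a token ID."""
    return embedding[token_id]

def lm_head(hidden_state):
    """Output side: reuse the same matrix (transposed) to score every token."""
    return hidden_state @ embedding.T  # one logit per vocabulary entry

# Stand-in for the final hidden state at one position (hypothetical: a real
# model would run the transformer blocks between these two steps).
hidden_state = embed(3)
logits = lm_head(hidden_state)
print(logits.shape)  # (6,)
```

Each logit is the dot product between the hidden state and one row of the embedding matrix, so no separate output weight matrix is needed.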

[GPT2Tokenizer](https://github.com/huggingface/transformers/blob/master/transformers/tokenization_gpt2.py#L106) is a byte-pair encoding (BPE) tokenizer that transforms input text into the tokens that the pre-trained model was trained on.
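
As a rough illustration of what byte-pair encoding does, the sketch below performs a single merge step on a toy string. It is a simplification under stated assumptions: the real GPT2 tokenizer works on bytes and applies a fixed table of merges learned in advance, and the helper names `most_frequent_pair` and `merge_pair` are hypothetical.

```python
from collections import Counter

def most_frequent_pair(symbols):
    """Return the most common pair of adjacent symbols."""
    pairs = Counter(zip(symbols, symbols[1:]))
    return pairs.most_common(1)[0][0]

def merge_pair(symbols, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return merged

symbols = list("abababc")           # start from single characters
pair = most_frequent_pair(symbols)  # ('a', 'b') occurs three times
symbols = merge_pair(symbols, pair)
print(symbols)  # ['ab', 'ab', 'ab', 'c']
```

Repeating this step many times over a large corpus yields a vocabulary of frequent subword units, which is why common words end up as one token and rare words are split into several.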

In [4]:
tokenizer = GPT2Tokenizer.from_pretrained('gpt2-medium')
model = GPT2LMHeadModel.from_pretrained('gpt2-medium')
model = model.to(device)

In [5]:
# Keep only the n most probable tokens, renormalize their probabilities,
# and sample one token ID from that reduced distribution.
def choose_from_top(probs, n=5):
    ind = np.argpartition(probs, -n)[-n:]
    top_prob = probs[ind]
    top_prob = top_prob / np.sum(top_prob) # Normalize
    choice = np.random.choice(n, 1, p = top_prob)
    token_id = ind[choice][0]
    return int(token_id)
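
A quick way to see the effect of this function is to call it many times on a small, made-up probability vector. The block below repeats the function so that it runs on its own; the probability values and the seed are assumptions for illustration only.

```python
import numpy as np

def choose_from_top(probs, n=5):
    # Same logic as the notebook cell: restrict to the n largest
    # probabilities, renormalize, then sample one index.
    ind = np.argpartition(probs, -n)[-n:]
    top_prob = probs[ind] / np.sum(probs[ind])
    choice = np.random.choice(n, 1, p=top_prob)
    return int(ind[choice][0])

np.random.seed(0)  # fixed seed so the run is repeatable
probs = np.array([0.02, 0.40, 0.03, 0.30, 0.05, 0.20])  # toy distribution

samples = [choose_from_top(probs, n=3) for _ in range(1000)]
print(sorted(set(samples)))  # only indices among the three most probable tokens
```

With `n=3`, only indices 1, 3, and 5 can ever be returned, because the other three tokens are discarded before sampling.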

### Text generation

At each prediction step, the GPT2 model needs all of the previous sequence elements to predict the next one. The function below tokenizes the starting input text and then runs a loop: at each step it predicts one new token and appends it to the sequence, which is fed back into the model at the next step. At the end, the token list is decoded back into text.

In [6]:
def generate_some_text(input_str, text_len = 250):

    cur_ids = torch.tensor(tokenizer.encode(input_str)).unsqueeze(0).long().to(device)

    model.eval()
    with torch.no_grad():

        for i in range(text_len):
            outputs = model(cur_ids) # No labels needed: only the logits are used, not a loss
            logits = outputs[0]
            softmax_logits = torch.softmax(logits[0,-1], dim=0) # Take the first (only) batch element and the logits for the last position
            next_token_id = choose_from_top(softmax_logits.to('cpu').numpy(), n=10) # Randomly choose the next token from the top n tokens, weighted by their probabilities
            cur_ids = torch.cat([cur_ids, torch.tensor([[next_token_id]], device=device)], dim = 1) # Append the new token to the sequence

        output_list = list(cur_ids.squeeze().to('cpu').numpy())
        output_text = tokenizer.decode(output_list)
        print(output_text)
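
The same loop structure can be tried without downloading any model. The sketch below swaps GPT2 for a hypothetical stand-in, `toy_logits`, that simply favours the next word in a tiny made-up vocabulary; everything in it (vocabulary, scores, function names) is an assumption for illustration, and it samples from the full distribution rather than the top n.

```python
import math
import random

VOCAB = ["the", "cat", "sat", "on", "mat", "."]  # toy vocabulary

def toy_logits(token_ids):
    """Hypothetical stand-in for the model: favours the token after the last one."""
    logits = [0.0] * len(VOCAB)
    logits[(token_ids[-1] + 1) % len(VOCAB)] = 3.0
    return logits

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def generate(start_ids, steps, seed=0):
    rng = random.Random(seed)
    ids = list(start_ids)
    for _ in range(steps):
        probs = softmax(toy_logits(ids))  # the "model" sees the whole sequence so far
        next_id = rng.choices(range(len(VOCAB)), weights=probs, k=1)[0]
        ids.append(next_id)               # the new token is fed back in at the next step
    return ids

ids = generate([0], steps=5)
print(" ".join(VOCAB[i] for i in ids))
```

The sequence grows by exactly one token per step, which is also why generation with the real model slows down as the text gets longer: each step reprocesses the full sequence.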

## Generating the text

I will give three different sentence beginnings to GPT2 and let it generate the rest:


***1. The Matrix is everywhere. It is all around us. Even now, in this very room. You can see it when you look out your window or when you turn on your television. You can feel it when you go to work… when you go to church… when you pay your taxes. It is the world that has been pulled over your eyes to blind you from the truth…***

***2. Artificial general intelligence is…***

***3. The Godfather: “I’m going to make him an offer he can’t refuse.”…***

In [7]:
generate_some_text(" The Matrix is everywhere. It is all around us. Even now, in this very room. You can see it when you look out your window or when you turn on your television. You can feel it when you go to work... when you go to church... when you pay your taxes. It is the world that has been pulled over your eyes to blind you from the truth. ")

 The Matrix is everywhere. It is all around us. Even now, in this very room. You can see it when you look out your window or when you turn on your television. You can feel it when you go to work... when you go to church... when you pay your taxes. It is the world that has been pulled over your eyes to blind you from the truth. _____________________ "What is the truth? What is the world? You cannot understand it until you've experienced it." ~ Albert Einstein

The Matrix is real. You can be sure of that. You don't need to be able to see the film, you just have to be aware of your surroundings. And if you can't be sure of anything, it's because you've never experienced it. _____________ "We must be careful what we ask, but we cannot be sure of the answers. The answers are all around us and if we look, we will find them, they can be found, they may come and go as they please. We will never know them." ~ Albert Einstein

The Matrix is real. You can be sure of that. You don't need to be abl

In [9]:
generate_some_text("If everytime i went to work, I took my bike instead of my car...")

If everytime i went to work, I took my bike instead of my car...

If i had the opportunity to travel all over the world...

If i could take every opportunity i got...

It would be awesome, but not if it was just a way to get some money to live, eat, have fun!


I don't know why I even care anymore. This shit just makes no sense at all.


And if you think that's a good thing for your life, then fuck you. I have no clue how you could live without it. I don't care whether you're rich, poor, black, brown, white, Asian, or whatever. I'm not going to be around to see how you survive.


I don't give a fuck if you're rich or poor because if you're rich your life is going to be better, but when it comes to the fact that you can't have enough money for your shit, then I don't care. You can go get a job, but I can't see your fucking job being worth shit when it's fucking boring. I'm not saying that you can't get a job. I'm saying that you can't have any money that you can live off.


If you don't

In [10]:
generate_some_text(" The Godfather: \"I'm going to make him an offer he can't refuse.\" ")

 The Godfather: "I'm going to make him an offer he can't refuse."  The only reason I don't think he's going to accept the offer is because the offer was so much less than what he really wanted.  The offer was to kill the Godfather, which I think would make him a little more sympathetic to him.  It's possible the offer he accepted was something else. But I think we can't discount all of the things that are really going on in the film.  There's a scene between the Godfather and a mobster that is really interesting because it reveals what a bad actor the Mafia is.  There's also this scene where the Godfather gets arrested and there's a scene when they are driving home in a van after a night out that I think makes the Godfather look more human. The Godfather: "...I've seen the best. I've seen the worst." It was the best. That's the thing with the Mafia, you don't see them in the same light as people you know.  The people they work with know the people they're dealing with.  But the people 