## GPT-2 in pytorch_transformers

### What is a Language Model? 

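A language model assigns a probability to a sequence of words. It does this by learning to predict each word from the words that precede it, which is exactly what lets it complete a sentence.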
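To make the definition concrete, here is a minimal sketch using a hypothetical bigram model with made-up probabilities. The probability of a whole sequence is the product of next-word probabilities (the chain rule), accumulated in log space to avoid underflow. A bigram model conditions only on the previous word. GPT-2 conditions on the entire preceding context, but the idea of scoring and predicting the next token is the same.

```python
import math

# Hypothetical toy bigram model: P(next word | previous word).
# "<s>" marks the start of a sentence. The numbers are made up for illustration.
BIGRAM_PROBS = {
    "<s>": {"my": 0.5, "the": 0.5},
    "my": {"favorite": 0.6, "car": 0.4},
    "favorite": {"meal": 0.7, "car": 0.3},
    "meal": {"is": 1.0},
}

def sequence_log_prob(words):
    """Chain rule: log P(w1..wn) = sum_i log P(w_i | w_{i-1})."""
    total = 0.0
    prev = "<s>"
    for word in words:
        p = BIGRAM_PROBS.get(prev, {}).get(word, 0.0)
        if p == 0.0:
            return float("-inf")  # unseen transition: the sequence has probability zero
        total += math.log(p)
        prev = word
    return total

def predict_next(prev_word):
    """Greedy prediction: the most likely next word given the previous one."""
    candidates = BIGRAM_PROBS.get(prev_word, {})
    return max(candidates, key=candidates.get) if candidates else None

logp = sequence_log_prob(["my", "favorite", "meal", "is"])
print(math.exp(logp))           # ≈ 0.21, i.e. 0.5 * 0.6 * 0.7 * 1.0
print(predict_next("favorite"))
```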

PyTorch-Transformers provides state-of-the-art pre-trained models for Natural Language Processing (NLP).



### This project

Let’s build our own sentence-completion model using GPT-2. Given the beginning of a sentence, such as

“what is the fastest car in the _________”

the model predicts the most likely next word. The cells below use the prompt “My favorite meal is a”.



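Install the library

```python
import sys
# `!` runs a shell command from Jupyter; sys.executable targets this kernel's Python
!{sys.executable} -m pip install pytorch_transformers
```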



Import required libraries

```python
import torch
from pytorch_transformers import GPT2Tokenizer, GPT2LMHeadModel
```

Load the pre-trained tokenizer (vocabulary)

```python
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
```



Next, we tokenize the text into a sequence of integer token IDs and pass it to `GPT2LMHeadModel`: the GPT-2 transformer with a language-modeling head on top (a linear layer whose weights are tied to the input embeddings).
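"Weights tied to the input embeddings" means the output layer reuses the embedding matrix instead of learning a separate one: the score (logit) for each vocabulary entry is the dot product between the final hidden state and that entry's embedding row. The sketch below illustrates this with a hypothetical 4-token vocabulary and hidden size 3 (GPT-2 uses 50257 and 768, as the model summary further down shows). All numbers, and the helper names, are made up for illustration.

```python
# Hypothetical toy embedding table: 4 tokens, hidden size 3.
# The same matrix serves as the input embedding and as the output (LM head) weights.
EMBEDDINGS = [
    [1.0, 0.0, 0.0],   # token 0
    [0.0, 1.0, 0.0],   # token 1
    [0.0, 0.0, 1.0],   # token 2
    [0.5, 0.5, 0.0],   # token 3
]

def lm_head_logits(hidden):
    """Tied LM head without bias: logit[v] = hidden . EMBEDDINGS[v]."""
    return [sum(h * e for h, e in zip(hidden, row)) for row in EMBEDDINGS]

def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)

hidden_state = [0.2, 0.9, 0.1]  # made-up final hidden state for the last position
logits = lm_head_logits(hidden_state)
print(logits)
print(argmax(logits))  # the predicted next token ID
```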



Encode the sentence to complete

```python
text = "My favorite meal is a"
indexed_tokens = tokenizer.encode(text)
```

Convert the indexed tokens into a PyTorch tensor (the outer brackets add a batch dimension of size 1)

```python
tokens_tensor = torch.tensor([indexed_tokens])
```

Load pre-trained model (weights)

```python
model = GPT2LMHeadModel.from_pretrained('gpt2')
```

Put the model in evaluation mode to deactivate the dropout modules. `model.eval()` returns the model itself, so the notebook prints its structure

```python
model.eval()
```

```
GPT2LMHeadModel(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(1024, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-11): 12 x Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): Attention(
          (c_attn): Conv1D()
          (c_proj): Conv1D()
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): MLP(
          (c_fc): Conv1D()
          (c_proj): Conv1D()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (lm_head): Linear(in_features=768, out_features=50257, bias=False)
)
```
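Evaluation mode matters because dropout behaves differently during training and inference. This pure-Python sketch mimics inverted dropout as documented for `torch.nn.Dropout`: in training, each element is zeroed with probability `p` (0.1 in every GPT-2 dropout layer listed above) and the survivors are scaled by `1 / (1 - p)`; in evaluation the layer is an identity, so repeated predictions on the same input agree. The `dropout` function and its `training` flag are illustrative names, not PyTorch API.

```python
import random

def dropout(values, p, training, rng):
    """Inverted dropout (the scheme torch.nn.Dropout documents).

    Training: zero each element with probability p and scale the survivors
    by 1 / (1 - p) so the expected value is unchanged.
    Evaluation: return the input unchanged (an identity function).
    """
    if not training or p == 0.0:
        return list(values)
    if p >= 1.0:
        return [0.0] * len(values)  # everything is dropped
    scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else v * scale for v in values]

rng = random.Random(0)  # seeded so the example is reproducible
activations = [1.0, 2.0, 3.0, 4.0]
print(dropout(activations, p=0.1, training=True, rng=rng))   # like model.train(): random zeros, survivors scaled
print(dropout(activations, p=0.1, training=False, rng=rng))  # like model.eval(): same as the input
```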

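If a CUDA GPU is available, move the model and the input tensor onto it (otherwise everything stays on the CPU)

```python
device = 'cuda' if torch.cuda.is_available() else 'cpu'
tokens_tensor = tokens_tensor.to(device)
model = model.to(device)
```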

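Run the model on the whole input. `outputs[0]` holds the logits: a tensor of shape (batch size, sequence length, vocabulary size) with a score for every vocabulary entry at every input position

```python
with torch.no_grad():  # inference only, so skip gradient tracking
    outputs = model(tokens_tensor)
    predictions = outputs[0]
```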

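Get the predicted next sub-word: the highest-scoring entry at the last position. GPT-2 uses a byte-level BPE vocabulary, so a token may be a whole word or only part of one

```python
predicted_index = torch.argmax(predictions[0, -1, :]).item()
predicted_text = tokenizer.decode(indexed_tokens + [predicted_index])
```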
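`argmax` keeps only the single best token. To see how confident the model is, the logits can be turned into probabilities with a softmax and ranked. Below is a minimal sketch on made-up logits for a hypothetical 5-token vocabulary; `softmax` and `top_k` are illustrative helpers. Because softmax preserves order, `top_k(..., 1)` always agrees with the argmax.

```python
import math

def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(logits, k):
    """Return the k most likely (token_id, probability) pairs, best first."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    return [(i, probs[i]) for i in ranked[:k]]

# Made-up logits for a 5-token vocabulary (GPT-2's has 50257 entries)
last_position_logits = [2.0, 0.5, 3.0, -1.0, 1.0]
for token_id, prob in top_k(last_position_logits, 3):
    print(token_id, round(prob, 3))
```

With the notebook's tensors, the equivalent would be `torch.topk(torch.softmax(predictions[0, -1, :], dim=-1), 5)`, which returns the top probabilities and their indices; each index can be passed to `tokenizer.decode([index])` to inspect the candidate token.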

Print the prompt followed by the predicted token

```python
print(predicted_text)
```

```
My favorite meal is a chicken
```
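The notebook predicts a single token. Completing more of the sentence means repeating that step: append the argmax token and run the model again on the longer sequence (greedy decoding). The sketch below is model-agnostic: `next_token_logits` is a hypothetical callback that maps a list of token IDs to next-token scores, and `toy_model` is a stand-in so the loop runs without downloading GPT-2.

```python
def greedy_generate(next_token_logits, token_ids, max_new_tokens, stop_id=None):
    """Greedy decoding: repeatedly append the highest-scoring next token.

    next_token_logits: hypothetical callback, list of token IDs -> list of scores.
    stop_id: optional token ID that ends generation once it is produced.
    """
    ids = list(token_ids)
    for _ in range(max_new_tokens):
        scores = next_token_logits(ids)
        best = max(range(len(scores)), key=scores.__getitem__)
        ids.append(best)
        if best == stop_id:
            break
    return ids

# Toy stand-in for a model: always favours (last token + 1) modulo 5.
def toy_model(ids):
    scores = [0.0] * 5
    scores[(ids[-1] + 1) % 5] = 1.0
    return scores

print(greedy_generate(toy_model, [0], max_new_tokens=4))  # → [0, 1, 2, 3, 4]
```

To drive it with GPT-2, `next_token_logits` could wrap `model(torch.tensor([ids]))[0][0, -1, :].tolist()` inside `torch.no_grad()` (with the tensor on the model's device), using `stop_id=50256`, GPT-2's `<|endoftext|>` token; `tokenizer.decode(ids)` then turns the result into text. Greedy decoding is simple but can fall into repetitive loops on longer outputs.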
