<a href="https://colab.research.google.com/github/ComponentSoftTeam/AI-110/blob/main/notebooks/2_Transformer_Demo_with_GPT.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Inspired by code originally from Sinan Ozdemir's [notebook](https://github.com/sinanuozdemir/oreilly-transformers-video-series/blob/main/notebooks/7_NLG_with_GPT.ipynb).

## 2.1 Introduction to the tokenizer and embeddings of GPT-2

In [1]:
%%capture
%pip install datasets transformers bertviz

In [2]:
from transformers import pipeline, set_seed, GPT2Tokenizer, GPT2LMHeadModel
from torch import tensor, numel, nn
from bertviz import model_view

set_seed(42)

In [3]:
generator = pipeline('text-generation', model='gpt2')

generator("Hello, I'm here in Budapest and I", max_length=30, num_return_sequences=3)

Device set to use cpu
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': "Hello, I'm here in Budapest and I have a new story. A short story written about friendship in Hungary. There is a big, white dog"},
 {'generated_text': 'Hello, I\'m here in Budapest and I\'m a huge fan of the Hungarian culture," said Zohér Åkerlund, a member of'},
 {'generated_text': "Hello, I'm here in Budapest and I've been very late, in case you were wondering. I am going to Budapest for work. I have"}]

In [4]:
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')

In [None]:
'Erno' in tokenizer.get_vocab()

False

In [None]:
encoded = tokenizer.encode('Hi there. I am Erno, I am your instructor.', return_tensors='pt')

encoded

tensor([[17250,   612,    13,   314,   716,  5256,  3919,    11,   314,   716,
           534, 21187,    13]])

In [None]:
tokenizer.convert_ids_to_tokens(tokenizer.encode('Hi there. I am Erno, I am your instructor.'))

['Hi',
 'Ġthere',
 '.',
 'ĠI',
 'Ġam',
 'ĠEr',
 'no',
 ',',
 'ĠI',
 'Ġam',
 'Ġyour',
 'Ġinstructor',
 '.']
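
The `Ġ` prefix in the token list above is how GPT-2's byte-level BPE vocabulary marks a leading space (and `Ċ`, which appears later in this notebook, marks a newline). The sketch below shows how such tokens map back to text. It is plain Python so it runs without loading the tokenizer; `detokenize` is a hypothetical helper, not part of `transformers`, and it only handles these two markers.

```python
def detokenize(tokens):
    """Join GPT-2 style tokens back into text (handles only the space and newline markers)."""
    text = "".join(tokens)
    return text.replace("Ġ", " ").replace("Ċ", "\n")

# Token list copied from the cell output above.
tokens = ['Hi', 'Ġthere', '.', 'ĠI', 'Ġam', 'ĠEr', 'no', ',', 'ĠI', 'Ġam', 'Ġyour', 'Ġinstructor', '.']
text = detokenize(tokens)
print(text)
```

Note that "Erno" is not in the vocabulary, so it is split into the two sub-word tokens `ĠEr` and `no`. In real code, `tokenizer.decode(ids)` or `tokenizer.convert_tokens_to_string(tokens)` is the more robust way to do this.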

In [None]:
encoded.shape

torch.Size([1, 13])

In [None]:
model

GPT2LMHeadModel(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(1024, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-11): 12 x GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2SdpaAttention(
          (c_attn): Conv1D(nf=2304, nx=768)
          (c_proj): Conv1D(nf=768, nx=768)
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D(nf=3072, nx=768)
          (c_proj): Conv1D(nf=768, nx=3072)
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (lm_head): Linear(in_features=768, out_features=50257, bias=False)
)

In [None]:
# Get all of the model's parameters as a list of tuples.
named_params = list(model.named_parameters())

print(f'The GPT-2 model has {len(named_params)} different named parameters.\n')

print('==== Embedding Layer ====\n')
for name, params in named_params[0:2]:
    print(f"{name:<55} {str(tuple(params.size())):>12}")

print('\n==== Decoders ====\n')
for name, params in named_params[2:146]:
    print(f"{name:<55} {str(tuple(params.size())):>12}")


print('\n==== Output Layer ====\n')
for name, params in named_params[-2:]:
    print(f"{name:<55} {str(tuple(params.size())):>12}")

print('\n==== LM Head Layer ====\n')
print(f"{'lm_head':<55} {str(tuple(next(model.lm_head.parameters()).size())):>12}")

The GPT-2 model has 148 different named parameters.

==== Embedding Layer ====

transformer.wte.weight                                  (50257, 768)
transformer.wpe.weight                                   (1024, 768)

==== Decoders ====

transformer.h.0.ln_1.weight                                   (768,)
transformer.h.0.ln_1.bias                                     (768,)
transformer.h.0.attn.c_attn.weight                       (768, 2304)
transformer.h.0.attn.c_attn.bias                             (2304,)
transformer.h.0.attn.c_proj.weight                        (768, 768)
transformer.h.0.attn.c_proj.bias                              (768,)
transformer.h.0.ln_2.weight                                   (768,)
transformer.h.0.ln_2.bias                                     (768,)
transformer.h.0.mlp.c_fc.weight                          (768, 3072)
transformer.h.0.mlp.c_fc.bias                                (3072,)
transformer.h.0.mlp.c_proj.weight                        (3072, 768)
tr

In [None]:
# Inspect the fused Q/K/V projection (c_attn) of the first two decoder blocks.
named_params = list(model.named_parameters())

# Indices 4-5 are transformer.h.0.attn.c_attn.{weight,bias};
# indices 16-17 are the same two tensors for transformer.h.1.
for name, params in named_params[4:6] + named_params[16:18]:
    print("\n")
    print(name)
    print(params.size())
    print(params)



transformer.h.0.attn.c_attn.weight
torch.Size([768, 2304])
Parameter containing:
tensor([[-0.4738, -0.2614, -0.0978,  ...,  0.0513, -0.0584,  0.0250],
        [ 0.0874,  0.1473,  0.2387,  ..., -0.0525, -0.0113, -0.0156],
        [ 0.0039,  0.0695,  0.3668,  ...,  0.1143,  0.0363, -0.0318],
        ...,
        [-0.2592, -0.0164,  0.1991,  ...,  0.0095, -0.0516,  0.0319],
        [ 0.1517,  0.2170,  0.1043,  ...,  0.0293, -0.0429, -0.0475],
        [-0.4100, -0.1924, -0.2400,  ..., -0.0046,  0.0070,  0.0198]],
       requires_grad=True)


transformer.h.0.attn.c_attn.bias
torch.Size([2304])
Parameter containing:
tensor([ 0.4803, -0.5254, -0.4293,  ...,  0.0126, -0.0499,  0.0032],
       requires_grad=True)


transformer.h.1.attn.c_attn.weight
torch.Size([768, 2304])
Parameter containing:
tensor([[-0.2906,  0.3057,  0.0302,  ..., -0.0057, -0.0582, -0.0061],
        [-0.3272,  0.2420,  0.2140,  ..., -0.0100,  0.1192, -0.1672],
        [-0.2679,  0.1188, -0.2670,  ...,  0.1511,  0.0671,  

In [None]:
total_params = 0
for param in model.parameters():
    total_params += numel(param)

print(f'Number of params: {total_params:,}')

Number of params: 124,439,808
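
The total above can be reproduced by hand from the shapes in the model printout. The sketch below is only arithmetic on those shapes; `dense` is a hypothetical helper name. The `lm_head` matrix is not added separately, which is consistent with it sharing its weight with `wte`, so that `model.parameters()` counts it once.

```python
d_model, vocab, n_ctx, n_layers = 768, 50257, 1024, 12

def dense(n_in, n_out):
    """Weights plus bias of one Conv1D projection."""
    return n_in * n_out + n_out

layer_norm = 2 * d_model  # weight + bias

per_block = (
    layer_norm                      # ln_1
    + dense(d_model, 3 * d_model)   # attn.c_attn (2304 = 3 * 768: queries, keys, values)
    + dense(d_model, d_model)       # attn.c_proj
    + layer_norm                    # ln_2
    + dense(d_model, 4 * d_model)   # mlp.c_fc
    + dense(4 * d_model, d_model)   # mlp.c_proj
)

# Token embeddings + position embeddings + 12 decoder blocks + final layer norm (ln_f).
total = vocab * d_model + n_ctx * d_model + n_layers * per_block + layer_norm
print(f"{per_block:,} per block, {total:,} in total")
```

The same bookkeeping explains the 148 named parameters: 2 embedding tables, 12 tensors in each of the 12 blocks, and 2 for `ln_f`.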


In [None]:
wte=model.transformer.wte(encoded)
print(wte.shape)
wte

torch.Size([1, 13, 768])


tensor([[[-0.0679, -0.1280,  0.0666,  ...,  0.0494, -0.0610, -0.0470],
         [-0.0806,  0.0413,  0.0576,  ..., -0.0095, -0.1874, -0.0539],
         [ 0.0466, -0.0113,  0.0283,  ..., -0.0735,  0.0496,  0.0963],
         ...,
         [-0.0756,  0.0461,  0.0550,  ..., -0.0826,  0.0872, -0.0208],
         [ 0.1415, -0.1637, -0.0499,  ...,  0.0580,  0.0695,  0.0071],
         [ 0.0466, -0.0113,  0.0283,  ..., -0.0735,  0.0496,  0.0963]]],
       grad_fn=<EmbeddingBackward0>)

In [None]:
# Position ids 0..12, one per token in `encoded`.
wpe = model.transformer.wpe(tensor(list(range(encoded.shape[1]))).reshape(1, -1))
print(wpe.shape)
wpe

torch.Size([1, 13, 768])


tensor([[[-1.8821e-02, -1.9742e-01,  4.0267e-03,  ..., -4.3044e-02,
           2.8267e-02,  5.4490e-02],
         [ 2.3959e-02, -5.3792e-02, -9.4879e-02,  ...,  3.4170e-02,
           1.0172e-02, -1.5573e-04],
         [ 4.2161e-03, -8.4764e-02,  5.4515e-02,  ...,  1.9745e-02,
           1.9325e-02, -2.1424e-02],
         ...,
         [ 1.6006e-03,  6.2476e-03,  1.0040e-01,  ..., -4.6657e-03,
           9.3994e-04, -5.8468e-03],
         [-3.5615e-03,  1.7494e-02,  1.0676e-01,  ..., -5.4367e-03,
          -7.9653e-04, -5.6959e-03],
         [ 5.9564e-05,  1.7205e-02,  9.6934e-02,  ..., -1.5799e-03,
          -8.6813e-04, -7.8220e-03]]], grad_fn=<EmbeddingBackward0>)

In [None]:
initial_input=wte+wpe
print(initial_input.shape)
initial_input


torch.Size([1, 13, 768])


tensor([[[-0.0867, -0.3254,  0.0706,  ...,  0.0063, -0.0328,  0.0075],
         [-0.0566, -0.0125, -0.0373,  ...,  0.0247, -0.1772, -0.0540],
         [ 0.0509, -0.0961,  0.0828,  ..., -0.0538,  0.0689,  0.0749],
         ...,
         [-0.0740,  0.0523,  0.1554,  ..., -0.0873,  0.0881, -0.0266],
         [ 0.1379, -0.1462,  0.0569,  ...,  0.0525,  0.0687,  0.0014],
         [ 0.0467,  0.0059,  0.1253,  ..., -0.0751,  0.0488,  0.0885]]],
       grad_fn=<AddBackward0>)
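
The three cells above amount to two table lookups followed by an element-wise sum. Here is a toy version in plain Python, with tiny made-up tables instead of GPT-2's 50257 x 768 and 1024 x 768 matrices; `wte_table`, `wpe_table` and `embed` are illustrative names only.

```python
import random

rng = random.Random(0)
vocab_size, max_positions, d_model = 10, 8, 4  # toy sizes, not GPT-2's

# One row per token id and one row per position, filled with small random numbers.
wte_table = [[rng.uniform(-0.1, 0.1) for _ in range(d_model)] for _ in range(vocab_size)]
wpe_table = [[rng.uniform(-0.1, 0.1) for _ in range(d_model)] for _ in range(max_positions)]

def embed(token_ids):
    """Look up each token's row and add the row for its position."""
    return [
        [t + p for t, p in zip(wte_table[tok], wpe_table[pos])]
        for pos, tok in enumerate(token_ids)
    ]

ids = [3, 7, 3]  # token 3 appears at positions 0 and 2
vectors = embed(ids)
print(len(vectors), len(vectors[0]))
```

The repeated token gets the same `wte` row both times but a different final vector, because the position rows differ. The outputs above show the same effect: the two `.` tokens (third and last rows) have identical `wte` rows but different `initial_input` rows.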

In [None]:
contextful_embedding = model(encoded, output_hidden_states=True).hidden_states[-1]
print(f"\ncontextful_embedding:")
print(contextful_embedding.shape)
print(contextful_embedding)


contextful_embedding:
torch.Size([1, 13, 768])
tensor([[[-5.1051e-02, -1.9038e-01, -3.3829e-01,  ..., -1.9444e-01,
          -4.5693e-02, -1.8426e-01],
         [ 3.4732e-01, -1.0114e-03, -2.9825e-01,  ...,  1.7716e-01,
           1.9853e-01,  4.4778e-01],
         [ 3.2489e-01, -7.4510e-02, -3.0923e-01,  ..., -1.1854e-01,
          -1.7494e-01, -2.1527e-02],
         ...,
         [ 1.4272e-01, -1.4377e-01,  2.1839e-01,  ...,  4.7257e-02,
           4.5007e-01,  2.0302e-01],
         [ 7.8564e-02, -1.3801e-01, -1.5103e+00,  ...,  4.3329e-02,
           1.2174e-01,  1.5725e-01],
         [ 2.0761e-01, -3.0169e-01, -3.8818e-01,  ...,  1.0580e-01,
          -4.1715e-02,  2.2291e-01]]], grad_fn=<ViewBackward0>)


In [None]:
module_output = initial_input
for module in model.transformer.h:
    module_output = module(module_output)[0]

print(f"\nfinal module output before normalization:")
print(module_output.shape)
print(module_output)

contextful_embedding = model.transformer.ln_f(module_output)
print(f"\nfinal module output after normalization = contextful_embedding:")
print(contextful_embedding.shape)
print(contextful_embedding)


final module output before normalization:
torch.Size([1, 13, 768])
tensor([[[ 1.2112e-02, -1.6717e+00, -1.3894e+00,  ..., -1.6987e+00,
           2.6213e-01, -1.8849e+00],
         [ 3.6916e+00, -8.3286e-01, -2.3966e+00,  ...,  2.4070e+00,
           2.9722e+00,  5.2475e+00],
         [ 4.5242e+00, -1.7653e+00, -2.7203e+00,  ..., -2.1180e+00,
          -2.8785e+00, -1.0734e+00],
         ...,
         [ 1.6042e+00, -2.3082e+00,  2.4460e+00,  ...,  7.7297e-01,
           7.1875e+00,  2.2483e+00],
         [ 6.2690e-01, -2.3633e+00, -1.2819e+01,  ...,  5.1152e-01,
           1.9169e+00,  1.3856e+00],
         [ 2.7683e+00, -4.8010e+00, -3.3416e+00,  ...,  1.9459e+00,
          -3.5618e-01,  2.9281e+00]]], grad_fn=<AddBackward0>)

final module output after normalization = contextful_embedding:
torch.Size([1, 13, 768])
tensor([[[-5.1051e-02, -1.9038e-01, -3.3829e-01,  ..., -1.9444e-01,
          -4.5693e-02, -1.8426e-01],
         [ 3.4732e-01, -1.0114e-03, -2.9825e-01,  ...,  1.7716e-01,
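
The `ln_f` step used above is a `LayerNorm((768,), eps=1e-05, elementwise_affine=True)` according to the model printout. As a rough sketch of what that does to one token's vector, here is a plain-Python version. The learned weight and bias default to 1 and 0, and the input values are made up, so the numbers will not match the notebook output.

```python
import math

def layer_norm(x, weight=None, bias=None, eps=1e-5):
    """Normalize one vector to zero mean and unit variance, then scale and shift."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n  # biased (population) variance
    weight = weight or [1.0] * n
    bias = bias or [0.0] * n
    return [w * (v - mean) / math.sqrt(var + eps) + b for v, w, b in zip(x, weight, bias)]

hidden = [3.7, -0.8, -2.4, 2.4, 3.0, 5.2]  # made-up values, not taken from the model
normed = layer_norm(hidden)
print([round(v, 3) for v in normed])
```

The normalization is applied independently to each of the 13 token vectors, along the 768-wide feature dimension.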

In [None]:
logits = model.lm_head(contextful_embedding)
print(f"\nlogits:")
print(logits.shape)
print(logits)

probabilities = nn.functional.softmax(logits, dim=2)
print(f"\nprobabilities:")
print(probabilities.shape)
print(probabilities)


logits:
torch.Size([1, 13, 50257])
tensor([[[ -34.2418,  -34.3303,  -37.3033,  ...,  -43.0448,  -42.7169,
           -35.2205],
         [ -52.0912,  -56.3441,  -61.8496,  ...,  -65.2909,  -64.1484,
           -57.9116],
         [-122.3509, -122.4479, -123.1734,  ..., -130.7879, -131.2375,
          -115.8060],
         ...,
         [-104.8461, -103.7207, -107.3849,  ..., -108.0555, -109.1399,
          -104.2351],
         [ -76.4137,  -79.0455,  -84.3336,  ...,  -90.5665,  -90.1864,
           -79.6281],
         [-138.8847, -138.3423, -140.8626,  ..., -149.0314, -150.1313,
          -133.3433]]], grad_fn=<UnsafeViewBackward0>)

probabilities:
torch.Size([1, 13, 50257])
tensor([[[4.0598e-03, 3.7159e-03, 1.9007e-04,  ..., 6.1012e-07,
          8.4687e-07, 1.5257e-03],
         [1.1630e-01, 1.6542e-03, 6.7229e-06,  ..., 2.1528e-07,
          6.7487e-07, 3.4501e-04],
         [2.1517e-05, 1.9529e-05, 9.4529e-06,  ..., 4.6627e-09,
          2.9742e-09, 1.4969e-02],
         ...,
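
The softmax call above turns each row of 50257 logits into probabilities that sum to 1. A minimal sketch on a toy 4-token vocabulary, with made-up logit values on the same scale as the output above, is given below. Only the differences between logits matter, so large negative values are not a problem.

```python
import math

def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [-138.9, -138.3, -140.9, -133.3]  # made-up values, toy vocabulary of 4 tokens
probs = softmax(logits)
print([f"{p:.2%}" for p in probs])
```

The largest logit always gets the largest probability, which is why `probabilities.argmax(dim=2)` and `logits.argmax(dim=2)` pick the same tokens.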
     

In [None]:
print('Original words:')
print(tokenizer.convert_ids_to_tokens(tokenizer.encode('Hi there. I am Erno, I am your instructor.', return_tensors='pt')[0]))
print('\nMost probable next words:')
print(tokenizer.convert_ids_to_tokens(probabilities.argmax(dim=2)[0]))
print('\nProbability of most probable next words:')
print([f"{probabilities[0][i][j.item()].item():.2%}" for i, j in enumerate(probabilities.argmax(2)[0])])

Original words:
['Hi', 'Ġthere', '.', 'ĠI', 'Ġam', 'ĠEr', 'no', ',', 'ĠI', 'Ġam', 'Ġyour', 'Ġinstructor', '.']

Most probable next words:
['.', ',', 'ĠI', "'m", 'Ġnot', 'nie', '.', 'Ġand', 'Ġam', 'Ġthe', 'Ġfather', '.', 'ĠI']

Probability of most probable next words:
['7.61%', '27.57%', '16.35%', '19.35%', '4.33%', '9.05%', '16.82%', '12.08%', '40.86%', '5.57%', '6.87%', '42.34%', '19.80%']
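
The prediction at position `i` is the model's guess for the token at position `i + 1`, so the two lists above are offset by one. A small sketch that lines them up, using the lists copied from the output above, shows where the top guess matched the actual next token:

```python
tokens = ['Hi', 'Ġthere', '.', 'ĠI', 'Ġam', 'ĠEr', 'no', ',', 'ĠI', 'Ġam', 'Ġyour', 'Ġinstructor', '.']
predicted = ['.', ',', 'ĠI', "'m", 'Ġnot', 'nie', '.', 'Ġand', 'Ġam', 'Ġthe', 'Ġfather', '.', 'ĠI']

# The last prediction has no "actual next token" to compare against, so stop one short.
hits = [i for i in range(len(tokens) - 1) if predicted[i] == tokens[i + 1]]
for i in hits:
    print(f"after position {i} ({tokens[i]!r}) the top guess {predicted[i]!r} was the actual next token")
```

The final entry of `predicted` is the one that text generation would actually use: it is the model's guess for what follows the whole sentence.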


## 2.2 Introduction to GPT-2 masked multi-headed attention

In [5]:
import torch
import pandas as pd


In [7]:
encoded = tokenizer.encode('Hi there. I am Erno, I am your instructor.', return_tensors='pt')
contextful_embedding = model(encoded, output_hidden_states=True).hidden_states[-1]
print(f"\ncontextful_embedding:")
print(contextful_embedding.shape)
print(contextful_embedding)


contextful_embedding:
torch.Size([1, 13, 768])
tensor([[[-5.1051e-02, -1.9038e-01, -3.3829e-01,  ..., -1.9444e-01,
          -4.5692e-02, -1.8426e-01],
         [ 3.4731e-01, -1.0119e-03, -2.9825e-01,  ...,  1.7716e-01,
           1.9853e-01,  4.4778e-01],
         [ 3.2489e-01, -7.4511e-02, -3.0923e-01,  ..., -1.1854e-01,
          -1.7494e-01, -2.1527e-02],
         ...,
         [ 1.4272e-01, -1.4376e-01,  2.1839e-01,  ...,  4.7257e-02,
           4.5007e-01,  2.0302e-01],
         [ 7.8564e-02, -1.3801e-01, -1.5103e+00,  ...,  4.3330e-02,
           1.2174e-01,  1.5725e-01],
         [ 2.0761e-01, -3.0169e-01, -3.8818e-01,  ...,  1.0580e-01,
          -4.1714e-02,  2.2291e-01]]], grad_fn=<ViewBackward0>)


In [8]:
phrase = 'My friend was right about this book. It is really funny.'
encoded_phrase = tokenizer(phrase, return_tensors='pt')

response = model(**encoded_phrase, output_attentions=True, output_hidden_states=True)

len(response.attentions)



12

In [9]:
encoded_phrase

{'input_ids': tensor([[3666, 1545,  373,  826,  546,  428, 1492,   13,  632,  318, 1107, 8258,
           13]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}

In [10]:
response.attentions[-1].shape  # From the final decoder

torch.Size([1, 12, 13, 13])

In [11]:
encoded_phrase['input_ids'].shape

torch.Size([1, 13])

In [12]:
tokens = tokenizer.convert_ids_to_tokens(encoded_phrase['input_ids'][0])

tokens

['My',
 'Ġfriend',
 'Ġwas',
 'Ġright',
 'Ġabout',
 'Ġthis',
 'Ġbook',
 '.',
 'ĠIt',
 'Ġis',
 'Ġreally',
 'Ġfunny',
 '.']

In [13]:
# Layer index 9, head 0: the token 'ĠIt' gives about 60% of its attention to the token 'Ġbook'.
arr = response.attentions[9][0][0]
# arr = response.attentions[11][0][11]

n_digits = 3

# Round for display; DataFrame.round avoids the deprecated DataFrame.applymap.
attention_df = pd.DataFrame(arr.detach().numpy()).astype(float).round(n_digits)

# Rows are the attending (query) tokens, columns are the attended-to (key) tokens.
attention_df.columns = tokens
attention_df.index = tokens

attention_df


,My,Ġfriend,Ġwas,Ġright,Ġabout,Ġthis,Ġbook,.,ĠIt,Ġis,Ġreally,Ġfunny,.
My,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Ġfriend,0.968,0.032,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Ġwas,0.824,0.145,0.031,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Ġright,0.979,0.008,0.007,0.005,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Ġabout,0.979,0.008,0.004,0.005,0.005,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Ġthis,0.924,0.031,0.007,0.006,0.016,0.016,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Ġbook,0.956,0.004,0.002,0.002,0.004,0.004,0.028,0.0,0.0,0.0,0.0,0.0,0.0
.,0.728,0.01,0.003,0.003,0.002,0.008,0.235,0.011,0.0,0.0,0.0,0.0,0.0
ĠIt,0.31,0.002,0.003,0.006,0.012,0.02,0.601,0.014,0.031,0.0,0.0,0.0,0.0
Ġis,0.344,0.005,0.002,0.004,0.004,0.016,0.517,0.011,0.067,0.03,0.0,0.0,0.0
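
The table above is lower triangular: each token can attend only to itself and to earlier tokens, which is what "masked" means in masked multi-headed attention. Below is a plain-Python sketch of that masking on made-up scores. Library implementations usually apply the mask by setting future scores to a large negative number before the softmax; slicing the row, as done here, gives the same weights and is easier to read. `causal_attention_weights` is a hypothetical helper name.

```python
import math

def causal_attention_weights(scores):
    """Row-wise softmax where query i may only look at keys 0..i."""
    weights = []
    for i, row in enumerate(scores):
        visible = row[: i + 1]  # keys at or before the query position
        m = max(visible)
        exps = [math.exp(s - m) for s in visible]
        total = sum(exps)
        weights.append([e / total for e in exps] + [0.0] * (len(row) - i - 1))
    return weights

# Made-up raw scores (query x key) for a 4-token sequence.
scores = [
    [0.5, 2.0, 1.0, 0.3],
    [1.2, 0.1, 3.0, 0.7],
    [0.0, 0.4, 0.2, 5.0],
    [2.0, 0.5, 1.5, 0.1],
]
weights = causal_attention_weights(scores)
for row in weights:
    print([round(w, 3) for w in row])
```

As in the real table, the first row is forced to put all of its weight on the first token, and every row sums to 1.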


In [14]:
tokens = tokenizer.convert_ids_to_tokens(encoded_phrase['input_ids'][0])
model_view(response.attentions, tokens)

<IPython.core.display.Javascript object>

In [None]:
response.hidden_states[-1].shape

torch.Size([1, 13, 768])

In [None]:
response.logits.shape

torch.Size([1, 13, 50257])

In [None]:
states = torch.nn.functional.softmax(response.logits, dim=2)
top_ids = states.argmax(2)[0]  # most probable next-token id at each position
pd.DataFrame(
    zip(
        tokens,
        tokenizer.convert_ids_to_tokens(top_ids),
        [f"{states[0][i][j.item()].item():.2%}" for i, j in enumerate(top_ids)],
    ),
    columns=['Sequence up until', 'Next token with highest probability', 'Probability']
)

,Sequence up until,Next token with highest probability,Probability
0,My,Ċ,1.32%
1,Ġfriend,",",13.32%
2,Ġwas,Ġa,4.96%
3,Ġright,.,23.58%
4,Ġabout,Ġthat,20.61%
5,Ġthis,.,22.32%
6,Ġbook,.,27.37%
7,.,ĠI,19.27%
8,ĠIt,'s,37.43%
9,Ġis,Ġa,17.66%


In [None]:
print(generator(phrase, max_length=20, num_return_sequences=1, do_sample=False))  # greedy search
print(generator(phrase, max_length=20, num_return_sequences=1, do_sample=False))  # greedy search
print(generator(phrase, max_length=20, num_return_sequences=1, do_sample=False))  # greedy search



[{'generated_text': 'My friend was right about this book. It is really funny. I think it is a great book'}]




[{'generated_text': 'My friend was right about this book. It is really funny. I think it is a great book'}]
[{'generated_text': 'My friend was right about this book. It is really funny. I think it is a great book'}]


In [18]:
print(generator(phrase, max_length=20, num_return_sequences=2, do_sample=True, num_beams=10))  # beam-search multinomial sampling (do_sample=True with num_beams=10)



[{'generated_text': "My friend was right about this book. It is really funny. It's funny because I don't"}, {'generated_text': "My friend was right about this book. It is really funny. I'm not sure what it is"}]


In [None]:
generator(phrase, max_length=20, num_return_sequences=3, do_sample=True)  # sampling



[{'generated_text': 'My friend was right about this book. It is really funny. It is the first person you have'},
 {'generated_text': 'My friend was right about this book. It is really funny. He wrote these characters. He says'},
 {'generated_text': "My friend was right about this book. It is really funny. That's what this book is about"}]
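
The cells above contrast greedy search (`do_sample=False`), which returned the same continuation three times, with sampling (`do_sample=True`), which returned three different ones. The toy sketch below shows the difference for a single decoding step. The distribution is made up for illustration and is not taken from the model; `greedy` and `sample` are hypothetical helpers.

```python
import random

# Made-up next-token probabilities for one step (they need not sum to 1 for random.choices).
next_token_probs = {"ĠI": 0.19, "ĠIt": 0.15, "ĠThe": 0.10, "ĠHe": 0.06, "ĠThat": 0.05}

def greedy(probs):
    """Always pick the single most probable token."""
    return max(probs, key=probs.get)

def sample(probs, rng):
    """Draw one token at random, weighted by its probability."""
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]

rng = random.Random(42)  # fixed seed, in the same spirit as set_seed(42) in this notebook
greedy_picks = [greedy(next_token_probs) for _ in range(3)]
sampled_picks = [sample(next_token_probs, rng) for _ in range(20)]
print(greedy_picks)
print(sampled_picks)
```

Greedy decoding is deterministic, while sampling trades that repeatability for variety. Neither checks facts, which is why the question prompts that follow get fluent but unreliable continuations.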

In [None]:
generator("What is the largest city in Mexico?", max_length=25, num_return_sequences=5, do_sample=True)  # sampling



[{'generated_text': 'What is the largest city in Mexico?\n\nI went there two months ago but got lucky.\n\nI was really'},
 {'generated_text': "What is the largest city in Mexico? It's located on the border, in the state of Michoacán, about"},
 {'generated_text': 'What is the largest city in Mexico?\n\nThe big question is whether to ask it, especially given their unique geography."'},
 {'generated_text': 'What is the largest city in Mexico?\n\nAccording to the Mexican city government (Namco Nacional, or'},
 {'generated_text': 'What is the largest city in Mexico?\n\nMexico City is the fourth largest city in Europe and has nearly the highest per'}]

In [None]:
generator("Is the earth flat?", max_length=25, num_return_sequences=5, do_sample=True)  # sampling



[{'generated_text': "Is the earth flat? That's not correct. There is no such thing as a flat planet, and it's not completely"},
 {'generated_text': "Is the earth flat? What about Mars? Are these the results of a'slightly more complex' test? It sounds"},
 {'generated_text': 'Is the earth flat? Well, a more complete description can be found in Chapter 13. What exactly is a geocentric'},
 {'generated_text': "Is the earth flat? Because of this, it seems that there isn't much difference between the two: the Earth is not"},
 {'generated_text': 'Is the earth flat? There are not known scientific theories to back that up. In an earlier paper, I wrote that the'}]

In [None]:
generator("Who was Albert Einstein?", max_length=25, num_return_sequences=5, do_sample=True)  # sampling



[{'generated_text': 'Who was Albert Einstein? Why, then, are the two different books so connected? How could one have come across a single'},
 {'generated_text': "Who was Albert Einstein? Did it matter.\n\nThe question comes up again when you learn the details about Einstein's ideas"},
 {'generated_text': 'Who was Albert Einstein? No, really not."\n\nHe had just asked for two things in his life. He wanted'},
 {'generated_text': 'Who was Albert Einstein? Oh man – that\'s a very tough question."\n\nThe answer is almost certainly no. The'},
 {'generated_text': 'Who was Albert Einstein? He passed away before his birthday on May 26th 1933, a few weeks after Einstein walked into the'}]