# T-725 Natural Language Processing: Lab 5
In today's lab, we will be working with neural networks, using GRUs and Transformers for text generation.

To begin with, do the following:
* Select `"File" > "Save a copy in Drive"` to create a local copy of this notebook that you can edit.
* **Select `"Runtime" > "Change runtime type"`, and make sure that you have "Hardware accelerator" set to "GPU"**
* Select `"Runtime" > "Run all"` to run the code in this notebook.

In [1]:
import os

# Suppress some warnings from TensorFlow about deprecated functions
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

## Generating text with neural networks
Let's create a neural language model and use it to generate some text. This time, we will use character embeddings rather than word embeddings. They are created in exactly the same way, and are often used together in neural network-based models. One benefit of using character embeddings is that we can generate words that our model has never seen before.

The model takes as input a sequence of characters and predicts which character is most likely to follow. We will generate text by repeatedly predicting and appending the next character to a string. First, however, we need some text to train it on.


In [2]:
# Based on the following tutorial:
# https://www.tensorflow.org/tutorials/text/text_generation

import tensorflow as tf
import numpy as np

# Let's download some text by Shakespeare to train our model
url = 'https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt'
path_to_file = tf.keras.utils.get_file('shakespeare.txt', url)

with open(path_to_file, encoding='utf-8') as f:
  shakespeare = f.read()

print("First 250 characters:")
print(shakespeare[:250])

print ("Length of text: {:,} characters".format(len(shakespeare)))

Downloading data from https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt
First 250 characters:
First Citizen:
Before we proceed any further, hear me speak.

All:
Speak, speak.

First Citizen:
You are all resolved rather to die than to famish?

All:
Resolved. resolved.

First Citizen:
First, you know Caius Marcius is chief enemy to the people.

Length of text: 1,115,394 characters


Now we can create training examples for our model. Each example will be a pair of strings: one input string containing 100 characters, and a target string that is one character ahead. For example, the first pair we create is:

**Input string**:  `'First Citizen:\nBefore we proceed any further, hear me speak.\n\nAll:\nSpeak, speak.\n\nFirst Citizen:\nYou'`

**Target string**: `'irst Citizen:\nBefore we proceed any further, hear me speak.\n\nAll:\nSpeak, speak.\n\nFirst Citizen:\nYou '`

However, before we can start training, we need to convert our text into a list of integers, where each integer represents a different character. For example, "First Citizen" becomes:

```
Character:   F   i   r   s   t      C   i   t   i   z   e   n
Integer:   [18, 47, 56, 57, 58, 1, 15, 47, 58, 47, 64, 43, 52]
```

In [3]:
BATCH_SIZE = 64  # Batch size
BUFFER_SIZE = 10000  # Buffer size to shuffle the dataset

def split_input_target(chunk):
  # Create (input_string, output_string) pairs
  input_text = chunk[:-1]
  target_text = chunk[1:]
  return input_text, target_text

def prepare_text(text):
  # The unique characters in the file
  vocab = sorted(set(text))
  print ('{} unique characters'.format(len(vocab)))

  # Creating a mapping from unique characters to indices
  char_map = {
      'char_to_index': {char: index for index, char in enumerate(vocab)},
      'index_to_char': np.array(vocab)
  }

  text_as_int = np.array([char_map['char_to_index'][c] for c in text])

  # The maximum length sentence we want for a single input in characters
  seq_length = 100
  examples_per_epoch = len(text) // (seq_length+1)

  # Create training examples / targets
  char_dataset = tf.data.Dataset.from_tensor_slices(text_as_int)
  sequences = char_dataset.batch(seq_length + 1, drop_remainder=True)
  dataset = sequences.map(split_input_target)

  # (TF data is designed to work with possibly infinite sequences,
  # so it doesn't attempt to shuffle the entire sequence in memory. Instead,
  # it maintains a buffer in which it shuffles elements).
  dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)

  return dataset, vocab, examples_per_epoch, char_map

Now we can create and train the neural network.

In [4]:
import os

def loss(labels, logits):
  return tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)


def build_model(vocab_size, embedding_dim, rnn_units, batch_size):
  model = tf.keras.Sequential([
      tf.keras.layers.Embedding(vocab_size,
                                embedding_dim,
                                batch_input_shape=[batch_size, None]),
      tf.keras.layers.GRU(rnn_units,
                          return_sequences=True,
                          recurrent_initializer='glorot_uniform',
                          stateful=True),
      tf.keras.layers.Dense(vocab_size)
  ])

  return model


def create_model(text, epochs=3, embedding_dim = 256, rnn_units = 1024):
  dataset, vocab, examples_per_epoch, char_map = prepare_text(text)

  vocab_size = len(vocab)  # Length of the vocabulary in chars

  model = build_model(vocab_size, embedding_dim, rnn_units, BATCH_SIZE)

  # Compile the model
  model.compile(optimizer='adam', loss=loss)

  # Create checkpoints once the model has been trained
  checkpoint_dir = './training_checkpoints'
  checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")
  checkpoint_callback=tf.keras.callbacks.ModelCheckpoint(
      filepath=checkpoint_prefix,
      save_weights_only=True)

  # Train the model
  history = model.fit(
      dataset,
      epochs=epochs,
      callbacks=[checkpoint_callback])

  tf.train.latest_checkpoint(checkpoint_dir)
  model = build_model(vocab_size, embedding_dim, rnn_units, batch_size=1)
  model.load_weights(tf.train.latest_checkpoint(checkpoint_dir))
  model.build(tf.TensorShape([1, None]))

  return model, char_map

In [5]:
shake_model, shake_chars = create_model(shakespeare)

65 unique characters
Epoch 1/3
Epoch 2/3
Epoch 3/3


Now that we've trained our model, we can finally use it to generate some text. The following function takes a model and a string as input, and continually predicts and appends the next character to the string until it becomes 1,000 characters long.

In [6]:
def generate_text(model, char_map, start_string, temperature=1.0):
  # Evaluation step (generating text using the learned model)
  # Low temperatures results in more predictable text.
  # Higher temperatures results in more surprising text.
  if not start_string:
    print("start_string can't be empty")
    return ""

  # Number of characters to generate
  num_generate = 1000

  # Converting our start string to numbers (vectorizing)
  input_eval = [char_map['char_to_index'][s] for s in start_string]
  input_eval = tf.expand_dims(input_eval, 0)

  # Empty string to store our results
  text_generated = []

  # Here batch size == 1
  model.reset_states()
  for i in range(num_generate):
      predictions = model(input_eval)
      # remove the batch dimension
      predictions = tf.squeeze(predictions, 0)

      # using a categorical distribution to predict the character returned by the model
      predictions = predictions / temperature
      predicted_id = tf.random.categorical(predictions, num_samples=1)[-1,0].numpy()

      # We pass the predicted word as the next input to the model
      # along with the previous hidden state
      input_eval = tf.expand_dims([predicted_id], 0)

      text_generated.append(char_map['index_to_char'][predicted_id])

  return (start_string + ''.join(text_generated))

Let's generate some text!

In [7]:
print(generate_text(shake_model, shake_chars, "ROMEO: ", temperature=1.0))

ROMEO: Good consiour aris bruther,
What it the lawe?

ROMEO:
Lordo't sir; we'll chence the prace fase hither: come that,
Be mather date her must shall bede:
This such yids he wish the parr cas

QUEEN MARGARET:
Whese weld you our subjecthes.
Rignd, of lost makes the widland of the marry
Than chargions father, came'bl, sir?

KING Juhing thou art?

HENRY Aurue.

ProvoRK:
I that kind, what am us ruch?

LEONTES:
To list held that not of bid yet shee dean,
Or Verstame be pity of his blowerage.

QUEEN:
This riseangs me; faurt pleazer how he dase hear,
A peason! what all accour'd.
We prayers before, eye it masid let-med-maks and cheezen I Am greater:
Say, alings he let upon mistroys wife
Till eafed their other liegely.

Luss:
Moch did cunmand it thou, if the riege of you's bleade his hand;
The crown, I befter it wranch, or e begs; he sean hor all.

CORIOLANUS:
If hask the lawnt's handys he what to ROVIF Mirdanat:
Is indwed, I bust my shulless upon.
Mally: the patture your graitous?

Thind Sira

# Assignment
Answer the following questions and hand in your solution in Canvas before 8:30 on Monday morning, October 2nd. Remember to save your file before uploading it.

## Question 1
The `temperature` parameter of `generate_text()`, defined earlier in the notebook, controls how predictable the generated text will be. The lower the temperature, the more the function will tend to append the most likely character (according to the model's prediction). A higher temperature introduces some randomness, leading to more unpredictable text.

The text we generated above used a temperature of 1.0. Try generating more text using the Shakespeare model, once using a temperature of 0.2 and again using a temperature of 0.8.

In [8]:
# Your solution here
print("Text generated with temperature 0.2:\n", generate_text(shake_model, shake_chars, "ROMEO: ", temperature=0.2))

print("Text generated with temperature 0.8:\n", generate_text(shake_model, shake_chars, "ROMEO: ", temperature=0.8))

Text generated with temperature 0.2:
 ROMEO: I have stay the courters of his heart his bearth the cause of his heart.

KING RICHARD III:
The shall be come to her his prince and the properts of his heart.

KING RICHARD III:
The surse the court of his hath be consule.

KING RICHARD III:
The courtes of the courtel of the stand of his hand.

KING RICHARD III:
The cause you shall be speak the surse the prown and the courtes of the courtes of the parton of his heart.

KING RICHARD II:
The now is the courtes of the courters,
The court he had the cause of his procest of the present of the partent of the courters of the surjess,
And shall be consent the prother with the propert of the courters of the provess of the subject of his heart her stands of the parton and a cause of his hath be speak.

KING RICHARD III:
What is the rest of the seath of his heart.

KING RICHARD III:
The surse the courtes of the cause of heart shall be so read the partion.

QUEEN ELIZABETH:
The shall be consting of the c

## Question 2
NLTK's `names` corpus contains a list of approximately 8,000 English names. Train a new model on `names_raw` for at least 20 epochs using the `create_model(text, epochs=n)` function defined earlier. Use the trained model to generate a list of names (with the `generate_text` function defined earlier), starting with your own first name. Your name should not contain any non-English characters, and should end with an `\n`.

Print out the names that do not appear in the training data. Do you get any actual names (or at least names that sound plausible)?

In [9]:
# Don't modify this code cell
import nltk
from nltk.corpus import names
nltk.download('names')

# Print out a few examples
names_raw = names.raw()
names_unique = set(names_raw.split())
names_raw = "\n".join(names_unique)
print(names_raw.splitlines()[:5])

['Gavrielle', 'Georgine', 'Whitaker', 'Florry', 'Janka']


[nltk_data] Downloading package names to /root/nltk_data...
[nltk_data]   Unzipping corpora/names.zip.


In [10]:
names_raw_model, names_raw_chars = create_model(names_raw, epochs=20)
response = generate_text(names_raw_model, names_raw_chars, "Francesco\n").split()

55 unique characters
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


In [11]:
print(response)

['Francesco', 'Hossia', 'Alanena', 'Dan', 'Cobber', 'Wence', 'Soreqellode', 'Aulinia', 'Cdonis', 'Reos', 'Erdi', 'Harna', 'Aldine', 'Deouza', 'Melpoy', 'Nozie', 'Amleite', 'Shbiriet', 'Hilasmara', 'Annayna', 'Loba', 'Daincce', 'Moldie', 'Roomie', 'Heryrce', 'Dormy', 'New', 'Caulen', 'Kale', 'Shfonn', 'Morrito', 'Anesol', 'Athelen', 'Hargergiettha', 'Brncel', 'Roneand', 'Lence', 'Folrie', 'Aeni', 'Clevorgu', 'Arson', 'Tanerele', 'Poavie', 'Chovie', 'Stotte', 'Leona', 'Melia', 'Karitpe', 'Daronio', 'Emeula', 'Shandeli', 'Gamlie', 'Isodde', 'Danert', 'Chevon', 'Brann', 'Adod', 'Batri', 'Wirgielle', 'Amor', 'Nolona', 'Roneb', 'Roleette', 'Jundoh', 'Kaventey', 'Dannacia', 'Wirollin', 'Selbenti', 'Beblie', 'Adilfie', 'Heodry', 'Areatha', 'Nettta', 'Wilyann', 'Tely', 'Sefally', 'Otep', 'Anisa', 'Miph', 'Va', 'Sabollen', 'Doida', 'Maria', 'LosSy', 'Pillin', 'Hanchelda', 'Heiter', 'Erd', 'Sheora', 'Carsh', 'Rengmia', 'Jitha', 'Icia', 'Kele', 'Martora', 'Ien', 'Dory', 'Canno', 'Rofelane', 'Jona'

In [12]:

not_in_training_dataset = list(set(response) - set(names_raw))
print(not_in_training_dataset)

['Clevorgu', 'Dory', 'Cheonie', 'Icia', 'Doida', 'Pillin', 'Cobber', 'Oivy', 'Nettta', 'Chevon', 'Amleite', 'Wirollin', 'New', 'Stuisa', 'Gellera', 'Dormy', 'Theulann', 'Lence', 'Bany', 'Danert', 'Kale', 'Kalifu', 'Anesol', 'Soreqellode', 'Aulinia', 'Moldie', 'Aldine', 'Batri', 'Loba', 'Athelen', 'Kele', 'Nozie', 'Annayna', 'Fobry', 'Jitha', 'Rose', 'Rugoane', 'Roneb', 'Jodriqa', 'Saceyn', 'Isodde', 'Caulen', 'Joph', 'Killle', 'Adod', 'Maria', 'Maucoh', 'Deouza', 'Sefally', 'Mathissa', 'Rengmia', 'Roenie', 'Pharlus', 'Lilbath', 'Selbenti', 'Rivteyn', 'Sulle', 'Gamlie', 'Wirgielle', 'Brann', 'Shuiss', 'Leona', 'Heryrce', 'Dannacia', 'Wiedd', 'Areatha', 'Morrito', 'Beblie', 'Va', 'Ibella', 'Folrie', 'Miph', 'Jundoh', 'Roneand', 'Kyen', 'Kaventey', 'Hilasmara', 'Roleette', 'Roomie', 'Namya', 'Wence', 'Karitpe', 'Shfonn', 'Dary', 'Emeula', 'Hanchelda', 'Harna', 'Cdonis', 'Denzo', 'Reos', 'Alanena', 'Adilfie', 'Leephadina', 'Anere', 'Sabollen', 'Chovie', 'Jofili', 'Grivyna', 'Canno', 'Tey',

From the list above, the only names in list that I think are plausible are:
 - my name (Altrough is a quite common name in Italy, it wasn't in the list)
 - 'Marika'
 - 'Cam'

All the other do not seem like human-ish, I would use some specific ones for Role Playing Games (like DnD):
 - 'Zeana'
 - 'Nalret'
 - 'Ammonoel'
 - 'Layle' (Directly from Final Fantasy)
 - 'Merylle'
 - 'Lofia'
 - 'Floray'

##Question 3
The size of the model can make a difference when it comes to performance. Create a new model that has twice the number of hidden units as the previous model and double the size of the embeddings. How does the performance change? What happens if you decrease these parameters?

In [13]:
names_raw_model_twice, names_raw_chars_twice = create_model(names_raw, epochs=20, embedding_dim = 512, rnn_units = 2048)
names_raw_model_half, names_raw_chars_half = create_model(names_raw, epochs=20, embedding_dim = 256, rnn_units = 512)

55 unique characters
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20
55 unique characters
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


In [14]:
# To assess the performances I'll generate some names and use the results to compare the two models

result_twice = generate_text(names_raw_model_twice, names_raw_chars_twice, "Francesco\n").split()
result_half = generate_text(names_raw_model_half, names_raw_chars_half, "Francesco\n").split()

print("Result Twice:\n", result_twice)
print("Result Half: \n", result_half)

Result Twice:
 ['Francesco', 'Agsellene', 'Jelfoenaloonna', 'Tromiak', 'Sastanda', 'Beabiesaana', 'Ann-Natse', 'Eleta', 'Uli', 'Sarena', 'Rilbeoe', 'Eswina', 'Esed', 'Decof', 'Warcie', 'Jona', 'Elybbe', 'Margie', 'Discita', 'Stammay', 'Danera', 'Odwy', 'Eutttoue', 'Margurttte', 'Simarorlond', 'Ellelley', 'Kaphwele', 'Sabod', 'Cadillen', 'Staree', 'Barse', 'Marer', 'Keine', 'Rathiathy', 'Gwannco', 'Amatheri', 'Adrina', 'Naildan', 'Sormy', 'Gorgabere', 'Merdelle', 'Mavie', 'Marely', 'Borons', 'Daeti', 'Istin', 'Clinally', 'Gurimande', 'Myscke', 'Naiay', 'Londendo', 'Skieto', 'Alb', 'Brion', 'Mollas', 'Evaila', 'Mabraree', 'Hled', 'Ally', 'Marli', 'Rokeli', 'Lianea', 'Lyneth', 'Witthye', 'Leavin', 'Toberance', 'Fineter', 'Ilva', 'Dallly', 'Corny', 'Malmole', 'Gadetta', 'Barter', 'Collor', 'Mandie', 'Clairlia', 'Rigig', 'Jeecie', 'Bewenn', 'Marlar', 'Dan', 'Demaic', 'Tanna', 'Kery', 'Elite', 'Vackean', 'Morgakd', 'Canali', 'Jaghadge', 'Wararac', 'Jarde', 'Katy', 'Doly', 'Roria', 'Livet', '

The output from the model trained with twice the embeddings and units is more realistic, the generated names are more human-like and usable.
The difference in the provided output shows that the model trained with twice the parameters performs better compared to the default model and to the model trained with half the parameters.
Therefore I assume that a higher amount of parameters can help the model in better understanding the structure of names and how to generate them.

## Question 4
Transformer large language models can also generate text. The following code imports a pretrained GPT-2 model from Huggingface's Transformer library. This model can then be used directly to generate text, given a prompt as context. Alter the prompt to have the transformer model (GPT-2) generate an engaging story beginning using one of the following story starters:


*   It was the day the moon fell.
*   Am I in heaven?  What happened to me?
*   Wandering through the graveyard it felt like something was watching me.
*   Three of us.  We were the only ones left, the only ones to make it to the island.

There are several different methods to choose from to generate the text (as seen in the commented out lines below). Try out the different methods and play with the parameters. This [blogpost](https://huggingface.co/blog/how-to-generate) explains their differences.

Which method has the best performance?

Can GPT-2 generate Shakespere?

In [15]:
# Uncomment if transformers is not installed
!pip install transformers

Collecting transformers
  Downloading transformers-4.33.3-py3-none-any.whl (7.6 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m7.6/7.6 MB[0m [31m21.1 MB/s[0m eta [36m0:00:00[0m
Collecting huggingface-hub<1.0,>=0.15.1 (from transformers)
  Downloading huggingface_hub-0.17.3-py3-none-any.whl (295 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m295.0/295.0 kB[0m [31m30.1 MB/s[0m eta [36m0:00:00[0m
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1 (from transformers)
  Downloading tokenizers-0.13.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.8 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m7.8/7.8 MB[0m [31m55.3 MB/s[0m eta [36m0:00:00[0m
[?25hCollecting safetensors>=0.3.1 (from transformers)
  Downloading safetensors-0.3.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.3/1.3 MB[0m [31m59.0 MB/s[0m eta [36m0:00:0

In [16]:
# Do not modify this code
# https://huggingface.co/docs/transformers/main_classes/text_generation

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")

model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Today I believe we can finally"

input_ids = tokenizer(prompt, return_tensors="pt").input_ids

outputs = model.generate(input_ids, max_length=100) # Greedy search
#outputs = model.generate(input_ids, max_length=100, num_beams=5, no_repeat_ngram_size=3, early_stopping=True) # Beam search
#outputs = model.generate(input_ids, do_sample=True, max_length=100, top_k=0, temperature=0.7) # Sampling
#outputs = model.generate(input_ids, do_sample=True, max_length=100, top_k=50) # Top-k
#outputs = model.generate(input_ids, do_sample=True, max_length=100, top_k=50, top_p=0.92) # Top-p

tokenizer.batch_decode(outputs, skip_special_tokens=True)

Downloading (…)lve/main/config.json:   0%|          | 0.00/665 [00:00<?, ?B/s]

Downloading (…)olve/main/vocab.json:   0%|          | 0.00/1.04M [00:00<?, ?B/s]

Downloading (…)olve/main/merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

Downloading (…)/main/tokenizer.json:   0%|          | 0.00/1.36M [00:00<?, ?B/s]

Downloading model.safetensors:   0%|          | 0.00/548M [00:00<?, ?B/s]

Downloading (…)neration_config.json:   0%|          | 0.00/124 [00:00<?, ?B/s]

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


['Today I believe we can finally get to the point where we can make a difference in the lives of the people of the United States of America.\n\nI believe that we can make a difference in the lives of the people of the United States of America.\n\nI believe that we can make a difference in the lives of the people of the United States of America.\n\nI believe that we can make a difference in the lives of the people of the United States of America.\n\n']

In [17]:
def test_gpt2(prompt: str, max_length=100):
  input_ids = tokenizer(prompt, return_tensors="pt").input_ids

  outputs = model.generate(input_ids, max_length=100) # Greedy search
  print(tokenizer.batch_decode(outputs, skip_special_tokens=True))

  print('\n')

  outputs = model.generate(input_ids, max_length=max_length, num_beams=5, no_repeat_ngram_size=3, early_stopping=True) # Beam search
  print(tokenizer.batch_decode(outputs, skip_special_tokens=True))

  print('\n')

  outputs = model.generate(input_ids, do_sample=True, max_length=max_length, top_k=0, temperature=0.7) # Sampling
  print(tokenizer.batch_decode(outputs, skip_special_tokens=True))

  print('\n')

  outputs = model.generate(input_ids, do_sample=True, max_length=max_length, top_k=50) # Top-k
  print(tokenizer.batch_decode(outputs, skip_special_tokens=True))

  print('\n')

  outputs = model.generate(input_ids, do_sample=True, max_length=max_length, top_k=50, top_p=0.92) # Top-p
  print(tokenizer.batch_decode(outputs, skip_special_tokens=True))

In [18]:
prompt = "It was the day the moon fell."

test_gpt2(prompt)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


['It was the day the moon fell.\n\n"I was in the middle of the night, and I saw the moon rise and fall," he said. "I was in the middle of the night, and I saw the moon rise and fall."\n\nHe said he was in the middle of the night, and he saw the moon rise and fall.\n\n"I was in the middle of the night, and I saw the moon rise and fall," he said. "I was']




The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


['It was the day the moon fell.\n\n"It was a beautiful day," she said. "It was beautiful. It was beautiful."']




The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


['It was the day the moon fell. A single meteor hit the sky and it\'s the only way to get around a planet like this. It\'s like a dream come true.\n\n"Is that so?"\n\n"I\'m not sure. I\'ve never seen a meteor as big as this."\n\n"You think you can see a meteor?"\n\n"I\'m not sure. I\'ve never seen a meteor as big as this."\n\n- C4 -\n\n']




The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


["It was the day the moon fell. I could hear the moon's cry as her eyes opened. I could see the moon light with my naked eye. I saw her with one hand above her with the other (I also could see her at a distance, just as she must have been when she first found me on this planet). I could say that she seemed to be in love with me. That would have been hard for any man that did not have the pleasure of looking after himself for methyl"]


["It was the day the moon fell.\n\nAnd now I was the same. I was one of five children to survive. And I remember thinking of those who were my cousins who'd lost their families and grandparents, and who didn't like that I looked down on them and thought 'oh, they're going to do a fine job,' because I knew they would do it again.'\n\nIt is hard to believe that my grandfather, my grandmother, my great-grandfather and my great"]


In [19]:
# Test Shakespere
prompt = shakespeare[:100]

test_gpt2(prompt, max_length=1000)

prompt = shakespeare[:200]
test_gpt2(prompt, max_length=1000)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


['First Citizen:\nBefore we proceed any further, hear me speak.\n\nAll:\nSpeak, speak.\n\nFirst Citizen:\nYou are not a citizen.\n\nAll:\n\nSpeak, speak.\n\nFirst Citizen:\n\nYou are not a citizen.\n\nAll:\n\nSpeak, speak.\n\nFirst Citizen:\n\nYou are not a citizen.\n\nAll:\n\nSpeak, speak.\n\nFirst Citizen:']




The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


['First Citizen:\nBefore we proceed any further, hear me speak.\n\nAll:\nSpeak, speak.\n\nFirst Citizen:\nYou know what I\'m talking about. You know what it\'s like to be a first-class citizen in the United States of America. And you know what that means to me. I mean, you know, I\'ve been in this country for a long time, and I\'ve never seen anything like this before. And I think it\'s important for us to be able to say, "We\'re here to help you, and we\'re here for you." And that\'s what we\'re going to do. And we\'re not going to stop until we\'re able to do that. And it\'s going to take a lot of hard work, and it\'s gonna take time, but we\'re gonna do it. We\'re gonna make sure that we\'re doing everything we can to make this country a better place for all of us, and that\'s why I\'m here today. I want to thank all of you for being here today, and thank you all for your support. And thank you to all of the people who have been here for so long, and all of those who have worked so 

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


["First Citizen:\nBefore we proceed any further, hear me speak.\n\nAll:\nSpeak, speak.\n\nFirst Citizen:\nYou, out of the village, can tell me the list of things you want to know.\n\n(T-Shit, you're kidding!)\n\nFirst Citizen:\n\nI know.\n\nFirst Citizen:\n\nSo I can tell you why I feel like we're having a bad time.\n\n(T-Shit, you're kidding!)\n\nFirst Citizen:\n\nI know.\n\n param = paramAll(first_corpses, expression, paramAll(first_corpses, expression, paramAll(first_corpses, expression, paramAll(first_corpses, expression, paramAll(first_corpses, expression, paramAll(first_corpses, expression, paramAll(first_corpses, expression, paramAll(first_corpses, expression, paramAll(first_corpses, expression, paramAll(first_corpses, expression, paramAll(first_corpses, expression, paramAll(first_corpsBehavior) ) ) ) ) ) ) ) )\n\nFirst Citizen:\n\nI know.\n\nFirst Citizen:\n\nSo now we're out of here.\n\nFirst Citizen:\n\nI know.\n\nFirst Citizen:\n\nBy the way, it seems you're having so little

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


['First Citizen:\nBefore we proceed any further, hear me speak.\n\nAll:\nSpeak, speak.\n\nFirst Citizen:\nYou are very good. I will look at your suit and take it to the station!\n\nFirst Citizen:\n\nWhat you need is the armor; if you are weak enough you will take care of him.\n\nFirst Citizen:\n\nNo need. I am going."\n\nFirst Citizen:\n\nI will take your suit and send it to the station.\n\nFirst Citizen:\n\nI will wait upon you.\n\nNext, we come to the station.\n\nFirst Citizen:\n\nI am here for you. First Citizen:\n\nYou see the armor that you were given. Well, how long have you had to wear it before you came here.\n\nFirst Citizen:\n\nI shall have the armor until then.\n\nThe second part of the letter is.\n\nFirst Citizen:\n\nYou are not only an outlaw fighter, but your mother has been telling you for fifteen years. I will not give you any trouble, but I am sure I will do so as often as possible. When I find something, I will give it to you.\n\nIn this case they ask you.\n\nSecond P

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


["First Citizen:\nBefore we proceed any further, hear me speak.\n\nAll:\nSpeak, speak.\n\nFirst Citizen:\nYou are the commander. The man who will take this task, who will bring the first-born on board and make sure his species is secure and a sanctuary for the rest of history.\n\nSpeak:\n\nYou are our first-born. The first-born to reach the planet's surface, and the first-born to live on another world.\n\nYou:\n\nMy planet was built to be a space colony. It was constructed on top of the stars, in their own solar system. But we knew of nothing, not even a tiny planet, the only one to have been touched by space travel in its history.\n\nBefore we proceed, hear me speak.\n\nAll:\n\nYou are the commander. The man who will take this task, who will bring the first-born on board and make sure his species is secure and a sanctuary for the rest of history.\n\nSpeak:\n\nYou are our first-born. The first-born to reach the planet's surface, and the first-born to live on another world.\n\nYou:\n\nM

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


['First Citizen:\nBefore we proceed any further, hear me speak.\n\nAll:\nSpeak, speak.\n\nFirst Citizen:\nYou are all resolved rather to die than to famish?\n\nAll:\nResolved. resolved.\n\nFirst Citizen:\nFirst, you are all resolved rather to die than to famish?\n\nAll:\n\nResolved. resolved.\n\nFirst Citizen:\n\nYou are all resolved rather to die than to fam']




The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


["First Citizen:\nBefore we proceed any further, hear me speak.\n\nAll:\nSpeak, speak.\n\nFirst Citizen:\nYou are all resolved rather to die than to famish?\n\nAll:\nResolved. resolved.\n\nFirst Citizen:\nFirst, you are the only one who is willing to die. You are the one who has the courage to stand up for what you believe in. First, you have the strength to fight for what is right. You have the courage not to be afraid of what is wrong. You know that you are not alone, and you know that there are others who are willing to do the right thing. You do not fear death, but you do fear that you will not be able to do it. You fear that if you die, you will be unable to do what you want to do, because you are afraid of death, and because you fear that your life will be taken away from you if you do not stand up to the will of the people. You don't want to die because you don't know what to do with your life. You want to live, and that is what you have to live for. You live for the people, and

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


['First Citizen:\nBefore we proceed any further, hear me speak.\n\nAll:\nSpeak, speak.\n\nFirst Citizen:\nYou are all resolved rather to die than to famish?\n\nAll:\nResolved. resolved.\n\nFirst Citizen:\nFirst, you are right, my friend.\n\nAll:\n\nLet us not suffer the death of our father, your brother, your friend, if we shall suffer it.\n\nFirst Citizen:\n\nMay your death be a blessing, my friend?\n\nAll:\n\nLet us not suffer the death of your family, your friends, your brother, your sister, your mother, your father, your brother, your mother, your sister, your sister, your sister, your brother, your sister, your sister, your brother, your sister, your sister, your sister, your sister, your brother, your brother, your brother, your brother, your brother, your brother, your brother, your brother, your brother, your brother, your brother, your brother, your brother, your brother, your brother, your brother, your brother, your brother, your brother, your brother, your brother, your bro

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


['First Citizen:\nBefore we proceed any further, hear me speak.\n\nAll:\nSpeak, speak.\n\nFirst Citizen:\nYou are all resolved rather to die than to famish?\n\nAll:\nResolved. resolved.\n\nFirst Citizen:\nFirst, you are all resolved to live in peace with one another.\n\nFirst Citizen:\n\nBut when do you think she will, after her first, in another day.\n\nFirst Citizen:\n\nI dare not tell.\n\nFirst Citizen:\n\nYou did the things you did in vain.\n\nFirst Citizen:\n\nIt is not wise to say anything, except of some thing which is of great necessity.\n\nFirst Citizen:\n\nThere can be nothing good for me, except the cause of man.\n\nFirst Citizen:\n\nSo you thought I was an unworthy friend, and a fool.\n\nFirst Citizen:\n\nThat is no objection.\n\nFirst Citizen:\n\nIt is nothing if that man you talked to is not the same as you?\n\nFirst Citizen:\n\nYes.\n\nFirst Citizen:\n\nSo when will we both die together if she wants you?\n\nFirst Citizen:\n\nI am not your friend.\n\nFirst Citizen:\n\nHow