# 🥙 LSTM on Recipe Data

In this notebook, we'll walk through the steps required to train your own LSTM on the recipes dataset.

In [3]:
%load_ext autoreload
%autoreload 2

import numpy as np
import json
import re
import string

import tensorflow as tf
from tensorflow.keras import layers, models, callbacks, losses

%cd /home/clachris/Documents/projects/Generative_Deep_Learning_2nd_Edition/notebooks


/home/clachris/Documents/projects/Generative_Deep_Learning_2nd_Edition/notebooks


## 0. Parameters <a name="parameters"></a>

In [4]:
VOCAB_SIZE = 10000
MAX_LEN = 200
EMBEDDING_DIM = 100
N_UNITS = 128
VALIDATION_SPLIT = 0.2
SEED = 42
LOAD_MODEL = False
BATCH_SIZE = 32
EPOCHS = 25

## 1. Load the data <a name="load"></a>

In [5]:
# Load the full dataset
with open('/home/clachris/Documents/projects/Generative_Deep_Learning_2nd_Edition/data/full_format_recipes.json') as json_data:
    recipe_data = json.load(json_data)

In [6]:
# Filter the dataset
filtered_data = [
    "Recipe for " + x["title"] + " | " + " ".join(x["directions"])
    for x in recipe_data
    if "title" in x
    and x["title"] is not None
    and "directions" in x
    and x["directions"] is not None
]

In [7]:
# Count the recipes
n_recipes = len(filtered_data)
print(f"{n_recipes} recipes loaded")

20111 recipes loaded


In [8]:
example = filtered_data[9]
print(example)

Recipe for Ham Persillade with Mustard Potato Salad and Mashed Peas  | Chop enough parsley leaves to measure 1 tablespoon; reserve. Chop remaining leaves and stems and simmer with broth and garlic in a small saucepan, covered, 5 minutes. Meanwhile, sprinkle gelatin over water in a medium bowl and let soften 1 minute. Strain broth through a fine-mesh sieve into bowl with gelatin and stir to dissolve. Season with salt and pepper. Set bowl in an ice bath and cool to room temperature, stirring. Toss ham with reserved parsley and divide among jars. Pour gelatin on top and chill until set, at least 1 hour. Whisk together mayonnaise, mustard, vinegar, 1/4 teaspoon salt, and 1/4 teaspoon pepper in a large bowl. Stir in celery, cornichons, and potatoes. Pulse peas with marjoram, oil, 1/2 teaspoon pepper, and 1/4 teaspoon salt in a food processor to a coarse mash. Layer peas, then potato salad, over ham.


## 2. Tokenise the data

In [9]:
# Pad the punctuation, to treat them as separate 'words'
def pad_punctuation(s):
    s = re.sub(f"([{string.punctuation}])", r" \1 ", s)
    s = re.sub(" +", " ", s)
    return s

text_data = [pad_punctuation(x) for x in filtered_data]

In [10]:
# Display an example of a recipe
example_data = text_data[9]
example_data

'Recipe for Ham Persillade with Mustard Potato Salad and Mashed Peas | Chop enough parsley leaves to measure 1 tablespoon ; reserve . Chop remaining leaves and stems and simmer with broth and garlic in a small saucepan , covered , 5 minutes . Meanwhile , sprinkle gelatin over water in a medium bowl and let soften 1 minute . Strain broth through a fine - mesh sieve into bowl with gelatin and stir to dissolve . Season with salt and pepper . Set bowl in an ice bath and cool to room temperature , stirring . Toss ham with reserved parsley and divide among jars . Pour gelatin on top and chill until set , at least 1 hour . Whisk together mayonnaise , mustard , vinegar , 1 / 4 teaspoon salt , and 1 / 4 teaspoon pepper in a large bowl . Stir in celery , cornichons , and potatoes . Pulse peas with marjoram , oil , 1 / 2 teaspoon pepper , and 1 / 4 teaspoon salt in a food processor to a coarse mash . Layer peas , then potato salad , over ham . '

In [11]:
# Convert to a TensorFlow Dataset
text_ds = (
    tf.data.Dataset.from_tensor_slices(text_data)
    .batch(BATCH_SIZE)
    .shuffle(1000)
)


In [12]:
# Create a vectorisation layer
vectorize_layer = layers.TextVectorization(
    standardize="lower",
    max_tokens=VOCAB_SIZE,
    output_mode="int",
    output_sequence_length=MAX_LEN + 1,
)

In [44]:
# Adapt the layer to the training set
vectorize_layer.adapt(text_ds)
vocab = vectorize_layer.get_vocabulary()

In [14]:
# Display some token:word mappings
for i, word in enumerate(vocab[:10]):
    print(f"{i}: {word}")

0: 
1: [UNK]
2: .
3: ,
4: and
5: to
6: in
7: the
8: with
9: a


In [15]:
# Display the same example converted to ints
example_tokenised = vectorize_layer(example_data)
print(example_tokenised.numpy())

[  26   16  557    1    8  298  335  189    4 1054  494   27  332  228
  235  262    5  594   11  133   22  311    2  332   45  262    4  671
    4   70    8  171    4   81    6    9   65   80    3  121    3   59
   12    2  299    3   88  650   20   39    6    9   29   21    4   67
  529   11  164    2  320  171  102    9  374   13  643  306   25   21
    8  650    4   42    5  931    2   63    8   24    4   33    2  114
   21    6  178  181 1245    4   60    5  140  112    3   48    2  117
  557    8  285  235    4  200  292  980    2  107  650   28   72    4
  108   10  114    3   57  204   11  172    2   73  110  482    3  298
    3  190    3   11   23   32  142   24    3    4   11   23   32  142
   33    6    9   30   21    2   42    6  353    3 3224    3    4  150
    2  437  494    8 1281    3   37    3   11   23   15  142   33    3
    4   11   23   32  142   24    6    9  291  188    5    9  412  572
    2  230  494    3   46  335  189    3   20  557    2    0    0    0
    0 

## 3. Create the Training Set

In [28]:
# Create the training set of recipes and the same text shifted by one word
def prepare_inputs(text):
    text = tf.expand_dims(text, -1)
    tokenized_sentences = vectorize_layer(text)
    x = tokenized_sentences[:, :-1]
    y = tokenized_sentences[:, 1:]
    return x, y

train_ds = text_ds.map(prepare_inputs)
first_batch = next(iter(train_ds))
print(f'x: {first_batch[0]}')
print(f'y: {first_batch[1]}')

'''
So, we see that the x and y are just shifted by one!
'''

x: [[  26   16  247 ...    0    0    0]
 [  26   16 4163 ...    0    0    0]
 [  26   16  479 ...   96   22   40]
 ...
 [  26   16  264 ...    3  891   99]
 [  26   16  420 ...    0    0    0]
 [  26   16  187 ...    0    0    0]]
y: [[  16  247 1446 ...    0    0    0]
 [  16 4163  265 ...    0    0    0]
 [  16  479  109 ...   22   40    5]
 ...
 [  16  264  725 ...  891   99    7]
 [  16  420  272 ...    0    0    0]
 [  16  187 1336 ...    0    0    0]]


'\nSo, we see that the x and y are just shifted by one!\n'
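To make the shift concrete, here is a minimal sketch in plain Python using the first five token ids of the example recipe shown earlier. The helper name `shift_by_one` is hypothetical and is not part of the notebook; `prepare_inputs` does the same slicing on whole batches.

```python
def shift_by_one(tokens):
    """Split a token sequence into inputs and next-token targets."""
    x = tokens[:-1]  # everything except the last token
    y = tokens[1:]   # everything except the first token
    return x, y


# First five token ids of the example recipe shown earlier
tokens = [26, 16, 557, 1, 8]
x, y = shift_by_one(tokens)

# At each position the target is simply the token that follows the input
for inp, target in zip(x, y):
    print(f"input {inp:>4} -> target {target}")
```

Each input position is paired with the token that comes next, which is why the padded length is one longer than the sequence the model sees.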

## 4. Build the LSTM <a name="build"></a>

## Mathematical Overview of Recurrent Neural Networks

The *Recurrent Neural Network* (RNN) is one in which an output of the network is fed back in as input, hence the name *recurrent*. Values flow through the network and influence later outputs. This architecture makes sense for *sequential* tasks, that is, tasks that have a temporal or ordered structure. To predict the next word in a sentence, one needs to know what was said before. To translate, one must know the gender and/or plurality of a noun that appears before its adjective, and so on. The RNN handles this by passing an activation (also called the output or hidden state) from each iteration to the next. The subsequent iteration combines it with the input from the sequence at that timestep to produce the activation for the following iteration (and perhaps a prediction). In this way, information from earlier iterations can be passed down the sequence and influence later predictions.

In the vanilla RNN, given a data instance $\bf{x}$ of length $T$, each iteration performs the following operations on the $t^{th}$ element of $\bf{x}$:

$$a^{<t>}=g_1(W_{aa}a^{<t-1>}+W_{ax}{\bf{x}}^{<t>}+b_a)$$

$$y^{<t>}=g_2(W_{ya}a^{<t>}+b_y)$$

In the above, $W_{aa}$ are the weights associated with the activation of the previous iteration, $W_{ax}$ are the weights associated with the input from the sequence, $W_{ya}$ are the weights used to calculate the prediction, and $b_a$ and $b_y$ are the biases used to calculate the activation and the prediction, respectively. The $g$ functions are the activation functions of the *cell*. Note that this all happens within a single cell, though we often *unroll* it to show how each element passes through the cell along with the previous iteration's output. The two graphics below demonstrate this:

<div style='text-align: center;'>
    <img src='/home/clachris/Documents/projects/Generative_Deep_Learning_2nd_Edition/notebooks/Graphics/RNN_Cell.png' alt='RNN_Cell' width='500'>
</div>

In the above graphic, the next step would be to feed $a^{<t>}$, along with the next input ${\bf{x}}^{<t+1>}$, into the cell again. This looks like:

<div style='text-align: center;'>
    <img src='/home/clachris/Documents/projects/Generative_Deep_Learning_2nd_Edition/notebooks/Graphics/RNN_Unfurled.png' alt='RNN_Unfurled' width='500'>
</div>
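The recurrence above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: $g_1$ is taken to be tanh and $g_2$ softmax (the text does not fix either), and the sizes and random weights are hypothetical.

```python
import numpy as np


def rnn_step(a_prev, x_t, W_aa, W_ax, W_ya, b_a, b_y):
    """One vanilla RNN step, assuming g_1 = tanh and g_2 = softmax."""
    a_t = np.tanh(W_aa @ a_prev + W_ax @ x_t + b_a)
    logits = W_ya @ a_t + b_y
    exp = np.exp(logits - logits.max())  # subtract the max for numerical stability
    y_t = exp / exp.sum()
    return a_t, y_t


rng = np.random.default_rng(0)
n_a, n_x, n_y, T = 4, 3, 5, 6  # hypothetical sizes: hidden, input, output, sequence length
W_aa = rng.normal(size=(n_a, n_a))
W_ax = rng.normal(size=(n_a, n_x))
W_ya = rng.normal(size=(n_y, n_a))
b_a = np.zeros(n_a)
b_y = np.zeros(n_y)

xs = rng.normal(size=(T, n_x))  # a random input sequence
a = np.zeros(n_a)               # initial activation a^{<0>}
outputs = []
for x_t in xs:
    # The same weights are reused at every timestep; only the activation changes
    a, y_t = rnn_step(a, x_t, W_aa, W_ax, W_ya, b_a, b_y)
    outputs.append(y_t)
outputs = np.stack(outputs)
print(outputs.shape)  # (6, 5)
```

The loop is the "unrolled" view: one cell, applied once per element of the sequence.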

RNN cells can also be stacked to give a *Deep RNN*:

<div style='text-align: center;'>
    <img src='/home/clachris/Documents/projects/Generative_Deep_Learning_2nd_Edition/notebooks/Graphics/Deep_RNN.png' alt='Deep_RNN' width='300'>
</div>


When the timeseries can be read forward and backward, we can even have a *Bidirectional RNN*:

<div style='text-align: center;'>
    <img src='/home/clachris/Documents/projects/Generative_Deep_Learning_2nd_Edition/notebooks/Graphics/Birdirectional_RNN.png' alt='Bidirectional_RNN' width='300'>
</div>

There are also several variants depending on the kind of prediction we want. Are we predicting an entire sequence step by step, as in text generation, or only a single value, such as a stock movement after several days of behavior? This leads to the following variants:

*One-to-One*:

<div style='text-align: center;'>
    <img src='/home/clachris/Documents/projects/Generative_Deep_Learning_2nd_Edition/notebooks/Graphics/One-to-One.png' alt='One-to-One' width='300'>
</div>

*One-to-Many*:

<div style='text-align: center;'>
    <img src='/home/clachris/Documents/projects/Generative_Deep_Learning_2nd_Edition/notebooks/Graphics/One-to-Many.png' alt='One-to-Many' width='300'>
</div>

*Many-to-One*:

<div style='text-align: center;'>
    <img src='/home/clachris/Documents/projects/Generative_Deep_Learning_2nd_Edition/notebooks/Graphics/Many-to-One.png' alt='Many-to-One' width='300'>
</div>

*Many-to-Many (length of $\mathbf{x}$ is equal to length of $\mathbf{y}$)*:

<div style='text-align: center;'>
    <img src='/home/clachris/Documents/projects/Generative_Deep_Learning_2nd_Edition/notebooks/Graphics/Many-to-Many_x_equal_y.png' alt='Many-to-Many_x_equal_y' width='300'>
</div>

*Many-to-Many (length of $\mathbf{x}$ is not equal to length of $\mathbf{y}$)*:

<div style='text-align: center;'>
    <img src='/home/clachris/Documents/projects/Generative_Deep_Learning_2nd_Edition/notebooks/Graphics/Many-to-Many_x_nequal_y.png' alt='Many-to-Many_x_nequal_y' width='300'>
</div>

## Mathematical Overview of the Long Short-Term Memory Network

The above formulation suffers from several defects, most notably the *vanishing gradient* problem. As the sequence length increases, gradients propagated back through many timesteps shrink toward zero, so the network struggles to learn long-range dependencies. Two approaches that address this are the *Gated Recurrent Unit* (GRU) and the *Long Short-Term Memory* network (LSTM).

The GRU introduces the *Relevance* and *Update* gates. We will denote these gates by $\Gamma_r$ and $\Gamma_u$, respectively, where each gate is defined as:

$$ \Gamma = \sigma (W\mathbf{x}^{<t>} + Ua^{<t-1>}+b) $$

In the above, $W,U$ and $b$ are the weights associated with the input, the weights associated with the activation, and the bias of the gate. $\sigma$ is the sigmoid activation function. Thus, each gate has a value between 0 and 1.

Then, the GRU will output an activation $a^{<t>}$ and a cell state $c^{<t>}$, which carries information about how to handle the activation at the next timestep. In the GRU, these are calculated by:

$$\tilde{c}^{<t>} = \tanh(W_c[\Gamma_r \odot a^{<t-1>}, {\bf{x}}^{<t>}]+b_c)$$

$$c^{<t>} = \Gamma_u \odot \tilde{c}^{<t>} + (1-\Gamma_u) \odot c^{<t-1>}$$

$$a^{<t>} = c^{<t>}$$

Graphically, this looks like:

<div style='text-align: center;'>
    <img src='/home/clachris/Documents/projects/Generative_Deep_Learning_2nd_Edition/notebooks/Graphics/GRU.png' alt='GRU' width='300'>
</div>
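The GRU equations above translate almost line by line into NumPy. The following is a minimal sketch, not a reference implementation: the parameter dictionary, the sizes, and the random weights are all hypothetical, and each gate uses the $\sigma(W\mathbf{x}^{<t>} + Ua^{<t-1>} + b)$ form defined earlier.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def gru_step(c_prev, x_t, p):
    """One GRU step following the equations above, where a^{<t>} = c^{<t>}."""
    a_prev = c_prev
    gamma_u = sigmoid(p["W_u"] @ x_t + p["U_u"] @ a_prev + p["b_u"])  # update gate
    gamma_r = sigmoid(p["W_r"] @ x_t + p["U_r"] @ a_prev + p["b_r"])  # relevance gate
    # Candidate state: W_c acts on the concatenation [gamma_r * a_prev, x_t]
    c_tilde = np.tanh(p["W_c"] @ np.concatenate([gamma_r * a_prev, x_t]) + p["b_c"])
    # Blend the candidate with the previous state
    return gamma_u * c_tilde + (1.0 - gamma_u) * c_prev


rng = np.random.default_rng(0)
n_a, n_x, T = 4, 3, 5  # hypothetical sizes: hidden, input, sequence length
params = {
    "W_u": rng.normal(size=(n_a, n_x)),
    "U_u": rng.normal(size=(n_a, n_a)),
    "b_u": np.zeros(n_a),
    "W_r": rng.normal(size=(n_a, n_x)),
    "U_r": rng.normal(size=(n_a, n_a)),
    "b_r": np.zeros(n_a),
    "W_c": rng.normal(size=(n_a, n_a + n_x)),
    "b_c": np.zeros(n_a),
}

c = np.zeros(n_a)
for x_t in rng.normal(size=(T, n_x)):
    c = gru_step(c, x_t, params)
print(c)
```

Because the new state is a gate-weighted blend of the old state and a tanh candidate, it stays within $[-1, 1]$ when started from zero.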

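The LSTM block additionally introduces the *Forget* Gate $\Gamma_f$ and the *Output* Gate $\Gamma_o$, each with the same sigmoid form as above. The candidate, the cell state, and the activation are then calculated in the following manner: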

$$\tilde{c}^{<t>}=\tanh(W_c[\Gamma_r \odot a^{<t-1>}, {\bf{x}}^{<t>}]+b_c)$$

$$c^{<t>}=\Gamma_u \odot \tilde{c}^{<t>} + \Gamma_f \odot c^{<t-1>}$$

$$a^{<t>}=\Gamma_o \odot c^{<t>}$$

Graphically, this looks like:

<div style='text-align: center;'>
    <img src='/home/clachris/Documents/projects/Generative_Deep_Learning_2nd_Edition/notebooks/Graphics/LSTM.png' alt='LSTM' width='300'>
</div>
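As a rough illustration, the sketch below implements one LSTM step exactly as the equations are written above, with hypothetical sizes and random weights. Note that this is the formulation used in this overview; the Keras `LSTM` layer uses a slightly different one (it has no relevance gate and applies a tanh to the cell state before the output gate), so this is not a reimplementation of that layer.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(a_prev, c_prev, x_t, p):
    """One LSTM step following the equations as written above."""
    def gate(name):
        return sigmoid(p[f"W_{name}"] @ x_t + p[f"U_{name}"] @ a_prev + p[f"b_{name}"])

    gamma_u, gamma_r, gamma_f, gamma_o = (gate(g) for g in "urfo")
    c_tilde = np.tanh(p["W_c"] @ np.concatenate([gamma_r * a_prev, x_t]) + p["b_c"])
    c_t = gamma_u * c_tilde + gamma_f * c_prev  # separate update and forget gates
    a_t = gamma_o * c_t
    return a_t, c_t


rng = np.random.default_rng(0)
n_a, n_x = 4, 3  # hypothetical sizes: hidden, input
params = {"W_c": rng.normal(size=(n_a, n_a + n_x)), "b_c": np.zeros(n_a)}
for g in "urfo":
    params[f"W_{g}"] = rng.normal(scale=0.1, size=(n_a, n_x))
    params[f"U_{g}"] = rng.normal(scale=0.1, size=(n_a, n_a))
    params[f"b_{g}"] = np.zeros(n_a)

# Force the update gate shut and the forget gate open via their biases
params["b_u"] = np.full(n_a, -50.0)
params["b_f"] = np.full(n_a, 50.0)

a_prev = rng.normal(size=n_a)
c_prev = rng.normal(size=n_a)
a_t, c_t = lstm_step(a_prev, c_prev, rng.normal(size=n_x), params)
print(np.allclose(c_t, c_prev, atol=1e-6))
```

With the update gate closed and the forget gate open, the cell state is carried through essentially unchanged, which is the mechanism that lets information survive over many timesteps.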

This cheatsheet from Stanford is quite helpful: https://stanford.edu/~shervine/teaching/cs-230/cheatsheet-recurrent-neural-networks

## Overview of the Preprocessing and Embedding Layer

Since we are dealing with text sequences here, we need to talk a bit about how to handle them. To begin, we need to *tokenize* the text, which essentially splits it up into word and punctuation *tokens*. From there, we can remove capitalization, *stem* the words (turning things like *browsing*, *browsed* and *browses* into just *brows*), and more. We could also break the text into individual characters instead.

Then, we need to count how many unique tokens we have, giving us our *Vocabulary* $V$. We might also prefer to replace sparsely occurring words with the *UNK*, or *unknown*, token to reduce the number of parameters.

Then, we create a vectorization layer which assigns a unique integer to each token based on frequency. The integer $0$ is reserved for padding, which ensures that every data instance has the same length, and $1$ is reserved for unknown words. Note that in this notebook we pad to a length of $201$; the extra token is there so that the sequence can be shifted by one to form the prediction targets.
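The idea behind the vectorization step can be sketched in plain Python. This is a simplified stand-in rather than what `TextVectorization` does internally: the helper names `build_vocab` and `vectorize` and the toy sentences are hypothetical, and the sketch assumes lowercasing plus whitespace splitting.

```python
from collections import Counter

PAD, UNK = "", "[UNK]"  # index 0 is padding, index 1 is the unknown token


def build_vocab(texts, max_tokens):
    """Rank tokens by frequency, keeping at most max_tokens entries."""
    counts = Counter(tok for text in texts for tok in text.lower().split())
    most_common = [tok for tok, _ in counts.most_common(max_tokens - 2)]
    return [PAD, UNK] + most_common


def vectorize(text, vocab, seq_len):
    """Map tokens to integers, then truncate or pad with zeros to seq_len."""
    index = {tok: i for i, tok in enumerate(vocab)}
    ids = [index.get(tok, 1) for tok in text.lower().split()]
    ids = ids[:seq_len]
    return ids + [0] * (seq_len - len(ids))


toy_texts = ["Chop the parsley .", "Chop the ham and the peas ."]
toy_vocab = build_vocab(toy_texts, max_tokens=6)
toy_ids = vectorize("Chop the celery .", toy_vocab, seq_len=6)
print(toy_vocab)
print(toy_ids)
```

The word *celery* never appears in the toy corpus, so it maps to $1$, and the two trailing zeros are padding.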

Now, at this point, one might think we need to one-hot encode the integers, as we would in normal multi-class classification. However, we would actually like to learn some *representation* of each word, so we will use an *Embedding Layer*. We define the length of each embedding, in our case $100$, giving us an embedding layer with $100 \times 10000 = 1000000$ weights, which we can learn.

Graphically, this looks like (with a length of only $4$):

<div style='text-align: center;'>
    <img src='/home/clachris/Documents/projects/Generative_Deep_Learning_2nd_Edition/notebooks/Graphics/EmbeddingLayer.png' alt='EmbeddingLayer' width='600'>
</div>
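One way to see why no explicit one-hot encoding is needed: looking up a row of the embedding matrix gives the same result as multiplying a one-hot vector by that matrix. The NumPy sketch below uses toy sizes and random weights, both of which are assumptions for illustration only.

```python
import numpy as np

vocab_size, embedding_dim = 6, 4  # toy sizes; the notebook uses 10000 and 100
rng = np.random.default_rng(0)
E = rng.normal(size=(vocab_size, embedding_dim))  # one learnable row per token

token_ids = np.array([3, 2, 1, 4, 0, 0])  # a padded toy sequence

looked_up = E[token_ids]                 # direct row lookup
one_hot = np.eye(vocab_size)[token_ids]  # explicit one-hot encoding, shape (6, 6)
via_matmul = one_hot @ E                 # same values, but far more work

print(looked_up.shape)
print(np.allclose(looked_up, via_matmul))
```

The lookup avoids materialising a vector of length $|V|$ for every token, which matters when the vocabulary has thousands of entries.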

There is something interesting to note here. We can actually use this embedding on its own, as it represents the relationships between the various words. Consider a $2D$ example comparing Royalty and Gender:

<div style='text-align: center;'>
    <img src='/home/clachris/Documents/projects/Generative_Deep_Learning_2nd_Edition/notebooks/Graphics/EmbeddingRelationships.png' alt='EmbeddingRelationships' width='500'>
</div>

So, we see that certain words are grouped together, as they should be, and that the distance between *king* and *man* is about the same as that between *queen* and *woman*, signifying that the embeddings have captured some notion that the two pairs of words are related in the same way.
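The relationship can be checked numerically with hand-picked $2D$ vectors. The coordinates below are made up for illustration (first axis royalty, second axis gender); learned embeddings would only satisfy this approximately.

```python
import numpy as np

# Hypothetical 2D embeddings: (royalty, gender)
emb = {
    "man": np.array([0.0, 1.0]),
    "woman": np.array([0.0, -1.0]),
    "king": np.array([1.0, 1.0]),
    "queen": np.array([1.0, -1.0]),
}

d_man_king = np.linalg.norm(emb["king"] - emb["man"])
d_woman_queen = np.linalg.norm(emb["queen"] - emb["woman"])
print(d_man_king, d_woman_queen)

# The classic analogy: king - man + woman should land near queen
target = emb["king"] - emb["man"] + emb["woman"]
nearest = min(emb, key=lambda w: np.linalg.norm(emb[w] - target))
print(nearest)
```

Equal offsets between the pairs are what make this kind of vector arithmetic work.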

Finally, the output of the model at each position is a probability vector $\mathbf{p} \in [0,1]^{|V|}$ with $\sum_{p \in \mathbf{p}} p = 1$, and we choose the next word either by taking the most likely one or, as in the generator below, by sampling from $\mathbf{p}$.

In [34]:
inputs = layers.Input(shape=(None,), dtype="int32")
x = layers.Embedding(VOCAB_SIZE, EMBEDDING_DIM)(inputs)
x = layers.LSTM(N_UNITS, return_sequences=True)(x)
# Note that N_UNITS is the size of the hidden state (a, in our mathematical formulation),
# not the sequence length, which is 200. With return_sequences=True the layer returns one
# hidden state per token; each is fed to the dense layer, which gives a prediction for the next word.
outputs = layers.Dense(VOCAB_SIZE, activation="softmax")(x)
# The output is a probability vector over the entire vocabulary for each of the 200 positions.
lstm = models.Model(inputs, outputs)
lstm.summary()

Model: "model_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_4 (InputLayer)        [(None, None)]            0         
                                                                 
 embedding_3 (Embedding)     (None, None, 100)         1000000   
                                                                 
 lstm_3 (LSTM)               (None, None, 128)         117248    
                                                                 
 dense_3 (Dense)             (None, None, 10000)       1290000   
                                                                 
Total params: 2,407,248
Trainable params: 2,407,248
Non-trainable params: 0
_________________________________________________________________
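The parameter counts in the summary can be reproduced by hand. The short check below uses the values from the Parameters cell and assumes the standard Keras layout in which the LSTM holds four weight sets (three gates plus the candidate), each with an input kernel, a recurrent kernel, and a bias.

```python
VOCAB_SIZE, EMBEDDING_DIM, N_UNITS = 10000, 100, 128  # values from the Parameters cell

# One embedding vector per vocabulary entry
embedding_params = VOCAB_SIZE * EMBEDDING_DIM

# Four weight sets, each with input weights, recurrent weights, and a bias
lstm_params = 4 * (N_UNITS * (EMBEDDING_DIM + N_UNITS) + N_UNITS)

# Dense layer: weight matrix plus one bias per vocabulary entry
dense_params = N_UNITS * VOCAB_SIZE + VOCAB_SIZE

total_params = embedding_params + lstm_params + dense_params
print(embedding_params, lstm_params, dense_params, total_params)
# 1000000 117248 1290000 2407248
```

Note that the sequence length does not appear anywhere: the same weights are reused at every timestep.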


In [35]:
if LOAD_MODEL:
    # model.load_weights('./models/model')
    lstm = models.load_model("./models/lstm", compile=False)

## 5. Train the LSTM <a name="train"></a>

In [36]:
loss_fn = losses.SparseCategoricalCrossentropy()
lstm.compile("adam", loss_fn)
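For intuition, sparse categorical cross-entropy is the average negative log-probability that the model assigns to the correct next token, with the targets given as integer ids rather than one-hot vectors. The plain-Python sketch below uses made-up probabilities over a three-token vocabulary; it ignores details of the Keras implementation such as clipping of very small probabilities.

```python
import math


def sparse_categorical_crossentropy(y_true, y_pred):
    """Mean of -log p[target] over all positions."""
    losses = [-math.log(probs[target]) for target, probs in zip(y_true, y_pred)]
    return sum(losses) / len(losses)


# Hypothetical predictions for two positions over a 3-token vocabulary
y_pred = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
]
y_true = [0, 1]  # integer ids of the correct tokens

loss = sparse_categorical_crossentropy(y_true, y_pred)
print(round(loss, 4))
```

The loss falls toward zero as the probability assigned to each correct token approaches one.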

In [48]:
# Create a TextGenerator callback
class TextGenerator(callbacks.Callback):
    def __init__(self, index_to_word, top_k=10):
        self.index_to_word = index_to_word 
        self.word_to_index = {
            word: index for index, word in enumerate(index_to_word)
        } # Reverses the vocab we created earlier to go from word to index

    def sample_from(self, probs, temperature):
        # Rescale the probabilities by the temperature: values close to 0 make
        # sampling more deterministic, while 1 leaves the distribution unchanged
        probs = probs ** (1 / temperature)
        probs = probs / np.sum(probs)  # Renormalize so the probabilities sum to one again
        return np.random.choice(len(probs), p=probs), probs  # Sample a single token index

    def generate(self, start_prompt, max_tokens, temperature):
        # Map each word of the prompt to its index; unknown words map to 1 ([UNK])
        start_tokens = [
            self.word_to_index.get(x, 1) for x in start_prompt.split()
        ]
        sample_token = None
        info = []
        # Keep extending the prompt until it reaches max_tokens or a padding token (0) is sampled
        while len(start_tokens) < max_tokens and sample_token != 0:
            x = np.array([start_tokens])  # Shape (batch_size=1, num_tokens)
            # Output shape (1, num_tokens, VOCAB_SIZE): position i holds the predicted
            # distribution for token i + 1, so the last position predicts the next word
            y = self.model.predict(x, verbose=0)
            # y[0] drops the batch dimension (of 1); [-1] selects the last position
            sample_token, probs = self.sample_from(y[0][-1], temperature)
            info.append({"prompt": start_prompt, "word_probs": probs})  # Record each step
            start_tokens.append(sample_token)  # Add the chosen token to the input
            start_prompt = start_prompt + " " + self.index_to_word[sample_token]  # Grow the prompt text
        print(f"\ngenerated text:\n{start_prompt}\n")
        return info

    def on_epoch_end(self, epoch, logs=None):
        self.generate("recipe for", max_tokens=100, temperature=1.0)
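The effect of the temperature in `sample_from` can be seen on a small example. The sketch below applies the same rescaling in plain Python to five probabilities taken from the temperature 1.0 output printed later in this notebook; the rest of the vocabulary is omitted, so the figures are approximate. The helper name `apply_temperature` is hypothetical.

```python
def apply_temperature(probs, temperature):
    """Raise each probability to the power 1/temperature and renormalize."""
    scaled = [p ** (1 / temperature) for p in probs]
    total = sum(scaled)
    return [s / total for s in scaled]


# Top-5 next-token probabilities at temperature 1.0 (remaining vocabulary omitted)
top5 = [0.501, 0.3657, 0.0829, 0.0349, 0.0027]

cooled = apply_temperature(top5, temperature=0.2)
print([round(100 * p, 2) for p in cooled])
```

At temperature 0.2 the two leading candidates take roughly 83% and 17% of the probability mass, in line with the temperature 0.2 output printed later, while the remaining candidates all but vanish.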

In [49]:
# Create a model save checkpoint
model_checkpoint_callback = callbacks.ModelCheckpoint(
    filepath="./checkpoint/checkpoint.ckpt",
    save_weights_only=True,
    save_freq="epoch",
    verbose=0,
)

tensorboard_callback = callbacks.TensorBoard(log_dir="./logs")

# Create the text generator callback
text_generator = TextGenerator(vocab)

In [50]:
lstm.fit(
    train_ds,
    epochs=EPOCHS,
    callbacks=[model_checkpoint_callback, tensorboard_callback, text_generator],
)

Epoch 1/25
generated text:
recipe for breasts pudding with beef and potatoes with malt | melt butter in large saucepan over medium heat . continue simmer until just tender , about 30 minutes . divide rice mixture into bowl . toss toss evenly with remaining 3 / 4 cup oil . chopped remaining almonds atop each of potatoes and serve . 

Epoch 2/25
generated text:
recipe for lemon potato marinated | in a food processor combine the seed butter , the melted flavor more , using it , pressing it on outside ' logs of your holes . knead the dough through your fingers and starting for each . reduce potatoes with a time , leaving a paste , about 5 minutes in each plate . put the palms crosswise into a cavity , and cut into 1 / 4 - inch pieces . brush meat with a large pastry fine sheets , discarding strip the foil and drizzle with shell oil to loosen the

Epoch 3/25
generated text:
recipe for apricot cake cake with mushroom layer and cream | on a plate sift together flour , baking powder , and salt

<keras.callbacks.History at 0x7f57f03120b0>

In [51]:
# Save the final model
lstm.save("./models/lstm")



INFO:tensorflow:Assets written to: ./models/lstm/assets




## 6. Generate text using the LSTM

In [52]:
def print_probs(info, vocab, top_k=5):
    for step in info:
        print(f"\nPROMPT: {step['prompt']}")
        word_probs = step["word_probs"]
        # Indices of the top_k most probable tokens, highest first
        top_indices = np.argsort(word_probs)[::-1][:top_k]
        for idx in top_indices:
            print(f"{vocab[idx]}:   \t{np.round(100*word_probs[idx],2)}%")
        print("--------\n")

In [53]:
info = text_generator.generate(
    "recipe for roasted vegetables | chop 1 /", max_tokens=20, temperature=1.0
)


generated text:
recipe for roasted vegetables | chop 1 / 2 cup plus 2 tablespoons blueberries ; 1 / 3 cup cornstarch



In [54]:
print_probs(info, vocab)


PROMPT: recipe for roasted vegetables | chop 1 /
4:   	50.1%
2:   	36.57%
3:   	8.29%
8:   	3.49%
6:   	0.27%
--------


PROMPT: recipe for roasted vegetables | chop 1 / 2
cup:   	28.52%
tsp:   	23.96%
teaspoon:   	22.14%
tablespoon:   	4.4%
inch:   	2.95%
--------


PROMPT: recipe for roasted vegetables | chop 1 / 2 cup
of:   	18.6%
hot:   	6.87%
warm:   	6.2%
granny:   	2.91%
corn:   	2.83%
--------


PROMPT: recipe for roasted vegetables | chop 1 / 2 cup plus
2:   	47.75%
1:   	27.92%
3:   	1.91%
a:   	1.91%
garlic:   	1.83%
--------


PROMPT: recipe for roasted vegetables | chop 1 / 2 cup plus 2
tablespoons:   	46.41%
tbsp:   	16.42%
teaspoons:   	13.22%
tsp:   	7.55%
cups:   	2.14%
--------


PROMPT: recipe for roasted vegetables | chop 1 / 2 cup plus 2 tablespoons
of:   	9.1%
lemon:   	7.59%
lime:   	7.51%
oil:   	3.8%
flour:   	2.81%
--------


PROMPT: recipe for roasted vegetables | chop 1 / 2 cup plus 2 tablespoons blueberries
in:   	46.42%
into:   	31.57%
and:   	4.26%
with:

In [55]:
info = text_generator.generate(
    "recipe for roasted vegetables | chop 1 /", max_tokens=20, temperature=0.2
)


generated text:
recipe for roasted vegetables | chop 1 / 2 tsp . salt and 1 / 4 tsp . pepper in



In [56]:
print_probs(info, vocab)


PROMPT: recipe for roasted vegetables | chop 1 /
4:   	82.83%
2:   	17.16%
3:   	0.01%
8:   	0.0%
6:   	0.0%
--------


PROMPT: recipe for roasted vegetables | chop 1 / 2
cup:   	58.81%
tsp:   	24.59%
teaspoon:   	16.59%
tablespoon:   	0.01%
inch:   	0.0%
--------


PROMPT: recipe for roasted vegetables | chop 1 / 2 tsp
.:   	100.0%
salt:   	0.0%
each:   	0.0%
;:   	0.0%
butter:   	0.0%
--------


PROMPT: recipe for roasted vegetables | chop 1 / 2 tsp .
salt:   	100.0%
oil:   	0.0%
kosher:   	0.0%
butter:   	0.0%
pepper:   	0.0%
--------


PROMPT: recipe for roasted vegetables | chop 1 / 2 tsp . salt
and:   	96.03%
,:   	3.84%
.:   	0.12%
to:   	0.01%
in:   	0.0%
--------


PROMPT: recipe for roasted vegetables | chop 1 / 2 tsp . salt and
1:   	99.99%
pepper:   	0.01%
peel:   	0.0%
2:   	0.0%
core:   	0.0%
--------


PROMPT: recipe for roasted vegetables | chop 1 / 2 tsp . salt and 1
/:   	99.97%
tsp:   	0.01%
.:   	0.01%
teaspoon:   	0.0%
1:   	0.0%
--------


PROMPT: recipe for roas

In [57]:
info = text_generator.generate(
    "recipe for chocolate ice cream |", max_tokens=20, temperature=1.0
)
print_probs(info, vocab)


generated text:
recipe for chocolate ice cream | beat butter and sweetened together nuts in a food processor ( reserve remainder if


PROMPT: recipe for chocolate ice cream |
preheat:   	18.56%
in:   	16.04%
stir:   	8.93%
whisk:   	8.54%
bring:   	6.9%
--------


PROMPT: recipe for chocolate ice cream | beat
cream:   	28.59%
egg:   	11.67%
together:   	8.11%
eggs:   	7.87%
1:   	4.96%
--------


PROMPT: recipe for chocolate ice cream | beat butter
and:   	58.76%
,:   	20.2%
in:   	9.05%
with:   	6.94%
together:   	0.91%
--------


PROMPT: recipe for chocolate ice cream | beat butter and
sugar:   	40.48%
1:   	10.66%
butter:   	5.42%
2:   	3.28%
salt:   	2.8%
--------


PROMPT: recipe for chocolate ice cream | beat butter and sweetened
condensed:   	26.57%
sugar:   	10.54%
powdered:   	3.29%
egg:   	3.07%
in:   	2.28%
--------


PROMPT: recipe for chocolate ice cream | beat butter and sweetened together
in:   	33.79%
sugar:   	8.92%
butter:   	3.93%
remaining:   	3.41%
lemon:   	3.12%

In [58]:
info = text_generator.generate(
    "recipe for chocolate ice cream |", max_tokens=20, temperature=0.2
)
print_probs(info, vocab)


generated text:
recipe for chocolate ice cream | preheat oven to 350°f . line a baking sheet with parchment paper . in


PROMPT: recipe for chocolate ice cream |
preheat:   	65.03%
in:   	31.43%
stir:   	1.68%
whisk:   	1.34%
bring:   	0.46%
--------


PROMPT: recipe for chocolate ice cream | preheat
oven:   	99.92%
the:   	0.08%
a:   	0.0%
to:   	0.0%
broiler:   	0.0%
--------


PROMPT: recipe for chocolate ice cream | preheat oven
to:   	100.0%
temperature:   	0.0%
and:   	0.0%
.:   	0.0%
rack:   	0.0%
--------


PROMPT: recipe for chocolate ice cream | preheat oven to
350°f:   	99.77%
375°f:   	0.2%
400°f:   	0.01%
325°f:   	0.01%
425°f:   	0.0%
--------


PROMPT: recipe for chocolate ice cream | preheat oven to 350°f
.:   	100.0%
with:   	0.0%
and:   	0.0%
(:   	0.0%
,:   	0.0%
--------


PROMPT: recipe for chocolate ice cream | preheat oven to 350°f .
line:   	54.82%
butter:   	42.75%
in:   	2.16%
combine:   	0.18%
lightly:   	0.03%
--------


PROMPT: recipe for chocolate