## Chapter 4: Recurrent Neural Networks

Set up the seeds so the results are reproducible:

In [1]:
import random
random.seed(2)

import tensorflow as tf
tf.random.set_seed(2)

import numpy as np
np.random.seed(2)

### Your Task

The website team wants to make ordering pizza a bit more fun for customers. They want to add a pizza button called Adventure Time to their website, which automatically generates a set of ingredients that a customer can order and the chefs will then cook. They saw an article by a team at MIT and asked whether you could do something similar.

### Understand the data

You were lucky enough to find the exact same data that the MIT team used, so load it:

In [2]:
import numpy as np
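# the file stores Python dictionaries, which NumPy only loads with allow_pickle=True
data_pizza = np.load("datasets/recipes/data_pizza.npy", allow_pickle=True)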

In [3]:
data_pizza.shape

(200,)

You have 200 pizza recipes in there. Let's check them out:

In [4]:
from pprint import pprint
for entry in data_pizza[0:3]:
    pprint(entry)

{'categories': ['pizza'],
 'directions': ['Preheat oven to 400 degrees F (200 degrees C). Grease a 9x13 '
                'inch baking dish. Place ground beef in a large, deep skillet. '
                'Cook over medium high heat until evenly brown. Stir in '
                'pepperoni, and cook until browned. Drain excess fat. Stir in '
                'pizza sauce. Remove from heat, and set aside.',
                'Cut biscuits into quarters, and place in the bottom of baking '
                'dish. Spread meat mixture evenly over the biscuits. Sprinkle '
                'top with onion, olives and mushrooms.',
                'Bake uncovered in preheated oven for 20 to 25 minutes. '
                'Sprinkle top with mozzarella and Cheddar cheese. Bake an '
                'additional 5 to 10 minutes, until cheese is melted. Let stand '
                '10 minutes before serving.'],
 'ingredients': [[1, '', 'beef', ''],
                 [0.25, '', 'sausage', 'sliced'],
          

There is a lot more information there than you need. You only care about the ingredients, so extract those:

In [5]:
ingredients_only = [t['ingredients'] for t in data_pizza]

In [6]:
ingredients_only[0]

[[1, '', 'beef', ''],
 [0.25, '', 'sausage', 'sliced'],
 [1, '', 'sauce', ''],
 [2, '', 'buttermilk', 'refrigerated'],
 [0.5, '', 'onion', 'sliced_separated'],
 [1, '', 'black', 'sliced'],
 [1, '', 'mushroom', 'sliced'],
 [1.5, '', 'mozzarella_cheese', 'shredded'],
 [1, '', 'cheese', 'shredded']]

Now create a single string per pizza recipe from these ingredients:

In [7]:
joined_ingredients = []
for set_of_ingredients in ingredients_only:
    str_ingredients = ''
    for ingredient in set_of_ingredients:
        str_ingredients += ' '.join([str(t) for t in ingredient]) + ' '
    joined_ingredients.append(str_ingredients.replace('  ', ' '))

In [8]:
joined_ingredients[0]

'1 beef 0.25 sausage sliced 1 sauce 2 buttermilk refrigerated 0.5 onion sliced_separated 1 black sliced 1 mushroom sliced 1.5 mozzarella_cheese shredded 1 cheese shredded '
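A slightly more direct way to build the same string is to skip the empty unit and descriptor fields up front, instead of collapsing double spaces afterwards. This is a sketch only. `join_recipe` and `first_recipe` are hypothetical names. The two approaches agree for recipes shaped like the one above, but they can differ if an ingredient has several adjacent empty fields.

```python
def join_recipe(ingredients):
    """Join [quantity, unit, name, descriptor] entries into one string.

    Empty fields are skipped instead of collapsing double spaces later;
    a trailing space is kept to match the notebook cell above.
    """
    fields = []
    for ingredient in ingredients:
        fields.extend(str(field) for field in ingredient if str(field) != '')
    return ' '.join(fields) + ' '

# the first recipe's ingredients, copied from the output above
first_recipe = [
    [1, '', 'beef', ''],
    [0.25, '', 'sausage', 'sliced'],
    [1, '', 'sauce', ''],
    [2, '', 'buttermilk', 'refrigerated'],
    [0.5, '', 'onion', 'sliced_separated'],
    [1, '', 'black', 'sliced'],
    [1, '', 'mushroom', 'sliced'],
    [1.5, '', 'mozzarella_cheese', 'shredded'],
    [1, '', 'cheese', 'shredded'],
]
print(join_recipe(first_recipe))
```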

That looks better. To keep things simple, join all of the text into one large lowercase string:

In [9]:
text = ' '.join(joined_ingredients).lower()

Step 1: Convert the text into a list of numbers. Start by building a vocabulary that assigns a number to every distinct character:

In [10]:
vocabulary = sorted(set(text))

character_to_number = {}

for idx, character in enumerate(vocabulary):
    character_to_number[character] = idx
    
number_to_character = np.array(vocabulary)

print(f'There are {len(vocabulary)} different characters in the vocabulary')

There are 39 different characters in the vocabulary


In [11]:
character_to_number

{' ': 0,
 '.': 1,
 '0': 2,
 '1': 3,
 '2': 4,
 '3': 5,
 '4': 6,
 '5': 7,
 '6': 8,
 '7': 9,
 '8': 10,
 '9': 11,
 '_': 12,
 'a': 13,
 'b': 14,
 'c': 15,
 'd': 16,
 'e': 17,
 'f': 18,
 'g': 19,
 'h': 20,
 'i': 21,
 'j': 22,
 'k': 23,
 'l': 24,
 'm': 25,
 'n': 26,
 'o': 27,
 'p': 28,
 'q': 29,
 'r': 30,
 's': 31,
 't': 32,
 'u': 33,
 'v': 34,
 'w': 35,
 'x': 36,
 'y': 37,
 'z': 38}

Convert the ingredients text to an array of numbers:

In [12]:
text_as_numbers = [character_to_number[character] for character in text]

Let's see how the characters map:

In [13]:
print(f'"{text[:13]}" maps to: {text_as_numbers[:13]}')

"1 beef 0.25 s" maps to: [3, 0, 14, 17, 17, 18, 0, 2, 1, 4, 7, 0, 31]

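The mapping is just a pair of lookup tables, so encoding and decoding are exact inverses of each other. The sketch below rebuilds the tables on a short sample string and checks the round trip. `sample_text`, `encode` and `decode` are illustrative names and are not part of the notebook.

```python
import numpy as np

sample_text = '1 beef 0.25 sausage sliced'

# build the lookup tables the same way as above: sorted unique characters
vocab = sorted(set(sample_text))
char_to_idx = {character: idx for idx, character in enumerate(vocab)}
idx_to_char = np.array(vocab)

def encode(text):
    """Map every character to its integer id."""
    return [char_to_idx[character] for character in text]

def decode(numbers):
    """Map integer ids back to characters and join them."""
    return ''.join(idx_to_char[number] for number in numbers)

encoded = encode(sample_text)
print(encoded[:6])
print(decode(encoded) == sample_text)  # True: the mapping is lossless
```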

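Step 2: Split the text into chunks. Each chunk holds `chunk_length` characters plus one extra character, so it can later be split into an input and a target that is shifted by one character: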

In [14]:
chunk_length = 15

chunk_length_with_extra_character = chunk_length + 1

chunks = []
for idx in range(0, len(text_as_numbers), chunk_length_with_extra_character):
    chunks.append(text_as_numbers[idx:idx+chunk_length_with_extra_character])

print(f'You split the text into {len(chunks)} chunks of up to {chunk_length_with_extra_character} characters')

You split the text into 1807 chunks of up to 16 characters
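The chunking loop steps through the list in strides of `chunk_length + 1`. If the text length is not a multiple of that stride, the final chunk comes out shorter. The batching step below only keeps full batches, so that short chunk does not reach the model here. Here is a sketch on a toy list (`split_into_chunks` is a hypothetical helper):

```python
def split_into_chunks(numbers, chunk_length):
    """Split a list into consecutive chunks of chunk_length + 1 items.

    The extra item lets each chunk later be split into an input and a
    target shifted by one position. The final chunk may be shorter.
    """
    size = chunk_length + 1
    return [numbers[idx:idx + size] for idx in range(0, len(numbers), size)]

toy_chunks = split_into_chunks(list(range(10)), chunk_length=3)
print(toy_chunks)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```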


Let's check the chunks:

In [15]:
for number_chunk in chunks[:5]:
    text_chunk = [number_to_character[item] for item in number_chunk]
    print(f"Number sequence: {number_chunk}")
    print(f"As text: {''.join(text_chunk)}")
    print('')

Number sequence: [3, 0, 14, 17, 17, 18, 0, 2, 1, 4, 7, 0, 31, 13, 33, 31]
As text: 1 beef 0.25 saus

Number sequence: [13, 19, 17, 0, 31, 24, 21, 15, 17, 16, 0, 3, 0, 31, 13, 33]
As text: age sliced 1 sau

Number sequence: [15, 17, 0, 4, 0, 14, 33, 32, 32, 17, 30, 25, 21, 24, 23, 0]
As text: ce 2 buttermilk 

Number sequence: [30, 17, 18, 30, 21, 19, 17, 30, 13, 32, 17, 16, 0, 2, 1, 7]
As text: refrigerated 0.5

Number sequence: [0, 27, 26, 21, 27, 26, 0, 31, 24, 21, 15, 17, 16, 12, 31, 17]
As text:  onion sliced_se



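Step 3: Now split every chunk into an input (x) and a target (y). The target is the same text starting one character later, so at every position the model learns to predict the next character: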

In [16]:
x = []
y = []

for chunk in chunks:
    x.append(chunk[:-1])
    y.append(chunk[1:])

And let's take a look:

In [17]:
print(f"Input chunk: {''.join([number_to_character[item] for item in x[0]])}")
print(f"Target chunk: {''.join([number_to_character[item] for item in y[0]])}")

Input chunk: 1 beef 0.25 sau
Target chunk:  beef 0.25 saus
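Each target is the input shifted by one position. The small NumPy sketch below uses the first few encoded characters to make the pairing explicit. `chunk`, `x_part` and `y_part` are illustrative names.

```python
import numpy as np

chunk = np.array([3, 0, 14, 17, 17, 18])  # "1 beef" encoded with the vocabulary above

x_part = chunk[:-1]   # every character except the last
y_part = chunk[1:]    # every character except the first

# at each position t, the model sees x_part[t] and must predict y_part[t],
# which is simply the next character of the chunk
for t in range(len(x_part)):
    print(f'input {int(x_part[t]):2d} -> target {int(y_part[t]):2d}')
```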


Step 4: Now create batches of these chunks:

In [18]:
batch_size = 64

In [19]:
batched_x = []
batched_y = []

for idx in range(0, min(len(x), len(y)), batch_size):
    if (
        len(x[idx:idx+batch_size]) == batch_size and 
        len(y[idx:idx+batch_size]) == batch_size
    ):
        batched_x.append(np.asarray(x[idx:idx+batch_size]))
        batched_y.append(np.asarray(y[idx:idx+batch_size]))

You now have batches of data to train with:

In [20]:
print(f'You have {len(batched_x)} batches of {len(batched_x[0])} '
      f'chunks of {len(batched_x[0][0])} characters each.')

You have 28 batches of 64 chunks of 15 characters each.
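The loop only keeps batches with exactly 64 chunks. A stateful LSTM built with a fixed `batch_input_shape` expects every batch to have that size. So the number of batches is the number of chunks divided by the batch size, rounded down. Here is a quick check with the numbers printed earlier:

```python
num_chunks = 1807   # from the chunking step above
batch_size = 64

full_batches = num_chunks // batch_size               # batches the loop keeps
dropped_chunks = num_chunks - full_batches * batch_size

print(full_batches, dropped_chunks)  # 28 15
```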


### Set Up Your First Recurrent Neural Network

Your first layer in the model will be an embedding layer. It converts each character from a single number between 0 and 38 into a vector, which allows the model to capture relationships between the characters. The embedding layer has the following configuration:

- `input_dim` is the size of the vocabulary (39).
- `batch_input_shape` fixes the batch size to that of the batches above (64) and leaves the sequence length open (`None`).
- `output_dim` is the size of the vector each character is transformed into. It is set to 256 dimensions here, but feel free to change it.
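Conceptually, an embedding layer is a trainable lookup table with one row per character id. The NumPy sketch below shows the shapes involved. It uses random values in place of learned weights, which is an assumption made for illustration.

```python
import numpy as np

vocabulary_size = 39   # characters in the vocabulary
embedding_dim = 256    # output_dim of the embedding layer

rng = np.random.default_rng(0)
# one (here random, normally trained) vector per character id
embedding_matrix = rng.normal(size=(vocabulary_size, embedding_dim))

batch = rng.integers(0, vocabulary_size, size=(64, 15))  # 64 chunks of 15 ids
embedded = embedding_matrix[batch]                       # row lookup per id

print(embedding_matrix.size)  # 9984 weights, the count the model summary reports
print(embedded.shape)         # (64, 15, 256)
```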

Next, add an LSTM layer with 256 units (its memory size). Set it up to return sequences, so that its output has the shape (number of samples, number of time steps, LSTM units). Also make it stateful, so that it remembers its state between batches.
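To see what `return_sequences=True` and `stateful=True` mean, here is a minimal NumPy sketch of an LSTM layer's forward pass. It is a simplified illustration with small random weights, not Keras's implementation. The names `lstm_sequence`, `W`, `U` and `b`, the gate order and the weight layout are all choices made for this sketch. It returns one hidden state per time step. Carrying the final state into the next call gives the same result as processing the whole sequence at once.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_sequence(inputs, W, U, b, state=None):
    """Run one LSTM layer over inputs of shape (batch, time, features).

    W: (features, 4*units), U: (units, 4*units), b: (4*units,)
    Returns all hidden states (like return_sequences=True) and the final
    (h, c) state, which a stateful layer carries into the next batch.
    """
    batch, time_steps, _ = inputs.shape
    units = U.shape[0]
    if state is None:
        h = np.zeros((batch, units))
        c = np.zeros((batch, units))
    else:
        h, c = state
    outputs = []
    for t in range(time_steps):
        z = inputs[:, t, :] @ W + h @ U + b        # (batch, 4*units)
        i = sigmoid(z[:, :units])                  # input gate
        f = sigmoid(z[:, units:2 * units])         # forget gate
        g = np.tanh(z[:, 2 * units:3 * units])     # candidate cell values
        o = sigmoid(z[:, 3 * units:])              # output gate
        c = f * c + i * g                          # new cell state (the "memory")
        h = o * np.tanh(c)                         # new hidden state
        outputs.append(h)
    return np.stack(outputs, axis=1), (h, c)

rng = np.random.default_rng(0)
features, units = 8, 4
W = rng.normal(scale=0.1, size=(features, 4 * units))
U = rng.normal(scale=0.1, size=(units, 4 * units))
b = np.zeros(4 * units)
inputs = rng.normal(size=(2, 6, features))   # (batch, time steps, features)

full_outputs, _ = lstm_sequence(inputs, W, U, b)
print(full_outputs.shape)  # (2, 6, 4): one hidden state per time step

# stateful behaviour: two halves with the state carried over
first_half, state = lstm_sequence(inputs[:, :3], W, U, b)
second_half, _ = lstm_sequence(inputs[:, 3:], W, U, b, state=state)
print(np.allclose(np.concatenate([first_half, second_half], axis=1), full_outputs))  # True
```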

The output layer has to output the next character in the sequence, which means it needs as many neurons as there are characters in your vocabulary. For this final layer, use a dense layer with a softmax activation, so that its outputs form a probability distribution over the next character.
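The Dense layer is applied to the LSTM output at every time step. Softmax then turns its 39 scores into probabilities that sum to 1. Here is a NumPy sketch with random weights, which are assumed for illustration. `dense_kernel` and `dense_bias` are hypothetical names.

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into probabilities along the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
lstm_output = rng.normal(size=(1, 15, 256))          # (batch, time steps, LSTM units)
dense_kernel = rng.normal(scale=0.05, size=(256, 39))
dense_bias = np.zeros(39)

# the same Dense weights are applied independently at every time step
probabilities = softmax(lstm_output @ dense_kernel + dense_bias)
print(probabilities.shape)                            # (1, 15, 39)
print(np.allclose(probabilities.sum(axis=-1), 1.0))   # True
```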

Later, you will need to rebuild the model with an embedding layer that accepts smaller batches. So bring everything together into one function that builds the model for a given batch size:

In [21]:
from tensorflow.keras.layers import Embedding, LSTM, Dense
from tensorflow.keras.models import Sequential

def create_model(batch_size):
    input_layer = Embedding(
        input_dim=len(vocabulary), 
        output_dim=256,
        batch_input_shape=[batch_size, None]
    )
    hidden_layer = LSTM(
        units=256, 
        return_sequences=True, 
        stateful=True
    )
    output_layer = Dense(units=len(vocabulary), activation='softmax')
    rnn_model = Sequential([
        input_layer,
        hidden_layer,
        output_layer,
    ])
    
    return rnn_model

In [22]:
rnn_model = create_model(batch_size)

In [23]:
rnn_model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding (Embedding)        (64, None, 256)           9984      
_________________________________________________________________
lstm (LSTM)                  (64, None, 256)           525312    
_________________________________________________________________
dense (Dense)                (64, None, 39)            10023     
Total params: 545,319
Trainable params: 545,319
Non-trainable params: 0
_________________________________________________________________
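The parameter counts in the summary follow directly from the layer sizes. The embedding has one 256-dimensional vector per character. The LSTM has four gates, each with input weights, recurrent weights and a bias. The dense layer has one weight per LSTM unit per character, plus a bias per character.

```python
vocab_size = 39
embedding_dim = 256
lstm_units = 256

embedding_params = vocab_size * embedding_dim
# 4 gates, each with input weights, recurrent weights and a bias
lstm_params = 4 * (embedding_dim * lstm_units + lstm_units * lstm_units + lstm_units)
dense_params = lstm_units * vocab_size + vocab_size

print(embedding_params, lstm_params, dense_params)    # 9984 525312 10023
print(embedding_params + lstm_params + dense_params)  # 545319
```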


Your target data is encoded as integer character ids rather than as one-hot vectors, so use sparse categorical cross-entropy instead of regular categorical cross-entropy:

In [24]:
rnn_model.compile(
    optimizer='adam', 
    loss='sparse_categorical_crossentropy'
)
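Both losses compute the same number. The sparse version reads the probability of the correct class by its index instead of multiplying by a one-hot vector. Here is a NumPy sketch with made-up probabilities:

```python
import numpy as np

probabilities = np.array([[0.7, 0.2, 0.1],
                          [0.1, 0.3, 0.6]])   # model outputs for 2 positions
sparse_targets = np.array([0, 2])              # targets as integer ids
one_hot_targets = np.eye(3)[sparse_targets]    # same targets, one-hot encoded

# sparse version: pick the probability of the correct id directly
sparse_loss = -np.log(probabilities[np.arange(len(sparse_targets)), sparse_targets])
# categorical version: sum over the one-hot vector
categorical_loss = -(one_hot_targets * np.log(probabilities)).sum(axis=1)

print(np.allclose(sparse_loss, categorical_loss))  # True
```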

This time around, you will save the model's weights after every epoch. That is because, to generate text, you will rebuild the model so that its embedding layer accepts batches of a single chunk instead of 64 chunks. You will then load the trained weights into it.

Set up a Keras callback that stores the weights during training:

In [25]:
import os

from tensorflow.keras.callbacks import ModelCheckpoint

checkpoint_dir = './training_checkpoints'
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")

checkpoint_callback = ModelCheckpoint(
    filepath=checkpoint_prefix,
    save_weights_only=True
)

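And start training. Keras expects the inputs as a single array rather than as a list of batches, so stack the batches back together. `batch_size` splits them into the same 64-chunk batches again, and `shuffle=False` keeps them in order:

In [26]:
history = rnn_model.fit(
    np.concatenate(batched_x),  # shape (1792, 15): all full batches stacked
    np.concatenate(batched_y),
    epochs=200,
    callbacks=[checkpoint_callback],
    batch_size=batch_size,
    shuffle=False  # keep the chunk order for the stateful LSTM
)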


The model is trained. It's now time to build a new version of it that takes a batch containing a single sequence, so you can feed it one character at a time:

In [27]:
import tensorflow as tf

# find the most recent checkpoint written by the callback
latest_checkpoint_path = tf.train.latest_checkpoint(checkpoint_dir)

# same architecture, but with a batch size of 1 for generation
single_input_rnn_model = create_model(batch_size=1)
single_input_rnn_model.load_weights(latest_checkpoint_path)
single_input_rnn_model.build(tf.TensorShape([1, None]))

To generate lists of ingredients, first determine the average length, in characters, of an ingredient string in your input data:

In [28]:
average_length_ingredients = np.mean([len(t) for t in joined_ingredients])
print(average_length_ingredients)

143.5


In [29]:
output_sequence_length = int(np.round(average_length_ingredients))
print(output_sequence_length)

144


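To run the model, choose a starting character; it is converted to a number and fed to the model. For every input character, the model outputs a probability (0 to 100%) for each character in the vocabulary being the next one. Picking the most likely character and feeding it back in lets you generate text one character at a time. Let's try it: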

In [30]:
starting_character = 'a'

In [31]:
# convert the starting character(s) to numbers, shaped (batch=1, time steps)
model_input = np.array([[character_to_number[s] for s in starting_character]])

# store the generated text in here
generated_text = []

# reset the LSTM state so earlier predictions don't leak into this run
single_input_rnn_model.reset_states()

for i in range(output_sequence_length):
    # predictions has shape (1, time steps, vocabulary size)
    predictions = single_input_rnn_model.predict(model_input, verbose=0)

    # np.argmax returns the index of the most likely next character
    # (use the last time step in case the input has several characters)
    predicted_id = np.argmax(predictions[0, -1])

    # feed the predicted character back in as the next input
    model_input = np.array([[predicted_id]])

    generated_text.append(number_to_character[predicted_id])

print(starting_character + ''.join(generated_text))

atonion sliced_seast 1 gliced 1 mushroom 1 beef 0.25 saustared 1 clove 1 oril 1 crust refrigerated 0.5 onion chopped 1 dreded 1 chopped 1 green_o
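Taking `np.argmax` at every step (greedy decoding) always produces the same ingredient list for the same starting character. A common alternative, which is swapped in here and is not part of the original notebook, is to sample the next character from the predicted distribution. The distribution can optionally be rescaled by a temperature first. `pick_next_id` is a hypothetical helper:

```python
import numpy as np

def pick_next_id(probabilities, temperature=1.0, rng=None):
    """Sample a character id from a probability vector.

    temperature < 1 sharpens the distribution (closer to argmax),
    temperature > 1 flattens it (more surprising ingredients).
    """
    rng = np.random.default_rng() if rng is None else rng
    logits = np.log(np.asarray(probabilities, dtype=float) + 1e-12) / temperature
    scaled = np.exp(logits - logits.max())
    scaled /= scaled.sum()
    return int(rng.choice(len(scaled), p=scaled))

rng = np.random.default_rng(2)
next_char_probs = np.array([0.1, 0.6, 0.3])
samples = [pick_next_id(next_char_probs, temperature=1.0, rng=rng) for _ in range(5)]
print(samples)
print(pick_next_id(next_char_probs, temperature=0.01, rng=rng))  # effectively always 1
```

In the generation loop above, you could try `predicted_id = pick_next_id(predictions[0, -1], temperature=0.8)` in place of the `np.argmax` call. The value 0.8 is an arbitrary example.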


As you can see, not all of the generated ingredients are real words, but a human can still recognize most of them. The website team can't wait to set up the system: the chefs can swap any unclear ingredient for whatever they think works best. Will you order one too?