In [1]:
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras import backend as K


Pretend we live in a world with just 7 letters: E-D-H-L-O-R-W. 

We would like to teach our network to say "helloworld", given a starting syllable "hel". 

We can represent any character in our alphabet with a 7-vector, using one-hot encoding.

In [2]:
def letter_to_encoding(letter):
    # One-hot encode a single character of the 7-letter alphabet as a 7-vector.
    letters = ['e','d','h','l','o','r','w']
    vec = np.zeros(7, dtype="float32")
    vec[letters.index(letter)] = 1
    return vec

As an example, the word "hello" would be represented as the following matrix (one row per letter):

In [3]:
for letter in "hello":
    print(letter_to_encoding(letter))

[0. 0. 1. 0. 0. 0. 0.]
[1. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 1. 0. 0. 0.]
[0. 0. 0. 1. 0. 0. 0.]
[0. 0. 0. 0. 1. 0. 0.]
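The reverse mapping matters later, when the network's predictions have to be turned back into letters. Below is a minimal sketch, assuming the same alphabet order as letter_to_encoding: encode_word stacks one one-hot row per character, and decode_matrix takes the argmax of each row, which works equally well on the probability vectors the network will output. Both names are hypothetical helpers, not part of the notebook.

```python
import numpy as np

# Same 7-letter alphabet and ordering as letter_to_encoding (assumed).
LETTERS = ['e', 'd', 'h', 'l', 'o', 'r', 'w']

def encode_word(word):
    """Stack one one-hot row per character: shape (len(word), 7)."""
    matrix = np.zeros((len(word), len(LETTERS)), dtype="float32")
    for row, letter in enumerate(word):
        matrix[row, LETTERS.index(letter)] = 1.0
    return matrix

def decode_matrix(matrix):
    """Invert the encoding by picking the index of the largest entry in each row."""
    return "".join(LETTERS[i] for i in np.argmax(matrix, axis=1))

hello = encode_word("hello")
print(hello.shape)           # (5, 7)
print(decode_matrix(hello))  # hello
```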


In this example, we use a single-layer RNN.
The network takes an input of shape N×3×7, where N is the number of samples in a training batch, 3 is the input sequence length, and 7 is the size of the one-hot vector associated with each input character. (Keras puts the batch dimension first, which is why the model summary reports the output shape as (None, 3, 7).)

The RNN layer is a Keras SimpleRNN with 7 units. The layer is unrolled over the 3 input time steps, and at each step it produces a 7-dimensional output (again, one value per character in our alphabet, representing how likely that character is). Although each length-3 input maps to a length-3 output, we use only the 3rd (final) output as the predicted "next character" of our phrase. Had we left off the "return_sequences" parameter, the layer would output only this final step. Returning the whole sequence lets us inspect every predicted character, and it also means that during training the network is scored on all three shifted target letters, not just the last one.

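Note that we use softmax as the layer's activation function, so each output vector is normalized into a probability distribution over the 7 letters. Because a SimpleRNN's output at each step is also its hidden state, this normalized vector is also what gets fed back into the next step.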
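To make the recurrence concrete, here is a minimal NumPy sketch of the forward pass that SimpleRNN(7, activation="softmax", return_sequences=True) performs. It assumes the update rule of Keras's SimpleRNNCell, ignoring dropout: output_t = activation(x_t · kernel + bias + h_{t-1} · recurrent_kernel), starting from an all-zero state, with each step's output doubling as the next hidden state. The weights are random stand-ins rather than trained values, and simple_rnn_forward is a hypothetical name.

```python
import numpy as np

def softmax(z):
    # Subtract the row max for numerical stability before exponentiating.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def simple_rnn_forward(x, kernel, recurrent_kernel, bias, return_sequences=True):
    """x has shape (N, T, input_dim); returns (N, T, units) or (N, units)."""
    n, steps, _ = x.shape
    units = kernel.shape[1]
    h = np.zeros((n, units), dtype=x.dtype)  # initial hidden state is all zeros
    outputs = []
    for t in range(steps):
        # Mix the current input with the previous state, then apply the activation.
        h = softmax(x[:, t, :] @ kernel + h @ recurrent_kernel + bias)
        outputs.append(h)
    return np.stack(outputs, axis=1) if return_sequences else h

rng = np.random.default_rng(0)
kernel = rng.normal(size=(7, 7)).astype("float32")            # input -> units
recurrent_kernel = rng.normal(size=(7, 7)).astype("float32")  # previous state -> units
bias = np.zeros(7, dtype="float32")

# "hel" (indices 2, 0, 3) as a batch of one: shape (1, 3, 7).
x = np.eye(7, dtype="float32")[[2, 0, 3]][np.newaxis]
y = simple_rnn_forward(x, kernel, recurrent_kernel, bias)
print(y.shape)         # (1, 3, 7)
print(y.sum(axis=-1))  # each step's output sums to 1 because of the softmax
```

Passing return_sequences=False returns only the final step, which matches the slice y[:, -1, :].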

In [4]:
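model = keras.Sequential()
model.add(keras.Input(shape=(3, 7)))  # 3 time steps, a 7-dim one-hot vector per step
# 7 units -> a 7-dim output per step; softmax turns each output into letter probabilities
model.add(layers.SimpleRNN(7, activation="softmax", return_sequences=True))
model.summary()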

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
simple_rnn (SimpleRNN)       (None, 3, 7)              105       
Total params: 105
Trainable params: 105
Non-trainable params: 0
_________________________________________________________________


Exercise N at the end of the chapter asks you to think about why this network has 105 trainable parameters. As a hint, recall that the input to the RNN node is made up of the input vector, plus the hidden state, plus a bias term, and that these inputs are fully connected to the generated output.

In the cell below, we create our training data. In our hypothetical world, the only phrase that exists is "helloworld", so we train the network on fragments of this phrase (repeated 30 times). For each three-letter input window, the target is the same window shifted forward by one character, so each target letter is the letter that follows the corresponding input letter: "hel" maps to "ell".

In [5]:
train_text = "helloworld"*30

def generate_train_set(train_text, as_words=False):
    x_train = []
    y_train = []
    # Slide a 3-character window over the text; the target is the window shifted by one.
    # range(len - 3) covers every window whose shifted target still fits in the text.
    for i in range(len(train_text)-3):
        if as_words:
            x_train += [[train_text[i:i+3]]]
            y_train += [[train_text[i+1:i+4]]]
        else:
            x_train += [[letter_to_encoding(letter) for letter in train_text[i:i+3]]]
            y_train += [[letter_to_encoding(letter) for letter in train_text[i+1:i+4]]]

    if as_words:
        print(x_train[0])
        print(y_train[0])
    else:
        print(np.array(x_train)[0])
        print(np.array(x_train).shape)

    return np.array(x_train), np.array(y_train)

generate_train_set(train_text, True)
x_train, y_train = generate_train_set(train_text)

['hel']
['ell']
[[0. 0. 1. 0. 0. 0. 0.]
 [1. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 1. 0. 0. 0.]]
(297, 3, 7)
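As a cross-check on the one-step shift, here is a vectorized sketch that builds the same (input, target) windows with NumPy's sliding_window_view (available from NumPy 1.20) instead of a Python loop. It assumes the same alphabet order as letter_to_encoding; LETTERS, window, x_vec and y_vec are illustrative names.

```python
import numpy as np

LETTERS = ['e', 'd', 'h', 'l', 'o', 'r', 'w']
train_text = "helloworld" * 30
window = 3

# One-hot encode the whole text once: row k of the identity matrix is letter k.
indices = np.array([LETTERS.index(c) for c in train_text])
encoded = np.eye(len(LETTERS), dtype="float32")[indices]  # shape (300, 7)

# Every run of window + 1 consecutive characters yields one (input, target) pair.
# sliding_window_view appends the window axis last, so move it next to the batch axis.
chunks = np.lib.stride_tricks.sliding_window_view(encoded, window + 1, axis=0)
chunks = np.moveaxis(chunks, -1, 1)              # (num_windows, 4, 7)
x_vec, y_vec = chunks[:, :-1], chunks[:, 1:]     # inputs and one-step-shifted targets

print(x_vec.shape, y_vec.shape)
# The target is the input shifted by one step, so they share their overlapping letters.
print(np.array_equal(x_vec[:, 1:], y_vec[:, :-1]))
```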


So what happens when we generate an output phrase using the untrained network?

In [6]:
letters = ['e','d','h','l','o','r','w']

seed = "hel"
result = seed
input_data = np.array([[letter_to_encoding(letter) for letter in seed]])

for i in range(7):
    out = model(input_data)
    probs = K.eval(out)
    # Only the last time step predicts the character that follows the 3-letter window.
    next_letter = letters[np.argmax(probs[0, -1])]
    result += next_letter
    print(result)
    # Slide the window: the next input is the last 3 characters generated so far.
    input_data = np.array([[letter_to_encoding(letter) for letter in result[-3:]]])

hele
helee
heleee
heleeee
heleeeee
heleeeeee
heleeeeeee
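The generation loop is greedy decoding: at each step it takes the argmax of the final time step's output and slides the 3-letter window forward. The sketch below factors that loop into a reusable function. The names generate, encode and oracle are hypothetical; oracle is a stand-in predictor that simply looks up the real next letters of "helloworld" in place of a trained network, so the example runs without TensorFlow.

```python
import numpy as np

LETTERS = ['e', 'd', 'h', 'l', 'o', 'r', 'w']

def encode(text):
    """One-hot encode a string into shape (len(text), 7)."""
    return np.eye(len(LETTERS), dtype="float32")[[LETTERS.index(c) for c in text]]

def generate(predict_fn, seed, n_chars, window=3):
    """Greedy decoding: repeatedly append the argmax of the last step's output.

    predict_fn maps an array of shape (1, window, 7) to per-step
    letter probabilities of the same shape.
    """
    result = seed
    for _ in range(n_chars):
        probs = np.asarray(predict_fn(encode(result[-window:])[np.newaxis]))
        result += LETTERS[int(np.argmax(probs[0, -1]))]
    return result

# Stand-in "perfect" predictor for demonstration only: it finds the input window
# inside "helloworld" and returns the true one-step-shifted window.
PHRASE = "helloworld"
def oracle(x):
    text = "".join(LETTERS[i] for i in x[0].argmax(axis=-1))
    start = PHRASE.index(text)
    return encode(PHRASE[start + 1:start + 1 + len(text)])[np.newaxis]

print(generate(oracle, "hel", 7))  # helloworld
```

With the notebook's model, passing something like lambda x: K.eval(model(x)) as predict_fn should reproduce the loop above, assuming the model accepts a float32 array of shape (1, 3, 7).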


Next, we train:

In [9]:
model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.01), loss=tf.keras.losses.CategoricalCrossentropy())
model.fit(x_train, y_train, batch_size=24, epochs=300, verbose=0)

<tensorflow.python.keras.callbacks.History at 0x1ceb9ba1d08>
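The loss passed to compile, CategoricalCrossentropy, compares the softmax output at each of the three positions with the one-hot target letter. Below is a minimal NumPy sketch of that computation, assuming the default reduction that averages over samples and time steps; categorical_crossentropy here is a hypothetical helper, not Keras's own code.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Average of -sum(y_true * log(y_pred)) over every sample and time step."""
    # Clip so that log never sees an exact 0 (Keras similarly clips by a small epsilon).
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    per_step = -np.sum(y_true * np.log(y_pred), axis=-1)  # shape (N, T)
    return float(per_step.mean())

LETTERS = ['e', 'd', 'h', 'l', 'o', 'r', 'w']
# Target for the input "hel" is "ell", one-hot encoded: shape (1, 3, 7).
y_true = np.eye(7)[[LETTERS.index(c) for c in "ell"]][np.newaxis]

uniform = np.full((1, 3, 7), 1.0 / 7)  # an uninformed network: every letter equally likely
print(categorical_crossentropy(y_true, uniform))  # ln(7), about 1.946
print(categorical_crossentropy(y_true, y_true))   # a perfect prediction, close to 0
```

The uniform case gives a handy baseline: a loss near ln 7 means the network is doing no better than guessing among the 7 letters.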

After training, the network reproduces the training phrase "helloworld" from the 3-letter seed:

In [10]:
result = seed
input_data = np.array([[letter_to_encoding(letter) for letter in seed]])
for i in range(7):
    out = model(input_data)
    probs = K.eval(out)
    # Take the prediction from the last time step only.
    next_letter = letters[np.argmax(probs[0, -1])]
    result += next_letter
    print(result)
    input_data = np.array([[letter_to_encoding(letter) for letter in result[-3:]]])

hell
hello
hellow
hellowo
hellowor
helloworl
helloworld
