# Character RNN

*Last Updated: September 9, 2017*

The [GPU](http://realai.org/course/tensorflow/#gpu) VM built in the last session lets us explore more interesting neural networks in a reasonable amount of time. In this session, we use TensorFlow to build a character-level recurrent neural network (RNN) on English text. This kind of model is widely known for its [unreasonable effectiveness](http://karpathy.github.io/2015/05/21/rnn-effectiveness/).

First we list all the necessary modules and constants. Although they appear at the very beginning, they were collected over time as the code was gradually developed.

In [1]:
import numpy as np
import tensorflow as tf
import time

# Model parameters
HIDDEN_SIZE = 64
NUM_LAYERS = 2
NUM_STEPS = 32

# Training parameters
BATCH_SIZE = 32
NUM_EPOCHS = 4

# Testing parameters
INFERENCE_LEN = 200
TEMPERATURE = 2.0

# Output parameters
INFER_INTERVAL = 1000
LOGDIR = "/tmp/char_RNN"

# Data parameters
FILE_PATH = "data/shakespeare.txt"
STRIDE = 32

The training data consist of the characters in a single text file. The file `shakespeare.txt` used here was [downloaded](https://github.com/hzy46/Char-RNN-TensorFlow/raw/master/data/shakespeare.txt) from a nicely implemented [GitHub project](https://github.com/hzy46/Char-RNN-TensorFlow) of character RNNs. Its first 10,000 lines are collected in `shakespeare_10K.txt`, a smaller file that is more convenient for development.

In [2]:
%ls -R data/

data/:
shakespeare_10K.txt  shakespeare.txt


Each unique character in the entire file is encoded as an integer:

In [3]:
# Read character file into a huge string
with open(FILE_PATH, 'r') as f:
    raw_data = f.read()

# Encoder and decoder for characters
char_set = set(raw_data)

encode_dict = {}
decode_dict = {}
size_charset = 0

for char in char_set:
    encode_dict[char] = size_charset
    decode_dict[size_charset] = char
    size_charset += 1

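# Encode raw data into an array of integer character codes.
# An integer dtype matches the tf.int32 placeholders the data is fed into.
data_len = len(raw_data)
data = np.zeros(data_len, dtype=np.int32)

for i in range(data_len):
    data[i] = encode_dict[raw_data[i]]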

del(raw_data)
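
As a quick sanity check of this encoding scheme, the following sketch builds the same kind of lookup tables on a short string and verifies that decoding inverts encoding. The names `toy_text`, `toy_encode` and `toy_decode` are hypothetical and not part of the notebook. Sorting the character set is an assumption added here so that each character gets the same integer on every run; the notebook iterates over an unordered `set`.

```python
# Toy stand-in for the contents of the training file (hypothetical example text)
toy_text = "to be or not to be"

# Sorting is an addition for reproducibility; a plain set has no fixed order
toy_chars = sorted(set(toy_text))

# Character -> integer and integer -> character lookup tables
toy_encode = {char: index for index, char in enumerate(toy_chars)}
toy_decode = {index: char for char, index in toy_encode.items()}

# Encode the text, then decode it again
toy_encoded = [toy_encode[char] for char in toy_text]
toy_round_trip = "".join(toy_decode[index] for index in toy_encoded)

print(toy_chars)
print(toy_encoded)
print(toy_round_trip == toy_text)
```

A fixed ordering matters mostly when a trained model is saved and reloaded later, since the integer codes must then mean the same characters in both runs.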

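Batches are generated from the encoded characters. At the beginning of each epoch, the data are reshaped into `BATCH_SIZE` rows so that inputs and targets can be sliced out easily. The target has the same shape as the input: each target character is the one that immediately follows the corresponding input character in the data file. The generator below reads `batch_len` and `all_batches_data`, which the training loop sets at the start of each epoch.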

In [4]:
# Build a data generator for each epoch
def all_batches():
    for start in range(0, batch_len-NUM_STEPS, STRIDE):
        yield (all_batches_data[:, start:(start + NUM_STEPS)], 
               all_batches_data[:, (start + 1):(start + NUM_STEPS + 1)])
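
To make the relationship between input and target concrete, here is a minimal pure-Python sketch of the same slicing on a toy string. The names `toy_rows` and `toy_batches` and the small sizes are assumptions chosen for illustration; the notebook works on a NumPy array of encoded characters instead.

```python
# Toy sizes (hypothetical; the notebook uses 32 for all three constants)
TOY_BATCH_SIZE = 2
TOY_NUM_STEPS = 4
TOY_STRIDE = 4

toy_data = "abcdefghijklmnopqrstuvwxyz"

# Cut the data into TOY_BATCH_SIZE rows of equal length, as np.reshape does
toy_batch_len = len(toy_data) // TOY_BATCH_SIZE
toy_rows = [toy_data[r * toy_batch_len:(r + 1) * toy_batch_len]
            for r in range(TOY_BATCH_SIZE)]

def toy_batches():
    # Each target window is the input window shifted right by one character
    for start in range(0, toy_batch_len - TOY_NUM_STEPS, TOY_STRIDE):
        inputs = [row[start:start + TOY_NUM_STEPS] for row in toy_rows]
        targets = [row[start + 1:start + TOY_NUM_STEPS + 1] for row in toy_rows]
        yield inputs, targets

batches = list(toy_batches())
for inputs, targets in batches:
    print(inputs, "->", targets)

# Number of batches per epoch, using the step-count formula from the training loop
expected_steps = (toy_batch_len - TOY_NUM_STEPS - 1) // TOY_STRIDE + 1
print(len(batches) == expected_steps)
```

The upper bound `toy_batch_len - TOY_NUM_STEPS` in the `range` leaves room for the one extra character that the last target window needs.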

Our model is a [`tf.nn.dynamic_rnn`](https://www.tensorflow.org/api_docs/python/tf/nn/dynamic_rnn) of cells that belong to the class [`tf.contrib.rnn.MultiRNNCell`](https://www.tensorflow.org/api_docs/python/tf/contrib/rnn/MultiRNNCell), which is composed sequentially of [`tf.contrib.rnn.BasicLSTMCell`](https://www.tensorflow.org/api_docs/python/tf/contrib/rnn/BasicLSTMCell). In plain English, it's a multi-layer LSTM model whose computational details are conveniently packaged in TensorFlow modules, so that we can just "describe" them in a few lines of code.
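
For readers curious about what one of these packaged cells computes, the sketch below writes out a single step of a basic LSTM cell in plain Python. It is a simplified illustration under stated assumptions, not TensorFlow's implementation: `lstm_step` and its argument layout are hypothetical, each gate gets its own weight rows, and a constant `forget_bias` of 1.0 is added to the forget gate, which is the default in `BasicLSTMCell`.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h, c, weights, biases, forget_bias=1.0):
    """One step of a basic LSTM cell on plain Python lists.

    x: input vector, h: previous output, c: previous cell state.
    weights[gate][k] is the weight row of hidden unit k over the list x + h.
    biases[gate][k] is the matching bias. Gates: "i", "g", "f", "o".
    """
    xh = x + h  # concatenation of the input and the previous output
    new_h, new_c = [], []
    for k in range(len(h)):
        # Pre-activation of each gate for hidden unit k
        pre = {gate: sum(w * v for w, v in zip(weights[gate][k], xh)) + biases[gate][k]
               for gate in ("i", "g", "f", "o")}
        # Keep part of the old cell state, then add the gated candidate value
        c_k = (sigmoid(pre["f"] + forget_bias) * c[k]
               + sigmoid(pre["i"]) * math.tanh(pre["g"]))
        new_c.append(c_k)
        # The output is the squashed cell state scaled by the output gate
        new_h.append(sigmoid(pre["o"]) * math.tanh(c_k))
    return new_h, new_c

# With all-zero parameters the cell only scales its state by sigmoid(forget_bias)
zero_w = {gate: [[0.0, 0.0, 0.0]] for gate in ("i", "g", "f", "o")}
zero_b = {gate: [0.0] for gate in ("i", "g", "f", "o")}
h1, c1 = lstm_step(x=[1.0, 0.0], h=[0.0], c=[2.0], weights=zero_w, biases=zero_b)
print(h1, c1)
```

Stacking several such cells, with the output of one layer used as the input of the next, is what `MultiRNNCell` arranges.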

During training, inputs to the model are encoded characters of shape [BATCH_SIZE, NUM_STEPS]. During testing, the model generates text one character at a time, starting from a single character of shape [1, 1]. To serve both training and testing, our computation graph begins with an input of shape [None, None]. A new innermost axis is then added by [`tf.one_hot`](https://www.tensorflow.org/api_docs/python/tf/one_hot).

In [5]:
with tf.name_scope("Input_Characters"):
    encoded_input = tf.placeholder(tf.int32, (None, None), name="Encoded_Input")
    one_hot_input = tf.one_hot(encoded_input, size_charset, name="One_Hot_Input")

cell = tf.contrib.rnn.MultiRNNCell(
    [tf.contrib.rnn.BasicLSTMCell(HIDDEN_SIZE) for _ in range(NUM_LAYERS)]
)

initial_state = cell.zero_state(tf.shape(one_hot_input)[0], tf.float32)
outputs, out_state = tf.nn.dynamic_rnn(cell, one_hot_input, initial_state=initial_state)

# Output logits
logits = tf.layers.dense(outputs, size_charset, activation=None, name="Logits")
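
The effect of the one-hot step on shapes can be checked without TensorFlow. The sketch below uses a hypothetical helper `one_hot` on nested Python lists to show how a `[batch, steps]` array of integers becomes `[batch, steps, vocabulary size]`; the toy values are assumptions for illustration.

```python
def one_hot(indices, depth):
    """Replace every integer in a 2-D nested list by a one-hot vector of length depth."""
    return [[[1.0 if position == index else 0.0 for position in range(depth)]
             for index in row]
            for row in indices]

# A batch of 2 sequences, 3 steps each, over a 4-character vocabulary (toy values)
toy_indices = [[0, 2, 1],
               [3, 3, 0]]
toy_one_hot = one_hot(toy_indices, depth=4)

# The shape grows from [2, 3] to [2, 3, 4]
shape = (len(toy_one_hot), len(toy_one_hot[0]), len(toy_one_hot[0][0]))
print(shape)
print(toy_one_hot[0][1])
```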

We use the Adam optimizer to minimize cross entropy loss with the target characters:

In [6]:
with tf.name_scope("Target_Characters"):
    encoded_target = tf.placeholder(tf.int32, (None, NUM_STEPS), name="Encoded_Target")
    target = tf.one_hot(encoded_target, len(char_set), name="One_Hot_Target")

# Loss, training and sampling
with tf.name_scope("Loss"):
    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=target, logits=logits),
        name="Mean")

with tf.name_scope("Optimizer"):
    train = tf.train.AdamOptimizer(learning_rate=0.001, name="Adam").minimize(loss)
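
As a reference point for reading the training loss, the following sketch computes softmax cross entropy for a single prediction in plain Python. The helper name `softmax_cross_entropy` is hypothetical, and the vocabulary size of 50 is an arbitrary illustrative choice, not the size of the character set in `shakespeare.txt`. It shows that a model with no preference among V characters has a loss of ln(V), and that favoring the correct character lowers the loss.

```python
import math

def softmax_cross_entropy(logits, target_index):
    """Cross entropy between softmax(logits) and a one-hot target."""
    # Subtract the maximum logit for numerical stability
    largest = max(logits)
    shifted = [value - largest for value in logits]
    log_sum_exp = math.log(sum(math.exp(value) for value in shifted))
    # Equals -log(softmax(logits)[target_index])
    return log_sum_exp - shifted[target_index]

vocab_size = 50  # illustrative vocabulary size, an assumption

# Equal logits: the model has no preference, so the loss is ln(vocab_size)
uniform_loss = softmax_cross_entropy([0.0] * vocab_size, target_index=7)
print(uniform_loss, math.log(vocab_size))

# A logit that favors the correct character lowers the loss
confident_logits = [0.0] * vocab_size
confident_logits[7] = 5.0
confident_loss = softmax_cross_entropy(confident_logits, target_index=7)
print(confident_loss)
```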

`TEMPERATURE` is a hyperparameter that controls how confidently we'd like our model to behave. A lower temperature exaggerates the differences among logits, so that [`tf.multinomial`](https://www.tensorflow.org/api_docs/python/tf/multinomial) picks the preferred characters with overwhelmingly higher probability:

In [7]:
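with tf.name_scope("Sample"):
    # tf.multinomial expects unnormalized log-probabilities (logits), so the
    # temperature-scaled logits are passed directly, without tf.exp
    sample = tf.multinomial(logits[:, -1] / TEMPERATURE, 1)[:, 0]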
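
The effect of temperature can be seen on a small example. The sketch below, in plain Python with the hypothetical names `softmax_with_temperature` and `toy_logits`, divides the logits by the temperature before applying softmax, which is the usual form of temperature sampling, and then draws one character index with `random.choices`.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities after dividing them by the temperature."""
    scaled = [value / temperature for value in logits]
    # Subtract the maximum for numerical stability
    largest = max(scaled)
    exps = [math.exp(value - largest) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]

toy_logits = [2.0, 1.0, 0.0]  # toy scores for three characters

cold = softmax_with_temperature(toy_logits, temperature=0.5)
neutral = softmax_with_temperature(toy_logits, temperature=1.0)
hot = softmax_with_temperature(toy_logits, temperature=2.0)
print(cold)
print(neutral)
print(hot)

# Draw one character index from the high-temperature distribution
random.seed(0)
sampled_index = random.choices(range(len(toy_logits)), weights=hot, k=1)[0]
print(sampled_index)
```

The probability of the top-scoring character is highest at the low temperature and lowest at the high one, so a high temperature yields more varied but less reliable text.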

In [8]:
# Take a look at the computation graph
if tf.gfile.Exists(LOGDIR):
    tf.gfile.DeleteRecursively(LOGDIR)
tf.gfile.MakeDirs(LOGDIR)

writer = tf.summary.FileWriter(LOGDIR, tf.get_default_graph())

At this point, the computation graph is complete and looks like this:

![](http://realai.org/course/tensorflow/char-RNN-1.png)

Testing is conducted by running a partially trained model for many steps from a random starting character, feeding each sampled character and the resulting RNN state back in as the next input:

In [9]:
def run_inference():
    # Initialization: one-character input and initial RNN state
    new_index = [np.random.randint(0, size_charset)]
    new_input = [new_index]
    print(decode_dict[new_index[0]], end='')
    new_state = sess.run(initial_state, feed_dict={encoded_input: new_input})

    for i in range(INFERENCE_LEN - 1):
        new_index, new_state = sess.run([sample, out_state], feed_dict={encoded_input: new_input, initial_state: new_state})
        new_input = [new_index]
        print(decode_dict[new_index[0]], end='')
    print("\n")

Start session:

In [10]:
sess = tf.Session()
sess.run(tf.global_variables_initializer())

Full training using a cloud [GPU](https://cloud.google.com/compute/pricing#gpus) on an n1-standard-2 (2 vCPUs, 7.5 GB memory) machine should take less than 15 minutes:

In [11]:
%%time
for e in range(NUM_EPOCHS):
    
    # Set up batched character input data from random start
    seed = np.random.randint(0, NUM_STEPS)
    batch_len = (data_len - seed) // BATCH_SIZE
    all_batches_data = np.reshape(data[seed:(seed + BATCH_SIZE*batch_len)], (BATCH_SIZE, batch_len))
        
    print("***** Epoch {} runs in {} steps *****".format(e, (batch_len-NUM_STEPS-1) // STRIDE + 1))
    start = time.time()
    step_counter = 0
    for batch in all_batches():
        # The training step
        Loss, _ = sess.run((loss, train), feed_dict={encoded_input: batch[0], encoded_target: batch[1]})
        
        step_counter += 1
        if step_counter % INFER_INTERVAL == 0:
            print("Step {} training loss is {:.5f}. Time: {:.2f}s".format(step_counter, Loss, time.time() - start))
            start = time.time()
            run_inference()
        
    print("End of epoch {} training loss is {:.5f}".format(e, Loss))
    run_inference()

***** Epoch 0 runs in 4466 steps *****
Step 1000 training loss is 2.47922. Time: 41.00s
HnW Voppan pothe me bereAd the the anle the beresG
Thep more the the cout the ther the LoHn jorithe the son the me the soud he the the wond me the sorge fo gore borens the the me founs the the coreeut

Step 2000 training loss is 2.27208. Time: 41.09s
be the gound Xe the sor in the core and Bather the the reand the QoThe gour the colle so of me the we she the in role head he of she the so le so the the the the and she be not and the the the and the

Step 3000 training loss is 2.14611. Time: 40.62s
well and the the the cound in the cond the be the me your will he sore heI so gore not you hear in with her the with of and the the wistary I all the the goo he the condering sone me deded so and her 

Step 4000 training loss is 2.10461. Time: 40.42s
id here [orter dow me the will and and and courd in here the wist ;
Here the $all me with me sim be the locke, spare shat the sand the your the mare the loth m