# Task 2: Char-RNN

Char-RNN implements multi-layer recurrent neural networks (vanilla RNN, LSTM, and GRU) for training and sampling from character-level language models. In other words, the model takes a text file as input and trains a recurrent neural network to predict the next character in a sequence. The trained network can then generate text, character by character, that resembles the original training data. The network was first published by Andrej Karpathy; his original code, written in *Lua*, is available at https://github.com/karpathy/char-rnn.

Here we will implement Char-RNN using TensorFlow!

In [1]:
import time
import numpy as np
import tensorflow as tf

# Notebook auto reloads code. (Ref: http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython)
%load_ext autoreload
%autoreload 2

## Part 1: Setup
In this part, we read the input text and process it for network training. There are two txt files in the data folder; to keep computing time manageable, we use tinyshakespeare.txt here.

In [2]:
with open('data/tinyshakespeare.txt', 'r') as f:
    text=f.read()
# length of text is the number of characters in it
print('Length of text: {} characters'.format(len(text)))
# and let's get a glance of what the text is
print(text[:500])

Length of text: 1115394 characters
First Citizen:
Before we proceed any further, hear me speak.

All:
Speak, speak.

First Citizen:
You are all resolved rather to die than to famish?

All:
Resolved. resolved.

First Citizen:
First, you know Caius Marcius is chief enemy to the people.

All:
We know't, we know't.

First Citizen:
Let us kill him, and we'll have corn at our own price.
Is't a verdict?

All:
No more talking on't; let it be done: away, away!

Second Citizen:
One word, good citizens.

First Citizen:
We are accounted poor


In [3]:
# The unique characters in the file
vocab = sorted(set(text))
print ('{} unique characters'.format(len(vocab)))

65 unique characters


In [4]:
# Creating a mapping from unique characters to indices
vocab_to_ind = {c: i for i, c in enumerate(vocab)}
ind_to_vocab = dict(enumerate(vocab))
text_as_int = np.array([vocab_to_ind[c] for c in text], dtype=np.int32)

# Characters are mapped to indices from 0 to len(vocab) - 1; show the first 20 mappings
for char,_ in zip(vocab_to_ind, range(20)):
    print('{:6s} ---> {:4d}'.format(repr(char), vocab_to_ind[char]))
# Show how the first 10 characters from the text are mapped to integers
print ('{} --- characters mapped to int --- > {}'.format(text[:10], text_as_int[:10]))

'\n'   --->    0
' '    --->    1
'!'    --->    2
'$'    --->    3
'&'    --->    4
"'"    --->    5
','    --->    6
'-'    --->    7
'.'    --->    8
'3'    --->    9
':'    --->   10
';'    --->   11
'?'    --->   12
'A'    --->   13
'B'    --->   14
'C'    --->   15
'D'    --->   16
'E'    --->   17
'F'    --->   18
'G'    --->   19
First Citi --- characters mapped to int --- > [18 47 56 57 58  1 15 47 58 47]


In [18]:
print(vocab_to_ind)

{'\n': 0, ' ': 1, '!': 2, '$': 3, '&': 4, "'": 5, ',': 6, '-': 7, '.': 8, '3': 9, ':': 10, ';': 11, '?': 12, 'A': 13, 'B': 14, 'C': 15, 'D': 16, 'E': 17, 'F': 18, 'G': 19, 'H': 20, 'I': 21, 'J': 22, 'K': 23, 'L': 24, 'M': 25, 'N': 26, 'O': 27, 'P': 28, 'Q': 29, 'R': 30, 'S': 31, 'T': 32, 'U': 33, 'V': 34, 'W': 35, 'X': 36, 'Y': 37, 'Z': 38, 'a': 39, 'b': 40, 'c': 41, 'd': 42, 'e': 43, 'f': 44, 'g': 45, 'h': 46, 'i': 47, 'j': 48, 'k': 49, 'l': 50, 'm': 51, 'n': 52, 'o': 53, 'p': 54, 'q': 55, 'r': 56, 's': 57, 't': 58, 'u': 59, 'v': 60, 'w': 61, 'x': 62, 'y': 63, 'z': 64}


In [19]:
vocab_to_ind["d"]

42
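
The two lookup tables are inverses of each other, so encoding a string and decoding the result should return the original string. The sketch below checks that round trip on a short sample; the names `toy_vocab`, `toy_char_to_ind`, `toy_ind_to_char`, `encode`, and `decode` are illustrative and are not part of the assignment code.

```python
import numpy as np

sample_text = "First Citizen"  # short illustrative string, not the full corpus

# Build the vocabulary and both lookup tables the same way as in the cells above
toy_vocab = sorted(set(sample_text))
toy_char_to_ind = {c: i for i, c in enumerate(toy_vocab)}
toy_ind_to_char = dict(enumerate(toy_vocab))

def encode(s):
    """Map a string to an int32 array of vocabulary indices."""
    return np.array([toy_char_to_ind[c] for c in s], dtype=np.int32)

def decode(indices):
    """Map an iterable of vocabulary indices back to a string."""
    return ''.join(toy_ind_to_char[int(i)] for i in indices)

encoded = encode(sample_text)
decoded = decode(encoded)
print(encoded)
print(decoded)
```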

## Part 2: Creating batches
Now that we have preprocessed the input data, we need to partition it. We will train the model on mini-batches, so how should we define a batch?

Let's first clarify two concepts:
1. **batch_size**: Recall batches in a CNN: if we have 100 samples and set batch_size to 10, we send 10 samples to the network at a time. In an RNN, batch_size has the same meaning: it defines how many samples we send to the network at a time.
2. **sequence_length**: An RNN stores memory in its cells and passes information from one step to the next, so it also has a sequence_length, also called 'steps', which defines how long each sequence is.

Given these two concepts, we can clarify what batch_size means in an RNN. Let N be the number of sequences in a batch and M the length of each sequence. batch_size **still** represents the number of sequences in a batch, but the data in one batch is actually an array of size **[N, M]**.
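
To make the [N, M] layout concrete, here is a small sketch on a toy array. The values `N = 3`, `M = 4`, and the array `toy` are arbitrary stand-ins for the real settings and for `text_as_int`.

```python
import numpy as np

N, M = 3, 4                      # sequences per batch, steps per sequence
toy = np.arange(30)              # stand-in for text_as_int

chars_per_batch = N * M
n_full = len(toy) // chars_per_batch       # 30 // 12 = 2 full batches
trimmed = toy[:n_full * chars_per_batch]   # keep 24 values, drop the remaining 6
rows = trimmed.reshape((N, -1))            # shape (3, 8): one long row per sequence slot

first_batch = rows[:, :M]                  # shape (N, M)
second_batch = rows[:, M:2 * M]            # the next M columns of every row
print(first_batch)
print(second_batch)
```

Each row of `second_batch` continues the same row of `first_batch`, so every sequence slot reads one contiguous stretch of the text across consecutive batches.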

<span style="color:red">TODO:</span>
finish the get_batches() function below to generate mini-batches.

Hint: this function defines a generator, use *yield*.

In [101]:
def get_batches(array, n_seqs, n_steps):
    '''
    Partition data array into mini-batches
    input:
    array: input data
    n_seqs: number of sequences in a batch n
    n_steps: length of each sequence m
    output:
    x: inputs
    y: targets, which is x with one position shift
       you can check the following figure to get a sense of what a target looks like
    '''
    # Number of characters consumed per batch (n_seqs sequences of n_steps each)
    batch_size = n_seqs * n_steps
    n_batches = len(array) // batch_size
    # Keep only the full batches and drop the remainder.
    array = array[:batch_size * n_batches]
    array = array.reshape((n_seqs, -1))
    
    # Loop over the columns in windows of n_steps to generate inputs and targets
    #############################################
    #           TODO: YOUR CODE HERE            #
    #############################################
    while True:  # cycle through the data indefinitely so training can run for any number of steps
        for i in range(0, array.shape[1], n_steps):
            x = array[:, i:i + n_steps]
            # Targets are the inputs shifted left by one position
            y = np.zeros_like(x)
            y[:, :-1] = x[:, 1:]
            # The last target wraps around to the first input of the same window,
            # not the true next character in the text
            y[:, -1] = x[:, 0]
            yield x, y

In [102]:
batches = get_batches(text_as_int, 10, 10)
x, y = next(batches)
print('x\n', x[:10, :10])
print('\ny\n', y[:10, :10])

x
 [[18 47 56 57 58  1 15 47 58 47]
 [ 1 43 52 43 51 63 11  0 37 43]
 [52 58 43 42  1 60 47 56 58 59]
 [56 44 53 50 49  6  0 27 52  1]
 [47 52  1 57 54 47 58 43  1 53]
 [56 57  6  1 39 52 42  1 57 58]
 [46 47 51  1 42 53 61 52  1 58]
 [ 1 40 43 43 52  1 57 47 52 41]
 [50 58 57  1 51 39 63  1 57 46]
 [57 47 53 52  1 53 44  1 56 43]]

y
 [[47 56 57 58  1 15 47 58 47 18]
 [43 52 43 51 63 11  0 37 43  1]
 [58 43 42  1 60 47 56 58 59 52]
 [44 53 50 49  6  0 27 52  1 56]
 [52  1 57 54 47 58 43  1 53 47]
 [57  6  1 39 52 42  1 57 58 56]
 [47 51  1 42 53 61 52  1 58 46]
 [40 43 43 52  1 57 47 52 41  1]
 [58 57  1 51 39 63  1 57 46 50]
 [47 53 52  1 53 44  1 56 43 57]]
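
In the output above, the last column of `y` equals the first column of `x` (row 0 ends in 18, which is `x[0, 0]`), so the final target of each window is not the character that actually follows in the text. One possible alternative, sketched here under the assumption that a single pass over the data is enough, takes the targets from the array shifted by one character before reshaping. The function name `get_batches_true_shift` is hypothetical and is not part of the assignment.

```python
import numpy as np

def get_batches_true_shift(array, n_seqs, n_steps):
    """Hypothetical variant of get_batches: one pass, true next-character targets."""
    chars_per_batch = n_seqs * n_steps
    # Reserve one extra character so every input position has a real successor
    n_batches = (len(array) - 1) // chars_per_batch
    usable = n_batches * chars_per_batch
    inputs = array[:usable].reshape((n_seqs, -1))
    targets = array[1:usable + 1].reshape((n_seqs, -1))
    for i in range(0, inputs.shape[1], n_steps):
        yield inputs[:, i:i + n_steps], targets[:, i:i + n_steps]

toy = np.arange(25)              # stand-in for text_as_int
x_toy, y_toy = next(get_batches_true_shift(toy, 2, 3))
print(x_toy)
print(y_toy)
```

Unlike the generator above, this sketch stops after one pass, so a training loop that needs more steps than there are batches would have to restart it.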


## Part 3: Build Char-RNN model
In this section, we build our Char-RNN model. It consists of an input layer, an rnn_cell layer, an output layer, a loss, and an optimizer, which we build one by one.

The goal is to predict new text given a prime word, so for the training data we have to define inputs and targets. The figure below shows the structure of the Char-RNN network.

![structure](img/charrnn.jpg)
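
As a rough illustration of the input, recurrent cell, output, and loss pipeline, the sketch below runs the forward pass of a single vanilla RNN layer in NumPy. It is not the `ecbm4040.CharRNN` implementation, which uses TensorFlow LSTM or GRU cells; the hidden size of 16, the random weights, and the function name `forward_loss` are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, hidden_size = 65, 16   # 65 matches the vocabulary above; 16 is arbitrary

# Randomly initialised parameters of one vanilla RNN layer (illustrative only)
W_xh = rng.normal(0, 0.01, (vocab_size, hidden_size))
W_hh = rng.normal(0, 0.01, (hidden_size, hidden_size))
b_h = np.zeros(hidden_size)
W_hy = rng.normal(0, 0.01, (hidden_size, vocab_size))
b_y = np.zeros(vocab_size)

def forward_loss(x_ids, y_ids):
    """Average next-character cross-entropy for one sequence of indices."""
    h = np.zeros(hidden_size)
    total = 0.0
    for x_id, y_id in zip(x_ids, y_ids):
        # Multiplying a one-hot input by W_xh is just a row lookup
        h = np.tanh(W_xh[x_id] + h @ W_hh + b_h)
        logits = h @ W_hy + b_y
        logits = logits - logits.max()              # numerical stability
        probs = np.exp(logits) / np.exp(logits).sum()
        total += -np.log(probs[y_id])
    return total / len(x_ids)

x_ids = [18, 47, 56, 57, 58]     # "First" under the mapping printed earlier
y_ids = [47, 56, 57, 58, 1]      # "irst ": the same text shifted by one character
loss = forward_loss(x_ids, y_ids)
print(loss)
```

With weights this small the predicted distribution is close to uniform, so the loss lands near ln(65), about 4.17.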

<span style="color:red">TODO:</span>
finish all TODOs in ecbm4040.CharRNN and the blanks in the following cells.

**Note: With the following parameter settings, training takes about 20 minutes on a GTX 1070 GPU, so we suggest using GCP for this task.**

In [85]:
from ecbm4040.CharRNN import *

### Training
With sampling set to False (the default), we can start training the network. Checkpoints are saved automatically in the checkpoints folder.

In [86]:
# these are preset parameters; you can change them to get better results
batch_size = 100         # Sequences per batch
num_steps = 100          # Number of sequence steps per batch
rnn_size = 512           # Size of hidden layers in rnn_cell
num_layers = 5           # Number of hidden layers
learning_rate = 0.005    # Learning rate

In [87]:
model = CharRNN(len(vocab), batch_size, num_steps, 'LSTM', rnn_size,
               num_layers, learning_rate)
batches = get_batches(text_as_int, batch_size, num_steps)
model.train(batches, 6000, 2000)

step: 200  loss: 3.3446  0.8193 sec/batch
step: 400  loss: 3.2857  0.8199 sec/batch
step: 600  loss: 3.3308  0.8184 sec/batch
step: 800  loss: 3.2967  0.8011 sec/batch
step: 1000  loss: 3.2966  0.8068 sec/batch
step: 1200  loss: 3.3238  0.8057 sec/batch
step: 1400  loss: 3.3161  0.8133 sec/batch
step: 1600  loss: 3.3250  0.8124 sec/batch
step: 1800  loss: 3.3092  0.8103 sec/batch
step: 2000  loss: 3.3053  0.8196 sec/batch
step: 2200  loss: 3.3177  0.8099 sec/batch
step: 2400  loss: 2.4816  0.8176 sec/batch
step: 2600  loss: 2.2422  0.8047 sec/batch
step: 2800  loss: 2.0142  0.8076 sec/batch
step: 3000  loss: 1.8888  0.8081 sec/batch
step: 3200  loss: 1.8101  0.8092 sec/batch
step: 3400  loss: 1.6927  0.8099 sec/batch
step: 3600  loss: 1.6893  0.8192 sec/batch
step: 3800  loss: 1.6430  0.8053 sec/batch
step: 4000  loss: 1.6199  0.8132 sec/batch
step: 4200  loss: 1.5851  0.8120 sec/batch
step: 4400  loss: 1.5546  0.8140 sec/batch
step: 4600  loss: 1.5524  0.8097 sec/batch
step: 4800  los

In [88]:
# look up checkpoints
tf.train.get_checkpoint_state('checkpoints')

model_checkpoint_path: "checkpoints/i6000_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/i2000_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/i4000_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/i6000_l512.ckpt"

### Sampling
With sampling set to True, we can generate new characters one by one. We can use the saved checkpoints to see how the network learned gradually.
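
How `model.sample` chooses each character is not shown in this notebook. One common approach, sketched below as an assumption rather than as the assignment's method, keeps only the few most likely characters from the predicted distribution, renormalises, and draws one of them at random. The helper name `pick_top_n` and the default `top_n=5` are illustrative.

```python
import numpy as np

def pick_top_n(probs, vocab_size, top_n=5, rng=None):
    """Sample an index from the top_n most likely entries of probs (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    p = np.array(probs, dtype=np.float64).reshape(-1)
    # Zero out everything except the top_n largest probabilities
    p[np.argsort(p)[:-top_n]] = 0.0
    # Renormalise so the remaining probabilities sum to 1
    p /= p.sum()
    return int(rng.choice(vocab_size, p=p))

# Toy distribution over a 5-character vocabulary
toy_probs = [0.1, 0.5, 0.05, 0.3, 0.05]
next_id = pick_top_n(toy_probs, len(toy_probs), top_n=2)
print(next_id)
```

With `top_n=2`, only indices 1 and 3 can be drawn here. Restricting the draw this way trades diversity for fewer improbable characters.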

In [89]:
model = CharRNN(len(vocab), batch_size, num_steps, 'LSTM', rnn_size,
               num_layers, learning_rate, sampling=True)
# choose the last checkpoint and generate new text
checkpoint = tf.train.latest_checkpoint('checkpoints')
samp = model.sample(checkpoint, 1000, len(vocab), vocab_to_ind, ind_to_vocab, prime="LORD ")
print(samp)

INFO:tensorflow:Restoring parameters from checkpoints/i6000_l512.ckpt
LORD ESS OF YORK:
Thou wilt not speak thy feads to his men
I am their house of them. Which was it so.
Once more stould to thy force; but thy heirs of thee.

CAMILLO:
I will be comport in their hearts,
But I
to me worthiest sinses one a sound; and so their wealth,
But say to be heaven so so my billing,
And where he shall he must and hence would never
wint, and though an onestard weak o' her braws to the silence and be pity at my house
Till time is to the wretched,

MROTUN:
Who shall be plack'd the wind what thy son of your
wills and the hours of this shorms, all things;
And hate time will himself. That she shall be a wish,
And have he spent to be that hope;
I so said till, the most golden brise and boter but time
Tell times worthy to my saint with said, thy son,
And take thee words a mine,e

GLOUCESTER:
Why, here ctew me in him where he hours,
And he shoop him that I heard
Wherein you muse to me: till I am none of hea

In [97]:
# choose a checkpoint other than the final one and see the results. It could be nasty, don't worry!
#############################################
#           TODO: YOUR CODE HERE            #
#############################################
model = CharRNN(len(vocab), batch_size, num_steps, 'LSTM', rnn_size,
               num_layers, learning_rate, sampling=True)
# choose an intermediate checkpoint and generate new text
# Note: tf.train.load_checkpoint() returns a CheckpointReader object, not a path.
# Passing that object to model.sample() fails with
# "InternalError: Unable to get element as bytes.", so pass the path string instead.
checkpoint = 'checkpoints/i4000_l512.ckpt'
samp = model.sample(checkpoint, 1000, len(vocab), vocab_to_ind, ind_to_vocab, prime="LORD ")
print(samp)
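
To pick an intermediate checkpoint without typing its path by hand, the step number can be read from the file names. The sketch below assumes the `i<step>_l<size>.ckpt` naming pattern seen in the checkpoint listing earlier; the helper `checkpoint_for_step` is hypothetical. In the notebook, the list of paths could come from `tf.train.get_checkpoint_state('checkpoints').all_model_checkpoint_paths`.

```python
import re

# Paths copied from the checkpoint listing printed earlier
all_paths = [
    "checkpoints/i2000_l512.ckpt",
    "checkpoints/i4000_l512.ckpt",
    "checkpoints/i6000_l512.ckpt",
]

def checkpoint_for_step(paths, step):
    """Return the path whose file name encodes the given step, or None."""
    for path in paths:
        match = re.search(r"i(\d+)_l\d+\.ckpt$", path)
        if match and int(match.group(1)) == step:
            return path
    return None

intermediate = checkpoint_for_step(all_paths, 4000)
print(intermediate)
```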

### Switch to another type of RNN cell
We have been using LSTM cells, as in the original work, but GRU cells are becoming more popular. Let's change the cell in the rnn_cell layer to a GRU cell and see how it performs. The number of training steps should be the same as above.

**Note: You need to change the names of your saved checkpoints, or they will overwrite the LSTM results that you have already saved.**

In [98]:
# these are preset parameters; you can change them to get better results
batch_size = 100         # Sequences per batch
num_steps = 100          # Number of sequence steps per batch
rnn_size = 512           # Size of hidden layers in rnn_cell
num_layers = 5           # Number of hidden layers
learning_rate = 0.005    # Learning rate

model = CharRNN(len(vocab), batch_size, num_steps, 'GRU', rnn_size,
               num_layers, learning_rate)
batches = get_batches(text_as_int, batch_size, num_steps)
model.train(batches, 6000, 2000)

step: 200  loss: 3.3785  0.7822 sec/batch
step: 400  loss: 3.3122  0.7854 sec/batch
step: 600  loss: 3.3568  0.7844 sec/batch
step: 800  loss: 3.3204  0.7850 sec/batch
step: 1000  loss: 3.3199  0.7825 sec/batch
step: 1200  loss: 3.3465  0.7770 sec/batch
step: 1400  loss: 3.3369  0.7825 sec/batch
step: 1600  loss: 3.3491  0.7789 sec/batch
step: 1800  loss: 3.3279  0.7828 sec/batch
step: 2000  loss: 3.3415  0.7812 sec/batch
step: 2200  loss: 3.3456  0.7858 sec/batch
step: 2400  loss: 3.3500  0.7928 sec/batch
step: 2600  loss: 3.3223  0.7912 sec/batch
step: 2800  loss: 3.3396  0.7825 sec/batch
step: 3000  loss: 3.3574  0.7755 sec/batch
step: 3200  loss: 3.2915  0.7863 sec/batch
step: 3400  loss: 3.3540  0.7746 sec/batch
step: 3600  loss: 3.3374  0.7729 sec/batch
step: 3800  loss: 3.3381  0.7725 sec/batch
step: 4000  loss: 3.3427  0.7733 sec/batch
step: 4200  loss: 3.3469  0.7737 sec/batch
step: 4400  loss: 3.3357  0.7720 sec/batch
step: 4600  loss: 3.3158  0.7755 sec/batch
step: 4800  los

In [100]:
model = CharRNN(len(vocab), batch_size, num_steps, 'GRU', rnn_size,
               num_layers, learning_rate, sampling=True)
# choose the last checkpoint and generate new text
checkpoint = tf.train.latest_checkpoint('checkpoints')
samp = model.sample(checkpoint, 1000, len(vocab), vocab_to_ind, ind_to_vocab, prime="LORD ")
print(samp)

INFO:tensorflow:Restoring parameters from checkpoints/gi6000_l512.ckpt
LORD eetteoeoe  e  oe e oo e tatte  eoaa t  attee ttato t  taa eto eae  oaoeeeet eo   oae  a  oe aaottt aat ea tttee ee eea taaeta ot tto aaooee    eetoaeoo oa e t   otee eet o  teo ttaa oeoo t ate eotto oae oote  toaaeet eea et ttooee eee  o o eto      a  eeettatea aotet ao eeo    o  ee   to o oeeeate  aeatea tetoae too tttee  oaa  a aet eaa ta tee oeaee oteot e  oa aoeet o      oaee oea e   a eea te  teoetteoota  ee oaot eet  ee o aaae tao o  aea o t teo t  t  aea ooeoaa aeaeoeee eaaet ao a    tee t   ae oo e  t tt   oaoata aooteooeeateettootoote  ea o  aa aett    e   o taea  t  oo  t   teaoteeeteeet a te  e ootea eetao   o o  eoee t  eooo tae ea  ot   ae  ae   tatta o t  atatoaoattoeteoaate a   eo aoeott  e  t e  eeeeot e t ooet a tetea t  oe  eeattaoaaeo a   o ea  etaat t t   a eoet  etae   a aa  eoe t  eea aaa aaa   te eoa et eaao o  aeaaa eet  e t too eatea  eaoteaoeeo aeete otoa eaeoeotto   oeeettoatettetaoaa

#### Questions
1. Compare the results of the two networks that you built and explain the reasons for the difference. (This is a qualitative comparison; it should be based on the specific models that you built.)
2. Discuss the differences between LSTM cells and GRU cells. What are the pros and cons of using GRU cells?

Answer:

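1. The two networks use the same parameters, and the LSTM network performs much better than the GRU network. The GRU network's training loss hardly decreases, staying near 3.3 through the logged steps, and its sample contains only a handful of frequent characters. The LSTM network's loss, by contrast, drops sharply after about step 2200 and reaches about 1.55 by step 4600.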

2. GRU cells have no output gate, which in an LSTM controls how much of the memory is passed to the next unit, so a GRU exposes its full state without that control. When computing the new memory, a GRU uses its reset gate to control how much of the old memory is used, whereas an LSTM has no reset gate and instead controls the old memory with its forget gate.
Pros: A GRU requires fewer tensor computations, so training is faster.
Cons: With large datasets, an LSTM tends to be better at representation.
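
The claim that a GRU needs less computation can be made concrete by counting parameters. The sketch below uses the standard formulations, in which an LSTM layer has four gate-like weight blocks and a GRU layer has three; some library implementations use a slightly different number of bias terms, so treat the exact counts as approximate. The function names are illustrative.

```python
def lstm_params(input_size, hidden_size):
    """Weights and biases of one standard LSTM layer: 4 gate-like blocks."""
    return 4 * (hidden_size * (input_size + hidden_size) + hidden_size)

def gru_params(input_size, hidden_size):
    """Weights and biases of one standard GRU layer: 3 gate-like blocks."""
    return 3 * (hidden_size * (input_size + hidden_size) + hidden_size)

hidden = 512                       # rnn_size used in this notebook
n_lstm = lstm_params(hidden, hidden)
n_gru = gru_params(hidden, hidden)
ratio = n_gru / n_lstm
print(n_lstm, n_gru, ratio)
```

For a layer whose input and hidden sizes are both 512, the GRU has three quarters of the LSTM's parameters.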