# Task 2: Char-RNN

Char-RNN implements a multi-layer Recurrent Neural Network (RNN, LSTM, or GRU) for training and sampling from character-level language models. In other words, the model takes one text file as input and trains a Recurrent Neural Network that learns to predict the next character in a sequence. The trained RNN can then generate text, character by character, that looks like the original training data. This network was first published by Andrej Karpathy; his original code, written in *Lua*, is available at https://github.com/karpathy/char-rnn.
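Concretely, every position in the text supplies one training example: the input is a character and the target is the character that follows it. A tiny illustration in plain Python (the string `"hello"` and the helper `next_char_pairs` are just for illustration):

```python
def next_char_pairs(text):
    """Return (input_char, target_char) pairs: each character predicts its successor."""
    return list(zip(text[:-1], text[1:]))

pairs = next_char_pairs("hello")
for inp, tgt in pairs:
    print(repr(inp), "->", repr(tgt))
```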

Here we will implement Char-RNN using TensorFlow!

In [1]:
import time
import numpy as np
import tensorflow as tf

# Notebook auto reloads code. (Ref: http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython)
%load_ext autoreload
%autoreload 2

## Part 1: Setup
In this part, we read the input text and preprocess it for later network training. There are two .txt files in the data folder; to keep computing time manageable, we use tinyshakespeare.txt here.

In [2]:
with open('data/tinyshakespeare.txt', 'r') as f:
    text=f.read()
# length of text is the number of characters in it
print('Length of text: {} characters'.format(len(text)))
# and let's get a glance of what the text is
print(text[:500])

Length of text: 1115394 characters
First Citizen:
Before we proceed any further, hear me speak.

All:
Speak, speak.

First Citizen:
You are all resolved rather to die than to famish?

All:
Resolved. resolved.

First Citizen:
First, you know Caius Marcius is chief enemy to the people.

All:
We know't, we know't.

First Citizen:
Let us kill him, and we'll have corn at our own price.
Is't a verdict?

All:
No more talking on't; let it be done: away, away!

Second Citizen:
One word, good citizens.

First Citizen:
We are accounted poor


In [3]:
# The unique characters in the file
vocab = sorted(set(text))
print ('{} unique characters'.format(len(vocab)))

65 unique characters


In [4]:
# Creating a mapping from unique characters to indices
vocab_to_ind = {c: i for i, c in enumerate(vocab)}
ind_to_vocab = dict(enumerate(vocab))
text_as_int = np.array([vocab_to_ind[c] for c in text], dtype=np.int32)

# We map each character to an index from 0 to len(vocab) - 1
for char,_ in zip(vocab_to_ind, range(20)):
    print('{:6s} ---> {:4d}'.format(repr(char), vocab_to_ind[char]))
# Show how the first 10 characters from the text are mapped to integers
print ('{} --- characters mapped to int --- > {}'.format(text[:10], text_as_int[:10]))

'\n'   --->    0
' '    --->    1
'!'    --->    2
'$'    --->    3
'&'    --->    4
"'"    --->    5
','    --->    6
'-'    --->    7
'.'    --->    8
'3'    --->    9
':'    --->   10
';'    --->   11
'?'    --->   12
'A'    --->   13
'B'    --->   14
'C'    --->   15
'D'    --->   16
'E'    --->   17
'F'    --->   18
'G'    --->   19
First Citi --- characters mapped to int --- > [18 47 56 57 58  1 15 47 58 47]
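Because `ind_to_vocab` is the inverse of `vocab_to_ind`, mapping the indices back through it recovers the original text. A self-contained round-trip check on a short string (a stand-in, not the full Shakespeare data):

```python
import numpy as np

sample = "First Citizen:\nBefore we proceed"
vocab = sorted(set(sample))
vocab_to_ind = {c: i for i, c in enumerate(vocab)}
ind_to_vocab = dict(enumerate(vocab))

# encode: characters -> int32 indices in [0, len(vocab) - 1]
encoded = np.array([vocab_to_ind[c] for c in sample], dtype=np.int32)
# decode: indices -> characters (int() turns the NumPy scalar into a plain dict key)
decoded = "".join(ind_to_vocab[int(i)] for i in encoded)

print(encoded[:10], decoded == sample)
```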


## Part 2: Creating batches
Now that we have preprocessed the input data, we need to partition it. We will train the model with mini-batches, so how should these batches be defined?

Let's first clarify the concepts involved:
1. **batch_size**: Recall batches in a CNN: if we have 100 samples and set batch_size to 10, we send 10 samples to the network at a time. In an RNN, batch_size has the same meaning: it defines how many samples (sequences) we send to the network at a time.
2. **sequence_length**: Unlike a CNN, an RNN stores memory in its cells and passes information from one step to the next, so we also need the concept of sequence_length, also called 'steps', which defines how long each sequence is.

Combining these two concepts clarifies what batch_size means for an RNN. Let N be the number of sequences in a batch and M the length of each sequence. batch_size in an RNN **still** represents the number of sequences in a batch (N), but the data in one batch is actually an array of size **[N, M]**.
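As a worked example with the settings used later (N = `batch_size` = 100 sequences, M = `num_steps` = 100 steps) and the 1,115,394-character text from Part 1: one batch holds 100 × 100 = 10,000 characters, so there are 111 full batches and the last 5,394 characters are dropped. The same arithmetic in NumPy, on a stand-in array of that length:

```python
import numpy as np

n_seqs, n_steps = 100, 100            # N sequences per batch, M steps per sequence
text_len = 1115394                    # length of tinyshakespeare.txt (from Part 1)
data = np.arange(text_len)            # stand-in for text_as_int

chars_per_batch = n_seqs * n_steps    # 10,000 characters per [N, M] batch
n_batches = text_len // chars_per_batch
kept = data[:chars_per_batch * n_batches]
dropped = text_len - kept.size

# one row per sequence; each row is a contiguous stream of n_batches * n_steps characters
rows = kept.reshape((n_seqs, -1))
first_batch = rows[:, :n_steps]       # the first [N, M] batch

print(n_batches, dropped, rows.shape, first_batch.shape)
# 111 5394 (100, 11100) (100, 100)
```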

<span style="color:red">TODO:</span>
finish the get_batches() function below to generate mini-batches.

Hint: this function should be a generator; use *yield*.
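If generators are unfamiliar: a function containing `yield` returns an iterator that pauses after each `yield` and resumes on the next `next()` call, so an infinite `while True` loop is fine as long as the caller only asks for as many items as it needs. A minimal cycling generator (the name `cycle_chunks` is illustrative, not part of the assignment):

```python
def cycle_chunks(items, size):
    """Yield consecutive chunks of `size` items forever, wrapping to the start."""
    n_chunks = len(items) // size   # only full chunks; any remainder is dropped
    i = 0
    while True:
        yield items[i * size:(i + 1) * size]
        i = (i + 1) % n_chunks

gen = cycle_chunks(list(range(7)), 3)   # 2 full chunks; the trailing 6 is dropped
chunks = [next(gen) for _ in range(3)]
print(chunks)
# [[0, 1, 2], [3, 4, 5], [0, 1, 2]]
```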

In [5]:
def get_batches(array, n_seqs, n_steps):
    '''
    Partition data array into mini-batches
    input:
    array: input data (1-D array of character indices)
    n_seqs: number of sequences in a batch
    n_steps: length of each sequence
    output:
    x: inputs, shape [n_seqs, n_steps]
    y: targets, which is x shifted by one position
       (see the Char-RNN figure in Part 3 for what a target looks like)
    '''
    batch_size = n_seqs * n_steps
    n_batches = len(array) // batch_size
    # we only keep the full batches and ignore the rest
    array = array[:batch_size * n_batches]
    # one row per sequence; each row holds n_batches * n_steps consecutive characters
    array = array.reshape((n_seqs, -1))

    # You should now create a loop to generate batches for inputs and targets
    #############################################
    #           TODO: YOUR CODE HERE            #
    #############################################
    batch_count = 0
    while True:
        # start over from the beginning once every batch has been served
        if batch_count >= n_batches:
            batch_count = 0
        # the next window of n_steps columns from every row
        start = batch_count * n_steps
        x = array[:, start:start + n_steps]
        # targets: x shifted left by one; np.roll wraps each row's first
        # character into the last target position
        y = np.roll(x, -1, axis=1)
        batch_count += 1
        yield x, y

In [6]:
batches = get_batches(text_as_int, 10, 10)
x,y= next(batches)

print('x\n', x[:10, :10])
print('\ny\n', y[:10, :10])

x
 [[18 47 56 57 58  1 15 47 58 47]
 [ 1 43 52 43 51 63 11  0 37 43]
 [52 58 43 42  1 60 47 56 58 59]
 [56 44 53 50 49  6  0 27 52  1]
 [47 52  1 57 54 47 58 43  1 53]
 [56 57  6  1 39 52 42  1 57 58]
 [46 47 51  1 42 53 61 52  1 58]
 [ 1 40 43 43 52  1 57 47 52 41]
 [50 58 57  1 51 39 63  1 57 46]
 [57 47 53 52  1 53 44  1 56 43]]

y
 [[47 56 57 58  1 15 47 58 47 18]
 [43 52 43 51 63 11  0 37 43  1]
 [58 43 42  1 60 47 56 58 59 52]
 [44 53 50 49  6  0 27 52  1 56]
 [52  1 57 54 47 58 43  1 53 47]
 [57  6  1 39 52 42  1 57 58 56]
 [47 51  1 42 53 61 52  1 58 46]
 [40 43 43 52  1 57 47 52 41  1]
 [58 57  1 51 39 63  1 57 46 50]
 [47 53 52  1 53 44  1 56 43 57]]
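Note the last column of `y`: `np.roll` wraps each row's first character (e.g. 18, `'F'`, in row 0) into the final target slot, so one target per row is not really the next character. An alternative, sketched here under the hypothetical name `get_batches_exact`, reserves one extra character per row so every target is the true successor:

```python
import numpy as np

def get_batches_exact(array, n_seqs, n_steps):
    """Hypothetical variant of get_batches whose targets never wrap around.

    Each row keeps one extra character beyond its last window, so y is
    always the true next character of x.
    """
    chars_per_batch = n_seqs * n_steps
    # leave one character of slack per row so the final window has a real successor
    n_batches = (len(array) - n_seqs) // chars_per_batch
    if n_batches < 1:
        raise ValueError("array too short for a single batch")
    row_len = n_batches * n_steps + 1
    rows = array[:n_seqs * row_len].reshape((n_seqs, row_len))
    while True:
        for b in range(n_batches):
            start = b * n_steps
            x = rows[:, start:start + n_steps]
            y = rows[:, start + 1:start + n_steps + 1]
            yield x, y

gen = get_batches_exact(np.arange(25), n_seqs=2, n_steps=4)
x1, y1 = next(gen)
x2, y2 = next(gen)
print(x2)
print(y2)   # last column is 8 and 17: real successors, no wrap-around
```

The cost is one character of data per row, which is negligible for a text of about a million characters.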


## Part 3: Build Char-RNN model
In this section, we build our Char-RNN model. It consists of an input layer, an rnn_cell layer, an output layer, a loss and an optimizer, which we build one by one.

The goal is to predict new text following a given prime word, so for our training data we must define inputs and targets. The following figure explains the structure of the Char-RNN network.

![structure](img/charrnn.jpg)
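The diagram can be read as a chain of tensor shapes. The sketch below traces them in NumPy with random weights, assuming a common layout (one-hot inputs, one output layer shared across all time steps, cross-entropy over every position). The actual `ecbm4040.CharRNN` may differ, for example by using an embedding, and the rnn_cell layer here is replaced by a non-recurrent stand-in that only produces a tensor of the right shape:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, V, H = 4, 5, 65, 16    # sequences, steps, vocab size, rnn_size (toy values)

x = rng.integers(0, V, size=(N, M))        # input indices            [N, M]
y = rng.integers(0, V, size=(N, M))        # target indices           [N, M]

one_hot = np.eye(V)[x]                     # input layer (one-hot)    [N, M, V]

# stand-in for the rnn_cell layer: a per-step tanh projection with NO recurrence,
# used only to produce an [N, M, H] tensor of the right shape
W_in = rng.standard_normal((V, H))
rnn_out = np.tanh(one_hot @ W_in)          #                          [N, M, H]

# output layer: one weight matrix shared by all N*M positions
W_out = rng.standard_normal((H, V)) * 0.1
b_out = np.zeros(V)
logits = rnn_out.reshape(-1, H) @ W_out + b_out   #                   [N*M, V]

# softmax cross-entropy against the flattened targets
shifted = logits - logits.max(axis=1, keepdims=True)   # numerical stability
log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
loss = -log_probs[np.arange(N * M), y.reshape(-1)].mean()

print(one_hot.shape, rnn_out.shape, logits.shape, float(loss))
```

With small random output weights the loss sits close to ln(65) ≈ 4.17, the cost of a uniform guess over 65 characters, which is a handy sanity check at initialization.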

<span style="color:red">TODO:</span>
finish all TODOs in ecbm4040.CharRNN and the blanks in the following cells.

**Note: Training with the following parameter settings takes about 20 minutes on a GTX 1070 GPU, so we suggest using GCP for this task.**

In [7]:
from ecbm4040.CharRNN import *

### Training
With `sampling=False` (the default), we can start training the network. Checkpoints are saved automatically in the `checkpoints` folder.

In [8]:
# these are preset parameters; you can change them to get better results
batch_size = 100         # Sequences per batch
num_steps = 100          # Number of sequence steps per batch
rnn_size = 256           # Size of hidden layers in rnn_cell
num_layers = 2           # Number of hidden layers
learning_rate = 0.005    # Learning rate

In [12]:
model = CharRNN(len(vocab), batch_size, num_steps, 'LSTM', rnn_size,
               num_layers, learning_rate)
batches = get_batches(text_as_int, batch_size, num_steps)
# optional sanity check: both shapes should be (batch_size, num_steps)
# x, y = next(batches)
# print(x.shape, y.shape)
model.train(batches, 6000, 2000)

2
step: 200  loss: 2.1968  0.1872 sec/batch
step: 400  loss: 1.8883  0.1889 sec/batch
step: 600  loss: 1.7444  0.1873 sec/batch
step: 800  loss: 1.6774  0.1884 sec/batch
step: 1000  loss: 1.7117  0.1886 sec/batch
step: 1200  loss: 1.5788  0.1865 sec/batch
step: 1400  loss: 1.5539  0.1879 sec/batch
step: 1600  loss: 1.5088  0.1888 sec/batch
step: 1800  loss: 1.4735  0.1904 sec/batch
step: 2000  loss: 1.4951  0.1873 sec/batch
step: 2200  loss: 1.4805  0.1883 sec/batch
step: 2400  loss: 1.4652  0.1888 sec/batch
step: 2600  loss: 1.4783  0.1880 sec/batch
step: 2800  loss: 1.4468  0.1842 sec/batch
step: 3000  loss: 1.4143  0.1866 sec/batch
step: 3200  loss: 1.4518  0.1872 sec/batch
step: 3400  loss: 1.3825  0.1862 sec/batch
step: 3600  loss: 1.4363  0.1886 sec/batch
step: 3800  loss: 1.4071  0.1881 sec/batch
step: 4000  loss: 1.4141  0.1859 sec/batch
step: 4200  loss: 1.3862  0.1845 sec/batch
step: 4400  loss: 1.3868  0.1884 sec/batch
step: 4600  loss: 1.3861  0.1864 sec/batch
step: 4800  l

In [13]:
# look up checkpoints
tf.train.get_checkpoint_state('checkpoints')

model_checkpoint_path: "checkpoints/gi6000_l256.ckpt"
all_model_checkpoint_paths: "checkpoints/lstmi2000_l256.ckpt"
all_model_checkpoint_paths: "checkpoints/lstmi4000_l256.ckpt"
all_model_checkpoint_paths: "checkpoints/lstmi6000_l256.ckpt"
all_model_checkpoint_paths: "checkpoints/gi6000_l256.ckpt"

### Sampling
Set `sampling=True` and the model generates new characters one by one. We can also load earlier saved checkpoints to see how the network learned gradually.
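Generating text one character at a time means: feed the current character, turn the output logits into a probability distribution, draw the next character from it, and feed that character back in. The drawing step is sketched below in NumPy. Restricting the draw to the `top_n` most likely characters is a common trick in Char-RNN implementations, but whether `model.sample` does this is an assumption, and `pick_next_index` is a hypothetical helper:

```python
import numpy as np

def pick_next_index(probs, top_n=5, rng=None):
    """Sample a character index from `probs`, keeping only the top_n most likely."""
    rng = np.random.default_rng() if rng is None else rng
    p = np.asarray(probs, dtype=np.float64).copy()
    # zero out everything except the top_n largest probabilities
    p[np.argsort(p)[:-top_n]] = 0.0
    p /= p.sum()                    # renormalise so the kept entries sum to 1
    return int(rng.choice(len(p), p=p))

probs = np.array([0.05, 0.4, 0.3, 0.2, 0.05])
rng = np.random.default_rng(0)
draws = [pick_next_index(probs, top_n=2, rng=rng) for _ in range(1000)]
print(sorted(set(draws)))   # only indices 1 and 2 can appear
```

A small `top_n` makes the text more conservative; sampling from the full distribution gives more variety but also more misspellings.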

In [14]:
model = CharRNN(len(vocab), batch_size, num_steps,'LSTM', rnn_size,
               num_layers, learning_rate, sampling=True)
# choose the last checkpoint and generate new text
checkpoint = tf.train.latest_checkpoint('checkpoints')
samp = model.sample(checkpoint, 1000, len(vocab), vocab_to_ind, ind_to_vocab, prime="LORD ")
print(samp)

2
INFO:tensorflow:Restoring parameters from checkpoints/gi6000_l256.ckpt
SCALIST:
That tend the cirture of an issue in all;
And we have something but something some marks,
As sorrow he is then, to bid him with all,
When will not say I have an ancely,
I am born a shall thou art heart withal.

GLOUCESTER:
Then, with a propers and shore steps and high.

KING EDWARD IV:
I'll hear the selfsit fire of something sometime,
True words of subject, wert I say'st.

LEONTES:
When he did throw him that he was at them,
But the more fealties of your strong infinite
As thou destroy with the part of that;
We have not the painter with me with a soul,
Art any on their words, that thou art to thee.

KING EDWARD IV:
The ways are both in tears were as an istue,
Thy such a man of heart to me and so,
Which tell me here as true worse wanters,
As I spoke in my house against them spoke,
To honour a sence that I have done the means;
For that, and tell this service in to the world,
The presently that terror of this

In [15]:
# choose a checkpoint other than the final one and see the results. It could be nasty, don't worry!
#############################################
#           TODO: YOUR CODE HERE            #
#############################################
checkpoint = 'checkpoints/lstmi2000_l256.ckpt'
samp = model.sample(checkpoint, 1000, len(vocab), vocab_to_ind, ind_to_vocab, prime="LORD ")
print(samp)

INFO:tensorflow:Restoring parameters from checkpoints/lstmi2000_l256.ckpt
GLOUCEST:
In the stands out and the compronient as the
sheather be to bein to her from my heart;
As we shall hear her hand of more will be,
And that a pardon and my light women,
I have shown her to me the parchous blood,
And shis to see a march, that, see the post
What streight of my some world or so with the
prisoner, by my suberied, when the death of shraigh,
In army that me so the souls of my
thing the winds of her take,
We shall she will be married, and the might
To the weeth and son, we were to be served,
To may so starms that the some of her sun
Which though and threaten weary weether'd bare
That the stands of her with her, with her best,
To the winding thinks a bean and breast them answer.

PETRUCHIO:
We have a bride to make you hence, and true to
the children to yourselves: by this till they have,
What warring her of harks: what's she'll stay:
I shall be mean as whene we may have this?

BENVOLIO:
I have n

### Try another type of RNN cell
We have been using an LSTM cell, as in the original work, but the GRU cell is increasingly popular. Let's change the cell in the rnn_cell layer to a GRU cell and see how it performs. Keep the number of training steps the same as above.

**Note: You need to change the names of your saved checkpoints, or they will overwrite the LSTM results you have already saved.**

In [16]:
# these are preset parameters; you can change them to get better results
batch_size = 100         # Sequences per batch
num_steps = 100          # Number of sequence steps per batch
rnn_size = 256           # Size of hidden layers in rnn_cell
num_layers = 2           # Number of hidden layers
learning_rate = 0.005    # Learning rate

model = CharRNN(len(vocab), batch_size, num_steps, 'GRU', rnn_size,
               num_layers, learning_rate)
batches = get_batches(text_as_int, batch_size, num_steps)
model.train(batches, 6000, 2000)

2
step: 200  loss: 3.2987  0.1771 sec/batch
step: 400  loss: 2.5488  0.1785 sec/batch
step: 600  loss: 2.3004  0.1770 sec/batch
step: 800  loss: 2.1526  0.1764 sec/batch
step: 1000  loss: 2.0516  0.1779 sec/batch
step: 1200  loss: 1.8370  0.1764 sec/batch
step: 1400  loss: 1.7286  0.1773 sec/batch
step: 1600  loss: 1.6566  0.1768 sec/batch
step: 1800  loss: 1.5781  0.1758 sec/batch
step: 2000  loss: 1.5982  0.1760 sec/batch
step: 2200  loss: 1.5710  0.1757 sec/batch
step: 2400  loss: 1.5141  0.1776 sec/batch
step: 2600  loss: 1.5446  0.1746 sec/batch
step: 2800  loss: 1.5011  0.1749 sec/batch
step: 3000  loss: 1.4587  0.1759 sec/batch
step: 3200  loss: 1.4676  0.1766 sec/batch
step: 3400  loss: 1.4318  0.1764 sec/batch
step: 3600  loss: 1.4696  0.1741 sec/batch
step: 3800  loss: 1.4353  0.1748 sec/batch
step: 4000  loss: 1.4404  0.1764 sec/batch
step: 4200  loss: 1.4229  0.1757 sec/batch
step: 4400  loss: 1.3997  0.1760 sec/batch
step: 4600  loss: 1.4188  0.1753 sec/batch
step: 4800  l

In [17]:
model = CharRNN(len(vocab), batch_size, num_steps, 'GRU', rnn_size,
               num_layers, learning_rate, sampling=True)
# choose the last checkpoint and generate new text
checkpoint = tf.train.latest_checkpoint('checkpoints')
samp = model.sample(checkpoint, 1000, len(vocab), vocab_to_ind, ind_to_vocab, prime="LORD ")
print(samp)

2
INFO:tensorflow:Restoring parameters from checkpoints/gi6000_l256.ckpt
LAUD:
That's some poor sense; thy sorrows shall be so:
For that I see so, to thyself a sease.

DUME IO:
Would I be so all the chair of me?

BALOR:
If they be said the words at this that doth not
And well that is the deod of anger it.
I shall not be a man and mine in thee;
And that the sorrows to be stored weiker,
Thoreself see so shall so disgracious mards.

BUSIS:
O care that didst thou'll be thy back in me;
Which is my blood to me, to do thy sake.

JULIET:
O my save, and my wounds thy sun of this.
I am another, for so say that seems,
Though thy begest still but first: and there shall be
To-morrows that I do; and we have seen a better
As seem the foot are stopped of a boot.
He was a marrion is my served men.
Heaven do me sends, I shall do now.
To many seems, and whose sorrow hath set not
One that his hand and marry such her brothers.

DUKE OF YORK:
We'll buse a possible.

BUCKINGHAM:
But when you honour, sir, to 

#### Questions
1. Compare the results of the two networks you built and explain the reasons for the difference. (This is a qualitative comparison; it should be based on the specific models you built.)

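Results:

- LSTM: loss reduced to ~1.35
- GRU: loss reduced to ~1.35

The LSTM's loss falls faster than the GRU's early in training (2.1968 vs. 3.2987 at step 200, 1.4951 vs. 1.5982 at step 2000), although both end up at a similar level. We attribute this to the LSTM's greater capacity: it has four gated transforms per cell instead of the GRU's three, and therefore more parameters with which to fit the data. In exchange, each GRU training step is slightly cheaper (about 0.176 vs. 0.187 sec/batch in the logs above).

Structural difference:
- LSTM: keeps a separate cell state alongside the hidden state (the output).
- GRU: has a single hidden state, which is also the output.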
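The structural difference shows up in a single time step of each cell. The NumPy sketch below follows the standard textbook LSTM and GRU equations (biases omitted, random weights, toy sizes), not code taken from `ecbm4040.CharRNN` or TensorFlow. For the GRU it uses one common update-gate convention; some libraries swap the roles of `z` and `1 - z`:

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def lstm_step(x, h, c, W):
    """One LSTM step: W maps [x, h] to 4 blocks (input, forget, output gates, candidate)."""
    i, f, o, g = np.split(np.concatenate([x, h]) @ W, 4)
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)   # separate cell state
    h_new = sigmoid(o) * np.tanh(c_new)                # hidden state = output
    return h_new, c_new

def gru_step(x, h, W_rz, W_h):
    """One GRU step: reset gate r and update gate z, then a candidate state."""
    r, z = np.split(sigmoid(np.concatenate([x, h]) @ W_rz), 2)
    h_tilde = np.tanh(np.concatenate([x, r * h]) @ W_h)
    return (1 - z) * h + z * h_tilde                   # single state = output

rng = np.random.default_rng(0)
X, H = 3, 4
x, h, c = rng.standard_normal(X), np.zeros(H), np.zeros(H)
h_lstm, c_lstm = lstm_step(x, h, c, rng.standard_normal((X + H, 4 * H)))
h_gru = gru_step(x, h, rng.standard_normal((X + H, 2 * H)), rng.standard_normal((X + H, H)))
print(h_lstm.shape, c_lstm.shape, h_gru.shape)   # (4,) (4,) (4,)
```

The LSTM must carry two vectors between steps, while the GRU carries one; the GRU also needs one fewer weight block per layer.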

2. Discuss the difference between LSTM cells and GRU cells, what are the pros and cons of using GRU cells?

Pros of GRU cells:
- Less memory
- Less training time per step
- Fewer parameters to train

Cons of GRU cells:
- Less capacity to capture a complex function
- May need more iterations to converge (as in the runs above)
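To put a number on "fewer parameters": an LSTM layer has four weight blocks and a GRU layer three, each of size H × (X + H) plus a bias of H, where X is the layer's input size and H = `rnn_size`. For the settings above (H = 256, 2 layers), assuming the first layer receives 65-dimensional one-hot inputs (the actual `CharRNN` input encoding is an assumption here) and ignoring the output layer, which is the same for both:

```python
def rnn_layer_params(input_size, hidden_size, n_blocks):
    """Weights + biases of one gated RNN layer with n_blocks gate/candidate transforms."""
    return n_blocks * (hidden_size * (input_size + hidden_size) + hidden_size)

def stacked_params(vocab_size, hidden_size, num_layers, n_blocks):
    total, input_size = 0, vocab_size          # layer 1 sees one-hot characters
    for _ in range(num_layers):
        total += rnn_layer_params(input_size, hidden_size, n_blocks)
        input_size = hidden_size               # deeper layers see the previous layer's h
    return total

lstm = stacked_params(65, 256, 2, n_blocks=4)
gru = stacked_params(65, 256, 2, n_blocks=3)
print(lstm, gru, gru / lstm)
# 855040 641280 0.75
```

Under these assumptions the GRU stack has exactly three quarters of the LSTM stack's recurrent parameters.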