# RNN for Character level text generation
Welcome, in this lab-exercise, we will will building an RNN which generates text one character at a time! Hope you enjoy coding.

## This is divided into 4 stages:

1. Loading and Pre-Processing 
2. FeedForward computation
3. Backpropagation through time (BPTT)
4. Sentence generation

## Instructions:

Please follow along the comments/markdown provided to help you in completing your tasks. Feel free to reach-out to coordinators for help in coding.

Lets get started.

Sairam!

## Imports and hyper parameters

1. Sequence Length
2. Size of Hidden Layer
3. Learning Rate

In [1]:
# Library imports
import numpy as np

# initialize hyper-parameters
learning_rate = .003
hidden_size = 100
seq_len = 10

## Milestone 1

1. Read input file using python 'open' functionality
2. load file/sentences into 'data' variable (remove special chars!)
3. build char_to_num and num_to_char dicts mappers


In [2]:
# Read input from file
file = open('data/RNN/input_text.txt')
data = file.read().strip()
# define vocabulary
vocab = set(data)

# define data_size and vocab_size
data_size,vocab_size = len(data),len(vocab)

print("Number of chars = {} and unique chars = {}\n".format(data_size,vocab_size))

# Define mapping from char to number and vice-versa

char_to_num = {ch:i for i,ch in enumerate(vocab)}
num_to_char = {i:ch for i,ch in enumerate(vocab)}

Number of chars = 3873 and unique chars = 66



### Model Parameters

1. Matrix U -> Input-to-Hidden
2. Matrix W -> Hidden-to-Hidden Recurrence
3. Matrix V -> Hidden-to-Output
4. Matrix Bh -> Bias Input-to-Hidden (at hidden layer)
5. Matrix Bo -> Bias Hidden-to-Output (at output layer)

In [3]:
# Input to hidden
U = np.random.randn(hidden_size,vocab_size) * .01
# Hidden to hidden
W = np.random.randn(hidden_size,hidden_size) * .01
# Hidden to output
V = np.random.randn(vocab_size,hidden_size) * .01
# Bias for hidden
Bh = np.zeros((hidden_size,1))
# Bias for output
Bo = np.zeros((vocab_size,1))

### Prediction on untrained model parameters

***Input ***: hidden_state, index of start_character and length of the sentence to be generated

***Output***: Model generated sentence of specified length

#### To Do

1. Function to convert index to one-hot representation
2. Define a softmax activation (for generating normalized log-probabilities)
3. DO a forward pass and apply softmax for obtaining output probabilities
4. Maximum from the output probability becomes input for the next iteration
5. Convert max_prob_index position to obtain generated char (using num_to_char mapping)


In [4]:
# Softmax function
def softmax(x):
    val = np.exp(x)
    return val/np.sum(val,axis=0)

# one-hot encoding - take position as the argument
def convert_index_to_one_hot(pos):
    x = np.zeros((vocab_size,1))
    x[pos]=1
    return x

# Generate One Character at a time

In [5]:
# to generate sequence - taking start char as input
def generate(hidden_state,input_char_pos,trial_seq_length=200):
   
    # convert to one-hot
    x = convert_index_to_one_hot(pos=input_char_pos)
    gen_string = ''
    
    # generate sequence
    for t in range(trial_seq_length):
        # compute hidden_state
        hidden_state = np.tanh(np.dot(U,x)+np.dot(W,hidden_state) + Bh)
        # comput output probability
        output = softmax(np.dot(V,hidden_state) + Bo)
        # max_prob element is the next-character
        next_x = np.argmax(output.ravel())
        # use it as one-hot for next element
        x = convert_index_to_one_hot(next_x)
        # append to existing string
        gen_string += num_to_char[next_x]
    
    print(gen_string)

#start hidden state
hidden_prev = np.zeros((hidden_size,1))
#generate sentence - output will be random-chars which dont make sense!
generate(hidden_prev,char_to_num['a'],trial_seq_length=100)

d1b7mt]Uy/V+hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh


### Sample Input and Target Values

We will write code to actually see how our inputs and targets look like. This will give a great perspective before going for the actual implementation

In [6]:
pos=0

# obtain input_data - using sequence length: will be a list of int's [char index]
input_data = [char_to_num[ch] for ch in data[pos:pos+seq_len]]
# target data - similar to input, but right-shifted by 1 position
target_data = [char_to_num[ch] for ch in data[pos+1:pos+seq_len+1]]

print ("Input data {} \nTarget data {}\n".format(input_data,target_data))

Input data [22, 4, 53, 14, 12, 57, 49, 57, 36, 4] 
Target data [4, 53, 14, 12, 57, 49, 57, 36, 4, 14]



# Forward Propagation

Simple forward propagation with the following steps:

1. Convert to one-hot representation
2. Compute hidden layer values - np.tanh ( u*x + w*h_(t_1) ) for each time-step
3. Compute output (from hidden to output) - Matrix "V" dot "hidden"
4. output is softmax
5. loss += -logloss

<img src="images/rnn_1.png">

In [7]:
def forward_propagation(input_data,target_data,hidden_prev):
    # define dicts to hold values
    x,h,y,o = {},{},{},{}
    # store "inital" hidden state value
    h[-1]=np.copy(hidden_prev)
    # define loss
    loss = 0
    # for each input-timestep
    
    for t in range(len(input_data)):
        # convert input[t] to one-hot representation
        x[t]=convert_index_to_one_hot(input_data[t])
        # tanh activation on hidden-state values
        h[t]=np.tanh(np.dot(U,x[t])+np.dot(W,h[t-1])+Bh)
        # dot-product to take from hidden-to-output
        y[t]=np.dot(V,h[t])
        # softmax on y to map to output-probabilities
        o[t]=softmax(y[t])
        # add to log-loss on target[t]
        loss += -np.log(o[t][target_data[t],0])
        
    # return x,h,y,o,loss values for next-steps
    return x,h,y,o,loss

# Backpropagation for each input

Refer to hand-outs for equations!

In [8]:
def back_propagation(input_data,target_data,x,h,o,clip_gradient=True):
    # define graient matrices
    dU,dW,dV = np.zeros_like(U),np.zeros_like(W),np.zeros_like(V)
    dBh,dBo = np.zeros_like(Bh),np.zeros_like(Bo)
    d_hidden_next = np.zeros_like(h)
    # start from reverse
    for t in reversed(range(len(input_data))):
        # cache the data
        dy = np.copy(o[t])
        # dy = y -1
        dy[target_data[t]] -= 1
        # dot dy with hidden - remember to accumulate!
        dV += np.dot(dy,h[t].T)
        # dy is the gradient for output-bias
        dBo += dy
        # hidden gradient - dy.V + d_hidden_next (back-prop into hidden)
        dh = np.dot(V.T,dy) + d_hidden_next
        # tanh gradient * dh
        dh_raw = np.array((1- h[t]*h[t]) * dh,dtype=np.float64)
        # dh_raw for bias gradient
        dBh += dh_raw
        
        # dot dh_raw, with x for dU
        dU += np.dot(dh_raw,x[t].T)
        # dot dh_raw with h for dW
        dW += np.dot(dh_raw,h[t-1].T)
        
        # back_prop into hidden - time information.
        d_hidden_next = np.dot(W.T,dh_raw)
    
    # Account for exploding gradient
    if clip_gradient:
        for param in [dU,dW,dV,dBh,dBo]:
            # set range between -5 and 5
            np.clip(param,-5,5,out=param)
            
    # return respective gradients
    return dU,dW,dV,dBh,dBo

# Perform a single iteration: forward and backward pass


In [9]:
def do_one_iteration(input_data,target_data,hidden_prev):
    
    # Forward propgation
    x,h,y,o,loss = forward_propagation(input_data,target_data,hidden_prev)
    # Backward propagation
    dU,dW,dV,dBh,dBo = back_propagation(input_data,target_data,x,h,o)
    # return loss, gradients and hidden_state for next iteration
    return loss,dU,dW,dV,dBh,dBo,h[len(input_data)-1]
    

# Train the model

Steps:

1. Compute input_data and target_data -> will be indices
2. do one-iteration of forward and backward
3. Use the gradients and use Gradient Descent for online-update of parameters

In [10]:
# start position
pos=0

for iter_count in range(1000*1000):
    # check for the bounds/end words
    if pos+seq_len+1>len(data) or iter_count==0:
        pos=0
        hidden_prev = np.zeros((hidden_size,1))
    
    #compute input and output data indices
    input_data = [char_to_num[ch] for ch in data[pos:pos+seq_len]]
    target_data = [char_to_num[ch] for ch in data[pos+1:pos+seq_len+1]]
    # compute forward and backward prop and return gadients
    loss,dU,dW,dV,dBh,dBo,hidden_prev = do_one_iteration(input_data,target_data,hidden_prev)

    # Gradient Descent for the weight updates
    U -= learning_rate * dU
    W -= learning_rate * dW
    V -= learning_rate * dV
    Bh -= learning_rate * dBh
    Bo -= learning_rate * dBo    

    # print loss
    if iter_count%5000==0:
        print("\nLoss: {} and Iteration {}".format(loss,iter_count))
        generate(hidden_state=hidden_prev,input_char_pos=input_data[0],trial_seq_length=20)
        
    # remember to move pos by seq_len for each iteration!!
    pos += seq_len


Loss: 41.896944603719525 and Iteration 0
iiiiiiiiiiiiiiiiiiii

Loss: 27.052458964061096 and Iteration 5000
tatatatatatatatatata

Loss: 13.045024309853844 and Iteration 10000
nput_data = np.dotat

Loss: 21.595310197720874 and Iteration 15000
dot(do_ntet_data,hid


KeyboardInterrupt: 