## Course: DD2424 - Assignment 4

The main objective of this assignment is to train an RNN to synthesize English text character by character. The network is trained on the text of the book _Harry Potter and the Goblet of Fire_ by J.K. Rowling, with AdaGrad as the optimizer. The main steps of the implementation were:
1. __Preparing Data__: Read in the training data, determine the number of unique characters in the text and set up mapping functions - one mapping each character to a unique index and another mapping each index to a character.
2. __Back-propagation__: Implement the forward and backward passes of back-propagation for a vanilla RNN to compute the gradients efficiently.
3. __AdaGrad__: Update the RNN's parameters with AdaGrad.
4. __Synthesizing__ text from the RNN: Given a learnt set of parameters for the RNN, a default initial hidden state h0 and an initial input vector x0 from which to bootstrap, a function generates a sequence of text.
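
Step 3 can be illustrated with a minimal sketch of the standard AdaGrad rule. This is not the code used in the assignment: the function `adagrad_update` and the dictionaries `params`, `grads` and `memory` are hypothetical names, and `eps` is an assumed small constant. Only `eta = 0.1` matches the learning rate chosen later in this report.

```python
import numpy as np

def adagrad_update(params, grads, memory, eta=0.1, eps=1e-8):
    """In-place AdaGrad step on dictionaries of arrays keyed by parameter name."""
    for name in params:
        # Accumulate the squared gradient seen so far for this parameter
        memory[name] += grads[name] ** 2
        # Scale the step element-wise by the accumulated gradient magnitude
        params[name] -= eta * grads[name] / np.sqrt(memory[name] + eps)

# Toy usage with a single 2 x 2 parameter (illustrative values only)
params = {'W': np.ones((2, 2))}
grads = {'W': np.full((2, 2), 0.5)}
memory = {'W': np.zeros((2, 2))}
adagrad_update(params, grads, memory)
```

Because the accumulated squared gradients only grow, the effective step size of each parameter shrinks over time, and it shrinks fastest for parameters that receive large gradients.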

### 0.1 Read the data

The data comes from a text file that contains the whole _Harry Potter and the Goblet of Fire_ book. The first step is to read it:

In [1]:
# Load the book
book_fname = 'data/goblet.txt'
book_data = open(book_fname,'r').read()

All the characters of the book are now in `book_data`. Next, we need the sorted list of unique characters in the book:

In [5]:
book_chars = ''.join(set(book_data))
book_chars = list(sorted(book_chars))
# Show all the characters
print(book_chars,)

(['\t', '\n', ' ', '!', '"', "'", '(', ')', ',', '-', '.', '/', '0', '1', '2', '3', '4', '6', '7', '9', ':', ';', '?', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', '^', '_', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '}', '\x80', '\xa2', '\xbc', '\xc3', '\xe2'],)


Next, the dimensionality of the input and output layers (the number of unique characters) and the length of the book are obtained:

In [3]:
# Size of the output and input layers
K = len(book_chars)
book_size = len(book_data)

To move easily between a character and its one-hot encoding, two dictionaries are defined, one mapping each character to its index and one mapping each index back to its character:

In [4]:
# Mappings from char to ind and vice versa, mapped in a dictionary
char_to_ind = { character:index for index, character in enumerate(book_chars)}
ind_to_char = { index:character for index,character in enumerate(book_chars)}
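
As a small illustration of how the two dictionaries work together, the sketch below encodes a string as a list of indices and decodes it again. It uses a toy alphabet built from the phrase 'hello world' rather than the book, and the variable names `encoded` and `decoded` are introduced here only for the example.

```python
# Toy alphabet; the report builds the same structures from the whole book
chars = sorted(set('hello world'))
char_to_ind = {ch: i for i, ch in enumerate(chars)}
ind_to_char = {i: ch for i, ch in enumerate(chars)}

# Character -> index, then index -> character
encoded = [char_to_ind[ch] for ch in 'hello']
decoded = ''.join(ind_to_char[i] for i in encoded)
```

Since both dictionaries are built from the same sorted list, decoding an encoded string always returns the original string.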

All these operations are included in a class named LoadBook, which initializes and stores these parameters:

In [5]:
from Lab4 import LoadBook

# Initialize the class
book = LoadBook()

### 0.2 Set hyper-parameters & initialize the RNN’s parameters

All the parameters of the RNN must now be set. As recommended, the RNN is implemented as a class whose attributes are the parameters defined below. The dimension of the hidden state is set to:

In [6]:
# Hidden state dimension
m = 100

During training, the learning rate and the length of the input sequence are set to:

In [7]:
# Learning rate
eta = 0.1
# Input sequence length
seq_length = 25

The bias vectors are initialized to zero:

In [8]:
# We need numpy
import numpy as np

# Bias vector of size m x 1 in the equation for a_t
b = np.zeros((m, 1))
# Bias vector of size K x 1 in the equation for o_t
c = np.zeros((book.K, 1))

The weight matrices are randomly initialized as follows (note that the input size equals the output size, K):

In [9]:
# Standard deviation of the initial weights
sig = 0.01

# Weight matrix of size m x K applied to x_t (input-to-hidden connection)
U = np.random.randn(m, book.K)*sig

# Weight matrix of size m x m applied to h_{t-1} (hidden-to-hidden connection)
W = np.random.randn(m, m)*sig

# Weight matrix of size K x m applied to h_t (hidden-to-output connection)
V = np.random.randn(book.K,m)*sig


As recommended in the lab notes, all the parameters of the RNN are stored in a class named RNN:

In [10]:
from Lab4 import RNN, LoadBook

# Load the book
book = LoadBook()

# Initialize the network
rnn = RNN(book.K)
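
The `RNN` class from `Lab4` is not shown in this report. The sketch below is one possible layout, assuming the attribute names used later in this notebook (`W`, `U`, `V`, `b`, `c`, `hidden_init`, `seq_length`). The class name `SketchRNN`, the `seed` argument and the zero initial hidden state are assumptions made for the example.

```python
import numpy as np

class SketchRNN(object):
    """Hypothetical container for the hyper-parameters and weights of a vanilla RNN."""

    def __init__(self, K, m=100, eta=0.1, seq_length=25, sig=0.01, seed=None):
        rng = np.random.RandomState(seed)
        # Hyper-parameters
        self.K, self.m = K, m
        self.eta, self.seq_length = eta, seq_length
        # Biases start at zero
        self.b = np.zeros((m, 1))
        self.c = np.zeros((K, 1))
        # Weights are drawn from a Gaussian with standard deviation sig
        self.U = rng.randn(m, K) * sig
        self.W = rng.randn(m, m) * sig
        self.V = rng.randn(K, m) * sig
        # Assumed default: the initial hidden state is a zero vector
        self.hidden_init = np.zeros((m, 1))

# K = 83 is the number of unique characters listed earlier
rnn_sketch = SketchRNN(K=83, seed=0)
```

Keeping every array on one object means the forward pass, the backward pass and the optimizer can all receive a single argument.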

### 0.3 Synthesizing text from the randomly initialized weights and biases

To monitor how performance improves, sample text is synthesized during training, so a function that synthesizes text is needed. It takes as input the RNN, the hidden state vector h0 at time 0, the first input vector x0 and an integer n giving the length of the text to synthesize. At each time step t the network produces a vector of probabilities over the characters, and a label is sampled from this distribution. The sampled character becomes the (t+1)-th character of the sequence and, one-hot encoded, the input vector xnext for the next time step.

In [13]:
# States initialization 
x_t = np.zeros((K,1))
# First dummy character
x_t[char_to_ind['d']] = 1

# We don't want to modify the value of the hidden in the network, just use it
hidden = rnn.hidden_init

# Length of the output text
n = 30

# We need to store the text
text = [] 

for t in range (n):
    # Prediction
    a_t = np.dot(rnn.W,hidden) + np.dot(rnn.U,x_t) + rnn.b
    hidden = np.tanh(a_t)
    o_t = np.dot(rnn.V,hidden) + rnn.c
    e = np.exp(o_t)
    p_t = e / e.sum()

    # Randomly pick a character
    cp = np.cumsum(p_t)
    # Random number from a uniform distribution on [0, 1)
    a = np.random.uniform()
    ind = np.array(np.nonzero(cp - a > 0))
    ind = ind[0,0]
    
    # take sampled index as an input to next sampling
    x_t = np.zeros((K,1))
    x_t[ind] = 1
    
    # Save the computed character
    text.append(ind)
     
# The final text is 
text_char = ''.join([ind_to_char[character] for character in text])
text_char

'pKz \xe22A7Dz/vprq)^WLC2)?^LIWzh\xe2'
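
The sampling step inside the loop is an inverse-CDF draw: a uniform number `a` is compared with the cumulative sum of `p_t`, and the first index whose cumulative probability exceeds `a` is selected. The sketch below isolates that step and checks it empirically on a toy distribution. The helper `sample_index` is a hypothetical name, and `np.searchsorted` is used here as an equivalent of the `np.nonzero(cp - a > 0)` lookup.

```python
import numpy as np

def sample_index(p, rng):
    """Draw one index from the discrete distribution p by inverse-CDF sampling."""
    cp = np.cumsum(p)
    a = rng.uniform()
    # First index whose cumulative probability is strictly greater than a
    ind = int(np.searchsorted(cp, a, side='right'))
    # Guard against rounding that leaves cp[-1] slightly below 1
    return min(ind, len(cp) - 1)

rng = np.random.RandomState(0)
p = np.array([0.1, 0.6, 0.3])
draws = [sample_index(p, rng) for _ in range(5000)]
# Empirical frequency of each index
freq = np.bincount(draws, minlength=3) / 5000.0
```

With enough draws, `freq` approaches `p`, which is the property the synthesis loop relies on.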

The synthesis procedure above is implemented in the Functions class:

In [14]:
from Lab4 import RNN, LoadBook, Functions

# Load the book
book = LoadBook()

# Initialize the network
rnn = RNN(book.K)

# Initialize functions
functions = Functions(rnn,book)

# Length of the output text
n = 40

# Get the randomly Synthesized text
text = functions.Synthesize_text(char_to_ind['d'], n)

In [15]:
text_char = ''.join([ind_to_char[character] for character in text])
text_char

'S,6K\xc3}l:ZFr"m,mZ\tnhB/bqu"b0D!oC^\'-gB\nt0v'

### 0.4 Implement the forward & backward pass of back-prop

The next step is to write the code that computes the gradients of the loss w.r.t. the parameters of the model. For debugging, the first seq_length characters of book_data are used as the labelled sequence:

In [16]:
X_chars = book_data[0:seq_length]
Y_chars = book_data[1:seq_length+1]

Note that the label for an input character is the next character in the book. X_chars and Y_chars have to be converted to matrices X and Y containing the one-hot encoding of the characters of the sequence. Both X and Y have size K × seq_length, and each column of the respective matrices corresponds to an input vector and its target output vector. This is done as follows:

In [17]:
# Initialize the matrices
X = np.zeros((book.K,rnn.seq_length))
Y = np.zeros((book.K,rnn.seq_length))

# One hot encoding
for i in range(rnn.seq_length):
    X[char_to_ind[X_chars[i]],i] = 1
    Y[char_to_ind[Y_chars[i]],i] = 1

# Check that the encoding worked by decoding X back to text
dumb = []
for i in range(rnn.seq_length):
    for j in range(book.K): 
        if X[j,i] == 1:
            dumb.append(ind_to_char[j])
print(''.join([character for character in dumb ]))

HARRY POTTER AND THE GOBL
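
The explicit loop above can also be written with NumPy integer indexing. The following is a sketch on a toy string, not the notebook's code: the helper `one_hot_matrix` and the `toy_*` names are hypothetical.

```python
import numpy as np

def one_hot_matrix(text, char_to_ind):
    """Return a K x len(text) matrix whose column t is the one-hot code of text[t]."""
    K = len(char_to_ind)
    idx = [char_to_ind[ch] for ch in text]
    M = np.zeros((K, len(text)))
    # Set entry (idx[t], t) to 1 for every time step t at once
    M[idx, np.arange(len(text))] = 1
    return M

toy = 'HARRY POTTER'
toy_chars = sorted(set(toy))
toy_map = {ch: i for i, ch in enumerate(toy_chars)}

# Inputs are all characters but the last, targets are shifted by one
X_toy = one_hot_matrix(toy[:-1], toy_map)
Y_toy = one_hot_matrix(toy[1:], toy_map)

# Decode X back to text by taking the arg-max of each column
decoded = ''.join(toy_chars[i] for i in X_toy.argmax(axis=0))
```

Because the targets are the inputs shifted by one position, column t of `Y_toy` equals column t + 1 of `X_toy`.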


The one-hot encoding is included as a method of the Functions class:

In [18]:
from Lab4 import RNN, LoadBook, Functions

# Load the book
book = LoadBook()

# Initialize the network
rnn = RNN(book.K)

# Initialize functions
functions = Functions(rnn,book)

# Test strings
X_chars = book_data[0:seq_length]
Y_chars = book_data[1:seq_length+1]

# One-hot encoding
X,Y = functions.char_to_hot(X_chars, Y_chars)

Now the forward pass must be implemented. It follows the instructions in the lab notes and reuses the code of the synthesis function:

In [19]:
# Test strings
X_chars = book_data[0:seq_length]
Y_chars = book_data[1:seq_length+1]

# One hot encoding
x_t,y_t = functions.char_to_hot(X_chars, Y_chars)


# Initialize the hidden states; column t holds the state before time step t
hidden = np.zeros((m,rnn.seq_length + 1))
# Copy the initial hidden state so that the network's own value is not modified
hidden[:,0] = rnn.hidden_init[:,0]

# Probabilities matrix
p = np.zeros((book.K,seq_length))

# Cost initialization
cost = 0

for t in range(rnn.seq_length): 
    
    # find new hidden state
    a_t = np.dot(rnn.W, hidden[:,t]) + np.dot(rnn.U, x_t[:,t]) + rnn.b.squeeze()
    hidden[:,t + 1] = np.tanh(a_t)

    # unnormalized log probabilities for next chars o_t
    o_t = np.dot(rnn.V, hidden[:,t + 1]) + rnn.c.squeeze()
    
    e = np.exp(o_t)
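    # Softmax
    p[:,t] = e / e.sum()

# Cross-entropy loss, computed once after the loop.
# NOTE: sum(axis = 1) adds over time steps for each class, so every class that
# is absent from the targets contributes -log(eps). Summing over axis = 0
# would give the usual per-time-step loss.
cost = -np.log((p*y_t).sum(axis = 1) + np.finfo(float).eps).sum()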
cost

2542.8033831179364
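
As a sanity check on the value above: with weights this close to zero the softmax output is nearly uniform, so the cross-entropy summed over `seq_length = 25` steps should be about 25 · ln(83) ≈ 110.5. The figure above is much larger because `sum(axis = 1)` adds the probabilities across time for each class, and every class that never occurs in the targets then contributes `-log(eps)` ≈ 36. The sketch below reproduces the effect with a uniform `p` and random targets. These are hypothetical stand-ins, not the notebook's arrays.

```python
import numpy as np

K, seq_length = 83, 25
rng = np.random.RandomState(0)

# Uniform predictions, as produced by a network with near-zero weights
p = np.full((K, seq_length), 1.0 / K)

# Random one-hot targets, one column per time step
targets = rng.randint(0, K, size=seq_length)
Y = np.zeros((K, seq_length))
Y[targets, np.arange(seq_length)] = 1

eps = np.finfo(float).eps
# Per-step cross-entropy: pick the probability of the target class in each column
loss_per_step = -np.log((p * Y).sum(axis=0) + eps).sum()
# Summing over axis 1 mixes time steps and penalizes every unused class
loss_per_class = -np.log((p * Y).sum(axis=1) + eps).sum()
```

Here `loss_per_step` equals `seq_length * np.log(K)` up to rounding, while `loss_per_class` is dominated by the `-log(eps)` terms of the unused classes.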

The forward pass is implemented as a method of the Functions class. Let's check that it was implemented correctly:

In [20]:
from Lab4 import RNN, LoadBook, Functions

# Load the book
book = LoadBook()

# Initialize the network
rnn = RNN(book.K)

# Initialize functions
functions = Functions(rnn,book)

# Test strings
X_chars = book.book_data[0:rnn.seq_length]
Y_chars = book.book_data[1:rnn.seq_length+1]

# One-hot encoding
X,Y = functions.char_to_hot(X_chars, Y_chars)

# Forward - pass
p, cost = functions.forward_pass(X,Y)

The cost returned by the class method is close to the value computed step by step above:

In [21]:
cost

2542.8049462457648

Now it is time to implement the backward pass:

In [22]:
functions.grads_to_zero()

dh_next = np.zeros_like(hidden[:,0])

for t in reversed(range(rnn.seq_length)):

        # gradient w.r.t. o_t
        g = p[:,t] - Y[:,t]
        
        # Bias c gradient update
        functions.RNN.grad_c[:,0] += g
        
        # gradient w.r.t. V and c
        functions.RNN.grad_V += np.outer(g, hidden[:,t+1])

        # gradient w.r.t. a_t, back-propagated through the tanh nonlinearity
        dh = (1 - hidden[:,t+1] ** 2) * (np.dot(functions.RNN.V.T, g) + dh_next)

        # gradient w.r.t. U
        functions.RNN.grad_U += np.outer(dh, X[:,t])

        # gradient w.r.t W
        functions.RNN.grad_W += np.outer(dh, hidden[:,t])

        # gradient w.r.t. b
        functions.RNN.grad_b[:,0] += dh

        # gradient passed back to the previous hidden state h_{t-1}
        dh_next = np.dot(functions.RNN.W.T, dh)
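
A common way to validate a backward pass like the one above is to compare its gradients with centered finite differences. The sketch below does this for a tiny, self-contained vanilla RNN. It is an illustration under stated assumptions, not the notebook's code: `loss_and_grads` and `max_relative_error` are hypothetical helpers, the sizes `m = 5`, `K = 4`, `T = 6` are toy values, and the loss is summed per time step (axis 0).

```python
import numpy as np

def loss_and_grads(params, X, Y, h0):
    """Forward and backward pass of a vanilla RNN; returns the loss and the gradients."""
    U, W, V, b, c = (params[k] for k in 'UWVbc')
    T = X.shape[1]
    H = np.zeros((W.shape[0], T + 1))  # column t is the state before step t
    H[:, 0] = h0
    P = np.zeros(Y.shape)
    for t in range(T):
        H[:, t + 1] = np.tanh(W.dot(H[:, t]) + U.dot(X[:, t]) + b)
        o = V.dot(H[:, t + 1]) + c
        e = np.exp(o - o.max())        # shifted for numerical stability
        P[:, t] = e / e.sum()
    loss = -np.log((P * Y).sum(axis=0)).sum()
    grads = {k: np.zeros_like(v) for k, v in params.items()}
    dh_next = np.zeros(W.shape[0])
    for t in reversed(range(T)):
        g = P[:, t] - Y[:, t]          # gradient w.r.t. o_t
        grads['c'] += g
        grads['V'] += np.outer(g, H[:, t + 1])
        da = (1 - H[:, t + 1] ** 2) * (V.T.dot(g) + dh_next)  # gradient w.r.t. a_t
        grads['U'] += np.outer(da, X[:, t])
        grads['W'] += np.outer(da, H[:, t])
        grads['b'] += da
        dh_next = W.T.dot(da)          # passed back to h_{t-1}
    return loss, grads

def max_relative_error(params, X, Y, h0, h=1e-5):
    """Largest relative gap between analytic and centered-difference gradients."""
    _, analytic = loss_and_grads(params, X, Y, h0)
    worst = 0.0
    for name, value in params.items():
        for idx in np.ndindex(*value.shape):
            old = value[idx]
            value[idx] = old + h
            plus, _ = loss_and_grads(params, X, Y, h0)
            value[idx] = old - h
            minus, _ = loss_and_grads(params, X, Y, h0)
            value[idx] = old           # restore the parameter
            numeric = (plus - minus) / (2 * h)
            denom = max(1e-6, abs(numeric) + abs(analytic[name][idx]))
            worst = max(worst, abs(numeric - analytic[name][idx]) / denom)
    return worst

rng = np.random.RandomState(0)
m, K, T = 5, 4, 6
params = {'U': rng.randn(m, K) * 0.3, 'W': rng.randn(m, m) * 0.3,
          'V': rng.randn(K, m) * 0.3, 'b': np.zeros(m), 'c': np.zeros(K)}
seq = rng.randint(0, K, size=T + 1)
X = np.eye(K)[:, seq[:-1]]             # one-hot inputs
Y = np.eye(K)[:, seq[1:]]              # one-hot targets, shifted by one
worst = max_relative_error(params, X, Y, np.zeros(m))
```

A small value of `worst` indicates that the analytic and numerical gradients agree. A value close to 1 usually points to a sign or indexing error in the backward pass.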


The backward pass is implemented as a method of the Functions class. Let's check that it was implemented correctly:

In [24]:
from Lab4 import RNN, LoadBook, Functions

# Load the book
book = LoadBook()

# Initialize the network
rnn = RNN(book.K)

# Initialize functions
functions = Functions(rnn,book)

# Test strings
X_chars = book.book_data[0:rnn.seq_length]
Y_chars = book.book_data[1:rnn.seq_length+1]

# One-hot encoding
X,Y = functions.char_to_hot(X_chars, Y_chars)

# Forward - pass
p, cost = functions.forward_pass(X,Y)

functions.backward_pass(X, Y, p, cost)

-0.0010248988667696233