# Day 6: Sequence Models in Deep Learning

### Exercise 6.1 
Convince yourself that an RNN is just an MLP with inputs and outputs at various layers. Run the NumpyRNN code, set breakpoints, and compare what you see with what you learned about back-propagation on the previous day.
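To make the "MLP with inputs and outputs at various layers" view concrete, here is a minimal NumPy sketch of a simple (Elman-style) RNN forward pass. It is not the `NumpyRNN` implementation from lxmls. The function `rnn_forward`, the parameter names `W_e`, `W_x`, `W_h`, `W_y`, the sigmoid non-linearity and the toy sizes are all assumptions chosen for illustration.

```python
import numpy as np

def softmax(z):
    # Subtract the max before exponentiating for numerical stability
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def rnn_forward(x, W_e, W_x, W_h, W_y):
    """Forward pass of a simple RNN over one sentence of word indices (hypothetical helper)."""
    h = np.zeros(W_h.shape[0])           # initial hidden state
    p_y = []
    for word in x:                       # each time step acts as one "layer"
        z = W_e[word]                    # embedding lookup: an input enters at this layer
        # The same weights W_x and W_h are reused at every step
        h = 1.0 / (1.0 + np.exp(-(W_x.dot(z) + W_h.dot(h))))
        p_y.append(softmax(W_y.dot(h)))  # an output leaves at this layer
    return np.array(p_y)

# Toy sizes, assumed for illustration only
rng = np.random.RandomState(1234)
vocab, emb_size, hidden, tags = 10, 4, 5, 3
W_e = rng.randn(vocab, emb_size) * 0.1
W_x = rng.randn(hidden, emb_size) * 0.1
W_h = rng.randn(hidden, hidden) * 0.1
W_y = rng.randn(tags, hidden) * 0.1
toy_p_y = rnn_forward(np.array([1, 4, 2, 7]), W_e, W_x, W_h, W_y)
```

Unrolling the loop for a four-word sentence gives a four-layer network whose layers share weights, which is why back-propagation from the previous day carries over with gradients summed across time steps.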

Start by loading the part-of-speech data and configuring it for the exercises.

In [1]:
# Add tools
# NOTE: This should only be needed if you do not store the notebook on the lxmls root
import sys
sys.path.append('../../')

In [2]:
# Location of Part-of-Speech WSJ Data
WSJ_TRAIN = "../../data/train-02-21.conll"
WSJ_TEST = "../../data/test-23.conll"
WSJ_DEV = "../../data/dev-22.conll"

In [3]:
# Load Part-of-Speech data 
import lxmls.readers.pos_corpus as pcc
corpus = pcc.PostagCorpus()
train_seq = corpus.read_sequence_list_conll(WSJ_TRAIN, max_sent_len=15, max_nr_sent=1000)
test_seq = corpus.read_sequence_list_conll(WSJ_TEST, max_sent_len=15, max_nr_sent=1000)
dev_seq = corpus.read_sequence_list_conll(WSJ_DEV, max_sent_len=15, max_nr_sent=1000) 
# Redo indices so that they are consecutive. Also cast all data to numpy arrays
# of int32 for compatibility with GPUs and theano.
train_seq, test_seq, dev_seq = pcc.compacify(train_seq, test_seq, dev_seq, theano=True)
# Get number of words and tags in the corpus
nr_words = len(train_seq.x_dict)
nr_tags = len(train_seq.y_dict.keys())

**TODO: Move this to a later exercise**

In [4]:
# Embeddings Path
EMBEDDINGS = "../../data/senna_50"
import lxmls.deep_learning.embeddings as emb
import os
reload(emb)
if not os.path.isfile(EMBEDDINGS):
    emb.download_embeddings('senna_50', EMBEDDINGS)
E = emb.extract_embeddings(EMBEDDINGS, train_seq.x_dict)  

Getting embeddings for the vocabulary 3618/4786 
24.4% missing embeddings, set to random


Model configuration

In [5]:
import lxmls.deep_learning.rnn as rnns
reload(rnns)

<module 'lxmls.deep_learning.rnn' from '../../lxmls/deep_learning/rnn.pyc'>

In [6]:
# RNN configuration
SEED = 1234       # Random seed to initialize weights
hidden_size = 20  # size of hidden layer

In [7]:
np_rnn = rnns.NumpyRNN(E, hidden_size, nr_tags, seed=SEED)

In [8]:
x0 = train_seq[0].x
y0 = train_seq[0].y
loss, p_y, p, y_rnn, h, z1, x = np_rnn.forward(x0, all_outputs=True, outputs=y0)

In [9]:
# Compute gradients
numpy_rnn_gradients = np_rnn.grads(x0, y0)

### Exercise 6.2
Scan is your friend, maybe.
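As a mental model before working with `theano.scan`, the following plain-Python sketch mimics what a scan computes: it applies a step function along a sequence, feeds the previous result back in, and keeps every intermediate state. The helper `scan_like` is hypothetical and is not Theano's API. It only mirrors the convention that the step function receives the current sequence element first and the previous output second.

```python
def scan_like(step, sequence, initial_state):
    """Apply step(element, previous_state) along a sequence, collecting every state."""
    state = initial_state
    states = []
    for element in sequence:
        state = step(element, state)  # the new state depends on the previous one
        states.append(state)
    return states

# Running sum: each output is the previous output plus the current element
running_sum = scan_like(lambda x_t, acc: acc + x_t, [1, 2, 3, 4], 0)
```

An RNN forward pass has the same shape: the sequence is the sentence, the state is the hidden layer, and the step function is one recurrent update.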

### Exercise 6.3
Complete the theano code for an RNN inside lxmls/deep_learning/rnn.py. Use exercise 6.1 for a numpy example and 6.2 to learn how to handle scan. You only need to implement the forward pass. To debug, modify and compile the forward pass.

In [10]:
import numpy as np
import theano
import theano.tensor as T
rnn = rnns.RNN(E, hidden_size, nr_tags, seed=SEED)

In [11]:
# Compile theano function
x = T.ivector('x')
th_forward = theano.function([x], rnn._forward(x).T)
np_forward = np_rnn.forward

In [12]:
assert np.allclose(th_forward(x0), np_forward(x0)), "Numpy and Theano forward pass differ!"

Once you are confident the forward pass is working, you can test the gradients.

In [13]:
# Compile function returning the list of gradients
x = T.ivector('x')
p_y = rnn._forward(x)
y = T.ivector('y')
F = -T.mean(T.log(p_y)[T.arange(y.shape[0]), y])
grads_fun = theano.function([x, y], [T.grad(F, par) for par in rnn.param])

In [14]:
# Compare numpy and theano gradients
theano_rnn_gradients = grads_fun(x0, y0)
for n in range(len(theano_rnn_gradients)): 
    assert np.allclose(numpy_rnn_gradients[n], theano_rnn_gradients[n]), "Numpy and Theano gradients differ in step %d" % n
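
If the two sets of gradients disagree and it is unclear which implementation is at fault, a finite-difference check can serve as an independent reference. The sketch below is a generic central-difference estimator, not part of lxmls. The helper `numerical_gradient` and the quadratic test function are assumptions for illustration.

```python
import numpy as np

def numerical_gradient(f, w, eps=1e-6):
    """Central finite-difference estimate of the gradient of a scalar function f at w."""
    grad = np.zeros_like(w)
    for i in range(w.size):
        old = w.flat[i]
        w.flat[i] = old + eps
        f_plus = f(w)
        w.flat[i] = old - eps
        f_minus = f(w)
        w.flat[i] = old                  # restore the original value
        grad.flat[i] = (f_plus - f_minus) / (2 * eps)
    return grad

# Toy check: for f(w) = 0.5 * sum(w^2) the analytic gradient is w itself
w = np.array([[1.0, -2.0], [0.5, 3.0]])
f = lambda w: 0.5 * np.sum(w ** 2)
approx_grad = numerical_gradient(f, w)
```

The estimate costs two function evaluations per parameter, so it is only practical on small models such as the one configured here with a hidden size of 20.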