In [46]:
%load_ext autoreload
%autoreload 2
%matplotlib inline
import sys
sys.path.append('../lxmls-toolkit')
import lxmls
import scipy

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


## Exercise 6.1

**Convince yourself that an RNN is just a feed-forward (FF) network unfolded in time. Run the NumpyRNN code. Set breakpoints and compare with what you learned about backpropagation on the previous day.**

**To work with RNNs we will use the part-of-speech dataset seen on the sequence models day.**

In [110]:
import lxmls.readers.pos_corpus as pcc
corpus =  lxmls.readers.pos_corpus.PostagCorpus()

data_path = "../lxmls-toolkit/data/"

train_seq = corpus.read_sequence_list_conll(data_path + "train-02-21.conll",
                                            max_sent_len=15,
                                            max_nr_sent=1000)

test_seq = corpus.read_sequence_list_conll(data_path + "test-23.conll",
                                           max_sent_len=15,
                                           max_nr_sent=1000)

dev_seq = corpus.read_sequence_list_conll(data_path + "dev-22.conll", 
                                          max_sent_len=15,
                                          max_nr_sent=1000)

In [111]:
nr_words = len(train_seq.x_dict)
nr_tags = len(train_seq.y_dict)

print "nr_words:", nr_words
print "nr_tags:", nr_tags

nr_words: 19217
nr_tags: 12


In [112]:
len(train_seq.x_dict)

19217

In [113]:
train_seq[0].y

[0, 0, 6, 0, 4]

In [114]:
train_seq[0].x

[42, 40, 43, 44, 41]

In [115]:
train_seq[0]

Ms./noun Haag/noun plays/verb Elianti/noun ./. 

We need to redo the indices of the data so that they are consecutive, and cast all data to numpy arrays of int32 for compatibility with GPUs. The same function also adds reverse indices to the word and tag dictionaries, so that a word or tag can be recovered from its index.

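- **Why does pcc.compacify change the indices of the words in train_seq, test_seq and dev_seq?**

    - The new mapping is compact: it has no unused indices. Only a subset of the corpus is loaded here, so many of the original word indices never occur in the three splits (the vocabulary shrinks from 19217 to 4786 entries).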
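
To make the idea of a compact mapping concrete, here is a minimal sketch of such a remapping. The function `compact_indices` is hypothetical and is not the implementation of `pcc.compacify`; it assumes new ids are handed out in order of first appearance and only illustrates the principle of consecutive indices, int32 arrays, and a reverse index.

```python
import numpy as np

def compact_indices(sequences):
    """Remap the integer ids in `sequences` to 0..K-1 (hypothetical helper).

    New ids are assigned in order of first appearance, which is an
    assumption made for this sketch.
    """
    old_to_new = {}
    compacted = []
    for seq in sequences:
        # setdefault returns the existing new id, or registers the next free one
        new_seq = [old_to_new.setdefault(idx, len(old_to_new)) for idx in seq]
        compacted.append(np.array(new_seq, dtype=np.int32))
    # Reverse index: recover the original id from the compact one
    new_to_old = dict((new, old) for old, new in old_to_new.items())
    return compacted, old_to_new, new_to_old

# Toy data: the ids 40..44 and 99 are the only ones in use
sequences = [[42, 40, 43, 44, 41], [42, 99, 41]]
compacted, old_to_new, new_to_old = compact_indices(sequences)
# compacted[0] is array([0, 1, 2, 3, 4], dtype=int32)
# compacted[1] is array([0, 5, 4], dtype=int32)
```

After the remapping, the largest index is one less than the number of distinct ids, so an embedding matrix needs exactly one column per word that actually occurs.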

In [118]:
# Redo indices
train_seq, test_seq, dev_seq = pcc.compacify(train_seq, test_seq, dev_seq, theano=True)
# Get number of words and tags in the corpus
nr_words = len(train_seq.x_dict)
nr_tags = len(train_seq.y_dict)

In [119]:
print "nr_words:", nr_words
print "nr_tags:", nr_tags

nr_words: 4786
nr_tags: 12


In [120]:
train_seq[0].x

array([0, 1, 2, 3, 4], dtype=int32)

In [121]:
train_seq[1].x

array([ 5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16], dtype=int32)

In [122]:
train_seq[100].x

array([  5, 536, 537, 165, 347, 105, 538, 289, 131, 539, 289,   4], dtype=int32)

Load and configure the NumpyRNN. Remember to use reload if you want to modify the code inside the rnns module.

In [123]:
import lxmls.deep_learning.rnn as rnns
reload(rnns)
# RNN configuration
SEED = 1234 # Random seed to initialize weights
emb_size = 50 # Size of word embeddings
hidden_size = 20 # size of hidden layer

np_rnn = rnns.NumpyRNN(nr_words, emb_size, hidden_size, nr_tags, seed=SEED)
x0 = train_seq[0].x
y0 = train_seq[0].y

In [124]:
# Forward pass
p_y, y_rnn, h, z1, x = np_rnn.forward(x0, all_outputs=True) 

# Gradients
numpy_rnn_gradients = np_rnn.grads(x0, y0)
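
To see why an RNN is a feed-forward network unfolded in time, it can help to write the forward pass as an explicit loop. The following is a minimal sketch and not the toolkit's `NumpyRNN`: the weight names (`W_e`, `W_x`, `W_h`, `W_y`), the logistic non-linearity, and the toy sizes are all assumptions made for illustration.

```python
import numpy as np

def softmax(z):
    # Subtract the maximum for numerical stability
    e = np.exp(z - z.max())
    return e / e.sum()

def rnn_forward(x, W_e, W_x, W_h, W_y):
    """Forward pass of a simple (Elman) RNN as a loop over time steps."""
    h = np.zeros(W_h.shape[0])          # initial hidden state
    p_y = []
    for word_id in x:
        z = W_e[:, word_id]             # embedding lookup for this word
        # Same layer applied at every step; h carries information forward
        h = 1.0 / (1.0 + np.exp(-(W_x.dot(z) + W_h.dot(h))))
        p_y.append(softmax(W_y.dot(h))) # tag distribution for this word
    return np.array(p_y)

# Toy sizes and random weights (hypothetical values)
rng = np.random.RandomState(1234)
nr_words, emb_size, hidden_size, nr_tags = 10, 5, 4, 3
W_e = 0.01 * rng.randn(emb_size, nr_words)
W_x = 0.01 * rng.randn(hidden_size, emb_size)
W_h = 0.01 * rng.randn(hidden_size, hidden_size)
W_y = 0.01 * rng.randn(nr_tags, hidden_size)

p_y = rnn_forward(np.array([0, 1, 2, 3, 4]), W_e, W_x, W_h, W_y)
# p_y has one row per word and one column per tag; each row sums to 1
```

Each iteration of the loop is one feed-forward layer that shares its weights with all other iterations, which is why backpropagation through the unrolled loop follows the same rules as for a feed-forward network.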

## Scan operation in theano

Handling variable-length computation graphs automatically is not simple. Theano provides the scan function for this purpose. The scan function acts as a symbolic “for” loop. Unlike a normal Python “for” loop, it is not possible to put a breakpoint inside the scan loop, so graphs that use scan have to be designed with care. Toolboxes like Keras conveniently hide such constructs from the user. However, for complex designs it will be necessary to use scan or equivalent functions directly.
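
As a mental model, the loop that scan represents can be sketched in plain Python. The helper `eager_scan` below is hypothetical and heavily simplified: it runs eagerly on concrete values instead of building a symbolic graph, and it handles at most one sequence and one recurrent output. It does follow scan's argument order, in which the current sequence element is passed to the step function before the previous output.

```python
def eager_scan(fn, sequence=None, initial=None, n_steps=None):
    """Eager, simplified analogue of a scan loop (hypothetical helper)."""
    if sequence is None and n_steps is None:
        raise ValueError("provide a sequence or n_steps")
    if n_steps is None:
        n_steps = len(sequence)
    outputs = []
    prev = initial
    for t in range(n_steps):
        args = []
        if sequence is not None:
            args.append(sequence[t])    # current slice of the sequence
        if initial is not None:
            args.append(prev)           # output of the previous step
        prev = fn(*args)
        outputs.append(prev)
    return outputs

# Iterate over a sequence, no recurrence
squares = eager_scan(lambda n: n ** 2, sequence=list(range(5)))
# squares == [0, 1, 4, 9, 16]

# Recurrence only: each step receives the previous output
doubles = eager_scan(lambda s: 2 * s, initial=1, n_steps=4)
# doubles == [2, 4, 8, 16]

# Both: running sum over a sequence
running = eager_scan(lambda x, acc: acc + x, sequence=[1, 2, 3, 4], initial=0)
# running == [1, 3, 6, 10]
```

Because this version executes immediately, a breakpoint inside `fn` works as usual, which makes it a convenient way to prototype a step function before moving it into a symbolic scan.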

## Exercise 6.2

**Understand the basics of scan with these examples. Scan allows you to build computation graphs with a variable number of nodes. It acts as a Python “for” loop, but it is symbolic. The following example should help you understand the basic scan functionality. It generates a sequence of a given length. Run it and modify it. Try to arrive at an error and understand what happened.**


In [51]:
import numpy as np
import theano
import theano.tensor as T
theano.config.optimizer='None'

def square(x): 
    return x**2

# Python
def np_square_n_steps(nr_steps): 
    out = []
    for n in np.arange(nr_steps): 
        out.append(square(n))
    return np.array(out)

In [52]:
# Theano
nr_steps = T.lscalar('nr_steps')
h, _ = theano.scan(fn=square, sequences=T.arange(nr_steps))
th_square_n_steps = theano.function([nr_steps], h)

# Compare both
print np_square_n_steps(10)
print th_square_n_steps(10)

[ 0  1  4  9 16 25 36 49 64 81]
[ 0  1  4  9 16 25 36 49 64 81]


The following example should help you understand matrix multiplications and how to pass values from one iteration to the next. At each step, we multiply the output of the previous step by a matrix A. We start with an initial vector s0. The matrix and vector are random but normalized so that they define a Markov chain (this is irrelevant for the use of scan).

In [53]:
# Configuration
nr_states = 3
nr_steps = 5

# Transition matrix
A = np.abs(np.random.randn(nr_states, nr_states)) 
A = A/A.sum(0, keepdims=True)

# Initial state
s0 = np.zeros(nr_states)
s0[0] = 1

In [36]:
# Numpy version
def np_markov_step(s_tm1): 
    s_t = np.dot(s_tm1, A.T) 
    return s_t

def np_markov_chain(nr_steps, A, s0):
    # Pre-allocate space
    s = np.zeros((nr_steps+1, nr_states)) 
    s[0, :] = s0
    for t in np.arange(nr_steps):
         s[t+1, :] = np_markov_step(s[t, :]) 
    return s


In [37]:
np_markov_chain(nr_steps, A, s0)

array([[ 1.        ,  0.        ,  0.        ],
       [ 0.36331926,  0.00998989,  0.62669084],
       [ 0.22093512,  0.07267485,  0.70639002],
       [ 0.22723631,  0.08820146,  0.68456223],
       [ 0.23850484,  0.08797088,  0.67352428],
       [ 0.24099095,  0.08686008,  0.67214897]])

In [40]:
# Theano version
# Store variables as shared variables
th_A = theano.shared(A, name='A', borrow=True) 
th_s0 = theano.shared(s0, name='s0', borrow=True) 

# Symbolic variable for the number of steps 
th_nr_steps = T.lscalar('nr_steps')

def th_markov_step(s_tm1):
    s_t = T.dot(s_tm1, th_A.T)
    # Remember to name variables 
    s_t.name = 's_t'
    return s_t

s, _ = theano.scan(th_markov_step, 
                   outputs_info=[dict(initial=th_s0)],
                   n_steps=th_nr_steps)

th_markov_chain = theano.function([th_nr_steps], T.concatenate((th_s0[None, :], s), 0))

th_markov_chain(nr_steps)

array([[ 1.        ,  0.        ,  0.        ],
       [ 0.36331926,  0.00998989,  0.62669084],
       [ 0.22093512,  0.07267485,  0.70639002],
       [ 0.22723631,  0.08820146,  0.68456223],
       [ 0.23850484,  0.08797088,  0.67352428],
       [ 0.24099095,  0.08686008,  0.67214897]])
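
The normalization of A can be checked numerically. The sketch below is self-contained and uses its own random matrix (seeded with an arbitrary value), so its numbers differ from the outputs above. It assumes the same conventions as the example: columns of A sum to one and the update is `np.dot(s, A.T)`. Under these assumptions the state keeps summing to one, and the loop agrees with the closed form that applies the n-th matrix power of A to s0.

```python
import numpy as np

rng = np.random.RandomState(0)          # arbitrary seed for reproducibility
nr_states, nr_steps = 3, 5

# Column-normalized transition matrix: each column sums to 1
A = np.abs(rng.randn(nr_states, nr_states))
A = A / A.sum(0, keepdims=True)

# Start in the first state with probability 1
s0 = np.zeros(nr_states)
s0[0] = 1

# Step-by-step update, same rule as np_markov_step
s = s0
for _ in range(nr_steps):
    s = np.dot(s, A.T)

# Closed form: nr_steps applications of A to s0
closed_form = np.linalg.matrix_power(A, nr_steps).dot(s0)

same_result = np.allclose(s, closed_form)
total_mass = s.sum()                    # stays at 1 up to rounding
```

A check like this is a cheap way to validate a step function before wrapping it in scan, where intermediate values are harder to inspect.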

# A RNN in Theano for Part-of-Speech Tagging

## Exercise 6.3

**Complete the Theano code for an RNN inside lxmls/deep_learning/rnn.py. Use exercise 6.1 for a numpy example and 6.2 to learn how to handle scan. Keep in mind that you only need to implement the forward pass! Theano will handle backpropagation for us.**

In [41]:
# Instantiate the class
rnn = rnns.RNN(nr_words, emb_size, hidden_size, nr_tags, seed=SEED)
# Compile the forward pass function
x = T.ivector('x')
th_forward = theano.function([x], rnn._forward(x).T)


When working with Theano, it is more difficult to locate the source of errors. It is therefore important to work step by step and test the code frequently. To debug, we suggest implementing and compiling the forward pass first. You can use this code for testing. If it raises no error, you are good to go.

In [43]:
help(theano.scan)

Help on function scan in module theano.scan_module.scan:

scan(fn, sequences=None, outputs_info=None, non_sequences=None, n_steps=None, truncate_gradient=-1, go_backwards=False, mode=None, name=None, profile=False, allow_gc=None, strict=False)
    This function constructs and applies a Scan op to the provided
    arguments.
    
    :param fn:
        ``fn`` is a function that describes the operations involved in one
        step of ``scan``. ``fn`` should construct variables describing the
        output of one iteration step. It should expect as input theano
        variables representing all the slices of the input sequences
        and previous values of the outputs, as well as all other arguments
        given to scan as ``non_sequences``. The order in which scan passes
        these variables to ``fn``  is the following :
    
        * all time slices of the first sequence
        * all time slices of the second sequence
        * ...
        * all time slices of the last sequen