# Minimal RNN notebook

### RNN representation

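Unlike a feed-forward neural network, a recurrent neural network (RNN) contains a cycle: the hidden state computed at one step is fed back into the network at the next step, so information does not flow strictly from input to output. A general representation is shown below: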

![Representation](img/rnn_simple.svg)

An RNN is well suited to sequential data: a sequence of inputs is fed through the network and the hidden state is updated at each step of the sequence. The sequence is commonly indexed by time, and the most straightforward learning algorithm is [backpropagation through time (BPTT)](http://en.wikipedia.org/wiki/Backpropagation_through_time).

To understand BPTT properly, it helps to look at the unfolded version of the RNN:

![Representation](img/rnn_unfolded.svg)

The input $X$ is a sequence $x_0, x_1, \dots, x_t$; at each time step $t$ a new input $x_t$ is fed to the network.
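
Unfolding makes BPTT ordinary backpropagation on a deep graph whose layers share the same weights. The following is a minimal NumPy sketch, not the notebook's Theano code. It assumes the tanh recurrence $h_t = \tanh(x_t W_{in} + h_{t-1} W_{rec})$ and, purely for illustration, a squared-error loss on the last hidden state. The names `forward`, `loss` and `bptt` and the toy sizes are hypothetical.

```python
import numpy as np

def forward(xs, W_in, W_rec):
    """Unfold the recurrence over the sequence and keep every hidden state."""
    hs = [np.zeros(W_rec.shape[0])]  # initial hidden state, before the first input
    for x_t in xs:
        hs.append(np.tanh(np.dot(x_t, W_in) + np.dot(hs[-1], W_rec)))
    return hs

def loss(xs, target, W_in, W_rec):
    """Illustrative loss (an assumption): squared error on the final hidden state."""
    return 0.5 * np.sum((forward(xs, W_in, W_rec)[-1] - target) ** 2)

def bptt(xs, target, W_in, W_rec):
    """Gradients of `loss` with respect to W_in and W_rec."""
    hs = forward(xs, W_in, W_rec)
    dW_in, dW_rec = np.zeros_like(W_in), np.zeros_like(W_rec)
    dh = hs[-1] - target                   # gradient arriving at the last state
    for t in reversed(range(len(xs))):     # walk the unfolded graph backwards
        da = dh * (1.0 - hs[t + 1] ** 2)   # back through tanh
        dW_in += np.outer(xs[t], da)       # the same weights are reused at each step,
        dW_rec += np.outer(hs[t], da)      # so their gradients accumulate
        dh = np.dot(da, W_rec.T)           # pass the gradient to the previous state
    return dW_in, dW_rec

# Toy problem (sizes chosen arbitrarily for illustration)
rng = np.random.RandomState(0)
xs = rng.uniform(-1., 1., size=(5, 3))
target = rng.uniform(-1., 1., size=4)
W_in = rng.uniform(-0.5, 0.5, size=(3, 4))
W_rec = rng.uniform(-0.5, 0.5, size=(4, 4))
dW_in, dW_rec = bptt(xs, target, W_in, W_rec)

# Compare one entry against a central finite difference
eps = 1e-6
W_plus, W_minus = W_rec.copy(), W_rec.copy()
W_plus[1, 2] += eps
W_minus[1, 2] -= eps
numeric = (loss(xs, target, W_in, W_plus) - loss(xs, target, W_in, W_minus)) / (2 * eps)
print(abs(numeric - dW_rec[1, 2]))  # should be close to zero
```

The finite-difference comparison at the end is a cheap way to check that the backward loop matches the forward recurrence.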

### Equations

The simplest forward equations for an RNN are as follows:

$$h_t = \tanh(x_t \cdot W_{in} + h_{t-1} \cdot W_{rec})$$
$$y_t = \operatorname{softmax}(h_t \cdot W_{out})$$

Depending on the problem, all the outputs $y_0, \dots, y_t$ might be useful, or only the last one, $y_t$.
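
The two equations can be written as a plain NumPy loop. This is a sketch only, independent of Theano; the function names `softmax` and `rnn_forward`, the random weights and the sizes (7 inputs, 50 hidden units, 7 outputs) are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax of a 1-D vector."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def rnn_forward(X, W_in, W_rec, W_out):
    """Apply h_t = tanh(x_t W_in + h_{t-1} W_rec), y_t = softmax(h_t W_out)."""
    h = np.zeros(W_rec.shape[0])  # initial hidden state
    ys = []
    for x_t in X:                 # one time step per row of X
        h = np.tanh(np.dot(x_t, W_in) + np.dot(h, W_rec))
        ys.append(softmax(np.dot(h, W_out)))
    return np.array(ys)

# Illustrative sizes and random weights (assumptions, not trained values)
rng = np.random.RandomState(0)
n_in, n_hid, n_out = 7, 50, 7
X = rng.uniform(-0.1, 0.1, size=(100, n_in))
W_in = rng.uniform(-0.1, 0.1, size=(n_in, n_hid))
W_rec = rng.uniform(-0.1, 0.1, size=(n_hid, n_hid))
W_out = rng.uniform(-0.1, 0.1, size=(n_hid, n_out))

ys = rnn_forward(X, W_in, W_rec, W_out)  # every output y_0 ... y_t
last = ys[-1]                            # only the final output y_t
```

Returning the whole array leaves the choice between all outputs and the last one to the caller.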

In [8]:
import numpy as np
import theano
import theano.tensor as T
from theano import shared 
from collections import OrderedDict

dtype = theano.config.floatX  # float type used for all shared variables
theano.config.optimizer = 'fast_compile'  # faster graph compilation, slower execution

In [9]:
def init_weight(shape, name, sample='uni'):
    """Return a Theano shared variable of the given shape, initialised according to `sample`."""
    if sample == 'unishape':
        # uniform in [-b, b] with b = sqrt(6 / (shape[0] + shape[1]))
        bound = np.sqrt(6. / (shape[0] + shape[1]))
        values = np.random.uniform(low=-bound, high=bound, size=shape)
        return shared(value=np.asarray(values, dtype=dtype), name=name, borrow=True)

    if sample == 'svd':
        # uniform in [-1, 1], then rescaled so that the largest singular value is 1
        values = np.asarray(np.random.uniform(low=-1., high=1., size=shape), dtype=dtype)
        _, svs, _ = np.linalg.svd(values)
        # svs[0] is the largest singular value
        values = values / svs[0]
        return shared(values, name=name, borrow=True)

    if sample == 'uni':
        # uniform in [-0.1, 0.1]
        values = np.random.uniform(low=-0.1, high=0.1, size=shape)
        return shared(value=np.asarray(values, dtype=dtype), name=name, borrow=True)

    if sample == 'zero':
        return shared(value=np.zeros(shape=shape, dtype=dtype), name=name, borrow=True)

    # raising a plain string is not valid; raise a proper exception instead
    raise ValueError("unknown sample technique: %r" % (sample,))
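
The `'svd'` branch divides the matrix by its largest singular value, which sets its spectral norm to 1. This can be checked with NumPy alone. The sketch below uses an arbitrarily chosen 50x50 shape and seed, and none of the variable names come from the notebook.

```python
import numpy as np

rng = np.random.RandomState(0)
values = rng.uniform(low=-1., high=1., size=(50, 50))

# Singular values are returned in descending order
singular_values = np.linalg.svd(values, compute_uv=False)
scaled = values / singular_values[0]

# For a 2-D array, ord=2 gives the largest singular value (the spectral norm)
spectral_norm = np.linalg.norm(scaled, 2)
print(spectral_norm)
```
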

In [10]:
class Rnn:
    def __init__(self, n_in, n_hid, n_out, lr):   
        self.n_in = n_in
        self.n_hid = n_hid
        self.n_out = n_out
        self.W_in = init_weight((self.n_in, self.n_hid),'W_in', 'svd')
        self.W_out = init_weight((self.n_hid, self.n_out),'W_out', 'svd')
        self.W_rec = init_weight((self.n_hid, self.n_hid),'W_rec', 'svd')
        self.b_out = init_weight((self.n_out,), 'b_out', 'zero')
        self.params = [self.W_in,self.W_out,self.W_rec, self.b_out]
        
        def step(x_t, h_tm1):
            h_t = T.tanh(T.dot(x_t, self.W_in) + T.dot(h_tm1, self.W_rec))
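            # note: the softmax input is negated and includes the bias b_out,
            # unlike the simplified equation given above
            y_t = T.nnet.softmax(-(T.dot(h_t, self.W_out) + self.b_out))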
            return [h_t, y_t]

        X = T.matrix() # X is a sequence of vectors
        Y = T.matrix() # Y is a sequence of vectors
        h0 = shared(np.zeros(self.n_hid, dtype=dtype)) # initial hidden state         
        lr = shared(np.cast[dtype](lr))
        
        [h_vals, y_vals], _ = theano.scan(fn=step,                                  
                                          sequences=X,
                                          outputs_info=[h0, None])
        
        #h_vals is a sequence of hidden states
        #y_vals is a sequence of outputs
        
        # compute cost: element-wise (binary) cross-entropy, averaged over all entries
        cost = -T.mean(Y * T.log(y_vals) + (1. - Y) * T.log(1. - y_vals))
        # for mean squared error, use (no minus sign, since the cost is minimised)
        # cost = T.mean((Y - y_vals) ** 2)
        
        gparams = T.grad(cost, self.params)
        updates = OrderedDict()
        for param, gparam in zip(self.params, gparams):
            updates[param] = param - gparam * lr
                
        self.train = theano.function(inputs = [X, Y], outputs = cost, updates=updates)
        self.predictions = theano.function(inputs = [X], outputs = y_vals)
        self.debug = theano.function(inputs = [X, Y], outputs = [X.shape, Y.shape, h_vals.shape, y_vals.shape])
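
The cost and the update rule built in the class can be reproduced in plain NumPy to see what each training step computes. This is a sketch under assumptions: the helper names `binary_cross_entropy` and `sgd_step` are hypothetical, and the clipping by `eps` is an addition that the Theano code does not contain.

```python
import numpy as np

def binary_cross_entropy(Y, Y_hat, eps=1e-7):
    """-mean(Y * log(Y_hat) + (1 - Y) * log(1 - Y_hat)), as in the Theano cost."""
    Y_hat = np.clip(Y_hat, eps, 1. - eps)  # assumption: avoid log(0); not in the notebook
    return -np.mean(Y * np.log(Y_hat) + (1. - Y) * np.log(1. - Y_hat))

def sgd_step(params, grads, lr):
    """One plain gradient-descent update: param <- param - lr * grad."""
    return [p - lr * g for p, g in zip(params, grads)]

cost = binary_cross_entropy(np.array([1., 0.]), np.array([0.5, 0.5]))
new_params = sgd_step([np.array([1., 2.])], [np.array([0.5, 0.5])], lr=0.1)
```
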
    

In [11]:
model = Rnn(7, 50, 7, 0.1)

In [12]:
#sequences of 100 elements and vector size 7
X = np.random.uniform(low=-0.1, high=0.1, size=(100,7)).astype(dtype=dtype) 
Y = np.random.uniform(low=-0.1, high=0.1, size=(100,7)).astype(dtype=dtype)

print(model.debug(X,Y))
model.predictions(X)

[array([100,   7]), array([100,   7]), array([100,  50]), array([100,   1,   7])]


array([[[ 0.14284763,  0.14250721,  0.14240675,  0.14396988,  0.14260595,
          0.14442767,  0.14123492]],

       [[ 0.14559752,  0.14280474,  0.14049399,  0.14519536,  0.14013915,
          0.14428471,  0.14148448]],

       [[ 0.14008938,  0.14464262,  0.14370957,  0.14081264,  0.14415146,
          0.14370921,  0.14288515]],

       [[ 0.14818682,  0.13951913,  0.14199056,  0.14270474,  0.14128901,
          0.14287454,  0.14343517]],

       [[ 0.14399292,  0.14488365,  0.1453148 ,  0.14258461,  0.13886179,
          0.13955611,  0.14480619]],

       [[ 0.14164145,  0.14122082,  0.14076631,  0.14671937,  0.14178421,
          0.14381069,  0.14405715]],

       [[ 0.13907513,  0.14299788,  0.14283548,  0.14237292,  0.14659183,
          0.1456562 ,  0.14047052]],

       [[ 0.14139324,  0.14561172,  0.14414689,  0.14105491,  0.14538541,
          0.14147893,  0.14092895]],

       [[ 0.14625072,  0.13952824,  0.13899292,  0.14481378,  0.14001323,
          0.14398773,  0.14641

In [13]:
model.train(X,Y)

array(0.15975379943847656, dtype=float32)

In [14]:
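nb_epochs = 100
# naive SGD: each update uses one randomly drawn training sequence
# train_data must be a list of (input sequence, target sequence) pairs defined beforehand
for epoch in range(nb_epochs):
    error = 0.
    for j in range(len(train_data)):
        index = np.random.randint(0, len(train_data))
        i, o = train_data[index]
        train_cost = model.train(i, o)
        error += train_cost
    if epoch % 10 == 0:
        print("epoch " + str(epoch) + " error: " + str(error))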

NameError: name 'train_data' is not defined
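
The loop expects `train_data` to be a list of `(input, target)` pairs, each a matrix with one row per time step. The notebook does not say where this data comes from, so the following is a purely hypothetical placeholder. It builds synthetic sequences whose targets are one-hot rows marking the largest input component at each step. The helper `make_train_data`, the task and the `float32` dtype are all assumptions.

```python
import numpy as np

def make_train_data(n_sequences, seq_len, n_features, seed=0):
    """Build a list of (inputs, targets) pairs of shape (seq_len, n_features)."""
    rng = np.random.RandomState(seed)
    data = []
    for _ in range(n_sequences):
        inputs = rng.uniform(-0.1, 0.1, size=(seq_len, n_features))
        targets = np.zeros((seq_len, n_features))
        # one-hot row per time step: 1 at the position of the largest input
        targets[np.arange(seq_len), inputs.argmax(axis=1)] = 1.
        data.append((inputs.astype('float32'), targets.astype('float32')))
    return data

train_data = make_train_data(n_sequences=20, seq_len=100, n_features=7)
```

With a list of this form defined before the training cell, `i, o = train_data[index]` yields two matrices that match the `X` and `Y` inputs of `model.train`.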