In [3]:
__author__ = "Cindy Wang"
__version__ = "Alibaba August 2018"

# Setup

The following packages are required:
```
numpy==1.14.0
scipy==1.0.0
scikit-learn >= 0.19.1
jupyter==1.0.0
```

You can paste these into a `requirements.txt` file and run
`pip install -r requirements.txt`. You will also need to install Tensorflow by following the instructions on the [website](https://www.tensorflow.org/install/).

## Running the notebook
Navigate to your notebooks directory (wherever this file is located) and run `jupyter notebook --port 5656` to start the notebook server.

# RNNs

Recall the most basic artifical neural network, an MLP (multi-layer perceptron, or feed-forward neural network). Each layer of this network has its own weights $W$, biases $b$, and activation function $f$. These are applied, then the activations are sent to the next layer. The following set of equations describes an $l$-layer MLP:

$$\begin{align} 
z^{(2)} &= W^{(1)} x + b^{(1)} \\ 
h^{(2)} &= f(z^{(2)}) \\ 
z^{(3)} &= W^{(2)} h^{(2)} + b^{(2)} \\ 
h^{(3)} &= f(z^{(3)}) \\ 
&\dots \\
z^{(l+1)} &= W^{(l)} h^{(l)} + b^{(l)}   \\ 
h^{(l+1)} &= f(z^{(l+1)}) \\
y &= softmax(h^{(l+1)})
\end{align}$$

Now imagine that our inputs are sequential. Our objective is now to capture relationships between successive inputs. We can achieve this by feeding **successive inputs** to **successive hidden layers**. In the diagram below, inputs are passed through identical copies of the neural network layer $A$.
![rnn1](img/rnn1.png)

This is essentially what an RNN is.
![rnn2](img/rnn2.png)

## The problem with RNNs

$$\textbf{h}_t = \sigma (\textbf{Ux}_t + \textbf{Vh}_{t-1})$$

$$\textbf{h}_t = \sigma (\textbf{Ux}_t + \textbf{V}(\sigma(\textbf{Ux}_{t-1} + \textbf{V}(\sigma(\textbf{Ux}_{t-2})))$$

$$\frac{\partial E_3}{\partial U} = \frac{\partial E_3}{\partial out_3}\frac{\partial out_3}{\partial h_3}\frac{\partial h_3}{\partial h_2}\frac{\partial h_2}{\partial h_1}\frac{\partial h_1}{\partial U}$$
                                                   

![sigmoid gradient](img/Sigmoid-gradient.png)

# LSTMs

$$\text{loss} = -\frac{1}{N}\sum_{i=1}^{N} \ln p_{\text{target}_i}$$

## The Echo LSTM: A simple working example
We will go through a basic example that involves feeding a sequence of IDs (numbers) into an LSTM one at a time. (For NLP, these IDs would be the IDs of words in the vocabulary.) Our goal is for the network to echo back a partial or full list of contiguous observations observed.

While this seems overly simplistic, we can think of it as machine translation from L1 to L2, where L1=L2. This requires the network to remember blocks of contiguous observations and demonstrates the LSTM's ability to encode temporal information in low dimensions.

In [26]:
import tensorflow as tf
import numpy as np
import pandas as pd
import random
import sys

## Generate sequence
Normally we would convert sequences of words into a sequences of IDs, but since we are only interested in echoing the sequence, we will just generate it randomly.

In [134]:
def generate_sequence(num_seq, length):
    return np.random.randint(100, size=(num_seq, length))

## Encode sequence
In the previous lesson, we learned about word embeddings. Here we initialize them randomly, but these can be initialized with pretrained word2vec or GloVe as well.

Since our vocabulary is small, another option is to use one-hot encoding.
```
[0,1,2] -> [[1., 0., 0.],
            [0., 1., 0.],
            [0., 0., 1.]]
```

In [135]:
def embedding_encode(ids, vocab_size=100, embedding_size=100):
    with tf.variable_scope('word_embedding'):
        # Embedding matrix
        C = tf.get_variable('C', [vocab_size, embedding_size], initializer=tf.random_normal_initializer())
        return tf.nn.embedding_lookup(C, ids)
                        
def one_hot_encode(ids, vocab_size=100):
    return tf.one_hot(ids, vocab_size)

## Create training set

blah

In [150]:
from sklearn.model_selection import train_test_split

def get_data(n_in, n_out, size=10000):
    X = generate_sequence(size, n_in)
    y = X[:,:n_out]
    return X, y

n_in = 5
n_out = 2
X, y = get_data(n_in, n_out)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

In [162]:
class BasicEncoderDecoder(object):
    """
    Adapted from Chris Potts's CS224U model classes (https://github.com/cgpotts/cs224u)
    
    Parameters
    ----------
    embed_dim : int
        Dimensionality of embeddings (if one-hot, size of vocab).
    cell_class : tf.nn.rnn_cell class
       The default is `tf.nn.rnn_cell.LSTMCell`. Other prominent options:
       `tf.nn.rnn_cell.BasicRNNCell`, and `tf.nn.rnn_cell.GRUCell`.
    hidden_dim : int
        Dimensionality of hidden layers.
    hidden_activation : tf.nn activation
       E.g., tf.nn.relu, tf.nn.relu, tf.nn.selu.
    max_iter : int
    eta : float
        Learning rate
    tol : float
        Stopping criterion for the loss.
    display_progress : int
        For value i, progress is printed every ith iteration.
    Attributes
    ----------
    errors : list
        Tracks loss from each iteration during training.
    """
    def __init__(self, 
                 embed_dim=100,
                 cell_class=tf.nn.rnn_cell.LSTMCell,
                 hidden_dim=150, 
                 hidden_activation=tf.nn.tanh,
                 batch_size=200, 
                 eta=0.01, 
                 tol=1e-4, 
                 display_progress=1):
        self.embed_dim = embed_dim
        self.cell_class = cell_class
        self.hidden_dim = hidden_dim
        self.hidden_activation = hidden_activation
        self.batch_size = batch_size
        self.eta = eta
        self.tol = tol
        self.display_progress = display_progress
    
    def encoder(self, encoder_inputs):
        with tf.variable_scope("encoder"):
            # Declaring LSTM cell
            encoder_cell = self.cell_class(
                self.hidden_dim, activation=self.hidden_activation)
            # Declaring RNN with above cell
            return tf.nn.dynamic_rnn(
                encoder_cell,
                encoder_inputs,
                dtype=tf.float32)
            
    def decoder(self, encoder_state, decoder_inputs):
        with tf.variable_scope("decoder"):
            # Need to declare again
            decoder_cell = self.cell_class(
                self.hidden_dim, activation=self.hidden_activation)
            decoder_outputs, _ = tf.nn.dynamic_rnn(
                decoder_cell, 
                decoder_inputs, 
                initial_state=encoder_state)
            return tf.layers.dense(decoder_outputs, self.embed_dim)
        
    def enc_dec_body(self, inputs):
        _, encoder_state = self.encoder(inputs)
        decoder_inputs = tf.tile(tf.expand_dims(encoder_state.h, 1), (1, self.trg_len, 1))
        return self.decoder(encoder_state, decoder_inputs)
            
    def to_embedding(self, ids):
        # Can alternatively use random, word2vec, or GloVe embeddings
        return tf.one_hot(ids, self.embed_dim)
    
    def shift_left_3d(self, x):
        """Shift the second dimension of x right by one."""
        return tf.pad(x, [[0, 0], [1, 0], [0, 0]])[:, :-1, :]
        
    def fit(self, X, y, max_iter=100, log_msg="", **kwargs):
        """Standard `fit` method.
        Parameters
        ----------
        X : [n x src_len] int
        y : [n x trg_len] int

        Returns
        -------
        self
        """
        if isinstance(X, pd.DataFrame):
            X = X.values

        self.src_len = len(X[0])
        self.trg_len = len(y[0]) 

        # Start the session:
        tf.reset_default_graph()
        self.sess = tf.InteractiveSession()

        # Declare the inputs and outputs
        self.src = tf.placeholder(
            tf.int32, [None, self.src_len])
        self.trg = tf.placeholder(
            tf.int32, shape=[None, None])
        inputs = self.to_embedding(self.src)
        outputs = self.to_embedding(self.trg)
        
        # Build the encoder-decoder body.
        self.model = self.enc_dec_body(inputs)

        # Optimizer set-up:
        self.cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(
                logits=self.model, labels=outputs))
        self.optimizer = tf.train.AdamOptimizer(self.eta).minimize(self.cost)
        self.pred = tf.argmax(tf.nn.softmax(self.model), axis=2)
        self.accuracy = tf.metrics.accuracy(self.trg, self.pred)

        # Initialize the session variables:
        self.sess.run(tf.global_variables_initializer())
        self.sess.run(tf.local_variables_initializer())

        # Training, full dataset for each iteration:
        for i in range(1, max_iter+1):
            loss = 0
            acc = []
            for X_batch, y_batch in self.batch_iterator(X, y):
                _, batch_loss, batch_acc = self.sess.run(
                    [self.optimizer, self.cost, self.accuracy],
                    feed_dict={
                        self.src: X,
                        self.trg: y
                    })
                loss += batch_loss
                acc.append(batch_acc[1])
            acc = sum(acc)/len(acc)
            if loss < self.tol:
                self._progressbar("stopping with loss < self.tol", i)
                break
            else:
                self._progressbar("loss: {}, accuracy: {}".format(loss, acc), i)
        return self
    
    def batch_iterator(self, X, y):
        dataset = list(zip(X, y))
        random.shuffle(dataset)
        for i in range(0, len(dataset), self.batch_size):
            batch = dataset[i: i+self.batch_size]
            X_batch, y_batch = zip(*batch)
            yield X_batch, y_batch

    def predict(self, X, y=None, report_acc=False):
        """Return target sequence
        Parameters
        ----------
        X : np.array
        Returns
        -------
        list
        """
        if report_acc:
            if y is None:
                raise ValueError
            return self.sess.run(
                [self.pred, self.accuracy], feed_dict={self.src: X, self.trg: y})
        return self.sess.run([self.pred], feed_dict={self.src: X})[0]
    
    def _progressbar(self, msg, index):
        if self.display_progress and index % self.display_progress == 0:
            sys.stderr.write('\r')
            sys.stderr.write('Iter {}: {}'.format(index, msg))
            sys.stderr.flush()

In [165]:
bed = BasicEncoderDecoder()
bed.fit(X_train, y_train, max_iter=25)

Iter 25: loss: 0.0139156941732, accuracy: 0.972046352923

<__main__.BasicEncoderDecoder at 0x1366c6f10>

In [166]:
preds, acc = bed.predict(X_test, y_test, report_acc=True)
print "Accuracy: ", acc[1]
for y, yp in zip(y_test, preds):
    print y, yp, y==yp

Accuracy:  0.97259635
[19 13] [19 13] [ True  True]
[88 23] [88 23] [ True  True]
[ 4 24] [ 4 24] [ True  True]
[22 60] [22 60] [ True  True]
[17 62] [17 62] [ True  True]
[92 36] [92 36] [ True  True]
[22 68] [22 68] [ True  True]
[51 59] [51 59] [ True  True]
[67 93] [67 93] [ True  True]
[70 49] [70 49] [ True  True]
[65 86] [65 86] [ True  True]
[65 66] [65 66] [ True  True]
[22 47] [22 47] [ True  True]
[44 62] [44 62] [ True  True]
[43 75] [43 75] [ True  True]
[18 25] [18 25] [ True  True]
[65  5] [65  5] [ True  True]
[62 82] [62 82] [ True  True]
[83 96] [83 96] [ True  True]
[39 19] [39 19] [ True  True]
[75 36] [75 36] [ True  True]
[36 24] [36 24] [ True  True]
[26 22] [26 22] [ True  True]
[93  7] [93  7] [ True  True]
[71  5] [71  5] [ True  True]
[19 48] [19 48] [ True  True]
[48 34] [48 34] [ True  True]
[81 92] [81 81] [ True False]
[71 34] [71 34] [ True  True]
[74 24] [74 24] [ True  True]
[79 12] [79 12] [ True  True]
[96 87] [96 87] [ True  True]
[62 77] [62 77] [ 

[70  1] [70  1] [ True  True]
[70 68] [70 68] [ True  True]
[31 77] [31 77] [ True  True]
[42  7] [42  7] [ True  True]
[97  8] [97  8] [ True  True]
[85 61] [85 85] [ True False]
[3 4] [3 4] [ True  True]
[47 98] [47 98] [ True  True]
[41 79] [41 79] [ True  True]
[61 29] [61 29] [ True  True]
[13 64] [13 64] [ True  True]
[55  2] [55  2] [ True  True]
[14 95] [14 95] [ True  True]
[ 1 76] [ 1 76] [ True  True]
[59 58] [59 58] [ True  True]
[17 29] [17 29] [ True  True]
[11 71] [11 11] [ True False]
[69 90] [69 90] [ True  True]
[48 52] [48 52] [ True  True]
[22 91] [22 91] [ True  True]
[61 11] [61 11] [ True  True]
[14 76] [14 76] [ True  True]
[62 21] [62 21] [ True  True]
[14 82] [14 82] [ True  True]
[74 48] [74 48] [ True  True]
[81 89] [81 89] [ True  True]
[84 23] [84 23] [ True  True]
[43 22] [43 22] [ True  True]
[75 46] [75 46] [ True  True]
[82  8] [82  8] [ True  True]
[55 17] [55 17] [ True  True]
[46 67] [46 46] [ True False]
[69 64] [69 64] [ True  True]
[98 22] [98 22

[47 42] [47 42] [ True  True]
[99 61] [99 61] [ True  True]
[30 65] [30 30] [ True False]
[ 2 74] [ 2 74] [ True  True]
[26 70] [26 70] [ True  True]
[73 52] [73 52] [ True  True]
[46 11] [46 11] [ True  True]
[45 58] [45 58] [ True  True]
[46 48] [46 48] [ True  True]
[67 18] [67 18] [ True  True]
[ 3 13] [ 3 13] [ True  True]
[70 96] [70 96] [ True  True]
[61 24] [61 24] [ True  True]
[33 41] [33 41] [ True  True]
[43 49] [43 49] [ True  True]
[90 32] [90 32] [ True  True]
[51 91] [51 91] [ True  True]
[36 13] [36 13] [ True  True]
[ 5 31] [ 5 31] [ True  True]
[34  0] [34  0] [ True  True]
[32 24] [32 24] [ True  True]
[76 10] [76 10] [ True  True]
[49 40] [49 40] [ True  True]
[64 38] [64 64] [ True False]
[11 57] [11 57] [ True  True]
[10  1] [10  1] [ True  True]
[56 60] [56 60] [ True  True]
[99 30] [99 30] [ True  True]
[56  3] [56  3] [ True  True]
[48 37] [48 37] [ True  True]
[52 85] [52 85] [ True  True]
[67 48] [67 48] [ True  True]
[63 81] [63 81] [ True  True]
[95 52] [9

[61 27] [61 27] [ True  True]
[69 88] [69 88] [ True  True]
[87 65] [87 65] [ True  True]
[61 21] [61 61] [ True False]
[64 15] [64 15] [ True  True]
[63 54] [63 54] [ True  True]
[42 84] [42 84] [ True  True]
[94 96] [94 96] [ True  True]
[77 93] [77 93] [ True  True]
[71 14] [71 14] [ True  True]
[ 5 25] [ 5 25] [ True  True]
[ 8 58] [ 8 58] [ True  True]
[49 70] [49 70] [ True  True]
[57 60] [57 57] [ True False]
[56 51] [56 51] [ True  True]
[61 98] [61 98] [ True  True]
[38 61] [38 61] [ True  True]
[91 82] [91 24] [ True False]
[42  7] [42  7] [ True  True]
[12 40] [12 40] [ True  True]
[25 53] [25 53] [ True  True]
[17 97] [17 97] [ True  True]
[80 57] [80 57] [ True  True]
[53 62] [53 62] [ True  True]
[ 0 32] [ 0 32] [ True  True]
[ 9 38] [ 9 27] [ True False]
[17  4] [17  4] [ True  True]
[88 49] [88 49] [ True  True]
[ 6 69] [ 6 69] [ True  True]
[35 88] [35 88] [ True  True]
[68 18] [68 18] [ True  True]
[21 47] [21 47] [ True  True]
[40 89] [40 89] [ True  True]
[78 48] [7

[56  2] [56  2] [ True  True]
[13  0] [13  0] [ True  True]
[70 77] [70 77] [ True  True]
[90 24] [90 24] [ True  True]
[60  7] [60  7] [ True  True]
[41 77] [41 77] [ True  True]
[74 25] [74 25] [ True  True]
[36  1] [36 36] [ True False]
[98 18] [98 18] [ True  True]
[20 41] [20 41] [ True  True]
[55 94] [55 94] [ True  True]
[48 52] [48 52] [ True  True]
[58 92] [58 92] [ True  True]
[50 84] [50 84] [ True  True]
[30 19] [30 19] [ True  True]
[61 37] [61 61] [ True False]
[98  8] [98  8] [ True  True]
[90 78] [90 78] [ True  True]
[34 22] [34 22] [ True  True]
[75 71] [75 71] [ True  True]
[ 3 18] [ 3 18] [ True  True]
[75 47] [75 47] [ True  True]
[74 20] [74 20] [ True  True]
[41 15] [41 15] [ True  True]
[44 76] [44 76] [ True  True]
[59 16] [59 16] [ True  True]
[55  0] [55  0] [ True  True]
[79 25] [79 79] [ True False]
[98 82] [98 82] [ True  True]
[83 43] [83 43] [ True  True]
[27 66] [27 66] [ True  True]
[32 70] [32 70] [ True  True]
[92 76] [92 76] [ True  True]
[21  6] [2

[88 30] [88 30] [ True  True]
[77 74] [77 74] [ True  True]
[61 85] [61 85] [ True  True]
[82  3] [82  3] [ True  True]
[ 8 20] [ 8 20] [ True  True]
[81 13] [81 13] [ True  True]
[73 12] [73 12] [ True  True]
[14  4] [14  4] [ True  True]
[89 50] [89 50] [ True  True]
[16 31] [16 31] [ True  True]
[37 11] [37 37] [ True False]
[51 47] [51 47] [ True  True]
[20 71] [20 71] [ True  True]
[56 16] [56 16] [ True  True]
[70  6] [70  6] [ True  True]
[32 69] [32 69] [ True  True]
[ 3 34] [ 3 34] [ True  True]
[ 4 47] [ 4 47] [ True  True]
[16 88] [16 19] [ True False]
[60 73] [60 73] [ True  True]
[23 31] [23 31] [ True  True]
[16 31] [16 31] [ True  True]
[74 61] [74 61] [ True  True]
[32 12] [32 12] [ True  True]
[60 48] [60 48] [ True  True]
[54 58] [54 58] [ True  True]
[81 26] [81 26] [ True  True]
[67 79] [67 79] [ True  True]
[86 67] [86 67] [ True  True]
[34 32] [34 34] [ True False]
[59  3] [59  3] [ True  True]
[94 53] [94 53] [ True  True]
[37 29] [37 29] [ True  True]
[17 18] [1

### References
- "Understanding LSTM Networks" http://colah.github.io/posts/2015-08-Understanding-LSTMs/
- "Recurrent neural networks and LSTM tutorial in Python and TensorFlow" http://adventuresinmachinelearning.com/recurrent-neural-networks-lstm-tutorial-tensorflow/
- "How to use an Encoder-Decoder LSTM to Echo Sequences of Random Integers" https://machinelearningmastery.com/how-to-use-an-encoder-decoder-lstm-to-echo-sequences-of-random-integers/