# Train a Bidirectional LSTM on the IMDB sentiment classification task

## Python 2 Compatibility

In [4]:
from __future__ import print_function
import numpy as np

## Import Keras

In [1]:
from keras.preprocessing import sequence

from keras.models import Sequential
# Obviously imports the IMDB dataset loader
from keras.datasets import imdb

Using TensorFlow backend.


### Keras Layers

In [6]:
# These are just layers to be stacked up on the network
from keras.layers import Dense, Dropout, Embedding, LSTM, Bidirectional

Layer | Docs | Summary | Questions
---:|---|---|---
Dense | <code>Dense</code> implements the operation: <code>output = activation(dot(input, kernel) + bias)</code> where activation is the element-wise activation function passed as the activation argument, kernel is a weights matrix created by the layer, and bias is a bias vector created by the layer (only applicable if use_bias is True). | It takes the dot product of the input with a matrix of weights, adds a bias, and passes the result through the activation function. With a sigmoid activation the output is a value between 0 and 1 rather than a hard 0/1 cutoff | What exactly is an activation function?
Dropout | Dropout consists in randomly setting a fraction <code>rate</code> of input units to 0 at each update during training time, which helps prevent overfitting | I think this zeroes out a random subset of the layer's inputs on each training update, so the network can't lean too heavily on any single unit. "Dropout helps prevent weights from converging to identical positions. It does this by randomly turning nodes off when forward propagating. It then back-propagates with all the nodes turned on." | How exactly is it adding 0's, and why does that help avoid overfitting?
Embedding | Turns positive integers (indexes) into dense vectors of fixed size, e.g. [[4], [20]] -> [[0.25, 0.1], [0.6, -0.2]] | I think this re-represents each word index in the sequence as a vector of numbers that the other layers can work with | What is a dense vector, and what is a vector's size (magnitude?)?
LSTM | http://deeplearning.net/tutorial/lstm.html | Normally, when a layer has memory (it feeds its own output back in at the next time step), the gradients tend to blow up or vanish over long sequences. An LSTM carries a cell state from step to step that is only lightly modified each time, which avoids that problem. "The key to LSTMs is the cell state, the horizontal line running through the top of the diagram. The cell state is kind of like a conveyor belt. It runs straight down the entire chain, with only some minor linear interactions. It’s very easy for information to just flow along it unchanged." | Lol, wtf.
Bidirectional | BRNNs were introduced to increase the amount of input information available to the network. For example, multilayer perceptrons (MLPs) and time delay neural networks (TDNNs) have limitations on the input data flexibility, as they require their input data to be fixed. Standard recurrent neural networks (RNNs) also have restrictions, as the future input information cannot be reached from the current state. On the contrary, BRNNs do not require their input data to be fixed. | Two copies of the recurrent layer read the sequence, one front to back and one back to front, and their outputs are combined, so the result reflects both earlier and later words | How is this different from just LSTM? Why is this apparently super useful? How are the two directions combined?
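
Several of the questions in the table (what an activation function is, how dropout adds zeros, what a dense vector's size is) can be made concrete with a toy version of each layer. The following is a minimal pure-Python sketch, not how Keras implements these layers; the function names, the lookup table, and the weights are all made up for illustration.

```python
import math
import random


def embed(indices, table):
    """Embedding: look up one fixed-size vector per integer index."""
    return [table[i] for i in indices]


def dropout(values, rate, rng):
    """Training-time dropout: zero each value with probability `rate` and
    scale the survivors by 1 / (1 - rate) so the expected total is unchanged."""
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def sigmoid(z):
    """An activation function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, kernel, bias, activation):
    """Dense with a single output unit: activation(dot(inputs, kernel) + bias)."""
    return activation(sum(x * w for x, w in zip(inputs, kernel)) + bias)


# Toy values, chosen only for illustration.
table = {4: [0.25, 0.1], 20: [0.6, -0.2]}   # two word indices, vector size 2
vectors = embed([4, 20], table)             # one 2-element vector per index
dropped = dropout([1.0, 1.0, 1.0, 1.0], rate=0.5, rng=random.Random(0))
score = dense(vectors[0], kernel=[2.0, -1.0], bias=0.1, activation=sigmoid)
print(vectors, dropped, score)
```

Here a vector's "size" is simply its number of elements (2 in the toy table, 128 in the model further down), and "dense" means that every element carries a value, as opposed to a mostly-zero one-hot encoding.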


## Prep Data

In [None]:
# Vocabulary size: only the max_features most common words are kept
max_features = 20000

# pad or truncate every review to exactly this number of words
# (among top max_features most common words)
maxlen = 100

# Iterate on your training data in batches. This trains the model 32 inputs at a time
batch_size = 32

# x holds the reviews (each one a list of word indices) and y holds the labels
# (0 = negative, 1 = positive); both are split into training and test sets
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)

# Transform a list of num_samples sequences (lists of scalars) into a 2D Numpy array of shape (num_samples, num_timesteps).
# Basically, turns these lists of numbers into matrices which the neural net would like to use
# Obviously, x_train is used to train the model, x_test is used to test the model
x_train = sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = sequence.pad_sequences(x_test, maxlen=maxlen)

# The labels are a single 0/1 value per review, not sequences,
# so there is nothing to pad with sequence.pad_sequences;
# just make sure they are NumPy arrays
y_train = np.array(y_train)
y_test = np.array(y_test)
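
To see what the padding step does to each review, here is a small stand-in for `sequence.pad_sequences` written in plain Python. It is only a sketch: `pad_pre` is a hypothetical helper, and it assumes the Keras defaults (`padding='pre'`, `truncating='pre'`, fill value 0), under which a long review keeps its last `maxlen` words rather than its first.

```python
def pad_pre(seq, maxlen, value=0):
    """Return `seq` as a list of exactly `maxlen` items (maxlen must be >= 1).

    Long sequences keep their LAST `maxlen` items; short ones get `value`
    added at the front.
    """
    kept = seq[-maxlen:]
    return [value] * (maxlen - len(kept)) + kept


reviews = [[1, 14, 22], [1, 5, 9, 33, 7, 12]]   # made-up word indices
padded = [pad_pre(r, maxlen=4) for r in reviews]
labels = [1, 0]                                  # one scalar per review: nothing to pad
print(padded)
```

Every row ends up the same length, which is what lets the reviews be stacked into a single 2D array.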

## Make the model

In [None]:
model = Sequential()

# This layer maps each word index in the padded matrices to a 128-dimensional vector
# It comes first because the later layers need vectors, not raw integer indices
model.add(Embedding(max_features, 128, input_length=maxlen))

# This is the R in this RNN. 64 is the number of LSTM units, i.e. the length of the vector
# each direction outputs; Bidirectional concatenates both directions into 128 values
model.add(Bidirectional(LSTM(64)))

# Randomly zeroes half of the incoming values during training to reduce overfitting.
model.add(Dropout(0.5))

# This comes last: the sigmoid squashes the output into a 0-1 score for "positive review"
model.add(Dense(1, activation='sigmoid'))
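
One way to get a feel for what the 128 and the 64 mean is to count the weights each layer would hold. The arithmetic below is a sketch that assumes a standard LSTM with bias terms and a Bidirectional wrapper that concatenates its two directions; calling `model.summary()` on the real model is the way to check it.

```python
vocab_size, embed_dim, units = 20000, 128, 64

# Embedding: one embed_dim-sized vector per word index.
embedding_params = vocab_size * embed_dim

# One LSTM direction: 4 weight blocks (input, forget, and output gates plus the
# candidate cell update), each with input weights, recurrent weights, and a bias.
lstm_params = 4 * (units * (embed_dim + units) + units)

# Bidirectional runs two independent LSTMs and concatenates their outputs.
bidirectional_params = 2 * lstm_params
bidirectional_output = 2 * units

# Dropout has no weights. Dense(1): one weight per incoming value plus one bias.
dense_params = bidirectional_output * 1 + 1

total_params = embedding_params + bidirectional_params + dense_params
print(embedding_params, bidirectional_params, dense_params, total_params)
```

Almost all of the weights sit in the embedding table, which is why `max_features` has such a large effect on model size.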

## Run the model

In [None]:
# Configures training: the optimizer (adam), the loss to minimize, and the metric to report
model.compile('adam', 'binary_crossentropy', metrics=['accuracy'])

print('Train...')
# Tunes the weights of the layers to minimize the loss on the training data
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=4,
          validation_data=(x_test, y_test))
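
The two quantities named in the compile step can be written out by hand. This is a plain-Python sketch of binary cross-entropy and thresholded accuracy under the usual definitions, not the Keras implementation; the 0.5 threshold and the clipping constant `eps` are assumptions.

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over a batch of (label, probability) pairs."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)   # clip so log() never sees 0
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


def accuracy(y_true, y_pred, threshold=0.5):
    """Fraction of predictions that land on the correct side of `threshold`."""
    hits = sum((p > threshold) == bool(y) for y, p in zip(y_true, y_pred))
    return hits / len(y_true)


labels = [1, 0, 1, 0]
confident = [0.9, 0.1, 0.8, 0.2]   # made-up model outputs
unsure = [0.6, 0.4, 0.6, 0.4]

print(binary_crossentropy(labels, confident), accuracy(labels, confident))
print(binary_crossentropy(labels, unsure), accuracy(labels, unsure))
```

Both sets of predictions score the same accuracy, but the loss is lower for the confident one. That difference is what gives the optimizer something to improve even when accuracy is flat.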

## End Notes

1. During training, the model takes a batch of training data, compares its outputs to the labels, and adjusts the weights of every trainable layer (Embedding, LSTM, and Dense). The dropout layer randomly switches some of its inputs off, during training only

2. How do I make use of or otherwise access the model?

3. How does this program know to do sentiment analysis? Is that a property of something stored in the IMDB training data?
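
On questions 2 and 3: a trained model only accepts input in the same integer format as the training data, so using it on a new review starts with encoding the text. The sketch below is a hypothetical helper, not part of Keras. It assumes the offsets match the `imdb.load_data` defaults (start marker 1, unknown-word marker 2, word ranks shifted by 3), and it uses a tiny made-up word index and naive whitespace splitting.

```python
def encode_review(text, word_index, num_words=20000,
                  start_char=1, oov_char=2, index_from=3):
    """Turn raw text into a list of integers in the training-data format.

    `word_index` maps a lowercase word to its frequency rank (1 = most common).
    Unknown words and words outside the top `num_words` become `oov_char`.
    """
    encoded = [start_char]
    for word in text.lower().split():
        rank = word_index.get(word)
        index = rank + index_from if rank is not None else oov_char
        encoded.append(index if index < num_words else oov_char)
    return encoded


toy_index = {"the": 1, "movie": 17, "great": 84}   # hypothetical ranks
encoded = encode_review("The movie was GREAT", toy_index)
print(encoded)
```

With the real data, the mapping would come from `imdb.get_word_index()`, and the encoded list would be padded with `sequence.pad_sequences([encoded], maxlen=maxlen)` before being passed to `model.predict(...)`, which returns a value between 0 and 1 (closer to 1 meaning positive). As for question 3, nothing in the layers is specific to sentiment: the model learns it only because `y_train` labels each review as 0 (negative) or 1 (positive).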