## Sentiment Analysis of Movie Reviews


In this exercise, we will build a model that reads movie reviews from IMDB and classifies each one as positive or negative.

The IMDB dataset consists of 25,000 reviews, each with a binary label (1 = positive, 0 = negative).

Here is an example of a POSITIVE review:

> "The pace is steady and constant, the characters full and engaging, the relationships and interactions natural showing that you do not need floods of tears to show emotion, screams to show fear, shouting to show dispute or violence to show anger. Naturally Joyce's short story lends the film a ready made structure as perfect as a polished diamond, but the small changes Huston makes such as the inclusion of the poem fit in neatly. It is truly a masterpiece of tact, subtlety and overwhelming beauty."

Here is an example of a NEGATIVE review:

> "Beautiful attracts excellent idea, but ruined with a bad selection of the actors. The main character is a loser and his woman friend and his friend upset viewers. Apart from the first episode all the other become more boring and boring. First, it considers it illogical behavior. No one normal would not behave the way the main character behaves. It all represents a typical Halmark way to endear viewers to the reduced amount of intelligence. Does such a scenario, or the casting director and destroy this question is on Halmark producers. Cat is the main character is wonderful. The main character behaves according to his friend selfish."

1. Setup
--------

We first generate a compute backend, specifying the device (GPU), the batch size, and the random seed.

In [None]:
from neon.backends import gen_backend

be = gen_backend(backend='gpu', batch_size=128, rng_seed=0)

We also import all the components we need for this exercise:

In [None]:
from neon.data.text_preprocessing import clean_string
from neon.initializers import Uniform, GlorotUniform
from neon.layers import LSTM, Affine, Dropout, LookupTable, RecurrentSum, Recurrent
from neon.transforms import Logistic, Tanh, Softmax
from neon.models import Model
from neon.optimizers import Adagrad, GradientDescentMomentum, Schedule
from neon.transforms import CrossEntropyMulti
from neon.layers import GeneralizedCost
from neon.callbacks import Callbacks
from neon.transforms.cost import Accuracy
from viz_callback import CostVisCallback
from imdb import IMDB
import numpy as np

2. Dataset
----------

We have to preprocess the dataset to convert the words into numbers. We take our vocabulary of words and assign a number to each word. For example, a sentence such as:

> "Hello world, my name is Intel and my location is Santa Clara"

will be converted to a list of 12 numbers, one per word:

> [24, 784, 4, 98, 22, 143, 15, 4, 314, 22, 488, 2894] 

We have already done this for you, and the preprocessed dataset is loaded in the code below:

In [None]:
imdb = IMDB(be)
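
For intuition, the word-to-number conversion described above can be sketched in plain Python. This is only an illustration under stated assumptions: the helpers `tokenize`, `build_vocab`, and `encode`, and the reserved ids `PAD_ID` and `OOV_ID`, are hypothetical names and not part of the `imdb` module. The ids it produces will also differ from the example above, because they depend on the vocabulary the words are counted over.

```python
import re
from collections import Counter

PAD_ID = 0  # hypothetical id reserved for padding
OOV_ID = 1  # hypothetical id reserved for out-of-vocabulary words

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def build_vocab(texts, vocab_size):
    """Map the most frequent words to integer ids, after the two reserved ids."""
    counts = Counter(word for text in texts for word in tokenize(text))
    most_common = counts.most_common(vocab_size - 2)
    return {word: idx + 2 for idx, (word, _) in enumerate(most_common)}

def encode(text, vocab):
    """Replace each word with its id, using OOV_ID for unknown words."""
    return [vocab.get(word, OOV_ID) for word in tokenize(text)]

sentence = "Hello world, my name is Intel and my location is Santa Clara"
vocab = build_vocab([sentence], vocab_size=20000)
ids = encode(sentence, vocab)
print(ids)
```

Repeated words such as "my" and "is" map to the same id each time they occur, just as `4` and `22` repeat in the example list.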

3. Build the Model
-----------------

We create an initializer that draws uniform random numbers between `-0.1/128` and `0.1/128`; it is used for the `LookupTable` weights below.

In [None]:
init_uniform = Uniform(-0.1/128, 0.1/128)
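
As a rough picture of what such an initializer produces, the NumPy sketch below draws a weight matrix from the same range. It does not use neon; the matrix shape is an assumption taken from the `vocab_size` and `embedding_dim` passed to `LookupTable` in this exercise.

```python
import numpy as np

rng = np.random.RandomState(0)

low, high = -0.1 / 128, 0.1 / 128        # same bounds as init_uniform
vocab_size, embedding_dim = 20000, 128   # assumed shape of the embedding matrix

# Every entry is drawn independently and uniformly from [low, high).
W = rng.uniform(low, high, size=(vocab_size, embedding_dim))
print(W.shape, W.min(), W.max())
```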

The network consists of the following sequence of layers:

1. `LookupTable` transforms each word into a vector of numbers. 
2. `LSTM` is a recurrent layer with “long short-term memory” units. LSTM networks are good at learning temporal dependencies in the data.
3. `RecurrentSum` sums the output from the LSTM layer across the different time steps.
4. `Dropout` randomly silences a subset of the units during training.
5. `Affine` is a layer with two outputs, for the two target classes.

Below we first create a list of layers, then create the model object.

In [None]:
layers = []
layers.append(LookupTable(vocab_size=20000, embedding_dim=128, init=init_uniform))
layers.append(LSTM(output_size=64, init=GlorotUniform(), activation=Tanh(),
              gate_activation=Logistic(), reset_cells=True))
layers.append(RecurrentSum())
layers.append(Dropout(0.5))
layers.append(Affine(nout=2, init=GlorotUniform(), bias=GlorotUniform(), activation=Softmax()))

# create model object
model = Model(layers=layers)
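
To make the data flow through these five layers concrete, here is a NumPy sketch of one forward pass for a single review. It is a simplified stand-in, not neon's implementation: the weights are random, the LSTM is the textbook formulation with an assumed gate ordering, and all variable names are illustrative.

```python
import numpy as np

rng = np.random.RandomState(0)
vocab_size, embedding_dim, hidden, steps, nclass = 20000, 128, 64, 128, 2

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# One review as a sequence of word ids (random stand-in data).
word_ids = rng.randint(0, vocab_size, size=steps)

# 1. LookupTable: select one embedding row per word id.
embeddings = rng.uniform(-0.1 / 128, 0.1 / 128, size=(vocab_size, embedding_dim))
embedded = embeddings[word_ids]                       # (steps, embedding_dim)

# 2. LSTM: carry a hidden state h and a cell state c from step to step.
W_x = rng.uniform(-0.1, 0.1, size=(embedding_dim, 4 * hidden))
W_h = rng.uniform(-0.1, 0.1, size=(hidden, 4 * hidden))
b = np.zeros(4 * hidden)
h, c = np.zeros(hidden), np.zeros(hidden)
outputs = []
for x_t in embedded:
    z = np.dot(x_t, W_x) + np.dot(h, W_h) + b
    i, f, o, g = np.split(z, 4)                       # input, forget, output gates, candidate
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)      # gates use Logistic, candidate uses Tanh
    h = sigmoid(o) * np.tanh(c)
    outputs.append(h)
hidden_seq = np.array(outputs)                        # (steps, hidden)

# 3. RecurrentSum: add the outputs of all time steps together.
summed = hidden_seq.sum(axis=0)                       # (hidden,)

# 4. Dropout (training only): keep each unit with probability 0.5.
mask = rng.uniform(size=hidden) < 0.5
dropped = summed * mask

# 5. Affine + Softmax: two class probabilities that sum to 1.
W_out = rng.uniform(-0.1, 0.1, size=(hidden, nclass))
logits = np.dot(dropped, W_out)
probs = np.exp(logits - logits.max())
probs /= probs.sum()
print(embedded.shape, hidden_seq.shape, summed.shape, probs)
```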

4. Select the Algorithm
------------
For training, we set up the cost function, which we want to minimize. We also select an optimization algorithm and give it a learning rate of `0.01`.

In [None]:
# define cost
cost = GeneralizedCost(costfunc=CrossEntropyMulti(usebits=True))

# use Adagrad algorithm with a learning rate of 0.01
optimizer = Adagrad(learning_rate=0.01)
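
The two ingredients above can be sketched in NumPy as follows. This is the textbook form of each, written under assumptions: `usebits=True` is taken to mean the cross-entropy is measured with a base-2 logarithm, and the Adagrad step (including the `epsilon` value) may differ in detail from neon's implementation. The function names are illustrative.

```python
import numpy as np

def cross_entropy_bits(probs, label):
    """Cross-entropy of one example, in bits (log base 2)."""
    return -np.log2(probs[label])

def adagrad_step(param, grad, state, learning_rate=0.01, epsilon=1e-6):
    """One textbook Adagrad update; `state` accumulates squared gradients."""
    state += grad ** 2
    param -= learning_rate * grad / (np.sqrt(state) + epsilon)
    return param, state

# A coin-flip prediction costs exactly one bit.
coin_flip_cost = cross_entropy_bits(np.array([0.5, 0.5]), 1)
print(coin_flip_cost)

# On the first step, each parameter moves by about learning_rate,
# regardless of how large its gradient is.
param = np.array([1.0, -2.0])
state = np.zeros(2)
grad = np.array([0.5, -4.0])
adagrad_step(param, grad, state)
print(param)
```

Because Adagrad divides by the accumulated gradient magnitude, parameters with consistently large gradients take smaller effective steps over time.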

Callbacks allow the model to report its progress during the course of training. Here we tell neon to plot a graph with the cost over the course of training.

In [None]:
callbacks = Callbacks(model, eval_set=imdb.valid_set)
callbacks.add_callback(CostVisCallback(nepochs=2))

Train model
-----------

Now we are ready to train the model. Recall what happens during the training process:

<img src="train_schematic.png" width=700px>





To train the model, we call the `fit()` function and pass in the training set. Here we train for 2 epochs, meaning two rounds through the dataset.

In [None]:
model.fit(imdb.train_set, optimizer=optimizer, num_epochs=2,
          cost=cost, callbacks=callbacks)
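
As a reminder of what a training loop does in each epoch, here is a toy NumPy version. It is not what `fit()` runs internally: the data is synthetic, the model is a simple logistic regression rather than an LSTM, and the Adagrad update is the textbook form. Only the overall structure (epochs, minibatches, forward pass, gradient, update) is meant to carry over.

```python
import numpy as np

rng = np.random.RandomState(0)

# Toy stand-in data: 1024 examples with 2 features; label is 1 when the first feature is positive.
X = rng.randn(1024, 2)
y = (X[:, 0] > 0).astype(float)

w = np.zeros(2)        # model parameters
state = np.zeros(2)    # Adagrad accumulator of squared gradients
batch_size, num_epochs, learning_rate = 128, 2, 0.01

def predict(X, w):
    return 1.0 / (1.0 + np.exp(-np.dot(X, w)))

def mean_loss(X, y, w):
    p = predict(X, w)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

initial_loss = mean_loss(X, y, w)
num_updates = 0
for epoch in range(num_epochs):                  # one epoch = one pass over the data
    for start in range(0, len(X), batch_size):   # one minibatch at a time
        xb = X[start:start + batch_size]
        yb = y[start:start + batch_size]
        p = predict(xb, w)                       # forward pass
        grad = np.dot(xb.T, p - yb) / len(xb)    # gradient of the cost w.r.t. w
        state += grad ** 2                       # Adagrad parameter update
        w -= learning_rate * grad / (np.sqrt(state) + 1e-6)
        num_updates += 1
final_loss = mean_loss(X, y, w)
print(num_updates, initial_loss, final_loss)
```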

Accuracy
--------

We can then measure the model's accuracy on the validation data -- data that the model was not trained on.



In [None]:
print("Validation Accuracy - {}".format(100 * model.eval(imdb.valid_set, metric=Accuracy())))
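
Conceptually, accuracy is the fraction of reviews whose most probable class matches the label. The NumPy sketch below shows that calculation on made-up probabilities, assuming one row per class and one column per review; it is an illustration, not neon's `Accuracy` implementation.

```python
import numpy as np

# Made-up class probabilities: row 0 = negative, row 1 = positive, one column per review.
probs = np.array([[0.9, 0.2, 0.4, 0.7],
                  [0.1, 0.8, 0.6, 0.3]])
labels = np.array([0, 1, 0, 0])

predictions = probs.argmax(axis=0)           # most probable class for each review
accuracy = np.mean(predictions == labels)    # fraction of correct predictions
print("Accuracy - {:.1f}%".format(100 * accuracy))   # prints "Accuracy - 75.0%"
```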

Inference
--------

Now let's do something fun with the trained model! We create a UI below where you can type in your own movie review (or any other text) and have it classified as positive or negative.

In [None]:
from imdb import preprocess
from ipywidgets import interact, interactive, fixed
from imdb import text_window

def inference(x):
    input_data = be.zeros((128, be.bsz), dtype=np.int32)
    preprocess(x, input_data)
    result = model.fprop(input_data, inference=True)
    print("Sentiment: {:.1f}% Positive".format(100*result.get()[1][0]))
    
z = interact(inference, x=text_window())

Note: In some browsers, the above text window may not show up. In that case, run the code below to do inference. Feel free to edit the text inside the quotes and re-run the code to experiment.

In [None]:
inference("This movie was terrible!")
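
The `inference` function relies on two details that are easy to miss: the input buffer has a fixed length of 128 word ids, and the positive-class probability is read with `[1][0]`. The sketch below illustrates both in NumPy. The `pad_ids` helper is hypothetical, and left-padding with zeros is an assumption about what `preprocess` does, since its code is not shown here.

```python
import numpy as np

SENTENCE_LENGTH = 128   # matches the 128 rows of input_data

def pad_ids(ids, length=SENTENCE_LENGTH, pad_id=0):
    """Keep the last `length` ids and left-pad shorter sequences with pad_id."""
    ids = list(ids)[-length:]
    return np.array([pad_id] * (length - len(ids)) + ids, dtype=np.int32)

padded = pad_ids([24, 784, 4])       # short review: zeros first, then the word ids
truncated = pad_ids(range(200))      # long review: only the last 128 ids are kept
print(padded[-5:], truncated[:3])

# The output has one row per class and one column per batch item,
# so [1][0] reads the positive-class probability of the first item.
output = np.array([[0.25, 0.5],
                   [0.75, 0.5]])     # made-up values
print("Sentiment: {:.1f}% Positive".format(100 * output[1][0]))   # prints "Sentiment: 75.0% Positive"
```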