# IMDB review dataset sentiment analysis with Keras

This notebook implements a simple approach to sentiment analysis on the IMDB review dataset: predicting whether the text of a review expresses a positive or a negative opinion.

### Imports

In [11]:
import numpy as np
import keras
from keras.datasets import imdb
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.preprocessing.text import Tokenizer
from keras.utils import to_categorical
import matplotlib.pyplot as plt

np.random.seed(42)

### Load dataset

In [20]:
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=1000)

### Preprocessing
The IMDB dataset comes already tokenized: each word is replaced by an integer index. For instance, if we have the following dictionary:

hello : 55<br>
darkness : 678<br>
my : 123<br>
old : 34<br>
friend : 69

The sentence "Hello darkness my old friend" would be encoded as:<br> [55, 678, 123, 34, 69]

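At this step, each review is converted into a fixed-length binary vector of size 1000 (a multi-hot encoding): position i is 1 if the word with index i appears in the review and 0 otherwise. The labels are then one-hot encoded into two columns.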
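To make the encoding concrete, here is a minimal sketch in plain Python of what a binary sequence-to-matrix conversion does to the toy sentence above. The helper name `sequences_to_binary_matrix` is hypothetical and this is not Keras's implementation, only an illustration of the idea under the assumption of a 1000-word vocabulary.

```python
def sequences_to_binary_matrix(sequences, num_words):
    """Turn lists of word indices into fixed-length 0/1 rows (multi-hot encoding)."""
    matrix = []
    for seq in sequences:
        row = [0] * num_words
        for index in seq:
            # Indices outside the vocabulary are skipped (assumption of this sketch).
            if 0 <= index < num_words:
                row[index] = 1
        matrix.append(row)
    return matrix


# Toy sentence from the text above: "Hello darkness my old friend"
sentence = [55, 678, 123, 34, 69]
encoded = sequences_to_binary_matrix([sentence], num_words=1000)

print(len(encoded[0]))                                   # 1000
print(sum(encoded[0]))                                   # 5
print([i for i, v in enumerate(encoded[0]) if v == 1])   # [34, 55, 69, 123, 678]
```

Note that word order and repetition are lost: the vector only records which words are present.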

In [21]:
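# Turn each list of word indices into a 1000-dimensional binary vector
tokenizer = Tokenizer(num_words=1000)
x_train = tokenizer.sequences_to_matrix(x_train, mode='binary')
x_test = tokenizer.sequences_to_matrix(x_test, mode='binary')

# One-hot encode the labels (0 = negative, 1 = positive) into two columns
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)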
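As an illustration of the label encoding, the following plain-Python sketch mimics the effect of one-hot encoding integer class labels. The helper `one_hot` and the sample labels are hypothetical, chosen only to show the shape of the result.

```python
def one_hot(labels, num_classes):
    """Map each integer label to a row with a single 1.0 at the label's position."""
    rows = []
    for label in labels:
        row = [0.0] * num_classes
        row[label] = 1.0
        rows.append(row)
    return rows


# Hypothetical labels: 0 = negative review, 1 = positive review
labels = [0, 1, 1, 0]
print(one_hot(labels, num_classes=2))
# [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
```

With two columns per label, the network's output layer needs two units, one per class.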

### Model

In [25]:
# Create model
model = Sequential()

# First layer
model.add(Dense(128, activation='relu', input_shape=(1000,)))

# Second layer
model.add(Dense(64, activation='relu'))

# Third layer
model.add(Dense(32, activation='relu'))

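# Output layer: softmax makes the two class probabilities sum to 1,
# which is what categorical cross-entropy expects
model.add(Dense(2, activation='softmax'))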

# Compile model
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
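To get a feel for the size of this network, the number of trainable parameters can be worked out by hand: a fully connected layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases. The sketch below applies that formula to the layer widths used above; the helper name `dense_param_count` is hypothetical.

```python
def dense_param_count(layer_sizes):
    """Return the parameter count (weights + biases) of each fully connected layer."""
    counts = []
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        counts.append(n_in * n_out + n_out)
    return counts


# Input size followed by the width of each Dense layer in the model above
layer_sizes = [1000, 128, 64, 32, 2]
per_layer = dense_param_count(layer_sizes)

print(per_layer)       # [128128, 8256, 2080, 66]
print(sum(per_layer))  # 138530
```

Almost all of the parameters sit in the first layer, because it connects the 1000-dimensional input to 128 units.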

### Model training

In [24]:
model.fit(x_train, y_train, epochs=200, batch_size=100, verbose=0)

<keras.callbacks.History at 0x135df16a0>
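For a rough sense of how much work this training call does, the number of gradient updates follows from the dataset size, batch size, and epoch count. The sketch below assumes the standard 25,000-review IMDB training split; the helper `gradient_updates` is hypothetical.

```python
import math


def gradient_updates(num_samples, batch_size, epochs):
    """Return (batches per epoch, total weight updates over all epochs)."""
    steps_per_epoch = math.ceil(num_samples / batch_size)
    return steps_per_epoch, steps_per_epoch * epochs


# Assumption: 25,000 training reviews, with the batch size and epochs used above
steps, total = gradient_updates(num_samples=25000, batch_size=100, epochs=200)
print(steps)  # 250
print(total)  # 50000
```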

### Model evaluation

In [26]:
train_score = model.evaluate(x_train, y_train)
print("Accuracy on training dataset : ", train_score[1])
test_score = model.evaluate(x_test, y_test)
print("Accuracy on testing dataset : ", test_score[1])

Accuracy on training dataset :  0.49412
Accuracy on testing dataset :  0.4938
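Both scores sit close to 0.5, which is chance level for a balanced two-class problem. The cell counters suggest the model cell (In [25]) was run after the training cell (In [24]), so the evaluated model may simply be a freshly initialized, untrained network; re-running the cells in order would be the first thing to check. As a reminder of what the accuracy metric measures, here is a minimal plain-Python sketch that compares the most probable predicted class with the one-hot target. The helper `accuracy` and the sample values are hypothetical, not outputs of the model above.

```python
def accuracy(probabilities, one_hot_labels):
    """Fraction of samples whose most probable class matches the one-hot target."""
    correct = 0
    for probs, target in zip(probabilities, one_hot_labels):
        predicted = probs.index(max(probs))
        actual = target.index(max(target))
        if predicted == actual:
            correct += 1
    return correct / len(one_hot_labels)


# Hypothetical predictions for four reviews and their one-hot labels
probs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]]
targets = [[1, 0], [0, 1], [0, 1], [1, 0]]
print(accuracy(probs, targets))  # 0.5
```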
