# Data Science Society Term 2 Workshop 3 - Word Embedding (Solutions)

In [None]:
!pip install keras

In [None]:
!pip install theano

In [None]:
import os
os.environ['KERAS_BACKEND'] = "theano"
import keras
from keras.models import Sequential
from keras.preprocessing import sequence
from keras.preprocessing.text import one_hot
from keras.preprocessing.sequence import pad_sequences
from keras.layers import Dense, Flatten, Embedding
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

The code cell below loads the IMDb dataset. Each movie review is already encoded as a sequence of integer word indices, similar to what we produced before with the ````one_hot```` function. The ````vocab_size```` is set to ````500````, so only the most frequent words (those with an index below ````500````) are kept. Your task is to construct a model similar to the one before that tries to determine whether the sentiment of a review is positive or negative.

In [None]:
# Vocabulary size
vocab_size = 500

# Loads in dataset in sequential form
(X_train, y_train), (X_test, y_test) = keras.datasets.imdb.load_data(num_words=vocab_size)
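
To make the effect of ````vocab_size```` concrete, here is a minimal sketch in plain Python of how capping the vocabulary changes an encoded review. It assumes Keras's default out-of-vocabulary marker of ````2````, and the review indices are made up for illustration; it is not the loader's actual code.

```python
# Hypothetical encoded review: each integer is a word index (lower = more frequent)
review = [1, 14, 22, 1622, 43, 530, 973, 65]

vocab_size = 500
oov_index = 2  # assumed: Keras's default out-of-vocabulary marker

# Indices at or above vocab_size are replaced by the out-of-vocabulary marker
capped = [i if i < vocab_size else oov_index for i in review]
print(capped)
```

With a small vocabulary, many rarer words collapse onto the same marker, so the model only sees the most common words of each review.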

## Padding
Using a maximum length of $500$, pad the sequences for the ````X_train```` and ````X_test```` data.

In [None]:
# Sets max length of the data to 500
max_length = 500

# Pads shorter reviews with zeros and truncates longer ones,
# so that every sequence has exactly max_length entries
X_train = pad_sequences(X_train, maxlen=max_length)
X_test = pad_sequences(X_test, maxlen=max_length)
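
As a rough illustration of what ````pad_sequences```` does with its default settings (zeros added at the front, overlong sequences cut from the front), the following plain-Python sketch pads a single sequence. The helper ````pad_pre```` is a hypothetical name used only here, and it assumes ````maxlen```` is at least 1.

```python
def pad_pre(seq, maxlen, value=0):
    """Left-pad or left-truncate a list of ints to exactly maxlen items."""
    seq = list(seq)[-maxlen:]                     # keep only the last maxlen items
    return [value] * (maxlen - len(seq)) + seq    # fill the front with the pad value

short_padded = pad_pre([5, 8, 2], maxlen=5)
long_trimmed = pad_pre([1, 2, 3, 4, 5, 6, 7], maxlen=5)

print(short_padded)   # [0, 0, 5, 8, 2]
print(long_trimmed)   # [3, 4, 5, 6, 7]
```

Padding at the front keeps the end of each review intact, which is the part that survives truncation as well.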

## Building the Model
Build and train a neural network with the following features:
- Starts with an embedding layer with vocabulary size and input length as defined above. The vector space should consist of 32 dimensions.
- Contains at least one hidden layer.
- Output layer has one neuron and uses the sigmoid activation function.
- Loss function should be ````binary_crossentropy````.
- Keep track of the accuracy.
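
The embedding layer in the first requirement can be pictured as a lookup table with one 32-dimensional vector per word index. The sketch below is a plain-Python illustration of that idea, not Keras's implementation: the table is filled with random numbers (in a real model these weights are learned), and the short encoded review is made up.

```python
import random

random.seed(0)
vocab_size, embed_dim = 500, 32

# One row of embed_dim weights per word index (random here; learned during training)
table = [[random.uniform(-0.05, 0.05) for _ in range(embed_dim)]
         for _ in range(vocab_size)]

encoded = [0, 0, 14, 22, 2]                    # a short, padded, made-up review
embedded = [table[i] for i in encoded]         # lookup: one vector per position
flat = [x for row in embedded for x in row]    # what a Flatten layer does to this output

print(len(embedded), len(embedded[0]), len(flat))   # 5 32 160
```

Identical indices always map to the same vector, which is why the two padding zeros at the start produce identical rows.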

In [None]:
# Chooses the sequential model for our neural network
model = Sequential()

# Embedding layer with 32 dimensions in vector space
model.add(Embedding(vocab_size, 32, input_length=max_length))

# Flattens the (max_length, 32) embedding output into one vector of 16,000 values
model.add(Flatten())

# Hidden layer with 200 neurons and a rectified linear activation function
model.add(Dense(200, activation='relu'))

# Output layer using one neuron and the sigmoid activation function
model.add(Dense(1, activation='sigmoid'))

# Training parameters
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])

# Trains model for 3 epochs
model.fit(X_train, y_train, epochs=3)
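
As a sanity check on the size of this network, the number of trainable parameters can be worked out by hand. The arithmetic below assumes the layer sizes used in the solution above (vocabulary 500, 32 embedding dimensions, length 500, 200 hidden neurons); the result can be compared with the output of ````model.summary()````.

```python
vocab_size, embed_dim, max_length, hidden = 500, 32, 500, 200

embedding_params = vocab_size * embed_dim        # one 32-d vector per word index
flat_size = max_length * embed_dim               # length of the flattened vector
hidden_params = flat_size * hidden + hidden      # weights plus one bias per neuron
output_params = hidden * 1 + 1                   # 200 weights plus one bias

total_params = embedding_params + hidden_params + output_params
print(embedding_params, hidden_params, output_params, total_params)
# 16000 3200200 201 3216401
```

Almost all of the parameters sit in the first dense layer, because flattening turns every review into a 16,000-value input.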

## Evaluation
Now evaluate the performance of your model.

In [None]:
# Evaluating the performance of the model
loss, accuracy = model.evaluate(X_test, y_test)
print(accuracy)
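
The two numbers returned by ````model.evaluate```` can be reproduced by hand on a tiny example. The sketch below uses made-up labels and sigmoid outputs, and it is a simplified version of the calculation: it assumes a decision threshold of 0.5 and skips the numerical clipping a library would apply.

```python
import math

y_true = [1, 0, 1, 1, 0]              # made-up labels (1 = positive review)
y_prob = [0.9, 0.2, 0.4, 0.7, 0.6]    # made-up sigmoid outputs

# Binary cross-entropy, averaged over the examples
bce = -sum(t * math.log(p) + (1 - t) * math.log(1 - p)
           for t, p in zip(y_true, y_prob)) / len(y_true)

# Accuracy: threshold each probability at 0.5 and compare with the label
y_pred = [1 if p > 0.5 else 0 for p in y_prob]
acc = sum(int(a == b) for a, b in zip(y_pred, y_true)) / len(y_true)

print(round(bce, 4), acc)
```

Here three of the five predictions land on the correct side of the threshold, so the accuracy is 0.6, while the loss also reflects how confident each prediction was.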