# Assignment 1: Neural Networks
# BA 64061-003
# Madeline Witzeman

# First Step: Loading the IMDb dataset

In [1]:
import tensorflow
from tensorflow.keras.datasets import imdb
(train_data, train_labels), (test_data, test_labels) = imdb.load_data(
    num_words=10000)

# Second Step: Recreating the IMDb example studied in class as a baseline for comparing later models

## Preparing the data

Encoding the integer sequences via multi-hot encoding so the data is ready to be fed into a neural network

In [2]:
import numpy as np

def vectorize_sequences(sequences, dimension=10000):
    # One row per review, one column per word index (all zeros to start)
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        for j in sequence:
            # Mark word index j as present in review i
            results[i, j] = 1.
    return results

x_train = vectorize_sequences(train_data)
x_test = vectorize_sequences(test_data)
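
To make the multi-hot encoding concrete, here is a minimal sketch on two made-up "reviews" with a vocabulary of only 5 words. The function name `multi_hot` and the toy data are assumptions for illustration; the logic mirrors `vectorize_sequences` above.

```python
import numpy as np

def multi_hot(sequences, dimension):
    # Same idea as vectorize_sequences: 1.0 wherever a word index occurs
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        for j in sequence:
            results[i, j] = 1.0
    return results

# Hypothetical toy data: word index 3 appears twice in the first review
toy_reviews = [[1, 3, 3], [0, 2]]
encoded = multi_hot(toy_reviews, dimension=5)
print(encoded)
# [[0. 1. 0. 1. 0.]
#  [1. 0. 1. 0. 0.]]
```

A repeated word still produces a single 1, so the encoding records which words occur, not how often or in what order.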

Vectorizing labels:

In [3]:
y_train = np.asarray(train_labels).astype("float32")
y_test = np.asarray(test_labels).astype("float32")

## Building the model

Defining the model:

In [4]:
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Dense(16, activation="relu"),
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid")
])
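
As a rough way to compare the size of the architectures tried in this assignment, the sketch below counts the trainable parameters of a stack of Dense layers (weights plus biases), assuming the 10,000-dimensional multi-hot input used here. The helper name `dense_param_count` is made up for illustration.

```python
def dense_param_count(input_dim, layer_units):
    # Each Dense layer has (inputs * units) weights plus one bias per unit
    total = 0
    fan_in = input_dim
    for units in layer_units:
        total += fan_in * units + units
        fan_in = units
    return total

baseline_params = dense_param_count(10000, [16, 16, 1])
wide_params = dense_param_count(10000, [64, 64, 1])
print(baseline_params)  # 160305
print(wide_params)      # 644289
```

Almost all of the parameters sit in the first layer, because it connects every one of the 10,000 input words to each hidden unit.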

Compiling the model:

In [5]:
model.compile(optimizer="rmsprop",
              loss="binary_crossentropy",
              metrics=["accuracy"])

## Validating approach

Setting aside validation set:

In [6]:
x_val = x_train[:10000]
partial_x_train = x_train[10000:]
y_val = y_train[:10000]
partial_y_train = y_train[10000:]

Training the model (using epochs=4, the value the textbook identified as best):

In [7]:
history = model.fit(partial_x_train,
                    partial_y_train,
                    epochs=4,
                    batch_size=512,
                    validation_data=(x_val, y_val))

Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4
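
The epoch count here comes from the textbook, but the same choice could be checked from a longer training run. Keras `fit` returns a History object whose `history` attribute is a dictionary of per-epoch metric lists (including `val_loss` when validation data is supplied). The sketch below picks the epoch with the lowest validation loss; the helper name `best_epoch` and the loss values are made up for illustration and are not results from this notebook.

```python
def best_epoch(history_dict, metric="val_loss"):
    # Return the 1-based epoch number with the smallest value of the metric
    values = history_dict[metric]
    best_index = min(range(len(values)), key=values.__getitem__)
    return best_index + 1

# Hypothetical validation losses for a 6-epoch run (not real output)
example_history = {"val_loss": [0.41, 0.33, 0.29, 0.28, 0.30, 0.33]}
print(best_epoch(example_history))  # 4
```

With a real run, `history.history` would be passed in place of `example_history`.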


In [10]:
results = model.evaluate(x_test, y_test)
results



[0.3192662000656128, 0.8755599856376648]

With two hidden layers of 16 units each, relu activation, the binary_crossentropy loss function, and 4 epochs, the baseline model yields 88.34% accuracy on the validation dataset and 87.69% accuracy on the test dataset, on average.

Because training is stochastic (the weights are randomly initialized and the training samples are shuffled each epoch), the model produces different validation and test accuracies each time it is run. To arrive at the validation and test accuracies for each model, I take the average of 5 runs:

1.   Val: .8837; Test: .8789
2.   Val: .8762; Test: .8666
3.   Val: .8846; Test: .8785
4.   Val: .8879; Test: .8847
5.   Val: .8847; Test: .8756

Average - Val: .8834; Test: .8769
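
The averaging step can be reproduced with a few lines of Python. This is only a sketch of the arithmetic, using the five baseline runs listed above; the variable names are my own.

```python
from statistics import mean

# (validation accuracy, test accuracy) for the five baseline runs
baseline_runs = [
    (0.8837, 0.8789),
    (0.8762, 0.8666),
    (0.8846, 0.8785),
    (0.8879, 0.8847),
    (0.8847, 0.8756),
]

val_avg = mean(val for val, _ in baseline_runs)
test_avg = mean(test for _, test in baseline_runs)
print(f"Val: {val_avg:.4f}; Test: {test_avg:.4f}")
# Val: 0.8834; Test: 0.8769
```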



Now that I know the validation and test accuracy of the baseline model, I'm going to try various approaches to improve the model's performance by altering one component at a time.

# Next approach: Using one hidden layer instead of two hidden layers

## Retraining the model

In [9]:
model_2 = keras.Sequential([
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid")
])
model_2.compile(optimizer="rmsprop",
                loss="binary_crossentropy",
                metrics=["accuracy"])
model_2.fit(partial_x_train, partial_y_train, epochs=4, batch_size=512, validation_data=(x_val, y_val))
results = model_2.evaluate(x_test, y_test)
results

Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


[0.31009921431541443, 0.8757200241088867]

## Results

Using only one hidden layer increased the average validation accuracy from 88.34% to 88.88% and the average test accuracy from 87.69% to 88.26%.

Run data:


1.   Val: .8868; Test: .8786
2.   Val: .8894; Test: .8839
3.   Val: .8894; Test: .8836
4.   Val: .8890; Test: .8839
5.   Val: .8894; Test: .8832

Average - Val: .8888; Test: .8826



# Using three hidden layers instead of two hidden layers

## Retraining the model

In [13]:
model_3 = keras.Sequential([
    layers.Dense(16, activation="relu"),
    layers.Dense(16, activation="relu"),
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid")
])
model_3.compile(optimizer="rmsprop",
                loss="binary_crossentropy",
                metrics=["accuracy"])
model_3.fit(partial_x_train, partial_y_train, epochs=4, batch_size=512, validation_data=(x_val, y_val))
results = model_3.evaluate(x_test, y_test)
results

Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


[0.29587337374687195, 0.8813199996948242]

## Results

Using three hidden layers slightly increased the validation accuracy to 88.52% and the test accuracy to 87.94%.

Run data:


1.   Val: .8890; Test: .8827
2.   Val: .8900; Test: .8840
3.   Val: .8879; Test: .8822
4.   Val: .8735; Test: .8668
5.   Val: .8857; Test: .8813

Average - Val: .8852; Test: .8794



## Using one hidden layer produced the best validation and test performance

# Increasing hidden units to 32 instead of 16

## Retraining the model

In [12]:
model_4 = keras.Sequential([
    layers.Dense(32, activation="relu"),
    layers.Dense(32, activation="relu"),
    layers.Dense(1, activation="sigmoid")
])
model_4.compile(optimizer="rmsprop",
                loss="binary_crossentropy",
                metrics=["accuracy"])
model_4.fit(partial_x_train, partial_y_train, epochs=4, batch_size=512, validation_data=(x_val, y_val))
results = model_4.evaluate(x_test, y_test)
results

Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


[0.2903912663459778, 0.8824800252914429]

## Results

Using 32 hidden units in hidden layers slightly decreased the validation accuracy to 88.09% and the test accuracy to 87.31%. Since performance decreased for both the validation and test sets, I won't use 32 hidden units moving forward.




Run data:


1.   Val: .8815; Test: .8712
2.   Val: .8880; Test: .8822
3.   Val: .8607; Test: .8486
4.   Val: .8869; Test: .8810
5.   Val: .8874; Test: .8825

Average - Val: .8809; Test: .8731



# Increasing hidden units to 64 instead of 16

## Retraining the model

In [10]:
model_5 = keras.Sequential([
    layers.Dense(64, activation="relu"),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid")
])
model_5.compile(optimizer="rmsprop",
                loss="binary_crossentropy",
                metrics=["accuracy"])
model_5.fit(partial_x_train, partial_y_train, epochs=4, batch_size=512, validation_data=(x_val, y_val))
results = model_5.evaluate(x_test, y_test)
results

Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


[0.32506248354911804, 0.8685600161552429]

## Results

Using 64 hidden units in hidden layers decreased the validation accuracy notably to 87.13% and the test accuracy to 86.06%. It appears that adding hidden units decreases model performance when training for 4 epochs. Given this information, I won't use 64 hidden units moving forward.

Run data:


1.   Val: .8646; Test: .8535
2.   Val: .8882; Test: .8830
3.   Val: .8613; Test: .8478
4.   Val: .8637; Test: .8501
5.   Val: .8785; Test: .8686

Average - Val: .8713; Test: .8606
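
Besides the lower average, the 64-unit runs also vary more from run to run than the baseline runs. The sketch below compares the sample standard deviation of the test accuracies listed in this notebook. With only five runs per model this is a rough indicator, not a formal test.

```python
from statistics import stdev

# Test accuracies copied from the run data in this notebook
baseline_test = [0.8789, 0.8666, 0.8785, 0.8847, 0.8756]
units64_test = [0.8535, 0.8830, 0.8478, 0.8501, 0.8686]

baseline_sd = stdev(baseline_test)
units64_sd = stdev(units64_test)
print(f"Baseline test std dev: {baseline_sd:.4f}")
print(f"64-unit test std dev:  {units64_sd:.4f}")
```

The larger spread for the 64-unit model means its average depends more heavily on which five runs happened to be recorded.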



## Using 16 hidden units in hidden layers produced the best validation and test performance

# Using MSE loss function instead of binary_crossentropy

## Retraining the model

In [11]:
model_6 = keras.Sequential([
    layers.Dense(16, activation="relu"),
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid")
])
model_6.compile(optimizer="rmsprop",
                loss="mse",
                metrics=["accuracy"])
model_6.fit(partial_x_train, partial_y_train, epochs=4, batch_size=512, validation_data=(x_val, y_val))
results = model_6.evaluate(x_test, y_test)
results

Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


[0.08807112276554108, 0.8828399777412415]

## Results

Using the mean squared error (mse) loss function kept the validation accuracy around 88.33% and slightly increased the test accuracy to 87.88%. Given this information, I may consider using the mse loss function moving forward.

Run data:


1.   Val: .8727; Test: .8726
2.   Val: .8853; Test: .8809
3.   Val: .8840; Test: .8766
4.   Val: .8861; Test: .8812
5.   Val: .8883; Test: .8828

Average - Val: .8833; Test: .8788
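
The loss values reported by `evaluate` for this model (about 0.09) are much smaller than for the binary_crossentropy models (about 0.3), but that is because the two losses are measured on different scales, so only the accuracies are directly comparable. The NumPy sketch below computes both losses on the same made-up predictions. The function names and the toy values are assumptions for illustration, not Keras internals.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    # Clip predictions so log(0) never occurs
    y_pred = np.clip(y_pred, eps, 1 - eps)
    per_sample = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    return float(np.mean(per_sample))

def mean_squared_error(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

# Hypothetical labels and predicted probabilities
y_true = np.array([1.0, 0.0, 1.0, 0.0])
y_pred = np.array([0.9, 0.2, 0.6, 0.4])

bce = binary_crossentropy(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
print(f"binary_crossentropy: {bce:.4f}")
print(f"mse:                 {mse:.4f}")
```

For the same predictions, the crossentropy value is several times larger than the mse value, even though the accuracy (4 out of 4 correct at a 0.5 threshold) is identical.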



# Using tanh activation instead of relu

## Retraining the model

In [9]:
model_7 = keras.Sequential([
    layers.Dense(16, activation="tanh"),
    layers.Dense(16, activation="tanh"),
    layers.Dense(1, activation="sigmoid")
])
model_7.compile(optimizer="rmsprop",
                loss="binary_crossentropy",
                metrics=["accuracy"])
model_7.fit(partial_x_train, partial_y_train, epochs=4, batch_size=512, validation_data=(x_val, y_val))
results = model_7.evaluate(x_test, y_test)
results

Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


[0.30325186252593994, 0.8740800023078918]

## Results

Using the tanh activation slightly increased the validation accuracy to 88.48% and test accuracy to 87.78%. Given this information, I may consider using tanh activation moving forward.

Run data:


1.   Val: .8841; Test: .8768
2.   Val: .8850; Test: .8777
3.   Val: .8890; Test: .8822
4.   Val: .8846; Test: .8784
5.   Val: .8812; Test: .8741

Average - Val: .8848; Test: .8778
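
For reference, the two activations differ in how they treat negative inputs: relu sets them to zero, while tanh squashes every input into the range between -1 and 1. A small NumPy sketch on made-up pre-activation values (the `relu` helper is my own definition, not the Keras one):

```python
import numpy as np

def relu(x):
    # Negative values become 0, positive values pass through unchanged
    return np.maximum(0.0, x)

# Hypothetical pre-activation values
pre_activations = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])

relu_out = relu(pre_activations)
tanh_out = np.tanh(pre_activations)
print(relu_out)
print(np.round(tanh_out, 3))
```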



# Attempt at optimizing the model based on prior results

# Using one hidden layer + 16 hidden units + mse loss function + tanh activation to explore model performance

## Retraining the model

In [9]:
model_8 = keras.Sequential([
    layers.Dense(16, activation="tanh"),
    layers.Dense(1, activation="sigmoid")
])
model_8.compile(optimizer="rmsprop",
                loss="mse",
                metrics=["accuracy"])
model_8.fit(partial_x_train, partial_y_train, epochs=4, batch_size=512, validation_data=(x_val, y_val))
results = model_8.evaluate(x_test, y_test)
results

Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


[0.09115107357501984, 0.884440004825592]

## Results

This model had the highest average test accuracy of all the models (88.31%), and its average validation accuracy (88.67%) was second only to the one-hidden-layer relu model (88.88%). It appears that combining the changes that improved performance earlier in the analysis made the model better than the baseline.

Run data:

1.   Val: .8878; Test: .8852
2.   Val: .8871; Test: .8827
3.   Val: .8893; Test: .8851
4.   Val: .8880; Test: .8830
5.   Val: .8812; Test: .8796

Average - Val: .8867; Test: .8831
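
To compare all of the configurations side by side, the sketch below collects the five-run averages reported in this notebook and ranks them by test accuracy. The dictionary labels are my own shorthand for each experiment.

```python
# (average validation accuracy, average test accuracy) from each section
averages = {
    "baseline (2 layers, 16 units, relu, bce)": (0.8834, 0.8769),
    "1 hidden layer": (0.8888, 0.8826),
    "3 hidden layers": (0.8852, 0.8794),
    "32 hidden units": (0.8809, 0.8731),
    "64 hidden units": (0.8713, 0.8606),
    "mse loss": (0.8833, 0.8788),
    "tanh activation": (0.8848, 0.8778),
    "1 hidden layer + tanh + mse": (0.8867, 0.8831),
}

best_val = max(averages, key=lambda name: averages[name][0])
best_test = max(averages, key=lambda name: averages[name][1])

# Print the configurations from highest to lowest test accuracy
for name, (val, test) in sorted(averages.items(), key=lambda item: item[1][1], reverse=True):
    print(f"{name:45s} Val: {val:.4f}  Test: {test:.4f}")

print("Best validation accuracy:", best_val)
print("Best test accuracy:", best_test)
```

The top two configurations differ by well under one percentage point, so with five runs each the ranking between them should be read cautiously.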

