<img src="https://www.th-koeln.de/img/logo.svg" style="float: right;" width="200">

# 3rd exercise: <font color="#C70039">Binary sentiment classification with IMDB movie reviews</font>
* Course: DIS21a.1
* Lecturer: <a href="https://www.gernotheisenberg.de/">Gernot Heisenberg</a>
* Author of notebook modifications and adaptations: <a href="https://www.gernotheisenberg.de/">Gernot Heisenberg</a>
* Student: Maximilian Pekarski
* Matriculation number: 11120099
* Date:   09.01.2023

<img src="https://brand24.com/blog/wp-content/uploads/2017/04/Screen-Shot-2017-04-12-at-16.24.20.png" style="float: center;" width="200">

**GENERAL NOTE 1**: 
Please make sure you are reading the entire notebook, since it contains a lot of information about your tasks (e.g. regarding the set of certain paramaters or specific computational tricks, etc.), and the written mark downs as well as comments contain a lot of information on how things work together as a whole. 

**GENERAL NOTE 2**: 
* Please, when commenting source code, just use English language only. 
* When describing an observation (for instance, after you have run through your test plan) you may use German language.
This applies to all exercises in DIS 21a.1.  

--------------------

### <font color="ce33ff">DESCRIPTION</font>:
This notebook allows you for a binary classification (two classes only). You will classify movie reviews into "positive" reviews and "negative" reviews, just based on the text content of the reviews.

The **IMDB** dataset is a set of 50.000 highly-polarized reviews from the Internet Movie Database. They are split into 25.000 reviews for training and 25.000 reviews for testing, each set consisting in 50% negative and 50% positive reviews.
The IMDB dataset comes packaged with Keras. It has already been preprocessed: the reviews (sequences of words) 
have been turned into sequences of integers, where each integer stands for a specific word in a dictionary.

-----------------------

### <font color="FFC300">TASKS</font>:
Within this notebook, the tasks that you need to work on are always listed as bullet points below. 
If a task is more challenging and consists of several steps, this is indicated as well. 
Make sure you have worked down the task list and commented your doings. 
This should be done using markdown.<br> 
<font color=red>Make sure you don't forget to specify your name and your matriculation number in the notebook before submitting it.</font>

**YOUR TASKS in this exercise are as follows**:
1. import the notebook into Google Colab.
2. make sure you specified you name and your matriculation number in the header below my name and date.
    * set the date too and remove mine.
3. read the entire notebook carefully.
    * add comments whereever you feel it necessary for better understanding
    * run the notebook for the first time and note the result in your markdown result table (your test plan). 
4. go into the section 'building the ANN'.
    * add the missing code that does create a network as shown in the image in the lecture slides on page 166 (File: 'DIS21a.1-7.HANDS_ON.First.DLNetwork.Architectures.for.Solving.Three.Interesting.Problems.pdf')
    * set the activation function to ReLu
    * set the correct activation function in the last layer (the output layer). What is correct?
5. stay in the 'building the ANN' section again.
    * add the missing code for compiling the network by setting
        * the loss function
        * the optimizer
        * the evaluation metric (little hint: is the label distribution balanced ?)
6. take less `training` data and rerun the network.
    * add the size of the training data as a column in the table and note the accuracy you achieve
7. take less `testing` data and rerun the network.
    * add the size of the testing data as a column in the table and note the accuracy you achieve
8. increase/decrease the number of epochs and the batch size 
    * add those hyperparameters as columns in the table and note the accuracy you achieve
9. make combinations of this according to your testplan. Make sure you combine with sense and reason and not just chaotically.
10. comment your observations.
    * when is the accuracy increasing / decreasing
11. until now there were 2 hidden layers. Try to use 1 or 3 hidden layers and see how it affects the test accuracy. Use again a little table. Describe your observations.
12. Try to use layers with more hidden units or less hidden units: 32 units, 64 units, 128 units ...! What effect can you observe?
13. Try to use the `mse` loss function instead of `binary_crossentropy`. What effect can you observe?
14. Try to use the `tanh` activation (an activation that was popular in the early days of neural networks) instead of `relu`. What effect can you observe?


## START OF THE NOTEBOOK CODE
----------------------------------------------------------------------------------------------------------------------

In [1]:
#tensorflow and keras (implicit)
import tensorflow
tensorflow.keras.__version__

'2.9.0'

### loading the IMDB movie review data set
This code loads the data set (when you run it for the first time on a local machine, about approx. 80MB of data will be downloaded into the memory):

In [2]:
from tensorflow.keras.datasets import imdb

(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=10000)


The argument `num_words=10000` means that we will only keep the top 10.000 most frequently occurring words in the training data. Rare words will be discarded. This allows us to work with vector data of manageable size.

The variables `train_data` and `test_data` are lists of reviews, each review being a list of word indices (encoding a sequence of words). 
`train_labels` and `test_labels` are lists of 0s and 1s, where 0 stands for "negative" and 1 stands for "positive":

In [3]:
train_data[0]

[1,
 14,
 22,
 16,
 43,
 530,
 973,
 1622,
 1385,
 65,
 458,
 4468,
 66,
 3941,
 4,
 173,
 36,
 256,
 5,
 25,
 100,
 43,
 838,
 112,
 50,
 670,
 2,
 9,
 35,
 480,
 284,
 5,
 150,
 4,
 172,
 112,
 167,
 2,
 336,
 385,
 39,
 4,
 172,
 4536,
 1111,
 17,
 546,
 38,
 13,
 447,
 4,
 192,
 50,
 16,
 6,
 147,
 2025,
 19,
 14,
 22,
 4,
 1920,
 4613,
 469,
 4,
 22,
 71,
 87,
 12,
 16,
 43,
 530,
 38,
 76,
 15,
 13,
 1247,
 4,
 22,
 17,
 515,
 17,
 12,
 16,
 626,
 18,
 2,
 5,
 62,
 386,
 12,
 8,
 316,
 8,
 106,
 5,
 4,
 2223,
 5244,
 16,
 480,
 66,
 3785,
 33,
 4,
 130,
 12,
 16,
 38,
 619,
 5,
 25,
 124,
 51,
 36,
 135,
 48,
 25,
 1415,
 33,
 6,
 22,
 12,
 215,
 28,
 77,
 52,
 5,
 14,
 407,
 16,
 82,
 2,
 8,
 4,
 107,
 117,
 5952,
 15,
 256,
 4,
 2,
 7,
 3766,
 5,
 723,
 36,
 71,
 43,
 530,
 476,
 26,
 400,
 317,
 46,
 7,
 4,
 2,
 1029,
 13,
 104,
 88,
 4,
 381,
 15,
 297,
 98,
 32,
 2071,
 56,
 26,
 141,
 6,
 194,
 7486,
 18,
 4,
 226,
 22,
 21,
 134,
 476,
 26,
 480,
 5,
 144,
 30,
 5535,
 18,

In [4]:
train_labels[0]

1

Since we restricted ourselves to the top 10.000 most frequent words, no word index will exceed 10.000:

In [5]:
max([max(sequence) for sequence in train_data])

9999

### data preparation

Since lists of integers cannot be fed into a neural network, the lists need to be turned into tensors by one-hot-encoding ( vectors of 0s and 1s). This means for instance, turning the sequence `[3, 5]` into a 10000-dimensional vector, that would be all-zeros except for indices 3 and 5, which would be ones. Then use as first layer in the network a `Dense` layer, capable of handling floating point vector data.

In [6]:
import numpy as np

# just take the first 10.000 most frequent words
def vectorize_sequences(sequences, dimension=10000):
    # Create an all-zero matrix of shape (len(sequences), dimension)
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1.  # set specific indices of results[i] to 1s
    return results

# Our vectorized training data
x_train = vectorize_sequences(train_data)
# Our vectorized test data
x_test = vectorize_sequences(test_data)

Here's what our samples look like now:

In [7]:
x_train[0]

array([0., 1., 1., ..., 0., 0., 0.])

Vectorizing the labels too, which is straightforward:

In [8]:
# Our vectorized labels
y_train = np.asarray(train_labels).astype('float32')
y_test = np.asarray(test_labels).astype('float32')

### building the ANN

Now the data is ready to be fed into a neural network. The input data is simply vectors, and the labels are scalars (1s and 0s): this is the easiest setup you will ever encounter.
A type of network that performs well on such a problem would be a simple stack of fully-connected (`Dense`) layers as you have learned in my lectures. 
The final output layer will use a special activation so as to output a probability (a score between 0 and 1, indicating how likely the sample is to have the target "1", i.e. how likely the review is to be positive).

The implementation is very similar to what you have learned from the MNIST example from the earlier exercise.

In [12]:
# necessary inputs
from tensorflow.keras import models
from tensorflow.keras import layers
from tensorflow.keras.optimizers import RMSprop

In [14]:
network = models.Sequential()
network.add(layers.Dense(16, activation='relu', input_shape=(10000,)))
network.add(layers.Dense(16, activation='relu'))
network.add(layers.Dense(1, activation='sigmoid'))

At last, a loss function and an optimizer needs to be specified.
Since this is a binary classification problem and the output of our network is a probability it is best to use the `binary_crossentropy` loss. It is not the only viable choice: you could use, for instance, `mean_squared_error`. But crossentropy is usually the best choice when you are dealing with models that compute and output probabilities. 
Crossentropy is a quantity from the field of Information Theory, that measures the "distance" 
between probability distributions (or in our case, between the actual distribution (ground-truth) and the predictions).

In addition, since the problem is class-balanced, what do you think can be used as an evaluation metric? Is there a special metric that is used when the classification task is binary?

The network can be configured with the `rmsprop` optimizer, since it always does a good job. 

In [15]:
network.compile(optimizer=RMSprop(learning_rate=0.001),loss='binary_crossentropy',metrics=['accuracy'])

--------------------

### training the ANN
Let's train the model for 4 epochs (4 iterations over all samples in the `x_train` and `y_train` tensors), in mini-batches of 512 samples.

In [16]:
history = network.fit(x_train, y_train, epochs=4, batch_size=512)

Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


On CPU of a local machine (not on Google Colab), this will take less than two seconds per epoch -- training is over in 9 seconds. If you are using GPU support it is much faster. <font color=red>Note:</font> Colab also offers GPU support. Please read through the Colab documentation for finding out how to set it up. 

Note that the call to `model.fit()` returns a `history` object. This object has a member `history`, which is a dictionary containing data about everything that happened during training. Let's use it to output some info on the metrics used.

In [17]:
history_dict = history.history
history_dict.keys()

dict_keys(['loss', 'accuracy'])

### evaluate the model

In [19]:
results = network.evaluate(x_test, y_test)



In [27]:
results

[0.30154499411582947, 0.8802800178527832]

Our fairly naive approach achieves an accuracy of higher than 80%. 
With state-of-the-art approaches, one should be able to get close to 95% (we will come to this in a later exercise.)

### <font color="#C70039">Include your result table here and reflect a good test plan (see task list)</font>

In [22]:
import pandas as pd

In [25]:
test_plan = pd.DataFrame(columns=['method','loss_function','optimizer','accuracy','loss','training_size','testing_size','epochs','batch_size'])

In [35]:
def run_network(x_train, y_train, x_test, y_test, training_size=25000, testing_size=25000, epochs_num = 4, batch_size_num = 512):
    
    network = models.Sequential()
    network.add(layers.Dense(16, activation='relu', input_shape=(10000,)))
    network.add(layers.Dense(16, activation='relu'))
    network.add(layers.Dense(1, activation='sigmoid'))
    
    network.compile(optimizer=RMSprop(learning_rate=0.001),loss='binary_crossentropy',metrics=['accuracy'])
    
    x_train = x_train[:training_size]
    y_train = y_train[:training_size]
    
    x_test = x_test[:testing_size]
    y_test = y_test[:testing_size]
    
    network.fit(x_train, y_train, epochs=epochs_num, batch_size=batch_size_num)

    results = network.evaluate(x_test, y_test)

    test_plan_dict = {'method':['binary_classification'], 
    'loss_function':['binary_crossentropy'],
    'optimizer':['rmsprop'],
    'accuracy' : [results[1]],
    'loss':[results[0]],
    'training_size':[training_size],
    'testing_size':[testing_size],
    'epochs':[epochs_num],
    'batch_size':[batch_size_num]
    }
    
    df = pd.DataFrame(test_plan_dict)

    return df

In [36]:
tr1 = run_network(x_train, y_train, x_test, y_test)
tr1

Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


Unnamed: 0,method,loss_function,optimizer,accuracy,loss,training_size,testing_size,epochs,batch_size
0,binary_classification,binary_crossentropy,rmsprop,0.87208,0.322585,25000,25000,4,512


In [39]:
test_plan = pd.concat([test_plan, tr1])

### task 6

In [40]:
tr2 = run_network(x_train, y_train, x_test, y_test, training_size=15000)
test_plan = pd.concat([test_plan, tr2])
tr2

Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


Unnamed: 0,method,loss_function,optimizer,accuracy,loss,training_size,testing_size,epochs,batch_size
0,binary_classification,binary_crossentropy,rmsprop,0.87468,0.306827,15000,25000,4,512


### task 7

In [41]:
tr3 = run_network(x_train, y_train, x_test, y_test, testing_size=15000)
test_plan = pd.concat([test_plan, tr3])
test_plan

Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


Unnamed: 0,method,loss_function,optimizer,accuracy,loss,training_size,testing_size,epochs,batch_size
0,binary_classification,binary_crossentropy,rmsprop,0.87208,0.322585,25000,25000,4,512
0,binary_classification,binary_crossentropy,rmsprop,0.87468,0.306827,15000,25000,4,512
0,binary_classification,binary_crossentropy,rmsprop,0.875867,0.316983,25000,15000,4,512


### task 8

In [42]:
tr4 = run_network(x_train, y_train, x_test, y_test, epochs_num=10, batch_size_num=128)
test_plan = pd.concat([test_plan, tr4])
test_plan

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


Unnamed: 0,method,loss_function,optimizer,accuracy,loss,training_size,testing_size,epochs,batch_size
0,binary_classification,binary_crossentropy,rmsprop,0.87208,0.322585,25000,25000,4,512
0,binary_classification,binary_crossentropy,rmsprop,0.87468,0.306827,15000,25000,4,512
0,binary_classification,binary_crossentropy,rmsprop,0.875867,0.316983,25000,15000,4,512
0,binary_classification,binary_crossentropy,rmsprop,0.85388,0.716287,25000,25000,10,128


### task 9

In [43]:
tr5 = run_network(x_train, y_train, x_test, y_test, epochs_num=10, batch_size_num=1024)
test_plan = pd.concat([test_plan, tr5])
test_plan

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


Unnamed: 0,method,loss_function,optimizer,accuracy,loss,training_size,testing_size,epochs,batch_size
0,binary_classification,binary_crossentropy,rmsprop,0.87208,0.322585,25000,25000,4,512
0,binary_classification,binary_crossentropy,rmsprop,0.87468,0.306827,15000,25000,4,512
0,binary_classification,binary_crossentropy,rmsprop,0.875867,0.316983,25000,15000,4,512
0,binary_classification,binary_crossentropy,rmsprop,0.85388,0.716287,25000,25000,10,128
0,binary_classification,binary_crossentropy,rmsprop,0.87208,0.354711,25000,25000,10,1024


In [44]:
tr = run_network(x_train, y_train, x_test, y_test, epochs_num=4, batch_size_num=1024)
test_plan = pd.concat([test_plan, tr])
test_plan

Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


Unnamed: 0,method,loss_function,optimizer,accuracy,loss,training_size,testing_size,epochs,batch_size
0,binary_classification,binary_crossentropy,rmsprop,0.87208,0.322585,25000,25000,4,512
0,binary_classification,binary_crossentropy,rmsprop,0.87468,0.306827,15000,25000,4,512
0,binary_classification,binary_crossentropy,rmsprop,0.875867,0.316983,25000,15000,4,512
0,binary_classification,binary_crossentropy,rmsprop,0.85388,0.716287,25000,25000,10,128
0,binary_classification,binary_crossentropy,rmsprop,0.87208,0.354711,25000,25000,10,1024
0,binary_classification,binary_crossentropy,rmsprop,0.87036,0.333041,25000,25000,4,1024


In [45]:
tr = run_network(x_train, y_train, x_test, y_test, epochs_num=4, batch_size_num=128)
test_plan = pd.concat([test_plan, tr])
test_plan

Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


Unnamed: 0,method,loss_function,optimizer,accuracy,loss,training_size,testing_size,epochs,batch_size
0,binary_classification,binary_crossentropy,rmsprop,0.87208,0.322585,25000,25000,4,512
0,binary_classification,binary_crossentropy,rmsprop,0.87468,0.306827,15000,25000,4,512
0,binary_classification,binary_crossentropy,rmsprop,0.875867,0.316983,25000,15000,4,512
0,binary_classification,binary_crossentropy,rmsprop,0.85388,0.716287,25000,25000,10,128
0,binary_classification,binary_crossentropy,rmsprop,0.87208,0.354711,25000,25000,10,1024
0,binary_classification,binary_crossentropy,rmsprop,0.87036,0.333041,25000,25000,4,1024
0,binary_classification,binary_crossentropy,rmsprop,0.8702,0.365211,25000,25000,4,128


In [46]:
tr = run_network(x_train, y_train, x_test, y_test,training_size=10000, epochs_num=2, batch_size_num=512)
test_plan = pd.concat([test_plan, tr])
test_plan

Epoch 1/2
Epoch 2/2


Unnamed: 0,method,loss_function,optimizer,accuracy,loss,training_size,testing_size,epochs,batch_size
0,binary_classification,binary_crossentropy,rmsprop,0.87208,0.322585,25000,25000,4,512
0,binary_classification,binary_crossentropy,rmsprop,0.87468,0.306827,15000,25000,4,512
0,binary_classification,binary_crossentropy,rmsprop,0.875867,0.316983,25000,15000,4,512
0,binary_classification,binary_crossentropy,rmsprop,0.85388,0.716287,25000,25000,10,128
0,binary_classification,binary_crossentropy,rmsprop,0.87208,0.354711,25000,25000,10,1024
0,binary_classification,binary_crossentropy,rmsprop,0.87036,0.333041,25000,25000,4,1024
0,binary_classification,binary_crossentropy,rmsprop,0.8702,0.365211,25000,25000,4,128
0,binary_classification,binary_crossentropy,rmsprop,0.87276,0.37444,10000,25000,2,512


In [47]:
tr = run_network(x_train, y_train, x_test, y_test,training_size=10000, epochs_num=2, batch_size_num=256)
test_plan = pd.concat([test_plan, tr])
test_plan

Epoch 1/2
Epoch 2/2


Unnamed: 0,method,loss_function,optimizer,accuracy,loss,training_size,testing_size,epochs,batch_size
0,binary_classification,binary_crossentropy,rmsprop,0.87208,0.322585,25000,25000,4,512
0,binary_classification,binary_crossentropy,rmsprop,0.87468,0.306827,15000,25000,4,512
0,binary_classification,binary_crossentropy,rmsprop,0.875867,0.316983,25000,15000,4,512
0,binary_classification,binary_crossentropy,rmsprop,0.85388,0.716287,25000,25000,10,128
0,binary_classification,binary_crossentropy,rmsprop,0.87208,0.354711,25000,25000,10,1024
0,binary_classification,binary_crossentropy,rmsprop,0.87036,0.333041,25000,25000,4,1024
0,binary_classification,binary_crossentropy,rmsprop,0.8702,0.365211,25000,25000,4,128
0,binary_classification,binary_crossentropy,rmsprop,0.87276,0.37444,10000,25000,2,512
0,binary_classification,binary_crossentropy,rmsprop,0.87336,0.326006,10000,25000,2,256


In [48]:
tr = run_network(x_train, y_train, x_test, y_test,training_size=10000, epochs_num=2, batch_size_num=128)
test_plan = pd.concat([test_plan, tr])
test_plan

Epoch 1/2
Epoch 2/2


Unnamed: 0,method,loss_function,optimizer,accuracy,loss,training_size,testing_size,epochs,batch_size
0,binary_classification,binary_crossentropy,rmsprop,0.87208,0.322585,25000,25000,4,512
0,binary_classification,binary_crossentropy,rmsprop,0.87468,0.306827,15000,25000,4,512
0,binary_classification,binary_crossentropy,rmsprop,0.875867,0.316983,25000,15000,4,512
0,binary_classification,binary_crossentropy,rmsprop,0.85388,0.716287,25000,25000,10,128
0,binary_classification,binary_crossentropy,rmsprop,0.87208,0.354711,25000,25000,10,1024
0,binary_classification,binary_crossentropy,rmsprop,0.87036,0.333041,25000,25000,4,1024
0,binary_classification,binary_crossentropy,rmsprop,0.8702,0.365211,25000,25000,4,128
0,binary_classification,binary_crossentropy,rmsprop,0.87276,0.37444,10000,25000,2,512
0,binary_classification,binary_crossentropy,rmsprop,0.87336,0.326006,10000,25000,2,256
0,binary_classification,binary_crossentropy,rmsprop,0.88152,0.294228,10000,25000,2,128


In [49]:
tr = run_network(x_train, y_train, x_test, y_test,training_size=8000, epochs_num=2, batch_size_num=64)
test_plan = pd.concat([test_plan, tr])
test_plan

Epoch 1/2
Epoch 2/2


Unnamed: 0,method,loss_function,optimizer,accuracy,loss,training_size,testing_size,epochs,batch_size
0,binary_classification,binary_crossentropy,rmsprop,0.87208,0.322585,25000,25000,4,512
0,binary_classification,binary_crossentropy,rmsprop,0.87468,0.306827,15000,25000,4,512
0,binary_classification,binary_crossentropy,rmsprop,0.875867,0.316983,25000,15000,4,512
0,binary_classification,binary_crossentropy,rmsprop,0.85388,0.716287,25000,25000,10,128
0,binary_classification,binary_crossentropy,rmsprop,0.87208,0.354711,25000,25000,10,1024
0,binary_classification,binary_crossentropy,rmsprop,0.87036,0.333041,25000,25000,4,1024
0,binary_classification,binary_crossentropy,rmsprop,0.8702,0.365211,25000,25000,4,128
0,binary_classification,binary_crossentropy,rmsprop,0.87276,0.37444,10000,25000,2,512
0,binary_classification,binary_crossentropy,rmsprop,0.87336,0.326006,10000,25000,2,256
0,binary_classification,binary_crossentropy,rmsprop,0.88152,0.294228,10000,25000,2,128


In [50]:
tr = run_network(x_train, y_train, x_test, y_test,training_size=10000, testing_size=10000, epochs_num=2, batch_size_num=128)
test_plan = pd.concat([test_plan, tr])
test_plan

Epoch 1/2
Epoch 2/2


Unnamed: 0,method,loss_function,optimizer,accuracy,loss,training_size,testing_size,epochs,batch_size
0,binary_classification,binary_crossentropy,rmsprop,0.87208,0.322585,25000,25000,4,512
0,binary_classification,binary_crossentropy,rmsprop,0.87468,0.306827,15000,25000,4,512
0,binary_classification,binary_crossentropy,rmsprop,0.875867,0.316983,25000,15000,4,512
0,binary_classification,binary_crossentropy,rmsprop,0.85388,0.716287,25000,25000,10,128
0,binary_classification,binary_crossentropy,rmsprop,0.87208,0.354711,25000,25000,10,1024
0,binary_classification,binary_crossentropy,rmsprop,0.87036,0.333041,25000,25000,4,1024
0,binary_classification,binary_crossentropy,rmsprop,0.8702,0.365211,25000,25000,4,128
0,binary_classification,binary_crossentropy,rmsprop,0.87276,0.37444,10000,25000,2,512
0,binary_classification,binary_crossentropy,rmsprop,0.87336,0.326006,10000,25000,2,256
0,binary_classification,binary_crossentropy,rmsprop,0.88152,0.294228,10000,25000,2,128


In [56]:
test_plan.to_markdown('ex3_p1.md')

|    | method                | loss_function       | optimizer   |   accuracy |     loss |   training_size |   testing_size |   epochs |   batch_size |
|---:|:----------------------|:--------------------|:------------|-----------:|---------:|----------------:|---------------:|---------:|-------------:|
|  0 | binary_classification | binary_crossentropy | rmsprop     |   0.87208  | 0.322585 |           25000 |          25000 |        4 |          512 |
|  0 | binary_classification | binary_crossentropy | rmsprop     |   0.87468  | 0.306827 |           15000 |          25000 |        4 |          512 |
|  0 | binary_classification | binary_crossentropy | rmsprop     |   0.875867 | 0.316983 |           25000 |          15000 |        4 |          512 |
|  0 | binary_classification | binary_crossentropy | rmsprop     |   0.85388  | 0.716287 |           25000 |          25000 |       10 |          128 |
|  0 | binary_classification | binary_crossentropy | rmsprop     |   0.87208  | 0.354711 |           25000 |          25000 |       10 |         1024 |
|  0 | binary_classification | binary_crossentropy | rmsprop     |   0.87036  | 0.333041 |           25000 |          25000 |        4 |         1024 |
|  0 | binary_classification | binary_crossentropy | rmsprop     |   0.8702   | 0.365211 |           25000 |          25000 |        4 |          128 |
|  0 | binary_classification | binary_crossentropy | rmsprop     |   0.87276  | 0.37444  |           10000 |          25000 |        2 |          512 |
|  0 | binary_classification | binary_crossentropy | rmsprop     |   0.87336  | 0.326006 |           10000 |          25000 |        2 |          256 |
|  0 | binary_classification | binary_crossentropy | rmsprop     |   0.88152  | 0.294228 |           10000 |          25000 |        2 |          128 |
|  0 | binary_classification | binary_crossentropy | rmsprop     |   0.87532  | 0.309221 |            8000 |          25000 |        2 |           64 |
|  0 | binary_classification | binary_crossentropy | rmsprop     |   0.879    | 0.296283 |           10000 |          10000 |        2 |          128 |

The accuracy seems to increase when the size of the training data is decreased, down to around a training size of 10000. The loss seems to decrease, when lowering the epochs to 2 and the batch size to around 128.

### task 11

In [57]:
test_plan = pd.DataFrame(columns=['method','loss_function','optimizer','accuracy','loss','training_size','testing_size','epochs','batch_size', 'hidden_layers'])

In [58]:
def run_network(x_train, y_train, x_test, y_test, training_size=25000, testing_size=25000, epochs_num = 4, batch_size_num = 512, hidden_layers=2):
    
    network = models.Sequential()
    network.add(layers.Dense(16, activation='relu', input_shape=(10000,)))
    if hidden_layers > 1 :
        for i in range(hidden_layers-1):
            network.add(layers.Dense(16, activation='relu'))        
    
    network.add(layers.Dense(1, activation='sigmoid'))
    
    network.compile(optimizer=RMSprop(learning_rate=0.001),loss='binary_crossentropy',metrics=['accuracy'])
    
    x_train = x_train[:training_size]
    y_train = y_train[:training_size]
    
    x_test = x_test[:testing_size]
    y_test = y_test[:testing_size]
    
    network.fit(x_train, y_train, epochs=epochs_num, batch_size=batch_size_num)

    results = network.evaluate(x_test, y_test)

    test_plan_dict = {'method':['binary_classification'], 
    'loss_function':['binary_crossentropy'],
    'optimizer':['rmsprop'],
    'accuracy' : [results[1]],
    'loss':[results[0]],
    'training_size':[training_size],
    'testing_size':[testing_size],
    'epochs':[epochs_num],
    'batch_size':[batch_size_num],
    'hidden_layers':[hidden_layers]
    }
    
    df = pd.DataFrame(test_plan_dict)

    return df

In [59]:
tr = run_network(x_train, y_train, x_test, y_test, hidden_layers=1)
test_plan = pd.concat([test_plan, tr])
test_plan

Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


Unnamed: 0,method,loss_function,optimizer,accuracy,loss,training_size,testing_size,epochs,batch_size,hidden_layers
0,binary_classification,binary_crossentropy,rmsprop,0.88756,0.280807,25000,25000,4,512,1


In [60]:
tr = run_network(x_train, y_train, x_test, y_test, hidden_layers=3)
test_plan = pd.concat([test_plan, tr])
test_plan

Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


Unnamed: 0,method,loss_function,optimizer,accuracy,loss,training_size,testing_size,epochs,batch_size,hidden_layers
0,binary_classification,binary_crossentropy,rmsprop,0.88756,0.280807,25000,25000,4,512,1
0,binary_classification,binary_crossentropy,rmsprop,0.87928,0.311819,25000,25000,4,512,3


In [61]:
tr = run_network(x_train, y_train, x_test, y_test, hidden_layers=2)
test_plan = pd.concat([test_plan, tr])
test_plan

Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


Unnamed: 0,method,loss_function,optimizer,accuracy,loss,training_size,testing_size,epochs,batch_size,hidden_layers
0,binary_classification,binary_crossentropy,rmsprop,0.88756,0.280807,25000,25000,4,512,1
0,binary_classification,binary_crossentropy,rmsprop,0.87928,0.311819,25000,25000,4,512,3
0,binary_classification,binary_crossentropy,rmsprop,0.874,0.315143,25000,25000,4,512,2


In [64]:
test_plan.to_markdown('ex3_p2.md')

|    | method                | loss_function       | optimizer   |   accuracy |     loss |   training_size |   testing_size |   epochs |   batch_size |   hidden_layers |
|---:|:----------------------|:--------------------|:------------|-----------:|---------:|----------------:|---------------:|---------:|-------------:|----------------:|
|  0 | binary_classification | binary_crossentropy | rmsprop     |    0.88756 | 0.280807 |           25000 |          25000 |        4 |          512 |               1 |
|  0 | binary_classification | binary_crossentropy | rmsprop     |    0.87928 | 0.311819 |           25000 |          25000 |        4 |          512 |               3 |
|  0 | binary_classification | binary_crossentropy | rmsprop     |    0.874   | 0.315143 |           25000 |          25000 |        4 |          512 |               2 |

The network containing only one layer seems to perform with a higher accuracy and less loss.

### task 12

In [72]:
test_plan = pd.DataFrame(columns=['method','loss_function','optimizer','accuracy','loss','training_size','testing_size','epochs','batch_size', 'hidden_layers','hidden_units','activation_function'])

In [73]:
def run_network(x_train, y_train, x_test, y_test, training_size=25000, testing_size=25000, epochs_num = 4, batch_size_num = 512, hidden_layers=2, hidden_units=16, loss_function='binary_crossentropy', activation_function='relu'):
    
    network = models.Sequential()
    network.add(layers.Dense(hidden_units, activation=activation_function, input_shape=(10000,)))
    if hidden_layers > 1 :
        for i in range(hidden_layers-1):
            network.add(layers.Dense(hidden_units, activation=activation_function))        
    
    network.add(layers.Dense(1, activation='sigmoid'))
    
    network.compile(optimizer=RMSprop(learning_rate=0.001), loss=loss_function, metrics=['accuracy'])
    
    x_train = x_train[:training_size]
    y_train = y_train[:training_size]
    
    x_test = x_test[:testing_size]
    y_test = y_test[:testing_size]
    
    network.fit(x_train, y_train, epochs=epochs_num, batch_size=batch_size_num)

    results = network.evaluate(x_test, y_test)

    test_plan_dict = {'method':['binary_classification'], 
    'loss_function':[loss_function],
    'optimizer':['rmsprop'],
    'accuracy' : [results[1]],
    'loss':[results[0]],
    'training_size':[training_size],
    'testing_size':[testing_size],
    'epochs':[epochs_num],
    'batch_size':[batch_size_num],
    'hidden_layers':[hidden_layers],
    'hidden_units':[hidden_units],
    'activation_function':[activation_function]
    }
    
    df = pd.DataFrame(test_plan_dict)

    return df

In [75]:
tr = run_network(x_train, y_train, x_test, y_test, hidden_units=32)
test_plan = pd.concat([test_plan, tr])
test_plan

Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


Unnamed: 0,method,loss_function,optimizer,accuracy,loss,training_size,testing_size,epochs,batch_size,hidden_layers,hidden_units,activation_function
0,binary_classification,binary_crossentropy,rmsprop,0.88064,0.312139,25000,25000,4,512,2,32,relu


In [76]:
tr = run_network(x_train, y_train, x_test, y_test, hidden_units=64)
test_plan = pd.concat([test_plan, tr])
test_plan

Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


Unnamed: 0,method,loss_function,optimizer,accuracy,loss,training_size,testing_size,epochs,batch_size,hidden_layers,hidden_units,activation_function
0,binary_classification,binary_crossentropy,rmsprop,0.88064,0.312139,25000,25000,4,512,2,32,relu
0,binary_classification,binary_crossentropy,rmsprop,0.86824,0.356516,25000,25000,4,512,2,64,relu


In [77]:
tr = run_network(x_train, y_train, x_test, y_test, hidden_units=128)
test_plan = pd.concat([test_plan, tr])
test_plan

Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


Unnamed: 0,method,loss_function,optimizer,accuracy,loss,training_size,testing_size,epochs,batch_size,hidden_layers,hidden_units,activation_function
0,binary_classification,binary_crossentropy,rmsprop,0.88064,0.312139,25000,25000,4,512,2,32,relu
0,binary_classification,binary_crossentropy,rmsprop,0.86824,0.356516,25000,25000,4,512,2,64,relu
0,binary_classification,binary_crossentropy,rmsprop,0.86904,0.353826,25000,25000,4,512,2,128,relu


In [79]:
test_plan.to_markdown('ex3_p3.md')

|    | method                | loss_function       | optimizer   |   accuracy |     loss |   training_size |   testing_size |   epochs |   batch_size |   hidden_layers |   hidden_units | activation_function   |
|---:|:----------------------|:--------------------|:------------|-----------:|---------:|----------------:|---------------:|---------:|-------------:|----------------:|---------------:|:----------------------|
|  0 | binary_classification | binary_crossentropy | rmsprop     |    0.88064 | 0.312139 |           25000 |          25000 |        4 |          512 |               2 |             32 | relu                  |
|  0 | binary_classification | binary_crossentropy | rmsprop     |    0.86824 | 0.356516 |           25000 |          25000 |        4 |          512 |               2 |             64 | relu                  |
|  0 | binary_classification | binary_crossentropy | rmsprop     |    0.86904 | 0.353826 |           25000 |          25000 |        4 |          512 |               2 |            128 | relu                  |

Increasing the amount of units in the hidden layers lowers the accuracy in this use case.

### task 13

In [82]:
test_plan = pd.DataFrame(columns=['method','loss_function','optimizer','accuracy','loss','training_size','testing_size','epochs','batch_size', 'hidden_layers','hidden_units','activation_function'])

In [83]:
tr = run_network(x_train, y_train, x_test, y_test)
test_plan = pd.concat([test_plan, tr])
test_plan

Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


Unnamed: 0,method,loss_function,optimizer,accuracy,loss,training_size,testing_size,epochs,batch_size,hidden_layers,hidden_units,activation_function
0,binary_classification,binary_crossentropy,rmsprop,0.88092,0.301691,25000,25000,4,512,2,16,relu


In [84]:
tr = run_network(x_train, y_train, x_test, y_test, loss_function='mse')
test_plan = pd.concat([test_plan, tr])
test_plan

Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


Unnamed: 0,method,loss_function,optimizer,accuracy,loss,training_size,testing_size,epochs,batch_size,hidden_layers,hidden_units,activation_function
0,binary_classification,binary_crossentropy,rmsprop,0.88092,0.301691,25000,25000,4,512,2,16,relu
0,binary_classification,mse,rmsprop,0.88172,0.087236,25000,25000,4,512,2,16,relu


In [85]:
print(test_plan.to_markdown())

|    | method                | loss_function       | optimizer   |   accuracy |      loss |   training_size |   testing_size |   epochs |   batch_size |   hidden_layers |   hidden_units | activation_function   |
|---:|:----------------------|:--------------------|:------------|-----------:|----------:|----------------:|---------------:|---------:|-------------:|----------------:|---------------:|:----------------------|
|  0 | binary_classification | binary_crossentropy | rmsprop     |    0.88092 | 0.301691  |           25000 |          25000 |        4 |          512 |               2 |             16 | relu                  |
|  0 | binary_classification | mse                 | rmsprop     |    0.88172 | 0.0872356 |           25000 |          25000 |        4 |          512 |               2 |             16 | relu                  |


In [86]:
test_plan.to_markdown('ex3_p4.md')

|    | method                | loss_function       | optimizer   |   accuracy |      loss |   training_size |   testing_size |   epochs |   batch_size |   hidden_layers |   hidden_units | activation_function   |
|---:|:----------------------|:--------------------|:------------|-----------:|----------:|----------------:|---------------:|---------:|-------------:|----------------:|---------------:|:----------------------|
|  0 | binary_classification | binary_crossentropy | rmsprop     |    0.88092 | 0.301691  |           25000 |          25000 |        4 |          512 |               2 |             16 | relu                  |
|  0 | binary_classification | mse                 | rmsprop     |    0.88172 | 0.0872356 |           25000 |          25000 |        4 |          512 |               2 |             16 | relu                  |

Changin the loss function to mse increases the accuracy by a barely significant amount, but the loss value is reduced massively.

### task 14

In [87]:
test_plan = pd.DataFrame(columns=['method','loss_function','optimizer','accuracy','loss','training_size','testing_size','epochs','batch_size', 'hidden_layers','hidden_units','activation_function'])

In [88]:
tr = run_network(x_train, y_train, x_test, y_test)
test_plan = pd.concat([test_plan, tr])
test_plan

Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


Unnamed: 0,method,loss_function,optimizer,accuracy,loss,training_size,testing_size,epochs,batch_size,hidden_layers,hidden_units,activation_function
0,binary_classification,binary_crossentropy,rmsprop,0.88408,0.294821,25000,25000,4,512,2,16,relu


In [89]:
tr = run_network(x_train, y_train, x_test, y_test, activation_function='tanh')
test_plan = pd.concat([test_plan, tr])
test_plan

Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


Unnamed: 0,method,loss_function,optimizer,accuracy,loss,training_size,testing_size,epochs,batch_size,hidden_layers,hidden_units,activation_function
0,binary_classification,binary_crossentropy,rmsprop,0.88408,0.294821,25000,25000,4,512,2,16,relu
0,binary_classification,binary_crossentropy,rmsprop,0.87772,0.321474,25000,25000,4,512,2,16,tanh


In [90]:
print(test_plan.to_markdown())

|    | method                | loss_function       | optimizer   |   accuracy |     loss |   training_size |   testing_size |   epochs |   batch_size |   hidden_layers |   hidden_units | activation_function   |
|---:|:----------------------|:--------------------|:------------|-----------:|---------:|----------------:|---------------:|---------:|-------------:|----------------:|---------------:|:----------------------|
|  0 | binary_classification | binary_crossentropy | rmsprop     |    0.88408 | 0.294821 |           25000 |          25000 |        4 |          512 |               2 |             16 | relu                  |
|  0 | binary_classification | binary_crossentropy | rmsprop     |    0.87772 | 0.321474 |           25000 |          25000 |        4 |          512 |               2 |             16 | tanh                  |


In [91]:
test_plan.to_markdown('ex3_p5.md')

|    | method                | loss_function       | optimizer   |   accuracy |     loss |   training_size |   testing_size |   epochs |   batch_size |   hidden_layers |   hidden_units | activation_function   |
|---:|:----------------------|:--------------------|:------------|-----------:|---------:|----------------:|---------------:|---------:|-------------:|----------------:|---------------:|:----------------------|
|  0 | binary_classification | binary_crossentropy | rmsprop     |    0.88408 | 0.294821 |           25000 |          25000 |        4 |          512 |               2 |             16 | relu                  |
|  0 | binary_classification | binary_crossentropy | rmsprop     |    0.87772 | 0.321474 |           25000 |          25000 |        4 |          512 |               2 |             16 | tanh                  |

Using the tanh activation function, instead of relu, increases the accuracy and decreases loss.

### using a trained network to generate predictions on new data

After having trained a network, you want to use it in a practical setting. You can generate the likelihood of reviews being positive by using the `predict` method:

In [93]:
r = network.predict(x_test)
print(r[:10])
print(r[-10:])

[[0.18460888]
 [0.9996491 ]
 [0.7816341 ]
 [0.7415408 ]
 [0.93561536]
 [0.7899484 ]
 [0.9991618 ]
 [0.00424661]
 [0.9668042 ]
 [0.9880916 ]]
[[6.7427123e-01]
 [1.2729563e-04]
 [9.9499589e-01]
 [4.0960208e-01]
 [9.5713991e-01]
 [9.9964964e-01]
 [3.1243515e-01]
 [8.8947229e-02]
 [4.8776586e-02]
 [4.3542066e-01]]


Print out the first 10 and the last 10 entries.
As you can see, the network is very confident for some samples (0.99, 0.05) but less confident for others (0.7, 0.2). 

----------------

---------------------------------
## <font color="ce33ff">SIDE PROJECT FOR IMPROVING YOUR PYTHON SKILLS</font>

**DESCRIPTION:**
Here is a way implemented to quickly decode one of the above used reviews back to English words

In [94]:
# word_index is a dictionary mapping words to an integer index
word_index = imdb.get_word_index()

# reverse it, mapping integer indices to words
reverse_word_index = dict([(value, key) for (key, value) in word_index.items()])

# decode the review; note that the indices were offset by 3
# because 0, 1 and 2 are reserved indices for "padding", "start of sequence", and "unknown".
decoded_review = ' '.join([reverse_word_index.get(i - 3, '?') for i in train_data[0]])

In [95]:
decoded_review

"? this film was just brilliant casting location scenery story direction everyone's really suited the part they played and you could just imagine being there robert ? is an amazing actor and now the same being director ? father came from the same scottish island as myself so i loved the fact there was a real connection with this film the witty remarks throughout the film were great it was just brilliant so much that i bought the film as soon as it was released for ? and would recommend it to everyone to watch and the fly fishing was amazing really cried at the end it was so sad and you know what they say if you cry at a film it must have been good and this definitely was also ? to the two little boy's that played the ? of norman and paul they were just brilliant children are often left out of the ? list i think because the stars that play them all grown up are such a big profile for the whole film but these children are amazing and should be praised for what they have done don't you th

### table

In [107]:
p1 = pd.read_table('ex3_p1.md', sep='|')
p2 = pd.read_table('ex3_p2.md', sep='|')
p3 = pd.read_table('ex3_p3.md', sep='|')
p4 = pd.read_table('ex3_p4.md', sep='|')
p5 = pd.read_table('ex3_p5.md', sep='|')

In [133]:
def tt_import_clean(parts, exercise_num):
    d={}
    parts = parts-1
    for i in range(parts):
        i+=1
        d["p{0}".format(i)] = pd.read_table('ex'+str(exercise_num)+'_p'+str(i)+'.md', sep='|')
        d["p{0}".format(i)] = d["p{0}".format(i)].iloc[1:]
        d["p{0}".format(i)] = d["p{0}".format(i)].iloc[:,:-1]
        d["p{0}".format(i)] = d["p{0}".format(i)].iloc[:,1:]
    return d


In [134]:
d = tt_import_clean(5,3)

In [140]:
pd.concat(d.values()).to_markdown('ex3_complete.md')

In [141]:
print(pd.concat(d.values()).to_markdown())

|    |        |  method                   |  loss_function          |  optimizer      |      accuracy  |        loss  |      training_size  |      testing_size  |      epochs  |      batch_size  |      hidden_layers  |      hidden_units  |  activation_function      |         loss  |
|---:|-------:|:--------------------------|:------------------------|:----------------|---------------:|-------------:|--------------------:|-------------------:|-------------:|-----------------:|--------------------:|-------------------:|:--------------------------|--------------:|
|  1 |     0  | binary_classification     | binary_crossentropy     | rmsprop         |       0.87208  |     0.322585 |              25000  |             25000  |           4  |             512  |                 nan |                nan | nan                       |   nan         |
|  2 |     0  | binary_classification     | binary_crossentropy     | rmsprop         |       0.87468  |     0.306827 |              15000  |       

|    |        |  method                   |  loss_function          |  optimizer      |      accuracy  |        loss  |      training_size  |      testing_size  |      epochs  |      batch_size  |      hidden_layers  |      hidden_units  |  activation_function      |         loss  |
|---:|-------:|:--------------------------|:------------------------|:----------------|---------------:|-------------:|--------------------:|-------------------:|-------------:|-----------------:|--------------------:|-------------------:|:--------------------------|--------------:|
|  1 |     0  | binary_classification     | binary_crossentropy     | rmsprop         |       0.87208  |     0.322585 |              25000  |             25000  |           4  |             512  |                 nan |                nan | nan                       |   nan         |
|  2 |     0  | binary_classification     | binary_crossentropy     | rmsprop         |       0.87468  |     0.306827 |              15000  |             25000  |           4  |             512  |                 nan |                nan | nan                       |   nan         |
|  3 |     0  | binary_classification     | binary_crossentropy     | rmsprop         |       0.875867 |     0.316983 |              25000  |             15000  |           4  |             512  |                 nan |                nan | nan                       |   nan         |
|  4 |     0  | binary_classification     | binary_crossentropy     | rmsprop         |       0.85388  |     0.716287 |              25000  |             25000  |          10  |             128  |                 nan |                nan | nan                       |   nan         |
|  5 |     0  | binary_classification     | binary_crossentropy     | rmsprop         |       0.87208  |     0.354711 |              25000  |             25000  |          10  |            1024  |                 nan |                nan | nan                       |   nan         |
|  6 |     0  | binary_classification     | binary_crossentropy     | rmsprop         |       0.87036  |     0.333041 |              25000  |             25000  |           4  |            1024  |                 nan |                nan | nan                       |   nan         |
|  7 |     0  | binary_classification     | binary_crossentropy     | rmsprop         |       0.8702   |     0.365211 |              25000  |             25000  |           4  |             128  |                 nan |                nan | nan                       |   nan         |
|  8 |     0  | binary_classification     | binary_crossentropy     | rmsprop         |       0.87276  |     0.37444  |              10000  |             25000  |           2  |             512  |                 nan |                nan | nan                       |   nan         |
|  9 |     0  | binary_classification     | binary_crossentropy     | rmsprop         |       0.87336  |     0.326006 |              10000  |             25000  |           2  |             256  |                 nan |                nan | nan                       |   nan         |
| 10 |     0  | binary_classification     | binary_crossentropy     | rmsprop         |       0.88152  |     0.294228 |              10000  |             25000  |           2  |             128  |                 nan |                nan | nan                       |   nan         |
| 11 |     0  | binary_classification     | binary_crossentropy     | rmsprop         |       0.87532  |     0.309221 |               8000  |             25000  |           2  |              64  |                 nan |                nan | nan                       |   nan         |
| 12 |     0  | binary_classification     | binary_crossentropy     | rmsprop         |       0.879    |     0.296283 |              10000  |             10000  |           2  |             128  |                 nan |                nan | nan                       |   nan         |
|  1 |     0  | binary_classification     | binary_crossentropy     | rmsprop         |       0.88756  |     0.280807 |              25000  |             25000  |           4  |             512  |                   1 |                nan | nan                       |   nan         |
|  2 |     0  | binary_classification     | binary_crossentropy     | rmsprop         |       0.87928  |     0.311819 |              25000  |             25000  |           4  |             512  |                   3 |                nan | nan                       |   nan         |
|  3 |     0  | binary_classification     | binary_crossentropy     | rmsprop         |       0.874    |     0.315143 |              25000  |             25000  |           4  |             512  |                   2 |                nan | nan                       |   nan         |
|  1 |     0  | binary_classification     | binary_crossentropy     | rmsprop         |       0.88064  |     0.312139 |              25000  |             25000  |           4  |             512  |                   2 |                 32 | relu                      |   nan         |
|  2 |     0  | binary_classification     | binary_crossentropy     | rmsprop         |       0.86824  |     0.356516 |              25000  |             25000  |           4  |             512  |                   2 |                 64 | relu                      |   nan         |
|  3 |     0  | binary_classification     | binary_crossentropy     | rmsprop         |       0.86904  |     0.353826 |              25000  |             25000  |           4  |             512  |                   2 |                128 | relu                      |   nan         |
|  1 |     0  | binary_classification     | binary_crossentropy     | rmsprop         |       0.88092  |   nan        |              25000  |             25000  |           4  |             512  |                   2 |                 16 | relu                      |     0.301691  |
|  2 |     0  | binary_classification     | mse                     | rmsprop         |       0.88172  |   nan        |              25000  |             25000  |           4  |             512  |                   2 |                 16 | relu                      |     0.0872356 |