__Neural Nets with Keras__

*Oscar Padilla*

# Introduction
The goal of this case study is to analyze how different hyperparameters affect the performance of a neural network built with the Keras API.

The assignment consists of:

1. Developing 3 different architectures by varying the number of layers and neurons

2. Trying different activation functions (e.g. `relu`, `tanh`)

3. Varying `batch_size`

4. Experimenting with different `kernel_initializer` values

5. Trying different `optimizer` choices

# Background

Let's first look at the original dataset.

## HIGGS Data Set

> This is a classification problem to distinguish between a signal process which produces Higgs bosons and a background process which does not.

> The data has been produced using Monte Carlo simulations. The first 21 features (columns 2-22) are kinematic properties measured by the particle detectors in the accelerator. The last seven features are functions of the first 21 features; these are high-level features derived by physicists to help discriminate between the two classes. There is an interest in using deep learning methods to obviate the need for physicists to manually develop such features. Benchmark results using Bayesian Decision Trees from a standard physics package and 5-layer neural networks are presented in the original paper. The last 500,000 examples are used as a test set. [1]
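
As a quick orientation, the column layout described above can be written down as index ranges. This is only a sketch: it assumes the file is read with `header=None`, so that the columns are numbered from 0 and the class label sits in column 0. The names `LABEL_COL`, `low_level_cols` and `high_level_cols` are illustrative and do not appear in the notebook.

```python
# Column layout, 0-based, assuming the CSV is read without a header row.
LABEL_COL = 0                            # 1 = signal, 0 = background
low_level_cols = list(range(1, 22))      # 21 kinematic features
high_level_cols = list(range(22, 29))    # 7 derived high-level features

n_features = len(low_level_cols) + len(high_level_cols)
print(n_features)
```

The total matches the 28 input features that the models below receive.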

# Methods

## Setup and Data Import

First, a TensorFlow/Keras environment had to be created. Getting a workable environment took considerable effort; it was achieved thanks to the advice of Prof. Slater and the guidance provided in '*Set up Anaconda, Jupyter Notebook, Tensorflow for Deep Learning*' [2].

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import keras

Using TensorFlow backend.


In [2]:
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation, Flatten
from keras.optimizers import SGD, Adam, RMSprop, Adagrad
from sklearn.metrics import roc_auc_score

In [3]:
N = 1050000  # Number of training rows; change this line to adjust it.
data = pd.read_csv("HIGGS.csv", nrows=N, header=None)
# The test rows are the 500,000 rows immediately after the training rows.
test_data = pd.read_csv("HIGGS.csv", nrows=500000, header=None, skiprows=N)
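
The two `read_csv` calls above read adjacent, non-overlapping blocks of rows. The sketch below spells out those ranges and, for comparison, the offset that would select the last 500,000 rows mentioned in the dataset description. It assumes the full file has 11,000,000 rows; `split_ranges`, `TOTAL_ROWS` and `benchmark_skip` are hypothetical names used only for illustration.

```python
def split_ranges(n_train, n_test, skip):
    """Return half-open (start, stop) row ranges for the train and test reads."""
    train = (0, n_train)
    test = (skip, skip + n_test)
    return train, test

train_rows, test_rows = split_ranges(1_050_000, 500_000, 1_050_000)

# Number of rows shared by the two ranges (0 means no leakage between them).
overlap = max(0, min(train_rows[1], test_rows[1]) - max(train_rows[0], test_rows[0]))
print(train_rows, test_rows, overlap)

# Assumption: the complete HIGGS file has 11,000,000 rows.
TOTAL_ROWS = 11_000_000
benchmark_skip = TOTAL_ROWS - 500_000  # skiprows value for the last 500,000 rows
print(benchmark_skip)
```

Because the test rows here are not the last 500,000 rows of the file, the scores in this notebook are not directly comparable with the benchmark figures from the original paper.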

In [4]:
data.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,19,20,21,22,23,24,25,26,27,28
0,1.0,0.869293,-0.635082,0.22569,0.32747,-0.689993,0.754202,-0.248573,-1.092064,0.0,...,-0.010455,-0.045767,3.101961,1.35376,0.979563,0.978076,0.920005,0.721657,0.988751,0.876678
1,1.0,0.907542,0.329147,0.359412,1.49797,-0.31301,1.095531,-0.557525,-1.58823,2.173076,...,-1.13893,-0.000819,0.0,0.30222,0.833048,0.9857,0.978098,0.779732,0.992356,0.798343
2,1.0,0.798835,1.470639,-1.635975,0.453773,0.425629,1.104875,1.282322,1.381664,0.0,...,1.128848,0.900461,0.0,0.909753,1.10833,0.985692,0.951331,0.803252,0.865924,0.780118
3,0.0,1.344385,-0.876626,0.935913,1.99205,0.882454,1.786066,-1.646778,-0.942383,0.0,...,-0.678379,-1.360356,0.0,0.946652,1.028704,0.998656,0.728281,0.8692,1.026736,0.957904
4,1.0,1.105009,0.321356,1.522401,0.882808,-1.205349,0.681466,-1.070464,-0.921871,0.0,...,-0.373566,0.113041,0.0,0.755856,1.361057,0.98661,0.838085,1.133295,0.872245,0.808487


In [5]:
data.describe()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,19,20,21,22,23,24,25,26,27,28
count,1050000.0,1050000.0,1050000.0,1050000.0,1050000.0,1050000.0,1050000.0,1050000.0,1050000.0,1050000.0,...,1050000.0,1050000.0,1050000.0,1050000.0,1050000.0,1050000.0,1050000.0,1050000.0,1050000.0,1050000.0
mean,0.529619,0.9915832,0.0007197975,-0.0004485629,0.9982084,-0.0007806863,0.9907035,-0.0007582857,0.0004148951,1.000331,...,-0.0004109631,-0.001580198,0.9982186,1.034462,1.024987,1.0505,1.009884,0.9732704,1.033413,0.9599648
std,0.4991222,0.5649974,1.008404,1.005799,0.5991746,1.006752,0.4751551,1.01029,1.00621,1.027721,...,1.007888,1.005828,1.399447,0.6728666,0.3800901,0.1642851,0.3980071,0.5252274,0.3651356,0.3132762
min,0.0,0.2746966,-2.434976,-1.742508,0.0006259872,-1.743944,0.1386017,-2.969725,-1.741237,0.0,...,-2.497265,-1.742691,0.0,0.1011684,0.2347527,0.09220192,0.1574726,0.04812501,0.3033497,0.3509388
25%,0.0,0.5907533,-0.7363746,-0.8719308,0.5762861,-0.8718935,0.6786263,-0.6882352,-0.8680962,0.0,...,-0.7141902,-0.8720338,0.0,0.7907532,0.8463601,0.9857513,0.767317,0.673947,0.8193815,0.7703798
50%,1.0,0.8537375,0.0009198132,0.001526303,0.8916285,-0.001136021,0.8942697,-0.001015666,0.0007152493,1.086538,...,0.000372133,-0.00525859,0.0,0.8949573,0.9506129,0.9897675,0.9163983,0.8735931,0.9475113,0.8718547
75%,1.0,1.236958,0.7382142,0.8693294,1.293289,0.87116,1.170832,0.6881843,0.8699757,2.173076,...,0.7149345,0.8693858,3.101961,1.024431,1.08351,1.02014,1.142226,1.13975,1.140837,1.059417
max,1.0,8.711782,2.434868,1.743236,9.900929,1.743257,8.38261,2.969674,1.741454,2.173076,...,2.498009,1.743372,3.101961,31.07619,15.63786,5.921233,10.79409,13.73569,8.779915,6.259156


In [6]:
y = np.array(data.loc[:,0])
x = np.array(data.loc[:,1:])
x_test = np.array(test_data.loc[:,1:])
y_test = np.array(test_data.loc[:,0])

## Base Model

In [7]:
model = Sequential()
model.add(Dense(50, input_dim=x.shape[1], kernel_initializer='uniform'))  # x.shape[1] == 28 here
model.add(Activation('sigmoid'))
model.add(Dropout(0.10))
model.add(Dense(50, kernel_initializer='uniform'))
model.add(Activation('sigmoid'))
model.add(Dropout(0.10))
model.add(Dense(1, kernel_initializer='uniform')) 
model.add(Activation('sigmoid'))

Instructions for updating:
Colocations handled automatically by placer.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.


In [8]:
sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='binary_crossentropy', metrics=['accuracy'], optimizer=sgd)
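
To get a feel for how much `decay=1e-6` matters, here is a small sketch of time-based learning-rate decay. It assumes the rule `lr / (1 + decay * iterations)`, which is how the Keras 2.x `SGD` optimizer applies `decay` as far as I know, and it assumes the 1,050,000 training rows, `batch_size=1000` and 5 epochs used in this notebook. `decayed_lr` is an illustrative helper, not a Keras function.

```python
def decayed_lr(lr, decay, iterations):
    """Time-based decay: the rate shrinks with the number of updates so far."""
    return lr / (1.0 + decay * iterations)

steps_per_epoch = 1_050_000 // 1000   # mini-batches per epoch
total_steps = steps_per_epoch * 5     # updates over 5 epochs
final_lr = decayed_lr(0.1, 1e-6, total_steps)
print(total_steps, final_lr)
```

Under these assumptions the learning rate stays within about 1% of its initial value for the whole run, so the decay term plays almost no role in the experiments.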

In [9]:
model.fit(x, y, epochs=5, batch_size=1000)
model_ROC = roc_auc_score(y_test,model.predict(x_test))
print(model_ROC)

Instructions for updating:
Use tf.cast instead.
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
0.6697949124675913
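
The number printed above is the area under the ROC curve (ROC AUC). One way to read it is as the probability that a randomly chosen signal event receives a higher score than a randomly chosen background event, with ties counting half. The sketch below implements that pairwise definition in plain Python. `pairwise_auc` is an illustrative helper, not the algorithm inside `roc_auc_score`, and it is quadratic in the number of samples, so it only suits tiny inputs.

```python
def pairwise_auc(y_true, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))

toy_auc = pairwise_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
constant_auc = pairwise_auc([0, 1, 0, 1], [0.5, 0.5, 0.5, 0.5])
print(toy_auc, constant_auc)
```

A model that outputs the same score for every event gets 0.5 under this definition, which is useful to keep in mind when reading the weaker results further down.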


In [10]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_1 (Dense)              (None, 50)                1450      
_________________________________________________________________
activation_1 (Activation)    (None, 50)                0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 50)                0         
_________________________________________________________________
dense_2 (Dense)              (None, 50)                2550      
_________________________________________________________________
activation_2 (Activation)    (None, 50)                0         
_________________________________________________________________
dropout_2 (Dropout)          (None, 50)                0         
_________________________________________________________________
dense_3 (Dense)              (None, 1)                 51        
__________
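
The `Param #` column in the summary can be checked by hand: a dense layer has one weight per input-unit pair plus one bias per unit, i.e. `(inputs + 1) * units` parameters. A minimal sketch follows; `dense_param_counts` is an illustrative helper, not part of Keras.

```python
def dense_param_counts(n_inputs, layer_units):
    """Parameter count of each Dense layer in a plain feed-forward stack."""
    counts = []
    fan_in = n_inputs
    for units in layer_units:
        counts.append((fan_in + 1) * units)  # weights plus one bias per unit
        fan_in = units                       # this layer feeds the next one
    return counts

base_counts = dense_param_counts(28, [50, 50, 1])
print(base_counts, sum(base_counts))
```

The `Activation` and `Dropout` layers add no trainable parameters, which is why they show 0 in the summary.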

## Architectures

### Model 2: Adding Neurons

In [11]:
model2 = Sequential()
model2.add(Dense(100, input_dim=x.shape[1], kernel_initializer='uniform'))  # x.shape[1] == 28 here
model2.add(Activation('sigmoid'))
model2.add(Dropout(0.10))
model2.add(Dense(100, kernel_initializer='uniform'))
model2.add(Activation('sigmoid'))
model2.add(Dropout(0.10))
model2.add(Dense(1, kernel_initializer='uniform')) 
model2.add(Activation('sigmoid'))

In [12]:
sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
model2.compile(loss='binary_crossentropy', metrics=['accuracy'], optimizer=sgd)

In [13]:
model2.fit(x, y, epochs=5, batch_size=1000)
model2_ROC = roc_auc_score(y_test,model2.predict(x_test))
print(model2_ROC)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
0.6803245192248515


In [14]:
model2.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_4 (Dense)              (None, 100)               2900      
_________________________________________________________________
activation_4 (Activation)    (None, 100)               0         
_________________________________________________________________
dropout_3 (Dropout)          (None, 100)               0         
_________________________________________________________________
dense_5 (Dense)              (None, 100)               10100     
_________________________________________________________________
activation_5 (Activation)    (None, 100)               0         
_________________________________________________________________
dropout_4 (Dropout)          (None, 100)               0         
_________________________________________________________________
dense_6 (Dense)              (None, 1)                 101       
__________

### Model 3: Adding 2 Hidden Layers

In [15]:
model3 = Sequential()
model3.add(Dense(50, input_dim=x.shape[1], kernel_initializer='uniform'))  # x.shape[1] == 28 here
model3.add(Activation('sigmoid'))
model3.add(Dropout(0.10))
model3.add(Dense(50, kernel_initializer='uniform'))
model3.add(Activation('sigmoid'))
model3.add(Dropout(0.10))
model3.add(Dense(50, kernel_initializer='uniform'))
model3.add(Activation('sigmoid'))
model3.add(Dropout(0.10))
model3.add(Dense(50, kernel_initializer='uniform'))
model3.add(Activation('sigmoid'))
model3.add(Dropout(0.10))
model3.add(Dense(1, kernel_initializer='uniform')) 
model3.add(Activation('sigmoid'))

In [16]:
sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
model3.compile(loss='binary_crossentropy', metrics=['accuracy'], optimizer=sgd)

In [17]:
model3.fit(x, y, epochs=5, batch_size=1000)
model3_ROC = roc_auc_score(y_test,model3.predict(x_test))
print(model3_ROC)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
0.4939073671115921


In [18]:
model3.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_7 (Dense)              (None, 50)                1450      
_________________________________________________________________
activation_7 (Activation)    (None, 50)                0         
_________________________________________________________________
dropout_5 (Dropout)          (None, 50)                0         
_________________________________________________________________
dense_8 (Dense)              (None, 50)                2550      
_________________________________________________________________
activation_8 (Activation)    (None, 50)                0         
_________________________________________________________________
dropout_6 (Dropout)          (None, 50)                0         
_________________________________________________________________
dense_9 (Dense)              (None, 50)                2550      
__________

### Model 4: More Neurons, One Fewer Layer

In [19]:
model4 = Sequential()
model4.add(Dense(250, input_dim=x.shape[1], kernel_initializer='uniform'))  # x.shape[1] == 28 here
model4.add(Activation('sigmoid'))
model4.add(Dropout(0.10))
model4.add(Dense(1, kernel_initializer='uniform')) 
model4.add(Activation('sigmoid'))

In [20]:
sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
model4.compile(loss='binary_crossentropy', metrics=['accuracy'], optimizer=sgd)

In [21]:
model4.fit(x, y, epochs=5, batch_size=1000)
model4_ROC = roc_auc_score(y_test,model4.predict(x_test))
print(model4_ROC)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
0.7251389819070889


In [22]:
model4.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_12 (Dense)             (None, 250)               7250      
_________________________________________________________________
activation_12 (Activation)   (None, 250)               0         
_________________________________________________________________
dropout_9 (Dropout)          (None, 250)               0         
_________________________________________________________________
dense_13 (Dense)             (None, 1)                 251       
_________________________________________________________________
activation_13 (Activation)   (None, 1)                 0         
Total params: 7,501
Trainable params: 7,501
Non-trainable params: 0
_________________________________________________________________


## Activation Functions

### Model 2a: same as Model 2 with `activation = 'relu'`

In [23]:
model2a = Sequential()
model2a.add(Dense(100, input_dim=x.shape[1], kernel_initializer='uniform'))  # x.shape[1] == 28 here
model2a.add(Activation('relu'))
model2a.add(Dropout(0.10))
model2a.add(Dense(100, kernel_initializer='uniform'))
model2a.add(Activation('relu'))
model2a.add(Dropout(0.10))
model2a.add(Dense(1, kernel_initializer='uniform')) 
model2a.add(Activation('sigmoid'))

In [24]:
sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
model2a.compile(loss='binary_crossentropy', metrics=['accuracy'], optimizer=sgd)

In [25]:
model2a.fit(x, y, epochs=5, batch_size=1000)
model2a_ROC = roc_auc_score(y_test,model2a.predict(x_test))
print(model2a_ROC)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
0.8059848388712745


### Model 3a: same as Model 3 with `activation = 'tanh'`

In [26]:
model3a = Sequential()
model3a.add(Dense(50, input_dim=x.shape[1], kernel_initializer='uniform'))  # x.shape[1] == 28 here
model3a.add(Activation('tanh'))
model3a.add(Dropout(0.10))
model3a.add(Dense(50, kernel_initializer='uniform'))
model3a.add(Activation('tanh'))
model3a.add(Dropout(0.10))
model3a.add(Dense(50, kernel_initializer='uniform'))
model3a.add(Activation('tanh'))
model3a.add(Dropout(0.10))
model3a.add(Dense(50, kernel_initializer='uniform'))
model3a.add(Activation('tanh'))
model3a.add(Dropout(0.10))
model3a.add(Dense(1, kernel_initializer='uniform')) 
model3a.add(Activation('sigmoid'))

In [27]:
sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
model3a.compile(loss='binary_crossentropy', metrics=['accuracy'], optimizer=sgd)

In [28]:
model3a.fit(x, y, epochs=5, batch_size=1000)
model3a_ROC = roc_auc_score(y_test,model3a.predict(x_test))
print(model3a_ROC)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
0.759545479417074
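
Model 3 and Model 3a differ only in the hidden-layer activation, yet the first scored about 0.49 and the second about 0.76. One commonly cited factor, which this notebook does not verify, is that the sigmoid derivative never exceeds 0.25, so a gradient passing back through several sigmoid layers can shrink quickly. The sketch below compares the best-case derivative of each activation and the resulting product over 4 hidden layers, ignoring the weight matrices. All function names are illustrative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def d_sigmoid(z):
    s = sigmoid(z)
    return s * (1.0 - s)          # peaks at z = 0 with value 0.25

def d_tanh(z):
    return 1.0 - math.tanh(z) ** 2  # peaks at z = 0 with value 1

def d_relu(z):
    return 1.0 if z > 0 else 0.0    # 1 for every positive input

hidden_layers = 4
sigmoid_bound = d_sigmoid(0.0) ** hidden_layers
tanh_bound = d_tanh(0.0) ** hidden_layers
relu_bound = d_relu(1.0) ** hidden_layers
print(sigmoid_bound, tanh_bound, relu_bound)
```

Even in the best case the sigmoid stack scales the gradient by 0.25 per hidden layer, whereas `tanh` and `relu` can pass it through unchanged. With only 5 epochs this may be enough to keep the deeper sigmoid model from learning.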


### Model 4a: same as Model 4 with `activation = 'relu'`

In [29]:
model4a = Sequential()
model4a.add(Dense(250, input_dim=x.shape[1], kernel_initializer='uniform'))  # x.shape[1] == 28 here
model4a.add(Activation('relu'))
model4a.add(Dropout(0.10))
model4a.add(Dense(1, kernel_initializer='uniform')) 
model4a.add(Activation('sigmoid'))

In [30]:
sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
model4a.compile(loss='binary_crossentropy', metrics=['accuracy'], optimizer=sgd)

In [31]:
model4a.fit(x, y, epochs=5, batch_size=1000)
model4a_ROC = roc_auc_score(y_test,model4a.predict(x_test))
print(model4a_ROC)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
0.7903243844069884


## Batch Size

### Model 2b: same as Model 2a with `batch_size = 100`

In [32]:
model2b = Sequential()
model2b.add(Dense(100, input_dim=x.shape[1], kernel_initializer='uniform'))  # x.shape[1] == 28 here
model2b.add(Activation('relu'))
model2b.add(Dropout(0.10))
model2b.add(Dense(100, kernel_initializer='uniform'))
model2b.add(Activation('relu'))
model2b.add(Dropout(0.10))
model2b.add(Dense(1, kernel_initializer='uniform')) 
model2b.add(Activation('sigmoid'))

In [33]:
sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
model2b.compile(loss='binary_crossentropy', metrics=['accuracy'], optimizer=sgd)

In [34]:
model2b.fit(x, y, epochs=5, batch_size=100)
model2b_ROC = roc_auc_score(y_test,model2b.predict(x_test))
print(model2b_ROC)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
0.8039806677602047


### Model 2c: same as Model 2a with `batch_size = 100000`

In [35]:
model2c = Sequential()
model2c.add(Dense(100, input_dim=x.shape[1], kernel_initializer='uniform'))  # x.shape[1] == 28 here
model2c.add(Activation('relu'))
model2c.add(Dropout(0.10))
model2c.add(Dense(100, kernel_initializer='uniform'))
model2c.add(Activation('relu'))
model2c.add(Dropout(0.10))
model2c.add(Dense(1, kernel_initializer='uniform')) 
model2c.add(Activation('sigmoid'))

In [36]:
sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
model2c.compile(loss='binary_crossentropy', metrics=['accuracy'], optimizer=sgd)

In [37]:
model2c.fit(x, y, epochs=5, batch_size=100000)
model2c_ROC = roc_auc_score(y_test,model2c.predict(x_test))
print(model2c_ROC)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
0.5854569689943897
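
Because the number of epochs is fixed at 5, the batch size also determines how many weight updates each model receives. The sketch below counts them for the three batch sizes tried, assuming the 1,050,000 training rows loaded earlier and one update per mini-batch. `n_updates` is an illustrative helper.

```python
import math

def n_updates(n_rows, batch_size, epochs):
    """Total gradient updates: one per mini-batch, last batch may be partial."""
    return math.ceil(n_rows / batch_size) * epochs

updates = {b: n_updates(1_050_000, b, 5) for b in (100, 1000, 100000)}
for batch_size, count in updates.items():
    print(batch_size, count)
```

With `batch_size=100000` the model gets only a few dozen updates in total, which is a likely reason for its low score. With `batch_size=100` it gets ten times as many updates as the reference run, which is consistent with the much longer training time.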


## Kernel Initializer

### Model 2d: same as Model 2a with `kernel_initializer = 'normal'` in the second hidden and output layers

In [38]:
model2d = Sequential()
model2d.add(Dense(100, input_dim=x.shape[1], kernel_initializer='uniform'))  # x.shape[1] == 28 here; first layer keeps 'uniform'
model2d.add(Activation('relu'))
model2d.add(Dropout(0.10))
model2d.add(Dense(100, kernel_initializer='normal'))
model2d.add(Activation('relu'))
model2d.add(Dropout(0.10))
model2d.add(Dense(1, kernel_initializer='normal')) 
model2d.add(Activation('sigmoid'))

In [39]:
sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
model2d.compile(loss='binary_crossentropy', metrics=['accuracy'], optimizer=sgd)

In [40]:
model2d.fit(x, y, epochs=5, batch_size=1000)
model2d_ROC = roc_auc_score(y_test,model2d.predict(x_test))
print(model2d_ROC)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
0.8081034868200654


### Model 2e: same as Model 2a with `kernel_initializer = 'zeros'` in the second hidden and output layers

In [41]:
model2e = Sequential()
model2e.add(Dense(100, input_dim=x.shape[1], kernel_initializer='uniform'))  # x.shape[1] == 28 here; first layer keeps 'uniform'
model2e.add(Activation('relu'))
model2e.add(Dropout(0.10))
model2e.add(Dense(100, kernel_initializer='zeros'))
model2e.add(Activation('relu'))
model2e.add(Dropout(0.10))
model2e.add(Dense(1, kernel_initializer='zeros')) 
model2e.add(Activation('sigmoid'))

In [42]:
sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
model2e.compile(loss='binary_crossentropy', metrics=['accuracy'], optimizer=sgd)

In [43]:
model2e.fit(x, y, epochs=5, batch_size=1000)
model2e_ROC = roc_auc_score(y_test,model2e.predict(x_test))
print(model2e_ROC)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
0.5
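
A score of exactly 0.5 suggests the model outputs the same value for every event. The following toy forward pass shows why zero kernels can cause that. The second hidden layer outputs only zeros, so the prediction is `sigmoid(0) = 0.5` whatever the input, and the gradient of the loss with respect to each output weight, `(p - y) * h2_j`, is zero as well. This is a sketch under assumptions: tiny layer sizes, made-up first-layer weights, and zero biases (the Keras `Dense` default). It is a plausible explanation for the result above rather than a verified one.

```python
import math

def relu(v):
    return [max(0.0, a) for a in v]

def dense(x, weights, biases):
    """weights[j] holds the incoming weights of unit j."""
    return [sum(w * xi for w, xi in zip(w_j, x)) + b_j
            for w_j, b_j in zip(weights, biases)]

def forward(x, W1, b1, W2, b2, w_out, b_out):
    h1 = relu(dense(x, W1, b1))
    h2 = relu(dense(h1, W2, b2))
    z = sum(w * h for w, h in zip(w_out, h2)) + b_out
    return 1.0 / (1.0 + math.exp(-z)), h2

# Toy sizes (assumption): 3 inputs, 2 units per hidden layer.
W1 = [[0.03, -0.02, 0.01], [-0.04, 0.05, 0.02]]  # non-zero first layer
b1 = [0.0, 0.0]
W2 = [[0.0, 0.0], [0.0, 0.0]]                    # 'zeros' kernel
b2 = [0.0, 0.0]
w_out = [0.0, 0.0]                               # 'zeros' kernel
b_out = 0.0

p_a, h2_a = forward([1.0, 2.0, 3.0], W1, b1, W2, b2, w_out, b_out)
p_b, h2_b = forward([-5.0, 0.5, 9.0], W1, b1, W2, b2, w_out, b_out)

# Binary cross-entropy gradient w.r.t. each output weight for a label y = 1.
grad_w_out = [(p_a - 1.0) * h for h in h2_a]
print(p_a, p_b, grad_w_out)
```

Since the output weights are zero too, no gradient flows back into the second hidden layer. In this sketch only the output bias can change during training, and a bias alone cannot rank one event above another.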


### Model 2f: same as Model 2a with `kernel_initializer = 'ones'` in the second hidden and output layers

In [44]:
model2f = Sequential()
model2f.add(Dense(100, input_dim=x.shape[1], kernel_initializer='uniform'))  # x.shape[1] == 28 here; first layer keeps 'uniform'
model2f.add(Activation('relu'))
model2f.add(Dropout(0.10))
model2f.add(Dense(100, kernel_initializer='ones'))
model2f.add(Activation('relu'))
model2f.add(Dropout(0.10))
model2f.add(Dense(1, kernel_initializer='ones')) 
model2f.add(Activation('sigmoid'))

In [45]:
sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
model2f.compile(loss='binary_crossentropy', metrics=['accuracy'], optimizer=sgd)

In [46]:
model2f.fit(x, y, epochs=5, batch_size=1000)
model2f_ROC = roc_auc_score(y_test,model2f.predict(x_test))
print(model2f_ROC)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
0.5


## Optimizer

### Model 2g: same as Model 2a with `optimizer = adam`

In [47]:
model2g = Sequential()
model2g.add(Dense(100, input_dim=x.shape[1], kernel_initializer='uniform'))  # x.shape[1] == 28 here
model2g.add(Activation('relu'))
model2g.add(Dropout(0.10))
model2g.add(Dense(100, kernel_initializer='uniform'))
model2g.add(Activation('relu'))
model2g.add(Dropout(0.10))
model2g.add(Dense(1, kernel_initializer='uniform')) 
model2g.add(Activation('sigmoid'))

In [48]:
adam = Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.0, amsgrad=False)
model2g.compile(loss='binary_crossentropy', metrics=['accuracy'], optimizer=adam)

In [49]:
model2g.fit(x, y, epochs=5, batch_size=1000)
model2g_ROC = roc_auc_score(y_test,model2g.predict(x_test))
print(model2g_ROC)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
0.8046211061926368
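
The learning rates of the two optimizers (0.1 for SGD, 0.001 for Adam) are not directly comparable, because Adam rescales each gradient by a running estimate of its magnitude. The sketch below applies one textbook Adam update and one plain SGD update (without the momentum and Nesterov terms used in this notebook) to a single weight. The `eps=1e-7` value is an assumption about what `epsilon=None` resolves to, and both functions are illustrative, not Keras code.

```python
import math

def adam_step(w, grad, m, v, t, lr=0.001, beta_1=0.9, beta_2=0.999, eps=1e-7):
    """One Adam update for a single scalar weight at step t (t starts at 1)."""
    m = beta_1 * m + (1 - beta_1) * grad          # running mean of gradients
    v = beta_2 * v + (1 - beta_2) * grad * grad   # running mean of squared gradients
    m_hat = m / (1 - beta_1 ** t)                 # bias correction
    v_hat = v / (1 - beta_2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

def sgd_step(w, grad, lr=0.1):
    """One plain SGD update (no momentum)."""
    return w - lr * grad

w_adam, m, v = adam_step(1.0, grad=20.0, m=0.0, v=0.0, t=1)
w_sgd = sgd_step(1.0, grad=20.0)
print(w_adam, w_sgd)
```

On the first step Adam moves the weight by roughly `lr` regardless of how large the gradient is, while the SGD step scales with the gradient.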


### Model 2h: same as Model 2a with `optimizer = rmsprop`

In [50]:
model2h = Sequential()
model2h.add(Dense(100, input_dim=x.shape[1], kernel_initializer='uniform'))  # x.shape[1] == 28 here
model2h.add(Activation('relu'))
model2h.add(Dropout(0.10))
model2h.add(Dense(100, kernel_initializer='uniform'))
model2h.add(Activation('relu'))
model2h.add(Dropout(0.10))
model2h.add(Dense(1, kernel_initializer='uniform')) 
model2h.add(Activation('sigmoid'))

In [51]:
rmsprop = RMSprop(lr=0.001, rho=0.9, epsilon=None, decay=0.0001)
model2h.compile(loss='binary_crossentropy', metrics=['accuracy'], optimizer=rmsprop)

In [52]:
model2h.fit(x, y, epochs=5, batch_size=1000)
model2h_ROC = roc_auc_score(y_test,model2h.predict(x_test))
print(model2h_ROC)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
0.7946117282081183


### Model 2i: same as Model 2a with `optimizer = adagrad`

In [53]:
model2i = Sequential()
model2i.add(Dense(100, input_dim=x.shape[1], kernel_initializer='uniform'))  # x.shape[1] == 28 here
model2i.add(Activation('relu'))
model2i.add(Dropout(0.10))
model2i.add(Dense(100, kernel_initializer='uniform'))
model2i.add(Activation('relu'))
model2i.add(Dropout(0.10))
model2i.add(Dense(1, kernel_initializer='uniform')) 
model2i.add(Activation('sigmoid'))

In [54]:
adagrad = Adagrad(lr=0.01, epsilon=None, decay=0.0)
model2i.compile(loss='binary_crossentropy', metrics=['accuracy'], optimizer=adagrad)

In [55]:
model2i.fit(x, y, epochs=5, batch_size=1000)
model2i_ROC = roc_auc_score(y_test,model2i.predict(x_test))
print(model2i_ROC)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
0.7777813539030459


# Results

Let's summarize the results of all the experiments above in one dataframe that contains the following parameters:
1. number of layers
2. number of neurons
3. activation functions
4. batch size
5. kernel initializer
6. optimizer
7. ROC AUC on the test set

In [61]:
r = {'Model' : ['base', 'model2', 'model3', 'model4', 'model2a', 'model3a', 'model4a', 'model2b', 'model2c', 'model2d', 'model2e', 'model2f', 'model2g', 'model2h', 'model2i'],
     'Layers': [3, 3, 5, 2, 3, 5, 2, 3, 3, 3, 3, 3, 3, 3, 3],
     'Neurons' : [50, 100, 50, 250, 100, 50, 250, 100, 100, 100, 100, 100, 100, 100, 100],
     'Activation': ['sigmoid', 'sigmoid', 'sigmoid', 'sigmoid', 'relu', 'tanh', 'relu', 'relu', 'relu', 'relu', 'relu', 'relu', 'relu', 'relu', 'relu'],
     'Batch' : [1000, 1000, 1000, 1000, 1000, 1000, 1000, 100, 100000, 1000, 1000, 1000, 1000, 1000, 1000],
     'Kernel': ['uniform', 'uniform', 'uniform', 'uniform', 'uniform', 'uniform', 'uniform', 'uniform', 'uniform', 'normal', 'zeros', 'ones', 'uniform', 'uniform', 'uniform'],
     'Optimizer': ['SGD', 'SGD', 'SGD', 'SGD', 'SGD', 'SGD', 'SGD', 'SGD', 'SGD', 'SGD', 'SGD', 'SGD', 'Adam', 'RMSprop', 'Adagrad'],
     'ROC' : [model_ROC, model2_ROC, model3_ROC, model4_ROC, model2a_ROC, model3a_ROC, model4a_ROC, model2b_ROC, model2c_ROC, model2d_ROC, model2e_ROC, model2f_ROC, model2g_ROC, model2h_ROC, model2i_ROC]
    }

results = pd.DataFrame(data = r)

results

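| | Model | Layers | Neurons | Activation | Batch | Kernel | Optimizer | ROC |
|---|---|---|---|---|---|---|---|---|
| 0 | base | 3 | 50 | sigmoid | 1000 | uniform | SGD | 0.669795 |
| 1 | model2 | 3 | 100 | sigmoid | 1000 | uniform | SGD | 0.680325 |
| 2 | model3 | 5 | 50 | sigmoid | 1000 | uniform | SGD | 0.493907 |
| 3 | model4 | 2 | 250 | sigmoid | 1000 | uniform | SGD | 0.725139 |
| 4 | model2a | 3 | 100 | relu | 1000 | uniform | SGD | 0.805985 |
| 5 | model3a | 5 | 50 | tanh | 1000 | uniform | SGD | 0.759545 |
| 6 | model4a | 2 | 250 | relu | 1000 | uniform | SGD | 0.790324 |
| 7 | model2b | 3 | 100 | relu | 100 | uniform | SGD | 0.803981 |
| 8 | model2c | 3 | 100 | relu | 100000 | uniform | SGD | 0.585457 |
| 9 | model2d | 3 | 100 | relu | 1000 | normal | SGD | 0.808103 |
| 10 | model2e | 3 | 100 | relu | 1000 | zeros | SGD | 0.500000 |
| 11 | model2f | 3 | 100 | relu | 1000 | ones | SGD | 0.500000 |
| 12 | model2g | 3 | 100 | relu | 1000 | uniform | Adam | 0.804621 |
| 13 | model2h | 3 | 100 | relu | 1000 | uniform | RMSprop | 0.794612 |
| 14 | model2i | 3 | 100 | relu | 1000 | uniform | Adagrad | 0.777781 |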
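
To make the comparison easier to scan, the scores can be ranked and expressed as a difference from `model2a`, the configuration most later experiments start from. This is a plain-Python sketch: the values are copied from the outputs printed earlier in this notebook, rounded to six decimals, and the names `roc_auc`, `ranked` and `reference` are illustrative.

```python
# Test-set ROC AUC per model, copied from the printed outputs above.
roc_auc = {
    "base": 0.669795, "model2": 0.680325, "model3": 0.493907,
    "model4": 0.725139, "model2a": 0.805985, "model3a": 0.759545,
    "model4a": 0.790324, "model2b": 0.803981, "model2c": 0.585457,
    "model2d": 0.808103, "model2e": 0.5, "model2f": 0.5,
    "model2g": 0.804621, "model2h": 0.794612, "model2i": 0.777781,
}

# Sort from best to worst score.
ranked = sorted(roc_auc.items(), key=lambda kv: kv[1], reverse=True)

reference = roc_auc["model2a"]
for name, score in ranked:
    print(f"{name:8s} {score:.6f} {score - reference:+.6f}")
```

Several of the top entries differ by only a few thousandths. Since each configuration was trained once, differences of that size should be read with caution.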


# Conclusion

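The best model was `model2d`, with a test-set __ROC AUC = 0.808103__. It has the following parameters:

- 3 dense layers (2 hidden layers plus the output layer)
- 100 neurons per hidden layer
- `Activation('relu')` in the hidden layers
- `batch_size=1000`
- `kernel_initializer='normal'` (second hidden layer and output layer; the first hidden layer uses `'uniform'`)
- `optimizer=sgd`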

## Key takeaways

For this particular binary classification problem with 28 numerical features,

* Adding neurons increased the ROC AUC modestly (0.670 for `base` to 0.680 for `model2`), while going from 2 to 4 hidden layers with sigmoid activations was detrimental (`model3`, 0.494, no better than chance)
* Replacing sigmoid with the Rectified Linear Unit (ReLU) in the hidden layers, while keeping sigmoid on the output layer, had the most material effect, lifting the ROC AUC from 0.680 (`model2`) to 0.806 (`model2a`)
* Increasing the batch size from 1000 to 100000 dropped the ROC AUC from 0.806 to 0.585, while reducing it to 100 left the ROC AUC about the same (0.804) but slowed down training tremendously
* The *normal* kernel initializer performed slightly better than the *uniform* one (0.808 vs. 0.806), whereas the *zeros* and *ones* initializers both produced a ROC AUC of exactly 0.5
* SGD (0.806) outperformed the other optimizers, with Adam slightly behind at 0.805, followed by RMSprop (0.795) and Adagrad (0.778)
* Further tweaking of the parameters may yield marginal improvements to the ROC AUC. A major gain would likely require rethinking the architecture

# References

1. UC Irvine Machine Learning Repository, https://archive.ics.uci.edu/ml/datasets/HIGGS

2. Set up Anaconda, Jupyter Notebook, Tensorflow for Deep Learning, https://threenine.co.uk/set-up-anaconda-jupyter-notebook-tensorflow-for-deep-learning/

3. Keras: The Python Deep Learning library, https://keras.io

4. Aurélien Géron, *Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow*, 2nd Edition, O'Reilly Media