## Case Study 6 (Unit 12): Evaluate Neural Nets using Keras
### Team: Hieu Nguyen, Nithya Devadoss, Ramesh Simhambhatla, Ramya Mandava
### Date: 11/18/2018

## Abstract

In artificial intelligence, artificial neural networks (ANNs) are computing systems loosely inspired by the biological neural networks that make up animal brains. A neural network is not itself an algorithm. It is a framework within which many different machine learning algorithms work together to process complex data inputs. Such systems "learn" to perform tasks from examples, generally without being programmed with task-specific rules. In this case study, we use Keras, a Python deep learning API designed for fast experimentation, to train and test neural networks on a subset of the Higgs boson data set from the UCI Machine Learning Repository. We use ROC AUC (the area under the ROC curve, an aggregate measure of classification performance across all possible thresholds) as the metric for measuring and comparing our models. We experiment with different numbers of neurons and hidden layers, activation functions, kernel initializers, optimizers, and other hyperparameters such as epochs, batch sizes and learning rates. Finally, we compare the performance of the starter (base) model with that of the best model and summarize our findings.
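ROC AUC can also be read pairwise. It is the probability that a randomly chosen signal event receives a higher score than a randomly chosen background event, with ties counting one half. The sketch below is a small, quadratic-time illustration of that definition in plain Python. The function name `pairwise_auc` is hypothetical. On real data you would use `sklearn.metrics.roc_auc_score`, as this notebook does.

```python
def pairwise_auc(labels, scores):
    """ROC AUC as P(score of a positive > score of a negative), ties count 0.5.

    O(n_pos * n_neg): fine for illustration, far too slow for 60,000 rows.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# One of the four positive/negative pairs is mis-ordered (0.35 < 0.4).
print(pairwise_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
# A constant predictor ranks nothing, which gives exactly 0.5.
print(pairwise_auc([0, 0, 1, 1], [0.7, 0.7, 0.7, 0.7]))  # 0.5
```

A constant predictor scores exactly 0.5. Keep this in mind when reading the optimizer results in part 5.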

## Introduction

The Higgs boson is an elementary particle in the Standard Model of particle physics, produced by the quantum excitation of the Higgs field, one of the fields in particle physics theory [1].

The data set is acquired from UCI Machine Learning repository: https://archive.ics.uci.edu/ml/datasets/HIGGS

#### Data Set Information:

The data has been produced using Monte Carlo simulations. The first 21 features (columns 2-22) are kinematic properties measured by the particle detectors in the accelerator. The last seven features are functions of the first 21 features; these are high-level features derived by physicists to help discriminate between the two classes. There is an interest in using deep learning methods to obviate the need for physicists to manually develop such features. Benchmark results using Bayesian Decision Trees from a standard physics package and 5-layer neural networks are presented in the original paper. The last 500,000 examples are used as a test set.


#### Attribute Information:

The first column is the class label (1 for signal, 0 for background), followed by the 28 features (21 low-level features then 7 high-level features): lepton pT, lepton eta, lepton phi, missing energy magnitude, missing energy phi, jet 1 pt, jet 1 eta, jet 1 phi, jet 1 b-tag, jet 2 pt, jet 2 eta, jet 2 phi, jet 2 b-tag, jet 3 pt, jet 3 eta, jet 3 phi, jet 3 b-tag, jet 4 pt, jet 4 eta, jet 4 phi, jet 4 b-tag, m_jj, m_jjj, m_lv, m_jlv, m_bb, m_wbb, m_wwbb. For more detailed information about each feature see the original paper.
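Because of this layout (label first, then 21 low-level and 7 high-level features), the three groups can be separated by column position. The following is a minimal NumPy sketch. It assumes the rows are already loaded into a 2-D array with 29 columns. The helper `split_higgs_columns` is hypothetical, and the array below is a stand-in rather than real data.

```python
import numpy as np

N_LOW, N_HIGH = 21, 7  # low-level (detector) and high-level (derived) feature counts

def split_higgs_columns(raw):
    """Split an (n_rows, 29) array into label, low-level and high-level features."""
    raw = np.asarray(raw)
    if raw.ndim != 2 or raw.shape[1] != 1 + N_LOW + N_HIGH:
        raise ValueError("expected a 2-D array with 29 columns")
    label = raw[:, 0]                 # 1 = signal, 0 = background
    low = raw[:, 1:1 + N_LOW]         # lepton pT ... jet 4 b-tag
    high = raw[:, 1 + N_LOW:]         # m_jj ... m_wwbb
    return label, low, high

# Stand-in array (values are just running positions), not real HIGGS events.
demo = np.arange(5 * 29, dtype=float).reshape(5, 29)
label, low, high = split_higgs_columns(demo)
print(label.shape, low.shape, high.shape)  # (5,) (5, 21) (5, 7)
```

Splitting the columns this way makes it straightforward to train on the low-level features alone, which is the question the data set description raises about replacing hand-built features.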

In [1]:
import pandas as pd
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.optimizers import SGD, Adam, Adagrad, RMSprop, Adamax
from sklearn.metrics import roc_auc_score

Using TensorFlow backend.


#### We use the first 1,100,000 observations to train the models and the next 60,000 observations to test them

In [2]:
N = 1100000  # number of training rows; change this line to adjust
data = pd.read_csv("HIGGS.csv", nrows=N, header=None)
# test set: the 60,000 rows immediately after the training rows
test_data = pd.read_csv("HIGGS.csv", nrows=60000, header=None, skiprows=N)

In [3]:
def size_it(df):
    """Print the in-memory size of a DataFrame (index included) in megabytes."""
    size = df.memory_usage().sum() / 2**20  # bug fix: measure df, not the global data
    print("%.2f Megabytes" % size)

In [4]:
size_it(data)
data.shape

251.77 Megabytes


(1100000, 29)

In [5]:
size_it(test_data)
test_data.shape

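251.77 Megabytes

(This figure is the size of `data`, not `test_data`. As first written, `size_it` summed `data.memory_usage()` and ignored its `df` argument.)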


(60000, 29)
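The reported sizes can be checked by hand. Every value is an 8-byte `float64`, so a frame's footprint is roughly rows × (columns × 8 + index bytes per row). The sketch below assumes an 8-byte integer index per row. The helper `expected_mib` is hypothetical. It reproduces the 251.77 MB reported for the 1,100,000-row training frame and gives the expected figure for the 60,000-row test frame.

```python
def expected_mib(n_rows, n_cols, bytes_per_value=8, index_bytes_per_row=8):
    """Approximate in-memory size in MiB (2**20 bytes), the unit size_it prints."""
    total = n_rows * (n_cols * bytes_per_value + index_bytes_per_row)
    return total / 2**20

print("%.2f" % expected_mib(1_100_000, 29))  # 251.77  training frame
print("%.2f" % expected_mib(60_000, 29))     # 13.73   test frame
```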

In [6]:
data.columns=['label','lepton pT', 'lepton eta', 'lepton phi', 'missing energy magnitude', 'missing energy phi', 'jet 1 pt',
             'jet 1 eta', 'jet 1 phi', 'jet 1 b-tag', 'jet 2 pt', 'jet 2 eta', 'jet 2 phi', 'jet 2 b-tag', 'jet 3 pt',
             'jet 3 eta', 'jet 3 phi', 'jet 3 b-tag', 'jet 4 pt', 'jet 4 eta', 'jet 4 phi', 'jet 4 b-tag', 'm_jj',
             'm_jjj', 'm_lv','m_jlv', 'm_bb', 'm_wbb', 'm_wwbb']

In [7]:
test_data.columns = data.columns  # same column names as the training set

In [8]:
data.tail()

| | label | lepton pT | lepton eta | lepton phi | missing energy magnitude | missing energy phi | jet 1 pt | jet 1 eta | jet 1 phi | jet 1 b-tag | ... | jet 4 eta | jet 4 phi | jet 4 b-tag | m_jj | m_jjj | m_lv | m_jlv | m_bb | m_wbb | m_wwbb |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1099995 | 0.0 | 1.147101 | -0.290297 | -0.50239 | 0.787117 | -0.115922 | 1.501992 | -0.961539 | -0.940719 | 2.173076 | ... | 0.785724 | 1.255605 | 0.0 | 0.971365 | 0.957792 | 0.987488 | 0.941613 | 1.499944 | 1.521772 | 1.303628 |
| 1099996 | 1.0 | 1.07829 | 0.090525 | -1.113295 | 0.8289 | 0.15326 | 1.389315 | -0.565447 | 0.124895 | 0.0 | ... | 0.54254 | -1.517396 | 0.0 | 0.352488 | 0.508641 | 1.191143 | 1.010111 | 0.888589 | 0.931572 | 0.859181 |
| 1099997 | 1.0 | 0.91596 | 0.174286 | -0.096232 | 1.543762 | 0.596144 | 0.664335 | -0.476326 | -1.245072 | 1.086538 | ... | -0.108728 | 0.350543 | 3.101961 | 0.462586 | 0.593328 | 0.988834 | 1.154577 | 0.720061 | 0.737488 | 0.683846 |
| 1099998 | 1.0 | 0.585263 | -0.88247 | -1.682583 | 0.990881 | 0.796417 | 1.032413 | -0.100039 | -0.312609 | 0.0 | ... | -0.585103 | 0.029249 | 0.0 | 0.800382 | 0.917762 | 0.982345 | 0.820718 | 0.939589 | 0.953103 | 0.852331 |
| 1099999 | 0.0 | 1.175833 | 0.074941 | 1.691634 | 0.293551 | -0.324434 | 0.932928 | 2.726078 | 0.404855 | 0.0 | ... | 1.120519 | 1.539165 | 0.0 | 0.961684 | 1.63029 | 0.993905 | 0.983109 | 1.452523 | 1.619471 | 1.329248 |


In [9]:
test_data.tail()

| | label | lepton pT | lepton eta | lepton phi | missing energy magnitude | missing energy phi | jet 1 pt | jet 1 eta | jet 1 phi | jet 1 b-tag | ... | jet 4 eta | jet 4 phi | jet 4 b-tag | m_jj | m_jjj | m_lv | m_jlv | m_bb | m_wbb | m_wwbb |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 59995 | 0.0 | 0.370959 | 1.178448 | -0.839749 | 0.77205 | 1.058658 | 0.807243 | 0.743637 | -0.260498 | 2.173076 | ... | 0.321009 | 0.735097 | 3.101961 | 0.871797 | 0.847451 | 0.989446 | 0.859421 | 0.736282 | 0.769379 | 0.728622 |
| 59996 | 0.0 | 0.971595 | -1.077263 | 1.713273 | 0.31497 | 0.231261 | 0.641067 | -0.844692 | 0.510741 | 0.0 | ... | -0.909071 | 1.276136 | 3.101961 | 0.745242 | 0.717253 | 0.990235 | 0.53295 | 0.449628 | 0.477112 | 0.507585 |
| 59997 | 1.0 | 1.003622 | -0.219197 | 1.164516 | 0.555159 | 0.221953 | 1.710307 | 0.027701 | 1.581794 | 0.0 | ... | -0.401049 | -0.963039 | 3.101961 | 1.060794 | 0.926048 | 0.988824 | 1.239017 | 0.924933 | 1.02152 | 0.939992 |
| 59998 | 0.0 | 1.891765 | 0.0428 | -0.227179 | 0.876925 | 0.322577 | 0.782142 | 2.035887 | -1.363153 | 0.0 | ... | 1.459478 | 1.006449 | 0.0 | 0.758624 | 1.057242 | 0.986712 | 1.217548 | 0.833781 | 0.873545 | 0.999432 |
| 59999 | 1.0 | 0.989347 | -1.482435 | -0.797579 | 0.924406 | 0.210223 | 0.481304 | -1.085317 | 1.071214 | 2.173076 | ... | 0.146949 | 1.297223 | 0.0 | 0.896747 | 0.723381 | 1.04509 | 0.95503 | 0.740068 | 0.736319 | 0.675808 |


In [10]:
data.iloc[:,0:10].describe()

| | label | lepton pT | lepton eta | lepton phi | missing energy magnitude | missing energy phi | jet 1 pt | jet 1 eta | jet 1 phi | jet 1 b-tag |
|---|---|---|---|---|---|---|---|---|---|---|
| count | 1100000.0 | 1100000.0 | 1100000.0 | 1100000.0 | 1100000.0 | 1100000.0 | 1100000.0 | 1100000.0 | 1100000.0 | 1100000.0 |
| mean | 0.5295173 | 0.9913837 | 0.0009533126 | -0.0007382432 | 0.9981465 | -0.0007666476 | 0.9906847 | -0.0007511484 | 0.0005081288 | 1.000773 |
| std | 0.4991282 | 0.5649399 | 1.008487 | 1.005848 | 0.599157 | 1.006687 | 0.4751833 | 1.010139 | 1.006215 | 1.027788 |
| min | 0.0 | 0.2746966 | -2.434976 | -1.742508 | 0.0006259872 | -1.743944 | 0.1386017 | -2.969725 | -1.741237 | 0.0 |
| 25% | 0.0 | 0.5907533 | -0.7363746 | -0.8719308 | 0.5762637 | -0.8717909 | 0.6788095 | -0.6882352 | -0.8680962 | 0.0 |
| 50% | 1.0 | 0.8535544 | 0.0009198132 | 0.0009714414 | 0.8915848 | -0.001158754 | 0.8942697 | -0.001015666 | 0.0007152493 | 1.086538 |
| 75% | 1.0 | 1.236592 | 0.7391881 | 0.8693294 | 1.293202 | 0.8711392 | 1.17074 | 0.6871941 | 0.8694214 | 2.173076 |
| max | 1.0 | 8.711782 | 2.434868 | 1.743236 | 9.900929 | 1.743257 | 8.38261 | 2.969674 | 1.741454 | 2.173076 |


In [11]:
data.iloc[:,10:20].describe()

| | jet 2 pt | jet 2 eta | jet 2 phi | jet 2 b-tag | jet 3 pt | jet 3 eta | jet 3 phi | jet 3 b-tag | jet 4 pt | jet 4 eta |
|---|---|---|---|---|---|---|---|---|---|---|
| count | 1100000.0 | 1100000.0 | 1100000.0 | 1100000.0 | 1100000.0 | 1100000.0 | 1100000.0 | 1100000.0 | 1100000.0 | 1100000.0 |
| mean | 0.9930826 | 0.0016941 | 0.0002226422 | 1.000497 | 0.992406 | 0.002097878 | 6.820779e-05 | 0.9992596 | 0.9860794 | -0.0007935666 |
| std | 0.5000212 | 1.008805 | 1.006862 | 1.049272 | 0.486773 | 1.008249 | 1.005805 | 1.193635 | 0.5057663 | 1.007891 |
| min | 0.1889811 | -2.91309 | -1.742372 | 0.0 | 0.2636076 | -2.729663 | -1.742069 | 0.0 | 0.3653542 | -2.497265 |
| 25% | 0.6573421 | -0.6925291 | -0.8707339 | 0.0 | 0.6512039 | -0.6970776 | -0.8711343 | 0.0 | 0.617524 | -0.715023 |
| 50% | 0.8906413 | 0.001031646 | 0.000351499 | 0.0 | 0.8977762 | 0.00290364 | -0.0007519117 | 0.0 | 0.8679899 | -0.0004606903 |
| 75% | 1.202001 | 0.6965352 | 0.8715371 | 2.214872 | 1.2225 | 0.7019747 | 0.87084 | 2.548224 | 1.222147 | 0.7141017 |
| max | 11.64708 | 2.91321 | 1.743175 | 2.214872 | 8.864838 | 2.730009 | 1.742884 | 2.548224 | 11.62123 | 2.498009 |


In [12]:
types = data.dtypes.to_dict()  # column name -> dtype

In [13]:
types

{'jet 1 b-tag': dtype('float64'),
 'jet 1 eta': dtype('float64'),
 'jet 1 phi': dtype('float64'),
 'jet 1 pt': dtype('float64'),
 'jet 2 b-tag': dtype('float64'),
 'jet 2 eta': dtype('float64'),
 'jet 2 phi': dtype('float64'),
 'jet 2 pt': dtype('float64'),
 'jet 3 b-tag': dtype('float64'),
 'jet 3 eta': dtype('float64'),
 'jet 3 phi': dtype('float64'),
 'jet 3 pt': dtype('float64'),
 'jet 4 b-tag': dtype('float64'),
 'jet 4 eta': dtype('float64'),
 'jet 4 phi': dtype('float64'),
 'jet 4 pt': dtype('float64'),
 'label': dtype('float64'),
 'lepton eta': dtype('float64'),
 'lepton pT': dtype('float64'),
 'lepton phi': dtype('float64'),
 'm_bb': dtype('float64'),
 'm_jj': dtype('float64'),
 'm_jjj': dtype('float64'),
 'm_jlv': dtype('float64'),
 'm_lv': dtype('float64'),
 'm_wbb': dtype('float64'),
 'm_wwbb': dtype('float64'),
 'missing energy magnitude': dtype('float64'),
 'missing energy phi': dtype('float64')}

#### Split the training and test data into features (x) and labels (y).

In [14]:
y = np.array(data.iloc[:,0])
x = np.array(data.iloc[:,1:])

In [15]:
x_test = np.array(test_data.iloc[:,1:])
y_test = np.array(test_data.iloc[:,0])

## 1. Pick 3 or more different architectures (add/subtract layers+neurons) and run the model + score.

#### 1.1 Architecture 1: 1 hidden layer with 100 nodes (base model)

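We begin with our starter (base) model: 1 hidden layer of 100 neurons with sigmoid activation and 10% dropout, followed by a single sigmoid output neuron. It is trained with SGD (learning rate 0.1, Nesterov momentum 0.9) for 5 epochs with a batch size of 1,000.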
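Counting trainable weights makes "adding layers" concrete. A `Dense` layer that maps `n_in` inputs to `n_out` units has `n_in * n_out` weights plus `n_out` biases, and the `Activation` and `Dropout` layers add none. The plain-Python sketch below applies that rule to the 28-input networks used in this section. The helper `dense_param_count` is hypothetical. Keras's `model.summary()` should report the same totals.

```python
def dense_param_count(n_inputs, hidden_sizes, n_outputs=1):
    """Trainable parameters of a stack of Dense layers (weights + biases).

    Activation and Dropout layers have no trainable parameters, so they are ignored.
    """
    total = 0
    fan_in = n_inputs
    for width in list(hidden_sizes) + [n_outputs]:
        total += fan_in * width + width  # weight matrix + bias vector
        fan_in = width
    return total

for layers in ([100], [100, 100], [100, 100, 100]):
    print(len(layers), "hidden layer(s):", dense_param_count(28, layers))
# 1 hidden layer(s): 3001
# 2 hidden layer(s): 13101
# 3 hidden layer(s): 23201
```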

In [16]:
test_results = {}
arch_name = "sigmoid-1layer-100nodes"
model = Sequential()

#hidden layer 1: 100 nodes
model.add(Dense(100, input_dim=x.shape[1], kernel_initializer='uniform'))
model.add(Activation('sigmoid'))
model.add(Dropout(0.10))

#output layer: 1 node
model.add(Dense(1, kernel_initializer='uniform'))
model.add(Activation('sigmoid'))

#optimizer
sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)

#compile
model.compile(loss='binary_crossentropy', metrics=['accuracy'], optimizer=sgd)
model.fit(x, y, epochs=5, batch_size=1000)

score = roc_auc_score(y_test,model.predict(x_test))

print("")
print("ROC AUC Score for " + arch_name + " is: " + str(score))
test_results[arch_name] = score

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5

ROC AUC Score for sigmoid-1layer-100nodes is: 0.7297741518132652
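The `decay=1e-6` argument passed to `SGD` has very little effect at this scale. In Keras 2.x, the API used here with its `lr=` and `decay=` arguments, SGD applies time-based decay of roughly `lr_t = lr / (1 + decay * iterations)`. Here `iterations` counts mini-batch updates, not epochs. The back-of-the-envelope sketch below assumes that formula and the 1,100,000 training rows used in this notebook. The helper name is hypothetical.

```python
import math

def sgd_lr_after(lr, decay, n_rows, batch_size, epochs):
    """Approximate learning rate at the end of training under Keras 2.x-style
    time-based decay, lr / (1 + decay * iterations), one iteration per mini-batch."""
    updates_per_epoch = math.ceil(n_rows / batch_size)
    iterations = updates_per_epoch * epochs
    return iterations, lr / (1.0 + decay * iterations)

iters, final_lr = sgd_lr_after(lr=0.1, decay=1e-6, n_rows=1_100_000,
                               batch_size=1000, epochs=5)
print(iters, round(final_lr, 5))  # 5500 0.09945
```

The learning rate ends only about 0.5% below its starting value of 0.1, so in practice these runs use an essentially constant learning rate.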


#### 1.2 Architecture 2: 2 hidden layers with 100 nodes each

In [17]:
arch_name = "sigmoid-2layers-100nodes"
model = Sequential()

#hidden layer 1: 100 nodes
model.add(Dense(100, input_dim=x.shape[1], kernel_initializer='uniform'))
model.add(Activation('sigmoid'))
model.add(Dropout(0.10))

#hidden layer 2: 100 nodes
model.add(Dense(100, input_dim=x.shape[1], kernel_initializer='uniform'))
model.add(Activation('sigmoid'))
model.add(Dropout(0.10))

#output layer: 1 node
model.add(Dense(1, kernel_initializer='uniform'))
model.add(Activation('sigmoid'))

#optimizer
sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)

#compile
model.compile(loss='binary_crossentropy', metrics=['accuracy'], optimizer=sgd)
model.fit(x, y, epochs=5, batch_size=1000)

score = roc_auc_score(y_test,model.predict(x_test))

print("")
print("ROC AUC Score for " + arch_name + " is: " + str(score))
test_results[arch_name] = score

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5

ROC AUC Score for sigmoid-2layers-100nodes is: 0.6833962416769679


#### 1.3 Architecture 3: 3 hidden layers with 100 nodes each

In [18]:
arch_name = "sigmoid-3layers-100nodes"
model = Sequential()

#hidden layer 1: 100 nodes
model.add(Dense(100, input_dim=x.shape[1], kernel_initializer='uniform'))
model.add(Activation('sigmoid'))
model.add(Dropout(0.10))

#hidden layer 2: 100 nodes
model.add(Dense(100, input_dim=x.shape[1], kernel_initializer='uniform'))
model.add(Activation('sigmoid'))
model.add(Dropout(0.10))

#hidden layer 3: 100 nodes
model.add(Dense(100, input_dim=x.shape[1], kernel_initializer='uniform'))
model.add(Activation('sigmoid'))
model.add(Dropout(0.10))

#output layer: 1 node
model.add(Dense(1, kernel_initializer='uniform'))
model.add(Activation('sigmoid'))

#optimizer
sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)

#compile
model.compile(loss='binary_crossentropy', metrics=['accuracy'], optimizer=sgd)
model.fit(x, y, epochs=5, batch_size=1000)

score = roc_auc_score(y_test,model.predict(x_test))

print("")
print("ROC AUC Score for " + arch_name + " is: " + str(score))
test_results[arch_name] = score

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5

ROC AUC Score for sigmoid-3layers-100nodes is: 0.5547979177295054


## 2. With those same 3 architectures, run the SAME architecture but with 2 different (from sigmoid) activation functions. Google the Keras documentation for a look at different available activations.
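The three activations compared in this section are simple elementwise functions. The plain-Python sketch below writes them out with their derivatives. One commonly cited reason deeper sigmoid networks train poorly, as the sigmoid results in section 1 suggest, is that the sigmoid's derivative never exceeds 0.25. Backpropagated gradients can therefore shrink by at least that factor per layer. ReLU, by contrast, passes a gradient of exactly 1 for positive inputs. This is offered as a plausible explanation, not something these experiments isolate.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def d_sigmoid(z):
    s = sigmoid(z)
    return s * (1.0 - s)            # peaks at 0.25 when z = 0

def relu(z):
    return max(0.0, z)

def d_relu(z):
    return 1.0 if z > 0 else 0.0    # subgradient 0 chosen at z = 0

def d_tanh(z):
    return 1.0 - math.tanh(z) ** 2  # peaks at 1.0 when z = 0

grid = [x / 10 for x in range(-50, 51)]    # z in [-5, 5]
print(max(d_sigmoid(z) for z in grid))     # 0.25
print(max(d_tanh(z) for z in grid))        # 1.0
print(relu(-2.0), relu(3.0), d_relu(3.0))  # 0.0 3.0 1.0
```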

### Activation 2: ReLU

#### 2.1.1 Architecture 1: 1 hidden layer with 100 nodes using ReLU

In [19]:
arch_name = "relu-1layer-100nodes"
model = Sequential()

#hidden layer 1: 100 nodes
model.add(Dense(100, input_dim=x.shape[1], kernel_initializer='uniform'))
model.add(Activation('relu'))
model.add(Dropout(0.10))

#output layer: 1 node
model.add(Dense(1, kernel_initializer='uniform'))
model.add(Activation('sigmoid'))

#optimizer
sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)

#compile
model.compile(loss='binary_crossentropy', metrics=['accuracy'], optimizer=sgd)
model.fit(x, y, epochs=5, batch_size=1000)

score = roc_auc_score(y_test,model.predict(x_test))

print("")
print("ROC AUC Score for " + arch_name + " is: " + str(score))
test_results[arch_name] = score

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5

ROC AUC Score for relu-1layer-100nodes is: 0.7888676450761396


#### 2.1.2 Architecture 2: 2 hidden layers with 100 nodes each using ReLU

In [20]:
arch_name = "relu-2layers-100nodes"
model = Sequential()

#hidden layer 1: 100 nodes
model.add(Dense(100, input_dim=x.shape[1], kernel_initializer='uniform'))
model.add(Activation('relu'))
model.add(Dropout(0.10))

#hidden layer 2: 100 nodes
model.add(Dense(100, input_dim=x.shape[1], kernel_initializer='uniform'))
model.add(Activation('relu'))
model.add(Dropout(0.10))

#output layer: 1 node
model.add(Dense(1, kernel_initializer='uniform'))
model.add(Activation('sigmoid'))

#optimizer
sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)

#compile
model.compile(loss='binary_crossentropy', metrics=['accuracy'], optimizer=sgd)
model.fit(x, y, epochs=5, batch_size=1000)

score = roc_auc_score(y_test,model.predict(x_test))

print("")
print("ROC AUC Score for " + arch_name + " is: " + str(score))
test_results[arch_name] = score

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5

ROC AUC Score for relu-2layers-100nodes is: 0.808768326293547


#### 2.1.3 Architecture 3: 3 hidden layers with 100 nodes each using ReLU

In [21]:
arch_name = "relu-3layers-100nodes-batch1000"
model = Sequential()

#hidden layer 1: 100 nodes
model.add(Dense(100, input_dim=x.shape[1], kernel_initializer='uniform'))
model.add(Activation('relu'))
model.add(Dropout(0.10))

#hidden layer 2: 100 nodes
model.add(Dense(100, input_dim=x.shape[1], kernel_initializer='uniform'))
model.add(Activation('relu'))
model.add(Dropout(0.10))

#hidden layer 3: 100 nodes
model.add(Dense(100, input_dim=x.shape[1], kernel_initializer='uniform'))
model.add(Activation('relu'))
model.add(Dropout(0.10))

#output layer: 1 node
model.add(Dense(1, kernel_initializer='uniform'))
model.add(Activation('sigmoid'))

#optimizer
sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)

#compile
model.compile(loss='binary_crossentropy', metrics=['accuracy'], optimizer=sgd)
model.fit(x, y, epochs=5, batch_size=1000)

score = roc_auc_score(y_test,model.predict(x_test))

print("")
print("ROC AUC Score for " + arch_name + " is: " + str(score))
test_results[arch_name] = score

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5

ROC AUC Score for relu-3layers-100nodes-batch1000 is: 0.8129594421174531


### Activation 3: Tanh

#### 2.2.1 Architecture 1: 1 hidden layer with 100 nodes using Tanh

In [22]:
arch_name = "tanh-1layer-100nodes"
model = Sequential()

#hidden layer 1: 100 nodes
model.add(Dense(100, input_dim=x.shape[1], kernel_initializer='uniform'))
model.add(Activation('tanh'))
model.add(Dropout(0.10))

#output layer: 1 node
model.add(Dense(1, kernel_initializer='uniform'))
model.add(Activation('sigmoid'))

#optimizer
sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)

#compile
model.compile(loss='binary_crossentropy', metrics=['accuracy'], optimizer=sgd)
model.fit(x, y, epochs=5, batch_size=1000)

score = roc_auc_score(y_test,model.predict(x_test))

print("")
print("ROC AUC Score for " + arch_name + " is: " + str(score))
test_results[arch_name] = score

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5

ROC AUC Score for tanh-1layer-100nodes is: 0.7589372184514009


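#### 2.2.2 Architecture 2: 2 hidden layers with 100 nodes each using Tanh

The run label `tanh-2layers-50nodes` used in this cell is a misnomer: both hidden layers have 100 nodes.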

In [23]:
arch_name = "tanh-2layers-50nodes"
model = Sequential()

#hidden layer 1: 100 nodes
model.add(Dense(100, input_dim=x.shape[1], kernel_initializer='uniform'))
model.add(Activation('tanh'))
model.add(Dropout(0.10))

#hidden layer 2: 100 nodes
model.add(Dense(100, input_dim=x.shape[1], kernel_initializer='uniform'))
model.add(Activation('tanh'))
model.add(Dropout(0.10))

#output layer: 1 node
model.add(Dense(1, kernel_initializer='uniform'))
model.add(Activation('sigmoid'))

#optimizer
sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)

#compile
model.compile(loss='binary_crossentropy', metrics=['accuracy'], optimizer=sgd)
model.fit(x, y, epochs=5, batch_size=1000)

score = roc_auc_score(y_test,model.predict(x_test))

print("")
print("ROC AUC Score for " + arch_name + " is: " + str(score))
test_results[arch_name] = score

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5

ROC AUC Score for tanh-2layers-50nodes is: 0.7684203916854455


#### 2.2.3 Architecture 3: 3 hidden layers with 100 nodes each using tanh

In [24]:
arch_name = "tanh-3layers-100nodes"
model = Sequential()

#hidden layer 1: 100 nodes
model.add(Dense(100, input_dim=x.shape[1], kernel_initializer='uniform'))
model.add(Activation('tanh'))
model.add(Dropout(0.10))

#hidden layer 2: 100 nodes
model.add(Dense(100, input_dim=x.shape[1], kernel_initializer='uniform'))
model.add(Activation('tanh'))
model.add(Dropout(0.10))

#hidden layer 3: 100 nodes
model.add(Dense(100, input_dim=x.shape[1], kernel_initializer='uniform'))
model.add(Activation('tanh'))
model.add(Dropout(0.10))

#output layer: 1 node
model.add(Dense(1, kernel_initializer='uniform'))
model.add(Activation('sigmoid'))

#optimizer
sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)

#compile
model.compile(loss='binary_crossentropy', metrics=['accuracy'], optimizer=sgd)
model.fit(x, y, epochs=5, batch_size=1000)

score = roc_auc_score(y_test,model.predict(x_test))

print("")
print("ROC AUC Score for " + arch_name + " is: " + str(score))
test_results[arch_name] = score

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5

ROC AUC Score for tanh-3layers-100nodes is: 0.7707491172777148


### Compare all tests

In [25]:
testSeries = pd.Series(test_results)
testSeries.sort_values(ascending=False, inplace=True)
testSeries

relu-3layers-100nodes-batch1000    0.812959
relu-2layers-100nodes              0.808768
relu-1layer-100nodes               0.788868
tanh-3layers-100nodes              0.770749
tanh-2layers-50nodes               0.768420
tanh-1layer-100nodes               0.758937
sigmoid-1layer-100nodes            0.729774
sigmoid-2layers-100nodes           0.683396
sigmoid-3layers-100nodes           0.554798
dtype: float64

## 3. Take your best model from parts 1 & 2 and vary the batch size by at least 2 orders of magnitude

#### Our best model from the previous tests uses ReLU activation with 3 hidden layers of 100 neurons each, trained with a batch size of 1,000. We now vary the batch size to 100, 10,000 and 100,000 and compare the results.
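One way to read the batch-size results is to count weight updates. Each epoch performs one update per mini-batch. With 1,100,000 training rows and 5 epochs, the number of gradient steps therefore drops tenfold for every tenfold increase in batch size. The plain-Python sketch below shows that arithmetic. The helper name is hypothetical, and the sketch does not model learning-rate or gradient-noise effects.

```python
import math

def updates(n_rows, batch_size, epochs):
    """Number of weight updates: one per mini-batch, last partial batch included."""
    return math.ceil(n_rows / batch_size) * epochs

for batch_size in (100, 1_000, 10_000, 100_000):
    print(batch_size, updates(1_100_000, batch_size, epochs=5))
# 100 55000
# 1000 5500
# 10000 550
# 100000 55
```

With a batch size of 100,000, the network receives only 55 SGD updates in total, which is consistent with its near-chance score of 0.5635. The usual remedy for large batches is to train for more epochs or raise the learning rate, but that was not tested here.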

In [27]:
test_results_batch = {}

test_results_batch["relu-3layers-100nodes-batch1000"] = test_results["relu-3layers-100nodes-batch1000"]  # reuse the batch-1000 score from part 2

arch_name = "relu-3layers-100nodes-batch10K"
model = Sequential()

#hidden layer 1: 100 nodes
model.add(Dense(100, input_dim=x.shape[1], kernel_initializer='uniform'))
model.add(Activation('relu'))
model.add(Dropout(0.10))

#hidden layer 2: 100 nodes
model.add(Dense(100, input_dim=x.shape[1], kernel_initializer='uniform'))
model.add(Activation('relu'))
model.add(Dropout(0.10))

#hidden layer 3: 100 nodes
model.add(Dense(100, input_dim=x.shape[1], kernel_initializer='uniform'))
model.add(Activation('relu'))
model.add(Dropout(0.10))

#output layer: 1 node
model.add(Dense(1, kernel_initializer='uniform'))
model.add(Activation('sigmoid'))

#optimizer
sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)

#compile
model.compile(loss='binary_crossentropy', metrics=['accuracy'], optimizer=sgd)
model.fit(x, y, epochs=5, batch_size=10000)

score = roc_auc_score(y_test,model.predict(x_test))

print("")
print("ROC AUC Score for " + arch_name + " is: " + str(score))
test_results_batch[arch_name] = score

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5

ROC AUC Score for relu-3layers-100nodes-batch10K is: 0.7506488569395502


In [28]:
arch_name = "relu-3layers-100nodes-batch100K"
model = Sequential()

#hidden layer 1: 100 nodes
model.add(Dense(100, input_dim=x.shape[1], kernel_initializer='uniform'))
model.add(Activation('relu'))
model.add(Dropout(0.10))

#hidden layer 2: 100 nodes
model.add(Dense(100, input_dim=x.shape[1], kernel_initializer='uniform'))
model.add(Activation('relu'))
model.add(Dropout(0.10))

#hidden layer 3: 100 nodes
model.add(Dense(100, input_dim=x.shape[1], kernel_initializer='uniform'))
model.add(Activation('relu'))
model.add(Dropout(0.10))

#output layer: 1 node
model.add(Dense(1, kernel_initializer='uniform'))
model.add(Activation('sigmoid'))

#optimizer
sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)

#compile
model.compile(loss='binary_crossentropy', metrics=['accuracy'], optimizer=sgd)
model.fit(x, y, epochs=5, batch_size=100000)

score = roc_auc_score(y_test,model.predict(x_test))

print("")
print("ROC AUC Score for " + arch_name + " is: " + str(score))
test_results_batch[arch_name] = score

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5

ROC AUC Score for relu-3layers-100nodes-batch100K is: 0.5635010057614114


In [29]:
arch_name = "relu-3layers-100nodes-batch100"
model = Sequential()

#hidden layer 1: 100 nodes
model.add(Dense(100, input_dim=x.shape[1], kernel_initializer='uniform'))
model.add(Activation('relu'))
model.add(Dropout(0.10))

#hidden layer 2: 100 nodes
model.add(Dense(100, input_dim=x.shape[1], kernel_initializer='uniform'))
model.add(Activation('relu'))
model.add(Dropout(0.10))

#hidden layer 3: 100 nodes
model.add(Dense(100, input_dim=x.shape[1], kernel_initializer='uniform'))
model.add(Activation('relu'))
model.add(Dropout(0.10))

#output layer: 1 node
model.add(Dense(1, kernel_initializer='uniform'))
model.add(Activation('sigmoid'))

#optimizer
sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)

#compile
model.compile(loss='binary_crossentropy', metrics=['accuracy'], optimizer=sgd)
model.fit(x, y, epochs=5, batch_size=100)

score = roc_auc_score(y_test,model.predict(x_test))

print("")
print("ROC AUC Score for " + arch_name + " is: " + str(score))
test_results_batch[arch_name] = score

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5

ROC AUC Score for relu-3layers-100nodes-batch100 is: 0.8058374979328562


In [30]:
testSeries = pd.Series(test_results_batch)
testSeries.sort_values(ascending=False, inplace=True)
testSeries

relu-3layers-100nodes-batch1000    0.812959
relu-3layers-100nodes-batch100     0.805837
relu-3layers-100nodes-batch10K     0.750649
relu-3layers-100nodes-batch100K    0.563501
dtype: float64

## 4. Take your best model (score) from parts 1 & 2 and use 3 different kernel initializers. Use a reasonable batch size.

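#### In this section, we take our best model (ReLU activation, 3 hidden layers of 100 neurons each, batch size of 1,000), which used the 'uniform' kernel initializer, and experiment with the kernel initializers 'random_uniform', 'glorot_uniform' and 'he_uniform'. Note that in Keras 2, 'uniform' is an alias for 'random_uniform' (both map to `RandomUniform`), so the first comparison mainly reflects run-to-run variation.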
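The initializers compared here all draw weights from a symmetric uniform distribution U(-limit, limit). They differ only in how the limit is chosen. The sketch below computes the limits from the Keras 2 definitions:

- `RandomUniform` defaults to a fixed ±0.05.
- Glorot uniform uses sqrt(6 / (fan_in + fan_out)).
- He uniform uses sqrt(6 / fan_in).

The function names are hypothetical.

```python
import math

def random_uniform_limit(fan_in, fan_out):
    return 0.05                                 # Keras RandomUniform default range

def glorot_uniform_limit(fan_in, fan_out):
    return math.sqrt(6.0 / (fan_in + fan_out))  # also known as Xavier uniform

def he_uniform_limit(fan_in, fan_out):
    return math.sqrt(6.0 / fan_in)              # scaled with ReLU units in mind

# First hidden layer (28 -> 100) and the later hidden layers (100 -> 100).
for fan_in, fan_out in ((28, 100), (100, 100)):
    print(fan_in, fan_out,
          random_uniform_limit(fan_in, fan_out),
          round(glorot_uniform_limit(fan_in, fan_out), 4),
          round(he_uniform_limit(fan_in, fan_out), 4))
# 28 100 0.05 0.2165 0.4629
# 100 100 0.05 0.1732 0.2449
```

The fixed ±0.05 range is several times narrower than the fan-based limits for these layer sizes.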

In [31]:
test_results_kernals = {}

test_results_kernals["relu-3layers-100nodes-batch1000-uniform"] = test_results_batch["relu-3layers-100nodes-batch1000"]  # the batch-1000 run used 'uniform'

arch_name = "relu-3layers-100nodes-batch1000-randomuniform"
model = Sequential()

#hidden layer 1: 100 nodes
model.add(Dense(100, input_dim=x.shape[1], kernel_initializer='random_uniform'))
model.add(Activation('relu'))
model.add(Dropout(0.10))

#hidden layer 2: 100 nodes
model.add(Dense(100, input_dim=x.shape[1], kernel_initializer='random_uniform'))
model.add(Activation('relu'))
model.add(Dropout(0.10))

#hidden layer 3: 100 nodes
model.add(Dense(100, input_dim=x.shape[1], kernel_initializer='random_uniform'))
model.add(Activation('relu'))
model.add(Dropout(0.10))

#output layer: 1 node
model.add(Dense(1, kernel_initializer='uniform'))
model.add(Activation('sigmoid'))

#optimizer
sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)

#compile
model.compile(loss='binary_crossentropy', metrics=['accuracy'], optimizer=sgd)
model.fit(x, y, epochs=5, batch_size=1000)

score = roc_auc_score(y_test,model.predict(x_test))

print("")
print("ROC AUC Score for " + arch_name + " is: " + str(score))
test_results_kernals[arch_name] = score

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5

ROC AUC Score for relu-3layers-100nodes-batch1000-randomuniform is: 0.8115728379223576


In [32]:
arch_name = "relu-3layers-100nodes-batch1000-glorotuniform"
model = Sequential()

#hidden layer 1: 100 nodes
model.add(Dense(100, input_dim=x.shape[1], kernel_initializer='glorot_uniform'))
model.add(Activation('relu'))
model.add(Dropout(0.10))

#hidden layer 2: 100 nodes
model.add(Dense(100, input_dim=x.shape[1], kernel_initializer='glorot_uniform'))
model.add(Activation('relu'))
model.add(Dropout(0.10))

#hidden layer 3: 100 nodes
model.add(Dense(100, input_dim=x.shape[1], kernel_initializer='glorot_uniform'))
model.add(Activation('relu'))
model.add(Dropout(0.10))

#output layer: 1 node
model.add(Dense(1, kernel_initializer='uniform'))
model.add(Activation('sigmoid'))

#optimizer
sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)

#compile
model.compile(loss='binary_crossentropy', metrics=['accuracy'], optimizer=sgd)
model.fit(x, y, epochs=5, batch_size=1000)

score = roc_auc_score(y_test,model.predict(x_test))

print("")
print("ROC AUC Score for " + arch_name + " is: " + str(score))
test_results_kernals[arch_name] = score

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5

ROC AUC Score for relu-3layers-100nodes-batch1000-glorotuniform is: 0.8108049065358367


In [33]:
arch_name = "relu-3layers-100nodes-batch1000-heuniform"
model = Sequential()

#hidden layer 1: 100 nodes
model.add(Dense(100, input_dim=x.shape[1], kernel_initializer='he_uniform'))
model.add(Activation('relu'))
model.add(Dropout(0.10))

#hidden layer 2: 100 nodes
model.add(Dense(100, input_dim=x.shape[1], kernel_initializer='he_uniform'))
model.add(Activation('relu'))
model.add(Dropout(0.10))

#hidden layer 3: 100 nodes
model.add(Dense(100, input_dim=x.shape[1], kernel_initializer='he_uniform'))
model.add(Activation('relu'))
model.add(Dropout(0.10))

#output layer: 1 node
model.add(Dense(1, kernel_initializer='uniform'))
model.add(Activation('sigmoid'))

#optimizer
sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)

#compile
model.compile(loss='binary_crossentropy', metrics=['accuracy'], optimizer=sgd)
model.fit(x, y, epochs=5, batch_size=1000)

score = roc_auc_score(y_test,model.predict(x_test))

print("")
print("ROC AUC Score for " + arch_name + " is: " + str(score))
test_results_kernals[arch_name] = score

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5

ROC AUC Score for relu-3layers-100nodes-batch1000-heuniform is: 0.8059411961474043


In [34]:
testSeries = pd.Series(test_results_kernals)
testSeries.sort_values(ascending=False, inplace=True)
testSeries

relu-3layers-100nodes-batch1000-uniform          0.812959
relu-3layers-100nodes-batch1000-randomuniform    0.811573
relu-3layers-100nodes-batch1000-glorotuniform    0.810805
relu-3layers-100nodes-batch1000-heuniform        0.805941
dtype: float64

## 5. Take your best results from #3 and try 3 different optimizers.

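#### Our best model used the SGD optimizer. In this section, we train the same network with 3 other optimizers (Adagrad, Adam and RMSprop) and compare model performance. These runs use the 'glorot_uniform' kernel initializer, whereas the SGD baseline score carried over below (0.8130) was obtained with 'uniform', so each comparison also changes the initializer.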
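Adagrad and RMSprop differ mainly in how they scale each parameter's step by its gradient history. Adagrad divides by the square root of the sum of all past squared gradients, so its effective step keeps shrinking over the 5,500 updates in these runs. RMSprop divides by the square root of an exponential moving average with decay `rho`, so its effective step does not shrink indefinitely.

The scalar sketch below assumes the textbook update rules, a constant gradient of 1, and the same learning rate for both optimizers. The helper names are hypothetical, and Keras's exact epsilon and accumulator handling may differ slightly.

```python
import math

def adagrad_step(lr, grads, eps=1e-7):
    """Size of the last Adagrad step after seeing `grads` (running sum of squares)."""
    accum = 0.0
    for g in grads:
        accum += g * g
    return lr * abs(grads[-1]) / (math.sqrt(accum) + eps)

def rmsprop_step(lr, grads, rho=0.9, eps=1e-7):
    """Size of the last RMSprop step (exponential moving average of squares)."""
    avg = 0.0
    for g in grads:
        avg = rho * avg + (1.0 - rho) * g * g
    return lr * abs(grads[-1]) / (math.sqrt(avg) + eps)

grads = [1.0] * 100
print(round(adagrad_step(0.01, grads), 5))  # 0.001  (lr / sqrt(100))
print(round(rmsprop_step(0.01, grads), 5))  # 0.01   (about lr)
```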

In [35]:
test_results_optimizers = {}

test_results_optimizers["relu-3layers-100nodes-batch1000-uniform-sgd"] = test_results_kernals["relu-3layers-100nodes-batch1000-uniform"]
test_results_optimizers

{'relu-3layers-100nodes-batch1000-uniform-sgd': 0.8129594421174531}

In [36]:
arch_name = "relu-3layers-100nodes-batch1000-glorotuniform-adagrad"
model = Sequential()

#hidden layer 1: 100 nodes
model.add(Dense(100, input_dim=x.shape[1], kernel_initializer='glorot_uniform'))
model.add(Activation('relu'))
model.add(Dropout(0.10))

#hidden layer 2: 100 nodes
model.add(Dense(100, input_dim=x.shape[1], kernel_initializer='glorot_uniform'))
model.add(Activation('relu'))
model.add(Dropout(0.10))

#hidden layer 3: 100 nodes
model.add(Dense(100, input_dim=x.shape[1], kernel_initializer='glorot_uniform'))
model.add(Activation('relu'))
model.add(Dropout(0.10))

#output layer: 1 node
model.add(Dense(1, kernel_initializer='uniform'))
model.add(Activation('sigmoid'))

#optimizer
# sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)

adagrad = Adagrad(lr=0.01, epsilon=None, decay=1e-6)

#compile
model.compile(loss='binary_crossentropy', metrics=['accuracy'], optimizer=adagrad)
model.fit(x, y, epochs=5, batch_size=1000)

score = roc_auc_score(y_test,model.predict(x_test))

print("")
print("ROC AUC Score for " + arch_name + " is: " + str(score))
test_results_optimizers[arch_name] = score

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5

ROC AUC Score for relu-3layers-100nodes-batch1000-glorotuniform-adagrad is: 0.7796285956280546


In [37]:
arch_name = "relu-3layers-100nodes-batch1000-glorotuniform-adam"
model = Sequential()

#hidden layer 1: 100 nodes
model.add(Dense(100, input_dim=x.shape[1], kernel_initializer='glorot_uniform'))
model.add(Activation('relu'))
model.add(Dropout(0.10))

#hidden layer 2: 100 nodes
model.add(Dense(100, input_dim=x.shape[1], kernel_initializer='glorot_uniform'))
model.add(Activation('relu'))
model.add(Dropout(0.10))

#hidden layer 3: 100 nodes
model.add(Dense(100, input_dim=x.shape[1], kernel_initializer='glorot_uniform'))
model.add(Activation('relu'))
model.add(Dropout(0.10))

#output layer: 1 node
model.add(Dense(1, kernel_initializer='uniform'))
model.add(Activation('sigmoid'))

#optimizer
# sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
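# NOTE: lr=0.1 is 100x the Keras Adam default (0.001), and beta_1=0. disables the
# first-moment (momentum-like) average (default 0.9); these settings may explain
# the AUC of exactly 0.5 reported below.
adam = Adam(lr=0.1, beta_1=0., epsilon=None, decay=1e-6)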

#compile
model.compile(loss='binary_crossentropy', metrics=['accuracy'], optimizer=adam)
model.fit(x, y, epochs=5, batch_size=1000)

score = roc_auc_score(y_test,model.predict(x_test))

print("")
print("ROC AUC Score for " + arch_name + " is: " + str(score))
test_results_optimizers[arch_name] = score

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5

ROC AUC Score for relu-3layers-100nodes-batch1000-glorotuniform-adam is: 0.5
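Adam normalizes each update by a running estimate of the gradient's magnitude, so its step size is roughly the learning rate regardless of how large the gradient is. The sketch below uses the standard bias-corrected Adam rule in plain Python (not the exact Keras code) to compare the first update under the settings in the cell above (lr=0.1, beta_1=0) with the Keras default lr=0.001, and with the glorot_uniform limit of a 100-to-100 layer. Our hedged interpretation is that steps of about 0.1 per weight are large relative to the initial weights, which may have driven the network to a constant output, consistent with the AUC of exactly 0.5 above; we did not rerun Adam with its default settings to confirm this.

```python
import math

def adam_first_step(grad, lr, beta_1=0.9, beta_2=0.999, eps=1e-7):
    """Magnitude of Adam's first update for one weight (standard bias-corrected rule)."""
    t = 1
    m = (1 - beta_1) * grad            # first moment, starting from zero
    v = (1 - beta_2) * grad ** 2       # second moment, starting from zero
    m_hat = m / (1 - beta_1 ** t)      # bias-corrected first moment
    v_hat = v / (1 - beta_2 ** t)      # bias-corrected second moment
    return abs(lr * m_hat / (math.sqrt(v_hat) + eps))

# glorot_uniform draws weights from [-limit, limit] with limit = sqrt(6 / (fan_in + fan_out))
glorot_limit = math.sqrt(6 / (100 + 100))

for grad in (1e-3, 1e-1, 10.0):
    ours = adam_first_step(grad, lr=0.1, beta_1=0.0)
    default = adam_first_step(grad, lr=0.001)
    print(f"grad={grad:g}: step with lr=0.1 -> {ours:.4f}, with default lr=0.001 -> {default:.6f}")
print(f"glorot_uniform limit for a 100 -> 100 layer: {glorot_limit:.4f}")
```

The step is close to the learning rate for gradients spanning four orders of magnitude, which is why Adam's learning rate is usually kept much smaller than a typical SGD learning rate.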


In [38]:
arch_name = "relu-3layers-100nodes-batch1000-glorotuniform-rmsprop"
model = Sequential()

#hidden layer 1: 100 nodes
model.add(Dense(100, input_dim=x.shape[1], kernel_initializer='glorot_uniform'))
model.add(Activation('relu'))
model.add(Dropout(0.10))

#hidden layer 2: 100 nodes (input shape is inferred from the previous layer, so input_dim is not needed)
model.add(Dense(100, kernel_initializer='glorot_uniform'))
model.add(Activation('relu'))
model.add(Dropout(0.10))

#hidden layer 3: 100 nodes
model.add(Dense(100, kernel_initializer='glorot_uniform'))
model.add(Activation('relu'))
model.add(Dropout(0.10))

#output layer: 1 node
model.add(Dense(1, kernel_initializer='uniform'))
model.add(Activation('sigmoid'))

#optimizer
# sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
# adamax = Adamax(lr=0.1, beta_1=0.9, epsilon=None, decay=1e-6)
#adadelta = Adadelta(lr=0.1, rho=0.95, epsilon=None, decay=1e-6)
rmsprop = RMSprop(lr=0.001, rho=0.9, epsilon=None, decay=0.0)

#compile
model.compile(loss='binary_crossentropy', metrics=['accuracy'], optimizer=rmsprop)
model.fit(x, y, epochs=5, batch_size=1000)

score = roc_auc_score(y_test,model.predict(x_test))

print("")
print("ROC AUC Score for " + arch_name + " is: " + str(score))
test_results_optimizers[arch_name] = score

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5

ROC AUC Score for relu-3layers-100nodes-batch1000-glorotuniform-rmsprop is: 0.8093321229689416
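RMSprop replaces Adagrad's running sum of squared gradients with an exponential moving average controlled by `rho`, so its effective step does not keep shrinking toward zero the way Adagrad's does (roughly lr / sqrt(t) for a constant gradient). The plain-Python sketch below uses the lr and rho from the cell above with a constant gradient, which is an assumption for clarity; the step settles near lr. This is one plausible, untested reason RMSprop scored higher than Adagrad over 5 epochs.

```python
import math

def rmsprop_steps(grad, lr=0.001, rho=0.9, eps=1e-7, n_steps=50):
    """Return the size of each RMSprop update for one weight with a constant gradient."""
    avg_sq = 0.0               # exponential moving average of squared gradients
    steps = []
    for _ in range(n_steps):
        avg_sq = rho * avg_sq + (1 - rho) * grad ** 2
        steps.append(lr * grad / (math.sqrt(avg_sq) + eps))
    return steps

steps = rmsprop_steps(grad=0.5)
print(f"first step: {steps[0]:.5f}, step 50: {steps[-1]:.5f}")
```

The first step is larger than lr because the moving average starts at zero; after a few dozen updates it levels off at about lr instead of decaying further.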


In [39]:
testSeries = pd.Series(test_results_optimizers)
testSeries.sort_values(ascending=False, inplace=True)
testSeries

relu-3layers-100nodes-batch1000-glorotuniform-sgd        0.812959
relu-3layers-100nodes-batch1000-glorotuniform-rmsprop    0.809332
relu-3layers-100nodes-batch1000-glorotuniform-adagrad    0.779629
relu-3layers-100nodes-batch1000-glorotuniform-adam       0.500000
dtype: float64
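An ROC AUC of exactly 0.5 is what a model receives when it gives every test example the same score: ROC AUC equals the probability that a randomly chosen positive example is scored above a randomly chosen negative one, with ties counted as one half. The sketch below computes AUC that way with a hypothetical helper `pairwise_auc` (quadratic in the number of examples, so suitable only for toy data); tie handling this way gives the same value as the trapezoidal ROC AUC used by scikit-learn's `roc_auc_score`. It suggests, but does not prove, that the Adam model's sigmoid output was effectively constant on `x_test`.

```python
def pairwise_auc(labels, scores):
    """ROC AUC as P(score of a random positive > score of a random negative), ties = 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

labels = [0, 0, 1, 1, 0, 1]
print(pairwise_auc(labels, [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]))  # informative scores: 8/9
print(pairwise_auc(labels, [0.47] * 6))                        # constant scores: 0.5
```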

### 6.	Take all that you’ve learned so far and give your best shot at producing a score. 

After experimenting with various parameters, we achieved the best performance with the ReLU activation function, 3 hidden layers of 100 neurons each, a batch size of 1000, the SGD optimizer and the glorot_uniform kernel initializer. We now vary some of these key parameters to see whether the model's performance improves further.

In [40]:
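# NOTE: this label is carried over from the base configuration; the model below
# actually has 4 hidden layers of 200 nodes and is trained for 20 epochs.
arch_name = "relu-3layers-100nodes-batch1000-glorotuniform-sgd-THEBEST"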
model = Sequential()

#hidden layer 1: 200 nodes
model.add(Dense(200, input_dim=x.shape[1], kernel_initializer='glorot_uniform'))
model.add(Activation('relu'))
model.add(Dropout(0.10))

#hidden layer 2: 200 nodes (input shape is inferred from the previous layer, so input_dim is not needed)
model.add(Dense(200, kernel_initializer='glorot_uniform'))
model.add(Activation('relu'))
model.add(Dropout(0.10))

#hidden layer 3: 200 nodes
model.add(Dense(200, kernel_initializer='glorot_uniform'))
model.add(Activation('relu'))
model.add(Dropout(0.10))

#hidden layer 4: 200 nodes
model.add(Dense(200, kernel_initializer='glorot_uniform'))
model.add(Activation('relu'))
model.add(Dropout(0.10))

#output layer: 1 node
model.add(Dense(1, kernel_initializer='uniform'))
model.add(Activation('sigmoid'))

#optimizer
sgd = SGD(lr=0.15, decay=1e-8, momentum=0.95, nesterov=True)

#compile
model.compile(loss='binary_crossentropy', metrics=['accuracy'], optimizer=sgd)
model.fit(x, y, epochs=20, batch_size=1000)

score = roc_auc_score(y_test,model.predict(x_test))

print("")
print("ROC AUC Score for " + arch_name + " is: " + str(score))
# test_results_optimizers[arch_name] = score

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20

ROC AUC Score for relu-3layers-100nodes-batch1000-glorotuniform-sgd-THEBEST is: 0.8424720856506737


### 10 points - Q1: What was the effect of adding more layers/neurons. 

During this exercise, we experimented with our best model by adjusting the number of layers and neurons. We tested 100 and 200 neurons per hidden layer, and 3, 4 and 5 hidden layers. With the number of epochs fixed at 10, increasing each layer from 100 to 200 neurons raised the ROC AUC score from ~0.8127 to ~0.8267 (an absolute gain of about 0.014, or roughly 1.7%), and adding a 4th layer gave a further, smaller improvement (from ~0.8267 to ~0.83). Adding a 5th layer, however, slightly decreased the score. Rerunning our final model with 20 epochs produced a score of about 0.842 (0.8425 in the run shown above). We conclude that, with all other parameters unchanged, 4 hidden layers of 200 neurons each yielded the best results.
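One way to see what "more layers/neurons" means concretely is to count trainable parameters. The sketch below assumes the network receives all 28 HIGGS features (that is, `x.shape[1] == 28`, an assumption not restated in this section) and counts the weights and biases of the Dense layers only; Dropout and Activation layers add none. Going from 3 x 100 to 4 x 200 increases the parameter count by more than five times, while the AUC gains reported above were modest, which is consistent with diminishing returns from extra capacity.

```python
def dense_param_count(n_inputs, hidden_sizes, n_outputs=1):
    """Count weights + biases in a stack of fully connected (Dense) layers."""
    total = 0
    fan_in = n_inputs
    for width in list(hidden_sizes) + [n_outputs]:
        total += fan_in * width + width   # weight matrix + bias vector
        fan_in = width
    return total

N_FEATURES = 28  # assumption: 21 low-level + 7 high-level HIGGS features
configs = [("3 x 100", [100] * 3), ("3 x 200", [200] * 3),
           ("4 x 200", [200] * 4), ("5 x 200", [200] * 5)]
for name, layers in configs:
    print(f"{name}: {dense_param_count(N_FEATURES, layers):,} trainable parameters")
```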

### 10 points - Q2: Which parameters gave you the best result and why (in your opinion) did they work.

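We achieved the best results with the following key parameters:

**Activation function for Hidden Layers:** ReLU. A likely reason it worked better than the sigmoid activation of our base model is that ReLU does not saturate for positive inputs, so gradients shrink less as they propagate back through several hidden layers.

**Kernel Initializer:** glorot_uniform. A likely reason is that it scales the range of the initial weights to each layer's fan-in and fan-out, whereas 'uniform' draws every layer's weights from the same fixed range.

**Optimizer:** SGD with learning rate=0.15, decay=1e-8, momentum=0.95 and Nesterov momentum enabled. Compared to the base model, we increased the learning rate from 0.1 to 0.15, decreased the decay from 1e-6 to 1e-8 and increased the momentum from 0.9 to 0.95.

**Number of Epochs:** We started the base model with 5 epochs and gradually increased this to 10 and then 20. The score improved noticeably as we increased the number of epochs.

**Batch size:** We tested batch sizes of 100, 1000, 10000 and 100000, and observed the best scores with a batch size of 1000.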
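Batch size and learning-rate decay interact through the number of gradient updates. Keras performs about n_samples / batch_size updates per epoch, so at a fixed number of epochs a batch size of 100000 gives the optimizer 1,000 times fewer updates than a batch size of 100, which may partly explain why very large batches scored worse. The `decay` argument of the Keras 2 optimizers lowers the learning rate as lr / (1 + decay * iterations), where iterations counts updates. The sketch below evaluates both for a training set of 1,000,000 rows; `N_TRAIN` is a hypothetical size for illustration, not the size of our subset. Under that assumption, decay=1e-8 leaves the learning rate essentially constant over 20 epochs, while 1e-6 lowers it by about 2%.

```python
import math

def updates_per_epoch(n_samples, batch_size):
    """Gradient updates per epoch (the last batch may be partial)."""
    return math.ceil(n_samples / batch_size)

def decayed_lr(lr, decay, iterations):
    """Time-based decay applied by the Keras 2 optimizers' `decay` argument."""
    return lr / (1.0 + decay * iterations)

N_TRAIN = 1_000_000  # hypothetical training-set size, for illustration only

for batch_size in (100, 1000, 10000, 100000):
    print(f"batch_size={batch_size:>6}: {updates_per_epoch(N_TRAIN, batch_size):>6} updates/epoch")

iters_20_epochs = 20 * updates_per_epoch(N_TRAIN, 1000)  # 20,000 updates
print("base SGD lr after 20 epochs:", decayed_lr(0.1, 1e-6, iters_20_epochs))
print("best SGD lr after 20 epochs:", decayed_lr(0.15, 1e-8, iters_20_epochs))
```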

### 20 points - Q3: For #6, how did you decide that your model was ‘done’

We started with a base model using the 'sigmoid' activation function, the 'uniform' kernel initializer and the 'SGD' optimizer, with 1 hidden layer, 5 epochs and a batch size of 1000, and we used the ROC AUC score as our performance metric throughout. While varying these parameters around the base model, we observed ROC AUC scores ranging from 0.5 to 0.7. Tuning the parameters as described in the previous sections improved performance substantially: our BEST model (activation function: ReLU, optimizer: SGD, kernel initializer: glorot_uniform, 4 hidden layers of 200 neurons, epochs: 20, batch size: 1000) reached a ROC AUC score of about 0.842 (0.8425 in the run shown above). Further improvement may still be possible by tuning the number of epochs, the batch size and other parameters we did not test. Given our time and resource constraints, and a result we considered satisfactory, we stopped tuning at this point and considered the model done.
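Our stopping decision was based on time and a satisfactory score. A more objective criterion, which we did not use here, is a patience rule on a held-out validation score: stop once the score has failed to improve by some minimum amount for a few consecutive epochs. Keras provides such a rule as the `EarlyStopping` callback (with `monitor`, `patience` and `min_delta` arguments); the pure-Python sketch below only illustrates the rule itself, and the validation history, `patience` and `min_delta` values are hypothetical.

```python
def epochs_until_stop(val_scores, patience=3, min_delta=1e-3):
    """Return the 1-based epoch at which a patience rule would stop training.

    Training stops once the validation score has not beaten the best score so far
    by at least `min_delta` for `patience` consecutive epochs. Returns
    len(val_scores) if the rule never fires.
    """
    best = float("-inf")
    stale = 0
    for epoch, score in enumerate(val_scores, start=1):
        if score > best + min_delta:
            best = score
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return len(val_scores)

# Hypothetical per-epoch validation AUCs (not measured in this study)
history = [0.78, 0.80, 0.815, 0.825, 0.830, 0.8305, 0.8308, 0.8302, 0.8310]
print(epochs_until_stop(history))
```

With these illustrative numbers the rule stops at epoch 8, after three epochs in which the score never rose more than 0.001 above its best value of 0.830.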

### References

[1] Higgs boson (Wikipedia): https://en.wikipedia.org/wiki/Higgs_boson