# Hyperparameter Tuning  

Unless you are lucky, your first model architecture won't be perfect. This milestone tries to find better hyperparameters for the neural network. Below, I will try different neuron counts, different activation functions, and different optimizers.  

First, I am going to try different numbers of neurons in the layers. From there, I will take the winner and change the activation function. Once I have that picked out, I will change the optimizer.
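
The one-setting-at-a-time procedure described above can be sketched in plain Python. This is only an illustration: `greedy_search` and `lookup_score` are hypothetical names, and instead of training anything, `lookup_score` looks up the test MAE values reported further down in this notebook.

```python
# MAE values copied from the results reported later in this notebook.
# Keys are (neurons, activation, optimizer).
REPORTED_MAE = {
    (32, 'relu', 'rmsprop'): 7.866069316864014,
    (24, 'relu', 'rmsprop'): 7.846304893493652,
    (12, 'relu', 'rmsprop'): 7.877676010131836,
    (8, 'relu', 'rmsprop'): 7.877933025360107,
    (24, 'sigmoid', 'rmsprop'): 7.884239196777344,
    (24, 'softmax', 'rmsprop'): 7.989224433898926,
    (24, 'relu', 'sgd'): 8.956071853637695,
    (24, 'relu', 'adam'): 7.867661952972412,
    (24, 'relu', 'adamax'): 7.859252452850342,
}

def lookup_score(config):
    # Hypothetical stand-in for "train the model and return its test MAE".
    key = (config['neurons'], config['activation'], config['optimizer'])
    return REPORTED_MAE[key]

def greedy_search(start, choices, score):
    """Tune one setting at a time, keeping the best value before moving on."""
    best = dict(start)
    for name, options in choices.items():
        candidates = [dict(best, **{name: option}) for option in options]
        best = min(candidates, key=score)  # lower MAE is better
    return best

start = {'neurons': 32, 'activation': 'relu', 'optimizer': 'rmsprop'}
choices = {
    'neurons': [32, 24, 12, 8],
    'activation': ['relu', 'sigmoid', 'softmax'],
    'optimizer': ['rmsprop', 'sgd', 'adam', 'adamax'],
}

winner = greedy_search(start, choices, lookup_score)
print(winner)
```

Note that this kind of search only tries 9 of the 48 possible combinations, so it can miss a combination that would have scored better.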

### Layers and Neurons  
We can adjust the complexity of the model by varying the number of layers and the number of neurons inside them. Our only limitation is that our input layer has to match our data and our output layer needs to give us a single number.  

One thing to keep in mind is that the more complex your model is, the more data it takes to generalize. A large network has so many paths through it that it can memorize the exact result of individual games instead of learning patterns that many games share.
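
To get a feel for how quickly complexity grows, here is a small sketch that counts the trainable parameters (weights plus biases) of a fully connected network with 4 inputs, two hidden layers, and 1 output. The helper name `dense_param_count` is made up for this example.

```python
def dense_param_count(n_inputs, hidden_sizes, n_outputs=1):
    """Count weights and biases in a fully connected network."""
    total = 0
    previous = n_inputs
    for size in list(hidden_sizes) + [n_outputs]:
        total += previous * size + size  # weights + biases for this layer
        previous = size
    return total

for neurons in (32, 24, 12, 8):
    print(neurons, dense_param_count(4, [neurons, neurons]))
```

With two hidden layers, the count grows roughly with the square of the layer width, so going from 8 to 32 neurons gives the network about ten times as many parameters to fit.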

### Activation Function  
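The activation function determines what each neuron outputs. A neuron computes a weighted sum of its inputs plus a bias, and the activation function transforms that sum into the value passed on to the next layer. The error is handled elsewhere: in our case, if the expected result was 5 and the prediction was 3, the loss function measures that error and the optimizer adjusts the weights, using gradients that flow back through the activation functions.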

We started with `relu` and will try `sigmoid` and `softmax`. 

**RELU**: Rectified Linear Unit. It tends to converge quickly. This function returns a non-negative value. Equation: $$relu(x) = max(x,0)$$  
**Softmax**: This function returns values between 0 and 1 that sum to 1. It converts a real vector to a vector of categorical probabilities. Equation: $$softmax(x_i) = \frac{exp(x_i)}{\sum_{j=1}^{k}exp(x_j)}$$  
**Sigmoid**: This function also returns a value between 0 and 1. It is equivalent to a 2-element Softmax where the second element is fixed at 0. Equation: $$sigmoid(x) = \frac{1}{1 + exp(-x)}$$  

Available Keras activation functions: [https://keras.io/api/layers/activations/](https://keras.io/api/layers/activations/).  
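
The three equations above are short enough to write out by hand. The following is a NumPy sketch of them, not the Keras implementation; the input values are arbitrary.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    shifted = x - np.max(x)  # subtracting the max avoids overflow; the result is unchanged
    exps = np.exp(shifted)
    return exps / np.sum(exps)

x = np.array([-2.0, 0.0, 3.0])
print('relu:   ', relu(x))
print('sigmoid:', sigmoid(x))
print('softmax:', softmax(x), 'sum =', softmax(x).sum())

# sigmoid(z) matches the first entry of a 2-element softmax over [z, 0]
z = 1.5
print(np.isclose(sigmoid(z), softmax(np.array([z, 0.0]))[0]))
```

Notice that `softmax` works on a whole vector at once, while `relu` and `sigmoid` are applied to each value independently.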

### Optimizer  
**RMSProp** [Link](https://keras.io/api/optimizers/rmsprop/): Maintains a moving average of the squares of the gradients and divides the gradient by the root of this average.  
**SGD (stochastic gradient descent)** [Link](https://keras.io/api/optimizers/sgd/): A basic gradient descent optimizer.  
**Adam** [Link](https://keras.io/api/optimizers/adam/): Adam is based on SGD but uses adaptive estimation of the first and second order moments.  
**Adamax** [Link](https://keras.io/api/optimizers/adamax/): Adamax is a variation of Adam based on the infinity norm.    

Available Keras optimizers: [https://keras.io/api/optimizers/](https://keras.io/api/optimizers/)
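
To make the difference between plain gradient descent and RMSProp concrete, here is a simplified sketch of both update rules on a toy problem: minimizing $$f(w) = (w - 3)^2$$. The learning rate, decay rate `rho`, and step counts are arbitrary choices for the illustration, and the Keras optimizers have more options than shown here.

```python
import numpy as np

def grad(w):
    # Gradient of f(w) = (w - 3)^2
    return 2.0 * (w - 3.0)

def run_sgd(w, lr=0.1, steps=50):
    for _ in range(steps):
        w = w - lr * grad(w)  # step is proportional to the gradient
    return w

def run_rmsprop(w, lr=0.1, rho=0.9, eps=1e-7, steps=200):
    avg_sq = 0.0  # moving average of squared gradients
    for _ in range(steps):
        g = grad(w)
        avg_sq = rho * avg_sq + (1.0 - rho) * g * g
        w = w - lr * g / (np.sqrt(avg_sq) + eps)  # gradient divided by the root of the average
    return w

print('SGD:    ', run_sgd(0.0))
print('RMSProp:', run_rmsprop(0.0))
```

Both should end up close to the minimum at `w = 3`. Because RMSProp divides by the recent gradient size, its step length depends much less on how steep the function is.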


## Loading the data

In [None]:
from __future__ import absolute_import, division, print_function

import tensorflow as tf
from tensorflow import keras

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [None]:
#Load the data set from the last milestone 1
column_names = ['Date','HomeTeam','HomeScore','AwayTeam','AwayScore',
                'HomeScoreAverage','HomeDefenseAverage','AwayScoreAverage','AwayDefenseAverage',
                'Result']

games_csv = 'https://liveproject-resources.s3.amazonaws.com/other/deeplearningbasketballscores/Games-Calculated.csv'
all_data = pd.read_csv(games_csv, header=None, names=column_names)

# Drop the columns that we are NOT going to train on
all_data.drop(['Date','HomeTeam','HomeScore','AwayTeam','AwayScore'], axis=1, inplace=True)
all_data.tail()

#Break it into 80/20 splits
train = all_data.sample(frac=0.8, random_state=0)
test = all_data.drop(train.index)
print('Training Size: %s' % train.shape[0])
print('Testing Size: %s' % test.shape[0])

#Create the labels
train_labels = train.pop('Result')
test_labels = test.pop('Result')

# Normalize the data
mean = train.mean(axis=0)
train_data = train - mean
std = train_data.std(axis=0)
train_data /= std

test_data = test - mean
test_data /= std

Training Size: 16128
Testing Size: 4032
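
The normalization step in the cell above uses the mean and standard deviation of the training data for both sets, so the test data never influences the scaling. A tiny sketch with made-up numbers (not taken from the games file) shows the effect:

```python
import numpy as np

# Made-up values for a single feature column
train = np.array([[70.0], [80.0], [90.0]])
test = np.array([[85.0]])

mean = train.mean(axis=0)
std = train.std(axis=0, ddof=1)  # ddof=1 matches the pandas default used in the notebook

train_scaled = (train - mean) / std
test_scaled = (test - mean) / std  # reuse the training statistics

print(train_scaled.ravel())
print(test_scaled.ravel())
```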


In [None]:
# This callback replaces the normal training output with one dot per epoch,
# starting a new line every 10 epochs. This is cleaner in my opinion
class PrintDoc(keras.callbacks.Callback):
  def on_epoch_end(self, epoch, logs=None):
    if epoch % 10 == 0: print('')
    print('.', end='')

## Milestone 3 Network  

First, I am going to grab the network I created in milestone 3 and list the mean absolute error from the testing data.

In [None]:
def Build_Model_Milestone3():
  model = keras.models.Sequential([
    keras.layers.Dense(32, activation='relu', input_shape=[4]),
    keras.layers.Dense(32, activation='relu'),
    keras.layers.Dense(1)                                   
  ])
  
  opt = keras.optimizers.RMSprop()
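  m = [
       keras.metrics.MeanAbsoluteError(),
       # Accuracy is not meaningful for a regression target; it is kept so that
       # evaluate() keeps returning four values (loss plus three metrics)
       keras.metrics.Accuracy(),
       keras.metrics.MeanSquaredError()
  ]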
  l = keras.losses.MeanSquaredError()
  
  model.compile(loss=l, optimizer=opt, metrics=m)
  return model

In [None]:
m3 = Build_Model_Milestone3()
m3_history = m3.fit(train_data, train_labels, epochs=100, validation_split=0.2, verbose=0, callbacks=[PrintDoc()])


..........
..........
..........
..........
..........
..........
..........
..........
..........
..........

In [None]:
_, m3_mean_absolute_error, _, _ = m3.evaluate(test_data, test_labels,verbose=0)
print('Milestone 3 model: %s' % m3_mean_absolute_error)

Milestone 3 model: 7.866069316864014
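
The number printed above is the mean absolute error: on average, the prediction is off by about 7.87 in the units of the `Result` column. As a reminder of how this metric and the mean squared error loss are computed, here is a hand-rolled NumPy sketch with made-up values; it is not the Keras implementation.

```python
import numpy as np

def mean_absolute_error(y_true, y_pred):
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(diff)))

def mean_squared_error(y_true, y_pred):
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(diff ** 2))

# Made-up labels and predictions
actual = [5, -3, 10]
predicted = [3, -1, 10]

print('MAE:', mean_absolute_error(actual, predicted))
print('MSE:', mean_squared_error(actual, predicted))
```

The squared error punishes large misses more heavily, which is why it is used as the loss, while the absolute error is easier to read as a result.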


## Fewer Neurons  

I started with 32 neurons in each layer. I am going to adjust that down to 24, 12, and 8.

In [None]:
def Build_Model_24Neutrons():
  model = keras.models.Sequential([
    keras.layers.Dense(24, activation='relu', input_shape=[4]),
    keras.layers.Dense(24, activation='relu'),
    keras.layers.Dense(1)                                   
  ])
  
  opt = keras.optimizers.RMSprop()
  m = [
       keras.metrics.MeanAbsoluteError(),
       keras.metrics.Accuracy(),
       keras.metrics.MeanSquaredError()
  ]
  l = keras.losses.MeanSquaredError()
  
  model.compile(loss=l, optimizer=opt, metrics=m)
  return model

def Build_Model_12Neutrons():
  model = keras.models.Sequential([
    keras.layers.Dense(12, activation='relu', input_shape=[4]),
    keras.layers.Dense(12, activation='relu'),
    keras.layers.Dense(1)                                   
  ])
  
  opt = keras.optimizers.RMSprop()
  m = [
       keras.metrics.MeanAbsoluteError(),
       keras.metrics.Accuracy(),
       keras.metrics.MeanSquaredError()
  ]
  l = keras.losses.MeanSquaredError()
  
  model.compile(loss=l, optimizer=opt, metrics=m)
  return model

def Build_Model_8Neutrons():
  model = keras.models.Sequential([
    keras.layers.Dense(8, activation='relu', input_shape=[4]),
    keras.layers.Dense(8, activation='relu'),
    keras.layers.Dense(1)                                   
  ])
  
  opt = keras.optimizers.RMSprop()
  m = [
       keras.metrics.MeanAbsoluteError(),
       keras.metrics.Accuracy(),
       keras.metrics.MeanSquaredError()
  ]
  l = keras.losses.MeanSquaredError()
  
  model.compile(loss=l, optimizer=opt, metrics=m)
  return model
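
The builder functions in this notebook differ only in the neuron count, the activation function, or the optimizer. As an alternative, here is a sketch of a single parameterized builder. The names `layer_plan` and `build_model` are hypothetical and not part of the original notebook; the sketch uses `keras.Input` instead of `input_shape` and passes the optimizer by name, which Keras accepts for its built-in optimizers.

```python
def layer_plan(neurons, activation, hidden_layers=2):
    """Describe the Dense layers as (units, activation) pairs, ending in one linear output."""
    return [(neurons, activation)] * hidden_layers + [(1, None)]

def build_model(neurons=32, activation='relu', optimizer='rmsprop'):
    # Imported here so layer_plan can be inspected without TensorFlow installed
    from tensorflow import keras

    layers = [keras.Input(shape=(4,))]  # four input features, as in this notebook
    for units, act in layer_plan(neurons, activation):
        layers.append(keras.layers.Dense(units, activation=act))
    model = keras.Sequential(layers)

    # Same loss and metrics as the other builders, so evaluate() returns four values
    model.compile(
        loss=keras.losses.MeanSquaredError(),
        optimizer=optimizer,  # e.g. 'rmsprop', 'sgd', 'adam', 'adamax'
        metrics=[
            keras.metrics.MeanAbsoluteError(),
            keras.metrics.Accuracy(),
            keras.metrics.MeanSquaredError(),
        ],
    )
    return model

print(layer_plan(24, 'relu'))
```

With this, a call such as `build_model(neurons=24, activation='sigmoid')` would stand in for a dedicated function per variation.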

In [None]:
# Train the networks
m8 = Build_Model_8Neutrons()
m8_history = m8.fit(train_data, train_labels, epochs=100, validation_split=0.2, verbose=0, callbacks=[PrintDoc()])

m12 = Build_Model_12Neutrons()
m12_history = m12.fit(train_data, train_labels, epochs=100, validation_split=0.2, verbose=0, callbacks=[PrintDoc()])

m24 = Build_Model_24Neutrons()
m24_history = m24.fit(train_data, train_labels, epochs=100, validation_split=0.2, verbose=0, callbacks=[PrintDoc()])


..........
..........
..........
..........
..........
..........
..........
..........
..........
..........
..........
..........
..........
..........
..........
..........
..........
..........
..........
..........
..........
..........
..........
..........
..........
..........
..........
..........
..........
..........

In [None]:
# Grab Their Results
_, m8_mean_absolute_error, _, _ = m8.evaluate(test_data, test_labels,verbose=0)
print('8 Neurons model: %s' % m8_mean_absolute_error)

_, m12_mean_absolute_error, _, _ = m12.evaluate(test_data, test_labels,verbose=0)
print('12 Neurons model: %s' % m12_mean_absolute_error)

_, m24_mean_absolute_error, _, _ = m24.evaluate(test_data, test_labels,verbose=0)
print('24 Neurons model: %s' % m24_mean_absolute_error)

print('Milestone 3 model: %s' % m3_mean_absolute_error)

8 Neurons model: 7.877933025360107
12 Neurons model: 7.877676010131836
24 Neurons model: 7.846304893493652
Milestone 3 model: 7.866069316864014


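**Winner**: 24 neurons, though only by a small margin (7.846 versus 7.866 for the milestone 3 model). Differences this small can change from run to run because the weights start from random values.  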

Now that we have the neuron winner we will move to changing the activation functions.

## Activation Functions  

We currently have RELU (rectified linear unit). We will try `sigmoid` and `softmax`.

In [None]:
def Build_Model_Sigmoid():
  model = keras.models.Sequential([
    keras.layers.Dense(24, activation='sigmoid', input_shape=[4]),
    keras.layers.Dense(24, activation='sigmoid'),
    keras.layers.Dense(1)                                   
  ])
  
  opt = keras.optimizers.RMSprop()
  m = [
       keras.metrics.MeanAbsoluteError(),
       keras.metrics.Accuracy(),
       keras.metrics.MeanSquaredError()
  ]
  l = keras.losses.MeanSquaredError()
  
  model.compile(loss=l, optimizer=opt, metrics=m)
  return model

def Build_Model_Softmax():
  model = keras.models.Sequential([
    keras.layers.Dense(24, activation='softmax', input_shape=[4]),
    keras.layers.Dense(24, activation='softmax'),
    keras.layers.Dense(1)                                   
  ])
  
  opt = keras.optimizers.RMSprop()
  m = [
       keras.metrics.MeanAbsoluteError(),
       keras.metrics.Accuracy(),
       keras.metrics.MeanSquaredError()
  ]
  l = keras.losses.MeanSquaredError()
  
  model.compile(loss=l, optimizer=opt, metrics=m)
  return model

In [None]:
# Train the networks
msg = Build_Model_Sigmoid()
msg_history = msg.fit(train_data, train_labels, epochs=100, validation_split=0.2, verbose=0, callbacks=[PrintDoc()])

msm = Build_Model_Softmax()
msm_history = msm.fit(train_data, train_labels, epochs=100, validation_split=0.2, verbose=0, callbacks=[PrintDoc()])


..........
..........
..........
..........
..........
..........
..........
..........
..........
..........
..........
..........
..........
..........
..........
..........
..........
..........
..........
..........

In [None]:
_, msg_mean_absolute_error, _, _ = msg.evaluate(test_data, test_labels,verbose=0)
print('Sigmoid model: %s' % msg_mean_absolute_error)

_, msm_mean_absolute_error, _, _ = msm.evaluate(test_data, test_labels,verbose=0)
print('Softmax model: %s' % msm_mean_absolute_error)


print('24 Neuron model: %s' % m24_mean_absolute_error)

Sigmoid model: 7.884239196777344
Softmax model: 7.989224433898926
24 Neuron model: 7.846304893493652


**Winner**: Still the 24 neuron layers with the relu activation function.

## Optimizers  

We started with RMSProp and we are going to try SGD (stochastic gradient descent), Adam, and Adamax.

In [None]:
def Build_Model_SDG():
  model = keras.models.Sequential([
    keras.layers.Dense(24, activation='relu', input_shape=[4]),
    keras.layers.Dense(24, activation='relu'),
    keras.layers.Dense(1)                                   
  ])
  
  opt = keras.optimizers.SGD()
  m = [
       keras.metrics.MeanAbsoluteError(),
       keras.metrics.Accuracy(),
       keras.metrics.MeanSquaredError()
  ]
  l = keras.losses.MeanSquaredError()
  
  model.compile(loss=l, optimizer=opt, metrics=m)
  return model

def Build_Model_Adam():
  model = keras.models.Sequential([
    keras.layers.Dense(24, activation='relu', input_shape=[4]),
    keras.layers.Dense(24, activation='relu'),
    keras.layers.Dense(1)                                   
  ])
  
  opt = keras.optimizers.Adam()
  m = [
       keras.metrics.MeanAbsoluteError(),
       keras.metrics.Accuracy(),
       keras.metrics.MeanSquaredError()
  ]
  l = keras.losses.MeanSquaredError()
  
  model.compile(loss=l, optimizer=opt, metrics=m)
  return model

def Build_Model_Adamax():
  model = keras.models.Sequential([
    keras.layers.Dense(24, activation='relu', input_shape=[4]),
    keras.layers.Dense(24, activation='relu'),
    keras.layers.Dense(1)                                   
  ])
  
  opt = keras.optimizers.Adamax()
  m = [
       keras.metrics.MeanAbsoluteError(),
       keras.metrics.Accuracy(),
       keras.metrics.MeanSquaredError()
  ]
  l = keras.losses.MeanSquaredError()
  
  model.compile(loss=l, optimizer=opt, metrics=m)
  return model

In [None]:
# Train the networks
sdg = Build_Model_SDG()
sdg_history = sdg.fit(train_data, train_labels, epochs=100, validation_split=0.2, verbose=0, callbacks=[PrintDoc()])

adam = Build_Model_Adam()
adam_history = adam.fit(train_data, train_labels, epochs=100, validation_split=0.2, verbose=0, callbacks=[PrintDoc()])

ada = Build_Model_Adamax()
ada_history = ada.fit(train_data, train_labels, epochs=100, validation_split=0.2, verbose=0, callbacks=[PrintDoc()])


..........
..........
..........
..........
..........
..........
..........
..........
..........
..........
..........
..........
..........
..........
..........
..........
..........
..........
..........
..........
..........
..........
..........
..........
..........
..........
..........
..........
..........
..........

In [None]:
_, sdg_mean_absolute_error, _, _ = sdg.evaluate(test_data, test_labels,verbose=0)
print('SDG model: %s' % sdg_mean_absolute_error)

_, adam_mean_absolute_error, _, _ = adam.evaluate(test_data, test_labels,verbose=0)
print('Adam model: %s' % adam_mean_absolute_error)

_, ada_mean_absolute_error, _, _ = ada.evaluate(test_data, test_labels,verbose=0)
print('Adamax model: %s' % ada_mean_absolute_error)


print('RMSProp model: %s' % m24_mean_absolute_error)

SDG model: 8.956071853637695
Adam model: 7.867661952972412
Adamax model: 7.859252452850342
RMSProp model: 7.846304893493652


**Winner**: Still the 24 neuron layers with the relu activation function and the RMSProp optimizer.


## Export h5 Model  

We will use the built-in Keras `save` function to export the model.

In [None]:
#Save the model and all its weights
m24.save('deeplearning-manning.h5')

#Google Colab code to download
from google.colab import files
files.download('deeplearning-manning.h5')

## Conclusion  
The RMSProp optimizer was still the best, and so was the relu activation function. It turns out that after everything we did in this milestone, the only change to my first network was reducing the neurons in each hidden layer from 32 to 24. Still, this shows the process for narrowing down your parameters when you are finalizing your network.