<img align="left" src="https://lever-client-logos.s3.amazonaws.com/864372b1-534c-480e-acd5-9711f850815c-1524247202159.png" width=200>
<br></br>

# Hyperparameter Tuning

## *Data Science Unit 4 Sprint 2 Assignment 4*

## Your Mission, should you choose to accept it...

To hyperparameter tune and extract every ounce of accuracy out of this telecom customer churn dataset: <https://drive.google.com/file/d/1dfbAsM9DwA7tYhInyflIpZnYs7VT-0AQ/view> 

## Requirements

- Load the data
- Clean the data if necessary (it will be)
- Create and fit a baseline Keras MLP model to the data.
- Hyperparameter tune (at least) the following parameters:
 - batch_size
 - training epochs
 - optimizer
 - learning rate (if applicable to optimizer)
 - momentum (if applicable to optimizer)
 - activation functions
 - network weight initialization
 - dropout regularization
 - number of neurons in the hidden layer
 
 You must use Grid Search and Cross Validation for your initial pass of the above hyperparameters
 
 Try and get the maximum accuracy possible out of this data! You'll save big telecoms millions! Doesn't that sound great?
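
As a rough illustration of what the Grid Search and Cross Validation requirement above involves, the sketch below enumerates every combination in a small grid and builds k-fold index splits in plain Python. The helpers `expand_grid` and `k_fold_indices` and the example grid values are hypothetical, written only for illustration; they are not part of scikit-learn or Keras, which handle both steps for you through `GridSearchCV`.

```python
from itertools import product


def expand_grid(param_grid):
    """Yield one dict per combination of hyperparameter values."""
    names = sorted(param_grid)
    for values in product(*(param_grid[name] for name in names)):
        yield dict(zip(names, values))


def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k (train, validation) index pairs."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most 1
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, validation))
        start += size
    return folds


# Hypothetical grid: every combination is fitted once per fold
param_grid = {"batch_size": [10, 20], "epochs": [20, 40], "dropout": [0.0, 0.2]}
combos = list(expand_grid(param_grid))
folds = k_fold_indices(n_samples=10, k=3)
total_fits = len(combos) * len(folds)
print(f"{len(combos)} combinations x {len(folds)} folds = {total_fits} fits")
```

The number of fits grows multiplicatively with every hyperparameter added to the grid, which is why tuning a few parameters at a time is a common compromise.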


# Day 1

### Learning Objectives
- Describe the foundational components of a neural network
- Implement a Perceptron from scratch in Python

#### Input Layer:

The input layer is where the feature data from the dataframe enters the network, with one input node per feature.

#### Hidden Layer:

These are the layers that sit between the input layer and the output layer. You can have one hidden layer or many hidden layers.

#### Output Layer:

This is the answer/result of the neurons in our neural network. A neuron's outputs can be used as inputs for the next layer of neurons or, in the output layer, be the final output(s) of the neural network.

#### Neuron:

The neuron receives inputs, multiplies the inputs by their weights, sums everything up, and then applies the activation function to the sum. It usually uses a continuous activation function.
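
A minimal sketch of that computation for a single neuron, assuming a sigmoid activation. The function name `neuron` and the optional `bias` argument are illustrative choices, not taken from the notebook's own classes.

```python
import math


def sigmoid(z):
    """Continuous activation that squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias=0.0, activation=sigmoid):
    """Multiply inputs by weights, sum them (plus bias), then activate."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(weighted_sum)


# 1.0 * 0.5 + 2.0 * -0.25 = 0.0, and sigmoid(0.0) = 0.5
output = neuron(inputs=[1.0, 2.0], weights=[0.5, -0.25])
print(output)
```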

#### Weight:

This is the amount of positive or negative effect an input has on the final output.

#### Activation Function:

The activation function transforms the result after inputs, weights, and biases have been combined within the neuron, often squashing it into a fixed range (for example, 0 to 1 for the sigmoid).

#### Node Map:

The node map shows how the features of the dataframe, or the outputs of earlier neurons, are further processed throughout the neural network. It visualizes the inputs, hidden layers, and outputs at a high level.

#### Perceptron:

Unlike a general neuron, a perceptron uses a binary activation function that either activates or does not. Simply put, it consists of four distinct parts:

    Inputs
    Weights
    Weighted Sum
    Activation Function (Output)

Perceptrons classify data into two classes (0, 1) most of the time. Perceptrons are also known as linear binary classifiers.
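
The four parts can be put together in a short from-scratch sketch. This is an assumed, minimal version of the classic perceptron learning rule trained on the logical AND gate; the names `train_perceptron` and `predict`, the learning rate of 1, and the AND data are illustrative and do not come from the notebook.

```python
def step(z):
    """Binary activation: fire (1) when the weighted sum is non-negative."""
    return 1 if z >= 0 else 0


def predict(x, weights, bias):
    """Weighted sum of the inputs followed by the step activation."""
    return step(sum(xi * wi for xi, wi in zip(x, weights)) + bias)


def train_perceptron(samples, labels, lr=1, epochs=20):
    """Perceptron rule: nudge weights by lr * error * input after each sample."""
    weights = [0] * len(samples[0])
    bias = 0
    for _ in range(epochs):
        for x, target in zip(samples, labels):
            error = target - predict(x, weights, bias)
            weights = [wi + lr * error * xi for xi, wi in zip(x, weights)]
            bias += lr * error
    return weights, bias


# Logical AND is linearly separable, so a single perceptron can learn it
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 0, 0, 1]
weights, bias = train_perceptron(X, y)
predictions = [predict(x, weights, bias) for x in X]
print(predictions)  # → [0, 0, 0, 1]
```

A problem that is not linearly separable, such as XOR, cannot be solved this way, which is one motivation for adding hidden layers.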


#### Inputs -> Outputs
Explain the flow of information through a neural network from inputs to outputs. Be sure to include: inputs, weights, bias, and activation functions. How does it all flow from beginning to end?

Depending on your network, the number of inputs and outputs can vary arbitrarily. Each input comes either from a neuron in the previous layer or from the initial values in the dataframe. Each input is multiplied by a weight, which can be negative or positive depending on whether that input should push the neuron away from or toward activating. The weighted inputs are summed, a bias is added that shifts the activation curve, and the activation function turns the result into the neuron's output, which flows on to the next layer until it reaches the final output.
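
To make that flow concrete, here is a small sketch of a forward pass through one hidden layer and one output layer, including the bias term. The layer sizes, weights, and the helper names `dense` and `feed_forward` are made up for illustration; the classes later in this notebook do the same thing with NumPy and without a bias.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases):
    """One fully connected layer: weights[j] holds the input weights of neuron j."""
    return [
        sigmoid(sum(x * w for x, w in zip(inputs, neuron_weights)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]


def feed_forward(inputs, layers):
    """Pass the activations of each layer on as the inputs of the next one."""
    activations = inputs
    for weights, biases in layers:
        activations = dense(activations, weights, biases)
    return activations


# Hypothetical network: 2 inputs -> 3 hidden neurons -> 1 output neuron
layers = [
    ([[0.5, -0.5], [1.0, 1.0], [0.0, 0.0]], [0.0, 0.0, 0.0]),
    ([[1.0, -1.0, 0.0]], [0.0]),
]
prediction = feed_forward([1.0, 1.0], layers)
print(prediction)
```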


### Imports

In [5]:
!pip install category-encoders



In [101]:
import numpy as np
import pandas as pd
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.metrics import mean_squared_error
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import MinMaxScaler
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV


from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.datasets import mnist
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier

import category_encoders as ce

In [7]:
#Load Data
df = sns.load_dataset('tips')
df.describe()

| | total_bill | tip | size |
|---|---|---|---|
| count | 244.0 | 244.0 | 244.0 |
| mean | 19.785943 | 2.998279 | 2.569672 |
| std | 8.902412 | 1.383638 | 0.9511 |
| min | 3.07 | 1.0 | 1.0 |
| 25% | 13.3475 | 2.0 | 2.0 |
| 50% | 17.795 | 2.9 | 2.0 |
| 75% | 24.1275 | 3.5625 | 3.0 |
| max | 50.81 | 10.0 | 6.0 |


In [8]:
print(df.shape)
df.head()

(244, 7)


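| | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
| 2 | 21.01 | 3.5 | Male | No | Sun | Dinner | 3 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |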


In [78]:
def prep(df, target):
    
    """
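    This function will:
    1. Change "size" into a categorical (string) column to be one-hot encoded
    2. Add total_bill and tip and put the sum into 3 bins ('bill_tip_sum')
    3. Add a 'tip_pct' column (tip / total_bill)
    4. Split the data into train and test sets
    5. Create X and y train/test
    6. Process X train/test by one-hot encoding categoricals, imputing, and scaling
    7. Map the 'sex' target to a binary column (Female=0, Male=1)
    8. Return X_train, y_train, X_test, y_test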
    """
    df['size'] = df['size'].astype(str)
    df['bill_tip_sum'] = pd.qcut(df['total_bill']+df['tip'], 3, labels=['low', 'medium', 'high'])
    df['tip_pct'] = df['tip']/df['total_bill']
    
    training, testing = train_test_split(df, test_size=.2)
    
    X_train = training.drop(columns=target)
    y_train = training[target]
    X_test = testing.drop(columns=target)
    y_test = testing[target]
    
    processor = make_pipeline(
        ce.OneHotEncoder(use_cat_names=True),  
        SimpleImputer(strategy='median'),
        StandardScaler()
    )
    
    gender = {'Female': 0, 'Male': 1}
    y_train = y_train.map(gender)
    y_test = y_test.map(gender)
    
    X_process_train = processor.fit_transform(X_train)
    X_process_test = processor.transform(X_test)
    
    return X_process_train,y_train, X_process_test, y_test

In [80]:
X_train, y_train, X_test, y_test = prep(df, 'sex')
print(X_train.shape) 
print(X_test.shape) 
print(y_train.shape) 
print(y_test.shape)
# X_train.head()

(195, 20)
(49, 20)
(195,)
(49,)


In [81]:
class NNet:
    def __init__(self):
        
        # Inputs must be == to number of features
        self.inputs = 20
        # Only one output node b/c only trying to predict one thing
        self.outputNodes = 1
        
        self.weights = np.random.rand(self.inputs, self.outputNodes)
     
    # Squishify
    def sigmoid(self, s):
        return 1 / (1+np.exp(-s))
    
    # Create 0 or 1 from predicted activated output
    def binary(self, X):
        binary = self.feed_forward(X)
        binary = [1 if x > .9999 else 0 for x in binary]
        return binary
    
     
    def feed_forward(self, X):
        """Calculate the NNet inference using the feed forward, aka predict """
        
        # Combining  inputs and weights in a weighted sum
        self.input_sum = np.dot(X, self.weights)
        
        # Apply activation function to the weighted sum
        self.output_activated = self.sigmoid(self.input_sum)
        
        return self.output_activated

In [82]:
nn = NNet()

In [83]:
y_pred1 = nn.binary(X_train)
score = accuracy_score(y_train, y_pred1)

y_pred2 = nn.binary(X_test)
score2 = accuracy_score(y_test, y_pred2)

print(f"Mean baseline for our target(Males) is {round(df['sex'].value_counts(normalize=True)[0]*100, 2)}%")
print(f"The accuracy of the train is {round(score*100, 2)}%")
print(f"The accuracy of the test is {round(score2*100, 2)}%")

Mean baseline for our target(Males) is 64.34%
The accuracy of the train is 35.9%
The accuracy of the test is 34.69%


# Day 2

### Learning Objectives
- Explain the intuition behind backpropagation
- Implement gradient descent + backpropagation on a feedforward neural network
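
As intuition for the gradient descent part of these objectives, the sketch below minimizes a one-dimensional loss by repeatedly stepping against its gradient. The loss function, the `learning_rate` of 0.1, and the helper name `gradient_descent` are assumptions chosen for illustration; backpropagation is what supplies the gradient for every weight in a real network.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Repeatedly move against the gradient to reduce the loss."""
    w = start
    for _ in range(steps):
        w -= learning_rate * grad(w)
    return w


# Loss f(w) = (w - 3)^2 has gradient 2 * (w - 3) and its minimum at w = 3
w_min = gradient_descent(grad=lambda w: 2 * (w - 3), start=0.0)
print(round(w_min, 4))
```

Each step shrinks the distance to the minimum by a constant factor here; too large a learning rate would instead overshoot and diverge.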

In [84]:
# I want activations that correspond to negative weights to be lower
# and activations that correspond to positive weights to be higher

class NNetbackprop:
    def __init__(self):
        # Set up Architecture of Neural Network
        self.inputs = 20
        self.hiddenNodes = 3
        self.outputNodes = 1

        # Initial Weights
        # 20x3 Matrix Array for the First Layer (inputs x hidden nodes)
        self.weights1 = np.random.rand(self.inputs, self.hiddenNodes)
       
        # 3x1 Matrix Array for Hidden to Output
        self.weights2 = np.random.rand(self.hiddenNodes, self.outputNodes)
        
    def sigmoid(self, s):
        return 1 / (1+np.exp(-s))
    
    def sigmoidPrime(self, s):
        # s is expected to be an already-activated value, i.e. sigmoid(z)
        return s * (1 - s)
    
    def feed_forward(self, X):
        """
        Calculate the NN inference using feed forward.
        aka "predict"
        """
        
        # Weighted sum of inputs => hidden layer
        self.hidden_sum = np.dot(X, self.weights1)
        
        # Activations of weighted sum
        self.activated_hidden = self.sigmoid(self.hidden_sum)
        
        # Weight sum between hidden and output
        self.output_sum = np.dot(self.activated_hidden, self.weights2)
        
        # Final activation of output
        self.activated_output = self.sigmoid(self.output_sum)
        
        return self.activated_output
        
    def backward(self, X,y,o):
        """
        Backward propagate through the network
        """
        
        # Error in Output
        # Calculate the error, the difference between true y value and the predicted
        self.o_error = y - o
        
        # Apply Derivative of Sigmoid to error
        # How far off are we in relation to the Sigmoid f(x) of the output
        # ^- aka hidden => output
        # Which direction do we want to go 
        self.o_delta = self.o_error * self.sigmoidPrime(o)
        
        # z2 error
        # Applying the o_delta correction to the transposed weights2
        self.z2_error = self.o_delta.dot(self.weights2.T)
        
        # How much of that "far off" can be explained by the input => hidden
        # Apply sigmoid derivative to the error
        self.z2_delta = self.z2_error * self.sigmoidPrime(self.activated_hidden)
        
        # Adjustment to first set of weights (input => hidden)
        # Applying adjustments to the weights
        self.weights1 += X.T.dot(self.z2_delta)
        
        # Adjustment to second set of weights (hidden => output)
        # Applying adjustments to the weights
        self.weights2 += self.activated_hidden.T.dot(self.o_delta)
        

    def train(self, X, y):
        o = self.feed_forward(X)
        self.backward(X,y,o)
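
One detail worth checking in the class above: `sigmoidPrime` is applied to values that have already been passed through the sigmoid (`o` and `self.activated_hidden`), so `s * (1 - s)` is the sigmoid's derivative written in terms of its output. The sketch below is an illustrative check against a numerical derivative; the helper names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_prime_from_activation(a):
    """Derivative of the sigmoid expressed through its output a = sigmoid(z)."""
    return a * (1.0 - a)


def numeric_derivative(f, z, eps=1e-6):
    """Central finite difference approximation of f'(z)."""
    return (f(z + eps) - f(z - eps)) / (2 * eps)


z = 0.7
analytic = sigmoid_prime_from_activation(sigmoid(z))
numeric = numeric_derivative(sigmoid, z)
print(abs(analytic - numeric) < 1e-8)
```

Passing the raw weighted sum `z` instead of `sigmoid(z)` to such a function would give a different, incorrect value, which is an easy mistake to make when reusing the method.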

### Load Data

In [85]:
def y_input(y):
    # Wrap each target value in its own 1-element array (one row per observation)
    y_list = []
    for value in y:
        new = np.array([value])
        y_list.append(new)
    return y_list

In [86]:
nnbp = NNetbackprop()

In [87]:
ytrain = y_input(y_train)
nnbp.train(X_train, ytrain)

### Backpropagation

In [88]:
# ---1st ERROR---
# Apply sigmoid derivative to the error
# Which direction do we want to go
# self.o_delta = self.o_error * sigmoidprime(o)
#How much more sigmoid activation would have pushed us towards the right answer

nnbp.o_error

array([[ 0.41844722],
       [-0.69862933],
       [-0.6079635 ],
       [ 0.35260327],
       [ 0.41625736],
       [-0.5826915 ],
       [-0.73659235],
       [ 0.44601459],
       [ 0.331906  ],
       [ 0.32038647],
       [ 0.26706285],
       [-0.64643893],
       [ 0.45960855],
       [ 0.46484187],
       [ 0.45904129],
       [ 0.2805177 ],
       [ 0.36751556],
       [ 0.36301886],
       [ 0.39169574],
       [ 0.44055137],
       [ 0.26223126],
       [-0.56817841],
       [-0.62874613],
       [ 0.32504236],
       [-0.51879803],
       [ 0.38685828],
       [ 0.32794122],
       [-0.66972359],
       [-0.63095203],
       [ 0.39646249],
       [-0.64476748],
       [-0.66598358],
       [ 0.26139817],
       [-0.58956706],
       [-0.65850239],
       [-0.72438496],
       [-0.63668481],
       [-0.61447721],
       [-0.67596328],
       [ 0.40657945],
       [-0.62242337],
       [ 0.3781949 ],
       [-0.6131646 ],
       [ 0.43546549],
       [-0.59685576],
       [ 0

In [89]:
# Apply sigmoid derivative to the error
# Which direction do we want to go
# self.o_delta = self.o_error * sigmoidprime(o)

nnbp.o_delta

array([[ 0.10182877],
       [-0.14709388],
       [-0.14490438],
       [ 0.08049023],
       [ 0.1011452 ],
       [-0.1416885 ],
       [-0.14291664],
       [ 0.11020377],
       [ 0.0735983 ],
       [ 0.06976062],
       [ 0.05227496],
       [-0.14774726],
       [ 0.1141523 ],
       [ 0.11563588],
       [ 0.11399023],
       [ 0.05661619],
       [ 0.08542821],
       [ 0.08394309],
       [ 0.09332942],
       [ 0.10858087],
       [ 0.05073284],
       [-0.13940354],
       [-0.14676471],
       [ 0.07131099],
       [-0.12951618],
       [ 0.09176238],
       [ 0.07227686],
       [-0.14813877],
       [-0.14691817],
       [ 0.09486554],
       [-0.14767908],
       [-0.14814768],
       [ 0.05046793],
       [-0.14266211],
       [-0.14808204],
       [-0.14462446],
       [-0.14727619],
       [-0.14556656],
       [-0.14806092],
       [ 0.09809648],
       [-0.14627728],
       [ 0.08893764],
       [-0.14543882],
       [ 0.10705279],
       [-0.14361481],
       [ 0

In [90]:
# z2 error
# Applying the o_delta correction to the transposed weights2
# These are the errors from the output to the hidden layer

nnbp.z2_error

array([[ 0.02007052,  0.04915294,  0.05494625],
       [-0.02899231, -0.0710025 , -0.07937105],
       [-0.02856075, -0.06994562, -0.07818961],
       [ 0.01586468,  0.03885279,  0.04343209],
       [ 0.01993579,  0.04882298,  0.05457739],
       [-0.0279269 , -0.06839331, -0.07645434],
       [-0.02816897, -0.06898614, -0.07711703],
       [ 0.02172124,  0.05319557,  0.05946535],
       [ 0.01450628,  0.03552604,  0.03971324],
       [ 0.01374987,  0.03367359,  0.03764245],
       [ 0.01030343,  0.02523322,  0.02820728],
       [-0.02912109, -0.07131789, -0.07972361],
       [ 0.0224995 ,  0.05510153,  0.06159595],
       [ 0.02279191,  0.05581766,  0.06239649],
       [ 0.02246755,  0.0550233 ,  0.0615085 ],
       [ 0.01115909,  0.02732874,  0.03054979],
       [ 0.01683796,  0.04123636,  0.04609659],
       [ 0.01654524,  0.04051949,  0.04529523],
       [ 0.01839529,  0.04505029,  0.05036004],
       [ 0.02140137,  0.0524122 ,  0.05858964],
       [ 0.00999948,  0.02448884,  0.027

In [91]:
# How much on the sigmoid curve we want to move
# Being the delta this is the direction we will be traveling
# For each observation, how much more sigmoid activation from this layer would have 
# pushed us towards the right answer?

nnbp.z2_delta

array([[ 3.90084935e-03,  8.77052728e-03,  6.34503161e-03],
       [-5.75150323e-03, -1.38511075e-02, -1.26231357e-02],
       [-6.55283472e-03, -1.67401296e-02, -1.72340641e-02],
       [ 3.51049728e-03,  9.67509111e-03,  1.08188539e-02],
       [ 4.97971873e-03,  1.11183885e-02,  5.96726492e-03],
       [-1.92449945e-03, -1.68695416e-02, -1.20448397e-02],
       [-2.79957544e-03, -1.18414674e-02, -7.92737456e-03],
       [ 4.35621028e-03,  1.11605049e-02,  1.86548498e-03],
       [ 3.46260322e-03,  8.58221781e-03,  9.32461782e-03],
       [ 2.31419665e-03,  8.40580568e-03,  3.81906714e-03],
       [ 4.48411867e-04,  5.95147152e-03,  8.55006491e-04],
       [-6.00868183e-03, -1.63173230e-02, -1.61344761e-02],
       [ 3.47975751e-03,  1.00096576e-02,  1.01748286e-03],
       [ 3.72950676e-03,  8.41702234e-03,  1.23763083e-03],
       [ 3.34915783e-03,  6.90246916e-03,  5.83592429e-03],
       [ 2.71106706e-03,  6.27036989e-03,  1.26546969e-03],
       [ 4.14615878e-03,  9.40709563e-03

In [92]:
#Calculation to update the weights
X_train.T.dot(nnbp.z2_delta)

array([[ 0.02788001,  0.19075253,  0.21020698],
       [-0.06783556, -0.00155953,  0.02953201],
       [-0.00820204,  0.03250527,  0.04967572],
       [ 0.00820204, -0.03250527, -0.04967572],
       [-0.13144472, -0.43601746, -0.19357505],
       [-0.07724362, -0.17784679, -0.06411615],
       [ 0.04513977,  0.15804689,  0.06183662],
       [ 0.12372191,  0.3577006 ,  0.15913622],
       [-0.13154338, -0.44449755, -0.16709347],
       [ 0.13154338,  0.44449755,  0.16709347],
       [ 0.05212866,  0.14772791,  0.07075208],
       [-0.02270668,  0.04185197,  0.07547588],
       [-0.02877886, -0.09280064, -0.05179312],
       [-0.0081627 , -0.00398685, -0.0761025 ],
       [-0.02696442, -0.0861027 ,  0.00647607],
       [-0.01329646, -0.3197288 , -0.27259809],
       [-0.0027848 , -0.10905463, -0.27388632],
       [-0.04600571, -0.08631566,  0.06728955],
       [ 0.04911008,  0.1951433 ,  0.20391662],
       [-0.13247509, -0.26221751, -0.26276935]])

In [93]:
# Update hidden layer weights

nnbp.activated_hidden.T.dot(nnbp.o_delta)

array([[ 0.23030452],
       [-0.75013359],
       [-1.12882874]])

In [94]:
# Train my 'net
nnbp = NNetbackprop()

# Number of Epochs / Iterations
for i in range(10000):
    if (i+1 in [1,2,3,4,5]) or ((i+1) % 1000 ==0):
        print('+' + '---' * 3 + f'EPOCH {i+1}' + '---'*3 + '+')
        print('Input: \n', X_train)
        print('Actual Output: \n', ytrain)
        print('Predicted Output: \n', str(nnbp.feed_forward(X_train)))
        print("Loss: \n", str(np.mean(np.square(ytrain - nnbp.feed_forward(X_train)))))
    nnbp.train(X_train,ytrain)

+---------EPOCH 1---------+
Input: 
 [[ 0.4402151  -0.72944965  1.35132785 ... -0.72348981  1.41421356
  -1.2189249 ]
 [-0.74297692  0.28119269  1.35132785 ... -0.72348981 -0.70710678
   1.54036825]
 [ 0.51982157  0.46938126 -0.74001287 ... -0.72348981  1.41421356
  -0.17664106]
 ...
 [-0.7996829  -1.37765474  1.35132785 ... -0.72348981 -0.70710678
  -1.19333871]
 [-0.67972795 -0.05336478 -0.74001287 ... -0.72348981 -0.70710678
   0.83309886]
 [-0.566316   -0.77823928 -0.74001287 ... -0.72348981 -0.70710678
  -0.48576948]]
Actual Output: 
 [array([1]), array([0]), array([0]), array([1]), array([1]), array([0]), array([0]), array([1]), array([1]), array([1]), array([1]), array([0]), array([1]), array([1]), array([1]), array([1]), array([1]), array([1]), array([1]), array([1]), array([1]), array([0]), array([0]), array([1]), array([0]), array([1]), array([1]), array([0]), array([0]), array([1]), array([0]), array([0]), array([1]), array([0]), array([0]), array([0]), array([0]), array([0]

# Day 3

### Learning Objectives
- Introduce the Keras Sequential Model API
- Learn How to Select Model Architecture
- Discuss the trade-off between various activation functions

In [95]:
#Model
model = Sequential()

#Input
model.add(Dense(16, input_dim=20, activation='relu'))

#Hidden
model.add(Dense(32, kernel_initializer='normal', activation='relu'))
model.add(Dense(32, kernel_initializer='normal', activation='relu'))

#Output
model.add(Dense(1, activation='linear'))

#Compile
model.compile(loss='mean_squared_error',
              metrics=['mean_squared_error'],
              optimizer='adam')

#Fit & Evaluate
history = model.fit(X_train, y_train, epochs=100, verbose=False, validation_split=.1)
scores = model.evaluate(X_test, y_test, verbose=0)

In [96]:
print(f"The MSE of our neural net is {scores[1]}")
print(f"The RMSE of our neural net is {round(np.sqrt(scores[1]), 2)}")

The MSE of our neural net is 0.2909967005252838
The RMSE of our neural net is 0.5400000214576721


### Activation Functions

#### Step Function

#### Linear Function

#### Sigmoid Function

#### Tanh Function

#### ReLu Function

#### Leaky ReLu
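
The six functions named in the headings above can be sketched in a few lines of plain Python. These are the standard textbook definitions; the `alpha=0.01` slope for Leaky ReLU is a common default chosen here as an assumption, and the step threshold at 0 is likewise a convention.

```python
import math


def step(z):
    """Binary: 1 when z is non-negative, otherwise 0."""
    return 1.0 if z >= 0 else 0.0


def linear(z):
    """Identity: passes the weighted sum through unchanged."""
    return z


def sigmoid(z):
    """Squashes z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def tanh(z):
    """Squashes z into (-1, 1), centered on 0."""
    return math.tanh(z)


def relu(z):
    """Zero for negative z, identity for positive z."""
    return max(0.0, z)


def leaky_relu(z, alpha=0.01):
    """Like ReLU, but keeps a small slope alpha for negative z."""
    return z if z > 0 else alpha * z


for z in (-2.0, 0.0, 2.0):
    print(z, step(z), linear(z), round(sigmoid(z), 3),
          round(tanh(z), 3), relu(z), leaky_relu(z))
```

The trade-off in short: step has no useful gradient, sigmoid and tanh saturate for large inputs, and ReLU variants keep gradients alive for positive inputs at very low cost.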

# Day 4

### Learning Objectives
- Describe the major hyperparameters to tune
- Implement an experiment tracking framework
- Search the hyperparameter space using RandomSearch (Optional)
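
For the optional random search objective, the core idea is to sample a fixed number of combinations instead of trying all of them. The sketch below shows only the sampling step in plain Python; `sample_params`, the seed, and the example value lists are hypothetical, and in practice a tool such as scikit-learn's `RandomizedSearchCV` also handles the fitting and cross-validation.

```python
import random


def sample_params(param_space, n_iter, seed=7):
    """Draw n_iter random hyperparameter combinations from the given lists."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return [
        {name: rng.choice(values) for name, values in param_space.items()}
        for _ in range(n_iter)
    ]


# Hypothetical search space: 6 * 3 * 3 = 54 combinations in a full grid
param_space = {
    "batch_size": [10, 20, 40, 60, 80, 100],
    "epochs": [20, 40, 60],
    "dropout": [0.0, 0.1, 0.2],
}
candidates = sample_params(param_space, n_iter=5)
for params in candidates:
    print(params)
```

Here only 5 of the 54 possible combinations would be fitted, trading exhaustive coverage for a much smaller compute budget.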

In [100]:
# Important Hyperparameters
inputs = X_train.shape[1]
epochs = 75
batch_size = 10


# Create Model
model = Sequential()
model.add(Dense(64, activation='relu', input_shape=(inputs,)))
model.add(Dense(64, activation='relu'))
model.add(Dense(1))

# Compile Model
model.compile(optimizer='adam', loss='mse', metrics=['mse', 'mae'])

# Fit Model
model.fit(X_train, y_train, 
          validation_data=(X_test,y_test), 
          epochs=epochs, 
          batch_size=batch_size
         )

Train on 195 samples, validate on 49 samples
Epoch 1/75
...
Epoch 75/75


<tensorflow.python.keras.callbacks.History at 0x7fd0e4c8d358>

In [106]:
# fix random seed for reproducibility
seed = 7
np.random.seed(seed)

# Replaced with tips dataset
# # load dataset
# url ="https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv"

# dataset = pd.read_csv(url, header=None).values

# # split into input (X) and output (Y) variables
# X = dataset[:,0:8]
# Y = dataset[:,8]

# Function to create model, required for KerasClassifier
def create_model():
    # create model
    model = Sequential()
    model.add(Dense(12, input_dim=20, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    # Compile model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

# create model
model = KerasClassifier(build_fn=create_model, verbose=0)

# define the grid search parameters
# batch_size = [10, 20, 40, 60, 80, 100]
# param_grid = dict(batch_size=batch_size, epochs=epochs)

# define the grid search parameters
param_grid = {'batch_size': [10, 20, 40, 60, 80, 100],
              'epochs': [20]}

# Create Grid Search
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1)
grid_result = grid.fit(X_train, y_train)

# Report Results
print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}") 



Best: 0.5948718190193176 using {'batch_size': 10, 'epochs': 20}
Means: 0.5948718190193176, Stdev: 0.0691832654313919 with: {'batch_size': 10, 'epochs': 20}
Means: 0.5794872045516968, Stdev: 0.05936326262982861 with: {'batch_size': 20, 'epochs': 20}
Means: 0.5589743852615356, Stdev: 0.014504753621327159 with: {'batch_size': 40, 'epochs': 20}
Means: 0.5794872045516968, Stdev: 0.026148816459833527 with: {'batch_size': 60, 'epochs': 20}
Means: 0.5589743852615356, Stdev: 0.0725237681066358 with: {'batch_size': 80, 'epochs': 20}
Means: 0.5589743753274282, Stdev: 0.09428091042619949 with: {'batch_size': 100, 'epochs': 20}


In [None]:
# define the grid search parameters
param_grid = {'batch_size': [20],
              'epochs': [20, 40, 60,200]}

# Create Grid Search
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1)
grid_result = grid.fit(X_train, y_train)

# Report Results
print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}")



## Stretch Goals:

- Try to implement Random Search Hyperparameter Tuning on this dataset
- Try to implement Bayesian Optimization tuning on this dataset using hyperas or hyperopt (if you're brave)
- Practice hyperparameter tuning other datasets that we have looked at. How high can you get MNIST? Above 99%?
- Study for the Sprint Challenge
 - Can you implement both perceptron and MLP models from scratch with forward and backpropagation?
 - Can you implement both perceptron and MLP models in keras and tune their hyperparameters with cross validation?