# Neural Networks Sprint Challenge

## 1) Define the following terms:

- Neuron
- Input Layer
- Hidden Layer
- Output Layer
- Activation
- Backpropagation

 ## YOUR ANSWER HERE
    
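### NEURON - The basic computational unit of a neural network, loosely modeled on a biological brain cell. It takes a weighted sum of its inputs (plus a bias), passes that sum through an activation function, and sends the result on to the next layer.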

### INPUT LAYER - The layer that receives input from the dataset. It is also called the visible layer because it is the only layer that interacts directly with our data.

### HIDDEN LAYER - All the layers after the input layer and before the output layer.  They are hidden in that we don't get to directly interact with them.

### OUTPUT LAYER - The final layer of the neural network, which outputs a vector of values suitable for the problem we're trying to solve.  In a classification problem the output layer usually applies an activation function such as sigmoid (binary) or softmax (multi-class) so the values can be read as probabilities.

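### ACTIVATION - A function applied to a neuron's weighted sum that decides how much signal to pass on to the next layer.  One example is the step function, which outputs 1 for any input greater than or equal to 0 and 0 otherwise (some formulations use -1 instead of 0). Smooth functions such as sigmoid are used when the derivative is needed for training.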

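### BACKPROPAGATION - The algorithm used to train a neural network. A forward pass first computes the network's output from the current weights in the manner of a feed forward algorithm. The error at the output layer is then propagated backwards, layer by layer, using partial derivatives and the Chain Rule of calculus to work out how much each weight contributed to the error so it can be updated.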
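
To make the Chain Rule step in the backpropagation definition concrete, here is a minimal sketch for a single sigmoid neuron. The input, weight, and target values are hypothetical and chosen only for illustration, and the loss is assumed to be half the squared error. The analytic gradient from the chain rule is compared against a finite-difference estimate.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical toy values, chosen only for illustration.
x = np.array([1.0, 0.0, 1.0])    # inputs to the neuron
w = np.array([0.5, -0.3, 0.1])   # current weights
y = 1.0                          # target output

def loss(weights):
    # Half squared error of the neuron's output (an assumed loss).
    a = sigmoid(np.dot(x, weights))
    return 0.5 * (y - a) ** 2

# Forward pass: weighted sum, then activation.
z = np.dot(x, w)
a = sigmoid(z)

# Backward pass via the chain rule: dL/dw = dL/da * da/dz * dz/dw.
dL_da = a - y            # derivative of 0.5 * (y - a)^2 with respect to a
da_dz = a * (1.0 - a)    # derivative of sigmoid, written in terms of its output
dz_dw = x                # derivative of the weighted sum with respect to w
analytic_grad = dL_da * da_dz * dz_dw

# Central finite differences as an independent check of the gradient.
eps = 1e-6
numeric_grad = np.array([
    (loss(w + eps * np.eye(3)[i]) - loss(w - eps * np.eye(3)[i])) / (2 * eps)
    for i in range(3)
])

print(np.allclose(analytic_grad, numeric_grad, atol=1e-8))
```

Note that the weight attached to the zero-valued input receives a zero gradient: a weight can only be blamed for the error in proportion to the signal that passed through it.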

## 2) Create a perceptron class that can model the behavior of an AND gate. You can use the following table as your training data:

| x1 | x2 | x3 | y |
|----|----|----|---|
| 1  | 1  | 1  | 1 |
| 1  | 0  | 1  | 0 |
| 0  | 1  | 1  | 0 |
| 0  | 0  | 1  | 0 |

In [11]:
##### Your Code Here #####
import numpy as np
np.random.seed(42)

inputs = np.array([[1, 1, 1],
                  [1, 0, 1],
                  [0, 1, 1],
                  [0, 0, 1]])
correct_outputs = [[1],
                  [0],
                  [0],
                  [0]]

# sigmoid activation function:
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    return sigmoid(x) * (1 - sigmoid(x))

# initialize some random weights
weights = 2 * np.random.random((3, 1)) - 1
weights

array([[-0.25091976],
       [ 0.90142861],
       [ 0.46398788]])

In [12]:
# calculate weighted sum of inputs and weights
weighted_sum = np.dot(inputs, weights)
weighted_sum

array([[1.11449673],
       [0.21306812],
       [1.3654165 ],
       [0.46398788]])

In [13]:
# output the activated value for the end of 1 training epoch
activated_output = sigmoid(weighted_sum)
activated_output

array([[0.75296649],
       [0.55306642],
       [0.79663861],
       [0.61395979]])

In [14]:
# Error for our first pass
error = correct_outputs - activated_output
error

array([[ 0.24703351],
       [-0.55306642],
       [-0.79663861],
       [-0.61395979]])

In [15]:
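# Now we take our errors and update the weights accordingly,
# using the derivative of the sigmoid function to scale the weight updates.
# Note: activated_output has already been through sigmoid, so this call applies
# sigmoid a second time. The exact derivative would be
# activated_output * (1 - activated_output). The outputs shown come from the code as written.
adjustments = error * sigmoid_derivative(activated_output)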
adjustments

array([[ 0.05377007],
       [-0.12820984],
       [-0.17062609],
       [-0.13988803]])

In [16]:
# update weights
weights += np.dot(inputs.T, adjustments)
weights

array([[-0.32535954],
       [ 0.78457259],
       [ 0.07903399]])

In [17]:
# Putting it all together:
for iteration in range(10000):
  # Weighted sum of inputs and weights
  weighted_sum = np.dot(inputs, weights)
  # Activate with sigmoid function
  activated_output = sigmoid(weighted_sum)
  # Calculate Error
  error = correct_outputs - activated_output
  # Calculate weight adjustments with sigmoid_derivative
  adjustments = error * sigmoid_derivative(activated_output)
  # Update weights
  weights += np.dot(inputs.T, adjustments)
    
print('optimized weights after training: ')
print(weights)

print("Output After Training:")
print(activated_output)

optimized weights after training: 
[[ 11.83991275]
 [ 11.83991275]
 [-18.04817835]]
Output After Training:
[[9.96429763e-01]
 [2.00888375e-03]
 [2.00888375e-03]
 [1.45179935e-08]]
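
The prompt asks for a perceptron class, while the cells above train the weights procedurally. The same steps could be wrapped in a class along the following lines. This is a sketch rather than the graded solution: the class name `Perceptron`, its method names, and the `seed` parameter are assumptions, and it uses the exact sigmoid derivative `output * (1 - output)` instead of calling `sigmoid_derivative` on the activated output, so the learned weights will differ from those printed above.

```python
import numpy as np

class Perceptron:
    """Single-layer perceptron with a sigmoid activation (illustrative sketch)."""

    def __init__(self, n_inputs, seed=42):
        # Random initial weights in [-1, 1), one per input column.
        rng = np.random.default_rng(seed)
        self.weights = rng.uniform(-1.0, 1.0, size=(n_inputs, 1))

    @staticmethod
    def _sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def predict_proba(self, X):
        # Weighted sum of inputs and weights, then sigmoid activation.
        return self._sigmoid(np.dot(X, self.weights))

    def fit(self, X, y, iterations=10000):
        for _ in range(iterations):
            output = self.predict_proba(X)
            error = y - output
            # Scale the error by the sigmoid derivative at the activated output.
            adjustments = error * output * (1.0 - output)
            self.weights += np.dot(X.T, adjustments)
        return self

    def predict(self, X):
        # Threshold the activation at 0.5 to get a 0/1 gate output.
        return (self.predict_proba(X) >= 0.5).astype(int)

# AND gate training data; the third column of ones acts as a bias input.
and_inputs = np.array([[1, 1, 1],
                       [1, 0, 1],
                       [0, 1, 1],
                       [0, 0, 1]])
and_targets = np.array([[1], [0], [0], [0]])

gate = Perceptron(n_inputs=3).fit(and_inputs, and_targets)
print(gate.predict(and_inputs).ravel())
```

Keeping `predict_proba` separate from `predict` makes it possible to inspect the raw activations, as the cells above do, while still exposing a thresholded 0/1 answer for the gate.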


## 3) Implement a Neural Network Multilayer Perceptron class that uses backpropagation to update the network's weights. 
- Your network must have one hidden layer. 
- You do not have to update weights via gradient descent. You can use something like the derivative of the sigmoid function to update weights.
- Train your model on the Heart Disease dataset from UCI:

[Github Dataset](https://github.com/ryanleeallred/datasets/blob/master/heart.csv)

[Raw File on Github](https://raw.githubusercontent.com/ryanleeallred/datasets/master/heart.csv)


In [33]:
##### Your Code Here #####
import pandas as pd

df = pd.read_csv('https://raw.githubusercontent.com/ryanleeallred/datasets/master/heart.csv')
df.head()

Unnamed: 0,age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
0,63,1,3,145,233,1,0,150,0,2.3,0,0,1,1
1,37,1,2,130,250,0,1,187,0,3.5,0,0,2,1
2,41,0,1,130,204,0,0,172,0,1.4,2,0,2,1
3,56,1,1,120,236,0,1,178,0,0.8,2,0,2,1
4,57,0,0,120,354,0,1,163,1,0.6,2,0,2,1


In [34]:
df = df.sample(frac=1).reset_index(drop=True)
df.head()

Unnamed: 0,age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
0,58,1,2,112,230,0,0,165,0,2.5,1,1,3,0
1,59,1,3,134,204,0,1,162,0,0.8,2,2,2,0
2,58,0,0,130,197,0,1,131,0,0.6,1,0,2,1
3,56,1,1,130,221,0,0,163,0,0.0,2,0,3,1
4,44,1,0,110,197,0,0,177,0,0.0,2,1,2,0


In [35]:
from sklearn.preprocessing import StandardScaler

# Create X and y arrays
y = df['target'].values.astype('float')
X = df.drop('target', axis=1)
X = X.values.astype('float')

scaler = StandardScaler()

X_scaled = scaler.fit_transform(X)


In [46]:
import numpy as np
np.random.seed(1)

class Neural_Network(object):
    def __init__(self):
        self.inputs = 13
        self.hiddenNodes = 4
        self.outputNodes = 1
        
        # Initialize Weights
        self.L1_weights = np.random.randn(self.inputs, self.hiddenNodes) # 13x4
        self.L2_weights = np.random.randn(self.hiddenNodes, self.outputNodes) # 4x1
        
    def feed_forward(self, X):
        # Weighted sum between input and hidden layer
        self.hidden_sum = np.dot(X, self.L1_weights)
        # Activations of weighted sum
        self.activated_hidden = self.sigmoid(self.hidden_sum)
        # Weighted sum between hidden and output
        self.output_sum = np.dot(self.activated_hidden, self.L2_weights)
        # final activation of output
        self.activated_output = self.sigmoid(self.output_sum)
        return self.activated_output
    
    def sigmoid(self, s):
        return 1 / (1 + np.exp(-s))
    
    def sigmoidPrime(self, s):
        return s * (1 - s)
    
    def backward(self, X, y, o):
        # backward propagate through the network
        self.o_error = y - o # error in output
        self.o_delta = self.o_error*self.sigmoidPrime(o) # applying derivative of sigmoid to output error
        
        self.z2_error = self.o_delta.dot(self.L2_weights.T) # z2 error: how much our hidden layer weights contributed to output error
        self.z2_delta = self.z2_error*self.sigmoidPrime(self.activated_hidden) # applying derivative of sigmoid to z2_error
        
        self.L1_weights += X.T.dot(self.z2_delta) # adjusting first set (input --> hidden) weights
        self.L2_weights += self.activated_hidden.T.dot(self.o_delta) # adjusting second set (hidden --> output) weights
        
    def train(self, X, y):
        o = self.feed_forward(X)
        self.backward(X, y, o)

In [48]:
NN = Neural_Network()

for i in range(10000):
    if i+1 in [1, 2, 3, 4, 5] or (i+1) % 50 == 0:
        print('+--------- EPOCH', i+1, '------------+')
        print("Input: \n", X)
        print("Actual Ouput: \n", y)
        print("Predicted Output: \n" + str(NN.feed_forward(X)))
        print("Loss: \n" + str(np.mean(np.square(y - NN.feed_forward(X))))) # MSE
        print("\n")
    NN.train(X, y)
    
# Normalizing the data and shuffling it did not lower my Loss value

+--------- EPOCH 1 ------------+
Input: 
 [[58.  1.  2. ...  1.  1.  3.]
 [59.  1.  3. ...  2.  2.  2.]
 [58.  0.  0. ...  1.  0.  2.]
 ...
 [39.  0.  2. ...  1.  0.  2.]
 [43.  1.  0. ...  1.  4.  3.]
 [52.  1.  0. ...  2.  2.  3.]]
Actual Ouput: 
 [0. 0. 1. 1. 0. 0. 1. 0. 1. 0. 1. 0. 1. 0. 0. 1. 0. 1. 1. 1. 0. 0. 0. 1.
 1. 0. 0. 1. 1. 1. 1. 1. 0. 0. 1. 0. 1. 0. 0. 0. 0. 0. 1. 0. 1. 1. 0. 0.
 1. 1. 0. 0. 1. 1. 1. 1. 0. 1. 0. 0. 1. 1. 0. 1. 1. 0. 1. 1. 1. 1. 0. 1.
 1. 1. 0. 1. 1. 0. 1. 0. 1. 0. 1. 1. 1. 1. 0. 0. 1. 0. 0. 0. 1. 1. 0. 1.
 0. 1. 1. 1. 1. 1. 0. 1. 0. 1. 0. 0. 1. 1. 1. 0. 0. 1. 1. 1. 1. 1. 0. 1.
 1. 0. 0. 1. 0. 0. 0. 1. 1. 1. 0. 0. 1. 1. 1. 0. 0. 1. 0. 1. 0. 0. 1. 1.
 1. 1. 1. 1. 1. 0. 1. 1. 0. 1. 1. 1. 1. 0. 1. 0. 1. 0. 1. 0. 1. 1. 0. 1.
 0. 1. 0. 1. 1. 0. 1. 1. 1. 1. 0. 1. 1. 1. 1. 0. 0. 1. 0. 1. 1. 1. 1. 0.
 0. 1. 1. 1. 0. 1. 1. 1. 1. 0. 0. 0. 0. 1. 1. 0. 1. 1. 0. 0. 1. 1. 0. 0.
 0. 0. 1. 1. 1. 1. 0. 0. 0. 1. 1. 1. 1. 0. 0. 0. 0. 0. 1. 0. 0. 1. 0. 1.
 0. 0. 1. 0. 1. 1. 0

ValueError: shapes (303,303) and (1,4) not aligned: 303 (dim 1) != 1 (dim 0)
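
The `ValueError` above looks like a broadcasting problem: `y` comes from `df['target'].values`, so it has shape `(303,)`, while `feed_forward` returns shape `(303, 1)`. Subtracting the two yields a `(303, 303)` matrix, which then cannot be multiplied by the `(1, 4)` transpose of `L2_weights`. The sketch below reproduces the shape behaviour on a small stand-in array (5 rows instead of 303, with made-up values) and shows one possible fix, reshaping the targets into a column.

```python
import numpy as np

n_samples = 5  # small stand-in for the 303 rows of the heart dataset

y_flat = np.zeros(n_samples)                 # shape (5,), like df['target'].values
predictions = np.full((n_samples, 1), 0.5)   # shape (5, 1), like feed_forward(X)

# Broadcasting a (5,) array against a (5, 1) array yields a (5, 5) matrix.
bad_error = y_flat - predictions
print(bad_error.shape)   # (5, 5)

# Reshaping the targets into a column keeps one error value per row.
y_column = y_flat.reshape(-1, 1)
good_error = y_column - predictions
print(good_error.shape)  # (5, 1)
```

In the notebook this would mean passing something like `y.reshape(-1, 1)` to `NN.train`. The training loop also passes the unscaled `X` rather than `X_scaled`, so the effect of normalization on the loss has not actually been measured yet. Unscaled inputs are also likely to saturate the sigmoid, so passing `X_scaled` seems worth trying once the shape issue is resolved.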

## 4) Implement a Multilayer Perceptron architecture of your choosing using the Keras library. Train your model and report its baseline accuracy. Then hyperparameter tune at least two parameters and report your model's accuracy. 

- Use the Heart Disease Dataset (binary classification)
- Use an appropriate loss function for a binary classification task
- Use an appropriate activation function on the final layer of your network. 
- Train your model using verbose output for ease of grading.
- Use GridSearchCV to hyperparameter tune your model. (for at least two hyperparameters)
- When hyperparameter tuning, show your work by adding code cells for each new experiment. 
- Report the accuracy for each combination of hyperparameters as you test them so that we can easily see which resulted in the highest accuracy.
- You must hyperparameter tune at least 5 parameters in order to get a 3 on this section.

In [56]:
##### Your Code Here #####
import pandas as pd

df = pd.read_csv('https://raw.githubusercontent.com/ryanleeallred/datasets/master/heart.csv')
df.head()


Unnamed: 0,age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
0,63,1,3,145,233,1,0,150,0,2.3,0,0,1,1
1,37,1,2,130,250,0,1,187,0,3.5,0,0,2,1
2,41,0,1,130,204,0,0,172,0,1.4,2,0,2,1
3,56,1,1,120,236,0,1,178,0,0.8,2,0,2,1
4,57,0,0,120,354,0,1,163,1,0.6,2,0,2,1


In [57]:
df_copy = df.copy()
target = df_copy.pop('target')
y = target.values

X = df_copy.values
X

array([[63.,  1.,  3., ...,  0.,  0.,  1.],
       [37.,  1.,  2., ...,  0.,  0.,  2.],
       [41.,  0.,  1., ...,  2.,  0.,  2.],
       ...,
       [68.,  1.,  0., ...,  1.,  2.,  3.],
       [57.,  1.,  0., ...,  1.,  1.,  3.],
       [57.,  0.,  1., ...,  1.,  1.,  2.]])

### First I build a KerasClassifier without any hyperparameter tuning and run Cross-Validation on it to get a baseline evaluation

In [63]:
import keras
from sklearn.model_selection import StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import GridSearchCV
from keras.wrappers.scikit_learn import KerasClassifier
from keras.models import Sequential
from keras.layers import Dense
from sklearn.preprocessing import StandardScaler

# fix random seed for reproducibility
seed = 42
np.random.seed(seed)

# define 5-fold cross validation test harness
kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)

inputs = X.shape[1]
epochs = 100
batch_size = 10

# baseline model generator for use with the KerasClassifier wrapper for scikit learn
def baseline_model():
    # create model
    model = Sequential()
    model.add(Dense(12, input_dim=inputs, activation='relu'))
    model.add(Dense(8, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    # compile model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

# evaluate model with normalized dataset using a Pipeline
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasClassifier(build_fn=baseline_model, epochs=epochs, batch_size=batch_size, verbose=0)))
pipeline = Pipeline(estimators)
results = cross_val_score(pipeline, X, y, cv=kfold)
print()
print()
print("K-Fold Cross-Validation results -> Mean: {:.2f}, Standard deviation: {:.2f}".format(results.mean(), results.std()))



K-Fold Cross-Validation results -> Mean: 0.81, Standard deviation: 0.02


### Hyperparameter Tuning: Batch Size

In [64]:
pipe = make_pipeline(StandardScaler(), KerasClassifier(build_fn=baseline_model, verbose=3))

param_grid = {'kerasclassifier__batch_size': [10, 20, 40, 60, 80, 100],
             'kerasclassifier__epochs': [20]}

# Create grid search
grid = GridSearchCV(estimator=pipe, param_grid=param_grid, cv=kfold, n_jobs=-1)
grid_result = grid.fit(X, y)

# Report results
print()
print("Best: {:.2f} using {}".format(grid_result.best_score_, grid_result.best_params_))
print()
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}") 



Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20

Best: 0.83 using {'kerasclassifier__batch_size': 10, 'kerasclassifier__epochs': 20}

Means: 0.834983493235245, Stdev: 0.03660535740505067 with: {'kerasclassifier__batch_size': 10, 'kerasclassifier__epochs': 20}
Means: 0.8085808604463885, Stdev: 0.03705298353831973 with: {'kerasclassifier__batch_size': 20, 'kerasclassifier__epochs': 20}
Means: 0.7326732763756226, Stdev: 0.03368473675270844 with: {'kerasclassifier__batch_size': 40, 'kerasclassifier__epochs': 20}
Means: 0.7590759099513391, Stdev: 0.028848009359369646 with: {'kerasclassifier__batch_size': 60, 'kerasclassifier__epochs': 20}
Means: 0.66996699237194, Stdev: 0.062072692809988624 with: {'kerasclassifier__batch_size': 80, 'kerasclassifier__epochs': 20}
Means: 0.7326732639825777, Stdev: 0.0388791065784

### Hyperparameter Tuning: Batch Size and Epochs

In [65]:
pipe = make_pipeline(StandardScaler(), KerasClassifier(build_fn=baseline_model, verbose=3))

param_grid = {'kerasclassifier__batch_size': [20, 40],
             'kerasclassifier__epochs': [20, 40, 60]}

# Create grid search
grid = GridSearchCV(estimator=pipe, param_grid=param_grid, cv=kfold, n_jobs=-1)
grid_result = grid.fit(X, y)

# Report results
print()
print("Best: {:.2f} using {}".format(grid_result.best_score_, grid_result.best_params_))
print()
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}") 



Epoch 1/60
Epoch 2/60
Epoch 3/60
Epoch 4/60
Epoch 5/60
Epoch 6/60
Epoch 7/60
Epoch 8/60
Epoch 9/60
Epoch 10/60
Epoch 11/60
Epoch 12/60
Epoch 13/60
Epoch 14/60
Epoch 15/60
Epoch 16/60
Epoch 17/60
Epoch 18/60
Epoch 19/60
Epoch 20/60
Epoch 21/60
Epoch 22/60
Epoch 23/60
Epoch 24/60
Epoch 25/60
Epoch 26/60
Epoch 27/60
Epoch 28/60
Epoch 29/60
Epoch 30/60
Epoch 31/60
Epoch 32/60
Epoch 33/60
Epoch 34/60
Epoch 35/60
Epoch 36/60
Epoch 37/60
Epoch 38/60
Epoch 39/60
Epoch 40/60
Epoch 41/60
Epoch 42/60
Epoch 43/60
Epoch 44/60
Epoch 45/60
Epoch 46/60
Epoch 47/60
Epoch 48/60
Epoch 49/60
Epoch 50/60
Epoch 51/60
Epoch 52/60
Epoch 53/60
Epoch 54/60
Epoch 55/60
Epoch 56/60
Epoch 57/60
Epoch 58/60
Epoch 59/60
Epoch 60/60

Best: 0.86 using {'kerasclassifier__batch_size': 20, 'kerasclassifier__epochs': 60}

Means: 0.8085808565120886, Stdev: 0.027084999565204528 with: {'kerasclassifier__batch_size': 20, 'kerasclassifier__epochs': 20}
Means: 0.8382838338908583, Stdev: 0.025120703385117696 with: {'kerasclassif

### Hyperparameter Tuning: Activation in First Hidden Layer

In [67]:
def baseline_model(activation='relu'):
    # create model
    model = Sequential()
    model.add(Dense(12, input_dim=inputs, activation=activation))
    model.add(Dense(8, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    # compile model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

pipe = make_pipeline(StandardScaler(), KerasClassifier(build_fn=baseline_model, verbose=3))

param_grid = {'kerasclassifier__batch_size': [20, 40],
             'kerasclassifier__epochs': [20, 40, 60],
             'kerasclassifier__activation': ['softmax', 'relu', 'tanh', 'sigmoid', 'hard_sigmoid', 'linear']}

# Create grid search
grid = GridSearchCV(estimator=pipe, param_grid=param_grid, cv=kfold, n_jobs=-1)
grid_result = grid.fit(X, y)

# Report results
print()
print("Best: {:.2f} using {}".format(grid_result.best_score_, grid_result.best_params_))
print()
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}") 



Epoch 1/60
Epoch 2/60
Epoch 3/60
Epoch 4/60
Epoch 5/60
Epoch 6/60
Epoch 7/60
Epoch 8/60
Epoch 9/60
Epoch 10/60
Epoch 11/60
Epoch 12/60
Epoch 13/60
Epoch 14/60
Epoch 15/60
Epoch 16/60
Epoch 17/60
Epoch 18/60
Epoch 19/60
Epoch 20/60
Epoch 21/60
Epoch 22/60
Epoch 23/60
Epoch 24/60
Epoch 25/60
Epoch 26/60
Epoch 27/60
Epoch 28/60
Epoch 29/60
Epoch 30/60
Epoch 31/60
Epoch 32/60
Epoch 33/60
Epoch 34/60
Epoch 35/60
Epoch 36/60
Epoch 37/60
Epoch 38/60
Epoch 39/60
Epoch 40/60
Epoch 41/60
Epoch 42/60
Epoch 43/60
Epoch 44/60
Epoch 45/60
Epoch 46/60
Epoch 47/60
Epoch 48/60
Epoch 49/60
Epoch 50/60
Epoch 51/60
Epoch 52/60
Epoch 53/60
Epoch 54/60
Epoch 55/60
Epoch 56/60
Epoch 57/60
Epoch 58/60
Epoch 59/60
Epoch 60/60

Best: 0.85 using {'kerasclassifier__activation': 'linear', 'kerasclassifier__batch_size': 40, 'kerasclassifier__epochs': 60}

Means: 0.8019801988066619, Stdev: 0.03486255629388665 with: {'kerasclassifier__activation': 'softmax', 'kerasclassifier__batch_size': 20, 'kerasclassifier__epochs

### Hyperparameter Tuning: Activation in Second Hidden Layer

In [68]:
param_grid = {'kerasclassifier__batch_size': [20, 40],
             'kerasclassifier__epochs': [20, 40, 60],
             'kerasclassifier__activation': ['softmax', 'relu', 'tanh', 'sigmoid', 'hard_sigmoid', 'linear']}


def baseline_model(activation='relu'):
    # create model
    model = Sequential()
    model.add(Dense(12, input_dim=inputs, activation='relu'))
    model.add(Dense(8, activation=activation))
    model.add(Dense(1, activation='sigmoid'))
    # compile model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

pipe = make_pipeline(StandardScaler(), KerasClassifier(build_fn=baseline_model, verbose=3))

# Create grid search
grid = GridSearchCV(estimator=pipe, param_grid=param_grid, cv=kfold, n_jobs=-1)
grid_result = grid.fit(X, y)

# Report results
print()
print("Best: {:.2f} using {}".format(grid_result.best_score_, grid_result.best_params_))
print()
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}") 



Epoch 1/40
Epoch 2/40
Epoch 3/40
Epoch 4/40
Epoch 5/40
Epoch 6/40
Epoch 7/40
Epoch 8/40
Epoch 9/40
Epoch 10/40
Epoch 11/40
Epoch 12/40
Epoch 13/40
Epoch 14/40
Epoch 15/40
Epoch 16/40
Epoch 17/40
Epoch 18/40
Epoch 19/40
Epoch 20/40
Epoch 21/40
Epoch 22/40
Epoch 23/40
Epoch 24/40
Epoch 25/40
Epoch 26/40
Epoch 27/40
Epoch 28/40
Epoch 29/40
Epoch 30/40
Epoch 31/40
Epoch 32/40
Epoch 33/40
Epoch 34/40
Epoch 35/40
Epoch 36/40
Epoch 37/40
Epoch 38/40
Epoch 39/40
Epoch 40/40

Best: 0.85 using {'kerasclassifier__activation': 'softmax', 'kerasclassifier__batch_size': 20, 'kerasclassifier__epochs': 40}

Means: 0.8217821766441018, Stdev: 0.03810264330050086 with: {'kerasclassifier__activation': 'softmax', 'kerasclassifier__batch_size': 20, 'kerasclassifier__epochs': 20}
Means: 0.8481848263504481, Stdev: 0.013442581315084483 with: {'kerasclassifier__activation': 'softmax', 'kerasclassifier__batch_size': 20, 'kerasclassifier__epochs': 40}
Means: 0.8448844915962849, Stdev: 0.033181904087241684 with: {

### Hyperparameter Tuning: Optimizer

In [72]:
param_grid = {'kerasclassifier__batch_size': [20, 40],
             'kerasclassifier__epochs': [20, 40, 60],
             'kerasclassifier__optimizer': ['rmsprop', 'adam']}


def baseline_model(optimizer='adam'):
    # create model
    model = Sequential()
    model.add(Dense(12, input_dim=inputs, activation='relu'))
    model.add(Dense(8, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    # compile model
    model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
    return model

pipe = make_pipeline(StandardScaler(), KerasClassifier(build_fn=baseline_model, verbose=3))

# Create grid search
grid = GridSearchCV(estimator=pipe, param_grid=param_grid, cv=kfold, n_jobs=-1)
grid_result = grid.fit(X, y)

# Report results
print()
print("Best: {:.2f} using {}".format(grid_result.best_score_, grid_result.best_params_))
print()
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}")



Epoch 1/60
Epoch 2/60
Epoch 3/60
Epoch 4/60
Epoch 5/60
Epoch 6/60
Epoch 7/60
Epoch 8/60
Epoch 9/60
Epoch 10/60
Epoch 11/60
Epoch 12/60
Epoch 13/60
Epoch 14/60
Epoch 15/60
Epoch 16/60
Epoch 17/60
Epoch 18/60
Epoch 19/60
Epoch 20/60
Epoch 21/60
Epoch 22/60
Epoch 23/60
Epoch 24/60
Epoch 25/60
Epoch 26/60
Epoch 27/60
Epoch 28/60
Epoch 29/60
Epoch 30/60
Epoch 31/60
Epoch 32/60
Epoch 33/60
Epoch 34/60
Epoch 35/60
Epoch 36/60
Epoch 37/60
Epoch 38/60
Epoch 39/60
Epoch 40/60
Epoch 41/60
Epoch 42/60
Epoch 43/60
Epoch 44/60
Epoch 45/60
Epoch 46/60
Epoch 47/60
Epoch 48/60
Epoch 49/60
Epoch 50/60
Epoch 51/60
Epoch 52/60
Epoch 53/60
Epoch 54/60
Epoch 55/60
Epoch 56/60
Epoch 57/60
Epoch 58/60
Epoch 59/60
Epoch 60/60

Best: 0.83 using {'kerasclassifier__batch_size': 40, 'kerasclassifier__epochs': 60, 'kerasclassifier__optimizer': 'adam'}

Means: 0.8085808565120886, Stdev: 0.023712207545200963 with: {'kerasclassifier__batch_size': 20, 'kerasclassifier__epochs': 20, 'kerasclassifier__optimizer': 'rmspro

### The best GridSearchCV result (0.86) came from the unmodified baseline model (relu activations in the hidden layers, sigmoid on the output) with {'kerasclassifier__batch_size': 20, 'kerasclassifier__epochs': 60}.
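
Several of the printed result lists above are cut off, which makes the combinations hard to compare. One way to report every combination compactly is to load `cv_results_` into a DataFrame and sort by rank. The sketch below is only an illustration of that reporting step: it uses a `LogisticRegression` stand-in and synthetic data from `make_classification` so that it runs without Keras, and the names `demo_pipe`, `demo_grid`, and `summary` are hypothetical.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data with 13 features, like the heart dataset's feature count.
X_demo, y_demo = make_classification(n_samples=200, n_features=13, random_state=42)

# Stand-in estimator; in the notebook this would be the KerasClassifier pipeline.
demo_pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
demo_grid = GridSearchCV(
    estimator=demo_pipe,
    param_grid={"logisticregression__C": [0.01, 0.1, 1.0, 10.0]},
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=42),
)
demo_grid.fit(X_demo, y_demo)

# One row per hyperparameter combination, best combination first.
summary = (
    pd.DataFrame(demo_grid.cv_results_)
    [["params", "mean_test_score", "std_test_score", "rank_test_score"]]
    .sort_values("rank_test_score")
    .reset_index(drop=True)
)
print(summary)
```

Applied to any of the `grid_result` objects above, the same `pd.DataFrame(grid_result.cv_results_)` call would replace the manual `zip` loop and keep the full table visible in a single cell output.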

### Comparison with RandomForestClassifier out of curiosity: (the default forest scores 0.80 and a lightly tuned one 0.84, pretty close to the MLP's best of 0.86)

In [75]:
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier()

results = cross_val_score(rf, X, y, cv=kfold)

print("Cross-Validation scores: {}".format(results))
print("Mean: {:.2f}, Standard Deviation: {:.2f}".format(results.mean(), results.std()))

Cross-Validation scores: [0.7704918  0.86885246 0.80327869 0.78333333 0.76666667]
Mean: 0.80, Standard Deviation: 0.04




In [82]:
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(n_estimators=100, class_weight='balanced', min_samples_split=9)

results = cross_val_score(rf, X, y, cv=kfold)

print("Cross-Validation scores: {}".format(results))
print("Mean: {:.2f}, Standard Deviation: {:.2f}".format(results.mean(), results.std()))

Cross-Validation scores: [0.81967213 0.85245902 0.93442623 0.8        0.8       ]
Mean: 0.84, Standard Deviation: 0.05
