<img align="left" src="https://lever-client-logos.s3.amazonaws.com/864372b1-534c-480e-acd5-9711f850815c-1524247202159.png" width=200>
<br></br>
<br></br>

## *Data Science Unit 4 Sprint 2*

# Sprint Challenge - Neural Network Foundations

Table of Problems

1. [Defining Neural Networks](#Q1)
2. [Perceptron on XOR Gates](#Q2)
3. [Multilayer Perceptron](#Q3)
4. [Keras MLP](#Q4)

<a id="Q1"></a>
## 1. Define the following terms:

- **Neuron:** An artificial neuron is essentially a function. It receives input from other neurons or from the data, sums its weighted inputs plus a bias, applies an activation function to that sum, and outputs the result. The output represents the extent to which the neuron is "activated."
- **Input Layer:** The first layer of an ANN, which provides the initial data to be processed.
- **Hidden Layer:** An intermediary layer between input and output whose neurons take a set of weighted inputs and produce an output through an activation function.
- **Output Layer:** The final layer of an ANN, which emits the prediction as an appropriately sized vector.
- **Activation:** An activation function transforms a continuous quantity (the weighted sum plus the bias), which can lie anywhere on the number line, into a value that represents the extent to which a neuron is activated.
- **Backpropagation:** The backwards propagation of errors. An artificial neural network has an error function that measures how far the output values are from the desired output values. Using the chain rule, we calculate the gradient of this loss function with respect to the weights, and gradient descent then adjusts the weights accordingly.
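
To make the neuron and activation definitions concrete, here is a minimal sketch of a single neuron with a sigmoid activation. The input, weight, and bias values are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    weighted_sum = np.dot(inputs, weights) + bias
    return sigmoid(weighted_sum)

# Hypothetical values, for illustration only
x = np.array([1.0, 0.0])
w = np.array([0.5, -0.5])
b = -0.5

# Weighted sum: 0.5*1 + (-0.5)*0 - 0.5 = 0.0, and sigmoid(0) = 0.5
activation = neuron(x, w, b)
print(activation)  # 0.5
```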


## 2. Perceptron on XOR Gates <a id="Q2"></a>

The XOr, or “exclusive or”, problem is a classic problem in ANN research. It is the problem of using a neural network to predict the outputs of XOr logic gates given two binary inputs. An XOr function should return a true value if the two inputs are not equal and a false value if they are equal. Create a perceptron class that can model the behavior of an XOr gate. You can use the following table as your training data:

| x1 | x2 | y |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 1 | 0 |
| 1 | 0 | 1 |
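
The reason XOr is a classic problem is that no single threshold unit can reproduce the table above. The sketch below illustrates this by brute force: it tries every weight and bias on a coarse grid for a single unit and checks whether any setting matches the targets. The function name and the grid are assumptions made for this illustration, and a grid search is a demonstration rather than a proof.

```python
import itertools
import numpy as np

# Same row order as the table above
X = np.array([[0, 0], [0, 1], [1, 1], [1, 0]])
y_xor = np.array([0, 1, 0, 1])
y_and = np.array([0, 0, 1, 0])  # AND gate, for comparison

def single_unit_can_fit(X, y, grid=np.linspace(-2, 2, 21)):
    """Return True if some (w1, w2, b) on the grid reproduces y with one threshold unit."""
    for w1, w2, b in itertools.product(grid, repeat=3):
        preds = (X @ np.array([w1, w2]) + b > 0).astype(int)
        if np.array_equal(preds, y):
            return True
    return False

print(single_unit_can_fit(X, y_and))  # True: AND is linearly separable
print(single_unit_can_fit(X, y_xor))  # False: XOr is not
```

This is why the class in the next cell includes a hidden layer.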


In [23]:
#imports 
import numpy as np 

In [None]:
# Training data and targets
train = np.array([[0,0],[0,1],[1,1],[1,0]])
target = np.array([[0],[1],[0],[1]])

In [23]:
class Perceptron():
    
    def __init__(self):
        # Set up architecture
        self.learning_rate = 0.1
        self.inputLayer = 2
        self.hiddenLayer = 2
        self.outputLayer = 1

        # Initialize weights and biases
        self.hidden_weights = np.random.uniform(size=(self.inputLayer,self.hiddenLayer))
        self.hidden_bias = np.random.uniform(size=(1,self.hiddenLayer))
        self.output_weights = np.random.uniform(size=(self.hiddenLayer,self.outputLayer))
        self.output_bias = np.random.uniform(size=(1,self.outputLayer))
        
    def sigmoid (self, x):
        return 1/(1 + np.exp(-x))
    
    def sigmoid_derivative(self, x):
        return x * (1 - x)

    def feed_forward(self, train):
        # Forward Propagation to calculate NN inference
        hidden_layer_activation = np.dot(train,self.hidden_weights)
        hidden_layer_activation += self.hidden_bias
        hidden_layer_output = self.sigmoid(hidden_layer_activation)

        output_layer_activation = np.dot(hidden_layer_output,self.output_weights)
        output_layer_activation += self.output_bias
        predicted_output = self.sigmoid(output_layer_activation)
        return predicted_output, hidden_layer_output
    
    def backprop(self, train, target, o, hidden_layer_output):
        error = target - o
        # sigmoid_derivative expects values that are already sigmoid outputs
        d_predicted_output = error * self.sigmoid_derivative(o)
    
        error_hidden_layer = d_predicted_output.dot(self.output_weights.T)
        d_hidden_layer = error_hidden_layer * self.sigmoid_derivative(hidden_layer_output)

        #Updating Weights and Biases
        self.output_weights += hidden_layer_output.T.dot(d_predicted_output) * self.learning_rate
        self.output_bias += np.sum(d_predicted_output,axis=0,keepdims=True) * self.learning_rate
        self.hidden_weights += train.T.dot(d_hidden_layer) * self.learning_rate
        self.hidden_bias += np.sum(d_hidden_layer,axis=0,keepdims=True) * self.learning_rate
        
    def train(self, train, target, epochs):
        for _ in range(epochs):
            o, hl_out = self.feed_forward(train)
            self.backprop(train, target, o, hl_out)
        # Keep the last forward-pass output so print_results() can report it
        self.predicted_output = o
    
    def print_results(self):
        print("Final hidden weights: ",end='')
        print(*self.hidden_weights)
        print("Final hidden bias: ",end='')
        print(*self.hidden_bias)
        print("Final output weights: ",end='')
        print(*self.output_weights)
        print("Final output bias: ",end='')
        print(*self.output_bias)

        print("\nOutput from neural network: ",end='')
        print(*self.predicted_output)
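
One detail of the class above is easy to miss: `sigmoid_derivative` computes `x * (1 - x)`, which equals the derivative of the sigmoid only when `x` is already a sigmoid output. The following sketch checks that identity against a central finite difference. The standalone function names mirror the methods above, and the test points and step size are arbitrary choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(s):
    # s is assumed to be sigmoid(z) already, not the raw pre-activation z
    return s * (1.0 - s)

z = np.linspace(-3, 3, 7)  # arbitrary test points
eps = 1e-6

# Central finite difference approximation of d/dz sigmoid(z)
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
analytic = sigmoid_derivative(sigmoid(z))

print(np.allclose(numeric, analytic, atol=1e-8))  # True
```

Passing the raw pre-activation instead of the sigmoid output would give a different, incorrect quantity.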

In [24]:
nn = Perceptron()
nn.train(train, target, 10000)
nn.print_results()

Final hidden weights: [6.71192937 4.84296007] [6.71162415 4.84288494]
Final hidden bias: [-3.01402014 -7.42793911]
Final output weights: [10.31080769] [-11.00564752]
Final output bias: [-4.80376289]

Output from neural network: [0.01302485] [0.98886121] [0.01146408] [0.98886143]
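
The network outputs above are values between 0 and 1 rather than hard 0/1 labels. A minimal sketch of turning them into predictions, assuming a decision threshold of 0.5 (the threshold is a choice made here, not something the class defines):

```python
import numpy as np

# Outputs copied from the printed results above
outputs = np.array([[0.01302485], [0.98886121], [0.01146408], [0.98886143]])
target = np.array([[0], [1], [0], [1]])

# Assumed threshold of 0.5 to convert activations into class labels
predictions = (outputs > 0.5).astype(int)

print(predictions.ravel())            # [0 1 0 1]
print((predictions == target).all())  # True
```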


## 3. Multilayer Perceptron <a id="Q3"></a>

Implement a Neural Network Multilayer Perceptron class that uses backpropagation to update the network's weights.

- Your network must have one hidden layer.
- You do not have to update weights via gradient descent. You can use something like the derivative of the sigmoid function to update weights.
- Train your model on the Heart Disease dataset from UCI:



In [4]:
import pandas as pd

In [5]:
# import heart disease
header_row = ['age','sex','pain','BP','chol','fbs','ecg','maxhr','eiang','eist','slope','vessels','thal','diagnosis']
df = pd.read_csv('processed.cleveland.data', names=header_row)
df.head()

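|   | age | sex | pain | BP | chol | fbs | ecg | maxhr | eiang | eist | slope | vessels | thal | diagnosis |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 63.0 | 1.0 | 1.0 | 145.0 | 233.0 | 1.0 | 2.0 | 150.0 | 0.0 | 2.3 | 3.0 | 0.0 | 6.0 | 0 |
| 1 | 67.0 | 1.0 | 4.0 | 160.0 | 286.0 | 0.0 | 2.0 | 108.0 | 1.0 | 1.5 | 2.0 | 3.0 | 3.0 | 2 |
| 2 | 67.0 | 1.0 | 4.0 | 120.0 | 229.0 | 0.0 | 2.0 | 129.0 | 1.0 | 2.6 | 2.0 | 2.0 | 7.0 | 1 |
| 3 | 37.0 | 1.0 | 3.0 | 130.0 | 250.0 | 0.0 | 0.0 | 187.0 | 0.0 | 3.5 | 3.0 | 0.0 | 3.0 | 0 |
| 4 | 41.0 | 0.0 | 2.0 | 130.0 | 204.0 | 0.0 | 2.0 | 172.0 | 0.0 | 1.4 | 1.0 | 0.0 | 3.0 | 0 |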


In [6]:
df['diagnosis'].value_counts()

0    164
1     55
2     36
3     35
4     13
Name: diagnosis, dtype: int64

In [7]:
df.shape

(303, 14)

In [8]:
df.dtypes

age          float64
sex          float64
pain         float64
BP           float64
chol         float64
fbs          float64
ecg          float64
maxhr        float64
eiang        float64
eist         float64
slope        float64
vessels       object
thal          object
diagnosis      int64
dtype: object

In [9]:
df['vessels'].value_counts()

0.0    176
1.0     65
2.0     38
3.0     20
?        4
Name: vessels, dtype: int64

In [10]:
df['thal'].value_counts()

3.0    166
7.0    117
6.0     18
?        2
Name: thal, dtype: int64

In [11]:
df[df['vessels']=='?']

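|   | age | sex | pain | BP | chol | fbs | ecg | maxhr | eiang | eist | slope | vessels | thal | diagnosis |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 166 | 52.0 | 1.0 | 3.0 | 138.0 | 223.0 | 0.0 | 0.0 | 169.0 | 0.0 | 0.0 | 1.0 | ? | 3.0 | 0 |
| 192 | 43.0 | 1.0 | 4.0 | 132.0 | 247.0 | 1.0 | 2.0 | 143.0 | 1.0 | 0.1 | 2.0 | ? | 7.0 | 1 |
| 287 | 58.0 | 1.0 | 2.0 | 125.0 | 220.0 | 0.0 | 0.0 | 144.0 | 0.0 | 0.4 | 2.0 | ? | 7.0 | 0 |
| 302 | 38.0 | 1.0 | 3.0 | 138.0 | 175.0 | 0.0 | 0.0 | 173.0 | 0.0 | 0.0 | 1.0 | ? | 3.0 | 0 |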


In [12]:
df['vessels'] = df['vessels'].apply(lambda x: 0.0 if x=='?' else x)
df['vessels'].value_counts()

0.0    176
1.0     65
2.0     38
3.0     20
0.0      4
Name: vessels, dtype: int64

In [13]:
df['thal'] = df['thal'].apply(lambda x: 3.0 if x=='?' else x)
df['thal'].value_counts()

3.0    166
7.0    117
6.0     18
3.0      2
Name: thal, dtype: int64
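
The two cells above replace `'?'` by hand, which leaves a mix of strings and floats in the column (hence the duplicate `0.0` and `3.0` rows in the counts). As an alternative sketch, `pd.read_csv` can treat `'?'` as missing at load time through its `na_values` parameter, after which the columns are numeric and can be filled in one step. The tiny inline CSV below is hypothetical stand-in data, not the Cleveland file, and filling with the column mode is an assumption that happens to match the values chosen above.

```python
import io
import pandas as pd

# Hypothetical stand-in data with the same '?' placeholder for missing values
csv_text = "vessels,thal\n0.0,6.0\n?,3.0\n2.0,?\n0.0,3.0\n"

# na_values='?' makes pandas parse the placeholder as NaN, so columns stay float
df_demo = pd.read_csv(io.StringIO(csv_text), na_values="?")
print(df_demo.dtypes)

# Fill each column's missing entries with that column's most frequent value
filled = df_demo.fillna(df_demo.mode().iloc[0])
print(filled)
```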

In [14]:
df['vessels'] = df['vessels'].astype(float)
df['thal'] = df['thal'].astype(float)
df['diagnosis'] = df['diagnosis'].astype(float)

In [15]:
df.dtypes

age          float64
sex          float64
pain         float64
BP           float64
chol         float64
fbs          float64
ecg          float64
maxhr        float64
eiang        float64
eist         float64
slope        float64
vessels      float64
thal         float64
diagnosis    float64
dtype: object

In [16]:
df.head()

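|   | age | sex | pain | BP | chol | fbs | ecg | maxhr | eiang | eist | slope | vessels | thal | diagnosis |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 63.0 | 1.0 | 1.0 | 145.0 | 233.0 | 1.0 | 2.0 | 150.0 | 0.0 | 2.3 | 3.0 | 0.0 | 6.0 | 0.0 |
| 1 | 67.0 | 1.0 | 4.0 | 160.0 | 286.0 | 0.0 | 2.0 | 108.0 | 1.0 | 1.5 | 2.0 | 3.0 | 3.0 | 2.0 |
| 2 | 67.0 | 1.0 | 4.0 | 120.0 | 229.0 | 0.0 | 2.0 | 129.0 | 1.0 | 2.6 | 2.0 | 2.0 | 7.0 | 1.0 |
| 3 | 37.0 | 1.0 | 3.0 | 130.0 | 250.0 | 0.0 | 0.0 | 187.0 | 0.0 | 3.5 | 3.0 | 0.0 | 3.0 | 0.0 |
| 4 | 41.0 | 0.0 | 2.0 | 130.0 | 204.0 | 0.0 | 2.0 | 172.0 | 0.0 | 1.4 | 1.0 | 0.0 | 3.0 | 0.0 |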


In [17]:
df['diagnosis'].value_counts()

0.0    164
1.0     55
2.0     36
3.0     35
4.0     13
Name: diagnosis, dtype: int64

In [18]:
X = df.values[:,0:13]
y = df.values[:,13]

In [1]:
from tensorflow import keras

In [41]:
#calculate the means and standard deviations of features
means = X.mean(axis=0)
std = X.std(axis=0)

#subtract the means and divide by stddev to standardize
# values now represent # of stddevs from the mean
X = X - means
X = X / std
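
A quick way to sanity-check the standardization above is to confirm that every column ends up with mean 0 and standard deviation 1. The sketch below does this on a randomly generated matrix; the data, the seed, and the column scales are hypothetical and stand in for the real feature matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature matrix: 100 rows, 3 columns on very different scales
X_demo = rng.normal(loc=[50.0, 130.0, 1.0], scale=[10.0, 20.0, 0.5], size=(100, 3))

# Same recipe as the cell above: subtract column means, divide by column stddevs
X_scaled = (X_demo - X_demo.mean(axis=0)) / X_demo.std(axis=0)

print(np.allclose(X_scaled.mean(axis=0), 0.0))  # True
print(np.allclose(X_scaled.std(axis=0), 1.0))   # True
```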

In [19]:
y = keras.utils.to_categorical(y, num_classes=5)

In [20]:
y

array([[1., 0., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 1., 0., 0., 0.],
       ...,
       [0., 0., 0., 1., 0.],
       [0., 1., 0., 0., 0.],
       [1., 0., 0., 0., 0.]], dtype=float32)
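
For intuition about what the one-hot encoding above contains, the same kind of array can be built with plain NumPy by indexing an identity matrix. This is a sketch of the idea, not a description of how `to_categorical` is implemented; the first three labels match the first three rows shown above and the fourth is made up.

```python
import numpy as np

labels = np.array([0.0, 2.0, 1.0, 3.0])  # first three from the data, last one hypothetical
num_classes = 5

# Row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(num_classes)[labels.astype(int)]
print(one_hot)

# argmax along each row recovers the original integer label
recovered = np.argmax(one_hot, axis=1)
print(recovered)  # [0 2 1 3]
```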

In [48]:
X.shape

(303, 13)

In [49]:
y.shape

(303, 5)

In [83]:
class MLP():
    
    def __init__(self):
        # Set up architecture
        self.learning_rate = 0.1
        self.inputLayer = 13
        self.hiddenLayer = 10
        self.outputLayer = 5
        self.final_predicted_output = 0

        # Initialize weights and biases
        self.hidden_weights = np.random.uniform(size=(self.inputLayer,self.hiddenLayer))
        self.hidden_bias = np.random.uniform(size=(1,self.hiddenLayer))
        self.output_weights = np.random.uniform(size=(self.hiddenLayer,self.outputLayer))
        self.output_bias = np.random.uniform(size=(1,self.outputLayer))
        
    def sigmoid (self, x):
        return 1/(1 + np.exp(-x))
    
    def sigmoid_derivative(self, x):
        return x * (1 - x)
        
    
    def feed_forward(self, train):
        # Forward Propagation to calculate NN inference
        hidden_layer_activation = np.dot(train,self.hidden_weights)
        hidden_layer_activation += self.hidden_bias
        hidden_layer_output = self.sigmoid(hidden_layer_activation)

        output_layer_activation = np.dot(hidden_layer_output,self.output_weights)
        output_layer_activation += self.output_bias
        predicted_output = self.sigmoid(output_layer_activation)
        return predicted_output, hidden_layer_output
    
    def backprop(self, train, target, o, hidden_layer_output):
        #target = target.reshape((target.shape[0], 1))
        error = target - o
        d_predicted_output = error * self.sigmoid_derivative(o)
    
        error_hidden_layer = d_predicted_output.dot(self.output_weights.T)
        d_hidden_layer = error_hidden_layer * self.sigmoid_derivative(hidden_layer_output)

        #Updating Weights and Biases
        self.output_weights += hidden_layer_output.T.dot(d_predicted_output) * self.learning_rate
        self.output_bias += np.sum(d_predicted_output,axis=0,keepdims=True) * self.learning_rate
        self.hidden_weights += train.T.dot(d_hidden_layer) * self.learning_rate
        self.hidden_bias += np.sum(d_hidden_layer,axis=0,keepdims=True) * self.learning_rate
        
    def train(self, train, target, epochs):
        for _ in range(epochs):
            for data, label in zip(train, target):
                # ideally the reshaping should happen in backprop()
                # but i'm already running out of time
                data = data.reshape((data.shape[0], 1)).T
                label = label.reshape((label.shape[0], 1)).T
                o, hl_out = self.feed_forward(data)
                self.backprop(data, label, o, hl_out)
        self.final_predicted_output = o
    
    def print_results(self):
        print("Final hidden weights: ",end='')
        print(*self.hidden_weights)
        print("Final hidden bias: ",end='')
        print(*self.hidden_bias)
        print("Final output weights: ",end='')
        print(*self.output_weights)
        print("Final output bias: ",end='')
        print(*self.output_bias)

In [84]:
hnn = MLP()
hnn.train(X, y, 100)
hnn.print_results()

Final hidden weights: [ 1.37787504 -0.49844539  0.47731552  0.65161254  0.14453156 -0.02964934
  0.78391619  1.75074824 -0.94101907  1.72263395] [-1.20545763  0.5846483   0.47799282  0.55398245  0.23405722 -0.43513067
  0.80206002  1.13361561  1.24980555  0.81630973] [ 1.08553325  2.79463517 -0.10518525  1.04446676  0.77782481  1.74556135
  1.42082233  1.00210397 -0.1323946   1.02506538] [ 0.29174792  0.39082944 -0.35975876 -0.91198436 -0.40318775  1.17095343
  0.74041174  0.81734936  0.11044539  0.8296546 ] [-0.16854416 -1.18749883 -0.23805793 -0.44666334  0.42560774  0.33816394
  0.55380831  1.91382159 -0.37439738  0.01290689] [ 1.05247044 -0.52351098 -0.16787493  1.41657503  0.7919179  -0.71996999
  0.8582332   1.13656483  0.68841229 -0.99445122] [0.42375985 0.15783985 0.47337341 2.00529046 0.84668143 1.60158976
 0.34901066 0.15655724 1.06533607 1.373829  ] [ 1.57565952 -1.20405117  0.97792348  0.42450316  0.28662414 -0.11942514
  0.75768213 -1.26763007  0.5770717   0.88842714] [-0.

In [89]:
hnn.feed_forward(X[0].reshape((X[0].shape[0], 1)).T)

(array([[0.87209663, 0.024983  , 0.05668932, 0.04094132, 0.050406  ]]),
 array([[0.99790946, 0.00291566, 0.81942023, 0.99552801, 0.55565545,
         0.01309479, 0.96197002, 0.25406394, 0.9986804 , 0.20782387]]))
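
The first array returned above holds one activation per diagnosis class. Because each output unit is an independent sigmoid, the values need not sum to 1, but the largest one can still be read as the predicted class. A minimal sketch, with a hypothetical `accuracy` helper for scoring a batch of such outputs against one-hot targets:

```python
import numpy as np

# Output-layer activations copied from the cell output above
output = np.array([[0.87209663, 0.024983, 0.05668932, 0.04094132, 0.050406]])

predicted_class = int(np.argmax(output, axis=1)[0])
print(predicted_class)  # 0, which matches the first row's diagnosis of 0

def accuracy(outputs, one_hot_targets):
    """Fraction of rows whose largest output lines up with the one-hot target."""
    predicted = np.argmax(outputs, axis=1)
    actual = np.argmax(one_hot_targets, axis=1)
    return float(np.mean(predicted == actual))

first_target = np.array([[1.0, 0.0, 0.0, 0.0, 0.0]])
print(accuracy(output, first_target))  # 1.0
```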

## 4. Keras MLP <a id="Q4"></a>

Implement a Multilayer Perceptron architecture of your choosing using the Keras library. Train your model and report its baseline accuracy. Then hyperparameter tune at least two parameters and report your model's accuracy.

- Use the Heart Disease Dataset (binary classification).
- Use an appropriate loss function for a binary classification task.
- Use an appropriate activation function on the final layer of your network.
- Train your model using verbose output for ease of grading.
- Use GridSearchCV to hyperparameter tune your model (for at least two hyperparameters).
- When hyperparameter tuning, show your work by adding code cells for each new experiment.
- Report the accuracy for each combination of hyperparameters as you test them so that we can easily see which resulted in the highest accuracy.
- You must hyperparameter tune at least 5 parameters in order to get a 3 on this section.
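
For a binary target, a common pairing is a sigmoid on the final layer with binary cross-entropy as the loss. The sketch below computes that loss by hand in NumPy to show what it rewards and penalizes; the function name, the clipping constant, and the example probabilities are all assumptions for illustration, not Keras internals.

```python
import numpy as np

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean negative log-likelihood of binary labels under predicted probabilities."""
    y_prob = np.clip(y_prob, eps, 1 - eps)  # avoid log(0)
    losses = y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob)
    return float(-np.mean(losses))

y_true = np.array([0.0, 1.0, 1.0, 0.0])

# Hypothetical predictions: mostly right, mostly wrong, and uninformative
mostly_right = np.array([0.1, 0.9, 0.8, 0.2])
mostly_wrong = np.array([0.9, 0.1, 0.2, 0.8])
coin_flip = np.full(4, 0.5)

print(binary_cross_entropy(y_true, mostly_right))  # smallest of the three
print(binary_cross_entropy(y_true, coin_flip))     # ln(2), about 0.693
print(binary_cross_entropy(y_true, mostly_wrong))  # largest of the three
```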

## Import dataset

In [111]:
df2 = pd.read_csv('framingham.csv')

In [112]:
df2.head()

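|   | male | age | education | currentSmoker | cigsPerDay | BPMeds | prevalentStroke | prevalentHyp | diabetes | totChol | sysBP | diaBP | BMI | heartRate | glucose | TenYearCHD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 39 | 4.0 | 0 | 0.0 | 0.0 | 0 | 0 | 0 | 195.0 | 106.0 | 70.0 | 26.97 | 80.0 | 77.0 | 0 |
| 1 | 0 | 46 | 2.0 | 0 | 0.0 | 0.0 | 0 | 0 | 0 | 250.0 | 121.0 | 81.0 | 28.73 | 95.0 | 76.0 | 0 |
| 2 | 1 | 48 | 1.0 | 1 | 20.0 | 0.0 | 0 | 0 | 0 | 245.0 | 127.5 | 80.0 | 25.34 | 75.0 | 70.0 | 0 |
| 3 | 0 | 61 | 3.0 | 1 | 30.0 | 0.0 | 0 | 1 | 0 | 225.0 | 150.0 | 95.0 | 28.58 | 65.0 | 103.0 | 1 |
| 4 | 0 | 46 | 3.0 | 1 | 23.0 | 0.0 | 0 | 0 | 0 | 285.0 | 130.0 | 84.0 | 23.1 | 85.0 | 85.0 | 0 |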


In [114]:
df2.shape

(4240, 16)

In [115]:
df2.columns.values

array(['male', 'age', 'education', 'currentSmoker', 'cigsPerDay',
       'BPMeds', 'prevalentStroke', 'prevalentHyp', 'diabetes', 'totChol',
       'sysBP', 'diaBP', 'BMI', 'heartRate', 'glucose', 'TenYearCHD'],
      dtype=object)

In [92]:
df2.dtypes

male                 int64
age                  int64
education          float64
currentSmoker        int64
cigsPerDay         float64
BPMeds             float64
prevalentStroke      int64
prevalentHyp         int64
diabetes             int64
totChol            float64
sysBP              float64
diaBP              float64
BMI                float64
heartRate          float64
glucose            float64
TenYearCHD           int64
dtype: object

In [117]:
# Avoid naming the variable `list`, which would shadow the Python builtin
int_columns = ['male', 'age', 'currentSmoker', 'prevalentStroke', 'prevalentHyp', 'diabetes', 'TenYearCHD']
for item in int_columns:
    df2[item] = df2[item].astype(float)

In [124]:
# split into input (X) and output (Y) variables
dataset = df2.values
X = dataset[:,0:15]
Y = dataset[:,15]

In [125]:
#calculate the means and standard deviations of features
means = X.mean(axis=0)
std = X.std(axis=0)

#subtract the means and divide by stddev to standardize
# values now represent # of stddevs from the mean
X = X - means
X = X / std
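
One caveat with this recipe: `mean` and `std` return `nan` for any column that contains a missing value, and that `nan` then spreads to the whole column. This notebook does not show whether `framingham.csv` has missing values, so the following is only a sketch of a NaN-aware variant on a small hypothetical matrix. It swaps in mean imputation, which is an assumption here rather than part of the original workflow.

```python
import numpy as np

# Hypothetical matrix with one missing entry in the second column
X_demo = np.array([[1.0, 10.0],
                   [2.0, np.nan],
                   [3.0, 30.0]])

print(X_demo.mean(axis=0))  # the second column's mean is nan

# NaN-aware statistics ignore the missing entries
col_means = np.nanmean(X_demo, axis=0)
col_stds = np.nanstd(X_demo, axis=0)

# Mean imputation: replace each missing entry with its column mean
X_filled = np.where(np.isnan(X_demo), col_means, X_demo)
X_scaled = (X_filled - col_means) / col_stds

print(X_scaled[:, 1])  # [-1.  0.  1.]
```

After scaling, an imputed entry sits at 0, the column mean.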

## Hyperparameter tuning
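
A grid search trains one model per combination of parameter values per cross-validation fold, so the cost grows quickly with each added parameter. The sketch below expands a parameter grid of the same shape as the one used in this section, using only the standard library. `expand_grid` and `count_fits` are hypothetical helpers, and the fold count is passed in as an assumption because it depends on how the search is configured.

```python
import itertools

def expand_grid(param_grid):
    """List every combination of values in a {name: [values]} grid."""
    keys = sorted(param_grid)
    value_lists = [param_grid[k] for k in keys]
    return [dict(zip(keys, values)) for values in itertools.product(*value_lists)]

def count_fits(param_grid, n_folds):
    """Number of cross-validation fits, not counting any final refit."""
    return len(expand_grid(param_grid)) * n_folds

param_grid = {'batch_size': [10, 100, 1000], 'epochs': [20]}

combos = expand_grid(param_grid)
for combo in combos:
    print(combo)

# With an assumed 5 folds, three combinations mean 15 training runs
print(count_fits(param_grid, n_folds=5))  # 15
```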

In [127]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier
import numpy
from sklearn.model_selection import GridSearchCV
# fix random seed for reproducibility
numpy.random.seed(42)

In [142]:
# baseline model
def create_model():
    # create model
    model = Sequential()
    model.add(Dense(12, input_dim=15, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    # Compile model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

# create model
model = KerasClassifier(build_fn=create_model, verbose=1)
model.fit(X,Y)



<tensorflow.python.keras.callbacks.History at 0x1a39261278>

In [136]:
# Function to create model, required for KerasClassifier
def create_model():
    # create model
    model = Sequential()
    model.add(Dense(12, input_dim=15, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    # Compile model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)

# load dataset
dataset = df2.values

# split into input (X) and output (Y) variables
# NOTE: this reassigns X to the raw values, discarding the standardization applied earlier
X = dataset[:,0:15]
Y = dataset[:,15]

# create model
model = KerasClassifier(build_fn=create_model, verbose=1)

# define the grid search parameters

param_grid = {'batch_size': [10, 100, 1000],
              'epochs': [20]}

grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1)
grid_result = grid.fit(X, Y)

# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20

In [139]:
# Function to create model, required for KerasClassifier
def create_model(optimizer='adam'):
    # create model
    model = Sequential()
    model.add(Dense(12, input_dim=15, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    # Compile model
    model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
    return model

In [140]:
# create model
model = KerasClassifier(build_fn=create_model, epochs=20, batch_size=10, verbose=1)
# define the grid search parameters
optimizer = ['sgd', 'adam']
param_grid = dict(optimizer=optimizer)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1)
grid_result = grid.fit(X, Y)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20

In [141]:
print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}") 

Best: 0.8481131927865856 using {'optimizer': 'sgd'}
Means: 0.8481131927865856, Stdev: 0.006075599508848612 with: {'optimizer': 'sgd'}
Means: 0.8481131927865856, Stdev: 0.006075599508848612 with: {'optimizer': 'adam'}
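
Both optimizers report the same mean score here. One way to put a cross-validated accuracy in context is to compare it with the accuracy of always predicting the most common class. The sketch below computes that baseline on a hypothetical label array; it is not run on `TenYearCHD`, and the helper name is an assumption.

```python
import numpy as np

def majority_class_accuracy(y):
    """Accuracy obtained by always predicting the most frequent label in y."""
    _, counts = np.unique(y, return_counts=True)
    return counts.max() / counts.sum()

# Hypothetical imbalanced labels: 17 negatives and 3 positives
y_demo = np.array([0] * 17 + [1] * 3)

print(majority_class_accuracy(y_demo))  # 0.85
```

If a tuned model scores no higher than this baseline on the real labels, it may be predicting a single class for every row.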
