<img align="left" src="https://lever-client-logos.s3.amazonaws.com/864372b1-534c-480e-acd5-9711f850815c-1524247202159.png" width=200>
<br></br>
<br></br>

## *Data Science Unit 4 Sprint 2*

# Sprint Challenge - Neural Network Foundations

Table of Problems

1. [Defining Neural Networks](#Q1)
2. [Perceptron on AND Gates](#Q2)
3. [Multilayer Perceptron](#Q3)
4. [Keras MLP](#Q4)

<a id="Q1"></a>
## 1. Define the following terms:

- **Neuron:** A neuron multiplies each input value by a weight, sums the products, and passes the sum through an activation function to produce its output.
- **Input Layer:** The input layer receives input from the dataset. It is also called the "visible layer" because it's the only part of the network that interacts with the data directly.
- **Hidden Layer:** Hidden layers sit between the input and output layers. They're hidden in the sense that we don't observe their current state, and their values are neither input nor output data. They're intermediate states of the network's computation.
- **Output Layer:** The output layer returns a vector of values in a format suited to the problem domain. Typically an activation function transforms the output values into that format.
- **Activation:** The activation function determines whether a neuron "fires". More precisely, it determines how much of the signal to pass along from one layer to the next.
- **Backpropagation:** Backpropagation is an algorithm for updating the weights of a neural network. It starts from the error at the output layer and works backwards, using the chain rule to work out how much each weight contributed to that error.
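
The neuron definition above can be sketched in a few lines of NumPy. This is a minimal illustration, not part of the challenge solution: the names `neuron`, `x`, `w`, and `b` are hypothetical, and the sigmoid is just one possible activation function.

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through a sigmoid."""
    return sigmoid(np.dot(inputs, weights) + bias)

# Hypothetical values chosen only for illustration
x = np.array([1.0, 0.0])
w = np.array([0.5, -0.5])
b = 0.0

out = neuron(x, w, b)  # sigmoid(1.0 * 0.5 + 0.0 * -0.5 + 0.0) = sigmoid(0.5)
print(out)
```

A layer is many such neurons sharing the same inputs, which is why the later cells use a matrix product instead of a single dot product.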

## 2. Perceptron on AND Gates <a id="Q2"></a>

Create a perceptron class that can model the behavior of an AND gate. You can use the following table as your training data:

| x1 | x2 | x3 | y |
|---|---|---|---|
| 1 | 1 | 1 | 1 |
| 1 | 0 | 1 | 0 |
| 0 | 1 | 1 | 0 |
| 0 | 0 | 1 | 0 |

In [2]:
# Imports for Perceptron on AND Gates
import numpy as np
from pprint import pprint

In [3]:
# Initialize inputs and weights. Add bias to X too.
np.random.seed(42)

X = np.array([
    [1,1,1],
    [1,0,1],
    [0,1,1],
    [0,0,1]])
y = [[1], [0], [0], [0]]

weights = 2 * np.random.random((3,1)) - 1
pprint(weights)

array([[-0.25091976],
       [ 0.90142861],
       [ 0.46398788]])


In [4]:
# Define activation function and the derivative "update" function
def sigmoid(x):
    return 1 / (1+np.exp(-x))

def sigmoid_derivative(x):
    sx = sigmoid(x)
    return sx * (1-sx)
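
As a sanity check on the derivative defined above, the analytic form can be compared with a central finite difference. This is a sketch under the assumption that both functions are defined exactly as in the cell above; the step size `h` and the sample points `xs` are arbitrary choices.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    # x is the pre-activation value (the weighted sum), not the activation
    sx = sigmoid(x)
    return sx * (1 - sx)

xs = np.linspace(-4, 4, 9)   # sample points, chosen arbitrarily
h = 1e-5                     # finite-difference step

# Central difference approximation of d(sigmoid)/dx
numeric = (sigmoid(xs + h) - sigmoid(xs - h)) / (2 * h)
analytic = sigmoid_derivative(xs)

max_abs_diff = np.max(np.abs(numeric - analytic))
print(max_abs_diff)
```

The two agree to many decimal places, and the derivative peaks at 0.25 when `x` is 0.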

In [5]:
# Calculate weighted sum
weighted_sum = np.dot(X, weights)

# Calculate activation values for epoch #1
activated_output = sigmoid(weighted_sum)
pprint(activated_output)

array([[0.75296649],
       [0.55306642],
       [0.79663861],
       [0.61395979]])


In [6]:
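# Calculate error and update weights
error = y - activated_output

# Note: sigmoid_derivative expects the pre-activation value (weighted_sum).
# Passing activated_output applies the sigmoid a second time; the result is
# still positive, so the direction of the update is unchanged.
adjustments = error * sigmoid_derivative(activated_output)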
weights += np.dot(X.T, adjustments)
pprint(weights)

array([[-0.32535954],
       [ 0.78457259],
       [ 0.07903399]])


In [9]:
# 10,000 epochs
for _ in range(10_000):
    # Same procedure as above
    weighted_sum = np.dot(X, weights)
    activated_output = sigmoid(weighted_sum)
    error = y - activated_output
    adjustments = error * sigmoid_derivative(activated_output)
    
    # Update weights
    weights += np.dot(X.T, adjustments)

print("Predictions/Activated Output for 10,000 epochs")
print(np.round(activated_output, decimals=2))

Predictions/Activated Output for 10,000 epochs
[[1.]
 [0.]
 [0.]
 [0.]]
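
The task asks for a perceptron class, while the cells above run the procedure step by step. One way to wrap the same steps in a class is sketched below. The class name `Perceptron` and its methods are hypothetical, and the sketch writes the sigmoid derivative in terms of the activation itself (`output * (1 - output)`), which differs slightly from the cells above.

```python
import numpy as np

class Perceptron:
    """Single sigmoid neuron trained with full-batch updates (illustrative)."""

    def __init__(self, n_inputs, seed=42):
        rng = np.random.default_rng(seed)
        # One weight per input column (the bias column counts as an input),
        # drawn uniformly from [-1, 1)
        self.weights = 2 * rng.random((n_inputs, 1)) - 1

    @staticmethod
    def _sigmoid(z):
        return 1 / (1 + np.exp(-z))

    def predict_proba(self, X):
        return self._sigmoid(X @ self.weights)

    def predict(self, X):
        # Threshold the activation at 0.5 to get a 0/1 label
        return (self.predict_proba(X) >= 0.5).astype(int)

    def fit(self, X, y, epochs=10_000):
        for _ in range(epochs):
            output = self.predict_proba(X)
            error = y - output
            # Sigmoid derivative expressed through the activation
            adjustments = error * output * (1 - output)
            self.weights += X.T @ adjustments
        return self

# AND gate truth table; the third column is the constant bias input
X_and = np.array([[1, 1, 1], [1, 0, 1], [0, 1, 1], [0, 0, 1]])
y_and = np.array([[1], [0], [0], [0]])

model = Perceptron(n_inputs=3).fit(X_and, y_and)
predictions = model.predict(X_and)
print(predictions.ravel())
```

Keeping the weights on the instance means the same object can be trained once and then reused for prediction.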


## 3. Multilayer Perceptron <a id="Q3"></a>

Implement a Neural Network Multilayer Perceptron class that uses backpropagation to update the network's weights.

- Your network must have one hidden layer.
- You do not have to update weights via gradient descent. You can use something like the derivative of the sigmoid function to update weights.
- Train your model on the Heart Disease dataset from UCI:



In [10]:
# Imports for Multilayer Perceptron
import pandas as pd

In [21]:
df = pd.read_csv('https://raw.githubusercontent.com/ryanleeallred/datasets/master/heart.csv')
df.head()

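| | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 63 | 1 | 3 | 145 | 233 | 1 | 0 | 150 | 0 | 2.3 | 0 | 0 | 1 | 1 |
| 1 | 37 | 1 | 2 | 130 | 250 | 0 | 1 | 187 | 0 | 3.5 | 0 | 0 | 2 | 1 |
| 2 | 41 | 0 | 1 | 130 | 204 | 0 | 0 | 172 | 0 | 1.4 | 2 | 0 | 2 | 1 |
| 3 | 56 | 1 | 1 | 120 | 236 | 0 | 1 | 178 | 0 | 0.8 | 2 | 0 | 2 | 1 |
| 4 | 57 | 0 | 0 | 120 | 354 | 0 | 1 | 163 | 1 | 0.6 | 2 | 0 | 2 | 1 |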


In [22]:
df.describe()

| | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 |
| mean | 54.366337 | 0.683168 | 0.966997 | 131.623762 | 246.264026 | 0.148515 | 0.528053 | 149.646865 | 0.326733 | 1.039604 | 1.39934 | 0.729373 | 2.313531 | 0.544554 |
| std | 9.082101 | 0.466011 | 1.032052 | 17.538143 | 51.830751 | 0.356198 | 0.52586 | 22.905161 | 0.469794 | 1.161075 | 0.616226 | 1.022606 | 0.612277 | 0.498835 |
| min | 29.0 | 0.0 | 0.0 | 94.0 | 126.0 | 0.0 | 0.0 | 71.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 47.5 | 0.0 | 0.0 | 120.0 | 211.0 | 0.0 | 0.0 | 133.5 | 0.0 | 0.0 | 1.0 | 0.0 | 2.0 | 0.0 |
| 50% | 55.0 | 1.0 | 1.0 | 130.0 | 240.0 | 0.0 | 1.0 | 153.0 | 0.0 | 0.8 | 1.0 | 0.0 | 2.0 | 1.0 |
| 75% | 61.0 | 1.0 | 2.0 | 140.0 | 274.5 | 0.0 | 1.0 | 166.0 | 1.0 | 1.6 | 2.0 | 1.0 | 3.0 | 1.0 |
| max | 77.0 | 1.0 | 3.0 | 200.0 | 564.0 | 1.0 | 2.0 | 202.0 | 1.0 | 6.2 | 2.0 | 4.0 | 3.0 | 1.0 |


In [23]:
df.isna().sum()

age         0
sex         0
cp          0
trestbps    0
chol        0
fbs         0
restecg     0
thalach     0
exang       0
oldpeak     0
slope       0
ca          0
thal        0
target      0
dtype: int64

In [81]:
# Split data into X,y and normalize X.
X = df.drop(columns='target').values
y = np.array(df['target'].values).reshape(-1,1)

print(X.shape, type(X))
print(y.shape, type(y))

X = X / np.amax(X, axis=0)

(303, 13) <class 'numpy.ndarray'>
(303, 1) <class 'numpy.ndarray'>
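
The normalization step above divides every column by its own maximum. A small sketch of that column-wise scaling follows; the array `features` is a hypothetical three-column excerpt (age, trestbps, fbs) and the approach assumes non-negative columns with a non-zero maximum.

```python
import numpy as np

# Hypothetical excerpt: age, trestbps, fbs for three rows
features = np.array([[63.0, 145.0, 1.0],
                     [37.0, 130.0, 0.0],
                     [41.0, 130.0, 0.0]])

# axis=0 takes the maximum down each column, giving one value per feature
col_max = np.amax(features, axis=0)

# Broadcasting divides each column by its own maximum
scaled = features / col_max
print(scaled)
```

After scaling, each column tops out at 1, which keeps large-valued features such as `chol` from dominating the weighted sums.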


In [105]:
# MLP implementation with Feedforward and Backpropagation
class MultilayerPerceptron:
    def __init__(self):
        # Architecture
        self.inputs = 13  # 13 features
        self.hidden_nodes = 303
        self.output_nodes = 1
        
        # Initial weights
        # 13x303 matrix for first layer
        self.weights1 = np.random.randn(self.inputs, self.hidden_nodes)
        # 303x1 matrix array for hidden to output
        self.weights2 = np.random.randn(self.hidden_nodes, self.output_nodes)
        
    def sigmoid(self, s):
        # Activation function
        return 1 / (1+np.exp(-s))
    
    def sigmoidPrime(self, s):
        # Derivative of the sigmoid, where s is an already-activated value
        return s * (1-s)
    
    def feed_forward(self, X):
        # Weighted sum of inputs into the hidden layer
        self.hidden_sum = np.dot(X, self.weights1)
        
        # Activations of the weighted sum
        self.activated_hidden = self.sigmoid(self.hidden_sum)
        
        # Weighted sum between hidden and output
        self.output_sum = np.dot(self.activated_hidden, self.weights2)
        
        # Final activation of output
        self.activated_output = self.sigmoid(self.output_sum)
        
        return self.activated_output
    
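    # Likely cause of the stalled loss below: the updates are unscaled sums over
    # all rows (no learning rate), so the sigmoids can saturate after one step.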
    def backward(self, X, y, o):
        # Calculate error, apply sigmoid derivative
        self.o_error = y - o
        self.o_delta = self.o_error * self.sigmoidPrime(o)
        
        self.z2_error = self.o_delta.dot(self.weights2.T)
        # print(self.z2_error)
        
        self.z2_delta = self.z2_error*self.sigmoidPrime(self.activated_hidden)
        # print(self.z2_delta)
        
        self.weights1 += X.T.dot(self.z2_delta)
        self.weights2 += self.activated_hidden.T.dot(self.o_delta) 
        
    def train(self, X, y):
        o = self.feed_forward(X)
        self.backward(X, y, o)

In [106]:
nn = MultilayerPerceptron()

for i in range(1000):
    if (i+1 in [1,2,3,4,5]) or ((i+1) % 50 ==0):
        print('+' + '---' * 3 + f'EPOCH {i+1}' + '---'*3 + '+')
        # print('Input: \n', X)
        # print('Actual Output: \n', y)
        # print('Predicted Output: \n', str(nn.feed_forward(X)))
        print("Loss: \n", str(np.mean(np.square(y - nn.feed_forward(X)))))
    nn.train(X,y)

+---------EPOCH 1---------+
Loss: 
 0.27969423483476347
+---------EPOCH 2---------+
Loss: 
 0.5445544554455446
+---------EPOCH 3---------+
Loss: 
 0.5445544554455446
+---------EPOCH 4---------+
Loss: 
 0.5445544554455446
+---------EPOCH 5---------+
Loss: 
 0.5445544554455446
+---------EPOCH 50---------+
Loss: 
 0.5445544554455446
+---------EPOCH 100---------+
Loss: 
 0.5445544554455446
+---------EPOCH 150---------+
Loss: 
 0.5445544554455446
+---------EPOCH 200---------+
Loss: 
 0.5445544554455446
+---------EPOCH 250---------+
Loss: 
 0.5445544554455446
+---------EPOCH 300---------+
Loss: 
 0.5445544554455446
+---------EPOCH 350---------+
Loss: 
 0.5445544554455446
+---------EPOCH 400---------+
Loss: 
 0.5445544554455446
+---------EPOCH 450---------+
Loss: 
 0.5445544554455446
+---------EPOCH 500---------+
Loss: 
 0.5445544554455446
+---------EPOCH 550---------+
Loss: 
 0.5445544554455446
+---------EPOCH 600---------+
Loss: 
 0.5445544554455446
+---------EPOCH 650---------+
Loss: 
 0.5
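
The loss above jumps to about 0.5446 after the first update and stays there. That value matches the mean of `target`, which is what the mean squared error becomes when every prediction is 0. One plausible reading is that the unscaled weight updates saturate the sigmoids. The sketch below applies the same update rule but averages it over the batch and scales it by a learning rate. The class name `ScaledMLP`, the synthetic data, and all hyperparameter values are assumptions for illustration, not a fix verified against the heart dataset.

```python
import numpy as np

class ScaledMLP:
    """One hidden layer; same update rule as above, scaled by a learning rate."""

    def __init__(self, n_inputs, n_hidden, learning_rate=0.5, seed=0):
        rng = np.random.default_rng(seed)
        self.lr = learning_rate
        self.weights1 = rng.standard_normal((n_inputs, n_hidden))
        self.weights2 = rng.standard_normal((n_hidden, 1))

    @staticmethod
    def sigmoid(s):
        return 1 / (1 + np.exp(-s))

    def feed_forward(self, X):
        self.hidden = self.sigmoid(X @ self.weights1)
        self.output = self.sigmoid(self.hidden @ self.weights2)
        return self.output

    def train(self, X, y):
        o = self.feed_forward(X)
        n = X.shape[0]
        # Output error times the sigmoid derivative (in activation form)
        o_delta = (y - o) * o * (1 - o)
        # Error pushed back to the hidden layer
        h_delta = (o_delta @ self.weights2.T) * self.hidden * (1 - self.hidden)
        # Average over the batch and scale by the learning rate
        self.weights1 += self.lr * (X.T @ h_delta) / n
        self.weights2 += self.lr * (self.hidden.T @ o_delta) / n

# Synthetic stand-in data: 200 rows, 5 features in [0, 1)
rng = np.random.default_rng(1)
X_demo = rng.random((200, 5))
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 1).astype(float).reshape(-1, 1)

mlp = ScaledMLP(n_inputs=5, n_hidden=8)
losses = []
for _ in range(2000):
    losses.append(float(np.mean(np.square(y_demo - mlp.feed_forward(X_demo)))))
    mlp.train(X_demo, y_demo)

print(losses[0], losses[-1])
```

A smaller hidden layer is used here as well, since 303 hidden nodes with standard-normal weights already push the output sum far from zero before any training.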

## 4. Keras MLP <a id="Q4"></a>

Implement a Multilayer Perceptron architecture of your choosing using the Keras library. Train your model and report its baseline accuracy. Then hyperparameter tune at least two parameters and report your model's accuracy.

- Use the Heart Disease Dataset (binary classification).
- Use an appropriate loss function for a binary classification task.
- Use an appropriate activation function on the final layer of your network.
- Train your model using verbose output for ease of grading.
- Use GridSearchCV to hyperparameter tune your model (for at least two hyperparameters).
- When hyperparameter tuning, show your work by adding code cells for each new experiment.
- Report the accuracy for each combination of hyperparameters as you test them so that we can easily see which resulted in the highest accuracy.
- You must hyperparameter tune at least 5 parameters in order to get a 3 on this section.
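
To make the loss and final-activation requirements concrete, the sketch below computes binary cross-entropy by hand in NumPy. It is an illustration of the formula, not Keras's implementation: the function name `binary_crossentropy`, the clipping constant `eps`, and the sample `logits` and `labels` are all assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    # Clip so that log(0) can never occur
    y_prob = np.clip(y_prob, eps, 1 - eps)
    per_sample = -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    return float(np.mean(per_sample))

# Hypothetical raw outputs of a final layer with no activation
logits = np.array([2.0, -1.0, 0.5])
labels = np.array([1.0, 0.0, 1.0])

# The sigmoid turns raw outputs into probabilities in (0, 1),
# which is what the loss formula expects
probs = sigmoid(logits)
loss = binary_crossentropy(labels, probs)
print(probs, loss)
```

The formula takes logarithms of the predictions, so it only makes sense when they lie between 0 and 1. That is why a sigmoid on the final layer pairs naturally with this loss.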

In [117]:
# Imports for Keras MLP
import keras
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier

from sklearn.model_selection import GridSearchCV

In [115]:
# Dataset already loaded and normalized
X.shape, y.shape

((303, 13), (303, 1))

In [116]:
# Train & Baseline accuracy
inputs = X.shape[1]
epochs = 50
batch_size = 10

model = Sequential()
model.add(Dense(20, activation='sigmoid', input_shape=(inputs,)))
model.add(Dense(60, activation='sigmoid'))
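# Sigmoid on the output layer so predictions are probabilities for binary_crossentropy
model.add(Dense(1, activation='sigmoid'))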

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

model.fit(X, y, validation_split=0.33, epochs=epochs, batch_size=batch_size)

Train on 203 samples, validate on 100 samples
Epoch 1/50
...
Epoch 50/50


<keras.callbacks.History at 0x7f3a6c5e32b0>

In [118]:
keras.callbacks.History()

<keras.callbacks.History at 0x7f3a3f8cde48>

**Tune batch size**

In [122]:
# Required for KerasClassifier
def create_model():
    model = Sequential()
    model.add(Dense(20, input_dim=13, activation='sigmoid'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    
    return model

# Make output verbose for grading
model = KerasClassifier(build_fn=create_model, verbose=1)

param_grid = {'batch_size': [5, 10, 20, 40, 60, 80],
              'epochs': [20]}

# Create Grid Search
grid = GridSearchCV(estimator=model, param_grid=param_grid, cv=3, n_jobs=-1)
grid_result = grid.fit(X, y)

# Report Results
print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")

Epoch 1/20
...
Epoch 20/20
Best: 0.33663366833339037 using {'batch_size': 5, 'epochs': 20}
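
A grid search tries every combination in `param_grid` once per cross-validation fold. The sketch below expands the grid used above with the standard library only, to show how many fits that implies. The variable names are illustrative, and the count leaves out the extra refit on the full data that `GridSearchCV` performs by default.

```python
from itertools import product

param_grid = {'batch_size': [5, 10, 20, 40, 60, 80],
              'epochs': [20]}
cv = 3  # number of cross-validation folds used above

# Cartesian product of all parameter values, one dict per combination
keys = sorted(param_grid)
combinations = [dict(zip(keys, values))
                for values in product(*(param_grid[k] for k in keys))]

total_fits = len(combinations) * cv
print(len(combinations), total_fits)
```

Adding a second parameter with several values multiplies the number of fits, which is worth keeping in mind before widening the grid.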


**Tune epochs**

In [None]:
# define the grid search parameters
param_grid = {'batch_size': [5, 10, 20],
              'epochs': [20, 40, 60, 80, 100]}

# Create Grid Search
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1)
grid_result = grid.fit(X, y)

In [125]:
# Report Results
print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")

Best: 0.6633663478955971 using {'batch_size': 5, 'epochs': 100}


In [126]:
# define the grid search parameters
param_grid = {'batch_size': [5],
              'epochs': [50, 100, 200]}

# Create Grid Search
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1)
grid_result = grid.fit(X, y)

Epoch 1/50
...
Epoch 50/50
Epoch 1/50
...
Epoch 35/5

In [127]:
# Report Results
print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")

Best: 0.6666666775351704 using {'batch_size': 5, 'epochs': 200}


**Tune hidden layer architecture**

In [128]:
# Add hidden layer with 30 nodes
def create_model():
    model = Sequential()
    model.add(Dense(20, input_dim=13, activation='sigmoid'))
    model.add(Dense(30, activation='sigmoid'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    
    return model

# Make output verbose for grading
model = KerasClassifier(build_fn=create_model, verbose=1)

param_grid = {'batch_size': [5],
              'epochs': [200]}

# Create Grid Search
grid = GridSearchCV(estimator=model, param_grid=param_grid, cv=3, n_jobs=-1)
grid_result = grid.fit(X, y)

Epoch 1/200
...
Epoch 78

In [129]:
# Report Results
print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")

Best: 0.6765676677817165 using {'batch_size': 5, 'epochs': 200}


In [131]:
# Change the second hidden layer from 30 to 20 nodes
def create_model():
    model = Sequential()
    model.add(Dense(20, input_dim=13, activation='sigmoid'))
    model.add(Dense(20, activation='sigmoid'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    
    return model

# Make output verbose for grading
model = KerasClassifier(build_fn=create_model, verbose=1)

param_grid = {'batch_size': [5],
              'epochs': [200]}

# Create Grid Search
grid = GridSearchCV(estimator=model, param_grid=param_grid, cv=3, n_jobs=-1)
grid_result = grid.fit(X, y)

Epoch 1/200
...
Epoch 78

In [132]:
# Report Results
print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")

Best: 0.6699670076173524 using {'batch_size': 5, 'epochs': 200}


In [133]:
# Add a third hidden layer with 10 nodes
def create_model():
    model = Sequential()
    model.add(Dense(20, input_dim=13, activation='sigmoid'))
    model.add(Dense(20, activation='sigmoid'))
    model.add(Dense(10, activation='sigmoid'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    
    return model

# Make output verbose for grading
model = KerasClassifier(build_fn=create_model, verbose=1)

param_grid = {'batch_size': [5],
              'epochs': [200]}

# Create Grid Search
grid = GridSearchCV(estimator=model, param_grid=param_grid, cv=3, n_jobs=-1)
grid_result = grid.fit(X, y)

Epoch 1/200
...
Epoch 78

In [134]:
# Report Results
print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")

Best: 0.6864686584708715 using {'batch_size': 5, 'epochs': 200}
