# Neural Networks Sprint Challenge

## 1) Define the following terms:

- Neuron
- Input Layer
- Hidden Layer
- Output Layer
- Activation
- Backpropagation

Neuron: the individual node in a neural network that receives inputs and returns a single output

Input Layer: the layer that takes in the data that we are using to make predictions. It takes the X data 

Hidden Layer: layer(s) in the neural network that connect the input to the ouput layer, and  performs the transformations on the data.

Output Layer: the final layer. It returns the "y" predictions. 

Activation: a function that takes numbers and compresses them into a smaller range of values to determine what information is carried to the next node

Backpropogation: a method to update the weights of the NN based on the errors for the predictions

 YOUR ANSWER HERE

## 2) Create a perceptron class that can model the behavior of an AND gate. You can use the following table as your training data:

| x1 | x2 | x3 | y |
|----|----|----|---|
| 1  | 1  | 1  | 1 |
| 1  | 0  | 1  | 0 |
| 0  | 1  | 1  | 0 |
| 0  | 0  | 1  | 0 |

In [1]:
import numpy as np

In [10]:
class Perceptron(object): 
    
    def __init__(self,n_inputs, threshold=100, learning_rate=.01):
        self.threshold = threshold 
        self.learning_rate = learning_rate
        self.weights = np.zeros(n_inputs+1)
    
    def predict(self, inputs):
        summation = np.dot(inputs, self.weights[1:])
        if summation > 0:
            activation = 1
        else:
            activation = 0
        return activation
    
    def train(self, training_inputs, labels):
        for _ in range(self.threshold):
            for inputs, label in zip(training_inputs, labels):
                prediction = self.predict(inputs)
                self.weights[1:] += self.learning_rate * (label - prediction) * prediction
                self.weights[0] += self.learning_rate * (label - prediction)

In [11]:
inputs = np.array([[1,1,1],
                 [1,0,1],
                 [0,1,1],
                 [0,0,1]])

labels = np.array([[1],[0],[0],[0]])

In [13]:
perceptron = Perceptron(n_inputs=3)

In [16]:
perceptron.train(training_inputs=inputs, labels=labels)

In [17]:
perceptron.weights

array([2., 0., 0., 0.])

## 3) Implement a Neural Network Multilayer Perceptron class that uses backpropagation to update the network's weights. 
- Your network must have one hidden layer. 
- You do not have to update weights via gradient descent. You can use something like the derivative of the sigmoid function to update weights.
- Train your model on the Heart Disease dataset from UCI:

[Github Dataset](https://github.com/ryanleeallred/datasets/blob/master/heart.csv)

[Raw File on Github](https://raw.githubusercontent.com/ryanleeallred/datasets/master/heart.csv)


In [19]:
import pandas as pd
data = pd.read_csv('https://github.com/ryanleeallred/datasets/raw/master/heart.csv')

In [21]:
data.shape

(303, 14)

In [24]:
inputs = data.iloc[:,0:-1]
labels = data.iloc[:,-1]

In [23]:
inputs.shape

(303, 13)

In [28]:
labels.shape

(303,)

In [18]:
np.random.seed(1911)

In [32]:
class NN(object):
    def __init__(self):
        self.inputs = 13
        self.hiddenNodes = 7
        self.outputNodes = 1

        self.L1_weights = np.random.randn(self.inputs, self.hiddenNodes)
        self.L2_weights = np.random.randn(self.hiddenNodes, self.outputNodes)

    def feed_forward(self, X):
        self.hidden_sum = np.dot(X, self.L1_weights)
        self.activated_hidden = self.sigmoid(self.hidden_sum)
        self.output_sum = np.dot(self.activated_hidden, self.L2_weights)
        self.activated_output = self.sigmoid(self.output_sum)
        return self.activated_output
        
    def sigmoid(self, s):
        return 1/(1+np.exp(-s))
    
    def sigmoidPrime(self, s):
        return s * (1 - s)
    
    def backward(self, X, y, o):
        self.o_error = y - o 
        self.o_delta = self.o_error*self.sigmoidPrime(o) 

        self.z2_error = self.o_delta.dot(self.L2_weights.T) t error
        self.z2_delta = self.z2_error*self.sigmoidPrime(self.activated_hidden) 

        self.L1_weights += X.T.dot(self.z2_delta) 
        self.L2_weights += self.activated_hidden.T.dot(self.o_delta) 
        
    def train (self, X, y):
        o = self.feed_forward(X)
        self.backward(X, y, o)

In [78]:
train = data.iloc[:200,:]
t_inputs = np.array(train.iloc[:,:-1])
t_labels = train.iloc[:,-1].tolist()

label_list = []
for i in t_labels:
    label_list.append([i])
t_labels = np.array(test)

In [83]:
X = t_inputs
y = t_labels
NN = Neural_Network()
for i in range(10): 
    print('+---------- EPOCH', i+1, '-----------+')
    print("Loss: \n" + str(np.mean(np.square(y - NN.feed_forward(X))))) # mean sum squared loss
    print("\n")
    NN.train(X, y)

+---------- EPOCH 1 -----------+
Loss: 
0.14722586583847097


+---------- EPOCH 2 -----------+
Loss: 
0.1746731441907159


+---------- EPOCH 3 -----------+
Loss: 
0.17463973132304544


+---------- EPOCH 4 -----------+
Loss: 
0.17459896590831744


+---------- EPOCH 5 -----------+
Loss: 
0.17454819203946148


+---------- EPOCH 6 -----------+
Loss: 
0.17448333322240156


+---------- EPOCH 7 -----------+
Loss: 
0.17439782066818527


+---------- EPOCH 8 -----------+
Loss: 
0.17428040589193905


+---------- EPOCH 9 -----------+
Loss: 
0.17411026134619545


+---------- EPOCH 10 -----------+
Loss: 
0.17384459687241197




## 4) Implement a Multilayer Perceptron architecture of your choosing using the Keras library. Train your model and report its baseline accuracy. Then hyperparameter tune at least two parameters and report your model's accuracy. 

- Use the Heart Disease Dataset (binary classification)
- Use an appropriate loss function for a binary classification task
- Use an appropriate activation function on the final layer of your network. 
- Train your model using verbose output for ease of grading.
- Use GridSearchCV to hyperparameter tune your model. (for at least two hyperparameters)
- When hyperparameter tuning, show you work by adding code cells for each new experiment. 
- Report the accuracy for each combination of hyperparameters as you test them so that we can easily see which resulted in the highest accuracy.
- You must hyperparameter tune at least 5 parameters in order to get a 3 on this section.

In [85]:
from keras.models import Sequential
from keras.layers import Dense
from sklearn.model_selection import StratifiedKFold
from sklearn.model_selection import GridSearchCV
from keras.wrappers.scikit_learn import KerasClassifier

Using TensorFlow backend.


In [86]:
from sklearn.preprocessing import RobustScaler

In [88]:
scaler = RobustScaler()
X = scaler.fit_transform(data.iloc[:,:-1])
y = data.iloc[:,-1]

In [96]:
%%time
seed = 1911
np.random.seed(seed)

def create_model():
    model = Sequential()
    model.add(Dense(12, input_dim=13, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    # Compile model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

model = KerasClassifier(build_fn=create_model, verbose=1)

kfold = StratifiedKFold(n_splits=3, shuffle=True, random_state=seed)

batch_size = [20,50,100]
epochs = [20,50,100]
param_grid = dict(batch_size=batch_size, epochs=epochs)

grid = GridSearchCV(estimator=model, param_grid=param_grid, 
                    n_jobs=1, cv=kfold)

grid_result = grid.fit(X, y)



CPU times: user 1min, sys: 4.66 s, total: 1min 5s
Wall time: 56.5 s


In [97]:
print(f"Best: {grid_result.best_score_:.3f} using {grid_result.best_params_}")

Best: 0.851 using {'batch_size': 20, 'epochs': 100}


### Best: batch 20 and epoch 100

In [103]:
seed = 1911
np.random.seed(seed)

def create_model():
    model = Sequential()
    model.add(Dense(12, input_dim=13, activation='relu'))
    model.add(Dense(12, input_dim=13, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    # Compile model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

model = KerasClassifier(build_fn=create_model, verbose=1)

kfold = StratifiedKFold(n_splits=3, shuffle=True, random_state=seed)

batch_size = [20]
epochs = [100]
#optimizer = ['adam','sgd','rmsprop','nadam']

param_grid = dict(batch_size=batch_size, 
                  epochs=epochs
                 #,optimizer=optimizer
                 )

grid = GridSearchCV(estimator=model, param_grid=param_grid, 
                    n_jobs=1, cv=kfold)

grid_result = grid.fit(X, y)

Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100
Epoch 49/100
Epoch 50/100
Epoch 51/100
Epoch 52/100
Epoch 53/100
Epoch 54/100
Epoch 55/100
Epoch 56/100
Epoch 57/100
Epoch 58/100
Epoch 59/100
Epoch 60/100
Epoch 61/100
Epoch 62/100
Epoch 63/100
Epoch 64/100
Epoch 65/100
Epoch 66/100
Epoch 67/100
Epoch 68/100
Epoch 69/100
Epoch 70/100
Epoch 71/100
Epoch 72/100
Epoch 73/100
Epoch 74/100
Epoch 75/100
Epoch 76/100
Epoch 77/100
Epoch 78

### 2 Layers Better than 1

In [104]:
seed = 1911
np.random.seed(seed)

def create_model():
    model = Sequential()
    model.add(Dense(12, input_dim=13, activation='relu'))
    model.add(Dense(12, input_dim=13, activation='relu'))
    model.add(Dense(12, input_dim=13, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))

    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

model = KerasClassifier(build_fn=create_model, verbose=1)

kfold = StratifiedKFold(n_splits=3, shuffle=True, random_state=seed)

batch_size = [20]
epochs = [100]
#optimizer = ['adam','sgd','rmsprop','nadam']

param_grid = dict(batch_size=batch_size, 
                  epochs=epochs
                 #,optimizer=optimizer
                 )

grid = GridSearchCV(estimator=model, param_grid=param_grid, 
                    n_jobs=1, cv=kfold)

grid_result = grid.fit(X, y)

Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100
Epoch 49/100
Epoch 50/100
Epoch 51/100
Epoch 52/100
Epoch 53/100
Epoch 54/100
Epoch 55/100
Epoch 56/100
Epoch 57/100
Epoch 58/100
Epoch 59/100
Epoch 60/100
Epoch 61/100
Epoch 62/100
Epoch 63/100
Epoch 64/100
Epoch 65/100
Epoch 66/100
Epoch 67/100
Epoch 68/100
Epoch 69/100
Epoch 70/100
Epoch 71/100
Epoch 72/100
Epoch 73/100
Epoch 74/100
Epoch 75/100
Epoch 76/100
Epoch 77/100
Epoch 78

# 3 layers better than 2. Probably overfitting

In [106]:
seed = 1911
np.random.seed(seed)

def create_model(optimizer='adam'):
    model = Sequential()
    model.add(Dense(12, input_dim=13, activation='relu'))
    model.add(Dense(12, input_dim=13, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    # Compile model
    model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
    return model

model = KerasClassifier(build_fn=create_model, verbose=1)

kfold = StratifiedKFold(n_splits=3, shuffle=True, random_state=seed)

batch_size = [20]
epochs = [100]
optimizer = ['adam','sgd','rmsprop','nadam']

param_grid = dict(batch_size=batch_size, 
                  epochs=epochs,
                 optimizer=optimizer)

grid = GridSearchCV(estimator=model, param_grid=param_grid, 
                    n_jobs=1, cv=kfold)

grid_result = grid.fit(X, y)

Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100
Epoch 49/100
Epoch 50/100
Epoch 51/100
Epoch 52/100
Epoch 53/100
Epoch 54/100
Epoch 55/100
Epoch 56/100
Epoch 57/100
Epoch 58/100
Epoch 59/100
Epoch 60/100
Epoch 61/100
Epoch 62/100
Epoch 63/100
Epoch 64/100
Epoch 65/100
Epoch 66/100
Epoch 67/100
Epoch 68/100
Epoch 69/100
Epoch 70/100
Epoch 71/100
Epoch 72/100
Epoch 73/100
Epoch 74/100
Epoch 75/100
Epoch 76/100
Epoch 77/100
Epoch 78

In [107]:
print(f"Best: {grid_result.best_score_:.3f} using {grid_result.best_params_}")

Best: 0.842 using {'batch_size': 20, 'epochs': 100, 'optimizer': 'sgd'}


# Weird - it got worse this go around? 