# Neural Networks Sprint Challenge

## 1) Define the following terms:

- Neuron
- Input Layer
- Hidden Layer
- Output Layer
- Activation
- Backpropagation

#### Neuron:
or a perceptron, is a basic unit of a neural network, it has several inputs, for each input there is a weight (weight of that spesific connection). When the artifitial neuron activates, it computes its state, by adding all the incoming inputs multiplied by its corresponding connection weight. After computing its state the neuron passes it through its activation function, which normalizes the result. (between 0:1, -1:1, or only +). The neuron / perceptron consists of 4 parts:
Input values or One input layer
Weights and Bias
Net sum
Activation Function

#### Input Layer:
is also called a visible layer, since we see and interact with it. This is where we feed a dataset into Neural Networks. The input layer passes the data directly to the first hidden layer where the data is multiplied by the first hidden layer's weights. Also, the input layer might have its own weights that multiply the incoming data.


#### Hidden Layer:
is a layer which transforms inputs from the previous layer into something that the output layer can use. A feed forward neural network applies a series of functions to the data. The exact function will depend on the neural network (for ex., it can be a linear transformation of the previous layer, followed by a squashing nonlinearity, or computing logical functions). This layer is responsible extracting the required features from the input data.

#### Output Layer:
The output layer of the neural network collects and transmits the information accordingly in way it has been designed to give. The pattern presented by the output layer can be directly traced back to the input layer.

#### Activation:
The activation is the result of applying activation function to a weighted sum of inputs in a neuron. The activation function is the non linear transformation that we do over weighted sum of inputs. This activation is then sent to the next layer of neurons as input. There are several activation functions, such as sigmoid, tahn, relu, LeakyRelU, softmax, and others.

#### Backpropagation:
is short for 'Backwards propagation of errors' and refers to the process/algorithm for how weights in NN are updated in reverse at the end of each training epoch. The weights are updated by comparing the desired and actual output of NN.




## 2) Create a perceptron class that can model the behavior of an AND gate. You can use the following table as your training data:

| x1 | x2 | x3 | y |
|----|----|----|---|
| 1  | 1  | 1  | 1 |
| 1  | 0  | 1  | 0 |
| 0  | 1  | 1  | 0 |
| 0  | 0  | 1  | 0 |

In [86]:
##### Your Code Here #####
import numpy as np
np.random.seed(1)

#AND Gate data:
X = np.array([[1,0,1],
            [1,1,1],
            [0,0,1],
            [0,1,1]])
y = np.array([1,1,0,0])

def sigmoid(X):
        return 1 / (1 + np.exp(-X))
    
def sigmoid_derivative(X):
    return sigmoid(X) * (1 - sigmoid(X))    


class Perceptron():
    """
    Perceptron class for binary classification
    """
    def __init__(self, n_iter=100):
        self.n_iter = n_iter
        
        
    def sigmoid(self, X):
        return 1 / (1 + np.exp(-X))
    
    def sigmoid_derivative(self, X):
        return sigmoid(X) * (1 - sigmoid(X))    
    
    def fit(self, X, y):
        
        # initialize weights and cost list
        self.weights = 2 * np.random.random((3,1)) - 1
        # iterate 
        for i in range(self.n_iter):
            # Weighted sum of inputs and weights
            weighted_sum = np.dot(inputs, self.weights)

            # Activate with sigmoid function
            activated_output = sigmoid(weighted_sum)

            # Calculate Error
            error = correct_outputs - activated_output

            # Calculate weight adjustments with sigmoid_derivative
            adjustments = error * sigmoid_derivative(activated_output)

            # Update weights
            self.weights += np.dot(inputs.T, adjustments)
            
            
        return self
    
#Fitting AND gate:
ppn = Perceptron(n_iter=100)
ppn.fit(X, y)
ppn.weights     

array([[ 3.02192721],
       [ 3.03264893],
       [-4.84302163]])

## 3) Implement a Neural Network Multilayer Perceptron class that uses backpropagation to update the network's weights. 
- Your network must have one hidden layer. 
- You do not have to update weights via gradient descent. You can use something like the derivative of the sigmoid function to update weights.
- Train your model on the Heart Disease dataset from UCI:

[Github Dataset](https://github.com/ryanleeallred/datasets/blob/master/heart.csv)

[Raw File on Github](https://raw.githubusercontent.com/ryanleeallred/datasets/master/heart.csv)


In [96]:
##### Your Code Here #####
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df = pd.read_csv('https://raw.githubusercontent.com/ryanleeallred/datasets/master/heart.csv')
df.head()

Unnamed: 0,age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
0,63,1,3,145,233,1,0,150,0,2.3,0,0,1,1
1,37,1,2,130,250,0,1,187,0,3.5,0,0,2,1
2,41,0,1,130,204,0,0,172,0,1.4,2,0,2,1
3,56,1,1,120,236,0,1,178,0,0.8,2,0,2,1
4,57,0,0,120,354,0,1,163,1,0.6,2,0,2,1


In [97]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 303 entries, 0 to 302
Data columns (total 14 columns):
age         303 non-null int64
sex         303 non-null int64
cp          303 non-null int64
trestbps    303 non-null int64
chol        303 non-null int64
fbs         303 non-null int64
restecg     303 non-null int64
thalach     303 non-null int64
exang       303 non-null int64
oldpeak     303 non-null float64
slope       303 non-null int64
ca          303 non-null int64
thal        303 non-null int64
target      303 non-null int64
dtypes: float64(1), int64(13)
memory usage: 33.2 KB


In [101]:
X = df.iloc[:, 0:13].values
y = df.target.values
print(X.shape)
print(y.shape)

(303, 13)
(303,)


In [102]:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X = scaler.fit_transform(X)
X

array([[ 0.9521966 ,  0.68100522,  1.97312292, ..., -2.27457861,
        -0.71442887, -2.14887271],
       [-1.91531289,  0.68100522,  1.00257707, ..., -2.27457861,
        -0.71442887, -0.51292188],
       [-1.47415758, -1.46841752,  0.03203122, ...,  0.97635214,
        -0.71442887, -0.51292188],
       ...,
       [ 1.50364073,  0.68100522, -0.93851463, ..., -0.64911323,
         1.24459328,  1.12302895],
       [ 0.29046364,  0.68100522, -0.93851463, ..., -0.64911323,
         0.26508221,  1.12302895],
       [ 0.29046364, -1.46841752,  0.03203122, ..., -0.64911323,
         0.26508221, -0.51292188]])

In [111]:
#Multilayer perceptron algorithm with backpropagation
from random import seed
from random import randrange
from random import random
from csv import reader
from math import exp
 
# Load a CSV file
def load_csv(filename):
    dataset = list()
    with open(filename, 'r') as file:
        csv_reader = reader(file)
        for row in csv_reader:
            if not row:
                continue
            dataset.append(row)
    return dataset
 
# Convert string column to float
def str_column_to_float(dataset, column):
    for row in dataset:
        row[column] = float(row[column].strip())
 
# Convert string column to integer
def str_column_to_int(dataset, column):
    class_values = [row[column] for row in dataset]
    unique = set(class_values)
    lookup = dict()
    for i, value in enumerate(unique):
        lookup[value] = i
    for row in dataset:
        row[column] = lookup[row[column]]
    return lookup
 
# Find the min and max values for each column
def dataset_minmax(dataset):
    minmax = list()
    stats = [[min(column), max(column)] for column in zip(*dataset)]
    return stats

# Rescale dataset columns to the range 0-1
def normalize_dataset(dataset, minmax):
    for row in dataset:
        for i in range(len(row)-1):
            row[i] = (row[i] - minmax[i][0]) / (minmax[i][1] - minmax[i][0])

# Split a dataset into k folds
def cross_validation_split(dataset, n_folds):
    dataset_split = list()
    dataset_copy = list(dataset)
    fold_size = int(len(dataset) / n_folds)
    for i in range(n_folds):
        fold = list()
        while len(fold) < fold_size:
            index = randrange(len(dataset_copy))
            fold.append(dataset_copy.pop(index))
        dataset_split.append(fold)
    return dataset_split
 
# Calculate accuracy percentage
def accuracy_metric(actual, predicted):
    correct = 0
    for i in range(len(actual)):
        if actual[i] == predicted[i]:
            correct += 1
    return correct / float(len(actual)) * 100.0
 
# Evaluate an algorithm using a cross validation split
def evaluate_algorithm(dataset, algorithm, n_folds, *args):
    folds = cross_validation_split(dataset, n_folds)
    scores = list()
    for fold in folds:
        train_set = list(folds)
        train_set.remove(fold)
        train_set = sum(train_set, [])
        test_set = list()
        for row in fold:
            row_copy = list(row)
            test_set.append(row_copy)
            row_copy[-1] = None
        predicted = algorithm(train_set, test_set, *args)
        actual = [row[-1] for row in fold]
        accuracy = accuracy_metric(actual, predicted)
        scores.append(accuracy)
    return scores

# Calculate neuron activation for an input
def activate(weights, inputs):
    activation = weights[-1]
    for i in range(len(weights)-1):
        activation += weights[i] * inputs[i]
    return activation

# Transfer neuron activation
def transfer(activation):
    return 1.0 / (1.0 + exp(-activation))
 
# Forward propagate input to a network output
def forward_propagate(network, row):
    inputs = row
    for layer in network:
        new_inputs = []
        for neuron in layer:
            activation = activate(neuron['weights'], inputs)
            neuron['output'] = transfer(activation)
            new_inputs.append(neuron['output'])
        inputs = new_inputs
    return inputs
 
# Calculate the derivative of an neuron output
def transfer_derivative(output):
    return output * (1.0 - output)
 
# Backpropagate error and store in neurons
def backward_propagate_error(network, expected):
    for i in reversed(range(len(network))):
        layer = network[i]
        errors = list()
        if i != len(network)-1:
            for j in range(len(layer)):
                error = 0.0
                for neuron in network[i + 1]:
                    error += (neuron['weights'][j] * neuron['delta'])
                errors.append(error)
        else:
            for j in range(len(layer)):
                neuron = layer[j]
                errors.append(expected[j] - neuron['output'])
        for j in range(len(layer)):
            neuron = layer[j]
            neuron['delta'] = errors[j] * transfer_derivative(neuron['output'])
 

# Update network weights with error
def update_weights(network, row, l_rate):
    for i in range(len(network)):
        inputs = row[:-1]
        if i != 0:
            inputs = [neuron['output'] for neuron in network[i - 1]]
        for neuron in network[i]:
            for j in range(len(inputs)):
                neuron['weights'][j] += l_rate * neuron['delta'] * inputs[j]
            neuron['weights'][-1] += l_rate * neuron['delta']

# Train a network for a fixed number of epochs
def train_network(network, train, l_rate, n_epoch, n_outputs):
    for epoch in range(n_epoch):
        for row in train:
            outputs = forward_propagate(network, row)
            expected = [0 for i in range(n_outputs)]
            expected[row[-1]] = 1
            backward_propagate_error(network, expected)
            update_weights(network, row, l_rate)
            
 
# Initialize a network
def initialize_network(n_inputs, n_hidden, n_outputs):
    network = list()
    hidden_layer = [{'weights':[random() for i in range(n_inputs + 1)]} for i in range(n_hidden)]
    network.append(hidden_layer)
    output_layer = [{'weights':[random() for i in range(n_hidden + 1)]} for i in range(n_outputs)]
    network.append(output_layer)
    return network
 
# Make a prediction with a network
def predict(network, row):
    outputs = forward_propagate(network, row)
    return outputs.index(max(outputs))
 
# Backpropagation Algorithm With Stochastic Gradient Descent
def back_propagation(train, test, l_rate, n_epoch, n_hidden):
    n_inputs = len(train[0]) - 1
    n_outputs = len(set([row[-1] for row in train]))
    network = initialize_network(n_inputs, n_hidden, n_outputs)
    train_network(network, train, l_rate, n_epoch, n_outputs)
    predictions = list()
    for row in test:
        prediction = predict(network, row)
        predictions.append(prediction)
    return(predictions)

# Test Backprop on Seeds dataset
seed(1)

from mlxtend.data import mnist_data
X, y = mnist_data()

# load and prepare data
filename = 'heart.csv'
dataset = load_csv(filename)
for i in range(len(dataset[0])-1):
    str_column_to_float(dataset, i)
# convert class column to integers
str_column_to_int(dataset, len(dataset[0])-1)
# normalize input variables
minmax = dataset_minmax(df)
normalize_dataset(df, minmax)
# evaluate algorithm
n_folds = 5
l_rate = 0.3
n_epoch = 500
n_hidden = 5
scores = evaluate_algorithm(df, back_propagation, n_folds, l_rate, n_epoch, n_hidden)
print('Scores: %s' % scores)
print('Mean Accuracy: %.3f%%' % (sum(scores)/float(len(scores))))

## 4) Implement a Multilayer Perceptron architecture of your choosing using the Keras library. Train your model and report its baseline accuracy. Then hyperparameter tune at least two parameters and report your model's accuracy. 

- Use the Heart Disease Dataset (binary classification)
- Use an appropriate loss function for a binary classification task
- Use an appropriate activation function on the final layer of your network. 
- Train your model using verbose output for ease of grading.
- Use GridSearchCV to hyperparameter tune your model. (for at least two hyperparameters)
- When hyperparameter tuning, show you work by adding code cells for each new experiment. 
- Report the accuracy for each combination of hyperparameters as you test them so that we can easily see which resulted in the highest accuracy.
- You must hyperparameter tune at least 5 parameters in order to get a 3 on this section.

In [69]:
import keras
from keras.layers import Dense, Dropout
from keras.models import Sequential
from keras.optimizers import SGD, Adam, Nadam
from keras.wrappers.scikit_learn import KerasClassifier

import pandas as pd
import numpy as np
import category_encoders as ce


from sklearn.model_selection import train_test_split,cross_val_score, StratifiedKFold, KFold, GridSearchCV 
              
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# fix random seed for reproducibility
np.random.seed(42)

In [74]:
##### Your Code Here #####
from keras.layers.advanced_activations import LeakyReLU, PReLU
from keras.layers import Dense, Dropout
from keras.optimizers import SGD, Adam, Nadam

def create_model(lr=0.05,
                 activation='relu',                 
                 input_shape=(X.shape[1],),
                 optimizer=Adam,
                 relu_alpha = 0.003,
                 dropout_rate = 0.2,
                weight_initializer='random_normal'):
    
    # initialize a model
    model = Sequential()
    
    # add input layer
    model.add(Dense(10, input_shape=input_shape, kernel_initializer=weight_initializer,))
    model.add(LeakyReLU(alpha=relu_alpha)) 
    model.add(Dropout(rate=dropout_rate))

    
    # add hidden layers
    model.add(Dense(10, kernel_initializer=weight_initializer,))
    model.add(LeakyReLU(alpha=relu_alpha)) 
    model.add(Dropout(rate=dropout_rate))
        
    model.add(Dense(10, kernel_initializer=weight_initializer,))
    model.add(LeakyReLU(alpha=relu_alpha)) 
    model.add(Dropout(rate=dropout_rate))
    
    model.add(Dense(8, kernel_initializer=weight_initializer,))
    model.add(LeakyReLU(alpha=relu_alpha)) 
    model.add(Dropout(rate=dropout_rate))

    
    # add final output layer
    model.add(Dense(1, activation='sigmoid'))
    
    # optimizer
    optimizer=optimizer(lr=lr)
    
    # compile model
    model.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['acc'])
              
    return model

### Batch size

In [76]:
model = KerasClassifier(build_fn=create_model, 
                               epochs=epochs,
                               batch_size=100,
                               verbose=0)

# define the grid search parameters
param_grid = {'batch_size': [20, 60, 100],
              'epochs': [20]}

# Create Grid Search
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1)
grid_result = grid.fit(X, y)

# Report Results
print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}")
#Best: 0.6501650212228102 using {'batch_size': 20, 'epochs': 20}    
    

Best: 0.6897689856515072 using {'batch_size': 60, 'epochs': 20}
Means: 0.5511551115772512, Stdev: 0.17901057722136335 with: {'batch_size': 20, 'epochs': 20}
Means: 0.6897689856515072, Stdev: 0.09576645732787446 with: {'batch_size': 60, 'epochs': 20}
Means: 0.6303630315824704, Stdev: 0.09878963523601354 with: {'batch_size': 100, 'epochs': 20}


### Epochs

In [78]:
epochs = 20
model = KerasClassifier(build_fn=create_model, 
                               epochs=epochs,
                               batch_size=60,
                               verbose=0)

# define the grid search parameters
param_grid = {'batch_size': [60],
              'epochs': [20, 30, 40]}

# Create Grid Search
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1)
grid_result = grid.fit(X, y)

# Report Results
print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}")

Best: 0.6963696316523914 using {'batch_size': 60, 'epochs': 30}
Means: 0.6666666772892766, Stdev: 0.13143437280589265 with: {'batch_size': 60, 'epochs': 20}
Means: 0.6963696316523914, Stdev: 0.04452389308394785 with: {'batch_size': 60, 'epochs': 30}
Means: 0.66996699669967, Stdev: 0.10550737415285161 with: {'batch_size': 60, 'epochs': 40}


### Dropout Regularization

In [79]:
# create model
model = KerasClassifier(build_fn=create_model, 
                               epochs=30,
                               batch_size=60,
                               verbose=0)

# define the grid search parameters
param_grid = {'dropout_rate' : [0.0, 0.2, 0.3]}

# Create Grid Search
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1)
grid_result = grid.fit(X, y)

# Report Results
print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}")

Best: 0.7194719477848645 using {'dropout_rate': 0.2}
Means: 0.6798679891592598, Stdev: 0.11035151162175309 with: {'dropout_rate': 0.0}
Means: 0.7194719477848645, Stdev: 0.025986831487018202 with: {'dropout_rate': 0.2}
Means: 0.6600660036499351, Stdev: 0.14241221521980998 with: {'dropout_rate': 0.3}


### Optimizer and learning rate¶

In [80]:
from keras.optimizers import SGD, Adam

model = KerasClassifier(build_fn=create_model, 
                               epochs=30,
                               batch_size=60,
                               verbose=0)

# define the grid search parameters
param_grid = {'optimizer': [Adam, SGD],
              'lr': [.05, .03, 0.01]}

# Create Grid Search
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1)
grid_result = grid.fit(X, y)

# Report Results
print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}")

Best: 0.7392739283763142 using {'lr': 0.01, 'optimizer': <class 'keras.optimizers.Adam'>}
Means: 0.6600660120103226, Stdev: 0.10765351243438732 with: {'lr': 0.05, 'optimizer': <class 'keras.optimizers.Adam'>}
Means: 0.12211220768025212, Stdev: 0.17269274023273257 with: {'lr': 0.05, 'optimizer': <class 'keras.optimizers.SGD'>}
Means: 0.6765676542083816, Stdev: 0.07245708078828142 with: {'lr': 0.03, 'optimizer': <class 'keras.optimizers.Adam'>}
Means: 0.21122112162042372, Stdev: 0.29871177485526024 with: {'lr': 0.03, 'optimizer': <class 'keras.optimizers.SGD'>}
Means: 0.7392739283763142, Stdev: 0.052598599214012005 with: {'lr': 0.01, 'optimizer': <class 'keras.optimizers.Adam'>}
Means: 0.21122112162042372, Stdev: 0.29871177485526024 with: {'lr': 0.01, 'optimizer': <class 'keras.optimizers.SGD'>}


### LeakyRelu Regularization

In [81]:
model = KerasClassifier(build_fn=create_model, 
                               epochs=30,
                               batch_size=60,
                               verbose=0)
# define the grid search parameters
param_grid = {'relu_alpha': [.001, .003, 0.005]}

# Create Grid Search
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1)
grid_result = grid.fit(X, y)

# Report Results
print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}")

Best: 0.6963696471928763 using {'relu_alpha': 0.001}
Means: 0.6963696471928763, Stdev: 0.13750934506691428 with: {'relu_alpha': 0.001}
Means: 0.6765676622736966, Stdev: 0.09014191082830694 with: {'relu_alpha': 0.003}
Means: 0.6831683160448232, Stdev: 0.12601901723521391 with: {'relu_alpha': 0.005}


The best results were achieved with 
optimizer Adam, 
learning rate = 0.01, 
droupout rate = 0.2, 
epoch size = 30, 
batch size=60
LeakyRelU alpha rate = 0.001