
# Neural Networks Sprint Challenge

## 1) Define the following terms:

1. Neuron - The neuron is nothing more than a set of inputs, a set of weights, and an activation function. The neuron translates these inputs into a single output, which can then be picked up as input for another layer of neurons later on.

2. Input Layer - The input layer passes the data directly to the first hidden layer where the data is multiplied by the first hidden layer's weights.

3. Hidden Layer - The hidden layers' job is to transform the inputs into something that the output layer can use.

4. Output Layer - The output layer transforms the hidden layer activations into the scale required for the output, for example a value between 0 and 1 for binary classification.

5. Activation - An activation function applies a nonlinearity to a neuron's weighted sum, often squashing it into a smaller range; for example, the sigmoid maps values into the range 0 to 1. Many activation functions are used in deep learning, and ReLU, SELU, and tanh are often preferred over sigmoid in hidden layers.

6. Backpropagation - The error from the cost function is propagated backward through the network to compute the gradient of the cost with respect to each weight. The weights are then adjusted using those gradients so that the cost decreases.
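
As a minimal sketch of definitions 1 and 5, a single neuron can be written as a weighted sum followed by an activation. The input and weight values below are made-up numbers for illustration and do not come from this notebook.

```python
import numpy as np

def sigmoid(s):
    """Squash any real value into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-s))

def neuron(inputs, weights):
    """Weighted sum of the inputs passed through the activation function."""
    return sigmoid(np.dot(inputs, weights))

# Hypothetical inputs and weights, chosen only for illustration
example_inputs = np.array([0.5, -1.0, 2.0])
example_weights = np.array([0.4, 0.3, -0.2])

# Weighted sum is 0.2 - 0.3 - 0.4 = -0.5, so the output is sigmoid(-0.5)
neuron_output = neuron(example_inputs, example_weights)
print(neuron_output)
```

The single output value could then serve as one of the inputs to a neuron in the next layer.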


## 2) Create a perceptron class that can model the behavior of an AND gate. You can use the following table as your training data:

| x1 | x2 | x3 | y |
|----|----|----|---|
| 1  | 1  | 1  | 1 |
| 1  | 0  | 1  | 0 |
| 0  | 1  | 1  | 0 |
| 0  | 0  | 1  | 0 |
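
A single perceptron is enough for this task because AND is linearly separable. The sketch below checks that claim with hand-picked weights, which are an assumption for illustration and not the weights a training loop would necessarily find. Since x3 is always 1, its weight plays the role of a bias.

```python
import numpy as np

# Rows of the truth table above: x1, x2, x3 (x3 is constant, so it acts as a bias input)
X_and = np.array([(1, 1, 1), (1, 0, 1), (0, 1, 1), (0, 0, 1)], dtype=float)
y_and = np.array([1, 0, 0, 0])

# Hand-picked, illustrative weights: fire only when x1 + x2 - 1.5 >= 0
w_and = np.array([1.0, 1.0, -1.5])

# Unit step on the weighted sum
and_predictions = (X_and @ w_and >= 0).astype(int)
print(and_predictions)
```

The weighted sums are 0.5, -0.5, -0.5, and -1.5, so only the first row crosses the threshold.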

In [0]:
import numpy as np

X = np.array([(1,1,1), (1,0,1), (0,1,1), (0,0,1)], dtype=float)
y = np.array([1,0,0,0], dtype=float)

X.shape, y.shape

((4, 3), (4,))

In [0]:
import numpy as np

class Perceptron(object):
  def __init__(self, rate = 0.01, niter = 10):
    self.rate = rate
    self.niter = niter

  def fit(self, X, y):
    """Fit training data
    X : Training vectors, X.shape : [#samples, #features]
    y : Target values, y.shape : [#samples]
    """

    # weights
    self.weight = np.zeros(1 + X.shape[1]).astype(float)

    # Number of weight updates (misclassifications) in each epoch
    self.errors = []

    for i in range(self.niter):
      err = 0
      for xi, target in zip(X, y):
        delta_w = self.rate * (target - self.predict(xi))
        self.weight[1:] += delta_w * xi
        self.weight[0] += delta_w
        err += int(delta_w != 0.0)
      self.errors.append(err)
    return self

  def net_input(self, X):
    """Calculate net input"""
    return np.dot(X, self.weight[1:]) + self.weight[0]
    
  def predict(self, X):
    """Return class label after unit step"""
   
    return np.where(self.net_input(X) >= 0.5, 1, 0)

In [0]:
pn = Perceptron(1, 10000)
pn.fit(X, y)
pn.predict(X)

array([1, 0, 0, 0])

## 3) Implement a Neural Network Multilayer Perceptron class that uses backpropagation to update the network's weights. 
- Your network must have one hidden layer. 
- You do not have to update weights via gradient descent. You can use something like the derivative of the sigmoid function to update weights.
- Train your model on the Heart Disease dataset from UCI:

[Github Dataset](https://github.com/ryanleeallred/datasets/blob/master/heart.csv)

[Raw File on Github](https://raw.githubusercontent.com/ryanleeallred/datasets/master/heart.csv)
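
The sigmoid derivative mentioned in the second bullet has a closed form that depends only on the sigmoid's own output, which is why it can be computed from stored activations during the backward pass:

$$
\sigma(s) = \frac{1}{1 + e^{-s}}, \qquad
\sigma'(s) = \frac{e^{-s}}{(1 + e^{-s})^{2}} = \sigma(s)\,\bigl(1 - \sigma(s)\bigr)
$$

So if $a = \sigma(s)$ has already been computed in the forward pass, the derivative is simply $a(1 - a)$.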


In [0]:
!wget https://raw.githubusercontent.com/ryanleeallred/datasets/master/heart.csv

--2019-04-05 15:43:21--  https://raw.githubusercontent.com/ryanleeallred/datasets/master/heart.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 11328 (11K) [text/plain]
Saving to: ‘heart.csv’


2019-04-05 15:43:21 (130 MB/s) - ‘heart.csv’ saved [11328/11328]



In [0]:
import pandas as pd

df = pd.read_csv('heart.csv')
df.head()

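|   | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target |
|---|-----|-----|----|----------|------|-----|---------|---------|-------|---------|-------|----|------|--------|
| 0 | 63  | 1   | 3  | 145      | 233  | 1   | 0       | 150     | 0     | 2.3     | 0     | 0  | 1    | 1      |
| 1 | 37  | 1   | 2  | 130      | 250  | 0   | 1       | 187     | 0     | 3.5     | 0     | 0  | 2    | 1      |
| 2 | 41  | 0   | 1  | 130      | 204  | 0   | 0       | 172     | 0     | 1.4     | 2     | 0  | 2    | 1      |
| 3 | 56  | 1   | 1  | 120      | 236  | 0   | 1       | 178     | 0     | 0.8     | 2     | 0  | 2    | 1      |
| 4 | 57  | 0   | 0  | 120      | 354  | 0   | 1       | 163     | 1     | 0.6     | 2     | 0  | 2    | 1      |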


In [0]:
df.isnull().sum()

age         0
sex         0
cp          0
trestbps    0
chol        0
fbs         0
restecg     0
thalach     0
exang       0
oldpeak     0
slope       0
ca          0
thal        0
target      0
dtype: int64

In [0]:
X=df.drop('target',axis=1).values.astype(float)
y=df.target.values.reshape(-1,1)

X.shape, y.shape

((303, 13), (303, 1))

In [0]:
class Neural_Network(object):
  def __init__(self):
    self.inputs = 13
    self.hiddenNodes = 16
    self.outputNodes = 1

    # Initialize weights
    self.L1_weights = np.random.randn(self.inputs, self.hiddenNodes) # (13x16)
    self.L2_weights = np.random.randn(self.hiddenNodes, self.outputNodes) # (16x1)
    

  def feed_forward(self, X):
    # Weighted sum between inputs and hidden layer
    self.hidden_sum = np.dot(X, self.L1_weights)
    # Activations of weighted sum
    self.activated_hidden = self.sigmoid(self.hidden_sum)
    # Weighted sum between hidden and output
    self.output_sum = np.dot(self.activated_hidden, self.L2_weights)
    # final activation of output
    self.activated_output = self.sigmoid(self.output_sum)
    return self.activated_output
    
  def sigmoid(self, s):
    return 1/(1+np.exp(-s))
  
  # RuntimeWarning: overflow encountered in exp
  #def sigmoid(self,s):
  #  if -s > np.log(np.finfo(type(s)).max):
  #      return 0.0    
  #  a = np.exp(-s)
  #  return 1.0/ (1.0 + a)
  
  def sigmoidPrime(self, s):
    # s is expected to be an already-activated value, i.e. the output of sigmoid
    return s * (1 - s)
  
  def backward(self, X, y, o):
    # backward propagate through the network
    self.o_error = y - o # error in output
    self.o_delta = self.o_error*self.sigmoidPrime(o) # applying derivative of sigmoid to error
    
    self.z2_error = self.o_delta.dot(self.L2_weights.T) # z2 error: how much our hidden layer weights contributed to output error
    self.z2_delta = self.z2_error*self.sigmoidPrime(self.activated_hidden) # applying derivative of sigmoid to z2 error

    self.L1_weights += X.T.dot(self.z2_delta) # adjusting first set (input --> hidden) weights
    self.L2_weights += self.activated_hidden.T.dot(self.o_delta) # adjusting second set (hidden --> output) weights
    
  def train (self, X, y):
    o = self.feed_forward(X)
    self.backward(X, y, o)

In [0]:
NN = Neural_Network()
for i in range(100): # trains the NN 100 times
  if i+1 in [1,2,3,4,5] or (i+1) % 50 == 0:
    print('+---------- EPOCH', i+1, '-----------+')
    #print("Input: \n", X) 
    #print("Actual Output: \n", y)  
    #print("Predicted Output: \n" + str(NN.feed_forward(X))) 
    print("Loss: \n" + str(np.mean(np.square(y - NN.feed_forward(X))))) # mean squared error
    print("\n")
  NN.train(X, y)

+---------- EPOCH 1 -----------+
Loss: 
0.44707792229767274


+---------- EPOCH 2 -----------+
Loss: 
0.5443442221291408


+---------- EPOCH 3 -----------+
Loss: 
0.5442560996521041


+---------- EPOCH 4 -----------+
Loss: 
0.5440642144937643


+---------- EPOCH 5 -----------+
Loss: 
0.5434467034172835


+---------- EPOCH 50 -----------+
Loss: 
0.5445544554454586


+---------- EPOCH 100 -----------+
Loss: 
0.5445544554454586
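
The loss settling at one constant value from epoch 50 on is consistent with saturated sigmoid units: raw features such as `trestbps` and `chol` are in the hundreds, so the weighted sums can be very large and the `s * (1 - s)` term is then close to zero. One common remedy, sketched here as an assumption and not as part of the original solution, is to standardize each column before training. The sample array holds the `age`, `trestbps`, and `chol` values from the first four rows of the `df.head()` output above; `standardize` is a hypothetical helper name.

```python
import numpy as np

def standardize(features):
    """Scale each column to zero mean and unit variance."""
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant columns
    return (features - mean) / std

# age, trestbps, chol for the first four rows shown by df.head()
sample = np.array([[63, 145, 233],
                   [37, 130, 250],
                   [41, 130, 204],
                   [56, 120, 236]], dtype=float)

scaled = standardize(sample)
print(scaled.mean(axis=0))  # approximately 0 for every column
print(scaled.std(axis=0))   # approximately 1 for every column
```

In the notebook this would be applied as `X = standardize(X)` before calling `NN.train`. Whether that alone gets the loss moving would need to be checked by rerunning the training cell.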






## 4) Implement a Multilayer Perceptron architecture of your choosing using the Keras library. Train your model and report its baseline accuracy. Then hyperparameter tune at least two parameters and report your model's accuracy. 

- Use the Heart Disease Dataset (binary classification)
- Use an appropriate loss function for a binary classification task
- Use an appropriate activation function on the final layer of your network. 
- Train your model using verbose output for ease of grading.
- Use GridSearchCV to hyperparameter tune your model (for at least two hyperparameters).
- When hyperparameter tuning, show your work by adding code cells for each new experiment.
- Report the accuracy for each combination of hyperparameters as you test them so that we can easily see which resulted in the highest accuracy.
- You must hyperparameter tune at least 5 parameters in order to get a 3 on this section.
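
To make the reporting requirement concrete: a grid search tries every combination of the listed values, so the number of results to report grows multiplicatively. The sketch below enumerates a grid with the standard library only. `example_grid` is an illustrative grid and `cv_folds` is an assumed fold count, since the total number of model fits also depends on the cross-validation setting.

```python
from itertools import product

# Illustrative grid: candidate values for each hyperparameter
example_grid = {
    'batch_size': [10, 20],
    'epochs': [20],
    'activation': ['relu', 'tanh', 'sigmoid'],
}

# One dict per combination, taking one value from each hyperparameter
names = list(example_grid)
combinations = [dict(zip(names, values))
                for values in product(*example_grid.values())]

cv_folds = 3  # assumed number of cross-validation folds
total_fits = len(combinations) * cv_folds

for combo in combinations:
    print(combo)
print(f"{len(combinations)} combinations x {cv_folds} folds = {total_fits} model fits")
```

Adding a third value to any one list multiplies the work accordingly, which is worth keeping in mind before tuning five hyperparameters at once.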

In [0]:
import numpy
from sklearn.model_selection import GridSearchCV
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier


In [0]:
def create_model(activation='relu'):
  # create model
  model = Sequential()
  model.add(Dense(12, input_dim=13, kernel_initializer='glorot_normal', activation=activation))
  model.add(Dense(1, kernel_initializer='glorot_normal', activation='sigmoid'))
  # Compile model
 
  model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
  return model

# create model
model = KerasClassifier(build_fn=create_model, verbose=0)

# Create pipeline
#pipeline = make_pipeline(\
#                         ce.BinaryEncoder(),
#                         RobustScaler(), 
#                         KerasClassifier(build_fn=create_model, verbose=1))
# Model validation.
#param_grid = {
#    'kerasclassifier__batch_size': [20, 50, 80, 100, 200],
#    'kerasclassifier__activation': ['tanh']
#}

#gridsearch = GridSearchCV(pipeline, param_grid=param_grid, cv=3, 
#                         scoring='accuracy', verbose=10)


param_grid = {'batch_size': [10,20],
              'epochs': [20],
              #'init_mode' : ['glorot_normal'],
              'activation' : ['softmax', 'softplus', 'softsign', 'relu', 'tanh', 'sigmoid', 'hard_sigmoid', 'linear']
              }

# Create Grid Search
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1)
grid_result = grid.fit(X, y)

# Report Results
print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
  print(f"Means: {mean}, Stdev: {stdev} with: {param}") 




Best: 0.8910891065503111 using {'activation': 'relu', 'batch_size': 20, 'epochs': 20}
Means: 0.15181518102636432, Stdev: 0.214699087981611 with: {'activation': 'softmax', 'batch_size': 10, 'epochs': 20}
Means: 0.24092409324527966, Stdev: 0.33374154115013854 with: {'activation': 'softmax', 'batch_size': 20, 'epochs': 20}
Means: 0.21122112230892623, Stdev: 0.2987117758289498 with: {'activation': 'softplus', 'batch_size': 10, 'epochs': 20}
Means: 0.702970297619848, Stdev: 0.21976332381525265 with: {'activation': 'softplus', 'batch_size': 20, 'epochs': 20}
Means: 0.25082508718023205, Stdev: 0.30595323848214345 with: {'activation': 'softsign', 'batch_size': 10, 'epochs': 20}
Means: 0.24752475316375003, Stdev: 0.2893393896468499 with: {'activation': 'softsign', 'batch_size': 20, 'epochs': 20}
Means: 0.21122112230892623, Stdev: 0.2987117758289498 with: {'activation': 'relu', 'batch_size': 10, 'epochs': 20}
Means: 0.8910891065503111, Stdev: 0.1540232626067211 with: {'activation': 'relu', 'batc