# Neural Networks Sprint Challenge

## 1) Define the following terms:

A neural network is a function between two vector spaces $V^{m} \rightarrow V^{c}$, where $m$ is the number of features and $c$ is the cardinality of the target.
- Neuron

A neuron is a function from a vector space to a scalar. It computes a weighted sum of its inputs and applies an activation function that clips or normalizes the result.

- Input Layer

An input layer is where you send observations. It has as many neurons as there are features.

- Hidden Layer

A hidden layer can be of arbitrary dimension. For a layer of width $w$, it turns the neural net into a composite function $V^{m} \rightarrow V^{w} \rightarrow V^{w} \rightarrow V^{c}$.

- Output Layer

The output layer is your prediction vector. In multiclass problems it is a matrix with one column per class rather than a single column.

- Activation

An activation function can squash values into probabilities (the sigmoid, as in logistic regression) or clip them to non-negative values (ReLU).

- Backpropagation

Backpropagation updates the weights, i.e. the values in the matrices that represent your layers. It propagates the output error backward through the layers with the chain rule and iteratively nudges each weight in the direction that reduces the loss.
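To make the definitions concrete, here is a minimal forward-pass sketch. The sizes `m`, `w`, `c`, the random weights, and the helper name `sigmoid` are illustrative assumptions, not part of the challenge.

```python
import numpy as np

rng = np.random.default_rng(0)

m, w, c = 3, 4, 1             # features, hidden width, target dimension (illustrative)
x = rng.normal(size=(5, m))   # 5 observations, one per row

W1 = rng.normal(size=(m, w))  # input -> hidden weights
W2 = rng.normal(size=(w, c))  # hidden -> output weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

hidden_sum = x @ W1                   # each column is one neuron's weighted sum
hidden = np.maximum(hidden_sum, 0.0)  # ReLU activation clips negatives to zero
output = sigmoid(hidden @ W2)         # sigmoid activation squashes into (0, 1)

print(hidden_sum.shape, hidden.shape, output.shape)  # (5, 4) (5, 4) (5, 1)
```

Each row passes through the weighted sum, the activation, and the output map in turn, which is one way to read the chain $V^{m} \rightarrow V^{w} \rightarrow V^{w} \rightarrow V^{c}$.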

## 2) Create a perceptron class that can model the behavior of an AND gate. You can use the following table as your training data:

| x1 | x2 | x3 | y |
|----|----|----|---|
| 1  | 1  | 1  | 1 |
| 1  | 0  | 1  | 0 |
| 0  | 1  | 1  | 0 |
| 0  | 0  | 1  | 0 |

In [155]:
import numpy as np
import pandas as pd
import category_encoders as ce
import keras
from pandarallel import pandarallel
pandarallel.initialize()
from sklearn.preprocessing import StandardScaler
from keras.models import Sequential
from keras.layers import Dense

def squish(x: np.number) -> np.number: 
    return np.divide(1, 1 + np.exp(-x))

def del_squish(x: np.number) -> np.number: 
    s = squish(x)
    return s * (1 - s)

def relu(x): 
    if x < 0: 
        return 0
    else: 
        return x


def b(x: np.number) -> np.number: 
    ''' takes a sigmoid OUTPUT x = squish(z) and returns the sigmoid derivative at z,
        i.e. b(squish(z)) == del_squish(z). '''
    return x * (1 - x)

New pandarallel memory created - Size: 2000 MB
Pandarallel will run on 4 workers
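A quick sanity check of the helpers above, as a sketch: it redefines `squish`, `del_squish`, and `b` so it runs on its own, and compares the analytic derivative with a central finite difference. `relu_vec` is a hypothetical vectorized variant of `relu`, added here because the scalar `if x < 0` form raises on arrays.

```python
import numpy as np

def squish(x):
    return 1.0 / (1.0 + np.exp(-x))

def del_squish(x):
    s = squish(x)
    return s * (1 - s)

def b(s):
    # expects a sigmoid output, not the raw input
    return s * (1 - s)

def relu_vec(x):
    # hypothetical array-friendly ReLU
    return np.maximum(x, 0)

xs = np.linspace(-5, 5, 11)
eps = 1e-6
numeric = (squish(xs + eps) - squish(xs - eps)) / (2 * eps)  # central difference

same_identity = np.allclose(b(squish(xs)), del_squish(xs))
matches_numeric = np.allclose(del_squish(xs), numeric, atol=1e-8)
print(same_identity, matches_numeric)  # True True
```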


In [134]:
X = np.array([[1,1,1], [1,0,1], [0,1,1], [0,0,1]])
y = np.array([[1],[0],[0],[0]])

class Perceptron: 
    def __init__(self, X, y): 
        self.X = X
        self.y = y
        self.inputs = X.shape[1]
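        # hand-picked (not learned) weights: x1 + x2 - x3, where x3 is the constant bias input 1
        self.weights = np.array([1,1,-1])
        # weighted sums are [1, 0, 0, -1]; relu clips the last one to 0
        self.prediction = [self.relu(x) for x in self.X @ self.weights]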
    
    def relu(self, x): 
        if x < 0: 
            return 0
        else: 
            return x

P = Perceptron(X, y)

P.prediction

[1, 0, 0, 0]
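The class above hard-codes its weights. As a sketch of how the same gate could be learned instead, here is the classic perceptron learning rule with a step activation. The class name `TrainablePerceptron`, the zero initialization, and the learning rate are my assumptions, not part of the original solution.

```python
import numpy as np

X = np.array([[1, 1, 1], [1, 0, 1], [0, 1, 1], [0, 0, 1]])
y = np.array([1, 0, 0, 0])

class TrainablePerceptron:
    def __init__(self, n_inputs, learning_rate=1.0):
        self.weights = np.zeros(n_inputs)
        self.learning_rate = learning_rate

    def predict(self, X):
        # step activation: fire only when the weighted sum is strictly positive
        return (X @ self.weights > 0).astype(int)

    def fit(self, X, y, epochs=20):
        for _ in range(epochs):
            for xi, target in zip(X, y):
                error = target - self.predict(xi)  # -1, 0, or +1
                self.weights += self.learning_rate * error * xi
        return self

p = TrainablePerceptron(n_inputs=X.shape[1]).fit(X, y)
print(p.predict(X))  # [1 0 0 0]
```

Because AND is linearly separable, the rule stops changing the weights after a few passes. The third column acts as the bias input.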

## 3) Implement a Neural Network Multilayer Perceptron class that uses backpropagation to update the network's weights. 
- Your network must have one hidden layer. 
- You do not have to update weights via gradient descent. You can use something like the derivative of the sigmoid function to update weights.
- Train your model on the Heart Disease dataset from UCI:

[Github Dataset](https://github.com/ryanleeallred/datasets/blob/master/heart.csv)

[Raw File on Github](https://raw.githubusercontent.com/ryanleeallred/datasets/master/heart.csv)


In [151]:
def clean(dat): 
    assert dat.isna().sum().sum()==0
    assert all([t.name in ['int64', 'float64'] for t in dat.dtypes])
    print(dat.shape)
    a = StandardScaler().fit_transform(dat.drop('target', axis=1))
    return (pd.DataFrame(data=a, columns=dat.drop('target', axis=1).columns),
           dat.target)

X, y = clean(pd.read_csv("https://raw.githubusercontent.com/ryanleeallred/datasets/master/heart.csv"))

X.head()


(303, 14)




|   | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal |
|---|-----|-----|----|----------|------|-----|---------|---------|-------|---------|-------|----|------|
| 0 | 0.952197 | 0.681005 | 1.973123 | 0.763956 | -0.256334 | 2.394438 | -1.005832 | 0.015443 | -0.696631 | 1.087338 | -2.274579 | -0.714429 | -2.148873 |
| 1 | -1.915313 | 0.681005 | 1.002577 | -0.092738 | 0.072199 | -0.417635 | 0.898962 | 1.633471 | -0.696631 | 2.122573 | -2.274579 | -0.714429 | -0.512922 |
| 2 | -1.474158 | -1.468418 | 0.032031 | -0.092738 | -0.816773 | -0.417635 | -1.005832 | 0.977514 | -0.696631 | 0.310912 | 0.976352 | -0.714429 | -0.512922 |
| 3 | 0.180175 | 0.681005 | 0.032031 | -0.663867 | -0.198357 | -0.417635 | 0.898962 | 1.239897 | -0.696631 | -0.206705 | 0.976352 | -0.714429 | -0.512922 |
| 4 | 0.290464 | -1.468418 | -0.938515 | -0.663867 | 2.08205 | -0.417635 | 0.898962 | 0.583939 | 1.435481 | -0.379244 | 0.976352 | -0.714429 | -0.512922 |
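For reference, the scaling step in `clean` amounts to a per-column z-score. A minimal sketch on made-up numbers (not rows from the heart dataset), using the population standard deviation, which is what `StandardScaler` uses:

```python
import numpy as np

# made-up toy matrix: 4 observations, 2 features
toy = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])

col_mean = toy.mean(axis=0)
col_std = toy.std(axis=0)           # ddof=0, the population standard deviation
scaled = (toy - col_mean) / col_std

# after scaling, every column has mean ~0 and standard deviation ~1
print(np.allclose(scaled.mean(axis=0), 0.0), np.allclose(scaled.std(axis=0), 1.0))
```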


In [152]:
class NeuralNetwork: 
    def __init__(self, X, y): 
        self.X = X.values
        self.y = y.values.reshape(-1,1)
        self.inputs = X.shape[1]
        self.hidden_1 = X.shape[1]
        self.output_nodes = 1
        # init weights: 
        self.L1_weights = np.random.randn(self.inputs, self.hidden_1)
        self.L2_weights = np.random.randn(self.hidden_1, self.output_nodes)
        self.predictions = self.refresh_ff()
        
    def feed_forward(self): 
        ''' matmul to produce predictions ''' 
        hidden_sum_1 = self.X @ self.L1_weights
        self.activated_hidden_1 = np.clip(hidden_sum_1, a_min=0, a_max=None) # relu this layer
        hidden_sum_2 = self.activated_hidden_1 @ self.L2_weights
        self.activated_hidden_2 = squish(hidden_sum_2)
        return self.activated_hidden_2

    def refresh_ff(self): 
        ''' run this when weights are updated '''
        prds = self.feed_forward()
        self.predictions = prds
        return prds
    
    def loss(self): 
        ''' mean squared error '''
        self.refresh_ff()
        n = len(self.y)
        assert len(self.predictions)==n
        #print(sum([Y for Y in (self.predictions - self.y)]))
        return np.divide(sum([Y**2 for Y in (self.predictions - self.y)]), n)
    
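    def back(self): 
        ''' will modify values of Lk_weights '''
        predns = self.refresh_ff()
        # output error scaled by the sigmoid derivative, b(s) = s * (1 - s); shape (n, 1)
        output_error = predns - self.y
        del_output_error = output_error * b(predns)

        # error pushed back to the hidden layer: (n, 1) @ (1, hidden) -> (n, hidden)
        # NOTE: the hidden layer is ReLU, whose derivative is (activated_hidden_1 > 0);
        # this line scales by the sigmoid derivative of the OUTPUT layer instead
        s2_error = del_output_error @ self.L2_weights.T
        del_s2_error = s2_error * b(self.activated_hidden_2)
        
        # computed but never used below
        s1_error = del_s2_error @ self.L1_weights.T
        del_s1_error = s1_error * b(self.activated_hidden_1)
        
        assert self.L1_weights.shape == (X.T @ del_s2_error).shape
        assert self.L2_weights.shape == (self.activated_hidden_1.T @ del_output_error).shape
        # NOTE: these assignments OVERWRITE the weights with the raw gradient terms
        # (using the global X, not self.X) rather than taking a step w -= rate * gradient
        self.L1_weights = X.T @ del_s2_error
        self.L2_weights = self.activated_hidden_1.T @ del_output_error
        pass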
    
def report(N: NeuralNetwork) -> str: 
    s = ''
    ls = {}
    for epoch in range(50): 
        N.back()
        if epoch%11==0 or epoch in [1,2,3]: 
            ls[epoch+1] = N.loss()

    for k,v in ls.items(): 
        s += f"\tepoch {k+1} with MSSE loss {v}\n"
    return s

print(report(NeuralNetwork(X,y)))

#NN = NeuralNetwork(X,y)

# NN = NeuralNetwork(X,y)
# for _ in range(3): 
#     NN.back()

# print(NN.predictions.head())

	epoch 2 with MSSE loss 0.0
	epoch 3 with MSSE loss 0.0
	epoch 4 with MSSE loss 0.0
	epoch 5 with MSSE loss 0.0
	epoch 13 with MSSE loss 0.0
	epoch 24 with MSSE loss 0.0
	epoch 35 with MSSE loss 0.0
	epoch 46 with MSSE loss 0.0



# WARNING: Hypothesis: the loss drops to zero too quickly, so the update is probably numerically unstable!
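One way to probe that hypothesis is to compare against a plain gradient-descent update. The sketch below is not the class above: it runs on synthetic stand-in data (not the heart dataset), and the names `X_demo`, `y_demo`, `learning_rate`, the hidden width, and the weight scale are all assumptions. It uses the ReLU derivative for the hidden layer and subtracts a scaled gradient instead of overwriting the weights.

```python
import numpy as np

rng = np.random.default_rng(42)

# synthetic stand-in data: 200 rows, 5 roughly standardized features
X_demo = rng.normal(size=(200, 5))
y_demo = (X_demo[:, :1] + X_demo[:, 1:2] > 0).astype(float)  # shape (200, 1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

hidden = 8
W1 = rng.normal(scale=0.5, size=(5, hidden))
W2 = rng.normal(scale=0.5, size=(hidden, 1))
learning_rate = 0.5

def forward(X, W1, W2):
    h = np.maximum(X @ W1, 0.0)       # ReLU hidden layer
    return h, sigmoid(h @ W2)         # sigmoid output layer

def mse(pred, y):
    return float(np.mean((pred - y) ** 2))

_, pred = forward(X_demo, W1, W2)
initial_loss = mse(pred, y_demo)

n = len(X_demo)
for _ in range(200):
    h, pred = forward(X_demo, W1, W2)
    # chain rule through the sigmoid (constant factor 2 folded into the learning rate)
    delta_out = (pred - y_demo) * pred * (1 - pred)   # (n, 1)
    # chain rule through the ReLU: gradient flows only where the unit was active
    delta_hidden = (delta_out @ W2.T) * (h > 0)       # (n, hidden)
    W2 -= learning_rate * (h.T @ delta_out) / n
    W1 -= learning_rate * (X_demo.T @ delta_hidden) / n

_, pred = forward(X_demo, W1, W2)
final_loss = mse(pred, y_demo)
print(initial_loss, final_loss)
```

With small averaged steps the loss should shrink gradually rather than collapse to exactly zero after one update.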

## 4) Implement a Multilayer Perceptron architecture of your choosing using the Keras library. Train your model and report its baseline accuracy. Then hyperparameter tune at least two parameters and report your model's accuracy. 

- Use the Heart Disease Dataset (binary classification)
- Use an appropriate loss function for a binary classification task
- Use an appropriate activation function on the final layer of your network. 
- Train your model using verbose output for ease of grading.
- Use GridSearchCV to hyperparameter tune your model. (for at least two hyperparameters)
- When hyperparameter tuning, show your work by adding code cells for each new experiment. 
- Report the accuracy for each combination of hyperparameters as you test them so that we can easily see which resulted in the highest accuracy.
- You must hyperparameter tune at least 5 parameters in order to get a 3 on this section.
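The grid-search requirement can be illustrated without Keras. The sketch below is a stand-in, not `GridSearchCV` itself: it enumerates a small grid with `itertools.product`, fits a tiny logistic regression on synthetic data, and scores each combination on a single hold-out split (`GridSearchCV` would cross-validate instead). The data, the `fit_logistic` and `accuracy` helpers, and the grid values are all assumptions.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

# synthetic stand-in data (not the heart dataset)
X_demo = rng.normal(size=(300, 4))
y_demo = (X_demo @ np.array([1.0, -2.0, 0.5, 0.0]) > 0).astype(float)
X_train, y_train = X_demo[:200], y_demo[:200]
X_val, y_val = X_demo[200:], y_demo[200:]

def fit_logistic(X, y, learning_rate, epochs):
    """Full-batch gradient descent on binary cross-entropy (stand-in model)."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        w -= learning_rate * X.T @ (p - y) / len(y)
    return w

def accuracy(w, X, y):
    # predict class 1 when the linear score is positive
    return float(np.mean(((X @ w) > 0) == (y == 1)))

grid = {"learning_rate": [0.01, 0.1, 1.0], "epochs": [10, 100]}

results = []
for learning_rate, epochs in itertools.product(grid["learning_rate"], grid["epochs"]):
    w = fit_logistic(X_train, y_train, learning_rate, epochs)
    results.append((accuracy(w, X_val, y_val), learning_rate, epochs))

# report every combination so the best one is easy to spot
for acc, learning_rate, epochs in results:
    print(f"learning_rate={learning_rate}, epochs={epochs}: validation accuracy {acc:.3f}")

best = max(results)
```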

In [153]:
X.head()

|   | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal |
|---|-----|-----|----|----------|------|-----|---------|---------|-------|---------|-------|----|------|
| 0 | 0.952197 | 0.681005 | 1.973123 | 0.763956 | -0.256334 | 2.394438 | -1.005832 | 0.015443 | -0.696631 | 1.087338 | -2.274579 | -0.714429 | -2.148873 |
| 1 | -1.915313 | 0.681005 | 1.002577 | -0.092738 | 0.072199 | -0.417635 | 0.898962 | 1.633471 | -0.696631 | 2.122573 | -2.274579 | -0.714429 | -0.512922 |
| 2 | -1.474158 | -1.468418 | 0.032031 | -0.092738 | -0.816773 | -0.417635 | -1.005832 | 0.977514 | -0.696631 | 0.310912 | 0.976352 | -0.714429 | -0.512922 |
| 3 | 0.180175 | 0.681005 | 0.032031 | -0.663867 | -0.198357 | -0.417635 | 0.898962 | 1.239897 | -0.696631 | -0.206705 | 0.976352 | -0.714429 | -0.512922 |
| 4 | 0.290464 | -1.468418 | -0.938515 | -0.663867 | 2.08205 | -0.417635 | 0.898962 | 0.583939 | 1.435481 | -0.379244 | 0.976352 | -0.714429 | -0.512922 |


In [156]:
# Important Hyperparameters
inputs = X.shape[1]
epochs = 22
batch_size = 16

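# Create Model
model = Sequential()
model.add(Dense(64, activation='relu', input_shape=(inputs,)))
model.add(Dense(32, activation='relu'))
model.add(Dense(64, activation='sigmoid'))
# single linear output unit (no activation), so outputs are not constrained to [0, 1]
model.add(Dense(1))
# Compile Model: mean squared error is both the loss and the reported metric
model.compile(optimizer='adam', loss='mse', metrics=['mse'])
# Fit Model, holding out the last 33% of rows for validation
model.fit(X, y, validation_split=0.33, epochs=epochs, batch_size=batch_size, verbose=1)

# returns [loss, metric], i.e. the MSE twice, on the full dataset (training rows included)
model.evaluate(X,y)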

Train on 203 samples, validate on 100 samples
Epoch 1/22
Epoch 2/22
Epoch 3/22
Epoch 4/22
Epoch 5/22
Epoch 6/22
Epoch 7/22
Epoch 8/22
Epoch 9/22
Epoch 10/22
Epoch 11/22
Epoch 12/22
Epoch 13/22
Epoch 14/22
Epoch 15/22
Epoch 16/22
Epoch 17/22
Epoch 18/22
Epoch 19/22
Epoch 20/22
Epoch 21/22
Epoch 22/22


[0.11190128532966764, 0.11190128532966764]
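The two numbers above are the mean squared error twice, not an accuracy. As a sketch of how accuracy and a binary-classification loss relate to predicted probabilities, here is a NumPy-only example on made-up values. The helper names and the 0.5 threshold are assumptions. In Keras, the corresponding settings would be a final `Dense(1, activation='sigmoid')` with `loss='binary_crossentropy'` and `metrics=['accuracy']`.

```python
import numpy as np

def binary_cross_entropy(y_true, p, eps=1e-7):
    # clip so log(0) never occurs
    p = np.clip(p, eps, 1 - eps)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))

def accuracy(y_true, p, threshold=0.5):
    # threshold the probabilities, then count matches with the labels
    return float(np.mean((p >= threshold) == (y_true == 1)))

# made-up labels and predicted probabilities, for illustration only
y_true = np.array([1, 0, 1, 1, 0])
p = np.array([0.9, 0.2, 0.6, 0.4, 0.1])

acc = accuracy(y_true, p)
mse = float(np.mean((p - y_true) ** 2))
bce = binary_cross_entropy(y_true, p)
print(acc)   # 0.8 (four of the five thresholded predictions match)
print(mse, bce)
```

A low MSE does not by itself say how many rows are classified correctly, which is why the task asks for accuracy.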