## Neural Network Architecture

A typical neural network (NN) has three types of layers:
- Input Layer
- Hidden Layers
- Output Layer (the number of nodes corresponds to the number of outputs)

##### Each layer is made up of n individual neurons (aka activation units), and each neuron has its own weights (one per incoming connection) and a bias.

#### What process occurs in a neural network?
The input layer receives the data, with one node per feature in the dataset. Its signal is passed into the next layer (a hidden layer), where each input is multiplied by the weight of its connection and a bias term is added to the weighted sum. That weighted sum plus bias is then activated with an activation function; sigmoid, for example, squishes it into the range (0, 1), which can be read as a probability at the output layer. The activated value is then passed on to the next layer.

#### A signal is passed through the network by:
 - Taking in inputs from the training data (or previous layer)
 - Multiplying each input by its corresponding weight (think arrow/connecting line)
 - Adding a bias to this weighted sum of inputs and weights
 - Activating this weighted sum + bias by squishifying it with sigmoid or some other activation function. For a single perceptron with three inputs, the output of the node is calculated like so:
\begin{align}
 y = \mathrm{sigmoid}\left(\sum_{i=1}^{3} weight_{i}\,input_{i} + bias\right) = \mathrm{sigmoid}(weight_{1}input_{1} + weight_{2}input_{2} + weight_{3}input_{3} + bias)
\end{align}
 - This final activated value is the signal that gets passed on to the next layer of the network.
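As a quick numeric check of the formula above, here is the forward pass of one sigmoid neuron in NumPy. The inputs, weights, and bias are made-up values chosen so the arithmetic is easy to follow; they are not taken from the dataset below.

```python
import numpy as np

def sigmoid(z):
    """Squish any real number into the open interval (0, 1)."""
    return 1 / (1 + np.exp(-z))

# Made-up values for a single perceptron with three inputs
inputs = np.array([1.0, 2.0, 3.0])
weights = np.array([0.5, -0.25, 0.1])
bias = 0.2

# Weighted sum of inputs and weights, plus the bias: 0.5 - 0.5 + 0.3 + 0.2 = 0.5
weighted_sum = np.dot(weights, inputs) + bias

# Activation: this value is the signal passed on to the next layer
y = sigmoid(weighted_sum)
print(weighted_sum, y)
```

Because sigmoid(0.5) is about 0.62, this neuron would send a signal slightly above the midpoint to the next layer.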



#### How to train a neural network:
0. Pick a network architecture
   - No. of input units = No. of features
   - No. of output units = Number of Classes (or expected targets)
   - Select the number of hidden layers and number of neurons within each hidden layer
1. Randomly initialize weights
2. Implement forward propagation to get $h_{\theta}(x^{(i)})$ for any $x^{(i)}$
3. Implement code to compute a cost function $J(\theta)$
4. Implement backpropagation to compute the partial derivatives $\frac{\partial}{\partial\theta_{jk}^{(l)}}J(\theta)$
5. Use gradient descent (or a more advanced optimizer) with backpropagation to minimize $J(\theta)$ as a function of the parameters $\theta$
6. Repeat steps 2 - 5 until the cost function is 'minimized' or some other stopping criterion is met. One pass over steps 2 - 5 is one iteration; when every pass uses the full training set (batch gradient descent), one iteration is also one epoch.
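The steps above can be sketched end to end for the smallest possible network: a single sigmoid output unit with no hidden layer, trained on a made-up OR dataset. The cost function (binary cross-entropy), the learning rate of 0.5, and the fixed number of epochs are assumptions for illustration, since the steps do not pin down a particular $J(\theta)$. With no hidden layer, backpropagation reduces to one gradient, $\partial J/\partial z = (h - y)/m$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up toy data: logical OR of two binary inputs
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [1]], dtype=float)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Step 0: architecture -> 2 input units (features), 1 output unit, no hidden layer
# Step 1: randomly initialize the parameters
W = rng.normal(size=(2, 1))
b = np.zeros((1, 1))
learning_rate = 0.5  # assumed hyperparameter

costs = []
m = X.shape[0]
for epoch in range(2000):
    # Step 2: forward propagation, h_theta(x) for every example
    h = sigmoid(X @ W + b)
    # Step 3: binary cross-entropy cost J(theta); eps keeps log() finite
    eps = 1e-12
    J = -np.mean(y * np.log(h + eps) + (1 - y) * np.log(1 - h + eps))
    costs.append(J)
    # Step 4: gradients; for sigmoid + cross-entropy, dJ/dz simplifies to (h - y) / m
    dz = (h - y) / m
    dW = X.T @ dz
    db = dz.sum(axis=0, keepdims=True)
    # Step 5: gradient descent update
    W -= learning_rate * dW
    b -= learning_rate * db
# Step 6: the stopping criterion here is simply a fixed number of epochs

predictions = (sigmoid(X @ W + b) > 0.5).astype(int)
print(costs[0], costs[-1])
print(predictions.ravel())
```

Since every pass uses all four examples, each loop iteration here is also one epoch.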


# Implement a Perceptron From Scratch

### Data and Imports

In [21]:
import pandas as pd
import numpy as np

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

In [2]:
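# Titanic dataset
url = 'https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv'

df = pd.read_csv(url)
# drop every row that has a missing value in ANY column
df = df.dropna()

# encode the categorical Sex feature as integers
# (named sex_map rather than dict to avoid shadowing the built-in dict)
sex_map = {'female': 0, 'male': 1}
df['Sex'] = df['Sex'].map(sex_map)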
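The `dropna()` call above removes a row if any of its columns is missing, including `Cabin`, which is missing for most passengers and is not one of the model's features; that is why only 183 rows remain. The toy frame below (made-up values) shows the difference between `dropna()` and `dropna(subset=...)`, which only checks the listed columns, followed by the same `Sex` encoding:

```python
import numpy as np
import pandas as pd

# Made-up rows mimicking a few Titanic columns
toy = pd.DataFrame({
    'Sex': ['female', 'male', 'male', 'female'],
    'Age': [38.0, np.nan, 54.0, 4.0],
    'Cabin': ['C85', np.nan, np.nan, 'G6'],
    'Survived': [1, 0, 0, 1],
})

# No arguments: a row is dropped if ANY column is missing (Cabin included)
all_cols = toy.dropna()
# subset=...: only missing values in the listed columns cause a drop
feature_cols = toy.dropna(subset=['Sex', 'Age'])

# Encode Sex as integers, as in the cell above
sex_map = {'female': 0, 'male': 1}
feature_cols = feature_cols.assign(Sex=feature_cols['Sex'].map(sex_map))

print(len(all_cols), len(feature_cols))
print(feature_cols['Sex'].tolist())
```

Switching the notebook to `dropna(subset=features + [target])` would keep more rows, but it would also change the data that the outputs shown in this notebook were computed on.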

In [3]:
print(df.shape)
df.head()

(183, 12)


| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | 0 | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | 0 | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 6 | 7 | 0 | 1 | McCarthy, Mr. Timothy J | 1 | 54.0 | 0 | 0 | 17463 | 51.8625 | E46 | S |
| 10 | 11 | 1 | 3 | Sandstrom, Miss. Marguerite Rut | 0 | 4.0 | 1 | 1 | PP 9549 | 16.7 | G6 | S |
| 11 | 12 | 1 | 1 | Bonnell, Miss. Elizabeth | 0 | 58.0 | 0 | 0 | 113783 | 26.55 | C103 | S |


### Train and test

In [9]:
# split data
train, test = train_test_split(df, test_size=0.2, random_state = 22)

# features and target
features = ['Pclass', 'Sex', 'Age', 'SibSp', 'Parch']
# features = ['Pclass']
target = 'Survived'

# X, y vectors
X_train = train[features]
y_train = train[target]
X_test = test[features]
y_test = test[target]
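The five features go into the network on their raw scales: `Age` spans tens of years while `Sex` is 0 or 1, so weighted sums can get large and push sigmoid outputs toward 0 or 1. A common option, not used in this notebook, is to standardize each feature with the training split's mean and standard deviation. The arrays below are made-up stand-ins for `X_train.values` and `X_test.values`:

```python
import numpy as np

# Made-up feature rows standing in for X_train.values / X_test.values
X_train_raw = np.array([[1, 0, 38.0], [3, 1, 22.0], [1, 1, 54.0], [2, 0, 4.0]])
X_test_raw = np.array([[3, 1, 30.0], [1, 0, 60.0]])

# Statistics come from the training split only, so no test information leaks in
mean = X_train_raw.mean(axis=0)
std = X_train_raw.std(axis=0)
std[std == 0] = 1.0  # guard against constant columns

X_train_scaled = (X_train_raw - mean) / std
X_test_scaled = (X_test_raw - mean) / std

print(X_train_scaled.mean(axis=0).round(6), X_train_scaled.std(axis=0).round(6))
```

`sklearn.preprocessing.StandardScaler` performs the same transformation (fit on the training data, then transform both splits).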

### Perceptron

In [30]:
class NeuralNetwork:
    def __init__(self):
        
        # Set up Architecture of Neural Network
        self.inputs = 5
        self.outputNodes = 1
        
        # np.random.randn draws samples from the standard normal distribution
        # create a 5 x 1 weight matrix (note: this perceptron has no bias term)
        self.weights1 = np.random.randn(self.inputs, self.outputNodes)
        
    def sigmoid(self, s):
        return 1 / (1 + np.exp(-s))
    
    def sigmoidPrime(self, s):
        # derivative of sigmoid, where s is an already-activated value sigmoid(x)
        return s * (1 - s)
    
    def feed_forward(self, X):
        """
        Calculate the NN inference using feed forward. 
        aka 'predict'
        """
        
        # Weighted sum of inputs => output node (a single perceptron has no hidden layer)
        self.weighted_sum = np.dot(X, self.weights1)
        
        
        # Final activation of output
        self.activated_output = self.sigmoid(self.weighted_sum)
        
        return self.activated_output
    
    def predict(self, X):
        predictions = self.feed_forward(X)
        predictions = [1 if i>0.5 else 0 for i in predictions]
        return predictions
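The class above leaves out the bias term described earlier, and `predict()` thresholds with a Python list comprehension. Below is a minimal sketch of a hypothetical variant, `PerceptronWithBias` (not part of the original notebook), that adds a bias and vectorizes the threshold; the comments trace the matrix shapes for the five Titanic features. The input rows are made up.

```python
import numpy as np

class PerceptronWithBias:
    """Hypothetical variant of the perceptron above that includes a bias term."""

    def __init__(self, n_inputs=5, seed=None):
        rng = np.random.default_rng(seed)
        self.weights = rng.standard_normal((n_inputs, 1))  # (5, 1)
        self.bias = np.zeros((1, 1))                       # broadcast over every row

    @staticmethod
    def sigmoid(s):
        return 1 / (1 + np.exp(-s))

    def feed_forward(self, X):
        # (n_samples, 5) @ (5, 1) + (1, 1) -> (n_samples, 1)
        return self.sigmoid(X @ self.weights + self.bias)

    def predict(self, X):
        # Vectorized threshold at 0.5, flattened to a 1-D array of 0/1 labels
        return (self.feed_forward(X) > 0.5).astype(int).ravel()

# Made-up rows in the order Pclass, Sex, Age, SibSp, Parch
X_demo = np.array([[3, 1, 22.0, 1, 0], [1, 0, 38.0, 1, 0]])
p = PerceptronWithBias(seed=0)
print(p.feed_forward(X_demo).shape, p.predict(X_demo))
```

Starting the bias at zero is a common choice; like the weights, it only becomes meaningful after training.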


In [32]:
# Instantiate Neural Network
nn = NeuralNetwork()

# Make predictions (the weights are random and untrained, so this score only reflects the random initialization)
y_pred = nn.predict(X_test.values)

# Test Accuracy
score = accuracy_score(y_test, y_pred)
print('accuracy score:', score)

accuracy score: 0.7297297297297297


### Multi-Layer Perceptron

In [34]:
class NeuralNetwork:
    def __init__(self):
        
        # Set up Architecture of Neural Network
        self.inputs = 5
        self.hiddenNodes = 3
        self.outputNodes = 1
        
        # np.random.randn draws samples from the standard normal distribution
        # create a 5 x 3 matrix (input => hidden weights)
        self.weights1 = np.random.randn(self.inputs, self.hiddenNodes)
        
        # create a 3 x 1 matrix (hidden => output weights)
        self.weights2 = np.random.randn(self.hiddenNodes, self.outputNodes)
        
    def sigmoid(self, s):
        return 1 / (1 + np.exp(-s))
    
    def sigmoidPrime(self, s):
        # derivative of sigmoid, where s is an already-activated value sigmoid(x)
        return s * (1 - s)
    
    def feed_forward(self, X):
        """
        Calculate the NN inference using feed forward. 
        aka 'predict'
        """
        
        # Weighted sum of inputs => hidden layer
        self.weighted_sum = np.dot(X, self.weights1)
        
        # Activation of input => hidden
        self.activated_sum = self.sigmoid(self.weighted_sum)
        
        # Weighted sum of hidden layer => output layer
        self.weighted_sum = np.dot(self.activated_sum, self.weights2)
        
        # Final activation of output - hidden layer => output layer
        self.activated_output = self.sigmoid(self.weighted_sum)
        
        return self.activated_output
    
    def predict(self, X):
        predictions = self.feed_forward(X)
        predictions = [1 if i>0.5 else 0 for i in predictions]
        return predictions
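To see how data flows through the 5 → 3 → 1 architecture above, here is a shape trace with random stand-in data (the arrays are made up, not Titanic rows). Each layer is one matrix product followed by a sigmoid:

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples, n_inputs, n_hidden, n_outputs = 4, 5, 3, 1

X = rng.standard_normal((n_samples, n_inputs))   # stand-in for X_test.values
W1 = rng.standard_normal((n_inputs, n_hidden))   # input => hidden, 5 x 3
W2 = rng.standard_normal((n_hidden, n_outputs))  # hidden => output, 3 x 1

def sigmoid(s):
    return 1 / (1 + np.exp(-s))

hidden = sigmoid(X @ W1)       # (4, 5) @ (5, 3) -> (4, 3)
output = sigmoid(hidden @ W2)  # (4, 3) @ (3, 1) -> (4, 1)
print(hidden.shape, output.shape)
```

Every row of `output` is one sample's probability-like score, which `predict()` then thresholds at 0.5.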

In [35]:
# Instantiate Neural Network
nn = NeuralNetwork()

# Make predictions (the weights are still random and untrained)
y_pred = nn.predict(X_test.values)

# Test Accuracy
score = accuracy_score(y_test, y_pred)
print('accuracy score:', score)

accuracy score: 0.6756756756756757


### Multi-Layer Perceptron w/ Back Propagation

In [36]:
class NeuralNetwork:
    def __init__(self):
        
        # Set up Architecture of Neural Network
        self.inputs = 5
        self.hiddenNodes = 3
        self.outputNodes = 1
        
        # np.random.randn draws samples from the standard normal distribution
        # create a 5 x 3 matrix (input => hidden weights)
        self.weights1 = np.random.randn(self.inputs, self.hiddenNodes)
        
        # create a 3 x 1 matrix (hidden => output weights)
        self.weights2 = np.random.randn(self.hiddenNodes, self.outputNodes)
        
    def sigmoid(self, s):
        return 1 / (1 + np.exp(-s))
    
    def sigmoidPrime(self, s):
        # derivative of sigmoid, where s is an already-activated value sigmoid(x)
        return s * (1 - s)
    
    def feed_forward(self, X):
        """
        Calculate the NN inference using feed forward. 
        aka 'predict'
        """
        
        # Weighted sum of inputs => hidden layer
        self.weighted_sum = np.dot(X, self.weights1)
        
        # Activation of input => hidden (stored as activated_hidden, which backward() reads)
        self.activated_hidden = self.sigmoid(self.weighted_sum)
        
        # Weighted sum of hidden layer => output layer
        self.weighted_sum = np.dot(self.activated_hidden, self.weights2)
        
        # Final activation of output - hidden layer => output layer
        self.activated_output = self.sigmoid(self.weighted_sum)
        
        return self.activated_output
    
    def predict(self, X):
        predictions = self.feed_forward(X)
        predictions = [1 if i>0.5 else 0 for i in predictions]
        return predictions
    
    
    def backward(self, X, y, o):
        """
        Backward propagate through the network and update the weights.
        X: inputs, shape (n_samples, 5)
        y: targets as a column vector, shape (n_samples, 1)
        o: output of feed_forward(X), shape (n_samples, 1)
        """
        
        # Error in output
        self.o_error = y - o
        
        # Apply the derivative of sigmoid to the error:
        # how far off we are, scaled by the slope of the sigmoid at the output
        self.o_delta = self.o_error * self.sigmoidPrime(o)
        
        # z2 error: how much of that "far off" can be attributed to each hidden unit,
        # propagated back through the hidden => output weights
        self.z2_error = self.o_delta.dot(self.weights2.T)
        # Scale by the slope of the sigmoid at the hidden activations
        self.z2_delta = self.z2_error * self.sigmoidPrime(self.activated_hidden)
        
        # Note: updates are applied without a learning rate (equivalent to a step size of 1)
        # Adjustment to first set of weights (input => hidden)
        self.weights1 += X.T.dot(self.z2_delta)
        
        # Adjustment to second set of weights (hidden => output)
        self.weights2 += self.activated_hidden.T.dot(self.o_delta)
        
        
    def train(self, X, y):
        # o is output
        o = self.feed_forward(X)
        self.backward(X,y,o)

In [37]:
# Instantiate Neural Network
nn = NeuralNetwork()

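# Make predictions (note: nn.train() is never called here, so these predictions come from the random initial weights, not from a trained network)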
y_pred = nn.predict(X_test.values)

# Test Accuracy
score = accuracy_score(y_test, y_pred)
print('accuracy score:', score)

accuracy score: 0.7297297297297297
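To get any benefit from `backward()`, the network has to be trained repeatedly before predicting. The sketch below uses a hypothetical compact class, `TinyMLP`, with the same delta rule as `backward()` plus a learning rate (an assumption; the original applies unscaled updates). It trains on made-up OR data, with a constant column of 1s standing in for a bias since neither network has bias terms.

```python
import numpy as np

class TinyMLP:
    """Hypothetical compact version of the backprop network above:
    one sigmoid hidden layer, no bias terms, same delta rule as backward(),
    plus a learning rate (an assumption, not in the original)."""

    def __init__(self, n_inputs, n_hidden, learning_rate=0.5, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.standard_normal((n_inputs, n_hidden))
        self.W2 = rng.standard_normal((n_hidden, 1))
        self.lr = learning_rate

    @staticmethod
    def sigmoid(s):
        return 1 / (1 + np.exp(-s))

    def feed_forward(self, X):
        self.hidden = self.sigmoid(X @ self.W1)     # (n, n_hidden)
        return self.sigmoid(self.hidden @ self.W2)  # (n, 1)

    def train_step(self, X, y):
        """One forward + backward pass; returns the mean squared error before the update."""
        o = self.feed_forward(X)
        o_delta = (y - o) * o * (1 - o)  # output error x sigmoid slope
        h_delta = (o_delta @ self.W2.T) * self.hidden * (1 - self.hidden)
        self.W2 += self.lr * self.hidden.T @ o_delta
        self.W1 += self.lr * X.T @ h_delta
        return float(np.mean((y - o) ** 2))

# Made-up data: logical OR, third column of 1s acts as a bias input
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
y = np.array([[0], [1], [1], [1]], dtype=float)  # column vector, shape (4, 1)

net = TinyMLP(n_inputs=3, n_hidden=3)
losses = [net.train_step(X, y) for _ in range(5000)]
print(f"MSE before: {losses[0]:.4f}, after: {losses[-1]:.4f}")
print((net.feed_forward(X) > 0.5).astype(int).ravel())
```

For the notebook's own network, the equivalent loop would call `nn.train(X_train.values, y_train.values.reshape(-1, 1))` repeatedly before `nn.predict`. The reshape matters: `y_train.values` is 1-D with shape `(n,)` while the output `o` has shape `(n, 1)`, so `y - o` would silently broadcast to an `(n, n)` matrix. With unscaled Titanic features and no learning rate, the raw updates may also be too large to train stably.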


## Activation Functions

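#### What are the different types of activation functions and when are they useful?
- Linear: $f(z) = z$; typically used in the output layer for regression
- Sigmoid: squishes values into (0, 1); used for binary classification outputs
- ReLU: $\max(0, z)$; a common default for hidden layers
- Leaky ReLU: like ReLU, but with a small slope for negative inputs so those neurons still pass a gradient
- Tanh: squishes values into (-1, 1) and is zero-centered
- Softmax: turns a vector of scores into probabilities that sum to 1; used for multi-class outputs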
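Each activation function listed above is a one-liner in NumPy. This sketch is illustrative; the Leaky ReLU slope `alpha=0.01` is an assumed default, and the softmax subtracts the maximum score first, which leaves the result unchanged but avoids overflow in `np.exp`.

```python
import numpy as np

def linear(z):
    return z

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def relu(z):
    return np.maximum(0, z)

def leaky_relu(z, alpha=0.01):
    # alpha: slope for negative inputs (0.01 is an assumed default)
    return np.where(z > 0, z, alpha * z)

def tanh(z):
    return np.tanh(z)

def softmax(z):
    # Subtract the max for numerical stability; the output sums to 1 along the last axis
    shifted = z - np.max(z, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

z = np.array([-2.0, 0.0, 3.0])
for fn in (linear, sigmoid, relu, leaky_relu, tanh, softmax):
    print(f"{fn.__name__:>10}: {np.round(fn(z), 3)}")
```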

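#### What are the different types of loss functions and metrics, and when are they useful?
- binary_crossentropy: loss for binary (0/1) targets, paired with a sigmoid output
- categorical_crossentropy: loss for multi-class targets in one-hot form, paired with a softmax output
- binary_accuracy: a metric rather than a loss; the fraction of binary predictions that are correct after thresholding (at 0.5 by default)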
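The names above match Keras's loss and metric names. The NumPy sketch below shows the underlying formulas only; it is not Keras's implementation, and the clipping constant `EPS` is an assumption to keep `log()` finite. The labels and predicted probabilities are made up.

```python
import numpy as np

EPS = 1e-12  # keeps log() away from 0

def binary_crossentropy(y_true, p):
    p = np.clip(p, EPS, 1 - EPS)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

def categorical_crossentropy(y_onehot, probs):
    probs = np.clip(probs, EPS, 1.0)
    return -np.mean(np.sum(y_onehot * np.log(probs), axis=1))

def binary_accuracy(y_true, p, threshold=0.5):
    # A metric: thresholding is not differentiable, so it is not used as a training loss
    return np.mean((p > threshold).astype(int) == y_true)

# Binary case: 3 of 4 predictions land on the correct side of 0.5
y = np.array([1, 0, 1, 1])
p = np.array([0.9, 0.2, 0.4, 0.8])
print(binary_crossentropy(y, p), binary_accuracy(y, p))

# Multi-class case with one-hot targets and softmax-style probabilities
y_onehot = np.array([[1, 0, 0], [0, 1, 0]])
probs = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
print(categorical_crossentropy(y_onehot, probs))
```

Note how the 0.4 prediction for a positive example costs the most cross-entropy while simply counting as one miss in accuracy.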