# Neural Networks

## *LSDS Unit 4 Sprint 2 Assignment 1*

## Tobias Reaper

---
---

## Define the Following:

You can add images, diagrams, or whatever else you need to make sure you understand the concepts below.

### Input Layer

The input layer is the first layer to receive the input, hence the name. It is sometimes called the visible layer because it is the only part of the network that is directly exposed to (interacts with) the data. It typically has one node per input feature, and those nodes simply pass the feature values on to the next layer.

### Hidden Layer

The intermediate layer or layers between the input layer and the output layer. They are "hidden" because their values are neither supplied by the data (like the inputs) nor reported as results (like the outputs), so an external observer generally never sees them directly.

### Output Layer

As with the above, this layer is aptly named: it is the final layer of neurons, and its outputs are the final output of the network.

### Neuron

An individual cell within a neural network, and the most fundamental unit of such a network, from which the rest of the network's behavior derives. Each neuron computes a weighted sum of its inputs plus a bias, then passes that sum through an activation function that determines whether it fires and, for continuous activation functions, how strongly.

### Weight

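A number attached to each connection between two neurons that scales the signal passing along it. A large positive weight makes the receiving neuron respond strongly to that input, a negative weight inhibits it, and a weight near zero means the input has little influence. Weights are what the network learns during training. They are distinct from the bias, a separate per-neuron term that is added to the weighted sum and shifts the neuron's output regardless of its inputs.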
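
To make the difference between a weight and a bias concrete, here is a minimal NumPy sketch of one neuron's weighted sum. The input values, weights, and bias are made-up numbers chosen only for illustration.

```python
import numpy as np

# Made-up values for one neuron with three incoming connections
x = np.array([1.0, 0.5, -2.0])   # inputs arriving at the neuron
w = np.array([0.4, -1.0, 0.25])  # one weight per connection
b = 0.1                          # a single bias for the whole neuron

# Each weight scales its own input; the bias shifts the total
z = np.dot(w, x) + b
print(z)
```

Doubling `w[0]` doubles only the first input's contribution, whereas changing `b` moves `z` by the same amount whatever the inputs are.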

### Activation Function

The function inside each neuron that determines how much "signal" (information) is passed on to the next layer. Typically, all neurons within the same layer share the same activation function. To relate it to biology, this is what determines whether a neuron fires or not.

The neuron first aggregates all of its inputs into a single number (the weighted sum plus the bias); the activation function then transforms that number into the neuron's single output.

Common activation functions such as sigmoid and tanh squash any real number into a limited range: (0, 1) for sigmoid and (-1, 1) for tanh.
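
A quick way to see the "limited range" point is to evaluate a few activation functions on the same inputs. This sketch also includes the unit step (used by the perceptron) and ReLU for comparison; the helper function names are just for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))      # output always in (0, 1)

def unit_step(z):
    return np.where(z >= 0, 1, 0)    # all-or-nothing: 1 (fire) or 0

def relu(z):
    return np.maximum(0, z)          # 0 for negatives, unbounded above

z = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
results = {
    "sigmoid": sigmoid(z),
    "tanh": np.tanh(z),              # output always in (-1, 1)
    "step": unit_step(z),
    "relu": relu(z),
}
for name, values in results.items():
    print(f"{name:>7}: {np.round(values, 3)}")
```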

### Node Map

A visual representation of the internal architecture of a neural network; i.e. a map of the nodes (neurons) in the network and how they are connected to and share information with one another.

### Perceptron

A perceptron is the simplest feedforward neural network architecture, and one of the simplest and most basic architectures overall. A perceptron is a linear binary classifier: it separates data into two classes with a linear decision boundary, so it can only learn problems that are linearly separable.

It is a single artificial neuron (a single layer) with a unit step activation function. A perceptron connects an arbitrary number of inputs to a single output node.
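
As a sketch of the definition above, a perceptron is just a weighted sum, a bias, and a unit step. The weights (-2, -2) and bias 3 below are hand-picked values (not learned ones) that happen to implement the NAND gate used later in this notebook.

```python
import numpy as np

def perceptron(x, weights, bias):
    """Single perceptron: weighted sum plus bias, then a unit step."""
    return 1 if np.dot(weights, x) + bias > 0 else 0

weights = np.array([-2, -2])  # hand-picked, not trained
bias = 3

for x1, x2 in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(x1, x2, "->", perceptron(np.array([x1, x2]), weights, bias))
```

Because the decision boundary -2·x1 - 2·x2 + 3 = 0 is a straight line, no choice of weights lets this same structure represent XOR.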

---

## Inputs -> Outputs

> Explain the flow of information through a neural network from inputs to outputs.  
> Be sure to include: inputs, weights, bias, and activation functions.  
> How does it all flow from beginning to end?

The input layer is the first layer of neurons to interact with the data, and the only layer to interact with it directly. Every neuron after that performs the same basic computation, and neurons within a layer usually differ only in their weights and bias.

Starting at the first layer that receives the input, each neuron multiplies each of its inputs by the corresponding weight, sums the products, adds its bias, and passes the result into an activation function. The activation function's output is sent either to the next layer or, for the output layer, returned as the final output of the network.

> _From Cassidy_

The activation of the nodes in the input layer is set directly by the data itself: each feature in the dataset corresponds to a particular node in the input layer. Starting after the input, every neuron in the Nth layer is connected to every neuron in the (N+1)th layer. Each connection (between one node and the next) has an associated weight, which determines how important that connection is. Higher weights mean that the activation of the (N+1)th neuron responds more strongly to activation in the Nth. Each neuron after the input layer also has a bias, which gets added to the summation of activations from its predecessor neurons. That way each neuron has a baseline activity regardless of the activation of its predecessors. Each node sums up its inputs and passes the sum through an activation function, which can map it to a particular range (e.g. (0, 1)) or otherwise transform it in a useful way.

The overall flow of information looks like this.  An input neuron activates based on the value of a feature in the dataset.  The activity of that neuron influences all the neurons in the next layer, proportionally to the weights connecting them to the first. Each neuron in the 2nd layer decides on its own output value by considering the contributions from everything in the first layer plus a bias term, all passed through the activation function.  The process then repeats for the next layer. 
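
The flow described above can be sketched as a forward pass in NumPy. The layer sizes (2 inputs, 3 hidden neurons, 1 output), the sigmoid activation, and the random weights are assumptions chosen for illustration; nothing here is trained.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(0)

# Assumed sizes: 2 input features -> 3 hidden neurons -> 1 output neuron
W1, b1 = rng.normal(size=(2, 3)), np.zeros(3)  # input -> hidden weights, hidden biases
W2, b2 = rng.normal(size=(3, 1)), np.zeros(1)  # hidden -> output weights, output bias

x = np.array([[0.0, 1.0]])          # one sample: the input layer's activations

# Each layer: weighted sum of the previous layer's activations, plus bias, then activation
hidden = sigmoid(x @ W1 + b1)       # every input feeds every hidden neuron
output = sigmoid(hidden @ W2 + b2)  # every hidden neuron feeds the output neuron

print(hidden.shape, output.shape)   # (1, 3) (1, 1)
```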

---
---

## Write your own perceptron code that can correctly classify (99.0% accuracy) a NAND gate. 

| x1 | x2 | y |
|----|----|---|
| 0  | 0  | 1 |
| 1  | 0  | 1 |
| 0  | 1  | 1 |
| 1  | 1  | 0 |

In [3]:
# ====== Imports ====== #
import pandas as pd
import numpy as np

import dovpanda

In [4]:
data = { 'x1': [0,1,0,1],
         'x2': [0,0,1,1],
         'y':  [1,1,1,0]
       }

df = pd.DataFrame.from_dict(data).astype('int')

In [6]:
df.head()

|   | x1 | x2 | y |
|---|----|----|---|
| 0 | 0  | 0  | 1 |
| 1 | 1  | 0  | 1 |
| 2 | 0  | 1  | 1 |
| 3 | 1  | 1  | 0 |


In [5]:
# Define the training data and the target vector

inputs = df.drop(columns=["y"])

correct_outputs = df.drop(columns=["x1", "x2"])

print(inputs.shape)
print(correct_outputs.shape)

(4, 2)
(4, 1)


In [7]:
# Sigmoid activation function and its derivative for updating weights

def sigmoid(x):
    """Sigmoid activation function."""
    return 1 / (1 + np.exp(-x))


def sigmoid_derivative(x):
    """Derivative of sigmoid function, for updating weights."""
    sx = sigmoid(x)
    return sx * (1 - sx)

In [8]:
# Initialize random weights for the two inputs
weights = 2 * np.random.random((2, 1)) - 1
weights

array([[ 0.12803007],
       [-0.03577113]])

In [9]:
# Weighted sum of inputs and weights
weighted_sum = np.dot(inputs, weights)
weighted_sum

array([[ 0.        ],
       [ 0.12803007],
       [-0.03577113],
       [ 0.09225894]])

In [10]:
# Get the activated values for the end of first training epoch
# These are the predictions for the first round
activate_output = sigmoid(weighted_sum)
activate_output

array([[0.5       ],
       [0.53196387],
       [0.49105817],
       [0.52304839]])

In [11]:
# Calculate error by taking difference of output and correct output
err = correct_outputs - activate_output
err

|   | y         |
|---|-----------|
| 0 | 0.5       |
| 1 | 0.468036  |
| 2 | 0.508942  |
| 3 | -0.523048 |


In [12]:
# Gradient descent: compute the adjustment for each prediction by multiplying
# the error by the slope of the sigmoid at the weighted sum
adjusted = err * sigmoid_derivative(weighted_sum)
adjusted
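
The update used by the training loops comes from the chain rule. Writing $z$ for the weighted sum, $a = \sigma(z)$ for the activated output, $\eta$ for the learning rate, and assuming a squared-error loss (an assumption; the notebook does not name its loss function):

$$
E = \frac{1}{2}\sum_i \left(y_i - a_i\right)^2, \qquad a_i = \sigma(z_i), \qquad z_i = \mathbf{w}^\top \mathbf{x}_i + b
$$

$$
\frac{\partial E}{\partial \mathbf{w}} = -\sum_i \left(y_i - a_i\right)\sigma'(z_i)\,\mathbf{x}_i, \qquad \sigma'(z) = \sigma(z)\bigl(1 - \sigma(z)\bigr) = a\,(1 - a)
$$

$$
\mathbf{w} \leftarrow \mathbf{w} + \eta \sum_i \left(y_i - a_i\right)\sigma'(z_i)\,\mathbf{x}_i, \qquad b \leftarrow b + \eta \sum_i \left(y_i - a_i\right)\sigma'(z_i)
$$

With $\eta = 1$, the weight update is exactly `np.dot(inputs.T, err * sigmoid_derivative(weighted_sum))`: the error and the slope are multiplied, and the slope is evaluated at the weighted sum $z$ (equivalently, $a(1 - a)$ evaluated at the activated output $a$). The bias update is the same sum without the $\mathbf{x}_i$ factor.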

In [13]:
# ====== FuncZone ====== #
# Put it into a function

def perceptronator(X, y):
    pass

In [16]:
# Loop through the process 10,000 times
X = inputs.to_numpy()           # (4, 2) feature matrix
y = correct_outputs.to_numpy()  # (4, 1) targets
bias = 0.0                      # without a bias, the (0, 0) row is stuck at sigmoid(0) = 0.5

for _ in range(10000):

    # Weighted sum of inputs and weights, plus the bias
    weighted_sum = np.dot(X, weights) + bias

    # === Activate: Engage === #
    # Get the activated values (predictions) for this training epoch
    activated_out = sigmoid(weighted_sum)

    # Calculate the error
    error = y - activated_out

    # Calc the adjustments to be made: error times the sigmoid's slope
    adjust = error * sigmoid_derivative(weighted_sum)

    # Update the weights and the bias
    weights += np.dot(X.T, adjust)
    bias += adjust.sum()

In [17]:
print("Weights after training")
print(weights)

print("Bias after training")
print(bias)

print("Output after training")
print(activated_out)
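
One possible way to fill in the `perceptronator` stub is to wrap the training loop in a function that also learns a bias. This is a sketch under assumptions: the `n_iter` and `seed` parameters, the learning rate of 1, and the 0.5 decision threshold are illustrative choices, not part of the assignment.

```python
import numpy as np

def sigmoid(x):
    """Sigmoid activation function."""
    return 1 / (1 + np.exp(-x))

def perceptronator(X, y, n_iter=10000, seed=0):
    """Train one sigmoid neuron by full-batch gradient descent.

    Assumes X has shape (n_samples, n_features) and y holds 0/1 targets.
    Returns (weights, bias).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    rng = np.random.default_rng(seed)
    weights = 2 * rng.random((X.shape[1], 1)) - 1  # random start in [-1, 1)
    bias = 0.0

    for _ in range(n_iter):
        out = sigmoid(X @ weights + bias)         # forward pass
        adjust = (y - out) * out * (1 - out)      # error times sigmoid slope a(1 - a)
        weights += X.T @ adjust                   # learning rate of 1 (assumption)
        bias += adjust.sum()
    return weights, bias

# NAND gate, same truth table as above
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])
y = np.array([1, 1, 1, 0])

weights, bias = perceptronator(X, y)
preds = (sigmoid(X @ weights + bias) >= 0.5).astype(int).ravel()
print(preds)
```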


---

### Part 2

In [19]:
# Sigmoid activation function and its derivative for updating weights

def sigmoid(x):
    """Sigmoid activation function."""
    return 1 / (1 + np.exp(-x))


def sigmoid_prime(x):
    """Derivative of sigmoid, for updating weights.
    Expects x to already be a sigmoid output, i.e. x = sigmoid(z)."""
    return x * (1 - x)
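
`sigmoid_derivative` (defined earlier) and `sigmoid_prime` compute the same slope from different arguments: the first takes the weighted sum `z`, the second takes the activated output `a = sigmoid(z)`. This self-contained check redefines both, compares them against a central finite-difference estimate, and shows that passing the activated output to `sigmoid_derivative` gives a different, incorrect slope.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_derivative(z):
    """Slope written in terms of the weighted sum z."""
    s = sigmoid(z)
    return s * (1 - s)

def sigmoid_prime(a):
    """Same slope written in terms of the activation a = sigmoid(z)."""
    return a * (1 - a)

z = np.linspace(-4, 4, 9)
a = sigmoid(z)
numeric = (sigmoid(z + 1e-6) - sigmoid(z - 1e-6)) / 2e-6   # central difference

print(np.allclose(sigmoid_derivative(z), sigmoid_prime(a)))  # True
print(np.allclose(sigmoid_derivative(z), numeric))           # True
# Mixing them up applies sigmoid twice and gives the wrong slope
print(np.allclose(sigmoid_derivative(a), sigmoid_prime(a)))  # False
```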

In [20]:
# Initialize random weights for the two inputs
weights = 2 * np.random.random((2, 1)) - 1
weights

array([[ 0.01593195],
       [-0.958399  ]])

In [23]:
# Loop through the process 5,000 times
X = inputs.to_numpy()           # (4, 2) feature matrix
y = correct_outputs.to_numpy()  # (4, 1) targets
bias = 0.0                      # bias term, learned alongside the weights

for _ in range(5000):

    # Weighted sum of inputs and weights, plus the bias
    weighted_sum = np.dot(X, weights) + bias

    # === Activate: Engage === #
    # Get the activated values (predictions) for this training epoch
    activated_out = sigmoid(weighted_sum)

    # Calculate the error
    error = y - activated_out

    # Calc the adjustments: sigmoid_prime expects the *activated* output
    adjust = error * sigmoid_prime(activated_out)

    # Update the weights and the bias
    weights += np.dot(X.T, adjust)
    bias += adjust.sum()

In [24]:
print("Weights after training")
print(weights)

print("Bias after training")
print(bias)

print("Output after training")
print(activated_out)


---
---

## Implement your own Perceptron Class and use it to classify a binary dataset: 

- [The Pima Indians Diabetes dataset](https://raw.githubusercontent.com/ryanleeallred/datasets/master/diabetes.csv) 

You may need to search for others' implementations in order to get inspiration for your own. There are *lots* of perceptron implementations on the internet with varying levels of sophistication and complexity. Whatever your approach, make sure you understand **every** line of your implementation and what its purpose is.

---

Test the class's functionality on the NAND gate defined above.

In [40]:
class Perceptron(object):
    """Single neuron with a sigmoid activation, trained by gradient descent."""

    def __init__(self, n_iter: int = 10):
        self.n_iter = n_iter

    def sigmoid(self, x):
        """Sigmoid activation function."""
        return 1 / (1 + np.exp(-x))

    def sigmoid_derivative(self, x):
        """Derivative of sigmoid at the weighted sum x, for updating weights."""
        sx = self.sigmoid(x)
        return sx * (1 - sx)

    def net_input(self, X):
        """Weighted sum of the inputs plus the bias, which is stored in weights[0]."""
        return np.dot(X, self.weights[1:]) + self.weights[0]

    def fit(self, X, y):
        """
        Fit on the training data.

        Parameters
        ----------
        X : np.array
            Training vectors, X.shape : [#samples, #features]
        y : np.array
            Target values (0 or 1), y.shape : [#samples] or [#samples, 1]
        """
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float).ravel()

        # Start the bias (weights[0]) and the feature weights at zero
        self.weights = np.zeros(1 + X.shape[1])

        for _ in range(self.n_iter):
            # Weighted sum of inputs and weights, plus the bias
            weighted_sum = self.net_input(X)
            # === Activate === #
            activated_out = self.sigmoid(weighted_sum)
            # Calculate the error against the targets passed in as y
            error = y - activated_out
            # Calc the adjustments: error times the sigmoid's slope
            adjust = error * self.sigmoid_derivative(weighted_sum)
            # Update the feature weights and the bias
            self.weights[1:] += np.dot(X.T, adjust)
            self.weights[0] += adjust.sum()
        return self

    def predict(self, X):
        """Return class label after unit step: 1 if the net input is >= 0
        (i.e. the sigmoid output is >= 0.5), else 0."""
        return np.where(self.net_input(X) >= 0.0, 1, 0)

In [31]:
# Set up the dataset as numpy arrays

inputs = np.array([[0,0],
                   [1,0],
                   [0,1],
                   [1,1]])

correct_outputs = np.array([[1],
                           [1],
                           [1],
                           [0]])

In [41]:
# Instantiate and train for more iterations than the default
pn = Perceptron(n_iter=10000)
pn.fit(inputs, correct_outputs)
pn.predict(inputs)

---

Now, time to apply it to the diabetes dataset.

In [4]:
diabetes = pd.read_csv('https://raw.githubusercontent.com/ryanleeallred/datasets/master/diabetes.csv')
diabetes.head()

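|   | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age | Outcome |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | 1 |
| 1 | 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | 0 |
| 2 | 8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 | 1 |
| 3 | 1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 | 0 |
| 4 | 0 | 137 | 40 | 35 | 168 | 43.1 | 2.288 | 33 | 1 |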


Although neural networks can handle non-normalized data, scaling or normalizing your data usually improves your neural network's learning speed. Try to apply the sklearn `MinMaxScaler` or `Normalizer` to your diabetes dataset.

In [10]:
from sklearn.preprocessing import MinMaxScaler, Normalizer

feats = list(diabetes)[:-1]

X = ...
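
Here is a minimal sketch of the two scalers, run on just the five rows shown by `diabetes.head()` so it works without downloading the CSV; on the real data you would pass `diabetes[feats]` instead. The two do different things: `MinMaxScaler` rescales each column (feature) to the range [0, 1], while `Normalizer` rescales each row (sample) to unit length. The names `sample`, `X_minmax`, and `X_norm` are just for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, Normalizer

# Stand-in for the full dataset: the five rows printed by diabetes.head()
sample = pd.DataFrame({
    "Pregnancies": [6, 1, 8, 1, 0],
    "Glucose": [148, 85, 183, 89, 137],
    "BloodPressure": [72, 66, 64, 66, 40],
    "SkinThickness": [35, 29, 0, 23, 35],
    "Insulin": [0, 0, 0, 94, 168],
    "BMI": [33.6, 26.6, 23.3, 28.1, 43.1],
    "DiabetesPedigreeFunction": [0.627, 0.351, 0.672, 0.167, 2.288],
    "Age": [50, 31, 32, 21, 33],
    "Outcome": [1, 0, 1, 0, 1],
})
feats = list(sample)[:-1]  # every column except the Outcome label

# MinMaxScaler: rescale each column (feature) to [0, 1]
X_minmax = MinMaxScaler().fit_transform(sample[feats])

# Normalizer: rescale each row (sample) to unit L2 norm
X_norm = Normalizer().fit_transform(sample[feats])

y = sample["Outcome"].values
print(np.round(X_minmax, 2))
```

For a perceptron, per-feature scaling like `MinMaxScaler` is usually the more natural fit, since it keeps features such as `Glucose` and `DiabetesPedigreeFunction` on comparable scales.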

---

## Diabetes, Part 2

*From Cassidy*

In [None]:
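# Copied from https://medium.com/@thomascountz/19-line-line-by-line-python-perceptron-b6f113b161f3
# This class uses a unit step activation function (output 1 if the weighted
# sum is > 0, else 0) and adjusts the weights by multiplying the error
# (label - prediction) by the learning rate and the input.
# Note: `threshold` is the number of training epochs, not a decision threshold.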
class Perceptron(object):

    def __init__(self, no_of_inputs, threshold=100, learning_rate=0.01):
        self.threshold = threshold
        self.learning_rate = learning_rate
        self.weights = np.zeros(no_of_inputs + 1)
           
    def predict(self, inputs):
        summation = np.dot(inputs, self.weights[1:]) + self.weights[0]
        if summation > 0:
            activation = 1
        else:
            activation = 0            
        return activation

    def train(self, training_inputs, labels):
        for _ in range(self.threshold):
            for inputs, label in zip(training_inputs, labels):
                prediction = self.predict(inputs)
                self.weights[1:] += self.learning_rate * (label - prediction) * inputs
                self.weights[0] += self.learning_rate * (label - prediction)
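
To see the update rule in action, here is one training step traced by hand, using the same weight layout as the class (`weights[0]` is the bias) and the first NAND row. The variable names are illustrative; the arithmetic mirrors `predict` and `train`.

```python
import numpy as np

learning_rate = 0.01
weights = np.zeros(3)           # [bias, w1, w2], same layout as the class
x, label = np.array([0, 0]), 1  # first NAND row: 0 NAND 0 = 1

summation = np.dot(x, weights[1:]) + weights[0]  # 0.0
prediction = 1 if summation > 0 else 0           # 0.0 is not > 0, so predict 0
error = label - prediction                       # 1 - 0 = 1

weights[1:] += learning_rate * error * x  # inputs are 0, so w1 and w2 do not move
weights[0] += learning_rate * error       # bias rises by 0.01
print(weights)
```

A wrong prediction nudges the bias and every weight whose input was nonzero; a correct prediction leaves everything unchanged, which is why training stops changing the weights once all four NAND rows are classified correctly.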

First, we'll test it on the same NAND gate defined above.

In [None]:
inputs = np.array([[0,0],
                   [1,0],
                   [0,1],
                   [1,1]])

correct_outputs = np.array([[1],
                           [1],
                           [1],
                           [0]])

pn = Perceptron(no_of_inputs=2, threshold=100, learning_rate=0.01)
# Flatten the (4, 1) labels to shape (4,) so each label is a scalar
pn.train(inputs, correct_outputs.ravel())

In [None]:
pn.weights

array([ 0.03, -0.01, -0.02])

In [None]:
print("Perceptron's predictions for NAND gate")
for row in inputs:
    print(f'{row[0]} {row[1]} -> {pn.predict(row)}')

Perceptron's predictions for NAND gate
0 0 -> 1
1 0 -> 1
0 1 -> 1
1 1 -> 0


Now for something more complicated, let's try the Pima Indians Diabetes dataset.

In [None]:
import pandas as pd
from sklearn.metrics import accuracy_score
url = 'https://raw.githubusercontent.com/ryanleeallred/datasets/master/diabetes.csv'
df = pd.read_csv(url)
df.head()

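|   | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age | Outcome |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | 1 |
| 1 | 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | 0 |
| 2 | 8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 | 1 |
| 3 | 1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 | 0 |
| 4 | 0 | 137 | 40 | 35 | 168 | 43.1 | 2.288 | 33 | 1 |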


In [None]:
X = df.drop(columns='Outcome').values
y = df.Outcome.values
no_of_inputs = X.shape[1]

In [None]:
# Accuracy after 10 iterations
pn = Perceptron(no_of_inputs=no_of_inputs, threshold=10, learning_rate=0.01)
pn.train(X, y)
y_pred = [pn.predict(row) for row in X]
print(f'weights: {pn.weights}')
print(f'Accuracy: {accuracy_score(y, y_pred)}')

weights: [-2.9      8.97     1.14    -2.85    -1.9      1.63     0.635    0.56346
 -1.2    ]
Accuracy: 0.5885416666666666


In [None]:
# Accuracy after 100 iterations
pn = Perceptron(no_of_inputs=no_of_inputs, threshold=100, learning_rate=0.01)
pn.train(X, y)
y_pred = [pn.predict(row) for row in X]
print(f'weights: {pn.weights}')
print(f'Accuracy: {accuracy_score(y, y_pred)}')

weights: [-28.28     15.3       0.97     -3.48     -2.74      1.59     -0.127
   6.89463  -2.25   ]
Accuracy: 0.6536458333333334


In [None]:
# Accuracy after 1000 iterations
pn = Perceptron(no_of_inputs=no_of_inputs, threshold=1000, learning_rate=0.01)
pn.train(X, y)
y_pred = [pn.predict(row) for row in X]
print(f'weights: {pn.weights}')
print(f'Accuracy: {accuracy_score(y, y_pred)}')

weights: [-237.44      13.58       2.61      -3.4       -2.16       2.52
    2.666     46.19301   -1.95   ]
Accuracy: 0.6171875


In [None]:
# Accuracy after 10000 iterations
pn = Perceptron(no_of_inputs=no_of_inputs, threshold=10000, learning_rate=0.01)
pn.train(X, y)
y_pred = [pn.predict(row) for row in X]
print(f'weights: {pn.weights}')
print(f'Accuracy: {accuracy_score(y, y_pred)}')

weights: [-847.85      12.63       5.72      -3.94      -1.88       1.59
    8.516    124.50169    1.21   ]
Accuracy: 0.6822916666666666


---
---

## Stretch Goals:

- Research "backpropagation" to learn how weights get updated in neural networks (tomorrow's lecture). 
- Implement a multi-layer perceptron. (for non-linearly separable classes)
- Try to implement your own backpropagation algorithm.
- What are the pros and cons of the different activation functions? How should you decide between them for the different layers of a neural network?
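
For the multi-layer perceptron and backpropagation goals, here is a minimal sketch, not a reference implementation: a 2-4-1 network with sigmoid activations trained on XOR by full-batch gradient descent. The hidden-layer size, learning rate, epoch count, random seed, and squared-error loss are all assumptions. XOR is not linearly separable, so a single perceptron cannot learn it, but a hidden layer can.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# XOR truth table
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(42)
n_hidden = 4                                   # assumption: hidden layer size
W1, b1 = rng.normal(size=(2, n_hidden)), np.zeros((1, n_hidden))
W2, b2 = rng.normal(size=(n_hidden, 1)), np.zeros((1, 1))
lr = 1.0                                       # assumption: learning rate

losses = []
for _ in range(10000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    losses.append(np.mean((y - out) ** 2))

    # Backward pass (chain rule), using sigma'(z) = a * (1 - a)
    d_out = (out - y) * out * (1 - out)        # dE/dz at the output layer
    d_h = (d_out @ W2.T) * h * (1 - h)         # dE/dz at the hidden layer

    # Gradient descent step on every weight and bias
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0, keepdims=True)

print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
print((out >= 0.5).astype(int).ravel())
```

The loss should fall over training. A network this small can still stall in a local minimum with an unlucky initialization, so changing the seed or adding hidden neurons is worth trying if the predictions do not come out as 0, 1, 1, 0.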