# Intro to Neural Networks Assignment

## Define the Following:
You can add images, diagrams, or whatever else you need to make sure you understand the concepts below.

### Input Layer:
The first layer of a NN. Each node holds a variable that corresponds directly to one feature in the data. In image recognition, for example, a node could hold the brightness of a particular pixel.

### Hidden Layer:
Internal layers of the NN, connecting input and output layers.  Each node is a function that takes in the values of all the nodes in the previous layer (starting with the input layer) and produces a single number as output.  The node computes a sum of the values of all the nodes in the previous layer, each multiplied by a weight, and adds a bias.  That total gets passed through a squishification (activation) function to produce the output.  Hidden layers don't need to correspond to any recognizable feature of the outside world.

### Output Layer:
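The final layer of the NN.  Each node is a function like the ones in the hidden layers, but its output corresponds to the NN's prediction for a single outcome variable.  Note that the activation function in this layer depends on the task: regression outputs are often left linear (no squashing), while classification outputs typically use a sigmoid or softmax.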

### Neuron:
Each of the nodes in the NN.  They have a structure analogous to that of biological neurons. Neurons read the activation state of a bunch of neurons in the previous layer (each weighted differently), and use that information to produce a single output value.

### Weight:
In a fully connected network, each neuron in one layer is connected to all the neurons in the next layer. The strength of each connection (called the weight) determines how sensitive the value of a neuron in layer N+1 is to the value of a neuron in layer N.

### Activation Function:
Each neuron must aggregate inputs and produce a single output.  The activation function shapes that output to be within useful bounds.  One can use several possible activation functions, but common functions either map the whole number line to a small range (sigmoid to (0, 1), tanh to (-1, 1)) or replace negative numbers with zero (ReLU).  Note that all the nodes in a layer of the NN tend to have the same activation function.
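
As a rough illustration of the ranges described above, here is a minimal NumPy sketch of the three activation functions. The helper names `sigmoid` and `relu` and the sample values in `z` are chosen for this example only; `np.tanh` is NumPy's built-in hyperbolic tangent.

```python
import numpy as np

def sigmoid(x):
    """Map any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    """Pass positive values through unchanged and replace negatives with 0."""
    return np.maximum(0.0, x)

# Sample pre-activation values (arbitrary, for illustration only)
z = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])

print("sigmoid:", sigmoid(z))   # every value lies strictly between 0 and 1
print("tanh:   ", np.tanh(z))   # every value lies strictly between -1 and 1
print("relu:   ", relu(z))      # negatives become 0, positives are unchanged
```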

### Node Map:
A graphical representation of a NN that shows its layers, the node types within them, and how all the nodes connect to each other.

### Perceptron:
The simplest feedforward neural network.  It consists of a single artificial neuron with a unit step function as its activation function.  It connects a series of input nodes to a single output node.  It is also known as a linear binary classifier: it separates data into two classes using a linear decision boundary.

### Bias:
A constant term that gets included in a node's weighted sum of the activations of the nodes in the previous layer.  Bias tells us the baseline level of activation of a neuron.
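
To tie the definitions of neuron, weight, bias, and activation function together, here is a small sketch of a single neuron. The function name `neuron_output` and all numeric values are hypothetical, and sigmoid is just one possible choice of activation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def neuron_output(prev_activations, weights, bias):
    """Weighted sum of the previous layer's activations, plus bias, then activation."""
    z = np.dot(weights, prev_activations) + bias
    return sigmoid(z)

# Hypothetical activations of three neurons in the previous layer
prev_activations = np.array([0.5, 0.1, 0.9])
# One weight per incoming connection, and a single bias for this neuron
weights = np.array([0.4, -0.6, 0.2])
bias = -0.1

a = neuron_output(prev_activations, weights, bias)
print(a)

# With no incoming activation, the output depends on the bias alone
baseline = neuron_output(np.zeros(3), weights, bias)
print(baseline)
```

The second call shows the "baseline level of activation" idea: when every input is zero, the weights have no effect and only the bias determines the output.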

## Inputs -> Outputs

### Explain the flow of information through a neural network from inputs to outputs. Be sure to include: inputs, weights, bias, and activation functions. How does it all flow from beginning to end?

The activation of the nodes in the input layer comes directly from the data: each feature in the dataset corresponds to a particular node in the input layer. Starting after the input, all the neurons in the Nth layer are connected to all the neurons in the (N+1)th layer.  Each connection (between one node and the next) has an associated weight, which determines how important that connection is.  Weights with a larger magnitude mean that the activation of the (N+1)th neuron will respond more strongly to activation in the Nth. Each neuron after the input layer also has a bias, which gets added to the weighted sum of activations from its predecessor neurons. That way each neuron has a baseline activity regardless of the activation of its predecessors. Each node passes this total (weighted inputs plus bias) through an activation function, which can map it to a particular range (e.g. (0, 1)) or otherwise transform it in a useful way.

The overall flow of information looks like this.  An input neuron activates based on the value of a feature in the dataset.  The activity of that neuron influences all the neurons in the next layer, proportionally to the weights connecting them to the first. Each neuron in the 2nd layer decides on its own output value by considering the contributions from everything in the first layer plus a bias term, all passed through the activation function.  The process then repeats for the next layer. 
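
The flow described above can be sketched as a short forward pass. This is only an illustration under assumed settings: the layer sizes (3 inputs, 4 hidden neurons, 1 output), the random weights, the zero biases, and the `forward` helper are all made up for the example, and sigmoid is used in every layer for simplicity.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, layers):
    """Propagate one input vector through a list of (W, b) layers."""
    activation = x
    for W, b in layers:
        # Weighted sum of the previous layer plus bias, then the activation function
        activation = sigmoid(W @ activation + b)
    return activation

rng = np.random.default_rng(0)
layers = [
    (rng.normal(size=(4, 3)), np.zeros(4)),  # hidden layer: 3 inputs -> 4 neurons
    (rng.normal(size=(1, 4)), np.zeros(1)),  # output layer: 4 neurons -> 1 output
]

x = np.array([0.2, 0.7, 0.1])  # one example with three hypothetical features
y_hat = forward(x, layers)
print(y_hat)
```

Each row of a weight matrix `W` holds the incoming weights of one neuron, so `W @ activation` computes every neuron's weighted sum in a single step.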

## Write your own perceptron code that can correctly classify a NAND gate. 

| x1 | x2 | y |
|----|----|---|
| 0  | 0  | 1 |
| 1  | 0  | 1 |
| 0  | 1  | 1 |
| 1  | 1  | 0 |

In [34]:
import numpy as np
# np.random.seed(2)

inputs = np.array([[0,0,1],
                   [1,0,1],
                   [0,1,1],
                   [1,1,1]])

correct_outputs = [[1],
                   [1],
                   [1],
                   [0]]

def sigmoid(x):
    return 1/(1 + np.exp(-x))

def sigmoid_derivative(x):
    return sigmoid(x) * (1 - sigmoid(x))

# Initialize weights, all in the range (-1, 1)
weights = np.random.random((3,1))*2 - 1
weights

array([[-0.84070905],
       [ 0.01049218],
       [-0.86942699]])

In [35]:
for iteration in range(10000):

    # Weighted sum of inputs and weights
    weighted_sum = np.dot(inputs, weights)

    # Activate with sigmoid function
    activated_output = sigmoid(weighted_sum)

    # Calculate Error
    error = correct_outputs - activated_output

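    # Scale the error by the slope of the sigmoid.
    # Caveat: sigmoid_derivative() applies sigmoid to its argument, so passing
    # activated_output (already a sigmoid output) applies sigmoid twice. The
    # factor stays positive, so training still converges, but the exact slope
    # would be sigmoid_derivative(weighted_sum).
    adjustments = error * sigmoid_derivative(activated_output)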

    # Update weights
    weights += np.dot(inputs.T, adjustments)

print('optimized weights after training: ')
print(weights)

print("Output After Training:")
print(activated_output)
print()

optimized weights after training: 
[[-11.83931027]
 [-11.83931027]
 [ 17.80783017]]
Output After Training:
[[0.99999998]
 [0.99744825]
 [0.99744825]
 [0.00281299]]
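
The weight update in the training loop relies on the slope of the sigmoid. As a hedged aside, the sketch below checks numerically that the slope can be computed either from the weighted sum `z` or from the already-activated output `a = sigmoid(z)`. The two helper names are invented for this example and are not part of the notebook code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative_from_z(z):
    """Slope of the sigmoid, given the pre-activation value z."""
    s = sigmoid(z)
    return s * (1.0 - s)

def sigmoid_derivative_from_output(a):
    """Same slope, given a value a that is already sigmoid(z)."""
    return a * (1.0 - a)

z = np.linspace(-4.0, 4.0, 9)
a = sigmoid(z)

# Central finite difference as an independent estimate of the slope
h = 1e-6
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)

print(np.allclose(sigmoid_derivative_from_z(z), numeric))
print(np.allclose(sigmoid_derivative_from_output(a), numeric))
```

Both forms give the same slope as long as each one receives the kind of value it expects: `z` for the first, `sigmoid(z)` for the second.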



## Implement your own Perceptron Class and use it to classify a binary dataset like: 
- [The Pima Indians Diabetes dataset](https://raw.githubusercontent.com/ryanleeallred/datasets/master/diabetes.csv) 
- [Titanic](https://raw.githubusercontent.com/ryanleeallred/datasets/master/titanic.csv)
- [A two-class version of the Iris dataset](https://raw.githubusercontent.com/ryanleeallred/datasets/master/Iris.csv)

You may need to search for others' implementations in order to get inspiration for your own. There are *lots* of perceptron implementations on the internet with varying levels of sophistication and complexity. Whatever your approach, make sure you understand **every** line of your implementation and what its purpose is.

In [39]:
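# Copied from https://medium.com/@thomascountz/19-line-line-by-line-python-perceptron-b6f113b161f3
# This class uses a unit step activation function (output 1 if the weighted sum
# is positive, otherwise 0) and adjusts each weight by adding
# learning_rate * (label - prediction) * input.
# `threshold` is the number of passes over the training data, and
# weights[0] holds the bias.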
class Perceptron(object):

    def __init__(self, no_of_inputs, threshold=100, learning_rate=0.01):
        self.threshold = threshold
        self.learning_rate = learning_rate
        self.weights = np.zeros(no_of_inputs + 1)
           
    def predict(self, inputs):
        summation = np.dot(inputs, self.weights[1:]) + self.weights[0]
        if summation > 0:
            activation = 1
        else:
            activation = 0            
        return activation

    def train(self, training_inputs, labels):
        for _ in range(self.threshold):
            for inputs, label in zip(training_inputs, labels):
                prediction = self.predict(inputs)
                self.weights[1:] += self.learning_rate * (label - prediction) * inputs
                self.weights[0] += self.learning_rate * (label - prediction)

First, we'll test it on the same NAND gate defined above.

In [68]:
inputs = np.array([[0,0],
                   [1,0],
                   [0,1],
                   [1,1]])

# 1-D labels, so each label seen by train() is a plain scalar
correct_outputs = np.array([1, 1, 1, 0])

pn = Perceptron(no_of_inputs=2, threshold=100, learning_rate=0.01)
pn.train(inputs, correct_outputs)

In [70]:
pn.weights

array([ 0.03, -0.01, -0.02])

In [81]:
print("Perceptron's predictions for NAND gate")
for row in inputs:
    print(f'{row[0]} {row[1]} -> {pn.predict(row)}')

Perceptron's predictions for NAND gate
0 0 -> 1
1 0 -> 1
0 1 -> 1
1 1 -> 0


Now for something more complicated, let's try the Pima Indians Diabetes dataset.

In [83]:
import pandas as pd
from sklearn.metrics import accuracy_score
url = 'https://raw.githubusercontent.com/ryanleeallred/datasets/master/diabetes.csv'
df = pd.read_csv(url)
df.head()

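|   | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age | Outcome |
|---|-------------|---------|---------------|---------------|---------|-----|--------------------------|-----|---------|
| 0 | 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | 1 |
| 1 | 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | 0 |
| 2 | 8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 | 1 |
| 3 | 1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 | 0 |
| 4 | 0 | 137 | 40 | 35 | 168 | 43.1 | 2.288 | 33 | 1 |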


In [95]:
# Feature matrix (every column except the label) and label vector
X = df.drop(columns='Outcome').values
y = df.Outcome.values
# One perceptron input per feature column
no_of_inputs = X.shape[1]

In [101]:
# Training-set accuracy after 10 passes over the data
pn = Perceptron(no_of_inputs=no_of_inputs, threshold=10, learning_rate=0.01)
pn.train(X, y)
y_pred = [pn.predict(row) for row in X]
print(f'weights: {pn.weights}')
print(f'Accuracy: {accuracy_score(y, y_pred)}')

weights: [-2.9      8.97     1.14    -2.85    -1.9      1.63     0.635    0.56346
 -1.2    ]
Accuracy: 0.5885416666666666


In [102]:
# Training-set accuracy after 100 passes over the data
pn = Perceptron(no_of_inputs=no_of_inputs, threshold=100, learning_rate=0.01)
pn.train(X, y)
y_pred = [pn.predict(row) for row in X]
print(f'weights: {pn.weights}')
print(f'Accuracy: {accuracy_score(y, y_pred)}')

weights: [-28.28     15.3       0.97     -3.48     -2.74      1.59     -0.127
   6.89463  -2.25   ]
Accuracy: 0.6536458333333334


In [103]:
# Training-set accuracy after 1000 passes over the data
pn = Perceptron(no_of_inputs=no_of_inputs, threshold=1000, learning_rate=0.01)
pn.train(X, y)
y_pred = [pn.predict(row) for row in X]
print(f'weights: {pn.weights}')
print(f'Accuracy: {accuracy_score(y, y_pred)}')

weights: [-237.44      13.58       2.61      -3.4       -2.16       2.52
    2.666     46.19301   -1.95   ]
Accuracy: 0.6171875


In [104]:
# Training-set accuracy after 10000 passes over the data
pn = Perceptron(no_of_inputs=no_of_inputs, threshold=10000, learning_rate=0.01)
pn.train(X, y)
y_pred = [pn.predict(row) for row in X]
print(f'weights: {pn.weights}')
print(f'Accuracy: {accuracy_score(y, y_pred)}')

weights: [-847.85      12.63       5.72      -3.94      -1.88       1.59
    8.516    124.50169    1.21   ]
Accuracy: 0.6822916666666666
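
The accuracies above move up and down (0.59, 0.65, 0.62, 0.68) rather than improving steadily with more passes. Two checks that may help when interpreting numbers like these are the accuracy of always predicting the most common class, and whether the features are on comparable scales. The sketch below is only an illustration: `standardize` and `majority_baseline_accuracy` are hypothetical helpers, and `X_demo` / `y_demo` are small made-up arrays, not the Pima data.

```python
import numpy as np

def standardize(X):
    """Rescale each column to zero mean and unit standard deviation."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant columns
    return (X - mean) / std

def majority_baseline_accuracy(y):
    """Accuracy obtained by always predicting the most common class."""
    counts = np.bincount(y)
    return counts.max() / len(y)

# Made-up stand-in data with columns on very different scales
X_demo = np.array([[1.0, 148.0, 0.627],
                   [0.0,  85.0, 0.351],
                   [8.0, 183.0, 0.672],
                   [1.0,  89.0, 0.167]])
y_demo = np.array([1, 0, 1, 0])

X_std = standardize(X_demo)
print(X_std.mean(axis=0))  # approximately 0 for every column
print(X_std.std(axis=0))   # approximately 1 for every column
print(majority_baseline_accuracy(y_demo))
```

With the real data, one could pass `standardize(X)` to `pn.train` and compare the resulting accuracy against `majority_baseline_accuracy(y)`. Whether that helps here is not something this sketch demonstrates.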


## Stretch Goals:

- Research "backpropagation" to learn how weights get updated in neural networks (tomorrow's lecture). 
- Implement a multi-layer perceptron. (for non-linearly separable classes)
- Try and implement your own backpropagation algorithm.
- What are the pros and cons of the different activation functions? How should you decide between them for the different layers of a neural network?