# Intro to Neural Networks Assignment

## Define the Following:
You can add images, diagrams, or whatever you need to ensure that you understand the concepts below.

### Input Layer:
> The first layer of a neural network. It is composed of a set of numbers, called "neurons" or "nodes" (the inputs, numbered 1 to n), where each number is that neuron's value, or "activation".
### Hidden Layer:
> One or more layers between the input layer and the output layer that determine how the network processes the inputs to produce the output layer. The number of hidden layers and the number of neurons/nodes in each layer vary from network to network.
### Output Layer:
> The last layer of the neural network. Just like the input layer, it has a set of neurons/nodes (1 to n), which return the network's resulting values (i.e., predictions or classifications).
### Neuron:
> The units that make up each layer. The value of a neuron in any layer past the input layer is determined by the weighted sum of the activations of the neurons connected to it. This weighted sum is usually passed through a function that restricts the result to a desired range (e.g., the sigmoid function, which squishes values into the range between 0 and 1).
### Weight:
> One of the two parameters attached to the connection between a neuron and a neuron in the next layer (the other is the bias). A weight is a number that multiplies the activation value of the neuron it is assigned to. Weights are initially assigned arbitrarily and then adjusted, usually through "backpropagation", after the error is calculated at the end of each round ("epoch").
### Activation Function: 
> The function applied to a neuron's weighted sum to produce its activation, so the activations in one layer determine the activations in the next layer. Its output is a measure of how positive the relevant weighted sum is: `activation = f(w1*a1 + w2*a2 + ... + wn*an)`.
### Node Map:
### Perceptron:
> Simplest kind of neural network. A single neuron/node that can take any number of inputs and return a single output. It takes the weighted sum of all inputs and passes it through the activation function before returning an output.
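As a minimal sketch of this definition, the snippet below wires up one perceptron by hand. The weights, the bias value, and the choice of an OR gate are hypothetical and picked only for illustration; the helper names `step` and `perceptron_output` are not from the assignment.

```python
# Hypothetical hand-picked parameters that make one perceptron act as an OR gate
weights = [1.0, 1.0]
bias = -0.5

def step(z):
    """Step activation: output 1 when the weighted sum is non-negative, else 0."""
    return 1 if z >= 0 else 0

def perceptron_output(inputs, weights, bias):
    """Weighted sum of the inputs plus the bias, passed through the step function."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return step(weighted_sum)

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, "->", perceptron_output([x1, x2], weights, bias))
```

Any number of inputs works the same way, as long as there is one weight per input.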
### Bias:
> A number (positive or negative) added to the weighted sum of activations before it passes through the squishing function. The bias tells you how high the weighted sum needs to be before the neuron starts to become meaningfully active. Each neuron has its own bias.
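To make the role of the bias concrete, here is a small sketch of a single sigmoid neuron evaluated with three different biases. The activations, weights, and bias values are assumptions chosen for illustration, and `neuron_activation` is a hypothetical helper name.

```python
import math

def sigmoid(z):
    """Squish any real number into the range (0, 1)."""
    return 1 / (1 + math.exp(-z))

def neuron_activation(activations, weights, bias):
    """Weighted sum of the previous layer's activations, plus bias, through sigmoid."""
    weighted_sum = sum(w * a for w, a in zip(weights, activations))
    return sigmoid(weighted_sum + bias)

# Hypothetical values chosen only for illustration
prev_activations = [0.5, 0.9, 0.1]
weights = [0.8, -0.4, 0.3]

# Same inputs and weights; only the bias changes
for bias in (-2.0, 0.0, 2.0):
    print(bias, round(neuron_activation(prev_activations, weights, bias), 3))
```

With the inputs and weights held fixed, a more negative bias pushes the activation toward 0 and a more positive bias pushes it toward 1.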


## Inputs -> Outputs

### Explain the flow of information through a neural network from inputs to outputs. Be sure to include: inputs, weights, bias, and activation functions. How does it all flow from beginning to end?

#### Your Answer Here

## Write your own perceptron code that can correctly classify a NAND gate. 

| x1 | x2 | y |
|----|----|---|
| 0  | 0  | 1 |
| 1  | 0  | 1 |
| 0  | 1  | 1 |
| 1  | 1  | 0 |

In [1]:
##### Your Code Here #####
import numpy as np
#np.random.seed(1)

inputs = np.array([[0, 0, 1],
                   [1, 0, 1],
                   [0, 1, 1],
                   [1, 1, 0]])

correct_outputs = np.array([[1],
                            [1],
                            [1],
                            [0]])

Sigmoid activation function:

In [2]:
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    return sigmoid(x) * (1 - sigmoid(x))
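One way to sanity-check the derivative formula is to compare it against a numerical slope. The sketch below repeats the two functions so that it runs on its own; the test points and the step size `h` are arbitrary choices.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    return sigmoid(x) * (1 - sigmoid(x))

x = np.array([-2.0, 0.0, 2.0])   # arbitrary test points
h = 1e-5                         # small step for the central difference

# Central-difference estimate of the slope of sigmoid at each point
numeric_slope = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)

print(sigmoid_derivative(x))
print(numeric_slope)
print(np.allclose(sigmoid_derivative(x), numeric_slope))
```

Note that `sigmoid_derivative` as written here expects the raw weighted sum `x`, not a value that has already been passed through `sigmoid`.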

Weights for 3 inputs:

In [3]:
# weights with values between -1 and 1
weights = 2 * np.random.random((3,1)) - 1
weights

array([[ 0.31602553],
       [-0.03893788],
       [-0.31142186]])

Calculate the weighted sum, then put all the steps above together:

In [4]:
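for iteration in range(1000):
    # Weighted sum of the inputs for every row
    weighted_sum = np.dot(inputs, weights)

    # Squash the weighted sum into the range (0, 1)
    activated_output = sigmoid(weighted_sum)

    # How far each output is from its target
    error = correct_outputs - activated_output

    # Scale the error by the sigmoid slope evaluated at the activated output
    adjustments = error * sigmoid_derivative(activated_output)

    # Nudge each weight in proportion to its input and the adjustment
    weights += np.dot(inputs.T, adjustments)

print(f'Optimized Weights After Training: \n{weights}')
print(f'Output After Training: \n{activated_output}')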

Optimized Weights After Training: 
[[-2.65703498]
 [-2.67003806]
 [ 8.28132436]]
Output After Training: 
[[0.99974649]
 [0.99640035]
 [0.99635338]
 [0.00483956]]
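The sigmoid outputs are continuous values rather than class labels. A minimal sketch of turning them into NAND predictions follows, assuming a 0.5 decision threshold (the threshold is an assumption, not something the assignment specifies). The output values are copied from the run above.

```python
import numpy as np

# Output values copied from the training run shown above
activated_output = np.array([[0.99974649],
                             [0.99640035],
                             [0.99635338],
                             [0.00483956]])
# NAND truth-table targets from the assignment
targets = np.array([[1], [1], [1], [0]])

# Assumed decision rule: outputs of at least 0.5 count as class 1
predicted_classes = (activated_output >= 0.5).astype(int)

print(predicted_classes.ravel())
print("All correct:", np.array_equal(predicted_classes, targets))
```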


## Implement your own Perceptron Class and use it to classify a binary dataset like: 
- [The Pima Indians Diabetes dataset](https://raw.githubusercontent.com/ryanleeallred/datasets/master/diabetes.csv) 
- [Titanic](https://raw.githubusercontent.com/ryanleeallred/datasets/master/titanic.csv)
- [A two-class version of the Iris dataset](https://raw.githubusercontent.com/ryanleeallred/datasets/master/Iris.csv)

You may need to search for others' implementations in order to get inspiration for your own. There are *lots* of perceptron implementations on the internet with varying levels of sophistication and complexity. Whatever your approach, make sure you understand **every** line of your implementation and what its purpose is.

## Using Tutorial from `machinelearningmastery.com`
https://machinelearningmastery.com/implement-perceptron-algorithm-scratch-python/

### 1. Making Predictions
The first step is to develop a function that can make predictions.

This will be needed both to evaluate candidate weight values in stochastic gradient descent and, after the model is finalized, to make predictions on test data or new data.

Below is a function named `predict()` that predicts an output value for a row given a set of weights.

The first weight is always the bias as it is standalone and not responsible for a specific input value.

There are two input values (X1 and X2) and three weight values (bias, w1, and w2). The activation equation we have modeled for this problem is:

`activation = (w1 * X1) + (w2 * X2) + bias`

`activation = (0.206 * X1) + (-0.234 * X2) + -0.1`
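As a worked instance of this equation, the sketch below plugs in the first row of the small dataset and the full-precision weights that appear in the next code cell. The variable names are chosen here for illustration.

```python
# Weights as listed in this section: [bias, w1, w2]
bias, w1, w2 = -0.1, 0.20653640140000007, -0.23418117710000003
# First row of the contrived dataset used in this section: X1, X2, expected class
x1, x2, expected = 2.7810836, 2.550537003, 0

# The activation equation, term by term
activation = (w1 * x1) + (w2 * x2) + bias
# Step rule used by predict(): class 1 when the activation is non-negative
prediction = 1.0 if activation >= 0.0 else 0.0

print(f"activation={activation:.4f}, predicted={prediction}, expected={expected}")
```

The activation comes out negative (roughly -0.12), so the step rule predicts class 0, which matches the expected label for that row.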

In [5]:
# Make prediction with weights
def predict(row, weights):
    activation = weights[0]
    for i in range(len(row)-1):
        activation += weights[i + 1] * row[i]
    return 1.0 if activation >= 0.0 else 0.0

# We can contrive a small dataset to test our prediction function:
# test predictions
dataset = [[2.7810836,2.550537003,0],
           [1.465489372,2.362125076,0],
           [3.396561688,4.400293529,0],
           [1.38807019,1.850220317,0],
           [3.06407232,3.005305973,0],
           [7.627531214,2.759262235,1],
           [5.332441248,2.088626775,1],
           [6.922596716,1.77106367,1],
           [8.675418651,-0.242068655,1],
           [7.673756466,3.508563011,1]]

weights = [-0.1, 0.20653640140000007, -0.23418117710000003]

for row in dataset:
    prediction = predict(row, weights)
    print(f"Expected->{row[-1]}, Predicted->{prediction}")

Expected->0, Predicted->0.0
Expected->0, Predicted->0.0
Expected->0, Predicted->0.0
Expected->0, Predicted->0.0
Expected->0, Predicted->0.0
Expected->1, Predicted->1.0
Expected->1, Predicted->1.0
Expected->1, Predicted->1.0
Expected->1, Predicted->1.0
Expected->1, Predicted->1.0


### 2. Training Network Weights
We can estimate the weight values for our training data using "stochastic gradient descent."

Stochastic gradient descent requires two parameters:
- **Learning Rate**: Used to limit the amount each weight is corrected each time it is updated.
- **Epochs**: The number of times to run through the training data while updating the weights.

These, along with the training data, will be the arguments to the function.

There are 3 loops we need to perform in the function:
1. Loop over each epoch.
2. Loop over each row in the training data for an epoch.
3. Loop over each weight and update it for a row in an epoch.

We update each weight for each row in the training data, each epoch.

Weights are updated based on the error the model made. The error is calculated as the difference between the expected output value and the prediction made with the candidate weights.

There is one weight for each input attribute, and these are updated in a consistent way, for example:

`w(t+1) = w(t) + learning_rate * (expected(t) - predicted(t)) * x(t)`

The bias is updated in a similar way, except without an input as it is not associated with a specific input value:

`bias(t+1) = bias(t) + learning_rate * (expected(t) - predicted(t))`
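To see both update rules in action, here is a sketch of a single update step. It assumes all weights start at zero and uses the first row of the small dataset from the prediction example with a learning rate of 0.1.

```python
l_rate = 0.1
weights = [0.0, 0.0, 0.0]            # [bias, w1, w2], assumed to start at zero
row = [2.7810836, 2.550537003, 0]    # [x1, x2, expected]

# Predict with the current weights (step rule: class 1 when activation >= 0)
activation = weights[0] + weights[1] * row[0] + weights[2] * row[1]
predicted = 1.0 if activation >= 0.0 else 0.0
error = row[-1] - predicted

# Bias update: no input term
weights[0] = weights[0] + l_rate * error
# Weight updates: scaled by the matching input value
for i in range(len(row) - 1):
    weights[i + 1] = weights[i + 1] + l_rate * error * row[i]

print(error, weights)
```

With zero weights the activation is 0, so the model predicts 1 while the expected value is 0. The error is therefore -1, and the bias and both weights move in the negative direction.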

Below is a function named **`train_weights()`** that calculates weight values for the dataset using stochastic gradient descent.

You can see that we keep track of the "sum of squared error" (a positive value) each epoch so that we can print out a nice message each outer loop.

In [6]:
# Make prediction with weights
def predict(row, weights):
    activation = weights[0]
    for i in range(len(row)-1):
        activation += weights[i + 1] * row[i]
    return 1.0 if activation >= 0.0 else 0.0

# Estimate Perceptron weights using stochastic gradient descent
def train_weights(train, l_rate, n_epoch):
    weights = [0.0 for i in range(len(train[0]))]
    for epoch in range(n_epoch):
        sum_error = 0.0
        for row in train:
            prediction = predict(row, weights)
            error = row[-1] - prediction
            sum_error += error**2
            weights[0] = weights[0] + l_rate * error
            for i in range(len(row)-1):
                weights[i + 1] = weights[i + 1] + l_rate * error * row[i]
        print(f'>epoch={epoch}, lrate={l_rate}, error={sum_error}')
    return weights

# Calculate weights
dataset = [[2.7810836, 2.550537003, 0],
            [1.465489372, 2.362125076, 0],
            [3.396561688, 4.400293529, 0],
            [1.38807019, 1.850220317, 0],
            [3.06407232, 3.005305973, 0],
            [7.627531214, 2.759262235, 1],
            [5.332441248, 2.088626775, 1],
            [6.922596716, 1.77106367, 1],
            [8.675418651, -0.242068655, 1],
            [7.673756466, 3.508563011, 1]]
l_rate = 0.1
n_epoch = 5
weights = train_weights(dataset, l_rate, n_epoch)
print(weights)

>epoch=0, lrate=0.1, error=2.0
>epoch=1, lrate=0.1, error=1.0
>epoch=2, lrate=0.1, error=0.0
>epoch=3, lrate=0.1, error=0.0
>epoch=4, lrate=0.1, error=0.0
[-0.1, 0.20653640140000007, -0.23418117710000003]


## Modeling the "Pima Diabetes" Dataset

In [7]:
##### Your Code Here #####
import pandas as pd

# Pima Indians Diabetes dataset
url = 'https://raw.githubusercontent.com/ryanleeallred/datasets/master/diabetes.csv'

df = pd.read_csv(url)

In [8]:
df.head()

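|   | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age | Outcome |
|---|-------------|---------|---------------|---------------|---------|-----|--------------------------|-----|---------|
| 0 | 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | 1 |
| 1 | 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | 0 |
| 2 | 8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 | 1 |
| 3 | 1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 | 0 |
| 4 | 0 | 137 | 40 | 35 | 168 | 43.1 | 2.288 | 33 | 1 |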


In [9]:
df.shape

(768, 9)

In [31]:
# Saving df as csv
# df.to_csv("diabetes.csv", index=False)

We will use **k-fold cross-validation** to estimate the performance of the learned model on unseen data. This means that we will construct and evaluate k models and estimate the performance as the mean score across those models. Classification accuracy will be used to evaluate each model. These behaviors are provided in the **cross_validation_split()**, **accuracy_metric()** and **evaluate_algorithm()** helper functions.
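The idea behind the split can be sketched with row indices alone. This is a simplified illustration, not the helper used in the cell below: `k_fold_indices` is a hypothetical name, the seed is arbitrary, and any rows left over after an even split are simply dropped.

```python
from random import Random

def k_fold_indices(n_rows, n_folds, seed=1):
    """Shuffle row indices and deal them into n_folds equally sized folds."""
    rng = Random(seed)
    indices = list(range(n_rows))
    rng.shuffle(indices)
    fold_size = n_rows // n_folds
    return [indices[i * fold_size:(i + 1) * fold_size] for i in range(n_folds)]

# 768 rows matches df.shape above; 3 folds matches n_folds in the cell below
folds = k_fold_indices(n_rows=768, n_folds=3)

# Each fold takes one turn as the test set; the remaining folds form the training set
for held_out, test_idx in enumerate(folds):
    train_idx = [i for j, fold in enumerate(folds) if j != held_out for i in fold]
    print(f"fold {held_out}: train={len(train_idx)} rows, test={len(test_idx)} rows")
```

Every row lands in exactly one test fold, so each of the k models is scored on data it never saw during training.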

We will use the **`predict()`** and **`train_weights()`** functions created in the example above to train the model, and a new **`perceptron()`** function to tie them together.

In [42]:
# Perceptron Algorithm on the Pima Diabetes Dataset
from random import seed
from random import randrange
from csv import reader

# Load a CSV file
def load_csv(filename):
    dataset = list()
    with open(filename, 'r') as file:
        next(file)
        csv_reader = reader(file)
        for row in csv_reader:
            if not row:
                continue
            dataset.append(row)
    return dataset

# Convert string column to float
def str_column_to_float(dataset, column):
    for row in dataset:
        row[column] = float(row[column])
        
# Convert string class column to integer codes
def str_column_to_int(dataset, column):
    class_values = [row[column] for row in dataset]
    unique = set(class_values)
    lookup = dict()
    # Sort so the label-to-integer mapping is the same on every run
    for i, value in enumerate(sorted(unique)):
        lookup[value] = i
    for row in dataset:
        row[column] = lookup[row[column]]
    return lookup
 
# Split a dataset into k folds
def cross_validation_split(dataset, n_folds):
	dataset_split = list()
	dataset_copy = list(dataset)
	fold_size = int(len(dataset) / n_folds)
	for i in range(n_folds):
		fold = list()
		while len(fold) < fold_size:
			index = randrange(len(dataset_copy))
			fold.append(dataset_copy.pop(index))
		dataset_split.append(fold)
	return dataset_split
 
# Calculate accuracy percentage
def accuracy_metric(actual, predicted):
	correct = 0
	for i in range(len(actual)):
		if actual[i] == predicted[i]:
			correct += 1
	return correct / float(len(actual)) * 100.0
 
# Evaluate an algorithm using a cross validation split
def evaluate_algorithm(dataset, algorithm, n_folds, *args):
	folds = cross_validation_split(dataset, n_folds)
	scores = list()
	for fold in folds:
		train_set = list(folds)
		train_set.remove(fold)
		train_set = sum(train_set, [])
		test_set = list()
		for row in fold:
			row_copy = list(row)
			test_set.append(row_copy)
			row_copy[-1] = None
		predicted = algorithm(train_set, test_set, *args)
		actual = [row[-1] for row in fold]
		accuracy = accuracy_metric(actual, predicted)
		scores.append(accuracy)
	return scores
 
# Make a prediction with weights
def predict(row, weights):
	activation = weights[0]
	for i in range(len(row)-1):
		activation += weights[i + 1] * row[i]
	return 1.0 if activation >= 0.0 else 0.0
 
# Estimate Perceptron weights using stochastic gradient descent
def train_weights(train, l_rate, n_epoch):
	weights = [0.0 for i in range(len(train[0]))]
	for epoch in range(n_epoch):
		for row in train:
			prediction = predict(row, weights)
			error = row[-1] - prediction
			weights[0] = weights[0] + l_rate * error
			for i in range(len(row)-1):
				weights[i + 1] = weights[i + 1] + l_rate * error * row[i]
	return weights
 
# Perceptron Algorithm With Stochastic Gradient Descent
def perceptron(train, test, l_rate, n_epoch):
	predictions = list()
	weights = train_weights(train, l_rate, n_epoch)
	for row in test:
		prediction = predict(row, weights)
		predictions.append(prediction)
	return(predictions)
 
# Test the Perceptron algorithm on the Pima Diabetes dataset
seed(1)
# load and prepare data
filename = 'diabetes.csv'
dataset = load_csv(filename)
for i in range(len(dataset[0])-1):
	str_column_to_float(dataset, i)
# convert string class to integers
str_column_to_int(dataset, len(dataset[0])-1)
# evaluate algorithm
n_folds = 3
l_rate = 0.01
n_epoch = 500
scores = evaluate_algorithm(dataset, perceptron, n_folds, l_rate, n_epoch)

print('Scores: %s' % scores)
print('Mean Accuracy: %.3f%%' % (sum(scores)/float(len(scores))))

Scores: [50.390625, 70.703125, 60.546875]
Mean Accuracy: 60.547%


## Stretch Goals:

- Research "backpropagation" to learn how weights get updated in neural networks (tomorrow's lecture). 
- Implement a multi-layer perceptron. (for non-linearly separable classes)
- Try and implement your own backpropagation algorithm.
- What are the pros and cons of the different activation functions? How should you decide between them for the different layers of a neural network?
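As a possible starting point for the last question, the sketch below evaluates three common activation functions on the same inputs. The choice of tanh and ReLU alongside sigmoid, and the sample points, are assumptions made here for illustration.

```python
import numpy as np

def sigmoid(x):
    """Output in (0, 1)."""
    return 1 / (1 + np.exp(-x))

def tanh(x):
    """Output in (-1, 1), centered on zero."""
    return np.tanh(x)

def relu(x):
    """Zero for negative inputs, identity for positive inputs."""
    return np.maximum(0, x)

x = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])   # arbitrary sample points
for name, fn in [("sigmoid", sigmoid), ("tanh", tanh), ("relu", relu)]:
    print(f"{name:8s}", np.round(fn(x), 3))
```

Comparing the printed ranges is one way to start reasoning about which function suits an output layer (for example, a probability-like value between 0 and 1) versus a hidden layer.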