# Intro to Neural Networks Assignment

## Define the Following:
You can add images, diagrams, or whatever else you need to ensure that you understand the concepts below.

### Input Layer:
The Input Layer is what receives input from our dataset. Sometimes it is called the visible layer because it's the only part that is exposed to our data and that our data interacts with directly. Typically node maps are drawn with one input node for each of the different inputs/features/columns of our dataset that will be passed to the network.
### Hidden Layer:
Layers between the input layer and the output layer are called Hidden Layers because they cannot be accessed except through the input layer. They sit inside the network and perform their functions, but we don't interact with them directly. The simplest possible network has a single neuron in the hidden layer that just outputs the value. "Deep Learning", apart from being a big buzzword, simply means that we are using a Neural Network with multiple hidden layers. It is a big part of the renewed hype around ANNs because it allows networks structured in specific ways to accomplish tasks that were previously out of reach (image recognition, for example).
### Output Layer: 
The final layer is called the Output Layer. Its purpose is to output a vector of values in a format suitable for the type of problem that we're trying to address. Typically the output value is modified by an "activation function" to transform it into a format that makes sense for our context. Here are a couple of examples:

NNs applied to a regression problem might have a single output node with no activation function because what we want is an unbounded continuous value.

NNs applied to a binary classification problem might use a sigmoid function as the activation function in order to squash values down to the range 0 to 1 so that they represent a probability. Outputs in this case would represent the probability of predicting the primary class of interest. We can turn this into a class-specific prediction by rounding the outputted sigmoid probability up to 1 or down to 0.

NNs applied to multiclass classification problems might have multiple output nodes in the output layer, one for each class that we're trying to predict. This output layer would probably employ what's called a "softmax function" for accomplishing this. Don't worry about how that activation function works just yet, we'll get to it soon.
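
For readers who want a concrete picture of the three output-layer setups above, here is a minimal NumPy sketch. The helper names (`sigmoid`, `softmax`) and all of the sample values are made up for illustration; they are assumptions, not part of the assignment.

```python
import numpy as np

def sigmoid(z):
    # Squash any real value into (0, 1) so it can be read as a probability
    return 1 / (1 + np.exp(-z))

def softmax(z):
    # Subtract the max for numerical stability; the outputs sum to 1
    exp_z = np.exp(z - np.max(z))
    return exp_z / exp_z.sum()

# Hypothetical raw output-layer values (illustrative numbers only)
regression_output = 3.7                            # regression: used as-is, no activation
binary_prob = sigmoid(0.8)                         # binary classification: one probability
binary_class = int(binary_prob >= 0.5)             # round up to 1 or down to 0
class_probs = softmax(np.array([2.0, 1.0, 0.1]))   # multiclass: one probability per class
predicted_class = int(np.argmax(class_probs))      # pick the most probable class
```

The regression value passes through untouched, the sigmoid output is thresholded at 0.5, and the softmax output is read by taking the most probable class.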
### Neuron: 
![Wikipedia Neuron Diagram](http://www.ryanleeallred.com/wp-content/uploads/2019/03/Screen-Shot-2019-03-31-at-10.19.43-PM.png)

Neural Networks aren't exactly a new technology, but recent breakthroughs have revitalized the area. For example, the "Perceptron", one of the basic building blocks of the technology, was invented in 1957.

Artificial Neural Networks are a computational model inspired by how neural networks in the brain process information. In the brain, electrochemical signals flow from earlier neurons through the dendrites of the cell toward the cell body. If the received signals surpass a certain threshold with a given timing, the neuron fires, sending a large spike of energy down the axon and through the axon terminals to other neurons down the line.

In Artificial Neural Networks, the neurons or "nodes" are similar in that they receive inputs and pass on their signal to the next layer of nodes if a certain threshold is reached, but that's about where the similarities end. Remember that ANNs are not brains. Don't fall into the common trap of assuming that an Artificial Neural Network with as many nodes as the human brain will be just as powerful or just as capable. The goal with ANNs is not to create a realistic model of the brain but to craft robust algorithms and data structures that can model the complex relationships found in data.
### Weight:
Each input has an associated weight (w), which is assigned on the basis of its relative importance to other inputs. The node applies a function f (defined below) to the weighted sum of its inputs to produce its output.
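
As a small illustration of the weighted sum a node computes, here is a hedged NumPy sketch. The input values, weights, and bias are hypothetical numbers chosen for the example.

```python
import numpy as np

# Hypothetical inputs and weights for a single node (illustrative numbers only)
x = np.array([1.0, 0.5, -2.0])   # three input features
w = np.array([0.4, -0.6, 0.1])   # one weight per input
b = 0.2                          # bias term

# Weighted sum: scale each input by its weight, add everything up, add the bias
weighted_sum = np.dot(w, x) + b
```

The node's activation function f would then be applied to `weighted_sum` to give the node's output.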
### Activation Function:
Also known as a transfer function. In Neural Networks, each node has an activation function, and the nodes in a given layer typically share the same one. Activation functions are the piece of neural networks most directly inspired by actual biology: in a biological neuron, the activation decides whether a cell "fires" (is "activated") or not. In Artificial Neural Networks, activation functions decide how much signal to pass on to the next layer, which is why they are sometimes referred to as transfer functions.

Common Activation Functions:

![Activation Functions](http://www.snee.com/bobdc.blog/img/activationfunctions.png)
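
To make the idea of "how much signal to pass on" concrete, here is a minimal sketch of three widely used activation functions (not necessarily the exact set shown in the figure). The function names and sample inputs are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    # Smoothly maps any value into (0, 1)
    return 1 / (1 + np.exp(-z))

def tanh(z):
    # Smoothly maps any value into (-1, 1)
    return np.tanh(z)

def relu(z):
    # Passes positive values through unchanged and blocks negative ones
    return np.maximum(0, z)

# The same weighted sums give different signals under each function
z = np.array([-2.0, 0.0, 2.0])
sigmoid_out = sigmoid(z)
tanh_out = tanh(z)
relu_out = relu(z)
```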
### Node Map:
### Perceptron: 
A perceptron is just a single node or neuron of a neural network with nothing else. It can take any number of inputs and spit out an output.
![Figure 2.1](http://www.ryanleeallred.com/wp-content/uploads/2019/04/Screen-Shot-2019-04-01-at-2.34.58-AM.png)
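
A minimal sketch of a single perceptron with a step activation, assuming hand-picked weights that happen to model an AND gate. The function name `perceptron_output` and the weight values are illustrative choices, not a trained model.

```python
import numpy as np

def perceptron_output(x, w, b):
    # Fire (1) when the weighted sum plus bias is positive, otherwise stay off (0)
    return 1 if np.dot(w, x) + b > 0 else 0

# Hand-picked (not learned) weights and bias that behave like an AND gate
w = np.array([1.0, 1.0])
b = -1.5

and_inputs = [(0, 0), (1, 0), (0, 1), (1, 1)]
and_outputs = [perceptron_output(np.array(x), w, b) for x in and_inputs]
```

Only the input (1, 1) pushes the weighted sum above the threshold, so it is the only case where the perceptron fires.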

## Inputs -> Outputs

### Explain the flow of information through a neural network from inputs to outputs. Be sure to include: inputs, weights, bias, and activation functions. How does it all flow from beginning to end?

#### Input or Visible Layers

The Input Layer is what receives input from our dataset. Sometimes it is called the visible layer because it's the only part that is exposed to our data and that our data interacts with directly. Typically node maps are drawn with one input node for each of the different inputs/features/columns of our dataset that will be passed to the network. 

#### Hidden Layers

Layers between the input layer and the output layer are called Hidden Layers because they cannot be accessed except through the input layer. They sit inside the network and perform their functions, but we don't interact with them directly. The simplest possible network has a single neuron in the hidden layer that just outputs the value. "Deep Learning", apart from being a big buzzword, simply means that we are using a Neural Network with multiple hidden layers. It is a big part of the renewed hype around ANNs because it allows networks structured in specific ways to accomplish tasks that were previously out of reach (image recognition, for example).

#### Output Layers

The final layer is called the Output Layer. Its purpose is to output a vector of values in a format suitable for the type of problem that we're trying to address. Typically the output value is modified by an "activation function" to transform it into a format that makes sense for our context. Here are a couple of examples:

- NNs applied to a regression problem might have a single output node with no activation function because what we want is an unbounded continuous value.

- NNs applied to a binary classification problem might use a sigmoid function as the activation function in order to squash values down to the range 0 to 1 so that they represent a probability. Outputs in this case would represent the probability of predicting the primary class of interest. We can turn this into a class-specific prediction by rounding the outputted sigmoid probability up to 1 or down to 0.

- NNs applied to multiclass classification problems might have multiple output nodes in the output layer, one for each class that we're trying to predict. This output layer would probably employ what's called a "softmax function" for accomplishing this. Don't worry about how that activation function works just yet, we'll get to it soon.
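
Putting the pieces together, the flow from inputs to outputs can be sketched as a forward pass: each layer takes the previous layer's values, computes weighted sums, adds a bias, and applies its activation function. This is a rough sketch only; the layer sizes, the random weights, and the names `W_hidden`, `b_hidden`, `W_output`, and `b_output` are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(0)

# Two hypothetical observations with three features each (the input layer)
X = np.array([[0.5, -1.2, 3.0],
              [1.5,  0.3, -0.7]])

# Randomly initialized weights and zero biases (untrained, for illustration)
W_hidden = rng.normal(size=(3, 4))   # 3 inputs -> 4 hidden nodes
b_hidden = np.zeros(4)
W_output = rng.normal(size=(4, 1))   # 4 hidden nodes -> 1 output node
b_output = np.zeros(1)

# Hidden layer: weighted sum of inputs plus bias, then activation
hidden = sigmoid(X @ W_hidden + b_hidden)

# Output layer: weighted sum of hidden values plus bias, then activation
output = sigmoid(hidden @ W_output + b_output)
```

Each row of `output` is the network's prediction for the matching row of `X`; training would adjust the weights and biases so those predictions move toward the true labels.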

## Write your own perceptron code that can correctly classify a NAND gate. 

| x1 | x2 | y |
|----|----|---|
| 0  | 0  | 1 |
| 1  | 0  | 1 |
| 0  | 1  | 1 |
| 1  | 1  | 0 |

In [1]:
# Establish training data

import numpy as np

inputs = np.array([[0,0,1],
                   [1,0,1], 
                   [0,1,1], 
                   [1,1,1]])

correct_outputs = [[1],
                  [1],
                  [1],
                  [0]]


inputs


array([[0, 0, 1],
       [1, 0, 1],
       [0, 1, 1],
       [1, 1, 1]])

In [2]:
# Sigmoid activation function and its derivative for updating weights

def sigmoid(x):
  return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
  return sigmoid(x) * (1 - sigmoid(x))
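
As a quick sanity check on the derivative formula, the sketch below compares it against a numerical slope estimate (a central finite difference). The test points and step size `h` are arbitrary choices for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    return sigmoid(x) * (1 - sigmoid(x))

# Arbitrary points at which to compare the two slope estimates
x = np.array([-2.0, 0.0, 1.5])
h = 1e-5

# Central finite difference: slope measured directly from the function
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)

# Closed-form derivative from the formula above
analytic = sigmoid_derivative(x)

max_gap = np.max(np.abs(numeric - analytic))
```

If the formula is right, `max_gap` should be tiny, and the slope at 0 should be the sigmoid's maximum slope of 0.25.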

In [3]:
# Initialize random weights for our three inputs

weights = 2 * np.random.random((3,1)) - 1
weights

array([[-0.96291297],
       [ 0.89731878],
       [ 0.32628254]])

In [4]:
#Calculate weighted sum of inputs and weights
weighted_sum = np.dot(inputs, weights)
weighted_sum

array([[ 0.32628254],
       [-0.63663043],
       [ 1.22360132],
       [ 0.26068835]])

In [5]:
# Output the activated value for the end of 1 training epoch

activated_output = sigmoid(weighted_sum)
activated_output

array([[0.58085459],
       [0.34600863],
       [0.77269669],
       [0.56480549]])

In [6]:
#Take difference of output and true values to calculate error

error = correct_outputs - activated_output
error

array([[ 0.41914541],
       [ 0.65399137],
       [ 0.22730331],
       [-0.56480549]])

In [7]:
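# Gradient descent / backpropagation step: scale each error by the sigmoid's slope.
# Note: the exact slope would be sigmoid_derivative(weighted_sum); passing
# activated_output applies the sigmoid a second time, which changes the size of
# each adjustment but not its sign.
adjustments = error * sigmoid_derivative(activated_output)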
adjustments

array([[ 0.09642209],
       [ 0.15870028],
       [ 0.04912155],
       [-0.13051312]])

In [8]:
# Update weights
weights += np.dot(inputs.T, adjustments)
weights

In [9]:
for iteration in range(10000):
  
  # Weighted sum of inputs and weights
  weighted_sum = np.dot(inputs, weights)
  
  # Activate with sigmoid function
  activated_output = sigmoid(weighted_sum)
  
  # Calculate Error
  error = correct_outputs - activated_output
  
  # Calculate weight adjustments with sigmoid_derivative
  adjustments = error * sigmoid_derivative(activated_output)
  
  # Update weights
  weights += np.dot(inputs.T, adjustments)
  
print('optimized weights after training: ')
print(weights)

print("Output After Training:")
print(activated_output)

optimized weights after training: 
[[-11.83979111]
 [-11.83979111]
 [ 17.80855125]]
Output After Training:
[[0.99999998]
 [0.99744886]
 [0.99744886]
 [0.00281232]]
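
To turn the trained outputs into class predictions, we can threshold them at 0.5. The sketch below reuses the optimized weights printed above (copied by hand, so the variable `trained_weights` is an assumption of this example rather than a value pulled from the training loop) and checks the result against the NAND truth table.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# NAND inputs with a constant 1 in the last column acting as the bias input
inputs = np.array([[0, 0, 1],
                   [1, 0, 1],
                   [0, 1, 1],
                   [1, 1, 1]])
expected = np.array([1, 1, 1, 0])

# Weights copied from the training output printed above
trained_weights = np.array([[-11.83979111],
                            [-11.83979111],
                            [ 17.80855125]])

# Forward pass, then round each probability up to 1 or down to 0
probabilities = sigmoid(inputs @ trained_weights)
predictions = (probabilities >= 0.5).astype(int).ravel()
```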


## Implement your own Perceptron Class and use it to classify a binary dataset like: 
- [The Pima Indians Diabetes dataset](https://raw.githubusercontent.com/ryanleeallred/datasets/master/diabetes.csv) 
- [Titanic](https://raw.githubusercontent.com/ryanleeallred/datasets/master/titanic.csv)
- [A two-class version of the Iris dataset](https://raw.githubusercontent.com/ryanleeallred/datasets/master/Iris.csv)

You may need to search for other's implementations in order to get inspiration for your own. There are *lots* of perceptron implementations on the internet with varying levels of sophistication and complexity. Whatever your approach, make sure you understand **every** line of your implementation and what its purpose is.

In [10]:
import pandas as pd
import matplotlib.pyplot as plt

url = 'https://raw.githubusercontent.com/ryanleeallred/datasets/master/diabetes.csv'
df = pd.read_csv(url)
df.head()

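|   | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI  | DiabetesPedigreeFunction | Age | Outcome |
|---|-------------|---------|---------------|---------------|---------|------|--------------------------|-----|---------|
| 0 | 6           | 148     | 72            | 35            | 0       | 33.6 | 0.627                    | 50  | 1       |
| 1 | 1           | 85      | 66            | 29            | 0       | 26.6 | 0.351                    | 31  | 0       |
| 2 | 8           | 183     | 64            | 0             | 0       | 23.3 | 0.672                    | 32  | 1       |
| 3 | 1           | 89      | 66            | 23            | 94      | 28.1 | 0.167                    | 21  | 0       |
| 4 | 0           | 137     | 40            | 35            | 168     | 43.1 | 2.288                    | 33  | 1       |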


In [11]:
X = np.array(df[['Glucose', 'BloodPressure']])
y = np.array(df['Outcome'])


In [12]:
# Adapted from https://machinelearningmastery.com/implement-perceptron-algorithm-scratch-python/

from random import seed
from random import randrange

# Split a dataset (a list of rows) into n_folds folds of equal size
def cross_validation_split(dataset, n_folds):
	dataset_split = list()
	dataset_copy = list(dataset)
	fold_size = int(len(dataset) / n_folds)
	for i in range(n_folds):
		fold = list()
		while len(fold) < fold_size:
			index = randrange(len(dataset_copy))
			fold.append(dataset_copy.pop(index))
		dataset_split.append(fold)
	return dataset_split
 
# Calculate accuracy percentage
def accuracy_metric(actual, predicted):
	correct = 0
	for i in range(len(actual)):
		if actual[i] == predicted[i]:
			correct += 1
	return correct / float(len(actual)) * 100.0
 
# Evaluate an algorithm using a cross validation split
def evaluate_algorithm(dataset, algorithm, n_folds, *args):
	folds = cross_validation_split(dataset, n_folds)
	scores = list()
	for fold in folds:
		# Train on every fold except the current one
		train_set = list(folds)
		train_set.remove(fold)
		train_set = sum(train_set, [])
		# Hide the label of each test row before predicting
		test_set = list()
		for row in fold:
			row_copy = list(row)
			test_set.append(row_copy)
			row_copy[-1] = None
		predicted = algorithm(train_set, test_set, *args)
		actual = [row[-1] for row in fold]
		accuracy = accuracy_metric(actual, predicted)
		scores.append(accuracy)
	return scores
 
# Make a prediction with weights (weights[0] is the bias)
def predict(row, weights):
	activation = weights[0]
	for i in range(len(row)-1):
		activation += weights[i + 1] * row[i]
	return 1.0 if activation >= 0.0 else 0.0
 
# Estimate Perceptron weights using stochastic gradient descent
def train_weights(train, l_rate, n_epoch):
	weights = [0.0 for i in range(len(train[0]))]
	for epoch in range(n_epoch):
		for row in train:
			prediction = predict(row, weights)
			error = row[-1] - prediction
			weights[0] = weights[0] + l_rate * error
			for i in range(len(row)-1):
				weights[i + 1] = weights[i + 1] + l_rate * error * row[i]
	return weights
 
# Perceptron Algorithm With Stochastic Gradient Descent
def perceptron(train, test, l_rate, n_epoch):
	predictions = list()
	weights = train_weights(train, l_rate, n_epoch)
	for row in test:
		prediction = predict(row, weights)
		predictions.append(prediction)
	return(predictions)
 

# Prepare data: reuse the DataFrame loaded above as a list of rows,
# with the 'Outcome' label as the last value in each row
seed(1)
dataset = df.values.tolist()
# evaluate algorithm
n_folds = 5
l_rate = 0.01
n_epoch = 500
scores = evaluate_algorithm(dataset, perceptron, n_folds, l_rate, n_epoch)
print('Scores: %s' % scores)
print('Mean Accuracy: %.3f%%' % (sum(scores)/float(len(scores))))
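
The plotting cell below calls a `Perceptron` class with a `fit` method and an `errors` attribute, but no such class is defined in this notebook. Here is one minimal sketch of what it could look like, assuming a step activation, per-sample weight updates, and labels coded as 0 and 1. The attribute and parameter names (`rate`, `niter`, `weight`) are choices made for this sketch, and the toy AND data is only there to show the class running.

```python
import numpy as np

class Perceptron:
    """Minimal perceptron sketch: step activation, per-sample weight updates."""

    def __init__(self, rate=0.01, niter=10):
        self.rate = rate      # learning rate
        self.niter = niter    # number of passes over the training data

    def fit(self, X, y):
        # weight[0] is the bias; weight[1:] holds one weight per feature
        self.weight = np.zeros(1 + X.shape[1])
        # Number of misclassified samples in each epoch
        self.errors = []
        for _ in range(self.niter):
            err = 0
            for xi, target in zip(X, y):
                # delta is zero whenever the prediction is already correct
                delta = self.rate * (target - self.predict(xi))
                self.weight[1:] += delta * xi
                self.weight[0] += delta
                err += int(delta != 0.0)
            self.errors.append(err)
        return self

    def net_input(self, X):
        # Weighted sum of the inputs plus the bias
        return np.dot(X, self.weight[1:]) + self.weight[0]

    def predict(self, X):
        # Step activation: class 1 when the net input is non-negative
        return np.where(self.net_input(X) >= 0.0, 1, 0)

# Toy usage on a linearly separable AND gate (illustrative data only)
X_toy = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_toy = np.array([0, 0, 0, 1])
toy_model = Perceptron(rate=1.0, niter=20).fit(X_toy, y_toy)
toy_predictions = toy_model.predict(X_toy)
```

On real data such as the diabetes features, the classes are unlikely to be linearly separable, so the misclassification count in `errors` may never reach zero.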

In [None]:
pn = Perceptron(0.1, 100)
pn.fit(X, y)
plt.plot(range(1, len(pn.errors) + 1), pn.errors, marker='o')
plt.xlabel('Epochs')
plt.ylabel('Number of misclassifications')
plt.show()

## Stretch Goals:

- Research "backpropagation" to learn how weights get updated in neural networks (tomorrow's lecture). 
- Implement a multi-layer perceptron. (for non-linearly separable classes)
- Try to implement your own backpropagation algorithm.
- What are the pros and cons of the different activation functions? How should you decide between them for the different layers of a neural network?
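
As a starting point for the multi-layer perceptron stretch goal, here is a small sketch showing why a hidden layer helps with classes that are not linearly separable. It builds XOR from two hidden perceptrons (NAND and OR) feeding one output perceptron (AND). All weights are hand-picked for illustration rather than learned, and the variable names are assumptions of this example.

```python
import numpy as np

def step(z):
    # Step activation: 1 when the weighted sum is non-negative, else 0
    return (z >= 0).astype(int)

X = np.array([[0, 0],
              [1, 0],
              [0, 1],
              [1, 1]])

# Hidden layer: column 0 behaves like NAND, column 1 like OR (hand-picked weights)
W_hidden = np.array([[-1.0, 1.0],
                     [-1.0, 1.0]])
b_hidden = np.array([1.5, -0.5])

# Output layer: AND of the two hidden units
W_output = np.array([[1.0],
                     [1.0]])
b_output = np.array([-1.5])

hidden = step(X @ W_hidden + b_hidden)
xor_output = step(hidden @ W_output + b_output).ravel()
```

No single perceptron can produce this output pattern, but stacking two layers can. A trained multi-layer network would find weights like these through backpropagation instead of having them set by hand.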