# Intro to Neural Networks Assignment

## Define the Following:
You can add image, diagrams, whatever you need to ensure that you understand the concepts below.

### Input Layer: 
The Input Layer is what receives input from our dataset. Sometimes it is called the visible layer because it's the only part that is exposed to our data and that our data interacts with directly. Typically node maps are drawn with one input node for each of the different inputs/features/columns of our dataset that will be passed to the network.

### Hidden Layer:
Layers after the input layer are called Hidden Layers. This is because they cannot be accessed except through the input layer. They're inside of the network and they perform their functions, but we don't directly interact with them. The simplest possible network is to have a single neuron in the hidden layer that just outputs the value.

### Output Layer:
The final layer is called the Output Layer. The purpose of the output layer is to output a vector of values that is in a format that is suitable for the type of problem that we're trying to address. 

### Neuron:
"Nodes" that receive inputs and pass on their signal to the next layer of nodes if a certain threshold is reached.

### Weight:
Refers to a strength or amplitude of a connection between two nodes (the amount of influence the firing of one neuron has on another). Also, the importance of an input.

### Activation Function:
In Neural Networks, each node has an activation function. Each node in a given layer typically has the same activation function. These activation functions are the biggest piece of neural networks that have been inspired by actual biology. The activation function decides whether a cell "fires" or not. Sometimes it is said that the cell is "activated" or not. In Artificial Neural Networks activation functions decide how much signal to pass onto the next layer. This is why they are sometimes referred to as transfer functions because they determine how much signal is transferred to the next layer.

### Node Map:
A visual diagram of the architecture or "topology" of the neural network. It's kind of like a flow chart in that it shows the path from inputs to outputs. They are usually color coded and help us understand at a very high level, some of the differences in architecture between kinds of neural networks.

### Perceptron:
A perceptron is just a single node or neuron of a neural network with nothing else. It can take any number of inputs and spit out an output. What a neuron does is it takes each of the input values, multplies each of them by a weight, sums all of these products up, and then passes the sum through what is called an "activation function" the result of which is the final value.

## Inputs -> Outputs

### Explain the flow of information through a neural network from inputs to outputs. Be sure to include: inputs, weights, bias, and activation functions. How does it all flow from beginning to end?

The inputs x are multiplied by their weights w. All the values summed up, and activation function is applied to the sum. The resulting value is an input to the next layer, or is output.

## Write your own perceptron code that can correctly classify a NAND gate. 

| x1 | x2 | y |
|----|----|---|
| 0  | 0  | 1 |
| 1  | 0  | 1 |
| 0  | 1  | 1 |
| 1  | 1  | 0 |

In [56]:
# Wx+b >= 0 ==> 1

# Wx+b <0 ==> 0

import numpy as np
inputs = np.array([[1,0,0], 
                  [1,1,0], 
                  [1,0,1], 
                  [1,1,1]])

correct_outputs = [[1], 
                    [1], 
                    [1], 
                    [0]]

weights = 2 * np.random.random((3,1)) - 1
weights

array([[-0.09202501],
       [-0.36177774],
       [ 0.52527697]])

In [57]:
def sigmoid(x):
  return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
  return sigmoid(x) * (1 - sigmoid(x))

In [58]:
for iteration in range(10000):
  
    # Weighted sum of inputs and weights
    weighted_sum = np.dot(inputs, weights)

    # Activate with sigmoid function
    activated_output = sigmoid(weighted_sum)

    # Calculate Error
    error = correct_outputs - activated_output

    # Calculate weight adjustments with sigmoid_derivative
    adjustments = error * sigmoid_derivative(activated_output)

    # Update weights
    weights += np.dot(inputs.T, adjustments)
    
  
print('optimized weights after training: ')
print(weights)

print("Output After Training:")
print(activated_output)

optimized weights after training: 
[[ 17.80769421]
 [-11.83921961]
 [-11.83921961]]
Output After Training:
[[0.99999998]
 [0.99744813]
 [0.99744813]
 [0.00281312]]


## Implement your own Perceptron Class and use it to classify a binary dataset like: 
- [The Pima Indians Diabetes dataset](https://raw.githubusercontent.com/ryanleeallred/datasets/master/diabetes.csv) 
- [Titanic](https://raw.githubusercontent.com/ryanleeallred/datasets/master/titanic.csv)
- [A two-class version of the Iris dataset](https://raw.githubusercontent.com/ryanleeallred/datasets/master/Iris.csv)

You may need to search for other's implementations in order to get inspiration for your own. There are *lots* of perceptron implementations on the internet with varying levels of sophistication and complexity. Whatever your approach, make sure you understand **every** line of your implementation and what its purpose is.

In [10]:
import pandas as pd

df = pd.read_csv("https://raw.githubusercontent.com/ryanleeallred/datasets/master/diabetes.csv")
df.head()

Unnamed: 0,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
0,6,148,72,35,0,33.6,0.627,50,1
1,1,85,66,29,0,26.6,0.351,31,0
2,8,183,64,0,0,23.3,0.672,32,1
3,1,89,66,23,94,28.1,0.167,21,0
4,0,137,40,35,168,43.1,2.288,33,1


In [11]:
df.isna().sum()

Pregnancies                 0
Glucose                     0
BloodPressure               0
SkinThickness               0
Insulin                     0
BMI                         0
DiabetesPedigreeFunction    0
Age                         0
Outcome                     0
dtype: int64

In [13]:
y = df['Outcome']
y.head()

0    1
1    0
2    1
3    0
4    1
Name: Outcome, dtype: int64

In [108]:
X = df.drop(columns=['Outcome','SkinThickness','DiabetesPedigreeFunction','BloodPressure','Pregnancies'])
X.head()

Unnamed: 0,Glucose,Insulin,BMI,Age
0,148,0,33.6,50
1,85,0,26.6,31
2,183,0,23.3,32
3,89,94,28.1,21
4,137,168,43.1,33


In [109]:
class PerceptronClassifier():
    """
    Basic perceptron class for binary classification
    """
    def __init__(self, learning_rate=0.1, n_iter=100, tolerance=0.01):
        self.learning_rate = learning_rate
        self.n_iter = n_iter
        self.tolerance = tolerance
    
    def fit(self, X, y):
        """
        Fit perceptron to a set of training data using gradient descent
        """
        # initialize weights and cost list
        self.weights_ = np.random.uniform(-0.01, 0.01, X.shape[1] + 1)
        self.costs_ = []
        # iterate until fit is adequate
        for i in range(self.n_iter):
            preds = self.predict_proba(X)
            errors = preds - y
            cost = np.sum(errors ** 2)
            self.costs_.append(cost)
            gradient = np.dot(X.T, errors)
            self.weights_[1:] -= self.learning_rate * gradient
            self.weights_[0] -= np.mean(errors)
            
            # break the loop if we are close enough
            if cost < self.tolerance:
                break
            
        return self
    
    def predict_proba(self, X):
        """
        Computes sigmoid output value given X
        """
        return 1. / (1. + np.exp(-(np.dot(X, self.weights_[1:]) + self.weights_[0])))
    
    def predict(self, X):
        """
        Predicts the binary class of X values
        """
        return np.where(self.predict_proba(X)>=0.5, 1, 0)
    
    def show_loss(self):
        """
        Shows loss along epochs
        """
        try:
            iters = range(len(self.costs_))
            fig, ax = plt.subplots()
            ax.plot(iters, self.costs_)
            ax.set_xlabel('Number of Iterations')
            ax.set_ylabel('Training Loss (SSE)')
            ax.set_title('Training Loss')
            plt.show()
        except:
            print ('Please train me first :)')

In [123]:
from sklearn.metrics import accuracy_score
from sklearn.metrics import roc_auc_score

ppn = PerceptronClassifier(learning_rate = .01, n_iter=10000)
ppn.fit(X, y)
train_predictions = ppn.predict(X)
y_pred_proba = ppn.predict_proba(X)

print(f'Train Accuracy: {accuracy_score(y, train_predictions)}')
print('Validation ROC AUC:', roc_auc_score(y, y_pred_proba))

Train Accuracy: 0.63671875
Validation ROC AUC: 0.5217686567164179


## Stretch Goals:

- Research "backpropagation" to learn how weights get updated in neural networks (tomorrow's lecture). 
- Implement a multi-layer perceptron. (for non-linearly separable classes)
- Try and implement your own backpropagation algorithm.
- What are the pros and cons of the different activation functions? How should you decide between them for the different layers of a neural network?