<img align="left" src="https://lever-client-logos.s3.amazonaws.com/864372b1-534c-480e-acd5-9711f850815c-1524247202159.png" width=200>
<br></br>
<br></br>

# Neural Networks

## *Data Science Unit 4 Sprint 2 Assignment 1*

## Define the Following:
You can add images, diagrams, or whatever else you need to ensure that you understand the concepts below.

### Input Layer:
The input layer is composed of artificial input neurons. This input brings the initial data into the system for further processing by subsequent layers of artificial neurons. The input layer is the very beginning of the workflow for the artificial neural network.

### Hidden Layer:
A hidden layer sits between the input layer and the output layer. Each of its artificial neurons takes in a set of weighted inputs plus a bias and produces an output through an activation function. Most of the network's computation happens in the hidden layers.

### Output Layer:
The output layer produces the network's final result for the given inputs after they have been processed by the hidden layers.

### Neuron:
A neuron is a mathematical function that models the functioning of a biological neuron. Typically, a neuron computes the weighted sum of its inputs plus a bias, and this sum is passed through a nonlinear function, called the activation function, such as the sigmoid.
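
As a minimal sketch of this definition, a single neuron can be written as a weighted sum plus a bias, passed through a sigmoid. The helper name `neuron` and the example numbers are assumptions made for illustration; they are not part of the assignment.

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron(inputs, weights, bias):
    """Hypothetical helper: weighted sum of the inputs plus a bias, then sigmoid."""
    z = np.dot(inputs, weights) + bias
    return sigmoid(z)

# Example values chosen only for illustration
x = np.array([1.0, 0.0])
w = np.array([0.5, -0.5])
b = 0.0

activation = neuron(x, w, b)   # weighted sum is 0.5, so this is sigmoid(0.5)
print(activation)
```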

### Weight:
A weight represents the strength of the connection between two units. It scales the input value, increasing or decreasing that input's influence on the neuron's output.

### Activation Function:
The activation function transforms the summed weighted input of a node into that node's activation (its output) for that input.
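
To make the idea concrete, the sketch below applies three common activation functions to the same summed inputs. The step and sigmoid functions appear later in this notebook; ReLU is added here only for comparison and is an assumption, not something the assignment requires.

```python
import numpy as np

def step(z):
    """Unit step: 1 when the summed input is non-negative, otherwise 0."""
    return np.where(z >= 0, 1, 0)

def sigmoid(z):
    """Smooth, S-shaped curve with outputs strictly between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    """Rectified linear unit: passes positive values, zeroes out the rest."""
    return np.maximum(0, z)

# Illustrative summed weighted inputs
z = np.array([-2.0, 0.0, 2.0])

step_out = step(z)         # hard 0/1 decisions
sigmoid_out = sigmoid(z)   # graded values, 0.5 at z = 0
relu_out = relu(z)         # unbounded above, zero below

print(step_out, sigmoid_out, relu_out)
```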

### Node Map:
A node map is a visual representation of the structure of a neural network's layers.

### Perceptron:
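A perceptron is a neural network with a single layer: one neuron that takes a weighted sum of its inputs plus a bias and applies an activation function to produce its output.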

## Inputs -> Outputs

### Explain the flow of information through a neural network from inputs to outputs. Be sure to include: inputs, weights, bias, and activation functions. How does it all flow from beginning to end?

#### The data is fed into the neural network through the nodes of the input layer. In each hidden layer, the incoming values are transformed by computing the dot product between the inputs and the weights and then adding a bias value. The results are passed through an activation function and forwarded to the next layer, until the output layer presents the final result.
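
The flow described above can be sketched as a forward pass through a small network. The layer sizes (2 inputs, 3 hidden units, 1 output), the random weights, and the variable names are assumptions chosen for illustration; nothing here is trained.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Four samples with two features each
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1]], dtype=float)

# Hypothetical parameters: 2 inputs -> 3 hidden units -> 1 output
W_hidden = rng.normal(size=(2, 3))
b_hidden = np.zeros(3)
W_out = rng.normal(size=(3, 1))
b_out = np.zeros(1)

# Hidden layer: dot product with the weights, add the bias, then activate
hidden = sigmoid(X @ W_hidden + b_hidden)   # shape (4, 3)

# Output layer: the same three steps applied to the hidden activations
output = sigmoid(hidden @ W_out + b_out)    # shape (4, 1)

print(hidden.shape, output.shape)
```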

In [1]:
# libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

## Write your own perceptron code that can correctly classify (99.0% accuracy) a NAND gate. 

| x1 | x2 | y |
|----|----|---|
| 0  | 0  | 1 |
| 1  | 0  | 1 |
| 0  | 1  | 1 |
| 1  | 1  | 0 |
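
Before training anything, it can help to confirm that a single neuron is able to represent this table at all. The sketch below uses hand-picked weights and a bias (an assumption for illustration, not learned values) with a simple threshold at zero.

```python
import numpy as np

X = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])
y = np.array([1, 1, 1, 0])

# Hand-picked, not learned: the sum stays positive unless both inputs are 1
weights = np.array([-2, -2])
bias = 3

z = X @ weights + bias               # weighted sum plus bias for each row
predictions = (z > 0).astype(int)    # threshold at zero
accuracy = np.mean(predictions == y)

print(z, predictions, accuracy)
```

Because such weights exist, the NAND gate is linearly separable, which is the condition under which a single perceptron can classify it perfectly.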

In [246]:
data = { 'x1': [0,1,0,1],
         'x2': [0,0,1,1],
         'y':  [1,1,1,0]
       }

df = pd.DataFrame.from_dict(data).astype('int')

np.random.seed(42)

inputs = np.array([
    [0,0],
    [1,0],
    [0,1],
    [1,1]
])

correct_outputs = np.array([[1], [1], [1], [0]])

X=inputs
y=correct_outputs

In [247]:
X.shape

(4, 2)

In [248]:
y.shape

(4, 1)

In [359]:
class Perceptron(object):
    """
    Perceptron classifier with a sigmoid activation.
    
    Parameters
    -----------
    n_iter: int
        Passes (epochs) over the training set.
    
    Attributes
    -----------
    weight: 2d-array, shape = [n_features, 1]
        Weights after fitting.
    bias: float
        Bias after fitting.
    errors: list
        Number of misclassifications in every epoch.
    """
    @staticmethod
    def sigmoid(x):
        return 1 / (1 + np.exp(-x))

    @staticmethod
    def sigmoid_derivative(x):
        # A static method has no implicit access to the class,
        # so the sibling method is called through the class name.
        sx = Perceptron.sigmoid(x)
        return sx * (1 - sx)
    
    def __init__(self, n_iter=10):
        self.n_iter = n_iter
    
    def fit(self, X, y):
        """
        Fit method for training data.
        
        Parameters
        ---------
        X: {array-like}, shape = [n_samples, n_features]   # rows by columns
            Training vectors, where 'n_samples' is the number of samples
            and 'n_features' is the number of features.
        y: {array-like}, shape = [n_samples, 1]
            Target values.
            
        Returns
        ---------
        self: object
        
        """
        
        # bias
        self.bias = np.random.random()
        
        # weights: one per feature, drawn from [-1, 1)
        self.weight = 2 * np.random.random((X.shape[1], 1)) - 1
        
        # number of misclassifications per epoch
        self.errors = []
        
        for _ in range(self.n_iter):   # one pass over the data per epoch
            
            # weighted sum of inputs and weights, plus the bias
            weighted_sum = np.dot(X, self.weight) + self.bias
            
            # activate
            activated_output = self.sigmoid(weighted_sum)
            
            # calculate the error
            error = y - activated_output
            
            # count the samples whose rounded output misses the target
            self.errors.append(int(np.sum(np.round(activated_output) != y)))
            
            # update: error scaled by the sigmoid's slope at the weighted sum
            adjust = error * self.sigmoid_derivative(weighted_sum)
            self.bias += np.sum(adjust)
            self.weight += np.dot(X.T, adjust)
        
        # return only after every epoch has run
        return self
    
    def predict(self, X):
        """
        Return predicted value.
        """
        weighted_sum = np.dot(X, self.weight) + self.bias
        activated_output = self.sigmoid(weighted_sum)
        return np.round(activated_output)

In [360]:
# use more epochs than the default so the updates have time to settle
nand_gate = Perceptron(n_iter=1000)

In [361]:
nand_gate.fit(X, y)

<__main__.Perceptron at 0x1a77611b6d8>

In [362]:
nand_gate.predict(X)

array([[1.],
       [1.],
       [1.],
       [0.]])

## Implement your own Perceptron Class and use it to classify a binary dataset: 
- [The Pima Indians Diabetes dataset](https://raw.githubusercontent.com/ryanleeallred/datasets/master/diabetes.csv) 

You may need to search for others' implementations in order to get inspiration for your own. There are *lots* of perceptron implementations on the internet with varying levels of sophistication and complexity. Whatever your approach, make sure you understand **every** line of your implementation and what its purpose is.

In [416]:
diabetes = pd.read_csv('https://raw.githubusercontent.com/ryanleeallred/datasets/master/diabetes.csv')
diabetes.head()

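|   | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI  | DiabetesPedigreeFunction | Age | Outcome |
|---|-------------|---------|---------------|---------------|---------|------|--------------------------|-----|---------|
| 0 | 6           | 148     | 72            | 35            | 0       | 33.6 | 0.627                    | 50  | 1       |
| 1 | 1           | 85      | 66            | 29            | 0       | 26.6 | 0.351                    | 31  | 0       |
| 2 | 8           | 183     | 64            | 0             | 0       | 23.3 | 0.672                    | 32  | 1       |
| 3 | 1           | 89      | 66            | 23            | 94      | 28.1 | 0.167                    | 21  | 0       |
| 4 | 0           | 137     | 40            | 35            | 168     | 43.1 | 2.288                    | 33  | 1       |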


Although neural networks can handle non-normalized data, scaling or normalizing your data will improve your neural network's learning speed. Try to apply the sklearn `MinMaxScaler` or `Normalizer` to your diabetes dataset. 
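
A minimal sketch of the scaling step, assuming scikit-learn is installed. `X_demo` is a hypothetical stand-in that copies three columns (Pregnancies, Glucose, BMI) from the first three rows shown above, so the example runs without downloading the dataset.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical stand-in for the feature matrix: a few values from the table above
X_demo = np.array([[6.0, 148.0, 33.6],
                   [1.0,  85.0, 26.6],
                   [8.0, 183.0, 23.3]])

# MinMaxScaler rescales each column independently to the range [0, 1]
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X_demo)

print(X_scaled)
```

With the real data, the same two lines would be applied to the full feature matrix before calling `fit`. Any new data passed to `predict` would then need `scaler.transform` so that it is scaled the same way.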

In [417]:
from sklearn.preprocessing import MinMaxScaler, Normalizer

feats = list(diabetes)[:-1]

X = diabetes[
             ['Pregnancies',
              'Glucose',
              'BloodPressure',
              'SkinThickness',
              'Insulin',
              'BMI',
              'DiabetesPedigreeFunction',
              'Age']
            ].values
y = diabetes[['Outcome']].values

In [418]:
X.shape

(768, 8)

In [419]:
X

array([[  6.   , 148.   ,  72.   , ...,  33.6  ,   0.627,  50.   ],
       [  1.   ,  85.   ,  66.   , ...,  26.6  ,   0.351,  31.   ],
       [  8.   , 183.   ,  64.   , ...,  23.3  ,   0.672,  32.   ],
       ...,
       [  5.   , 121.   ,  72.   , ...,  26.2  ,   0.245,  30.   ],
       [  1.   , 126.   ,  60.   , ...,  30.1  ,   0.349,  47.   ],
       [  1.   ,  93.   ,  70.   , ...,  30.4  ,   0.315,  23.   ]])

In [420]:
##### Update this Class #####

class Perceptron(object):
    """
    Perceptron classifier.
    
    Parameters
    -----------
    eta: float
        learning rate (between 0.0 and 1.0)
    n_iter: int
        Passes (epochs) over the training set.
    
    Attributes
    -----------
    weights: 1d-array
        Weights after fitting; index 0 holds the bias.
    errors: list
        Number of misclassifications in every epoch.
    """
    def __init__(self, eta=0.01, n_iter=10):
        self.eta = eta
        self.n_iter = n_iter

    def fit(self, X, y):
        """Fit training data
        X : Training vectors, X.shape : [#samples, #features]
        y : Target values, y.shape : [#samples] or [#samples, 1]
        """
        # Flatten a column vector so that each target is a scalar
        y = np.ravel(y)

        # Initialize the weights to zero; index 0 is the bias
        self.weights = np.zeros(1 + X.shape[1])
        self.errors = []

        for _ in range(self.n_iter):
            err = 0
            for xi, target in zip(X, y):
                # Perceptron rule: the update is zero when the prediction is correct
                delta_w = self.eta * (target - self.predict(xi))
                self.weights[1:] += delta_w * xi
                self.weights[0] += delta_w
                err += int(delta_w != 0.0)
            self.errors.append(err)
        return self

    def net_input(self, X):
        """Calculate net input"""
        return np.dot(X, self.weights[1:]) + self.weights[0]

    def predict(self, X):
        """Return class label after a unit step with a 0.5 threshold"""
        return np.where(self.net_input(X) >= 0.5, 1, 0)

In [421]:
model = Perceptron(n_iter = 1000)
model.fit(X,y)

<__main__.Perceptron at 0x1a7781a04e0>

In [431]:
model.predict(X)

array([0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0,
       1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1,
       1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0,
       0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0,
       1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1,
       1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1,
       1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0,
       0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0,
       1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0,
       1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0,
       0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1,
       1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1,

In [432]:
model.predict(X[1])

array(0)

In [433]:
from sklearn.metrics import accuracy_score

# accuracy_score expects the true labels first, then the predictions
accuracy_score(y, model.predict(X))

0.69921875
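
For context on what an accuracy figure means, the sketch below computes accuracy by hand and compares it with the score obtained by always predicting the most common class. The labels and predictions are hypothetical values made up for illustration, not taken from the diabetes data.

```python
import numpy as np

# Hypothetical labels and predictions, for illustration only
y_true = np.array([1, 0, 1, 0, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 0, 0, 1, 1, 0])

# Accuracy is the fraction of predictions that match the true labels
accuracy = np.mean(y_true == y_pred)

# Baseline: always predict whichever class occurs most often
majority_class = np.bincount(y_true).argmax()
baseline = np.mean(y_true == majority_class)

print(accuracy, baseline)
```

A model is only informative to the extent that it beats this baseline, so it is worth computing both numbers on the real labels.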

## Stretch Goals:

- Research "backpropagation" to learn how weights get updated in neural networks (tomorrow's lecture). 
- Implement a multi-layer perceptron. (for non-linearly separable classes)
- Try to implement your own backpropagation algorithm.
- What are the pros and cons of the different activation functions? How should you decide between them for the different layers of a neural network?