<img align="left" src="https://lever-client-logos.s3.amazonaws.com/864372b1-534c-480e-acd5-9711f850815c-1524247202159.png" width=200>
<br></br>
<br></br>

# Neural Networks

## *Data Science Unit 4 Sprint 2 Assignment 1*

## Define the Following:
You can add images, diagrams, or whatever you need to ensure that you understand the concepts below.

### Input Layer:

Also known as the visible layer, this is the first layer in the neural network and the only part exposed directly to the data. It typically has one node per feature (column) in the dataset. It receives the input values and passes them on to the next layer without applying any operations to them, so it has no weights or biases associated with it.

### Hidden Layer:

Hidden layers sit inside the network, between the input and output layers. They are not interacted with directly and can only be reached through the input layer. Each hidden layer is a collection of neurons (nodes), usually drawn as a vertical stack, that apply transformations to the data they receive. When every neuron in a hidden layer is connected to every neuron in the next layer, the layer is fully connected.

### Output Layer:

The final layer in the network, which receives its input from the last hidden layer. Its activation function transforms the result so that the output is in the same context as the target, e.g. a value between 0 and 1 for a binary target.

### Neuron:

The basic unit of a neural network, also called a node. Each value (signal) arriving at a neuron is multiplied by a weight; the neuron sums these weighted inputs and activates when the sum reaches a threshold. A neuron with 4 inputs has 4 weights, which are adjusted during training.

### Weight:

The value an input is multiplied by, representing the strength of the connection between two units. A weight scales its input up or down and so decides how much influence that input has on the output: if the weight from neuron 1 to neuron 2 has a larger magnitude, neuron 1 has more influence over neuron 2. A weight near zero means changing that input will barely change the output. A negative weight means increasing that input will decrease the output.
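The effect of each weight can be checked numerically. The following is a minimal sketch, assuming a hypothetical neuron with 4 inputs; the values and the names `weighted_sum`, `example_inputs`, and `example_weights` are made up for illustration and do not come from the assignment.

```python
import numpy as np

def weighted_sum(inputs, weights, bias=0.0):
    """Multiply each input by its weight, add the products, then add the bias."""
    return float(np.dot(inputs, weights) + bias)

# Hypothetical values: one weight per input.
example_inputs = np.array([1.0, 1.0, 1.0, 1.0])
example_weights = np.array([2.0, 0.001, -1.5, 0.5])

base = weighted_sum(example_inputs, example_weights)

# Increase each input by 1 in turn and record how much the sum changes.
changes = []
for i in range(len(example_inputs)):
    bumped = example_inputs.copy()
    bumped[i] += 1.0
    changes.append(weighted_sum(bumped, example_weights) - base)

print(changes)
```

Each change equals the corresponding weight (up to floating-point rounding): the large weight moves the sum the most, the near-zero weight barely moves it, and the negative weight moves it down.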

### Activation Function:

A function applied to a neuron's weighted sum to introduce non-linearity into the network. Many activation functions squash values into a smaller range; for example, the sigmoid squashes values into the range 0 to 1. Many activation functions are used in deep learning, and ReLU, SELU, and tanh are often preferred over the sigmoid for hidden layers.
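To compare the output ranges, here is a small sketch of three common activation functions applied to the same values. The sample values in `x` and the helper name `relu` are assumptions for illustration.

```python
import numpy as np

def sigmoid(x):
    """Squash values into the open range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    """Pass positive values through unchanged and clip negatives to 0."""
    return np.maximum(0.0, x)

x = np.array([-5.0, 0.0, 5.0])

sigmoid_out = sigmoid(x)
tanh_out = np.tanh(x)   # squashes values into the open range (-1, 1)
relu_out = relu(x)

print("sigmoid:", sigmoid_out)
print("tanh:   ", tanh_out)
print("relu:   ", relu_out)
```

Note that ReLU does not squash large positive values at all, so "squashing" describes sigmoid and tanh rather than every activation function.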

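### Node Map:

A diagram of a neural network's nodes and the connections between them. It shows how the network acts as a function, or "mapping", from inputs to outputs.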


### Perceptron:

A type of artificial neuron that takes several binary inputs x1, x2, …, xn and produces a single binary output: 1 if the weighted sum of the inputs exceeds a threshold, otherwise 0.
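As a concrete illustration, a perceptron with hand-picked (not learned) parameters can compute NAND. This is a sketch only: the weights of -2 and bias of 3 are one choice among many that work, and `step_perceptron` is a hypothetical helper name.

```python
import numpy as np

def step_perceptron(x, weights, bias):
    """Return 1 if the weighted sum plus bias is positive, else 0."""
    return int(np.dot(x, weights) + bias > 0)

# Hand-picked parameters (an assumption for illustration, not trained values).
nand_weights = np.array([-2.0, -2.0])
nand_bias = 3.0

nand_outputs = [
    step_perceptron(np.array(x), nand_weights, nand_bias)
    for x in [(0, 0), (1, 0), (0, 1), (1, 1)]
]
print(nand_outputs)  # [1, 1, 1, 0]
```

Only the input (1, 1) pushes the weighted sum below zero (-2 - 2 + 3 = -1), so it is the only case that outputs 0.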


## Inputs -> Outputs

### Explain the flow of information through a neural network from inputs to outputs. Be sure to include: inputs, weights, bias, and activation functions. How does it all flow from beginning to end?

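Each input value is passed from the input layer to the neurons of the first hidden layer. Each neuron multiplies its inputs by their weights, sums the results, adds a bias, and passes that sum through an activation function, which determines whether (and how strongly) the neuron 'fires'. The activations become the inputs to the next layer, and this repeats until the output layer, which returns a value in the same context as the target.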
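The flow described above can be sketched as a forward pass through one hidden layer. Everything here is an assumption for illustration: the layer sizes (3 inputs, 4 hidden neurons, 1 output), the random weights, the sample values, and the choice of sigmoid for both layers.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)

# Hypothetical network: 3 input features -> 4 hidden neurons -> 1 output.
X = np.array([[0.5, -1.2, 3.0]])      # one sample with three features
W_hidden = rng.normal(size=(3, 4))    # weights from input to hidden layer
b_hidden = np.zeros(4)                # one bias per hidden neuron
W_out = rng.normal(size=(4, 1))       # weights from hidden to output layer
b_out = np.zeros(1)                   # bias of the output neuron

# Weighted sum plus bias, then activation, one layer at a time.
hidden = sigmoid(X @ W_hidden + b_hidden)
output = sigmoid(hidden @ W_out + b_out)

print(hidden.shape, output.shape)  # (1, 4) (1, 1)
```

The weights are random and untrained, so the output value itself is meaningless; the point is the order of operations from inputs to output.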

## Write your own perceptron code that can correctly classify a NAND gate. 

| x1 | x2 | y |
|----|----|---|
| 0  | 0  | 1 |
| 1  | 0  | 1 |
| 0  | 1  | 1 |
| 1  | 1  | 0 |

In [9]:
import numpy as np

np.random.seed(747)

#add column of ones as bias
inputs = np.array([
    [0,0,1],
    [1,0,1],
    [0,1,1],
    [1,1,1]
])

correct_outputs = [[1], [1], [1], [0]]

In [2]:
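# sigmoid activation and its derivative
def sigmoid(x):
    """Squash x into the range (0, 1)."""
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    """Derivative of sigmoid at x, where x is the weighted sum (pre-activation)."""
    sx = sigmoid(x)
    return sx * (1 - sx)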

In [4]:
# random starting weights for each column
weights = 2 * np.random.random((3,1)) - 1
weights

array([[-0.06388669],
       [ 0.10725378],
       [-0.30554464]])

In [5]:
# matrix math for weights
weighted_sum = np.dot(inputs, weights)
weighted_sum

array([[-0.30554464],
       [-0.36943133],
       [-0.19829086],
       [-0.26217755]])

In [6]:
# activation values via sigmoid
activated_output = sigmoid(weighted_sum)
activated_output

array([[0.42420261],
       [0.40867844],
       [0.45058908],
       [0.43482849]])

In [7]:
# error values
error = correct_outputs - activated_output
error

array([[ 0.57579739],
       [ 0.59132156],
       [ 0.54941092],
       [-0.43482849]])

In [8]:
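# single round of gradient descent/backprop
# NOTE: sigmoid_derivative() applies sigmoid() itself, so this applies it twice;
# the exact gradient is sigmoid_derivative(weighted_sum). The sign of the update
# is the same either way, and the outputs shown were produced with this call.
adjustments = error * sigmoid_derivative(activated_output)
adjustments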

array([[ 0.13766288],
       [ 0.14182565],
       [ 0.13061033],
       [-0.10372635]])

In [9]:
# updated weights
weights += np.dot(inputs.T, adjustments)
weights

array([[-0.02578739],
       [ 0.13413777],
       [ 0.00082788]])

In [13]:
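def perceptron(inputs, weights, n_iter=10000):
    """Train a single sigmoid neuron with gradient descent.

    Updates `weights` in place and reads the global `correct_outputs`.
    """
    for _ in range(n_iter):
        # weighted sum of inputs and weights
        weighted_sum = np.dot(inputs, weights)

        # activate
        activated_output = sigmoid(weighted_sum)

        # calculate error
        error = correct_outputs - activated_output

        # NOTE: sigmoid_derivative() applies sigmoid() itself, so this applies
        # it twice; the exact gradient is sigmoid_derivative(weighted_sum).
        # Kept as-is because the output shown below was produced with this call.
        adjustments = error * sigmoid_derivative(activated_output)

        # update the weights
        weights += np.dot(inputs.T, adjustments)

    print("weights after training:", weights)
    print("output after training:", activated_output)


In [16]:
perceptron(inputs, weights, n_iter=10000)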

weights after training: [[-16.82044092]
 [-16.82044092]
 [ 25.27884448]]
output after training: [[1.00000000e+00]
 [9.99787933e-01]
 [9.99787933e-01]
 [2.33515394e-04]]
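The trained outputs above are values between 0 and 1, not class labels. One way to turn them into NAND predictions is to threshold at 0.5; the threshold is an assumption, since the notebook does not state one. The sketch below reuses the trained weights printed above, rounded to two decimals, under the hypothetical names `trained_weights` and `nand_inputs`.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# NAND inputs with the bias column of ones, as in the cells above.
nand_inputs = np.array([[0, 0, 1], [1, 0, 1], [0, 1, 1], [1, 1, 1]])

# Trained weights copied from the printed output, rounded to two decimals.
trained_weights = np.array([[-16.82], [-16.82], [25.28]])

probabilities = sigmoid(nand_inputs @ trained_weights)
predicted_labels = (probabilities > 0.5).astype(int)

print(predicted_labels.ravel())  # [1 1 1 0]
```

The thresholded predictions match the `y` column of the NAND truth table.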


## Implement your own Perceptron Class and use it to classify a binary dataset like: 
- [The Pima Indians Diabetes dataset](https://raw.githubusercontent.com/ryanleeallred/datasets/master/diabetes.csv) 
- [Titanic](https://raw.githubusercontent.com/ryanleeallred/datasets/master/titanic.csv)
- [A two-class version of the Iris dataset](https://raw.githubusercontent.com/ryanleeallred/datasets/master/Iris.csv)

You may need to search for others' implementations in order to get inspiration for your own. There are *lots* of perceptron implementations on the internet with varying levels of sophistication and complexity. Whatever your approach, make sure you understand **every** line of your implementation and what its purpose is.

In [58]:
inputs[:,:-1]

array([[0, 0],
       [1, 0],
       [0, 1],
       [1, 1]])

In [7]:
class perceptron:
    """Single neuron with a step activation, trained with the perceptron learning rule."""

    def __init__(self, num_inputs, niter=77, eta=0.1):
        self.niter = niter  # number of passes (epochs) over the training data
        self.eta = eta      # learning rate
        # weights[0] is the bias; weights[1:] are the input weights
        self.weights = np.zeros(num_inputs + 1)

    def predict(self, inputs):
        """Return 1 if the weighted sum plus bias is positive, else 0."""
        weighted_sum = np.dot(inputs, self.weights[1:]) + self.weights[0]
        return 1 if weighted_sum > 0 else 0

    def train(self, training_inputs, labels):
        """Nudge the weights toward the label after each misclassified sample."""
        # np.ravel lets labels be passed with shape (n,) or (n, 1)
        for _ in range(self.niter):
            for inputs, label in zip(training_inputs, np.ravel(labels)):
                update = self.eta * (label - self.predict(inputs))
                self.weights[1:] += update * inputs
                self.weights[0] += update

In [8]:
x = perceptron(num_inputs=2)

In [9]:
x.train(inputs[:,:-1], np.array(correct_outputs).ravel())

In [10]:
pred_input = np.array([1,1])
x.predict(pred_input)

0

In [2]:
import pandas as pd

In [3]:
df = pd.read_csv('https://raw.githubusercontent.com/ryanleeallred/datasets/master/diabetes.csv')

In [4]:
df.head()

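|   | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age | Outcome |
|---|-------------|---------|---------------|---------------|---------|------|--------------------------|-----|---------|
| 0 | 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | 1 |
| 1 | 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | 0 |
| 2 | 8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 | 1 |
| 3 | 1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 | 0 |
| 4 | 0 | 137 | 40 | 35 | 168 | 43.1 | 2.288 | 33 | 1 |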


In [5]:
len(df)

768

In [10]:
perc = perceptron(8)

In [11]:
inp = df.iloc[:,:-1]
inp = np.array(inp)
type(inp)

numpy.ndarray

In [12]:
tar = df.iloc[:,-1]  # select a single column so the labels are 1-D
tar = np.array(tar)
type(tar)

numpy.ndarray

In [13]:
perc.train(inp, tar)

In [14]:
pred_inp = np.array([0,111,76,27,111,29.7,0.432,44])
perc.predict(pred_inp)

0

In [18]:
import matplotlib.pyplot as plt

In [19]:
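# The perceptron class does not record per-epoch errors, so retrain a fresh
# model one epoch at a time and count misclassifications after each epoch.
model = perceptron(8, niter=1)
errors = []
for _ in range(77):
    model.train(inp, tar)
    errors.append(sum(model.predict(row) != label for row, label in zip(inp, np.ravel(tar))))
plt.plot(range(1, len(errors) + 1), errors, marker='o')
plt.xlabel('Epochs')
plt.ylabel('Number of misclassifications')
plt.show()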

## Stretch Goals:

- Research "backpropagation" to learn how weights get updated in neural networks (tomorrow's lecture). 
- Implement a multi-layer perceptron. (for non-linearly separable classes)
- Try and implement your own backpropagation algorithm.
- What are the pros and cons of the different activation functions? How should you decide between them for the different layers of a neural network?