# Agenda
- Deep Learning
- Artificial Neural Network (ANN)
- How Neural Network Works
- Mathematics Behind NN
- Activation Function
- Bias Node
- Forward Propagation
- Back Propagation
- Learning Rate


## Why Deep Learning?

- Suppose we have n = 100 features and want to include every second-order term:  
$ x_1^2 + x_1x_2 + x_1x_3 + x_1x_4 + x_1x_5 + \dots + x_{99}x_{100} + x_{100}^2 $
- For non-linear logistic regression, a polynomial of order 2 has n(n+1)/2 = 5,050 such terms.
- For order 3, the number of terms is n(n+1)(n+2)/6 = 171,700.
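The counts above can be checked with a few lines of Python. This is only a sketch; the helper names `quadratic_terms` and `cubic_terms` are made up for illustration.

```python
import math


def quadratic_terms(n):
    """Number of distinct order-2 terms x_i * x_j (i <= j) for n features."""
    return n * (n + 1) // 2


def cubic_terms(n):
    """Number of distinct order-3 terms x_i * x_j * x_k (i <= j <= k)."""
    return n * (n + 1) * (n + 2) // 6


n = 100
print(quadratic_terms(n))
print(cubic_terms(n))

# The same counts expressed as "combinations with repetition"
assert quadratic_terms(n) == math.comb(n + 1, 2)
assert cubic_terms(n) == math.comb(n + 2, 3)
```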
  
  
Suppose we have an image of 100 × 100 pixels.  
Total pixels = 100 × 100 = 10,000, so n = 10,000 (30,000 for RGB).  
With n = 10,000, the order-2 terms alone number n(n+1)/2 ≈ 50 million (≈ 450 million for RGB).






## Deep Learning
- Deep learning (also known as deep structured learning) is part of a broader family of machine learning methods based on artificial neural networks with representation learning. 
- Learning can be supervised, semi-supervised or unsupervised.
- The “deep” in deep learning refers to the depth of the network, i.e. the number of layers between input and output.

<img src="Image/deep.png" width="400" />

#### Why Deep Learning?

<img src="Image/perf.png" width="600" />

# Artificial Neural Network
- Dendrites are the structures on a neuron that receive electrical signals, process them, and pass the information on to the soma (cell body) of the neuron.
- Axons are the primary transmission lines of the nervous system; they carry signals away from the soma to other neurons.


<img src="Image/neuron.jpg" width="400" />
<br>
<br>  
<br>  
<br>
<br>
  
  
  
  
  

#### How Does a Neural Network Work?
<img src="Image/neuron-3.png" width="300" />
<br>
<br>  
<br>  
<br>
<br>
<img src="Image/Basic_Neural.png" width="500" />
<br>
<br>  
<br>  
<br>
<br>

#### Neural Network with Hidden Layer

- Hidden layers sit between the input layer and the output layer.
- They allow the function of a neural network to be broken down into specific transformations of the data.
- Each hidden layer performs a specific task and produces a defined output that is passed on to the next layer.

<img src="Image/house_neural.png" width="500" />
<br>
<br>  
<br>  
<br>
<br>
<img src="Image/house_hidden2.png" width="500" />
<br>
<br>  
<br>  
<br>
<br>

## Activation Function

- An activation function is attached to each neuron in the network and determines whether the neuron should be activated, based on whether its input is relevant for the model’s prediction.
- It also keeps the output of each neuron within a standard range, e.g. (0, 1) for the sigmoid.
- E.g.: Threshold, Sigmoid, ReLU (Rectifier), Softmax

<img src="Image/activation.png" width="500" />

#### What's the difference between the Step, Linear and Sigmoid functions?  


Linear Function:
- If we use only linear functions, the output of the network is also a linear function of the input, so we can't fit a non-linear dataset.
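A small numerical check makes this concrete: two stacked linear layers give the same output as one linear layer. The sketch below assumes arbitrary random weights; `W1`, `b1`, `W2`, `b2` and the helper name are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two layers with linear (identity) activation: 2 inputs -> 3 hidden -> 1 output
W1, b1 = rng.normal(size=(3, 2)), rng.normal(size=3)
W2, b2 = rng.normal(size=(1, 3)), rng.normal(size=1)


def two_linear_layers(x):
    hidden = W1 @ x + b1      # first layer, no non-linearity
    return W2 @ hidden + b2   # second layer, no non-linearity


# The same mapping written as a single linear layer
W_combined = W2 @ W1
b_combined = W2 @ b1 + b2

x = np.array([0.5, -1.5])
print(two_linear_layers(x))
print(W_combined @ x + b_combined)
```

Both lines print the same value, so the extra layer adds no expressive power without a non-linear activation.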

Step Function: 
- We define a threshold value and produce discrete output values.
- if (z ≥ threshold): “activate” the node (value 1)
- if (z < threshold): don’t “activate” the node (value 0)
- So the output is either 0 or 1.
- The issue is that several output classes/nodes can be activated (have the value 1) at the same time, so we cannot properly classify/decide between them.

Sigmoid Function:  

$ \theta(x) = \frac {1} {1 + e^{-x}} $

- It is a non-linear function
- Value range is (0,1)
- The output can be read as a probability and thresholded (e.g. at 0.5) to classify a value as either 1 or 0

<img src="Image/sigmoid1.png">




#### Threshold Function

$ \theta(x) = \begin{cases} 1 & \text{if } x \ge 0 \\ 0 & \text{if } x < 0 \end{cases} $

<img src="Image/threshold1.png" width="300" />




#### Rectifier Function

$ \theta(x) = \max(x, 0) $

<img src="Image/rectifier1.png">

We use ReLU in hidden layers,  
sigmoid in the output layer for binary classification problems and softmax for multiclass classification problems,  
and a linear activation (or ReLU when the target cannot be negative) in the output layer for regression problems.
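The activation functions named in this section can be sketched in NumPy as follows. The function names, the `threshold` parameter, and the choice to return 1 when the input equals the threshold are assumptions made for illustration.

```python
import numpy as np


def step(z, threshold=0.0):
    """Threshold function: 1 where z >= threshold (assumed convention), else 0."""
    return np.where(z >= threshold, 1.0, 0.0)


def sigmoid(z):
    """Squashes any real value into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def relu(z):
    """Rectifier: passes positive values through, clips negatives to 0."""
    return np.maximum(z, 0.0)


def softmax(z):
    """Turns a vector of scores into probabilities that sum to 1."""
    exps = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return exps / np.sum(exps)


z = np.array([-2.0, 0.0, 3.0])
print("step   :", step(z))
print("sigmoid:", sigmoid(z))
print("relu   :", relu(z))
print("softmax:", softmax(z))
```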

## Bias
- It's a constant added to the weighted sum of a neuron's inputs. It shifts the activation function left or right, which helps the model fit the given data better.
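A minimal sketch of the effect, assuming a single sigmoid neuron with one input; the name `neuron` and the numeric values are made up for illustration. Without a bias the output at x = 0 is stuck at 0.5 whatever the weight is; the bias moves that point.

```python
import math


def neuron(x, w, b):
    """Single sigmoid neuron: activation of the weighted input plus bias."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))


# With b = 0 the output at x = 0 is 0.5 for every weight
print(neuron(0.0, w=1.0, b=0.0), neuron(0.0, w=5.0, b=0.0))

# A non-zero bias shifts the curve, changing the output at x = 0
print(neuron(0.0, w=5.0, b=2.0), neuron(0.0, w=5.0, b=-2.0))
```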


## Forward Propagation
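
Forward propagation computes the network's output by passing the input through each layer in turn: every layer takes a weighted sum of its inputs, adds the bias, and applies its activation function. A minimal sketch, assuming a hypothetical 2-3-1 network (ReLU hidden layer, sigmoid output) whose weights are made up for illustration:

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(x, W1, b1, W2, b2):
    """One forward pass: input -> hidden layer (ReLU) -> output layer (sigmoid)."""
    z1 = W1 @ x + b1   # weighted sum + bias for the hidden layer
    a1 = relu(z1)      # hidden activations
    z2 = W2 @ a1 + b2  # weighted sum + bias for the output layer
    return sigmoid(z2)


# Illustrative (not trained) parameters
W1 = np.array([[0.2, -0.4], [0.7, 0.1], [-0.5, 0.3]])
b1 = np.array([0.0, 0.1, -0.1])
W2 = np.array([[0.6, -0.2, 0.9]])
b2 = np.array([0.05])

x = np.array([1.0, 2.0])
y_hat = forward(x, W1, b1, W2, b2)
print(y_hat)  # a single value between 0 and 1
```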


## How do Neural network Learn?

- The network learns by adjusting its weights (w) so that the cost decreases.
<img src="Image/neural_learn.png">

## Back Propagation

- The error propagates from right to left (from the output back towards the input), and each weight is updated according to how much it is responsible for the error.
- This amounts to determining how a change in each weight affects the overall cost of the neural network.
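
For a single sigmoid neuron the idea can be sketched with the chain rule. The squared-error cost, the function names, and the numeric values below are assumptions chosen for illustration; the result is compared against a numerical gradient as a sanity check.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def cost(w, b, x, y):
    """Squared-error cost of one sigmoid neuron on one example."""
    return 0.5 * (sigmoid(w * x + b) - y) ** 2


def gradients(w, b, x, y):
    """Chain rule: dC/dw = dC/da * da/dz * dz/dw (and likewise for b)."""
    a = sigmoid(w * x + b)
    dcost_da = a - y          # how the cost changes with the activation
    da_dz = a * (1.0 - a)     # derivative of the sigmoid
    delta = dcost_da * da_dz  # error signal at the neuron
    return delta * x, delta   # dz/dw = x, dz/db = 1


w, b, x, y = 0.5, 0.1, 2.0, 1.0
dw, db = gradients(w, b, x, y)

# Numerical check with a central difference
eps = 1e-6
numeric_dw = (cost(w + eps, b, x, y) - cost(w - eps, b, x, y)) / (2 * eps)
print(dw, numeric_dw)
```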

## Learning Rate

The learning rate decides by how much we update the weights at each step: too small and training is slow, too large and the updates can overshoot the minimum.
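
The effect of the learning rate can be seen on a toy problem. The sketch below assumes a made-up one-dimensional cost, (w - 3)^2, whose minimum is at w = 3; the function name and step count are illustrative.

```python
def gradient_descent(lr, steps=25, w=0.0):
    """Minimise (w - 3)^2 starting from w, using a fixed learning rate."""
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)  # derivative of (w - 3)^2
        w -= lr * grad          # the learning rate scales the update
    return w


print(gradient_descent(lr=0.01))  # too small: still far from 3 after 25 steps
print(gradient_descent(lr=0.1))   # reasonable: close to 3
print(gradient_descent(lr=1.1))   # too large: overshoots and diverges
```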

## Perceptron
A perceptron is the simplest neural network: a set of input nodes connected to one or more neurons via weighted edges.

<img src="Image/neuron-3.png" width="300" />

### Neural Network Examples:

$ x_1, x_2 \in \{0, 1\} $  
$ y = x_1 \text{ AND } x_2 $
<img src="Image/ex-nn.png">

#### AND Gate
$h_\theta(x) = g(-50+30x_1+30x_2)$  

<img src="Image/AND1.png">

#### OR Gate
$h_\theta(x) = g(-20+30x_1+30x_2)$  

<img src="Image/OR.png">
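
The two gate formulas can be checked by evaluating them on all four input combinations. This sketch assumes g is the sigmoid function; the names `and_gate` and `or_gate` are made up for illustration.

```python
import math


def g(z):
    """Sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-z))


def and_gate(x1, x2):
    return g(-50 + 30 * x1 + 30 * x2)


def or_gate(x1, x2):
    return g(-20 + 30 * x1 + 30 * x2)


print("x1 x2 | AND OR")
for x1 in (0, 1):
    for x2 in (0, 1):
        # Rounding the sigmoid output gives the predicted class (0 or 1)
        print(x1, x2, "|", round(and_gate(x1, x2)), round(or_gate(x1, x2)))
```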

In [1]:
lr = 1    # learning rate
bias = 1  # value of the bias input
weights = [-50, 10, 30]  # [bias weight, weight of x_1, weight of x_2]
print(weights)

[-50, 10, 30]


In [3]:
#Training the model (perceptron learning rule)
def perceptron(x_1, x_2, output):
    # Weighted sum of the inputs plus the bias term
    outputP = bias*weights[0] + x_1*weights[1] + x_2*weights[2]
    if outputP > 4.6:  # activation function (here Heaviside)
        outputP = 1
    else:
        outputP = 0
    # Signed error: +1 raises the weights, -1 lowers them, 0 leaves them unchanged
    error = output - outputP
    weights[0] += error * bias * lr
    weights[1] += error * x_1 * lr
    weights[2] += error * x_2 * lr

#Making the prediction
def predict(x_1, x_2):
    outputP = bias*weights[0] + x_1*weights[1] + x_2*weights[2]
    if outputP > 4.6:  # same activation function as in training
        outputP = 1
    else:
        outputP = 0
    return outputP
    
for i in range(50) :
    #print("Running loop i=%s"%i)
    perceptron(1,1,1) #True or true
    perceptron(1,0,1) #True or false
    perceptron(0,1,1) #False or true
    perceptron(0,0,0) #False or false
print(weights)     
x = int(input())
y = int(input())
output_predict = predict(x, y )
print(x, "or", y, "is : ", output_predict)    

[-24, 30, 39]
1
0
1 or 0 is :  1


## Gradient Descent

Two Types:
1. Batch Gradient Descent
2. Stochastic Gradient Descent
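
The difference between the two lies in how much data is used per weight update: batch gradient descent uses the whole dataset for each update, while stochastic gradient descent updates after every single example. A minimal sketch, assuming a made-up linear-regression problem (true slope 2, intercept 1); the function names and hyperparameters are illustrative.

```python
import numpy as np

# Synthetic data: y = 2x + 1 plus a little noise
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=200)
y = 2.0 * X + 1.0 + rng.normal(scale=0.1, size=200)


def batch_gd(X, y, lr=0.1, epochs=500):
    """One update per epoch, using the gradient averaged over ALL examples."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        err = w * X + b - y
        w -= lr * np.mean(err * X)
        b -= lr * np.mean(err)
    return w, b


def sgd(X, y, lr=0.05, epochs=20, seed=1):
    """One update per example, visiting the examples in random order."""
    order_rng = np.random.default_rng(seed)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for i in order_rng.permutation(len(X)):
            err = w * X[i] + b - y[i]
            w -= lr * err * X[i]
            b -= lr * err
    return w, b


w_batch, b_batch = batch_gd(X, y)
w_sgd, b_sgd = sgd(X, y)
print("batch:", w_batch, b_batch)
print("sgd  :", w_sgd, b_sgd)
```

Both variants should end up near the true slope and intercept; the stochastic version takes noisier steps but makes many more updates per pass over the data.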



## Artificial Neural Network with Python

In [5]:
#Import Libraries
import pandas as pd                 # pandas is a dataframe library
import matplotlib.pyplot as plt      # matplotlib.pyplot plots data

#Read the data
df = pd.read_csv("Data/Deep_Learning/pima-data.csv")

#Check the Correlation
#df.corr()
#Delete the correlated feature
del df['skin']

#Data Molding
diabetes_map = {True : 1, False : 0}
df['diabetes'] = df['diabetes'].map(diabetes_map)

#Splitting the data
from sklearn.model_selection import train_test_split

#Select columns 0 to 7 as features (the end index 8 is exclusive); column 8 is the target
X = df.iloc[:, 0:8]
y = df.iloc[:, 8]

split_test_size = 0.30

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=split_test_size, random_state=42) 

#Imputing
#SimpleImputer replaces the older sklearn.preprocessing.Imputer (removed in scikit-learn 0.22)
from sklearn.impute import SimpleImputer

#Impute with mean all 0 readings
fill_0 = SimpleImputer(missing_values=0, strategy="mean")

X_train = fill_0.fit_transform(X_train)
X_test = fill_0.transform(X_test)

# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)


# Importing the Keras libraries and packages
from keras.models import Sequential
from keras.layers import Dense
from keras.regularizers import l2

# Initialising the ANN
classifier = Sequential()

# Adding the input layer and the first hidden layer, kernel_regularizer parameter is optional
classifier.add(Dense(units = 8, kernel_initializer = 'uniform', activation = 'relu', input_dim = 8, kernel_regularizer=l2(0.001)))

# Adding the second hidden layer
classifier.add(Dense(units = 5, kernel_initializer = 'uniform', activation = 'relu', kernel_regularizer=l2(0.001)))

# Adding the output layer
classifier.add(Dense(units = 1, kernel_initializer = 'uniform', activation = 'sigmoid'))

# Compiling the ANN
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])

# Fitting the ANN to the Training set
classifier.fit(X_train, y_train, batch_size = 10, epochs = 100)



Epoch 1/100
...
Epoch 100/100


<keras.callbacks.History at 0x24fc6259160>

In [2]:
# Predicting the Test set results
y_pred = classifier.predict(X_test)
y_pred = (y_pred > 0.5)

# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
print(cm)

[[117  34]
 [ 27  53]]
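
The usual summary metrics can be read off this matrix. The sketch below hard-codes the matrix printed above and assumes scikit-learn's layout (rows are true classes, columns are predicted classes, class 0 first); the variable names are illustrative.

```python
import numpy as np

# Confusion matrix from the output above: rows = actual, columns = predicted
cm = np.array([[117, 34],
               [27, 53]])

tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()  # share of all test samples classified correctly
precision = tp / (tp + fp)       # of the predicted diabetics, how many really are
recall = tp / (tp + fn)          # of the actual diabetics, how many were found

print(f"accuracy : {accuracy:.4f}")
print(f"precision: {precision:.4f}")
print(f"recall   : {recall:.4f}")
```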
