## Machine Learning A-Z™

© Kirill Eremenko, Hadelin de Ponteves, SuperDataScience Team |
[Super Data Science](http://www.superdatascience.com)

Part 8: Deep Learning | Section 36: Artificial Neural Networks (ANN)

Created on Tue Apr  20, 2019
@author: yinka_ola

---

In [4]:
## ---

## Deep Learning:
## Deep Learning is the most exciting and powerful branch of Machine Learning. 
## Deep Learning models can be used for a variety of complex tasks:

## Artificial Neural Networks for Regression and Classification
## Convolutional Neural Networks for Computer Vision
## Recurrent Neural Networks for Time Series Analysis
## Self Organizing Maps for Feature Extraction
## Deep Boltzmann Machines for Recommendation Systems
## Auto Encoders for Recommendation Systems

## In this part, you will understand and learn how to implement the following 
## Deep Learning models: Artificial Neural Networks for a Business Problem

## ---

## Deep Learning History:
## Invented in the 70s, caught on in the 80s
## died off over the next decade => the computational processing power was not available
## since then: huge jump in technology progress = exponential growth
## future: DNA data storage
## 1 kg of DNA could store all of the world's data

## So what is deep learning?
## Geoffrey Hinton: "Godfather of Deep Learning", at Google as of 2019
## goal: mimic how the human brain operates and recreate it
## human brain: one of the most powerful learning machines
## approx. 100 billion neurons in the human brain
## create an artificial structure = artificial neural net
## input layer + hidden layer + output layer
## deep learning: multiple hidden layers, each connected to the next

## ---

## Artificial Neural Networks (ANN):
## the neuron
## the activation function
## how do neural networks work? (example)
## how neural networks learn
## Gradient descent
## Stochastic gradient descent
## Backpropagation

## ---
## The neuron:
## goal of ANN: recreate the neuron
## Neuron, dendrites, axon
## a single neuron is not strong on its own, but neurons are much more effective in groups
## neurons transmit signals via the axon // synapses

## input value => neuron => output value
## output value can be either continuous or categorical
## the inputs and the output always belong to the same observation (row)
## synapses get assigned weights = ANN learns/trains by adjusting the weights
## what happens inside the neuron?
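
A minimal sketch of what happens inside one neuron, assuming the standard formulation: each input is multiplied by its synapse weight, the products are summed, and the sum is passed through an activation function. The input values, the weights, and the `neuron_output` name are illustrative and not from the course.

```python
import numpy as np

def neuron_output(inputs, weights, activation):
    """Weighted sum of the inputs followed by an activation function."""
    weighted_sum = np.dot(inputs, weights)
    return activation(weighted_sum)

# Hypothetical values for one observation with three input features
inputs = np.array([0.5, -1.0, 2.0])
weights = np.array([0.4, 0.3, 0.1])

# Identity activation: the neuron passes the weighted sum through unchanged
output = neuron_output(inputs, weights, activation=lambda z: z)
print(output)
```

Swapping the `activation` argument for a sigmoid or rectifier changes how the same weighted sum is turned into the neuron's signal.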

## The activation function:
## The threshold function (binary)
## the sigmoid function = smooth, no kinks, unlike the threshold function (binary)
## The rectifier function
## hyperbolic tangent (tanh)
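
The four activation functions listed above can be sketched with NumPy as follows. The function names are chosen here for illustration, and the threshold is assumed to switch at 0.

```python
import numpy as np

def threshold(z):
    """Binary step: 1 when z >= 0, otherwise 0."""
    return np.where(z >= 0, 1.0, 0.0)

def sigmoid(z):
    """Smooth S-curve between 0 and 1, useful for probabilities."""
    return 1.0 / (1.0 + np.exp(-z))

def rectifier(z):
    """0 for negative inputs, then linear (often called ReLU)."""
    return np.maximum(0.0, z)

def hyperbolic_tangent(z):
    """Like the sigmoid, but ranges from -1 to 1."""
    return np.tanh(z)

z = np.array([-2.0, 0.0, 2.0])
for fn in (threshold, sigmoid, rectifier, hyperbolic_tangent):
    print(fn.__name__, fn(z))
```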

## How neural networks work:
## case study: property valuation
## input layer: area, bedrooms, distance to city, age, etc.
## output layer: price of house (no hidden layer)
## add hidden layers to see how it grants extra power to computation
## hidden layer uses combination of input layer
## combination of layers creates powerful prediction computation
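
A rough sketch of the property valuation network with one hidden layer, assuming four scaled input features, three hidden neurons with a rectifier activation, and a linear output for the price. The feature values and the random weights are made up; a trained network would have learned its weights from data.

```python
import numpy as np

rng = np.random.default_rng(0)

# One hypothetical property, already scaled: area, bedrooms, distance to city, age
features = np.array([1.2, 0.5, -0.3, -1.0])

# Illustrative random weights (not learned)
hidden_weights = rng.normal(size=(4, 3))   # 4 inputs -> 3 hidden neurons
output_weights = rng.normal(size=3)        # 3 hidden neurons -> 1 output

def predict_price(x, w_hidden, w_out):
    """Forward pass: each hidden neuron combines all inputs, the output combines the hidden neurons."""
    hidden = np.maximum(0.0, x @ w_hidden)  # rectifier activation
    return hidden @ w_out                   # linear output for a continuous value

price = predict_price(features, hidden_weights, output_weights)
print(price)
```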

## How neural networks learn:
## ex. image recognition
## 1. hard coding: a lot of rules to be specified (i.e.)
## 2. create a facility so the NN can understand and learn on its own
## goal of 2: avoid inputting rules
## here you program an architecture and point it at a folder of examples to learn from
## cost function: measures the error between the predicted and the actual value
## for a NN to learn: the error is back-propagated and the weights are adjusted

## Gradient descent:
## the curse of dimensionality: trying every combination of weights by brute force is infeasible
## Sunway TaihuLight: world's fastest supercomputer at the time, 93 PFLOPS
## 10^50 years (longer than the age of the universe) to compute 10^75 combinations = not an option
## solution: gradient descent
## reduces computational time to minutes
## roll a ball down a parabola = find the weight that minimizes the cost function
## you are descending to the minimum of the cost function
## requires the cost function to be convex (otherwise it may stop in a local minimum)
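
To make the "ball rolling down a parabola" concrete, here is a small sketch of gradient descent on a one-weight model with a squared-error cost. The observation, the starting weight, and the learning rate are assumptions chosen so the numbers are easy to follow.

```python
def cost(w, x, y):
    """Squared-error cost for a one-weight model y_hat = w * x."""
    return 0.5 * (w * x - y) ** 2

def cost_gradient(w, x, y):
    """Slope of the cost with respect to w."""
    return (w * x - y) * x

x, y = 2.0, 6.0        # hypothetical observation; the best weight is 3.0
w = 0.0                # starting weight
learning_rate = 0.1    # assumed step size

for step in range(50):
    # Move against the slope: downhill on the parabola
    w -= learning_rate * cost_gradient(w, x, y)

print(w, cost(w, x, y))
```

Each step moves the weight in the direction that lowers the cost, so no exhaustive search over weight values is needed.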

## Stochastic Gradient Descent:
## what if the cost function is not convex? (multi-dimensional space)
## goal: we want the global (not a local) minimum
## solution: stochastic gradient descent
## here the weights are adjusted after evaluating each row (one update per row)
## rows are picked at random
## this helps avoid getting stuck in a local minimum instead of reaching the global minimum
## batch gradient descent method: a deterministic algorithm (same starting weights = same result)
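
The difference between the two methods can be sketched on a one-weight linear model. The data, learning rate, and epoch counts below are assumptions for illustration; the point is only where the weight update happens.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical data: y = 3 * x plus a little noise
x = rng.uniform(-1, 1, size=200)
y = 3.0 * x + rng.normal(scale=0.1, size=200)

def batch_gradient_descent(x, y, learning_rate=0.1, epochs=500):
    w = 0.0
    for _ in range(epochs):
        gradient = np.mean((w * x - y) * x)   # uses every row before one update
        w -= learning_rate * gradient
    return w

def stochastic_gradient_descent(x, y, rng, learning_rate=0.1, epochs=20):
    w = 0.0
    for _ in range(epochs):
        for i in rng.permutation(len(x)):     # rows visited in random order
            w -= learning_rate * (w * x[i] - y[i]) * x[i]   # update after each row
    return w

w_batch = batch_gradient_descent(x, y)
w_sgd = stochastic_gradient_descent(x, y, rng)
print(w_batch, w_sgd)
```

Running `batch_gradient_descent` twice on the same data gives the same weight, while the stochastic version depends on the random row order.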

## Backpropagation
## It adjusts all the weights at the same time

## Steps to training an ANN via stochastic gradient descent
## 1. randomly initialize the weights to small numbers close to 0 (not 0)
## 2. input the first observation of the dataset in the input layer, 1 feature per input node
## 3. forward propagation: from left to right. neurons are activated such that
## the impact of each neuron's activation is limited by the weights. propagate
## activations until you get the predicted result y
## 4. compare the predicted result to the actual result. measure the generated error
## 5. back propagation: from right to left, the error is back-propagated. update the weights
## according to their contribution to the error. the learning rate decides by how much each weight is adjusted
## 6. repeat steps 2 to 5 and update the weights after each obs (online learning; called reinforcement learning in these notes) Or:
## repeat steps 2 to 5 but update the weights only after a batch of obs (batch learning).
## 7. when the whole training set has passed through the ANN => one epoch. redo more epochs
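
The steps above can be sketched in plain NumPy for a tiny network with one hidden layer. This is an illustration under assumptions, not the course's implementation: the toy data, the layer sizes, the sigmoid activations, the cross-entropy error signal, and the learning rate are all choices made here.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical toy data: 2 features, label is 1 when their sum is positive
X = rng.normal(size=(200, 2))
y = (X.sum(axis=1) > 0).astype(float)

# Step 1: small random weights close to 0 (biases start at 0)
w_hidden = rng.uniform(-0.1, 0.1, size=(2, 3))
b_hidden = np.zeros(3)
w_out = rng.uniform(-0.1, 0.1, size=3)
b_out = 0.0
learning_rate = 0.5

def forward(x):
    """Step 3: propagate activations from the input to the output."""
    hidden = sigmoid(x @ w_hidden + b_hidden)
    return hidden, sigmoid(hidden @ w_out + b_out)

for epoch in range(20):                        # Step 7: several epochs
    for i in rng.permutation(len(X)):          # rows in random order
        hidden, y_hat = forward(X[i])          # Steps 2 and 3
        error = y_hat - y[i]                   # Step 4: predicted vs actual
        # Step 5: send the error back through the output weights
        delta_hidden = error * w_out * hidden * (1.0 - hidden)
        # Step 6: update the weights after this single observation
        w_out -= learning_rate * error * hidden
        b_out -= learning_rate * error
        w_hidden -= learning_rate * np.outer(X[i], delta_hidden)
        b_hidden -= learning_rate * delta_hidden

_, probabilities = forward(X)
accuracy = np.mean((probabilities > 0.5) == y)
print(accuracy)
```

Updating only after accumulating the gradients of several rows would turn the inner loop into batch learning.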

## ---

## Data Scenario:
## a bank snapshot of 10k randomly selected customers
## the bank is experiencing an unusually high churn rate
## investigate what is causing the churn and recommend a solution
## 6-month investigation: did they stay or leave the bank?
## determine which of the customers are going to leave
## determine the factors which influence the outcome
## note: this analysis is transferable to any industry

## Python Libraries and Packages
## Keras: high-level package that runs on top of TensorFlow (or Theano)
## in mac/pc terminal: conda install -c conda-forge keras
## Theano: fast numerical computation library
## GPU: processor built for graphics purposes; far more parallel computing power than a CPU
## TensorFlow: numerical computation library
## Theano + TensorFlow: mostly for research purposes (a lot of lines of code)
## Keras: based on Theano + TensorFlow (few lines of code)
## use Keras to build deep learning models efficiently

## ---

In [5]:
# Importing the libraries
import pandas as pd #data
import numpy as np #mathematics
import os
#plotting packages
import matplotlib.pyplot as plt #plotting charts
import seaborn as sns
sns.set()
%matplotlib inline
plt.rcParams['figure.figsize'] = 10,5
#ignore warnings
import warnings
warnings.filterwarnings('ignore')

In [6]:
## Part 1 - Data Preprocessing

# Importing the dataset
dataset = pd.read_csv('Churn_Modelling.csv')
x = dataset.iloc[:, 3:13].values #upperbound is excluded in a range
y = dataset.iloc[:, 13].values

In [7]:
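# Encoding categorical data (independent variables)
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.compose import ColumnTransformer
labelencoder_x_1 = LabelEncoder()
x[:, 1] = labelencoder_x_1.fit_transform(x[:, 1]) #Geography
labelencoder_x_2 = LabelEncoder()
x[:, 2] = labelencoder_x_2.fit_transform(x[:, 2]) #Gender
# no order to the Geography categories, so one-hot encode column 1
# (OneHotEncoder's categorical_features argument was removed in scikit-learn 0.22)
ct = ColumnTransformer([('geography', OneHotEncoder(), [1])], remainder = 'passthrough')
x = ct.fit_transform(x).astype(float)
x = x[:, 1:] #drop one dummy column to avoid the dummy variable trap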
print (x)

[[0.0000000e+00 0.0000000e+00 6.1900000e+02 ... 1.0000000e+00
  1.0000000e+00 1.0134888e+05]
 [0.0000000e+00 1.0000000e+00 6.0800000e+02 ... 0.0000000e+00
  1.0000000e+00 1.1254258e+05]
 [0.0000000e+00 0.0000000e+00 5.0200000e+02 ... 1.0000000e+00
  0.0000000e+00 1.1393157e+05]
 ...
 [0.0000000e+00 0.0000000e+00 7.0900000e+02 ... 0.0000000e+00
  1.0000000e+00 4.2085580e+04]
 [1.0000000e+00 0.0000000e+00 7.7200000e+02 ... 1.0000000e+00
  0.0000000e+00 9.2888520e+04]
 [0.0000000e+00 0.0000000e+00 7.9200000e+02 ... 1.0000000e+00
  0.0000000e+00 3.8190780e+04]]
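
Why one dummy column is dropped after one-hot encoding can be seen on a tiny example. This sketch uses `pandas.get_dummies` rather than the encoder from the cell above, and the category values are illustrative.

```python
import pandas as pd

# Illustrative category values for a column with three unordered categories
geography = pd.Series(['France', 'Spain', 'Germany', 'France'], name='Geography')

# One column per category: every row sums to 1, so one column is redundant
all_dummies = pd.get_dummies(geography, dtype=int)

# Dropping the first column removes the redundancy (the dummy variable trap)
reduced = pd.get_dummies(geography, drop_first=True, dtype=int)

print(all_dummies)
print(reduced)
```

A row with zeros in every remaining column stands for the dropped category, so no information is lost.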




In [8]:
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.2, random_state = 0)


In [9]:
# Feature Scaling
# it is needed: the ANN does many computations, and no feature should dominate the others because of its scale
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
x_train = sc.fit_transform(x_train)
x_test = sc.transform(x_test)
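
What the scaling step computes can be written out by hand. This sketch mirrors the default behaviour of `StandardScaler` (subtract the mean, divide by the population standard deviation) using NumPy only; the two columns and their values are hypothetical.

```python
import numpy as np

# Hypothetical columns on very different scales: age and estimated salary
train = np.array([[25.0, 40000.0], [40.0, 90000.0], [55.0, 140000.0]])
test = np.array([[30.0, 60000.0]])

# "fit": learn the mean and standard deviation from the training set only
mean = train.mean(axis=0)
std = train.std(axis=0)

# "transform": apply the training statistics to both sets
train_scaled = (train - mean) / std
test_scaled = (test - mean) / std

print(train_scaled.mean(axis=0), train_scaled.std(axis=0))
print(test_scaled)
```

The test set reuses the training mean and standard deviation, which is why the cell above calls `fit_transform` on `x_train` but only `transform` on `x_test`.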


In [10]:
# Part 2 - Now let's make the ANN!
# Importing the Keras libraries and packages
import keras
from keras.models import Sequential
from keras.layers import Dense #module to create layer in ANN

# Initialising the ANN
classifier = Sequential()

# Adding the input layer and the first hidden layer
## we have 11 independent variables
## Keras 2 argument names: units (was output_dim), kernel_initializer (was init)
classifier.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu', input_dim = 11))

# Adding the second hidden layer
## not useful for our dataset, but good to know how to do it.
classifier.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu'))

# Adding the output layer
## if the dependent variable has > 2 categories => use softmax with one output unit per category
classifier.add(Dense(units = 1, kernel_initializer = 'uniform', activation = 'sigmoid'))

# Compiling the ANN
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])

# Fitting the ANN to the Training set (epochs was nb_epoch in Keras 1)
classifier.fit(x_train, y_train, batch_size = 10, epochs = 100)

Epoch 1/100
...
Epoch 100/100


<keras.callbacks.History at 0x1a3b022ef0>

In [11]:
## Observation: our accuracy converges to 85.8%

In [1]:
## Part 3 - Making the predictions and evaluating the model 
# Predicting the Test set results
## predict returns, for each of the 2000 test customers, the probability of leaving the bank
## we need a threshold to decide if the category is 0 or 1 => 50% threshold
y_pred = classifier.predict(x_test)
y_pred = (y_pred > 0.5)


In [None]:
## the bank can use this model to make predictions; if the accuracy is close to
## 83.5%, it can rank customers by their probability of leaving, target those
## most likely to leave, and create measures to keep them from leaving
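
Thresholding and ranking can be sketched on a handful of predicted probabilities. The values below are made up and stand in for the output of `classifier.predict`.

```python
import numpy as np

# Hypothetical predicted probabilities of leaving for five customers
probabilities = np.array([0.10, 0.85, 0.45, 0.62, 0.05])

will_leave = probabilities > 0.5             # 50% threshold => category 0 or 1
ranking = np.argsort(probabilities)[::-1]    # customer indices, highest risk first

print(will_leave)
print(ranking)
```

The ranking keeps the raw probabilities in play, so the bank can contact the highest-risk customers first instead of treating everyone above the threshold the same.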

In [None]:
# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
print(cm)

In [None]:
## confirm the accuracy
## on new observations, we get an accuracy of 86% => about the same accuracy as in training
## the bank can use this to predict if a client will stay
(1550 +175)/2000
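
The same accuracy can be computed directly from a confusion matrix instead of typing the counts by hand. In this sketch the diagonal counts (1550 and 175) and the total of 2000 come from the calculation above; how the remaining 275 errors split between the two off-diagonal cells is not given in these notes, so that split is hypothetical.

```python
import numpy as np

# Diagonal: correct predictions (from the notes).
# Off-diagonal: the 45 / 230 split is hypothetical, only their sum (275) is known.
cm = np.array([[1550, 45],
               [230, 175]])

accuracy = np.trace(cm) / cm.sum()   # correct predictions / all predictions
print(accuracy)
```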