# Machine Learning A-Z: Section 29 Artificial Neural Network

In this notebook we'll use an Artificial Neural Network (ANN) to solve a business problem described later. First, though, we'll briefly describe what a neural network is and how it works. Neural networks are an entire area of study and a branch of Machine Learning in their own right, so we'll only give a high-level explanation here.

Artificial neural networks consist of groups of little units called neurons (after the ones in your brain!). A single neuron takes inputs from other neurons, multiplies each input by a weight (more on that in a moment), sums the resulting weighted inputs and then calculates an activation (roughly, how strongly the neuron "fires") from that sum. An artificial neural network is composed of several layers of neurons, each layer passing its outputs as inputs to the next layer. The last layer (called the output layer) passes the results back to the user.
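
As a rough illustration of the weighted-sum-then-activation step, here is a minimal NumPy sketch of a single neuron. The input values, weights, the `bias` term and the choice of a sigmoid activation are all made up for this example; they are assumptions, not values taken from the network built later.

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    weighted_sum = np.dot(inputs, weights) + bias
    return sigmoid(weighted_sum)

# Three made-up inputs and weights, purely for illustration
inputs = np.array([0.5, -1.0, 2.0])
weights = np.array([0.4, 0.3, -0.2])
bias = 0.1

# Weighted sum: 0.2 - 0.3 - 0.4 + 0.1 = -0.4
activation = neuron_output(inputs, weights, bias)
print(activation)
```

A layer is just many of these neurons sharing the same inputs, so in practice the weighted sums for a whole layer are computed with a single matrix multiplication.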

In order for a neural network to be useful it needs to be trained. Before training, however, the developer needs to decide the structure of the network (how many layers, how many neurons in each layer, what to use as an activation function, etc.). After the structure is set up, the weight for each connection is initialized to a random value and the network is ready to be trained.

To train the neural network, it is fed the training data and its output (which will initially be random) is compared to the expected output. The error in the prediction is computed, and then, using gradient descent and backpropagation, the weights are adjusted to bring the output a little closer to the true output. The process is then repeated on the next piece of training data, and the next, and so on. Once the network has seen all the available training data, that completes a single training epoch. The training algorithm then repeats epochs a set number of times, always inching closer to a better solution. Be careful, though: it is possible to overfit your training data by training on it for too long. How long you can train before overfitting sets in will vary with how large the dataset is, how large the network is, how many independent variables there are, how you subdivide the training data, etc. However, you'll be able to identify overfitting by evaluating the model on a held-out test set after every epoch and looking for a drop in the accuracy of its predictions.
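
To make the training loop concrete, here is a minimal sketch of gradient descent on a single sigmoid neuron. It is not how Keras trains the network below: the toy dataset (logical OR), the `learning_rate`, the number of epochs and the per-sample update are all assumptions chosen to keep the example tiny. With only one neuron there are no hidden layers, so backpropagation reduces to a single gradient step.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data (made up): learn the logical OR of two inputs
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0.0, 1.0, 1.0, 1.0])

rng = np.random.RandomState(0)
weights = rng.uniform(-0.05, 0.05, size=2)  # small random starting weights
bias = 0.0
learning_rate = 0.5                          # assumed step size

def mean_loss(weights, bias):
    """Average binary cross-entropy over the whole toy dataset."""
    p = sigmoid(X @ weights + bias)
    return float(np.mean(-(y * np.log(p) + (1 - y) * np.log(1 - p))))

loss_before = mean_loss(weights, bias)

for epoch in range(200):              # one epoch = one pass over all samples
    for x_i, y_i in zip(X, y):        # adjust the weights after every sample
        prediction = sigmoid(x_i @ weights + bias)
        error = prediction - y_i      # gradient of the loss w.r.t. the weighted sum
        weights -= learning_rate * error * x_i
        bias -= learning_rate * error

loss_after = mean_loss(weights, bias)
predictions = sigmoid(X @ weights + bias) > 0.5
print(loss_before, loss_after)
print(predictions)
```

Each update nudges the weights in the direction that reduces the error on the current sample, which is the "inching closer" described above.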

The problem we'll be solving with the ANN is determining which customers are likely to stop using a particular bank (churn). In this case we'll use an ANN to create a geodemographic model from a sample of data the bank collected about its customers and try to identify who is likely to leave the bank.

## Step 1. Import and Prepare the Data

In [1]:
import numpy as np # Libraries for fast linear algebra and array manipulation
import pandas as pd # Import and manage datasets
from plotly import __version__ as py__version__
import plotly.express as px # Libraries for plotting data
import plotly.graph_objects as go # Libraries for plotting data
from sklearn import __version__ as skl__version__
from sklearn.model_selection import train_test_split # Library to split data into training and test sets.
from sklearn.preprocessing import LabelEncoder, OneHotEncoder # Libraries to do encoding of categorical variables
from sklearn.compose import ColumnTransformer # Library to transform only certain columns/features at a time
from sklearn.preprocessing import StandardScaler # Library to do feature scaling
from sklearn.metrics import confusion_matrix #Function for computing the confusion matrix
from tensorflow import __version__ as tf__version__
from tensorflow import keras # High level library for building Neural Networks
from tensorflow.keras.models import Sequential # Keras module for building a neural network with sequential layers
from tensorflow.keras.layers import Dense #Keras module for building a neural network with fully interconnected layers

Library versions used in this code:

In [2]:
print('Numpy: ' + np.__version__)
print('Pandas: ' + pd.__version__)
print('Plotly: ' + py__version__)
print('Scikit-learn: ' + skl__version__)
print('Tensorflow Version: ' + tf__version__)

Numpy: 1.16.4
Pandas: 0.25.1
Plotly: 4.0.0
Scikit-learn: 0.21.2
Tensorflow Version: 2.0.0


In [3]:
def LoadData():
    dataset = pd.read_csv('Churn_Modelling.csv')
    return dataset

dataset = LoadData()
print(dataset.head(3))
print()
print(dataset.info())

   RowNumber  CustomerId   Surname  CreditScore Geography  Gender  Age  \
0          1    15634602  Hargrave          619    France  Female   42   
1          2    15647311      Hill          608     Spain  Female   41   
2          3    15619304      Onio          502    France  Female   42   

   Tenure    Balance  NumOfProducts  HasCrCard  IsActiveMember  \
0       2       0.00              1          1               1   
1       1   83807.86              1          0               1   
2       8  159660.80              3          1               0   

   EstimatedSalary  Exited  
0        101348.88       1  
1        112542.58       0  
2        113931.57       1  

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 14 columns):
RowNumber          10000 non-null int64
CustomerId         10000 non-null int64
Surname            10000 non-null object
CreditScore        10000 non-null int64
Geography          10000 non-null object
Gender     

We can see that the dataset contains 14 columns. However, not all of them are useful for our model. Only the columns listed below will actually be useful:
* CreditScore
* Geography
* Gender
* Age
* Tenure (How long the person has been a customer)
* Balance
* NumOfProducts (How many of the bank's products does the customer use)
* HasCrCard
* IsActiveMember
* EstimatedSalary

Using the data in these columns we'll try to predict the value in the *Exited* column to determine if a user will leave the bank soon or not.

You'll notice that some of these columns are categorical variables that we'll need to encode before the network can use them. Also, there does not appear to be any missing data in this dataset.

## Step 2. Split and Encode the Data

In [4]:
X = dataset.iloc[:,3:-1].values # Features: skip the first three columns (RowNumber, CustomerId, Surname) and the last one (Exited)
y = dataset.iloc[:,-1].values # The last column is the dependent variable

Now that we've split the data into dependent and independent datasets we need to encode the categorical variables in the independent variables.

We'll use one-hot encoding on both the Geography (country) and Gender columns to encode the categorical data. Don't forget to remove one of the new columns from each one-hot encoded categorical variable to avoid the dummy variable trap!

In [5]:
columntransformer = ColumnTransformer([
    ('Country_Category', OneHotEncoder(drop='first'), [1]),
    ('Gender_Category', OneHotEncoder(drop='first'), [2])],
    remainder = 'passthrough')
X = np.array(columntransformer.fit_transform(X))

print(X)

[[0.0 0.0 0.0 ... 1 1 101348.88]
 [0.0 1.0 0.0 ... 0 1 112542.58]
 [0.0 0.0 0.0 ... 1 0 113931.57]
 ...
 [0.0 0.0 0.0 ... 0 1 42085.58]
 [1.0 0.0 1.0 ... 1 0 92888.52]
 [0.0 0.0 0.0 ... 1 0 38190.78]]
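
To see what dropping the first column does, here is a small NumPy sketch of one-hot encoding with the first category removed. The helper `one_hot_drop_first` and the four sample labels are hypothetical and only for illustration; the notebook itself relies on `OneHotEncoder(drop='first')`.

```python
import numpy as np

def one_hot_drop_first(values):
    """One-hot encode a 1-D array of labels, dropping the first category."""
    categories = np.unique(values)  # sorted unique labels
    full = (values[:, None] == categories[None, :]).astype(float)
    return categories, full[:, 1:]  # drop the column for the first category

# Made-up sample of the Geography column
geography = np.array(['France', 'Spain', 'France', 'Germany'])
categories, encoded = one_hot_drop_first(geography)

print(categories)  # ['France' 'Germany' 'Spain']
print(encoded)     # columns are Germany and Spain; France is all zeros
```

Because France is represented by zeros in both remaining columns, no information is lost, and the columns are no longer perfectly predictable from one another.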


Now it's time to split the data into test and training sets.

In [6]:
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size = 0.2, random_state = 42)

Finally, we need to scale the data to ease the computations we'll do on it and to prevent bias from columns that tend to have larger values and variations.

In [7]:
sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)
X_test = sc_X.transform(X_test)

print(X_train)
print()
print(X_test)

[[-0.57946723 -0.57638802  0.91324755 ...  0.64920267  0.97481699
   1.36766974]
 [ 1.72572313 -0.57638802  0.91324755 ...  0.64920267  0.97481699
   1.6612541 ]
 [-0.57946723  1.73494238  0.91324755 ...  0.64920267 -1.02583358
  -0.25280688]
 ...
 [-0.57946723 -0.57638802 -1.09499335 ... -1.54035103 -1.02583358
  -0.1427649 ]
 [-0.57946723 -0.57638802  0.91324755 ...  0.64920267 -1.02583358
  -0.05082558]
 [ 1.72572313 -0.57638802  0.91324755 ...  0.64920267  0.97481699
  -0.81456811]]

[[ 1.72572313 -0.57638802  0.91324755 ... -1.54035103 -1.02583358
  -1.01960511]
 [-0.57946723 -0.57638802  0.91324755 ...  0.64920267  0.97481699
   0.79888291]
 [-0.57946723  1.73494238 -1.09499335 ...  0.64920267 -1.02583358
  -0.72797953]
 ...
 [-0.57946723 -0.57638802 -1.09499335 ...  0.64920267 -1.02583358
  -1.16591585]
 [-0.57946723 -0.57638802  0.91324755 ...  0.64920267 -1.02583358
  -0.41163463]
 [ 1.72572313 -0.57638802  0.91324755 ...  0.64920267  0.97481699
   0.12593183]]
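
The scaling step can be reproduced by hand, which also shows why the scaler is fitted on the training set only. The sketch below uses two tiny made-up arrays; the mean and standard deviation come from the training rows and are then reused, unchanged, on the test row.

```python
import numpy as np

# Made-up data: two features on very different scales
train = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
test = np.array([[2.0, 400.0]])

mean = train.mean(axis=0)  # per-column mean of the training data
std = train.std(axis=0)    # population standard deviation, as StandardScaler uses

train_scaled = (train - mean) / std
test_scaled = (test - mean) / std  # reuse the training statistics

print(train_scaled)
print(test_scaled)
```

After scaling, every training column has mean 0 and standard deviation 1, so no single feature dominates simply because its raw numbers are larger.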


Now our dataset is ready for use and we can put together the structure of our ANN!

## Step 3. Design the ANN

In [23]:
classifier = Sequential()
classifier.add(Dense(6, kernel_initializer = 'uniform', activation = 'relu', input_shape=(11,))) # First Hidden layer. We'll just default to the average of the input size (11) and output size (1)
classifier.add(Dense(6, kernel_initializer = 'uniform', activation = 'relu')) # Second Hidden Layer
classifier.add(Dense(1, kernel_initializer = 'uniform', activation = 'sigmoid')) # Final output layer. If your output has multiple categories (instead of binary), you'll need to use the softmax activation function and have an output neuron for each category
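
To get a feel for how small this network is, the sketch below counts its trainable parameters. It assumes each `Dense` layer keeps its default bias term (one bias per neuron); the helper `dense_parameter_count` is a hypothetical name used only here.

```python
def dense_parameter_count(n_inputs, n_units):
    """Weights (one per input per unit) plus one bias per unit."""
    return n_inputs * n_units + n_units

layer_sizes = [11, 6, 6, 1]  # inputs, two hidden layers, output
per_layer = [dense_parameter_count(n_in, n_out)
             for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
total = sum(per_layer)

print(per_layer, total)  # [72, 42, 7] 121
```

With only 121 parameters and 8,000 training rows, the network has limited capacity to memorize the training data.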

In [24]:
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
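
The `binary_crossentropy` loss measures how far the predicted probabilities are from the true 0/1 labels. Here is a minimal NumPy sketch of the formula; the labels and probabilities are made up, and the `eps` clipping value is an assumption added to avoid taking the log of zero.

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all samples."""
    y_prob = np.clip(y_prob, eps, 1 - eps)
    losses = -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    return float(np.mean(losses))

y_true = np.array([1.0, 0.0, 1.0, 0.0])
mostly_right = np.array([0.9, 0.1, 0.8, 0.2])  # probabilities close to the labels
mostly_wrong = np.array([0.1, 0.9, 0.2, 0.8])  # probabilities far from the labels

loss_right = binary_crossentropy(y_true, mostly_right)
loss_wrong = binary_crossentropy(y_true, mostly_wrong)
print(loss_right, loss_wrong)
```

Confident wrong predictions are penalized far more heavily than confident correct ones are rewarded, which is what pushes the sigmoid output toward the true label during training.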

## Step 4. Train the ANN

In [25]:
classifier.fit(x=X_train, y=y_train, batch_size = 10, epochs = 100)
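
Since the weights are updated once per batch rather than once per sample, the `batch_size` and `epochs` arguments determine how many updates the network receives in total. A quick back-of-the-envelope sketch, using the 8,000 training samples reported in the output:

```python
import math

n_samples = 8000   # size of the training set
batch_size = 10
epochs = 100

updates_per_epoch = math.ceil(n_samples / batch_size)
total_updates = updates_per_epoch * epochs

print(updates_per_epoch, total_updates)  # 800 80000
```

A larger batch size would mean fewer, smoother updates per epoch; a smaller one would mean more, noisier updates.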

Train on 8000 samples
Epoch 1/100
Epoch 2/100
...
Epoch 76/100

<tensorflow.python.keras.callbacks.History at 0x7f0e981b2d90>

## Step 5. Evaluate the ANN

In [27]:
y_pred = (classifier.predict(X_test) > 0.5)
print(y_pred)

[[False]
 [False]
 [False]
 ...
 [ True]
 [False]
 [False]]


In [28]:
cm = confusion_matrix(y_test, y_pred)
print(cm)

[[1533   74]
 [ 216  177]]


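Here we can finally see how the network performs. Its accuracy on the training set ends up around 86%, and the confusion matrix gives an accuracy of (1533 + 177) / 2000 = 85.5% on the test set. Because the two figures are so close, the model does not appear to be overfit. Note, however, that accuracy is not the same as catching churners: of the 393 customers in the test set who actually left the bank, the model correctly flags only 177 (about 45%), so more than half of the leavers are missed.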
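
The summary metrics can be derived directly from the confusion matrix printed above. The sketch below hard-codes that matrix and relies on scikit-learn's layout for binary labels, where rows are the true classes and columns are the predicted classes.

```python
import numpy as np

# Confusion matrix from the cell above: rows = actual, columns = predicted
cm = np.array([[1533, 74],
               [216, 177]])
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()  # share of all predictions that are correct
precision = tp / (tp + fp)       # share of flagged customers who really left
recall = tp / (tp + fn)          # share of leavers that the model caught

print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
```

Because only about one customer in five leaves, accuracy alone can look healthy while recall on the leavers stays low, so it is worth checking all three numbers.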

The bank could now use this ANN to identify the customers who are most likely to leave, dig into those customers to look for patterns that explain why, and decide what it could do to retain them. Very valuable insights indeed!