# Churn Modelling Prediction

## Problem Statement :- 
A bank records the variables listed below for each customer and wants to analyse customer behaviour to see whether a customer leaves the bank or stays. The goal is to build a predictive model from this dataset that predicts, for any new customer, whether he or she will stay or leave, so that the bank can make a special offer to the customers the model flags as likely to leave.

### Exploring the Dataset :-
Independent Variables are:-
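1. Customer id
2. Surname
3. Credit score
4. Geography
5. Gender
6. Age
7. Tenure
8. Balance
9. NumOfProducts
10. HasCrCard
11. IsActiveMember
12. EstimatedSalary

Customer id and Surname only identify a customer, so they are left out when the feature matrix is built.

Dependent Variable is:-
1. Exited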

In [39]:
import warnings
warnings.filterwarnings('ignore')

In [40]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

In [41]:
dataset = pd.read_csv('Churn_Modelling.csv')

In [42]:
dataset

Unnamed: 0,RowNumber,CustomerId,Surname,CreditScore,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited
0,1,15634602,Hargrave,619,France,Female,42,2,0.00,1,1,1,101348.88,1
1,2,15647311,Hill,608,Spain,Female,41,1,83807.86,1,0,1,112542.58,0
2,3,15619304,Onio,502,France,Female,42,8,159660.80,3,1,0,113931.57,1
3,4,15701354,Boni,699,France,Female,39,1,0.00,2,0,0,93826.63,0
4,5,15737888,Mitchell,850,Spain,Female,43,2,125510.82,1,1,1,79084.10,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9995,9996,15606229,Obijiaku,771,France,Male,39,5,0.00,2,1,0,96270.64,0
9996,9997,15569892,Johnstone,516,France,Male,35,10,57369.61,1,1,1,101699.77,0
9997,9998,15584532,Liu,709,France,Female,36,7,0.00,1,0,1,42085.58,1
9998,9999,15682355,Sabbatini,772,Germany,Male,42,3,75075.31,2,1,0,92888.52,1


In [43]:
# Split the dataset into independent variables (X) and the dependent variable (Y)
X = dataset.iloc[:, 3:13].values   # columns 3 to 12: CreditScore ... EstimatedSalary
Y = dataset.iloc[:, 13].values     # column 13: Exited

In [44]:
print(X)

[[619 'France' 'Female' ... 1 1 101348.88]
 [608 'Spain' 'Female' ... 0 1 112542.58]
 [502 'France' 'Female' ... 1 0 113931.57]
 ...
 [709 'France' 'Female' ... 0 1 42085.58]
 [772 'Germany' 'Male' ... 1 0 92888.52]
 [792 'France' 'Female' ... 1 0 38190.78]]


In [45]:
print(Y)

[1 0 1 ... 1 1 0]
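
The positional slices are easy to misread, so here is a minimal sketch that checks which column names they select. It assumes the column order from the dataset header shown earlier and uses only the first row as a stand-in for the full file; `sample`, `feature_names` and `target_name` are illustrative names, not part of the notebook.

```python
import pandas as pd

# Column names copied from the dataset header shown earlier.
columns = ["RowNumber", "CustomerId", "Surname", "CreditScore", "Geography",
           "Gender", "Age", "Tenure", "Balance", "NumOfProducts", "HasCrCard",
           "IsActiveMember", "EstimatedSalary", "Exited"]

# First row of the dataset, used here only as a stand-in for the full file.
sample = pd.DataFrame(
    [[1, 15634602, "Hargrave", 619, "France", "Female", 42, 2, 0.00, 1, 1, 1,
      101348.88, 1]],
    columns=columns,
)

feature_names = list(sample.columns[3:13])   # same slice as iloc[:, 3:13]
target_name = sample.columns[13]             # same position as iloc[:, 13]

print(feature_names)
print(target_name)
```

The slice keeps the ten columns from CreditScore to EstimatedSalary and skips the three identifier columns in front of them.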


### Label and One-Hot Encoding
Encoding means converting categorical values (for example the Geography and Gender columns in the dataset above) into numerical values so that the algorithm can work with them.
For instance, in the 'Gender' column we can assign 0 to Female and 1 to Male; this is known as label encoding.

For the 'Geography' column we use one-hot encoding instead of label encoding. In one-hot encoding each label is represented by an array of 0s and 1s rather than by a single integer, so no artificial ordering between the countries is introduced.

With the help of the sklearn library we can encode these values ourselves.

In [46]:
from sklearn.preprocessing import LabelEncoder,OneHotEncoder

In [47]:
labelencoder_X_2 = LabelEncoder()

In [48]:
X[:,2]=labelencoder_X_2.fit_transform(X[:,2])

In [49]:
print(X)

[[619 'France' 0 ... 1 1 101348.88]
 [608 'Spain' 0 ... 0 1 112542.58]
 [502 'France' 0 ... 1 0 113931.57]
 ...
 [709 'France' 0 ... 0 1 42085.58]
 [772 'Germany' 1 ... 1 0 92888.52]
 [792 'France' 0 ... 1 0 38190.78]]


After this, one-hot encoding is used to convert the values of Geography into arrays of integers.

In [50]:
from sklearn.compose import ColumnTransformer 

In [51]:
ct=ColumnTransformer([('ohe',OneHotEncoder(),[1])],remainder='passthrough')

In [52]:
# np.str was removed in NumPy 1.24; the built-in str gives the same result
X=np.array(ct.fit_transform(X),dtype=str)

In [53]:
# Drop the first dummy column of Geography to avoid the dummy variable trap
X=X[:,1:]

In [54]:
print(X)

[['0.0' '0.0' '619' ... '1' '1' '101348.88']
 ['0.0' '1.0' '608' ... '0' '1' '112542.58']
 ['0.0' '0.0' '502' ... '1' '0' '113931.57']
 ...
 ['0.0' '0.0' '709' ... '0' '1' '42085.58']
 ['1.0' '0.0' '772' ... '1' '0' '92888.52']
 ['0.0' '0.0' '792' ... '1' '0' '38190.78']]
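
To see both encodings in isolation, here is a small sketch on made-up values rather than the real dataset. The arrays `gender` and `geography` are toy inputs. Calling `.toarray()` assumes the encoder returns a sparse matrix, which is the scikit-learn default.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

gender = np.array(["Female", "Male", "Female"])
geography = np.array([["France"], ["Spain"], ["Germany"]])

# Label encoding: classes are sorted alphabetically, so Female -> 0, Male -> 1.
gender_codes = LabelEncoder().fit_transform(gender)

# One-hot encoding: one column per category, in the order of
# encoder.categories_ (France, Germany, Spain).
encoder = OneHotEncoder()
geo_onehot = encoder.fit_transform(geography).toarray()

# Dropping the first column leaves two columns that still identify all
# three countries: France is the row with both entries equal to 0.
geo_reduced = geo_onehot[:, 1:]

print(gender_codes)
print(geo_onehot)
print(geo_reduced)
```

With the first column dropped, Spain becomes `[0, 1]` and Germany `[1, 0]`, which matches the first two columns of the encoded matrix printed above.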


### Splitting data into Training and Test set

In [55]:
from sklearn.model_selection import train_test_split

In [56]:
X_train,X_test,Y_train,Y_test = train_test_split(X, Y, test_size = 0.2, random_state = 0)

### Perform Feature Scaling :-
Feature scaling is a technique used to standardize the independent features in the data to a fixed range. As we can see, the attributes in this dataset have very different ranges: Balance and EstimatedSalary can exceed 100,000, while HasCrCard is either 0 or 1. Features with such large ranges dominate the computation and slow down training, so we perform feature scaling to overcome this problem. It is good practice to always perform feature scaling in deep learning, no matter what range your values are in.

In [57]:
from sklearn.preprocessing import StandardScaler

In [58]:
sc = StandardScaler()

You might notice in the lines below that we use fit_transform on the training set and only transform on the test set. We do so because we learn the scaling parameters on the training data and, at the same time, scale the training data. For the test data, we reuse the scaling parameters learned on the training data, so there is no need to call fit_transform on the test data.

In [59]:
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
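
The fit/transform split can be checked on a toy column. This is a minimal sketch with made-up numbers: `train` and `test` are illustrative arrays, and the manual computation assumes the standardization formula z = (x - mean) / std with the mean and standard deviation taken from the training data only.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

train = np.array([[1.0], [2.0], [3.0]])   # toy training column
test = np.array([[4.0]])                  # toy test column

scaler = StandardScaler()
train_scaled = scaler.fit_transform(train)  # learns mean and std, then scales
test_scaled = scaler.transform(test)        # reuses the training mean and std

# Same result by hand: z = (x - mean) / std, with statistics from train only.
mean, std = train.mean(), train.std()
manual_test_scaled = (test - mean) / std

print(test_scaled, manual_test_scaled)
```

Calling `fit_transform` on the test set instead would recompute the mean and standard deviation from the test rows, so the two sets would no longer be on the same scale.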

In [60]:
print(X_train)

[[-0.5698444   1.74309049  0.16958176 ...  0.64259497 -1.03227043
   1.10643166]
 [ 1.75486502 -0.57369368 -2.30455945 ...  0.64259497  0.9687384
  -0.74866447]
 [-0.5698444  -0.57369368 -1.19119591 ...  0.64259497 -1.03227043
   1.48533467]
 ...
 [-0.5698444  -0.57369368  0.9015152  ...  0.64259497 -1.03227043
   1.41231994]
 [-0.5698444   1.74309049 -0.62420521 ...  0.64259497  0.9687384
   0.84432121]
 [ 1.75486502 -0.57369368 -0.28401079 ...  0.64259497 -1.03227043
   0.32472465]]


In the output of the line below, we can see that all values have been scaled to a comparable range.

In [61]:
# Show the data with its row and column numbers.
pd.DataFrame(X_train)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10
0,-0.569844,1.743090,0.169582,-1.091687,-0.464608,0.006661,-1.215717,0.809503,0.642595,-1.032270,1.106432
1,1.754865,-0.573694,-2.304559,0.916013,0.301026,-1.377440,-0.006312,-0.921591,0.642595,0.968738,-0.748664
2,-0.569844,-0.573694,-1.191196,-1.091687,-0.943129,-1.031415,0.579935,-0.921591,0.642595,-1.032270,1.485335
3,-0.569844,1.743090,0.035566,0.916013,0.109617,0.006661,0.473128,-0.921591,0.642595,-1.032270,1.276528
4,-0.569844,1.743090,2.056114,-1.091687,1.736588,1.044737,0.810193,0.809503,0.642595,0.968738,0.558378
...,...,...,...,...,...,...,...,...,...,...,...
7995,1.754865,-0.573694,-0.582970,-1.091687,-0.656016,-0.339364,0.703104,0.809503,0.642595,0.968738,1.091330
7996,-0.569844,1.743090,1.478815,-1.091687,-1.613058,-0.339364,0.613060,-0.921591,0.642595,0.968738,0.131760
7997,-0.569844,-0.573694,0.901515,0.916013,-0.368904,0.006661,1.361474,0.809503,0.642595,-1.032270,1.412320
7998,-0.569844,1.743090,-0.624205,-1.091687,-0.081791,1.390762,-1.215717,0.809503,0.642595,0.968738,0.844321


In [62]:
#from tensorflow.keras.models import Sequential
from keras.models import Sequential 

In [63]:
from keras.layers import Dense

In [64]:
classifier = Sequential()

### Adding the input layer and the first hidden layer.
In the line below, we add the first components of the network to our model.
The add method takes a Dense object as input. Here, Dense creates a fully connected layer that links 11 input neurons to 6 output neurons. We are going to add more layers after this one, so these six neurons form the first hidden layer of the complete network.
The parameters for Dense include :-
1. units - The number of neurons in the layer being added, here 6.
2. kernel_initializer - The neural network needs to start with some weights and then iteratively updates them to better values. The kernel initializer is the statistical distribution or function used to initialise the weights; the library draws numbers from that distribution and uses them as starting weights. The value 'uniform' means that all weights are initially drawn from a uniform distribution.
3. activation - We use the Rectifier Activation Function (ReLU).
4. input_dim - The dimension of the input, in our case 11. Why 11? Because we have 11 independent variables after encoding, including the 2 columns for Geography. That's why input_dim = 11.

In [65]:
#Input Layer
classifier.add(Dense(units=6,kernel_initializer = 'uniform',activation = 'relu',input_dim = 11))
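
What this layer computes can be sketched in plain NumPy. This is an illustration, not the Keras implementation: the initial weight range of ±0.05 is an assumption about what 'uniform' draws from, the input row is random rather than a real customer, and `dense_relu` is a hypothetical helper name.

```python
import numpy as np

rng = np.random.default_rng(0)

n_inputs, n_units = 11, 6
# Assumed initial range; the exact bounds behind 'uniform' depend on Keras.
W = rng.uniform(-0.05, 0.05, size=(n_inputs, n_units))
b = np.zeros(n_units)

def dense_relu(x, W, b):
    """Fully connected layer followed by ReLU: max(0, xW + b)."""
    return np.maximum(0.0, x @ W + b)

x = rng.normal(size=(1, n_inputs))   # one scaled customer row (random stand-in)
hidden = dense_relu(x, W, b)

n_params = W.size + b.size           # 11 * 6 weights + 6 biases

print(hidden.shape, n_params)
```

The layer holds 11 × 6 weights plus 6 biases, 72 trainable parameters in total, and ReLU guarantees that none of the six outputs is negative.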

### Adding the next hidden layer

In [66]:
#Hidden Layer
classifier.add(Dense(units=6,kernel_initializer = 'uniform',activation='relu'))

### Adding the Output Layer
In the output layer we need one neuron. Why only one? Because our dependent variable is binary, so the prediction has to be zero or one.
The line below is similar to the previous one, but here the units parameter is 1 because one neuron is enough to predict the output, and the activation function is the sigmoid function because it provides the probability of the customer leaving the bank.

In [67]:
#Output Layer
classifier.add(Dense(units = 1,kernel_initializer = 'uniform',activation = 'sigmoid'))
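
As a quick illustration of why sigmoid suits a binary output, the sketch below applies the standard formula 1 / (1 + e^(-z)) to a few made-up pre-activation values. The array `z` is illustrative and does not come from the trained network.

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-4.0, 0.0, 4.0])   # made-up pre-activation values
probabilities = sigmoid(z)

print(probabilities)
```

Large negative inputs map close to 0, large positive inputs close to 1, and an input of 0 maps to exactly 0.5, so the output can be read as the probability that Exited is 1.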

Now we are done creating the structure of our first artificial neural network.
In the next step we have to compile this network and fit the data to it.


### Compiling the Artificial Neural Network   
compile is the method of the Keras model that ties the artificial neural network together by configuring it for training.
- The first parameter is the optimizer. Its value in the code is 'adam', a variant of the stochastic gradient descent optimisation technique.
- The second parameter is loss, which represents the loss function. This is the function the optimizer minimises when training a classification model that predicts the probability that a data point belongs to one class or the other. When you are doing a binary prediction like this one, use the loss function 'binary_crossentropy', because it computes the cross-entropy loss between the true labels and the predicted labels.
- The third parameter is metrics, which is used to evaluate our artificial neural network model. Here we use the accuracy metric, which calculates how often predictions equal labels.
This metric creates two local variables, total and count, that are used to compute the frequency with which y_pred matches y_true.

In [68]:
from sklearn import metrics


classifier.compile(optimizer = 'adam', loss  = 'binary_crossentropy',metrics = ['accuracy'])
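
To make the loss and the metric concrete, here is a NumPy sketch of both on four made-up predictions. It is a simplified stand-in for what Keras computes, not its actual code: the helper names, the clipping constant `eps` and the 0.5 threshold are assumptions chosen for the example.

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)], clipped to avoid log(0)."""
    y_prob = np.clip(y_prob, eps, 1 - eps)
    losses = -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    return float(np.mean(losses))

def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of thresholded predictions that equal the true labels."""
    return float(np.mean((y_prob > threshold) == (y_true == 1)))

y_true = np.array([1, 0, 1, 0])           # made-up labels
y_prob = np.array([0.9, 0.1, 0.4, 0.2])   # made-up predicted probabilities

loss = binary_crossentropy(y_true, y_prob)
acc = accuracy(y_true, y_prob)

print(loss, acc)
```

The third example is the only one on the wrong side of 0.5, so accuracy is 0.75. The loss also penalises how far each probability is from its label, which is why the optimizer minimises the loss while accuracy is only reported.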

### Fit the ANN model to the Training Set
The next objective is to fit our dataset to the model. To do this, we use the fit method with 4 arguments.
The first argument is the input data, in this case X_train.
The second argument is the target values for the output layer neuron, in this case Y_train.
The third argument is batch_size: instead of comparing the predictions with the real results one by one, it is better to do this in batches, so here the network processes 10 rows at a time before updating the weights.
The last argument is epochs, the number of times the algorithm passes over the whole training set. In other words, one epoch is one forward and one backward pass over all the training examples. The neural network is trained for a certain number of epochs to improve the accuracy over time. Running the line below shows the accuracy in each epoch.


In [69]:
classifier.fit(X_train, Y_train, batch_size = 10, epochs = 100)

Epoch 1/100
Epoch 2/100
...

<keras.callbacks.History at 0x29ee7c481c0>
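
The relationship between batch_size, epochs and the number of weight updates is simple arithmetic. The sketch below assumes 8,000 training rows, that is 80% of a 10,000-row dataset; the row count is inferred from the outputs in this notebook, not read from the file.

```python
import math

n_train = 8000      # assumed: 80% of 10,000 rows
batch_size = 10
epochs = 100

# One weight update per batch; the last batch may be smaller, hence ceil.
updates_per_epoch = math.ceil(n_train / batch_size)
total_updates = updates_per_epoch * epochs

print(updates_per_epoch, total_updates)
```

Under that assumption each epoch performs 800 weight updates, and the full run performs 80,000.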

### Prediction of Result
In the lines below, y_pred becomes True if the predicted probability is greater than 0.5; otherwise it becomes False.

In [70]:
y_pred = classifier.predict(X_test)
y_pred = (y_pred>0.5)



In [71]:
print(y_pred)

[[False]
 [False]
 [False]
 ...
 [False]
 [False]
 [False]]
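
The thresholding step can be tried on a few made-up probabilities. `probs` is an illustrative array with the same (n, 1) shape that predict returns for a single output neuron, not actual model output.

```python
import numpy as np

probs = np.array([[0.12], [0.50], [0.73]])   # made-up sigmoid outputs

labels = (probs > 0.5)                       # boolean array, shape (3, 1)
labels_flat = labels.ravel().astype(int)     # 1-D array of 0/1 labels

print(labels_flat)
```

Note that a probability of exactly 0.5 maps to False because the comparison is strict.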


In [72]:
from sklearn.metrics import confusion_matrix

cm = confusion_matrix(Y_test,y_pred)

In [73]:
print(cm)

[[1537   58]
 [ 252  153]]


In [74]:
from sklearn.metrics import accuracy_score

accuracy_score(Y_test,y_pred)

0.845
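
Accuracy alone hides how the errors are distributed, so the sketch below derives a few more figures from the confusion matrix printed above. It assumes the scikit-learn layout, with rows as true classes and columns as predicted classes. `baseline` is an illustrative name for the accuracy of always predicting that a customer stays.

```python
import numpy as np

# Confusion matrix copied from the output above.
cm = np.array([[1537, 58],
               [252, 153]])
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()
precision = tp / (tp + fp)        # of predicted leavers, how many left
recall = tp / (tp + fn)           # of actual leavers, how many were caught
baseline = (tn + fp) / cm.sum()   # accuracy of always predicting "stays"

print(accuracy, precision, recall, baseline)
```

The accuracy reproduces the 0.845 reported above. Precision is about 0.725 and recall about 0.378, so the model misses most of the customers who actually leave, and always predicting "stays" would already score 0.7975.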