# Is my client going to exit my bank?

Exercise:
In the CSV file called *Churn_Modelling.csv* you will find information about the bank's customers, such as:
- Geography
- Gender
- Age
- Tenure
- ...

Your job as a Data Scientist at the bank is to build a model that detects whether a customer will leave the bank.
For that, the CSV file has a column named `Exited`, which tells you whether the customer has left or not.

As it is trendy nowadays, your boss has told you to use Deep Learning.

Please remember the **Universal Workflow of ML** when tackling this problem.

## Part 1 - Data Preprocessing
Make use of the knowledge you already have to build:
- X_train, X_test, X_val
- y_train, y_test, y_val

Remember! You have categorical data!

### Importing libraries

In [1]:
# Importing the libraries




### Importing the dataset

In [2]:
# Importing the dataset
import pandas as pd

df = pd.read_csv("Churn_Modelling.csv")
df

Unnamed: 0,RowNumber,CustomerId,Surname,CreditScore,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited
0,1,15634602,Hargrave,619,France,Female,42,2,0.00,1,1,1,101348.88,1
1,2,15647311,Hill,608,Spain,Female,41,1,83807.86,1,0,1,112542.58,0
2,3,15619304,Onio,502,France,Female,42,8,159660.80,3,1,0,113931.57,1
3,4,15701354,Boni,699,France,Female,39,1,0.00,2,0,0,93826.63,0
4,5,15737888,Mitchell,850,Spain,Female,43,2,125510.82,1,1,1,79084.10,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9995,9996,15606229,Obijiaku,771,France,Male,39,5,0.00,2,1,0,96270.64,0
9996,9997,15569892,Johnstone,516,France,Male,35,10,57369.61,1,1,1,101699.77,0
9997,9998,15584532,Liu,709,France,Female,36,7,0.00,1,0,1,42085.58,1
9998,9999,15682355,Sabbatini,772,Germany,Male,42,3,75075.31,2,1,0,92888.52,1


In [3]:
'''
We will delete the following columns, which are not relevant for our future model:
- RowNumber: just a row counter
- CustomerId: unique number for each customer
- Surname: not relevant
'''

df = df.drop(['RowNumber', 'CustomerId', 'Surname'], axis=1)
df

Unnamed: 0,CreditScore,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited
0,619,France,Female,42,2,0.00,1,1,1,101348.88,1
1,608,Spain,Female,41,1,83807.86,1,0,1,112542.58,0
2,502,France,Female,42,8,159660.80,3,1,0,113931.57,1
3,699,France,Female,39,1,0.00,2,0,0,93826.63,0
4,850,Spain,Female,43,2,125510.82,1,1,1,79084.10,0
...,...,...,...,...,...,...,...,...,...,...,...
9995,771,France,Male,39,5,0.00,2,1,0,96270.64,0
9996,516,France,Male,35,10,57369.61,1,1,1,101699.77,0
9997,709,France,Female,36,7,0.00,1,0,1,42085.58,1
9998,772,Germany,Male,42,3,75075.31,2,1,0,92888.52,1


### Handling Categorical values
Which columns are categorical? How do we handle them? (Did someone say one-hot encoding?)

In [4]:
#Handle Categorical values

from sklearn.preprocessing import OneHotEncoder

# Numeric features, used as they are (each column listed exactly once)
num_cols = [
    "CreditScore",
    "Age",
    "Tenure",
    "Balance",
    "NumOfProducts",
    "HasCrCard",
    "IsActiveMember",
    "EstimatedSalary"
]

# Text-valued features that need one-hot encoding
cat_cols = [
    "Geography",
    "Gender"
]

df_num = df[num_cols]
df_cat = df[cat_cols]
df_target = df['Exited']

# One output column per category: 3 for Geography, 2 for Gender
enc = OneHotEncoder(handle_unknown='ignore').fit(df_cat)
df_cat_encoded = pd.DataFrame(enc.transform(df_cat).toarray())

df_cat_encoded.head()

Unnamed: 0,0,1,2,3,4
0,1.0,0.0,0.0,1.0,0.0
1,0.0,0.0,1.0,1.0,0.0
2,1.0,0.0,0.0,1.0,0.0
3,1.0,0.0,0.0,1.0,0.0
4,0.0,0.0,1.0,1.0,0.0


In [5]:
# Numeric features first, then the one-hot columns, then the target as the last column
df_final = pd.DataFrame(df_num)
df_final = df_final.join(df_cat_encoded)
df_final = df_final.join(df_target)
df_final

Unnamed: 0,CreditScore,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,0,1,2,3,4,Exited
0,619,42,2,0.00,1,1,1,101348.88,1.0,0.0,0.0,1.0,0.0,1
1,608,41,1,83807.86,1,0,1,112542.58,0.0,0.0,1.0,1.0,0.0,0
2,502,42,8,159660.80,3,1,0,113931.57,1.0,0.0,0.0,1.0,0.0,1
3,699,39,1,0.00,2,0,0,93826.63,1.0,0.0,0.0,1.0,0.0,0
4,850,43,2,125510.82,1,1,1,79084.10,0.0,0.0,1.0,1.0,0.0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9995,771,39,5,0.00,2,1,0,96270.64,1.0,0.0,0.0,0.0,1.0,0
9996,516,35,10,57369.61,1,1,1,101699.77,1.0,0.0,0.0,0.0,1.0,0
9997,709,36,7,0.00,1,0,1,42085.58,1.0,0.0,0.0,1.0,0.0,1
9998,772,42,3,75075.31,2,1,0,92888.52,0.0,1.0,0.0,0.0,1.0,1


### Split the data
Please use `random_state=42` so that all the solutions can be compared.

In [7]:
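# Splitting the dataset into the Training set and Test set
# (no separate validation set is created in this cell)
# please use random_state=42

from sklearn.model_selection import train_test_split

# Features are every column except the last one; the target (Exited) is the last column
X = df_final.iloc[:,:-1]
y = df_final.iloc[:,-1]

# The one-hot columns have integer names; recent scikit-learn versions reject
# mixed str/int column names, so cast all of them to strings
X.columns = X.columns.astype(str)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)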
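
The exercise also asks for `X_val` and `y_val`. One common way to get them is to call `train_test_split` twice: first to hold out the test set, then to carve a validation set out of what remains. The sketch below uses synthetic data, and the 60/20/20 proportions, the `stratify` argument and the `_demo` names are assumptions made for illustration, not part of the exercise.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for X and y (1000 rows, 13 features, roughly 20% positives).
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(1000, 13))
y_demo = (rng.random(1000) < 0.2).astype(int)

# Step 1: hold out 20% of the rows as the test set.
X_rest, X_test_demo, y_rest, y_test_demo = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# Step 2: take 25% of the remainder (20% of the total) as the validation set.
X_train_demo, X_val_demo, y_train_demo, y_val_demo = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=42, stratify=y_rest
)

print(X_train_demo.shape, X_val_demo.shape, X_test_demo.shape)
# (600, 13) (200, 13) (200, 13)
```

Stratifying keeps the share of exited customers similar in each subset, which matters when one class is much rarer than the other.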

### Remember, we should always scale the inputs for a Neural network

In [None]:
# Feature Scaling
from sklearn.preprocessing import StandardScaler

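# Fit the scaler on the training data only, so no test-set statistics leak into training
scaler = StandardScaler().fit(X_train)

# Apply the same training mean and standard deviation to both sets
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)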
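
The cell above standardizes every column, including the one-hot ones. An alternative worth knowing is scikit-learn's `ColumnTransformer`, which scales the numeric columns and encodes the categorical ones in a single fitted object. The following is only a sketch on a tiny made-up frame; the reduced column set and the names `train_toy`, `test_toy` and `preprocess` are assumptions for illustration.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy training and test rows; the values are illustrative only.
train_toy = pd.DataFrame({
    "Age": [42, 41, 39, 43],
    "Balance": [0.0, 83807.86, 0.0, 125510.82],
    "Geography": ["France", "Spain", "France", "Germany"],
})
test_toy = pd.DataFrame({
    "Age": [35],
    "Balance": [57369.61],
    "Geography": ["France"],
})

preprocess = ColumnTransformer(
    [
        ("num", StandardScaler(), ["Age", "Balance"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["Geography"]),
    ],
    sparse_threshold=0,  # always return a dense NumPy array
)

# Fit on the training rows only, then reuse the fitted statistics on the test rows.
train_arr = preprocess.fit_transform(train_toy)
test_arr = preprocess.transform(test_toy)
print(train_arr.shape, test_arr.shape)
```

Because the transformer is fitted once on the training data, the same object can later be applied to a validation set or to a single new customer without refitting.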

We have preprocessed the data! Now, let's go to the second part!

## Part 2 - Let's build our ANN

Always keep the Universal Workflow of ML in mind.
You can use the whole workflow, or only a part of it:


*   For example, step 8 (hyperparameter optimization) is not mandatory. If you think your model is good enough, just skip it.
*   Likewise, you don't have to use all the regularization techniques in step 7. Choose the ones you think are the best, and try them.
*   **But I really think that steps 1 through 6 and step 9 are always mandatory.**

Evaluate your model on the test set! How accurate is it? What does the confusion matrix look like?
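
As a minimal sketch of that evaluation, the sigmoid outputs can be thresholded at 0.5 and passed to scikit-learn's `accuracy_score` and `confusion_matrix`. The labels and probabilities below are hypothetical; in the notebook they would come from `y_test` and `nn.predict(X_test).ravel()`.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical true labels and predicted probabilities (illustrative only).
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 0])
y_prob = np.array([0.10, 0.40, 0.80, 0.30, 0.60, 0.90, 0.20, 0.05])

# Turn probabilities into class labels with a 0.5 threshold.
y_pred = (y_prob >= 0.5).astype(int)

acc = accuracy_score(y_true, y_pred)
cm = confusion_matrix(y_true, y_pred)  # rows: true class, columns: predicted class

print(acc)  # 0.75
print(cm)   # [[4 1]
            #  [1 2]]
```

In the matrix, the top-right cell counts customers wrongly flagged as leaving and the bottom-left cell counts leavers the model missed. For churn, the second kind of error is often the more costly one, so accuracy alone can be misleading.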




In [17]:
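# Part 2 - Model building
from tensorflow.keras import models
from tensorflow.keras import layers
from tensorflow.keras.utils import plot_model
from tensorflow.keras.callbacks import EarlyStopping

# Two hidden layers of 64 units; a single sigmoid unit outputs the probability of exiting
nn = models.Sequential([
    layers.Dense(64, activation='relu', input_shape=(X_train.shape[1],)),
    layers.Dense(64, activation='relu'),
    layers.Dense(1, activation='sigmoid')
])

# mean_squared_error works with a sigmoid output, although binary_crossentropy
# is the more common loss for binary classification
nn.compile(optimizer='rmsprop', loss='mean_squared_error', metrics=['binary_accuracy'])

# Stop training once val_loss has not improved for 10 consecutive epochs
EarlyStopping_CallBack = [EarlyStopping(monitor='val_loss', min_delta=0, patience=10, verbose=0, mode='auto', baseline=None)]

# Caution: the test set doubles as validation data here, so early stopping sees the test data;
# a separate validation split would keep the test set untouched
nn.fit(X_train, y_train, callbacks=EarlyStopping_CallBack, epochs=500, batch_size=512, validation_data=(X_test, y_test))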

Train on 6700 samples, validate on 3300 samples
Epoch 1/500
Epoch 2/500
Epoch 3/500
Epoch 4/500
Epoch 5/500
Epoch 6/500
Epoch 7/500
Epoch 8/500
Epoch 9/500
Epoch 10/500
Epoch 11/500


<tensorflow.python.keras.callbacks.History at 0x23f5fec0908>

## Part 3 - Make a prediction

Could you please tell me whether this person will leave the bank or not?

- CreditScore = 600
- Geography = France
- Gender = Male
- Age = 40
- Tenure = 3
- Balance = 60000
- NumOfProducts = 2
- HasCrCard = 1
- IsActiveMember = 1
- EstimatedSalary = 50000
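
Before the model can score this customer, the values have to be arranged exactly like a training row: numeric features first, then the one-hot block for Geography and Gender. The sketch below builds such a row, assuming `CreditScore` appears once among the numeric columns (13 features in total). The encoder `enc_demo` is a stand-in fitted on the three countries and two genders seen in the data; in the notebook the already fitted `enc` should be reused instead.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

num_cols = ["CreditScore", "Age", "Tenure", "Balance", "NumOfProducts",
            "HasCrCard", "IsActiveMember", "EstimatedSalary"]
cat_cols = ["Geography", "Gender"]

# Stand-in encoder covering the known categories (illustrative only).
known = pd.DataFrame({"Geography": ["France", "Germany", "Spain"],
                      "Gender": ["Female", "Male", "Female"]})
enc_demo = OneHotEncoder(handle_unknown="ignore").fit(known)

# The customer described in the exercise.
customer = pd.DataFrame([{
    "CreditScore": 600, "Geography": "France", "Gender": "Male", "Age": 40,
    "Tenure": 3, "Balance": 60000, "NumOfProducts": 2, "HasCrCard": 1,
    "IsActiveMember": 1, "EstimatedSalary": 50000,
}])

# Same layout as the training features: numeric columns, then the one-hot block.
row = np.hstack([
    customer[num_cols].to_numpy(dtype=float),
    enc_demo.transform(customer[cat_cols]).toarray(),
])
print(row.shape)  # (1, 13)
```

In the notebook, the row would then go through the fitted scaler and the network, for example `nn.predict(scaler.transform(row))`. An output above 0.5 would be read as "this customer is likely to leave".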