This case study uses an open-source dataset from Kaggle. Link to the Kaggle project site:

https://www.kaggle.com/barelydedicated/bank-customer-churn-modeling

Given a bank customer, can we build a neural-network classifier that predicts whether they will leave?

Data file - bank.csv

The points distribution for this case is as follows:

Read the dataset and drop the columns that are unique to each user, such as IDs (2.5 points)

Distinguish the feature and target set (2.5 points) 

Divide the data set into train and test sets, then normalize the train and test data (2.5 points)

Initialize & build the model (10 points) 

Optimize the model (5 points) 

Predict the results using 0.5 as a threshold (5 points)

Print the Accuracy score and confusion matrix (2.5 points)



In [0]:
import tensorflow as tf
tf.enable_eager_execution()

In [31]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [0]:
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

In [33]:
tf.__version__

'1.15.0-rc3'

In [0]:
import pandas as pd

In [0]:
data = pd.read_csv('/content/drive/My Drive/Data/bank.csv')

In [0]:
data.drop(columns=['CustomerId', 'RowNumber', 'Surname'], inplace=True)

In [198]:
data.shape

(10000, 11)
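
The ID-like columns were dropped by name above. As a minimal sketch of how such columns could be spotted, the check below flags every column that holds a distinct value in each row. The `toy` frame and its values are made up for illustration and are not taken from `bank.csv`.

```python
import pandas as pd

# Hypothetical toy frame with the same kinds of columns as bank.csv.
toy = pd.DataFrame({
    "RowNumber": [1, 2, 3, 4],
    "CustomerId": [101, 102, 103, 104],
    "Surname": ["A", "B", "A", "C"],
    "Geography": ["France", "Spain", "France", "France"],
    "Exited": [1, 0, 1, 0],
})

# A column with as many distinct values as rows identifies the row
# rather than describing the customer.
id_like = [col for col in toy.columns if toy[col].nunique() == len(toy)]
print(id_like)
```

In this toy frame `Surname` repeats, so the check does not flag it. Dropping a column like that remains a judgment call about whether it carries any signal.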

In [199]:
data.head()

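| | CreditScore | Geography | Gender | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary | Exited |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 619 | France | Female | 42 | 2 | 0.0 | 1 | 1 | 1 | 101348.88 | 1 |
| 1 | 608 | Spain | Female | 41 | 1 | 83807.86 | 1 | 0 | 1 | 112542.58 | 0 |
| 2 | 502 | France | Female | 42 | 8 | 159660.8 | 3 | 1 | 0 | 113931.57 | 1 |
| 3 | 699 | France | Female | 39 | 1 | 0.0 | 2 | 0 | 0 | 93826.63 | 0 |
| 4 | 850 | Spain | Female | 43 | 2 | 125510.82 | 1 | 1 | 1 | 79084.1 | 0 |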


**Distinguish the feature and target set!**

In [200]:
from sklearn.model_selection import train_test_split

X = data.drop(['Exited'], axis=1)
y = data['Exited']

print(X.shape)
print(X.head())

(10000, 10)
   CreditScore Geography  Gender  ...  HasCrCard  IsActiveMember  EstimatedSalary
0          619    France  Female  ...          1               1        101348.88
1          608     Spain  Female  ...          0               1        112542.58
2          502    France  Female  ...          1               0        113931.57
3          699    France  Female  ...          0               0         93826.63
4          850     Spain  Female  ...          1               1         79084.10

[5 rows x 10 columns]


In [168]:
# Same target as data['Exited'] above, selected by position (column 10)
y = data.iloc[:, 10]
print(y.shape)
print(y.head())

(10000,)
0    1
1    0
2    1
3    0
4    0
Name: Exited, dtype: int64


**Divide the data set into train and test sets and normalize the train and test data**

In [220]:
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, random_state=1)

print('X train', X_train.shape)
print('X test', X_test.shape)
print('y train', y_train.shape)
print('y test', y_test.shape)

X train (7000, 10)
X test (3000, 10)
y train (7000,)
y test (3000,)


In [221]:
X_train = pd.get_dummies(X_train)
X_train.shape
X_train.head()

(7000, 13)

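| | CreditScore | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary | Geography_France | Geography_Germany | Geography_Spain | Gender_Female | Gender_Male |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2228 | 644 | 37 | 8 | 0.0 | 2 | 1 | 0 | 20968.88 | 1 | 0 | 0 | 1 | 0 |
| 5910 | 481 | 39 | 6 | 0.0 | 1 | 1 | 1 | 24677.54 | 1 | 0 | 0 | 1 | 0 |
| 1950 | 680 | 37 | 10 | 123806.28 | 1 | 1 | 0 | 81776.84 | 1 | 0 | 0 | 1 | 0 |
| 2119 | 690 | 29 | 5 | 0.0 | 2 | 1 | 0 | 108577.97 | 1 | 0 | 0 | 0 | 1 |
| 5947 | 656 | 45 | 7 | 145933.27 | 1 | 1 | 1 | 199392.14 | 1 | 0 | 0 | 1 | 0 |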


In [0]:
X_test = pd.get_dummies(X_test)

In [223]:
X_test.shape
X_test.head()

(3000, 13)

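| | CreditScore | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary | Geography_France | Geography_Germany | Geography_Spain | Gender_Female | Gender_Male |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 9953 | 550 | 47 | 2 | 0.0 | 2 | 1 | 1 | 97057.28 | 1 | 0 | 0 | 0 | 1 |
| 3850 | 680 | 34 | 3 | 143292.95 | 1 | 1 | 0 | 66526.01 | 1 | 0 | 0 | 0 | 1 |
| 4962 | 531 | 42 | 2 | 0.0 | 2 | 0 | 1 | 90537.47 | 1 | 0 | 0 | 1 | 0 |
| 3886 | 710 | 34 | 8 | 147833.3 | 2 | 0 | 1 | 1561.58 | 0 | 1 | 0 | 0 | 1 |
| 5437 | 543 | 30 | 6 | 73481.05 | 1 | 1 | 1 | 176692.65 | 0 | 1 | 0 | 0 | 1 |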
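
Both encoded sets have the same 13 columns in this run. That is not guaranteed when `pd.get_dummies` is called on each split separately, because a category missing from one split produces no column for it. The sketch below shows one possible safeguard, reindexing the test columns to the training columns. The `train` and `test` frames are hypothetical toy data.

```python
import pandas as pd

# Hypothetical toy splits: the test split contains no Germany or Spain rows.
train = pd.DataFrame({"Geography": ["France", "Germany", "Spain"],
                      "Age": [37, 39, 45]})
test = pd.DataFrame({"Geography": ["France", "France"],
                     "Age": [47, 34]})

train_enc = pd.get_dummies(train)

# Reuse the training columns; categories unseen in the test split
# become all-zero columns, and the column order matches the training set.
test_enc = pd.get_dummies(test).reindex(columns=train_enc.columns, fill_value=0)

print(list(test_enc.columns))
```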


In [0]:
from sklearn.preprocessing import Normalizer

In [0]:
transformer = Normalizer()
X_train = transformer.fit_transform(X_train)
# Only transform the test set; fitting belongs to the training data
X_test = transformer.transform(X_test)


In [227]:
X_train.shape
X_train[0]

(7000, 13)

array([3.06976548e-02, 1.76368513e-03, 3.81337326e-04, 0.00000000e+00,
       9.53343316e-05, 4.76671658e-05, 0.00000000e+00, 9.99527079e-01,
       4.76671658e-05, 0.00000000e+00, 0.00000000e+00, 4.76671658e-05,
       0.00000000e+00])
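
In the array printed above, one entry is about 0.9995 and the rest are close to zero. `Normalizer` rescales each row to unit length, so the largest raw value in a row (here `EstimatedSalary`) dominates it. The NumPy sketch below contrasts that with per-feature standardization fitted on the training rows only. It uses three columns (`CreditScore`, `Age`, `EstimatedSalary`) from the rows shown earlier, and it is an alternative offered for comparison, not the method used in this notebook.

```python
import numpy as np

# Columns: CreditScore, Age, EstimatedSalary (values from the tables above).
X_tr = np.array([[644.0, 37.0, 20968.88],
                 [481.0, 39.0, 24677.54],
                 [680.0, 37.0, 81776.84]])
X_te = np.array([[550.0, 47.0, 97057.28]])

# Row-wise L2 scaling: each sample is divided by its own norm.
row_scaled = X_tr / np.linalg.norm(X_tr, axis=1, keepdims=True)

# Per-feature standardization: statistics come from the training rows only
# and are then reused unchanged on the test rows.
mean = X_tr.mean(axis=0)
std = X_tr.std(axis=0)
X_tr_std = (X_tr - mean) / std
X_te_std = (X_te - mean) / std

print(row_scaled.round(4))
print(X_tr_std.round(4))
```

With row-wise scaling the salary column ends up near 1 for every row, while standardization puts the three features on a comparable scale.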

In [228]:
y_test.shape
# y_test.head


(3000,)

In [0]:
import numpy as np

**Initialize & build the model**

In [0]:
#Initialize Sequential model
model = tf.keras.models.Sequential()

#Input data
model.add(tf.keras.layers.Reshape((13,),input_shape=(13,)))
#model.add(tf.keras.layers.Dense(13, input_shape=(13,)))
#Normalize the data
#model.add(tf.keras.layers.BatchNormalization())

#Add OUTPUT layer
model.add(tf.keras.layers.Dense(1, activation='sigmoid'))

#Compile the model
model.compile(optimizer='sgd', loss='mse', metrics=['accuracy'])
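
The model is compiled with `loss='mse'` on a sigmoid output. For binary targets, Keras also offers `'binary_crossentropy'`, which is the more common choice. The NumPy sketch below compares the two on made-up predictions to show how each penalizes a confidently wrong answer. The helper functions are illustrative and are not Keras APIs.

```python
import numpy as np

def mse(y_true, p):
    # Mean squared error between labels and predicted probabilities.
    return np.mean((y_true - p) ** 2)

def binary_cross_entropy(y_true, p, eps=1e-7):
    # Clip to avoid log(0), then average the negative log-likelihood.
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

y_true = np.array([1.0, 1.0])            # both customers actually churned
confident_wrong = np.array([0.01, 0.01])  # hypothetical predictions
mildly_wrong = np.array([0.4, 0.4])       # hypothetical predictions

for name, p in [("confident", confident_wrong), ("mild", mildly_wrong)]:
    print(name, "mse:", mse(y_true, p), "bce:", binary_cross_entropy(y_true, p))
```

MSE is bounded by 1 per sample, while cross-entropy keeps growing as the prediction approaches the wrong extreme. Whether switching losses would change the results reported in this notebook is not tested here.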

In [210]:
model.summary()

Model: "sequential_15"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_20 (Dense)             (None, 13)                182       
_________________________________________________________________
batch_normalization_15 (Batc (None, 13)                52        
_________________________________________________________________
dense_21 (Dense)             (None, 1)                 14        
Total params: 248
Trainable params: 222
Non-trainable params: 26
_________________________________________________________________


In [184]:
y_test.to_numpy()

array([0, 0, 0, ..., 0, 0, 0])

In [185]:
#y_test = pd.get_dummies(y_test).values
y_test.shape

(3000,)

In [229]:
#y_train = pd.get_dummies(y_train).values
y_test = y_test.to_numpy()
y_test
y_test[0]

array([0, 0, 0, ..., 0, 0, 0])

0

In [230]:
#y_train = pd.get_dummies(y_train).values
y_train = y_train.to_numpy()
y_train
y_train[0]

array([0, 0, 1, ..., 1, 0, 1])

0

In [231]:
model.fit(X_train,y_train,          
          validation_data=(X_test,y_test),
          epochs=15)

Train on 7000 samples, validate on 3000 samples
Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 14/15
Epoch 15/15


<tensorflow.python.keras.callbacks.History at 0x7f6d402e12b0>

**Predict the results using 0.5 as a threshold**


In [0]:
pred = model.predict(X_test)

In [234]:
pred

array([[0.1980516 ],
       [0.2223061 ],
       [0.19806167],
       ...,
       [0.2434839 ],
       [0.18918967],
       [0.18953416]], dtype=float32)

In [0]:
y_pred = (pred > 0.5)


In [236]:
y_pred

array([[False],
       [False],
       [False],
       ...,
       [False],
       [False],
       [False]])
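
`model.predict` returns one column of probabilities, so `y_pred` above is a column of booleans. A small sketch of turning that into a flat array of 0/1 labels, using the six probabilities visible in the output above:

```python
import numpy as np

# The six probabilities visible in the truncated output above.
pred = np.array([[0.1980516], [0.2223061], [0.19806167],
                 [0.2434839], [0.18918967], [0.18953416]], dtype=np.float32)

# Threshold at 0.5, cast booleans to integers, and flatten to shape (n,).
y_pred = (pred > 0.5).astype(int).ravel()

print(y_pred)
print(pred.max())
```

None of the visible probabilities reaches 0.5, so every one of these six samples is labelled 0.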

**Print the Accuracy score and confusion matrix**




In [237]:
model.evaluate(X_test, y_test, verbose=0)[1]

0.791

In [0]:
from sklearn.metrics import confusion_matrix

In [239]:
confusion_matrix(y_test, y_pred)

array([[2373,    0],
       [ 627,    0]])
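
The confusion matrix has an empty second column, which means no customer was predicted to leave. The sketch below derives accuracy and recall from those four counts, assuming scikit-learn's layout (rows are true classes, columns are predicted classes).

```python
import numpy as np

# Confusion matrix printed above: rows = true class, columns = predicted class.
cm = np.array([[2373, 0],
               [627, 0]])

tn, fp, fn, tp = cm.ravel()

accuracy = (tn + tp) / cm.sum()          # share of correct predictions
recall_churn = tp / (tp + fn)            # share of churners that were caught
majority_baseline = cm.sum(axis=1).max() / cm.sum()  # always predict "stays"

print(accuracy, recall_churn, majority_baseline)
```

The accuracy equals the majority-class baseline and the recall for churners is zero, so the 0.791 score reflects the class balance of the test set rather than any learned separation.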

**Optimize the Model**

In [0]:
#Initialize Sequential model
model = tf.keras.models.Sequential()

#Input data
model.add(tf.keras.layers.Reshape((13,),input_shape=(13,)))

#Normalize the data
model.add(tf.keras.layers.BatchNormalization())

#Add 1st hidden layer
model.add(tf.keras.layers.Dense(10, activation='sigmoid'))

#Add 2nd hidden layer
model.add(tf.keras.layers.Dense(8, activation='sigmoid'))

#Add OUTPUT layer
model.add(tf.keras.layers.Dense(1, activation='sigmoid'))



In [0]:
sgd_optimizer = tf.keras.optimizers.SGD(learning_rate=0.03)

#Compile the model
model.compile(optimizer=sgd_optimizer, loss='mse', metrics=['accuracy'])

In [244]:
model.fit(X_train,y_train,          
          validation_data=(X_test,y_test),
          epochs=15)

Train on 7000 samples, validate on 3000 samples
Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 14/15
Epoch 15/15


<tensorflow.python.keras.callbacks.History at 0x7f6d3f1825c0>

In [245]:
model.evaluate(X_test, y_test, verbose=0)[1]

0.791

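**Even with batch normalization, two hidden layers and a learning rate of 0.03, the test accuracy is unchanged at 0.791.** That figure equals the share of non-churners in the test set (2373 of 3000), which suggests the model may still be predicting the majority class for every customer.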