Cross Vadation: K-Fold CV, stratified Fold CV, Time Series CV

## Part 1: Data preprocessing


### 1.1 Importing the libraries


In [1]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

### 1.2 Importing the dataset

In [2]:
df = pd.read_csv('Churn_Modelling.csv')

### 1.3 Review data

In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 14 columns):
RowNumber          10000 non-null int64
CustomerId         10000 non-null int64
Surname            10000 non-null object
CreditScore        10000 non-null int64
Geography          10000 non-null object
Gender             10000 non-null object
Age                10000 non-null int64
Tenure             10000 non-null int64
Balance            10000 non-null float64
NumOfProducts      10000 non-null int64
HasCrCard          10000 non-null int64
IsActiveMember     10000 non-null int64
EstimatedSalary    10000 non-null float64
Exited             10000 non-null int64
dtypes: float64(2), int64(9), object(3)
memory usage: 1.1+ MB


In [4]:
df.head()

Unnamed: 0,RowNumber,CustomerId,Surname,CreditScore,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited
0,1,15634602,Hargrave,619,France,Female,42,2,0.0,1,1,1,101348.88,1
1,2,15647311,Hill,608,Spain,Female,41,1,83807.86,1,0,1,112542.58,0
2,3,15619304,Onio,502,France,Female,42,8,159660.8,3,1,0,113931.57,1
3,4,15701354,Boni,699,France,Female,39,1,0.0,2,0,0,93826.63,0
4,5,15737888,Mitchell,850,Spain,Female,43,2,125510.82,1,1,1,79084.1,0


In [5]:
df.tail()

Unnamed: 0,RowNumber,CustomerId,Surname,CreditScore,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited
9995,9996,15606229,Obijiaku,771,France,Male,39,5,0.0,2,1,0,96270.64,0
9996,9997,15569892,Johnstone,516,France,Male,35,10,57369.61,1,1,1,101699.77,0
9997,9998,15584532,Liu,709,France,Female,36,7,0.0,1,0,1,42085.58,1
9998,9999,15682355,Sabbatini,772,Germany,Male,42,3,75075.31,2,1,0,92888.52,1
9999,10000,15628319,Walker,792,France,Female,28,4,130142.79,1,1,0,38190.78,0


### 1.4 Split data into the independent vs dependent variables

In [6]:
X = df.iloc[:,3:13].values
y = df.iloc[:,-1].values

### 1.5 Encoding categorical data


In [7]:
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

For Geography,

In [8]:
labelencoder_X_1 = LabelEncoder() 

In [9]:
X[:, 1] = labelencoder_X_1.fit_transform(X[:, 1])# column [1] for Geography

For gender,

In [10]:
labelencoder_X_2 = LabelEncoder()

In [11]:
X[:, 2] = labelencoder_X_2.fit_transform(X[:, 2])

In [12]:
# create dummy variable for countries column:
onehotencoder = OneHotEncoder(categorical_features = [1])
X = onehotencoder.fit_transform(X).toarray()
#remove the 1st column to avoid dummy variable trap:
X = X[:,1:] 


In case you used a LabelEncoder before this OneHotEncoder to convert the categories to integers, then you can now use the OneHotEncoder directly.


In [13]:
print(X[0,:])

[0.0000000e+00 0.0000000e+00 6.1900000e+02 0.0000000e+00 4.2000000e+01
 2.0000000e+00 0.0000000e+00 1.0000000e+00 1.0000000e+00 1.0000000e+00
 1.0134888e+05]


### 1.6 Split data into train and test sets

In [14]:
from sklearn.model_selection import train_test_split

In [15]:
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.25, random_state = 0)

### 1.7 Feature Scaling

In [16]:
from sklearn.preprocessing import StandardScaler

In [17]:
sc_X = StandardScaler()

In [18]:
X_train = sc_X.fit_transform(X_train)

In [19]:
X_test = sc_X.fit_transform(X_test)

## Part 2: Making the ANN


### 2.1 Import the Keras libraries and packages:

In [20]:
from tensorflow.keras.models import Sequential  #initialize the neural network
from tensorflow.keras.layers import Dense # bulid the layers of ANN

### 2.2 Adding layers to RNN

#### Initialising the ANN:

In [21]:
classifier = Sequential()

#### Adding the input layer and the first hidden layer:

In [22]:
classifier.add(Dense(6, kernel_initializer = 'uniform', activation = 'relu', input_dim = 11)) 

W0913 13:57:13.919348 140735623418752 deprecation.py:506] From /Users/PhuocNhatDANG/.local/lib/python3.6/site-packages/tensorflow/python/keras/initializers.py:119: calling RandomUniform.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor


Note:
    - The 1st hidden layer has 6 nodes
    - 'uniform': initialize the weights randomly and close to zero
    - 'relu': rectifier activation function
    - rectifier for hidden layers  (sigmoid func for output layer)

#### Adding the 2nd hidden layer:

In [23]:
classifier.add(Dense(6, kernel_initializer = 'uniform', activation = 'relu')) 

#### Adding the output layer:

In [24]:
classifier.add(Dense(1, kernel_initializer = 'uniform', activation = 'sigmoid')) 

Note:
    - For the output more 2 categories (ex:3), we have to change the number of units and the activation func by 3 and 'softmax'.
    - Softmax is the sigmoid function to three or more categories output. 

 ### 2.3 Compiling the ANN:

In [25]:
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'] )

W0913 13:57:14.066694 140735623418752 deprecation.py:323] From /Users/PhuocNhatDANG/.local/lib/python3.6/site-packages/tensorflow/python/ops/nn_impl.py:180: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where


Here,
    - _optimizer_: algorithm to find the optimal set of weights
    - _loss_: for the output more 2 categories (ex:3), we change to 'categorical_crossentropy'.
    - _metrics_: criterion to evaluate our model.


### 2.4 Fitting the ANN to the Training set:


In [26]:
classifier.fit(X_train, y_train, batch_size = 10, nb_epoch = 100)

Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100
Epoch 49/100
Epoch 50/100
Epoch 51/100
Epoch 52/100
Epoch 53/100
Epoch 54/100
Epoch 55/100
Epoch 56/100
Epoch 57/100
Epoch 58/100
Epoch 59/100
Epoch 60/100
Epoch 61/100
Epoch 62/100
Epoch 63/100
Epoch 64/100
Epoch 65/100
Epoch 66/100
Epoch 67/100
Epoch 68/100
Epoch 69/100
Epoch 70/100
Epoch 71/100
Epoch 72/100
Epoch 73/100
Epoch 74/100
Epoch 75/100
Epoch 76/100
Epoch 77/100
Epoch 78

Epoch 80/100
Epoch 81/100
Epoch 82/100
Epoch 83/100
Epoch 84/100
Epoch 85/100
Epoch 86/100
Epoch 87/100
Epoch 88/100
Epoch 89/100
Epoch 90/100
Epoch 91/100
Epoch 92/100
Epoch 93/100
Epoch 94/100
Epoch 95/100
Epoch 96/100
Epoch 97/100
Epoch 98/100
Epoch 99/100
Epoch 100/100


<tensorflow.python.keras.callbacks.History at 0x1a39757588>

Here,             
    - bacth_size: the number of observations after  which you want to update the weights
     - epochs : number of rounds that the whole training set pass through the ANN

# Part 3. Making the predictions 

### 3.1 Predicting the Test set results:

In [27]:
y_pred = classifier.predict(X_test)
y_pred = (y_pred > 0.5)

### 3.2 Predicting a new single observation

Use our ANN model to predict if the customer with the following informations will leave the bank: 

    Geography: France -> [0, 0]
    
    Credit Score: 600
    
    Gender: Male -> [1]
    
    Age: 40 years old
    
    Tenure: 3 years
    
    Balance: $60000
    
    Number of Products: 2
    
    Does this customer have a credit card ? Yes ->[1]
    
    Is this customer an Active Member: Yes ->[1]
    
    Estimated Salary: $50000
    
So should we say goodbye to that customer ?


Firstly, we create a horizontal vector to contain information of this customer

In [75]:
new_cus = np.array([[0, 0, 600, 1 , 40, 3, 60000, 2, 1, 1, 50000]])

Before predicting to this customer, we have to scale the information vector, like we did with X_train and X_test.

In [84]:
new_cus = sc_X.transform(new_cus)

Time to predict whether we should say goodbye to this customer.

In [85]:
new_prediction = classifier.predict(new_cus)  
new_prediction = (new_prediction > 0.5) 

In [86]:
print(new_prediction)

[[False]]


From the new_prediction, False means the customer does not leave the bank and we should not say goodbye to this customer.

### 3.3 Making the confusion matrix

In [28]:
from sklearn.metrics import confusion_matrix

In [29]:
cm = confusion_matrix(y_test,y_pred)

In [30]:
cm

array([[1836,  155],
       [ 217,  292]])

### 3.4 Calculate accuracy

In [91]:
acc = (cm[0,0]+cm[1,1])/np.sum(cm)

In [92]:
acc

0.8512

Compare to training accuracy (0.86), we see that test accuray (0.85) is lower than it and both of them are low Therefore, we need to improve our model.

# Part 4. Evaluating the ANN

We see that we have the high variance problem when we run a lot of time our model. To fix it, we use k-Fold Cross Validation

We know that K-fold cross validation function belongs to Scikit-learn and our model is implemented with Keras. So we have to combine them together.  We will include K-Fold cross validation in our Keras classifier by Keras wrapper.

### 4.1 Import needed libraries and classes

In [20]:
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier #wrap Keras and scikit learn.
from sklearn.model_selection import cross_val_score # k-fold cross validation
from tensorflow.keras.models import Sequential  #initialize the neural network
from tensorflow.keras.layers import Dense # bulid the layers of ANN

### 4.2 Combine Keras and K-Fold Cross Validation's scikit-learn

KerasClassifier that we imported recently expects for one of its argument a function. It's actually its first argument we will see called build_fn (build function). This function is simply a function that returns the classifier that we made above. Let's start making this function,

In [21]:
def build_classifier():
    # Initialising the ANN:
    classifier = Sequential()

    # Adding the input layer and the first hidden layer:
    classifier.add(Dense(6, kernel_initializer = 'uniform', activation = 'relu', input_dim = 11)) 

    # Adding the 2nd hidden layer:
    classifier.add(Dense(6, kernel_initializer = 'uniform', activation = 'relu')) 

    # Adding the output layer:
    classifier.add(Dense(1, kernel_initializer = 'uniform', activation = 'sigmoid')) 
    
    #compiling the ANN:
    classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'] )
    return classifier

Next, we will wrap all of things together. To do it, we will create a global classifier variable that is trained on 10 different training folds. 

In [22]:
classifier = KerasClassifier(build_fn = build_classifier, batch_size = 10, epochs = 100)


Create a vector to store 10 accuracies returned by K-fold cross validation.

In [23]:
accuracies = cross_val_score(estimator = classifier,X=X_train,y=y_train, cv = 10, n_jobs = -1)

<code>n_jobs</code> is the number of CPUs to use to do the computation. -1 means all CPUs.

__!!!!!!!!Restart kernal__ and only run __Part 1__ and __Part 4!!!!!!!!!!!!__. 

We will calculate the relevant accuracy and their variance to see if we have high variance or low variance.

In [25]:
mean_acc = np.mean(accuracies) #accuracy
mean_acc

0.8389333307743072

In [26]:
std_dev = np.sqrt(sum(np.square(accuracies - mean_acc))/len(accuracies))#variance
std_dev

0.011605374713808434

or

In [27]:
#std = np.std(accuracies)
#std

This is greater than 1%, so we meet high variance problem. Next, we will improve our ANN model by tuning the parameters.

# Part 5. Improving the ANN

In the case we get overfitting or high variance problem, you can use dropout regularization. In our case, we got high variance problem. Therefore we will add dropout layers to our ANN to solve this problem. 

In [20]:
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import cross_val_score # k-fold cross validation
from tensorflow.keras.models import Sequential  #initialize the neural network
from tensorflow.keras.layers import Dense # add the output layers of ANN
from tensorflow.keras.layers import Dropout # add the Dropout layers of ANN


In [26]:
def build_classifier():
    # Initialising the ANN:
    classifier = Sequential()

    # Adding the input layer and the first hidden layer:
    classifier.add(Dense(6, kernel_initializer = 'uniform', activation = 'relu', input_dim = 11)) 
    classifier.add(Dropout(rate=0.1))
    
    # Adding the 2nd hidden layer:
    classifier.add(Dense(6, kernel_initializer = 'uniform', activation = 'relu')) 
    classifier.add(Dropout(rate=0.1))
    
    # Adding the output layer:
    classifier.add(Dense(1, kernel_initializer = 'uniform', activation = 'sigmoid')) 
    
    #compiling the ANN:
    classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'] )
    return classifier

For choosing the rate of dropout, you can try from 0.1 until 0.5. DON'T go over 0.5. You will get too close to underfitting.

In [22]:
classifier = KerasClassifier(build_fn = build_classifier, batch_size = 10, epochs = 100)


In [23]:
accuracies = cross_val_score(estimator = classifier,X=X_train,y=y_train, cv = 10, n_jobs = -1)

__!!!!!!!!Restart kernal__ and only run __Part 1__ and __Part 5!!!!!!!!!!!!__. 

In [24]:
mean_acc = np.mean(accuracies) #accuracy
mean_acc

0.8044000089168548

In [25]:
std_dev = np.sqrt(sum(np.square(accuracies - mean_acc))/len(accuracies))#variance
std_dev

0.02205488427424117

We solved the high variance problem. Next, we will increase our accuracy by tuning our ANN model.

# Part 6. Tuning the ANN

In this part, we will find the best hyperparameters with a technique called _Grid Search_. This will test several combinations of hyperparameters and return the best selection.

In [27]:
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import cross_val_score # k-fold cross validation
from tensorflow.keras.models import Sequential  #initialize the neural network
from tensorflow.keras.layers import Dense # add the output layers of ANN
from tensorflow.keras.layers import Dropout # add the Dropout layers of ANN
from sklearn.model_selection import GridSearchCV # parameter tuning 


In [26]:
def build_classifier(optimizer):
    # Initialising the ANN:
    classifier = Sequential()

    # Adding the input layer and the first hidden layer:
    classifier.add(Dense(6, kernel_initializer = 'uniform', activation = 'relu', input_dim = 11)) 
    classifier.add(Dropout(rate=0.1))
    
    # Adding the 2nd hidden layer:
    classifier.add(Dense(6, kernel_initializer = 'uniform', activation = 'relu')) 
    classifier.add(Dropout(rate=0.1))
    
    # Adding the output layer:
    classifier.add(Dense(1, kernel_initializer = 'uniform', activation = 'sigmoid')) 
    
    #compiling the ANN:
    classifier.compile(optimizer = optimizer, loss = 'binary_crossentropy', metrics = ['accuracy'] )
    return classifier

We need to create a classifier variable for tuning 

In [22]:
classifier = KerasClassifier(build_fn = build_classifier)


Then, we create a dictionary that will contain our hyperparameters that we want to optimize,

In [28]:
parameters = {'batch_size': [25, 32],
              'epochs': [100,500],
              'optimizer':['adam','rmsprop']}

Implementing Grid Search,

In [29]:
grid_search = GridSearchCV(estimator = classifier,
                           param_grid = parameters,
                           scoring = 'accuracy',
                           cv =10)

We have not yet fiited to the training set,  time to do it.

In [30]:
grid_search = grid_search.fit(X=X_train,y=y_train)

ValueError: optimizer is not a legal parameter

In [32]:
best_parameters = grid_search.best_params_
best_accuracy =  grid_search.best_score_

AttributeError: 'GridSearchCV' object has no attribute 'best_params_'

In [None]:
best_parameters

In [None]:
best_accuaracy