<h1>Credit Card Fraud Detection - Kaggle Competition - Part 2</h1> <a id='top'></a>

Link to the original competition:
<a href="https://www.kaggle.com/mlg-ulb/creditcardfraud">Link to Kaggle Competition</a> <br>
Some resources:
<a href="https://medium.com/codex/credit-card-fraud-detection-with-machine-learning-in-python-ac7281991d87">Credit Card Fraud Detection With Machine Learning in Python</a> 
<br>
<a href="https://chrisalbon.com/">Notes On Using Data Science & Machine Learning To Fight For Something That Matters</a>
<br><br>
Part Two Content:
<ul>
    <li><a href='#log_reg'>Logistic Regression</a></li>
    <li><a href='#iso_for'>Isolation Forest Model</a></li>
    <li><a href='#ran_for'>Random Forests</a></li>
    <li><a href='#knn'>K-Nearest Neighbors</a></li>
    <li><a href='#svm'>Support Vector Machines</a></li>
    <li><a href='#xgb'>XGBoost</a></li>
</ul>

Other Information:

<ul>
    <li><a href='#eva'>Initial Evaluation</a></li>
    <ul>
        <li><a href='#acc'>Simple Accuracy</a></li>
        <li><a href='#prc'>Precision</a></li>
        <li><a href='#rec'>Recall</a></li>
        <li><a href='#f1'>F1 Score</a></li>
    </ul>
</ul>

<ul>
    <li><a href='#grid_cv'>Grid Search CV</a></li>
    <ul>
        <li><a href='#log_reg_gs_cv'>Logistic Regression - GridSearchCV</a></li>
        <li><a href='#iso_for_gs_cv'>Isolation Forest Model - GridSearchCV</a></li>
        <li><a href='#ran_for_gs_cv'>Random Forests - GridSearchCV</a></li>
        <li><a href='#knn_gs_cv'>K-Nearest Neighbors - GridSearchCV</a></li>
        <li><a href='#svm_gs_cv'>Support Vector Machines - GridSearchCV</a></li>
        <li><a href='#xgb_gs_cv'>XGBoost - GridSearchCV</a></li>
    </ul>
</ul>    

 
<ul>
    <li><a href='#eva_cv'>Cross Validation</a></li>
    <ul>
        <li><a href='#log_reg_cv'>Logistic Regression - Cross Validation</a></li>
        <li><a href='#iso_for_cv'>Isolation Forest Model - Cross Validation</a></li>
        <li><a href='#ran_for_cv'>Random Forests - Cross Validation</a></li>
        <li><a href='#knn_cv'>K-Nearest Neighbors - Cross Validation</a></li>
        <li><a href='#svm_cv'>Support Vector Machines - Cross Validation</a></li>
        <li><a href='#xgb_cv'>XGBoost - Cross Validation</a></li>
    </ul>
</ul>    

<strong>Load libraries</strong>

In [1]:
import pandas as pd
import numpy as np
%matplotlib inline

<strong>Load Data</strong>

In [2]:
# Load Data into pandas dataframe
df = pd.read_csv("creditcard.csv")
df.drop('Time', axis = 1, inplace = True)
# print("Shape: ", df.shape, "\n\nDescription: \n", df.describe())

In [3]:
# Replace empty strings with NaN:
df = df.replace('', np.nan)

# Count missing values per column and report them:
missing_per_column = df.isnull().sum()
if missing_per_column.sum() == 0:
    print("No empty records")
else:
    # Show only the columns that contain missing values
    print(missing_per_column[missing_per_column > 0])

No empty records


In [4]:
# https://re-thought.com/pandas-value_counts/
# value_counts() returns a Series with the count of each unique value. NA values are excluded by default;
# with dropna = False they are included.
print(df.Class.value_counts(dropna = False))
print("Fraudulent transactions are",round(len(df[df.Class == 1])/len(df[df.Class == 0]) * 100,2),"% of valid transactions")

0    284315
1       492
Name: Class, dtype: int64
Fraudulent transactions are 0.17 % of valid transactions


We have 284315 valid transactions and 492 fraudulent transactions. This means that the data is highly imbalanced.

In [5]:
nonfraud_cases = df[df.Class == 0]
fraud_cases = df[df.Class == 1]



In [6]:
from termcolor import colored as cl # text customization
print(cl('CASE AMOUNT STATISTICS', attrs = ['bold']))
print(cl('--------------------------------------------', attrs = ['bold']))
print(cl('NON-FRAUD CASE AMOUNT STATS', attrs = ['bold']))
print(nonfraud_cases.Amount.describe())
print(cl('--------------------------------------------', attrs = ['bold']))
print(cl('FRAUD CASE AMOUNT STATS', attrs = ['bold']))
print(fraud_cases.Amount.describe())
print(cl('--------------------------------------------', attrs = ['bold']))

CASE AMOUNT STATISTICS
--------------------------------------------
NON-FRAUD CASE AMOUNT STATS
count    284315.000000
mean         88.291022
std         250.105092
min           0.000000
25%           5.650000
50%          22.000000
75%          77.050000
max       25691.160000
Name: Amount, dtype: float64
--------------------------------------------
FRAUD CASE AMOUNT STATS
count     492.000000
mean      122.211321
std       256.683288
min         0.000000
25%         1.000000
50%         9.250000
75%       105.890000
max      2125.870000
Name: Amount, dtype: float64
--------------------------------------------


The statistics show that the values of the ‘Amount’ variable vary enormously compared with the rest of the variables (from 0 up to 25691.16). To bring it onto a comparable scale, we can standardize it with the ‘StandardScaler’ class from scikit-learn.

In [7]:
# Good description of StandardScaler can be found here
# https://machinelearningmastery.com/standardscaler-and-minmaxscaler-transforms-in-python/
from sklearn.preprocessing import StandardScaler # data standardization
# StandardScaler.fit_transform rescales the data to zero mean and unit variance
sc = StandardScaler()
amount = df['Amount'].values
df['Amount'] = sc.fit_transform(amount.reshape(-1, 1))
print(cl(df['Amount'].describe(), attrs = ['bold']))


count    2.848070e+05
mean     2.913952e-17
std      1.000002e+00
min     -3.532294e-01
25%     -3.308401e-01
50%     -2.652715e-01
75%     -4.471707e-02
max      1.023622e+02
Name: Amount, dtype: float64


In [8]:
# # This has nothing to do with the actual task, it just shows how StandardScaler.fit_transform works
# from numpy import asarray
# data = asarray([[100, 0.001],[8, 0.05],[50, 0.005],[88, 0.07],[4, 0.1]])
# print(data)
# # define standard scaler
# scaler = StandardScaler()
# # transform data
# scaled = scaler.fit_transform(data)
# print(scaled)
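
To make the scaling step concrete, here is a minimal sketch on a handful of made-up amounts (the values in `amounts_demo` are illustrative only, they are not a sample of the dataset). Under that assumption it checks that `StandardScaler` gives the same result as computing z = (x - mean) / std by hand.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical transaction amounts, chosen only for illustration
amounts_demo = np.array([1.0, 9.25, 22.0, 77.05, 2125.87]).reshape(-1, 1)

# StandardScaler subtracts the column mean and divides by the column standard deviation
scaled_demo = StandardScaler().fit_transform(amounts_demo)

# Same computation by hand (np.std defaults to ddof=0, the formula StandardScaler uses)
manual_demo = (amounts_demo - amounts_demo.mean()) / amounts_demo.std()

print(np.allclose(scaled_demo, manual_demo))
print(scaled_demo.mean(), scaled_demo.std())
```

Note that pandas `describe()` reports the sample standard deviation (ddof=1), which is why the scaled ‘Amount’ column shows a std of 1.000002 rather than exactly 1.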

<h2>Feature Selection & Data Split</h2>
In this step we define the independent variables (X) and the dependent variable (Y). Using these, we split the data into a training set and a test set, which are then used for fitting and evaluating the models. The split is easily done with the 'train_test_split' function from scikit-learn.

In [9]:
# All columns except Class are independent variables, so they are loaded into x
x = df.drop('Class', axis = 1).values
# Class is the variable we want to predict (1 = fraud, 0 = valid transaction)
y = df['Class'].values

<a id='split'></a>
<h4>Split data into training data and test data, test size 20%</h4>

In [10]:
from sklearn.model_selection import train_test_split # data split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.2, random_state = 0)

print(cl('x_train samples : ', attrs = ['bold']), x_train[:1])
print(cl('x_test samples : ', attrs = ['bold']), x_test[0:1])
print(cl('y_train samples : ', attrs = ['bold']), y_train[0:20])
print(cl('y_test samples : ', attrs = ['bold']), y_test[0:20])

x_train samples :  [[-1.11504743  1.03558276  0.80071244 -1.06039825  0.03262117  0.85342216
  -0.61424348 -3.23116112  1.53994798 -0.81690879 -1.30559201  0.1081772
  -0.85960958 -0.07193421  0.90665563 -1.72092961  0.79785322 -0.0067594
   1.95677806 -0.64489556  3.02038533 -0.53961798  0.03315649 -0.77494577
   0.10586781 -0.43085348  0.22973694 -0.0705913  -0.30145418]]
x_test samples :  [[-0.32333357  1.05745525 -0.04834115 -0.60720431  1.25982115 -0.09176072
   1.1591015  -0.12433461 -0.17463954 -1.64440065 -1.11886302  0.20264731
   1.14596495 -1.80235956 -0.24717793 -0.06094535  0.84660574  0.37945439
   0.84726224  0.18640942 -0.20709827 -0.43389027 -0.26161328 -0.04665061
   0.2115123   0.00829721  0.10849443  0.16113917 -0.19330595]]
y_train samples :  [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
y_test samples :  [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
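
With so few fraud cases, a purely random split can leave the training and test sets with slightly different fraud rates. One possible alternative, which this notebook does not use, is the `stratify` argument of `train_test_split`. The sketch below shows its effect on synthetic labels (`X_demo` and `y_demo` are made up for the illustration).

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data: 10,000 rows, 20 of them "fraud" (0.2%)
rng = np.random.RandomState(0)
X_demo = rng.normal(size=(10_000, 3))
y_demo = np.zeros(10_000, dtype=int)
y_demo[:20] = 1

# stratify=y_demo keeps the class ratio the same in both parts
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0, stratify=y_demo
)

print("fraud share in train:", y_tr.mean())
print("fraud share in test: ", y_te.mean())
```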


<a id='log_reg'></a><a href='#top'>Top</a>
<h2>Logistic Regression</h2>
Logistic Regression is one of the simplest machine learning algorithms and is often used as a baseline model for further evaluation. The fitted parameters (trained weights) give some insight into the importance of each feature. 
<br>

In [11]:
from sklearn.linear_model import LogisticRegression # Logistic regression algorithm
lr = LogisticRegression()
lr.fit(x_train, y_train)
lr_yhat = lr.predict(x_test)
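
The remark about trained weights can be illustrated with a small sketch. It assumes synthetic data (`X_demo`, `y_demo`) in which only the first feature drives the label, and reads the weights from the fitted model's `coef_` attribute. Weights are only comparable in this way when the features are on similar scales.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic data: only feature 0 drives the label, features 1 and 2 are noise
rng = np.random.RandomState(0)
X_demo = rng.normal(size=(2000, 3))
y_demo = (X_demo[:, 0] + 0.1 * rng.normal(size=2000) > 0).astype(int)

lr_demo = LogisticRegression().fit(X_demo, y_demo)

# coef_ has shape (1, n_features) for a binary problem
weights = lr_demo.coef_.ravel()
ranking = np.argsort(-np.abs(weights))  # feature indices, largest |weight| first

print(weights)
print(ranking)
```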

<a id='iso_for'></a><a href='#top'>Top</a>
<h2>Isolation Forest Model</h2>
Isolation Forest is generally used for anomaly detection. An isolation tree repeatedly picks a random feature and a random split value, partitioning the data until each point sits alone in its own interval. Anomalies tend to be isolated after fewer splits than normal points, which makes isolation trees a good fit for tabular data with anomalies.

In [12]:
from sklearn.ensemble import IsolationForest # Isolation forest algorithm

outlier_fraction = len(df[df.Class == 1])/len(df[df.Class == 0])

ifc=IsolationForest(max_samples=len(x_train), contamination=outlier_fraction, random_state=1)
ifc.fit(x_train)
ifc_yhat = ifc.predict(x_test)

# Remap the prediction values to 0 for valid, 1 for fraud
# (IsolationForest.predict returns 1 for inliers and -1 for outliers).
# This is needed in order to be able to run the metrics
ifc_yhat[ifc_yhat == 1] = 0
ifc_yhat[ifc_yhat == -1] = 1
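
The label convention of `IsolationForest` is easy to get wrong, so here is a small sketch on synthetic points (`X_demo` is made up: a dense cluster plus a few far-away points). It does the same remapping in one step with `np.where`, which does not depend on the order of the two assignments.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(0)
# 300 "normal" points around the origin plus 6 far-away points
X_demo = np.vstack([rng.normal(0, 1, size=(300, 2)), rng.uniform(8, 10, size=(6, 2))])

iso_demo = IsolationForest(contamination=6 / 306, random_state=1).fit(X_demo)
raw_pred = iso_demo.predict(X_demo)  # 1 = inlier, -1 = outlier

# Map to the 0 = valid / 1 = fraud convention in one step
fraud_pred = np.where(raw_pred == -1, 1, 0)

print(np.unique(raw_pred), np.unique(fraud_pred))
print("flagged as fraud:", fraud_pred.sum())
```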

<a id='ran_for'></a><a href='#top'>Top</a>
<h2>Random Forests</h2>
A Random Forest is an ensemble of decision trees. Each tree is trained on a bootstrap sample of the data and considers a random subset of features at each split; the final prediction is the majority vote of the trees. Here we limit the depth of each tree by setting ‘max_depth’ to 4.

In [13]:
from sklearn.ensemble import RandomForestClassifier # Random forest tree algorithm
rf = RandomForestClassifier(max_depth = 4)
rf.fit(x_train, y_train)
rf_yhat = rf.predict(x_test)

<a id='knn'></a><a href='#top'>Top</a>
<h2>K-Nearest Neighbors</h2>
We build the K-Nearest Neighbors model with the ‘KNeighborsClassifier’ class and set ‘n_neighbors’ to 5. This value is chosen arbitrarily; a better one can be found by iterating over a range of values. After fitting the model, the predicted values are stored in the ‘knn_yhat’ variable.

In [14]:
from sklearn.neighbors import KNeighborsClassifier # KNN algorithm
n = 5
knn = KNeighborsClassifier(n_neighbors = n)
knn.fit(x_train, y_train)
knn_yhat = knn.predict(x_test)
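
Iterating over a range of ‘n_neighbors’ values could look roughly like the sketch below. It runs on synthetic imbalanced data from `make_classification` rather than on the credit card data, and it picks the value with the best mean cross-validated recall; both choices are assumptions made for the illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic imbalanced data (about 5% positives), used only for illustration
X_demo, y_demo = make_classification(
    n_samples=1000, n_features=5, n_informative=3, n_redundant=0,
    weights=[0.95, 0.05], random_state=0
)

mean_recall = {}
for k in range(1, 12, 2):  # odd values avoid ties in the vote
    knn_demo = KNeighborsClassifier(n_neighbors=k)
    scores = cross_val_score(knn_demo, X_demo, y_demo, cv=5, scoring="recall")
    mean_recall[k] = scores.mean()

best_k = max(mean_recall, key=mean_recall.get)
print(mean_recall)
print("best k:", best_k)
```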

<a id='svm'></a><a href='#top'>Top</a>
<h2>Support Vector Machine (SVM)</h2>
We build the Support Vector Machine model with the ‘SVC’ class. No arguments are passed, so the default ‘rbf’ kernel is used. After fitting the model, the predicted values are stored in ‘svm_yhat’.

In [15]:
from sklearn.svm import SVC # SVM algorithm
svm = SVC()
svm.fit(x_train, y_train)
svm_yhat = svm.predict(x_test)

<a id='xgb'></a><a href='#top'>Top</a>
<h2>XGBoost</h2>
We build the model with the ‘XGBClassifier’ class provided by the xgboost package, setting ‘max_depth’ to 4. After fitting the model, the predicted values are stored in ‘xgb_yhat’.

In [16]:
# Install via command:
# conda install -c anaconda py-xgboost
from xgboost import XGBClassifier # XGBoost algorithm

xgb = XGBClassifier(max_depth = 4)
xgb.fit(x_train, y_train)
xgb_yhat = xgb.predict(x_test)

<a id='eva'></a><a href='#top'>Top</a>
<h2>Evaluation</h2>
In this step we evaluate the models we have built, using the evaluation metrics provided by the scikit-learn package. Our main objective is to find the best model for our case. The metrics we are going to use are accuracy, precision, recall, and the F1 score.

In [17]:
from sklearn.metrics import confusion_matrix # evaluation metric
from sklearn.metrics import accuracy_score # evaluation metric
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score
from sklearn.metrics import f1_score # evaluation metric

<a id='acc'></a><a href='#top'>Top</a><br>
<strong>Simple Accuracy</strong> is one of the most basic evaluation metrics. It is calculated by dividing the number of correct predictions made by the model by the total number of predictions:
<br><br>
$\text{Simple Accuracy} = \frac{\text{No. of correct predictions}}{\text{Total no. of predictions}} = \frac{\text{True Positive} + \text{True Negative}}{\text{True Positive} + \text{True Negative} + \text{False Positive} + \text{False Negative}}$ 
<br><br>
The <span style="color:red;font-weight: bold">problem</span> with <strong>Simple Accuracy</strong> appears when the data is imbalanced, as in this task, where fraudulent transactions make up only 0.17% of the data. If a model labelled every transaction as valid and totally disregarded the fraudulent ones, its accuracy would be

all_ok_accuracy $ = \frac{TP + TN}{TP + TN + FP + FN} = \frac{0 + 284315}{0 + 284315 + 0 + 492} \cdot 100\% = 99.8\% $
<br><br>
This shows almost perfect accuracy, yet we haven't detected a single fraud. That is unlikely to be what the credit card issuer had in mind. 
<br><br>
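
The "always valid" calculation can be reproduced in a few lines. This sketch rebuilds the labels from the class counts reported earlier (284315 valid, 492 fraudulent) instead of loading the CSV, and `y_always_valid` is a hypothetical name for the predictions of a function that labels everything as valid.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Labels rebuilt from the class counts: 284315 valid (0) and 492 fraudulent (1)
y_true = np.concatenate([np.zeros(284315, dtype=int), np.ones(492, dtype=int)])

# The "always valid" function: predict 0 for every transaction
y_always_valid = np.zeros_like(y_true)

baseline_accuracy = accuracy_score(y_true, y_always_valid)
frauds_caught = int(((y_true == 1) & (y_always_valid == 1)).sum())

print(round(baseline_accuracy * 100, 1), "% accuracy")
print(frauds_caught, "of", int(y_true.sum()), "frauds caught")
```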
Anyway, let's check the accuracy of our models to see how they are doing against our "always valid" function:

In [18]:
print(cl('ACCURACY SCORE', attrs = ['bold']))
print(cl('------------------------------------------------------------------------', attrs = ['bold']))
print(cl('Accuracy score of the Logistic Regression model is {}%'.format(round(accuracy_score(y_test, lr_yhat) *100,1)), attrs = ['bold']))
print(cl('------------------------------------------------------------------------', attrs = ['bold']))
print(cl('Accuracy score of the Isolation Forest model is {}%'.format(round(accuracy_score(y_test, ifc_yhat) *100,1)), attrs = ['bold']))
print(cl('------------------------------------------------------------------------', attrs = ['bold']))
print(cl('Accuracy score of the Random Forest Tree model is {}%'.format(round(accuracy_score(y_test, rf_yhat) *100,1)), attrs = ['bold']))
print(cl('------------------------------------------------------------------------', attrs = ['bold']))
print(cl('Accuracy score of the KNN model is {}%'.format(round(accuracy_score(y_test, knn_yhat) *100,1)), attrs = ['bold'], color = 'green'))
print(cl('------------------------------------------------------------------------', attrs = ['bold']))
print(cl('Accuracy score of the SVM model is {}%'.format(round(accuracy_score(y_test, svm_yhat) *100,1)), attrs = ['bold']))
print(cl('------------------------------------------------------------------------', attrs = ['bold']))
print(cl('Accuracy score of the XGBoost model is {}%'.format(round(accuracy_score(y_test, xgb_yhat) *100,1)), attrs = ['bold']))
print(cl('------------------------------------------------------------------------', attrs = ['bold']))

ACCURACY SCORE
------------------------------------------------------------------------
Accuracy score of the Logistic Regression model is 99.9%
------------------------------------------------------------------------
Accuracy score of the Isolation Forest model is 99.8%
------------------------------------------------------------------------
Accuracy score of the Random Forest Tree model is 99.9%
------------------------------------------------------------------------
Accuracy score of the KNN model is 100.0%
------------------------------------------------------------------------
Accuracy score of the SVM model is 99.9%
------------------------------------------------------------------------
Accuracy score of the XGBoost model is 99.9%
------------------------------------------------------------------------


<strong>Our "always valid" function is not bad!</strong> <br>
    <span style="color:red;font-weight: bold">Or is it???</span> Let's look at some other popular metrics. <br>
***   


<a id='prc'></a><a href='#top'>Top</a><br>
<strong>PRECISION</strong> tries to answer the question: Out of all the emails sent to the spam folder, how many were actually spam?

$$ \text{Precision} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Positive}} = \frac{\text{Spam in spam folder}}{\text{Spam in spam folder} + \text{Not spam in spam folder}}$$

We <span style="color:green;font-weight: bold">CARE</span> about <strong>High Precision</strong> when we want to make sure that whatever we classify as positive is definitely a positive and not a false positive. 
<br><br>
A good example is a spam detector:
<br>
We want to make sure that whatever is classified as SPAM (POSITIVE) is definitely SPAM (TRUE POSITIVE), because the risk of sending an important email to SPAM (FALSE POSITIVE) significantly outweighs the hassle of some SPAM emails getting through our filter from time to time (FALSE NEGATIVE).
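
As a worked example of the precision formula, the sketch below uses ten invented spam labels (`y_true`, `y_pred`), reads the four counts from scikit-learn's `confusion_matrix`, and compares the hand-computed value with `precision_score`.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score

# Toy spam example: 1 = spam, 0 = not spam (labels invented for illustration)
y_true = np.array([1, 1, 1, 0, 0, 0, 0, 0, 1, 0])
y_pred = np.array([1, 1, 0, 0, 0, 1, 0, 0, 1, 0])

# For binary 0/1 labels, ravel() returns the counts in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

# Precision = TP / (TP + FP): share of the "spam folder" that really is spam
precision_manual = tp / (tp + fp)

print(tn, fp, fn, tp)
print(precision_manual, precision_score(y_true, y_pred))
```

Here three of the four emails flagged as spam really are spam, so precision is 0.75; the one spam email that slipped through (the false negative) does not affect it.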

In [19]:
print(cl('PRECISION SCORE', attrs = ['bold']))
print(cl('------------------------------------------------------------------------', attrs = ['bold']))
print(cl('Precision score of the Logistic Regression model is {}%'.format(round(precision_score(y_test, lr_yhat) *100,1)), attrs = ['bold']))
print(cl('------------------------------------------------------------------------', attrs = ['bold']))
print(cl('Precision score of the Isolation Forest model is {}%'.format(round(precision_score(y_test, ifc_yhat) *100,1)), attrs = ['bold']))
print(cl('------------------------------------------------------------------------', attrs = ['bold']))
print(cl('Precision score of the Random Forest Tree model is {}%'.format(round(precision_score(y_test, rf_yhat) *100,1)), attrs = ['bold']))
print(cl('------------------------------------------------------------------------', attrs = ['bold']))
print(cl('Precision score of the KNN model is {}%'.format(round(precision_score(y_test, knn_yhat) *100,1)), attrs = ['bold'], color = 'green'))
print(cl('------------------------------------------------------------------------', attrs = ['bold']))
print(cl('Precision score of the SVM model is {}%'.format(round(precision_score(y_test, svm_yhat) *100,1)), attrs = ['bold']))
print(cl('------------------------------------------------------------------------', attrs = ['bold']))
print(cl('Precision score of the XGBoost model is {}%'.format(round(precision_score(y_test, xgb_yhat) *100,1)), attrs = ['bold']))
print(cl('------------------------------------------------------------------------', attrs = ['bold']))

PRECISION SCORE
------------------------------------------------------------------------
Precision score of the Logistic Regression model is 87.7%
------------------------------------------------------------------------
Precision score of the Isolation Forest model is 31.6%
------------------------------------------------------------------------
Precision score of the Random Forest Tree model is 90.8%
------------------------------------------------------------------------
Precision score of the KNN model is 92.0%
------------------------------------------------------------------------
Precision score of the SVM model is 91.9%
------------------------------------------------------------------------
Precision score of the XGBoost model is 89.9%
------------------------------------------------------------------------


***
<a id='rec'></a><a href='#top'>Top</a><br>
<strong>RECALL</strong> tries to answer the question: Out of the sick patients, how many did we correctly diagnose as sick?
<br><br>
$$ \text{Recall} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Negative}} = \frac{\text{Sick patients diagnosed as sick}}{\text{Sick patients diagnosed as sick} + \text{Sick patients diagnosed as healthy}}$$
<br>
We <span style="color:green;font-weight: bold">CARE</span> about <strong>High Recall</strong> when we want to make sure that we find as many positives as possible. 
<br><br>
A good example is medical diagnostics:
<br>
We want to make sure that we find as many sick people (TRUE POSITIVE) as possible and that we don't send home without treatment anybody who is sick (FALSE NEGATIVE). In this case we don't mind too much if we diagnose a healthy person as sick (FALSE POSITIVE), because it only costs us some extra checks, whereas sending a sick person home without treatment might be deadly for that person.
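
The recall formula can be checked the same way. This sketch counts true positives and false negatives with boolean masks on ten invented diagnosis labels and compares the result with `recall_score`.

```python
import numpy as np
from sklearn.metrics import recall_score

# Toy diagnosis example: 1 = sick, 0 = healthy (labels invented for illustration)
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 0, 1, 1, 0, 0, 0, 0])

true_positive = np.sum((y_true == 1) & (y_pred == 1))   # sick, diagnosed as sick
false_negative = np.sum((y_true == 1) & (y_pred == 0))  # sick, sent home

# Recall = TP / (TP + FN): share of the sick patients that were found
recall_manual = true_positive / (true_positive + false_negative)

print(recall_manual, recall_score(y_true, y_pred))
```

The two healthy patients diagnosed as sick (false positives) lower precision but leave recall unchanged at 0.75.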


In [20]:
print(cl('RECALL SCORE', attrs = ['bold']))
print(cl('------------------------------------------------------------------------', attrs = ['bold']))
print(cl('Recall score of the Logistic Regression model is {}%'.format(round(recall_score(y_test, lr_yhat) *100,1)), attrs = ['bold']))
print(cl('------------------------------------------------------------------------', attrs = ['bold']))
print(cl('Recall score of the Isolation Forest model is {}%'.format(round(recall_score(y_test, ifc_yhat) *100,1)), attrs = ['bold']))
print(cl('------------------------------------------------------------------------', attrs = ['bold']))
print(cl('Recall score of the Random Forest Tree model is {}%'.format(round(recall_score(y_test, rf_yhat) *100,1)), attrs = ['bold']))
print(cl('------------------------------------------------------------------------', attrs = ['bold']))
print(cl('Recall score of the KNN model is {}%'.format(round(recall_score(y_test, knn_yhat) *100,1)), attrs = ['bold'], color = 'green'))
print(cl('------------------------------------------------------------------------', attrs = ['bold']))
print(cl('Recall score of the SVM model is {}%'.format(round(recall_score(y_test, svm_yhat) *100,1)), attrs = ['bold']))
print(cl('------------------------------------------------------------------------', attrs = ['bold']))
print(cl('Recall score of the XGBoost model is {}%'.format(round(recall_score(y_test, xgb_yhat) *100,1)), attrs = ['bold']))
print(cl('------------------------------------------------------------------------', attrs = ['bold']))

RECALL SCORE
------------------------------------------------------------------------
Recall score of the Logistic Regression model is 63.4%
------------------------------------------------------------------------
Recall score of the Isolation Forest model is 30.7%
------------------------------------------------------------------------
Recall score of the Random Forest Tree model is 68.3%
------------------------------------------------------------------------
Recall score of the KNN model is 80.2%
------------------------------------------------------------------------
Recall score of the SVM model is 67.3%
------------------------------------------------------------------------
Recall score of the XGBoost model is 79.2%
------------------------------------------------------------------------


***
<a id='f1'></a><a href='#top'>Top</a><br>
<strong>F1 SCORE</strong> is the harmonic mean of precision and recall. It is high only when both precision and recall are high, which makes it a useful single-number summary for imbalanced data such as ours:

$$ \text{F1} = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} $$
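
As a quick check, F1 can be recomputed as the harmonic mean 2 · precision · recall / (precision + recall) from the rounded precision (87.7%) and recall (63.4%) printed earlier for the Logistic Regression model. The helper `f1_from` is a hypothetical name introduced for this sketch.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Rounded precision and recall of the Logistic Regression model from the outputs above
lr_f1 = f1_from(0.877, 0.634)
print(round(lr_f1 * 100, 1))

# The harmonic mean stays low when either of the two values is low
print(round(f1_from(0.99, 0.10), 3))
```

The first value comes out at 73.6, matching the F1 score that scikit-learn reports for the Logistic Regression model.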

In [21]:
print(cl('F1 SCORE', attrs = ['bold']))
print(cl('------------------------------------------------------------------------', attrs = ['bold']))
print(cl('F1 score of the Logistic Regression model is {}%'.format(round(f1_score(y_test, lr_yhat) *100,1)), attrs = ['bold']))
print(cl('------------------------------------------------------------------------', attrs = ['bold']))
print(cl('F1 score of the Isolation Forest model is {}%'.format(round(f1_score(y_test, ifc_yhat) *100,1)), attrs = ['bold']))
print(cl('------------------------------------------------------------------------', attrs = ['bold']))
print(cl('F1 score of the Random Forest Tree model is {}%'.format(round(f1_score(y_test, rf_yhat) *100,1)), attrs = ['bold']))
print(cl('------------------------------------------------------------------------', attrs = ['bold']))
print(cl('F1 score of the KNN model is {}%'.format(round(f1_score(y_test, knn_yhat) *100,1)), attrs = ['bold'], color = 'green'))
print(cl('------------------------------------------------------------------------', attrs = ['bold']))
print(cl('F1 score of the SVM model is {}%'.format(round(f1_score(y_test, svm_yhat) *100,1)), attrs = ['bold']))
print(cl('------------------------------------------------------------------------', attrs = ['bold']))
print(cl('F1 score of the XGBoost model is {}%'.format(round(f1_score(y_test, xgb_yhat) *100,1)), attrs = ['bold']))
print(cl('------------------------------------------------------------------------', attrs = ['bold']))


F1 SCORE
------------------------------------------------------------------------
F1 score of the Logistic Regression model is 73.6%
------------------------------------------------------------------------
F1 score of the Isolation Forest model is 31.2%
------------------------------------------------------------------------
F1 score of the Random Forest Tree model is 78.0%
------------------------------------------------------------------------
F1 score of the KNN model is 85.7%
------------------------------------------------------------------------
F1 score of the SVM model is 77.7%
------------------------------------------------------------------------
F1 score of the XGBoost model is 84.2%
------------------------------------------------------------------------


***
<a id='eva_cv'></a><a href='#top'>Top</a>
<h2>CROSS VALIDATION</h2>

The models above are assessed on one particular split between training data and test data (see <a href='#split'>train_test_split</a>). We may simply have been lucky with that split, so that our models work well only in this particular case.<br><br>
<strong>But what if we had a different split?</strong> Fortunately, <strong>Cross Validation</strong> comes to the rescue! <br><br> Cross validation splits the training data into a training part and a validation part, trains the model on the first and checks it against the second. 
<br><br>The step is then repeated multiple times (the multiple experiments in the figure below), each time with a different split between training data and validation data. <br><br>
After all rounds of splitting and validation we get an array of metric values. From this array we learn the worst and the mean value of the metric. This gives us insight into how our model is likely to perform in production, which can be communicated to the decision makers.

<img src="cross_validation_diagram.png" alt="cross validation diagram" class="bg-primary" width="600px"> 
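
The procedure described above can be written out as an explicit loop. The sketch below runs on synthetic data from `make_classification` (not the credit card data), uses `StratifiedKFold` to produce the five splits, and then confirms that `cross_val_score` returns the same per-fold recall values.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic imbalanced data (about 10% positives), used only for illustration
X_demo, y_demo = make_classification(
    n_samples=1000, n_features=6, weights=[0.9, 0.1], random_state=0
)

folds = StratifiedKFold(n_splits=5)
manual_scores = []
for train_idx, val_idx in folds.split(X_demo, y_demo):
    model = LogisticRegression(max_iter=1000)
    model.fit(X_demo[train_idx], y_demo[train_idx])   # train on four folds
    y_val_pred = model.predict(X_demo[val_idx])       # validate on the remaining fold
    manual_scores.append(recall_score(y_demo[val_idx], y_val_pred))
manual_scores = np.array(manual_scores)

# cross_val_score runs the same loop internally
auto_scores = cross_val_score(
    LogisticRegression(max_iter=1000), X_demo, y_demo, cv=folds, scoring="recall"
)

print(manual_scores.min(), manual_scores.mean())
print(np.allclose(manual_scores, auto_scores))
```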

In [22]:
from sklearn.model_selection import cross_val_score

<a id='log_reg_cv'></a><a href='#top'>Top</a>
<h2>Logistic Regression - Cross Validation</h2>

In [23]:
lr = LogisticRegression()
lr_cv = np.array(cross_val_score(lr, x_train, y_train, cv=5, scoring = 'recall')) 

In [24]:
lr_cv.min(), lr_cv.mean()

(0.5512820512820513, 0.6113924050632911)

<a id='iso_for_cv'></a><a href='#top'>Top</a>
<h2>Isolation Forest Model - Cross Validation</h2>

In [25]:
# ifc=IsolationForest(max_samples=len(x_train), contamination=outlier_fraction, random_state=1)
# ifc_cv = np.array(cross_val_score(ifc, x_train, y_train, cv=5, scoring = 'recall')) 

In [26]:
# from sklearn.ensemble import IsolationForest # Isolation forest algorithm

# outlier_fraction = len(df[df.Class == 1])/len(df[df.Class == 0])

# ifc=IsolationForest(max_samples=len(x_train), contamination=outlier_fraction, random_state=1)
# ifc.fit(x_train)
# ifc_yhat = ifc.predict(x_test)

# # Remap the prediction values to 0 for valid, 1 for fraud. 
# # This is needed in order to be able to run metrics
# ifc_yhat[ifc_yhat == 1] = 0
# ifc_yhat[ifc_yhat == -1] = 1
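
The cross-validation call for the Isolation Forest is commented out above. One probable obstacle is that `IsolationForest.predict` returns -1/1 while the labels use 0/1, so the built-in 'recall' scorer cannot compare them directly. A possible workaround, sketched here on synthetic data, is a custom scorer that remaps the predictions first. The function name `fraud_recall` and the demo data are assumptions made for this illustration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import recall_score
from sklearn.model_selection import StratifiedKFold, cross_val_score

def fraud_recall(estimator, X, y):
    """Recall for label 1 after mapping predictions: -1 (outlier) -> 1, 1 (inlier) -> 0."""
    y_pred = np.where(estimator.predict(X) == -1, 1, 0)
    return recall_score(y, y_pred)

# Synthetic data: 500 "valid" points around the origin, 25 "fraud" points far away
rng = np.random.RandomState(0)
X_demo = np.vstack([rng.normal(0, 1, size=(500, 2)), rng.normal(6, 1, size=(25, 2))])
y_demo = np.array([0] * 500 + [1] * 25)

ifc_demo = IsolationForest(contamination=25 / 525, random_state=1)

# IsolationForest is not a classifier, so stratified folds are requested explicitly
cv = StratifiedKFold(n_splits=5)
ifc_demo_cv = cross_val_score(ifc_demo, X_demo, y_demo, cv=cv, scoring=fraud_recall)

print(ifc_demo_cv.min(), ifc_demo_cv.mean())
```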

<a id='ran_for_cv'></a><a href='#top'>Top</a>
<h2>Random Forests - Cross Validation</h2>

In [27]:
# from sklearn.ensemble import RandomForestClassifier # Random forest tree algorithm
rf = RandomForestClassifier(max_depth = 4)

rf_cv = np.array(cross_val_score(rf, x_train, y_train, cv=5, scoring = 'recall')) 


In [28]:
rf_cv

array([0.74358974, 0.64102564, 0.73076923, 0.75641026, 0.62025316])

<a id='knn_cv'></a><a href='#top'>Top</a>
<h2>K-Nearest Neighbors - Cross Validation</h2>

In [29]:

n = 5
knn = KNeighborsClassifier(n_neighbors = n)

knn_cv = np.array(cross_val_score(knn, x_train, y_train, cv=5, scoring = 'recall')) 
# Some of the relevant metrics for this work are:
# accuracy
# precision
# recall
# f1 
# The full list of metrics is available here:
# https://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter

In [30]:
knn_cv.min(), knn_cv.mean()

(0.7307692307692307, 0.7596234988640052)

<a id='svm_cv'></a><a href='#top'>Top</a>
<h2>Support Vector Machine (SVM) - Cross Validation</h2>

In [31]:
# from sklearn.svm import SVC # SVM algorithm
svm = SVC()
svm_cv = np.array(cross_val_score(svm, x_train, y_train, cv=5, scoring = 'recall')) 

In [32]:
svm_cv.min(), svm_cv.mean()

(0.6153846153846154, 0.675267770204479)

<a id='xgb_cv'></a><a href='#top'>Top</a>
<h2>XGBoost - Cross Validation</h2>

In [33]:
# Install via command:
# conda install -c anaconda py-xgboost
# from xgboost import XGBClassifier # XGBoost algorithm

xgb = XGBClassifier(max_depth = 4)
xgb_cv = np.array(cross_val_score(xgb, x_train, y_train, cv=5, scoring = 'recall')) 

In [34]:
xgb_cv.min(), xgb_cv.mean()

(0.6923076923076923, 0.7825381369685166)

***
<a id='grid_cv'></a><a href='#top'>Top</a>
<h2>GRID SEARCH CV</h2>

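GridSearchCV tries every combination of hyperparameter values from a given grid, scores each combination with cross validation, and keeps the best one. A video introduction: "GridSearchCV- Select the best hyperparameter for any Classification Model" <br>
https://www.youtube.com/watch?v=CgmvAMiVKFE&ab_channel=KrishNaik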


<a id='log_reg_gs_cv'></a><a href='#top'>Top</a>
<h2>Logistic Regression - GridSearchCV</h2>

In [35]:
from sklearn.model_selection import GridSearchCV

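# Grid: 7 values of the inverse regularization strength C x 2 penalty types
# Note: the default 'lbfgs' solver supports only the l2 penalty, so the l1 candidates fail to fit
# and do not compete for the best score; a solver such as 'liblinear' or 'saga' is needed to evaluate l1.
parameters={"C":np.logspace(-3,3,7), "penalty":["l1","l2"]}
lr_gs_cv=GridSearchCV(estimator = lr, param_grid = parameters, scoring = 'recall',cv=10, n_jobs = -1)
lr_gs_cv.fit(x_train,y_train)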

GridSearchCV(cv=10, estimator=LogisticRegression(), n_jobs=-1,
             param_grid={'C': array([1.e-03, 1.e-02, 1.e-01, 1.e+00, 1.e+01, 1.e+02, 1.e+03]),
                         'penalty': ['l1', 'l2']},
             scoring='recall')

In [36]:
lr_opt_recall = lr_gs_cv.best_score_
lr_opt_recall_param = lr_gs_cv.best_params_ 

In [37]:
lr_opt_recall

0.6085897435897435
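
To have both penalties actually evaluated, one option is a solver that supports them. The sketch below is not the notebook's run: it uses synthetic data and `solver="liblinear"` as an assumed alternative, and it checks that every candidate in the grid received a score.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic imbalanced data (about 10% positives), used only for illustration
X_demo, y_demo = make_classification(
    n_samples=600, n_features=6, weights=[0.9, 0.1], random_state=0
)

param_grid = {"C": np.logspace(-3, 3, 7), "penalty": ["l1", "l2"]}

# liblinear is one of the solvers that accepts both the l1 and the l2 penalty
grid_demo = GridSearchCV(
    LogisticRegression(solver="liblinear"), param_grid, scoring="recall", cv=5
)
grid_demo.fit(X_demo, y_demo)

mean_scores = grid_demo.cv_results_["mean_test_score"]
print(grid_demo.best_params_)
print(grid_demo.best_score_)
print("any failed candidates:", np.isnan(mean_scores).any())
```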

<a id='iso_for_gs_cv'></a><a href='#top'>Top</a>
<h2>Isolation Forest Model - GridSearchCV</h2>

In [38]:
# https://scikit-learn.org/stable/modules/model_evaluation.html#implementing-your-own-scoring-object
# https://stackoverflow.com/questions/58186702/using-gridsearchcv-with-isolationforest-for-finding-outliers
# 
# tuned = {'n_estimators':[70,80], 'max_samples':['auto'],
#      'contamination':['legacy'], 'max_features':[1],
#      'bootstrap':[True], 'n_jobs':[None,1,2],
#      'random_state':[None,1,], 'verbose':[0,1,2], 'warm_start':[True]}  

# tuned = {'n_estimators':[70,80]} # - this works
# tuned = {'max_samples':['auto']} # - this works
# tuned = {'contamination':['legacy']} # - TypeError: can't multiply sequence by non-int of type 'float'
# tuned = {'max_features':[1]} # - this works
# tuned = {'bootstrap':[True]} # - this works
# tuned = {'n_jobs':[None,1,2]} # - this works
# tuned = {'n_jobs':[-1]} # - this works
# tuned = {'random_state':[None,1,]} # - this works
# tuned = {'verbose':[0,1,2]} # - this works
# tuned = {'warm_start':[True]} # - this works

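# Note: n_jobs and verbose do not change the fitted model; listing several values only
# multiplies the number of candidates, and verbose > 0 produces the log output below.
tuned = {'n_estimators':[70,80], 'max_samples':['auto'],
         'max_features':[1],'bootstrap':[True], 'n_jobs':[None,1,2],
         'random_state':[None,1,], 'verbose':[0,1,2], 'warm_start':[True]}  

def scorer_f(estimator, X):   # custom scorer: mean anomaly score (lower = more abnormal)
    return np.mean(estimator.score_samples(X))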

isolation_forest = GridSearchCV(IsolationForest(), tuned, scoring=scorer_f)
model = isolation_forest.fit(x_train)

[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.2s finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


Building estimator 1 of 70 for this parallel run (total 70)...
Building estimator 2 of 70 for this parallel run (total 70)...
Building estimator 3 of 70 for this parallel run (total 70)...
Building estimator 4 of 70 for this parallel run (total 70)...
Building estimator 5 of 70 for this parallel run (total 70)...
Building estimator 6 of 70 for this parallel run (total 70)...
Building estimator 7 of 70 for this parallel run (total 70)...
Building estimator 8 of 70 for this parallel run (total 70)...
Building estimator 9 of 70 for this parallel run (total 70)...
Building estimator 10 of 70 for this parallel run (total 70)...
Building estimator 11 of 70 for this parallel run (total 70)...
Building estimator 12 of 70 for this parallel run (total 70)...
Building estimator 13 of 70 for this parallel run (total 70)...
Building estimator 14 of 70 for this parallel run (total 70)...
Building estimator 15 of 70 for this parallel run (total 70)...
Building estimator 16 of 70 for this parallel run

[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.2s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.2s finished
[Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done   2 out of   2 | elapsed:    0.1s remaining:    0.0s
[Parallel(n_jobs=2)]: Done   2 out of   2 | elapsed:    0.1s finished

Building estimator 1 of 80 for this parallel run (total 80)...
Building estimator 2 of 80 for this parallel run (total 80)...
Building estimator 3 of 80 for this parallel run (total 80)...
Building estimator 4 of 80 for this parallel run (total 80)...
Building estimator 5 of 80 for this parallel run (total 80)...
Building estimator 6 of 80 for this parallel run (total 80)...
Building estimator 7 of 80 for this parallel run (total 80)...
Building estimator 8 of 80 for this parallel run (total 80)...
Building estimator 9 of 80 for this parallel run (total 80)...
Building estimator 10 of 80 for this parallel run (total 80)...
Building estimator 11 of 80 for this parallel run (total 80)...
Building estimator 12 of 80 for this parallel run (total 80)...
Building estimator 13 of 80 for this parallel run (total 80)...
Building estimator 14 of 80 for this parallel run (total 80)...
Building estimator 15 of 80 for this parallel run (total 80)...
Building estimator 16 of 80 for this parallel run

[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.3s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.3s finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


Building estimator 1 of 80 for this parallel run (total 80)...
Building estimator 2 of 80 for this parallel run (total 80)...
Building estimator 3 of 80 for this parallel run (total 80)...
Building estimator 4 of 80 for this parallel run (total 80)...
Building estimator 5 of 80 for this parallel run (total 80)...
Building estimator 6 of 80 for this parallel run (total 80)...
Building estimator 7 of 80 for this parallel run (total 80)...
Building estimator 8 of 80 for this parallel run (total 80)...
Building estimator 9 of 80 for this parallel run (total 80)...
Building estimator 10 of 80 for this parallel run (total 80)...
Building estimator 11 of 80 for this parallel run (total 80)...
Building estimator 12 of 80 for this parallel run (total 80)...
Building estimator 13 of 80 for this parallel run (total 80)...
Building estimator 14 of 80 for this parallel run (total 80)...
Building estimator 15 of 80 for this parallel run (total 80)...
Building estimator 16 of 80 for this parallel run

[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.3s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.3s finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.2s finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.3s finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.2s finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.2s finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.2s finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


Building estimator 1 of 80 for this parallel run (total 80)...
Building estimator 2 of 80 for this parallel run (total 80)...
Building estimator 3 of 80 for this parallel run (total 80)...
Building estimator 4 of 80 for this parallel run (total 80)...
Building estimator 5 of 80 for this parallel run (total 80)...
Building estimator 6 of 80 for this parallel run (total 80)...
Building estimator 7 of 80 for this parallel run (total 80)...
Building estimator 8 of 80 for this parallel run (total 80)...
Building estimator 9 of 80 for this parallel run (total 80)...
Building estimator 10 of 80 for this parallel run (total 80)...
Building estimator 11 of 80 for this parallel run (total 80)...
Building estimator 12 of 80 for this parallel run (total 80)...
Building estimator 13 of 80 for this parallel run (total 80)...
Building estimator 14 of 80 for this parallel run (total 80)...
Building estimator 15 of 80 for this parallel run (total 80)...
Building estimator 16 of 80 for this parallel run

[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.3s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.3s finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


Building estimator 1 of 80 for this parallel run (total 80)...
Building estimator 2 of 80 for this parallel run (total 80)...
Building estimator 3 of 80 for this parallel run (total 80)...
Building estimator 4 of 80 for this parallel run (total 80)...
Building estimator 5 of 80 for this parallel run (total 80)...
Building estimator 6 of 80 for this parallel run (total 80)...
Building estimator 7 of 80 for this parallel run (total 80)...
Building estimator 8 of 80 for this parallel run (total 80)...
Building estimator 9 of 80 for this parallel run (total 80)...
Building estimator 10 of 80 for this parallel run (total 80)...
Building estimator 11 of 80 for this parallel run (total 80)...
Building estimator 12 of 80 for this parallel run (total 80)...
Building estimator 13 of 80 for this parallel run (total 80)...
Building estimator 14 of 80 for this parallel run (total 80)...
Building estimator 15 of 80 for this parallel run (total 80)...
Building estimator 16 of 80 for this parallel run

[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.2s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.2s finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


Building estimator 1 of 80 for this parallel run (total 80)...
Building estimator 2 of 80 for this parallel run (total 80)...
Building estimator 3 of 80 for this parallel run (total 80)...
Building estimator 4 of 80 for this parallel run (total 80)...
Building estimator 5 of 80 for this parallel run (total 80)...
Building estimator 6 of 80 for this parallel run (total 80)...
Building estimator 7 of 80 for this parallel run (total 80)...
Building estimator 8 of 80 for this parallel run (total 80)...
Building estimator 9 of 80 for this parallel run (total 80)...
Building estimator 10 of 80 for this parallel run (total 80)...
Building estimator 11 of 80 for this parallel run (total 80)...
Building estimator 12 of 80 for this parallel run (total 80)...
Building estimator 13 of 80 for this parallel run (total 80)...
Building estimator 14 of 80 for this parallel run (total 80)...
Building estimator 15 of 80 for this parallel run (total 80)...
Building estimator 16 of 80 for this parallel run

[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.3s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.3s finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


Building estimator 1 of 80 for this parallel run (total 80)...
Building estimator 2 of 80 for this parallel run (total 80)...
Building estimator 3 of 80 for this parallel run (total 80)...
Building estimator 4 of 80 for this parallel run (total 80)...
Building estimator 5 of 80 for this parallel run (total 80)...
Building estimator 6 of 80 for this parallel run (total 80)...
Building estimator 7 of 80 for this parallel run (total 80)...
Building estimator 8 of 80 for this parallel run (total 80)...
Building estimator 9 of 80 for this parallel run (total 80)...
Building estimator 10 of 80 for this parallel run (total 80)...
Building estimator 11 of 80 for this parallel run (total 80)...
Building estimator 12 of 80 for this parallel run (total 80)...
Building estimator 13 of 80 for this parallel run (total 80)...
Building estimator 14 of 80 for this parallel run (total 80)...
Building estimator 15 of 80 for this parallel run (total 80)...
Building estimator 16 of 80 for this parallel run

[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.3s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.3s finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


Building estimator 1 of 80 for this parallel run (total 80)...
Building estimator 2 of 80 for this parallel run (total 80)...
Building estimator 3 of 80 for this parallel run (total 80)...
Building estimator 4 of 80 for this parallel run (total 80)...
Building estimator 5 of 80 for this parallel run (total 80)...
Building estimator 6 of 80 for this parallel run (total 80)...
Building estimator 7 of 80 for this parallel run (total 80)...
Building estimator 8 of 80 for this parallel run (total 80)...
Building estimator 9 of 80 for this parallel run (total 80)...
Building estimator 10 of 80 for this parallel run (total 80)...
Building estimator 11 of 80 for this parallel run (total 80)...
Building estimator 12 of 80 for this parallel run (total 80)...
Building estimator 13 of 80 for this parallel run (total 80)...
Building estimator 14 of 80 for this parallel run (total 80)...
Building estimator 15 of 80 for this parallel run (total 80)...
Building estimator 16 of 80 for this parallel run

[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.3s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.3s finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.2s finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.2s finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.2s finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.2s finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.2s finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


Building estimator 1 of 80 for this parallel run (total 80)...
Building estimator 2 of 80 for this parallel run (total 80)...
Building estimator 3 of 80 for this parallel run (total 80)...
Building estimator 4 of 80 for this parallel run (total 80)...
Building estimator 5 of 80 for this parallel run (total 80)...
Building estimator 6 of 80 for this parallel run (total 80)...
Building estimator 7 of 80 for this parallel run (total 80)...
Building estimator 8 of 80 for this parallel run (total 80)...
Building estimator 9 of 80 for this parallel run (total 80)...
Building estimator 10 of 80 for this parallel run (total 80)...
Building estimator 11 of 80 for this parallel run (total 80)...
Building estimator 12 of 80 for this parallel run (total 80)...
Building estimator 13 of 80 for this parallel run (total 80)...
Building estimator 14 of 80 for this parallel run (total 80)...
Building estimator 15 of 80 for this parallel run (total 80)...
Building estimator 16 of 80 for this parallel run

[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.3s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.3s finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


Building estimator 1 of 80 for this parallel run (total 80)...
Building estimator 2 of 80 for this parallel run (total 80)...
Building estimator 3 of 80 for this parallel run (total 80)...
Building estimator 4 of 80 for this parallel run (total 80)...
Building estimator 5 of 80 for this parallel run (total 80)...
Building estimator 6 of 80 for this parallel run (total 80)...
Building estimator 7 of 80 for this parallel run (total 80)...
Building estimator 8 of 80 for this parallel run (total 80)...
Building estimator 9 of 80 for this parallel run (total 80)...
Building estimator 10 of 80 for this parallel run (total 80)...
Building estimator 11 of 80 for this parallel run (total 80)...
Building estimator 12 of 80 for this parallel run (total 80)...
Building estimator 13 of 80 for this parallel run (total 80)...
Building estimator 14 of 80 for this parallel run (total 80)...
Building estimator 15 of 80 for this parallel run (total 80)...
Building estimator 16 of 80 for this parallel run

[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.3s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.3s finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


Building estimator 1 of 80 for this parallel run (total 80)...
Building estimator 2 of 80 for this parallel run (total 80)...
Building estimator 3 of 80 for this parallel run (total 80)...
Building estimator 4 of 80 for this parallel run (total 80)...
Building estimator 5 of 80 for this parallel run (total 80)...
Building estimator 6 of 80 for this parallel run (total 80)...
Building estimator 7 of 80 for this parallel run (total 80)...
Building estimator 8 of 80 for this parallel run (total 80)...
Building estimator 9 of 80 for this parallel run (total 80)...
Building estimator 10 of 80 for this parallel run (total 80)...
Building estimator 11 of 80 for this parallel run (total 80)...
Building estimator 12 of 80 for this parallel run (total 80)...
Building estimator 13 of 80 for this parallel run (total 80)...
Building estimator 14 of 80 for this parallel run (total 80)...
Building estimator 15 of 80 for this parallel run (total 80)...
Building estimator 16 of 80 for this parallel run

[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.2s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.2s finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


Building estimator 1 of 80 for this parallel run (total 80)...
Building estimator 2 of 80 for this parallel run (total 80)...
Building estimator 3 of 80 for this parallel run (total 80)...
Building estimator 4 of 80 for this parallel run (total 80)...
Building estimator 5 of 80 for this parallel run (total 80)...
Building estimator 6 of 80 for this parallel run (total 80)...
Building estimator 7 of 80 for this parallel run (total 80)...
Building estimator 8 of 80 for this parallel run (total 80)...
Building estimator 9 of 80 for this parallel run (total 80)...
Building estimator 10 of 80 for this parallel run (total 80)...
Building estimator 11 of 80 for this parallel run (total 80)...
Building estimator 12 of 80 for this parallel run (total 80)...
Building estimator 13 of 80 for this parallel run (total 80)...
Building estimator 14 of 80 for this parallel run (total 80)...
Building estimator 15 of 80 for this parallel run (total 80)...
Building estimator 16 of 80 for this parallel run

[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.2s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.2s finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


Building estimator 1 of 80 for this parallel run (total 80)...
Building estimator 2 of 80 for this parallel run (total 80)...
Building estimator 3 of 80 for this parallel run (total 80)...
Building estimator 4 of 80 for this parallel run (total 80)...
Building estimator 5 of 80 for this parallel run (total 80)...
Building estimator 6 of 80 for this parallel run (total 80)...
Building estimator 7 of 80 for this parallel run (total 80)...
Building estimator 8 of 80 for this parallel run (total 80)...
Building estimator 9 of 80 for this parallel run (total 80)...
Building estimator 10 of 80 for this parallel run (total 80)...
Building estimator 11 of 80 for this parallel run (total 80)...
Building estimator 12 of 80 for this parallel run (total 80)...
Building estimator 13 of 80 for this parallel run (total 80)...
Building estimator 14 of 80 for this parallel run (total 80)...
Building estimator 15 of 80 for this parallel run (total 80)...
Building estimator 16 of 80 for this parallel run

[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.3s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.3s finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.2s finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.3s finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.2s finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.2s finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.2s finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


Building estimator 1 of 80 for this parallel run (total 80)...
Building estimator 2 of 80 for this parallel run (total 80)...
Building estimator 3 of 80 for this parallel run (total 80)...
Building estimator 4 of 80 for this parallel run (total 80)...
Building estimator 5 of 80 for this parallel run (total 80)...
Building estimator 6 of 80 for this parallel run (total 80)...
Building estimator 7 of 80 for this parallel run (total 80)...
Building estimator 8 of 80 for this parallel run (total 80)...
Building estimator 9 of 80 for this parallel run (total 80)...
Building estimator 10 of 80 for this parallel run (total 80)...
Building estimator 11 of 80 for this parallel run (total 80)...
Building estimator 12 of 80 for this parallel run (total 80)...
Building estimator 13 of 80 for this parallel run (total 80)...
Building estimator 14 of 80 for this parallel run (total 80)...
Building estimator 15 of 80 for this parallel run (total 80)...
Building estimator 16 of 80 for this parallel run

[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.3s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.3s finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


Building estimator 1 of 80 for this parallel run (total 80)...
Building estimator 2 of 80 for this parallel run (total 80)...
Building estimator 3 of 80 for this parallel run (total 80)...
Building estimator 4 of 80 for this parallel run (total 80)...
Building estimator 5 of 80 for this parallel run (total 80)...
Building estimator 6 of 80 for this parallel run (total 80)...
Building estimator 7 of 80 for this parallel run (total 80)...
Building estimator 8 of 80 for this parallel run (total 80)...
Building estimator 9 of 80 for this parallel run (total 80)...
Building estimator 10 of 80 for this parallel run (total 80)...
Building estimator 11 of 80 for this parallel run (total 80)...
Building estimator 12 of 80 for this parallel run (total 80)...
Building estimator 13 of 80 for this parallel run (total 80)...
Building estimator 14 of 80 for this parallel run (total 80)...
Building estimator 15 of 80 for this parallel run (total 80)...
Building estimator 16 of 80 for this parallel run

[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.3s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.3s finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


Building estimator 1 of 80 for this parallel run (total 80)...
Building estimator 2 of 80 for this parallel run (total 80)...
Building estimator 3 of 80 for this parallel run (total 80)...
Building estimator 4 of 80 for this parallel run (total 80)...
Building estimator 5 of 80 for this parallel run (total 80)...
Building estimator 6 of 80 for this parallel run (total 80)...
Building estimator 7 of 80 for this parallel run (total 80)...
Building estimator 8 of 80 for this parallel run (total 80)...
Building estimator 9 of 80 for this parallel run (total 80)...
Building estimator 10 of 80 for this parallel run (total 80)...
Building estimator 11 of 80 for this parallel run (total 80)...
Building estimator 12 of 80 for this parallel run (total 80)...
Building estimator 13 of 80 for this parallel run (total 80)...
Building estimator 14 of 80 for this parallel run (total 80)...
Building estimator 15 of 80 for this parallel run (total 80)...
Building estimator 16 of 80 for this parallel run

[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.3s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.3s finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


Building estimator 1 of 80 for this parallel run (total 80)...
Building estimator 2 of 80 for this parallel run (total 80)...
Building estimator 3 of 80 for this parallel run (total 80)...
Building estimator 4 of 80 for this parallel run (total 80)...
Building estimator 5 of 80 for this parallel run (total 80)...
Building estimator 6 of 80 for this parallel run (total 80)...
Building estimator 7 of 80 for this parallel run (total 80)...
Building estimator 8 of 80 for this parallel run (total 80)...
Building estimator 9 of 80 for this parallel run (total 80)...
Building estimator 10 of 80 for this parallel run (total 80)...
Building estimator 11 of 80 for this parallel run (total 80)...
Building estimator 12 of 80 for this parallel run (total 80)...
Building estimator 13 of 80 for this parallel run (total 80)...
Building estimator 14 of 80 for this parallel run (total 80)...
Building estimator 15 of 80 for this parallel run (total 80)...
Building estimator 16 of 80 for this parallel run

[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.3s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.3s finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


Building estimator 1 of 80 for this parallel run (total 80)...
Building estimator 2 of 80 for this parallel run (total 80)...
Building estimator 3 of 80 for this parallel run (total 80)...
Building estimator 4 of 80 for this parallel run (total 80)...
Building estimator 5 of 80 for this parallel run (total 80)...
Building estimator 6 of 80 for this parallel run (total 80)...
Building estimator 7 of 80 for this parallel run (total 80)...
Building estimator 8 of 80 for this parallel run (total 80)...
Building estimator 9 of 80 for this parallel run (total 80)...
Building estimator 10 of 80 for this parallel run (total 80)...
Building estimator 11 of 80 for this parallel run (total 80)...
Building estimator 12 of 80 for this parallel run (total 80)...
Building estimator 13 of 80 for this parallel run (total 80)...
Building estimator 14 of 80 for this parallel run (total 80)...
Building estimator 15 of 80 for this parallel run (total 80)...
Building estimator 16 of 80 for this parallel run

[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.3s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.3s finished
[Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done   2 out of   2 | elapsed:    0.2s remaining:    0.0s
[Parallel(n_jobs=2)]: Done   2 out of   2 | elapsed:    0.2s finished
[Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done   2 out of   2 | elapsed:    0.2s remaining:    0.0s
[Parallel(n_jobs=2)]: Done   2 out of   2 | elapsed:    0.2s finished
[Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done   2 out of   2 | elapsed:    0.2s remaining:    0.0s
[Parallel(n_jobs=2)]: Done   2 out of   2 | elapsed:    0.2s finished
[Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done   2 out of   2 | elapsed:    0.2s remaining:    0.0s
[Parallel(n_jobs=2)]

Building estimator 1 of 70 for this parallel run (total 70)...
Building estimator 2 of 70 for this parallel run (total 70)...
Building estimator 3 of 70 for this parallel run (total 70)...
Building estimator 4 of 70 for this parallel run (total 70)...
Building estimator 5 of 70 for this parallel run (total 70)...
Building estimator 6 of 70 for this parallel run (total 70)...
Building estimator 7 of 70 for this parallel run (total 70)...
Building estimator 8 of 70 for this parallel run (total 70)...
Building estimator 9 of 70 for this parallel run (total 70)...
Building estimator 10 of 70 for this parallel run (total 70)...
Building estimator 11 of 70 for this parallel run (total 70)...
Building estimator 12 of 70 for this parallel run (total 70)...
Building estimator 13 of 70 for this parallel run (total 70)...
Building estimator 14 of 70 for this parallel run (total 70)...
Building estimator 15 of 70 for this parallel run (total 70)...
Building estimator 16 of 70 for this parallel run

[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.3s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.3s finished


In [39]:
print(model.best_params_)

{'bootstrap': True, 'max_features': 1, 'max_samples': 'auto', 'n_estimators': 70, 'n_jobs': None, 'random_state': None, 'verbose': 2, 'warm_start': True}


In [40]:
ifc=IsolationForest(bootstrap = True, max_features = 1, max_samples = "auto",
                    n_estimators = 70, n_jobs = None, random_state = None,
                    verbose = 1, warm_start = True)
ifc.fit(x_train)
ifc_yhat = ifc.predict(x_test)

# Remap the predictions: IsolationForest returns 1 for inliers and -1 for outliers.
# The metrics below expect 0 for valid and 1 for fraud.
ifc_yhat[ifc_yhat == 1] = 0
ifc_yhat[ifc_yhat == -1] = 1

[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.3s finished
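
The remapping above relies on the two assignments running in that order (1 to 0 first, then -1 to 1). A minimal sketch of a mapping that does not depend on assignment order is shown below. The helper name `to_fraud_labels` and the sample predictions are made up for illustration and are not part of the notebook. With NumPy arrays, `np.where(pred == -1, 1, 0)` should give the same result.

```python
def to_fraud_labels(predictions):
    """Map IsolationForest-style outputs (1 = inlier, -1 = outlier)
    to the label convention used by the metrics (0 = valid, 1 = fraud)."""
    return [1 if p == -1 else 0 for p in predictions]


# Made-up predictions, for illustration only
raw = [1, 1, -1, 1, -1]
labels = to_fraud_labels(raw)
print(labels)
```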


In [41]:
print(cl('Accuracy score of the Isolation Forest model is {}%'.format(round(accuracy_score(y_test, ifc_yhat) *100,1)), attrs = ['bold']))
print(cl('Precision score of the Isolation Forest model is {}%'.format(round(precision_score(y_test, ifc_yhat) *100,1)), attrs = ['bold']))
print(cl('Recall score of the Isolation Forest model is {}%'.format(round(recall_score(y_test, ifc_yhat) *100,1)), attrs = ['bold']))
print(cl('f1 score of the Isolation Forest model is {}%'.format(round(f1_score(y_test, ifc_yhat) *100,1)), attrs = ['bold']))

Accuracy score of the Isolation Forest model is 94.5%
Precision score of the Isolation Forest model is 2.6%
Recall score of the Isolation Forest model is 82.2%
f1 score of the Isolation Forest model is 5.0%
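
High accuracy together with very low precision is typical for a heavily imbalanced target: a modest share of false alarms among the many valid transactions is enough to swamp the few true fraud cases. The sketch below computes the four metrics directly from confusion-matrix counts. The counts are hypothetical, chosen only to illustrate the effect, and are not taken from the run above. The function name `binary_metrics` is likewise an assumption.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall, f1) from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return accuracy, precision, recall, f1


# Hypothetical counts: 100 fraud cases among 100,000 transactions,
# 80 of them caught, at the cost of 3,000 false alarms.
acc, prec, rec, f1 = binary_metrics(tp=80, fp=3000, fn=20, tn=96900)
print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f} f1={f1:.3f}")
```

Because F1 is the harmonic mean of precision and recall, it stays close to the smaller of the two, which is why a high recall alone does not lift it.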


<a id='ran_for_gs_cv'></a><a href='#top'>Top</a>
<h2>Random Forests - GridSearchCV</h2>

In [None]:
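# Base estimator; max_depth set here is overridden by the grid below
rf = RandomForestClassifier(max_depth = 4)

# 2 * 2 * 5 * 2 = 40 candidate combinations, each scored with 10-fold CV.
# 'sqrt' also covers the former 'auto' alias, which meant 'sqrt' for classifiers.
parameters = { 
    'n_estimators': [200, 500],
    'max_features': ['sqrt', 'log2'],
    'max_depth' : [4,5,6,7,8],
    'criterion' :['gini', 'entropy']
}
rf_gs_cv=GridSearchCV(estimator = rf, param_grid = parameters, scoring = 'recall',cv=10, n_jobs = -1)
rf_gs_cv.fit(x_train,y_train)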

In [None]:
rf_opt_recall = rf_gs_cv.best_score_
rf_opt_recall_param = rf_gs_cv.best_params_ 

<a id='knn_gs_cv'></a><a href='#top'>Top</a>
<h2>K-Nearest Neighbors - GridSearchCV</h2>

In [None]:
# https://www.datasklr.com/select-classification-methods/k-nearest-neighbors
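n = 5
knn = KNeighborsClassifier(n_neighbors = n)

# Each entry is an explicit list of candidate values.
parameters_knn = {
    'n_neighbors': list(range(1, 10)),   # k = 1..9
    'leaf_size': [20, 40],               # affects tree build/query speed, not the neighbors found
    'p': (1,2),                          # only used by the minkowski metric
    'weights': ('uniform', 'distance'),
    'metric': ('minkowski', 'chebyshev')
}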
knn_gs_cv=GridSearchCV(estimator = knn, param_grid = parameters_knn, scoring = 'recall',cv=5, n_jobs = -1)
knn_gs_cv.fit(x_train,y_train)

In [None]:
knn_opt_recall = knn_gs_cv.best_score_
knn_opt_recall_param = knn_gs_cv.best_params_ 
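
`GridSearchCV` treats every entry in `param_grid` as an explicit sequence of candidate values and tries their Cartesian product. A tuple such as `(1, 10, 1)` therefore means the three candidates 1, 10 and 1, not a range from 1 to 10. The sketch below mimics that expansion with the standard library only. The helper `expand` and both example grids are illustrative assumptions, not scikit-learn code.

```python
from itertools import product


def expand(param_grid):
    """List every parameter combination of a dict-style grid,
    reading each value sequence as explicit candidates."""
    keys = list(param_grid)
    return [dict(zip(keys, combo))
            for combo in product(*(param_grid[k] for k in keys))]


# Tuple read as three explicit values (1 appears twice)
tuple_grid = {"n_neighbors": (1, 10, 1), "weights": ("uniform", "distance")}
# Explicit range of k = 1..9
range_grid = {"n_neighbors": list(range(1, 10)), "weights": ("uniform", "distance")}

tuple_combos = expand(tuple_grid)
range_combos = expand(range_grid)
print(len(tuple_combos), sorted({c["n_neighbors"] for c in tuple_combos}))
print(len(range_combos), sorted({c["n_neighbors"] for c in range_combos}))
```

Counting combinations this way before fitting is a cheap check on how long a search is likely to run, since each combination is fitted once per fold.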

<a id='svm_gs_cv'></a><a href='#top'>Top</a>
<h2>Support Vector Machine (SVM) - GridSearchCV</h2>

In [None]:
svm = SVC()

parameters_svm=[{"C":np.logspace(0,3,4), "kernel":['linear']},
                {"C":np.logspace(0,3,4), "kernel":['rbf'], 'gamma': np.arange(1, 10)/10}]

svm_gs_cv = GridSearchCV(estimator = svm, param_grid = parameters_svm, scoring = 'recall',cv=5, n_jobs = -1)
svm_gs_cv.fit(x_train,y_train)

In [None]:
svm_opt_recall = svm_gs_cv.best_score_
svm_opt_recall_param = svm_gs_cv.best_params_ 
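
When `param_grid` is a list of dicts, as in the SVM search above, each dict is expanded separately and the candidates are pooled. The rough sketch below reproduces the candidate values with plain Python and counts the resulting fits. The variable names are illustrative, and the fit count excludes the final refit on the full training set.

```python
# Same values as np.logspace(0, 3, 4): 1, 10, 100, 1000
c_values = [10.0 ** k for k in range(4)]
# Same values as np.arange(1, 10) / 10: 0.1, 0.2, ..., 0.9
gamma_values = [k / 10 for k in range(1, 10)]

linear_candidates = len(c_values)                   # linear kernel: C only
rbf_candidates = len(c_values) * len(gamma_values)  # rbf kernel: C x gamma
n_candidates = linear_candidates + rbf_candidates
n_fits = n_candidates * 5                           # cv=5 folds per candidate

print(c_values)
print(gamma_values)
print(n_candidates, n_fits)
```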

<a id='xgb_gs_cv'></a><a href='#top'>Top</a>
<h2>XGBoost - GridSearchCV</h2>

In [None]:
xgb = XGBClassifier(max_depth = 4)

parameters_xgb = {'nthread':[4], #when use hyperthread, xgboost may become slower
                  'objective':['binary:logistic'],
                  'learning_rate': [0.05], #so called `eta` value
                  'max_depth': [6],
                  'min_child_weight': [11],
                  'verbosity': [0], #suppress training messages (replaces the older `silent` flag)
                  'subsample': [0.8],
                  'colsample_bytree': [0.7],
                  'n_estimators': [5], #number of trees, change it to 1000 for better results
                  'missing':[-999],
                  'seed': [1337]}

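xgb_gs_cv = GridSearchCV(estimator = xgb, param_grid = parameters_xgb, scoring='recall',
                         cv=5, n_jobs = -1, verbose=2, refit=True)
# Run the search; best_score_ and best_params_ only exist after fitting
xgb_gs_cv.fit(x_train,y_train)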

In [None]:
xgb_opt_recall = xgb_gs_cv.best_score_
xgb_opt_recall_param = xgb_gs_cv.best_params_ 