# Hyperparameter Optimization

Prepared by [Ali Rifat Kaya](https://www.linkedin.com/in/alirifatkaya/)

# Table of Contents

1. [Import Libraries & Define Functions](#Import-Libraries-&-Define-Functions)
2. [Read Data](#Read-Data)
3. [Logistic Regression](#Logistic-Regression)
4. [Decision Tree Classifier](#Decision-Tree-Classifier)
5. [Random Forest Classifier](#Random-Forest-Classifier)
6. [Extra Trees Classifier](#Extra-Trees-Classifier)
7. [AdaBoost Classifier](#AdaBoost-Classifier)
8. [Gradient Boosting Classifier](#Gradient-Boosting-Classifier)
9. [XGBoost Classifier](#XGBoost-Classifier)
10. [KNN Classifier](#KNN-Classifier)
11. [Conclusion](#Conclusion)

# Import Libraries & Define Functions

In [1]:
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.ensemble import GradientBoostingClassifier
from xgboost import XGBClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import StratifiedKFold
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import scale
from sklearn.metrics import make_scorer
from sklearn.metrics import precision_recall_curve
from sklearn.metrics import auc
import pandas as pd
import numpy as np
import pickle
import matplotlib.pyplot as plt

In [2]:
def auc_pr(y_test, predict_proba):
    p, r, _ = precision_recall_curve(y_test, predict_proba)
    return auc(r, p)  # Area Under Curve score
# make_scorer wraps a metric function so that it can be passed as the
# `scoring` argument of model selection tools such as GridSearchCV.
# needs_proba=True makes the scorer use predicted probabilities
# (scikit-learn 1.4+ replaces this flag with response_method='predict_proba')
aucpr = make_scorer(auc_pr, needs_proba=True)
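
As a quick sanity check of the metric, the sketch below runs the same area-under-the-precision-recall-curve computation on a tiny hand-made example. The arrays `y_true`, `perfect_scores` and `mixed_scores` are made up for illustration and are not part of the credit card data. With perfectly separated scores the area should be 1, and it should drop once a negative is ranked above the positives.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve, auc


def auc_pr(y_true, predict_proba):
    # same computation as the notebook's scorer function
    p, r, _ = precision_recall_curve(y_true, predict_proba)
    return auc(r, p)


# hypothetical toy labels and predicted probabilities of the positive class
y_true = np.array([0, 0, 1, 1])
perfect_scores = np.array([0.1, 0.2, 0.8, 0.9])  # positives ranked on top
mixed_scores = np.array([0.1, 0.9, 0.2, 0.8])    # one negative ranked first

perfect_area = auc_pr(y_true, perfect_scores)
mixed_area = auc_pr(y_true, mixed_scores)
print(perfect_area, mixed_area)
```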

# Read Data

In [3]:
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
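
With so few positive cases, stratification matters: each fold should receive its share of the rare class. The sketch below illustrates this on made-up labels (`y_toy` and `X_toy` are assumptions for illustration, not the credit card data), using the same splitter settings as above.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# hypothetical imbalanced labels: 10 positives out of 100 samples
y_toy = np.array([1] * 10 + [0] * 90)
X_toy = np.zeros((100, 1))  # features are irrelevant for the split itself

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

# count the positives that land in the validation part of each fold
positives_per_fold = [int(y_toy[test_idx].sum())
                      for _, test_idx in cv.split(X_toy, y_toy)]
print(positives_per_fold)
```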

In [4]:
# reads data into a dataframe
raw_data = pd.read_csv('creditcard_new.csv', header=0)
# work on a copy so that the original dataframe stays untouched
df = raw_data.copy()
# input matrix and target array
X = df.drop(['Class', 'Hours'], axis=1).values  # define the input matrix
y = df.Class.values  # the label array (classes)
# split the data into the training data (70%) and the test data (30%)
X_train, X_test, y_train, y_test = train_test_split(X,
                                                    y,
                                                    test_size=0.3,
                                                    random_state=1)
X_train_scaled = scale(X_train)

In [5]:
print('Training data has {} fraudulent transactions.'.format(y_train.sum()))
print('Test data has {} fraudulent transactions.'.format(y_test.sum()))
print('\nDistribution of positive class in training data: {}'.format(
    round((y_train[y_train == 1].size/y_train.size), 4)))
print('Distribution of positive class in test data: {}'.format(
    round((y_test[y_test == 1].size/y_test.size), 4)))

Training data has 337 fraudulent transactions.
Test data has 128 fraudulent transactions.

Distribution of positive class in training data: 0.0017
Distribution of positive class in test data: 0.0015
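
The positive-class share differs slightly between the two sets (0.0017 versus 0.0015) because the split above is purely random. If equal shares were desired, one option, which this notebook does not use, is the `stratify` argument of `train_test_split`. The sketch below shows the effect on made-up labels (`X_toy` and `y_toy` are assumptions for illustration). Changing the split this way would also change every score reported in this notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# hypothetical imbalanced data: 20 positives out of 1000 samples
X_toy = np.arange(1000).reshape(-1, 1)
y_toy = np.array([1] * 20 + [0] * 980)

# stratify=y_toy keeps the class proportions the same in both parts
_, _, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.3, random_state=1, stratify=y_toy)

train_share = y_tr.mean()
test_share = y_te.mean()
print(round(train_share, 4), round(test_share, 4))
```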


# Logistic Regression

In [6]:
lr = LogisticRegression()
# define hyperparameter space
solvers = ['newton-cg', 'lbfgs', 'sag', 'saga']
c_values = [100, 10, 1.0, 0.1, 0.01]
class_weight = ['balanced', None]
# define grid search
grid = dict(solver=solvers, C=c_values, class_weight=class_weight)
grid_search = GridSearchCV(
    estimator=lr, param_grid=grid, n_jobs=-1, cv=cv, scoring=aucpr, error_score=0)
grid_result = grid_search.fit(X_train_scaled, y_train)
# summarize results
print('{:7}: '.format('Best'), end='')
print("%.3f using %s" %
      (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
rest_scores = []
for mean, stdev, param in zip(means, stds, params):
    if param == {'C': 1.0, 'class_weight': None, 'solver': 'lbfgs'}:
        print('Default: %.3f using %s' % (mean, param))
    else:
        rest_scores.append('%.3f (%.3f) with: %s' % (mean, stdev, param))
print('-'*81)
_ = [print(score) for score in rest_scores]

Best   : 0.758 using {'C': 0.1, 'class_weight': None, 'solver': 'lbfgs'}
Default: 0.756 using {'C': 1.0, 'class_weight': None, 'solver': 'lbfgs'}
---------------------------------------------------------------------------------
0.750 (0.025) with: {'C': 100, 'class_weight': 'balanced', 'solver': 'newton-cg'}
0.750 (0.025) with: {'C': 100, 'class_weight': 'balanced', 'solver': 'lbfgs'}
0.746 (0.036) with: {'C': 100, 'class_weight': 'balanced', 'solver': 'sag'}
0.739 (0.043) with: {'C': 100, 'class_weight': 'balanced', 'solver': 'saga'}
0.756 (0.029) with: {'C': 100, 'class_weight': None, 'solver': 'newton-cg'}
0.756 (0.029) with: {'C': 100, 'class_weight': None, 'solver': 'lbfgs'}
0.756 (0.028) with: {'C': 100, 'class_weight': None, 'solver': 'sag'}
0.757 (0.027) with: {'C': 100, 'class_weight': None, 'solver': 'saga'}
0.749 (0.025) with: {'C': 10, 'class_weight': 'balanced', 'solver': 'newton-cg'}
0.749 (0.025) with: {'C': 10, 'class_weight': 'balanced', 'solver': 'lbfgs'}
0.731 (0.055
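
The printed list becomes long as the grid grows. One alternative, shown here only as a sketch, is to load `cv_results_` into a pandas DataFrame and sort it by the mean test score. The data comes from `make_classification`, and the small grid and the built-in `average_precision` scorer are stand-ins chosen to keep the example fast; none of these are the notebook's actual settings.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# synthetic, mildly imbalanced data used only for illustration
X_toy, y_toy = make_classification(n_samples=300, n_features=5,
                                   weights=[0.8, 0.2], random_state=1)

cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=1)
grid = dict(C=[0.1, 1.0, 10], class_weight=['balanced', None])
grid_search = GridSearchCV(LogisticRegression(max_iter=1000), param_grid=grid,
                           cv=cv, scoring='average_precision')
grid_result = grid_search.fit(X_toy, y_toy)

# one row per parameter combination, best mean score first
results = (pd.DataFrame(grid_result.cv_results_)
           [['params', 'mean_test_score', 'std_test_score']]
           .sort_values('mean_test_score', ascending=False)
           .reset_index(drop=True))
print(results)
```

The first row of `results` then corresponds to `best_params_`, and the default configuration can be looked up by filtering on the `params` column.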

# Decision Tree Classifier

In [7]:
dt = DecisionTreeClassifier()
# define hyperparameter space
max_features = ['sqrt', 'log2', None]
class_weight = ['balanced', None]
# define grid search
grid = dict(max_features=max_features,
            class_weight=class_weight)
grid_search = GridSearchCV(estimator=dt,
                           param_grid=grid,
                           n_jobs=-1,
                           cv=cv,
                           scoring=aucpr)
grid_result = grid_search.fit(X_train, y_train)
# summarize results
print('{:7}: '.format('Best'), end='')
print("%.3f using %s" %
      (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
rest_scores = []
for mean, stdev, param in zip(means, stds, params):
    if param == {'class_weight': None, 'max_features': None}:
        print('Default: %.3f using %s' % (mean, param))
    else:
        rest_scores.append('%.3f (%.3f) with: %s' % (mean, stdev, param))
print('-' * 81)
_ = [print(score) for score in rest_scores]

Best   : 0.761 using {'class_weight': None, 'max_features': 'log2'}
Default: 0.734 using {'class_weight': None, 'max_features': None}
---------------------------------------------------------------------------------
0.750 (0.030) with: {'class_weight': 'balanced', 'max_features': 'sqrt'}
0.725 (0.022) with: {'class_weight': 'balanced', 'max_features': 'log2'}
0.746 (0.051) with: {'class_weight': 'balanced', 'max_features': None}
0.727 (0.064) with: {'class_weight': None, 'max_features': 'sqrt'}
0.761 (0.033) with: {'class_weight': None, 'max_features': 'log2'}


# Random Forest Classifier

In [8]:
rf = RandomForestClassifier()
# define hyperparameter space
n_estimators = [10, 100, 1000]
max_features = ['sqrt', 'log2', None]
class_weight = ['balanced', None]
# define grid search
grid = dict(n_estimators=n_estimators,
            max_features=max_features,
            class_weight=class_weight)
grid_search = GridSearchCV(estimator=rf,
                           param_grid=grid,
                           n_jobs=-1,
                           cv=cv,
                           scoring=aucpr)
grid_result = grid_search.fit(X_train, y_train)
# summarize results
print('{:7}: '.format('Best'), end='')
print("%.3f using %s" %
      (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
rest_scores = []
for mean, stdev, param in zip(means, stds, params):
    if param == {'class_weight': None, 'max_features': 'sqrt', 'n_estimators': 100}:
        print('Default: %.3f using %s' % (mean, param))
    else:
        rest_scores.append('%.3f (%.3f) with: %s' % (mean, stdev, param))
print('-' * 81)
_ = [print(score) for score in rest_scores]

Best   : 0.860 using {'class_weight': 'balanced', 'max_features': 'log2', 'n_estimators': 1000}
Default: 0.854 using {'class_weight': None, 'max_features': 'sqrt', 'n_estimators': 100}
---------------------------------------------------------------------------------
0.844 (0.027) with: {'class_weight': 'balanced', 'max_features': 'sqrt', 'n_estimators': 10}
0.857 (0.032) with: {'class_weight': 'balanced', 'max_features': 'sqrt', 'n_estimators': 100}
0.858 (0.034) with: {'class_weight': 'balanced', 'max_features': 'sqrt', 'n_estimators': 1000}
0.858 (0.029) with: {'class_weight': 'balanced', 'max_features': 'log2', 'n_estimators': 10}
0.853 (0.037) with: {'class_weight': 'balanced', 'max_features': 'log2', 'n_estimators': 100}
0.860 (0.029) with: {'class_weight': 'balanced', 'max_features': 'log2', 'n_estimators': 1000}
0.830 (0.035) with: {'class_weight': 'balanced', 'max_features': None, 'n_estimators': 10}
0.841 (0.034) with: {'class_weight': 'balanced', 'max_features': None, 'n_esti

# Extra Trees Classifier

In [9]:
et = ExtraTreesClassifier()
# reuse the hyperparameter grid defined for RandomForestClassifier above
grid_search = GridSearchCV(estimator=et,
                           param_grid=grid,
                           n_jobs=-1,
                           cv=cv,
                           scoring=aucpr)
grid_result = grid_search.fit(X_train, y_train)
# summarize results
print('{:7}: '.format('Best'), end='')
print("%.3f using %s" %
      (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
rest_scores = []
for mean, stdev, param in zip(means, stds, params):
    if param == {'class_weight': None, 'max_features': 'sqrt', 'n_estimators': 100}:
        print('Default: %.3f using %s' % (mean, param))
    else:
        rest_scores.append('%.3f (%.3f) with: %s' % (mean, stdev, param))
print('-' * 81)
_ = [print(score) for score in rest_scores]

Best   : 0.864 using {'class_weight': 'balanced', 'max_features': None, 'n_estimators': 1000}
Default: 0.861 using {'class_weight': None, 'max_features': 'sqrt', 'n_estimators': 100}
---------------------------------------------------------------------------------
0.855 (0.038) with: {'class_weight': 'balanced', 'max_features': 'sqrt', 'n_estimators': 10}
0.861 (0.027) with: {'class_weight': 'balanced', 'max_features': 'sqrt', 'n_estimators': 100}
0.863 (0.030) with: {'class_weight': 'balanced', 'max_features': 'sqrt', 'n_estimators': 1000}
0.853 (0.034) with: {'class_weight': 'balanced', 'max_features': 'log2', 'n_estimators': 10}
0.860 (0.029) with: {'class_weight': 'balanced', 'max_features': 'log2', 'n_estimators': 100}
0.862 (0.029) with: {'class_weight': 'balanced', 'max_features': 'log2', 'n_estimators': 1000}
0.858 (0.027) with: {'class_weight': 'balanced', 'max_features': None, 'n_estimators': 10}
0.863 (0.032) with: {'class_weight': 'balanced', 'max_features': None, 'n_estima

# AdaBoost Classifier

In [10]:
adaboost = AdaBoostClassifier()
# define hyperparameter space
n_estimators = [10, 50, 100, 1000]
learning_rate = [0.0001, 0.001, 0.01, 0.1, 1]
# define grid search
grid = dict(n_estimators=n_estimators,
            learning_rate=learning_rate)
grid_search = GridSearchCV(estimator=adaboost,
                           param_grid=grid,
                           n_jobs=-1,
                           cv=cv,
                           scoring=aucpr)
grid_result = grid_search.fit(X_train, y_train)
# summarize results
print('{:7}: '.format('Best'), end='')
print("%.3f using %s" %
      (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
rest_scores = []
for mean, stdev, param in zip(means, stds, params):
    if param == {'learning_rate': 1, 'n_estimators': 50}:
        print('Default: %.3f using %s' % (mean, param))
    else:
        rest_scores.append('%.3f (%.3f) with: %s' % (mean, stdev, param))
print('-' * 81)
_ = [print(score) for score in rest_scores]

Best   : 0.829 using {'learning_rate': 0.1, 'n_estimators': 1000}
Default: 0.761 using {'learning_rate': 1, 'n_estimators': 50}
---------------------------------------------------------------------------------
0.732 (0.030) with: {'learning_rate': 0.0001, 'n_estimators': 10}
0.732 (0.030) with: {'learning_rate': 0.0001, 'n_estimators': 50}
0.732 (0.030) with: {'learning_rate': 0.0001, 'n_estimators': 100}
0.739 (0.040) with: {'learning_rate': 0.0001, 'n_estimators': 1000}
0.732 (0.030) with: {'learning_rate': 0.001, 'n_estimators': 10}
0.733 (0.031) with: {'learning_rate': 0.001, 'n_estimators': 50}
0.739 (0.040) with: {'learning_rate': 0.001, 'n_estimators': 100}
0.739 (0.036) with: {'learning_rate': 0.001, 'n_estimators': 1000}
0.732 (0.032) with: {'learning_rate': 0.01, 'n_estimators': 10}
0.747 (0.030) with: {'learning_rate': 0.01, 'n_estimators': 50}
0.740 (0.035) with: {'learning_rate': 0.01, 'n_estimators': 100}
0.766 (0.031) with: {'learning_rate': 0.01, 'n_estimators': 1000}
0

# Gradient Boosting Classifier

In [11]:
gb = GradientBoostingClassifier()
# define hyperparameter space
n_estimators = [10, 100, 1000]
learning_rate = [0.0001, 0.001, 0.01, 0.1, 1]
max_features = ['sqrt', 'log2', None]
# define grid search
grid = dict(n_estimators=n_estimators,
            learning_rate=learning_rate,
            max_features=max_features)
grid_search = GridSearchCV(estimator=gb,
                           param_grid=grid,
                           n_jobs=-1,
                           cv=cv,
                           scoring=aucpr)
grid_result = grid_search.fit(X_train, y_train)
# summarize results
print('{:7}: '.format('Best'), end='')
print("%.3f using %s" %
      (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
rest_scores = []
for mean, stdev, param in zip(means, stds, params):
    if param == {'learning_rate': 0.1, 'max_features': None, 'n_estimators': 100}:
        print('Default: %.3f using %s' % (mean, param))
    else:
        rest_scores.append('%.3f (%.3f) with: %s' % (mean, stdev, param))
print('-' * 81)
_ = [print(score) for score in rest_scores]

Best   : 0.828 using {'learning_rate': 0.001, 'max_features': None, 'n_estimators': 1000}
Default: 0.604 using {'learning_rate': 0.1, 'max_features': None, 'n_estimators': 100}
---------------------------------------------------------------------------------
0.782 (0.028) with: {'learning_rate': 0.0001, 'max_features': 'sqrt', 'n_estimators': 10}
0.801 (0.025) with: {'learning_rate': 0.0001, 'max_features': 'sqrt', 'n_estimators': 100}
0.806 (0.027) with: {'learning_rate': 0.0001, 'max_features': 'sqrt', 'n_estimators': 1000}
0.770 (0.038) with: {'learning_rate': 0.0001, 'max_features': 'log2', 'n_estimators': 10}
0.798 (0.028) with: {'learning_rate': 0.0001, 'max_features': 'log2', 'n_estimators': 100}
0.802 (0.029) with: {'learning_rate': 0.0001, 'max_features': 'log2', 'n_estimators': 1000}
0.809 (0.024) with: {'learning_rate': 0.0001, 'max_features': None, 'n_estimators': 10}
0.808 (0.024) with: {'learning_rate': 0.0001, 'max_features': None, 'n_estimators': 100}
0.812 (0.019) with

# XGBoost Classifier

In [12]:
xgb = XGBClassifier()
# define hyperparameter space
eta = [0.001, 0.01, 0.1, 0.3, 0.5, 1]
colsample_bytree = [0.2, 0.4, 0.6, 0.8, 1.0]
# define grid search
grid = dict(eta=eta,
            colsample_bytree=colsample_bytree)
grid_search = GridSearchCV(estimator=xgb,
                           param_grid=grid,
                           n_jobs=-1,
                           cv=cv,
                           scoring=aucpr)
grid_result = grid_search.fit(X_train, y_train)
# summarize results
print('{:7}: '.format('Best'), end='')
print("%.3f using %s" %
      (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
rest_scores = []
for mean, stdev, param in zip(means, stds, params):
    if param == {'colsample_bytree': 1.0, 'eta': 0.3}:
        print('Default: %.3f using %s' % (mean, param))
    else:
        rest_scores.append('%.3f (%.3f) with: %s' % (mean, stdev, param))
print('-' * 81)
_ = [print(score) for score in rest_scores]

Best   : 0.861 using {'colsample_bytree': 0.2, 'eta': 0.5}
Default: 0.850 using {'colsample_bytree': 1.0, 'eta': 0.3}
---------------------------------------------------------------------------------
0.816 (0.037) with: {'colsample_bytree': 0.2, 'eta': 0.001}
0.822 (0.035) with: {'colsample_bytree': 0.2, 'eta': 0.01}
0.858 (0.027) with: {'colsample_bytree': 0.2, 'eta': 0.1}
0.860 (0.030) with: {'colsample_bytree': 0.2, 'eta': 0.3}
0.861 (0.024) with: {'colsample_bytree': 0.2, 'eta': 0.5}
0.836 (0.030) with: {'colsample_bytree': 0.2, 'eta': 1}
0.832 (0.031) with: {'colsample_bytree': 0.4, 'eta': 0.001}
0.834 (0.029) with: {'colsample_bytree': 0.4, 'eta': 0.01}
0.859 (0.025) with: {'colsample_bytree': 0.4, 'eta': 0.1}
0.855 (0.035) with: {'colsample_bytree': 0.4, 'eta': 0.3}
0.853 (0.025) with: {'colsample_bytree': 0.4, 'eta': 0.5}
0.845 (0.041) with: {'colsample_bytree': 0.4, 'eta': 1}
0.844 (0.029) with: {'colsample_bytree': 0.6, 'eta': 0.001}
0.845 (0.027) with: {'colsample_bytree': 0

# KNN Classifier

* In the [KNN Hyperparameter Optimization](knn_hyperparameter_optimization.ipynb) notebook, I searched `n_neighbors` in intervals, and the best score was obtained with `n_neighbors=3`. Before making a final decision, it is worth checking the performance of `n_neighbors=4` as well. Although the difference is tiny, I will go with `n_neighbors=3`.

In [15]:
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier()
# define hyperparameter space
n_neighbors = [3, 4]
# define grid search
grid = dict(n_neighbors=n_neighbors)
grid_search = GridSearchCV(estimator=knn,
                           param_grid=grid,
                           n_jobs=-1,
                           cv=cv,
                           scoring=aucpr)
grid_result = grid_search.fit(X_train_scaled, y_train)
# summarize results
print("%.3f using %s" %
      (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
params = grid_result.cv_results_['params']
for mean, param in zip(means, params):
    if param != grid_result.best_params_:
        print('%.3f using %s' % (mean, param))

0.859 using {'n_neighbors': 3}
0.857 using {'n_neighbors': 4}
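
The cell above uses `X_train_scaled`, which was standardized once with `scale` before cross-validation, so each validation fold contributes to the mean and variance used for scaling. A common alternative, sketched here on synthetic data and not what this notebook does, wraps the scaler and the classifier in a `Pipeline` so that the scaling is refit on the training part of every fold. The step names `scaler` and `knn` and the `average_precision` scorer are assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# synthetic, mildly imbalanced data used only for illustration
X_toy, y_toy = make_classification(n_samples=300, n_features=5,
                                   weights=[0.8, 0.2], random_state=1)

# the scaler is fit inside each cross-validation fold, not on all data
pipe = Pipeline([('scaler', StandardScaler()),
                 ('knn', KNeighborsClassifier())])
grid = {'knn__n_neighbors': [3, 4]}  # step name + '__' + parameter name
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=1)
grid_search = GridSearchCV(pipe, param_grid=grid, cv=cv,
                           scoring='average_precision')
grid_result = grid_search.fit(X_toy, y_toy)

best_k = grid_result.best_params_['knn__n_neighbors']
best_score = grid_result.best_score_
print(best_k, round(best_score, 3))
```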


# Conclusion

In the hyperparameter optimization, I tried to choose the most important hyperparameters and had to choose a narrow hyperparameter space due to limited computing power. Even so, the changes in performance range from marginal to substantial across the algorithms. The table shows the default performance, the tuned performance, and the change between them, all measured as the mean cross-validated area under the precision-recall curve.

| Algorithm                      | Default Parameters | Tuned Parameters | Change |
|--------------------------------|--------------------|------------------|--------|
| Logistic Regression            | 0.756 | 0.758 | 0.002  |
| Decision Tree Classifier       | 0.734 | 0.761 | 0.027  |
| Random Forest Classifier       | 0.854 | 0.860 | 0.006  |
| Extra Trees Classifier         | 0.861 | 0.864 | 0.003  |
| AdaBoost Classifier            | 0.761 | 0.829 | 0.068  |
| Gradient Boosting Classifier   | 0.604 | 0.828 | 0.224  |
| XGBoost Classifier             | 0.850 | 0.861 | 0.011  |
| K-Nearest Neighbors Classifier | 0.859 | 0.857 | -0.002 |
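
The Change column is simply the tuned score minus the default score. A small sketch for recomputing it with pandas, using the default and tuned values from the table (the shortened row labels are only for readability):

```python
import pandas as pd

# cross-validated scores copied from the table above
scores = pd.DataFrame(
    {'default': [0.756, 0.734, 0.854, 0.861, 0.761, 0.604, 0.850, 0.859],
     'tuned':   [0.758, 0.761, 0.860, 0.864, 0.829, 0.828, 0.861, 0.857]},
    index=['Logistic Regression', 'Decision Tree', 'Random Forest',
           'Extra Trees', 'AdaBoost', 'Gradient Boosting', 'XGBoost', 'KNN'])

# change = tuned - default, rounded to the three decimals used in the table
scores['change'] = (scores['tuned'] - scores['default']).round(3)
print(scores)
```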

* Keep in mind that the final results will come from the test set, where I will compare the performance of the algorithms with the default parameters and with the tuned parameters. In the end, the best algorithms for detecting _fraudulent transactions_ will be ready for deployment.

The next stage is to [test the algorithms with tuned parameters](final_evaluation.ipynb).