# Grid Search & Cross Validation
___

#### Grid Search and CV
* 1.1 Train test strategy and hyper parameter tuning
* 1.2 Grid Search with Early Stopping
* 1.3 Parameter grid
* 1.4 Predicting offer completion after viewing
* 1.5 Predicting offer completion binary

#### Results
* 2.0 Grid Search Results
* 2.1 Results - completion after viewing multiclass
* 2.2 Results - completion binary
* 2.3 Results - comparison
* 2.4 Results - analysis

In [2]:
# mount google drive if running in colab
import os
import sys

if os.path.exists('/usr/lib/python3.6/'):
    from google.colab import drive
    drive.mount('/content/drive/')
    sys.path.append('/content/drive/My Drive/Colab Notebooks/Starbucks_Udacity')
    %cd /content/drive/My Drive/Colab Notebooks/Starbucks_Udacity/notebooks/exploratory
else:
    sys.path.append('../../')

In [11]:
import os
import timeit

import catboost
import joblib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import progressbar
import seaborn as sns
from catboost import CatBoostClassifier, MetricVisualizer, Pool
from sklearn.metrics import (accuracy_score, classification_report, confusion_matrix,
                             f1_score, recall_score)
from sklearn.model_selection import (GridSearchCV, GroupKFold, GroupShuffleSplit,
                                     ParameterGrid, TimeSeriesSplit, train_test_split)
from sklearn.preprocessing import LabelEncoder

pd.set_option('display.max_columns', 500)
pd.set_option('display.max_rows', 500)
pd.options.display.max_seq_items = 1000
pd.set_option('display.max_colwidth', 200)

%load_ext autoreload
%autoreload 2
%aimport src.models.train_model
%aimport src.data.make_dataset

from src.data import make_dataset
from src.data.make_dataset import save_file
from src.models.train_model import (exploratory_training, generate_folds,
                                    gridsearch_early_stopping, label_creater)
from src.utilities import cf_matrix



### 1.1 Train test strategy and hyper parameter tuning

Each offer is time dependent, and the features of each offer observation embed that customer's past data. This makes it a time series problem: to build a predictive model, the test data must lie in the future relative to the training data. We will therefore use a Time Series Split on 'time_days' as our cross validation strategy.
A randomised train test split would leak information from the future into the training data.

* We will run a grid search with 5 fold cross validation on the training set to determine the optimal learning_rate, depth and l2_leaf_reg parameters.
* Once optimal parameters are determined, we will rerun training using the whole train set, and measure results against the test set.
* We can then analyse feature importance and determine if any features can be dropped to improve model performance.
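
The splitting idea can be illustrated without any dependencies. The sketch below is only an approximation of the expanding-window scheme that scikit-learn's `TimeSeriesSplit` applies to rows sorted by time; the helper `expanding_window_splits` is a hypothetical name and is not part of this project.

```python
def expanding_window_splits(n_samples, n_splits):
    """Yield (train_idx, test_idx) lists where each test block follows its training block."""
    test_size = n_samples // (n_splits + 1)
    first_test_start = n_samples - n_splits * test_size
    for start in range(first_test_start, n_samples, test_size):
        # train on everything before `start`, validate on the next block
        yield list(range(start)), list(range(start, start + test_size))


# twelve observations, assumed to be already sorted by time
splits = list(expanding_window_splits(n_samples=12, n_splits=5))
for fold, (train_idx, test_idx) in enumerate(splits):
    print(f"fold {fold}: train 0..{train_idx[-1]}, test {test_idx}")
```

Every validation block lies strictly after its training block, so no fold is trained on observations from its own future.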

### 1.2 Grid Search with Early Stopping
To use early stopping during grid search we cannot use the scikit-learn GridSearchCV, and will instead use our own custom function.

With early stopping we do not need to tune the number of iterations for CatBoost to run; we only set a generous upper limit. CatBoost evaluates the loss function (Logloss for the binary model, MultiClass for the multiclass model) on the validation data of each fold after every iteration, and stops training once that loss has not improved for a given number of iterations.
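
As a rough illustration of the stopping rule, the sketch below applies a patience counter to a made-up list of validation losses. It is a simplified model of the idea, not CatBoost's internal implementation; the function name `early_stopping_point` and the loss values are assumptions. The grid search function further down uses `early_stopping_rounds=50` with an upper limit of `iterations=3000`.

```python
def early_stopping_point(val_losses, patience):
    """Return (best_iteration, stop_iteration) for per-iteration validation losses."""
    best_loss = float("inf")
    best_iteration = 0
    for iteration, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_iteration = loss, iteration
        elif iteration - best_iteration >= patience:
            # no improvement for `patience` iterations: stop here
            return best_iteration, iteration
    # the iteration limit was reached before the patience ran out
    return best_iteration, len(val_losses) - 1


losses = [0.60, 0.52, 0.48, 0.47, 0.49, 0.50, 0.51, 0.46]  # illustrative values only
best, stopped = early_stopping_point(losses, patience=3)
print(best, stopped)
```

Training stops at iteration 6 and the model from iteration 3 is kept. The later loss of 0.46 is never seen, which is the price paid for the shorter training time.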

In [1]:
def generate_folds(cv, X_train, y_train):
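    '''
    Iterate through cv folds and split into list of folds
    Prints the label mean (the positive class share for binary labels) and shape of each fold
    
    Parameters
    -----------
    cv: cross validation generator yielding (train_index, test_index) pairs
    X_train: DataFrame of features
    y_train: Series of labels
               
    Returns
    -------
    train_X, train_y, test_X, test_y: lists containing one DataFrame or Series per fold
    '''
    train_X, train_y, test_X, test_y = [], [], [], []
    
    for train_index, test_index in cv:
        train_X.append(X_train.iloc[train_index])
        train_y.append(y_train.iloc[train_index])

        test_X.append(X_train.iloc[test_index])
        test_y.append(y_train.iloc[test_index])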
      
    print('positive classification % per fold and length')
    for i in range(len(train_X)):
        print('train[' + str(i) + ']' , round(train_y[i].sum() / train_y[i].count(), 4), 
              train_y[i].shape)
        print('test[' + str(i) + '] ' , round(test_y[i].sum() / test_y[i].count(), 4), 
              test_y[i].shape)
           
    return train_X, train_y, test_X, test_y

In [15]:
def gridsearch_early_stopping(cv, X, y, folds, grid, cat_features=None, save=None):
    '''
    Perform grid search with early stopping across folds specified by index 
    
    Parameters
    -----------
    cv: cross validation generator yielding (train_index, test_index) pairs
    X: DataFrame
    y: Series
    folds: list of fold indexes to run
    grid: parameter grid (dictionary of parameter names and lists of values)
    cat_features: list of categorical feature column indexes (default=None)
    save:   string, file name excluding fold number and extension (default=None)
            saves results_df for each fold to folder '../../models'
    '''
    
    if np.unique(y).size <= 2:
        loss_function = 'Logloss'
    else:
        loss_function = 'MultiClass'
           
    # generate data folds 
    train_X, train_y, test_X, test_y = generate_folds(cv, X, y)
    
    # iterate through specified folds
    for fold in folds:
        # assign train and test pools
        test_pool = Pool(data=test_X[fold], label=test_y[fold], cat_features=cat_features)
        train_pool = Pool(data=train_X[fold], label=train_y[fold], cat_features=cat_features)

        # creating results_df dataframe
        results_df = pd.DataFrame(columns=['params' + str(fold), loss_function + str(fold), 
                                           'Accuracy'+ str(fold), 'iteration'+ str(fold)])

        best_score = 99999

        # iterate through parameter grid
        for params in ParameterGrid(grid):

            # create catboost classifer with parameter params
            model = CatBoostClassifier(cat_features=cat_features,
                                        early_stopping_rounds=50,
                                        task_type='GPU',
                                        custom_loss=['Accuracy'],
                                        iterations=3000,
                                        #class_weights=weights, 
                                        **params)

            # fit model
            model.fit(train_pool, eval_set=test_pool, verbose=400)

            # append results to results_df
            # (pd.concat is used because DataFrame.append was removed in pandas 2.0)
            best = model.get_best_score()['validation']
            print(best)
            results_df = pd.concat([results_df, pd.DataFrame(
                            [[params, best[loss_function], best['Accuracy'],
                              model.get_best_iteration()]],
                              columns=results_df.columns)], ignore_index=True)

            # save best score and parameters
            if model.get_best_score()['validation'][loss_function] < best_score:
                best_score = model.get_best_score()['validation'][loss_function]
                best_grid = params

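        print("Best " + loss_function + ": ", best_score)
        print("Grid:", best_grid)

        # save results for this fold (skipped when no file name is given)
        if save is not None:
            save_file(results_df, save + str(fold) + '.joblib', dirName='../../models')
        display(results_df)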

### 1.3 Parameter grid
We will optimise across the following parameters:
* depth - The maximum depth that each decision tree can grow to. Greater depth increases the algorithm's ability to fit the data, but can also lead to overfitting to the training set.
* learning_rate - The step size applied at each boosting iteration. A higher learning rate lets the algorithm learn more quickly, but it may overstep the minimum of the loss function and settle on a worse solution. The learning rate balances a trade off between training speed and accuracy.
* l2_leaf_reg - The coefficient of the L2 regularisation term in CatBoost's cost function. Any positive value is allowed; larger values regularise more strongly.

For this dataset, CatBoost default parameters are:
* depth: 6
* learning_rate: 0.03
* l2_leaf_reg: 3

I have therefore chosen a parameter grid spread around these default values:

In [11]:
params = {'depth': [6,7,8,9],
          'learning_rate': [0.07, 0.03, 0.01],
          'l2_leaf_reg':[1,3,5,10]}
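
To give a sense of the search size, the sketch below expands the same grid with `itertools.product`. Sorting the parameter names first is an assumption made so that the order resembles the rows of the results tables; the variable names `example_grid` and `combinations` are illustrative.

```python
from itertools import product

example_grid = {'depth': [6, 7, 8, 9],
                'learning_rate': [0.07, 0.03, 0.01],
                'l2_leaf_reg': [1, 3, 5, 10]}

# build one dictionary per parameter combination
names = sorted(example_grid)
combinations = [dict(zip(names, values))
                for values in product(*(example_grid[name] for name in names))]

n_folds = 5
print(len(combinations))            # combinations per fold
print(len(combinations) * n_folds)  # models trained across all folds
print(combinations[0])
```

With 4 x 3 x 4 = 48 combinations and 5 folds, the search trains 240 models, which is why early stopping matters for the total run time.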

In [13]:
cat_features = [0,4,5,92,93,94,95,96,97]

### 1.4 Predicting Multiclass - offer completion after viewing

In [9]:
complete_from_view = {'completed_not_viewed': 2, 
                    'completed_before_viewed': 2, 
                    'complete_anyway': 1,
                    'completed_responsive': 1,
                    'incomplete_responsive': 0,
                    'no_complete_no_view': 0,
                    'unresponsive': 0}

In [None]:
df = joblib.load('../../data/interim/transcript_final_optimised.joblib')
df = src.models.train_model.label_creater(df, label_grid=complete_from_view)
df.sort_values('time_days', inplace=True)

X = df.drop('label', axis=1)
y = df.label

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False, 
                                                    random_state=42)
grid = params
cv = TimeSeriesSplit(n_splits=5).split(X_train, y_train)
folds = list(range(0,5))
gridsearch_early_stopping(cv, X_train, y_train, folds, grid, cat_features=cat_features, 
                          save='multiclass_gridsearch_inc10')

### 1.5 Predicting Binary - offer completion

In [18]:
complete = {'completed_not_viewed': 1, 
        'completed_before_viewed': 1, 
        'complete_anyway': 1,
        'completed_responsive': 1,
        'incomplete_responsive': 0,
        'no_complete_no_view': 0,
        'unresponsive': 0}

In [None]:
df = joblib.load('../../data/interim/transcript_final_optimised.joblib')
df = src.models.train_model.label_creater(df, label_grid=complete)
df.sort_values('time_days', inplace=True)

X = df.drop('label', axis=1)
y = df.label

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False, 
                                                    random_state=42)
grid = params
cv = TimeSeriesSplit(n_splits=5).split(X_train, y_train)
folds = list(range(0,5))
gridsearch_early_stopping(cv, X_train, y_train, folds, grid, cat_features=cat_features, 
                          save='complete_gridsearch')

### 2. Grid Search Results

The following function brings together the logloss and accuracy results for each fold. It calculates the accuracy mean and logloss mean for each parameter set across each fold, highlighting the best scores.

In [5]:
def grid_search_results(raw_file, num_folds):
    '''
    Loads raw cross validation fold results.
    Displays results highlighting best scores
    
    Parameters
    -----------
    raw_file: string, the name of the file excluding fold number and '.joblib' extension
    num_folds: number of cv folds
                
    Returns
    -------
    results_df: DataFrame of per fold and mean scores for each parameter set
    best_scores: DataFrame of the best parameter set for each mean metric
    '''
    
    # list of folds
    results_files = [0 for i in range(0, num_folds)]
        
    # read results files for each fold
    for i in range(0, num_folds):
        results_files[i] = joblib.load(f'../../models/{raw_file}{i}.joblib')            
    
    # join results files in one dataframe
    results_df = pd.concat([results_files[i] for i in range(0, num_folds)], axis=1)
    metrics = int(results_df.shape[1] / num_folds - 1)
    
    # drop extra params columns
    results_df.rename(columns={"params0": "Params"}, inplace=True)
    results_df.drop([i for i in results_df.columns if 'params' in i], axis=1, inplace=True)
    
    # convert data columns to numeric 
    def to_numeric_ignore(x, errors='ignore'):
        return pd.to_numeric(x, errors=errors)    
    results_df = results_df.apply(to_numeric_ignore)
    
    # loops through metrics and create mean column for each metric
    metric_names=[]
    for i in results_df.columns[0:metrics+1]:
        i = i[:-1]
        metric_names.append(i)
        results_df[i + '_mean'] = results_df[[x for x in results_df.columns 
                                              if i in x]].mean(axis=1)
    
    results_df.reset_index(drop=True, inplace=True)
        
    # instantiating best_scores dataframe
    best_scores = pd.DataFrame(columns=['Params', 'Metric', 'Score'])
        
    negative_better = ['MultiClass', 'iteration', 'logloss']
    positive_better = ['Accuracy']
    
      
    # get index of best parameters
    best_param_idx = []
    for i in metric_names:
        if i in negative_better:
            best_param_idx = results_df[i+ '_mean'].idxmin(axis=0)
        if i in positive_better:
            best_param_idx = results_df[i+ '_mean'].idxmax(axis=0)

        row = pd.DataFrame({'Metric': [i + '_mean'], 
                            'Params': [results_df.loc[best_param_idx, 'Params']], 
                            'Score': [results_df.loc[best_param_idx, i + '_mean']]})
        # pd.concat is used because DataFrame.append was removed in pandas 2.0
        best_scores = pd.concat([best_scores, row], ignore_index=True)

    results_df.insert(0, 'Parameters', results_df.Params)
    results_df.drop(['Params', 'Param_mean'], axis=1, inplace=True)

    best_scores = best_scores[best_scores.Metric != 'Param_mean']
    
    display(best_scores)
    
    negative_columns = []
    positive_columns = []
    
    # highlight columns where negative metrics are better
    for i in negative_better:
        negative_columns.extend([x for x in results_df.columns if i in x])
    
    # highlight columns where positive metrics are better
    for i in positive_better:
        positive_columns.extend([x for x in results_df.columns if i in x])
        
    display(results_df.style
    .highlight_max(subset = positive_columns, color='lightgreen')
    .highlight_min(subset= negative_columns, color='lightgreen'))
    
    return results_df, best_scores
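
The core of this aggregation can be shown on a small scale. The sketch below averages the per fold scores of two parameter sets, with values copied from the binary results table in section 2.2, and picks the best set per metric. It uses plain dictionaries rather than the DataFrame logic above, and the variable names are illustrative.

```python
from statistics import mean

# per fold validation scores for two parameter sets (binary model, section 2.2)
fold_scores = {
    "depth=6, l2_leaf_reg=1, learning_rate=0.07": {
        "Accuracy": [0.783677, 0.838545, 0.852311, 0.864405, 0.856834],
        "logloss": [0.422632, 0.357815, 0.332788, 0.314578, 0.327861],
    },
    "depth=6, l2_leaf_reg=1, learning_rate=0.03": {
        "Accuracy": [0.783776, 0.839921, 0.851524, 0.858210, 0.859882],
        "logloss": [0.422619, 0.357919, 0.332392, 0.320245, 0.320974],
    },
}

# mean of each metric across the folds
means = {name: {metric: mean(values) for metric, values in scores.items()}
         for name, scores in fold_scores.items()}

best_accuracy = max(means, key=lambda name: means[name]["Accuracy"])  # higher is better
best_logloss = min(means, key=lambda name: means[name]["logloss"])    # lower is better

print("best mean accuracy:", best_accuracy)
print("best mean logloss: ", best_logloss)
```

Even for these two rows the metrics disagree: the 0.07 learning rate has the higher mean accuracy while the 0.03 learning rate has the lower mean logloss, so the metric used for selection has to be chosen deliberately.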

### 2.1 Results - completion after viewing multiclass

In [9]:
results_complete_after, best_scores_complete_after = grid_search_results('multiclass_gridsearch_inc10', 5)

,Metric,Params,Score
0,Accuracy_mean,"{'depth': 7, 'l2_leaf_reg': 1, 'learning_rate': 0.01}",0.745192
1,MultiClass_mean,"{'depth': 6, 'l2_leaf_reg': 5, 'learning_rate': 0.03}",0.588902
2,iteration_mean,"{'depth': 10, 'l2_leaf_reg': 1, 'learning_rate': 0.07}",161.6


,Parameters,Accuracy0,MultiClass0,iteration0,Accuracy1,MultiClass1,iteration1,Accuracy2,MultiClass2,iteration2,Accuracy3,MultiClass3,iteration3,Accuracy4,MultiClass4,iteration4,Accuracy_mean,MultiClass_mean,iteration_mean
0,"{'depth': 6, 'l2_leaf_reg': 1, 'learning_rate': 0.07}",0.719371,0.632694,161,0.750442,0.593918,347,0.749066,0.587632,372,0.752311,0.566662,755,0.753196,0.573183,600,0.744877,0.590818,447.0
1,"{'depth': 6, 'l2_leaf_reg': 1, 'learning_rate': 0.03}",0.720059,0.631393,473,0.749459,0.592918,632,0.749164,0.58608,1041,0.755359,0.562912,1937,0.751622,0.573435,1545,0.745133,0.589348,1125.6
2,"{'depth': 6, 'l2_leaf_reg': 1, 'learning_rate': 0.01}",0.720059,0.631847,1120,0.749754,0.593823,1709,0.748279,0.586026,2726,0.751917,0.566763,2998,0.750836,0.575224,2998,0.744169,0.590737,2310.2
3,"{'depth': 6, 'l2_leaf_reg': 3, 'learning_rate': 0.07}",0.720551,0.631514,221,0.745919,0.596149,251,0.746509,0.587413,615,0.753097,0.566722,722,0.752212,0.573214,760,0.743658,0.591003,513.8
4,"{'depth': 6, 'l2_leaf_reg': 3, 'learning_rate': 0.03}",0.721337,0.631446,568,0.747591,0.593472,660,0.747788,0.586198,1052,0.755457,0.56249,2078,0.750737,0.573819,1292,0.744582,0.589485,1130.0
5,"{'depth': 6, 'l2_leaf_reg': 3, 'learning_rate': 0.01}",0.719469,0.632198,1120,0.749754,0.593146,2429,0.747099,0.58618,2999,0.750344,0.567918,2997,0.749459,0.576541,2999,0.743225,0.591197,2508.8
6,"{'depth': 6, 'l2_leaf_reg': 5, 'learning_rate': 0.07}",0.720059,0.632089,194,0.746509,0.593839,352,0.747001,0.587904,617,0.753294,0.567293,833,0.753589,0.570826,1177,0.74409,0.59039,634.6
7,"{'depth': 6, 'l2_leaf_reg': 5, 'learning_rate': 0.03}",0.720157,0.631593,601,0.748574,0.592998,823,0.747296,0.587704,1067,0.755064,0.562383,2139,0.751622,0.569831,2310,0.744543,0.588902,1388.0
8,"{'depth': 6, 'l2_leaf_reg': 5, 'learning_rate': 0.01}",0.719469,0.631972,1387,0.749853,0.593118,2488,0.747198,0.586847,2980,0.749066,0.569103,2997,0.749164,0.576687,2999,0.74295,0.591546,2570.2
9,"{'depth': 6, 'l2_leaf_reg': 10, 'learning_rate': 0.07}",0.719272,0.631473,299,0.749361,0.593706,515,0.747001,0.58805,548,0.755752,0.565046,1120,0.752507,0.574497,757,0.744779,0.590554,647.8


Here we can see that the parameters that generated the best mean accuracy score were: 

In [23]:
best_params = best_scores_complete_after.Params[0]
best_params

{'depth': 7, 'l2_leaf_reg': 1, 'learning_rate': 0.01}

Across the 60 parameter combinations, the standard deviation of the mean accuracy was only 0.001454, and the gap between the best and worst combination was under 0.005 (0.7403 to 0.7452). The choice of parameters therefore makes only a marginal difference to model accuracy.

In [24]:
results_complete_after.Accuracy_mean.describe()

count    60.000000
mean      0.743048
std       0.001454
min       0.740334
25%       0.741917
50%       0.743225
75%       0.744184
max       0.745192
Name: Accuracy_mean, dtype: float64

### 2.2 Results - Completion binary

In [10]:
results_complete, best_scores = grid_search_results('complete_gridsearch', 5)

,Metric,Params,Score
1,logloss_mean,"{'depth': 7, 'l2_leaf_reg': 3, 'learning_rate': 0.03}",0.348861
2,Accuracy_mean,"{'depth': 9, 'l2_leaf_reg': 1, 'learning_rate': 0.03}",0.840944
3,iteration_mean,"{'depth': 7, 'l2_leaf_reg': 3, 'learning_rate': 0.07}",208.4


,Parameters,logloss0,Accuracy0,iteration0,logloss1,Accuracy1,iteration1,logloss2,Accuracy2,iteration2,logloss3,Accuracy3,iteration3,logloss4,Accuracy4,iteration4,logloss_mean,Accuracy_mean,iteration_mean
0,"{'depth': 6, 'l2_leaf_reg': 1, 'learning_rate': 0.07}",0.422632,0.783677,397,0.357815,0.838545,192,0.332788,0.852311,184,0.314578,0.864405,556,0.327861,0.856834,223,0.351135,0.839154,310.4
1,"{'depth': 6, 'l2_leaf_reg': 1, 'learning_rate': 0.03}",0.422619,0.783776,796,0.357919,0.839921,493,0.332392,0.851524,533,0.320245,0.85821,662,0.320974,0.859882,557,0.35083,0.838663,608.2
2,"{'depth': 6, 'l2_leaf_reg': 1, 'learning_rate': 0.01}",0.422698,0.784267,1762,0.356932,0.83825,1259,0.333271,0.851327,1762,0.324929,0.859685,812,0.322745,0.858997,1730,0.352115,0.838505,1465.0
3,"{'depth': 6, 'l2_leaf_reg': 3, 'learning_rate': 0.07}",0.42219,0.784661,232,0.362201,0.835693,235,0.335489,0.849361,195,0.316048,0.86116,629,0.324031,0.857522,259,0.351992,0.837679,310.0
4,"{'depth': 6, 'l2_leaf_reg': 3, 'learning_rate': 0.03}",0.422314,0.784661,538,0.357693,0.839331,496,0.332504,0.852212,708,0.315782,0.861455,1046,0.326563,0.857719,652,0.350971,0.839076,688.0
5,"{'depth': 6, 'l2_leaf_reg': 3, 'learning_rate': 0.01}",0.422029,0.784562,1811,0.358094,0.837365,1063,0.333525,0.850836,1461,0.316274,0.859882,2999,0.323875,0.857522,1350,0.350759,0.838033,1736.8
6,"{'depth': 6, 'l2_leaf_reg': 5, 'learning_rate': 0.07}",0.421943,0.783776,329,0.358173,0.839823,216,0.335137,0.849361,333,0.320559,0.858702,273,0.326365,0.855457,289,0.352435,0.837424,288.0
7,"{'depth': 6, 'l2_leaf_reg': 5, 'learning_rate': 0.03}",0.421956,0.784562,767,0.357716,0.83943,352,0.332655,0.850344,771,0.318186,0.860079,961,0.322488,0.858014,613,0.3506,0.838486,692.8
8,"{'depth': 6, 'l2_leaf_reg': 5, 'learning_rate': 0.01}",0.421887,0.784267,2144,0.358135,0.838446,1101,0.333107,0.850442,1723,0.315477,0.860669,2738,0.328253,0.856539,954,0.351372,0.838073,1732.0
9,"{'depth': 6, 'l2_leaf_reg': 10, 'learning_rate': 0.07}",0.42099,0.786234,374,0.361422,0.835103,179,0.335663,0.849558,369,0.317741,0.859587,380,0.33102,0.853982,272,0.353367,0.836893,314.8


The parameters that generated the best mean accuracy score were:

In [164]:
best_params = best_scores.Params[2]
best_params

{'depth': 9, 'l2_leaf_reg': 1, 'learning_rate': 0.03}

In [166]:
results_complete.Accuracy_mean.describe()

count    48.000000
mean      0.839393
std       0.000987
min       0.836893
25%       0.838653
50%       0.839548
75%       0.840261
max       0.840944
Name: Accuracy_mean, dtype: float64

Again the standard deviation of the mean accuracy is very low, at 0.000987, so there is only a marginal difference in accuracy between the parameter combinations.

### 2.3 Results Comparison

We will now retrain the two models, this time using the full train set and scoring against the test set.

We can then compare the accuracy for:
    
* Default parameters with no feature engineering
* Default parameters with feature engineering
* Optimised parameters with no feature engineering
* Optimised parameters with feature engineering

In [171]:
parameters = {'complete_from_view': {'depth': 7, 'l2_leaf_reg': 1, 'learning_rate': 0.01}, 
             'complete': {'depth': 9, 'l2_leaf_reg': 1, 'learning_rate': 0.03},
             'default': {'depth': 6, 'l2_leaf_reg': 3, 'learning_rate': 0.03}}

labelling = {'complete_from_view': {'failed': 0, 'complete after':1, 'complete before':2},
             'complete': {'failed': 0, 'complete':1}}

experiments = ['complete_from_view', 'complete']

compact_label = {'complete_from_view': {'failed': 0, 'complete after':1, 'complete before':2},
                'complete': {'failed': 0, 'complete':1}}

In [189]:
def compare_accuracies(experiments, compact_label, labels, parameters):
    '''
    Trains CatBoost Classifier across specified hyper parameter, label, feature and experiment 
    sets. Compares and returns results in DataFrame
    Saves results as '../../models/results_summary.joblib'
    
    Parameters
    ----------
    experiments: list of experiment name strings
    compact_label: dictionary of dictionary compact labels
    labels: dictionary of dictionary labels, keyed by experiment name
    parameters: dictionary of dictionary optimal and default parameters    
            
    Returns
    -------
    DataFrame
    '''  
    results_summary=[]
    
    # train classifier across parameter, label, feature, experiment combinations
    for engineering in [True, False]:    
        for experiment in experiments:
            for param_selection in ['default', experiment]:
                compact = compact_label[experiment] 
                results_summary.append([engineering, experiment, 
                                        parameters[param_selection], 
                                        exploratory_training(
                                            labels=labels[experiment], 
                                            labels_compact=compact, 
                                            feature_engineering=engineering, verbose=False, 
                                            return_model=False, **parameters[param_selection])])
                   
    pd.set_option('display.max_colwidth', 200)
    
    # convert to DataFrame
    results_accuracy = pd.DataFrame(results_summary, 
                                    columns=['Feature Engineering', 'Experiment', 'Parameters', 
                                             'Accuracy'])
    # reorder columns
    results_accuracy = results_accuracy[['Parameters', 'Experiment', 
                                         'Feature Engineering', 'Accuracy']]
    results_accuracy.sort_values(['Experiment', 'Feature Engineering', 'Accuracy'], inplace=True)
    
    # calculate differences between accuracies
    results_accuracy['Delta'] = results_accuracy.Accuracy.diff(periods=1).fillna(0)
    
    # save the DataFrame so that it can be reloaded without retraining
    joblib.dump(results_accuracy, '../../models/results_summary.joblib', compress=True)
        
    return results_accuracy

In [None]:
# uncomment to run, otherwise load results from results_summary.joblib
#results_accuracy = compare_accuracies(experiments, compact_label, labelling, parameters)

In [None]:
results_accuracy = joblib.load('../../models/results_summary.joblib')

In [191]:
results_accuracy.sort_values(['Experiment', 'Feature Engineering', 'Accuracy'], inplace=True)
results_accuracy['Delta'] = results_accuracy.Accuracy.diff(periods=1).fillna(0)
results_accuracy.style.format({'Delta': "{:.2%}"})

,Parameters,Experiment,Feature Engineering,Accuracy,Delta
0,"{'depth': 9, 'l2_leaf_reg': 1, 'learning_rate': 0.03}",complete,False,0.816138,0.00%
1,"{'depth': 6, 'l2_leaf_reg': 3, 'learning_rate': 0.03}",complete,False,0.816466,0.03%
2,"{'depth': 9, 'l2_leaf_reg': 1, 'learning_rate': 0.03}",complete,True,0.847863,3.14%
3,"{'depth': 6, 'l2_leaf_reg': 3, 'learning_rate': 0.03}",complete,True,0.851993,0.41%
4,"{'depth': 6, 'l2_leaf_reg': 3, 'learning_rate': 0.03}",complete_from_view,False,0.668458,-18.35%
5,"{'depth': 7, 'l2_leaf_reg': 1, 'learning_rate': 0.01}",complete_from_view,False,0.670425,0.20%
6,"{'depth': 6, 'l2_leaf_reg': 3, 'learning_rate': 0.03}",complete_from_view,True,0.712834,4.24%
7,"{'depth': 7, 'l2_leaf_reg': 1, 'learning_rate': 0.01}",complete_from_view,True,0.715653,0.28%
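
Note that the Delta column is a plain row to row difference over the whole sorted table, so the -18.35% on the first complete_from_view row compares against the last complete row and is not meaningful. The sketch below, using the accuracies from the table and only the standard library, restarts the difference for each experiment; the variable names are illustrative.

```python
from itertools import groupby

# (experiment, feature_engineering, accuracy) rows, sorted as in the table above
rows = [
    ("complete", False, 0.816138),
    ("complete", False, 0.816466),
    ("complete", True, 0.847863),
    ("complete", True, 0.851993),
    ("complete_from_view", False, 0.668458),
    ("complete_from_view", False, 0.670425),
    ("complete_from_view", True, 0.712834),
    ("complete_from_view", True, 0.715653),
]

deltas = []
for _, group in groupby(rows, key=lambda row: row[0]):
    previous = None
    for experiment, engineered, accuracy in group:
        # the first row of each experiment has nothing to compare against
        delta = 0.0 if previous is None else accuracy - previous
        deltas.append((experiment, engineered, accuracy, delta))
        previous = accuracy

for experiment, engineered, accuracy, delta in deltas:
    print(f"{experiment:20s} {str(engineered):5s} {accuracy:.6f} {delta:+.2%}")
```

In pandas, `results_accuracy.groupby('Experiment')['Accuracy'].diff().fillna(0)` should give the same per experiment differences.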


### 2.4 Results analysis 
#### Complete from view multiclass

* Without any feature engineering and using the default parameters of {'depth': 6, 'l2_leaf_reg': 3, 'learning_rate': 0.03} we achieved an accuracy of 0.668.
* Using the best parameters from cross validation increased accuracy by 0.2 percentage points to 0.670, a very minor improvement.
* Using the default parameters and adding the extensive feature engineering raised accuracy to 0.713, which is 4.44 percentage points above the default parameters with no feature engineering (the 4.24% Delta in the table is measured against the optimised parameters with no feature engineering).
* The best cross validation hyper parameters of {'depth': 7, 'l2_leaf_reg': 1, 'learning_rate': 0.01} further improved on this accuracy by 0.28 percentage points to give a maximum accuracy of 0.716.

#### Complete Binary

* Without any feature engineering and using the default parameters of {'depth': 6, 'l2_leaf_reg': 3, 'learning_rate': 0.03} we achieved an accuracy of 0.8165.
* Using the best parameters from cross validation of {'depth': 9, 'l2_leaf_reg': 1, 'learning_rate': 0.03}, when training with the full training set, actually decreased test accuracy by 0.03 percentage points, down to 0.8161.
* Going forward we therefore utilise the default parameters {'depth': 6, 'l2_leaf_reg': 3, 'learning_rate': 0.03}.
* Using the default parameters with feature engineering improved accuracy by 3.55 percentage points vs the default parameters with no feature engineering, to give a maximum accuracy of 0.8520. The optimal hyper parameters with feature engineering reached 0.8479, which is 3.14 percentage points above the same baseline.

#### Overall insights
* Feature engineering provided the greatest increase in accuracy across both models, although I had expected it to have a greater impact than it did.
* Optimising parameters with grid search and cross validation provided only a very minor improvement in accuracy for the Complete from view multiclass model and actually reduced accuracy in the Complete Binary model.
* This indicates how well CatBoost's default parameters already perform on this dataset.