### Optimal Parameters

The purpose of this notebook is to find the best parameters for the different sets of features. To do so, a model needs to be selected, along with the parameter space that will be searched. First a coarse grid search is performed, followed by a narrower one. An end-to-end run of the notebook should produce a csv file whose name mentions the model it is for. For an SVM, for example, the csv should look like:

    parameter, value  <-- (header)
    kernel, rbf
    nu, 0.01
    degree, 3
    gamma, 0.05
    coef0, 0.0
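
A later stage could read such a file back into a dictionary. The sketch below is only an illustration: the helper name `read_parameter_csv` and its type-coercion rule (try `int`, then `float`, otherwise keep the string) are assumptions, not something this notebook defines.

```python
import csv
import io


def read_parameter_csv(handle):
    """Parse a two-column 'parameter, value' csv into a dict (hypothetical helper)."""
    reader = csv.reader(handle, skipinitialspace=True)
    next(reader)  # skip the header row
    params = {}
    for row in reader:
        if len(row) != 2:
            continue  # ignore blank lines or rows without exactly two fields
        name, raw = row[0].strip(), row[1].strip()
        # try int, then float; fall back to the raw string (e.g. the kernel name)
        for cast in (int, float):
            try:
                params[name] = cast(raw)
                break
            except ValueError:
                continue
        else:
            params[name] = raw
    return params


# in-memory copy of the example file shown above
example = io.StringIO(
    "parameter, value\nkernel, rbf\nnu, 0.01\ndegree, 3\ngamma, 0.05\ncoef0, 0.0\n"
)
svm_params = read_parameter_csv(example)
print(svm_params)
```

Rows that do not have exactly two fields are skipped, so free-text lines appended to the file would not break the parse.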
    
In order to successfully run the notebook for its purpose, the sections that need to be modified for a new algorithm are:

    1. Model Selection (where a new model is initialized)
    2. Parameter Space - Coarse (where the selected parameters need to correspond to the algorithm)
    3. Drop a cleaner csv with only the best parameters (where the parameter names need to be set
       accordingly)

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import os
import itertools
from time import time
from IPython.display import display, clear_output


# Pretty display for notebooks
%matplotlib inline

random_state = 0

rows_each = 60

# specify the place from where we will use the data
source_path_rel = 'data/Main collection - features/'

# specify the target path for dropping the parameters
target_path_rel = 'data/Main collection - results/parameters/'

# find the file names in the directory
feature_files = os.listdir(source_path_rel)

# remove the readme, the Giorgos files and the best-ratios directory
feature_files.remove('README.md')
feature_files.remove('Giorgos Mon 19.csv')
feature_files.remove('Giorgos Wed 21.csv')
feature_files.remove('Giorgos Fri 23.csv')
feature_files.remove('Giorgos Mon 26.csv')
feature_files.remove('best ratios by multiclass RF')

print("The number of files selected from the directory is {}.".format(len(feature_files)))

The number of files selected from the directory is 16.


In [2]:
# DEBUGGING OPTIONS
np.set_printoptions(threshold=80)
pd.set_option('display.max_rows', 8)
pd.set_option('display.max_columns', 8)

### Load all data in a single DataFrame

The process here is to sample 60 locks from each participant. Some participants have more than 60 locks, and in that case we select the locks randomly.
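
As a minimal sketch of that sampling step, the snippet below draws a fixed number of rows without replacement and renumbers them. The helper name `sample_locks`, the explicit size check and the toy frame are assumptions made for illustration.

```python
import pandas as pd


def sample_locks(df, n=60, seed=0):
    """Return n randomly chosen rows with a fresh 0..n-1 index (hypothetical helper)."""
    if len(df) < n:
        # sampling without replacement needs at least n rows
        raise ValueError("need at least {} rows, got {}".format(n, len(df)))
    return df.sample(n=n, random_state=seed).reset_index(drop=True)


# toy frame standing in for one participant's feature file
toy = pd.DataFrame({"AB_mil": range(100)})
picked = sample_locks(toy, n=60, seed=0)
print(picked.shape)
```

Fixing the seed keeps the selection reproducible between runs, which matters because the grid search later addresses each participant by a block of 60 consecutive rows.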

In [3]:
# initialize the data frame with the locks that will be used
X = pd.DataFrame()

for i, file in enumerate(feature_files):
    
    # read each features file into a df
    df = pd.read_csv(source_path_rel+file)
    
    # select a sample of 60 values (without replacement)
    df = df.sample(n=rows_each, random_state=random_state, axis=0)
    
    # reset the index of the sampled dataframe to 0..n-1
    df.index = range(df.shape[0])
    
    # add the data to the bigger dataframe
    X = pd.concat([X, df.loc[:, :]], ignore_index=True)
    
print("The dataframe that holds the data is as follows:")
display(X)

The dataframe that holds the data is as follows:


Unnamed: 0,AB_mil,AB_xyz,AB|AC_mil,AB|AC_xyz,...,zSpeed_range,zSpeed_skew,zSpeed_std,zSpeed_var
0,434.0,258.930923,0.418919,0.655369,...,1.081136,-0.149765,0.184878,0.034180
1,502.0,297.418459,0.548035,0.629426,...,2.088235,1.069284,0.282818,0.079986
2,529.0,339.767000,0.549325,0.757873,...,1.235294,0.071162,0.249953,0.062477
3,503.0,301.571361,0.558269,0.689733,...,2.217469,1.207258,0.290968,0.084663
...,...,...,...,...,...,...,...,...,...
956,445.0,158.355584,0.654412,0.820856,...,0.558824,0.489388,0.124969,0.015617
957,436.0,138.876355,0.586022,0.597251,...,0.491597,0.393809,0.109206,0.011926
958,495.0,183.983717,0.651316,0.818607,...,0.897829,-0.483202,0.137261,0.018841
959,477.0,175.668280,0.671831,0.857197,...,0.421429,0.742492,0.105224,0.011072


### Feature sets selection

After loading the data, the next step is to select features. The parameter tuning will be across all the feature groups. There will be 10 groups of features that we will deal with here. These 10 groups can be organized into 3 broader groups: the overall/holistic features, the distances and the ratios.
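
The grouping relies on the naming convention of the columns. The sketch below is a condensed restatement of the slicing rules used in this notebook, wrapped in a hypothetical `feature_group` function. The first six column names appear in the dataframe above; the last three are invented to exercise the remaining rules.

```python
def feature_group(name):
    """Map a column name to a feature group using name slices (hypothetical helper)."""
    if name[1:7] == "PosInc" or name[3:9] == "PosInc":
        return "position intervals"
    if name[1:4] == "Pos":
        return "positions"
    if name[1:6] == "Speed" or name[3:8] == "Speed":
        return "speed intervals"
    if name[2:3] == "_" and name[-3:] == "xyz":
        return "euclidean distances"
    if name[2:3] == "_" and name[-3:] == "mil":
        return "temporal distances"
    if name[2:3] == "|" and name[-3:] == "xyz":
        return "euclidean ratios"
    if name[2:3] == "|" and name[-3:] == "mil":
        return "temporal ratios"
    return "other"


# the last three names are made up for illustration
columns = ["AB_mil", "AB_xyz", "AB|AC_mil", "AB|AC_xyz", "zSpeed_range", "zSpeed_std",
           "xPos_mean", "xPosInc_mean", "magSpeed_mean"]

groups = {}
for col in columns:
    groups.setdefault(feature_group(col), []).append(col)

for group, members in groups.items():
    print(group, members)
```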

In [4]:
def loadRatioFeatureIds(file_name, n=30):
    """ Loads the file with features. """
    df = pd.read_csv('data/Main collection - features/best ratios by multiclass RF/' + file_name)
    return df.loc[:n-1, 'feature id']

In [5]:
def findMagnitudes(f1, f2):
    """ Selects the features for both f1 and f2 lists that start with mag. """
    ff = [f for f in f1 if f[:3]=='mag']
    ff.extend([f for f in f2 if f[:3]=='mag'])
    return ff

In [6]:
# get all the features
all_features = X.columns.values.tolist()

# find all positional features
pos_features = [f for f in all_features if f[1:4]=='Pos' and f[4]!='I']

# find all position intervals
posInc_features = [f for f in all_features if f[1:7]=='PosInc' or f[3:9]=='PosInc']

# find all speed intervals
speed_features = [f for f in all_features if f[1:6]=='Speed' or f[3:8]=='Speed']

# find all x axis features
x_features_pos_posInc_speed = [f for f in pos_features if f[0]=='x'] + \
                                [f for f in posInc_features if f[0]=='x'] + \
                                [f for f in speed_features if f[0]=='x']

# find all y axis features
y_features_pos_posInc_speed = [f for f in pos_features if f[0]=='y'] + \
                                [f for f in posInc_features if f[0]=='y'] + \
                                [f for f in speed_features if f[0]=='y']

# find all z axis features
z_features_pos_posInc_speed = [f for f in pos_features if f[0]=='z'] + \
                                [f for f in posInc_features if f[0]=='z'] + \
                                [f for f in speed_features if f[0]=='z']

# find all magnitude features
magnitude_features = findMagnitudes(posInc_features, speed_features)

# combine best overall features
best_overall_features = x_features_pos_posInc_speed + y_features_pos_posInc_speed

# combine all overall features
all_overall_features = pos_features + posInc_features + speed_features

## Distances next ##

# find all euclidean distances
euc_dist_features = [f for f in all_features if f[2]=='_' and f[-3:]=='xyz']

# find all temporal distances
mil_dist_features = [f for f in all_features if f[2]=='_' and f[-3:]=='mil']

# combine euc and mil distances
all_distances = euc_dist_features + mil_dist_features

## Ratios next ##

# find the selected euclidean ratios
euc_all_ratio_features = [f for f in all_features if f[2]=='|' and f[-3:]=='xyz']

# find the selected temporal ratios
mil_all_ratio_features = [f for f in all_features if f[2]=='|' and f[-3:]=='mil']

# combine euc and temp ratios
all_ratio_features = euc_all_ratio_features + mil_all_ratio_features

## best ratios ##

# find the selected euclidean ratios
euc_best_ratio_features = loadRatioFeatureIds('Euclidean distance ratios.csv')

# find the selected temporal ratios
mil_best_ratio_features = loadRatioFeatureIds('Temporal distance ratios.csv')

# all best ratio features
all_best_ratio_features = pd.concat([euc_best_ratio_features, 
                                     mil_best_ratio_features], 
                                    ignore_index=True)

## Best all next ##

## best Combinations ## (1)
best_overall_distances = best_overall_features + all_distances
best_overall_ratios = best_overall_features + list(all_best_ratio_features)
best_ratios_distances = list(all_best_ratio_features) + all_distances
best_overall_distances_ratios = best_overall_features + all_distances + list(all_best_ratio_features)

## best Combinations ## (2 - unused)
pos_with_distances = pos_features + all_distances
pos_with_best_ratios = pos_features + list(all_best_ratio_features)
pos_with_distances_with_best_ratios = pos_features + all_distances + list(all_best_ratio_features)

# all features
#all_features

In [7]:
feature_sets = [pos_features, posInc_features, speed_features, euc_dist_features,
                mil_dist_features, euc_all_ratio_features, mil_all_ratio_features]

feature_set_names = ['positions', 'position intervals', 'speed intervals', 'euclidean distances',
                     'temporal distances', 'euclidean ratios (all 356)', 'temporal ratios (all 356)']

In [8]:
feature_sets = [best_overall_distances_ratios]

feature_set_names = ['best overall dist ratios']

### Model Selection

The next step is to initialize the models and select the one that will be used. Because this notebook is meant to tune only one model per run, choose the classifier in the next cell by leaving its lines uncommented and commenting out the remaining options.
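
An alternative to commenting lines in and out is to look the classifier up by name. The following is a sketch of that idea, not the approach this notebook takes: `CLASSIFIERS` and `make_classifier` are hypothetical names, and the estimator classes are imported lazily so that only the chosen one is loaded.

```python
import importlib

# display name -> (module path, class name); mirrors the four options in this section
CLASSIFIERS = {
    "one-class SVM": ("sklearn.svm", "OneClassSVM"),
    "Isolation Forest": ("sklearn.ensemble", "IsolationForest"),
    "Elliptic Envelope": ("sklearn.covariance", "EllipticEnvelope"),
    "Local Outlier Factor": ("sklearn.neighbors", "LocalOutlierFactor"),
}


def make_classifier(name, **kwargs):
    """Instantiate the estimator registered under `name` (hypothetical helper)."""
    if name not in CLASSIFIERS:
        raise ValueError("The classifier name is not enlisted: {}".format(name))
    module_path, class_name = CLASSIFIERS[name]
    module = importlib.import_module(module_path)  # imported only when requested
    return getattr(module, class_name)(**kwargs)


# usage, assuming scikit-learn is installed:
# selected_classifier_name = 'Isolation Forest'
# selected_classifier = make_classifier(selected_classifier_name)
```

Note that only the one-class SVM and the Isolation Forest have parameter handling further down in this notebook, so the other two entries would still need their own parameter space.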

In [9]:
# from sklearn import svm
# selected_classifier = svm.OneClassSVM()
# selected_classifier_name = 'one-class SVM'

from sklearn.ensemble import IsolationForest
selected_classifier = IsolationForest()
selected_classifier_name = 'Isolation Forest'

# from sklearn.covariance import EllipticEnvelope
# selected_classifier = EllipticEnvelope()
# selected_classifier_name = 'Elliptic Envelope'

# from sklearn.neighbors import LocalOutlierFactor
# selected_classifier = LocalOutlierFactor()
# selected_classifier_name = 'Local Outlier Factor'

### Parameter Space - Coarse

Here for every algorithm that will be probed, we will define the parameter space for a coarse grid search.

##### Parameters for one-class SVM:

In [10]:
nu = [0.1, 0.3, 0.5, 0.7, 0.9]    # nu
degree = [1, 2, 3, 4]             # degree (poly)
gamma = [0.1, 0.3, 0.5, 0.7, 0.9] # gamma (rbf, poly, sigmoid)
coef0 = [0.1, 0.3, 0.5, 0.7, 0.9] # coef0 (poly, sigmoid)

def_degree = [3]  # degree is an integer parameter
def_gamma = ['auto']
def_coef0 = [0.0]

# initialize a list of parameters the classifier can take
parameters_SVM = [{'kernel': ['linear'],
                   'nu': nu,
                   'degree': def_degree,
                   'gamma': def_gamma,
                   'coef0': def_coef0},
                  {'kernel': ['rbf'],
                   'nu': nu,
                   'degree': def_degree,
                   'gamma': gamma,
                   'coef0': def_coef0},
                  {'kernel': ['poly'],
                   'nu': nu,
                   'degree': degree,
                   'gamma': gamma,
                   'coef0': coef0},
                  {'kernel': ['sigmoid'],
                   'nu': nu,
                   'degree': def_degree,
                   'gamma': gamma,
                   'coef0': coef0}]

param_names_SVM = ['kernel', 'nu', 'degree', 'gamma', 'coef0']

In [11]:
# initialize a list of parameters the classifier can take
parameters_SVM_only_rbf = [{'kernel': ['rbf'],
                           'nu': nu,
                           'degree': def_degree,
                           'gamma': gamma,
                           'coef0': def_coef0}]

param_names_SVM = ['kernel', 'nu', 'degree', 'gamma', 'coef0']

##### Parameters for Isolation Forest:

In [12]:
estimators = [300, 400, 500, 600]      # number of estimators
contamination = [0.05, 0.2, 0.35] # contamination
max_features = [0.5, 0.7, 0.9]    # max features

def_samples = [1.0]

parameters_IF = [{'n_estimators': estimators,
                  'max_samples': def_samples,
                  'contamination': contamination,
                  'max_features': max_features}]

param_names_IF = ['n_estimators', 'max_samples', 'contamination', 'max_features']

##### Make list of parameters:

In [13]:
def findListOfParameters(parameters):
    """ Returns all combinations of the dictionary values, as a list of tuples """
    return [tup for d in parameters for tup in list(itertools.product(*d.values()))]
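
The tuples returned above carry no parameter names, so their meaning depends on the key order of each dictionary. As a sketch of a variant that keeps the names attached, the hypothetical `expand_grid` below returns one dictionary per combination; the grid values are the coarse Isolation Forest ones from this section.

```python
import itertools


def expand_grid(grids):
    """Expand a list of {name: [values]} grids into one dict per combination."""
    combos = []
    for grid in grids:
        names = list(grid)
        for values in itertools.product(*(grid[n] for n in names)):
            combos.append(dict(zip(names, values)))
    return combos


coarse_if = [{"n_estimators": [300, 400, 500, 600],
              "max_samples": [1.0],
              "contamination": [0.05, 0.2, 0.35],
              "max_features": [0.5, 0.7, 0.9]}]

combos = expand_grid(coarse_if)
print(len(combos))  # 4 * 1 * 3 * 3 = 36
print(combos[0])
```

Each dictionary could then be passed as keyword arguments, for example `clf.set_params(**combos[0])`, which would remove the need for positional indexing by classifier name.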

In [14]:
selected_parameters = findListOfParameters(parameters_IF)

### Coarse Grid Search

In order to find the optimal parameter combination, we will run the selected model for every user and every group of features. Then we will select the parameters with the best mean score across all combinations of users with features. The data structure that we will use will look like:

|                 | f1,u1 | f1,u2 |  ...  | f1,uN |  ...  | fN,u1 | fN,u2 |  ...  | fN,uN |
|-----------------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
|   parameters1   |  val  |  val  |  ...  |  val  |  ...  |  val  |  val  |  ...  |  val  |
|   parameters2   |  val  |  val  |  ...  |  val  |  ...  |  val  |  val  |  ...  |  val  |
|       ...       |  val  |  val  |  ...  |  val  |  ...  |  val  |  val  |  ...  |  val  |
|   parametersN   |  val  |  val  |  ...  |  val  |  ...  |  val  |  val  |  ...  |  val  |
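
To make the selection rule concrete, here is a small sketch on a toy table of that shape. The AUC values are invented for illustration; only the layout (parameter sets as rows, feature/user pairs as columns) follows the description above.

```python
import pandas as pd

# toy AUC scores: rows are parameter sets, columns are (feature set, user) pairs
scores = pd.DataFrame(
    [[0.98, 0.94, 0.99],
     [0.97, 0.96, 0.99],
     [0.95, 0.95, 0.95]],
    index=["300, 1.0, 0.05, 0.5", "300, 1.0, 0.2, 0.5", "600, 1.0, 0.2, 0.5"],
    columns=["f1, u1", "f1, u2", "f1, u3"],
)

summary = pd.DataFrame({
    "mean AUC": scores.mean(axis=1),
    # ddof=0 gives the population std; pandas defaults to ddof=1
    "std AUC": scores.std(axis=1, ddof=0),
})

# the winning parameter set is the row with the highest mean AUC
best = summary["mean AUC"].idxmax()
print(best)
print(summary.loc[best])
```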

In [15]:
from sklearn.model_selection import KFold
from sklearn.preprocessing import MinMaxScaler
from numpy import interp  # scipy.interp was a deprecated alias of numpy.interp
from sklearn.metrics import auc, roc_curve, roc_auc_score

In [16]:
def addParamsToClf(clf, p, clf_name):
    """ Adds the parameters of the classifier according to its name. """
    
    if clf_name == 'one-class SVM':
        clf.set_params(kernel=p[0], nu=p[1], degree=p[2], gamma=p[3], coef0=p[4])
    elif clf_name == 'Isolation Forest':
        clf.set_params(n_estimators=p[0], max_samples=p[1], contamination=p[2], max_features=p[3])
    else:
        raise ValueError('The classifier name is not enlisted')
    
    return clf

In [17]:
def classifyWithCV(X_positive, X_negative, clf, scaler, folds=5, random_state=0):
    """ Returns the AUC of the mean ROC curve over the cross-validation folds. """
    
    # initialize a cross validation object
    kf = KFold(n_splits=folds, shuffle=True, random_state=random_state)

    mean_FPR = np.linspace(0, 1, 100)

    TPRs_cv = []
        
    for train, test in kf.split(X_positive):

        # fit the scaler
        scaler.fit(X_positive.loc[train, :])

        # scale all data before fitting the classifier or making predictions
        X_pos_transformed_train = scaler.transform(X_positive.loc[train, :])
        X_pos_transformed_test = scaler.transform(X_positive.loc[test, :])
        X_neg_transformed_test = scaler.transform(X_negative)
        
        # fit the classifier
        clf.fit(X_pos_transformed_train)

        # make prediction with decision function on Positive data
        prediction_pos_DF = clf.decision_function(X_pos_transformed_test)

        # make prediction with decision function on Negative data
        prediction_neg_DF = clf.decision_function(X_neg_transformed_test)
        
        y_true = [1 for _ in range(prediction_pos_DF.shape[0])] + [0 for _ in range(prediction_neg_DF.shape[0])]
        y_scores = [s for s in prediction_pos_DF] + [s for s in prediction_neg_DF]
        
        # find values for the ROC curve
        fpr, tpr, thresholds = roc_curve(y_true, y_scores, pos_label=1)
            
        # interpolate the linear space of mean_FPR to the fpr, tpr coordinates
        TPRs_cv.append(interp(mean_FPR, fpr, tpr))
        TPRs_cv[-1][0] = 0.0
            
    # find the mean curve
    mean_tpr = np.mean(TPRs_cv, axis=0)
    mean_tpr[-1] = 1.0
    
    return auc(mean_FPR, mean_tpr)
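
The averaging step in the function above can be isolated from the classifier. The sketch below takes per-fold `(fpr, tpr)` arrays, interpolates them onto a shared grid, averages them and integrates with the trapezoidal rule. The helper name `mean_roc_auc` and the two toy curves are assumptions; only NumPy is used, so the integration is written out instead of calling `sklearn.metrics.auc`.

```python
import numpy as np


def mean_roc_auc(fold_curves, grid_size=100):
    """Average per-fold ROC curves on a common FPR grid and return the AUC."""
    mean_fpr = np.linspace(0, 1, grid_size)
    tprs = []
    for fpr, tpr in fold_curves:
        interp_tpr = np.interp(mean_fpr, fpr, tpr)  # fpr is expected to be non-decreasing
        interp_tpr[0] = 0.0                         # pin the curve to (0, 0)
        tprs.append(interp_tpr)
    mean_tpr = np.mean(tprs, axis=0)
    mean_tpr[-1] = 1.0                              # pin the curve to (1, 1)
    # trapezoidal rule over the averaged curve
    widths = np.diff(mean_fpr)
    heights = (mean_tpr[1:] + mean_tpr[:-1]) / 2
    return float(np.sum(widths * heights))


# two toy folds: a perfect ranking and a chance-level ranking
perfect = (np.array([0.0, 0.0, 1.0]), np.array([0.0, 1.0, 1.0]))
chance = (np.array([0.0, 1.0]), np.array([0.0, 1.0]))

print(mean_roc_auc([chance]))
print(mean_roc_auc([perfect, chance]))
```

One consequence of pinning the first grid point to zero: on a 100-point grid a perfectly separating fold contributes 98.5/99, roughly 0.9949, rather than exactly 1.0, so scores near that value should be read as close to the ceiling of this procedure.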

In [18]:
def bestParameter_GridSearch(X_all, clf, params, features, users, clf_name, search_type,
                             locks_per_participant=60):
    """ Finds the best set of parameters across all users and features """
    
    # initialize the dataframe for all values
    df = pd.DataFrame()
    
    # initialize a minmax scaler
    scaler = MinMaxScaler()
    
    for cc, p in enumerate(params):
        
        # add the parameters to the classifier according to its name
        classifier = addParamsToClf(clf, p, clf_name)
        
        if clf_name=='one-class SVM':
            param_string = '{}, {}, {}, {}, {}'.format(*p)
        elif clf_name=='Isolation Forest':
            param_string = '{}, {}, {}, {}'.format(*p)
        else:
            raise ValueError("Something is wrong with the classifier name")
        
        print("Currently working on the {}/{} set of parameters which is: "
              .format(cc+1, len(params))+param_string+" ...")
        
        for i, f in enumerate(features):

            for j, u in enumerate(users):
                
                # define the column name
                col = 'f'+str(i+1)+', u'+str(j+1)
                
                # find the starting index of the original dataframe
                idx = j*locks_per_participant
        
                # find the mask for positive values to slice the dataset
                positive_mask = X_all.index.isin(range(idx, idx+locks_per_participant))

                # find the lines that correspond to this participant and the feature set
                X_pos = X_all.loc[positive_mask, f]

                # find the lines that correspond to all other participants and the feature set
                X_neg = X_all.loc[~positive_mask, f]

                # reset the index of the features dataframe
                X_pos.index = range(X_pos.shape[0])
                
                # reset the index of the features dataframe
                X_neg.index = range(X_neg.shape[0])
                
                # run classify with cv
                df.loc[param_string, col] = classifyWithCV(X_pos, X_neg, classifier, scaler)
                
            # give some feedback that the feature is finished
            print('The models are finished for {}/{} features'.format(i+1, len(features)))
            
        clear_output()
            
    cols = df.columns.tolist()
    
    # set new column for the parameters
    df.loc[:, 'parameters'] = df.index
    
    # make new column for the mean scores
    df.loc[:, 'mean AUC'] = np.mean(df.loc[:, cols], axis=1)
    
    # make new column for the std of the scores
    df.loc[:, 'std AUC'] = np.std(df.loc[:, cols], axis=1)
    
    # reset the index of the features dataframe
    df.index = range(df.shape[0])
    
    cols = df.columns.tolist()
    
    # reorder the columns
    cols = cols[-3:] + cols[:-3]
    
    df = df[cols]
    
    print("The result of the grid search for the parameters is:\n")
    
    # show the dataframe
    display(df)
    
    # log the results
    df.to_csv(target_path_rel + '{} logs for {} (AUC).csv'.format(search_type, clf_name), index=False)
    
    # find the winning line
    winning_line = df.loc[df.loc[:, 'mean AUC'].idxmax(), :]
        
    # return the parameters with the highest mean AUC, along with that mean and its std
    return winning_line['parameters'], winning_line['mean AUC'], winning_line['std AUC']

In [19]:
print('The selected parameter sets are {}.'.format(len(selected_parameters)))

The selected parameter sets are 36.


In [20]:
start = time()
best_parameters_coarse_search,\
best_mean_of_AUC_coarse_search,\
std_of_AUC_mean_coarse_search = bestParameter_GridSearch(X,
                                                         selected_classifier,
                                                         selected_parameters,
                                                         feature_sets,
                                                         feature_files, # meaning users
                                                         selected_classifier_name,
                                                         'Coarse Grid Search')

print('It took {:.2f} minutes to run the above function'.format((time() - start)/60))

The result of the grid search for the parameters is:



Unnamed: 0,parameters,mean AUC,std AUC,"f1, u1",...,"f1, u13","f1, u14","f1, u15","f1, u16"
0,"300, 1.0, 0.05, 0.5",0.974548,0.016361,0.981818,...,0.944444,0.968182,0.986027,0.994444
1,"300, 1.0, 0.05, 0.7",0.974621,0.015621,0.979966,...,0.951010,0.973737,0.987710,0.994444
2,"300, 1.0, 0.05, 0.9",0.976441,0.015176,0.981650,...,0.956229,0.975253,0.987542,0.993939
3,"300, 1.0, 0.2, 0.5",0.974779,0.014848,0.979293,...,0.947475,0.972222,0.986195,0.994613
...,...,...,...,...,...,...,...,...,...
32,"600, 1.0, 0.2, 0.9",0.976452,0.014705,0.980135,...,0.947811,0.976599,0.988047,0.994949
33,"600, 1.0, 0.35, 0.5",0.977031,0.013886,0.980135,...,0.952189,0.976431,0.988552,0.994949
34,"600, 1.0, 0.35, 0.7",0.975705,0.014373,0.978788,...,0.947643,0.972559,0.985354,0.994444
35,"600, 1.0, 0.35, 0.9",0.975989,0.014400,0.979630,...,0.950505,0.974916,0.985859,0.994613


It took 252.49 minutes to run the above function


In [21]:
print("The best parameters of the coarse search made with the {} classifier are:\n{}"
      .format(selected_classifier_name, best_parameters_coarse_search))
print('The mean AUC score for those parameters is {}.'
      .format(best_mean_of_AUC_coarse_search))
print('The std of this score across all features and participants is {}.'
      .format(std_of_AUC_mean_coarse_search))

The best parameters of the coarse search made with the Isolation Forest classifier are:
600, 1.0, 0.2, 0.5
The mean AUC score for those parameters is 0.9774936868686869.
The std of this score across all features and participants is 0.015097959824428575.


### Fine Grid Search

Here we define the parameters for a more refined grid search according to the results of the previous search.
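
One way to derive such a refined grid is to place evenly spaced candidates around each coarse winner. The sketch below assumes a hypothetical helper `refine_around`; the step sizes and the number of candidates per side are illustrative choices, picked here so that the results coincide with the Isolation Forest lists used further down.

```python
def refine_around(center, step, n_each_side=2, low=None, high=None):
    """Return evenly spaced candidates centred on `center` (hypothetical helper)."""
    values = [center + k * step for k in range(-n_each_side, n_each_side + 1)]
    if low is not None:
        values = [v for v in values if v >= low]
    if high is not None:
        values = [v for v in values if v <= high]
    # round to suppress floating point noise such as 0.15000000000000002
    return [round(v, 10) for v in values]


# coarse winner reported above: 600 estimators, contamination 0.2, max_features 0.5
fine_estimators = refine_around(600, 50, n_each_side=1)
fine_contamination = refine_around(0.2, 0.05, low=0.0, high=0.5)
fine_max_features = refine_around(0.5, 0.05, n_each_side=3, low=0.0, high=1.0)

print(fine_estimators)     # [550, 600, 650]
print(fine_contamination)  # [0.1, 0.15, 0.2, 0.25, 0.3]
print(fine_max_features)   # [0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65]
```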

##### Parameters for one-class SVM:

In [22]:
new_nu = [0.001, 0.005, 0.01, 0.015, 0.1, 0.15, 0.2, 0.25, 0.3]        # nu - best was 0.1 (rbf)
new_gamma = [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, 0.1, 0.15, 0.2] # gamma - best was 0.1 (rbf)

new_def_degree = [3]
new_def_coef0 = [0.0]

# initialize a list of parameters the classifier can take
new_parameters_SVM = [{'kernel': ['rbf'],
                       'nu': new_nu,
                       'degree': new_def_degree,
                       'gamma': new_gamma,
                       'coef0': new_def_coef0}]

##### Parameters for Isolation Forest:

In [30]:
estimators = [350, 400, 450]           # number of estimators
contamination = [0.1, 0.15, 0.2, 0.25] # contamination
max_features = [0.4, 0.5, 1.0]         # max features

# for group 25 specifics
estimators = [550, 600, 650]                           # number of estimators
contamination = [0.1, 0.15, 0.2, 0.25, 0.3]            # contamination
max_features = [0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65] # max features


def_samples = [1.0]

new_parameters_IF = [{'n_estimators': estimators,
                      'max_samples': def_samples,
                      'contamination': contamination,
                      'max_features': max_features}]

In [31]:
selected_parameters_fine_grid_search = findListOfParameters(new_parameters_IF)

print("The number of parameters for the fine grid search is {}."
      .format(len(selected_parameters_fine_grid_search)))

The number of parameters for the fine grid search is 105.


In [32]:
start = time()
best_parameters_fine_search,\
best_mean_of_AUC_fine_search,\
std_of_AUC_mean_fine_search = bestParameter_GridSearch(X, 
                                                       selected_classifier, 
                                                       selected_parameters_fine_grid_search,
                                                       feature_sets, 
                                                       feature_files,
                                                       selected_classifier_name, 
                                                       'Fine Grid Search')

print('It took {:.2f} minutes to run the above function'.format((time() - start)/60))

The result of the grid search for the parameters is:



Unnamed: 0,parameters,mean AUC,std AUC,"f1, u1",...,"f1, u13","f1, u14","f1, u15","f1, u16"
0,"550, 1.0, 0.1, 0.35",0.976652,0.013936,0.977609,...,0.951010,0.975589,0.985354,0.994276
1,"550, 1.0, 0.1, 0.4",0.976431,0.014323,0.979630,...,0.949832,0.977778,0.985354,0.994444
2,"550, 1.0, 0.1, 0.45",0.976052,0.015368,0.977609,...,0.946970,0.976431,0.986869,0.994781
3,"550, 1.0, 0.1, 0.5",0.976168,0.014660,0.981313,...,0.950505,0.974242,0.985690,0.994613
...,...,...,...,...,...,...,...,...,...
101,"650, 1.0, 0.3, 0.5",0.976926,0.014978,0.982828,...,0.949495,0.977778,0.987205,0.994444
102,"650, 1.0, 0.3, 0.55",0.976357,0.014522,0.977946,...,0.951515,0.973401,0.985354,0.994613
103,"650, 1.0, 0.3, 0.6",0.976357,0.014910,0.981481,...,0.950337,0.975758,0.985185,0.994613
104,"650, 1.0, 0.3, 0.65",0.977041,0.013336,0.978620,...,0.950673,0.975253,0.984848,0.994444


It took 243.14 minutes to run the above function


In [33]:
print("The best parameters of the fine search made with the {} classifier are:\n{}"
      .format(selected_classifier_name, best_parameters_fine_search))
print('The mean AUC score for those parameters is {}.'
      .format(best_mean_of_AUC_fine_search))
print('The std of this score across all features and participants is {}.'
      .format(std_of_AUC_mean_fine_search))

The best parameters of the fine search made with the Isolation Forest classifier are:
550, 1.0, 0.2, 0.45
The mean AUC score for those parameters is 0.9777777777777777.
The std of this score across all features and participants is 0.013064097807381103.


### Drop a cleaner csv with only the best parameters

To do that we pick the best parameters and drop them in a smaller csv. We use the name of the classifier in the file name. The goal is to have one such file for every classifier that is used in the next stage.
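
Splitting the winning parameter string yields text values, so a consumer that wants to configure a classifier from them would need to restore the numeric types. The sketch below shows one way to do that; `parse_param_string` and the list of casts are assumptions, while the parameter names and the example string are the Isolation Forest ones from this notebook.

```python
def parse_param_string(param_string, names, casts):
    """Split 'a, b, c' into a {name: typed value} dict (hypothetical helper)."""
    parts = param_string.split(", ")
    if not (len(parts) == len(names) == len(casts)):
        raise ValueError("expected {} values, got {}".format(len(names), len(parts)))
    return {name: cast(raw) for name, raw, cast in zip(names, parts, casts)}


best = parse_param_string(
    "550, 1.0, 0.2, 0.45",
    ["n_estimators", "max_samples", "contamination", "max_features"],
    [int, float, float, float],  # n_estimators is a count, the rest are fractions
)
print(best)
```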

In [34]:
# split the parameter string on the ', ' separator between the params
best_params_list = best_parameters_fine_search.split(', ')

# put them to a dataframe
best_params_df = pd.DataFrame({'parameter name': param_names_IF,
                               'parameter value': best_params_list})

print("The dataframe with the best parameters is:\n")
display(best_params_df)

The dataframe with the best parameters is:



Unnamed: 0,parameter name,parameter value
0,n_estimators,550.0
1,max_samples,1.0
2,contamination,0.2
3,max_features,0.45


In [35]:
# make a csv using the name of the classifier as the file name
best_params_df.to_csv(target_path_rel + 'Optimal Parameters --- {} (AUC).csv'
                      .format(selected_classifier_name), 
                      index=False)

In [36]:
# append the score and std of the best params
with open(target_path_rel + 'Optimal Parameters --- {} (AUC).csv'.format(selected_classifier_name), 'a') as fd:
    fd.write('--------------------------------------------------------------------------------\n')
    fd.write('--------------------------------------------------------------------------------\n')
    fd.write('The mean AUC rate across features and users of the above parameters is {:.4f}\n'
             .format(best_mean_of_AUC_fine_search))
    fd.write('The std of the aforementioned score is {:.4f}\n'.format(std_of_AUC_mean_fine_search))
    fd.write('--------------------------------------------------------------------------------\n')
    fd.write('The error for every separate case is measured in terms of its AUC score')