# 6 Modeling - selection of the best linear models

<b> Purpose of the action </b> - check prediction accuracy on the test set for different types of models:
- LogisticRegression
- LinearSVC (SVC with a linear kernel)
- SVC with RBF
- KNeighborsClassifier

<b> Action plan </b>:
- Test 20 different models of each type
- Use ParameterSampler to generate models with random hyperparameters
- Fit each model on the training set and evaluate it on the validation set
- Select the best 5 models of each type and combine them into one AveragingClassifier
- Train the best base model (top 1) of each type on all data (training and validation sets)
- Do the same with the AveragingClassifiers
- Create one AveragingClassifier from the single best model of each type
- Create a LargeAveragingClassifier from the previously created AveragingClassifiers (each of which contains the top 5 models of one type)
- Save the models for future use
- Compare prediction accuracy and other metrics on the test set and save the results for future use
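
The sampling step in the plan can be sketched as follows. This is a minimal illustration that only assumes scikit-learn's `ParameterSampler`; the search space is the one used for LogisticRegression in this notebook, and the variable name `candidates` is introduced here for illustration.

```python
from sklearn.model_selection import ParameterSampler

# Search space copied from the LogisticRegression section of this notebook.
params_grid = {
    "C": [0.001, 0.01, 0.1, 1, 10, 100, 1000],
    "random_state": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "max_iter": [1000],
    "penalty": ["l1", "l2"],
    "solver": ["liblinear"],
}

# Draw 20 random combinations; a fixed random_state makes the draw repeatable.
candidates = list(ParameterSampler(params_grid, n_iter=20, random_state=23))

# Each candidate is a plain dict that can be unpacked into an estimator.
for params in candidates[:3]:
    print(params)
```

Each sampled dictionary can be passed to an estimator as keyword arguments, for example `LogisticRegression(**params)`.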

## 6.1 Import necessary libraries and modules

In [1]:
import numpy as np
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from modeling import Metrics, select_best_classifiers, show_best_models
from classifiers import AveragingClassifier, LargeAveragingClassifier
from preprocessing_pipelines import categorical_preprocess_pipeline, ImportantFeaturesSelector
import pickle

## 6.2 Import data sets dedicated for linear models

In [2]:
# data sets for selecting best models of each type
train_set = pd.read_csv("./preprocessed_data/processed_categorical_train_set.csv", index_col=0)
validation_set = pd.read_csv("./preprocessed_data/processed_categorical_validation_set.csv", index_col=0)

# data sets for final fitting and prediction
train_set_all = pd.read_csv('./preprocessed_data/train_set_stage2.csv', index_col=0)
test_set = pd.read_csv('./preprocessed_data/test_set_stage2.csv', index_col=0) 

## 6.3 Split datasets into feature and label sets

In [3]:
# feature and label sets for selecting models
X_train, y_train = np.array(train_set.drop(columns='FTR')), np.array(train_set['FTR'])
X_val, y_val = np.array(validation_set.drop(columns='FTR')), np.array(validation_set['FTR'])

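# feature and label sets for final training and prediction
# NOTE: the full frames, still containing the 'FTR' label column, are passed as features;
# this is only safe if the preprocessing pipeline drops that column (not verified here)
X_train_all, y_train_all = train_set_all, np.array(train_set_all['FTR'])
X_test, y_test = test_set, np.array(test_set['FTR'])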

## 6.4 Create placeholders to hold prediction results

In [4]:
# placeholder to hold prediction results
prediction_metrics = Metrics()

# lists to hold model objects
single_models = []
averaging_models = []

## 6.5 LogisticRegression

### 6.5.1  Select best models

Choose the best 5 of 20 tested models, using multiprocessing and <b> ParameterSampler </b> to generate random hyperparameters. Models are ranked by accuracy_score on the validation set.
Feature selection is performed inside the function, as a pipeline step for each model.
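
The selection procedure described above can be sketched roughly as below. This is not the project's `select_best_classifiers`: the helper name `rank_candidates` is hypothetical, it runs sequentially rather than with multiprocessing, it skips feature selection, and it uses synthetic data in place of the notebook's CSV files.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import ParameterSampler, train_test_split


def rank_candidates(estimator, params_grid, n_iter, X_train, y_train,
                    X_val, y_val, n_best=5, random_state=0):
    """Fit n_iter random candidates and return the n_best by validation accuracy."""
    scored = []
    sampler = ParameterSampler(params_grid, n_iter=n_iter, random_state=random_state)
    for params in sampler:
        model = estimator(**params).fit(X_train, y_train)
        score = accuracy_score(y_val, model.predict(X_val))
        scored.append((score, model))
    # Sort on the score only, highest first (estimators are not comparable).
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:n_best]


# Synthetic stand-in data (an assumption made for this sketch).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.3, random_state=0)

grid = {
    "C": [0.001, 0.01, 0.1, 1, 10, 100, 1000],
    "penalty": ["l1", "l2"],
    "solver": ["liblinear"],
}
best = rank_candidates(LogisticRegression, grid, 10, X_tr, y_tr, X_va, y_va)
for score, model in best:
    print(f"{score:.4f}", model.get_params()["C"], model.get_params()["penalty"])
```

Ranking on a held-out validation set keeps the test set untouched until the final comparison.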

In [5]:
# define params for random grid search
params_grid={
   'C': [0.001, 0.01, 0.1, 1, 10, 100, 1000],
   'random_state': [0, 1, 2, 3, 4, 5, 6, 7, 8 ,9, 10],
   'max_iter': [1000],
   'penalty': ['l1', 'l2'],
   'solver' : ['liblinear']
}
    
# to safely run multiprocessing on Windows
if __name__ == '__main__':
    
    # function selecting best classifiers using multiprocessing
    best_models, best_scoring = select_best_classifiers(estimator=LogisticRegression, 
                                                        params_grid=params_grid,
                                                        n_iter=20, 
                                                        random_state=23,
                                                        X_train=X_train, 
                                                        y_train=y_train, 
                                                        X_val=X_val, 
                                                        y_val=y_val, 
                                                        verbose=1,
                                                        n_best_models=5)
    # show best selected models
    show_best_models(best_models, best_scoring)

Place: 1
LogisticRegression{'solver': 'liblinear', 'random_state': 2, 'penalty': 'l1', 'max_iter': 1000, 'C': 0.1}
Accuracy score on validation set: 0.6818
-------------------------------------------------------------------------------------------------------------------------------
Place: 2
LogisticRegression{'solver': 'liblinear', 'random_state': 1, 'penalty': 'l1', 'max_iter': 1000, 'C': 0.1}
Accuracy score on validation set: 0.6727
-------------------------------------------------------------------------------------------------------------------------------
Place: 3
LogisticRegression{'solver': 'liblinear', 'random_state': 8, 'penalty': 'l1', 'max_iter': 1000, 'C': 0.1}
Accuracy score on validation set: 0.6667
-------------------------------------------------------------------------------------------------------------------------------
Place: 4
LogisticRegression{'solver': 'liblinear', 'random_state': 0, 'penalty': 'l2', 'max_iter': 1000, 'C': 1000}
Accuracy score on validation set

### 6.5.2 Extract single models from list

In [6]:
# pull the estimator stored in the second step of each of the top 5 pipelines
clf_1, clf_2, clf_3, clf_4, clf_5 = (pipeline.steps[1][1] for pipeline in best_models[:5, 1])

### 6.5.3 Create complete pipelines (with scaling, encoding and feature selection) for each individual classifier

In [7]:
# all base preprocess pipeline and transformers come from module preprocessing_pipelines.py
pipe_clf_1 = Pipeline([ ('preprocess_pipeline', categorical_preprocess_pipeline),
                        ('feature_seletion', ImportantFeaturesSelector(clf_1, 'basic') ),
                        ('classification', clf_1)
                      ])

pipe_clf_2 = Pipeline([ ('preprocess_pipeline', categorical_preprocess_pipeline),
                        ('feature_seletion', ImportantFeaturesSelector(clf_2, 'basic') ),
                        ('classification', clf_2)
                      ])

pipe_clf_3 = Pipeline([ ('preprocess_pipeline', categorical_preprocess_pipeline),
                        ('feature_seletion', ImportantFeaturesSelector(clf_3, 'basic') ),
                        ('classification', clf_3)
                      ])

pipe_clf_4 = Pipeline([ ('preprocess_pipeline', categorical_preprocess_pipeline),
                        ('feature_seletion', ImportantFeaturesSelector(clf_4, 'basic') ),
                        ('classification', clf_4)
                      ])

pipe_clf_5 = Pipeline([ ('preprocess_pipeline', categorical_preprocess_pipeline),
                        ('feature_seletion', ImportantFeaturesSelector(clf_5, 'basic') ),
                        ('classification', clf_5)
                      ])

### 6.5.4  Make AveragingClassifier from the best 5 selected models (pipelines)

In [8]:
avg_clf = AveragingClassifier(base_estimators=[pipe_clf_1,
                                               pipe_clf_2,
                                               pipe_clf_3,
                                               pipe_clf_4,
                                               pipe_clf_5],
                              voting='soft')

# print(avg_clf.base_estimators[0])
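
For readers unfamiliar with soft voting, here is a minimal NumPy sketch of the idea: average the class probabilities of the base models, then pick the most probable class. It assumes equal weights and is not the implementation of the project's `AveragingClassifier`; the function name `soft_vote` and the probability values are made up for illustration.

```python
import numpy as np


def soft_vote(probas):
    """Average per-model class probabilities and pick the most likely class.

    probas: array of shape (n_models, n_samples, n_classes).
    """
    mean_proba = np.mean(probas, axis=0)
    return mean_proba, np.argmax(mean_proba, axis=1)


# Three hypothetical models, two samples, two classes.
probas = np.array([
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.4, 0.6], [0.3, 0.7]],
    [[0.4, 0.6], [0.6, 0.4]],
])
mean_proba, labels = soft_vote(probas)
print(labels)  # [0 1]
```

For the first sample two of the three models favour class 1, so a majority (hard) vote would return 1, while the averaged probabilities favour class 0 because the first model is far more confident.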

### 6.5.5 Fit single and averaging models on the entire data set 

In [9]:
# to safely run multiprocessing on Windows
if __name__ == '__main__':
    
    # train model on all data
    pipe_clf_1.fit(X_train_all, y_train_all)
    avg_clf.fit(X_train_all, y_train_all)

    # give models a name
    clf_1_name = f'{clf_1.__class__.__name__}'
    avg_clf_name = f'Averaging{clf_1.__class__.__name__}'
    print(clf_1_name, avg_clf_name)

LogisticRegression AveragingLogisticRegression


### 6.5.6 Calculate prediction metrics and add the results to the lists

In [10]:
# add prediction metrics for single classifier to placeholder
prediction_metrics.add_metrics(pipe_clf_1, clf_1_name, X_test, y_test)

# add prediction metrics for averaging classifier to placeholder
prediction_metrics.add_metrics(avg_clf, avg_clf_name, X_test, y_test)

# add both classifiers to the lists (used later to build the larger averaging classifiers)
single_models.append(pipe_clf_1)
averaging_models.append(avg_clf)

## 6.6 LinearSVC

### 6.6.1 Select best models

Choose the best 5 of 20 tested models, using multiprocessing and <b> ParameterSampler </b> to generate random hyperparameters. Models are ranked by accuracy_score on the validation set. The linear SVM here is SVC with kernel='linear' and probability=True (which enables predict_proba), not the LinearSVC class.
Feature selection is performed inside the function, as a pipeline step for each model.
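
The following sketch illustrates one practical difference between the two scikit-learn linear SVM classes: `SVC(kernel='linear', probability=True)` exposes `predict_proba`, while `LinearSVC` does not. The synthetic data is an assumption made for this example only.

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC, LinearSVC

# Small synthetic binary problem (stand-in for the notebook's data).
X, y = make_classification(n_samples=80, n_features=6, random_state=0)

# probability=True fits an extra calibration step so predict_proba is available.
svc = SVC(kernel="linear", probability=True, random_state=0).fit(X, y)
proba = svc.predict_proba(X[:3])
print(proba.shape)  # (3, 2)

# LinearSVC offers decision_function but no predict_proba method.
print(hasattr(LinearSVC(), "predict_proba"))  # False
```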

In [11]:
# define params for random grid search
params_grid={
   'C': [0.01, 0.1, 1, 10, 100, 1000, 10000],
   'random_state': [0, 1, 2, 3, 4, 5, 6, 7, 8 ,9, 10],
   'max_iter': [100000],
   'kernel': ['linear'],
   'probability' : [True],
} 

# to safely run multiprocessing on Windows
if __name__ == '__main__':
    
    # function selecting best classifiers using multiprocessing
    best_models, best_scoring = select_best_classifiers(estimator=SVC, 
                                                        params_grid=params_grid,
                                                        n_iter=20, 
                                                        random_state=23,
                                                        X_train=X_train, 
                                                        y_train=y_train, 
                                                        X_val=X_val, 
                                                        y_val=y_val, 
                                                        verbose=1,
                                                        n_best_models=5)
    # show best selected models
    show_best_models(best_models, best_scoring)

Place: 1
SVC{'random_state': 7, 'probability': True, 'max_iter': 100000, 'kernel': 'linear', 'C': 1}
Accuracy score on validation set: 0.6727
-------------------------------------------------------------------------------------------------------------------------------
Place: 2
SVC{'random_state': 8, 'probability': True, 'max_iter': 100000, 'kernel': 'linear', 'C': 0.01}
Accuracy score on validation set: 0.6667
-------------------------------------------------------------------------------------------------------------------------------
Place: 3
SVC{'random_state': 9, 'probability': True, 'max_iter': 100000, 'kernel': 'linear', 'C': 0.01}
Accuracy score on validation set: 0.6667
-------------------------------------------------------------------------------------------------------------------------------
Place: 4
SVC{'random_state': 3, 'probability': True, 'max_iter': 100000, 'kernel': 'linear', 'C': 0.01}
Accuracy score on validation set: 0.6606
---------------------------------------

### 6.6.2 Extract single models from list

In [12]:
# pull the estimator stored in the second step of each of the top 5 pipelines
clf_1, clf_2, clf_3, clf_4, clf_5 = (pipeline.steps[1][1] for pipeline in best_models[:5, 1])

### 6.6.3 Create complete pipelines (with scaling, encoding and feature selection) for each individual classifier

In [13]:
# all base preprocess pipeline and transformers come from module preprocessing_pipelines.py
pipe_clf_1 = Pipeline([ ('preprocess_pipeline', categorical_preprocess_pipeline),
                        ('feature_seletion', ImportantFeaturesSelector(clf_1, 'basic') ),
                        ('classification', clf_1)
                      ])

pipe_clf_2 = Pipeline([ ('preprocess_pipeline', categorical_preprocess_pipeline),
                        ('feature_seletion', ImportantFeaturesSelector(clf_2, 'basic') ),
                        ('classification', clf_2)
                      ])

pipe_clf_3 = Pipeline([ ('preprocess_pipeline', categorical_preprocess_pipeline),
                        ('feature_seletion', ImportantFeaturesSelector(clf_3, 'basic') ),
                        ('classification', clf_3)
                      ])

pipe_clf_4 = Pipeline([ ('preprocess_pipeline', categorical_preprocess_pipeline),
                        ('feature_seletion', ImportantFeaturesSelector(clf_4, 'basic') ),
                        ('classification', clf_4)
                      ])

pipe_clf_5 = Pipeline([ ('preprocess_pipeline', categorical_preprocess_pipeline),
                        ('feature_seletion', ImportantFeaturesSelector(clf_5, 'basic') ),
                        ('classification', clf_5)
                      ])

### 6.6.4  Make AveragingClassifier from the best 5 selected models (pipelines)

In [14]:
avg_clf = AveragingClassifier(base_estimators=[pipe_clf_1,
                                               pipe_clf_2,
                                               pipe_clf_3,
                                               pipe_clf_4,
                                               pipe_clf_5],
                              voting='soft')

# print(avg_clf.base_estimators[0])

### 6.6.5 Fit single and averaging models on the entire data set 

In [15]:
# to safely run multiprocessing on Windows
if __name__ == '__main__':
    
    # train model on all data
    pipe_clf_1.fit(X_train_all, y_train_all)
    avg_clf.fit(X_train_all, y_train_all)

    # give models a name
    # note: the estimator is SVC(kernel='linear'), so the prefix yields the label 'LinearSVC'
    clf_1_name = f'Linear{clf_1.__class__.__name__}'
    avg_clf_name = f'AveragingLinear{clf_1.__class__.__name__}'
    print(clf_1_name, avg_clf_name)



LinearSVC AveragingLinearSVC


### 6.6.6 Calculate prediction metrics and add the results to the lists

In [16]:
# add prediction metrics for single classifier to placeholder
prediction_metrics.add_metrics(pipe_clf_1, clf_1_name, X_test, y_test)

# add prediction metrics for averaging classifier to placeholder
prediction_metrics.add_metrics(avg_clf, avg_clf_name, X_test, y_test)

# add both classifiers to the lists (used later to build the larger averaging classifiers)
single_models.append(pipe_clf_1)
averaging_models.append(avg_clf)

## 6.7 SVC with RBF 

### 6.7.1 Select best models

Choose the best 5 of 20 tested models, using multiprocessing and <b> ParameterSampler </b> to generate random hyperparameters. Models are ranked by accuracy_score on the validation set.
Feature selection is performed inside the function, as a pipeline step for each model.

In [17]:
# define params for random grid search
params_grid={
   'C':  [100, 1000, 10000, 100000, 1000000],
   'gamma': [0.000001, 0.00001, 0.0001, 0.001, 0.01],
   'random_state': [0, 1, 2, 3, 4, 5, 6, 7, 8 ,9, 10],
   'probability' : [True],
   'max_iter': [100000]
} 

# to safely run multiprocessing on Windows
if __name__ == '__main__':
    
    # function selecting best classifiers using multiprocessing
    best_models, best_scoring = select_best_classifiers(estimator=SVC, 
                                                        params_grid=params_grid,
                                                        n_iter=20, 
                                                        random_state=23,
                                                        X_train=X_train, 
                                                        y_train=y_train, 
                                                        X_val=X_val, 
                                                        y_val=y_val, 
                                                        verbose=1,
                                                        n_best_models=5)
    # show best selected models
    show_best_models(best_models, best_scoring)

Place: 1
SVC{'random_state': 3, 'probability': True, 'max_iter': 100000, 'gamma': 1e-05, 'C': 100}
Accuracy score on validation set: 0.6727
-------------------------------------------------------------------------------------------------------------------------------
Place: 2
SVC{'random_state': 5, 'probability': True, 'max_iter': 100000, 'gamma': 0.0001, 'C': 1000}
Accuracy score on validation set: 0.6697
-------------------------------------------------------------------------------------------------------------------------------
Place: 3
SVC{'random_state': 8, 'probability': True, 'max_iter': 100000, 'gamma': 1e-05, 'C': 100}
Accuracy score on validation set: 0.6697
-------------------------------------------------------------------------------------------------------------------------------
Place: 4
SVC{'random_state': 0, 'probability': True, 'max_iter': 100000, 'gamma': 1e-06, 'C': 10000}
Accuracy score on validation set: 0.6545
----------------------------------------------------

### 6.7.2 Extract single models from list

In [18]:
# pull the estimator stored in the second step of each of the top 5 pipelines
clf_1, clf_2, clf_3, clf_4, clf_5 = (pipeline.steps[1][1] for pipeline in best_models[:5, 1])

### 6.7.3 Create complete pipelines (with scaling, encoding and feature selection) for each individual classifier

In [19]:
# all base preprocess pipeline and transformers come from module preprocessing_pipelines.py
pipe_clf_1 = Pipeline([ ('preprocess_pipeline', categorical_preprocess_pipeline),
                        ('feature_seletion', ImportantFeaturesSelector(clf_1, 'basic') ),
                        ('classification', clf_1)
                      ])

pipe_clf_2 = Pipeline([ ('preprocess_pipeline', categorical_preprocess_pipeline),
                        ('feature_seletion', ImportantFeaturesSelector(clf_2, 'basic') ),
                        ('classification', clf_2)
                      ])

pipe_clf_3 = Pipeline([ ('preprocess_pipeline', categorical_preprocess_pipeline),
                        ('feature_seletion', ImportantFeaturesSelector(clf_3, 'basic') ),
                        ('classification', clf_3)
                      ])

pipe_clf_4 = Pipeline([ ('preprocess_pipeline', categorical_preprocess_pipeline),
                        ('feature_seletion', ImportantFeaturesSelector(clf_4, 'basic') ),
                        ('classification', clf_4)
                      ])

pipe_clf_5 = Pipeline([ ('preprocess_pipeline', categorical_preprocess_pipeline),
                        ('feature_seletion', ImportantFeaturesSelector(clf_5, 'basic') ),
                        ('classification', clf_5)
                      ])

### 6.7.4  Make AveragingClassifier from the best 5 selected models (pipelines)

In [20]:
avg_clf = AveragingClassifier(base_estimators=[pipe_clf_1,
                                               pipe_clf_2,
                                               pipe_clf_3,
                                               pipe_clf_4,
                                               pipe_clf_5],
                              voting='soft')

# print(avg_clf.base_estimators[0])

### 6.7.5 Fit single and averaging models on the entire data set 

In [21]:
# to safely run multiprocessing on Windows
if __name__ == '__main__':
    
    # train model on all data
    pipe_clf_1.fit(X_train_all, y_train_all)
    avg_clf.fit(X_train_all, y_train_all)

    # give models a name
    clf_1_name = 'SvcRbf'
    avg_clf_name = 'AveragingSvcRbf'
    print(clf_1_name, avg_clf_name)

SvcRbf AveragingSvcRbf


### 6.7.6 Calculate prediction metrics and add the results to the lists

In [22]:
# add prediction metrics for single classifier to placeholder
prediction_metrics.add_metrics(pipe_clf_1, clf_1_name, X_test, y_test)

# add prediction metrics for averaging classifier to placeholder
prediction_metrics.add_metrics(avg_clf, avg_clf_name, X_test, y_test)

# add both classifiers to the lists (used later to build the larger averaging classifiers)
single_models.append(pipe_clf_1)
averaging_models.append(avg_clf)

## 6.8 KNeighborsClassifier

### 6.8.1 Select best models

Choose the best 5 of 20 tested models, using multiprocessing and <b> ParameterSampler </b> to generate random hyperparameters. Models are ranked by accuracy_score on the validation set.
Feature selection is performed inside the function, as a pipeline step for each model.

In [23]:
# define params for random grid search
params_grid = {
                  'n_neighbors' : [9, 11, 13, 15, 17, 19, 21],
                  'metric': ['manhattan', 'cosine'],
                  'leaf_size' : [15, 20, 25, 30, 35, 40, 45]
              } 

# to safely run multiprocessing on Windows
if __name__ == '__main__':
    
    # function selecting best classifiers using multiprocessing
    best_models, best_scoring = select_best_classifiers(estimator=KNeighborsClassifier, 
                                                        params_grid=params_grid,
                                                        n_iter=20, 
                                                        random_state=23,
                                                        X_train=X_train, 
                                                        y_train=y_train, 
                                                        X_val=X_val, 
                                                        y_val=y_val, 
                                                        verbose=1,
                                                        n_best_models=5)
    # show best selected models
    show_best_models(best_models, best_scoring)

Place: 1
KNeighborsClassifier{'n_neighbors': 13, 'metric': 'manhattan', 'leaf_size': 40}
Accuracy score on validation set: 0.6818
-------------------------------------------------------------------------------------------------------------------------------
Place: 2
KNeighborsClassifier{'n_neighbors': 11, 'metric': 'manhattan', 'leaf_size': 25}
Accuracy score on validation set: 0.6788
-------------------------------------------------------------------------------------------------------------------------------
Place: 3
KNeighborsClassifier{'n_neighbors': 17, 'metric': 'manhattan', 'leaf_size': 15}
Accuracy score on validation set: 0.6727
-------------------------------------------------------------------------------------------------------------------------------
Place: 4
KNeighborsClassifier{'n_neighbors': 19, 'metric': 'cosine', 'leaf_size': 20}
Accuracy score on validation set: 0.6667
---------------------------------------------------------------------------------------------------

### 6.8.2 Extract single models from list

In [24]:
# pull the estimator stored in the second step of each of the top 5 pipelines
clf_1, clf_2, clf_3, clf_4, clf_5 = (pipeline.steps[1][1] for pipeline in best_models[:5, 1])

### 6.8.3 Create complete pipelines (with scaling, encoding and feature selection) for each individual classifier

In [26]:
# all base preprocess pipeline and transformers come from module preprocessing_pipelines.py
pipe_clf_1 = Pipeline([ ('preprocess_pipeline', categorical_preprocess_pipeline),
                        ('feature_seletion', ImportantFeaturesSelector(clf_1, 'basic') ),
                        ('classification', clf_1)
                      ])

pipe_clf_2 = Pipeline([ ('preprocess_pipeline', categorical_preprocess_pipeline),
                        ('feature_seletion', ImportantFeaturesSelector(clf_2, 'basic') ),
                        ('classification', clf_2)
                      ])

pipe_clf_3 = Pipeline([ ('preprocess_pipeline', categorical_preprocess_pipeline),
                        ('feature_seletion', ImportantFeaturesSelector(clf_3, 'basic') ),
                        ('classification', clf_3)
                      ])

pipe_clf_4 = Pipeline([ ('preprocess_pipeline', categorical_preprocess_pipeline),
                        ('feature_seletion', ImportantFeaturesSelector(clf_4, 'basic') ),
                        ('classification', clf_4)
                      ])

pipe_clf_5 = Pipeline([ ('preprocess_pipeline', categorical_preprocess_pipeline),
                        ('feature_seletion', ImportantFeaturesSelector(clf_5, 'basic') ),
                        ('classification', clf_5)
                      ])

### 6.8.4  Make AveragingClassifier from the best 5 selected models (pipelines)

In [27]:
avg_clf = AveragingClassifier(base_estimators=[pipe_clf_1,
                                               pipe_clf_2,
                                               pipe_clf_3,
                                               pipe_clf_4,
                                               pipe_clf_5],
                              voting='soft')

# print(avg_clf.base_estimators[0])

### 6.8.5 Fit single and averaging models on the entire data set 

In [28]:
# to safely run multiprocessing on Windows
if __name__ == '__main__':
    
    # train model on all data
    pipe_clf_1.fit(X_train_all, y_train_all)
    avg_clf.fit(X_train_all, y_train_all)

    # give models a name
    clf_1_name = f'{clf_1.__class__.__name__}'
    avg_clf_name = f'Averaging{clf_1.__class__.__name__}'
    print(clf_1_name, avg_clf_name)

KNeighborsClassifier AveragingKNeighborsClassifier


### 6.8.6 Calculate prediction metrics and add the results to the lists

In [29]:
# add prediction metrics for single classifier to placeholder
prediction_metrics.add_metrics(pipe_clf_1, clf_1_name, X_test, y_test)

# add prediction metrics for averaging classifier to placeholder
prediction_metrics.add_metrics(avg_clf, avg_clf_name, X_test, y_test)

# add both classifiers to the lists (used later to build the larger averaging classifiers)
single_models.append(pipe_clf_1)
averaging_models.append(avg_clf)

## 6.9 Merge single and averaging models into larger averaging models

### 6.9.1 Create new larger averaging models

In [30]:
# create models (all base models are already fitted)

# use the single classifiers as base models
average_lin_clf = AveragingClassifier(base_estimators=single_models, voting='soft')

# use the averaging classifiers as base models
large_average_lin_clf = LargeAveragingClassifier(base_estimators=averaging_models, voting='soft')

# give models a name
average_lin_clf_name = 'LinearModelsAveragingClassifier'
large_average_lin_clf_name = 'LargeLinearModelsAveragingClassifier'
print(average_lin_clf_name, large_average_lin_clf_name)

LinearModelsAveragingClassifier LargeLinearModelsAveragingClassifier
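
As a small numerical aside: if the large ensemble simply averages the probabilities of its four base ensembles with equal weights (an assumption, since the source of `LargeAveragingClassifier` is not shown here), then with five models per ensemble the result equals a plain average over all 20 models. The random probabilities below are stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Positive-class probabilities: 4 model types x 5 models x 6 samples (random stand-ins).
probas = rng.uniform(size=(4, 5, 6))

per_type_mean = probas.mean(axis=1)             # one averaged output per model type
nested_mean = per_type_mean.mean(axis=0)        # average of the four ensembles
flat_mean = probas.reshape(20, 6).mean(axis=0)  # plain average of all 20 models

print(np.allclose(nested_mean, flat_mean))  # True
```

The equality holds only because every group has the same size; with unequal group sizes the nested average would weight models differently.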


### 6.9.2 Calculate prediction metrics and add the results to the lists

In [31]:
# add prediction metrics for averaging classifier to placeholder
prediction_metrics.add_metrics(average_lin_clf, average_lin_clf_name, X_test, y_test)

# add prediction metrics for large averaging classifier to placeholder
prediction_metrics.add_metrics(large_average_lin_clf, large_average_lin_clf_name, X_test, y_test)

### 6.9.3 Save models for future use

In [32]:
# save the averaging classifier built from single models using the pickle library
with open(f'./models/{average_lin_clf_name}.pickle', 'wb') as f:
    # pickle the model using the highest protocol available
    pickle.dump(average_lin_clf, f, pickle.HIGHEST_PROTOCOL)

# save the large averaging classifier using the pickle library
with open(f'./models/{large_average_lin_clf_name}.pickle', 'wb') as f:
    # pickle the model using the highest protocol available
    pickle.dump(large_average_lin_clf, f, pickle.HIGHEST_PROTOCOL)
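
A saved model can be read back with `pickle.load`. The sketch below shows the round trip with a stand-in dictionary and a temporary directory instead of a fitted classifier and the `./models/` folder, so it can run on its own.

```python
import pickle
import tempfile
from pathlib import Path

# Stand-in object; in the notebook this would be a fitted classifier.
model = {"name": "LinearModelsAveragingClassifier", "weights": [0.25, 0.25, 0.25, 0.25]}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / f"{model['name']}.pickle"

    # Write with the highest protocol, as in the cell above.
    with open(path, "wb") as f:
        pickle.dump(model, f, pickle.HIGHEST_PROTOCOL)

    # Read the object back from disk.
    with open(path, "rb") as f:
        restored = pickle.load(f)

print(restored == model)  # True
```

When loading real models, the custom classes they use (here the `classifiers` and `preprocessing_pipelines` modules) must be importable, and pickle files should only be loaded from trusted sources.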

## 6.10 Show all results in one table and save them for future use

In [33]:
# get prediction metric result lists from placeholder
precision_score, recall_score, f1_score, roc_auc_score, accuracy_score = prediction_metrics.get_metrics()

# get model names list from placeholder
models_name = prediction_metrics.get_names()

# create dictionary of results 
results_dict = {'precision_score': precision_score, 
               'recall_score': recall_score, 
               'f1_score': f1_score,
               'roc_auc_score' : roc_auc_score,
               'accuracy_score' : accuracy_score}

results_df = pd.DataFrame(data=results_dict)
results_df.insert(loc=0, column='Model', value=models_name)
results_df

|   | Model | precision_score | recall_score | f1_score | roc_auc_score | accuracy_score |
|---|---|---|---|---|---|---|
| 0 | LogisticRegression | 0.613861 | 0.738095 | 0.67027 | 0.741914 | 0.678947 |
| 1 | AveragingLogisticRegression | 0.606383 | 0.678571 | 0.640449 | 0.73978 | 0.663158 |
| 2 | LinearSVC | 0.633663 | 0.761905 | 0.691892 | 0.743261 | 0.7 |
| 3 | AveragingLinearSVC | 0.616162 | 0.72619 | 0.666667 | 0.735737 | 0.678947 |
| 4 | SvcRbf | 0.604396 | 0.654762 | 0.628571 | 0.718553 | 0.657895 |
| 5 | AveragingSvcRbf | 0.64 | 0.761905 | 0.695652 | 0.737084 | 0.705263 |
| 6 | KNeighborsClassifier | 0.6 | 0.642857 | 0.62069 | 0.690757 | 0.652632 |
| 7 | AveragingKNeighborsClassifier | 0.615385 | 0.666667 | 0.64 | 0.703729 | 0.668421 |
| 8 | LinearModelsAveragingClassifier | 0.615385 | 0.666667 | 0.64 | 0.736074 | 0.668421 |
| 9 | LargeLinearModelsAveragingClassifier | 0.612245 | 0.714286 | 0.659341 | 0.734951 | 0.673684 |
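
To make the relationship between the reported columns concrete, the sketch below computes precision, recall, F1 and accuracy from confusion-matrix counts. It assumes a binary target, which the source does not state, and the counts are hypothetical: they were chosen only because their ratios reproduce the LogisticRegression row, and they are not taken from the project. The helper name `binary_metrics` is also introduced here.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return precision, recall, F1 and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1, accuracy


# Hypothetical counts whose ratios match the LogisticRegression row.
precision, recall, f1, accuracy = binary_metrics(tp=62, fp=39, fn=22, tn=67)
print(round(precision, 6), round(recall, 6), round(f1, 6), round(accuracy, 6))
# 0.613861 0.738095 0.67027 0.678947
```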


In [34]:
results_df.to_csv("./results/linear_models_results.csv")