# Example Notebook for classifier finder

## 0. set the log level of the sam_ml library [can be ignored]

Useful e.g. for debugging.

NOTE: the environment variable has to be set before importing the sam_ml library.

In [1]:
import os
os.environ["SAM_ML_LOG_LEVEL"] = "info"

## 1. libraries

In [2]:
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

from sam_ml.models.classifier import LR
from sam_ml.models.automl import CTest

## 2. data

In [3]:
iris = load_iris()  # sklearn Bunch with .data, .target and .feature_names
y = pd.Series(iris.target)
X = pd.DataFrame(iris.data, columns=iris.feature_names)
x_train, x_test, y_train, y_test = train_test_split(X, y, train_size=0.80, random_state=42)

## 3. model

### 3.0. create tester class object

CTest is an auto-ml class. You can use it to compare different models and find the best one for your data.

**models**: a list of *Classifier* subclass objects, *'all'* (all classifiers with an integrated wrapper class), or *'basic'* (a smaller selection of basic classifiers)

**vectorizer**, **scaler**, **selector**, **sampler**: the CTest constructor wraps each given model in a *Pipeline* object with these preprocessing parameters
(see the *iris_pipeline.ipynb* notebook for the possible values)

In [4]:
tester = CTest("all", scaler="minmax")
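For intuition, `scaler="minmax"` presumably rescales every feature to the range [0, 1] using the minimum and maximum seen during fitting, like scikit-learn's `MinMaxScaler`. The plain-Python sketch below illustrates that assumption; it is not sam_ml's implementation, and `minmax_fit` and `minmax_transform` are hypothetical helper names.

```python
def minmax_fit(rows):
    """Learn the per-feature (min, max) from the training rows."""
    columns = list(zip(*rows))
    return [(min(col), max(col)) for col in columns]


def minmax_transform(rows, params):
    """Map each feature to (x - min) / (max - min); constant features map to 0.0."""
    scaled = []
    for row in rows:
        scaled.append([
            (x - lo) / (hi - lo) if hi > lo else 0.0
            for x, (lo, hi) in zip(row, params)
        ])
    return scaled


train = [[5.1, 3.5], [4.9, 3.0], [6.3, 3.3]]
params = minmax_fit(train)
print(minmax_transform(train, params))
# Test data is scaled with the *training* min/max, so values may fall outside [0, 1]
print(minmax_transform([[7.0, 2.8]], params))
```

Fitting the min/max on the training data only, and reusing it for the test data, avoids leaking information from the test set into preprocessing.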

Get all models (as *Pipeline* objects) stored in the CTest object:

In [5]:
tester.models

{'LogisticRegression (vec=None, scaler=minmax, selector=None, sampler=None)': Pipeline(vectorizer=None, scaler=Scaler(algorithm='minmax', ), selector=None, sampler=None, model=LR(model_name='LogisticRegression'), model_name='LogisticRegression (vec=None, scaler=minmax, selector=None, sampler=None)'),
 'QuadraticDiscriminantAnalysis (vec=None, scaler=minmax, selector=None, sampler=None)': Pipeline(vectorizer=None, scaler=Scaler(algorithm='minmax', ), selector=None, sampler=None, model=QDA(model_name='QuadraticDiscriminantAnalysis'), model_name='QuadraticDiscriminantAnalysis (vec=None, scaler=minmax, selector=None, sampler=None)'),
 'LinearDiscriminantAnalysis (vec=None, scaler=minmax, selector=None, sampler=None)': Pipeline(vectorizer=None, scaler=Scaler(algorithm='minmax', ), selector=None, sampler=None, model=LDA(model_name='LinearDiscriminantAnalysis'), model_name='LinearDiscriminantAnalysis (vec=None, scaler=minmax, selector=None, sampler=None)'),
 'MLP Classifier (vec=None, scaler=

You can add further models, e.g. a logistic regression with elastic-net penalty:

In [6]:
tester.add_model(LR(model_name="LogisticRegression (elasticnet penalty)", penalty="elasticnet", solver="saga", l1_ratio=0.5))
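Assuming `LR` wraps scikit-learn's `LogisticRegression` (the `penalty`, `solver` and `l1_ratio` parameters match its API), `penalty="elasticnet"` mixes L1 and L2 regularization, with `l1_ratio` playing the role of $\rho$. In scikit-learn's formulation the regularization term is:

$$
r(w) = \frac{1 - \rho}{2}\,\lVert w \rVert_2^2 + \rho\,\lVert w \rVert_1,
\qquad \rho = 0.5 \;\Rightarrow\; r(w) = 0.25\,\lVert w \rVert_2^2 + 0.5\,\lVert w \rVert_1
$$

In scikit-learn, `solver="saga"` is the only solver that supports the elastic-net penalty, which is why it is set here.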

In [7]:
tester.models

{'LogisticRegression (vec=None, scaler=minmax, selector=None, sampler=None)': Pipeline(vectorizer=None, scaler=Scaler(algorithm='minmax', ), selector=None, sampler=None, model=LR(model_name='LogisticRegression'), model_name='LogisticRegression (vec=None, scaler=minmax, selector=None, sampler=None)'),
 'QuadraticDiscriminantAnalysis (vec=None, scaler=minmax, selector=None, sampler=None)': Pipeline(vectorizer=None, scaler=Scaler(algorithm='minmax', ), selector=None, sampler=None, model=QDA(model_name='QuadraticDiscriminantAnalysis'), model_name='QuadraticDiscriminantAnalysis (vec=None, scaler=minmax, selector=None, sampler=None)'),
 'LinearDiscriminantAnalysis (vec=None, scaler=minmax, selector=None, sampler=None)': Pipeline(vectorizer=None, scaler=Scaler(algorithm='minmax', ), selector=None, sampler=None, model=LDA(model_name='LinearDiscriminantAnalysis'), model_name='LinearDiscriminantAnalysis (vec=None, scaler=minmax, selector=None, sampler=None)'),
 'MLP Classifier (vec=None, scaler=

You can remove models by their key in `tester.models` (the model name followed by the pipeline settings):

In [8]:
tester.remove_model("LogisticRegression (elasticnet penalty) (vec=None, scaler=minmax, selector=None, sampler=None)")
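The key passed to `remove_model` is the model's `model_name` followed by the pipeline settings in parentheses, as the `tester.models` output shows. The hypothetical helper below rebuilds such a key from that pattern; the pattern is inferred from the printed keys, not taken from sam_ml, so looking the key up in `tester.models.keys()` is the safer route.

```python
def pipeline_key(model_name, vec=None, scaler=None, selector=None, sampler=None):
    """Rebuild a tester.models key following the naming pattern printed above (hypothetical helper)."""
    return f"{model_name} (vec={vec}, scaler={scaler}, selector={selector}, sampler={sampler})"


key = pipeline_key("LogisticRegression (elasticnet penalty)", scaler="minmax")
print(key)
```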

In [9]:
tester.models

{'LogisticRegression (vec=None, scaler=minmax, selector=None, sampler=None)': Pipeline(vectorizer=None, scaler=Scaler(algorithm='minmax', ), selector=None, sampler=None, model=LR(model_name='LogisticRegression'), model_name='LogisticRegression (vec=None, scaler=minmax, selector=None, sampler=None)'),
 'QuadraticDiscriminantAnalysis (vec=None, scaler=minmax, selector=None, sampler=None)': Pipeline(vectorizer=None, scaler=Scaler(algorithm='minmax', ), selector=None, sampler=None, model=QDA(model_name='QuadraticDiscriminantAnalysis'), model_name='QuadraticDiscriminantAnalysis (vec=None, scaler=minmax, selector=None, sampler=None)'),
 'LinearDiscriminantAnalysis (vec=None, scaler=minmax, selector=None, sampler=None)': Pipeline(vectorizer=None, scaler=Scaler(algorithm='minmax', ), selector=None, sampler=None, model=LDA(model_name='LinearDiscriminantAnalysis'), model_name='LinearDiscriminantAnalysis (vec=None, scaler=minmax, selector=None, sampler=None)'),
 'MLP Classifier (vec=None, scaler=

### 3.1. evaluation of the models

CTest implements three ways to evaluate the models. Depending on the dataset, choose the one that fits best.

### 3.1.1. leave-one-out cross validation

**Concept:**

The model is trained on all data points except one and then tested on the held-out point. This is repeated for every data point, so that in the end there is a prediction for each data point from a model that did not see it during training.

**Advantage:** optimal use of the available data for training

**Disadvantage:** long training time (one model fit per data point)

This concept is very useful for small datasets (< 150 data points): the training time is still acceptable, and with so little data it is especially important to use all available information for training.
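To make the concept concrete, here is a plain-Python sketch of leave-one-out cross validation. The nearest-centroid classifier and the toy data are illustrative stand-ins chosen for brevity, not what CTest uses internally. Note that the model is fitted once per data point, which is where the long training time comes from.

```python
from math import dist


def fit_centroids(X, y):
    """Nearest-centroid 'training': the mean feature vector of each class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, row):
    """Predict the class whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda label: dist(centroids[label], row))


def leave_one_out(X, y):
    """Train on all points but one, predict the held-out point; repeat for every point."""
    predictions = []
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        model = fit_centroids(X_train, y_train)  # one fit per data point
        predictions.append(predict(model, X[i]))
    return predictions


X = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
y = [0, 0, 0, 1, 1, 1]
preds = leave_one_out(X, y)
accuracy = sum(p == t for p, t in zip(preds, y)) / len(y)
print(preds, accuracy)
```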

In [10]:
tester.eval_models_cv(X, y, avg="macro", small_data_eval=True)
tester.output_scores_as_pd(sort_by="recall", console_out=False)

| model | accuracy | precision | recall | s_score | l_score | train_score | train_time |
|---|---|---|---|---|---|---|---|
| LinearDiscriminantAnalysis (vec=None, scaler=minmax, selector=None, sampler=None) | 0.98 | 0.980125 | 0.98 | 0.9904373 | 1.0 | 0.98 | 0:00:00 |
| QuadraticDiscriminantAnalysis (vec=None, scaler=minmax, selector=None, sampler=None) | 0.973333 | 0.973825 | 0.973333 | 0.9894085 | 1.0 | 0.980045 | 0:00:00 |
| AdaBoostClassifier (RFC based) (vec=None, scaler=minmax, selector=None, sampler=None) | 0.953333 | 0.953448 | 0.953333 | 0.9861117 | 1.0 | 1.0 | 0:00:00 |
| KNeighborsClassifier (vec=None, scaler=minmax, selector=None, sampler=None) | 0.953333 | 0.953448 | 0.953333 | 0.9861117 | 1.0 | 0.960045 | 0:00:00 |
| BaggingClassifier (RFC based) (vec=None, scaler=minmax, selector=None, sampler=None) | 0.953333 | 0.953448 | 0.953333 | 0.9861117 | 1.0 | 0.988367 | 0:00:00 |
| GaussianNB (vec=None, scaler=minmax, selector=None, sampler=None) | 0.953333 | 0.953448 | 0.953333 | 0.9861117 | 1.0 | 0.959418 | 0:00:00 |
| ExtraTreesClassifier (vec=None, scaler=minmax, selector=None, sampler=None) | 0.953333 | 0.953448 | 0.953333 | 0.9861117 | 1.0 | 1.0 | 0:00:00 |
| RandomForestClassifier (vec=None, scaler=minmax, selector=None, sampler=None) | 0.953333 | 0.953448 | 0.953333 | 0.9861117 | 1.0 | 1.0 | 0:00:00 |
| SupportVectorClassifier (vec=None, scaler=minmax, selector=None, sampler=None) | 0.953333 | 0.953448 | 0.953333 | 0.9861117 | 1.0 | 0.979732 | 0:00:00 |
| GradientBoostingMachine (vec=None, scaler=minmax, selector=None, sampler=None) | 0.953333 | 0.953448 | 0.953333 | 0.9861117 | 1.0 | 1.0 | 0:00:00 |


### 3.1.2. multiple split cross validation

The data is split **cv_num** times; the model is evaluated on each split and the scores are averaged.
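A plain-Python sketch of the idea behind multiple split cross validation: the indices are cut into `cv_num` folds, each fold serves once as the test set, and the per-split scores are averaged. Whether sam_ml shuffles or stratifies its splits is not shown in this notebook, so this sketch assumes simple contiguous folds; `kfold_indices` and `cross_val_mean` are hypothetical names.

```python
def kfold_indices(n_samples, cv_num):
    """Yield (train_idx, test_idx) for cv_num contiguous folds covering every sample once."""
    # The first n_samples % cv_num folds get one extra sample
    fold_sizes = [n_samples // cv_num + (1 if i < n_samples % cv_num else 0) for i in range(cv_num)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def cross_val_mean(score_fn, n_samples, cv_num):
    """Average the score returned by score_fn(train_idx, test_idx) over all splits."""
    scores = [score_fn(train, test) for train, test in kfold_indices(n_samples, cv_num)]
    return sum(scores) / len(scores)


splits = list(kfold_indices(10, 3))
print([test for _, test in splits])
```

On data sorted by class (the iris samples come ordered by species), contiguous folds like these would put mostly one class into each test fold, which is why practical implementations usually shuffle or stratify before splitting.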

In [11]:
tester.eval_models_cv(X, y, avg="macro", small_data_eval=False, cv_num=10)
tester.output_scores_as_pd(sort_by="recall", console_out=False)

| model | accuracy | precision | recall | s_score | l_score | train_time | train_score |
|---|---|---|---|---|---|---|---|
| LinearDiscriminantAnalysis (vec=None, scaler=minmax, selector=None, sampler=None) | 0.966667 | 0.85 | 0.833333 | 0.698512 | 0.7 | 0:00:00 | 0.979259 |
| QuadraticDiscriminantAnalysis (vec=None, scaler=minmax, selector=None, sampler=None) | 0.966667 | 0.85 | 0.833333 | 0.698512 | 0.7 | 0:00:00 | 0.982222 |
| SupportVectorClassifier (vec=None, scaler=minmax, selector=None, sampler=None) | 0.953333 | 0.8 | 0.776667 | 0.599006 | 0.6 | 0:00:00 | 0.979259 |
| RandomForestClassifier (vec=None, scaler=minmax, selector=None, sampler=None) | 0.946667 | 0.8 | 0.773333 | 0.599004 | 0.6 | 0:00:00 | 1.0 |
| ExtraTreesClassifier (vec=None, scaler=minmax, selector=None, sampler=None) | 0.946667 | 0.8 | 0.773333 | 0.599005 | 0.6 | 0:00:00 | 1.0 |
| KNeighborsClassifier (vec=None, scaler=minmax, selector=None, sampler=None) | 0.946667 | 0.8 | 0.773333 | 0.599005 | 0.6 | 0:00:00 | 0.964444 |
| GradientBoostingMachine (vec=None, scaler=minmax, selector=None, sampler=None) | 0.926667 | 0.8 | 0.763333 | 0.598997 | 0.6 | 0:00:00 | 1.0 |
| DecisionTreeClassifier (vec=None, scaler=minmax, selector=None, sampler=None) | 0.953333 | 0.766667 | 0.743333 | 0.499746 | 0.5 | 0:00:00 | 1.0 |
| GaussianNB (vec=None, scaler=minmax, selector=None, sampler=None) | 0.946667 | 0.766667 | 0.74 | 0.499745 | 0.5 | 0:00:00 | 0.961481 |
| AdaBoostClassifier (RFC based) (vec=None, scaler=minmax, selector=None, sampler=None) | 0.94 | 0.75 | 0.72 | 0.499498 | 0.5 | 0:00:00 | 1.0 |


### 3.1.3. evaluate on given train-test-split

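Sometimes a dataset can only sensibly be split in one way, e.g. time series data, where the test set has to lie after the training set in time. In that case cross validation is not useful, and the models are evaluated on the given train/test split instead.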
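One common case of a fixed split is time-ordered data: the test records must come after all training records, so the model is never evaluated on the past with knowledge of the future. A minimal plain-Python sketch of such a chronological split (`chronological_split` is a hypothetical helper; the iris example in this notebook uses a random split instead):

```python
def chronological_split(records, train_fraction=0.8):
    """Split time-stamped records so every test record is later than every train record."""
    ordered = sorted(records, key=lambda rec: rec[0])  # rec[0] is the timestamp
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


records = [(3, "c"), (1, "a"), (5, "e"), (2, "b"), (4, "d")]
train, test = chronological_split(records)
print(train, test)
```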

In [12]:
tester.eval_models(x_train, y_train, x_test, y_test, avg="macro")
tester.output_scores_as_pd(sort_by="recall", console_out=False)

| model | accuracy | precision | recall | s_score | l_score | train_time | train_score |
|---|---|---|---|---|---|---|---|
| AdaBoostClassifier (RFC based) (vec=None, scaler=minmax, selector=None, sampler=None) | 1.0 | 1.0 | 1.0 | 0.9926 | 1.0 | 0:00:00 | 1.0 |
| GradientBoostingMachine (vec=None, scaler=minmax, selector=None, sampler=None) | 1.0 | 1.0 | 1.0 | 0.9926 | 1.0 | 0:00:00 | 1.0 |
| BaggingClassifier (RFC based) (vec=None, scaler=minmax, selector=None, sampler=None) | 1.0 | 1.0 | 1.0 | 0.9926 | 1.0 | 0:00:00 | 0.966667 |
| BaggingClassifier (DTC based) (vec=None, scaler=minmax, selector=None, sampler=None) | 1.0 | 1.0 | 1.0 | 0.9926 | 1.0 | 0:00:00 | 0.95 |
| GaussianNB (vec=None, scaler=minmax, selector=None, sampler=None) | 1.0 | 1.0 | 1.0 | 0.9926 | 1.0 | 0:00:00 | 0.95 |
| ExtraTreesClassifier (vec=None, scaler=minmax, selector=None, sampler=None) | 1.0 | 1.0 | 1.0 | 0.9926 | 1.0 | 0:00:00 | 1.0 |
| KNeighborsClassifier (vec=None, scaler=minmax, selector=None, sampler=None) | 1.0 | 1.0 | 1.0 | 0.9926 | 1.0 | 0:00:00 | 0.958333 |
| AdaBoostClassifier (DTC based) (vec=None, scaler=minmax, selector=None, sampler=None) | 1.0 | 1.0 | 1.0 | 0.9926 | 1.0 | 0:00:00 | 0.966667 |
| XGBClassifier (vec=None, scaler=minmax, selector=None, sampler=None) | 1.0 | 1.0 | 1.0 | 0.9926 | 1.0 | 0:00:00 | 1.0 |
| SupportVectorClassifier (vec=None, scaler=minmax, selector=None, sampler=None) | 1.0 | 1.0 | 1.0 | 0.9926 | 1.0 | 0:00:00 | 0.975 |


### 3.2. find best model

**Idea:**

The `find_best_model_randomCV` method runs a randomized cross-validated hyperparameter search (randomCVsearch) for every model type to find its best hyperparameters, and afterwards compares the tuned model types on the given test data. To save time, the randomCVsearch should only try a few hyperparameter sets per model type.
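The search loop can be sketched in plain Python: draw `n_trails` random hyperparameter sets from a search space, score each one, and keep the best. The search space and the toy score function below are made up for illustration; inside `find_best_model_randomCV` the score would instead come from cross validation of the real pipeline, and sam_ml's actual search spaces are not shown here.

```python
import random
from math import log10


def random_search(search_space, score_fn, n_trails, seed=42):
    """Evaluate n_trails randomly sampled configurations and return the best (score, config)."""
    rng = random.Random(seed)
    best_score, best_config = float("-inf"), None
    for _ in range(n_trails):
        config = {name: sampler(rng) for name, sampler in search_space.items()}
        score = score_fn(config)
        if score > best_score:
            best_score, best_config = score, config
    return best_score, best_config


# Hypothetical search space: C sampled log-uniformly in [0.01, 1000], penalty from a list
search_space = {
    "C": lambda rng: 10 ** rng.uniform(-2, 3),
    "penalty": lambda rng: rng.choice(["l1", "l2"]),
}


def toy_score(config):
    """Stand-in for a cross-validated recall: prefers C close to 1 and the l2 penalty."""
    return -abs(log10(config["C"])) + (0.5 if config["penalty"] == "l2" else 0.0)


best_score, best_config = random_search(search_space, toy_score, n_trails=5)
print(best_score, best_config)
```

Sampling `C` on a log scale spreads the trials evenly over orders of magnitude, which matters when only a handful of trials per model type are affordable.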

**Useful parameters:**

- you can change the cross validation method with **small_data_eval**; if *small_data_eval=False*, you can set the number of splits with **cv_num**

- with the **scoring** parameter you can choose which metric is used to find the best model (use **avg**, **secondary_scoring**, **strength**, and **pos_label** to specify it further)

- you can change the number of hyperparameter sets tested for each model type with **n_trails**

- with **leave_loadbar** you can choose whether the progress bar of each model type's randomCVsearch stays visible after it has finished
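Since the cells in this notebook use `scoring="recall"` with `avg="macro"`, a short sketch of macro-averaged recall may help: recall is computed per class and then averaged without weighting by class size. This assumes `avg` follows scikit-learn's `average` semantics; `macro_recall` is a hypothetical helper that only considers classes present in `y_true`.

```python
def macro_recall(y_true, y_pred):
    """Unweighted mean over the classes in y_true of TP / (TP + FN)."""
    classes = sorted(set(y_true))
    recalls = []
    for cls in classes:
        true_positives = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        actual = sum(1 for t in y_true if t == cls)  # TP + FN for this class
        recalls.append(true_positives / actual)
    return sum(recalls) / len(recalls)


y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 0, 1, 0, 2, 2]
print(macro_recall(y_true, y_pred))
```

Here accuracy is 7/8 = 0.875, while macro recall is (1.0 + 0.5 + 1.0) / 3, about 0.833: the single error falls in the small class 1, and its recall of 0.5 counts as much as the recalls of the larger classes.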

In [13]:
scores = tester.find_best_model_randomCV(x_train, y_train, x_test, y_test, scoring="recall", avg="macro", small_data_eval=False)

randomCVsearch (LogisticRegression (vec=None, scaler=minmax, selector=None, sampler=None)): 100%|██████████| 5/5 [00:00<00:00, 21.16it/s]

2024-01-15 12:23:06,172 - sam_ml.models.main_auto_ml - INFO - LogisticRegression (vec=None, scaler=minmax, selector=None, sampler=None) - score: 0.9506172839506172 (recall) - parameters: {'C': 63.512210106407046, 'penalty': 'l2', 'solver': 'saga'}



randomCVsearch (QuadraticDiscriminantAnalysis (vec=None, scaler=minmax, selector=None, sampler=None)): 100%|██████████| 5/5 [00:00<00:00, 21.35it/s]

2024-01-15 12:23:06,435 - sam_ml.models.main_auto_ml - INFO - QuadraticDiscriminantAnalysis (vec=None, scaler=minmax, selector=None, sampler=None) - score: 0.9598765432098766 (recall) - parameters: {'reg_param': 0.0}



randomCVsearch (LinearDiscriminantAnalysis (vec=None, scaler=minmax, selector=None, sampler=None)): 100%|██████████| 4/4 [00:00<00:00, 22.59it/s]

2024-01-15 12:23:06,638 - sam_ml.models.main_auto_ml - INFO - LinearDiscriminantAnalysis (vec=None, scaler=minmax, selector=None, sampler=None) - score: 0.9814814814814815 (recall) - parameters: {'solver': 'svd'}



randomCVsearch (MLP Classifier (vec=None, scaler=minmax, selector=None, sampler=None)): 100%|██████████| 5/5 [00:00<00:00,  7.20it/s]

2024-01-15 12:23:07,358 - sam_ml.models.main_auto_ml - INFO - MLP Classifier (vec=None, scaler=minmax, selector=None, sampler=None) - score: 0.9363298738298736 (recall) - parameters: {'activation': 'relu', 'alpha': 0.0001, 'hidden_layer_sizes': (100,), 'learning_rate': 'constant', 'solver': 'adam'}



randomCVsearch (LinearSupportVectorClassifier (vec=None, scaler=minmax, selector=None, sampler=None)): 100%|██████████| 5/5 [00:00<00:00, 22.94it/s]

2024-01-15 12:23:07,697 - sam_ml.models.main_auto_ml - INFO - LinearSupportVectorClassifier (vec=None, scaler=minmax, selector=None, sampler=None) - score: 0.9907407407407408 (recall) - parameters: {'C': 635.1221010640695, 'dual': True, 'penalty': 'l2'}



randomCVsearch (DecisionTreeClassifier (vec=None, scaler=minmax, selector=None, sampler=None)): 100%|██████████| 5/5 [00:00<00:00, 19.49it/s]

2024-01-15 12:23:07,985 - sam_ml.models.main_auto_ml - INFO - DecisionTreeClassifier (vec=None, scaler=minmax, selector=None, sampler=None) - score: 0.939373897707231 (recall) - parameters: {'criterion': 'gini', 'max_depth': 2, 'max_features': 'sqrt', 'max_leaf_nodes': 77, 'min_samples_leaf': 2, 'min_samples_split': 7, 'min_weight_fraction_leaf': 0.22803499210851796}



randomCVsearch (RandomForestClassifier (vec=None, scaler=minmax, selector=None, sampler=None)): 100%|██████████| 5/5 [00:02<00:00,  2.40it/s]

2024-01-15 12:23:10,095 - sam_ml.models.main_auto_ml - INFO - RandomForestClassifier (vec=None, scaler=minmax, selector=None, sampler=None) - score: 0.9463183421516755 (recall) - parameters: {'bootstrap': False, 'criterion': 'gini', 'max_depth': 3, 'min_samples_leaf': 1, 'min_samples_split': 5, 'min_weight_fraction_leaf': 0.14607232426760908, 'n_estimators': 24}



randomCVsearch (SupportVectorClassifier (vec=None, scaler=minmax, selector=None, sampler=None)): 100%|██████████| 5/5 [00:00<00:00, 16.33it/s]

2024-01-15 12:23:10,520 - sam_ml.models.main_auto_ml - INFO - SupportVectorClassifier (vec=None, scaler=minmax, selector=None, sampler=None) - score: 0.9560185185185185 (recall) - parameters: {'C': 24.81040974867808, 'gamma': 0.29154431891537513, 'kernel': 'sigmoid', 'probability': True}



randomCVsearch (GradientBoostingMachine (vec=None, scaler=minmax, selector=None, sampler=None)): 100%|██████████| 5/5 [00:03<00:00,  1.38it/s]

2024-01-15 12:23:14,183 - sam_ml.models.main_auto_ml - INFO - GradientBoostingMachine (vec=None, scaler=minmax, selector=None, sampler=None) - score: 0.9529320987654321 (recall) - parameters: {'criterion': 'friedman_mse', 'learning_rate': 0.009470976192691145, 'max_depth': 6, 'max_features': 'log2', 'min_samples_leaf': 3, 'min_samples_split': 20, 'n_estimators': 141, 'subsample': 0.8777243706586128}



randomCVsearch (AdaBoostClassifier (DTC based) (vec=None, scaler=minmax, selector=None, sampler=None)): 100%|██████████| 5/5 [00:01<00:00,  2.72it/s]

2024-01-15 12:23:16,450 - sam_ml.models.main_auto_ml - INFO - AdaBoostClassifier (DTC based) (vec=None, scaler=minmax, selector=None, sampler=None) - score: 0.9529320987654321 (recall) - parameters: {'algorithm': 'SAMME', 'estimator__max_depth': 2, 'learning_rate': 1.1666347719377983, 'n_estimators': 871}



randomCVsearch (AdaBoostClassifier (RFC based) (vec=None, scaler=minmax, selector=None, sampler=None)): 100%|██████████| 5/5 [00:14<00:00,  2.81s/it]

2024-01-15 12:23:31,649 - sam_ml.models.main_auto_ml - INFO - AdaBoostClassifier (RFC based) (vec=None, scaler=minmax, selector=None, sampler=None) - score: 0.9529320987654321 (recall) - parameters: {'algorithm': 'SAMME', 'estimator__max_depth': 10, 'estimator__n_estimators': 92, 'learning_rate': 1.5634197689032772, 'n_estimators': 25}



randomCVsearch (AdaBoostClassifier (LR based) (vec=None, scaler=minmax, selector=None, sampler=None)): 100%|██████████| 5/5 [00:02<00:00,  1.88it/s]

2024-01-15 12:23:34,463 - sam_ml.models.main_auto_ml - INFO - AdaBoostClassifier (LR based) (vec=None, scaler=minmax, selector=None, sampler=None) - score: 0.8628747795414462 (recall) - parameters: {'algorithm': 'SAMME.R', 'learning_rate': 1.365320783953279, 'n_estimators': 321}



randomCVsearch (KNeighborsClassifier (vec=None, scaler=minmax, selector=None, sampler=None)): 100%|██████████| 5/5 [00:00<00:00, 20.00it/s]

2024-01-15 12:23:35,715 - sam_ml.models.main_auto_ml - INFO - KNeighborsClassifier (vec=None, scaler=minmax, selector=None, sampler=None) - score: 0.9542548500881834 (recall) - parameters: {'leaf_size': 19, 'n_neighbors': 5, 'p': 4, 'weights': 'distance'}



randomCVsearch (ExtraTreesClassifier (vec=None, scaler=minmax, selector=None, sampler=None)): 100%|██████████| 5/5 [00:01<00:00,  3.25it/s]

2024-01-15 12:23:37,281 - sam_ml.models.main_auto_ml - INFO - ExtraTreesClassifier (vec=None, scaler=minmax, selector=None, sampler=None) - score: 0.9529320987654321 (recall) - parameters: {'bootstrap': False, 'criterion': 'gini', 'max_depth': 5, 'min_samples_leaf': 1, 'min_samples_split': 2, 'min_weight_fraction_leaf': 0.0, 'n_estimators': 100}



randomCVsearch (GaussianNB (vec=None, scaler=minmax, selector=None, sampler=None)): 100%|██████████| 5/5 [00:00<00:00, 23.11it/s]

2024-01-15 12:23:37,754 - sam_ml.models.main_auto_ml - INFO - GaussianNB (vec=None, scaler=minmax, selector=None, sampler=None) - score: 0.9380511463844797 (recall) - parameters: {'var_smoothing': 1e-09}



randomCVsearch (BernoulliNB (vec=None, scaler=minmax, selector=None, sampler=None)): 100%|██████████| 5/5 [00:00<00:00, 15.78it/s]

2024-01-15 12:23:38,100 - sam_ml.models.main_auto_ml - INFO - BernoulliNB (vec=None, scaler=minmax, selector=None, sampler=None) - score: 0.3694885361552028 (recall) - parameters: {'binarize': 0, 'fit_prior': True}



randomCVsearch (GaussianProcessClassifier (vec=None, scaler=minmax, selector=None, sampler=None)): 100%|██████████| 5/5 [00:00<00:00, 11.40it/s]

2024-01-15 12:23:38,572 - sam_ml.models.main_auto_ml - INFO - GaussianProcessClassifier (vec=None, scaler=minmax, selector=None, sampler=None) - score: 0.9321496404829738 (recall) - parameters: {'max_iter_predict': 47, 'multi_class': 'one_vs_one'}



randomCVsearch (BaggingClassifier (DTC based) (vec=None, scaler=minmax, selector=None, sampler=None)): 100%|██████████| 5/5 [00:02<00:00,  2.42it/s]

2024-01-15 12:23:40,698 - sam_ml.models.main_auto_ml - INFO - BaggingClassifier (DTC based) (vec=None, scaler=minmax, selector=None, sampler=None) - score: 0.9529320987654321 (recall) - parameters: {'bootstrap': True, 'bootstrap_features': True, 'estimator__max_depth': 7, 'max_features': 4, 'max_samples': 0.373818018663584, 'n_estimators': 205}



randomCVsearch (BaggingClassifier (RFC based) (vec=None, scaler=minmax, selector=None, sampler=None)): 100%|██████████| 5/5 [00:32<00:00,  6.41s/it]

2024-01-15 12:24:13,035 - sam_ml.models.main_auto_ml - INFO - BaggingClassifier (RFC based) (vec=None, scaler=minmax, selector=None, sampler=None) - score: 0.9529320987654321 (recall) - parameters: {'bootstrap': True, 'bootstrap_features': False, 'estimator__max_depth': 5, 'estimator__n_estimators': 50, 'max_features': 1.0, 'max_samples': 1.0, 'n_estimators': 10}



randomCVsearch (BaggingClassifier (LR based) (vec=None, scaler=minmax, selector=None, sampler=None)): 100%|██████████| 5/5 [00:03<00:00,  1.29it/s]

2024-01-15 12:24:17,447 - sam_ml.models.main_auto_ml - INFO - BaggingClassifier (LR based) (vec=None, scaler=minmax, selector=None, sampler=None) - score: 0.9175315425315426 (recall) - parameters: {'bootstrap': True, 'bootstrap_features': False, 'max_features': 1.0, 'max_samples': 1.0, 'n_estimators': 10}



randomCVsearch (XGBClassifier (vec=None, scaler=minmax, selector=None, sampler=None)): 100%|██████████| 5/5 [00:00<00:00,  5.78it/s]

2024-01-15 12:24:18,413 - sam_ml.models.main_auto_ml - INFO - XGBClassifier (vec=None, scaler=minmax, selector=None, sampler=None) - score: 0.9459876543209876 (recall) - parameters: {'colsample_bytree': 1.0, 'gamma': 0.0, 'learning_rate': 0.1, 'max_depth': 6, 'min_child_weight': 1, 'n_estimators': 100, 'reg_alpha': 0, 'reg_lambda': 1.0}





2024-01-15 12:24:18,587 - sam_ml.models.main_auto_ml - INFO - best model type LogisticRegression (vec=None, scaler=minmax, selector=None, sampler=None) - recall: 1.0 - parameters: {'C': 63.512210106407046, 'penalty': 'l2', 'solver': 'saga'}


In [14]:
tester.output_scores_as_pd(sort_by=["recall", "train_time"], console_out=False)

| model | accuracy | precision | recall | s_score | l_score | train_time | train_score | best_score (rCVs) | best_hyperparameters (rCVs) |
|---|---|---|---|---|---|---|---|---|---|
| LogisticRegression (vec=None, scaler=minmax, selector=None, sampler=None) | 1.0 | 1.0 | 1.0 | 0.9926 | 1.0 | 0:00:00 | 0.967063 | 0.950617 | `{'C': 63.512210106407046, 'penalty': 'l2', 'so...` |
| LinearDiscriminantAnalysis (vec=None, scaler=minmax, selector=None, sampler=None) | 1.0 | 1.0 | 1.0 | 0.9926 | 1.0 | 0:00:00 | 0.975193 | 0.981481 | `{'solver': 'svd'}` |
| DecisionTreeClassifier (vec=None, scaler=minmax, selector=None, sampler=None) | 1.0 | 1.0 | 1.0 | 0.9926 | 1.0 | 0:00:00 | 0.949135 | 0.939374 | `{'criterion': 'gini', 'max_depth': 2, 'max_fea...` |
| RandomForestClassifier (vec=None, scaler=minmax, selector=None, sampler=None) | 1.0 | 1.0 | 1.0 | 0.9926 | 1.0 | 0:00:00 | 0.966229 | 0.946318 | `{'bootstrap': False, 'criterion': 'gini', 'max...` |
| GradientBoostingMachine (vec=None, scaler=minmax, selector=None, sampler=None) | 1.0 | 1.0 | 1.0 | 0.9926 | 1.0 | 0:00:00 | 0.98374 | 0.952932 | `{'criterion': 'friedman_mse', 'learning_rate':...` |
| AdaBoostClassifier (RFC based) (vec=None, scaler=minmax, selector=None, sampler=None) | 1.0 | 1.0 | 1.0 | 0.9926 | 1.0 | 0:00:00 | 1.0 | 0.952932 | `{'algorithm': 'SAMME', 'estimator__max_depth':...` |
| KNeighborsClassifier (vec=None, scaler=minmax, selector=None, sampler=None) | 1.0 | 1.0 | 1.0 | 0.9926 | 1.0 | 0:00:00 | 1.0 | 0.954255 | `{'leaf_size': 19, 'n_neighbors': 5, 'p': 4, 'w...` |
| ExtraTreesClassifier (vec=None, scaler=minmax, selector=None, sampler=None) | 1.0 | 1.0 | 1.0 | 0.9926 | 1.0 | 0:00:00 | 0.98374 | 0.952932 | `{'bootstrap': False, 'criterion': 'gini', 'max...` |
| GaussianNB (vec=None, scaler=minmax, selector=None, sampler=None) | 1.0 | 1.0 | 1.0 | 0.9926 | 1.0 | 0:00:00 | 0.949969 | 0.938051 | `{'var_smoothing': 1e-09}` |
| GaussianProcessClassifier (vec=None, scaler=minmax, selector=None, sampler=None) | 1.0 | 1.0 | 1.0 | 0.9926 | 1.0 | 0:00:00 | 0.941422 | 0.93215 | `{'max_iter_predict': 47, 'multi_class': 'one_v...` |


**Note:** If you are using the sam_ml version with SMAC, you can also use the `find_best_model_smac` method, which often gives slightly better results.

**Note:** If you need to test a lot of different pipelines, for example at the beginning to find the best preprocessing steps, you can use the `find_best_model_mass_search` method, which is faster and can test many different models/pipelines. Be aware that its results are of lower quality, so it is recommended only for getting an overview.