# README
**Reminder for Windows and macOS users:**
- auto-sklearn is not supported, or only partially supported, on these systems. Consider using Colab, a VM, or Docker.
- See https://automl.github.io/auto-sklearn/master/installation.html#windows-osx-compatibility  
  
**General workflow for Colab users**  
This notebook is organized into multiple steps, grouped in sections and subsections.
1. Set up  
    - install auto-sklearn
    - mount google drive
    - import packages
    - set paths for save/load
2. STEP A - Fetch datasets from OpenML (run once)
    - use the sklearn API to fetch (multiple) datasets from OpenML
    - save each dataset as a pickle (.pkl) named openml_datasetID.pkl, containing a dict with keys 'x' and 'y' and the data as values
    - run this step only once; afterwards, load the data from the pickles instead of fetching again, which is time-consuming. If you already have the datasets in the data folder, skip STEP A.
3. STEP B - Compute meta features (optional)
    - compute the meta features from the pickle files and write the result to 'openml_meta_features.pkl'
    - the result serves as meta information about the datasets
    - tests whether data loading works correctly
    - tests whether saving results works correctly
    - can be skipped; continue with STEP C
4. STEP C - Run experiment
    1. Set up
        - set up the experiment by choosing budgets, datasets, etc.
        - set up the auto-sklearn classifier or modify its settings
        - generate seeds for reproducibility
        - check the estimated time to run the experiment to make sure you won't hit a session timeout
    2. Run
        - start the experiment: a nested loop over datasets (outer) and seeds (inner)
            - train-test split (once per dataset, in the outer loop)
            - train classifier on train set
            - evaluate balanced accuracy on test set
        - a unique timestamp id is generated at the beginning
        - results are gathered in a 'res' dict keyed by dataset id; each entry has keys 'cls' and 'acc' holding the trained classifiers and test accuracies
    3. Save (recommended)
        - save the 'res' dict to 'experiment_id.pkl' in the results folder
        - storage warning: file size ~ 1.2GB for 190 trained classifiers + acc; make sure you have sufficient space, otherwise do not save the classifiers
        - saving avoids losing results in Colab due to session timeout
        - analyze results in a separate notebook so this one can keep running experiments
        - (to do) save results after each iteration of the outer loop
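
The to-do item above (saving after each outer-loop iteration) could be sketched roughly as follows. This is only a sketch: `run_with_checkpoints` and `evaluate` are hypothetical names that do not appear in the notebook, `evaluate` stands in for the train/evaluate step, and only accuracies are stored to keep the file small.

```python
import os
import pickle


def run_with_checkpoints(dataset_ids, seeds, evaluate, out_path):
    """Nested loop over datasets (outer) and seeds (inner).

    `evaluate(dataset_id, seed)` is a placeholder for the train/evaluate
    step and must return a test score. The `res` dict is re-pickled after
    every dataset, so a session timeout loses at most one dataset.
    """
    res = {}
    if os.path.exists(out_path):
        # Resume from an earlier, interrupted run
        with open(out_path, 'rb') as f:
            res = pickle.load(f)

    for dataset_id in dataset_ids:
        if dataset_id in res:
            continue  # already finished in a previous session
        res[dataset_id] = {'acc': [evaluate(dataset_id, seed) for seed in seeds]}
        # Write to a temporary file first, then rename, so an interruption
        # during the write cannot corrupt the previous checkpoint
        tmp_path = out_path + '.tmp'
        with open(tmp_path, 'wb') as f:
            pickle.dump(res, f)
        os.replace(tmp_path, out_path)
    return res
```

Calling the function again with the same `out_path` skips datasets that are already in the checkpoint, which is one possible way to continue after a Colab timeout.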

**Instructions for Colab users:**
- 'Run for colab' contains two cells:
    1.   install auto-sklearn on Colab, then automatically restart the runtime. Run only once per session.
    2.   mount Google Drive for loading/saving datasets and results. In the examples in this notebook, datasets are loaded from pickles and results are dumped to pickles, both stored in user-defined directories in Google Drive.

**Instructions for non-Colab users:**
- Ignore 'Run for colab'
- Change the paths in the 'Set paths' section


**File structure**  
The assumed file structure for the experiment.
  
    directory/
    -- Experiments.ipynb 
    -- Analyze_results.ipynb
    -- data/
    ---- openml_1234.pkl  
    -- results/
    ---- experiments_timestamp/ (note: this subfolder is not implemented yet)
    ------ experiments_timestamp.pkl
    ------ fig_description_123.pkl
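
As a small sketch, the `data/` and `results/` folders assumed above could be created up front. `make_experiment_dirs` is a hypothetical helper, not something the notebook defines.

```python
from pathlib import Path


def make_experiment_dirs(root):
    """Create data/ and results/ under `root` if they do not exist yet.

    Returns the two folder paths as strings, in that order.
    """
    root = Path(root)
    data_dir = root / 'data'
    results_dir = root / 'results'
    for folder in (data_dir, results_dir):
        # exist_ok=True makes the call safe to repeat
        folder.mkdir(parents=True, exist_ok=True)
    return str(data_dir), str(results_dir)
```

The two returned paths could then be assigned to `datasets_folder` and `results_folder` in the 'Set paths' section.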

# INCLUDE TO SET UP

## Run for colab

In [None]:
# For Colab, you need to install auto-sklearn in every new session
import os
import signal
import sys

IN_COLAB = 'google.colab' in sys.modules
if IN_COLAB:
    !pip install auto-sklearn
    #!pip install scipy # Upgrade scipy to 1.7.x

    # Restart the runtime so the newly installed package versions are used
    os.kill(os.getpid(), signal.SIGKILL)

Collecting auto-sklearn
  Downloading auto-sklearn-0.14.2.tar.gz (6.3 MB)
Collecting scipy>=1.7.0
  Downloading scipy-1.7.3-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (38.1 MB)
Collecting scikit-learn<0.25.0,>=0.24.0
  Downloading scikit_learn-0.24.2-cp37-cp37m-manylinux2010_x86_64.whl (22.3 MB)
Collecting distributed<2021.07,>=2.2.0
  Downloading distributed-2021.6.2-py3-none-any.whl (722 kB)
Collecting liac-arff
  Downloading liac-arff-2.5.0.tar.gz (13 kB)
Collecting ConfigSpace<0.5,>=0.4.14
  Downloading ConfigSpace-0.4.20-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.2 MB)
Collecting pynisher>=0.6.3
  Downloading pynisher-0.6.4.tar.gz (11 kB)
Colle

In [None]:
import sys
IN_COLAB = 'google.colab' in sys.modules

# Mount Google Drive only when running on Colab
if IN_COLAB:
    from google.colab import drive
    drive.mount('/content/drive')

Mounted at /content/drive


## Run for everyone

In [None]:
# Dependencies
# Common
import os
import pickle
import sys
import time
from tqdm.autonotebook import tqdm
from datetime import datetime

import numpy as np
import pandas as pd

# Plot
import matplotlib.pyplot as plt
import seaborn as sns

# ML
import sklearn # Import sklearn before autosklearn, solve scipy version error
from sklearn.model_selection import train_test_split
import sklearn.datasets
from sklearn.metrics import accuracy_score, balanced_accuracy_score

# AML
import autosklearn
import autosklearn.classification
from autosklearn.metrics import balanced_accuracy, precision, recall

# Check machine
import multiprocessing
multiprocessing.cpu_count()

2

## Set paths

In [None]:
# Set up paths

# User insert folder to store pickles
# dataset name in format openml_xxx.pkl
datasets_folder = '/content/drive/My Drive/Colab Notebooks/course_AML_proj/data'
results_folder = '/content/drive/My Drive/Colab Notebooks/course_AML_proj/results'

In [None]:
# Example of writing to google drive
#with open('/content/drive/My Drive/foo.txt', 'w') as f:
#  f.write('Hello Google Drive!')
#!cat /content/drive/My\ Drive/foo.txt

Hello Google Drive!

# List all classifiers in auto-sklearn
https://automl.github.io/auto-sklearn/master/examples/40_advanced/example_interpretable_models.html#sphx-glr-examples-40-advanced-example-interpretable-models-py

In [None]:
from autosklearn.pipeline.components.classification import ClassifierChoice

for name in ClassifierChoice.get_components():
    print(name)

adaboost
bernoulli_nb
decision_tree
extra_trees
gaussian_nb
gradient_boosting
k_nearest_neighbors
lda
liblinear_svc
libsvm_svc
mlp
multinomial_nb
passive_aggressive
qda
random_forest
sgd


# WORKFLOW


## STEP A: Fetch dataset and store in pickle (run once)
Datasets are fetched from OpenML with sklearn; see https://scikit-learn.org/stable/datasets/loading_other_datasets.html

Note: dataset 41147 returned an internal server error and is excluded.
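
Because a fetch can fail on the server side (as with 41147), one option is to skip failing ids and reuse pickles that already exist. The sketch below assumes a hypothetical helper `fetch_and_cache`; the `fetch` argument is any callable returning an `(x, y)` pair, so the download itself is left to the caller.

```python
import os
import pickle


def fetch_and_cache(dataset_ids, folder, fetch):
    """Download each dataset once and pickle it as openml_<id>.pkl.

    `fetch(dataset_id)` must return an (x, y) pair. Ids whose pickle
    already exists are skipped. Returns the list of ids that failed.
    """
    failed = []
    for dataset_id in dataset_ids:
        fname = os.path.join(folder, f'openml_{dataset_id}.pkl')
        if os.path.exists(fname):
            continue  # cached by an earlier run
        try:
            x, y = fetch(dataset_id)
        except Exception as err:  # e.g. a server-side error
            print(f'Skipping {dataset_id}: {err}')
            failed.append(dataset_id)
            continue
        with open(fname, 'wb') as f:
            pickle.dump({'x': x, 'y': y}, f)
    return failed
```

In this notebook, `fetch` could be `lambda i: sklearn.datasets.fetch_openml(data_id=i, return_X_y=True, as_frame=True)`, with `datasets_folder` as the target folder.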

In [None]:
# Fetch OpenML datasets

# Download all dataset ids listed below,
# except 41147, which returned an error.
# Dataset ids are taken from Appendix D, Table 20 of the Auto-sklearn 2.0 paper
dataset_ids = [41165, 41161, 41159, 41163,
               41142, 1468, 41164, 40996, 
               1111, 12, 41166, 41138,
               41143, 1486, 
               #41147,  # exclude due to internal server error
               41167, 
               41168, 1596, 41150, 40668,
               3, 23512, 41169, 1067,
               23517, 31, 41146, 40984,
               54, 1461, 40981, 1590,
               4135, 40685, 1169, 40975,
               41027, 1489, 1464]

# Folder in which to store the pickles
folder_path = datasets_folder

# Fetch and save each dataset
for dataset_id in dataset_ids:
    print(f'Fetching data: {dataset_id}')
    # Using fetch function from sklearn
    x, y = sklearn.datasets.fetch_openml(data_id=dataset_id,
                                         return_X_y=True,
                                         as_frame=True) # Return dataframe
    # Store as dict
    data = {'x': x, 'y': y}

    # Dump dict to pickle, named openml_<dataset_id>.pkl
    fname = os.path.join(folder_path, 'openml_' + str(dataset_id) + '.pkl')
    # "wb" mode opens the file in binary format for writing
    with open(fname, 'wb') as file_write:
        pickle.dump(data, file_write)

## STEP B: Test data loading, compute meta features, and test result saving
- Computes the meta features of all datasets.
- Also serves as a check that the data can be loaded successfully.
- Takes < 1 min to run
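
The loading check can also be made explicit. The sketch below assumes a hypothetical helper `check_dataset_pickle`; it verifies the file name pattern and the `{'x': ..., 'y': ...}` layout described in STEP A, and accepts names with or without the `.pkl` suffix.

```python
import os
import pickle
import re


def check_dataset_pickle(path):
    """Load one dataset pickle and verify its layout.

    Returns (dataset_id, n_instance). Raises ValueError if the file name
    or the content does not match the layout described in STEP A.
    """
    match = re.fullmatch(r'openml_(\d+)(\.pkl)?', os.path.basename(path))
    if match is None:
        raise ValueError(f'Unexpected file name: {path}')
    with open(path, 'rb') as f:
        data = pickle.load(f)
    if not isinstance(data, dict) or not {'x', 'y'} <= set(data):
        raise ValueError(f"{path} does not contain a dict with 'x' and 'y'")
    if len(data['x']) != len(data['y']):
        raise ValueError(f'{path}: x and y have different lengths')
    return match.group(1), len(data['x'])
```

For a pandas DataFrame, `len(data['x'])` is the number of rows, so the second return value corresponds to `n_instance` in the meta features.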

In [None]:
meta_features = {}

folder_path = datasets_folder
all_dataset_names = os.listdir(folder_path)

print(f'Total # datasets = {len(all_dataset_names)}')

for dataset_name in all_dataset_names:

    # Load data
    fname = os.path.join(folder_path, dataset_name)
    with open(fname, "rb") as file_read:
        data = pickle.load(file_read) # Load pickle to data

    # dataset name in format openml_xxx.pkl
    # extract xxx as key
    dataset_id = dataset_name.replace('_', '.').split('.')[1]

    x, y = data['x'], data['y']
    print(f'\nCompute meta features of test dataset: {dataset_id}')
    print(f'meta feature: # instance {x.shape[0]}, # feature {x.shape[1]}')
    meta_features[str(dataset_id)] = {'n_instance': x.shape[0], # Row
                                      'n_feature': x.shape[1], # Col
                                      'n_class': len(np.unique(y))}    

Total # datasets = 38

Compute meta features of test dataset: 41168
meta feature: # instance 83733, # feature 54

Compute meta features of test dataset: 1596
meta feature: # instance 581012, # feature 54

Compute meta features of test dataset: 41150
meta feature: # instance 130064, # feature 50

Compute meta features of test dataset: 40668
meta feature: # instance 67557, # feature 42

Compute meta features of test dataset: 3
meta feature: # instance 3196, # feature 36

Compute meta features of test dataset: 23512
meta feature: # instance 98050, # feature 28

Compute meta features of test dataset: 41169
meta feature: # instance 65196, # feature 27

Compute meta features of test dataset: 1067
meta feature: # instance 2109, # feature 21

Compute meta features of test dataset: 23517
meta feature: # instance 96320, # feature 21

Compute meta features of test dataset: 31
meta feature: # instance 1000, # feature 20

Compute meta features of test dataset: 41146
meta feature: # instance 5124, #

In [None]:
# Save meta features
folder_path = results_folder

# Dump dict to pickle
fname = os.path.join(folder_path, 'openml_meta_features.pkl')
# "wb" mode opens the file in binary format for writing
with open(fname, 'wb') as file_write:
    pickle.dump(meta_features, file_write)

## STEP C: Run experiments

### Experiment 1 part 1

#### Set up experiment
- exp id =  'experiment_20211202-220329.pkl'
- time budget = 60s (1 min)
- memory budget = 3072MB
- datasets = 38 datasets
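
The time estimate used in the set-up cell is simply datasets × seeds × time budget. A minimal sketch follows; `estimated_minutes` and `fits_in_session` are hypothetical helper names, and the session limit is a value you must supply yourself, since this notebook does not state Colab's limit.

```python
def estimated_minutes(n_datasets, n_seeds, time_budget_s):
    """Rough run time in minutes: one full time budget per (dataset, seed).

    Data loading and prediction overhead are not included.
    """
    return n_datasets * n_seeds * time_budget_s / 60


def fits_in_session(n_datasets, n_seeds, time_budget_s, session_limit_min):
    """True if the rough estimate stays below the given limit (minutes)."""
    return estimated_minutes(n_datasets, n_seeds, time_budget_s) < session_limit_min
```

With the settings listed above (38 datasets, 5 seeds, 60 s), `estimated_minutes(38, 5, 60)` gives 190.0 min.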

In [None]:
# Dataset

# User insert folder to store pickles
folder_path = datasets_folder
# Datasets D_new to evaluate
# All datasets located in the datasets_folder directory set up by user
all_dataset_names = os.listdir(folder_path)

# Auto-sklearn hyperparameters
# Time limit in seconds for the search of appropriate models.
time_left_for_this_task = 60 # Time budget
#per_run_time_limit = int(time_left_for_this_task / 10) # Default is budget/10. A larger value allows more training time per model.
# Memory limit in MB for the machine learning algorithm.
memory_limit = 3072 # Memory budget

# Repeat evaluation with different seeds
repeat = 5 

# Set random seed generator
rng = np.random.default_rng(12345) # Fix seed for reproducibility
seeds = rng.integers(0, 2**32-1, repeat) # Max seed limited by autoskl

seed_split = 12345 # Fix seed for reproducibility on train test split


# Check estimated experiment duration before proceeding
print(f'Est. time required: {repeat * time_left_for_this_task * len(all_dataset_names) / 60} min')
print(f'seeds: {seeds}')

Est. time required: 190.0 min
seeds: [3003105692  976400780 3387213021 1360466708  876933080]


#### Start experiment

In [None]:
# Run experiment

# Set unique identifier
timestamp = datetime.now().strftime('%Y%m%d-%H%M%S')
exp_id = "experiment_" + timestamp

# Tqdm bars
outter_bar = tqdm(all_dataset_names, desc='dataset', leave=False) # Same list the loop below iterates over
inner_bar = tqdm(seeds, desc='seed', leave=False)

# Store result
res = {}

# Start experiment
print(f'\nStart running experiment id: {exp_id}')
start = time.perf_counter()

for dataset_name in all_dataset_names:
    
    outter_bar.update(1) # tqdm progress bar + 1

    # Load data
    fname = os.path.join(folder_path, dataset_name)
    with open(fname, "rb") as file_read:
        data = pickle.load(file_read) # Load pickle to data

    # dataset name in format openml_xxx.pkl
    # extract xxx as key
    dataset_id = dataset_name.replace('_', '.').split('.')[1]

    x, y = data['x'], data['y']
    print(f'\nStart training on dataset: {dataset_id}')
    print(f'meta feature: # instance = {x.shape[0]}, # feature = {x.shape[1]}')

    # Split data
    x_train, x_test, y_train, y_test = train_test_split(
        x, y, test_size=0.2, random_state=seed_split)

    res_cls = []
    res_acc = []

    for seed in seeds:

        inner_bar.update(1) # tqdm progress bar + 1

        # Train
        cls = autosklearn.classification.AutoSklearnClassifier(
            time_left_for_this_task=time_left_for_this_task,
#            per_run_time_limit=per_run_time_limit, # Default 1/10 of budget
            seed=int(seed),
            memory_limit=memory_limit,
            metric=balanced_accuracy, # For optim
            scoring_functions=[precision, recall] # Not for optim
        )
        # Feurer et al.(2021): single metric for binary clf, multiclass clf and unbalanced datasets

        # Fit
        cls.fit(x_train, y_train, x_test, y_test,
                dataset_name=dataset_id)

        # Test on trained classifier
        predictions = cls.predict(x_test)
        test_acc = balanced_accuracy_score(y_test, predictions)

        # Store
        res_cls.append(cls)
        res_acc.append(test_acc)
    
    inner_bar.reset() # reset the seed progress bar for next dataset

    res[dataset_id] = {'cls': res_cls, 'acc': res_acc}

# Finish experiment
end = time.perf_counter()

print(f'\nExperiment completed in: {(end-start)/60:.1f} min')

dataset:   0%|          | 0/38 [00:00<?, ?it/s]

seed:   0%|          | 0/5 [00:00<?, ?it/s]


Start training on dataset: 41165
meta feature: # instance 10000, # feature 7200

Start training on dataset: 41161
meta feature: # instance 20000, # feature 4296

Start training on dataset: 41159
meta feature: # instance 20000, # feature 4296

Start training on dataset: 41163
meta feature: # instance 10000, # feature 2000

Start training on dataset: 41142
meta feature: # instance 5418, # feature 1636

Start training on dataset: 1468
meta feature: # instance 1080, # feature 856

Start training on dataset: 41164
meta feature: # instance 8237, # feature 800

Start training on dataset: 40996
meta feature: # instance 70000, # feature 784

Start training on dataset: 1111
meta feature: # instance 50000, # feature 230


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  X[column] = X[column].astype('category')


Start training on dataset: 12
meta feature: # instance 2000, # feature 216

Start training on dataset: 41166
meta feature: # instance 58310, # feature 180

Start training on dataset: 41138
meta feature: # instance 76000, # feature 170

Start training on dataset: 41143
meta feature: # instance 2984, # feature 144

Start training on dataset: 1486
meta feature: # instance 34465, # feature 118

Start training on dataset: 41167
meta feature: # instance 416188, # feature 60

Start training on dataset: 41168
meta feature: # instance 83733, # feature 54

Start training on dataset: 1596
meta feature: # instance 581012, # feature 54

Start training on dataset: 41150
meta feature: # instance 130064, # feature 50

Start training on dataset: 40668
meta feature: # instance 67557, # feature 42

Start training on dataset: 3
meta feature: # instance 3196, # feature 36

Start training on dataset: 23512
meta feature: # instance 98050, # feature 28

Start training on dataset: 41169
meta feature: # instan

#### Saving trained classifiers, test accuracy to pickle
WARNING: the file may be large (about 1.12GB)

In [None]:
# Save results and trained models to pickle for persistence

# Save
file_dir = results_folder
file_path = os.path.join(file_dir, exp_id + '.pkl')
# "wb" mode opens the file in binary format for writing
with open(file_path, "wb") as file_write:
    pickle.dump(res, file_write)
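
If storage is limited, one option is to drop the trained classifiers and keep only the scores. The sketch below assumes the `res` layout used in this notebook (`{dataset_id: {'cls': [...], 'acc': [...]}}`); `strip_classifiers` and `save_scores_only` are hypothetical helpers.

```python
import pickle


def strip_classifiers(res):
    """Return a copy of `res` that keeps 'acc' and drops the 'cls' entries."""
    return {dataset_id: {'acc': list(entry['acc'])}
            for dataset_id, entry in res.items()}


def save_scores_only(res, file_path):
    """Pickle only the accuracies; the original `res` is left unchanged."""
    with open(file_path, 'wb') as f:
        pickle.dump(strip_classifiers(res), f)
```

A file written this way contains only plain Python objects, so it can be loaded without auto-sklearn installed.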

#### Logs

All datasets completed without warnings except for:

    Start training on dataset: 1111
    meta feature: # instance 50000, # feature 230
    /usr/local/lib/python3.7/dist-packages/autosklearn/data/feature_validator.py:209: SettingWithCopyWarning: 
    A value is trying to be set on a copy of a slice from a DataFrame.
    Try using .loc[row_indexer,col_indexer] = value instead


#### Process results (suggested to be done separately after saving the res file)

In [None]:
pd.DataFrame({k: {'mean_acc': np.mean(v['acc']), 'sd_acc': np.std(v['acc'])} for k,v in res.items()}).T
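
For the separate analysis notebook, the same summary can be sketched with the standard library only. `summarize` and `load_and_summarize` are hypothetical names; `pstdev` divides by n, which matches the default of `np.std` used above.

```python
import pickle
from statistics import fmean, pstdev


def summarize(res):
    """Per-dataset mean and population standard deviation of test scores."""
    return {dataset_id: {'mean_acc': fmean(entry['acc']),
                         'sd_acc': pstdev(entry['acc'])}
            for dataset_id, entry in res.items()}


def load_and_summarize(file_path):
    """Load a saved `res` pickle and summarize it."""
    with open(file_path, 'rb') as f:
        return summarize(pickle.load(f))
```

Note that unpickling a file that contains the trained classifiers requires auto-sklearn to be importable in the analysis environment.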

### Experiment 1 part 2

#### Set up experiment
- exp id =  'experiment_20211203-081432.pkl', 'experiment_20211203-120718.pkl'
- time budget = 240s (4 min)
- memory budget = 3072MB
- datasets = subset of 10 easier datasets

- exp id = 'experiment_20211203-133531'
- time budget = 480s (8 min)
- memory budget = 3072MB
- datasets = subset of 10 easier datasets

In [None]:
# Auto-sklearn hyperparameters
# Time limit in seconds for the search of appropriate models.
time_left_for_this_task = 480 # Time budget
#per_run_time_limit = int(time_left_for_this_task / 10) # Default is budget/10. A larger value allows more training time per model.
# Memory limit in MB for the machine learning algorithm.
memory_limit = 3072 # Memory budget

# Repeat evaluation with different seeds
repeat = 5 

# Set random seed generator
rng = np.random.default_rng(12345) # Fix seed for reproducibility
seeds = rng.integers(0, 2**32-1, repeat) # Max seed limited by autoskl

seed_split = 12345 # Fix seed for reproducibility on train test split

# Dataset

# User insert folder to store pickles
folder_path = datasets_folder
# Datasets D_new to evaluate
# All datasets located in the datasets_folder directory set up by user

eval_all_datasets = False

if eval_all_datasets:
    # All datasets in dataset folder
    all_dataset_names = os.listdir(folder_path)
else:
    # Subset of datasets indicated below
    # Subset below has better performance for shorter time budget
    subset = ['41143', '54', '1489', '40981', '41146', '1468', '12', '40984', '3', '40975']
    all_dataset_names = ['openml_' + name + '.pkl' for name in subset]


# Check estimated experiment duration before proceeding
print(f'Est. time required: {repeat * time_left_for_this_task * len(all_dataset_names) / 60} min')
print(f'seeds: {seeds}')

Est. time required: 400.0 min
seeds: [3003105692  976400780 3387213021 1360466708  876933080]


#### Start experiment

In [None]:
# Run experiment

# Set unique identifier
timestamp = datetime.now().strftime('%Y%m%d-%H%M%S')
exp_id = "experiment_" + timestamp

# Tqdm bars
outter_bar = tqdm(all_dataset_names, desc='dataset', leave=False)
inner_bar = tqdm(seeds, desc='seed', leave=False)

# Store result
res = {}

# Start experiment
print(f'\nStart running experiment id: {exp_id}')
start = time.perf_counter()

for dataset_name in all_dataset_names:
    
    outter_bar.update(1) # tqdm progress bar + 1

    # Load data
    fname = os.path.join(folder_path, dataset_name)
    with open(fname, "rb") as file_read:
        data = pickle.load(file_read) # Load pickle to data

    # dataset name in format openml_xxx.pkl
    # extract xxx as key
    dataset_id = dataset_name.replace('_', '.').split('.')[1]

    x, y = data['x'], data['y']
    print(f'\nStart training on dataset: {dataset_id}')
    print(f'meta feature: # instance = {x.shape[0]}, # feature = {x.shape[1]}')

    # Split data
    x_train, x_test, y_train, y_test = train_test_split(
        x, y, test_size=0.2, random_state=seed_split)

    res_cls = []
    res_acc = []

    for seed in seeds:

        inner_bar.update(1) # tqdm progress bar + 1

        # Train
        cls = autosklearn.classification.AutoSklearnClassifier(
            time_left_for_this_task=time_left_for_this_task,
#            per_run_time_limit=per_run_time_limit, # Default 1/10 of budget
            seed=int(seed),
            memory_limit=memory_limit,
            metric=balanced_accuracy, # For optim
            scoring_functions=[precision, recall] # Not for optim
        )
        # Feurer et al.(2021): single metric for binary clf, multiclass clf and unbalanced datasets

        # Fit
        cls.fit(x_train, y_train, x_test, y_test,
                dataset_name=dataset_id)

        # Test on trained classifier
        predictions = cls.predict(x_test)
        test_acc = balanced_accuracy_score(y_test, predictions)

        # Store
        res_cls.append(cls)
        res_acc.append(test_acc)
    
    inner_bar.reset() # reset the seed progress bar for next dataset

    res[dataset_id] = {'cls': res_cls, 'acc': res_acc}

# Finish experiment
end = time.perf_counter()

print(f'\nExperiment completed in: {(end-start)/60:.1f} min')

dataset:   0%|          | 0/10 [00:00<?, ?it/s]

seed:   0%|          | 0/5 [00:00<?, ?it/s]


Start running experiment id: experiment_20211203-133531

Start training on dataset: 41143
meta feature: # instance = 2984, # feature = 144

Start training on dataset: 54
meta feature: # instance = 846, # feature = 18

Start training on dataset: 1489
meta feature: # instance = 5404, # feature = 5

Start training on dataset: 40981
meta feature: # instance = 690, # feature = 14

Start training on dataset: 41146
meta feature: # instance = 5124, # feature = 20


Process ForkProcess-5613:
Traceback (most recent call last):
  File "/usr/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/usr/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.7/dist-packages/autosklearn/util/logging_.py", line 320, in start_log_server
    receiver.serve_until_stopped()
  File "/usr/local/lib/python3.7/dist-packages/autosklearn/util/logging_.py", line 352, in serve_until_stopped
    self.timeout)
KeyboardInterrupt



Start training on dataset: 1468
meta feature: # instance = 1080, # feature = 856

Start training on dataset: 12
meta feature: # instance = 2000, # feature = 216

Start training on dataset: 40984
meta feature: # instance = 2310, # feature = 18

Start training on dataset: 3
meta feature: # instance = 3196, # feature = 36

Start training on dataset: 40975
meta feature: # instance = 1728, # feature = 6


In [None]:
res

NameError: ignored

#### Saving trained classifiers, test accuracy to pickle
WARNING: the file may be large (about 1.12GB)

In [None]:
# Save results and trained models to pickle for persistence

# Save
file_dir = results_folder
file_path = os.path.join(file_dir, exp_id + '.pkl')
# "wb" mode opens the file in binary format for writing
with open(file_path, "wb") as file_write:
    pickle.dump(res, file_write)

#### Logs

All datasets completed without warnings except for:

60s, 3072MB  

    Start training on dataset: 23517
    meta feature: # instance = 96320, # feature = 21
    [WARNING] [2021-12-03 00:56:35,991:Client-EnsembleBuilder] No models better than random - using Dummy loss!Number of models besides current dummy model: 1. Number of dummy models: 1

120s, 3072MB  

    Start training on dataset: 40984
    meta feature: # instance = 2310, # feature = 18
    [ERROR] [2021-12-03 10:37:55,070:Client-AutoML(976400780):40984] Dummy prediction failed with run state StatusType.MEMOUT and additional output: {'error': 'Memout (used more than 3072 MB).', 'configuration_origin': 'DUMMY'}.

120s, 3072MB  

    Start training on dataset: 40975
    meta feature: # instance = 1728, # feature = 6
    [WARNING] [2021-12-03 12:53:45,142:smac.runhistory.runhistory2epm.RunHistory2EPM4LogCost] Got cost of smaller/equal to 0. Replace by 0.000010 since we use log cost.
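
To tally such messages instead of copying them by hand, the log lines could be parsed with a regular expression. The pattern below is inferred only from the three lines shown above (`[LEVEL] [timestamp:logger] message`) and is an assumption, not a documented auto-sklearn format; `count_log_levels` is a hypothetical helper.

```python
import re
from collections import Counter

# Assumed layout: [LEVEL] [YYYY-MM-DD HH:MM:SS,mmm:logger-name] message
LOG_LINE = re.compile(
    r'\[(?P<level>[A-Z]+)\] '
    r'\[(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}):(?P<logger>[^\]]+)\] '
    r'(?P<message>.*)')


def count_log_levels(lines):
    """Count log lines per level; lines that do not match are ignored."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match:
            counts[match.group('level')] += 1
    return counts
```

Lines such as `Start training on dataset: ...` do not match the pattern and are skipped.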

#### Process results (suggested to be done separately after saving the res file)

In [None]:
pd.DataFrame({k: {'mean_acc': np.mean(v['acc']), 'sd_acc': np.std(v['acc'])} for k,v in res.items()}).T

| dataset_id | mean_acc | sd_acc |
|---|---|---|
| 40984 | 0.977665 | 0.001179 |
| 3 | 0.986602 | 0.001189 |
| 40975 | 0.999438 | 0.001124 |
