# Evaluation of default configurations


We aim to answer two questions:
 1. By which method can we find good symbolic defaults?
 2. **Can we find good (i.e. better than currently known) symbolic defaults?**

This notebook addresses the second question.

----

After determining good symbolic defaults, we want to see how they compare to the current scikit-learn defaults.
To this end, we compare three default configurations (the name in bold is how each is referred to henceforth):

 - The **symbolic** defaults found by evolutionary optimization, specifically: `C=128, gamma=(mkd / 4)`
 - The scikit-learn **0.20** defaults, specifically: `C=1., gamma=(1 / n_features)`
 - The scikit-learn >= **0.22** defaults, specifically: `C=1., gamma=(1 / (n_features * X.var()))`
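
To make the three formulas concrete, here is a minimal sketch that evaluates them on a small preprocessed matrix. The function `default_configurations` is hypothetical, and since the definition of `mkd` is not given here, the sketch takes it as a precomputed number rather than computing it.

```python
import numpy as np

def default_configurations(X, mkd):
    """Return the (C, gamma) pair of each default for a preprocessed data matrix X.

    `mkd` is a precomputed metafeature value passed in as-is; this sketch
    does not compute it (hypothetical parameter).
    """
    n_features = X.shape[1]
    return {
        "Symbolic": (128.0, mkd / 4),
        "0.20": (1.0, 1.0 / n_features),
        # X.var() without an axis is the variance over all entries of X
        "0.22": (1.0, 1.0 / (n_features * X.var())),
    }

X = np.array([[0.0, 4.0], [4.0, 0.0]])
configs = default_configurations(X, mkd=2.0)
```

Here every entry of `X` is 0 or 4, so `X.var()` is 4 and the 0.22 value of `gamma` is 1/(2·4) = 0.125, a quarter of the 0.20 value 1/2. Note that on data consisting only of standardized numeric columns, every column has mean 0 and variance 1, so `X.var()` is 1 and the 0.20 and 0.22 values of `gamma` should coincide up to rounding; they can differ only once one-hot encoded columns are present.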
 
Note that all three defaults are in fact symbolic: each expresses `gamma` as a function of properties of the data (`mkd`, `n_features`, `X.var()`).

A second important detail is that these settings are not evaluated on the raw data.
A fairly standard preprocessing pipeline is applied first:
 - **Imputation**: using the mean for numeric features, and the most frequent value for categorical features.
 - **Transformation**: numeric features are standardized to zero mean and unit variance, categorical features are one-hot encoded.
 - **Feature Selection**: all constant features are removed.
 
After these steps, the SVC is invoked on the preprocessed data with the given values for `C` and `gamma`.
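
The steps above can be sketched with standard scikit-learn components. This is an illustration under assumptions, not the code that produced the results below: `build_pipeline` is a hypothetical helper, and the column lists must be supplied by the caller.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_selection import VarianceThreshold
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import SVC


def build_pipeline(numeric_cols, categorical_cols, C, gamma):
    """Imputation -> transformation -> constant-feature removal -> SVC (hypothetical helper)."""
    numeric = make_pipeline(
        SimpleImputer(strategy="mean"),  # mean imputation for numeric features
        StandardScaler(),                # zero mean, unit variance
    )
    categorical = make_pipeline(
        SimpleImputer(strategy="most_frequent"),  # most frequent value for categorical features
        OneHotEncoder(handle_unknown="ignore"),   # one-hot encoding
    )
    preprocess = ColumnTransformer([
        ("num", numeric, numeric_cols),
        ("cat", categorical, categorical_cols),
    ])
    return Pipeline([
        ("preprocess", preprocess),
        # The default threshold=0.0 removes only features that are constant
        ("drop_constant", VarianceThreshold()),
        ("svc", SVC(C=C, gamma=gamma)),
    ])
```

With `gamma="auto"` or `gamma="scale"`, `SVC` derives `gamma` from the matrix it actually receives, i.e. after one-hot encoding and constant-feature removal, which matches how the scikit-learn defaults are computed here. The symbolic default would instead pass a precomputed float, `mkd / 4`.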

Note: for the scikit-learn defaults, the metafeatures are currently computed on the preprocessed data (e.g. `n_features` is determined after one-hot encoding). For the *symbolic* defaults, `mkd` is determined on the original, (largely) unprocessed dataset, ignoring samples with NaN values.
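
The difference in when metafeatures are computed can change the resulting `gamma` noticeably. A small sketch on a hypothetical toy dataset, with `pd.get_dummies` standing in for the pipeline's one-hot encoder:

```python
import pandas as pd

# Hypothetical toy dataset: one numeric feature (with a missing value)
# and one categorical feature with three levels.
data = pd.DataFrame({
    "x": [0.5, None, 1.5, 2.0],
    "colour": ["red", "green", "blue", "red"],
})

# scikit-learn defaults: n_features is counted after one-hot encoding.
n_features_before = data.shape[1]
n_features_after = pd.get_dummies(data, columns=["colour"]).shape[1]
gamma_020 = 1 / n_features_after  # 1/4 rather than 1/2

# Symbolic default: mkd is computed on the original data, ignoring rows with NaN.
rows_for_mkd = data.dropna()
```

The two original columns become four after encoding (`x` plus one column per colour), while the row with the missing value is dropped only for the symbolic metafeature.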

## 1.1 Loading Data

```python
# Results of the original SVC experiments (columns used below: task_id, predictive_accuracy)
from persistence import load_problem, load_results_for_problem
p = load_problem('problems.json', 'svc')
svc_results = load_results_for_problem(p)
```

```python
# Helpers to parse per-task results from the output logs.
# A result line has exactly three space-separated fields: "<task_id> <mean> <std>".
import operator

def is_performance_line(line):
    return line.count(' ') == 2

def parse_performance_line(line):
    # rstrip('\n') instead of line[:-1], so a final line without a newline is not truncated
    task, avg, std = line.rstrip('\n').split(' ')
    return int(task), float(avg), float(std)

def get_performance_from_console_output(file):
    with open(file, 'r') as fh:
        return [parse_performance_line(line) for line in fh
                if is_performance_line(line)]

def compare(list1, list2, idx, op):
    """Compare two lists of tuples element-wise on their idx-th entry using op.

    The parameter is named `op` so it does not shadow the `operator` module.
    """
    return [op(a[idx], b[idx]) for (a, b) in zip(list1, list2)]
```

```python
# Load the output logs of the three runs
symb_default_file = "data/results/pipeline_c128mkd4.txt"
old_default_file = "data/results/pipeline_default.txt"
new_default_file = "data/results/pipeline_scale.txt"

symb_default_performances = get_performance_from_console_output(symb_default_file)
old_default_performances = get_performance_from_console_output(old_default_file)
new_default_performances = get_performance_from_console_output(new_default_file)

# Make sure that we always compare the same tasks in the same order (idx 0 is the task id).
# zip() silently truncates to the shorter list, so check the lengths as well.
assert len(symb_default_performances) == len(old_default_performances) == len(new_default_performances)
assert not any(compare(symb_default_performances, old_default_performances, 0, operator.ne))
assert not any(compare(symb_default_performances, new_default_performances, 0, operator.ne))
```

## 1.2 Comparing Results
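We compare the methods in two ways. In the first three columns, each cell counts the tasks on which the row method's mean cross-validation accuracy is strictly higher than the column method's. The last column sums each method's loss over all tasks, where the loss on a task is the best accuracy found in the original set of experiments minus the accuracy achieved by the default (lower is better).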
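
Both measures can be illustrated on hypothetical numbers. The sketch below uses made-up results for two methods on three tasks, in the same `(task_id, mean, std)` format the log parser produces; `wins` and `total_loss` are illustrative helpers, not part of the notebook.

```python
# Hypothetical per-task results in the parsed log format: (task_id, mean accuracy, std).
method_a = [(1, 0.90, 0.01), (2, 0.80, 0.02), (3, 0.70, 0.03)]
method_b = [(1, 0.85, 0.01), (2, 0.80, 0.02), (3, 0.75, 0.03)]
best_known = {1: 0.92, 2: 0.81, 3: 0.75}  # hypothetical best accuracy per task

def wins(xs, ys):
    """Count tasks on which xs has a strictly higher mean accuracy (index 1) than ys."""
    return sum(x[1] > y[1] for x, y in zip(xs, ys))

def total_loss(xs, best):
    """Sum over tasks of (best known accuracy - achieved accuracy); lower is better."""
    return sum(best[task] - score for task, score, _ in xs)

a_over_b = wins(method_a, method_b)         # 1 (task 1)
b_over_a = wins(method_b, method_a)         # 1 (task 3)
ties = len(method_a) - a_over_b - b_over_a  # 1 (task 2)
```

Because ties count for neither side, a cell and its mirror across the diagonal need not add up to the number of tasks; any tasks not covered by either count are ties.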

```python
import pandas as pd
import numpy as np

methods = ['Symbolic', '0.20', '0.22']
# Square win matrix (row beats column) plus a column for the summed loss
df = pd.DataFrame(np.zeros(shape=(len(methods), len(methods) + 1)),
                  columns=methods + ['loss'], index=methods)

performances = list(zip(methods, [symb_default_performances, old_default_performances, new_default_performances]))
for (method, performance) in performances:
    for (method2, performance2) in performances:
        # Tasks where `method` has a strictly higher mean score (idx 1) than `method2`
        one_over_two = compare(performance, performance2, 1, operator.gt)
        # Use .loc[row, col]; chained df.loc[row][col] = ... may write to a copy
        df.loc[method, method2] = sum(one_over_two)

for (method, performance) in performances:
    loss_sum = 0
    for (task, score, _) in performance:
        best_score = svc_results[svc_results.task_id == task].predictive_accuracy.max()
        loss = best_score - score
        if loss < 0:
            print('{} outperformed best on task {} by {}'.format(method, task, loss))
        loss_sum += loss
    df.loc[method, 'loss'] = loss_sum

df
```

```
Symbolic outperformed best on task 3543 by -1.1102230246251565e-16
Symbolic outperformed best on task 3561 by -0.00732265496049167
Symbolic outperformed best on task 34538 by -0.005555074074073962
Symbolic outperformed best on task 9956 by -0.0012622452830188813
0.20 outperformed best on task 3543 by -1.1102230246251565e-16
0.20 outperformed best on task 34538 by -0.0027772962962961945
0.20 outperformed best on task 23 by -0.006135132744989891
0.22 outperformed best on task 3543 by -1.1102230246251565e-16
0.22 outperformed best on task 34538 by -0.0027772962962961945
0.22 outperformed best on task 23 by -0.005468649935649883
0.22 outperformed best on task 20 by -0.0024999999999999467
```


| Method | Symbolic | 0.20 | 0.22 | loss |
|---|---|---|---|---|
| **Symbolic** | 0 | 23 | 21 | 0.639799 |
| **0.20** | 16 | 0 | 3 | 2.893993 |
| **0.22** | 16 | 17 | 0 | 2.359051 |
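
One reading note on the printed output: every method "outperforms" the best result on task 3543 by about -1.1e-16, which is floating-point rounding rather than a genuine improvement, since the scores agree to within machine precision. A minimal sketch of reporting only improvements larger than a tolerance; `genuine_improvements` and the cut-off `tol=1e-9` are hypothetical choices, and the loss values are copied from the printed output for the Symbolic method.

```python
def genuine_improvements(losses, tol=1e-9):
    """Return {task: improvement} for tasks whose loss is below -tol.

    `losses` maps task id -> (best_score - score); `tol` is a hypothetical
    cut-off separating real improvements from floating-point noise.
    """
    return {task: -loss for task, loss in losses.items() if loss < -tol}

# Losses of the Symbolic method, copied from the printed output
symbolic_losses = {
    3543: -1.1102230246251565e-16,
    3561: -0.00732265496049167,
    34538: -0.005555074074073962,
    9956: -0.0012622452830188813,
}
improved = genuine_improvements(symbolic_losses)  # tasks 3561, 34538 and 9956
```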
