# Evaluation of default configurations


We have two questions to answer:
 1. By which method can we find good symbolic defaults?
 2. **Can we find good (i.e. better than currently known) symbolic defaults?**
 
This notebook addresses the second question.

----

# 1. SVC

After determining good symbolic defaults, we ought to see how they compare to current (scikit-learn) defaults. To this end, we compare four different default configurations (in bold is the name by which they will be referenced henceforth):

 - The **symbolic_pre** defaults we found from evolutionary optimization, specifically: `C=128, gamma=(mkd / 4)`.
   This symbolic function uses metafeatures as calculated on the dataset *before* it is preprocessed.
 - The **symbolic_post** defaults we found from evolutionary optimization, specifically: `C=64, gamma=mkd`.
   This symbolic function uses metafeatures as calculated on the dataset *after* it has been preprocessed.
 - The scikit-learn **0.20** defaults, specifically: `C=1., gamma=(1 / n_features)`
 - The scikit-learn >= **0.22** defaults, specifically: `C=1., gamma=(1 / (n_features * X.var()))`
 
Note that all of these defaults are in fact symbolic, since each sets `gamma` as a function of dataset properties.
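To make the four `gamma` rules concrete, here is a minimal sketch that evaluates each of them on a toy matrix. The function names are hypothetical, and `mkd` is treated as an already computed number because its definition is not given in this section. The two scikit-learn rules correspond to `gamma='auto'` and `gamma='scale'` respectively.

```python
import numpy as np

def gamma_symbolic_pre(mkd):
    """gamma = mkd / 4, with mkd computed on the raw dataset (passed in as a number)."""
    return mkd / 4.0

def gamma_symbolic_post(mkd):
    """gamma = mkd, with mkd computed on the preprocessed dataset (passed in as a number)."""
    return mkd

def gamma_sklearn_020(X):
    """gamma = 1 / n_features, i.e. scikit-learn's gamma='auto'."""
    return 1.0 / X.shape[1]

def gamma_sklearn_022(X):
    """gamma = 1 / (n_features * X.var()), i.e. scikit-learn's gamma='scale'."""
    return 1.0 / (X.shape[1] * X.var())

# Toy data standing in for a preprocessed dataset: 4 samples, 2 features.
X = np.array([[0.0, 4.0],
              [4.0, 0.0],
              [0.0, 4.0],
              [4.0, 0.0]])

g_020 = gamma_sklearn_020(X)  # depends only on the number of columns
g_022 = gamma_sklearn_022(X)  # also depends on the overall variance of X
print(g_020, g_022)
```

On this toy matrix the two scikit-learn rules already disagree, because the 0.22 rule also takes the spread of the values into account.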

A second important detail is that these settings are not evaluated in isolation.
A (fairly standard) preprocessing pipeline is applied first:
 - **Imputation**: using the mean for numeric features, and the most frequent value for categorical features.
 - **Transformation**: numeric features are scaled to N(0, 1), categorical features are one-hot encoded.
 - **Feature Selection**: all constant features are removed.
 
After these steps, the SVC is invoked on the preprocessed data with the given values for `C` and `gamma`.
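The steps above could be assembled in scikit-learn roughly as follows. This is a sketch under assumptions, not the pipeline actually used for the experiments: the helper name `build_svc_pipeline`, the toy columns, and the choice of `StandardScaler` and `VarianceThreshold` for scaling and constant-feature removal are all illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_selection import VarianceThreshold
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import SVC

def build_svc_pipeline(numeric_cols, categorical_cols, C, gamma):
    """Hypothetical helper: imputation -> transformation -> feature selection -> SVC."""
    numeric = make_pipeline(SimpleImputer(strategy="mean"), StandardScaler())
    categorical = make_pipeline(
        SimpleImputer(strategy="most_frequent"),
        OneHotEncoder(handle_unknown="ignore"),
    )
    preprocess = ColumnTransformer([
        ("num", numeric, numeric_cols),
        ("cat", categorical, categorical_cols),
    ])
    return Pipeline([
        ("preprocess", preprocess),
        ("drop_constant", VarianceThreshold()),  # default threshold 0.0 drops constant features
        ("svc", SVC(C=C, gamma=gamma)),
    ])

# Toy data (made up): one numeric column with a gap, one constant column, one categorical column.
X = pd.DataFrame({
    "x1": [1.0, 2.0, np.nan, 4.0, 5.0, 6.0],
    "const": [1.0] * 6,
    "color": ["red", "blue", np.nan, "red", "blue", "red"],
})
y = [0, 1, 0, 1, 0, 1]

# The scikit-learn >= 0.22 defaults; a symbolic default would pass numbers instead.
pipeline = build_svc_pipeline(["x1", "const"], ["color"], C=1.0, gamma="scale")
pipeline.fit(X, y)
predictions = pipeline.predict(X)
n_kept = int(pipeline.named_steps["drop_constant"].get_support().sum())
```

For a symbolic default, `gamma` would be computed from the metafeatures beforehand and passed as a float, for example `build_svc_pipeline(..., C=128, gamma=mkd / 4)` with a precomputed `mkd`.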

Note: for the scikit-learn defaults, the metafeatures of the preprocessed data are currently used (e.g. `n_features` is determined after one-hot encoding). For the *symbolic* method, `mkd` is determined on the original, (largely) unprocessed dataset (samples with NaN values are ignored).
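The distinction matters because a metafeature can change under preprocessing. As a small illustration with made-up data (and `pd.get_dummies` standing in for the one-hot encoding step), `n_features` and therefore `1 / n_features` differ before and after encoding:

```python
import pandas as pd

# Made-up dataset: one numeric and one categorical feature.
raw = pd.DataFrame({
    "age": [20, 30, 40],
    "color": ["red", "green", "blue"],
})

# One-hot encode the categorical column, as the preprocessing step would.
encoded = pd.get_dummies(raw, columns=["color"])

n_features_pre = raw.shape[1]       # counted on the raw data
n_features_post = encoded.shape[1]  # counted after one-hot encoding

gamma_pre = 1 / n_features_pre
gamma_post = 1 / n_features_post
print(n_features_pre, n_features_post, gamma_pre, gamma_post)
```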

## 1.1 Loading Data

In [None]:
from persistence import load_problem, load_results_for_problem
from visualization.output_parser import get_performance_from_console_output

def load_random_search_results(problem_name):
    p = load_problem('problems.json', problem_name)
    return load_results_for_problem(p)

In [None]:
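# The grid search result from Jan.
# Placeholder name (assumption): set this to the key of the SVC problem in problems.json.
svc_problem_name = 'svc'
svc_results = load_random_search_results(svc_problem_name)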

# Results are currently still stored as console logs; they should be aggregated into a single file.
symb_default_performances = get_performance_from_console_output("data/results/pipeline_c128mkd4.txt")
old_default_performances = get_performance_from_console_output("data/results/pipeline_default.txt")
new_default_performances = get_performance_from_console_output("data/results/pipeline_scale.txt")

## 1.2 Comparing Results
We compare the defaults in two ways: by the number of tasks on which one default's average cross-validation performance beats another's (first three columns), and by the summed loss relative to the best result found in the original set of experiments (last column).

In [None]:
import pandas as pd
import numpy as np

methods = ['Symbolic', '0.20', '0.22']
df = pd.DataFrame(np.zeros(shape=(len(methods), len(methods)+1)), columns = methods + ['loss'])
df.index = methods

# Calculate 'wins': df.loc[a, b] counts the tasks on which a's average score beats b's.
performances = list(zip(methods, [symb_default_performances, old_default_performances, new_default_performances]))
for (method, performance) in performances:
    for (method2, performance2) in performances:
        one_over_two = (performance.avg - performance2.avg) > 0
        # Index with a single .loc call; chained indexing may write to a copy.
        df.loc[method, method2] = one_over_two.sum()

# Calculate loss: the summed gap to the best known score on each task.
for (method, performance) in performances:
    loss_sum = 0
    for task_id, row in performance.iterrows():
        best_score = svc_results[svc_results.task_id == task_id].predictive_accuracy.max()
        loss = best_score - row.avg
        if loss < 0:
            print('{} outperformed best on task {} by {}'.format(method, task_id, -loss))
        loss_sum += loss
    df.loc[method, 'loss'] = loss_sum
    
df

The table reads as follows: *Symbolic* beat the *0.20* default on 40 tasks, while the *0.20* default beat *Symbolic* on 22 tasks. *Symbolic* incurred a total loss of 1.612 relative to the best known result of each task.

We see that *Symbolic* (i.e. `C=128, gamma=mkd/4`) as default outperforms both scikit-learn defaults, in terms of the number of tasks on which it achieves higher predictive accuracy as well as the loss in accuracy it incurs across tasks.
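To make the two measures easier to check by hand, here is a minimal sketch of the same bookkeeping on a made-up table. All scores, task ids and the `best_known` series are invented for illustration and are not experimental results.

```python
import pandas as pd

# Invented mean cross-validation accuracies: one row per task, one column per default.
scores = pd.DataFrame(
    {
        "Symbolic": [0.90, 0.80, 0.70],
        "0.20": [0.85, 0.82, 0.65],
        "0.22": [0.88, 0.80, 0.60],
    },
    index=[3, 6, 11],  # invented task ids
)

# Invented best known accuracy per task.
best_known = pd.Series([0.92, 0.85, 0.70], index=scores.index)

# wins.loc[a, b]: number of tasks on which a is strictly better than b (ties count for neither).
wins = pd.DataFrame(
    {b: [int((scores[a] > scores[b]).sum()) for a in scores.columns] for b in scores.columns},
    index=scores.columns,
)

# loss[a]: summed gap between the best known score and a's score over all tasks.
loss = scores.apply(lambda col: (best_known - col).sum())

print(wins)
print(loss)
```

Here *Symbolic* beats *0.20* on two of the three toy tasks and loses on one, which is how a row of the wins table should be read.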

# 2. AdaBoost

# 3. Random Forest

----
**note**: Everything below is scratchpad and should be ignored

----