This notebook generates shared files required for the figures in the paper.

In [117]:
import os
import pandas as pd

# surrogate_performance.csv
Contains predicted performance of the generated solutions on the task that was left out during optimization and selection.

Columns:
 - task: the task
 - learner: the learner the expression is found for (e.g. knn, svm)
 - expression: the expression for which the score is predicted
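 - surrogate_score: the normalized score as predicted by the surrogate model for the task
 - constants: `True` if the expression was found without symbolic terminals, `False` otherwise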
 - optimizer: the optimizer used for finding the expression, one of
     - `Symbolic Default`: obtained with the $\mu$ + $\lambda$ symbolic regression including symbolic terminals.
     - `Constant Default`: obtained with the $\mu$ + $\lambda$ symbolic regression without symbolic terminals.
     - `Random Search X`: obtained with random search but otherwise the same as `Symbolic Default`.
     - `Package Default`: the scikit-learn or mlr package default.
     - `Optimistic Random Search X`: The best test score on the task among X randomly drawn experiments of **real data**.
     
`Random Search` and `Optimistic Random Search` differ as follows. `Random Search` is an estimate in which random search is used as the optimizer for symbolic expressions: the expression is optimized and selected on tasks that are *not* the target task. By contrast, `Optimistic Random Search` optimizes the configuration directly on the test task. `Random Search` therefore finds a *default*, whereas `Optimistic Random Search` simulates optimization on the task itself.
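The distinction between selecting a default on other tasks and optimizing directly on the target task can be illustrated with a toy sketch. The score table, the task names `t1` to `t3`, the configuration names `a` to `c`, and both helper functions are invented for illustration; they are not taken from the experiment data and they simplify the real procedure, which searches over symbolic expressions with a surrogate model.

```python
# Toy normalized scores: scores[task][configuration]. All values are invented.
scores = {
    "t1": {"a": 0.9, "b": 0.7, "c": 0.2},
    "t2": {"a": 0.8, "b": 0.7, "c": 0.1},
    "t3": {"a": 0.2, "b": 0.6, "c": 1.0},
}


def default_score(target_task):
    """Pick the configuration with the best mean score on all *other* tasks,
    then report its score on the target task (leave-one-task-out)."""
    other_tasks = [task for task in scores if task != target_task]
    best = max(
        scores[target_task],
        key=lambda config: sum(scores[task][config] for task in other_tasks) / len(other_tasks),
    )
    return scores[target_task][best]


def optimistic_score(target_task):
    """Pick the best configuration directly on the target task."""
    return max(scores[target_task].values())


for task in scores:
    print(task, default_score(task), optimistic_score(task))
```

By construction the optimistic score can never be lower than the default score on the same task, which is why it serves as an optimistic reference rather than as a competing default.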

## Generated Default Surrogate Scores
Results for optimizers `Symbolic Default`, `Constant Default`, and `Random Search X`.

First we load the generated defaults ...

In [118]:
defaults_directory = "../data/generated_defaults"
directory_map = dict(
    # dirname = (optimizer, constants_only)
    symbolic=("mupluslambda", False), 
    constants=("mupluslambda", True), 
    # symbolic=("mu_plus_lambda", False), 
)

generated_defaults = []
for dirname, (optimizer, constants) in directory_map.items():
    for defaults_file in os.listdir(os.path.join(defaults_directory, dirname)):
        if "mean_rank" not in defaults_file:
            continue

        with open(os.path.join(defaults_directory, dirname, defaults_file), "r") as fh:
            lines = fh.readlines()

        for line in lines[1:]:
            # strip only the newline, so a final line without one is not truncated
            learner, task, expression = line.rstrip("\n").split(',', 2)
            generated_defaults.append(dict(
                task=task,
                learner=learner,
                optimizer=optimizer,
                constants=constants,
                expression=expression[1:-1],  # expression was exported with quotes
            ))
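As an alternative to splitting and unquoting by hand, the standard `csv` module can parse the quoted expression column. The following is a minimal sketch, assuming the files are ordinary comma-separated files with a header row and a double-quoted expression in the third column; the helper name `parse_defaults`, the header names, and the sample row are hypothetical.

```python
import csv
import io


def parse_defaults(handle, optimizer, constants):
    """Parse `learner,task,"expression"` rows; the first row is treated as a header."""
    reader = csv.reader(handle)
    next(reader, None)  # skip the header row, mirroring `lines[1:]`
    return [
        dict(
            task=row[1],
            learner=row[0],
            optimizer=optimizer,
            constants=constants,
            expression=row[2],  # csv.reader already removed the surrounding quotes
        )
        for row in reader
        if row  # ignore blank lines
    ]


# Invented sample content, only to show how commas inside quotes are handled.
sample = io.StringIO('learner,task,expression\nknn,3,"make_tuple(mul(p, 2), 1)"\n')
parsed = parse_defaults(sample, "mupluslambda", False)
print(parsed)
```

Note that `csv.reader` also collapses doubled quotes inside a quoted field, which the manual `expression[1:-1]` slice does not, so the two approaches only agree when expressions contain no embedded quotes.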

We could recompile the expressions and query the surrogates to obtain the scores. However, this is complicated to do for all algorithms in the same script due to some `DEAP` limitations, so we simply look up the recorded test performance from the run files.

In [119]:
main_directory = "../run"
run_directories = [
    os.path.join(main_directory, subdir, rundir)
    for subdir in os.listdir(main_directory) if os.path.isdir(os.path.join(main_directory, subdir))
    for rundir in os.listdir(os.path.join(main_directory, subdir))
]

In [120]:
runs = []

for run_directory in run_directories:
    with open(os.path.join(run_directory, "metadata.csv"), "r") as fh:
        lines = fh.readlines()
    metadata = dict(line.rstrip("\n").split(';') for line in lines[1:])
    if metadata['aggregate'] != 'mean':
        continue
    
    optimizer = metadata['algorithm']
    constants = (metadata['constants_only'] == 'True')
    learner = metadata['problem'][len('mlr_'):]
    
    for default in generated_defaults:
        if 'surrogate_score' in default:
            continue
        
        # Run conditions don't affect the score of an expression on the test set,
        # but filtering on them avoids loading many `final_pareto` files that are
        # unlikely to contain the expression we are looking for.
        different_optimizer = default['optimizer'] != optimizer
        different_constant_constraint = default['constants'] != constants
        different_learner = default['learner'] != learner
        if different_optimizer or different_constant_constraint or different_learner:
            continue
        
        with open(os.path.join(run_directory, "final_pareto.csv"), "r") as fh:
            for line in fh.readlines():
                if default["expression"] in line:
                    _, _, task, score, *_ = line[:-1].split(';')
                    if default["task"] == task:
                        default["surrogate_score"] = score 

FileNotFoundError: [Errno 2] No such file or directory: '../run\\results_04_02_2021_evening.tar\\results_04_02_2021_evening\\metadata.csv'
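The traceback suggests that at least one directory two levels below `../run` is not itself a run directory (here, what looks like an extracted archive folder with an extra level of nesting). One possible way to make the discovery step robust is to keep only directories that actually contain a `metadata.csv`. This is a sketch under that assumption; the helper name `find_run_directories` and the demonstration tree are hypothetical.

```python
import os
import tempfile


def find_run_directories(root):
    """Return every directory under `root` that directly contains a metadata.csv."""
    return sorted(
        dirpath
        for dirpath, _dirnames, filenames in os.walk(root)
        if "metadata.csv" in filenames
    )


# Demonstration on a throwaway tree with invented directory names.
with tempfile.TemporaryDirectory() as root:
    nested_run = os.path.join(root, "archive", "archive", "run_1")
    os.makedirs(nested_run)
    os.makedirs(os.path.join(root, "archive", "empty"))
    open(os.path.join(nested_run, "metadata.csv"), "w").close()

    found = find_run_directories(root)
    print([os.path.relpath(path, root) for path in found])
```

Using `os.walk` finds run directories at any depth, so it does not depend on the archive having been extracted with a particular amount of nesting.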

In [None]:
missing_records = [d for d in generated_defaults if "surrogate_score" not in d]
print(f"Missing {len(missing_records)} surrogate performance estimates.")

In [None]:
surrogate_performance = pd.DataFrame.from_dict(generated_defaults, orient='columns')
surrogate_performance.sample(5)

## Implementation Default Surrogate Scores
Results for the `Package Default` "optimizer".

In [112]:
implementation_default_names = ["sklearn_default", "mlr_default"]
implementation_defaults = []

# we only need implementation performance for (task, learner) pairs which have a generated default
for task, learner in set(zip(surrogate_performance.task, surrogate_performance.learner)):
    for name in implementation_default_names:
        # defaults are only recorded for some problems; the glmnet default is ignored
        if learner not in ["svm", "xgboost"] and name == "sklearn_default":
            continue
        if learner in ["xgboost"] and name == "mlr_default":
            continue
        implementation_defaults.append(dict(
            task=task,
            learner=learner,
            optimizer=name,
            constants=False,
            expression=name,
        ))

In [113]:
for run_directory in run_directories:
    with open(os.path.join(run_directory, "metadata.csv"), "r") as fh:
        lines = fh.readlines()
    metadata = dict(line.rstrip("\n").split(';') for line in lines[1:])
    learner = metadata['problem'][len('mlr_'):]
    if metadata['aggregate'] != 'mean':
        continue
    
    for default in implementation_defaults:
        if 'surrogate_score' in default:
            continue
        
        # Since all runs evaluate defaults regardless of optimization,
        # we don't need as strict filtering as above.
        if default['learner'] != learner:
            continue
        
        with open(os.path.join(run_directory, "evaluations.csv"), "r") as fh:
            # implementation defaults are reported last
            for line in fh.readlines()[-100:]:
                if default["expression"] in line:
                    _, _, task, _, _, score, *_ = line[:-1].split(';')
                    if default["task"] == task:
                        default["surrogate_score"] = score 

In [114]:
missing_records = [d for d in implementation_defaults if "surrogate_score" not in d]
print(f"Missing {len(missing_records)} surrogate performance estimates.")

Missing 0 surrogate performance estimates.


In [115]:
default_performance = pd.DataFrame.from_dict(implementation_defaults, orient='columns')
default_performance.sample(5)

| | task | learner | optimizer | constants | expression | surrogate_score |
|---|---|---|---|---|---|---|
| 214 | 168331 | glmnet | mlr_default | False | mlr_default | 0.844 |
| 97 | 3512 | glmnet | mlr_default | False | mlr_default | 0.907 |
| 77 | 168910 | rpart | mlr_default | False | mlr_default | 0.5292 |
| 8 | 3573 | glmnet | mlr_default | False | mlr_default | 0.9156 |
| 633 | 9950 | glmnet | mlr_default | False | mlr_default | 0.9612 |


In [116]:
pd.concat([default_performance, surrogate_performance]).to_csv("surrogate_performance.csv", sep=';', index=False)
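Because the file is written with `;` as the separator, consumers have to pass the delimiter explicitly when reading it back. A minimal sketch using only the standard library, with two rows copied from the sample output above standing in for the real file:

```python
import csv
import io

# Two rows from the sample output, in the layout produced by to_csv(..., sep=';', index=False).
content = (
    "task;learner;optimizer;constants;expression;surrogate_score\n"
    "168331;glmnet;mlr_default;False;mlr_default;0.844\n"
    "3512;glmnet;mlr_default;False;mlr_default;0.907\n"
)

rows = list(csv.DictReader(io.StringIO(content), delimiter=";"))
# Every field is read as text, so the score must be converted explicitly.
score_by_task_learner = {
    (row["task"], row["learner"]): float(row["surrogate_score"]) for row in rows
}
print(score_by_task_learner)
```

With pandas the equivalent is `pd.read_csv("surrogate_performance.csv", sep=";")`.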

# Optimistic Random Search

In contrast to the previous two, these are drawn directly from the real experiment data.

In [107]:
import arff
import json

learners = surrogate_performance.learner.unique()
task_learner_combinations = set(zip(surrogate_performance.task, surrogate_performance.learner))
data_for_learner = dict()


for learner in learners:
    print(f"Loading {learner}")
    with open(f"../problems/mlr_{learner}.json", "r") as json_file:
        problem_definition = json.load(json_file)
    experiment_file = problem_definition["experiment_data"]
    
    with open(os.path.join("..", experiment_file)) as arff_file:
        d = arff.load(arff_file)
    columns, dtypes = zip(*d["attributes"])
    experiment_data = pd.DataFrame(d["data"], columns=columns)
    data_for_learner[learner] = experiment_data

Loading glmnet
Loading knn
Loading rf
Loading rpart
Loading svm
Loading xgboost


In [108]:
results = []
for task, learner in task_learner_combinations:
    experiment_data = data_for_learner[learner]
    learner_task_data = experiment_data[experiment_data.task_id == float(task)].copy()
    # log loss is the target and is minimized, so negate it to get a score where higher is better
    score_column = -learner_task_data["perf.logloss"]
    # perform min-max scaling to compare to surrogate predictions
    low, high = score_column.min(), score_column.max()
    learner_task_data["normalized_score"] = (score_column - low) / (high - low)
    
    for x in [2**i for i in range(1, 6)]:
        scores = []
        for _ in range(100):
            x_sample = learner_task_data.sample(x)
            # best normalized score among the x sampled experiments
            scores.append(x_sample["normalized_score"].max())
        
        results.append(dict(
            task=task,
            learner=learner,
            optimizer=f"optimistic_random_search_{x}",
            constants=False,
            expression="100 replications",
            surrogate_score=sum(scores)/len(scores),
        ))
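The estimate computed above can be reproduced on a small scale without pandas. The sketch below min-max scales negated losses and then averages the best score among `x` experiments drawn without replacement over 100 replications. The loss values, the function names `normalize` and `expected_best_of`, the fixed seed, and the handling of the degenerate case where all losses are equal are assumptions made for illustration.

```python
import random


def normalize(losses):
    """Negate losses (lower is better) and min-max scale the result to [0, 1]."""
    scores = [-loss for loss in losses]
    low, high = min(scores), max(scores)
    if high == low:
        # All experiments performed equally; return 1.0 by convention (an assumption).
        return [1.0 for _ in scores]
    return [(score - low) / (high - low) for score in scores]


def expected_best_of(scores, x, replications=100, rng=None):
    """Mean over `replications` draws of the best score among `x` sampled experiments."""
    rng = rng or random.Random(0)  # fixed seed so the sketch is repeatable
    return sum(max(rng.sample(scores, x)) for _ in range(replications)) / replications


losses = [0.2, 0.5, 0.9, 1.4, 0.3, 0.7, 1.1, 0.4]  # invented log-loss values
normalized = normalize(losses)
for x in (2, 4, 8):
    print(x, expected_best_of(normalized, x))
```

When `x` equals the number of available experiments, every draw contains the best one, so the estimate is exactly 1.0; smaller budgets yield lower averages.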


In [109]:
optimistic_random_search = pd.DataFrame.from_dict(results, orient='columns')

In [110]:
optimistic_random_search

| | task | learner | optimizer | constants | expression | surrogate_score |
|---|---|---|---|---|---|---|
| 0 | 9960 | xgboost | optimistic_random_search_2 | False | 100 replications | 0.876472 |
| 1 | 9960 | xgboost | optimistic_random_search_4 | False | 100 replications | 0.941642 |
| 2 | 9960 | xgboost | optimistic_random_search_8 | False | 100 replications | 0.979111 |
| 3 | 9960 | xgboost | optimistic_random_search_16 | False | 100 replications | 0.992509 |
| 4 | 9960 | xgboost | optimistic_random_search_32 | False | 100 replications | 0.996307 |
| ... | ... | ... | ... | ... | ... | ... |
| 3290 | 168912 | svm | optimistic_random_search_2 | False | 100 replications | 0.674328 |
| 3291 | 168912 | svm | optimistic_random_search_4 | False | 100 replications | 0.874119 |
| 3292 | 168912 | svm | optimistic_random_search_8 | False | 100 replications | 0.939477 |
| 3293 | 168912 | svm | optimistic_random_search_16 | False | 100 replications | 0.949916 |


In [111]:
optimistic_random_search.to_csv("optimistic_random_search.csv", sep=';', index=False)