# Generated Hyperparameter Values
This notebooks looks at the hyperparameter values generated by symbolic expressions.
We want to identify if and when the generated hyperparameter values are out of bounds.
Hyperparamater values are out of bound if they are not allowed by the algorithm, or outside of our experiment data.

To collect the data for this experiment, we recorded for each evaluated individual the following data during one optimization trace:
 - The **generation** of optimization in which the individual was considered
 - The **individual** symbolic expression
 - The **task** for which the expression was resolved to constant values
 - The **hyperparameter values** that result from applying the **individual**'s expression on the given **task**'s meta-features.

## RPart
For RPart we have experiment data on the following hyperparameters ([docs](https://mlr3.mlr-org.com/reference/mlr_learners_classif.rpart.html?q=rpart)):

| Hyperparameter | Default | Documented Range | Experimental Range |
| :------------- | ------: | ---------------: | -----------------: |
| cp             |  0.01   | \[0, 1\]         | \[2e-6, 0.997\]    |
| max.depth      |  30     | \[1, 30\]        | \[1, 30\]          |
| min.bucket     |  1      | \[1, ∞)          | \[1, 195\]         |
| min.split      |  20     | \[1, ∞)          | \[1, 164\]         |

Table 1. *The legal hyperparameter values for each hyperparameter per documentation compared to the values used in experiments.*

In [1]:
import sys
sys.path.append("./src/")
from src.problem import Problem
problem = Problem("mlr_rpart")
for hyperparameter in problem.hyperparameters:
    print(f"{hyperparameter}: [{problem.data[hyperparameter].min()},{problem.data[hyperparameter].max()}]")

cp: [1.97159e-06,0.996794]
maxdepth: [1.0,30.0]
minbucket: [1.0,195.0]
minsplit: [1.0,164.0]


We optimized a symbolic expression holding out task 9986 (chosen arbitrarily), and collected the data described above.
Our symbolic expressions once resolved can lead to ranges many orders of magnitude out of scale:

In [2]:
import pandas as pd
df = pd.read_csv("data/rpart_hp_big.csv", header=0, sep=';')

In [3]:
for hyperparameter in problem.hyperparameters:
    print(f"{hyperparameter}: [{df[hyperparameter].min()},{df[hyperparameter].max()}]")

cp: [-16916649214.586359,3.4142787736421956e+29]
maxdepth: [-3.7131929274565926e+29,8.320664372691359e+29]
minbucket: [-4.7063248002557094e+27,1.1227368346945231e+30]
minsplit: [-1.1619031153871814e+30,1.1236536623544072e+30]


While an outlier is not necessarily a problem given the surrogate model should effectively truncate the value to the min (or max) of the experimental range, depending on how often this happens it may hurt optimization.

In [4]:
experiment_ranges = dict(
    cp=(0, 1),
    maxdepth=(1, 30),
    minbucket=(1, 195),
    minsplit=(1, 164),
)
def in_range(hyperparameter, value):
    return experiment_ranges[hyperparameter][0] <= value <=experiment_ranges[hyperparameter][1]

data = []
for hp in problem.hyperparameters:
    data.append({
        "hyperparameter": hp,
        "minimum": df[hp].min(),
        " 5%": df[hp].quantile(q=0.05),
        "25%": df[hp].quantile(q=0.25),
        "median": df[hp].median(),
        "25%": df[hp].quantile(q=0.75),
        "95%": df[hp].quantile(q=0.95),
        "maximum": df[hp].max(),
        "% in exp. range": sum(in_range(hp, v) for v in df[hp])/len(df),
    })
symb_ranges = pd.DataFrame(data)
symb_ranges

Unnamed: 0,hyperparameter,minimum,5%,25%,median,95%,maximum,% in exp. range
0,cp,-16916650000.0,1e-06,0.002438,0.001548,0.533284,3.414279e+29,0.946218
1,maxdepth,-3.713193e+29,18.0,26.0,26.0,144.936859,8.320664e+29,0.904493
2,minbucket,-4.706325e+27,0.36,11.0,9.0,18.0,1.122737e+30,0.900386
3,minsplit,-1.161903e+30,1.0,27.0,17.0,94.794326,1.123654e+30,0.911982


Table 2. *An overview of the range of instantiated hyperparameter values.*

While >90% of evaluations are in hyperparameter range, we can still ask when it is not in range:
 - does it depend on the task?
 - does it depend on the generation?
 - what is the effect of all-constants configurations on this number?

In [5]:
data = []
for i, row in df.iterrows():        
    data.append({
        f"{hp}_in_range": in_range(hp, row[hp])
        for hp in problem.hyperparameters
    })
is_in_range = pd.DataFrame(data)
df2 = pd.concat([df, is_in_range], axis=1)

In [6]:
import re
regex = re.compile(r"(\(|, )(m|mkd|p|n|mcp|rc|xvar)")

def is_constant(expr):
    return re.search(regex, expr) is None

df2['is_constant'] = df2.expression.apply(is_constant)

In [7]:
percentage_by_task = df2.groupby(by=['task']).agg(
    {c: lambda x: sum(x)/len(x) for c in is_in_range.columns}
)
percentage_by_task.describe()

Unnamed: 0,cp_in_range,maxdepth_in_range,minbucket_in_range,minsplit_in_range
count,114.0,114.0,114.0,114.0
mean,0.946218,0.904493,0.900386,0.911982
std,0.002917,0.026824,0.085453,0.050645
min,0.937277,0.867745,0.537639,0.655051
25%,0.944672,0.882601,0.911528,0.912659
50%,0.946481,0.884902,0.925493,0.92701
75%,0.94805,0.937224,0.933128,0.929909
max,0.953237,0.946428,0.943555,0.936054


Table 3. *Summary of how many expressions are in range per hyperparameter by task. E.g. the task which has fewest expressions where `cp` is in range, still has it in range `93.7%` of the time.*

In [8]:
percentage_by_gen = df2.groupby(by=['gen']).agg(
    {c: lambda x: sum(x)/len(x) for c in list(is_in_range.columns) + ['is_constant']}
)
percentage_by_gen.describe()

Unnamed: 0,cp_in_range,maxdepth_in_range,minbucket_in_range,minsplit_in_range,is_constant
count,200.0,200.0,200.0,200.0,200.0
mean,0.948815,0.908359,0.903617,0.915761,0.083562
std,0.099721,0.167671,0.054552,0.096442,0.058754
min,0.30711,0.210437,0.391431,0.343384,0.0
25%,0.953009,0.93547,0.893165,0.916157,0.041132
50%,0.967321,0.95533,0.912969,0.93859,0.0733
75%,0.97963,0.971889,0.928504,0.956346,0.117336
max,1.0,1.0,0.972552,0.997608,0.25


Table 4. *Aggregate for how often expressions are in range by generation. We see that in most generations most expressions are in range, though in the extreme case `<50%` are in range in one generation (this need not be the same generation for each hyperparameter).*

In [9]:
percentage_by_gen

Unnamed: 0_level_0,cp_in_range,maxdepth_in_range,minbucket_in_range,minsplit_in_range,is_constant
gen,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
0,0.540554,0.317244,0.391431,0.343384,0.000000
1,0.496793,0.468497,0.664592,0.450104,0.000000
2,0.307110,0.375993,0.649861,0.457341,0.000000
3,0.326846,0.317708,0.719846,0.609101,0.000000
4,0.312591,0.217288,0.862299,0.407255,0.000000
...,...,...,...,...,...
195,0.997852,0.961779,0.898317,0.930183,0.020408
196,0.966851,0.945152,0.883934,0.912742,0.000000
197,0.962943,0.969209,0.874239,0.920337,0.030612
198,0.967901,0.986890,0.894833,0.961153,0.043956


Table 5. *Initial populations score poorly, but later generations seem generally in range.*

This might be a sign that a good symbolic expression operates within the ranges of the hyperparameters. It could also be a sign that Symbolic Expressions are learnt which are effectively constants.

In [10]:
percentage_is_constant = df2.groupby(by=['is_constant']).agg(
    {c: lambda x: sum(x)/len(x) for c in is_in_range.columns}
)
percentage_is_constant

Unnamed: 0_level_0,cp_in_range,maxdepth_in_range,minbucket_in_range,minsplit_in_range
is_constant,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
False,0.942558,0.900572,0.893777,0.906703
True,0.987063,0.948254,0.974127,0.970893


Table 6. *Even our constant expressions aren't always in range, though they are more likely to be than symbolic expressions.*

Are Symbolic Expressions effectively constant? This would be either because:
 - The expression is always out-of-range, so it is truncated to constant values.
 - The expression always evaluates to the same constant (e.g. `rc/rc`)

In [82]:
symbolic_expressions = df2[~df2.is_constant]

In [83]:
print(f"{symbolic_expressions.expression.nunique()} unique symbolic expressions were evaluated across {symbolic_expressions.task.nunique()} tasks.")

12729 unique symbolic expressions were evaluated across 114 tasks.


In [84]:
percentage_by_expression = symbolic_expressions.groupby(by=['expression']).agg(
    {c: lambda x: sum(x)/len(x) for c in list(is_in_range.columns) + ['is_constant']}
)
percentage_by_expression.describe()

Unnamed: 0,cp_in_range,maxdepth_in_range,minbucket_in_range,minsplit_in_range
count,12729.0,12729.0,12729.0,12729.0
mean,0.929094,0.879929,0.875538,0.886815
std,0.248183,0.293587,0.256377,0.27958
min,0.0,0.0,0.0,0.0
25%,1.0,1.0,0.947368,0.964912
50%,1.0,1.0,0.95614,0.973684
75%,1.0,1.0,1.0,1.0
max,1.0,1.0,1.0,1.0


Table 7. *Expressions are typically in range.*

In [85]:
percentage_by_expression[
    (percentage_by_expression.cp_in_range < 0.05) &
    (percentage_by_expression.maxdepth_in_range < 0.05) &
    (percentage_by_expression.minbucket_in_range < 0.05) &
    (percentage_by_expression.minsplit_in_range < 0.05)
]

Unnamed: 0_level_0,cp_in_range,maxdepth_in_range,minbucket_in_range,minsplit_in_range,is_constant
expression,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
"make_tuple(neg(po), neg(rc), add(n, mkd), min(rc, mcp))",0.0,0.0,0.008772,0.0,False
"make_tuple(neg(po), neg(rc), n, min(rc, mcp))",0.0,0.0,0.008772,0.0,False
"make_tuple(sub(neg(0.03935436677289785), max(rc, n)), add(mul(po, p), neg(mcp)), min(8., neg(po)), sub(sub(rc, m), pow(po, 7.)))",0.0,0.04386,0.0,0.0,False
"make_tuple(sub(neg(0.03935436677289785), max(rc, n)), truediv(truediv(n, mkd), min(xvar, mcp)), min(8., neg(po)), min(truediv(mkd, p), n))",0.0,0.0,0.0,0.0,False
"make_tuple(sub(rc, po), sub(0.0015624419718911716, rc), sub(p, n), max(0.0016451488688464565, mcp))",0.0,0.0,0.0,0.0,False


Table 8. *Only the above 6 (of ~13k) symbolic expressions evaluate out of range for more than 95% tasks for all hyperparameters at the same time.*

In [11]:
# the following is some ugly code to help us compile the expressions outside of the normal script
from collections import namedtuple
from evolution import setup_toolbox

FakeArgs = namedtuple("FakeArgs", "constants_only optimize_constants max_start_size max_number_operators")

fake_args = FakeArgs(False, False, 3, 3)
toolbox, pset = setup_toolbox(problem, fake_args)

# integer numbers are not interpreted correctly unless we had a trailing period
# e.g. make_tuple(1, p, p, 2) does not work, but make_tuple(1., p, p, 2.) does
numbers = re.compile(r'(\(| )\d+(\)|,)')
def add_trailing_period(match):
    match_str = match.group(0)
    return match_str[:-1] + '.' + match_str[-1]

In [88]:
from deap import gp, creator

values_by_expression = {}
for str_expression in symbolic_expressions.expression.unique():
    values_by_task = {}
    fixed_expression = re.sub(numbers, add_trailing_period, str_expression)
    tree_expression = gp.PrimitiveTree.from_string(fixed_expression, pset)
    individual = creator.Individual(tree_expression)
    for task, metadata in problem.metadata.iterrows():
        symbolic_expression = gp.compile(individual, pset)
        hyperparameter_values = toolbox.evaluate(symbolic_expression, metadata)
        values_by_task[task] = hyperparameter_values
    values_by_expression[str_expression] = values_by_task



In [89]:
expr_is_constant = {}
for expression, by_task in values_by_expression.items():
    a = pd.DataFrame.from_dict(by_task, columns=problem.hyperparameters, orient='index')
    expr_is_constant[expression] = all([v < 2 for v in (a.describe().loc['max'] - a.describe().loc['min'])])
    

In [93]:
print(f"{len([e for e, c in expr_is_constant.items() if c])} expressions evaluate to constants.")

1711 expressions evaluate to constants.


In [94]:
[e for e, c in expr_is_constant.items() if c]

['make_tuple(min(xvar, n), expit(n), add(mkd, 136.), min(mcp, po))',
 'make_tuple(truediv(rc, rc), sub(mkd, mkd), min(m, mcp), sub(0.008483523159916725, mcp))',
 'make_tuple(sub(mcp, 0.024184355866803904), if_gt(mkd, xvar, m, mkd), max(rc, mkd), pow(rc, mcp))',
 'make_tuple(if_gt(truediv(rc, 1009.), max(112., po), mul(n, 0.01389005822292668), pow(p, n)), sub(pow(xvar, xvar), mul(p, rc)), neg(min(n, mkd)), mul(max(n, p), expit(444.)))',
 'make_tuple(expit(p), pow(rc, mkd), min(rc, po), sub(mkd, 0.24995302167720845))',
 'make_tuple(sub(sub(mcp, xvar), expit(0.17677585367839033)), truediv(if_gt(107., mkd, 0.1913097530238635, 58.), truediv(mkd, mkd)), sub(pow(12., 0.033353104720138874), max(mkd, 14.)), mul(pow(xvar, p), expit(mcp)))',
 'make_tuple(neg(if_gt(mcp, rc, m, xvar)), pow(add(mkd, po), add(xvar, 2.)), pow(neg(po), max(p, n)), truediv(expit(n), mul(xvar, po)))',
 'make_tuple(if_gt(n, xvar, mkd, 725.), neg(mcp), neg(4.), pow(xvar, m))',
 'make_tuple(pow(0.020003428836256873, 0.51897