
Basic user example too complicated #847

Closed
eddiebergman opened this issue Jun 12, 2022 · 1 comment
Comments

@eddiebergman
Contributor
The basic example here
is too complicated for what is a simple optimization of one parameter.

I would advocate for a minimal working example like this, where the main advantage is that you just import smac;
that's it, no need to mess with anything else, to know about scenarios, or to know about ConfigSpace.

It's a good starting point to add complexity onto as well. Some of the most widely used libraries are just import x as y, i.e.
how often have you done from pandas.dataframes.concatenator import ConcatenatorStrategy?

import numpy as np
import smac

from sklearn.ensemble import RandomForestClassifier

X_train, y_train = np.random.randint(2, size=(20, 2)), np.random.randint(2, size=20)
X_val, y_val = np.random.randint(2, size=(5, 2)), np.random.randint(2, size=5)


def train_random_forest(depth: int) -> float:
    model = RandomForestClassifier(max_depth=depth)
    model.fit(X_train, y_train)

    # Return the validation error (1 - accuracy) so lower is better
    return 1 - model.score(X_val, y_val)


if __name__ == "__main__":
    best_config = smac.optimize(
        train_random_forest,
        config_space={"depth": [2, 100]},
        n_runs=10,
    )
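To show the proposed interface is implementable, here is a toy random-search sketch of the suggested optimize function. This is not SMAC's actual optimizer, and the function name and signature are hypothetical, taken from the proposal above:

```python
import random


def optimize(objective, config_space, n_runs=10, seed=0):
    """Toy random search over the shorthand space; a stand-in for the
    proposed smac.optimize, not the real SMAC implementation."""
    rng = random.Random(seed)

    def sample():
        config = {}
        for name, spec in config_space.items():
            if isinstance(spec, list) and all(isinstance(v, int) for v in spec):
                config[name] = rng.randint(spec[0], spec[1])   # [lo, hi] int range
            elif isinstance(spec, list):
                config[name] = rng.uniform(spec[0], spec[1])   # [lo, hi] float range
            elif isinstance(spec, (tuple, set)):
                config[name] = rng.choice(sorted(spec))        # categorical
        return config

    best_config, best_cost = None, float("inf")
    for _ in range(n_runs):
        config = sample()
        cost = objective(**config)
        if cost < best_cost:
            best_config, best_cost = config, cost
    return best_config


# Toy objective: minimize the distance of "depth" from 10
best = optimize(lambda depth: abs(depth - 10), {"depth": [2, 100]}, n_runs=50)
print(best)
```

The point is only that the shorthand space and a plain callable are enough information to run an optimization loop; SMAC's Bayesian optimization would slot in where the random sampling is.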

This could be added to SMAC fairly easily but requires creating new functionality in ConfigSpace.
However, I think a simple implementation of this in ConfigSpace could look like:

  • (cat_1, cat_2, cat_3) an ordinal categorical, given by a tuple
  • {cat_1, cat_2, cat_3} a choice categorical, given by a set
  • [0, 1] an integer with lower and upper bounds
  • [0.0, 1.0] a float with lower and upper bounds

This leaves out some of the power of ConfigSpace, like log-scale parameters, conditionals, etc.,
but the goal is to accommodate as much as possible within the constraints of this simplicity.
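The dispatch on literal types above can be sketched in a few lines of plain Python. The function name is hypothetical and this is not ConfigSpace API, just a demonstration that the shorthand is unambiguous:

```python
def infer_param_kind(spec):
    """Map a shorthand spec to a (kind, values) pair, following the
    proposed convention: tuple -> ordinal, set -> choice,
    [int, int] -> integer bounds, [float, float] -> float bounds."""
    if isinstance(spec, tuple):
        return ("ordinal", list(spec))            # order is meaningful
    if isinstance(spec, set):
        return ("choice", sorted(spec))           # unordered categorical
    if isinstance(spec, list) and len(spec) == 2:
        lower, upper = spec
        if isinstance(lower, int) and isinstance(upper, int):
            return ("int", (lower, upper))
        return ("float", (float(lower), float(upper)))
    raise ValueError(f"Unrecognized shorthand: {spec!r}")


print(infer_param_kind(("sgd", "adam")))  # ('ordinal', ['sgd', 'adam'])
print(infer_param_kind([2, 100]))         # ('int', (2, 100))
print(infer_param_kind([0.0, 1.0]))       # ('float', (0.0, 1.0))
```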


Current version:

import numpy as np

from sklearn.ensemble import RandomForestClassifier
from ConfigSpace import ConfigurationSpace
from ConfigSpace.hyperparameters import UniformIntegerHyperparameter
from smac.facade.smac_bb_facade import SMAC4BB
from smac.scenario.scenario import Scenario


X_train, y_train = np.random.randint(2, size=(20, 2)), np.random.randint(2, size=20)
X_val, y_val = np.random.randint(2, size=(5, 2)), np.random.randint(2, size=5)


def train_random_forest(config):
    model = RandomForestClassifier(max_depth=config["depth"])
    model.fit(X_train, y_train)

    # Return the validation error (1 - accuracy) so lower is better
    return 1 - model.score(X_val, y_val)


if __name__ == "__main__":
    # Define your hyperparameters
    configspace = ConfigurationSpace()
    configspace.add_hyperparameter(UniformIntegerHyperparameter("depth", 2, 100))

    # Provide meta data for the optimization
    scenario = Scenario({
        "run_obj": "quality",  # Optimize quality (alternatively runtime)
        "runcount-limit": 10,  # Max number of function evaluations (the more the better)
        "cs": configspace,
    })

    smac = SMAC4BB(scenario=scenario, tae_runner=train_random_forest)
    best_found_config = smac.optimize()
@eddiebergman eddiebergman added this to the v0.14.0 milestone Jun 12, 2022
@eddiebergman eddiebergman self-assigned this Jun 12, 2022
@renesass renesass modified the milestones: v0.14.0, v1.4.0 Jun 17, 2022
@eddiebergman eddiebergman modified the milestones: v1.4.0, v2.0.0 Jun 30, 2022
@alexandertornede
Contributor

Fixed in the README of 2.0.

Projects
Status: Done
Development

No branches or pull requests

3 participants