# Format DataFrame

In [1]:
import pandas as pd
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=600, n_features=50, noise=0.1, random_state=42)
train_df = pd.DataFrame(X, columns=[f"x_{i}" for i in range(X.shape[1])])
train_df["target"] = y

print(train_df.shape)
train_df.head()

(600, 51)


| | x_0 | x_1 | x_2 | x_3 | x_4 | x_5 | x_6 | x_7 | x_8 | x_9 | ... | x_41 | x_42 | x_43 | x_44 | x_45 | x_46 | x_47 | x_48 | x_49 | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.705718 | -0.131674 | -1.323795 | 1.136606 | -0.928763 | 0.284816 | -1.187778 | -3.055318 | -0.212013 | 1.671612 | ... | 0.410953 | 0.755633 | 1.088105 | 0.111611 | -0.614697 | -0.207736 | 0.179674 | -0.231539 | 0.767044 | -124.517151 |
| 1 | 0.713429 | -1.736042 | 0.767225 | 0.745636 | -1.87669 | -0.712041 | 0.229587 | -0.058826 | -2.344972 | 0.654297 | ... | -0.274046 | -0.340395 | 0.085439 | -0.241921 | 1.369061 | 1.652834 | 0.265328 | -0.184885 | -0.238244 | -189.821389 |
| 2 | 1.322404 | -1.008531 | 0.080041 | 0.243827 | 0.441083 | 0.301931 | 1.414744 | -0.287359 | 0.848805 | -0.608154 | ... | -0.306996 | 1.363546 | -1.992186 | -0.690337 | -0.246858 | 1.112149 | 0.256325 | -0.505621 | -0.065529 | -117.707136 |
| 3 | -0.147538 | -0.75678 | 1.085672 | -1.070508 | 0.28983 | -0.148429 | -2.815124 | -2.617113 | 1.057043 | -0.225982 | ... | -0.245819 | 0.809846 | 1.387846 | 0.492437 | 1.475454 | -0.334773 | -0.230849 | 0.824702 | -0.202968 | -205.799334 |
| 4 | 0.343111 | 0.093387 | -1.988132 | -1.796316 | -2.628521 | -0.493052 | -0.788237 | -0.470098 | 0.26202 | -1.088517 | ... | 0.561577 | -0.518497 | 0.426443 | 1.003505 | 0.201326 | -1.052442 | 0.353306 | -1.566265 | 0.532124 | 99.591967 |


# Set Up Environment

In [2]:
from hyperparameter_hunter import Environment, CVExperiment
from sklearn.metrics import explained_variance_score

env = Environment(
    train_dataset=train_df,
    results_path="HyperparameterHunterAssets",
    metrics=dict(evs=explained_variance_score),
    cv_type="KFold",
    cv_params=dict(n_splits=3, shuffle=True, random_state=1337),
    runs=2,
)

Cross-Experiment Key:   'Sxkz36nLbTyi4QJM7w6wCWkbIucXm0RbzMsirLPuYmw='
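To make `cv_type`, `cv_params`, and `runs` concrete, the sketch below applies scikit-learn's `KFold` directly with the same settings. It is an illustration under that assumption, not how HyperparameterHunter drives cross-validation internally. With 600 rows and `n_splits=3`, each fold validates on 200 rows, and `runs=2` fits every fold twice, which appears to be what the `F0/R0` and `F0/R1` labels in the Experiment log below refer to.

```python
import numpy as np
from sklearn.model_selection import KFold

n_samples = 600
runs = 2
cv = KFold(n_splits=3, shuffle=True, random_state=1337)

fits = []
# KFold only needs the number of rows, so a placeholder array is enough here
for fold, (train_idx, val_idx) in enumerate(cv.split(np.zeros((n_samples, 1)))):
    for run in range(runs):
        fits.append((f"F{fold}/R{run}", len(train_idx), len(val_idx)))

for label, n_train, n_val in fits:
    print(f"{label}: train={n_train}, validation={n_val}")
```

Because 600 divides evenly by 3, every fold trains on 400 rows and validates on 200, for 3 × 2 = 6 model fits per Experiment.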


Now that HyperparameterHunter has an active `Environment`, we can do two things:

# 1. Perform Experiments

*Note: If this is your first HyperparameterHunter example, the CatBoost classification example may be a better starting point.*

In this Experiment, we're also going to use `model_extra_params` to provide arguments to `CatBoostRegressor`'s `fit` method, just like we would if we weren't using HyperparameterHunter.
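Conceptually, `model_init_params` go to the model's constructor, while `model_extra_params["fit"]` is forwarded to its `fit` call. The sketch below shows that split using a hypothetical `StandInRegressor` (so it runs without CatBoost installed) and a hypothetical `build_and_fit` helper. It is a simplified illustration, not HyperparameterHunter's own code, which additionally handles cross-validation, sentinels, and result saving.

```python
class StandInRegressor:
    """Hypothetical stand-in for CatBoostRegressor that records the arguments it receives."""

    def __init__(self, **init_params):
        self.init_params = init_params
        self.fit_kwargs = None

    def fit(self, X, y, **fit_kwargs):
        self.fit_kwargs = fit_kwargs
        return self


def build_and_fit(model_initializer, model_init_params, model_extra_params, X, y):
    # Hyperparameters are passed to the constructor
    model = model_initializer(**model_init_params)
    # Anything under model_extra_params["fit"] is passed through to fit()
    fit_kwargs = model_extra_params.get("fit", {})
    return model.fit(X, y, **fit_kwargs)


model = build_and_fit(
    StandInRegressor,
    dict(iterations=100, learning_rate=0.05, depth=5),
    dict(fit=dict(verbose=50)),
    X=[[0.0], [1.0]],
    y=[0.0, 1.0],
)
print(model.init_params)  # {'iterations': 100, 'learning_rate': 0.05, 'depth': 5}
print(model.fit_kwargs)   # {'verbose': 50}
```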

We'll use the `verbose` argument to print evaluations of our `CatBoostRegressor` every 50 iterations, and we'll also use the dataset sentinels offered by `Environment`. You can read more about the exciting things you can do with `Environment` sentinels in the documentation and in the example dedicated to them. For now, though, we'll use them to pass each fold's `env.validation_input` and `env.validation_target` to `CatBoostRegressor.fit` via its `eval_set` argument.
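One way to picture a dataset sentinel is as a placeholder that gets swapped for real data once a fold's split is known. The hypothetical `Sentinel` class and `resolve` function below are illustrative only (HyperparameterHunter's sentinels are implemented differently); they show a nested `fit` argument such as `eval_set=[(validation_input, validation_target)]` being filled in with one fold's data.

```python
class Sentinel:
    """Hypothetical placeholder for fold-specific data that is unknown until CV runs."""

    def __init__(self, name):
        self.name = name


def resolve(value, fold_data):
    # Recursively swap each Sentinel for the matching entry in fold_data,
    # preserving the surrounding dict/list/tuple structure
    if isinstance(value, Sentinel):
        return fold_data[value.name]
    if isinstance(value, dict):
        return {k: resolve(v, fold_data) for k, v in value.items()}
    if isinstance(value, list):
        return [resolve(v, fold_data) for v in value]
    if isinstance(value, tuple):
        return tuple(resolve(v, fold_data) for v in value)
    return value


validation_input = Sentinel("validation_input")
validation_target = Sentinel("validation_target")
fit_params = dict(verbose=50, eval_set=[(validation_input, validation_target)])

# Pretend these are one fold's validation rows and targets
fold_0 = {"validation_input": [[0.1, 0.2]], "validation_target": [3.4]}
print(resolve(fit_params, fold_0))
# {'verbose': 50, 'eval_set': [([[0.1, 0.2]], [3.4])]}
```

Because `resolve` builds new containers, the original `fit_params` keeps its placeholders and can be resolved again for the next fold.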

You could also easily add `CatBoostRegressor.fit`'s `early_stopping_rounds` argument to `model_extra_params["fit"]` to enable early stopping, but with only `iterations=100` there is little training for it to cut short, so we leave it out here.
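Early stopping of this kind watches the validation loss and halts once it has gone `early_stopping_rounds` iterations without improving. The function below is a simplified, patience-based sketch of that rule (CatBoost's own overfitting detector offers more options), and it shows why it wouldn't help much here: when the validation loss is still falling at the last iteration, as with `bestIteration = 99` in the training log that follows, the patience window never runs out.

```python
def early_stop_iteration(val_losses, early_stopping_rounds):
    """Return (best_iteration, last_iteration_run) under simple patience-based early stopping."""
    best_loss = float("inf")
    best_iter = 0
    for i, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_iter = loss, i
        elif i - best_iter >= early_stopping_rounds:
            # No improvement for `early_stopping_rounds` iterations: stop training here
            return best_iter, i
    # Patience never ran out, so every iteration was used
    return best_iter, len(val_losses) - 1


still_improving = [100.0 - i for i in range(100)]  # loss falls every iteration
plateau = [5.0, 4.0, 3.0, 3.0, 3.0, 3.0, 3.0]      # loss stops improving after iteration 2

print(early_stop_iteration(still_improving, early_stopping_rounds=20))  # (99, 99)
print(early_stop_iteration(plateau, early_stopping_rounds=2))           # (2, 4)
```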

In [3]:
from catboost import CatBoostRegressor

experiment = CVExperiment(
    model_initializer=CatBoostRegressor,
    model_init_params=dict(
        iterations=100,
        learning_rate=0.05,
        depth=5,
        bootstrap_type="Bayesian",
        save_snapshot=False,
        allow_writing_files=False,
    ),
    model_extra_params=dict(
        fit=dict(
            verbose=50,
            eval_set=[(env.validation_input, env.validation_target)],
        ),
    ),
)

<22:13:22> Validated Environment:  'Sxkz36nLbTyi4QJM7w6wCWkbIucXm0RbzMsirLPuYmw='
<22:13:22> Initialized Experiment: 'df2982f6-ab54-4c74-80b6-ad1c1e49fe45'
<22:13:22> Hyperparameter Key:     '913_iDDPY_PMp5ulOyK251mgLCZ2qau7kjOvDdy1rz8='
<22:13:22> 
0:	learn: 181.0383680	test: 179.9132919	best: 179.9132919 (0)	total: 63ms	remaining: 6.24s
50:	learn: 97.8034574	test: 110.2078251	best: 110.2078251 (50)	total: 427ms	remaining: 410ms
99:	learn: 65.3572502	test: 86.2235253	best: 86.2235253 (99)	total: 783ms	remaining: 0us

bestTest = 86.22352531
bestIteration = 99

<22:13:22> F0/R0  |  OOF(evs=0.77530)  |  Time Elapsed: 0.81737 s
0:	learn: 180.3273426	test: 178.7632887	best: 178.7632887 (0)	total: 7.5ms	remaining: 742ms
50:	learn: 95.3065291	test: 108.1124182	best: 108.1124182 (50)	total: 373ms	remaining: 358ms
99:	learn: 63.7322170	test: 85.4546004	best: 85.4546004 (99)	total: 724ms	remaining: 0us

bestTest = 85.45460041
bestIteration = 99

<22:13:23> F0/R1  |  OOF(evs=0.77929)  |  Time El

Notice above that CatBoost printed scores for our `eval_set` every 50 iterations, just as we requested in `model_extra_params["fit"]`. However, this makes the results rather difficult to read, so we'll switch to `verbose=False` during optimization.
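A quick count shows why the output gets crowded. Judging from the log above (lines at iterations 0, 50, and 99), CatBoost appears to print every `verbose`-th iteration plus the final one; that rule is an assumption inferred from the log, not taken from CatBoost's source. Multiplied across 3 folds and 2 runs, the progress lines alone add up quickly.

```python
def logged_iterations(iterations, verbose):
    # Assumed rule: print every `verbose`-th iteration plus the final iteration
    return [i for i in range(iterations) if i % verbose == 0 or i == iterations - 1]


per_fit = logged_iterations(iterations=100, verbose=50)
n_splits, runs = 3, 2  # from the Environment's cv_params and runs
# Progress lines only; the bestTest/bestIteration summaries come on top of these
total_lines = n_splits * runs * len(per_fit)

print(per_fit)      # [0, 50, 99]
print(total_lines)  # 18
```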

# 2. Hyperparameter Optimization

Notice below that `optimizer` still recognizes the results of `experiment` as valid learning material even though their `verbose` values differ. This is because it knows that `verbose` has no effect on actual results.
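How HyperparameterHunter decides which arguments matter (and how it computes the `Hyperparameter Key` shown in the logs) isn't covered here, so the following is only a hypothetical sketch of the idea: drop arguments assumed not to affect results, such as `verbose`, before hashing the rest. `NON_RESULT_PARAMS` and `hyperparameter_key` are illustrative names, not part of HyperparameterHunter's API.

```python
import hashlib
import json

# Hypothetical: arguments assumed not to change a model's results
NON_RESULT_PARAMS = {"verbose"}


def hyperparameter_key(init_params, fit_params):
    """Hash only the arguments that can affect results (illustrative, not HyperparameterHunter's scheme)."""
    relevant = {
        "init": {k: v for k, v in init_params.items() if k not in NON_RESULT_PARAMS},
        "fit": {k: v for k, v in fit_params.items() if k not in NON_RESULT_PARAMS},
    }
    # sort_keys makes the serialization independent of dict insertion order
    blob = json.dumps(relevant, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


init = dict(iterations=100, learning_rate=0.05, depth=5, bootstrap_type="Bayesian")

key_experiment = hyperparameter_key(init, dict(verbose=50))
key_optimizer = hyperparameter_key(init, dict(verbose=False))
key_deeper = hyperparameter_key({**init, "depth": 7}, dict(verbose=False))

print(key_experiment == key_optimizer)  # True: only `verbose` differs
print(key_experiment == key_deeper)     # False: `depth` changes results
```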

In [4]:
from hyperparameter_hunter import DummyOptPro, Real, Integer, Categorical

optimizer = DummyOptPro(iterations=10, random_state=777)

optimizer.set_experiment_guidelines(
    model_initializer=CatBoostRegressor,
    model_init_params=dict(
        iterations=100,
        learning_rate=Real(0.001, 0.2),
        depth=Integer(3, 7),
        bootstrap_type=Categorical(["Bayesian", "Bernoulli"]),
        save_snapshot=False,
        allow_writing_files=False,
    ),
    model_extra_params=dict(
        fit=dict(
            verbose=False,
            eval_set=[(env.validation_input, env.validation_target)],
        ),
    ),
)

optimizer.go()

Validated Environment with key: "Sxkz36nLbTyi4QJM7w6wCWkbIucXm0RbzMsirLPuYmw="
Saved Result Files
_________________________________________________________________________________________
 Step |       ID |   Time |      Value |   bootstrap_type |     depth |   learning_rate | 
Experiments matching cross-experiment key/algorithm: 1
Experiments fitting in the given space: 1
Experiments matching current guidelines: 1
    0 | df2982f6 | 00m00s |    0.75971 |         Bayesian |         5 |          0.0500 | 
Hyperparameter Optimization
_________________________________________________________________________________________
 Step |       ID |   Time |      Value |   bootstrap_type |     depth |   learning_rate | 
    1 | e21de31a | 00m19s |    0.86756 |         Bayesian |         7 |          0.1719 | 
    2 | ea45250b | 00m02s |    0.89994 |         Bayesian |