# Format DataFrame

Note that this dataset (scikit-learn's Forest Cover Types) can take a little while to download.

This is a multi-class classification task, in which the target is label-encoded.

We'll also subtract one from the targets so that the seven labels span 0-6 rather than the default 1-7. This keeps CatBoost from complaining.

In [1]:
import pandas as pd
from sklearn.datasets import fetch_covtype

data = fetch_covtype(shuffle=True, random_state=32)
train_df = pd.DataFrame(data.data, columns=["x_{}".format(i) for i in range(data.data.shape[1])])
train_df["y"] = data.target - 1

print(train_df.shape)
train_df.head()

(581012, 55)


,x_0,x_1,x_2,x_3,x_4,x_5,x_6,x_7,x_8,x_9,...,x_45,x_46,x_47,x_48,x_49,x_50,x_51,x_52,x_53,y
0,3247.0,289.0,12.0,268.0,40.0,1624.0,186.0,238.0,193.0,2525.0,...,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0
1,3200.0,46.0,17.0,162.0,45.0,1592.0,223.0,200.0,105.0,2254.0,...,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0
2,2368.0,48.0,19.0,277.0,121.0,1260.0,224.0,196.0,99.0,1237.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2
3,2828.0,50.0,11.0,417.0,73.0,1252.0,225.0,215.0,123.0,962.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1
4,2932.0,32.0,11.0,618.0,55.0,638.0,218.0,217.0,134.0,1092.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0
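
As a quick illustration of the label shift described above, here is a minimal sketch on a toy list standing in for `data.target` (the variable names are made up for this example). It only checks that subtracting one turns labels 1-7 into a contiguous 0-6 range.

```python
# Toy stand-in for `data.target`; the real array has 581,012 entries.
raw_labels = [1, 2, 3, 4, 5, 6, 7, 7, 1]

# Same shift as `data.target - 1` in the cell above.
shifted = [label - 1 for label in raw_labels]

# The distinct labels should now be exactly 0, 1, ..., n_classes - 1.
distinct = sorted(set(shifted))
is_zero_based = distinct == list(range(len(distinct)))

print(distinct)       # [0, 1, 2, 3, 4, 5, 6]
print(is_zero_based)  # True
```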


# Set Up Environment

In [2]:
from hyperparameter_hunter import Environment, CVExperiment
from sklearn.metrics import f1_score
from sklearn.model_selection import KFold

env = Environment(
    train_dataset=train_df,
    results_path="HyperparameterHunterAssets",
    target_column="y",
    metrics=dict(f1=lambda y_true, y_pred: f1_score(y_true, y_pred, average="micro")),
    cv_type=KFold,
    cv_params=dict(n_splits=5, random_state=32),
)

Cross-Experiment Key:   'S8Q6MRvEmEfVCMfR0XLC03UB2-5lkgXQPowUlVgqREs='
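
The `f1` metric above uses `average="micro"`, which pools true positives, false positives, and false negatives across all classes before computing a single score. The following is a rough pure-Python sketch of that pooling (not scikit-learn's implementation; `micro_f1` and `accuracy` are names invented here). It also shows that, when every sample has exactly one label, the micro-averaged F1 equals plain accuracy.

```python
from collections import Counter


def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP/FP/FN over all classes, then compute F1 once."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # counted against the predicted class
            fn[t] += 1  # counted against the true class
    total_tp = sum(tp.values())
    total_fp = sum(fp.values())
    total_fn = sum(fn.values())
    return 2 * total_tp / (2 * total_tp + total_fp + total_fn)


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy labels, not taken from the dataset.
y_true = [0, 1, 2, 2, 1, 0, 3]
y_pred = [0, 2, 2, 2, 1, 0, 0]

score = micro_f1(y_true, y_pred)
acc = accuracy(y_true, y_pred)
print(score, acc)
```

Because each misclassified sample contributes one false positive and one false negative, the pooled formula reduces to correct predictions divided by total samples.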


Now that HyperparameterHunter has an active `Environment`, we can do two things:

# 1. Perform Experiments

In [3]:
from catboost import CatBoostClassifier

experiment = CVExperiment(
    model_initializer=CatBoostClassifier,
    model_init_params=dict(
        iterations=100,
        learning_rate=0.03,
        depth=6,
        save_snapshot=False,
        allow_writing_files=False,
        loss_function="MultiClass",
        classes_count=7,
    ),
)

<21:15:02> Validated Environment:  'S8Q6MRvEmEfVCMfR0XLC03UB2-5lkgXQPowUlVgqREs='
<21:15:02> Initialized Experiment: 'f29d973e-fb49-4044-a97f-e210dd87f0f1'
<21:15:02> Hyperparameter Key:     '1b9sh7OR9oG66Hz0_L_kCWJLqKWeytMNgvGp_0JXLJ8='
<21:15:02> 
<21:15:52> F0.0 AVG:   OOF(f1=0.72561)  |  Time Elapsed: 49.73871 s
<21:16:41> F0.1 AVG:   OOF(f1=0.72578)  |  Time Elapsed: 49.28626 s
<21:17:31> F0.2 AVG:   OOF(f1=0.72496)  |  Time Elapsed: 49.28679 s
<21:18:20> F0.3 AVG:   OOF(f1=0.72663)  |  Time Elapsed: 49.53078 s
<21:19:09> F0.4 AVG:   OOF(f1=0.72581)  |  Time Elapsed: 49.38508 s
<21:19:10> 
<21:19:10> FINAL:    OOF(f1=0.72576)  |  Time Elapsed: 4.0 m, 7.4541 s
<21:19:10> 
<21:19:10> Saving results for Experiment: 'f29d973e-fb49-4044-a97f-e210dd87f0f1'
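
The five per-fold lines in the log above follow from the `n_splits=5` setting passed to `KFold`. As a rough sketch (not HyperparameterHunter's or scikit-learn's code), the fold sizes for the 581,012 rows can be worked out from scikit-learn's documented rule that the first `n_samples % n_splits` folds receive one extra sample. The helper name `kfold_sizes` is hypothetical.

```python
def kfold_sizes(n_samples, n_splits):
    """Validation-fold sizes: the first `n_samples % n_splits` folds get one extra row."""
    base, extra = divmod(n_samples, n_splits)
    return [base + 1 if i < extra else base for i in range(n_splits)]


n_samples = 581012  # row count printed by `train_df.shape`
sizes = kfold_sizes(n_samples, 5)
train_sizes = [n_samples - size for size in sizes]

print(sizes)        # [116203, 116203, 116202, 116202, 116202]
print(train_sizes)  # [464809, 464809, 464810, 464810, 464810]
```

Each fold therefore trains on roughly 465,000 rows and is scored on roughly 116,000 out-of-fold rows.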


# 2. Hyperparameter Optimization

In [4]:
from hyperparameter_hunter import GBRT, Real, Integer, Categorical

optimizer = GBRT(iterations=8, random_state=42)

optimizer.forge_experiment(
    model_initializer=CatBoostClassifier,
    model_init_params=dict(
        iterations=100,
        learning_rate=Real(low=0.0001, high=0.5),
        depth=Integer(4, 15),
        save_snapshot=False,
        allow_writing_files=False,
        loss_function="MultiClass",
        classes_count=7,
    ),
)

optimizer.go()

Validated Environment with key: "S8Q6MRvEmEfVCMfR0XLC03UB2-5lkgXQPowUlVgqREs="
Saved Result Files
______________________________________________________________________
 Step |       ID |   Time |      Value |     depth |   learning_rate | 
Experiments matching cross-experiment key/algorithm: 1
Experiments fitting in the given space: 1
Experiments matching current guidelines: 1
    0 | f29d973e | 00m00s |    0.72576 |         6 |          0.0300 | 
Hyperparameter Optimization
______________________________________________________________________
 Step |       ID |   Time |      Value |     depth |   learning_rate | 
    1 | 14757f77 | 04m53s |    0.88707 |         9 |          0.4295 | 
    2 | 9aa18f5d | 07m16s |    0.89922 |        12 |          0.2106 | 
    3 | 014c6375 | 06m04s |    0.90904 |        11 |          0.3770 | 
    4 | 756d3178

Notice that `optimizer` recognizes that the hyperparameters of our earlier `experiment` fit inside the search space and guidelines we set for `optimizer`: the saved result appears as step 0 in the output above.

Then, when optimization starts, it automatically learns from `experiment`'s results, with no extra work on our part!
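
To make the idea of "fitting inside the search space" concrete, here is a minimal sketch of such a check. It is not HyperparameterHunter's actual matching logic: `RealRange`, `IntRange`, and `fits_space` are hypothetical stand-ins, and inclusive bounds are an assumption. Searched parameters must fall within their ranges, and fixed parameters must match exactly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RealRange:
    """Hypothetical stand-in for a continuous dimension (bounds assumed inclusive)."""
    low: float
    high: float

    def contains(self, value):
        return self.low <= value <= self.high


@dataclass(frozen=True)
class IntRange:
    """Hypothetical stand-in for an integer dimension (bounds assumed inclusive)."""
    low: int
    high: int

    def contains(self, value):
        return isinstance(value, int) and self.low <= value <= self.high


def fits_space(params, space):
    """Return True if every entry in `space` is satisfied by `params`."""
    for name, expected in space.items():
        if name not in params:
            return False
        value = params[name]
        if isinstance(expected, (RealRange, IntRange)):
            if not expected.contains(value):
                return False
        elif value != expected:  # fixed (non-searched) parameter must match exactly
            return False
    return True


# Mirrors the values used in the cells above.
space = dict(iterations=100, learning_rate=RealRange(0.0001, 0.5), depth=IntRange(4, 15))
earlier = dict(iterations=100, learning_rate=0.03, depth=6)
too_shallow = dict(iterations=100, learning_rate=0.03, depth=3)

print(fits_space(earlier, space))      # True
print(fits_space(too_shallow, space))  # False
```

Under these assumptions, the earlier experiment (`depth=6`, `learning_rate=0.03`) qualifies, whereas one with `depth=3` would fall outside the space and be ignored.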