# Hyperparameter tuning


This tutorial shows how to tune CatBoost hyperparameters using CatBoost's built-in grid and random search, as well as the [optuna](https://github.com/optuna/optuna) and [hyperopt](https://github.com/hyperopt/hyperopt) frameworks.

In [None]:
!pip3 install catboost
!pip3 install optuna
!pip3 install hyperopt

### Dataset

We are going to tune hyperparameters for a binary classification task on the [UCI Adult Dataset](https://archive.ics.uci.edu/ml/datasets/Adult). We will predict whether a person's annual income exceeds 50K. Let's build two sets of CatBoost pools (train, validation, test): one with numerical features only, the other with both numerical and categorical features.

In [1]:
import catboost
from catboost.datasets import adult
from catboost.utils import eval_metric

adult_train, adult_test = adult()
adult_train.head(5)

| |age|workclass|fnlwgt|education|education-num|marital-status|occupation|relationship|race|sex|capital-gain|capital-loss|hours-per-week|native-country|income|
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|0|39.0|State-gov|77516.0|Bachelors|13.0|Never-married|Adm-clerical|Not-in-family|White|Male|2174.0|0.0|40.0|United-States|<=50K|
|1|50.0|Self-emp-not-inc|83311.0|Bachelors|13.0|Married-civ-spouse|Exec-managerial|Husband|White|Male|0.0|0.0|13.0|United-States|<=50K|
|2|38.0|Private|215646.0|HS-grad|9.0|Divorced|Handlers-cleaners|Not-in-family|White|Male|0.0|0.0|40.0|United-States|<=50K|
|3|53.0|Private|234721.0|11th|7.0|Married-civ-spouse|Handlers-cleaners|Husband|Black|Male|0.0|0.0|40.0|United-States|<=50K|
|4|28.0|Private|338409.0|Bachelors|13.0|Married-civ-spouse|Prof-specialty|Wife|Black|Female|0.0|0.0|40.0|Cuba|<=50K|


In [3]:
from sklearn.model_selection import train_test_split

numeric_features = ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', 'hours-per-week']
categorical_features = ['workclass', 'education', 'marital-status', 'occupation', 'relationship', 'race', 'sex', 'native-country']

adult_train['income'] = adult_train['income'].map({'<=50K': 0., '>50K': 1.})
adult_test['income'] = adult_test['income'].map({'<=50K': 0., '>50K': 1.})
for c in categorical_features:
    # Assign the result back instead of calling fillna(inplace=True) on a column selection
    adult_train[c] = adult_train[c].fillna('nan')
    adult_test[c] = adult_test[c].fillna('nan')

X_train, X_test, y_train, y_test = train_test_split(adult_train[numeric_features + categorical_features], adult_train['income'], test_size=0.2, random_state=42)


numeric_train_pool = catboost.Pool(X_train[numeric_features], y_train)
numeric_val_pool = catboost.Pool(X_test[numeric_features], y_test)
numeric_test_pool = catboost.Pool(adult_test[numeric_features], adult_test['income'])


cat_train_pool = catboost.Pool(X_train, y_train, cat_features=categorical_features)
cat_val_pool = catboost.Pool(X_test, y_test, cat_features=categorical_features)
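# Select the same feature columns as in training so the 'income' label is not passed as a feature
cat_test_pool = catboost.Pool(adult_test[numeric_features + categorical_features], adult_test['income'], cat_features=categorical_features)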

The following function trains a model with the given hyperparameters and reports its AUC on the test set:

In [4]:
def calc_test_quality(train_pool=numeric_train_pool, val_pool=numeric_val_pool, test_pool=numeric_test_pool, **kwargs):
    model = catboost.CatBoostClassifier(**kwargs, random_seed=42)
    model.fit(train_pool, verbose=0, eval_set=val_pool)
    y_pred = model.predict_proba(test_pool)
    return eval_metric(test_pool.get_label(), y_pred[:, 1], 'AUC')

Let's train a model with default parameters:

In [5]:
calc_test_quality()

[0.8744062991309358]
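To make the metric concrete, here is a minimal sketch of what AUC measures: the fraction of (positive, negative) pairs in which the positive example gets the higher score. The function name `pairwise_auc` and the toy data are ours and purely illustrative; `eval_metric` does not work this way internally.

```python
def pairwise_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. This O(n^2) version is for
    illustration only; library implementations rely on sorting instead.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data, not taken from the Adult dataset.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(pairwise_auc(toy_labels, toy_scores))  # 0.75 (3 of 4 pairs ranked correctly)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect one, which is why the differences between tuning methods in this tutorial show up only in the third decimal place.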

First, we will optimize the following hyperparameters using numeric features only:
- **learning_rate** - used for reducing the gradient step
- **depth** - depth of the tree
- **l2_leaf_reg** - coefficient at the L2 regularization term of the cost function
- **boosting_type** - boosting scheme. Possible values: Ordered, Plain

CatBoost has two built-in methods for hyperparameter tuning: grid search and random search. Grid search is an exhaustive search over a manually specified subset of the hyperparameter space.
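To see what "exhaustive" means here, the sketch below expands the grid used in the next cell into every combination with plain Python. The helper `expand_grid` is our own illustrative name and is not part of CatBoost; `grid_search` handles the enumeration itself.

```python
from itertools import product

# Same grid as in the grid_search cell of this tutorial.
grid = {
    'learning_rate': [0.03, 0.06],
    'depth': [3, 6, 9],
    'l2_leaf_reg': [2, 3, 4],
    'boosting_type': ['Ordered', 'Plain'],
}


def expand_grid(grid):
    """Return one dict per combination of the listed values."""
    names = list(grid)
    return [dict(zip(names, values))
            for values in product(*(grid[name] for name in names))]


candidates = expand_grid(grid)
print(len(candidates))  # 2 * 3 * 3 * 2 = 36 models to train
print(candidates[0])
```

Each extra value in any list multiplies the number of models to train, so the cost of grid search grows quickly.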

In [6]:
grid = {
    'learning_rate': [0.03, 0.06],
    'depth':[3, 6, 9],
    'l2_leaf_reg': [2, 3, 4],
    'boosting_type': ['Ordered', 'Plain']
}

grid_search_model = catboost.CatBoostClassifier(iterations=1000, random_seed=42)
grid_search_result = grid_search_model.grid_search(grid, numeric_train_pool, verbose=20)


bestTest = 0.3475488771
bestIteration = 999

0:	loss: 0.3475489	best: 0.3475489 (0)	total: 9.18s	remaining: 5m 21s

bestTest = 0.3468175664
bestIteration = 594


bestTest = 0.3481154366
bestIteration = 999


bestTest = 0.3454653087
bestIteration = 975


bestTest = 0.3488477618
bestIteration = 991


bestTest = 0.34700929
bestIteration = 954


bestTest = 0.3469102947
bestIteration = 575


bestTest = 0.3462598759
bestIteration = 660


bestTest = 0.3475839061
bestIteration = 774


bestTest = 0.3469096484
bestIteration = 456


bestTest = 0.3469822493
bestIteration = 989


bestTest = 0.3469540474
bestIteration = 769


bestTest = 0.3459891057
bestIteration = 863


bestTest = 0.3465270739
bestIteration = 453


bestTest = 0.3468671756
bestIteration = 962


bestTest = 0.347612502
bestIteration = 323


bestTest = 0.347555923
bestIteration = 900


bestTest = 0.3476361304
bestIteration = 410


bestTest = 0.3498501695
bestIteration = 994


bestTest = 0.3486111677
bestIteration = 749


bestTest = 0.

In [7]:
calc_test_quality(**grid_search_result['params']), grid_search_result['params'] 

([0.8751648039383577],
 {'depth': 3,
  'l2_leaf_reg': 3,
  'learning_rate': 0.06,
  'boosting_type': 'Ordered'})

Random search replaces the exhaustive enumeration of all combinations with random sampling. Instead of specific values, we define a distribution for each hyperparameter. You can also pass a list of values, which will be sampled uniformly.
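As a rough illustration of the sampling step only, the sketch below draws 20 configurations with Python's standard `random` module. The names `sample_params` and `space` are ours. The ranges mirror the next cell, where `stats.uniform(loc, scale)` covers the interval from `loc` to `loc + scale`.

```python
import random


def sample_params(space, rng):
    """Draw one configuration: call samplers, pick uniformly from lists."""
    params = {}
    for name, spec in space.items():
        if callable(spec):
            params[name] = spec(rng)       # continuous distribution
        else:
            params[name] = rng.choice(spec)  # list of values, sampled uniformly
    return params


# Illustrative stand-in for the scipy distributions used in this tutorial.
space = {
    'learning_rate': lambda rng: rng.uniform(0.01, 0.11),
    'depth': list(range(3, 10)),
    'l2_leaf_reg': lambda rng: rng.uniform(1.0, 11.0),
    'boosting_type': ['Ordered', 'Plain'],
}

rng = random.Random(123)
trials = [sample_params(space, rng) for _ in range(20)]
print(trials[0])
```

Unlike grid search, the number of trials here is fixed in advance (`n_iter=20` in the cell below) regardless of how many hyperparameters are tuned.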

In [8]:
from scipy import stats

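params_distribution = {
    # stats.uniform(loc, scale) samples from [loc, loc + scale], here [0.01, 0.11]
    'learning_rate': stats.uniform(0.01, 0.1),
    'depth': list(range(3, 10)),
    # likewise, l2_leaf_reg is sampled from [1, 11]
    'l2_leaf_reg': stats.uniform(1, 10),
    'boosting_type': ['Ordered', 'Plain'],
}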

random_search_model = catboost.CatBoostClassifier(random_seed=42)
random_search_result = random_search_model.randomized_search(
    params_distribution, 
    numeric_train_pool, 
    n_iter=20, 
    verbose=5, 
    partition_random_seed=123)


bestTest = 0.3570621798
bestIteration = 994

0:	loss: 0.3570622	best: 0.3570622 (0)	total: 9.38s	remaining: 2m 58s

bestTest = 0.3585051894
bestIteration = 576


bestTest = 0.354942476
bestIteration = 398


bestTest = 0.3536843023
bestIteration = 995


bestTest = 0.3549438598
bestIteration = 445


bestTest = 0.3548070331
bestIteration = 998

5:	loss: 0.3548070	best: 0.3536843 (3)	total: 1m 1s	remaining: 2m 22s

bestTest = 0.3541591107
bestIteration = 986


bestTest = 0.3531888405
bestIteration = 881


bestTest = 0.3538994015
bestIteration = 980


bestTest = 0.3525667359
bestIteration = 999


bestTest = 0.3526242018
bestIteration = 377

10:	loss: 0.3526242	best: 0.3525667 (9)	total: 2m 30s	remaining: 2m 3s

bestTest = 0.354935685
bestIteration = 835


bestTest = 0.3561806368
bestIteration = 878


bestTest = 0.3616929754
bestIteration = 997


bestTest = 0.356413664
bestIteration = 321


bestTest = 0.3570983676
bestIteration = 369

15:	loss: 0.3570984	best: 0.3525667 (9)	total: 3m 3s	rem

In [9]:
calc_test_quality(**random_search_result['params']), random_search_result['params']

([0.8767458281765127],
 {'depth': 8,
  'learning_rate': 0.026233677654785993,
  'l2_leaf_reg': 6.629482011941916,
  'boosting_type': 'Ordered'})

## Bayesian optimization

Random and grid search pay no attention to past results when searching for the best hyperparameters. Bayesian optimization, in contrast, keeps track of past evaluation results and uses them to build a probabilistic model that maps hyperparameters to the probability of a score on the objective function. There are a number of libraries that implement it. In this tutorial we will try two of them: [optuna](https://github.com/optuna/optuna) and [hyperopt](https://github.com/hyperopt/hyperopt).
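Both libraries are used below with a TPE (tree-structured Parzen estimator) sampler. The following is a heavily simplified, one-dimensional sketch of that idea: split past trials into "good" and "bad" by score, estimate a density for each, and pick the candidate that looks most like the good ones. Every name here (`toy_objective`, `tpe_like_search`, the bandwidth, the `gamma` quantile) is an assumption made for illustration. The real samplers are considerably more elaborate.

```python
import math
import random


def kde(points, x, bandwidth=0.1):
    """Gaussian kernel density estimate at x built from the given points."""
    norm = len(points) * bandwidth * math.sqrt(2 * math.pi)
    return sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points) / norm


def toy_objective(x):
    # Hypothetical score to maximize; its peak is at x = 0.3.
    return -(x - 0.3) ** 2


def tpe_like_search(n_trials=30, n_startup=5, n_candidates=20, gamma=0.25, seed=0):
    rng = random.Random(seed)
    history = []  # (x, score) pairs from all past trials
    for i in range(n_trials):
        if i < n_startup:
            x = rng.uniform(0.0, 1.0)  # random warm-up, no model yet
        else:
            ranked = sorted(history, key=lambda t: t[1], reverse=True)
            n_good = max(1, int(gamma * len(ranked)))
            good = [p for p, _ in ranked[:n_good]]
            bad = [p for p, _ in ranked[n_good:]]
            # Simplification: candidates are drawn uniformly, then ranked
            # by the density ratio good(x) / bad(x).
            candidates = [rng.uniform(0.0, 1.0) for _ in range(n_candidates)]
            x = max(candidates, key=lambda c: kde(good, c) / (kde(bad, c) + 1e-12))
        history.append((x, toy_objective(x)))
    return max(history, key=lambda t: t[1])


best_x, best_score = tpe_like_search()
print(best_x, best_score)
```

The key difference from random search is the `history` list: every new suggestion depends on the scores observed so far.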

Let's start with optuna: 

In [10]:
import optuna
from optuna.samplers import TPESampler
from catboost.utils import eval_metric


def objective(trial):
    params = {
        'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.1),
        'depth': trial.suggest_int('depth', 3, 10),
        'l2_leaf_reg': trial.suggest_float('l2_leaf_reg', 1, 10),
        'boosting_type': trial.suggest_categorical('boosting_type', ['Ordered', 'Plain']),
    }

    model = catboost.CatBoostClassifier(**params, random_seed=42)
    model.fit(numeric_train_pool, verbose=0, eval_set=numeric_val_pool)
    y_pred = model.predict_proba(numeric_val_pool)
    # eval_metric returns a list with a single value; hand optuna the scalar AUC
    return eval_metric(numeric_val_pool.get_label(), y_pred[:, 1], 'AUC')[0]

sampler = TPESampler(seed=123)
study = optuna.create_study(direction='maximize', sampler=sampler)
study.optimize(objective, n_trials=20)

[I 2021-05-02 16:17:55,634] A new study created in memory with name: no-name-f2f48a39-17c5-4111-8df5-b5d050c48f1f
[I 2021-05-02 16:18:48,836] Trial 0 finished with value: 0.8836909164770923 and parameters: {'learning_rate': 0.07268222670380756, 'depth': 9, 'l2_leaf_reg': 4.856238335681431, 'boosting_type': 'Ordered'}. Best is trial 0 with value: 0.8836909164770923.
[I 2021-05-02 16:18:56,100] Trial 1 finished with value: 0.8808166455904405 and parameters: {'learning_rate': 0.05961832921746022, 'depth': 6, 'l2_leaf_reg': 5.420070400893375, 'boosting_type': 'Plain'}. Best is trial 0 with value: 0.8836909164770923.
[I 2021-05-02 16:19:02,431] Trial 2 finished with value: 0.8817266285087795 and parameters: {'learning_rate': 0.0982687778546154, 'depth': 3, 'l2_leaf_reg': 6.217248673203491, 'boosting_type': 'Plain'}. Best is trial 0 with value: 0.8836909164770923.
[I 2021-05-02 16:19:15,288] Trial 3 finished with value: 0.880400230

In the logs above, the Ordered boosting type is sampled more often than Plain. This suggests that optuna's sampler has learned that Ordered tends to score better than Plain on this task.
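Rather than reading the log by eye, we can count how often each value was tried. The helper below is a small sketch with a name of our own choosing (`count_param_values`). It is demonstrated on the three complete trials visible in the log above, with values rounded.

```python
from collections import Counter


def count_param_values(param_dicts, name):
    """Count how often each value of one hyperparameter appears across trials."""
    return Counter(params[name] for params in param_dicts)


# Parameters of trials 0-2, copied (rounded) from the log output above.
logged_params = [
    {'learning_rate': 0.0727, 'depth': 9, 'l2_leaf_reg': 4.86, 'boosting_type': 'Ordered'},
    {'learning_rate': 0.0596, 'depth': 6, 'l2_leaf_reg': 5.42, 'boosting_type': 'Plain'},
    {'learning_rate': 0.0983, 'depth': 3, 'l2_leaf_reg': 6.22, 'boosting_type': 'Plain'},
]

counts = count_param_values(logged_params, 'boosting_type')
print(counts)

# With a finished study, the same helper can be fed all trials:
# count_param_values([t.params for t in study.trials], 'boosting_type')
```

Three trials are of course too few to show the trend; the count becomes meaningful only over the full set of 20 trials.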

In [11]:
calc_test_quality(**study.best_params), study.best_params

([0.8768748819916609],
 {'learning_rate': 0.018811391183601193,
  'depth': 10,
  'l2_leaf_reg': 1.7615014321430658,
  'boosting_type': 'Ordered'})

In [12]:
from hyperopt import hp, fmin, tpe
import numpy as np

def hyperopt_objective(params):
    print(params)
    model = catboost.CatBoostClassifier(**params, random_seed=42)
    model.fit(numeric_train_pool, verbose=0, eval_set=numeric_val_pool)
    y_pred = model.predict_proba(numeric_val_pool)
    # fmin minimizes the objective, so return the negated AUC
    return -eval_metric(numeric_val_pool.get_label(), y_pred[:, 1], 'AUC')[0]

space = {
    'learning_rate': hp.uniform('learning_rate', 0.01, 0.1),
    # hp.randint excludes the upper bound, so depth is drawn from 3..9
    'depth': hp.randint('depth', 3, 10),
    'l2_leaf_reg': hp.uniform('l2_leaf_reg', 1, 10),
    'boosting_type': hp.choice('boosting_type', ['Ordered', 'Plain']),
}

# Note: recent hyperopt releases expect a NumPy Generator here,
# i.e. rstate=np.random.default_rng(123), instead of RandomState.
best = fmin(hyperopt_objective,
    space=space,
    algo=tpe.suggest,
    max_evals=20,
    rstate=np.random.RandomState(seed=123))

{'boosting_type': 'Plain', 'depth': 8, 'l2_leaf_reg': 6.3139710353627025, 'learning_rate': 0.07552270821291171}
{'boosting_type': 'Ordered', 'depth': 6, 'l2_leaf_reg': 4.308614773826791, 'learning_rate': 0.08728243927833365}
{'boosting_type': 'Ordered', 'depth': 5, 'l2_leaf_reg': 7.867792270156927, 'learning_rate': 0.09932662966595064}
{'boosting_type': 'Ordered', 'depth': 3, 'l2_leaf_reg': 7.303807275407774, 'learning_rate': 0.0706129528849592}
{'boosting_type': 'Plain', 'depth': 6, 'l2_leaf_reg': 4.1869403239777645, 'learning_rate': 0.0606353426143548}
{'boosting_type': 'Ordered', 'depth': 7, 'l2_leaf_reg': 5.079708066414372, 'learning_rate': 0.039778097292241806}
{'boosting_type': 'Ordered', 'depth': 8, 'l2_leaf_reg': 6.2552942709930175, 'learning_rate': 0.047109386241537723}
{'boosting_type': 'Ordered', 'depth': 6, 'l2_leaf_reg': 6.468768565626519, 'learning_rate': 0.07822320859088815}
{'boosting_type': 'Ordered', 'depth': 3, 'l2_leaf_reg': 2.6549804340741274, 'learning_rate': 0.07

In [13]:
best_params = best.copy()
best_params['boosting_type'] = 'Plain' if best['boosting_type'] == 1 else 'Ordered'
calc_test_quality(**best_params), best_params

([0.8758792836635058],
 {'boosting_type': 'Ordered',
  'depth': 6,
  'l2_leaf_reg': 4.308614773826791,
  'learning_rate': 0.08728243927833365})
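The manual mapping of `boosting_type` in the cell above is needed because `fmin` returns the index of the selected option for `hp.choice` parameters, not the option itself. The sketch below generalizes that decoding step in plain Python. `decode_choices` and `CHOICES` are illustrative names of ours, and `raw_best` is reconstructed from the output above, assuming index 0 stood for 'Ordered'. hyperopt also ships `space_eval(space, best)` for the same purpose.

```python
# Options in the same order as they were passed to hp.choice.
CHOICES = {'boosting_type': ['Ordered', 'Plain']}


def decode_choices(best, choices):
    """Replace hp.choice indices with the options they refer to."""
    decoded = dict(best)
    for name, options in choices.items():
        decoded[name] = options[int(decoded[name])]
    return decoded


# Reconstructed raw result (assumption: index 0 was returned for boosting_type).
raw_best = {
    'boosting_type': 0,
    'depth': 6,
    'l2_leaf_reg': 4.308614773826791,
    'learning_rate': 0.08728243927833365,
}

print(decode_choices(raw_best, CHOICES))
```

Decoding by list position avoids hard-coding `== 1` checks, which become error-prone once a choice has more than two options.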

### Categorical features

Let's apply the Bayesian optimization approaches to the dataset with categorical features. We will also optimize the **max_ctr_complexity** parameter, the maximum number of features that can be combined. Each resulting combination consists of one or more categorical features and can optionally contain binary features of the form “numeric feature > value”. But first, let's train a model with default parameters:

In [14]:
calc_test_quality(train_pool=cat_train_pool,val_pool=cat_val_pool,test_pool=cat_test_pool)

[0.9273316513681858]
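To get a feel for why **max_ctr_complexity** matters, the sketch below counts how many combinations of our 8 categorical features exist up to a given size. This is only an upper bound on the categorical-only combinations. As we understand it, CatBoost builds combinations greedily during training rather than enumerating all of them, and the helper name `count_combinations` is ours.

```python
from math import comb

categorical_features = ['workclass', 'education', 'marital-status', 'occupation',
                        'relationship', 'race', 'sex', 'native-country']


def count_combinations(n_features, max_size):
    """Number of distinct feature subsets with 1..max_size members."""
    return sum(comb(n_features, k) for k in range(1, max_size + 1))


n = len(categorical_features)
for max_size in (1, 2, 4, 8):
    print(max_size, count_combinations(n, max_size))
# 1 -> 8, 2 -> 36, 4 -> 162, 8 -> 255
```

The candidate space grows quickly with the allowed combination size, which helps explain why larger values tend to make training slower.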

In [15]:

def objective(trial):
    params = {
        'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.1),
        'depth': trial.suggest_int('depth', 3, 10),
        'l2_leaf_reg': trial.suggest_float('l2_leaf_reg', 1, 10),
        'boosting_type': trial.suggest_categorical('boosting_type', ['Ordered', 'Plain']),
        'max_ctr_complexity': trial.suggest_int('max_ctr_complexity', 0, 8)
    }

    model = catboost.CatBoostClassifier(**params, random_seed=42)
    model.fit(cat_train_pool, verbose=0, eval_set=cat_val_pool)
    y_pred = model.predict_proba(cat_val_pool)
    # eval_metric returns a list with a single value; hand optuna the scalar AUC
    return eval_metric(cat_val_pool.get_label(), y_pred[:, 1], 'AUC')[0]

sampler = TPESampler(seed=123)
study = optuna.create_study(direction='maximize', sampler=sampler)
study.optimize(objective, n_trials=20)

[I 2021-05-02 16:39:39,439] A new study created in memory with name: no-name-26b54ca2-7882-4537-8ea3-bc0198fd2dbf
[I 2021-05-02 16:41:03,976] Trial 0 finished with value: 0.9337410717988759 and parameters: {'learning_rate': 0.07268222670380756, 'depth': 9, 'l2_leaf_reg': 4.856238335681431, 'boosting_type': 'Ordered', 'max_ctr_complexity': 6}. Best is trial 0 with value: 0.9337410717988759.
[I 2021-05-02 16:41:24,110] Trial 1 finished with value: 0.9328366273469896 and parameters: {'learning_rate': 0.07472352791392958, 'depth': 5, 'l2_leaf_reg': 4.807958141120149, 'boosting_type': 'Ordered', 'max_ctr_complexity': 1}. Best is trial 0 with value: 0.9337410717988759.
[I 2021-05-02 16:42:19,225] Trial 2 finished with value: 0.9331029245421298 and parameters: {'learning_rate': 0.0716346764726377, 'depth': 9, 'l2_leaf_reg': 2.2595568636734606, 'boosting_type': 'Ordered', 'max_ctr_complexity': 0}. Best is trial 0 with value: 0.9337410717988759.

In [16]:
calc_test_quality(train_pool=cat_train_pool,
                  val_pool=cat_val_pool,
                  test_pool=cat_test_pool,
                  **study.best_params), study.best_params

([0.9291097900449995],
 {'learning_rate': 0.07268222670380756,
  'depth': 9,
  'l2_leaf_reg': 4.856238335681431,
  'boosting_type': 'Ordered',
  'max_ctr_complexity': 6})

In [17]:
def hyperopt_objective(params):
    print(params)
    model = catboost.CatBoostClassifier(**params, random_seed=42)
    model.fit(cat_train_pool, verbose=0, eval_set=cat_val_pool)
    y_pred = model.predict_proba(cat_val_pool)
    # fmin minimizes the objective, so return the negated AUC
    return -eval_metric(cat_val_pool.get_label(), y_pred[:, 1], 'AUC')[0]

space = {
    'learning_rate': hp.uniform('learning_rate', 0.01, 0.1),
    # hp.randint excludes the upper bound: depth is 3..9, max_ctr_complexity is 0..7
    'depth': hp.randint('depth', 3, 10),
    'l2_leaf_reg': hp.uniform('l2_leaf_reg', 1, 10),
    'boosting_type': hp.choice('boosting_type', ['Ordered', 'Plain']),
    'max_ctr_complexity': hp.randint('max_ctr_complexity', 0, 8)
}

# Note: recent hyperopt releases expect a NumPy Generator here,
# i.e. rstate=np.random.default_rng(123), instead of RandomState.
best = fmin(hyperopt_objective,
    space=space,
    algo=tpe.suggest,
    max_evals=20,
    rstate=np.random.RandomState(seed=123))

{'boosting_type': 'Ordered', 'depth': 8, 'l2_leaf_reg': 9.02418957750765, 'learning_rate': 0.07874394992843053, 'max_ctr_complexity': 4}
{'boosting_type': 'Ordered', 'depth': 3, 'l2_leaf_reg': 7.77026577135344, 'learning_rate': 0.08423590349470189, 'max_ctr_complexity': 0}
{'boosting_type': 'Plain', 'depth': 9, 'l2_leaf_reg': 6.876425031781056, 'learning_rate': 0.043661293824597616, 'max_ctr_complexity': 7}
{'boosting_type': 'Plain', 'depth': 7, 'l2_leaf_reg': 8.255101676728357, 'learning_rate': 0.09359026791928762, 'max_ctr_complexity': 6}
{'boosting_type': 'Plain', 'depth': 6, 'l2_leaf_reg': 8.555879516541207, 'learning_rate': 0.012200281335930548, 'max_ctr_complexity': 7}
{'boosting_type': 'Ordered', 'depth': 7, 'l2_leaf_reg': 4.577410159173021, 'learning_rate': 0.04272764683363647, 'max_ctr_complexity': 1}
{'boosting_type': 'Ordered', 'depth': 5, 'l2_leaf_reg': 6.024099192696876, 'learning_rate': 0.08881365189368236, 'max_ctr_complexity': 7}
{'boosting_type': 'Plain', 'depth': 7, '

In [18]:
best_params = best.copy()
best_params['boosting_type'] = 'Plain' if best['boosting_type'] == 1 else 'Ordered'
calc_test_quality(train_pool=cat_train_pool,
                  val_pool=cat_val_pool,
                  test_pool=cat_test_pool,
                  **best_params), best_params

([0.9289912641941946],
 {'boosting_type': 'Ordered',
  'depth': 7,
  'l2_leaf_reg': 4.577410159173021,
  'learning_rate': 0.04272764683363647,
  'max_ctr_complexity': 1})

### Results

|Features|Optimization method|Test AUC|
|---|---|---|
|Numeric|None (defaults)|0.87441|
|Numeric|Grid search|0.87516|
|Numeric|Randomized search|0.87675|
|Numeric|Optuna|**0.87687**|
|Numeric|Hyperopt|0.87588|
|Numeric & categorical|None (defaults)|0.92733|
|Numeric & categorical|Optuna|**0.92911**|
|Numeric & categorical|Hyperopt|0.92899|