In [1]:
import sys
import os

sys.path.append(os.path.join(os.getcwd(), '..'))

In [2]:
from catboost import CatBoostRegressor, CatBoostClassifier, Pool

In [3]:
pool_dir = "amazon/"
path_to_dataset = pool_dir + 'train_all3'  # Path to the dataset.
description_file = pool_dir + 'train_full3.cd' # Path to the description file of the dataset.

In [4]:
import catboost

In [41]:
# Import plotly to show plots in the IPython notebook

In [42]:
from plotly.offline import iplot, init_notebook_mode
init_notebook_mode(connected=True)

In [5]:
from eval.catboost_evaluation import *

The pipeline for feature evaluation is simple:
1. Choose the fold size. This step is task-dependent: the fold should be a representative sample of the general population. We suggest using at most 50% of the dataset as the fold size, because larger folds make it easier to overfit to the whole dataset.
2. Estimate the learning parameters for that fold size. If the sample is representative, these parameters should be consistent across folds. For feature evaluation there is no need to tune every parameter, only the main ones: learning rate and ensemble size.
3. Train models on each fold and compute test metrics on the rest of the dataset. For a big dataset the test fold can be much bigger than the learn fold; such datasets should give more stable results.
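
To make steps 1 and 3 concrete, here is a minimal sketch of how fixed-size learn folds with "everything else" as the test part could be drawn. It is an illustration only, not the library's splitting code: the helper name `make_folds` and the choice to draw an independent random sample for every fold are assumptions.

```python
import random


def make_folds(n_rows, fold_size, folds_count, seed=0):
    """Return a list of (learn_rows, test_rows) index lists.

    Hypothetical helper: each fold draws its own random learn sample of
    `fold_size` rows; all remaining rows form the test part of that fold.
    """
    if fold_size > n_rows:
        raise ValueError("fold_size must not exceed the number of rows")
    rng = random.Random(seed)
    all_rows = range(n_rows)
    folds = []
    for _ in range(folds_count):
        learn = set(rng.sample(all_rows, fold_size))
        test = [row for row in all_rows if row not in learn]
        folds.append((sorted(learn), test))
    return folds


folds = make_folds(n_rows=100, fold_size=20, folds_count=3)
for learn_rows, test_rows in folds:
    print(len(learn_rows), len(test_rows))
```

With a small learn fold the test part is several times larger, which is the situation step 3 describes for big datasets.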

Here we assume that the optimal learning rate and ensemble size are already known. Later we show how to estimate these parameters and how to use the library for other tasks, such as model comparison.

So we will use a fold size of 16000 and a learning rate of 0.03.

In [71]:
fold_size = 16000
fold_offset = 0
folds_count = 20
random_seed = 0 
remove_models = True


The library supports a custom group_column, which can be used to split instances between learn and test folds.

CatboostEvaluation(group_column=id)

This column can be of any type in the cd file (for example, Auxiliary).
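
The idea behind a group column can be sketched as follows: rows that share a group id must never be divided between learn and test. The code below is a toy illustration under that assumption; `group_split` is a hypothetical helper and does not reproduce how the library splits groups internally.

```python
import random
from collections import defaultdict


def group_split(group_ids, learn_size, seed=0):
    """Split row indices so that every group lands entirely in learn or in test."""
    rows_by_group = defaultdict(list)
    for row, group in enumerate(group_ids):
        rows_by_group[group].append(row)

    groups = sorted(rows_by_group)  # fixed order before shuffling, for reproducibility
    random.Random(seed).shuffle(groups)

    learn, test = [], []
    for group in groups:
        # Whole groups go to learn until it reaches the target size, then to test.
        target = learn if len(learn) < learn_size else test
        target.extend(rows_by_group[group])
    return learn, test


# Toy group ids, one per row.
group_ids = ["a", "a", "b", "c", "c", "c", "d", "e", "e", "f"]
learn_rows, test_rows = group_split(group_ids, learn_size=4)
print(sorted(learn_rows), sorted(test_rows))
```

Because whole groups are moved at once, the learn part can end up slightly larger than `learn_size`.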

In [72]:
learn_params = {'iterations': 2000, 
                'random_seed': 0, 
                'thread_count': 16, 
                'logging_level': 'Silent',
                #You could set GPU learning if you like
                #"devices":"1",  
                #"task_type":"GPU",
                'loss_function' : 'Logloss',
                'boosting_type': 'Plain',  # For feature evaluation, training time matters and only relative quality is needed
                'max_ctr_complexity' : 4}

We will check quality of first 3 features

In [73]:
features_to_evaluate = [0, 1, 2]

In [74]:
# First we need to create the evaluator object

In [75]:
evaluator = CatboostEvaluation(path_to_dataset,
                               fold_size, 
                               folds_count,  
                               column_description=description_file,
                               seed=random_seed,
                               # working_dir=...: working directory. Temporary files are created during evaluation,
                               # so ensure you have enough free space.
                               # By default a unique temp dir is created in the system temp directory.
                               # group_column=...: set it if you have a column that should be used to split instances between learn and test folds
)

In [76]:
result = evaluator.eval_features(learn_config=learn_params,
                                 objective_function="Logloss",  # you can skip it here and set it only in learn_params
                                 eval_metrics=["Logloss", "Accuracy"],
                                 set_features_to_eval=features_to_evaluate,
                                 eval_type=EvalType.SeqAdd,
                                 thread_count=16,  # if None, the value is taken from learn_params
                                )

[INFO]: Model Ignore: 0-2 on fold #0 was fitted
[INFO]: Model Ignore: 1-2 on fold #0 was fitted
[INFO]: Model Ignore: 0:2 on fold #0 was fitted
[INFO]: Model Ignore: 0-1 on fold #0 was fitted
[INFO]: Model Ignore: 0-2 on fold #1 was fitted
[INFO]: Model Ignore: 1-2 on fold #1 was fitted
[INFO]: Model Ignore: 0:2 on fold #1 was fitted
[INFO]: Model Ignore: 0-1 on fold #1 was fitted
[INFO]: Start metric computation for folds [0, 2)
[INFO]: Computation of metrics for  folds [0, 2) is completed
[INFO]: Model Ignore: 0-2 on fold #2 was fitted
[INFO]: Model Ignore: 1-2 on fold #2 was fitted
[INFO]: Model Ignore: 0:2 on fold #2 was fitted
[INFO]: Model Ignore: 0-1 on fold #2 was fitted
[INFO]: Model Ignore: 0-2 on fold #3 was fitted
[INFO]: Model Ignore: 1-2 on fold #3 was fitted
[INFO]: Model Ignore: 0:2 on fold #3 was fitted
[INFO]: Model Ignore: 0-1 on fold #3 was fitted
[INFO]: Start metric computation for folds [2, 4)
[INFO]: Computation of metrics for  folds [2, 4) is completed
[INFO]: 

As a result we get an instance of EvaluationResults.
To get the results for one metric, call the get_metric_results method.

In [77]:
logloss_result = result.get_metric_results("Logloss")

The baseline case is the pool without any of the tested features.
Test cases depend on EvalType.
In SeqAdd mode each test pool adds one of the tested features back, while the others stay ignored (forward selection).
So in SeqAdd mode with the feature set [0, 1, 2], the test pools are:
1. Pool with all features except 0, 1
2. Pool with all features except 0, 2
3. Pool with all features except 1, 2

And the baseline pool is the pool without features 0, 1, 2.
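
The case list above can be reproduced with a few lines of plain Python. This is only a sketch of the enumeration described in the text; `seq_add_cases` is a hypothetical helper, not a function of the library.

```python
def seq_add_cases(features_to_eval):
    """Return (baseline_ignored, list_of_test_ignored) for SeqAdd-style evaluation.

    The baseline ignores every tested feature; each test case adds exactly one
    tested feature back and keeps ignoring the others.
    """
    features = sorted(features_to_eval)
    baseline_ignored = list(features)
    test_ignored = [[f for f in features if f != added] for added in features]
    return baseline_ignored, test_ignored


baseline, tests = seq_add_cases([0, 1, 2])
print(baseline)  # [0, 1, 2]
print(tests)     # [[1, 2], [0, 2], [0, 1]]
```

The three test cases correspond to the "Ignore: 1-2", "Ignore: 0:2" and "Ignore: 0-1" models in the log above, and the baseline to "Ignore: 0-2".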

logloss_result is an instance of the MetricEvaluationResult class and contains all the information about one metric.
Usually we need to call just 3 methods.
The first one is get_baseline_comparison, which creates a table with the evaluation results.

In [78]:
logloss_result.get_baseline_comparison()

| | PValue | Score | Quantile 0.005 | Quantile 0.995 | Decision | Overfit iter diff | Overfit iter pValue |
|---|---|---|---|---|---|---|---|
| Ignore: 1-2 | 0.999911 | 63.877555 | 62.301502 | 65.61121 | GOOD | 402.95 | 0.999912 |
| Ignore: 0:2 | 0.999911 | 40.459995 | 37.848642 | 43.088764 | GOOD | 28.1 | 0.79567 |
| Ignore: 0-1 | 0.940481 | 0.18567 | -1.175424 | 1.562836 | UNKNOWN | 37.2 | 0.654159 |


For each test case (a pool without some features), this table shows a human-readable score. The score can be configured with the ScoreConfig param; by default the score on one fold is (baseline_score - test_score) * magic_multiplier / baseline_score. The table score is the mean score over all folds. A bigger score means better quality of the feature.
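
As a worked example of the default score formula, the sketch below computes the per-fold score and its mean over folds. The multiplier value of 100 is an assumption made for illustration (the text only calls it magic_multiplier), the function names are hypothetical, and the metric values are toy numbers rather than results from this notebook.

```python
def fold_score(baseline_metric, test_metric, multiplier=100.0):
    """Relative improvement of the test case over the baseline on one fold."""
    return (baseline_metric - test_metric) * multiplier / baseline_metric


def table_score(baseline_metrics, test_metrics, multiplier=100.0):
    """Mean of the per-fold scores, as described for the Score column."""
    scores = [fold_score(b, t, multiplier)
              for b, t in zip(baseline_metrics, test_metrics)]
    return sum(scores) / len(scores)


# Toy per-fold Logloss values (lower is better), invented for this example.
baseline_logloss = [0.5, 0.25]
test_logloss = [0.25, 0.1875]

print(fold_score(0.5, 0.25))                        # 50.0
print(table_score(baseline_logloss, test_logloss))  # 37.5
```

With a multiplier of 100 the score reads as the percentage by which the test case lowers the baseline loss.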

The most important table entry is Decision. If it is UNKNOWN or BAD, then either the feature gives no profit or we don't have enough data to see a quality improvement from it.

If Decision is GOOD, then the feature gives some profit.

But be aware: if Overfit iter diff is big, it could be a possible explanation for all the profit you see.
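
One way to read the Quantile 0.005 / Quantile 0.995 columns together with Decision is as an interval for the mean score: if the whole interval is above zero the feature helps, if it is below zero the feature hurts, otherwise the data is inconclusive. The sketch below implements that reading with a simple bootstrap. This is an assumption for illustration, not the library's documented rule, and the helper names and toy scores are hypothetical.

```python
import random


def bootstrap_mean_interval(scores, lower=0.005, upper=0.995,
                            n_resamples=2000, seed=0):
    """Bootstrap quantiles of the mean of per-fold scores."""
    rng = random.Random(seed)
    n = len(scores)
    means = sorted(sum(rng.choices(scores, k=n)) / n for _ in range(n_resamples))
    low = means[int(lower * (n_resamples - 1))]
    high = means[int(upper * (n_resamples - 1))]
    return low, high


def decide(scores):
    """Assumed decision rule: compare the quantile interval with zero."""
    low, high = bootstrap_mean_interval(scores)
    if low > 0:
        return "GOOD"
    if high < 0:
        return "BAD"
    return "UNKNOWN"


# Toy per-fold scores, invented for this example.
print(decide([5.0, 6.0, 5.5, 6.5, 5.2]))         # GOOD
print(decide([1.0, -1.0, 0.5, -0.5, 0.2, -0.2]))  # UNKNOWN
```

Under this reading, the "Ignore: 0-1" row of the table is UNKNOWN because its interval from -1.175424 to 1.562836 contains zero.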

The second useful method, create_fold_learning_curves, creates a plotly plot for the selected fold.

In [79]:
iplot(logloss_result.create_fold_learning_curves(0))

In [80]:
baseline_case = logloss_result.get_baseline_case()

In [81]:
baseline_case

Ignore: 0-2

In [82]:
baseline_result = logloss_result.get_case_result(baseline_case)

In [83]:
iplot(baseline_result.create_learning_curves_plot())

## Other tasks

### Choose learning rate

Now let's show how to use the library for other tasks.
For example, we can search for a learning rate.
To do this we just fit models with different learning rates and a fixed ensemble size, and look at the learning curves.
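
What "looking at the learning curves" means can be sketched in plain Python: for each learning rate, find the iteration with the lowest test loss and compare the minima. The helper names and the curves below are hypothetical, and the numbers are invented for illustration; they are not output of this notebook.

```python
def summarize_curve(test_losses):
    """Return (best_iteration, best_loss) for one test-loss curve."""
    best_iter = min(range(len(test_losses)), key=test_losses.__getitem__)
    return best_iter, test_losses[best_iter]


def pick_learning_rate(curves):
    """curves maps learning rate -> test loss per iteration (same ensemble size)."""
    summary = {rate: summarize_curve(losses) for rate, losses in curves.items()}
    best_rate = min(summary, key=lambda rate: summary[rate][1])
    return best_rate, summary


# Toy curves with 5 "iterations" each.
toy_curves = {
    0.015: [0.70, 0.60, 0.52, 0.47, 0.44],  # still improving at the last iteration
    0.03: [0.70, 0.52, 0.42, 0.40, 0.41],   # reaches its minimum near the end
    0.05: [0.70, 0.45, 0.43, 0.46, 0.50],   # best point is early, then it overfits
}
best_rate, summary = pick_learning_rate(toy_curves)
print(best_rate, summary)
```

A curve whose best iteration is the last one suggests the rate is too small for the chosen ensemble size, while a very early best iteration suggests the rate is too large.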

In [10]:
learning_rate_params = dict(learn_params)  # shallow copy, so later edits don't modify learn_params

In [11]:
baseline_case = ExecutionCase(label="Step {}".format(0.03),
                              params=learning_rate_params, 
                              learning_rate=0.03)

In [22]:
other_learning_rate_cases = [ExecutionCase(label="Step {}".format(step), 
                                           params=learning_rate_params, 
                                           learning_rate=step) for step in [0.05, 0.015]]

In [23]:
evaluator = CatboostEvaluation(path_to_dataset,
                               fold_size, 
                               1,  #For learning rate estimation we need just 1 fold
                               column_description=description_file,
                               seed=random_seed)

In [24]:
evaluator.get_working_dir()

'/var/tmp/tmpYDCozi'

In [25]:
learning_rates_result = evaluator.eval_cases(baseline_case, 
                                             other_learning_rate_cases,
                                             thread_count=16,
                                             eval_metrics="Logloss")

[INFO]: Model Step 0.03, Ignore: None on fold #0 was fitted
[INFO]: Model Step 0.05, Ignore: None on fold #0 was fitted
[INFO]: Model Step 0.015, Ignore: None on fold #0 was fitted
[INFO]: Start metric computation for folds [0, 1)
[INFO]: Computation of metrics for  folds [0, 1) is completed


Let's get the results

In [26]:
logloss_learning_rate_search_results = learning_rates_result.get_metric_results("Logloss")

Let's look at the learning curves

In [43]:
iplot(logloss_learning_rate_search_results.create_fold_learning_curves(fold=0, offset=200))

So the default learning rate (0.03) seems good.