# $$User\ Defined\ Metrics\ Tutorial$$

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/catboost/tutorials/blob/master/custom_loss/custom_loss_and_metric_tutorial.ipynb)

# Contents
* [1. Introduction](#1.\-Introduction)
* [2. Classification](#2.\-Classification)
* [3. Regression](#3.\-Regression)
* [4. Multiclassification](#4.\-Multiclassification)
* [5. Multiregression](#5.\-Multiregression)
* [6. MultiLabel Classification](#6.\-MultiLabel-Classification)

# 1. Introduction

CatBoost lets you define your own loss functions and metrics and pass them to a model. To do this, implement classes with the special interfaces described below.

<div class="alert alert-block alert-info"><b>Tip:</b> Install the <code>numba</code> package to speed up computation of your custom losses/metrics. CatBoost will import it for you and perform just-in-time compilation if your code is supported by <code>numba</code>. You can check the supported Python features <a href="https://numba.pydata.org/numba-doc/dev/reference/pysupported.html">here</a>.</div>

##### Interface for user defined objectives:

In [1]:
class UserDefinedObjective(object):
    def calc_ders_range(self, approxes, targets, weights):
        """
        Computes first and second derivative of the loss function 
        with respect to the predicted value for each object.

        Parameters
        ----------
        approxes : indexed container of floats
            Current predictions for each object.

        targets : indexed container of floats
            Target values you provided with the dataset.

        weights : indexed container of floats, optional (default=None)
            Weight for each object.

        Returns
        -------
            der1 : list-like object of float
            der2 : list-like object of float

        """
        pass
    
class UserDefinedMultiClassObjective(object):
    def calc_ders_multi(self, approxes, target, weight):
        """
        Computes first derivative and Hessian matrix of the loss function 
        with respect to the predicted value for each dimension.

        Parameters
        ----------
        approxes : indexed container of floats
            Predictions for each dimension of single object.

        target : single expected value
            True label.

        weight : float, optional (default=None)
            Instance weight.

        Returns
        -------
            der1 : list-like object of float
            der2 : list of lists of float

        """
        pass

class MultiTargetCustomObjective:
    def calc_ders_multi(self, approxes, targets, weights):
        """
        Computes first derivative and Hessian matrix of the loss function 
        with respect to the predicted value for each dimension.

        Parameters
        ----------
        approxes : indexed container of floats
            Vector of approx labels.

        targets : list of float
            Vector of true labels.

        weights : float, optional (default=None)
            Weight of the object.

        Returns
        -------
            der1 : list of float
            der2 : list of lists of float

        """
        pass

##### Interface for user defined metrics:

In [2]:
class UserDefinedMetric(object):
    def is_max_optimal(self):
        """
        Returns whether greater values of the metric are better.
        """
        pass

    def evaluate(self, approxes, target, weight):
        """
        Evaluates metric value.

        Parameters
        ----------
        approxes : list of indexed containers (containers with only __len__ and __getitem__ defined) of float
            Vectors of approx labels.

        target : one dimensional indexed container of float
            Vector of true labels.

        weight : one dimensional indexed container of float, optional (default=None)
            Weight for each instance.

        Returns
        -------
            weighted error : float
            total weight : float

        """
        pass
    
    def get_final_error(self, error, weight):
        """
        Returns final value of metric based on error and weight.

        Parameters
        ----------
        error : float
            Sum of errors in all instances.

        weight : float
            Sum of weights of all instances.

        Returns
        -------
        metric value : float

        """
        pass


class MultiTargetCustomMetric:
    def evaluate(self, approxes, targets, weights):
        """
        Evaluates metric value.

        Parameters
        ----------
        approxes : list of lists of float
            Vectors of approx labels.

        targets : list of lists of float
            Vectors of true labels.

        weights : list of float, optional (default=None)
            Weight for each instance.

        Returns
        -------
            weighted error : float
            total weight : float

        """
        pass

    def is_max_optimal(self):
        """
        Returns whether greater values of the metric are better.
        """
        pass

    def get_final_error(self, error, weight):
        """
        Returns final value of metric based on error and weight.

        Parameters
        ----------
        error : float
            Sum of errors in all instances.

        weight : float
            Sum of weights of all instances.

        Returns
        -------
        metric value : float

        """
        pass
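
To make the split between `evaluate` and `get_final_error` concrete, here is a minimal sketch of a metric written against the interface above and called by hand on plain lists. The class name `MaeMetric` and the sample data are illustrative assumptions, and no CatBoost call is involved.

```python
class MaeMetric(object):
    """Hypothetical example metric: weighted mean absolute error."""

    def is_max_optimal(self):
        # Smaller error is better.
        return False

    def evaluate(self, approxes, target, weight):
        # approxes holds one container of predictions per approx dimension;
        # this sketch handles the single-dimension case only.
        assert len(approxes) == 1
        approx = approxes[0]
        assert len(approx) == len(target)

        error_sum = 0.0
        weight_sum = 0.0
        for i in range(len(approx)):
            w = 1.0 if weight is None else weight[i]
            weight_sum += w
            error_sum += w * abs(approx[i] - target[i])
        # Return the two sums; the normalization happens in get_final_error.
        return error_sum, weight_sum

    def get_final_error(self, error, weight):
        # Small constant guards against a zero total weight.
        return error / (weight + 1e-38)


metric = MaeMetric()
error_sum, weight_sum = metric.evaluate([[1.0, 2.0, 4.0]], [1.0, 3.0, 2.0], None)
mae = metric.get_final_error(error_sum, weight_sum)
print(error_sum, weight_sum, mae)
```

Keeping the sums and the final normalization in separate methods means the metric value only needs the two accumulated numbers, not the individual objects.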

Below we consider examples of user defined metrics for different types of tasks. We will use the following variables:
<center>$a$ - approx value</center>
<center>$p$ - probability</center>
<center>$t$ - target</center>
<center>$w$ - weight</center>

In [3]:
# import necessary packages
from catboost import CatBoostClassifier, CatBoostRegressor, MultiTargetCustomMetric, MultiTargetCustomObjective
import numpy as np
from sklearn.datasets import make_classification, make_regression, make_multilabel_classification
from sklearn.model_selection import train_test_split

# 2. Classification

Note: for binary classification problems, approxes are not probabilities. Probabilities are calculated from approxes using the sigmoid function:
<h4><center>$p=\frac{1}{1 + e^{-a}}=\frac{e^a}{1 + e^a}$</center></h4>
As an example, let's take the Logloss metric, which is defined by the following formulas:
<h4><center>$Logloss_i = -{w_i * (t_i * log(p_i) + (1 - t_i) * log(1 - p_i))}$</center></h4>
<h4><center>$Logloss = \frac{\sum_{i=1}^{N}{Logloss_i}}{\sum_{i=1}^{N}{w_i}}$</center></h4>
This metric is differentiable and can be used as an objective. The objective code below returns the derivatives of the quantity being maximized, which is $-Logloss_i$ for a single object:
<h4><center>$\frac{\partial(-Logloss_i)}{\partial a_i} = w_i * (t_i - p_i)$</center></h4>
<h4><center>$\frac{\partial^2(-Logloss_i)}{(\partial a_i)^2} = -w_i * p_i * (1 - p_i)$</center></h4>
The Logloss objective and metric are implemented below.

In [4]:
class LoglossObjective(object):
    def calc_ders_range(self, approxes, targets, weights):
        assert len(approxes) == len(targets)
        if weights is not None:
            assert len(weights) == len(approxes)
        
        result = []
        for index in range(len(targets)):
            e = np.exp(approxes[index])
            p = e / (1 + e)
            der1 = targets[index] - p
            der2 = -p * (1 - p)

            if weights is not None:
                der1 *= weights[index]
                der2 *= weights[index]

            result.append((der1, der2))
        return result
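
A quick way to gain confidence in hand-derived derivatives is to compare them with finite differences. The sketch below does this for the values that `LoglossObjective` returns, treating them as derivatives of the negated Logloss term of a single object. The helper names `neg_logloss` and `analytic_ders`, the test point, and the step size `h` are assumptions made for illustration.

```python
import numpy as np


def neg_logloss(a, t, w=1.0):
    """Negated weighted Logloss term for one object with raw approx a."""
    p = 1.0 / (1.0 + np.exp(-a))
    return w * (t * np.log(p) + (1 - t) * np.log(1 - p))


def analytic_ders(a, t, w=1.0):
    """First and second derivatives in the form used by LoglossObjective."""
    p = 1.0 / (1.0 + np.exp(-a))
    return w * (t - p), -w * p * (1 - p)


a, t, w, h = 0.3, 1.0, 2.0, 1e-4
der1, der2 = analytic_ders(a, t, w)

# Central finite differences of the negated loss around a.
num_der1 = (neg_logloss(a + h, t, w) - neg_logloss(a - h, t, w)) / (2 * h)
num_der2 = (neg_logloss(a + h, t, w) - 2 * neg_logloss(a, t, w)
            + neg_logloss(a - h, t, w)) / h ** 2

print(der1, num_der1)
print(der2, num_der2)
```

If the two columns disagree beyond the expected discretization error, the sign or a factor in the analytic formula is likely wrong.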

In [5]:
class LoglossMetric(object):
    def get_final_error(self, error, weight):
        return error / (weight + 1e-38)

    def is_max_optimal(self):
        return False

    def evaluate(self, approxes, target, weight):
        assert len(approxes) == 1
        assert len(target) == len(approxes[0])

        approx = approxes[0]

        error_sum = 0.0
        weight_sum = 0.0

        for i in range(len(approx)):
            e = np.exp(approx[i])
            p = e / (1 + e)
            w = 1.0 if weight is None else weight[i]
            weight_sum += w
            error_sum += -w * (target[i] * np.log(p) + (1 - target[i]) * np.log(1 - p))

        return error_sum, weight_sum
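
One caveat of the direct formula: `np.exp(approx[i])` overflows to `inf` once the approx exceeds roughly 709, and `e / (1 + e)` then evaluates to `nan`. A possible alternative, sketched here under the assumption that NumPy's `np.logaddexp` is acceptable in your setup (whether `numba` compiles it in a custom metric is not checked here), evaluates the two logarithms without forming `p` at all. The function name `stable_logloss_term` is hypothetical.

```python
import numpy as np


def stable_logloss_term(a, t, w=1.0):
    # log(p) = -log(1 + exp(-a)) and log(1 - p) = -log(1 + exp(a));
    # np.logaddexp(0, x) evaluates log(1 + exp(x)) without overflow.
    log_p = -np.logaddexp(0.0, -a)
    log_1mp = -np.logaddexp(0.0, a)
    return -w * (t * log_p + (1 - t) * log_1mp)


moderate = stable_logloss_term(0.5, 1.0)    # ordinary input
extreme = stable_logloss_term(1000.0, 0.0)  # would give nan with exp-based p
print(moderate, extreme)
```

For moderate approxes this agrees with the formula used in `LoglossMetric`; the difference only shows up for very large magnitudes.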

Below are examples of training with the built-in Logloss function and with our Logloss objective and metric. The test values are identical, and the learn values agree to at least four decimal places.

In [6]:
X, y = make_classification(n_classes=2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

In [7]:
model1 = CatBoostClassifier(iterations=10, loss_function='Logloss', eval_metric='Logloss',
                            learning_rate=0.03, bootstrap_type='Bayesian', boost_from_average=False,
                            leaf_estimation_iterations=1, leaf_estimation_method='Gradient')
model1.fit(X_train, y_train, eval_set=(X_test, y_test))

0:	learn: 0.6900369	test: 0.6907175	best: 0.6907175 (0)	total: 47.5ms	remaining: 428ms
1:	learn: 0.6866047	test: 0.6873479	best: 0.6873479 (1)	total: 48.5ms	remaining: 194ms
2:	learn: 0.6835374	test: 0.6852325	best: 0.6852325 (2)	total: 49.7ms	remaining: 116ms
3:	learn: 0.6804561	test: 0.6829075	best: 0.6829075 (3)	total: 50.5ms	remaining: 75.7ms
4:	learn: 0.6776695	test: 0.6816999	best: 0.6816999 (4)	total: 51.4ms	remaining: 51.4ms
5:	learn: 0.6749048	test: 0.6794533	best: 0.6794533 (5)	total: 52.3ms	remaining: 34.8ms
6:	learn: 0.6712608	test: 0.6772634	best: 0.6772634 (6)	total: 53ms	remaining: 22.7ms
7:	learn: 0.6681650	test: 0.6747041	best: 0.6747041 (7)	total: 53.7ms	remaining: 13.4ms
8:	learn: 0.6658758	test: 0.6732683	best: 0.6732683 (8)	total: 54.4ms	remaining: 6.04ms
9:	learn: 0.6633794	test: 0.6720979	best: 0.6720979 (9)	total: 55ms	remaining: 0us

bestTest = 0.6720978617
bestIteration = 9



<catboost.core.CatBoostClassifier at 0x7f54e5f9edd0>

In [8]:
model2 = CatBoostClassifier(iterations=10, loss_function=LoglossObjective(), eval_metric=LoglossMetric(), 
                            learning_rate=0.03, bootstrap_type='Bayesian', boost_from_average=False,
                            leaf_estimation_iterations=1, leaf_estimation_method='Gradient')
model2.fit(X_train, y_train, eval_set=(X_test, y_test))

0:	learn: 0.6900380	test: 0.6907175	best: 0.6907175 (0)	total: 447ms	remaining: 4.02s
1:	learn: 0.6866060	test: 0.6873479	best: 0.6873479 (1)	total: 448ms	remaining: 1.79s
2:	learn: 0.6835392	test: 0.6852325	best: 0.6852325 (2)	total: 449ms	remaining: 1.05s
3:	learn: 0.6804590	test: 0.6829075	best: 0.6829075 (3)	total: 449ms	remaining: 674ms
4:	learn: 0.6776740	test: 0.6816999	best: 0.6816999 (4)	total: 450ms	remaining: 450ms
5:	learn: 0.6749116	test: 0.6794533	best: 0.6794533 (5)	total: 451ms	remaining: 300ms
6:	learn: 0.6712701	test: 0.6772634	best: 0.6772634 (6)	total: 451ms	remaining: 193ms
7:	learn: 0.6681755	test: 0.6747041	best: 0.6747041 (7)	total: 452ms	remaining: 113ms
8:	learn: 0.6658881	test: 0.6732683	best: 0.6732683 (8)	total: 452ms	remaining: 50.3ms
9:	learn: 0.6633931	test: 0.6720979	best: 0.6720979 (9)	total: 453ms	remaining: 0us

bestTest = 0.6720978617
bestIteration = 9



<catboost.core.CatBoostClassifier at 0x7f54e5f7bad0>

# 3. Regression

For regression, approxes don't need any transformation. As an example of a regression loss function and metric, we take the well-known RMSE, which is defined by the following formula:
<h3><center>$RMSE = \sqrt{\frac{\sum_{i=1}^{N}{w_i * (t_i - a_i)^2}}{\sum_{i=1}^{N}{w_i}}}$</center></h3>
The MSE derivatives are simpler to compute, so we use them instead of the RMSE derivatives. This does not affect the solution because both metrics have the same optimum. The objective code below returns the derivatives of the quantity being maximized, here the negated, halved squared error $L_i = -\frac{1}{2} w_i * (t_i - a_i)^2$ of a single object:
<h4><center>$\frac{\partial L_i}{\partial a_i} = w_i * (t_i - a_i)$</center></h4>
<h4><center>$\frac{\partial^2 L_i}{(\partial a_i)^2} = -w_i$</center></h4>
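
The statement that MSE and RMSE share the same optimum can be checked numerically. The sketch below scans constant predictions on a grid and compares the minimizers of both quantities with the weighted mean of the targets; the data, weights, and grid are made-up values for illustration.

```python
import numpy as np

t = np.array([1.0, 2.0, 4.0, 7.0])   # targets
w = np.array([1.0, 2.0, 1.0, 0.5])   # object weights

candidates = np.linspace(0.0, 8.0, 801)  # constant predictions to try
mse = np.array([np.sum(w * (t - c) ** 2) / np.sum(w) for c in candidates])
rmse = np.sqrt(mse)

best_mse = candidates[np.argmin(mse)]
best_rmse = candidates[np.argmin(rmse)]
weighted_mean = np.sum(w * t) / np.sum(w)
print(best_mse, best_rmse, weighted_mean)
```

Because the square root is monotonic, both curves reach their minimum at the same grid point, which is the one closest to the weighted mean.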

In [9]:
class RmseObjective(object):
    def calc_ders_range(self, approxes, targets, weights):
        assert len(approxes) == len(targets)
        if weights is not None:
            assert len(weights) == len(approxes)
        
        result = []
        for index in range(len(targets)):
            der1 = targets[index] - approxes[index]
            der2 = -1

            if weights is not None:
                der1 *= weights[index]
                der2 *= weights[index]

            result.append((der1, der2))
        return result

In [10]:
class RmseMetric(object):
    def get_final_error(self, error, weight):
        return np.sqrt(error / (weight + 1e-38))

    def is_max_optimal(self):
        return False

    def evaluate(self, approxes, target, weight):
        assert len(approxes) == 1
        assert len(target) == len(approxes[0])

        approx = approxes[0]

        error_sum = 0.0
        weight_sum = 0.0

        for i in range(len(approx)):
            w = 1.0 if weight is None else weight[i]
            weight_sum += w
            error_sum += w * ((approx[i] - target[i])**2)

        return error_sum, weight_sum

Below there are examples of training with built-in RMSE function and our RMSE objective and metric. As we can see, the results are the same.

In [11]:
X, y = make_regression(random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

In [12]:
model1 = CatBoostRegressor(iterations=10, loss_function='RMSE', eval_metric='RMSE',
                           learning_rate=0.03, bootstrap_type='Bayesian', boost_from_average=False,
                           leaf_estimation_iterations=1, leaf_estimation_method='Gradient')
model1.fit(X_train, y_train, eval_set=(X_test, y_test))

0:	learn: 128.6631656	test: 140.6536718	best: 140.6536718 (0)	total: 3.23ms	remaining: 29ms
1:	learn: 128.0351695	test: 140.7369887	best: 140.6536718 (0)	total: 5.82ms	remaining: 23.3ms
2:	learn: 126.7781283	test: 141.0444768	best: 140.6536718 (0)	total: 11.1ms	remaining: 25.9ms
3:	learn: 125.7603646	test: 141.1458855	best: 140.6536718 (0)	total: 16.5ms	remaining: 24.7ms
4:	learn: 124.6922146	test: 141.0856002	best: 140.6536718 (0)	total: 20.9ms	remaining: 20.9ms
5:	learn: 123.6667350	test: 141.0495141	best: 140.6536718 (0)	total: 23.3ms	remaining: 15.5ms
6:	learn: 122.7210914	test: 140.8511986	best: 140.6536718 (0)	total: 30.7ms	remaining: 13.2ms
7:	learn: 121.8418528	test: 140.7646996	best: 140.6536718 (0)	total: 35.5ms	remaining: 8.88ms
8:	learn: 121.0103984	test: 140.4834561	best: 140.4834561 (8)	total: 39.4ms	remaining: 4.38ms
9:	learn: 119.9286951	test: 140.2935285	best: 140.2935285 (9)	total: 43.7ms	remaining: 0us

bestTest = 140.2935285
bestIteration = 9



<catboost.core.CatBoostRegressor at 0x7f55340dce50>

In [13]:
model2 = CatBoostRegressor(iterations=10, loss_function=RmseObjective(), eval_metric=RmseMetric(),
                           learning_rate=0.03, bootstrap_type='Bayesian', boost_from_average=False,
                           leaf_estimation_iterations=1, leaf_estimation_method='Gradient')
model2.fit(X_train, y_train, eval_set=(X_test, y_test))

0:	learn: 128.6631656	test: 140.6536718	best: 140.6536718 (0)	total: 410ms	remaining: 3.69s
1:	learn: 128.0351695	test: 140.7369887	best: 140.6536718 (0)	total: 413ms	remaining: 1.65s
2:	learn: 126.7781283	test: 141.0444768	best: 140.6536718 (0)	total: 416ms	remaining: 971ms
3:	learn: 125.7603646	test: 141.1458855	best: 140.6536718 (0)	total: 418ms	remaining: 627ms
4:	learn: 124.6922146	test: 141.0856002	best: 140.6536718 (0)	total: 420ms	remaining: 420ms
5:	learn: 123.6667350	test: 141.0495141	best: 140.6536718 (0)	total: 422ms	remaining: 282ms
6:	learn: 122.7210914	test: 140.8511986	best: 140.6536718 (0)	total: 424ms	remaining: 182ms
7:	learn: 121.8418528	test: 140.7646996	best: 140.6536718 (0)	total: 426ms	remaining: 106ms
8:	learn: 121.0103984	test: 140.4834561	best: 140.4834561 (8)	total: 427ms	remaining: 47.5ms
9:	learn: 119.9286951	test: 140.2935285	best: 140.2935285 (9)	total: 429ms	remaining: 0us

bestTest = 140.2935285
bestIteration = 9



<catboost.core.CatBoostRegressor at 0x7f54e5f9e990>

# 4. Multiclassification

Note: for multiclassification problems, approxes are not probabilities. They are usually transformed to probabilities using the softmax function:
<h3><center>$p_{i,c} = \frac{e^{a_{i,c}}}{\sum_{j=1}^k{e^{a_{i,j}}}}$</center></h3>
<center>$p_{i,c}$ - the probability that $x_i$ belongs to class $c$</center>
<center>$k$ - number of classes</center>
<center>$a_{i,j}$ - approx for object $x_i$ for class $j$</center>

Let's implement MultiClass objective that is defined as follows:
<h3><center>$MultiClass_i = w_i * \log{p_{i,t_i}}$</center></h3>
<h3><center>$MultiClass = \frac{\sum_{i=1}^{N}MultiClass_i}{\sum_{i=1}^{N}w_i}$</center></h3>

<h3><center>$\frac{\partial(MultiClass_i)}{\partial{a_{i,c}}} = \begin{cases} 
w_i-\frac{w_i*e^{a_{i,c}}}{\sum_{j=1}^{k}e^{a_{i,j}}}, & \mbox{if } c = t_i \\ 
-\frac{w_i*e^{a_{i,c}}}{\sum_{j=1}^{k}e^{a_{i,j}}}, & \mbox{if } c \neq t_i 
\end{cases}$</center></h3>

<h3><center>$\frac{\partial^2(MultiClass_i)}{\partial{a_{i,c_1}}\partial{a_{i,c_2}}} = \begin{cases} 
\frac{w_i*e^{2*a_{i,c_1}}}{(\sum_{j=1}^{k}e^{a_{i,j}})^2} - \frac{w_i*e^{a_{i, c_1}}}{\sum_{j=1}^{k}e^{a_{i,j}}}, & \mbox{if } c_1 = c_2 \\ 
\frac{w_i*e^{a_{i,c_1}+a_{i,c_2}}}{(\sum_{j=1}^{k}e^{a_{i,j}})^2}, & \mbox{if } c_1 \neq c_2 
\end{cases}$</center></h3>

In [14]:
class MultiClassObjective(object):
    def calc_ders_multi(self, approx, target, weight):
        approx = [approx[i] - max(approx) for i in range(len(approx))]
        exp_approx = [np.exp(approx[i]) for i in range(len(approx))]
        exp_sum = 0.0
        for v in exp_approx:
            exp_sum += v
        grad = []
        hess = []
        for j in range(len(approx)):
            der1 = -exp_approx[j] / exp_sum
            if j == target:
                der1 += 1
            hess_row = []
            for j2 in range(len(approx)):
                der2 = exp_approx[j] * exp_approx[j2] / (exp_sum**2)
                if j2 == j:
                    der2 -= exp_approx[j] / exp_sum
                hess_row.append(der2 * weight)
                
            grad.append(der1 * weight)
            hess.append(hess_row)
            
        return (grad, hess)
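
The same gradient and Hessian can be written compactly in matrix form: with $p$ the softmax vector and $e_t$ the one-hot vector of the target, the gradient is $w (e_t - p)$ and the Hessian is $w (p p^T - \mathrm{diag}(p))$. The sketch below computes both with NumPy and checks the gradient against finite differences. The function names and the sample values are assumptions for illustration, and this standalone code is not meant as a drop-in replacement for the class above.

```python
import numpy as np


def log_softmax_target(a, target, w=1.0):
    """w * log p_target for one object with approx vector a."""
    shifted = a - np.max(a)  # same max-shift as in MultiClassObjective
    return w * (shifted[target] - np.log(np.sum(np.exp(shifted))))


def multiclass_ders(a, target, w=1.0):
    """Gradient and Hessian of w * log p_target in matrix form."""
    shifted = a - np.max(a)
    p = np.exp(shifted) / np.sum(np.exp(shifted))
    onehot = np.zeros_like(p)
    onehot[target] = 1.0
    grad = w * (onehot - p)
    hess = w * (np.outer(p, p) - np.diag(p))
    return grad, hess


a = np.array([0.2, -1.0, 0.7])
target, w, h = 2, 1.5, 1e-6
grad, hess = multiclass_ders(a, target, w)

# Central finite differences of the objective, one coordinate at a time.
num_grad = np.zeros_like(a)
for c in range(len(a)):
    step = np.zeros_like(a)
    step[c] = h
    num_grad[c] = (log_softmax_target(a + step, target, w)
                   - log_softmax_target(a - step, target, w)) / (2 * h)

print(grad, num_grad)
```

Two cheap sanity checks follow from the formulas: the gradient components sum to zero, and every row of the Hessian sums to zero, because shifting all approxes by the same constant does not change the softmax.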

In [15]:
class AccuracyMetric(object):
    def get_final_error(self, error, weight):
        return error / (weight + 1e-38)

    def is_max_optimal(self):
        return True

    def evaluate(self, approxes, target, weight):
        best_class = np.zeros(len(approxes[0]))
        
        for i in range(len(approxes[0])):
            approx_i = [approxes[j][i] for j in range(len(approxes))]
            best_class[i] = np.argmax(np.array(approx_i))
        
        accuracy_sum = 0
        weight_sum = 0 

        for i in range(len(target)):
            w = 1.0 if weight is None else weight[i]
            weight_sum += w
            
            accuracy_sum += w * (best_class[i] == target[i])

        return accuracy_sum, weight_sum
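
The indexing in `AccuracyMetric` implies that `approxes[j][i]` is the score of class `j` for object `i`. The sketch below reproduces the same computation on a small hand-made example with NumPy, using plain lists as input; whether `np.asarray` can be applied to the containers CatBoost passes is not verified here, and the numbers are made up.

```python
import numpy as np

# Layout assumed from AccuracyMetric above: approxes[class][object].
approxes = [
    [2.0, 0.1, -1.0, 0.3],   # class 0 scores for 4 objects
    [0.5, 1.5, 0.0, 0.2],    # class 1
    [-1.0, 0.3, 2.0, 0.1],   # class 2
]
target = np.array([0, 1, 2, 1])
weight = np.array([1.0, 1.0, 2.0, 1.0])

# One predicted class index per object: argmax over the class axis.
best_class = np.argmax(np.asarray(approxes), axis=0)
correct = (best_class == target)
accuracy = np.sum(weight * correct) / np.sum(weight)
print(best_class, accuracy)
```

The last object is misclassified (class 0 wins over the true class 1), so the weighted accuracy is 4 out of a total weight of 5.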

Below there are examples of training with built-in MultiClass function and our MultiClass objective. As we can see, the results are the same.

In [16]:
X, y = make_classification(n_samples=1000, n_features=50, n_informative=40, n_classes=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

In [17]:
model1 = CatBoostClassifier(iterations=10, loss_function='MultiClass', eval_metric='Accuracy',
                           learning_rate=0.03, bootstrap_type='Bayesian', boost_from_average=False,
                           leaf_estimation_iterations=1, leaf_estimation_method='Newton', classes_count=5)
model1.fit(X_train, y_train, eval_set=(X_test, y_test))

0:	learn: 0.3893333	test: 0.2200000	best: 0.2200000 (0)	total: 25.9ms	remaining: 233ms
1:	learn: 0.4933333	test: 0.2680000	best: 0.2680000 (1)	total: 41.9ms	remaining: 167ms
2:	learn: 0.5493333	test: 0.3240000	best: 0.3240000 (2)	total: 56.8ms	remaining: 132ms
3:	learn: 0.6080000	test: 0.3200000	best: 0.3240000 (2)	total: 72.8ms	remaining: 109ms
4:	learn: 0.6640000	test: 0.3240000	best: 0.3240000 (2)	total: 86ms	remaining: 86ms
5:	learn: 0.6880000	test: 0.3720000	best: 0.3720000 (5)	total: 99.3ms	remaining: 66.2ms
6:	learn: 0.7106667	test: 0.3640000	best: 0.3720000 (5)	total: 113ms	remaining: 48.3ms
7:	learn: 0.7066667	test: 0.3440000	best: 0.3720000 (5)	total: 127ms	remaining: 31.8ms
8:	learn: 0.7320000	test: 0.3800000	best: 0.3800000 (8)	total: 140ms	remaining: 15.6ms
9:	learn: 0.7453333	test: 0.3800000	best: 0.3800000 (8)	total: 156ms	remaining: 0us

bestTest = 0.38
bestIteration = 8

Shrink model to first 9 iterations.


<catboost.core.CatBoostClassifier at 0x7f54da793dd0>

In [18]:
model2 = CatBoostClassifier(iterations=10, loss_function=MultiClassObjective(), eval_metric=AccuracyMetric(),
                           learning_rate=0.03, bootstrap_type='Bayesian', boost_from_average=False,
                           leaf_estimation_iterations=1, leaf_estimation_method='Newton', classes_count=5)
model2.fit(X_train, y_train, eval_set=(X_test, y_test))

0:	learn: 0.3893333	test: 0.2200000	best: 0.2200000 (0)	total: 949ms	remaining: 8.54s
1:	learn: 0.4933333	test: 0.2680000	best: 0.2680000 (1)	total: 969ms	remaining: 3.88s
2:	learn: 0.5493333	test: 0.3240000	best: 0.3240000 (2)	total: 988ms	remaining: 2.31s
3:	learn: 0.6080000	test: 0.3200000	best: 0.3240000 (2)	total: 1.01s	remaining: 1.51s
4:	learn: 0.6640000	test: 0.3240000	best: 0.3240000 (2)	total: 1.03s	remaining: 1.03s
5:	learn: 0.6880000	test: 0.3720000	best: 0.3720000 (5)	total: 1.04s	remaining: 697ms
6:	learn: 0.7106667	test: 0.3640000	best: 0.3720000 (5)	total: 1.06s	remaining: 456ms
7:	learn: 0.7066667	test: 0.3440000	best: 0.3720000 (5)	total: 1.08s	remaining: 271ms
8:	learn: 0.7320000	test: 0.3800000	best: 0.3800000 (8)	total: 1.1s	remaining: 123ms
9:	learn: 0.7453333	test: 0.3800000	best: 0.3800000 (8)	total: 1.13s	remaining: 0us

bestTest = 0.38
bestIteration = 8

Shrink model to first 9 iterations.


<catboost.core.CatBoostClassifier at 0x7f54da788210>

# 5. Multiregression

<div class="alert alert-block alert-warning">
Multi-target custom objectives and metrics are required to be inherited from <code>catboost.MultiTargetCustomObjective</code> and <code>catboost.MultiTargetCustomMetric</code> respectively.
</div>    

We will take MultiRMSE as an example for multiregression loss function and metric:

<h3><center>$MultiRMSE = \sqrt{\displaystyle\frac{\sum_{i=1}^{N}\sum_{d=1}^{dim}(a_{i,d} - t_{i, d})^{2} w_{i}}{\sum_{i=1}^{N} w_{i}}},$</center></h3>
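where $dim$ is the number of dimensions of the target. Here we again use the MultiMSE derivatives, as they are more convenient to calculate. The objective code below returns the derivatives of the quantity being maximized, here $L_i = -\frac{1}{2} w_i \sum_{d=1}^{dim}(t_{i,d} - a_{i,d})^2$ for a single object:
<h3><center>$\frac{\partial L_i}{\partial a_{i,d}} = w_i * (t_{i,d} - a_{i,d})$</center></h3>
<h3><center>$\frac{\partial^2 L_i}{(\partial a_{i,d})^2} = -w_i$</center></h3>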

In [19]:
class MultiRmseObjective(MultiTargetCustomObjective):
    def calc_ders_multi(self, approx, target, weight):
        assert len(target) == len(approx)
        
        w = weight if weight is not None else 1.0
        der1 = [(target[i] - approx[i]) * w for i in range(len(approx))]
        der2 = [-w for i in range(len(approx))]

        return (der1, der2)

In [20]:
class MultiRmseMetric(MultiTargetCustomMetric):
    def get_final_error(self, error, weight):
        return np.sqrt(error / (weight + 1e-38))

    def is_max_optimal(self):
        return False

    def evaluate(self, approxes, target, weight):
        assert len(target) == len(approxes)
        assert len(target[0]) == len(approxes[0])

        error_sum = 0.0
        weight_sum = 0.0
        
        for i in range(len(approxes[0])):
            w = 1.0 if weight is None else weight[i]
            weight_sum += w
            for d in range(len(approxes)):
                error_sum += w * ((approxes[d][i] - target[d][i])**2)
        return error_sum, weight_sum
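
As a cross-check of the loops in `MultiRmseMetric`, the same quantity can be computed with array operations. The sketch assumes the layout used above, `approxes[d][i]` and `target[d][i]` for dimension `d` and object `i`, and uses made-up numbers.

```python
import numpy as np

# dim = 2 target dimensions, N = 3 objects
approxes = np.array([[1.0, 2.0, 3.0],
                     [0.0, 1.0, 0.0]])
target = np.array([[1.0, 0.0, 3.0],
                   [1.0, 1.0, 2.0]])
weight = np.array([1.0, 0.5, 2.0])

squared = (approxes - target) ** 2                    # shape (dim, N)
error_sum = np.sum(weight * np.sum(squared, axis=0))  # sum over dims, then weight per object
weight_sum = np.sum(weight)                           # each object counted once
multi_rmse = np.sqrt(error_sum / weight_sum)
print(error_sum, weight_sum, multi_rmse)
```

Note that each object's weight enters the denominator once, not once per dimension, which matches the MultiRMSE formula.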

Below there are examples of training with built-in MultiRMSE function and our MultiRMSE objective. As we can see, the results are the same.

In [21]:
X, y = make_regression(random_state=0, n_targets=3)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

In [22]:
model1 = CatBoostRegressor(iterations=10, loss_function='MultiRMSE', eval_metric='MultiRMSE',
                           learning_rate=0.03, bootstrap_type='Bayesian', boost_from_average=False,
                           leaf_estimation_iterations=1, leaf_estimation_method='Gradient')
model1.fit(X_train, y_train, eval_set=(X_test, y_test))

0:	learn: 259.1763401	test: 282.0142823	best: 282.0142823 (0)	total: 9.37ms	remaining: 84.3ms
1:	learn: 257.6741584	test: 281.6794060	best: 281.6794060 (1)	total: 16.5ms	remaining: 66ms
2:	learn: 256.0533781	test: 281.6770678	best: 281.6770678 (2)	total: 21.8ms	remaining: 50.9ms
3:	learn: 254.5986664	test: 282.3533075	best: 281.6770678 (2)	total: 26.5ms	remaining: 39.8ms
4:	learn: 252.7508961	test: 281.5696382	best: 281.5696382 (4)	total: 30.9ms	remaining: 30.9ms
5:	learn: 250.7948713	test: 281.0709409	best: 281.0709409 (5)	total: 35.6ms	remaining: 23.7ms
6:	learn: 249.8911839	test: 281.4832428	best: 281.0709409 (5)	total: 43.9ms	remaining: 18.8ms
7:	learn: 248.3460853	test: 282.1097335	best: 281.0709409 (5)	total: 50.3ms	remaining: 12.6ms
8:	learn: 246.9427803	test: 281.7479127	best: 281.0709409 (5)	total: 54.8ms	remaining: 6.09ms
9:	learn: 245.7896028	test: 282.3497824	best: 281.0709409 (5)	total: 60.3ms	remaining: 0us

bestTest = 281.0709409
bestIteration = 5

Shrink model to first 

<catboost.core.CatBoostRegressor at 0x7f54e10d7890>

In [23]:
model2 = CatBoostRegressor(iterations=10, loss_function=MultiRmseObjective(), eval_metric=MultiRmseMetric(),
                           learning_rate=0.03, bootstrap_type='Bayesian', boost_from_average=False,
                           leaf_estimation_iterations=1, leaf_estimation_method='Gradient')
model2.fit(X_train, y_train, eval_set=(X_test, y_test))

0:	learn: 259.1763401	test: 282.0142823	best: 282.0142823 (0)	total: 493ms	remaining: 4.44s
1:	learn: 257.6741584	test: 281.6794060	best: 281.6794060 (1)	total: 500ms	remaining: 2s
2:	learn: 256.0533781	test: 281.6770678	best: 281.6770678 (2)	total: 505ms	remaining: 1.18s
3:	learn: 254.5986664	test: 282.3533075	best: 281.6770678 (2)	total: 510ms	remaining: 765ms
4:	learn: 252.7508961	test: 281.5696382	best: 281.5696382 (4)	total: 515ms	remaining: 515ms
5:	learn: 250.7948713	test: 281.0709409	best: 281.0709409 (5)	total: 521ms	remaining: 347ms
6:	learn: 249.8911839	test: 281.4832428	best: 281.0709409 (5)	total: 528ms	remaining: 226ms
7:	learn: 248.3460853	test: 282.1097335	best: 281.0709409 (5)	total: 535ms	remaining: 134ms
8:	learn: 246.9427803	test: 281.7479127	best: 281.0709409 (5)	total: 541ms	remaining: 60.1ms
9:	learn: 245.7896028	test: 282.3497824	best: 281.0709409 (5)	total: 545ms	remaining: 0us

bestTest = 281.0709409
bestIteration = 5

Shrink model to first 6 iterations.


<catboost.core.CatBoostRegressor at 0x7f54da788f50>

# 6. MultiLabel Classification

<div class="alert alert-block alert-warning">
Multi-target custom objectives and metrics are required to be inherited from <code>catboost.MultiTargetCustomObjective</code> and <code>catboost.MultiTargetCustomMetric</code> respectively.
</div>    

Note: for binary multilabel classification problems, approxes are not probabilities. The probability for the $i$-th object and the $j$-th class dimension is calculated from the approxes using the sigmoid function:
<h4><center>$p_{ij}=\frac{1}{1 + e^{-a_{ij}}}=\frac{e^{a_{ij}}}{1 + e^{a_{ij}}}$</center></h4>
As an example, let's take MultiLogloss metric which is defined by the following formula:
<h3><center>$MultiLogloss = \frac{-\sum_{j=0}^{M-1} \sum_{i=1}^{N} w_{i} (t_{ij} \log p_{ij} + (1-t_{ij}) \log (1 - p_{ij}) )}{M\cdot\sum_{i=1}^{N}w_{i}} { ,}$</center></h3>

  where $t_{ij} \in \{0, 1\}$ is the label of the $i$-th object in the $j$-th class dimension.  
This metric is differentiable and can be used as an objective. The objective code below returns the derivatives of the quantity being maximized, here the negated, unnormalized term $L_{ij} = w_i (t_{ij} \log p_{ij} + (1-t_{ij}) \log (1 - p_{ij}))$ for the $i$-th object and the $j$-th class dimension:
<h4><center>$\frac{\partial L_{ij}}{\partial a_{ij}} = w_i * (t_{ij} - p_{ij})$</center></h4>
<h4><center>$\frac{\partial^2 L_{ij}}{(\partial a_{ij})^2} = -w_i * p_{ij} * (1 - p_{ij})$</center></h4>
The MultiLogloss objective and metric are implemented below.

In [24]:
class MultiLoglossObjective(MultiTargetCustomObjective):
    def calc_ders_multi(self, approx, target, weight):
        assert len(target) == len(approx)
        
        e = np.exp(approx)
        p = e / (1 + e)
        
        w = weight if weight is not None else 1.0
        der1 = [(target[i] - p[i]) * w for i in range(len(approx))]
        der2 = [-p[i] * (1 - p[i]) * w for i in range(len(approx))]

        return (der1, der2)

In [25]:
class MultiLoglossMetric(MultiTargetCustomMetric):
    def get_final_error(self, error, weight):
        return error / (weight + 1e-38)

    def is_max_optimal(self):
        return False

    def evaluate(self, approxes, target, weight):
        assert len(target) == len(approxes)
        assert len(target[0]) == len(approxes[0])

        error_sum = 0.0
        weight_sum = 0.0

        for i in range(len(approxes[0])):
            w = 1.0 if weight is None else weight[i]
            weight_sum += w
            for d in range(len(approxes)):
                e = np.exp(approxes[d][i])
                p = e / (1 + e)
                error_sum += -w * (target[d][i] * np.log(p) + (1 - target[d][i]) * np.log(1 - p))
        return error_sum / len(approxes), weight_sum
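
The division by `len(approxes)` in `evaluate` corresponds to the factor $M$ in the MultiLogloss denominator. A useful way to read it: MultiLogloss is the plain average, over label dimensions, of the weighted Logloss of each dimension. The sketch below checks this on made-up data, assuming the layout `approxes[d][i]` used above.

```python
import numpy as np

# M = 3 label dimensions, N = 2 objects
approxes = np.array([[0.0, 2.0],
                     [-1.0, 0.5],
                     [3.0, -2.0]])
target = np.array([[0, 1],
                   [0, 1],
                   [1, 0]])
weight = np.array([1.0, 3.0])

p = 1.0 / (1.0 + np.exp(-approxes))
per_cell = -(target * np.log(p) + (1 - target) * np.log(1 - p))  # shape (M, N)

M = approxes.shape[0]
multi_logloss = np.sum(weight * per_cell) / (M * np.sum(weight))

# Equivalent view: weighted Logloss per dimension, then the mean over dimensions.
per_dim = np.sum(weight * per_cell, axis=1) / np.sum(weight)
print(multi_logloss, per_dim.mean())
```

Both numbers coincide, so a custom metric may divide by $M$ either inside `evaluate`, as above, or in `get_final_error`.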

Below there are examples of training with built-in MultiLogloss function and our MultiLogloss objective. As we can see, the results are the same.

In [26]:
X, y = make_multilabel_classification(random_state=0, n_classes=3)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

In [27]:
model1 = CatBoostClassifier(iterations=10, loss_function='MultiLogloss', eval_metric='MultiLogloss',
                            learning_rate=0.03, bootstrap_type='Bayesian', boost_from_average=False,
                            leaf_estimation_iterations=1, leaf_estimation_method='Gradient')
model1.fit(X_train, y_train, eval_set=(X_test, y_test))

0:	learn: 0.6913620	test: 0.6923958	best: 0.6923958 (0)	total: 645us	remaining: 5.81ms
1:	learn: 0.6893857	test: 0.6919275	best: 0.6919275 (1)	total: 1.35ms	remaining: 5.39ms
2:	learn: 0.6874246	test: 0.6909644	best: 0.6909644 (2)	total: 2.31ms	remaining: 5.38ms
3:	learn: 0.6854196	test: 0.6897256	best: 0.6897256 (3)	total: 2.79ms	remaining: 4.18ms
4:	learn: 0.6832049	test: 0.6888271	best: 0.6888271 (4)	total: 3.49ms	remaining: 3.49ms
5:	learn: 0.6810096	test: 0.6885706	best: 0.6885706 (5)	total: 3.99ms	remaining: 2.66ms
6:	learn: 0.6790223	test: 0.6875159	best: 0.6875159 (6)	total: 4.76ms	remaining: 2.04ms
7:	learn: 0.6773518	test: 0.6866457	best: 0.6866457 (7)	total: 5.57ms	remaining: 1.39ms
8:	learn: 0.6750769	test: 0.6850851	best: 0.6850851 (8)	total: 6.07ms	remaining: 674us
9:	learn: 0.6731996	test: 0.6845872	best: 0.6845872 (9)	total: 6.55ms	remaining: 0us

bestTest = 0.6845871591
bestIteration = 9



<catboost.core.CatBoostClassifier at 0x7f54e10d7450>

In [28]:
model2 = CatBoostClassifier(iterations=10, loss_function=MultiLoglossObjective(), eval_metric=MultiLoglossMetric(),
                            learning_rate=0.03, bootstrap_type='Bayesian', boost_from_average=False,
                            leaf_estimation_iterations=1, leaf_estimation_method='Gradient')
model2.fit(X_train, y_train, eval_set=(X_test, y_test))

0:	learn: 0.6913620	test: 0.6923958	best: 0.6923958 (0)	total: 547ms	remaining: 4.92s
1:	learn: 0.6893857	test: 0.6919275	best: 0.6919275 (1)	total: 548ms	remaining: 2.19s
2:	learn: 0.6874246	test: 0.6909644	best: 0.6909644 (2)	total: 549ms	remaining: 1.28s
3:	learn: 0.6854196	test: 0.6897256	best: 0.6897256 (3)	total: 549ms	remaining: 824ms
4:	learn: 0.6832049	test: 0.6888271	best: 0.6888271 (4)	total: 550ms	remaining: 550ms
5:	learn: 0.6810096	test: 0.6885706	best: 0.6885706 (5)	total: 550ms	remaining: 367ms
6:	learn: 0.6790223	test: 0.6875159	best: 0.6875159 (6)	total: 551ms	remaining: 236ms
7:	learn: 0.6773518	test: 0.6866457	best: 0.6866457 (7)	total: 551ms	remaining: 138ms
8:	learn: 0.6750769	test: 0.6850851	best: 0.6850851 (8)	total: 552ms	remaining: 61.3ms
9:	learn: 0.6731996	test: 0.6845872	best: 0.6845872 (9)	total: 552ms	remaining: 0us

bestTest = 0.6845871591
bestIteration = 9



<catboost.core.CatBoostClassifier at 0x7f54e10d7d90>