# Multi Label Binary Classification with CatBoost

Since version 1.0.0, CatBoost supports multilabel binary classification. The target for this mode is a matrix of size $N \times K$, where $N$ is the number of objects and $K$ is the number of classes (a.k.a. labels).

There are two loss functions for multilabel binary classification: $MultiLogloss$ and $MultiCrossEntropy$.

For $MultiLogloss$ loss function $y_{ik}$ should be in $\{0, 1\}$:
$$
Y = \begin{pmatrix}
1 & 0 & 0 & ... & 1 & 1\\
0 & 1 & 0 & ... & 1 & 0\\
... & ... & ... & y_{ik} & ... & ...\\
1 & 0 & 1 & ... & 0 & 1\\
\end{pmatrix}_{N \times K}
$$

For $MultiCrossEntropy$ loss function $y_{ik}$ should be in $[0, 1]$:
$$
Y = \begin{pmatrix}
0.2 & 0.8 & 0.0 & ... & 0.7 & 0.3\\
0 & 0.6 & 0.3 & ... & 0.9 & 0.0\\
... & ... & ... & y_{ik} & ... & ...\\
1.0 & 0.0 & 0.3 & ... & 0.3 & 0.8\\
\end{pmatrix}_{N \times K}
$$
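The two target ranges above can be checked before training. The following is a minimal sketch in plain Python; `check_targets` is a hypothetical helper written for illustration and is not part of CatBoost.

```python
def check_targets(targets, loss_function):
    """Return True if every y_ik is valid for the named loss function.

    targets is an N x K nested list; check_targets is a hypothetical
    helper for illustration, not a CatBoost function.
    """
    values = [y for row in targets for y in row]
    if loss_function == "MultiLogloss":
        # Hard labels only: every entry must be exactly 0 or 1.
        return all(y in (0, 1) for y in values)
    if loss_function == "MultiCrossEntropy":
        # Soft labels: every entry must lie in the closed interval [0, 1].
        return all(0.0 <= y <= 1.0 for y in values)
    raise ValueError(f"unexpected loss function: {loss_function}")


hard_targets = [[1, 0, 1], [0, 1, 0]]
soft_targets = [[0.2, 0.8, 0.0], [1.0, 0.0, 0.3]]

print(check_targets(hard_targets, "MultiLogloss"))       # True
print(check_targets(soft_targets, "MultiLogloss"))       # False
print(check_targets(soft_targets, "MultiCrossEntropy"))  # True
```

Hard labels are also valid soft labels, so a $\{0, 1\}$ matrix passes both checks.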

The formula is the same for both loss functions:
$$
MultiLogloss = MultiCrossEntropy = \displaystyle\frac{-\sum\limits_{k=1}^{K} \sum\limits_{i=1}^{N} w_{i} (y_{ik} \log p_{ik} + (1 - y_{ik}) \log (1 - p_{ik}) )}{K\sum\limits_{i=1}^{N}w_{i}}{ ,}
$$

$$
  \mbox{where }p_{ik} = \sigma(a_{ik}) = \frac{e^{a_{ik}}}{1 + e^{a_{ik}}}{ ,}
$$

$$
 a_{ik} \mbox{- raw value of model for } i\mbox{-th object and } k\mbox{-th class}
$$
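To make the formula concrete, here is a minimal sketch that evaluates it in plain Python for a tiny hand-made example. The function `multi_logloss` and the input values are illustrative assumptions; this is not how CatBoost computes the loss internally, and it has no protection against probabilities of exactly 0 or 1.

```python
import math


def sigmoid(a):
    """Map a raw model value a_ik to a probability p_ik."""
    return 1.0 / (1.0 + math.exp(-a))


def multi_logloss(raw, targets, weights=None):
    """Weighted mean binary cross-entropy over N objects and K labels.

    raw     -- N x K nested list of raw values a_ik (illustrative input)
    targets -- N x K nested list of y_ik, in {0, 1} or in [0, 1]
    weights -- optional list of N object weights w_i (defaults to 1.0)
    """
    n = len(raw)
    k = len(raw[0])
    if weights is None:
        weights = [1.0] * n
    total = 0.0
    for a_row, y_row, w in zip(raw, targets, weights):
        for a, y in zip(a_row, y_row):
            p = sigmoid(a)
            total += w * (y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
    # Normalise by K times the total object weight, as in the formula.
    return -total / (k * sum(weights))


raw_values = [[0.0, 2.0], [-1.0, 0.5]]
labels = [[1, 1], [0, 1]]
loss = multi_logloss(raw_values, labels)
print(f"MultiLogloss: {loss:.4f}")
```

Because the same expression is used for both losses, passing soft targets from $[0, 1]$ to this function gives the $MultiCrossEntropy$ value.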

Let's try multilabel classification mode on a synthetic dataset.

In [1]:
# !pip install catboost
# !pip install scikit-learn

In [2]:
from catboost import CatBoostClassifier, Pool
from catboost.utils import eval_metric
from sklearn.datasets import make_multilabel_classification
from sklearn.model_selection import train_test_split

#### Generate synthetic dataset

In [3]:
X, Y = make_multilabel_classification(n_samples=500, n_features=20, n_classes=5, random_state=0)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y)
train_pool = Pool(X_train, Y_train)
test_pool = Pool(X_test, Y_test)

#### Train model

In [4]:
clf = CatBoostClassifier(
    loss_function='MultiLogloss',
    eval_metric='HammingLoss',
    iterations=500,
    class_names=['A', 'B', 'C', 'D', 'E']
)
clf.fit(train_pool, eval_set=test_pool, metric_period=10, plot=True, verbose=50)

Learning rate set to 0.033623
0:	learn: 0.2778667	test: 0.3328000	best: 0.3328000 (0)	total: 51ms	remaining: 25.4s
50:	learn: 0.0917333	test: 0.2496000	best: 0.2496000 (50)	total: 135ms	remaining: 1.19s
100:	learn: 0.0432000	test: 0.2304000	best: 0.2304000 (100)	total: 217ms	remaining: 857ms
150:	learn: 0.0160000	test: 0.2352000	best: 0.2304000 (100)	total: 301ms	remaining: 695ms
200:	learn: 0.0058667	test: 0.2352000	best: 0.2304000 (100)	total: 390ms	remaining: 580ms
250:	learn: 0.0016000	test: 0.2320000	best: 0.2304000 (100)	total: 480ms	remaining: 476ms
300:	learn: 0.0000000	test: 0.2384000	best: 0.2304000 (100)	total: 572ms	remaining: 378ms
350:	learn: 0.0000000	test: 0.2320000	best: 0.2304000 (100)	total: 663ms	remaining: 281ms
400:	learn: 0.0000000	test: 0.2368000	best: 0.2304000 (100)	total: 756ms	remaining: 187ms
450:	learn: 0.0000000	test: 0.2336000	best: 0.2304000 (100)	total: 851ms	remaining: 92.5ms
499:	learn: 0.0000000	test: 0.2384000	best: 0.2304000 (100)	total: 947ms	rem

<catboost.core.CatBoostClassifier at 0x7f14ab052b50>

#### Predict for test data

In [5]:
test_predict = clf.predict(X_test)

## Metrics

**Common definitions:**

$N$ - the number of objects

$K$ - the number of labels (classes)

$y_{ik} \in \{0, 1\}$ - the target value of k-th label for i-th object

$c_{ik} \in \{0, 1\}$ - the predicted value of k-th label for i-th object

$p_{ik} \in [0, 1]$ - the predicted probability of k-th label for i-th object
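The predicted values $c_{ik}$ can be derived from the probabilities $p_{ik}$ by thresholding. The sketch below assumes a threshold of 0.5, which is an illustrative choice and not something stated above; `to_labels` is a hypothetical helper.

```python
def to_labels(probabilities, threshold=0.5):
    """Convert an N x K nested list of p_ik into predicted labels c_ik.

    The 0.5 default threshold is an assumption made for this sketch.
    """
    return [[1 if p > threshold else 0 for p in row] for row in probabilities]


probabilities = [[0.9, 0.1, 0.7], [0.2, 0.6, 0.4]]
print(to_labels(probabilities))  # [[1, 0, 1], [0, 1, 0]]
```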

### Accuracy

Classic accuracy for multilabel classification compares, for each object, the whole target vector with the whole predicted vector:
$$
Accuracy_i =
\begin{cases}
  1, \mbox{if }\forall k: c_{ik} = y_{ik} \\
  0, \mbox{otherwise}
\end{cases}
$$

The total accuracy is the weighted average of the per-object accuracy over all objects:
$$Accuracy = \frac{\sum\limits_{i=1}^{N} w_i Accuracy_i}{\sum\limits_{i=1}^{N} w_i} $$

In [8]:
accuracy = eval_metric(Y_test, test_predict, 'Accuracy')[0]
print(f'Accuracy: {accuracy}')

Accuracy: 0.368
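The definition can be verified by hand on a small example. This is a minimal sketch in plain Python with made-up labels; `exact_match_accuracy` is a hypothetical helper that mirrors the formula and is not the CatBoost implementation.

```python
def exact_match_accuracy(targets, predictions, weights=None):
    """Weighted share of objects whose whole label vector is predicted correctly."""
    if weights is None:
        weights = [1.0] * len(targets)
    hits = sum(
        w
        for y_row, c_row, w in zip(targets, predictions, weights)
        if list(y_row) == list(c_row)
    )
    return hits / sum(weights)


# Made-up labels: objects 0 and 2 match exactly, objects 1 and 3 do not.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 1], [0, 1, 1], [1, 1, 0], [0, 0, 0]]

print(exact_match_accuracy(y_true, y_pred))  # 0.5
```

A single wrong label makes the whole object count as an error, which is why this metric is much lower than the per-label metrics on the same predictions.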


### Accuracy per class

To calculate accuracy for each class individually, specify the `type` parameter of the Accuracy metric like this:

In [7]:
accuracy_per_class = eval_metric(Y_test, test_predict, 'Accuracy:type=PerClass')
for cls, value in zip(clf.classes_, accuracy_per_class):
    print(f'Accuracy for class {cls}: {value}')

Accuracy for class A: 0.776
Accuracy for class B: 0.712
Accuracy for class C: 0.744
Accuracy for class D: 0.776
Accuracy for class E: 0.84
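For comparison, per-class accuracy can be computed by hand as the share of objects on which a given label column is predicted correctly. The sketch below uses made-up labels and a hypothetical helper `accuracy_per_class`; it ignores object weights for brevity.

```python
def accuracy_per_class(targets, predictions):
    """Return a list of K accuracies, one per label column (unweighted)."""
    n = len(targets)
    k = len(targets[0])
    return [
        sum(1 for i in range(n) if targets[i][j] == predictions[i][j]) / n
        for j in range(k)
    ]


# Made-up labels: only the third label column contains mistakes.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 1], [0, 1, 1], [1, 1, 0], [0, 0, 0]]

print(accuracy_per_class(y_true, y_pred))  # [1.0, 1.0, 0.5]
```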


Accuracy per class is usually more informative because it is more sensitive to model changes. However, it cannot be used as `eval_metric` for early stopping and best model selection, because it returns several values.
Instead of accuracy per class, we can use $HammingLoss$.

### HammingLoss

Essentially, $HammingLoss$ is the mean accuracy per class subtracted from $1$, that is, the fraction of individual labels that are predicted incorrectly. So a lower value of HammingLoss is better.

$$
HammingLoss = \frac{\sum\limits_{i=1}^{N} \sum\limits_{k=1}^{K} I(c_{ik}, y_{ik})}{NK}
$$

$$
\mbox{where } I(c, y) =
\begin{cases}
  1, \mbox{if } c \neq y \\
  0, \mbox{otherwise}
\end{cases}
$$

In [9]:
hamming = eval_metric(Y_test, test_predict, 'HammingLoss')[0]
print(f'HammingLoss: {hamming:.4f}')
mean_accuracy_per_class = sum(accuracy_per_class) / len(accuracy_per_class)
print(f'MeanAccuracyPerClass: {mean_accuracy_per_class:.4f}')
print(f'HammingLoss + MeanAccuracyPerClass = {hamming + mean_accuracy_per_class}')

HammingLoss: 0.2304
MeanAccuracyPerClass: 0.7696
HammingLoss + MeanAccuracyPerClass = 1.0
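The same identity can be reproduced by hand. This is a minimal unweighted sketch in plain Python with made-up labels; `hamming_loss` and `mean_accuracy_per_class` are hypothetical helpers, not CatBoost functions.

```python
def hamming_loss(targets, predictions):
    """Fraction of the N * K individual labels that are predicted incorrectly."""
    n = len(targets)
    k = len(targets[0])
    mismatches = sum(
        1
        for y_row, c_row in zip(targets, predictions)
        for y, c in zip(y_row, c_row)
        if y != c
    )
    return mismatches / (n * k)


def mean_accuracy_per_class(targets, predictions):
    """Average over label columns of the share of correctly predicted objects."""
    n = len(targets)
    k = len(targets[0])
    per_class = [
        sum(1 for i in range(n) if targets[i][j] == predictions[i][j]) / n
        for j in range(k)
    ]
    return sum(per_class) / k


# Made-up labels: 2 of the 12 individual labels are wrong.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 1], [0, 1, 1], [1, 1, 0], [0, 0, 0]]

loss = hamming_loss(y_true, y_pred)
mean_acc = mean_accuracy_per_class(y_true, y_pred)
print(f"HammingLoss: {loss:.4f}")               # HammingLoss: 0.1667
print(f"MeanAccuracyPerClass: {mean_acc:.4f}")  # MeanAccuracyPerClass: 0.8333
```

Every label is either correct or incorrect, so the two quantities always sum to 1 up to floating-point rounding.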


### Precision, Recall, F1

These metrics are calculated for each class individually:

In [10]:
for metric in ('Precision', 'Recall', 'F1'):
    print(metric)
    values = eval_metric(Y_test, test_predict, metric)
    for cls, value in zip(clf.classes_, values):
        print(f'class={cls}: {value:.4f}')
    print()

Precision
class=A: 0.8333
class=B: 0.6066
class=C: 0.7826
class=D: 0.8235
class=E: 0.9655

Recall
class=A: 0.3750
class=B: 0.7551
class=C: 0.4000
class=D: 0.5600
class=E: 0.5957

F1
class=A: 0.5172
class=B: 0.6727
class=C: 0.5294
class=D: 0.6667
class=E: 0.7368
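As a cross-check, the per-class values can be derived from true positive, false positive and false negative counts in each label column. The sketch below uses made-up labels and a hypothetical helper `precision_recall_f1`; returning 0.0 when a denominator is zero is an assumption of this sketch and may differ from what CatBoost does.

```python
def precision_recall_f1(targets, predictions, label):
    """Return (precision, recall, F1) for one label column of N x K nested lists."""
    tp = fp = fn = 0
    for y_row, c_row in zip(targets, predictions):
        y, c = y_row[label], c_row[label]
        if c == 1 and y == 1:
            tp += 1
        elif c == 1 and y == 0:
            fp += 1
        elif c == 0 and y == 1:
            fn += 1
    # Zero denominators fall back to 0.0 (an assumption of this sketch).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Made-up labels: the third column has one false positive and one false negative.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 1], [0, 1, 1], [1, 1, 0], [0, 0, 0]]

for label in range(3):
    p, r, f1 = precision_recall_f1(y_true, y_pred, label)
    print(f"label={label}: precision={p:.4f} recall={r:.4f} f1={f1:.4f}")
```

The F1 values in the output above follow the same relation: for class A, $2 \cdot 0.8333 \cdot 0.3750 / (0.8333 + 0.3750) \approx 0.5172$.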

