### Problem 2

If `loss_function=MultiClass` is set in `CatBoostClassifier` and the labels are in the range `{0,1}`, the predicted class probabilities differ from those obtained with `loss_function=Logloss`.

In [1]:
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris

# Preparing Iris data

In [2]:
np.random.seed(0)
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['species'] = pd.Categorical.from_codes(iris.target, iris.target_names)

features = df.columns[:4]

df['species'] = pd.factorize(df['species'])[0] # enumerate labels
df['is_train'] = np.random.uniform(0, 1, len(df)) <= .75

In [3]:
df = df[df['species'] <= 1] # only {0,1} labels

In [4]:
df['species'].unique()

array([0, 1])

# This is a binary classification task

In [5]:
X_train, X_test = df[df['is_train']], df[~df['is_train']]

y_train = X_train['species']
y_test = X_test['species']

X_train = X_train[features]
X_test = X_test[features]

In [6]:
from catboost import CatBoostClassifier

In [7]:
cb_clf_multiclass = CatBoostClassifier(loss_function="MultiClass", n_estimators=5)
cb_clf_multiclass.fit(X_train, y_train)

0:	learn: -0.6723609	total: 57.1ms	remaining: 228ms
1:	learn: -0.6562648	total: 69.9ms	remaining: 105ms
2:	learn: -0.6350226	total: 75.9ms	remaining: 50.6ms
3:	learn: -0.6177452	total: 82.7ms	remaining: 20.7ms
4:	learn: -0.6010231	total: 89.7ms	remaining: 0us


<catboost.core.CatBoostClassifier at 0x7f0c59f0ee48>

In [8]:
cb_clf_binary = CatBoostClassifier(loss_function="Logloss", n_estimators=5)
cb_clf_binary.fit(X_train, y_train)

Learning rate set to 0.5
0:	learn: 0.1619941	total: 67.5ms	remaining: 270ms
1:	learn: 0.0671021	total: 104ms	remaining: 156ms
2:	learn: 0.0378335	total: 151ms	remaining: 101ms
3:	learn: 0.0162693	total: 181ms	remaining: 45.4ms
4:	learn: 0.0093864	total: 215ms	remaining: 0us


<catboost.core.CatBoostClassifier at 0x7f0c59f20860>

In [9]:
cb_clf_binary.predict_proba(X_test)[:3]

array([[0.99567032, 0.00432968],
       [0.99119748, 0.00880252],
       [0.99567032, 0.00432968]])

In [10]:
cb_clf_multiclass.predict_proba(X_test)[:3]

array([[0.55703732, 0.44296268],
       [0.55278628, 0.44721372],
       [0.55703732, 0.44296268]])

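We see that the probabilities are very different: for the same test rows, `Logloss` assigns about 0.99 to class 0, while `MultiClass` assigns only about 0.55. Note that only the `Logloss` run logs `Learning rate set to 0.5`, so the two runs may not have used the same learning rate.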
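To put a number on the gap, here is a minimal sketch that compares the two outputs. The arrays are copied by hand from the `predict_proba` results printed above (first three test rows only), and the variable names are illustrative, not part of the original notebook.

```python
import numpy as np

# First three predict_proba rows, copied from the Logloss output above.
proba_logloss = np.array([[0.99567032, 0.00432968],
                          [0.99119748, 0.00880252],
                          [0.99567032, 0.00432968]])

# First three predict_proba rows, copied from the MultiClass output above.
proba_multiclass = np.array([[0.55703732, 0.44296268],
                             [0.55278628, 0.44721372],
                             [0.55703732, 0.44296268]])

# Largest element-wise difference between the two probability matrices.
max_gap = np.abs(proba_logloss - proba_multiclass).max()

# Check whether both models still pick the same class for every row.
same_argmax = np.array_equal(proba_logloss.argmax(axis=1),
                             proba_multiclass.argmax(axis=1))

print(f"max absolute gap: {max_gap:.4f}")
print(f"same predicted class: {same_argmax}")
```

On these three rows both models still predict the same class, so the discrepancy shows up in the probability values rather than in the hard labels.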

**Possible solution:** check the number of unique values in the `target` vector. If there are only two unique values, either switch `loss_function` to `Logloss` automatically or print a warning to the user, for example: "your labels are in range `{0,1}`, maybe you should set `loss_function=Logloss`".
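The proposed check could look roughly like the sketch below. It is user-side code, not CatBoost's implementation: the helper name `suggest_loss_function` and its `requested` parameter are hypothetical, and the warning text is only an example.

```python
import warnings

import numpy as np


def suggest_loss_function(y, requested="MultiClass"):
    """Hypothetical helper: pick a loss name based on the number of classes in y."""
    n_classes = np.unique(np.asarray(y)).size
    if requested == "MultiClass" and n_classes == 2:
        # Only two distinct labels: this is a binary task.
        warnings.warn(
            "The target has only two unique values; "
            "consider loss_function='Logloss'.",
            UserWarning,
        )
        return "Logloss"
    return requested


# Example: binary labels fall back to Logloss, three classes stay MultiClass.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    print(suggest_loss_function([0, 1, 1, 0]))
    print(suggest_loss_function([0, 1, 2]))
```

The returned name could then be passed on as `CatBoostClassifier(loss_function=...)`. Whether the library should switch silently or only warn is a design choice left open here.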