## 1. Intro

There are several ways to get multilabel output:

* Binary relevance
* Classifier stacking

### 1.1 Binary relevance

Assume that all classes are independent. Build $K$ binary classifiers $b_1(x), \dots , b_K(x)$, where $b_k(x)$ predicts whether a sample $x$ belongs to class $k$ or not. The final answer is $a(x) = (b_1(x), \dots , b_K(x))$.

In [9]:
import numpy as np
from sklearn.datasets import make_classification
from sklearn.multioutput import MultiOutputClassifier
from sklearn.utils import shuffle

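# Toy data: one 3-class target, plus two shuffled copies used as extra target columns.
X, y1 = make_classification(n_samples=1000, n_features=10, n_informative=5, n_redundant=5, n_classes=3, random_state=1)
y2 = shuffle(y1, random_state=1)
y3 = shuffle(y1, random_state=2)
# Stack the targets column-wise: Y has shape (n_samples, 3).
Y = np.vstack((y1, y2, y3)).T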

In [10]:
from sklearn.svm import SVC

svm = SVC()

In [12]:
multi_target_svc = MultiOutputClassifier(svm, n_jobs=-1)
multi_target_svc.fit(X, Y).predict(X)

array([[1, 0, 1],
       [0, 0, 1],
       [1, 0, 1],
       ...,
       [0, 1, 1],
       [2, 0, 2],
       [1, 0, 0]])
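
The cell above fits one SVC per target column, but each column there has three classes. As a complement, here is a minimal sketch of binary relevance in the strict sense, with one binary classifier per label written out as an explicit loop. The synthetic data from `make_multilabel_classification`, the choice of `LogisticRegression`, and the names `X_ml`, `Y_ml`, `classifiers` and `A` are assumptions made for illustration only.

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression

# Synthetic multilabel data (illustrative): Y_ml is an (n_samples, K) 0/1 indicator matrix.
X_ml, Y_ml = make_multilabel_classification(
    n_samples=500, n_features=10, n_classes=4, random_state=0
)
K = Y_ml.shape[1]

# One independent binary classifier b_k per label column.
classifiers = [
    LogisticRegression(max_iter=1000).fit(X_ml, Y_ml[:, k]) for k in range(K)
]

# a(x) = (b_1(x), ..., b_K(x)): put the K binary predictions side by side.
A = np.column_stack([b.predict(X_ml) for b in classifiers])
```

Each row of `A` is the predicted label vector for one sample. No classifier sees the other labels, which is exactly the independence assumption.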

### 1.2 Classifier stacking

**Idea** Take into account correlation between classes.

Split the training set $X$ into two parts, $X_1$ and $X_2$.
Train $K$ independent classifiers $b_1(x), \dots , b_K(x)$ on $X_1$, as in the binary relevance method. Apply them to the samples in $X_2$ and use their predictions as new features, $X_{2}^{'}$. Then train $K$ independent classifiers $a_1(x), \dots , a_K(x)$ on $X_{2}^{'}$, again as in the binary relevance method.

Since the predictions of $a$ are based on the predictions of $b$, the interaction between the classes is taken into account.
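
The two-stage procedure can be sketched roughly as follows. This is an illustration under stated assumptions, not a reference implementation: the synthetic data, the 50/50 split, the use of `LogisticRegression`, the choice of hard 0/1 predictions (rather than probabilities) as the features $X_{2}^{'}$, and the helper `predict_stacked` are all choices made here.

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputClassifier

# Synthetic multilabel data (illustrative): Y_s is an (n_samples, K) 0/1 indicator matrix.
X_s, Y_s = make_multilabel_classification(
    n_samples=600, n_features=10, n_classes=3, random_state=0
)

# Split the training data into the two parts X_1 and X_2.
X1, X2, Y1, Y2 = train_test_split(X_s, Y_s, test_size=0.5, random_state=0)

# Stage 1: binary relevance classifiers b_1..b_K trained on X_1.
stage1 = MultiOutputClassifier(LogisticRegression(max_iter=1000)).fit(X1, Y1)

# X_2': the predictions of b on X_2, used as new features (assumption: hard labels).
X2_prime = stage1.predict(X2)

# Stage 2: binary relevance classifiers a_1..a_K trained on X_2'.
# Each a_k sees the stage-1 predictions for all K labels.
stage2 = MultiOutputClassifier(LogisticRegression(max_iter=1000)).fit(X2_prime, Y2)

def predict_stacked(X_new):
    """Hypothetical helper: pass new samples through b, then through a."""
    return stage2.predict(stage1.predict(X_new))

Y_pred = predict_stacked(X_s)
```

Because every $a_k$ receives the stage-1 predictions for all $K$ labels as input, it can pick up dependencies between labels that the independent classifiers $b_k$ cannot.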