# MultiLabelBinarizer 

This notebook covers how to use scikit-learn's `MultiLabelBinarizer` to turn samples that each belong to one or more categories into a numeric (binary) representation.

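It is a utility class that converts a list of label collections (one collection per sample) into a label indicator matrix: one row per sample, one column per class, with a 1 wherever the sample has that label.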
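To make "label indicator matrix" concrete, here is a hand-rolled NumPy sketch that builds one from the same course tuples used later in this notebook. It only illustrates the idea and is not scikit-learn's implementation; the names `classes`, `column_of`, and `indicator` are hypothetical.

```python
import numpy as np

# Same course tuples as used later in the notebook
courses = [
    ('Math', 'English'),
    ('Math', 'Science'),
    ('Geography', 'History'),
    ('Statistics',),
]

# Collect every distinct label and sort it to fix a column order
classes = sorted({label for row in courses for label in row})
column_of = {label: j for j, label in enumerate(classes)}

# One row per sample, one column per class; 1 marks membership
indicator = np.zeros((len(courses), len(classes)), dtype=int)
for i, row in enumerate(courses):
    for label in row:
        indicator[i, column_of[label]] = 1

print(classes)
print(indicator)
```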

https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MultiLabelBinarizer.html#sklearn.preprocessing.MultiLabelBinarizer

In [1]:
from sklearn.preprocessing import MultiLabelBinarizer

import numpy as np
import pandas as pd

### Creating a MultiLabelBinarizer object
It converts an iterable of label collections (for example, a list of tuples or sets) into a multi-label format in which each category is represented by its presence (1) or absence (0).
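A quick check, assuming a recent scikit-learn: the inner collections can be tuples, lists, or sets, and all three give the same matrix. A bare string is the one thing to avoid, because it is itself iterable and gets split into characters; this is why the single-course sample below is written `('Statistics', )` with a trailing comma.

```python
from sklearn.preprocessing import MultiLabelBinarizer

as_tuples = [('Math', 'English'), ('Statistics',)]
as_lists = [['Math', 'English'], ['Statistics']]
as_sets = [{'Math', 'English'}, {'Statistics'}]

# Each call learns the sorted classes ['English', 'Math', 'Statistics']
results = [MultiLabelBinarizer().fit_transform(data).tolist()
           for data in (as_tuples, as_lists, as_sets)]
print(results[0])  # [[1, 1, 0], [0, 0, 1]]

# Pitfall: 'Stats' without a tuple is iterated character by character
char_classes = MultiLabelBinarizer().fit([('Math',), 'Stats']).classes_
print(list(char_classes))  # ['Math', 'S', 'a', 's', 't']
```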

In [2]:
multilabel_binarizer = MultiLabelBinarizer()

* classes : array-like of shape (n_classes,) (default: None)
Indicates an ordering for the class labels. All entries must be unique. If None, the classes are the sorted set of labels seen during `fit`.
* sparse_output : boolean (default: False)
Set to True if the array returned by `transform` should be in sparse CSR format.
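A short sketch of the `sparse_output` flag, reusing the notebook's course tuples: with `sparse_output=True`, `fit_transform` returns a SciPy sparse matrix in CSR format instead of a dense NumPy array. Storing only the ones can save memory when there are many classes and each sample has only a few of them.

```python
from scipy import sparse
from sklearn.preprocessing import MultiLabelBinarizer

courses = [
    ('Math', 'English'),
    ('Math', 'Science'),
    ('Geography', 'History'),
    ('Statistics', ),
]

sparse_binarizer = MultiLabelBinarizer(sparse_output=True)
matrix = sparse_binarizer.fit_transform(courses)

# Only the 7 ones are stored explicitly
print(sparse.issparse(matrix), matrix.format, matrix.nnz)  # True csr 7
# Convert to a dense array only for display
print(matrix.toarray())
```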

### Fitting the labels

In [3]:
courses = [
    ('Math', 'English'),
    ('Math', 'Science'),
    ('Geography', 'History'),
    ('Statistics', )
]

In [4]:
multilabel_binarizer.fit(courses)

MultiLabelBinarizer(classes=None, sparse_output=False)
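`fit` only learns the classes and returns the fitted estimator itself (its representation is the output shown above), not the matrix. When the same data is both learned from and encoded, `fit_transform` does both in one call. A minimal check that the two routes agree:

```python
import numpy as np
from sklearn.preprocessing import MultiLabelBinarizer

courses = [
    ('Math', 'English'),
    ('Math', 'Science'),
    ('Geography', 'History'),
    ('Statistics', ),
]

# fit returns the fitted estimator, so transform can be chained onto it
two_step = MultiLabelBinarizer().fit(courses).transform(courses)
one_step = MultiLabelBinarizer().fit_transform(courses)
print(np.array_equal(two_step, one_step))  # True
```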

### Label classes

The order of `classes_` gives the column order of the binary matrix. When the `classes` parameter is not set, the labels are sorted.

In [5]:
multilabel_binarizer.classes_

array(['English', 'Geography', 'History', 'Math', 'Science', 'Statistics'],
      dtype=object)
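The column order can also be fixed explicitly through the `classes` constructor argument, for example to keep columns stable across datasets. A sketch with an arbitrary order chosen purely for illustration:

```python
from sklearn.preprocessing import MultiLabelBinarizer

courses = [
    ('Math', 'English'),
    ('Math', 'Science'),
    ('Geography', 'History'),
    ('Statistics', ),
]

# Hypothetical ordering; any permutation of the labels works
custom_order = ['Math', 'Science', 'English', 'History', 'Geography', 'Statistics']
ordered_binarizer = MultiLabelBinarizer(classes=custom_order)
matrix = ordered_binarizer.fit_transform(courses)

print(list(ordered_binarizer.classes_))  # keeps custom_order, not sorted
print(matrix)
```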

### Transforming the labels to binary form

A row can contain several ones, indicating that the sample belongs to several categories.

In [6]:
multilabel_binarizer.transform(courses)

array([[1, 0, 0, 1, 0, 0],
       [0, 0, 0, 1, 1, 0],
       [0, 1, 1, 0, 0, 0],
       [0, 0, 0, 0, 0, 1]])
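Since pandas is already imported, wrapping the matrix in a `DataFrame` with `classes_` as column names makes it easier to read. `inverse_transform` maps indicator rows back to label tuples; note that each tuple comes back in `classes_` order, not in the order the labels were originally written.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

courses = [
    ('Math', 'English'),
    ('Math', 'Science'),
    ('Geography', 'History'),
    ('Statistics', ),
]

mlb = MultiLabelBinarizer()
matrix = mlb.fit_transform(courses)

# One named column per learned class
table = pd.DataFrame(matrix, columns=mlb.classes_)
print(table)

# Recover the label tuples from the binary rows
recovered = mlb.inverse_transform(matrix)
print(recovered)
# [('English', 'Math'), ('Math', 'Science'), ('Geography', 'History'), ('Statistics',)]
```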

In [7]:
new_courses = [
    ('Math', 'Statistics'),
    ('Geography', 'History', 'Math')
]

In [8]:
multilabel_binarizer.transform(new_courses)

array([[0, 0, 0, 1, 0, 1],
       [0, 1, 1, 1, 0, 0]])
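Labels that were not seen during `fit` have no column. In recent scikit-learn versions `transform` ignores them and emits a `UserWarning` instead of failing; older versions may behave differently. A sketch using a hypothetical unseen course, 'Physics':

```python
import warnings
from sklearn.preprocessing import MultiLabelBinarizer

courses = [
    ('Math', 'English'),
    ('Math', 'Science'),
    ('Geography', 'History'),
    ('Statistics', ),
]
mlb = MultiLabelBinarizer().fit(courses)

# Record warnings so the "unknown class" message can be inspected
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    encoded = mlb.transform([('Math', 'Physics')])

# 'Physics' is dropped; only the Math column is set
print(encoded)  # [[0 0 0 1 0 0]]
print([str(w.message) for w in caught])
```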