# LabelBinarizer 

This notebook covers how `LabelBinarizer` binarizes labels in a one-vs-all fashion, converting multi-class labels into binary indicator labels.

It is a utility class that creates a label indicator matrix from a list of multi-class labels.

https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelBinarizer.html

In [1]:
from sklearn.preprocessing import LabelBinarizer

import numpy as np
import pandas as pd

### Creating a LabelBinarizer object

In [2]:
num_binarizer = LabelBinarizer()

* `neg_label` : int (default: 0)
  Value with which negative labels are encoded.
* `pos_label` : int (default: 1)
  Value with which positive labels are encoded.
* `sparse_output` : bool (default: False)
  Set to True to have `transform` return a sparse CSR matrix instead of a dense array.
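A short sketch of the `sparse_output` parameter, reusing the numeric labels from the fitting step below. The variable names here are illustrative only. `scipy.sparse.issparse` is used to check the result, and `.toarray()` turns it back into the dense 0/1 matrix.

```python
from scipy import sparse
from sklearn.preprocessing import LabelBinarizer

# Ask for a sparse CSR result instead of a dense NumPy array
sparse_binarizer = LabelBinarizer(sparse_output=True)
encoded = sparse_binarizer.fit_transform([2, 5, 6, 4, 5])

print(sparse.issparse(encoded))  # True
print(encoded.format)            # csr
print(encoded.toarray())         # the same 0/1 matrix as the dense output
```

Sparse output mainly pays off when there are many classes, since each row stores a single non-zero entry. scikit-learn only accepts `sparse_output=True` together with `neg_label=0` and a non-zero `pos_label`; other combinations raise a `ValueError`.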

LabelBinarizer may look very similar to one-hot encoding, but scikit-learn intends LabelBinarizer for encoding target values (y) and OneHotEncoder for encoding input features (X).
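A minimal comparison of the two, assuming a small hypothetical yes/no target. LabelBinarizer takes a 1-D label vector, while OneHotEncoder expects a 2-D feature matrix (hence the `reshape`). With only two classes, the outputs also differ in width.

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer, OneHotEncoder

y = ['no', 'yes', 'no']              # 1-D target vector (hypothetical data)
X = np.array(y).reshape(-1, 1)       # OneHotEncoder needs a 2-D feature matrix

lb_out = LabelBinarizer().fit_transform(y)
# OneHotEncoder returns a sparse matrix by default; densify it for printing
ohe_out = OneHotEncoder().fit_transform(X).toarray()

print(lb_out.shape)   # (3, 1): a single column for a binary target
print(ohe_out.shape)  # (3, 2): one column per category
```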

### Fitting the labels

In [3]:
num_binarizer.fit([2, 5, 6, 4, 5])

LabelBinarizer(neg_label=0, pos_label=1, sparse_output=False)

### Label classes

In [4]:
num_binarizer.classes_

array([2, 4, 5, 6])

### Transforming the labels to binary form
* Each row of the matrix corresponds to one input label, and each column corresponds to one class in `classes_`, in sorted order (`2, 4, 5, 6`)
* The first row encodes the label `2` (a 1 in the first column), the second row encodes the label `5` (a 1 in the third column), and so on
* The label `5` appears at indices 1 and 4 of the input, so rows 1 and 4 of the matrix are identical
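The one-vs-all matrix described above can be sketched directly in NumPy, assuming the classes are just the sorted unique labels (which matches `classes_` here). This illustrates the idea; it is not scikit-learn's actual implementation.

```python
import numpy as np

y = np.array([2, 5, 6, 4, 5])
classes = np.unique(y)  # sorted unique labels: [2 4 5 6]

# Compare every label (rows) against every class (columns); True where they match
indicator = (y[:, None] == classes[None, :]).astype(int)
print(indicator)
```

Each row has exactly one 1, placed in the column of that row's class, which is why repeated labels produce repeated rows.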

In [5]:
num_binarizer.transform([2, 5, 6, 4, 5])

array([[1, 0, 0, 0],
       [0, 0, 1, 0],
       [0, 0, 0, 1],
       [0, 1, 0, 0],
       [0, 0, 1, 0]])

Transforming new labels reuses the classes learned during `fit`, so `2` and `6` get the same rows they had above: a 1 in the first column for `2` and a 1 in the fourth column for `6`.

In [6]:
num_binarizer.transform([2, 6])

array([[1, 0, 0, 0],
       [0, 0, 0, 1]])
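The encoding can also be reversed. The sketch below refits the same labels and calls `inverse_transform`. For multi-class output, this method picks, for each row, the class whose column holds the largest value.

```python
from sklearn.preprocessing import LabelBinarizer

num_binarizer = LabelBinarizer()
num_binarizer.fit([2, 5, 6, 4, 5])

encoded = num_binarizer.transform([2, 6])
# Map each indicator row back to its original class label
decoded = num_binarizer.inverse_transform(encoded)
print(decoded)  # [2 6]
```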

Initializing a label binarizer for non-numerical (string) labels
* Here we also set `neg_label` and `pos_label`, so negative entries are encoded as `-1` instead of `0`

In [7]:
temp_binarizer = LabelBinarizer(neg_label=-1, 
                                pos_label=1, 
                                sparse_output=False)

In [8]:
temperature = ['cold', 
               'cold', 
               'warm', 
               'cold', 
               'hot', 
               'hot', 
               'warm', 
               'cold', 
               'warm', 
               'hot']

In [9]:
temp_binarizer.fit_transform(temperature)

array([[ 1, -1, -1],
       [ 1, -1, -1],
       [-1, -1,  1],
       [ 1, -1, -1],
       [-1,  1, -1],
       [-1,  1, -1],
       [-1, -1,  1],
       [ 1, -1, -1],
       [-1, -1,  1],
       [-1,  1, -1]])

In [10]:
temp_binarizer.classes_

array(['cold', 'hot', 'warm'], dtype='<U4')
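One way to see what `neg_label` and `pos_label` do is to start from the default 0/1 matrix and substitute the two values by hand. The sketch below does that with `np.where` and checks that the result matches the `-1`/`1` output above. This is an illustration of the effect, not how scikit-learn computes it internally.

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

temperature = ['cold', 'cold', 'warm', 'cold', 'hot',
               'hot', 'warm', 'cold', 'warm', 'hot']

# Default encoding: 1 for the matching class, 0 elsewhere
default_matrix = LabelBinarizer().fit_transform(temperature)
# Substitute pos_label (1) for ones and neg_label (-1) for zeros
manual = np.where(default_matrix == 1, 1, -1)

signed = LabelBinarizer(neg_label=-1, pos_label=1).fit_transform(temperature)
print(np.array_equal(manual, signed))  # True
```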

## Binarizing binary labels

In [11]:
binary_label_binarizer = LabelBinarizer()

In [12]:
binary_label_binarizer.fit([1, 0, 0, 1])

LabelBinarizer(neg_label=0, pos_label=1, sparse_output=False)

In [13]:
binary_label_binarizer.classes_

array([0, 1])

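For binary targets, LabelBinarizer returns a single column vector instead of one column per class. This is one way its result differs from one-hot encoding, which would produce two columns. The column is 1 for the positive class, `classes_[1]`, and 0 otherwise.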

In [14]:
binary_label_binarizer.transform([1, 0, 0, 1])

array([[1],
       [0],
       [0],
       [1]])
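If a two-column, one-hot style matrix is needed for a binary target, one common workaround is to stack the complement of the single column next to it. The sketch below does this with NumPy. The first column then indicates `classes_[0]` and the second indicates `classes_[1]`.

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

binarizer = LabelBinarizer()
y_col = binarizer.fit_transform([1, 0, 0, 1])  # shape (4, 1)

# Column 0 marks class 0, column 1 marks class 1
y_two_cols = np.hstack((1 - y_col, y_col))
print(y_two_cols)
```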

The same applies to string labels with only two distinct values. Classes are sorted, so `'no'` is encoded as 0 and `'yes'` as 1.

In [15]:
binary_label_binarizer.fit_transform(['yes', 'no', 'no', 'yes'])

array([[1],
       [0],
       [0],
       [1]])

### Binarizing labels in a pandas DataFrame

In [16]:
employee_data = pd.read_csv("datasets/employee_salary.csv")
employee_data

|   | Designation | Age | Salary | Retired |
|---|---|---|---|---|
| 0 | Manager | 54 | 72000 | Yes |
| 1 | Supervisor | 27 | 32000 | No |
| 2 | Vice-president | 30 | 42000 | No |
| 3 | Manager | 58 | 83000 | Yes |
| 4 | Supervisor | 40 | 35000 | No |
| 5 | Supervisor | 35 | 42000 | No |
| 6 | Employee | 40 | 48000 | No |
| 7 | Vice-president | 55 | 79000 | Yes |
| 8 | Employee | 45 | 67000 | No |
| 9 | Supervisor | 40 | 45000 | No |


In [17]:
designation_binarizer = LabelBinarizer()

In [18]:
designation_data = pd.DataFrame(designation_binarizer.fit_transform(employee_data["Designation"]),
                                columns = designation_binarizer.classes_,
                                index = employee_data.index)

designation_data.head()

|   | Employee | Manager | Supervisor | Vice-president |
|---|---|---|---|---|
| 0 | 0 | 1 | 0 | 0 |
| 1 | 0 | 0 | 1 | 0 |
| 2 | 0 | 0 | 0 | 1 |
| 3 | 0 | 1 | 0 | 0 |
| 4 | 0 | 0 | 1 | 0 |
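For comparison, pandas can build the same indicator columns with `pd.get_dummies`. So that the sketch runs without the CSV file, it re-creates the `Designation` column inline from the table above, then checks that both encodings agree.

```python
import pandas as pd
from sklearn.preprocessing import LabelBinarizer

# Designation values copied from the employee table above
designations = pd.Series(['Manager', 'Supervisor', 'Vice-president', 'Manager', 'Supervisor',
                          'Supervisor', 'Employee', 'Vice-president', 'Employee', 'Supervisor'],
                         name='Designation')

binarizer = LabelBinarizer()
lb_frame = pd.DataFrame(binarizer.fit_transform(designations), columns=binarizer.classes_)

# get_dummies also sorts the categories; dtype=int gives 0/1 instead of booleans
dummies = pd.get_dummies(designations, dtype=int)

print(list(dummies.columns))                                  # same column order
print((lb_frame.to_numpy() == dummies.to_numpy()).all())      # True
```

The practical difference is that a fitted LabelBinarizer remembers `classes_`, so it can encode new data with the same columns, whereas `get_dummies` only sees the values it is given in each call.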


In [19]:
joined_data = employee_data.join(designation_data)

In [20]:
print(designation_binarizer.classes_)

['Employee' 'Manager' 'Supervisor' 'Vice-president']


In [21]:
joined_data.head()

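|   | Designation | Age | Salary | Retired | Employee | Manager | Supervisor | Vice-president |
|---|---|---|---|---|---|---|---|---|
| 0 | Manager | 54 | 72000 | Yes | 0 | 1 | 0 | 0 |
| 1 | Supervisor | 27 | 32000 | No | 0 | 0 | 1 | 0 |
| 2 | Vice-president | 30 | 42000 | No | 0 | 0 | 0 | 1 |
| 3 | Manager | 58 | 83000 | Yes | 0 | 1 | 0 | 0 |
| 4 | Supervisor | 40 | 35000 | No | 0 | 0 | 1 | 0 |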
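The binary `Retired` column can be encoded the same way. Because it has only two values, it becomes a single column. The sketch below copies the `Retired` values from the table above into a Series rather than reading the CSV, and the variable names are illustrative.

```python
import pandas as pd
from sklearn.preprocessing import LabelBinarizer

# 'Retired' values copied from the employee table above
retired = pd.Series(['Yes', 'No', 'No', 'Yes', 'No', 'No', 'No', 'Yes', 'No', 'No'])

retired_binarizer = LabelBinarizer()
retired_encoded = retired_binarizer.fit_transform(retired)  # shape (10, 1)

print(retired_binarizer.classes_)   # ['No' 'Yes']
print(retired_encoded.ravel())      # 1 where Retired is 'Yes', 0 otherwise
```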
