# What is LabelEncoder?

Scikit-learn works only with numeric values. When string values such as colors (red, blue, green) or cities (london, paris, rome) come in, we need to convert them to numbers first. Conceptually, each distinct string is mapped to an integer code:

     value    code
     ------------------------
     red      0
     blue     1
     green    2

In [19]:
from sklearn import preprocessing

cities           = ['paris', 'paris', 'tokyo', 'amsterdam']

le               = preprocessing.LabelEncoder()
encoded_value    = le.fit_transform(cities)
encoded_value

array([1, 1, 2, 0])

We can always get the city names back from the encoded values with `inverse_transform`.

In [20]:
encoded_city_values = [0, 1, 2]
le.inverse_transform(encoded_city_values)

array(['amsterdam', 'paris', 'tokyo'], dtype='<U9')
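
The codes above are not assigned in order of appearance: 'amsterdam' gets 0 even though it comes last in the list. A minimal sketch of why, using the fitted encoder's `classes_` attribute (the name `code_for_city` is only illustrative and not part of scikit-learn):

```python
from sklearn import preprocessing

cities = ['paris', 'paris', 'tokyo', 'amsterdam']

le = preprocessing.LabelEncoder()
le.fit(cities)

# classes_ holds the unique labels in sorted order;
# a label's position in this array is its integer code.
print(le.classes_)

# Build an explicit label -> code lookup from classes_
# (code_for_city is an illustrative helper name).
code_for_city = {str(city): code for code, city in enumerate(le.classes_)}
print(code_for_city)
```

Because the labels are sorted before codes are assigned, the same set of cities always produces the same codes, regardless of the order in which they appear.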

# Understanding LabelBinarizer

LabelBinarizer is similar to LabelEncoder, except that:
    1. It assigns binary labels to the entries in the list: a single 1 or 0 when there are two states, or a row of 1s and 0s when there are more than two states.
    2. If only two values are present, such as ON and OFF, they are encoded as 1 and 0.
    3. This works because 1 bit is enough to represent 2 states.
    
The following two examples show this:
    1. In the first example, a simple ON, OFF pair is encoded as 1 and 0.
    2. In the second example, a series of switch presses (ON, OFF, ON, OFF) is encoded as 1, 0, 1, 0.

In [23]:
# Create the binarizer before its first use
lb                           = preprocessing.LabelBinarizer()

switch_state                 = ['on', 'off']
encoded_switch_states_values = lb.fit_transform(switch_state)
encoded_switch_states_values

array([[1],
       [0]])

In [26]:
switch_press_sequence = ['on', 'off', 'on', 'off']
sps_values            = lb.fit_transform(switch_press_sequence)
sps_values

array([[1],
       [0],
       [1],
       [0]])
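
With two states the result is a 2-D array with a single column, not a flat list. The sketch below inspects its shape and flattens it with NumPy's `ravel`, in case a 1-D array is more convenient (the names `lb_binary`, `column` and `flat` are illustrative):

```python
from sklearn import preprocessing

lb_binary = preprocessing.LabelBinarizer()
column = lb_binary.fit_transform(['on', 'off', 'on', 'off'])

# The two labels in sorted order; the second one is encoded as 1.
print(lb_binary.classes_)

# One row per input value, one column in total.
print(column.shape)

# Flatten the single column into a 1-D array.
flat = column.ravel()
print(flat)
```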

In this example we look at a sequence of elements which represent 3 states:
    1. ON
    2. OFF
    3. FUZZY

We can't use a single 1 or 0, because we now have a third state. So LabelBinarizer uses one column per state and encodes them as:
    1. 0 0 1 for ON
    2. 0 1 0 for OFF
    3. 1 0 0 for FUZZY

In [22]:
states                = ['on', 'off', 'fuzzy']

lb                    = preprocessing.LabelBinarizer()
encoded_states_values = lb.fit_transform(states)
encoded_states_values

array([[0, 0, 1],
       [0, 1, 0],
       [1, 0, 0]])
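
To see which column stands for which state, we can look at the fitted binarizer's `classes_` attribute. This is a small sketch; the names `lb_states`, `encoded` and `columns` are illustrative:

```python
from sklearn import preprocessing

states = ['on', 'off', 'fuzzy']

lb_states = preprocessing.LabelBinarizer()
encoded = lb_states.fit_transform(states)

# classes_ lists the states in sorted order; column i of the
# output corresponds to classes_[i].
columns = [str(c) for c in lb_states.classes_]
print(columns)

# Print each state next to its encoded row.
for state, row in zip(states, encoded):
    print(state, row.tolist())
```

The columns follow the sorted order of the labels (fuzzy, off, on), which is why 'on' has its 1 in the last column.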

In this example we see a sequence of switch presses like the earlier one, except that we have now introduced a new press state called 'FUZZY'.

LabelBinarizer encodes each press as a row with a single 1 in the column for its state, as shown below.

In [27]:
faulty_switch_press_sequence = ['on', 'off', 'on', 'off', 'fuzzy']
faulty_sps_values            = lb.fit_transform(faulty_switch_press_sequence)
faulty_sps_values

array([[0, 0, 1],
       [0, 1, 0],
       [0, 0, 1],
       [0, 1, 0],
       [1, 0, 0]])

We can also split the work into two separate steps:
    1. First train the binarizer with `fit`.
    2. Then encode values with `transform`.
    
In the examples that follow, we also show how to get the states back from the encoded values.

In [29]:
lb = preprocessing.LabelBinarizer()

lb.fit(faulty_switch_press_sequence)
switch_press_transformed = lb.transform(['on', 'fuzzy'])
switch_press_transformed

array([[0, 0, 1],
       [1, 0, 0]])

In [30]:
lb.inverse_transform(switch_press_transformed)

array(['on', 'fuzzy'], dtype='<U5')
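
For rows like these, which contain exactly one 1, decoding amounts to finding the column that holds the 1 and looking up the matching entry in `classes_`. The sketch below does this by hand with NumPy and compares it with `inverse_transform`. It illustrates the idea only and is not a description of scikit-learn's internal implementation; the names `lb_demo`, `hot_columns` and `decoded_by_hand` are illustrative.

```python
import numpy as np
from sklearn import preprocessing

lb_demo = preprocessing.LabelBinarizer()
lb_demo.fit(['on', 'off', 'on', 'off', 'fuzzy'])

encoded = lb_demo.transform(['on', 'fuzzy'])

# Index of the column holding the 1 in each row.
hot_columns = np.argmax(encoded, axis=1)

# Look up the label stored at each of those positions.
decoded_by_hand = lb_demo.classes_[hot_columns]

# The built-in decoder gives the same labels.
decoded = lb_demo.inverse_transform(encoded)

print(decoded_by_hand)
print(decoded)
```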