You have a feature with nominal classes that have no intrinsic ordering (e.g.,
apple, pear, banana).

One-hot encode the feature using scikit-learn’s LabelBinarizer:


In [11]:
from sklearn.preprocessing import LabelBinarizer, MultiLabelBinarizer
import numpy as np
import pandas as pd



In [12]:
# Create feature
feature = np.array([["Texas"],
                    ["California"],
                    ["Texas"],
                    ["Delaware"],
                    ["Texas"]])

In [13]:
# Create one-hot encoder and one-hot encode the feature
one_hot = LabelBinarizer()
one_hot.fit_transform(feature)

array([[0, 0, 1],
       [1, 0, 0],
       [0, 0, 1],
       [0, 1, 0],
       [0, 0, 1]])

In [14]:
one_hot.classes_

array(['California', 'Delaware', 'Texas'], dtype='<U10')

If we want to reverse the one-hot encoding, we can use inverse_transform:

In [15]:
# Reverse one-hot encoding (the encoder is already fitted, so transform suffices)
one_hot.inverse_transform(one_hot.transform(feature))

array(['Texas', 'California', 'Texas', 'Delaware', 'Texas'], dtype='<U10')

We can also use pandas to one-hot encode the feature:

In [18]:
# Create dummy variables from feature
pd.get_dummies(feature[:,0])


   California  Delaware  Texas
0           0         0      1
1           1         0      0
2           0         0      1
3           0         1      0
4           0         0      1


scikit-learn can also handle the situation where each observation lists
multiple classes:

In [19]:
# Create multiclass feature
multiclass_feature = [("Texas", "Florida"),
                      ("California", "Alabama"),
                      ("Texas", "Florida"),
                      ("Delaware", "Florida"),
                      ("Texas", "Alabama")]

In [20]:
# Create multiclass one-hot encoder and one-hot encode the feature
one_hot_multi = MultiLabelBinarizer()
one_hot_multi.fit_transform(multiclass_feature)

array([[0, 0, 0, 1, 1],
       [1, 1, 0, 0, 0],
       [0, 0, 0, 1, 1],
       [0, 0, 1, 1, 0],
       [1, 0, 0, 0, 1]])

In [21]:
one_hot_multi.classes_

array(['Alabama', 'California', 'Delaware', 'Florida', 'Texas'],
      dtype=object)
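
As with LabelBinarizer, the multilabel encoding can be reversed. The following is a minimal, self-contained sketch that rebuilds the feature and calls MultiLabelBinarizer's inverse_transform; the variable names encoded and decoded are illustrative only. Note that each recovered tuple lists its classes in the sorted order of classes_, not in the order they were originally written.

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Same kind of multiclass feature as above: two classes per observation
multiclass_feature = [("Texas", "Florida"),
                      ("California", "Alabama"),
                      ("Texas", "Florida"),
                      ("Delaware", "Florida"),
                      ("Texas", "Alabama")]

# Fit the encoder and one-hot encode the feature
one_hot_multi = MultiLabelBinarizer()
encoded = one_hot_multi.fit_transform(multiclass_feature)

# Map each row of 0s and 1s back to a tuple of class labels
decoded = one_hot_multi.inverse_transform(encoded)
print(decoded)
```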

We might think the proper strategy is to assign each class a numerical value
(e.g., Texas = 1, California = 2). However, when our classes have no intrinsic
ordering (e.g., Texas isn’t “less” than California), our numerical values
erroneously create an ordering that is not present.
The proper strategy is to create a binary feature for each class in the original
feature. This is often called one-hot encoding (in machine learning literature) or
dummying (in statistical and research literature). Our solution’s feature was a
vector containing three classes (i.e., Texas, California, and Delaware). In one-hot
encoding, each class becomes its own feature with 1s when the class appears and
0s otherwise. Because our feature had three classes, one-hot encoding returned
three binary features (one for each class). By using one-hot encoding we can
capture the membership of an observation in a class while preserving the notion
that the class lacks any sort of hierarchy.
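
To make the mechanics concrete, here is a minimal sketch of one-hot encoding done by hand with NumPy instead of scikit-learn. The names classes, codes, and manual_one_hot are illustrative, and indexing an identity matrix is just one of several ways to build the result.

```python
import numpy as np

# The same feature as in the solution, as a flat array
feature = np.array(["Texas", "California", "Texas", "Delaware", "Texas"])

# classes: sorted unique labels; codes: index of each observation's class
classes, codes = np.unique(feature, return_inverse=True)

# Row i of the identity matrix is the one-hot vector for class i,
# so indexing by codes yields one binary column per class
manual_one_hot = np.eye(len(classes), dtype=int)[codes]

print(classes)
print(manual_one_hot)
```

Each row contains exactly one 1, in the column of that observation's class, which is why no ordering between classes is implied.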

Finally, after one-hot encoding a feature, it is often recommended to drop one
of the resulting binary features to avoid linear dependence: because the
columns for all classes always sum to 1, any one of them can be inferred from
the others.
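
One way to do this, sketched below, is the drop_first parameter of pandas' get_dummies. The example also compares matrix ranks to show the dependence once an intercept column is added; the intercept column and the names full, reduced, rank_full, and rank_reduced are assumptions made for illustration. dtype=int is passed because recent pandas versions return boolean columns by default.

```python
import numpy as np
import pandas as pd

feature = pd.Series(["Texas", "California", "Texas", "Delaware", "Texas"])

# Full one-hot encoding: one column per class
full = pd.get_dummies(feature, dtype=int)

# drop_first=True drops the first class column (here "California"),
# which becomes the implicit baseline class
reduced = pd.get_dummies(feature, drop_first=True, dtype=int)

# With an intercept column, the full encoding is rank deficient
# because its class columns sum to the intercept column
intercept = np.ones((len(feature), 1))
rank_full = np.linalg.matrix_rank(np.hstack([intercept, full.to_numpy()]))
rank_reduced = np.linalg.matrix_rank(np.hstack([intercept, reduced.to_numpy()]))

print(rank_full, full.shape[1] + 1)        # rank vs. number of columns
print(rank_reduced, reduced.shape[1] + 1)  # rank vs. number of columns
```

In the reduced encoding, an observation with 0 in every remaining column belongs to the dropped class.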
