[![Open in Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/justmarkham/scikit-learn-tips/master?filepath=notebooks%2F06_encode_categorical_features.ipynb)

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/justmarkham/scikit-learn-tips/blob/master/notebooks/06_encode_categorical_features.ipynb)

# 🤖⚡ scikit-learn tip #6 ([video](https://www.youtube.com/watch?v=0w78CHM_ubM&list=PL5-da3qGB5ID7YYAqireYEew2mWVvgmj6&index=6))

Two common ways to encode categorical features:

- OneHotEncoder for unordered (nominal) data
- OrdinalEncoder for ordered (ordinal) data

See example 👇

P.S. LabelEncoder is for labels, not features!
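To make the P.S. concrete, here is a minimal sketch of LabelEncoder applied to a target. The `y` values are hypothetical and not part of the example in this notebook; the point is only that LabelEncoder takes a 1D array of labels rather than a 2D block of feature columns.

```python
from sklearn.preprocessing import LabelEncoder

# hypothetical target labels, made up for illustration
y = ['cat', 'dog', 'cat', 'bird']

le = LabelEncoder()
y_encoded = le.fit_transform(y)  # expects a 1D array of labels

print(list(le.classes_))   # ['bird', 'cat', 'dog']
print(y_encoded.tolist())  # [1, 2, 1, 0]
```

The classes are sorted alphabetically and you cannot supply your own order, which is one more reason to prefer OrdinalEncoder for ordered features.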

In [2]:
import pandas as pd
X = pd.DataFrame({'Shape':['square', 'square', 'oval', 'circle'],
                  'Class': ['third', 'first', 'second', 'third'],
                  'Size': ['S', 'S', 'L', 'XL']})

In [3]:
# "Shape" is unordered, "Class" and "Size" are ordered
X

| | Shape | Class | Size |
|---|---|---|---|
| 0 | square | third | S |
| 1 | square | first | S |
| 2 | oval | second | L |
| 3 | circle | third | XL |


In [5]:
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

In [6]:
# left-to-right column order is alphabetical (circle, oval, square)
# scikit-learn 1.2 renamed "sparse" to "sparse_output" (use sparse=False on older versions)
ohe = OneHotEncoder(sparse_output=False)
ohe.fit_transform(X[['Shape']])



array([[0., 0., 1.],
       [0., 0., 1.],
       [0., 1., 0.],
       [1., 0., 0.]])
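If you would rather check the column order than rely on the comment, the fitted encoder stores it in `categories_`. The sketch below rebuilds just the "Shape" column so it runs on its own, and it uses the default sparse output with `.toarray()` so that it does not depend on which name your scikit-learn version uses for the dense-output parameter.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

X = pd.DataFrame({'Shape': ['square', 'square', 'oval', 'circle']})

# the default output is a SciPy sparse matrix; .toarray() makes it dense
ohe = OneHotEncoder()
encoded = ohe.fit_transform(X[['Shape']]).toarray()

# categories_ holds one array per input feature, in output column order
shape_categories = list(ohe.categories_[0])
print(shape_categories)  # ['circle', 'oval', 'square']
```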

In [8]:
# category ordering (within each feature) is defined by you
oe = OrdinalEncoder(categories=[['first', 'second', 'third'], ['S', 'M', 'L', 'XL']])
oe.fit_transform(X[['Class', 'Size']])

array([[2., 0.],
       [0., 0.],
       [1., 2.],
       [2., 3.]])
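Why bother passing `categories`? As a rough illustration, the sketch below encodes only the "Size" column twice: once with the default behavior, which sorts the observed values alphabetically, and once with the explicit order used above. The variable names are just for this example.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

X = pd.DataFrame({'Size': ['S', 'S', 'L', 'XL']})

# no categories given: the encoder sorts the observed values (L, S, XL)
default_codes = OrdinalEncoder().fit_transform(X[['Size']]).ravel().tolist()

# explicit order: codes follow S < M < L < XL, even though M never appears
explicit = OrdinalEncoder(categories=[['S', 'M', 'L', 'XL']])
explicit_codes = explicit.fit_transform(X[['Size']]).ravel().tolist()

print(default_codes)   # [1.0, 1.0, 0.0, 2.0]
print(explicit_codes)  # [0.0, 0.0, 2.0, 3.0]
```

With the default, "L" ends up smaller than "S", so the encoded numbers no longer reflect the real size order.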

### Want more tips? [View all tips on GitHub](https://github.com/justmarkham/scikit-learn-tips) or [Sign up to receive 2 tips by email every week](https://scikit-learn.tips) 💌

© 2020 [Data School](https://www.dataschool.io). All rights reserved.