You have an ordinal categorical feature (e.g., high, medium, low).
Use the pandas DataFrame's replace method to transform the string labels into
their numerical equivalents:


In [13]:
# Load library
import pandas as pd
# Create features
dataframe = pd.DataFrame({"Score": ["Low", "Low", "Medium", "Medium", "High"]})
dataframe


    Score
0     Low
1     Low
2  Medium
3  Medium
4    High


In [14]:
# Create mapper from string labels to ordered numerical values
scale_mapper = {"Low": 1,
                "Medium": 2,
                "High": 3}

In [15]:
# Replace matching values in every column of the DataFrame
dataframe.replace(scale_mapper)

   Score
0      1
1      1
2      2
3      2
4      3


In [16]:
# Replace values in the Score column only
dataframe["Score"].replace(scale_mapper)

0    1
1    1
2    2
3    2
4    3
Name: Score, dtype: int64
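
One practical detail when applying a mapper: replace leaves any label that is missing from the dictionary untouched, so a misspelled or unexpected class can slip through as a string. The sketch below is one possible check, not part of the solution above. It uses the Series map method, which returns NaN for labels absent from the dictionary, and the label "Very High" is a hypothetical value added only for illustration.

```python
import pandas as pd

# Same mapper as in the solution
scale_mapper = {"Low": 1, "Medium": 2, "High": 3}

# Hypothetical feature containing a label the mapper does not know about
scores = pd.Series(["Low", "High", "Very High"], name="Score")

# map() turns labels that are missing from the dictionary into NaN
mapped = scores.map(scale_mapper)

# Collect the original labels that could not be mapped
unmapped_labels = scores[mapped.isna()].unique()
```

Here unmapped_labels lists the classes that still need an entry in the mapper. Recent pandas releases may also emit a warning when replace changes a column's dtype, which is another reason some people prefer map for this step.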

Often we have a feature with classes that have some kind of natural ordering. A
famous example is the Likert scale:
- Strongly Agree
- Agree
- Neutral
- Disagree
- Strongly Disagree

When encoding the feature for use in machine learning, we need to transform the
ordinal classes into numerical values that maintain the notion of ordering. The
most common approach is to create a dictionary that maps the string label of the
class to a number and then apply that map to the feature.
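
As a minimal sketch of that approach applied to the Likert scale, the example below maps each label to an integer. The numeric values (1 through 5, with larger numbers meaning stronger agreement), the sample responses, and the variable names are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical survey responses on the Likert scale
responses = pd.Series(["Agree", "Neutral", "Strongly Disagree", "Strongly Agree"],
                      name="Response")

# Assumed encoding: larger numbers mean stronger agreement
likert_mapper = {"Strongly Disagree": 1,
                 "Disagree": 2,
                 "Neutral": 3,
                 "Agree": 4,
                 "Strongly Agree": 5}

# Apply the mapping to the feature
encoded = responses.map(likert_mapper)
```

The encoded values preserve the ordering of the classes, so comparisons such as encoded >= 4 select every response that agrees.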

It is important that our choice of numeric values is based on our prior
knowledge of the ordinal classes. In our solution, High is literally three times
larger than Low. This is fine in many instances, but can break down if the
intervals between the classes are not actually equal:



In [17]:
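# Create a feature with an extra class between Medium and High
dataframe = pd.DataFrame({"Score": ["Low",
                                    "Low",
                                    "Medium",
                                    "Medium",
                                    "High",
                                    "Barely More Than Medium"]})

# Map each class to an evenly spaced integer
scale_mapper = {"Low": 1,
                "Medium": 2,
                "Barely More Than Medium": 3,
                "High": 4}

# Replace feature values with scale
dataframe["Score"].replace(scale_mapper)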

0    1
1    1
2    2
3    2
4    4
5    3
Name: Score, dtype: int64

In this example, the distance between Low and Medium is the same as the distance
between Medium and Barely More Than Medium, which is almost certainly not
accurate. The best approach is to be deliberate about the numerical values
mapped to the classes:


In [18]:
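# Map classes to values that reflect the assumed distance between them
scale_mapper = {"Low": 1,
                "Medium": 2,
                "Barely More Than Medium": 2.1,
                "High": 3}

# Replace feature values with scale
dataframe["Score"].replace(scale_mapper)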

0    1.0
1    1.0
2    2.0
3    2.0
4    3.0
5    2.1
Name: Score, dtype: float64
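
One way to sanity-check a mapper like this is to look at the gap between each class and the one below it. The following is a rough sketch under that idea rather than a required step, and the names ordered and gaps are introduced here only for illustration.

```python
import pandas as pd

# Mapper from the example above
scale_mapper = {"Low": 1,
                "Medium": 2,
                "Barely More Than Medium": 2.1,
                "High": 3}

# Order the classes by their assigned value
ordered = pd.Series(scale_mapper).sort_values()

# Distance from each class to the class ranked just below it
# (the lowest class has no predecessor, so its gap is NaN)
gaps = ordered.diff()
```

Inspecting gaps makes the assumed intervals explicit: the step from Medium to Barely More Than Medium is roughly 0.1, while the other steps are close to 1.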