- 1. Set Up the Environment
First, ensure you have pandas installed and create a sample DataFrame with a "Grade" column containing the categorical values you want to encode.



In [33]:
import pandas as pd

# Sample DataFrame
data = {
    "Student": ["Alice", "Bob", "Charlie", "David"],
    "Grade": ["1st Class", "2nd Class", "3rd Class", "1st Class"]
}
df = pd.DataFrame(data)

print("Original DataFrame:")
df.head()

Original DataFrame:


   Student      Grade
0    Alice  1st Class
1      Bob  2nd Class
2  Charlie  3rd Class
3    David  1st Class


- 2. Define the Mapping Dictionary
Use the dictionary you already have, which maps each category to its numeric code:



In [34]:
grade_mapping = {"1st Class": 1, "2nd Class": 2, "3rd Class": 3}

- 3. Map the "Grade" Column Using map()
Use the Series method map() to look up each value of the "Grade" column in the dictionary. Assigning the result to a new "Grade_encoded" column keeps the original categorical values alongside their numeric codes.


In [35]:
# Apply the mapping to the "Grade" column
df["Grade_encoded"] = df["Grade"].map(grade_mapping)

print("DataFrame after mapping:")
df

DataFrame after mapping:


   Student      Grade  Grade_encoded
0    Alice  1st Class              1
1      Bob  2nd Class              2
2  Charlie  3rd Class              3
3    David  1st Class              1


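- 4. Handle Missing Values
If your DataFrame contains "Grade" values that are not keys in the dictionary (e.g., "4th Class"), map() returns NaN for those rows. Because NaN is a floating-point value, the mapped column becomes float (1.0, 2.0, ...) rather than integer. You can replace the NaN entries with a default value using fillna(). Note that this example overwrites the "Grade" column instead of creating a new one: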



In [36]:
# Example with a missing value
data_with_missing = {
    "Student": ["Alice", "Bob", "Charlie", "David", "Eve"],
    "Grade": ["1st Class", "2nd Class", "3rd Class", "1st Class", "4th Class"]
}
df_with_missing = pd.DataFrame(data_with_missing)

# Map the values, then fill NaN with a default value (e.g., -1)
df_with_missing["Grade"] = df_with_missing["Grade"].map(grade_mapping).fillna(-1)

print("DataFrame with missing value handled:")
print(df_with_missing)

DataFrame with missing value handled:
   Student  Grade
0    Alice    1.0
1      Bob    2.0
2  Charlie    3.0
3    David    1.0
4      Eve   -1.0
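Before filling unmapped values with a default, it can help to see which categories were not in the dictionary, and to cast the result back to integers afterwards. The following is a minimal sketch under those assumptions; the names `df_check`, `encoded`, and `unmapped` are illustrative and not part of the original example.

```python
import pandas as pd

grade_mapping = {"1st Class": 1, "2nd Class": 2, "3rd Class": 3}

# Illustrative frame (hypothetical name) with one category missing from the dictionary.
df_check = pd.DataFrame({
    "Student": ["Alice", "Bob", "Charlie", "David", "Eve"],
    "Grade": ["1st Class", "2nd Class", "3rd Class", "1st Class", "4th Class"],
})

# Map into a separate Series so the original "Grade" strings stay available.
encoded = df_check["Grade"].map(grade_mapping)

# Rows where map() found no dictionary key come back as NaN.
unmapped = df_check.loc[encoded.isna(), "Grade"].unique()
print("Unmapped categories:", list(unmapped))

# NaN forces a float dtype; after filling, cast back to integers.
df_check["Grade_encoded"] = encoded.fillna(-1).astype(int)
print(df_check)
```

Whether -1 is a sensible default depends on how the codes are used later; a sentinel that a model could mistake for a real grade may be worse than dropping or reviewing those rows.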


- 5. Verify the Encoding
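To check the result, inspect the unique values and data type of a column. Note that step 3 wrote the codes to "Grade_encoded", so the "Grade" column inspected here still holds the original strings with dtype object: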



In [37]:
# Use single quotes inside the f-string; nested double quotes are a syntax error before Python 3.12
print(f"Unique values in Grade: {df['Grade'].unique()}")
print(f"Data type of Grade:{df['Grade'].dtype}")

Unique values in Grade: ['1st Class' '2nd Class' '3rd Class']
Data type of Grade:object
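To verify the encoded column itself, the same checks can be pointed at "Grade_encoded". The sketch below rebuilds the small example so that it runs on its own; `df_verify`, `all_known`, and `none_missing` are hypothetical names introduced here for illustration.

```python
import pandas as pd

grade_mapping = {"1st Class": 1, "2nd Class": 2, "3rd Class": 3}

# Rebuild the example frame (hypothetical name) and apply the mapping from step 3.
df_verify = pd.DataFrame({
    "Student": ["Alice", "Bob", "Charlie", "David"],
    "Grade": ["1st Class", "2nd Class", "3rd Class", "1st Class"],
})
df_verify["Grade_encoded"] = df_verify["Grade"].map(grade_mapping)

# Every encoded value should be one of the dictionary's values...
all_known = df_verify["Grade_encoded"].isin(list(grade_mapping.values())).all()
# ...and no row should have been left unmapped (NaN).
none_missing = df_verify["Grade_encoded"].notna().all()

print(f"Unique values in Grade_encoded: {df_verify['Grade_encoded'].unique()}")
print(f"Data type of Grade_encoded: {df_verify['Grade_encoded'].dtype}")
print(f"All values known: {all_known}, none missing: {none_missing}")
```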


- 7. Example with One-Hot Encoding (Optional)
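If you want to avoid implying an ordinal relationship between categories, use one-hot encoding instead. First drop the "Grade_encoded" column added in step 3 so that only the original columns remain, then encode "Grade":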



In [38]:
# Remove the label-encoded column so only the original columns remain
df = df.drop(columns="Grade_encoded")
df

   Student      Grade
0    Alice  1st Class
1      Bob  2nd Class
2  Charlie  3rd Class
3    David  1st Class


In [39]:
# One-hot encode the "Grade" column
df_one_hot = pd.get_dummies(df, columns=["Grade"], prefix="Grade")

print("DataFrame with one-hot encoding:")
df_one_hot

DataFrame with one-hot encoding:


   Student  Grade_1st Class  Grade_2nd Class  Grade_3rd Class
0    Alice             True            False            False
1      Bob            False             True            False
2  Charlie            False            False             True
3    David             True            False            False


In [40]:
df

   Student      Grade
0    Alice  1st Class
1      Bob  2nd Class
2  Charlie  3rd Class
3    David  1st Class
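The one-hot output above uses boolean indicator columns. If 0/1 integers are preferred, for example for export or for a model that expects numeric input, pd.get_dummies() accepts a dtype argument. A minimal sketch, with `df_demo` and `df_one_hot_int` as illustrative names:

```python
import pandas as pd

# Illustrative copy of the example data (hypothetical name).
df_demo = pd.DataFrame({
    "Student": ["Alice", "Bob", "Charlie", "David"],
    "Grade": ["1st Class", "2nd Class", "3rd Class", "1st Class"],
})

# dtype=int requests 0/1 indicator columns instead of True/False.
df_one_hot_int = pd.get_dummies(df_demo, columns=["Grade"], prefix="Grade", dtype=int)
print(df_one_hot_int)

# get_dummies() returns a new DataFrame; df_demo keeps its "Grade" column.
print(list(df_demo.columns))
```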


- 8. Additional Notes
Categorical Encoding: This method is a form of label encoding, where categories are mapped to numerical values. For machine learning, be cautious with label encoding, as it can imply ordinal relationships (e.g., 3 > 2 > 1), which may not always be appropriate. If your "Grade" categories are nominal (no inherent order), consider using one-hot encoding instead (pd.get_dummies() or sklearn.preprocessing.OneHotEncoder).
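When the categories do have a meaningful order, one alternative to a hand-written dictionary is an ordered pandas Categorical, which records the order explicitly. This is a sketch of that swapped-in technique, not the method used above; `grades`, `grade_order`, and `codes` are hypothetical names.

```python
import pandas as pd

grades = pd.Series(["1st Class", "2nd Class", "3rd Class", "1st Class"])

# Declare the category order explicitly instead of relying on a hand-written dictionary.
grade_order = ["1st Class", "2nd Class", "3rd Class"]
grades_cat = pd.Categorical(grades, categories=grade_order, ordered=True)

# .codes are zero-based positions in grade_order; values outside the list get -1.
codes = pd.Series(grades_cat.codes, index=grades.index)
print(codes.tolist())
```

The codes start at 0 rather than 1, so add 1 if they need to match the values in grade_mapping.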

Performance: When given a dictionary, map() performs a vectorized lookup instead of calling a Python function for each row, so it scales well to large DataFrames.

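Permanence: map() does not modify the DataFrame in place and has no inplace parameter; it returns a new Series. The DataFrame changes only when you assign that result back to a column. Assigning to "Grade" overwrites the original categories (as in step 4), while assigning to a new column such as "Grade_encoded" (as in step 3) preserves them.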
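The difference between calling map() and assigning its result can be sketched as follows. The frame `df_keep` and the variable `encoded` are illustrative names, not part of the original example.

```python
import pandas as pd

grade_mapping = {"1st Class": 1, "2nd Class": 2, "3rd Class": 3}

# Small illustrative frame (hypothetical name).
df_keep = pd.DataFrame({
    "Student": ["Alice", "Bob"],
    "Grade": ["1st Class", "2nd Class"],
})

# map() returns a new Series; df_keep is unchanged until the result is assigned.
encoded = df_keep["Grade"].map(grade_mapping)
print(list(df_keep.columns))

# Assigning to a new column keeps the original strings next to the codes.
df_keep["Grade_encoded"] = encoded
print(df_keep)
```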

