- 1. Set Up the Environment
First, make sure pandas is installed, then load the student performance dataset into a DataFrame. The examples below bin or encode several of its columns: Age, Study_Hours_per_Week, Gender, and Self_Reported_Stress_Level.



In [43]:
import pandas as pd

# Load the student performance data (path is relative to the notebook)
df = pd.read_csv('../data/student_performance_large_dataset.csv')

In [44]:
df.head()

,Student_ID,Age,Gender,Study_Hours_per_Week,Preferred_Learning_Style,Online_Courses_Completed,Participation_in_Discussions,Assignment_Completion_Rate (%),Exam_Score (%),Attendance_Rate (%),Use_of_Educational_Tech,Self_Reported_Stress_Level,Time_Spent_on_Social_Media (hours/week),Sleep_Hours_per_Night,Final_Grade
0,S00001,18,Female,48,Kinesthetic,14,Yes,100,69,66,Yes,High,9,8,C
1,S00002,29,Female,30,Reading/Writing,20,No,71,40,57,Yes,Medium,28,8,D
2,S00003,20,Female,47,Kinesthetic,11,No,60,43,79,Yes,Low,13,7,D
3,S00004,23,Female,13,Auditory,0,Yes,63,70,60,Yes,Low,24,10,B
4,S00005,19,Female,24,Auditory,19,Yes,59,63,93,Yes,Medium,26,8,C
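If the CSV file is not available, a small stand-in DataFrame can be built from the five rows shown above, keeping only the columns used in this section. This is a sketch for experimenting; with only five rows, the qcut quartiles and the random sample() output will not match the full-dataset results shown in this notebook.

```python
import pandas as pd

# Stand-in for the CSV: the five rows from df.head() above,
# restricted to the columns used later in this section.
df = pd.DataFrame({
    "Student_ID": ["S00001", "S00002", "S00003", "S00004", "S00005"],
    "Age": [18, 29, 20, 23, 19],
    "Gender": ["Female"] * 5,
    "Study_Hours_per_Week": [48, 30, 47, 13, 24],
    "Self_Reported_Stress_Level": ["High", "Medium", "Low", "Low", "Medium"],
    "Final_Grade": ["C", "D", "D", "B", "C"],
})

print(df.shape)  # (5, 6)
```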


- Using cut for Binning Numerical Data

cut is used to bin numerical data into discrete intervals. Let's use it to categorize the Age column into age groups:

In [45]:
# Bin edges: with the default right=True the intervals are
# (17, 20], (20, 25], (25, 30], (30, 40], which match the labels below
bins = [17, 20, 25, 30, 40]
labels = ['18-20', '21-25', '26-30', '31-40']  # One label per interval

df['Age_Group'] = pd.cut(df['Age'], bins=bins, labels=labels)
df[['Age', 'Age_Group']].head()

,Age,Age_Group
0,18,18-20
1,29,26-30
2,20,18-20
3,23,21-25
4,19,18-20
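The `right` parameter decides which edge of each interval is closed, and that determines whether the labels are honest. The sketch below runs the same bins and labels with `right=True` (the default) and `right=False` on a few illustrative ages, and also shows that values outside the bin edges become NaN.

```python
import pandas as pd

ages = pd.Series([18, 20, 21, 25, 26, 30, 31])
bins = [17, 20, 25, 30, 40]
labels = ["18-20", "21-25", "26-30", "31-40"]

# right=True (default): (17, 20], (20, 25], ... so 20 -> "18-20", 25 -> "21-25"
closed_right = pd.cut(ages, bins=bins, labels=labels)

# right=False: [17, 20), [20, 25), ... so 20 -> "21-25", 30 -> "31-40",
# which no longer matches what the labels say
closed_left = pd.cut(ages, bins=bins, labels=labels, right=False)

print(pd.DataFrame({"Age": ages, "right=True": closed_right, "right=False": closed_left}))

# 17 is excluded because the first interval is open on the left; 41 exceeds the last edge
out_of_range = pd.cut(pd.Series([17, 41]), bins=bins, labels=labels)
print(out_of_range.isna().tolist())  # [True, True]
```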


- Using qcut for Quantile-Based Binning

qcut is similar to cut, but it places the bin edges at sample quantiles, so each bin holds roughly the same number of rows (equal-frequency rather than equal-width bins). Let's use it to split Study_Hours_per_Week into quartiles:

In [46]:
df['Study_Hours_Quartile'] = pd.qcut(df['Study_Hours_per_Week'], q=4, labels=['Low', 'Medium', 'High', 'Very High'])
df[['Study_Hours_per_Week', 'Study_Hours_Quartile']].head()

,Study_Hours_per_Week,Study_Hours_Quartile
0,48,Very High
1,30,High
2,47,Very High
3,13,Low
4,24,Medium
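To see the difference between equal-frequency and equal-width bins, the sketch below bins a small illustrative series containing one outlier (the values are made up for the example) with `qcut(q=4)` and with `cut(bins=4)`, then counts the rows per bin.

```python
import pandas as pd

hours = pd.Series([1, 2, 3, 4, 5, 6, 7, 100])  # illustrative data with one outlier
labels = ["Low", "Medium", "High", "Very High"]

# qcut: edges at the 0/25/50/75/100th percentiles, so each bin gets 2 of the 8 rows
by_quantile = pd.qcut(hours, q=4, labels=labels)

# cut with an integer: 4 equal-width intervals between min and max; the outlier
# stretches the range, so most values land in the first bin
by_width = pd.cut(hours, bins=4, labels=labels)

print(by_quantile.value_counts().to_dict())  # 2 rows per bin
print(by_width.value_counts().to_dict())     # 7, 0, 0, 1
```

qcut suits skewed data when you want balanced groups; cut suits cases where the interval boundaries themselves carry meaning, such as the age ranges above.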


- Using map for Value Mapping

map is used to transform values in a Series based on a mapping dictionary or a function. Let's use it to convert the Gender column to numerical values:

In [47]:
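# Quick alternative for a binary column: compare, then cast the booleans to 0/1.
# The column stores 'Female'/'Male', so compare against the full word, not 'F'.
(df['Gender'] == 'Male').astype('int64').head()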

0    0
1    0
2    0
3    0
4    0
Name: Gender, dtype: int64

In [48]:
gender_mapping = {'Female': 0, 'Male': 1}
df['Gender_Numerical'] = df['Gender'].map(gender_mapping)
df[['Gender', 'Gender_Numerical']].head()

,Gender,Gender_Numerical
0,Female,0.0
1,Female,0.0
2,Female,0.0
3,Female,0.0
4,Female,0.0
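The float values (0.0 rather than 0) in the output above are a hint: map returns NaN for any value that is not a key in the dictionary, and a single NaN forces an integer result to float64. This suggests, though the output alone does not confirm it, that Gender contains missing values or categories other than 'Female' and 'Male'. The sketch below uses a hypothetical "Other" value to show the effect and how to list the unmatched values.

```python
import pandas as pd

# Hypothetical column: "Other" and a missing value are not keys in the mapping
gender = pd.Series(["Female", "Male", "Other", None])
gender_mapping = {"Female": 0, "Male": 1}

encoded = gender.map(gender_mapping)
print(encoded.dtype)  # float64, because of the NaN entries

# Raw values that were present but not covered by the mapping
unmapped = gender[encoded.isna() & gender.notna()].unique()
print(unmapped)

# The nullable Int64 dtype keeps integers while allowing missing values
print(encoded.astype("Int64"))
```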


- Using map for Custom Transformations

map also accepts a function, which is called on each value, for custom transformations. Here the dictionary form is used again to relabel Self_Reported_Stress_Level with simplified stress categories:

In [49]:
df['Self_Reported_Stress_Level'].unique()

array(['High', 'Medium', 'Low'], dtype=object)

In [50]:

df['Stress_Simplified'] = df['Self_Reported_Stress_Level'].map(
    {'High': 'Stressed', 'Medium': 'Moderate', 'Low': 'Low_Stress'}
)
df[['Self_Reported_Stress_Level', 'Stress_Simplified']].sample(5)

,Self_Reported_Stress_Level,Stress_Simplified
3209,Medium,Moderate
9495,Low,Low_Stress
7997,Low,Low_Stress
5329,High,Stressed
945,Medium,Moderate
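The function form of map looks like this. The two-group split and the function name `simplify_stress` are hypothetical, chosen only for illustration. Unlike a dictionary, a function is called on every value, including missing ones; `na_action="ignore"` skips missing values and leaves them missing.

```python
import pandas as pd

stress = pd.Series(["High", "Medium", "Low", "Low", "Medium"])

def simplify_stress(level):
    """Hypothetical grouping: collapse the three levels into two."""
    return "Stressed" if level == "High" else "Not_Stressed"

print(stress.map(simplify_stress).tolist())

# Without na_action="ignore", the missing value would be passed to the
# function and labelled "Not_Stressed"
with_gap = pd.Series(["High", None])
print(with_gap.map(simplify_stress, na_action="ignore").isna().tolist())  # [False, True]
```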


Important Notes:

- Data Exploration: Before using these functions, explore your data to understand its distribution and characteristics.
- Binning: Choose appropriate bins for cut and qcut based on your data and the desired categories.
- Mapping: Ensure that your mapping dictionary or function covers all possible values in the column you are transforming.
- Error Handling: A KeyError usually points to a typo in a column name. A typo in a mapping key raises no error: map silently returns NaN for every value it cannot match, so check the result for unexpected NaN values.
- Data Types: cut and qcut require numerical input, while map works with values of any type.
- Missing Values: Handle missing values appropriately before or after applying these transformations.
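The mapping-coverage and missing-value notes can be checked with a few lines of code. `unmapped_values` below is a hypothetical helper, not part of pandas, and the "Very High" stress level is invented to trigger the check.

```python
import pandas as pd

def unmapped_values(series, mapping):
    """Hypothetical helper: non-missing values in `series` with no key in `mapping`."""
    present = set(series.dropna().unique())
    return sorted(present - set(mapping))

stress = pd.Series(["High", "Medium", "Low", None, "Very High"])
mapping = {"High": "Stressed", "Medium": "Moderate", "Low": "Low_Stress"}

print(unmapped_values(stress, mapping))  # ['Very High']

# cut passes missing values through: the NaN age stays NaN after binning
ages = pd.Series([18, None, 27])
print(pd.cut(ages, bins=[17, 20, 25, 30]).isna().tolist())  # [False, True, False]
```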