**Encoding** is the process of converting categorical data into a numerical format that machine learning algorithms can process. \
This step is essential because most algorithms perform arithmetic on their inputs and cannot work with raw string categories directly. \
Common encoding techniques include Label Encoding, One-Hot Encoding, and Ordinal Encoding.
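To make the differences concrete, here is a minimal, library-free sketch of what each technique produces for a single column. The helper names (`label_encode`, `ordinal_encode`, `one_hot_encode`) are hypothetical; they only mimic the default behavior of the scikit-learn encoders used below, assuming categories are sorted alphabetically unless an explicit order is given.

```python
def label_encode(values):
    # Hypothetical helper: integers assigned to the sorted unique categories
    classes = sorted(set(values))
    index = {c: i for i, c in enumerate(classes)}
    return [index[v] for v in values]

def ordinal_encode(values, order):
    # Hypothetical helper: integers follow an explicit, user-defined order
    index = {c: i for i, c in enumerate(order)}
    return [index[v] for v in values]

def one_hot_encode(values):
    # Hypothetical helper: one binary column per sorted unique category
    classes = sorted(set(values))
    return [[1 if v == c else 0 for c in classes] for v in values]

days = ["Sun", "Thur", "Sat"]
print(label_encode(days))                                   # [1, 2, 0]
print(ordinal_encode(days, ["Thur", "Fri", "Sat", "Sun"]))  # [3, 0, 2]
print(one_hot_encode(days))  # [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
```

Label encoding ranks 'Sat' < 'Sun' < 'Thur' purely alphabetically, which carries no meaning for days of the week; when categories have a natural ranking, an explicit order (ordinal encoding) is the better fit.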

In [6]:
# Import libraries
import pandas as pd
import seaborn as sns
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, OrdinalEncoder

# Load seaborn's built-in 'tips' sample dataset (fetched online on first use)
df = sns.load_dataset("tips")

In [7]:
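# 1. Label Encoding
# Maps each category to an integer label. LabelEncoder sorts the classes
# alphabetically, so 'Dinner' → 0 and 'Lunch' → 1.
# Note: scikit-learn intends LabelEncoder for target labels (y); for input
# features, OrdinalEncoder is the usual choice.
le = LabelEncoder()
df['time_encoded'] = le.fit_transform(df['time'])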
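To check which integer each category received, a fitted `LabelEncoder` exposes the categories in label order through `classes_`, and `inverse_transform` maps integers back to strings. A short sketch on a toy list (assumed data, not the `tips` dataset):

```python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
codes = le.fit_transform(["Dinner", "Lunch", "Dinner", "Lunch"])

# classes_ holds the categories sorted alphabetically; position = label
print(le.classes_.tolist())                   # ['Dinner', 'Lunch']
print(codes.tolist())                         # [0, 1, 0, 1]

# inverse_transform maps integer labels back to the original strings
print(le.inverse_transform([1, 0]).tolist())  # ['Lunch', 'Dinner']

# An explicit category -> label lookup table
mapping = {cls: i for i, cls in enumerate(le.classes_.tolist())}
print(mapping)                                # {'Dinner': 0, 'Lunch': 1}
```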

In [8]:
# 2. Ordinal Encoding
# Maps categories to integers in an order you specify explicitly via `categories`.
# Here, the order of days is Thur < Fri < Sat < Sun, so Thur → 0, ..., Sun → 3.
# The result is a float array by default (e.g., 3.0 for Sun).
ode = OrdinalEncoder(categories=[['Thur', 'Fri', 'Sat', 'Sun']])
df['day_encoded'] = ode.fit_transform(df[['day']])
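By default, `OrdinalEncoder` raises an error when `transform` meets a category that is not in `categories`. A minimal sketch, on toy input rather than the `tips` data, of mapping unseen values to a sentinel instead with `handle_unknown='use_encoded_value'` (available in scikit-learn 0.24 and later); the choice of `-1` as the sentinel and the value `'Mon'` are assumptions for illustration:

```python
from sklearn.preprocessing import OrdinalEncoder

ode = OrdinalEncoder(
    categories=[["Thur", "Fri", "Sat", "Sun"]],  # explicit order: Thur < Fri < Sat < Sun
    handle_unknown="use_encoded_value",          # don't raise on unseen categories
    unknown_value=-1,                            # assumed sentinel for unseen categories
)
ode.fit([["Thur"], ["Sat"], ["Sun"]])

# 'Fri' is in `categories` even though it was absent from the fit data;
# 'Mon' is not in `categories`, so it gets the sentinel
encoded = ode.transform([["Sun"], ["Fri"], ["Mon"]])
print(encoded.ravel().tolist())  # [3.0, 1.0, -1.0]
```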

In [9]:
# 3. One-Hot Encoding
# Converts each category into its own binary column (dummy variable).
# Categories are sorted alphabetically, so 'Female' → [1, 0] and 'Male' → [0, 1].
ohe = OneHotEncoder()
# fit_transform returns a sparse matrix by default; .toarray() makes it dense
onehot_encoded = ohe.fit_transform(df[['sex']]).toarray()

In [10]:
# Create a DataFrame for the one-hot encoded values, reusing df's index so
# the rows line up correctly in the concatenation below
ohe_df = pd.DataFrame(onehot_encoded, columns=ohe.get_feature_names_out(['sex']), index=df.index)

# Concatenate with the original DataFrame
df = pd.concat([df, ohe_df], axis=1)

# Display the first few rows
print(df.head())

   total_bill   tip     sex smoker  day    time  size  time_encoded  \
0       16.99  1.01  Female     No  Sun  Dinner     2             0   
1       10.34  1.66    Male     No  Sun  Dinner     3             0   
2       21.01  3.50    Male     No  Sun  Dinner     3             0   
3       23.68  3.31    Male     No  Sun  Dinner     2             0   
4       24.59  3.61  Female     No  Sun  Dinner     4             0   

   day_encoded  sex_Female  sex_Male  
0          3.0         1.0       0.0  
1          3.0         0.0       1.0  
2          3.0         0.0       1.0  
3          3.0         0.0       1.0  
4          3.0         1.0       0.0  
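For quick exploration, pandas' `pd.get_dummies` produces the same kind of dummy columns without a separate fit step. A minimal sketch on a toy frame (assumed data); note that, unlike a fitted `OneHotEncoder`, it does not remember the categories, so new data missing a category will yield fewer columns:

```python
import pandas as pd

toy = pd.DataFrame({"sex": ["Female", "Male", "Male"], "tip": [1.01, 1.66, 3.50]})

# columns=['sex'] replaces 'sex' with one 0/1 column per category,
# appended after the remaining columns
dummies = pd.get_dummies(toy, columns=["sex"], dtype=int)
print(dummies.columns.tolist())      # ['tip', 'sex_Female', 'sex_Male']
print(dummies["sex_Male"].tolist())  # [0, 1, 1]
```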
