## 🧠 One-Hot and Label Encoding Tutorial
Welcome to this educational notebook on **One-Hot Encoding** and **Label Encoding** with several Python libraries. We'll use the dataset 📁 `historical_record.csv` from the ISE518 course.

We'll cover:
- 📦 `pandas`
- ⚙️ `Scikit-Learn`
- 🧰 `Feature-engine`
- 🔧 `Featuretools`

Let's get started!

In [2]:
# 📥 Load the dataset
import pandas as pd

url = 'https://raw.githubusercontent.com/Dr-AlaaKhamis/ISE518/refs/heads/main/5_Datafication/data/historical/historical_record.csv'
df = pd.read_csv(url)
df.head()

Unnamed: 0,timestamp,machine_id,temperature,vibration,humidity,pressure,energy_consumption,machine_status,anomaly_flag,predicted_remaining_life,failure_type,downtime_risk,maintenance_required
0,2025-01-01 00:00:00,39,78.61,28.65,79.96,3.73,2.16,1,0,106,Normal,0.0,0
1,2025-01-01 00:01:00,29,68.19,57.28,35.94,3.64,0.69,1,0,320,Normal,0.0,0
2,2025-01-01 00:02:00,15,98.94,50.2,72.06,1.0,2.49,1,1,19,Normal,1.0,1
3,2025-01-01 00:03:00,43,90.91,37.65,30.34,3.15,4.96,1,1,10,Normal,1.0,1
4,2025-01-01 00:04:00,8,72.32,40.69,56.71,2.68,0.63,2,0,65,Vibration Issue,0.0,1


#### 🐼 One-hot and Label Encoding using **pandas**

**Pandas** provides easy-to-use methods for encoding categorical variables:
- `pd.get_dummies()` for one-hot encoding
- `astype('category').cat.codes` for label encoding

In [3]:
# 🔍 Let's encode the 'machine_status' column
# One-hot encoding
df_pandas_ohe = pd.get_dummies(df, columns=['machine_status'])

# Label encoding
df['Machine_Status_Label'] = df['machine_status'].astype('category').cat.codes

# Show the result
df_pandas_ohe.head()

Unnamed: 0,timestamp,machine_id,temperature,vibration,humidity,pressure,energy_consumption,anomaly_flag,predicted_remaining_life,failure_type,downtime_risk,maintenance_required,machine_status_0,machine_status_1,machine_status_2
0,2025-01-01 00:00:00,39,78.61,28.65,79.96,3.73,2.16,0,106,Normal,0.0,0,False,True,False
1,2025-01-01 00:01:00,29,68.19,57.28,35.94,3.64,0.69,0,320,Normal,0.0,0,False,True,False
2,2025-01-01 00:02:00,15,98.94,50.2,72.06,1.0,2.49,1,19,Normal,1.0,1,False,True,False
3,2025-01-01 00:03:00,43,90.91,37.65,30.34,3.15,4.96,1,10,Normal,1.0,1,False,True,False
4,2025-01-01 00:04:00,8,72.32,40.69,56.71,2.68,0.63,0,65,Vibration Issue,0.0,1,False,False,True
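
The same two pandas calls also work on a text column, where it helps to keep the code-to-category mapping around for decoding later. The sketch below is a minimal illustration on a toy frame: the `toy` DataFrame and the value `Overheating` are made up for this example and are not taken from `historical_record.csv`.

```python
import pandas as pd

# Toy stand-in for a categorical column (illustrative values, not the real dataset)
toy = pd.DataFrame(
    {"failure_type": ["Normal", "Vibration Issue", "Normal", "Overheating"]}
)

# One-hot encoding; dtype=int yields 0/1 columns instead of True/False
toy_ohe = pd.get_dummies(toy, columns=["failure_type"], dtype=int)

# Label encoding; string categories are sorted, so codes follow alphabetical order
cat = toy["failure_type"].astype("category")
toy["failure_type_label"] = cat.cat.codes

# Keep the code -> category mapping so the integers can be decoded later
code_to_label = dict(enumerate(cat.cat.categories))

print(toy_ohe)
print(toy)
print(code_to_label)
```

Note that the integer codes carry no meaning beyond identity: `Overheating` receiving code 1 only reflects alphabetical sorting, not any ranking of the categories.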


#### ⚙️ One-hot and Label Encoding using **Scikit-Learn**
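`Scikit-Learn` offers:
- `LabelEncoder` for label encoding (designed for target labels; `OrdinalEncoder` is the equivalent for feature columns)
- `OneHotEncoder` for one-hot encoding (returns a SciPy sparse matrix by default, or a NumPy array when dense output is requested, so we often convert the result to a DataFrame)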

In [None]:
# ⚙️ Scikit-Learn encoding
# !pip install scikit-learn
import sklearn
from sklearn.preprocessing import OneHotEncoder
from packaging import version

# The `sparse` argument was renamed to `sparse_output` in scikit-learn 1.2
if version.parse(sklearn.__version__) >= version.parse("1.2"):
    ohe = OneHotEncoder(sparse_output=False)
else:
    ohe = OneHotEncoder(sparse=False)

encoded = ohe.fit_transform(df[['machine_status']])
df_ohe_sk = pd.DataFrame(encoded, columns=ohe.get_feature_names_out(['machine_status']))
df_ohe_sk.head()

Unnamed: 0,machine_status_0,machine_status_1,machine_status_2
0,0.0,1.0,0.0
1,0.0,1.0,0.0
2,0.0,1.0,0.0
3,0.0,1.0,0.0
4,0.0,0.0,1.0
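
The cell above covers only the one-hot half of this section. A label-encoding counterpart with `LabelEncoder` might look like the sketch below. It runs on a small toy frame rather than the course dataset: the `toy` DataFrame and the value `Overheating` are assumptions made for illustration.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy stand-in for a categorical column (illustrative values, not the real dataset)
toy = pd.DataFrame(
    {"failure_type": ["Normal", "Vibration Issue", "Normal", "Overheating"]}
)

# Fit the encoder and replace each category with an integer code
le = LabelEncoder()
toy["failure_type_label"] = le.fit_transform(toy["failure_type"])

# classes_ holds the sorted unique values; a value's position is its code
mapping = {label: code for code, label in enumerate(le.classes_)}

# inverse_transform maps the integer codes back to the original labels
decoded = le.inverse_transform(toy["failure_type_label"])

print(toy)
print(mapping)
```

`LabelEncoder` accepts a single 1-D column at a time. To encode several feature columns at once, `sklearn.preprocessing.OrdinalEncoder` takes a 2-D input and is usually the better fit inside a pipeline.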


#### 🧰 One-hot and Label Encoding using **Feature-engine**
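`Feature-engine` transformers take and return pandas DataFrames, follow the Scikit-Learn `fit`/`transform` API, and let you choose which variables to encode.
- `OneHotEncoder()` for one-hot encoding
- `OrdinalEncoder()` for label encoding (each category is replaced by an integer)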

In [None]:
# !pip install feature_engine
from feature_engine.encoding import OneHotEncoder as FEOneHotEncoder
from feature_engine.encoding import OrdinalEncoder as FELabelEncoder

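# Feature-engine encoders only act on object/categorical columns by default,
# so cast the integer 'machine_status' column to 'category' first
df['machine_status'] = df['machine_status'].astype('category')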

# One-hot encoding
fe_ohe = FEOneHotEncoder(variables=['machine_status'], drop_last=False)
df_fe_ohe = fe_ohe.fit_transform(df)

# Label encoding
fe_le = FELabelEncoder(encoding_method='arbitrary', variables=['machine_status'])
df_fe_le = fe_le.fit_transform(df)

df_fe_ohe.head()

Unnamed: 0,timestamp,machine_id,temperature,vibration,humidity,pressure,energy_consumption,anomaly_flag,predicted_remaining_life,failure_type,downtime_risk,maintenance_required,Machine_Status_Label,machine_status_1,machine_status_2,machine_status_0
0,2025-01-01 00:00:00,39,78.61,28.65,79.96,3.73,2.16,0,106,Normal,0.0,0,1,1,0,0
1,2025-01-01 00:01:00,29,68.19,57.28,35.94,3.64,0.69,0,320,Normal,0.0,0,1,1,0,0
2,2025-01-01 00:02:00,15,98.94,50.2,72.06,1.0,2.49,1,19,Normal,1.0,1,1,1,0,0
3,2025-01-01 00:03:00,43,90.91,37.65,30.34,3.15,4.96,1,10,Normal,1.0,1,1,1,0,0
4,2025-01-01 00:04:00,8,72.32,40.69,56.71,2.68,0.63,0,65,Vibration Issue,0.0,1,2,0,1,0
