### 📦 Feature Transformation: Binning (Discretization)
In this notebook, we'll explore how to apply **binning** (also known as **discretization**) to continuous numerical features using:
- 🐼 `pandas`
- ⚙️ `scikit-learn`
- 🧰 `feature-engine`

**Binning** is the process of converting continuous variables into discrete intervals (bins), which can help improve interpretability or prepare features for models that prefer categorical inputs.
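Bin edges can also be chosen by hand instead of being computed from the data. The sketch below uses the five `temperature` readings from the `df.head()` output of this dataset. The cut points (70 and 90) and the labels `low`/`normal`/`high` are hypothetical and only illustrate the idea. They are not thresholds from the ISE518 material.

```python
import numpy as np
import pandas as pd

# The five temperature readings shown in df.head() for historical_record.csv
temps = pd.Series([78.61, 68.19, 98.94, 90.91, 72.32], name="temperature")

# Hypothetical thresholds (illustration only): <=70 low, (70, 90] normal, >90 high
edges = [-np.inf, 70, 90, np.inf]
labels = ["low", "normal", "high"]

# pd.cut uses right-closed intervals by default, so exactly 70 would be "low" and 90 "normal"
temp_level = pd.cut(temps, bins=edges, labels=labels)
print(temp_level.tolist())
```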

We'll use the `historical_record.csv` dataset from the ISE518 course.

In [1]:
# 📥 Import required libraries
import pandas as pd
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer
from feature_engine.discretisation import EqualWidthDiscretiser, EqualFrequencyDiscretiser

In [2]:
# 📄 Load the dataset
url = 'https://raw.githubusercontent.com/Dr-AlaaKhamis/ISE518/refs/heads/main/5_Datafication/data/historical/historical_record.csv'
df = pd.read_csv(url)
df.columns = df.columns.str.strip()
df.head()

| | timestamp | machine_id | temperature | vibration | humidity | pressure | energy_consumption | machine_status | anomaly_flag | predicted_remaining_life | failure_type | downtime_risk | maintenance_required |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2025-01-01 00:00:00 | 39 | 78.61 | 28.65 | 79.96 | 3.73 | 2.16 | 1 | 0 | 106 | Normal | 0.0 | 0 |
| 1 | 2025-01-01 00:01:00 | 29 | 68.19 | 57.28 | 35.94 | 3.64 | 0.69 | 1 | 0 | 320 | Normal | 0.0 | 0 |
| 2 | 2025-01-01 00:02:00 | 15 | 98.94 | 50.2 | 72.06 | 1.0 | 2.49 | 1 | 1 | 19 | Normal | 1.0 | 1 |
| 3 | 2025-01-01 00:03:00 | 43 | 90.91 | 37.65 | 30.34 | 3.15 | 4.96 | 1 | 1 | 10 | Normal | 1.0 | 1 |
| 4 | 2025-01-01 00:04:00 | 8 | 72.32 | 40.69 | 56.71 | 2.68 | 0.63 | 2 | 0 | 65 | Vibration Issue | 0.0 | 1 |


#### 🔍 Feature to Bin
We will apply binning to the following feature:
- `temperature`

#### 🐼 Binning using `pandas.cut()`
We can use `pd.cut()` to divide the `temperature` column into fixed-width bins.
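With an integer `bins`, `pd.cut()` splits the observed range into intervals of width (max - min) / k. Passing `retbins=True` also returns the computed edges, which makes it easy to check what each bin index means. The readings below are hypothetical and not taken from the dataset.

```python
import pandas as pd

# Hypothetical readings spanning 60..100, so 4 equal-width bins are 10 units wide
temps = pd.Series([60.0, 65.0, 70.0, 75.0, 80.0, 100.0])

codes, edges = pd.cut(temps, bins=4, labels=False, retbins=True)

# Interior edges fall at 70, 80 and 90; pandas nudges the lowest edge slightly
# below the minimum (by 0.1% of the range) so that 60.0 is included
print(edges)

# Intervals are right-closed, so 70.0 lands in bin 0, not bin 1
print(codes.tolist())
```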

In [3]:
# 🐼 Create 4 equal-width bins for temperature
df['temp_binned_pandas'] = pd.cut(df['temperature'], bins=4, labels=False)
df[['temperature', 'temp_binned_pandas']].head()

| | temperature | temp_binned_pandas |
|---|---|---|
| 0 | 78.61 | 1 |
| 1 | 68.19 | 1 |
| 2 | 98.94 | 2 |
| 3 | 90.91 | 2 |
| 4 | 72.32 | 1 |


#### ⚙️ Binning using `scikit-learn` (KBinsDiscretizer)
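`KBinsDiscretizer` supports equal-width, equal-frequency, and k-means binning, selected with the `strategy` parameter:
- `strategy='uniform'`: equal-width bins
- `strategy='quantile'`: equal-frequency bins (each holds roughly the same number of rows)
- `strategy='kmeans'`: bin edges derived from 1-D k-means clustering of the values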
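To see how the first two strategies differ, here is a NumPy-only sketch of the edge computations. It assumes that equal-width edges are evenly spaced between the column minimum and maximum and that equal-frequency edges are linearly interpolated percentiles. The sketch mirrors the idea behind `strategy='uniform'` and `strategy='quantile'` rather than calling scikit-learn, so details such as the percentile method used by a given scikit-learn version may differ. The data are hypothetical and deliberately skewed.

```python
import numpy as np

def uniform_edges(x, k):
    # Equal width: k + 1 evenly spaced edges from min to max
    return np.linspace(x.min(), x.max(), k + 1)

def quantile_edges(x, k):
    # Equal frequency: edges at the 0, 100/k, ..., 100th percentiles (linear interpolation)
    return np.percentile(x, np.linspace(0, 100, k + 1))

def assign_bins(x, edges):
    # Use only the interior edges; side="right" sends a value equal to an edge to the upper bin
    return np.searchsorted(edges[1:-1], x, side="right")

# Hypothetical right-skewed readings
x = np.array([1, 2, 3, 4, 5, 20, 22, 24, 60, 100], dtype=float)

for name, edges in [("uniform", uniform_edges(x, 4)), ("quantile", quantile_edges(x, 4))]:
    counts = np.bincount(assign_bins(x, edges), minlength=4)
    print(name, np.round(edges, 2), counts)
```

On skewed data, equal-width bins leave most rows in a single bin (8 of 10 here), while equal-frequency bins stay roughly balanced. The `kmeans` strategy instead places edges midway between adjacent sorted 1-D cluster centres.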

In [4]:
# ⚙️ Equal-frequency (quantile) binning using scikit-learn
# Configure the binning transformer: 4 quantile bins, returned as ordinal bin indices
kbins = KBinsDiscretizer(n_bins=4, encode='ordinal', strategy='quantile')

# Apply transformation (scikit-learn expects 2-D input, hence df[['temperature']])
df['temp_binned_sklearn'] = kbins.fit_transform(df[['temperature']]).ravel().astype(int)

# View binned output and original column (learned edges are in kbins.bin_edges_[0])
df[['temperature', 'temp_binned_sklearn']].head()


#### 🧰 Binning using `feature-engine`
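`feature-engine` provides transformers for equal-width (`EqualWidthDiscretiser`) and equal-frequency (`EqualFrequencyDiscretiser`) binning. By default they overwrite each listed variable with its integer bin index, so the `temperature` column in the outputs below holds bin numbers, not degrees.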
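Binning is often fitted on one dataset and applied to another. feature-engine's discretisers store the learned edges and set the outermost ones to -inf and +inf, so values outside the training range still fall into the first or last bin. The following is a pandas-only sketch of that fit/transform idea for equal-frequency bins. It is not feature-engine's own code, and the training and new readings are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical training readings used only to learn the bin edges
train = pd.Series([60, 65, 70, 75, 80, 85, 90, 95], dtype=float)

# "fit": learn quartile (equal-frequency) edges from the training data
_, learned = pd.qcut(train, q=4, retbins=True)

# Open up the outer bins so out-of-range values are still assigned a bin
edges = np.concatenate([[-np.inf], learned[1:-1], [np.inf]])

# "transform": apply the stored edges to new readings, including out-of-range ones
new = pd.Series([50.0, 70.0, 90.0, 120.0])
new_bins = pd.cut(new, bins=edges, labels=False)
print(edges)
print(new_bins.tolist())
```

If the original finite edges were kept, 50.0 and 120.0 would fall outside every interval and `pd.cut` would return NaN for them.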

In [5]:
# 🧰 Equal-width binning
width_discretiser = EqualWidthDiscretiser(bins=4, variables=['temperature'])
df_width_binned = width_discretiser.fit_transform(df.copy())
# feature-engine replaced 'temperature' in place, so the column now holds bin indices 0-3
df_width_binned[['temperature']].head()

| | temperature |
|---|---|
| 0 | 1 |
| 1 | 1 |
| 2 | 2 |
| 3 | 2 |
| 4 | 1 |


In [6]:
# 🧰 Equal-frequency binning
freq_discretiser = EqualFrequencyDiscretiser(q=4, variables=['temperature'])
df_freq_binned = freq_discretiser.fit_transform(df.copy())
# feature-engine replaced 'temperature' in place, so the column now holds bin indices 0-3
df_freq_binned[['temperature']].head()

| | temperature |
|---|---|
| 0 | 2 |
| 1 | 0 |
| 2 | 3 |
| 3 | 3 |
| 4 | 1 |
