# Notebook 05 — Binning Numerical Features
📁 File name: 05_binning_numerical_features.ipynb

This notebook introduces binning (a.k.a. discretization), a technique for transforming continuous numerical features into categorical buckets. It’s especially useful with tree-based models and rule-based systems, and for improving interpretability.

📒 Notebook Sections
1. Title & Introduction
2. What is Binning?
3. Load & Inspect Numeric Columns
4. Binning with KBinsDiscretizer
5. Try Different Binning Strategies
6. Visualize All Bin Results
7. Summary & What’s Next

## 1. Title & Introduction (Markdown)
### 05 — Binning Numerical Features

In this notebook, we’ll learn how to **convert continuous numeric features into bins** (i.e., categories or ranges). This technique is often called:

- Binning
- Discretization
- Bucketing

We'll use `KBinsDiscretizer` from scikit-learn.

## 2. What is Binning? (Markdown)
### What Is Binning?

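Binning turns a numeric column (e.g., age) into **categories** like the ones below, where each range includes its lower bound and excludes its upper bound (so an age of exactly 18 counts as Young Adult):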

- 0–18 → Teen
- 18–35 → Young Adult
- 35–60 → Adult
- 60+ → Senior

This can help when:

- Data is skewed or has outliers
- You want a model that’s easier to interpret
- You're using models that handle categoricals well (trees, rules)
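
The age ranges above can be sketched by hand with pandas' `pd.cut`, which is not used elsewhere in this notebook. The sample ages are made up, and the half-open intervals (`right=False`, so 18 falls in "Young Adult") are an assumption chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample ages, for illustration only
ages = pd.Series([5, 17, 18, 34, 35, 59, 60, 82], name="Age")

# Bin edges matching the list above; np.inf leaves the last bin open-ended
edges = [0, 18, 35, 60, np.inf]
labels = ["Teen", "Young Adult", "Adult", "Senior"]

# right=False makes each bin [lower, upper), so a boundary value like 18
# belongs to the bin that starts at 18 rather than the one that ends there
age_group = pd.cut(ages, bins=edges, labels=labels, right=False)

print(pd.DataFrame({"Age": ages, "Age_group": age_group}))
```

Hand-picked edges like these are useful when the categories carry domain meaning. `KBinsDiscretizer`, used in the rest of the notebook, instead derives the edges from the data.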


## 3. Load & Inspect Numeric Columns

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("../data/sample_data.csv")

# Pick a numeric column for binning (e.g., 'Age')
df["Age"].hist(bins=20)
plt.title("Original Age Distribution")
plt.xlabel("Age")
plt.ylabel("Frequency")
plt.show()

## 4. Binning with KBinsDiscretizer

In [None]:
from sklearn.preprocessing import KBinsDiscretizer

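# Bin 'Age' into 4 equal-width bins, encoded as ordinal codes 0.0 to 3.0
# (KBinsDiscretizer rejects missing values, so handle NaNs in 'Age' first)
binner = KBinsDiscretizer(n_bins=4, encode="ordinal", strategy="uniform")
# fit_transform returns a 2D array of shape (n_rows, 1); flatten it for a single column
df["Age_binned"] = binner.fit_transform(df[["Age"]]).ravel()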

df[["Age", "Age_binned"]].head()
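
To see where the bin boundaries ended up, a fitted `KBinsDiscretizer` exposes them in its `bin_edges_` attribute. A minimal sketch, using made-up ages rather than the notebook's `sample_data.csv`:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import KBinsDiscretizer

# Made-up ages (assumption: stands in for the notebook's 'Age' column)
toy = pd.DataFrame({"Age": [0, 10, 25, 40, 55, 70, 80]})

binner = KBinsDiscretizer(n_bins=4, encode="ordinal", strategy="uniform")
codes = binner.fit_transform(toy[["Age"]]).ravel().astype(int)

# bin_edges_ holds one array of edges per input feature;
# index 0 selects the edges for our single 'Age' column
edges = binner.bin_edges_[0]

print("Edges:", edges)
print("Codes:", codes)
```

With the `uniform` strategy, the edges split the range from the minimum to the maximum age into four equal-width intervals, so each bin here spans 20 years.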

## 5. Try Different Binning Strategies

In [None]:
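# Uniform: equal-width bins
uniform = KBinsDiscretizer(n_bins=4, encode="ordinal", strategy="uniform")
# Quantile: bins holding roughly equal numbers of rows
quantile = KBinsDiscretizer(n_bins=4, encode="ordinal", strategy="quantile")
# KMeans: each bin groups the values closest to the same 1D k-means cluster center
kmeans = KBinsDiscretizer(n_bins=4, encode="ordinal", strategy="kmeans")

# Flatten each (n_rows, 1) result before assigning it as a column
df["Age_uniform"] = uniform.fit_transform(df[["Age"]]).ravel()
df["Age_quantile"] = quantile.fit_transform(df[["Age"]]).ravel()
df["Age_kmeans"] = kmeans.fit_transform(df[["Age"]]).ravel()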
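
The difference between the strategies shows up most clearly on skewed data. The sketch below uses synthetic right-skewed values (an assumption, not the notebook's dataset) and counts how many rows land in each bin under each strategy.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import KBinsDiscretizer

# Synthetic right-skewed data (assumption: for illustration only)
rng = np.random.default_rng(0)
values = rng.exponential(scale=10.0, size=(1000, 1))

counts = {}
for strategy in ["uniform", "quantile", "kmeans"]:
    binner = KBinsDiscretizer(n_bins=4, encode="ordinal", strategy=strategy)
    codes = binner.fit_transform(values).ravel().astype(int)
    # Count rows per bin code (0 to 3); minlength keeps empty bins visible
    counts[strategy] = np.bincount(codes, minlength=4)

summary = pd.DataFrame(counts, index=[f"bin {i}" for i in range(4)])
print(summary)
```

On data like this, equal-width bins tend to put most rows in the first bin, while quantile bins stay close to evenly filled. The k-means result typically falls somewhere in between.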

## 6. Visualize All Bin Results

In [None]:
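fig, axes = plt.subplots(1, 3, figsize=(15, 4), sharey=True)

# Each binned column holds ordinal codes (0 to 3), so plot the number of rows
# per code as a bar chart instead of re-binning the codes with a histogram
columns = [("Age_uniform", "Uniform Bins"),
           ("Age_quantile", "Quantile Bins"),
           ("Age_kmeans", "KMeans Bins")]

for ax, (col, title) in zip(axes, columns):
    df[col].astype(int).value_counts().sort_index().plot.bar(ax=ax, rot=0)
    ax.set_title(title)
    ax.set_xlabel("Bin index")

axes[0].set_ylabel("Count")
plt.tight_layout()
plt.show()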

## 7. Summary & What’s Next (Markdown)
### Summary

- Binning (discretization) converts numeric features into categories.
- We used **KBinsDiscretizer** with:
  - Uniform (equal width)
  - Quantile (equal frequency)
  - KMeans (based on clustering)
- Binning can reduce the influence of outliers and skew, and it produces features that are easier to explain in interpretable models.

**Next Up**: `06_polynomial_features.ipynb`  
We’ll explore how to create interaction terms and polynomial features to enrich our models.
