# Module 03: Feature Scaling and Normalization

**Difficulty**: ⭐ Beginner  
**Estimated Time**: 50 minutes  
**Prerequisites**: [Module 02: Encoding Categorical Variables](02_encoding_categorical_variables.ipynb)

## Learning Objectives

By the end of this notebook, you will be able to:

1. Understand why feature scaling is critical for certain algorithms
2. Apply Min-Max scaling to normalize features to the [0, 1] range
3. Use StandardScaler to standardize features (mean=0, std=1)
4. Apply RobustScaler to handle outliers effectively
5. Identify which algorithms require scaling and which don't
6. Visualize the impact of scaling on model performance
7. Avoid data leakage when scaling features

## 1. Why Feature Scaling Matters

### The Problem

Features often have vastly different scales:

| Feature | Example Values | Range |
|---------|---------------|-------|
| Age | 25, 40, 65 | 0-100 |
| Income | $30,000, $75,000, $150,000 | 0-500,000 |
| Credit Score | 650, 720, 800 | 300-850 |

**Problem**: In any distance or gradient computation, income dominates simply because its values are roughly 1000x larger than age!

### Which Algorithms Need Scaling?

#### ✅ NEED Scaling (Distance-Based & Gradient-Based)
- **K-Nearest Neighbors (KNN)**: Uses Euclidean distance
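- **Support Vector Machines (SVM)**: Margin and kernel (e.g., RBF) computations depend on distances
- **Linear/Logistic Regression**: Gradient-based solvers converge faster, and regularization penalizes all coefficients on a comparable scale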
- **Neural Networks**: Faster training, better convergence
- **Principal Component Analysis (PCA)**: Variance-based
- **K-Means Clustering**: Distance-based

#### ❌ DON'T NEED Scaling (Tree-Based)
- **Decision Trees**: Split on individual features
- **Random Forests**: Ensemble of decision trees
- **Gradient Boosting (XGBoost, LightGBM)**: Tree-based
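Trees are unaffected because a split compares one feature against a threshold. A monotonic rescaling such as standardization moves that threshold, but it does not change which samples fall on each side. The sketch below checks this on synthetic `make_classification` data. This is an assumption for illustration, not the loan dataset. Apart from rare floating-point ties, the two trees should make identical predictions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic data; column 0 is blown up to mimic a large-scale feature like income
X, y = make_classification(n_samples=300, n_features=4, random_state=0)
X[:, 0] *= 10_000
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Tree trained on the raw features
tree_raw = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_train, y_train)
pred_raw = tree_raw.predict(X_test)

# Same tree settings on standardized features (scaler fit on training data only)
scaler = StandardScaler().fit(X_train)
tree_scaled = DecisionTreeClassifier(max_depth=4, random_state=0).fit(
    scaler.transform(X_train), y_train
)
pred_scaled = tree_scaled.predict(scaler.transform(X_test))

# Thresholds move, but the partitions (and therefore the predictions) do not
print("Identical predictions:", np.array_equal(pred_raw, pred_scaled))
```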

### Real-World Impact

Without scaling, a KNN model might:
- Find "neighbors" based mostly on income
- Effectively ignore age (its differences are tiny next to income differences)
- Give poor predictions
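Here is a concrete case using three hypothetical applicants, each described by `[age, annual_income]`. The scaling step assumes feature ranges of about 22 to 70 for age and 20k to 300k for income, which roughly match the dataset built below. Dividing differences by each range gives the same distances as Min-Max scaling, because the minimum cancels when two points are subtracted.

```python
import numpy as np

# Hypothetical query applicant and two candidate neighbors: [age, annual_income]
query = np.array([30, 50_000])
cand_a = np.array([31, 52_000])   # similar age, slightly different income
cand_b = np.array([65, 50_500])   # very different age, almost the same income

def euclidean(u, v):
    return float(np.sqrt(np.sum((u - v) ** 2)))

# Unscaled: income differences swamp age differences
raw = {"A": euclidean(query, cand_a), "B": euclidean(query, cand_b)}

# Divide by assumed feature ranges (Min-Max style; the min offset cancels in differences)
ranges = np.array([48, 280_000])
scaled = {"A": euclidean(query / ranges, cand_a / ranges),
          "B": euclidean(query / ranges, cand_b / ranges)}

print(f"Unscaled: A={raw['A']:.1f}, B={raw['B']:.1f} -> nearest is {min(raw, key=raw.get)}")
print(f"Scaled:   A={scaled['A']:.3f}, B={scaled['B']:.3f} -> nearest is {min(scaled, key=scaled.get)}")
```

Without scaling, the 65-year-old is picked as the "nearest" neighbor of a 30-year-old purely because their incomes are closer.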

## 2. Setup

In [None]:
# Core libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Scaling methods
from sklearn.preprocessing import MinMaxScaler, StandardScaler, RobustScaler, MaxAbsScaler

# Models for comparison
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# Evaluation
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import accuracy_score, classification_report

# Configuration
%matplotlib inline
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")
np.random.seed(42)
pd.set_option('display.max_columns', None)
pd.set_option('display.precision', 3)

print("✓ Libraries imported successfully!")

## 3. Create Dataset to Demonstrate Scaling

Let's create a loan approval dataset with features on different scales.

In [None]:
# Set seed for reproducibility
np.random.seed(42)
n_samples = 1000

# Create features with very different scales
loan_data = pd.DataFrame({
    'age': np.random.randint(22, 70, n_samples),
    'annual_income': np.random.normal(60000, 30000, n_samples).clip(20000, 300000),
    'credit_score': np.random.randint(450, 850, n_samples),
    'loan_amount': np.random.normal(150000, 80000, n_samples).clip(10000, 500000),
    'employment_years': np.random.randint(0, 45, n_samples),
    'existing_debts': np.random.normal(25000, 15000, n_samples).clip(0, 100000)
})

# Create target: loan approved based on multiple factors
approval_score = (
    0.3 * (loan_data['credit_score'] - 450) / 400 +
    0.3 * (loan_data['annual_income'] - 20000) / 280000 +
    0.2 * (loan_data['employment_years'] / 45) +
    0.2 * (1 - loan_data['existing_debts'] / 100000) +
    np.random.normal(0, 0.1, n_samples)
)

loan_data['approved'] = (approval_score > 0.5).astype(int)

print(f"Created dataset with {len(loan_data)} loan applications")
print(f"\nApproval rate: {loan_data['approved'].mean():.1%}")
print(f"\nFirst few rows:")
loan_data.head()

In [None]:
# Analyze feature scales
features = ['age', 'annual_income', 'credit_score', 'loan_amount', 'employment_years', 'existing_debts']

print("Feature Statistics (showing different scales):\n")
stats = loan_data[features].describe().loc[['min', 'max', 'mean', 'std']]
print(stats)

print("\n⚠️  Notice the huge scale differences:")
print(f"  - age: {loan_data['age'].min():.0f} to {loan_data['age'].max():.0f}")
print(f"  - annual_income: ${loan_data['annual_income'].min():.0f} to ${loan_data['annual_income'].max():.0f}")
print(f"  - credit_score: {loan_data['credit_score'].min():.0f} to {loan_data['credit_score'].max():.0f}")

In [None]:
# Visualize scale differences
fig, axes = plt.subplots(2, 3, figsize=(15, 8))
axes = axes.ravel()

for idx, feature in enumerate(features):
    axes[idx].hist(loan_data[feature], bins=30, edgecolor='black', alpha=0.7)
    axes[idx].set_title(f'{feature}\n(Range: {loan_data[feature].min():.0f} - {loan_data[feature].max():.0f})')
    axes[idx].set_xlabel('Value')
    axes[idx].set_ylabel('Frequency')

plt.tight_layout()
plt.show()

print("Notice how different the x-axis scales are!")
print("This will cause problems for distance-based algorithms.")

## 4. Method 1: Min-Max Scaling (Normalization)

**Formula**: $X_{scaled} = \frac{X - X_{min}}{X_{max} - X_{min}}$

**Result**: Training features are mapped to the [0, 1] range (test values outside the training min/max fall outside it)

**When to use**:
- When you need bounded values [0, 1]
- For neural networks (activation functions work well in this range)
- When data doesn't have outliers

**Pros**: 
- Preserves zeros when the feature's minimum is 0 (e.g., non-negative counts)
- Bounded range
- Intuitive

**Cons**: 
- Sensitive to outliers
- Doesn't center data
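The formula can be checked against scikit-learn on a tiny hand-made array. The sketch also shows two details. First, the min and max come from the training data only, so a test value above the training max maps above 1. Second, `inverse_transform` recovers the original units.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X_train = np.array([[10.0], [20.0], [30.0]])
X_test = np.array([[15.0], [40.0]])  # 40 lies above the training max

# Manual formula using the training min/max only
x_min, x_max = X_train.min(axis=0), X_train.max(axis=0)
manual_train = (X_train - x_min) / (x_max - x_min)
manual_test = (X_test - x_min) / (x_max - x_min)

scaler = MinMaxScaler().fit(X_train)
print(scaler.transform(X_train).ravel())  # approximately 0, 0.5, 1
print(scaler.transform(X_test).ravel())   # approximately 0.25, 1.5 (1.5 is outside [0, 1])
print(np.allclose(manual_test, scaler.transform(X_test)))  # manual formula matches sklearn

# inverse_transform maps scaled values back to the original units
print(scaler.inverse_transform(scaler.transform(X_test)).ravel())  # approximately 15, 40
```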

In [None]:
# Prepare data for scaling
X = loan_data[features]
y = loan_data['approved']

# IMPORTANT: Split BEFORE scaling!
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(f"Training set: {len(X_train)} samples")
print(f"Test set: {len(X_test)} samples")

In [None]:
# Apply Min-Max Scaling
minmax_scaler = MinMaxScaler()

# Fit on training data only!
X_train_minmax = minmax_scaler.fit_transform(X_train)

# Transform test data using training statistics
X_test_minmax = minmax_scaler.transform(X_test)

# Convert back to DataFrame for visualization
X_train_minmax_df = pd.DataFrame(X_train_minmax, columns=features, index=X_train.index)
X_test_minmax_df = pd.DataFrame(X_test_minmax, columns=features, index=X_test.index)

print("Min-Max Scaling Applied\n")
print("Original training data (first 3 rows):")
print(X_train.head(3))
print("\nScaled training data (first 3 rows):")
print(X_train_minmax_df.head(3))

print("\nScaled data statistics:")
print(X_train_minmax_df.describe().loc[['min', 'max', 'mean']])
print("\n✓ All features now in [0, 1] range!")

## 5. Method 2: Standardization (Z-Score Normalization)

**Formula**: $X_{scaled} = \frac{X - \mu}{\sigma}$

where $\mu$ = mean, $\sigma$ = standard deviation, both computed on the training data

**Result**: Mean = 0, Standard Deviation = 1

**When to use**:
- Most common scaling method
- When features are approximately normally distributed
- For algorithms assuming zero-centered data (SVM, Logistic Regression)

**Pros**: 
- Less affected by a few outliers than Min-Max (though mean and std are still outlier-sensitive)
- Maintains shape of original distribution
- Works well with PCA

**Cons**: 
- No bounded range
- Mean and std summarize skewed or heavy-tailed data poorly (works best on roughly Gaussian features)
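Below is a small worked check of the z-score formula. Note that `StandardScaler` uses the population standard deviation (`ddof=0`). By contrast, pandas' `.std()` and `describe()` default to the sample version (`ddof=1`). This is why the statistics printed after standardizing show a std of approximately 1 rather than exactly 1.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

x = np.array([2, 4, 4, 4, 5, 5, 7, 9], dtype=float).reshape(-1, 1)

# Manual z-score with the population standard deviation (ddof=0)
mu, sigma = x.mean(), x.std(ddof=0)       # mu = 5.0, sigma = 2.0
z_manual = (x - mu) / sigma

scaler = StandardScaler().fit(x)
z_sklearn = scaler.transform(x)
print(scaler.mean_, scaler.scale_)        # mean_ = 5.0, scale_ = 2.0
print(np.allclose(z_manual, z_sklearn))   # True

# pandas' .std() uses ddof=1, so it reports slightly more than 1 (about 1.069 here)
print(pd.Series(z_sklearn.ravel()).std())
```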

In [None]:
# Apply Standardization
standard_scaler = StandardScaler()

# Fit on training data
X_train_standard = standard_scaler.fit_transform(X_train)
X_test_standard = standard_scaler.transform(X_test)

# Convert to DataFrame
X_train_standard_df = pd.DataFrame(X_train_standard, columns=features, index=X_train.index)
X_test_standard_df = pd.DataFrame(X_test_standard, columns=features, index=X_test.index)

print("Standardization (Standard Scaling) Applied\n")
print("Scaled training data (first 3 rows):")
print(X_train_standard_df.head(3))

print("\nScaled data statistics:")
print(X_train_standard_df.describe().loc[['mean', 'std', 'min', 'max']])
print("\n✓ Mean ≈ 0, Standard Deviation ≈ 1 for all features!")

## 6. Method 3: Robust Scaling

**Formula**: $X_{scaled} = \frac{X - \text{median}(X)}{\text{IQR}(X)}$

where IQR = Interquartile Range (75th percentile - 25th percentile)

**When to use**:
- When data has outliers
- When outliers are meaningful (not errors)
- More robust than StandardScaler

**Pros**: 
- Largely unaffected by outliers when fitting (median and IQR ignore extreme values)
- Uses median instead of mean

**Cons**: 
- Not as common
- No bounded range; outliers remain extreme after scaling
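The same formula worked by hand, next to `RobustScaler` with its default settings (`quantile_range=(25.0, 75.0)`), on a toy feature with one outlier. The outlier is not removed and stays far from the others after scaling. However, it no longer distorts how the ordinary values are spread.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

x = np.array([1, 2, 3, 4, 100], dtype=float).reshape(-1, 1)  # 100 is an outlier

# Manual version: center on the median, divide by the IQR (75th - 25th percentile)
median = np.median(x, axis=0)                 # 3.0
q1, q3 = np.percentile(x, [25, 75], axis=0)   # 2.0 and 4.0, so IQR = 2.0
x_manual = (x - median) / (q3 - q1)

scaler = RobustScaler()  # defaults: with_centering=True, with_scaling=True
x_robust = scaler.fit_transform(x)

print(x_robust.ravel())                 # -1, -0.5, 0, 0.5, 48.5
print(np.allclose(x_manual, x_robust))  # True
```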

In [None]:
# Create dataset WITH outliers to demonstrate
X_with_outliers = X_train.copy()

# Add some extreme outliers
outlier_indices = np.random.choice(X_with_outliers.index, 20, replace=False)
X_with_outliers.loc[outlier_indices, 'annual_income'] *= 5  # Extreme incomes
X_with_outliers.loc[outlier_indices, 'existing_debts'] *= 10  # Extreme debts

print("Dataset with outliers:")
print(X_with_outliers[['annual_income', 'existing_debts']].describe())
print("\nNotice the max values are now much larger!")

In [None]:
# Compare StandardScaler vs RobustScaler on data with outliers

# StandardScaler (affected by outliers)
standard_with_outliers = StandardScaler().fit_transform(X_with_outliers)
standard_with_outliers_df = pd.DataFrame(standard_with_outliers, columns=features, index=X_with_outliers.index)

# RobustScaler (robust to outliers)
robust_scaler = RobustScaler()
robust_scaled = robust_scaler.fit_transform(X_with_outliers)
robust_scaled_df = pd.DataFrame(robust_scaled, columns=features, index=X_with_outliers.index)

# Compare income scaling
fig, axes = plt.subplots(1, 3, figsize=(15, 4))

axes[0].hist(X_with_outliers['annual_income'], bins=30, edgecolor='black')
axes[0].set_title('Original (with outliers)')
axes[0].set_xlabel('Annual Income')

axes[1].hist(standard_with_outliers_df['annual_income'], bins=30, edgecolor='black', color='orange')
axes[1].set_title('StandardScaler\n(outliers compress normal data)')
axes[1].set_xlabel('Scaled Value')

axes[2].hist(robust_scaled_df['annual_income'], bins=30, edgecolor='black', color='green')
axes[2].set_title('RobustScaler\n(outliers have less impact)')
axes[2].set_xlabel('Scaled Value')

plt.tight_layout()
plt.show()

print("RobustScaler is less affected by outliers because it uses median and IQR")
print("instead of mean and standard deviation.")
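This toy sketch puts a number on "outliers compress normal data". The values are hand-picked, not taken from the loan data. It measures how much of the scaled axis the ten ordinary values occupy under each scaler.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler, StandardScaler

# Ten ordinary values plus one extreme outlier
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1000], dtype=float).reshape(-1, 1)
inliers = slice(0, 10)

def inlier_spread(scaled):
    """Width of the scaled range occupied by the ten ordinary values."""
    return float(scaled[inliers].max() - scaled[inliers].min())

standard_spread = inlier_spread(StandardScaler().fit_transform(x))
robust_spread = inlier_spread(RobustScaler().fit_transform(x))

print(f"StandardScaler inlier spread: {standard_spread:.3f}")  # about 0.031
print(f"RobustScaler inlier spread:   {robust_spread:.3f}")    # 1.800
```

The outlier inflates the standard deviation, so StandardScaler squeezes the ten ordinary values into a sliver. The median and IQR barely notice the outlier, so RobustScaler keeps the ordinary values spread out.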

## 7. Impact on Model Performance

Let's demonstrate how scaling affects different algorithms.

In [None]:
# Compare models with and without scaling

# Prepare datasets
datasets = {
    'No Scaling': (X_train, X_test),
    'Min-Max': (X_train_minmax, X_test_minmax),
    'Standard': (X_train_standard, X_test_standard)
}

# Models to test
models = {
    'KNN': KNeighborsClassifier(n_neighbors=5),
    'Logistic Regression': LogisticRegression(max_iter=1000, random_state=42),
    'SVM': SVC(kernel='rbf', random_state=42),
    'Random Forest': RandomForestClassifier(n_estimators=100, random_state=42, max_depth=10)
}

# Store results
results = []

for scaling_method, (X_tr, X_te) in datasets.items():
    for model_name, model in models.items():
        # Train
        model.fit(X_tr, y_train)
        
        # Evaluate
        y_pred = model.predict(X_te)
        accuracy = accuracy_score(y_test, y_pred)
        
        results.append({
            'Scaling': scaling_method,
            'Model': model_name,
            'Accuracy': accuracy
        })

# Create results DataFrame
results_df = pd.DataFrame(results)
results_pivot = results_df.pivot(index='Model', columns='Scaling', values='Accuracy')

print("Model Performance Comparison:\n")
print(results_pivot)

print("\nKey Observations:")
print("1. KNN, Logistic Regression, and SVM typically benefit from scaling (compare their rows above)")
print("2. Random Forest should perform about the same with or without scaling")
print("3. Whether Min-Max or Standard scaling works better depends on the data")

In [None]:
# Visualize the impact
fig, ax = plt.subplots(figsize=(12, 6))

results_pivot.plot(kind='bar', ax=ax, width=0.8)
ax.set_ylabel('Accuracy')
ax.set_title('Impact of Feature Scaling on Different Algorithms')
ax.set_xlabel('Algorithm')
ax.legend(title='Scaling Method')
ax.set_ylim([0.5, 1.0])
ax.grid(axis='y', alpha=0.3)

# Add value labels on bars
for container in ax.containers:
    ax.bar_label(container, fmt='%.3f', padding=3)

plt.tight_layout()
plt.show()
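A single train/test split gives one noisy accuracy per cell. For a sturdier comparison, the sketch below uses 5-fold `cross_val_score`, which was imported in the setup cell but has not been used so far. It assumes synthetic data in which one pure-noise feature has a much larger scale than four informative ones. Wrapping the scaler and model in a `Pipeline` re-fits the scaler on each fold's training portion, so no fold leaks statistics into its own validation data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data: 4 informative features plus one pure-noise feature on a huge scale
X, y = make_classification(n_samples=500, n_features=4, n_informative=4,
                           n_redundant=0, random_state=0)
rng = np.random.default_rng(0)
X = np.column_stack([X, rng.normal(0, 1_000, size=len(X))])

knn_raw = KNeighborsClassifier(n_neighbors=5)
# In a pipeline, the scaler is re-fit on the training folds of every split
knn_scaled = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))

raw_scores = cross_val_score(knn_raw, X, y, cv=5)
scaled_scores = cross_val_score(knn_scaled, X, y, cv=5)

print(f"KNN, unscaled: {raw_scores.mean():.3f} +/- {raw_scores.std():.3f}")
print(f"KNN, scaled:   {scaled_scores.mean():.3f} +/- {scaled_scores.std():.3f}")
```

Without scaling, the noise column decides almost every neighbor, so accuracy should sit near chance.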

## 8. Visualizing Scaled vs Unscaled Data

Let's visualize how scaling affects the feature space for KNN.

In [None]:
# Select two features for 2D visualization
feature1 = 'age'
feature2 = 'annual_income'

# Get subset of data
sample_indices = np.random.choice(X_train.index, 200, replace=False)
X_sample = X_train.loc[sample_indices]
y_sample = y_train.loc[sample_indices]

# Scale the sample
scaler = StandardScaler()
X_sample_scaled = scaler.fit_transform(X_sample)
X_sample_scaled_df = pd.DataFrame(X_sample_scaled, columns=features, index=sample_indices)

# Visualize
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Unscaled
axes[0].scatter(X_sample[feature1], X_sample[feature2], 
                c=y_sample, cmap='coolwarm', alpha=0.6, edgecolors='black')
axes[0].set_xlabel(feature1)
axes[0].set_ylabel(feature2)
axes[0].set_title('Unscaled Features\n(annual_income dominates the scale)')
axes[0].grid(True, alpha=0.3)

# Scaled
axes[1].scatter(X_sample_scaled_df[feature1], X_sample_scaled_df[feature2],
                c=y_sample, cmap='coolwarm', alpha=0.6, edgecolors='black')
axes[1].set_xlabel(f'{feature1} (scaled)')
axes[1].set_ylabel(f'{feature2} (scaled)')
axes[1].set_title('Scaled Features\n(both features equally weighted)')
axes[1].grid(True, alpha=0.3)
axes[1].set_aspect('equal')

plt.tight_layout()
plt.show()

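print("Notice:")
print("- Left plot: annual_income spans tens of thousands of units while age spans about 50, so income dominates distance calculations")
print("- (Matplotlib stretches each axis independently, so compare the tick ranges, not the shapes of the point clouds)")
print("- Right plot: both features are on a similar scale, so both contribute comparably to distance")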

## 9. When to Use Which Scaler?

| Scaler | When to Use | Pros | Cons |
|--------|-------------|------|------|
| **MinMaxScaler** | - Neural networks<br>- Algorithms needing bounded values<br>- No outliers | - Bounded [0,1] on training data<br>- Keeps zeros at 0 when feature min is 0<br>- Intuitive | - Sensitive to outliers |
| **StandardScaler** | - Most common choice<br>- SVM, Logistic Regression<br>- PCA<br>- Gaussian features | - Less affected by a few outliers than MinMax<br>- Centers data<br>- Maintains shape | - No bounded range<br>- Less informative for skewed data |
| **RobustScaler** | - Data has outliers<br>- Outliers are meaningful | - Very robust to outliers<br>- Uses median/IQR | - Less common<br>- No bounded range |
| **MaxAbsScaler** | - Sparse data<br>- Data should not be centered | - Preserves sparsity<br>- Bounded [-1,1] | - Sensitive to outliers |
| **None** | - Tree-based models<br>- Features already on same scale | - Simpler<br>- Faster | - Distance- and gradient-based models perform poorly |
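`MaxAbsScaler` appears in the table and in the setup imports, but it is not demonstrated elsewhere. Here is a minimal sketch on a small SciPy sparse matrix. Each column is divided by its maximum absolute value without centering, so zero entries stay zero and the result stays sparse.

```python
import numpy as np
from scipy.sparse import csr_matrix, issparse
from sklearn.preprocessing import MaxAbsScaler

# A small sparse matrix: most entries are zero
X_sparse = csr_matrix(np.array([[0.0, -2.0, 0.0],
                                [4.0,  0.0, 0.0],
                                [0.0,  1.0, 8.0]]))

scaler = MaxAbsScaler()
X_scaled = scaler.fit_transform(X_sparse)  # result is still a sparse matrix

print(scaler.max_abs_)             # per-column max |value|: 4, 2, 8
print(X_scaled.toarray())          # each column divided by its max |value|
print(X_sparse.nnz, X_scaled.nnz)  # zeros stay zero, so stored-entry counts match
```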

## 10. Best Practices

### ✅ DO:

1. **Always split BEFORE scaling**
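   ```python
   from sklearn.model_selection import train_test_split
   from sklearn.preprocessing import StandardScaler

   X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
   scaler = StandardScaler()
   scaler.fit(X_train)  # Fit on training only
   X_train_scaled = scaler.transform(X_train)
   X_test_scaled = scaler.transform(X_test)
   ```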

2. **Scale features for**:
   - KNN, SVM, Neural Networks
   - Logistic/Linear Regression (faster convergence)
   - PCA, K-Means

3. **Use StandardScaler as default**
   - Most common and works well
   - Switch to RobustScaler if the data has outliers
   - Consider MinMaxScaler when inputs must be bounded (e.g., some neural networks)

4. **Scale in production**
   - Save the fitted scaler
   - Apply the same transformation to new data
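One common way to follow both the split rule and the production rule is to bundle the scaler and model in a scikit-learn `Pipeline`. The fitted pipeline can then be persisted with `joblib`, which is installed as a scikit-learn dependency. This is a sketch on synthetic data, and the file name `loan_model.joblib` is hypothetical.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# Bundling scaler + model means the training statistics travel with the model
pipe = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
pipe.fit(X, y)

# Persist the fitted pipeline (hypothetical file name in a temporary folder)
path = os.path.join(tempfile.mkdtemp(), "loan_model.joblib")
joblib.dump(pipe, path)

# Later, in production: load and predict; the scaler reuses its training
# mean/std and is never re-fit on the new data
loaded = joblib.load(path)
new_applicants = X[:5]
print(np.array_equal(loaded.predict(new_applicants), pipe.predict(new_applicants)))  # True
```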

### ❌ DON'T:

1. **Don't fit scaler on test data** (data leakage!)
2. **Don't scale tree-based models** (unnecessary)
3. **Don't forget to scale new data** in production
4. **Don't scale target variable** (usually)
5. **Don't scale binary features** (0/1) - often not needed
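For the binary-feature rule, the sketch below uses `ColumnTransformer` to scale only the continuous columns and pass a 0/1 column through unchanged. The `is_homeowner` column and the toy values are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    "annual_income": [30_000, 75_000, 150_000, 52_000],
    "age": [25, 40, 65, 33],
    "is_homeowner": [0, 1, 1, 0],   # hypothetical binary feature
})

# Scale only the continuous columns; pass the 0/1 column through unchanged
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["annual_income", "age"]),
    ("bin", "passthrough", ["is_homeowner"]),
])
out = preprocess.fit_transform(df)

# Output column order follows the transformer list: income, age, is_homeowner
print(np.round(out, 3))
```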

## 11. Exercise Section

### Exercise 1: Identify Which Models Need Scaling

For each algorithm, determine if scaling is needed.

In [None]:
# Exercise 1: Which algorithms need scaling?

algorithms = {
    'A': 'K-Nearest Neighbors (KNN)',
    'B': 'Decision Tree',
    'C': 'Support Vector Machine (SVM)',
    'D': 'Random Forest',
    'E': 'Neural Network',
    'F': 'Gradient Boosting (XGBoost)',
    'G': 'Logistic Regression',
    'H': 'Principal Component Analysis (PCA)'
}

print("Which algorithms NEED scaling?\n")
for key, algo in algorithms.items():
    print(f"{key}. {algo}")

print("\nYour answers (list letters that NEED scaling):")
# needs_scaling = []
# doesnt_need = []

In [None]:
# Solution to Exercise 1

print("Solutions:\n")
print("✅ NEEDS Scaling (distance or gradient-based):")
print("  A - KNN: Uses distance calculations")
print("  C - SVM: Distance-based decision boundary")
print("  E - Neural Network: Gradient descent converges faster")
print("  G - Logistic Regression: Gradient descent optimization")
print("  H - PCA: Variance-based, sensitive to scale")

print("\n❌ DOESN'T NEED Scaling (tree-based):")
print("  B - Decision Tree: Splits on individual features")
print("  D - Random Forest: Ensemble of decision trees")
print("  F - Gradient Boosting: Tree-based algorithm")

print("\nKey Rule: Distance-based and gradient-based → SCALE")
print("          Tree-based → NO SCALING NEEDED")

### Exercise 2: Apply the Right Scaler

Choose and apply the appropriate scaler for each scenario.

In [None]:
# Exercise 2: Choose the right scaler

# Scenario 1: House features with an outlier (one 15,000 sqft mansion)
house_data = pd.DataFrame({
    'sqft': [1000, 1500, 2000, 2500, 15000],  # Last one is a mansion
    'bedrooms': [2, 3, 3, 4, 10],
    'age': [10, 20, 5, 15, 50]
})

print("Scenario 1: House features with an outlier")
print(house_data)
print("\nWhich scaler should you use?")
print("A) MinMaxScaler")
print("B) StandardScaler")
print("C) RobustScaler")
print("\nYour answer: ???")

# TODO: Apply the correct scaler
# scaler = ???
# house_scaled = ???

In [None]:
# Solution to Exercise 2

print("Solution: C) RobustScaler\n")
print("Reason: Data has a clear outlier (15,000 sqft mansion)")
print("RobustScaler uses median and IQR, which are robust to outliers\n")

# Apply RobustScaler
robust_scaler = RobustScaler()
house_scaled = robust_scaler.fit_transform(house_data)
house_scaled_df = pd.DataFrame(house_scaled, columns=house_data.columns)

print("Original data:")
print(house_data)
print("\nScaled with RobustScaler:")
print(house_scaled_df)

# Compare with StandardScaler
standard_scaler = StandardScaler()
house_standard = standard_scaler.fit_transform(house_data)
house_standard_df = pd.DataFrame(house_standard, columns=house_data.columns)

print("\nScaled with StandardScaler (for comparison):")
print(house_standard_df)
print("\nNotice: StandardScaler is more affected by the outlier!")

### Exercise 3: Detect Data Leakage in Scaling

Find and fix the data leakage problem.

In [None]:
# Exercise 3: Fix the data leakage

print("Problematic code:")
print('''
# Load data
X = data.drop('target', axis=1)
y = data['target']

# Scale features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Split data
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y)

# Train model
model.fit(X_train, y_train)
''')

print("\nWhat's wrong with this code?")
print("How would you fix it?")
# Your answer:

In [None]:
# Solution to Exercise 3

print("Problem: DATA LEAKAGE!\n")
print("The scaler is fit on ALL data (including test set) before splitting.")
print("This means test set statistics (mean, std) leak into training.\n")

print("Why this is bad:")
print("- Scaler uses mean/std from ENTIRE dataset")
print("- Test set influences the scaling")
print("- Test-set performance estimates can be optimistically biased")
print("- It doesn't mirror production, where the scaler never sees future data\n")

print("Correct approach:")
print('''
# Load data
X = data.drop('target', axis=1)
y = data['target']

# Split FIRST!
X_train, X_test, y_train, y_test = train_test_split(X, y)

# Fit scaler on training data only
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)

# Apply same transformation to test data
X_test_scaled = scaler.transform(X_test)  # Note: transform, not fit_transform!

# Train model
model.fit(X_train_scaled, y_train)
''')

print("\n✓ Key: Split → Fit on train → Transform both train and test")
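This tiny numeric illustration uses hand-picked values in which the single test row is extreme. Fitting on all rows pulls the mean toward the test row. As a result, that row looks far less unusual after scaling than it really is relative to the training data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Four training rows and one extreme test row
X_train = np.array([[1.0], [2.0], [3.0], [4.0]])
X_test = np.array([[100.0]])
X_all = np.vstack([X_train, X_test])

leaky = StandardScaler().fit(X_all)    # wrong: it has already "seen" the test row
correct = StandardScaler().fit(X_train)

print("Leaky mean:  ", leaky.mean_)     # 22.0, pulled up by the test row
print("Correct mean:", correct.mean_)   # 2.5, training rows only
print("Test row, leaky:  ", leaky.transform(X_test).ravel())    # about 2.0
print("Test row, correct:", correct.transform(X_test).ravel())  # about 87.2
```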

### Exercise 4: Scaling Impact Experiment

Demonstrate the impact of scaling on KNN performance.

In [None]:
# Exercise 4: Measure scaling impact on KNN

# Create a simple dataset
np.random.seed(42)
n = 200

exercise_data = pd.DataFrame({
    'feature1': np.random.uniform(0, 10, n),      # Small scale
    'feature2': np.random.uniform(0, 10000, n),   # Large scale
    'feature3': np.random.uniform(0, 100, n)      # Medium scale
})

# Target depends equally on all three features
exercise_data['target'] = (
    (exercise_data['feature1'] > 5).astype(int) +
    (exercise_data['feature2'] > 5000).astype(int) +
    (exercise_data['feature3'] > 50).astype(int)
) >= 2
exercise_data['target'] = exercise_data['target'].astype(int)

print("Dataset created with features on different scales:")
print(exercise_data.describe())

# TODO: 
# 1. Train KNN on unscaled data
# 2. Train KNN on scaled data (StandardScaler)
# 3. Compare accuracy
# 4. Which performs better and why?

# Your code here:

In [None]:
# Solution to Exercise 4

X = exercise_data.drop('target', axis=1)
y = exercise_data['target']

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# 1. KNN on unscaled data
knn_unscaled = KNeighborsClassifier(n_neighbors=5)
knn_unscaled.fit(X_train, y_train)
accuracy_unscaled = knn_unscaled.score(X_test, y_test)

# 2. KNN on scaled data
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

knn_scaled = KNeighborsClassifier(n_neighbors=5)
knn_scaled.fit(X_train_scaled, y_train)
accuracy_scaled = knn_scaled.score(X_test_scaled, y_test)

# 3. Compare
print("Results:\n")
print(f"KNN without scaling: {accuracy_unscaled:.3f}")
print(f"KNN with scaling:    {accuracy_scaled:.3f}")
print(f"\nImprovement: {(accuracy_scaled - accuracy_unscaled):.3f} ({(accuracy_scaled - accuracy_unscaled)/accuracy_unscaled*100:.1f}%)")

print("\nWhy scaling helps:")
print("- feature2 (0-10,000) dominates distance calculations without scaling")
print("- feature1 (0-10) is essentially ignored")
print("- With scaling, all features contribute equally")
print("- KNN can now use all features to find true neighbors")

## 12. Summary

### Key Takeaways

1. **Feature scaling is critical for distance-based and gradient-based algorithms**
   - KNN, SVM, Neural Networks NEED scaling
   - Tree-based models DON'T need scaling
   - Can substantially improve accuracy for these models (the size of the gain depends on the data)

2. **Three main scaling methods**:
   - **MinMaxScaler**: [0, 1] range, good for neural networks
   - **StandardScaler**: mean=0, std=1, most common choice
   - **RobustScaler**: median/IQR, robust to outliers

3. **Critical rule: Split BEFORE scaling**
   - Fit scaler on training data only
   - Transform both train and test with same scaler
   - Prevents data leakage

4. **Choosing the right scaler**:
   - Default: StandardScaler
   - Outliers: RobustScaler
   - Neural networks: MinMaxScaler
   - Tree models: No scaling needed

5. **Production considerations**:
   - Save fitted scaler with model
   - Apply same transformation to new data
   - Never refit scaler in production

### What's Next?

**Module 04**: Polynomial Features and Interactions - Learn how to create new features by combining existing ones

### Additional Resources

- [Sklearn Preprocessing](https://scikit-learn.org/stable/modules/preprocessing.html)
- [Feature Scaling Guide](https://scikit-learn.org/stable/auto_examples/preprocessing/plot_all_scaling.html)
- "Feature Engineering and Selection" by Kuhn & Johnson

---

**Congratulations!** You've completed Module 03. You now know:
- Why feature scaling matters for ML algorithms
- How to apply Min-Max, Standard, and Robust scaling
- When to use each scaling method
- How scaling can dramatically improve distance-based algorithms
- How to avoid data leakage during scaling

Ready to create new features? Move to **Module 04: Polynomial Features and Interactions**!