# Module 04: Polynomial Features and Interactions

**Difficulty**: ⭐⭐ Intermediate  
**Estimated Time**: 60 minutes  
**Prerequisites**: [Module 03: Feature Scaling and Normalization](03_feature_scaling_normalization.ipynb)

## Learning Objectives

By the end of this notebook, you will be able to:

1. Understand when linear models need polynomial features
2. Create polynomial features using sklearn's PolynomialFeatures
3. Engineer interaction terms to capture feature relationships
4. Measure the impact of polynomial features on model performance
5. Recognize and avoid the curse of dimensionality
6. Apply feature selection to manage polynomial feature explosion
7. Decide the appropriate polynomial degree for your data

## 1. Why Polynomial Features?

### The Limitation of Linear Models

Linear models assume a **straight-line relationship**:

$$y = w_1x_1 + w_2x_2 + \dots + w_nx_n + b$$

**Problem**: Real-world relationships are often **non-linear**!

### Example: House Prices

**Linear model**: 
```
price = w₁ × sqft + w₂ × bedrooms + b
```

**Better with polynomial**:
```
price = w₁ × sqft + w₂ × sqft² + w₃ × bedrooms + w₄ × (sqft × bedrooms) + b
                      ^^^^^^^^                          ^^^^^^^^^^^^^^^^^^^
                   Polynomial term                     Interaction term
```
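
One way to read these extra terms is through the marginal effect of `sqft`. Differentiating the polynomial model with respect to `sqft` (treating both features as continuous, which is a simplifying assumption for `bedrooms`) gives:

$$\frac{\partial\,\text{price}}{\partial\,\text{sqft}} = w_1 + 2w_2 \cdot \text{sqft} + w_4 \cdot \text{bedrooms}$$

The value of an extra square foot is no longer the constant $w_1$: it depends on the size of the house and on the bedroom count. The model is still linear in the weights $w_1, \dots, w_4$, which is why ordinary linear regression can fit it.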

### What We Create

1. **Polynomial terms**: $x^2, x^3, \dots$
   - Capture non-linear relationships
   - Example: Salary might grow faster than linearly with experience

2. **Interaction terms**: $x_1 \times x_2$
   - Capture relationships between features
   - Example: Large house with many bedrooms is worth MORE than the sum of parts
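
As a minimal sketch, both kinds of terms can be built by hand with pandas. The toy values and the new column names below are hypothetical and chosen only for illustration:

```python
import pandas as pd

# Hypothetical toy data; values are illustrative only
df = pd.DataFrame({"sqft": [1000, 2000, 3000], "bedrooms": [2, 3, 4]})

# Polynomial term: a single feature raised to a power
df["sqft^2"] = df["sqft"] ** 2

# Interaction term: the product of two different features
df["sqft x bedrooms"] = df["sqft"] * df["bedrooms"]

print(df)
```

Building terms by hand like this is practical when domain knowledge points to a few specific terms; `PolynomialFeatures` generates all of them at once.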

### Visual Example

```
Linear:        Quadratic:        Cubic:
  y              y                 y
  |  /           |   _             |    __
  | /            |  / \            |   /  \
  |/             | /   \           |  /    \_
  +----x         +------x          +--------x
```

## 2. Setup

In [None]:
# Core libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Polynomial features
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Models
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error

# Feature selection
from sklearn.feature_selection import SelectKBest, f_regression

# Pipeline
from sklearn.pipeline import Pipeline

# Configuration
%matplotlib inline
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")
np.random.seed(42)
pd.set_option('display.max_columns', None)
pd.set_option('display.precision', 3)

print("✓ Libraries imported successfully!")

## 3. Demonstrating the Need for Polynomial Features

Let's create a dataset with a non-linear relationship.

In [None]:
# Create synthetic data with quadratic relationship
np.random.seed(42)
n_samples = 200

# Feature: years of experience
experience = np.random.uniform(0, 10, n_samples)

# Target: salary has a quadratic relationship with experience
# The positive experience**2 term means growth accelerates:
# each additional year adds more than the previous one
salary = 30000 + 8000 * experience + 500 * experience**2 + np.random.normal(0, 5000, n_samples)

# Create DataFrame
salary_data = pd.DataFrame({
    'experience': experience,
    'salary': salary
})

print(f"Created dataset with {len(salary_data)} samples")
print(f"\nFirst few rows:")
print(salary_data.head())

# Visualize the relationship
plt.figure(figsize=(10, 6))
plt.scatter(salary_data['experience'], salary_data['salary'], alpha=0.5, edgecolors='black')
plt.xlabel('Years of Experience')
plt.ylabel('Salary ($)')
plt.title('Salary vs Experience (Non-Linear Relationship)')
plt.grid(True, alpha=0.3)
plt.show()

print("\nNotice: The relationship is curved, not a straight line!")
print("A linear model will struggle to capture this pattern.")

### Compare Linear vs Polynomial Models

In [None]:
# Prepare data
X = salary_data[['experience']]
y = salary_data['salary']

# Split data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Model 1: Linear (degree=1)
linear_model = LinearRegression()
linear_model.fit(X_train, y_train)
y_pred_linear = linear_model.predict(X_test)
rmse_linear = np.sqrt(mean_squared_error(y_test, y_pred_linear))
r2_linear = r2_score(y_test, y_pred_linear)

# Model 2: Polynomial (degree=2)
poly = PolynomialFeatures(degree=2, include_bias=False)
X_train_poly = poly.fit_transform(X_train)
X_test_poly = poly.transform(X_test)

poly_model = LinearRegression()
poly_model.fit(X_train_poly, y_train)
y_pred_poly = poly_model.predict(X_test_poly)
rmse_poly = np.sqrt(mean_squared_error(y_test, y_pred_poly))
r2_poly = r2_score(y_test, y_pred_poly)

# Compare
print("Model Performance Comparison:\n")
print(f"Linear Model (degree=1):")
print(f"  RMSE: ${rmse_linear:,.2f}")
print(f"  R²:   {r2_linear:.4f}")
print(f"\nPolynomial Model (degree=2):")
print(f"  RMSE: ${rmse_poly:,.2f}")
print(f"  R²:   {r2_poly:.4f}")
print(f"\nImprovement: {(rmse_linear - rmse_poly)/rmse_linear*100:.1f}% reduction in error!")

In [None]:
# Visualize both models
plt.figure(figsize=(12, 5))

# Generate smooth predictions for visualization
X_plot = np.linspace(0, 10, 100).reshape(-1, 1)
y_plot_linear = linear_model.predict(X_plot)
y_plot_poly = poly_model.predict(poly.transform(X_plot))

# Plot 1: Linear model
plt.subplot(1, 2, 1)
plt.scatter(X_train, y_train, alpha=0.5, label='Training data')
plt.plot(X_plot, y_plot_linear, 'r-', linewidth=2, label='Linear fit')
plt.xlabel('Years of Experience')
plt.ylabel('Salary ($)')
plt.title(f'Linear Model\nR² = {r2_linear:.4f}')
plt.legend()
plt.grid(True, alpha=0.3)

# Plot 2: Polynomial model
plt.subplot(1, 2, 2)
plt.scatter(X_train, y_train, alpha=0.5, label='Training data')
plt.plot(X_plot, y_plot_poly, 'g-', linewidth=2, label='Polynomial fit (degree=2)')
plt.xlabel('Years of Experience')
plt.ylabel('Salary ($)')
plt.title(f'Polynomial Model\nR² = {r2_poly:.4f}')
plt.legend()
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("Notice: Polynomial model captures the curved relationship much better!")

## 4. Creating Polynomial Features with PolynomialFeatures

Let's understand what PolynomialFeatures actually creates.

In [None]:
# Simple example with 2 features
simple_data = pd.DataFrame({
    'x1': [1, 2, 3],
    'x2': [4, 5, 6]
})

print("Original features:")
print(simple_data)

# Degree 2 polynomial
poly2 = PolynomialFeatures(degree=2, include_bias=False)
features_poly2 = poly2.fit_transform(simple_data)

# Get feature names
feature_names = poly2.get_feature_names_out(['x1', 'x2'])

poly_df = pd.DataFrame(features_poly2, columns=feature_names)

print("\nPolynomial features (degree=2):")
print(poly_df)

print("\nWhat was created:")
print("- x1: Original feature")
print("- x2: Original feature")
print("- x1²: Polynomial term (x1 squared)")
print("- x1 × x2: Interaction term (product of x1 and x2)")
print("- x2²: Polynomial term (x2 squared)")
print(f"\nOriginal: {simple_data.shape[1]} features")
print(f"After polynomial: {poly_df.shape[1]} features")

In [None]:
# What happens with degree=3?
poly3 = PolynomialFeatures(degree=3, include_bias=False)
features_poly3 = poly3.fit_transform(simple_data)
feature_names_3 = poly3.get_feature_names_out(['x1', 'x2'])

print("Polynomial features (degree=3):")
print(f"\nFeature names: {list(feature_names_3)}")
print(f"\nNumber of features: {len(feature_names_3)}")

print("\nIncludes:")
print("- Original: x1, x2")
print("- Degree 2: x1², x1×x2, x2²")
print("- Degree 3: x1³, x1²×x2, x1×x2², x2³")

print("\n⚠️  Warning: Features grow rapidly with degree!")
print(f"   2 features → {len(feature_names_3)} features with degree=3")

## 5. Interaction Features

**Interaction terms** capture how features work together.

### Real-World Examples:

1. **House prices**: 
   - `sqft × location_quality`: Large house in good location = premium price
   - Effect is multiplicative, not additive

2. **Marketing**:
   - `ad_spend × target_audience_size`: More impactful when both are high

3. **Education**:
   - `study_hours × teaching_quality`: Great teaching + effort = best outcomes

In [None]:
# Create dataset where interaction matters
np.random.seed(42)
n = 500

house_features = pd.DataFrame({
    'sqft': np.random.uniform(1000, 4000, n),
    'bedrooms': np.random.randint(2, 6, n),
    'age': np.random.randint(0, 50, n)
})

# Price depends on interaction: sqft × bedrooms
# Large house with many bedrooms is worth MORE than sum of parts
house_features['price'] = (
    100 * house_features['sqft'] +
    50000 * house_features['bedrooms'] +
    30 * house_features['sqft'] * house_features['bedrooms'] +  # Interaction!
    -1000 * house_features['age'] +
    np.random.normal(0, 30000, n)
)

print(f"Created house dataset with {len(house_features)} samples")
print(house_features.head())

In [None]:
# Compare model with and without interaction terms
X = house_features[['sqft', 'bedrooms', 'age']]
y = house_features['price']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Model 1: No interactions
model_no_interaction = LinearRegression()
model_no_interaction.fit(X_train, y_train)
r2_no_interaction = model_no_interaction.score(X_test, y_test)

# Model 2: With interactions (degree=2 includes interactions)
poly_interaction = PolynomialFeatures(degree=2, include_bias=False, interaction_only=True)
X_train_inter = poly_interaction.fit_transform(X_train)
X_test_inter = poly_interaction.transform(X_test)

model_with_interaction = LinearRegression()
model_with_interaction.fit(X_train_inter, y_train)
r2_with_interaction = model_with_interaction.score(X_test_inter, y_test)

print("Impact of Interaction Terms:\n")
print(f"Without interactions: R² = {r2_no_interaction:.4f}")
print(f"With interactions:    R² = {r2_with_interaction:.4f}")
print(f"\nImprovement: {(r2_with_interaction - r2_no_interaction):.4f}")

print("\nInteraction features created:")
print(poly_interaction.get_feature_names_out(['sqft', 'bedrooms', 'age']))
print("\nNote: interaction_only=True keeps the original features and adds their pairwise products, but no squared terms")

## 6. The Curse of Dimensionality

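**Problem**: The number of polynomial features grows **combinatorially** with the number of input features and the degree!

### Growth Rate

Formula: $\binom{n + d}{d} - 1$ where n = number of original features and d = degree. The $-1$ excludes the constant (bias) column, matching `include_bias=False`.

| Original Features | Degree | Total Features (no bias) |
|------------------|--------|----------------|
| 5 | 2 | 20 |
| 5 | 3 | 55 |
| 10 | 2 | 65 |
| 10 | 3 | 285 |
| 20 | 2 | 230 |
| 20 | 3 | 1,770 |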

**Consequences**:
1. ❌ Computational cost explodes
2. ❌ Risk of overfitting increases
3. ❌ Model becomes hard to interpret
4. ❌ Training time increases dramatically
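
The growth formula can be checked directly. The closed form $\binom{n+d}{d}$ counts the constant (bias) column, so subtracting 1 gives the count under `include_bias=False`. The sketch below uses a hypothetical helper, `n_poly_features`, and compares it against the number of columns scikit-learn generates:

```python
from math import comb

import numpy as np
from sklearn.preprocessing import PolynomialFeatures


def n_poly_features(n_features: int, degree: int) -> int:
    """Closed-form count of polynomial features, excluding the bias column."""
    return comb(n_features + degree, degree) - 1


for n, d in [(5, 2), (5, 3), (10, 2), (10, 3), (20, 2), (20, 3)]:
    # Cross-check the formula against what scikit-learn actually produces
    poly = PolynomialFeatures(degree=d, include_bias=False)
    n_generated = poly.fit_transform(np.ones((1, n))).shape[1]
    print(f"n={n:2d}, degree={d}: formula={n_poly_features(n, d):5d}, sklearn={n_generated:5d}")
```

The closed form is handy for estimating memory and training cost before running the transformation on a real dataset.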

In [None]:
# Demonstrate feature explosion
def count_polynomial_features(n_features, degree):
    """Count features created by PolynomialFeatures"""
    dummy_data = np.ones((1, n_features))
    poly = PolynomialFeatures(degree=degree, include_bias=False)
    transformed = poly.fit_transform(dummy_data)
    return transformed.shape[1]

# Test different combinations
results = []
for n_features in [5, 10, 15, 20]:
    for degree in [2, 3, 4]:
        total = count_polynomial_features(n_features, degree)
        results.append({
            'Original Features': n_features,
            'Degree': degree,
            'Total Features': total,
            'Multiplier': f'{total/n_features:.1f}x'
        })

results_df = pd.DataFrame(results)
print("Feature Explosion with Polynomial Features:\n")
print(results_df.to_string(index=False))

print("\n⚠️  WARNING: 20 features with degree=4 creates over 10,000 features!")
print("This leads to overfitting and slow training.")

In [None]:
# Visualize feature explosion
pivot_df = results_df.pivot(index='Original Features', columns='Degree', values='Total Features')

# Let pandas create the figure; calling plt.figure() first would leave an empty extra figure
pivot_df.plot(kind='bar', width=0.8, figsize=(10, 6))
plt.xlabel('Number of Original Features')
plt.ylabel('Total Features After Transformation')
plt.title('Polynomial Feature Explosion')
plt.legend(title='Polynomial Degree')
plt.yscale('log')  # Log scale to see the growth
plt.grid(True, alpha=0.3, which='both')
plt.tight_layout()
plt.show()

print("Notice how quickly the feature count grows! (log scale used)")

## 7. Managing Polynomial Features

### Strategies to Avoid Curse of Dimensionality:

1. **Limit degree**: Usually degree=2 or 3 is sufficient
2. **Feature selection**: Select only important polynomial features
3. **Regularization**: Use Ridge/Lasso to penalize complex models
4. **Domain knowledge**: Only create interactions that make sense
5. **interaction_only=True**: Skip polynomial terms, keep only interactions
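
As a rough illustration of strategy 5, the sketch below counts the columns produced with and without `interaction_only=True` for 10 placeholder features. The array `X_dummy` is hypothetical; only its shape matters for the count:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# 10 placeholder features; the values do not affect the number of columns
X_dummy = np.ones((1, 10))

full = PolynomialFeatures(degree=2, include_bias=False)
inter = PolynomialFeatures(degree=2, include_bias=False, interaction_only=True)

n_full = full.fit_transform(X_dummy).shape[1]    # originals + squares + pairwise products
n_inter = inter.fit_transform(X_dummy).shape[1]  # originals + pairwise products only

print(f"All degree-2 terms: {n_full}")
print(f"Interactions only:  {n_inter}")
```

The saving at degree 2 is modest (only the 10 squared terms are dropped), so this option tends to matter more when combined with a limited degree and feature selection.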

In [None]:
# Strategy 1: Use Regularization (Ridge/Lasso)
# Regularization penalizes large coefficients, helping with many features

# Create dataset with many features
np.random.seed(42)
n = 300
X_many = pd.DataFrame({
    f'feature_{i}': np.random.randn(n) for i in range(10)
})

# Target depends on only the first 2 features (feature_0 and feature_1)
y_many = (
    5 * X_many['feature_0'] + 
    3 * X_many['feature_1']**2 + 
    2 * X_many['feature_0'] * X_many['feature_1'] +
    np.random.normal(0, 1, n)
)

# Split
X_train, X_test, y_train, y_test = train_test_split(X_many, y_many, test_size=0.2, random_state=42)

# Create polynomial features (degree=2)
poly = PolynomialFeatures(degree=2, include_bias=False)
X_train_poly = poly.fit_transform(X_train)
X_test_poly = poly.transform(X_test)

print(f"Original features: {X_train.shape[1]}")
print(f"After polynomial (degree=2): {X_train_poly.shape[1]}")

# Compare models
models = {
    'Linear (no regularization)': LinearRegression(),
    'Ridge (L2 regularization)': Ridge(alpha=1.0),
    'Lasso (L1 regularization)': Lasso(alpha=0.1)
}

print("\nModel Performance on Polynomial Features:\n")
for name, model in models.items():
    model.fit(X_train_poly, y_train)
    r2 = model.score(X_test_poly, y_test)
    print(f"{name:30s} R² = {r2:.4f}")

print("\nRegularization helps prevent overfitting with many polynomial features!")

In [None]:
# Strategy 2: Feature Selection
# Keep only the most important polynomial features

# Select top 20 features
selector = SelectKBest(score_func=f_regression, k=20)
X_train_selected = selector.fit_transform(X_train_poly, y_train)
X_test_selected = selector.transform(X_test_poly)

print(f"Features before selection: {X_train_poly.shape[1]}")
print(f"Features after selection: {X_train_selected.shape[1]}")

# Train model on selected features
model_selected = LinearRegression()
model_selected.fit(X_train_selected, y_train)
r2_selected = model_selected.score(X_test_selected, y_test)

print(f"\nR² with selected features: {r2_selected:.4f}")
print("\nFeature selection reduces dimensionality while maintaining performance!")

## 8. Choosing the Right Polynomial Degree

Let's find the optimal degree by comparing performance.

In [None]:
# Test different polynomial degrees
degrees = range(1, 6)
train_scores = []
test_scores = []

# Use salary data from earlier
X = salary_data[['experience']]
y = salary_data['salary']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

for degree in degrees:
    # Create polynomial features
    poly = PolynomialFeatures(degree=degree, include_bias=False)
    X_train_poly = poly.fit_transform(X_train)
    X_test_poly = poly.transform(X_test)
    
    # Train model
    model = LinearRegression()
    model.fit(X_train_poly, y_train)
    
    # Evaluate
    train_score = model.score(X_train_poly, y_train)
    test_score = model.score(X_test_poly, y_test)
    
    train_scores.append(train_score)
    test_scores.append(test_score)
    
    print(f"Degree {degree}: Train R² = {train_score:.4f}, Test R² = {test_score:.4f}")

# Visualize
plt.figure(figsize=(10, 6))
plt.plot(degrees, train_scores, 'o-', label='Training Score', linewidth=2)
plt.plot(degrees, test_scores, 's-', label='Test Score', linewidth=2)
plt.xlabel('Polynomial Degree')
plt.ylabel('R² Score')
plt.title('Model Performance vs Polynomial Degree')
plt.legend()
plt.grid(True, alpha=0.3)
plt.xticks(degrees)
plt.show()

optimal_degree = degrees[np.argmax(test_scores)]
print(f"\nOptimal degree: {optimal_degree}")
print("\nNotice: After degree 2, test performance doesn't improve (may even decrease!)")
print("This is overfitting - model memorizes training data.")
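
Picking the degree by its test-set score means the test set no longer gives an unbiased estimate of final performance. A common alternative is to choose the degree by cross-validation and keep the test set for a final check. The sketch below is one way to do that, assuming synthetic data generated like the salary example; the names `X_cv`, `y_cv`, and `cv_means` are hypothetical:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical data mirroring the salary example: quadratic signal plus noise
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, 200)
y_cv = 30000 + 8000 * x + 500 * x**2 + rng.normal(0, 5000, 200)
X_cv = x.reshape(-1, 1)

cv_means = {}
for degree in range(1, 6):
    pipe = Pipeline([
        ("poly", PolynomialFeatures(degree=degree, include_bias=False)),
        ("model", LinearRegression()),
    ])
    # 5-fold CV: each fold is held out once and scored with R²
    scores = cross_val_score(pipe, X_cv, y_cv, cv=5, scoring="r2")
    cv_means[degree] = scores.mean()
    print(f"Degree {degree}: mean CV R² = {scores.mean():.4f}")

# Degree with the highest mean cross-validated R²
best_degree = max(cv_means, key=cv_means.get)
print(f"Degree chosen by cross-validation: {best_degree}")
```

Wrapping the transformer and the model in a `Pipeline` keeps the polynomial expansion inside each fold, so no information from the held-out fold reaches the fitting step.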

## 9. Best Practices

### ✅ DO:

1. **Start with degree=2**
   - Often sufficient for most problems
   - Good balance of complexity and interpretability

2. **Scale features BEFORE creating polynomials**
   - Prevents numerical instability
   - Squared large numbers get VERY large!

3. **Use regularization (Ridge/Lasso)**
   - Essential when creating many features
   - Prevents overfitting

4. **Validate on test set**
   - Don't trust training accuracy
   - Watch for overfitting

5. **Consider domain knowledge**
   - Create only meaningful interactions
   - Not all feature pairs interact

### ❌ DON'T:

1. **Don't use degree > 3** (usually)
2. **Don't create polynomials without scaling**
3. **Don't ignore feature explosion**
4. **Don't forget regularization with many features**
5. **Don't use with tree-based models** (they handle non-linearity naturally)
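
To see why scaling before expansion matters, the sketch below compares the condition number of a degree-3 polynomial feature matrix with and without standardization. A larger condition number indicates a more numerically fragile least-squares problem. The data is hypothetical (a single feature on a square-footage-like scale):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Hypothetical feature on a large scale (e.g. square footage)
rng = np.random.default_rng(0)
X_raw = rng.uniform(1000, 4000, size=(200, 1))

poly = PolynomialFeatures(degree=3, include_bias=False)

# Expanding raw values: columns range from roughly 1e3 (x) to 1e10 (x^3)
cond_raw = np.linalg.cond(poly.fit_transform(X_raw))

# Standardize first, then expand: all columns stay on a similar scale
X_scaled = StandardScaler().fit_transform(X_raw)
cond_scaled = np.linalg.cond(poly.fit_transform(X_scaled))

print(f"Condition number without scaling: {cond_raw:.2e}")
print(f"Condition number with scaling:    {cond_scaled:.2e}")
```

In a real workflow the scaler would be fit on the training split only, for example by placing it ahead of `PolynomialFeatures` in a `Pipeline`.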

## 10. Exercise Section

### Exercise 1: Identify Non-Linear Relationships

Which scenarios would benefit from polynomial features?

In [None]:
# Exercise 1: Which scenarios need polynomial features?

scenarios = {
    'A': 'Predicting crop yield based on fertilizer amount (too little OR too much is bad)',
    'B': 'Predicting salary based on job title (Junior, Senior, Manager)',
    'C': 'Predicting braking distance based on car speed (physics: d ∝ v²)',
    'D': 'Predicting house price based on number of bedrooms',
    'E': 'Predicting student test score based on study hours (diminishing returns)'
}

print("Which scenarios would benefit from polynomial features?\n")
for key, scenario in scenarios.items():
    print(f"{key}. {scenario}")

print("\nYour answers (list letters that need polynomial features):")
# needs_polynomial = []

In [None]:
# Solution to Exercise 1

print("Solutions:\n")
print("✅ NEEDS Polynomial Features:")
print("  A - Crop yield: Quadratic relationship (inverted U-shape)")
print("      Too little fertilizer = low yield, optimal amount = high, too much = low")
print("  C - Braking distance: Quadratic relationship (d ∝ v²)")
print("      Distance increases with square of speed")
print("  E - Test scores: Diminishing returns (logarithmic/quadratic)")
print("      First hours of study help a lot, additional hours help less")

print("\n❌ DON'T NEED Polynomial Features:")
print("  B - Job title: Ordinal encoding is sufficient")
print("  D - Bedrooms: Likely linear relationship")

print("\nKey insight: Look for relationships that are curved, not straight!")

### Exercise 2: Create Interaction Features

Create meaningful interaction features for a marketing dataset.

In [None]:
# Exercise 2: Create interaction features

# Marketing campaign dataset
marketing_data = pd.DataFrame({
    'ad_spend': [1000, 5000, 10000, 2000, 8000],
    'email_opens': [100, 500, 800, 200, 600],
    'website_visits': [500, 2000, 4000, 1000, 3000]
})

print("Marketing dataset:")
print(marketing_data)

# TODO: Create interaction features that make business sense
# Which features might interact?
# Hint: High ad spend + high email opens = ?

# Your code here:
# marketing_data['interaction1'] = ???
# marketing_data['interaction2'] = ???

In [None]:
# Solution to Exercise 2

marketing_data_solution = marketing_data.copy()

# Meaningful interactions
marketing_data_solution['spend_x_opens'] = (
    marketing_data_solution['ad_spend'] * marketing_data_solution['email_opens']
)

marketing_data_solution['opens_x_visits'] = (
    marketing_data_solution['email_opens'] * marketing_data_solution['website_visits']
)

marketing_data_solution['engagement_score'] = (
    marketing_data_solution['ad_spend'] * 
    marketing_data_solution['email_opens'] * 
    marketing_data_solution['website_visits'] / 1000000  # Rescale to keep the values readable
)

print("Solution:")
print(marketing_data_solution)

print("\nInteractions created:")
print("1. ad_spend × email_opens: Campaigns with both high spend AND high engagement")
print("2. email_opens × website_visits: Email effectiveness leading to site traffic")
print("3. Combined engagement score: Overall campaign effectiveness")

print("\nThese interactions capture synergies between features!")

### Exercise 3: Avoid Overfitting with Polynomial Features

Demonstrate overfitting with high-degree polynomials.

In [None]:
# Exercise 3: Show overfitting with high polynomial degree

# Small dataset (prone to overfitting)
np.random.seed(42)
n_small = 20  # Very small!

X_small = np.random.uniform(0, 10, n_small).reshape(-1, 1)
y_small = 2 * X_small.ravel() + 5 + np.random.normal(0, 2, n_small)

# TODO: 
# 1. Fit polynomial models with degrees 1, 5, and 10
# 2. Plot the fitted curves
# 3. Which degree overfits?

# Your code here:

In [None]:
# Solution to Exercise 3

fig, axes = plt.subplots(1, 3, figsize=(15, 4))
degrees_test = [1, 5, 10]

X_plot = np.linspace(0, 10, 100).reshape(-1, 1)

for idx, degree in enumerate(degrees_test):
    # Fit model
    poly = PolynomialFeatures(degree=degree, include_bias=False)
    X_poly = poly.fit_transform(X_small)
    X_plot_poly = poly.transform(X_plot)
    
    model = LinearRegression()
    model.fit(X_poly, y_small)
    y_plot = model.predict(X_plot_poly)
    
    # Plot
    axes[idx].scatter(X_small, y_small, color='blue', s=50, alpha=0.7, label='Data')
    axes[idx].plot(X_plot, y_plot, 'r-', linewidth=2, label=f'Degree {degree} fit')
    axes[idx].set_xlabel('X')
    axes[idx].set_ylabel('y')
    axes[idx].set_title(f'Polynomial Degree = {degree}')
    axes[idx].legend()
    axes[idx].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("Observations:")
print("- Degree 1 (left): Good fit - the data was generated from a straight line")
print("- Degree 5 (middle): Starts to wiggle - early overfitting")
print("- Degree 10 (right): Severe overfitting - wild oscillations!")
print("\nWith only 20 data points, high degrees memorize noise instead of learning patterns.")

### Exercise 4: Build Complete Pipeline

Create a pipeline with scaling, polynomial features, and regularization.

In [None]:
# Exercise 4: Build a complete pipeline

from sklearn.pipeline import Pipeline

# Use house price data
X = house_features[['sqft', 'bedrooms', 'age']]
y = house_features['price']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# TODO: Create a pipeline with:
# 1. StandardScaler
# 2. PolynomialFeatures (degree=2)
# 3. Ridge regression (alpha=1.0)

# Your code here:
# pipeline = Pipeline([
#     ('scaler', ???),
#     ('poly', ???),
#     ('model', ???)
# ])

# pipeline.fit(X_train, y_train)
# score = pipeline.score(X_test, y_test)
# print(f"Pipeline R² score: {score:.4f}")

In [None]:
# Solution to Exercise 4

# Create pipeline
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('poly', PolynomialFeatures(degree=2, include_bias=False)),
    ('model', Ridge(alpha=1.0))
])

# Fit pipeline
pipeline.fit(X_train, y_train)

# Evaluate
train_score = pipeline.score(X_train, y_train)
test_score = pipeline.score(X_test, y_test)

print("Pipeline Results:")
print(f"Training R²: {train_score:.4f}")
print(f"Test R²:     {test_score:.4f}")

print("\nBenefits of using a Pipeline:")
print("✓ Ensures proper order: scale → polynomial → model")
print("✓ Prevents data leakage (all steps fit on train only)")
print("✓ Easy to deploy (save entire pipeline)")
print("✓ Cleaner code, less error-prone")

# Make predictions
new_house = pd.DataFrame({
    'sqft': [2500],
    'bedrooms': [4],
    'age': [10]
})

predicted_price = pipeline.predict(new_house)[0]
print(f"\nPrediction for new house: ${predicted_price:,.2f}")

## 11. Summary

### Key Takeaways

1. **Polynomial features enable linear models to capture non-linearity**
   - Transform straight lines into curves
   - Critical when relationships are quadratic, cubic, etc.

2. **Two types of features created**:
   - **Polynomial terms**: x², x³ (capture curved relationships)
   - **Interaction terms**: x₁×x₂ (capture feature synergies)

3. **Curse of dimensionality is real**:
   - Feature count grows combinatorially with the number of features and the degree
   - 10 features, degree 3 → 285 features!
   - Leads to overfitting and slow training

4. **Best practices**:
   - Start with degree=2 (usually sufficient)
   - Scale features BEFORE creating polynomials
   - Use regularization (Ridge/Lasso)
   - Validate on test set to detect overfitting
   - Consider feature selection

5. **When to use**:
   - ✅ Linear models on non-linear data
   - ✅ When domain knowledge suggests interactions
   - ✅ When you have enough data (avoid overfitting)
   - ❌ Tree-based models (they handle non-linearity naturally)
   - ❌ When you have very little data

### What's Next?

**Module 05**: Binning and Discretization - Learn how to convert continuous features into categorical bins

### Additional Resources

- [Sklearn PolynomialFeatures](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.PolynomialFeatures.html)
- [Feature Engineering for Machine Learning](https://www.oreilly.com/library/view/feature-engineering-for/9781491953235/)
- [Regularization Techniques](https://scikit-learn.org/stable/modules/linear_model.html#ridge-regression)

---

**Congratulations!** You've completed Module 04. You now know:
- How to create polynomial and interaction features
- When polynomial features improve model performance
- How to avoid the curse of dimensionality
- How to use regularization with polynomial features
- How to build complete pipelines with scaling and polynomials

Ready to learn discretization? Move to **Module 05: Binning and Discretization**!