# Module 10: Automated Feature Engineering

**Difficulty**: ⭐⭐⭐ Advanced  
**Estimated Time**: 70 minutes  
**Prerequisites**: Module 09 (Feature Importance and Interpretability)

## Learning Objectives

By the end of this notebook, you will be able to:

1. Understand the concept and benefits of automated feature engineering
2. Use polynomial features and interaction generators
3. Apply Deep Feature Synthesis concepts
4. Create custom automated transformation pipelines
5. Compare manual vs automated feature engineering
6. Know when automation helps and when it hurts

## 1. Why Automate Feature Engineering?

**Manual feature engineering challenges**:
- Time-consuming and labor-intensive
- Requires deep domain knowledge
- May miss complex interactions
- Not scalable to many datasets

**Automated feature engineering can**:
- ✅ Generate hundreds of features automatically
- ✅ Discover non-obvious patterns
- ✅ Save time in exploration phase
- ✅ Provide good baseline features

**But automation also**:
- ❌ Can create too many irrelevant features
- ❌ Loses interpretability
- ❌ Increases overfitting risk
- ❌ Still needs feature selection

**Best approach**: Combine manual domain knowledge with automated exploration!

## 2. Setup

In [None]:
# Core libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from itertools import combinations

# Feature engineering tools
from sklearn.preprocessing import PolynomialFeatures, FunctionTransformer
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer

# Models
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.feature_selection import SelectKBest, f_regression

# Configuration
%matplotlib inline
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")

# Set random seed for reproducibility
np.random.seed(42)

# Display settings
pd.set_option('display.max_columns', None)
pd.set_option('display.precision', 3)

print("✓ Setup complete!")

## 3. Create Synthetic Dataset

We'll create a dataset where important features come from interactions and transformations.

In [None]:
# Create synthetic dataset where target depends on feature interactions
n_samples = 1000

# Generate base features
np.random.seed(42)
X_raw = pd.DataFrame({
    'feature_1': np.random.uniform(1, 10, n_samples),
    'feature_2': np.random.uniform(1, 10, n_samples),
    'feature_3': np.random.uniform(1, 10, n_samples),
    'feature_4': np.random.uniform(1, 10, n_samples),
    'feature_5': np.random.uniform(1, 10, n_samples),
})

# Target depends on ENGINEERED features (not raw ones!)
y = (
    5 * (X_raw['feature_1'] * X_raw['feature_2']) +  # Interaction
    3 * (X_raw['feature_3'] ** 2) +  # Polynomial
    2 * np.sqrt(X_raw['feature_4']) +  # Non-linear transform
    1 * (X_raw['feature_1'] + X_raw['feature_5']) +  # Sum
    np.random.normal(0, 10, n_samples)  # Noise
)

print(f"Dataset: {X_raw.shape}")
print(f"\nTrue data generating process:")
print("  y = 5*(f1*f2) + 3*(f3²) + 2*√f4 + (f1+f5) + noise")
print("\nChallenge: Can automated feature engineering discover these patterns?")
print(f"\nFirst few rows:")
X_raw.head()

## 4. Baseline: No Feature Engineering

In [None]:
# Split data
X_train, X_test, y_train, y_test = train_test_split(
    X_raw, y, test_size=0.2, random_state=42
)

# Train model with raw features only
baseline_model = Ridge(alpha=1.0)
baseline_model.fit(X_train, y_train)

# Evaluate
y_pred_baseline = baseline_model.predict(X_test)
rmse_baseline = np.sqrt(mean_squared_error(y_test, y_pred_baseline))
r2_baseline = r2_score(y_test, y_pred_baseline)

print("Baseline (Raw Features Only):")
print(f"  RMSE: {rmse_baseline:.2f}")
print(f"  R² Score: {r2_baseline:.3f}")
print("\nNote: a linear model on raw features cannot represent the interaction or squared terms!")

## 5. Method 1: Polynomial Features

**PolynomialFeatures** automatically generates:
- Powers of features (x², x³)
- Interactions (x₁ × x₂, x₁ × x₂ × x₃)

**Example**: [a, b] with degree=2 → [1, a, b, a², ab, b²] (the leading 1 is the bias column, which `include_bias=False` drops)

In [None]:
# Create polynomial features (degree 2 = include interactions and squares)
poly = PolynomialFeatures(degree=2, include_bias=False)
X_train_poly = poly.fit_transform(X_train)
X_test_poly = poly.transform(X_test)

# Get feature names
poly_feature_names = poly.get_feature_names_out(X_raw.columns)

print(f"Original features: {X_train.shape[1]}")
print(f"After polynomial expansion: {X_train_poly.shape[1]}")
print(f"\nGenerated features include:")
print(list(poly_feature_names[:10]), "...")
print("\nNotice features like 'feature_1 feature_2' (interaction!)")

In [None]:
# Train model with polynomial features
poly_model = Ridge(alpha=1.0)  # Regularization important with many features!
poly_model.fit(X_train_poly, y_train)

# Evaluate
y_pred_poly = poly_model.predict(X_test_poly)
rmse_poly = np.sqrt(mean_squared_error(y_test, y_pred_poly))
r2_poly = r2_score(y_test, y_pred_poly)

print("With Polynomial Features:")
print(f"  RMSE: {rmse_poly:.2f} (baseline: {rmse_baseline:.2f})")
print(f"  R² Score: {r2_poly:.3f} (baseline: {r2_baseline:.3f})")
print(f"\nImprovement: {((rmse_baseline - rmse_poly) / rmse_baseline * 100):.1f}% error reduction!")

In [None]:
# Examine which generated features are most important
coefficients = pd.Series(poly_model.coef_, index=poly_feature_names)
top_features = coefficients.abs().nlargest(10)

print("Top 10 most important polynomial features:")
print(top_features)

# Visualize
plt.figure(figsize=(10, 6))
top_features.plot(kind='barh', color='steelblue', edgecolor='black')
plt.xlabel('Absolute Coefficient Value')
plt.title('Most Important Polynomial Features', fontsize=12, fontweight='bold')
plt.gca().invert_yaxis()
plt.tight_layout()
plt.show()

print("\nCheck whether the true terms rank near the top:")
print("  - 'feature_1 feature_2' (true interaction)")
print("  - 'feature_3^2' (true polynomial)")
print("Caveat: features are unscaled, so coefficient size is only a rough guide to importance.")

## 6. Method 2: Custom Automated Transformations

Create functions that automatically generate domain-inspired features.

In [None]:
def create_mathematical_features(X):
    """
    Automatically create mathematical transformations.
    """
    X_new = X.copy()
    
    for col in X.columns:
        # Non-linear transformations
        X_new[f'{col}_squared'] = X[col] ** 2
        X_new[f'{col}_sqrt'] = np.sqrt(X[col])
        X_new[f'{col}_log'] = np.log(X[col] + 1)  # +1 to avoid log(0)
    
    return X_new

def create_interaction_features(X):
    """
    Automatically create pairwise product and ratio features.
    """
    X_new = X.copy()
    
    # Create pairwise interactions
    for col1, col2 in combinations(X.columns, 2):
        X_new[f'{col1}_times_{col2}'] = X[col1] * X[col2]
        X_new[f'{col1}_div_{col2}'] = X[col1] / (X[col2] + 1e-5)  # Avoid div by 0
    
    return X_new

def create_aggregate_features(X):
    """
    Create aggregate features across all columns.
    """
    X_new = X.copy()
    
    X_new['sum_all'] = X.sum(axis=1)
    X_new['mean_all'] = X.mean(axis=1)
    X_new['max_all'] = X.max(axis=1)
    X_new['min_all'] = X.min(axis=1)
    X_new['std_all'] = X.std(axis=1)
    
    return X_new

# Apply all transformations
X_train_auto = X_train.copy()
X_test_auto = X_test.copy()

# Mathematical transformations
X_train_auto = create_mathematical_features(X_train_auto)
X_test_auto = create_mathematical_features(X_test_auto)

# Interactions (computed over raw AND derived columns, so the feature count grows quickly)
X_train_auto = create_interaction_features(X_train_auto)
X_test_auto = create_interaction_features(X_test_auto)

# Aggregates
X_train_auto = create_aggregate_features(X_train_auto)
X_test_auto = create_aggregate_features(X_test_auto)

print(f"Original features: {X_train.shape[1]}")
print(f"After automated engineering: {X_train_auto.shape[1]}")
print(f"\nGenerated {X_train_auto.shape[1] - X_train.shape[1]} new features!")
print(f"\nSample feature names:")
print(list(X_train_auto.columns[-10:]))

In [None]:
# Train with automated features
auto_model = Ridge(alpha=1.0)
auto_model.fit(X_train_auto, y_train)

# Evaluate
y_pred_auto = auto_model.predict(X_test_auto)
rmse_auto = np.sqrt(mean_squared_error(y_test, y_pred_auto))
r2_auto = r2_score(y_test, y_pred_auto)

print("With Automated Features:")
print(f"  RMSE: {rmse_auto:.2f}")
print(f"  R² Score: {r2_auto:.3f}")

# Check top features
coefficients = pd.Series(auto_model.coef_, index=X_train_auto.columns)
top_features = coefficients.abs().nlargest(10)

print(f"\nTop 10 discovered features:")
for feature, coef in top_features.items():
    print(f"  {feature}: {coef:.2f}")
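
The generator functions above are applied by hand to the train and test sets. One way to package the same idea is scikit-learn's `FunctionTransformer` inside a `Pipeline`, so that generation, scaling, and the model travel together. The sketch below is self-contained and runs on toy data; `add_math_features`, `X_toy`, and `y_toy` are hypothetical names introduced only for this illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler


def add_math_features(X):
    """Hypothetical helper: append squared and square-root columns."""
    X_new = X.copy()
    for col in X.columns:
        X_new[f"{col}_squared"] = X[col] ** 2
        X_new[f"{col}_sqrt"] = np.sqrt(X[col])
    return X_new


# Toy data whose target depends on a square and a square root
rng = np.random.default_rng(0)
X_toy = pd.DataFrame(rng.uniform(1, 10, size=(200, 2)), columns=["a", "b"])
y_toy = 3 * X_toy["a"] ** 2 + 2 * np.sqrt(X_toy["b"]) + rng.normal(0, 1, 200)

pipe = Pipeline([
    ("generate", FunctionTransformer(add_math_features)),  # stateless feature generator
    ("scale", StandardScaler()),                           # puts features on one scale for Ridge
    ("model", Ridge(alpha=1.0)),
])
pipe.fit(X_toy, y_toy)
print(f"Training R^2: {pipe.score(X_toy, y_toy):.3f}")
```

Because the generator is stateless, it is safe to apply identically to train and test data; any step that learns from data (such as the scaler) is fitted only when the pipeline is fitted.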

## 7. Feature Selection After Automation

**Problem**: Automated methods create MANY features!
**Solution**: Use feature selection to keep only the best.

In [None]:
# Select top k features using statistical test
k = 20
selector = SelectKBest(score_func=f_regression, k=k)
X_train_selected = selector.fit_transform(X_train_auto, y_train)
X_test_selected = selector.transform(X_test_auto)

# Get selected feature names
selected_features = X_train_auto.columns[selector.get_support()].tolist()

print(f"Selected top {k} features from {X_train_auto.shape[1]} total:")
print(selected_features)

# Train with selected features
selected_model = Ridge(alpha=1.0)
selected_model.fit(X_train_selected, y_train)

# Evaluate
y_pred_selected = selected_model.predict(X_test_selected)
rmse_selected = np.sqrt(mean_squared_error(y_test, y_pred_selected))
r2_selected = r2_score(y_test, y_pred_selected)

print(f"\nWith Feature Selection:")
print(f"  RMSE: {rmse_selected:.2f}")
print(f"  R² Score: {r2_selected:.3f}")
print(f"\nUsing only {k}/{X_train_auto.shape[1]} features ({k/X_train_auto.shape[1]*100:.1f}%)!")
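
A single train/test split gives only one estimate of how well selection works. A common pattern, sketched here on freshly generated toy data rather than the notebook's variables, is to put generation and selection inside a `Pipeline` and score it with `cross_val_score`, so the selector is refit on each training fold and never sees the held-out fold. The names `X_cv`, `y_cv`, and `cv_pipe` are assumptions made for this example.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Toy data: target driven by one interaction and one squared term
rng = np.random.default_rng(42)
X_cv = rng.uniform(1, 10, size=(300, 5))
y_cv = 5 * X_cv[:, 0] * X_cv[:, 1] + 3 * X_cv[:, 2] ** 2 + rng.normal(0, 10, 300)

cv_pipe = Pipeline([
    ("poly", PolynomialFeatures(degree=2, include_bias=False)),  # 5 -> 20 features
    ("scale", StandardScaler()),
    ("select", SelectKBest(score_func=f_regression, k=10)),      # refit inside each fold
    ("model", Ridge(alpha=1.0)),
])

scores = cross_val_score(cv_pipe, X_cv, y_cv, cv=5, scoring="r2")
print(f"R^2 per fold: {np.round(scores, 3)}")
print(f"Mean R^2: {scores.mean():.3f} (+/- {scores.std():.3f})")
```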

## 8. Compare All Approaches

In [None]:
# Comprehensive comparison
results = pd.DataFrame({
    'Method': [
        'Baseline (Raw Features)',
        'Polynomial Features',
        'Automated Features (All)',
        'Automated + Selection'
    ],
    'Num Features': [
        X_train.shape[1],
        X_train_poly.shape[1],
        X_train_auto.shape[1],
        k
    ],
    'RMSE': [rmse_baseline, rmse_poly, rmse_auto, rmse_selected],
    'R² Score': [r2_baseline, r2_poly, r2_auto, r2_selected]
})

print("\nPerformance Comparison:")
print("="*70)
print(results.to_string(index=False))
print("="*70)

In [None]:
# Visualize comparison
fig, axes = plt.subplots(1, 2, figsize=(15, 5))

# RMSE comparison
axes[0].barh(results['Method'], results['RMSE'], color='coral', edgecolor='black')
axes[0].set_xlabel('RMSE (Lower is Better)')
axes[0].set_title('Model Error by Method', fontsize=12, fontweight='bold')
axes[0].invert_yaxis()
axes[0].grid(True, alpha=0.3, axis='x')

# Feature count vs R²
axes[1].scatter(results['Num Features'], results['R² Score'], 
               s=200, alpha=0.6, edgecolor='black')
for idx, row in results.iterrows():
    axes[1].annotate(row['Method'], 
                    (row['Num Features'], row['R² Score']),
                    fontsize=8, ha='left', va='bottom')
axes[1].set_xlabel('Number of Features')
axes[1].set_ylabel('R² Score (Higher is Better)')
axes[1].set_title('Performance vs Feature Count', fontsize=12, fontweight='bold')
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\nKey insights:")
print("1. Generated features can substantially improve on the raw baseline")
print("2. Feature selection can keep most of that gain with far fewer features")
print("3. More features ≠ always better (need selection!)")

## 9. When Automation Helps vs Hurts

In [None]:
# Demonstrate curse of dimensionality with too many features
print("Demonstrating when automation can HURT:\n")

# Create dataset where automation creates mostly noise
X_simple = pd.DataFrame({
    'x1': np.random.randn(200),
    'x2': np.random.randn(200),
    'x3': np.random.randn(200),
})

# Simple linear relationship
y_simple = 2*X_simple['x1'] + 3*X_simple['x2'] + np.random.randn(200)*0.1

X_train_s, X_test_s, y_train_s, y_test_s = train_test_split(
    X_simple, y_simple, test_size=0.3, random_state=42
)

# Baseline
model1 = Ridge(alpha=0.1)
model1.fit(X_train_s, y_train_s)
r2_simple = r2_score(y_test_s, model1.predict(X_test_s))

# With polynomial features (generates many irrelevant features!)
poly_extreme = PolynomialFeatures(degree=3, include_bias=False)
X_train_poly_s = poly_extreme.fit_transform(X_train_s)
X_test_poly_s = poly_extreme.transform(X_test_s)

model2 = Ridge(alpha=0.1)  # Same regularization
model2.fit(X_train_poly_s, y_train_s)
r2_poly_s = r2_score(y_test_s, model2.predict(X_test_poly_s))

print(f"Simple linear data:")
print(f"  Raw features (3): R² = {r2_simple:.3f}")
print(f"  Polynomial features ({X_train_poly_s.shape[1]}): R² = {r2_poly_s:.3f}")
print(f"\nChange in test R² from automation: {r2_poly_s - r2_simple:+.4f} (negative means it hurt)")
print(f"\nThe expansion added {X_train_poly_s.shape[1]-3} features that are not part of the true relationship.")
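
The number of extra features in the cell above follows from a simple count: with `n` inputs and degree `d`, there are C(n + d, d) monomials of degree at most `d`, minus one when the bias column is dropped. The sketch below checks that formula against scikit-learn; `n_poly_features` is a hypothetical helper written for this check, not a library function.

```python
from math import comb

import numpy as np
from sklearn.preprocessing import PolynomialFeatures


def n_poly_features(n_features, degree):
    """Number of polynomial terms up to `degree`, excluding the bias column."""
    return comb(n_features + degree, degree) - 1


for n_features, degree in [(3, 3), (5, 2), (5, 3), (20, 3)]:
    expected = n_poly_features(n_features, degree)
    # Fit on a dummy row only to let sklearn count its output columns
    poly_check = PolynomialFeatures(degree=degree, include_bias=False)
    poly_check.fit(np.zeros((1, n_features)))
    print(f"{n_features} inputs, degree {degree}: {expected} features "
          f"(sklearn reports {poly_check.n_output_features_})")
```

Three inputs at degree 3 give 19 features, which is why 16 were added above; twenty inputs at degree 3 already give 1770.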

In [None]:
# Summary: When to use automated feature engineering

summary = """
╔══════════════════════════════════════════════════════════════════════╗
║           WHEN AUTOMATED FEATURE ENGINEERING HELPS vs HURTS          ║
╠══════════════════════════════════════════════════════════════════════╣
║                                                                      ║
║  ✅ AUTOMATION HELPS WHEN:                                           ║
║  • Target has non-linear relationships                               ║
║  • Feature interactions are important                                ║
║  • You have enough data (>1000 samples)                              ║
║  • Domain knowledge is limited (exploratory phase)                   ║
║  • Using regularization + feature selection                          ║
║  • Tree-based models (handle many features well)                     ║
║                                                                      ║
║  ❌ AUTOMATION HURTS WHEN:                                           ║
║  • Simple linear relationships                                       ║
║  • Small datasets (<500 samples)                                     ║
║  • Interpretability is critical                                      ║
║  • No feature selection applied                                      ║
║  • Low regularization                                                ║
║  • Production constraints (too many features)                        ║
║                                                                      ║
║  💡 BEST PRACTICE:                                                   ║
║  1. Start with domain-driven manual features                         ║
║  2. Add automated features for exploration                           ║
║  3. Apply feature selection aggressively                             ║
║  4. Use cross-validation to prevent overfitting                      ║
║  5. Compare with baseline (raw features)                             ║
║                                                                      ║
╚══════════════════════════════════════════════════════════════════════╝
"""

print(summary)

## 10. Exercise Section

### Exercise 1: Polynomial Features with Different Degrees

Compare polynomial features with degrees 1, 2, and 3. Find the optimal degree.

In [None]:
# Exercise 1: Test different polynomial degrees

# TODO:
# 1. Create polynomial features with degree 1, 2, and 3
# 2. Train Ridge models for each
# 3. Compare test R² scores
# 4. Plot performance vs degree
#
# Which degree works best and why?

# Your code here:


In [None]:
# Solution to Exercise 1

degrees = [1, 2, 3]
degree_results = []  # separate name so the Section 8 `results` table is not overwritten

for degree in degrees:
    # Create polynomial features
    poly = PolynomialFeatures(degree=degree, include_bias=False)
    X_train_p = poly.fit_transform(X_train)
    X_test_p = poly.transform(X_test)

    # Train model
    model = Ridge(alpha=1.0)
    model.fit(X_train_p, y_train)

    # Evaluate
    r2 = r2_score(y_test, model.predict(X_test_p))

    degree_results.append({
        'Degree': degree,
        'Num Features': X_train_p.shape[1],
        'R² Score': r2
    })

results_df = pd.DataFrame(degree_results)
print("Polynomial Degree Comparison:")
print(results_df)

# Plot
plt.figure(figsize=(10, 5))
plt.plot(results_df['Degree'], results_df['R² Score'], marker='o', linewidth=2)
plt.xlabel('Polynomial Degree')
plt.ylabel('R² Score')
plt.title('Performance vs Polynomial Degree', fontsize=12, fontweight='bold')
plt.grid(True, alpha=0.3)
plt.xticks(degrees)
plt.tight_layout()
plt.show()

best_degree = results_df.loc[results_df['R² Score'].idxmax(), 'Degree']
print(f"\nOptimal degree: {best_degree}")
print("Degree 2 captures interactions and squares (matches our data generating process!)")

### Exercise 2: Custom Feature Generator

Create a function that generates ratio features (feature_i / feature_j for all pairs).

In [None]:
# Exercise 2: Create ratio features

# TODO: Write a function that:
# 1. Takes a dataframe X
# 2. Creates ratio features for all column pairs
# 3. Returns expanded dataframe
#
# Hint: Use itertools.combinations

def create_ratio_features(X):
    # Your code here
    pass

# Test your function
# X_with_ratios = create_ratio_features(X_train)
# print(f"Original: {X_train.shape[1]} features")
# print(f"With ratios: {X_with_ratios.shape[1]} features")

In [None]:
# Solution to Exercise 2

from itertools import combinations

def create_ratio_features(X):
    """
    Create ratio features for all column pairs.
    """
    X_new = X.copy()
    
    # Generate all pairwise ratios
    for col1, col2 in combinations(X.columns, 2):
        # Avoid division by zero
        X_new[f'{col1}_div_{col2}'] = X[col1] / (X[col2] + 1e-5)
        X_new[f'{col2}_div_{col1}'] = X[col2] / (X[col1] + 1e-5)
    
    return X_new

# Test
X_train_ratios = create_ratio_features(X_train)
X_test_ratios = create_ratio_features(X_test)

print(f"Original: {X_train.shape[1]} features")
print(f"With ratios: {X_train_ratios.shape[1]} features")
print(f"\nAdded {X_train_ratios.shape[1] - X_train.shape[1]} ratio features")
print(f"\nSample ratio features:")
print(list(X_train_ratios.columns[-10:]))

# Train and evaluate
model = Ridge(alpha=1.0)
model.fit(X_train_ratios, y_train)
r2 = r2_score(y_test, model.predict(X_test_ratios))

print(f"\nR² with ratio features: {r2:.3f}")
print(f"Baseline R²: {r2_baseline:.3f}")
print(f"Change in R²: {(r2 - r2_baseline):+.3f}")

### Exercise 3: Feature Selection Importance

Compare automated features WITH and WITHOUT feature selection on a small dataset.

In [None]:
# Exercise 3: Demonstrate overfitting without selection

# Create small dataset (overfitting risk!)
X_small = X_raw.sample(100, random_state=42)
y_small = y.loc[X_small.index]

X_train_small, X_test_small, y_train_small, y_test_small = train_test_split(
    X_small, y_small, test_size=0.3, random_state=42
)

print(f"Small dataset: {len(X_train_small)} training samples\n")

# TODO:
# 1. Create polynomial features (degree 2)
# 2. Train WITHOUT feature selection
# 3. Train WITH feature selection (select top 10)
# 4. Compare train vs test R² (check overfitting!)

# Your code here:


In [None]:
# Solution to Exercise 3

# Create polynomial features
poly = PolynomialFeatures(degree=2, include_bias=False)
X_train_poly_small = poly.fit_transform(X_train_small)
X_test_poly_small = poly.transform(X_test_small)

print(f"Features: {X_train_poly_small.shape[1]}")
print(f"Samples: {X_train_poly_small.shape[0]}")
print(f"Features > Samples? {X_train_poly_small.shape[1] > X_train_poly_small.shape[0]}\n")

# 1. WITHOUT feature selection
model_no_sel = Ridge(alpha=1.0)
model_no_sel.fit(X_train_poly_small, y_train_small)

train_r2_no_sel = r2_score(y_train_small, model_no_sel.predict(X_train_poly_small))
test_r2_no_sel = r2_score(y_test_small, model_no_sel.predict(X_test_poly_small))

# 2. WITH feature selection
selector = SelectKBest(score_func=f_regression, k=10)
X_train_selected_small = selector.fit_transform(X_train_poly_small, y_train_small)
X_test_selected_small = selector.transform(X_test_poly_small)

model_with_sel = Ridge(alpha=1.0)
model_with_sel.fit(X_train_selected_small, y_train_small)

train_r2_with_sel = r2_score(y_train_small, model_with_sel.predict(X_train_selected_small))
test_r2_with_sel = r2_score(y_test_small, model_with_sel.predict(X_test_selected_small))

# Compare
print("Performance Comparison (Small Dataset):")
print("="*70)
print(f"{'Method':<34} {'Train R²':<12} {'Test R²':<12} {'Overfit'}")
print("-"*70)
print(f"{'Without Selection (all features)':<34} {train_r2_no_sel:<12.3f} {test_r2_no_sel:<12.3f} {train_r2_no_sel - test_r2_no_sel:.3f}")
print(f"{'With Selection (top 10)':<34} {train_r2_with_sel:<12.3f} {test_r2_with_sel:<12.3f} {train_r2_with_sel - test_r2_with_sel:.3f}")
print("="*70)

print("\nKey observation:")
print("Compare the train-test gaps: a smaller gap means less overfitting.")
print("Selection matters most with small datasets and many features!")

## 11. Summary

### Key Takeaways

1. **Automated feature engineering can discover patterns humans miss**
   - Polynomial features find interactions and non-linearities
   - Custom transformations apply domain knowledge at scale
   - Can dramatically improve model performance

2. **Automation creates many features - need feature selection!**
   - More features ≠ better performance
   - Without selection → overfitting risk
   - Select top k features or use regularization

3. **When automation helps**:
   - Complex non-linear relationships
   - Feature interactions important
   - Sufficient data (>1000 samples)
   - Exploratory phase

4. **When automation hurts**:
   - Simple linear relationships
   - Small datasets
   - Need interpretability
   - No feature selection applied

5. **Best approach: Hybrid strategy**
   - Start with manual domain features
   - Add automated features for exploration
   - Apply aggressive feature selection
   - Compare with baseline

### Common Automated Techniques

**Mathematical Transformations**:
- Polynomial (x², x³)
- Square root, log, exponential
- Trigonometric (sin, cos)

**Interaction Features**:
- Multiplication (x₁ × x₂)
- Division (x₁ / x₂)
- Min/Max combinations

**Aggregate Features**:
- Sum, mean, std across features
- Count of values above/below threshold
- Percentiles
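
Several techniques in these lists (exponential, trigonometric, min/max combinations, threshold counts, percentiles) are not demonstrated earlier in the notebook. The following is a small sketch of how each might look in pandas; the column names, the threshold of 5, and the division by 10 inside `exp` are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
X_demo = pd.DataFrame(rng.uniform(1, 10, size=(5, 3)), columns=["f1", "f2", "f3"])

features = pd.DataFrame(index=X_demo.index)

# Mathematical transformations
features["f1_exp"] = np.exp(X_demo["f1"] / 10)  # input scaled down to keep values small
features["f1_sin"] = np.sin(X_demo["f1"])

# Interaction: larger of two columns, row by row
features["f1_f2_max"] = X_demo[["f1", "f2"]].max(axis=1)

# Aggregates across each row
features["count_above_5"] = (X_demo > 5).sum(axis=1)  # how many columns exceed the threshold
features["row_p75"] = X_demo.quantile(0.75, axis=1)   # 75th percentile of the row's values

print(features.round(3))
```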

### Tools and Libraries

**Scikit-learn**:
- `PolynomialFeatures`: Automatic interactions
- `FunctionTransformer`: Custom transformations

**Advanced (not covered in detail)**:
- `featuretools`: Deep Feature Synthesis
- `tsfresh`: Time-series features
- `category_encoders`: Advanced categorical encoding
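
The core idea behind Deep Feature Synthesis is to aggregate rows of a related child table onto each row of a parent table. The sketch below illustrates that concept with plain pandas on two made-up tables; it is not the `featuretools` API, and the table and column names are hypothetical.

```python
import pandas as pd

# Parent table: one row per customer
customers = pd.DataFrame({"customer_id": [1, 2, 3]})

# Child table: many transactions per customer
transactions = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 2, 3],
    "amount": [10.0, 30.0, 5.0, 15.0, 25.0, 40.0],
})

# Apply several aggregations to the child rows of each parent
agg = (
    transactions.groupby("customer_id")["amount"]
    .agg(["sum", "mean", "max", "count"])
    .add_prefix("amount_")
    .reset_index()
)

# Attach the aggregated features to the parent table
customer_features = customers.merge(agg, on="customer_id", how="left")
print(customer_features)
```

A library such as `featuretools` automates this across many tables and aggregation functions, and can stack the results to greater depth.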

### What's Next?

**Module 11**: Final Project Pipeline - Build a complete end-to-end feature engineering pipeline combining all techniques

### Additional Resources

- [Featuretools Documentation](https://docs.featuretools.com/)
- [Feature Engineering for Machine Learning](https://www.oreilly.com/library/view/feature-engineering-for/9781491953235/)
- [Sklearn Preprocessing](https://scikit-learn.org/stable/modules/preprocessing.html)

---

**Congratulations!** You've completed Module 10. You now understand:
- How to automate feature generation with polynomial features
- How to create custom automated transformation pipelines
- Why feature selection is critical after automation
- When automation helps vs when it hurts
- How to combine manual and automated approaches

Ready for the final challenge? Let's move to **Module 11: Final Project Pipeline** to put it all together!