# Lesson 2: Hyperparameter Tuning

**Module 4: Model Development & Optimization**  
**Estimated Time**: 2 hours  
**Difficulty**: Intermediate

---

## 🎯 Learning Objectives

By the end of this lesson, you will:

✅ Understand the difference between parameters and hyperparameters  
✅ Know why Grid Search is often a poor choice  
✅ Implement efficient tuning with **Optuna** (Bayesian Optimization)  
✅ Answer interview questions on optimization strategies  

---

## 📚 Table of Contents

1. [Parameters vs Hyperparameters](#1-parameters-vs-hyperparameters)
2. [Search Strategies: Grid vs Random vs Bayesian](#2-search-strategies)
3. [Hands-On: Tuning XGBoost with Optuna](#3-hands-on-tuning-xgboost-with-optuna)
4. [Interview Preparation](#4-interview-preparation)

---

## 1. Parameters vs Hyperparameters

**Parameters**:
- **Internal** values that the model learns from the data during training.
- Example: coefficients in Linear Regression, weights in a Neural Network.
- You do **NOT** set these manually.

**Hyperparameters**:
- External configuration set **before** training.
- Example: `Learning Rate`, `Tree Depth`, `Number of Layers`.
- You **MUST** choose these yourself, either from experience or through a search.
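
To make the distinction concrete, here is a minimal sketch in plain Python. It fits a one-variable linear regression with gradient descent on a made-up dataset. The data, the function name `train`, and the chosen values are illustrative assumptions: `learning_rate` and `n_epochs` are hyperparameters fixed before training, while `w` and `b` are parameters the loop learns.

```python
# Toy 1-D linear regression trained with gradient descent (pure Python).
# The made-up data follow y = 2x + 1 exactly, so the ideal parameters are known.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

def train(learning_rate, n_epochs):
    """learning_rate and n_epochs are hyperparameters: chosen before training."""
    w, b = 0.0, 0.0  # parameters: updated by the training loop, never set by hand
    n = len(xs)
    for _ in range(n_epochs):
        # Gradients of the mean squared error with respect to w and b
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

w, b = train(learning_rate=0.05, n_epochs=2000)
print(f"learned parameters: w={w:.3f}, b={b:.3f}")
```

Changing `learning_rate` or `n_epochs` changes how well (or whether) training converges, but the values of `w` and `b` always come out of the training loop itself.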

## 2. Search Strategies

### 1. Grid Search
- Try ALL combinations.
- `LR = [0.1, 0.01], Depth = [3, 5]` -> 4 runs.
- **Pros**: Guaranteed to find the best in the grid.
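- **Cons**: Cost grows exponentially. With $N$ candidate values for each of $k$ hyperparameters, the grid has $N^k$ runs, which quickly becomes impractical beyond a handful of hyperparameters.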
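
The grid above can be enumerated with `itertools.product`. This is a small sketch of the counting argument only; the helper `grid_size` and the "5 values each" scenario are illustrative assumptions.

```python
from itertools import product

# The grid from the example above: 2 values x 2 values
grid = {
    "lr": [0.1, 0.01],
    "depth": [3, 5],
}

# Every combination of one value per hyperparameter
combos = [dict(zip(grid, values)) for values in product(*grid.values())]
for combo in combos:
    print(combo)
print(len(combos), "runs")

def grid_size(n_values, n_params):
    """Number of runs for n_values candidates on each of n_params hyperparameters."""
    return n_values ** n_params

# How the run count grows as more hyperparameters are added
for k in (2, 4, 6):
    print(f"{k} hyperparameters x 5 values each: {grid_size(5, k)} runs")
```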

### 2. Random Search
- Sample combinations at random from the search space.
- **Pros**: Surprisingly effective. Explores continuous ranges more efficiently than a fixed grid.
- **Cons**: Uninformed. Each trial is drawn independently, so it doesn't learn from past failures.
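
A minimal random search might look like the sketch below. The search ranges and the `toy_score` function are assumptions standing in for a real train-and-validate step; only the sampling pattern is the point.

```python
import math
import random

def sample_config(rng):
    """Draw one configuration; the ranges here are illustrative assumptions."""
    return {
        # Log-uniform: equal chance per order of magnitude between 1e-4 and 1
        "lr": 10 ** rng.uniform(-4, 0),
        "depth": rng.randint(1, 9),  # inclusive on both ends
    }

def toy_score(config):
    """Hypothetical stand-in for validation accuracy; peaks at lr=0.01, depth=5."""
    return -(math.log10(config["lr"]) + 2) ** 2 - 0.1 * (config["depth"] - 5) ** 2

def random_search(n_trials, seed=0):
    """Evaluate n_trials independent random configurations and keep the best."""
    rng = random.Random(seed)
    trials = [sample_config(rng) for _ in range(n_trials)]
    return max(trials, key=toy_score)

best = random_search(n_trials=50)
print(best, toy_score(best))
```

Note that no trial uses information from any other trial, which is exactly the weakness listed under **Cons**.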

### 3. Bayesian Optimization (Optuna)
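- **Smart Search**: each new trial is chosen using the results of earlier ones.
- Builds a probabilistic model (a surrogate) of the objective function from past trials. Optuna's default sampler does this with a Tree-structured Parzen Estimator (TPE).
- "I tried LR=0.01 and it was bad. I won't try LR=0.009 next. I'll try 0.1."
- **Pros**: Sample-efficient. It usually finds better results in fewer runs than grid or random search, which matters most when each training run is expensive.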
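
The idea of learning from past trials can be sketched with a heavily simplified, TPE-flavoured search over a single hyperparameter. This is not Optuna's implementation: the objective `toy_loss`, the fixed bandwidth, the 25% "good" fraction, and all function names are assumptions chosen for illustration, and the prior that real TPE uses is omitted. Past trials are split into good and bad groups, candidates are sampled near the good ones, and the candidate with the highest good-to-bad density ratio is tried next.

```python
import math
import random

def toy_loss(log_lr):
    """Hypothetical objective to minimize; best at log10(lr) = -1 (lr = 0.1)."""
    return (log_lr + 1.0) ** 2

def density(x, points, bandwidth=0.3):
    """Gaussian kernel density estimate at x built from the given points."""
    if not points:
        return 0.0
    total = sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points)
    return total / (len(points) * bandwidth * math.sqrt(2 * math.pi))

def suggest(history, rng, low=-4.0, high=0.0, n_candidates=24, gamma=0.25):
    """Pick the next log10(lr) using the good/bad density ratio."""
    ranked = sorted(history, key=lambda t: t[1])  # lowest loss first
    n_good = max(1, int(gamma * len(ranked)))
    good = [x for x, _ in ranked[:n_good]]
    bad = [x for x, _ in ranked[n_good:]]
    # Sample candidates from the "good" density, clipped to the search range
    candidates = [
        min(high, max(low, rng.gauss(rng.choice(good), 0.3)))
        for _ in range(n_candidates)
    ]
    # Prefer candidates that resemble good trials and differ from bad ones
    return max(candidates, key=lambda x: density(x, good) / (density(x, bad) + 1e-12))

def tpe_like_search(n_trials=30, n_startup=5, seed=0):
    rng = random.Random(seed)
    history = []  # (log_lr, loss) pairs
    for i in range(n_trials):
        if i < n_startup:
            x = rng.uniform(-4.0, 0.0)  # random warm-up trials
        else:
            x = suggest(history, rng)   # informed by all earlier trials
        history.append((x, toy_loss(x)))
    return history

history = tpe_like_search()
best_x, best_loss = min(history, key=lambda t: t[1])
print(f"best lr ~ {10 ** best_x:.4f}, loss = {best_loss:.5f}")
```

Unlike random search, every call to `suggest` reads the full `history`, so regions that already produced poor losses are sampled less often.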

## 3. Hands-On: Tuning XGBoost with Optuna

Note: Requires `pip install optuna xgboost scikit-learn`.

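```python
import optuna
import xgboost as xgb
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# 1. Load data and hold out a test set that the tuner never sees
data = load_breast_cancer()
X_trainval, X_test, y_trainval, y_test = train_test_split(
    data.data, data.target, test_size=0.25, stratify=data.target, random_state=42
)
# Split the remainder into training and validation sets used during tuning
X_train, X_valid, y_train, y_valid = train_test_split(
    X_trainval, y_trainval, test_size=0.25, stratify=y_trainval, random_state=42
)

# 2. Define the objective function
def objective(trial):
    # Suggest hyperparameters (names follow the XGBClassifier constructor)
    params = {
        'objective': 'binary:logistic',
        'eval_metric': 'logloss',
        'booster': trial.suggest_categorical('booster', ['gbtree', 'dart']),
        'reg_lambda': trial.suggest_float('reg_lambda', 1e-8, 1.0, log=True),  # L2 penalty
        'reg_alpha': trial.suggest_float('reg_alpha', 1e-8, 1.0, log=True),    # L1 penalty
        'max_depth': trial.suggest_int('max_depth', 1, 9),
        'learning_rate': trial.suggest_float('learning_rate', 0.01, 1.0, log=True),
    }

    # Train the model on the training split only
    model = xgb.XGBClassifier(**params)
    model.fit(X_train, y_train)

    # Score on the validation split, not the test set, to avoid leakage
    preds = model.predict(X_valid)
    return accuracy_score(y_valid, preds)

# 3. Run the optimization (seeded sampler for reproducible suggestions)
study = optuna.create_study(
    direction="maximize", sampler=optuna.samplers.TPESampler(seed=42)
)
study.optimize(objective, n_trials=10)  # Run 10 experiments

print(f"Best Validation Accuracy: {study.best_value}")
print(f"Best Params: {study.best_params}")

# 4. Refit with the best hyperparameters and report accuracy on the untouched test set
best_model = xgb.XGBClassifier(
    objective='binary:logistic', eval_metric='logloss', **study.best_params
)
best_model.fit(X_trainval, y_trainval)
print(f"Test Accuracy: {accuracy_score(y_test, best_model.predict(X_test))}")
```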

## 4. Interview Preparation

### Common Questions

#### Q1: "Why is Random Search often better than Grid Search?"
**Answer**: "In high dimensions, not all parameters are equally important. Grid search wastes time checking all combinations of unimportant parameters. Random search explores the space of important parameters more densely for the same computational budget."
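
A quick way to see this answer in numbers is the sketch below. It assumes, purely for illustration, that only `lr` affects the result and `depth` is irrelevant, then counts how many distinct `lr` values each strategy tests with the same budget of 9 runs. The specific values and ranges are made up.

```python
import random

budget = 9  # same number of training runs for both strategies

# Grid: 3 values for each of 2 hyperparameters gives 3 x 3 = 9 runs
lr_values = [0.001, 0.01, 0.1]
depth_values = [3, 5, 7]
grid_trials = [(lr, depth) for lr in lr_values for depth in depth_values]

# Random: 9 independent draws, log-uniform for lr and uniform integers for depth
rng = random.Random(0)
random_trials = [
    (10 ** rng.uniform(-3, -1), rng.randint(3, 7)) for _ in range(budget)
]

# If only lr matters, what counts is how many distinct lr values were tried
grid_distinct_lr = len({lr for lr, _ in grid_trials})
random_distinct_lr = len({lr for lr, _ in random_trials})
print("distinct lr values, grid:  ", grid_distinct_lr)
print("distinct lr values, random:", random_distinct_lr)
```

The grid spends its 9 runs on only 3 learning rates, repeating each one for every `depth`, while random search probes a different learning rate on every run.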

#### Q2: "How does Bayesian Optimization work conceptually?"
**Answer**: "It builds a surrogate model to approximate the relationship between hyperparameters and model performance. The classic choice is a Gaussian Process; Optuna's default sampler uses a Tree-structured Parzen Estimator (TPE) instead. It then uses an acquisition function to decide where to sample next, balancing **Exploration** (uncertain, under-sampled areas) and **Exploitation** (areas that already look promising)."
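
One common acquisition function is Expected Improvement (EI). The sketch below assumes a surrogate that returns a Gaussian prediction (mean `mu`, standard deviation `sigma`) at each candidate point, and uses the standard closed form for a maximization problem. The candidate numbers are made up to show the exploration versus exploitation trade-off.

```python
import math

def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def normal_cdf(z):
    """Standard normal cumulative distribution."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def expected_improvement(mu, sigma, best_so_far):
    """EI for maximization, given the surrogate's mean and std at one point."""
    if sigma == 0:
        return max(mu - best_so_far, 0.0)
    z = (mu - best_so_far) / sigma
    return (mu - best_so_far) * normal_cdf(z) + sigma * normal_pdf(z)

best = 0.90  # best validation accuracy seen so far (made-up number)
candidates = {
    "near the current best (exploit)": (0.91, 0.01),  # (mu, sigma)
    "uncertain region (explore)": (0.85, 0.10),
    "known to be poor": (0.70, 0.01),
}
for name, (mu, sigma) in candidates.items():
    print(f"{name}: EI = {expected_improvement(mu, sigma, best):.4f}")
```

In this made-up example the uncertain region scores highest even though its predicted mean is below the current best: the large `sigma` leaves room for a big improvement. The point that is confidently predicted to be poor scores essentially zero.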