### Day 24 — Advanced Hyperparameter Tuning (Bayesian Optimization & Optuna)

This notebook documents **Day 24** of my Machine Learning journey.
The focus of this day is on **advanced hyperparameter tuning strategies**, moving
beyond classical Grid and Random Search toward **Sequential / Bayesian Optimization**
using **Optuna**.

This day builds directly on concepts of **generalization**, **bias–variance tradeoff**,
and **cross-validation** learned in previous days.

---



## 1. Recap: Generalization & Cross Validation

### What is Generalization?
- Generalization refers to a model’s ability to perform well on **unseen (test) data**
- A model that performs well only on training data is **memorizing**, not learning

### Memorization vs Generalization
- Memorization → Overfitting
- Generalization → Balanced bias and variance

### Role of Cross Validation
- Cross validation helps estimate **true generalization performance**
- Prevents tuning decisions based on a single train–test split
- Helps detect:
  - Overfitting
  - Underfitting
  - High variance models

---

## 2. Overfitting, Underfitting & Bias–Variance Tradeoff

### Overfitting
- Low bias, high variance
- Very complex model
- Large gap between training and validation/test performance

### Underfitting
- High bias, low variance
- Very simple model
- Poor performance on both training and test data

### Bias–Variance Tradeoff
- Increasing model complexity:
  - Decreases bias
  - Increases variance
- Decreasing model complexity:
  - Increases bias
  - Decreases variance
- Goal: **optimal balance for best generalization**
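
The tradeoff can be observed by sweeping one complexity hyperparameter and comparing training and validation scores. The snippet below is a minimal sketch, assuming scikit-learn is available and using a Decision Tree's `max_depth` as the complexity knob; the depth values are arbitrary choices for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import validation_curve
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Illustrative depth values: small = simple model, large = complex model.
depths = [1, 2, 3, 5, 8, 12]

# Each result has shape (n_depths, n_folds).
train_scores, val_scores = validation_curve(
    DecisionTreeClassifier(random_state=0),
    X,
    y,
    param_name="max_depth",
    param_range=depths,
    cv=5,
    scoring="accuracy",
)

train_mean = train_scores.mean(axis=1)
val_mean = val_scores.mean(axis=1)

# A growing train/validation gap at larger depths points to rising variance.
for d, tr, va in zip(depths, train_mean, val_mean):
    print(f"max_depth={d:>2}  train={tr:.3f}  val={va:.3f}  gap={tr - va:.3f}")
```

Low scores on both sides at small depths suggest high bias; a widening gap at large depths suggests high variance.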

---



## 3. Ways to Improve Model Generalization

The notebook discusses three major approaches:

1. **Cross Validation**
   - Reliable estimation of model performance
   - Helps compare models fairly

2. **Regularization**
   - Penalizes overly complex models
   - Examples:
     - L1 (Lasso)
     - L2 (Ridge)
     - Elastic Net

3. **Ensemble Methods**
   - Bagging (mainly reduces variance)
   - Boosting (mainly reduces bias)
   - Combining many models usually improves stability over a single model
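
As a rough illustration of the ensemble point, the sketch below compares a single Decision Tree with a Random Forest (a bagging-style ensemble of trees) under the same 5-fold cross validation. The choice of models and `n_estimators=100` are assumptions for this example, not something the notebook prescribes.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

models = {
    "single tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Store (mean, std) of the validation accuracy for each model.
results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    results[name] = (scores.mean(), scores.std())
    print(f"{name:<14} mean={scores.mean():.3f}  std={scores.std():.3f}")
```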

---



## 4. Cross Validation Workflow (K-Fold)

- Dataset is divided into **K folds**
- For each iteration:
  - K−1 folds → training
  - 1 fold → validation
- Process is repeated K times
- Final performance:
  - Mean of validation scores
  - Variance of validation scores

### Interpretation
- High variance in validation scores across folds → Unstable model that is sensitive to the training split (often a sign of overfitting)
- Low variance + good mean score → Better generalized model
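
The K-Fold loop described above can be written out by hand. This is a minimal sketch assuming scikit-learn, `K = 5`, and a KNN classifier with `n_neighbors=5`; all three are illustrative choices.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)

# Shuffle before splitting so folds do not depend on the row order.
kfold = KFold(n_splits=5, shuffle=True, random_state=42)

fold_scores = []
for fold, (train_idx, val_idx) in enumerate(kfold.split(X), start=1):
    # K-1 folds for training, the remaining fold for validation.
    model = KNeighborsClassifier(n_neighbors=5)
    model.fit(X[train_idx], y[train_idx])
    score = model.score(X[val_idx], y[val_idx])
    fold_scores.append(score)
    print(f"fold {fold}: validation accuracy = {score:.3f}")

fold_scores = np.array(fold_scores)
print(f"mean = {fold_scores.mean():.3f}, std = {fold_scores.std():.3f}")
```

In practice `cross_val_score` wraps this loop in one call; writing it out only makes the train/validation roles of the folds explicit.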

---

## 5. Model Selection after Cross Validation

After CV:
- Compare **mean training score** vs **mean validation score**
- Identify:
  - Underfitted models
  - Overfitted models
  - Well-generalized models
- Final selected model is evaluated **once** on test data
- Test data is never used during tuning
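
A minimal sketch of this selection procedure, assuming scikit-learn and three hypothetical Decision Tree candidates. Note that `cross_validate` reports the validation-fold score under the key `test_score`; the real test set is held out separately and touched only once at the end.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_validate, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Hold out the test set before any tuning or model comparison.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Hypothetical candidates ranging from very simple to unconstrained.
candidates = {
    "depth=1": DecisionTreeClassifier(max_depth=1, random_state=0),
    "depth=4": DecisionTreeClassifier(max_depth=4, random_state=0),
    "depth=None": DecisionTreeClassifier(max_depth=None, random_state=0),
}

summary = {}
for name, model in candidates.items():
    cv = cross_validate(model, X_train, y_train, cv=5, return_train_score=True)
    # "test_score" here is the validation-fold score, not the held-out test set.
    summary[name] = (cv["train_score"].mean(), cv["test_score"].mean())
    print(f"{name:<10} train={summary[name][0]:.3f}  val={summary[name][1]:.3f}")

# Select on mean validation score, then evaluate once on the test set.
best_name = max(summary, key=lambda n: summary[n][1])
final_model = candidates[best_name].fit(X_train, y_train)
test_score = final_model.score(X_test, y_test)
print(f"selected {best_name}, test accuracy = {test_score:.3f}")
```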

---

## 6. Hyperparameters vs Parameters

### Parameters
- Learned during training
- Example:
  - Weights in Linear Regression
  - Tree split thresholds

### Hyperparameters
- Set **before training**
- Control model complexity and learning behavior
- Examples:
  - `k` in KNN
  - `max_depth` in Decision Tree
  - Regularization strength
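
The distinction shows up directly in scikit-learn's API: hyperparameters are constructor arguments readable before fitting, while parameters exist only after `fit`. A small sketch using a Decision Tree (the value `max_depth=3` is arbitrary):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Hyperparameter: chosen by us before training.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
print("max_depth (hyperparameter):", tree.get_params()["max_depth"])

# Parameters: learned from the data during fit.
tree.fit(X, y)

# Leaf nodes carry a negative feature index, so keep internal nodes only.
is_split = tree.tree_.feature >= 0
thresholds = tree.tree_.threshold[is_split]
print("learned split thresholds (parameters):", thresholds)
```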

---

## 7. What is Hyperparameter Tuning?

Hyperparameter tuning is the process of **systematically changing hyperparameters**
to:

- Improve generalization
- Control overfitting and underfitting
- Select the best model configuration for a given dataset

---

## 8. Why Hyperparameter Tuning is Required

Hyperparameter tuning helps in:
1. Finding a generalized model
2. Model selection
3. Indirect feature selection (rare; for example, a strong L1 penalty can drive some coefficients to zero)

Different hyperparameter values result in models with **different capacity and behavior**.

---

## 9. Types of Hyperparameter Tuning

### 1. Manual Search
- Manually try different hyperparameter combinations
- Simple but inefficient
- Not scalable

---

### 2. Grid Search
- Exhaustive search over all combinations
- Hyperparameters are defined as a grid
- Guaranteed to find the best combination within the grid, but nothing outside it
- Computationally expensive
- Number of experiments grows exponentially with the number of hyperparameters
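
A minimal Grid Search sketch with scikit-learn's `GridSearchCV`, assuming a Decision Tree and an illustrative grid. It also shows how quickly the number of fits adds up.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Illustrative grid: 4 * 3 * 3 = 36 combinations.
param_grid = {
    "max_depth": [2, 4, 6, 8],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
}

grid = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid,
    cv=5,
    scoring="accuracy",
)
grid.fit(X, y)

n_candidates = len(grid.cv_results_["params"])
print("combinations evaluated:", n_candidates)
print("model fits during search:", n_candidates * 5)  # one fit per fold
print("best params:", grid.best_params_)
print(f"best mean CV accuracy: {grid.best_score_:.3f}")
```

Adding one more hyperparameter with four values would multiply the 36 combinations by four, which is the exponential growth noted above.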

---

### 3. Random Search
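- Hyperparameters are sampled randomly from ranges or distributions
- Usually more efficient than grid search for the same budget, because every trial tests a new value of each hyperparameter
- Works well when only a few hyperparameters matter
- Still inefficient for very large search spaces, since it does not learn from earlier trials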
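
A minimal Random Search sketch with scikit-learn's `RandomizedSearchCV`. The ranges and the budget of 20 trials are assumptions for illustration; `scipy.stats.randint(low, high)` samples integers with `high` excluded.

```python
from scipy.stats import randint
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Distributions instead of fixed lists; upper bounds are exclusive.
param_distributions = {
    "max_depth": randint(2, 21),
    "min_samples_split": randint(2, 21),
    "min_samples_leaf": randint(1, 11),
}

search = RandomizedSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_distributions,
    n_iter=20,  # fixed budget, independent of the size of the search space
    cv=5,
    scoring="accuracy",
    random_state=42,
)
search.fit(X, y)

print("trials run:", len(search.cv_results_["params"]))
print("best params:", search.best_params_)
print(f"best mean CV accuracy: {search.best_score_:.3f}")
```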

---

## 10. Limitations of Grid & Random Search

- Computationally expensive
- Evaluate many poor configurations, because each trial ignores the results of earlier trials
- Inefficient for:
  - Large datasets
  - Complex models
  - Deep learning models
- Motivation for **Sequential / Bayesian Optimization**

---

## 11. Sequential Search (Bayesian Optimization)

### Core Idea
- Instead of searching blindly, learn from past trials
- Select next hyperparameter configuration **intelligently**

### Key Components
1. Small initial random sample of configurations
2. Evaluate model performance
3. Fit a **surrogate model**
4. Use surrogate to propose better configurations
5. Repeat until convergence or budget exhausted
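
The five steps above can be sketched as a small loop. This is a toy illustration only, with several assumptions: it tunes a single hyperparameter (`n_neighbors` for KNN), uses a Gaussian Process as the surrogate, and picks the next point with an upper-confidence-bound rule. Optuna's default sampler (TPE) works differently, so this is not a reproduction of Optuna's algorithm.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)
rng = np.random.default_rng(0)
candidates = np.arange(1, 51)  # search space for n_neighbors


def evaluate(k):
    """Expensive step: mean 5-fold CV accuracy of KNN with k neighbors."""
    model = KNeighborsClassifier(n_neighbors=int(k))
    return cross_val_score(model, X, y, cv=5).mean()


def to_features(ks):
    """Scale k by the largest candidate and shape it as a column."""
    return (np.asarray(ks, dtype=float) / candidates.max()).reshape(-1, 1)


# Steps 1-2: small random initial sample, evaluated for real.
tried = [int(k) for k in rng.choice(candidates, size=4, replace=False)]
scores = [evaluate(k) for k in tried]

for _ in range(6):  # remaining budget
    # Step 3: fit the surrogate on (hyperparameter, score) pairs seen so far.
    surrogate = GaussianProcessRegressor(
        kernel=Matern(nu=2.5), alpha=1e-4, normalize_y=True, random_state=0
    )
    surrogate.fit(to_features(tried), np.array(scores))

    # Step 4: score untried candidates cheaply and pick the most promising.
    untried = np.array([k for k in candidates if int(k) not in tried])
    mean, std = surrogate.predict(to_features(untried), return_std=True)
    ucb = mean + 2.0 * std  # favor high predicted score or high uncertainty
    next_k = int(untried[np.argmax(ucb)])

    # Step 5: evaluate the proposal for real and repeat.
    tried.append(next_k)
    scores.append(evaluate(next_k))

best_k = tried[int(np.argmax(scores))]
print(f"best n_neighbors = {best_k}, CV accuracy = {max(scores):.3f}")
```

Only 10 of the 50 candidate values are trained and cross-validated; the surrogate decides which ones.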

---

## 12. Surrogate Model Concept

- Surrogate model approximates:
  - Hyperparameters → Performance mapping
- Cheaper to evaluate than training full ML model
- Guides search toward promising regions

---

## 13. Tree-Structured Parzen Estimator (TPE)

- A popular Bayesian Optimization algorithm
- Used by **Optuna**
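- Splits past trials by score and models two densities over hyperparameters:
  - A "good" density fitted to the best-scoring trials
  - A "bad" density fitted to the remaining trials
- Selects new configurations that are likely under the good density and unlikely under the bad one, which corresponds to maximizing expected improvement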
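
For reference, the idea can be written down as follows. This is a sketch of the formulation in the original TPE paper (Bergstra et al., 2011), stated for minimization; the symbols $l$, $g$, $y^*$, and $\gamma$ are that paper's notation, and Optuna's implementation may differ in details. Here $x$ is a hyperparameter configuration, $y$ its score, and $y^*$ a threshold set at a quantile $\gamma$ of the observed scores:

$$
p(x \mid y) =
\begin{cases}
l(x) & \text{if } y < y^* \\
g(x) & \text{if } y \ge y^*
\end{cases}
\qquad \gamma = p(y < y^*)
$$

$$
\mathrm{EI}_{y^*}(x) \;\propto\; \left( \gamma + \frac{g(x)}{l(x)} \, (1 - \gamma) \right)^{-1}
$$

Expected improvement therefore grows as the ratio $l(x) / g(x)$ grows, so TPE proposes the candidate that is most typical of good trials relative to bad ones.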

---

## 14. Why Bayesian Optimization is Efficient

- Typically needs fewer trials than grid or random search to reach a comparable score
- Spends fewer trials on configurations that resemble earlier poor ones
- Suitable for:
  - Large search spaces
  - Expensive models
  - Real-world ML workflows

---

## 15. Introduction to Optuna

Optuna is a modern hyperparameter optimization framework that:
- Implements Bayesian-style sequential optimization (and also ships random, grid, and other samplers)
- Uses TPE by default
- Is fast, flexible, and scalable
- Widely used in industry and research

---



## 16. Optuna Hyperparameter Tuning Workflow

### Step 1: Define Objective Function
- Written in Python
- Includes:
  - Hyperparameter search space
  - Model training
  - Evaluation metric
- Returns a single score to optimize

---

### Step 2: Create a Study
- Study defines:
  - Optimization direction (maximize / minimize)
  - Sampler (TPE by default)

---

### Step 3: Optimize
- Optuna runs multiple trials
- Each trial:
  - Chooses hyperparameters
  - Trains model
  - Evaluates performance
- Best parameters are tracked automatically

---

## 17. Optuna Search Components

- **Search Space** → Hyperparameters & their ranges
- **Sampler** → Strategy to select next configuration
- **Model** → ML algorithm being tuned
- **Metric** → Performance measure
- **Trials** → Number of optimization steps

---



## 18. Models Used in This Notebook

- **K-Nearest Neighbors (KNN)**
  - Hyperparameters:
    - `n_neighbors`
    - `weights`

- **Decision Tree**
  - Hyperparameters:
    - `max_depth`
    - `min_samples_split`
    - `min_samples_leaf`

---



## 19. Dataset Used

- **Breast Cancer Wisconsin Dataset**
- Binary classification problem
- Used to compare tuning effectiveness across models
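
A minimal sketch of loading the dataset, assuming the copy bundled with scikit-learn (`load_breast_cancer`) is the one used. The stratified 80/20 split is an illustrative choice.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X, y = data.data, data.target

print("shape:", X.shape)  # (569, 30): 569 samples, 30 numeric features

# Class 0 is malignant (212 samples), class 1 is benign (357 samples).
class_counts = np.bincount(y)
for name, count in zip(data.target_names, class_counts):
    print(f"{name}: {count}")

# Stratify so both splits keep roughly the same class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
print("train size:", X_train.shape[0], "test size:", X_test.shape[0])
```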

---



## 20. Key Learnings from Day 24

- Grid and Random Search do not scale well to large search spaces or expensive models
- Bayesian Optimization uses the results of past trials to choose the next one, so it typically needs fewer trials
- Hyperparameter tuning is essential for generalization
- Optuna simplifies advanced tuning workflows
- Needing fewer trials means fewer model fits, which reduces computational cost
- Modern ML requires efficient model selection strategies

---
