# Day 3: Advanced Hyperparameter Tuning with Bayesian Optimization

---

## Introduction to Hyperparameter Tuning

Hyperparameters are configuration values that are set before training rather than learned from the data, such as the learning rate, the number of layers, or the maximum tree depth. Hyperparameter tuning is the process of finding the combination of these values that gives the best model performance.

---

## What is Bayesian Optimization?

Bayesian Optimization is an efficient technique for optimizing objective functions that are:
- Expensive to evaluate
- Lacking a known closed-form expression
- Potentially noisy or non-convex

It fits a probabilistic surrogate model to the evaluations seen so far and uses an acquisition function to decide where to sample next.

---

## Key Components of Bayesian Optimization

1. *Surrogate Model*:
   - A probabilistic model such as a Gaussian Process (GP)
   - Approximates the true objective function

2. *Acquisition Function*:
   - Guides the optimization by balancing exploration and exploitation
   - Examples: Expected Improvement (EI), Probability of Improvement (PI), Upper Confidence Bound (UCB)

3. *Iterative Refinement*:
   - After each new set of hyperparameters is evaluated, the surrogate model is updated with the result
   - The surrogate's approximation of the objective improves as evaluations accumulate
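
The three components above can be sketched as a short loop. This is a minimal illustration, not how Hyperopt or Optuna work internally: the RBF kernel, its length scale, the three starting points, the candidate grid, and the 20 iterations are all assumptions chosen for the example. The objective is the quadratic used later in this document.

```python
import math
import numpy as np

def objective(x):
    """Toy objective with its minimum at x = 3."""
    return (x - 3.0) ** 2 + 1.0

def rbf_kernel(a, b, length_scale=2.0):
    """Squared-exponential kernel between two 1-D arrays (assumed length scale)."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)

def gp_posterior(x_obs, y_obs, x_cand, jitter=1e-6):
    """Surrogate model: posterior mean and std of a zero-mean, unit-variance GP."""
    K = rbf_kernel(x_obs, x_obs) + jitter * np.eye(len(x_obs))
    K_s = rbf_kernel(x_obs, x_cand)                 # shape (n_obs, n_cand)
    mu = K_s.T @ np.linalg.solve(K, y_obs)
    var = 1.0 - np.sum(K_s * np.linalg.solve(K, K_s), axis=0)
    return mu, np.sqrt(np.clip(var, 1e-12, None))

def expected_improvement(mu, sigma, best):
    """Acquisition function: EI for minimization, E[max(best - f(x), 0)]."""
    z = (best - mu) / sigma
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return (best - mu) * cdf + sigma * pdf

candidates = np.linspace(-10.0, 10.0, 401)          # discretized search space
x_obs = np.array([-8.0, 0.0, 8.0])                  # arbitrary initial design
y_obs = objective(x_obs)

for _ in range(20):
    # Standardize targets so the unit-variance GP prior is a reasonable fit
    y_norm = (y_obs - y_obs.mean()) / y_obs.std()
    mu, sigma = gp_posterior(x_obs, y_norm, candidates)
    ei = expected_improvement(mu, sigma, y_norm.min())
    x_next = candidates[np.argmax(ei)]              # where to sample next
    # Iterative refinement: evaluate the point and add it to the observations
    x_obs = np.append(x_obs, x_next)
    y_obs = np.append(y_obs, objective(x_next))

best_x = x_obs[np.argmin(y_obs)]
best_y = y_obs.min()
print(f"best x = {best_x:.2f}, f(x) = {best_y:.3f}")
```

Each pass refits the surrogate on all observations, scores every candidate with the acquisition function, and spends exactly one real evaluation on the most promising candidate.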

---

## Why Use Bayesian Optimization?

- Typically more sample-efficient than grid or random search, so fewer evaluations are needed
- Works best in low- to moderate-dimensional search spaces; Gaussian Process surrogates tend to degrade as the number of dimensions grows
- Especially effective when evaluations are expensive or time-consuming

---

## Exploration vs. Exploitation

| Strategy       | Description                                     | Purpose                          |
|----------------|-------------------------------------------------|----------------------------------|
| Exploration    | Samples from less-known or unvisited regions    | Finds new potentially good areas |
| Exploitation   | Samples near previously known good results      | Fine-tunes existing good areas   |

Bayesian Optimization balances both using the acquisition function.
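
A small sketch of how an acquisition function trades the two off, using a confidence-bound rule (the lower confidence bound, which is the minimization counterpart of UCB). The candidate names, the means and standard deviations, and the `kappa` values are made up for illustration; in practice they would come from the surrogate model.

```python
# Hypothetical surrogate predictions at three candidate points (made-up numbers)
candidates = {
    "near_best": {"mean": 1.0, "std": 0.1},   # well explored, good predicted value
    "nearby":    {"mean": 1.2, "std": 0.5},
    "unvisited": {"mean": 3.0, "std": 2.5},   # poor prediction, but very uncertain
}

def lower_confidence_bound(mean, std, kappa):
    """Optimistic estimate for minimization: lower is more attractive."""
    return mean - kappa * std

def pick(kappa):
    """Return the candidate with the smallest lower confidence bound."""
    return min(
        candidates,
        key=lambda name: lower_confidence_bound(
            candidates[name]["mean"], candidates[name]["std"], kappa
        ),
    )

print(pick(kappa=0.1))  # near_best  (small kappa: exploitation)
print(pick(kappa=2.0))  # unvisited  (large kappa: exploration)
```

A small `kappa` trusts the predicted mean and stays near known good results, while a large `kappa` rewards uncertainty and pulls the search toward unvisited regions.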

---

## Unified Example: Minimizing a Simple Objective Function

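We will demonstrate Bayesian Optimization using two libraries: *Hyperopt* and *Optuna*. Both examples use the Tree-structured Parzen Estimator (TPE), selected explicitly with `tpe.suggest` in Hyperopt and used as the default sampler in Optuna. TPE is a sequential model-based method that builds its surrogate from density estimates rather than a Gaussian Process. The goal is to minimize a simple quadratic function: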

*Objective function:*  
\[
f(x) = (x - 3)^2 + 1
\]

### Using Hyperopt

```python
from hyperopt import fmin, tpe, hp, Trials

# Define the objective function
def objective(params):
    x = params['x']
    return (x - 3)**2 + 1

# Define the search space
space = {'x': hp.uniform('x', -10, 10)}

# Run Bayesian Optimization using Hyperopt
trials = Trials()
best = fmin(
    fn=objective,
    space=space,
    algo=tpe.suggest,
    max_evals=50,
    trials=trials
)

print("Best parameters from Hyperopt:", best)
```

### Using Optuna

```python
import optuna

# Define the objective function
def objective(trial):
    x = trial.suggest_float('x', -10, 10)
    return (x - 3)**2 + 1

# Run Bayesian Optimization using Optuna
study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=50)

print("Best parameters from Optuna:", study.best_params)
```

```python
import optuna

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV, RandomizedSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

# Load the dataset
data = load_breast_cancer()
x, y = data.data, data.target

# Split into training and test sets
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=7)

# Standardize features: fit the scaler on the training split only,
# then reuse those statistics on the test split to avoid leakage
scaler = StandardScaler()
x_train = scaler.fit_transform(x_train)
x_test = scaler.transform(x_test)

print(f"Training Dataset: {x_train.shape}")
print(f"Test Dataset: {x_test.shape}")
```

```
Training Dataset: (398, 30)
Test Dataset: (171, 30)
```
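
Scaling statistics are normally learned from the training split and then reused on the test split; refitting on the test split hides any shift between the two. The sketch below shows the difference on a single feature with made-up values, using the population standard deviation as `StandardScaler` does. The helper `standardize` is hypothetical and exists only for this illustration.

```python
from statistics import mean, pstdev

train = [10.0, 12.0, 14.0]   # made-up feature values
test = [20.0, 22.0, 24.0]    # same feature, shifted upward in the test split

def standardize(values, center, scale):
    """Apply (v - center) / scale to every value."""
    return [(v - center) / scale for v in values]

# Reuse the training statistics on the test split
mu, sd = mean(train), pstdev(train)
train_scaled = standardize(train, mu, sd)
test_ok = standardize(test, mu, sd)

# Refit on the test split: the shift between the splits disappears
test_leaky = standardize(test, mean(test), pstdev(test))

print([round(v, 3) for v in test_ok])
print([round(v, 3) for v in test_leaky])
```

With the training statistics, the shifted test values stay far from zero, so the model sees that they differ from the training data. After refitting, they become indistinguishable from the scaled training values.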


```python
# Train a baseline XGBoost model with default hyperparameters
baseline_model = XGBClassifier(eval_metric='logloss', random_state=7)
baseline_model.fit(x_train, y_train)

# Evaluate on the held-out test set
baseline_pred = baseline_model.predict(x_test)
baseline_accuracy = accuracy_score(y_test, baseline_pred)

print(f"XGBoost Accuracy: {baseline_accuracy:.4f}")
```

```
XGBoost Accuracy: 0.9591
```


```python
# Ratio of negative to positive samples, used to offset class imbalance.
# It does not depend on the trial, so it is computed once.
pos = (y_train == 1).sum()
neg = (y_train == 0).sum()
scale_pos_weight = neg / pos

def objective(trial):
    # Search space sampled by Optuna on each trial
    params = {
        'n_estimators': trial.suggest_int('n_estimators', 50, 500),
        'max_depth': trial.suggest_int('max_depth', 3, 100),
        'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.3),
        'subsample': trial.suggest_float('subsample', 0.6, 1.0),
        'colsample_bytree': trial.suggest_float('colsample_bytree', 0.6, 1.0),
        'gamma': trial.suggest_float('gamma', 0, 5),
        'reg_alpha': trial.suggest_float('reg_alpha', 0, 10),
        'reg_lambda': trial.suggest_float('reg_lambda', 0, 10),
        'scale_pos_weight': scale_pos_weight
    }

    # use_label_encoder is deprecated in recent XGBoost releases, so it is omitted
    model = XGBClassifier(eval_metric='logloss', random_state=7, **params)

    # Mean accuracy over 5-fold cross-validation on the training set
    score = cross_val_score(model, x_train, y_train, cv=5, scoring='accuracy').mean()
    return score

# Create and optimize the study
study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=100)

# Best results
print("Best Hyperparameters: ", study.best_params)
print("Best Accuracy: ", study.best_value)
```

```
[I 2025-05-29 22:22:54,600] A new study created in memory with name: no-name-b0208057-cb3c-450b-8232-5c4f9a7acc5e
[I 2025-05-29 22:22:55,911] Trial 0 finished with value: 0.9397151898734177 and parameters: {'n_estimators': 438, 'max_depth': 89, 'learning_rate': 0.29849837507030175, 'subsample': 0.707739710080838, 'colsample_bytree': 0.7420400448003868, 'gamma': 1.189939368070974, 'reg_alpha': 7.6858627672766975, 'reg_lambda': 4.675515109036943}. Best is trial 0 with value: 0.9397151898734177.
[I 2025-05-29 22:22:56,973] Trial 1 finished with value: 0.9296835443037974 and parameters: {'n_estimators': 415, 'max_depth': 92, 'learning_rate': 0.1739999041224506, 'subsample': 0.9172564486773785, 'colsample_bytree': 0.6118839047616612, 'gamma': 2.8909186578955643, 'reg_alpha': 6.223521447731558, 'reg_lambda': 1.6193277132824313}. Best is trial 0 with value: 0.9397151898734177.
[I 2025-05-29 22:22:57,902] Trial 2 finished with value: 0.9422151898734177 and parameters: {'n_estimators': 290, 'ma

Best Hyperparameters:  {'n_estimators': 230, 'max_depth': 6, 'learning_rate': 0.18980284512611548, 'subsample': 0.6396626578705434, 'colsample_bytree': 0.8953884937679862, 'gamma': 0.07112505010552457, 'reg_alpha': 1.5964442119016944, 'reg_lambda': 0.7739122436890957}
Best Accuracy:  0.9623417721518986
```


```python
# Define the parameter grid
param_grid = {
    'n_estimators': [100, 200, 300],
    'max_depth': [3, 5, 7],
    'learning_rate': [0.01, 0.1, 0.2],
    'subsample': [0.6, 0.8, 1.0]
}

# Train XGBoost with grid search (every combination is evaluated)
grid_search = GridSearchCV(
    estimator=XGBClassifier(eval_metric='logloss', random_state=7),
    param_grid=param_grid,
    scoring='accuracy',
    cv=3,
    verbose=1
)

grid_search.fit(x_train, y_train)

# Best parameters and accuracy
print(f"Grid Search Best Parameters: {grid_search.best_params_}")
print(f"Grid Search Best Score: {grid_search.best_score_}")
```

```
Fitting 3 folds for each of 81 candidates, totalling 243 fits
Grid Search Best Parameters: {'learning_rate': 0.01, 'max_depth': 5, 'n_estimators': 200, 'subsample': 0.6}
Grid Search Best Score: 0.9548492443229285
```
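
The count of 81 candidates and 243 fits follows directly from the grid: four hyperparameters with three values each, evaluated on three folds. A quick check of that arithmetic, with the variable name `grid` chosen only for this sketch:

```python
from itertools import product

# Same values as the grid searched above
grid = {
    'n_estimators': [100, 200, 300],
    'max_depth': [3, 5, 7],
    'learning_rate': [0.01, 0.1, 0.2],
    'subsample': [0.6, 0.8, 1.0],
}
cv_folds = 3

# Every combination of values is one candidate; each candidate is fit once per fold
n_candidates = len(list(product(*grid.values())))
total_fits = n_candidates * cv_folds

print(n_candidates, total_fits)  # 81 243
```

Because the candidate count is the product of the list lengths, each extra hyperparameter or extra value multiplies the cost of a grid search.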


```python
# Define the parameter distributions to sample from
param_dist = {
    'n_estimators': [100, 200, 300, 400],
    'max_depth': [3, 5, 7, 9],
    'learning_rate': [0.01, 0.05, 0.1, 0.2],
    'subsample': [0.6, 0.7, 0.8, 0.9, 1.0],
    'colsample_bytree': [0.6, 0.7, 0.8, 0.9, 1.0]
}

# Train XGBoost with random search (50 sampled combinations)
random_search = RandomizedSearchCV(
    estimator=XGBClassifier(eval_metric='logloss', random_state=7),
    param_distributions=param_dist,
    n_iter=50,
    scoring='accuracy',
    cv=3,
    verbose=1,
    random_state=7
)

random_search.fit(x_train, y_train)

# Best parameters and accuracy
print(f"Random Search Best Parameters: {random_search.best_params_}")
print(f"Random Search Best Score: {random_search.best_score_}")
```

```
Fitting 3 folds for each of 50 candidates, totalling 150 fits
Random Search Best Parameters: {'subsample': 0.6, 'n_estimators': 200, 'max_depth': 5, 'learning_rate': 0.1, 'colsample_bytree': 0.8}
Random Search Best Score: 0.9623680413154098
```
