# Day 11: Advanced Model Tuning and Hyperparameter Optimization

Today, we will take a step beyond basic model building and learn how to improve model performance using hyperparameter tuning and cross-validation techniques. These practices help us build more robust models that generalize better to unseen data.

## Topics Covered
- Introduction to Model Tuning

- Understanding Hyperparameters vs Parameters

- Grid Search and Random Search

- Cross-Validation Techniques

- Applying Cross-Validation with Regularized Models (Ridge, Lasso, ElasticNet)

- Evaluation Metrics Recap: MSE, Accuracy, etc.

## 1. Introduction to Model Tuning

Most machine learning models have hyperparameters: settings that you specify before training the model (e.g., `alpha` in Ridge or Lasso regression).

Tuning these hyperparameters helps:

- Improve model accuracy

- Prevent overfitting

- Optimize performance on unseen data
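A quick way to see what a hyperparameter controls is to fit the same model with different values and compare what it learns. The minimal sketch below assumes a small synthetic dataset (the `true_coef` values, the noise level, and the three `alpha` values are arbitrary illustrative choices) and prints the overall size of Ridge's learned coefficients. A larger `alpha` shrinks them more strongly.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data (illustrative assumption): 20 samples, 5 features, known coefficients plus noise
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 5))
true_coef = np.array([3.0, -2.0, 0.5, 0.0, 1.0])
y_demo = X_demo @ true_coef + rng.normal(scale=0.5, size=20)

coef_norms = {}
for alpha in [0.01, 1, 100]:
    model = Ridge(alpha=alpha).fit(X_demo, y_demo)
    # L2 norm of the learned coefficients: stronger regularization gives a smaller norm
    coef_norms[alpha] = np.linalg.norm(model.coef_)
    print(f"alpha={alpha:>6}: ||coef|| = {coef_norms[alpha]:.3f}")
```

Neither extreme is automatically best. Too small an `alpha` can overfit, and too large an `alpha` can underfit. This is why `alpha` is usually chosen with cross-validation rather than fixed by hand.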

## 2. Parameters vs Hyperparameters

| Term                | Definition                                  | Example                                        |
| ------------------- | ------------------------------------------- | ---------------------------------------------- |
| **Model Parameter** | Learned from data during training           | Coefficients (`β`) in linear regression        |
| **Hyperparameter**  | Set before training; governs model behavior | `alpha` in Ridge, `max_depth` in Decision Tree |
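In scikit-learn, the distinction in the table is visible in the estimator itself. Hyperparameters are constructor arguments that can be read back with `get_params()`. Learned parameters only appear after `fit()`, as attributes ending in an underscore such as `coef_` and `intercept_`. The small sketch below uses the house-price data that appears later in this lesson:

```python
import numpy as np
from sklearn.linear_model import Ridge

# Toy house-size / price data (price is exactly 200 * size)
X = np.array([1000, 1500, 2000, 2500, 3000]).reshape(-1, 1)
y = np.array([200000, 300000, 400000, 500000, 600000])

model = Ridge(alpha=1.0)  # hyperparameter: chosen by us before training
print("Hyperparameter alpha:", model.get_params()["alpha"])

model.fit(X, y)
# Parameters: learned from the data during fit(), stored with a trailing underscore
print("Learned coef_:", model.coef_)
print("Learned intercept_:", model.intercept_)
```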


## 3. Grid Search vs Random Search

### Grid Search

Tries every combination of specified hyperparameter values.

```python
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.linear_model import Ridge
import numpy as np

# Sample data (house sizes and prices); price is exactly 200 * size
X = np.array([1000, 1500, 2000, 2500, 3000]).reshape(-1, 1)
y = np.array([200000, 300000, 400000, 500000, 600000])

# Split into training (4 samples) and testing (1 sample) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Tune alpha in Ridge Regression: every value in the grid is scored with 2-fold CV
param_grid = {'alpha': [0.01, 0.1, 1, 10, 100]}
grid = GridSearchCV(Ridge(), param_grid, cv=2)
grid.fit(X_train, y_train)
print("Best alpha:", grid.best_params_)
```

```text
Best alpha: {'alpha': 0.01}
```
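Because grid search tries every combination, its cost grows multiplicatively with each hyperparameter you add. The sketch below uses scikit-learn's `ParameterGrid` (the expansion GridSearchCV performs internally) to count the candidates for a hypothetical two-hyperparameter Ridge grid. The `fit_intercept` values and `n_folds = 5` are illustrative assumptions.

```python
from sklearn.model_selection import ParameterGrid

# Hypothetical grid: 5 alpha values x 2 fit_intercept settings
param_grid = {
    "alpha": [0.01, 0.1, 1, 10, 100],
    "fit_intercept": [True, False],
}
candidates = list(ParameterGrid(param_grid))
n_folds = 5
n_fits = len(candidates) * n_folds  # every candidate is trained once per fold

print("Candidates:", len(candidates))
print("Model fits with 5-fold CV:", n_fits)
print("First candidate:", candidates[0])
```

With the default `refit=True`, GridSearchCV then fits the best candidate one more time on the full training data.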


### Random Search

Samples a fixed number (`n_iter`) of hyperparameter combinations from the distributions you specify, instead of trying every combination.

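```python
from sklearn.model_selection import RandomizedSearchCV
from sklearn.linear_model import Ridge
from scipy.stats import uniform

# uniform(loc, scale) samples from [loc, loc + scale], i.e. alpha in [0.01, 10.01]
param_dist = {'alpha': uniform(0.01, 10)}

# Try n_iter=10 random candidates, each scored with 4-fold CV.
# No random_state is set, so the sampled alphas (and the result) vary between runs.
# Caveat: X_train has only 4 samples, so cv=4 leaves one sample per validation fold.
# The default Ridge scorer (R²) is undefined for a single sample and returns NaN,
# so best_params_ is not meaningful here; scoring='neg_mean_squared_error' avoids this.
random_search = RandomizedSearchCV(Ridge(), param_distributions=param_dist, n_iter=10, cv=4)
random_search.fit(X_train, y_train)
print("Best alpha:", random_search.best_params_)
```

```text
Best alpha: {'alpha': np.float64(2.409504255788265)}
```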
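Two practical refinements to the random search above are sketched here as suggestions rather than as the lesson's prescribed setup. The first samples `alpha` from a log-uniform distribution (scipy's `loguniform`), so that small and large magnitudes are explored equally. The second fixes `random_state`, so the candidates are reproducible. `ParameterSampler` is the sampler RandomizedSearchCV uses internally. The range 1e-3 to 1e2 is an assumed choice.

```python
from scipy.stats import loguniform
from sklearn.model_selection import ParameterSampler

# loguniform spreads samples evenly across orders of magnitude,
# which often suits regularization strengths like alpha
param_dist = {"alpha": loguniform(1e-3, 1e2)}

# A fixed random_state makes the sampled candidates reproducible
sampler = ParameterSampler(param_dist, n_iter=10, random_state=0)
alphas = [params["alpha"] for params in sampler]
print([round(a, 4) for a in alphas])
```

The same `param_dist` and seed can be passed straight to `RandomizedSearchCV(Ridge(), param_distributions=param_dist, n_iter=10, random_state=0)`.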




## 4. Cross-Validation (CV)

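Cross-validation evaluates a model more reliably than a single train/test split. It splits the data into several folds and averages the results:

- k-Fold CV: splits the data into k folds; each fold serves once as the test set while the remaining k-1 folds are used for training

- Leave-One-Out CV: each data point serves once as the test set and all other points form the training set; this needs one fit per sample, so it becomes slow on large datasets

- Stratified k-Fold: k-fold CV that keeps the class proportions of the full dataset in every fold (used in classification)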
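The difference between plain and stratified k-fold shows up when labels are ordered and imbalanced. In the sketch below, the 8-vs-4 label vector is made up for illustration. Unshuffled `KFold` piles most positives into the last test fold, while `StratifiedKFold` gives every test fold the same 2:1 class ratio as the full dataset.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Hypothetical imbalanced labels, sorted by class: 8 negatives, 4 positives
y_cls = np.array([0] * 8 + [1] * 4)
X_cls = np.zeros((len(y_cls), 1))  # feature values do not affect how folds are formed

results = {}
for name, cv in [("KFold", KFold(n_splits=4)),
                 ("StratifiedKFold", StratifiedKFold(n_splits=4))]:
    # Count positive labels in each test fold
    results[name] = [int(y_cls[test].sum()) for _, test in cv.split(X_cls, y_cls)]
    print(f"{name:>15}: positives in each test fold = {results[name]}")
```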

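```python
from sklearn.model_selection import cross_val_score

# 5-fold CV on all 5 samples, so each fold tests a single house (leave-one-out).
# scikit-learn returns *negative* MSE so that higher is better; negate it back.
scores = cross_val_score(Ridge(alpha=1), X, y, cv=5, scoring='neg_mean_squared_error')
print("Average CV MSE:", -scores.mean())
```

```text
Average CV MSE: 0.01730609565500119
```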
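For a regressor and an integer `cv`, `cross_val_score` uses unshuffled `KFold` splits. It trains a fresh clone of the model on each training portion and scores that clone on the held-out fold. The explicit loop below is a sketch of that process on the same data, so the two averages should agree.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold, cross_val_score

X = np.array([1000, 1500, 2000, 2500, 3000]).reshape(-1, 1)
y = np.array([200000, 300000, 400000, 500000, 600000])

kf = KFold(n_splits=5)  # with 5 samples, this is leave-one-out
fold_mse = []
for train_idx, test_idx in kf.split(X):
    # Train a new model on the training folds, then score it on the held-out fold
    model = Ridge(alpha=1).fit(X[train_idx], y[train_idx])
    fold_mse.append(mean_squared_error(y[test_idx], model.predict(X[test_idx])))

manual_avg = np.mean(fold_mse)
auto_avg = -cross_val_score(Ridge(alpha=1), X, y, cv=5,
                            scoring="neg_mean_squared_error").mean()
print("Manual k-fold MSE:  ", manual_avg)
print("cross_val_score MSE:", auto_avg)
```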


## 5. Applying CV and Tuning on Ridge & Lasso

Ridge Example with Grid Search:

```python
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

ridge = Ridge()
params = {'alpha': [0.01, 0.1, 1, 10, 100]}

# Score each alpha by 4-fold CV MSE (negated, because GridSearchCV maximizes the score)
grid_search = GridSearchCV(ridge, params, cv=4, scoring='neg_mean_squared_error')
grid_search.fit(X_train, y_train)

print("Best alpha:", grid_search.best_params_)
print("Best MSE:", -grid_search.best_score_)  # flip the sign back to a positive MSE
```

```text
Best alpha: {'alpha': 0.01}
Best MSE: 1.006841674867602e-05
```
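The same workflow carries over to the other regularized models listed in the topics. The sketch below assumes the same toy data and split. Lasso only needs `alpha`. ElasticNet also has `l1_ratio`, which sets the mix between the L1 (Lasso) and L2 (Ridge) penalties, so its grid covers both. `max_iter=10_000` is an assumed setting that gives the coordinate-descent solver room to converge. With a single feature and perfectly linear data, the selected values say little about real datasets; the point of the example is the workflow.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso
from sklearn.model_selection import GridSearchCV, train_test_split

X = np.array([1000, 1500, 2000, 2500, 3000]).reshape(-1, 1)
y = np.array([200000, 300000, 400000, 500000, 600000])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Lasso: tune alpha only
lasso_search = GridSearchCV(
    Lasso(max_iter=10_000),
    {"alpha": [0.01, 0.1, 1, 10, 100]},
    cv=4,
    scoring="neg_mean_squared_error",
)
lasso_search.fit(X_train, y_train)
print("Lasso best params:", lasso_search.best_params_)

# ElasticNet: tune alpha and the L1/L2 mix (l1_ratio) together (4 x 3 = 12 candidates)
enet_search = GridSearchCV(
    ElasticNet(max_iter=10_000),
    {"alpha": [0.01, 0.1, 1, 10], "l1_ratio": [0.2, 0.5, 0.8]},
    cv=4,
    scoring="neg_mean_squared_error",
)
enet_search.fit(X_train, y_train)
print("ElasticNet best params:", enet_search.best_params_)
```

scikit-learn also provides `LassoCV` and `ElasticNetCV`, which run this kind of alpha search with built-in cross-validation.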


## 6. Evaluation Metrics Recap (for Tuning)

| Metric     | Use Case                  | Higher or Lower is Better |
| ---------- | ------------------------- | ------------------------- |
| MSE / RMSE | Regression                | Lower                     |
| R² Score   | Regression                | Higher                    |
| Accuracy   | Classification            | Higher                    |
| F1 Score   | Imbalanced classification | Higher                    |


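Pass the matching `scoring` string to GridSearchCV or RandomizedSearchCV (e.g., `scoring='accuracy'` for classification, `scoring='neg_mean_squared_error'` for regression). scikit-learn scorers always treat higher values as better, which is why error metrics such as MSE are negated.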
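The sign convention is easy to check directly. This sketch fits Ridge on the toy data and evaluates it with the `neg_mean_squared_error` scorer obtained from `get_scorer`. It then compares the result with `mean_squared_error`. Both are computed on the training data here, purely to show the relationship.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import get_scorer, mean_squared_error

X = np.array([1000, 1500, 2000, 2500, 3000]).reshape(-1, 1)
y = np.array([200000, 300000, 400000, 500000, 600000])
model = Ridge(alpha=1).fit(X, y)

scorer = get_scorer("neg_mean_squared_error")
score = scorer(model, X, y)                    # the value a search would maximize
mse = mean_squared_error(y, model.predict(X))  # the usual (non-negative) MSE
print("scorer value:", score)
print("MSE:", mse)
```

The scorer returns the negated MSE. This is why the CV examples in this lesson print `-scores.mean()` and `-grid_search.best_score_`.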

## Summary of Day 11

Today, we explored techniques to fine-tune our machine learning models using:

- Grid Search and Random Search to find good hyperparameter values

- Cross-Validation to estimate how well a model generalizes and to detect overfitting

- Practical application to Ridge and Lasso regression, using MSE as the evaluation metric

These techniques make your models much more likely to generalize well to new, unseen data, not just fit the training data accurately.

## What's Next

On Day 12, we will dive into Decision Trees for both regression and classification tasks. You’ll learn:

- How trees split data

- Metrics like Gini Impurity and Entropy

- How to avoid overfitting with pruning and depth control