## Grid Search CV

Grid search with cross-validation (Grid Search CV) tunes hyperparameters, such as alpha in ridge regression, by fitting the model at every value in a predefined grid and selecting the value with the best cross-validated performance.

It automates the tedious process of manually testing each hyperparameter value. Because every candidate is scored on several held-out folds rather than a single split, it also reduces the risk of choosing a value that only looks good on one particular validation set.

### Explanation of Grid Search CV for Hyperparameter Tuning

Grid Search CV works as follows:
- A grid of hyperparameter values to try is defined.

- For each value (or combination of values), the model is trained and evaluated using cross-validation.

- The performance scores, often mean squared error for regression, are collected for every fold.

- The best hyperparameters are selected based on the metric averaged across folds, such as the lowest mean cross-validated error.

- The best model, refit on all the data passed to `fit`, can then be used for prediction, and its attributes (such as coefficients) remain accessible for interpretation.

Averaging scores over k folds reduces the variance of the performance estimate and helps avoid overfitting the hyperparameters to one particular validation set. The `GridSearchCV` object in scikit-learn stores detailed results, including fit times, rankings, and per-fold scores, so you can analyze the tuning process when debugging or studying model behavior.
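As a rough sketch of what the steps above amount to, the loop below scores each candidate with `cross_val_score` and keeps the value with the lowest mean error. The five-value grid, the shuffled `KFold` splitter, and the choice of the diabetes dataset are assumptions made for illustration; `GridSearchCV` wraps the same idea and additionally refits the winner and records timings.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)

# Candidate values; this small grid is an illustrative assumption.
alphas = [0.01, 0.1, 1.0, 10.0, 100.0]

# Reuse the same folds for every candidate so the scores are comparable.
cv = KFold(n_splits=5, shuffle=True, random_state=0)

mean_mse = {}
for alpha in alphas:
    pipe = Pipeline([
        ("scaler", StandardScaler()),
        ("ridge", Ridge(alpha=alpha)),
    ])
    # scikit-learn scorers are maximized, so MSE comes back negated.
    neg_mse = cross_val_score(pipe, X, y, scoring="neg_mean_squared_error", cv=cv)
    mean_mse[alpha] = -neg_mse.mean()

# Select the alpha with the lowest mean cross-validated MSE.
best_alpha = min(mean_mse, key=mean_mse.get)
print(f"Best alpha by mean CV MSE: {best_alpha}")
```

Placing the scaler inside the pipeline means it is refit on the training part of each fold, so no statistics leak from the held-out part.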

### Key Points
- GridSearchCV allows tuning multiple hyperparameters simultaneously.

- When the estimator is a pipeline, each hyperparameter name must be prefixed with its step name, joined by a double underscore (for example, `ridge__alpha`).

- The `fit` method is called once on the full training data; the train/validation splits for each fold are created internally.

- The best estimator can be accessed via `.best_estimator_`. With the default `refit=True`, calling `predict` on the GridSearchCV object delegates to this estimator, so both give the same predictions.

- Cross-validation results are accessible through `.cv_results_` for detailed inspection.
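The points above can be illustrated with a small sketch that tunes two hyperparameters at once. The grid values and the choice of `fit_intercept` as the second hyperparameter are assumptions picked only to keep the example short.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("ridge", Ridge()),
])

# Keys follow the pattern <step name>__<parameter name>.
# Every combination is tried: 3 alphas x 2 intercept settings = 6 candidates.
param_grid = {
    "ridge__alpha": [0.1, 1.0, 10.0],
    "ridge__fit_intercept": [True, False],
}

search = GridSearchCV(pipe, param_grid, scoring="neg_mean_squared_error", cv=5)
search.fit(X, y)

# With the default refit=True, both calls use the same refitted pipeline.
same = np.allclose(search.predict(X), search.best_estimator_.predict(X))
print("predict() matches best_estimator_.predict():", same)

# cv_results_ is a dict of arrays; a DataFrame makes it easier to read.
results = pd.DataFrame(search.cv_results_)
cols = [
    "param_ridge__alpha",
    "param_ridge__fit_intercept",
    "mean_test_score",
    "rank_test_score",
    "mean_fit_time",
]
print(results[cols].sort_values("rank_test_score"))
```

Sorting by `rank_test_score` puts the selected combination first, which is a quick way to see how close the runners-up were.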

```python
# Example Python implementation with ridge regression on a dataset

import numpy as np
import pandas as pd
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.metrics import mean_squared_error

# Toy dataset for demonstration
data = load_diabetes()
X = data.data
y = data.target

# Split into training and development sets
X_train, X_dev, y_train, y_dev = train_test_split(X, y, test_size=0.2, random_state=42)

# Pipeline with standard scaler and ridge regression
pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('ridge', Ridge())
])

# Define alpha values to search over on log scale
alpha_values = np.logspace(-5, 4, 50)

# Hyperparameter dictionary: '<step name>__<parameter name>' -> values to try
param_grid = {'ridge__alpha': alpha_values}

# Create GridSearchCV object with negative mean squared error as scoring
grid_search = GridSearchCV(pipe, param_grid, scoring='neg_mean_squared_error', cv=5)

# Fit on the training split only; the 5 cross-validation folds are created internally
grid_search.fit(X_train, y_train)

# Best alpha found
best_alpha = grid_search.best_estimator_.named_steps['ridge'].alpha
print(f"Best alpha: {best_alpha}")

# Predict on the held-out dev set with the best estimator
y_pred = grid_search.best_estimator_.predict(X_dev)
dev_mse = mean_squared_error(y_dev, y_pred)
print(f"Development set MSE with best alpha: {dev_mse}")

# Extract coefficients from the best model
coefficients = grid_search.best_estimator_.named_steps['ridge'].coef_
coef_df = pd.DataFrame(coefficients, index=data.feature_names, columns=['Coefficient'])
print(coef_df)
```

Output:

```text
Best alpha: 26.826957952797272
Development set MSE with best alpha: 2863.8889199207347
     Coefficient
age     1.981092
sex   -10.371246
bmi    24.775212
bp     15.756054
s1     -6.968186
s2     -3.357444
s3     -8.494348
s4      7.543101
s5     19.927514
s6      3.438558
```


### Interpretation
- The code scales features to z-scores using StandardScaler.

- It performs a grid search over 50 alpha values from $10^{-5}$ to $10^{4}$.

- The best alpha minimizes the mean squared error averaged over the 5 cross-validation folds of the training split; the dev set is held out and used only for the final evaluation.

- The coefficients from the best ridge regression model can be interpreted relative to the standardized features.

- This approach automates finding optimal regularization strength to balance underfitting and overfitting in the model.
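The selection rule in the bullets above can be checked directly against `cv_results_`. The sketch below repeats the same setup (diabetes data, 50 log-spaced alphas, 5 folds) and recovers the chosen alpha by hand; it is a sanity check under those assumptions, not a separate tuning method.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
X_train, X_dev, y_train, y_dev = train_test_split(X, y, test_size=0.2, random_state=42)

pipe = Pipeline([("scaler", StandardScaler()), ("ridge", Ridge())])
alphas = np.logspace(-5, 4, 50)

search = GridSearchCV(
    pipe, {"ridge__alpha": alphas}, scoring="neg_mean_squared_error", cv=5
)
search.fit(X_train, y_train)

# Scores are negated MSE, so flip the sign to get an error to minimize.
cv_mse = -search.cv_results_["mean_test_score"]
best_idx = int(np.argmin(cv_mse))

print(f"alpha with lowest mean CV MSE: {alphas[best_idx]:.4g}")
print(f"best_params_ reports:          {search.best_params_['ridge__alpha']:.4g}")
print(f"mean CV MSE at that alpha:     {cv_mse[best_idx]:.1f}")
```

The dev split never enters this calculation, which is why the dev set MSE can serve as an independent estimate of performance on unseen data.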

This method matches the experimental description, where a wide range of alphas is tested and the best value (around alpha=20-23) is selected. The run above selects alpha ≈ 26.8, the nearest grid point above that range, showing how regularization improves performance on unseen data while managing model complexity.
