# Part 5.3: Model Selection - Hyperparameter Tuning

**Hyperparameters** are parameters that are not learned from the data but are set before training (e.g., `C` or `gamma` in an SVM). Choosing good hyperparameter values can significantly improve a model's performance. The process of searching for them is called **hyperparameter tuning**.
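
To make the distinction concrete, here is a minimal sketch that contrasts the values passed to the `SVC` constructor (hyperparameters) with an attribute that only exists after fitting (learned from the data). The specific values `C=1.0` and `gamma=0.1` are illustrative choices, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Hyperparameters: chosen by us before training (illustrative values)
model = SVC(C=1.0, gamma=0.1, kernel='rbf')
hyperparams = {name: model.get_params()[name] for name in ('C', 'gamma', 'kernel')}
print("Hyperparameters:", hyperparams)

# Learned parameters: attributes such as support_vectors_ appear only after fit()
has_learned_before = hasattr(model, 'support_vectors_')
model.fit(X, y)
print("Has support vectors before fit?", has_learned_before)
print("Support vectors shape after fit:", model.support_vectors_.shape)
```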

In [1]:
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.datasets import load_iris
from sklearn.svm import SVC
from scipy.stats import expon, reciprocal

X, y = load_iris(return_X_y=True)

### Grid Search (`GridSearchCV`)
Grid Search exhaustively tries every combination of the specified hyperparameter values and evaluates each combination using cross-validation.

In [2]:
# Define the parameter grid
param_grid = {
    'C': [0.1, 1, 10, 100],
    'gamma': [1, 0.1, 0.01, 0.001],
    'kernel': ['rbf']
}

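# cv=5 scores each combination with 5-fold cross-validation;
# refit=True (the default) retrains the best combination on all of X, y
grid_search = GridSearchCV(SVC(), param_grid, refit=True, verbose=0, cv=5)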
grid_search.fit(X, y)

print("Best parameters found by Grid Search:", grid_search.best_params_)
print(f"Best score from Grid Search: {grid_search.best_score_:.4f}")

Best parameters found by Grid Search: {'C': 1, 'gamma': 0.1, 'kernel': 'rbf'}
Best score from Grid Search: 0.9800
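
The cost of an exhaustive search is the product of the number of values per hyperparameter and the number of folds. As a rough sketch, `ParameterGrid` (the scikit-learn helper that enumerates a grid) can be used to count the candidates for the grid above; the variable names `n_candidates` and `n_folds` are introduced here only for illustration.

```python
from sklearn.model_selection import ParameterGrid

param_grid = {
    'C': [0.1, 1, 10, 100],
    'gamma': [1, 0.1, 0.01, 0.001],
    'kernel': ['rbf'],
}

# 4 values of C x 4 values of gamma x 1 kernel
n_candidates = len(ParameterGrid(param_grid))
n_folds = 5  # matches cv=5 in the search above

print(f"{n_candidates} candidates x {n_folds} folds = {n_candidates * n_folds} fits")
```

Adding a third hyperparameter with four values would multiply the number of fits by four, which is why grid search becomes expensive quickly as the search space grows.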


### Random Search (`RandomizedSearchCV`)
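Random Search is often more efficient than Grid Search, particularly when only a few hyperparameters strongly affect the score. Instead of trying all combinations, it samples a fixed number of parameter settings (`n_iter`) from specified probability distributions, so the cost is set by the sampling budget rather than by the size of a grid.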

In [3]:
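# Define the parameter distributions
param_dist = {
    'C': reciprocal(0.1, 100),   # log-uniform over [0.1, 100]
    'gamma': expon(scale=1.0),   # exponential distribution with mean 1.0
    'kernel': ['rbf']            # a list is sampled uniformly
}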

# n_iter specifies the number of parameter settings that are sampled
random_search = RandomizedSearchCV(SVC(), param_distributions=param_dist, n_iter=100, cv=5, random_state=42)
random_search.fit(X, y)

print("Best parameters found by Randomized Search:", random_search.best_params_)
print(f"Best score from Randomized Search: {random_search.best_score_:.4f}")

Best parameters found by Randomized Search: {'C': np.float64(5.9874749104613985), 'gamma': np.float64(0.047563849756408545), 'kernel': 'rbf'}
Best score from Randomized Search: 0.9867
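
To see what kind of candidates such distributions produce, the sketch below draws a handful of settings with `ParameterSampler`, the scikit-learn helper for sampling from a dictionary of distributions. The choice of five samples is arbitrary and only meant to keep the output short.

```python
from scipy.stats import expon, reciprocal
from sklearn.model_selection import ParameterSampler

param_dist = {
    'C': reciprocal(0.1, 100),   # log-uniform over [0.1, 100]
    'gamma': expon(scale=1.0),   # exponential distribution with mean 1.0
    'kernel': ['rbf'],
}

# Draw five candidate settings (an arbitrary, small number for display)
samples = list(ParameterSampler(param_dist, n_iter=5, random_state=42))
for s in samples:
    print(f"C={s['C']:.4f}, gamma={s['gamma']:.4f}, kernel={s['kernel']}")
```

Because `C` is drawn log-uniformly, small and large orders of magnitude are equally likely to be sampled, which suits hyperparameters whose useful range spans several decades.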


### Advanced Method: Bayesian Optimization
Bayesian optimization uses the results of earlier evaluations to decide what to try next. It builds a probabilistic model of the objective function (e.g., cross-validation score as a function of hyperparameters) and uses it to select the most promising hyperparameters to evaluate. When each evaluation is expensive, this can reach a good score in far fewer evaluations than grid or random search.

Popular libraries for this include **Optuna** and **Hyperopt**.
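
The idea can be illustrated without either library. The following is a simplified sketch, not how Optuna or Hyperopt work internally: it assumes a Gaussian process as the probabilistic model and expected improvement as the rule for picking the next point, both implemented with scikit-learn and SciPy. The search ranges, the number of iterations, and helper names such as `objective` and `expected_improvement` are assumptions made for this example.

```python
import numpy as np
from scipy.stats import norm
from sklearn.datasets import load_iris
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)

# Assumed search space in log10 units: C in [0.1, 100], gamma in [0.001, 1]
low = np.array([-1.0, -3.0])
high = np.array([2.0, 0.0])

def objective(log_params):
    """Mean 5-fold CV accuracy of an RBF SVM at (log10 C, log10 gamma)."""
    C, gamma = 10.0 ** log_params
    return cross_val_score(SVC(C=C, gamma=gamma, kernel='rbf'), X, y, cv=5).mean()

def expected_improvement(candidates, gp, best_score, xi=0.01):
    """Expected improvement over best_score under the GP's predictions."""
    mean, std = gp.predict(candidates, return_std=True)
    std = np.maximum(std, 1e-9)  # avoid division by zero
    improvement = mean - best_score - xi
    z = improvement / std
    return improvement * norm.cdf(z) + std * norm.pdf(z)

# Start from a few random evaluations
tried = rng.uniform(low, high, size=(5, 2))
scores = np.array([objective(p) for p in tried])

for _ in range(15):
    # Fit the surrogate model to centered scores seen so far
    offset = scores.mean()
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-4, random_state=0)
    gp.fit(tried, scores - offset)

    # Score random candidates with the surrogate and evaluate the most promising one
    candidates = rng.uniform(low, high, size=(500, 2))
    ei = expected_improvement(candidates, gp, scores.max() - offset)
    next_point = candidates[np.argmax(ei)]

    tried = np.vstack([tried, next_point])
    scores = np.append(scores, objective(next_point))

best = tried[np.argmax(scores)]
best_params = {'C': 10.0 ** best[0], 'gamma': 10.0 ** best[1]}
print("Best parameters found:", best_params)
print(f"Best CV score: {scores.max():.4f}")
```

The loop evaluates only 20 hyperparameter settings in total. Dedicated libraries add much more on top of this basic loop, so for real projects they are usually the better choice.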