# Hyperparameter Tuning

Hyperparameter tuning is the process of searching for the combination of hyperparameters (the external configuration settings of a model) that gives the best performance on a given task. Unlike model parameters, which are learned during training, hyperparameters must be set before training and can strongly affect model behavior.

## Key Concepts

### Hyperparameters vs. Model Parameters

- **Model Parameters**:  
  These are learned from data during training. For example, the weights in a neural network or the coefficients in linear regression.

- **Hyperparameters**:  
  These are set manually or via an automated tuning process. They control aspects of the model and its training process (e.g., learning rate, regularization strength, number of layers) and influence the model's ability to generalize.
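
To make the distinction concrete, here is a minimal sketch using only the standard library. The toy data and the `fit_linear` helper are assumptions made for illustration: `w` and `b` are model parameters updated by training, while `learning_rate` and `n_epochs` are hyperparameters fixed beforehand.

```python
# Toy data generated from y = 2x + 1 (no noise), purely for illustration.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]


def fit_linear(xs, ys, learning_rate, n_epochs):
    """Fit y = w*x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0  # model parameters: learned from the data
    n = len(xs)
    for _ in range(n_epochs):
        # Gradients of the mean squared error with respect to w and b
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hyperparameters are chosen before training and never updated by it.
w, b = fit_linear(xs, ys, learning_rate=0.05, n_epochs=2000)
print(f"learning_rate=0.05: w={w:.3f}, b={b:.3f}")

# A poorly chosen learning rate makes the same training procedure diverge.
w_bad, b_bad = fit_linear(xs, ys, learning_rate=0.2, n_epochs=20)
print(f"learning_rate=0.2:  w={w_bad:.3e}, b={b_bad:.3e}")
```

The same data and the same training code give very different results depending only on the hyperparameters, which is why they are worth tuning.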

### Why Tune Hyperparameters?

- **Performance**: Optimal hyperparameters can improve accuracy, precision, recall, and other performance metrics.
- **Generalization**: Proper tuning helps control both overfitting and underfitting by matching model complexity (e.g., regularization strength, number of layers) to the amount and nature of the available data.
- **Efficiency**: Better-tuned hyperparameters can also reduce training time or resource usage.

## Common Tuning Methods

Several approaches exist for hyperparameter tuning:

### 1. Grid Search

Grid Search (implemented in scikit‑learn’s `GridSearchCV`) exhaustively evaluates every combination from a predefined grid of hyperparameter values.

- **Pros**:  
  Guarantees evaluation of all specified combinations.
- **Cons**:  
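  Computationally expensive for large grids: the number of combinations is the product of the candidate-set sizes, so it grows exponentially with the number of hyperparameters.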

**Mathematical Formulation:**

<p>For hyperparameters <span style="font-family: 'Courier New', Courier, monospace;">p<sub>1</sub>, p<sub>2</sub>, …, p<sub>k</sub></span> with candidate sets:</p>  
$$
P_1, P_2, \dots, P_k
$$
The grid is:
$$
\mathcal{P} = \{ (p_1, p_2, \dots, p_k) \mid p_1 \in P_1,\, p_2 \in P_2,\, \dots,\, p_k \in P_k \}
$$
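Each combination is evaluated using K-fold cross-validation, averaging the validation score over the K folds (K is the number of folds, distinct from the number of hyperparameters k):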
$$
\text{CV Score}(\mathbf{p}) = \frac{1}{K} \sum_{i=1}^{K} \text{score}_i(\mathbf{p})
$$
and the best combination is chosen:
$$
\mathbf{p}^* = \arg\max_{\mathbf{p} \in \mathcal{P}} \text{CV Score}(\mathbf{p})
$$
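
The formulation above can be sketched in a few lines of standard-library Python. This is an illustration, not how `GridSearchCV` is implemented: `fold_scores` is a hypothetical stand-in for training and scoring a model on each fold, designed so that the toy score peaks at `C=10`, `gamma=0.01`.

```python
import math
from itertools import product

# Candidate sets P_1, ..., P_k (values are illustrative).
candidate_sets = {
    "C": [0.1, 1, 10, 100],
    "gamma": [1, 0.1, 0.01, 0.001],
}


def fold_scores(params, n_folds=5):
    """Hypothetical per-fold validation scores for one combination.

    A real implementation would train on K-1 folds and score on the
    held-out fold; this toy version is a fixed function of the parameters.
    """
    base = (1.0
            - 0.1 * abs(math.log10(params["C"]) - 1)
            - 0.1 * abs(math.log10(params["gamma"]) + 2))
    return [base - 0.01 * i for i in range(n_folds)]


def cv_score(params, n_folds=5):
    """CV Score(p): the mean of the K per-fold scores."""
    scores = fold_scores(params, n_folds)
    return sum(scores) / len(scores)


# The grid is the Cartesian product of the candidate sets.
names = list(candidate_sets)
grid = [dict(zip(names, values)) for values in product(*candidate_sets.values())]

# p* = argmax over the grid of the CV score.
best_params = max(grid, key=cv_score)
print(len(grid), "combinations evaluated; best:", best_params)
```

With two hyperparameters of four values each, the grid has 4 × 4 = 16 combinations, and each one costs K model fits.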

### 2. Random Search

Random Search (available as `RandomizedSearchCV` in scikit‑learn) randomly samples a fixed number of combinations from a defined hyperparameter space.

- **Pros**:  
  More efficient when the hyperparameter space is large, since it does not evaluate all possible combinations.
- **Cons**:  
  Does not guarantee that the best combination will be found.

**Sampling Formulation:**

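<p>For each hyperparameter <span style="font-family: 'Courier New', Courier, monospace;">p<sub>i</sub></span>, define a distribution <span style="font-family: 'Courier New', Courier, monospace;">D<sub>i</sub></span>. Randomly sample <span style="font-family: 'Courier New', Courier, monospace;">N</span> combinations, drawing each component from its own distribution:</p>  
$$
\{ (p_1^{(j)}, p_2^{(j)}, \dots, p_k^{(j)}) \}_{j=1}^{N}, \qquad p_i^{(j)} \sim D_i
$$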
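
As a rough sketch of the sampling step alone, the snippet below draws `N` combinations with the standard-library `random` module. The exponential distributions and their scales are assumptions chosen to mirror the scikit-learn example later in this document; `distributions` and `samples` are illustrative names.

```python
import random

rng = random.Random(42)  # fixed seed so the draws are reproducible

# One sampler per hyperparameter (the distributions D_i).
# expovariate takes the rate, i.e. 1 / scale.
distributions = {
    "C": lambda: rng.expovariate(1 / 100),    # exponential, mean 100
    "gamma": lambda: rng.expovariate(1 / 0.1),  # exponential, mean 0.1
    "kernel": lambda: rng.choice(["rbf"]),    # categorical with one option
}

N = 20  # number of combinations to sample

# Each combination draws every hyperparameter independently from its D_i.
samples = [
    {name: draw() for name, draw in distributions.items()}
    for _ in range(N)
]

for params in samples[:3]:
    print(params)
```

Each sampled combination would then be scored with cross-validation exactly as in grid search. The cost is fixed at `N` evaluations regardless of how many hyperparameters there are.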

### 3. Bayesian Optimization

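Bayesian optimization builds a probabilistic surrogate model (commonly a Gaussian process) of the function mapping hyperparameters to the objective metric. It then selects the next set of hyperparameters to evaluate by optimizing an acquisition function, which trades off exploring uncertain regions against exploiting regions the surrogate predicts to be good.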

- **Pros**:  
  Efficiently navigates large or continuous spaces and often finds good solutions with fewer evaluations.
- **Cons**:  
  More complex to implement and may require additional libraries (e.g., Hyperopt, BayesianOptimization).
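
The loop described above can be sketched with NumPy alone. Everything here is an assumption made for illustration and not the API of Hyperopt or BayesianOptimization: the one-dimensional toy `objective` (standing in for a validation score as a function of `log10(gamma)`), the zero-mean Gaussian-process surrogate with a fixed RBF kernel, and expected improvement as the acquisition function.

```python
import math
import numpy as np


def objective(x):
    """Hypothetical validation score as a function of x = log10(gamma)."""
    return float(np.exp(-(x + 2.0) ** 2))  # peaks at x = -2


def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between two 1-D arrays of points."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)


def gp_posterior(x_obs, y_obs, x_new, noise=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP at x_new."""
    k_oo = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    k_on = rbf_kernel(x_obs, x_new)
    solved = np.linalg.solve(k_oo, k_on)        # K^{-1} k_*
    mean = solved.T @ y_obs
    var = 1.0 - np.sum(k_on * solved, axis=0)   # prior variance is 1
    return mean, np.sqrt(np.clip(var, 1e-12, None))


def expected_improvement(mean, std, best):
    """Expected improvement over the best observed score (maximization)."""
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return (mean - best) * cdf + std * pdf


candidates = np.linspace(-4.0, 1.0, 201)  # search space for log10(gamma)
x_obs = np.array([-4.0, 1.0])             # two initial evaluations
y_obs = np.array([objective(x) for x in x_obs])

for _ in range(10):
    # Fit the surrogate, then evaluate where the acquisition is highest.
    mean, std = gp_posterior(x_obs, y_obs, candidates)
    ei = expected_improvement(mean, std, y_obs.max())
    x_next = candidates[int(np.argmax(ei))]
    x_obs = np.append(x_obs, x_next)
    y_obs = np.append(y_obs, objective(x_next))

best_x = x_obs[int(np.argmax(y_obs))]
print(f"best log10(gamma) after {len(x_obs)} evaluations: {best_x:.3f}")
```

In practice the objective would be a cross-validated score, and a dedicated library would also handle kernel hyperparameters, multiple dimensions, and categorical values, all of which this sketch leaves out.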

### 4. Other Methods

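- **Hyperband**: Uses adaptive resource allocation, giving many configurations a small training budget and stopping the weakest early so that more of the budget goes to promising ones.
- **Evolutionary Algorithms**: Maintain a population of hyperparameter configurations and evolve it over generations through selection, mutation, and crossover.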
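
The adaptive resource allocation idea can be illustrated with successive halving, the subroutine that Hyperband runs repeatedly with different starting budgets. This is a simplified sketch, not full Hyperband. The `partial_score` function and the `quality` field are hypothetical stand-ins for training a configuration for a limited budget (e.g., epochs) and measuring its validation score.

```python
import random

rng = random.Random(0)


def partial_score(config, budget):
    """Hypothetical score of a configuration after `budget` units of training.

    Toy model: the score rises with budget and saturates at config["quality"].
    """
    return config["quality"] * (1.0 - 0.5 ** budget)


def successive_halving(configs, min_budget=1, eta=2):
    """Keep the top 1/eta of configurations each round, multiplying the budget by eta."""
    budget = min_budget
    survivors = list(configs)
    rounds = []  # (budget, number of configurations evaluated) per round
    while len(survivors) > 1:
        rounds.append((budget, len(survivors)))
        ranked = sorted(survivors,
                        key=lambda c: partial_score(c, budget),
                        reverse=True)
        survivors = ranked[: max(1, len(ranked) // eta)]
        budget *= eta
    return survivors[0], rounds


# 16 random configurations, each with a hidden "true" quality.
configs = [{"id": i, "quality": rng.random()} for i in range(16)]
winner, rounds = successive_halving(configs)

print("rounds (budget, n_configs):", rounds)
print("winner:", winner)
```

Most configurations only ever receive the smallest budget, and only the two finalists are trained with a budget of 8. That is where the savings over training every configuration fully come from.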

## Best Practices

- **Start Simple**: Begin with a coarse grid or fewer random samples, then refine the search around promising areas.
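- **Use Cross-Validation**: Evaluating each candidate with k-fold cross-validation gives a more reliable estimate than a single validation split and reduces the risk of selecting hyperparameters that overfit one particular split.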
- **Balance Computational Cost**: Adjust the number of iterations and range of values based on available resources.
- **Monitor and Iterate**: Analyze the tuning results and adjust the search space if necessary.
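
The "start simple, then refine" advice might look like the following two-stage search. The `cv_score` function is a hypothetical stand-in for a cross-validated score as a function of `log10(C)`, and the ranges and grid sizes are arbitrary illustrative choices.

```python
def cv_score(log_c):
    """Hypothetical cross-validated score as a function of log10(C)."""
    return 1.0 - 0.1 * (log_c - 0.7) ** 2  # peaks at log10(C) = 0.7


def linspace(low, high, n):
    """Return n evenly spaced values from low to high, inclusive."""
    step = (high - low) / (n - 1)
    return [low + i * step for i in range(n)]


# Stage 1: coarse grid over a wide range, one point per order of magnitude.
coarse = linspace(-3.0, 3.0, 7)
best_coarse = max(coarse, key=cv_score)

# Stage 2: finer grid spanning one coarse step on either side of the best point.
fine = linspace(best_coarse - 1.0, best_coarse + 1.0, 21)
best_fine = max(fine, key=cv_score)

print(f"coarse best: log10(C)={best_coarse:.1f}, score={cv_score(best_coarse):.4f}")
print(f"fine best:   log10(C)={best_fine:.1f}, score={cv_score(best_fine):.4f}")
```

The two stages evaluate 28 points in total. A single grid at the fine resolution over the whole range would need 61.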

## Python Code Examples

### GridSearchCV Example

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC
from sklearn.metrics import classification_report

# Load Iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Define hyperparameter grid
param_grid = {
    'C': [0.1, 1, 10, 100],
    'gamma': [1, 0.1, 0.01, 0.001],
    'kernel': ['rbf']
}

# Create GridSearchCV object with 5-fold cross-validation
grid = GridSearchCV(SVC(), param_grid, cv=5, scoring='accuracy', verbose=1, n_jobs=-1)
grid.fit(X_train, y_train)

print("Best Parameters:", grid.best_params_)
print("Best CV Accuracy: {:.2f}%".format(grid.best_score_ * 100))

# Evaluate on test data
y_pred = grid.predict(X_test)
print("\nClassification Report:\n", classification_report(y_test, y_pred))
```

### RandomizedSearchCV Example

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split, RandomizedSearchCV
from sklearn.svm import SVC
from sklearn.metrics import classification_report
from scipy.stats import expon

# Load Iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Define distributions for hyperparameters
param_distributions = {
    'C': expon(scale=100),      # Exponential distribution for C
    'gamma': expon(scale=0.1),    # Exponential distribution for gamma
    'kernel': ['rbf']
}

# Create RandomizedSearchCV object
random_search = RandomizedSearchCV(
    SVC(),
    param_distributions=param_distributions,
    n_iter=20,
    cv=5,
    scoring='accuracy',
    verbose=1,
    n_jobs=-1,
    random_state=42
)
random_search.fit(X_train, y_train)

print("Best Parameters:", random_search.best_params_)
print("Best CV Accuracy: {:.2f}%".format(random_search.best_score_ * 100))

# Evaluate on test set
y_pred = random_search.predict(X_test)
print("\nClassification Report:\n", classification_report(y_test, y_pred))
```

## Conclusion

Hyperparameter tuning is crucial for optimizing model performance. Techniques such as Grid Search and Random Search provide systematic ways to explore the hyperparameter space, while advanced methods like Bayesian optimization can further enhance efficiency when dealing with large or continuous spaces.

By applying these methods with cross-validation, you can select hyperparameters that are likely to generalize well to unseen data. A final evaluation on a held-out test set, as in the examples above, gives an unbiased check of that choice.

