# Model Tuning with RandomizedSearchCV

RandomizedSearchCV is a scikit-learn class that tunes model hyperparameters by sampling candidate settings at random from distributions (or lists) you specify for each hyperparameter. Instead of trying every combination as GridSearchCV does, it evaluates a fixed number of sampled combinations (`n_iter`), so the cost is set by your budget rather than by the size of the search space.

## Overview

- **Hyperparameters** are the external configuration settings of a model (e.g., regularization strength, number of neighbors).
- **RandomizedSearchCV** randomly samples hyperparameter combinations from given distributions, then uses cross-validation to evaluate model performance.
- It is especially useful when the number of combinations is huge or when you suspect only a few parameters are truly influential.
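
To make the cost difference concrete, the sketch below counts how many candidates an exhaustive grid would contain and compares it with a fixed random-search budget. The grid values, the distributions, and the budget of 20 are illustrative assumptions, not recommendations; `ParameterGrid` and `ParameterSampler` are the scikit-learn helpers that enumerate grid points and draw random candidates.

```python
from scipy.stats import expon
from sklearn.model_selection import ParameterGrid, ParameterSampler

# Hypothetical grid: 10 values of C x 10 values of gamma x 3 kernels
grid = {
    "C": [10.0 ** e for e in range(-4, 6)],
    "gamma": [10.0 ** e for e in range(-6, 4)],
    "kernel": ["rbf", "poly", "sigmoid"],
}
n_grid = len(ParameterGrid(grid))  # 10 * 10 * 3 = 300 candidates

# The same three hyperparameters described by distributions instead
distributions = {
    "C": expon(scale=100),
    "gamma": expon(scale=0.1),
    "kernel": ["rbf", "poly", "sigmoid"],
}
sampled = list(ParameterSampler(distributions, n_iter=20, random_state=0))
n_random = len(sampled)  # always 20, whatever the size of the space

print(f"Grid search candidates:   {n_grid}")
print(f"Random search candidates: {n_random}")
```

With 5-fold cross-validation, these counts translate into 1500 versus 100 model fits.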

## Mathematical Formulation

Suppose you have k hyperparameters, and instead of a grid, you define a probability distribution for each:

$$
p_1 \sim D_1, \quad p_2 \sim D_2, \quad \dots, \quad p_k \sim D_k
$$

RandomizedSearchCV will sample N combinations:
$$
\{ (p_1^{(i)}, p_2^{(i)}, \dots, p_k^{(i)}) \}_{i=1}^{N}
$$
For each sampled combination \( \mathbf{p}^{(i)} = (p_1^{(i)}, \dots, p_k^{(i)}) \), the model is trained and evaluated using \( K \)-fold cross-validation, where \( \text{score}_j \) is the score on the \( j \)-th held-out fold:
$$
\text{CV Score}(\mathbf{p}^{(i)}) = \frac{1}{K} \sum_{j=1}^{K} \text{score}_{j}(\mathbf{p}^{(i)})
$$
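The best combination is the one with the highest mean CV score. scikit-learn scorers follow a higher-is-better convention, so error metrics are maximized in negated form (for example, `neg_mean_squared_error`):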
$$
\mathbf{p}^* = \arg\max_{i=1,\dots,N} \text{CV Score}(\mathbf{p}^{(i)})
$$
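
The three formulas above can be traced by hand in a few lines. This is a minimal sketch of the idea, not how `RandomizedSearchCV` is implemented internally: it draws \( N \) candidates with `ParameterSampler`, averages the \( K \) fold scores with `cross_val_score`, and takes the arg max. The values `N = 10` and `K = 5`, the Iris data, and the exponential distributions are assumptions chosen for illustration.

```python
import numpy as np
from scipy.stats import expon
from sklearn.datasets import load_iris
from sklearn.model_selection import ParameterSampler, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# One distribution D_j per hyperparameter p_j (here k = 2)
distributions = {"C": expon(scale=100), "gamma": expon(scale=0.1)}
N, K = 10, 5  # number of sampled combinations, number of CV folds

# Sample the N combinations p^(1), ..., p^(N)
candidates = list(ParameterSampler(distributions, n_iter=N, random_state=0))

# CV Score(p^(i)) = mean of the K per-fold scores (accuracy by default for SVC)
cv_scores = np.array([
    cross_val_score(SVC(kernel="rbf", **params), X, y, cv=K).mean()
    for params in candidates
])

# p* = arg max over the N sampled combinations
best_index = int(np.argmax(cv_scores))
best_params = candidates[best_index]

print("Best sampled combination:", best_params)
print("Mean CV accuracy:", round(float(cv_scores[best_index]), 4))
```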

## How RandomizedSearchCV Works

1. **Define Distributions:** Instead of a grid, you define a distribution (or list) of possible values for each hyperparameter.
2. **Sampling:** RandomizedSearchCV samples a fixed number \( N \) of combinations from these distributions.
3. **Evaluation:** Each sampled set of hyperparameters is evaluated using cross-validation.
4. **Selection:** The hyperparameter set with the best average performance is selected.
5. **Refitting:** By default (`refit=True`), the model is refit on the entire training dataset using the best hyperparameter set; the result is available as `best_estimator_`.
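
The outcome of these steps can be inspected on a fitted search object. The sketch below is a small illustration under assumed settings (8 candidates, 5 folds, Iris data); it reads the per-candidate results from `cv_results_`, locates the selected row via `best_index_`, and retrieves the refit model from `best_estimator_`.

```python
from scipy.stats import expon
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

search = RandomizedSearchCV(
    SVC(kernel="rbf"),
    param_distributions={"C": expon(scale=100), "gamma": expon(scale=0.1)},
    n_iter=8,        # step 2: number of sampled combinations
    cv=5,            # step 3: 5-fold cross-validation per combination
    refit=True,      # step 5: refit the winner on all of X, y (the default)
    random_state=0,
)
search.fit(X, y)

results = search.cv_results_
n_candidates = len(results["params"])     # one entry per sampled combination
mean_scores = results["mean_test_score"]  # average score across the folds
ranks = results["rank_test_score"]        # rank 1 marks the best candidate
best_row = search.best_index_             # step 4: index of the selected row

for params, score, rank in zip(results["params"], mean_scores, ranks):
    print(f"rank {rank}: mean CV score {score:.3f} for {params}")

refit_model = search.best_estimator_      # fitted SVC using best_params_
```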

## Python Code Example

Below is an example that uses RandomizedSearchCV with an SVM classifier on the Iris dataset. In this example, we tune the regularization parameter \( C \) and the kernel coefficient \(\gamma\).

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split, RandomizedSearchCV
from sklearn.svm import SVC
from sklearn.metrics import classification_report
from scipy.stats import expon

# Load the Iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Define the distributions for the hyperparameters
param_distributions = {
    'C': expon(scale=100),  # Exponential distribution for C
    'gamma': expon(scale=0.1),  # Exponential distribution for gamma
    'kernel': ['rbf']  # Using the RBF kernel
}

# Create a RandomizedSearchCV object with 5-fold cross-validation
random_search = RandomizedSearchCV(
    SVC(),
    param_distributions=param_distributions,
    n_iter=20,  # Number of parameter settings sampled
    cv=5,
    scoring='accuracy',
    verbose=1,
    n_jobs=-1,
    random_state=42
)

# Fit RandomizedSearchCV on the training data
random_search.fit(X_train, y_train)

# Print the best hyperparameters and the corresponding CV score
print("Best Parameters:", random_search.best_params_)
print("Best CV Accuracy: {:.2f}%".format(random_search.best_score_ * 100))

# Evaluate the best estimator on the test set
y_pred = random_search.predict(X_test)
print("\nClassification Report:\n", classification_report(y_test, y_pred))
```

### Explanation

- **Data Preparation:** The Iris dataset is loaded and split into training and testing sets.
- **Parameter Distributions:** We define distributions for \( C \) and \(\gamma\) using an exponential distribution (via `scipy.stats.expon`) and fix the kernel to `'rbf'`.
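- **RandomizedSearchCV Object:** We sample 20 hyperparameter combinations (`n_iter=20`) and score each with 5-fold cross-validation, which amounts to 100 model fits plus one final refit of the best combination on the full training set.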
- **Evaluation:** The best hyperparameters and their corresponding CV accuracy are printed, and the final model is evaluated on the test set.
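
The exponential distributions above are one possible choice. A commonly used alternative for scale-type parameters such as \( C \) and \(\gamma\) is a log-uniform distribution, which spreads samples evenly across orders of magnitude. The sketch below swaps in `scipy.stats.loguniform`; the bounds are illustrative assumptions rather than tuned values, and this variant is not part of the example above.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Assumed search ranges: C in [1e-2, 1e3], gamma in [1e-4, 1e1]
param_distributions = {
    "C": loguniform(1e-2, 1e3),
    "gamma": loguniform(1e-4, 1e1),
    "kernel": ["rbf"],
}

search = RandomizedSearchCV(
    SVC(),
    param_distributions=param_distributions,
    n_iter=20,
    cv=5,
    scoring="accuracy",
    random_state=42,
)
search.fit(X_train, y_train)

# score() applies the configured scorer (accuracy) to the refit best estimator
test_accuracy = search.score(X_test, y_test)
print("Best Parameters:", search.best_params_)
print("Test Accuracy: {:.2f}%".format(test_accuracy * 100))
```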

## Conclusion

RandomizedSearchCV offers a flexible and efficient alternative to GridSearchCV when tuning hyperparameters over a large search space. By evaluating only a fixed number of randomly sampled combinations, it caps computation time and often finds settings that perform comparably to an exhaustive search, although it is not guaranteed to find the best combination in the space.
