# Day Two: Hyperparameter Tuning — Grid Search vs Random Search

## Introduction

When training machine learning models, you often need to tune hyperparameters: configuration settings that control the learning process. Unlike model parameters, hyperparameters are not learned from the data and must be set before training.
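
To make the distinction concrete, here is a minimal sketch, assuming scikit-learn is installed. The decision tree and the variable names are illustrative choices, not part of the original notebook. `max_depth` is a hyperparameter we pick before training, while the tree's actual structure is learned from the data during `fit()`.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Hyperparameter: chosen by us before training starts.
shallow_tree = DecisionTreeClassifier(max_depth=2, random_state=0)
shallow_tree.fit(X, y)

# Learned from the data during fit(): the actual split structure.
learned_depth = shallow_tree.get_depth()
learned_node_count = shallow_tree.tree_.node_count

print(f"max_depth we set: {shallow_tree.get_params()['max_depth']}")
print(f"Depth the tree actually grew to: {learned_depth}")
print(f"Number of nodes learned: {learned_node_count}")
```

The hyperparameter only caps what the model may learn; the splits themselves come from the training data.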

Two commonly used strategies for hyperparameter tuning are **Grid Search** and **Random Search**. Both aim to find the combination of hyperparameters that gives the best model performance on validation data.

---

## What is Grid Search?

Grid Search is a brute-force approach to hyperparameter tuning. It systematically tests every possible combination of hyperparameter values that the user defines in a grid.

### How Grid Search Works:

1. Define a set of values for each hyperparameter.
2. Evaluate the model's performance using every possible combination from the grid.
3. Choose the combination that performs best on the validation set.

This method ensures that all specified combinations are tested, making it thorough but potentially time-consuming.
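
The three steps above can be sketched by hand, without any search helper. This is a minimal illustration, assuming scikit-learn is installed, a single train/validation split, and a decision tree as the model; all of these are assumptions made for the example.

```python
from itertools import product

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Step 1: a set of candidate values for each hyperparameter.
grid = {"max_depth": [3, 5, 10], "min_samples_split": [2, 4]}

# Step 2: train and score a model for every combination in the grid.
names = list(grid)
results = []
for values in product(*(grid[name] for name in names)):
    params = dict(zip(names, values))
    model = DecisionTreeClassifier(random_state=0, **params)
    model.fit(X_train, y_train)
    results.append((model.score(X_val, y_val), params))

# Step 3: keep the combination with the best validation accuracy.
best_score, best_params = max(results, key=lambda item: item[0])
print(f"Evaluated {len(results)} combinations")
print(f"Best validation accuracy: {best_score:.3f} with {best_params}")
```

In practice `GridSearchCV` wraps this loop and replaces the single validation split with cross-validation.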

### Example:

Suppose you want to tune two hyperparameters:

- `max_depth`: [3, 5, 10]
- `min_samples_split`: [2, 4]

Grid search will train and evaluate one model for each of the 3 × 2 = 6 combinations of these hyperparameters.
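
As a quick check of that count, scikit-learn's `ParameterGrid` can enumerate the combinations. The 5-fold cross-validation figure below is an assumption added to show how the number of model fits grows once cross-validation is involved.

```python
from sklearn.model_selection import ParameterGrid

grid = ParameterGrid({"max_depth": [3, 5, 10], "min_samples_split": [2, 4]})

# List every combination the grid contains.
for params in grid:
    print(params)

n_combinations = len(grid)
cv_folds = 5  # assumed number of cross-validation folds
total_fits = n_combinations * cv_folds

print(f"Combinations: {n_combinations}")
print(f"Model fits with {cv_folds}-fold CV: {total_fits}")
```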

---

## What is Random Search?

Random Search is often a more efficient alternative. Instead of testing every possible combination, it evaluates a fixed number of combinations sampled at random from the defined hyperparameter ranges or distributions.

### How Random Search Works:

1. Define a range or distribution for each hyperparameter.
2. Randomly sample a fixed number of combinations.
3. Train and evaluate the model for each sampled set of values.

This approach allows for a broader search of the parameter space without the computational cost of evaluating every possibility.
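
A minimal sketch of these steps with scikit-learn's `RandomizedSearchCV`, assuming scikit-learn and SciPy are installed. The ranges, `n_iter=8`, and `cv=3` are illustrative values chosen to keep the example fast, not recommendations.

```python
from scipy.stats import randint
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = load_iris(return_X_y=True)

# Step 1: a distribution (rather than a fixed list) for each hyperparameter.
# randint(low, high) samples integers from low up to high - 1.
param_distributions = {
    "n_estimators": randint(10, 101),
    "max_depth": randint(2, 11),
    "min_samples_split": randint(2, 11),
}

# Step 2: n_iter fixes how many combinations are sampled.
random_search = RandomizedSearchCV(
    estimator=RandomForestClassifier(random_state=7),
    param_distributions=param_distributions,
    n_iter=8,
    cv=3,
    scoring="accuracy",
    random_state=7,
)

# Step 3: train and evaluate the model for each sampled combination.
random_search.fit(X, y)

n_sampled = len(random_search.cv_results_["params"])
print(f"Sampled combinations: {n_sampled}")
print(f"Best parameters: {random_search.best_params_}")
print(f"Best mean CV accuracy: {random_search.best_score_:.3f}")
```

The cost is set by `n_iter`, not by the size of the ranges, so widening a range does not make the search slower.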

---
## Grid Search and Random Search Visualization

![Grid search and random search visualization](Gs&Rs.jpg)

## Grid Search vs Random Search

| Feature               | Grid Search                                     | Random Search                                  |
|-----------------------|--------------------------------------------------|------------------------------------------------|
| Search strategy        | Exhaustive — evaluates all combinations         | Random sampling of combinations                |
| Computational cost     | High, especially with large parameter spaces    | Lower, scales better with more parameters      |
| Coverage               | Limited to specified grid values                | Can explore unexpected but effective values    |
| Best suited for        | Small search spaces with fewer parameters       | Large search spaces or when time is limited    |
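
The cost and coverage rows can be illustrated with a little arithmetic. In this sketch, assuming NumPy is installed, the figures (5 values per parameter, a budget of 20 random trials, 9 points in a two-parameter space) are made up for illustration.

```python
import numpy as np

# Cost: a grid grows multiplicatively with the number of parameters,
# while a random search budget stays whatever we set it to.
values_per_param = 5
n_iter = 20  # assumed random-search budget
grid_sizes = [values_per_param ** n_params for n_params in range(1, 6)]
for n_params, grid_size in enumerate(grid_sizes, start=1):
    print(f"{n_params} parameters: grid = {grid_size} combinations, random = {n_iter}")

# Coverage: with 9 trials over two parameters, a 3 x 3 grid tries only
# 3 distinct values of each parameter; 9 random points try 9.
rng = np.random.default_rng(0)
axis_values = np.linspace(0.0, 1.0, 3)
grid_points = [(a, b) for a in axis_values for b in axis_values]
random_points = rng.uniform(0.0, 1.0, size=(9, 2))

distinct_grid = len({a for a, _ in grid_points})
distinct_random = len(np.unique(random_points[:, 0]))
print(f"Distinct values of the first parameter: grid = {distinct_grid}, random = {distinct_random}")
```

When only one of the parameters really matters, those extra distinct values are what lets random search land on settings a coarse grid would skip.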

---

## Practical Guidelines

Choosing the right approach depends on your dataset, model complexity, and available computational resources.

### General Tips:

1. **Start broad with random search**:
   Begin by exploring a wide range of values to find promising regions in the hyperparameter space.

2. **Refine with grid search**:
   After identifying the best-performing ranges using random search, apply grid search in a narrower region for fine-tuning.

3. **Understand parameter sensitivity**:
   - Some parameters, such as the learning rate, are highly sensitive and are usually searched on a logarithmic scale (e.g., 0.001, 0.01, 0.1).
   - Others like number of estimators or depth can tolerate larger steps (e.g., 50, 100, 150).
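
The first two tips can be combined into a two-stage search. The following is a rough sketch, assuming scikit-learn and SciPy are installed; the ranges, the budget of 6 random trials, and the step sizes used to build the narrow grid are all illustrative assumptions.

```python
from scipy.stats import randint
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = load_iris(return_X_y=True)

# Stage 1: broad random search to find a promising region.
coarse = RandomizedSearchCV(
    RandomForestClassifier(random_state=7),
    param_distributions={
        "n_estimators": randint(10, 101),
        "max_depth": randint(2, 21),
    },
    n_iter=6,
    cv=3,
    random_state=7,
)
coarse.fit(X, y)
best_depth = int(coarse.best_params_["max_depth"])
best_trees = int(coarse.best_params_["n_estimators"])

# Stage 2: small grid centred on the stage-1 winner.
fine_grid = {
    "max_depth": sorted({max(1, best_depth - 1), best_depth, best_depth + 1}),
    "n_estimators": sorted({max(10, best_trees - 20), best_trees, best_trees + 20}),
}
fine = GridSearchCV(RandomForestClassifier(random_state=7), fine_grid, cv=3)
fine.fit(X, y)

print(f"Stage 1 best: {coarse.best_params_} (CV accuracy {coarse.best_score_:.3f})")
print(f"Stage 2 grid: {fine_grid}")
print(f"Stage 2 best: {fine.best_params_} (CV accuracy {fine.best_score_:.3f})")
```

Because the narrow grid contains the stage-1 winner and both stages use the same folds and seed, the second stage cannot score worse than the first in this setup.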

---

## Summary

Grid search and random search are both effective methods for hyperparameter tuning. 

- Use **grid search** when the search space is small and you want a thorough evaluation.
- Use **random search** when the search space is large or when computational time is limited.
- Combining both often works well: start wide with random search, then fine-tune with grid search.

#### Further reading
[SabrePC blog post on grid search and random search](https://www.sabrepc.com/blog/deep-learning-and-ai/grid-search-and-random-search)

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Load the dataset
data = load_iris()

# Features and target
x, y = data.data, data.target

# Split into training and test sets
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=7)

print(f"Features Name: {data.feature_names}")
print(f"Target Name: {data.target_names}")

# Define the hyperparameter grid (3 x 3 x 3 = 27 combinations)
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [None, 5, 10],
    'min_samples_split': [2, 5, 10],
}

# Initialize grid search with 5-fold cross-validation
grid_search = GridSearchCV(
    estimator=RandomForestClassifier(random_state=7),
    param_grid=param_grid,
    cv=5,
    scoring='accuracy',
    n_jobs=-1,
)

# Run the grid search on the training data
grid_search.fit(x_train, y_train)

# Evaluate the best model on the held-out test set
best_grid_model = grid_search.best_estimator_
y_pred = best_grid_model.predict(x_test)
accuracy = accuracy_score(y_test, y_pred)

print(f"Best Parameters (Grid Search): {grid_search.best_params_}")
print(f"GridSearch Accuracy Score: {accuracy}")
```

```text
Features Name: ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
Target Name: ['setosa' 'versicolor' 'virginica']
Best Parameters (Grid Search): {'max_depth': None, 'min_samples_split': 10, 'n_estimators': 50}
GridSearch Accuracy Score: 0.8666666666666667
```
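
The test split above holds only 30 samples, so a single test accuracy is a noisy estimate. It can help to look at the cross-validated scores the search itself recorded. This is a minimal sketch, assuming scikit-learn is installed; it uses a smaller grid than the one above to keep the run short, and the name `search` is illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=7
)

search = GridSearchCV(
    RandomForestClassifier(random_state=7),
    param_grid={"n_estimators": [50, 100], "max_depth": [None, 5]},
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)

# Mean cross-validated accuracy of the winning combination.
cv_accuracy = search.best_score_
# Accuracy of the refitted best model on the held-out test set.
test_accuracy = search.score(X_test, y_test)

# Rank every combination by its mean cross-validated score.
ranked = sorted(
    zip(
        search.cv_results_["rank_test_score"],
        search.cv_results_["mean_test_score"],
        search.cv_results_["params"],
    ),
    key=lambda row: row[0],
)
for rank, mean_score, params in ranked:
    print(f"rank {rank}: mean CV accuracy {mean_score:.3f} with {params}")

print(f"Best mean CV accuracy: {cv_accuracy:.3f}")
print(f"Test accuracy: {test_accuracy:.3f}")
```

Comparing the two numbers shows whether the test result is in line with what cross-validation predicted or is an outlier caused by the small test set.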
