# 🔹 1. What is GridSearchCV?

- ```GridSearchCV``` performs an ```exhaustive search``` over all possible combinations of hyperparameters in a given grid.

- It evaluates each combination using ```cross-validation``` and reports the one with the best mean validation score (available as ```best_params_```).
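The two points above can be sketched as a manual loop. This is only an illustration of the idea, not how scikit-learn implements it internally; the synthetic dataset, the decision tree, and the small grid are assumptions chosen to keep the example fast.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import ParameterGrid, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, used only for illustration (an assumption, not part of this tutorial's dataset).
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Hypothetical grid: 2 x 2 = 4 combinations.
param_grid = {"max_depth": [2, 4], "min_samples_leaf": [1, 5]}

results = []
for params in ParameterGrid(param_grid):
    model = DecisionTreeClassifier(random_state=0, **params)
    # Mean accuracy over 5 cross-validation folds for this combination.
    mean_score = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    results.append((mean_score, params))

# Keep the combination with the highest mean cross-validation score.
best_score, best_params = max(results, key=lambda item: item[0])
print(len(results), "combinations evaluated")
print("Best params:", best_params, "mean CV accuracy:", round(best_score, 3))
```

By default, ```GridSearchCV``` also refits the best combination on the full training data (```refit=True```), which this loop leaves out.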

👉 Example: If you have

- ```max_depth = [3, 5, 7]```

- ```n_estimators = [50, 100]```

GridSearch will test all ```3 × 2 = 6``` combinations.
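To see those combinations explicitly, scikit-learn's ```ParameterGrid``` helper can enumerate them. A minimal sketch using the two lists from the example:

```python
from sklearn.model_selection import ParameterGrid

param_grid = {"max_depth": [3, 5, 7], "n_estimators": [50, 100]}

# ParameterGrid yields one dict per combination in the grid.
combos = list(ParameterGrid(param_grid))
print(len(combos))  # 3 values x 2 values = 6
for params in combos:
    print(params)
```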


✅ Pros:

- Guaranteed to find the combination with the best cross-validation score among those in the grid (values outside the grid are never tried).

- Simple to understand.

❌ Cons:

- Computationally expensive (especially with large grids).

- Scales poorly to high-dimensional hyperparameter spaces: the number of combinations is the product of the number of values per hyperparameter.
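The cost can be estimated before running anything. The helper below, ```count_fits```, is a hypothetical name introduced for this sketch; it counts model fits as combinations times folds and ignores the final refit. The extra hyperparameters in ```larger``` are illustrative choices.

```python
from math import prod

def count_fits(param_grid, cv):
    """Model fits during a grid search: combinations x folds (final refit not counted)."""
    return prod(len(values) for values in param_grid.values()) * cv

small = {"max_depth": [3, 5, 7], "n_estimators": [50, 100]}
larger = {
    **small,
    "min_samples_split": [2, 5, 10],
    "max_features": ["sqrt", "log2"],
    "criterion": ["gini", "entropy"],
}

print(count_fits(small, cv=5))   # 6 combinations x 5 folds = 30
print(count_fits(larger, cv=5))  # 72 combinations x 5 folds = 360
```

Adding three short lists multiplied the work by 12, which is why large grids become expensive quickly.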


# 🔹 2. What is RandomizedSearchCV?

- **RandomizedSearchCV** randomly samples hyperparameter combinations from **given distributions** (or from lists of values).

- You specify the number of combinations to try (```n_iter```) instead of testing all of them.

👉 Example:

With ```n_iter=3```, it tests only 3 randomly chosen combinations instead of all 6.
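A rough way to preview such a draw is scikit-learn's ```ParameterSampler```. The sketch below reuses the two lists from the grid example; the seed value is an arbitrary choice.

```python
from sklearn.model_selection import ParameterSampler

param_grid = {"max_depth": [3, 5, 7], "n_estimators": [50, 100]}

# Draw 3 of the 6 possible combinations at random.
sampled = list(ParameterSampler(param_grid, n_iter=3, random_state=0))
for params in sampled:
    print(params)
```

When every hyperparameter is given as a list, as here, the sampling is done without replacement, so the 3 combinations are distinct.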


✅ Pros:

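- Usually much faster than GridSearchCV, because the number of candidates is fixed by ```n_iter``` rather than by the size of the grid.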

- Works better with large search spaces.

- Can still find near-optimal hyperparameters.

❌ Cons:

- Doesn’t guarantee the absolute best hyperparameters.

- Results can vary depending on the random seed (set ```random_state``` for reproducible runs).
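The effect of the seed can be checked without training any model. In this sketch the distributions and the helper name ```draw``` are illustrative assumptions:

```python
from scipy.stats import randint
from sklearn.model_selection import ParameterSampler

# randint(low, high) draws integers from low up to high - 1.
param_dist = {"n_estimators": randint(50, 300), "max_depth": randint(3, 10)}

def draw(seed):
    """Return the 3 candidate combinations sampled for a given seed."""
    return list(ParameterSampler(param_dist, n_iter=3, random_state=seed))

same_a = draw(42)
same_b = draw(42)
other = draw(7)

print(same_a == same_b)  # the same seed gives the same candidates
print(same_a)
print(other)             # a different seed will usually give different candidates
```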


# 🔹 3. Key Differences

| Aspect        | GridSearchCV                      | RandomizedSearchCV                  |
| ------------- | --------------------------------- | ----------------------------------- |
| Search method | Exhaustive (all combos)           | Random sampling (```n_iter``` combos) |
| Speed         | Slow (cost grows with grid size)  | Faster (cost fixed by ```n_iter```)   |
| Best result   | Best CV score within the grid     | Best CV score among sampled combos  |
| Use case      | Small search space                | Large search space                  |



# 🔹 4. Worked Example

Let’s tune a **Random Forest Classifier** on the ```Titanic dataset``` with both methods.



```python
import pandas as pd
from scipy.stats import randint
from sklearn.datasets import fetch_openml
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split, GridSearchCV, RandomizedSearchCV

# Load Titanic dataset (downloaded from OpenML on first use)
titanic = fetch_openml("titanic", version=1, as_frame=True)
df = titanic.frame

# Preprocess: keep four features plus the target, drop rows with missing values, encode sex
df = df[['pclass', 'sex', 'age', 'fare', 'survived']].dropna()
df['sex'] = df['sex'].map({'male': 0, 'female': 1})

X = df.drop('survived', axis=1)
y = df['survived']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define model
rf = RandomForestClassifier(random_state=42)

# --- GridSearchCV: 3 x 3 = 9 combinations, each scored with 5-fold CV ---
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [3, 5, 7]
}
grid_search = GridSearchCV(rf, param_grid, cv=5, scoring='accuracy')
grid_search.fit(X_train, y_train)

print("Best params (GridSearch):", grid_search.best_params_)
print("Best score (GridSearch):", grid_search.best_score_)

# --- RandomizedSearchCV: 5 sampled combinations, each scored with 5-fold CV ---
# randint(low, high) samples integers from low up to high - 1
param_dist = {
    'n_estimators': randint(50, 300),
    'max_depth': randint(3, 10)
}
random_search = RandomizedSearchCV(rf, param_distributions=param_dist,
                                   n_iter=5, cv=5, scoring='accuracy', random_state=42)
random_search.fit(X_train, y_train)

print("Best params (RandomizedSearch):", random_search.best_params_)
print("Best score (RandomizedSearch):", random_search.best_score_)

# Test set performance of the model selected by RandomizedSearchCV
# (best_estimator_ has already been refit on the full training set)
best_model = random_search.best_estimator_
y_pred = best_model.predict(X_test)
print("Test Accuracy:", accuracy_score(y_test, y_pred))
```

Output:

```
Best params (GridSearch): {'max_depth': 3, 'n_estimators': 100}
Best score (GridSearch): 0.8241873396065014
Best params (RandomizedSearch): {'max_depth': 5, 'n_estimators': 121}
Best score (RandomizedSearch): 0.8206087824351298
Test Accuracy: 0.7655502392344498
```


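# 🔹 5. Explanation of the Example

- **GridSearchCV** tries **all 9 combinations (3×3)**; with ```cv=5``` that is 45 model fits. It is guaranteed to find the best CV score within the grid, but the cost grows with every value added to it.

- **RandomizedSearchCV** tries only **5 sampled combinations (n_iter=5)**, i.e. 25 fits, drawn from a much larger space (```n_estimators``` from 50 to 299, ```max_depth``` from 3 to 9). In the output above, its best CV score (0.8206) is close to the grid search's (0.8242). The reported test accuracy (0.766) belongs to the model selected by RandomizedSearchCV.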

- In practice, people often start with **RandomizedSearchCV** to narrow down ranges, then use **GridSearchCV** for fine-tuning.
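The coarse-then-fine workflow from the last point could look roughly like this. It is a sketch under stated assumptions: the data is synthetic, the ranges are kept small so it runs quickly, and the step sizes around the coarse optimum (±1 for depth, ±10 for trees) are arbitrary choices.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

# Synthetic data, for illustration only.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
rf = RandomForestClassifier(random_state=0)

# Stage 1: coarse random search over wide ranges.
coarse = RandomizedSearchCV(
    rf,
    param_distributions={"n_estimators": randint(10, 60), "max_depth": randint(2, 10)},
    n_iter=4, cv=3, scoring="accuracy", random_state=0,
)
coarse.fit(X, y)
best_depth = int(coarse.best_params_["max_depth"])
best_trees = int(coarse.best_params_["n_estimators"])

# Stage 2: small exhaustive grid centred on the best coarse values.
fine_grid = {
    "max_depth": sorted({max(1, best_depth - 1), best_depth, best_depth + 1}),
    "n_estimators": sorted({max(1, best_trees - 10), best_trees, best_trees + 10}),
}
fine = GridSearchCV(rf, fine_grid, cv=3, scoring="accuracy")
fine.fit(X, y)

print("Coarse best:", coarse.best_params_, round(coarse.best_score_, 3))
print("Fine best:  ", fine.best_params_, round(fine.best_score_, 3))
```

Because the fine grid contains the coarse optimum and both searches use the same folds and seed, the second stage should not score worse than the first; whether it improves depends on the data.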

👉 So in real-world use cases:

- **Small models, small parameter space** → Use **GridSearchCV**.

- **Large parameter spaces, or models that are expensive to train (large Random Forests, XGBoost, Neural Nets)** → Use **RandomizedSearchCV** for efficiency.