# **Hyperparameter Tuning in Machine Learning:**

**Hyperparameter**:

A hyperparameter is a decision you make before the model learns.

It controls how the model learns, not what it learns.

Think of it like this:

In cooking:

Recipe (hyperparameters): cooking time, oven temperature, spice levels. You choose these before you start.

Outcome (parameters): flavor balance and crust texture, which emerge from the ingredients and the process.

**What are Hyperparameters?**

Hyperparameters are settings that control the learning process of a model.
Unlike parameters (which are learned from the data), hyperparameters are set before training.

**Difference Between Parameters & Hyperparameters:**

**Parameters**: Learned by the model (e.g., weights in linear regression).

**Hyperparameters**: Set before training, either by hand or by a search procedure (e.g., learning rate, number of trees in a Random Forest).
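To make the distinction concrete, here is a minimal sketch using scikit-learn's `LinearRegression` on synthetic data. The data and the variable names (`X_demo`, `y_demo`) are illustrative assumptions, not part of the original notebook. `fit_intercept` is a hyperparameter chosen before training, while `coef_` and `intercept_` are parameters learned from the data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic, noise-free data following y = 3x + 2 (illustrative assumption)
X_demo = np.arange(10, dtype=float).reshape(-1, 1)
y_demo = 3.0 * X_demo.ravel() + 2.0

# Hyperparameter: chosen by us before training
model = LinearRegression(fit_intercept=True)
hyperparams = model.get_params()

# Parameters: learned from the data during fit()
model.fit(X_demo, y_demo)
learned_weight = model.coef_[0]
learned_bias = model.intercept_

print("Hyperparameters:", hyperparams)
print("Learned weight:", round(learned_weight, 3), "bias:", round(learned_bias, 3))
```

Changing `fit_intercept` changes how the model is allowed to learn; the weight and bias are whatever the data then dictates.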

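**Example: Tuning a Decision Tree**

Imagine you are growing a tree:

If the tree is too short (low depth), it won’t capture enough of the patterns in the data (underfitting).

If the tree is too tall (high depth), it can memorize noise in the training data (overfitting).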
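The effect of tree depth can be checked directly. The sketch below is an illustration (not from the original notebook) that fits scikit-learn's `DecisionTreeClassifier` on Iris with different `max_depth` values and compares training and test accuracy. On a dataset as easy as Iris, the underfitting of a very shallow tree is usually the clearer effect.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X_demo, y_demo = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

depth_scores = {}
for depth in [1, 3, None]:  # None lets the tree grow until its leaves are pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_tr, y_tr)
    # Store (training accuracy, test accuracy) for this depth
    depth_scores[depth] = (tree.score(X_tr, y_tr), tree.score(X_te, y_te))

for depth, (train_acc, test_acc) in depth_scores.items():
    print(f"max_depth={depth}: train={train_acc:.2f}, test={test_acc:.2f}")
```

A depth-1 tree has only two leaves, so it can predict at most two of the three Iris classes, which caps its accuracy no matter how much data it sees.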

# **Grid Search:**

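Tests every combination of the hyperparameter values you list.

Best for small search spaces, because the number of combinations multiplies with every hyperparameter you add.

Slow, but guaranteed to find the best-scoring combination within the grid. It cannot find values you did not list.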

**Grid Search – Real-Life Example:**

Imagine buying a car:

You list exact combinations:

Brand: Toyota, Honda

Color: Red, Black

Fuel: Petrol, Diesel

You check every single combination (Toyota-Red-Petrol, Toyota-Red-Diesel, and so on: 2 × 2 × 2 = 8 in total).
It’s exhaustive but time-consuming.

Perfect if you know exactly what features matter and can patiently evaluate each combo.
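The car example maps directly onto how a grid is enumerated. As a small sketch, scikit-learn's `ParameterGrid` can expand the three lists into every combination; the dictionary name `car_options` is only illustrative.

```python
from sklearn.model_selection import ParameterGrid

car_options = {
    "brand": ["Toyota", "Honda"],
    "color": ["Red", "Black"],
    "fuel": ["Petrol", "Diesel"],
}

# Expand the lists into every possible combination, one dict per combination
combos = list(ParameterGrid(car_options))

print(len(combos), "combinations")
for combo in combos:
    print(combo)
```

Adding a fourth feature with three options would triple the count to 24, which is why grid search becomes expensive quickly.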


# **Random Search:**

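Randomly samples a fixed number of combinations from the defined ranges.

More efficient for large or complex search spaces, because the cost is set by the number of samples, not by the size of the space.

Often finds near-optimal solutions faster, but with no guarantee of finding the best one.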

Think: Creative explorer, sampling wild possibilities for hidden gold.

**Random Search – Real-Life Example:**

Imagine choosing a vacation spot:

Instead of checking all countries, seasons, hotels, and activities,

You spin a globe and shortlist random places, then look for deals.

You might stumble upon an unexpectedly perfect destination.

Ideal when possibilities are huge and you want serendipity with speed.
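To see what "spinning the globe" looks like for hyperparameters, the following sketch uses scikit-learn's `ParameterSampler` to draw 10 random combinations from a search space. The values match the grid used later in this notebook; the name `search_space` is an illustrative assumption.

```python
from sklearn.model_selection import ParameterSampler

search_space = {
    "n_estimators": [10, 50, 100, 200],
    "max_depth": [5, 10, 20, None],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
}

# Draw 10 of the 144 possible combinations; random_state makes the draw repeatable
sampled = list(ParameterSampler(search_space, n_iter=10, random_state=42))

for params in sampled:
    print(params)
```

When every entry is a list, as here, the combinations are drawn without replacement, so no combination is evaluated twice.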



**Challenge**: In innovation, when do you grid (precision) and when do you random (discovery)?


In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split, GridSearchCV, RandomizedSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.datasets import load_iris


# **Load and Explore the Dataset**

In [6]:
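# Load the Iris dataset bundled with scikit-learn.
# To use your own data instead, read it with pandas, for example:
# df = pd.read_csv('your_dataset.csv')
# df.head()

iris = load_iris()  # returns a Bunch object, not a DataFrame
X = iris.data       # feature matrix, shape (150, 4)
y = iris.target     # class labels (0, 1, 2)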


# **Define Hyperparameter Grid**

In [7]:
param_grid = {
    'n_estimators': [10, 50, 100, 200],
    'max_depth': [5, 10, 20, None],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4]
}

# Split into training and testing sets (80-20 split)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


# **Apply Grid Search CV**

In [8]:
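# Grid Search CV: evaluates all 4 x 4 x 3 x 3 = 144 combinations with 5-fold cross-validation.
# Note: RandomForestClassifier() is created without a random_state here, so the
# selected best parameters can vary from run to run.
grid_search = GridSearchCV(RandomForestClassifier(), param_grid, cv=5, n_jobs=-1, verbose=1)
grid_search.fit(X_train, y_train)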

# Get best parameters
print('Best Parameters from Grid Search:', grid_search.best_params_)


Fitting 5 folds for each of 144 candidates, totalling 720 fits
Best Parameters from Grid Search: {'max_depth': 10, 'min_samples_leaf': 1, 'min_samples_split': 5, 'n_estimators': 10}
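The "144 candidates, totalling 720 fits" line in the log follows from the grid sizes. As a quick check, this sketch counts the candidates with `ParameterGrid` and multiplies by the number of folds. The grid is copied from the cell above; the variable names `n_candidates`, `cv_folds`, and `total_fits` are illustrative.

```python
from sklearn.model_selection import ParameterGrid

param_grid = {
    "n_estimators": [10, 50, 100, 200],
    "max_depth": [5, 10, 20, None],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
}

n_candidates = len(ParameterGrid(param_grid))  # 4 * 4 * 3 * 3
cv_folds = 5
total_fits = n_candidates * cv_folds           # one fit per candidate per fold

print(n_candidates, "candidates,", total_fits, "fits")  # 144 candidates, 720 fits
```

With the default `refit=True`, `GridSearchCV` also trains the winning combination once more on the full training set after the search.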


# **Apply Randomized Search CV**

In [9]:

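# Randomized Search CV: samples n_iter=10 of the 144 combinations instead of trying them all.
# random_state=42 fixes which combinations are sampled, not the randomness inside each forest.
random_search = RandomizedSearchCV(RandomForestClassifier(), param_grid, n_iter=10, cv=5, n_jobs=-1, verbose=1, random_state=42)
random_search.fit(X_train, y_train)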

# Get best parameters
print('Best Parameters from Random Search:', random_search.best_params_)


Fitting 5 folds for each of 10 candidates, totalling 50 fits
Best Parameters from Random Search: {'n_estimators': 50, 'min_samples_split': 5, 'min_samples_leaf': 4, 'max_depth': 10}
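The run above samples from the same fixed lists as the grid. Random search can also draw from distributions, which lets it reach values between the grid points. The following is a sketch under stated assumptions: the ranges, `n_iter=5`, `cv=3`, and the names `param_distributions` and `dist_search` are illustrative choices, not settings from the original notebook.

```python
from scipy.stats import randint
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split

X_demo, y_demo = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Distributions instead of fixed lists; randint(low, high) excludes `high`
param_distributions = {
    "n_estimators": randint(10, 101),     # integers 10..100
    "max_depth": randint(2, 21),          # integers 2..20
    "min_samples_split": randint(2, 11),  # integers 2..10
    "min_samples_leaf": randint(1, 5),    # integers 1..4
}

dist_search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),  # fixed seed for repeatable forests
    param_distributions,
    n_iter=5,         # number of sampled combinations
    cv=3,
    random_state=42,  # fixed seed for repeatable sampling
)
dist_search.fit(X_tr, y_tr)

print("Best parameters:", dist_search.best_params_)
print(f"Best cross-validated accuracy: {dist_search.best_score_:.3f}")
```

Fixing `random_state` on both the estimator and the search makes the result repeatable, which the notebook cells above do not do for the estimator.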


# **Evaluate Model Performance**

In [10]:


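# best_estimator_ has already been refit on the full training set with the best
# parameters (refit=True is the default), so no extra training step is needed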
best_grid_model = grid_search.best_estimator_
best_random_model = random_search.best_estimator_

# Predictions
y_pred_grid = best_grid_model.predict(X_test)
y_pred_random = best_random_model.predict(X_test)

# Accuracy Scores
acc_grid = accuracy_score(y_test, y_pred_grid)
acc_random = accuracy_score(y_test, y_pred_random)

print(f'Grid Search Accuracy: {acc_grid:.2f}')
print(f'Random Search Accuracy: {acc_random:.2f}')


Grid Search Accuracy: 1.00
Random Search Accuracy: 1.00
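Both models score 1.00 on a test set of only 30 samples, so this single number cannot separate them. As a hedged sketch of a fuller evaluation, the block below reports the cross-validated score from the search together with a confusion matrix and a classification report. It is self-contained and uses a deliberately small grid (`small_grid`, an illustrative assumption) so that it runs quickly; it does not reproduce the exact models above.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split

iris_demo = load_iris()
X_tr, X_te, y_tr, y_te = train_test_split(
    iris_demo.data, iris_demo.target, test_size=0.2, random_state=42
)

# Deliberately small grid so the sketch runs quickly (illustrative assumption)
small_grid = {"n_estimators": [10, 50], "max_depth": [5, None]}
search = GridSearchCV(RandomForestClassifier(random_state=42), small_grid, cv=5)
search.fit(X_tr, y_tr)

# Mean accuracy of the best candidate on the held-out folds of the training set
print(f"Best CV accuracy: {search.best_score_:.3f}")

# Per-class view of the test-set predictions
y_pred = search.best_estimator_.predict(X_te)
cm = confusion_matrix(y_te, y_pred)  # rows: true class, columns: predicted class
print(cm)
print(classification_report(y_te, y_pred, target_names=iris_demo.target_names))
```

In the notebook itself, `grid_search.best_score_` and `random_search.best_score_` give the same kind of cross-validated comparison for the two searches.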
