
**Hyperparameter Tuning**

In [2]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV, RandomizedSearchCV
from sklearn.neighbors import KNeighborsClassifier

# 1. Load Data
iris = load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 2. Define the Model
knn = KNeighborsClassifier()

STRATEGY 1: GRID SEARCH (Brute Force)

In [7]:
param_grid = {
    'n_neighbors': [1, 3, 5, 7, 9, 11, 13, 15], # 8 values
}

grid_search = GridSearchCV(estimator=knn, param_grid=param_grid, cv=5, scoring='accuracy')
grid_search.fit(X_train, y_train)

print("--- Grid Search ---")
print(f"Best Parameters: {grid_search.best_params_}")
print(f"Best Score: {grid_search.best_score_:.4f}")

--- Grid Search ---
Best Parameters: {'n_neighbors': 3}
Best Score: 0.9583


1. Grid Search (The Brute-Force Method, Slow):

-  Concept: You give the computer a specific list of values for each hyperparameter. The computer builds a model for every single possible combination of those values.

-  Analogy: You are trying to crack a 3-digit combination lock (each digit 0-9).

    You try 0-0-0, then 0-0-1, then 0-0-2, and so on, all the way to 9-9-9. That is all 1,000 combinations.

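-  The Math (Computational Cost): If you test 3 values for Parameter A and 4 values for Parameter B, you evaluate $3 \times 4 = 12$ combinations. With cross-validation, each combination is trained once per fold, so 5-fold CV means $12 \times 5 = 60$ model fits. In the grid search cell above, 8 values of `n_neighbors` with `cv=5` give $8 \times 5 = 40$ fits.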

Formula: $Cost = \prod (\text{number of values per parameter}) \times \text{CV Folds}$
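The cost formula can be checked without training anything. The sketch below uses scikit-learn's `ParameterGrid` to enumerate a grid and count the fits. The grid itself (`example_grid`, with `p` standing in for "Parameter B") is a hypothetical example chosen to match the $3 \times 4$ case, not a grid used elsewhere in this notebook.

```python
from sklearn.model_selection import ParameterGrid

# Hypothetical grid: 3 values for one parameter, 4 for another
example_grid = {
    "n_neighbors": [3, 5, 7],   # "Parameter A": 3 values
    "p": [1, 2, 3, 4],          # "Parameter B": 4 values
}
cv_folds = 5  # assumed number of cross-validation folds

# ParameterGrid enumerates every combination, as grid search does
n_candidates = len(ParameterGrid(example_grid))
total_fits = n_candidates * cv_folds

print(f"Candidate combinations: {n_candidates}")
print(f"Model fits with {cv_folds}-fold CV: {total_fits}")
```

With the default `refit=True`, `GridSearchCV` performs one additional fit of the best combination on the full training set after the search.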

STRATEGY 2: RANDOM SEARCH (Statistical)

In [5]:

# We define a range or list, but we limit the number of tries
from scipy.stats import randint

param_dist = {
    'n_neighbors': randint(1, 20),      # Any integer from 1 to 19 (the upper bound 20 is exclusive)
}
random_search = RandomizedSearchCV(estimator=knn, param_distributions=param_dist,
                                   n_iter=10, cv=5, scoring='accuracy', random_state=42)
random_search.fit(X_train, y_train)

print("\n--- Random Search ---")
print(f"Best Parameters: {random_search.best_params_}")
print(f"Best Score: {random_search.best_score_:.4f}")



--- Random Search ---
Best Parameters: {'n_neighbors': 11}
Best Score: 0.9500
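The best scores printed by both searches are cross-validation averages computed on the training split only. The held-out `X_test` and `y_test` from the first cell have not been used yet. A minimal sketch of a final check on that test split follows. It repeats the setup from the cells above so it runs on its own, and the names `grid_test_acc` and `random_test_acc` are introduced here for illustration.

```python
from scipy.stats import randint
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Same data split as in the first cell
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Same two searches as in the cells above
grid_search = GridSearchCV(
    KNeighborsClassifier(),
    {"n_neighbors": [1, 3, 5, 7, 9, 11, 13, 15]},
    cv=5, scoring="accuracy",
).fit(X_train, y_train)

random_search = RandomizedSearchCV(
    KNeighborsClassifier(),
    {"n_neighbors": randint(1, 20)},
    n_iter=10, cv=5, scoring="accuracy", random_state=42,
).fit(X_train, y_train)

# .score() uses the best estimator, refit on the whole training split
grid_test_acc = grid_search.score(X_test, y_test)
random_test_acc = random_search.score(X_test, y_test)

print(f"Grid Search test accuracy:   {grid_test_acc:.4f}")
print(f"Random Search test accuracy: {random_test_acc:.4f}")
```

The iris test split here has only 30 samples, so small differences between the two test accuracies should not be over-interpreted.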


2. Random Search (The Statistical Method, Fast):

-   Concept: Instead of trying every combination, you define a range (distribution) for the values. The computer picks random combinations and tests them. You tell it exactly how many tries (iterations) it gets.

-   Analogy: Cracking that same lock, but you only have 1 minute.

    You spin the dials randomly and try: 5-2-8... 1-9-3... 7-4-0.

    You stop after 50 tries and pick the best one.

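-   The Math (Computational Cost): $Cost = n\_iter \times \text{CV Folds}$, no matter how many hyperparameters or values the search space contains. In the random search cell above, `n_iter=10` and `cv=5` give $10 \times 5 = 50$ model fits. With a single hyperparameter this is actually more than the 40 fits of the 8-value grid. The savings appear once several hyperparameters multiply the size of the grid.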
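To see what random search actually tries, the sketch below draws candidates with scikit-learn's `ParameterSampler`, which is one way to reproduce the sampling step without fitting any models. It assumes the same distribution and `n_iter` as the random search cell. The variable names are illustrative.

```python
from scipy.stats import randint
from sklearn.model_selection import ParameterSampler

param_dist = {"n_neighbors": randint(1, 20)}  # integers 1..19 (20 is exclusive)
n_iter = 10   # number of random candidates to draw
cv_folds = 5  # assumed number of cross-validation folds

# Draw n_iter candidates; values may repeat because sampling
# from a distribution is done with replacement
sampled = list(ParameterSampler(param_dist, n_iter=n_iter, random_state=42))
total_fits = len(sampled) * cv_folds

for candidate in sampled:
    print(candidate)
print(f"Model fits with {cv_folds}-fold CV: {total_fits}")
```

Because values are drawn independently, the same `n_neighbors` can be sampled more than once, so the number of distinct candidates may be smaller than `n_iter`.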