<a href="https://colab.research.google.com/github/Ramandeep-Singh17/Machine-Learning/blob/main/SL_Hyperparameter_Tuning_Grid_Random.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## 🔧 Model Tuning (Hyperparameter Tuning)

When we build a machine learning model, one thing is almost always true:

> ⚠️ **The first version of your model is rarely the best version.**

That’s where **Model Tuning** comes in.

---

### ⚙️ What are Hyperparameters?

These are settings that control how your model behaves. Some examples:

- KNN → `k` (number of neighbors)
- Decision Tree → `max_depth`, `min_samples_split`
- Gradient Boosting → `learning_rate`, `n_estimators`

These must be set **before** training starts; unlike the model's learned parameters (such as weights), they are not learned from the data.
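
As a minimal sketch of what "set before training" looks like in practice, the hyperparameters listed above can be passed to scikit-learn constructors. The specific values here are illustrative assumptions, not recommendations from the notes.

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Hyperparameters are passed to the constructor, before any call to fit().
# The values below are arbitrary examples.
knn = KNeighborsClassifier(n_neighbors=5)  # `k` in KNN
tree = DecisionTreeClassifier(max_depth=3, min_samples_split=4)
gb = GradientBoostingClassifier(learning_rate=0.1, n_estimators=100)

# get_params() returns the hyperparameter values an estimator currently holds.
print(knn.get_params()["n_neighbors"])
print(tree.get_params()["max_depth"], tree.get_params()["min_samples_split"])
print(gb.get_params()["learning_rate"], gb.get_params()["n_estimators"])
```

Nothing has been trained at this point; the objects only carry their settings.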

---

### ❗ If chosen randomly:

- Model may **underfit** (learns too little)
- Or **overfit** (memorizes training data)
- Or just perform **poorly** overall

---

### 🎯 Goal of Model Tuning:

1. Find the **best combination of hyperparameters**
2. So the model performs better on **new, unseen data**

---

### 💡 In Short:

> **Model Tuning = Squeezing out the best possible performance from your model 🔥**  
> This is also known as **Hyperparameter Tuning**.

---

### 🔁 Next Step:

Before tuning begins, it’s important to understand:

> ✅ **Cross Validation** — what it is and how to use it


In [None]:
# An enhanced version of the notes

## 🔧 Model Tuning (Hyperparameter Tuning)

---

### ✅ What:
Model tuning means optimizing a model's **hyperparameters** to get the **best performance**.

---

### ❓ Why:
- Default or randomly chosen hyperparameters can give weak results.
- Tuning helps keep both **underfitting** and **overfitting** in check.
- It strengthens the final model's performance.

---

### 🕒 When:
- When a model is already built and you want to improve it further.
- When accuracy is low or the model generalizes poorly.

---

### 📍 Where (it is used):
- In nearly every supervised learning model: Logistic Regression, KNN, SVM, Decision Trees, etc.
- Tuning should be done before **model deployment**.

---

### ⚙️ How:
1. Choose the hyperparameters to tune (such as `k` in KNN or `max_depth` in a Decision Tree)
2. Try different values (with GridSearchCV or RandomizedSearchCV)
3. Evaluate each candidate with Cross Validation
4. Select the best value and build the final model

---

### 🌍 Real-life Examples:
- Credit card fraud detection → needs a model that raises few false alarms
- Medical diagnosis → tuning improves prediction accuracy
- Email spam filter → tuning improves precision/recall

---

### 💡 In Short:
> **Model Tuning = Getting the full potential out of your model 🔥**  
> This is also called **Hyperparameter Tuning**.

---

### ⏭️ Next Topic:
Cross Validation has to be understood first; tuning can only be done properly on top of it.


In [None]:
# We try to find the hyperparameter combination that works best on unseen data.

## 🔧 Model Tuning (Hyperparameter Tuning)

---

### ✅ What:
Model tuning means tuning a model's hyperparameters to get its best version.

---

### ❓ Why:
- With random hyperparameters the model can **underfit**, **overfit**, or perform poorly.
- That is why we tune: so that **the model performs well on unseen data.**

---

### 🕒 When:
- Once the base model is ready
- Tuning is always done **before the model is finalized**

---

### 📍 Where:
- In all supervised ML models, for example:
  - Decision Tree → `max_depth`, `min_samples_split`
  - KNN → `k` (number of neighbors)
  - Gradient Boost → `learning_rate`

---

### ⚙️ How:
- When we try **different `k` values** in a model like KNN (say 3, 5, 7, 9), **the accuracy differs** for each value.
- The job of tuning is to:
  1. Try **multiple hyperparameter values**
  2. Test each value using **cross validation**
  3. Keep **the one that gives the best result** in the final model
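
The three steps above can be sketched as a plain loop. This is only an illustration: the Iris dataset and the 5-fold setting are assumptions chosen so the snippet runs on its own; they do not come from the notes.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)  # stand-in dataset (assumption)

mean_scores = {}
for k in [3, 5, 7, 9]:  # step 1: try multiple values
    model = KNeighborsClassifier(n_neighbors=k)
    # step 2: 5-fold cross validation returns one accuracy per fold
    scores = cross_val_score(model, X, y, cv=5)
    mean_scores[k] = scores.mean()
    print(f"k={k}: mean CV accuracy = {mean_scores[k]:.3f}")

# step 3: keep the value with the best mean score and fit the final model
best_k = max(mean_scores, key=mean_scores.get)
final_model = KNeighborsClassifier(n_neighbors=best_k).fit(X, y)
print("best k:", best_k)
```

Writing the loop by hand makes the idea visible; the automated search tools covered later do the same bookkeeping for you.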

---

### 🌍 Real-life Examples:
- Tuning is necessary before deploying the final model
- So that the model **does not fail on real-world data**


In [None]:
#cross validation

## 🔁 Cross Validation

---

### ✅ What:
Cross Validation is a technique in which we divide our data into multiple parts and train & test the model repeatedly, so that the performance estimate is more reliable.

---

### ❓ Why:
- With a single **80% train + 20% test** split, the model is judged on that one 20% alone.
- If that 20% happens to be lucky or biased, the model's score can be misleading.
- That is why we use **Cross Validation**: it helps detect **underfitting/overfitting** and gives a fairer estimate of final performance.
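
To see how much a single 80/20 split can depend on luck, here is a small sketch that repeats the split with different random seeds. The dataset (Iris), the model (KNN with `n_neighbors=5`) and the seeds are all assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)  # stand-in dataset (assumption)

split_scores = []
for seed in range(5):
    # Same data, same model, only the random 80/20 split changes
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    model = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
    split_scores.append(model.score(X_test, y_test))
    print(f"seed={seed}: test accuracy = {split_scores[-1]:.3f}")

print("spread:", max(split_scores) - min(split_scores))
```

Any spread between the printed scores comes only from which rows ended up in the test set, which is the problem cross validation is meant to reduce.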

---

### 🕒 When:
- When there is not much data
- When the model's accuracy keeps fluctuating
- When searching for the best hyperparameters (it is used inside GridSearchCV)

---

### 📍 Where:
- Used in every supervised learning problem, whether classification or regression
- Also used during hyperparameter tuning

---

### ⚙️ How:

#### 📌 The points from your notes are highlighted here:

🔹 **Generally, we split 100% of the data into 80% training + 20% testing.**  
But a single split **does not always give a trustworthy result**: the score depends on which rows land in the test set, so **overfitting / underfitting** can go unnoticed.

🔹 That is why we use **K-Fold Cross Validation**.

🔹 Example: **K = 5**

```text
DATA  ➝  ▓▓▓▓░░░░░░  (100% data)

Split into 5 parts:
Fold 1 | Fold 2 | Fold 3 | Fold 4 | Fold 5

Step 1: Train on 1+2+3+4 → Test on 5
Step 2: Train on 1+2+3+5 → Test on 4
Step 3: Train on 1+2+4+5 → Test on 3
Step 4: Train on 1+3+4+5 → Test on 2
Step 5: Train on 2+3+4+5 → Test on 1
```

---

### 🔁 Key Points in K-Fold Cross Validation

---

🔄 **The 20% used for testing changes at every step**

- With `k = 5`, the data is divided into 5 parts.
- In each round a **different 20% part is used for testing**.
- This way the model gets **tested on every section of the data**.
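
The rotation of the test fold can be made concrete with scikit-learn's `KFold`. This is a sketch on 10 toy samples (an assumption, picked so each fold holds exactly 20% of the data). Note that `KFold` takes the first block as the first test fold, so the order of the steps differs from the diagram above even though the idea is the same.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(10).reshape(-1, 1)  # 10 toy samples with indices 0..9

kf = KFold(n_splits=5)  # no shuffling: folds are consecutive blocks
test_folds = []
for step, (train_idx, test_idx) in enumerate(kf.split(X), start=1):
    test_folds.append(test_idx.tolist())
    print(f"Step {step}: train on {train_idx.tolist()} -> test on {test_idx.tolist()}")
```

Every sample lands in a test fold exactly once, which is what "tested on every section" means.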

---

📉 **The model's accuracy score can differ each time**

- Because every test set is different.
- Sometimes the test data is easy, sometimes hard, so **performance can fluctuate**.

---

📊 **At the end we average all the scores; this average is reported as the final model performance**

- Combining the results of all folds:
  ```python
  final_accuracy = (acc1 + acc2 + acc3 + acc4 + acc5) / 5
  ```
  This average is a more reliable estimate of the model's real performance than the score from any single split.
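
A minimal sketch of where `acc1` … `acc5` could come from, assuming scikit-learn's `cross_val_score` on the Iris dataset with a KNN classifier (both are stand-ins, not taken from the notes):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)  # stand-in dataset (assumption)
model = KNeighborsClassifier(n_neighbors=5)

# cv=5 gives five accuracy scores, one per fold
# (for classifiers, an integer cv uses stratified folds)
fold_scores = cross_val_score(model, X, y, cv=5)
acc1, acc2, acc3, acc4, acc5 = fold_scores

final_accuracy = (acc1 + acc2 + acc3 + acc4 + acc5) / 5
print("per-fold accuracy:", fold_scores.round(3))
print("final accuracy   :", round(final_accuracy, 3))
print("std across folds :", round(fold_scores.std(), 3))
```

Reporting the standard deviation next to the mean shows how much the score fluctuates between folds.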

---

### 🌍 Real-life Examples:
- 🏥 **Medical Diagnosis:** medical datasets are often small, so cross validation gives a more reliable performance estimate.
- 🏦 **Credit Scoring:** multiple test splits check that the model works across different groups of customers.
- 🎯 **Kaggle Competitions:** K-Fold CV is widely used in competitive solutions for stable, trustworthy scores.

---

### 💡 Short Summary:
> **Cross Validation = Fair testing + Reliable accuracy.**  
> K-Fold CV = Testing the model from every angle 🔁

---

## 🎯 Hyperparameter Tuning – Based on Your Notes

---

### ✅ What is Hyperparameter Tuning?

> Hyperparameter tuning means trying out a **model's external settings**  
> to find the **best combination**, the one that gives the highest accuracy.

These parameters are **set before training** and do not change during training.

---

## 🛠️ Examples from Notes (Hyperparameters in Models):

---

### 🔹 Ridge & Lasso Regression

- These are regularized models that **control overfitting**  
- Hyperparameter: `alpha` (often written as `lambda` in textbooks)  
  ➤ It decides how strong a penalty is applied to the model's coefficients
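
A small sketch of the penalty's effect, using `Ridge` on synthetic data (the data and the three `alpha` values are assumptions made for illustration). `Lasso` in scikit-learn takes an `alpha` argument in the same way.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

# Synthetic regression data, only for illustration
X, y = make_regression(n_samples=100, n_features=10, noise=10.0, random_state=0)

coef_norms = []
for alpha in [0.01, 1.0, 100.0]:
    model = Ridge(alpha=alpha).fit(X, y)
    # The size of the coefficient vector shows how hard the penalty pulls
    coef_norms.append(np.linalg.norm(model.coef_))
    print(f"alpha={alpha:>6}: ||coef|| = {coef_norms[-1]:.2f}")
```

A larger `alpha` shrinks the coefficients more, which reduces overfitting but can lead to underfitting if pushed too far.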

---

### 🔹 KNN (K-Nearest Neighbors)

- Hyperparameter: `n_neighbors`  
  ➤ We try values such as `[3, 5, 7, 9]`, and each one gives a different accuracy  
  ➤ **That is exactly why hyperparameter tuning is needed**

---

### 🔹 Decision Tree

- Hyperparameters:
  - `max_depth` → How deep the tree is allowed to grow
  - `min_samples_split` → Minimum number of samples a node needs before it can be split
  - `max_features` → Number of features considered at each split
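
As a rough illustration of how these settings constrain a tree, the sketch below compares a restricted tree with an unrestricted one. The Iris dataset and the chosen values are assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)  # stand-in dataset (assumption)

# A deliberately restricted tree
shallow = DecisionTreeClassifier(
    max_depth=2, min_samples_split=10, max_features=2, random_state=0
).fit(X, y)

# Defaults: the tree grows until its leaves are pure
full = DecisionTreeClassifier(random_state=0).fit(X, y)

print("restricted tree: depth", shallow.get_depth(), "leaves", shallow.get_n_leaves())
print("default tree   : depth", full.get_depth(), "leaves", full.get_n_leaves())
```

The restricted tree is simpler and less likely to memorize the training data, at the cost of possibly missing real patterns.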

---

## 📌 Manual Search (Basic Method)

### ✅ What:
> Changing hyperparameters by hand and checking the accuracy each time

### ❗ Problem:
> For example, on the Titanic dataset:

- RandomForest with `n_estimators = 100` gave an accuracy of 82.7
- Manually setting it to `50` dropped the accuracy to 78.5
- **How long can we keep guessing like this?** 😓

📌 **Manual tuning is slow & not scalable**

---

## 🧪 Automated Methods for Hyperparameter Tuning

---

### 🔍 1. Grid Search CV

#### ✅ What:
> **Systematically tests** every possible combination

#### 📦 Breakdown:
| Word        | Meaning |
|-------------|---------|
| **Grid**    | Table of all possible hyperparameter values |
| **Search**  | Finding the best among all these combinations |
| **CV**      | Cross Validation: testing every combination fairly |

#### ⚙️ How it works:
```text
Example:
n_neighbors = [3, 5, 7]
weights = ['uniform', 'distance']

Grid:
[3, 'uniform']
[3, 'distance']
[5, 'uniform']
[5, 'distance']
[7, 'uniform']
[7, 'distance']
Total = 3 × 2 = 6 combinations
```

Cross validation is applied to every combination → each gets an accuracy score → the best one is chosen.

✅ Guaranteed to find the best combination **within the grid** (by CV score)  
❌ But it is time-consuming (slow if the grid is large)
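
The same 3 × 2 grid can be handed to scikit-learn's `GridSearchCV`. A minimal sketch, assuming the Iris dataset and 5-fold CV (neither is specified in the notes):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)  # stand-in dataset (assumption)

param_grid = {
    "n_neighbors": [3, 5, 7],
    "weights": ["uniform", "distance"],
}

# Every combination in the grid is scored with 5-fold cross validation
grid = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5, scoring="accuracy")
grid.fit(X, y)

print("combinations tried:", len(grid.cv_results_["params"]))
print("best params       :", grid.best_params_)
print("best mean CV score:", round(grid.best_score_, 3))
```

After `fit`, `grid.best_estimator_` holds a model refit on all the data with the winning combination.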

---

### 🎲 2. Randomized Search CV

#### ✅ What:
> Tries random combinations from the grid to save time

#### ⚙️ How:
- We define a grid, but it does not try every combination
- It picks a fixed number of random combinations (set by `n_iter`, say 10 out of 100)
- It is fast, especially when the grid is large

✅ Faster than GridSearchCV  
❌ Might miss the absolute best combination (because it does not try them all)

#### 🌍 Real-life Examples:
- 🎯 Kaggle models → automated search such as GridSearchCV is commonly used
- 🏥 In healthcare accuracy is critical, so tuning is a must
- 📊 Titanic dataset → manual tuning shows why automated search is the better option

---

### 💡 Summary: Hyperparameter Tuning Methods

| Method               | Speed    | Accuracy     | Use Case              |
|----------------------|----------|--------------|------------------------|
| Manual Search        | 🐌 Slow   | ❌ Unreliable | Basic trial & error   |
| GridSearchCV         | ⚖️ Medium | ✅ Best (within the grid) | Small search space     |
| RandomizedSearchCV   | ⚡ Fast   | 🔄 Good       | Large search space     |

---


In [3]:
#randomized search

---

## 🎲 RandomizedSearchCV – Hyperparameter Tuning (Smart + Fast)

---

### ✅ What:
RandomizedSearchCV is a technique in which:
- Instead of **the whole grid**, we try only **a random subset of combinations**.

---

### ❓ Why:
- When the number of parameter combinations becomes **very large** (100s or 1000s)
- GridSearchCV becomes **slow and expensive**
- RandomizedSearchCV saves time and usually still **gives a good result**

---

### 🕒 When to use:
- When the model is slow to train (e.g. XGBoost)
- When the grid is very large
- When you need a quick but fairly good result

---

### 📍 Where used:
- Large models like RandomForest, GradientBoost, XGBoost
- Kaggle competitions for quick tuning
- Real-time ML pipelines where little time is available for tuning

---

### ⚙️ How it works:

- Define a grid, for example:
  ```python
  param_grid = {
      'n_estimators': [50, 100, 200, 300],
      'max_depth': [3, 5, 10, 20],
      'min_samples_split': [2, 5, 10]
  }
  ```
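
One possible way to use this grid, sketched with `RandomForestClassifier` on the Iris dataset. The model choice, the dataset, `n_iter=5` and `cv=3` are all assumptions, kept small so the example runs quickly.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = load_iris(return_X_y=True)  # stand-in dataset (assumption)

param_grid = {
    "n_estimators": [50, 100, 200, 300],
    "max_depth": [3, 5, 10, 20],
    "min_samples_split": [2, 5, 10],
}  # 4 x 4 x 3 = 48 combinations in the full grid

# Only n_iter randomly sampled combinations are evaluated, not all 48
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_grid,
    n_iter=5,
    cv=3,
    random_state=0,
)
search.fit(X, y)

print("combinations tried:", len(search.cv_results_["params"]))
print("best params found :", search.best_params_)
```

Raising `n_iter` explores more of the grid at the cost of more training time.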
---


### 🎲 RandomizedSearchCV – Smart Tuning for Large Search Spaces

---

### 📌 Problem: When there are too many hyperparameter combinations

> With a small grid (say 5–10 combinations), GridSearchCV can handle it.

❗ But suppose that for XGBoost we have:
- `n_estimators` = [100, 200, ..., 1000]
- `max_depth` = [3, 5, 7, 10]
- `learning_rate` = [0.01, 0.05, 0.1, 0.2]

Total combinations = **10 × 4 × 4 = 160**  
With 5-fold CV that is already 160 × 5 = 800 model fits, and if more parameters push the count to 2000–3000 combinations, GridSearch becomes **very slow**.
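
The count can be checked with scikit-learn's `ParameterGrid`, which enumerates every combination without training anything. The 5-fold figure is an assumption used to show how quickly the number of fits grows.

```python
from sklearn.model_selection import ParameterGrid

param_grid = {
    "n_estimators": list(range(100, 1001, 100)),  # 100, 200, ..., 1000
    "max_depth": [3, 5, 7, 10],
    "learning_rate": [0.01, 0.05, 0.1, 0.2],
}

n_combinations = len(ParameterGrid(param_grid))
cv_folds = 5  # assumed number of folds

print("combinations:", n_combinations)                  # 160
print("fits with 5-fold CV:", n_combinations * cv_folds)  # 800
```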

---

### ✅ Solution: RandomizedSearchCV

- It **does not scan the whole grid**
- It randomly picks only **some combinations (say 10 or 20)**
- It saves time and usually still **reaches good accuracy**

---

### 🔹 What is `n_estimators`?

> It decides **how many trees are built** in the model (mostly in ensemble models like RandomForest and XGBoost)

- In most cases `n_estimators` is set to 100 or 200
- More trees = more training time, but better learning (up to a point)

✅ With RandomizedSearchCV we can **test this value efficiently**

---

### ✅ Short Example:

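```python
from sklearn.model_selection import RandomizedSearchCV
from xgboost import XGBClassifier

# Search space: 10 x 4 x 4 = 160 possible combinations
param_grid = {
    'n_estimators': [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000],
    'max_depth': [3, 5, 7, 10],
    'learning_rate': [0.01, 0.05, 0.1, 0.2],
}

model = XGBClassifier()

# Evaluate only 20 randomly sampled combinations, each with 5-fold CV.
# random_state is added here so the sampled combinations are reproducible.
random_cv = RandomizedSearchCV(
    model, param_distributions=param_grid, n_iter=20, cv=5, random_state=42
)
random_cv.fit(X, y)  # X, y: features and labels, assumed to be loaded earlier

print(random_cv.best_params_)  # best combination among the 20 that were tried
```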