# **Day 51: Hyperparameter Tuning** 🎯

Hyperparameter tuning is a crucial step in machine learning that involves optimizing model settings to achieve the best performance. Today, we explore **Grid Search** and **Random Search** as tuning techniques for models like **Decision Trees** and **Logistic Regression**.

---

## **Key Concepts**

### **What are Hyperparameters?**
- Hyperparameters are model settings defined *before training*.  
  Examples:  
  - **Tree depth** for Decision Trees.  
  - **Regularization strength** for Logistic Regression.  
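- Unlike model parameters (such as the coefficients of a Logistic Regression), hyperparameters are not learned from the data. They must be set by hand or chosen by a search procedure such as Grid Search or Random Search.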
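
To make the distinction concrete, here is a minimal sketch. It assumes scikit-learn is installed and uses a synthetic dataset from `make_classification` (not the churn data used later); the values `C=0.5` and `max_depth=3` are arbitrary choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, used only for illustration (an assumption, not the churn data)
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# Hyperparameters: chosen by us before training
lr = LogisticRegression(C=0.5, max_iter=1000)
dt = DecisionTreeClassifier(max_depth=3, random_state=0)
print(lr.get_params()["C"])          # the value we set
print(dt.get_params()["max_depth"])  # the value we set

# Parameters: learned from the data during fit()
lr.fit(X, y)
dt.fit(X, y)
print(lr.coef_.shape)   # one learned weight per feature
print(dt.get_depth())   # depth actually grown, capped by max_depth
```

`get_params()` reports what was configured, while attributes ending in an underscore (such as `coef_`) only exist after fitting.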

---

### **Tuning Techniques**

1. **Grid Search**  
   - Exhaustively evaluates *every combination* of the candidate values you list for each hyperparameter.  
   - Finds the best combination *within the grid*, but the cost grows multiplicatively with each added hyperparameter.  

2. **Random Search**  
   - Samples a fixed number of random combinations from the specified values or distributions.  
   - Usually cheaper than Grid Search over the same search space, but it may miss the best combination.
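
The difference in cost can be sketched with scikit-learn's `ParameterGrid` and `ParameterSampler` helpers, which enumerate candidates without training anything. The candidate values mirror the ones used in the Practical Implementation section; the budget of 10 samples is an illustrative assumption.

```python
import numpy as np
from sklearn.model_selection import ParameterGrid, ParameterSampler

# Grid Search: every combination is enumerated
param_grid = {
    "max_depth": [2, 3, 4, 5],
    "min_samples_split": [2, 3, 4],
    "criterion": ["gini", "entropy"],
}
grid = list(ParameterGrid(param_grid))
print(len(grid))  # 4 * 3 * 2 = 24 candidates

# Random Search: a fixed budget of n_iter sampled candidates
param_dist = {
    "C": np.logspace(-4, 4, 20),
    "solver": ["liblinear", "lbfgs"],
}
samples = list(ParameterSampler(param_dist, n_iter=10, random_state=42))
print(len(samples))  # 10 of the 20 * 2 = 40 possible combinations
```

With 5-fold cross-validation, each candidate is fitted five times, so the grid above costs 120 fits while the random search costs 50.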

---

### **Why is Hyperparameter Tuning Important?**
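- **Prevents Overfitting/Underfitting**  
  Helps balance model complexity: too little (e.g., a very shallow tree) underfits, while too much overfits.  

- **Improves Model Accuracy**  
  Well-chosen settings help the model generalize to unseen data instead of only fitting the training set.  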
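
A rough way to see this trade-off is to compare training and cross-validated accuracy as a single hyperparameter changes. The sketch below assumes a synthetic, deliberately noisy dataset (`flip_y=0.2` randomizes the labels of a fraction of the samples) and three illustrative values of `max_depth`; the exact scores depend on the data and scikit-learn version.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier

# Synthetic, noisy data; for illustration only
X, y = make_classification(n_samples=300, n_features=10, n_informative=4,
                           flip_y=0.2, random_state=0)

scores = {}
for depth in [1, 3, None]:  # None lets the tree grow until its leaves are pure
    cv = cross_validate(DecisionTreeClassifier(max_depth=depth, random_state=0),
                        X, y, cv=5, return_train_score=True)
    # Store (mean training accuracy, mean validation accuracy)
    scores[depth] = (cv["train_score"].mean(), cv["test_score"].mean())
    print(depth, scores[depth])
```

A large gap between the training and validation scores for the unrestricted tree is the typical signature of overfitting; tuning `max_depth` is one way to narrow it.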

---

## Practical Implementation
### Step 1: Import Libraries

```python
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
```

### Step 2: Load Data

```python
# Toy churn dataset (10 rows)
data = pd.DataFrame({
    'Age': [22, 25, 47, 52, 46, 56, 55, 60, 62, 61],
    'Contract': [1, 0, 1, 1, 0, 1, 0, 1, 1, 0],
    'MonthlyCharges': [29, 35, 70, 90, 55, 85, 65, 95, 120, 100],
    'Churn': [0, 0, 1, 1, 0, 1, 0, 1, 1, 1]
})

X = data[['Age', 'Contract', 'MonthlyCharges']]
y = data['Churn']

# Hold out 20% of the rows (2 samples) for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```

### Grid Search for Decision Tree
### Step 3: Set Up Hyperparameter Grid

```python
# 4 * 3 * 2 = 24 candidate combinations
param_grid = {
    'max_depth': [2, 3, 4, 5],
    'min_samples_split': [2, 3, 4],
    'criterion': ['gini', 'entropy']
}
```

### Step 4: Perform Grid Search

```python
dt = DecisionTreeClassifier()

# cv=5 on 8 training rows gives very small folds; acceptable for a demo only
grid_search = GridSearchCV(estimator=dt, param_grid=param_grid, cv=5, scoring='accuracy')
grid_search.fit(X_train, y_train)

print("Best Parameters:", grid_search.best_params_)

# best_estimator_ is refit on the full training set with the best parameters
best_dt = grid_search.best_estimator_
y_pred = best_dt.predict(X_test)
print(classification_report(y_test, y_pred))
```



```text
Best Parameters: {'criterion': 'gini', 'max_depth': 2, 'min_samples_split': 3}
              precision    recall  f1-score   support

           0       1.00      1.00      1.00         1
           1       1.00      1.00      1.00         1

    accuracy                           1.00         2
   macro avg       1.00      1.00      1.00         2
weighted avg       1.00      1.00      1.00         2
```
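
Beyond `best_params_`, the fitted search object stores the score of every candidate in `cv_results_`, which helps judge how close the runners-up were. The following is a minimal sketch on a synthetic dataset (an assumption, chosen so that the folds are large enough to be meaningful), reusing the same grid as above.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, used only for illustration
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

param_grid = {
    "max_depth": [2, 3, 4, 5],
    "min_samples_split": [2, 3, 4],
    "criterion": ["gini", "entropy"],
}
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid,
                      cv=5, scoring="accuracy")
search.fit(X, y)

# One row per candidate, with mean/std of the validation score across folds
results = pd.DataFrame(search.cv_results_)
cols = ["params", "mean_test_score", "std_test_score", "rank_test_score"]
top = results.sort_values("rank_test_score")[cols].head(5)
print(top.to_string(index=False))
print(search.best_score_)  # mean cross-validated accuracy of the best candidate
```

If several candidates are within one standard deviation of each other, the simplest of them (for example, the shallowest tree) is often a reasonable pick.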



### Random Search for Logistic Regression
### Step 5: Set Up Parameter Distribution

```python
from sklearn.model_selection import RandomizedSearchCV
import numpy as np

param_dist = {
    'C': np.logspace(-4, 4, 20),  # Inverse of regularization strength (smaller C = stronger regularization)
    'solver': ['liblinear', 'lbfgs']  # Optimization solvers
}
```

### Step 6: Perform Random Search

```python
lr = LogisticRegression(max_iter=1000)

# n_iter=10 samples 10 of the 40 possible (C, solver) combinations
random_search = RandomizedSearchCV(
    estimator=lr,
    param_distributions=param_dist,
    n_iter=10,
    cv=5,
    scoring='accuracy',
    random_state=42,
)
random_search.fit(X_train, y_train)

print("Best Parameters:", random_search.best_params_)

best_lr = random_search.best_estimator_
y_pred = best_lr.predict(X_test)
print(classification_report(y_test, y_pred))
```



```text
Best Parameters: {'solver': 'liblinear', 'C': 0.23357214690901212}
              precision    recall  f1-score   support

           0       0.00      0.00      0.00         1
           1       0.50      1.00      0.67         1

    accuracy                           0.50         2
   macro avg       0.25      0.50      0.33         2
weighted avg       0.25      0.50      0.33         2
```

scikit-learn also emits an `UndefinedMetricWarning` for this report: the model never predicts class 0, so precision for that class is undefined and is reported as 0.00. With only two test samples, these scores are illustrative and not a reliable comparison between the two models.
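
Random Search does not need a fixed list of candidates: `param_distributions` also accepts objects with an `rvs` method, such as SciPy distributions. The sketch below is a variation on the step above, not a replacement for it. It assumes SciPy is available, uses a synthetic dataset, and samples `C` from a log-uniform distribution over the same range as the `np.logspace` list.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

# Synthetic data, used only for illustration
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

param_dist = {
    "C": loguniform(1e-4, 1e4),        # continuous, sampled on a log scale
    "solver": ["liblinear", "lbfgs"],  # lists are sampled uniformly
}
search = RandomizedSearchCV(
    LogisticRegression(max_iter=1000),
    param_distributions=param_dist,
    n_iter=10,
    cv=5,
    scoring="accuracy",
    random_state=42,
)
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))
```

Sampling from a continuous distribution means that raising `n_iter` explores new values of `C` instead of revisiting the same 20 grid points.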


---

## Takeaways
1. **Grid Search** is exhaustive but computationally expensive. Use it for small hyperparameter spaces.
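2. **Random Search** evaluates a fixed number of sampled combinations, so its cost stays under your control even with large hyperparameter ranges.
3. Always use cross-validation during tuning, so that candidate settings are compared on held-out folds and not on the data they were trained on.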


By systematically tuning hyperparameters, you can significantly boost your model's performance! 🚀

---