
# Automated Hyperparameter Tuning with GridSearchCV and RandomizedSearchCV


**Using GridSearchCV and RandomizedSearchCV in scikit-learn**

* **What is GridSearchCV?**

  * Exhaustive search over a specified parameter grid
  * Trains and evaluates a model for every combination of hyperparameters in the grid using cross-validation

* **What is RandomizedSearchCV?**

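  * Samples a fixed number of candidate settings (`n_iter`) from the specified parameter lists or distributions
  * Its cost is set by `n_iter` rather than by the size of the search space, so it is usually much faster than GridSearchCV on large spaces and often finds comparably good settings (though it is not guaranteed to find the best combination in the grid)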
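Counting candidates makes the cost difference concrete. The sketch below is a minimal illustration and is not part of the original notebook. It uses scikit-learn's `ParameterGrid` and `ParameterSampler` helpers, which are the enumeration and sampling utilities behind GridSearchCV and RandomizedSearchCV. The grid is the Gradient Boosting grid used later in this notebook. Drawing `C` from `scipy.stats.loguniform` is an added assumption, included to show that a continuous distribution can be used in place of a fixed list.

```python
from scipy.stats import loguniform
from sklearn.model_selection import ParameterGrid, ParameterSampler

n_folds = 5

# Same grid as the GradientBoostingClassifier search below: 3 x 3 x 3 combinations
param_grid = {
    "n_estimators": [50, 100, 200],
    "learning_rate": [0.01, 0.1, 0.2],
    "max_depth": [3, 5, 7],
}
# GridSearchCV fits every combination on every fold (plus one final refit)
n_grid_candidates = len(ParameterGrid(param_grid))
grid_fits = n_grid_candidates * n_folds

# Assumption for illustration: C drawn from a continuous log-uniform distribution
param_dist = {
    "C": loguniform(1e-3, 1e3),
    "kernel": ["linear", "rbf", "poly", "sigmoid"],
    "gamma": ["scale", "auto"],
}
# RandomizedSearchCV fits only n_iter sampled candidates on every fold
n_iter = 10
sampled = list(ParameterSampler(param_dist, n_iter=n_iter, random_state=42))
random_fits = len(sampled) * n_folds

print(f"Grid search:       {n_grid_candidates} candidates -> {grid_fits} fits")
print(f"Randomized search: {len(sampled)} candidates -> {random_fits} fits")
print("First sampled candidate:", sampled[0])
```

The grid cost grows multiplicatively with every parameter you add. The randomized cost stays at `n_iter` times the number of folds, however large the space becomes.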





* **Key Features**

  * **Automates Hyperparameter Tuning**

    * Combines model training, evaluation, and hyperparameter search into a single step
  * **Cross-Validation Integration**

    * Produces more reliable performance estimates by averaging scores across cross-validation folds
  * **Result Interpretation**

    * Exposes the best hyperparameter combination, its cross-validated score, and per-candidate results in `cv_results_`
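Result interpretation is not limited to the single winning combination, because every candidate's scores are stored in `cv_results_`. The minimal sketch below runs on Iris with a small SVC grid; the grid is hypothetical and chosen only for illustration. It ranks all candidates by mean cross-validated accuracy.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Hypothetical grid for illustration: 3 values of C x 2 kernels = 6 candidates
search = GridSearchCV(
    SVC(),
    param_grid={"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]},
    scoring="accuracy",
    cv=5,
)
search.fit(X, y)  # training, CV scoring and the search all happen in this one call

results = search.cv_results_
# rank_test_score is 1 for the best candidate; sorting ascending lists the best first
order = np.argsort(results["rank_test_score"])
for i in order:
    print(
        results["params"][i],
        f"mean={results['mean_test_score'][i]:.3f}",
        f"std={results['std_test_score'][i]:.3f}",
    )
```

A large `std_test_score` on the top candidate is a hint that its lead over the runner-up may not be meaningful.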



**Integrating Cross-Validation with Hyperparameter Tuning**
- Cross-Validation

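  - Scores each hyperparameter candidate on held-out folds, so the selection is based on estimated performance on unseen data rather than on how well the model fits its training data

- Benefits

  - Reduces the risk of picking hyperparameters that only suit one particular train/validation split

  - Provides more robust performance estimates by averaging scores across folds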
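The sketch below checks that the search really is cross-validation under the hood. It is illustrative only and uses a hypothetical grid over `C`. One candidate is re-scored by hand with `cross_val_score`. With an integer `cv` and a classifier, both calls use the same unshuffled stratified 5-fold splits, so the per-fold scores should match.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Hypothetical one-parameter grid
search = GridSearchCV(SVC(), param_grid={"C": [0.1, 1.0, 10.0]}, scoring="accuracy", cv=5)
search.fit(X, y)

# Score the C=1.0 candidate manually with the same 5-fold scheme
manual_scores = cross_val_score(SVC(C=1.0), X, y, scoring="accuracy", cv=5)

# Pull the per-fold scores GridSearchCV recorded for the same candidate
idx = search.cv_results_["params"].index({"C": 1.0})
search_scores = np.array(
    [search.cv_results_[f"split{k}_test_score"][idx] for k in range(5)]
)

print("manual per-fold:", manual_scores)
print("search per-fold:", search_scores)
```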

**Interpreting Results and Selecting the Best Model**
- Best Parameters

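  - Access the optimal hyperparameter combination using `.best_params_`

- Best Estimator

  - Retrieve the model refit on the full training set with the best hyperparameters using `.best_estimator_` (available when `refit=True`, the default)

- Performance Metrics

  - `.best_score_` is the mean cross-validated score of the best hyperparameters; because the same scores were used to choose them, it can be slightly optimistic, so report final performance on a held-out test set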
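The minimal sketch below connects the three attributes. It uses the notebook's train/test split but a hypothetical SVC grid, chosen only to keep the example fast. `best_score_` equals the mean CV score at `best_index_`. `best_estimator_` carries `best_params_`. The test score is computed as a separate step.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Hypothetical grid chosen only to keep the example fast
search = GridSearchCV(SVC(), param_grid={"C": [0.1, 1, 10], "gamma": ["scale", "auto"]}, cv=5)
search.fit(X_train, y_train)

best_params = search.best_params_    # dict with the winning values for "C" and "gamma"
best_model = search.best_estimator_  # refit on all of X_train (refit=True by default)
best_cv_score = search.best_score_   # mean CV accuracy of best_params, not a test score

# Final check on data the search never saw
test_accuracy = best_model.score(X_test, y_test)

print("best params:", best_params)
print(f"best CV accuracy: {best_cv_score:.4f}")
print(f"test accuracy:    {test_accuracy:.4f}")
```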

**Objective**

- Use GridSearchCV to tune a Gradient Boosting classifier and RandomizedSearchCV to tune a support vector machine on the Iris dataset, then compare their cross-validation and test results

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV, RandomizedSearchCV
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report
import numpy as np

# Load dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split dataset: 80% for the search (training + CV), 20% held out for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print("Dataset Loaded and split successfully")

# Define parameter grid: 3 x 3 x 3 = 27 combinations
param_grid = {
    "n_estimators": [50, 100, 200],
    "learning_rate": [0.01, 0.1, 0.2],
    "max_depth": [3, 5, 7]
}

# Initialize GridSearchCV: 27 combinations x 5 folds = 135 fits, plus one final refit
grid_search = GridSearchCV(
    estimator=GradientBoostingClassifier(random_state=42),
    param_grid=param_grid,
    scoring="accuracy",
    cv=5,
    n_jobs=-1  # use all available CPU cores
)

# Run the exhaustive search, then read off the best parameters and mean CV score
grid_search.fit(X_train, y_train)
best_params_grid = grid_search.best_params_
best_score_grid = grid_search.best_score_

print(f"Best Parameters (GridSearchCV): {best_params_grid}")
print(f"Best cross-Validation Accuracy (GridSearchCV): {best_score_grid:.4f}")

# Get best model (already refit on the full training set)
best_grid_model = grid_search.best_estimator_

# Predict and evaluate on the held-out test set
y_pred_grid = best_grid_model.predict(X_test)
accuracy_grid = accuracy_score(y_test, y_pred_grid)

print(f"Test Accuracy (GridSearchCV): {accuracy_grid:.4f}")
print("Classification Report: \n", classification_report(y_test, y_pred_grid))

# Define parameter distribution: C takes 10 log-spaced values from 1e-3 to 1e3
param_dist = {
    "C": np.logspace(-3, 3, 10),
    "kernel": ["linear", "rbf", "poly", "sigmoid"],
    "gamma": ["scale", "auto"]
}

# Initialize RandomizedSearchCV: 10 sampled candidates x 5 folds = 50 fits
random_search = RandomizedSearchCV(
    estimator=SVC(random_state=42),  # random_state only affects SVC when probability=True
    param_distributions=param_dist,
    n_iter=10,
    scoring="accuracy",
    cv=5,
    n_jobs=-1,
    random_state=42  # makes the sampled candidates reproducible
)

# Perform randomized search
random_search.fit(X_train, y_train)

# Get best parameters and mean CV score
best_params_random = random_search.best_params_
best_score_random = random_search.best_score_

print(f"Best Parameters (RandomizedSearchCV): {best_params_random}")
print(f"Best cross-Validation Accuracy (RandomizedSearchCV): {best_score_random:.4f}")

# Get best model (already refit on the full training set)
best_random_model = random_search.best_estimator_

# Predict and evaluate on the held-out test set
y_pred_random = best_random_model.predict(X_test)
accuracy_random = accuracy_score(y_test, y_pred_random)

print(f"Test Accuracy (RandomizedSearchCV): {accuracy_random:.4f}")
print("Classification Report: \n", classification_report(y_test, y_pred_random))
```

Dataset Loaded and split successfully
Best Parameters (GridSearchCV): {'learning_rate': 0.01, 'max_depth': 3, 'n_estimators': 50}
Best cross-Validation Accuracy (GridSearchCV): 0.9500
Test Accuracy (GridSearchCV): 1.0000
Classification Report: 
               precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00         9
           2       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30

Best Parameters (RandomizedSearchCV): {'kernel': 'poly', 'gamma': 'auto', 'C': np.float64(0.021544346900318832)}
Best cross-Validation Accuracy (RandomizedSearchCV): 0.9583
Test Accuracy (RandomizedSearchCV): 0.9667
Classification Report: 
               precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00  