## Introduction to GridSearchCV
GridSearchCV in Scikit-Learn is a tool for hyperparameter tuning. It performs an exhaustive search over the parameter values you specify for an estimator, evaluates every combination with cross-validation, and selects the combination with the best mean score on a chosen metric such as accuracy or F1-score.

Hyperparameter tuning matters because poorly chosen values can leave a model underfit or overfit. GridSearchCV automates the search, and because each candidate is scored on held-out folds, the selected settings are more likely to generalize to unseen data than settings chosen from a single train-test split. The cost is compute time: the number of fits grows with the product of the list lengths in the grid.

### Parameters:
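GridSearchCV has several important parameters:
* estimator: The model or pipeline to be optimized. This can be any Scikit-Learn estimator, such as LogisticRegression(), SVC(), or RandomForestClassifier().
* param_grid: A dictionary, or a list of dictionaries, with parameter names (as strings) as keys and lists of settings to try as values. The keys must match the estimator's parameter names.
* scoring: The metric used to rank parameter combinations, such as 'accuracy' or 'f1'. If omitted, the estimator's own score method is used.
* cv: The cross-validation strategy; an integer sets the number of folds.
* n_jobs: The number of jobs to run in parallel; -1 uses all available processors.
* verbose: Controls how much progress output is printed during the search.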
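
The difference between a single dictionary and a list of dictionaries for param_grid can be sketched with ParameterGrid, the Scikit-Learn helper that expands a grid into its candidate combinations. The grid values below mirror the SVC example used in this notebook; the names single_grid and split_grid are illustrative only.

```python
from sklearn.model_selection import ParameterGrid

# Single dictionary: every C is crossed with every gamma and every kernel,
# even though the linear kernel does not use gamma.
single_grid = {
    'C': [0.1, 1, 10, 100],
    'gamma': [1, 0.1, 0.01, 0.001],
    'kernel': ['linear', 'rbf', 'poly'],
}

# List of dictionaries: each dictionary is expanded on its own,
# so gamma is only varied for the kernels that use it.
split_grid = [
    {'kernel': ['linear'], 'C': [0.1, 1, 10, 100]},
    {'kernel': ['rbf', 'poly'],
     'C': [0.1, 1, 10, 100],
     'gamma': [1, 0.1, 0.01, 0.001]},
]

n_single = len(ParameterGrid(single_grid))  # 4 * 4 * 3 combinations
n_split = len(ParameterGrid(split_grid))    # 4 + (2 * 4 * 4) combinations
print(n_single, n_split)
```

Either form can be passed to GridSearchCV as param_grid. With 5-fold cross-validation, each candidate costs five fits, so trimming redundant combinations reduces the total work proportionally.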

In [1]:
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC
from sklearn.metrics import classification_report
import warnings
# Ignore warnings
warnings.filterwarnings('ignore')

In [2]:
iris = load_iris()
X = iris.data
y = iris.target

In [3]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [4]:
# Define the parameter grid: Specify a grid of hyperparameters for the SVM model to search over.
# The grid includes different values for C, gamma, and kernel.
# Note: the 'linear' kernel ignores gamma, so those combinations repeat the same model.
param_grid = {
    'C': [0.1, 1, 10, 100],
    'gamma': [1, 0.1, 0.01, 0.001],
    'kernel': ['linear', 'rbf', 'poly']
}

In [5]:
# Initialize the SVC model: Create an instance of the support vector classifier (SVC).
svc = SVC()

In [6]:
# Initialize GridSearchCV: 
# Set up the GridSearchCV with the SVC model, the parameter grid, and the desired configuration.

grid_search = GridSearchCV(estimator=svc, param_grid=param_grid, scoring='accuracy', cv=5, n_jobs=-1, verbose=2)

In [7]:
# Fit GridSearchCV to the training data: Perform the grid search on the training data.

grid_search.fit(X_train, y_train)

Fitting 5 folds for each of 48 candidates, totalling 240 fits


In [8]:
# Check the best parameters and estimator: After fitting, print the best parameters and the best estimator found during the grid search.

print("Best parameters found: ", grid_search.best_params_)
print("Best estimator: ", grid_search.best_estimator_)

Best parameters found:  {'C': 0.1, 'gamma': 0.1, 'kernel': 'poly'}
Best estimator:  SVC(C=0.1, gamma=0.1, kernel='poly')


In [9]:
# Make predictions with the best estimator: Use the best estimator to make predictions on the test set.

y_pred = grid_search.best_estimator_.predict(X_test)

In [10]:
# Evaluate the performance: Evaluate the model's performance on the test set using the classification_report function, which provides precision, recall, F1-score, and support for each class.

print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00         9
           2       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30
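
The report above is computed on a held-out test set, which is a separate estimate from the cross-validated score that GridSearchCV used to pick the parameters. The following is a minimal, self-contained sketch of how the two can be compared. It uses a smaller grid than the one above to keep it quick, and the variable names cv_score, test_score, and same_predictions are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# A reduced grid, chosen only to keep this sketch fast.
param_grid = {'C': [0.1, 1, 10], 'kernel': ['linear', 'rbf']}
grid_search = GridSearchCV(SVC(), param_grid, scoring='accuracy', cv=5)
grid_search.fit(X_train, y_train)

# Mean cross-validated accuracy of the best candidate (training data only).
cv_score = grid_search.best_score_

# Accuracy of the refitted best estimator on the held-out test set.
test_score = grid_search.score(X_test, y_test)

# With refit=True (the default), predicting through the search object
# delegates to best_estimator_.
same_predictions = np.array_equal(
    grid_search.predict(X_test),
    grid_search.best_estimator_.predict(X_test),
)

print("CV accuracy:  ", round(cv_score, 3))
print("Test accuracy:", round(test_score, 3))
print("Same predictions:", same_predictions)
```

On a test set of only 30 samples, a single misclassification changes accuracy by more than three percentage points, so the cross-validated score is worth reading alongside the test score.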



## Key Points
* GridSearchCV evaluates every combination in a defined parameter grid.
* Its main parameters are the estimator to optimize, the parameter grid, the scoring method, the cross-validation strategy, the number of parallel jobs, and the verbosity level.
* The practical example used GridSearchCV to find the best parameters for an SVC model on the Iris data set.
* GridSearchCV selects the best configuration by comparing the cross-validated scores of all hyperparameter combinations.

### Example param_grid values for various models

In [11]:
# Logistic Regression: When tuning a logistic regression model, GridSearchCV 
# can search through different values of C, penalty, and solver to find the best parameters.
parameters = {'C': [0.01, 0.1, 1],
              'penalty': ['l2'],
              'solver': ['lbfgs']}

# C: Inverse of regularization strength; smaller values specify stronger regularization.
# penalty: Specifies the norm of the penalty; 'l2' is the squared-coefficient penalty, the same one ridge regression uses.
# solver: Algorithm to use in the optimization problem.

In [12]:
# Support Vector Machine: For SVM, GridSearchCV can explore different kernels, 
# C values, and gamma settings to optimize the model.
parameters = {'kernel': ['linear', 'rbf', 'poly', 'sigmoid'],
              'C': np.logspace(-3, 3, 5),
              'gamma': np.logspace(-3, 3, 5)}

# kernel: Specifies the kernel type to be used in the algorithm.
# C: Regularization parameter.
# gamma: Kernel coefficient.

In [13]:
# Decision Tree Classifier: In the case of a decision tree, GridSearchCV can test 
# various criteria, splitters, depths, and other parameters to find the best configuration.

parameters = {'criterion': ['gini', 'entropy'],
              'splitter': ['best', 'random'],
              'max_depth': [2*n for n in range(1, 10)],
              'max_features': ['sqrt', 'log2'],  # 'auto' was removed in scikit-learn 1.3
              'min_samples_leaf': [1, 2, 4],
              'min_samples_split': [2, 5, 10]}

# criterion: The function to measure the quality of a split.
# splitter: The strategy used to choose the split at each node.
# max_depth: The maximum depth of the tree.
# max_features: The number of features to consider when looking for the best split.
# min_samples_leaf: The minimum number of samples required to be at a leaf node.
# min_samples_split: The minimum number of samples required to split an internal node.

In [14]:
# K-Nearest Neighbors: For KNN, GridSearchCV can try different numbers of neighbors, 
# algorithms, and power parameters to determine the best model. 

parameters = {'n_neighbors': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
              'algorithm': ['auto', 'ball_tree', 'kd_tree', 'brute'],
              'p': [1, 2]}

# n_neighbors: Number of neighbors to use.
# algorithm: Algorithm used to compute the nearest neighbors.
# p: Power parameter for the Minkowski metric.
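
Each of the dictionaries above is meant to be passed as param_grid together with the matching estimator. As a minimal sketch, here is the K-Nearest Neighbors grid used with KNeighborsClassifier on the Iris data; the name knn_search is illustrative, and the whole data set is used for fitting only to keep the example short.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Keys must match the constructor argument names of the estimator.
parameters = {'n_neighbors': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
              'algorithm': ['auto', 'ball_tree', 'kd_tree', 'brute'],
              'p': [1, 2]}

knn_search = GridSearchCV(KNeighborsClassifier(), parameters, cv=5)
knn_search.fit(X, y)

# One entry in cv_results_['params'] per candidate: 10 * 4 * 2 combinations.
n_candidates = len(knn_search.cv_results_['params'])

print("Candidates evaluated:", n_candidates)
print("Best parameters:", knn_search.best_params_)
```

The same pattern applies to the other grids: swap in LogisticRegression(), SVC(), or DecisionTreeClassifier() with the corresponding dictionary.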

### Applications and Advantages of GridSearchCV
* Model Selection: GridSearchCV enables the comparison of multiple models and facilitates the selection of the best-performing one for a given data set.
* Hyperparameter Tuning: It automates the process of finding the optimal hyperparameters, which can significantly improve the performance of machine learning models.
* Pipeline Optimization: GridSearchCV can be applied to complex pipelines involving multiple preprocessing steps and models to optimize the entire workflow.
* Cross-Validation: It scores each candidate with cross-validation, so the selected parameters depend less on one particular train-test split.
* Exhaustive Search: GridSearchCV evaluates every combination in the specified grid, so the best combination within that grid is found. Values outside the grid are never tried.
* Parallel Execution: With the n_jobs parameter, it can use multiple processors to speed up the search.
* Automatic Refit: With refit=True (the default), GridSearchCV refits the estimator with the best parameters on all of the data passed to fit, so the search object can be used directly for prediction.
* Detailed Output: The cv_results_ attribute records the performance of each parameter combination, including per-fold and mean validation scores and fit times. Training scores are included when return_train_score=True.
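
Two of the points above, pipeline optimization and the cv_results_ output, can be illustrated together. The following is a minimal sketch, assuming a two-step pipeline of StandardScaler and SVC on the Iris data. The step names 'scaler' and 'svc' are arbitrary labels chosen for this example, and pipeline parameters are addressed as step name, double underscore, parameter name.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Scaling is refitted inside every cross-validation fold,
# so no information leaks from the validation fold into the scaler.
pipe = Pipeline([('scaler', StandardScaler()), ('svc', SVC())])

# '<step name>__<parameter name>' targets a parameter of one pipeline step.
param_grid = {'svc__C': [0.1, 1, 10], 'svc__kernel': ['linear', 'rbf']}

# return_train_score=True adds training scores to cv_results_.
search = GridSearchCV(pipe, param_grid, cv=5, return_train_score=True)
search.fit(X, y)

# cv_results_ is a dict of arrays, one entry per candidate.
results = pd.DataFrame(search.cv_results_)
columns = ['params', 'mean_train_score', 'mean_test_score', 'rank_test_score']
summary = results[columns].sort_values('rank_test_score')
print(summary.to_string(index=False))
```

A large gap between mean_train_score and mean_test_score for a candidate is one sign that the setting overfits the training folds.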