Let us demonstrate the use of GridSearchCV with a practical example using the Iris data set. We will perform a grid search to find the optimal hyperparameters for a support vector classifier (SVC).

Import the necessary libraries: Import the libraries needed to load the data set, split the data, run GridSearchCV, and evaluate the model.

In [1]:
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC
from sklearn.metrics import classification_report
import warnings
# Ignore warnings
warnings.filterwarnings('ignore')

Load the Iris data set: The Iris data set is a classic data set in machine learning. Load it using the load_iris function from scikit-learn.

In [2]:
iris = load_iris()
X = iris.data
y = iris.target

- X: Features of the Iris data set (sepal length, sepal width, petal length, petal width).
- y: Target labels representing the three species of Iris (setosa, versicolor, virginica).
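
As a quick sanity check, the shapes and names behind X and y can be inspected directly. The following is a minimal sketch that relies only on the feature_names and target_names attributes of the object returned by load_iris; it is not needed for the grid search itself.

```python
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target

print(X.shape)   # (150, 4): 150 samples, 4 features
print(y.shape)   # (150,)

# Column names of X and the species names that the integer labels in y refer to
feature_names = list(iris.feature_names)
target_names = [str(name) for name in iris.target_names]
print(feature_names)
print(target_names)
```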

Split the data into training and test sets: Divide the data set into training and test sets to evaluate how well the model performs on data it has not been trained on.

In [5]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
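
With test_size=0.2 on 150 samples, the split yields 120 training and 30 test samples. The sketch below checks those sizes and counts the samples per class in the test set. The stratify=y variant at the end is an optional alternative shown for comparison, not something the original split uses.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Same split as in the text
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)   # (120, 4) (30, 4)
print(np.bincount(y_test))           # samples per class in the test set

# Optional variant (an assumption, not used in the text):
# stratify on y so each class keeps its share in both subsets
_, _, _, y_test_strat = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(np.bincount(y_test_strat))     # [10 10 10]
```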

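Define the parameter grid: Specify a grid of hyperparameters for the SVM model to search over. The grid includes different values for C, gamma, and kernel. Note that gamma affects only the rbf and poly kernels; the linear kernel ignores it, so linear candidates that differ only in gamma are effectively duplicates.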

In [6]:
param_grid = {
    'C': [0.1, 1, 10, 100],
    'gamma': [1, 0.1, 0.01, 0.001],
    'kernel': ['linear', 'rbf', 'poly']
}
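
The size of the search follows directly from the grid: every combination of values becomes one candidate. The sketch below counts the candidates with ParameterGrid from sklearn.model_selection. The split_grid variant is an assumption added for illustration; it lists gamma only for the kernels that use it, which avoids fitting redundant linear candidates.

```python
from sklearn.model_selection import ParameterGrid

param_grid = {
    'C': [0.1, 1, 10, 100],
    'gamma': [1, 0.1, 0.01, 0.001],
    'kernel': ['linear', 'rbf', 'poly']
}

n_candidates = len(ParameterGrid(param_grid))
print(n_candidates)        # 48 = 4 * 4 * 3
print(n_candidates * 5)    # 240 fits with 5-fold cross-validation

# Illustrative alternative (not from the text): a list of grids,
# with gamma searched only for the rbf and poly kernels
split_grid = [
    {'kernel': ['linear'], 'C': [0.1, 1, 10, 100]},
    {'kernel': ['rbf', 'poly'], 'C': [0.1, 1, 10, 100],
     'gamma': [1, 0.1, 0.01, 0.001]},
]
n_split_candidates = len(ParameterGrid(split_grid))
print(n_split_candidates)  # 36 = 4 + 2 * 4 * 4
```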

In [7]:
# Initialize the SVC model: Create an instance of the support vector classifier (SVC).
svc = SVC()

In [8]:
# Initialize GridSearchCV: Set up the GridSearchCV with the SVC model, the parameter grid, and the desired configuration.
grid_search = GridSearchCV(estimator=svc, param_grid=param_grid, scoring='accuracy', cv=5, n_jobs=-1, verbose=2)
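
When cv is an integer and the estimator is a classifier, scikit-learn uses stratified K-fold splitting without shuffling. The sketch below makes the splitter explicit by passing a StratifiedKFold object instead. The shuffle=True and random_state=42 settings are illustrative choices, not part of the original configuration.

```python
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

param_grid = {
    'C': [0.1, 1, 10, 100],
    'gamma': [1, 0.1, 0.01, 0.001],
    'kernel': ['linear', 'rbf', 'poly']
}

# Explicit 5-fold stratified splitter (shuffling is an assumption made here)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

grid_search = GridSearchCV(
    estimator=SVC(),
    param_grid=param_grid,
    scoring='accuracy',
    cv=cv,
)
n_splits = grid_search.cv.get_n_splits()
print(n_splits)   # 5
```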

In [9]:
# Fit GridSearchCV to the training data: Perform the grid search on the training data.
grid_search.fit(X_train, y_train)

Fitting 5 folds for each of 48 candidates, totalling 240 fits
[CV] END ......................C=0.1, gamma=1, kernel=linear; total time=   0.0s
[CV] END .........................C=0.1, gamma=1, kernel=rbf; total time=   0.0s
[CV] END .........................C=0.1, gamma=1, kernel=rbf; total time=   0.0s
[CV] END ......................C=0.1, gamma=1, kernel=linear; total time=   0.0s
[CV] END ........................C=0.1, gamma=1, kernel=poly; total time=   0.0s
[CV] END ........................C=0.1, gamma=1, kernel=poly; total time=   0.0s
[CV] END ........................C=0.1, gamma=1, kernel=poly; total time=   0.0s
[CV] END ........................C=0.1, gamma=1, kernel=poly; total time=   0.0s
[CV] END ........................C=0.1, gamma=1, kernel=poly; total time=   0.0s
[CV] END ....................C=0.1, gamma=0.1, kernel=linear; total time=   0.0s
[CV] END ....................C=0.1, gamma=0.1, kernel=linear; total time=   0.0s
... (output truncated)

In [10]:
# Check the best parameters and estimator: After fitting, print the best parameters and the best estimator found during the grid search.
print("Best parameters found: ", grid_search.best_params_)
print("Best estimator: ", grid_search.best_estimator_)

Best parameters found:  {'C': 0.1, 'gamma': 0.1, 'kernel': 'poly'}
Best estimator:  SVC(C=0.1, gamma=0.1, kernel='poly')
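
best_params_ names only the top-ranked candidate. The mean cross-validated score behind it is available as best_score_, and the scores of all candidates are stored in cv_results_. On a small data set several candidates often tie, so it can help to look at the top few rows. The sketch below uses a smaller grid (small_grid, a hypothetical name) purely to keep the example quick; the same attributes exist on the grid_search object fitted above.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Hypothetical smaller grid, chosen only to keep this sketch fast
small_grid = {
    'C': [0.1, 1, 10],
    'gamma': [0.1, 0.01],
    'kernel': ['linear', 'rbf'],
}
grid_search = GridSearchCV(SVC(), small_grid, scoring='accuracy', cv=5)
grid_search.fit(X_train, y_train)

# Mean cross-validated accuracy of the best candidate
print("Best CV accuracy:", grid_search.best_score_)

# Rank all candidates by their mean test score across the 5 folds
results = grid_search.cv_results_
ranked = sorted(
    zip(results['rank_test_score'], results['mean_test_score'], results['params']),
    key=lambda row: row[0],
)
for rank, mean_score, params in ranked[:5]:
    print(rank, round(float(mean_score), 3), params)
```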


In [11]:
# Make predictions with the best estimator: Use the best estimator to make predictions on the test set.
y_pred = grid_search.best_estimator_.predict(X_test)
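
Because GridSearchCV refits the best candidate on the full training set by default (refit=True), calling predict on the search object itself gives the same result as calling it on best_estimator_. The sketch below checks this on a smaller grid (small_grid, a hypothetical name used only to keep the example fast).

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Hypothetical smaller grid, chosen only to keep this sketch fast
small_grid = {'C': [0.1, 1, 10], 'kernel': ['linear', 'rbf']}

grid_search = GridSearchCV(SVC(), small_grid, cv=5)  # refit=True by default
grid_search.fit(X_train, y_train)

pred_from_estimator = grid_search.best_estimator_.predict(X_test)
pred_from_search = grid_search.predict(X_test)  # delegates to best_estimator_

same = np.array_equal(pred_from_estimator, pred_from_search)
print(same)   # True
```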

In [12]:
# Evaluate the performance: Evaluate the model's performance on the test set using the classification_report function, which provides precision, recall, F1-score, and support for each class.
print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00         9
           2       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30
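
The report can be complemented with a single accuracy figure and a confusion matrix, which shows which classes are confused with which. The sketch below refits an SVC with the best parameters reported above so that it runs on its own; accuracy_score and confusion_matrix come from sklearn.metrics. With only 30 test samples, a perfect score should be read as a small-sample result rather than a guarantee.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, confusion_matrix

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Refit with the best parameters reported by the grid search in the text
model = SVC(C=0.1, gamma=0.1, kernel='poly')
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)

print("Accuracy:", acc)
print(cm)   # rows: true class, columns: predicted class
```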



## Key Points
- GridSearchCV exhaustively evaluates every combination in a defined parameter grid using cross-validation.
- Its main parameters are the estimator to tune (estimator), the parameter grid (param_grid), the scoring metric (scoring), the cross-validation strategy (cv), the number of parallel jobs (n_jobs), and the verbosity level (verbose).
- The practical example used GridSearchCV to find the best hyperparameters for an SVC model on the Iris data set.
- GridSearchCV helps in selecting the best model by evaluating multiple combinations of hyperparameters.