Task 29: Hyperparameter Tuning Techniques
When working on a machine learning project, selecting the right model and tuning its hyperparameters are essential for good performance. Common scikit-learn models include Linear Regression for predicting continuous values, Logistic Regression for classification (binary or multiclass), and Decision Trees, Random Forests, Support Vector Machines (SVMs), and k-Nearest Neighbors (KNN) for both classification and regression tasks. Hyperparameter tuning techniques such as Grid Search, Random Search, and Bayesian Optimization search for the parameter values that give the best cross-validated score for each model, which improves its predictive accuracy. Apply these techniques and record the results to evaluate how different models and hyperparameters affect performance on your dataset.
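Grid Search tries every combination in a fixed grid, while Random Search draws a fixed number of candidates (`n_iter`) from lists or probability distributions, so its cost does not grow with the size of the search space. Bayesian Optimization (for example skopt's `BayesSearchCV`, imported below) instead picks each new candidate using the scores of earlier ones; it is not shown here. The following is a minimal Random Search sketch with scikit-learn's `RandomizedSearchCV` on the same Wine split; the Random Forest ranges are illustrative assumptions, not tuned values.

```python
# Random Search sketch: sample n_iter candidates instead of enumerating a full grid.
# The parameter ranges below are illustrative assumptions.
from scipy.stats import randint
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# scipy distributions are sampled with .rvs(); plain lists are sampled uniformly.
param_distributions = {
    "n_estimators": randint(10, 201),     # integers 10..200 (upper bound exclusive)
    "max_depth": [None, 10, 20, 30],
    "min_samples_split": randint(2, 11),  # integers 2..10
}

random_search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions,
    n_iter=10,          # only 10 sampled candidates are evaluated
    cv=5,
    scoring="accuracy",
    random_state=42,    # makes the sampled candidates reproducible
)
random_search.fit(X_train, y_train)

print("Best parameters:", random_search.best_params_)
print("Test accuracy:", random_search.score(X_test, y_test))
```

With 10 candidates and 5 folds this runs 50 fits, regardless of how wide the integer ranges are.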

In [6]:
%pip install scikit-optimize




Importing Libraries

In [7]:
import numpy as np
import pandas as pd
from sklearn import datasets
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split, GridSearchCV, RandomizedSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report, accuracy_score
from skopt import BayesSearchCV

Load dataset

In [8]:
# Load the Wine dataset
wine = load_wine()
X = wine.data
y = wine.target
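Before tuning, it can help to check what the Wine data looks like: how many samples and features there are, how balanced the three classes are, and how different the feature scales are (which matters for scale-sensitive models such as SVM, KNN and Logistic Regression). A short inspection sketch:

```python
# Inspect the Wine dataset: shape, class names, class balance and feature scales.
import numpy as np
from sklearn.datasets import load_wine

wine = load_wine()
X, y = wine.data, wine.target

print("Samples, features:", X.shape)            # (178, 13)
print("Classes:", wine.target_names.tolist())
print("Samples per class:", np.bincount(y))      # [59 71 48]

# Feature scales differ by orders of magnitude (e.g. proline vs. hue).
for name, col_max in zip(wine.feature_names, X.max(axis=0)):
    print(f"{name:>30}: max = {col_max:.2f}")
```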

Splitting the dataset

In [9]:
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
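With `test_size=0.2`, scikit-learn rounds the test size up, so 36 of the 178 samples are held out; this is the `support` total of 36 in the reports below. The sketch also shows an optional variant with `stratify=y`, which keeps the class proportions of `y` in both parts; it is not what this notebook uses, and it would change the reported numbers.

```python
# Check the split sizes and compare plain vs. stratified class counts in the test set.
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

X, y = load_wine(return_X_y=True)

# Same call as in the notebook.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(len(X_train), len(X_test))  # 142 36

# Optional variant: preserve class proportions in train and test.
Xs_train, Xs_test, ys_train, ys_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print("Test class counts (plain):", np.bincount(y_test))
print("Test class counts (stratified):", np.bincount(ys_test))
```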

Step 4: Define the models and hyperparameters for Grid Search

In [10]:
# Step 4: Define the models and hyperparameters for Grid Search
models = {
    'LogisticRegression': LogisticRegression(),
    'DecisionTree': DecisionTreeClassifier(),
    'RandomForest': RandomForestClassifier(),
    'SVM': SVC(),
    'KNN': KNeighborsClassifier()
}
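The Logistic Regression run below prints repeated `STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.` warnings, and the warning text itself suggests increasing `max_iter` or scaling the data. A minimal sketch of one common remedy, assuming a `Pipeline` with `StandardScaler` is acceptable for this task: the scaler is fitted inside each CV fold, and grid keys use the `<step name>__<parameter>` form. The step name `clf`, `max_iter=1000`, and the solver list (`lbfgs`, `newton-cg`, both of which handle multiclass directly) are illustrative choices, not the notebook's settings.

```python
# Scale features inside a Pipeline so lbfgs converges, and tune the model through it.
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

pipe = Pipeline([
    ("scaler", StandardScaler()),                 # refit on each training fold only
    ("clf", LogisticRegression(max_iter=1000)),   # higher iteration limit as a safety margin
])

# "clf__C" reaches the C parameter of the step named "clf".
param_grid = {
    "clf__C": [0.01, 0.1, 1, 10, 100],
    "clf__solver": ["lbfgs", "newton-cg"],
}

grid = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
grid.fit(X_train, y_train)

print("Best parameters:", grid.best_params_)
print("Test accuracy:", grid.score(X_test, y_test))
```

Putting the scaler inside the pipeline, rather than scaling `X_train` once up front, keeps validation folds from leaking into the scaling statistics during cross-validation.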

In [11]:

param_grid = {
    'LogisticRegression': {
        'C': [0.01, 0.1, 1, 10, 100],
        'solver': ['liblinear', 'lbfgs']
    },
    'DecisionTree': {
        'max_depth': [None, 10, 20, 30, 40, 50],
        'min_samples_split': [2, 5, 10]
    },
    'RandomForest': {
        'n_estimators': [10, 50, 100, 200],
        'max_depth': [None, 10, 20, 30, 40, 50],
        'min_samples_split': [2, 5, 10]
    },
    'SVM': {
        'C': [0.01, 0.1, 1, 10, 100],
        'kernel': ['linear', 'rbf']
    },
    'KNN': {
        'n_neighbors': [3, 5, 7, 9],
        'weights': ['uniform', 'distance']
    }
}
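Grid Search cost grows multiplicatively: the number of candidates is the product of the lengths of each value list, and every candidate is fitted once per cross-validation fold. A small sketch that counts the fits implied by the grid above using scikit-learn's `ParameterGrid`:

```python
# Count how many model fits the grid above implies with 5-fold cross-validation.
from sklearn.model_selection import ParameterGrid

# Copy of the param_grid defined above, so this snippet runs on its own.
param_grid = {
    'LogisticRegression': {'C': [0.01, 0.1, 1, 10, 100], 'solver': ['liblinear', 'lbfgs']},
    'DecisionTree': {'max_depth': [None, 10, 20, 30, 40, 50], 'min_samples_split': [2, 5, 10]},
    'RandomForest': {'n_estimators': [10, 50, 100, 200],
                     'max_depth': [None, 10, 20, 30, 40, 50],
                     'min_samples_split': [2, 5, 10]},
    'SVM': {'C': [0.01, 0.1, 1, 10, 100], 'kernel': ['linear', 'rbf']},
    'KNN': {'n_neighbors': [3, 5, 7, 9], 'weights': ['uniform', 'distance']},
}

cv_folds = 5
total_fits = 0
for model_name, grid in param_grid.items():
    n_candidates = len(ParameterGrid(grid))  # product of the list lengths
    n_fits = n_candidates * cv_folds
    total_fits += n_fits
    print(f'{model_name}: {n_candidates} candidates x {cv_folds} folds = {n_fits} fits')

# GridSearchCV additionally refits the best candidate once per model.
print('Total cross-validation fits:', total_fits)  # 590
```

The Random Forest grid alone accounts for 72 of the 118 candidates, which is where Random Search or Bayesian Optimization would save the most time.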

Perform grid search and evaluate the models

In [13]:
best_models = {}
for model_name, model in models.items():
    print(f'Performing Grid Search for {model_name}...')
    # 5-fold cross-validated search over every combination in the model's grid
    grid_search = GridSearchCV(model, param_grid[model_name], cv=5, scoring='accuracy')
    grid_search.fit(X_train, y_train)
    # best_estimator_ is refit on the full training set with the best parameters
    best_models[model_name] = grid_search.best_estimator_
    y_pred = grid_search.predict(X_test)
    print(f'Best parameters for {model_name}: {grid_search.best_params_}')
    print(f'Accuracy: {accuracy_score(y_test, y_pred)}')
    print(classification_report(y_test, y_pred))

Performing Grid Search for LogisticRegression...


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(

Best parameters for LogisticRegression: {'C': 1, 'solver': 'liblinear'}
Accuracy: 0.9722222222222222
              precision    recall  f1-score   support

           0       1.00      0.93      0.96        14
           1       0.93      1.00      0.97        14
           2       1.00      1.00      1.00         8

    accuracy                           0.97        36
   macro avg       0.98      0.98      0.98        36
weighted avg       0.97      0.97      0.97        36

Performing Grid Search for DecisionTree...
Best parameters for DecisionTree: {'max_depth': None, 'min_samples_split': 2}
Accuracy: 0.9444444444444444
              precision    recall  f1-score   support

           0       1.00      0.93      0.96        14
           1       0.88      1.00      0.93        14
           2       1.00      0.88      0.93         8

    accuracy                           0.94        36
   macro avg       0.96      0.93      0.94        36
weighted avg       0.95      0.94      0.94        36
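Conceptually, `GridSearchCV` scores each candidate from the grid with k-fold cross-validation on the training set, keeps the one with the highest mean score, and (with the default `refit=True`) refits it on the whole training set. A simplified sketch reproducing that loop by hand for the KNN grid used above, with `ParameterGrid` and `cross_val_score`; it illustrates the idea and is not scikit-learn's actual implementation (no parallelism, no full ranking table).

```python
# Manual grid search for KNN, compared with GridSearchCV on the same data and folds.
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import GridSearchCV, ParameterGrid, cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

grid = {"n_neighbors": [3, 5, 7, 9], "weights": ["uniform", "distance"]}

# Score every candidate with 5-fold CV and keep the best mean accuracy.
best_score, best_params = -np.inf, None
for params in ParameterGrid(grid):
    model = KNeighborsClassifier(**params)
    score = cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy").mean()
    if score > best_score:  # strict ">" keeps the first candidate on ties
        best_score, best_params = score, params

# Refit the winning configuration on the full training set.
best_model = KNeighborsClassifier(**best_params).fit(X_train, y_train)

search = GridSearchCV(KNeighborsClassifier(), grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

print("Manual:      ", best_params, round(best_score, 4))
print("GridSearchCV:", search.best_params_, round(search.best_score_, 4))
```

Both use the same stratified 5-fold split of the training set (the default for classifiers when `cv=5`), so the best mean scores should agree.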

Compare the results
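The summary cell prints each classification report separately. Since `pandas` is already imported, one option is to collect the key metrics into a single table sorted by accuracy. A minimal sketch, assuming `best_models` maps model names to fitted estimators as built in the grid-search loop; so that it runs on its own, it fits two untuned stand-in models, which are hypothetical placeholders rather than the tuned ones. The helper name `summarize` is likewise illustrative.

```python
# Collect test-set metrics for several fitted models into one sorted DataFrame.
import pandas as pd
from sklearn.datasets import load_wine
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Hypothetical stand-ins for the tuned estimators stored in best_models.
best_models = {
    "DecisionTree": DecisionTreeClassifier(random_state=0).fit(X_train, y_train),
    "KNN": KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train),
}

def summarize(models, X_test, y_test):
    """Return one row per model with accuracy and macro F1, best accuracy first."""
    rows = []
    for name, model in models.items():
        y_pred = model.predict(X_test)
        rows.append({
            "model": name,
            "accuracy": accuracy_score(y_test, y_pred),
            "macro_f1": f1_score(y_test, y_pred, average="macro"),
        })
    return (pd.DataFrame(rows)
            .sort_values("accuracy", ascending=False)
            .reset_index(drop=True))

summary = summarize(best_models, X_test, y_test)
print(summary)
```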

In [14]:
print("Summary of best models and their performance:")
for model_name in best_models:
    print(f"{model_name}:")
    y_pred = best_models[model_name].predict(X_test)
    print(f"Accuracy: {accuracy_score(y_test, y_pred)}")
    print(classification_report(y_test, y_pred))
    print()


Summary of best models and their performance:
LogisticRegression:
Accuracy: 0.9722222222222222
              precision    recall  f1-score   support

           0       1.00      0.93      0.96        14
           1       0.93      1.00      0.97        14
           2       1.00      1.00      1.00         8

    accuracy                           0.97        36
   macro avg       0.98      0.98      0.98        36
weighted avg       0.97      0.97      0.97        36


DecisionTree:
Accuracy: 0.9444444444444444
              precision    recall  f1-score   support

           0       1.00      0.93      0.96        14
           1       0.88      1.00      0.93        14
           2       1.00      0.88      0.93         8

    accuracy                           0.94        36
   macro avg       0.96      0.93      0.94        36
weighted avg       0.95      0.94      0.94        36


RandomForest:
Accuracy: 1.0
              precision    recall  f1-score   support

           0   