In [1]:
import pandas as pd
import numpy as np

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score


In [2]:
data = load_breast_cancer()

X = data.data
y = data.target


In [3]:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)


In [4]:
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('svm', SVC())
])


In [5]:
# 4 x 2 x 2 = 16 candidates. Note: gamma has no effect when kernel='linear'.
param_grid = {
    'svm__C': [0.1, 1, 10, 100],
    'svm__kernel': ['linear', 'rbf'],
    'svm__gamma': ['scale', 'auto']
}


In [6]:
grid = GridSearchCV(
    estimator=pipeline,
    param_grid=param_grid,
    cv=5,
    scoring='f1',
    n_jobs=-1
)

grid.fit(X_train, y_train)


In [7]:
print("Best Parameters:", grid.best_params_)


Best Parameters: {'svm__C': 1, 'svm__gamma': 'scale', 'svm__kernel': 'rbf'}


In [8]:
best_model = grid.best_estimator_

y_pred_tuned = best_model.predict(X_test)

print("Tuned Model Performance:")
print("Accuracy:", accuracy_score(y_test, y_pred_tuned))
print("Precision:", precision_score(y_test, y_pred_tuned))
print("Recall:", recall_score(y_test, y_pred_tuned))
print("F1 Score:", f1_score(y_test, y_pred_tuned))


Tuned Model Performance:
Accuracy: 0.9824561403508771
Precision: 0.9726027397260274
Recall: 1.0
F1 Score: 0.9861111111111112


In [9]:
# Baseline: SVC() with its defaults (C=1.0, kernel='rbf', gamma='scale').
default_model = Pipeline([
    ('scaler', StandardScaler()),
    ('svm', SVC())
])

default_model.fit(X_train, y_train)

y_pred_default = default_model.predict(X_test)

print("Default Model F1:", f1_score(y_test, y_pred_default))
print("Tuned Model F1:", f1_score(y_test, y_pred_tuned))


Default Model F1: 0.9861111111111112
Tuned Model F1: 0.9861111111111112


In [11]:
comparison = pd.DataFrame({
    "Model": ["Default SVM", "Tuned SVM"],
    "F1 Score": [
        f1_score(y_test, y_pred_default),
        f1_score(y_test, y_pred_tuned)
    ]
})

comparison


         Model  F1 Score
0  Default SVM  0.986111
1    Tuned SVM  0.986111


# Hyperparameter Tuning using GridSearchCV


## Objective

The objective of this task is to tune the hyperparameters of an SVM classifier with GridSearchCV and check whether tuning improves on the default configuration.

Hyperparameter tuning helps identify the best configuration for a machine learning model.


## Dataset Description

The Breast Cancer dataset bundled with scikit-learn (`load_breast_cancer`) was used for this task.

- Problem Type: Binary Classification
- Target Variable: Tumor diagnosis (0 = Malignant, 1 = Benign)
- Features: 30 numerical tumor characteristics
- Samples: 569


## Why Hyperparameter Tuning?

Machine learning models have hyperparameters that control how they learn.

Instead of relying on default values, GridSearchCV evaluates every combination in a user-defined grid with cross-validation and selects the one with the best average score.


## Train-Test Split

The dataset was split into training (80%) and testing (20%) data.

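Hyperparameter tuning was performed only on the training data. The test data played no part in selecting the hyperparameters, which prevents data leakage and keeps the test metrics an unbiased estimate of performance on unseen data.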
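As a quick sanity check on the split, the sketch below reproduces it with the same `test_size` and `random_state` as the notebook and prints the resulting shapes and class counts. The use of `np.bincount` to inspect the class balance is an addition for illustration and is not part of the original notebook.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X, y = data.data, data.target

# Same split as the notebook: 20% held out, fixed seed for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print("Train shape:", X_train.shape)
print("Test shape:", X_test.shape)

# Class counts per split (index 0 = malignant, index 1 = benign).
print("Train class counts:", np.bincount(y_train))
print("Test class counts:", np.bincount(y_test))
```

If the class proportions differ noticeably between the two splits, passing `stratify=y` to `train_test_split` is one option to consider. The notebook does not do this, and doing so would change the reported numbers.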


## Pipeline Creation

A pipeline was created to combine:

1. StandardScaler (for feature scaling)
2. Support Vector Machine (SVM) classifier

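Because the scaler is part of the pipeline, it is re-fit on the training portion of each cross-validation fold. The validation portion is scaled with statistics it did not contribute to, so no information leaks from validation data into preprocessing.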
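The behaviour of the pipeline during cross-validation can be sketched as follows. This is a minimal illustration rather than part of the original notebook: `cross_val_score` stands in for the cross-validation loop that GridSearchCV runs for each candidate, and the final check simply confirms that the scaler's statistics come from the training data alone.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("svm", SVC()),
])

# The pipeline is cloned for every fold, so the scaler is re-fit on
# that fold's training portion only.
scores = cross_val_score(pipeline, X_train, y_train, cv=5, scoring="f1")
print("Per-fold F1:", scores)

# Fitting the pipeline directly learns the scaling statistics from X_train alone.
pipeline.fit(X_train, y_train)
scaler_means = pipeline.named_steps["scaler"].mean_
print("Scaler means match training means:",
      np.allclose(scaler_means, X_train.mean(axis=0)))
```

Scaling the full dataset before splitting would instead let test-set statistics influence the features the model is trained on.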


## Defining Hyperparameter Grid

The following hyperparameters were tuned:

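- C (Regularization parameter): 0.1, 1, 10, 100
- Kernel type: linear, rbf
- Gamma (Kernel coefficient): scale, auto

GridSearchCV evaluated all 4 × 2 × 2 = 16 combinations. Because gamma has no effect with the linear kernel, only 12 of these are distinct models.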
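The size of the grid can be checked with scikit-learn's `ParameterGrid`. The second grid below, `split_grid`, is a hypothetical alternative that is not used in the notebook: it lists `gamma` only for the `rbf` kernel, since `SVC` ignores `gamma` when the kernel is linear.

```python
from sklearn.model_selection import ParameterGrid

# The grid used in the notebook.
param_grid = {
    "svm__C": [0.1, 1, 10, 100],
    "svm__kernel": ["linear", "rbf"],
    "svm__gamma": ["scale", "auto"],
}
n_candidates = len(ParameterGrid(param_grid))
print("Candidates in the notebook's grid:", n_candidates)

# Hypothetical alternative: a list of dicts, so that gamma is only
# varied for the kernel that actually uses it.
split_grid = [
    {"svm__kernel": ["linear"], "svm__C": [0.1, 1, 10, 100]},
    {
        "svm__kernel": ["rbf"],
        "svm__C": [0.1, 1, 10, 100],
        "svm__gamma": ["scale", "auto"],
    },
]
n_distinct = len(ParameterGrid(split_grid))
print("Candidates without redundant linear/gamma pairs:", n_distinct)
```

GridSearchCV accepts either form for `param_grid`. The list-of-dicts form avoids fitting the same linear model twice per value of C.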


## GridSearchCV with Cross-Validation

GridSearchCV was applied with 5-fold cross-validation.

This means that, for each hyperparameter combination:
- The training data was split into 5 folds.
- The pipeline was trained 5 times, each time on 4 folds and validated on the remaining fold.
- The 5 validation scores were averaged.

The best combination of hyperparameters was selected based on F1 Score.
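The per-candidate averages described above are stored in the `cv_results_` attribute of the fitted search. The sketch below repeats the notebook's search and lists the top-ranked candidates. It leaves `n_jobs` at its default rather than `-1`, which is a simplification for portability and does not affect the scores.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

pipeline = Pipeline([("scaler", StandardScaler()), ("svm", SVC())])
param_grid = {
    "svm__C": [0.1, 1, 10, 100],
    "svm__kernel": ["linear", "rbf"],
    "svm__gamma": ["scale", "auto"],
}

grid = GridSearchCV(pipeline, param_grid, cv=5, scoring="f1")
grid.fit(X_train, y_train)

# One row per candidate; mean_test_score is the average over the 5 folds.
results = pd.DataFrame(grid.cv_results_)
columns = [
    "param_svm__C", "param_svm__kernel", "param_svm__gamma",
    "mean_test_score", "std_test_score", "rank_test_score",
]
print(results[columns].sort_values("rank_test_score").head())

# Total number of cross-validation fits: candidates x folds.
total_fits = len(results) * grid.n_splits_
print("Cross-validation fits:", total_fits)
```

Looking at `std_test_score` alongside the mean helps judge whether the gap between the top candidates is larger than the fold-to-fold variation. With an integer `cv` and a classifier, scikit-learn uses stratified folds.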


## Best Parameters

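GridSearchCV selected `C=1`, `kernel='rbf'`, and `gamma='scale'` as the combination with the highest mean cross-validated F1 Score.

After the search, GridSearchCV refit the pipeline with these parameters on the full training set. This refitted pipeline (`best_estimator_`) is the tuned model.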


## Model Evaluation

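The tuned model was evaluated on the held-out test set using the following metrics, with the benign class (label 1, scikit-learn's default positive class) treated as positive: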

- Accuracy
- Precision
- Recall
- F1 Score
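To make the relationship between these metrics concrete, the sketch below rebuilds them from confusion-matrix counts. The counts (41 true negatives, 2 false positives, 0 false negatives, 71 true positives) are not printed by the notebook. They are back-calculated from the reported accuracy, precision, and recall on the 114 test samples, and should be read as an inference. scikit-learn treats label 1 (benign) as the positive class by default, so recall for the malignant class requires `pos_label=0`.

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score, confusion_matrix, f1_score, precision_score, recall_score
)

# Counts inferred from the reported test metrics (an assumption, see above).
tn, fp, fn, tp = 41, 2, 0, 71

# Rebuild label vectors with those counts (0 = malignant, 1 = benign).
y_true = np.array([0] * (tn + fp) + [1] * (fn + tp))
y_pred = np.array([0] * tn + [1] * fp + [0] * fn + [1] * tp)

# Rows are true labels, columns are predicted labels.
print(confusion_matrix(y_true, y_pred))

accuracy = accuracy_score(y_true, y_pred)    # (tp + tn) / total
precision = precision_score(y_true, y_pred)  # tp / (tp + fp)
recall = recall_score(y_true, y_pred)        # tp / (tp + fn)
f1 = f1_score(y_true, y_pred)                # 2 * tp / (2 * tp + fp + fn)
print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1 Score:", f1)

# Recall for the malignant class: tn / (tn + fp).
recall_malignant = recall_score(y_true, y_pred, pos_label=0)
print("Recall (malignant):", recall_malignant)
```

Under these counts, the two errors are malignant tumors predicted as benign. A recall of 1.0 for the benign class therefore does not mean that every malignant case was detected.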


## Performance Comparison

The performance of the default SVM model and the tuned SVM model was compared.

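In this case, both models achieved an identical F1 Score of 0.9861 on the test set.

This is expected: the best parameters found by the grid search (`C=1`, `kernel='rbf'`, `gamma='scale'`) are exactly the defaults of `SVC`, so the tuned and default pipelines are the same model trained on the same data.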
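The best parameters printed in cell [7] can be compared against the defaults of `SVC` directly. The sketch below hard-codes the printed values instead of rerunning the search, and assumes a scikit-learn version (0.22 or later) in which `gamma` defaults to `'scale'`.

```python
from sklearn.svm import SVC

# Best parameters as printed by the notebook's grid search (cell [7]).
best_params = {"svm__C": 1, "svm__gamma": "scale", "svm__kernel": "rbf"}

# Default hyperparameters of an unconfigured SVC.
defaults = SVC().get_params()

# Strip the "svm__" pipeline prefix and compare each value with the default.
matches = {
    name: defaults[name.split("__", 1)[1]] == value
    for name, value in best_params.items()
}
print(matches)

all_default = all(matches.values())
print("Tuned model uses only default values:", all_default)
```

When every selected value matches its default, identical test scores for the two models are a direct consequence rather than a coincidence.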


## Key Insight

Hyperparameter tuning does not always significantly improve performance.

When the default values already suit the data, as they do here, a grid search can end up selecting them.

Even then, GridSearchCV is useful: it provides cross-validated evidence that the chosen configuration is the best among the candidates tested.
