# ML Model Training, Evaluation, and Hyperparameter Tuning

This notebook trains and evaluates multiple ML models on scikit-learn's **breast_cancer** dataset:

Question: Train multiple machine learning models and evaluate their performance using metrics such as accuracy, precision, recall, and F1-score.
Implement hyperparameter tuning techniques like GridSearchCV and RandomizedSearchCV to optimize model parameters.
Analyze the results to select the best-performing model.

# 1. Importing Required Libraries

In [3]:
import pandas as pd
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, GridSearchCV, RandomizedSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier


# 2. Loading the Breast Cancer Dataset from scikit-learn

In [8]:
data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = pd.Series(data.target)
X.head()


,mean radius,mean texture,mean perimeter,mean area,mean smoothness,mean compactness,mean concavity,mean concave points,mean symmetry,mean fractal dimension,...,worst radius,worst texture,worst perimeter,worst area,worst smoothness,worst compactness,worst concavity,worst concave points,worst symmetry,worst fractal dimension
0,17.99,10.38,122.8,1001.0,0.1184,0.2776,0.3001,0.1471,0.2419,0.07871,...,25.38,17.33,184.6,2019.0,0.1622,0.6656,0.7119,0.2654,0.4601,0.1189
1,20.57,17.77,132.9,1326.0,0.08474,0.07864,0.0869,0.07017,0.1812,0.05667,...,24.99,23.41,158.8,1956.0,0.1238,0.1866,0.2416,0.186,0.275,0.08902
2,19.69,21.25,130.0,1203.0,0.1096,0.1599,0.1974,0.1279,0.2069,0.05999,...,23.57,25.53,152.5,1709.0,0.1444,0.4245,0.4504,0.243,0.3613,0.08758
3,11.42,20.38,77.58,386.1,0.1425,0.2839,0.2414,0.1052,0.2597,0.09744,...,14.91,26.5,98.87,567.7,0.2098,0.8663,0.6869,0.2575,0.6638,0.173
4,20.29,14.34,135.1,1297.0,0.1003,0.1328,0.198,0.1043,0.1809,0.05883,...,22.54,16.67,152.2,1575.0,0.1374,0.205,0.4,0.1625,0.2364,0.07678


# 3. Scaling Features and Splitting into Train and Test Sets

In [6]:
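# Standardize every feature to zero mean and unit variance.
# Note: the scaler is fit on the full dataset here, so test-set statistics
# influence the scaling (a mild form of data leakage).
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Hold out 20% of the rows for testing; stratify keeps the class ratio equal
# in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.2, random_state=42, stratify=y
)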

print("Data Loaded. X shape:", X.shape, "y distribution:", y.value_counts().to_dict())


Data Loaded. X shape: (569, 30) y distribution: {1: 357, 0: 212}
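
The cell above fits the scaler before splitting. As an alternative, here is a minimal sketch that splits first and lets a scikit-learn `Pipeline` fit `StandardScaler` on the training rows only. The variable names (`X_raw`, `pipe`, and so on) and the choice of `LogisticRegression(max_iter=1000)` are illustrative assumptions, not part of the original notebook, and the resulting scores may differ slightly from those reported here.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X_raw, y_raw = load_breast_cancer(return_X_y=True)

# Split first, so the test rows never influence the scaler's mean and variance.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_raw, y_raw, test_size=0.2, random_state=42, stratify=y_raw
)

# The pipeline fits StandardScaler on the training rows only, then reuses
# those statistics when transforming the test rows at prediction time.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipe.fit(X_tr, y_tr)

test_accuracy = pipe.score(X_te, y_te)
print("Test accuracy (scaler fit on training data only):", test_accuracy)
```

Passing such a pipeline to a cross-validated search has the same benefit: the scaler is refit inside each fold.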


# 4. Training and Evaluating Multiple Models

In [36]:
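# All models use default hyperparameters. RandomForestClassifier has no
# random_state set, so its scores can vary slightly between runs.
models = {
    'Logistic Regression': LogisticRegression(),
    'Random Forest': RandomForestClassifier(),
    'SVM': SVC(),
    'KNN': KNeighborsClassifier()
}

# Fit each model on the training split and score it on the held-out test split.
# precision, recall, and F1 use pos_label=1 by default, which is the
# benign class in this dataset (0 = malignant, 1 = benign).
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    print(f"\n{name} :")
    print("Accuracy:", accuracy_score(y_test, y_pred))
    print("Precision:", precision_score(y_test, y_pred))
    print("Recall:", recall_score(y_test, y_pred))
    print("F1 Score:", f1_score(y_test, y_pred))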


Logistic Regression :
Accuracy: 0.9824561403508771
Precision: 0.9861111111111112
Recall: 0.9861111111111112
F1 Score: 0.9861111111111112

Random Forest :
Accuracy: 0.956140350877193
Precision: 0.958904109589041
Recall: 0.9722222222222222
F1 Score: 0.9655172413793104

SVM :
Accuracy: 0.9824561403508771
Precision: 0.9861111111111112
Recall: 0.9861111111111112
F1 Score: 0.9861111111111112

KNN :
Accuracy: 0.9649122807017544
Precision: 0.9594594594594594
Recall: 0.9861111111111112
F1 Score: 0.9726027397260274
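
The scores above treat label 1 (benign) as the positive class, since `load_breast_cancer` encodes `target_names` as `['malignant', 'benign']`. When missed malignant cases are the main concern, a per-class view can help. The following is a rough sketch under that assumption; the helper names (`bc`, `clf`, `malignant_recall`) and the choice of logistic regression are illustrative and do not come from the notebook.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

bc = load_breast_cancer()
X_tr, X_te, y_tr, y_te = train_test_split(
    bc.data, bc.target, test_size=0.2, random_state=42, stratify=bc.target
)

clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_tr, y_tr)
y_hat = clf.predict(X_te)

# Rows are true labels, columns are predicted labels, ordered [malignant, benign].
cm = confusion_matrix(y_te, y_hat, labels=[0, 1])
print(cm)

# Recall for the malignant class: the share of malignant cases the model caught.
malignant_recall = recall_score(y_te, y_hat, pos_label=0)
print("Malignant recall:", malignant_recall)

# Precision, recall, and F1 for both classes in one table.
print(classification_report(y_te, y_hat, target_names=bc.target_names))
```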


# 5. GridSearchCV for SVM

In [37]:
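# Candidate values for the search. gamma is ignored by the linear kernel,
# so the linear candidates that differ only in gamma are duplicates.
param_grid = {
    'C': [0.1, 1, 10],
    'gamma': [1, 0.1, 0.01],
    'kernel': ['rbf', 'linear']
}

# 5-fold cross-validation on the training set. With no `scoring` argument,
# candidates are ranked by accuracy (SVC's default score), not F1.
grid = GridSearchCV(SVC(), param_grid, cv=5)
grid.fit(X_train, y_train)

# grid.predict uses the best candidate, refit on the whole training set.
print("\nBest parameters for SVM:", grid.best_params_)
y_pred_grid = grid.predict(X_test)
print("SVM (GridSearch) F1 Score:", f1_score(y_test, y_pred_grid))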


Best parameters for SVM: {'C': 10, 'gamma': 0.01, 'kernel': 'rbf'}
SVM (GridSearch) F1 Score: 0.9861111111111112
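
One possible refinement of the grid search, sketched here under a few assumptions: the grid is split into one dictionary per kernel so that `gamma` is only varied for `rbf`, the search ranks candidates with `scoring='f1'`, and scaling happens inside a pipeline so it is refit in every fold. The `svc__` prefixes follow the step name that `make_pipeline` assigns automatically. The names `svm_pipe`, `split_grid`, and `svm_search` are illustrative, not from the notebook.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X_raw, y_raw = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_raw, y_raw, test_size=0.2, random_state=42, stratify=y_raw
)

svm_pipe = make_pipeline(StandardScaler(), SVC())

# One sub-grid per kernel: 3 linear candidates + 9 rbf candidates.
split_grid = [
    {'svc__kernel': ['linear'], 'svc__C': [0.1, 1, 10]},
    {'svc__kernel': ['rbf'], 'svc__C': [0.1, 1, 10], 'svc__gamma': [1, 0.1, 0.01]},
]

svm_search = GridSearchCV(svm_pipe, split_grid, cv=5, scoring='f1')
svm_search.fit(X_tr, y_tr)

# cv_results_ holds the cross-validated score of every candidate,
# which shows how close the runners-up are to the winner.
results = pd.DataFrame(svm_search.cv_results_)
cols = ['params', 'mean_test_score', 'std_test_score', 'rank_test_score']
print(results[cols].sort_values('rank_test_score').head())
print("Best cross-validated F1:", svm_search.best_score_)
```

Comparing `best_score_` (cross-validated on the training set) with the test-set F1 gives a rough check on whether the tuned model generalizes.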


# 6. RandomizedSearchCV for Random Forest

In [38]:
param_dist = {
    'n_estimators': [10, 50, 100, 150],
    'max_depth': [None, 10, 20, 30],
    'min_samples_split': [2, 5, 10]
}

# Sample 10 of the 48 possible combinations and score each with 5-fold CV.
rand_search = RandomizedSearchCV(
    RandomForestClassifier(),
    param_distributions=param_dist,
    n_iter=10,
    cv=5,
    random_state=42,
)
rand_search.fit(X_train, y_train)

print("\nBest parameters for Random Forest:", rand_search.best_params_)
y_pred_rand = rand_search.predict(X_test)
print("Random Forest (RandomizedSearch) F1 Score:", f1_score(y_test, y_pred_rand))


Best parameters for Random Forest: {'n_estimators': 100, 'min_samples_split': 2, 'max_depth': 20}
Random Forest (RandomizedSearch) F1 Score: 0.9655172413793104
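
`RandomizedSearchCV` also accepts distributions in place of fixed lists, which lets it sample values between the listed points. The sketch below assumes `scipy.stats.randint` ranges that cover the same intervals as the lists above, sets `random_state` on the forest for repeatable runs, and ranks candidates by F1. These choices, and the names `dist` and `rf_search`, are illustrative rather than taken from the notebook.

```python
from scipy.stats import randint
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split

X_raw, y_raw = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_raw, y_raw, test_size=0.2, random_state=42, stratify=y_raw
)

# randint(low, high) samples integers from low up to high - 1.
dist = {
    'n_estimators': randint(10, 151),      # 10 to 150 trees
    'max_depth': [None, 10, 20, 30],       # lists are sampled uniformly
    'min_samples_split': randint(2, 11),   # 2 to 10 samples
}

rf_search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),  # fixed seed for repeatable forests
    param_distributions=dist,
    n_iter=10,
    cv=5,
    scoring='f1',
    random_state=42,
)
rf_search.fit(X_tr, y_tr)

rf_f1 = f1_score(y_te, rf_search.predict(X_te))
print("Best parameters:", rf_search.best_params_)
print("Best cross-validated F1:", rf_search.best_score_)
print("Test F1:", rf_f1)
```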
