# Objective

- Use GridSearchCV to optimize Logistic Regression hyperparameters (`C`, `penalty`, `solver`).
- Compare performance against the baseline model.
- Record execution time for analysis.

**Why GridSearch?**
- GridSearchCV evaluates every combination in a predefined parameter grid with cross-validation and keeps the combination with the best mean validation score.

`Import Libraries`

In [11]:
import pandas as pd
import time
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import MinMaxScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

In [13]:
# See 02_Data_Preprocessing_and_Feature_Scaling for an in-depth analysis of feature scaling and data preprocessing

In [15]:
# Load dataset
df = pd.read_csv('C:/Users/ajayr/Desktop/Projects to upload/Metaheuristic_Optimization_for_Logistic_Regression/data/data.csv')

# Drop irrelevant columns
if 'id' in df.columns:
    df.drop(columns=['id'], inplace=True)

# Encode target variable
df['diagnosis'] = df['diagnosis'].map({'M': 1, 'B': 0})

# Split features and target
X = df.drop(columns=['diagnosis'])
y = df['diagnosis']

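# Normalize features using MinMaxScaler
# Note: the scaler is fit on the full dataset before the split, so test-set min/max values
# influence the scaling; fit it on the training split only for a stricter evaluation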
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)

In [48]:
# Train-Test Split
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.2, random_state=42, stratify=y)
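Because the scaler above is fit on all rows before the split, the test rows affect the scaling. A minimal sketch of a leakage-free variant follows. It assumes synthetic data from `make_classification` as a stand-in for the notebook's CSV and tunes only `C` for brevity; the step names `scaler` and `clf` and the `*_demo` variables are illustrative, not part of the original notebook.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the notebook's dataset (assumption, not the real data)
X_demo, y_demo = make_classification(n_samples=300, n_features=10, random_state=42)

# Split first, so the test rows are never seen while fitting the scaler
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo)

# Inside GridSearchCV the pipeline refits the scaler on the training part of every fold
pipe = Pipeline([
    ("scaler", MinMaxScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Pipeline parameters are addressed as <step name>__<parameter>
search = GridSearchCV(pipe, {"clf__C": [0.1, 1, 10]}, cv=5, scoring="accuracy")
search.fit(X_tr, y_tr)

# score() applies the fitted scaler to the test rows, then evaluates accuracy
test_accuracy = search.score(X_te, y_te)
print(search.best_params_, round(test_accuracy, 4))
```

With this layout the held-out rows are only transformed, never used to fit the scaler, so the reported test accuracy is not influenced by test-set feature ranges.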

`Define Parameter Grid for Logistic Regression`

In [29]:
param_grid = {
    'C': [0.01, 0.1, 1, 10, 100],
    'penalty': ['l1', 'l2'],
    'solver': ['liblinear', 'saga']
}
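To make the cost of this grid concrete, the sketch below enumerates its combinations using only the standard library. The names `grid_demo`, `combos` and `n_folds` are illustrative; `n_folds = 5` mirrors the `cv=5` setting used in the search cell of this notebook.

```python
from itertools import product

grid_demo = {
    'C': [0.01, 0.1, 1, 10, 100],
    'penalty': ['l1', 'l2'],
    'solver': ['liblinear', 'saga'],
}

# Cartesian product of all value lists: one dict per candidate setting
keys = list(grid_demo)
combos = [dict(zip(keys, values)) for values in product(*grid_demo.values())]

n_folds = 5  # mirrors cv=5 in the GridSearchCV call
n_cv_fits = len(combos) * n_folds

print(len(combos))  # 20 candidate settings (5 * 2 * 2)
print(n_cv_fits)    # 100 cross-validation fits
print(combos[0])    # {'C': 0.01, 'penalty': 'l1', 'solver': 'liblinear'}
```

With the default `refit=True`, GridSearchCV performs one additional fit of the best setting on the full training set after these cross-validation fits.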


`Apply GridSearchCV and Measure Time`

In [32]:
# Initialize model
log_reg = LogisticRegression(max_iter=1000, random_state=42)

# Initialize GridSearchCV
grid = GridSearchCV(estimator=log_reg, param_grid=param_grid, cv=5, scoring='accuracy', n_jobs=-1)

# Measure execution time
start_time = time.time()
grid.fit(X_train, y_train)
end_time = time.time()
execution_time = end_time - start_time

print(f"GridSearchCV Training Time: {execution_time:.4f} seconds")

GridSearchCV Training Time: 5.4559 seconds
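`time.time()` reads the wall clock, which can be adjusted while the code runs; `time.perf_counter()` is the standard-library clock intended for measuring short durations. A small sketch of a reusable timer follows. The `timed` helper and the `timings` dict are hypothetical names, not part of the notebook, and the summation is only a stand-in workload.

```python
import time
from contextlib import contextmanager

timings = {}

@contextmanager
def timed(label):
    """Store the elapsed seconds of the wrapped block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start

# Stand-in workload; in the notebook this would wrap grid.fit(X_train, y_train)
with timed("demo"):
    total = sum(i * i for i in range(100_000))

print(f"demo: {timings['demo']:.4f} seconds")
```

Keeping the results in one dict makes it easier to compare the baseline, grid search, and later optimizers under the same measurement method.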


`Best Parameters and Model Performance`

In [35]:
# Best parameters
print("Best Parameters found by GridSearchCV:", grid.best_params_)

# Predict using best estimator
y_pred = grid.best_estimator_.predict(X_test)

# Accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy after GridSearchCV: {accuracy:.4f}")

# Classification report
print("\nClassification Report:\n", classification_report(y_test, y_pred))

# Confusion Matrix
print("\nConfusion Matrix:\n", confusion_matrix(y_test, y_pred))


Best Parameters found by GridSearchCV: {'C': 100, 'penalty': 'l2', 'solver': 'liblinear'}
Accuracy after GridSearchCV: 0.9737

Classification Report:
               precision    recall  f1-score   support

           0       0.97      0.99      0.98        72
           1       0.98      0.95      0.96        42

    accuracy                           0.97       114
   macro avg       0.97      0.97      0.97       114
weighted avg       0.97      0.97      0.97       114


Confusion Matrix:
 [[71  1]
 [ 2 40]]
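The report figures can be checked by hand from the confusion matrix. The sketch below assumes scikit-learn's layout, where rows are true labels and columns are predicted labels, so the matrix reads `[[TN, FP], [FN, TP]]` with malignant (1) as the positive class. The variable names are illustrative.

```python
# Confusion matrix from the output above: rows = true class, columns = predicted class
tn, fp = 71, 1   # benign cases: correctly classified, flagged as malignant
fn, tp = 2, 40   # malignant cases: missed, correctly classified

acc_check = (tp + tn) / (tp + tn + fp + fn)
precision_malignant = tp / (tp + fp)
recall_malignant = tp / (tp + fn)
precision_benign = tn / (tn + fn)
recall_benign = tn / (tn + fp)

print(f"accuracy            = {acc_check:.4f}")            # 0.9737
print(f"malignant precision = {precision_malignant:.2f}")  # 0.98
print(f"malignant recall    = {recall_malignant:.2f}")     # 0.95
print(f"benign precision    = {precision_benign:.2f}")     # 0.97
print(f"benign recall       = {recall_benign:.2f}")        # 0.99
```

The two false negatives are the missed malignant cases, which is why malignant recall (40 of 42) is the metric to watch here.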


# Conclusion

The optimized Logistic Regression model using GridSearchCV achieved an accuracy of 97.37%, similar to the baseline but with a slight improvement in recall for malignant cases.

- **Best Parameters:** {'C': 100, 'penalty': 'l2', 'solver': 'liblinear'}
- **Class 0 (Benign):** Precision = 0.97, Recall = 0.99 → Almost all benign cases correctly classified.
- **Class 1 (Malignant):** Precision = 0.98, Recall = 0.95 → Slight improvement compared to baseline (recall was 0.93).
- **Confusion Matrix:** [[71, 1], [2, 40]] → Only 2 malignant cases missed (better than 3 previously).
- **Insight:** GridSearchCV improved recall slightly and confirmed that tuning hyperparameters can refine performance, but the overall accuracy gain is minimal compared to baseline.

# Summary

- Implemented **GridSearchCV** to optimize Logistic Regression hyperparameters (`C`, `penalty`, `solver`).
- **Best Parameters:** {'C': 100, 'penalty': 'l2', 'solver': 'liblinear'}.
- **Accuracy:** 97.37% (similar to baseline but with improved recall for malignant cases).
- **Class 1 (Malignant):** Recall improved from 0.93 (baseline) to 0.95 → fewer false negatives.
- **Confusion Matrix:** [[71, 1], [2, 40]] → Only 2 malignant cases missed.
- **Execution Time:** 5.4559 seconds for the grid search in this run, slower than baseline.
- **Insight:** Hyperparameter tuning slightly improved recall for malignant cases, but the overall accuracy gain over the baseline is minimal.
- **Next Step:** Apply **metaheuristic optimization (PSO)** to explore tuning that is not restricted to a predefined grid, for potential improvements.