# Hybrid Optimization Approach: ACO + GridSearchCV

# Objective 

Combine the global search capability of **Ant Colony Optimization (ACO)** with the local refinement of **GridSearchCV** to tune the Logistic Regression regularization parameter `C` efficiently.

**Why Hybrid?** 
- ACO explores the parameter space globally, which lowers the risk of settling on a poor local optimum.  
- GridSearchCV performs local fine-tuning around the best candidate from ACO.  

The aim is to keep the total number of model fits small while still checking the neighborhood of the best candidate in detail.
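
The two-stage idea can be sketched on a toy objective before any model is involved. In the sketch below, plain random sampling stands in for the metaheuristic stage (it is not ACO), and `coarse_then_fine` and `toy_score` are hypothetical names introduced only for illustration.

```python
import random

def coarse_then_fine(score, lb, ub, n_coarse=10, deltas=(-2, -1, 0, 1, 2), seed=0):
    """Stage 1: sample candidates across [lb, ub]; stage 2: small grid around the best."""
    rng = random.Random(seed)
    # Stage 1 (stand-in for the metaheuristic): random candidates over the full range
    coarse = [rng.uniform(lb, ub) for _ in range(n_coarse)]
    best_coarse = max(coarse, key=score)
    # Stage 2 (stand-in for GridSearchCV): additive grid around the coarse best
    fine = [best_coarse + d for d in deltas if lb <= best_coarse + d <= ub]
    best_fine = max(fine, key=score)
    return best_coarse, best_fine

def toy_score(c):
    # Toy objective with its maximum at c = 30 (not a real accuracy)
    return -(c - 30.0) ** 2

best_coarse, best_fine = coarse_then_fine(toy_score, 0.01, 100)
print(f"coarse best: {best_coarse:.3f}, refined best: {best_fine:.3f}")
```

Because the offset `0` is part of the grid, the refined candidate can never score worse than the coarse one on the same objective.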

`Import Libraries`

In [9]:
import pandas as pd
import numpy as np
import time
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import MinMaxScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

`Load and Preprocess Data`

In [11]:
df = pd.read_csv('C:/Users/ajayr/Desktop/Projects to upload/Metaheuristic_Optimization_for_Logistic_Regression/data/data.csv')

# Drop irrelevant columns
if 'id' in df.columns:
    df.drop(columns=['id'], inplace=True)

# Encode target
df['diagnosis'] = df['diagnosis'].map({'M': 1, 'B': 0})

X = df.drop(columns=['diagnosis'])
y = df['diagnosis']

# Scale features
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.2, random_state=42, stratify=y)

`Step 1: Global Search with ACO`

In [13]:
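def fitness(C_value):
    # Fit on the training split, then score on the test split.
    # Note: ACO picks C using this score, so the test data influences the search
    # and the reported ACO accuracy is an optimistic estimate.
    model = LogisticRegression(C=C_value, max_iter=1000, solver='liblinear')
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    return accuracy_score(y_test, y_pred)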
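
The `fitness` function above scores each candidate on the test split, and the scaler is fitted on the full dataset before splitting, so the search sees information from the test data. One possible alternative, shown here only as a sketch, scores candidates by cross-validation on the training split with the scaler inside a pipeline. `fitness_cv` is a hypothetical name, and the synthetic data from `make_classification` is an assumption so that the example runs without `data.csv`.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

def fitness_cv(C_value, X_tr, y_tr, cv=5):
    """Mean cross-validated accuracy, computed on the training split only."""
    # The scaler sits inside the pipeline, so each fold fits its own scaler
    model = make_pipeline(
        MinMaxScaler(),
        LogisticRegression(C=C_value, max_iter=1000, solver='liblinear'),
    )
    return cross_val_score(model, X_tr, y_tr, cv=cv, scoring='accuracy').mean()

# Synthetic stand-in data (assumption), not the dataset used in this notebook
X_demo, y_demo = make_classification(n_samples=300, n_features=10, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo)

score = fitness_cv(1.0, X_tr, y_tr)
print(f"CV accuracy at C=1.0: {score:.4f}")
```

With this variant the test split is touched only once, in the final evaluation, at the cost of `cv` model fits per candidate instead of one.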

In [14]:
# ACO Parameters
num_ants = 10
num_iterations = 20
lb, ub = 0.01, 100
evaporation_rate = 0.5
discretized_C = np.linspace(lb, ub, 10)  # 10 evenly spaced candidate values for C
pheromone = np.ones(len(discretized_C))  # one pheromone level per candidate, initially uniform
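
The linear grid above places only one candidate (0.01) below `C = 10`, although regularization strength is usually explored across orders of magnitude. The comparison below is a sketch of a log-spaced alternative with the same bounds and the same number of candidates; `linear_C` and `log_C` are names introduced for illustration.

```python
import numpy as np

lb, ub = 0.01, 100
linear_C = np.linspace(lb, ub, 10)                        # spacing used in this notebook
log_C = np.logspace(np.log10(lb), np.log10(ub), 10)       # hypothetical log-spaced alternative

print("linear:", np.round(linear_C, 2))
print("log:   ", np.round(log_C, 2))
print("candidates below C=10:", int((linear_C < 10).sum()), "(linear) vs",
      int((log_C < 10).sum()), "(log)")
```

Both arrays have 10 entries, so either one can be paired with the same pheromone vector.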

In [15]:
best_score = 0
best_C = None

start_time_aco = time.time()

for iteration in range(num_iterations):
    ant_solutions = []
    ant_scores = []
    
    for _ in range(num_ants):
        # Sample a candidate index with probability proportional to its pheromone
        probs = pheromone / pheromone.sum()
        idx = np.random.choice(len(discretized_C), p=probs)
        C_value = discretized_C[idx]
        
        score = fitness(C_value)
        ant_solutions.append(idx)
        ant_scores.append(score)
    
    # Update pheromone: evaporate all trails, then deposit each ant's score on its chosen index
    pheromone = (1 - evaporation_rate) * pheromone
    for idx, score in zip(ant_solutions, ant_scores):
        pheromone[idx] += score
    
    # Track best
    max_idx = np.argmax(ant_scores)
    if ant_scores[max_idx] > best_score:
        best_score = ant_scores[max_idx]
        best_C = discretized_C[ant_solutions[max_idx]]

end_time_aco = time.time()
aco_time = end_time_aco - start_time_aco
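
To make the pheromone update concrete, the sketch below replays one iteration with three candidates and three ants, using the same evaporate-then-deposit rule as the loop above. `update_pheromone` and `selection_probs` are hypothetical helper names, and the scores are made-up illustration values, not results from this notebook.

```python
def update_pheromone(pheromone, chosen, scores, evaporation_rate):
    """Evaporate every trail, then deposit each ant's score on its chosen index."""
    updated = [(1 - evaporation_rate) * p for p in pheromone]
    for idx, score in zip(chosen, scores):
        updated[idx] += score
    return updated

def selection_probs(pheromone):
    """Probability of picking each index, proportional to its pheromone."""
    total = sum(pheromone)
    return [p / total for p in pheromone]

pheromone = [1.0, 1.0, 1.0]
chosen = [0, 0, 2]            # two ants picked index 0, one picked index 2
scores = [0.97, 0.96, 0.98]   # illustrative accuracies

pheromone = update_pheromone(pheromone, chosen, scores, evaporation_rate=0.5)
print([round(p, 2) for p in pheromone])
print([round(p, 3) for p in selection_probs(pheromone)])
```

Index 0 ends with more pheromone than index 2 even though index 2 holds the best score. When all accuracies are close to each other, the deposit mostly reflects how often an index was visited rather than how much better it is.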

In [16]:
print(f"ACO Best C: {best_C:.4f}")
print(f"ACO Best Accuracy: {best_score:.4f}")
print(f"ACO Execution Time: {aco_time:.4f} seconds")

ACO Best C: 11.1200
ACO Best Accuracy: 0.9825
ACO Execution Time: 0.4368 seconds
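
The ACO loop calls `fitness` 200 times (10 ants × 20 iterations), yet only 10 distinct values of `C` exist. Assuming the fit is deterministic for a given `C`, repeated evaluations can be cached. The sketch below uses `functools.lru_cache` with a toy stand-in for the model fit; `expensive_fitness`, `cached_fitness`, and `call_log` are hypothetical names.

```python
from functools import lru_cache

call_log = []  # records every real (uncached) evaluation

def expensive_fitness(C_value):
    """Stand-in for the model-fitting fitness; the score is a toy value."""
    call_log.append(C_value)
    return 1.0 / (1.0 + abs(C_value - 11.12))

@lru_cache(maxsize=None)
def cached_fitness(C_value):
    # Each distinct C value is evaluated once, then served from the cache
    return expensive_fitness(C_value)

candidates = [0.01, 11.12, 22.23]
# 200 requests, cycling through the candidates purely for illustration
for step in range(200):
    cached_fitness(candidates[step % len(candidates)])

print(f"requests: 200, real evaluations: {len(call_log)}")  # real evaluations: 3
```

In the notebook's loop, caching on the candidate index `idx` would serve the same purpose.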


`Step 2: Local Refinement with GridSearchCV`

In [18]:
# Define narrow range around ACO best C
param_grid = {'C': [round(best_C + delta, 2) for delta in [-2, -1, 0, 1, 2] if best_C + delta > 0]}

lr_model = LogisticRegression(max_iter=1000, solver='liblinear')
start_time_grid = time.time()

grid_search = GridSearchCV(lr_model, param_grid, cv=5, scoring='accuracy')
grid_search.fit(X_train, y_train)

end_time_grid = time.time()
grid_time = end_time_grid - start_time_grid

best_params = grid_search.best_params_
best_score_hybrid = grid_search.best_score_
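
The refinement grid uses fixed additive offsets, so its shape depends on where the global search lands. The sketch below reproduces that rule and contrasts it with a multiplicative grid, which is a hypothetical alternative and not part of this notebook; `additive_grid` and `multiplicative_grid` are illustrative names.

```python
def additive_grid(center, deltas=(-2, -1, 0, 1, 2)):
    """Same rule as the cell above: fixed offsets, keeping only positive values."""
    return [round(center + d, 2) for d in deltas if center + d > 0]

def multiplicative_grid(center, factors=(0.25, 0.5, 1, 2, 4)):
    """Hypothetical alternative: scale the center so the grid width tracks C."""
    return [round(center * f, 4) for f in factors]

print(additive_grid(11.12))       # [9.12, 10.12, 11.12, 12.12, 13.12]
print(additive_grid(0.01))        # [0.01, 1.01, 2.01]
print(multiplicative_grid(0.01))  # [0.0025, 0.005, 0.01, 0.02, 0.04]
```

For a small center such as 0.01, the additive grid loses its lower half and jumps two orders of magnitude upward, while the multiplicative grid stays in the neighborhood of the candidate.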

In [19]:
print(f"Hybrid Best Parameters: {best_params}")
print(f"Hybrid Accuracy (CV): {best_score_hybrid:.4f}")
print(f"GridSearch Execution Time: {grid_time:.4f} seconds")

Hybrid Best Parameters: {'C': 9.12}
Hybrid Accuracy (CV): 0.9692
GridSearch Execution Time: 0.0504 seconds


`Final Evaluation`

In [21]:
# Evaluate on test set
final_model = grid_search.best_estimator_
y_pred = final_model.predict(X_test)

print("\nClassification Report:")
print(classification_report(y_test, y_pred))

print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))


Classification Report:
              precision    recall  f1-score   support

           0       0.97      1.00      0.99        72
           1       1.00      0.95      0.98        42

    accuracy                           0.98       114
   macro avg       0.99      0.98      0.98       114
weighted avg       0.98      0.98      0.98       114

Confusion Matrix:
[[72  0]
 [ 2 40]]
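
The headline numbers in the report can be recomputed directly from the confusion matrix, which also shows how coarse the accuracy scale is with 114 test samples. This is a small arithmetic check, assuming scikit-learn's convention of true labels in rows and predicted labels in columns.

```python
cm = [[72, 0],
      [2, 40]]  # rows: true class 0/1, columns: predicted class 0/1

tn, fp = cm[0]
fn, tp = cm[1]
total = tn + fp + fn + tp

accuracy = (tn + tp) / total
precision_0 = tn / (tn + fn)   # of samples predicted 0, fraction truly 0
recall_1 = tp / (tp + fn)      # of true class-1 samples, fraction recovered

print(f"accuracy    = {accuracy:.4f}")     # 0.9825
print(f"precision_0 = {precision_0:.4f}")  # 0.9730
print(f"recall_1    = {recall_1:.4f}")     # 0.9524
print(f"one test sample = {1 / total:.4f} accuracy")  # 0.0088
```

A single misclassified sample shifts test accuracy by roughly 0.9 percentage points, so small accuracy differences between candidates on this split should be read with care.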


# Conclusion

The hybrid approach used ACO for global exploration of `C` and GridSearchCV for local fine-tuning around the ACO candidate, with minimal added complexity. Refinement moved `C` from 11.12 to 9.12 without changing test accuracy: the refined model classifies 112 of 114 test samples correctly (98.25%), the same value ACO reported. Note that the ACO score was computed on the test split during the search, whereas GridSearchCV selected `C` by 5-fold cross-validation on the training data (CV accuracy 0.9692), so the two figures are not directly comparable. The grid search added about 0.05 seconds to ACO's 0.44 seconds. The result illustrates a balanced global-plus-local strategy for hyperparameter tuning; a comparison against a plain exhaustive grid search was not part of this experiment.

# Summary

- Implemented **Hybrid Optimization** by combining:
  - **Global Search:** Ant Colony Optimization (ACO)
  - **Local Refinement:** GridSearchCV
- **ACO Results:**
  - Best C = 11.1200
  - Best Accuracy = 0.9825
  - Execution Time = 0.4368 seconds
- **Hybrid Results:**
  - Best C after refinement = 9.12
  - Cross-Validation Accuracy = 0.9692
  - Test Accuracy = 0.9825 (112 of 114 samples)
  - GridSearch Execution Time = 0.0504 seconds
- **Performance Insight:**
  - Hybrid approach maintained **high test accuracy (98%)**, equal to the accuracy ACO reported.
  - Additional local tuning was **fast and efficient**, adding minimal overhead (0.05 sec).
  - Demonstrates a **robust global + local search strategy** for precise optimization.

# Hybrid Optimization Approach: PSO + GridSearchCV

# Objective

Combine **Particle Swarm Optimization (PSO)** for global search with **GridSearchCV** for local refinement to efficiently tune Logistic Regression's `C` parameter.

`Import Libraries`

In [30]:
import pandas as pd
import numpy as np
import time
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import MinMaxScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix


`Load and Preprocess Data`

In [32]:
df = pd.read_csv('C:/Users/ajayr/Desktop/Projects to upload/Metaheuristic_Optimization_for_Logistic_Regression/data/data.csv')

# Drop irrelevant columns
if 'id' in df.columns:
    df.drop(columns=['id'], inplace=True)

# Encode target
df['diagnosis'] = df['diagnosis'].map({'M': 1, 'B': 0})

X = df.drop(columns=['diagnosis'])
y = df['diagnosis']

# Scale features
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.2, random_state=42, stratify=y)

`Step 1: Global Search using PSO`

In [34]:
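def fitness(C_value):
    # Fit on the training split, then score on the test split.
    # Note: PSO picks C using this score, so the test data influences the search
    # and the reported PSO accuracy is an optimistic estimate.
    model = LogisticRegression(C=C_value, max_iter=1000, solver='liblinear')
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    return accuracy_score(y_test, y_pred)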

In [35]:
# PSO Parameters
num_particles = 10
num_iterations = 20
lb, ub = 0.01, 100
w = 0.7  # inertia weight
c1 = 1.5  # cognitive parameter
c2 = 1.5  # social parameter

In [36]:
# Initialize positions and velocities
positions = np.random.uniform(lb, ub, num_particles)
velocities = np.random.uniform(-1, 1, num_particles)

personal_best_positions = positions.copy()
personal_best_scores = [fitness(pos) for pos in positions]

global_best_position = personal_best_positions[np.argmax(personal_best_scores)]
global_best_score = max(personal_best_scores)

start_time_pso = time.time()

for iteration in range(num_iterations):
    for i in range(num_particles):
        r1, r2 = np.random.rand(), np.random.rand()
        velocities[i] = (
            w * velocities[i]
            + c1 * r1 * (personal_best_positions[i] - positions[i])
            + c2 * r2 * (global_best_position - positions[i])
        )
        positions[i] += velocities[i]
        positions[i] = np.clip(positions[i], lb, ub)

        score = fitness(positions[i])
        if score > personal_best_scores[i]:
            personal_best_scores[i] = score
            personal_best_positions[i] = positions[i]

            if score > global_best_score:
                global_best_score = score
                global_best_position = positions[i]

end_time_pso = time.time()
pso_time = end_time_pso - start_time_pso
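
The loop above draws from NumPy's global random state, so the best `C` can differ from run to run. The sketch below wraps the same velocity and position update in a function with a seeded generator and runs it on a toy objective; `pso_maximize` and `toy_score` are hypothetical names, and the toy objective is not a model accuracy.

```python
import numpy as np

def pso_maximize(score, lb, ub, num_particles=10, num_iterations=20,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """1-D PSO using the same update rule as above, with a seeded generator."""
    rng = np.random.default_rng(seed)
    positions = rng.uniform(lb, ub, num_particles)
    velocities = rng.uniform(-1, 1, num_particles)

    pbest_pos = positions.copy()
    pbest_score = np.array([score(p) for p in positions])
    g = int(np.argmax(pbest_score))
    gbest_pos, gbest_score = float(pbest_pos[g]), float(pbest_score[g])

    for _ in range(num_iterations):
        for i in range(num_particles):
            r1, r2 = rng.random(), rng.random()
            # Inertia + pull toward the particle's own best + pull toward the swarm best
            velocities[i] = (w * velocities[i]
                             + c1 * r1 * (pbest_pos[i] - positions[i])
                             + c2 * r2 * (gbest_pos - positions[i]))
            positions[i] = np.clip(positions[i] + velocities[i], lb, ub)

            s = score(positions[i])
            if s > pbest_score[i]:
                pbest_score[i], pbest_pos[i] = s, positions[i]
                if s > gbest_score:
                    gbest_score, gbest_pos = float(s), float(positions[i])
    return gbest_pos, gbest_score

def toy_score(c):
    # Toy objective with its maximum at c = 30
    return -(c - 30.0) ** 2

best_c, best_s = pso_maximize(toy_score, 0.01, 100)
print(f"best C on toy objective: {best_c:.3f}")
```

Calling `pso_maximize` twice with the same `seed` returns the same result, which makes reported values easier to reproduce.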

In [37]:
print(f"PSO Best C: {global_best_position:.4f}")
print(f"PSO Best Accuracy: {global_best_score:.4f}")
print(f"PSO Execution Time: {pso_time:.4f} seconds")

PSO Best C: 4.9323
PSO Best Accuracy: 0.9825
PSO Execution Time: 0.3206 seconds


`Step 2: Local Refinement with GridSearchCV`

In [39]:
# Define narrow range around PSO best C
param_grid = {'C': [round(global_best_position + delta, 2) for delta in [-2, -1, 0, 1, 2] if global_best_position + delta > 0]}

lr_model = LogisticRegression(max_iter=1000, solver='liblinear')
start_time_grid = time.time()

grid_search = GridSearchCV(lr_model, param_grid, cv=5, scoring='accuracy')
grid_search.fit(X_train, y_train)

end_time_grid = time.time()
grid_time = end_time_grid - start_time_grid

best_params = grid_search.best_params_
best_score_hybrid = grid_search.best_score_

In [40]:
print(f"Hybrid Best Parameters: {best_params}")
print(f"Hybrid Accuracy (CV): {best_score_hybrid:.4f}")
print(f"GridSearch Execution Time: {grid_time:.4f} seconds")

Hybrid Best Parameters: {'C': 3.93}
Hybrid Accuracy (CV): 0.9692
GridSearch Execution Time: 0.0558 seconds
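
Beyond `best_params_`, the fitted search object exposes per-candidate scores in `cv_results_`, which helps judge whether the winning `C` is meaningfully better than its neighbors. The sketch below uses the grid implied by the PSO result (4.93 with offsets -2 to 2) on synthetic stand-in data from `make_classification`, which is an assumption so that the example runs without `data.csv`; its scores are not the notebook's scores.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (assumption), not the dataset used in this notebook
X_demo, y_demo = make_classification(n_samples=300, n_features=10, random_state=42)

param_grid = {'C': [2.93, 3.93, 4.93, 5.93, 6.93]}
search = GridSearchCV(
    LogisticRegression(max_iter=1000, solver='liblinear'),
    param_grid, cv=5, scoring='accuracy')
search.fit(X_demo, y_demo)

means = search.cv_results_['mean_test_score']
stds = search.cv_results_['std_test_score']
for params, mean, std in zip(search.cv_results_['params'], means, stds):
    print(f"C={params['C']:<5} mean={mean:.4f} std={std:.4f}")

spread = means.max() - means.min()
print(f"spread across candidates: {spread:.4f}")
```

If the spread across candidates is small compared with the fold-to-fold standard deviation, the choice among these `C` values is effectively a tie.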


`Final Evaluation`

In [42]:
final_model = grid_search.best_estimator_
y_pred = final_model.predict(X_test)

print("\nClassification Report:")
print(classification_report(y_test, y_pred))

print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))


Classification Report:
              precision    recall  f1-score   support

           0       0.97      1.00      0.99        72
           1       1.00      0.95      0.98        42

    accuracy                           0.98       114
   macro avg       0.99      0.98      0.98       114
weighted avg       0.98      0.98      0.98       114

Confusion Matrix:
[[72  0]
 [ 2 40]]


# Conclusion

The PSO + GridSearchCV hybrid approach pairs global optimization with local tuning and reaches 98% test accuracy in about 0.38 seconds of total search time (0.3206 seconds for PSO plus 0.0558 seconds for the grid search). Refinement moved `C` from 4.93 to 3.93 and left test accuracy unchanged relative to standalone PSO. Its main contribution here is a cross-validated check on the selected `C`, since the PSO score was computed on the test split during the search. This illustrates how a metaheuristic can be combined with a deterministic refinement step at little extra cost.

# Summary

- **Global Search (PSO):**
  - Best C = 4.9323
  - Best Accuracy = 0.9825
  - Execution Time = 0.3206 sec

- **Local Refinement (GridSearchCV):**
  - Best C after refinement = 3.93
  - Cross-Validation Accuracy = 0.9692
  - GridSearch Execution Time = 0.0558 sec

- **Test Performance:**
  - Accuracy = 0.98
  - Classification Report:
    - Precision: 0.97 (Class 0), 1.00 (Class 1)
    - Recall: 1.00 (Class 0), 0.95 (Class 1)
  - Confusion Matrix: [[72, 0], [2, 40]]

**Insight:** The hybrid approach achieved **98% test accuracy** (112 of 114 samples), the same as PSO alone reported. Refinement did not raise accuracy, but it adds a cross-validated check on the selected `C` at minimal additional cost.