# Seeding PSO Optimization with Custom Initial Populations

This notebook demonstrates how to seed a Particle Swarm Optimization (PSO) algorithm with a custom initial population. Seeding is useful when you already have good candidate solutions and want to guide the optimization process.

Some basics about PSO and its implementation in pymoo:
- If you do not pass a custom initial population, pymoo generates one using a sampling method. The default method is Latin Hypercube Sampling (LHS).
- To use a custom initial population, pass it directly via the `sampling` parameter (see documentation: https://pymoo.org/algorithms/soo/pso.html).
- This notebook shows how to prepare and pass custom populations.



#### Step 1: Define initial solutions

We provide two ways to create a custom initial population:
1. `from_data`: This uses the initial solution, i.e. the GTFS feed (plus DRT, if configured) read by `extract_optimization_data` / `extract_optimization_data_with_drt`.
2. List: A list of solutions. This is useful if you have good solutions from a previous run. We provide a utility function, `extract_multiple_gtfs_solutions`, to read these solutions from disk. It is compatible with solutions saved in `3_write_solutions_to_file.ipynb`.

#### Step 2: Add complementary solutions to match the population size

Say you want to run PSO with a population size (`pop_size`) of 20. If you use `sampling`, the size of the custom population must match the algorithm's `pop_size`.
Your custom initial population will very likely (and this is encouraged!) contain fewer solutions than the desired `pop_size`. The remaining solutions are added using a combination of Gaussian perturbation (noise) and LHS.

- Gaussian perturbation means that we take an existing solution and add some noise to it. This is useful when you have a good solution and want to explore the nearby solution space.
- LHS is used to fill in any remaining slots in the population. This ensures diversity in the population.
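The split described above can be sketched as simple arithmetic. This is a minimal sketch, assuming the Gaussian share of the missing slots is rounded to the nearest integer and LHS takes whatever is left; `split_remaining_slots` is a hypothetical helper, not part of the `PopulationBuilder` API.

```python
from typing import Tuple


def split_remaining_slots(pop_size: int, n_base: int, frac_gaussian_pert: float) -> Tuple[int, int]:
    """Return (n_gaussian, n_lhs) needed to top up n_base seed solutions to pop_size.

    Rounding to the nearest integer is an assumption; PopulationBuilder may round differently.
    """
    if n_base > pop_size:
        raise ValueError("More base solutions than pop_size")
    if not 0.0 <= frac_gaussian_pert <= 1.0:
        raise ValueError("frac_gaussian_pert must be in [0, 1]")
    n_missing = pop_size - n_base
    n_gaussian = int(round(frac_gaussian_pert * n_missing))
    n_lhs = n_missing - n_gaussian  # LHS fills everything not covered by perturbations
    return n_gaussian, n_lhs


# pop_size=20 with 3 seeds and 30% Gaussian perturbation
print(split_remaining_slots(20, 3, 0.3))  # (5, 12)
```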

You have to decide:
- The proportion of the remaining solutions to fill using Gaussian perturbation vs LHS.
- The standard deviation of the Gaussian noise. With a standard deviation of 1, you should expect ~68% of the perturbed values to lie within 1 unit of the original value. In our case, we use index values [1, 2, ..., n] to represent discrete choices (headway or DRT fleet size), so I recommend choosing a standard deviation that makes sense relative to the range of index values.
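To make the choice of standard deviation concrete, here is a minimal NumPy sketch of Gaussian perturbation on index-encoded variables. It assumes the 1-based indices described above, rounding to the nearest index, and clipping to the valid range; `gaussian_perturb_indices` is a hypothetical helper, and `PopulationBuilder` may handle rounding and bounds differently.

```python
import numpy as np


def gaussian_perturb_indices(base, n_choices, sigma, rng):
    """Add Gaussian noise to an index-encoded solution and snap it back to valid indices.

    Assumes 1-based indices, i.e. base[i] lies in [1, n_choices[i]].
    """
    base = np.asarray(base, dtype=float)
    noisy = base + rng.normal(loc=0.0, scale=sigma, size=base.shape)
    # Round to the nearest discrete choice, then clip into the allowed range
    return np.clip(np.rint(noisy), 1, np.asarray(n_choices)).astype(int)


rng = np.random.default_rng(0)
base = np.array([3, 1, 6, 4])        # hypothetical headway / fleet-size indices
n_choices = np.array([6, 6, 6, 5])   # e.g. 6 headway options, 5 fleet-size options
child = gaussian_perturb_indices(base, n_choices, sigma=1.0, rng=rng)
print(child)

# Empirical check of the ~68% rule on the raw (unrounded) noise
noise = rng.normal(0.0, 1.0, size=100_000)
print(np.mean(np.abs(noise) <= 1.0))  # roughly 0.68
```

Because of the rounding step, a variable keeps its original index only when its noise is smaller than 0.5 in absolute value. With a standard deviation of 0.5 that happens for roughly 68% of variables (ignoring clipping at the bounds), so the standard deviation is best judged on the index scale rather than in minutes or vehicles.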

The `PopulationBuilder` class then builds a complete population of the desired size.

In [1]:
import os
import sys
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import geopandas as gpd
from pathlib import Path
import json
from typing import Dict, Any, List
import logging

# Add src to path
project_root = os.path.abspath(os.path.join(os.getcwd(), ".."))
src_path = os.path.join(project_root, "src")
if src_path not in sys.path:
    sys.path.insert(0, src_path)

logging.basicConfig(level=logging.INFO)

In [2]:
from transit_opt.optimisation.spatial.boundaries import StudyAreaBoundary

# Load boundary for creating problem later
boundary_gdf = gpd.read_file("../data/external/boundaries/study_area_boundary.geojson")
study_boundary = StudyAreaBoundary(
    boundary_gdf=boundary_gdf,
    crs="EPSG:3857",
    buffer_km=2.0
)

✅ Validated metric CRS: EPSG:3857
🔄 Converting boundary CRS: EPSG:4326 → EPSG:3857
📏 Applied 2.0km buffer
✅ Study area set: 1 polygon(s) in EPSG:3857


# How to load in solutions from disk

Here we show two different ways to load solutions from disk:
1. Load a single solution: This is the standard approach used in other notebooks. We simply load an existing GTFS file.
2. Load multiple solutions: This is useful when you have a set of good solutions from a previous run. We use `extract_multiple_gtfs_solutions`, which extracts GTFS zip files and their corresponding DRT JSON files and links them to create solutions in the correct format.

## Loading in a single base solution with DRT 

In [3]:

from transit_opt.preprocessing.prepare_gtfs import GTFSDataPreparator

# Set up paths
gtfs_path = '../data/external/study_area_gtfs_bus.zip'

print(f"\n📂 GTFS data path: {gtfs_path}")

# Define DRT configuration (one entry per DRT service zone)
drt_config = {
    'enabled': True,
    'target_crs': 'EPSG:3857',
    'default_drt_speed_kmh': 25.0,
    'zones': [
        {
            'zone_id': 'drt_ne',
            'service_area_path': '../data/external/drt/drt_ne.shp',
            'allowed_fleet_sizes': [0, 10, 25, 50, 100],
            'zone_name': 'Leeds NE DRT',
            'drt_speed_kmh': 20.0
        },
        {
            'zone_id': 'drt_nw',
            'service_area_path': '../data/external/drt/drt_nw.shp',
            'allowed_fleet_sizes': [0, 15, 30, 60, 120],
            'zone_name': 'Leeds NW DRT'
        }
    ]
}

print("\n🚁 DRT Configuration:")
print(f"   Zones: {len(drt_config['zones'])}")
for zone in drt_config['zones']:
    print(f"   • {zone['zone_name']}: {zone['allowed_fleet_sizes']}")

# Create preparator and extract PT+DRT optimization data
print("\n🔧 Extracting PT+DRT optimization data...")

preparator = GTFSDataPreparator(
    gtfs_path=gtfs_path,
    interval_hours=6,
    log_level="WARNING"
)

allowed_headways = [10, 15, 30, 60, 120, 240]

# Extract optimization data WITH DRT
opt_data = preparator.extract_optimization_data_with_drt(
    allowed_headways=allowed_headways,
    drt_config=drt_config
)

# Show initial solution structure
print(f"\n📋 Initial Solution Structure:")
print(f"   Format: Flat array for PSO")
print(f"   Shape: {opt_data['initial_solution'].shape}")
print(f"   Contains: PT headway indices + DRT fleet size indices")
print(f"   First 10 values: {opt_data['initial_solution'][:10]}")
print(f"   Last 10 values: {opt_data['initial_solution'][-10:]}")


print("\n✅ Section 1 Complete: Base solution loaded from GTFS with DRT")


📂 GTFS data path: ../data/external/study_area_gtfs_bus.zip

🚁 DRT Configuration:
   Zones: 2
   • Leeds NE DRT: [0, 10, 25, 50, 100]
   • Leeds NW DRT: [0, 15, 30, 60, 120]

🔧 Extracting PT+DRT optimization data...
🔧 EXTRACTING OPTIMIZATION DATA WITH DRT SUPPORT:




   ✅ Base PT data extracted: 147 routes, 4 intervals
   🚁 Adding DRT configuration...
   🔍 Validating DRT configuration...
   ✅ DRT configuration valid: 2 zones
      Target CRS: EPSG:3857
   🗺️ Loading DRT spatial layers...
      Target CRS: EPSG:3857
      Loading zone 1: drt_ne
         Path: ../data/external/drt/drt_ne.shp
         Original CRS: EPSG:3857
         🔄 Converting: EPSG:3857 → EPSG:3857
   DRT Zone drt_ne: 474.95 km², speed 20.0 km/h
         ✅ Loaded: 474.95 km² service area
            CRS: EPSG:3857
            Fleet choices: [0, 10, 25, 50, 100]
      Loading zone 2: drt_nw
         Path: ../data/external/drt/drt_nw.shp
         Original CRS: EPSG:3857
         🔄 Converting: EPSG:3857 → EPSG:3857
   DRT Zone drt_ne: 474.95 km², speed 20.0 km/h
   DRT Zone drt_nw: 151.69 km², speed 25.0 km/h
         ✅ Loaded: 151.69 km² service area
            CRS: EPSG:3857
            Fleet choices: [0, 15, 30, 60, 120]
   ✅ All DRT spatial layers loaded successfully
      Total

### Loading multiple solutions from disk

If you have multiple good solutions from a previous run, you can load them using the `extract_multiple_gtfs_solutions` function. Here I load solutions that I created using code from `2d_optimization_joint_PT_DRT.ipynb`. The code below lists the GTFS zip files and DRT JSON files in the solutions directory and then loads them.

You can see code for writing solutions to disk in `3_write_solutions_to_file.ipynb`
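Each loaded solution ends up as a flat array holding PT headway indices followed by DRT fleet-size indices (see the `initial_solution` printout in the single-solution cell). The sketch below shows one way such a vector could be split back into PT and DRT parts. The exact layout used by transit_opt is an assumption here: row-major (route, interval) order for PT, then one value per DRT zone and interval. `split_flat_solution` is hypothetical; in practice `SolutionLoader` performs this conversion.

```python
import numpy as np


def split_flat_solution(flat, n_routes, n_intervals, n_drt_zones):
    """Split a flat PSO vector into {'pt': ..., 'drt': ...} under an assumed layout.

    Assumed layout: n_routes * n_intervals PT headway indices (row-major by route),
    followed by n_drt_zones * n_intervals DRT fleet-size indices.
    """
    flat = np.asarray(flat)
    n_pt = n_routes * n_intervals
    expected = n_pt + n_drt_zones * n_intervals
    if flat.size != expected:
        raise ValueError(f"Expected {expected} values, got {flat.size}")
    pt = flat[:n_pt].reshape(n_routes, n_intervals)
    drt = flat[n_pt:].reshape(n_drt_zones, n_intervals)
    return {"pt": pt, "drt": drt}


flat = np.arange(3 * 4 + 2 * 4)              # 3 routes, 4 intervals, 2 DRT zones
parts = split_flat_solution(flat, 3, 4, 2)
print(parts["pt"].shape, parts["drt"].shape)  # (3, 4) (2, 4)
```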


In [4]:
# Set up paths to saved solutions
solutions_dir = Path("output/combined_pt_drt_solutions")

print(f"\n📁 Solutions directory: {solutions_dir}")

# Check what solution files exist
drt_files = sorted(solutions_dir.glob("*_drt.json"))
gtfs_files = sorted(solutions_dir.glob("*_gtfs.zip"))

print(f"\n📋 Available solution files:")
print(f"   DRT files: {[f.name for f in drt_files]}")
print(f"   GTFS files: {[f.name for f in gtfs_files]}")

    
# Use extract_multiple_gtfs_solutions to load all solutions at once
print("\n🔄 Loading multiple solutions using extract_multiple_gtfs_solutions()...")
    
opt_data_list = preparator.extract_multiple_gtfs_solutions(
    gtfs_paths=[
        str(solutions_dir / 'combined_solution_01_gtfs.zip'),
        str(solutions_dir / 'combined_solution_02_gtfs.zip'),
        str(solutions_dir / 'combined_solution_03_gtfs.zip')
    ],
    allowed_headways=allowed_headways,
    drt_config=drt_config,
    drt_solution_paths=[
        str(solutions_dir / 'combined_solution_01_drt.json'),
        str(solutions_dir / 'combined_solution_02_drt.json'),
        str(solutions_dir / 'combined_solution_03_drt.json')
    ]
)
    
print(f"\n✅ LOADED {len(opt_data_list)} COMPLETE SOLUTIONS:")
for i, data in enumerate(opt_data_list, 1):
    print(f"\n   Solution {i}:")
    print(f"      PT variables: {data['pt_decision_variables']}")
    print(f"      DRT variables: {data['drt_decision_variables']}")
    print(f"      Initial solution shape: {data['initial_solution'].shape}")
    print(f"      Source GTFS: {data['metadata']['source_gtfs_path']}")
    print(f"      Source DRT: {data['metadata']['source_drt_path']}")

# Extract just the initial solutions (flat arrays)
flat_solutions_from_files = [data['initial_solution'] for data in opt_data_list]

print(f"\n📦 Extracted {len(flat_solutions_from_files)} flat solution arrays")
print(f"   These are ready to be used as base solutions for seeding PSO")



📁 Solutions directory: output/combined_pt_drt_solutions

📋 Available solution files:
   DRT files: ['combined_solution_01_drt.json', 'combined_solution_02_drt.json', 'combined_solution_03_drt.json']
   GTFS files: ['combined_solution_01_gtfs.zip', 'combined_solution_02_gtfs.zip', 'combined_solution_03_gtfs.zip']

🔄 Loading multiple solutions using extract_multiple_gtfs_solutions()...
Processing GTFS feed 1/3: output/combined_pt_drt_solutions/combined_solution_01_gtfs.zip
🔧 EXTRACTING OPTIMIZATION DATA WITH DRT SUPPORT:
   ✅ Base PT data extracted: 147 routes, 4 intervals
   🚁 Adding DRT configuration...
   🔍 Validating DRT configuration...
   ✅ DRT configuration valid: 2 zones
      Target CRS: EPSG:3857
   🗺️ Loading DRT spatial layers...
      Target CRS: EPSG:3857
      Loading zone 1: drt_ne
         Path: ../data/external/drt/drt_ne.shp
         Original CRS: EPSG:3857
         🔄 Converting: EPSG:3857 → EPSG:3857
   DRT Zone drt_ne: 474.95 km², speed 20.0 km/h
         ✅ Loaded: 

# Using loaded solutions to build a complete population for seeding 

As mentioned previously, we want to pass a complete population of size `pop_size` to pymoo's `sampling` parameter. Why? According to the documentation, pymoo will not generate any additional solutions if the custom population passed to `sampling` contains fewer than `pop_size` solutions, so the run will not use the `pop_size` you defined.
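Because of this, it can be worth checking a custom population before handing it to `sampling`. A minimal sketch follows; `check_custom_population` is a hypothetical helper (not part of pymoo or transit_opt) that only checks that a NumPy array has one row per particle and one column per decision variable.

```python
import numpy as np


def check_custom_population(X, pop_size, n_var):
    """Raise if X is not a (pop_size, n_var) matrix; otherwise return it as an array."""
    X = np.asarray(X)
    if X.ndim != 2:
        raise ValueError(f"Expected a 2-D array, got {X.ndim} dimension(s)")
    n_rows, n_cols = X.shape
    if n_cols != n_var:
        raise ValueError(f"Expected {n_var} decision variables per row, got {n_cols}")
    if n_rows != pop_size:
        # Per the note above, pymoo would not top a short population up to pop_size
        raise ValueError(
            f"Custom population has {n_rows} rows but pop_size={pop_size}; "
            "top it up (e.g. with perturbations/LHS) before passing it to sampling"
        )
    return X


seeds = np.zeros((3, 10), dtype=int)  # 3 seed solutions, 10 variables
try:
    check_custom_population(seeds, pop_size=20, n_var=10)
except ValueError as err:
    print(err)
```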

Here we show how to use the loaded solutions to build a complete population using the `PopulationBuilder` class. The basic idea of the class is:
- You have N initial solutions. You need an additional `pop_size - N` solutions to complete the population
- You decide the proportion of the remaining solutions to create using Gaussian perturbation of existing solutions. The rest are created using LHS
- You also decide the standard deviation of the Gaussian noise
- The class then builds a complete population of size `pop_size`

LHS is an effective way to cover the solution space, because it spreads samples evenly across the range of each variable. Gaussian perturbation is useful when you have good solutions and want to explore the nearby solution space. I recommend keeping the proportion of Gaussian perturbation low (e.g., 20-30%) to ensure diversity in the population.
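For intuition on what the LHS share contributes, here is a standalone NumPy sketch of Latin Hypercube Sampling over discrete index variables. It is not the implementation used by `PopulationBuilder` (which may rely on pymoo's `LHS`; SciPy also provides `scipy.stats.qmc.LatinHypercube`). Mapping the unit interval onto 1-based indices by flooring is an assumption for illustration, and `lhs_indices` is hypothetical.

```python
import numpy as np


def lhs_indices(n_samples, n_choices, rng):
    """Latin Hypercube sample of 1-based integer index vectors.

    Each variable's unit interval is split into n_samples equal strata and every
    stratum is used exactly once (a random permutation per variable), with a
    uniform jitter inside the stratum. Unit values are then floored onto the choices.
    """
    n_choices = np.asarray(n_choices)
    n_var = n_choices.size
    # One independent permutation of strata per variable (columns)
    strata = np.stack([rng.permutation(n_samples) for _ in range(n_var)], axis=1)
    u = (strata + rng.random((n_samples, n_var))) / n_samples  # values in [0, 1)
    idx = np.floor(u * n_choices).astype(int)
    # Guard against floating-point round-up, then shift to 1-based indices
    return np.minimum(idx, n_choices - 1) + 1


rng = np.random.default_rng(42)
samples = lhs_indices(n_samples=12, n_choices=[6, 6, 5], rng=rng)
print(samples.shape)  # (12, 3)
```

With 12 samples and 6 choices, each index appears exactly twice in that column, which is the even coverage LHS is designed to give.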

In [5]:

from transit_opt.optimisation.utils.solution_loader import SolutionLoader
from transit_opt.optimisation.utils.population_builder import PopulationBuilder
from transit_opt.optimisation.objectives.service_coverage import HexagonalCoverageObjective
from transit_opt.optimisation.problems.transit_problem import TransitOptimizationProblem

print("\n📚 UNDERSTANDING THE SEEDING WORKFLOW:")
print("   1. Load base solutions (from GTFS or previous runs)")
print("   2. Use SolutionLoader to validate and format solutions")
print("   3. Use PopulationBuilder to create PSO initial population")
print("   4. Pass population to PSO via configuration")

# Create problem for population building
print("\n🔧 Creating optimization problem...")

coverage_objective = HexagonalCoverageObjective(
    optimization_data=opt_data,
    spatial_resolution_km=2.0,
    crs="EPSG:3857",
    boundary=study_boundary,
    time_aggregation="average"
)

problem = TransitOptimizationProblem(
    optimization_data=opt_data,
    objective=coverage_objective
)

print(f"✅ Problem created:")
print(f"   Variables: {problem.n_var}")
print(f"   DRT enabled: {problem.drt_enabled}")

# Initialize loaders and builders
solution_loader = SolutionLoader()
population_builder = PopulationBuilder(solution_loader)

print("\n" + "-"*70)
print("STRATEGY A: SEEDING FROM BASE GTFS DATA ('from_data')")
print("-"*70)

print("\n📋 The 'from_data' strategy:")
print("   • Uses opt_data['initial_solution'] as the base")
print("   • Creates variations using Gaussian perturbations")
print("   • Fills remaining slots with Latin Hypercube Sampling (LHS)")

print("\n🎯 KEY PARAMETER: frac_gaussian_pert")
print("   Controls the mix of perturbations vs random exploration:")
print("   • frac_gaussian_pert = 0.7 → 70% Gaussian + 30% LHS")
print("   • Higher value = more exploration near base solution")
print("   • Lower value = more random exploration")

# Build population from base GTFS data
pop_size = 30
population_from_data = population_builder.build_initial_population(
    problem=problem,
    pop_size=pop_size,
    optimization_data=opt_data,
    base_solutions='from_data',  # Use base GTFS solution
    frac_gaussian_pert=0.7,      # 70% perturbations, 30% LHS
    gaussian_sigma=0.5,          # Moderate perturbation strength
    random_seed=42               # For reproducibility
)

print(f"\n✅ POPULATION FROM BASE DATA CREATED:")

# Show diversity analysis
base_flat = opt_data['initial_solution']
distances = [np.linalg.norm(population_from_data[i] - base_flat) 
             for i in range(pop_size)]

print(f"\n📊 Population Diversity Analysis:")
print(f"   Mean distance from base: {np.mean(distances):.2f}")
print(f"   Std distance: {np.std(distances):.2f}")
print(f"   Min distance: {np.min(distances):.2f} (base solution)")
print(f"   Max distance: {np.max(distances):.2f}")

print("\n" + "-"*70)
print("STRATEGY B: SEEDING FROM SOLUTION LIST")
print("-"*70)

print("\n📋 The 'solution list' strategy:")
print("   • Uses multiple solutions as bases")
print("   • Creates perturbations around EACH base solution")

print(f"\n🔄 Loading and validating {len(flat_solutions_from_files)} previous solutions...")

# Use SolutionLoader to validate and convert flat solutions
# This handles the flat -> domain format conversion automatically
validated_solutions = solution_loader.load_solutions(
    flat_solutions_from_files,
    opt_data  # Reference opt_data for validation
)

print(f"✅ Solutions validated and converted to domain format:")
for i, sol in enumerate(validated_solutions, 1):
    if isinstance(sol, dict):
        print(f"   Solution {i}: PT shape {sol['pt'].shape}, DRT shape {sol['drt'].shape}")
    else:
        print(f"   Solution {i}: Shape {sol.shape}")

# Build population from solution list
population_from_list = population_builder.build_initial_population(
    problem=problem,
    pop_size=pop_size,
    optimization_data=opt_data,
    base_solutions=validated_solutions,  # Use loaded solutions
    frac_gaussian_pert=0.6,              # 60% perturbations, 40% LHS
    gaussian_sigma=0.3,                  # Smaller sigma (stay closer to good solutions)
    random_seed=123
)

print(f"\n✅ POPULATION FROM SOLUTION LIST CREATED:")

# Diversity analysis
diversity_metrics = []
for i, base_sol in enumerate(validated_solutions):
    base_encoded = problem.encode_solution(base_sol)
    distances_to_base = [np.linalg.norm(population_from_list[j] - base_encoded) 
                        for j in range(pop_size)]
    diversity_metrics.append(np.mean(distances_to_base))

print(f"\n📊 Multi-Solution Diversity:")
print(f"   Average distance to each base solution:")
for i, dist in enumerate(diversity_metrics, 1):
    print(f"      Base {i}: {dist:.2f}")


📚 UNDERSTANDING THE SEEDING WORKFLOW:
   1. Load base solutions (from GTFS or previous runs)
   2. Use SolutionLoader to validate and format solutions
   3. Use PopulationBuilder to create PSO initial population
   4. Pass population to PSO via configuration

🔧 Creating optimization problem...
🗺️ Setting up spatial analysis with 2.0km resolution
🗺️  Reprojected 6897 stops to EPSG:3857
🎯 Applying boundary filter to 6897 stops...
🔍 Filtered 6897 → 4405 points
✅ Filtered to 4405 stops within boundary
🔧 Creating 27 × 26 = 702 grid cells
   Grid bounds: (-195346, 7111759) to (-142657, 7161976) meters
   Cell size: 2000.0m × 2000.0m
✅ Created 702 hexagonal zones in EPSG:3857
🎯 Applying boundary filter to 702 grid cells...
🔍 Filtered 702 → 552 grid cells
✅ Filtered to 552 grid cells within boundary
🚀 Using spatial join for zone mapping...
✅ Mapped 4405 stops to zones
🗺️ Computing DRT spatial intersections for 2 zones...
   Hexagonal grid size: 552 zones
   Zone drt_ne: affects 149 hexagonal 

## Running PSO with initial solutions

Here we bring everything together. The cells above build populations explicitly for demonstration purposes; when running PSO, all of this is handled under the hood. All you need to specify is the `sampling` config:


```python
'sampling': {
    'enabled': True,
    'base_solutions': 'from_data',  # Use the base GTFS solution, or pass a list of flat solution arrays
    'frac_gaussian_pert': 0.2,      # 20% Gaussian perturbation / 80% LHS
    'gaussian_sigma': 1.5,
    'random_seed': 42
}
```
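The sketch below spells out what each key in this block is expected to hold, as sanity checks you could run before building the config. `validate_sampling_config` is hypothetical; any real validation happens inside transit_opt, and the defaults used here are assumptions.

```python
import numpy as np


def validate_sampling_config(cfg, n_var):
    """Hypothetical sanity checks for an optimization['sampling'] block."""
    if not cfg.get("enabled", False):
        return cfg  # seeding disabled: the algorithm's own sampling is used

    base = cfg.get("base_solutions")
    if base is None:
        raise ValueError("base_solutions is required when sampling is enabled")
    if isinstance(base, str):
        if base != "from_data":
            raise ValueError(f"Unknown base_solutions string: {base!r}")
    else:
        # Otherwise expect a non-empty list of flat solution arrays of length n_var
        if len(base) == 0:
            raise ValueError("base_solutions list is empty")
        for i, sol in enumerate(base):
            shape = np.asarray(sol).shape
            if shape != (n_var,):
                raise ValueError(f"Solution {i} has shape {shape}, expected ({n_var},)")

    if not 0.0 <= cfg.get("frac_gaussian_pert", 0.0) <= 1.0:
        raise ValueError("frac_gaussian_pert must be between 0 and 1")
    if cfg.get("gaussian_sigma", 1.0) <= 0:
        raise ValueError("gaussian_sigma must be positive")
    return cfg


sampling = {"enabled": True, "base_solutions": "from_data",
            "frac_gaussian_pert": 0.2, "gaussian_sigma": 1.5, "random_seed": 42}
validate_sampling_config(sampling, n_var=10)
```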



### Option 1: Running PSO with one initial solution 

Here we show an example of running PSO with one base solution (the initial GTFS solution). Under the hood, `PopulationBuilder` fills the rest of the population using Gaussian perturbations and Latin Hypercube Sampling (LHS) so that it reaches `pop_size`.

In [6]:
from transit_opt.optimisation.config.config_manager import OptimizationConfigManager
from transit_opt.optimisation.runners.pso_runner import PSORunner

print("\n📝 INTEGRATING SEEDING WITH PSO CONFIGURATION:")
print("   The seeding configuration goes in optimization.sampling section")

# Configuration WITH seeding from base data
config_with_seeding_from_data = {
    'problem': {
        'objective': {
            'type': 'HexagonalCoverageObjective',
            'spatial_resolution_km': 2.0,
            'boundary': study_boundary,
            'crs': 'EPSG:3857',
            'time_aggregation': 'average'
        },
        'constraints': [
            {
                'type': 'FleetTotalConstraintHandler',
                'baseline': 'current_peak',
                'tolerance': 0.30,
                'measure': 'peak'
            }
        ]
    },
    'optimization': {
        'algorithm': {
            'type': 'PSO',
            'pop_size': 60,
            'inertia_weight': 0.9,
            'cognitive_coeff': 2.0,
            'social_coeff': 2.0,
            'adaptive': True
        },
        'sampling': {
            'enabled': True,
            'base_solutions': 'from_data',  # Use base GTFS solution
            'frac_gaussian_pert': 0.2,
            'gaussian_sigma': 1.5,
            'random_seed': 42
        },
        'termination': {
            'max_generations': 30
        },
        'monitoring': {
            'progress_frequency': 5,
            'save_history': True
        }
    }
}

print("\n🚀 RUNNING PSO WITH 'FROM_DATA' SEEDING:")
print(f"   Population size: {config_with_seeding_from_data['optimization']['algorithm']['pop_size']}")
print(f"   Seeding: {config_with_seeding_from_data['optimization']['sampling']['frac_gaussian_pert']*100:.0f}% Gaussian perturbations")
print(f"   Generations: {config_with_seeding_from_data['optimization']['termination']['max_generations']}")

config_manager = OptimizationConfigManager(config_dict=config_with_seeding_from_data)
runner = PSORunner(config_manager)

result_from_data = runner.optimize(opt_data, track_best_n=3)

print(f"\n✅ OPTIMIZATION WITH 'FROM_DATA' SEEDING COMPLETE:")
print(f"   Best objective: {result_from_data.best_objective:.6f}")
print(f"   Runtime: {result_from_data.optimization_time:.1f}s")
print(f"   Generations: {result_from_data.generations_completed}")
print(f"   Feasible: {result_from_data.constraint_violations['feasible']}")
print(f"   Best solutions found: {len(result_from_data.best_feasible_solutions)}")

# Baseline comparison (no seeding)
print("\n📊 BASELINE: PSO WITHOUT SEEDING (for comparison)")

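# Build new top-level and 'optimization' dicts so that disabling sampling here does not
# also modify config_with_seeding_from_data (a shallow .copy() would share the nested dicts)
config_no_seeding = {
    **config_with_seeding_from_data,
    'optimization': {
        **config_with_seeding_from_data['optimization'],
        'sampling': {'enabled': False}
    }
}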

config_manager_baseline = OptimizationConfigManager(config_dict=config_no_seeding)
runner_baseline = PSORunner(config_manager_baseline)

result_baseline = runner_baseline.optimize(opt_data, track_best_n=3)

print(f"\n✅ BASELINE OPTIMIZATION COMPLETE:")
print(f"   Best objective: {result_baseline.best_objective:.6f}")
print(f"   Runtime: {result_baseline.optimization_time:.1f}s")

# Comparison
improvement = ((result_baseline.best_objective - result_from_data.best_objective) / 
               result_baseline.best_objective * 100)
print(f"\n📈 COMPARISON:")
print(f"   Improvement with seeding: {improvement:+.2f}%")
print(f"   Time difference: {result_from_data.optimization_time - result_baseline.optimization_time:+.1f}s")


📝 INTEGRATING SEEDING WITH PSO CONFIGURATION:
   The seeding configuration goes in optimization.sampling section

🚀 RUNNING PSO WITH 'FROM_DATA' SEEDING:
   Population size: 60
   Seeding: 20% Gaussian perturbations
   Generations: 30
📋 Using provided configuration dictionary
🚀 STARTING PSO OPTIMIZATION
🗺️ Setting up spatial analysis with 2.0km resolution
🗺️  Reprojected 6897 stops to EPSG:3857
🎯 Applying boundary filter to 6897 stops...
🔍 Filtered 6897 → 4405 points
✅ Filtered to 4405 stops within boundary
🔧 Creating 27 × 26 = 702 grid cells
   Grid bounds: (-195346, 7111759) to (-142657, 7161976) meters
   Cell size: 2000.0m × 2000.0m
✅ Created 702 hexagonal zones in EPSG:3857
🎯 Applying boundary filter to 702 grid cells...
🔍 Filtered 702 → 552 grid cells
✅ Filtered to 552 grid cells within boundary
🚀 Using spatial join for zone mapping...
✅ Mapped 4405 stops to zones
🗺️ Computing DRT spatial intersections for 2 zones...
   Hexagonal grid size: 552 zones
   Zone drt_ne: affects 149 

### Option 2: Running PSO with multiple initial solutions

The main difference here is that, in the `sampling` config, we pass a list of solutions instead of `from_data`. As shown above, this list is created by loading multiple solutions from disk with `extract_multiple_gtfs_solutions`.
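With several base solutions, perturbations are created around each of them (as described for the solution-list strategy earlier). How `PopulationBuilder` distributes the perturbation slots is not shown in this notebook; the sketch below assumes a simple round-robin over bases, with rounding and clipping to valid 1-based indices. `perturb_around_bases` is hypothetical.

```python
import numpy as np


def perturb_around_bases(bases, n_gaussian, n_choices, sigma, rng):
    """Create n_gaussian perturbed children, cycling through the base solutions."""
    bases = np.asarray(bases)
    upper = np.asarray(n_choices)
    children = np.empty((n_gaussian, bases.shape[1]), dtype=int)
    for k in range(n_gaussian):
        parent = bases[k % len(bases)]  # round-robin assignment (assumption)
        noisy = parent + rng.normal(0.0, sigma, size=parent.shape)
        # Snap to the nearest index and keep it within [1, n_choices]
        children[k] = np.clip(np.rint(noisy), 1, upper)
    return children


rng = np.random.default_rng(123)
bases = np.array([[1, 2, 3], [4, 5, 5], [6, 1, 2]])  # three hypothetical seeds
children = perturb_around_bases(bases, n_gaussian=7, n_choices=[6, 6, 5], sigma=1.5, rng=rng)
print(children.shape)  # (7, 3)
```

With 7 slots and 3 seeds, the first seed receives one extra child; a weighted scheme (e.g. more children for better seeds) would be an alternative design.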

In [7]:
# Configuration with solution list seeding
config_with_solution_list = {
    'problem': {
        'objective': {
            'type': 'HexagonalCoverageObjective',
            'spatial_resolution_km': 2.0,
            'boundary': study_boundary,
            'crs': 'EPSG:3857',
            'time_aggregation': 'average'
        },
        'constraints': [
            {
                'type': 'FleetTotalConstraintHandler',
                'baseline': 'current_peak',
                'tolerance': 0.30,
                'measure': 'peak'
            }
        ]
    },
    'optimization': {
        'algorithm': {
            'type': 'PSO',
            'pop_size': 60,
            'inertia_weight': 0.9,
            'cognitive_coeff': 2.0,
            'social_coeff': 2.0,
            'adaptive': True
        },
        'sampling': {
            'enabled': True,
            'base_solutions': flat_solutions_from_files,  # Use loaded solutions directly
            'frac_gaussian_pert': 0.2,
            'gaussian_sigma': 1.5,
            'random_seed': 123
        },
        'termination': {
            'max_generations': 30
        },
        'monitoring': {
            'progress_frequency': 5,
            'save_history': True
        }
    }
}

print("\n🚀 RUNNING PSO WITH SOLUTION LIST SEEDING:")
print(f"   Base solutions: {len(flat_solutions_from_files)}")
sampling_list_cfg = config_with_solution_list['optimization']['sampling']
print(f"   Population size: {config_with_solution_list['optimization']['algorithm']['pop_size']}")
print(f"   Seeding: {sampling_list_cfg['frac_gaussian_pert']*100:.0f}% Gaussian + "
      f"{(1 - sampling_list_cfg['frac_gaussian_pert'])*100:.0f}% LHS")

config_manager_list = OptimizationConfigManager(config_dict=config_with_solution_list)
runner_list = PSORunner(config_manager_list)

result_from_list = runner_list.optimize(opt_data, track_best_n=3)

print(f"\n✅ OPTIMIZATION WITH SOLUTION LIST SEEDING COMPLETE:")
print(f"   Best objective: {result_from_list.best_objective:.6f}")
print(f"   Runtime: {result_from_list.optimization_time:.1f}s")
print(f"   Generations: {result_from_list.generations_completed}")
print(f"   Feasible: {result_from_list.constraint_violations['feasible']}")



🚀 RUNNING PSO WITH SOLUTION LIST SEEDING:
   Base solutions: 3
   Population size: 60
   Seeding: 20% Gaussian + 80% LHS
📋 Using provided configuration dictionary
🚀 STARTING PSO OPTIMIZATION
🗺️ Setting up spatial analysis with 2.0km resolution
🗺️  Reprojected 6897 stops to EPSG:3857
🎯 Applying boundary filter to 6897 stops...
🔍 Filtered 6897 → 4405 points
✅ Filtered to 4405 stops within boundary
🔧 Creating 27 × 26 = 702 grid cells
   Grid bounds: (-195346, 7111759) to (-142657, 7161976) meters
   Cell size: 2000.0m × 2000.0m
✅ Created 702 hexagonal zones in EPSG:3857
🎯 Applying boundary filter to 702 grid cells...
🔍 Filtered 702 → 552 grid cells
✅ Filtered to 552 grid cells within boundary
🚀 Using spatial join for zone mapping...
✅ Mapped 4405 stops to zones
🗺️ Computing DRT spatial intersections for 2 zones...
   Hexagonal grid size: 552 zones
   Zone drt_ne: affects 149 hexagonal zones
   Zone drt_nw: affects 54 hexagonal zones
🚀 Pre-computing route-stop mappings...
✅ Cached stops f

### Demonstration of Multi-PSO run with seeded population

Parallel PSO runs are explained in `2c_optimization_multi_swarm.ipynb`. Here we just show how to run multiple PSO runs with seeded populations. The only difference is that the `sampling` config is passed to each PSO run.

`TODO`: determine whether `random_seed` should be used here. If it is set, are the initial populations identical across runs?
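As background for the TODO, the NumPy sketch below shows the general behaviour of seeded generators: the same seed reproduces exactly the same noise, while `np.random.SeedSequence.spawn` gives independent but reproducible streams for multiple runs. Whether `PopulationBuilder` and `PSORunner` draw their randomness this way is an assumption, not something this notebook confirms; `seeded_noise` is a hypothetical helper.

```python
import numpy as np


def seeded_noise(seed, n_var, sigma=1.5):
    """Gaussian noise of length n_var from a generator created with `seed`."""
    rng = np.random.default_rng(seed)
    return rng.normal(0.0, sigma, size=n_var)


# Same seed in every run -> identical perturbations, hence identical initial populations
run_a = seeded_noise(42, n_var=5)
run_b = seeded_noise(42, n_var=5)
print(np.array_equal(run_a, run_b))  # True

# One reproducible alternative: derive a distinct child seed per run from a master seed
child_seeds = np.random.SeedSequence(42).spawn(3)
per_run = [seeded_noise(s, n_var=5) for s in child_seeds]
print(np.array_equal(per_run[0], per_run[1]))  # False

# With seed=None, default_rng pulls fresh OS entropy: runs differ but are not reproducible
```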

In [8]:

print("\n📚 MULTI-RUN OPTIMIZATION WITH SEEDING:")
print("   Each run gets a different random seed for perturbations")

# Multi-run configuration with seeding
config_multi_run_seeded = {
    'problem': {
        'objective': {
            'type': 'HexagonalCoverageObjective',
            'spatial_resolution_km': 2.0,
            'boundary': study_boundary,
            'crs': 'EPSG:3857',
            'time_aggregation': 'average'
        },
        'constraints': [
            {
                'type': 'FleetTotalConstraintHandler',
                'baseline': 'current_peak',
                'tolerance': 0.30,
                'measure': 'peak'
            }
        ]
    },
    'optimization': {
        'algorithm': {
            'type': 'PSO',
            'pop_size': 60,
            'inertia_weight': 0.9,
            'cognitive_coeff': 2.0,
            'social_coeff': 2.0,
            'adaptive': True
        },
        'sampling': {
            'enabled': True,
            'base_solutions': flat_solutions_from_files, #'from_data',
            'frac_gaussian_pert': 0.2,
            'gaussian_sigma': 1.5
            # Note: No random_seed, each run gets unique randomness
        },
        'termination': {
            'max_generations': 30  # Shorter for demo
        },
        'monitoring': {
            'progress_frequency': 5,
            'save_history': False
        }
    }
}

num_runs = 3
algo_cfg = config_multi_run_seeded['optimization']['algorithm']
sampling_cfg = config_multi_run_seeded['optimization']['sampling']
max_gens = config_multi_run_seeded['optimization']['termination']['max_generations']

print(f"\n🚀 RUNNING {num_runs} INDEPENDENT PSO RUNS WITH SEEDING:")
print(f"   Each run: {algo_cfg['pop_size']} particles, {max_gens} generations")
print(f"   Seeding: {sampling_cfg['frac_gaussian_pert']*100:.0f}% Gaussian from "
      f"{len(flat_solutions_from_files)} loaded solutions")
print("   Track top 2 solutions per run")

config_manager_multi = OptimizationConfigManager(config_dict=config_multi_run_seeded)
runner_multi = PSORunner(config_manager_multi)

multi_result_seeded = runner_multi.optimize_multi_run(
    optimization_data=opt_data,
    num_runs=num_runs,
    parallel=True,
    track_best_n=2
)

print(f"\n✅ MULTI-RUN WITH SEEDING COMPLETE:")
print(f"   Best overall objective: {multi_result_seeded.best_result.best_objective:.6f}")
print(f"   Total time: {multi_result_seeded.total_time:.1f}s")
print(f"   Runs completed: {multi_result_seeded.num_runs_completed}")

# Per-run analysis
print(f"\n📊 PER-RUN RESULTS:")
print(f"   {'Run':<5} {'Objective':<12} {'Feasible':<9} {'Time(s)':<8} {'Solutions':<10}")
print(f"   {'-'*55}")
for summary in multi_result_seeded.run_summaries:
    print(f"   {summary['run_id']:<5} {summary['objective']:<12.6f} "
          f"{str(summary['feasible']):<9} {summary['time']:<8.1f} "
          f"{summary['best_feasible_solutions_count']:<10}")

# Statistical analysis
stats = multi_result_seeded.statistical_summary

print(f"\n📈 STATISTICAL ANALYSIS:")
print(f"   Mean objective: {stats['objective_mean']:.6f} ± {stats['objective_std']:.6f}")
print(f"   Best objective: {stats['objective_min']:.6f}")



📚 MULTI-RUN OPTIMIZATION WITH SEEDING:
   Each run gets a different random seed for perturbations

🚀 RUNNING 3 INDEPENDENT PSO RUNS WITH SEEDING:
   Each run: 60 particles, 30 generations
   Seeding: 20% Gaussian from 3 loaded solutions
   Track top 2 solutions per run
📋 Using provided configuration dictionary
🔄 STARTING MULTI-RUN PSO OPTIMIZATION (3 runs)
   🚀 Parallel execution enabled
🚀 PARALLEL EXECUTION:
   👥 Using 3 parallel workers
   🔇 Individual run output suppressed for clarity
   📊 Progress will be shown as runs complete

📋 Using provided configuration dictionary
📋 Using provided configuration dictionary
📋 Using provided configuration dictionary
[ 1/3] Run  1: Objective=320.054326, Gens=30, Time= 35.4s, FeasibleSols=2, ✅ Feasible
[ 2/3] Run  2: Objective=273.864521, Gens=30, Time= 35.6s, FeasibleSols=2, ✅ Feasible
[ 3/3] Run  3: Objective=316.412550, Gens=30, Time= 35.4s, FeasibleSols=2, ✅ Feasible

✅ All parallel runs completed!

🎯 MULTI-RUN OPTIMIZATION COMPLETED
   Succe

# Comparing results

We can compare whether seeding resulted in better solutions (better objective values) by looking at the best objectives found in each case. 

THIS IS JUST A DEMONSTRATION AND THE RESULTS ARE NOT INDICATIVE - PSO NEEDS TO BE RUN WITH LARGER POPULATIONS AND MANY MORE GENERATIONS TO MAKE SUCH A COMPARISON
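The "Improvement" column in the comparison cell is the relative reduction of the best objective against the unseeded baseline, since the objective is minimized: (baseline - seeded) / baseline × 100. As a worked check, here it is applied to the values printed in the comparison table (this assumes a positive baseline objective).

```python
def relative_improvement(baseline, candidate):
    """Percent reduction of a minimization objective relative to baseline (positive = better)."""
    return (baseline - candidate) / baseline * 100.0


baseline_value = 387.895621
for name, obj in [("From Data seeding", 377.338692),
                  ("Solution List seeding", 262.520622),
                  ("Multi-Run seeding", 273.864521)]:
    print(f"{name:<25} {relative_improvement(baseline_value, obj):+.2f}%")
# From Data seeding         +2.72%
# Solution List seeding     +32.32%
# Multi-Run seeding         +29.40%
```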

In [None]:
print(f"\n📊 COMPARING ALL RUNS TO BASELINE (NO SEEDING):")
print(" (Note: We aim to minimize the objective, so lower is better.)")
print(f"   {'-'*60}")
print(f"   {'Strategy':<25} {'Objective':<12} {'Time(s)':<8} {'Improvement':<12}")
print(f"   {'-'*60}")

baseline_obj = result_baseline.best_objective

print(f"   {'Baseline (no seeding)':<25} {result_baseline.best_objective:<12.6f} "
        f"{result_baseline.optimization_time:<8.1f} {'--':<12}")

improvement_data = ((baseline_obj - result_from_data.best_objective) / baseline_obj * 100)
print(f"   {'From Data seeding':<25} {result_from_data.best_objective:<12.6f} "
        f"{result_from_data.optimization_time:<8.1f} {improvement_data:+.2f}%")

improvement_list = ((baseline_obj - result_from_list.best_objective) / baseline_obj * 100)
print(f"   {'Solution List seeding':<25} {result_from_list.best_objective:<12.6f} "
        f"{result_from_list.optimization_time:<8.1f} {improvement_list:+.2f}%")

improvement_multi = ((baseline_obj - stats['objective_min']) / baseline_obj * 100)
print(f"   {'Multi-Run seeding':<25} {stats['objective_min']:<12.6f} "
        f"{multi_result_seeded.total_time:<8.1f} {improvement_multi:+.2f}%")



📊 COMPARING ALL RUNS TO BASELINE (NO SEEDING):
 (Note: We aim to minimize the objective, so lower is better.)
   ------------------------------------------------------------
   Strategy                  Objective    Time(s)  Improvement 
   ------------------------------------------------------------
   Baseline (no seeding)     387.895621   24.4     --          
   From Data seeding         377.338692   24.9     +2.72%
   Solution List seeding     262.520622   24.1     +32.32%
   Multi-Run seeding         273.864521   40.3     +29.40%
