# 3. Solutions to GTFS Reconstruction

This notebook demonstrates converting optimization solutions back to GTFS feeds. It covers:

3.1 Quick Problem Setup
- Load GTFS data and create optimization problem
- Run a simple optimization to get solutions

3.2 Phase 1: Solution Conversion
- Convert solution matrices to headway values
- Validate solutions and extract templates

3.3 Phase 2: GTFS Reconstruction
- Generate new trips and stop times
- Create complete GTFS feed

3.4 Validation and Analysis
- Compare original vs reconstructed GTFS
- Verify service patterns match optimization

In [1]:
import os
import sys
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pathlib import Path

# Add src to path
project_root = Path.cwd().parent
src_path = project_root / "src"
if str(src_path) not in sys.path:
    sys.path.insert(0, str(src_path))


print("=== GTFS RECONSTRUCTION WORKFLOW ===")
print("🔄 Testing solution-to-GTFS conversion pipeline")
print(f"📁 Project root: {project_root}")

=== GTFS RECONSTRUCTION WORKFLOW ===
🔄 Testing solution-to-GTFS conversion pipeline
📁 Project root: /home/hussein/Documents/GitHub/transit_opt


## 3.1 Quick Problem Setup

Create a minimal optimization problem to generate test solutions.

In [2]:
from transit_opt.preprocessing.prepare_gtfs import GTFSDataPreparator
from transit_opt.optimisation.config.config_manager import OptimizationConfigManager
from transit_opt.optimisation.runners.pso_runner import PSORunner

print("=== LOADING GTFS DATA ===")

# Quick GTFS setup
preparator = GTFSDataPreparator(
    gtfs_path='../data/external/study_area_gtfs_bus.zip',
    interval_hours=6,  # 4 periods per day for simplicity
    date=None,
    turnaround_buffer=1.15,
    max_round_trip_minutes=240.0,
    no_service_threshold_minutes=480.0,
    log_level="INFO"
)

# Simple headway choices for testing
allowed_headways = [10, 15, 30, 60, 120]
opt_data = preparator.extract_optimization_data(allowed_headways)

print(f"✅ Loaded {opt_data['n_routes']} routes, {opt_data['n_intervals']} intervals")
print(f"📊 Headway choices: {allowed_headways}")
print(f"🎯 Decision variables: {np.prod(opt_data['decision_matrix_shape'])}")

2025-09-24 16:14:23,222 - transit_opt.preprocessing.prepare_gtfs - INFO - Initializing GTFSDataPreparator with 6h intervals
2025-09-24 16:14:23,223 - transit_opt.preprocessing.prepare_gtfs - INFO - Loading GTFS feed from ../data/external/study_area_gtfs_bus.zip


=== LOADING GTFS DATA ===


2025-09-24 16:14:25,869 - transit_opt.preprocessing.prepare_gtfs - INFO - Using full GTFS feed (all service periods)
2025-09-24 16:14:27,284 - transit_opt.preprocessing.prepare_gtfs - INFO - GTFS loaded and cached in 4.06 seconds
2025-09-24 16:14:27,284 - transit_opt.preprocessing.prepare_gtfs - INFO - Dataset: 13,974 trips, 703,721 stop times
2025-09-24 16:14:27,285 - transit_opt.preprocessing.prepare_gtfs - INFO - Extracting optimization data with 5 allowed headways
2025-09-24 16:14:27,285 - transit_opt.preprocessing.prepare_gtfs - INFO - Extracting route essentials with 6-hour intervals
2025-09-24 16:14:39,857 - transit_opt.preprocessing.prepare_gtfs - INFO - Route extraction complete: 147 routes retained from 187 total
2025-09-24 16:14:39,858 - transit_opt.preprocessing.prepare_gtfs - INFO - Successfully extracted 147 routes for optimization
2025-09-24 16:14:39,868 - transit_opt.preprocessing.prepare_gtfs - INFO - Fleet analysis completed:
2025-09-24 16:14:39,868 - transit_opt.prep

✅ Loaded 147 routes, 4 intervals
📊 Headway choices: [10, 15, 30, 60, 120]
🎯 Decision variables: 588
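With `interval_hours=6` the day splits into four equal periods, and the 588 decision variables are 147 routes × 4 intervals. The sketch below shows one way such interval labels and hour ranges could be derived. `build_intervals` is a hypothetical helper, not part of `transit_opt`; the label format is copied from the interval labels printed elsewhere in this notebook.

```python
def build_intervals(interval_hours):
    """Split a 24-hour day into equal periods (hypothetical helper)."""
    if interval_hours <= 0 or 24 % interval_hours != 0:
        raise ValueError("interval_hours must divide 24 evenly")
    hours = [(start, start + interval_hours) for start in range(0, 24, interval_hours)]
    labels = [f"{start:02d}-{end:02d}h" for start, end in hours]
    return labels, hours


labels, hours = build_intervals(6)
n_routes = 147  # routes retained according to the log above
n_decision_variables = n_routes * len(labels)

print(labels)                # ['00-06h', '06-12h', '12-18h', '18-24h']
print(n_decision_variables)  # 588
```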


In [3]:
# Quick optimization to get test solutions
print("=== RUNNING QUICK OPTIMIZATION ===")

# Minimal config for fast testing
test_config = {
    'problem': {
        'objective': {
            'type': 'HexagonalCoverageObjective',
            'spatial_resolution_km': 5.0,  # Larger zones for speed
            'crs': 'EPSG:3857'
        },
        'constraints': [
            {
                'type': 'FleetTotalConstraintHandler',
                'baseline': 'current_peak',
                'tolerance': 0.3,  # Relaxed for testing
                'measure': 'peak'
            }
        ]
    },
    'optimization': {
        'algorithm': {
            'type': 'PSO',
            'pop_size': 20,  # Small for speed
            'inertia_weight': 0.9,
            'inertia_weight_final': 0.4,
            'cognitive_coeff': 2.0,
            'social_coeff': 2.0,
            'use_penalty_method': False
        },
        'termination': {
            'max_generations': 10  # Very short for testing
        },
        'monitoring': {
            'progress_frequency': 5,
            'save_history': True,
            'detailed_logging': False
        }
    }
}

# Run optimization
config_manager = OptimizationConfigManager(config_dict=test_config)
pso_runner = PSORunner(config_manager)
result = pso_runner.optimize(opt_data, track_best_n=5)

print(f"✅ Optimization complete: {result.best_objective:.4f}")
n_feasible = len(getattr(result, 'best_feasible_solutions', None) or [])
print(f"📊 Found {n_feasible} feasible solutions")
print(f"⏱️  Time: {result.optimization_time:.1f}s")

=== RUNNING QUICK OPTIMIZATION ===
📋 Using provided configuration dictionary
🚀 STARTING PSO OPTIMIZATION
🗺️ Setting up spatial analysis with 5.0km resolution
🗺️  Reprojected 6897 stops to EPSG:3857
🔧 Creating 88 × 156 = 13728 grid cells
   Grid bounds: (-454051, 6583019) to (-14085, 7359492) meters
   Cell size: 5000.0m × 5000.0m
✅ Created 13728 hexagonal zones in EPSG:3857
🚀 Using spatial join for zone mapping...
✅ Mapped 6897 stops to zones
🚀 Pre-computing route-stop mappings...
✅ Cached stops for 278 routes/services
✅ Spatial system ready: 13728 hexagonal zones
   📋 Creating 1 constraint handler(s)...
      Creating constraint 1: FleetTotalConstraintHandler
         ✓ FleetTotal: 1 constraint(s)
🏗️  CREATING TRANSIT OPTIMIZATION PROBLEM:
   📊 Problem dimensions:
      Routes: 147
      Time intervals: 4
      Headway choices: 6
   🔧 Pymoo parameters:
      Decision variables: 588
      Objectives: 1
      Constraints: 1
   🚦 Hard constraints: 1 constraint(s)
   📋 Constraint breakdow
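The grid size in the log follows from the projected bounds and the 5000 m cell size. The sketch below reproduces the logged `88 × 156 = 13728` by rounding each axis up to a whole number of cells. Rounding up is an assumption that happens to match the log, and `grid_dimensions` is a hypothetical helper rather than the library's own code. Note that distances in EPSG:3857 are stretched away from the equator, so a 5000 m cell in this CRS covers less than 5 km on the ground at the study area's latitude.

```python
import math


def grid_dimensions(min_xy, max_xy, cell_size_m):
    """Columns, rows and total cells needed to cover a bounding box."""
    n_cols = math.ceil((max_xy[0] - min_xy[0]) / cell_size_m)
    n_rows = math.ceil((max_xy[1] - min_xy[1]) / cell_size_m)
    return n_cols, n_rows, n_cols * n_rows


# Bounds and cell size copied from the log output above
n_cols, n_rows, n_cells = grid_dimensions((-454051, 6583019), (-14085, 7359492), 5000.0)
print(n_cols, n_rows, n_cells)  # 88 156 13728
```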

## 3.2 Phase 1: Solution Conversion

Test the SolutionConverter class for converting optimization results to headway specifications.

In [4]:
from transit_opt.gtfs.gtfs import SolutionConverter

print("=== TESTING SOLUTION CONVERTER ===")

# Create converter
converter = SolutionConverter(opt_data)

# Test with best solution
test_solution = result.best_solution
print(f"🔍 Testing solution shape: {test_solution.shape}")

# Validate solution
validation = converter.validate_solution(test_solution)
print(f"\n✅ VALIDATION RESULTS:")
print(f"   Valid: {validation['valid']}")
print(f"   Errors: {len(validation['errors'])}")
print(f"   Warnings: {len(validation['warnings'])}")

if validation['errors']:
    for error in validation['errors']:
        print(f"   ❌ {error}")

# Show statistics
stats = validation['statistics']
print(f"\n📊 SOLUTION STATISTICS:")
print(f"   Service coverage: {stats['service_percentage']:.1f}%")
print(f"   Active cells: {stats['service_cells']}/{stats['total_cells']}")
print(f"   Headway distribution: {stats['headway_distribution']}")

=== TESTING SOLUTION CONVERTER ===
🔍 Testing solution shape: (147, 4)

✅ VALIDATION RESULTS:
   Valid: True
   Errors: 0

📊 SOLUTION STATISTICS:
   Service coverage: 88.4%
   Active cells: 520/588
   Headway distribution: {'10min': 72, '15min': 100, '30min': 111, '60min': 125, '120min': 112, 'No Service': 68}
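The statistics above can be recomputed by hand: 588 cells minus 68 "No Service" cells leaves 520 active cells, or 88.4%. The sketch below computes the same quantities for a small headway matrix. It assumes that a value of 9999 marks "no service", as suggested by the last entry of `opt_data['allowed_headways']`; `summarize_headways` is a hypothetical helper, not the `validate_solution` implementation.

```python
from collections import Counter

NO_SERVICE = 9999  # assumed sentinel: last entry of opt_data['allowed_headways']


def summarize_headways(headway_matrix):
    """Count service cells and headway frequencies in a routes x intervals matrix."""
    cells = [h for row in headway_matrix for h in row]
    if not cells:
        raise ValueError("headway_matrix is empty")
    counts = Counter("No Service" if h >= NO_SERVICE else f"{h:g}min" for h in cells)
    service_cells = len(cells) - counts.get("No Service", 0)
    return {
        "total_cells": len(cells),
        "service_cells": service_cells,
        "service_percentage": 100.0 * service_cells / len(cells),
        "headway_distribution": dict(counts),
    }


# Two routes x four intervals, headways in minutes
stats_demo = summarize_headways([[9999, 30, 10, 120], [60, 60, 120, 15]])
print(stats_demo["service_cells"], stats_demo["total_cells"])  # 7 8
print(stats_demo["service_percentage"])                        # 87.5
```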


In [5]:
# Convert to headways dictionary
print("=== CONVERTING TO HEADWAYS ===")

headways_dict = converter.solution_to_headways(test_solution)
print(f"📋 Converted {len(headways_dict)} routes")

# Show example route
example_route = list(headways_dict.keys())[50]
print(f"\n📋 EXAMPLE ROUTE ({example_route}):")
for interval, headway in headways_dict[example_route].items():
    if headway is None:
        print(f"   {interval:>15}: No Service")
    else:
        print(f"   {interval:>15}: {headway:.0f} minutes")

# Extract template trips
print(f"\n🔧 EXTRACTING TEMPLATE TRIPS:")
templates = converter.extract_route_templates()
print(f"   Templates extracted: {len(templates)}/{len(headways_dict)}")

if templates:
    example_template = templates[example_route]
    print(f"   Example template ({example_route}):")
    print(f"      Trip ID: {example_template['trip_id']}")
    print(f"      Duration: {example_template['duration_minutes']:.1f} min")
    print(f"      Stops: {example_template['n_stops']}")

=== CONVERTING TO HEADWAYS ===
📋 Converted 147 routes

📋 EXAMPLE ROUTE (30532):
            00-06h: No Service
            06-12h: 120 minutes
            12-18h: 120 minutes
            18-24h: No Service

🔧 EXTRACTING TEMPLATE TRIPS:
   Templates extracted: 147/147
   Example template (30532):
      Trip ID: VJ7a7e72c601331a8a0a26ade124264eb58cdd5889
      Duration: 63.0 min
      Stops: 62


## 3.3 Phase 2: GTFS Reconstruction 

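This section generates new trips and stop times from the headway dictionary and the route templates, then checks the generated trip counts against the headway schedule.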

In [6]:
# Phase 2: generate trips and stop times from the headways and templates
print("=== PHASE 2: TRIP GENERATION ===")

# Generate trips and stop times
new_trips_df, new_stop_times_df = converter.generate_trips_and_stop_times(headways_dict, templates)

print(f"📊 GENERATION RESULTS:")
print(f"   Generated trips: {len(new_trips_df)}")
print(f"   Generated stop times: {len(new_stop_times_df)}")

if len(new_trips_df) > 0:
    print(f"   Routes with trips: {new_trips_df['route_id'].nunique()}")
    print(f"   Trips per route (avg): {len(new_trips_df) / new_trips_df['route_id'].nunique():.1f}")
    
    # Show example trip
    example_trip_id = new_trips_df.iloc[5]['trip_id']
    example_route_id = new_trips_df.iloc[5]['route_id']
    
    print(f"\n📋 EXAMPLE TRIP ({example_trip_id}):")
    print(f"   Route: {example_route_id}")
    
    # Show stop times for this trip
    trip_stop_times = new_stop_times_df[new_stop_times_df['trip_id'] == example_trip_id]
    print(f"   Stops: {len(trip_stop_times)}")
    print(f"   First departure: {trip_stop_times.iloc[0]['departure_time']}")
    print(f"   Last arrival: {trip_stop_times.iloc[-1]['arrival_time']}")
    
    # Validate timing
    first_dep = trip_stop_times.iloc[0]['departure_seconds']
    last_arr = trip_stop_times.iloc[-1]['arrival_seconds']
    actual_duration = (last_arr - first_dep) / 60
    template_duration = templates[example_route_id]['duration_minutes']
    
    print(f"   Duration: {actual_duration:.1f} min (template: {template_duration:.1f} min)")

=== PHASE 2: TRIP GENERATION ===
📊 GENERATION RESULTS:
   Generated trips: 6790
   Generated stop times: 282970
   Routes with trips: 147
   Trips per route (avg): 46.2

📋 EXAMPLE TRIP (opt_trip_50627_06_005):
   Route: 50627
   Stops: 43
   First departure: 08:30:00
   Last arrival: 09:05:00
   Duration: 35.0 min (template: 35.0 min)
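The example trip keeps the template's 35-minute duration because a generated trip is the template shifted to a new departure time. The sketch below shows that shift on plain dictionaries. It is an illustration under assumptions: the field names mirror the columns of `new_stop_times_df`, but `shift_template` and `to_gtfs_time` are hypothetical helpers and the two-stop template is made up. The start time follows the trip ID in the output above: sequence 005 at a 30-minute headway from 06:00 departs at 08:30.

```python
def to_gtfs_time(seconds):
    """Format seconds since midnight as HH:MM:SS; GTFS allows hours above 23."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def shift_template(template_stops, trip_id, start_seconds):
    """Copy a template trip so that its first departure is at start_seconds."""
    offset = start_seconds - template_stops[0]["departure_seconds"]
    shifted = []
    for stop in template_stops:
        arrival = stop["arrival_seconds"] + offset
        departure = stop["departure_seconds"] + offset
        shifted.append({
            "trip_id": trip_id,
            "stop_id": stop["stop_id"],
            "stop_sequence": stop["stop_sequence"],
            "arrival_time": to_gtfs_time(arrival),
            "departure_time": to_gtfs_time(departure),
            "arrival_seconds": arrival,
            "departure_seconds": departure,
        })
    return shifted


# Made-up template: 07:00 departure, 07:35 arrival (35 minutes)
template = [
    {"stop_id": "A", "stop_sequence": 1, "arrival_seconds": 25200, "departure_seconds": 25200},
    {"stop_id": "B", "stop_sequence": 2, "arrival_seconds": 27300, "departure_seconds": 27300},
]
start = 6 * 3600 + 5 * 30 * 60  # 06:00 plus five 30-minute headways
new_trip = shift_template(template, "opt_trip_demo_06_005", start)
print(new_trip[0]["departure_time"], new_trip[-1]["arrival_time"])  # 08:30:00 09:05:00
```

Because hours are not wrapped at 24, a trip that runs past midnight is written as, for example, `25:00:00`, which is the convention GTFS uses for `stop_times.txt`.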


In [7]:
# Inspect the structure of the optimization data, headways, templates and generated tables
print("=== DATA STRUCTURE ANALYSIS ===")

print(f"📊 OPTIMIZATION DATA STRUCTURE:")
print(f"   Routes: {len(opt_data['routes']['ids'])} unique route IDs")
print(f"   Intervals: {len(opt_data['intervals']['labels'])} time periods")
print(f"   Headway choices: {opt_data['allowed_headways']}")

print(f"\n📋 HEADWAYS DICTIONARY STRUCTURE:")
print(f"   Format: route_id -> {{interval_label: headway_minutes}}")
example_route = list(headways_dict.keys())[0]
print(f"   Example ({example_route}): {headways_dict[example_route]}")

print(f"\n🚌 TEMPLATES STRUCTURE:")
print(f"   Format: route_id -> {{trip_id, duration_minutes, n_stops, stop_times_df}}")
if templates:
    example_template = templates[example_route]
    print(f"   Example ({example_route}):")
    print(f"      - trip_id: {example_template['trip_id']}")
    print(f"      - duration: {example_template['duration_minutes']:.1f} min")
    print(f"      - stops: {example_template['n_stops']}")

print(f"\n🎯 GENERATED DATA STRUCTURE:")
print(f"   new_trips_df columns: {list(new_trips_df.columns)}")
print(f"   new_stop_times_df columns: {list(new_stop_times_df.columns)}")

# Show interval breakdown
print(f"\n⏰ INTERVAL STRUCTURE:")
for i, (label, (start, end)) in enumerate(zip(opt_data['intervals']['labels'], opt_data['intervals']['hours'])):
    print(f"   {i}: {label} = {start:02d}:00 to {end:02d}:00")

=== DATA STRUCTURE ANALYSIS ===
📊 OPTIMIZATION DATA STRUCTURE:
   Routes: 147 unique route IDs
   Intervals: 4 time periods
   Headway choices: [  10.   15.   30.   60.  120. 9999.]

📋 HEADWAYS DICTIONARY STRUCTURE:
   Format: route_id -> {interval_label: headway_minutes}
   Example (50627): {'00-06h': None, '06-12h': 30.0, '12-18h': 10.0, '18-24h': 120.0}

🚌 TEMPLATES STRUCTURE:
   Format: route_id -> {trip_id, duration_minutes, n_stops, stop_times_df}
   Example (50627):
      - trip_id: VJ61453c073b835840fdf78144f238befa6c687ca8
      - duration: 35.0 min
      - stops: 43

🎯 GENERATED DATA STRUCTURE:
   new_trips_df columns: ['trip_id', 'route_id', 'service_id', 'trip_headsign', 'direction_id', 'shape_id']
   new_stop_times_df columns: ['trip_id', 'stop_id', 'stop_sequence', 'arrival_time', 'departure_time', 'arrival_seconds', 'departure_seconds', 'pickup_type', 'drop_off_type']

⏰ INTERVAL STRUCTURE:
   0: 00-06h = 00:00 to 06:00
   1: 06-12h = 06:00 to 12:00
   2: 12-18h = 12:00 

In [8]:
# Compare generated trip counts with the headway schedule
print("=== TRIP GENERATION ANALYSIS ===")

# Analyze trips per route and interval
route_analysis = []
for route_id in new_trips_df['route_id'].unique()[:5]:  # Check first 5 routes
    route_trips = new_trips_df[new_trips_df['route_id'] == route_id]
    route_headways = headways_dict[route_id]
    
    print(f"\n🚌 ROUTE {route_id}:")
    print(f"   Total trips generated: {len(route_trips)}")
    print(f"   Headway schedule: {route_headways}")
    
    # Calculate expected trips per interval
    for interval_label, headway in route_headways.items():
        if headway is not None:
            interval_duration_hours = 6  # Must match interval_hours passed to GTFSDataPreparator
            expected_trips = (interval_duration_hours * 60) / headway
            print(f"   {interval_label}: {headway}min headway → ~{expected_trips:.1f} trips expected")

# Show trip ID patterns
print(f"\n🎫 TRIP ID PATTERNS:")
sample_trips = new_trips_df[new_trips_df['route_id'] == list(new_trips_df['route_id'].unique())[0]]
for trip_id in sample_trips['trip_id'].head(8):
    print(f"   {trip_id}")

=== TRIP GENERATION ANALYSIS ===

🚌 ROUTE 50627:
   Total trips generated: 47
   Headway schedule: {'00-06h': None, '06-12h': 30.0, '12-18h': 10.0, '18-24h': 120.0}
   06-12h: 30.0min headway → ~12.0 trips expected
   12-18h: 10.0min headway → ~36.0 trips expected
   18-24h: 120.0min headway → ~3.0 trips expected

🚌 ROUTE 50628:
   Total trips generated: 32
   Headway schedule: {'00-06h': 60.0, '06-12h': 60.0, '12-18h': 120.0, '18-24h': 15.0}
   00-06h: 60.0min headway → ~6.0 trips expected
   06-12h: 60.0min headway → ~6.0 trips expected
   12-18h: 120.0min headway → ~3.0 trips expected
   18-24h: 15.0min headway → ~24.0 trips expected

🚌 ROUTE 50629:
   Total trips generated: 25
   Headway schedule: {'00-06h': 30.0, '06-12h': 60.0, '12-18h': 60.0, '18-24h': 60.0}
   00-06h: 30.0min headway → ~12.0 trips expected
   06-12h: 60.0min headway → ~6.0 trips expected
   12-18h: 60.0min headway → ~6.0 trips expected
   18-24h: 60.0min headway → ~6.0 trips expected

🚌 ROUTE 58940:
   Total tr

In [9]:
print("=== TRIP GENERATION VERIFICATION ===")

test_route = list(headways_dict.keys())[7]
test_headways = headways_dict[test_route]
test_trips = new_trips_df[new_trips_df['route_id'] == test_route]

print(f"🔍 TRACING ROUTE {test_route}:")
print(f"   Headway schedule: {test_headways}")
print(f"   Generated trips: {len(test_trips)}")

# Group trips by time interval based on trip_id pattern
for interval_idx, (interval_label, headway) in enumerate(test_headways.items()):
    if headway is not None:
        start_hour = interval_idx * 6  # Assumes the 6-hour intervals configured above

        # Trip IDs follow the pattern opt_trip_<route_id>_<start hour>_<sequence>
        interval_trips = test_trips[test_trips['trip_id'].str.contains(f"_{start_hour:02d}_")]

        expected_trips = (6 * 60) / headway  # Interval length in minutes / headway
        print(f"   {interval_label} ({start_hour:02d}h): {len(interval_trips)} trips (expected ~{expected_trips:.1f})")
        
        # Show actual trip IDs for this interval
        if len(interval_trips) > 0:
            sample_ids = interval_trips['trip_id'].head(3).tolist()
            print(f"      Sample IDs: {sample_ids}")

=== TRIP GENERATION VERIFICATION ===
🔍 TRACING ROUTE 12826:
   Headway schedule: {'00-06h': 120.0, '06-12h': 120.0, '12-18h': 120.0, '18-24h': 10.0}
   Generated trips: 39
   00-06h (00h): 3 trips (expected ~3.0)
      Sample IDs: ['opt_trip_12826_00_000', 'opt_trip_12826_00_001', 'opt_trip_12826_00_002']
   06-12h (06h): 3 trips (expected ~3.0)
      Sample IDs: ['opt_trip_12826_06_000', 'opt_trip_12826_06_001', 'opt_trip_12826_06_002']
   12-18h (12h): 3 trips (expected ~3.0)
      Sample IDs: ['opt_trip_12826_12_000', 'opt_trip_12826_12_001', 'opt_trip_12826_12_002']
   18-24h (18h): 30 trips (expected ~36.0)
      Sample IDs: ['opt_trip_12826_18_000', 'opt_trip_12826_18_001', 'opt_trip_12826_18_002']
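The verification cell matches intervals with a substring search for `_<hour>_`, which works for the route IDs shown here but would miscount if a route ID itself contained such a fragment. A stricter alternative is to parse the ID from the right. The sketch below assumes the `opt_trip_<route_id>_<start hour>_<sequence>` pattern inferred from the sample IDs above; `parse_trip_id` and `trips_per_interval` are hypothetical helpers.

```python
import re
from collections import Counter

# Pattern inferred from the sample IDs, e.g. opt_trip_12826_18_002
TRIP_ID_PATTERN = re.compile(
    r"^opt_trip_(?P<route_id>.+)_(?P<start_hour>\d{2})_(?P<sequence>\d+)$"
)


def parse_trip_id(trip_id):
    """Split a generated trip ID into (route_id, start_hour, sequence)."""
    match = TRIP_ID_PATTERN.match(trip_id)
    if match is None:
        raise ValueError(f"Unexpected trip ID format: {trip_id!r}")
    return match["route_id"], int(match["start_hour"]), int(match["sequence"])


def trips_per_interval(trip_ids):
    """Count trips by the interval start hour encoded in each trip ID."""
    return Counter(parse_trip_id(trip_id)[1] for trip_id in trip_ids)


sample_ids = [
    "opt_trip_12826_00_000",
    "opt_trip_12826_18_000",
    "opt_trip_12826_18_001",
    "opt_trip_A_06_B_18_002",  # made-up route ID containing "_06_"
]
interval_counts = trips_per_interval(sample_ids)
print(parse_trip_id("opt_trip_12826_18_002"))  # ('12826', 18, 2)
print(dict(interval_counts))                   # {0: 1, 18: 3}
```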


In [10]:
# Diagnose why some intervals contain fewer trips than interval length / headway
print("=== TRIP CUTTING ANALYSIS ===")

# Check why fewer trips than expected
test_route = '50627'
test_headways = headways_dict[test_route]

if test_route in templates:
    template = templates[test_route]
    trip_duration = template['duration_minutes']
    
    print(f"🔍 ROUTE {test_route} ANALYSIS:")
    print(f"   Trip duration: {trip_duration:.1f} minutes")
    print(f"   Headway schedule: {test_headways}")
    
    intervals = opt_data['intervals']
    
    for interval_idx, (interval_label, headway) in enumerate(test_headways.items()):
        if headway is not None:
            start_hour, end_hour = intervals['hours'][interval_idx]
            
            # Calculate available time for trips
            interval_duration_min = (end_hour - start_hour) * 60
            time_available_for_trips = interval_duration_min - trip_duration
            
            theoretical_trips = interval_duration_min / headway
            actual_max_trips = (time_available_for_trips / headway) + 1
            
            print(f"\n   📊 {interval_label} ({start_hour:02d}-{end_hour:02d}h):")
            print(f"      Interval duration: {interval_duration_min} min")
            print(f"      Trip duration: {trip_duration:.1f} min")
            print(f"      Available time: {time_available_for_trips:.1f} min")
            print(f"      Theoretical trips: {theoretical_trips:.1f}")
            print(f"      Actual max trips: {actual_max_trips:.1f}")
            print(f"      Trip cutting: {theoretical_trips - actual_max_trips:.1f} trips lost")

=== TRIP CUTTING ANALYSIS ===
🔍 ROUTE 50627 ANALYSIS:
   Trip duration: 35.0 minutes
   Headway schedule: {'00-06h': None, '06-12h': 30.0, '12-18h': 10.0, '18-24h': 120.0}

   📊 06-12h (06-12h):
      Interval duration: 360 min
      Trip duration: 35.0 min
      Available time: 325.0 min
      Theoretical trips: 12.0
      Actual max trips: 11.8
      Trip cutting: 0.2 trips lost

   📊 12-18h (12-18h):
      Interval duration: 360 min
      Trip duration: 35.0 min
      Available time: 325.0 min
      Theoretical trips: 36.0
      Actual max trips: 33.5
      Trip cutting: 2.5 trips lost

   📊 18-24h (18-24h):
      Interval duration: 360 min
      Trip duration: 35.0 min
      Available time: 325.0 min
      Theoretical trips: 3.0
      Actual max trips: 3.7
      Trip cutting: -0.7 trips lost
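The fractional values above (11.8, 33.5, 3.7) are upper bounds, not trip counts, and the negative "trips lost" for 18-24h comes from comparing a ratio with a count. Flooring gives whole departures. The sketch below assumes that a departure is kept only if the trip also finishes inside the interval. That rule is inferred from this analysis cell and is consistent with the 47 trips reported for route 50627, but it has not been checked against the `generate_trips_and_stop_times` source. `departures_in_interval` is a hypothetical helper.

```python
import math


def departures_in_interval(interval_minutes, headway_minutes, trip_minutes):
    """Count departures whose trips both start and finish inside the interval."""
    usable = interval_minutes - trip_minutes
    if usable < 0:
        return 0  # even a single trip does not fit
    # One departure at the interval start, then one per full headway that still fits
    return math.floor(usable / headway_minutes) + 1


# Route 50627: 35-minute trips, 6-hour (360-minute) intervals
headways_50627 = {"06-12h": 30.0, "12-18h": 10.0, "18-24h": 120.0}
counts = {
    label: departures_in_interval(360, headway, 35.0)
    for label, headway in headways_50627.items()
}
print(counts)                # {'06-12h': 11, '12-18h': 33, '18-24h': 3}
print(sum(counts.values()))  # 47
```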


## 3.3 Phase 3: Create GTFS Feed

In [11]:
# Phase 3: write a complete GTFS feed and run basic integrity checks
print("=== PHASE 3: COMPLETE GTFS CREATION ===")

# Build complete GTFS feed
gtfs_output_path = converter.build_complete_gtfs(headways_dict, templates)

print(f"\n📁 GTFS FILES CREATED:")
for filename in sorted(os.listdir(gtfs_output_path)):
    if filename.endswith('.txt'):
        filepath = os.path.join(gtfs_output_path, filename)
        file_size = os.path.getsize(filepath)
        print(f"   {filename:15} ({file_size:,} bytes)")

# Quick validation
print(f"\n🔍 QUICK VALIDATION:")

# Check referential integrity
trips_df = pd.read_csv(os.path.join(gtfs_output_path, 'trips.txt'))
stop_times_df = pd.read_csv(os.path.join(gtfs_output_path, 'stop_times.txt'))
routes_df = pd.read_csv(os.path.join(gtfs_output_path, 'routes.txt'))
stops_df = pd.read_csv(os.path.join(gtfs_output_path, 'stops.txt'))

# Check trip consistency
trip_ids_in_trips = set(trips_df['trip_id'])
trip_ids_in_stop_times = set(stop_times_df['trip_id'])
print(f"   Trip IDs match: {trip_ids_in_trips == trip_ids_in_stop_times}")

# Check route consistency  
route_ids_in_routes = set(routes_df['route_id'])
route_ids_in_trips = set(trips_df['route_id'])
print(f"   Route coverage: {len(route_ids_in_trips)}/{len(route_ids_in_routes)} routes have trips")

# Check stop consistency
stop_ids_in_stops = set(stops_df['stop_id'])
stop_ids_in_stop_times = set(stop_times_df['stop_id'])
print(f"   Stop coverage: {len(stop_ids_in_stop_times.intersection(stop_ids_in_stops))}/{len(stop_ids_in_stop_times)} stops valid")

print(f"\n✅ GTFS RECONSTRUCTION COMPLETE!")

=== PHASE 3: COMPLETE GTFS CREATION ===
📁 Creating directory output: output/optimized_gtfs


Found 155 stops with invalid parent_station references: <StringArray>
[    '450G2617',     '450G7920',     '450G2572',     '450G2393',
     '450G7954',     '450G7613',     '450G6820',     '450G8341',
     '450G7637',     '450G7935',     '450G1142',     '450G7922',
     '450G9423',     '450G9566',     '450G8193',     '029G0052',
  '049GBUSCWY1',    '075G71047',    '079G73001', '109GDDCCBS01',
     '180GCSBS',     '180GMABS',     '180GSHIC',  '269GLC30614',
 '280G00000005',   '330GMA0337',     '339GBB08', '340G00001090',
   '370G100007',   '370G100004',   '370G105120',   '370G100009',
   '380G510101',    '430G00050',    '430G01055',    '430G21031',
   '440GCY0359',     '450G5168',     '450G6011',     '450G7949',
     '450G8049',    '450G21825',    '450G22583',     '450G7924',
     '450G7936',     '450G9478',     '450G8061',     '450G9178',
     '450G8028',  '910GHTRWCBS',  '910GHTRBUS5',     '910GPBRO']
Length: 52, dtype: string


⚠️  Found 155 stops with invalid parent_station references
   Missing parent stations: ['450G2617', '450G7920', '450G2572', '450G2393', '450G7954', '450G7613', '450G6820', '450G8341', '450G7637', '450G7935', '450G1142', '450G7922', '450G9423', '450G9566', '450G8193', '029G0052', '049GBUSCWY1', '075G71047', '079G73001', '109GDDCCBS01', '180GCSBS', '180GMABS', '180GSHIC', '269GLC30614', '280G00000005', '330GMA0337', '339GBB08', '340G00001090', '370G100007', '370G100004', '370G105120', '370G100009', '380G510101', '430G00050', '430G01055', '430G21031', '440GCY0359', '450G5168', '450G6011', '450G7949', '450G8049', '450G21825', '450G22583', '450G7924', '450G7936', '450G9478', '450G8061', '450G9178', '450G8028', '910GHTRWCBS', '910GHTRBUS5', '910GPBRO']
✅ Cleared 155 invalid parent_station references
✅ Fixed and copied stops.txt: 6897 stops
✅ Copied routes.txt: 187 routes
✅ Copied agency.txt: 24 agencies
✅ Generated trips.txt: 6790 trips
✅ Generated stop_times.txt: 282970 stop times
📅 Using m
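The log reports 155 stops with invalid `parent_station` references while the printed array has length 52, which presumably means 155 child stops point at 52 distinct parent IDs that are absent from `stops.txt`. The sketch below shows the kind of cleanup the log describes, on plain dictionaries. `clear_invalid_parent_stations` is a hypothetical helper, not the library's implementation, and the stop IDs are made up.

```python
def clear_invalid_parent_stations(stops):
    """Blank out parent_station values that do not match any stop_id.

    Returns the cleaned rows, the set of missing parent IDs and the
    number of stops that were changed.
    """
    known_ids = {stop["stop_id"] for stop in stops}
    cleaned, missing, n_fixed = [], set(), 0
    for stop in stops:
        parent = stop.get("parent_station") or ""
        if parent and parent not in known_ids:
            missing.add(parent)
            n_fixed += 1
            stop = {**stop, "parent_station": ""}  # copy, leave the input row untouched
        cleaned.append(stop)
    return cleaned, missing, n_fixed


stops_demo = [
    {"stop_id": "S1", "parent_station": "P1"},  # P1 is not in the feed
    {"stop_id": "S2", "parent_station": "P1"},
    {"stop_id": "S3", "parent_station": "S1"},  # valid reference
    {"stop_id": "S4", "parent_station": ""},
]
cleaned_stops, missing_parents, n_fixed = clear_invalid_parent_stations(stops_demo)
print(n_fixed, sorted(missing_parents))  # 2 ['P1']
```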



In [12]:
# Enhanced Phase 3 testing with ZIP options
print("=== PHASE 3: ENHANCED GTFS CREATION WITH ZIP OPTIONS ===")

# Option 1: Directory output (existing behavior)
print("\n📁 CREATING DIRECTORY OUTPUT:")
gtfs_dir_path = converter.build_complete_gtfs(
    headways_dict, 
    templates, 
    service_id='my_optimized_service',
    output_dir='output/optimized_directory'
)

# Option 2: ZIP output - basic
print("\n📦 CREATING ZIP OUTPUT:")
gtfs_zip_path = converter.build_complete_gtfs(
    headways_dict, 
    templates, 
    service_id='my_optimized_service',
    output_dir='output/optimized_feed',  # Will become optimized_feed.zip
    zip_output=True  # Write a .zip archive instead of a directory
)

# Option 3: ZIP with custom dates
print("\n📦 CREATING ZIP WITH CUSTOM DATES:")
gtfs_zip_custom = converter.build_complete_gtfs(
    headways_dict, 
    templates, 
    service_id='service_2023_2024',
    start_date='20230101', 
    end_date='20241231',
    output_dir='output/service_2023_2024',
    zip_output=True
)


=== PHASE 3: ENHANCED GTFS CREATION WITH ZIP OPTIONS ===

📁 CREATING DIRECTORY OUTPUT:
📁 Creating directory output: output/optimized_directory


Found 155 stops with invalid parent_station references: <StringArray>
[    '450G2617',     '450G7920',     '450G2572',     '450G2393',
     '450G7954',     '450G7613',     '450G6820',     '450G8341',
     '450G7637',     '450G7935',     '450G1142',     '450G7922',
     '450G9423',     '450G9566',     '450G8193',     '029G0052',
  '049GBUSCWY1',    '075G71047',    '079G73001', '109GDDCCBS01',
     '180GCSBS',     '180GMABS',     '180GSHIC',  '269GLC30614',
 '280G00000005',   '330GMA0337',     '339GBB08', '340G00001090',
   '370G100007',   '370G100004',   '370G105120',   '370G100009',
   '380G510101',    '430G00050',    '430G01055',    '430G21031',
   '440GCY0359',     '450G5168',     '450G6011',     '450G7949',
     '450G8049',    '450G21825',    '450G22583',     '450G7924',
     '450G7936',     '450G9478',     '450G8061',     '450G9178',
     '450G8028',  '910GHTRWCBS',  '910GHTRBUS5',     '910GPBRO']
Length: 52, dtype: string


⚠️  Found 155 stops with invalid parent_station references
   Missing parent stations: ['450G2617', '450G7920', '450G2572', '450G2393', '450G7954', '450G7613', '450G6820', '450G8341', '450G7637', '450G7935', '450G1142', '450G7922', '450G9423', '450G9566', '450G8193', '029G0052', '049GBUSCWY1', '075G71047', '079G73001', '109GDDCCBS01', '180GCSBS', '180GMABS', '180GSHIC', '269GLC30614', '280G00000005', '330GMA0337', '339GBB08', '340G00001090', '370G100007', '370G100004', '370G105120', '370G100009', '380G510101', '430G00050', '430G01055', '430G21031', '440GCY0359', '450G5168', '450G6011', '450G7949', '450G8049', '450G21825', '450G22583', '450G7924', '450G7936', '450G9478', '450G8061', '450G9178', '450G8028', '910GHTRWCBS', '910GHTRBUS5', '910GPBRO']
✅ Cleared 155 invalid parent_station references
✅ Fixed and copied stops.txt: 6897 stops
✅ Copied routes.txt: 187 routes
✅ Copied agency.txt: 24 agencies
✅ Generated trips.txt: 6790 trips
✅ Generated stop_times.txt: 282970 stop times
📅 Using m

✅ Cleared 155 invalid parent_station references
✅ Fixed and copied stops.txt: 6897 stops
✅ Copied routes.txt: 187 routes
✅ Copied agency.txt: 24 agencies
✅ Generated trips.txt: 6790 trips
✅ Generated stop_times.txt: 282970 stop times
📅 Using p
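When a feed is distributed as a ZIP archive, the GTFS text files are expected at the root of the archive rather than inside a subfolder. The sketch below packs a feed directory that way using only the standard library. It illustrates the idea and is not necessarily how `build_complete_gtfs` handles `zip_output=True`; `zip_gtfs_directory` is a hypothetical helper and the two files written here are empty placeholders.

```python
import tempfile
import zipfile
from pathlib import Path


def zip_gtfs_directory(gtfs_dir):
    """Pack the .txt files of a GTFS directory into <dir name>.zip next to it."""
    gtfs_dir = Path(gtfs_dir)
    zip_path = gtfs_dir.parent / (gtfs_dir.name + ".zip")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for txt_file in sorted(gtfs_dir.glob("*.txt")):
            # arcname drops the directory part so files sit at the archive root
            archive.write(txt_file, arcname=txt_file.name)
    return zip_path


with tempfile.TemporaryDirectory() as tmp:
    feed_dir = Path(tmp) / "optimized_feed"
    feed_dir.mkdir()
    (feed_dir / "trips.txt").write_text("route_id,service_id,trip_id\n")
    (feed_dir / "stop_times.txt").write_text(
        "trip_id,arrival_time,departure_time,stop_id,stop_sequence\n"
    )
    archive_path = zip_gtfs_directory(feed_dir)
    archive_name = archive_path.name
    with zipfile.ZipFile(archive_path) as archive:
        names = archive.namelist()

print(archive_name)  # optimized_feed.zip
print(names)         # ['stop_times.txt', 'trips.txt']
```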