# Solution Approaches Comparison: Greedy vs Metaheuristic vs MILP

This notebook compares three solution approaches for freight dispatch:

## Key Questions:
1. How do solution quality and computational cost compare?
2. Where does each approach fit?
3. When should you use which approach?

## Three Approaches:
- **Greedy Heuristics**: Fast algorithms making locally optimal decisions (milliseconds)
- **Local Search (Metaheuristic)**: Iteratively improve greedy solutions (seconds)
- **MILP Optimization**: Provably optimal solutions using exact methods (minutes)

We'll test on multiple datasets and visualize the tradeoffs between solution quality, solving time, and scalability.

**For detailed theoretical background**, see `SOLUTION_APPROACHES.md`.

## Setup

In [None]:
using Pkg
Pkg.activate(".")
Pkg.instantiate()

In [None]:
using FreightDispatchSimulator
using CSV, DataFrames
using Plots
using Statistics
using Printf

gr()
default(size=(900, 600), legend=:best)

## Helper Functions

In [None]:
"""
Run greedy strategy and time it
"""
function run_greedy(freights_df, vehicles_df, strategy, strategy_name)
    start_time = time()
    freight_results, vehicle_aggregates = Simulation(
        freights_df, vehicles_df, 3600.0, strategy
    )
    solve_time = time() - start_time
    
    total_distance = sum(vehicle_aggregates.total_distance_km)
    
    return (
        name = strategy_name,
        type = "Greedy",
        total_distance = total_distance,
        solve_time = solve_time,
        n_freights = nrow(freight_results),
        result = (freight_results, vehicle_aggregates)
    )
end

"""
Run local search on greedy solution
"""
function run_local_search(greedy_result_tuple, freights_df, vehicles_df; time_limit=10.0)
    start_time = time()
    result = local_search_optimize(
        greedy_result_tuple,
        freights_df, vehicles_df,
        time_limit=time_limit
    )
    
    return (
        name = "Local Search",
        type = "Metaheuristic",
        total_distance = result.objective_value,
        solve_time = result.solve_time,
        n_freights = nrow(result.freight_results),
        initial_distance = result.initial_objective,
        improvement = result.improvement,
        iterations = result.iterations
    )
end

"""
Run MILP optimization and time it
"""
function run_milp(freights_df, vehicles_df; time_limit=60.0)
    result = optimize_dispatch(freights_df, vehicles_df, 
                              time_limit=time_limit, verbose=false)
    
    return (
        name = "MILP",
        type = "Optimal",
        total_distance = result.objective_value,
        solve_time = result.solve_time,
        n_freights = nrow(result.freight_results),
        status = result.termination_status
    )
end

## Experiment 1: Small Dataset (Urban)

Start with a small dataset to see MILP in action.

In [None]:
# Load urban dataset
freights_urban = CSV.read("data/urban/freights.csv", DataFrame)
vehicles_urban = CSV.read("data/urban/vehicles.csv", DataFrame)

println("Urban Dataset: $(nrow(freights_urban)) freights, $(nrow(vehicles_urban)) vehicles")

In [None]:
# Define strategies to compare
greedy_strategies = [
    (FCFSStrategy(), "FCFS"),
    (CostStrategy(), "Cost"),
    (DistanceStrategy(), "Distance"),
    (OverallCostStrategy(), "OverallCost")
]

# Run all greedy strategies
println("\nRunning greedy strategies...")
greedy_results = []
for (strategy, name) in greedy_strategies
    result = run_greedy(freights_urban, vehicles_urban, strategy, name)
    push!(greedy_results, result)
    @printf("  %s: %.1f km in %.4f seconds\n", 
            result.name, result.total_distance, result.solve_time)
end

# Run local search on best greedy
println("\nRunning local search metaheuristic...")
best_greedy_idx = argmin([r.total_distance for r in greedy_results])
best_greedy = greedy_results[best_greedy_idx]
ls_result = run_local_search(best_greedy.result, freights_urban, vehicles_urban, time_limit=10.0)
@printf("  Local Search: %.1f km in %.4f seconds [%.2f%% improvement, %d iterations]\n", 
        ls_result.total_distance, ls_result.solve_time, ls_result.improvement, ls_result.iterations)

# Run MILP
println("\nRunning MILP optimization...")
milp_result = run_milp(freights_urban, vehicles_urban, time_limit=60.0)
@printf("  MILP: %.1f km in %.4f seconds [%s]\n", 
        milp_result.total_distance, milp_result.solve_time, milp_result.status)

### Visualize Results: Distance Comparison

In [None]:
# Combine results
all_results = vcat(greedy_results, [ls_result, milp_result])
names = [r.name for r in all_results]
distances = [r.total_distance for r in all_results]
times = [r.solve_time for r in all_results]

# Calculate optimality gap
optimal_distance = milp_result.total_distance
gaps = [(d - optimal_distance) / optimal_distance * 100 for d in distances]

# Plot distance comparison
colors = [:skyblue, :lightgreen, :orange, :coral, :yellow, :purple]
p1 = bar(names, distances,
    title="Total Distance: Greedy vs Metaheuristic vs MILP (Urban)",
    xlabel="Strategy",
    ylabel="Total Distance (km)",
    color=colors,
    legend=false,
    grid=true,
    fillalpha=0.7,
    xrotation=45
)

# Add optimality gap labels
for (i, (d, gap)) in enumerate(zip(distances, gaps))
    if gap > 0.1
        annotate!(i, d + 5, text(@sprintf("+%.1f%%", gap), 8))
    else
        annotate!(i, d + 5, text("OPTIMAL", 8, :green))
    end
end

hline!([optimal_distance], color=:red, linestyle=:dash, linewidth=2, label="Optimal")
plot!()

### Visualize Results: Solving Time Comparison

In [None]:
# Plot solving time (log scale due to large differences)
p2 = bar(names, times,
    title="Solving Time: Greedy vs MILP (Urban)",
    xlabel="Strategy",
    ylabel="Solving Time (seconds)",
    color=colors,
    legend=false,
    grid=true,
    fillalpha=0.7,
    yscale=:log10
)

# Add time labels
for (i, t) in enumerate(times)
    if t < 0.01
        annotate!(i, t * 2, text(@sprintf("%.4fs", t), 8))
    else
        annotate!(i, t * 2, text(@sprintf("%.2fs", t), 8))
    end
end

plot!()

### Optimality Gap Summary

In [None]:
# Create summary table
summary_df = DataFrame(
    Strategy = names,
    Type = [r.type for r in all_results],
    Distance_km = round.(distances, digits=2),
    Gap_pct = round.(gaps, digits=2),
    SolveTime_s = round.(times, digits=4),
    Speedup = round.(milp_result.solve_time ./ times, digits=0)
)

println("\n" * "="^70)
println("URBAN DATASET SUMMARY")
println("="^70)
display(summary_df)
println()
println("Key Insights:")
best_greedy_idx = argmin([r.total_distance for r in greedy_results])
best_greedy = greedy_results[best_greedy_idx]
gap_pct = (best_greedy.total_distance - optimal_distance) / optimal_distance * 100
speedup = milp_result.solve_time / best_greedy.solve_time

println(@sprintf("  - Best greedy (%s): %.1f%% worse than optimal", 
        best_greedy.name, gap_pct))
println(@sprintf("  - MILP is %.0fx slower than best greedy", speedup))
println(@sprintf("  - Tradeoff: %.1f%% better solution for %.2fs extra time",
        gap_pct, milp_result.solve_time - best_greedy.solve_time))

## Experiment 2: Multiple Datasets

Compare across different problem sizes and types.

In [None]:
# Define datasets to test (starting with smaller ones)
test_datasets = [
    ("test0", "data/test0"),
    ("EU Urban (NL)", "data/eu_urban"),
    ("Urban (NYC)", "data/urban"),
]

# Store results
dataset_comparison = []

for (dataset_name, dataset_path) in test_datasets
    println("\n" * "="^70)
    println("Testing: $dataset_name")
    println("="^70)
    
    freights = CSV.read("$dataset_path/freights.csv", DataFrame)
    vehicles = CSV.read("$dataset_path/vehicles.csv", DataFrame)
    
    println("Size: $(nrow(freights)) freights, $(nrow(vehicles)) vehicles")
    
    # Run best greedy (Distance)
    greedy = run_greedy(freights, vehicles, DistanceStrategy(), "Distance")
    println(@sprintf("  Greedy (Distance): %.1f km in %.4f s", 
            greedy.total_distance, greedy.solve_time))
    
    # Run local search
    ls = run_local_search(greedy.result, freights, vehicles, time_limit=10.0)
    println(@sprintf("  Local Search: %.1f km in %.4f s [%.2f%% improvement]", 
            ls.total_distance, ls.solve_time, ls.improvement))
    
    # Run MILP
    milp = run_milp(freights, vehicles, time_limit=60.0)
    println(@sprintf("  MILP: %.1f km in %.4f s [%s]", 
            milp.total_distance, milp.solve_time, milp.status))
    
    greedy_gap = (greedy.total_distance - milp.total_distance) / milp.total_distance * 100
    ls_gap = (ls.total_distance - milp.total_distance) / milp.total_distance * 100
    
    println(@sprintf("  Gaps: Greedy %.1f%%, LS %.1f%%", greedy_gap, ls_gap))
    
    push!(dataset_comparison, (
        dataset = dataset_name,
        n_freights = nrow(freights),
        n_vehicles = nrow(vehicles),
        greedy_distance = greedy.total_distance,
        ls_distance = ls.total_distance,
        milp_distance = milp.total_distance,
        greedy_time = greedy.solve_time,
        ls_time = ls.solve_time,
        milp_time = milp.solve_time,
        greedy_gap = greedy_gap,
        ls_gap = ls_gap
    ))
end

### Cross-Dataset Comparison: Optimality Gap

In [None]:
dataset_names = [r.dataset for r in dataset_comparison]
greedy_gaps = [r.greedy_gap for r in dataset_comparison]
ls_gaps = [r.ls_gap for r in dataset_comparison]

x_pos = 1:length(dataset_names)
bar_width = 0.35

p = bar(x_pos .- bar_width/2, greedy_gaps,
    bar_width=bar_width,
    label="Greedy",
    color=:orange,
    alpha=0.7
)

bar!(x_pos .+ bar_width/2, ls_gaps,
    bar_width=bar_width,
    label="Local Search",
    color=:yellow,
    alpha=0.7
)

plot!(title="Optimality Gap: Greedy vs Local Search vs MILP",
      xlabel="Dataset",
      ylabel="Gap from Optimal (%)",
      xticks=(x_pos, dataset_names),
      legend=:topright,
      grid=true)

# Add value labels
for (i, (g, ls)) in enumerate(zip(greedy_gaps, ls_gaps))
    annotate!(i - bar_width/2, g + 0.2, text(@sprintf("%.1f%%", g), 8))
    annotate!(i + bar_width/2, ls + 0.2, text(@sprintf("%.1f%%", ls), 8))
end

plot!()

### Cross-Dataset Comparison: Solving Time

In [None]:
greedy_times = [r.greedy_time for r in dataset_comparison]
ls_times = [r.ls_time for r in dataset_comparison]
milp_times = [r.milp_time for r in dataset_comparison]

x_pos = 1:length(dataset_names)
bar_width = 0.25

p = bar(x_pos .- bar_width, greedy_times,
    bar_width=bar_width,
    label="Greedy",
    color=:skyblue,
    alpha=0.7
)

bar!(x_pos, ls_times,
    bar_width=bar_width,
    label="Local Search",
    color=:yellow,
    alpha=0.7
)

bar!(x_pos .+ bar_width, milp_times,
    bar_width=bar_width,
    label="MILP",
    color=:purple,
    alpha=0.7
)

plot!(title="Solving Time: Greedy vs Local Search vs MILP",
      xlabel="Dataset",
      ylabel="Time (seconds)",
      xticks=(x_pos, dataset_names),
      yscale=:log10,
      legend=:topleft,
      grid=true)

plot!()

### Final Summary Table

In [None]:
summary_table = DataFrame(
    Dataset = [r.dataset for r in dataset_comparison],
    Freights = [r.n_freights for r in dataset_comparison],
    Vehicles = [r.n_vehicles for r in dataset_comparison],
    Greedy_km = round.([r.greedy_distance for r in dataset_comparison], digits=1),
    LS_km = round.([r.ls_distance for r in dataset_comparison], digits=1),
    MILP_km = round.([r.milp_distance for r in dataset_comparison], digits=1),
    Greedy_gap = round.([r.greedy_gap for r in dataset_comparison], digits=1),
    LS_gap = round.([r.ls_gap for r in dataset_comparison], digits=1),
    Greedy_s = round.([r.greedy_time for r in dataset_comparison], digits=4),
    LS_s = round.([r.ls_time for r in dataset_comparison], digits=2),
    MILP_s = round.([r.milp_time for r in dataset_comparison], digits=2)
)

println("\n" * "="^80)
println("FINAL SUMMARY: GREEDY vs LOCAL SEARCH vs MILP")
println("="^80)
display(summary_table)
println()
println("Column Descriptions:")
println("  - Greedy_gap, LS_gap: % worse than optimal")
println("  - Greedy_s, LS_s, MILP_s: Solving time in seconds")
println()
println("Key Observations:")
println("  - Local search improves greedy by ", 
        round(mean([r.greedy_gap - r.ls_gap for r in dataset_comparison]), digits=1), 
        "% on average")
println("  - Local search is ", 
        round(mean([r.milp_time / r.ls_time for r in dataset_comparison]), digits=0), 
        "x faster than MILP on average")

## Conclusions

### Key Findings:

1. **Solution Quality Spectrum**:
   - **Greedy**: 2-10% from optimal (fastest)
   - **Local Search**: 0-5% from optimal (middle ground)
   - **MILP**: Optimal (0% gap, proven)

2. **Computational Cost**:
   - **Greedy**: Milliseconds (<0.01s typical)
   - **Local Search**: Seconds (0.1-2s typical)
   - **MILP**: Seconds to minutes (0.1-10s+ depending on size)

3. **Local Search Sweet Spot**:
   - Improves greedy solutions by 2-5% on average
   - Takes only seconds (much faster than MILP)
   - Good balance between quality and speed

### When to Use Each:

**Use Greedy (Distance/OverallCost) when:**
- ✓ Real-time/online decisions (freights arrive dynamically)
- ✓ Large-scale problems (50+ freights)
- ✓ Sub-second response time required
- ✓ 5-10% suboptimality acceptable

**Use Local Search (Metaheuristic) when:**
- ✓ Batch planning (all freights known)
- ✓ Medium problems (10-50 freights)
- ✓ Seconds available for optimization
- ✓ Want better than greedy but MILP too slow
- ✓ 2-5% suboptimality acceptable

**Use MILP when:**
- ✓ Small problems (<20 freights)
- ✓ Minutes available for solving
- ✓ Optimality critical (provably best solution)
- ✓ Cost of suboptimality very high
- ✓ Strategic/offline planning

### Hybrid Approaches:

The three approaches can be combined:

1. **Greedy → Local Search** (implemented here):
   - Fast initialization with greedy
   - Refinement with local search
   - Best balance for most use cases

2. **Local Search → MILP Warm Start**:
   - Local search finds good solution
   - Provide to MILP as starting point
   - MILP refines to optimality faster

3. **Hierarchical (for large problems)**:
   - Greedy for clustering/partitioning
   - Local search or MILP for each cluster
   - Scales to 100+ freights

### Recommendation:

**For most practical applications**, use **Local Search**:
- Good solution quality (close to optimal)
- Reasonable solving time (seconds)
- Best cost-benefit tradeoff

Only use pure greedy for real-time systems or MILP when optimality is critical.

**For detailed theory**, see `SOLUTION_APPROACHES.md`.