# PRRP Experimental Framework

This notebook implements Phase 1 of the PRRP experimental framework. It supports:

1. Loading spatial (shapefile) and graph (METIS) datasets.
2. Running experiments on spatial regionalization (using PRRP from `src/spatial_prrp.py`) and graph partitioning (using PRRP from `src/graph_prrp.py` and PyMETIS from `src/pymetis_partition.py`).
3. Logging and storing performance metrics (execution time, success probability, effectiveness, and completeness).
4. Generating performance visualizations.

In [1]:
import os
import sys
import time
import random
import logging
import json
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Add root directory to sys.path so that src modules can be imported
ROOT_DIR = os.path.abspath(os.path.join(os.getcwd(), ".."))
if ROOT_DIR not in sys.path:
    sys.path.insert(0, ROOT_DIR)

In [2]:
# Import PRRP modules
from src.prrp_data_loader import load_shapefile, load_metis_graph
from src.spatial_prrp import run_prrp, run_parallel_prrp
from src.metis_parser import load_graph_from_metis
from src.pymetis_partition import partition_graph_pymetis

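# Note: in the recorded run this import cell failed with:
# RuntimeError: Could not locate METIS dll. Please set the METIS_DLL environment variable to its full path.
# Set METIS_DLL to the full path of the METIS shared library before running this cell.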
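The import cell fails because a METIS wrapper used by these modules cannot find the METIS shared library, and the error asks for `METIS_DLL` to hold its full path. Because the error is raised during import, the variable has to be set before the import cell runs (or before Jupyter is launched). A minimal sketch, assuming the library sits at one of a few typical install locations; the helper `configure_metis_dll` and the candidate paths are illustrative and not part of `src`.

```python
import os
from pathlib import Path

# Illustrative locations only; adjust to wherever METIS is installed on your machine.
CANDIDATE_METIS_PATHS = [
    "/usr/lib/x86_64-linux-gnu/libmetis.so",  # e.g. Debian/Ubuntu libmetis-dev
    "/usr/local/lib/libmetis.so",             # e.g. shared build with the default prefix
    "/opt/homebrew/lib/libmetis.dylib",       # e.g. Homebrew on Apple Silicon
]


def configure_metis_dll(candidate_paths):
    """Set METIS_DLL to the first existing candidate unless it is already set.

    Returns the path METIS_DLL points to, or None if no candidate exists.
    """
    existing = os.environ.get("METIS_DLL")
    if existing:
        return existing
    for candidate in candidate_paths:
        if Path(candidate).is_file():
            os.environ["METIS_DLL"] = str(candidate)
            return os.environ["METIS_DLL"]
    return None
```

Call `configure_metis_dll(CANDIDATE_METIS_PATHS)` (or assign `os.environ["METIS_DLL"]` directly) in a cell placed before the `from src...` imports, then re-run the import cell.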

In [None]:
# Configure logging to display info messages
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)

## Experiment Configuration

In this section we define:

- **Dataset paths:** For the spatial and graph datasets.
- **Parameter settings:** Such as the number of regions, sample size (M), maximum retries (MR), and number of cores (Q).
- **Output directories:** For saving results and figures.

In [None]:
# Define dataset paths (update these paths as necessary)
DATA_DIR = os.path.join(ROOT_DIR, "data")
# Example spatial dataset (census tracts shapefile)
spatial_dataset_path = os.path.join(DATA_DIR, "cb_2015_42_tract_500k", "cb_2015_42_tract_500k.shp")
# Example graph dataset in METIS format
graph_dataset_path = os.path.join(DATA_DIR, "PGPgiantcompo.graph")
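If either path is wrong, the experiment cells only fail after setup and logging. A quick pre-flight check catches this earlier. A shapefile is not a single file: the `.shp` geometry needs at least its `.shx` index and `.dbf` attribute table beside it, so this sketch checks those siblings as well. The helper `missing_dataset_files` is hypothetical.

```python
from pathlib import Path

# Mandatory shapefile components that share the .shp file's base name.
SHAPEFILE_SIDECARS = (".shx", ".dbf")


def missing_dataset_files(shapefile_path, graph_path):
    """Return the required dataset files that do not exist on disk."""
    shp = Path(shapefile_path)
    required = [shp] + [shp.with_suffix(ext) for ext in SHAPEFILE_SIDECARS] + [Path(graph_path)]
    return [str(path) for path in required if not path.is_file()]
```

For example, `missing = missing_dataset_files(spatial_dataset_path, graph_dataset_path)` followed by raising `FileNotFoundError` when `missing` is non-empty.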

In [None]:
# Experiment parameters for spatial PRRP
spatial_params = {
    "p_percentage": [0.01, 0.02, 0.03],  # fraction of areas used to set the number of regions (0.01 = 1%)
    "sample_sizes": [10, 50, 100],         # M: sample size (number of solutions)
    "MR_values": [10, 30, 50],             # MR: max iterations for region formation
    "num_cores": [1, 2, 4]                 # Q: number of parallel cores
}

# Experiment parameters for graph partitioning PRRP
graph_params = {
    "p_values": [5, 10, 20],             # number of partitions
    "cardinality_constraints": ["Uniform", "Skewed"],  # example constraints
    "MR_values": [5, 20, 50],             # MR: max iterations for growing a partition
    "MS_values": [5, 10]                  # MS: max partition size before splitting
}
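`generate_cardinalities` draws from Python's `random` module, so two runs of the same configuration will generally use different target cardinalities. If repeatable cardinalities are wanted, one option is to fix a seed next to the other parameters. This sketch seeds only Python's global RNG in the notebook process; `EXPERIMENT_SEED` and `seed_everything` are hypothetical names, and whether any worker processes started by `run_parallel_prrp` see this state depends on how that function launches them, which this notebook does not show.

```python
import random

EXPERIMENT_SEED = 42  # hypothetical value; store it with the results so a run can be repeated


def seed_everything(seed):
    """Seed Python's global RNG, which generate_cardinalities draws from.

    Add np.random.seed(seed) here if any code path uses NumPy's legacy global RNG.
    """
    random.seed(seed)
```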

In [None]:
# Output directories for results and figures
RESULTS_DIR = os.path.join(ROOT_DIR, "results", "final_results")
FIGURES_DIR = os.path.join(ROOT_DIR, "results", "figures")
os.makedirs(RESULTS_DIR, exist_ok=True)
os.makedirs(FIGURES_DIR, exist_ok=True)

## Utility Functions for Experimentation

The following helper functions are used to:

- Load datasets (spatial or graph).
- Compute random target cardinalities for spatial regions.
- Run a spatial experiment (using PRRP).
- Run a graph partitioning experiment (using both PRRP and PyMETIS for comparison).
- Save the results in JSON format.
- Generate visualizations.

In [None]:
def generate_cardinalities(total_areas, num_regions):
    """
    Generate a list of target cardinalities for regions.
    
    Parameters:
        total_areas (int): Total number of areas in the dataset.
        num_regions (int): Number of regions to partition into.
    
    Returns:
        list: List of integers representing the target cardinality for each region.
              The sum of cardinalities equals total_areas.
    """
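    if total_areas < 2 * num_regions:
        raise ValueError("total_areas must be at least 2 * num_regions")
    # Start with a minimum of 2 areas per region (as an example)
    cardinalities = [2] * num_regions
    remaining = total_areas - 2 * num_regions
    # Distribute the remaining areas randomly across all but the last region
    for i in range(num_regions - 1):
        add = random.randint(0, remaining)
        cardinalities[i] += add
        remaining -= add
    # The last region absorbs whatever is left, so the sum is exactly total_areas
    cardinalities[-1] += remaining
    random.shuffle(cardinalities)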
    return cardinalities

def run_spatial_experiment(shapefile_path, p_percentage, M, MR, num_threads):
    """
    Run spatial PRRP experiment on a shapefile dataset.
    
    Parameters:
        shapefile_path (str): Path to the shapefile.
        p_percentage (float): Percentage of dataset size used to determine number of regions.
        M (int): Sample size (number of PRRP solutions to generate).
        MR (int): Maximum iterations to build each region.
        num_threads (int): Number of parallel threads/processes.
    
    Returns:
        dict: Dictionary with experiment metrics and the list of solutions.
    """
    logger.info(f"Loading spatial dataset from {shapefile_path} ...")
    areas = load_shapefile(shapefile_path)
    if areas is None:
        raise FileNotFoundError(f"Failed to load shapefile from {shapefile_path}")
    
    total_areas = len(areas)
    num_regions = max(2, int(total_areas * p_percentage))
    cardinalities = generate_cardinalities(total_areas, num_regions)
    
    logger.info(f"Dataset loaded with {total_areas} areas.")
    logger.info(f"Parameters: num_regions = {num_regions}, cardinalities = {cardinalities}, M = {M}, MR = {MR}, threads = {num_threads}")
    
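    start_time = time.perf_counter()
    # Run PRRP with multiprocessing when num_threads > 1, otherwise sequentially.
    # Note: MR is recorded in the metrics below but is not passed to run_parallel_prrp.
    solutions = run_parallel_prrp(areas, num_regions, cardinalities, M, num_threads, use_multiprocessing=(num_threads > 1))
    exec_time = time.perf_counter() - start_time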
    
    # Dummy metrics – in a real experiment, you would compute success probability, effectiveness, and completeness.
    metrics = {
        "execution_time_sec": exec_time,
        "total_areas": total_areas,
        "num_regions": num_regions,
        "cardinalities": cardinalities,
        "sample_size": M,
        "MR": MR,
        "num_threads": num_threads,
        "num_solutions_generated": len(solutions)
    }
    logger.info(f"Spatial experiment completed in {exec_time:.2f} seconds.")
    
    return {"metrics": metrics, "solutions": solutions}

def run_graph_experiment(graph_file_path, p, C_constraint, MR, MS):
    """
    Run graph partitioning experiment using PRRP and compare against PyMETIS.
    
    Parameters:
        graph_file_path (str): Path to the METIS-format graph file.
        p (int): Desired number of partitions.
        C_constraint (str): Cardinality constraint type (e.g., "Uniform" or "Skewed").
        MR (int): Maximum iterations for growing a partition.
        MS (int): Maximum allowed partition size before splitting.
    
    Returns:
        dict: Dictionary with performance metrics and partition results for both PRRP and PyMETIS.
    """
    # Load graph using METIS parser
    logger.info(f"Loading graph dataset from {graph_file_path} ...")
    try:
        G_metis, num_nodes, num_edges = load_graph_from_metis(graph_file_path)
    except Exception as e:
        logger.error(f"Error loading graph: {e}")
        raise
    
    logger.info(f"Graph loaded: {num_nodes} nodes, {num_edges} edges.")
    
    # Run graph-based PRRP partitioning (from src/graph_prrp.py)
    # Note: Here we assume that the PRRP function 'run_graph_prrp' is defined in that module.
    from src.graph_prrp import run_graph_prrp
    start_time = time.perf_counter()
    # C_constraint is only recorded in the metrics below; run_graph_prrp receives None as its third argument.
    prrp_partitions = run_graph_prrp(G_metis, p, None, MR, MS)
    prrp_time = time.perf_counter() - start_time
    logger.info(f"Graph PRRP partitioning completed in {prrp_time:.2f} seconds.")
    
    # Run PyMETIS partitioning for comparison
    start_time = time.perf_counter()
    pymetis_partitions = partition_graph_pymetis(G_metis, p)
    pymetis_time = time.perf_counter() - start_time
    logger.info(f"PyMETIS partitioning completed in {pymetis_time:.2f} seconds.")
    
    metrics = {
        "prrp_execution_time_sec": prrp_time,
        "pymetis_execution_time_sec": pymetis_time,
        "num_nodes": num_nodes,
        "num_edges": num_edges,
        "num_partitions": p,
        "MR": MR,
        "MS": MS,
        "C_constraint": C_constraint
    }
    return {
        "metrics": metrics,
        "prrp_partitions": prrp_partitions,
        "pymetis_partitions": pymetis_partitions
    }

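def _json_default(obj):
    """Convert sets and NumPy values, which json cannot encode natively."""
    if isinstance(obj, (set, frozenset)):
        return list(obj)
    if isinstance(obj, np.generic):
        return obj.item()
    if isinstance(obj, np.ndarray):
        return obj.tolist()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

def save_results(results, filename):
    """
    Save experiment results to a JSON file (sets and NumPy values are converted).
    
    Parameters:
        results (dict): Results dictionary.
        filename (str): Filename (including path) to save the JSON.
    """
    with open(filename, "w") as f:
        json.dump(results, f, indent=4, default=_json_default)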
    logger.info(f"Results saved to {filename}")
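`generate_cardinalities` fills regions one at a time, so the first region it visits receives on average half of the spare areas and each later region gets progressively fewer; the final shuffle hides the order but not the skew. If a less lopsided distribution is wanted, a standard alternative is to draw a random composition with the stars-and-bars construction, which makes every split of the spare areas equally likely. This is a swapped-in technique, not the method from the PRRP paper; `random_cardinalities` and its `min_size` parameter are hypothetical.

```python
import random


def random_cardinalities(total_areas, num_regions, min_size=2, rng=None):
    """Draw region sizes that sum to total_areas, each at least min_size.

    Stars and bars: the spare areas (stars) are split into num_regions groups
    by num_regions - 1 bars placed uniformly at random among the combined
    positions, so every composition of the spare areas is equally likely.
    """
    if num_regions < 1:
        raise ValueError("num_regions must be at least 1")
    if rng is None:
        rng = random.Random()
    spare = total_areas - min_size * num_regions
    if spare < 0:
        raise ValueError("total_areas is too small for num_regions regions of min_size")
    slots = spare + num_regions - 1  # positions shared by stars and bars
    bars = sorted(rng.sample(range(slots), num_regions - 1))
    # Sentinels at -1 and slots turn bar positions into group boundaries.
    bounds = [-1] + bars + [slots]
    return [min_size + bounds[i + 1] - bounds[i] - 1 for i in range(num_regions)]
```

Inside `run_spatial_experiment`, `random_cardinalities(total_areas, num_regions)` could stand in for the `generate_cardinalities` call.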

## Running Experiments and Collecting Metrics

The following cells run the spatial and graph experiments over a range of parameter settings.
For demonstration, we run a single configuration for each type. In practice, you might loop over
multiple parameter combinations and aggregate the results into a DataFrame.
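A sketch of such a loop, written so it can wrap `run_spatial_experiment` directly. The keys of `spatial_params` (`sample_sizes`, `MR_values`, `num_cores`) do not match that function's argument names (`M`, `MR`, `num_threads`), so the grid has to be re-keyed first. The helper `sweep` is hypothetical; it keeps only scalar metrics so that each run becomes one flat row.

```python
import itertools


def sweep(experiment_fn, param_grid, *fixed_args):
    """Call experiment_fn once per combination in param_grid.

    param_grid maps keyword-argument names to lists of values. Each returned
    row holds the configuration plus the scalar entries of result["metrics"].
    """
    names = list(param_grid)
    rows = []
    for values in itertools.product(*(param_grid[name] for name in names)):
        config = dict(zip(names, values))
        result = experiment_fn(*fixed_args, **config)
        row = dict(config)
        for key, value in result["metrics"].items():
            if not isinstance(value, (list, dict)):  # skip e.g. the cardinalities list
                row[key] = value
        rows.append(row)
    return rows
```

For example, build `spatial_grid = {"p_percentage": spatial_params["p_percentage"], "M": spatial_params["sample_sizes"], "MR": spatial_params["MR_values"], "num_threads": spatial_params["num_cores"]}`, call `sweep(run_spatial_experiment, spatial_grid, spatial_dataset_path)`, and pass the rows to `pd.DataFrame`. The full grid is 81 runs, and each one reloads the shapefile.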

In [None]:
# Run a single spatial experiment configuration
spatial_experiment_config = {
    "p_percentage": 0.02,  # using 2% of dataset size for number of regions
    "M": 10,
    "MR": 30,
    "num_threads": 2
}
spatial_results = run_spatial_experiment(spatial_dataset_path,
                                           spatial_experiment_config["p_percentage"],
                                           spatial_experiment_config["M"],
                                           spatial_experiment_config["MR"],
                                           spatial_experiment_config["num_threads"])
# Save spatial results
spatial_results_file = os.path.join(RESULTS_DIR, "spatial_experiment_results.json")
save_results(spatial_results, spatial_results_file)
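The spatial metrics record how many solutions came back, not whether they are valid. One basic check, in the spirit of the completeness metric listed at the top of the notebook, is that every area is assigned to exactly one region and that region sizes match the requested cardinalities. This sketch assumes each solution is a list of regions and each region is a collection of area IDs; the structure actually returned by `run_parallel_prrp` is not shown here, so the unpacking may need adapting. `check_solution` is hypothetical and does not test spatial contiguity.

```python
from collections import Counter


def check_solution(solution, area_ids, cardinalities):
    """Return a list of problems found in one solution (empty if none).

    solution: iterable of regions, each a collection of area IDs.
    area_ids: every area ID that must be covered.
    cardinalities: target region sizes, compared as a multiset.
    """
    problems = []
    counts = Counter(area for region in solution for area in region)
    expected, assigned = set(area_ids), set(counts)
    duplicated = sorted(str(a) for a, c in counts.items() if c > 1)
    missing = sorted(str(a) for a in expected - assigned)
    extra = sorted(str(a) for a in assigned - expected)
    if duplicated:
        problems.append(f"areas in more than one region: {duplicated}")
    if missing:
        problems.append(f"unassigned areas: {missing}")
    if extra:
        problems.append(f"unknown areas: {extra}")
    sizes = sorted(len(set(region)) for region in solution)
    if sizes != sorted(cardinalities):
        problems.append(f"region sizes {sizes} do not match targets {sorted(cardinalities)}")
    return problems
```

Applied here, that would be one call per entry of `spatial_results["solutions"]`, with `spatial_results["metrics"]["cardinalities"]` as the targets and `area_ids` taken from however `load_shapefile` identifies areas.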

In [None]:
# Run a single graph partitioning experiment configuration
graph_experiment_config = {
    "p": 10,
    "C_constraint": "Uniform",
    "MR": 20,
    "MS": 10
}
graph_results = run_graph_experiment(graph_dataset_path,
                                     graph_experiment_config["p"],
                                     graph_experiment_config["C_constraint"],
                                     graph_experiment_config["MR"],
                                     graph_experiment_config["MS"])
# Save graph results
graph_results_file = os.path.join(RESULTS_DIR, "graph_experiment_results.json")
save_results(graph_results, graph_results_file)
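The graph cell compares PRRP and PyMETIS on runtime only. Edge cut, the number of edges whose endpoints land in different partitions, is the objective METIS minimizes by default and a common quality measure for comparing partitioners. This sketch assumes the graph is available as an adjacency mapping with integer node IDs and that each partitioning is a list of node collections; PyMETIS's `part_graph` returns a per-vertex membership list instead, which `parts_from_membership` converts. How `G_metis`, `run_graph_prrp`, and `partition_graph_pymetis` represent their data is not shown here, so both helpers are hypothetical adapters.

```python
from collections import defaultdict


def parts_from_membership(membership):
    """Convert a per-vertex part list (vertex i -> membership[i]) into node sets."""
    groups = defaultdict(set)
    for node, part in enumerate(membership):
        groups[part].add(node)
    return [groups[part] for part in sorted(groups)]


def edge_cut(adjacency, parts):
    """Count undirected edges whose two endpoints fall in different parts.

    adjacency: dict mapping each node to an iterable of neighbors, with every
    edge listed under both endpoints (as in the METIS file format).
    """
    membership = {}
    for part_id, nodes in enumerate(parts):
        for node in nodes:
            membership[node] = part_id
    cut = 0
    for u, neighbors in adjacency.items():
        for v in neighbors:
            if u < v and membership[u] != membership[v]:  # count each edge once
                cut += 1
    return cut
```

If both results can be adapted to this form, storing an edge cut for each in `graph_results["metrics"]` would put a quality column next to the timings.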

## Generating Visualizations

Here we generate a simple visualization of the execution time from the spatial experiment.
In a full evaluation, you would plot multiple performance metrics across parameter settings.

In [None]:
def plot_execution_time(metric_list, title, xlabel, ylabel, save_path=None):
    """
    Plot execution time (or any metric) across experiments.
    
    Parameters:
        metric_list (list or np.array): List of metric values.
        title (str): Plot title.
        xlabel (str): Label for x-axis.
        ylabel (str): Label for y-axis.
        save_path (str, optional): If provided, saves the figure to the given path.
    """
    plt.figure(figsize=(8, 6))
    plt.plot(metric_list, marker='o', linestyle='-')
    plt.title(title)
    plt.xlabel(xlabel)
    plt.ylabel(ylabel)
    plt.grid(True)
    if save_path:
        plt.savefig(save_path, dpi=300)
        logger.info(f"Figure saved to {save_path}")
    plt.show()

In [None]:
# Example: plot spatial experiment execution time
exec_time = spatial_results["metrics"]["execution_time_sec"]
# For demonstration we create a dummy list; in a full experiment you would loop over many parameter combinations.
times = [exec_time, exec_time * 1.2, exec_time * 0.9]  # dummy data
plot_execution_time(times, "Spatial Experiment Execution Time", "Parameter Variation Index", "Time (sec)",
                    save_path=os.path.join(FIGURES_DIR, "spatial_execution_time.png"))
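With rows from a real parameter sweep (one dict per run holding its configuration and scalar metrics), the dummy list can be replaced by the mean of a metric at each value of one parameter. A standard-library sketch; `mean_metric_by` is hypothetical and the row format is an assumption.

```python
from collections import defaultdict
from statistics import mean


def mean_metric_by(rows, param, metric):
    """Group rows by row[param] and average row[metric] within each group.

    Returns (param_values, means), both ordered by parameter value.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[param]].append(row[metric])
    param_values = sorted(groups)
    means = [mean(groups[value]) for value in param_values]
    return param_values, means
```

`plot_execution_time` plots against the list index, so passing `means` keeps the "Parameter Variation Index" axis; for the parameter values on the x-axis, call `plt.plot(param_values, means, marker='o')` directly.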