# Developing Optimisation Models

### Importing Libraries

In [1]:
# ────────────────────────────────────────────────────────────────────────────
# Future (must be first)
# ────────────────────────────────────────────────────────────────────────────
from __future__ import annotations

# ────────────────────────────────────────────────────────────────────────────
# Jupyter/Notebook Setup
# ────────────────────────────────────────────────────────────────────────────
%matplotlib inline
from IPython.display import display

# ────────────────────────────────────────────────────────────────────────────
# Standard Library
# ────────────────────────────────────────────────────────────────────────────
import os
import time
from copy import deepcopy
from dataclasses import dataclass, asdict
from datetime import datetime, timedelta
from functools import lru_cache, partial
from typing import (
    Any, Callable, Dict, List, Optional, Sequence, Tuple, Union, Literal
)

# ────────────────────────────────────────────────────────────────────────────
# Optimisation / OR libraries
# ────────────────────────────────────────────────────────────────────────────
import pyomo.environ as pyo
# import cvxpy as cp   # only if you plan to run LP relaxations

# ────────────────────────────────────────────────────────────────────────────
# Core Data Handling
# ────────────────────────────────────────────────────────────────────────────
import numpy as np
import pandas as pd
import polars as pl

# ────────────────────────────────────────────────────────────────────────────
# Visualization (optional)
# ────────────────────────────────────────────────────────────────────────────
import matplotlib.pyplot as plt


Awesome—here’s your **clean, updated constraint set** reflecting the changes you just decided:

# Behavioral constraints

1. **Max moves per customer per day**

* **Constraint:** `moves_per_customer_per_day ≤ X` (default X=1)
* **Why:** Minimise disruption and alert fatigue.
* **Params:** `X`, `timezone`, `day_boundaries` (default 00:00–24:00), `week_boundaries` (Mon–Sun or ISO)

2. **Max moves per customer per week**

* **Constraint:** `moves_per_customer_per_week ≤ Y` (default Y=3)
* **Why:** Prevent repeated nudges.
* **Params:** `Y`, `timezone`, `week_boundaries`

3. **Max shift window**

* **Constraint:** `|t_original − t_shifted| ≤ H` (inclusive)
* **Why:** Keep shifts realistic.
* **Params:** `H_hours` (e.g., 2h), `slot_length_minutes` (default 30), `inclusive=True`

4. **Peak-hour comfort (per-slot or per-hour grouping)**

* **Constraint (per-slot form):** For each peak slot `s`,
  `post_usage_customer(s) ≥ (1 − Z) · baseline_customer(s)`
  (i.e., **no more than Z% reduction** in that customer’s usage in that peak slot/hour)
* **Why:** Maintain comfort in peak periods.
* **Params:**

  * `Z_percent` (default 30%)
  * `scope="per_customer"` (your choice)
  * `peak_hours` defined **per city** and **weekday** as full hours; helper expands to 30-min slots
  * `cap_mode ∈ { "slot", "hour" }` (if `"hour"`, treat both half-hour slots within the hour jointly)

# Practical technical constraints

1. **Regional maximum shift (per city-day)**

* **Constraint:** `total_moved_kWh(city, day) ≤ P% × regional_total_daily_average_load_kWh(city)`
* **Why:** Avoid excessive system perturbation.
* **Implementation:** **City-day budget with running residual**; each customer-day solve receives a `moved_kwh_cap = residual_budget`.
* **Params:** `P_percent` (default 10%), `regional_total_daily_average_load_kWh={city: kWh}`

2. **Household minimum usage per slot**

* **Constraint:** `post_usage_customer(s) ≥ max( min_baseline(customer,s), R% × robust_max(customer,s) )`
* **Why:** Preserve essential usage.
* **Computation:** Per **customer** and **slot** (stratify by hour-of-day & day-of-week). Recommend **precompute** in Polars and attach `floor_kwh`.
* **Params:**

  * `baseline_period ∈ {year, month, week, day}`
  * `baseline_type ∈ {average, absolute_min}`
  * `robust_max_percentile` (default 95)
  * `R_percent` (default 10)
  * `epsilon_floor_kWh` (small positive to avoid zero)

3. **No spiking (per-city per-slot cap vs baseline)**

* **Constraint:** For each slot `s` in a city-day,
  `post_city_usage(s) ≤ (1 + α%) × baseline_city_usage(s)`
* **Why:** Prevent creating new peaks by piling into a single low-MEF slot.
* **Implementation:** **City-day per-slot residual capacity**. Compute `baseline_city_usage(s)` once, set cap to `(1+α%)·baseline`, maintain a **running residual** and pass **per-destination upper bounds** into each customer solve.
* **Params:** `alpha_peak_cap_percent` (default 25%)

# Hardcoded constraints

1. **Total consumption conservation (per customer)**

* **Constraint:** `Σ_s post_usage_customer(s) = Σ_s original_usage_customer(s)` within the conservation horizon (default **day**).
* **Params:** `id_field="ca_id"`, `conservation_horizon ∈ {day, week, month, year}`

2. **Intra-customer conservation**

* **Constraint:** Energy cannot be traded between households; all shifts remain within the same `ca_id`.

# Orchestration & model levers

1. **High-usage focus (ordering, not a math constraint)**

* **Behaviour:** Rank **customers by daily kWh within each city-day** (e.g., percentile).

  * If `shuffle_high_usage_order=False` (default): process **highest usage first**.
  * If `True`: compute the percentile anyway (record it), but **randomly shuffle** the processing order.
* **Params:** `percentile_threshold` (optional for tagging), `shuffle_high_usage_order ∈ {True, False}` (default False)

2. **Parallelisation guidance**

* Parallelise **across city-days** (and/or cities/weeks).
* Do **not** parallelise **within** a single city-day if enforcing regional/anti-spike budgets (they rely on a shared residual).

# Units & accounting

* **Base unit:** `gCO₂/kWh` (both **marginal** and **average**).
* **Objective:** minimise `∑ post_usage[s] · MEF[s]` subject to constraints.
* **Reporting:** show savings in **marginal** terms (what the optimiser optimises) and **average** terms (for context).

---

## Loading Data

### Directories

In [2]:
# DIRECTORIES AND PATHS
base_data_directory = "data"    # Base directory where the dataframes will be saved
hitachi_data_directory = os.path.join(base_data_directory, "hitachi_copy")      # Directory where the dataframes will be saved
meter_save_directory = os.path.join(hitachi_data_directory, "meter_primary_files")       # Directory for meter readings

marginal_emissions_development_directory = os.path.join(base_data_directory, "marginal_emissions_development")  # Directory for marginal emissions development data
marginal_emissions_results_directory = os.path.join(marginal_emissions_development_directory, "results")
marginal_emissions_logs_directory = os.path.join(marginal_emissions_development_directory, "logs")

optimisation_development_directory = os.path.join(base_data_directory, "optimisation_development")

In [3]:
print("\n" + "-" * 120)
print(f"Contents of '{optimisation_development_directory}' and subdirectories:\n" + "-" * 120)
for root, dirs, files in os.walk(optimisation_development_directory):
    for f in sorted(files):
        rel_dir = os.path.relpath(root, hitachi_data_directory)
        rel_file = os.path.join(rel_dir, f) if rel_dir != "." else f
        print(f"  - {rel_file}")


------------------------------------------------------------------------------------------------------------------------
Contents of 'data/optimisation_development' and subdirectories:
------------------------------------------------------------------------------------------------------------------------
  - ../optimisation_development/.DS_Store
  - ../optimisation_development/average_emissions_2022-05-04_to_2022-05-18.parquet
  - ../optimisation_development/customers_ids_with_emissions.parquet
  - ../optimisation_development/customers_ids_with_marginal_emissions.parquet
  - ../optimisation_development/marginal_and_average_emissions_2022-05-04_to_2022-05-18.parquet
  - ../optimisation_development/marginal_emissions_2022-05-04_to_2022-05-18.parquet
  - ../optimisation_development/meter_readings_2022-05-04_to_2022-05-18.parquet
  - ../optimisation_development/meter_readings_2022-05-04_to_2022-05-18_with_marginal_emissions.parquet
  - ../optimisation_development/results/.DS_Store


### File Paths

In [4]:
# Defining Full files
marginal_emissions_filename = "meter_readings_2022-05-04_to_2022-05-18_with_marginal_emissions"
marginal_emissions_filepath = os.path.join(optimisation_development_directory, marginal_emissions_filename + ".parquet")

### Load

In [5]:
marginal_emissions_pldf = pl.read_parquet(marginal_emissions_filepath)

In [6]:
print("-"  *120)
print("Schema of marginal_emissions_pldf\n"+ "-"  *120)
display(marginal_emissions_pldf.schema)

------------------------------------------------------------------------------------------------------------------------
Schema of marginal_emissions_pldf
------------------------------------------------------------------------------------------------------------------------


Schema([('ca_id', String),
        ('date', Datetime(time_unit='us', time_zone='Asia/Kolkata')),
        ('city', Categorical(ordering='physical')),
        ('customer_longitude', Float64),
        ('customer_latitude', Float64),
        ('value', Float64),
        ('demand_met_kWh', Float64),
        ('marginal_emissions_grams_co2_per_kWh', Float64),
        ('average_emissions_grams_co2_per_kWh', Float64)])

In [7]:
marginal_emissions_pldf = marginal_emissions_pldf.rename({
    "marginal_emissions_grams_co2_per_kWh": "marginal_emissions_factor_grams_co2_per_kWh",
    "average_emissions_grams_co2_per_kWh": "average_emissions_factor_grams_co2_per_kWh"
})

In [8]:
print("-"  *120)
print("Schema of marginal_emissions_pldf\n"+ "-"  *120)
display(marginal_emissions_pldf.schema)

------------------------------------------------------------------------------------------------------------------------
Schema of marginal_emissions_pldf
------------------------------------------------------------------------------------------------------------------------


Schema([('ca_id', String),
        ('date', Datetime(time_unit='us', time_zone='Asia/Kolkata')),
        ('city', Categorical(ordering='physical')),
        ('customer_longitude', Float64),
        ('customer_latitude', Float64),
        ('value', Float64),
        ('demand_met_kWh', Float64),
        ('marginal_emissions_factor_grams_co2_per_kWh', Float64),
        ('average_emissions_factor_grams_co2_per_kWh', Float64)])

In [9]:
marginal_emissions_pldf.describe()

statistic,ca_id,date,city,customer_longitude,customer_latitude,value,demand_met_kWh,marginal_emissions_factor_grams_co2_per_kWh,average_emissions_factor_grams_co2_per_kWh
str,str,str,str,f64,f64,f64,f64,f64,f64
"""count""","""118643003""","""118643003""","""118643003""",117974327.0,117974327.0,118643003.0,118643003.0,118643003.0,118643003.0
"""null_count""","""0""","""0""","""0""",668676.0,668676.0,0.0,0.0,0.0,0.0
"""mean""",,"""2022-05-10 23:31:16.797125+05:…",,77.291045,28.759742,0.349623,92286000.0,708.509083,733.241389
"""std""",,,,3.262976,1.216057,0.541503,4168900.0,66.359649,44.958014
"""min""","""60000005516""","""2022-05-04 00:00:00+05:30""",,76.954941,28.58,0.0,83034000.0,536.134053,624.711512
"""25%""",,"""2022-05-07 11:30:00+05:30""",,77.12764,28.688147,0.069,88729000.0,673.338588,696.592516
"""50%""",,"""2022-05-10 23:00:00+05:30""",,77.15129,28.703647,0.166,92563000.0,720.94116,738.809205
"""75%""",,"""2022-05-14 13:30:00+05:30""",,77.193402,28.723953,0.417,95808000.0,756.948061,764.851089
"""max""","""60029920067""","""2022-05-18 00:00:00+05:30""",,231.484813,86.320485,49.438,100710000.0,835.939878,827.083443


## Developing Models

### Classes

#### Customer & Behavior Constraint Configurations

In [10]:
@dataclass
class PeakHoursReductionLimitConfig:
    """
    Configuration for limiting peak hours reduction.

    This class allows you to define the scope and percentage limit for reducing peak hours
    in a specific region or for a specific customer.
    The purpose for this configuration is to allow for preserving comfort during peak hours by not reducing the consumption too much.

    Attributes:
    -----------
    peak_hours_reduction_scope: Literal["per_customer", "per_city"]
        The scope of the reduction limit (per customer or per city).
    peak_hours_reduction_percent_limit: float
        The percentage limit for    peak hours reduction.
    peak_hours_dict: Optional[Dict[str, Dict[str, List[str]]]]
        A dictionary defining peak hours for reduction by city, day, and time.
    limit_scope: Literal["slot", "hour"]
        The scope of the limit (per slot or per hour).
    """
    peak_hours_reduction_scope: Literal["per_customer", "per_city"] = "per_city"
    peak_hours_reduction_percent_limit: float = 30.0
    # Dict structure: { delhi: { "Mon": [ 9,10,20, ...], "Tue": [9,10,11, ...] } }
    peak_hours_dict: Optional[Dict[str, Dict[str, List[int]]]] = None
    limit_scope: Literal["slot", "hour"] = "hour"


In [11]:
@dataclass
class CustomerAdoptionBehavioralConfig:
    """
    Configuration for customer adoption behavior in the energy management system.

    This class allows you to define the behavioral parameters that influence how customers
    interact with the energy management system, including their usage patterns and preferences.

    Attributes:
    -----------
    customer_power_moves_per_day : int
        The number of shifts a customer is allowed to make per day
    customer_power_moves_per_week : int
        The number of shifts a customer is allowed to make per week.
    timezone: str
        The timezone in which the customer operates.
    day_boundaries: str
        The hours in the day which can be evaluated for shifts (e.g., "08:00-20:00")
    week_boundaries: str
        The boundaries of the week for the customer's schedule.
    shift_hours_window: float
        The number of hours on either side of usage to look for shifts.
    slot_length_minutes: int
        The length of each time slot in minutes.
    shift_window_inclusive: bool
        Whether the shift window is inclusive or exclusive.
    peak_hours_reduction_limit_config: Optional[PeakHoursReductionLimitConfig]
        Configuration for peak hours reduction limits.
    """
    customer_power_moves_per_day: int = 1
    customer_power_moves_per_week: int = 3
    timezone: str = "Asia/Kolkata"
    day_boundaries: str = "00:00-24:00"
    week_boundaries: Literal["Mon-Sun","ISO"] = "Mon-Sun"
    shift_hours_window: float = 2.0
    slot_length_minutes: int = 30
    shift_window_inclusive: bool = True
    peak_hours_reduction_limit_config: Optional[PeakHoursReductionLimitConfig] = None

#### Technical Constraint Configurations

In [12]:
@dataclass
class HouseholdMinimumConsumptionLimitConfig:
    """
    Configuration for minimum energy consumption limits at the household level.

    This class allows you to define the minimum energy consumption limits for households,
    ensuring that essential energy needs are met even during demand response events.

    The attributes of this class will be used to configure the minimum consumption limits.
    remaining_usage ≥ max(min_baseline(customer,t), R% * robust_max(customer,t))

    Attributes:
    -----------
    household_minimum_baseline_period: Literal["year","month","week","day"]
        The period over which to calculate the baseline consumption.
    household_minimum_baseline_type: Literal["average","absolute_min"]
        The type of baseline calculation to use (average or absolute minimum).
    household_minimum_robust_max_percentile: float
        The percentile to use for robust maximum calculation.
    household_minimum_R_percent: float
        The percentage limit for minimum consumption (as a fraction of robust max).
    household_minimum_epsilon_floor_kWh: float
        A small value to avoid zero consumption limits.
    """
    household_minimum_baseline_period: Literal["year","month","week","day"] = "year"
    household_minimum_baseline_type: Literal["average","absolute_min"] = "average"
    household_minimum_robust_max_percentile: float = 95.0
    household_minimum_R_percent: float = 10.0  # fraction of robust max - defaults to 10
    # internal: epsilon floor to avoid zeros/outages
    household_minimum_epsilon_floor_kWh: float = 0.001 # small value (1 Wh) to avoid zero consumption limits


In [13]:
@dataclass
class RegionalLoadShiftingLimitConfig:
    """
    Configuration for regional load shifting capabilities.

    The purpose of this configuration is to define the upper limit for load shifting at a regional level,
    as large fluctuations in the energy demand can impact the overall stability of the grid.

    Attributes:
    -----------
    regional_load_shift_percent_limit: float
        The percentage limit for load shifting (per city).
    regional_total_daily_average_load_kWh: float
        The total daily average load in kWh for each city.
    """
    regional_load_shift_percent_limit: float = 10.0  # per city
    regional_total_daily_average_load_kWh: Optional[Dict[str, float]] = None  # {city: kWh/day}


In [14]:
@dataclass
class ShiftWithoutSpikeLimitConfig:
    """
    Configuration to limit the amount of energy that can be moved into a single time slot
    in order to avoid causing a spike in usage.

    Attributes:
    -----------
    city: The city for which the configuration applies.
    alpha_peak_cap_percent: The per-slot upper cap vs baseline (city level).
    """
    city: str = "default_city" # The city for which the configuration applies
    alpha_peak_cap_percent: float = 25.0  # per-slot upper cap vs baseline (city level)

#### Utility Configurations

In [15]:
@dataclass
class ParallelConfig:
    """
    Configuration for parallel processing in the energy management system.

    This class allows you to define the parameters for parallel execution of tasks,
    including the method of parallelism and the number of workers to use.

    Attributes:
    -----------
    enabled: bool
        Whether parallel processing is enabled.
    method: Optional[Literal["local","mpi"]]
        The method of parallelism to use (e.g., local or MPI).
    workers: Optional[int]
        The number of workers to use for local parallelism.
    show_progress: bool
        Whether to show progress bars during execution.
    """
    enabled: bool = False
    method: Optional[Literal["local","mpi"]] = None
    workers: Optional[int] = None  # for local
    show_progress: bool = False


In [16]:
@dataclass
class ShiftPolicy:
    """
    Policy configuration for load shifting in the energy management system.

    This class allows you to consolidate the configurations available through
    the classes defined above in order to apply constraints and limitations to
    load shifting strategies, including behavioral, regional, household, and spike caps.

    Attributes:
    -----------
    behavioral: CustomerAdoptionBehavioralConfig
        Behavioral configuration for customer adoption.
    regional_cap: RegionalLoadShiftingLimitConfig
        Regional load shifting limit configuration.
    household_min: HouseholdMinimumConsumptionLimitConfig
        Household minimum consumption limit configuration.
    spike_cap: ShiftWithoutSpikeLimitConfig
        Shift without spike limit configuration.
    """
    behavioral: CustomerAdoptionBehavioralConfig = CustomerAdoptionBehavioralConfig()
    regional_cap: Optional[RegionalLoadShiftingLimitConfig] = None
    household_min: Optional[HouseholdMinimumConsumptionLimitConfig] = None
    spike_cap: Optional[ShiftWithoutSpikeLimitConfig] = None

In [17]:
@dataclass
class SolverConfig:
    """
    Configuration for the optimization solver in the energy management system.

    This class allows you to define the parameters for the solver used in the optimization
    process, including the solver family and specific options for each solver type.

    Attributes:
    -----------
    solver_family : Literal["lp","milp","greedy"]
        The family of the solver to use (e.g., LP, MILP, greedy).
    lp_solver : Optional[str]
        The specific LP solver to use (if family is "lp").
    lp_solver_opts : Optional[Dict[str, Any]]
        Options for the LP solver.
    milp_solver : Optional[str]
        The specific MILP solver to use (if family is "milp").
    milp_solver_opts : Optional[Dict[str, Any]]
        Options for the MILP solver.
    greedy_min_fraction_of_day_to_move : Optional[float]
        Minimum fraction of the day to move for greedy strategies.
    """
    solver_family: Literal["lp","milp","greedy"] = "lp"
    # LP: choose cvxpy solver name if desired
    lp_solver: Optional[str] = "GLPK"        # e.g., "GLPK","ECOS","OSQP","GUROBI","CPLEX"
    lp_solver_opts: Optional[Dict[str, Any]] = None
    # MILP: choose pyomo solver name
    milp_solver: Optional[str] = "cbc"       # e.g., "glpk","cbc","highs","gurobi"
    milp_solver_opts: Optional[Dict[str, Any]] = None
    # Heuristic knobs
    greedy_min_fraction_of_day_to_move: Optional[float] = None  # if set, skip tiny moves


### Functions

#### Temporal helpers

In [18]:
def add_week_start_col(df_pd: pd.DataFrame, week_boundaries: str = "Mon-Sun") -> pd.DataFrame:
    """
    Add a 'week_start' midnight column consistent with week boundaries.
    Currently supports Monday-start weeks ("Mon-Sun") and ISO (also Monday).
    """
    out = df_pd.copy()
    day = pd.to_datetime(out["day"])
    if week_boundaries == "ISO":
        week_start = (day - pd.to_timedelta(day.dt.dayofweek, unit="D")).dt.normalize()
    else:
        # Mon-Sun default
        week_start = (day - pd.to_timedelta(day.dt.dayofweek, unit="D")).dt.normalize()
    out["week_start"] = week_start
    return out

In [19]:
@lru_cache(maxsize=256)
def cached_pairs(
    T: int,
    W_slots: int
) -> Tuple[Tuple[Tuple[int,int], ...], Tuple[np.ndarray, ...], Tuple[np.ndarray, ...]]:
    """
    Generate allowed (t, s) pairs for a given time horizon T and window W_slots.

    Parameters
    ----------
    T : int
        Time horizon (number of slots).
    W_slots : int
        Maximum allowed distance between t and s.

    Returns
    -------
    pairs : Tuple[(int,int), ...]
        Allowed (t, s) pairs with |t - s| ≤ W_slots.
    by_src : Tuple[np.ndarray, ...]
        by_src[t] -> indices i in 'pairs' where pairs[i][0] == t
    by_dst : Tuple[np.ndarray, ...]
        by_dst[s] -> indices i in 'pairs' where pairs[i][1] == s
    """
    pairs: List[Tuple[int,int]] = []
    for t in range(T):
        s0, s1 = max(0, t - W_slots), min(T, t + W_slots + 1)
        for s in range(s0, s1):
            pairs.append((t, s))
    pairs = tuple(pairs)
    by_src = [[] for _ in range(T)]
    by_dst = [[] for _ in range(T)]
    for i, (t, s) in enumerate(pairs):
        by_src[t].append(i)
        by_dst[s].append(i)
    by_src = tuple(np.asarray(ix, dtype=int) for ix in by_src)
    by_dst = tuple(np.asarray(ix, dtype=int) for ix in by_dst)
    return pairs, by_src, by_dst


In [20]:
def day_and_slot_cols(
        df: pl.DataFrame,
        slot_len_min: int = 30
) -> pl.DataFrame:
    """
    Adds "day" (midnight timestamp) and half-hour "slot" [0..47] columns to the DataFrame.

    Parameters:
    ----------
    df : pl.DataFrame
        Input DataFrame containing a "date" column.
    slot_len_min: int
        Length of the time slots in minutes (default is 30).

    Returns:
    --------
    pl.DataFrame
        DataFrame with added "day" and "slot" columns.
    """
    # Adds "day" (midnight ts) and half-hour "slot" [0..47] columns
    half_hour = slot_len_min == 30
    out = (
        df
        .with_columns([
            pl.col("date").dt.truncate("1d").alias("day"),
            (pl.col("date").dt.hour() * (60//slot_len_min) + (pl.col("date").dt.minute() // slot_len_min)).alias("slot")
        ])
    )
    if half_hour:
        # ensure slot in [0..47]
        out = out.with_columns(pl.col("slot").cast(pl.Int32))
    return out

In [21]:
def hours_to_slots(
        hours: float,
        slot_len_min: int = 30
) -> int:
    """
    Convert hours to the number of slots based on the slot length in minutes.

    Parameters:
    ----------
    hours : float
        The number of hours to convert.
    slot_len_min : int
        The length of each slot in minutes (default is 30).

    Returns:
    -------
    int
        The number of slots corresponding to the given hours.
    """
    return int((hours * 60) // slot_len_min)

In [22]:
def week_start(ts: pd.Timestamp, week_boundaries: Literal["Mon-Sun","ISO"]="Mon-Sun") -> pd.Timestamp:
    """
    Compute week start (midnight) for a timestamp given the boundary convention.
    """
    ts = pd.to_datetime(ts).normalize()
    dow = ts.weekday()  # Mon=0..Sun=6
    if week_boundaries == "ISO":
        # Monday is week start in ISO as well; keep same
        return ts - pd.Timedelta(days=dow)
    else:  # "Mon-Sun"
        return ts - pd.Timedelta(days=dow)

#### Baselines and Caps

In [23]:
def cityday_baseline_by_slot(df_city_day: pd.DataFrame, slot_len_min: int = 30,  dtype=np.float32) -> np.ndarray:
    """
    Compute city-day baseline per slot as the sum of original usage across customers.

    Parameters
    ----------
    df_city_day : pandas.DataFrame
        Rows for a single (city, day). Requires 'slot' and 'value'.
    slot_len_min : int
        Slot size in minutes. Defaults to 30.

    Returns
    -------
    np.ndarray
        Baseline usage (kWh) per slot.
    """
    if 60 % slot_len_min != 0:
        raise ValueError("slot_len_min must divide 60.")
    T = 24 * (60 // slot_len_min)
    base = np.zeros(T, dtype=dtype)
    grp = df_city_day.groupby("slot", as_index=False)["value"].sum()
    sl = grp["slot"].to_numpy(dtype=int)
    base[sl] = grp["value"].to_numpy(dtype=dtype)
    return base

#### Customer Percentiles

In [24]:
def compute_city_day_percentiles(df: pl.DataFrame) -> pl.DataFrame:
    # 1) daily kWh per (city, day, ca_id)
    daily = (
        df
        .group_by(["city", "day", "ca_id"], maintain_order=False)
        .agg(pl.col("value").sum().alias("day_kwh"))
    )

    # 2) percentile within (city, day); highest day_kwh gets pct close to 1.0
    # Polars rank() starts at 1. We want a fractional percentile in [0,1].
    # Use "average" to mirror pandas' method="average".
    daily = daily.with_columns([
        (
            pl.col("day_kwh")
            .rank(method="average", descending=True)
            .over(["city", "day"])
            / pl.len().over(["city", "day"])
        ).alias("pct")
    ])

    return daily

In [25]:
def build_cityday_ca_order_map_pl(pl_df: pl.DataFrame) -> Dict[Tuple[str, pd.Timestamp], List[str]]:
    # total kWh per (city, day, ca_id)
    daily = (
        pl_df
        .group_by(["city", "day", "ca_id"], maintain_order=False)
        .agg(pl.col("value").sum().alias("day_kwh"))
        .sort(by=["city", "day", "day_kwh"], descending=[False, False, True])
    )
    # Build grouped lists of ca_id ordered by day_kwh desc
    # (Polars: group again and collect ca_id lists in sorted order above)
    lists = (
        daily
        .group_by(["city", "day"], maintain_order=False)
        .agg(pl.col("ca_id").alias("ca_order"))
    )

    # Materialize to Python dict
    out: Dict[Tuple[str, pd.Timestamp], List[str]] = {}
    for row in lists.iter_rows(named=True):
        city = row["city"]
        day  = pd.to_datetime(row["day"])  # ensure pandas Timestamp key
        out[(city, day)] = row["ca_order"]
    return out

In [26]:
def rank_customers_by_daily_kwh(df_city_day: pd.DataFrame) -> Tuple[List[str], Dict[str, float], Dict[str, float]]:
    """
    Rank customers by daily kWh within a city-day and compute percentiles.

    Returns
    -------
    order : List[str]
        ca_id ordered by descending total daily kWh.
    pct : Dict[str, float]
        ca_id -> percentile rank (0..100).
    totals : Dict[str, float]
        ca_id -> total kWh that day.
    """
    totals = df_city_day.groupby("ca_id", as_index=False)["value"].sum()
    totals = totals.sort_values("value", ascending=False)
    # Percentiles within this city-day
    totals["pct"] = 100.0 * totals["value"].rank(pct=True, method="average")
    order = totals["ca_id"].tolist()
    pct = dict(zip(totals["ca_id"], totals["pct"]))
    tot = dict(zip(totals["ca_id"], totals["value"]))
    return order, pct, tot


#### Peak Helpers

In [27]:
def city_peak_targets_for_day(
    city: str,
    day_ts,
    peak_cfg: PeakHoursReductionLimitConfig,
    slot_len_min: int = 30,
) -> Tuple[Optional[np.ndarray], Optional[List[List[int]]], Optional[float], str]:
    """
    Expand user-provided full hours (e.g. 8, 9, 10) into slot indices for a given day.

    Parameters
    ----------
    city : str
        City name. Must exist as a key in `peak_cfg.peak_hours_dict`.
    day_ts : datetime
        Midnight timestamp for the day being solved (group key).
    peak_cfg : PeakHoursReductionLimitConfig
        Configuration object with Z% and hourly lists per (city, weekday).
    slot_len_min : int, optional
        Slot size in minutes. Defaults to 30 (→ 48 slots).

    Returns
    -------
    mask : np.ndarray or None
        Boolean mask of length T (True where destination slots are peak).
    groups : List[List[int]] or None
        Each inner list is the indices of the slots forming one *hour*. Only used
        when applying a single cap per hour (i.e., limit_scope == "hour").
    Z : float or None
        Fractional cap (e.g., 0.30 for 30%).
    limit_scope : {"slot", "hour"}
        Scope of the cap; mirrors `peak_cfg.limit_scope`.

    Notes
    -----
    - Weekday keys must match `strftime("%a")` ("Mon".."Sun").
    - If city or weekday has no configured hours, returns (None, None, None, "slot").
    """
    if not peak_cfg or not peak_cfg.peak_hours_dict:
        return None, None, None, "slot"

    wk = day_ts.strftime("%a")  # "Mon".."Sun"
    hours = (peak_cfg.peak_hours_dict.get(city, {}) or {}).get(wk, [])
    if not hours:
        return None, None, None, "slot"

    if 60 % slot_len_min != 0:
        raise ValueError(f"slot_len_min must divide 60; got {slot_len_min}.")

    per_hour = 60 // slot_len_min
    T = 24 * per_hour

    mask = np.zeros(T, dtype=bool)
    groups: List[List[int]] = []
    for h in hours:
        if 0 <= h <= 23:
            start = h * per_hour
            end   = start + per_hour
            groups.append(list(range(start, end)))
            mask[start:end] = True

    Z = peak_cfg.peak_hours_reduction_percent_limit / 100.0
    return mask, groups, Z, peak_cfg.limit_scope


#### General Utilities

In [28]:
def build_arrays_for_group_pd(
    sub_df: pd.DataFrame,
    slot_len_min: int = 30,
) -> Optional[Dict[str, np.ndarray]]:
    """
    Build dense, slot-aligned arrays for a single (ca_id, day, city) pandas group.

    Parameters
    ----------
    sub_df : pandas.DataFrame
        Expected columns:
        - 'slot' (int), 'value' (float, kWh)
        - 'marginal_emissions_factor_grams_co2_per_kWh' (float, gCO2/kWh)
        - 'average_emissions_factor_grams_co2_per_kWh' (float, gCO2/kWh)
        - 'floor_kwh' (optional, float)
    slot_len_min : int, optional
        Slot size in minutes. Defaults to 30 (→ 48 slots/day).

    Returns
    -------
    dict or None
        dict with keys: 'usage', 'mef', 'aef' and optional 'floor'.
        Returns None when sub_df is empty or required factors missing in used slots.
    """
    if sub_df.empty:
        return None
    if 60 % slot_len_min != 0:
        raise ValueError(f"slot_len_min must evenly divide 60; got {slot_len_min}.")
    per_hour = 60 // slot_len_min
    T = 24 * per_hour

    usage = np.zeros(T, dtype=np.float32)
    mef   = np.full(T, np.nan, dtype=np.float32)
    aef   = np.full(T, np.nan, dtype=np.float32)
    floor: Optional[np.ndarray] = None

    sl = sub_df["slot"].to_numpy(dtype=int)
    if (sl < 0).any() or (sl >= T).any():
        raise ValueError(f"Found slot outside [0,{T-1}] for slot_len_min={slot_len_min}.")

    usage[sl] = sub_df["value"].to_numpy(dtype=np.float32)
    mef[sl]   = sub_df["marginal_emissions_factor_grams_co2_per_kWh"].to_numpy(dtype=np.float32)
    aef[sl]   = sub_df["average_emissions_factor_grams_co2_per_kWh"].to_numpy(dtype=np.float32)

    if "floor_kwh" in sub_df.columns:
        floor = np.zeros(T, dtype=np.float32)
        floor[sl] = sub_df["floor_kwh"].to_numpy(dtype=np.float32)

    used = usage > 0
    if used.any() and (np.isnan(mef[used]).any() or np.isnan(aef[used]).any()):
        return None

    out = {"usage": usage, "mef": mef, "aef": aef}
    if floor is not None:
        out["floor"] = floor
    return out

In [29]:
def progress_printer(total: int) -> Callable[[int], None]:
    """
    Simple progress printer.
    """

    # simple progress callback factory
    def cb(i: int):
        if total <= 0:
            return
        # print every ~5%
        step = max(1, total // 20)
        if i % step == 0 or i == total:
            print(f"[progress] {i}/{total} groups processed ({(i/total)*100:.1f}%)")
    return cb

#### Peak Masks

In [30]:
def restrict_sources_mask_for_limit(usage: np.ndarray, mef: np.ndarray, K: int) -> np.ndarray:
    """
    Choose up to K source slots allowed to move (others must stay).
    Heuristic: highest potential benefit (mef[t] - best_dest_mef) * usage[t].
    """
    T = len(usage)
    if K is None or K >= T:
        return np.ones(T, dtype=bool)

    # take only slots with positive usage
    pos = usage > 1e-12
    idx = np.flatnonzero(pos)
    if idx.size <= K:
        mask = np.zeros(T, dtype=bool)
        mask[idx] = True
        return mask

    # top-K by mef among positive-usage
    k_idx_local = np.argpartition(-mef[idx], K-1)[:K]        # O(T)
    top_idx = idx[k_idx_local]

    mask = np.zeros(T, dtype=bool)
    mask[top_idx] = True
    return mask

##### Emissions Computations and Reporting

In [31]:
def compute_emissions_totals(
        usage: np.ndarray,
        aef: np.ndarray,
        mef: np.ndarray,
) -> Dict[str, float]:
    """
    Compute total emissions in grams and tonnes of CO2 based on usage and emission factors.

    Parameters:
    -----------
    usage : np.ndarray
        The energy usage for each time slot.
    aef : np.ndarray
        The average emission factor for each time slot.
    mef : np.ndarray
        The marginal emission factor for each time slot.

    Returns:
    --------
    Dict[str, float]
        A dictionary containing the total emissions in grams and tonnes of CO2 for both average and marginal weighting.

    """
    # returns totals in grams and tCO2 for both average and marginal weighting
    g_avg = float(np.dot(usage, aef))   # total grams avg
    g_mrg = float(np.dot(usage, mef))   # total grams marginal
    return {
        "E_avg_g": g_avg,
        "E_avg_t": tco2_from_grams(g_avg),
        "E_marg_g": g_mrg,
        "E_marg_t": tco2_from_grams(g_mrg),
    }


In [32]:
def flows_table(
        ca_id: str,
        city: str,
        day_ts,
        flows: List[Tuple[int,int,float]],
        mef: np.ndarray,
        aef: np.ndarray,
        slot_len_min: int = 30,
) -> List[Dict[str, Any]]:
    """
    Generate a table of flows with associated emissions data.

    Parameters:
    ----------
    ca_id : str
        The ID of the charging station.
    city : str
        The city where the charging station is located.
    day_ts : datetime
        The timestamp for the day of the flows.
    flows : List[Tuple[int,int,float]]
        A list of tuples representing the flows, where each tuple contains
        (start_time, end_time, kwh).
    mef : np.ndarray
        The marginal emission factor for each time slot.
    aef : np.ndarray
        The average emission factor for each time slot.
    slot_len_min : int, optional
        The length of each time slot in minutes (default is 30).

    Returns:
    -------
    List[Dict[str, Any]]
        A list of dictionaries containing the flow data with emissions information.
    """
    rows: List[Dict[str, Any]] = []
    for (t, s, kwh) in flows:
        t_minutes = t * slot_len_min
        s_minutes = s * slot_len_min
        t_ts = day_ts + timedelta(minutes=int(t_minutes))
        s_ts = day_ts + timedelta(minutes=int(s_minutes))
        dirn = "forward" if s > t else ("backward" if s < t else "stay")

        # Per-move emissions deltas (grams) using marginal & average
        g_marg_before = grams_co2_from_kwh_grams_per_kwh(kwh, mef[t])
        g_marg_after  = grams_co2_from_kwh_grams_per_kwh(kwh, mef[s])
        g_marg_delta  = g_marg_before - g_marg_after

        g_avg_before = grams_co2_from_kwh_grams_per_kwh(kwh, aef[t])
        g_avg_after  = grams_co2_from_kwh_grams_per_kwh(kwh, aef[s])
        g_avg_delta  = g_avg_before - g_avg_after

        rows.append({
            "ca_id": ca_id,
            "city": city,
            "day": day_ts,
            "original_time": t_ts,
            "proposed_shift_time": s_ts,
            "delta_minutes": int((s - t) * slot_len_min),
            "shift_direction": dirn,
            "delta_kwh": float(kwh),
            "marginal_emissions_before_shift_grams_co2": g_marg_before,
            "marginal_emissions_after_shift_grams_co2": g_marg_after,
            "marginal_emissions_delta_grams_co2": g_marg_delta,
            "average_emissions_before_shift_grams_co2": g_avg_before,
            "average_emissions_after_shift_grams_co2": g_avg_after,
            "average_emissions_delta_grams_co2": g_avg_delta,
        })
    return rows

In [33]:
def grams_co2_from_kwh_grams_per_kwh(
        kwh: float,
        grams_co2_per_kwh: float
) -> float:
    """
    Convert energy in kWh to grams co2 based on a specific emission factor.

    Parameters:
    ----------
    kwh : float
        The energy in kilowatt-hours.
    grams_co2_per_kwh : float
        The emission factor in grams co2 per kilowatt-hour.

    Returns:
    -------
    float
        The equivalent emissions in grams of co2.
    """
    return float(kwh * grams_co2_per_kwh)


In [34]:
def tco2_from_grams(g: float) -> float:
    """
    Convert grams of CO2 to tonnes of CO2.
    """
    return g / 1_000_000.0


In [35]:
def weighted_median_shift_minutes(flows: List[Tuple[int,int,float]], slot_len_min: int = 30) -> float:
    """
    Energy-weighted median of |Δslots| converted to minutes.
    """
    if not flows:
        return 0.0
    steps = np.array([abs(s - t) for (t, s, _) in flows], dtype=float)
    w     = np.array([kwh for (_, _, kwh) in flows], dtype=float)
    order = np.argsort(steps)
    steps, w = steps[order], w[order]
    cum = np.cumsum(w) / (w.sum() + 1e-12)
    idx = np.searchsorted(cum, 0.5, side="left")
    return float(steps[min(idx, len(steps)-1)] * slot_len_min)

#### Unknown?

In [36]:
def attach_household_floor(
    df: pl.DataFrame,
    cfg: HouseholdMinimumConsumptionLimitConfig,
) -> pl.DataFrame:
    """
    Adds a column 'floor_kwh' per row using:
      floor = max(min_baseline, R% * robust_max_percentile), with epsilon floor.
    Stratification: (hour-of-day, day-of-week); period filter applied across df.
    """
    # Filter by period if you want (here we keep full df; you can parametrize)
    base = (
        df
        .with_columns([
            pl.col("date").dt.hour().alias("hod"),
            pl.col("date").dt.strftime("%A").alias("dow"),
        ])
        .group_by(["ca_id","hod","dow"])
    )

    # Baseline
    if cfg.household_minimum_baseline_type == "average":
        baseline = base.agg(pl.mean("value").alias("baseline_kwh"))
    else:
        baseline = base.agg(pl.min("value").alias("baseline_kwh"))

    # Robust max (percentile on 'value')
    p = cfg.household_minimum_robust_max_percentile
    robust = (
        df
        .with_columns([
            pl.col("date").dt.hour().alias("hod"),
            pl.col("date").dt.strftime("%A").alias("dow"),
        ])
        .group_by(["ca_id","hod","dow"])
        .agg(pl.col("value").quantile(p/100.0, interpolation="nearest").alias("robust_max_kwh"))
    )

    stats = baseline.join(robust, on=["ca_id","hod","dow"], how="full")
    stats = stats.with_columns([
        pl.col("baseline_kwh").fill_null(0.0),
        pl.col("robust_max_kwh").fill_null(0.0),
    ])

    stats = stats.with_columns([
        pl.max_horizontal([
            pl.col("baseline_kwh"),
            (pl.col("robust_max_kwh") * (cfg.household_minimum_R_percent / 100.0))
        ]).alias("floor_kwh_raw")
    ]).with_columns([
        pl.max_horizontal([pl.col("floor_kwh_raw"), pl.lit(cfg.household_minimum_epsilon_floor_kWh)]).alias("floor_kwh")
    ]).select(["ca_id","hod","dow","floor_kwh"])

    # Attach back to original df
    df_with_floor = (
        df
        .with_columns([
            pl.col("date").dt.hour().alias("hod"),
            pl.col("date").dt.strftime("%A").alias("dow"),
        ])
        .join(stats, on=["ca_id","hod","dow"], how="left")
        .drop(["hod","dow"])
    )
    return df_with_floor

#### Solvers

In [37]:
def greedy_k_moves(
    usage: np.ndarray,
    mef: np.ndarray,
    aef: np.ndarray,
    *,
    W_slots: int,
    slot_len_min: int,
    floor_vec: Optional[np.ndarray],
    peak_mask: Optional[np.ndarray],
    peak_groups: Optional[List[List[int]]],
    Z: Optional[float],
    cap_mode: Literal["slot","hour"] = "slot",
    dest_rem: np.ndarray,
    reg_rem: float,
    max_moves: int = 1,
    enforce_distinct_sources: bool = True,   # NEW: enforce unique source slots
) -> Tuple[np.ndarray, List[Tuple[int,int,float]], float, float]:
    """
    Greedy solver that performs up to `max_moves` beneficial moves (t->s).
    If `enforce_distinct_sources` is True, each move must use a *new* source slot t
    (i.e., at most `max_moves` distinct sources moved for the day).
    """
    T = len(usage)
    usage_opt = usage.copy()
    flows: List[Tuple[int,int,float]] = []
    used_dest_total = 0.0
    used_reg_total  = 0.0

    # Track which sources have already been used (distinct-sources cap)
    used_sources: Set[int] = set()

    # Peak comfort remaining reducible energy (per slot or per hour) for THIS customer-day
    if Z is not None and peak_mask is not None and peak_mask.any():
        if cap_mode == "slot":
            peak_rem_slot = np.zeros(T, dtype=np.float32)
            peak_idx = np.where(peak_mask)[0]
            peak_rem_slot[peak_mask] = np.float32(Z) * usage_opt[peak_mask].astype(np.float32, copy=False)
            peak_hour_rem: Optional[List[float]] = None
        else:
            peak_hour_rem = []
            for grp in (peak_groups or []):
                peak_hour_rem.append(Z * float(usage_opt[grp].sum()))
            peak_rem_slot = None
    else:
        peak_rem_slot = None
        peak_hour_rem = None

    for _ in range(max_moves):
        best_gain = 0.0
        best: Optional[Tuple[int,int,float]] = None

        for t in range(T):
            # Distinct sources: skip if t already used in a previous move
            if enforce_distinct_sources and (t in used_sources):
                continue

            if usage_opt[t] <= 1e-12:
                continue

            floor_t = float(floor_vec[t]) if (floor_vec is not None) else 0.0
            avail_from_t = max(0.0, usage_opt[t] - floor_t)
            if avail_from_t <= 1e-12:
                continue

            # Peak remaining allowance if moving OUT of peak
            peak_lim_t = float("inf")
            if peak_rem_slot is not None and (peak_rem_slot[t] > 0.0):
                peak_lim_t = peak_rem_slot[t]
            elif peak_hour_rem is not None and peak_groups is not None:
                # find hour group index for t (only if t is inside any peak group)
                for k, grp in enumerate(peak_groups):
                    if t in grp:
                        if peak_hour_rem[k] > 0.0:
                            peak_lim_t = peak_hour_rem[k]
                        else:
                            peak_lim_t = 0.0
                        break

            s0, s1 = max(0, t - W_slots), min(T, t + W_slots + 1)
            for s in range(s0, s1):
                if s == t:
                    continue
                # Only consider moves that reduce emissions
                if mef[s] >= mef[t] - 1e-12:
                    continue

                # Destination anti-spike residual
                cap_dest = float(dest_rem[s]) if dest_rem is not None else float("inf")
                # Regional budget
                cap_reg = float(reg_rem) if reg_rem is not None else float("inf")

                # If moving OUT of peak, also cap by remaining peak allowance from t
                if (peak_mask is not None) and peak_mask.any() and peak_mask[t]:
                    q_max = min(avail_from_t, peak_lim_t, cap_dest, cap_reg)
                else:
                    q_max = min(avail_from_t, cap_dest, cap_reg)

                if q_max <= 1e-12:
                    continue

                # emissions gain = (mef[t]-mef[s]) * q
                gain = (mef[t] - mef[s]) * q_max
                if gain > best_gain + 1e-12:
                    best_gain = gain
                    best = (t, s, q_max)

        if best is None:
            break  # no beneficial distinct-source move left

        # Apply the best move
        t, s, q = best
        usage_opt[t] -= q
        usage_opt[s] += q
        flows.append((t, s, float(q)))

        # Mark source as used if enforcing distinct sources
        if enforce_distinct_sources:
            used_sources.add(t)

        # Update shared caps
        if dest_rem is not None:
            dest_rem[s] = max(0.0, dest_rem[s] - q)
        if reg_rem is not None:
            reg_rem = max(0.0, reg_rem - q)

        used_dest_total += q
        used_reg_total  += q

        # Update peak allowance if we moved OUT of peak
        if Z is not None and peak_mask is not None and peak_mask.any() and peak_mask[t]:
            if peak_rem_slot is not None:
                peak_rem_slot[t] = max(0.0, peak_rem_slot[t] - q)
            elif peak_hour_rem is not None and peak_groups is not None:
                for k, grp in enumerate(peak_groups):
                    if t in grp:
                        peak_hour_rem[k] = max(0.0, peak_hour_rem[k] - q)
                        break

    return usage_opt, flows, used_dest_total, used_reg_total


In [38]:
def solve_milp_k(
    mef: np.ndarray,
    usage: np.ndarray,
    W_slots: int,
    cfg: SolverConfig,
    *,
    max_moves: int,
    peak_mask: Optional[np.ndarray] = None,
    peak_groups: Optional[List[List[int]]] = None,
    Z: Optional[float] = None,
    cap_mode: Literal["slot","hour"] = "hour",
    floor_vec: Optional[np.ndarray] = None,
    dest_upper_bounds: Optional[np.ndarray] = None,
    moved_kwh_cap: Optional[float] = None,
) -> Tuple[np.ndarray, List[Tuple[int,int,float]]]:
    assert pyo is not None
    T = len(mef)
    pairs, by_src, by_dst = cached_pairs(T, W_slots)

    m = pyo.ConcreteModel()
    m.P = pyo.RangeSet(0, len(pairs)-1)
    m.y = pyo.Var(m.P, domain=pyo.NonNegativeReals)

    # z_t = 1 if any move out of t to s != t
    m.TS = pyo.RangeSet(0, T-1)
    m.z  = pyo.Var(m.TS, domain=pyo.Binary)

    # Objective: minimize post emissions
    m.obj = pyo.Objective(expr=sum(m.y[i] * mef[pairs[i][1]] for i in m.P), sense=pyo.minimize)

    # Supply conservation: for each t, sum_s y_{t->s} = usage[t]
    def supply_rule(m, t):
        idx = by_src[t].tolist()
        return sum(m.y[i] for i in idx) == float(usage[t])
    m.supply = pyo.Constraint(m.TS, rule=supply_rule)

    # Link y to z: any flow to s!=t implies z_t = 1
    m.link = pyo.ConstraintList()
    # Big-M per pair
    for i, (t, s) in enumerate(pairs):
        if s == t:
            continue
        # M bound like your greedy: source above floor, dest headroom, peak allowance, regional cap
        src_floor = float(floor_vec[t]) if (floor_vec is not None) else 0.0
        ub_src = max(0.0, float(usage[t] - src_floor))
        if dest_upper_bounds is not None and np.isfinite(dest_upper_bounds[s]):
            ub_dest = max(0.0, float(dest_upper_bounds[s]))
        else:
            ub_dest = float("inf")
        ub_peak = float("inf")
        if Z is not None and peak_mask is not None and peak_mask.any() and peak_mask[t]:
            # cap per slot or per hour of the source side – conservative slot-level bound
            if cap_mode == "slot":
                ub_peak = Z * float(usage[t])
            else:
                # if hour groups provided, use the hour bound for t
                if peak_groups:
                    for grp in peak_groups:
                        if t in grp:
                            ub_peak = Z * float(usage[grp].sum())
                            break
        ub_reg = float(moved_kwh_cap) if (moved_kwh_cap is not None) else float("inf")
        M_ts = min(ub_src, ub_dest, ub_peak, ub_reg)
        if not np.isfinite(M_ts) or M_ts < 0:
            M_ts = 0.0
        m.link.add(m.y[i] <= M_ts * m.z[t])

    # K moves total
    m.kcap = pyo.Constraint(expr=sum(m.z[t] for t in m.TS) <= int(max_moves))

    # # Floors at destinations
    # if floor_vec is not None:
    #     for s in range(T):
    #         if floor_vec[s] > 0:
    #             idx = by_dst[s].tolist()
    #             if idx:
    #                 m.add_component(f"floor_{s}", pyo.Constraint(expr=sum(m.y[i] for i in idx) >= float(floor_vec[s])))

    # # Peak comfort at destinations (keep your existing per-slot/per-hour minima)
    # if peak_mask is not None and Z is not None:
    #     if cap_mode == "slot":
    #         for s in np.where(peak_mask)[0]:
    #             base_s = float(usage[s])
    #             if base_s > 1e-12:
    #                 idx = by_dst[s].tolist()
    #                 if idx:
    #                     m.add_component(f"peak_slot_min_{s}", pyo.Constraint(expr=sum(m.y[i] for i in idx) >= (1.0 - Z) * base_s))
    #     else:
    #         if peak_groups:
    #             for k, grp in enumerate(peak_groups):
    #                 base_h = float(usage[grp].sum())
    #                 if base_h > 1e-12:
    #                     idxs = []
    #                     for s in grp:
    #                         idxs.extend(by_dst[s].tolist())
    #                     if idxs:
    #                         m.add_component(f"peak_hour_min_{k}", pyo.Constraint(expr=sum(m.y[i] for i in idxs) >= (1.0 - Z) * base_h))

    # Anti-spike dest caps (remaining headroom for this customer)
    if dest_upper_bounds is not None:
        for s in range(T):
            cap = dest_upper_bounds[s]
            if np.isfinite(cap):
                idx = by_dst[s].tolist()
                if idx:
                    m.add_component(f"dest_cap_{s}",
                                    pyo.Constraint(expr=sum(m.y[i] for i in idx) <= float(usage[s]) + float(cap))
                                    )

    # Regional moved-kWh cap
    if moved_kwh_cap is not None:
        stay_idxs = [i for i, (t, s) in enumerate(pairs) if t == s]
        stay_expr = sum(m.y[i] for i in stay_idxs)
        m.add_component("moved_cap", pyo.Constraint(expr=(float(usage.sum()) - stay_expr) <= float(moved_kwh_cap)))

    # NEW: robust solver creation with fallbacks
    solver_name = cfg.milp_solver or "cbc"
    solver = pyo.SolverFactory(solver_name)

    # If preferred solver isn't available, try a few common fallbacks
    if (solver is None) or (not solver.available(False)):
        for cand in ("highs", "glpk"):
            s = pyo.SolverFactory(cand)
            if (s is not None) and s.available(False):
                solver, solver_name = s, cand
                break
        else:
            raise RuntimeError("No MILP solver available (tried cbc, highs, glpk). "
                            "Install one of them or set SolverConfig.milp_solver accordingly.")

    # Optional: pass any user-specified options (e.g., {"threads": 1, "time_limit": 60})
    if cfg.milp_solver_opts:
        for k, v in cfg.milp_solver_opts.items():
            solver.options[k] = v

    # Solve
    solver.solve(m, tee=False)


    y = np.array([pyo.value(m.y[i]) for i in m.P], dtype=float)
    usage_opt = np.zeros(T, dtype=float)
    flows: List[Tuple[int,int,float]] = []
    for i, val in enumerate(y):
        if val is None or val <= 1e-12:
            continue
        t, s = pairs[i]
        usage_opt[s] += val
        flows.append((t, s, val))
    return usage_opt, flows


#### Pipeline

In [39]:
def _init_worker_singlethread():
    # Keep each worker process single-threaded for math libs
    os.environ.setdefault("OMP_NUM_THREADS", "1")
    os.environ.setdefault("OPENBLAS_NUM_THREADS", "1")
    os.environ.setdefault("MKL_NUM_THREADS", "1")
    os.environ.setdefault("VECLIB_MAXIMUM_THREADS", "1")
    try:
        # This caps numpy/scipy threadpools too (if available)
        from threadpoolctl import threadpool_limits
        threadpool_limits(1)
    except Exception:
        pass

In [40]:
def _iter_results_local(job_args, workers: int):
    import multiprocessing as mp
    # Use 'fork' on macOS/Linux; safer in notebooks
    ctx = mp.get_context("fork")
    with ctx.Pool(processes=workers, initializer=_init_worker_singlethread) as pool:
        # tune chunksize if needed
        for res in pool.imap(_solve_cityweek_worker, job_args, chunksize=1):
            yield res

In [41]:
def _iter_results_mpi(job_args, workers: int):
    try:
        from mpi4py.futures import MPIPoolExecutor
    except Exception as e:
        raise RuntimeError(
            "MPI requested but mpi4py is not available. "
            "Install mpi4py and run with mpirun/mpiexec."
        ) from e
    with MPIPoolExecutor(max_workers=workers) as ex:
        for res in ex.map(_solve_cityweek_worker, job_args, chunksize=1):
            yield res

In [42]:
# --- TOP-LEVEL WORKER (must live at module scope) ---
def _solve_cityweek_worker(args):
    (
        city, week_ts, df_cityweek,
        policy, solver,
        slot_len, W_slots,
        cityday_ca_order,               # dict[(city, day)->list of ca_id] or None
        emit_optimised_rows
    ) = args

    m_rows, mv_rows, o_rows = [], [], []

    # Weekly move counters per customer
    weekly_quota = dict.fromkeys(
        df_cityweek["ca_id"].unique().tolist(),
        policy.behavioral.customer_power_moves_per_week
    )

    for day_ts, df_city_day in df_cityweek.groupby("day", sort=True, group_keys=False):
        print("DEBUG: entered day loop")
        # float32 city-day arrays
        base_city = cityday_baseline_by_slot(df_city_day, slot_len_min=slot_len, dtype=np.float32)
        alpha = (policy.spike_cap.alpha_peak_cap_percent / 100.0) if policy.spike_cap else 0.25
        post_city_cap  = (1.0 + np.float32(alpha)) * base_city.astype(np.float32, copy=False)
        post_city_used = np.zeros_like(post_city_cap, dtype=np.float32)

        print("DEBUG: city cap calculated")
        # regional moved budget (float32)
        if policy.regional_cap:
            P_pct = np.float32(policy.regional_cap.regional_load_shift_percent_limit / 100.0)
            city_daily_avg = np.float32(
                policy.regional_cap.regional_total_daily_average_load_kWh.get(city, 0.0)
            )
            moved_budget_remaining = np.float32(P_pct * city_daily_avg)
        else:
            moved_budget_remaining = np.float32(0.0)
        print("DEBUG: moved budget calculated")

        # precomputed order
        if cityday_ca_order is not None:
            order = cityday_ca_order.get((city, day_ts))
            if order is None:
                order = df_city_day["ca_id"].drop_duplicates().tolist()
        else:
            order = (
                df_city_day.groupby("ca_id", as_index=False)["value"].sum()
                           .sort_values("value", ascending=False)["ca_id"]
                           .tolist()
            )
        print("DEBUG: precomputed order determined")

        # PRE-SPLIT once per day: avoid repeated boolean filters on Arrow strings
        by_ca = {cid: g for cid, g in df_city_day.groupby("ca_id", sort=False)}

        for ca_id in order:
            # print(f"DEBUG Entering customer loop - processing {ca_id}")
            if weekly_quota.get(ca_id, 0) <= 0:
                continue

            # O(1) lookup instead of df_city_day[df_city_day["ca_id"] == ca_id]
            sub = by_ca.get(ca_id)
            if sub is None or sub.empty:
                continue

            arrs = build_arrays_for_group_pd(sub, slot_len_min=slot_len)
            if arrs is None:
                continue

            usage = arrs["usage"].astype(np.float32, copy=False)
            mef   = arrs["mef"].astype(np.float32, copy=False)
            aef   = arrs["aef"].astype(np.float32, copy=False)
            floor_vec = arrs.get("floor")
            if floor_vec is not None:
                floor_vec = floor_vec.astype(np.float32, copy=False)

            peak_cfg = policy.behavioral.peak_hours_reduction_limit_config
            peak_mask, peak_groups, Z, cap_mode = city_peak_targets_for_day(
                city, day_ts, peak_cfg, slot_len_min=slot_len
            )
            if Z is not None:
                Z = np.float32(Z)

            K_sources = int(max(0, min(policy.behavioral.customer_power_moves_per_day,
                                       weekly_quota[ca_id])))

            resid_cap = np.maximum(np.float32(0.0), post_city_cap - post_city_used).astype(np.float32, copy=False)
            moved_kwh_cap = np.float32(max(0.0, float(moved_budget_remaining)))

            # print("DEBUG: prepped usage, mef, aef, peak_mask, resid_cap, moved_kwh_cap")
            # print("DEBUG: entering solver")
            # solve
            if solver.solver_family == "milp":
                usage_opt, flows = solve_milp_k(
                    mef=mef, usage=usage, W_slots=W_slots, cfg=solver, max_moves=K_sources,
                    peak_mask=peak_mask, peak_groups=peak_groups, Z=Z, cap_mode=cap_mode,
                    floor_vec=floor_vec, dest_upper_bounds=resid_cap, moved_kwh_cap=float(moved_kwh_cap),
                )
            else:
                usage_opt, flows, used_dest, used_reg = greedy_k_moves(
                    usage=usage, mef=mef, aef=aef,
                    W_slots=W_slots, slot_len_min=slot_len, floor_vec=floor_vec,
                    peak_mask=peak_mask, peak_groups=peak_groups, Z=Z, cap_mode=cap_mode,
                    dest_rem=resid_cap.copy(), reg_rem=float(moved_kwh_cap), max_moves=K_sources,
                )

            # print("DEBUG: solver finished")
            usage_opt = usage_opt.astype(np.float32, copy=False)

            # update tallies
            post_city_used += usage_opt
            moved_kwh = float(np.maximum(np.float32(0.0), usage - usage_opt).sum(dtype=np.float32))
            moved_budget_remaining = np.maximum(np.float32(0.0), moved_budget_remaining - np.float32(moved_kwh))
            weekly_moves_used = len({t for (t, s, _) in flows if s != t})
            weekly_quota[ca_id] = max(0, weekly_quota[ca_id] - weekly_moves_used)

            # print("DEBUG: updated weekly quota")
            # metrics row (emit as float64 for reporting)
            base = compute_emissions_totals(usage.astype(float), aef.astype(float), mef.astype(float))
            post = compute_emissions_totals(usage_opt.astype(float), aef.astype(float), mef.astype(float))
            median_shift = weighted_median_shift_minutes(flows, slot_len)
            m_rows.append({
                "ca_id": ca_id, "city": city, "day": day_ts,
                "baseline_E_avg_g": base["E_avg_g"], "post_E_avg_g": post["E_avg_g"],
                "delta_E_avg_g": base["E_avg_g"] - post["E_avg_g"],
                "baseline_E_marg_g": base["E_marg_g"], "post_E_marg_g": post["E_marg_g"],
                "delta_E_marg_g": base["E_marg_g"] - post["E_marg_g"],
                "baseline_kwh": float(np.sum(usage, dtype=np.float32)),
                "post_kwh": float(np.sum(usage_opt, dtype=np.float32)),
                "moved_kwh": moved_kwh,
                "avg_shift_minutes_energy_weighted": float(
                    (sum(abs(s - t) * val for (t, s, val) in flows) / max(moved_kwh, 1e-9)) * slot_len
                ) if flows else 0.0,
                "median_shift_minutes_energy_weighted": median_shift,
                "weekly_moves_used": weekly_moves_used,
                "weekly_moves_remaining": weekly_quota[ca_id],
            })
            # print("DEBUG: metrics row added")
            # moves
            mv_rows.extend(
                flows_table(ca_id, city, day_ts, flows, mef.astype(float), aef.astype(float), slot_len_min=slot_len)
            )

            # optional per-slot outputs
            if emit_optimised_rows:
                for s in range(len(usage_opt)):
                    ts = day_ts + timedelta(minutes=int(s * slot_len))
                    o_rows.append({
                        "ca_id": ca_id, "city": city, "day": day_ts,
                        "slot": s, "date": ts, "optimised_value": float(usage_opt[s]),
                    })
        # print("DEBUG: customer loop")
    # print("DEBUG: exiting day loop")
    return m_rows, mv_rows, o_rows

In [43]:
def run_pipeline_pandas_cityweek_budget(
    df_pd: pd.DataFrame,
    policy: ShiftPolicy,
    solver: SolverConfig,
    *,
    shuffle_high_usage_order: bool = False,
    emit_optimised_rows: bool = True,
    workers: int = 1,
    show_progress: bool = True,
    cityday_ca_order: Optional[Dict[Tuple[str, pd.Timestamp], List[str]]] = None,
    backend: Literal["local","mpi"] = "local",   # <— NEW
) -> Tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]:

    # derive week_start once
    df = add_week_start_col(df_pd, policy.behavioral.week_boundaries)

    df = df.sort_values(["city", "week_start", "day", "ca_id", "slot"])
    groups = list(df.groupby(["city","week_start"], sort=True, group_keys=False))
    total = len(groups)

    # hoist constants
    slot_len = policy.behavioral.slot_length_minutes
    W_slots = hours_to_slots(policy.behavioral.shift_hours_window, slot_len)

    metrics_rows: List[Dict[str, Any]] = []
    move_rows: List[Dict[str, Any]] = []
    opt_rows:  List[Dict[str, Any]] = []

    # Build job args (must be picklable)
    job_args = [
        (city, wk, df_cw, policy, solver, slot_len, W_slots, cityday_ca_order, emit_optimised_rows)
        for (city, wk), df_cw in groups
    ]

    workers = min(workers or 1, total)

    if workers > 1:
        if backend == "mpi":
            iterator = _iter_results_mpi(job_args, workers)
        else:
            iterator = _iter_results_local(job_args, workers)

        for i, (m, mv, o) in enumerate(iterator, 1):
            metrics_rows.extend(m); move_rows.extend(mv); opt_rows.extend(o)
            if show_progress and (i % max(1, total//20) == 0 or i == total):
                print(f"[progress] {i}/{total} city-weeks processed ({i/total*100:.1f}%)")
    else:
        for i, args in enumerate(job_args, 1):
            m, mv, o = _solve_cityweek_worker(args)
            metrics_rows.extend(m); move_rows.extend(mv); opt_rows.extend(o)
            if show_progress and (i % max(1, total//20) == 0 or i == total):
                print(f"[progress] {i}/{total} city-weeks processed ({i/total*100:.1f}%)")

    return pd.DataFrame(metrics_rows), pd.DataFrame(move_rows), pd.DataFrame(opt_rows)


### Implementation

Regional Total Daily Average Loads

Sources:
* Delhi : https://www.ceicdata.com/en/india/electricity-consumption-utilities/electricity-consumption-utilities-delhi
* Maharashtra : https://www.ceicdata.com/en/india/electricity-consumption-utilities/electricity-consumption-utilities-maharashtra

Values:
* Delhi Annual Electricity Consumption (2023): 34,107.000 GWh
* Maharashtra Annual Electricity Consumption (2023): 155,518.000 GWh

Logic:
* Delhi : 34,107.000 GWh Annual / 365 = 93.500 GWh Daily / 24  = 3.937 GWh Hourly
* Maharashtra : 155,518.000 GWh Annual / 365 = 426.000 GWh Daily / 24 = 17.750 GWh Hourly


In [44]:
delhi_total_daily_average_load_gWh = 3.937  # GWh
maharashtra_total_daily_average_load_gWh = 17.750  # GWh

delhi_total_daily_average_load_kWh = delhi_total_daily_average_load_gWh * 1_000_000 # convert GWh to kWh
maharashtra_total_daily_average_load_kWh = maharashtra_total_daily_average_load_gWh * 1_000_000 # convert GWh to kWh

In [45]:
peak_hours_reduction_limit_config = PeakHoursReductionLimitConfig(
    peak_hours_reduction_percent_limit=25.0,
    peak_hours_reduction_scope="per_city",
    peak_hours_dict={
        "delhi": {
            "Mon": [8,9,10,11,12,20],
            "Tue": [9,10,11,20,21,22],
            "Wed": [9,10,11,20,21,22],
            "Thu": [9,10,11,20,21,22],
            "Fri": [8,9,10,11,20,21],
            "Sat": [9,10,11,12,13,20],
            "Sun": [10,11,12,13,14,20],
        },
        "mumbai": {
            "Mon": [8,9,10,11,12,20],
            "Tue": [9,10,11,20,21,22],
            "Wed": [9,10,11,20,21,22],
            "Thu": [9,10,11,20,21,22],
            "Fri": [8,9,10,11,20,21],
            "Sat": [9,10,11,12,13,20],
            "Sun": [10,11,12,13,14,20],
        },
    },
)

In [46]:
policy = ShiftPolicy(
    behavioral=CustomerAdoptionBehavioralConfig(
                        customer_power_moves_per_day=1,
                        customer_power_moves_per_week=3,
                        timezone="Asia/Kolkata",
                        day_boundaries="00:00-24:00",
                        week_boundaries="Mon-Sun",
                        shift_hours_window=2.0,
                        slot_length_minutes=30,
                        peak_hours_reduction_limit_config=peak_hours_reduction_limit_config
                    ),
    regional_cap=RegionalLoadShiftingLimitConfig(
                        regional_load_shift_percent_limit=10.0,
                        regional_total_daily_average_load_kWh={"delhi": delhi_total_daily_average_load_kWh,
                                                            "mumbai": maharashtra_total_daily_average_load_kWh}),
    household_min=HouseholdMinimumConsumptionLimitConfig(
                        household_minimum_baseline_period="year",
                        household_minimum_baseline_type="average",
                        household_minimum_robust_max_percentile=95,
                        household_minimum_R_percent=10),
    spike_cap=ShiftWithoutSpikeLimitConfig(alpha_peak_cap_percent=25),
)

In [47]:
# marginal_emissions_pldf.describe()

In [48]:
pl_df = day_and_slot_cols(marginal_emissions_pldf, slot_len_min=policy.behavioral.slot_length_minutes)

In [49]:
if policy.household_min is not None:
    pl_df = attach_household_floor(pl_df, policy.household_min)

# usually takes about 40s to run on a two week dataset

In [50]:
# 2) Keep only needed columns
cols_needed = [
    "ca_id","city","date","day","slot","value",
    "marginal_emissions_factor_grams_co2_per_kWh",
    "average_emissions_factor_grams_co2_per_kWh",
]
if "floor_kwh" in pl_df.columns:
    cols_needed.append("floor_kwh")
pl_df = pl_df.select(cols_needed).sort(["ca_id","day","slot"])

# takes ~16 seconds

In [51]:
cityday_ca_order = build_cityday_ca_order_map_pl(pl_df)

In [52]:
pct_tbl = compute_city_day_percentiles(pl_df)

In [53]:
# Convert to pandas for better paralellisation
df_pd = pl_df.to_pandas(use_pyarrow_extension_array=False)  # keeps dictionary cols efficient

In [54]:
df_pd.head(5)

Unnamed: 0,ca_id,city,date,day,slot,value,marginal_emissions_factor_grams_co2_per_kWh,average_emissions_factor_grams_co2_per_kWh,floor_kwh
0,60000005516,delhi,2022-05-04 00:00:00+05:30,2022-05-04 00:00:00+05:30,0,0.002,738.426404,763.740246,0.0022
1,60000005516,delhi,2022-05-04 00:30:00+05:30,2022-05-04 00:00:00+05:30,1,0.002,691.052088,768.35144,0.0022
2,60000005516,delhi,2022-05-04 01:00:00+05:30,2022-05-04 00:00:00+05:30,2,0.002,637.266463,774.318949,0.00225
3,60000005516,delhi,2022-05-04 01:30:00+05:30,2022-05-04 00:00:00+05:30,3,0.002,646.189219,783.035512,0.00225
4,60000005516,delhi,2022-05-04 02:00:00+05:30,2022-05-04 00:00:00+05:30,4,0.002,682.124021,789.819035,0.00225


In [55]:
solver = SolverConfig(solver_family="greedy",)
parallel = ParallelConfig(enabled=False, method="local", workers=1, show_progress=True)

In [None]:
os.environ.setdefault("VECLIB_MAXIMUM_THREADS", "1")
os.environ.setdefault("OMP_NUM_THREADS", "1")
os.environ.setdefault("OPENBLAS_NUM_THREADS", "1")
os.environ.setdefault("MKL_NUM_THREADS", "1")

'1'

: 

In [None]:
# 5) Run the city-week, K-moves pipeline (MILP or Greedy)
metrics_df, moves_df, opt_df = run_pipeline_pandas_cityweek_budget(
    df_pd=df_pd,
    policy=policy,
    solver=solver,          # set solver.solver_family to "milp" or "greedy"
    shuffle_high_usage_order=False,
    emit_optimised_rows=True,
    show_progress=parallel.show_progress,
    workers=parallel.workers,  # number of parallel workers
    cityday_ca_order=cityday_ca_order,   # pass the precomputed order
    backend=parallel.method,  # pass the backend
)

  groups = list(df.groupby(["city","week_start"], sort=True, group_keys=False))


DEBUG: entered day loop
DEBUG: city cap calculated
DEBUG: moved budget calculated
DEBUG: precomputed order determined
DEBUG: entered day loop
DEBUG: city cap calculated
DEBUG: moved budget calculated
DEBUG: precomputed order determined
DEBUG: entered day loop
DEBUG: city cap calculated
DEBUG: moved budget calculated
DEBUG: precomputed order determined
DEBUG: entered day loop
DEBUG: city cap calculated
DEBUG: moved budget calculated
DEBUG: precomputed order determined
DEBUG: entered day loop
DEBUG: city cap calculated
DEBUG: moved budget calculated
DEBUG: precomputed order determined
[progress] 1/3 city-weeks processed (33.3%)
DEBUG: entered day loop
DEBUG: city cap calculated
DEBUG: moved budget calculated
DEBUG: precomputed order determined
DEBUG: entered day loop
DEBUG: city cap calculated
DEBUG: moved budget calculated
DEBUG: precomputed order determined
DEBUG: entered day loop
DEBUG: city cap calculated
DEBUG: moved budget calculated
DEBUG: precomputed order determined


In [None]:
results_directory = os.path.join(optimisation_development_directory, "results")
os.makedirs(results_directory, exist_ok=True)

In [None]:
metrics_df.to_parquet(os.path.join(results_directory, "metrics_config_2_greedy.parquet"))
moves_df.to_parquet(os.path.join(results_directory, "moves_config_2_greedy.parquet"))
opt_df.to_parquet(os.path.join(results_directory, "optimised_config_2_greedy.parquet"))

#### CONFIGURATION 1 - Lax with the limits

In [56]:
peak_hours_reduction_limit_config_1 = PeakHoursReductionLimitConfig(
        peak_hours_reduction_percent_limit=80,
        peak_hours_reduction_scope="per_city",
        peak_hours_dict={
            "delhi": {
                "Mon": [8,9,10,11,12,20],
                "Tue": [9,10,11,20,21,22],
                "Wed": [9,10,11,20,21,22],
                "Thu": [9,10,11,20,21,22],
                "Fri": [8,9,10,11,20,21],
                "Sat": [9,10,11,12,13,20],
                "Sun": [10,11,12,13,14,20],
            },
            "mumbai": {
                "Mon": [8,9,10,11,12,20],
                "Tue": [9,10,11,20,21,22],
                "Wed": [9,10,11,20,21,22],
                "Thu": [9,10,11,20,21,22],
                "Fri": [8,9,10,11,20,21],
                "Sat": [9,10,11,12,13,20],
                "Sun": [10,11,12,13,14,20],
            },
        },
    )

In [57]:
policy_1 = ShiftPolicy(
        behavioral=CustomerAdoptionBehavioralConfig(
                            customer_power_moves_per_day=2,
                            customer_power_moves_per_week=7,
                            timezone="Asia/Kolkata",
                            day_boundaries="00:00-24:00",
                            week_boundaries="Mon-Sun",
                            shift_hours_window=3.0,
                            slot_length_minutes=30,
                            peak_hours_reduction_limit_config=peak_hours_reduction_limit_config_1
                        ),
        regional_cap=RegionalLoadShiftingLimitConfig(
                            regional_load_shift_percent_limit=10.0,
                            regional_total_daily_average_load_kWh={"delhi": delhi_total_daily_average_load_kWh,
                                                                "mumbai": maharashtra_total_daily_average_load_kWh}),
        household_min=HouseholdMinimumConsumptionLimitConfig(
                            household_minimum_baseline_period="year",
                            household_minimum_baseline_type="average",
                            household_minimum_robust_max_percentile=95,
                            household_minimum_R_percent=10),
        spike_cap=ShiftWithoutSpikeLimitConfig(alpha_peak_cap_percent=25),
    )


In [None]:
parallel_1 = ParallelConfig(enabled=False, method="local", workers=1, show_progress=True)

: 

##### CONFIGURATION 1 - GREEDY SOLVER

In [None]:
solver_1_greedy = SolverConfig(solver_family="greedy")

marginal_emissions_pldf_run_1_greedy = day_and_slot_cols(marginal_emissions_pldf, slot_len_min=policy_1.behavioral.slot_length_minutes)

if policy_1.household_min is not None:
    marginal_emissions_pldf_run_1_greedy = attach_household_floor(df=marginal_emissions_pldf_run_1_greedy, cfg=policy_1.household_min)

cols_needed = [
        "ca_id","city","date","day","slot","value",
        "marginal_emissions_factor_grams_co2_per_kWh",
        "average_emissions_factor_grams_co2_per_kWh",
]
if "floor_kwh" in marginal_emissions_pldf_run_1_greedy.columns:
        cols_needed.append("floor_kwh")

marginal_emissions_pldf_run_1_greedy = marginal_emissions_pldf_run_1_greedy.select(cols_needed).sort(["ca_id","day","slot"])

cityday_ca_order_1_greedy = build_cityday_ca_order_map_pl(marginal_emissions_pldf_run_1_greedy)

pct_tbl_1_greedy = compute_city_day_percentiles(marginal_emissions_pldf_run_1_greedy)

marginal_emissions_pldf_run_1_greedy = marginal_emissions_pldf_run_1_greedy.to_pandas()  # keeps dictionary cols efficient

metrics_df_1_greedy, moves_df_1_greedy, opt_df_1_greedy = run_pipeline_pandas_cityweek_budget(
        df_pd=marginal_emissions_pldf_run_1_greedy,
        policy=policy_1,
        solver=solver_1_greedy,          # set solver.solver_family to "milp" or "greedy"
        shuffle_high_usage_order=False,
        emit_optimised_rows=True,
        show_progress=True,
        workers=parallel_1.workers,  # number of parallel workers
        cityday_ca_order=cityday_ca_order_1_greedy,   # pass the precomputed order
        backend=parallel_1.method,  # pass the backend
)

metrics_df_1_greedy.to_parquet(os.path.join(results_directory, "metrics_config_1_greedy.parquet"))
moves_df_1_greedy.to_parquet(os.path.join(results_directory, "moves_config_1_greedy.parquet"))
opt_df_1_greedy.to_parquet(os.path.join(results_directory, "optimised_config_1_greedy.parquet"))


  groups = list(df.groupby(["city","week_start"], sort=True, group_keys=False))


DEBUG: entered day loop
DEBUG: city cap calculated
DEBUG: moved budget calculated
DEBUG: precomputed order determined


##### CONFIGURATION 1 - MILP SOLVER

In [None]:
solver_1_milp = SolverConfig(solver_family="milp")

# IMPLEMENTATION 1 - MILP

marginal_emissions_pldf_run_1_milp = day_and_slot_cols(marginal_emissions_pldf, slot_len_min=policy_1.behavioral.slot_length_minutes)

if policy_1.household_min is not None:
        marginal_emissions_pldf_run_1_milp = attach_household_floor(df=marginal_emissions_pldf_run_1_milp, cfg=policy_1.household_min)

cols_needed = [
        "ca_id","city","date","day","slot","value",
        "marginal_emissions_factor_grams_co2_per_kWh",
        "average_emissions_factor_grams_co2_per_kWh",
]
if "floor_kwh" in marginal_emissions_pldf_run_1_milp.columns:
        cols_needed.append("floor_kwh")

marginal_emissions_pldf_run_1_milp = marginal_emissions_pldf_run_1_milp.select(cols_needed).sort(["ca_id","day","slot"])

cityday_ca_order_1_milp = build_cityday_ca_order_map_pl(marginal_emissions_pldf_run_1_milp)

pct_tbl_1_milp = compute_city_day_percentiles(marginal_emissions_pldf_run_1_milp)

marginal_emissions_pldf_run_1_milp = marginal_emissions_pldf_run_1_milp.to_pandas()  # keeps dictionary cols efficient

metrics_df_1_milp, moves_df_1_milp, opt_df_1_milp = run_pipeline_pandas_cityweek_budget(
        df_pd=marginal_emissions_pldf_run_1_milp,
        policy=policy_1,
        solver=solver_1_milp,          # set solver.solver_family to "milp" or "greedy"
        shuffle_high_usage_order=False,
        emit_optimised_rows=True,
        show_progress=True,
        workers=parallel_1.workers,  # number of parallel workers
        cityday_ca_order=cityday_ca_order_1_milp,   # pass the precomputed order
        backend=parallel_1.method,  # pass the backend
)

metrics_df_1_milp.to_parquet(os.path.join(results_directory, "metrics_config_1_milp.parquet"))
moves_df_1_milp.to_parquet(os.path.join(results_directory, "moves_config_1_milp.parquet"))
opt_df_1_milp.to_parquet(os.path.join(results_directory, "optimised_config_1_milp.parquet"))


#### CONFIGURATION 2 - RESTRICTED

In [None]:
# CONFIGURATION 2:
peak_hours_reduction_limit_config_2 = PeakHoursReductionLimitConfig(
        peak_hours_reduction_percent_limit=25.0,
        peak_hours_reduction_scope="per_city",
        peak_hours_dict={
            "delhi": {
                "Mon": [8,9,10,11,12,20],
                "Tue": [9,10,11,20,21,22],
                "Wed": [9,10,11,20,21,22],
                "Thu": [9,10,11,20,21,22],
                "Fri": [8,9,10,11,20,21],
                "Sat": [9,10,11,12,13,20],
                "Sun": [10,11,12,13,14,20],
            },
            "mumbai": {
                "Mon": [8,9,10,11,12,20],
                "Tue": [9,10,11,20,21,22],
                "Wed": [9,10,11,20,21,22],
                "Thu": [9,10,11,20,21,22],
                "Fri": [8,9,10,11,20,21],
                "Sat": [9,10,11,12,13,20],
                "Sun": [10,11,12,13,14,20],
            },
        },
)

In [None]:
policy_2 = ShiftPolicy(
        behavioral=CustomerAdoptionBehavioralConfig(
                            customer_power_moves_per_day=1,
                            customer_power_moves_per_week=3,
                            timezone="Asia/Kolkata",
                            day_boundaries="00:00-24:00",
                            week_boundaries="Mon-Sun",
                            shift_hours_window=2.0,
                            slot_length_minutes=30,
                            peak_hours_reduction_limit_config=peak_hours_reduction_limit_config_2
                        ),
        regional_cap=RegionalLoadShiftingLimitConfig(
                            regional_load_shift_percent_limit=10.0,
                            regional_total_daily_average_load_kWh={"delhi": delhi_total_daily_average_load_kWh,
                                                                "mumbai": maharashtra_total_daily_average_load_kWh}),
        household_min=HouseholdMinimumConsumptionLimitConfig(
                            household_minimum_baseline_period="year",
                            household_minimum_baseline_type="average",
                            household_minimum_robust_max_percentile=95,
                            household_minimum_R_percent=10),
        spike_cap=ShiftWithoutSpikeLimitConfig(alpha_peak_cap_percent=25),
    )

In [None]:
parallel_2 = ParallelConfig(enabled=False, method="local", workers=1, show_progress=True)

##### CONFIGURATION 2 - GREEDY SOLVER

In [None]:

 # CONFIGURATION 2 - GREEDY

solver_2_greedy = SolverConfig(solver_family="greedy")

# IMPLEMENTATION 2 - GREEDY

marginal_emissions_pldf_run_2_greedy = day_and_slot_cols(marginal_emissions_pldf, slot_len_min=policy_2.behavioral.slot_length_minutes)

if policy_2.household_min is not None:
        marginal_emissions_pldf_run_2_greedy = attach_household_floor(df=marginal_emissions_pldf_run_2_greedy, cfg=policy_2.household_min)

cols_needed = [
        "ca_id","city","date","day","slot","value",
        "marginal_emissions_factor_grams_co2_per_kWh",
        "average_emissions_factor_grams_co2_per_kWh",
]
if "floor_kwh" in marginal_emissions_pldf_run_2_greedy.columns:
        cols_needed.append("floor_kwh")

marginal_emissions_pldf_run_2_greedy = marginal_emissions_pldf_run_2_greedy.select(cols_needed).sort(["ca_id","day","slot"])

cityday_ca_order_2_greedy = build_cityday_ca_order_map_pl(marginal_emissions_pldf_run_2_greedy)

pct_tbl_2_greedy = compute_city_day_percentiles(marginal_emissions_pldf_run_2_greedy)

marginal_emissions_pldf_run_2_greedy = marginal_emissions_pldf_run_2_greedy.to_pandas()  # keeps dictionary cols efficient

metrics_df_2_greedy, moves_df_2_greedy, opt_df_2_greedy = run_pipeline_pandas_cityweek_budget(
        df_pd=marginal_emissions_pldf_run_2_greedy,
        policy=policy_2,
        solver=solver_2_greedy,          # set solver.solver_family to "milp" or "greedy"
        shuffle_high_usage_order=False,
        emit_optimised_rows=True,
        show_progress=True,
        workers=parallel_2.workers,  # number of parallel workers
        cityday_ca_order=cityday_ca_order_2_greedy,   # pass the precomputed order
        backend=parallel_2.method,  # pass the backend
)

metrics_df_2_greedy.to_parquet(os.path.join(results_directory, "metrics_config_2_greedy.parquet"))
moves_df_2_greedy.to_parquet(os.path.join(results_directory, "moves_config_2_greedy.parquet"))
opt_df_2_greedy.to_parquet(os.path.join(results_directory, "optimised_config_2_greedy.parquet"))


##### CONFIGURATION 2 - MILP SOLVER

In [None]:
 # CONFIGURATION 2 - MILP

solver_2_milp = SolverConfig(solver_family="milp")

 # IMPLEMENTATION 2 - MILP

marginal_emissions_pldf_run_2_milp = day_and_slot_cols(marginal_emissions_pldf, slot_len_min=policy_2.behavioral.slot_length_minutes)
if policy_2.household_min is not None:
        marginal_emissions_pldf_run_2_milp = attach_household_floor(df=marginal_emissions_pldf_run_2_milp, cfg=policy_2.household_min)

cols_needed = [
        "ca_id","city","date","day","slot","value",
        "marginal_emissions_factor_grams_co2_per_kWh",
        "average_emissions_factor_grams_co2_per_kWh",
]
if "floor_kwh" in marginal_emissions_pldf_run_2_milp.columns:
        cols_needed.append("floor_kwh")

marginal_emissions_pldf_run_2_milp = marginal_emissions_pldf_run_2_milp.select(cols_needed).sort(["ca_id","day","slot"])

cityday_ca_order_2_milp = build_cityday_ca_order_map_pl(marginal_emissions_pldf_run_2_milp)

pct_tbl_2_milp = compute_city_day_percentiles(marginal_emissions_pldf_run_2_milp)

marginal_emissions_pldf_run_2_milp = marginal_emissions_pldf_run_2_milp.to_pandas()  # keeps dictionary cols efficient

metrics_df_2_milp, moves_df_2_milp, opt_df_2_milp = run_pipeline_pandas_cityweek_budget(
        df_pd=marginal_emissions_pldf_run_2_milp,
        policy=policy_2,
        solver=solver_2_milp,          # set solver.solver_family to "milp" or "greedy"
        shuffle_high_usage_order=False,
        emit_optimised_rows=True,
        show_progress=True,
        workers=parallel_2.workers,  # number of parallel workers
        cityday_ca_order=cityday_ca_order_2_milp,   # pass the precomputed order
        backend=parallel_2.method,  # pass the backend
    )

metrics_df_2_milp.to_parquet(os.path.join(results_directory, "metrics_config_2_milp.parquet"))
moves_df_2_milp.to_parquet(os.path.join(results_directory, "moves_config_2_milp.parquet"))
opt_df_2_milp.to_parquet(os.path.join(results_directory, "optimised_config_2_milp.parquet"))

In [None]:
pct_tbl_pd = pct_tbl.to_pandas()

# Ensure datetime alignment (no tz and normalized to midnight)
metrics_df["day"]  = pd.to_datetime(metrics_df["day"]).dt.normalize()
pct_tbl_pd["day"]  = pd.to_datetime(pct_tbl_pd["day"]).dt.normalize()

# Merge (left join keeps all metrics rows)
metrics_df = metrics_df.merge(
    pct_tbl_pd[["city","day","ca_id","day_kwh","pct"]],
    on=["city","day","ca_id"],
    how="left",
    validate="many_to_one"   # optional safety: each metrics row matches ≤1 percentile row
)

# If you prefer 0..100 instead of 0..1:
metrics_df["pct"] = (metrics_df["pct"] * 100.0).astype(float)

In [None]:
moves_df.head()

In [None]:
opt_df.head()

### LEGACY

#### HELPERS

In [None]:
def per_city_slot_baseline(df_city_day: pl.DataFrame) -> np.ndarray:
    """
    Return baseline per slot (city-level same-day baseline) for spike cap comparison.

    Parameters:
    ----------
    df_city_day : pl.DataFrame
        The DataFrame containing the relevant data for the city and day.

    Returns
    -------
    np.ndarray
        The baseline per slot (city-level same-day baseline) for spike cap comparison.
    """
    slot_count = 48
    base = np.zeros(slot_count, dtype=float)
    if df_city_day.height == 0:
        return base
    # baseline = sum of original usage by slot for that city-day
    grouped = df_city_day.group_by("slot").agg(pl.sum("value").alias("sum_kwh"))
    base[grouped["slot"].to_numpy().astype(int)] = grouped["sum_kwh"].to_numpy()
    return base


#### Pipeline

In [None]:
def run_pipeline_pandas(df_pd: pd.DataFrame, policy: ShiftPolicy, solver: SolverConfig,
                        workers: int = 4, show_progress: bool = True):
    # Make sure day is datetime64[ns, tz] or tz-naive consistently; keep as-is if already tz-aware
    # Group by ca_id, day, city
    gobj = df_pd.groupby(["ca_id","day","city"], sort=True, group_keys=False)

    # Build argument tuples for Pool
    policy_d = {
        "slot_length_minutes": policy.behavioral.slot_length_minutes,
        "shift_hours_window": policy.behavioral.shift_hours_window,
        "peak_hours_reduction_limit_config": policy.behavioral.peak_hours_reduction_limit_config,
    }
    solver_d = {
        "solver_family": solver.solver_family,
        "lp_solver": solver.lp_solver,
        "lp_solver_opts": solver.lp_solver_opts,
        "milp_solver": solver.milp_solver,
        "milp_solver_opts": solver.milp_solver_opts,
        "greedy_min_fraction_of_day_to_move": solver.greedy_min_fraction_of_day_to_move,
    }

    jobs = []
    for (ca_id, day_ts, city), sub in gobj:
        # we send a small dict (column -> list) to reduce pandas pickle overhead
        sub_dict = sub[["slot","value",
                        "marginal_emissions_factor_grams_co2_per_kWh",
                        "average_emissions_factor_grams_co2_per_kWh"] +
                       (["floor_kwh"] if "floor_kwh" in sub.columns else [])].to_dict("list")
        jobs.append(((ca_id, day_ts, city), sub_dict, policy_d, solver_d))

    metrics_rows, move_rows, opt_rows = [], [], []

    if workers and workers > 1:
        with mp.Pool(processes=workers) as pool:
            it = pool.imap(worker_solve_group, jobs, chunksize=16)
            for i, out in enumerate(it, 1):
                if show_progress and (i % max(1, len(jobs)//20) == 0 or i == len(jobs)):
                    print(f"[progress] {i}/{len(jobs)} groups processed ({i/len(jobs)*100:.1f}%)")
                if out is None:
                    continue
                mrow, mrows, orows = out
                metrics_rows.append(mrow); move_rows.extend(mrows); opt_rows.extend(orows)
    else:
        for i, args in enumerate(jobs, 1):
            out = worker_solve_group(args)
            if show_progress and (i % max(1, len(jobs)//20) == 0 or i == len(jobs)):
                print(f"[progress] {i}/{len(jobs)} groups processed ({i/len(jobs)*100:.1f}%)")
            if out is None:
                continue
            mrow, mrows, orows = out
            metrics_rows.append(mrow); move_rows.extend(mrows); opt_rows.extend(orows)

    metrics_df = pd.DataFrame(metrics_rows)
    moves_df   = pd.DataFrame(move_rows)
    opt_df     = pd.DataFrame(opt_rows)
    return metrics_df, moves_df, opt_df


In [None]:
def run_pipeline(
        df_raw: pl.DataFrame,
        policy: ShiftPolicy,
        solver: SolverConfig,
        parallel: Optional[ParallelConfig] = None,
        emit_optimised_rows: bool = False,
        progress: bool = False,
) -> Tuple[pl.DataFrame, pl.DataFrame, Optional[pl.DataFrame]]:
    """
    Run the load shifting optimization pipeline.

    Parameters:
    -----------
    df_raw: pl.DataFrame
        The raw input DataFrame containing the data to process.
    policy: ShiftPolicy
        The policy configuration to use for the shift optimization.
    solver: SolverConfig
        The solver configuration to use for the optimization.
    parallel: Optional[ParallelConfig], optional
        The parallelization configuration to use (default is None).
    emit_optimised_rows: bool, optional
        Whether to emit the optimized rows (default is False).
    progress: bool, optional
        Whether to show progress (default is False).

    Returns:
    --------
    Tuple[pl.DataFrame, pl.DataFrame, Optional[pl.DataFrame]]
        metrics_df: per customer-day totals & savings
        moves_df: per move (t->s) detailed records for auditing
        optimised_rows_df: 48-slot reconstructed series if requested
    """
    if parallel is None:
        parallel = ParallelConfig(enabled=False)

    df = day_and_slot_cols(df_raw, slot_len_min=policy.behavioral.slot_length_minutes)
    df = df.sort(["ca_id","day","slot"])

    # group iterator
    groups = list(df.group_by(["ca_id","day","city"], maintain_order=True))
    total_groups = len(groups)
    prog_cb = progress_printer(total_groups) if (progress or (parallel and parallel.show_progress)) else (lambda i: None)

    # choose backend
    W = hours_to_slots(policy.behavioral.shift_hours_window, policy.behavioral.slot_length_minutes)

    results_metrics: List[Dict[str, Any]] = []
    results_moves: List[Dict[str, Any]] = []
    results_opt_rows: List[Dict[str, Any]] = [] if emit_optimised_rows else None

    if parallel.enabled and parallel.method == "local" and parallel.workers and parallel.workers > 1:
        with mp.Pool(processes=parallel.workers) as pool:
            worker = partial(
                solve_one,
                solver=solver, W=W, policy=policy, emit_optimised_rows=emit_optimised_rows
            )
            for i, out in enumerate(pool.imap(worker, groups, chunksize=8), start=1):
                prog_cb(i)
                if out is None:
                    continue
                mrow, mrows, orows = out
                results_metrics.append(mrow)
                results_moves.extend(mrows)
                if emit_optimised_rows and orows:
                    results_opt_rows.extend(orows)
    else:
        for i, g in enumerate(groups, start=1):
            out = solve_one(sub_tuple=g, solver=solver, W=W, policy=policy, emit_optimised_rows=emit_optimised_rows)
            prog_cb(i)
            if out is None:
                continue
            mrow, mrows, orows = out
            results_metrics.append(mrow)
            results_moves.extend(mrows)
            if emit_optimised_rows and orows:
                results_opt_rows.extend(orows)

    metrics_df = pl.DataFrame(results_metrics) if results_metrics else pl.DataFrame()
    moves_df   = pl.DataFrame(results_moves)   if results_moves   else pl.DataFrame()
    opt_df     = pl.DataFrame(results_opt_rows) if (emit_optimised_rows and results_opt_rows) else None

    return metrics_df, moves_df, opt_df

#### Solvers

In [None]:
def solve_city_day(
    df_city_day: pd.DataFrame,
    policy: ShiftPolicy,
    solver: SolverConfig,
    *,
    shuffle_high_usage_order: bool = False,
    emit_optimised_rows: bool = True,
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]], List[Dict[str, Any]]]:
    """
    Solve one (city, day) by iterating customers in a chosen order while enforcing:
      - Regional moved-kWh budget (P% × city_daily_avg)
      - Per-slot anti-spike cap (≤ (1+α%) × baseline)
      - Per-customer peak-hour comfort (slot/hour)
      - Household per-slot floors (if present)

    Parameters
    ----------
    df_city_day : pandas.DataFrame
        Filtered rows for a single (city, day).
    policy : ShiftPolicy
        Full policy with behavioral, regional and anti-spike configs.
    solver : SolverConfig
        Solver configuration.
    shuffle_high_usage_order : bool, optional
        If True, randomly shuffles the high-usage order after ranking.
    emit_optimised_rows : bool, optional
        If True, emit per-slot post-usage rows.

    Returns
    -------
    metrics_rows, move_rows, opt_rows : lists of dict
        Aggregated outputs for this city-day across customers.
    """
    assert {"ca_id","city","day","slot","value",
            "marginal_emissions_factor_grams_co2_per_kWh",
            "average_emissions_factor_grams_co2_per_kWh"}.issubset(df_city_day.columns)

    city = df_city_day["city"].iloc[0]
    day_ts = df_city_day["day"].iloc[0]
    slot_len = policy.behavioral.slot_length_minutes
    per_hour = 60 // slot_len
    T = 24 * per_hour

    # City-day baselines and caps for anti-spike
    base_city = cityday_baseline_by_slot(df_city_day, slot_len_min=slot_len)
    alpha = (policy.spike_cap.alpha_peak_cap_percent / 100.0) if policy.spike_cap else 0.25
    post_city_cap = (1.0 + alpha) * base_city
    post_city_used = np.zeros(T, dtype=float)  # accumulate post usage as we assign customers

    # City-day regional moved budget
    P_pct = (policy.regional_cap.regional_load_shift_percent_limit / 100.0) if policy.regional_cap else 0.10
    city_avgs = policy.regional_cap.regional_total_daily_average_load_kWh if policy.regional_cap else {}
    city_daily_avg = float(city_avgs.get(city, 0.0))
    moved_budget_remaining = P_pct * city_daily_avg

    # Customer order and optional shuffle
    order, pct_map, tot_map = rank_customers_by_daily_kwh(df_city_day)
    if shuffle_high_usage_order:
        rng = np.random.default_rng()
        order = list(order)  # avoid shuffling original view
        rng.shuffle(order)

    metrics_rows: List[Dict[str, Any]] = []
    move_rows: List[Dict[str, Any]] = []
    opt_rows: List[Dict[str, Any]] = []

    for ca_id in order:
        sub = df_city_day[df_city_day["ca_id"] == ca_id]
        arrs = build_arrays_for_group_pd(sub, slot_len_min=slot_len)
        if arrs is None:
            continue

        usage = arrs["usage"]; mef = arrs["mef"]; aef = arrs["aef"]
        floor_vec = arrs.get("floor", None)

        # Peak comfort for this customer (based on user's config)
        peak_cfg = policy.behavioral.peak_hours_reduction_limit_config
        peak_mask, peak_groups, Z, cap_mode = city_peak_targets_for_day(
            city, day_ts, peak_cfg, slot_len_min=slot_len
        )

        # Residual anti-spike capacity for this customer
        resid_cap = np.maximum(0.0, post_city_cap - post_city_used)

        # Remaining regional moved budget for this customer
        moved_kwh_cap = max(0.0, float(moved_budget_remaining))

        # Solve (LP shown; MILP/greedy could be wired similarly)
        usage_opt, flows = solve_lp(
            mef=mef, usage=usage, W_slots=int((policy.behavioral.shift_hours_window * 60)//slot_len),
            cfg=solver,
            peak_mask=peak_mask, peak_groups=peak_groups, Z=Z, cap_mode=cap_mode,
            floor_vec=floor_vec,
            dest_upper_bounds=resid_cap,
            moved_kwh_cap=moved_kwh_cap,
        )

        # Update city-day residuals
        post_city_used += usage_opt
        moved_kwh = float(np.maximum(np.float32(0.0), usage.astype(np.float32) - usage_opt.astype(np.float32)).sum())
        moved_budget_remaining -= moved_kwh
        if moved_budget_remaining < 0:
            moved_budget_remaining = 0.0  # guard for tiny negatives

        # Metrics
        base = compute_emissions_totals(usage, aef, mef)
        post = compute_emissions_totals(usage_opt, aef, mef)
        median_shift = weighted_median_shift_minutes(flows, slot_len)

        metrics_rows.append({
            "ca_id": ca_id, "city": city, "day": day_ts,
            "customer_daily_kwh": tot_map.get(ca_id, float(usage.sum())),
            "customer_daily_percentile": pct_map.get(ca_id, np.nan),
            "baseline_E_avg_g": base["E_avg_g"], "post_E_avg_g": post["E_avg_g"],
            "delta_E_avg_g": base["E_avg_g"] - post["E_avg_g"],
            "baseline_E_marg_g": base["E_marg_g"], "post_E_marg_g": post["E_marg_g"],
            "delta_E_marg_g": base["E_marg_g"] - post["E_marg_g"],
            "baseline_kwh": float(usage.sum()), "post_kwh": float(usage_opt.sum()),
            "moved_kwh": moved_kwh,
            "avg_shift_minutes_energy_weighted": float(
                (sum(abs(s - t) * val for (t, s, val) in flows) / max(moved_kwh, 1e-9)) * slot_len
            ) if flows else 0.0,
            "median_shift_minutes_energy_weighted": median_shift,
            "regional_budget_remaining_kwh_after": moved_budget_remaining,
        })

        # Moves table rows
        for (t, s, kwh) in flows:
            t_ts = day_ts + timedelta(minutes=int(t * slot_len))
            s_ts = day_ts + timedelta(minutes=int(s * slot_len))
            move_rows.append({
                "ca_id": ca_id, "city": city, "day": day_ts,
                "customer_daily_percentile": pct_map.get(ca_id, np.nan),
                "original_time": t_ts, "proposed_shift_time": s_ts,
                "delta_minutes": int((s - t) * slot_len),
                "shift_direction": "forward" if s > t else ("backward" if s < t else "stay"),
                "delta_kwh": float(kwh),
                "marginal_emissions_before_shift_grams_co2": grams_co2_from_kwh_grams_per_kwh(kwh, mef[t]),
                "marginal_emissions_after_shift_grams_co2": grams_co2_from_kwh_grams_per_kwh(kwh, mef[s]),
                "marginal_emissions_delta_grams_co2": grams_co2_from_kwh_grams_per_kwh(kwh, mef[t]) - grams_co2_from_kwh_grams_per_kwh(kwh, mef[s]),
                "average_emissions_before_shift_grams_co2": grams_co2_from_kwh_grams_per_kwh(kwh, aef[t]),
                "average_emissions_after_shift_grams_co2": grams_co2_from_kwh_grams_per_kwh(kwh, aef[s]),
                "average_emissions_delta_grams_co2": grams_co2_from_kwh_grams_per_kwh(kwh, aef[t]) - grams_co2_from_kwh_grams_per_kwh(kwh, aef[s]),
            })

        # Optional post-usage rows
        if emit_optimised_rows:
            for s in range(T):
                ts = day_ts + timedelta(minutes=int(s * slot_len))
                opt_rows.append({
                    "ca_id": ca_id, "city": city, "day": day_ts,
                    "slot": s, "date": ts, "optimised_value": float(usage_opt[s]),
                })

    return metrics_rows, move_rows, opt_rows


In [None]:
def worker_solve_group_one_move(args):
    (ca_id, day_ts, city), sub_df, policy_d, solver_d, caps = args
    # sub_df is dict-of-lists to avoid pandas pickling overhead
    sub = pd.DataFrame(sub_df)

    arrs = build_arrays_for_group_pd(sub, slot_len_min=policy_d["slot_length_minutes"])
    if arrs is None:
        return None
    usage, mef, aef = arrs["usage"], arrs["mef"], arrs["aef"]
    floor = arrs.get("floor")

    slot_len = policy_d["slot_length_minutes"]

    peak_cfg = policy_d.get("peak_hours_reduction_limit_config")
    peak_mask, peak_groups, Z = city_peak_targets_for_day(city, day_ts, peak_cfg, slot_len_min=slot_len)

    # caps injected by orchestrator
    dest_ub = caps["dest_upper_bounds"]           # np.ndarray for this (city, day)
    reg_rem = caps["remaining_regional_budget"]   # float remaining for this (city, day)

    W = int((policy_d["shift_hours_window"] * 60) // slot_len)

    family = solver_d["solver_family"]
    if family == "greedy_one":
        usage_opt, flows = solve_one_move_greedy(
            mef, usage, W,
            slot_len_min=slot_len,
            floor_vec=floor,
            peak_mask=peak_mask,
            peak_groups=peak_groups,
            Z=Z,
            dest_upper_bounds=dest_ub,
            remaining_regional_budget_kwh=reg_rem,
        )
    elif family == "milp_one":
        usage_opt, flows = solve_one_move_milp(
            mef, usage, W,
            floor_vec=floor,
            peak_groups=peak_groups,
            Z=Z,
            slot_len_min=slot_len,
            dest_upper_bounds=dest_ub,
            remaining_regional_budget_kwh=reg_rem,
            milp_solver=solver_d.get("milp_solver","cbc"),
            milp_opts=solver_d.get("milp_solver_opts"),
        )
    else:
        raise ValueError("Use 'greedy_one' or 'milp_one' to enforce a single move.")

    # build metrics / rows as you already do...
    # IMPORTANT: return also 'moved_kwh' so the orchestrator can decrement the city-day budget.
    base = compute_emissions_totals(usage, aef, mef)
    post = compute_emissions_totals(usage_opt, aef, mef)
    moved_kwh = float(np.maximum(0.0, usage - usage_opt).sum())
    # ... construct rows as in your existing worker ...
    return metrics_row, rows, opt_rows, moved_kwh


In [None]:
def solve_one_move_greedy(
    mef: np.ndarray,
    usage: np.ndarray,
    W_slots: int,
    *,
    slot_len_min: int = 30,
    floor_vec: Optional[np.ndarray] = None,          # per-slot min at source (kWh)
    peak_mask: Optional[np.ndarray] = None,          # destination peak slots (boolean)
    peak_groups: Optional[List[List[int]]] = None,   # hour grouping (for source cap)
    Z: Optional[float] = None,                       # fraction (0..1)
    dest_upper_bounds: Optional[np.ndarray] = None,  # city-day anti-spike caps (kWh)
    remaining_regional_budget_kwh: Optional[float] = None,  # city-day remaining P% budget
) -> Tuple[np.ndarray, List[Tuple[int,int,float]]]:
    """
    Make exactly ONE move (t->s) that maximizes emissions reduction subject to caps.

    Returns
    -------
    usage_opt : np.ndarray
        New usage after the single move (or unchanged if no feasible positive-gain move).
    flows : list[(t,s,kwh)]
        Empty if no beneficial move exists, else one tuple for the executed move.
    """
    T = len(usage)
    usage_opt = usage.astype(float).copy()

    slots = range(T)
    best_gain = 0.0
    best_pair = None
    best_amt  = 0.0

    # helper: hour group for source t (for 30-min default, two slots per hour)
    slots_per_hour = 60 // slot_len_min

    def hour_group_of(t: int) -> List[int]:
        h = t // slots_per_hour
        start = h * slots_per_hour
        return list(range(start, start + slots_per_hour))

    # consider all allowed pairs within the window
    for t in slots:
        if usage[t] <= 1e-12:
            continue
        s0, s1 = max(0, t - W_slots), min(T, t + W_slots + 1)
        for s in range(s0, s1):
            if s == t:
                continue
            # emissions gain per kWh if we move from t to s
            delta_per_kwh = float(mef[t] - mef[s])
            if delta_per_kwh <= 1e-12:
                continue  # no environmental benefit

            # upper bound 1: source availability after floor
            src_floor = float(floor_vec[t]) if (floor_vec is not None) else 0.0
            ub_src = max(0.0, usage[t] - src_floor)

            if ub_src <= 1e-12:
                continue

            # upper bound 2: destination anti-spike remaining capacity
            if dest_upper_bounds is not None and np.isfinite(dest_upper_bounds[s]):
                # current post-usage at s (before move) = usage[s]
                ub_dest = max(0.0, float(dest_upper_bounds[s] - usage[s]))
            else:
                ub_dest = float("inf")

            if ub_dest <= 1e-12:
                continue

            # upper bound 3: peak-hour *source* comfort cap (per hour)
            if Z is not None and peak_groups is not None:
                grp = hour_group_of(t)
                base_hour = float(usage[grp].sum())  # base per-customer usage in that hour
                ub_peak = Z * base_hour
            else:
                ub_peak = float("inf")

            # upper bound 4: remaining regional city-day budget
            ub_reg = float(remaining_regional_budget_kwh) if (remaining_regional_budget_kwh is not None) else float("inf")

            # total feasible
            m = min(ub_src, ub_dest, ub_peak, ub_reg)
            if m <= 1e-12:
                continue

            gain = delta_per_kwh * m
            if gain > best_gain:
                best_gain = gain
                best_pair = (t, s)
                best_amt  = m

    if best_pair is None:
        return usage_opt, []  # no beneficial move

    t, s = best_pair
    m = best_amt

    usage_opt[t] -= m
    usage_opt[s] += m

    return usage_opt, [(t, s, float(m))]


In [None]:
def worker_solve_group(args):
    """
    args: (key_tuple, sub_df_dict, policy_dict, solver_dict)
    To keep things Pool-friendly, pass simple dicts/tuples, not complex objects.
    """
    (ca_id, day_ts, city), sub_df, policy_d, solver_d = args

    # reconstruct small pandas frame for the group
    sub = pd.DataFrame(sub_df)

    # arrays
    arrs = build_arrays_for_group_pd(sub, slot_len_min=policy_d["slot_length_minutes"])
    if arrs is None:
        return None
    usage, mef, aef = arrs["usage"], arrs["mef"], arrs["aef"]
    floor = arrs.get("floor")

    # peak comfort mask (per city, per customer)
    peak_cfg = policy_d.get("peak_hours_reduction_limit_config")
    slot_len = policy_d["slot_length_minutes"]
    peak_mask, Z = city_peak_targets_for_day()(city, day_ts, peak_cfg, slot_len_min=slot_len)
    base_peak_kwh = float(usage[peak_mask].sum()) if (peak_mask is not None) else None

    # window in slots
    W = int((policy_d["shift_hours_window"] * 60) // slot_len)

    # choose solver
    solver_family = solver_d["solver_family"]
    if solver_family == "lp":
        usage_opt, flows = solve_lp(mef, usage, W,
                                    cfg=SolverConfig(solver_family="lp",
                                                     lp_solver=solver_d["lp_solver"],
                                                     lp_solver_opts=solver_d.get("lp_solver_opts")),
                                    peak_mask=peak_mask, Z=Z, base_peak_kwh=base_peak_kwh)
    elif solver_family == "milp":
        usage_opt, flows = solve_milp(mef, usage, W,
                                      cfg=SolverConfig(solver_family="milp",
                                                       milp_solver=solver_d["milp_solver"],
                                                       milp_solver_opts=solver_d.get("milp_solver_opts")),
                                      peak_mask=peak_mask, Z=Z, base_peak_kwh=base_peak_kwh,
                                      floor=floor)
    else:
        usage_opt, flows = solve_greedy(mef, usage, W,
                                        cfg=SolverConfig(solver_family="greedy",
                                                         greedy_min_fraction_of_day_to_move=solver_d.get("greedy_min_fraction_of_day_to_move")))

    # metrics
    base = compute_emissions_totals(usage, aef, mef)
    post = compute_emissions_totals(usage_opt, aef, mef)
    moved_kwh = float(np.maximum(0.0, usage - usage_opt).sum())
    median_shift_minutes = weighted_median_shift_minutes(flows, slot_len)

    metrics_row = {
        "ca_id": ca_id, "city": city, "day": day_ts,
        "baseline_E_avg_g": base["E_avg_g"], "post_E_avg_g": post["E_avg_g"],
        "delta_E_avg_g": base["E_avg_g"] - post["E_avg_g"],
        "baseline_E_marg_g": base["E_marg_g"], "post_E_marg_g": post["E_marg_g"],
        "delta_E_marg_g": base["E_marg_g"] - post["E_marg_g"],
        "baseline_kwh": float(usage.sum()),
        "post_kwh": float(usage_opt.sum()),
        "moved_kwh": moved_kwh,
        "avg_shift_minutes_energy_weighted": float(
            (sum(abs(s - t) * val for (t, s, val) in flows) / max(moved_kwh, 1e-9)) * slot_len
        ) if flows else 0.0,
        "median_shift_minutes_energy_weighted": median_shift_minutes,
    }

    # moves table rows
    rows = []
    for (t, s, kwh) in flows:
        t_ts = day_ts + timedelta(minutes=int(t * slot_len))
        s_ts = day_ts + timedelta(minutes=int(s * slot_len))
        dirn = "forward" if s > t else ("backward" if s < t else "stay")
        g_marg_before = grams_co2_from_kwh_grams_per_kwh(kwh, mef[t])
        g_marg_after  = grams_co2_from_kwh_grams_per_kwh(kwh, mef[s])
        g_avg_before  = grams_co2_from_kwh_grams_per_kwh(kwh, aef[t])
        g_avg_after   = grams_co2_from_kwh_grams_per_kwh(kwh, aef[s])
        rows.append({
            "ca_id": ca_id, "city": city, "day": day_ts,
            "original_time": t_ts, "proposed_shift_time": s_ts,
            "delta_minutes": int((s - t) * slot_len), "shift_direction": dirn,
            "delta_kwh": float(kwh),
            "marginal_emissions_before_shift_grams_co2": g_marg_before,
            "marginal_emissions_after_shift_grams_co2": g_marg_after,
            "marginal_emissions_delta_grams_co2": g_marg_before - g_marg_after,
            "average_emissions_before_shift_grams_co2": g_avg_before,
            "average_emissions_after_shift_grams_co2": g_avg_after,
            "average_emissions_delta_grams_co2": g_avg_before - g_avg_after,
        })

    # optimised 48-slot rows
    opt_rows = []
    for s in range(len(usage_opt)):
        ts = day_ts + timedelta(minutes=int(s * slot_len))
        opt_rows.append({
            "ca_id": ca_id, "city": city, "day": day_ts,
            "slot": s, "date": ts, "optimised_value": float(usage_opt[s]),
        })

    return metrics_row, rows, opt_rows


In [None]:
def solve_one(
    sub_tuple: Tuple,
    solver: SolverConfig,
    W: float,
    policy: ShiftPolicy,
    emit_optimised_rows: bool,
) -> Optional[Tuple[Dict[str, float], List[Dict[str, float]], List[Dict[str, float]]]]:
    _, subdf = sub_tuple  # (key, frame)
    ca_id = subdf["ca_id"][0]
    city  = subdf["city"][0]
    day   = subdf["day"][0]

    arrays = build_arrays_for_group(subdf.select([
        "slot", "value",
        "marginal_emissions_factor_grams_co2_per_kWh",
        "average_emissions_factor_grams_co2_per_kWh"
    ]))
    if not arrays:
        return None

    usage, mef, aef = arrays["usage"], arrays["mef"], arrays["aef"]

    # Optional household per-slot floor vector
    if "floor_kwh" in subdf.columns:
        floor_vec = np.zeros(len(usage), dtype=float)
        for s, fk in zip(subdf["slot"].to_numpy().astype(int), subdf["floor_kwh"].to_numpy()):
            floor_vec[s] = float(max(0.0, fk))
    else:
        floor_vec = None

    # Peak-hour mask/groups from user's full-hour inputs (per city)
    peak_cfg = policy.behavioral.peak_hours_reduction_limit_config
    peak_mask, peak_groups, Z = city_peak_targets_for_day(
        city, day, peak_cfg, slot_len_min=policy.behavioral.slot_length_minutes
    )
    cap_mode = peak_cfg.cap_mode if peak_cfg else "slot"

    # Solve
    if solver.solver_family == "lp":
        usage_opt, flows = solve_lp(
            mef, usage, W, solver,
            peak_mask=peak_mask, peak_groups=peak_groups, Z=Z,
            cap_mode=cap_mode, floor_vec=floor_vec
        )
    elif solver.solver_family == "milp":
        usage_opt, flows = solve_milp(
            mef, usage, W, solver,
            peak_mask=peak_mask, peak_groups=peak_groups, Z=Z,
            cap_mode=cap_mode, floor_vec=floor_vec
        )
    elif solver.solver_family == "greedy":
        usage_opt, flows = solve_greedy(mef, usage, W, solver)
    else:
        raise ValueError("Unknown solver family.")

    # Metrics (per day)
    base = compute_emissions_totals(usage, aef, mef)
    post = compute_emissions_totals(usage_opt, aef, mef)
    moved_kwh = float(np.maximum(0.0, usage - usage_opt).sum())
    median_shift_minutes = weighted_median_shift_minutes(flows, policy.behavioral.slot_length_minutes)

    metrics_row = {
            "ca_id": ca_id,
            "city": city,
            "day": day,
            "baseline_E_avg_g": base["E_avg_g"],
            "post_E_avg_g": post["E_avg_g"],
            "delta_E_avg_g": base["E_avg_g"] - post["E_avg_g"],
            "baseline_E_marg_g": base["E_marg_g"],
            "post_E_marg_g": post["E_marg_g"],
            "delta_E_marg_g": base["E_marg_g"] - post["E_marg_g"],
            "baseline_kwh": float(usage.sum()),
            "post_kwh": float(usage_opt.sum()),
            "moved_kwh": moved_kwh,
            "avg_shift_minutes_energy_weighted": float(
                (sum(abs(s - t) * val for (t, s, val) in flows) / max(moved_kwh, 1e-9)) * policy.behavioral.slot_length_minutes
            ) if flows else 0.0,
            "median_shift_minutes_energy_weighted": median_shift_minutes,

        }

    # Moves table rows
    move_rows = flows_table(ca_id, city, day, flows, mef, aef, slot_len_min=policy.behavioral.slot_length_minutes)

    # Optimised 48-slot rows
    opt_rows = None
    if emit_optimised_rows:
        rows = []
        for s in range(len(usage_opt)):
            ts = day + timedelta(minutes=int(s * policy.behavioral.slot_length_minutes))
            rows.append({
                    "ca_id": ca_id,
                    "city": city,
                    "day": day,
                    "slot": s,
                    "date": ts,
                    "optimised_value": float(usage_opt[s]),
            })
        opt_rows = rows

    return (metrics_row, move_rows, opt_rows)

In [None]:
def solve_one_move_milp(
    mef: np.ndarray,
    usage: np.ndarray,
    W_slots: int,
    *,
    floor_vec: Optional[np.ndarray] = None,
    peak_groups: Optional[List[List[int]]] = None,
    Z: Optional[float] = None,
    slot_len_min: int = 30,
    dest_upper_bounds: Optional[np.ndarray] = None,
    remaining_regional_budget_kwh: Optional[float] = None,
    milp_solver: str = "cbc",
    milp_opts: Optional[Dict[str, Any]] = None,
) -> Tuple[np.ndarray, List[Tuple[int,int,float]]]:
    assert pyo is not None
    T = len(usage)
    slots_per_hour = 60 // slot_len_min

    # Build allowed pairs and one big index
    pairs = []
    for t in range(T):
        s0, s1 = max(0, t - W_slots), min(T, t + W_slots + 1)
        for s in range(s0, s1):
            if s != t:
                pairs.append((t, s))
    P = len(pairs)

    m = pyo.ConcreteModel()
    m.P = pyo.RangeSet(0, P-1)

    # decision: pick at most one pair
    m.z = pyo.Var(m.P, domain=pyo.Binary)
    m.y = pyo.Var(m.P, domain=pyo.NonNegativeReals)  # moved kWh

    # Big-M bounds on y if pair selected
    # M_{t,s} from per-pair feasibility (same bounds as greedy)
    M = []
    for i, (t, s) in enumerate(pairs):
        src_floor = float(floor_vec[t]) if (floor_vec is not None) else 0.0
        ub_src = max(0.0, usage[t] - src_floor)

        if dest_upper_bounds is not None and np.isfinite(dest_upper_bounds[s]):
            ub_dest = max(0.0, float(dest_upper_bounds[s] - usage[s]))
        else:
            ub_dest = float("inf")

        if Z is not None and peak_groups is not None:
            h = t // slots_per_hour
            grp = list(range(h*slots_per_hour, (h+1)*slots_per_hour))
            base_hour = float(usage[grp].sum())
            ub_peak = Z * base_hour
        else:
            ub_peak = float("inf")

        ub_reg = float(remaining_regional_budget_kwh) if (remaining_regional_budget_kwh is not None) else float("inf")
        M_i = min(ub_src, ub_dest, ub_peak, ub_reg)
        if not np.isfinite(M_i) or M_i < 0:
            M_i = 0.0
        M.append(M_i)

    # y_i <= M_i * z_i
    m.bigM = pyo.ConstraintList()
    for i in range(P):
        m.bigM.add(m.y[i] <= M[i] * m.z[i])

    # at most one pair
    m.one = pyo.Constraint(expr=sum(m.z[i] for i in m.P) <= 1)

    # objective: maximize emissions reduction
    # gain_i = (mef[t]-mef[s]) * y_i
    gains = []
    for i, (t, s) in enumerate(pairs):
        gains.append((float(mef[t] - mef[s]), i))
    m.obj = pyo.Objective(
        expr=sum(max(0.0, g) * m.y[i] for (g, i) in gains),
        sense=pyo.maximize
    )

    solver = pyo.SolverFactory(milp_solver or "cbc")
    if milp_opts:
        for k, v in milp_opts.items():
            solver.options[k] = v
    solver.solve(m, tee=False)

    # build result
    usage_opt = usage.astype(float).copy()
    flows = []
    for i in range(P):
        yi = pyo.value(m.y[i])
        zi = pyo.value(m.z[i])
        if zi and yi and yi > 1e-12:
            t, s = pairs[i]
            usage_opt[t] -= yi
            usage_opt[s] += yi
            flows.append((t, s, float(yi)))
            break  # only one move
    return usage_opt, flows


#### LP

In [None]:
def solve_lp(
    mef: np.ndarray,
    usage: np.ndarray,
    W_slots: int,
    cfg: SolverConfig,
    *,
    peak_mask: Optional[np.ndarray] = None,
    peak_groups: Optional[List[List[int]]] = None,
    Z: Optional[float] = None,
    cap_mode: Literal["slot", "hour"] = "slot",
    floor_vec: Optional[np.ndarray] = None,
    dest_upper_bounds: Optional[np.ndarray] = None,
    moved_kwh_cap: Optional[float] = None,
) -> Tuple[np.ndarray, List[Tuple[int, int, float]]]:
    """
    Linear program: minimise sum_s post_usage[s] * MEF[s] with constraints.

    Parameters
    ----------
    mef, usage, W_slots, cfg : see pipeline docs
    peak_mask : np.ndarray, optional
        Boolean vector of destination peak slots.
    peak_groups : List[List[int]], optional
        Hour groups for 'hour' mode peak comfort.
    Z : float, optional
        Peak comfort reduction cap (e.g., 0.30).
    cap_mode : {"slot","hour"}, optional
        Apply peak comfort per slot or per hour.
    floor_vec : np.ndarray, optional
        Per-slot household floors (kWh).
    dest_upper_bounds : np.ndarray, optional
        Per-destination max kWh for THIS customer (city anti-spike residuals).
    moved_kwh_cap : float, optional
        City-day remaining budget for kWh moved by THIS customer.

    Returns
    -------
    usage_opt : np.ndarray
        Optimised post-usage (kWh) per slot for this customer.
    flows : List[Tuple[int,int,float]]
        Non-zero flows (t -> s, kWh).
    """
    assert cp is not None
    T = len(mef)
    pairs, by_src, by_dst = cached_pairs(T, W_slots)
    P = len(pairs)

    y = cp.Variable(P, nonneg=True)
    cost = cp.sum(cp.multiply(y, np.array([mef[s] for (_, s) in pairs], dtype=float)))

    cons = []

    # Supply conservation: for each source slot t, sum_s y_{t->s} = usage[t]
    for t in range(T):
        idx = by_src[t]
        cons.append(cp.sum(y[idx]) == float(usage[t]))

    # Household per-slot minimum floors
    if floor_vec is not None:
        for s in range(T):
            if floor_vec[s] > 0:
                idx = by_dst[s]
                if idx.size:
                    cons.append(cp.sum(y[idx]) >= float(floor_vec[s]))

    # Peak-hour comfort: do not reduce more than Z (per customer)
    if peak_mask is not None and Z is not None:
        if cap_mode == "slot":
            for s in np.where(peak_mask)[0]:
                base_s = float(usage[s])
                if base_s > 1e-12:
                    idx = by_dst[s]
                    if idx.size:
                        cons.append(cp.sum(y[idx]) >= (1.0 - Z) * base_s)
        else:  # "hour"
            if peak_groups:
                for grp in peak_groups:
                    base_h = float(usage[grp].sum())
                    if base_h > 1e-12:
                        idxs = np.unique(np.concatenate([by_dst[s] for s in grp if by_dst[s].size]))
                        if idxs.size:
                            cons.append(cp.sum(y[idxs]) >= (1.0 - Z) * base_h)

    # Anti-spike per-slot residual capacity (city-day)
    if dest_upper_bounds is not None:
        for s in range(T):
            if np.isfinite(dest_upper_bounds[s]):
                idx = by_dst[s]
                if idx.size:
                    cons.append(cp.sum(y[idx]) <= float(dest_upper_bounds[s]))

    # Regional moved-kWh cap for this customer (city-day remaining budget)
    if moved_kwh_cap is not None:
        stay = cp.sum([y[i] for i, (t, s) in enumerate(pairs) if t == s])
        total = float(usage.sum())
        cons.append(total - stay <= float(moved_kwh_cap))

    prob = cp.Problem(cp.Minimize(cost), cons)
    prob.solve(solver=cfg.lp_solver, **(cfg.lp_solver_opts or {}))

    yv = np.asarray(y.value).reshape(-1)
    usage_opt = np.zeros(T, dtype=float)
    flows: List[Tuple[int, int, float]] = []
    for i, val in enumerate(yv):
        if val <= 1e-12:
            continue
        t, s = pairs[i]
        usage_opt[s] += float(val)
        flows.append((t, s, float(val)))
    return usage_opt, flows
