# 06 — Train/Test Scenario Construction

In this notebook we define a reproducible set of *evaluation windows* for
testing imputation and forecasting under sensor blackouts.

The idea is:

- Use **realistic blackout lengths** inspired by the empirical blackout
  statistics (10–120 minutes and beyond).
- Select **time intervals where the underlying data are fully observed**,
  then **artificially hide** the readings during those windows.
- For each chosen blackout interval on a given detector, we evaluate:
  - **Imputation**: recovering the hidden readings *inside* the blackout.
  - **Forecasting**: predicting the first few steps *after* the blackout
    (here: +1, +3, +6 steps).

The final output is a manifest `evaluation_windows_mnar_weighted.parquet`
that specifies which detector/time windows belong to which evaluation scenario.

In [1]:
import numpy as np
import pandas as pd
from pathlib import Path

In [2]:
# ------------------------------------------------
# 1. Load cleaned panel + missingness matrix
# ------------------------------------------------

# Same clean panel as in earlier steps
try:
    wide = pd.read_parquet("data/seattle_loop_clean.parquet")
    print("Loaded clean panel from parquet.")
except Exception as e:
    print("Parquet load failed, falling back to pickle. Error was:")
    print(" ", e)
    wide = pd.read_pickle("data/seattle_loop_clean.pkl")
    print("Loaded clean panel from pickle.")

X = wide.to_numpy(dtype=np.float32)
timestamps = wide.index.to_numpy()
detectors = wide.columns.to_numpy()
T, D = X.shape

print("wide shape:", X.shape)
print("Time span:", wide.index.min(), "→", wide.index.max())

# Missingness matrix M[t, d] = 1 if missing, 0 if observed
M = np.load("data/m_t.npy")
assert M.shape == X.shape, "M and X must have same shape"
print("Loaded m_t.npy; fraction missing:", M.mean())

Loaded clean panel from parquet.
wide shape: (105120, 147)
Time span: 2015-01-01 00:00:00 → 2015-12-31 23:55:00
Loaded m_t.npy; fraction missing: 0.051772771513476014
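
Before relying on `M`, it may be worth checking that it agrees with the panel itself. The sketch below assumes that missing readings are stored as `NaN` in `X`, which this notebook does not confirm; `check_mask_consistency` is a hypothetical helper, shown on toy arrays rather than the real data.

```python
import numpy as np

def check_mask_consistency(X, M):
    """Count cells where NaN-ness of X disagrees with the mask M (1 = missing)."""
    X = np.asarray(X, dtype=float)
    M = np.asarray(M)
    if X.shape != M.shape:
        raise ValueError("X and M must have the same shape")
    nan_mask = np.isnan(X)      # True where the panel holds no value
    flagged = (M == 1)          # True where the mask says "missing"
    return int((nan_mask != flagged).sum())

# Toy panel: 3 time steps x 2 detectors (assumed NaN-for-missing convention)
X_toy = np.array([[60.0, np.nan],
                  [np.nan, 55.0],
                  [58.0, 57.0]])
M_toy = np.array([[0, 1],
                  [1, 0],
                  [0, 0]])

n_mismatch = check_mask_consistency(X_toy, M_toy)
print("mismatching cells:", n_mismatch)
```

A non-zero count on the real data would not necessarily be an error (the clean panel may have been filled upstream), but it would be worth understanding before carving out evaluation windows.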


In [3]:
# ------------------------------------------------
# 2. Helper: find contiguous True streaks
# ------------------------------------------------

def find_streaks(bool_array):
    """
    Given a 1D boolean array, return a list of (start, end) indices
    for contiguous True runs (inclusive).
    """
    out = []
    start = None
    for i, v in enumerate(bool_array):
        if v and start is None:
            start = i
        elif not v and start is not None:
            out.append((start, i - 1))
            start = None
    if start is not None:
        out.append((start, len(bool_array) - 1))
    return out
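
The loop above is called once per detector on an array with more than 100,000 entries. If that becomes slow, the same runs can be found with `np.diff`. This is a sketch of an alternative, not the helper used in this notebook; `find_streaks_vectorized` is a hypothetical name.

```python
import numpy as np

def find_streaks_vectorized(bool_array):
    """Return (start, end) index pairs (inclusive) for contiguous True runs."""
    b = np.asarray(bool_array, dtype=bool)
    if b.size == 0:
        return []
    # Pad with False on both sides so every run has a rising and a falling edge
    padded = np.concatenate(([False], b, [False])).astype(np.int8)
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)        # first index of each run
    ends = np.flatnonzero(edges == -1) - 1     # last index of each run
    return [(int(s), int(e)) for s, e in zip(starts, ends)]

streaks = find_streaks_vectorized([True, True, False, True, False, False, True])
print(streaks)
```

Both versions return inclusive `(start, end)` pairs, so one could be swapped for the other without changing the downstream code.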


## 3. Design choices for evaluation windows

We want evaluation windows that:

1. **Use realistic blackout lengths**, inspired by the EDA:
   - Many blackouts last 30–120 minutes.
   - Some are longer (multi-hour outages).

2. **Have full ground truth**:
   - All readings for that detector must currently be **observed** (non-NaN)
     within the blackout interval and the post-blackout horizon.
   - We can then temporarily hide those values during evaluation and compare
     predictions against the original speeds.

3. Cover both **imputation** and **forecasting**:
   - **Imputation**: inside the blackout.
   - **Forecasting**: first few steps after blackout end, at horizons  
     $h \in \{1, 3, 6\}$ (i.e., $+5$, $+15$, $+30$ minutes).

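We implement this by scanning for long, fully observed stretches of data for
each detector and carving out blackout windows inside them. The start of each
window is sampled with higher probability where the pre-blackout regime is
extreme or volatile (a synthetic MNAR stress test).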
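
The length requirement can be made concrete with a little arithmetic. The construction code below reserves one observed step before the blackout, so a streak of length `n` offers `n - (L + max_h)` candidate start positions for a blackout of `L` steps. A minimal sketch, with `n_candidate_starts` as a hypothetical helper and a made-up streak length of 30 steps:

```python
def n_candidate_starts(streak_len, blackout_len, max_horizon, pre_steps=1):
    """Number of valid blackout start positions inside one observed streak.

    The window needs `pre_steps` observed steps before the blackout,
    `blackout_len` hidden steps, and `max_horizon` observed steps after it.
    """
    required = blackout_len + max_horizon
    return max(0, streak_len - required - pre_steps + 1)

# Illustrative streak of 30 steps (2.5 hours), forecast horizon up to 6 steps
counts = {L: n_candidate_starts(30, L, 6) for L in (6, 12, 24)}
print(counts)
```

In this example the 24-step blackout does not fit at all, because the streak has no room left for the pre-blackout step.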


In [4]:
# ------------------------------------------------
# 3. Construct evaluation windows
# ------------------------------------------------

# Parameters (you can tweak these)
BLACKOUT_LENGTHS = [6, 12, 24]      # in steps: 30, 60, 120 minutes
FORECAST_HORIZONS = [1, 3, 6]       # steps after blackout_end
MAX_WINDOWS_PER_DET = 3             # avoid having too many windows per detector

max_h = max(FORECAST_HORIZONS)
all_rows = []
window_id = 0

# Global RNG so randomness is reproducible
rng = np.random.default_rng(111)


def pick_offset_mnar_weighted(X, d_idx, s, e, required, rng):
    """
    Pick a blackout offset with higher probability when the *pre-blackout*
    regime is extreme/volatile (synthetic MNAR stress test).

    Returns the offset relative to s + 1, or None if no start position fits.
    """
    # Candidate blackout_start indices: s + 1 .. e - required + 1 (inclusive),
    # so every candidate has at least one observed pre-step.
    starts = np.arange(s + 1, e - required + 2)
    if starts.size == 0:
        return None

    pre = starts - 1
    x_pre = X[pre, d_idx]
    # Rolling variance proxy over the 12 steps (1 hour) before each pre-step,
    # using finite values only
    roll = np.zeros_like(x_pre, dtype=float)
    for i, t0 in enumerate(pre):
        hist = X[max(0, t0 - 12):t0, d_idx]
        hist = hist[np.isfinite(hist)]
        roll[i] = np.var(hist) if hist.size >= 3 else 0.0

    # Weights: extreme speeds (|z-score|) + volatility
    xz = (x_pre - np.nanmean(X[:, d_idx])) / (np.nanstd(X[:, d_idx]) + 1e-6)
    score = np.abs(xz) + 0.15 * np.sqrt(roll)
    score = np.nan_to_num(score, nan=0.0, posinf=0.0, neginf=0.0)
    w = np.exp(np.clip(score, 0.0, 6.0))
    w = w / (w.sum() + 1e-12)
    start = rng.choice(starts, p=w)
    return int(start - (s + 1))


for d_idx, det in enumerate(detectors):
    # Boolean array: True if this detector is observed at time t
    obs = (M[:, d_idx] == 0)

    # Find contiguous fully-observed stretches, then shuffle their order
    obs_streaks = find_streaks(obs)
    if len(obs_streaks) == 0:
        continue
    obs_streaks = list(rng.permutation(obs_streaks))

    used_for_det = 0

    for (s, e) in obs_streaks:
        if used_for_det >= MAX_WINDOWS_PER_DET:
            break

        streak_len = e - s + 1
        # We need enough length to fit blackout + max forecast horizon
        min_required = min(BLACKOUT_LENGTHS) + max_h
        if streak_len < min_required:
            continue

        for L in BLACKOUT_LENGTHS:
            # Steps needed inside the streak: blackout + longest forecast horizon
            required = L + max_h
            if streak_len < required:
                continue

            # MNAR-weighted offset (synthetic stress test)
            offset = pick_offset_mnar_weighted(X, d_idx, s, e, required, rng)
            if offset is None:
                continue
            blackout_start = s + 1 + offset
            blackout_end = blackout_start + L - 1

            # Check again that we have room for max horizon
            if blackout_end + max_h > e:
                continue

            # Sanity check: all points we rely on (blackout + horizon) must be observed
            seg_ok = obs[blackout_start : blackout_end + max_h + 1].all()
            if not seg_ok:
                continue

            # We now have a valid evaluation window
            # 1) imputation window row
            all_rows.append({
                "window_id": window_id,
                "detector_id": det,
                "blackout_start": timestamps[blackout_start],
                "blackout_end": timestamps[blackout_end],
                "len_steps": L,               # blackout length in 5-min steps
                "test_type": "impute",
                "horizon_steps": np.nan,      # not used for imputation
                "scenario": "mnar_weighted",
            })

            # 2) forecasting rows for each horizon
            for h in FORECAST_HORIZONS:
                # The target time is timestamps[blackout_end + h]; it is not stored,
                # since it follows from blackout_end and horizon_steps.
                all_rows.append({
                    "window_id": window_id,
                    "detector_id": det,
                    "blackout_start": timestamps[blackout_start],
                    "blackout_end": timestamps[blackout_end],
                    "len_steps": L,          
                    "test_type": "forecast",
                    "horizon_steps": h,
                    "scenario": "mnar_weighted",
                })

            window_id += 1
            used_for_det += 1

            if used_for_det >= MAX_WINDOWS_PER_DET:
                break  # stop after enough windows for this detector

print(f"Total evaluation windows (unique window_id): {window_id}")
print(f"Total manifest rows (impute + forecast): {len(all_rows)}")

Total evaluation windows (unique window_id): 441
Total manifest rows (impute + forecast): 1764
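
To see what the weighting in `pick_offset_mnar_weighted` does, the scoring can be replayed on a handful of made-up candidates. The function below copies the formula from the cell above; the speeds, variances, mean, and standard deviation are invented for illustration and `mnar_weights` is a hypothetical name.

```python
import numpy as np

def mnar_weights(x_pre, roll_var, mean, std):
    """Sampling weights: exp(|z-score| + 0.15 * rolling std), clipped and normalised."""
    z = (x_pre - mean) / (std + 1e-6)
    score = np.abs(z) + 0.15 * np.sqrt(roll_var)
    score = np.nan_to_num(score, nan=0.0, posinf=0.0, neginf=0.0)
    w = np.exp(np.clip(score, 0.0, 6.0))
    return w / w.sum()

# Three candidate starts: two in ordinary free-flow, one after a sharp slowdown
x_pre = np.array([60.0, 58.0, 20.0])      # speed at the step before each candidate
roll_var = np.array([1.0, 4.0, 100.0])    # variance over the preceding hour
weights = mnar_weights(x_pre, roll_var, mean=58.0, std=10.0)
print(np.round(weights, 4))
```

With these numbers the slowdown candidate takes nearly all of the probability mass, which is the intended stress-test behaviour: blackouts land disproportionately right after unusual traffic conditions.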


In [5]:
# ------------------------------------------------
# 4. Build manifest DataFrame and save
# ------------------------------------------------

eval_df = pd.DataFrame(all_rows)

print(eval_df.head())
print("\nSummary by test_type:")
print(eval_df["test_type"].value_counts())

Path("data").mkdir(exist_ok=True)
out_path = Path("data") / "evaluation_windows_mnar_weighted.parquet"
eval_df.to_parquet(out_path, engine="pyarrow", index=False)
print("\nSaved evaluation manifest to:", out_path.resolve())


   window_id detector_id      blackout_start        blackout_end  len_steps  \
0          0  005es15036 2015-06-01 16:55:00 2015-06-01 17:20:00          6   
1          0  005es15036 2015-06-01 16:55:00 2015-06-01 17:20:00          6   
2          0  005es15036 2015-06-01 16:55:00 2015-06-01 17:20:00          6   
3          0  005es15036 2015-06-01 16:55:00 2015-06-01 17:20:00          6   
4          1  005es15036 2015-07-11 07:55:00 2015-07-11 08:50:00         12   

  test_type  horizon_steps       scenario  
0    impute            NaN  mnar_weighted  
1  forecast            1.0  mnar_weighted  
2  forecast            3.0  mnar_weighted  
3  forecast            6.0  mnar_weighted  
4    impute            NaN  mnar_weighted  

Summary by test_type:
test_type
forecast    1323
impute       441
Name: count, dtype: int64

Saved evaluation manifest to: C:\Users\Dell\Downloads\Modeling-Information-Blackouts-in-MNAR-Time-Series-main\Modeling-Information-Blackouts-in-MNAR-Time-Series-main\d

## 5. Manifest format and usage

The file `data/evaluation_windows_mnar_weighted.parquet` has one row per
evaluation *scenario* (for each blackout window: one imputation row plus one
row per forecast horizon), with the following columns:

- `window_id`:
  - Integer identifier of a blackout window.
  - All rows sharing the same `window_id` refer to the same blackout
    interval on a specific detector.

- `detector_id`:
  - String identifier of the loop detector (matches columns of `wide`).

- `blackout_start`:
  - Timestamp of the first time step inside the blackout (inclusive).

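- `blackout_end`:
  - Timestamp of the last time step inside the blackout (inclusive).

- `len_steps`:
  - Blackout length in 5-minute steps (here: `6`, `12`, or `24`, i.e. 30,
    60, or 120 minutes).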

- `test_type`:
  - `"impute"`:
    - Evaluate reconstruction of the hidden readings inside the blackout
      window `[blackout_start, blackout_end]`.
  - `"forecast"`:
    - Evaluate forecasting at a given horizon after the blackout (see below).

- `horizon_steps`:
  - For `test_type="impute"`: `NaN` (not used for imputation).
  - For `test_type="forecast"`: number of 5-minute steps after
    `blackout_end` (here: `1`, `3`, or `6` steps, corresponding to +5,
    +15, +30 minutes).

- `scenario`:
  - Label of the window-sampling scheme (here always `"mnar_weighted"`).
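
A consumer of the manifest may want to confirm this structure before running experiments. The sketch below checks that every `window_id` has exactly one imputation row, the expected forecast horizons, and a single detector/interval. `validate_manifest` and the toy manifest are hypothetical; the column names follow the list above.

```python
import numpy as np
import pandas as pd

def validate_manifest(df, horizons=(1, 3, 6)):
    """Return the window_ids whose rows do not match the expected layout."""
    problems = []
    key_cols = ["detector_id", "blackout_start", "blackout_end"]
    for wid, g in df.groupby("window_id"):
        n_impute = int((g["test_type"] == "impute").sum())
        fc = g.loc[g["test_type"] == "forecast", "horizon_steps"].dropna()
        same_window = g[key_cols].drop_duplicates().shape[0] == 1
        ok = (
            n_impute == 1
            and sorted(int(h) for h in fc) == sorted(horizons)
            and same_window
        )
        if not ok:
            problems.append(int(wid))
    return problems

# Toy manifest: window 0 is complete, window 1 lacks horizons 3 and 6
toy = pd.DataFrame({
    "window_id": [0, 0, 0, 0, 1, 1],
    "detector_id": ["d1"] * 4 + ["d2"] * 2,
    "blackout_start": pd.to_datetime(["2015-06-01 16:55"] * 4 + ["2015-07-11 07:55"] * 2),
    "blackout_end": pd.to_datetime(["2015-06-01 17:20"] * 4 + ["2015-07-11 08:50"] * 2),
    "test_type": ["impute", "forecast", "forecast", "forecast", "impute", "forecast"],
    "horizon_steps": [np.nan, 1, 3, 6, np.nan, 1],
})

bad = validate_manifest(toy)
print("windows with problems:", bad)
```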

### How to apply this in experiments

Given a row with (`detector_id`, `blackout_start`, `blackout_end`):

1. **Imputation** (`test_type = "impute"`):
   - Temporarily set `x[t, d]` to `NaN` for all time steps
     `t ∈ [blackout_start, blackout_end]` and that detector `d`.
   - Run the model and compare the predictions to the original speeds
     over that interval.

2. **Forecasting** (`test_type = "forecast"` with `horizon_steps = h > 0`):
   - Hide the detector's readings over `[blackout_start, blackout_end]` as
     above, and use only information available up to `blackout_end`
     (optionally including the blackout pattern itself as context).
   - Predict `x[blackout_end + h, d]` and compare to the actual value.

Because all windows are chosen from fully observed stretches of data, the
“ground truth” speeds are available both inside and after the blackout.
This makes the evaluation reproducible and directly comparable across
different models and inference schemes.
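
The imputation and forecasting steps described above can be sketched on a toy panel. Everything here is illustrative: `mask_blackout` and `forecast_target` are hypothetical helpers, the carry-forward fill merely stands in for a real model, and mean absolute error is an assumed metric rather than one prescribed by this notebook.

```python
import numpy as np
import pandas as pd

def mask_blackout(wide, detector_id, blackout_start, blackout_end):
    """Return a copy of `wide` with one detector hidden over the blackout (inclusive)."""
    masked = wide.copy()
    masked.loc[blackout_start:blackout_end, detector_id] = np.nan
    return masked

def forecast_target(wide, detector_id, blackout_end, horizon_steps):
    """Return (timestamp, true value) `horizon_steps` rows after blackout_end."""
    pos = wide.index.get_loc(blackout_end) + int(horizon_steps)
    return wide.index[pos], float(wide[detector_id].iloc[pos])

# Toy panel: one detector, 24 five-minute steps, fully observed
idx = pd.date_range("2015-06-01 16:00", periods=24, freq="5min")
wide_toy = pd.DataFrame({"d1": np.arange(24, dtype=float)}, index=idx)

start = pd.Timestamp("2015-06-01 16:30")
end = pd.Timestamp("2015-06-01 16:55")      # 6-step blackout

# Imputation: hide the window, fill it with a stand-in model, score against truth
masked = mask_blackout(wide_toy, "d1", start, end)
pred = masked["d1"].ffill().loc[start:end]  # carry-forward baseline (assumption)
truth = wide_toy.loc[start:end, "d1"]
mae = float(np.mean(np.abs(pred - truth)))

# Forecasting: look up the ground truth 3 steps after the blackout ends
t_target, y_true = forecast_target(wide_toy, "d1", end, 3)
print("imputation MAE:", mae, "| forecast target:", t_target, y_true)
```

Masking a copy keeps the original panel intact, so the same `wide` can be reused for every row of the manifest.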
