# Compare original vs current DisruptCNN segment settings

This notebook compares segment and label boundaries for a set of shots under:

- **Original setting**: shot list from `d3d_*_ecei.final.txt` with **flattop_only=True**. The segment starts at `t_flat_start` and ends at `tend = max(tdisrupt, min(tlast, t_flat_stop))`; shots with NaN `t_flat_start` are dropped.
- **Current setting**: the same shot list with **flattop_only=False**. The segment runs from `tstart` to `tlast` (the full shot), with no flattop window.
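The two window definitions can be sketched in time units as below. `segment_window` is a hypothetical helper written only to illustrate the rules listed above. It is not the `disruptcnn` implementation, and the toy times are made up (assumed to be in ms).

```python
import math


def segment_window(tstart, tlast, t_flat_start, t_flat_stop, tdisrupt, flattop_only):
    """Return (seg_start, seg_end) in the input time units, or None if the shot is dropped.

    Hypothetical sketch of the two settings:
      - flattop_only=True:  [t_flat_start, max(tdisrupt, min(tlast, t_flat_stop))],
        returning None when t_flat_start is NaN (shot dropped).
      - flattop_only=False: [tstart, tlast], i.e. the full shot.
    """
    if flattop_only:
        if math.isnan(t_flat_start):
            return None  # original setting drops shots without a flattop
        tend = max(tdisrupt, min(tlast, t_flat_stop))
        return (t_flat_start, tend)
    return (tstart, tlast)


# Toy shot (made-up times, assumed ms); the flattop ends before the disruption
shot = dict(tstart=0.0, tlast=5000.0, t_flat_start=1000.0, t_flat_stop=3500.0, tdisrupt=4000.0)
print(segment_window(**shot, flattop_only=True))   # (1000.0, 4000.0)
print(segment_window(**shot, flattop_only=False))  # (0.0, 5000.0)
```

The `max(tdisrupt, ...)` term extends the original segment up to the disruption time when the flattop ends before it, which is what happens in the toy shot.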

We compare `start_idx`, `stop_idx`, `disrupt_idx`, and segment length (in samples and ms) for the first `N_COMPARE` shots of the disruptive shot list `d3d_disrupt_ecei.final.txt`.

In [None]:
from pathlib import Path
import numpy as np
import pandas as pd

from disruptcnn.dataset_original import segment_info_for_comparison

# Locate the shot list when running from the repo root (soenre) or from disruptcnn/
SHOTS_DIR = Path("disruptcnn/shots")
if not SHOTS_DIR.exists():
    SHOTS_DIR = Path("shots")
DISRUPT_LIST = SHOTS_DIR / "d3d_disrupt_ecei.final.txt"
assert DISRUPT_LIST.exists(), f"Shot list not found: {DISRUPT_LIST}"

## Load segment info for both settings

We use the same shot list and compute segment bounds with **flattop_only=True** (original) and **flattop_only=False** (current).

In [None]:
original = segment_info_for_comparison(str(DISRUPT_LIST), flattop_only=True, snr_min_threshold=None)
current = segment_info_for_comparison(str(DISRUPT_LIST), flattop_only=False, snr_min_threshold=None)

# Index both results by shot; the original setting drops NaN-flattop shots, so only shots present in both are compared
shots_original = {r["shot"]: r for r in original}
shots_current = {r["shot"]: r for r in current}
shots_common = sorted(set(shots_original) & set(shots_current))
print(f"Shots in disrupt list (total): {len(current)}")
print(f"Shots with flattop (original): {len(original)}")
print(f"Shots in both (for comparison): {len(shots_common)}")
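The dict-and-set alignment above could also be written as a single pandas outer merge, which yields the common shots and the shots missing from the original setting in one step. A minimal sketch on toy records follows; the record values are made up, and in the notebook one would pass `original` and `current` instead of the toy lists.

```python
import pandas as pd

# Toy stand-ins for the lists returned by segment_info_for_comparison
# (only the "shot" key and one length field are used here).
original_toy = [
    {"shot": 1001, "segment_length_samples": 400},
    {"shot": 1003, "segment_length_samples": 500},
]
current_toy = [
    {"shot": 1001, "segment_length_samples": 900},
    {"shot": 1002, "segment_length_samples": 700},
    {"shot": 1003, "segment_length_samples": 800},
]

merged = pd.merge(
    pd.DataFrame(original_toy),
    pd.DataFrame(current_toy),
    on="shot",
    how="outer",                  # keep shots present in either setting
    suffixes=("_orig", "_curr"),  # disambiguate overlapping column names
    indicator=True,               # adds "_merge": "both", "left_only" or "right_only"
)
common = merged.loc[merged["_merge"] == "both", "shot"].tolist()
dropped_in_original = merged.loc[merged["_merge"] == "right_only", "shot"].tolist()
print(common)               # [1001, 1003]
print(dropped_in_original)  # [1002]
```

A side benefit of this layout is that the `_orig`/`_curr` column pairs are already side by side, so per-shot deltas become plain column arithmetic.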

## Comparison table on first N shots

For each shot we show original (flattop) vs current (full segment): start_idx, stop_idx, disrupt_idx, segment length (samples and ms).

In [None]:
N_COMPARE = 20  # number of shots to compare (first N shots that have a flattop)
compare_shots = shots_common[:N_COMPARE]  # shots_common is already sorted and limited to flattop shots

rows = []
for shot in compare_shots:
    o = shots_original[shot]
    c = shots_current[shot]
    rows.append({
        "shot": shot,
        "tstart": o["tstart"],
        "tlast": o["tlast"],
        "t_flat_start": o["t_flat_start"],
        "t_flat_stop": o["t_flat_stop"],
        "tdisrupt": o["tdisrupt"],
        "dt_ms": o["dt"],
        "orig_start_idx": o["start_idx"],
        "orig_stop_idx": o["stop_idx"],
        "orig_disrupt_idx": o["disrupt_idx"],
        "orig_len_samples": o["segment_length_samples"],
        "orig_len_ms": round(o["segment_length_ms"], 2),
        "curr_start_idx": c["start_idx"],
        "curr_stop_idx": c["stop_idx"],
        "curr_disrupt_idx": c["disrupt_idx"],
        "curr_len_samples": c["segment_length_samples"],
        "curr_len_ms": round(c["segment_length_ms"], 2),
        "delta_len_samples": c["segment_length_samples"] - o["segment_length_samples"],
        "delta_len_ms": round(c["segment_length_ms"] - o["segment_length_ms"], 2),
    })

df = pd.DataFrame(rows)
pd.set_option("display.max_columns", None)
df

## Summary: segment length difference

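The original (flattop) segment is a sub-window of the shot, while the current setting uses the full window from `tstart` to `tlast`. As long as `tstart <= t_flat_start` and `tdisrupt <= tlast`, the current segment is therefore at least as long as the original one, in both samples and ms.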

In [None]:
summary = df.agg({
    "orig_len_samples": ["min", "max", "mean"],
    "curr_len_samples": ["min", "max", "mean"],
    "orig_len_ms": ["min", "max", "mean"],
    "curr_len_ms": ["min", "max", "mean"],
    "delta_len_samples": ["mean", "sum"],
    "delta_len_ms": ["mean", "sum"],
}).round(2)
summary
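Summary statistics can hide individual exceptions, so a per-shot check of the ordering is useful. The sketch below flags rows where the current segment is shorter than the original one; it assumes the comparison table's column names (`shot`, `orig_len_samples`, `curr_len_samples`), and `length_ordering_violations` is a hypothetical helper. In the notebook it would be called as `length_ordering_violations(df)`; an empty result is consistent with the reasoning above.

```python
import pandas as pd


def length_ordering_violations(df):
    """Rows where the current (full-shot) segment is shorter than the original
    (flattop) one. Expected to be empty when tstart <= t_flat_start and
    tdisrupt <= tlast for every shot."""
    mask = df["curr_len_samples"] < df["orig_len_samples"]
    return df.loc[mask, ["shot", "orig_len_samples", "curr_len_samples"]]


# Toy frame with the same column names as the comparison table
toy = pd.DataFrame({
    "shot": [1, 2, 3],
    "orig_len_samples": [400, 500, 600],
    "curr_len_samples": [900, 500, 550],  # shot 3 is a deliberate violation
})
print(length_ordering_violations(toy))
```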

## Disrupt index difference

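`disrupt_idx` is computed with the same formula in both settings, `ceil((tdisrupt - 300 - tstart) / dt)`, and in both it is an index relative to the shot's `tstart`, so the absolute index should agree between the two settings. What differs is where the segment starts: the original segment begins at the flattop `start_idx`, while the current one begins at the shot start. The disruption's position *within* the segment, `disrupt_idx - start_idx`, is therefore typically smaller in the original setting.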
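A toy worked example of the formula, with made-up times (assumed ms) and `dt = 0.25` chosen so the divisions are exact in floating point. Converting `t_flat_start` to `start_idx` with the same ceil convention is an assumption; the notebook does not show how `segment_info_for_comparison` computes `start_idx`. `time_to_idx` is a hypothetical helper.

```python
import math


def time_to_idx(t, tstart, dt):
    # Index of time t relative to the shot start, rounded up (the ceil convention
    # of the disrupt_idx formula; assumed, not confirmed, for start_idx)
    return math.ceil((t - tstart) / dt)


# Toy shot: made-up times (assumed ms); dt is exactly representable in binary
tstart, t_flat_start, tdisrupt, dt = 0.0, 1000.0, 4000.0, 0.25

disrupt_idx = time_to_idx(tdisrupt - 300, tstart, dt)   # ceil((tdisrupt - 300 - tstart) / dt)
curr_start_idx = time_to_idx(tstart, tstart, dt)        # full-shot segment starts at tstart
orig_start_idx = time_to_idx(t_flat_start, tstart, dt)  # flattop segment start (assumed conversion)

print(disrupt_idx)                   # 14800, same in both settings
print(disrupt_idx - curr_start_idx)  # 14800 samples into the current segment
print(disrupt_idx - orig_start_idx)  # 10800 samples into the original segment
```

The within-segment offsets differ by exactly `orig_start_idx` (4000 here), even though the absolute `disrupt_idx` is identical.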

In [None]:
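# Position of the disruption label relative to the start of each segment
df["orig_disrupt_in_segment"] = df["orig_disrupt_idx"] - df["orig_start_idx"]
df["curr_disrupt_in_segment"] = df["curr_disrupt_idx"] - df["curr_start_idx"]
print("disrupt_idx identical in both settings:", (df["orig_disrupt_idx"] == df["curr_disrupt_idx"]).all())
df[["shot", "orig_disrupt_idx", "curr_disrupt_idx", "orig_disrupt_in_segment", "curr_disrupt_in_segment"]].head(15)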

## Shots without flattop (only in "current")

Shots with NaN `t_flat_start` are dropped in the original setting but kept in the current (full-segment) setting. The cell below counts them and lists up to 10.

In [None]:
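shots_no_flat = sorted(set(shots_current) - set(shots_original))
print(f"Shots dropped in original (no flattop): {len(shots_no_flat)}")
# Build the table at top level so Jupyter displays it (an expression inside an `if` is not shown).
# Passing `columns` keeps only these keys and also yields an empty table when no shots were dropped.
no_flat_cols = ["shot", "tstart", "tlast", "tdisrupt", "segment_length_samples", "segment_length_ms"]
no_flat_df = pd.DataFrame([shots_current[s] for s in shots_no_flat[:10]], columns=no_flat_cols)
no_flat_df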
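To confirm that the only reason a shot is missing from the original setting is a NaN `t_flat_start`, one can compare the set difference with the set of NaN-flattop shots. A minimal sketch on toy records follows. `shots_with_nan_flattop` is a hypothetical helper, and it assumes the current-setting records carry a `t_flat_start` key (the original-setting records do). In the notebook the check would read `shots_no_flat == shots_with_nan_flattop(current)`.

```python
import pandas as pd


def shots_with_nan_flattop(records):
    """Sorted shots whose t_flat_start is missing (NaN or None).

    Assumes each record is a dict with "shot" and "t_flat_start" keys."""
    return sorted(r["shot"] for r in records if pd.isna(r["t_flat_start"]))


# Toy records standing in for `current`, and the shots kept by the original setting
toy_current = [
    {"shot": 1001, "t_flat_start": 1000.0},
    {"shot": 1002, "t_flat_start": float("nan")},
    {"shot": 1003, "t_flat_start": 1200.0},
]
toy_original_shots = {1001, 1003}

dropped = sorted({r["shot"] for r in toy_current} - toy_original_shots)
print(dropped == shots_with_nan_flattop(toy_current))  # True
```

`pd.isna` is used rather than `math.isnan` so that a `None` value would also count as a missing flattop instead of raising a `TypeError`.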