# Extended Vetting Metrics: TOI-5807.01 (TIC 188646744)

This notebook focuses on the **opt-in extended vetting metrics** (V16–V21) added to `bittr-tess-vetter`.

Goals:

1. Run the baseline `preset="default"` vetting and the `preset="extended"` vetting on the same target
2. Inspect which extended checks run vs. skip (based on available inputs)
3. Compare the *additional metrics* produced by V16–V21 without introducing new decision thresholds

This notebook reuses the tutorial data directory and ephemeris from `04-real-candidate-validation.ipynb`.

## Setup

In [None]:
import csv
from pathlib import Path

import numpy as np
from astropy.wcs import WCS

# Vetting classes and entry point from the bittr-tess-vetter public API
from bittr_tess_vetter.api import (
    LightCurve,
    Ephemeris,
    Candidate,
    StellarParams,
    TPFStamp,
    vet_candidate,
)

## Load Tutorial Data (Light Curve + TPF)

For speed and reproducibility, we load the pre-extracted light curve and a representative single-sector TPF stamp from the tutorial data directory.

If you want to download fresh data instead, see `04-real-candidate-validation.ipynb`.
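
The loader below expects each `sector{N}_pdcsap.csv` to start with optional `#`-prefixed comment lines, followed by a header row with `time_btjd`, `flux`, `flux_err` and `quality` columns. Here is a minimal stdlib sketch of that parsing step, run on an in-memory sample. The sample values and the `parse_sector_csv` helper are made up for illustration. The sketch drops rows with non-finite values as well as rows with a nonzero quality flag.

```python
import csv
import io
import math

# Made-up sample in the layout the tutorial loader expects:
# optional "#" comment lines, then a header row, then data rows.
SAMPLE = """\
# TIC 188646744 sector 83 (illustrative values only)
time_btjd,flux,flux_err,quality
3401.10,1.00012,0.00030,0
3401.12,0.99975,0.00030,0
3401.14,nan,0.00030,0
3401.16,0.99990,0.00031,128
"""


def parse_sector_csv(text: str) -> list[tuple[float, float, float]]:
    """Return (time, flux, flux_err) rows with quality == 0 and finite values."""
    # Drop comment lines before handing the rest to DictReader.
    lines = [ln for ln in io.StringIO(text) if not ln.startswith("#")]
    rows = []
    for row in csv.DictReader(lines):
        t, f, e = float(row["time_btjd"]), float(row["flux"]), float(row["flux_err"])
        if int(row["quality"]) == 0 and all(math.isfinite(v) for v in (t, f, e)):
            rows.append((t, f, e))
    return rows


rows = parse_sector_csv(SAMPLE)
print(len(rows))  # 2: the NaN row and the flagged row are dropped
```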

In [None]:
TIC_ID = 188646744
SECTORS = [55, 75, 82, 83]
DATA_DIR = Path("data/tic188646744")


def load_sector_arrays(sector: int) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
    path = DATA_DIR / f"sector{sector}_pdcsap.csv"

    time: list[float] = []
    flux: list[float] = []
    flux_err: list[float] = []
    quality: list[int] = []

    with path.open(newline="") as f:
        for line in f:
            if not line.startswith("#"):
                header = line
                break
        else:
            raise ValueError(f"Missing CSV header in {path}")

        reader = csv.DictReader([header] + f.readlines())
        for row in reader:
            time.append(float(row["time_btjd"]))
            flux.append(float(row["flux"]))
            flux_err.append(float(row["flux_err"]))
            quality.append(int(row["quality"]))

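    t = np.asarray(time)
    f_arr = np.asarray(flux)
    e_arr = np.asarray(flux_err)
    q = np.asarray(quality)
    # Keep good-quality cadences only; PDCSAP flux can also be NaN, so drop non-finite values too
    ok = (q == 0) & np.isfinite(t) & np.isfinite(f_arr) & np.isfinite(e_arr)
    return t[ok], f_arr[ok], e_arr[ok]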


def stitch_lightcurves(sectors: list[int]) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
    time_all, flux_all, flux_err_all = [], [], []
    for sector in sectors:
        t, f_arr, e_arr = load_sector_arrays(sector)
        time_all.append(t)
        flux_all.append(f_arr)
        flux_err_all.append(e_arr)

    time = np.concatenate(time_all)
    flux = np.concatenate(flux_all)
    flux_err = np.concatenate(flux_err_all)
    sort_idx = np.argsort(time)
    return time[sort_idx], flux[sort_idx], flux_err[sort_idx]


time, flux, flux_err = stitch_lightcurves(SECTORS)
lc = LightCurve(time=time, flux=flux, flux_err=flux_err)

print(f"Loaded {len(time):,} points from sectors {SECTORS}")
print(f"Time range: {time.min():.2f} - {time.max():.2f} BTJD")
flux_median = np.median(flux)
mad_ppm = np.median(np.abs(flux - flux_median)) / flux_median * 1e6  # relative to the median flux level
print(f"Flux scatter (MAD): {mad_ppm:.1f} ppm")


# Load one sector TPF stamp (includes an aperture mask)
tpf_path = DATA_DIR / "sector83_tpf.npz"
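if tpf_path.exists():
    # allow_pickle=True is needed because wcs_header is stored as a pickled object array;
    # only enable it for files you trust.
    tpf_data = np.load(tpf_path, allow_pickle=True)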
    wcs_header = tpf_data["wcs_header"].item()
    tpf_wcs = WCS(wcs_header)
    tpf_stamp = TPFStamp(
        time=tpf_data["time"],
        flux=tpf_data["flux"],
        flux_err=tpf_data["flux_err"],
        wcs=tpf_wcs,
        aperture_mask=tpf_data["aperture_mask"],
        quality=tpf_data["quality"],
    )
    print(
        f"Loaded TPF stamp: pixels={tpf_stamp.flux.shape[1:]} aperture_pixels={int(tpf_stamp.aperture_mask.sum())}"
    )
else:
    tpf_stamp = None
    print("TPF stamp missing; pixel-dependent checks will be skipped.")
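
The scatter printed in the cell above is the raw median absolute deviation (MAD) in ppm. For Gaussian noise, multiplying the MAD by about 1.4826 gives a robust estimate of the standard deviation. That estimate is easier to compare with `flux_err` or with the transit depth. Below is a stdlib sketch that normalizes by the median flux first. The `robust_scatter_ppm` helper and the sample fluxes are illustrative, not part of `bittr-tess-vetter`.

```python
import statistics

# Gaussian consistency factor: 1 / Phi^{-1}(0.75) ~ 1.4826
MAD_TO_SIGMA = 1.4826


def robust_scatter_ppm(flux: list[float]) -> tuple[float, float]:
    """Return (MAD, MAD-based sigma) of median-normalized flux, both in ppm."""
    med = statistics.median(flux)
    rel = [f / med - 1.0 for f in flux]  # fractional deviation from the median level
    center = statistics.median(rel)
    mad = statistics.median(abs(r - center) for r in rel)
    return mad * 1e6, MAD_TO_SIGMA * mad * 1e6


mad_ppm, sigma_ppm = robust_scatter_ppm([1.0, 1.0002, 0.9998, 1.0001, 0.9999])
print(f"MAD = {mad_ppm:.1f} ppm, sigma ~ {sigma_ppm:.1f} ppm")
```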

## Candidate Ephemeris + Stellar Params

We use the same ephemeris parameters as the validation tutorial.

If you want to re-query ExoFOP or re-fit the ephemeris, do that in `04-real-candidate-validation.ipynb` and paste the updated values here.
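
To see which transits of this ephemeris fall inside a given time range, enumerate the epochs from `T0_BTJD` and `PERIOD_DAYS`. A point counts as in transit if it lies within half a duration of the nearest predicted mid-time. Below is a stdlib sketch of that logic. The helper names are hypothetical and not part of `bittr-tess-vetter`, and the time range in the example is illustrative.

```python
import math


def transit_midtimes(t0: float, period: float, t_min: float, t_max: float) -> list[float]:
    """Predicted mid-transit times (same units as t0) inside [t_min, t_max]."""
    n_first = math.ceil((t_min - t0) / period)
    n_last = math.floor((t_max - t0) / period)
    return [t0 + n * period for n in range(n_first, n_last + 1)]


def in_transit(t: float, t0: float, period: float, duration_hours: float) -> bool:
    """True if t lies within half a transit duration of the nearest mid-time."""
    phase = (t - t0 + 0.5 * period) % period - 0.5 * period  # days from nearest mid-time
    return abs(phase) <= 0.5 * duration_hours / 24.0


# Ephemeris values from this notebook; the time range is illustrative.
mids = transit_midtimes(3401.2142, 13.0177, 3390.0, 3430.0)
print([round(m, 4) for m in mids])
print(in_transit(3401.2142 + 0.05, 3401.2142, 13.0177, 3.42))
```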

In [None]:
# Ephemeris (days / BTJD / hours)
PERIOD_DAYS = 13.0177
T0_BTJD = 3401.2142
DURATION_HOURS = 3.42
DEPTH_PPM = 253.0

ephemeris = Ephemeris(
    period_days=PERIOD_DAYS,
    t0_btjd=T0_BTJD,
    duration_hours=DURATION_HOURS,
)
candidate = Candidate(ephemeris=ephemeris, depth_ppm=DEPTH_PPM)


# Stellar params used by duration-consistency checks and physical plausibility features
stellar = StellarParams(
    teff=6816,     # K
    radius=1.650,  # R_sun
    mass=1.47,     # M_sun
    logg=4.17,     # log10(g / cm s^-2)
)

print("Candidate and stellar context configured.")
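
Stellar parameters matter for duration consistency because, for a circular orbit, Kepler's third law fixes $a/R_\star$. That in turn fixes the longest possible transit duration, which occurs for a central transit ($b = 0$). Below is a rough sketch assuming a circular orbit, $R_p \ll R_\star$ and $b = 0$, so that $T \approx (P/\pi)\arcsin(R_\star/a)$. It also checks that `logg` agrees with the mass and radius. This is an illustrative calculation, not necessarily the formula the library's checks use.

```python
import math

G_CGS = 6.674e-8     # gravitational constant, cm^3 g^-1 s^-2
M_SUN_G = 1.989e33   # solar mass, g
R_SUN_CM = 6.957e10  # solar radius, cm


def logg_from_mass_radius(mass_msun: float, radius_rsun: float) -> float:
    """Surface gravity log10(g [cm/s^2]) from M and R in solar units."""
    g = G_CGS * mass_msun * M_SUN_G / (radius_rsun * R_SUN_CM) ** 2
    return math.log10(g)


def central_duration_hours(period_days: float, mass_msun: float, radius_rsun: float) -> float:
    """Circular-orbit, b = 0, small-planet transit duration: (P / pi) * asin(R*/a)."""
    period_s = period_days * 86400.0
    # Kepler's third law: a^3 = G M P^2 / (4 pi^2), neglecting the planet mass.
    a_cm = (G_CGS * mass_msun * M_SUN_G * period_s**2 / (4.0 * math.pi**2)) ** (1.0 / 3.0)
    return period_days * 24.0 / math.pi * math.asin(radius_rsun * R_SUN_CM / a_cm)


print(f"logg from M, R: {logg_from_mass_radius(1.47, 1.650):.2f}")
print(f"max (b=0) duration: {central_duration_hours(13.0177, 1.47, 1.650):.1f} h vs 3.42 h in the ephemeris")
```

With these inputs, the $b = 0$ duration comes out at roughly 6.2 h, longer than the 3.42 h used above. A shorter observed duration is allowed, since it is consistent with a non-zero impact parameter. An observed duration much longer than the $b = 0$ value would be hard to explain with a circular orbit around this star.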

## Run Baseline Vetting (`preset="default"`)

This is the standard 15-check pipeline used by existing tutorials.

In [None]:
baseline = vet_candidate(
    lc,
    candidate,
    stellar=stellar,
    tpf=tpf_stamp,
    network=False,
    tic_id=TIC_ID,
    preset="default",
)

print(
    f"checks={len(baseline.results)} passed={baseline.n_passed} failed={baseline.n_failed} skipped={baseline.n_unknown}"
)
print([r.id for r in baseline.results])
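
The counts above summarize the bundle. To see *which* checks landed in each status, group the results by their `status` value. The sketch below relies only on each result exposing `id` and `status`, as used elsewhere in this notebook. The stand-in results and the status strings in the demo are hypothetical.

```python
from collections import defaultdict
from collections.abc import Iterable
from typing import NamedTuple


class FakeResult(NamedTuple):
    """Hypothetical stand-in for a vetting result; only id/status are used."""
    id: str
    status: str


def ids_by_status(results: Iterable) -> dict[str, list[str]]:
    """Map str(status) -> sorted list of check IDs with that status."""
    groups: dict[str, list[str]] = defaultdict(list)
    for r in results:
        groups[str(r.status)].append(r.id)
    return {status: sorted(ids) for status, ids in groups.items()}


demo = [FakeResult("V01", "passed"), FakeResult("V02", "skipped"), FakeResult("V03", "passed")]
print(ids_by_status(demo))
# In the notebook: ids_by_status(baseline.results)
```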

## Run Extended Vetting (`preset="extended"`)

This runs the baseline checks plus additional metrics-only diagnostics (V16–V21).

Some extended checks may still return `skipped` depending on which inputs are present (for example, sector-consistency requires host-provided per-sector measurements).
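
The run-or-skip pattern can be illustrated with a small requirement table. This table is entirely hypothetical: the check names and requirements below are made up and are **not** the library's actual mapping for V16–V21. The point is only that each check tests for its inputs and, if they are missing, reports a readable skip reason.

```python
from dataclasses import dataclass, field


@dataclass
class VettingInputs:
    """Hypothetical summary of what the caller supplied."""
    has_tpf: bool = False
    has_stellar: bool = False
    per_sector_depths_ppm: dict[int, float] = field(default_factory=dict)


# Hypothetical requirements: check name -> (predicate, skip reason).
REQUIREMENTS = {
    "sector_consistency": (lambda x: len(x.per_sector_depths_ppm) > 0, "no per-sector measurements provided"),
    "pixel_metric": (lambda x: x.has_tpf, "no TPF stamp provided"),
    "duration_metric": (lambda x: x.has_stellar, "no stellar parameters provided"),
}


def plan_checks(inputs: VettingInputs) -> dict[str, str]:
    """Return 'run' or 'skip: <reason>' for each hypothetical check."""
    plan = {}
    for name, (ok, reason) in REQUIREMENTS.items():
        plan[name] = "run" if ok(inputs) else f"skip: {reason}"
    return plan


print(plan_checks(VettingInputs(has_tpf=True, has_stellar=True)))
```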

In [None]:
extended = vet_candidate(
    lc,
    candidate,
    stellar=stellar,
    tpf=tpf_stamp,
    network=False,
    tic_id=TIC_ID,
    preset="extended",
)

print(
    f"checks={len(extended.results)} passed={extended.n_passed} failed={extended.n_failed} skipped={extended.n_unknown}"
)
print([r.id for r in extended.results])

## Compare Baseline vs Extended Results

This comparison is intentionally metrics-first:

- which checks are new
- which ones ran vs. skipped
- what new `details` keys appear

It does *not* introduce new hard pass/fail thresholds beyond what each check already reports.

In [None]:
def by_id(bundle):
    return {r.id: r for r in bundle.results}


b0 = by_id(baseline)
b1 = by_id(extended)

new_ids = sorted(set(b1.keys()) - set(b0.keys()))
shared_ids = sorted(set(b1.keys()) & set(b0.keys()))

print("New check IDs in extended:", new_ids)

print("\nExtended-only summary:")
for cid in new_ids:
    r = b1[cid]
    detail_keys = sorted((r.details or {}).keys())
    print(
        f"- {cid}: status={r.status} confidence={r.confidence} flags={r.flags} detail_keys={detail_keys[:12]}"
    )

print("\nBaseline checks that changed status/confidence between presets:")
for cid in shared_ids:
    r0, r1 = b0[cid], b1[cid]
    if (r0.status != r1.status) or (r0.confidence != r1.confidence):
        print(f"- {cid}: {r0.status}/{r0.confidence} -> {r1.status}/{r1.confidence}")
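
The loop above compares only `status` and `confidence`. A small dictionary diff shows whether a shared check's `details` gained, lost, or changed keys between presets. In the stdlib sketch below, `diff_details` is a hypothetical helper and the demo dictionaries are made up. Values are compared with `==`, falling back to `repr` for array-like values, so two NaNs are reported as "changed".

```python
from typing import Any, Optional


def _same(a: Any, b: Any) -> bool:
    """Equality that tolerates values (e.g. NumPy arrays) whose == is elementwise."""
    try:
        return bool(a == b)
    except (TypeError, ValueError):
        # bool() of an elementwise result is ambiguous; fall back to repr.
        return repr(a) == repr(b)


def diff_details(d0: Optional[dict], d1: Optional[dict]) -> dict[str, list[str]]:
    """Compare two `details` dicts: keys added, removed, and with changed values."""
    d0, d1 = d0 or {}, d1 or {}
    shared = d0.keys() & d1.keys()
    return {
        "added": sorted(d1.keys() - d0.keys()),
        "removed": sorted(d0.keys() - d1.keys()),
        "changed": sorted(k for k in shared if not _same(d0[k], d1[k])),
    }


print(diff_details({"depth_ppm": 253.0, "n_transits": 6}, {"depth_ppm": 251.0, "n_transits": 6, "snr": 9.1}))
# In the notebook: {cid: diff_details(b0[cid].details, b1[cid].details) for cid in shared_ids}
```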

## Deep Dive: Inspect One Extended Check

Pick one of V16–V21 that ran and inspect its raw metrics.

If the check was skipped, look in `flags` and `details` for the reason.

In [None]:
check_id = "V16"  # try V17/V18/V19/V20/V21
r = by_id(extended).get(check_id)
if r is None:
    raise ValueError(f"Missing {check_id} in extended bundle")

print(f"{r.id}: {r.name}")
print("status:", r.status)
print("confidence:", r.confidence)
print("flags:", r.flags)
print("details:")
for k, v in (r.details or {}).items():
    print(f"  {k}: {v}")
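
Some `details` values may be nested dicts or lists, which the flat `print` loop above renders on a single line. The sketch below pretty-prints them with `json.dumps`, falling back to `str()` for anything JSON cannot encode, such as NumPy integer scalars. The `format_details` helper is hypothetical and the sample dict is made up.

```python
import json
from typing import Any, Mapping, Optional


def format_details(details: Optional[Mapping[str, Any]]) -> str:
    """Indented, key-sorted rendering of a check's `details` mapping."""
    return json.dumps(dict(details or {}), indent=2, sort_keys=True, default=str)


sample = {"per_epoch": [{"epoch": 0, "depth_ppm": 250.1}], "note": "illustrative"}
print(format_details(sample))
# In the notebook: print(format_details(r.details))
```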

## Next: Utility Questions to Answer

Suggested, concrete follow-ups for deciding whether V16–V21 are worth running by default (without adding subjective thresholds):

- Do these checks surface *new metrics* that correlate with known false positives or systematics?
- Across a set of candidates, do extended metrics have stable behavior vs. sector selection / detrending?
- When a check is skipped, is the reason actionable (missing inputs) or confusing (API friction)?
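
For the stability question, one simple measure is to recompute a metric on different sector subsets and report its spread. The sketch below enumerates the subsets and summarizes the values with the median and a scaled MAD. Producing each subset's value is left to the caller, for example by re-running `vet_candidate` on a light curve stitched from those sectors. The helper names and the demo values are hypothetical.

```python
import statistics
from itertools import combinations


def sector_subsets(sectors: list[int], min_size: int = 2) -> list[tuple[int, ...]]:
    """All sector combinations with at least `min_size` sectors."""
    return [c for k in range(min_size, len(sectors) + 1) for c in combinations(sectors, k)]


def robust_spread(values: list[float]) -> tuple[float, float]:
    """(median, 1.4826 * MAD) of a metric across subsets."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    return med, 1.4826 * mad


subsets = sector_subsets([55, 75, 82, 83])
print(len(subsets))  # 11 subsets of 2, 3 or 4 sectors
print(robust_spread([1.0, 1.2, 0.9, 1.1, 1.0]))  # made-up metric values
```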

A natural next step is to run this notebook on a small cohort of TOIs and produce a compact table of V16–V21 metrics for downstream modeling.
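
One way to build such a table is to flatten each check's scalar `details` entries into columns keyed `<check_id>.<key>`, with one row per TOI. The stdlib sketch below assumes each result exposes `id` and `details`, as in the cells above. Non-scalar values are skipped, and so are NumPy integer scalars, because they are not `int` subclasses. The TOI labels and metric names in the demo are hypothetical.

```python
import csv
import io
from typing import Any, Iterable, Mapping, TextIO


def flatten_metrics(check_details: Mapping[str, Mapping[str, Any]]) -> dict[str, Any]:
    """{check_id: details} -> {"check_id.key": scalar value}, skipping non-scalars."""
    row = {}
    for cid, details in check_details.items():
        for key, value in (details or {}).items():
            if isinstance(value, (int, float, str, bool)) or value is None:
                row[f"{cid}.{key}"] = value
    return row


def write_table(rows: Iterable[dict[str, Any]], out: TextIO) -> None:
    """Write rows as CSV; columns missing from a row are left empty."""
    rows = list(rows)
    columns = sorted({k for r in rows for k in r})
    writer = csv.DictWriter(out, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(rows)


rows = [
    {"toi": "toi_a", **flatten_metrics({"V16": {"metric_a": 0.12, "series": [1, 2, 3]}})},
    {"toi": "toi_b", **flatten_metrics({"V16": {"metric_a": 0.30}, "V17": {"metric_b": 4}})},
]
buf = io.StringIO()
write_table(rows, buf)
print(buf.getvalue())
# In the notebook: flatten_metrics({r.id: r.details for r in extended.results if r.id in new_ids})
```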