# V22: TRICERATOPS(+) false positive probability (FPP)

Runs TRICERATOPS(+) to estimate how likely the observed transit signal is to come from a planet on the target rather than from a false-positive scenario such as an eclipsing binary.

Notes:
- This step uses the **gated sectors (82+83)** and a **deterministic detrend** configuration selected in `35_gated_82_83_detrend_param_grid.ipynb`.
- The TRICERATOPS engine prints a lot of diagnostic text; we capture it to log files under `persistent_cache/tutorial_toi-5807-incremental/36_v22_triceratops_fpp/`.
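
For readers who want intuition for what a "transit-masked bin-median" detrend does, here is a minimal sketch. It is not the implementation of `sh.detrend_transit_masked_bin_median`, whose internals are not shown in this tutorial. The function name `detrend_bin_median_sketch`, its array-based signature, and the synthetic light curve are assumptions made for illustration; only the three parameter names and their values mirror the configuration used below.

```python
import numpy as np


def detrend_bin_median_sketch(time, flux, *, period, t0, duration_hours,
                              bin_hours=6.0, duration_buffer_factor=2.0,
                              sigma_clip=5.0):
    """Illustrative transit-masked bin-median detrend (hypothetical helper)."""
    time = np.asarray(time, dtype=float)
    flux = np.asarray(flux, dtype=float)

    # Mask a window around each transit, widened by the buffer factor.
    half_window_days = 0.5 * duration_buffer_factor * duration_hours / 24.0
    phase_days = (time - t0 + 0.5 * period) % period - 0.5 * period
    in_transit = np.abs(phase_days) < half_window_days

    # Median of the out-of-transit flux in fixed-width time bins.
    bin_index = np.floor((time - time.min()) / (bin_hours / 24.0)).astype(int)
    centers, medians = [], []
    for b in np.unique(bin_index):
        sel = (bin_index == b) & ~in_transit
        if sel.sum() >= 3:  # skip bins that are (almost) fully masked
            centers.append(np.median(time[sel]))
            medians.append(np.median(flux[sel]))

    # Interpolate the binned trend onto every cadence and divide it out.
    trend = np.interp(time, centers, medians)
    detrended = flux / trend

    # Clip out-of-transit outliers using a robust (MAD-based) scatter estimate.
    oot = detrended[~in_transit]
    center = np.median(oot)
    sigma = 1.4826 * np.median(np.abs(oot - center))
    keep = in_transit | (np.abs(detrended - center) <= sigma_clip * sigma)
    return time[keep], detrended[keep]


# Synthetic example (made-up signal): slow sinusoidal trend plus a 500 ppm box transit.
rng = np.random.default_rng(0)
t = np.arange(0.0, 20.0, 10.0 / 1440.0)
period, t0, duration_hours = 3.0, 1.5, 3.0
f = 1.0 + 1e-3 * np.sin(2 * np.pi * t / 5.0) + rng.normal(0.0, 1e-4, t.size)
ph = (t - t0 + 0.5 * period) % period - 0.5 * period
f[np.abs(ph) < 0.5 * duration_hours / 24.0] -= 500e-6

t_det, f_det = detrend_bin_median_sketch(
    t, f, period=period, t0=t0, duration_hours=duration_hours
)
```

Because the trend is built only from out-of-transit points and every step is a median or an interpolation, the result has no random component, which is one plausible reading of "deterministic" here. Masking the transit before estimating the trend is what keeps the detrend from absorbing part of the transit depth.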


In [None]:
from __future__ import annotations

import contextlib
import json
import sys
from pathlib import Path

tutorial_dir = Path('docs/tutorials/tutorial_toi-5807-incremental').resolve()
sys.path.insert(0, str(tutorial_dir))

import toi5807_shared as sh

btv = sh.btv

step_id = '36_v22_triceratops_fpp'
run_out_dir, docs_out_dir = sh.artifact_dirs(step_id=step_id)

sectors = [82, 83]

# Detrend config chosen for stability (see 35_gated_82_83_detrend_param_grid.ipynb)
bin_hours = 6.0
duration_buffer_factor = 2.0
sigma_clip = 5.0

# Use a step-specific cache directory so we can safely cache detrended light curves
# under the default flux_type='pdcsap' expected by the API.
cache_dir = run_out_dir / 'cache_pdcsap_detrended'

base = sh.load_dataset()
detrended_by_sector: dict[int, btv.LightCurve] = {}
for sector in sectors:
    detrended_by_sector[int(sector)] = sh.detrend_transit_masked_bin_median(
        base.lc_by_sector[int(sector)],
        bin_hours=bin_hours,
        duration_buffer_factor=duration_buffer_factor,
        sigma_clip=sigma_clip,
    )

from bittr_tess_vetter.api.datasets import LocalDataset

ds_det = LocalDataset(
    schema_version=base.schema_version,
    root=base.root,
    lc_by_sector=detrended_by_sector,
    tpf_by_sector={k: v for k, v in base.tpf_by_sector.items() if int(k) in sectors},
    artifacts=base.artifacts,
)

# Cache per-sector light curves for TRICERATOPS
cache = btv.hydrate_cache_from_dataset(
    dataset=ds_det,
    tic_id=sh.TIC_ID,
    flux_type='pdcsap',
    cache_dir=cache_dir,
    sectors=sectors,
)

# Depth used by TRICERATOPS is conditional on this preprocessing choice.
stitched_det = sh.stitch_pdcsap_sectors(ds_det, sectors)
depth_ppm, depth_err_ppm = sh.estimate_depth_ppm(stitched_det)

stdout_path = run_out_dir / 'triceratops_stdout.log'
stderr_path = run_out_dir / 'triceratops_stderr.log'

with (
    stdout_path.open('w', encoding='utf-8') as out,
    stderr_path.open('w', encoding='utf-8') as err,
    contextlib.redirect_stdout(out),
    contextlib.redirect_stderr(err),
):
    result = btv.calculate_fpp(
        cache=cache,
        tic_id=sh.TIC_ID,
        period=sh.PERIOD_DAYS,
        t0=sh.T0_BTJD,
        depth_ppm=depth_ppm,
        duration_hours=sh.DURATION_HOURS,
        sectors=sectors,
        stellar_radius=sh.STELLAR.radius,
        stellar_mass=sh.STELLAR.mass,
        preset='fast',
        replicates=3,
        seed=42,
    )

payload = {
    'cache_dir': str(cache_dir),
    'cached_flux_type': 'pdcsap',
    'sectors': sectors,
    'detrend': {
        'bin_hours': bin_hours,
        'duration_buffer_factor': duration_buffer_factor,
        'sigma_clip': sigma_clip,
    },
    'depth_ppm_used': float(depth_ppm),
    'depth_err_ppm_est': float(depth_err_ppm),
    'fpp_result': result,
    'logs': {
        'stdout': str(stdout_path),
        'stderr': str(stderr_path),
    },
}

json_name = 'triceratops_fpp.json'
run_json_path = run_out_dir / json_name
run_json_path.write_text(json.dumps(payload, indent=2, sort_keys=True))
docs_json_path = None
if docs_out_dir is not None:
    docs_json_path = docs_out_dir / json_name
    docs_json_path.write_text(json.dumps(payload, indent=2, sort_keys=True))

summary = {
    'depth_ppm_used': round(float(depth_ppm), 3),
    'depth_err_ppm_est': round(float(depth_err_ppm), 3),
    'fpp': result.get('fpp'),
    'nfpp': result.get('nfpp'),
    'disposition': result.get('disposition'),
    'n_nearby_sources': result.get('n_nearby_sources'),
    'gaia_query_complete': result.get('gaia_query_complete'),
    'aperture_modeled': result.get('aperture_modeled'),
    'sectors_used': result.get('sectors_used'),
    'replicates': result.get('replicates'),
}

print(json.dumps(summary, indent=2, sort_keys=True))


<details>
<summary><b>Expected Output</b></summary>

```text
{
  "aperture_modeled": true,
  "depth_err_ppm_est": 11.135,
  "depth_ppm_used": 210.635,
  "disposition": "VALIDATED",
  "fpp": 0.00262,
  "gaia_query_complete": true,
  "n_nearby_sources": 347,
  "nfpp": 0.0,
  "replicates": 3,
  "sectors_used": [
    82,
    83
  ]
}
```

</details>
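
The `depth_ppm_used` and `depth_err_ppm_est` values above come from `sh.estimate_depth_ppm`, whose internals are not shown here. As a rough sketch of how such a pair of numbers could be obtained, the example below compares the mean in-transit flux with the out-of-transit median of a phase-folded, normalized light curve. The function `estimate_depth_ppm_sketch`, its signature, the box-shaped transit assumption, and the synthetic data are all hypothetical and are not the tutorial helper's method.

```python
import numpy as np


def estimate_depth_ppm_sketch(time, flux, *, period, t0, duration_hours):
    """Box-model depth estimate in ppm (hypothetical helper, for illustration)."""
    time = np.asarray(time, dtype=float)
    flux = np.asarray(flux, dtype=float)

    # Phase in days, centered on mid-transit.
    phase_days = (time - t0 + 0.5 * period) % period - 0.5 * period
    half_duration_days = 0.5 * duration_hours / 24.0

    in_transit = np.abs(phase_days) < half_duration_days
    # Baseline from points at least one full duration away from mid-transit.
    out_of_transit = np.abs(phase_days) > 2.0 * half_duration_days

    baseline = np.median(flux[out_of_transit])
    depth = baseline - np.mean(flux[in_transit])

    # Robust per-point scatter, scaled down by the number of in-transit points.
    scatter = 1.4826 * np.median(np.abs(flux[out_of_transit] - baseline))
    depth_err = scatter / np.sqrt(in_transit.sum())
    return depth * 1e6, depth_err * 1e6


# Synthetic example (made-up signal): flat light curve with a 300 ppm box transit.
rng = np.random.default_rng(1)
t = np.arange(0.0, 20.0, 10.0 / 1440.0)
period, t0, duration_hours = 3.0, 1.5, 3.0
f = 1.0 + rng.normal(0.0, 1e-4, t.size)
ph = (t - t0 + 0.5 * period) % period - 0.5 * period
f[np.abs(ph) < 0.5 * duration_hours / 24.0] -= 300e-6

depth_est_ppm, depth_est_err_ppm = estimate_depth_ppm_sketch(
    t, f, period=period, t0=t0, duration_hours=duration_hours
)
print(f"depth = {depth_est_ppm:.1f} +/- {depth_est_err_ppm:.1f} ppm")
```

Whatever estimator is used, the depth passed to TRICERATOPS inherits the preprocessing applied before it, which is why the payload stores the detrend configuration alongside the depth.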


In [None]:
from pathlib import Path
import json

import matplotlib.pyplot as plt
import numpy as np

step_id = '36_v22_triceratops_fpp'
fname = 'V22_triceratops_fpp.png'

run_out_dir, docs_out_dir = sh.artifact_dirs(step_id=step_id)
run_path = run_out_dir / fname
docs_path = (docs_out_dir / fname) if docs_out_dir is not None else None

probs = {
    'planet': result.get('prob_planet'),
    'EB': result.get('prob_eb'),
    'BEB': result.get('prob_beb'),
    'NEB': result.get('prob_neb'),
    'NTP': result.get('prob_ntp'),
}
labels = list(probs.keys())
vals = [float(probs[k]) if probs[k] is not None else np.nan for k in labels]

fig, ax = plt.subplots(figsize=(7.5, 3.8), dpi=150)
ax.bar(labels, vals, color=['#2ca02c'] + ['#d62728'] * 4)
ax.set_ylim(0, 1)
ax.set_ylabel('Probability')
ax.set_title('V22: TRICERATOPS(+ ) scenario probabilities (fast preset)')
for i, v in enumerate(vals):
    if np.isfinite(v):
        ax.text(i, min(0.98, v + 0.03), f"{v:.3f}", ha='center', va='bottom', fontsize=9)

fpp = result.get('fpp')
nfpp = result.get('nfpp')
ax.text(
    0.98,
    0.98,
    f"FPP={fpp:.3g}\nNFPP={nfpp:.3g}" if fpp is not None and nfpp is not None else 'FPP/NFPP unavailable',
    ha='right',
    va='top',
    transform=ax.transAxes,
    bbox={'boxstyle': 'round,pad=0.25', 'fc': 'white', 'ec': '0.7'},
)
fig.tight_layout()
fig.savefig(run_path)
if docs_path is not None:
    fig.savefig(docs_path)
plt.close(fig)

print(
    json.dumps(
        {
            'run_plot_path': str(run_path),
            'docs_plot_path': str(docs_path) if docs_path is not None else None,
            'run_json_path': str(run_out_dir / 'triceratops_fpp.json'),
            'docs_json_path': str(docs_out_dir / 'triceratops_fpp.json') if docs_out_dir is not None else None,
        },
        indent=2,
        sort_keys=True,
    )
)


<details>
<summary><b>Expected Output (plot cell)</b></summary>

```text
{
  "docs_json_path": "docs/tutorials/artifacts/tutorial_toi-5807-incremental/36_v22_triceratops_fpp/triceratops_fpp.json",
  "docs_plot_path": "docs/tutorials/artifacts/tutorial_toi-5807-incremental/36_v22_triceratops_fpp/V22_triceratops_fpp.png",
  "run_json_path": "persistent_cache/tutorial_toi-5807-incremental/36_v22_triceratops_fpp/triceratops_fpp.json",
  "run_plot_path": "persistent_cache/tutorial_toi-5807-incremental/36_v22_triceratops_fpp/V22_triceratops_fpp.png"
}
```

</details>


## Plot

<img src="../artifacts/tutorial_toi-5807-incremental/36_v22_triceratops_fpp/V22_triceratops_fpp.png" width="820" />
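
The bars in the plot relate to the headline numbers in a simple way. The sketch below assumes the five scenario probabilities partition the hypothesis space, so that FPP is everything except the planet bar, and it assumes NFPP collects the scenarios that place the signal on a nearby star (here NEB and NTP). The helper `summarize_scenarios` and the example probabilities are hypothetical and are not output from this run; the values that `btv.calculate_fpp` reports may be computed from a finer scenario breakdown.

```python
def summarize_scenarios(probs):
    """Return (fpp, nfpp) from a dict of scenario probabilities (illustrative)."""
    total = sum(probs.values())
    if total <= 0:
        raise ValueError("scenario probabilities must sum to a positive value")

    # Renormalize in case the inputs do not sum exactly to 1.
    normalized = {name: value / total for name, value in probs.items()}

    # Assumption: FPP is all probability mass outside the planet scenario.
    fpp = 1.0 - normalized["planet"]
    # Assumption: NFPP is the mass in scenarios hosted by a nearby star.
    nfpp = normalized["NEB"] + normalized["NTP"]
    return fpp, nfpp


# Placeholder probabilities chosen for illustration only.
example_probs = {"planet": 0.95, "EB": 0.03, "BEB": 0.01, "NEB": 0.006, "NTP": 0.004}
fpp_example, nfpp_example = summarize_scenarios(example_probs)
print(f"FPP={fpp_example:.3g}, NFPP={nfpp_example:.3g}")
```

A quick consistency check of this kind can catch a result whose scenario bars and reported `fpp` disagree, which would suggest that some scenarios are missing from the breakdown.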


<details>
<summary><b>Analysis</b></summary>

- **What this checks:** a Bayesian comparison of hypotheses that can explain the observed transit depth (planet on target, EB on target, blended/background EB, nearby EB, nearby transiting planet), using Gaia neighbors and TRILEGAL population priors.
- **Headline outputs:**
  - `FPP` (false positive probability): total probability mass in non-planet scenarios.
  - `NFPP` (nearby false positive probability): probability mass specifically in scenarios where the signal originates on a known nearby star rather than on the target (e.g., `prob_neb`, `prob_ntp`).
  - Scenario probabilities (`prob_planet`, `prob_eb`, `prob_beb`, `prob_neb`, `prob_ntp`) provide the breakdown.
- **Interpretation (this run):** `FPP≈2.62e-3` is below the common `0.01` statistical-validation threshold, and `NFPP=0.0` indicates TRICERATOPS does not favor a nearby-source origin under the assumed aperture model.
- **What this does *not* validate:** TRICERATOPS is primarily a *blend/EB* probabilistic screen; it does not guarantee the signal is astrophysical (systematics can still mimic transits). Cross-check against the light-curve diagnostics (uniqueness, SWEET, sector consistency, localization) remains required.
- **Cautions / failure modes to keep in mind:**
  - **Crowding:** the field is crowded (`n_nearby_sources=347`). In crowded regions, small changes in aperture assumptions and centroid/localization evidence matter more.
  - **Depth dependence:** the FPP result is **conditional on the depth used** (here ~210.6 ppm for the gated+detrended baseline). If depth moves substantially under reasonable preprocessing, re-run TRICERATOPS and treat the FPP conclusion as conditional until depth is stabilized.
  - **Catalog completeness:** some neighbors may lack stellar parameters; TRICERATOPS may assume Solar-like defaults for those stars. If you need to audit that sensitivity, review the captured logs.
  - **Stochastic variance:** a fixed seed makes a run reproducible, but the estimate still carries Monte Carlo sampling variance and can shift with a different seed or number of draws; if you see borderline values near thresholds, increase draws or run additional replicates.
- **Practical next step:** incorporate any available high-resolution imaging constraints (contrast curve) and re-run (V23), then check stability across reasonable preprocessing (V24).

</details>
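
The two stability cautions in the analysis (depth dependence and stochastic variance) can be turned into quick numeric checks. The sketch below is one possible way to do that, under stated assumptions: the helper names, the 3-sigma margin, and the replicate FPP lists are hypothetical, and this notebook does not confirm that `btv.calculate_fpp` exposes per-replicate values. The baseline depth and its error are the values reported above; the alternative depth is a made-up stand-in for a different preprocessing choice.

```python
import statistics


def fpp_is_borderline(fpp_values, threshold=0.01, n_sigma=3.0):
    """Flag FPP estimates whose replicate scatter reaches the threshold."""
    mean = statistics.fmean(fpp_values)
    spread = statistics.stdev(fpp_values) if len(fpp_values) > 1 else 0.0
    # Borderline if the threshold lies within n_sigma replicate spreads of the mean.
    return mean, spread, abs(mean - threshold) <= n_sigma * spread


def depth_shift_sigma(depth_a, err_a, depth_b, err_b):
    """Difference between two depth estimates in units of their combined error."""
    return abs(depth_a - depth_b) / (err_a**2 + err_b**2) ** 0.5


# Hypothetical replicate FPP values (not output from this run).
clear_case = [0.0024, 0.0026, 0.0029]
close_case = [0.008, 0.011, 0.013]
mean_clear, spread_clear, borderline_clear = fpp_is_borderline(clear_case)
mean_close, spread_close, borderline_close = fpp_is_borderline(close_case)

# Baseline depth from this run versus a made-up alternative preprocessing result.
shift = depth_shift_sigma(210.635, 11.135, 195.0, 12.0)

print(f"clear case borderline: {borderline_clear}")
print(f"close case borderline: {borderline_close}")
print(f"depth shift: {shift:.2f} sigma")
```

A borderline flag is a prompt to add replicates or draws before quoting a disposition, and a depth shift of several sigma is a prompt to re-run TRICERATOPS with the alternative depth rather than rely on the baseline FPP.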
