Skip to content

Auto-stamp bcfp baseline in run_provincial_parity.R #121

@NewGraphEnvironment

Description

@NewGraphEnvironment

Problem

data-raw/logs/bcfp_baselines.csv records the bcfp comparison baseline (model_run_id + SHA) each link comparison was computed against. The CSV's existing rows are compare-flavoured: (link_schema, bcfp_model_run_id) pairs. Currently hand-maintained — the goal is auto-stamping.

Stamping must happen where comparisons actually occur. The trifecta orchestrator (data-raw/trifecta_provincial.sh) dispatches builds; it does not compare. Comparison happens one layer down inside run_provincial_parity.R (per-WSG via compare_bcfishpass_wsg()).

Where bcfp is touched

Script Queries bcfp? Stamp here?
data-raw/trifecta_provincial.sh No (dispatch only) No
data-raw/run_provincial_parity.R Yes — per-WSG via compare_bcfishpass_wsg() Yes — once per invocation
data-raw/compare_bcfishpass_wsg.R Yes — single WSG Sub-call (not stamped per-WSG)
data-raw/compare_rollups.R No — link-vs-link RDS delta No (no bcfp involved)

Proposed Solution

  1. Wire the stamp at the top of run_provincial_parity.R, once per invocation (not per-WSG). The same script handles single-host runs and trifecta-dispatched per-host runs, so single wiring covers both call patterns automatically.
  2. CSV schema migration: add host column. Trifecta runs produce 3 rows (m4 / m1 / cypher) all with the same bcfp_model_run_id but different host context. Backfill existing rows to m4 (they were single-host runs).
  3. Tunnel-tolerance: reuse compare_bcfishpass_wsg()'s existing tunnel where possible; warn-and-continue if the tunnel can't open. Stamp failure must not block a build.
  4. Idempotency: if (host, link_schema, bcfp_model_run_id, run_started_pdt) row already exists, skip — supports resume-safe retries.

Final CSV columns: run_started_pdt, host, run_label, link_schema, bcfp_model_run_id, bcfp_model_version, bcfp_date_completed, notes.

Verification

  • Run trifecta_provincial.sh --config=default_rearbreaks --schema=fresh_default_rearbreaks.
  • Confirm three rows in bcfp_baselines.csv (host = m4, m1, cypher) all with matching bcfp_model_run_id.
  • Confirm [bcfp-baseline] stamped: ... line in each host's run_provincial_parity.R log.
  • Re-running with the same run_started_pdt does not double-stamp.

Out of scope

  • Mid-run bcfp build collision detection (Tuesday rebuilds shifting bcfp build mid-trifecta). Separate enhancement.
  • Stamping on compare_rollups.R (link-vs-link methodology delta, no bcfp involved).

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions