Problem
data-raw/logs/bcfp_baselines.csv records the bcfp comparison baseline (model_run_id + SHA) each link comparison was computed against. The CSV's existing rows are compare-flavoured: (link_schema, bcfp_model_run_id) pairs. Currently hand-maintained — the goal is auto-stamping.
Stamping must happen where comparisons actually occur. The trifecta orchestrator (data-raw/trifecta_provincial.sh) dispatches builds; it does not compare. Comparison happens one layer down inside run_provincial_parity.R (per-WSG via compare_bcfishpass_wsg()).
Where bcfp is touched
| Script |
Queries bcfp? |
Stamp here? |
data-raw/trifecta_provincial.sh |
No (dispatch only) |
No |
data-raw/run_provincial_parity.R |
Yes — per-WSG via compare_bcfishpass_wsg() |
Yes — once per invocation |
data-raw/compare_bcfishpass_wsg.R |
Yes — single WSG |
Sub-call (not stamped per-WSG) |
data-raw/compare_rollups.R |
No — link-vs-link RDS delta |
No (no bcfp involved) |
Proposed Solution
- Wire the stamp at the top of
run_provincial_parity.R, once per invocation (not per-WSG). The same script handles single-host runs and trifecta-dispatched per-host runs, so single wiring covers both call patterns automatically.
- CSV schema migration: add
host column. Trifecta runs produce 3 rows (m4 / m1 / cypher) all with the same bcfp_model_run_id but different host context. Backfill existing rows to m4 (they were single-host runs).
- Tunnel-tolerance: reuse
compare_bcfishpass_wsg()'s existing tunnel where possible; warn-and-continue if the tunnel can't open. Stamp failure must not block a build.
- Idempotency: if
(host, link_schema, bcfp_model_run_id, run_started_pdt) row already exists, skip — supports resume-safe retries.
Final CSV columns: run_started_pdt, host, run_label, link_schema, bcfp_model_run_id, bcfp_model_version, bcfp_date_completed, notes.
Verification
- Run
trifecta_provincial.sh --config=default_rearbreaks --schema=fresh_default_rearbreaks.
- Confirm three rows in
bcfp_baselines.csv (host = m4, m1, cypher) all with matching bcfp_model_run_id.
- Confirm
[bcfp-baseline] stamped: ... line in each host's run_provincial_parity.R log.
- Re-running with the same
run_started_pdt does not double-stamp.
Out of scope
- Mid-run bcfp build collision detection (Tuesday rebuilds shifting bcfp build mid-trifecta). Separate enhancement.
- Stamping on
compare_rollups.R (link-vs-link methodology delta, no bcfp involved).
Problem
data-raw/logs/bcfp_baselines.csvrecords the bcfp comparison baseline (model_run_id+ SHA) each link comparison was computed against. The CSV's existing rows are compare-flavoured:(link_schema, bcfp_model_run_id)pairs. Currently hand-maintained — the goal is auto-stamping.Stamping must happen where comparisons actually occur. The trifecta orchestrator (
data-raw/trifecta_provincial.sh) dispatches builds; it does not compare. Comparison happens one layer down insiderun_provincial_parity.R(per-WSG viacompare_bcfishpass_wsg()).Where bcfp is touched
data-raw/trifecta_provincial.shdata-raw/run_provincial_parity.Rcompare_bcfishpass_wsg()data-raw/compare_bcfishpass_wsg.Rdata-raw/compare_rollups.RProposed Solution
run_provincial_parity.R, once per invocation (not per-WSG). The same script handles single-host runs and trifecta-dispatched per-host runs, so single wiring covers both call patterns automatically.hostcolumn. Trifecta runs produce 3 rows (m4 / m1 / cypher) all with the samebcfp_model_run_idbut different host context. Backfill existing rows tom4(they were single-host runs).compare_bcfishpass_wsg()'s existing tunnel where possible; warn-and-continue if the tunnel can't open. Stamp failure must not block a build.(host, link_schema, bcfp_model_run_id, run_started_pdt)row already exists, skip — supports resume-safe retries.Final CSV columns:
run_started_pdt, host, run_label, link_schema, bcfp_model_run_id, bcfp_model_version, bcfp_date_completed, notes.Verification
trifecta_provincial.sh --config=default_rearbreaks --schema=fresh_default_rearbreaks.bcfp_baselines.csv(host = m4, m1, cypher) all with matchingbcfp_model_run_id.[bcfp-baseline] stamped: ...line in each host'srun_provincial_parity.Rlog.run_started_pdtdoes not double-stamp.Out of scope
compare_rollups.R(link-vs-link methodology delta, no bcfp involved).