PF Gate provides deterministic CI gating and portable debug bundles. Operator details live in
docs/operator_guide.md, and the frozen v1 contracts are in docs/contracts_v1.md.
python -m pip install -e .
pf gate --selftest
pf gate --policy persons_field/ops/policy_packs/pf_gate_r10_gate_score.json \
--out /tmp/pf_gate_bundle \
path/to/trace.jsonl
pf ci gate-and-diff --policy persons_field/ops/policy_packs/pf_gate_r10_gate_score.json \
--trace examples/nightly_regression/trace.jsonl \
--out /tmp/pf_ci_out
Project links:
- Operator guide: docs/operator_guide.md
- Contracts: docs/contracts_v1.md
- CI templates: .github/workflows/pf_gate_ci.yml, ci/gitlab/pf_gate_ci.yml
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Citation: CITATION.cff
This repo is scoped to three validated claims:
- A: Behavioral invariance across trace arches (trajectory parity hashing).
- B: Data quality improvements from richer contract traces (arch x regime metrics).
- C: Failure prediction enabled by contract traces (AUPRC lift + calibration + leakage guards).
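As an illustration of what claim A's trajectory parity hashing could look like, the sketch below hashes each trajectory through a canonical JSON encoding and checks that all arches agree. This is one plausible reading, not the repo's implementation: the function names, the step fields (`step`, `action`), and the arch labels are hypothetical, and the real hashing may select fields or round floats differently.

```python
import hashlib
import json


def trajectory_hash(steps):
    """Hash a trajectory (a list of JSON-serializable step dicts).

    Keys are sorted and whitespace is removed so that two trajectories with
    the same content hash identically regardless of key order.
    """
    canonical = json.dumps(steps, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def check_parity(trajectories_by_arch):
    """Return (ok, hashes): ok is True when every arch yields the same hash."""
    hashes = {arch: trajectory_hash(t) for arch, t in trajectories_by_arch.items()}
    return len(set(hashes.values())) <= 1, hashes


if __name__ == "__main__":
    # Hypothetical trajectories; field names are assumptions for illustration.
    a = [{"step": 0, "action": "noop"}, {"step": 1, "action": "go"}]
    b = [{"action": "noop", "step": 0}, {"action": "go", "step": 1}]
    ok, hashes = check_parity({"arch_a": a, "arch_b": b})
    print("parity ok:", ok)
```

Canonicalizing before hashing keeps the comparison sensitive to behavior (the recorded values) rather than to serialization details such as key order.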
bash scripts/setup_venv.sh
bash scripts/run_tests.sh
bash scripts/run_abc_paper.sh
Docker-first path (minimal artifacts):
docker build -t pf-gate:ci .
docker run --rm -v "$(pwd):/repo" -w /repo pf-gate:ci \
pf gate --policy tests/fixtures/pf_gate_selftest/policy.json \
--artifact-level minimal \
--out /tmp/pf_gate_bundle \
--overwrite \
tests/fixtures/pf_gate_selftest
docker run --rm -v "$(pwd):/repo" -w /repo pf-gate:ci \
pf diff examples/ci/baselines/pf_gate_selftest_standard /tmp/pf_gate_bundle \
--as-gate \
--artifact-level minimal \
--out /tmp/pf_gate_diff \
--overwrite
Exit codes:
- 0: PASS
- 2: WARN
- 3: FAIL
- 4: QUARANTINE
Artifacts land in the --out directory for pf gate and pf diff:
- decision.json, summary.json, receipts.json, review_list.csv, report.html
- diff_summary.json / diff_report.md for diffs
- episodes/ drilldowns for reviewed episodes
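The exit codes listed above can be turned into a CI decision with a small wrapper. The sketch below is a minimal example, not part of the project: `interpret_exit_code`, `should_block`, and `run_gate` are hypothetical helper names, and the choice to let WARN pass while blocking on FAIL, QUARANTINE, and any unlisted code is an assumption about CI policy. Only flags shown earlier in this README (`--policy`, `--out`) are used.

```python
import subprocess

# Exit-code mapping as documented in this README.
EXIT_STATUS = {0: "PASS", 2: "WARN", 3: "FAIL", 4: "QUARANTINE"}


def interpret_exit_code(code):
    """Map an exit code to its documented status name."""
    return EXIT_STATUS.get(code, f"UNKNOWN({code})")


def should_block(code, blocking=frozenset({"FAIL", "QUARANTINE"})):
    """Decide whether a CI job should stop.

    Assumption: WARN is non-blocking, and codes outside the documented set
    (for example a crash) are treated as blocking.
    """
    status = interpret_exit_code(code)
    return status in blocking or status.startswith("UNKNOWN")


def run_gate(policy, trace, out_dir):
    """Run `pf gate` (requires pf on PATH) and return (status, returncode)."""
    proc = subprocess.run(["pf", "gate", "--policy", policy, "--out", out_dir, trace])
    return interpret_exit_code(proc.returncode), proc.returncode


if __name__ == "__main__":
    for code in (0, 2, 3, 4, 1):
        print(code, interpret_exit_code(code), "block" if should_block(code) else "continue")
```

Treating unknown codes as blocking is the conservative choice: a gate that crashed has not produced a PASS.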
- Pilot quickstart: docs/pilot_quickstart.md
- Adapter quickstart: docs/adapters_quickstart.md
- Operator guide: docs/operator_guide.md
python -m pip install persons-field
pf --version
The canonical CLI path is:
python3.13 -m persons_field.cli abc_validate --seeds 0-9 --config configs/abc_paper.yaml --out_dir reports/abc
Each run writes:
reports/abc/<timestamp>/{summary.md,tables/,figures/,provenance.json}
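A quick way to confirm that a run produced the layout above is to locate the newest run directory and check for the expected entries. The sketch below is illustrative only: `latest_run` and `missing_entries` are hypothetical helpers, and it assumes timestamp directory names sort chronologically when sorted as strings (as names like 20260119_184145 do).

```python
from pathlib import Path

# Entries each run is documented to write under reports/abc/<timestamp>/.
EXPECTED = ("summary.md", "tables", "figures", "provenance.json")


def latest_run(root):
    """Return the last run directory under root by name order, or None."""
    root = Path(root)
    if not root.is_dir():
        return None
    runs = sorted(p for p in root.iterdir() if p.is_dir())
    return runs[-1] if runs else None


def missing_entries(run_dir):
    """List the expected entries that are absent from run_dir."""
    run_dir = Path(run_dir)
    return [name for name in EXPECTED if not (run_dir / name).exists()]


if __name__ == "__main__":
    run = latest_run("reports/abc")
    if run is None:
        print("no runs found under reports/abc")
    else:
        print(run, "missing:", missing_entries(run))
```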
- Trust receipt: /private/tmp/Data 2/pf_omniverse_audit/omniverse_audit_summary.json
- Core claim artifacts: /private/tmp/Data 2/pf_omniverse_audit/presentation/warnbreak_mechanism_table_core.md, /private/tmp/Data 2/pf_omniverse_audit/presentation/warnbreak_delta_auprcx.svg
- Appendix/quarantine artifacts: /private/tmp/Data 2/pf_omniverse_audit/presentation/warnbreak_mechanism_table_quarantined.md
- GPU-free artifacts: /private/tmp/Data 2/pf_omniverse_audit/presentation/quarantine_diagnosis.md, /private/tmp/Data 2/pf_omniverse_audit/presentation/quarantine_diagnosis.csv, /private/tmp/Data 2/pf_omniverse_audit/presentation/baseline_vs_model.md, /private/tmp/Data 2/pf_omniverse_audit/presentation/baseline_vs_model.csv, /private/tmp/Data 2/pf_omniverse_audit/presentation/bootstrap_ci_deltas.md, /private/tmp/Data 2/pf_omniverse_audit/presentation/bootstrap_ci_deltas.csv, /private/tmp/Data 2/pf_omniverse_audit/presentation/sensitivity_sweep.md, /private/tmp/Data 2/pf_omniverse_audit/presentation/sensitivity_sweep.csv, /private/tmp/Data 2/pf_omniverse_audit/presentation/bootstrap_model_minus_baseline.md, /private/tmp/Data 2/pf_omniverse_audit/presentation/bootstrap_model_minus_baseline.csv, /private/tmp/Data 2/pf_omniverse_audit/presentation/bootstrap_model_minus_baseline_episode.md, /private/tmp/Data 2/pf_omniverse_audit/presentation/bootstrap_model_minus_baseline_episode.csv, /private/tmp/Data 2/pf_omniverse_audit/presentation/slice_adaptive_policy.md, /private/tmp/Data 2/pf_omniverse_audit/presentation/slice_adaptive_policy.csv, /private/tmp/Data 2/pf_omniverse_audit/presentation/predictions/
- Scaled portability audit: /private/tmp/Data 2/pf_portability_audit_scaled_50/portability_report.json
- Paper-grade ABC v2 bundle: papers/abc_claims_v2/ (verify with bash papers/abc_claims_v2/verify.sh)
- Session handoff: docs/HANDOFF.md (latest freeze + repro commands)
- Latest freeze zip (see FREEZE.md): /tmp/persons_field_abc_claims_v2_freeze_20260128_151521.zip
- Router contract: docs/router_contract.md (also bundled in papers/abc_claims_v2/router_contract.md)
- Claim ledger: CLAIM_LEDGER.md (also bundled in papers/abc_claims_v2/CLAIM_LEDGER.md)
- Generalization probes (dense-positive): reports/rare_failure_generalization_HARD/, reports/rare_failure_generalization_MULTITASK/, reports/rare_failure_generalization_NOISY/
- Rarity-controlled probes (target=0.01): reports/rare_failure_generalization_HARD_rarity/, reports/rare_failure_generalization_MULTITASK_rarity/, reports/rare_failure_generalization_NOISY_rarity/
- Shared-threshold specs (A1 VAL): reports/rare_failure_generalization_HARD_rarity/shared_threshold.json, reports/rare_failure_generalization_MULTITASK_rarity/shared_threshold.json, reports/rare_failure_generalization_NOISY_rarity/shared_threshold.json
- NOISY flip autopsy (power-adjusted rarity): reports/rare_failure_generalization_NOISY_rarity/probe_flip_autopsy.md, reports/rare_failure_generalization_NOISY_rarity/probe_flip_autopsy.csv, reports/rare_failure_generalization_NOISY_rarity/shared_threshold_autopsy.json
- NOISY shared-signal parity + allocation/ranking audits: reports/rare_failure_generalization_NOISY_rarity/shared_signal_parity.md, reports/rare_failure_generalization_NOISY_rarity/noisy_budget_allocation_audit.md, reports/rare_failure_generalization_NOISY_rarity/noisy_ranking_displacement_audit.md, reports/rare_failure_generalization_NOISY_rarity/noisy_ranking_displacement_per_event.csv
- NOISY shared event sets (A1-defined): reports/rare_failure_generalization_NOISY_rarity/shared_event_set.json, reports/rare_failure_generalization_NOISY_rarity/shared_event_set_autopsy.json
- HARD shared-signal parity (target=0.05 diagnostic): reports/rare_failure_generalization_HARD_rarity/shared_signal_parity.md, reports/rare_failure_generalization_HARD_rarity/shared_signal_parity.csv, reports/rare_failure_generalization_HARD_rarity/shared_signal_parity_gate.json
- HARD shared threshold + event set (target=0.05 diagnostic): reports/rare_failure_generalization_HARD_rarity/shared_threshold_autopsy.json, reports/rare_failure_generalization_HARD_rarity/shared_event_set_autopsy.json
- Feature governance map (shared-severity NOISY): reports/feature_governance_map/noisy_shared_severity_governance.md, reports/feature_governance_map/noisy_shared_severity_governance.csv
- Feature governance map (shared-severity NOISY, target=0.05 diagnostic): reports/feature_governance_map/noisy_shared_severity_governance_target0p05.md, reports/feature_governance_map/noisy_shared_severity_governance_target0p05.csv
- NOISY governance case study (canonical shared events): reports/feature_governance_map/noisy_governance_case_study.md, reports/feature_governance_map/noisy_governance_case_study.csv, reports/feature_governance_map/noisy_governance_case_study_per_seed.csv
- NOISY governance case study (target=0.05 diagnostic): reports/feature_governance_map/noisy_governance_case_study_target0p05.md, reports/feature_governance_map/noisy_governance_case_study_target0p05.csv, reports/feature_governance_map/noisy_governance_case_study_target0p05_per_seed.csv
- NOISY governance case study budget sweep (deploy): reports/feature_governance_map/noisy_governance_case_study_budget_sweep.md, reports/feature_governance_map/noisy_governance_case_study_budget_sweep.csv
- NOISY governance case study budget sweep (diagnostic): reports/feature_governance_map/noisy_governance_case_study_target0p05_budget_sweep.md, reports/feature_governance_map/noisy_governance_case_study_target0p05_budget_sweep.csv
- HARD governance case study (target=0.05 diagnostic): reports/feature_governance_map/hard_governance_case_study_target0p05.md, reports/feature_governance_map/hard_governance_case_study_target0p05.csv, reports/feature_governance_map/hard_governance_case_study_target0p05_per_seed.csv
- HARD governance case study budget sweep (diagnostic): reports/feature_governance_map/hard_governance_case_study_target0p05_budget_sweep.md, reports/feature_governance_map/hard_governance_case_study_target0p05_budget_sweep.csv
- Family policy router: reports/feature_governance_map/family_policy_router.md, reports/feature_governance_map/family_policy_router.csv, reports/feature_governance_map/family_policy_router.json
- Family policy router budget sweep: reports/feature_governance_map/family_policy_router_budget_sweep.md, reports/feature_governance_map/family_policy_router_budget_sweep.csv
- Family router summary panel: reports/feature_governance_map/family_router_summary_panel.md, reports/feature_governance_map/family_router_summary_panel.csv
- NOISY power plan (router gate): reports/feature_governance_map/power_plan_noisy.md, reports/feature_governance_map/power_plan_noisy.csv
- HARD power plan (router gate): reports/feature_governance_map/power_plan_hard.md, reports/feature_governance_map/power_plan_hard.csv
- NOISY diagnose mode panel: reports/feature_governance_map/noisy_diagnose_panel.md
- HARD diagnose mode panel: reports/feature_governance_map/hard_diagnose_panel.md
- HARD policy budget sensitivity: reports/feature_governance_map/hard_policy_budget_sensitivity.md, reports/feature_governance_map/hard_policy_budget_sensitivity.csv
- HARD margin-collapse zero-recall audit: reports/feature_governance_map/hard_margin_collapse_zero_recall_audit.md, reports/feature_governance_map/hard_margin_collapse_zero_recall_audit.csv
- Generalization scorecard: reports/rare_failure_generalization_scorecard.md
- Mechanism-matched family definitions: reports/mechanism_tuning/HARD_margin_collapse_tuned/family_definition.md, reports/mechanism_tuning/NOISY_margin_collapse_tuned/family_definition.md
- Mechanism-matched shared event sets: reports/mechanism_matched/HARD_margin_collapse_tuned/shared_event_set.json, reports/mechanism_matched/NOISY_margin_collapse_tuned/shared_event_set.json
- Mechanism-matched shared-signal parity gates: reports/mechanism_matched/HARD_margin_collapse_tuned/shared_signal_parity_gate.json, reports/mechanism_matched/NOISY_margin_collapse_tuned/shared_signal_parity_gate.json
- Mechanism-matched case studies (target=0.05): reports/feature_governance_map/hard_margin_collapse_tuned_case_study_target0p05.md, reports/feature_governance_map/noisy_margin_collapse_tuned_case_study_target0p05.md
- Mechanism-matched power plans: reports/feature_governance_map/power_plan_hard_margin_collapse_tuned.md, reports/feature_governance_map/power_plan_noisy_margin_collapse_tuned.md
- Router extra-family config: reports/feature_governance_map/extra_family_config_mechanism_matched.json
- Shared-signal parity gate: scripts/check_shared_signal_parity_gate.sh (writes reports/rare_failure_generalization_NOISY_rarity/shared_signal_parity_gate.json)
- Shared-signal parity gate (HARD): scripts/check_shared_signal_parity_gate_hard.sh (writes reports/rare_failure_generalization_HARD_rarity/shared_signal_parity_gate.json)
- Readiness failures (spec/taxonomy): claims are conditional on the current readiness policy; signal_ready is diagnostic; claim_eligible_strict (claim_eligible AND signal_ready_ok) is provided for signal-gated eligibility.
- Baseline-sufficient slice: evidence of slice heterogeneity; a slice-adaptive policy (baseline where it dominates, model otherwise) is the right operational outcome and improves interpretability. Model-versus-baseline is a reality check, not the primary claim.
- Portability eval scope: ok_core is the portability gate; ok_eval applies only to metrics computable from the portable payload; keep eval metrics diagnostic unless the payload expands.
- Rare catastrophes (ABC suite): unweighted models remain near base rate under ultra-rare step labels; balanced + calibrated logreg shows consistent lift above PR baseline at k=3 with budgeted and event-level gains (see reports/rare_failure_audits_10k/ and reports/rare_failure_audits_10k_A1/). External validity still pending.
- Generalization probes: dense-positive probes are not rare-failure transfer claims; rarity-controlled probes use shared severity with A1-derived thresholds and report actual prevalence + event-count audits per regime.
- External validity: Omniverse transfer is still pending; portability audits are necessary but not sufficient. Validate on Omniverse tasks without per-scenario hand tuning before broad claims.
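Two of the caveats above reduce to simple rules: strict eligibility is the conjunction of claim_eligible and signal_ready_ok, and the slice-adaptive policy picks the baseline on slices where it dominates and the model elsewhere. The sketch below illustrates both under stated assumptions: the function names and the per-slice score layout are hypothetical, "dominates" is simplified to a single higher-is-better metric, and ties go to the baseline. The repo's actual rule may use confidence intervals or several metrics.

```python
def claim_eligible_strict(claim_eligible, signal_ready_ok):
    """Strict eligibility: claim_eligible AND signal_ready_ok."""
    return bool(claim_eligible) and bool(signal_ready_ok)


def slice_adaptive_policy(slice_scores):
    """Choose 'baseline' or 'model' per slice.

    slice_scores maps a slice name to {"baseline": float, "model": float},
    where a higher score is better (an assumption for this sketch). The
    baseline is kept when it scores at least as well as the model.
    """
    return {
        name: "baseline" if scores["baseline"] >= scores["model"] else "model"
        for name, scores in slice_scores.items()
    }


if __name__ == "__main__":
    # Hypothetical slice names and scores, for illustration only.
    scores = {
        "slice_a": {"baseline": 0.40, "model": 0.55},
        "slice_b": {"baseline": 0.62, "model": 0.60},
    }
    print(slice_adaptive_policy(scores))
    print(claim_eligible_strict(True, False))
```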
See docs/warnbreak_learning.md and docs/learning_readiness.md for gate policy and the
minimal publishable grid. To run the end-to-end WarnBreak learning workflow:
bash scripts/run_warnbreak_learning.sh --a1 <path/to/A1/datasets/learning_clean> --a5 <path/to/A5/datasets/learning_clean>
Outputs land under:
reports/warnbreak_learning/<timestamp>/
- 2026-01-19: "better data => better learning" for WarnBreak (A5 > A1) across 8 slices (logreg+MLP, all train fracs).
- Mechanism: pre-break separability in queue_wcet/queue_len/queue_work_est (hard-only attribution).
- Learning sweep reports: reports/warnbreak_learning/20260119_184145/
- Attribution reports: reports/warnbreak_attribution/20260119_205552/
- Tag: freeze_learning_results_v2_mechanism_2026-01-19
- Footnote: attribution uses numpy fallback logreg (see docs/warnbreak_learning.md).
The demo CLI (entrypoint pf demo) writes:
- report.md
- summary.json
- deployed_policy.json (stable policy artifact: rule_text, thresholds, required_features, selection/metrics splits, derived TEST metrics)
- selection_debug.json (optional, when selection debug info is available)
Smoke-check summary integrity:
python3 scripts/smoke_policy_integrity.py --summary <out_dir>/summary.json
Run the operational gate with IDE-ready outputs and bundles:
OUT="/tmp/pf_operational_gate"
pf demo --out "$OUT" --horizon 10 --budget 0.10 --group system_only --optimize-for utility \
--emit-policy-pack --emit-episodes --emit-alert-events \
--bundle-window 20 --evidence-window 5
This produces:
- policy_pack.json (canonical tier specs, budgets, playbooks, calibration_context)
- gate_events.json / gate_events.jsonl and alert_events.json / alert_events.jsonl
- report_gate_ide.html (offline IDE-style debugging UI)
- bundles/<episode_id>/<event_id>/ with event context, trace slice, repro stub
Schema migration notes:
- policy_pack.json schema_version 2.0 adds tiers, budgets, and calibration_context (warn/abort policies remain for back-compat).
- gate_events schema_version pf_gate_event_v2 adds event_id, margin, margin_ratio, consecutive_over, evidence_window, suspected_subsystem, and real bundle_paths.
- alert_events and gate_events are now emitted as .json arrays alongside the legacy .jsonl.
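Because events are emitted both as a .json array and as the legacy .jsonl, a consumer can accept either file. The sketch below is a minimal reader plus a field check, assuming the v2 fields named above appear as top-level keys of each event object; that layout, and the helper names `load_events` and `missing_v2_fields`, are assumptions rather than the documented schema.

```python
import json
from pathlib import Path

# Fields that pf_gate_event_v2 is described as adding.
V2_FIELDS = (
    "event_id", "margin", "margin_ratio", "consecutive_over",
    "evidence_window", "suspected_subsystem", "bundle_paths",
)


def load_events(path):
    """Load events from a .json array or a legacy .jsonl file."""
    path = Path(path)
    text = path.read_text(encoding="utf-8")
    if path.suffix == ".jsonl":
        # One JSON object per non-empty line.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError(f"{path} does not contain a JSON array")
    return data


def missing_v2_fields(event):
    """Return the v2 field names absent from an event (assumes top-level keys)."""
    return [name for name in V2_FIELDS if name not in event]


if __name__ == "__main__":
    import sys

    for event in load_events(sys.argv[1]):
        print(event.get("event_id"), "missing:", missing_v2_fields(event))
```

Reading both formats through one function lets downstream tooling migrate to the .json arrays without breaking on older bundles.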
- docs/README_ABC.md
- docs/architecture_abc.md
- docs/trace_schema.md
- docs/evaluation_abc.md