PF Gate — deterministic CI gate + debug packets for robotics runs. Turns traces into PASS/WARN/FAIL/QUARANTINE decisions with auditable receipts, offline reports, JUnit output, and diff-as-gate support. Designed to sit above existing robotics logging/visualization stacks and make regressions fast to triage.

QPFAI/PF-Gate

The Person's Field (ABC validation)

PF Gate (CI Gate)

PF Gate provides deterministic CI gating and portable debug bundles. Operator details live in docs/operator_guide.md, and the frozen v1 contracts are in docs/contracts_v1.md.

Getting Started (PF Gate CLI)

python -m pip install -e .
pf gate --selftest

pf gate --policy persons_field/ops/policy_packs/pf_gate_r10_gate_score.json \
  --out /tmp/pf_gate_bundle \
  path/to/trace.jsonl

pf ci gate-and-diff --policy persons_field/ops/policy_packs/pf_gate_r10_gate_score.json \
  --trace examples/nightly_regression/trace.jsonl \
  --out /tmp/pf_ci_out

Project links:

  • Operator guide: docs/operator_guide.md
  • Contracts: docs/contracts_v1.md
  • CI templates: .github/workflows/pf_gate_ci.yml, ci/gitlab/pf_gate_ci.yml
  • Contributing: CONTRIBUTING.md
  • License: LICENSE
  • Citation: CITATION.cff

This repo is scoped to three validated claims:

  • A: Behavioral invariance across trace architectures ("arches"), checked by trajectory parity hashing.
  • B: Data quality improvements from richer contract traces (arch x regime metrics).
  • C: Failure prediction enabled by contract traces (AUPRC lift + calibration + leakage guards).
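
Claim A's trajectory parity hashing can be pictured with a minimal sketch. This is not the repository's implementation: the function name `behavior_hash`, the choice of `t` and `action` as the behavior-defining fields, and the rounding tolerance are all assumptions made for illustration. The idea is that two traces recorded under different arches should hash identically once the arch-specific extra fields are ignored.

```python
import hashlib
import json


def behavior_hash(steps, behavior_keys=("t", "action"), decimals=6):
    """Hash only the behavior-defining fields of a trajectory.

    `behavior_keys` and `decimals` are hypothetical defaults; a real trace
    schema would define which fields count as behavior.
    """
    canonical = []
    for step in steps:
        row = {}
        for key in behavior_keys:
            value = step[key]
            # Round floats so tiny numeric noise does not change the hash.
            row[key] = round(value, decimals) if isinstance(value, float) else value
        canonical.append(row)
    # Canonical JSON: sorted keys, no whitespace, so equal data gives equal bytes.
    payload = json.dumps(canonical, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def trajectories_match(trace_a, trace_b):
    """True when both traces describe the same behavior under the hash above."""
    return behavior_hash(trace_a) == behavior_hash(trace_b)
```

A richer trace that adds a field such as `queue_len` to every step would still match a leaner trace of the same run, because only the behavior keys enter the hash.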

Quickstart

bash scripts/setup_venv.sh
bash scripts/run_tests.sh
bash scripts/run_abc_paper.sh

CI Quickstart

Docker-first path (minimal artifacts):

docker build -t pf-gate:ci .
docker run --rm -v "$(pwd):/repo" -w /repo pf-gate:ci \
  pf gate --policy tests/fixtures/pf_gate_selftest/policy.json \
    --artifact-level minimal \
    --out /tmp/pf_gate_bundle \
    --overwrite \
    tests/fixtures/pf_gate_selftest

docker run --rm -v "$(pwd):/repo" -w /repo pf-gate:ci \
  pf diff examples/ci/baselines/pf_gate_selftest_standard /tmp/pf_gate_bundle \
    --as-gate \
    --artifact-level minimal \
    --out /tmp/pf_gate_diff \
    --overwrite

Exit codes:

  • 0 PASS
  • 2 WARN
  • 3 FAIL
  • 4 QUARANTINE
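
A CI wrapper script can translate these exit codes back into decisions. The sketch below uses only the four codes listed above and the `--policy` / `--out` flags shown earlier; the helper names and the choice of which decisions block a merge are assumptions, not part of PF Gate.

```python
import subprocess

# Exit codes as documented in this README.
EXIT_CODE_TO_DECISION = {0: "PASS", 2: "WARN", 3: "FAIL", 4: "QUARANTINE"}


def decision_from_exit_code(code):
    """Map a pf exit code to its decision; anything else is treated as UNKNOWN."""
    return EXIT_CODE_TO_DECISION.get(code, "UNKNOWN")


def should_block_merge(code, blocking=("FAIL", "QUARANTINE", "UNKNOWN")):
    """Hypothetical team policy: WARN is surfaced but does not block."""
    return decision_from_exit_code(code) in blocking


def run_gate(policy, trace, out_dir):
    """Run `pf gate` and return the decision implied by its exit code."""
    result = subprocess.run(
        ["pf", "gate", "--policy", policy, "--out", out_dir, trace]
    )
    return decision_from_exit_code(result.returncode)
```

Treating unlisted codes (for example a crash) as blocking is a conservative choice; a team could equally decide to retry on them.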

Artifacts land in the --out directory for pf gate and pf diff:

  • decision.json, summary.json, receipts.json, review_list.csv, report.html
  • diff_summary.json / diff_report.md for diffs
  • episodes/ drilldowns for reviewed episodes
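
Before uploading a bundle from CI, it can help to confirm that the expected files were written. The following is a small sketch using the file names listed above; the helper name `missing_artifacts` is hypothetical, and it assumes all five files are expected, which may not hold for every `--artifact-level`.

```python
from pathlib import Path

# File names taken from the artifact list in this README.
GATE_ARTIFACTS = (
    "decision.json",
    "summary.json",
    "receipts.json",
    "review_list.csv",
    "report.html",
)


def missing_artifacts(out_dir, expected=GATE_ARTIFACTS):
    """Return the expected artifact names that are absent from out_dir."""
    root = Path(out_dir)
    return [name for name in expected if not (root / name).is_file()]
```

An empty list means every expected file is present; pass a shorter `expected` tuple if a given artifact level writes fewer files.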

Self-serve onboarding

  • Pilot quickstart: docs/pilot_quickstart.md
  • Adapter quickstart: docs/adapters_quickstart.md
  • Operator guide: docs/operator_guide.md

Install (CLI)

python -m pip install persons-field
pf --version

The canonical CLI invocation for ABC validation is:

python3.13 -m persons_field.cli abc_validate --seeds 0-9 --config configs/abc_paper.yaml --out_dir reports/abc

Outputs

Each run writes:

reports/abc/<timestamp>/{summary.md,tables/,figures/,provenance.json}
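
To pick up the most recent run programmatically, a sketch like the one below may be enough. It assumes that run directory names sort chronologically as plain strings (true for names such as `20260119_184145`, which appear elsewhere in this README); the helper names are hypothetical.

```python
from pathlib import Path

# Entries each run is documented to contain.
RUN_CONTENTS = ("summary.md", "tables", "figures", "provenance.json")


def latest_run(reports_root="reports/abc"):
    """Return the newest run directory, or None if there are no runs yet."""
    runs = sorted(p for p in Path(reports_root).iterdir() if p.is_dir())
    return runs[-1] if runs else None


def is_complete(run_dir):
    """True when the run directory holds every documented output."""
    return all((Path(run_dir) / name).exists() for name in RUN_CONTENTS)
```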

Results (Frozen / Omniverse-prep)

  • Trust receipt: /private/tmp/Data 2/pf_omniverse_audit/omniverse_audit_summary.json
  • Core claim artifacts: /private/tmp/Data 2/pf_omniverse_audit/presentation/warnbreak_mechanism_table_core.md, /private/tmp/Data 2/pf_omniverse_audit/presentation/warnbreak_delta_auprcx.svg
  • Appendix/quarantine artifacts: /private/tmp/Data 2/pf_omniverse_audit/presentation/warnbreak_mechanism_table_quarantined.md
  • GPU-free artifacts (all under /private/tmp/Data 2/pf_omniverse_audit/presentation/): quarantine_diagnosis.{md,csv}, baseline_vs_model.{md,csv}, bootstrap_ci_deltas.{md,csv}, sensitivity_sweep.{md,csv}, bootstrap_model_minus_baseline.{md,csv}, bootstrap_model_minus_baseline_episode.{md,csv}, slice_adaptive_policy.{md,csv}, predictions/
  • Scaled portability audit: /private/tmp/Data 2/pf_portability_audit_scaled_50/portability_report.json
  • Paper-grade ABC v2 bundle: papers/abc_claims_v2/ (verify with bash papers/abc_claims_v2/verify.sh)
  • Session handoff: docs/HANDOFF.md (latest freeze + repro commands)
  • Latest freeze zip (see FREEZE.md): /tmp/persons_field_abc_claims_v2_freeze_20260128_151521.zip
  • Router contract: docs/router_contract.md (also bundled in papers/abc_claims_v2/router_contract.md)
  • Claim ledger: CLAIM_LEDGER.md (also bundled in papers/abc_claims_v2/CLAIM_LEDGER.md)
  • Generalization probes (dense-positive): reports/rare_failure_generalization_HARD/, reports/rare_failure_generalization_MULTITASK/, reports/rare_failure_generalization_NOISY/
  • Rarity-controlled probes (target=0.01): reports/rare_failure_generalization_HARD_rarity/, reports/rare_failure_generalization_MULTITASK_rarity/, reports/rare_failure_generalization_NOISY_rarity/
  • Shared-threshold specs (A1 VAL): reports/rare_failure_generalization_HARD_rarity/shared_threshold.json, reports/rare_failure_generalization_MULTITASK_rarity/shared_threshold.json, reports/rare_failure_generalization_NOISY_rarity/shared_threshold.json
  • NOISY flip autopsy (power-adjusted rarity): reports/rare_failure_generalization_NOISY_rarity/probe_flip_autopsy.md, reports/rare_failure_generalization_NOISY_rarity/probe_flip_autopsy.csv, reports/rare_failure_generalization_NOISY_rarity/shared_threshold_autopsy.json
  • NOISY shared-signal parity + allocation/ranking audits: reports/rare_failure_generalization_NOISY_rarity/shared_signal_parity.md, reports/rare_failure_generalization_NOISY_rarity/noisy_budget_allocation_audit.md, reports/rare_failure_generalization_NOISY_rarity/noisy_ranking_displacement_audit.md, reports/rare_failure_generalization_NOISY_rarity/noisy_ranking_displacement_per_event.csv
  • NOISY shared event sets (A1-defined): reports/rare_failure_generalization_NOISY_rarity/shared_event_set.json, reports/rare_failure_generalization_NOISY_rarity/shared_event_set_autopsy.json
  • HARD shared-signal parity (target=0.05 diagnostic): reports/rare_failure_generalization_HARD_rarity/shared_signal_parity.md, reports/rare_failure_generalization_HARD_rarity/shared_signal_parity.csv, reports/rare_failure_generalization_HARD_rarity/shared_signal_parity_gate.json
  • HARD shared threshold + event set (target=0.05 diagnostic): reports/rare_failure_generalization_HARD_rarity/shared_threshold_autopsy.json, reports/rare_failure_generalization_HARD_rarity/shared_event_set_autopsy.json
  • Feature governance map (shared-severity NOISY): reports/feature_governance_map/noisy_shared_severity_governance.md, reports/feature_governance_map/noisy_shared_severity_governance.csv
  • Feature governance map (shared-severity NOISY, target=0.05 diagnostic): reports/feature_governance_map/noisy_shared_severity_governance_target0p05.md, reports/feature_governance_map/noisy_shared_severity_governance_target0p05.csv
  • NOISY governance case study (canonical shared events): reports/feature_governance_map/noisy_governance_case_study.md, reports/feature_governance_map/noisy_governance_case_study.csv, reports/feature_governance_map/noisy_governance_case_study_per_seed.csv
  • NOISY governance case study (target=0.05 diagnostic): reports/feature_governance_map/noisy_governance_case_study_target0p05.md, reports/feature_governance_map/noisy_governance_case_study_target0p05.csv, reports/feature_governance_map/noisy_governance_case_study_target0p05_per_seed.csv
  • NOISY governance case study budget sweep (deploy): reports/feature_governance_map/noisy_governance_case_study_budget_sweep.md, reports/feature_governance_map/noisy_governance_case_study_budget_sweep.csv
  • NOISY governance case study budget sweep (diagnostic): reports/feature_governance_map/noisy_governance_case_study_target0p05_budget_sweep.md, reports/feature_governance_map/noisy_governance_case_study_target0p05_budget_sweep.csv
  • HARD governance case study (target=0.05 diagnostic): reports/feature_governance_map/hard_governance_case_study_target0p05.md, reports/feature_governance_map/hard_governance_case_study_target0p05.csv, reports/feature_governance_map/hard_governance_case_study_target0p05_per_seed.csv
  • HARD governance case study budget sweep (diagnostic): reports/feature_governance_map/hard_governance_case_study_target0p05_budget_sweep.md, reports/feature_governance_map/hard_governance_case_study_target0p05_budget_sweep.csv
  • Family policy router: reports/feature_governance_map/family_policy_router.md, reports/feature_governance_map/family_policy_router.csv, reports/feature_governance_map/family_policy_router.json
  • Family policy router budget sweep: reports/feature_governance_map/family_policy_router_budget_sweep.md, reports/feature_governance_map/family_policy_router_budget_sweep.csv
  • Family router summary panel: reports/feature_governance_map/family_router_summary_panel.md, reports/feature_governance_map/family_router_summary_panel.csv
  • NOISY power plan (router gate): reports/feature_governance_map/power_plan_noisy.md, reports/feature_governance_map/power_plan_noisy.csv
  • HARD power plan (router gate): reports/feature_governance_map/power_plan_hard.md, reports/feature_governance_map/power_plan_hard.csv
  • NOISY diagnose mode panel: reports/feature_governance_map/noisy_diagnose_panel.md
  • HARD diagnose mode panel: reports/feature_governance_map/hard_diagnose_panel.md
  • HARD policy budget sensitivity: reports/feature_governance_map/hard_policy_budget_sensitivity.md, reports/feature_governance_map/hard_policy_budget_sensitivity.csv
  • HARD margin-collapse zero-recall audit: reports/feature_governance_map/hard_margin_collapse_zero_recall_audit.md, reports/feature_governance_map/hard_margin_collapse_zero_recall_audit.csv
  • Generalization scorecard: reports/rare_failure_generalization_scorecard.md
  • Mechanism-matched family definitions: reports/mechanism_tuning/HARD_margin_collapse_tuned/family_definition.md, reports/mechanism_tuning/NOISY_margin_collapse_tuned/family_definition.md
  • Mechanism-matched shared event sets: reports/mechanism_matched/HARD_margin_collapse_tuned/shared_event_set.json, reports/mechanism_matched/NOISY_margin_collapse_tuned/shared_event_set.json
  • Mechanism-matched shared-signal parity gates: reports/mechanism_matched/HARD_margin_collapse_tuned/shared_signal_parity_gate.json, reports/mechanism_matched/NOISY_margin_collapse_tuned/shared_signal_parity_gate.json
  • Mechanism-matched case studies (target=0.05): reports/feature_governance_map/hard_margin_collapse_tuned_case_study_target0p05.md, reports/feature_governance_map/noisy_margin_collapse_tuned_case_study_target0p05.md
  • Mechanism-matched power plans: reports/feature_governance_map/power_plan_hard_margin_collapse_tuned.md, reports/feature_governance_map/power_plan_noisy_margin_collapse_tuned.md
  • Router extra-family config: reports/feature_governance_map/extra_family_config_mechanism_matched.json
  • Shared-signal parity gate: scripts/check_shared_signal_parity_gate.sh (writes reports/rare_failure_generalization_NOISY_rarity/shared_signal_parity_gate.json)
  • Shared-signal parity gate (HARD): scripts/check_shared_signal_parity_gate_hard.sh (writes reports/rare_failure_generalization_HARD_rarity/shared_signal_parity_gate.json)

Scope and limitations (explicit)

  • Readiness failures (spec/taxonomy): claims are conditional on the current readiness policy; signal_ready is diagnostic; claim_eligible_strict (claim_eligible AND signal_ready_ok) is provided for signal-gated eligibility.
  • Baseline-sufficient slice: slices where the baseline suffices are evidence of slice heterogeneity. The intended operational outcome is a slice-adaptive policy (baseline where it dominates, model otherwise), which also improves interpretability. The model-versus-baseline comparison is a reality check, not the primary claim.
  • Portability eval scope: ok_core is the portability gate; ok_eval applies only to metrics computable from the portable payload; keep eval metrics diagnostic unless the payload expands.
  • Rare catastrophes (ABC suite): unweighted models remain near the base rate under ultra-rare step labels; balanced + calibrated logreg shows consistent lift above the PR baseline at k=3, with budgeted and event-level gains (see reports/rare_failure_audits_10k/ and reports/rare_failure_audits_10k_A1/). External validity is still pending.
  • Generalization probes: dense-positive probes are not rare-failure transfer claims; rarity-controlled probes use shared severity with A1-derived thresholds and report actual prevalence + event-count audits per regime.
  • External validity: Omniverse transfer is still pending; portability audits are necessary but not sufficient. Validate on Omniverse tasks without per-scenario hand tuning before broad claims.
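
Two of the rules above are simple enough to write down directly. The sketch below restates `claim_eligible_strict` as defined in the first bullet, and gives one possible reading of the slice-adaptive policy; the function names, the per-slice score inputs, and the tie-breaking toward the baseline are assumptions, not the repository's code.

```python
def claim_eligible_strict(claim_eligible, signal_ready_ok):
    """claim_eligible AND signal_ready_ok, as stated in the limitations list."""
    return bool(claim_eligible and signal_ready_ok)


def slice_adaptive_choice(baseline_score, model_score):
    """Pick the baseline where it is at least as good, the model otherwise.

    Ties go to the baseline here; that tie rule is an assumption.
    """
    return "baseline" if baseline_score >= model_score else "model"


def build_slice_policy(slice_scores):
    """Map each slice name to 'baseline' or 'model'.

    slice_scores: {slice_name: (baseline_score, model_score)}, higher is better.
    """
    return {
        name: slice_adaptive_choice(baseline, model)
        for name, (baseline, model) in slice_scores.items()
    }
```

With made-up scores, `build_slice_policy({"easy": (0.9, 0.8), "hard": (0.2, 0.6)})` keeps the baseline on the `easy` slice and routes the `hard` slice to the model.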

WarnBreak learning (A1 vs A5)

See docs/warnbreak_learning.md and docs/learning_readiness.md for gate policy and the minimal publishable grid. To run the end-to-end WarnBreak learning workflow:

bash scripts/run_warnbreak_learning.sh --a1 <path/to/A1/datasets/learning_clean> --a5 <path/to/A5/datasets/learning_clean>

Outputs land under:

reports/warnbreak_learning/<timestamp>/

Milestones

  • 2026-01-19: "better data => better learning" for WarnBreak (A5 > A1) across 8 slices (logreg+MLP, all train fracs).
  • Mechanism: pre-break separability in queue_wcet/queue_len/queue_work_est (hard-only attribution).
  • Learning sweep reports: reports/warnbreak_learning/20260119_184145/
  • Attribution reports: reports/warnbreak_attribution/20260119_205552/
  • Tag: freeze_learning_results_v2_mechanism_2026-01-19
  • Footnote: attribution uses numpy fallback logreg (see docs/warnbreak_learning.md).

Demo Policy Artifacts

The demo CLI (entrypoint pf demo) writes:

  • report.md
  • summary.json
  • deployed_policy.json (stable policy artifact: rule_text, thresholds, required_features, selection/metrics splits, derived TEST metrics)
  • selection_debug.json (optional, when selection debug info is available)

Smoke-check summary integrity:

python3 scripts/smoke_policy_integrity.py --summary <out_dir>/summary.json

Operational Gate IDE workflow for robotics teams

Run the operational gate with IDE-ready outputs and bundles:

OUT="/tmp/pf_operational_gate"
pf demo --out "$OUT" --horizon 10 --budget 0.10 --group system_only --optimize-for utility \
  --emit-policy-pack --emit-episodes --emit-alert-events \
  --bundle-window 20 --evidence-window 5

This produces:

  • policy_pack.json (canonical tier specs, budgets, playbooks, calibration_context)
  • gate_events.json/gate_events.jsonl and alert_events.json/alert_events.jsonl
  • report_gate_ide.html (offline IDE-style debugging UI)
  • bundles/<episode_id>/<event_id>/ with event context, trace slice, repro stub
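
The bundle layout above lends itself to a quick inventory. This sketch walks the two directory levels under `bundles/` and returns the (episode_id, event_id) pairs it finds; the helper name `list_bundles` is hypothetical, and nothing is assumed about the files inside each bundle.

```python
from pathlib import Path


def list_bundles(out_dir):
    """Return sorted (episode_id, event_id) pairs found under out_dir/bundles."""
    bundles_root = Path(out_dir) / "bundles"
    if not bundles_root.is_dir():
        return []
    found = []
    for episode_dir in sorted(bundles_root.iterdir()):
        if not episode_dir.is_dir():
            continue
        for event_dir in sorted(episode_dir.iterdir()):
            if event_dir.is_dir():
                found.append((episode_dir.name, event_dir.name))
    return found
```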

Schema migration notes:

  • policy_pack.json schema_version 2.0 adds tiers, budgets, and calibration_context (warn/abort policies remain for back-compat).
  • gate_events schema_version pf_gate_event_v2 adds event_id, margin, margin_ratio, consecutive_over, evidence_window, suspected_subsystem, and real bundle_paths.
  • alert_events and gate_events are now emitted as .json arrays alongside the legacy .jsonl.
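
Tools that consume these event files across versions may want to read whichever format is present. The sketch below prefers the `.json` array and falls back to the legacy `.jsonl`; the helper name `load_events` and the preference order are assumptions, and no event fields are interpreted.

```python
import json
from pathlib import Path


def load_events(out_dir, stem="gate_events"):
    """Load events from <stem>.json if present, else from legacy <stem>.jsonl."""
    root = Path(out_dir)
    array_path = root / f"{stem}.json"
    lines_path = root / f"{stem}.jsonl"
    if array_path.is_file():
        # Newer output: a single JSON array of event objects.
        return json.loads(array_path.read_text(encoding="utf-8"))
    if lines_path.is_file():
        # Legacy output: one JSON object per non-empty line.
        with lines_path.open(encoding="utf-8") as handle:
            return [json.loads(line) for line in handle if line.strip()]
    return []
```

The same function covers alert events by passing `stem="alert_events"`.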

Docs

  • docs/README_ABC.md
  • docs/architecture_abc.md
  • docs/trace_schema.md
  • docs/evaluation_abc.md
