Harden replay evidence safety checks#14
Conversation
There was a problem hiding this comment.
Pull request overview
Hardens the STRIX software replay evidence pipeline by strengthening public-safety redactions and tightening pass/fail envelope behavior, ensuring generated JSON/HTML artifacts remain safe to share.
Changes:
- Canonicalizes scenario paths before applying
public_path()redaction to prevent..traversal leaks. - Treats missing required envelope metrics as an envelope failure (
not_observednow fails the envelope). - HTML-escapes the scenario id in the generated replay page
<title>and ensures a genericarea_coverage_pctmetric is always present.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| scripts/strix_sim_replay.py | Strengthens path redaction, envelope failure rules, and HTML title escaping; adds a generic coverage metric. |
| python/tests/test_strix_sim_replay.py | Adds regression tests for traversal redaction, HTML title escaping, and missing-metric envelope failure. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if normalized.is_absolute(): | ||
| return f"<external>/{normalized.name or '.'}" | ||
| return f"<external>/{path.name or '.'}" |
There was a problem hiding this comment.
public_path() ends with a fallback branch that appears unreachable: candidate.resolve(strict=False) returns an absolute path, so normalized.is_absolute() will always be true and the final return f"<external>/{path.name or '.'}" won’t be hit. Consider simplifying the control flow (or adding a comment explaining why this can still happen) to avoid dead code and make the redaction logic easier to reason about.
| if normalized.is_absolute(): | |
| return f"<external>/{normalized.name or '.'}" | |
| return f"<external>/{path.name or '.'}" | |
| return f"<external>/{normalized.name or '.'}" |
| base_coverage = 100.0 * alive_fraction | ||
| metrics: dict[str, float | int] = { | ||
| "active_agents": active_agents, | ||
| "area_coverage_pct": round(base_coverage, 3), | ||
| "offline_agents": total_agents - active_agents, | ||
| "frame_count": len(frames), | ||
| "mean_energy_remaining_pct": round( |
There was a problem hiding this comment.
Now that area_coverage_pct is part of the base metrics dict, the gps_denied_recon branch later in this function still re-adds the same key via metrics.update(...). Consider removing that duplicate assignment to reduce redundancy and avoid future divergence between the two values.
This follow-up addresses the actionable PR #13 Copilot/Codex review findings after merge.\n\nFixes:\n- canonicalize scenario paths before public path reporting so parent traversal is redacted\n- fail replay envelopes when required metrics are not observed\n- HTML-escape the scenario id in the generated replay title\n- keep a generic area_coverage_pct metric available for public replay contracts\n\nValidation:\n- python -m pytest -q python/tests/test_strix_sim_replay.py -> 8 passed\n- python scripts/verify_public_surface.py -> passed\n- python scripts/strix_test_matrix.py --select smoke --output target/strix-test-reports/smoke.json -> passed\n- pytest -q -> 150 passed\n- cargo test -p strix-optimizer doctrine_profile_accepts_neutral_aliases --lib -> passed\n- cargo check -p strix-python -> passed