Add software-only visual replay harness #13
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8174c75686
```python
failed = [check for check in checks if check["status"] == "failed"]
return {
    "status": "failed" if failed else "passed",
```
Fail envelope when required metrics are missing
The envelope verdict only treats checks with status `"failed"` as failures, so any metric missing from `metrics` is marked `"not_observed"` but still produces an overall `"passed"` result. In practice, a scenario with `pass_envelope` entries that this replay harness does not compute will exit 0 from `main()` and look successful, which can hide real regression gaps in automated evidence runs.
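A minimal sketch of the suggested fix, assuming a hypothetical `evaluate_envelope` helper and an envelope shape of `{metric: {"min": ..., "max": ...}}` (the PR's actual data structures are not shown in this conversation). Metrics listed in `pass_envelope` but absent from `metrics` still get `"not_observed"`, but they now block the overall verdict instead of letting it pass.

```python
from typing import Any


def evaluate_envelope(
    pass_envelope: dict[str, dict[str, float]],
    metrics: dict[str, float],
) -> dict[str, Any]:
    """Hypothetical helper: score each envelope bound; unobserved metrics fail the verdict."""
    checks = []
    for name, bounds in pass_envelope.items():
        value = metrics.get(name)
        if value is None:
            # The harness never computed this metric.
            status = "not_observed"
        elif bounds.get("min", float("-inf")) <= value <= bounds.get("max", float("inf")):
            status = "passed"
        else:
            status = "failed"
        checks.append({"metric": name, "value": value, "status": status})

    # Treat "not_observed" as blocking, exactly like "failed".
    blocking = [c for c in checks if c["status"] in ("failed", "not_observed")]
    return {
        "status": "failed" if blocking else "passed",
        "checks": checks,
    }
```

A caller that maps `"failed"` to a nonzero exit code would then flag a scenario whose envelope references metrics the harness cannot compute. Reporting a distinct `"incomplete"` status is an alternative if missing metrics should be distinguishable from out-of-bounds ones.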
```python
if path.is_relative_to(ROOT):
    return str(path.relative_to(ROOT))
```
Canonicalize scenario paths before public-path checks
Using `path.is_relative_to(ROOT)` on an unnormalized path allows parent-traversal paths like `--scenario ../secret/scenario.yaml` to be treated as in-repo and emitted as `../secret/scenario.yaml` instead of `<external>/...`. This breaks the public-safe path redaction guarantee by leaking parent-directory structure in the replay JSON/HTML outputs.
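One way to canonicalize before the containment check, sketched as a hypothetical `public_path(path, root)` helper rather than the PR's actual function: resolve both paths first so `..` segments are collapsed and cannot masquerade as in-repo. This sketch covers only the POSIX branch; the UNC and drive-letter handling in the original code is omitted.

```python
from pathlib import Path


def public_path(path: Path, root: Path) -> str:
    """Hypothetical helper: repo-relative POSIX path, or a redacted <external>/name."""
    # resolve() makes relative inputs absolute (against the current working
    # directory), collapses ".." segments, and follows symlinks.
    resolved = path.resolve()
    resolved_root = root.resolve()
    if resolved.is_relative_to(resolved_root):
        return resolved.relative_to(resolved_root).as_posix()
    # Anything outside the repository keeps only its final component.
    return f"<external>/{resolved.name or '.'}"
```

Because `Path.resolve()` follows symlinks, an in-repo symlink pointing outside the repository is reported as external, which errs on the side of redaction. `os.path.abspath` would instead collapse `..` lexically without consulting the filesystem.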
Pull request overview
Adds a deterministic, software-only STRIX scenario replay generator to produce public-safe JSON timeline evidence and an optional self-contained HTML canvas visualizer for pre-field behavior inspection.
Changes:
- Introduces `scripts/strix_sim_replay.py` to generate deterministic kinematic replays (JSON + optional HTML).
- Adds pytest coverage for determinism, public-safe path redaction, HTML embedding, and zero-index attrition handling.
- Updates docs and the public smoke test matrix to include replay generation as evidence.
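The determinism property the new tests cover can be illustrated with a self-contained sketch. `build_replay` and `replay_digest` are hypothetical stand-ins, not the script's API, and the constant-velocity motion model is an assumption. The replay draws only from a local `random.Random(seed)` and is serialized with sorted keys and fixed separators, so identical inputs yield byte-identical JSON and therefore identical SHA-256 digests.

```python
import hashlib
import json
import random


def build_replay(seed: int, steps: int, dt: float = 0.1) -> dict:
    """Hypothetical constant-velocity kinematic replay driven by a seeded RNG."""
    rng = random.Random(seed)  # local RNG: no dependence on global random state
    x, y = 0.0, 0.0
    vx, vy = rng.uniform(-1, 1), rng.uniform(-1, 1)
    frames = []
    for step in range(steps):
        # Rounding keeps the serialized floats short and stable.
        frames.append({"t": round(step * dt, 6), "x": round(x, 6), "y": round(y, 6)})
        x += vx * dt
        y += vy * dt
    return {"seed": seed, "frames": frames}


def replay_digest(replay: dict) -> str:
    """Hash a canonical JSON encoding so equal replays give equal digests."""
    canonical = json.dumps(replay, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

A determinism test then reduces to generating the same scenario twice and comparing digests.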
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| sim/scenarios/README.md | Documents generating deterministic replay evidence from public scenarios. |
| scripts/strix_sim_replay.py | Implements the replay generator, metrics/envelope evaluation, and HTML canvas visualizer. |
| python/tests/test_strix_sim_replay.py | Adds tests for deterministic output, path redaction, HTML generation, and attrition handling. |
| demo/README.md | Shows how to generate and view the replay HTML locally. |
| README.md | Introduces “Software-Only Replay” usage at the top-level documentation. |
| Project_Docs/testing/public_test_matrix.json | Adds a smoke test entry to generate replay artifacts under target/. |
| Project_Docs/testing/EVIDENCE_HARNESS.md | Documents replay generation as part of the public evidence harness workflow. |
```python
if path.is_relative_to(ROOT):
    return str(path.relative_to(ROOT))
if path_str.startswith("\\\\") or path_str[:3].replace("\\", "/").endswith(":/"):
    return f"<external>/{PureWindowsPath(path_str).name or '.'}"
if path.is_absolute():
    return f"<external>/{path.name or '.'}"
return path_str
```
```html
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>STRIX Software Replay - {replay['scenario']['id']}</title>
<style>
```
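The template above interpolates the scenario id straight into `<title>`. If scenario ids can ever contain markup characters, escaping them, and neutralizing `</script>` inside any embedded replay JSON, keeps the self-contained page well-formed. A minimal sketch; the `render_replay_html` function, the `replay` element ids, and the JSON `<script>` embedding are hypothetical choices for illustration, not the PR's template.

```python
import html
import json


def render_replay_html(replay: dict) -> str:
    """Hypothetical renderer: escaped title plus replay JSON embedded in a data script."""
    title = html.escape(f"STRIX Software Replay - {replay['scenario']['id']}")
    # "<" only occurs inside JSON strings, where the \u003c escape decodes back
    # to "<", so this prevents an early "</script>" without changing the data.
    payload = json.dumps(replay, sort_keys=True).replace("<", "\\u003c")
    return f"""<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>{title}</title>
</head>
<body>
<canvas id="replay" width="800" height="600"></canvas>
<script id="replay-data" type="application/json">{payload}</script>
</body>
</html>
"""
```

Page scripts can then read the data with `JSON.parse(document.getElementById("replay-data").textContent)`, which keeps the page self-contained without evaluating the payload as code.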
This PR adds a public-safe software-only replay layer for STRIX scenarios.
Included:
Validation: