# Harden deterministic replay harness validation and clock patching #31

## Conversation

### Review Summary by Qodo

Implement deterministic replay harness with fixture recording and clock patching.

**Walkthrough**

**Description**

- Implement deterministic replay harness with fixture recording and validation
  - Recorder captures screenshots and click/swipe events to JSONL manifests
  - Mock device replays fixtures with strict event ordering and coordinate assertions
- Add simulated clock patching for fast-forward deterministic test execution
  - Patches `time.time`, `time.sleep`, and ALAS timer module aliases
  - Gracefully handles environments where ALAS internals are not importable
- Expand test coverage with replay deviation detection and clock-patching validation
- Document harness architecture and usage in testing guide and roadmap

**Diagram**

```mermaid
flowchart LR
    A["Record Scenario"] -->|"Patches device methods"| B["ScenarioRecorder"]
    B -->|"Writes PNG frames + manifest.jsonl"| C["Fixture Directory"]
    C -->|"Loads manifest"| D["MockDevice"]
    D -->|"Validates event sequence"| E["ReplayDeviationError"]
    D -->|"Extracts click coordinates"| F["Area Bounds Check"]
    G["SimulatedClock"] -->|"Patches time.time/sleep"| H["patched_time Context"]
    H -->|"Advances logical clock"| I["Fast-Forward Execution"]
    D -->|"Uses clock"| I
```
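The simulated-clock piece of the diagram can be illustrated with a minimal sketch. Only the `SimulatedClock`/`patched_time` names and the fact that `time.time`/`time.sleep` are patched come from this PR; the internals below are assumptions:

```python
import time
from contextlib import contextmanager
from unittest.mock import patch


class SimulatedClock:
    """Logical clock: time only moves when advance() or set() is called."""

    def __init__(self, start: float = 0.0):
        self._now = start

    def time(self) -> float:
        return self._now

    def advance(self, seconds: float) -> None:
        self._now += seconds

    def set(self, timestamp: float) -> None:
        self._now = timestamp


@contextmanager
def patched_time(clock: SimulatedClock):
    """Route time.time()/time.sleep() through the clock so sleeps
    fast-forward the logical clock instantly instead of blocking."""
    with patch("time.time", clock.time), patch("time.sleep", clock.advance):
        yield clock
```

Under this model, a one-hour `time.sleep(3600)` inside a replayed flow completes immediately while `time.time()` still reports the expected elapsed interval.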
**File Changes**

1. `agent_orchestrator/replay/mock_device.py`
## Code Review by Qodo

1. `record_scenario.py` outside tools
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c269b92d63
| "start_area": [int(p1[0]), int(p1[1]), int(p1[0]), int(p1[1])], | ||
| "end_area": [int(p2[0]), int(p2[1]), int(p2[0]), int(p2[1])], |
**Record swipe bounds with tolerance**

The recorder writes swipe `start_area`/`end_area` as `[x, y, x, y]`, which makes replay require exact coordinates. In real login code paths like `handle_cn_user_agreement` (alas_wrapped/module/handler/login.py:244-246), swipe points are generated with `random_rectangle_point`, so the same behavior can produce different endpoints between record and replay and trigger `ReplayDeviationError` even when nothing regressed. This makes the replay harness flaky for any flow with randomized swipes; the manifest should capture a non-zero area (or tolerance) for swipe endpoints.
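A recorder-side fix could expand each recorded endpoint into a tolerance box before it is written to the manifest. This is a sketch, not the PR's actual change; `point_to_area` and the default tolerance are assumptions:

```python
def point_to_area(point: tuple[int, int], tolerance: int = 20) -> list[int]:
    """Record a point as a tolerance box rather than a zero-area
    rectangle, so randomized swipe endpoints (e.g. those produced by
    random_rectangle_point) still validate on replay."""
    x, y = point
    return [x - tolerance, y - tolerance, x + tolerance, y + tolerance]
```

The recorder would then write `"start_area": point_to_area(p1)` instead of the exact `[x, y, x, y]` rectangle, and the existing `_point_in_area` replay check would accept any endpoint inside the box.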
**Pull request overview**
This PR hardens the deterministic replay harness used for offline state-machine regression checks by tightening replay/manifest validation, improving simulated clock patching, and adding a recorder + unit tests to make fixture capture/replay less brittle.
Changes:
- Added a fixture recorder script that captures screenshot frames and action events into a JSONL manifest.
- Added a mock replay device + simulated clock, with stricter validation for click/swipe ordering and coordinate bounds.
- Added/expanded pytest coverage for fast-forward time patching and replay deviation detection, plus documentation/changelog updates.
**Reviewed changes**
Copilot reviewed 8 out of 9 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| docs/dev/testing.md | Documents the current deterministic replay harness components and purpose. |
| docs/ROADMAP.md | Marks the replay harness scaffold as completed in success criteria. |
| docs/ARCHITECTURE.md | Adds an architecture section describing the replay harness and its components. |
| alas_wrapped/dev_tools/record_scenario.py | New recorder CLI to generate fixture images + manifest.jsonl from live ALAS runs. |
| agent_orchestrator/replay/mock_device.py | New manifest-driven replay device + simulated clock + deviation assertions. |
| agent_orchestrator/replay/time_control.py | New context manager to patch time and ALAS timer aliases to a simulated clock. |
| agent_orchestrator/test_login_replay.py | New tests for replay success, deviation detection, target mismatch, and patched sleep behavior. |
| CHANGELOG.md | Notes the addition of the deterministic replay harness scaffold. |
```python
x, y = self._extract_click_point(target)
x1, y1, x2, y2 = area
```
Manifest schema issues can currently raise generic exceptions instead of `ReplayDeviationError` (e.g., `x1, y1, x2, y2 = area` will throw `ValueError` if `area` isn't length 4). Since this is a validation harness, consider validating the area's shape and types explicitly and raising `ReplayDeviationError` with a clear message for malformed data.
Suggested change:

```suggestion
# Validate area shape and contents to avoid generic unpacking/type errors
if not isinstance(area, (list, tuple)) or len(area) != 4:
    raise ReplayDeviationError(
        f"Click action in manifest has malformed `area`; expected sequence of 4 numbers, got: {area!r}"
    )
try:
    x1, y1, x2, y2 = [float(v) for v in area]
except (TypeError, ValueError):
    raise ReplayDeviationError(
        f"Click action in manifest has non-numeric `area` bounds: {area!r}"
    )
x, y = self._extract_click_point(target)
```
```python
start_area = event["start_area"]
end_area = event["end_area"]
if not self._point_in_area(p1, start_area):
    raise ReplayDeviationError(f"Swipe start out of expected area: {p1} not in {start_area}")
if not self._point_in_area(p2, end_area):
    raise ReplayDeviationError(f"Swipe end out of expected area: {p2} not in {end_area}")

@staticmethod
def _point_in_area(point: tuple[int, int], area: list[int]) -> bool:
```
`start_area = event["start_area"]` / `end_area = event["end_area"]` will raise `KeyError` for malformed manifests, and `_point_in_area` will raise `ValueError` if the area isn't length 4. Consider validating the presence and shape of these fields and raising `ReplayDeviationError` instead so replay failures are consistently reported as deviations.
Suggested change:

```suggestion
start_area = event.get("start_area")
end_area = event.get("end_area")
if start_area is None or end_area is None:
    raise ReplayDeviationError("Swipe action in manifest missing `start_area` or `end_area` bounds")
if not (
    isinstance(start_area, (list, tuple))
    and isinstance(end_area, (list, tuple))
    and len(start_area) == 4
    and len(end_area) == 4
):
    raise ReplayDeviationError(
        f"Swipe action in manifest has invalid area bounds: start_area={start_area}, end_area={end_area}"
    )
if not self._point_in_area(p1, start_area):
    raise ReplayDeviationError(f"Swipe start out of expected area: {p1} not in {start_area}")
if not self._point_in_area(p2, end_area):
    raise ReplayDeviationError(f"Swipe end out of expected area: {p2} not in {end_area}")

@staticmethod
def _point_in_area(point: tuple[int, int], area: list[int] | tuple[int, int, int, int]) -> bool:
    if not isinstance(area, (list, tuple)) or len(area) != 4:
        raise ReplayDeviationError(f"Invalid area bounds: {area!r}")
```
```python
return np.array(image)

def click(self, target: Any) -> None:
    event = self._consume(expected_type="action")
```
Replay clock is only set on `screenshot()` events. Because action events in the manifest also carry `timestamp`, any `time.time()` calls that occur between screenshots during replay will see a stale time value. Consider setting `self.clock` from `event["timestamp"]` for action events as well (click/swipe) to preserve the recorded timeline.
Suggested change:

```suggestion
event = self._consume(expected_type="action")
timestamp = event.get("timestamp")
if timestamp is not None:
    self.clock.set(float(timestamp))
```
```python
actual_target = str(target)
if expected_target and expected_target != actual_target:
```
`expected_target = event.get("target")` is treated as optional (`if expected_target and ...`). If the manifest is missing `target` (or has an empty value), replay will silently skip target validation, which weakens the harness's "strict" semantics. Consider requiring a non-empty `target` for click actions and raising `ReplayDeviationError` when it's missing.
Suggested change:

```suggestion
if not expected_target:
    raise ReplayDeviationError("Click action in manifest missing `target`")
actual_target = str(target)
if expected_target != actual_target:
```
```python
from PIL import Image

from alas import AzurLaneAutoScript
```
**1. `record_scenario.py` outside tools** 📘 Rule violation · ✓ Correctness

`alas_wrapped/dev_tools/record_scenario.py` imports the ALAS runtime (`AzurLaneAutoScript`) but is not located under `alas_wrapped/tools/` as required for ALAS-internal imports. This breaks the directory-based layering contract and can lead to inconsistent coupling/packaging expectations.
Agent Prompt
## Issue description
`alas_wrapped/dev_tools/record_scenario.py` imports ALAS runtime internals but is not placed under `alas_wrapped/tools/`, violating the code placement rule.
## Issue Context
The compliance rule enforces directory-based layering: anything that imports ALAS internals (e.g., `alas`, `module.*`) must live under `alas_wrapped/tools/`.
## Fix Focus Areas
- alas_wrapped/dev_tools/record_scenario.py[10-13]
```python
# ALAS timer module imports time/datetime directly; patch aliases when available.
try:
    stack.enter_context(patch("module.base.timer.time", side_effect=clock.time))
    stack.enter_context(patch("module.base.timer.sleep", side_effect=lambda seconds: clock.advance(seconds)))
    stack.enter_context(patch("module.base.timer.datetime", _TimerDatetime))
except ModuleNotFoundError:
    pass
```
**2. `patched_time` patches `module.base.timer`** 📘 Rule violation · ✓ Correctness

`agent_orchestrator/replay/time_control.py` conditionally patches ALAS internals via `module.base.timer.*`, which relies on importing `module.*` at runtime. Per the layering rule, code that imports or depends on ALAS internals should not live under `agent_orchestrator/`.
Agent Prompt
## Issue description
`agent_orchestrator/replay/time_control.py` depends on ALAS internals (`module.base.timer.*`) via runtime patching, violating the directory-based layering rule.
## Issue Context
Even though the code is guarded by `try/except`, `patch("module.base.timer.*")` requires importing `module.*` when present, which is still an ALAS-internal dependency.
## Fix Focus Areas
- agent_orchestrator/replay/time_control.py[24-30]
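One way to keep the convenience while never importing `module.*` is to patch the timer aliases only when the module is already present in `sys.modules`. This is a sketch, not the PR's fix, and it still references the ALAS module by name (which may or may not satisfy the layering rule); `patch_timer_aliases` is a hypothetical helper, and `clock` is assumed to expose `time()`/`advance(seconds)`:

```python
import sys
import types
from contextlib import ExitStack, contextmanager
from unittest.mock import patch


@contextmanager
def patch_timer_aliases(module_name: str, clock):
    """Patch a timer module's `time`/`sleep` aliases only when that
    module is already loaded, so this code never triggers an import
    of ALAS internals itself."""
    mod = sys.modules.get(module_name)  # lookup only; no import happens
    with ExitStack() as stack:
        if isinstance(mod, types.ModuleType):
            if hasattr(mod, "time"):
                stack.enter_context(patch.object(mod, "time", clock.time))
            if hasattr(mod, "sleep"):
                stack.enter_context(patch.object(mod, "sleep", clock.advance))
        yield
```

Unlike `patch("module.base.timer.time", ...)`, which imports the target module when it exists, `sys.modules.get` is a pure dictionary lookup, so the helper is a no-op in lightweight test runs where ALAS was never imported.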
```python
def __init__(self, scenario_name: str, base_dir: Path | None = None):
    base = base_dir or Path("tests/fixtures")
    self.fixture_dir = base / scenario_name
    self.images_dir = self.fixture_dir / "images"
    self.manifest_path = self.fixture_dir / "manifest.jsonl"
    self.images_dir.mkdir(parents=True, exist_ok=True)
    for old_frame in self.images_dir.glob("*.png"):
        old_frame.unlink()
```
**3. scenario allows path traversal** 📘 Rule violation · ⛨ Security

The recorder constructs `fixture_dir` from CLI-provided inputs and then unlinks `*.png` files without verifying the resolved path stays within the intended fixtures directory. A crafted `scenario` (or `--fixtures-root`) can cause deletion of unintended files/directories.
Agent Prompt
## Issue description
The recorder tool accepts CLI inputs that influence filesystem paths and then deletes files, without validating that the resolved directory is contained within the intended fixtures directory.
## Issue Context
Inputs like `scenario="../somewhere"` or a custom `--fixtures-root` can cause the tool to unlink unrelated `*.png` files.
## Fix Focus Areas
- alas_wrapped/dev_tools/record_scenario.py[16-24]
- alas_wrapped/dev_tools/record_scenario.py[120-134]
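A containment check along these lines could prevent the traversal. `resolve_fixture_dir` is a hypothetical helper name; the default root mirrors the recorder's `tests/fixtures`:

```python
from pathlib import Path


def resolve_fixture_dir(scenario_name: str, fixtures_root: Path = Path("tests/fixtures")) -> Path:
    """Resolve the per-scenario fixture directory, refusing any path
    that escapes the fixtures root (e.g. scenario_name="../somewhere")."""
    root = fixtures_root.resolve()
    candidate = (root / scenario_name).resolve()
    # Reject the root itself and anything not strictly inside it.
    if candidate == root or root not in candidate.parents:
        raise ValueError(
            f"scenario {scenario_name!r} resolves outside fixtures root {root}"
        )
    return candidate
```

Running the containment check before `images_dir.mkdir(...)` and the `*.png` cleanup loop would confine deletions to the fixtures tree regardless of what the CLI passes in.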
```python
script = AzurLaneAutoScript(config_name=args.config)
device = script.device

if not hasattr(device, args.method):
    raise AttributeError(f"Device has no method '{args.method}'")

call = getattr(device, args.method)
with DevicePatchSession(device=device, recorder=recorder):
    result = call()
```
**4. Recorder calls wrong object** 🐞 Bug · ✓ Correctness

The scenario recorder invokes the configured method (default: `handle_app_login`) on `script.device`, but ALAS login entrypoints live on the script/handlers, not `Device`. Using the tool with its documented defaults will raise `AttributeError` and prevent fixture capture.
Agent Prompt
### Issue description
`alas_wrapped/dev_tools/record_scenario.py` currently looks up `--method` (default: `handle_app_login`) on `script.device`, but `handle_app_login` is implemented on handler/script flows rather than the Device. This makes the recorder fail with defaults.
### Issue Context
The script already instantiates `AzurLaneAutoScript`, which exposes `start()`/`restart()` helpers that internally create a `LoginHandler` and run the login flow.
### Fix Focus Areas
- alas_wrapped/dev_tools/record_scenario.py[120-153]
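One low-risk direction is to look the method up on the script before falling back to the device, since the review notes the login flow is implemented on handlers/script flows. This is a sketch only (the review itself suggests going through `start()`/`restart()`, which internally create a `LoginHandler`); `resolve_entrypoint` is a hypothetical helper:

```python
def resolve_entrypoint(script, method_name: str):
    """Look up the recording entrypoint on the script object first
    (where login handlers such as handle_app_login would live),
    falling back to the device for genuine device-level methods."""
    for owner in (script, script.device):
        fn = getattr(owner, method_name, None)
        if callable(fn):
            return fn
    raise AttributeError(
        f"Neither script nor device exposes a callable '{method_name}'"
    )
```

The recorder's `call = getattr(device, args.method)` would become `call = resolve_entrypoint(script, args.method)`, so the documented defaults no longer fail at lookup time.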
```python
def click(self, target: Any) -> None:
    event = self._consume(expected_type="action")
    if event.get("action") != "click":
        raise ReplayDeviationError(f"Expected click action, found {event.get('action')}")

    expected_target = event.get("target")
    actual_target = str(target)
    if expected_target and expected_target != actual_target:
        raise ReplayDeviationError(f"Expected click target {expected_target}, found {actual_target}")

    area = event.get("area")
    if not area:
        raise ReplayDeviationError("Click action in manifest missing `area` bounds")

    x, y = self._extract_click_point(target)
    x1, y1, x2, y2 = area
    if not (x1 <= x <= x2 and y1 <= y <= y2):
        raise ReplayDeviationError(
            f"Click out of expected area: ({x}, {y}) not in [{x1}, {y1}, {x2}, {y2}]"
        )

def swipe(self, p1: tuple[int, int], p2: tuple[int, int]) -> None:
    event = self._consume(expected_type="action")
    if event.get("action") != "swipe":
        raise ReplayDeviationError(f"Expected swipe action, found {event.get('action')}")

    start_area = event["start_area"]
    end_area = event["end_area"]
    if not self._point_in_area(p1, start_area):
        raise ReplayDeviationError(f"Swipe start out of expected area: {p1} not in {start_area}")
    if not self._point_in_area(p2, end_area):
        raise ReplayDeviationError(f"Swipe end out of expected area: {p2} not in {end_area}")
```
**5. Replay skips action timestamps** 🐞 Bug · ✓ Correctness

`MockDevice` sets the simulated clock only when consuming screenshot events and ignores timestamps on click/swipe action events. This makes `time.time()`/`sleep` (patched to the simulated clock) diverge from the manifest between screenshots and leaves recorded action timestamps unused.
Agent Prompt
### Issue description
Action events in the manifest contain `timestamp`, but `MockDevice.click()` / `MockDevice.swipe()` do not update the `SimulatedClock` from it. Since `patched_time()` routes `time.time()` through this clock, time can be stale between screenshots.
### Issue Context
`screenshot()` already sets the clock from the manifest timestamp; action events should do the same for consistency.
### Fix Focus Areas
- agent_orchestrator/replay/mock_device.py[85-126]
Addresses Qodo review findings:

- `click()`/`swipe()` now set `SimulatedClock` from manifest timestamps
- `area`/`start_area`/`end_area` validated for type and length before unpacking
- `_point_in_area()` also validates defensively
**Motivation**

**Description**

- `agent_orchestrator/replay/mock_device.py`: verify expected click target names, require manifest `area` bounds, and keep existing area/ordering checks.
- `agent_orchestrator/replay/time_control.py`: patch `time.time` and `time.sleep` globally and patch `module.base.timer` aliases when importable, while avoiding hard imports that break lightweight test runs.
- `alas_wrapped/dev_tools/record_scenario.py`: clear stale PNGs on start, infer point-area clicks for tuple/list targets, and fail early when a click area cannot be inferred.
- `agent_orchestrator/test_login_replay.py`: assert target-mismatch deviations and validate that patched sleep advances the logical clock without waiting; also removed the extra `alas_wrapped` `sys.path` insertion from `agent_orchestrator/conftest.py`.
- `docs/dev/testing.md` and `CHANGELOG.md`: reflect refined clock-patching semantics and harness behavior.

**Testing**

- `pytest -q agent_orchestrator/test_login_replay.py`, which passed (4 passed).
- `python -m py_compile` on the new/modified modules, which succeeded.
- `pytest -q --maxfail=1` for the broader suite, which still fails due to an external OpenCV runtime dependency (`ImportError: libGL.so.1`) unrelated to the replay harness changes.