
Harden deterministic replay harness validation and clock patching #31

Merged
Coldaine merged 2 commits into trunk/stabilization from
codex/implement-deterministic-testing-harness-for-alas/2026-02-23
Feb 23, 2026

Conversation

@Coldaine
Owner

Motivation

  • Improve the robustness of the deterministic replay harness after review feedback by making manifest semantics stricter and the time-control helper more reliable across environments.
  • Ensure recorder output is deterministic and self-consistent (no stale frames or ambiguous click targets) so replay tests are less brittle.
  • Make the fast-forward time patching broadly effective for unit tests while gracefully handling repos/environments where ALAS internals are not importable.

Description

  • Tightened replay validation in agent_orchestrator/replay/mock_device.py to verify expected click target names, require manifest area bounds, and keep existing area/ordering checks.
  • Improved deterministic time control in agent_orchestrator/replay/time_control.py by patching time.time and time.sleep globally and patching module.base.timer aliases when importable, while avoiding hard imports that break lightweight test runs.
  • Hardened recorder behavior in alas_wrapped/dev_tools/record_scenario.py to clear stale PNGs on start, infer point-area clicks for tuple/list targets, and fail early when a click area cannot be inferred.
  • Expanded unit tests in agent_orchestrator/test_login_replay.py to assert target-mismatch deviations and to validate that patched sleep advances the logical clock without waiting, and removed the extra alas_wrapped sys.path insertion from agent_orchestrator/conftest.py.
  • Updated docs/dev/testing.md and CHANGELOG.md to reflect refined clock-patching semantics and harness behavior.

Testing

  • Ran pytest -q agent_orchestrator/test_login_replay.py, which passed (4 passed).
  • Verified syntax by running python -m py_compile on the new/modified modules; all succeeded.
  • Ran pytest -q --maxfail=1 for the broader suite; it still fails due to an external OpenCV runtime dependency (ImportError: libGL.so.1) unrelated to the replay harness changes.

Codex Task

Copilot AI review requested due to automatic review settings February 23, 2026 07:51
@qodo-free-for-open-source-projects

Review Summary by Qodo

Implement deterministic replay harness with fixture recording and clock patching

✨ Enhancement 🧪 Tests


Walkthroughs

Description
• Implement deterministic replay harness with fixture recording and validation
  - Recorder captures screenshots and click/swipe events to JSONL manifests
  - Mock device replays fixtures with strict event ordering and coordinate assertions
• Add simulated clock patching for fast-forward deterministic test execution
  - Patches time.time, time.sleep, and ALAS timer module aliases
  - Gracefully handles environments where ALAS internals are not importable
• Expand test coverage with replay deviation detection and clock patching validation
• Document harness architecture and usage in testing guide and roadmap
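The manifest shape implied by the recorder and validation code quoted in this review might look like the two JSONL lines below. The `type`, `action`, `target`, `area`, and `timestamp` fields match names referenced in the review; the `frame` field name and all values are guesses.

```python
import json

# One screenshot event followed by one click event, as JSONL lines.
events = [
    {"type": "screenshot", "frame": "images/0001.png", "timestamp": 1708671060.0},
    {
        "type": "action",
        "action": "click",
        "target": "LOGIN_CHECK",
        "area": [560, 300, 720, 360],
        "timestamp": 1708671061.5,
    },
]
manifest_jsonl = "\n".join(json.dumps(event) for event in events)
```

Replay then consumes these events in order, asserting event type, click target, and area bounds at each step.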
Diagram
flowchart LR
  A["Record Scenario"] -->|"Patches device methods"| B["ScenarioRecorder"]
  B -->|"Writes PNG frames + manifest.jsonl"| C["Fixture Directory"]
  C -->|"Loads manifest"| D["MockDevice"]
  D -->|"Validates event sequence"| E["ReplayDeviationError"]
  D -->|"Extracts click coordinates"| F["Area Bounds Check"]
  G["SimulatedClock"] -->|"Patches time.time/sleep"| H["patched_time Context"]
  H -->|"Advances logical clock"| I["Fast-Forward Execution"]
  D -->|"Uses clock"| I


File Changes

1. agent_orchestrator/replay/mock_device.py ✨ Enhancement +153/-0

Replay device with manifest validation and deviation detection

agent_orchestrator/replay/mock_device.py


2. agent_orchestrator/replay/time_control.py ✨ Enhancement +32/-0

Simulated clock patching for deterministic time control

agent_orchestrator/replay/time_control.py


3. agent_orchestrator/test_login_replay.py 🧪 Tests +121/-0

Replay harness tests covering fast-forward and deviation cases

agent_orchestrator/test_login_replay.py


4. alas_wrapped/dev_tools/record_scenario.py ✨ Enhancement +160/-0

Fixture recorder capturing screenshots and action events

alas_wrapped/dev_tools/record_scenario.py


5. CHANGELOG.md 📝 Documentation +6/-0

Document deterministic replay harness components and features

CHANGELOG.md


6. docs/ARCHITECTURE.md 📝 Documentation +10/-0

Add deterministic replay harness architecture section

docs/ARCHITECTURE.md


7. docs/ROADMAP.md 📝 Documentation +1/-0

Mark replay harness scaffolding success criterion complete

docs/ROADMAP.md


8. docs/dev/testing.md 📝 Documentation +11/-0

Document replay harness usage and clock control semantics

docs/dev/testing.md


9. agent_orchestrator/replay/__init__.py Additional files +0/-0

...

agent_orchestrator/replay/__init__.py



@qodo-free-for-open-source-projects

qodo-free-for-open-source-projects bot commented Feb 23, 2026

Code Review by Qodo

🐞 Bugs (2) 📘 Rule violations (4) 📎 Requirement gaps (0)



Action required

1. record_scenario.py outside tools 📘 Rule violation ✓ Correctness
Description
alas_wrapped/dev_tools/record_scenario.py imports ALAS runtime (AzurLaneAutoScript) but is not
located under alas_wrapped/tools/ as required for ALAS-internal imports. This breaks the
directory-based layering contract and can lead to inconsistent coupling/packaging expectations.
Code

alas_wrapped/dev_tools/record_scenario.py[R10-13]

+from PIL import Image
+
+from alas import AzurLaneAutoScript
+
Evidence
Compliance ID 12 requires code that imports ALAS internals to live under alas_wrapped/tools/. The
added file is under alas_wrapped/dev_tools/ and directly imports AzurLaneAutoScript from alas,
indicating ALAS coupling outside the allowed location.

AGENTS.md
alas_wrapped/dev_tools/record_scenario.py[10-13]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`alas_wrapped/dev_tools/record_scenario.py` imports ALAS runtime internals but is not placed under `alas_wrapped/tools/`, violating the code placement rule.
## Issue Context
The compliance rule enforces directory-based layering: anything that imports ALAS internals (e.g., `alas`, `module.*`) must live under `alas_wrapped/tools/`.
## Fix Focus Areas
- alas_wrapped/dev_tools/record_scenario.py[10-13]



2. patched_time patches module.base.timer 📘 Rule violation ✓ Correctness
Description
agent_orchestrator/replay/time_control.py conditionally patches ALAS internals via
module.base.timer.*, which relies on importing module.* at runtime. Per the layering rule, code
that imports/depends on ALAS internals should not live under agent_orchestrator/.
Code

agent_orchestrator/replay/time_control.py[R24-30]

+        # ALAS timer module imports time/datetime directly; patch aliases when available.
+        try:
+            stack.enter_context(patch("module.base.timer.time", side_effect=clock.time))
+            stack.enter_context(patch("module.base.timer.sleep", side_effect=lambda seconds: clock.advance(seconds)))
+            stack.enter_context(patch("module.base.timer.datetime", _TimerDatetime))
+        except ModuleNotFoundError:
+            pass
Evidence
Compliance ID 12 requires code that imports ALAS internals (module.*) to be placed under
alas_wrapped/tools/. The new patched_time helper in agent_orchestrator/ attempts to patch
module.base.timer.*, which triggers imports of module.* when available.

AGENTS.md
agent_orchestrator/replay/time_control.py[24-30]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`agent_orchestrator/replay/time_control.py` depends on ALAS internals (`module.base.timer.*`) via runtime patching, violating the directory-based layering rule.
## Issue Context
Even though the code is guarded by `try/except`, `patch("module.base.timer.*")` requires importing `module.*` when present, which is still an ALAS-internal dependency.
## Fix Focus Areas
- agent_orchestrator/replay/time_control.py[24-30]



3. scenario allows path traversal 📘 Rule violation ⛨ Security
Description
The recorder constructs fixture_dir from CLI-provided inputs and then unlinks *.png files
without verifying the resolved path stays within the intended fixtures directory. A crafted
scenario (or --fixtures-root) can cause deletion of unintended files/directories.
Code

alas_wrapped/dev_tools/record_scenario.py[R16-24]

+    def __init__(self, scenario_name: str, base_dir: Path | None = None):
+        base = base_dir or Path("tests/fixtures")
+        self.fixture_dir = base / scenario_name
+        self.images_dir = self.fixture_dir / "images"
+        self.manifest_path = self.fixture_dir / "manifest.jsonl"
+        self.images_dir.mkdir(parents=True, exist_ok=True)
+        for old_frame in self.images_dir.glob("*.png"):
+            old_frame.unlink()
+
Evidence
Compliance ID 6 requires validation/sanitization of external inputs, especially when they affect
filesystem operations. Here, scenario_name is used directly in path construction and the code
deletes existing PNGs under that path without any containment checks.

Rule 6: Generic: Security-First Input Validation and Data Handling
alas_wrapped/dev_tools/record_scenario.py[16-24]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The recorder tool accepts CLI inputs that influence filesystem paths and then deletes files, without validating that the resolved directory is contained within the intended fixtures directory.
## Issue Context
Inputs like `scenario="../somewhere"` or a custom `--fixtures-root` can cause the tool to unlink unrelated `*.png` files.
## Fix Focus Areas
- alas_wrapped/dev_tools/record_scenario.py[16-24]
- alas_wrapped/dev_tools/record_scenario.py[120-134]



4. Recorder calls wrong object 🐞 Bug ✓ Correctness
Description
The scenario recorder invokes the configured method (default: handle_app_login) on script.device,
but ALAS login entrypoints are on the script/handlers, not Device. Using the tool with its
documented defaults will raise AttributeError and prevent fixture capture.
Code

alas_wrapped/dev_tools/record_scenario.py[R144-152]

+    script = AzurLaneAutoScript(config_name=args.config)
+    device = script.device
+
+    if not hasattr(device, args.method):
+        raise AttributeError(f"Device has no method '{args.method}'")
+
+    call = getattr(device, args.method)
+    with DevicePatchSession(device=device, recorder=recorder):
+        result = call()
Evidence
record_scenario resolves the callable via getattr(device, args.method) with a default method name
that is not a Device API. In ALAS, handle_app_login is implemented by LoginHandler, and
AzurLaneAutoScript exposes higher-level entrypoints like start()/restart() that create a
LoginHandler; Device is a Screenshot/Control/AppControl composition and does not implement login
flows.

alas_wrapped/dev_tools/record_scenario.py[120-153]
alas_wrapped/module/handler/login.py[256-282]
alas_wrapped/alas.py[304-321]
alas_wrapped/module/device/device.py[64-66]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`alas_wrapped/dev_tools/record_scenario.py` currently looks up `--method` (default: `handle_app_login`) on `script.device`, but `handle_app_login` is implemented on handler/script flows rather than the Device. This makes the recorder fail with defaults.
### Issue Context
The script already instantiates `AzurLaneAutoScript`, which exposes `start()`/`restart()` helpers that internally create a `LoginHandler` and run the login flow.
### Fix Focus Areas
- alas_wrapped/dev_tools/record_scenario.py[120-153]

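One possible shape of the fix is to resolve the entrypoint on the script first and only fall back to the device, rather than assuming every method lives on `Device`. This is a hypothetical sketch; the helper name and lookup order are assumptions, not the PR's actual change.

```python
def resolve_entrypoint(script, device, method_name: str):
    # Prefer the script (where LoginHandler-backed flows such as
    # handle_app_login would be reachable), then fall back to the device.
    for owner in (script, device):
        fn = getattr(owner, method_name, None)
        if callable(fn):
            return fn
    raise AttributeError(f"Neither script nor device exposes '{method_name}'")
```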


5. Replay skips action timestamps 🐞 Bug ✓ Correctness
Description
MockDevice sets the simulated clock only when consuming screenshot events and ignores timestamps on
click/swipe action events. This makes time.time()/sleep (patched to the simulated clock) diverge
from the manifest between screenshots and leaves recorded action timestamps unused.
Code

agent_orchestrator/replay/mock_device.py[R94-126]

+    def click(self, target: Any) -> None:
+        event = self._consume(expected_type="action")
+        if event.get("action") != "click":
+            raise ReplayDeviationError(f"Expected click action, found {event.get('action')}")
+
+        expected_target = event.get("target")
+        actual_target = str(target)
+        if expected_target and expected_target != actual_target:
+            raise ReplayDeviationError(f"Expected click target {expected_target}, found {actual_target}")
+
+        area = event.get("area")
+        if not area:
+            raise ReplayDeviationError("Click action in manifest missing `area` bounds")
+
+        x, y = self._extract_click_point(target)
+        x1, y1, x2, y2 = area
+        if not (x1 <= x <= x2 and y1 <= y <= y2):
+            raise ReplayDeviationError(
+                f"Click out of expected area: ({x}, {y}) not in [{x1}, {y1}, {x2}, {y2}]"
+            )
+
+    def swipe(self, p1: tuple[int, int], p2: tuple[int, int]) -> None:
+        event = self._consume(expected_type="action")
+        if event.get("action") != "swipe":
+            raise ReplayDeviationError(f"Expected swipe action, found {event.get('action')}")
+
+        start_area = event["start_area"]
+        end_area = event["end_area"]
+        if not self._point_in_area(p1, start_area):
+            raise ReplayDeviationError(f"Swipe start out of expected area: {p1} not in {start_area}")
+        if not self._point_in_area(p2, end_area):
+            raise ReplayDeviationError(f"Swipe end out of expected area: {p2} not in {end_area}")
+
Evidence
The recorder writes a timestamp for each action event. During replay, screenshot() updates the clock
from the manifest, but click()/swipe() do not, while patched_time drives time.time/time.sleep from
this clock; therefore, any time reads between a screenshot and the next screenshot will not reflect
the manifest action timestamps.

alas_wrapped/dev_tools/record_scenario.py[80-107]
agent_orchestrator/replay/mock_device.py[85-113]
agent_orchestrator/replay/mock_device.py[115-126]
agent_orchestrator/replay/time_control.py[20-23]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Action events in the manifest contain `timestamp`, but `MockDevice.click()` / `MockDevice.swipe()` do not update the `SimulatedClock` from it. Since `patched_time()` routes `time.time()` through this clock, time can be stale between screenshots.
### Issue Context
`screenshot()` already sets the clock from the manifest timestamp; action events should do the same for consistency.
### Fix Focus Areas
- agent_orchestrator/replay/mock_device.py[85-126]




Remediation recommended

6. area shape not validated 📘 Rule violation ⛯ Reliability
Description
Replay code unpacks area, start_area, and end_area from the manifest without validating
shape/type, so malformed fixtures can raise ValueError/KeyError rather than a contextual
ReplayDeviationError. This weakens edge-case handling and makes fixture issues harder to diagnose.
Code

agent_orchestrator/replay/mock_device.py[R104-131]

+        area = event.get("area")
+        if not area:
+            raise ReplayDeviationError("Click action in manifest missing `area` bounds")
+
+        x, y = self._extract_click_point(target)
+        x1, y1, x2, y2 = area
+        if not (x1 <= x <= x2 and y1 <= y <= y2):
+            raise ReplayDeviationError(
+                f"Click out of expected area: ({x}, {y}) not in [{x1}, {y1}, {x2}, {y2}]"
+            )
+
+    def swipe(self, p1: tuple[int, int], p2: tuple[int, int]) -> None:
+        event = self._consume(expected_type="action")
+        if event.get("action") != "swipe":
+            raise ReplayDeviationError(f"Expected swipe action, found {event.get('action')}")
+
+        start_area = event["start_area"]
+        end_area = event["end_area"]
+        if not self._point_in_area(p1, start_area):
+            raise ReplayDeviationError(f"Swipe start out of expected area: {p1} not in {start_area}")
+        if not self._point_in_area(p2, end_area):
+            raise ReplayDeviationError(f"Swipe end out of expected area: {p2} not in {end_area}")
+
+    @staticmethod
+    def _point_in_area(point: tuple[int, int], area: list[int]) -> bool:
+        x, y = point
+        x1, y1, x2, y2 = area
+        return x1 <= x <= x2 and y1 <= y <= y2
Evidence
Compliance ID 3 requires robust edge-case management with meaningful error context. The new replay
logic directly unpacks and indexes manifest-provided areas without validation, leading to
non-graceful failures on malformed or incomplete manifest data.

Rule 3: Generic: Robust Error Handling and Edge Case Management
agent_orchestrator/replay/mock_device.py[104-113]
agent_orchestrator/replay/mock_device.py[120-131]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Malformed or incomplete manifests can cause unhandled `KeyError`/`ValueError` during replay because area bounds are unpacked without validation.
## Issue Context
Manifests are file inputs; even for tests, failures should be deterministic and provide actionable context via `ReplayDeviationError`.
## Fix Focus Areas
- agent_orchestrator/replay/mock_device.py[94-131]




@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c269b92d63


Comment on lines +103 to +104
"start_area": [int(p1[0]), int(p1[1]), int(p1[0]), int(p1[1])],
"end_area": [int(p2[0]), int(p2[1]), int(p2[0]), int(p2[1])],


P1: Record swipe bounds with tolerance

The recorder writes swipe start_area/end_area as [x, y, x, y], which makes replay require exact coordinates. In real login code paths like handle_cn_user_agreement (alas_wrapped/module/handler/login.py:244-246), swipe points are generated with random_rectangle_point, so the same behavior can produce different endpoints between record and replay and trigger ReplayDeviationError even when nothing regressed. This makes the replay harness flaky for any flow with randomized swipes; the manifest should capture a non-zero area (or tolerance) for swipe endpoints.
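A minimal way to record non-degenerate swipe areas is to pad each endpoint by a tolerance when writing the manifest. The function name and the 20-pixel default below are assumptions for illustration, not what the PR does.

```python
def swipe_endpoint_area(point: tuple[int, int], pad: int = 20) -> list[int]:
    # Record a pad-pixel box around the endpoint instead of the degenerate
    # [x, y, x, y] rectangle, so randomized swipe points generated by
    # random_rectangle_point can still land inside the bounds on replay.
    x, y = point
    return [x - pad, y - pad, x + pad, y + pad]
```

The replay-side `_point_in_area` check would then tolerate any endpoint within the recorded box.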



Copilot AI left a comment


Pull request overview

This PR hardens the deterministic replay harness used for offline state-machine regression checks by tightening replay/manifest validation, improving simulated clock patching, and adding a recorder + unit tests to make fixture capture/replay less brittle.

Changes:

  • Added a fixture recorder script that captures screenshot frames and action events into a JSONL manifest.
  • Added a mock replay device + simulated clock, with stricter validation for click/swipe ordering and coordinate bounds.
  • Added/expanded pytest coverage for fast-forward time patching and replay deviation detection, plus documentation/changelog updates.

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
docs/dev/testing.md Documents the current deterministic replay harness components and purpose.
docs/ROADMAP.md Marks the replay harness scaffold as completed in success criteria.
docs/ARCHITECTURE.md Adds an architecture section describing the replay harness and its components.
alas_wrapped/dev_tools/record_scenario.py New recorder CLI to generate fixture images + manifest.jsonl from live ALAS runs.
agent_orchestrator/replay/mock_device.py New manifest-driven replay device + simulated clock + deviation assertions.
agent_orchestrator/replay/time_control.py New context manager to patch time and ALAS timer aliases to a simulated clock.
agent_orchestrator/test_login_replay.py New tests for replay success, deviation detection, target mismatch, and patched sleep behavior.
CHANGELOG.md Notes the addition of the deterministic replay harness scaffold.

Comment on lines +108 to +109
        x, y = self._extract_click_point(target)
        x1, y1, x2, y2 = area

Copilot AI Feb 23, 2026


Manifest schema issues can currently raise generic exceptions instead of ReplayDeviationError (e.g., x1, y1, x2, y2 = area will throw ValueError if area isn't length 4). Since this is a validation harness, consider validating area shape/types explicitly and raising ReplayDeviationError with a clear message for malformed data.

Suggested change
-        x, y = self._extract_click_point(target)
-        x1, y1, x2, y2 = area
+        # Validate area shape and contents to avoid generic unpacking/type errors
+        if not isinstance(area, (list, tuple)) or len(area) != 4:
+            raise ReplayDeviationError(
+                f"Click action in manifest has malformed `area`; expected sequence of 4 numbers, got: {area!r}"
+            )
+        try:
+            x1, y1, x2, y2 = [float(v) for v in area]
+        except (TypeError, ValueError):
+            raise ReplayDeviationError(
+                f"Click action in manifest has non-numeric `area` bounds: {area!r}"
+            )
+        x, y = self._extract_click_point(target)

Comment on lines +120 to +128
        start_area = event["start_area"]
        end_area = event["end_area"]
        if not self._point_in_area(p1, start_area):
            raise ReplayDeviationError(f"Swipe start out of expected area: {p1} not in {start_area}")
        if not self._point_in_area(p2, end_area):
            raise ReplayDeviationError(f"Swipe end out of expected area: {p2} not in {end_area}")

    @staticmethod
    def _point_in_area(point: tuple[int, int], area: list[int]) -> bool:

Copilot AI Feb 23, 2026


start_area = event["start_area"] / end_area = event["end_area"] will raise KeyError for malformed manifests, and _point_in_area will raise ValueError if the area isn't length 4. Consider validating presence/shape of these fields and raising ReplayDeviationError instead so replay failures are consistently reported as deviations.

Suggested change
-        start_area = event["start_area"]
-        end_area = event["end_area"]
-        if not self._point_in_area(p1, start_area):
-            raise ReplayDeviationError(f"Swipe start out of expected area: {p1} not in {start_area}")
-        if not self._point_in_area(p2, end_area):
-            raise ReplayDeviationError(f"Swipe end out of expected area: {p2} not in {end_area}")
-
-    @staticmethod
-    def _point_in_area(point: tuple[int, int], area: list[int]) -> bool:
+        start_area = event.get("start_area")
+        end_area = event.get("end_area")
+        if start_area is None or end_area is None:
+            raise ReplayDeviationError("Swipe action in manifest missing `start_area` or `end_area` bounds")
+        if not (
+            isinstance(start_area, (list, tuple))
+            and isinstance(end_area, (list, tuple))
+            and len(start_area) == 4
+            and len(end_area) == 4
+        ):
+            raise ReplayDeviationError(
+                f"Swipe action in manifest has invalid area bounds: start_area={start_area}, end_area={end_area}"
+            )
+        if not self._point_in_area(p1, start_area):
+            raise ReplayDeviationError(f"Swipe start out of expected area: {p1} not in {start_area}")
+        if not self._point_in_area(p2, end_area):
+            raise ReplayDeviationError(f"Swipe end out of expected area: {p2} not in {end_area}")
+
+    @staticmethod
+    def _point_in_area(point: tuple[int, int], area: list[int] | tuple[int, int, int, int]) -> bool:
+        if not isinstance(area, (list, tuple)) or len(area) != 4:
+            raise ReplayDeviationError(f"Invalid area bounds: {area!r}")

        return np.array(image)

    def click(self, target: Any) -> None:
        event = self._consume(expected_type="action")

Copilot AI Feb 23, 2026


Replay clock is only set on screenshot() events. Because action events in the manifest also carry timestamp, any time.time() calls that occur between screenshots during replay will see a stale time value. Consider setting self.clock from event["timestamp"] for action events as well (click/swipe) to preserve the recorded timeline.

Suggested change
-        event = self._consume(expected_type="action")
+        event = self._consume(expected_type="action")
+        timestamp = event.get("timestamp")
+        if timestamp is not None:
+            self.clock.set(float(timestamp))

Comment on lines +100 to +101
        actual_target = str(target)
        if expected_target and expected_target != actual_target:

Copilot AI Feb 23, 2026


expected_target = event.get("target") is treated as optional (if expected_target and ...). If the manifest is missing target (or has an empty value), replay will silently skip target validation, which weakens the harness' "strict" semantics. Consider requiring a non-empty target for click actions and raising ReplayDeviationError when it's missing.

Suggested change
-        actual_target = str(target)
-        if expected_target and expected_target != actual_target:
+        if not expected_target:
+            raise ReplayDeviationError("Click action in manifest missing `target`")
+        actual_target = str(target)
+        if expected_target != actual_target:


Addresses Qodo review findings:
- click()/swipe() now set SimulatedClock from manifest timestamps
- area/start_area/end_area validated for type and length before unpacking
- _point_in_area() also validates defensively
@Coldaine Coldaine merged commit 5f1d823 into trunk/stabilization Feb 23, 2026
@Coldaine Coldaine deleted the codex/implement-deterministic-testing-harness-for-alas/2026-02-23 branch March 4, 2026 18:11