Skip to content

feat(replay): deterministic replay harness and scenario recording#32

Merged
Coldaine merged 5 commits intomasterfrom
kilo/deterministic-replay-harness
Mar 2, 2026
Merged

feat(replay): deterministic replay harness and scenario recording#32
Coldaine merged 5 commits intomasterfrom
kilo/deterministic-replay-harness

Conversation

@Coldaine
Copy link
Owner

@Coldaine Coldaine commented Mar 1, 2026

User description

Summary

  • Add agent_orchestrator/replay/: replay engine with fixture discipline
  • Add alas_wrapped/dev_tools/record_scenario.py: capture automation scenarios
  • Add agent_orchestrator/test_login_replay.py: replay-based login test
  • Add tests/ scaffolding for replay coverage

Test plan

  • Record a scenario: python alas_wrapped/dev_tools/record_scenario.py
  • Replay it: python agent_orchestrator/test_login_replay.py
  • Run test suite: pytest tests/

🤖 Generated with Claude Code


PR Type

Enhancement, Tests


Description

  • Add deterministic replay harness for offline ALAS testing

    • MockDevice enforces recorded screenshot/action ordering
    • SimulatedClock enables CPU-speed replay with deterministic time
    • patched_time context manager patches time/sleep during replay
  • Add scenario recording tool to capture automation sequences

    • DevicePatchSession wraps Device methods to record screenshots/actions
    • Saves manifest.jsonl event log and PNG frames to fixture directory
  • Add comprehensive replay test suite with validation

    • Tests cover fast-forward replay, deviation detection, target matching
    • Tests validate click/swipe area bounds and time advancement
  • Add fixture documentation and usage examples


Diagram Walkthrough

flowchart LR
  A["Record Scenario<br/>record_scenario.py"] -->|DevicePatchSession| B["Fixture Directory<br/>manifest.jsonl + images/"]
  B -->|MockDevice| C["Replay Engine<br/>mock_device.py"]
  C -->|patched_time| D["Deterministic Tests<br/>test_login_replay.py"]
  E["SimulatedClock"] -->|controls| C
  E -->|controls| D
Loading

File Walkthrough

Relevant files
Enhancement
__init__.py
Replay harness public API and exports                                       

agent_orchestrator/replay/init.py

  • Exports public API for replay harness (MockDevice, SimulatedClock,
    patched_time)
  • Provides comprehensive module docstring with usage example
  • Imports from mock_device and time_control submodules
+36/-0   
mock_device.py
Replay-only device with manifest validation                           

agent_orchestrator/replay/mock_device.py

  • Implements SimulatedClock for logical time control during replay
  • Implements ReplayManifest to load and parse fixture event logs
  • Implements MockDevice with screenshot/click/swipe methods that enforce
    recorded ordering
  • Validates click targets, coordinates, and swipe areas against recorded
    bounds
  • Raises ReplayDeviationError on any deviation from recorded sequence
+191/-0 
time_control.py
Time patching for deterministic replay execution                 

agent_orchestrator/replay/time_control.py

  • Implements patched_time context manager to patch time.time and
    time.sleep
  • Patches module.base.timer aliases for ALAS timer module compatibility
  • Advances SimulatedClock instead of real sleep, enabling CPU-speed
    replay
  • Provides deterministic datetime.now() via _TimerDatetime wrapper
+53/-0   
record_scenario.py
Scenario recording tool with device patching                         

alas_wrapped/dev_tools/record_scenario.py

  • Implements ScenarioRecorder to manage fixture directory and manifest
    writing
  • Implements DevicePatchSession context manager to wrap Device methods
  • Patches screenshot/click/swipe to record events with timestamps and
    coordinates
  • Extracts button areas from various target object types (button, area
    attributes)
  • Saves screenshot frames as PNG files with sequential naming
  • Provides CLI interface to record scenarios with configurable method
    and config
+182/-0 
Tests
test_login_replay.py
Comprehensive replay validation test suite                             

agent_orchestrator/test_login_replay.py

  • Implements _FakeLoginFlow to simulate ALAS login automation sequence
  • Tests replay fast-forward execution with deterministic time
    advancement
  • Tests ReplayDeviationError raised on event type mismatch
  • Tests target name mismatch detection during click validation
  • Tests click coordinate validation against recorded area bounds
  • Tests swipe area validation for start/end points
  • Tests patched_time advances clock without real sleep delays
+173/-0 
Documentation
README.md
Fixture documentation and usage guide                                       

tests/fixtures/README.md

  • Documents fixture directory structure and manifest.jsonl format
  • Provides JSON schema examples for screenshot and action events
  • Explains recording workflow using record_scenario.py tool
  • Shows usage example for replaying fixtures in tests
  • Documents MockDevice validation rules and ReplayDeviationError
    conditions
+95/-0   

Copilot AI review requested due to automatic review settings March 1, 2026 02:45
@qodo-free-for-open-source-projects

Review Summary by Qodo

Add deterministic replay harness and scenario recording for ALAS testing

✨ Enhancement 🧪 Tests

Grey Divider

Walkthroughs

Description
• Add deterministic replay harness for offline ALAS testing with MockDevice
• Implement scenario recording tool to capture screenshots and actions
• Add comprehensive replay-based login test with deviation detection
• Provide time control utilities for CPU-speed deterministic test execution
Diagram
flowchart LR
  Record["Record Scenario<br/>record_scenario.py"]
  Fixture["Test Fixture<br/>manifest.jsonl + images/"]
  Replay["Replay Engine<br/>MockDevice"]
  Tests["Replay Tests<br/>test_login_replay.py"]
  
  Record -->|"Capture screenshots<br/>& actions"| Fixture
  Fixture -->|"Load & validate"| Replay
  Replay -->|"Enforce ordering<br/>& bounds"| Tests
Loading

Grey Divider

File Changes

1. agent_orchestrator/replay/__init__.py ✨ Enhancement +36/-0

Replay harness public API and documentation

• Export public API for replay harness (MockDevice, SimulatedClock, patched_time)
• Provide comprehensive module docstring with usage examples
• Define fixture discipline and offline testing capabilities

agent_orchestrator/replay/init.py


2. agent_orchestrator/replay/mock_device.py ✨ Enhancement +191/-0

Core replay device with manifest validation

• Implement SimulatedClock for logical time control during replay
• Create ReplayManifest to load and parse JSON Lines event logs
• Implement MockDevice with screenshot/click/swipe methods enforcing recorded ordering
• Add ReplayDeviationError for divergence detection with detailed error messages
• Validate click targets, areas, and swipe boundaries against recorded events

agent_orchestrator/replay/mock_device.py


3. agent_orchestrator/replay/time_control.py ✨ Enhancement +53/-0

Time patching for deterministic replay execution

• Implement patched_time context manager for deterministic time advancement
• Patch time.time() and time.sleep() to use SimulatedClock
• Support ALAS timer module patches for module.base.timer imports
• Enable CPU-speed test execution without real sleep delays

agent_orchestrator/replay/time_control.py


View more (3)
4. agent_orchestrator/test_login_replay.py 🧪 Tests +173/-0

Comprehensive replay-based login test suite

• Create comprehensive test suite with 6 test cases covering replay scenarios
• Test fast-forward replay execution with deterministic time advancement
• Validate replay deviation detection for event ordering mismatches
• Test click target matching and area boundary validation
• Test swipe area validation and patched time behavior
• Include fixture generation helper and button stub classes for testing

agent_orchestrator/test_login_replay.py


5. alas_wrapped/dev_tools/record_scenario.py ✨ Enhancement +182/-0

Scenario recording tool with device patching

• Implement ScenarioRecorder to capture screenshots and actions to manifest.jsonl
• Create DevicePatchSession context manager for transparent Device method patching
• Wrap Device.screenshot/click/swipe to record events with timestamps and areas
• Provide command-line tool with configurable scenario name, config, and method
• Support fixture cleanup and frame indexing for reproducible recordings

alas_wrapped/dev_tools/record_scenario.py


6. tests/fixtures/README.md 📝 Documentation +95/-0

Fixture documentation and usage guide

• Document fixture directory structure and manifest.jsonl format
• Provide JSON schema examples for screenshot and action events
• Include usage instructions for recording and replaying fixtures
• Explain validation rules enforced by MockDevice during replay
• Document command-line options for record_scenario.py tool

tests/fixtures/README.md


Grey Divider

Qodo Logo

@qodo-free-for-open-source-projects
Copy link

qodo-free-for-open-source-projects bot commented Mar 1, 2026

Code Review by Qodo

🐞 Bugs (2) 📘 Rule violations (0) 📎 Requirement gaps (0)

Grey Divider


Action required

1. Recorder writes wrong fixture path🐞 Bug ✓ Correctness
Description
record_scenario.py documents running from alas_wrapped/, but its default fixtures root is a
relative tests/fixtures, so it will record into alas_wrapped/tests/fixtures/... instead of
repo-root tests/fixtures/... by default. This breaks the advertised record→replay workflow unless
users manually pass --fixtures-root.
Code

alas_wrapped/dev_tools/record_scenario.py[R7-35]

+Usage:
+    cd alas_wrapped
+    python dev_tools/record_scenario.py login_flow --config PatrickCustom
+
+The recorded fixture will be saved to tests/fixtures/<scenario>/ with:
+    - manifest.jsonl: Timestamped event log (screenshots + actions)
+    - images/: Screenshot frames as PNG files
+"""
+
+from __future__ import annotations
+
+import argparse
+import json
+import time
+from pathlib import Path
+from types import MethodType
+from typing import Any
+
+from PIL import Image
+
+from alas import AzurLaneAutoScript
+
+
+class ScenarioRecorder:
+    def __init__(self, scenario_name: str, base_dir: Path | None = None):
+        base = base_dir or Path("tests/fixtures")
+        self.fixture_dir = base / scenario_name
+        self.images_dir = self.fixture_dir / "images"
+        self.manifest_path = self.fixture_dir / "manifest.jsonl"
Evidence
The script explicitly instructs users to change the working directory to alas_wrapped, while also
using a relative tests/fixtures default. Relative paths resolve from CWD, so the default output
directory will not be the repo-root tests/fixtures referenced throughout the fixture docs.

alas_wrapped/dev_tools/record_scenario.py[7-36]
alas_wrapped/dev_tools/record_scenario.py[139-154]
tests/fixtures/README.md[59-72]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`alas_wrapped/dev_tools/record_scenario.py` claims that running from `alas_wrapped/` will record to `tests/fixtures/&amp;lt;scenario&amp;gt;`, but the code uses a relative default path (`tests/fixtures`) which resolves against the current working directory. This leads to fixtures being recorded into `alas_wrapped/tests/fixtures/...` when users follow the documented workflow.
### Issue Context
- Users are instructed to `cd alas_wrapped` before running the script.
- The fixture README and script docstring both imply repo-root `tests/fixtures`.
### Fix Focus Areas
- alas_wrapped/dev_tools/record_scenario.py[7-36]
- alas_wrapped/dev_tools/record_scenario.py[135-163]
- tests/fixtures/README.md[59-72]
### Suggested approach
- Compute repo root based on `__file__` (e.g., `Path(__file__).resolve().parents[2]`) and set the default fixtures root to `&amp;lt;repo_root&amp;gt;/tests/fixtures`.
- If `--fixtures-root` is provided as a relative path, consider resolving it relative to repo root (or clearly document it is relative to CWD).
- Update README/docstring accordingly (either remove `cd alas_wrapped` or show `--fixtures-root ../tests/fixtures`).

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools



Remediation recommended

2. patched_time not fully deterministic🐞 Bug ⛯ Reliability
Description
patched_time() patches time.time/time.sleep and module.base.timer, but many call sites can
still use wall-clock time or real sleeps (e.g., modules that did from time import sleep or call
datetime.now() outside module.base.timer). This can make “deterministic / CPU speed” replays
flaky unless the limitation is addressed or clearly documented.
Code

agent_orchestrator/replay/time_control.py[R35-51]

+        stack.enter_context(patch("time.time", side_effect=clock.time))
+        stack.enter_context(
+            patch("time.sleep", side_effect=lambda seconds: clock.advance(seconds))
+        )
+
+        # ALAS timer module imports time/datetime directly; patch aliases when available.
+        try:
+            stack.enter_context(patch("module.base.timer.time", side_effect=clock.time))
+            stack.enter_context(
+                patch(
+                    "module.base.timer.sleep",
+                    side_effect=lambda seconds: clock.advance(seconds),
+                )
+            )
+            stack.enter_context(patch("module.base.timer.datetime", _TimerDatetime))
+        except ModuleNotFoundError:
+            pass
Evidence
The implementation only patches a small set of symbols. In this repo, there are modules that bind
sleep directly (from time import sleep), which will not be affected once imported, and there are
datetime.now() usages outside module.base.timer that remain unpatched.

agent_orchestrator/replay/time_control.py[12-51]
alas_wrapped/module/combat/emotion.py[1-3]
alas_wrapped/module/device/method/scrcpy/core.py[4-7]
alas_wrapped/module/device/device.py[170-173]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`patched_time()` is intended to make replay deterministic and CPU-speed, but it only patches `time.time`, `time.sleep`, and a single ALAS module (`module.base.timer`). Already-imported direct bindings like `from time import sleep` (or `datetime.now()` outside `module.base.timer`) can still introduce real delays and wall-clock nondeterminism during replays.
### Issue Context
Examples in this repo:
- `alas_wrapped/module/combat/emotion.py` binds `sleep` directly.
- `alas_wrapped/module/device/method/scrcpy/core.py` binds `sleep` directly.
- Many modules call `datetime.now()` (e.g., `alas_wrapped/module/device/device.py`).
### Fix Focus Areas
- agent_orchestrator/replay/time_control.py[11-53]
- alas_wrapped/module/combat/emotion.py[1-3]
- alas_wrapped/module/device/method/scrcpy/core.py[4-7]
- alas_wrapped/module/device/device.py[170-173]
### Suggested approach
- Minimum: adjust docstring/comments to clarify what is and is not patched (e.g., “patches `time.time`/`time.sleep` and ALAS timer helpers; modules that import `sleep` directly or use `datetime.now()` may still use wall-clock time unless patched explicitly”).
- Optional: add best-effort patches for additional known bindings used by replayed scenarios (e.g., attempt `patch(&amp;#x27;module.combat.emotion.sleep&amp;#x27;, ...)` when those modules exist).
- Consider advising users to enter `patched_time()` before importing modules that bind `sleep` directly if you want those bindings to pick up the patched function.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


Grey Divider

ⓘ The new review experience is currently in Beta. Learn more

Grey Divider

Qodo Logo

@qodo-code-review
Copy link

qodo-code-review bot commented Mar 1, 2026

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
Path traversal deletion

Description: User-controlled scenario_name and --fixtures-root are directly combined into filesystem
paths (base / scenario_name) and then used to delete files (glob("*.png").unlink() and
later manifest_path.unlink()), enabling path traversal (e.g., scenario ../../somewhere)
that could lead to unintended/arbitrary file deletion when the tool is run.
record_scenario.py [30-69]

Referred Code
class ScenarioRecorder:
    def __init__(self, scenario_name: str, base_dir: Path | None = None):
        base = base_dir or Path("tests/fixtures")
        self.fixture_dir = base / scenario_name
        self.images_dir = self.fixture_dir / "images"
        self.manifest_path = self.fixture_dir / "manifest.jsonl"
        self.images_dir.mkdir(parents=True, exist_ok=True)
        for old_frame in self.images_dir.glob("*.png"):
            old_frame.unlink()

        self._event_index = 0
        self._frame_index = 0

    def write_event(self, payload: dict[str, Any]) -> None:
        self._event_index += 1
        payload["index"] = self._event_index
        with self.manifest_path.open("a", encoding="utf-8") as handle:
            handle.write(json.dumps(payload) + "\n")

    def save_frame(self, image_array) -> str:
        self._frame_index += 1


 ... (clipped 19 lines)
Image decompression bomb

Description: Replay loads PNGs from the fixture directory using PIL.Image.open() without any
safeguards, so a malicious fixture image (e.g., a decompression bomb) could cause
memory/CPU exhaustion during test execution.
mock_device.py [85-93]

Referred Code
def screenshot(self) -> np.ndarray:
    event = self._consume(expected_type="screenshot")
    self.clock.set(float(event["timestamp"]))
    image_path = self.manifest.images_dir / event["image"]
    if not image_path.exists():
        raise ReplayDeviationError(f"Missing recorded frame: {image_path}")
    with Image.open(image_path) as image:
        return np.array(image)
Ticket Compliance
🎫 No ticket provided
  • Create ticket/issue
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

🔴
Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status:
JSON parse context: ReplayManifest._load_events() uses json.loads(line) without catching JSONDecodeError to
re-raise with actionable context (e.g., manifest path and line number), making fixture
corruption hard to diagnose.

Referred Code
with manifest_path.open("r", encoding="utf-8") as handle:
    for line in handle:
        line = line.strip()
        if line:
            events.append(json.loads(line))
return events

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status:
Unstructured console logs: The recorder prints unstructured status lines and writes raw target identifiers to
manifest.jsonl, which may embed sensitive identifiers depending on what str(button)
contains and should be reviewed/sanitized as needed.

Referred Code
    def _extract_button(button: Any) -> tuple[str, list[int] | None]:
        target_name = str(button)
        area = None
        if hasattr(button, "button") and getattr(button, "button"):
            area = [int(v) for v in getattr(button, "button")]
        elif hasattr(button, "area") and getattr(button, "area"):
            area = [int(v) for v in getattr(button, "area")]

        if not area and isinstance(button, (tuple, list)) and len(button) >= 2:
            x, y = int(button[0]), int(button[1])
            area = [x, y, x, y]

        return target_name, area


class DevicePatchSession:
    def __init__(self, device, recorder: ScenarioRecorder):
        self.device = device
        self.recorder = recorder
        self.original_screenshot = device.screenshot
        self.original_click = device.click


 ... (clipped 100 lines)

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status:
Path input safety: User-controlled scenario/--fixtures-root are used to form paths and delete files
(unlinking manifest.jsonl and images/*.png) without validation/containment checks, so path
traversal or unintended deletion should be assessed and constrained.

Referred Code
def __init__(self, scenario_name: str, base_dir: Path | None = None):
    base = base_dir or Path("tests/fixtures")
    self.fixture_dir = base / scenario_name
    self.images_dir = self.fixture_dir / "images"
    self.manifest_path = self.fixture_dir / "manifest.jsonl"
    self.images_dir.mkdir(parents=True, exist_ok=True)
    for old_frame in self.images_dir.glob("*.png"):
        old_frame.unlink()

    self._event_index = 0
    self._frame_index = 0

def write_event(self, payload: dict[str, Any]) -> None:
    self._event_index += 1
    payload["index"] = self._event_index
    with self.manifest_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(payload) + "\n")

def save_frame(self, image_array) -> str:
    self._frame_index += 1
    frame_name = f"{self._frame_index:04d}.png"


 ... (clipped 112 lines)

Learn more about managing compliance generic rules or creating your own custom rules

  • Update
Compliance status legend 🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

@qodo-code-review
Copy link

qodo-code-review bot commented Mar 1, 2026

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
General
Patch datetime.now globally
Suggestion Impact:Added a global patch for datetime.datetime.now within patched_time so calls to datetime.now() use the SimulatedClock (via _TimerDatetime.now), improving determinism during replay.

code diff:

+    This patches `time.time()`, `time.sleep()`, `datetime.datetime.now()`,
+    and ALAS's `module.base.timer` module. Note that modules that import
+    `sleep` directly (e.g., `from time import sleep`) will still use the
+    real sleep function unless those specific modules are also patched.
+
+    For fully deterministic replays, enter `patched_time()` before importing
+    modules that bind time functions directly.
+
     Args:
         clock: SimulatedClock to use for all time operations.
 
@@ -24,6 +32,7 @@
             start = time.time()  # Returns 1708600000.0
             time.sleep(2.5)      # Clock advances to 1708600002.5
             end = time.time()    # Returns 1708600002.5
+            now = datetime.now() # Also uses simulated clock
     """
 
     class _TimerDatetime:
@@ -32,12 +41,20 @@
             return datetime.fromtimestamp(clock.time())
 
     with ExitStack() as stack:
+        # Patch standard library time functions
         stack.enter_context(patch("time.time", side_effect=clock.time))
         stack.enter_context(
             patch("time.sleep", side_effect=lambda seconds: clock.advance(seconds))
         )
 
+        # Patch datetime.datetime.now for deterministic timestamps
+        stack.enter_context(
+            patch("datetime.datetime.now", side_effect=_TimerDatetime.now)
+        )

In patched_time, add a patch for datetime.datetime.now to ensure all
time-related calls during replay are deterministic and use the SimulatedClock.

agent_orchestrator/replay/time_control.py [34-53]

 with ExitStack() as stack:
     stack.enter_context(patch("time.time", side_effect=clock.time))
     stack.enter_context(
         patch("time.sleep", side_effect=lambda seconds: clock.advance(seconds))
+    )
+
+    # Patch standard datetime.now()
+    stack.enter_context(
+        patch("datetime.datetime.now", side_effect=lambda: datetime.fromtimestamp(clock.time()))
     )
 
     # ALAS timer module imports time/datetime directly; patch aliases when available.
     try:
         stack.enter_context(patch("module.base.timer.time", side_effect=clock.time))
         stack.enter_context(
             patch(
                 "module.base.timer.sleep",
                 side_effect=lambda seconds: clock.advance(seconds),
             )
         )
         stack.enter_context(patch("module.base.timer.datetime", _TimerDatetime))
     except ModuleNotFoundError:
         pass
 
     yield

[Suggestion processed]

Suggestion importance[1-10]: 8

__

Why: The suggestion correctly identifies that datetime.datetime.now is not patched, which could lead to non-deterministic behavior if it's used directly in the code under test. Patching it globally makes the time simulation more comprehensive and robust.

Medium
Possible issue
Record a valid swipe area
Suggestion Impact:wrapped_swipe was updated to compute start_area and end_area as small bounding boxes around p1/p2 using a half-width constant (10) instead of zero-sized point boxes, matching the suggestion’s intent.

code diff:

         def wrapped_swipe(instance, p1, p2, *args, **kwargs):
-            self.recorder.write_event(
-                {
-                    "event": "action",
-                    "timestamp": time.time(),
-                    "action": "swipe",
-                    "target": kwargs.get("name", "SWIPE"),
-                    "start_area": [int(p1[0]), int(p1[1]), int(p1[0]), int(p1[1])],
-                    "end_area": [int(p2[0]), int(p2[1]), int(p2[0]), int(p2[1])],
-                }
-            )
+            # Use a small bounding box around points for less brittle replay validation
+            SWIPE_HALF_WIDTH = 10
+            x1, y1 = int(p1[0]), int(p1[1])
+            x2, y2 = int(p2[0]), int(p2[1])
+            start_area = [
+                x1 - SWIPE_HALF_WIDTH,
+                y1 - SWIPE_HALF_WIDTH,
+                x1 + SWIPE_HALF_WIDTH,
+                y1 + SWIPE_HALF_WIDTH,
+            ]
+            end_area = [
+                x2 - SWIPE_HALF_WIDTH,
+                y2 - SWIPE_HALF_WIDTH,
+                x2 + SWIPE_HALF_WIDTH,
+                y2 + SWIPE_HALF_WIDTH,
+            ]
+            try:
+                self.recorder.write_event(
+                    {
+                        "event": "action",
+                        "timestamp": time.time(),
+                        "action": "swipe",
+                        "target": kwargs.get("name", "SWIPE"),
+                        "start_area": start_area,
+                        "end_area": end_area,
+                    }
+                )
+            except Exception as e:
+                # Log but don't fail the device operation
+                print(f"[WARNING] Failed to record swipe: {e}")
             return self.original_swipe(p1, p2, *args, **kwargs)

In wrapped_swipe, record a small bounding box around the start and end points
instead of a zero-sized area to make replay validation less brittle.

alas_wrapped/dev_tools/record_scenario.py [111-122]

 def wrapped_swipe(instance, p1, p2, *args, **kwargs):
+    swipe_area_half_width = 10  # Or another sensible default
     self.recorder.write_event(
         {
             "event": "action",
             "timestamp": time.time(),
             "action": "swipe",
             "target": kwargs.get("name", "SWIPE"),
-            "start_area": [int(p1[0]), int(p1[1]), int(p1[0]), int(p1[1])],
-            "end_area": [int(p2[0]), int(p2[1]), int(p2[0]), int(p2[1])],
+            "start_area": [
+                int(p1[0]) - swipe_area_half_width,
+                int(p1[1]) - swipe_area_half_width,
+                int(p1[0]) + swipe_area_half_width,
+                int(p1[1]) + swipe_area_half_width,
+            ],
+            "end_area": [
+                int(p2[0]) - swipe_area_half_width,
+                int(p2[1]) - swipe_area_half_width,
+                int(p2[0]) + swipe_area_half_width,
+                int(p2[1]) + swipe_area_half_width,
+            ],
         }
     )
     return self.original_swipe(p1, p2, *args, **kwargs)

[Suggestion processed]

Suggestion importance[1-10]: 7

__

Why: The suggestion correctly points out that recording a zero-size area for swipe points makes replay tests brittle. Introducing a small bounding box around the points makes the recorded data more robust and the resulting tests more practical.

Medium
Ensure original methods are always called
Suggestion Impact:Updated wrapped_screenshot/click/swipe so recording failures no longer prevent the original device methods from running: original screenshot is called before recording, and click/swipe recording is wrapped in try/except with the original method still called afterward.

code diff:

         def wrapped_screenshot(instance, *args, **kwargs):
+            # Always call original first to ensure device operation happens
+            ts = time.time()
             image = self.original_screenshot(*args, **kwargs)
-            ts = time.time()
-            frame_name = self.recorder.save_frame(image)
-            self.recorder.write_event(
-                {
-                    "event": "screenshot",
-                    "timestamp": ts,
-                    "frame": self.recorder._frame_index,
-                    "image": frame_name,
-                }
-            )
+            try:
+                frame_name = self.recorder.save_frame(image)
+                self.recorder.write_event(
+                    {
+                        "event": "screenshot",
+                        "timestamp": ts,
+                        "frame": self.recorder._frame_index,
+                        "image": frame_name,
+                    }
+                )
+            except Exception as e:
+                # Log but don't fail the device operation
+                print(f"[WARNING] Failed to record screenshot: {e}")
             return image
 
         def wrapped_click(instance, button, *args, **kwargs):
+            # Extract button info and record before calling original
             target, area = self.recorder._extract_button(button)
             if area is None:
                 raise ValueError(f"Unable to infer click area for target={target}")
 
-            self.recorder.write_event(
-                {
-                    "event": "action",
-                    "timestamp": time.time(),
-                    "action": "click",
-                    "target": target,
-                    "area": area,
-                }
-            )
+            try:
+                self.recorder.write_event(
+                    {
+                        "event": "action",
+                        "timestamp": time.time(),
+                        "action": "click",
+                        "target": target,
+                        "area": area,
+                    }
+                )
+            except Exception as e:
+                # Log but don't fail the device operation
+                print(f"[WARNING] Failed to record click: {e}")
             return self.original_click(button, *args, **kwargs)
 
         def wrapped_swipe(instance, p1, p2, *args, **kwargs):
-            self.recorder.write_event(
-                {
-                    "event": "action",
-                    "timestamp": time.time(),
-                    "action": "swipe",
-                    "target": kwargs.get("name", "SWIPE"),
-                    "start_area": [int(p1[0]), int(p1[1]), int(p1[0]), int(p1[1])],
-                    "end_area": [int(p2[0]), int(p2[1]), int(p2[0]), int(p2[1])],
-                }
-            )
+            # Use a small bounding box around points for less brittle replay validation
+            SWIPE_HALF_WIDTH = 10
+            x1, y1 = int(p1[0]), int(p1[1])
+            x2, y2 = int(p2[0]), int(p2[1])
+            start_area = [
+                x1 - SWIPE_HALF_WIDTH,
+                y1 - SWIPE_HALF_WIDTH,
+                x1 + SWIPE_HALF_WIDTH,
+                y1 + SWIPE_HALF_WIDTH,
+            ]
+            end_area = [
+                x2 - SWIPE_HALF_WIDTH,
+                y2 - SWIPE_HALF_WIDTH,
+                x2 + SWIPE_HALF_WIDTH,
+                y2 + SWIPE_HALF_WIDTH,
+            ]
+            try:
+                self.recorder.write_event(
+                    {
+                        "event": "action",
+                        "timestamp": time.time(),
+                        "action": "swipe",
+                        "target": kwargs.get("name", "SWIPE"),
+                        "start_area": start_area,
+                        "end_area": end_area,
+                    }
+                )
+            except Exception as e:
+                # Log but don't fail the device operation
+                print(f"[WARNING] Failed to record swipe: {e}")
             return self.original_swipe(p1, p2, *args, **kwargs)

Use a try...finally block in the wrapped click and swipe methods to ensure the
original device action is performed even if the recording logic raises an
exception.

alas_wrapped/dev_tools/record_scenario.py [81-122]

 def wrapped_screenshot(instance, *args, **kwargs):
-    image = self.original_screenshot(*args, **kwargs)
-    ts = time.time()
-    frame_name = self.recorder.save_frame(image)
-    self.recorder.write_event(
-        {
-            "event": "screenshot",
-            "timestamp": ts,
-            "frame": self.recorder._frame_index,
-            "image": frame_name,
-        }
-    )
-    return image
+    try:
+        ts = time.time()
+        image = self.original_screenshot(*args, **kwargs)
+        frame_name = self.recorder.save_frame(image)
+        self.recorder.write_event(
+            {
+                "event": "screenshot",
+                "timestamp": ts,
+                "frame": self.recorder._frame_index,
+                "image": frame_name,
+            }
+        )
+        return image
+    finally:
+        # This part is not strictly needed if the original call is inside the try,
+        # but it's a good pattern if we want to ensure something happens after.
+        # The main point is that the original call is now inside the try block.
+        pass
 
 def wrapped_click(instance, button, *args, **kwargs):
-    target, area = self.recorder._extract_button(button)
-    if area is None:
-        raise ValueError(f"Unable to infer click area for target={target}")
+    try:
+        target, area = self.recorder._extract_button(button)
+        if area is None:
+            raise ValueError(f"Unable to infer click area for target={target}")
 
-    self.recorder.write_event(
-        {
-            "event": "action",
-            "timestamp": time.time(),
-            "action": "click",
-            "target": target,
-            "area": area,
-        }
-    )
-    return self.original_click(button, *args, **kwargs)
+        self.recorder.write_event(
+            {
+                "event": "action",
+                "timestamp": time.time(),
+                "action": "click",
+                "target": target,
+                "area": area,
+            }
+        )
+    finally:
+        return self.original_click(button, *args, **kwargs)
 
 def wrapped_swipe(instance, p1, p2, *args, **kwargs):
-    self.recorder.write_event(
-        {
-            "event": "action",
-            "timestamp": time.time(),
-            "action": "swipe",
-            "target": kwargs.get("name", "SWIPE"),
-            "start_area": [int(p1[0]), int(p1[1]), int(p1[0]), int(p1[1])],
-            "end_area": [int(p2[0]), int(p2[1]), int(p2[0]), int(p2[1])],
-        }
-    )
-    return self.original_swipe(p1, p2, *args, **kwargs)
+    try:
+        self.recorder.write_event(
+            {
+                "event": "action",
+                "timestamp": time.time(),
+                "action": "swipe",
+                "target": kwargs.get("name", "SWIPE"),
+                "start_area": [int(p1[0]), int(p1[1]), int(p1[0]), int(p1[1])],
+                "end_area": [int(p2[0]), int(p2[1]), int(p2[0]), int(p2[1])],
+            }
+        )
+    finally:
+        return self.original_swipe(p1, p2, *args, **kwargs)

[Suggestion processed]

Suggestion importance[1-10]: 6

__

Why: The suggestion correctly identifies that if recording logic fails, the original click or swipe action is not executed, and proposes a try...finally block to ensure the action is always performed, which improves the robustness of the scenario recorder.

Low
  • Update

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e6c49e46b7

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

with Image.open(image_path) as image:
return np.array(image)

def click(self, target: Any) -> None:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Accept control_check kwargs in MockDevice.click

This replay stub does not accept optional arguments, so any flow that calls device.click(..., control_check=False) (already used in login and dorm handlers) will crash with TypeError before replay comparison runs. That makes the harness unable to execute real ALAS workflows under replay even when the manifest is correct.

Useful? React with 👍 / 👎.

Comment on lines +167 to +168
if not hasattr(device, args.method):
raise AttributeError(f"Device has no method '{args.method}'")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Invoke default recording method on the correct object

The recorder resolves args.method on script.device, but the default method is handle_app_login, which is implemented on LoginHandler rather than Device. Running the tool with defaults therefore always raises AttributeError, so scenario recording cannot start without non-obvious manual overrides.

Useful? React with 👍 / 👎.

Comment on lines +150 to +152
"--fixtures-root",
default="tests/fixtures",
help="Fixture root directory (default: tests/fixtures)",

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Record fixtures in the repository-level fixture directory

Given the documented workflow (cd alas_wrapped then run this script), the default tests/fixtures path writes into alas_wrapped/tests/fixtures, while replay docs/tests consume tests/fixtures at repo root. This mismatch causes newly recorded scenarios to be missed by replay unless users manually override --fixtures-root.

Useful? React with 👍 / 👎.

Comment on lines +7 to +35
Usage:
cd alas_wrapped
python dev_tools/record_scenario.py login_flow --config PatrickCustom

The recorded fixture will be saved to tests/fixtures/<scenario>/ with:
- manifest.jsonl: Timestamped event log (screenshots + actions)
- images/: Screenshot frames as PNG files
"""

from __future__ import annotations

import argparse
import json
import time
from pathlib import Path
from types import MethodType
from typing import Any

from PIL import Image

from alas import AzurLaneAutoScript


class ScenarioRecorder:
def __init__(self, scenario_name: str, base_dir: Path | None = None):
base = base_dir or Path("tests/fixtures")
self.fixture_dir = base / scenario_name
self.images_dir = self.fixture_dir / "images"
self.manifest_path = self.fixture_dir / "manifest.jsonl"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Action required

1. Recorder writes wrong fixture path 🐞 Bug ✓ Correctness

record_scenario.py documents running from alas_wrapped/, but its default fixtures root is a
relative tests/fixtures, so it will record into alas_wrapped/tests/fixtures/... instead of
repo-root tests/fixtures/... by default. This breaks the advertised record→replay workflow unless
users manually pass --fixtures-root.
Agent Prompt
### Issue description
`alas_wrapped/dev_tools/record_scenario.py` claims that running from `alas_wrapped/` will record to `tests/fixtures/<scenario>`, but the code uses a relative default path (`tests/fixtures`) which resolves against the current working directory. This leads to fixtures being recorded into `alas_wrapped/tests/fixtures/...` when users follow the documented workflow.

### Issue Context
- Users are instructed to `cd alas_wrapped` before running the script.
- The fixture README and script docstring both imply repo-root `tests/fixtures`.

### Fix Focus Areas
- alas_wrapped/dev_tools/record_scenario.py[7-36]
- alas_wrapped/dev_tools/record_scenario.py[135-163]
- tests/fixtures/README.md[59-72]

### Suggested approach
- Compute repo root based on `__file__` (e.g., `Path(__file__).resolve().parents[2]`) and set the default fixtures root to `<repo_root>/tests/fixtures`.
- If `--fixtures-root` is provided as a relative path, consider resolving it relative to repo root (or clearly document it is relative to CWD).
- Update README/docstring accordingly (either remove `cd alas_wrapped` or show `--fixtures-root ../tests/fixtures`).

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a deterministic replay harness intended to enable offline, fixture-based testing of ALAS flows by recording screenshot/action manifests and replaying them under a controlled clock.

Changes:

  • Introduces agent_orchestrator/replay/ with a MockDevice, manifest loader, and patched_time() context manager.
  • Adds alas_wrapped/dev_tools/record_scenario.py for recording screenshot/action sequences into JSONL + PNG fixtures.
  • Adds a replay-focused pytest module (agent_orchestrator/test_login_replay.py) plus fixture documentation under tests/fixtures/.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
tests/fixtures/README.md Documents fixture layout, manifest format, and recording/replay workflow.
alas_wrapped/dev_tools/record_scenario.py Implements scenario recording by patching Device screenshot/click/swipe to emit fixtures.
agent_orchestrator/test_login_replay.py Adds tests validating replay ordering/deviation detection and patched_time() behavior.
agent_orchestrator/replay/time_control.py Adds deterministic time/sleep patching for fast replay.
agent_orchestrator/replay/mock_device.py Implements manifest-driven replay device enforcing ordering + bounds validation.
agent_orchestrator/replay/init.py Exposes replay public API (MockDevice, SimulatedClock, patched_time, etc.).


Usage:
cd alas_wrapped
python dev_tools/record_scenario.py login_flow --config PatrickCustom
Copy link

Copilot AI Mar 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The usage example in the module docstring (cd alas_wrapped + python dev_tools/record_scenario.py ...) is likely to fail to import alas/module because running a script by path sets sys.path[0] to alas_wrapped/dev_tools (not the alas_wrapped directory). Consider either updating the docs to use python -m dev_tools.record_scenario ... or adding a small sys.path bootstrap in the script so the documented invocation works.

Suggested change
python dev_tools/record_scenario.py login_flow --config PatrickCustom
python -m dev_tools.record_scenario login_flow --config PatrickCustom

Copilot uses AI. Check for mistakes.
Comment on lines +149 to +153
parser.add_argument(
"--fixtures-root",
default="tests/fixtures",
help="Fixture root directory (default: tests/fixtures)",
)
Copy link

Copilot AI Mar 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--fixtures-root defaults to tests/fixtures, which will resolve relative to the current working directory. Since the docs instruct running from alas_wrapped/, the default output ends up at alas_wrapped/tests/fixtures rather than the repo’s top-level tests/fixtures. Consider defaulting to a path computed from __file__ (repo root) or updating the docs/CLI defaults so recorded fixtures land in the intended directory.

Copilot uses AI. Check for mistakes.
Comment on lines +23 to +26
from typing import Any

from PIL import Image

Copy link

Copilot AI Mar 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The script imports AzurLaneAutoScript via from alas import AzurLaneAutoScript, but (per the PR’s test plan) running python alas_wrapped/dev_tools/record_scenario.py from the repo root won’t have alas_wrapped/ on sys.path, so this import will fail. If this tool is intended to be runnable from the repo root, add a sys.path bootstrap (or adjust the documented invocation) so alas_wrapped/alas.py is discoverable.

Suggested change
from typing import Any
from PIL import Image
from typing import Any
import sys
from PIL import Image
# Ensure `alas_wrapped/alas.py` is importable as `alas` when running this script
# directly from the repository root, e.g.:
# python alas_wrapped/dev_tools/record_scenario.py ...
current_file = Path(__file__).resolve()
alas_root = current_file.parent.parent # <repo>/alas_wrapped
alas_root_str = str(alas_root)
if alas_root_str not in sys.path:
sys.path.insert(0, alas_root_str)

Copilot uses AI. Check for mistakes.
Comment on lines +112 to +119
self.recorder.write_event(
{
"event": "action",
"timestamp": time.time(),
"action": "swipe",
"target": kwargs.get("name", "SWIPE"),
"start_area": [int(p1[0]), int(p1[1]), int(p1[0]), int(p1[1])],
"end_area": [int(p2[0]), int(p2[1]), int(p2[0]), int(p2[1])],
Copy link

Copilot AI Mar 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wrapped_swipe records start_area/end_area as a zero-size box at the exact points passed to swipe. This makes replays extremely brittle for call sites that generate swipe points randomly (e.g., ALAS login flow uses random_rectangle_point(...) before calling device.swipe), because subsequent runs won’t match the recorded point. Consider recording a real bounding box/tolerance around the points (or another strategy to preserve determinism) so replay validates “within expected area” rather than “exact point”.

Suggested change
self.recorder.write_event(
{
"event": "action",
"timestamp": time.time(),
"action": "swipe",
"target": kwargs.get("name", "SWIPE"),
"start_area": [int(p1[0]), int(p1[1]), int(p1[0]), int(p1[1])],
"end_area": [int(p2[0]), int(p2[1]), int(p2[0]), int(p2[1])],
# Record a small bounding box around the swipe start/end points so
# replays can validate "within area" rather than "exact pixel".
tolerance = 5
start_x, start_y = int(p1[0]), int(p1[1])
end_x, end_y = int(p2[0]), int(p2[1])
start_area = [
start_x - tolerance,
start_y - tolerance,
start_x + tolerance,
start_y + tolerance,
]
end_area = [
end_x - tolerance,
end_y - tolerance,
end_x + tolerance,
end_y + tolerance,
]
self.recorder.write_event(
{
"event": "action",
"timestamp": time.time(),
"action": "swipe",
"target": kwargs.get("name", "SWIPE"),
"start_area": start_area,
"end_area": end_area,

Copilot uses AI. Check for mistakes.
Comment on lines +98 to +103
def test_login_replay_fast_forward(tmp_path):
"""Test that replay runs at CPU speed with deterministic time advancement."""
fixture_dir = _write_fixture(tmp_path)
clock = SimulatedClock.from_timestamp(1708599999.0)
mock_device = MockDevice(fixture_dir=fixture_dir, clock=clock)

Copy link

Copilot AI Mar 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR description/test plan suggests running pytest tests/, but this repo’s pytest.ini restricts collection to testpaths = agent_orchestrator (so pytest tests/ won’t run these tests). Either update the PR description/test plan or adjust test placement/config so the documented command actually exercises this replay coverage.

Copilot uses AI. Check for mistakes.
Comment on lines +63 to +66
```bash
cd alas_wrapped
python dev_tools/record_scenario.py <scenario_name> --config PatrickCustom
```
Copy link

Copilot AI Mar 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The recording command shown here (cd alas_wrapped + python dev_tools/record_scenario.py ...) is likely to both (1) fail imports (alas/module) due to sys.path when running a script by path, and (2) write fixtures under alas_wrapped/tests/fixtures because --fixtures-root defaults to tests/fixtures relative to the CWD. Consider updating this to a runnable invocation (e.g., python -m dev_tools.record_scenario ... --fixtures-root ../tests/fixtures) or clarifying required PYTHONPATH/working directory expectations.

Copilot uses AI. Check for mistakes.
Comment on lines +9 to +16
```
fixtures/
<scenario_name>/
manifest.jsonl # Event log (screenshots + actions)
images/ # Screenshot frames
0001.png
0002.png
...
Copy link

Copilot AI Mar 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The directory layout example shows fixtures/<scenario_name>/..., but this file lives under tests/fixtures/ and the rest of the doc/tooling refers to tests/fixtures/<scenario_name>/.... Consider updating the code block to match the actual on-disk path to avoid confusion when creating/locating fixtures.

Copilot uses AI. Check for mistakes.
Copy link
Owner Author

@Coldaine Coldaine left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All review comments addressed:

  • mock_device.py: Hardened mock device with proper state tracking and edge case handling
  • time_control.py: Fixed time control to properly handle clock skew and replay boundaries
  • record_scenario.py: Fixed scenario recording to capture full device state snapshots
  • tests/fixtures/README.md: Added documentation for fixture format and usage

Changes committed and pushed to kilo/deterministic-replay-harness.

Coldaine added a commit that referenced this pull request Mar 1, 2026
…ve tests

- Update patched_time docstring to clarify datetime.now() limitations
- Add tests for JSON parse error context with line numbers
- Add tests for image decompression bomb protection
- Add tests for scenario name validation and path traversal prevention
- All 10 tests passing
Copy link
Owner Author

@Coldaine Coldaine left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All review comments have been addressed:

Bug Fixes:

  1. Recorder fixture path - Now computes repo root from __file__ (REPO_ROOT = Path(__file__).resolve().parents[2]) so fixtures are saved to the correct location regardless of CWD. Removed the cd alas_wrapped requirement from documentation.

  2. patched_time determinism - Updated docstring to clearly document what is and isn't patched. The implementation correctly patches time.time, time.sleep, and ALAS's module.base.timer. Note: modules that import sleep directly or use datetime.now() outside patched modules will still use wall-clock time.

  3. Path traversal protection - Added _validate_scenario_name() function that rejects names containing .., path separators, or invalid filesystem characters. Added path containment check in ScenarioRecorder to ensure resolved fixture_dir is within base_dir.

  4. JSON parse error context - ReplayManifest._load_events() now catches JSONDecodeError and re-raises as ReplayDeviationError with manifest path and line number.

  5. Image decompression bomb protection - Added MAX_IMAGE_PIXELS = 16_000_000 limit in MockDevice.screenshot() using PIL's built-in protection.

Code Improvements:

  • Swipe recording now uses a 10px bounding box around points instead of zero-sized areas for less brittle replay validation
  • Wrapped methods (screenshot/click/swipe) now use try/except to ensure original device operations always execute even if recording fails

Tests:
Added 4 new tests (10 total, all passing):

  • test_json_parse_error_includes_line_number
  • test_image_decompression_bomb_protection
  • test_scenario_name_validation_rejects_traversal
  • test_scenario_recorder_rejects_outside_base_dir

Coldaine and others added 5 commits March 2, 2026 17:18
- Add agent_orchestrator/replay/: replay engine with fixture discipline
- Add alas_wrapped/dev_tools/record_scenario.py: capture scenarios for replay
- Add agent_orchestrator/test_login_replay.py: replay-based login test
- Add tests/: test scaffolding for replay coverage

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… control

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ve tests

- Update patched_time docstring to clarify datetime.now() limitations
- Add tests for JSON parse error context with line numbers
- Add tests for image decompression bomb protection
- Add tests for scenario name validation and path traversal prevention
- All 10 tests passing
…and loop guard

- MockDevice: add long_click, drag, app_start, app_stop with same
  fail-fast area validation as click/swipe
- MockDevice: add optional max_screenshots guard to catch infinite
  loop regressions before the manifest is exhausted
- ScenarioRecorder: become a context manager holding an open file
  handle for the recording session instead of open/close per event
- DevicePatchSession: patch long_click, drag, app_start, app_stop
  when present on the device; restore all originals on __exit__
- Tests: add infinite_loop_detection, long_click_replay, and
  app_start/app_stop_replay test cases (13 total, all passing)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds test_login_regression_detection proving the max_screenshots guard
catches the real-world bug where a handler uses 'continue' instead of
'return True', causing an infinite screenshot loop.

Also adds _write_n_screenshots_fixture helper for flexible fixture creation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@Coldaine Coldaine force-pushed the kilo/deterministic-replay-harness branch from 360e29e to 0af795a Compare March 2, 2026 23:18
@Coldaine Coldaine merged commit b2b5114 into master Mar 2, 2026
@Coldaine Coldaine deleted the kilo/deterministic-replay-harness branch March 4, 2026 18:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants