Deterministic Replay Testing Harness for ALAS #30

Closed
Coldaine wants to merge 1 commit into master from feature/deterministic-replay-harness-13419231669082630945

@Coldaine
Owner

This PR introduces a Deterministic Replay Testing Harness for ALAS. Developers can record a "golden path" of bot execution (every screenshot and the action that followed it) and replay it offline against a Mock Device. State machine logic changes can then be checked for regressions without a live emulator, which shortens the development and testing cycle.

Key components:

  • Recorder: Captures the live run state.
  • MockDevice: Simulates the emulator by feeding recorded screenshots and verifying bot actions.
  • --record flag: Easy opt-in for recording sessions.
  • Integration tests: Examples of using the harness to catch real-world logic bugs.
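
To make the replay idea concrete, here is a minimal sketch of how a mock device could feed recorded screenshots to the bot and compare the bot's actions with the recorded ones. It is not the `MockDevice` from this PR: the `ReplayStep` type, the `mismatches` helper, the frame names, and the button names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from itertools import zip_longest


@dataclass
class ReplayStep:
    """One recorded step: the frame the bot saw and the action it then took."""
    screenshot: str  # hypothetical: identifier of the recorded frame
    action: tuple    # hypothetical: e.g. ("click", "LOGIN")


@dataclass
class MockDevice:
    """Illustrative stand-in for an emulator (assumed design, not the PR's class)."""
    steps: list
    cursor: int = 0
    actual_actions: list = field(default_factory=list)

    def screenshot(self):
        # Serve the recorded frame instead of querying a live emulator.
        return self.steps[self.cursor].screenshot

    def click(self, target):
        # Track what the bot actually did, then advance to the next frame.
        self.actual_actions.append(("click", target))
        self.cursor = min(self.cursor + 1, len(self.steps) - 1)

    def mismatches(self):
        # Pair recorded and actual actions by position and keep the differences.
        expected = [step.action for step in self.steps]
        pairs = zip_longest(expected, self.actual_actions)
        return [(i, e, a) for i, (e, a) in enumerate(pairs) if e != a]


device = MockDevice([
    ReplayStep("frame_000.png", ("click", "LOGIN")),
    ReplayStep("frame_001.png", ("click", "MAIN_MENU")),
])
first_frame = device.screenshot()
device.click("LOGIN")    # matches the recording
second_frame = device.screenshot()
device.click("WRONG")    # a regression: the recording expected MAIN_MENU
report = device.mismatches()
```

In a test, an empty `report` would mean the bot reproduced the golden path; here it holds one entry for step 1.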

Fixes #29


PR created automatically by Jules for task 13419231669082630945 started by @Coldaine

Summary of changes:
- Developed a `Recorder` module in `alas_wrapped/module/device/recorder.py` to capture screenshots and log bot actions (clicks, swipes, etc.) into a JSONL manifest.
- Integrated the `Recorder` into the core `Device` class in `alas_wrapped/module/device/device.py` via hooks in action methods.
- Added support for a `--record` flag in `alas.py`, `gui.py`, and `process_manager.py` to enable session recording from CLI or Web UI.
- Implemented a `MockDevice` in `alas_wrapped/tests/mock_device.py` that replays recorded screenshots and tracks actual actions for verification.
- Added deterministic time control to `MockDevice` by allowing virtual time to be synchronized with recorded timestamps.
- Created test suites in `alas_wrapped/tests/` to verify the replay harness and demonstrate its use in detecting state machine regressions (e.g., the Login Regression).
- Verified that all new tests and existing `agent_orchestrator` tests pass in the environment.
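
As a rough illustration of the JSONL manifest and the virtual-time idea described above, the sketch below appends one JSON object per action and then advances a virtual clock to each recorded timestamp during replay. The field names (`timestamp`, `action`, `screenshot`), the helper functions, and the `VirtualClock` class are assumptions for this example, not the format or API used in `recorder.py` or `mock_device.py`.

```python
import json
import tempfile
from pathlib import Path


def append_action(manifest_path, timestamp, action, **params):
    """Append one action as a single JSON line (hypothetical manifest schema)."""
    entry = {"timestamp": timestamp, "action": action, **params}
    with open(manifest_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")


def load_manifest(manifest_path):
    """Read the manifest back as a list of dicts, skipping blank lines."""
    with open(manifest_path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


class VirtualClock:
    """Clock that moves only when told to, so replay does not depend on wall time."""

    def __init__(self, start=0.0):
        self.now = start

    def time(self):
        return self.now

    def sync_to(self, timestamp):
        # Jump forward to the recorded timestamp; never move backwards.
        self.now = max(self.now, timestamp)


manifest = Path(tempfile.mkdtemp()) / "manifest.jsonl"
append_action(manifest, 100.0, "click", x=10, y=20, screenshot="frame_000.png")
append_action(manifest, 101.5, "swipe", start=[0, 0], end=[50, 0],
              screenshot="frame_001.png")

entries = load_manifest(manifest)
clock = VirtualClock(start=100.0)
for entry in entries:
    clock.sync_to(entry["timestamp"])
```

One line per action keeps the manifest appendable during a live run and easy to diff between recordings.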

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
@google-labs-jules

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.


For security, I will only act on instructions from the user who triggered this task.

@Coldaine
Owner Author

Superseded by PR #31 which targets trunk/stabilization, uses context-manager recorder, and has complete clock patching.
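
For readers unfamiliar with the two techniques named here, the following is a generic sketch of a context-manager recorder and of clock patching with `unittest.mock.patch`. It does not reproduce PR #31: the `recording` and `patched_clock` names are hypothetical, and this version patches only `time.time` and `time.sleep`, so code that uses `time.monotonic`, `datetime`, or `from time import time` would not be covered.

```python
import time
from contextlib import contextmanager
from unittest import mock


@contextmanager
def recording(log):
    """Hypothetical recorder: the session is closed even if the body raises."""
    log.append("start")
    try:
        yield log
    finally:
        log.append("stop")


@contextmanager
def patched_clock(start):
    """Replace time.time and time.sleep so sleeping advances virtual time instantly."""
    state = {"now": start}

    def fake_time():
        return state["now"]

    def fake_sleep(seconds):
        state["now"] += seconds

    with mock.patch("time.time", fake_time), mock.patch("time.sleep", fake_sleep):
        yield state


log = []
with recording(log) as session:
    session.append("click")

with patched_clock(1000.0):
    time.sleep(30)               # returns immediately
    virtual_now = time.time()    # reads the virtual clock
```

The `try`/`finally` in `recording` is what a context manager buys over manual start and stop calls: the recording is finalized on every exit path.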

@Coldaine Coldaine closed this Feb 23, 2026

Development

Successfully merging this pull request may close these issues.

Feature: Deterministic Replay Testing Harness

1 participant