Deterministic Replay Testing Harness for ALAS by Coldaine · Pull Request #30 · Coldaine/ALAS

Coldaine · 2026-02-23T05:54:12Z

This PR introduces a Deterministic Replay Testing Harness for ALAS. This system allows developers to record a "golden path" of bot execution (including all screenshots and subsequent actions) and replay it offline using a Mock Device. This ensures that state machine logic changes can be validated for correctness and regressions without needing a live emulator, significantly speeding up the development and testing cycle.

Key components:

Recorder: Captures the live run state.
MockDevice: Simulates the emulator by feeding recorded screenshots and verifying bot actions.
--record flag: Easy opt-in for recording sessions.
Integration tests: Examples of using the harness to catch real-world logic bugs.
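
To make the replay idea concrete, here is a minimal sketch of how a mock device could feed recorded frames to bot logic and collect the resulting actions for comparison. It is not the code from this PR: the `MockDevice` fields, the `run_login_logic` function, the string frames (standing in for real screenshots), and the click coordinates are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MockDevice:
    """Hypothetical stand-in for an emulator: replays frames, logs actions."""

    frames: list                                   # recorded screenshots, in capture order
    actions: list = field(default_factory=list)    # actions the bot performed during replay
    _cursor: int = 0                               # index of the next frame to serve

    def screenshot(self):
        # Serve the next recorded frame instead of querying a live emulator.
        if self._cursor >= len(self.frames):
            raise IndexError("replay exhausted: more frames requested than recorded")
        frame = self.frames[self._cursor]
        self._cursor += 1
        return frame

    def click(self, x, y):
        # Nothing is sent anywhere; the action is only logged for later comparison.
        self.actions.append({"type": "click", "x": x, "y": y})


def run_login_logic(device):
    """Toy state machine standing in for real bot logic (not ALAS code)."""
    while True:
        try:
            frame = device.screenshot()
        except IndexError:
            return  # recording ended before the main menu was reached
        if frame == "login_screen":
            device.click(640, 360)
        elif frame == "main_menu":
            return


# Actions captured during the original "golden path" run (assumed values).
expected = [{"type": "click", "x": 640, "y": 360}]

device = MockDevice(frames=["loading", "login_screen", "main_menu"])
run_login_logic(device)
assert device.actions == expected
```

If a later change to the state machine clicked somewhere else, or skipped the login click, the final comparison would fail without an emulator ever being started.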

Fixes #29

PR created automatically by Jules for task 13419231669082630945 started by @Coldaine

Summary of changes:

- Developed a `Recorder` module in `alas_wrapped/module/device/recorder.py` to capture screenshots and log bot actions (clicks, swipes, etc.) into a JSONL manifest.
- Integrated the `Recorder` into the core `Device` class in `alas_wrapped/module/device/device.py` via hooks in action methods.
- Added support for a `--record` flag in `alas.py`, `gui.py`, and `process_manager.py` to enable session recording from CLI or Web UI.
- Implemented a `MockDevice` in `alas_wrapped/tests/mock_device.py` that replays recorded screenshots and tracks actual actions for verification.
- Added deterministic time control to `MockDevice` by allowing virtual time to be synchronized with recorded timestamps.
- Created test suites in `alas_wrapped/tests/` to verify the replay harness and demonstrate its use in detecting state machine regressions (e.g., the Login Regression).
- Verified that all new tests and existing `agent_orchestrator` tests pass in the environment.

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
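
As an illustration of the JSONL manifest described in the summary, the following sketch appends one JSON object per line for each screenshot and action. The file name `manifest.jsonl`, the field names (`event`, `file`, `kind`, `params`, `ts`), and the method names are assumptions; the actual schema in `recorder.py` may differ.

```python
import json
import tempfile
import time
from pathlib import Path


class Recorder:
    """Hypothetical recorder: one JSON object per line, in the order events occur."""

    def __init__(self, out_dir):
        self.out_dir = Path(out_dir)
        self.out_dir.mkdir(parents=True, exist_ok=True)
        self.manifest = self.out_dir / "manifest.jsonl"  # assumed file name
        self._frame_count = 0

    def _append(self, entry):
        # Stamp every entry so a replay can later reconstruct the timing.
        entry["ts"] = time.time()
        with self.manifest.open("a", encoding="utf-8") as f:
            f.write(json.dumps(entry) + "\n")

    def record_screenshot(self, png_bytes):
        # Store the image next to the manifest and reference it by file name.
        name = f"{self._frame_count:06d}.png"
        (self.out_dir / name).write_bytes(png_bytes)
        self._frame_count += 1
        self._append({"event": "screenshot", "file": name})

    def record_action(self, kind, **params):
        self._append({"event": "action", "kind": kind, "params": params})


def load_manifest(path):
    """Read the manifest back as a list of dicts, skipping blank lines."""
    with Path(path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    rec = Recorder(tmp)
    rec.record_screenshot(b"placeholder image bytes")
    rec.record_action("click", x=640, y=360)
    entries = load_manifest(rec.manifest)
```

An append-only, line-oriented format keeps a partially written recording readable if the session crashes, which is one plausible reason to prefer JSONL over a single JSON document here.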

google-labs-jules · 2026-02-23T05:54:14Z

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.

For security, I will only act on instructions from the user who triggered this task.

Coldaine · 2026-02-23T08:08:18Z

Superseded by PR #31 which targets trunk/stabilization, uses context-manager recorder, and has complete clock patching.
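
For readers unfamiliar with the two ideas mentioned above, here is a rough sketch of clock patching wrapped in a context manager. It is not taken from PR #31; the `replay_clock` helper and its `advance` callback are hypothetical, and only `time.time` and `time.sleep` are patched.

```python
import time
from contextlib import contextmanager
from unittest import mock


@contextmanager
def replay_clock(timestamps):
    """Patch time.time and time.sleep so code under test sees recorded time."""
    ticks = iter(timestamps)
    state = {"now": next(ticks)}

    def fake_time():
        return state["now"]

    def fake_sleep(seconds):
        # Advance virtual time instead of blocking the test.
        state["now"] += seconds

    def advance():
        # Jump to the next recorded timestamp.
        state["now"] = next(ticks)

    with mock.patch("time.time", fake_time), mock.patch("time.sleep", fake_sleep):
        yield advance
    # Both patches are undone here, even if the body raised.


with replay_clock([100.0, 100.5, 103.0]) as advance:
    start = time.time()
    time.sleep(0.25)          # returns immediately, moves the virtual clock
    after_sleep = time.time()
    advance()
    second = time.time()

real_clock_restored = time.time() > 1_000_000.0
```

A sketch like this leaves gaps: code that imported `time` via `from time import time`, or that reads `time.monotonic` or `datetime.now`, would still see the real clock, so a fuller approach would need to patch those as well.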

google-labs-jules bot mentioned this pull request Feb 23, 2026

Feature: Deterministic Replay Testing Harness #29

Closed

Coldaine closed this Feb 23, 2026

Coldaine wants to merge 1 commit into master from feature/deterministic-replay-harness-13419231669082630945
