Deterministic Replay Testing Harness for ALAS #30
Summary of changes:
- Developed a `Recorder` module in `alas_wrapped/module/device/recorder.py` to capture screenshots and log bot actions (clicks, swipes, etc.) into a JSONL manifest.
- Integrated the `Recorder` into the core `Device` class in `alas_wrapped/module/device/device.py` via hooks in action methods.
- Added support for a `--record` flag in `alas.py`, `gui.py`, and `process_manager.py` to enable session recording from CLI or Web UI.
- Implemented a `MockDevice` in `alas_wrapped/tests/mock_device.py` that replays recorded screenshots and tracks actual actions for verification.
- Added deterministic time control to `MockDevice` by allowing virtual time to be synchronized with recorded timestamps.
- Created test suites in `alas_wrapped/tests/` to verify the replay harness and demonstrate its use in detecting state machine regressions (e.g., the Login Regression).
- Verified that all new tests and existing `agent_orchestrator` tests pass in the environment.

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
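For readers who have not opened the diff, the recording side can be pictured roughly as follows. This is only a minimal sketch of the "screenshots plus JSONL manifest" idea, not the code in `recorder.py`: the method names (`record_screenshot`, `record_action`), the manifest file name, the event fields (`seq`, `ts`, `type`, `kind`, `params`) and the helper `load_manifest` are all assumptions made for illustration.

```python
import json
import time
from pathlib import Path


class Recorder:
    """Hypothetical recorder: screenshots as files, one JSONL line per event."""

    def __init__(self, out_dir, clock=time.time):
        self.out_dir = Path(out_dir)
        self.out_dir.mkdir(parents=True, exist_ok=True)
        self.manifest = self.out_dir / "manifest.jsonl"  # assumed file name
        self.clock = clock  # injectable so tests can supply fixed timestamps
        self.seq = 0

    def _append(self, event):
        # Stamp every event with a sequence number and a timestamp, then
        # append it as a single JSON object on its own line.
        event["seq"] = self.seq
        event["ts"] = self.clock()
        with self.manifest.open("a", encoding="utf-8") as f:
            f.write(json.dumps(event) + "\n")
        self.seq += 1

    def record_screenshot(self, png_bytes):
        # Store the image next to the manifest and reference it by name.
        name = f"{self.seq:06d}.png"
        (self.out_dir / name).write_bytes(png_bytes)
        self._append({"type": "screenshot", "file": name})

    def record_action(self, kind, **params):
        # kind is e.g. "click" or "swipe"; params are its arguments.
        self._append({"type": "action", "kind": kind, "params": params})


def load_manifest(path):
    """Read the manifest back into a list of event dicts, in recorded order."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

An append-only, line-per-event file keeps a partially written session readable if the bot crashes mid-run, which is one plausible reason to prefer JSONL over a single JSON document here.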
Superseded by PR #31, which targets trunk/stabilization, uses a context-manager recorder, and has complete clock patching.
This PR introduces a Deterministic Replay Testing Harness for ALAS. This system allows developers to record a "golden path" of bot execution (including all screenshots and subsequent actions) and replay it offline using a Mock Device. This ensures that state machine logic changes can be validated for correctness and regressions without needing a live emulator, significantly speeding up the development and testing cycle.
Key components:
- `Recorder`: Captures the live run state.
- `MockDevice`: Simulates the emulator by feeding recorded screenshots and verifying bot actions.
- `--record` flag: Easy opt-in for recording sessions.

Fixes #29
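To make the replay side concrete, here is a rough sketch of how a mock device with a virtual clock could feed recorded frames and check the resulting actions. It is an illustration under stated assumptions, not the contents of `mock_device.py`: the event shape (`type`, `file`, `ts`, `kind`, `params`), the `VirtualClock` class, the `verify` method and the `toy_login_bot` state machine are all hypothetical.

```python
class VirtualClock:
    """Hypothetical clock that only advances when told to."""

    def __init__(self, start=0.0):
        self.now = start

    def time(self):
        return self.now

    def sleep(self, seconds):
        # Sleeping costs no wall time; it just moves virtual time forward.
        self.now += seconds

    def sync(self, ts):
        # Jump to a recorded timestamp, never moving backwards.
        self.now = max(self.now, ts)


class MockDevice:
    """Hypothetical replay device: serves recorded frames, logs actions."""

    def __init__(self, events, clock=None):
        self.frames = [e for e in events if e["type"] == "screenshot"]
        self.expected = [(e["kind"], e["params"])
                         for e in events if e["type"] == "action"]
        self.actual = []
        self.clock = clock or VirtualClock()
        self._next = 0

    def screenshot(self):
        if self._next >= len(self.frames):
            raise EOFError("recording exhausted")
        frame = self.frames[self._next]
        self._next += 1
        self.clock.sync(frame["ts"])  # virtual time follows the recording
        return frame["file"]

    def click(self, x, y):
        # Nothing is sent to an emulator; the action is only logged.
        self.actual.append(("click", {"x": x, "y": y}))

    def verify(self):
        # A mismatch means the bot diverged from the recorded golden path.
        if self.actual != self.expected:
            raise AssertionError(
                f"expected {self.expected}, got {self.actual}")


def toy_login_bot(device):
    """Hypothetical state machine: click on the login frame, stop on main."""
    while True:
        frame = device.screenshot()
        if frame == "login.png":
            device.click(640, 360)
        elif frame == "main.png":
            return


# Hand-written stand-in for a recorded session.
events = [
    {"type": "screenshot", "file": "login.png", "ts": 10.0},
    {"type": "action", "kind": "click",
     "params": {"x": 640, "y": 360}, "ts": 10.2},
    {"type": "screenshot", "file": "main.png", "ts": 12.0},
]

device = MockDevice(events)
toy_login_bot(device)
device.verify()  # passes: the bot reproduced the recorded click
```

If a change to the state machine made the bot click somewhere else on the login frame, `verify()` would raise, which is the kind of regression the harness is meant to surface without a live emulator.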
PR created automatically by Jules for task 13419231669082630945 started by @Coldaine