perf: lazy-load screenshots and batch Docker validation by Miyamura80 · Pull Request #97 · Edison-Watch/desktest

Miyamura80 · 2026-03-31T22:48:47Z

Summary

Lazy-load screenshot data URLs: Removed the eagerly-stored screenshot_data_url field from the Observation struct. Screenshots are now loaded from disk on demand via Observation::load_screenshot_data_url() only when needed (building LLM messages, monitor events, judge calls). This eliminates ~1-2MB of retained memory per agent step — previously the full base64 string was held in the trajectory vector indefinitely, even after falling out of the sliding window.
Batch Docker image validation: Replaced 9+ separate Docker exec round-trips in validate_custom_image() with a single batched shell script. Per-check error granularity is preserved via structured CHECK:<tag>:<status> output parsing. The Xauthority-specific warning log is also preserved.

Test plan

All 523 unit tests pass
All 3 macOS integration tests pass
Clippy clean (no warnings)
Run an E2E test with --monitor to verify live dashboard still shows screenshots
Run against a custom Docker image with missing deps to verify per-check error messages

🤖 Generated with Claude Code

… calls Two memory/latency optimizations: 1. Screenshots are no longer eagerly base64-encoded and stored in the Observation struct. Instead, load_screenshot_data_url() reads from disk on demand when building LLM messages or monitor events. This eliminates ~1-2MB of retained memory per step in the trajectory, which previously grew unbounded across all agent steps. 2. Custom Docker image validation now runs a single batched shell script instead of 9+ separate Docker exec round-trips (5 binary checks, 3 Python import checks, 1 file existence check). Per-check error granularity and the Xauthority warning are preserved via structured output parsing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

greptile-apps · 2026-03-31T22:54:04Z

Greptile Summary

This PR delivers two independent performance improvements: lazy-loading screenshot data URLs from disk on demand (eliminating ~1–2 MB of heap retention per trajectory step) and collapsing the custom Docker image validation from 9+ serial exec round-trips into a single batched shell script with structured output parsing.

Key changes:

Observation.screenshot_data_url field removed; replaced by load_screenshot_data_url() which reads and base64-encodes the PNG on demand — all three call sites (observation_to_message, publish_step_event, build_judge_messages) updated accordingly
capture_screenshot_with_retry / capture_screenshot_once return only PathBuf instead of (PathBuf, String); two orchestration.rs callers updated
validate_custom_image() builds one sh -c script with if/fi blocks joined by ;, parses CHECK:<tag>:<status> lines to reconstruct per-check results with a no-output guard
Test helpers in context.rs now return (Observation, NamedTempFile) with isolated temp files, fixing the parallel-test race flagged in the prior review

One finding:

The empty-output guard in validate_custom_image() only catches zero-output failures. If the script produces partial output (e.g., container is OOM-killed mid-execution), checks that never emitted a line will be silently treated as OK, potentially letting an incomplete validation pass. Adding a presence check for either :OK or :MISSING per item would make truncation detectable.

Confidence Score: 5/5

Safe to merge; the one finding is a P2 edge case in the Docker validation guard that does not affect normal operation.

Both prior P1 concerns are resolved: the parallel-test race is fixed with isolated NamedTempFiles, and the multi-read memory tradeoff is intentionally deferred. The only remaining finding is a P2 edge-case (truncated validation script treated as all-OK), which requires an unusual container failure mid-exec to trigger and does not affect the primary agent or screenshot paths.

src/docker/mod.rs — partial-output guard in validate_custom_image() could be strengthened

Important Files Changed

Filename	Overview
src/observation.rs	Removes screenshot_data_url field and adds load_screenshot_data_url() for on-demand disk reads; signature of capture_screenshot_with_retry simplified to return only the path; new unit tests added for the lazy-load method.
src/docker/mod.rs	Replaces 9+ sequential exec calls with a single batched shell script; structured CHECK:: output parsed for per-check granularity; empty-output guard added, but partial-output (truncated script) is silently treated as all-OK.
src/agent/context.rs	Switches observation_to_message() to use load_screenshot_data_url(); test helpers updated to return (Observation, NamedTempFile) to fix the parallel-test race condition identified in the prior review.
src/agent/loop_v2.rs	Three call sites updated to use load_screenshot_data_url() instead of the now-removed field; error handling via warn! is consistent across all three.
src/orchestration.rs	Two call sites updated to destructure the new single-value return from capture_screenshot_with_retry (previously (path, _), now just path).

Sequence Diagram

sequenceDiagram
    participant Agent as AgentLoopV2
    participant Obs as Observation
    participant Disk as Local FS
    participant LLM as LLM API

    Note over Agent,Disk: Per agent step - new lazy-load flow
    Agent->>Disk: capture_screenshot_with_retry returns PathBuf
    Disk-->>Agent: path saved

    Agent->>Obs: create Observation with screenshot_path only
    Note over Obs: No base64 string retained in memory

    Agent->>Obs: load_screenshot_data_url for LLM context
    Obs->>Disk: fs read path
    Disk-->>Obs: raw bytes
    Obs-->>Agent: base64 data URL
    Agent->>LLM: build_messages with image

    Agent->>Obs: load_screenshot_data_url for monitor
    Obs->>Disk: fs read path
    Disk-->>Obs: raw bytes
    Obs-->>Agent: base64 payload stripped

    Agent->>Obs: load_screenshot_data_url for judge
    Obs->>Disk: fs read path
    Disk-->>Obs: raw bytes
    Obs-->>Agent: base64 data URL
    Note over Agent: Observation dropped - no large string retained in trajectory

_{Reviews (2): Last reviewed commit: "fix: log warnings on screenshot load fai..." | Re-trigger Greptile}

1. Use tempfile::NamedTempFile in context.rs test helpers to avoid parallel-test races on a shared temp path. 2. Add empty-output guard in validate_custom_image() so a killed or truncated validation script returns an error instead of silently reporting a clean image. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Miyamura80 · 2026-03-31T22:59:35Z

@greptileai

Previously load errors were silently swallowed via .ok(). Now all three call sites (context, monitor, judge) consistently log tracing::warn! before falling back to no-screenshot behavior. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Miyamura80 · 2026-03-31T23:01:45Z

@greptileai

Miyamura80 and others added 3 commits March 31, 2026 23:48

style: fix rustfmt formatting

15eba47

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix: remove unused PathBuf import in context tests

3ea1130

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

greptile-apps bot reviewed Mar 31, 2026

View reviewed changes

Comment thread src/agent/context.rs Outdated

Comment thread src/observation.rs

This comment was marked as resolved.

Sign in to view

Miyamura80 merged commit 8ac28b0 into master Apr 1, 2026
4 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: lazy-load screenshots and batch Docker validation#97

perf: lazy-load screenshots and batch Docker validation#97
Miyamura80 merged 5 commits intomasterfrom
perf/lazy-screenshots-batch-validation

Miyamura80 commented Mar 31, 2026 •

edited by devin-ai-integration bot

Loading

Uh oh!

greptile-apps bot commented Mar 31, 2026 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

This comment was marked as resolved.

Uh oh!

Miyamura80 commented Mar 31, 2026

Uh oh!

Miyamura80 commented Mar 31, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Miyamura80 commented Mar 31, 2026 • edited by devin-ai-integration bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Uh oh!

greptile-apps bot commented Mar 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Sequence Diagram

Uh oh!

Uh oh!

Uh oh!

This comment was marked as resolved.

Uh oh!

Miyamura80 commented Mar 31, 2026

Uh oh!

Miyamura80 commented Mar 31, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Miyamura80 commented Mar 31, 2026 •

edited by devin-ai-integration bot

Loading

greptile-apps bot commented Mar 31, 2026 •

edited

Loading