Skip to content

perf: lazy-load screenshots and batch Docker validation#97

Merged
Miyamura80 merged 5 commits intomasterfrom
perf/lazy-screenshots-batch-validation
Apr 1, 2026
Merged

perf: lazy-load screenshots and batch Docker validation#97
Miyamura80 merged 5 commits intomasterfrom
perf/lazy-screenshots-batch-validation

Conversation

@Miyamura80
Copy link
Copy Markdown
Contributor

@Miyamura80 Miyamura80 commented Mar 31, 2026

Summary

  • Lazy-load screenshot data URLs: Removed the eagerly-stored screenshot_data_url field from the Observation struct. Screenshots are now loaded from disk on demand via Observation::load_screenshot_data_url() only when needed (building LLM messages, monitor events, judge calls). This eliminates ~1-2MB of retained memory per agent step — previously the full base64 string was held in the trajectory vector indefinitely, even after falling out of the sliding window.

  • Batch Docker image validation: Replaced 9+ separate Docker exec round-trips in validate_custom_image() with a single batched shell script. Per-check error granularity is preserved via structured CHECK:<tag>:<status> output parsing. The Xauthority-specific warning log is also preserved.

Test plan

  • All 523 unit tests pass
  • All 3 macOS integration tests pass
  • Clippy clean (no warnings)
  • Run an E2E test with --monitor to verify live dashboard still shows screenshots
  • Run against a custom Docker image with missing deps to verify per-check error messages

🤖 Generated with Claude Code


Open with Devin

Miyamura80 and others added 3 commits March 31, 2026 23:48
… calls

Two memory/latency optimizations:

1. Screenshots are no longer eagerly base64-encoded and stored in the
   Observation struct. Instead, load_screenshot_data_url() reads from
   disk on demand when building LLM messages or monitor events. This
   eliminates ~1-2MB of retained memory per step in the trajectory,
   which previously grew unbounded across all agent steps.

2. Custom Docker image validation now runs a single batched shell script
   instead of 9+ separate Docker exec round-trips (5 binary checks,
   3 Python import checks, 1 file existence check). Per-check error
   granularity and the Xauthority warning are preserved via structured
   output parsing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@greptile-apps
Copy link
Copy Markdown

greptile-apps bot commented Mar 31, 2026

Greptile Summary

This PR delivers two independent performance improvements: lazy-loading screenshot data URLs from disk on demand (eliminating ~1–2 MB of heap retention per trajectory step) and collapsing the custom Docker image validation from 9+ serial exec round-trips into a single batched shell script with structured output parsing.

Key changes:

  • Observation.screenshot_data_url field removed; replaced by load_screenshot_data_url() which reads and base64-encodes the PNG on demand — all three call sites (observation_to_message, publish_step_event, build_judge_messages) updated accordingly
  • capture_screenshot_with_retry / capture_screenshot_once return only PathBuf instead of (PathBuf, String); two orchestration.rs callers updated
  • validate_custom_image() builds one sh -c script with if/fi blocks joined by ;, parses CHECK:<tag>:<status> lines to reconstruct per-check results with a no-output guard
  • Test helpers in context.rs now return (Observation, NamedTempFile) with isolated temp files, fixing the parallel-test race flagged in the prior review

One finding:

  • The empty-output guard in validate_custom_image() only catches zero-output failures. If the script produces partial output (e.g., container is OOM-killed mid-execution), checks that never emitted a line will be silently treated as OK, potentially letting an incomplete validation pass. Adding a presence check for either :OK or :MISSING per item would make truncation detectable.

Confidence Score: 5/5

Safe to merge; the one finding is a P2 edge case in the Docker validation guard that does not affect normal operation.

Both prior P1 concerns are resolved: the parallel-test race is fixed with isolated NamedTempFiles, and the multi-read memory tradeoff is intentionally deferred. The only remaining finding is a P2 edge-case (truncated validation script treated as all-OK), which requires an unusual container failure mid-exec to trigger and does not affect the primary agent or screenshot paths.

src/docker/mod.rs — partial-output guard in validate_custom_image() could be strengthened

Important Files Changed

Filename Overview
src/observation.rs Removes screenshot_data_url field and adds load_screenshot_data_url() for on-demand disk reads; signature of capture_screenshot_with_retry simplified to return only the path; new unit tests added for the lazy-load method.
src/docker/mod.rs Replaces 9+ sequential exec calls with a single batched shell script; structured CHECK:: output parsed for per-check granularity; empty-output guard added, but partial-output (truncated script) is silently treated as all-OK.
src/agent/context.rs Switches observation_to_message() to use load_screenshot_data_url(); test helpers updated to return (Observation, NamedTempFile) to fix the parallel-test race condition identified in the prior review.
src/agent/loop_v2.rs Three call sites updated to use load_screenshot_data_url() instead of the now-removed field; error handling via warn! is consistent across all three.
src/orchestration.rs Two call sites updated to destructure the new single-value return from capture_screenshot_with_retry (previously (path, _), now just path).

Sequence Diagram

sequenceDiagram
    participant Agent as AgentLoopV2
    participant Obs as Observation
    participant Disk as Local FS
    participant LLM as LLM API

    Note over Agent,Disk: Per agent step - new lazy-load flow
    Agent->>Disk: capture_screenshot_with_retry returns PathBuf
    Disk-->>Agent: path saved

    Agent->>Obs: create Observation with screenshot_path only
    Note over Obs: No base64 string retained in memory

    Agent->>Obs: load_screenshot_data_url for LLM context
    Obs->>Disk: fs read path
    Disk-->>Obs: raw bytes
    Obs-->>Agent: base64 data URL
    Agent->>LLM: build_messages with image

    Agent->>Obs: load_screenshot_data_url for monitor
    Obs->>Disk: fs read path
    Disk-->>Obs: raw bytes
    Obs-->>Agent: base64 payload stripped

    Agent->>Obs: load_screenshot_data_url for judge
    Obs->>Disk: fs read path
    Disk-->>Obs: raw bytes
    Obs-->>Agent: base64 data URL
    Note over Agent: Observation dropped - no large string retained in trajectory
Loading

Reviews (2): Last reviewed commit: "fix: log warnings on screenshot load fai..." | Re-trigger Greptile

Comment thread src/agent/context.rs Outdated
Comment thread src/observation.rs
devin-ai-integration[bot]

This comment was marked as resolved.

1. Use tempfile::NamedTempFile in context.rs test helpers to avoid
   parallel-test races on a shared temp path.

2. Add empty-output guard in validate_custom_image() so a killed or
   truncated validation script returns an error instead of silently
   reporting a clean image.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@Miyamura80
Copy link
Copy Markdown
Contributor Author

@greptileai

Previously load errors were silently swallowed via .ok(). Now all three
call sites (context, monitor, judge) consistently log tracing::warn!
before falling back to no-screenshot behavior.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@Miyamura80
Copy link
Copy Markdown
Contributor Author

@greptileai

@Miyamura80 Miyamura80 merged commit 8ac28b0 into master Apr 1, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant