perf: lazy-load screenshots and batch Docker validation #97
Miyamura80 merged 5 commits into master
… calls

Two memory/latency optimizations:

1. Screenshots are no longer eagerly base64-encoded and stored in the Observation struct. Instead, load_screenshot_data_url() reads from disk on demand when building LLM messages or monitor events. This eliminates ~1-2MB of retained memory per step in the trajectory, which previously grew unbounded across all agent steps.
2. Custom Docker image validation now runs a single batched shell script instead of 9+ separate Docker exec round-trips (5 binary checks, 3 Python import checks, 1 file existence check). Per-check error granularity and the Xauthority warning are preserved via structured output parsing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
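As a rough illustration of the second optimization, the sketch below builds one shell script that runs every check and prints a `CHECK:<tag>:<status>` line per check, so a single Docker exec can replace many. This is not the PR's actual code: the `Check` enum, the tag naming scheme, the `FAIL` status string, and the exact shell commands are all assumptions made for the example.

```rust
/// One requirement to verify inside the container (hypothetical type).
enum Check<'a> {
    Binary(&'a str),       // is an executable on PATH?
    PythonImport(&'a str), // can python3 import this module?
    File(&'a str),         // does this path exist?
}

impl Check<'_> {
    /// Tag echoed back in the structured output (naming scheme is assumed).
    fn tag(&self) -> String {
        match self {
            Check::Binary(name) => format!("bin-{name}"),
            Check::PythonImport(module) => format!("py-{module}"),
            Check::File(path) => format!("file-{path}"),
        }
    }

    /// Shell command whose exit status decides OK vs FAIL.
    /// Inputs are assumed to be trusted constants, so no quoting is done.
    fn command(&self) -> String {
        match self {
            Check::Binary(name) => format!("command -v {name} >/dev/null 2>&1"),
            Check::PythonImport(module) => {
                format!("python3 -c 'import {module}' >/dev/null 2>&1")
            }
            Check::File(path) => format!("test -e {path}"),
        }
    }
}

/// Build a single script that runs all checks and reports each one on its
/// own line, so per-check granularity survives the batching.
fn build_validation_script(checks: &[Check<'_>]) -> String {
    let mut script = String::new();
    for check in checks {
        let tag = check.tag();
        script.push_str(&format!(
            "if {cmd}; then echo 'CHECK:{tag}:OK'; else echo 'CHECK:{tag}:FAIL'; fi\n",
            cmd = check.command(),
        ));
    }
    script
}
```

The resulting string would be passed as one argument to a shell inside the container. Because every check uses `if ...; then ...; else ...; fi`, a failing check does not abort the script, which is what lets the caller report all failures from one round-trip.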
Greptile Summary
This PR delivers two independent performance improvements: lazy-loading screenshot data URLs from disk on demand (eliminating ~1–2 MB of heap retention per trajectory step) and collapsing the custom Docker image validation from 9+ serial Docker exec round-trips into a single batched shell script.
Key changes:
One finding:
Confidence Score: 5/5
Safe to merge; the one finding is a P2 edge case in the Docker validation guard that does not affect normal operation.

Both prior P1 concerns are resolved: the parallel-test race is fixed with isolated NamedTempFiles, and the multi-read memory tradeoff is intentionally deferred. The only remaining finding is a P2 edge case (truncated validation script treated as all-OK), which requires an unusual container failure mid-exec to trigger and does not affect the primary agent or screenshot paths.

src/docker/mod.rs: partial-output guard in validate_custom_image() could be strengthened

Important Files Changed
Sequence Diagram
sequenceDiagram
participant Agent as AgentLoopV2
participant Obs as Observation
participant Disk as Local FS
participant LLM as LLM API
Note over Agent,Disk: Per agent step - new lazy-load flow
Agent->>Disk: capture_screenshot_with_retry returns PathBuf
Disk-->>Agent: path saved
Agent->>Obs: create Observation with screenshot_path only
Note over Obs: No base64 string retained in memory
Agent->>Obs: load_screenshot_data_url for LLM context
Obs->>Disk: fs read path
Disk-->>Obs: raw bytes
Obs-->>Agent: base64 data URL
Agent->>LLM: build_messages with image
Agent->>Obs: load_screenshot_data_url for monitor
Obs->>Disk: fs read path
Disk-->>Obs: raw bytes
Obs-->>Agent: base64 payload stripped
Agent->>Obs: load_screenshot_data_url for judge
Obs->>Disk: fs read path
Disk-->>Obs: raw bytes
Obs-->>Agent: base64 data URL
Note over Agent: Observation dropped - no large string retained in trajectory
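The lazy-load flow in the diagram can be sketched roughly as follows. This is a minimal, dependency-free illustration, not the code from the PR: the struct layout, the return type of `load_screenshot_data_url`, the `image/png` MIME type, and the hand-rolled `base64_encode` helper (standing in for whatever base64 library the project uses) are all assumptions.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;

const B64: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Standard padded base64, written out so the sketch needs no crates.
fn base64_encode(bytes: &[u8]) -> String {
    let mut out = String::with_capacity((bytes.len() + 2) / 3 * 4);
    for chunk in bytes.chunks(3) {
        let b0 = chunk[0] as u32;
        let b1 = *chunk.get(1).unwrap_or(&0) as u32;
        let b2 = *chunk.get(2).unwrap_or(&0) as u32;
        let n = (b0 << 16) | (b1 << 8) | b2;
        out.push(B64[(n >> 18) as usize & 63] as char);
        out.push(B64[(n >> 12) as usize & 63] as char);
        // Pad with '=' when the final chunk has fewer than 3 bytes.
        out.push(if chunk.len() > 1 { B64[(n >> 6) as usize & 63] as char } else { '=' });
        out.push(if chunk.len() > 2 { B64[n as usize & 63] as char } else { '=' });
    }
    out
}

/// Only the path is retained per step; no base64 string lives in memory.
struct Observation {
    screenshot_path: Option<PathBuf>,
}

impl Observation {
    /// Read the screenshot from disk and encode it at the moment of use.
    /// Returns Ok(None) when the step has no screenshot (assumed behavior).
    fn load_screenshot_data_url(&self) -> io::Result<Option<String>> {
        match &self.screenshot_path {
            None => Ok(None),
            Some(path) => {
                let bytes = fs::read(path)?;
                Ok(Some(format!(
                    "data:image/png;base64,{}",
                    base64_encode(&bytes)
                )))
            }
        }
    }
}
```

The tradeoff shown in the diagram is visible here: each consumer (LLM context, monitor, judge) pays one file read and one encode, in exchange for the trajectory holding only a `PathBuf` per step.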
1. Use tempfile::NamedTempFile in context.rs test helpers to avoid parallel-test races on a shared temp path.
2. Add empty-output guard in validate_custom_image() so a killed or truncated validation script returns an error instead of silently reporting a clean image.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
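To make the guard concrete, here is a hedged sketch of parsing `CHECK:<tag>:<status>` lines with an empty-output check. It also shows one way the partial-output case raised in the review could be covered, by comparing the tags seen against the tags expected. The error type, the function name, and the missing-tag check are assumptions for illustration; the commit above only confirms the empty-output guard.

```rust
use std::collections::BTreeMap;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum ValidationError {
    /// The script produced nothing, e.g. it was killed before the first check.
    EmptyOutput,
    /// Some expected checks never reported, e.g. the script was truncated.
    MissingChecks(Vec<String>),
    /// All checks reported, but these did not report OK.
    FailedChecks(Vec<String>),
}

fn parse_validation_output(
    output: &str,
    expected_tags: &[&str],
) -> Result<(), ValidationError> {
    // Empty-output guard: silence must not be read as "all checks passed".
    if output.trim().is_empty() {
        return Err(ValidationError::EmptyOutput);
    }

    // Collect tag -> status from lines shaped like CHECK:<tag>:<status>.
    // Splitting on the last ':' lets a tag contain colons itself.
    let mut statuses: BTreeMap<&str, &str> = BTreeMap::new();
    for line in output.lines() {
        if let Some(rest) = line.trim().strip_prefix("CHECK:") {
            if let Some((tag, status)) = rest.rsplit_once(':') {
                statuses.insert(tag, status);
            }
        }
    }

    // Partial-output guard: every expected tag must have reported.
    let mut missing = Vec::new();
    let mut failed = Vec::new();
    for tag in expected_tags {
        match statuses.get(*tag) {
            None => missing.push(tag.to_string()),
            Some(&"OK") => {}
            Some(_) => failed.push(tag.to_string()),
        }
    }
    if !missing.is_empty() {
        return Err(ValidationError::MissingChecks(missing));
    }
    if !failed.is_empty() {
        return Err(ValidationError::FailedChecks(failed));
    }
    Ok(())
}
```

Checking against an explicit list of expected tags means a script that dies after the first few checks is reported as incomplete instead of clean, at the cost of keeping that list in sync with the script.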
Previously, load errors were silently swallowed via .ok(). Now all three call sites (context, monitor, judge) consistently log tracing::warn! before falling back to no-screenshot behavior.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
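The warn-then-fall-back pattern described in this commit might look roughly like the helper below. It is only a sketch: the helper name and signature are hypothetical, and `eprintln!` stands in for `tracing::warn!` so the example compiles without external crates.

```rust
use std::io;

/// Run a screenshot loader and degrade to `None` on failure, but say so.
/// `call_site` is a label such as "context", "monitor", or "judge".
fn screenshot_or_none<F>(call_site: &str, load: F) -> Option<String>
where
    F: FnOnce() -> io::Result<String>,
{
    match load() {
        Ok(data_url) => Some(data_url),
        Err(err) => {
            // Stand-in for `tracing::warn!`; a bare `.ok()` would hide this.
            eprintln!(
                "WARN [{call_site}] failed to load screenshot: {err}; continuing without it"
            );
            None
        }
    }
}
```

Routing all three call sites through one helper like this keeps the fallback behavior identical everywhere, so a missing or unreadable file leaves a trace in the logs instead of quietly producing a screenshot-free request.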
Summary
- Lazy-load screenshot data URLs: Removed the eagerly-stored `screenshot_data_url` field from the `Observation` struct. Screenshots are now loaded from disk on demand via `Observation::load_screenshot_data_url()` only when needed (building LLM messages, monitor events, judge calls). This eliminates ~1-2MB of retained memory per agent step; previously the full base64 string was held in the trajectory vector indefinitely, even after falling out of the sliding window.
- Batch Docker image validation: Replaced 9+ separate Docker exec round-trips in `validate_custom_image()` with a single batched shell script. Per-check error granularity is preserved via structured `CHECK:<tag>:<status>` output parsing. The Xauthority-specific warning log is also preserved.

Test plan
- … `--monitor` to verify live dashboard still shows screenshots

🤖 Generated with Claude Code