feat: add PlannerGrounderAgent for dual-model GUI automation#134
Merged
feat: add PlannerGrounderAgent for dual-model GUI automation#134
Conversation
Implements the planner-grounder architecture from the GUI agent literature (SeeAct ICML 2024, UFO2 2025, CODA 2025): - Planner sees screenshot + a11y tree, outputs high-level instruction - Grounder sees screenshot + instruction, outputs pixel coordinates - Supports agent instances, VLM API calls, or HTTP endpoints - Action history tracking, DONE/FAIL handling, grounder retry - Registered in agent registry 23 tests passing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- HTTP grounder now uses OpenAI chat completions API format (compatible with vLLM, Ollama, any OpenAI-compatible server) - Sends screenshots as base64-encoded images - serve_grounder.sh: start UI-Venus-1.5-8B via vLLM - run_planner_grounder.py: full experiment script (Claude planner + UI-Venus grounder against WAA VM) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…nderAgent Fixes from first live experiment (Claude planner + UI-Venus grounder on WAA VM): 1. Parse planner instructions for type/key/scroll actions — these bypass the grounder (which only returns click coordinates) 2. Planner prompt now requires ONE ATOMIC action per step (no compound "click X and type Y") 3. Grounder bbox parser handles UI-Venus [x1,y1,x2,y2] format, JSON format, and coordinate pairs 4. Float conversion for coordinates in run script and base.py 5. Added UI-Venus RL training review doc Experiment result: planner correctly navigated Start → Notepad → text area. Grounder returned accurate bounding boxes. Typing failed because compound instructions weren't decomposed — now fixed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Four fixes from live experiment: 1. Key actions with modifiers (Ctrl+A) now use pyautogui.hotkey() instead of press(). Parser stores modifiers separately on BenchmarkAction. Adapter handles both modifier+key combos and legacy "ctrl+a" string format. 2. Planner prompt now includes anti-loop rule: "If your last 3 actions were the same, try a completely different approach." 3. Logging shows planner instruction correctly (was showing "?"). 4. --save-screenshots flag saves PNGs at each step for debugging. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
abrichr
added a commit
that referenced
this pull request
Mar 20, 2026
… training features Covers ~20 PRs merged since March 17 (#134-#157): PlannerGrounderAgent dual-model architecture, TaskConfig YAML custom tasks, 4-pass workflow extraction pipeline, RL training infra (TRL GRPO rollout, AReaL workflow, OpenEnv), LocalAdapter + ScrubMiddleware for governed desktop agent, correction flywheel, strict mode, and task setup dispatch. Updated architecture tree and key files table. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
abrichr
added a commit
that referenced
this pull request
Mar 20, 2026
… training features (#158) Covers ~20 PRs merged since March 17 (#134-#157): PlannerGrounderAgent dual-model architecture, TaskConfig YAML custom tasks, 4-pass workflow extraction pipeline, RL training infra (TRL GRPO rollout, AReaL workflow, OpenEnv), LocalAdapter + ScrubMiddleware for governed desktop agent, correction flywheel, strict mode, and task setup dispatch. Updated architecture tree and key files table. Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
abrichr
added a commit
that referenced
this pull request
Mar 20, 2026
…, and demo-guided execution Add detailed usage documentation for all major features added in PRs #134-#165: - Docker/WAA Container: --cap-add NET_ADMIN requirement, full docker run command, boot timeline, port reference - Full Evaluation Runner (run_full_eval.py): all flags with defaults, example commands for smoke test, resume, parallel, HTTP grounder - Distillation Pipeline: two-step workflow (collect_distillation_data.py + finetune_distilled.py), all flags, mock validation mode - Demo-Guided Execution: DemoLibrary API, DemoGuidedAgent with self- verification, recording workflow - Task Setup Config Types: all 15 supported types with example params - Strict Mode: ScrubMiddleware, workflow pipeline, WAALiveAdapter - Pool Execution: external agent_factory support via PoolManager.run() - Updated Quick Start with copy-pasteable sequence - Updated Architecture tree with new files (demo_library, demo_guided_agent, scripts/) - Updated Key Files table with new entries Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
3 tasks
abrichr
added a commit
that referenced
this pull request
Mar 20, 2026
…, and demo-guided execution (#166) Add detailed usage documentation for all major features added in PRs #134-#165: - Docker/WAA Container: --cap-add NET_ADMIN requirement, full docker run command, boot timeline, port reference - Full Evaluation Runner (run_full_eval.py): all flags with defaults, example commands for smoke test, resume, parallel, HTTP grounder - Distillation Pipeline: two-step workflow (collect_distillation_data.py + finetune_distilled.py), all flags, mock validation mode - Demo-Guided Execution: DemoLibrary API, DemoGuidedAgent with self- verification, recording workflow - Task Setup Config Types: all 15 supported types with example params - Strict Mode: ScrubMiddleware, workflow pipeline, WAALiveAdapter - Pool Execution: external agent_factory support via PoolManager.run() - Updated Quick Start with copy-pasteable sequence - Updated Architecture tree with new files (demo_library, demo_guided_agent, scripts/) - Updated Key Files table with new entries Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
PlannerGrounderAgentcomposes a planner and a grounder for dual-model GUI automation:Usage
Test plan
🤖 Generated with Claude Code