feat: add PlannerGrounderAgent for dual-model GUI automation by abrichr · Pull Request #134 · OpenAdaptAI/openadapt-evals

abrichr · 2026-03-18T20:10:22Z

Summary

PlannerGrounderAgent composes a planner and a grounder for dual-model GUI automation:

Planner sees screenshot + accessibility tree → high-level instruction
Grounder sees screenshot + instruction → pixel coordinates
Supports BenchmarkAgent instances, VLM API calls, or HTTP endpoints for each role
Action history tracking for planner context
DONE/FAIL handling, grounder retry with simplified prompt

Usage

agent = PlannerGrounderAgent(
    planner="claude-sonnet-4-6",
    grounder="http",
    grounder_endpoint="http://gpu-server:8000/v1",
)

Test plan

23 tests passing (pipeline, history, retry, HTTP, mixed types, a11y tree)

🤖 Generated with Claude Code

Implements the planner-grounder architecture from the GUI agent literature (SeeAct ICML 2024, UFO2 2025, CODA 2025): - Planner sees screenshot + a11y tree, outputs high-level instruction - Grounder sees screenshot + instruction, outputs pixel coordinates - Supports agent instances, VLM API calls, or HTTP endpoints - Action history tracking, DONE/FAIL handling, grounder retry - Registered in agent registry 23 tests passing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- HTTP grounder now uses OpenAI chat completions API format (compatible with vLLM, Ollama, any OpenAI-compatible server) - Sends screenshots as base64-encoded images - serve_grounder.sh: start UI-Venus-1.5-8B via vLLM - run_planner_grounder.py: full experiment script (Claude planner + UI-Venus grounder against WAA VM) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…nderAgent Fixes from first live experiment (Claude planner + UI-Venus grounder on WAA VM): 1. Parse planner instructions for type/key/scroll actions — these bypass the grounder (which only returns click coordinates) 2. Planner prompt now requires ONE ATOMIC action per step (no compound "click X and type Y") 3. Grounder bbox parser handles UI-Venus [x1,y1,x2,y2] format, JSON format, and coordinate pairs 4. Float conversion for coordinates in run script and base.py 5. Added UI-Venus RL training review doc Experiment result: planner correctly navigated Start → Notepad → text area. Grounder returned accurate bounding boxes. Typing failed because compound instructions weren't decomposed — now fixed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Four fixes from live experiment: 1. Key actions with modifiers (Ctrl+A) now use pyautogui.hotkey() instead of press(). Parser stores modifiers separately on BenchmarkAction. Adapter handles both modifier+key combos and legacy "ctrl+a" string format. 2. Planner prompt now includes anti-loop rule: "If your last 3 actions were the same, try a completely different approach." 3. Logging shows planner instruction correctly (was showing "?"). 4. --save-screenshots flag saves PNGs at each step for debugging. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

… training features Covers ~20 PRs merged since March 17 (#134-#157): PlannerGrounderAgent dual-model architecture, TaskConfig YAML custom tasks, 4-pass workflow extraction pipeline, RL training infra (TRL GRPO rollout, AReaL workflow, OpenEnv), LocalAdapter + ScrubMiddleware for governed desktop agent, correction flywheel, strict mode, and task setup dispatch. Updated architecture tree and key files table. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

… training features (#158) Covers ~20 PRs merged since March 17 (#134-#157): PlannerGrounderAgent dual-model architecture, TaskConfig YAML custom tasks, 4-pass workflow extraction pipeline, RL training infra (TRL GRPO rollout, AReaL workflow, OpenEnv), LocalAdapter + ScrubMiddleware for governed desktop agent, correction flywheel, strict mode, and task setup dispatch. Updated architecture tree and key files table. Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…, and demo-guided execution Add detailed usage documentation for all major features added in PRs #134-#165: - Docker/WAA Container: --cap-add NET_ADMIN requirement, full docker run command, boot timeline, port reference - Full Evaluation Runner (run_full_eval.py): all flags with defaults, example commands for smoke test, resume, parallel, HTTP grounder - Distillation Pipeline: two-step workflow (collect_distillation_data.py + finetune_distilled.py), all flags, mock validation mode - Demo-Guided Execution: DemoLibrary API, DemoGuidedAgent with self- verification, recording workflow - Task Setup Config Types: all 15 supported types with example params - Strict Mode: ScrubMiddleware, workflow pipeline, WAALiveAdapter - Pool Execution: external agent_factory support via PoolManager.run() - Updated Quick Start with copy-pasteable sequence - Updated Architecture tree with new files (demo_library, demo_guided_agent, scripts/) - Updated Key Files table with new entries Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…, and demo-guided execution (#166) Add detailed usage documentation for all major features added in PRs #134-#165: - Docker/WAA Container: --cap-add NET_ADMIN requirement, full docker run command, boot timeline, port reference - Full Evaluation Runner (run_full_eval.py): all flags with defaults, example commands for smoke test, resume, parallel, HTTP grounder - Distillation Pipeline: two-step workflow (collect_distillation_data.py + finetune_distilled.py), all flags, mock validation mode - Demo-Guided Execution: DemoLibrary API, DemoGuidedAgent with self- verification, recording workflow - Task Setup Config Types: all 15 supported types with example params - Strict Mode: ScrubMiddleware, workflow pipeline, WAALiveAdapter - Pool Execution: external agent_factory support via PoolManager.run() - Updated Quick Start with copy-pasteable sequence - Updated Architecture tree with new files (demo_library, demo_guided_agent, scripts/) - Updated Key Files table with new entries Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

abrichr and others added 4 commits March 18, 2026 16:10

abrichr merged commit a6eac6e into main Mar 18, 2026
1 check passed

abrichr mentioned this pull request Mar 20, 2026

docs: comprehensive usage documentation for eval runner, distillation, and demo-guided execution #166

Merged

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add PlannerGrounderAgent for dual-model GUI automation#134

feat: add PlannerGrounderAgent for dual-model GUI automation#134
abrichr merged 4 commits intomainfrom
feat/planner-grounder-agent

abrichr commented Mar 18, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

abrichr commented Mar 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Usage

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

abrichr commented Mar 18, 2026 •

edited

Loading