feat: add standalone GRPO trainer with WAADirect (no openadapt-ml dependency) by abrichr · Pull Request #191 · OpenAdaptAI/openadapt-evals

abrichr · 2026-03-23T20:30:49Z

Summary

Standalone GRPO training package (openadapt_evals/training/standalone/) with zero openadapt-ml imports
WAADirect HTTP client replaces WAALiveAdapter + RLEnvironment stack, eliminating version coupling and adapter indirection
695 LOC total across 7 modules + CLI wrapper script
Designed for later migration to openadapt-ml once training API stabilizes

Package structure

| File | LOC | Purpose |
| --- | --- | --- |
| `config.py` | 30 | `TrainingConfig` dataclass |
| `waa_direct.py` | 125 | Direct HTTP: screenshot/click/type/key |
| `prompt.py` | 145 | SYSTEM_PROMPT + message builder + multi-format parser |
| `reward.py` | 35 | Group-relative advantages + VLM milestone eval |
| `model_loader.py` | 51 | HuggingFace + PEFT model loading |
| `trainer.py` | 281 | GRPOTrainer: rollout collection + training loop |
| `__init__.py` | 9 | Package exports |
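
For readers unfamiliar with the "group-relative advantages" computed in `reward.py`, here is a minimal sketch of the usual GRPO formulation: each rollout's reward is centered on the group mean and scaled by the group standard deviation. The function name `group_advantages_sketch`, the zero-variance fallback, and the `eps` parameter are assumptions made for illustration; the PR's own implementation may differ (for example, it may skip the standard-deviation scaling).

```python
from statistics import mean, pstdev


def group_advantages_sketch(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Center rewards on the group mean and scale by the group std (assumed form)."""
    if not rewards:
        return []
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the rollout group
    if sigma < eps:
        # All rollouts scored the same: no rollout is better than another,
        # so every advantage is zero and the group adds no gradient signal.
        return [0.0] * len(rewards)
    return [(r - mu) / sigma for r in rewards]


if __name__ == "__main__":
    # Two rollouts of the same task, one reaching the milestone and one not.
    print(group_advantages_sketch([1.0, 0.0]))
```

Because the advantages within a group sum to zero, a group where every rollout fails (or every rollout succeeds) produces no update under this formulation.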

Key design decisions

ZERO openadapt-ml imports -- self-contained, portable
WAADirect uses direct HTTP -- requests.post(server_url + "/execute_windows", ...), no adapter abstraction
Prompt format matches SFT -- copies SYSTEM_PROMPT from next_action.py, supports Thought:/Action: format
Multi-format parser -- handles Thought/Action, bare DSL (CLICK(...)), and JSON {"action_type":"click"}
max_new_tokens=2048 default -- 100 was catastrophically low (truncates reasoning)
Screenshot-only milestone evaluation -- VLM judge via OpenAI API, no PowerShell
Fresh screenshot for evaluation -- taken at eval time, not cached
Per-step backward -- avoids OOM on long trajectories
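
To illustrate the multi-format parsing listed above, here is a rough sketch that tries the three formats in order: JSON, then Thought:/Action:, then bare DSL. It is not the PR's parser. The name `parse_action_sketch`, the returned dictionary shape, and the choice to keep DSL arguments as an unparsed string are all assumptions, since the description does not specify the argument syntax inside `CLICK(...)`.

```python
import json
import re

# Matches an upper-case DSL call such as CLICK(...); arguments stay a raw string.
_DSL_RE = re.compile(r"\b([A-Z_]+)\((.*?)\)", re.DOTALL)


def parse_action_sketch(text: str) -> dict:
    """Parse model output in JSON, Thought/Action, or bare DSL form (assumed shapes)."""
    text = text.strip()

    # 1. JSON form, e.g. {"action_type": "click"}
    if text.startswith("{"):
        try:
            obj = json.loads(text)
            if isinstance(obj, dict) and "action_type" in obj:
                return obj
        except json.JSONDecodeError:
            pass  # fall through to the text-based formats

    # 2. Thought:/Action: form; keep the thought, parse what follows "Action:"
    thought = None
    if "Action:" in text:
        head, _, tail = text.rpartition("Action:")
        m = re.search(r"Thought:\s*(.*)", head, re.DOTALL)
        thought = m.group(1).strip() if m else None
        text = tail.strip()

    # 3. Bare DSL form, e.g. CLICK(...)
    m = _DSL_RE.search(text)
    if m:
        action = {"action_type": m.group(1).lower(), "args": m.group(2).strip()}
        if thought:
            action["thought"] = thought
        return action

    # Nothing recognizable: surface the raw text instead of guessing.
    return {"action_type": "unknown", "raw": text}
```

The non-greedy regex stops at the first closing parenthesis, so an argument that itself contains `)` would be truncated; a real parser would need to handle quoted strings.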

Usage

python -m openadapt_evals.training.standalone.trainer \
    --task-dir example_tasks \
    --server-url http://localhost:5001 \
    --model Qwen/Qwen3.5-9B \
    --num-steps 10 \
    --output checkpoints/
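
As a hedged sketch of how the flags in this command could map onto a config dataclass: the class name `TrainingConfigSketch`, its field names, and the decision to make every flag required are assumptions for illustration. Only the five flags shown above and the `max_new_tokens=2048` default come from the PR description; the real `TrainingConfig` fields are not listed there.

```python
import argparse
from dataclasses import dataclass


@dataclass
class TrainingConfigSketch:
    # Field names mirror the CLI flags in the usage example (hypothetical layout).
    task_dir: str
    server_url: str
    model: str
    num_steps: int
    output: str
    # Default stated in the PR description; not exposed as a flag in this sketch.
    max_new_tokens: int = 2048


def parse_args(argv=None) -> TrainingConfigSketch:
    parser = argparse.ArgumentParser(description="Standalone GRPO trainer (sketch)")
    parser.add_argument("--task-dir", required=True)
    parser.add_argument("--server-url", required=True)
    parser.add_argument("--model", required=True)
    parser.add_argument("--num-steps", type=int, required=True)
    parser.add_argument("--output", required=True)
    ns = parser.parse_args(argv)
    # argparse turns "--task-dir" into the attribute "task_dir", and so on.
    return TrainingConfigSketch(**vars(ns))
```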

Test plan

Syntax check passes (verified)
Zero openadapt-ml imports (verified via grep)
Line count under 700 (695 actual)
CLI --help works
End-to-end test on WAA VM (Phase 3 validation)

🤖 Generated with Claude Code

…endency)

Self-contained GRPO training package that eliminates the openadapt-ml dependency for RL training. Uses direct HTTP calls to WAA Flask server (WAADirect) instead of the WAALiveAdapter + RLEnvironment stack, removing version coupling and adapter indirection.

Package structure (695 LOC total):
- config.py: TrainingConfig dataclass
- waa_direct.py: WAADirect HTTP client (screenshot/click/type/key)
- prompt.py: SYSTEM_PROMPT + build_agent_messages + parse_vlm_output_to_action
- reward.py: compute_group_advantages + evaluate_milestones_screenshot
- model_loader.py: load_model_and_processor (HF + PEFT)
- trainer.py: GRPOTrainer with rollout collection + training loop

Key design decisions:
- ZERO openadapt-ml imports (self-contained, will migrate later)
- max_new_tokens=2048 default (100 was catastrophically low)
- Multi-format parser (Thought/Action, bare DSL, JSON)
- Fresh screenshot for evaluation (not cached)
- Per-step backward to avoid OOM on long trajectories
- VLM judge via OpenAI API for milestone evaluation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

abrichr merged commit ba049f7 into main Mar 23, 2026
1 check passed

abrichr merged 1 commit into main from feat/standalone-grpo
