refactor(cli): minimal WAA CLI with vanilla image support by abrichr · Pull Request #14 · OpenAdaptAI/openadapt-ml

abrichr · 2026-01-26T20:37:03Z

Summary

Complete CLI refactor for WAA benchmark automation. Replaces the 6800-line CLI with a minimal 1300-line implementation that uses the vanilla Microsoft WAA image.

Key Changes

CLI Refactor (-5500 lines)

Replaced complex vm subcommand structure with flat commands: create, run, probe, analyze, etc.
Removed unused code paths and simplified command routing
Added analyze command to parse and summarize benchmark results
Added --num-tasks to limit the number of tasks to run
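
A flat command layout like the one described above could be sketched with argparse as follows. This is an illustration only, not the code in cli.py: the parser layout, the help strings, and the `limit_tasks` helper are assumptions. Only the command names and the `--num-tasks`, `--model`, and `--wait` flags come from this PR.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Build a flat (single-level) command parser; the layout is hypothetical."""
    parser = argparse.ArgumentParser(prog="cli")
    sub = parser.add_subparsers(dest="command", required=True)

    sub.add_parser("create", help="Create the VM")

    run = sub.add_parser("run", help="Run the benchmark")
    # No default is claimed here; None means "run every task".
    run.add_argument("--num-tasks", type=int, default=None,
                     help="Limit the number of tasks to run")
    run.add_argument("--model", default=None, help="Model name")

    probe = sub.add_parser("probe", help="Check server status")
    probe.add_argument("--wait", action="store_true",
                       help="Keep checking until the server responds")

    sub.add_parser("analyze", help="Summarize benchmark results")
    return parser


def limit_tasks(tasks: List[str], num_tasks: Optional[int]) -> List[str]:
    """Hypothetical helper: keep only the first num_tasks tasks, if a limit is set."""
    return tasks if num_tasks is None else tasks[:num_tasks]


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

With this layout every command sits one level below the program name, so `run --num-tasks 5` parses without an intermediate `vm` subcommand.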

Vanilla WAA Image Support

Uses official windowsarena/winarena:latest Docker image
Custom Dockerfile copies Python 3.9 from vanilla image (fixes transformers compatibility)
IP patches for dockurr/windows compatibility (172.30.0.2)

Python 3.9 Compatibility Fix

GroundingDINO requires transformers 4.46.2 (not 5.x)
Fixed by copying Python 3.9 and all packages from vanilla WAA image
This resolves: AttributeError: 'BertModel' has no attribute 'get_head_mask'
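
A quick way to confirm which transformers major version an environment resolves to is sketched below. The function names are hypothetical and this check is not part of the PR. It encodes only the constraint stated above, that a 4.x release is needed rather than 5.x.

```python
from importlib.metadata import PackageNotFoundError, version


def major_version(version_string: str) -> int:
    """Return the leading integer of a dotted version string such as '4.46.2'."""
    return int(version_string.split(".")[0])


def is_supported_transformers(version_string: str) -> bool:
    """Assumption taken from the PR text: releases below 5.x are acceptable."""
    return major_version(version_string) < 5


def describe_installed_transformers() -> str:
    """Report on the transformers package in the current environment."""
    try:
        installed = version("transformers")
    except PackageNotFoundError:
        return "transformers is not installed"
    status = "ok" if is_supported_transformers(installed) else "too new (5.x or later)"
    return f"transformers {installed}: {status}"


if __name__ == "__main__":
    print(describe_installed_transformers())
```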

Results Analysis

New analyze command parses downloaded benchmark logs
Shows success rate by domain
Handles ANSI color codes in logs
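
The parsing steps above can be sketched as follows. The log line format (`RESULT domain=<name> success=<True|False>`) is invented for illustration, since the real WAA log format is not shown in this PR, and the function names are hypothetical. The ANSI pattern covers the common CSI sequences such as color codes.

```python
import re
from collections import defaultdict
from typing import Dict

# Matches CSI escape sequences such as "\x1b[32m" (color) and "\x1b[0m" (reset).
ANSI_RE = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")

# Hypothetical result line format; the actual benchmark logs may differ.
RESULT_RE = re.compile(r"RESULT domain=(\w+) success=(True|False)")


def strip_ansi(text: str) -> str:
    """Remove ANSI color/formatting escape sequences from text."""
    return ANSI_RE.sub("", text)


def success_rate_by_domain(log_text: str) -> Dict[str, float]:
    """Return the fraction of successful tasks per domain."""
    counts = defaultdict(lambda: [0, 0])  # domain -> [passed, total]
    for line in strip_ansi(log_text).splitlines():
        match = RESULT_RE.search(line)
        if match is None:
            continue  # skip lines that are not result lines
        domain, ok = match.groups()
        counts[domain][1] += 1
        if ok == "True":
            counts[domain][0] += 1
    return {domain: passed / total for domain, (passed, total) in counts.items()}
```

Stripping the escape sequences before matching keeps the result pattern simple, because a color code can otherwise land in the middle of a field.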

Commands

# Create VM with Docker and WAA image
uv run python -m openadapt_ml.benchmarks.cli create

# Run benchmark (auto-downloads results)
uv run python -m openadapt_ml.benchmarks.cli run --num-tasks 5 --model gpt-4o

# Check WAA server status
uv run python -m openadapt_ml.benchmarks.cli probe --wait

# Analyze results
uv run python -m openadapt_ml.benchmarks.cli analyze

# View logs
uv run python -m openadapt_ml.benchmarks.cli logs --lines 100

# Clean up
uv run python -m openadapt_ml.benchmarks.cli deallocate -y
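
The `probe --wait` behavior listed above presumably amounts to polling until the server answers. A generic sketch of that pattern is shown below. The function name, the timeout and interval values, and the injected `check` callable are all assumptions, and the actual probe endpoint is not shown in this PR.

```python
import time
from typing import Callable


def wait_until_ready(
    check: Callable[[], bool],
    timeout_s: float = 600.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call check() repeatedly until it returns True or the timeout elapses.

    check is a hypothetical callable that returns True when the server is up;
    how it contacts the server is left to the caller.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

Passing the check as a callable keeps the loop independent of any particular URL or transport, which also makes it straightforward to test with a stub.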

Files Changed

openadapt_ml/benchmarks/cli.py - Complete refactor (6800 → 1300 lines)
openadapt_ml/benchmarks/waa_deploy/Dockerfile - Python 3.9 compatibility
docs/CLI_V2_DESIGN.md - Design documentation
.gitignore - Coverage and analysis artifacts

Test Plan

probe - Correctly detects WAA server status
run --num-tasks 2 - Limits tasks correctly
analyze - Parses benchmark logs and shows results by domain
logs - Shows container logs
navi agent runs successfully (Python 3.9 fix working)

🤖 Generated with Claude Code

- Refactor CLI from 6800 to ~1300 lines with flat command structure
- Add analyze command to parse and summarize benchmark results
- Add --num-tasks flag to limit number of tasks to run
- Fix Python 3.9 compatibility by copying Python from vanilla WAA image (fixes transformers 4.46.2 compatibility with GroundingDINO)
- Add coverage and analysis artifacts to .gitignore

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* docs(readme): add parallel WAA evaluation section, fix build badge

  - Fix broken build badge (publish.yml → release.yml)
  - Add prominent "Parallel WAA Benchmark Evaluation" section near top
  - Add detailed "WAA Benchmark Workflow" section (#14) with:
    - Single VM and parallel pool workflows
    - VNC access instructions
    - Architecture diagram
    - Cost estimates
  - Update section numbering (Limitations → 15, Roadmap → 16)

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(readme): address self-review feedback

  - Fix anchor placement (move before heading for proper navigation)
  - Correct pool-delete → pool-cleanup (actual command name)
  - Add pool-status example for getting worker IPs
  - Add "prices vary by region" caveat

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>

abrichr force-pushed the feature/vanilla-waa-cli-clean branch from fcee717 to 47a4d85 Compare January 26, 2026 20:39

abrichr changed the title ~~fix: update default emulator IP to 20.20.20.21 for official WAA~~ fix: CLI improvements for vanilla WAA automation Jan 26, 2026

abrichr changed the title ~~fix: CLI improvements for vanilla WAA automation~~ refactor(cli): minimal WAA CLI with vanilla image support Jan 27, 2026

abrichr force-pushed the feature/vanilla-waa-cli-clean branch 2 times, most recently from 5c51626 to 070225b Compare January 27, 2026 22:07

abrichr merged commit 5557130 into main Jan 27, 2026
0 of 8 checks passed

abrichr mentioned this pull request Feb 5, 2026

docs(readme): add parallel WAA evaluation, fix build badge #19

Merged

3 tasks

abrichr merged 1 commit into main from feature/vanilla-waa-cli-clean