fix(cli): fix --task flag concatenation bug and three other issues (#31)

Merged: abrichr merged 2 commits into main from fix/cli-task-flag-and-bugs on Feb 13, 2026

@abrichr abrichr commented Feb 13, 2026

Summary

Fixes four bugs reported in the WAA benchmarks CLI:

Bug 1 (Critical): --task flag produces find_task.pycd

A missing && separator between pre_cmd and cd /client caused the shell to receive:

python3 /tmp/find_task.pycd /client && python run.py ...

Every run --task <uuid> invocation since v0.4.2 silently failed. Fixed by adding && after find_task.py.
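The failure mode above comes from concatenating command fragments without a separator. A minimal sketch of one way to avoid it, assuming the command is assembled in Python: `build_run_command` is a hypothetical helper (not taken from the repository), and `python run.py` stands in for the full run invocation, whose arguments are elided in the description.

```python
from typing import Optional


def build_run_command(pre_cmd: Optional[str], run_cmd: str) -> str:
    """Chain the optional pre-command, `cd /client`, and the run command.

    Hypothetical helper: joining a list with " && " means a fragment can
    never be glued directly onto the next one (e.g. `find_task.pycd`).
    """
    steps = []
    if pre_cmd:
        steps.append(pre_cmd.strip())
    steps.append("cd /client")
    steps.append(run_cmd.strip())
    return " && ".join(steps)


if __name__ == "__main__":
    # With a pre-command (the --task case described above).
    print(build_run_command("python3 /tmp/find_task.py", "python run.py"))
    # Without a pre-command there is no leading separator.
    print(build_run_command(None, "python run.py"))
```

Building the command from a list keeps the separator logic in one place, so adding or removing a step cannot reintroduce the concatenation bug.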

Bug 2: --num-tasks defaults to 1

Changed the default from 1 to None (all tasks). Previously, invoking run without a task filter silently ran only one task.
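As an illustration of the new default, here is a minimal sketch that assumes an argparse-based parser. The actual CLI framework is not shown in this PR, and `build_parser` and `describe_task_limit` are hypothetical names.

```python
import argparse
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser for `run`; only the --num-tasks option is sketched."""
    parser = argparse.ArgumentParser(prog="run")
    parser.add_argument(
        "--num-tasks",
        type=int,
        default=None,  # None means "no limit": run every task
        help="Maximum number of tasks to run (default: all tasks)",
    )
    return parser


def describe_task_limit(num_tasks: Optional[int]) -> str:
    """Render the limit for display, so an unlimited run is shown explicitly."""
    return "all tasks" if num_tasks is None else f"{num_tasks} task(s)"


if __name__ == "__main__":
    args = build_parser().parse_args([])
    print(describe_task_limit(args.num_tasks))
```

Using None rather than a sentinel integer keeps "no limit" distinguishable from any real count, and the display string makes the effective behavior visible to the user.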

Bug 3: probe --wait timeout too short for first boot

Increased the default from 1200s (20 min) to 1800s (30 min). Windows OOBE + WAA startup routinely takes 18-22 min on first boot, so the old limit could expire before the VM was ready.
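To show why the old limit was too tight, the following sketch polls a readiness check until a deadline. It is an assumption-laden illustration, not the project's probe code: `wait_until_ready`, `DEFAULT_PROBE_TIMEOUT_S`, and the injectable `clock`/`sleep` parameters are hypothetical.

```python
import time
from typing import Callable

DEFAULT_PROBE_TIMEOUT_S = 1800  # 30 min, matching the new default above


def wait_until_ready(
    probe: Callable[[], bool],
    timeout_s: float = DEFAULT_PROBE_TIMEOUT_S,
    interval_s: float = 10.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `probe` until it returns True or `timeout_s` elapses.

    `clock` and `sleep` are injectable so the loop can be exercised
    without actually waiting.
    """
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


class FakeClock:
    """Simulated clock: sleeping advances time instantly."""

    def __init__(self) -> None:
        self.now = 0.0

    def time(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds


if __name__ == "__main__":
    # Simulate a first boot that becomes ready after 1300 s (about 21.7 min).
    for timeout in (1200, 1800):
        fc = FakeClock()
        ok = wait_until_ready(lambda: fc.now >= 1300, timeout,
                              clock=fc.time, sleep=fc.sleep)
        print(timeout, ok)
```

With a simulated 1300-second boot, the 1200-second limit gives up while the 1800-second limit succeeds, which is the situation the description reports for first boots.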

Bug 4: Default VM size OOMs with navi agent

Changed the default VM from Standard_D4ds_v4 (16GB) to Standard_D8ds_v5 (32GB). The navi agent's GroundingDINO + SoM models exhaust 16GB of RAM, triggering the OOM killer on QEMU. Added a runtime warning when standard mode is used explicitly.
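A rough sketch of the default-plus-warning behavior described above. `VM_SIZE` and `VM_SIZE_FAST` appear in the test plan; `VM_SIZE_STANDARD`, `resolve_vm_size`, and the `standard` parameter are hypothetical names, and the warning text is illustrative only.

```python
import warnings

VM_SIZE_STANDARD = "Standard_D4ds_v4"  # 16GB (hypothetical constant name)
VM_SIZE_FAST = "Standard_D8ds_v5"      # 32GB
VM_SIZE = VM_SIZE_FAST                 # default now points at the larger size


def resolve_vm_size(standard: bool = False) -> str:
    """Return the VM size to provision, warning if the 16GB size is requested."""
    if standard:
        warnings.warn(
            f"{VM_SIZE_STANDARD} (16GB) may run out of memory with the navi "
            f"agent; consider {VM_SIZE_FAST} (32GB).",
            RuntimeWarning,
            stacklevel=2,
        )
        return VM_SIZE_STANDARD
    return VM_SIZE


if __name__ == "__main__":
    print(resolve_vm_size())               # default: the 32GB size
    print(resolve_vm_size(standard=True))  # emits a RuntimeWarning
```

Note that the second commit in this PR goes further and standardizes on D8ds_v5 for all commands, so a selector like this would apply only to the first commit's behavior.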

Test plan

  • 216 tests pass (7 pre-existing failures from missing test data)
  • find_task.py && separator verified in code
  • --num-tasks default is None, display shows "all tasks"
  • --timeout default is 1800
  • VM_SIZE defaults to VM_SIZE_FAST

🤖 Generated with Claude Code

abrichr and others added 2 commits February 13, 2026 15:18
Bug 1 (Critical): --task flag produced `find_task.pycd` due to missing
`&&` separator between pre_cmd and `cd /client`. Every `run --task`
invocation since v0.4.2 silently failed. Fixed by adding `&&`.

Bug 2: --num-tasks defaulted to 1, silently limiting runs. Changed
default to None (all tasks).

Bug 3: probe --wait timeout of 1200s was too short for first boot
(OOBE takes 18-22 min). Increased to 1800s.

Bug 4: Default VM size (D4ds_v4, 16GB) OOMs with navi agent's
GroundingDINO + SoM models. Changed default to D8ds_v5 (32GB).
Added warning when standard mode is used explicitly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
D4ds_v4 (16GB) OOMs with navi agent's GroundingDINO + SoM models.
Standardize on D8ds_v5 across all commands — no more --fast/--standard flags.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@abrichr abrichr merged commit b0e09e9 into main Feb 13, 2026
@abrichr abrichr deleted the fix/cli-task-flag-and-bugs branch February 28, 2026 14:53