Skip to content

feat: replace bulky skills with versioned CLI help#352

Merged
thymikee merged 22 commits intomainfrom
codex/skill-router-refactor
Apr 28, 2026
Merged

feat: replace bulky skills with versioned CLI help#352
thymikee merged 22 commits intomainfrom
codex/skill-router-refactor

Conversation

@thymikee
Copy link
Copy Markdown
Contributor

@thymikee thymikee commented Apr 2, 2026

Summary

Move agent operating guidance from large installed skill reference trees into version-matched agent-device help topics, while keeping published skills as tiny routers that point agents at the installed CLI's own help.

Before

  • Skills carried most of the operating contract in markdown reference trees under skills/**/references.
  • Agents had to read installed skill docs or package files to learn command shapes and workflow rules.
  • Guidance could drift from the installed agent-device CLI version.
  • Website docs exposed SkillGym internals that are useful for maintainers but not product users.
  • Baseline from the thread: 20/48 SkillGym cases, 51/96 runner executions.

After

  • agent-device help includes an Agent Quickstart and workflow links for agents that only run plain help.
  • agent-device help workflow carries bootstrap, exploration, targeting, validation, React Native, Expo, remote/cloud, macOS, dogfood routing, and Shopify-style workaround guidance.
  • help workflow now includes a compact snapshot legend showing @e12 refs, labels/ids, truncated previews, disabled/hittable state, and off-screen hints.
  • Bootstrap guidance explicitly tells smaller models that discovery must still end in open, install means install, freshness comes from open --relaunch, and Expo URLs should not be searched for when the task already gives a launch target.
  • Debug guidance keeps the canonical short forms visible in plain help/workflow help: logs clear --restart and network dump --include headers.
  • Known limitation guidance is routed through help: iOS Allow Paste is manual-only under XCUITest with clipboard write prefill for app behavior, and Android non-ASCII text stays on fill/type with trusted ADB keyboard IME as the external fallback.
  • Focus topics cover specialized flows: debugging, react-devtools, remote, macos, and dogfood.
  • Published agent-device, react-devtools, and dogfood skills are small routers with a required agent-device >= 0.13.4 compatibility gate and upgrade instruction.
  • README and installation docs recommend skills as optional auto-routing helpers, while making CLI help the required version-matched source of truth.
  • Website SkillGym docs and nav entry were removed; repo-local test/skillgym docs remain for maintainers.
  • SkillGym BASE_INSTRUCTIONS is intentionally thin and points agents at local CLI help instead of embedding a command cheat sheet.
  • pnpm test:skillgym builds before running, the runner is renamed codex-main -> codex-mini, per-case timeout is 10m, and scheduling is isolated-by-runner.

Guidance Coverage

Restored and regression-tested high-value guidance from the removed skill references and Shopify feedback:

  • command shapes: no pseudo commands, no raw platform tools, no invented refs/package ids
  • bootstrap: devices/apps, install vs open, GitHub Actions artifact install, Expo Go/dev-client URLs, fresh open --relaunch after installs
  • exploration: read-only snapshot/get/is/find, snapshot -i when refs are needed, scoped snapshots for truncated inputs
  • targeting: selectors/refs first, concrete @e12 refs, raw coordinate fallback only for iOS disabled/no-op refs or collapsed composite controls
  • verification: expected text/selector assertions, local diff snapshot, overlay refs for visual evidence, wait for async/list text presence
  • debugging: logs clear/mark/path, network dump headers, alerts, traces, Android animation stabilizers
  • React Native: Metro reload, warning overlays, Expo quirks, React DevTools profiling/render-cause flows
  • remote/cloud: connect --remote-config, per-command script flow, lease/tenant config, local service tunnels
  • macOS: menu bar/frontmost/desktop surfaces, secondary-click context menus, re-snapshot for menu-item refs
  • dogfood: QA loop, severity/category calibration, auth/OTP handling, runtime-only findings, evidence/report shape, artifact preservation
  • known limitations: iOS pasteboard prompt suppression and Android non-ASCII input behavior

Size And Quality

  • Skills package: 12,315 words / 88,007 bytes on origin/main -> 541 words / 4,042 bytes now.
  • SkillGym BASE_INSTRUCTIONS: 75 words / 483 bytes.
  • Top-level agent-device help: 1,038 words / 12,060 bytes.
  • agent-device help workflow: 1,297 words / 9,955 bytes.
  • Focus topics remain separate on-demand topics.
  • Full batched SkillGym after the bootstrap wording pass, before the latest debug/install fixes:
    • codex-mini: 64/66 cases, 64/66 runs
    • claude-haiku: 63/66 cases, 63/66 runs
  • The remaining failed areas were rerun after fixes and now pass individually on both runners: setup-unknown-app-discover-first, install-from-github-artifact-before-open, debug-logs-short-window, debug-network-session-dump.
  • The two new known-limitation cases pass individually on both runners.

Validation

  • pnpm format
  • pnpm vitest run src/__tests__/cli-help.test.ts src/utils/__tests__/args.test.ts
  • pnpm vitest run src/__tests__/cli-help.test.ts src/utils/__tests__/args.test.ts src/core/__tests__/batch.test.ts
  • pnpm typecheck
  • pnpm build
  • node bin/agent-device.mjs --version -> 0.13.4
  • Batched full SkillGym:
    • pnpm exec skillgym run ./test/skillgym/suites/agent-device-smoke-suite.ts --config ./test/skillgym/skillgym.config.ts --runner codex-mini -> 64/66; output test/skillgym/.skillgym-results/2026-04-27T22-43-14-618Z
    • pnpm exec skillgym run ./test/skillgym/suites/agent-device-smoke-suite.ts --config ./test/skillgym/skillgym.config.ts --runner claude-haiku -> 63/66; output test/skillgym/.skillgym-results/2026-04-27T23-32-38-591Z
  • Failed-case reruns after fixes, all 2/2: setup-unknown-app-discover-first, install-from-github-artifact-before-open, debug-logs-short-window, debug-network-session-dump
  • Previous full-run failures rerun and passing individually on both runners: open-and-snapshot, setup-unknown-app-discover-first, install-artifact-before-open
  • Targeted SkillGym known-limitations cases, both 2/2: ios-allow-paste-prefill-only, android-non-ascii-text-stays-in-fill
  • Targeted SkillGym workflow batches, all 2/2 after fixes: target-ref-after-interactive-snapshot, ios-disabled-row-raw-rect-fallback, ios-composite-horizontal-tabs-coordinate-fallback, list-text-presence-prefers-wait-text, navigation-back-ambiguous-use-visible-nav, batch-inline-step-schema-positionals, form-keyboard-dismiss, catalog-search-debounce, install-from-github-artifact-before-open, remote-config-script-flow

Known gap: I started one final full codex-mini rerun after the latest fixes, but stopped it when correcting the version-floor mistake back to 0.13.4. The changed and previously failing cases were run directly on both runners.

@chatgpt-codex-connector
Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 2, 2026

PR Preview Action v1.8.1

QR code for preview link

🚀 View preview at
https://callstackincubator.github.io/agent-device/pr-preview/pr-352/

Built to branch gh-pages at 2026-04-28 01:54 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

@thymikee thymikee force-pushed the codex/skill-router-refactor branch from 6b79dcc to 6a1f630 Compare April 27, 2026 02:52
@thymikee thymikee changed the title docs: restructure agent-device skill guidance docs: move agent guidance into help workflows Apr 27, 2026
@thymikee thymikee changed the title docs: move agent guidance into help workflows docs: replace bulky skills with versioned CLI help Apr 27, 2026
@thymikee thymikee force-pushed the codex/skill-router-refactor branch from ea8fb83 to 21c037e Compare April 28, 2026 01:53
@thymikee thymikee changed the title docs: replace bulky skills with versioned CLI help feat: replace bulky skills with versioned CLI help Apr 28, 2026
@thymikee thymikee merged commit 140fe33 into main Apr 28, 2026
18 checks passed
@thymikee thymikee deleted the codex/skill-router-refactor branch April 28, 2026 18:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant