feat(v0.1.2): behavioral check expansion + HelpOutput cache by brettdavies · Pull Request #24 · brettdavies/agentnative-cli

brettdavies · 2026-04-21T21:59:06Z

Summary

Ships the three behavioral checks and shared HelpOutput cache promised for v0.1.2 H4:

p1-flag-existence — second behavioral proof of p1-must-no-interactive alongside p1-non-interactive
p1-env-hints — behavioral layer over source-only p1-env-flags-source
p6-no-pager-behavioral — behavioral layer over source-only p6-no-pager

All three consume a single HelpOutput per target, so the binary's --help is spawned once per run regardless of how many checks inspect it. Each check declares covers() against an existing registry ID — no new requirement IDs land in this PR, only new verifiers. Three requirements move from single-layer to dual-layer coverage (dual-layer count: 4 → 7).

Also carries an unrelated chore: raising required_approving_review_count from 0 → 1 on the main branch ruleset, committed as the first commit of the branch.

Changelog

Added

Add p1-flag-existence behavioral check — passes when --help advertises a non-interactive gate flag (--no-interactive, --batch, --headless, -y, --yes, -p, --print, --no-input, --assume-yes). Skips when the target already satisfies P1 via help-on-bare-invocation or stdin-primary.
Add p1-env-hints behavioral check — passes when --help exposes clap-style [env: FOO] bindings for flags. Emits medium confidence; the heuristic covers the canonical but not the only env-binding format.
Add p6-no-pager-behavioral behavioral check — passes when --no-pager is advertised in --help. Skips when no pager signal (less / more / $PAGER / --pager) appears. Emits medium confidence.
Add confidence field to every scorecard result (high / medium / low). Additive; v1.1 consumers feature-detect.
Add dual_layer count to the coverage matrix summary so the headline prose surfaces how many covered requirements have verifiers in two layers.
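The `[env: FOO]` heuristic that p1-env-hints relies on can be illustrated with a small scanner. This is a minimal sketch, not the PR's parser: the function name `env_hints` and the handling of a trailing `=value` (which clap may render after the variable name) are assumptions made for illustration.

```rust
/// Collect variable names from clap-style `[env: NAME]` annotations in help text.
/// Hypothetical helper; the real parser in src/runner/help_probe.rs may differ.
pub fn env_hints(help: &str) -> Vec<String> {
    let marker = "[env: ";
    let mut names = Vec::new();
    let mut rest = help;
    while let Some(start) = rest.find(marker) {
        let after = &rest[start + marker.len()..];
        // Assumption: the name ends at `=`, `]`, or whitespace.
        let end = after
            .find(|c: char| c == '=' || c == ']' || c.is_whitespace())
            .unwrap_or(after.len());
        let name = &after[..end];
        if !name.is_empty() {
            names.push(name.to_string());
        }
        // Continue scanning after the name just consumed.
        rest = &after[end..];
    }
    names
}
```

A help surface that documents its bindings in another format yields an empty list here, which is consistent with the check reporting medium rather than high confidence.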

Changed

Raise required approving review count on main branch from 0 to 1.

Documentation

Regenerate docs/coverage-matrix.md + coverage/matrix.json to pick up the three new behavioral verifiers.

Type of Change

`feat`: New feature (non-breaking change which adds functionality)

Testing

Unit tests added/updated
Integration tests added/updated
Manual testing completed
All tests passing

Test Summary:

Unit + integration tests: 41 passing (up from 35 — 6 tests added for the new checks, plus additional HelpOutput parser coverage).
Pre-push hook (mirrors CI): fmt, clippy -Dwarnings, test, cargo-deny, Windows compat — all green.
Smoke-tested against rg, bat, bird, xr via anc check --command; all new checks produce sensible verdicts.
Dogfood: anc check . on the agentnative repo produces Pass or Skip (no Warn) for all three new checks.

Files Modified

Created:

src/runner/help_probe.rs — HelpOutput cache with lazy flags(), env_hints(), subcommands() parsers.
src/checks/behavioral/flag_existence.rs
src/checks/behavioral/env_hints.rs
src/checks/behavioral/no_pager_behavioral.rs

Modified:

src/runner.rs → src/runner/mod.rs (promoted to a module directory so help_probe can live alongside BinaryRunner).
src/project.rs — lazy help_output() accessor on Project (OnceLock, same pattern as parsed_files).
src/types.rs — Confidence enum + CheckResult::confidence field.
src/scorecard.rs — serialize confidence on every result view.
src/principles/matrix.rs — dual_layer stat + surfaced in summary prose.
src/checks/behavioral/mod.rs — register the three new checks.
docs/coverage-matrix.md + coverage/matrix.json — regenerated.
.github/rulesets/protect-main.json — raise required approving review count on main.
34 existing check files gain confidence: Confidence::High, in their CheckResult construction — mechanical, no semantic change.

Key Features

One --help probe per target regardless of how many checks consume it. Runner caching means p1-flag-existence and p6-no-pager-behavioral share state without explicit plumbing.
Dual-layer coverage growth: three requirements now have both behavioral and source/project verifiers, halving the single-layer bucket for P1 and P6 MUSTs.

Breaking Changes

No breaking changes

Deployment Notes

Post-merge, regenerate the 10 committed scorecards on agentnative-site (H3's proven workflow) once v0.1.2 is released to crates.io + Homebrew. Matrix sync (scripts/sync-coverage-matrix.sh) picks up the new dual_layer field automatically.

Checklist

Code follows project conventions and style guidelines
Commit messages follow Conventional Commits
Self-review of code completed
Tests added/updated and passing
No new warnings or errors introduced
Changes are backward compatible

Raises required_approving_review_count from 0 to 1 on the main branch ruleset so PRs targeting main require at least one approval before merge. Complements required_status_checks + required_linear_history already in place.

Introduce `Confidence { High, Medium, Low }` enum so behavioral checks can signal how much they trust their own verdict. Direct probes keep the default `High`; the two new heuristic P1/P6 behavioral checks in this release report `Medium`. The additive `confidence` field in each scorecard result does not bump `schema_version` — v1.1 consumers feature-detect and tolerate missing keys. Existing checks (37 construction sites, plus test helpers) now pass `Confidence::High` explicitly; no semantic change.
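A rough sketch of what such an enum and a feature-detecting consumer might look like. Only the variant names and the lowercase `high` / `medium` / `low` labels come from this PR; `as_str` and `parse_confidence` are hypothetical helpers, and the actual code in src/types.rs may be shaped differently.

```rust
/// Mirrors the `Confidence { High, Medium, Low }` enum described above.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum Confidence {
    /// Direct probes keep this default.
    #[default]
    High,
    Medium,
    Low,
}

impl Confidence {
    /// Lowercase label as it appears in scorecard results (hypothetical helper).
    pub fn as_str(self) -> &'static str {
        match self {
            Confidence::High => "high",
            Confidence::Medium => "medium",
            Confidence::Low => "low",
        }
    }
}

/// Consumer-side feature detection (hypothetical): a missing or unrecognized
/// value yields `None` instead of an error, so older scorecards still load.
pub fn parse_confidence(raw: Option<&str>) -> Option<Confidence> {
    match raw? {
        "high" => Some(Confidence::High),
        "medium" => Some(Confidence::Medium),
        "low" => Some(Confidence::Low),
        _ => None,
    }
}
```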

New src/runner/help_probe.rs exposes HelpOutput — a once-per-target probe of <binary> --help with lazy, cached parse views: flags(), env_hints(), subcommands(). Checks that consume the help surface share the same HelpOutput so the binary is never re-spawned within a single run. Parsers are English-only on purpose. Localized help is documented as a named exception in docs/coverage-matrix.md; consumers skip (not warn) when the English surface is absent. Unit tests cover the ripgrep/clap/bare/non-English fixtures plus caching idempotence. src/runner.rs is promoted to src/runner/mod.rs so the new submodule can live alongside BinaryRunner without touching unrelated code.
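The lazy, cached parse views could be sketched as follows. This is an illustration under stated assumptions rather than the code in src/runner/help_probe.rs: the tokenizing rules, the `new` constructor, and the `advertises_non_interactive` helper are invented here, and only the flag list is taken from the PR description.

```rust
use std::sync::OnceLock;

/// Stand-in for the PR's `HelpOutput`; only the `flags()` view is sketched.
pub struct HelpOutput {
    raw: String,
    flags: OnceLock<Vec<String>>,
}

impl HelpOutput {
    pub fn new(raw: impl Into<String>) -> Self {
        Self { raw: raw.into(), flags: OnceLock::new() }
    }

    /// Flags mentioned in the help text, parsed on first call and cached afterwards.
    pub fn flags(&self) -> &[String] {
        self.flags.get_or_init(|| {
            let mut found: Vec<String> = Vec::new();
            // Assumption: split on whitespace and the punctuation that usually surrounds flags.
            let separators = |c: char| c.is_whitespace() || ",=[]<>()|".contains(c);
            for token in self.raw.split(separators) {
                let long = token.starts_with("--")
                    && token[2..].starts_with(|c: char| c.is_ascii_alphanumeric());
                let short = token.len() == 2
                    && token.starts_with('-')
                    && token[1..].starts_with(|c: char| c.is_ascii_alphanumeric());
                if (long || short) && !found.iter().any(|f| f.as_str() == token) {
                    found.push(token.to_string());
                }
            }
            found
        })
    }
}

/// Non-interactive gate flags listed in the PR description.
pub const NON_INTERACTIVE_FLAGS: [&str; 9] = [
    "--no-interactive", "--batch", "--headless", "-y", "--yes",
    "-p", "--print", "--no-input", "--assume-yes",
];

/// Hypothetical helper: true when any gate flag appears in the parsed view.
pub fn advertises_non_interactive(help: &HelpOutput) -> bool {
    help.flags().iter().any(|f| NON_INTERACTIVE_FLAGS.contains(&f.as_str()))
}
```

Because `flags()` hands out a slice borrowed from the `OnceLock`, repeated calls return the same allocation, which is the property the caching-idempotence tests mentioned above would exercise.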

Checks that need to inspect the --help surface now call `project.help_output()`; the OnceLock initializer probes `<binary> --help` exactly once and all subsequent calls return the same `HelpOutput`. No churn to the `Check` trait signature — existing checks continue to compile unchanged.
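The once-per-run guarantee follows directly from `OnceLock::get_or_init`. In the sketch below, the probe is an injected closure so that the example runs without spawning a process; the `probe` field, the `probes_run` counter, and the `new` constructor are assumptions for illustration and are not part of the PR's `Project` type.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Stand-in for the cached help surface.
pub struct HelpOutput {
    pub raw: String,
}

pub struct Project {
    // Assumption: the real tool would run `<binary> --help` here,
    // for example via std::process::Command.
    probe: Box<dyn Fn() -> String + Send + Sync>,
    probes_run: AtomicUsize,
    help: OnceLock<HelpOutput>,
}

impl Project {
    pub fn new(probe: impl Fn() -> String + Send + Sync + 'static) -> Self {
        Self {
            probe: Box::new(probe),
            probes_run: AtomicUsize::new(0),
            help: OnceLock::new(),
        }
    }

    /// The first call runs the probe; every later call returns the cached value.
    pub fn help_output(&self) -> &HelpOutput {
        self.help.get_or_init(|| {
            self.probes_run.fetch_add(1, Ordering::Relaxed);
            HelpOutput { raw: (self.probe)() }
        })
    }

    /// Number of times the probe actually ran (for demonstration only).
    pub fn probes_run(&self) -> usize {
        self.probes_run.load(Ordering::Relaxed)
    }
}
```

Since `help_output()` takes `&self`, checks can call it through a shared reference, which is why the `Check` trait signature does not need to change.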

…oral

Three new behavioral checks land as a set, all consuming the shared HelpOutput cache. Registry linkage is via covers(); no new requirement IDs are added. Each check gets the canonical unit test shape: happy path, skip-applicability, warn-missing, non-English fixture.

p1-flag-existence (confidence: high, covers p1-must-no-interactive): second behavioral proof alongside p1-non-interactive. Passes when the help surface advertises any of the canonical non-interactive flags (--no-interactive, --batch, --headless, -y, --yes, -p, --print, --no-input, --assume-yes). Skips when the target already satisfies P1 via help-on-bare-invocation or stdin-clean-exit.
p1-env-hints (confidence: medium, covers p1-must-env-var): behavioral layer for env-var coverage previously source-only via p1-env-flags-source. Reads clap-style `[env: FOO]` annotations from the parsed help surface.
p6-no-pager-behavioral (confidence: medium, covers p6-must-no-pager): behavioral layer for pager coverage previously source-only via p6-no-pager. Passes when `--no-pager` is advertised. Skips when no pager signal is present. Warns when a pager is referenced but the escape hatch is missing.

Coverage matrix regeneration lands in the next commit.
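The pass / skip / warn rule for p6-no-pager-behavioral can be written as a small decision function. This is a sketch under assumptions: `Verdict` and `no_pager_verdict` are hypothetical names, the tokenizer is a simplification, and the pager signals (less, more, $PAGER, --pager) are the ones listed in the PR description.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum Verdict {
    Pass,
    Skip,
    Warn,
}

/// Decide the verdict from raw help text (hypothetical helper).
pub fn no_pager_verdict(help: &str) -> Verdict {
    // Assumption: tokens are runs of ASCII alphanumerics plus `-`, `$`, and `_`.
    let tokens: Vec<&str> = help
        .split(|c: char| !(c.is_ascii_alphanumeric() || matches!(c, '-' | '$' | '_')))
        .filter(|t| !t.is_empty())
        .collect();

    let has_escape_hatch = tokens.contains(&"--no-pager");
    let has_pager_signal = tokens
        .iter()
        .any(|t| matches!(*t, "less" | "more" | "$PAGER" | "--pager"));

    if has_escape_hatch {
        Verdict::Pass
    } else if has_pager_signal {
        // A pager is referenced but no escape hatch is advertised.
        Verdict::Warn
    } else {
        // No pager signal at all, so the check does not apply.
        Verdict::Skip
    }
}
```

Word-level matching on "more" and "less" can fire on ordinary English help prose, which is one plausible reason for a heuristic like this to report medium confidence.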

Regenerated docs/coverage-matrix.md and coverage/matrix.json to pick up covers() linkage for the three new behavioral checks.

Three requirements move from single-layer to dual-layer coverage:

- p1-must-no-interactive +p1-flag-existence (behavioral)
- p1-must-env-var +p1-env-hints (behavioral)
- p6-must-no-pager +p6-no-pager-behavioral (behavioral)

MatrixSummary gains a `dual_layer` count so the site /coverage page and the committed markdown surface the signal directly. The JSON shape change is additive: schema_version stays at 1.0 and consumers feature-detect missing fields.

Also silences the dead_code warnings for HelpOutput::subcommands() (reserved for future P3/P6 checks) and tightens Flag::matches() to drop the redundant branch that clippy flagged.
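One way a `dual_layer` count might be derived from covers() linkage is to group verifiers by the requirement they cover and count the requirements seen in more than one layer. The `Layer` and `Verifier` types below are hypothetical, and treating "dual-layer" as "at least two distinct layers" is an assumption; the logic in src/principles/matrix.rs may differ.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Verification layers (names assumed from the behavioral / source / project wording).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Layer {
    Behavioral,
    Source,
    Project,
}

/// Hypothetical record linking a check to the requirement it covers.
pub struct Verifier {
    pub id: &'static str,
    pub layer: Layer,
    pub covers: &'static str,
}

/// Count requirements that have verifiers in at least two distinct layers.
pub fn dual_layer(verifiers: &[Verifier]) -> usize {
    let mut layers: BTreeMap<&str, BTreeSet<Layer>> = BTreeMap::new();
    for v in verifiers {
        layers.entry(v.covers).or_default().insert(v.layer);
    }
    layers.values().filter(|set| set.len() >= 2).count()
}
```

With p1-env-flags-source (source) and p1-env-hints (behavioral) both covering p1-must-env-var, that requirement contributes one to the count, while a requirement covered by a single layer contributes nothing.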

## Summary

Release branch for v0.1.2. Cherry-picked from `dev`:

- **PR #24**: v0.1.2 feature: 3 new behavioral checks (`p1-flag-existence`, `p1-env-hints`, `p6-no-pager-behavioral`) + shared `HelpOutput` cache + `Confidence` field on `CheckResult` + `dual_layer` stat in coverage matrix + `main` ruleset review count 0 → 1.
- **PR #23**: post-v0.1.1 README + AGENTS sync that never made it into the v0.1.1 release branch.

Plus the standard release-branch mechanics: `Cargo.toml` bump to `0.1.2`, `Cargo.lock` refresh, regenerated `CHANGELOG.md` (git-cliff from squash-commit `## Changelog` bodies). Completions had no drift this cycle (CLI surface unchanged).

## Type of Change

- [x] `feat`: new user-visible capabilities (three behavioral checks + `confidence` field + dual-layer coverage stat)

## Testing

- [x] All tests passing on dev head before cherry-pick (41 / 41)
- [x] Pre-push hook green on release branch: fmt / clippy `-Dwarnings` / test / cargo-deny / Windows compat
- [x] Coverage matrix drift check passes against the committed artifacts
- [x] Smoke-tested against `rg` / `bat` / `bird` / `xr` / `anc` itself

## Files Modified

See the two cherry-pick bodies (commits `72ca148` and `2fcb08d`) plus `CHANGELOG.md` + `Cargo.toml` + `Cargo.lock` for the release mechanics.

## Deployment Notes

Post-merge:

1. Annotated tag + push: `git tag -a -m "Release v0.1.2" v0.1.2 && git push origin main --tags`.
2. Tag push triggers `release.yml` → `cargo publish` (Trusted Publishing) → GitHub Release (draftless, `make_latest: false` during bottle window) → Homebrew dispatch → `finalize-release` flips `make_latest: true`.
3. Follow-on on `agentnative-site`: install v0.1.2, regenerate the 10 committed scorecards (H3's proven workflow), run `scripts/sync-coverage-matrix.sh` to pull `dual_layer: 7` into `src/data/coverage-matrix.json`.

## Breaking Changes

- [x] No breaking changes. Scorecard schema stays at `1.1`; `confidence` on results and `dual_layer` on matrix summary are additive.

## Checklist

- [x] CI-equivalent gates green locally
- [x] Cherry-picks touched no guarded paths (`docs/plans/`, `docs/solutions/`, `docs/brainstorms/`, `docs/reviews/`)
- [x] `Cargo.toml` version matches the tag this branch will produce
- [x] CHANGELOG entries reflect user-facing changes only

brettdavies added 6 commits April 21, 2026 16:39

chore(ops): require 1 approving review on main

ca69d68


brettdavies merged commit f969f8c into dev Apr 21, 2026
6 checks passed

brettdavies deleted the feat/v012-behavioral-check-expansion branch April 21, 2026 22:03

brettdavies mentioned this pull request Apr 21, 2026

release: v0.1.2 #25

Merged

10 tasks
