Problem
Clause.Verb currently relies on BashArity table lookups (default 2-token, with hand-curated 3-token entries for docker compose and bun run). This breaks down in three ways:
- Multi-token CLI subcommands without table entries are truncated. freshdesk ticket list --status open, git worktree list, kubectl get pods, aws s3 cp, dotnet ef migrations add all extract a too-short verb chain because their root verbs aren't in the table at the appropriate arity.
- The table approach can't scale. Custom/private CLIs (freshdesk and any internal company tool) and the long tail of cloud-CLI subcommand groups will never appear in any curated table. Maintenance cost is unbounded.
- Even with a complete table, the git X Y ambiguity is structurally undecidable. git push origin main (3-token chain git push origin?) vs git worktree list (3-token chain git worktree list) look identical to the parser without per-CLI semantic knowledge. No syntactic rule disambiguates branch names from subcommand names.
Proposed change
Parser side: improve the heuristic
Replace BashArity-table lookup with a "stop at non-verb-like token" heuristic. A token is "verb-like" if it's a bare identifier (lowercase letters + hyphens + dots, no special chars, not too long) and not a flag, not a path-shape, not a value-like token (numeric, URL, env-var ref).
The walk consumes consecutive verb-like tokens from the start of the clause, treating known flag-with-value pairs (existing FlagsWithValue table) as transparent — -C /repo is consumed without breaking the verb-chain walk. Stop at the first non-verb-like token.
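The walk described above can be sketched roughly as follows. This is a minimal illustration, not the parser's actual code: FLAGS_WITH_VALUE stands in for the existing FlagsWithValue table (its per-command shape here is an assumption), MAX_VERB_LEN is an assumed value for "not too long", and digits are allowed after the first letter so that a token like s3 still counts as verb-like.

```python
import re

# Hypothetical stand-in for the existing FlagsWithValue table, keyed by root command.
FLAGS_WITH_VALUE = {"git": {"-C", "-c"}}

# Assumed cut-off for "not too long"; the proposal does not fix a number.
MAX_VERB_LEN = 32

# Bare identifier: starts with a lowercase letter, then letters, digits, dots, hyphens.
# Allowing digits is an assumption so that "s3" in "aws s3 cp" is verb-like.
BARE_IDENT = re.compile(r"[a-z][a-z0-9.-]*")


def is_verb_like(token: str) -> bool:
    """Return True if the token could plausibly be a (sub)command verb."""
    if len(token) > MAX_VERB_LEN:
        return False
    # Flags, paths, numbers, URLs, and env-var refs all fail this match.
    return BARE_IDENT.fullmatch(token) is not None


def extract_verb(tokens: list[str]) -> list[str]:
    """Consume leading verb-like tokens, skipping known flag-with-value pairs."""
    if not tokens:
        return []
    flags = FLAGS_WITH_VALUE.get(tokens[0], set())
    chain: list[str] = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in flags and i + 1 < len(tokens):
            i += 2  # transparent: skip the flag and its value
            continue
        if not is_verb_like(tok):
            break  # first non-verb-like token ends the chain
        chain.append(tok)
        i += 1
    return chain
```

For example, extract_verb("git -C /repo worktree list --porcelain".split()) yields [git, worktree, list] under these assumptions. Real tokenization would come from the shell parser rather than str.split.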
Effect on the same examples:
| Command | Old extraction (BashArity-based) | New extraction (stop-at-non-verb-like) |
| --- | --- | --- |
| freshdesk ticket list --status open | [freshdesk] (1-token default) | [freshdesk, ticket, list] (stops at --status) |
| git -C /repo worktree list --porcelain | [git, worktree] | [git, worktree, list] (stops at --porcelain) |
| kubectl get pods my-pod | [kubectl, get] | [kubectl, get, pods, my-pod] (over-extracts; see below) |
| git push origin main | [git, push] | [git, push, origin, main] (over-extracts; see below) |
| cat /etc/foo | [cat] | [cat] (stops at /etc/foo path) |
| chmod 755 file | [chmod] | [chmod] (stops at 755 numeric) |
The over-extraction cases (bare-word args like origin, main, my-pod) are unfixable at the parser layer — they're indistinguishable from subcommand verbs without per-CLI semantic knowledge. The new heuristic is strictly better than today's BashArity-based extraction, just not perfect on bare-word args.
Documentation side: scope Clause.Verb correctly
Update SPEC.md to make explicit:
- Clause.Verb is a convenience hint, not a security contract. It's a best-effort identification of the canonical verb chain. Consumers using it for display, audit dedup, or other non-load-bearing purposes can rely on it.
- Consumers needing security-grade verb identification walk the token stream directly. The pattern-matching algorithm should be: "command matches pattern iff first N command tokens equal pattern's verb prefix, where N = pattern's verb-prefix length." This punts the depth choice to the user (via the pattern they author) and eliminates the parser's responsibility to guess.
- The BashArity table will not grow to enumerate CLI subcommand structures. Existing entries (docker compose, bun run) stay for the convenience hint; new ones are not added. The table is a small set of well-known multi-word verb idioms, not an exhaustive registry.
Why over-extraction is acceptable for security
When the parser over-extracts (e.g., [git, push, origin, main] instead of [git, push]), pattern-depth-driven matching handles it correctly:
- User's pattern git push * has verb-prefix length 2.
- Command's first 2 tokens are [git, push].
- Match check: do the first 2 tokens equal the pattern's verb prefix? ✓ MATCH.
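The match check above could look something like this on the consumer side. It is a sketch under stated assumptions: the function names are hypothetical, a trailing * is taken to be the only wildcard syntax, and skipping of flag-with-value pairs in the command tokens is left out for brevity.

```python
def verb_prefix(pattern: str) -> list[str]:
    """Return the pattern's tokens up to (not including) the first '*'."""
    prefix: list[str] = []
    for tok in pattern.split():
        if tok == "*":
            break
        prefix.append(tok)
    return prefix


def matches(command_tokens: list[str], pattern: str) -> bool:
    """Command matches iff its first N tokens equal the pattern's verb prefix."""
    prefix = verb_prefix(pattern)
    n = len(prefix)
    # An empty prefix (a bare '*') is rejected rather than matching everything.
    return n > 0 and command_tokens[:n] == prefix
```

With this shape, matches("git push origin main".split(), "git push *") holds no matter how many tokens the parser put into Clause.Verb, because only the pattern's own depth is consulted.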
When a prompt auto-proposes a pattern, greedy over-extraction is the security-correct default. The auto-proposed pattern for git push origin main is git push origin main * (over-specific). This is better than git push * because:
- A subsequent git push wrongremote wrongbranch doesn't auto-grant.
- Re-prompts on variation are audit checkpoints, not friction.
- Operators wanting broader grants opt in explicitly via CLI (netclaw approvals trust-verb 'git push *').
False-negative (re-prompt) is recoverable; false-positive (silent destructive grant) is not. Narrow-by-default favors the recoverable failure mode.
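To make the narrow-by-default argument concrete, here is a small sketch of how a consumer might turn an over-extracted verb chain into a proposed pattern. Both function names are hypothetical, and the trailing * convention is assumed from the examples above.

```python
def propose_pattern(verb_chain: list[str]) -> str:
    """Build the narrowest pattern from the full (possibly over-extracted) chain."""
    return " ".join(verb_chain) + " *"


def covered_by(command: str, pattern: str) -> bool:
    """True if the command's leading tokens equal the pattern's verb prefix."""
    prefix = pattern.split()
    if prefix and prefix[-1] == "*":
        prefix = prefix[:-1]  # drop the trailing wildcard
    return bool(prefix) and command.split()[: len(prefix)] == prefix


narrow = propose_pattern(["git", "push", "origin", "main"])
broad = "git push *"

# The narrow grant re-prompts on a different remote/branch; the broad one does not.
reprompts = not covered_by("git push wrongremote wrongbranch", narrow)
silently_granted = covered_by("git push wrongremote wrongbranch", broad)
```

Under these assumptions, narrow is git push origin main *, so the variation triggers a re-prompt, while the broad pattern would have granted it silently.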
Suggested test cases for the corpus
input: "freshdesk ticket list --status open"
expected: verb=[freshdesk, ticket, list]
input: "git -C /repo worktree list --porcelain"
expected: verb=[git, worktree, list]
input: "kubectl get pods"
expected: verb=[kubectl, get, pods]
input: "kubectl get pods my-pod"
expected: verb=[kubectl, get, pods, my-pod] # over-extracts; consumer handles via pattern depth
input: "aws s3 cp src dst"
expected: verb=[aws, s3, cp, src, dst] # over-extracts on bare-word path-like args
input: "git push origin main"
expected: verb=[git, push, origin, main] # over-extracts on bare-word branch/remote names
input: "cat /etc/passwd"
expected: verb=[cat] # stops at path
input: "chmod 755 file"
expected: verb=[chmod] # stops at numeric mode
input: "echo --version"
expected: verb=[echo] # stops at flag
input: "ls -la /tmp"
expected: verb=[ls] # stops at flag
Non-goals
- Per-CLI semantic knowledge baked into the parser (no git-specific or kubectl-specific subcommand tables).
- Disambiguating bare-word args from subcommand verbs.
- Any UI/UX choices about how consumers display or match these verb chains. Those are consumer concerns.
Severity
Medium. Today's behavior under-extracts for any CLI not in the table, which causes pattern-matching false negatives for consumers (a saved approval pattern doesn't match the command the user thought it would). The proposed fix is strictly better — moves from "always 2-token unless in table" to "stop at non-verb-like" which handles flags and paths correctly without any per-CLI knowledge. Bare-word over-extraction remains; consumers handle that via pattern-depth-driven matching.
Prior discussion
See comments below for the path that led here — earlier proposals to extend the BashArity table with curated entries, plus a multi-option specificity-picker prompt UX, were both rejected. The table approach can't scale to unknown CLIs; multi-option pickers don't survive translation to text-only channel adapters. The current proposal punts depth choice to consumers and keeps the parser stateless about CLI semantics.