Stop verb-chain extraction at non-verb-like tokens

## Problem

`Clause.Verb` currently relies on `BashArity` table lookups (default 2-token, with hand-curated 3-token entries for `docker compose` and `bun run`). This breaks down in three ways:

1. **Multi-token CLI subcommands without table entries are truncated.** `freshdesk ticket list --status open`, `git worktree list`, `kubectl get pods`, `aws s3 cp`, `dotnet ef migrations add` all extract a too-short verb chain because their root verbs aren't in the table at the appropriate arity.

2. **The table approach can't scale.** Custom/private CLIs (`freshdesk` and any internal company tool) and the long tail of cloud-CLI subcommand groups will never appear in any curated table. Maintenance cost is unbounded.

3. **Even with a complete table, the `git X Y` ambiguity is structurally undecidable.** `git push origin main` (3-token chain `git push origin`?) vs `git worktree list` (3-token chain `git worktree list`) look identical to the parser without per-CLI semantic knowledge. No syntactic rule disambiguates branch names from subcommand names.

## Proposed change

### Parser side: improve the heuristic

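Replace `BashArity`-table lookup with a **"stop at non-verb-like token"** heuristic. A token is "verb-like" if it's a bare identifier (lowercase letters, digits, hyphens, and dots; no other special chars; not too long) and is not a flag, a path-shaped token, or a value-like token (purely numeric, URL, env-var ref). Digits have to be allowed inside an identifier for `s3` in `aws s3 cp` to count as verb-like, as the test corpus expects.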

The walk consumes consecutive verb-like tokens from the start of the clause, treating known flag-with-value pairs (existing `FlagsWithValue` table) as transparent — `-C /repo` is consumed without breaking the verb-chain walk. Stop at the first non-verb-like token.

Effect on the same examples:

| Command | Old extraction (BashArity-based) | New extraction (stop-at-non-verb-like) |
|---|---|---|
| `freshdesk ticket list --status open` | `[freshdesk]` (1-token default) | `[freshdesk, ticket, list]` (stops at `--status`) |
| `git -C /repo worktree list --porcelain` | `[git, worktree]` | `[git, worktree, list]` (stops at `--porcelain`) |
| `kubectl get pods my-pod` | `[kubectl, get]` | `[kubectl, get, pods, my-pod]` (over-extracts; see below) |
| `git push origin main` | `[git, push]` | `[git, push, origin, main]` (over-extracts; see below) |
| `cat /etc/foo` | `[cat]` | `[cat]` (stops at `/etc/foo` path) |
| `chmod 755 file` | `[chmod]` | `[chmod]` (stops at `755` numeric) |

The over-extraction cases (bare-word args like `origin`, `main`, `my-pod`) are unfixable at the parser layer — they're indistinguishable from subcommand verbs without per-CLI semantic knowledge. **The new heuristic is strictly better than today's BashArity-based extraction**, just not perfect on bare-word args.
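
The walk described in this section can be sketched roughly as follows. This is a minimal illustration, not the parser's actual code: `FLAGS_WITH_VALUE` is a hypothetical stand-in for the existing `FlagsWithValue` table, `MAX_VERB_LEN` is an assumed cutoff for "not too long", the identifier regex (lowercase start, then lowercase letters, digits, hyphens, dots) is one possible reading of "bare identifier", and whitespace splitting stands in for real shell tokenization.

```python
import re
from typing import List

# Hypothetical stand-in for the existing FlagsWithValue table, keyed by root command.
FLAGS_WITH_VALUE = {"git": {"-C", "-c"}}

# Assumed shape of a bare identifier. Flags (leading "-"), paths ("/"),
# URLs (":" and "/"), env-var refs ("$") and numerics (leading digit) all fail it.
BARE_IDENT = re.compile(r"[a-z][a-z0-9.-]*")
MAX_VERB_LEN = 32  # assumed limit for "not too long"


def is_verb_like(token: str) -> bool:
    """Return True if the token looks like a subcommand word."""
    return len(token) <= MAX_VERB_LEN and BARE_IDENT.fullmatch(token) is not None


def extract_verb_chain(tokens: List[str]) -> List[str]:
    """Consume leading verb-like tokens; known flag+value pairs are transparent."""
    chain: List[str] = []
    if not tokens:
        return chain
    flags = FLAGS_WITH_VALUE.get(tokens[0], set())
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if chain and tok in flags and i + 1 < len(tokens):
            i += 2  # skip the flag and its value without ending the walk
            continue
        if not is_verb_like(tok):
            break  # first non-verb-like token ends the chain
        chain.append(tok)
        i += 1
    return chain


# Whitespace splitting is a simplification; a real parser handles quoting.
print(extract_verb_chain("git -C /repo worktree list --porcelain".split()))
# ['git', 'worktree', 'list']
print(extract_verb_chain("git push origin main".split()))
# ['git', 'push', 'origin', 'main']
```

Under these assumptions the sketch reproduces the "New extraction" column of the table above, including the over-extraction on `origin`, `main`, and `my-pod`.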

### Documentation side: scope `Clause.Verb` correctly

Update SPEC.md to make explicit:

- **`Clause.Verb` is a convenience hint, not a security contract.** It's a best-effort identification of the canonical verb chain. Consumers using it for display, audit dedup, or other non-load-bearing purposes can rely on it.
- **Consumers needing security-grade verb identification walk the token stream directly.** The pattern-matching algorithm should be: "command matches pattern iff first N command tokens equal pattern's verb prefix, where N = pattern's verb-prefix length." This punts the depth choice to the user (via the pattern they author) and eliminates the parser's responsibility to guess.
- **The `BashArity` table will not grow to enumerate CLI subcommand structures.** Existing entries (`docker compose`, `bun run`) stay for the convenience hint; new ones are not added. The table is a small set of well-known multi-word verb idioms, not an exhaustive registry.
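
A consumer-side matcher following the "first N tokens equal the pattern's verb prefix" rule might look like the sketch below. The pattern syntax is an assumption inferred from the `git push *` example (space-separated literal tokens with an optional trailing `*`), and the function names are hypothetical. How transparent flags such as `-C /repo` are handled is left out, since the rule above does not specify it.

```python
from typing import List


def verb_prefix(pattern: str) -> List[str]:
    """Return the literal tokens of a pattern, dropping a trailing '*' wildcard."""
    tokens = pattern.split()
    if tokens and tokens[-1] == "*":
        tokens = tokens[:-1]
    return tokens


def matches(command_tokens: List[str], pattern: str) -> bool:
    """True iff the first N command tokens equal the pattern's N-token verb prefix."""
    prefix = verb_prefix(pattern)
    n = len(prefix)
    # An empty prefix would match everything, so treat it as a non-match.
    return n > 0 and command_tokens[:n] == prefix


print(matches("git push origin main".split(), "git push *"))  # True
print(matches("git pull origin main".split(), "git push *"))  # False
print(matches(["git"], "git push *"))                         # False
```

The matcher never consults `Clause.Verb`, so the match depth comes entirely from the pattern the user authored.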

## Why over-extraction is acceptable for security

When the parser over-extracts (e.g., `[git, push, origin, main]` instead of `[git, push]`), pattern-depth-driven matching handles it correctly:

- User's pattern `git push *` has verb-prefix length 2.
- Command's first 2 tokens are `[git, push]`.
- Match check: do the first 2 tokens equal the pattern's verb prefix? ✓ MATCH.

Conversely, when a prompt auto-proposes a pattern, **greedy over-extraction is the security-correct default**. Auto-proposed pattern for `git push origin main` is `git push origin main *` (over-specific). This is *better* than `git push *` because:

- A subsequent `git push wrongremote wrongbranch` doesn't auto-grant.
- Re-prompts on variation are audit checkpoints, not friction.
- Operators wanting broader grants opt in explicitly via CLI (`netclaw approvals trust-verb 'git push *'`).

False-negative (re-prompt) is recoverable; false-positive (silent destructive grant) is not. Narrow-by-default favors the recoverable failure mode.
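
To make the narrow-by-default argument concrete, the sketch below contrasts the auto-proposed pattern with an explicitly broadened one. `propose_pattern` and `prefix_matches` are hypothetical helpers, and the trailing-`*` pattern syntax is assumed from the examples above.

```python
from typing import List


def propose_pattern(verb_chain: List[str]) -> str:
    """Hypothetical auto-proposal: the full extracted chain plus a trailing wildcard."""
    return " ".join(verb_chain) + " *"


def prefix_matches(command: str, pattern: str) -> bool:
    """Prefix match driven by the pattern's own depth (assumed pattern syntax)."""
    prefix = pattern.split()
    if prefix and prefix[-1] == "*":
        prefix = prefix[:-1]
    return bool(prefix) and command.split()[: len(prefix)] == prefix


narrow = propose_pattern(["git", "push", "origin", "main"])
broad = "git push *"  # only granted via explicit operator opt-in

print(narrow)                                                      # git push origin main *
print(prefix_matches("git push origin main --force", narrow))      # True
print(prefix_matches("git push wrongremote wrongbranch", narrow))  # False: re-prompts
print(prefix_matches("git push wrongremote wrongbranch", broad))   # True
```

The narrow pattern fails closed on the variant command, which is the recoverable failure mode described above.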

## Suggested test cases for the corpus

```
input:    "freshdesk ticket list --status open"
expected: verb=[freshdesk, ticket, list]

input:    "git -C /repo worktree list --porcelain"
expected: verb=[git, worktree, list]

input:    "kubectl get pods"
expected: verb=[kubectl, get, pods]

input:    "kubectl get pods my-pod"
expected: verb=[kubectl, get, pods, my-pod]   # over-extracts; consumer handles via pattern depth

input:    "aws s3 cp src dst"
expected: verb=[aws, s3, cp, src, dst]   # over-extracts on bare-word path-like args

input:    "git push origin main"
expected: verb=[git, push, origin, main]   # over-extracts on bare-word branch/remote names

input:    "cat /etc/passwd"
expected: verb=[cat]                       # stops at path

input:    "chmod 755 file"
expected: verb=[chmod]                     # stops at numeric mode

input:    "echo --version"
expected: verb=[echo]                      # stops at flag

input:    "ls -la /tmp"
expected: verb=[ls]                        # stops at flag
```

## Non-goals

- Per-CLI semantic knowledge baked into the parser (no `git`-specific or `kubectl`-specific subcommand tables).
- Disambiguating bare-word args from subcommand verbs.
- Any UI/UX choices about how consumers display or match these verb chains. Those are consumer concerns.

## Severity

Medium. Today's behavior under-extracts for any CLI not in the table, which causes pattern-matching false negatives for consumers (a saved approval pattern doesn't match the command the user thought it would). The proposed fix is strictly better: it moves from "always 2-token unless in table" to "stop at non-verb-like", which handles flags and paths correctly without any per-CLI knowledge. Bare-word over-extraction remains; consumers handle that via pattern-depth-driven matching.

## Prior discussion

See comments below for the path that led here — earlier proposals to extend the BashArity table with curated entries, plus a multi-option specificity-picker prompt UX, were both rejected. The table approach can't scale to unknown CLIs; multi-option pickers don't survive translation to text-only channel adapters. The current proposal punts depth choice to consumers and keeps the parser stateless about CLI semantics.
