fix: greedy verb-chain extraction (#27) — 0.1.4-alpha #28
Merged
Replace the static BashArity lookup table with a "stop at non-verb-like
token" heuristic. The parser walks consecutive verb-like Word tokens
from the start of each clause, transparently consuming flag-with-value
pairs, and stops at the first non-verb-like token. Known FILE verbs
(cat, ls, bash, cd, chmod, grep, find, ...) keep a 1-token verb chain so
per-verb positional-arg classification still fires for bare-name
targets like `cat README` and `ln src dst`.

The new IsVerbLikeToken predicate is a strict allow-list: Word kind,
length 1-64, leading [a-z], body [a-z0-9._-]. This naturally rejects
flags, paths, env-var refs, URLs, globs, and uppercase user-named
identifiers without per-case predicate logic.

Clause.Verb is now documented as a convenience hint, not a security
contract. Consumers needing security-grade matching should
pattern-prefix match against the raw token stream; the deliberate
over-extraction on bare-word args (`git push origin main` ->
[git, push, origin, main]) is the security-correct default for
auto-proposed approval patterns.

Removes BashArity table and ProbeArity() method entirely.

SPEC.md gets a full rewrite of section 6.1 (verb-chain extraction), a
new 6.1.1 (consumer pattern-matching guidance), updated grammar and
worked examples, and a versioning note acknowledging that pre-v0.1.0
alphas may include behavior course-corrections.

Corpus: 7 new entries (132-138) cover the issue's headline cases; 11
existing entries flip to the new shape. 8 unit tests in
BashCommandParserTests updated.

All 394 tests pass; clean build, headers verified, packs as
ShellSyntaxTree.0.1.4-alpha.
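The allow-list and the greedy walk described in the commit message above can be sketched as follows. This is a minimal illustration, not the code from the PR: tokens are plain strings rather than `BashToken` values, the `VerbChainSketch` class and its members are hypothetical names, the `FileVerbs` set is only the subset named in the message (plus `ln`, which the `ln src dst` example implies), and the flags-with-value table holds a single assumed entry (`git -C`). It also assumes that a flag not in that table ends the walk.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified stand-in for the verb-chain logic in the PR.
public static class VerbChainSketch
{
    // Assumed subset of the FILE verbs listed in the commit message.
    private static readonly HashSet<string> FileVerbs =
        new HashSet<string>(StringComparer.Ordinal)
        {
            "cat", "ls", "bash", "cd", "chmod", "grep", "find", "ln",
        };

    // Assumed flags-with-value table, keyed by the first verb.
    private static readonly Dictionary<string, HashSet<string>> FlagsWithValue =
        new Dictionary<string, HashSet<string>>(StringComparer.Ordinal)
        {
            ["git"] = new HashSet<string>(StringComparer.Ordinal) { "-C" },
        };

    // Strict allow-list: length 1-64, leading [a-z], body [a-z0-9._-].
    public static bool IsVerbLike(string token)
    {
        if (token.Length < 1 || token.Length > 64) return false;
        if (token[0] < 'a' || token[0] > 'z') return false;
        for (int i = 1; i < token.Length; i++)
        {
            char c = token[i];
            bool ok = (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9')
                      || c == '.' || c == '_' || c == '-';
            if (!ok) return false;
        }
        return true;
    }

    // Walks verb-like tokens from the start of a clause and stops at the
    // first token that is neither verb-like nor a known flag-with-value.
    public static List<string> ExtractVerbChain(IReadOnlyList<string> tokens)
    {
        var verbs = new List<string>(4);
        if (tokens.Count == 0 || !IsVerbLike(tokens[0])) return verbs;

        string first = tokens[0];
        verbs.Add(first);
        if (FileVerbs.Contains(first)) return verbs; // FILE verbs keep a 1-token chain

        // Look up the per-verb flag set once, outside the loop.
        FlagsWithValue.TryGetValue(first, out var flagsForVerb);
        for (int i = 1; i < tokens.Count; i++)
        {
            string token = tokens[i];
            if (flagsForVerb != null && flagsForVerb.Contains(token))
            {
                i++; // skip the flag's value as well
                continue;
            }
            if (!IsVerbLike(token)) break;
            verbs.Add(token);
        }
        return verbs;
    }
}
```

Under these assumptions, `ExtractVerbChain("git push origin main".Split(' '))` yields `[git, push, origin, main]`, `dotnet ef migrations add InitialCreate` stops before the uppercase `InitialCreate`, and `cat README` yields `[cat]` because of the FILE-verb carveout.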

Drop the `quotedFirstVerb` flag and gate the walk solely on `firstVerb is
not null` — the QuotedString branch no longer needs a sentinel because
it falls through with `firstVerb == null` and the loop short-circuits.
Cache the inner `HashSet<string>` from `FlagsWithValue` once into
`flagsForVerb` instead of re-hashing `firstVerb` on every flag-token
iteration. Inline the `=`-position scan via a single `IndexOf('=')` call
so the `--flag=value` short-circuit doesn't traverse the flag string
twice; this lets us delete the now-unused `StripEqualsValue` and
`HasInlineEqualsValue` helpers.
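As a rough illustration of the single-scan idea in the paragraph above: one `IndexOf('=')` call yields both "does this flag carry an inline value?" and "what is the flag name?", so two separate helpers are not needed. The `FlagScanSketch` class and its method names are hypothetical, and the rule that an inline `--flag=value` never consumes the following token is an assumption about the parser's behavior.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; not the code from the PR.
public static class FlagScanSketch
{
    // Splits "--flag=value" into its name and an inline-value marker
    // with a single pass over the flag text.
    public static (string Name, bool HasInlineValue) SplitFlag(string flag)
    {
        int eq = flag.IndexOf('=');
        return eq < 0 ? (flag, false) : (flag.Substring(0, eq), true);
    }

    // flagsForVerb is the per-verb set, looked up once by the caller;
    // it may be null when the verb has no value-taking flags.
    public static bool ConsumesNextToken(string flag, HashSet<string> flagsForVerb)
    {
        var (name, hasInlineValue) = SplitFlag(flag);
        // Assumption: an inline value means the next token is not consumed.
        return !hasInlineValue && flagsForVerb != null && flagsForVerb.Contains(name);
    }
}
```

With a set containing `--status`, `ConsumesNextToken("--status", set)` is true, while `ConsumesNextToken("--status=open", set)` is false because the value is already inline.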

Trim the 20-line block comment at the top of `ParseClauseSegment` to
just the load-bearing invariants (FileVerb carveout + ordering with
flag-with-value consumption). Drop mid-loop comments that narrate the
control flow. Add a 4-element capacity hint to `verbTokens` to avoid
the first realloc on the typical case.

In `BashVerbs.cs`, revert the `FileVerbs` remarks paragraph that leaked
parser usage details and rewrite the `IsVerbLikeToken` doc to capture
the WHY of the strict allow-list (vs. negation-of-LooksLikePath)
without re-listing all the rejection categories.

All 394 tests still pass; no behavior change.
Summary
Closes #27.
Replaces the static `BashArity` lookup table with a "stop at non-verb-like token" heuristic. The parser walks consecutive verb-like `Word` tokens from the start of each clause, transparently consuming flag-with-value pairs, and stops at the first non-verb-like token. Known FILE verbs (`cat`, `ls`, `bash`, `cd`, `chmod`, `grep`, `find`, …) keep a 1-token verb chain so per-verb positional-arg classification still fires for bare-name targets like `cat README` and `ln src dst`.

The new `BashVerbs.IsVerbLikeToken` predicate is a strict allow-list: `Word` kind, length 1–64, leading `[a-z]`, body `[a-z0-9._-]`. The allow-list (over a negation of `LooksLikePath`) stays conservative for unknown shapes and rejects flags, paths, env-var refs, URLs, globs, and uppercase user-named identifiers without per-case predicate logic.

`Clause.Verb` is now documented as a convenience hint, not a security contract. Consumers needing security-grade matching should pattern-prefix match against the raw token stream. The deliberate over-extraction on bare-word args (`git push origin main` → `[git, push, origin, main]`) is the security-correct default for auto-proposed approval patterns: a subsequent variation re-prompts rather than silently auto-grants.

Behavior changes (examples):
- `git push origin main` → `[git, push, origin, main]` (was `[git, push]`)
- `git worktree list` and arbitrary CLI subcommand chains → fully extracted
- `freshdesk ticket list --status open` → `[freshdesk, ticket, list]` (was `[freshdesk]`)
- `kubectl get pods my-pod` → `[kubectl, get, pods, my-pod]`
- `aws s3 cp src dst` → `[aws, s3, cp, src, dst]`
- `dotnet ef migrations add InitialCreate` → `[dotnet, ef, migrations, add]` (stops at uppercase)
- `cat README` → still `[cat]` (FileVerb carveout preserves IsPath)

Scope:
- `BashVerbs.BashArity` table + `ProbeArity()` method (removed).
- `BashVerbs.IsVerbLikeToken(BashToken)` predicate (new).
- `BashCommandParser.ParseClauseSegment` (greedy walk + FileVerb 1-token carveout + flag-with-value consumption).
- `SPEC.md` updates: §3 `VerbChain`, §4 grammar, §6.1 (full rewrite), new §6.1.1 consumer pattern-matching guidance, §7 flag-with-value note, §12 worked examples, §15 versioning, §16 sequencing.
- `0.1.3-alpha` → `0.1.4-alpha`.

Test plan
- `dotnet build -c Release` clean (0 warnings, 0 errors)
- `dotnet test -c Release`: 394 passed, 0 failed
- `pwsh ./scripts/Add-FileHeaders.ps1 -Verify` passes
- `dotnet pack -c Release -o ./bin/nuget` produces `ShellSyntaxTree.0.1.4-alpha.nupkg`
- `VerbChain`/`Clause` shape locked)
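To make the "re-prompts rather than silently auto-grants" argument from the summary concrete, here is a minimal sketch of how a consumer might prefix-match an approved pattern against a clause's raw tokens. Everything in it is an assumption for illustration: `ApprovalPatternSketch` and `MatchesPrefix` are hypothetical names, tokens are plain strings, and comparison is ordinal and token-by-token.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical consumer-side helper; not part of the library's API.
public static class ApprovalPatternSketch
{
    // True when `tokens` starts with every token of `approvedPrefix`, in order.
    // An empty pattern never matches, so nothing is approved by default.
    public static bool MatchesPrefix(
        IReadOnlyList<string> tokens, IReadOnlyList<string> approvedPrefix)
    {
        if (approvedPrefix.Count == 0 || tokens.Count < approvedPrefix.Count)
            return false;

        for (int i = 0; i < approvedPrefix.Count; i++)
        {
            if (!string.Equals(tokens[i], approvedPrefix[i], StringComparison.Ordinal))
                return false;
        }
        return true;
    }
}
```

With the over-extracted pattern `[git, push, origin, main]`, a later `git push origin release` does not match and would prompt again; with the older `[git, push]` shape it would match and be granted without a prompt. Note that plain prefix matching still accepts extra trailing tokens after the approved prefix; whether to allow that is a consumer policy decision this sketch does not settle.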