Releases: Aaronontheweb/ShellSyntaxTree
ShellSyntaxTree 0.1.5-beta
Newline-as-statement-separator. Public API surface unchanged; the
content of ParsedCommand.Clauses changes for any input that spans
multiple lines. Shipped as a beta prerelease so Netclaw can validate
the AST-shape change against its live gate evaluator before this is
promoted to a stable 0.1.5.
BEHAVIOR CHANGE: a bare newline now separates clauses (SPEC §4)
- A bare newline outside quotes, heredoc bodies, line continuations, and
$()/ backtick substitutions is now a statement separator equivalent
to;— the clause after it carriesCompoundOperator.Sequence.
Before this releaseBashCommandParseronly split clauses on&&/
||/;/|, socmd1\ncmd2parsed to a single clause[cmd1]
withcmd2wrongly absorbed as an argument. - The lexer flags the newline-bearing
Whitespacetoken — and the
newline after a heredoc terminator — with a new internal
IsStatementSeparatorbit;FilterSignificantretains those tokens
andSplitIntoSegmentssplits clauses on them. - Consecutive newlines, leading and trailing newlines, and a newline
immediately after a compound operator (cmd1 &&\ncmd2) all collapse —
they never produce an empty clause. - A heredoc followed by a command on the next line now parses to two
clauses (previously the heredoc clause and the following command
merged into one). - A control-flow keyword opening a newline-separated clause
(echo hi\nfor i in 1 2 3) safe-fails toIsUnparseable=true, exactly
as it would after;.
Examples that change:
cmd1\ncmd2→ two clauses[cmd1],[cmd2](was one clause[cmd1]
withcmd2as an arg).git pull # done\ndotnet build→ two clauses (was one).
Behavior notes
- Public API surface is unchanged (no
PublicApiSnapshotTestsdelta). - SPEC.md updates: §4 grammar (
compound_opincludesNEWLINE, new
notes bullet), §5WHITESPACEtokenization, §15 versioning, §16
sequencing note. - Corpus: 11 new entries (139–149) covering newline separation, blank
lines, leading/trailing newlines, newline after an operator, newline
inside quotes and subshells, line continuation, comment-then-newline,
and the control-flow-after-newline safe-fail. Entry 126's note is
corrected — newline-as-separator is no longer a pending gap. - Unit tests: 8 new
BashLexerTestscases + 14 new
BashCommandParserTestscases; the stale comment in
Comment_between_two_statements_preserves_both_clausesis corrected.
ShellSyntaxTree 0.1.4
Stable promotion of 0.1.4-alpha. No code changes from the alpha; this release
drops the pre-release suffix to signal that the v0.1 public API surface is
considered production-ready for Bash parsing use cases (see SPEC.md §17
acceptance criteria). Consumers on any 0.1.x-alpha can upgrade directly.
See the 0.1.4-alpha notes below for the full list of changes in this version.
ShellSyntaxTree 0.1.4-alpha
Greedy verb-chain extraction. Public API surface (VerbChain, Clause)
unchanged; the content of Clause.Verb.Tokens changes for many inputs.
BEHAVIOR CHANGE: verb-chain length is no longer table-driven (#27)
- The
BashAritystatic lookup table andProbeArity()method have been
removed. The parser walks consecutive verb-like Word tokens from
the start of each clause, transparently consuming flag-with-value
pairs (e.g.git -C /repo), and stops at the first non-verb-like
token, the first plain flag, or the first non-Word token. - A token is "verb-like" when its kind is
Word, length 1–64, first
character is an ASCII lowercase letter, and remaining characters are
in[a-z0-9._-]. The strict allow-list naturally excludes flags,
paths (/,\,~), env-var refs ($VAR), URLs (://), globs,
numeric tokens, and uppercase user-named identifiers like migration
names — without requiring per-case predicate logic. See SPEC §6.1. - For known FILE verbs (
cat,ls,bash,cd,chmod,grep,
find, …) the verb chain stops at exactly one token to preserve
per-verb positional-arg classification. The flag-with-value
consumption still runs sotar -C /pathandcurl -o filestyle
values still pick upIsPath=trueviaFlagValueIsPath.
Examples that change:
git push origin main→ verb[git, push, origin, main](was
[git, push]).git worktree list(and arbitrary CLI subcommand chains) → fully
extracted as[git, worktree, list](was[git, worktree]).freshdesk ticket list --status open→[freshdesk, ticket, list]
(was[freshdesk]because freshdesk wasn't in the BashArity table).kubectl get pods my-pod→[kubectl, get, pods, my-pod](was
[kubectl, get]).aws s3 cp src dst→[aws, s3, cp, src, dst](was[aws, s3]).dotnet ef migrations add InitialCreate→[dotnet, ef, migrations, add](was[dotnet, ef]).InitialCreatestays in args because the
predicate rejects uppercase first character.cat README→ still[cat](FileVerb carveout preservesIsPathon
bare-name targets).echo hello→[echo, hello](echo is not a FILE verb).
Clause.Verb is now documented as a convenience hint, not a security
contract (SPEC §6.1.1). Consumers needing security-grade verb
identification should pattern-prefix match against the raw token
stream: a command matches an approval pattern P iff the first
len(P.verb_prefix) command tokens equal P.verb_prefix. This punts
depth choice to the consumer and accommodates the parser's deliberate
over-extraction on bare-word args. Auto-proposed patterns should default
to the full extracted verb chain (greedy match): a subsequent variation
re-prompts rather than silently auto-grants.
Behavior notes
- Public API surface is unchanged (no
PublicApiSnapshotTestsdelta). - SPEC.md updates: §3
VerbChain, §4 grammar, §6.1 verb-chain
extraction (rewritten end-to-end), new §6.1.1 consumer
pattern-matching guidance, §7 flag-with-value note, §12 worked
examples, §15 versioning, §16 implementation sequencing. - Corpus: 7 new entries (132–138) pin the issue #27 headline cases;
10 existing entries flipped to the new shape (04_echo_hello,
11_git_push_origin_main,13_git_checkout_dev,17_docker_run_nginx,
27_make_install,45_echo_append_log,84_subshell_nested,
91_bash_c_simple,96_bash_c_nested_depth_2,
100_bash_c_nested_depth_3,130_netclaw_repro_leading_comment_pipeline). - Unit tests: 8 pinned
BashCommandParserTestscases updated to the new
expected verb chains.
ShellSyntaxTree 0.1.3-alpha
Bash line comment handling. Public API unchanged.
Fixed
-
Bash line comments are now recognized and skipped (#25).
BashLexer
treats#at a word boundary (start of input, or preceded by
whitespace, a newline, or any operator) as the start of a comment
that runs to the next newline. The comment text is emitted as a new
internalBashTokenKind.Commenttoken for source fidelity and is
filtered by the parser alongsideWhitespace/Continuation, so
it contributes no verb, args, redirects, or flags to any clause.
Comment-only input parses toClauses = [],IsUnparseable = false,
matching the existing empty-/whitespace-only path. Quoting and
escape rules are honored:#inside single or double quotes is
literal,#in the interior of an unquoted word (e.g.abc#def)
is literal, and\#outside quotes is literal.Before this fix,
# Extract worktree branches\ngit worktree list
parsed to a single clause with verb chain[#, Extract]— the
comment text leaked into downstream approval prompts and broke
approval-state caching in consumers that did asymmetric verb-chain
extraction (persistence-time vs. retry-authorization saw different
verb sets, causing tool calls to fail after the user had already
clicked Approve).
Behavior notes
- Public API surface is unchanged (no
PublicApiSnapshotTestsdelta). - SPEC.md §4 / §5: new "Comment handling" subsection in §5 documents
the boundary rules; §4 BNF notes that comments are
whitespace-equivalent at the lexer level. - Corpus: 9 new entries (123–131) pin every case from the issue
report, plus the two Netclaw repros (sanitized paths per §14). - v0.1 still does not treat top-level newlines as statement separators
(SPEC §4 gap, tracked separately in IMPLEMENTATION_PLAN NEXT) — a
comment between two commands on separate lines requires an explicit
;separator to split into two clauses.
ShellSyntaxTree 0.1.2-alpha
Three parser correctness fixes. Public API unchanged.
Fixed
- Single-quoted strings are now literal per SPEC §5 (B2). Previously
echo '$HOME'producedKind=Tildebecause the resolver substituted
$HOMEuniformly regardless of quote style. Now the lexer marks
single-quotedQuotedStringtokens with the internalIsSingleQuoted
flag, and the resolver bypasses tilde /$HOME/$VAR/ glob /
filesystem::handling for them.echo '$HOME'staysKind=Literal,
Resolved=null;cat '/etc/passwd'still resolves a path. Matches
bash semantics. LooksLikePathno longer false-positives on a lone trailing
backslash (B3). A double-quoted token like"foo\\"lexes to
Valuefoo\; the trailing\is an escape-collapse artifact, not a
meaningful path signal. The heuristic now requires a backslash at a
non-trailing position. Forward-slash behavior is unchanged —dir/
still classifies as a path (trailing/is a meaningful bash
directory hint).- Control-flow keyword detection precedes paren-balance (B4).
Previouslycase x in a) ;; esacproducedIsUnparseable=truewith
reasonunbalanced parens at position Nbecause the)ina)
trippedSplitIntoSegmentsbefore the per-clause keyword check
could fire. The anomaly pass now scans the token stream for
control-flow keywords at verb position (start of input or after
&&/||/;/|/() and short-circuits with the helpful
control-flow keyword 'case' is not supported in v0.1reason
before downstream checks run. SPEC §11 now pins the full diagnostic
precedence order.
Behavior notes
- Public API surface is unchanged (no
PublicApiSnapshotTestsdelta). - SPEC.md §8: new "Step 0: Single-quoted bypass" preamble; LooksLikePath
heuristic updated to call out the trailing-backslash carve-out. - SPEC.md §11: new "Diagnostic precedence" section enumerating the
order in which unparseable conditions are checked. - Corpus entries 104 (
echo 'literal $HOME') and 109 (echo "trailing backslash\\") updated to the corrected outputs. Four new entries
(119–122) pin the regression guards: single-quoted absolute paths
still resolve,cd dir/still classifies as a path, single-quoted
$VARstays literal underrm, andcase x in a) ;; esacnow
reports the control-flow keyword reason instead of a paren-balance
error.
ShellSyntaxTree 0.1.1-alpha
Bug fix release for v0.1.0-alpha consumers.
Fixed
2>&1fd-dup redirects no longer produce phantom<cwd>/&1file
targets. The parser now recognizes POSIX fd-dup / fd-close shorthand
(&N,&N-,&-) on redirect targets and carries the raw token
verbatim onRedirect.TargetwithRedirect.IsDynamicSkip = true.
Existing consumers that already skip redirects with
IsDynamicSkip = trueget correct behavior with no code changes.
(B1)
Behavior notes
- Public API surface is unchanged.
Redirect.Targetxmldoc and SPEC.md
§3 / §4 are clarified to document the fd-dup rule. - The Blazor sample's basename-startswith-
&workaround has been
removed; the sample now relies solely onRedirect.IsDynamicSkip.
ShellSyntaxTree 0.1.0-alpha
First publishable cut of ShellSyntaxTree — a focused .NET library that
parses bash command strings into a structured AST for security-gate
evaluators. Hand-rolled, AOT-trim friendly, no native dependencies.
What's in this release
IShellParserinterface +BashParserimplementation per locked
v0.1 contract (SPEC.md §2 / §3)- Bash lexer: words, quoted strings, operators, opaque substitutions
($()/ backticks → DynamicSkip), arithmetic ($((...))) and complex
parameter expansion (${var//.../...}) → IsUnparseable - Verb tables (BashArity, CwdVerbs, FileVerbs, FlagsWithValue) + per-verb
path-arg rules + flag-with-value-aware verb-chain probe - Path resolver: tilde /
$HOMEexpansion,filesystem::prefix strip,
glob detection (covering-dir heuristic preserved), cross-platform
forward-slash normalization - cd-in-compound attribution: synthetic
Arg.IsCwdAttributionpropagated
to subsequent clauses;cd $VARproduces a DynamicSkip attribution
signal - Subshell isolation via attribution stack with monotonic IDs (handles
sibling subshells(a) && (b)cleanly) bash -c/sh -crecursion (cap at depth 5 → outer
ParsedCommand.IsUnparseable=true)- 115-entry corpus across all 11 SPEC §13 categories, validated by
CorpusRunnerTestswith a polishedAstAssert.Equalhelper - PII audit
[Fact]enforcing SPEC §14 sanitization patterns
Public API surface (locked per SPEC §2 / §3)
IShellParser, BashParser, BashParserOptions, ParsedCommand,
Clause, VerbChain, Arg, Redirect (records); ArgKind,
RedirectDirection, CompoundOperator (enums). Multi-target
netstandard2.0;net8.0; AOT-friendly (<IsAotCompatible>true</IsAotCompatible>).
Verification
353 tests passing on Linux + Windows. dotnet pack produces
ShellSyntaxTree.0.1.0-alpha.nupkg with embedded README, icon, and
SourceLink metadata.
Known limitations (tracked for v0.1.x)
pushd/popdparse as CwdVerbs but don't propagate cwd
attribution (onlycd/chdirdo in v0.1)tarfalls through to the default per-verb rule (no action-flag
awareness)docker -v "/host:/container"is a single literal arg with
IsPath=false(no colon-split in v0.1)- Single-quoted
'$HOME'is substituted by the resolver (bash
semantics: doesn't substitute in single quotes)
Documentation
SPEC.md— locked v0.1 contract (the source of truth for parser
behavior)openspec/changes/— change-proposal history with rationale for the
eight v0.1 SPEC interpretations resolved during planningREADME.md— quick-start usage