Procedural Lark grammar generation for CLI commands and JSON tool calls.
Instead of hand-writing .lark grammars from memory, grammargen derives them
from structured sources:
- --help output: parse GNU-style help text into an IR, then emit a Lark grammar
- JSON Schema: convert MCP/function-calling tool schemas into JSON-constraining Lark grammars

cd ~/grammargen
pip install -e ".[dev]"

# Run the command and parse its --help output
grammargen from-help grep
grammargen from-help ls -o ls.lark
# Or from a saved help text file
grep --help > grep_help.txt
grammargen from-help-text grep_help.txt --name grep

grammargen from-schema tool_schema.json -o tool.lark

# Check grammar syntax
grammargen validate grammar.lark
# Check against example strings
grammargen validate grammar.lark \
--positive "grep -i foo bar.txt" \
--negative "notacommand"

grammargen roundtrip grep --positive "grep -i foo bar.txt"

--help text ──→ help_parser ──→ CommandSpec IR ──→ lark_emitter ──→ .lark (CLI grammar)
JSON Schema ──→ schema_parser ─────────────────────────────────────→ .lark (JSON grammar)
.lark ──→ validate ──→ pass/fail
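
To make the first stage of the diagram concrete, here is a minimal sketch of how option lines in GNU-style help text might be recognized. It is not grammargen's actual help_parser: the `OPTION_LINE` pattern, the `parse_help` function, and the dictionary keys are all hypothetical, and the sample text is an abbreviated excerpt in the style of GNU grep's help. The pattern deliberately ignores harder cases such as `--color[=WHEN]` or short flags that take a value.

```python
import re

# Hypothetical pattern for lines like "  -e, --regexp=PATTERNS     use PATTERNS ...".
OPTION_LINE = re.compile(
    r"^\s+(?:(?P<short>-[A-Za-z0-9]),?\s*)?"  # optional short flag, e.g. -i
    r"(?P<long>--[A-Za-z0-9][\w-]*)?"         # optional long flag, e.g. --ignore-case
    r"(?:=(?P<arg>[A-Z_]+))?"                 # optional =ARG metavariable
    r"\s{2,}(?P<desc>\S.*)$"                  # two or more spaces, then the description
)


def parse_help(text):
    """Return one dict per recognized option line (illustrative only)."""
    flags = []
    for line in text.splitlines():
        m = OPTION_LINE.match(line)
        # Skip lines that do not match or that carry no flag name at all.
        if not m or not (m["short"] or m["long"]):
            continue
        flags.append({
            "short": m["short"],
            "long": m["long"],
            "arg": m["arg"],
            "description": m["desc"],
        })
    return flags


# Abbreviated sample in the style of GNU grep's help output.
SAMPLE = """\
Usage: grep [OPTION]... PATTERNS [FILE]...
Pattern selection and interpretation:
  -E, --extended-regexp     PATTERNS are extended regular expressions
  -e, --regexp=PATTERNS     use PATTERNS for matching
  -i, --ignore-case         ignore case distinctions in patterns and data
      --no-ignore-case      do not ignore case distinctions (default)
"""

flags = parse_help(SAMPLE)
for flag in flags:
    print(flag)
```

The usage line and section header are skipped because they do not start with whitespace; only the four indented option lines produce entries.
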
The CommandSpec IR captures CLI structure (flags, positional args, cardinality) in a language-agnostic dataclass. The Lark backend emits grammars matching the smolbash conventions.
The JSON Schema path bypasses the IR entirely since JSON objects have a fundamentally different structure from CLI invocations.
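
A direct schema-to-grammar translation could look roughly like the sketch below. It is a simplification under stated assumptions, not grammargen's schema_parser: the `schema_to_lark` function and `TYPE_TERMINALS` table are hypothetical, only flat objects with scalar property types are handled, every property is treated as required, and keys must appear in schema order. The `%import common....` lines rely on terminals from Lark's bundled common grammar.

```python
import json

# Hypothetical mapping from JSON Schema scalar types to Lark terminals.
TYPE_TERMINALS = {
    "string": "ESCAPED_STRING",
    "number": "SIGNED_NUMBER",
    "integer": "INTEGER",
    "boolean": "BOOLEAN",
}

# Terminals defined inline; the others are imported from Lark's common grammar.
LOCAL_DEFINITIONS = {
    "BOOLEAN": 'BOOLEAN: "true" | "false"',
    "INTEGER": "INTEGER: /-?\\d+/",
}


def schema_to_lark(schema):
    """Emit a Lark grammar for a flat JSON object schema (illustrative only)."""
    pairs, used = [], set()
    for key, sub in schema.get("properties", {}).items():
        terminal = TYPE_TERMINALS[sub["type"]]   # KeyError on unsupported types
        used.add(terminal)
        # Double json.dumps: first quote the key as JSON, then as a Lark string.
        pairs.append(f'{json.dumps(json.dumps(key))} ":" {terminal}')

    lines = ['start: "{" ' + ' "," '.join(pairs) + ' "}"']
    for terminal in sorted(used):
        if terminal in LOCAL_DEFINITIONS:
            lines.append(LOCAL_DEFINITIONS[terminal])
        else:
            lines.append(f"%import common.{terminal}")
    lines += ["%import common.WS", "%ignore WS"]
    return "\n".join(lines) + "\n"


# Made-up tool schema for illustration.
schema = {
    "type": "object",
    "properties": {
        "path": {"type": "string"},
        "recursive": {"type": "boolean"},
    },
    "required": ["path"],
}
grammar = schema_to_lark(schema)
print(grammar)
```

Because the grammar is built straight from property names and types, nothing here resembles flags or positional arguments, which is consistent with skipping the CLI-oriented IR.
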
pytest