EBNF Forge is a strict, specification-oriented EBNF parser and validator that enforces a bounded grammar subset and emits a canonical, versioned JSON representation of the grammar and its documentation metadata.
This tool treats EBNF as a governed specification artifact, not as an informal notation or a parser-generator input. Its primary goals are validation, stability, and long-term maintainability—not code generation.
This repository also includes companion tools that consume the JSON IR to generate documentation, diagrams, and interactive exploration.
I originally needed it while building another language.
EBNF is widely used, but in practice:
- There is no single, canonical EBNF.
- Tooling varies widely in what syntax is accepted, ignored, or extended.
- Many tools either:
  - parse loosely and discard structure, or
  - enforce syntax but discard documentation and intent.
As a result, grammar specifications often drift over time:
- undocumented features creep in,
- syntax diverges from what implementations can realistically support,
- documentation becomes informal, ambiguous, or detached from the grammar itself.
EBNF Forge exists to prevent that drift.
It provides:
- a real parser (not line-based or heuristic),
- explicit validation rules that define an allowed EBNF subset,
- deterministic diagnostics with source spans,
- and a structured JSON output that downstream tools can rely on.
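As a rough illustration of what "deterministic diagnostics with source spans" can mean in practice, here is a minimal Python sketch. The `Span` and `Diagnostic` types, their field names, and the `E001`-style codes are assumptions made for this example, not the tool's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Span:
    """Source range; line and column are 1-based (an assumption of this sketch)."""
    line: int
    column: int
    end_line: int
    end_column: int


@dataclass(frozen=True, order=True)
class Diagnostic:
    span: Span
    code: str      # hypothetical rule identifier
    message: str


def sorted_diagnostics(diags):
    """Deduplicate, then order by position, code, and message.

    The result does not depend on the order in which checks happened to run.
    """
    return sorted(set(diags))


def render(diag, filename):
    """Format a diagnostic as file:line:column: code: message."""
    s = diag.span
    return f"{filename}:{s.line}:{s.column}: {diag.code}: {diag.message}"


d1 = Diagnostic(Span(3, 1, 3, 5), "E002", "undefined rule 'expr'")
d2 = Diagnostic(Span(1, 7, 1, 9), "E001", "unsupported construct")
for d in sorted_diagnostics([d1, d2, d1]):
    print(render(d, "grammar.ebnf"))
```

Sorting on the span first means two runs over the same input report problems in the same order, which keeps golden-file tests and CI logs stable.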
EBNF Forge is inspired by the ISO/IEC 14977 standard and related EBNF dialects, but it is not a full implementation of any single published dialect, nor does it attempt to be maximally permissive.
Key points about its lineage and differences:
- Grammar model
  - Uses traditional EBNF constructs: sequence, choice, grouping, optionality, repetition.
  - Rejects regex-style extensions commonly found in ad-hoc grammars unless explicitly allowed.
- Explicit subset
  - Only a deliberately chosen subset of EBNF is accepted.
  - Unsupported or ambiguous constructs are rejected with diagnostics.
  - This is intentional: the goal is predictability, not compatibility with every tool.
- No parser-generator semantics
  - This tool does not assume LL, LR, PEG, Pratt, or any other parsing strategy.
  - Validation rules may optionally enforce constraints compatible with a given strategy, but those are policy layers, not core syntax.
- Documentation as first-class data
  - Documentation is extracted via explicit annotation comments.
  - Nothing is inferred heuristically.
  - Comments are part of the grammar’s machine-checked structure, not just prose.
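To make the idea of a strategy-specific policy layer concrete, the sketch below checks for direct left recursion, a constraint that a recursive-descent target would typically want. The grammar representation (a dict of rule names to alternatives) and the function name are hypothetical, and the check deliberately ignores indirect recursion and nullable prefixes.

```python
def directly_left_recursive(grammar):
    """Return the sorted names of rules with an alternative that starts with the rule itself.

    `grammar` maps a rule name to a list of alternatives; each alternative
    is a list of symbol names. Only direct left recursion is detected.
    """
    return sorted(
        name
        for name, alternatives in grammar.items()
        if any(alt and alt[0] == name for alt in alternatives)
    )


grammar = {
    "expr": [["expr", "'+'", "term"], ["term"]],
    "term": [["NUMBER"]],
}
print(directly_left_recursive(grammar))  # ['expr']
```

A check like this sits on top of syntax validation: the grammar above is well-formed EBNF either way, and only a policy aimed at a particular parsing strategy would flag it.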
Because EBNF dialects differ across tools and communities, EBNF Forge makes its rules explicit and versioned, rather than pretending to be “universal.”
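One way to picture an explicit subset is an allow-list of node kinds applied to the parsed tree. The following Python sketch assumes a hypothetical `Node` type and a hypothetical set of kind names; the real subset is whatever the project's own reference documents define.

```python
from dataclasses import dataclass, field

# Hypothetical allow-list: the traditional constructs plus leaf nodes.
ALLOWED_KINDS = frozenset({
    "sequence", "choice", "group", "optional", "repetition",
    "terminal", "nonterminal",
})


@dataclass
class Node:
    kind: str
    line: int
    column: int
    children: list = field(default_factory=list)


def check_subset(node, allowed=ALLOWED_KINDS):
    """Walk the tree depth-first and report every node whose kind is not allowed."""
    problems = []
    if node.kind not in allowed:
        problems.append((node.line, node.column, f"unsupported construct: {node.kind}"))
    for child in node.children:
        problems.extend(check_subset(child, allowed))
    return problems


tree = Node("sequence", 1, 8, [
    Node("nonterminal", 1, 8),
    Node("regex_class", 1, 14),  # stands in for a regex-style extension
])
print(check_subset(tree))
```

Because the allow-list is plain data, changing what the subset accepts becomes a reviewable diff rather than a side effect of parser changes.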
- Parses EBNF files into a proper abstract syntax tree (AST)
- Validates grammar structure against a defined EBNF subset
- Enforces rules to prevent unreviewed feature creep
- Extracts documentation and metadata from explicit annotation comments
- Emits a canonical, deterministic JSON IR containing:
  - grammar structure
  - documentation blocks
  - sections and tags
  - cross-references
  - diagnostics with source spans
This JSON output is the primary artifact of the tool.
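A canonical, deterministic encoding generally means that equal data always serializes to identical bytes. The Python sketch below shows one common way to get that property; the `schema_version` and `rules` field names are placeholders for illustration and are not taken from the actual schema.

```python
import json


def canonical_json(ir):
    """Serialize with sorted keys and fixed separators so equal data yields equal text."""
    return json.dumps(ir, sort_keys=True, separators=(",", ":"), ensure_ascii=False) + "\n"


# Two dicts with the same content but different key insertion order.
a = {"schema_version": "1", "rules": [{"name": "expr"}]}
b = {"rules": [{"name": "expr"}], "schema_version": "1"}

print(canonical_json(a) == canonical_json(b))  # True
```

With output like this, a change in the emitted file reflects a change in the grammar or its metadata, not an incidental reordering.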
- ❌ Generate parsers or code
- ❌ Attempt to support every EBNF extension used by other tools
- ❌ Infer documentation structure heuristically
- ❌ Hide syntax errors or “best-guess” malformed input
Rendering (Markdown, HTML, railroad diagrams, etc.) is handled by separate tools that consume the JSON IR produced by ebnff.
This repo ships those tools, rather than baking rendering into the validator itself.
- Validator: ebnff validates EBNF and emits versioned JSON IR (--out).
- Doc generator: ebnffdocgen generates Markdown/MDX docs from JSON IR.
- Diagram generator: ebnff-diagrams generates railroad diagrams (SVG files or an inline SVG map).
- Website explorer: the Docusaurus site includes an /explorer page backed by IR explorer data and inline diagrams.
It is now trivially easy to build parsers using modern AI systems.
Given a well-specified EBNF grammar, an LLM can:
- generate a recursive-descent parser,
- generate a Pratt or precedence-based parser,
- generate a PEG-style parser,
- or generate multiple implementations in different languages.
This tool is not trying to replace that.
Instead, EBNF Forge focuses on the part that AI-assisted parser generation does not solve well:
- locking down grammar shape,
- preventing silent feature creep,
- enforcing consistency and intent over time,
- and providing a stable, machine-checkable specification artifact.
Think of it as the contract that AI-generated parsers should obey.
Given a validated EBNF grammar, a prompt like the following is often sufficient:
You are implementing a parser based on the following EBNF grammar.
Constraints:
- Follow the grammar exactly; do not introduce new syntax.
- Report syntax errors with line/column information.
- Do not accept constructs not present in the grammar.
- Preserve all identifiers and structure.
Grammar:
<insert EBNF here>
Target language: Go
Parsing strategy: recursive descent
Output:
- Lexer (if required)
- Parser
- AST node definitions
- Error handling
Do not add features or extensions beyond the grammar.
This workflow works best when the grammar itself is validated and constrained—which is exactly what this repository provides.
Downstream tools may also consume the JSON IR produced by EBNF Forge instead of re-parsing the raw grammar.
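A downstream consumer might look roughly like the following Python sketch, which checks the schema version before trusting the data and counts error diagnostics so a CI job can fail on them. The field names (`schema_version`, `diagnostics`, `severity`) and the version policy are assumptions for illustration; the schema contract in the repository is the authority.

```python
import json

SUPPORTED_MAJOR = 1  # hypothetical: the IR major version this consumer understands


def load_ir(text):
    """Parse IR text and refuse major versions this consumer was not written for."""
    ir = json.loads(text)
    major = int(str(ir["schema_version"]).split(".")[0])
    if major != SUPPORTED_MAJOR:
        raise ValueError(f"unsupported IR schema version: {ir['schema_version']}")
    return ir


def error_count(ir):
    """Count diagnostics whose severity is 'error'."""
    return sum(1 for d in ir.get("diagnostics", []) if d.get("severity") == "error")


sample = (
    '{"schema_version": "1.2", "rules": [], '
    '"diagnostics": [{"severity": "error"}, {"severity": "warning"}]}'
)
ir = load_ir(sample)
print(error_count(ir))  # 1
```

Checking the version up front is what makes a versioned contract useful: a consumer fails loudly on an IR it does not understand instead of silently misreading it.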
- Validating a language or protocol grammar in CI
- Preventing undocumented or unsupported grammar extensions
- Enabling safe AI-assisted editing of grammar specifications
- Generating downstream documentation, diagrams, or search indexes
- Acting as a stable contract between specification authors and tooling
- DESIGN.md - Authoritative design specification
- SUBSET.md - Reference table of supported EBNF constructs
- SCHEMA.md - JSON IR schema contract
- EXPLORER_USAGE.md - Explorer data contract and integration patterns
- examples/grammar.ebnf - Canonical grammar definition
- examples/tutorial/ - Step-by-step EBNF tutorial examples
- testdata/ - Golden test fixtures and diagnostics
This repository defines the design and contract of EBNF Forge.
- Core focus: parser, validator, diagnostics, and JSON IR (as a stable contract)
- Rendering and visualization are implemented as separate tools (docgen, diagrams, website explorer) that consume the IR
- The specification is documented in detail in DESIGN.md
Expect the design to evolve deliberately and with versioned schemas.
- Explicit over implicit
- Validation over permissiveness
- Structure over formatting
- Data over presentation
- Versioned contracts over tribal knowledge
If you are looking for a fast or forgiving EBNF parser, this may not be the right tool. If you are looking to govern a grammar specification over time, it probably is.
This repository includes a Docusaurus-based documentation website.
cd website
npm install
npm run start
This starts a local development server at http://localhost:3000/.
cd website
npm run build
The static files will be generated in the website/build directory.
Licensed under the Apache License, Version 2.0.
See the LICENSE file for details.