EBNF Forge

EBNF Forge is a strict, specification-oriented EBNF parser and validator that enforces a bounded grammar subset and emits a canonical, versioned JSON representation of the grammar and its documentation metadata.

This tool treats EBNF as a governed specification artifact, not as an informal notation or a parser-generator input. Its primary goals are validation, stability, and long-term maintainability—not code generation.

This repository also includes companion tools that consume the JSON IR to generate documentation, diagrams, and interactive exploration.


Why this exists

First, I needed it myself while building another language.

EBNF is widely used, but in practice:

  • There is no single, canonical EBNF.
  • Tooling varies widely in what syntax is accepted, ignored, or extended.
  • Many tools either:
    • parse loosely and discard structure, or
    • enforce syntax but discard documentation and intent.

As a result, grammar specifications often drift over time:

  • undocumented features creep in,
  • syntax diverges from what implementations can realistically support,
  • documentation becomes informal, ambiguous, or detached from the grammar itself.

EBNF Forge exists to prevent that drift.

It provides:

  • a real parser (not line-based or heuristic),
  • explicit validation rules that define an allowed EBNF subset,
  • deterministic diagnostics with source spans,
  • and a structured JSON output that downstream tools can rely on.
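
To make "deterministic diagnostics with source spans" concrete, here is a minimal sketch in Python. The `Span` and `Diagnostic` types, their field names, and the `E001`/`E002` codes are assumptions made for illustration; they are not the tool's actual data model. The point is only that sorting by position gives the same output order on every run.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Span:
    # Hypothetical layout: 1-based line/column, end position exclusive.
    start_line: int
    start_col: int
    end_line: int
    end_col: int


@dataclass(frozen=True, order=True)
class Diagnostic:
    span: Span     # compared first, so sorting orders by source position
    code: str      # hypothetical stable identifier
    message: str


def render(diags, filename):
    """Format diagnostics sorted by span so output is identical across runs."""
    return [
        f"{filename}:{d.span.start_line}:{d.span.start_col}: {d.code}: {d.message}"
        for d in sorted(diags)
    ]


diags = [
    Diagnostic(Span(7, 3, 7, 9), "E002", "undefined rule 'factor'"),
    Diagnostic(Span(2, 1, 2, 5), "E001", "duplicate rule 'expr'"),
]
lines = render(diags, "grammar.ebnf")
```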

Lineage and EBNF dialect

EBNF Forge is inspired by the ISO/IEC 14977 family of EBNF definitions, but it is not a full implementation of any single published dialect, nor does it attempt to be maximally permissive.

Key points about its lineage and differences:

  • Grammar model

    • Uses traditional EBNF constructs: sequence, choice, grouping, optionality, repetition.
    • Rejects regex-style extensions commonly found in ad-hoc grammars unless explicitly allowed.
  • Explicit subset

    • Only a deliberately chosen subset of EBNF is accepted.
    • Unsupported or ambiguous constructs are rejected with diagnostics.
    • This is intentional: the goal is predictability, not compatibility with every tool.
  • No parser-generator semantics

    • This tool does not assume LL, LR, PEG, Pratt, or any other parsing strategy.
    • Validation rules may optionally enforce constraints compatible with a given strategy, but those are policy layers—not core syntax.
  • Documentation as first-class data

    • Documentation is extracted via explicit annotation comments.
    • Nothing is inferred heuristically.
    • Comments are part of the grammar’s machine-checked structure, not just prose.

Because EBNF dialects differ across tools and communities, EBNF Forge makes its rules explicit and versioned, rather than pretending to be “universal.”
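
As a rough illustration of the grammar model described above, the traditional constructs (sequence, choice, grouping, optionality, repetition) map naturally onto a small closed set of node types. The class names below are hypothetical and are not taken from the EBNF Forge source; this is a sketch of the idea, assuming a plain tree of immutable nodes.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Terminal:
    text: str


@dataclass(frozen=True)
class RuleRef:
    name: str


@dataclass(frozen=True)
class Sequence:
    items: Tuple["Expr", ...]


@dataclass(frozen=True)
class Choice:
    alternatives: Tuple["Expr", ...]


@dataclass(frozen=True)
class Group:
    inner: "Expr"


@dataclass(frozen=True)
class Opt:
    inner: "Expr"


@dataclass(frozen=True)
class Repeat:
    inner: "Expr"


# Closed set of node kinds: anything outside it has no representation.
Expr = Union[Terminal, RuleRef, Sequence, Choice, Group, Opt, Repeat]


def referenced_rules(expr):
    """Collect the names of all rules referenced inside an expression."""
    if isinstance(expr, RuleRef):
        return {expr.name}
    if isinstance(expr, Terminal):
        return set()
    if isinstance(expr, Sequence):
        children = expr.items
    elif isinstance(expr, Choice):
        children = expr.alternatives
    else:  # Group, Opt, Repeat all wrap a single inner expression
        children = (expr.inner,)
    names = set()
    for child in children:
        names |= referenced_rules(child)
    return names


# term = factor , { "*" , factor } ;
term = Sequence((
    RuleRef("factor"),
    Repeat(Sequence((Terminal("*"), RuleRef("factor")))),
))
```

Because the node set is closed, a regex-style extension has nowhere to go in the tree, which is one way a strict subset can be enforced by construction.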


What this tool does

  • Parses EBNF files into a proper abstract syntax tree (AST)
  • Validates grammar structure against a defined EBNF subset
  • Enforces rules to prevent unreviewed feature creep
  • Extracts documentation and metadata from explicit annotation comments
  • Emits a canonical, deterministic JSON IR containing:
    • grammar structure
    • documentation blocks
    • sections and tags
    • cross-references
    • diagnostics with source spans

This JSON output is the primary artifact of the tool.
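
One common way to make JSON output canonical and deterministic is to fix the key order and whitespace at serialization time. The sketch below shows that technique with Python's standard `json` module. The IR field names (`schemaVersion`, `rules`, `diagnostics`) are placeholders chosen for the example, and the tool's real canonicalization rules may differ.

```python
import json


def canonical_json(ir):
    """Serialize with sorted keys and fixed separators so equal data gives equal bytes."""
    return json.dumps(ir, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


# Two dicts with the same content but different insertion order.
# Field names here are hypothetical, not the actual IR schema.
ir_a = {
    "schemaVersion": "1",
    "rules": [{"name": "expr", "doc": "An expression."}],
    "diagnostics": [],
}
ir_b = {
    "diagnostics": [],
    "rules": [{"doc": "An expression.", "name": "expr"}],
    "schemaVersion": "1",
}

same = canonical_json(ir_a) == canonical_json(ir_b)
```

Byte-identical output for identical input is what makes the IR practical to diff in code review and to cache in CI.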


What this tool intentionally does NOT do

  • ❌ Generate parsers or code
  • ❌ Attempt to support every EBNF extension used by other tools
  • ❌ Infer documentation structure heuristically
  • ❌ Hide syntax errors or make a “best guess” at malformed input

Rendering (Markdown, HTML, railroad diagrams, etc.) is handled by separate tools that consume the JSON IR produced by ebnff. This repo ships those tools, rather than baking rendering into the validator itself.

Included tools in this repository

  • Validator: ebnff — validates EBNF and emits versioned JSON IR (--out).
  • Doc generator: ebnffdocgen — generates Markdown/MDX docs from JSON IR.
  • Diagram generator: ebnff-diagrams — generates railroad diagrams (SVG files or an inline SVG map).
  • Website explorer: the Docusaurus site includes an /explorer page backed by IR explorer data and inline diagrams.

A note on modern parser development (and AI)

It is now easy to build parsers using modern AI systems.

Given a well-specified EBNF grammar, an LLM can:

  • generate a recursive-descent parser,
  • generate a Pratt or precedence-based parser,
  • generate a PEG-style parser,
  • or generate multiple implementations in different languages.

This tool is not trying to replace that.

Instead, EBNF Forge focuses on the part that AI-assisted parser generation does not solve well:

  • locking down grammar shape,
  • preventing silent feature creep,
  • enforcing consistency and intent over time,
  • and providing a stable, machine-checkable specification artifact.

Think of it as the contract that AI-generated parsers should obey.


Example: using EBNF with an AI to generate a parser

Given a validated EBNF grammar, a prompt like the following is often sufficient:

You are implementing a parser based on the following EBNF grammar.

Constraints:
- Follow the grammar exactly; do not introduce new syntax.
- Report syntax errors with line/column information.
- Do not accept constructs not present in the grammar.
- Preserve all identifiers and structure.

Grammar:
<insert EBNF here>

Target language: Go
Parsing strategy: recursive descent
Output:
- Lexer (if required)
- Parser
- AST node definitions
- Error handling

Do not add features or extensions beyond the grammar.

This workflow works best when the grammar itself is validated and constrained—which is exactly what this repository provides.

Downstream tools may also consume the JSON IR produced by EBNF Forge instead of re-parsing the raw grammar.
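
As a sketch of what consuming the IR instead of the raw grammar might look like, the following Python reads a JSON document and reports references to rules that are never defined. The IR shape used here (a `rules` list whose entries carry `name` and `references`) is an assumption for illustration only; consult the versioned schema for the real field names.

```python
import json


def undefined_references(ir):
    """Return sorted (rule, missing_reference) pairs for a hypothetical IR shape."""
    defined = {rule["name"] for rule in ir["rules"]}
    missing = []
    for rule in ir["rules"]:
        for ref in rule.get("references", []):
            if ref not in defined:
                missing.append((rule["name"], ref))
    return sorted(missing)


# Inline stand-in for an IR file; the structure is assumed, not the real schema.
ir = json.loads("""
{
  "rules": [
    {"name": "expr", "references": ["term"]},
    {"name": "term", "references": ["factor"]}
  ]
}
""")

problems = undefined_references(ir)
```

The consumer never touches EBNF syntax: it relies only on the structured output, so it keeps working as long as the IR schema version it targets is supported.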


Typical use cases

  • Validating a language or protocol grammar in CI
  • Preventing undocumented or unsupported grammar extensions
  • Enabling safe AI-assisted editing of grammar specifications
  • Generating downstream documentation, diagrams, or search indexes
  • Acting as a stable contract between specification authors and tooling
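
For the CI use case, a small gate script can turn diagnostics in the IR into a pass/fail exit code. This is a minimal sketch: the `diagnostics` list and its `severity`, `code`, and `message` fields are assumed names rather than the documented schema, and `main` expects the path of an IR file that the validator has already written.

```python
import json
import sys


def gate(ir, fail_on=("error",)):
    """Return 1 if any diagnostic has a blocking severity, else 0."""
    blocking = [
        d for d in ir.get("diagnostics", [])
        if d.get("severity") in fail_on  # "severity" is a hypothetical field
    ]
    for d in blocking:
        print(f'{d.get("code", "?")}: {d.get("message", "")}', file=sys.stderr)
    return 1 if blocking else 0


def main(ir_path):
    """Load an IR file from disk and return the exit code for a CI step."""
    with open(ir_path, encoding="utf-8") as fh:
        return gate(json.load(fh))
```

A CI step would then call `sys.exit(main(path_to_ir))` after the validator has run, so the build fails whenever a blocking diagnostic is present.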

Documentation


Repository status

This repository defines the design and contract of EBNF Forge.

  • Core focus: parser, validator, diagnostics, and JSON IR (as a stable contract)
  • Rendering and visualization are implemented as separate tools (docgen, diagrams, website explorer) that consume the IR
  • The specification is documented in detail in DESIGN.md

Expect the design to evolve deliberately and with versioned schemas.


Design philosophy

  • Explicit over implicit
  • Validation over permissiveness
  • Structure over formatting
  • Data over presentation
  • Versioned contracts over tribal knowledge

If you are looking for a fast or forgiving EBNF parser, this may not be the right tool. If you are looking to govern a grammar specification over time, it probably is.


Website

This repository includes a Docusaurus-based documentation website.

Local Development

cd website
npm install
npm run start

This starts a local development server at http://localhost:3000/.

Building

cd website
npm run build

The static files will be generated in the website/build directory.


License

Licensed under the Apache License, Version 2.0. See the LICENSE file for details.
