Human Parser Generator

A straightforward recursive descent Parser Generator with a focus on "human" code generation and ease of use.
Christophe VG (contact@christophe.vg)
https://github.com/christophevg/human-parser-generator

Rationale

Although many parser generators exist, I feel like there is room for one more, which generates a parser in a more "human" way.

The objectives are:

  • start from a standard EBNF grammar, i.e. allow copy-pasting existing grammars and be (almost) done with it.
  • generate code, as if it were written by a human developer:
    • generate functional classes to construct the AST
    • generate parser logic that is readable and understandable
  • be self hosting: the generator should be able to generate a parser for itself.

EBNF is a (meta-)syntax that can be used to express (context-free) grammars. EBNF is an "extension" to BNF. The Human Parser Generator takes EBNF grammars as input to generate parsers for the language expressed by the grammar.

The project initially targets C#, which is the language of the generator itself. Once the generator is stable, support for generating other languages can be added.

Current Status - Version 1.1

Get the Human Parser Generator

Downloads of the repository and a binary build of hpg.exe are available from the releases page on GitHub.

Minimal Survival Commands:

$ git clone https://github.com/christophevg/human-parser-generator
$ cd human-parser-generator
$ msbuild
Microsoft (R) Build Engine version 14.1.0.0
Copyright (C) Microsoft Corporation. All rights reserved.

Build started 3/6/2017 1:46:48 PM.
Project "/Users/xtof/Workspace/human-parser-generator/hpg.csproj" on node 1 (default targets).
MakeBuildDirectory:
  Creating directory "bin/Debug/".
Gen0Parser:
  /Library/Frameworks/Mono.framework/Versions/4.6.2/lib/mono/4.5/csc.exe /debug+ /out:bin/Debug/hpg.gen0.exe /target:exe generator/parsable.cs generator/generator.cs generator/factory.cs generator/emitter.csharp.cs generator/emitter.bnf.cs generator/format.csharp.cs generator/AssemblyInfo.cs generator/grammar.cs generator/bootstrap.cs
Gen1Source:
  mono bin/Debug/hpg.gen0.exe generator/hpg.bnf | LC_ALL="C" astyle -s2 -xt0 -xe -Y -xC80 > generator/parser.gen1.cs
Gen1Parser:
  /Library/Frameworks/Mono.framework/Versions/4.6.2/lib/mono/4.5/csc.exe /debug+ /out:bin/Debug/hpg.gen1.exe /target:exe generator/parsable.cs generator/generator.cs generator/factory.cs generator/emitter.csharp.cs generator/emitter.bnf.cs generator/format.csharp.cs generator/AssemblyInfo.cs generator/parser.gen1.cs generator/hpg.cs
HPGSource:
  mono bin/Debug/hpg.gen1.exe generator/hpg.bnf | LC_ALL="C" astyle -s2 -xt0 -xe -Y -xC80 > generator/parser.cs
Build:
  /Library/Frameworks/Mono.framework/Versions/4.6.2/lib/mono/4.5/csc.exe /debug+ /out:bin/Debug/hpg.exe /target:exe generator/parsable.cs generator/generator.cs generator/factory.cs generator/emitter.csharp.cs generator/emitter.bnf.cs generator/format.csharp.cs generator/AssemblyInfo.cs generator/parser.cs generator/hpg.cs
Done Building Project "/Users/xtof/Workspace/human-parser-generator/hpg.csproj" (default targets).

Build succeeded.
    0 Warning(s)
    0 Error(s)

Time Elapsed 00:00:02.38
$ mono bin/Debug/hpg.exe --help
Human Parser Generator version 1.1.6274.24805
Usage: hpg.exe [options] [file ...]

    --help, -h              Show usage information
    --version, -v           Show version information

    --output, -o FILENAME   Output to file, not stdout

Output options.
Select one of the following:
    --parser, -p            Generate parser (DEFAULT)
    --ast, -a               Show AST
    --model, -m             Show parser model
    --grammar, -g           Show grammar
Formatting options.
    --text, -t              Generate textual output (DEFAULT).
    --dot, -d               Generate Graphviz/Dot format output. (model)
Emission options.
    --info, -i              Suppress generation of info header
    --rule, -r              Suppress generation of rule comment
    --namespace, -n NAME    Embed parser in namespace

When building on a Unix-like environment (e.g. macOS, Linux, ...), the generated parsers are styled using AStyle. On Windows this dependency is suppressed by default. To avoid using AStyle on other platforms as well, set the AStyle build property to an empty string: msbuild /Property:AStyle=.

A Complete Example

The following example is taken from the Wikipedia page on EBNF:

(* a simple program syntax in EBNF − Wikipedia *)
 program = 'PROGRAM', white space, identifier, white space, 
            'BEGIN', white space, 
            { assignment, ";", white space }, 
            'END.' ;
 identifier = alphabetic character, { alphabetic character | digit } ;
 number = [ "-" ], digit, { digit } ;
 string = '"' , { all characters - '"' }, '"' ;
 assignment = identifier , ":=" , ( number | identifier | string ) ;
 alphabetic character = "A" | "B" | "C" | "D" | "E" | "F" | "G"
                      | "H" | "I" | "J" | "K" | "L" | "M" | "N"
                      | "O" | "P" | "Q" | "R" | "S" | "T" | "U"
                      | "V" | "W" | "X" | "Y" | "Z" ;
 digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;
 white space = ? white space characters ? ;
 all characters = ? all visible characters ? ;

This grammar describes a Pascal-like program that consists of assignments:

 PROGRAM DEMO1
 BEGIN
   A:=3;
   B:=45;
   H:=-100023;
   C:=A;
   D123:=B34A;
   BABOON:=GIRAFFE;
   TEXT:="Hello world!";
 END.

To take advantage of the [extended grammar features of the Human Parser Generator](https://github.com/christophevg/human-parser-generator/wiki/HPG Grammar), the grammar above can be rewritten as:

(* a simple program syntax in HPG-flavoured EBNF - based on example from Wikipedia *)

program      = "PROGRAM" identifier
               "BEGIN"
               { assignment ";" }
               "END."
             ;

assignment   = identifier ":=" expression ;

expression   = identifier
             | string
             | number
             ;

identifier   = name  @ ? /([A-Z][A-Z0-9]*)/ ? ;
string       = text  @ ? /"([^"]*)"|'([^']*)'/ ? ;
number       = value @ ? /(-?[1-9][0-9]*)/ ? ;
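
The last three rules each pair a property name (name, text, value) with a regular expression, and judging from the AST output of this example, the captured group ends up in that property. The following sketch only illustrates what each pattern captures, using the standard .NET Regex class. The Extract helper, the ExtractorDemo class and the anchoring with \A are assumptions made for illustration, not code taken from the generator.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ExtractorDemo {
  // Hypothetical helper: match `pattern` at the very start of `input` and
  // return the first capture group that took part in the match.
  // Returns null when the pattern does not match at all.
  public static string Extract(string pattern, string input) {
    Match match = Regex.Match(input, @"\A(?:" + pattern + ")");
    if (!match.Success) { return null; }
    for (int i = 1; i < match.Groups.Count; i++) {
      if (match.Groups[i].Success) { return match.Groups[i].Value; }
    }
    return match.Value;
  }

  public static void Main() {
    // identifier = name @ ? /([A-Z][A-Z0-9]*)/ ? ;
    Console.WriteLine(Extract(@"([A-Z][A-Z0-9]*)", "DEMO1 BEGIN"));
    // string = text @ ? /"([^"]*)"|'([^']*)'/ ? ;
    Console.WriteLine(Extract("\"([^\"]*)\"|'([^']*)'", "\"Hello world!\";"));
    // number = value @ ? /(-?[1-9][0-9]*)/ ? ;
    Console.WriteLine(Extract(@"(-?[1-9][0-9]*)", "-100023;"));
  }
}
```

Run as-is, this prints DEMO1, Hello world! and -100023. Note that the number pattern requires a leading non-zero digit, so a lone 0 does not match it.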

We can now feed this grammar to the Human Parser Generator:

$ mono hpg.exe example/pascal/pascal.bnf

The generated parser is returned on standard output:

// DO NOT EDIT THIS FILE
// This file was generated using the Human Parser Generator
// (https://github.com/christophevg/human-parser-generator)
// on Monday, March 6, 2017 at 1:10:56 PM
// Source : example/pascal/pascal.bnf

using System;
using System.IO;
using System.Collections.Generic;
using System.Text.RegularExpressions;
using System.Linq;

// program ::= "PROGRAM" identifier "BEGIN" { assignment ";" } "END." ;
public class Program {
  public Identifier Identifier { get; set; }
  public List<Assignment> Assignments { get; set; }
  public Program() {
    this.Assignments = new List<Assignment>();
  }
  // ...
}
// ...
public class Parser : ParserBase<Program> {

  // program ::= "PROGRAM" identifier "BEGIN" { assignment ";" } "END." ;
  public override Program Parse() {
    Program program = new Program();
    Log( "ParseProgram" );
    Parse( () => {
      Consume("PROGRAM");
      program.Identifier = ParseIdentifier();
      Consume("BEGIN");
      Repeat( () => {
        program.Assignments.Add(ParseAssignment());
        Consume(";");
      });
      Consume("END.");
    }).OrThrow("Failed to parse Program");
    return program;
  }
// ...
}

If no file is provided, input is read from standard input.
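
The generated Parse method reads as a plain sequence of Consume and Repeat calls. To give an idea of how such helpers can work, here is a minimal, self-contained sketch of a cursor over the input with backtracking on failure. It is not the code from parsable.cs, which may well differ: MiniParsable, ParseException, ConsumeName and RepeatDemo are hypothetical names, and skipping whitespace before each token is an assumption.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical exception type signalling a failed parsing step.
public class ParseException : Exception {
  public ParseException(string message) : base(message) {}
}

// Hypothetical stand-in for the helpers used by a generated parser.
public class MiniParsable {
  private readonly string source;
  private int position = 0;

  public MiniParsable(string source) { this.source = source; }

  private void SkipWhitespace() {
    while (position < source.Length && char.IsWhiteSpace(source[position])) {
      position++;
    }
  }

  // Consume a literal token at the current position, or fail.
  public void Consume(string text) {
    SkipWhitespace();
    bool fits = position + text.Length <= source.Length;
    if (!fits || source.Substring(position, text.Length) != text) {
      throw new ParseException("Expected '" + text + "' at " + position);
    }
    position += text.Length;
  }

  // Consume a run of letters and digits and return it, or fail.
  public string ConsumeName() {
    SkipWhitespace();
    int start = position;
    while (position < source.Length && char.IsLetterOrDigit(source[position])) {
      position++;
    }
    if (position == start) {
      throw new ParseException("Expected a name at " + start);
    }
    return source.Substring(start, position - start);
  }

  // Run `body` zero or more times. A failing iteration is rolled back by
  // restoring the position that was current when the iteration started.
  public void Repeat(Action body) {
    while (true) {
      int start = position;
      try {
        body();
      } catch (ParseException) {
        position = start;
        return;
      }
      if (position == start) { return; }  // no progress: stop looping
    }
  }
}

public static class RepeatDemo {
  // Parses input of the form: BEGIN name; name; ... END.
  public static List<string> ParseNames(string input) {
    MiniParsable parser = new MiniParsable(input);
    List<string> names = new List<string>();
    parser.Consume("BEGIN");
    parser.Repeat(() => {
      string name = parser.ConsumeName();
      parser.Consume(";");
      names.Add(name);  // only record the name once the whole step succeeded
    });
    parser.Consume("END.");
    return names;
  }

  public static void Main() {
    Console.WriteLine(string.Join(",", ParseNames("BEGIN A; B; C; END.")));
  }
}
```

In this sketch the last iteration of Repeat first reads END as a name, fails on the missing semicolon and is rolled back, after which Consume("END.") succeeds. Rolling back only restores the position, which is why the name is added to the list after the semicolon has been consumed.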

Save the generated parser to a file, e.g. pascal.cs using the --output option, then combine it with parsable.cs and add a minimal driver application:

// run.cs - a minimal driver application of HPG generated parsers
using System;
using System.IO;

public class Runner {
  public static void Main(string[] args) {
    string source = File.ReadAllText(args[0]);

    Parser parser = new Parser();
    parser.Parse(source);

    Console.WriteLine(parser.AST);
  }
}

Compile and run:

$ mcs run.cs pascal.cs generator/parsable.cs 
$ mono run.exe example/pascal/example.pascal

The output is a string representation of the resulting AST:

new Program() {
  Identifier = new Identifier() { Name = "DEMO1"},
  Assignments = new List<Assignment>() {
    new Assignment() {
      Identifier = new Identifier() { Name = "A"},
      Expression = new Number() { Value = "3" }
    },
    new Assignment() {
      Identifier = new Identifier() { Name = "B"},
      Expression = new Number() { Value = "45" }
    },
    new Assignment() {
      Identifier = new Identifier() { Name = "H"},
      Expression = new Number() { Value = "-100023" }
    },
    new Assignment() {
      Identifier = new Identifier() { Name = "C"},
      Expression = new Identifier() { Name = "A" }
    },
    new Assignment() {
      Identifier = new Identifier() { Name = "D123"},
      Expression = new Identifier() { Name = "B34A" }
    },
    new Assignment() {
      Identifier = new Identifier() { Name = "BABOON"},
      Expression = new Identifier() { Name = "GIRAFFE" }
    },
    new Assignment() {
      Identifier = new Identifier() { Name = "TEXT"},
      Expression = new String() { Text = "Hello world!" }
    }
  }
}
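
Since the AST consists of plain classes with properties, it can be processed with ordinary C# code. The sketch below walks a small hand-built AST and describes each assignment. The classes are hand-written stand-ins, because the generated ones are elided above: their exact shape, the IExpression interface, the AstSketch namespace and the AstDemo class are all assumptions, and only the Number and Identifier expressions are modelled.

```csharp
using System;
using System.Collections.Generic;

namespace AstSketch {
  // Hypothetical stand-ins for the generated AST classes.
  public interface IExpression {}
  public class Identifier : IExpression { public string Name { get; set; } }
  public class Number : IExpression { public string Value { get; set; } }
  public class Assignment {
    public Identifier Identifier { get; set; }
    public IExpression Expression { get; set; }
  }
  public class Program {
    public Identifier Identifier { get; set; }
    public List<Assignment> Assignments { get; set; }
    public Program() { this.Assignments = new List<Assignment>(); }
  }

  public static class AstDemo {
    // Describe one assignment, depending on the kind of its expression.
    public static string Describe(Assignment assignment) {
      string target = assignment.Identifier.Name;
      Number number = assignment.Expression as Number;
      if (number != null) {
        return target + " is set to the number " + number.Value;
      }
      Identifier other = assignment.Expression as Identifier;
      if (other != null) {
        return target + " is set to the value of " + other.Name;
      }
      return target + " is set to something else";
    }

    public static void Main() {
      // A fragment of the AST shown above, built by hand.
      Program program = new Program() {
        Identifier = new Identifier() { Name = "DEMO1" },
        Assignments = new List<Assignment>() {
          new Assignment() {
            Identifier = new Identifier() { Name = "A" },
            Expression = new Number() { Value = "3" }
          },
          new Assignment() {
            Identifier = new Identifier() { Name = "C" },
            Expression = new Identifier() { Name = "A" }
          }
        }
      };
      foreach (Assignment assignment in program.Assignments) {
        Console.WriteLine(Describe(assignment));
      }
    }
  }
}
```

With a real generated parser, the hand-built program object would be replaced by the AST that the parser produces.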

Documentation

Consult the repository's wiki for more background, tutorials and annotated examples.