Packrat parsing library for GObjects
Vala Perl Shell
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
build-aux
debian
examples
.gitignore
AUTHORS
COPYING
Makefile.am
NEWS
README
autogen.sh
base.vala
configure.ac
lib.deps.in
lib.pc.in
rules.vala
utils.vala

README

GTeonoma
--------

GTeonoma is a packrat parser for Vala and the GObject system that uses introspection. In short, to use the library, create a collection of objects that extend GObject with properties where each object is part of the abstract syntax tree and each property is a non-terminal in the grammar, then build a grammar and parse. The parser will parse and instantiate objects to build the abstract syntax tree.

For example, take the grammar:

	<Foo> ::= <Access> foo <id> ( <string> ) { <Stmt>* }
	<Access> :: = public | private |
	<Stmt> ::= <int> ; | bar <int> ; | do { <Stmt> } ; | say <string> ;

First, create an object for each rule in the grammar:

	class Foo : Object, SourceInfo {
		public source_location source {get; set;}
		public Identifier id { get; set; }
		public Access access { get; set;}
		public StringLiteral info {get; set;}
		public Gee.List<Stmt> statements {get; set;}
	}

	abstract class Stmt : Object {
	}

	class Bar : Stmt {
		public bool is_bar { get; set; }
		public int64 val { get; set; }
	}

	class DoStmt : Stmt {
		public Stmt expr { get; set;}
	}

	class Say : Stmt {
		public StringLiteral expr { get; set; }
	}

	public enum Access {
		PUBLIC, PRIVATE
	}

Then, create a grammar:
	var rules = new Rules();
	rules.register_int(64, typeof(int64));
	rules.register_string_literal();
	rules.register_identifier();
	rules.register<Foo>("foo", 0, "%p{access}%_foo %P{id}% (% %P{info}% ) {%I%n%l{statements}{% ;%n}%i%n}%n", new Type[] {typeof(Stmt)});
	rules.register<Access>();
	rules.register<Bar>("bar", 0, "%b{is_bar}{bar}%_%P{val}");
	rules.register<DoStmt>("do", 0, "do%_{% %P{expr}% }");
	rules.register<Say>("say", 0, "say %P{expr}");

Notice that each object is registered with a friendly name (that will be output to the user in case of error) and a parse string. The %-codes explain exactly how spacing is to be parsed and %p and %P parse properties. The appropriate parse rule is inferred from the type of the property. Similarly, %l and %L parse lists, but since the generic type information is not availble, the type is given as an argument, as with Foo. Now, simply parse text:

	var parser = new StringParser(rules, "private foo _s (\"f\\x0A\\n\") { 3; 0; do {bar 0x5f}; say \"yo\"}");
	Gee.List<Foo> l;
	switch (parser.parse_all<Foo>(out l)) {
		case Result.OK:
			stdout.printf("OK\n");
			var printer = new ConsolePrinter(rules);
			printer.print_all(l);
			break;
		case Result.FAIL:
			stdout.printf("FAIL\n");
			break;
		case Result.ABORT:
			stdout.printf("ABORT\n");
			break;
	}
	parser.visit_errors((loc, s) => { stderr.printf("Parse error %ld:%ld: %s\n", loc.line, loc.offset, s);});

There is a StringParser and a FileParser, but if another data source is needed, this class can be easily extended. When successfully parsed, the output can be pretty printed with a Printer; in this case, one to the console. All the items in l will now contain a Foo object with the entire parse tree populate.

When a property's type is a class with derived classes registered, it will try all of the derived classes in the order specified and pick the first successful one. This how parsing expression grammars work and this avoids the nested if-else problem associated with LR(1) parsers that causes shift-reduce conflicts. In the example above, there is no instantiable class for Stmt, but the derived classes Bar, Do, and Say will be parsed in its place.

Because there is no introspection support, compact classes, non-GObject class, and structs cannot be used in parsing. Classes extending GLib.Object and interfaces are supported. Addtionally, enums and flags can be used so long as they are registered with the GObject system (i.e., defined in Vala, not from an external API). The nicks are used as parsing strings.