BNF patterns #902

Open
gavinking opened this Issue Jan 15, 2014 · 19 comments

gavinking commented Jan 15, 2014

Problem

Parsing text is an extremely common task, and therefore it would be very nice to have a convenient syntax for creating little parsers from a grammar.

Rant

The traditional thing that programming languages have provided for this is regular expression matching. A regex is a pattern of character classes with alternation and quantification, via the operators |, *, +, and ?. Unfortunately, regexes are entirely unfit for purpose, since nontrivial parsing tasks, and even many trivial ones, require at least one of:

  • subrules,
  • AST production, and/or
  • a distinction between tokenization and parsing (in order to handle whitespace).

The fact that regexes provide none of these things is frankly laughable, and it's absurd that language designers have considered this a useful facility. And it's extremely hard to understand, given that language designers are usually at least half familiar with what a real parser looks like. Regexes are not even capable of matching nested parentheses, one of the most basic tasks in parsing.

Worse, regexes feature an awful syntax where what is a character and what is a "metacharacter" is not something clearly distinguishable to the eye.

Proposal

I propose that we introduce a special syntax for BNF patterns that allows parsers to be built via combination of parser rules. This facility would follow the general approach we have taken with operators in Ceylon, where the operators are defined against some general purpose interfaces, allowing different parser libraries to take advantage of the syntax.

Patterns are built from rules, and are usually themselves used to define new rules. For example:

value number = ## digit+ (dot digit+)? (e sign? digit+)? ##;

Patterns need to be quoted, since the syntax of a pattern is not part of the expression syntax. I have gone with ## as the quote character here, though there are other options, including \\:

value number = \\ digit+ (dot digit+)? (e sign? digit+)? \\;

Identifiers name rules: for example, digit, e, sign, and dot.

The operators |, ?, *, and + represent alternation and quantification, and parentheses represent grouping. We might also have a ~ operator, though I'm not sure about that one.
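
For concreteness: following the general operator approach described above, such a pattern would presumably just desugar into method calls on the underlying interface. Using the (illustrative) method names from the Rule interface sketched later in this thread, the number pattern would amount to roughly:

// purely illustrative desugaring of:  digit+ (dot digit+)? (e sign? digit+)?
value desugaredNumber =
        digit.oneOrMore()
            .before(dot.before(digit.oneOrMore()).zeroOrOne())
            .before(e.before(sign.zeroOrOne())
                     .before(digit.oneOrMore()).zeroOrOne());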

Todo

The main outstanding thing to do here is to define a Rule interface that generalizes the proposed operators over a sufficiently broad range of reasonable parser libraries.

Note that this proposal is very straightforward to implement, since it essentially amounts to the addition of some new operators to the compiler. It is not a whole lot of work.

WDYT?

tombentley commented Jan 15, 2014

I'm not really sure I understand what you're proposing, to be honest.

  • Stuff between ## is a pattern, and the type of a pattern is Rule?
  • Hence we can construct a tree of Rules within Ceylon code?
  • At runtime we can pass a Rule instance to a 'parser library' (is that the same thing as a module?), or does this happen at compile time?
  • What do we get back from the 'parser library'?
  • How does this address your gripes about AST production and tokenization?

FroMage commented Jan 15, 2014

Is this supposed to address the other things addressed by regexes, like:

  • Replacement
  • Capturing
  • Look-ahead/behind
  • Match bounds (e.g. \d{2})
  • Eager/greedy modifiers
  • Locations (start of input, end of input, word boundary)

gavinking commented Jan 15, 2014

  • Stuff between ## is a pattern, and the type of a pattern is Rule?
  • Hence we can construct a tree of Rules within Ceylon code?

Right.

  • At runtime we can pass a Rule instance to a 'parser library'

No, your parser library implements Rule. For example, a regex library would come with primitive rules like wordChar, charClass(), etc, and then you use patterns to build more sophisticated rules.

  • What do we get back from the 'parser library'?

WDYM? You get objects. I don't understand the question.

How does this address your gripes about AST production and tokenization?

Rule instances might accept a character stream and produce Tokens (a scanner), or they might accept Tokens and produce Nodes (a parser).

Rule is an extremely generic interface, along the lines of Ranged, something like that. The only thing it really specifies is composability via the operators.
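
(Purely to illustrate those two levels, here is a hedged sketch of how the same generic interface could be instantiated for a scanner and for a parser. Token, Node, and the member names are invented for illustration, and Rule is the interface sketched in a later comment.)

// illustrative only: the same Rule abstraction at two different levels
class Token(shared String kind, shared String text) {}
class Node(shared String label, shared Node[] children = []) {}

// a scanner rule consumes characters and produces Tokens
abstract class TokenRule() satisfies Rule<TokenRule> {
    shared formal [Token, Integer]? scan(String stream, Integer start);
}

// a parser rule consumes Tokens and produces Nodes
abstract class NodeRule() satisfies Rule<NodeRule> {
    shared formal [Node, Integer]? parse(Token[] tokens, Integer position);
}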

gavinking commented Jan 15, 2014

Is this supposed to address the other things addressed by regexes

  • Replacement
  • Capturing

Those are concerns of the parser library.

Look-ahead/behind

Whether a parser library is top-down or bottom-up, is a PEG, uses lookahead or syntactic predicates, or whatever, is not really relevant at this level of the discussion:

  • Note that PEGs and ANTLR4 simply use the order of alternatives to imply lookahead.
  • Furthermore, patterns are not the only way to produce Rules. You can also create rules by writing ordinary code.
  • If you desperately want to incorporate a special syntax for lookahead into the pattern language, that is easy enough: we could add a => operator, but I don't think that's needed.

Eager/greedy modifiers

Similar answer:

value myNongreedyRule = ungreedy(myGreedyRule);

I have never once needed a non-greedy rule.

Match bounds (e.g. \d{2})

I mean, we could add that, but I have never really run into much need for it. It's not like x{4} is really much superior to x x x x. (Of course you need it in traditional regexes because there are no subrules.) OTOH, I'm not actively against it. I've just not run into very many parsing tasks where it helps. And again, patterns are not the only way to produce rules:

value fourDigits = repeat(digit,4);
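
(A hedged sketch of how such a repeat helper could be written as ordinary code against the Rule interface defined further down this thread; repeat itself is hypothetical, not part of the proposal.)

// hypothetical helper: n consecutive occurrences of a rule, built by
// chaining the sequencing operation n-1 times
T repeat<T>(T rule, Integer times) 
        given T satisfies Rule<T> {
    "repeat count must be at least one"
    assert (times >= 1);
    variable T result = rule;
    for (i in 0 : times-1) {
        result = result.before(rule);
    }
    return result;
}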

Locations (start of input, end of input, word boundary)

These are just primitive Rules, aren't they?

By the way, "word boundary" is something you never ever need if you use a proper separation between the scanner and parser, AFAICT.

gavinking commented Jan 15, 2014

P.S. An issue I glossed over in the original proposal is whether a pattern can have function calls in it. If it can, then we can write stuff like greedy(rule), repeat(digit,4), charRange('a','b'), group(wildcard, "replacementGroupName") in a pattern. However, we would have to nail down the syntax of that. In a first cut of this, it's not necessary.
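
(To make one of those concrete: a helper like charRange could already be written as an ordinary function against the CharacterRule engine sketched later in this thread. The name charRange is only the example mentioned above, not a proposed API.)

// hypothetical helper: a rule matching any character in an inclusive range
CharacterRule charRange(Character from, Character to) 
        => Class(from..to);

// usage: value letter = charRange('a', 'z');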

tombentley commented Jan 15, 2014

Whether a parser library is topdown, bottomup, a PEG, uses lookahead, syntactic predicates, or whatever, is not really relevant at this level of the discussion

Is that really true?

  • Things like ANTLR generate compiled parsers, rather than interpreted ones. I don't yet see how this proposal enables their use.
  • The parser library in use will determine whether things like left/right recursion are supported. Personally, I would want the IDE to tell me that the grammar I'm writing isn't going to work because it's left-recursive, which my chosen library doesn't support, rather than discovering that at runtime.

gavinking commented Jan 15, 2014

Things like ANTLR generate compiled parsers, rather than interpreted ones. I don't yet see how this proposal enables their use.

Sure, this is for interpreters, not for compiler compilers. I'm not trying to abstract over ANTLR and YACC. But it's surely possible to write pretty good interpreters for LL grammars and PEGs, which seem to be the most relevant grammars these days. LR grammars I must admit I'm not so sure about; I don't really fully understand bottom-up parsing.

The parser library in use will determine whether things like left/right recursion are supported. Personally I would want the IDE to tell me the grammar I'm writing isn't going to work because it's left-recursive, which my chosen library doesn't support, rather than discovering it at runtime.

I mean, that's feedback you could get from a one-line unit-test, if the library supported it. The overall workflow would be more convenient than with ANTLR, not less.

gavinking commented Jan 15, 2014

So let me make this a bit more concrete by building a very simplistic character matching engine. Suppose we define Rule in ceylon.language like this:

shared interface Rule<T> of T 
        given T satisfies Rule<T> {
    shared formal T or(T other);
    shared formal T before(T other);
    shared formal T zeroOrOne();
    shared formal T zeroOrMore();
    shared formal T oneOrMore();
}

Then our scanner might look like this:

abstract class CharacterRule() 
        satisfies Rule<CharacterRule> {

    // determines whether this rule matches the stream at the given start
    // index, returning the index just past the match, or null on no match
    shared formal Integer? match(String stream, Integer start);

    shared Boolean matches(String stream, Integer start=0)
            => match(stream, start) exists;

    shared actual CharacterRule before(CharacterRule other) {
        object rule extends CharacterRule() {
            shared actual Integer? match(String stream, Integer start) {
                if (exists end = outer.match(stream, start)) {
                    return other.match(stream,end);
                }
                else {
                    return null;
                }
            }
            string => outer.string + other.string;
        }
        return rule;
    }

    shared actual CharacterRule zeroOrMore() {
        object rule extends CharacterRule() {
            shared actual Integer match(String stream, Integer start) {
                variable Integer position = start;
                while (exists end = outer.match(stream,position)) {
                    position = end;
                }
                return position;
            }
            string => outer.string + "*";
        }
        return rule;
    }

    // X+ is X followed by X*; the other order (X* X) would never match,
    // since the greedy X* already consumes every available X
    shared actual CharacterRule oneOrMore() 
            => before(zeroOrMore());

    shared actual CharacterRule zeroOrOne() {
        object rule extends CharacterRule() {
            shared actual Integer match(String stream, Integer start)
                     => outer.match(stream,start) else start;
            string => outer.string + "?";
        }
        return rule;
    }

    shared actual CharacterRule or(CharacterRule other) {
        object rule extends CharacterRule() {
            shared actual Integer? match(String stream, Integer start)
                    => outer.match(stream,start) else other.match(stream, start);
            string => "(" + outer.string + "|" + other.string + ")";
        }
        return rule;
    }

}

class Lit(String characters) 
        extends CharacterRule() {
    match(String stream, Integer start) 
            => stream[start:characters.size]==characters then start+characters.size;
    string => characters;
}

class Class({Character*} characters) 
        extends CharacterRule() {
    // matches any single character belonging to the given class
    match(String stream, Integer start) 
            => stream[start:1] in characters.map((Character c) => c.string) 
                then start+1;
    string => "[" + characters.string + "]";
}

Finally, the following grammar would match numbers:

value digit = Class('0'..'9');
value dot = Lit(".");
value e = Lit("e");
value sign = Class { '+', '-' };
value number = ## sign? digit+ (dot digit+)? (e sign? digit+)? ##;
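
(A usage sketch, purely illustrative: assuming number has been built against the API above, for instance via explicit method calls in place of the not-yet-implemented ## syntax, matching a few inputs would look like this. Note that matches succeeds whenever the rule matches a prefix of the input.)

shared void run() {
    print(number.matches("-1.5e10"));  // true: sign, digits, fraction, exponent
    print(number.matches("3.14"));     // true: the exponent part is optional
    print(number.matches("abc"));      // false: no leading digit
}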

tombentley commented Jan 16, 2014

@gavinking thanks, I think that illustrates what you're proposing quite well. And I do quite like what I see.

On a practical point, isn't there a bit of a chicken/egg problem here: Don't we at least need one working implementation of this before we go setting it in stone in the spec?

gavinking commented Jan 16, 2014

Don't we at least need one working implementation of this before we go setting it in stone in the spec?

Well, I mean, since the only thing the Rule API defines is the signatures of |, ?, *, +, and sequencing, which I think are pretty well-understood, I don't see much danger of us getting it wrong.

OTOH, sure, we should write a very simple parser combinator library for the SDK, of course. I like this blog post as a simple overview, by the way: http://qntm.org/combinators.

tombentley commented Jan 27, 2014

A thought occurred to me over the weekend: with a PEG, the | operator has a different interpretation than in a normal CFG, which means that you can't actually read one of these grammars and know what it will match -- you'd need to know which interpretation the parser library uses for Rule.or.

We could separate out these two interpretations into different operators, of course. If we did that, perhaps we could also build ambiguity detection into the typechecker.

gavinking commented Jan 27, 2014

With a PEG the | operator has a different interpretation than in a normal CFG.

Well, a slightly different interpretation.
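
(For what it's worth, the CharacterRule sketch above already behaves like a PEG here: its or tries the left alternative first and commits to it. A small illustration, reusing the Lit class from that sketch:)

// illustrative: or above is ordered choice, so the first alternative wins
value aOrAb = Lit("a").or(Lit("ab"));
// aOrAb.match("ab", 0) returns 1 (it matched just "a"), never 2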

gavinking commented Jan 27, 2014

If we did that, perhaps we could also build ambiguity detection into the typechecker.

What is "ambiguous" still depends on what kind of grammar you have. An unambiguous LL(2) grammar can be an ambiguous LL(1) grammar. And that's just for LL parsers!

tombentley commented Jan 27, 2014

What is "ambiguous" still depends on what kind of grammar you have

Ah, good point.

gavinking modified the milestones: 1.2, 1.1 on Jun 3, 2014

gavinking commented Oct 19, 2014

Note that if we add support for AST transformers then we could in principle even accommodate parser compilers with this feature!

Indeed, AST transformers would allow some really simple parser implementations using stuff like compile-time translation to a regex with error reporting and everything.

That would make this feature even more attractive, it seems to me.

gavinking referenced this issue Oct 19, 2014: Regexes #390 (closed)

gavinking commented Oct 19, 2014

perhaps we could also build ambiguity detection into the typechecker.

To be clear, what I'm saying above is that AST transformers would solve this problem.

EricSL commented Feb 5, 2015

An issue I glossed over in the original proposal is whether a pattern can have function calls in it. If it can, then we can write stuff like greedy(rule), repeat(digit,4), charRange('a','b'), group(wildcard, "replacementGroupName") in a pattern. However, we would have to nail down the syntax of that. In a first cut of this, it's not necessary.

As a first cut, at least do it in a way that will make it possible to eventually reach feature parity with OMeta. It's an amazingly powerful language that I also find hard to read. But it should be possible to make it more readable. The IronMeta variant looks better to me.

I find your syntax unconvincing as you don't give an example of multiple mutually recursive rules, and somewhat deeper language changes will be needed to give a clean way to do that.

EricSL commented Feb 5, 2015

Actually I think the type system may almost be powerful enough to do this if the rules are all classes instead of objects.

class Calculator extends StringParser {
    // inherits nested classes like Star, Sequence, etc.
    class Number extends RecursiveRule<Plus> {}
    class Expression extends RecursiveRule<Choice<[AddExpression, SubtractExpression, ParenExpression, Number]>> {}
    alias OpenParen = Literal("(").RuleClass
    alias CloseParen = Literal(")").RuleClass
    class ParenExpression extends RecursiveRule<Sequence<[OpenParen, Expression, CloseParen]>> {}
    class BinaryOp(String operator) extends ParameterizedRule<[String]>(operator) {
        alias Operator = Literal(operator).RuleClass
        override RuleClass extends RecursiveRule<Sequence<[Expression, Operator, Expression]>> {}
    }
    alias AddExpression = BinaryOp("+").RuleClass
    alias SubtractExpression = BinaryOp("-").RuleClass
    override Expression parserRoot => Expression()
}

The idea is that instantiating a parser causes the root rule to be instantiated, which in turn instantiates any rules it depends on and adds them to a hash set, unless it finds that the same rule is already there, in which case it knows it doesn't need to recurse on that rule. Two rules are equal if they have the same class; parameterized rules are equal if they have the same class and the same parameters.

EricSL commented Feb 6, 2015

Okay, the type system isn't as powerful as I hoped. In particular there appears to be no way to instantiate a generic type parameter, and no way to create a class alias that is particular to a class within a particular instance.

So it comes down to language support for lazily instantiating the rules and allowing rules to refer to variables declared after them.
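
(For reference, a hand-wired sketch of the kind of mutual recursion in question, built against the CharacterRule engine from earlier in the thread. Routing the recursion through methods like this does work, but the awkwardness is arguably the point; NestedParens and its members are illustrative only.)

// illustrative only: expression -> digit | parens ; parens -> "(" expression ")"
// the mutually recursive subrules are built lazily inside methods, so no
// forward reference to a value is needed
class NestedParens() extends CharacterRule() {
    CharacterRule parens() 
            => Lit("(").before(this).before(Lit(")"));
    CharacterRule expression() 
            => Class('0'..'9').or(parens());
    match(String stream, Integer start) 
            => expression().match(stream, start);
    string => "expression";
}

// NestedParens().matches("((3))") evaluates to true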
