Multiple possible token types #36

beebeeep · 2018-10-09T11:00:44Z

Hello, and thank you for your work :)

I ran into an interesting problem while using participle to parse a (well, to be honest, poorly designed) grammar: some tokens may be treated in more than one way, for example as a keyword or as an identifier.

The simplest example is the SQL parser from _examples: it cannot parse "SELECT * FROM select", because the lexer identifies the table name "select" as a keyword, not as an identifier. Of course that's invalid SQL, and the table name should be quoted as `select` in this case, but in my case the grammar doesn't have those quotes.
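To make the failure concrete, here is a minimal sketch of a context-free lexer with the same behaviour. It uses only the standard library and is not participle's lexer; the `Token`, `rule` and `Lex` names are hypothetical. Rules are tried in order, so the keyword rule claims the table name before the identifier rule gets a chance.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Token is a hypothetical lexer token: a type name plus the matched text.
type Token struct {
	Type  string
	Value string
}

// rule pairs a token type with the pattern that recognises it.
type rule struct {
	typ string
	re  *regexp.Regexp
}

// Rules are tried in order, so Keyword always wins over Ident.
var rules = []rule{
	{"Keyword", regexp.MustCompile(`^(?i)(SELECT|FROM)\b`)},
	{"Ident", regexp.MustCompile(`^[A-Za-z_][A-Za-z0-9_]*`)},
	{"Operator", regexp.MustCompile(`^[*,]`)},
}

// Lex splits input into tokens, skipping whitespace between them.
// It has no knowledge of where in the statement a token appears.
func Lex(input string) ([]Token, error) {
	var tokens []Token
	rest := strings.TrimSpace(input)
	for rest != "" {
		matched := false
		for _, r := range rules {
			if m := r.re.FindString(rest); m != "" {
				tokens = append(tokens, Token{r.typ, m})
				rest = strings.TrimSpace(rest[len(m):])
				matched = true
				break
			}
		}
		if !matched {
			return nil, fmt.Errorf("unexpected input %q", rest)
		}
	}
	return tokens, nil
}

func main() {
	tokens, err := Lex("SELECT * FROM select")
	if err != nil {
		panic(err)
	}
	for _, t := range tokens {
		fmt.Printf("%s(%q)\n", t.Type, t.Value)
	}
}
```

The last token comes out as a Keyword, so a grammar rule that expects an Ident after FROM cannot match it.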

I'm not sure what the best solution would be here. The only idea I have is to allow multiple lexers per parser: if the parser fails to parse with one lexer, it tries the next, and so on.

What do you think? Perhaps there are no plans to support anything except context-free grammars (CFGs), in which case this isn't a problem at all.

alecthomas · 2018-10-09T13:43:09Z

The lack of context does make some things a bit more difficult, that’s for sure. In the specific example of the SQL parser I’d probably just not make SELECT a keyword.

But generally, I’m not sure. I have solved this in one parser by pushing the statefulness down into the lexer. This was a parser for a nested string interpolation language, e.g. `"hello ${who "nested ${other}"}"`. A similar approach can be used for indentation-aware languages like Python. I don’t think this approach would be very applicable to the SQL example though, as too much logic would have to be pushed down into the lexer.
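A minimal sketch of what a stateful lexer for that interpolation example might look like, assuming a simple mode stack. This is not the parser mentioned above; the token names and the `Lex` function are hypothetical, and only ASCII identifiers are handled.

```go
package main

import (
	"fmt"
	"strings"
)

// Token is a hypothetical lexer token.
type Token struct {
	Type  string
	Value string
}

type mode int

const (
	exprMode   mode = iota // inside code: identifiers, nested strings, "}"
	stringMode             // inside a string literal: raw text, "${", closing quote
)

func isLetter(b byte) bool {
	return b >= 'a' && b <= 'z' || b >= 'A' && b <= 'Z' || b == '_'
}

// Lex tokenises input using a stack of modes, so the meaning of a
// character such as '"' depends on where the lexer currently is.
func Lex(input string) ([]Token, error) {
	var tokens []Token
	stack := []mode{exprMode}
	for i := 0; i < len(input); {
		rest := input[i:]
		switch stack[len(stack)-1] {
		case stringMode:
			switch {
			case strings.HasPrefix(rest, "${"):
				tokens = append(tokens, Token{"ExprStart", "${"})
				stack = append(stack, exprMode)
				i += 2
			case rest[0] == '"':
				tokens = append(tokens, Token{"StringEnd", `"`})
				stack = stack[:len(stack)-1]
				i++
			default:
				// Consume raw text up to the next quote or "${".
				end := len(rest)
				if j := strings.Index(rest, "${"); j >= 0 && j < end {
					end = j
				}
				if j := strings.IndexByte(rest, '"'); j >= 0 && j < end {
					end = j
				}
				tokens = append(tokens, Token{"Chars", rest[:end]})
				i += end
			}
		case exprMode:
			c := rest[0]
			switch {
			case c == ' ':
				i++
			case c == '"':
				tokens = append(tokens, Token{"StringStart", `"`})
				stack = append(stack, stringMode)
				i++
			case c == '}' && len(stack) > 1:
				tokens = append(tokens, Token{"ExprEnd", "}"})
				stack = stack[:len(stack)-1]
				i++
			case isLetter(c):
				j := 0
				for j < len(rest) && isLetter(rest[j]) {
					j++
				}
				tokens = append(tokens, Token{"Ident", rest[:j]})
				i += j
			default:
				return nil, fmt.Errorf("unexpected %q at offset %d", c, i)
			}
		}
	}
	if len(stack) != 1 {
		return nil, fmt.Errorf("unterminated string or interpolation")
	}
	return tokens, nil
}

func main() {
	tokens, err := Lex(`"hello ${who "nested ${other}"}"`)
	if err != nil {
		panic(err)
	}
	for _, t := range tokens {
		fmt.Printf("%s(%q)\n", t.Type, t.Value)
	}
}
```

Because the lexer already emits distinct token types for string text and for expression content, the grammar on top of it can stay context-free.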

I’m not against supporting non-context-free grammars, but I don’t know whether there’s an elegant way to express such a grammar, the way EBNF does for CFGs.

beebeeep · 2018-10-09T18:55:20Z

> just not make SELECT a keyword.

Well, it's still nice to have a separate keyword type, in particular for case-insensitive keyword matching. (By the way, I've seen that pointlander/peg uses 'token' for case-sensitive matching and "token" for case-insensitive matching, which is a bit unobvious but quite elegant.)

As for how this type ambiguity could be expressed in the grammar, it seems to me that no changes are needed there: if the parser fails to parse the current token with a given type, it could ask the lexer for other options. Alternatively, Token.Type could be a []rune, with the standard implementation returning a 1-element slice and the option to create a custom lexer that offers multiple choices.
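The second idea can be sketched roughly as follows. This is a toy under stated assumptions, not participle's API: the `Token`, `Lex`, `Is` and `ParseSelect` names are hypothetical, candidate types are strings rather than runes for readability, and the grammar is hard-coded to `SELECT * FROM <ident>`.

```go
package main

import (
	"fmt"
	"strings"
)

// Token carries every type the lexer considers possible, most specific first.
type Token struct {
	Types []string
	Value string
}

var keywords = map[string]bool{"SELECT": true, "FROM": true}

// Lex splits on whitespace. Keywords are also offered as identifiers,
// and any other word is treated as an identifier (a toy simplification).
func Lex(input string) []Token {
	var tokens []Token
	for _, word := range strings.Fields(input) {
		switch {
		case word == "*":
			tokens = append(tokens, Token{[]string{"Star"}, word})
		case keywords[strings.ToUpper(word)]:
			tokens = append(tokens, Token{[]string{"Keyword", "Ident"}, word})
		default:
			tokens = append(tokens, Token{[]string{"Ident"}, word})
		}
	}
	return tokens
}

// Is reports whether typ is one of the token's candidate types.
func (t Token) Is(typ string) bool {
	for _, c := range t.Types {
		if c == typ {
			return true
		}
	}
	return false
}

// ParseSelect accepts: "SELECT" "*" "FROM" Ident, and returns the table name.
// Each position asks only for the type it needs, so an ambiguous token
// is resolved by the grammar position rather than by the lexer.
func ParseSelect(tokens []Token) (string, error) {
	if len(tokens) != 4 {
		return "", fmt.Errorf("expected 4 tokens, got %d", len(tokens))
	}
	isKeyword := func(t Token, kw string) bool {
		return t.Is("Keyword") && strings.EqualFold(t.Value, kw)
	}
	switch {
	case !isKeyword(tokens[0], "SELECT"):
		return "", fmt.Errorf("expected SELECT, got %q", tokens[0].Value)
	case !tokens[1].Is("Star"):
		return "", fmt.Errorf("expected *, got %q", tokens[1].Value)
	case !isKeyword(tokens[2], "FROM"):
		return "", fmt.Errorf("expected FROM, got %q", tokens[2].Value)
	case !tokens[3].Is("Ident"):
		return "", fmt.Errorf("expected identifier, got %q", tokens[3].Value)
	}
	return tokens[3].Value, nil
}

func main() {
	table, err := ParseSelect(Lex("SELECT * FROM select"))
	fmt.Println(table, err)
}
```

This covers the keyword-versus-identifier case only; it does nothing for grammars where tokenisation itself depends on context.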

alecthomas · 2018-10-10T07:10:05Z

> As for how this type ambiguity could be expressed in the grammar, it seems to me that no changes are needed there: if the parser fails to parse the current token with a given type, it could ask the lexer for other options. Alternatively, Token.Type could be a []rune, with the standard implementation returning a 1-element slice and the option to create a custom lexer that offers multiple choices.

I don't think that's a general solution for contextual grammars; it only solves one specific case. For example, it would not work for the nested interpolation grammar I mentioned.

alecthomas · 2018-10-11T14:38:05Z

> btw, I've seen that pointlander/peg is using 'token' for case-sensitive matching and "token" for case-insensitive - a bit unobvious but quite elegant

I think this is a good idea; the current approach of case normalisation isn't really a good one. I think I'll add a parser option to do this, e.g. participle.CaseInsensitive("Keyword").
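What matching literals case-insensitively for selected token types means can be sketched with the standard library alone. This is an illustration of the idea, not participle's implementation; the `Matcher` type and its methods are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// Token is a hypothetical lexer token.
type Token struct {
	Type  string
	Value string
}

// Matcher compares grammar literals against tokens. Token types listed
// in caseInsensitive are compared ignoring case; all others exactly.
type Matcher struct {
	caseInsensitive map[string]bool
}

// NewMatcher returns a Matcher that ignores case for the given token types.
func NewMatcher(types ...string) Matcher {
	m := Matcher{caseInsensitive: map[string]bool{}}
	for _, t := range types {
		m.caseInsensitive[t] = true
	}
	return m
}

// Matches reports whether the grammar literal matches the token's value.
func (m Matcher) Matches(literal string, tok Token) bool {
	if m.caseInsensitive[tok.Type] {
		return strings.EqualFold(literal, tok.Value)
	}
	return literal == tok.Value
}

func main() {
	m := NewMatcher("Keyword")
	fmt.Println(m.Matches("SELECT", Token{"Keyword", "select"})) // true
	fmt.Println(m.Matches("users", Token{"Ident", "Users"}))     // false
}
```

Unlike normalising the case of the whole input, this leaves token values untouched, so identifiers and string contents keep their original spelling.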

beebeeep · 2018-10-12T18:16:08Z

I've been thinking about this for a while. Yeah, it looks like this issue's scope is too narrow, and unfortunately I have no other ideas, so I'd rather close the issue.

alecthomas · 2018-10-13T20:28:06Z

I've added the CaseInsensitive() option.

beebeeep closed this as completed Oct 12, 2018
