Multiple possible token types #36

Closed
beebeeep opened this issue Oct 9, 2018 · 6 comments

Comments

@beebeeep

beebeeep commented Oct 9, 2018

Hello, and thank you for your work :)

I ran into an interesting problem while using participle to parse a (to be honest, poorly designed) grammar: some tokens need to be treated differently depending on where they appear, for example as either a keyword or an identifier.

The simplest example is the SQL parser from _examples: it cannot parse "SELECT * FROM select" because the lexer classifies the table name "select" as a keyword rather than an identifier. Of course, that is invalid SQL and the table name should be quoted as `select`, but the grammar I'm dealing with has no such quoting.

I'm not sure what the best solution would be. The only idea I have is to allow multiple lexers per parser: if parsing fails with one lexer, try the next one, and so on.

What do you think? Perhaps there are no plans to support anything other than context-free grammars, in which case this is not a problem at all.
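
To make the multi-lexer idea concrete, here is a minimal sketch of a parser that retries with a second lexer when the first one fails. It does not use participle; `Lexer`, `newLexer`, `parseSelect` and `parseWithFallback` are hypothetical names, and the toy grammar only accepts `SELECT * FROM <ident>`. The assumption is that the fallback lexer differs by treating only upper-case words as keywords.

```go
package main

import (
	"errors"
	"strings"
)

// TokenType distinguishes the token classes of this toy grammar.
type TokenType int

const (
	Ident TokenType = iota
	Keyword
	Star
)

// Token is a lexed word together with its classification.
type Token struct {
	Type  TokenType
	Value string
}

// Lexer turns an input string into tokens (hypothetical type for this sketch).
type Lexer func(input string) []Token

var keywords = map[string]bool{"SELECT": true, "FROM": true}

// caseInsensitiveKeyword treats "select", "Select" and "SELECT" as keywords.
func caseInsensitiveKeyword(word string) bool { return keywords[strings.ToUpper(word)] }

// caseSensitiveKeyword treats only the upper-case spelling as a keyword.
func caseSensitiveKeyword(word string) bool { return keywords[word] }

// newLexer builds a whitespace-splitting lexer; isKeyword decides which
// words become Keyword tokens (their value is normalised to upper case).
func newLexer(isKeyword func(string) bool) Lexer {
	return func(input string) []Token {
		var tokens []Token
		for _, word := range strings.Fields(input) {
			switch {
			case word == "*":
				tokens = append(tokens, Token{Star, word})
			case isKeyword(word):
				tokens = append(tokens, Token{Keyword, strings.ToUpper(word)})
			default:
				tokens = append(tokens, Token{Ident, word})
			}
		}
		return tokens
	}
}

// parseSelect accepts exactly SELECT * FROM <ident> and returns the table name.
func parseSelect(tokens []Token) (string, error) {
	want := []Token{{Keyword, "SELECT"}, {Star, "*"}, {Keyword, "FROM"}}
	if len(tokens) != len(want)+1 {
		return "", errors.New("expected exactly four tokens")
	}
	for i, w := range want {
		if tokens[i] != w {
			return "", errors.New("unexpected token " + tokens[i].Value)
		}
	}
	last := tokens[len(want)]
	if last.Type != Ident {
		return "", errors.New("expected identifier, got " + last.Value)
	}
	return last.Value, nil
}

// parseWithFallback tries each lexer in order and returns the first
// successful parse, or the error from the last attempt.
func parseWithFallback(input string, lexers ...Lexer) (string, error) {
	err := errors.New("no lexers supplied")
	for _, lex := range lexers {
		var table string
		if table, err = parseSelect(lex(input)); err == nil {
			return table, nil
		}
	}
	return "", err
}
```

With `strict := newLexer(caseInsensitiveKeyword)` and `relaxed := newLexer(caseSensitiveKeyword)`, calling `parseWithFallback("SELECT * FROM select", strict, relaxed)` fails on the first lexer and succeeds on the second. The cost is that the whole input is re-lexed and re-parsed per attempt.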

@alecthomas
Owner

The lack of context does make some things a bit more difficult, that’s for sure. In the specific example of the SQL parser I’d probably just not make SELECT a keyword.

But generally, I’m not sure. I have solved this in one parser by pushing the statefulness down into the lexer. This was a parser for a nested string interpolation language, e.g. "hello ${who "nested ${other}"}". A similar approach can be used for indentation-aware languages like Python. I don’t think this approach would be very applicable to the SQL example, though, as too much logic would have to be pushed down into the lexer.
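
As an illustration of what "pushing the statefulness down into the lexer" can look like, here is a minimal sketch of a lexer that keeps a stack of modes for the nested interpolation example. It is not the parser mentioned above and does not use participle; the token names, the `Lex` function and the two modes are assumptions made for this sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// Token carries a type name and the matched text.
type Token struct {
	Type  string
	Value string
}

type mode int

const (
	exprMode   mode = iota // identifiers, nested strings, closing "}"
	stringMode             // literal text, "${" and the closing quote
)

func isIdentChar(c byte) bool {
	return c == '_' || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')
}

// Lex tokenises input using a stack of modes: a quote in expression mode
// pushes string mode, "${" in string mode pushes expression mode, and the
// matching closer pops back to the enclosing mode.
func Lex(input string) ([]Token, error) {
	var tokens []Token
	stack := []mode{exprMode}
	i := 0
	for i < len(input) {
		rest := input[i:]
		switch stack[len(stack)-1] {
		case stringMode:
			switch {
			case strings.HasPrefix(rest, "${"):
				tokens = append(tokens, Token{"ExprStart", "${"})
				stack = append(stack, exprMode)
				i += 2
			case rest[0] == '"':
				tokens = append(tokens, Token{"StringEnd", `"`})
				stack = stack[:len(stack)-1]
				i++
			default:
				// Literal text runs until the next quote or "${".
				end := len(rest)
				if j := strings.IndexByte(rest, '"'); j >= 0 && j < end {
					end = j
				}
				if j := strings.Index(rest, "${"); j >= 0 && j < end {
					end = j
				}
				tokens = append(tokens, Token{"Text", rest[:end]})
				i += end
			}
		case exprMode:
			c := rest[0]
			switch {
			case c == ' ':
				i++
			case c == '"':
				tokens = append(tokens, Token{"StringStart", `"`})
				stack = append(stack, stringMode)
				i++
			case c == '}' && len(stack) > 1:
				tokens = append(tokens, Token{"ExprEnd", "}"})
				stack = stack[:len(stack)-1]
				i++
			case isIdentChar(c):
				end := 1
				for end < len(rest) && isIdentChar(rest[end]) {
					end++
				}
				tokens = append(tokens, Token{"Ident", rest[:end]})
				i += end
			default:
				return nil, fmt.Errorf("unexpected %q at offset %d", c, i)
			}
		}
	}
	if len(stack) != 1 {
		return nil, fmt.Errorf("unterminated string or interpolation")
	}
	return tokens, nil
}
```

Because the same character (`"` or `}`) means different things depending on the mode on top of the stack, the parser itself can stay context-free and only ever sees unambiguous token types.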

I’m not against supporting non-context-free grammars, but I don’t know whether there’s an elegant way to express such a grammar in something like EBNF.

@beebeeep
Author

beebeeep commented Oct 9, 2018

> just not make SELECT a keyword.

Well, it's still nice to have a separate keyword type, in particular for case-insensitive keyword matching. (By the way, I've seen that pointlander/peg uses 'token' for case-sensitive matching and "token" for case-insensitive matching: a bit unobvious, but quite elegant.)

As for how this type ambiguity could be expressed in the grammar, it seems to me that no changes are needed there: if the parser fails to match the current token with its given type, it could ask the lexer for other options. Alternatively, Token.Type could become a []rune, with the standard lexer returning a one-element slice and custom lexers free to return multiple choices.
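
The second variant could be sketched roughly as follows. This is not participle code: the `Token` struct, the type constants and `parseSelect` are hypothetical, and the toy grammar again only accepts `SELECT * FROM <ident>`. The lexer attaches every plausible type to a token, and the parser accepts a token if any candidate matches what it expects at that position.

```go
package main

import (
	"fmt"
	"strings"
)

// Token types are runes here, mirroring the []rune suggestion above.
const (
	Keyword rune = iota + 1
	Ident
	Star
)

// Token carries every type the lexer considers possible, most specific first.
type Token struct {
	Types []rune
	Value string
}

// Is reports whether typ is one of the token's candidate types.
func (t Token) Is(typ rune) bool {
	for _, candidate := range t.Types {
		if candidate == typ {
			return true
		}
	}
	return false
}

var keywords = map[string]bool{"SELECT": true, "FROM": true}

// lex splits on whitespace; keyword-shaped words are offered to the parser
// as both Keyword and Ident, so the parser can pick whichever fits.
func lex(input string) []Token {
	var tokens []Token
	for _, word := range strings.Fields(input) {
		switch {
		case word == "*":
			tokens = append(tokens, Token{[]rune{Star}, word})
		case keywords[strings.ToUpper(word)]:
			tokens = append(tokens, Token{[]rune{Keyword, Ident}, word})
		default:
			tokens = append(tokens, Token{[]rune{Ident}, word})
		}
	}
	return tokens
}

// expectKeyword checks that t can be the keyword kw (case-insensitively).
func expectKeyword(t Token, kw string) error {
	if !t.Is(Keyword) || !strings.EqualFold(t.Value, kw) {
		return fmt.Errorf("expected %s, got %q", kw, t.Value)
	}
	return nil
}

// parseSelect accepts SELECT * FROM <ident> and returns the table name.
func parseSelect(tokens []Token) (string, error) {
	if len(tokens) != 4 {
		return "", fmt.Errorf("expected 4 tokens, got %d", len(tokens))
	}
	if err := expectKeyword(tokens[0], "SELECT"); err != nil {
		return "", err
	}
	if !tokens[1].Is(Star) {
		return "", fmt.Errorf("expected *, got %q", tokens[1].Value)
	}
	if err := expectKeyword(tokens[2], "FROM"); err != nil {
		return "", err
	}
	if !tokens[3].Is(Ident) {
		return "", fmt.Errorf("expected identifier, got %q", tokens[3].Value)
	}
	return tokens[3].Value, nil
}
```

Here `parseSelect(lex("SELECT * FROM select"))` succeeds because the final `select` carries both `Keyword` and `Ident` as candidates, and the parser asks for `Ident` at that position.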

@alecthomas
Owner

> As for how this type ambiguity could be expressed in the grammar, it seems to me that no changes are needed there: if the parser fails to match the current token with its given type, it could ask the lexer for other options. Alternatively, Token.Type could become a []rune, with the standard lexer returning a one-element slice and custom lexers free to return multiple choices.

I don't think that's a general solution for contextual grammars; it only solves one specific case. For example, it would not work for the nested interpolation grammar I described.

@alecthomas
Owner

> btw, I've seen that pointlander/peg uses 'token' for case-sensitive matching and "token" for case-insensitive matching: a bit unobvious, but quite elegant

I think this is a good idea; the current approach of case normalisation isn't really a good one. I think I'll add a parser option to do this, e.g. participle.CaseInsensitive("Keyword").
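
For reference, the quoting convention described in the quoted comment can be sketched as below. This only illustrates the idea (single quotes for exact matching, double quotes for case-insensitive matching); it is not how pointlander/peg or participle implement it, and `Literal` and `ParseLiteral` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// Literal is a grammar literal plus a flag saying whether it matches
// tokens regardless of case.
type Literal struct {
	Text            string
	CaseInsensitive bool
}

// ParseLiteral interprets the quoting convention: 'text' is matched
// exactly, "text" is matched case-insensitively.
func ParseLiteral(spec string) (Literal, error) {
	if len(spec) < 2 || spec[0] != spec[len(spec)-1] {
		return Literal{}, fmt.Errorf("malformed literal %s", spec)
	}
	body := spec[1 : len(spec)-1]
	switch spec[0] {
	case '\'':
		return Literal{Text: body}, nil
	case '"':
		return Literal{Text: body, CaseInsensitive: true}, nil
	}
	return Literal{}, fmt.Errorf("literal %s must be quoted with ' or \"", spec)
}

// Matches reports whether token satisfies the literal.
func (l Literal) Matches(token string) bool {
	if l.CaseInsensitive {
		return strings.EqualFold(l.Text, token)
	}
	return l.Text == token
}
```

With this, the literal `"select"` matches `SELECT`, `Select` and `select`, while `'select'` matches only `select`, so no separate case-normalisation pass over the token stream is needed.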

@beebeeep
Author

I've been thinking about this for a while, and yes, it looks like this issue has too narrow a scope. Unfortunately I have no other ideas either, so I'd rather close the issue.

@alecthomas
Owner

I've added the CaseInsensitive() option.
