Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug: lexer codegen should validate token identifiers are valid #202

Open
maxcleme opened this issue Nov 18, 2021 · 3 comments
Open

bug: lexer codegen should validate token identifiers are valid #202

maxcleme opened this issue Nov 18, 2021 · 3 comments

Comments

@maxcleme
Copy link

Hi,

I've been playing with Participle and I wanted to try to codegen feature, however the generated code seems to not escape some characters, leading to uncompilable source.

See the following example :

// ...
} else if match := match:(l.s, l.p); match[1] != 0 {
	sym = -47
	groups = match[:]
} else if match := match;(l.s, l.p); match[1] != 0 {
	sym = -48
	groups = match[:]
} else if match := match|(l.s, l.p); match[1] != 0 {
	sym = -49
	groups = match[:]
} else if match := match,(l.s, l.p); match[1] != 0 {
	sym = -50
	groups = match[:]
} else if match := match@(l.s, l.p); match[1] != 0 {
	sym = -51
	groups = match[:]
}
// ...

My rules used :

{
  // ...
  {":", `:`, nil},
  {";", `;`, nil},
  {"|", `\|`, nil},
  {",", `,`, nil},
  {"@", `@`, nil},
  // ...
}

As far as I can tell it comes from https://github.com/alecthomas/participle/blob/master/lexer/codegen.go#L251

While I do agree that I could rename my Rule name to make the whole thing works, I think it might be a good idea to come up with some strategy in order to avoid this when generating code.

I'm fine with writting a PR if you have a idea for this.

Cheers !

@alecthomas
Copy link
Owner

Ah nice catch, though they are not valid rule names as only identifiers can be matched by the parser. This is really a validation error, but a PR would be welcome to apply the validation.

@maxcleme
Copy link
Author

@alecthomas

Alright about validation, but introducing the validation will create some breaking changes, isn't it ?

@alecthomas
Copy link
Owner

No, because using anything but identifiers has never worked. This has never come up before, I assume because nobody has tried to do what you did.

@alecthomas alecthomas changed the title bug: codegen should escape some characters bug: lexer codegen should validate token identifiers are valid Dec 1, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants