Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do we have keywords? #463

Closed
michaelpj opened this issue Jan 8, 2019 · 9 comments
Closed

Do we have keywords? #463

michaelpj opened this issue Jan 8, 2019 · 9 comments

Comments

@michaelpj
Copy link
Contributor

Are syntax identifiers like lam, error etc. keywords?

  • The spec says no: there is nothing in the grammar that says they're invalid identifiers.
  • The parser says yes: you'll get an unexpected token error if you try to use one
    • echo "(program 1.0.0 (lam error (forall a (type) a)) error" | plc typecheck --stdin
    • Unexpected 'error' at 1:21
@kwxm
Copy link
Contributor

kwxm commented Jan 9, 2019

Well the spec says that there's no concrete syntax, which would tend to imply that there aren't any keywords either!

The grammar's so rigid that there's surely no danger of conflicts if we do allow re-use of keywords, but looking at the parser/lexer source it looks as it might require quite a lot of rewriting to do that, since it does have to treat 'keywords' specially when they appear in certain positions, but when you tell the lexer that they are keywords, it'll complain if it finds them elsewhere.

Another issue here is that now that we have extensible builtins we've kind of lost control of the grammar: some external person might unwittingly implement a new builtin called wrap for example, and run into problems. However, do we really expect anyone else to be using the concrete syntax?

@michaelpj
Copy link
Contributor Author

Well, in this instance it bit me because I had a generated AST with error as a variable name, which I then pretty printed and attempted to feed into plc typecheck. That didn't go so well. However, I have been trying to preserve the property that my generated ASTs have valid names so I can do this sort of thing, so if we want to keep these as keywords I'll just have to update my name-mangling function to handle them.

@vmchale
Copy link
Contributor

vmchale commented Jan 9, 2019

Modifying the lexer/parser would get us away from standard methods - this is pretty much exactly what you do in other languages. But it would be possible if we really want compliance with the spec, I guess - it seems to me a little weird.

@kwxm
Copy link
Contributor

kwxm commented Jan 9, 2019

Easy solution: keywords like (con, (lam, (wrap,... !

Is that the sound of distant screaming I hear?

@BekaValentine
Copy link

I think the easiest solution is to just make con, lam, etc. into keywords. This also has the additional benefit of eliminating potential confusion. If you can have a variable named lam or something to that effect, then it might be a security vulnerability, b/c at least in passing, these two things look awfully similar:

(lam x x) and [lam x x]

I'm sure that some clever person could exploit that to their advantage in writing malicious obfuscated code.

I propose we make all the relevant things into keywords and any case where the generated ASTs would try to use those names, we just append some unique number or an underscore or some junk like this.

@vmchale
Copy link
Contributor

vmchale commented Feb 7, 2019

Can I close this? Or do we want to note this somewhere in the spec?

@michaelpj
Copy link
Contributor Author

If we have keywords then IMO that should be in the spec. I don't buy that there is "no concrete syntax" - the figure says "lexical grammar of Plutus Core" which sure sounds like it's telling us how to tokenize it!

IMO builtin names should also not be keywords, but that would be a parser change. We know that we're getting a builtin name from context, so we can just parse an arbitrary identifier, try to map it to a builtin, and fail if it's missing.

@michaelpj michaelpj assigned kwxm and unassigned BekaValentine Mar 22, 2019
@kwxm
Copy link
Contributor

kwxm commented Mar 22, 2019

I'm just in the process of trying to get the spec sorted out, so I'll deal with this shortly.

@kwxm
Copy link
Contributor

kwxm commented Apr 2, 2019

I've now modified the spec to say that con, lam, etc. are all keywords (it's not merged yet though). I agree that names of builtins shouldn't be keywords, but that shouldn't be a problem now that we always have to prefix them with the builtin keyword everywhere. Also, we'll have to modify the parser anyway when we get the new infrastructure for extensible builtins. I'm going to close this issue but I'll add a note to the one for extensible builtins.

@kwxm kwxm closed this as completed Apr 2, 2019
@kwxm kwxm mentioned this issue Apr 2, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants