Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix edge cases with unicode method names #10978

Merged

Conversation

HertzDevil
Copy link
Contributor

Fixes #10970, where 😂 is considered an operator:

macro foo(x)
  {% p x %}
end

# Before
foo x.😂(1)  # => x 😂 1
foo x.😂 = 1 # => x 😂= 1

# After
foo x.😂(1)  # => x.😂(1)
foo x.😂 = 1 # => x.😂 = 1

Also fixes some other places where setter methods are currently detected via ascii_letter?:

  • Fixes parser failures with the following, as y.😂 is not recognized as a setter target inside a multi-assign:
    class Foo
      property 😂 : Int32?
    end
    
    y = Foo.new
    x, y.😂 = 1, 2
  • Fixes formatter failures with the following, as they are not considered setters:
    x._1 = 2
    x.😂 = 1

@straight-shoota straight-shoota added kind:bug A bug in the code. Does not apply to documentation, specs, etc. topic:compiler:parser labels Jul 21, 2021
@@ -3134,14 +3134,24 @@ module Crystal
Slice.new(@reader.string.to_unsafe + start_pos, end_pos - start_pos)
end

def ident_start?(char)
def self.ident_start?(char)
char.ascii_letter? || char == '_' || char.ord > 0x9F
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be ascii_lowercase? instead of ascii_letter??

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally yes. I think the lexer has already ruled out upper case letters as constants at any point where that might be relevant, though. So there should be no practical difference.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or... maybe not: 😆

"".@FOO # Error: can't infer the type of instance variable '@FOO' of String

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we might want to tighten that down. Probably better in a follow-up. char.ord > 0x9F looks also very permissive.

Copy link
Contributor Author

@HertzDevil HertzDevil Jul 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uppercase instance variable names are actually allowed in Ruby. Here are the other direct uses of ident_start? in the lexer:

  • Reading a symbol literal. :FOO is definitely a valid symbol.
  • Reading a percent string literal. This is probably acceptable, because %Q() is a valid string literal.
  • Reading a global variable. The only ones that can be declared are lib external variables, and the parser rejects ones starting with an uppercase letter.

Uppercase def names are also allowed in Ruby. Even in Crystal they are allowed for lib funs, so a Call node with an uppercase method name is not necessarily invalid. I don't think we need to do anything here.

Copy link
Member

@beta-ziliani beta-ziliani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@beta-ziliani beta-ziliani added this to the 1.2.0 milestone Sep 6, 2021
@straight-shoota straight-shoota merged commit c47fd24 into crystal-lang:master Sep 7, 2021
@HertzDevil HertzDevil deleted the bug/unicode-identifiers branch September 8, 2021 08:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind:bug A bug in the code. Does not apply to documentation, specs, etc. topic:compiler:parser
Projects
None yet
Development

Successfully merging this pull request may close these issues.

"undefined method" for unicode method name with arguments
4 participants