New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Explosive birth of automatic casts! #6074

Merged
merged 1 commit into from May 8, 2018

Conversation

@asterite
Contributor

asterite commented May 8, 2018

What

This PR lets the compiler match number literals against type restrictions that don't exactly match the expected type, but since the literal's value fits the type, it is automatically cast.

Additionally, a similar behaviour is provided to specify enum values by using a symbol.

All of this works as long as the call doesn't become ambiguous. The implementation was a bit tricky, but finally, and amazingly, it worked.

For a quick understanding of the first part (numbers), you can read what's being requested in #2995. With this PR, all of those snippets now compile.

But I'll write a few snippets here.

Method type restrictions

Numbers

def foo(x : Int8)
end

foo(1)    # OK, same as passing 1_u8, because 1 fits Int8
foo(1234) # Error
def foo(x : Float64)
end

foo(1) # OK, same as passing 1.0

Symbols -> Enums

enum Color
  Red
  Green
  Blue
end

def paint(color : Color)
end

paint(:red) # OK, same as passing Color::Red
paint(:yellow) # Error

Instance variables

class Foo
  @x : Int8

  def initialize
    @x = 0 # OK, because @x's type is specified, and 0 can pass as 0_u8
  end 
end

Class variables

class Foo
  @@x : Int8 = 0 # OK
end

Local variables

x : Int8
x = 0 # OK

Method default values, with restriction

enum Color
  Red
  Green
  Blue
end

def paint(color : Color = :red) # OK
end

paint # OK

Combined

record Vector, x : Float64, y : Float64, z : Float64

# Works because record will use those restrictions in initialize
Vector.new(1, 2, 3)

Some examples with the current standard library

File

file = File.open(__FILE__)
file.seek(-10, :end) # Same as passing IO::Seek::End

Process

Process.run("ls", output: :inherit) # Same as passing Process::Redirect::Inherit

String

# Well, this one doesn't work, it's missing the type restriction in the method, but it's super easy to fix
"Súper".upcase(:ascii) # => "SúPER"

What if it's ambiguous?

def foo(x : Int8)
end

def foo(x : Int16)
end

foo(1) # Error: ambiguous call matches both Int8 and Int16

When it doesn't work

def foo(x : Int8)
end

foo(rand < 0.5 ? 1 : 2) # Error

All of the above works only when passed directly to a method, or assigned directly to an instance/class variable. When putting intermediary expressions, or intermediary assignments, it stops working. But I think that's OK. We simply must document these rules. I also believe for example in Go you can pass number literals like that, but of course it stops working when you introduce intermediate expressions too.

Why

I can easily say this might be one of the most wanted features in Crystal. Having to type those number suffixes for no real reason, when the compiler could figure this out, or having to type long enum values, is super tedious, and kind of goes against Crystal's philosophy.

Converting symbols to enums might be controversial.

The important thing to understand here is that this is all sound and type safe. If I pass a symbol to a method expecting an enum, and the name can be matched (currently underscore is being invoked on both ends), then there's simply no confusion: the symbol must mean that enum member. If there's a typo in the symbol you get an error, exactly like what happens with the full enum member. The big advantage is that you get to type less, and also read less redundant information.

With the rule for numbers you can more easily write numeric and scientific code, and also code for games.

With the rule for symbols -> enum, one is actually more encouraged to use enums, because using them in APIs, when having to choose one option between many, one can still pass a symbol (but you'll have to type the full enum value if choosing the value dynamically, but that's as bad as now).

How

Implementing this was tricky.

First I started with number literals. My first idea was to give them a different type that would be compatible with other number types if the literal's value fits. That worked, but then this happened:

def foo(x : Int8)
end

def foo(x : Int16)
end

def foo(x : Int32)
end

foo(1) # Now matches all of them and ends up calling the first one

So I thought I could detect the ambiguous case, and that didn't work at all, because foo(1) still matches all overloads.

The solution then came: first try to match methods the old, usual way. If that matches, then all is good and should work like before. If no match is found, only then try again with this special types for literals. If just one matches (or, well, multiple overloads can match, but all of them must match against the same integer type... think about union types in other positions and how that can create a multiple dispatch), then we are good to go. If more than one matches (against different integer types), then it's an ambiguous call.

This is not the most efficient way to do it because when the rule applies, we end up doing two method lookups instead of one. But I think the number of places in a program where this will be used will be much less than the total number of calls where this rule won't be used. And, in any case, this just adds a small performance cost but compilation speed isn't important at this point.

After that, adding the symbol -> enum rule was simple.

Then I had to make sure the rule worked in all of the cases I documented above, which was a bit tedious, but it finally worked.

Fixes #2995
Fixes #6067

@asterite asterite self-assigned this May 8, 2018

@S-YOU

This comment has been minimized.

S-YOU commented May 8, 2018

This is great! Now C bindings code more natural to write.

x
end
foo(2147483647_i64)

This comment has been minimized.

@bew

bew May 8, 2018

Contributor

Why does this work?
I would expect that if the literal is explicitely typed, that type should be honored

This comment has been minimized.

@asterite

asterite May 8, 2018

Contributor

Yeah, maybe that shouldn't work, but I don't know, there's no harm in casting the value.

Plus, right now there's no concept of a number that's explicitly typed. Or, well, there's no difference between 1 and 1_i32, both parse to exactly the same thing, and are handled in the same way in the compiler. So maybe we should start making that distinction and only apply the rule for values that don't have a suffix...

It should be easy to implement, I might try that tomorrow. But I don't think this particular thing should block the entire feature from being merged.

This comment has been minimized.

@S-YOU

S-YOU May 8, 2018

Seems okay as long as type on a method or C fun is honored.

@sdogruyol

This comment has been minimized.

Member

sdogruyol commented May 8, 2018

Welcome back @asterite 🎉 🕺

@RX14

This comment has been minimized.

Member

RX14 commented May 8, 2018

I have to admit, I danced around the room a lot when I saw this pr.

My only concern is syntax: reusing symbol syntax for enum shorthand vs keeping type syntax and just omitting the full path to the enum type. I'm conflicted. I think a newcomer would see symbols being "converted" to an enum and be confused.

@bararchy

This comment has been minimized.

Contributor

bararchy commented May 8, 2018

@RX14 I too was weirded out by the symbols, but paint(Red) makes you think about name spacing issue

@asterite

This comment has been minimized.

Contributor

asterite commented May 8, 2018

@RX14 Yeah, that's the only concern about symbol -> enum: confusion. Maybe we can have a separate syntax... or we could simply remove symbols from the language and just use them for this.

But given that there's no ambiguity with this syntax, maybe it's fine.

Well, there's the "ambiguity" of having a method receiving both a Symbol and an Enum, but in that case the Symbol overload wins, of course. But I don't think anyone will write such code...

The other proposal about using Red instead of :red did have issues as Red could be resolving to an existing type other than the enum member.

And then, in Ruby it's very common to pass a symbol as an option. "FOO".downcase(:ascii) is also a thing in Ruby, and now we can also have that nice functionality and syntax in Crystal too... Except that in Ruby this even works when passing the symbol through a variable. And maybe that will confuse users, and I can imagine questions about this popping all the time in the future. But if it's well documented, you point them to the docs, they read it, and done. Plus it simply won't compile when passed through a variable, so there's no harm other than "Why do I get this error now? Oh, I see, got it".

@RX14

This comment has been minimized.

Member

RX14 commented May 8, 2018

My preference would be for Type style instead of :symbol style if we can sort out the conflict issues. I like how it's a direct reduction of the enum syntax instead of moving to another syntax (i'd rather not overload symbol syntax).

@asterite

This comment has been minimized.

Contributor

asterite commented May 8, 2018

Yeah, Type is another option, but it can get confusing if Type is also declared as a constant somewhere, and it happens to have the value of an enum member.

Plus I wouldn't know how to implement it. To solve a call, all of its arguments must be typed. But if Type doesn't exist in the current scope, you'll get an error before even trying to lookup method matches. Then we have to somehow give Type some type (but only if it's in a call position, or in an assignment), delay the error, still do the lookup, and if that doesn't match an enum type we must tell the user that Type isn't a type. And I'm not even sure that will work (as I said, I have no idea how to implement this).

Another option is to do something similar to what Swift does: prefix it with a dot. So:

"foo".upcase(.Ascii)

The type of .Ascii will be some kind of EnumMemberType that would match an enum type that has that member...

But I don't know, I still prefer the symbol syntax.

@asterite

This comment has been minimized.

Contributor

asterite commented May 8, 2018

Cool, this PR also makes the following work (maybe still a bit controversial for enums, but type safe 😛):

a = [] of Float64
a << 1
a << 2
p a # =>  [1.0, 2.0]

enum Color
  Red
  Green
  Blue
end

b = [] of Color
b << :red
b << :green
b << :blue
p b # => [Red, Green, Blue]

h = {} of Color => Float64
h[:red] = 1
p h # => {Red => 1.0}

Hey, it even makes this work:

# All of these currently give an error
[1, 2, 3] of Float64
[1, 2, 3] of UInt8
{1 => 2} of Float64
Set(Float64){1, 2, 3}

That's because the code is expanded to a series of calls.

And of course...

enum Color
  Red
  Green
  Blue
end

[:red, :green, :blue] of Color

This feature is definitely more powerful than the snippets I wrote in the beginning of this PR.

@RX14

This comment has been minimized.

Member

RX14 commented May 8, 2018

@asterite I assume swift chose it's syntax because for swift EnumType.EnumMember is it's enum syntax. For crystal, the equivalent to .EnumMember is ::EnumMember. But that syntax is taken so :EnumMember make sense. And making it case insensitive is a bonus.

So yeah, if using Type syntax is that much more effort, than symbol syntax is fine with me. Although removing symbols would make this less ambiguous (we can discuss that later, along with how to implement named tuples if we remove symbols)

I love all of those snippets @asterite!

@Sija

This comment has been minimized.

Contributor

Sija commented May 8, 2018

IMO there's way less ambiguity with :symbol notation than Type (and looks better :)).

@straight-shoota

This comment has been minimized.

Contributor

straight-shoota commented May 8, 2018

I like the idea to use a uppercase symbol and only match the exact same name as the enum member (case sensitive, camel case): :Symbol. It's syntactical equivalent to :symbol as already implemented in this PR, but makes it easier to discover enum member references by plain view (regular symbols are typically lower/snake case) and it's easier to understand that just the first part of the type path is omitted til the last colon.
That's pretty much identical to what Swift does, in Crystal the type path separator is just : instead of ..

@paulcsmith

This comment has been minimized.

Contributor

paulcsmith commented May 8, 2018

I like the symbol syntax with the same casing as the enum. Makes it super clear this is no "ordinary" symbol :) But I could also be ok with ordinary snake_case symbols as well. Either one is a huge improvement IMO.

A few ideas on making it easy for people to figure out what went wrong when using a symbol:

  • Show a "did you mean? :Red" when you misspell an enum but it is close to matching one.
  • If they pass a Var (or something else) instead of a SymbolLiteral: "You passed #{what_they_passed}, but you must pass a symbol literal or enum type for the '#{argument_name}' argument. Example: #{first_symbolized_enum}"

I think this would help eliminate some of the confusion that will surely come up and will stop people from banging their heads too long on typos

@ysbaddaden

This is perfect to me. Casting literals is great, but casting symbols as enum values too is 😍

Maybe just the autocast of an explicitly typed literal is a little surprising, but that can be changed later (or not).

@Sija

This comment has been minimized.

Contributor

Sija commented May 8, 2018

Let's merge, merge, merge! 🎉

end
end
raise "Bug: expected to find enum member of #{to_type} matching symbol #{symbol}"

This comment has been minimized.

@paulcsmith

paulcsmith May 8, 2018

Contributor

Some ideas for this error message: #6074 (comment)

Could easily be done in a future PR if you'd prefer to get this in ASAP

@RX14

This comment has been minimized.

Member

RX14 commented May 8, 2018

What happens in the case of

enum Foo
  Foo
  FOO
end

what does :foo match?

This, and readability, makes me want to make the symbols case sensitive.

@S-YOU

This comment has been minimized.

S-YOU commented May 8, 2018

enum Foo
  Foo
  FOO
end
I would prefer FOO automatically cast to Foo, and Foo and FOO should raise already declared error. EDIT: Nevermind, FOO and f_o_o can be confusing, I take my words back. (means no idea now)
@asterite

This comment has been minimized.

Contributor

asterite commented May 8, 2018

@RX14 Note that we already define query methods for enums, so there's already a conflict in that snippet (foo? will probably return true for FOO and not for Foo). We can probably disallow enum members whose underscore representation turns out to match that of previous enum members.

After that, there's no conflict about using the underscore version or the equal fersion. We still have:

module Color
  Red
  Green
  Blue
  CoolOne
end

color = Color::Red
color.red? # => true
color.cool_one? # => false

So if we have red? and cool_one? I see no problem in having :red and :cool_one for symbols. And I find underscore symbols like that easier to read and write.

We could go with :Red, and :CoolOne, I don't know... I don't find them as cute. Lower case symbols are very common in Ruby and I'm just used to that. We can have almost the same in Crystal too.

A few ideas on making it easy for people to figure out what went wrong when using a symbol

This is not trivial to implement. Maybe even harder than doing this whole PR because it involves heuristics. There might be multiple overloads that didn't match, and then we' have to check if any of the restrictions solve to an enum (maybe the restriction also involves other types in a union), and so on. I'd like to leave this for a separate PR.

@paulcsmith

This comment has been minimized.

Contributor

paulcsmith commented May 8, 2018

This is not trivial to implement. Maybe even harder than doing this whole PR because it involves heuristics. There might be multiple overloads that didn't match, and then we' have to check if any of the restrictions solve to an enum (maybe the restriction also involves other types in a union), and so on. I'd like to leave this for a separate PR.

That makes a lot of sense 👍

I'm so excited to see this and I agree underscores do look a lot better, but maybe that's because I've done so much Ruby/Elixir :)

@straight-shoota

This comment has been minimized.

Contributor

straight-shoota commented May 8, 2018

Lower case symbols are very common in Ruby and I'm just used to that. We can have almost the same in Crystal too.

That's why I'd like to have some slightly different visuals, to express that the argument is not just a regular symbol but in fact the short notation of an enum member.

@RX14

This comment has been minimized.

Member

RX14 commented May 8, 2018

OK i'm convinced about the problem already existing for the query methods, so there's no technical reason for :foo vs :Foo. The question is should we be trying to make them visually distinct from normal symbols or not? I don't really know any more so...

@asterite

This comment has been minimized.

Contributor

asterite commented May 8, 2018

The way I see it, symbols have very few use cases, enums are almost always better. And mixins APIs with symbols and enums is not common. In fact, I believe not a single public API method in the standard library uses symbols, at all. So I don't see a conflict of using symbols for this.

Again, we could go as far as removing symbols from the language, though that might not be necessary.

@asterite

This comment has been minimized.

Contributor

asterite commented May 8, 2018

I'm starting to like the idea of matching the symbol with the enum name. It means there's only one way to match the enum member, and also it's easier to search the enum from the member's name. It should also be just slightly faster to match in the compiler. Plus it does look like a shortcut form of the enum member.

But then, I'd like to get rid of the enum question methods, because they use the "undercase" variant. It would be nice to always have to use the member name.

Using == doesn't work because == is defined for all objects so a symbol will pass.

This works though:

struct Enum
  def is?(other : self)
    self == other
  end
end

enum Color
  Red
  Green
  Blue
end

color : Color
color = :Red
if color.is?(:Red)
  puts "Red"
end

case color
when .is?(:Red)
  puts "Red"
when .is?(:Blue)
  puts "Blue"
when .is?(:Green)
  puts "Green"
end

But maybe that's too much to type? Maybe it's fine. Or maybe people will write :Red and get unexpected results...

We could hardcode a rule for case where if the variable is of type enum then :Red must surely mean the enum member. But then it probably won't work if the variable is a union having that enum type...

Or we could just leave the current question methods...

@paulcsmith

This comment has been minimized.

Contributor

paulcsmith commented May 8, 2018

I personally think that isn't much more to type and is far less "magical". It's incredibly clear 👍

@RX14

RX14 approved these changes May 8, 2018

@oprypin

This comment has been minimized.

Contributor

oprypin commented May 8, 2018

Hmm, not so fast

This is not the most efficient way to do it because when the rule applies, we end up doing two method lookups instead of one. But I think the number of places in a program where this will be used will be much less than the total number of calls where this rule won't be used. And, in any case, this just adds a small performance cost but compilation speed isn't important at this point.

@oprypin

This comment has been minimized.

Contributor

oprypin commented May 8, 2018

I mean it may affect the compilation speed of the compiler. It may not be that big of a deal, but also changing to use the new syntax is not a big deal. Maybe it is worth doing only if the effect of it is taken into account.

@RX14

This comment has been minimized.

Member

RX14 commented May 8, 2018

I continue to believe that the solution is vastly reducing the amount of code we have to type (instantiations), not reduce the constant factor of each instantiation.

I don't think this PR will make a serious difference to compile times. If I'm wrong (someone can make a PR which uses this compiler change throughout the stdlib/compiler and measure it) we can revert it.

@asterite

This comment has been minimized.

Contributor

asterite commented May 8, 2018

Yeah, I don't think this will have a real impact, or be the bottleneck. Right now method instantiation and method body typing is what takes most of the time, method lookup should be pretty fast as it's a hash lookup and possibly a few type restriction matches, which should also be fast. And very little memory is allocated there compared to when a method is instantiated and all type bindings are created.

@RX14

This comment has been minimized.

Member

RX14 commented May 8, 2018

@asterite perhaps it'd be better just to not mention imperceptible performance changes, and only mention performance where you think it's an important topic, because it can be assumed that every feature will have some performance impact.

@asterite asterite deleted the feature/automatic_cast branch May 8, 2018

chris-huxtable added a commit to chris-huxtable/crystal that referenced this pull request Jun 6, 2018

@robacarp robacarp referenced this pull request Jun 11, 2018

Closed

Get ready for Crystal 0.25.0 #226

4 of 4 tasks complete
@Sija

This comment has been minimized.

Contributor

Sija commented Jun 21, 2018

Would be sweet if autocasting would also work in cases below:

require "logger"

def foo?(level : Logger::Severity)
  # ...
end

# works
foo? :info

# doesn't work & shouldn't
foo? :nonexistant

# doesn't work & should
foo? true ? :error : :info
foo? x = :error

Instead, it fails with:

Error in line 11: no overload matches 'foo?' with type Symbol
Overloads are:
 - foo?(level : Logger::Severity)

If not possible, it would be helpful to tweak compile error message at least, to something more informative and less misleading than the current one.

@ysbaddaden

This comment has been minimized.

Member

ysbaddaden commented Jun 22, 2018

I said 💩 . Don't mind me.

@luislavena

This comment has been minimized.

Contributor

luislavena commented Jun 22, 2018

@Sija from the OP:

All of the above works only when passed directly to a method, or assigned directly to an instance/class variable. When putting intermediary expressions, or intermediary assignments, it stops working. But I think that's OK.

The limitation was expressed in the OP, there is lot of complexity in trying to achieve that (also mentioned the current complexity of this).

Cheers.

@RX14

This comment has been minimized.

Member

RX14 commented Jun 22, 2018

The problem with the error messages is that we're using the same syntax for two ideas: symbols and enum values. I'd still like to discuss ways of removing symbols, to make the :enum_member syntax unambiguous, which would allow better error messages.

I think really the main blocker for removing symbols is the design of named tuples...

@asterite

This comment has been minimized.

Contributor

asterite commented Jun 22, 2018

Let's remove symbols, and make named tuples work only with strings.

@vladfaust

This comment has been minimized.

Contributor

vladfaust commented Sep 3, 2018

Hey @asterite, why don't you create a discussion for removing symbols and leave them for Enums only? I'm sure you're able to bullet some strong points.

@j8r

This comment has been minimized.

Contributor

j8r commented Sep 3, 2018

@vladfaust the point of removing the Symbol type is... to remove it totally, I don't see why keeping it specially for Enums.

@Sija

This comment has been minimized.

Contributor

Sija commented Sep 3, 2018

@j8r point is to have a shorthand for specifiying enum members without a need to pass whole path (:baz instead of Foo::Bar::Baz).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment