Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exhaustive case #8424

Merged
merged 29 commits into from
Mar 27, 2020
Merged

Exhaustive case #8424

merged 29 commits into from
Mar 27, 2020

Conversation

asterite
Copy link
Member

@asterite asterite commented Nov 2, 2019

Part of: #8001
Related: #4837, #4870

This PR is the initial step towards having case be exhaustive.

What that means is that the compiler must be able to prove that all when cover all cases of the expression given to case, or otherwise an else must be given.

As an initial step, if the compiler can't prove that all when are covered a warning will be issues. This will help the transition of shards authors. The warning will eventually become an error.

The compiler will be able to prove that all types in a union or single type are covered (not including inheritance because inheritance is open). It can also prove that all enum members of a given enum are covered.

Some things missing:

  • don't include Flags enums in the check
  • allow mixing enums without other types in the check
  • check exhaustiveness of case over a tuple literal
  • improve the warning message

but I didn't want to spend time and effort on those things until we all agree that this is the direction we want to take.

An example of how it works:

a = 1 || 'x' || "foo"
case a
when Int32
  # ...
when String
  # ...
end

In the example above the compiler will issue a warning saying that the case is not exhaustive and Char is missing to be covered.

Note that right now if you cover all cases the compiler won't issue a warning but there will still be an implicit else nil, but eventually that will change to else unreachable of some sort.

Then this PR makes all case be exhaustive by adding an else clause when needed. While doing that I felt:

  • some else were just "yeah, I know I didn't want to cover this case" (but what about someone other than me? would they know?)
  • some else made me stop and think "hm, why was this left out?"
  • some else I knew I ignored in the past, but this time around I couldn't remember why I did that and I left a "TODO" to check it
  • some else were missing a few types or enum cases and I added them for clarity
  • some code were using symbols and I decided to change them to enum so the compiler could check I covered all cases
  • I changed a few case to if when I considered it was more readable/clear

It was a bit tedious to add all the else in one go, but doing it as you code is probably not tedious and it can actually be helpful. I find that it's a bit tedious that the compiler will ask you to write an else unreachable for things that it knows can't be reached, so the state before this PR also has some bad things to it.

There were around 250 warnings in this entire repo. Considering that cloc says there are 218466 lines of pure Crystal code (excluding comments), it's 1 warning per 874 lines of code, which I think isn't that bad. Also 30 of those warnings were in the lexer when scanning identifiers, so the distribution might actually be like 1 warning per 1000 lines of code (well, it depends on how often you use case... let's say it depends on the case ;-)).

I also wanted to see what warnings I needed to fix for a couple of shards, so I chose shards, crystal-db and crystal-pg. To my surprise, no warnings were issued, which is a good sign.

But please, if you can, try the compiler in this PR against your project and see if the warnings make sense and are generally good to have.

@asterite
Copy link
Member Author

asterite commented Nov 2, 2019

Also thinking a bit more about this, I asked myself why make case be exhaustive but not if, forcing an else? I think one answer might be that with case you are potentially covering a lot of cases and you might forget one. With an if it's just the truthy case and the falsey case and so it's easier to analyze or understand why an else was left out.

@jhass
Copy link
Member

jhass commented Nov 2, 2019

How does this handle something like

a = 1 || "foo" || :bar 
case a
when Int32
  puts a*2
when "foo"
  puts "yay"
end

?

@asterite
Copy link
Member Author

asterite commented Nov 2, 2019

@jhass the compiler only checks types and enum members, never other values. So in that case you'll have to add an else, even if you cover "foo", :bar and Int32.

@jhass
Copy link
Member

jhass commented Nov 2, 2019

And that's why I feel having a distinct construct that can only have types xor enum members in its "when" branches wouldn't be so bad...

@ysbaddaden
Copy link
Contributor

Isn't that a little too broad? Enforcing exhaustiveness when it can be formally verified is great, maybe boring, but good nonetheless since it can avoid else raise unreachable branches and help catch bugs down the road (adding type to union, value to enum) but maybe it could be skipped for unverifiable cases, because it replaces raising branches for empty else branches for all mixed cases, which kinda defeats some of the purpose.

I'm not advocating for another keyword or construct. I may even be worrying about something barely existent.

@jhass
Copy link
Member

jhass commented Nov 2, 2019

I'm not advocating for another keyword or construct. I may even be worrying about something barely existent.

I am because to me case really is a more convenient way of writing an if/elsif chain, it's "do something in case Y is true for X". But "Do something for each variant of X" is quite a different concept actually.

@asterite
Copy link
Member Author

asterite commented Nov 2, 2019

Well, if you all look at my original proposal in #8001 I said I prefer a new keyword or a new construct.

But then:

So so far me, @straight-shoota, @RX14 and @bcardiff agree with this change, and that's a big part of the core team.

That said, we have a slightly different option, inspired by the new pattern matching introduced in Ruby 2.7.

The trick is to avoid introducing a new keyword (thus introducing a breaking change) by using something other than when. Ruby uses in, but I'd like to not use that in case we eventually support something similar to pattern matching. Instead, we can use of or maybe another keyword.

So, the idea is:

# the case we all use and love
case exp
when ...
when ...
end

but:

# the exhaustive case
case exp
of T1
of T2
end

This new case ... of would only allow types after of, or symbols or call? for enum members. So it's more limited than the general case, but it makes sense because it's limited to types and enum members.

However, I'm not entirely sold on this. My main concern is that sometimes you want to mix types, values and so on, and still avoid having to remember to put an else. That is, I think that requiring an else is a bit more boring but leads to code that's easier to understand (why did they omit the else clause? no idea; but with a comment it can be clearer. Of course nobody is forcing you to use a comment but it can be a good practice). And forcing an else forces the programmer to think about this case. Many times I just said "meh, I'm sure I don't care about this else" but when the compiler forced me to add it I didn't want to leave it empty and unclear, so I either added a comment or an explicit value or logic.

Another concern is that it's not a new keyword that can conflict with others, but you do have to learn something new.

The way it's proposed in this PR is very simple:

  • the compiler should be able to prove that case covers all cases
  • if that's not the case, an else is required

Also think about this not as a change to the language but as if it were like that from the beginning. That is, let's think about 1.0 as something final and ready to be used, and not as a cumulative changes between 0.x and 1.0.

@asterite
Copy link
Member Author

asterite commented Nov 2, 2019

Oh, I forgot to include the pun I had in my head for like half an hour until I got to the computer: I think this conversation is pretty much exhausted.

@vlazar
Copy link
Contributor

vlazar commented Nov 2, 2019

The trick is to avoid introducing a new keyword (thus introducing a breaking change) by using something other than when. Ruby uses in, but I'd like to not use that in case we eventually support something similar to pattern matching. Instead, we can use of or maybe another keyword.

# the exhaustive case
case exp
of T1
of T2
end

Can it be

# the exhaustive case
case exp
with T1
with T2
end

I like with for the same width 😉

@asterite
Copy link
Member Author

asterite commented Nov 2, 2019

More data. I tried it in:

  • kemal: just 1 warning reported, and I think it's good because it's not clear why the else part was ignored, or maybe it should have been an error, I don't know
  • invidious: 25 warnings in 10166 lines of code, so 1 warning per 406 lines of code. And most, if not all, of those warnings made sense to me: it was case over some strings that were received as query params, and when they didn't match an expected value the value was simply ignored. I think it would be better to return "Bad request" in those cases. In other cases an else was missing and it was not clear why it was left out, where other values could be possible too.
  • mint: 44 warnings in 18235 lines of code, so 1 warning per 414 lines of code. By the way, this is also a compiler. The else that were left out, I don't know why it's like that. Most of them just have one when so an is_a? might have been better. But it's still not clear why an else wasn't handled (maybe raise, or make it clearer).

So my conclusion is that:

  • non-exhaustive case happens, but it's not that common (once every 400 lines)
  • it gives you an opportunity to think about the else case
  • it makes it clearer to the reader what happens in the else case because it's always there (I know you can always leave it empty or put nil, but it feels like cheating, and we can be sure the user at least was noted about a missing else)
  • it's very simple to fix

Copy link
Contributor

@RX14 RX14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is amazing! Thank you so much!

# we must cover.
# Note: a user could override the meaning of such methods.
# In the future it would be wise to mark these as non-redefinable
# so this checks are sounds.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Enum constructors should reject invalid values by default too, of course.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this mean?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oooh... I see. Yes, probably.

@@ -6,47 +6,47 @@ describe "ECR::Lexer" do
lexer = ECR::Lexer.new("hello")

token = lexer.next_token
token.type.should eq(:STRING)
token.type.should eq(ECR::Lexer::Token::Type::String)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Weird idea, perhaps introduce .should be(&.string?) to the DSL?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'd like something like that but we need to think how to implement it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just be(& : T -> Bool)??

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I don't understand how would that work. Also with & it can't work, it'll need to capture the block, but there's no way to figure out what the T is.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ohhh, yeah, you're right...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ended up doing a trick used by @bcardiff in one of his PRs: use a file-private method to convert a symbol to enum. I think it's a good workaround, and usually you only need this in a spec file.

@@ -207,12 +209,16 @@ module Crystal

def source_lines(filename)
case filename
when Nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when nil should be the same as when Nil to the exhaustiveness checker.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, done!

@asterite
Copy link
Member Author

Let me know if you'd like me to move forward with this. /cc @bcardiff @straight-shoota @waj

@RX14 RX14 marked this pull request as ready for review November 18, 2019 18:30
@RX14
Copy link
Contributor

RX14 commented Nov 18, 2019

(whoops, thought that was the button to add my review)

@bcardiff
Copy link
Member

I do prefer an explicit construct for exhaustive checks.

Having a separate syntax could:

  • allow an easier grasp of the original intention
  • we can omit the current flexibility like when .m? or when e1, e2
  • include sooner support of the main cases: Enum, Union, Type hierarchy
  • allow future improvements like pattern matching like as long as the exhaustivity can be checked

My opinion on #8001 (comment) was to avoid different semantics on case/when depending on the type of the condition.

@asterite
Copy link
Member Author

I do prefer an explicit construct for exhaustive checks.
allow future improvements like pattern matching like as long as the exhaustivity can be checked

Sounds good.

What do you think if we use the same syntax as in Ruby?

case type
in Foo
in Bar
end

But I don't know how we'll make it work for enums. Maybe we can allow this:

case value
in :foo
in :bar
end

and translate that to:

if value.is?(:foo)
elsif value.is?(:bar)
end

and implement Enum#is?(self)

But for now we can focus on the syntax. case ... in is what Ruby 2.7 uses for pattern matching.

@bcardiff
Copy link
Member

bcardiff commented Nov 19, 2019

case ... in sounds good. I would leave out introducing variables in the scope for now.

I don't follow the issue with enums. Whatever syntax is supported to express the expected value, I would not expect that to affect much. What am I missing?

@asterite
Copy link
Member Author

@bcardiff How do you propose expanding case ... in as a series of if/elsif? Right now when T is expanded to if exp.is_a?(T) which makes type flowing working.

So if we are going to go with the case ... in syntax I'd like to see a complete proposal: what kind of expressions are allowed, how does it work with enums, aretrue and false allowed, and what expansion (if any) happens.

@wooster0
Copy link
Contributor

wooster0 commented Nov 19, 2019

Why not is instead of in? Wouldn't that be much closer to what it's translated to? hello.is?(:hello).

@bcardiff
Copy link
Member

@asterite In the implementation aspect: I would not do an expansion. I would leave the case until the codegen phase. I think that if later we introduce more a pattern matching way this will be more comfortable than rewriting to if/elsif. So the check can be done with the (Exhaustive)Case AST. The codegen itself can be expressed in terms of the if/elsif methods if wanted.

Regarding the complete proposal itself I would scope it to:

case cond # cond : T for any T
in expr_1
   ...
in expr_2, expr_3
   ...
end

Note: else is not allowed.

All expr_i should be either

a. types
b. enum values (expressed as .value? or E::Value)

for (a), if T - Union(expr_i) != ∅ there is a compiler error.
for (b), if there is an enum value that is not mentioned, there is compiler error.

If (a) and (b) are mixed it is a compiler error (note that this may be detected far after parsing, since Foo::Bar can be either a type or a value).

If in (a) a hierarchy of types is used it would be ideal to check using the more specific one first, but otherwise, we can use the explicit order. So some cases might not be reachable.

WDYT? I think that covers the main use cases.

@asterite
Copy link
Member Author

Sounds good to me, but comments follow:

In the implementation aspect: I would not do an expansion. I would leave the case until the codegen phase.

Okay, but this sounds like making the compiler's code even more complex... but if we go this route, I'm not going to implement soon (no time these days for such a big task, sorry).

enum values (expressed as .value? or E::Value)

Oh, I thought we were going the symbol route, so you could do:

enum Color 
  Red, Green, Blue
end

color = ...
case color
in :red
  # ...
in :green
  # ...
in :blue
  # ...
end

but I guess using .red? works fine too (though I'd really like to remove those question methods and just have an is?(:red) method that would work the same.

If in (a) a hierarchy of types

I don't think we should check exhaustivity in hierarchy types because external code can add new types and that would mean it breaks existing code:

class Foo
end

class Bar < Foo
end

foo = FooFactory.build
case foo
when Foo
  # ...
when Bar
  # ...
end

# Oops, someone adds a new type, breaks the above code:
class Baz < Foo
end

What about true and false for Bool? Right now that's supported in this PR and this passes the exhaustiveness just fine:

crystal/src/http/client.cr

Lines 142 to 149 in 7ffc459

@tls = case tls
when true
OpenSSL::SSL::Context::Client.new
when OpenSSL::SSL::Context::Client
tls
when false
nil
end


Then @RX14 mentioned nil as something valid to filter out Nil (they are equivalent but I wouldn't mind supporting both).


Also, maybe we should discuss all of this in a separate issue? Not sure.

@straight-shoota
Copy link
Member

Also, maybe we should discuss all of this in a separate issue? Not sure.

Yes please.

@bew
Copy link
Contributor

bew commented Nov 19, 2019

Why not is instead of in? Wouldn't that be much closer to what it's translated to? hello.is?(:hello).

I like that!
Something like:

case cond
is Foo
  # code
is :bar
  # code
is false
  # code
end

Feeld really nice to read and understand imo

@RX14
Copy link
Contributor

RX14 commented Nov 20, 2019

We always agree stuff then someone implements it and then someone comes in at the last second and says "wait no throw that all away".

Can we not derail this feature again? I've been guilty of it too, but, it shouldn't happen.

@asterite
Copy link
Member Author

We always agree stuff then someone implements it and then someone comes in at the last second and says "wait no throw that all away".
Can we not derail this feature again? I've been guilty of it too, but, it shouldn't happen.

I agree.

Many of the new modern languages have a form of exhaustive case, and only that. I don't understand why we want to have two similar syntaxes if one can be enough. Adding else nil or else # comment isn't that terrible, in my opinion. Two syntaxes will bring more confusion and will make the compiler and language harder to use.

@ysbaddaden
Copy link
Contributor

I have a failing case in Minitest:

result.failures is an Array(Assertion | Skip | UnexpectedError), all types of the union are present, in the when branches, yet Crystal 0.34 complains...

@oprypin
Copy link
Member

oprypin commented Apr 7, 2020

For permalink, replace "master" with "b52e34584a68248e489aad9dfafd6235d4f3fce4" above.

@asterite
Copy link
Member Author

asterite commented Apr 7, 2020

@ysbaddaden It's working fine. When you do Assertion | Skip | UnexpectedError it just means Exception. According to the typing rules, when you mix different types under a same hierarchy, the resulting type is the base type.

That's not to say that it's a nice situation. But it's not a bug in the exhaustive check.

Essentially, you can replace Array(Assertion | Skip | UnexpectedError) with Array(Exception), for the compiler both are the same.

@RX14
Copy link
Contributor

RX14 commented Apr 7, 2020

Wouldn't this show up if you printed the types?

@asterite
Copy link
Member Author

asterite commented Apr 8, 2020

The error already tells you that Exception is not covered.

@ysbaddaden
Copy link
Contributor

ysbaddaden commented Apr 8, 2020

Urgh, really? I didn't care much about this change (near zero benefit) but now it's totally falling into the annoying, whiny, compiler (negative benefit). Whenever I write a case I'll probably have to add an else # shut up, crystal or keep else raise "unreachable", which defeats the whole purpose of exhaustive checks 😭

This is almost as annoying as being forced to specify the return type when implementing abstract methods. This is so boring and sometimes unacceptable, that I usually remove the return type annotation of abstract method —IMO abstract methods should just be informational, never enforced.

Overall, warnings are just annoying. I much preferred the "no warnings" stance.

I'm not fond of where this is all going. The nice & fun of Crystal is slowly eroding away 😞

@asterite
Copy link
Member Author

asterite commented Apr 8, 2020

@ysbaddaden I have an idea for not merging the types into the base type when there's a type annotation in instance or class variables, and that will at least don't force a useless else in that case.

Where there many places where you needed to add an else?

@ysbaddaden
Copy link
Contributor

ysbaddaden commented Apr 8, 2020

@asterite I'm checking some other projects, but so far, yes, everywhere I use a case I must specify an empty else branch. This is either because of A+ merges, or because I have lots of sub-classes (e.g. Node or Event) and I only ever care about a few of them in specific places, or use case because it improves readability.

I thought I could benefit from exhaustiveness checks in a few places, replacing Node with specific unions (if Node+ merging was to be corrected for exhaustive cases), but looking closer, I don't think so.

I admit I abuse case sometimes, because it greatly improves readability (not too often). For example:

if (value = node.value).is_a?(Something) # bad: too many nested actions on single line
end

case value = node.value   # much better (to me)
when Something
end

I also have lots of:

case something
when .foo?, .foo2?
when .bar?
end

Where I must now specify en empty else branch (or else raise "unreachable") or rewrite it with regular, but less readable elsif branches:

if something.foo? || something.foo2?
elsif something.bar?
end

So far, the change is just as even more awful as than I thought it would be.

@straight-shoota
Copy link
Member

Yeah, I found fixing these warnings a bit annoying, too. But to put that into perspective, it actually wasn't too frequent, or less than I would have expected. Obviously, we're currently seeing lot's of this because of the initial adaptation. This is going to ease down quickly.
I also found it quite helpful to be forced to think about "what happens else?". If you don't spray out else # shut up Crystal! but more articulated comments like else # skip for anything else, it also greatly improves understanding the ocode for others/later me. And that is IMO the major benefit for having required else even on cases where exhaustiveness can't ever be proven.

@ysbaddaden
Copy link
Contributor

ysbaddaden commented Apr 8, 2020

I just fixed a SDL project. Crystal wants to check the exhaustiveness of the keyboard symbols enum (236 symbols) and of the event types (23 classes):

loop do
  case event = SDL::Event.wait
  when Event::Quit
    exit
  when Event::Keyboard
    case event.sym
    when .q? then exit
    when .s? then save_screenshot
    else    # skip 234 symbols
    end
  else
    # skip 21 event types
  end
end

So far, my impression is confirmed: unless you specifically program for exhaustive checks, you will be annoyed by the warnings, and forced to write an empty else to turn 'em off (or ignore them).

@asterite
Copy link
Member Author

asterite commented Apr 8, 2020

Well, if you take a look at my original proposal in #8001, I proposed using a different keyword, case!, for this. But nobody liked it... only because case! looks weird.

I still think having two constructs is better, and we have time to change this... if everyone else agrees. I personally think case! looks more assertive, so it being exhaustive makes sense to me.

@asterite
Copy link
Member Author

asterite commented Apr 8, 2020

Also case! could have been implemented in one go, already giving an error, because such construct didn't exist before. Also, it wouldn't cause warnings or breaking changes, exactly for the same reason.

I know case! looks weird, no other languages have such keyword. Does it matter? Do we always need to copy other languages? Look at SQL: "alter table foo convert to character set utf8". So many keywords stringed together, yet nobody complains that "oh, it's weird to have a keyword with spaces".

@j8r
Copy link
Contributor

j8r commented Apr 8, 2020

It would be a feature to looks weird, like not_nil! - this can discourage the use of it. ! is an exclamation mark at the end, usually used for warnings in the real world.
That's why I proposed it for non-exhaustive check instead.

@ysbaddaden
Copy link
Contributor

Sorry for the rant. I'll see whether it's boring to use compared to annoying to fix. In Rome, for example, it's not that bad. There are a bunch of Symbol|String cases and it works properly 👍

I got another false positive, thought, this time while iterating an Array(Tuple(Symbol, Symbol) | String): https://github.com/ysbaddaden/rome/blob/master/src/adapter/adapter.cr#L255-L265 (can't prove case is exhaustive).

The SDL example was expected to make the feature look awkward: a human wouldn't consider so many missing classes and enum symbols, but how should a computer program deal with ithis? It would be awkward and confusing to use arbitrary ratios (ratios for my example are 8.3% and 0.85%).

I still stand by my original comment: it's too broad. The compiler shouldn't whine when it can't prove exhaustiveness. That's a job for a linter tool, which is meant to complain. Otherwise it's annoying.

@asterite
Copy link
Member Author

asterite commented Apr 8, 2020

That tuple syntax shouldn't work. I'm surprised it works at all. The special tuple syntax only works if you specify a tuple literal in the case expression. In your case, you can use Tuple(Symbol, Symbol) and it should work fine and the compiler shouldn't complain.

@asterite
Copy link
Member Author

asterite commented Apr 8, 2020

For big enums we could always have a way to mark them as "don't check for exhaustiveness".

@oprypin
Copy link
Member

oprypin commented Apr 19, 2020

https://travis-ci.org/github/oprypin/crsfml/jobs/677030266#L1597 as an example of enums where exhaustive case is never useful.

@asterite
Copy link
Member Author

@oprypin I'm not sure what we can do with that. I guess you can add else at the end? Otherwise, if we disable the check for some enums (with an annotations) we need to go back to the implicit else nil, but that will make case have two modes which is very confusing.

Any other idea in mind?

@oprypin
Copy link
Member

oprypin commented Apr 19, 2020

Your idea!

enums we could always have a way to mark them as "don't check for exhaustiveness".

@oprypin
Copy link
Member

oprypin commented Apr 20, 2020

The thing about "implicit else nil" is that it doesn't matter for the situation where the nilable expression is never used. The expression result of case is almost never used for these particular enums.
It's not such a big difference in the mode then, the compiler just lets you compile something that otherwise wouldn't.

@asterite
Copy link
Member Author

Oh, my idea was to always require an else for open enums like http status. After thinking about it a bit more, I can't see how not requiring exhaustiveness in some cases will work because then you have two modes of operations.

For now that redudnant else will have to stay there..

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.