Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Breaking change: use a more precise split point for refutable patterns #53167

Closed
stereotype441 opened this issue Aug 9, 2023 · 13 comments
Closed
Labels
area-language Dart language related items (some items might be better tracked at github.com/dart-lang/language). breaking-change-approved breaking-change-request This tracks requests for feedback on breaking changes

Comments

@stereotype441
Copy link
Member

stereotype441 commented Aug 9, 2023

Background

Dart's flow analysis engine is capable of extending the scope of a type promotion based on the fact that the code along a control flow path doesn't complete normally. For example, in the code below, the "then" branch of the if statement doesn't complete normally, so after the if statement, the programmer may assume that the "else" branch was token (and thus i is known to be non-null):

bool getBool(Object o) => ...;
void consumeInt(int i) { ... }
test(int? i) {
  if ( // (1)
      getBool('foo')
      // (2)
      || i == null) {
    throw StateError('i is null');
  } else {
    consumeInt(i); // OK; `i` is promoted to non-nullable `int` in this path
  }
  // (3)
  consumeInt(i); // OK; `i` is still promoted
}

The way this works internally is that for each point where two control flow paths join together (known as a join point), there is a corresponding split point (the point where the control flow paths diverged). Flow analysis examines the code between the split point and the join point, along both control flow paths; if the code on one of those paths can complete normally but the other can't, then promotions are kept from the control flow path that can complete normally. So in the above example, since the "then" branch doesn't complete normally, the promotion from the "else" branch continues to apply after the if statement is over.

Due to the short-cutting of && and || expressions, the exact split point is tricky to compute. For example, in the code above, the join point at (3) corresponds to a split point at (2), because getBool('foo') will be evaluated in all circumstances, but i == null will only be evaluated if getBool('foo') returned false. In order to avoid having to deal with this subtlety, flow analysis computes split points in an approximate fashion; for example in the above code, the split point it actually uses is (1) rather than (2).

This is a sound approximation, and it's very rare that it's noticeable to users. The only way a user would notice the difference is if the code between (1) and (2) didn't complete normally, for example if getBool('foo') were replaced with getBool(throw UnimplementedError()):

bool getBool(Object o) => ...;
void consumeInt(int i) { ... }
test(int? i) {
  if ( // (1)
      getBool(throw UnimplementedError())
      // (2)
      || i == null) {
    throw StateError('i is null');
  } else {
    consumeInt(i); // OK; `i` is promoted to non-nullable `int` in this path
  }
  // (3)
  consumeInt(i); // ERROR; `i` is no longer promoted
}

With this change, since flow analysis uses (1) as the approximate split point, it deduces that both the "then" and "else" branches unconditionally throw an UnimplementedError, so it has no reason to keep promotions from the "else" branch after the join at (3).

Note that the point where the compile error is issued is now unreachable, so there's no problem from a soundness point of view. In principle it might be slightly annoying to users that introducing throw UnimplementedError() in one location causes a compile-time error to surface at a seemingly unrelated location, but I'm not aware of any customers being bothered by this in practice.

Change Intent

This change makes the computation of split point more precise when refutable patterns are in use, so that the approximate split point is at the beginning of the pattern (or sub-pattern) that triggers the join point top level pattern. Previously, the split point of the innermost enclosing control-flow structure was used instead. For an "if-case" statement, this was the beginning of the scrutinee expression; for a switch statement or switch expression, it was the beginning of the innermost enclosing case there is no behavior change. For example, consider the following code:

int getInt(Object o) => ...;
void consumeInt(int i) { ... }
test(int? i) {
  if (
      // (1)
      getInt('foo')
      case
          // (2)
          int()
      // (3)
      when i == null) {
  } else {
    // (4)
    consumeInt(i);
  }
}

In this example, there are two control flow paths leading to the join point (4):

  • A path that is taken if the value returned by getInt(...) fails to match the pattern int(), AND
  • A path that is taken if the value returned by getInt(...) does match the pattern int(), but the expression i == null evaluates to false.

The corresponding split point is at (3).

The first of these control flow paths can't complete normally, because getInt returns an integer, and so the pattern int() is guaranteed to match. The second control flow path promotes i to non-nullable int, and can complete normally. Therefore, the promotion is kept after the join, and the call to consumeInt is allowed.

However, in Dart 3.1, flow analysis uses (1) as the approximate split point, rather than the true split point of (3). So if getInt('foo') is replaced with getInt(throw UnimplementedError()), then neither control flow path is considered to complete normally, so the promotion is lost and the call to consumeInt becomes a compile-time error.

If the proposed change is made, the split point will be at (2) instead of (1), so replacing getInt('foo') with getInt(throw UnimplementedError()) will have no effect on type promotion.

Note that as with the previous example, the difference is only significant inside unreachable code, so there is no soundness issue.

Justification

The major rationale for this change is to simplify the implementation of flow analysis, by allowing split points for patterns to be determined directly, by simple lexical rules, rather than indirectly through a complex Reachability class. This in turn should make it easier to complete the flow analysis features I'm currently working on, such as:

I believe it will also be helpful with some future work I have planned, to improve compiler performance by simplifying flow analysis data structures.

The minor rationale for this change is that from time to time, some programmers will temporarily introduce a throw expression into otherwise normal code, e.g. to double check that code is covered by a test, or as a placeholder for logic that hasn't been written yet. By moving the split points used for patterns so that they're closer to the true split points, we reduce the chances of temporary throw expressions causing nuisance compile errors.

Impact

Since the change only affects regions of code that are dead, it won't affect runtime behavior. But it could conceivably cause a compile-time error to appear where there was previously no compile-time error. Since programmers rarely write dead code on purpose, I expect this to be very rare.

Also, since the change causes promotions to be kept in situations where they previously weren't kept, it will tend to result in more precise types rather than less precise ones, and more precise types seldom lead to compile-time errors.

However, it's possible to construct contrived examples where there would be a compile-time error. For example:

int getInt(Object o) => ...;
void consumeInt(int i) { ... }
test(int? i) {
  if (getInt(throw UnimplementedError()) case int() when i == null) {
  } else {
    var j = i;
    j = null;
  }
}

In Dart 3.1, since i is not promoted in the dead else branch, j gets an inferred type of int?, so the assignment j = null is valid. With the proposed change, j would get an inferred type of int, and so there would be an error.

Mitigation

If you don't have any dead code in your project, you won't be affected. If you don't use patterns yet, you won't be affected.

If you do have dead code in your project, and you use patterns, there is a small chance that you will see a compile time error as a result of more precise inferred types in regions of code that are dead. If this happens, you will be able to mitigate the problem by deleting the dead code, or introducing explicit types to restore the old behavior.

For example, in the above code, var j = i; could be changed to int? j = i; to avoid any change in the type of j:

int getInt(Object o) => ...;
void consumeInt(int i) { ... }
test(int? i) {
  if (getInt(throw UnimplementedError()) case int() when i == null) {
  } else {
    int? j = i;
    j = null;
  }
}
@stereotype441 stereotype441 added the breaking-change-request This tracks requests for feedback on breaking changes label Aug 9, 2023
@itsjustkevin
Copy link
Contributor

@grouma @vsmenon @Hixie could you take a look at this breaking change request?

@mit-mit
Copy link
Member

mit-mit commented Aug 10, 2023

SGTM

@itsjustkevin
Copy link
Contributor

@stereotype441 could we add the relevant CLs here? It may be useful for contributors who are following the efforts here.

@stereotype441
Copy link
Member Author

@stereotype441 could we add the relevant CLs here? It may be useful for contributors who are following the efforts here.

Absolutely. I will add the CLs as I create them.

@stereotype441
Copy link
Member Author

@stereotype441 could we add the relevant CLs here? It may be useful for contributors who are following the efforts here.

Absolutely. I will add the CLs as I create them.

My work in progress CL is here, if anyone wants to follow along: https://dart-review.googlesource.com/c/sdk/+/319381

I still need to add tests, comments, and a CHANGELOG entry (not to mention a commit message 😃), and to test the change against the internal codebase before landing it.

@grouma
Copy link
Member

grouma commented Aug 10, 2023

SGTM

@mkustermann mkustermann added the area-language Dart language related items (some items might be better tracked at github.com/dart-lang/language). label Aug 11, 2023
@stereotype441
Copy link
Member Author

Note: due to a mis-reading of the code, I previously thought the change applied to both top level patterns and their sub-patterns. In point of fact, it only applies to top level patterns. This makes the change even smaller in scope; all it does is move the split point for if-case statements so that it's the beginning of the pattern rather than the beginning of the scrutinee expression. Switch expressions and switch statements aren't affected.

I've updated the description above accordingly. All the examples in the description above are still correct.

Folks who have already approved this change, I apologize for the churn. I don't believe you need to need to re-evaluate your approval (since the scope has strictly decreased), but feel free to double check my thinking on this!

copybara-service bot pushed a commit that referenced this issue Aug 11, 2023
Previously, the flow control logic for patterns didn't use the
`FlowModel.split` or `FlowModel.unsplit` methods at all. This meant
that if a control flow join point occurred in pattern logic, flow
analysis would consider the split point to be whatever split point was
established by the enclosing expression or statement. In the case of
an if-case statement, it would consider the split point to be at the
beginning of the scrutinee expression.

Split points are used by flow analysis for the sole purpose of
ensuring that joins propagate type promotions the same way in dead
code as they do in live code (so that users introducing temporary
`throw` expressions or `return` statements into their code do not have
to deal with nuisance compile errors in the (now dead) code that
follows. The consequence of flow analysis considering the split point
to be at the beginning of the scrutinee expression is that if the
scrutinee expression is proven to always throw, then joins that arise
from the pattern or guard may not behave consistently with how they
would have behaved otherwise. For example:

    int getInt(Object o) => ...;
    void consumeInt(int i) { ... }
    test(int? i) {
      if (
          // (1)
          getInt('foo')
          case
              // (2)
              int()
          // (3)
          when i == null) {
      } else {
        // (4)
        consumeInt(i);
      }
    }

In the above code, there is a join point at (4), joining control flows
from (a) the situation where the pattern `int()` failed to match, and
(b) the situation where `i == null` evaluated to `false` (and hence
`i` is promoted to non-nullable `int`). Since the return type of
`getInt` is `int`, it's impossible for the pattern `int()` to fail, so
at the join point, control flow path (a) is considered
unreacable. Therefore the promotion from control flow path (b) is
kept, and so the call to `consumeInt` is valid.

In order to decide whether to preserve promotions from one of the
control flow paths leading up to a join, flow analysis only considers
reachability relative to the corresponding split point. Prior to this
change, the split point in question occurred at (1), so if the
expression `getInt('foo')` had been replaced with `getInt(throw
UnimplementedError())`, flow analysis would have considered both
control flow paths (a) and (b) to be unreachable relative to the split
point, so it would not have preserved the promotion from (b), and
there would have been a compile time error in the (now dead) call to
`consumeInt`.

This change moves the split point from (1) to (2), so that changing
`getInt('foo')` to `getInt(throw UnimplementedError())` no longer
causes any change in type promotion behavior.

The implementation of this change is to add calls to `FlowModel.split`
and `FlowModel.unsplit` around all top-level patterns. At first glance
this might appear to affect the behavior of all patterns, but actually
the only user-visible effect is on patterns in if-case statements,
because:

- In switch statements and switch expressions, there is already a
  split point before each case.

- In irrefutable patterns, there is no user-visible effect, because
  irrefutable patterns cannot fail to match, and therefore don't do
  any control flow joins.

This change allows the split points for patterns to be determined by a
simple syntactic rule, which will facilitate some refactoring of split
points that I am currently working on.

Change-Id: I55573ba5c28b2f2e6bba8731f9e3b02613b6beb2
Bug: #53167
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/319381
Reviewed-by: Konstantin Shcheglov <scheglov@google.com>
Commit-Queue: Paul Berry <paulberry@google.com>
@stereotype441
Copy link
Member Author

I've speculatively landed 907e705 on the SDK main branch, enabling this change, since it unblocks other work I have in progress. I'll be happy to revert it if the breaking change request is not approved.

@Hixie
Copy link
Contributor

Hixie commented Aug 14, 2023

I have no objection.

@vsmenon
Copy link
Member

vsmenon commented Aug 14, 2023

lgtm

@itsjustkevin itsjustkevin added the cherry-pick-approved Label for approved cherrypick request label Aug 14, 2023
@itsjustkevin
Copy link
Contributor

Breaking change approved!

@itsjustkevin itsjustkevin added breaking-change-approved and removed cherry-pick-approved Label for approved cherrypick request labels Sep 18, 2023
@itsjustkevin
Copy link
Contributor

Realized I added the wrong label, sorry @stereotype441 🙂.

@stereotype441
Copy link
Member Author

This was completed in 907e705.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-language Dart language related items (some items might be better tracked at github.com/dart-lang/language). breaking-change-approved breaking-change-request This tracks requests for feedback on breaking changes
Projects
None yet
Development

No branches or pull requests

7 participants