Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider allowing context type to provide a hint to "least upper bound" algorithm #1618

Closed
stereotype441 opened this issue May 6, 2021 · 18 comments
Labels
feature Proposed language feature that solves one or more problems inference least-upper-bound Proposals about improvements of the LUB algorithm

Comments

@stereotype441
Copy link
Member

Consider this example code from dart-lang/sdk#45941:

PSW foo(bool bar) {
  final PSW result = bar ? CNB() : AB(); // ERROR: A value of type 'W' can't be assigned to a variable of type 'PSW'.
  return result;
}

class CNB extends SW implements PSW { }

class AB extends SW implements PSW { }

class PSW implements W { }

class SW extends W { }

class W { }

The error is happening because when we use the least upper bound algorithm to determine the type of bar ? CNB() : AB(), we see that both PSW and SW are possible upper bounds, but we can't find any reason to prefer one over the other, so we go one step up the class hierarchy and choose W for the static type.

But we could do better, since we have a strong hint available from the context type PSW. For instance, currently the least upper bound algorithm forms sets of candidate super-interfaces at each possible depth in the class hierarchy, and chooses the final type using the greatest depth for which the number of candidate super-interfaces is exactly one. We could change this so that it if the number of candidate super-interfaces is more than one, but only one of them satisfies the context type, the one that satisfies the context type is chosen.

@stereotype441 stereotype441 added the feature Proposed language feature that solves one or more problems label May 6, 2021
@lrhn
Copy link
Member

lrhn commented May 6, 2021

So, if a conditional expression has a context type, let the static type of the conditional expression be that context type if we can't find a better LUB, defined as a LUB which is a proper subtype of the context type. So, ignore all interfaces that are not subtypes of the context type when trying to find a LUB.
(With no context type, or a useless top-type as context type, the LUB is trivially a subtype of that.)

Ship it!

It never made sense to check that both branches were assignable to the context type, then throw that information away and find an unrelated LUB, and then complain about the (now disallowed) implicit down-cast.

@eernstg
Copy link
Member

eernstg commented May 7, 2021

This idea seems to apply directly to the "Dart 1" rule, that is, the very last item in the rules for UP.
If we're changing the computation of the LUB then we would need to thread the context type through any structural steps in the computation of UP. For instance, the least upper bound of List<T1> and List<T2> with context Iterable<S> would be List<U> where U is the least upper bound of T1 and T2 with context S. For void Function(T1) and void Function(T2) with context void Function(S) we'd need the greatest lower bound of T1 and T2 with inverse context S. Does that all just work out?

@leafpetersen
Copy link
Member

So, if a conditional expression has a context type, let the static type of the conditional expression be that context type if we can't find a better LUB, defined as a LUB which is a proper subtype of the context type. So, ignore all interfaces that are not subtypes of the context type when trying to find a LUB.

This is closer to my proposed solution for this, which isn't really about changing LUB at all. Specifically, what I have advocated for the past (and would still like to find an opportunity to do as a breaking change) is the following:

  • A conditional expression E of the form b ? e1 : e2 with context type K is analyzed as follows:
    • Let T1 be the type of e1 inferred with context type K
    • Let T2 be the type of e2 inferred with context type K
    • Let T be UP(T1, T2)
    • If T <: S where S is the greatest (least? I need to think about this part) closure of K then the type of E is S
    • Otherwise the type of E is the greatest (least?) closure of K
  • Let R be the type of E (as determined above)
    • It is an error if the type of either of the ei is not assignable to R

@leafpetersen
Copy link
Member

Just noting that I expect this issue to become much more prominent with switch expressions. For example, the following simple definition of a map function over a lisp style linked list does not currently compile because LUB can't find an upper bound for Nil and Cons<T>.

sealed class Lisp<T> {
  const Lisp();
}

class Nil extends Lisp<Never> {
  const Nil();
  static const nil = Nil();
}

class Cons<T> extends Lisp<T> {
  final T car;
  final Lisp<T> cdr;
  const Cons(this.car, this.cdr);
}


Lisp<To> map<From, To>(Lisp<From> l, To Function(From) f) =>
switch (l) {
  Nil() => Nil.nil,
  Cons(:var car, :var cdr) => Cons(f(car), map(cdr, f))
};

We might want to consider prioritizing this for an upcoming release.

@stereotype441
Copy link
Member Author

We discussed this in yesterday's language meeting. I personally now favor the following variant of Leaf's proposal:

  • A conditional expression E of the form b ? e1 : e2 with context type K is analyzed as follows:

    • Let T1 be the type of e1 inferred with context type K
    • Let T2 be the type of e2 inferred with context type K
    • Let T be UP(T1, T2)
    • Let S be the greatest (least?) closure of K
    • If T <: S then the type of E is T
    • Otherwise, if T1 <: S and T2 <: S, then the type of E is S
    • Otherwise, the type of E is T

Other than some minor rewording (I split the fourth sub-bullet into two), the salient changes are:

  • If T <: S, then the type of E is T (not S). This is just to correct a typo in Leaf's proposal.
  • If T fails to be a subtype of S, and either T1 or T2 fails to be a subtype of S, then we choose the type of E to be T (rather than issuing an error).

Focusing on the second of these changes, note that in most situations, if the static type of an expression fails to be a subtype of the greatest closure of its context type, then an error will occur later in the analysis. The exceptions are:

  • Assignments to promoted variables
  • RHS of a ?? expression
  • Implicit downcast from dynamic
  • Coercions

So, assuming that "greatest closure" what we wind up choosing for the fourth sub-bullet, the distinction really only matters for the above cases.

@leafpetersen
Copy link
Member

This looks good to me (obviously this generalizes to switches). I believe that in the analogous place in function literal return inference we use greatest closure. I think there are arguments for either choice, but on balance I think choosing greatest closure is probably better, and it's consistent with our treatment of function literals.

@stereotype441
Copy link
Member Author

Here's a fun example of a LUB computation that would be helped by this proposal: https://dart-review.googlesource.com/c/sdk/+/306682/comment/3abc77aa_35768de0/

@lrhn
Copy link
Member

lrhn commented May 31, 2023

From that review:

I don't think I've ever seen an example where LUB behaved quite this poorly

I wish I could say the same, but I think it's one of the deepest examples I've seen. 😉

For good measure, here is an EfficientLengthIterable one:

void main(List<String> args) {
  var numbers = args.isEmpty ? [1] : {1};
  numbers = numbers.map((x) => x + 1);
  // Error: A value of type 'Iterable<int>' can't be assigned to a variable of type 'EfficientLengthIterable<int>'
}

(At least I can see my ideas of removing inaccessible declarations from the LUB computations wouldn't help in either of these examples.)

I sometimes add extra layers to class hierarchies in order to make LUB give what I want. 😢

@kevmoo
Copy link
Member

kevmoo commented Jul 17, 2023

Here's another example

VectorList<Vector> getView(Float32List buffer) {
  final viewOffset = offset ~/ buffer.elementSizeInBytes;
  final viewStride = stride ~/ buffer.elementSizeInBytes;
  switch (size) {
    case 2:
      return Vector2List.view(buffer, viewOffset, viewStride);
    case 3:
      return Vector3List.view(buffer, viewOffset, viewStride);
    case 4:
      return Vector4List.view(buffer, viewOffset, viewStride);
    default:
      throw StateError('size of $size is not supported');
  }
}

VectorListX is a VectorList<Vector> but converting the switch to an expression just gives me an error with Object. Pushing the return context into the expression would solve things – so sayith @munificent. (This is from package:vector_math)

@munificent
Copy link
Member

To make sure I understand this (and to possibly help anyone reading it), let me try to rephrase the proposed change in less formal (and possibly less precise) terms. Let me know if I have it right.

Here is an example to motivate:

Iterable<num> test(bool b) {
  Iterable<int> ints = [1].skip(0);
  List<num> nums = [1.2];

  return b ? ints : nums; // <-- Error.
}

This example seems entirely harmless. Both ints and nums are subtypes Iterable<num>, so which ever branch is chosen, the program should run fine. But instead of allowing this, the type checker says:

Error: A value of type 'Object' can't be returned from a function with return type 'Iterable<num>'.

The problem is that when calculating the type of b ? ints : nums, we take the least upper bound of the two branches, Iterable<int> and List<num>. Least upper bound can get kind of weird in cases like this where you have generics and/or interesting superinterface graphs. LUB, bless its heart, does the best it can and yields Object as the result. Thence the error.

The proposed behavior is, when type checking a ?: expression:

  1. Push the context type (K) into both branches and type check each of them, the same as we do today (T1 and T2).

  2. Take the least upper bound of that (T), again the same as today.

  3. If T is a subtype of K then LUB did a good enough job to avoid an error, so we just use its result and keep the same behavior we have already. (I'm ignoring the "greatest/least closure" bit for simplicity. I think it mostly doesn't affect things in interesting ways.)

  4. Otherwise, we're in a situation where LUB failed to find a type that satisfies the context type. Here's the new behavior where we try to do better:

    If it turns out that each branch's individually computed types (T1 and T2) are subtypes of the context type, then the context type would also work as another "upper bound" for those two types. And, because it is the context type, that would yield a type that won't lead to an error from the type ?: not being good enough for the surrounding context (like Iterable<num> in the example). So instead of using the original least upper bound, we use the context type itself, K, as a "better upper bound".

  5. Otherwise, LUB didn't do a great job, but the context type isn't a better upper bound either, so we just fall back to the original LUB (T) and keep the old behavior.

Walking through those steps with the above example:

  1. We calculate T1 = Iterablefor the then branch andT2=Listfor the else branch. (We push in the context typeK=Iterable` when calculating those, but it doesn't actually do anything in this example.)

  2. The least upper bound of Iterable<int> and List<num> gives us T = Object. (We might wish for a smarter LUB algorithm than this, but that's an orthogonal issue.)

  3. That LUB of Object is not a subtype of Iterable<num> so LUB didn't do a great job and we don't stop here.

  4. Instead, we look at the branches. Both Iterable<int> and List<num> are subtypes of Iterable<num>, so it looks like the context type is a better upper bound. We choose that as the result type and stop here.

  5. We don't reach this step.

Do I have that right?

@lrhn
Copy link
Member

lrhn commented Aug 10, 2023

I think we can extend this to coercions as well.

Define CUp(K; T1, ..., Tn), the "context-guided Up" of T1... Tn with context type scheme K as:

  • Let U = Up(T1, T2, ..., Tn)
  • If U <: K (greatest closure of?), then result is U. (If K is _, this is always the case.)
  • If all of T1, ..., Tn are assignable to K, then result is K (closed wrt. T1...Tn? NOTE BELOW!)
  • Otherwise result is U.

(NOTE: we might need to define what it means for a type to be assignable to a type scheme, in a way that includes coercions, and in such a way that we can find a closure of the context type which all the Tis are assignable to. Solving K for T1 ... Tn assignable to some-closure-of-K, so we get a non-schema type. Maybe that's already in there, but I don't see any "a generic function type ... is a subtype match for a non-generic function type ... if ...".)

Here "assignable" includes using any of our coercions.

Then we infer types for e1 ? e2 : e3 with typing context K as:

  • Infer types for e1 with typing context bool. Let T1 be the static type of e1
    • If T1 is not assignable to bool, a compile-time error occurs.
  • Infer types for e2 with typing context K. Let T2 be the static type of e2.
  • Infer types for e3 with typing context K. Let T3 be the static type of e3.
  • The static type of e1 ? e2 : e3 is CUp(K; T2, T3)

At runtime, we evaluate e1 ? e2 : e3 with static type T as follows:

  • Evaluate e1, with static type T1, to a value v1.
    • If T1 is not a subtype of bool, coerce v from T1 to bool. (Or do it unconditonally, since the first step of coercion bails out if it's already a subtype.)
    • If the runtime type of v1 is not bool, a runtime type error occurs. (Cannot happen when sound.)
  • If v1 is true, evaluate e2, with static type T2, to a value v2
    • Let S be T2 and v be v2
  • Otherwise (v1 is false), evaluate e3, with static type T3, to a value v3
    • Let S be T3 and v be v3
  • If S is not a subtype of T, coerce v from S to T.
  • If the runtime type of v is not a subtype of the extension-type erasure of T, a runtime type error occurs. (Cannot happen when sound.)
  • Then e1 ? e2 : e3 evaluates to v.

where a type S is assignable to a type T (in a certain scope, since it may depend on applicable extensions) iff any of:

  • S <: T.
  • S is dynamic.
  • S is a generic function type, T is a non-generic function type, and a specific deterministic algorithm
    findInstantiation(S, T) → TypeArgs can produce a set of type arguments
    <P1, ..., Pn> such that the type argument instantiation of S, S<P1,...,Pn> is assignable to T.
  • S <!: Function and the type S provides (directly in its interface, or through a unique applicable extension declaration in the current scope) an accessible call method member, such that the function type of that call method is assignable to T.

and we (runtime) coerce a value v from a type S to a type T, where S is known to be assignable to T, thorugh the following steps:

  • If S <: T, v is unchanged.
  • Othewise, if S is dynamic (and T cannot be a top-type):
    • If the runtime type of v is not a subtype of (the extension-type erasure of) T, a runtime error occurs.
    • Otherwise v is unchanged.
  • Otherwise
    • If T is a function type or Function, and S is a type with an accessible call method (interface or extension), such that the function type of that call method is assignable to T, then:
    • Perform the tear off of the call method of v, to a value f.
    • Update v to be f, and S to be the function type of the call method of S. (And continue to next case.)
    • If T is a non-generic function type, and S is a generic function type,
      • Let P1, ..., Pn be the type parameters provided by findInstantiation(S, T), <P1,...,*Pn>.
      • Let f be the generic function v instantiated with type arguments <P1,...,*Pn>.
      • Update v to be f and S to be S<P1,...,*Pn>.

@eernstg
Copy link
Member

eernstg commented Aug 22, 2023

Here is one more case where the context type can be used to inform UP, from this comment:

Future<int> f(Future<int>? f, int x) async {
  return Future.value(f ?? x);
}

The point is that UP does not produce a result of the form FutureOr<T> unless at least one of its operands is a type of that form. The connection to the context type and a potential adjustment to UP is described in this comment:

  • UP(T1, Future<T2>) = FutureOr<T3> where T3 = UP(T1, T2).
  • UP(Future<T1>, T2) = FutureOr<T3> where T3 = UP(T1, T2).

.. and then possibly say that they can only be used if the context type is FutureOr<P> for some type scheme P.

@lrhn
Copy link
Member

lrhn commented Aug 22, 2023

One of our guidelines for UP design was to not introduce new union types (well, except nullables, so really just "no new FutureOr"). One reason being that FutureOr<Object> is a valid supertype of String and Future<int>, and probably not what you want.

If one of the input types were a union type, giving that back out would be fine.
I think that's still valid, except that we should rely on the context type, and if the context type is a union type, giving that back out is also OK.

(Don't invent new types structural types with a structure that doesn't come from the input types. Don't invent new types in general that doesn't come from the input types, where we allow including their precise superinterfaces, but not any arbitrary supertype. For example, don't invent dynamic or void if they are not among the inputs. And if we start using the context type to avoid giving a result that isn't valid, just because we couldn't decide among valid types, then the context type should also be considered an input that we can use freely.)

Let's say we do:

If e is e1 ?? e2 with typing context C, then:
If C is _, then infer types for e1 with typing context _, otherwise with typing context C?.
Let N1 be the static type of e1. Let T1 be NONNULL(N1). It's a compile-time error if T1 is not assignable to C (some closure of?).
If T1 is not a subtype of (some closure of?) C, update T1 to be the static type result of coercing T1 with context type C. This is a coercion-point.
If C is _ then infer types for e2 with typing context T1, otherwise with typing context C.
Let T2 be the static type of e2. It's a compile-time error if T2 is not assignable to C (some closure of?).
If T2 is not a subtype of (some closure of) C, update T2 to be the static type result of coercing N2 with context type C. This is a coercion-point. T2 is always a sutype of C.
Let T3 be the context guided upper bound UPC(C; T1, T2).
_When T1 and T2 are both subtypes of C, the result of that operation is always a subtype of C, because if nothing else, it chosses (some closure of) C as result.)

Then we have a context type C of FutureOr<int>, an e1 with static type N1 of Future<int>?, T1 is Future<int> and is a subtype of C, and e2 with static type T2 of int, also a subtype of C. Then we find an UPC which probably fails to find any common interface below the type FutureOr<int>, so it chooses that type as the best, known, supertype.

We need to do any needed coercions on the individual branches, because that ensures that the context type that the branches are assignable to, is also an upper bound on the runtime types of the individual values afterwards, even if they have no shared type to coerce from.
(Join points should be coercion points. Always.)

@stereotype441
Copy link
Member Author

After a discussion with @leafpetersen today, I'm going to dust off this issue and try prototyping some solutions to it to see if any problems arise.

First I'll try prototyping my suggestion from #1618 (comment) (which addresses the original bug report, but doesn't handle coercions particularly gracefully).

Then, if that goes well, and there's enough time, I'll try expanding it to the approach suggested by @lrhn in #1618 (comment) (which does handle coercions).

copybara-service bot pushed a commit to dart-lang/sdk that referenced this issue Feb 7, 2024
After discussion with Leaf, I believe the primary use case that
`interface-update-3` was intended to address can be better addressed
through the proposal of language issue
dart-lang/language#1618 (instead of
dart-lang/language#3471, which I had been
working on previously).

Since I had not made a lot of progress on
dart-lang/language#3471, rather than set up
a brand new experiment flag, it seems better to simply back out the
work that I'd previously done, and repurpose the `inference-update-3`
flag to work on issue 1618.

Change-Id: I6ee1cb29f722f8e1f0710cbd0600cb87b8fd26a1
Bug: dart-lang/language#1618
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/350620
Commit-Queue: Paul Berry <paulberry@google.com>
Reviewed-by: Nate Bosch <nbosch@google.com>
Reviewed-by: Chloe Stefantsova <cstefantsova@google.com>
Reviewed-by: Konstantin Shcheglov <scheglov@google.com>
stereotype441 added a commit to stereotype441/language that referenced this issue Feb 12, 2024
@stereotype441
Copy link
Member Author

After a discussion with @leafpetersen today, I'm going to dust off this issue and try prototyping some solutions to it to see if any problems arise.

First I'll try prototyping my suggestion from #1618 (comment) (which addresses the original bug report, but doesn't handle coercions particularly gracefully).

This is now out for review: https://dart-review.googlesource.com/c/sdk/+/353440

copybara-service bot pushed a commit to dart-lang/sdk that referenced this issue Mar 4, 2024
In the following expression types, the static type is computed using
the least upper bound ("LUB") of their subexpressions (adjusted as
appropriate to account for the null-shorting behaviors of `??` and
`??=`):

- Conditional expressions (`a ? b : c`)
- If-null expressions (`a ?? b`)
- If-null assignments (`a ??= b`)
- Switch expressions (`switch (s) { p0 => e0, ... }`)

This can lead to problems since the LUB computation sometimes produces
a greater bound than is strictly necessary (for example if there are
multiple candidate bounds at the same level of the class hierarchy,
the LUB algorithm will walk up the class hierarchy until it finds a
level at which there is a unique result). For a discussion of the kind
of problems that can arise, see
dart-lang/language#1618.

This change improves the situation by changing the analysis of these
four expression types so that after computing a candidate static type
using LUB, if that static type does not satisfy the expression's
context, but the static types of all the subexpressions *do* satisfy
the expression's context, then the greatest closure of the context is
used as the static type instead of the LUB. This is the algorithm
proposed in
dart-lang/language#1618 (comment).

This is theoretically a breaking change (since it can change code that
demotes a local variable into code that doesn't, and then the demotion
or lack of demotion can have follow-on effects in later code). So it
is implemented behind the `inference-update-3` experiment
flag. However, in practice it is minimally breaking; a test over all
of google3 found no test failures from turning the feature on.

Since one of these expression types (switch expressions) is
implemented in `package:_fe_analyzer_shared`, but the other three are
implemented separately in the `package:analyzer` and
`package:front_end`, this change required modifications to all three
packages. I've included tests for the new functionality, following the
testing style of each package. I've also included a comprehensive set
of language tests that fully exercises the feature regardless of how
it's implemented.

Since `package:front_end` has many different implementations of `??=`
depending on the form of the left hand side, I've tried to be quite
comprehensive in the language tests, covering each type of assignable
expression that might appear to the left of `??=`.

Change-Id: I13a6168b6edf6eac1e52ecdb3532985af19dbcdf
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/353440
Reviewed-by: Konstantin Shcheglov <scheglov@google.com>
Reviewed-by: Chloe Stefantsova <cstefantsova@google.com>
Commit-Queue: Paul Berry <paulberry@google.com>
Reviewed-by: Erik Ernst <eernst@google.com>
@stereotype441
Copy link
Member Author

After a discussion with @leafpetersen today, I'm going to dust off this issue and try prototyping some solutions to it to see if any problems arise.

First I'll try prototyping my suggestion from #1618 (comment) (which addresses the original bug report, but doesn't handle coercions particularly gracefully).

This has landed (dart-lang/sdk@b279238), behind the inference-update-3 language flag.

I ran an experimental CL through google3 to see if switching on the flag would produce any test failures; it did not. I think we should consider switching on this flag in the next stable Dart release (3.4), and addressing @lrhn's proposed improvements in a future release.

I'll bring up that idea in tomorrow's language team meeting.

@skylon07
Copy link

skylon07 commented Mar 12, 2024

Ran into this issue with switch-expressions and found the behavior of that (and conditionals ?:) very unexpected. Posted something on stackoverflow about it and someone led me to this issue. Happy to see this is being worked on and (pretty much) planned for future releases! You guys rock! 🎉

copybara-service bot pushed a commit to dart-lang/sdk that referenced this issue Mar 12, 2024
The tests for the language feature `inference-update-3` are adjusted
to verify that the following hold true for an if-null expression of
the form `e1 ??= e2`:

- If the static type of `e2` is not a subtype of the write type of
  `e1`, but it is assignable via a coercion, then the coercion is
  performed, and the coerced type of `e2` is used to compute the
  static type of the whole `??=` expression.

- If `e1` is a promoted local variable, then coercions are performed
  based solely on the declared (unpromoted) type of `e1`.

These behaviors apply regardless of whether feature
`inference-update-3` is enabled; accordingly, this commit updates both
the `_test.dart` and `_disabled_test.dart` variants of the tests. I've
manually verified that even with the work on `inference-update-3`
reverted, the `_disabled_test.dart` tests continue to pass, so we can
be reasonably certain that these behaviors pre-date the work on the
`inference-update-3` feature.

Note: the diff is large due to the fact that the front end has 6
different code paths for handling `??=`, depending on the form of the
LHS, so to make sure that we have adequate test coverage, there are
tests for every possible LHS form. However, the diffs for all the
tests are pretty much the same except for
`if_null_assignment_local_disabled_test.dart` and
`if_null_assignment_local_test.dart`, which have extra test cases to
cover promotion behaviors.

Bug: dart-lang/language#1618
Change-Id: I711d62d9dc00fc20a2efd3967d60066d9bfaec03
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/356303
Reviewed-by: Lasse Nielsen <lrn@google.com>
Commit-Queue: Paul Berry <paulberry@google.com>
@stereotype441
Copy link
Member Author

This was switched on in dart-lang/sdk@fb21055, but I forgot to close the issue at the time. Closing it now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Proposed language feature that solves one or more problems inference least-upper-bound Proposals about improvements of the LUB algorithm
Projects
None yet
Development

No branches or pull requests

7 participants