
Inferring return type (List<Object>) for a function that actually evaluates to List<Enum>. #3290

Closed
hamsbrar opened this issue Aug 18, 2023 · 35 comments
Labels: least-upper-bound (Proposals about improvements of the LUB algorithm)

@hamsbrar

hamsbrar commented Aug 18, 2023

[Edit by eernstg: This issue revealed that the analyzer and the CFE treat the superinterface graph of an enum declaration differently. It should be decided which treatment is correct, if any, and implementations should then use that approach.]

enum E { a, b }

mixin M on Enum {}

abstract interface class I {}

enum E1 with M implements I { a1 }

enum E2 with M implements I { a2 }

List<Enum> f(E e) => switch (e) {
      E.a => E1.values,
      E.b => E2.values,
    };

void main() {
  f(E.a).forEach(print);
}
  1. There are no warnings from the analyzer.

  2. The compiler throws the following error:

    Error: A value of type 'List<Object>' can't be returned from a function with return type 'List<Enum>'.
    - 'List' is from 'dart:core'.
    - 'Object' is from 'dart:core'.
    - 'Enum' is from 'dart:core'.
    List<Enum> f(E e) => switch (e) {
                        ^
  3. The analyzer reports an issue if I remove the type constraint from the mixin M, i.e. if I use mixin M {} instead of mixin M on Enum {}.

    Analyzing t1.dart...                   0.2s
    
      error • t1.dart:11:22 • A value of type 'List<Object>' can't be returned from the function 'f' because it has a return type of 'List<Enum>'. •
              return_of_invalid_type
    
    1 issue found.
  4. The analyzer reports the same issue if either E1 or E2 stops mixing in M.

  5. The example works as expected if f is inlined.

  6. The example works as expected if either E1 or E2 stops implementing I.

  7. The example works as expected if there's only one match in the switch, i.e. it works if I use switch (e) { _ => E1.values }.

  8. The example works as expected if I use a switch statement, i.e. it works if I use switch { case E.a: return ....

> dart --version
Dart SDK version: 3.2.0-74.0.dev (dev) (Tue Aug 15 13:04:34 2023 -0700) on "linux_x64"

The actual example (where I'm having this issue) is a bit large, but do let me know if it's required. Also, it could be true that I'm not using some of the language features as they are intended; in that case it'd be great to see some relevant warnings.

@eernstg
Member

eernstg commented Aug 18, 2023

Good catch!

(FYI, this is related to a broader topic and an old discussion, cf. https://github.com/dart-lang/language/issues?q=is%3Aopen+is%3Aissue+label%3Aleast-upper-bound).

We're computing the least upper bound of the types List<E1> and List<E2>, which is List<T> where T is the least upper bound of E1 and E2, and the least upper bound of E1 and E2 is handled by the last case in the so-called UP type function, specified here.

This algorithm computes the shared superinterfaces of the operands (that is, {I, M, Enum, Object}) and then chooses the class which is most specific (Object is least specific; Enum and I are one step more specific; and M is two steps more specific), so we should get the type List<M> as the static type of the switch expression.

As far as I can see, the CFE gets an incorrect result and concludes that the list is a List<Object>, which is then not an acceptable return value for the function f. Looks like it could be caused by the special treatment of the type Enum.

@chloestefantsova, WDYT?

@lrhn lrhn transferred this issue from dart-lang/sdk Aug 18, 2023
@lrhn
Member

lrhn commented Aug 18, 2023

This seems to be a repeat of dart-lang/sdk#51730

There is no bug, every implementation behaves according to specification. They just don't agree on what the class hierarchy is for enum types, because the analyzer uses only information from the specification, and the front-end also knows about internal, private helper classes used by the backends.

And since the upper-bound operation depends on the depth of a type in the class hierarchy, that means they produce different results.

I'm moving this issue to the language repository, because we might have to specify which behavior is actually correct, or prioritize our work on a better upper-bound algorithm. The introduction of switch expressions has increased the pressure on the upper bound computations, so the existing flaws show up more often. (Vastly more often, based on issues filed.)
And the difference between Analyzer and CFE on Up of enum types also shows up more often, so it's gone from an annoyance to a liability.

(One possible small step: Remove inaccessible types from the sets before doing LUB on them. That would mean that the UP function behaves differently depending on which library it's used from, but that may also mean it can more easily be computed modularly, using only public summaries of other modules. But no code outside of dart:core would see the special _Enum class in their upper-bound computations. That's definitely breaking, because it will change the Up of existing types.)

@eernstg
Member

eernstg commented Aug 18, 2023

This still sounds like the CFE should remove _Enum from the sets of superinterfaces that are used to find the least upper bound in that Dart 1 part of UP, and skip _Enum when path lengths are computed. I don't see why it wouldn't be a goal (a soft one, best effort) to ensure that it is impossible to detect that _Enum exists.

@lrhn
Member

lrhn commented Aug 18, 2023

We can do something special-cased for the _Enum type, but it's not going to be the only problem, just the currently most visible one.
We could also just make the analyzer know about _Enum; after all, its job is to predict runtime behavior, so it should match the actual runtime code as well as possible.

We could probably also add some unnecessary superclasses to _Enum, to give it greater depth. Then it would win in every such comparison, and the Up here would be _Enum. Every time.

Like:

import "dart:convert";
void main() {
  var r = DateTime.now().millisecondsSinceEpoch < 0;
  var l = r ? AsciiEncoder() : Latin1Encoder();
  print([l].runtimeType); // JSArray<_UnicodeSubsetEncoder>
}

Here we leak that two unrelated encoders have a shared superclass in their implementation.
In a perfect world, nobody would ever know that.

@eernstg
Member

eernstg commented Aug 18, 2023

Here we leak that two unrelated encoders have a shared superclass in their implementation.
In a perfect world, nobody would ever know that.

That's a different topic because that's regular Dart code. We might compute least upper bounds in such a way that private declarations in different libraries are always removed from the set of potential results. However, I doubt that we would actually do that. At least, we'd need to double check that we don't get results which are too weird if UP(A, B) can yield different results in different libraries, even though A resolves to the same type in all of them, and similarly for B. Alternatively, we need to consider the consequences if private declarations are always ignored by UP and others, even in the library where those private types are declared.

Anyway, that's one discussion. The discussion about _Enum is different, because _Enum and Enum have non-standard semantics, and _Enum is basically specified to not exist. That's the reason why I think implementations should strive to make _Enum unobservable.

@lrhn
Member

lrhn commented Aug 19, 2023

The _Enum class actually is specified:

Superclass: The superclass of C is an implementation-specific built-in class EnumImpl, with the mixins declared by E applied. (The EnumImpl class may be the Enum class itself or it may be another class which extends or implements Enum, but as seen from a non-platform library the interface of EnumImpl is the same as that of Enum, and its methods work as specified for Enum )

It's not specified which class the EnumImpl is, or what its depth is. It doesn't say "the same as Enum", only the same interface.
That is, it's not specified that it isn't there, but only that it might or might not be. It's open to being implementation dependent.

That's just highly annoying when Up behaves differently depending on that implementation choice.

You might be right that the best solution to this problem in isolation is to formally specify that the EnumImpl class should be ignored completely for Up computation: not included in the superinterfaces, and not counted in the depth computation; for depth, the implicitly declared superclass of an enum declaration counts as being Enum.

But there is still the general problem that the behavior of the classes that a library exports, wrt. Up, depends on internal implementation details, which may make it a breaking change to add or remove a private superclass. Something which should be completely invisible, to anyone who cannot name the supertype.

Or, short: depth is part of your public API, and it shouldn't be. The set of public supertypes is part of your API; private ones shouldn't be, they should be implementation details.

@eernstg
Member

eernstg commented Aug 19, 2023

It's open to being implementation dependent [whether or not Enum and _Enum/EnumImpl is the same class]

Right (and I should have looked it up ;-), but I think it would be useful to avoid that the type of an expression is implementation dependent, and we seem to agree on that:

That's just highly annoying when Up behaves differently depending on that implementation choice.

This behavior is specific to _Enum because it is otherwise not an implementation choice whether or not a certain class exists.

[we could] formally specify that the EnumImpl class should be ignored completely for Up computation

We could do that. However, it might actually be simpler to change the analyzer to take _Enum into account, following the actual implementation. This would eliminate the discrepancy between the analyzer and the CFE, and it is consistent with the feature specification. (Also, I think it's a good idea to avoid using different approaches in the specification and in the implementation(s) whenever possible.)

In addition to this we also have the other discussion, about the observability of private classes in a hierarchy:

Or, short: depth is part of your public API, and it shouldn't be.

I think it would be a rather substantial undertaking to make it unobservable that public classes (or classes from the current library which may be private or public) have superinterfaces that are private. We can, essentially, inspect the superinterface hierarchy by observing the response from tools like the analyzer about subtype relationships.

Next, it gets complicated if those private superinterfaces add new members (or override existing ones with a new signature) — in this case the developer needs to know that those new members (or new signatures) exist, and it will easily get out of hand if we start to pretend that those declarations are located in one/some of the public classes in the hierarchy. For instance, it would be weird if we pretend that a private class from the current library doesn't have the declarations that we can actually see in the source code.

As usual, I think we should admit that the world has a particular structure (including: this class has a superinterface which is a private class), rather than trying to stitch up a picture that might seem "more appropriate" based on some abstract principles.

@hamsbrar
Author

We're computing the least upper bound of the types List<E1> and List<E2>, which is List<T> where T is the least upper bound of E1 and E2, and the least upper bound of E1 and E2 is handled by the last case in the so-called UP type function, specified here.

The last case doesn't contain instructions as clear as the other ones.

I think that the issue is related more to how an implementation applies UP than to UP itself. I'm wondering how the implementation proceeds given T1 {Object, _Enum, Enum, M, I, E1} and T2 {Object, _Enum, Enum, M, I, E2}. I can think of applying UP pair-wise (UP(T1[i], T2[i])), and of course normalizing both sets so each item from T1 gets aligned with a matching item (if there is one) in T2. Maybe it's easier said than done, but I'm curious how one can end up with Object, or are we getting Object as a fallback here?

@lrhn
Member

lrhn commented Aug 20, 2023

The actual type hierarchy is:

graph BT;
    E1["E1"] --> EM1["_Enum&M_1"]
    E2 --> EM2["_Enum&M_2"]
    E1 --> I
    E2 --> I
    EM1 --> Enum_["_Enum"]
    EM1 --> M
    EM2 --> Enum_
    EM2 --> M
    M --> Enum
    Enum_ --> Enum
    Enum --> Object
    Enum_ ~~~ I
    I --> Object

As you can see, the intersection (everything above the two lowest levels) has two types at each level before Object.
That's why the "legacy LUB" (last step of Up) algorithm refuses to choose one of them, and falls back to Object as the only "most specific and also unambiguous" choice.

And that is why I'd recommend removing the inaccessible _Enum from the set before trying to find an upper bound, which would have allowed the algorithm to find M.
We don't have to change the depth of anything, just don't count the _Enum.
(I'd like to not count the inaccessible member towards depth, and I'd also ignore anonymous mixin applications, but that's not needed for this case.)
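To make that concrete, here is a minimal, self-contained sketch (not the SDK implementation) of the "unique deepest shared superinterface" step, using hypothetical name/depth maps instead of real type objects. It also shows the effect of removing the inaccessible _Enum first, as recommended above.

// Toy model of the last step of UP: pick the unique deepest shared
// superinterface, skipping any depth level that has more than one candidate.
// The names and depths are hypothetical stand-ins for real elements.
String legacyLub(Map<String, int> sharedSuperinterfaceDepths) {
  final byDepth = <int, List<String>>{};
  sharedSuperinterfaceDepths.forEach((name, depth) {
    byDepth.putIfAbsent(depth, () => []).add(name);
  });
  final depths = byDepth.keys.toList()..sort((a, b) => b.compareTo(a));
  for (final depth in depths) {
    final level = byDepth[depth]!;
    if (level.length == 1) return level.single; // unambiguous level: done
    // More than one candidate at this depth: refuse to choose, go shallower.
  }
  return 'Object'; // in practice Object is always alone at depth 0
}

void main() {
  // Shared superinterfaces of E1 and E2 as the analyzer sees them.
  const analyzerView = {'Object': 0, 'I': 1, 'Enum': 1, 'M': 2};
  // As the CFE sees them, with the private _Enum class included.
  const cfeView = {'Object': 0, 'I': 1, 'Enum': 1, '_Enum': 2, 'M': 2};

  print(legacyLub(analyzerView)); // M
  print(legacyLub(cfeView)); // Object: depths 2 and 1 are both ambiguous
  // Filtering out inaccessible declarations first restores M.
  final filtered = {...cfeView}..removeWhere((name, _) => name.startsWith('_'));
  print(legacyLub(filtered)); // M
}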

@hamsbrar
Author

hamsbrar commented Aug 20, 2023

Okay so I see two issues here:

  1. First, LUB isn't smart enough to deal with issues like the one we have here.
  2. Second, LUB tries to infer inaccessible types, which can lead to suboptimal/unexpected results.

This means that even if we devise a new algorithm, we still have to choose whether the new algorithm will exclude inaccessible entities from the process, because we cannot rule out the possibility that an algorithm:

  • Can produce results that don't match the runtime behaviour if inaccessible entities are not part of its search space.
  • Can infer an inaccessible entity as the result if such entities are part of its search space and it's allowed to return them.

The right thing to do here would be to keep every entity in the search space and allow the algorithm to return an entity if and only if that entity is accessible in the library where it's about to be inferred.

The current algorithm, as it stands, tries to select the type that is common to both hierarchies and has the largest unique inheritance path. This means that inference can and will fail in certain cases, which is OK. However, the part which is NOT OK is that failures of this approach will never be graceful. As it turns out, this is also the root cause of most of the issues we're facing. Of course you can choose to apply an easy fix and make the current issues go away if that's not a priority item.

Anyway, here's an alternative to LUB that I can think of:

class A extends T1, T2...Ti with M1, M2...Mj implements I1, I2...Ik
  1. First create a list hierarchyA.
  2. Add A to the list hierarchyA.
  3. Add everything that is directly referenced by A in the syntax (such as T1...Ti, M1...Mj, I1...Ik).
  4. Repeat step 3 for all the items we just added in step 3.

Note: There could be cycles that must be avoided in this part, and I know that you know that, but I just wanted to let you know that I know.

Anyway, using this, we'll get a decent version of our actual hierarchy:

  • HierarchyE1: E1 < M < I < Enum < _Enum < Object.
  • HierarchyE2: E2 < M < I < Enum < _Enum < Object.

And given HierarchyE1 and HierarchyE2, one can easily find the bounds in any way they like. For example,

  1. Create two sets typeSeenInHierarchyE1 and typeSeenInHierarchyE2.

  2. For i = 0 to max(hierarchyE1.len, hierarchyE2.len):

    • Copy the ith item from hierarchyE1 to typeSeenInHierarchyE1.
    • Copy the ith item from hierarchyE2 to typeSeenInHierarchyE2.
    • If the ith item from hierarchyE1 exists in typeSeenInHierarchyE2 and IS_ALLOWED_TO_RETURN(item):
      • Terminate with this item as result.
    • If the ith item from hierarchyE2 exists in typeSeenInHierarchyE1 and IS_ALLOWED_TO_RETURN(item):
      • Terminate with this item as result.
  3. Terminate with some type T if it scares the shit out of everyone in MORETOP.

  4. Terminate with no value.

Not sure if it's correct, but I'll consider it wrong if it requires a fix for a few special cases – efforts must be doubled, not steps.
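To make the scan concrete, here is one possible reading of step 2 above as a Dart sketch, assuming the hierarchies have already been flattened into lists by the first part, and reading IS_ALLOWED_TO_RETURN as "accessible from the library doing the inference". The names are hypothetical and this is only an illustration of the idea, not a vetted implementation.

import 'dart:math' show max;

// Pairwise scan over two pre-flattened hierarchies, terminating on the
// first type that has been seen in both and is allowed to be returned.
String? scanForSharedType(
  List<String> hierarchy1,
  List<String> hierarchy2,
  bool Function(String type) isAllowedToReturn,
) {
  final seen1 = <String>{};
  final seen2 = <String>{};
  for (var i = 0; i < max(hierarchy1.length, hierarchy2.length); i++) {
    final a = i < hierarchy1.length ? hierarchy1[i] : null;
    final b = i < hierarchy2.length ? hierarchy2[i] : null;
    if (a != null) seen1.add(a);
    if (b != null) seen2.add(b);
    if (a != null && seen2.contains(a) && isAllowedToReturn(a)) return a;
    if (b != null && seen1.contains(b) && isAllowedToReturn(b)) return b;
  }
  return null; // no shared, returnable type was found
}

void main() {
  const hierarchyE1 = ['E1', 'M', 'I', 'Enum', '_Enum', 'Object'];
  const hierarchyE2 = ['E2', 'M', 'I', 'Enum', '_Enum', 'Object'];
  // Treat privately named types as not returnable.
  print(scanForSharedType(
      hierarchyE1, hierarchyE2, (type) => !type.startsWith('_'))); // M
}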

@eernstg
Member

eernstg commented Aug 21, 2023

Thanks for spelling out an algorithm idea, @hamsbrar!

I should say that we have to be very careful about changing algorithms like this one, because it is used during type inference and promotion and typing in general, and any change is likely to be breaking in a way that is pervasive (because type inference etc. are used everywhere), implicit (especially type inference, but also promotion), and subtle (say, the expression [] is inferred as <num>[], but after the update to the LUB algorithm, or promotion, or type inference, etc., it's inferred as <int>[]; then a billion instructions later we encounter a dynamic error because someone attempts to add a double value to this list, or maybe it causes a logical bug because a different branch is taken in if (myList is List<int>) ...). So the discussion about LUB or about any particular algorithm isn't going to end soon. ;-)

A few things to note:

With extends T1, T2...Ti we know that i is one because it isn't possible to have more than one syntactic superclass (and we can only have zero when extends is omitted, too, in which case we have extends Object implicitly).

The repeated application of step 3 (that is, step 3 and 4) will then yield a set of superinterfaces of A which is a subset of the ones that are gathered by the current algorithm (which includes all superinterfaces, including all mixin applications).

Step 3 and 4 will terminate unless the superinterface graph is cyclic (which is an error in its own right), so there is no termination issue for these steps.

The reference to E1 < M < I < Enum < _Enum < Object as a hierarchy seems to imply that Enum is a subtype of _Enum (which is not correct), but also that I is a subtype of Enum (which isn't correct), and M is a subtype of I (also not correct). It is also possible that it doesn't imply anything about the subtyping relationships: It is just a set of superinterfaces of A which is ordered by some ordering relation R.

One possible interpretation of that ordering relation R is that it is determined by a breadth first traversal of the declared superinterfaces of A (with that, it makes sense that M occurs just before I). We would then simply ignore any superinterface when we encounter it for the 2nd, 3rd, etc. time during the traversal. I can't know if this is the way it was done, but it is at least consistent with the declarations in the original posting.

However, that ordering relation is radically different from the ordering which is used today (which is the 'depth' of each of the superinterfaces), and it can actually contradict the subtyping relationship:

class D {}
class E extends D {}
class F implements E, D {}
class G1 extends Object implements F {} // "Object < F < E < D".
class G2 extends Object implements F {} // "Object < F < E < D".

So LUB(G1, G2) would then be Object, and the hierarchy in the comments further seems to imply that Object is a subtype of all the other types, and E is a subtype of D (both of which are false). Actually, we'd probably want LUB(G1, G2) == F (which is also the result that we get today). We could also have a class F1 implements E, D {} and a class F2 implements D, E {}, and then the hierarchies would contradict each other.

I think this illustrates that there are many opportunities to get this wrong. In any case, do keep an eye on https://github.com/dart-lang/language/issues?q=is%3Aopen+is%3Aissue+label%3Aleast-upper-bound in order to see where this goes.

@lrhn
Member

lrhn commented Aug 21, 2023

The issue that the legacy LUB is trying to solve is, given two types, which can each have an arbitrary (and infinite) number of supertypes, choose one which is a supertype of both, with an optimization goal of making it as "close" to the original types as possible. And do so in a deterministic, consistent, and efficiently computable way.

The first step it takes is to ignore all the supertypes, and focus only on the shared super-interfaces. A List<int> implements Iterable<int>, but it also has Iterable<num> and List<num> as supertypes. All types have an infinity of supertypes (if nothing else FutureOr<T>, FutureOr<FutureOr<T>>, etc. is one such set, and num, Comparable<num>, Comparable<Comparable<num>> is another.), but their set of superinterfaces is finite.
(And that's why the LUB of List<num> and Iterable<int> is not Iterable<num>, because Iterable<num> is not a superinterface of Iterable<int>. Maybe restricting to just superinterfaces is too limiting, and we could get better results if we didn't, but it successfully reduces an infinite problem to a finite problem.)

Then, given the set of shared superinterfaces, it tries to find a measure for "closest" to the original types, that it can optimize for. That was chosen, semi-arbitrarily, as the maximal depth of the type in the super-interface graph, rooted at Object with depth zero. (Null is handled before we get here, so it never occurs in this algorithm, otherwise it too would have had to have depth zero.)

Then it orders the potential results by their depth, and starting from the deepest potential solutions, it checks whether that depth level has a single type. If it does, it's the chosen type. If not, the entire level is skipped.
Why is that?

The reason to skip the level is to avoid making an arbitrary choice, because "arbitrary" goes against the goal of "predictable".
The algorithm could have said: "Choose a random type from that level". That would be bad because it would make the algorithm both unpredictable and inconsistent. Two compilations of the same program can yield different types. Two UP operations on the same types can give different types, so var x = b ? e1 : e2; x = b ? e1 : e2; could, sometimes, refuse to compile, because it chooses two different unrelated types as the upper bound of the same two types, in two different places. (OK, random is obviously bad, it's like the optimally bad choice for resolving ambiguity, so no surprise here.)

Your design here is deterministic, but it still suffers from choosing arbitrarily, based on properties that should not have semantic effect.

  • The order of the initial list of types, e.g., HierarchyA, depends on the syntactic order of interfaces. If you change implements I, J to implements J, I, it may change the type of an expression somewhere else in a program you've never heard of. That makes the order of your implements clauses part of your public API, something that others might rely on, and where changing it is potentially breaking. That's bad. Not "random" bad, but pretty damning.
  • The two steps that have "Terminate with this item as result" are ordered, with the first type being checked first. That means that changing b ? e1 : e2 to !b ? e2 : e1 can change the type of the result. That's also a completely valid rewrite, one that shouldn't have any effect on anything.

So, if we were to just pick a type that both share, and do it predictably, consistently and not affected by arbitrary syntactic choices, instead of using random to choose, the original algorithm could say:

  • Look at the deepest non-empty level of common supertypes. Choose the type whose declaration's name is first alphabetically (prefer public over private). If two types have the same name, tie-break by the alphabetical ordering of their declaring library's URI.

That's completely deterministic, it doesn't depend on how things are declared syntactically, and it doesn't depend on the order of the operands to the LUB operation. Its main issue is that it's arbitrary. And, heck, maybe that's OK. Sometimes it's better to get a result which is correct, and predictable, but not obviously the only possible choice, than to get a result which is Object, just because there were two better alternatives and we were afraid to choose. (But changing the name of the library file may change the ordering, which is also not great.)
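A small sketch of that tie-break, with a hypothetical Candidate record standing in for a real declaration: at the deepest level, prefer public names, then order by name, then by the declaring library's URI.

// Hypothetical stand-in for a declaration at the deepest shared level.
typedef Candidate = ({String name, String libraryUri});

// Deterministic, operand-order-independent tie-break as sketched above.
Candidate pickFromDeepestLevel(List<Candidate> level) {
  final sorted = [...level]..sort((a, b) {
      final aPrivate = a.name.startsWith('_');
      final bPrivate = b.name.startsWith('_');
      if (aPrivate != bPrivate) return aPrivate ? 1 : -1; // public first
      final byName = a.name.compareTo(b.name);
      return byName != 0 ? byName : a.libraryUri.compareTo(b.libraryUri);
    });
  return sorted.first;
}

void main() {
  // The ambiguous depth-2 level from the enum example above.
  final winner = pickFromDeepestLevel([
    (name: '_Enum', libraryUri: 'dart:core'),
    (name: 'M', libraryUri: 'package:example/m.dart'),
  ]);
  print(winner.name); // M
}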

We can filter out inaccessible types first, if we want to avoid having them as result. (I'd want that.)

(And it still cannot choose an anonymous mixin application, because any type which has an anonymous mixin application as superinterface will also have its singular subtype as superinterface, so we'll always find that instead.)

The biggest issue here is that it relies on depth, which itself depends on implementation choices which shouldn't be publicly visible. If you change class Foo {...} to abstract class _FooBase {...} class Foo extends _FooBase {...}, which should be a completely invisible change to other libraries, you change the depth of Foo, which may change the type of an expression in some program that you've never heard of. A simple refactoring becomes a breaking change for someone else. That's bad, and if we are going to change LUB, and potentially break some existing code (any change of LUB may do that), I'll want to change it to something that doesn't have this abstraction leaking property either.
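Here is a hypothetical two-version illustration of that leak (the class names are made up): splitting a private base class out of a public class changes its depth, which silently changes an upper bound, and therefore an inferred type, in a client library.

// Version 1 of a library: Foo and Bar both have depth 1, so the shared
// superinterfaces of X and Y tie at depth 1 and their UP is Object.
class Foo {}
class Bar {}
class X implements Foo, Bar {}
class Y implements Foo, Bar {}

// Version 2: the author of Foo performs a "purely internal" refactoring,
// splitting out a private base class (modeled here with renamed copies).
// FooV2 now has depth 2, so UP(XV2, YV2) becomes FooV2 instead of Object.
abstract class _FooBase {}
class FooV2 extends _FooBase {}
class BarV2 {}
class XV2 implements FooV2, BarV2 {}
class YV2 implements FooV2, BarV2 {}

void main() {
  final cond = DateTime.now().millisecondsSinceEpoch.isEven;

  // Rejected by the compiler: the conditional has static type Object.
  // Foo f1 = cond ? X() : Y();

  // Accepted: the conditional has static type FooV2, purely because of the
  // private superclass introduced by the refactoring.
  FooV2 f2 = cond ? XV2() : YV2();
  print(f2.runtimeType); // XV2 or YV2
}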

@eernstg

As usual, I think we should admit that the world has a particular structure (including: this class has a superinterface which is a private class), rather than trying to stitch up a picture that might seem "more appropriate" based on some abstract principles.

I strongly disagree, and believe that ensuring that abstraction actually works, and prevents depending on implementation details behind the abstraction, is a cornerstone of providing a solid foundation for programmers.

The "public API" that I keep talking about is the totality of what your library is promising its clients. Anything that they can depend on in such a way that changing it is breaking. The types, their subtyping behavior, their member signatures, and their dynamic behavior are all part of that.

The ability to add base, final and interface is taking away things from the public API.
If an implementation detail leaks into the public API, it's no longer an implementation detail, and then it becomes something the author has to deliberately design for and around.

Every part of your public API should be deliberate.
The language doesn't just allow something because it can, we never automatically make something const just because it could be, because it's a promise that it would be breaking to change away from.

And for that to be something authors can actually work with, it needs to be predictable.
It should be clear which parts of a library are part of its public API, and which are not. Which things you can safely change, and which you cannot.

We generally say that what's inside a method body is invisible from the outside. That's why async, async* and sync* are to the right of the declaration, not on the left. It's an implementation choice for that particular member body, not part of the public signature.
And that's why we have getters and setters, rather than fields, as part of the public API, so that you can safely change a field to a getter/setter and back without affecting clients. And therefore no client behavior must ever depend on whether a getter comes from a field or an explicit declaration. It's equally impossible to see whether a field is late or not, because it's all getters and setters.

A public class exposes its interface (set of member signatures), its static declarations, its constructors (and even whether some of them are generative, if the class can be extended) and its public superinterfaces.

There is, or should be, no way for other libraries to detect the private superinterfaces.
If you cannot name a type, it might as well not exist. At runtime, all you can do is object-instance-of-type and type-subtype-of-type checks, and == on Type objects.
If you cannot name a type, and it's not exposed by the API (so you can only get it by inference), then for all practical purposes, your code should work just as if that type didn't exist.

A library can break its own abstraction.

  • Have a public type alias for a private type.
  • Have a public function signature mentioning a private type.
  • Have a public type which mentions a private type (like implements Iterable<_Private>).
  • Expose instances of a private class which has members or superinterfaces that are not part of the public static type.

But if it doesn't, it should be able to rely on private types being invisible to other libraries, even their position in the type hierarchy. Refactoring should be safe. If it isn't, it's because we're breaking abstraction.

We should not be exporting the type hierarchy, just the superinterface relation. The complete type hierarchy is a red herring, that is, itself, an implementation detail of our type system. It contains details that are implementation details to individual libraries.

And depth breaks that abstraction. And it shouldn't. (Or we should stop using it.)

@lrhn lrhn added the least-upper-bound Proposals about improvements of the LUB algorithm label Aug 21, 2023
@hamsbrar
Author

hamsbrar commented Aug 21, 2023

@eernstg I'll give your message another read later. I just want to say something about the example that you gave:

Given the task LUB(G1, G2) in:

class D {}
class E extends D {}
class F implements E, D {}
class G1 extends Object implements F {}
class G2 extends Object implements F {}

I'd turn this into:

class G1 extends Object implements F implements E implements D extends D {}
class G2 extends Object implements F implements E implements D extends D {}
//                                 1.~~~~~~~~~~~~~~~~~~~~~~~ 2. ~~~~~~~~~    
// 1. Results from applying step 3 on F
// 2. Results from applying step 3 on E

And then remove all the noise:

G1 Object F E D D
G2 Object F E D D

And turn that into a list:

[G1, Object, F, E, D, D]
[G2, Object, F, E, D, D]

And then, applying the variant that I described, I will indeed get Object as the result, which is not an incorrect result but a suboptimal one. The reason I said "find the bounds in any way they like" is that this is not the only way to apply this algorithm.

I can choose to get all the possible results ([Object, F, E, D, D]) using the same algorithm. All I have to do is start aggregating result items and return them all after I exit the for loop in the second part of the algorithm (the part that currently terminates on the first valid result item, which, as lrhn explained and your example demonstrates, isn't always guaranteed to be the best choice, and it also makes sense why). And then I can throw in depth or whatever logic is needed to select the best item from the results. So I don't think this is an issue here, or is it?

@eernstg
Member

eernstg commented Aug 21, 2023

@lrhn wrote:

I strongly disagree, and believe that ensuring that abstraction actually works, and prevents depending on implementation details behind the abstraction, is a cornerstone of providing a solid foundation for programmers.

That sounds good, and I'd very much like to support that principle. However, I'd put forward an even stronger principle which is that we should be honest about the technical properties of entities in the language as they are actually defined (and we may very well be unable to change them such that they fit any given abstract principle).

In particular, types that are private are not unobservable.

First, they can be observed in the library where they are declared.

I think it is at the very least a questionable idea that we should claim that they are unobservable in other libraries, if this causes such anomalies as UP having different results in different libraries when invoked on the same arguments:

// Library 'a.dart'.

class _Secret {}
class A1 extends _Secret {}
class A2 extends _Secret {}

bool b = true;

var a1 = A1();
var a2 = A2();
_Secret a = b ? a1 : a2; // OK, `UP(A1, A2) == _Secret`

// Library 'b.dart'
import 'a.dart';

void main() {
  a = b ? a1 : a2; // Error, `Object` is not assignable to `_Secret`.
  a = a1; // OK.
  a = a2; // OK.
  var aa = a; // OK, `aa` has declared type `_Secret`.
  aa = a1; // OK.
  aa = a2; // OK.
  aa = b ? aa : aa; // Error, `Object` is not assignable to `_Secret`.
}

You could say that 'a.dart' shouldn't export a when it has a private type, but this is one of those situations where the higher principle kicks in: We can export a even though it has a private type, and this has a well defined semantics, including the ability to introduce _Secret as the type of a variable by means of type inference.

The last statement in main is particularly surprising, but it is just based on the same (impractical) idea that UP must skip private types from a different library.

Another issue is the ability for a private type to introduce public members.

// Library 'a.dart'.

class _Secret {
  void publicMethod() {}
}

bool b = true;

class A1 extends _Secret {}
class A2 extends _Secret {}

// Library 'b.dart'.
import 'a.dart';

void foo(A1 a1, A2 a2) {
  (b ? a1 : a2).publicMethod(); // OK.
}

If we acknowledge that A1 and A2 exist, but refuse to acknowledge that they have a common superinterface _Secret (e.g., by mandating that UP cannot return a private type from a different library) then it gets really tricky to explain why we can call publicMethod. Alternatively, we could just make it an error, but that is a breaking change.

If you cannot name a type, it might as well not exist.

That's a non-sequitur, "cannot name" is not the same thing as "is unobservable".

I actually think it's a good rule of thumb that public APIs shouldn't contain private types (so a should not have the type _Secret, or the class currently denoted by _Secret should be made public). So we can have lints like library_private_types_in_public_api helping us to maintain that style.

However, the rules of the language should be based on the actual language, not on some subset of the language which is considered "good style".

Moreover, I'm not quite convinced that this rule ("don't use a private type in a public api") is universally desirable. The fact that a type is private puts some restrictions on the ways in which it can be used, and there might be some programming idioms where it turns out to be useful.

In any case, if we had had the desire to ensure that private types are unobservable in other libraries then we should have had much more restrictive rules about private types. For instance, they should not be able to declare any public members, and it should not be possible to infer a private type from a different library as the type of any expression, etc.etc.

Every part of your public API should be deliberate.

If there's a well-defined set of rules, and they have been implemented faithfully, and it is possible to express said API exactly as desired using those rules, then that's fine.

In real life there will be cases where the rules don't quite fit the desired properties. For example, Dart doesn't support a mechanism which is similar to protected members in other languages, so there's no way to express an API (in this case: the API offered to subclasses and subtypes) with exactly the desired properties. Still, we'll find a way to define a class in some way which is a conservative approximation of the desired properties (that is, developers can do everything they must be able to do, plus some amount of other things that may or may not be desirable). The situation is similar for the client API (that is, the set of members in the interface of the type).

The slack (that is, the difference between the ideal API and the actual API) may be a delicate balancing act between different trade-offs. You could say that this balancing act is always a deliberate choice, but I think it's equally fair to say that some API properties will be accidental (in the sense that "this is what we got", and we basically can't get exactly what we want).

A library can break its own abstraction.

  • Have a public type alias for a private type.

That is not necessarily a violation of the abstraction. For instance, we can use a type alias to impose an extra discipline on the provision of actual type arguments (such that we always have type arguments of the form T and T Function(T) for some T):

// Library 'a.dart'.

typedef FunctionHolder<X> = _FunctionHolder<X, X Function(X)>;

class _FunctionHolder<X, Y> {
  int Function(X) intFunc;
  String Function(X) stringFunc;
  _FunctionHolder(this.intFunc, this.stringFunc);
}

// Library 'b.dart'.

void main() {
  FunctionHolder<int> holder = FunctionHolder<num>(...); // Error.
}

This allows us to express the constraint that FunctionHolder must be considered to be invariant in its type argument. If _FunctionHolder had been public then we would not be able to maintain that clients outside 'a.dart' must pass actual type arguments such that if the first argument is T then the second argument is T Function(T).

In other words, the ability to exert a certain amount of control over the usages of a private type from a different library can be useful, and hence the story doesn't end with "breaks the abstraction".

@lrhn
Member

lrhn commented Aug 21, 2023

such anomalies as UP having different results in different libraries when invoked on the same arguments:

I want UP to depend on the context type too, so depending on the library is small stuff in comparison. 😁

As long as the superinterfaces do not contain any of your own library's private declarations, the behavior of such an UP would be perfectly predictable. And if they do contain some of your own library's private declarations, you'd be the one to know, and care, and not be surprised by their existence.
I can't come up with a realistic scenario where UP giving you a type which is private to another library is what you want. I, and users, keep coming up with cases where it isn't what you want, but you get it anyway.
Consistently giving a useless result isn't better.

We can't know what users intend. But we can make generalized assumptions. A type being declared private is, 99.9% of the time, a type that other libraries shouldn't know about. Otherwise, why go to the effort of making it privately named at all?
Sure, there might be exceptions, but that's what they are.

And absent a better hint, I think we should use that as a hint that giving such a type as the result of UP in another library is giving a useless result. And we should give a better result instead.

@hamsbrar

This comment was marked as outdated.

@hamsbrar
Author

hamsbrar commented Aug 22, 2023

I was trying to see if we can do it more efficiently.

The hidden version above was the first attempt. The second attempt lives as history in this message. The third version is the final variant that actually works (as long as we don't find any issues with it). It's capable of minimizing the number of valid results that the algorithm has to check. For instance, in the following example:

class A {}
class B extends A {}
class C extends B {}
class D extends C {}
class E extends D {}

class T1 extends E  {}
class T2 extends D  {}

The already proposed variants will try to find the best item among all valid result items (which in this example are: [A, B, C, D]), but this variant will create a smaller version of the search set from which it'll try to find the best item. In this example, that smaller search set will be [D] (yes, it's true, a list that contains only the best item). As you can probably guess, this is an example of the best case for this variant. In the worst case, this variant tries to find the best item among all the valid result items, which is similar to what the already proposed variants do, but they do that in every case.

Algorithm

  1. Collect types that are present in declarations using the original algorithm:

    In this example,

    class F {}
    class E extends F {}
    class G1 extends Object implements E {}
    class G2 extends Object implements F {}

    We should get:

    hierarchyG1: [ G1, Object, E, F ]
    hierarchyG2: [ G2, Object, F ]

  2. Create a map mapToImmediateLsTypes that maps every type to the list of types that the type directly references in its declaration. For example:

    mapToImmediateLsTypes = { 
        F => [],          
        E => [F],
        G1 => [Object, E],
        G2 => [Object, F],
        Object => [],     
    }
  3. Create a set validResults such that it contains the types that are common to hierarchyG1 and hierarchyG2. For example: validResults in the current example will be [Object, F].

  4. Create a new list L.

  5. Copy 0th item from hierarchyG1 to L.

  6. Copy 0th item from hierarchyG2 to L.

  7. For i = 0 to L.len:

    • If the ith item from L doesn't exist in validResults:
      • Replace it with ...mapToImmediateLsTypes[item] and apply this replacement recursively, but only on items that are not present in validResults.

    Explanation using current example:

    • Before applying this step, L will be = [G1, G2].
    • G1 is not a valid result, i.e. it doesn't exist in validResults, so we'll replace it with [Object, E] (using mapToImmediateLsTypes).
    • At this point L = [Object, E, G2].
    • Object is a valid result, so we'll ignore it.
    • E is not a valid result, so we'll replace it with F.
    • At this point L = [Object, F, G2].
    • F is a valid result, so we'll ignore it.
    • G2 is not a valid result, so we'll replace it with [Object, F] (using mapToImmediateLsTypes).
    • At this point L = [Object, F, Object, F], which doesn't contain any invalid result item, so it's our final search set.
  8. Select the item from L that has more depth than any other item. If there are multiple items at the deepest level, then make the selection according to requirement/preference:

    • Terminate with this item as result.
  9. Terminate with no value.

This variant ensures that the best item (the closest type which is common to the operand types) is part of L when we reach the last step (8). So the last step can be changed to one's liking or requirement.

And the idea here is that the best item, or the item that leads to the best item, is always part of the declarations of the operand types. The only way we can miss the best item using this approach is if we allow the best item to hide behind an invalid result item. By replacing an invalid result item with a valid one, we'll force the best item to turn up in L, the spot where we're dead set on hunting it down.

Note: This is assuming that there is no entry in mapToImmediateLsTypes in which there exists a type on the right side (the values) that is more specific than the type on the left side (the key). But if there exists a type that is more specific, then that should be kept in mind when returning the final result. For example, if the final result is X, we have an entry X => [Y, ...] (in mapToImmediateLsTypes), and Y is more specific, then we can choose to return Y if it's also a valid result item (and this should be checked recursively, i.e. Y should be checked further before returning it). Another way would be to replace type X in L with type Y if there exists a more specific type Y that exists in validResults and there exists an entry X => [Y, ..] in mapToImmediateLsTypes. If we have cases like that, then this replacement step can be added after step 7 (before step 8).

The question we're left with is whether this idea is sound (I think it is), and whether everything else works accordingly. If it's true, then that would mean that this variant is capable of ignoring all the suboptimal results and selecting the best item in an efficient way. And there's still room for micro-optimizations here and there, but those are the obvious ones, so I deliberately chose to ignore them.
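For anyone who wants to poke at it, here is one possible reading of steps 4-7 above as a Dart sketch over the G1/G2 example from step 1 (step 8, the depth-based selection, is left out). The string-based representation is purely for illustration.

// Build the final search set L by recursively replacing every item that is
// not a valid result with its immediate supertypes (steps 4-7 above).
List<String> buildSearchSet(
  String operand1,
  String operand2,
  Set<String> validResults,
  Map<String, List<String>> immediateSupertypes,
) {
  final searchSet = <String>[];
  void expand(String type) {
    if (validResults.contains(type)) {
      searchSet.add(type);
    } else {
      // Replace an invalid item with its immediate supertypes, recursively.
      for (final supertype in immediateSupertypes[type] ?? const <String>[]) {
        expand(supertype);
      }
    }
  }

  expand(operand1);
  expand(operand2);
  return searchSet;
}

void main() {
  const immediateSupertypes = {
    'F': <String>[],
    'E': ['F'],
    'G1': ['Object', 'E'],
    'G2': ['Object', 'F'],
    'Object': <String>[],
  };
  // Shared superinterfaces of G1 and G2 (step 3).
  const validResults = {'Object', 'F'};
  print(buildSearchSet('G1', 'G2', validResults, immediateSupertypes));
  // [Object, F, Object, F]
}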

@eernstg
Member

eernstg commented Aug 23, 2023

@lrhn and I had a long discussion about this IRL. We still don't agree, but the discussion clarified some overall relevant points.

The main reason why I want to use an UP (last case, that is, the Dart 1 algorithm) that does not have an exception for private type introducing declarations (class, mixin, mixin class, extension type, plus whatever comes next) is that such an exception would be inconsistent.

If we add that exception, then we should also do the following in order to be consistent:

  • It should be an error for a public API to refer to a private type (for example, no return type like _Secret?, no parameter type like FutureOr<_Secret>, no superinterfaces like extends _Secret with _SecretMixin implements _SecretInterface). It is probably benign for a private method to have a private return type etc, but we'd need to check that kind of situation carefully.
  • Inference of an expression should never yield type arguments or declared types that are or contain private types from a different library. (So we get the "nearest" non-private supertype or Object? or what? If we're lucky then there are no leaks, due to the previous item, so we won't encounter any private types of a different library during inference.)
  • It should be an error for a private type introducing declaration to declare a public member, or to override a public member with a declaration whose signature differs from the combined member signature of its superinterfaces.
  • It might be an error for a public type alias to refer to a private type (like typedef Foo = _Foo<int>;).
  • We could make it a run time error to invoke e.runtimeType in the case where the run-time type of the value of e is a private type from a different library.
  • There are probably additional situations where a private type "leaks" into a different library.

We could also introduce the ability for a mixin to have no type of its own. For every member declaration D it would then be forced to have superinterfaces such that there is a combined member signature with the same name as D and with the same types everywhere (same return type, no covariant specialization, same parameter types, etc). A mixin with no type could reasonably be ignored during the computation of UP (a bit like an "epsilon step" in some formalisms), which seems to be exactly the idea that "we just delete certain types before we choose the result to return from UP". Similar concepts could be introduced for other declarations (a class that doesn't have a type of its own, etc.)

But we don't do that, and it would be a potentially massively breaking change. Moreover, the restriction on type aliases will gratuitously destroy a significant amount of expressive power. Surely there will be some useful designs relying on not having each of the other restrictions.

I'd very much prefer to have lints to support every developer who wishes to enforce these restrictions as far as possible. We already have library_private_types_in_public_api, and we could easily add an extra lint for public members of private classes, for inferred variable types which are private to a different library, etc.

It should also be noted that if we have these restrictions in the language then the Dart 1 algorithm in UP will never encounter a case where there is a need to remove a private class from a superinterface graph, because they won't occur in such graphs in the first place. Private declarations can depend on each other, and they may have a non-private LUB, but they won't ever be arguments to UP in a different library because they aren't available via any public APIs (and they are not available by transitive traversals of superinterfaces, etc.). Otherwise, private declarations would be treated just like other declarations by UP in the library that declares them. So if we are consistent then we don't need the special exception where UP deletes all the private types from other libraries (but we might do it with "typeless" mixins and classes).

In short: Let's use simple and consistent rules, and treat private types in a way that resembles the treatment they get elsewhere (e.g., when var x = ...; gives x the type _Secret because that's the actual type of the initializing expression, not Object or whatever the nearest non-private type is). For anyone who wishes to avoid encountering private types from different libraries, let's give them good support for doing that by means of lints.

@lrhn
Member

lrhn commented Aug 23, 2023

And my general argument against is that UP is not about being consistent. It's a heuristic function that tries to give a useful upper bound, while being deterministic, symmetric, non-arbitrary and possibly efficient.

The "non-arbitrary" means that if there are two equally valid choices, it doesn't choose one of them. Not even if it could do it deterministically and symmetrically. And that might still be more of an artifact of being deterministic, symmetric, and fairly easily specified, than being an actual design choice.

But it's a heuristic, which worked adequately when it was designed.
There is nothing fundamental about the definition of the depth function used, or using a depth at all. It's not trying to be consistent with anything, it's just trying, best effort, to find an upper bound that is useful to the user, and within the design constraints.
It's never been a good design for a least upper bound, but it comes from a time where there were implicit downcasts everywhere, and types were ignored entirely in production mode. It was adequate at the time.
It was not adequate for Dart 2. It wasn't massively worse for Dart 2.12 (null safety), but with switch expressions, the shortcomings of UP hits people on a daily basis.

And so, heuristically, I think we'd be giving the user a more useful result in the very vast majority of cases if we omit types with inaccessible declarations from the set of possible results. (And any anonymous mixin applications.)

Any type you can't name, is more likely to not be a type that you can use.

The examples here, of private types that are usefully inferred outside of their declaring libraries, are code that would never pass code review. It's possible to write such code. It's just not a good idea, except perhaps in a very minuscule number of cases, and we shouldn't prevent ourselves from improving the majority of cases because of that.

And the added benefit would be that authors don't need to worry about their private declarations getting leaked in that way. They can still leak them in other ways, but that's on themselves. This was one way of leaking a private type that was mostly outside of the author's control, and not something lints would have much of a chance of detecting. Leaking them through your own API is much easier to detect.

(But I also think that a complete redesign, where we use the context type as well as the operand types, go away from depth entirely, and possibly generalize to more than two operands, because UP is not associative, would probably be even better. In the design space of smaller modifications to the existing UP, I'd remove the inaccessible types that usually just get in the way of finding another result. If we allow bigger changes, the sky is the limit!)

@hamsbrar
Author

hamsbrar commented Aug 24, 2023

For anyone who wishes to avoid encountering private types from different libraries, let's give them good support for doing that by means of lints.

@eernstg if we want to support both users, i.e. a user who is NOT OK with an inaccessible type and a user who is, then I don't think adding a simple lint will help. I understand that a user will be able to see the warning, but how are they supposed to fix it? Wouldn't there be a case in which the algorithm returns an inaccessible type because it was deeper than a type that is accessible, and now the user is seeing a warning that they can't fix?

I think this is where we'd have to provide a way for the user to tell the algorithm whether they're okay with inaccessible types as results in their code. We could do that in analysis options or something, but it'll get more complicated from here. What if a user is NOT OK with inaccessible types but some of the packages that they're using are OK with them? Sure, we'll be able to find a way to proceed, but this is the point where more things, more features, will start depending on this behaviour. Which is OK TOO, but what if in the future we find out that some of those features don't align with the user's ability to choose this behaviour, and our algorithm (which is pervasive) starts giving weird results? This is the point where we might want to roll back that decision and support just one kind of user (it could be either one), but by then we're locked into a design that we cannot change because it'll break everything.

So it should be just one kind of user (it could be either one). Constraints don't travel through time, at least not as far as decisions do. In the future, we might find ourselves in an environment where we have the flexibility to support both, so this could be seen as a limitation for now.

(Also, I've added the optimized version #3290 (comment), just in case someone wants to see/evaluate the approach.)

@hamsbrar
Author

hamsbrar commented Aug 24, 2023

I think what is left here is the discussion about what to do next, and that is something I should stop meddling in. So I have no problem if the team thinks that they can provide assistance/support for both use cases (the one where a user is OK with inaccessible types as results and the one where a user is NOT). I appreciate that the team members (eernstg and lrhn) shared details with the public (like me), but it'll be okay too if the team decides to continue the discussion elsewhere (or turn it into an internal discussion). And I understand that such discussions can be time consuming and may not always lead to a decision, so I'm perfectly fine with this issue remaining open, indefinitely.

@hamsbrar
Author

hamsbrar commented Aug 26, 2023

The "non-arbitrary" means that if there are two equally valid choices, it doesn't choose one of them. Not even if it could do it deterministically and symmetrically. And that might still be more of an artifact of being deterministic, symmetric, and fairly easily specified, than being an actual design choice.

I was trying to see if there is a way to proceed, but it turns out that my thinking got a bit muddled recently.

The correct thing for an algorithm is to return multiple results if there are multiple results. If I'm trying to return a single result from an operation capable of producing multiple results, without consulting the user who initiated the operation, then my thinking is flawed. And if a user is expecting me to always guess exactly the result they're looking for, then their thinking is flawed as well.

So if I allow users to supply types in a way that can lead to multiple results, then I should ask them which one to continue with whenever there are multiple results. I see that making an arbitrary choice for them is bad, but completely swallowing their results like some black hole and offering no explanation about their whereabouts is what I'd call "pretty damning" :)

Here's an another approach:

If there are multiple valid types that the algorithm can return, then it returns a type mixed, indicating that it cannot make an arbitrary choice for the user. The analyzer then issues a warning to the user, asking them to cast the mixed expression to a specific type. That cast is then taken as a hint by the algorithm, and the next time it returns the type that the user is expecting.

Additionally, a warning can help users avoid casting mixed to a type that's not present in the valid results. These warnings can be made compile errors. In most cases the algorithm will be able to find only one result, and in cases like the one I have in this issue, I'd be more than happy to cast mixed to the type I want instead of running into a useless compile error (just look at the error I'm getting in the example at the top and ask yourself whether it makes any sense to a user). Moreover, tools can provide accurate information in mixed warnings/errors along with quick fixes (showing all valid options for the user to choose from). Inaccessible types will help the algorithm find more types that are accessible, but returning an inaccessible type won't help, because users won't be able to use it, and those types will only cause the algorithm to return mixed in cases where it could've returned just one type.


Furthermore, I think the algorithm can decide to return mixed implements BestResult1, BestResult2...BestResultN in cases where it's possible and safe to do so. The analyzer can then let users use these kinds of mixed expressions without casting them, as long as they're using them in places where it's safe. I think it should be made possible in the cases where it's not, or at least something similar should be tried first. If I've designed a system that allows users to produce multiple results, then I must design something that allows them to use those results in a less painful way.

Patchwork is for people like me who know nothing about designing these systems.

@hamsbrar
Copy link
Author

Here's one example where we're already making an arbitrary choice:

mixin M1 { void f() => print('f from M1'); }
mixin M2 { void f() => print('f from M2'); }

class A with M1, M2 {}
class B with M2, M1 {}

A().f(); // f from M1
B().f(); // f from M2

As you can probably see, the call to f depends on the order in which the mixins are specified after with, while arguably it shouldn't. And that's not the end of it: if f gets removed from M1, I'd expect at least a little heads-up, since it's in use. Instead, the program makes another arbitrary choice and rolls with f from M2.

I hope this isn't specified in the rule book but if it actually is... Jesus H. Christ

@lrhn
Copy link
Member

lrhn commented Aug 28, 2023

It is specified. Mixin applications are ordered, so that later mixin applications can do super.foo invocations on methods from earlier mixins. They create a chain of superclasses, each of which is an anonymous mixin application of one mixin onto the previous class in the chain.
That's quite deliberate; no deity invocations needed.
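To make that chain concrete, here is a minimal sketch (the Base/C names are just for illustration) showing how a later mixin's super call reaches the mixin applied before it:

class Base {
  void f() => print('f from Base');
}

mixin M1 on Base {
  @override
  void f() {
    print('f from M1');
    super.f(); // reaches Base.f
  }
}

mixin M2 on Base {
  @override
  void f() {
    print('f from M2');
    super.f(); // in C below, reaches M1.f, because M1 was applied first
  }
}

// `Base with M1, M2` builds the chain Base -> Base&M1 -> Base&M1&M2 -> C.
class C extends Base with M1, M2 {}

void main() {
  C().f(); // prints: f from M2, f from M1, f from Base
}

Swapping the mixin order to `with M2, M1` reverses the first two lines of output, which is exactly the ordering dependency discussed above.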

@hamsbrar
Copy link
Author

so that later mixin applications can do super.foo invocations on methods from earlier mixins.

It just looked to me like the specification is OK with arbitrary choices if they're made for implementation convenience but NOT OK if they're made for user convenience. If this super.f is a must-have feature and ordering mixin applications is the only way to enable it, then I admit that I'm wrong, and apologies for the inconvenience caused.

@hamsbrar
Copy link
Author

I still don't get why I'd want to do a super.f in M2 when M2 is not related to M1 in any way.

Even if I add a type constraint on M1 and M2, that doesn't imply that there's a relationship between them.

This doesn't look like a feature to me unless someone explains how it is one.

@eernstg
Copy link
Member

eernstg commented Aug 28, 2023

@hamsbrar wrote:

if we want to support both users i.e a user who is NOT OK with an inaccessible type and a user who is, then I don't think adding a simple lint will help.

The lint isn't going to help a developer who is experiencing a typing which is an outright error or perhaps just an inconvenience; it is intended to help the authors and maintainers of the types for which we're computing an upper bound.

The assumption is that those authors and maintainers will want to avoid having certain structures in their type hierarchies. For instance, they might well want to make sure that no public type is a subtype of any private type. If that's true then we don't have to worry about how to deal with public types whose UP turns out to be a private type because it basically never happens.

If that's daily life, and we have a ruleset with no exceptions, then it's not going to be hard to reason about cases where it does happen after all (that is, some expressions have a static type which is private to another library). Such cases might turn out to be a useful design pattern, and we generally don't make anything an error in Dart just because we haven't thought of a useful way to use it, yet.
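For instance, the structure such a lint could flag might look like this (the lint itself is hypothetical; the point is just that the only informative shared superinterface is private):

class _Base {}

class Pub1 extends _Base {}

class Pub2 extends _Base {}

var b = true;

void main() {
  // The upper bound of Pub1 and Pub2 should be `_Base` here, so `x` gets a
  // static type whose name is inaccessible in any other library.
  var x = b ? Pub1() : Pub2();
  print(x.runtimeType);
}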

In client code, it is always possible to force a particular solution if UP doesn't return the result we need. For example, we could have a superinterface structure where it could never work (because no reasonable definition of UP would be able to disambiguate the given types), so we'd have to force a specific choice explicitly:

class I {}
class J {}
class C1 implements I, J {}
class C2 implements I, J {}

var b = true;

void main() {
  var x = b ? C1() : C2(); // Must yield `Object`, which we don't want!
  var y = b ? C1() : C2() as I; // Disambiguate explicitly, `y` gets type `I`.
}

... turn it into an internal discussion

Sorry about the delayed responses. We do have lots of internal discussions (as well), and things can take time to settle. But do continue to contribute to any and all Dart discussions here if and when you want to do so!

I still don't get why I'd want to do a super.f in M2 which is not related to M1 in any way.

Superinvocations in mixins are useful when you want to create a behavior from parts:

class A {
  final String name;
  A(this.name);
  String greeting() => name;
}

mixin Hello on A {
  @override
  String greeting() => 'Hello, ${super.greeting()}';
}

mixin Nice on A {
  @override
  String greeting() => '${super.greeting()}, nice to see you!';
}

mixin Join on A {
  @override
  String greeting() => '${super.greeting()}! Come and have cup of coffee!';
}

class B1 = A with Hello, Nice;

class B2 = A with Hello, Join;

void main() {
  print(B1('John').greeting()); // Hello, John, nice to see you!
  print(B2('John').greeting()); // Hello, John! Come and have cup of coffee!
}

The mixins don't have to depend directly on each other, they just need to be designed for contributing to a behavior that has multiple parts.

@hamsbrar
Copy link
Author

We do have lots of internal discussions (as well), and things can take time to settle.

No worries!

In client code, it is always possible to force a particular solution if UP doesn't return the result we need. For example, we could have a superinterface structure where it could never work (because no reasonable definition of UP would be able to disambiguate the given types), so we'd have to force a specific choice explicitly.

I think most users are going to write this:

class I {}
class J {}
class C1 implements I, J {}
class C2 implements I, J {}

var b = true;

void main() {
  var x = b ? C1() : C2();
  var y = b ? C1() : C2();

  // -----------------------------

  if(y is I) {
    // y.somethingFromI();
  }

  if(y is J) {
    // y.somethingFromJ();
  }

  // -----------------------------
}

Which is also fine, so no issues here. They may be able to write this too, but perhaps only in the distant future:

void main() {
  var x = b ? C1() : C2();
  var y = b ? C1() : C2();

  // y.somethingFromI();
  // y.somethingFromJ();
}

One thing: it'd be great if UP errors carried more information, because in many cases they aren't related to user code.

The mixins don't have to depend directly on each other, they just need to be designed for contributing to a behavior that has multiple parts.

This pattern looks fragile, but if that's what users want, then what can I say. Also, I admit that there's no other way to enable this pattern, since it depends on everything there is to depend on in the current design. I think a lint for detecting mixin applications with conflicting members could help users who are looking for more safety and predictability.

class B1 = A with Hello, Nice;

Thank you for sharing this one!

@eernstg
Copy link
Member

eernstg commented Aug 29, 2023

One thing: it'd be great if UP errors carried more information, because in many cases they aren't related to user code.

Good point!

@bwilkerson, do you think there is a way ahead for telling users how the type Object came up and caused the error in situations like the following one?

bool b = true;

void main() {
  List<num> xs = [1];
  Iterable<int> ys = [2];
  Iterable<num> zs = b ? xs : ys;
}

Perhaps it could be managed as a form of provenance on a DartType which would be non-null only in special cases, like an UP operation that uses the last clause (that is, the Dart 1 least upper bound algorithm).

@hamsbrar
Copy link
Author

hamsbrar commented Aug 30, 2023

A question about the Iterable example above:

  1. If UP(Iterable, List) is Iterable.
  2. If UP(num, int) is num.
  3. Then, a user will naturally conclude that UP(Iterable<int>, List<num>) is Iterable<num>.

What if I generalise their conclusion as UP(C1<T1, ..., Tn>, C2<S1, ..., Sn>) = C<R1, ..., Rn>, where C is UP(C1, C2) and Ri is UP(Ti, Si), given that UP(C1, C2) yields a C that accepts exactly n type arguments and there are no generic/type constraints on Ti, Si, and Ri?

Now, is this going to run into problems? And if not, what would be the impact of this generalisation?
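For concreteness, here is what the generalisation would yield for the earlier example (this describes the expected result under the generalisation, not necessarily what the current algorithm does):

// UP(List, Iterable) = Iterable and UP(num, int) = num, so the generalisation
// gives UP(List<num>, Iterable<int>) = Iterable<num>.
bool b = true;

void main() {
  List<num> xs = [1];
  Iterable<int> ys = [2];
  var zs = b ? xs : ys; // under the generalisation, `zs` would be Iterable<num>
  print(zs);
}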

@eernstg
Copy link
Member

eernstg commented Aug 30, 2023

Check out #3282. ;-)

@hamsbrar
Copy link
Author

hamsbrar commented Sep 4, 2023

I did check it; the proposed version over there is somewhat similar to the generalisation I made, but yours is definitely better: less restrictive and not missing any details.

The algorithm proposed in this thread can be extended with proper handling of type arguments while avoiding termination issues. That is, I can imagine the algorithm accepting C1<T1, ..., Ti> and C2<S1, ..., Si> as input and going through them without recursion. Specificity (not always equal to depth) is a byproduct of the proposed design, so a separate computation that determines depth (steps away from Object) can be avoided as well. But I also see that not everyone likes the proposed design, so there's no point in trying to evaluate or talk about it further.

If I'm not mistaken this time, the only problem that remains is arbitrary choices, and it's not related to the algorithms. It's a decision that other parts have to make.

  • One proposed change is to remove inaccessible types from the results, which will help in cases where the algorithm finds more than one result but only one of them is accessible to the user.
  • A second proposed change is to use something like a mixed type, which can be further enhanced to mixed implements BestResult1, BestResult2.. where possible.
  • A third solution is union types (I think).

If the second and third changes are impossible or infeasible, and not everyone likes the first change, then:

  • Remove inaccessible types from the results iff removing them leaves exactly one result.

This will fix issues stemming solely from inaccessible types.

(I'm not seeking immediate actions; this is just an overview.)


Also, I can finally see how the depth function is defined (#3282), and it clearly suffers from choosing arbitrarily:

class A {}

class B {}
class B2 extends B {}

class T1 extends A implements B2 {}
class T2 extends A implements B2 {}

var t = (true as dynamic) ? T1() : T2(); // inferred type: B2

A is a valid choice, just as B2 is. Choosing B2 is a purely arbitrary decision.

The specification needs to stop shifting the goalposts and be transparent about arbitrary choices:

If the specification doesn't want arbitrary choices, then it should say 'select the type that is common and is more specific than every other common type' or 'choose none'. An implementation can use the depth function to determine which type is more specific iff all common types are related; otherwise it returns none (or falls back to Object/dynamic if required). (A sketch of this rule follows below, after the second option.)

(OR)

If the depth function is allowed to make arbitrary choices elsewhere, then it should also be allowed to make an arbitrary choice when there are multiple types at the deepest level (when Sq has cardinality greater than one). Users can simply cast to the type they want if they don't get what they want up front. They're used to it anyway.
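Here is a minimal sketch of the first option ('pick the unique most specific common supertype, else none'); TypeNode and its methods are hypothetical helpers, not analyzer APIs, and the sketch uses subtyping directly rather than depth:

class TypeNode {
  final String name;
  final Set<TypeNode> supertypes;
  TypeNode(this.name, [this.supertypes = const <TypeNode>{}]);

  // Assumes an acyclic hierarchy.
  bool isSubtypeOf(TypeNode other) =>
      identical(this, other) || supertypes.any((s) => s.isSubtypeOf(other));

  Set<TypeNode> allSupertypes() =>
      {this, for (final s in supertypes) ...s.allSupertypes()};
}

// Returns the unique common supertype that is more specific than every other
// common supertype, or null ("choose none") if there is no such type.
TypeNode? uniqueMostSpecificCommon(TypeNode a, TypeNode b) {
  final common = a.allSupertypes().intersection(b.allSupertypes());
  for (final c in common) {
    if (common.every((other) => c.isSubtypeOf(other))) return c;
  }
  return null;
}

void main() {
  final object = TypeNode('Object');
  final i = TypeNode('I', {object});
  final j = TypeNode('J', {object});
  final c1 = TypeNode('C1', {i, j});
  final c2 = TypeNode('C2', {i, j});
  print(uniqueMostSpecificCommon(c1, c2)?.name); // null: I and J tie, so none
}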

@eernstg
Copy link
Member

eernstg commented Sep 4, 2023

@hamsbrar, thanks for the kind words!

A is a valid choice, just as B2 is. Choosing B2 is a purely arbitrary decision.

Point well taken! A and B2 are unrelated types, and there is little justification for the assumption that B2 is a more useful result. It is not arbitrary, technically, because it relies on the 'depth' of those types, but the depth of a type isn't directly connected to the relevance or usefulness of a type for the given context.

On the other hand, the property of being relevant to the context doesn't appear to be algorithmic (unless it is expressed directly as a context type schema), so we probably can't even hope to eliminate the reliance on some property which is "arbitrary" in that sense.

I think this would be yet another reason to keep this issue in mind: #1618.

@hamsbrar
Copy link
Author

hamsbrar commented Sep 4, 2023

"technically" :) even random numbers aren't random.

@eernstg
Copy link
Member

eernstg commented Apr 24, 2024

At this time, the inference-update-3 feature (previously behind --enable-experiment=inference-update-3) has been enabled by default, which means that the typing of a branching expression (here a switch expression, but it applies to conditional expressions ?: and other forms as well) will use the context type, and the reported inconvenience is gone.

In particular, this example is accepted with no errors with dart and dart analyze from c42fd69433ca0741fda5ecf82803aaa2fda24165:

enum E { a, b }
mixin M on Enum {}
abstract interface class I {}
enum E1 with M implements I { a1 }
enum E2 with M implements I { a2 }

List<Enum> f(E e) => switch (e) {
      E.a => E1.values,
      E.b => E2.values,
    };

void main() => f(E.a).forEach(print);

I'll close this issue because the fix is in the pipeline and will be available soon.

Note that the example used to be related to #3665 as well, but this is no longer true (because the context type overrules the upper bound computation). Anyway, that issue is being fixed as well.
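Presumably the same mechanism helps the earlier I/J example: with a context type, both branches are checked against it instead of relying on the upper bound (a sketch assuming the behavior described above):

class I {}
class J {}
class C1 implements I, J {}
class C2 implements I, J {}

var b = true;

I g() => b ? C1() : C2(); // the context type `I` overrules the upper bound

void main() => print(g());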

@eernstg eernstg closed this as completed Apr 24, 2024