Lightweight extension types #1426

Closed
eernstg opened this issue Feb 1, 2021 · 105 comments
Labels
extension-types; feature (Proposed language feature that solves one or more problems); inline-classes (Cf. language/accepted/future-releases/inline-classes/feature-specification.md)

Comments

@eernstg
Member

eernstg commented Feb 1, 2021

[Jun 16 2021: Note the proposal for extension types and the newer proposal for views.]

[Feb 24th 2021: This issue is getting too long. Please use separate issues to discuss subtopics of this feature.]

[Editing: Note that updates are described at the end, search for 'Revisions'.]

Cf. #40, #42, and #1474, this issue contains a proposal for how to support static extension types in Dart as a minimal enhancement of the static extension methods that Dart already supports.

In this proposal, a static extension type is a zero-cost abstraction mechanism that allows developers to replace the set S_instance of available operations on a given object o (that is, the instance members of its type) by a different set S_extension of operations (the members declared by the specific extension type).

One possible perspective is that an extension type corresponds to an abstract data type: There is an underlying representation, but we wish to restrict the access to that representation to a set of operations that are completely independent of the operations available on the representation. In other words, the extension type ensures that we only work with the representation in specific ways, even though the representation itself has an interface that allows us to do many other (wrong) things.

It would be straightforward to achieve this by writing a class C with members S_extension as a wrapper, and working on the wrapper object new C(o) rather than accessing o and its methods directly.

However, creating wrapper objects takes time and space, and when we wish to work on an entire data structure we'd need to wrap each object as we navigate it. For instance, we'd need to wrap every node in a tree if we wish to traverse the tree and maintain the discipline of using S_extension on each node we visit.

In contrast, the extension type mechanism is zero-cost in the sense that it does not use a wrapper object; it enforces the desired discipline statically.
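For comparison, the wrapper-class approach can be sketched in Dart as it exists today. The class name ListSizeWrapper and the nested-list data are hypothetical and serve only as illustration; the point is that every object we want to access in the restricted way needs its own wrapper allocation.

```dart
// Hypothetical wrapper class playing the role of `C` in the text above.
class ListSizeWrapper<X> {
  final List<X> _list; // The wrapped representation object.
  ListSizeWrapper(this._list);

  int get size => _list.length;
  X front() => _list[0];
}

void main() {
  var wrapped = ListSizeWrapper<String>(['Hello']); // Allocates a wrapper.
  print('Size: ${wrapped.size}. Front: ${wrapped.front()}');
  // wrapped[0]; // Compile-time error: the wrapper has no `operator []`.

  // Navigating a nested structure needs one wrapper per visited object.
  var nested = [
    ['a', 'b'],
    ['c'],
  ];
  for (var inner in nested) {
    print(ListSizeWrapper(inner).front());
  }
}
```

The wrapper enforces the restricted interface at run time as well, which is exactly the cost the extension type mechanism is intended to avoid.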

Examples

extension ListSize<X> on List<X> {
  int get size => length;
  X front() => this[0];
}

void main() {
  ListSize<String> xs = <String>['Hello']; // OK, upcast.
  print(xs); // OK, `toString()` available on Object?.
  print("Size: ${xs.size}. Front: ${xs.front()}"); // Available members.
  xs[0]; // Error, no `operator []`.

  List<ListSize<String>> ys = [xs]; // OK.
  List<List<String>> ys2 = ys; // Error, downcast.
  ListSize<ListSize<Object>> ys3 = ys; // OK.
  ys[0].front(); // OK.
  ys3.front().front(); // OK.
  ys as List<List<String>>; // `ys` is promoted, succeeds at run time.
  // We may wish to lint promotion out of extension types.
}

A major application would be generated extension types, handling the navigation of dynamic object trees (such as JSON, using something like numeric types, String, List<dynamic>, Map<String, dynamic>), with static type dynamic, but assumed to satisfy a specific schema.

Here's a tiny core of that, based on nested List<dynamic> with numbers at the leaves:

extension TinyJson on Object? {
  Iterable<num> get leaves sync* {
    var self = this;
    if (self is num) {
      yield self;
    } else if (self is List<dynamic>) {
      for (Object? element in self) {
        yield* element.leaves;
      }
    } else {
      throw "Unexpected object encountered in TinyJson value";
    }
  }
}

void main() {
  TinyJson tiny = <dynamic>[<dynamic>[1, 2], 3, <dynamic>[]];
  print(tiny.leaves);
}

Proposal

Syntax

This proposal does not introduce new syntax.

Note that the enhancement sections below do introduce new syntax.

Static analysis

Assume that E is an extension declaration of the following form:

extension Ext<X1 extends B1, .. Xm extends Bm> on T { // T may contain X1 .. Xm.
  ... // Members
}

It is then allowed to use Ext<S1, .. Sm> as a type: It can occur as the declared type of a variable or parameter, as the return type of a function or getter, as a type argument in a type, or as the on-type of an extension.

In particular, it is allowed to create a new instance where one or more extension types occur as type arguments.

When m is zero, Ext<S1, .. Sm> simply stands for Ext, a non-generic extension. When m is greater than zero, a raw occurrence Ext is treated like a raw type: Instantiation to bound is used to obtain the omitted type arguments.

We say that the static type of said variable, parameter, etc. is the extension type Ext<S1, .. Sm>, and that its static type is an extension type.

If e is an expression whose static type is the extension type Ext<S1, .. Sm> then a member access like e.m() is treated as Ext<S1, .. Sm>(e as T).m() where T is the on-type corresponding to Ext<S1, .. Sm>, and similarly for instance getters and operators. This rule also applies when a member access implicitly has the receiver this.

That is, when the type of an expression is an extension type, all method invocations on that expression will invoke an extension method declared by that extension, and similarly for other member accesses. In particular, we cannot invoke an instance member when the receiver type is an extension type.
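The desugared form mentioned above, Ext<S1, .. Sm>(e as T).m(), is already expressible with the explicit extension invocation syntax that Dart supports today. The following is a minimal sketch of what the rewritten call would look like; it reuses the ListSize extension from the examples, and the variable o is a hypothetical stand-in for an expression e whose on-type must be recovered by a cast.

```dart
extension ListSize<X> on List<X> {
  int get size => length;
  X front() => this[0];
}

void main() {
  // Stand-in for an expression whose static type does not expose the on-type.
  Object? o = <String>['Hello'];

  // Explicit extension application: the receiver is cast to the on-type and
  // the member is resolved statically in `ListSize<String>`.
  print(ListSize<String>(o as List<String>).front());
  print(ListSize<String>(o as List<String>).size);
}
```

Under the proposal, the same invocations would be written as plain member accesses on a receiver typed ListSize<String>, with this expansion applied implicitly.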

For the purpose of checking assignability and type parameter bounds, an extension type Ext<S1, .. Sm> with type parameters X1 .. Xm and on-type T is considered to be a proper subtype of Object?, and a proper supertype of [S1/X1, .. Sm/Xm]T.

That is, the underlying on-type can only be recovered by an explicit cast, and there are no non-trivial supertypes. So an expression whose type is an extension type is in a sense "in prison", and we can only obtain a different type for it by forgetting everything (going to a top type), or by means of an explicit cast.

When U is an extension type, it is allowed to perform a type test, o is U, and a type check, o as U. Promotion of a local variable x based on such type tests or type checks shall promote x to the extension type.

Note that promotion only occurs when the type of o is a top type. If o already has a non-top type which is a subtype of the on-type of U then we'd use a fresh variable U o2 = o; and work with o2.

There is no change to the type of this in the body of an extension E: It is the on-type of E. Similarly, extension methods of E invoked in the body of E are subject to the same treatment as previously, which means that extension methods of the enclosing extension can be invoked implicitly, and it is even the case that extension methods are given higher priority than instance methods on this, also when this is implicit.

Dynamic semantics

At run time, for a given instance o typed as an extension type U, there is no reification of U associated with o.

By soundness, the run-time type of o will be a subtype of the on-type of U.

For a given instance of a generic type G<.. U ..> where U is an extension type, the run-time representation of the generic type contains a representation of the on-type corresponding to U at the location where the static type has U. Similarly for function types.

This implies that void Function(Ext) is represented as void Function(T) at run time when Ext is an extension with on-type T. In other words, it is possible to have a variable of type void Function(T) that refers to a function object of type void Function(Ext), which seems to be a soundness violation. However, we consider such types to be the same type at run time, which is in any case the finest distinction that we can maintain. There is no soundness issue, because the added discipline of an extension type is voluntary: execution is still sound as long as we treat the underlying object according to the on-type.

A type test, o is U, and a type check, o as U, where U is an extension type, is performed at run time as a type test and type check on the corresponding on-type.

Enhancements

The previous section outlines a core proposal. The following sections introduce a number of enhancements that were discussed in the comments on this issue.

Prevent implicit invocations: Keyword 'type'.

Consider the type int. This type is likely to be used as the on-type of many different extension types, because it allows a very lightweight object to play the role as a value with a specific interpretation (say, an Age in years or a Width in pixels). Different extension types are not assignable to each other, so we'll offer a certain protection against inconsistent interpretations.

If we have many different extension types with the same or overlapping on-types then they may be impractical to work with: lots of extension methods are applicable to any given expression of that on-type, even though they are not intended to be used in general; each of them should only be used when the associated interpretation is valid.

So we need to support the notion of an extension type whose methods are never invoked implicitly. One very simple way to achieve this is to use a keyword, e.g., type. The intuition is that an 'extension type' is used as a declared type, and it has no effect on an expression whose static type matches the on-type. Here's the rule:

An extension declaration may start with extension type rather than extension. Such an extension is not applicable for any implicit extension method invocations.

<extensionDeclaration> ::=
    'extension' 'type'? <typeIdentifier> <typeParameters>? 'on' <type>
    '{' (<metadata> <extensionMemberDefinition>)* '}'

For example:

extension type Age on int {
  Age get next => this + 1;
}

void main() {
  int i = 42;
  i.next; // Error, no such member.
  Age age = 42;
  age.next; // OK.
}

Allow instance member access: show, hide.

The core proposal in this issue disallows invocations of instance methods of the on-type of a given extension type. This may be helpful, especially in the situation where the main purpose of the extension type is to ensure that the underlying data is processed in a particular, disciplined manner, whereas the on-type allows for many other operations (that may violate some invariants that we wish to maintain).

However, it may also be useful to support invocation of some or all instance members on a receiver whose type is an extension type. For instance, there may be some read-only methods that we can safely call on the on-type, because they won't violate any invariants associated with the extension type. We address this need by introducing hide and show clauses on extension types.

An extension declaration may optionally have a show and/or a hide clause after the on clause.

<extensionDeclaration> ::=
    'extension' 'type'? <typeIdentifier> <typeParameters>?
        'on' <type> <extensionShowHide>
    '{' (<metadata> <extensionMemberDefinition>)* '}'

<extensionShowHide> ::= <extensionShow>? <extensionHide>?

<extensionShow> ::= 'show' <extensionShowHideList>
<extensionHide> ::= 'hide' <extensionShowHideList>

<extensionShowHideList> ::= <extensionShowHideElement> |
    <extensionShowHideList> ',' <extensionShowHideElement>

<extensionShowHideElement> ::= <type> | <identifier>

We use the phrase extension show/hide part, or just show/hide part when no doubt can arise, to denote a phrase derived from <extensionShowHide>. Similarly, an <extensionShow> is known as an extension show clause, and an <extensionHide> is known as an extension hide clause, similarly abbreviated to show clause and hide clause.

The show/hide part specifies which instance members of the on-type are available for invocation on a receiver whose type is the given extension type.

A compile-time error occurs if an extension does not have the type keyword, and it has a hide or a show clause.

If the show/hide part is empty, no instance members except the ones declared for Object? can be invoked on a receiver whose static type is the given extension type.

If the show/hide part is a show clause listing some identifiers and types, invocation of an instance member is allowed if its basename is one of the given identifiers, or it is the name of a member of the interface of one of the types. Instance members declared for Object? can also be invoked.

If the show/hide part is a hide clause listing some identifiers and types, invocation of an instance member is allowed if it is in the interface of the on-type and not among the given identifiers, nor in the interface of the specified types.

If the show/hide part is a show clause followed by a hide clause then the set of available instance members is computed by first considering the show clause as described above, and then removing instance members from that set based on the hide clause as described above.

A compile-time error occurs if a hide or show clause contains an identifier which is not the basename of an instance member of the on-type. A compile-time error occurs if a hide or show clause contains a type which is not among the types that are implemented by the on-type of the extension.

A type in a hide or show clause may be raw (that is, an identifier or qualified identifier denoting a generic type, but no actual type arguments). In this case the omitted type arguments are determined by the corresponding superinterface of the on-type.

For example:

extension type MyInt on int show num, isEven hide floor {
  int get twice => 2 * this;
}

void main() {
  MyInt m = 42;
  m.twice; // OK, in the extension type.
  m.isEven; // OK, a shown instance member.
  m.ceil(); // OK, a shown instance member.
  m.toString(); // OK, an `Object?` member.
  m.floor(); // Error, hidden.
}
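The show/hide rules above can be read as a small set computation. The following is a rough model only, not part of the proposal: the function availableInstanceMembers is hypothetical, members are represented by their basenames, any types in a show or hide clause are assumed to have been expanded to their member names already, and members declared for Object? are assumed to remain available in every case.

```dart
// Hypothetical model of the show/hide rules, working on member basenames.
Set<String> availableInstanceMembers({
  required Set<String> onTypeMembers,
  required Set<String> objectMembers,
  Set<String>? show,
  Set<String>? hide,
}) {
  Set<String> result;
  if (show != null) {
    result = {...show}; // Show clause: start from the listed members.
  } else if (hide != null) {
    result = {...onTypeMembers}; // Hide only: start from the whole interface.
  } else {
    result = {}; // Empty show/hide part: nothing from the on-type.
  }
  if (hide != null) result.removeAll(hide);
  // Assumption of this sketch: `Object?` members are always available.
  return result..addAll(objectMembers);
}

void main() {
  // Small illustrative subsets of the real interfaces, not complete lists.
  final objectMembers = {'toString', 'hashCode', 'runtimeType'};
  final numMembers = {'ceil', 'floor', 'abs'};
  final intMembers = {...numMembers, ...objectMembers, 'isEven', 'isOdd'};

  // Models `extension type MyInt on int show num, isEven hide floor`.
  final members = availableInstanceMembers(
    onTypeMembers: intMembers,
    objectMembers: objectMembers,
    show: {...numMembers, 'isEven'},
    hide: {'floor'},
  );
  print(members.contains('ceil')); // true
  print(members.contains('floor')); // false
  print(members.contains('isOdd')); // false
  print(members.contains('toString')); // true
}
```

This matches the MyInt example: ceil and isEven are shown, floor is removed by the hide clause, and isOdd is never made available.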

Invariant enforcement through introduction: Protected extension types

In some cases, it may be convenient to be able to create a large object structure with no language-level constraints imposed, and later work on that object structure using an extension type. For instance, a JSON value could be modeled by an object structure containing instances of something like int, bool, String, List<dynamic>, and Map<String, dynamic>, and there may be a schema which specifies a certain regularity that this object structure should have. In this case it makes sense to use the approach of the original proposal in this issue: The given object structure is created (perhaps by a general purpose JSON deserializer) without any reference to the schema, or the extension type. Later on the object structure is processed, using an extension type which corresponds to the schema.

However, in other cases it may be helpful to constrain the introduction of objects of the given extension types, such that it is known from the outset that if an expression has a type U which is an extension type then it was guaranteed to have been given that type in a situation where it satisfied some invariants. If the underlying representation object (structure) is mutable, the extension type members should be written in such a way that they preserve the given invariants.

We introduce the notion of extension type constructors to handle this task.

An extension declaration with the type keyword can start with the keyword protected. In this case we say that it is a protected extension type. A protected extension type can declare one or more non-redirecting factory constructors. We use the phrase extension type constructor to denote such constructors.

An instance creation expression of the form Ext<T1, .. Tk>(...) or Ext<T1, .. Tk>.name(...) is used to invoke these constructors, and the type of such an expression is Ext<T1, .. Tk>.

During static analysis of the body of an extension type constructor, the return type is considered to be the on-type of the enclosing extension type declaration.

In particular, it is a compile-time error if it is possible to reach the end of an extension type constructor without returning anything.

A protected extension type is a proper subtype of the top types and a proper supertype of Never.

In particular, there is no subtype relationship between a protected extension type and the corresponding on-type.

When E (respectively E<X1, .. Xk>) is a protected extension type, it is a compile-time error to perform a downcast or promotion where the target type is E (respectively E<T1, .. Tk>).

The rationale is that an extension type that justifies any constructors will need to maintain some invariants, and hence it is not helpful to allow implicit introduction of any value of that type with no enforcement of the invariants at all.

For example:

protected extension type nat on int {
  factory nat(int value) =>
      value >= 0 ? value : throw "Attempt to create an invalid nat";
}

void main() {
  nat n1 = 42; // Error.
  var n2 = nat(42); // OK at compile time and at run time.
}

The run-time representation of a type argument which is a protected extension type E resp. E<T1, .. Tk> is an identification of E resp. E<T1, .. Tk>.

In particular, it is not the same as the run-time representation of the corresponding on-type. This is necessary in order to maintain that the on-type and the protected extension type are unrelated.

For example:

class IntBox {
  int i;
  IntBox(this.i);
}

protected extension type EvenIntBox on IntBox {
  factory EvenIntBox(int i) => i % 2 == 0 ? IntBox(i) : throw "Invalid EvenIntBox";
  void next() => this.i += 2;
}

void main() {
  var evenIntBox = EvenIntBox(42);
  evenIntBox.next(); // Methods of `EvenIntBox` maintain the invariant.
  var intBox = evenIntBox as IntBox; // OK statically and dynamically.
  intBox.i++; // Invariant of `evenIntBox` violated!

  var evenIntBoxes = [evenIntBox]; // Type `List<EvenIntBox>`.
  evenIntBoxes[0].next(); // Elements typed as `EvenIntBox`, maintain invariant.
  List<IntBox> intBoxes = evenIntBoxes; // Compile-time error.
  intBoxes = evenIntBoxes as dynamic; // Run-time error.
}

Boxing

It may be helpful to equip each extension type with a companion class whose instances have a single field holding an instance of the on-type, so it's a wrapper with the same interface as the extension type.

Let E be an extension type with keyword type. The declaration of E is implicitly accompanied by a declaration of a class CE with the same type parameters and members as E, subclass of Object, and with a final field whose type is the on-type of E, and with a single argument constructor setting that field. The class can be denoted in code by E.class. An implicitly induced getter E.class get box returns an object that wraps this.

In the case where it would be a compile-time error to declare such a member named box, the member is not induced.

The latter rule helps avoid conflicts in situations where box is a non-hidden instance member, and it allows developers to write their own declaration of box if needed.

Non-object entities

If we introduce any non-object entities in Dart (that is, entities that cannot be assigned to a variable of type Object?, e.g., external C / JavaScript / ... entities, or non-boxed tuples, etc), then we may wish to allow for extension types whose on-type is a non-object type.

This should not cause any particular problems: If the on-type is a non-object type then the extension type will not be a subtype of Object.

Discussion

It would be possible to reify extension types when they occur as type arguments of a generic type.

This might help ensure that the associated discipline of the extension type is applied to the elements in, say, a list, even in the case where that list is obtained under the type dynamic, and a type test or type cast is used to confirm that it is a List<U> where U is an extension type.

However, this presumably implies that the cast to a plain List<T> where T is the on-type corresponding to U should fail; otherwise the protection against accessing the elements using the underlying on-type will easily be violated. Moreover, even if we do make this cast fail then we could cast each element in the list to T, thus still accessing the elements using the on-type rather than the more disciplined extension type U.

We cannot avoid the latter if there is no run-time representation of the extension type in the elements in the list, and that is assumed here: For example, if we have an instance of int, and it is accessed as extension MyInt on int, the dynamic representation will be a plain int, and not some wrapped entity that contains information that this particular int is viewed as a MyInt. It seems somewhat inconsistent if we maintain that a List<MyInt> cannot be viewed as a List<int>, but a MyInt can be viewed as an int.

As for promotion, we could consider "promoting to a supertype" when that type is an extension type: Assume that U is an extension type with on-type T, and the type of a promotable local variable x is T or a subtype thereof; x is U could then demote x to have type U, even though is tests normally do not demote. The rationale would be that the treatment of x as a U is conceptually more "informative" and "strict" than the treatment of x as a T, which makes it somewhat similar to a downcast.

Note that we can use extension types to handle void:

extension void on Object? {}

This means that void is an "epsilon-supertype" of all top types (it's a proper supertype, but just a little bit). It is also a subtype of Object?, of course, so that creates an equivalence class of "equal" types spelled differently. That's a well-known concept today, so we can handle that (and it corresponds to the dynamic semantics).

This approach does not admit any member accesses for a receiver of type void, and it isn't assignable to anything else without a cast. Just like void of today.

However, compared to the treatment of today, we would get support for voidness preservation, i.e., it would no longer be possible to forget voidness without a cast in a number of higher order situations:

List<Object?> objects = <void>[]; // Error.

void f(Object? o) { print(o); }
Object? Function(void) g = f; // Error, for both types in the signature.

Revisions

  • Feb 8, 2021: Introduce 'protected' extension type terminology. Change the proposed subtype relationship for protected extension types. Remove proposal 'alternative 1' where protected extension types are completely unrelated to other types. Add reification of protected extension types as part of the proposal (making 'alternative 2' part of the section on protected extension types).

  • Feb 5, 2021: Add some enhancement mechanism proposals, based on the discussion below: Keyword type prevents implicit invocations; construction methods; show/hide; non-objects.

  • Feb 1, 2021: Initial version.

@eernstg eernstg added the feature Proposed language feature that solves one or more problems label Feb 1, 2021
@eernstg
Member Author

eernstg commented Feb 1, 2021

I'm not proposing that StringExt("hello") should be usable as a stand-alone expression. It's certainly possible that this would be a meaningful addition to the proposal, but my immediate take on this idea is that it wouldn't be particularly useful.

For instance, if we were to consider StringExt("hello") as a way to obtain an expression of type StringExt with representation "hello" then we could just as well have used "hello" as StringExt. In any case, this is just an upcast, so if we plan to use it as foo("hello" as StringExt) because the parameter of foo has type StringExt then we could as well use foo("hello"), and rely on the implicit upcast.

The other things are basically working as you describe them:

void main() {
  StringExt hello = "hello"; // OK, implicit upcast.
  hello.bar(); // OK.
  hello.length; // Error.
  ("hello" as StringExt).bar(); // OK.
  ("hello" as StringExt).length; // Error.
  hello.runtimeType; // OK, returns the `Type` that reifies `String`.
  hello as String; // Evaluates to the plain string "hello".
  hello is String; // Evaluates to true.
  StringExt anotherHello = "hello";
  identical(hello, anotherHello); // No promises.
  hello == anotherHello; // True (`operator ==` is available on `Object?`).
}

The run-time representation of hello and anotherHello is the underlying String, so identical behaves as it would behave with any other expression evaluating to the same strings. Canonicalization of strings has not been fully specified (#985), but the two string literals both spelled "hello" will in practice be identical.

Given that one of the main motivations for extension types is to allow developers to specify a disciplined way to work on very flexible objects (say, List<dynamic>, Map<String, dynamic> and similar types, used to build trees that should have a specific structure), it seems likely that we'd want to lint the type casts (like hello as String) and type tests (hello is String) where the extension type is dropped, and the underlying representation type (a supertype of the run-time type) is revealed.

We could easily make it harder to enter the extension type (e.g., by requiring an explicit cast when the target type is an extension type), and we could easily make it harder to exit the extension type (perhaps outlawing all type casts and type tests away from an extension type). But I suspect that the proper goal would be to help developers who wish to maintain the discipline associated with a specific extension type, such that they don't switch to the underlying representation type by accident.

There could be cases where an extension type is used to perform almost all of a task, but some remaining bits of work will be done in terms of the underlying representation (because it's too tedious to write methods in the extension to cover it all), and in those cases it might actually be both common and reasonable to switch to the representation type at some point.

@Cat-sushi

Does this proposal address the issue mentioned in "Dart string manipulation done right 👉 | by Tao Dong | Dart | Medium"?
For example, can this proposal hide or make deprecated some method such as length of String with a corresponding extension type?

@eernstg
Member Author

eernstg commented Feb 1, 2021

An extension type as proposed here cannot (conveniently) hide a single method in the interface of an existing class.

If we're willing to use a not-so-convenient approach then we could simply declare every single method in the interface of String except length as members of an extension MyString, and then access all strings using the type MyString rather than String.

However, that's a very unreliable approach, because string literals have type String, methods of code in other libraries (system libraries, libraries from third party packages, etc.) return Strings, and so on, so it isn't going to be easy to enforce that all strings are accessed using the type MyString.

@Cat-sushi

I think the problem is that String is and should be assignable to Stringy, which is an extension type of String.
My migration story is that,

  1. change the first property of constructor Text() to Stringy.
  2. encourage Flutter developers to migrate String to Stringy with IDEs.
  3. Someday, make String.length deprecated in order to lead developers to use Stringy.len delegating to String.characters.length.

@Cat-sushi

By the way, from a Japanese point of view, String.length is quite harmful, because '𠮷'.length returns 2 instead of 1.
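A quick check of this, as a minimal sketch in plain Dart: String.length counts UTF-16 code units, and '𠮷' (U+20BB7) lies outside the Basic Multilingual Plane, so it is encoded as a surrogate pair. The runes getter is used here only as a stand-in that needs no package import; it counts code points, which is not the same as counting grapheme clusters in general.

```dart
void main() {
  const kanji = '𠮷'; // U+20BB7, encoded as a surrogate pair in UTF-16.
  print(kanji.length); // 2: number of UTF-16 code units.
  print(kanji.runes.length); // 1: number of Unicode code points.
}
```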

@lrhn
Member

lrhn commented Feb 2, 2021

Are generics of extension types covariant?
I'd assume so, for consistency and because nothing is said. If/when we get declaration site variance, that can then be changed per type variable.

I would consider allowing Ext(expr) and Ext<TypeArgs>(expr) as expression syntax, which would cast to Ext<TypeArgs> with the type args explicit or inferred (I don't expect us to get type inference of the type arguments for expr as Ext).
It would convert an existing special case of the syntax into more general use, which I think is fine.

@lrhn
Member

lrhn commented Feb 2, 2021

This proposal has assignment from the on-type to the extension-type, but not in the other direction. That's clever, but also problematic for non-base types.

It means that a List<Ext> is not assignable to List<OnType>.

One of the potential uses of extension types is as aliases that provide a different view, but with the supertype/subtype relation here, that becomes much more cumbersome.
Example (as mentioned by @Cat-sushi, this is not entirely hypothetical):

extension Chars on String {
  // A grapheme cluster based String API.
}

List<Chars> split(Chars text, int partCount) {
  /// splits text into partCount parts of roughly equal length, counted in grapheme clusters.
}

This function cannot be used by someone who wants to see strings as Strings, not Chars. They'll need to convert the resulting list to a List<String>. So, we are introducing new types which are views on existing types, but which are not easily converted back to the original type.

I can see some use-cases negatively affected by that design. What are the advantages? What is better than if we just made the extension type and its on-type mutual subtypes? Or, with a different view, static aliases for the on-type.

(A "static alias" would be a different name for an existing type, one which is a mutual subtype of the type it's aliasing, and which is propagated like a separate type during type inference, and which may act differently wrt. static analysis and statically resolved invocations. At run-time there is only the original type, the static aliases do not reify at run-time because they have no run-time effect. We could even say that void and dynamic are static aliases for Object?, which was the design idea behind void and dynamic during the Dart 2.0 design phase). In that way Ext<T> would be like dynamic, we statically determine how to do the invocation on it (statically resolved vs. always dynamically resolved), but it's really just the normal object, with its real type, below it.

If it was aliases, a class could simply replace all String types with Chars types and be grapheme-cluster safe itself, and clients of the class won't even have to notice (well, I guess they might have to cast the return values back to String/List<String>, but that's just one assignment to a type variable).

@lrhn
Member

lrhn commented Feb 2, 2021

If e is an expression whose static type is the extension type Ext<S1, .. Sm> then a member access like e.m() is treated as Ext<S1, .. Sm>(e).m(), and similarly for instance getters and operators.

Nit: ... is treated as Ext<S1, .. Sm>(e).m() where the static type of e is the on type of Ext<S1,...,Sm>.

Because the extension does not apply to itself as specified (it's a supertype of its own on type).

This rule also applies when a member access implicitly has the receiver this.

That would require this to have an extension type, which can only happen if an extension type is the on type of another extension. So, can extensions apply to extension types?

extension Flint on int {
  int floo() => this + 42;
}
extension Grint on Flint {
  int groo() => floo() + 37;
  int gree() => this + 2; // INVALID!
}

I'd say it's perfectly fine for groo. The this type of Grint.groo is Flint, and hence floo() means this.floo(), which is valid.
And because of that gree is invalid because Flint does not have a + operator.

@lrhn
Member

lrhn commented Feb 2, 2021

Consider allowing an extension to "inherit" the on types methods, so the extension type actually extends the on type.

Strawman:

extension Flist<T> extends List<T> {
  int foo() => this.length + 42;
}
Flist<int> x = [1, 2];
print(x.foo() - x.length); // Works, because `x.length` is visible.

I think that many extensions are really intended as extending the API, not restricting it, so if possible it should be the default behavior to allow access to the on type's members except where overridden by the extension members, and we should add a new syntax for prohibiting that.

So extension Foo on int means that Foo x = 2; x + 4; works because Foo extends int but doesn't hide it.

Maybe we can use show/hide.

extension Chars on String hide length, operator[] {
  ...
}
extension Field<T extends num> on T show operator+, operator-, operator* {
  // ...
}

with shorthand on T hide { to hide everything.

@eernstg
Member Author

eernstg commented Feb 2, 2021

@tatumizer wrote:

Then why will we need to support the syntax of
ExtensionName(object).method(); at all? We can always use "as"

True. But ExtensionName(object).method() makes sense if you consider extension method invocation to be similar in nature to wrapper object instance method invocation. You could argue that a wrapper object can invoke an overriding method and static extension methods are resolved statically, but as long as we only consider wrapper objects that are created at the call site (and that's the only case we can express with this syntax), the dynamic type of the receiver is exactly the static type, so method invocations can be resolved statically for the wrapper object as well. So it isn't a bad starting point if you think about explicit extension method invocations as similar to wrapper object method invocations.

So I don't think it will be helpful to eliminate that syntax, and it will surely be a massively breaking change.
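
The wrapper-object analogy can be tried out with the extension methods Dart already has. The sketch below is only an illustration: `Shout` and `ShoutWrapper` are hypothetical names, not part of the proposal. It compares an explicit extension invocation with a wrapper object created at the call site.

```dart
// A static extension method on String (existing Dart feature).
extension Shout on String {
  String shout() => '${toUpperCase()}!';
}

// A hand-written wrapper class offering the same member.
class ShoutWrapper {
  final String value;
  ShoutWrapper(this.value);
  String shout() => '${value.toUpperCase()}!';
}

void main() {
  // Explicit extension invocation: reads like a constructor call,
  // but no wrapper object is allocated.
  print(Shout('hi').shout());
  // Wrapper created at the call site: its dynamic type is exactly its
  // static type, so this call could be resolved statically as well.
  print(ShoutWrapper('hi').shout());
  // Implicit extension invocation, resolved from the static type String.
  print('hi'.shout());
}
```

All three calls produce the same string; the difference is only in whether an intermediate object exists.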

@eernstg
Member Author

eernstg commented Feb 2, 2021

@lrhn wrote:

Are generics of extension types covariant?

I believe it's fair to say that they are invariant: For any given extension type Ext<S1, .. Sm>, the actual type arguments S1, .. Sm are bound to the type parameters and used in the body of the extension. We do not introduce any new semantics that would allow us to determine the actual value of those type parameters based on the run-time type of the given value of this.

That said, with an extension type Ext<S1, .. Sm> and a corresponding on-type U, it's certainly possible that U is (for example) a generic class C<T1, .. Tk>, and this has type arguments T1a, .. Tka at C that differ from T1, .. Tk. However, that's subject to the normal subtype rules, and we could have T1a :> T1 if the first type parameter of C is contravariant, etc.

Given that this has type C<T1, .. Tk> in the body of the extension, and given that it is exactly as safe to access this under that type as it would be anywhere else, I believe that no special considerations are needed here.

One thing is different, though. In a class C declaring a type parameter Xj, it is known that the actual type argument for Xj has exactly the value of Xj. This means, for instance, that it is safe to do add(e) in List<E> when the static type of e is E, and we may then call a specialized fast version of add (because there's no need to check the type of the argument). That kind of optimization won't be possible in the body of an extension. But that is already true.
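
This can be observed with today's extension methods. In the sketch below (the extension `Appender` and its members are hypothetical names), the type parameter `E` is bound from the static type of the receiver, so it can differ from the run-time type argument of the list, and the call to `add` still needs its run-time check.

```dart
extension Appender<E> on List<E> {
  // E comes from the static type of the receiver, not its run-time type.
  Type get boundE => E;

  // Inside the extension, `add(e)` cannot skip the covariance check.
  void append(E e) => add(e);
}

// Returns true if appending `value` is rejected at run time.
bool appendFails(List<num> list, num value) {
  try {
    list.append(value);
    return false;
  } on TypeError {
    return true;
  }
}

void main() {
  List<num> nums = <int>[1, 2]; // Static type List<num>, run-time List<int>.
  print(nums.boundE);            // The statically bound E.
  print(appendFails(nums, 1.5)); // A double cannot go into a List<int>.
  print(appendFails(nums, 3));   // An int is accepted.
}
```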

@eernstg
Member Author

eernstg commented Feb 2, 2021

@lrhn wrote:

This proposal has assignment from the on-type to the extension-type,
but not in the other direction. That's clever, but also problematic for
non-base types.

It is definitely working as intended. ;-)

The point is that it is safe to "enter" an extension type (because usage of an object is expected to be conceptually safer when accessing it under an extension type), but it is then a loss of safety to "exit" the extension type, that is, to cast it down to anything, including the on-type.

We might want to use some specialized syntax to perform a cast to the on-type, however, because that's guaranteed to succeed. So it's a safe way to start doing unsafe things, and we might want to help developers avoid a downcast to List<int> in the case where the on-type is actually List<num>.

What are the advantages?

That would be added safety. For the String/Chars example, it would indeed require a cast (that would succeed) from List<Chars> to List<String>, but that seems to be consistent with the view that Chars is a safer view on strings than String.

So wouldn't we actually want to have a tiny fence protecting us from going from Chars to String completely silently? I think the same consideration applies to List<Chars> vs. List<String>. For the cases where a type argument is actually contravariant or invariant, we'll get the appropriate treatment when sound variance is available.

@eernstg
Member Author

eernstg commented Feb 2, 2021

@lrhn wrote:

Nit: ...

Indeed, fixed, thanks!

This rule also applies when a member access implicitly has the receiver this.

That would require this to have an extension type, which can
only happen if an extension type is the on type of another
extension. So, can extensions apply to extension types?

I believe it is consistent to allow that. Note that promotion of this (which would be a demotion from the corresponding on-type) could also give rise to this situation.

@eernstg
Member Author

eernstg commented Feb 2, 2021

[Edit: I added this to the discussion section.]

Note that we can use extension types to handle void:

extension void on Object? {}

This means that void is an "epsilon-supertype" of all top types (it's a proper supertype, but just a little bit). It is also a subtype of Object?, of course, so that creates an equivalence class of "equal" types spelled differently. That's a well-known concept today, so we can handle that (and it corresponds to the dynamic semantics).

This approach does not admit any member accesses for a receiver of type void, and it isn't assignable to anything else without a cast. Just like void of today.

However, compared to the treatment of today, we would get support for voidness preservation, i.e., it would no longer be possible to forget voidness without a cast in a number of higher order situations:

List<Object?> objects = <void>[]; // Error.

void f(Object? o) { print(o); }
Object? Function(void) g = f; // Error, for both types in the signature.

@eernstg
Member Author

eernstg commented Feb 2, 2021

@lrhn wrote:

Consider allowing an extension to "inherit" the on types
methods, so the extension type actually extends the on type.

That might certainly be useful, but we already get very much the same effect from simply using the extension in the way which is supported today (so the receiver would have a type that matches the on-type, not the extension type).

Presumably, the benefit derived from an extension E extends T would be that it allows for restricting extension methods: With a receiver of type T we can call extension members from E and other extensions, with a receiver of type E we can only call the ones that are declared in E (plus extensions with E as the on-type, if any).

Does that carry its own weight?

However, a similar thing that could be useful would be to allow extensions to offer a set of members which is created by declaring some members and obtaining other members from other extensions. This is similar to inheritance. We could resolve conflicts simply by not inheriting any conflicting declarations; this is basically the same thing as overriding. However, we do not have to enforce any override relationships, because all invocations are resolved statically.
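
The point that all invocations are resolved statically already holds for existing extension methods. As a small sketch (the extensions `Describe` and `DescribeInt` are hypothetical names), the member that runs is chosen from the static type of the receiver, and a receiver of type dynamic does not see extension members at all.

```dart
extension Describe on Object {
  String describe() => 'object';
}

extension DescribeInt on int {
  String describe() => 'int';
}

// Returns true if a dynamic invocation of `describe` fails.
bool dynamicCallFails(dynamic d) {
  try {
    d.describe();
    return false;
  } on NoSuchMethodError {
    return true;
  }
}

void main() {
  int i = 1;
  Object o = i; // Same object, different static type.
  print(i.describe()); // The more specific extension on int applies.
  print(o.describe()); // Only the extension on Object applies.
  print(dynamicCallFails(i)); // Extension members are invisible to dynamic.
}
```

No override relationship exists between the two `describe` declarations; the static type alone decides which one is called.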

@eernstg
Member Author

eernstg commented Feb 2, 2021

@tatumizer wrote:

more questions

extension StringExt on String {
  void bar() {}
}

StringExt hello = "hello"; // OK, upcast.
hello is String; // OK, type test for subtype, evaluates to true.
"hello" is StringExt; // Evaluates to true.
"hello" as StringExt; // Succeeds at run time.

I think there's a bit of a problem with those "is/as". Normally, each
of them is a runtime operation:

That's still true, of course. But they will operate on the run-time representation of the given objects and types, and they erase the extension type down to the corresponding on-type.

dynamic h="hello";
if (h is String) { // Evaluates to true.
   //...
}
// BUT
if (h is ExtString) { // Evaluates to true.
   //...
}

The question about promotion is in the 'discussion' section. It seems rather likely that it would be useful to change the type of h to ExtString, and in the above example there's nothing special about that: We just note that ExtString is a subtype of dynamic and hence we can promote (when the variable is otherwise promotable).

However, it gets more tricky in other cases:

String s = "hello";
s as StringExt; // Change the type to `StringExt`?

In this case we are actually demoting the type of s. That may be useful, and we could certainly do it, but it is a possible source of confusion that we go to a supertype rather than a subtype, and it may create some technical issues (what do we do about the list of 'types of interest'?).

It is also possible that this kind of promotion is simply too weird, and the developer should have written StringExt in the first place.
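
The erasure behavior of is and as described above can be tried with the `extension type` declarations that later shipped in Dart 3.3. This is a sketch under that assumption: the syntax differs from the `extension StringExt on String` form used in this thread, and the member `doubledLength` is a hypothetical name.

```dart
// Dart 3.3+ syntax, not the syntax proposed in this issue.
extension type StringExt(String text) {
  int get doubledLength => text.length * 2;
}

// The test is performed on the representation type String at run time.
bool isStringExt(Object? o) => o is StringExt;

void main() {
  dynamic h = 'hello';
  print(h is String);    // Run-time test against String.
  print(isStringExt(h)); // Erased to the same test against String.
  print(isStringExt(42)); // An int is not a String, so this fails.
  final ext = h as StringExt; // Succeeds because h is a String.
  print(ext.doubledLength);
}
```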

@leafpetersen
Member

Broadly, this is in line with what I've been thinking, thanks for starting the discussion! A couple of comments.

  1. I'm fairly skeptical of re-using extensions as is. It will be very surprising, and a bad user experience, if having 4 different extension types which happen to use int as the on type result in a blizzard of conflicting extension methods being defined on int. Code completion on int will go to hell, and the user will get a bunch of methods made available on int that are not necessarily intended to be used on arbitrary ints (e.g. they expect to preserve invariants via the constructor (see below)). I strongly think that we will need different syntax here. My initial thinking was that we might use something like extension MyExt is int, but I'm not very opinionated here. extends is another possibility.

  2. I don't think we want assignability as the only affordance available. I think we will want the ability to require creation of the extension type to go through a constructor, so that invariants can be (at least loosely) enforced, and so that the construction parameters can be unrelated to the on type (e.g. I want to create an Id using a String, but represent the Id using an integer index into a table of Strings, and I do not want you to be able to treat an arbitrary int as an Id). My thinking is that you should be able to define one or more constructors in an extension type, and by default the only way to create an instance of the extension type is via a constructor. Perhaps if an extension type has no constructors then you can implicitly assign to it from the on type? Or perhaps we add implicit constructors, perhaps even to classes, so that you can define a constructor which automatically gets invoked in an otherwise invalid assignment.

  3. I think having some way to delegate to the on type with low syntactic overhead would be a very nice affordance. I had also thought of using show and hide. Using extension MyExt extends int to signal delegation might also work. However, I think it would be very nice (one way or the other) to also support mixing in multiple different extensions to produce a single extension with all of the functionality of the "super" extensions, which pushes more in the direction of something like show and hide, and at that point it's tempting to say that delegation to the underlying should just be a use of the same mechanism that is used to "mixin" other extensions.

  4. We may want the ability to introduce extension types which are not subtypes of Object. For interop, if we want to be able to talk about raw platform types that are not boxed, we would need some mechanism for segregating them from Objects, and this would be one potential such mechanism.

@leafpetersen
Member

Two additional comments I forgot to add:

  1. For variance, I think invariance is a reasonable choice: otherwise, I think it's valid to choose the variance induced by the uses of the type variables in the on type. But it's not valid to just say "covariant", things will break.

  2. There are a bunch of questions to be resolved around cycles. We probably don't want to add recursive types (maybe we do?), and that means doing some hard thinking about what is allowed/not allowed. extension MyExt<T> on MyExt<T> definitely not. extension MyExt<T> on List<MyExt<T>>.... maybe, maybe not? What about uses of MyExt<T> in the body?

@Cat-sushi

Sorry for my stupid question,
what is the benefit of this proposal in comparison with static extension types #42?
In other words, what does "minimal enhancement" mean?
I simply feel typedef is more natural as a keyword for transparent/ opaque derived types.

@lrhn
Member

lrhn commented Feb 3, 2021

@leafpetersen

Ad 6. Recursive types are bad when they have no finite representation. The underlying representation of an extension type is the on type, and we need to know what that type is. If the on type refers to the extension again, then the fully expanded representation type is going to be infinite.
So definite no to both extension Ext<T> on Ext<T> (solving provides no solution) and extension Ext<T> on List<Ext<T>> (solving gives the infinite type List<List<List<...∞...>>>). Referring to itself in the extension methods is perfectly fine, we have no issue with class C { C get c; } either. We just need to be able to figure out what the alias Ext<T> really means, by expanding all type aliases, and an extension type is a type alias for the representation type.

Ad 5. Variance. Yes, we should just follow the variance of the on type by default. When/if we get variance annotations, they can apply to extension type type parameters as well, and should obviously also match how the type parameters are used in the on type and extension methods.

@lrhn
Member

lrhn commented Feb 3, 2021

(Edit: Added "subtype" as option)

There are multiple possible approaches here:

Member visibility:

  • Transparent extension types: The extension type exposes members of the on type as well as the extension members (but preferring extension members to on-type members in case of a conflict).
  • Opaque extension types: The extension type only exposes the extension members.
  • Partial transparency: Some of the members of the on type are available, perhaps chosen using show/hide or easy delegation syntax.

Type relations

  • Static type alias: An extension type is an alias for the on type with different static behavior for invocations. It doesn't exist at run-time.
  • Dynamic type alias: An extension type and its on type are mutual subtypes. Can be freely assigned in either direction, and a List<Ext> (with Ext on Foo) can be used as a List<Foo>, but they are still different types at run-time. (Not sure this is a meaningful difference from a static type alias).
  • Supertype: An extension type is a super-type of its on type, and is retained at run-time. This allows Foo to be assigned to Ext and List<Foo> to be assigned to List<Ext>, but not the other direction. It makes sense because Foo has all the members of Ext as well. An as cast can cast the object to the on type.
  • Subtype: An extension type is a subtype of its on type, and a super-type of Never. This allows you to either down-cast to, or use a constructor to construct, a value of the extension type, but all values of the extension type can be directly assigned to the on type. You can freely use the type internally and not be afraid of breaking a receiver of your type, like List<Ext> foo() => ... can be used by someone expecting a List<OnType> for reading (but not writing).
  • Separate type: An extension type is unrelated to its on type, and is retained at run-time. The extension type is probably a subtype of Object?, but not necessarily Object. You can't assign in either direction. You can as-cast from extension type to on type. You might be able to cast in the other direction too (if you can cast to an extension type at all).

Creating instances of the extension type

  • Assignment: In "supertype" or "mutual subtype" cases above, you can assign the on type to the extension type.
    For more separate types, an implicit constructor invocation might allow assignments to work.
  • Casting: In most cases (all cases where the extension type is assignable to Object?), you can cast from the extension type to the on type. The on type is a valid type for the actual object. If we make extension types that are not assignable to Object? (not sure how that will work with anything), then we can also prevent casting if we want to. If there is no constructor, you might be able to cast to the extension type.
  • Extension conversion: The extension type can have toFoo methods for any type they want, including the on type, which would just be toFoo() => this;.
  • Constructors: If you can't assign to the extension type, as with the "subtype" or "separate types" above, the extension type needs a "generative" constructor, which will effectively be a factory constructor returning an instance of the on type. In that case, we can treat the extension type as just any other class, the only difference is that the underlying representation is not opaque memory cells (like a class), but a known object of some other type that only the extension can access.

We can support multiple of these behaviors, we just need separate syntax to declare them.
There are use-cases better supported by some combinations than others. The simplest approach is the transparent static type alias with assignment in both directions. It has maximal power (you can do anything), but not much ability to restrict things.

The most restrictive is probably the opaque separate type with constructors. It looks the most like a completely different class.

@Cat-sushi

Cat-sushi commented Feb 3, 2021

I prefer opaque, supertype, assignment + casting + constructors, typedef and on keywords.
Because, ...

  • Transparent subtype can be substituted by on-type with extension method.
  • The interface of the on-type can be implemented by the extension type, if we want. (new?)
  • Name conflict of extension methods won't be serious, because "extension" type with implementation of on-type interface can reduce requirement of extension method to make conceptually separate type.
  • On-type can be assignable to "extension" type, it is important to incrementally migrate to "extension" type.
  • Supertype/ opaque "extension" variable can statically restrict methods of on-type.
  • Finally, extension and extends are not suitable keywords for supertype.

@lrhn
Member

lrhn commented Feb 3, 2021

We could support both "assignment" and "constructors" by defaulting to assignability if there is no constructor declared, and forcing you to go through a constructor if there is one. Then we wouldn't need a different top-level syntax.

That would make it a breaking change to add a constructor, so another alternative is to treat no constructor as a "default constructor" of Ext(<on-type> v) => v; and treat assignment from the on-type as implicitly calling the unnamed single-argument constructor with the assigned value. (And it makes the current explicit extension invocation syntax just be a constructor invocation, so extra bonus for that).

If we don't have assignment at all, so you always have to go through a constructor, then we should definitely have the default constructor above. Then Ext(v).foo() would naturally work the same as now, which means being opaque (not being able to call on-type members).

So, "most backwards compatible" approach is opaque, no-assignment/constructor only (with default constructors) and any type relation (because there is no type now, so we can't be incompatible). We can apply that to the current extension declarations, if we want to, or we can require extra/different syntax to enable it as a type. So, sure, let's use typedef as strawman:

typedef FooList<T> on List<Foo<T>> {
   // Implicit: FooList(List<Foo<T>> value) => value;

   Foo<List<T>> collect() => ...;
}

(Then we can allow show/hide to introduce members of the on type too, so show * to show all, or just show with nothing after - although that can be worrisome if we ever want to add other keywords after the show/hide part).

This can work with any type relation, but probably best with a completely separate type, because if you have constructors (which can choose to throw on some arguments), you don't want to be able to cast just any value into the type. Casting out is fine. Assignment out - potentially fine, but it's safer to disallow it.

So, this describes opaque/partially transparent, constructors (with default embedding constructor), and separate type.
That gives maximal encapsulation of the new type, full control of which values are accepted by the constructors if you write them yourself, no abstraction leak short of casting to the on type. It can, by and large, pretend to be a separate type, just one represented directly by the on type instead of having the on type as a property.

@Cat-sushi

I reconsidered my opinion to make myself neutral.
I prefer separate type, which is most natural.
And, assignability of on-type to extension type can be discussed separately at implicit constructor proposal #108 or so.

@Cat-sushi

Cat-sushi commented Feb 3, 2021

The simplest approach is the transparent static type alias with assignment in both directions. It has maximal power (you can do anything), but not much ability to restrict things.

I understand the meaning of "Lightweight" / "minimal enhancement".
But, I feel that the simplest approach above is quite odd and inconsistent with other type related features.
I prefer most restrictive one.

@mateusfccp
Contributor

@lrhn

We can support multiple of these behaviors, we just need separate syntax to declare them.

I like the idea of supporting multiple behaviours, giving both flexible and restrictive abstractions.

@Cat-sushi

Can we discuss them separately?
I bet on typedef type in #42 as the first feature to be implemented.
Because, transparent types as subtypes and opaque types as separate types are simpler but have many use cases.
On the other hand, I feel this proposal is more complicated and requires more discussion, in addition extension methods can help.

@eernstg
Member Author

eernstg commented Feb 11, 2021

@tatumizer wrote, about the boxing class of an extension type:

what I'm trying (so far unsuccessfully) to put in words is an idea
that due to the way how the class is defined, we can avoid
heap-allocation of MyFoo

That is exactly what I aimed to do with #308, so you can see a lot of info about my thoughts in that direction by looking there.

@lrhn
Member

lrhn commented Feb 11, 2021

If identical means "indistinguishable" then 5 and nat(5) seems distinguishable ... however, identical is a function taking two Object? arguments, so calling identical(5, nat(5)) includes an up-cast of nat(5) to Object?. Will that preserve the distinction between that and 5? Maybe, maybe not. Very much depends on what we do. For a shallow, inexpensive extension type, they're almost certainly going to be identical.

@Cat-sushi

Cat-sushi commented Feb 12, 2021

@eernstg

If you have a protected extension type MyFoo and a class Foo then there is no subtype relationship that will allow you to do anything similar by declaring another class MyFooClass and using that as a type rather than MyFoo, even in the situation where you can edit Foo.

No, we can't use MyFoo safely at object leaking positions.
And MyFoo can't guarantee its behavior after Foo is edited any better than MyFooClass can.

If MyFooClass is a subclass of Foo then you can extend the interface (you can add new methods), just like an extension type can add new methods. But you can't hide any existing methods. This means that you won't get any protection of invariants at all: Anything you can do when using Foo as the type can also be done with MyFooClass as the type.

Yeah, hiding a member is not safe or even possible, even with the subclasses I am recommending.
My opinion is that int with an extension should be int, whatever the extension is.
@deprecate might be a better solution.

Another problem is, as you mention, that you can't create a subclass (or even subtype) of int and several other classes (and if we add sealed classes then there may be lots of classes that you can't create a subtype of), and you can't expect to be able to create a lot of extra superclasses of an existing class, even a regular class (not from a system library), if you're not the maintainer.

Yes, that is a big problem.
I would like to post yet another proposal that enables existing classes to be superclasses.
Current abstract classes are often used in order to just hide their implementation from being modified, but not to restrict reuse of their implementation

Certainly not. In the first case tiny has the type TinyJson, and that allows us to invoke the members of a specific interface (namely: the interface of TinyJson, computed from the declared members in the declaration of TinyJson and its hide/show part). In the second case the type of tiny is Object?, and that only allows us to call the (few) members that Object? has.

Really?
What is extension TinyJson on Object? {...} for?
My next question is why is "Enhancements" section required?

@lrhn
Member

lrhn commented Feb 15, 2021

@tatumizer It's not a bad idea, it's just not perfect (but then, nothing ever is).

You probably need ~44 bits for the object reference (if objects are 16-byte aligned in a 48-bit address space, maybe one more bit for a tag). That leaves 19 bits for the V-table ref. It's cheaper to have a direct V-table pointer than a reference into an indirection table (one less level of indirection), and it costs extra to mask out bits when using the object pointer, but it would avoid using 128 bits per reference. Using V-table ref zero for "not an interface" would work, but since everything in Dart is an interface, it's probably better spent on something like smis. Also, if we are doing this, we can put the class description into the pointer and improve the speed of class-ID checks.

The real issue is that if your program contains more than 2^19 different interfaces/class IDs, you have a problem. I wouldn't rule that out for programs with large generated class structures like proto-buffers.
A whole-program compiler can possibly tree-shake the V-tables that are never used directly (the easy ones are those belonging to abstract classes which are not used as interfaces to call methods on. Maybe all interfaces that are not used for virtual method invocations can be dropped, if dynamic invocations are handled in a different way.) Still a risk, so it might be necessary to support real fat pointers of 2×64 bits as well. At least, if you can separate your types into groups that never flow to the same places, then you might be able to use compressed fat pointers for one group and full fat pointers for the other, or even use different V-table indirection tables for the two groups. Cleverness is an option, it just needs a fallback when it's not enough.
(And saying things like "nobody ever needs more than 500K types" doesn't have a great track record).
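
The bit budget above can be checked with a little arithmetic. The sketch below assumes native 64-bit Dart integers (not the web) and an illustrative layout that the comment does not prescribe: the low 44 bits hold a 16-byte-aligned 48-bit address shifted right by 4, the next 19 bits hold a V-table index, and the top bit is left free for a tag. All names are hypothetical.

```dart
// Assumed layout, for illustration only:
//   bit 63      : unused here (room for a tag bit)
//   bits 44..62 : V-table index (19 bits)
//   bits 0..43  : object address >> 4 (44 bits)
const int addressBits = 44;
const int vtableBits = 19;
const int alignmentShift = 4; // 16-byte alignment frees the low 4 bits.
const int addressMask = (1 << addressBits) - 1;
const int vtableMask = (1 << vtableBits) - 1;

int pack(int address, int vtableIndex) {
  if ((address & ((1 << alignmentShift) - 1)) != 0) {
    throw ArgumentError.value(address, 'address', 'must be 16-byte aligned');
  }
  if (address < 0 || (address >> alignmentShift) > addressMask) {
    throw ArgumentError.value(address, 'address', 'must fit in 48 bits');
  }
  if (vtableIndex < 0 || vtableIndex > vtableMask) {
    throw ArgumentError.value(vtableIndex, 'vtableIndex', 'must fit in 19 bits');
  }
  return (vtableIndex << addressBits) | (address >> alignmentShift);
}

// Recovering the address needs a mask and a shift.
int addressOf(int packed) => (packed & addressMask) << alignmentShift;

// Recovering the V-table index needs a shift (masked here to drop the tag bit).
int vtableIndexOf(int packed) => (packed >> addressBits) & vtableMask;

void main() {
  final packed = pack(0xFFFFFFFFFFF0, vtableMask);
  print(addressOf(packed) == 0xFFFFFFFFFFF0);
  print(vtableIndexOf(packed) == vtableMask);
  print(1 << vtableBits); // Number of distinct V-table indices.
}
```

With 19 bits the index space has 2^19 = 524288 entries, which is where the roughly 500K limit comes from.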

@lrhn
Member

lrhn commented Feb 15, 2021

@tatumizer IIRC, the AMD64/x64 architecture specifies 48 bit addresses (but potentially more physical address lines), and the underlying memory controller in the CPU allows you to map memory to any position in this space. You might be able to ensure that your heap is contained in some lower-bit-sized initial chunk of the memory, but that's up to the operating system to decide. (Assuming that you can, and then failing, might be used as a security exploit. I don't see how, but I'm not a security expert, they are devious people!)

You can use fewer address bits by using addressing relative to the heap start.
A 4GiB limited memory would even allow 32-bit relative addresses, which is what the x32 mode uses. A 4GiB heap limit is probably going to be too small for 64-bit Dart in some situations.
If you're not using x32 addressing, that requires one extra addition (to the base address of memory) for every heap access, which can be costly. (Now, if x64 still had segment registers ...).

As for the instructions, clearing the upper bits of A requires a mask. You don't want to have too many 64-bit literals in your machine code (I don't even think the Intel and instruction allows it), so that requires you to reserve one register to keep that mask around. You might be able to fuse the and with the load by doing movq target, _maskRegister; andq target, [fatPointerAddress]. The lower 3-4 bits of the address are likely zero, so to save space, you might want to shift it down, and then you'll have to shift it up again. Saves a few bits, costs an extra cycle (and the size of the opcode). If you only shift it down by 3 bits, you might be able to combine the <<3 with something else in a lea operation.

For B, the shift should be enough (one cycle on Coffee Lake chips), then you need to use that relative to the V-table lookup table (another register you want to reserve).

In many cases, you can probably unbox the combined pointer early in a function and use the two pre-computed values, but worst case, this overhead occurs on every iteration of an inner loop. One movq, one and and one shr (which can run in parallel) will still take up ~10 bytes in the instruction stream, compared to two adjacent memory loads.

It's definitely going to be a trade-off consideration, not a clear win to either approach.

(Mobile devices are obviously smaller, and usually ARM based. I have less experience with those, so there might be things you can do more efficiently there. I think they still have 48 bit addresses, though.).

@eernstg
Member Author

eernstg commented Feb 16, 2021

@tatumizer wrote:

Not sure. Suppose identical(nat(5), 5) is true. Then 5 is nat also has to be true.

I'm proposing that identical(nat(5), 5) should be true. There is an upcast from nat to Object? when nat(5) is passed as an actual argument to identical, so we're comparing the object 5 to the object 5 at run time. But 5 is nat is a compile-time error, because nat is a protected extension type—we can't require is to run arbitrary code (here: to verify a user-written invariant), and we should not allow for nat operations to be performed on an entity that hasn't been "purified" by running a constructor of nat. So I believe your worries are already covered.
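
The identity claim can be tried with the `extension type` declarations that later shipped in Dart 3.3. This is a sketch under that assumption: the syntax is not the protected extension type syntax of this proposal, the shipped feature does not reject `5 is Nat` the way the proposal would, and `Nat` with its constructor is a hypothetical declaration.

```dart
// Dart 3.3+ syntax, not the syntax proposed in this issue.
extension type Nat._(int value) {
  // The public constructor enforces the invariant before a Nat exists.
  Nat(int v)
      : value = v >= 0
            ? v
            : (throw ArgumentError.value(v, 'v', 'must be non-negative'));
}

// Returns true if the constructor rejects `v`.
bool rejects(int v) {
  try {
    Nat(v);
    return false;
  } on ArgumentError {
    return true;
  }
}

void main() {
  // Passing Nat(5) to identical up-casts it to Object?, so the two
  // arguments are the same int object at run time.
  print(identical(Nat(5), 5));
  print(rejects(-1)); // The invariant is checked by the constructor.
}
```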

@lrhn
Member

lrhn commented Feb 16, 2021

If nat is an extension type without a constructor, we could allow o is nat to do o is <on-type of nat> and type promotion to nat. If it has a constructor (or is a "separate type" for other reasons), I agree that we can't do that.
(This again suggests that types with constructors and types without constructors are completely different kinds of types, and would need different declarations. The types "with constructors" wouldn't need to actually declare a constructor, they'd get a default embedding constructor if they don't have one, but you need to declare up-front which kind of type you want it to be, because it's a breaking change to change that later, at least in one direction.)

@eernstg
Member Author

eernstg commented Feb 16, 2021

@lrhn wrote:

If nat is an extension type without a constructor, we could allow o is nat
to do o is <on-type of nat> and type promotion to nat.

The proposal currently does almost exactly that: It allows o is nat when nat is not protected, and it allows promotion, but it promotes to the extension type and not the on-type. The point is that even a non-protected extension type would provide a certain amount of protection (simply because it doesn't allow invocation of un-shown instance methods), and it seems useful to allow developers to request that protection rather than implicitly promoting to the on-type.

constructors and types without constructors are completely different kinds of types

The difference is syntactically explicit in the proposal: Constructors are allowed in protected extension types, not in non-protected ones.

wouldn't need to actually declare a constructor, they'd get a default embedding constructor

We could certainly do that. I believe it would be a small change (but, as we know, default constructors can have many forms ;-).

I'm not quite sure in which situations it would be helpful, though. If we do not establish any invariants (by having a constructor that does at least something) then an upcast into the extension type would behave in a way which is very similar to the behavior of that default constructor, so the difference is basically only the syntax, Ext ext = Ext(onType); vs. Ext ext = onType;. If we really want to enforce the former then we can always write a trivial constructor that just returns its argument.

@eernstg
Member Author

eernstg commented Feb 16, 2021

About the fat pointer discussions:

I think it's worth noting the main properties of Rust traits:

  1. They are interfaces (so they allow for declaring abstractly that some types share certain members).
  2. They can be implemented by the implementation type or "remotely" (impl Hash for bool ...).
  3. Trait method implementations can be resolved statically (that is: no indirections) when the implementation type and the trait implementation are in scope at the call site.
  4. Trait method implementations can be dispatched at run-time (using fat pointers) when only the trait is known at the call site.

Let's consider these features one by one:

  1. In Dart, we can use classes as interfaces, and we can use subsumption (upcasts) to allow code that only depends on the shared interface to work on different implementations.

  2. Implementation of an interface member in a class is a standard feature. Implementation "remotely" can be expressed as interface default methods or as extension methods/types (of various kinds). Interface default methods cannot be used to add implementations of an interface to an existing class C (without editing C); but when they can be used, they are complete (they support late binding and dynamic invocation as well as all the more static modes). Extension methods must be resolved statically (so they don't support late binding or dynamic invocations), but it is possible to pass o.box rather than o in the case where o will be used in a way where static resolution of the extension methods is not possible.

  3. Extension methods (and, hence, members of any kind of extension type) are resolved statically.

  4. Extension methods can be resolved dynamically by using box to obtain an instance of the companion class of the extension type.

We could play around with fat pointers, but that would be a very deep perturbation of the runtime of Dart, and we already have ways to achieve a very large portion of the features offered by Rust traits.

Dart just isn't going to be as optimizable at the low level as Rust.

For instance, nobody is pushing for getting rid of the garbage collector and introducing something like ownership types to handle memory bugs statically. I think that makes sense: We just don't want to reason about memory management at that level of detail in all Dart programs, so having a garbage collector is a feature in Dart, not a shortcoming.

I think the use of fat pointers is similar: If we call .box (and thus pay the price for allocating a regular object and calling methods via some kind of indirection) then we get a full-fledged object whose static and dynamic type is E.class which is a normal class. That expression is assignable to all the implemented interfaces of the extension type, and the object is capable of object-oriented dispatch (i.e., using late binding to implement overriding).

The point is that we are already covering a very large portion of the affordances offered by fat pointers, and then some. The Rust version may be optimized a bit more, but they pay a heavy price in that the runtime and the type system must keep track of which pointers are fat and which ones aren't, and they can't abstract away from that fat-ness. For instance, if we cast e.box to Object and later perform the type test is SomeInterface then we will be able to call the methods of SomeInterface. With a fat pointer, by contrast, as soon as the vtable part has been dropped, it cannot be restored safely.

@eernstg
Member Author

eernstg commented Feb 16, 2021

That is, when you say y=x.box, all runtime has to do is to copy x to y
and set 24 hi-order bits of y appropriately.

That might be possible, and it might be a very useful optimization.

However, those fat pointers (we need to call them something, even though it couldn't possibly have anything to do with Rust ;-) would most likely need to be known statically: It is probably far too costly to mask out the "other data" part of every single pointer and OR it with whatever is needed in order to create a suitable 64-bit pointer that the hardware can dereference. But we have to do such things on the fat pointers.

Next, how should we handle a cast to a top type (or Object)? If we must track the fat pointers statically then we must transform the pointer to a non-fat value as part of this cast, and then we (presumably) can't cast it down again.

I think the actual wrapper object is attractive because it provides a full-fledged object: It works in all the ways that we would expect for an object (including o is SomeInterface and (o as dynamic).foo()). The fat pointer tricks could be used to optimize some parts of this behavior, but it would break in several ways.

I think it's cleaner to use the extension type in a completely static manner, with added optimization opportunities, and using the companion class via box when we need a full-fledged object.

Another matter is that we might want to use those unused pointer bits in different ways. If it is actually possible to reserve 24 bits or so for other purposes, why wouldn't we use them to carry information about gc, or taintedness of the referred object, or simply to make objects smaller by packing k pointers in less than 64*k bits? Extension types might be widely used at some point, but it still seems likely that the impact would be greater if this idea could be used to reduce the space consumption of the general object layout.

boxed and unboxed versions of ext types can coexist in runtime?

extension E on int {}

void f(E x) {}
void g(E.class x) {}

void main () {
  E x = 42;
  E.class c = x.box;
  f(x); // OK, but `f(c)` is an error.
  g(c); // OK, but `g(x)` is an error.
  f(c.unbox); // OK.
}

The conversion is explicit in both directions, and that makes the distinction between E and E.class easy.

It also causes the code to be somewhat more verbose, but I think it is useful to maintain a strict and explicit distinction between the extension type E (that allows us to work on a given instance of the on-type with a purely static "wrapper" around it) and the companion class E.class which is a completely normal class as seen from the outside.
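
The same distinction can be approximated in Dart as it exists today. This is a rough sketch under stated assumptions: it uses the `extension type` syntax of Dart 3.3+, and `EBox` is a hand-written, hypothetical stand-in for the proposed companion class `E.class` (the getters `box` and `unbox` are written out by hand here, they are not generated):

```dart
// Sketch only: Dart 3.3+ `extension type` syntax; `EBox` is a hand-written,
// hypothetical stand-in for the proposed companion class `E.class`.
extension type E(int value) {
  // Extension type member: resolved statically.
  int get twice => value * 2;

  // Explicit conversion to a real object, playing the role of `.box`.
  EBox get box => EBox(this);
}

class EBox {
  final E unbox;
  EBox(this.unbox);

  // Forwarding instance member: this one supports late binding.
  int get twice => unbox.twice;
}

void f(E x) {}
void g(EBox x) {}

void main() {
  E x = E(42);
  EBox c = x.box;
  f(x); // OK, but `f(c)` is an error.
  g(c); // OK, but `g(x)` is an error.
  f(c.unbox); // OK.

  // The boxed value is a full-fledged object: type tests and dynamic
  // invocations work on it.
  Object o = c;
  print(o is EBox); // true
  print((o as dynamic).twice); // 84
}
```

The dynamic invocation at the end only works on the boxed object. On `x` itself the run-time value is a plain int, which has no member named `twice`.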

@lrhn
Member

lrhn commented Feb 16, 2021

I'm not quite sure in which situations it would be helpful, though. If we do not establish any invariants (by having a constructor that does at least something) then an upcast into the extension type would behave in a way which is very similar to the behavior of that default constructor, so the difference is basically only the syntax, Ext ext = Ext(onType); vs. Ext ext = onType;. If we really want to enforce the former then we can always write a trivial constructor that just returns its argument.

This assumes that the arguments to the constructors can only be the on type. It could also be ways to construct the on type from other types:

extension Foo<T> on List<T> {
  factory Foo.rep(int count, T value) => List<T>.filled(count, value);
  ...
}

It can mostly be done with static methods instead, but it does matter where you put the type arguments, Foo<int>.rep(42, 42) vs Foo.rep<int>(42, 42); (returning a Foo<int>).
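
A small sketch of the two placements of the type argument, assuming the `extension type` syntax of Dart 3.3+ rather than the syntax used in this issue. The names `rep` and `repStatic` are illustrative, and the `...` members from the snippet above are left out:

```dart
// Sketch only: Dart 3.3+ `extension type` syntax; `rep` and `repStatic`
// are illustrative names.
extension type Foo<T>(List<T> list) {
  // Constructor: the type argument belongs to the type,
  // as in `Foo<int>.rep(...)`.
  factory Foo.rep(int count, T value) => Foo(List<T>.filled(count, value));

  // Static method: the type argument belongs to the method,
  // as in `Foo.repStatic<int>(...)`.
  static Foo<U> repStatic<U>(int count, U value) =>
      Foo<U>(List<U>.filled(count, value));
}

void main() {
  final a = Foo<int>.rep(3, 42);
  final b = Foo.repStatic<int>(3, 42);
  print(a.list); // [42, 42, 42]
  print(b.list); // [42, 42, 42]
}
```

Both calls produce a Foo<int>; only the position of `<int>` at the call site differs.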

@lrhn
Member

lrhn commented Feb 16, 2021

What happens with "trait casting" is that an object with runtime-type Foo and static type Bar is represented as a pair of

("How to treat a Foo as Bar", the Foo object itself)

If you just have a Foo object (if there were object types which weren't interfaces too), then to assign it to Bar (i.e., to create that pair), you need some evidence that a Foo can behave as a Bar. If Foo directly implements Bar, then that's awesome. You statically know where to find the "How to treat a Foo as Bar" information in the Foo type's static information.
If a trait implementation is imported into your scope which tells how a Foo should behave as a Bar, then awesome. You statically know where that information is too.

If neither, you can't just assign a Foo to a Bar. A Foo is-not a Bar without evidence that it is.

Then you can try to dynamically cast it. Dynamic casts are something completely different. How, and whether, they work is going to be a tough design question. C++ doesn't have to support dynamic_casts; you can choose to compile without the run-time type information needed to do so.

If you assign the (Foo as Bar) to Object, the result is the pair

("How to treat a Foo as Object", the Foo object itself)

All types know how to cast themselves to a super-type, including Object, so that's easy. It forgets the "how to be a Bar" when you do it, though.

All objects know their own run-time type too. That's the one thing that can help you do dynamic casts.
So, whether you can cast down to Bar again depends on whether you can find a proof that this object, with run-time type Foo, can be treated as a Bar. Maybe it can be done (slowly) by starting from the Foo object itself and checking all its super-interfaces, maybe you need the implementation of Bar on Foo declaration in scope as well, and you look it up from the runtime type of the Foo object. Maybe you can't, even though an implementation exists elsewhere in the program.
It all depends on the implementation choices for the static class/implementation information that is available at run-time.

Downcasting is always hard. With traits, not all information is available on the run-time type itself, which makes things potentially slightly harder.
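
The pair representation described above can be modelled in plain Dart. This is only a sketch of the idea, not of how any implementation works, and every name in it (`BarOps`, `FatBar`, `fooAsBar`) is hypothetical:

```dart
// Sketch only: models the pair ("how to treat a Foo as Bar", the Foo object)
// with ordinary classes. All names are hypothetical.
class Foo {
  final int n;
  Foo(this.n);
}

/// The "how to treat a T as Bar" part: a table of operations (the evidence).
class BarOps<T> {
  final String Function(T self) describe;
  const BarOps(this.describe);
}

/// The pair (evidence, object) playing the role of a value of static type Bar.
class FatBar<T> {
  final BarOps<T> ops;
  final T object;
  const FatBar(this.ops, this.object);

  // Calls go through the evidence, not through the object's own members.
  String describe() => ops.describe(object);
}

// Evidence that a Foo can behave as a Bar, declared apart from Foo itself.
final fooAsBar = BarOps<Foo>((f) => 'Foo(${f.n})');

void main() {
  final bar = FatBar(fooAsBar, Foo(3));
  print(bar.describe()); // Foo(3)

  // "Assigning to Object" keeps only the object and forgets the evidence.
  Object plain = bar.object;
  print(plain is Foo); // true
  // Getting back to a FatBar requires finding `fooAsBar` again, and nothing
  // in `plain` points to it.
}
```

In this model the difficulty of the downcast is visible directly: the run-time type of `plain` is Foo, but the evidence lives in a separate declaration that the object does not reference.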

@lrhn
Member

lrhn commented Feb 16, 2021

Not an argument for or against, just exploration (some'd say exposition, but explaining helps me think 😁).

As for the wrapper object, I agree. If I didn't say that in an earlier post, a fat pointer is just an inlined/unboxed wrapper object (object header with a V-table + one field of wrapped data).

The tricky bit is always the downcast. It's tricky no matter how you do it, you need run-time type information of some sort to do it. With traits, that information is no longer just in the run-time type of the object being cast, so potentially extra tricky. Rust has proven it's not impossible (with some constraints, one of theirs being that traits and classes are separate types, something Dart's classes and interfaces can't claim).

@eernstg
Member Author

eernstg commented Feb 18, 2021

@tatumizer wrote:

I'm afraid the distinction between E and E.class will be widely misunderstood

I think it's important to keep the distinction very visible, and then I hope that the visibility of the distinction will make developers aware of the difference: E is a static mechanism (member access is extension method invocation, resolved at compile time). E.class is a regular class (hence member access is instance method invocation, which involves late binding). The type arguments have this static/dynamic dichotomy as well, and that does matter in many cases.

I think that distinction is subtle enough that many developers wouldn't necessarily think it matters. Yet, it makes such a profound difference that there will be frustrating bug hunts if the distinction is ignored.

when we say "ExtString as Object", nothing happens, except the
compiler will change the "static view".

Even with a Rust-like approach to traits, we presumably would not use fat pointers everywhere. Some pointers are simply references to entities in the heap, and they have a different representation than fat pointers. In particular, if the fat pointer is represented by storing a bit pattern which is a class ID or a similar entity, and those missing bits can be restored safely because the heap is known to be a lot smaller than 2^64, then the non-fat pointer would be obtained from the fat pointer by replacing those bits with something that works. If the given bit pattern will cause segmentation faults when used as a non-fat pointer then we must perform that transformation at the cast.

Conversely, if we really think we can use fat pointers everywhere, why wouldn't we just reduce the size of all programs by 30% rather than using all that unused space to store vtables? ;-)

@eernstg
Member Author

eernstg commented Feb 22, 2021

One thing I'm still wondering about: Why would anyone want to use fat pointers in Dart? ;-)

A fat pointer is suitable for the situation where an instance of a type T is available, and

  • It is statically known that T declares that it implements an interface I.
  • It is statically known that this implementation is not delivered in the shape of instance methods, it is delivered "on the side" as a vtable which will be the 2nd component of a given fat pointer.

I can't see how this can be compatible with Dart outside some very narrow special cases. For instance, when a class implements two interfaces, do you plan to have super-fat pointers to hold two additional vtables?:

abstract class I {... int get foo; ...}
class J {... bool get bar => e1; ...}
class C implements I {... int get foo => e2; ...} // No `bar`.
class D implements J {... bool get bar => e3; ...} // No `foo`.
class E implements I, J {...}

extension DI on D implements I {
  int get foo => ...;
}

extension CJ on C implements J {
  bool get bar => ...;
}

void f(I i) => print(i.foo);
void g(J j) => print(j.bar);

void main() {
  var c = CJ(C()).box;
  var d = DI(D()).box;
  var e = E();
  f(c); f(d); f(e);
  g(c); g(d); g(e);
  g(J());
}

How would the invocation of i.foo and j.bar be able to look up the correct implementation of those getters in the different cases?

To me it looks like a fat pointer could only be used in some very special cases where we have a lot of compile-time knowledge about how a given class acquired a given member. I'm just not convinced that this situation occurs frequently enough to make fat pointers worthwhile.

The situation is different in Rust, because Rust is optimized for maintaining a larger amount of information about the execution at compile time. This enables static resolution of methods in many cases (such that we can just jump directly to an address at run time, rather than looking up the method implementation in a vtable or similar). But we are not likely to want to compile 4 type-specialized versions of f in Dart, and that won't work if we tear off f, in particular if we invoke the torn-off function object dynamically, etc.
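
For comparison, a minimal sketch of how ordinary wrapper objects handle the example above in today's Dart. `CJBox` and `DIBox` are hypothetical hand-written stand-ins for `CJ.class` and `DI.class`, the getter bodies are invented for illustration (the originals are elided), and the sketch assumes that a companion class forwards the on-type's interface in addition to the members the extension adds:

```dart
abstract class I {
  int get foo;
}

class J {
  bool get bar => true;
}

class C implements I {
  @override
  int get foo => 1;
}

class D implements J {
  @override
  bool get bar => false;
}

class E implements I, J {
  @override
  int get foo => 2;
  @override
  bool get bar => true;
}

// Hypothetical stand-in for `CJ.class`: wraps a C and also acts as a J.
class CJBox implements I, J {
  final C _c;
  CJBox(this._c);
  @override
  int get foo => _c.foo; // Forwarded to the wrapped object.
  @override
  bool get bar => _c.foo > 0; // Invented body for the added member.
}

// Hypothetical stand-in for `DI.class`: wraps a D and also acts as an I.
class DIBox implements I, J {
  final D _d;
  DIBox(this._d);
  @override
  int get foo => _d.bar ? 1 : 0; // Invented body for the added member.
  @override
  bool get bar => _d.bar; // Forwarded to the wrapped object.
}

// One compiled version of each function serves every argument,
// because `foo` and `bar` are looked up on the receiver at run time.
int f(I i) => i.foo;
bool g(J j) => j.bar;

void main() {
  final c = CJBox(C());
  final d = DIBox(D());
  final e = E();
  print([f(c), f(d), f(e)]); // [1, 0, 2]
  print([g(c), g(d), g(e), g(J())]); // [true, false, true, true]
}
```

With wrapper objects, f and g need no knowledge of how each argument acquired its members, which is the property the fat pointer approach would have to recover by other means.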

@leafpetersen
Member

@tatumizer @eernstg I wonder whether the discussion of fat pointers might better be split off into a separate issue, it's moving pretty far afield?

Independently, I'm not seeing anything about compile to JS here. If this discussion is going to be relevant to the design of a general purpose Dart language feature, there needs to be a discussion of what "fat pointer" means in our JS backends (and realistically, if "fat pointer" means compile every reference to a JS object containing two properties then this seems like a clear non-starter).

@leafpetersen
Member

Separately, it may be useful to take into account for this discussion that I believe no Dart implementation currently uses vtables in the traditional sense. Some discussion here, though I believe this does not cover the AOT compiler which uses a sort of global vtable.

@leafpetersen
Member

for javascript, this optimization is not feasible. But remember, this is just an optimization, it allows to avoid heap-allocating the boxed object. There's no difference in functionality.

In so far as this is viewed entirely as an optimization which doesn't bear on the practicality of the feature, I would suggest then that this is not really the right forum for discussing it.

@mraleph
Member

mraleph commented Feb 23, 2021

Just like @leafpetersen suggests, I think it is worth separating specification and implementation concerns, at least in the initial stages. I would even suggest that back and forth on GitHub might not be the best medium for discussing implementation; I'd suggest doing an end-to-end design doc or something similar. This thread mashes all of it together, and as a result I gave up on following the discussion even though I am keenly interested in the feature itself (e.g. I would like to use these types to build Int64 and similar types as a zero-cost abstraction on top of int, which would help the protobuf package on the VM to be more performant). Also, I would like to point out that discussions about implementation concerns that don't involve actual implementors are somewhat moot: unless you are deeply involved in dart2js or the VM you can't really make a call about how feasible it is to do a particular implementation/optimisation.

As a final remark, I would like to point out that not all systems / VM configurations have 64-bit pointers. It is highly likely that the VM is going to switch to compressed pointers on mobile devices in the future, meaning that you don't really have the luxury of hiding any sort of tagging information in the pointer.

In any case, just like Leaf, I strongly encourage moving this discussion somewhere else.

@leafpetersen
Member

I promise to not post any further comments in this thread,

@tatumizer To be clear, I think (and I have heard feedback from other team members) that you are often a valuable contributor here, so please don't take this as a rejection of your input. My primary concern is that this thread is now 143 comments long, the majority of which are only tangentially related to the main subject. The absence of threading in GitHub issues means that this issue is essentially useless for its original purposes: I can't refer people here to get an overview of the lightweight extension types proposal, since this issue is primarily now the "fat pointers" issue. For this reason, I have been trying to impose a discipline of splitting side discussions into separate issues. If you and others feel there is more to discuss on this subject, I'd suggest opening a separate issue. I've also considered trying out the new GitHub "discussions" feature, so perhaps that's another direction we can go.

but I'd like to point out that separating the concept from implementation in this particular case is just impossible: the issue has "zero-cost" mentioned directly in its title.

I actually quite agree with this, which is why I brought up compile to JS, and why I was surprised that your response was that this is "just an optimization". This discussion seems mostly pointless to me because it does not address making the feature zero-cost on JS. From my perspective, this feature has hard requirements for the cost-model across all of our platforms, and so discussing the cost of boxing only on the VM is not very useful to help drive the design.

@eernstg
Member Author

eernstg commented Feb 24, 2021

I added a comment at the top of the initial text of this issue, to deliver a message which is now given here as well:

This thread is too long and has too many different topics. Please use separate issues labeled 'extension-types' in order to discuss a topic in the area of extension types.

@eernstg
Member Author

eernstg commented Dec 16, 2022

This proposal has now been further developed in many steps, and it has been renamed to 'inline class'. The feature specification has been accepted, cf. https://github.com/dart-lang/language/blob/master/accepted/future-releases/inline-classes/feature-specification.md. Today's 'feature' issue for this feature is #2727.

@eernstg eernstg closed this as completed Dec 16, 2022
@eernstg eernstg added the inline-classes Cf. language/accepted/future-releases/inline-classes/feature-specification.md label Dec 16, 2022