How do we merge augmentation imports? #3643

Open
eernstg opened this issue Mar 6, 2024 · 10 comments

@eernstg
Member

eernstg commented Mar 6, 2024

Consider the description at the end of the section Scoping in the augmentation feature specification:

  1. Parse the main library and all of its augmentations.
  2. Merge the augmentations to determine the complete set of declarations in all types.
  3. Resolve and type-check the main library and all of its augmentations now that all type namespaces are complete.

This is a syntactic transformation (which is nice because there's a lot of potential complexity that just doesn't come up). However, there should be some kind of "redirection" to ensure that code from each of the libraries (the main library as well as each of the augmentation libraries) ultimately has its names resolved according to the imports of the library it came from.

In the example: Code from some_augment.dart should be resolved in a binding environment where the shared scope is enhanced as if also_lib.dart were imported (but not other_lib.dart), and similarly for code from some_lib.dart which should be resolved as if other_lib.dart were imported (but not also_lib.dart).

One way to do this could be as follows:

First transform each of the libraries such that every import has an import prefix, which is a globally fresh identifier. Imports that did not have a prefix now get a fresh one, and imports that already had a prefix get that prefix renamed to a fresh one. It also means that every identifier that previously resolved to an imported name will now be prefixed (adding a prefix if the corresponding import did not have one, and otherwise renaming the prefix to the new, fresh name). Next, merge the libraries syntactically as described, and include all the imports with their new prefixes.
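A minimal Python sketch of this renaming, assuming (as jakemac53 points out below, this is exactly the contested part) that a per-file analysis has already resolved each identifier to one of that file's imports, or to nothing when the file alone cannot tell. The classes `Import`, `Ident`, `Library`, the imported name `helper`, and the `_p<n>` naming scheme are hypothetical illustrations, not part of any Dart tool.

```python
from dataclasses import dataclass
from itertools import count
from typing import List, Optional


@dataclass
class Import:
    uri: str
    prefix: Optional[str] = None  # None: the import had no prefix


@dataclass
class Ident:
    text: str  # the identifier as written, e.g. "helper" or "a.helper"
    resolved_import: Optional[int] = None  # index into the file's imports; None if unresolved


@dataclass
class Library:
    uri: str
    imports: List[Import]
    idents: List[Ident]


_fresh = count()


def fresh_prefix() -> str:
    # Hypothetical naming scheme; assumes no user code declares names of the form "_p<n>".
    return f"_p{next(_fresh)}"


def prefix_all_imports(lib: Library) -> Library:
    fresh = [fresh_prefix() for _ in lib.imports]
    imports = [Import(imp.uri, p) for imp, p in zip(lib.imports, fresh)]
    idents = []
    for ident in lib.idents:
        i = ident.resolved_import
        if i is None:
            idents.append(ident)  # best effort: names that don't resolve here stay raw text
            continue
        # Drop the old prefix, if the import had one, then attach the fresh prefix.
        bare = ident.text.split(".", 1)[1] if lib.imports[i].prefix else ident.text
        idents.append(Ident(f"{fresh[i]}.{bare}", i))
    return Library(lib.uri, imports, idents)


def merge(libs: List[Library]):
    # All prefixes are now distinct, so the imports can simply be concatenated.
    return ([imp for lib in libs for imp in lib.imports],
            [ident for lib in libs for ident in lib.idents])


main = Library("some_lib.dart", [Import("other_lib.dart")],
               [Ident("helper", 0), Ident("x")])
aug = Library("some_augment.dart", [Import("also_lib.dart", "a")],
              [Ident("a.helper", 0)])
merged_imports, merged_idents = merge([prefix_all_imports(main), prefix_all_imports(aug)])
print([i.text for i in merged_idents])  # ['_p0.helper', 'x', '_p1.helper']
```

After merging, the two `helper` references still point at `other_lib.dart` and `also_lib.dart` respectively, while `x`, which neither file can resolve on its own, is left untouched.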

eernstg added the specification and static-metaprogramming labels on Mar 6, 2024
@jakemac53
Contributor

> It also means that every identifier that previously resolved to an imported name will now be prefixed

This isn't possible in the syntactic model though. We can't do resolution, so it can't be specified in this way?

@lrhn
Member

lrhn commented Mar 7, 2024

I disagree that this is necessarily a syntactic transformation, but we do need to specify what it means.

I'd prefer to say that each non-augment declaration introduces a semantic declaration, and the properties of that semantic declaration are determined by starting with the first syntactic declaration, and then applying the syntactic augmentation declarations in augmentation application order.

The result of that will be a semantic declaration of a kind corresponding to the static declaration. We will have to specify those semantic declarations, which states they can be in during and after applying the augmentation declarations, and which of those states are invalid and cause compile-time errors. I still think it's going to be less work than trying to cover all the same cases using syntax manipulation, and less error prone.

And it gives us a place to attach more information on later steps.

@eernstg
Member Author

eernstg commented Mar 7, 2024

> This isn't possible in the syntactic model though.

My starting point is that we must have a clear model, syntactic or whatever fits the bill.

Also, merging is a process that produces a library from something, which is represented by the text of the main library and the augmentation libraries, but might also be considered to be some kind of a semantic entity, that is, the text, or an AST, plus some data established by a static analysis step like name resolution, following some rules that will surely have something in common with Dart scope rules.

I don't think we will be happy about an approach where the merged library is produced by a pure search-and-replace operation on text. In particular, in terms of the example in this section, it seems obviously impossible to me to provide the scoping as described if we insist that the C.isEven() body and the C._isOdd() body are just copied as written in the augmentation libraries into the merged library. There's no way we can then know that some names in the first body must be resolved to declarations in other_lib.dart, and some (perhaps the same) names in the second body must be resolved to declarations in also_lib.dart.

My proposal about adding import prefixes to names that are "intended" to be resolved to declarations in imported libraries is simply a purely textual encoding of a semantic model where some occurrences of identifiers are marked as being looked up in a specific imported library. This encoding would survive a simple textual merge, because all those prefixes are fresh and distinct names.

Alternatively, we could say that we have a completely new kind of entity ("the semantics of an augmentation library" or "the semantics of a library that contains one or more import augment directives"), and we could then define a merging step that operates on these semantic entities and produces a merged semantic entity which is the output of the process. We could then specify the language as a whole afresh in terms of this kind of semantic entity, and then we'd know what it means. Or we could process the semantic entity (say, by adding unique import prefixes here and there ;-) in order to obtain a purely textual representation of the merged library. This textual representation would be a completely normal Dart library with imports and declarations, following all the normal rules, and that's the final meaning of the composite entity that is (1) the augmented library and (2) all its augmentation libraries.

I just thought that it would be nice to have a minimal approach where we don't have these new semantic entities to a higher degree or for a longer amount of time than absolutely necessary. Performing a specialized static analysis of each augmentation library and the main library, and fixing the resolution of imported names by adding prefixes was my idea of such a minimal approach.

This would also have the effect that a name which is apparently (looking at just one augmentation library or the main library alone) denoting an imported entity will actually still denote that same imported entity after merging. (That's the hygiene part: Merging doesn't arbitrarily capture identifiers that already have a resolution.) Obviously, some names don't have a resolution. E.g., if one augmentation library provides a top-level declaration named x, and a function body in a later augmentation library contains a reference to x then we have no way to know what x means when we are inspecting the latter augmentation library and its transitive imports alone. So we could have a best effort approach where we resolve as many names as possible to imported declarations, and then we treat all remaining identifiers as non-hygienic (we just treat them as raw text, and whatever happens when they are merged, happens).

We could also try to be fully hygienic, but I don't think that is possible, because of the names that cannot possibly have a known resolution as seen from each augmentation library + imports, or as seen from the main library + imports.

Finally, we could be fully unhygienic and use a simple textual process. I think that's a terrible idea (and so did many earlier designers of static metaprogramming systems). But it is probably the most consistent approach we can choose (the semantics is consistently weird ;-), and surely it's not hard to implement.

@lrhn
Member

lrhn commented Mar 7, 2024

I don't think we should assign a semantics to an augmentation library, separate from the library it's augmenting. Like a part file, it has no meaning except in the context of the entire library.

TL;DR: I suggest viewing a library with augmentations as having a set of declarations, where the declaration for a particular name is defined by the combination of the base declaration and all applicable augmentation declarations in augmentation declaration order. This "declaration stack" is what defines the member, not any of them individually.

What I'd probably go for, without introducing new semantic models, is saying:

  • A library is defined by the source of all its files (which is the library file and all the part and augmentation files transitively referenced by that, which must exist and parse correctly).
  • When we have the source of the library, we can start looking for declarations.
  • For each non-augment declaration in any library file, remember it in a collection, uniquely identified by its parsed grammar production (e.g., a <declaration>), its position in the source (file URI and "location", anything that can distinguish two distinct class C {} declarations, so we can see that there are two), and its declared name (an identifier, or an identifier followed by = for setters). A writable variable is added twice, once with name id and once with name id=.
  • It's a compile-time error if there are two declarations in this collection with the same name.
  • It's a compile-time error if there is a declaration named id=, and a declaration named id which is not a getter or variable declaration.
  • Then look at all the top-level augment declarations. It's a compile-time error if there is an augment declaration named id, and there is no non-augment declaration with the same name. (For writable variable augmentations, check for both getter and setter names). We can check if it has the correct kind here too, but it's not important yet. We'll fail soon enough if it isn't.
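The collection step and its three checks can be modeled in a few lines of Python. This is a sketch under simplifying assumptions: a declaration is reduced to a name, a kind string, a file, and an offset; `"variable"` stands for a writable variable; and the class `Decl` and function `collect_declarations` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decl:
    name: str  # "id" for getters and other declarations, "id=" for setters
    kind: str  # e.g. "class", "function", "getter", "setter", "variable" (writable)
    file: str
    offset: int  # position in the file; distinguishes two `class C {}` declarations
    is_augment: bool = False


def names_of(decl):
    # A writable variable introduces both a getter name and a setter name.
    if decl.kind == "variable":
        return [decl.name, decl.name + "="]
    return [decl.name]


def collect_declarations(decls):
    errors = []
    table = {}
    for d in decls:
        if d.is_augment:
            continue
        for n in names_of(d):
            if n in table:
                errors.append(f"duplicate declaration of {n}")
            else:
                table[n] = d
    # A setter id= may only coexist with a getter or variable named id.
    for n, d in table.items():
        if n.endswith("="):
            other = table.get(n[:-1])
            if other is not None and other.kind not in ("getter", "variable"):
                errors.append(f"setter {n} conflicts with {other.kind} {n[:-1]}")
    # Every augment declaration must match a non-augment declaration by name.
    for d in decls:
        if d.is_augment and any(n not in table for n in names_of(d)):
            errors.append(f"augmentation of undeclared {d.name}")
    return table, errors


decls = [
    Decl("C", "class", "lib.dart", 0),
    Decl("x", "variable", "lib.dart", 20),
    Decl("C", "class", "aug.dart", 0, is_augment=True),
    Decl("D", "class", "aug.dart", 30, is_augment=True),
]
table, errors = collect_declarations(decls)
print(sorted(table), errors)  # ['C', 'x', 'x='] ['augmentation of undeclared D']
```

As in the list above, the kind of an augmentation is not checked against the kind of the declaration it matches; only the names are.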

At this point we know that the declarations are uniquely defined by their name, and augmentations are matched to a declaration with the same name. We haven't started figuring out what they mean yet, but we can use that to find the set of declared names.

  • Do the above for every library.
  • Now find the exported declarations for each library as a fixed-point computation.
    • For each library, the exported names start as a collection containing its non-privately named declarations.
    • Then repeat until a fixed point is reached: for each library, and for each export in a file of that library,
      • add all the exported declarations from the referenced libraries, except those hidden by the export declaration, to the exported names of this library, if they are not already in the collection.
  • It's a compile-time error if any two exported declarations have the same name, or if a setter and a non-getter/non-variable with the same name are exported. (Maybe something about getter/setter pairs from different sources.)
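The fixed-point computation of export namespaces might look like this in Python. It is a sketch under assumptions: a declaration is identified by the pair (name, declaring library) rather than by full source location, only `hide` combinators are modeled (no `show`), setters are ignored, and `compute_exports` is a hypothetical name.

```python
def compute_exports(declared, exports):
    """declared: library -> set of names declared in that library.
    exports: library -> list of (exported_library, hidden_names).
    Returns (library -> {name: declaring_library}, list of conflict errors)."""
    # Start with each library's own non-private declarations.
    exported = {
        lib: {(name, lib) for name in names if not name.startswith("_")}
        for lib, names in declared.items()
    }
    changed = True
    while changed:  # iterate until no library gains a new exported declaration
        changed = False
        for lib, directives in exports.items():
            for target, hidden in directives:
                for entry in list(exported[target]):
                    if entry[0] not in hidden and entry not in exported[lib]:
                        exported[lib].add(entry)
                        changed = True
    result, errors = {}, []
    for lib, entries in exported.items():
        result[lib] = {}
        for name, origin in sorted(entries):
            if name in result[lib]:
                errors.append(f"{lib} exports two declarations named {name}")
            else:
                result[lib][name] = origin
    return result, errors


declared = {"a": {"A", "_hidden"}, "b": {"B"}, "c": {"C", "B"}}
# a and b export each other (a cycle); c exports a but hides B.
exports = {"a": [("b", set())], "b": [("a", set())], "c": [("a", {"B"})]}
result, errors = compute_exports(declared, exports)
print(result["c"], errors)  # {'A': 'a', 'B': 'c', 'C': 'c'} []
```

The cycle between `a` and `b` terminates because each pass only adds entries, and the set of possible entries is finite. Removing the `hide B` from `c`'s export would make `c` export two different declarations named `B`, which the final loop reports as an error.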

Now we know which names and base declarations are exported by each library, which means we can start looking at imports.

  • Resolve imports, and import conflicts, as normal. Each successfully imported declaration (name, source, location) introduces a name into the lexical scope of the importing library or library augmentation file.

Now we start looking at augmentations inside a library. We define a total ordering of declarations in a library, across all files, as:

  • Source order for declarations inside the same file,
  • Depth first preorder traversal of the other files, visiting part files in part declaration source order, then library augmentation files in augment import declaration source order.
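The total order above is a plain depth-first preorder walk. A minimal Python sketch, assuming (hypothetically) that an augmentation file may itself contain `import augment` directives, and with declarations reduced to strings:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SourceFile:
    uri: str
    declarations: List[str]  # in source order
    parts: List["SourceFile"] = field(default_factory=list)  # in `part` directive order
    augmentations: List["SourceFile"] = field(default_factory=list)  # in `import augment` order


def application_order(file):
    """Yield (uri, declaration) pairs: this file's declarations in source order,
    then each part file, then each augmentation file, each visited recursively."""
    for decl in file.declarations:
        yield file.uri, decl
    for child in file.parts + file.augmentations:
        yield from application_order(child)


lib = SourceFile(
    "lib.dart", ["class C", "int f()"],
    parts=[SourceFile("part.dart", ["class D"])],
    augmentations=[
        SourceFile("aug1.dart", ["augment class C"],
                   augmentations=[SourceFile("aug2.dart", ["augment class C"])]),
        SourceFile("aug3.dart", ["augment int f()"]),
    ],
)
print([uri for uri, _ in application_order(lib)])
# ['lib.dart', 'lib.dart', 'part.dart', 'aug1.dart', 'aug2.dart', 'aug3.dart']
```

Because the walk is depth first, `aug2.dart` (reached through `aug1.dart`) comes before `aug3.dart`, even though `aug3.dart` is imported directly by the library file.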

If a library contains, for example, a class declaration, then the semantics of the class is defined not just by that one declaration, but by an ordered sequence of declarations: that class declaration, followed by all augment declarations with the same name in augmentation application order. The "source of a class" is the ordered sequence of the initial declaration and its augmentations. The same holds for any other declaration.

And now we make it a compile-time error if the augmentations are not valid augmentations of the prefix of the sequence before them. We can do that by collecting some model of the "combined/merged" effect of the augmentations, with suitable compile-time errors if things occur out of order or are otherwise not valid augmentations. Or we can define each property of the stack recursively. For example: "a function declaration augmentation sequence has a named parameter with name x if the non-augmentation declaration of the sequence does," and "the declared type of that parameter from the sequence is the declared type of the last entry which declares a type for that particular parameter, or there is no declared type if no entry has a type annotation for that parameter."
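The two example rules can be written directly as recursive definitions over the stack. In this Python sketch each entry of a function's stack is assumed (hypothetically) to be a dict from named-parameter name to its type annotation, or `None` when that entry omits the annotation; the base declaration comes first.

```python
from typing import Dict, List, Optional

Entry = Dict[str, Optional[str]]  # parameter name -> type annotation (None if omitted)


def has_named_parameter(stack: List[Entry], name: str) -> bool:
    # Only the non-augmentation declaration (the first entry) introduces parameters.
    return name in stack[0]


def declared_type(stack: List[Entry], name: str) -> Optional[str]:
    # Recursive rule: the last entry that annotates the parameter wins;
    # with no annotation anywhere, there is no declared type.
    if not stack:
        return None
    last = stack[-1].get(name)
    return last if last is not None else declared_type(stack[:-1], name)


stack = [
    {"x": None, "y": "int"},      # base declaration: f({x, int y})
    {"x": "String", "y": None},   # first augmentation annotates x
    {"x": None, "y": None},       # second augmentation annotates nothing
]
print(declared_type(stack, "x"), declared_type(stack, "y"))  # String int
```

No merged declaration is stored anywhere; each property is recomputed from the stack when asked for.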

In any case this declaration + augmentation stack is uniquely defined by the library source, by the augmentation declaration order, so we don't need to store it anywhere. We can simply say "the class introduced by the declaration C in library L" and have it actually mean "the class introduced by the declaration C plus all augmentations of it in library L", because that's the same thing, simply by being in the same library.

(I don't remember which declarations are in scope in each library augmentation file. If a non-augment declaration from a file that is later in augmentation application order is not in scope in the library augmentation, then that augmentation also cannot augment it. That's not necessary, but might be reasonable. We just need to make it an error to augment a declaration that occurs in a file that's later in augmentation application order; I'd allow writing the augmentation first in the same file.)

Inside a class, or other scope, the member declarations are the collection of all non-augment member declarations from the base declaration and all augmentations of that class (or other), no duplicates allowed as usual, no setter/other, no static/instance setter/getter, etc. And then each such member declaration can have a stack of member declaration augments from later augmentations.

Every property of an augmented declaration is derived from the stack of declarations. That we can define recursively by applying each augmentation in order.

When invoking an instance member, the member lookup algorithm will now traverse augmentation chains in between super-class chain steps. Within a class, it looks through the augmentations in reverse augmentation application order; if the member is not found in that class, it continues in the superclass the same way.
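A Python sketch of that lookup order, assuming a hypothetical model where each class is a list of entries (base declaration first, then augmentations in application order), each entry mapping member names to a location label:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ClassStack:
    name: str
    entries: List[Dict[str, str]]  # base declaration first, then augmentations
    superclass: Optional["ClassStack"] = None


def lookup(cls: Optional[ClassStack], member: str) -> Optional[str]:
    """Return the location of the declaration a lookup of `member` finds, or None."""
    while cls is not None:
        # Search this class's augmentation chain, latest augmentation first...
        for entry in reversed(cls.entries):
            if member in entry:
                return entry[member]
        # ...and only then take one step up the superclass chain.
        cls = cls.superclass
    return None


a = ClassStack("A", [{"foo": "A.foo@loc0", "bar": "A.bar@loc1"}])
c = ClassStack("C", [{"foo": "C.foo@loc2"}, {"foo": "C.foo@loc3"}], superclass=a)
print(lookup(c, "foo"), lookup(c, "bar"), lookup(c, "baz"))  # C.foo@loc3 A.bar@loc1 None
```

The lookup finds the top of the member's declaration stack (`loc3`, the latest augmentation); reaching the earlier `loc2` declaration is the job of `augmented`, not of lookup.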

An invocation of an augmentation function declaration knows which other, specific, function declaration to invoke when using augmented(). We do need a way to invoke a specific method, static or instance, even if we can't denote it by simple name. That's OK. We can say semantically what happens:

```dart
class C {
  int get foo => 21; /*loc1*/
}
augment class C {
  augment
  int get foo => augmented * 2; /*loc2*/
}
void main() {
  print(C().foo);
}
```

The semantics of invoking C().foo is to start with the runtime type of the value C() (which is C), then look at its foo declaration stack, which starts at the foo declaration at location "loc2". Then we invoke that with this bound to the instance of C. It invokes augmented, which resolves to invoking the foo declaration at location "loc1" with this bound to the instance of C. That returns 21, which is then multiplied by 2 and returned. Obviously it prints "42". :)
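The same walkthrough as a small Python model, assuming a hypothetical representation in which each entry of a getter's declaration stack is a function of `this` and a zero-argument `augmented` callable that runs the previous entry with the same `this`:

```python
from typing import Callable, List, Optional

# A getter body: takes `this` and an `augmented` callable, returns the getter's value.
Body = Callable[[object, Callable[[], object]], object]


def invoke(stack: List[Body], this: object, index: Optional[int] = None) -> object:
    """Run the declaration at `index` (default: the last augmentation)."""
    if index is None:
        index = len(stack) - 1

    def augmented():
        if index == 0:
            raise RuntimeError("the base declaration has no augmented declaration")
        # Same `this`, one step down the declaration stack.
        return invoke(stack, this, index - 1)

    return stack[index](this, augmented)


foo_stack: List[Body] = [
    lambda this, augmented: 21,               # int get foo => 21;  /*loc1*/
    lambda this, augmented: augmented() * 2,  # augment int get foo => augmented * 2;  /*loc2*/
]

print(invoke(foo_stack, object()))  # 42
```

Each entry knows statically which entry its `augmented` refers to (the one directly below it), which matches the point that this does not require denoting the earlier declaration by name.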

Declarations are not identified by name only. There can be multiple declarations with the same name in the same class. They are uniquely defined by the source declaration they come from. (Plus something for mixins, where the invocation should also know which mixin application class the method was run from, so that it can do super calls.)

@eernstg
Member Author

eernstg commented Mar 8, 2024

> TL;DR: I suggest viewing a library with augmentations as ...

I think this supports the assumption that there is a non-trivial amount of modeling that needs to be settled. ;-)

@munificent
Member

> Also, merging is a process that produces a library from something, which is represented by the text of the main library and the augmentation libraries, but might also be considered to be some kind of a semantic entity, that is, the text, or an AST, plus some data established by a static analysis step like name resolution, following some rules that will surely have something in common with Dart scope rules.

Agreed.

> I don't think we will be happy about an approach where the merged library is produced by a pure search-and-replace operation on text. In particular, in terms of the example in this section, it seems obviously impossible to me to provide the scoping as described if we insist that the C.isEven() body and the C._isOdd() body are just copied as written in the augmentation libraries into the merged library. There's no way we can then know that some names in the first body must be resolved to declarations in other_lib.dart, and some (perhaps the same) names in the second body must be resolved to declarations in also_lib.dart.

Correct. It definitely doesn't work to just textually transclude the files. It's also the case that you can't resolve all identifiers before you merge either.

Here's one model that might work:

We treat each library augmentation file as an AST but where each identifier has been tagged with the library augmentation that it came from. For uniformity, you can also think of the body of the main library as essentially an augmentation of itself.

At augmentation merge time, all we need to do is note which file every identifier appeared in, which we know syntactically. When we merge declarations, we're mostly just stuffing ASTs into collections of members. You can think of that syntactically ("append them to the class declaration...") or semantically ("add this member to the class's member namespace...") but I think it's a distinction that doesn't really matter.

You're basically just building up sets of declarations that need to be resolved. At the point that you do this merging, you can detect and report collisions.

Then after everything is merged, you can resolve identifiers and type check. When resolving an identifier, you walk up the lexical scopes until you hit the top level scope of the resulting merged library. Then, if that fails, you look in the augmentation library scope attached to the identifier.
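A Python sketch of that resolution order, assuming each identifier carries the URI of the file it was written in. The name `helper` and the string labels for declarations are hypothetical; the file names follow the example in the specification section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedIdent:
    name: str
    origin: str  # URI of the file the identifier was written in


def resolve(ident, lexical_scopes, merged_top_level, import_scopes):
    """lexical_scopes: innermost-first list of dicts for enclosing blocks/classes.
    merged_top_level: all top-level declarations of the merged library.
    import_scopes: file URI -> names imported by that file."""
    for scope in lexical_scopes:
        if ident.name in scope:
            return scope[ident.name]
    if ident.name in merged_top_level:
        return merged_top_level[ident.name]
    # Fall back to the imports of the file the identifier came from.
    return import_scopes.get(ident.origin, {}).get(ident.name)


import_scopes = {
    "some_lib.dart": {"helper": "other_lib.dart::helper"},
    "some_augment.dart": {"helper": "also_lib.dart::helper"},
}
top_level = {"C": "some_lib.dart::C"}
print(resolve(TaggedIdent("helper", "some_lib.dart"), [], top_level, import_scopes))
print(resolve(TaggedIdent("helper", "some_augment.dart"), [], top_level, import_scopes))
```

The same name resolves to different imported declarations depending on its origin tag, while names declared anywhere in the merged library are found before any import.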

@lrhn
Member

lrhn commented Apr 3, 2024

> We treat each library augmentation file as an AST but where each identifier has been tagged with the library augmentation that it came from

We usually treat the AST implicitly, by just referring to the source through its grammar productions.
A grammar production can refer to its surrounding context (when giving semantics to a method declaration, we can talk about the "surrounding class declaration", which is a way to say that we came to this production by traversing other productions, starting at the file level, where we know the URI used to reference the file).

So so far it's pretty much what we do today.

> When we merge declarations, we're mostly just stuffing ASTs into collections of members

So a "class" is an ordered collection (stack/sequence) of a base class declaration and its augment class declarations in augmentation order, and the members are each a stack of a base declaration and its augmentation declarations (derivable from the class declaration stack, in the same order).

And then we can derive properties, like superclasses, mixins, interfaces and modifiers of the resulting class from the collection, and properties of the members from their individual collections, or get them indirectly from the class collection every time we need them.

We don't even need to be shoving ASTs into anything; we can derive the class collection from the source every time we need it, like a function augmentationsFor taking a non-augmentation declaration, finding the surrounding library, then collecting all the augmentations for that declaration in the library by following part/augment references.

We can even define a nextAugmentation function that takes any declaration, augmentation or not, and returns the next augmentation declaration that applies to it (the augmentation for which it is the prior declaration being augmented), if any.
(And an augmentedDeclaration function that takes an augment declaration and gives the prior declaration that it augments.)
That's all entirely derivable from the source, and the declaration and context of the current declaration.
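A Python sketch of the three helpers, computed on demand from a library's declarations listed in augmentation application order, with nothing stored. Assumptions: only one scope (top level) is modeled, a declaration is a name, an augment flag, and a location, and `augmentations_for`, `next_augmentation`, and `augmented_declaration` are hypothetical stand-ins for the functions named above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Declaration:
    name: str
    is_augment: bool
    location: str  # e.g. "aug.dart:12"; distinguishes same-named declarations


def augmentations_for(library: List[Declaration], base: Declaration) -> List[Declaration]:
    """Stand-in for augmentationsFor: augmentations of `base`, in application order."""
    start = library.index(base)
    return [d for d in library[start + 1:] if d.is_augment and d.name == base.name]


def next_augmentation(library: List[Declaration], decl: Declaration) -> Optional[Declaration]:
    """Stand-in for nextAugmentation: the augmentation applied directly after `decl`."""
    later = library[library.index(decl) + 1:]
    return next((d for d in later if d.is_augment and d.name == decl.name), None)


def augmented_declaration(library: List[Declaration], decl: Declaration) -> Optional[Declaration]:
    """Stand-in for augmentedDeclaration: the prior declaration that `decl` augments."""
    if not decl.is_augment:
        return None
    earlier = library[:library.index(decl)]
    return next((d for d in reversed(earlier) if d.name == decl.name), None)


lib = [
    Declaration("C", False, "lib.dart:1"),
    Declaration("f", False, "lib.dart:5"),
    Declaration("C", True, "aug1.dart:1"),
    Declaration("C", True, "aug2.dart:1"),
]
base, aug1, aug2 = lib[0], lib[2], lib[3]
print([d.location for d in augmentations_for(lib, base)])  # ['aug1.dart:1', 'aug2.dart:1']
```

`next_augmentation` and `augmented_declaration` are inverses of each other along one stack, which is the relationship described above.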

We do then need to say what it means for the class hierarchy, type interface, and runtime signature of members, of which there is still only one per effective declaration.
We need a language to talk about grammatical declarations separately from augmented declarations (the former defines the latter, the latter defines the semantics of the name), and use the correct one at different points in the language specification.
And then define augmentation as the (well defined) step from one to the other.

@eernstg
Member Author

eernstg commented May 1, 2024

A sketch of a merging semantics proposal is available in #3741.

@lrhn
Member

lrhn commented May 23, 2024

A sketch of a framework and terminology for specifying the behavior of augmentations is available here.

The idea is that a syntactic declaration has some properties. We then abstract those properties into a separate concept, a logical declaration, which is just the properties, and applying a (syntactic) augmenting declaration to a logical declaration will create a new logical declaration, with new properties, that may not exist syntactically in a single place in the source.

Then we just have to make sure that everything in the spec is defined in terms of the properties, not the syntax, so that it can continue making sense on the result of applying augmentations, without having to re-introduce any syntax.
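A Python sketch of the idea, assuming a hypothetical logical class with only a few properties, and assuming that an augmenting class declaration may add interfaces and members. Applying an augmentation returns a new immutable value; folding all augmentations over the base gives the final logical declaration.

```python
from dataclasses import dataclass, replace
from functools import reduce
from typing import Optional, Tuple


@dataclass(frozen=True)
class LogicalClass:
    """Just properties, with no link back to any single piece of syntax."""
    name: str
    is_abstract: bool
    superclass: Optional[str]
    interfaces: Tuple[str, ...]
    members: Tuple[str, ...]


@dataclass(frozen=True)
class AugmentClass:
    """Hypothetical model of the relevant parts of an `augment class` declaration."""
    name: str
    interfaces: Tuple[str, ...] = ()
    members: Tuple[str, ...] = ()


def apply_augmentation(decl: LogicalClass, aug: AugmentClass) -> LogicalClass:
    if aug.name != decl.name:
        raise ValueError("augmentation does not match declaration")
    # A new logical declaration; `decl` itself is left unchanged.
    return replace(decl,
                   interfaces=decl.interfaces + aug.interfaces,
                   members=decl.members + aug.members)


base = LogicalClass("C", False, "Object", (), ("foo",))
augs = [AugmentClass("C", interfaces=("Comparable",)), AugmentClass("C", members=("bar",))]
final = reduce(apply_augmentation, augs, base)
print(final.interfaces, final.members)  # ('Comparable',) ('foo', 'bar')
```

Anything specified in terms of `LogicalClass` properties works the same whether the value came straight from one declaration or from applying augmentations.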

@tatumizer

tatumizer commented May 23, 2024

Can this proposal be simplified if you defined the category of "sticky properties" (those that can't be modified by the augmentation) as opposed to "non-sticky" (can't find a better word) ones? Then you could just enumerate the sticky properties for each type of declaration and save a bit of space. (It seems the sticky ones should be explicitly repeated by augmentation, and an attempt to override them is an error). Anyway, an appropriate categorization of properties could make the proposal more general and concise (and look less like an enumeration of special cases). FWIW.
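A Python sketch of how such a categorization could replace per-kind special cases with one generic check. The table `STICKY` is a hypothetical classification invented for illustration only, not taken from the proposal; properties are modeled as dicts, and sticky properties must be repeated unchanged by the augmentation.

```python
from typing import Dict, List

# Hypothetical classification, for illustration only.
STICKY: Dict[str, set] = {
    "class": {"name", "is_abstract", "superclass"},
    "function": {"name", "return_type"},
}


def check_augmentation(kind: str, base: dict, augmentation: dict) -> List[str]:
    """Report sticky properties that the augmentation omits or changes."""
    errors = []
    for prop in sorted(STICKY[kind]):
        if prop not in augmentation:
            errors.append(f"sticky property {prop!r} must be repeated")
        elif augmentation[prop] != base[prop]:
            errors.append(f"sticky property {prop!r} cannot be changed")
    return errors


base = {"name": "C", "is_abstract": False, "superclass": "Object", "members": ["foo"]}
ok = {"name": "C", "is_abstract": False, "superclass": "Object", "members": ["bar"]}
bad = {"name": "C", "is_abstract": True, "members": ["bar"]}
print(check_augmentation("class", base, ok))   # []
print(check_augmentation("class", base, bad))
```

Non-sticky properties such as `members` are simply not checked here; how they combine would still need to be specified per property.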


5 participants