diff --git a/proposals/p3980.md b/proposals/p3980.md new file mode 100644 index 0000000000000..0a250ed03b494 --- /dev/null +++ b/proposals/p3980.md @@ -0,0 +1,790 @@ +# Singular `extern` declarations + + + +[Pull request](https://github.com/carbon-language/carbon-lang/pull/3980) + + + +## Table of contents + +- [Abstract](#abstract) +- [Problem](#problem) +- [Background](#background) +- [Proposal](#proposal) + - [Declarations](#declarations) + - [Owning `extern` declarations](#owning-extern-declarations) +- [Details](#details) + - [Type coherency](#type-coherency) + - [Impact on indirect imports](#impact-on-indirect-imports) + - [Indirect imports of non-`extern` types](#indirect-imports-of-non-extern-types) + - [Using imported declarations](#using-imported-declarations) + - [`private extern`](#private-extern) + - [Validation for non-owning `extern library` declarations](#validation-for-non-owning-extern-library-declarations) + - [No syntactic matching for `extern library` declarations](#no-syntactic-matching-for-extern-library-declarations) + - [Versus proposal #3762](#versus-proposal-3762) +- [Rationale](#rationale) +- [Future work](#future-work) + - [`extern` and template interactions](#extern-and-template-interactions) +- [Alternatives considered](#alternatives-considered) + - [Allow multiple non-owning declarations, remove the import requirement, or both](#allow-multiple-non-owning-declarations-remove-the-import-requirement-or-both) + - [Total number of allowed declarations (owning and non-owning)](#total-number-of-allowed-declarations-owning-and-non-owning) + - [Do not restrict the number of forward declarations](#do-not-restrict-the-number-of-forward-declarations) + - [Allow up to two declarations total](#allow-up-to-two-declarations-total) + - [Allow up to four declarations total](#allow-up-to-four-declarations-total) + - [Don't require a modifier on the owning declarations](#dont-require-a-modifier-on-the-owning-declarations) + - [Only require `extern` on the first owning declaration](#only-require-extern-on-the-first-owning-declaration) + - [Separate require-direct-import from non-owning declarations](#separate-require-direct-import-from-non-owning-declarations) + - [Other `extern` syntaxes](#other-extern-syntaxes) + - [Have types with `extern` members re-export them](#have-types-with-extern-members-re-export-them) + - [Require syntactic matching for `extern library` declarations](#require-syntactic-matching-for-extern-library-declarations) + + + +## Abstract + +An entity may be declared `extern` (such as `extern class Foo;`); this means +that its type is only complete if the definition is directly imported. It also +allows for a single declaration in a different library, which must be marked as +`extern library ""` (such as `extern library "Bar" class Foo;`). + +Also, establish a different rule of thumb for when modifier keywords are +required: modifier keywords are required when, if prior optional declarations +were removed, the lack of the modifier keyword would change behavior. + +## Problem + +In the `extern` model from +[#3762: Merging forward declarations](https://github.com/carbon-language/carbon-lang/pull/3762), +multiple `extern` declarations are allowed. +[#3763: Matching redeclarations](https://github.com/carbon-language/carbon-lang/pull/3763) +further evolved the `extern` keyword. + +The prior `extern` model assumed that the `extern` and non-`extern` declarations +of a class formed two different types, which could be merged. +[As discussed on #packages-and-libraries](https://discord.com/channels/655572317891461132/1217182321933815820/1230990636073881693), +this runs into an issue with code such as: + +``` +library "a"; +class C {} +``` + +``` +library "b"; +extern class C; +extern fn F() -> C*; +``` + +``` +library "c"; +import library "a"; +extern fn F() -> C*; +``` + +Here, the return types of `F` differ. + +This proposal aims to address the differing return types by unifying the type of +`C` regardless of whether it's `extern`. This could be done under multiple +different approaches, and this proposal aims for one which enables efficient +implementation strategies. + +## Background + +Proposals: + +- [#3762: Merging forward declarations](https://github.com/carbon-language/carbon-lang/pull/3762) +- [#3763: Matching redeclarations](https://github.com/carbon-language/carbon-lang/pull/3763) + +Discussions: + +- [#packages-and-libraries: `extern` type coherency](https://discord.com/channels/655572317891461132/1217182321933815820/1230990636073881693) +- [#packages-and-libraries: When to allow/disallow redeclarations](https://discord.com/channels/655572317891461132/1217182321933815820/1236016051632865421) +- [Open discussion 2024-05-09: Number of allowed redeclarations](https://docs.google.com/document/d/1s3mMCupmuSpWOFJGnvjoElcBIe2aoaysTIdyczvKX84/edit?resourcekey=0-G095Wc3sR6pW1hLJbGgE0g&tab=t.0#heading=h.bu7djkos4xo) +- [Issue #3986: Alternative naming for `has_extern` keyword](https://github.com/carbon-language/carbon-lang/issues/3986) +- [Issue #4025: Handling of indirect access of `extern` types](https://github.com/carbon-language/carbon-lang/issues/4025) +- [#typesystem: Will `&` have an extension point?](https://discord.com/channels/655572317891461132/708431657849585705/1258150877714452581) + +## Proposal + +### Declarations + +A given entity may have up to three declarations: + +- An optional, non-owning `extern library ""` declaration + - It must be in a separate library from the definition. + - The owning library's API file must import the `extern` declaration, and + must also contain a declaration. +- An optional, owning forward declaration + - This must come before the definition. The API file is considered to be + before the implementation file. +- A required, owning definition + +The consequential changes to the [problem example](#problem) are then: + +``` +library "a"; + +// This proposal makes the import required. +import library "b"; + +// This proposal makes `extern` required here. +extern class C {} +``` + +``` +library "b"; +// This proposal makes `library "a"` required here. +extern library "a" class C; +extern fn F() -> C*; +``` + +``` +library "c"; +import library "a"; +extern fn F() -> C*; +``` + +### Owning `extern` declarations + +On an owning `extern` declaration, such as `extern class C {}`, there are two +key effects: + +1. The declaration must be explicitly imported in order to be complete. + - An "explicit import" means some import path exists where the name is + available to name lookup, including `export import` and `export `. +2. A non-owning `extern library "` declaration is allowed, but + not required. + +If _either_ owning declaration has the `extern` modifier, _both_ must have it. + +## Details + +### Type coherency + +In the context of the example that is the [problem](#problem), `C` will produce +the same type regardless of whether `C` is the owning or non-owning declaration. +This means that both function signatures have identical types. + +We do this by only producing a complete type if the owning definition of `C` is +imported by name: either directly through `import library "a"`, or indirectly +through a chain of `export import library "a"` and `export C;`. Otherwise, an +incomplete type is used. + +This does mean that adding `extern` to an owning declaration changes the import +semantic. As a consequence, it is a potentially breaking change for API +consumers that didn't explicitly import the time. + +In the presence of `extern library "a" class C;`, the required +`import library "b"` means that all owning `extern class C` declarations are +able to see the `extern library "a" class C` declaration as a name collision, +which is merged. This allows the compiler to easily apply the same type to all +declarations. That in turn will be used to ensure libraries which import both +understand the type equality. + +#### Impact on indirect imports + +An entity marked as `extern` is only complete when the definition is explicitly +imported. In the following, examples of indirect, non-explicit uses are given +inside `library "o"`. + +``` +library "m"; + +extern class C { fn Member(); } +``` + +``` +library "n"; +import library "m"; + +fn F() -> C; +var c: C = {}; +var pc: C* = &c; +``` + +``` +library "o"; +import library "n"; + +// Invalid: The return type of `C` is incomplete, making the function signature +// invalid. +fn G() { F(); } + +// Invalid: Accessing members requires `C` to be complete. +fn UseC() { c.Member(); } + +// Valid: Taking the address of `C` doesn't require it to be complete. This is +// possible because `&` doesn't have an extension point. +var indirect_pc: auto = &c; + +// Invalid: Copying `C` requires the complete type. +var copy_c: auto = c; + +// Valid: Pointer-to-pointer copies are okay. +var copy_pc: auto = pc; +``` + +#### Indirect imports of non-`extern` types + +The above rules explicitly do not apply for non-`extern` types, as decided in +[Issue #4025](https://github.com/carbon-language/carbon-lang/issues/4025). In +other words: + +``` +library "a"; + +class C { fn F(); } +``` + +``` +library "b"; +import library "a"; + +fn G() -> C; +``` + +``` +library "c"; +import library "b"; + +// Valid: `C` is complete here, even though it's not in name lookup. +G().F(); +``` + +### Using imported declarations + +Since `extern library "a" class C;` must be imported by the owning library, we +now allow uses of the imported name prior to its declaration within the same +file. This is a divergence from +[#3762](https://github.com/carbon-language/carbon-lang/pull/3762). It means the +following now works: + +``` +library "extern"; + +extern library "use_extern" class MyType; +``` + +``` +library "use_extern"; +import library "extern" + +// Uses the `extern library` declaration. +fn Foo(val: MyType*); + +extern class MyType { + fn Bar[addr self: Self*]() { Foo(self); } +} +``` + +### `private extern` + +Previously, in +[#3762](https://github.com/carbon-language/carbon-lang/pull/3762), a non-owning +`private extern` was valid to declare something as extern without exposing the +name. In this proposal, that would be a non-owning +`private extern library ""` for an owning public `extern` +declaration. However, rather than supporting this version of the syntax, it will +instead be invalid because the name would never be visible to the owning +library. Instead, visibility must match between an +`extern library ""` declaration and the owning `extern` +declaration. + +Note, because an owning `extern` declaration can be used independently of +`extern library ""`, an owning `private extern` declaration is +valid in an API file. It has no special behaviors about it, and is merged as +normal. + +### Validation for non-owning `extern library` declarations + +We should offer some validation that the library in `extern library` is correct. +When the owning library is incorrect, it's very likely to be detected in two +cases: + +- A compile-time error when the owning library imports the non-owning library, + when the owning declaration is evaluated. +- A link-time error as a fallback. + +Other cases, such as when both libraries are independently imported, may or may +not be caught, dependent upon the cost of validation. + +### No syntactic matching for `extern library` declarations + +The non-owned `extern library` declarations will only use semantic matching for +redeclarations, not syntactic matching. Details of syntactic matching laid out +in [#3763](https://github.com/carbon-language/carbon-lang/pull/3763) will only +apply to owned declarations in the same library, which may include owned +`extern` declarations. + +### Versus proposal #3762 + +Versus proposal +[#3762](https://github.com/carbon-language/carbon-lang/pull/3762), the `extern` +feature is essentially rewritten. No part of `extern` should be assumed to still +apply. + +## Rationale + +- [Software and language evolution](/docs/project/goals.md#software-and-language-evolution) + - Unifying the type of `extern` entities addresses a type coherency issue. + - The `extern` behavior of requiring an explicit import is intended to + assist library authors in carefully managing the dependencies on their + API. +- [Fast and scalable development](/docs/project/goals.md#fast-and-scalable-development) + - Requiring the non-owning `extern library` declaration be imported by the + owning library should improve compiler performance. + +This proposal makes a trade-off with +[Interoperability with and migration from existing C++ code](/docs/project/goals.md#interoperability-with-and-migration-from-existing-c-code). +The restriction of a unique `extern` declaration is expected to require +additional work in migration, because C++ `extern` declarations will need to be +consolidated. This is currently counter-balanced by the trade-offs involved, +although it may result in a reevaluation of that aspect of this proposal. + +## Future work + +### `extern` and template interactions + +We've only loosely discussed template interactions with `extern`. Right now, +what we expect is that when a template declaration uses an `extern` type, the +_instantiation_ still occurs in the calling file. Thus, the `extern` type's name +would need to be imported in both the file declaring the template, and the file +calling the template. + +When the template is in the same package as the `extern` type, it could +re-export it. However, we don't support re-exporting names cross-package, and +something like `let template ExternType:! auto = OwningPackage.ExternType;` +would not actually forward the _completeness_ of `ExternType`. + +This is expected to be inconvenient, but it may be okay if `extern` sees limited +use. It may also be that the template model ends up different from expected. + +## Alternatives considered + +### Allow multiple non-owning declarations, remove the import requirement, or both + +We limit to one non-owning `extern library` declaration. Continuing to allow +multiple `extern library` declarations (the previous state) is feasible. +Similarly, we could not require the owning `extern` declaration to import the +non-owning `extern library` declaration; this could be done with or without +multiple non-owning `extern library` declarations. For this set of alternatives, +the issues which would arise are similar. + +In the compiler, we want to be able to determine that two types are equal +through a unique identifier, such as a 32-bit integer. When one declaration sees +another directly, as through an import, we identify the redeclaration by name, +and reuse the unique identifier. This deduplication can occur once per +declaration. Indirect imports can continue to use the unique identifier. + +We could instead support unifying declarations that did not see each other. +However, this would require canonicalizing all types by name instead of by +unique identifier. For example, consider: + +``` +package Other library "type"; +extern class MyType { + fn Print(); +}; +``` + +``` +package Other library "use_type"; +import library "type"; +fn Make() -> MyType*; +``` + +``` +package Other library "extern"; +extern library "type" class MyType; +``` + +``` +package Other library "use_extern"; +import library "extern"; +fn Print(val: MyType*); +``` + +``` +library "merge"; +import Other library "use_type"; +import Other library "use_extern"; +Other.Print(Other.Make()); +``` + +Here, the "merge" library doesn't see either declaration of `MyType` directly. +However, `Print(Make())` requires that both declarations of `MyType` be +determined as equivalent. This particular indirect use also means that the names +will not have been added to name lookup, so there is no reason for the two +declarations to be associated by name. + +In order to do merge these declarations, we would need to identify that fully +qualified names and other structural details are equivalent when the type is +used (including non-explicit uses, such as interface lookup). We could achieve +this, for example, by having a name lookup table for in-use types, managed per +library. Each library would also need to validate that declarations were +semantically equivalent, versus the current approach validating as part of the +redeclaration. The cost of a per-library approach is expected to have a +significant impact on the amount of work done as part of semantic analysis. + +We may end up wanting to do similar work in order to improve diagnostics for +invalid cases where the non-owning `extern library` is not correctly declared +and imported. However, additional work building good diagnostics for +already-identified invalid code is less of a concern than additional work on +fully valid code. + +In order to maintain a high-performance compiler, we are taking a restrictive +approach that makes it simpler to associate type information. + +### Total number of allowed declarations (owning and non-owning) + +A few options were considered regarding the number of allowed declarations. + +We limit to two owning declarations: the optional forward declaration, and +required definition. The need to provide interface implementations (for example, +`impl MyType as Add`) is considered to constrain this choice. + +In this category, alternatives considered were: + +- Do not restrict the number of declarations +- Allow up to two declarations total +- Allow up to four declarations total + +Details for why each alternative was declined are below. + +#### Do not restrict the number of forward declarations + +We could not restrict the number of forward declarations, allowing an arbitrary +amount -- possibly also after the definition. This would be consistent with C++. + +One thing to consider here is modifier keyword behavior. If we require modifier +keywords to match across all declarations, that could become a maintenance +burden for developers. If we don't, it makes the meaning of a given forward +declaration more ambiguous. + +This option is declined due to the lack of clear benefit. + +#### Allow up to two declarations total + +Under this option, we would only allow one forward declaration, treating the +non-owning `extern library` declaration as a forward declaration. This would +mean two declarations overall, instead of three. + +For this, the main concern was interactions between file placement of the +definition, and file placement of interface implementations. Interface +implementations must generally be in API files in order to be seen by other +libraries. + +For example: + +``` +library "i"; +interface I {} +``` + +``` +library "e"; +import library "i"; +extern library "o" class C; +extern library "o" impl C as I; +``` + +``` +library "o"; +import library "e"; +extern class C { } +extern impl C as I; +``` + +``` +impl library "o"; +extern impl C as I { } +``` + +If the definition is required to be in the API file in order to allow the +interface implementations in the API file, the API file would need to import +libraries required to construct the definition. That could create issues for +separation of build dependencies, and could also make it more difficult to +unravel some dependency cycles between libraries. + +If the definition was allowed to be in the implementation file even when there +were interface implementations in the API file, the ambiguity of seeing a +non-owning `extern library` declaration and being unsure of whether this was the +owning library could have negative consequences for evaluation of interface +constraints. + +The purpose of allowing a forward declaration when there is a non-owning +`extern` declaration is to make it clear for interface implementations that they +exist in the owning library, while processing the API file. + +#### Allow up to four declarations total + +The four declarations would be: + +1. Non-owning `extern library` declaration +2. Forward declaration in API file +3. Forward declaration in implementation file +4. Definition + +The number of forward declarations allowed is consistent with the current state +from [#3762](https://github.com/carbon-language/carbon-lang/pull/3762). + +This would allow for clarity when defining in the implementation file, to also +be able to put a forward declaration above -- even when the forward declaration +is pulled from the API file. + +If we're allowing declarations from another file (including the non-owning +`extern library` declaration) to be used before an entity is declared in the +same file, the motivating factor for allowing a repeat forward declaration in an +implementation file is removed. Previously, that was required for an entity to +be referenced prior to its definition. + +In discussion of this option, it was considered unclear why we would allow two +forward declarations, but not allow even more. The more popular choice seemed to +be not restricting, which was also declined. + +### Don't require a modifier on the owning declarations + +Instead of requiring an `extern` modifier on owning declarations, we could infer +from the presence of a non-owning `extern library` declaration. + +We had declined allowing a definition to control whether `extern library` was +allowed in discussion of +[#3762](https://github.com/carbon-language/carbon-lang/pull/3762), although this +is not directly mentioned in the proposal. At the time, it was dropped because +the owning library didn't need to include `extern` declarations, and so having +the definition opt-in to allowing `extern` was viewed as low benefit. However, +now that the owning library must import the `extern` declaration, there is a +tighter association and so we reevaluated. + +The `extern` modifier offers a benefit for being able to verify the association +between non-owning and owning declarations, and offers additional parity in +modifiers. It also makes it easy for a tool to know if it's missing a +declaration. + +### Only require `extern` on the first owning declaration + +At present, we require `extern` on _all_ owning declarations. We could instead +only require `extern` on the first owning declaration and, if there's a separate +forward declaration and definition, infer it for the definition. For example: + +``` +// `extern` on the forward declaration. +extern class C; +// Infer `extern` for the definition. +class C {} +``` + +The decision to require `extern` on all owning declarations is based on wanting +the forward declaration to be optional. A rule of thumb was discussed wherein if +a forward declaration could be removed without breaking the definition (as +defined by it being in the same lexical scope), keywords should be duplicated to +the definition. This is not proposed as a rule because it's not clear whether +we'll generally follow it, but it's why this particular choice is taken. + +### Separate require-direct-import from non-owning declarations + +At present, an `extern` modifier on an owning declaration serves two purposes: + +1. Indicates that a non-owning `extern library` declaration _can_ exist. +2. Indicates the declaration must be directly imported in order to be complete. + +This means that: + +- The presence of `extern` on an owning declaration cannot be used to + determine whether a non-owning declaration exists. + - Because the location of a non-owning declaration isn't explicit in the + owning code, this may lead to a developer failing to find the non-owning + declaration and misunderstanding that as the non-existence of a + non-owning declaration. +- Libraries which happen to be imported by the owning declaration may freely + add or remove non-owning `extern library` declarations without modifying the + owning library. + +We could give distinct syntax to the two purposes, so that they could be managed +separately. The preference at present is to use a single syntax for both +purposes, rather than emphasizing control or correspondence. + +### Other `extern` syntaxes + +[Issue #3986](https://github.com/carbon-language/carbon-lang/issues/3986) +discussed other syntaxes for `extern` + `extern library`. These were mainly +`has_extern`/`is_extern`/`externed` + `extern`. + +Breaking down `extern`, there are two features which could have been provided +separately: + +1. Declaring an entity has a forward declaration in a separate library. + - Also, declaring that forward declaration in a separate library: + `extern library ""`. +2. Declaring an entity must be imported directly. + +Although (1) must depend on (2), a different design could provide (2) without +making (1) possible, for example with different keywords to differentiate +between intended usage (`has_extern class C;` meaning (1) and (2), `must_import` +meaning (2) only). However, the `extern` keyword approach means developers have +all or nothing. + +Considering that, the trade-offs are viewed as: + +- The primary motivation is to provide feature (1). +- Leads wanted a syntax on the owning declaration that states something + positive about the owning declaration itself, rather than expressing that + other declarations exist, which suggests that the syntax on the owning + declaration should provide feature (2). +- Leads consider it valuable, though secondary, to support (2) separate from + (1), and find it acceptable to make (1) optional to achieve this (in other + words, making the `extern library ""` declaration optional). + - It's okay that that `extern library ""` can be added and + removed from imported libraries without modifying the owning library. + - If a developer considers it important to disambiguate the intended use + of a declaration `extern class C;` and whether there should be a + declaration in a separate library, they can add comments. +- `extern` seemed like an acceptable name for this approach, and alternative + names seemed significantly less good. +- Using `extern` for both features still only creates one new keyword, versus + multi-keyword approaches. +- Adding the owning library with `extern library ""` will + hopefully improve diagnostics and human understandability of the code. + - It is _very_ verbose, but this verbosity goes on the forward declaration + in the non-owning library. When it's read, which will hopefully be less + often than the actual declaration, it will provide the reader directions + to find the actual declaration. + - If in practice we find the verbosity becomes a significant issue, we can + revisit syntaxes to address that specifically. For example, if we have + significant repetiton, we might consider a grouping structure such as + `extern library "..." { }`. + +### Have types with `extern` members re-export them + +We expect there will be types that have `extern` members; these types are only +truly complete if their members are complete. + +We discussed having such types automatically re-export the `extern` members, +possibly requiring the types to also be `extern` in order to be allowed to have +`extern` members. For example: + +``` +library "a" +extern library "b" class A; +``` + +``` +library "b" +import library "a" +extern class A {} +// B re-exports A so that it's complete on use. +class B { var a: A; } +``` + +``` +library "c" +import library "b" +// Importing this function declaration gets B, which again, re-exports A so that +// it's complete on use. +fn F() -> B { ... } +``` + +``` +library "d" +// This import loads the incomplete name for A. +import library "a" +// This import loads F, which loads B, which loads the definition of A. +import library "c" + +// Because of the import behaviors, this is valid. +var a: A; +``` + +We consider this action-at-a-distance. Type coherency means the `A` member of +`B` is the same as the `A` in name lookup; we could make them behave slightly +differently, but then we get into provenance tracking of type information. +Several various forms of this have been discussed as part of the `extern` +design, and it's something we've decided to avoid. + +Although it's more inconvenient, we will require `A` to be deliberately imported +in order for `B` to be complete. + +### Require syntactic matching for `extern library` declarations + +We will not require syntactic matching for `extern library` declarations, but we +could. + +When a redeclaration is in the same library, we've designed name lookup in a way +such that syntactic matching is effectively a superset of semantic matching. +However, that relies on poisoning entries in name lookup, with later +redeclarations seeing identical name lookup data. Because different libraries +have different name lookup data, syntactic matching _not_ a superset of semantic +matching cross-library. We address this schism by only requiring semantic +matching. + +Semantic matching will include parameter names. The difference is primarily in +whether different ways of producing the same type information are considered +invalid or not. + +For example: + +``` +library "a"; + +class A {} +namespace NS; +extern library "c" fn NS.F() -> A; +``` + +``` +library "b"; + +namespace NS; +class A {} +``` + +``` +library "c"; import library "a" import library "b" + +extern fn NS.F() -> NS.A {} +``` + +Semantically, `NS.F` in libraries "a" and "c" are identical. Syntactically, they +differ because of `NS.A` in "c". Writing `A` in "c" is invalid because it would +use `NS.A` from "b". But in "a", there is nothing to make the declaration +invalid: it would only be invalid after completing cross-library compilation. + +However, we could also have code such as: + +``` +library "d"; + +class D {} +namespace NS; +extern library "e" fn NS.G() -> D; +``` + +``` +library "e"; + +namespace NS; +alias NS.D = D; +extern fn NS.G() -> D {} +``` + +Here, the semantics and syntax match, but this would be invalid in a normal +redeclaration due to the different name lookup result for `D`. + +This additionally gets into a different statement made in +[#3763](https://github.com/carbon-language/carbon-lang/pull/3763) to justify +synactic matching: "The intention is that whenever the syntax matches, the +semantics must also match." Due to the differences in name lookup, syntax +matching does not mean semantics must match; instead of `alias NS.D = D;`, that +could have been `alias NS.D = i32;` and the syntax would have still matched. +This only works in a library because "...we persist syntactic information from +the API file to implementation files." We cannot persist syntactic information +cross-library, across imports. + +Due to the differences in the guarantees that syntactic matching provides for +owned declarations versus non-owned declarations, we will not enforced syntactic +matching on the non-owned `extern library` declarations.