Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Principle: information accumulation #875

Merged
merged 13 commits into from
Mar 16, 2022
Merged

Principle: information accumulation #875

merged 13 commits into from
Mar 16, 2022

Conversation

zygoloid
Copy link
Contributor

@zygoloid zygoloid commented Oct 8, 2021

This proposal addresses the question of whether we should perform a fully top-down compilation (like in C), a mostly top-down compilation (like in C++), or whether we should allow information from later in the same source file to be used in earlier program constructs (like in Rust, Swift, Java, C#, Haskell, and so on).

The proposed direction is:

  • Entities declared later in the same source file cannot be used earlier; top-down semantics apply everywhere.
    • As an exception, class member function bodies are parsed as if they appeared after the class.
  • Forward declarations can be used to separate interface from implementation and to allow entities to be used before they are defined.
  • The behavior of the program is nonetheless required to be the same as if we had a globally-consistent rule: it's always a hard error to depend on any information that is not known or that is provided later.

Fixes #472.

@zygoloid zygoloid added the proposal A proposal label Oct 8, 2021
@zygoloid zygoloid requested a review from a team October 8, 2021 21:35
@zygoloid zygoloid added this to Draft in Proposals via automation Oct 8, 2021
@google-cla google-cla bot added the cla: yes PR meets CLA requirements according to bot. label Oct 8, 2021
Copy link
Contributor

@chandlerc chandlerc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some early thoughts on the draft here...

proposals/p0875.md Outdated Show resolved Hide resolved
proposals/p0875.md Show resolved Hide resolved
proposals/p0875.md Show resolved Hide resolved
proposals/p0875.md Outdated Show resolved Hide resolved
proposals/p0875.md Show resolved Hide resolved
proposals/p0875.md Show resolved Hide resolved
@chandlerc
Copy link
Contributor

Thinking at a high level about whether there are tradeoffs not mentioned here...

I think there are more disadvantages than you cover for disallowing separate declarations and definitions:

  • Familiarity with C++ would be hit significantly as this is heavily used to separate implementation details today.
  • Readability is I think more impacted than you are really covering. For example, forced nesting of definitions (or deep indentation more generally) is often cited as reducing readability. Losing the ability to define members out-of-line seems likely to have sharp consequences here. Especially for nested class members, etc.

It may be worth noting that these don't seem to have much to do with top-down processing...

proposals/p0875.md Outdated Show resolved Hide resolved
In order to support this and still permit cyclic references between entities, we
would need to permit separate declaration and definition.

_Comprehensibility:_ This rule is simple to explain, and has no special cases.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this rule is only simple to explain if declaration vs definition is simple to explain, which as noted above for C++ is not true

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is really a problem with separate declaration and definition rather than with the top-down rule. I've tried to separate the two out and added a discussion of this there.

@zygoloid zygoloid moved this from Draft to RFC in Proposals Oct 16, 2021
@zygoloid zygoloid marked this pull request as ready for review October 16, 2021 00:53
@github-actions github-actions bot added the proposal rfc Proposal with request-for-comment sent out label Oct 16, 2021
@zygoloid
Copy link
Contributor Author

@chandlerc I know this isn't your preferred direction; how opposed are you to this approach? In our most recent weekly meeting, we had a 5:2 preference for this direction over a more top-down model.

impl fn F() {}
```

_Comprehensiblity:_ In general, determining whether two declarations declare the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would be greatly simplified if we require overloads to be defined together with a dedicated syntax.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. And this is mentioned now below in the simple compilation area.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mentioning that option below makes it all the more confusing that it's not mentioned here.

Also, given that these sub-alternatives are relevant to multiple goals (at least "comprehensibility" and "compilation", but arguably some of the others as well), they should probably be introduced prior to the goal subsections.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the text below and this suggestion, while superficially similar, are actually not the same thing. The text below is talking about a possible implementation strategy, particularly for name mangling, where functions are identified by ordinal position / declaration order in the API file, without restricting where in that API file they might be declared. This suggestion, as I understand it, is about matching declaration to definition by requiring that all declarations be provided together in a contiguous source utterance, and using a similar contiguous utterance to define the overload set in the same order. These options are compatible with each other, but are independent choices.

I've added a writeup of the dedicated syntax option here.

@zygoloid zygoloid requested a review from a team as a code owner January 8, 2022 02:44
Copy link
Contributor

@chandlerc chandlerc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just an initial quick pass at the text here with a few comments inline....

proposals/p0875.md Show resolved Hide resolved
proposals/p0875.md Show resolved Hide resolved
proposals/p0875.md Outdated Show resolved Hide resolved
impl fn F() {}
```

_Comprehensiblity:_ In general, determining whether two declarations declare the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. And this is mentioned now below in the simple compilation area.

@chandlerc chandlerc self-requested a review January 8, 2022 03:11
docs/project/principles/information_accumulation.md Outdated Show resolved Hide resolved
proposals/p0875.md Outdated Show resolved Hide resolved
proposals/p0875.md Outdated Show resolved Hide resolved
Copy link
Contributor

@jonmeow jonmeow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I know my comments are late, but I was looking again because of fowles' comments and thought I'd make a few...

proposals/p0875.md Outdated Show resolved Hide resolved
proposals/p0875.md Outdated Show resolved Hide resolved
docs/project/principles/information_accumulation.md Outdated Show resolved Hide resolved
docs/project/principles/information_accumulation.md Outdated Show resolved Hide resolved
proposals/p0875.md Show resolved Hide resolved
proposals/p0875.md Outdated Show resolved Hide resolved
proposals/p0875.md Outdated Show resolved Hide resolved
proposals/p0875.md Outdated Show resolved Hide resolved
proposals/p0875.md Outdated Show resolved Hide resolved
proposals/p0875.md Outdated Show resolved Hide resolved
proposals/p0875.md Outdated Show resolved Hide resolved
proposals/p0875.md Show resolved Hide resolved
proposals/p0875.md Show resolved Hide resolved
proposals/p0875.md Show resolved Hide resolved
proposals/p0875.md Show resolved Hide resolved
impl fn F() {}
```

_Comprehensiblity:_ In general, determining whether two declarations declare the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mentioning that option below makes it all the more confusing that it's not mentioned here.

Also, given that these sub-alternatives are relevant to multiple goals (at least "comprehensibility" and "compilation", but arguably some of the others as well), they should probably be introduced prior to the goal subsections.

proposals/p0875.md Outdated Show resolved Hide resolved
proposals/p0875.md Outdated Show resolved Hide resolved
Copy link
Contributor

@jonmeow jonmeow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a thumbs up on other responses, not resolving where I didn't start the thread.

proposals/p0875.md Outdated Show resolved Hide resolved
@jonmeow
Copy link
Contributor

jonmeow commented Jan 12, 2022

@chandlerc In discussion about this, you brought up an example of Swift code where a lot of functions referenced a class which was only defined at the bottom of the file. Trying to understand those functions required reading the class definition, which was out of a logical reading order. With the accumulation rule as given, the class would more likely be defined at the top of the file, and read first.

I would note though, I think the accumulation rule can also get in the way of BLUF-style reading, forcing implementation details to come first and the crux of the implementation to come last. For example, consider how we've implemented name lookup in Carbon:
https://github.com/carbon-language/carbon-lang/blob/trunk/executable_semantics/interpreter/resolve_names.cpp

To summarize layout here:

  • Forward declarations of AddExposedNames to address mutual recursion.
  • AddExposedNames implementations.
  • Forward declarations of ResolveNames to address mutual recursion.
  • ResolveNames implementations, calling AddExposedNames.
  • The primary ResolveNames(AST) function, which is the public API, and calls the other ResolveNames APIs.

From a BLUF perspective, I'd like to put the public function first: it's the high-level summary of what's going on. Then, dig into the function it calls, with the small helpers at the end.

From my perspective, C++'s name lookup rule exerts a strong pressure to put small helper functions at the top of the file, and the key implementation details at the bottom -- I view this as undesirable because I think people would prefer to see the main implementation, then walk through its calls.

If the leaning is that classes should have public APIs first and private APIs last, I think that reflects similar BLUF preferences: the public APIs are the main implementation, the private APIs are the helpers. I think the difference is that C++ has trained us to accept a different layout between class and free functions.

This pattern continues into header files, for example when it's necessary to provide implementation-detail classes above public classes just to ensure name lookup is satisfied. Sometimes forward declarations can be used to partially address this, but the forward declarations would still be first and would still be less important to public API users; also, either language limitations or a developer's preference for minimizing forward declarations can prevent their use.

Allowing more arbitrary layouts will certainly allow style issues as in the Swift example, where developers may provide wholly illogical layouts. I think flexibility in name lookup can also allow better API layouts, where developers aren't forced to put private implementation details above public APIs just because name lookup requires it.

proposals/p0875.md Outdated Show resolved Hide resolved
proposals/p0875.md Show resolved Hide resolved
proposals/p0875.md Outdated Show resolved Hide resolved
proposals/p0875.md Outdated Show resolved Hide resolved
proposals/p0875.md Show resolved Hide resolved

As in C++, member functions of nested classes would be deferred until the
outermost class is complete. Unlike in C++, this deferral would apply only to
the bodies of member functions, and not to default arguments or other contexts
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know what the word "contexts" here means.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Expanded to a complete description of the difference from C++.

Copy link
Contributor

@chandlerc chandlerc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't finished going through all the alternatives text yet, but at least covered the main principle and some of the proposal. I'd like to see if there is any way to compress or be more brief on the alternatives without losing information. But if not, it does seem better to capture the information, it may just be worth making sure it summarizes reasonably well.

docs/project/principles/information_accumulation.md Outdated Show resolved Hide resolved
docs/project/principles/information_accumulation.md Outdated Show resolved Hide resolved
proposals/p0875.md Outdated Show resolved Hide resolved
proposals/p0875.md Outdated Show resolved Hide resolved
proposals/p0875.md Outdated Show resolved Hide resolved
proposals/p0875.md Outdated Show resolved Hide resolved
Copy link
Contributor

@chandlerc chandlerc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Just wanted to note that other than a minor wording tweak an the re-ordering of some of the alternatives, I think I'm basically happy here. But I'd like to give folks with comment threads a chance to look at the updated version and see if at least their comments are addressed -- I understand that the direction is going to be universally like, but I want to make sure the write up doesn't have actively confusing points.)

Copy link
Contributor

@geoffromer geoffromer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm broadly happy with this; just a few minor suggestions for clarification.

proposals/p0875.md Outdated Show resolved Hide resolved
proposals/p0875.md Outdated Show resolved Hide resolved
proposals/p0875.md Outdated Show resolved Hide resolved
proposals/p0875.md Show resolved Hide resolved
Co-authored-by: Geoff Romer <gromer@google.com>
Copy link
Contributor

@chandlerc chandlerc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks good. I think it aligns well with the leads' decision. I know we're not going to hit full consensus here, but I'm not seeing a lot of critical comment threads left. I think this LG to land as-is. The proposal itself I think tries to capture the borderline nature of a bunch of these tradeoffs, and the principle is I think simple and to the point and well aligned with the leads decision.

Ship it!

Copy link
Contributor

@chandlerc chandlerc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks good. I think it aligns well with the leads' decision. I know we're not going to hit full consensus here, but I'm not seeing a lot of critical comment threads left. I think this LG to land as-is. The proposal itself I think tries to capture the borderline nature of a bunch of these tradeoffs, and the principle is I think simple and to the point and well aligned with the leads decision.

Ship it!

@zygoloid zygoloid merged commit 8259f76 into carbon-language:trunk Mar 16, 2022
Proposals automation moved this from RFC to Accepted Mar 16, 2022
@zygoloid zygoloid deleted the principle-information-accumulation branch March 16, 2022 21:42
@github-actions github-actions bot added proposal accepted Decision made, proposal accepted and removed proposal rfc Proposal with request-for-comment sent out labels Mar 16, 2022
chandlerc pushed a commit that referenced this pull request Jun 28, 2022
This proposal addresses the question of whether we should perform a fully top-down compilation (like in C), a mostly top-down compilation (like in C++), or whether we should allow information from later in the same source file to be used in earlier program constructs (like in Rust, Swift, Java, C#, Haskell, and so on).

The proposed direction is:

-   Entities declared later in the same source file cannot be used earlier; top-down semantics apply everywhere.
    -    As an exception, class member function bodies are parsed as if they appeared after the class.
-   Forward declarations can be used to separate interface from implementation and to allow entities to be used before they are defined.
-   The behavior of the program is nonetheless required to be the same as if we had a globally-consistent rule: it's always a hard error to depend on any information that is not known or that is provided later.
zygoloid added a commit that referenced this pull request Jul 1, 2022
…type is known. (#1352)

Following #875, diagnose any use of a name prior to the point where it is introduced and its type is known.

This works by performing name resolution on a top-level class, interface, or impl twice: the first pass performs name-resolution for everything other than nested function bodies, and the second pass performs name resolution on function bodies. At the moment, the second pass does a superset of the work done by the first pass, and as a consequence, some identifier expressions now have their target set twice to the same thing.
@zygoloid zygoloid mentioned this pull request Jul 19, 2022
jonmeow added a commit to jonmeow/carbon-lang that referenced this pull request Oct 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes PR meets CLA requirements according to bot. proposal accepted Decision made, proposal accepted proposal A proposal
Projects
No open projects
Proposals
Accepted
Development

Successfully merging this pull request may close these issues.

Open question: Calling functions defined later in the same file
7 participants