[DO NOT MERGE] Reorder core basics #7153

Closed
wants to merge 10 commits into from

@AlisdairM AlisdairM commented Jul 23, 2024

This PR shows a vision for re-ordering the Core clauses around program construction for C++26. This is based on the pre-St Louis draft, NOT the current working draft. The intent is to show direction, and re-apply useful edits if this direction gains approval.

There are a variety of small edits, but the significant ones are:

  • give each phase of translation a stable label so they can be individually referenced
  • arrange the lex subclauses to better follow the phases of translation
  • merge the specification of comments directly into phase 3, as that is the only place they are used
  • split the literals subclause into 2, to reflect that string literals belong to the preprocessor, while arithmetic literals belong to phase 7
  • move all the preprocessor tokens (phase 3) below a new subtitle to better group them
  • MERGE [cpp] INTO PLACE AS PHASE 4
  • move modules adjacent to [lex], preceding [basic]

This is a response to #2252 where in practice I could not find a satisfactory way to integrate the primitive parts of [lex] with [basic], so instead made the best attempt to clarify the source aspects of program creation that I could.

Overall, I find this proposed structure very helpful: I now better understand a number of parsing issues, mostly preprocessing issues that I had not always realised were preprocessing. I cannot be sure, though, how much of that understanding comes from being involved in making the transformation itself.

The grammar for universal-character-name is oddly sandwiched into the
middle of the subclause talking about the different character sets used
by the standard.  To improve the flow, extract that grammar into its own
subclause.

In the extraction, I make two other clarifying changes.  First, describe
this new subclause as 'a way to name any element of the translation
character set using just the basic character set' rather than simply
'a way to name other characters'.  Secondly, remove the 'one of' in the
grammar where there is only one option to choose.
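As a small illustration of 'naming any element of the translation character set using just the basic character set', the sketch below spells two code points with universal-character-names inside `char32_t` literals. The function names are hypothetical and exist only for this example.

```cpp
// Both literals are written with basic-character-set characters only,
// yet each names a code point outside that set.

// \u takes exactly four hexadecimal digits: U+00E9.
unsigned long e_acute_code_point() {
    return static_cast<unsigned long>(U'\u00E9');
}

// \U takes exactly eight hexadecimal digits: U+1F600.
unsigned long grinning_face_code_point() {
    return static_cast<unsigned long>(U'\U0001F600');
}

// The same property can be checked at compile time.
static_assert(U'\u00E9' == 0xE9, "UCN names the code point U+00E9");
```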

The current contents of [basic.pre] jump between specifying
different things.  This PR moves all the specification of
names to the front, followed by the specification of entities.

There are two main benefits: (1) the specification for when
two names are the same is a list of 4 rules that correspond
to the 4 things that can form a name --- the connection is
much clearer when the paragraphs are adjacent and the list
is sorted to the same order; (2) in this form, even though
all the words are the same, the reordered and merged
paragraphs fit on a single page.  The very last paragraph
was forced over a page-break in the original layout.
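For readers who do not have [basic.pre] to hand, the 4 things that can form a name are an identifier, a conversion-function-id, an operator-function-id, and a literal-operator-id. The sketch below declares one of each; the type `Meters`, the suffix `_m`, and the function `total_length` are hypothetical names invented for this illustration.

```cpp
struct Meters {
    double value;

    // conversion-function-id: `operator double`
    operator double() const { return value; }

    // operator-function-id: `operator+`
    Meters operator+(Meters other) const {
        return Meters{value + other.value};
    }
};

// literal-operator-id: `operator""_m`
Meters operator""_m(long double v) {
    return Meters{static_cast<double>(v)};
}

// identifier: `total_length`
double total_length() {
    Meters sum = 1.5_m + 2.0_m;  // literal operator, then operator+
    return sum;                  // implicit use of the conversion function
}
```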

This change puts all the specification for assembling and transforming
the source of a program ([lex], [cpp], and [modules]) ahead of the
basic core specification of how to interpret that source.

This PR colocates [lex], [cpp], and modules to put all
the parts that talk about assembling and translating a
program together.  In doing so, it rearranges the
subclauses in [lex] and introduces subclauses that can
be cross-referenced for each phase of translation.

Metadata is introduced to identify the first and last
core clause when cross-references want the first/last
property rather than the specific clause itself.

The subclause on comments, [lex.comment], is merged
into the new [lex.phase.3] as comments feature only
during translation phase three.
…on unit

The definition of program at the top of [basic.link] should move to
the front of [lex.separate] so that it is defined before its first
usage, which also clarifies what the phases of translation produce.

Similarly, move the definition of the grammar production translation-unit
to the top of the first clause to actually use it, [module.unit].

Finally, retitle [basic.link] as just Linkage, rather than
programs and linkage.
@AlisdairM

While I like this rearrangement, the placing of the predefined macro names feels odd, as I got too used to it being the very last part of the Core specification, before the Library intro, and it feels lost buried in the middle of this combined clause.


jensmaurer commented Jul 23, 2024

Not having looked at the details, do we finally differentiate the grammar for preprocessor tokens from those for phase 7 tokens? "identifier" does double duty here.

@AlisdairM

Structurally, pp-tokens in phases 3 and 4 are more clearly distinct from tokens in phase 7, but identifier still does double duty, as all I was comfortable doing was moving words around, not changing them.

It would be much easier to write a tiny paper or CWG issue to address concerns with identifier after this change though. I certainly run into identifier issues on proposals I am writing that touch on this space.
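To make the 'double duty' concrete, here is a minimal sketch, assuming the hypothetical macros `STR` and `CAT` and hypothetical function names. During phases 3 and 4 a keyword spelling is an ordinary identifier pp-token that can be stringized or assembled by token pasting; it is only treated as a keyword once pp-tokens are converted to tokens in phase 7.

```cpp
#include <string>

// Hypothetical helpers for illustration only.
#define STR(x) #x
#define CAT(a, b) a##b

// In phase 4, `while` is just an identifier pp-token with a spelling.
std::string keyword_spelling() { return STR(while); }

// Pasting `in` and `t` yields the identifier pp-token `int`; it is only
// recognised as the keyword `int` when phase 7 converts it to a token.
int pasted_keyword() {
    CAT(in, t) value = 42;
    return value;
}
```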

@AlisdairM

Closing this PR in favor of #7272 that does not try to mess with the text for phases of translation, but more freely reorganizes the core clauses.

@AlisdairM AlisdairM closed this Sep 30, 2024
@AlisdairM AlisdairM deleted the reorder_core_basics branch October 23, 2024 23:27