Skip to content

Conversation

@AlisdairM
Copy link
Contributor

The specification of phase 3 of translation says too much about how things are done rather than what is done. Per suggestions on PR #8414, move the precise handling of comments to [lex.comment]. As this move relies on the specification for whitespace, move the definition of whitespace from [lex.pptoken] to [lex.comment] too.

Note that this PR would obviate #8416, which turns the text moved from [lex.pptoken] into a comment, precisely as the moved text in this PR.

The specification of phase 3 of translation says too much about
how things are done rather than what is done.  Per suggestions
on PR cplusplus#8414, move the precise handling of comments to [lex.comment].
As this move relies on the specificatin for whitespace, move the
definition of whitespace from [lex.pptoken] to [lex.comment] too.]
@eisenwave eisenwave added the P3-Other Triaged issue not in P1 or P2 label Dec 9, 2025
Copy link
Contributor

@hubert-reinterpretcast hubert-reinterpretcast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the proposed changes, the subclause title for [lex.comment] should probably be expanded.

Comment on lines +517 to +518
Each comment is replaced by one \unicode{0020}{space} character;
new-line characters are retained.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be no retention of new-line characters within C-style comments. Additionally, retention of new-line characters following the termination of a // comment does not require mention. The termination point of a comment of either form is made clear above. My understanding of the corresponding sentence in the original text is that it is meant to be bound to the next sentence in the original text (not the previous sentence as this rendition assumes).

Suggested change
Each comment is replaced by one \unicode{0020}{space} character;
new-line characters are retained.
Each comment is replaced by one \unicode{0020}{space} character.

Comment on lines +540 to +542
Whitespace can appear within a preprocessing token only as part of
a \grammarterm{header-name} or
between the quotation characters in a character literal or string literal.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My hope is that we would strike this from the note in the future. I think we want to move to a model, like with "UCNs inside raw strings", where "whitespace" (as we define it) does not appear except between preprocessing tokens.

Comment on lines +546 to +547
Whether each nonempty sequence of whitespace characters other than new-line
is retained or replaced by one \unicode{0020}{space} character is unspecified.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me, this is too far out of context to be worded this way. I would like a statement that such replacement may be done by an implementation before stating that whether it is done (for any specific instance) is unspecified.

Additionally, with this presentation, it is unclear to me when this replacement occurs (which affects whether comments act as separate U+0020 characters and whether both ends of a phase 1 line splice can remain separate U+0020 characters that replaced other whitespace characters). The behaviour can be observed if whitespace is retained by an implementation when forming a header-name token from < h-pp-tokens >.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

P3-Other Triaged issue not in P1 or P2

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants