[lex.phase] Move normative handling of comments #8594

AlisdairM · 2025-12-09T11:53:17Z

The specification of phase 3 of translation says too much about how things are done rather than what is done. Per suggestions on PR #8414, move the precise handling of comments to [lex.comment]. As this move relies on the specification for whitespace, move the definition of whitespace from [lex.pptoken] to [lex.comment] too.

Note that this PR would obviate #8416, which turns the text moved from [lex.pptoken] into a comment, exactly as this PR does with the moved text.


hubert-reinterpretcast

With the proposed changes, the subclause title for [lex.comment] should probably be expanded.

hubert-reinterpretcast · 2025-12-11T03:24:57Z

source/lex.tex

+Each comment is replaced by one \unicode{0020}{space} character;
+new-line characters are retained.


There should be no retention of new-line characters within C-style comments. Additionally, retention of new-line characters following the termination of a // comment does not require mention. The termination point of a comment of either form is made clear above. My understanding of the corresponding sentence in the original text is that it is meant to be bound to the next sentence in the original text (not the previous sentence as this rendition assumes).

Suggested change

-Each comment is replaced by one \unicode{0020}{space} character;
-new-line characters are retained.
+Each comment is replaced by one \unicode{0020}{space} character.
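A small illustration of the point about C-style comments, offered as a sketch rather than as wording guidance: if a new-line inside a `/* */` comment were retained, the `#define` directive below would end on its first line. The names `STR`, `one_line`, and `VALUE` are invented for this example.

```cpp
#include <cassert>
#include <string_view>

#define STR(x) #x

// The comment separates `a` and `b`; stringification shows one space there.
constexpr std::string_view one_line = STR(a/* comment */b);

// The directive continues past the new-line inside the comment, so the
// replacement list of VALUE is `42` rather than ending on the first line.
#define VALUE /* a comment that
spans two lines */ 42

static_assert(one_line == "a b");
static_assert(VALUE == 42);
```

Both assertions are expected to hold on a conforming implementation, since the whole comment, including its embedded new-line, is treated as a single space.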

hubert-reinterpretcast · 2025-12-11T03:34:19Z

source/lex.tex

+Whitespace can appear within a preprocessing token only as part of
+a \grammarterm{header-name} or
+between the quotation characters in a character literal or string literal.


My hope is that we would strike this from the note in the future. I think we want to move to a model, like with "UCNs inside raw strings", where "whitespace" (as we define it) does not appear except between preprocessing tokens.
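For readers following along, a minimal sketch of the distinction the note draws between whitespace inside a string literal and whitespace between preprocessing tokens. The names `STR`, `inside`, `between`, and `quoted` are made up for illustration.

```cpp
#include <cassert>
#include <string_view>

#define STR(x) #x

// Inside a string literal, both spaces are part of the token and are kept.
constexpr std::string_view inside = "a  b";

// Between preprocessing tokens, stringification yields a single space.
constexpr std::string_view between = STR(a     b);

// Stringifying a string literal keeps the spaces between its quotes.
constexpr std::string_view quoted = STR("a  b");

static_assert(inside.size() == 4);
static_assert(between == "a b");
static_assert(quoted == "\"a  b\"");
```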

hubert-reinterpretcast · 2025-12-11T04:02:37Z

source/lex.tex

+Whether each nonempty sequence of whitespace characters other than new-line
+is retained or replaced by one \unicode{0020}{space} character is unspecified.


For me, this is too far out of context to be worded this way. I would like a statement that such replacement may be done by an implementation before stating that whether it is done (for any specific instance) is unspecified.

Additionally, with this presentation, it is unclear to me when this replacement occurs (which affects whether comments act as separate U+0020 characters and whether both ends of a phase 1 line splice can remain separate U+0020 characters that replaced other whitespace characters). The behaviour can be observed if whitespace is retained by an implementation when forming a header-name token from < h-pp-tokens >.
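To make the two permitted outcomes concrete, here is a rough model of the "replace" choice only, written as an ordinary function rather than as anything an implementation actually does. The helper names are hypothetical, and the whitespace set (space, horizontal tab, vertical tab, form feed) is an assumption about the draft's definition. An implementation that retains whitespace would simply return the text unchanged.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Assumed set of whitespace characters other than new-line.
inline bool is_non_newline_whitespace(char c) {
    return c == ' ' || c == '\t' || c == '\v' || c == '\f';
}

// Models the "replace" choice: each nonempty run of whitespace other than
// new-line becomes one space; new-line characters pass through unchanged.
inline std::string collapse_whitespace(std::string_view text) {
    std::string out;
    bool in_run = false;
    for (char c : text) {
        if (is_non_newline_whitespace(c)) {
            if (!in_run) out.push_back(' ');
            in_run = true;
        } else {
            out.push_back(c);
            in_run = false;
        }
    }
    return out;
}
```

Under this model, `collapse_whitespace("<\t a.h  >")` gives `"< a.h >"`, whereas a retaining implementation would keep `"<\t a.h  >"`. The difference would be visible only where the spelling of the whitespace matters, such as when forming a header-name from `< h-pp-tokens >`. The model does not answer the timing question raised above, namely whether the space that replaces a comment merges with adjacent whitespace.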

eisenwave added the P3-Other Triaged issue not in P1 or P2 label Dec 9, 2025

hubert-reinterpretcast reviewed Dec 11, 2025
