-
Notifications
You must be signed in to change notification settings - Fork 804
[lex.phase] Move normative handling of comments] #8594
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[lex.phase] Move normative handling of comments] #8594
Conversation
The specification of phase 3 of translation says too much about how things are done rather than what is done. Per suggestions on PR cplusplus#8414, move the precise handling of comments to [lex.comment]. As this move relies on the specificatin for whitespace, move the definition of whitespace from [lex.pptoken] to [lex.comment] too.]
hubert-reinterpretcast
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the proposed changes, the subclause title for [lex.comment] should probably be expanded.
| Each comment is replaced by one \unicode{0020}{space} character; | ||
| new-line characters are retained. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should be no retention of new-line characters within C-style comments. Additionally, retention of new-line characters following the termination of a // comment does not require mention. The termination point of a comment of either form is made clear above. My understanding of the corresponding sentence in the original text is that it is meant to be bound to the next sentence in the original text (not the previous sentence as this rendition assumes).
| Each comment is replaced by one \unicode{0020}{space} character; | |
| new-line characters are retained. | |
| Each comment is replaced by one \unicode{0020}{space} character. |
| Whitespace can appear within a preprocessing token only as part of | ||
| a \grammarterm{header-name} or | ||
| between the quotation characters in a character literal or string literal. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My hope is that we would strike this from the note in the future. I think we want to move to a model, like with "UCNs inside raw strings", where "whitespace" (as we define it) does not appear except between preprocessing tokens.
| Whether each nonempty sequence of whitespace characters other than new-line | ||
| is retained or replaced by one \unicode{0020}{space} character is unspecified. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For me, this is too far out of context to be worded this way. I would like a statement that such replacement may be done by an implementation before stating that whether it is done (for any specific instance) is unspecified.
Additionally, with this presentation, it is unclear to me when this replacement occurs (which affects whether comments act as separate U+0020 characters and whether both ends of a phase 1 line splice can remain separate U+0020 characters that replaced other whitespace characters). The behaviour can be observed if whitespace is retained by an implementation when forming a header-name token from < h-pp-tokens >.
The specification of phase 3 of translation says too much about how things are done rather than what is done. Per suggestions on PR #8414, move the precise handling of comments to [lex.comment]. As this move relies on the specification for whitespace, move the definition of whitespace from [lex.pptoken] to [lex.comment] too.
Note that this PR would obviate #8416, which turns the text moved from [lex.pptoken] into a comment, precisely as the moved text in this PR.