[lex.phase] Move normative handling of comments #8594
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -116,8 +116,8 @@ | |
| to the file. | ||
|
|
||
| \item The source file is decomposed into preprocessing | ||
| tokens\iref{lex.pptoken} and sequences of whitespace characters | ||
| (including comments). A source file shall not end in a partial | ||
| tokens\iref{lex.pptoken} and whitespace\iref{lex.comment}. | ||
| A source file shall not end in a partial | ||
| preprocessing token or in a partial comment. | ||
| \begin{footnote} | ||
| A partial preprocessing | ||
|
|
@@ -129,10 +129,6 @@ | |
| would arise from a source file ending with an unclosed \tcode{/*} | ||
| comment. | ||
| \end{footnote} | ||
| Each comment\iref{lex.comment} is replaced by one \unicode{0020}{space} character. New-line characters are | ||
| retained. Whether each nonempty sequence of whitespace characters other | ||
| than new-line is retained or replaced by one \unicode{0020}{space} character is | ||
| unspecified. | ||
| As characters from the source file are consumed | ||
| to form the next preprocessing token | ||
| (i.e., not being consumed as part of a comment or other forms of whitespace), | ||
|
|
@@ -518,13 +514,38 @@ | |
| \indextext{comment!\tcode{//}}% | ||
| The characters \tcode{//} start a comment, which terminates immediately before the | ||
| next new-line character. | ||
| Each comment is replaced by one \unicode{0020}{space} character; | ||
| new-line characters are retained. | ||
| \begin{note} | ||
| The comment characters \tcode{//}, \tcode{/*}, | ||
| and \tcode{*/} have no special meaning within a \tcode{//} comment and | ||
| are treated just like other characters. Similarly, the comment | ||
| characters \tcode{//} and \tcode{/*} have no special meaning within a | ||
| \tcode{/*} comment. | ||
| \end{note} | ||
|
|
||
| \indextext{whitespace}% | ||
| \pnum | ||
| Preprocessing tokens can be separated by whitespace; | ||
| this consists of comments, or whitespace characters | ||
| (\unicode{0020}{space}, | ||
| \unicode{0009}{character tabulation}, | ||
| new-line, | ||
| \unicode{000b}{line tabulation}, and | ||
| \unicode{000c}{form feed}), or both. | ||
| \begin{note} | ||
| In certain circumstances during translation phase 4, as described in \ref{cpp}, | ||
| whitespace (or the absence thereof) serves as more than | ||
| preprocessing token separation. | ||
| Whitespace can appear within a preprocessing token only as part of | ||
| a \grammarterm{header-name} or | ||
| between the quotation characters in a character literal or string literal. | ||
|
Comment on lines +540 to +542
Contributor
My hope is that we would strike this from the note in the future. I think we want to move to a model, like with "UCNs inside raw strings", where "whitespace" (as we define it) does not appear except between preprocessing tokens.
||
| \end{note} | ||
|
|
||
| \pnum | ||
| Whether each nonempty sequence of whitespace characters other than new-line | ||
| is retained or replaced by one \unicode{0020}{space} character is unspecified. | ||
|
Comment on lines +546 to +547
Contributor
For me, this is too far out of context to be worded this way. I would like a statement that such replacement may be done by an implementation before stating that whether it is done (for any specific instance) is unspecified. Additionally, with this presentation, it is unclear to me when this replacement occurs (which affects whether comments act as separate U+0020 characters and whether both ends of a phase 1 line splice can remain separate U+0020 characters that replaced other whitespace characters). The behaviour can be observed if whitespace is retained by an implementation when forming a header-name token from
||
|
|
||
| \indextext{comment|)} | ||
|
|
||
| \rSec1[lex.pptoken]{Preprocessing tokens} | ||
|
|
@@ -562,22 +583,6 @@ | |
| If a \unicode{0027}{apostrophe}, a \unicode{0022}{quotation mark}, | ||
| or any character not in the basic character set | ||
| matches the last category, the program is ill-formed. | ||
| Preprocessing tokens can be separated by | ||
| \indextext{whitespace}% | ||
| whitespace; | ||
| \indextext{comment}% | ||
| this consists of comments\iref{lex.comment}, or whitespace characters | ||
| (\unicode{0020}{space}, | ||
| \unicode{0009}{character tabulation}, | ||
| new-line, | ||
| \unicode{000b}{line tabulation}, and | ||
| \unicode{000c}{form feed}), or both. | ||
| As described in \ref{cpp}, in certain | ||
| circumstances during translation phase 4, whitespace (or the absence | ||
| thereof) serves as more than preprocessing token separation. Whitespace | ||
| can appear within a preprocessing token only as part of a header name or | ||
| between the quotation characters in a character literal or | ||
| string literal. | ||
|
|
||
| \pnum | ||
| Each preprocessing token that is converted to a token\iref{lex.token} | ||
|
|
||
There should be no retention of new-line characters within C-style comments. Additionally, retention of new-line characters following the termination of a `//` comment does not require mention. The termination point of a comment of either form is made clear above. My understanding of the corresponding sentence in the original text is that it is meant to be bound to the next sentence in the original text (not the previous sentence as this rendition assumes).
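
For readers following the thread, the comment-handling behaviour under discussion can be sketched in C++. This is a minimal illustration and not part of the PR: `STR`, `SUM`, `joined`, `nested`, and `check_examples` are hypothetical names, and the results assume an implementation that replaces each comment, including any new-line inside a `/* */` comment, by one space before preprocessing directives are executed.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical helper macro: stringizes its argument.
#define STR(x) #x

// The block comment spans a new-line, yet the directive does not end there,
// because the whole comment (new-line included) is replaced by one space
// before the directive is processed.
#define SUM 1 + /* this comment
   continues on the next source line */ 2

// The comment separates the preprocessing tokens `a` and `b`; stringizing
// turns the whitespace between them into a single space.
constexpr std::string_view joined = STR(a/* comment */b);

// Comment delimiters inside a comment are ordinary characters.
constexpr int nested() {
    int x = 1; // a /* here does not open a block comment
    x += 1;    // so this line is still compiled */
    /* a // here does not hide the terminator */ x += 1;
    return x;
}

// Assumed checks; they only restate the expectations described above.
void check_examples() {
    assert(SUM == 3);
    assert(joined == "a b");
    assert(nested() == 3);
}
```

Calling `check_examples()` from `main` exercises all three cases. The sketch does not attempt to show the unspecified retention of other whitespace sequences, since stringizing collapses them to one space either way.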