49 changes: 27 additions & 22 deletions source/lex.tex
@@ -116,8 +116,8 @@
to the file.

\item The source file is decomposed into preprocessing
-tokens\iref{lex.pptoken} and sequences of whitespace characters
-(including comments). A source file shall not end in a partial
+tokens\iref{lex.pptoken} and whitespace\iref{lex.comment}.
+A source file shall not end in a partial
preprocessing token or in a partial comment.
\begin{footnote}
A partial preprocessing
@@ -129,10 +129,6 @@
would arise from a source file ending with an unclosed \tcode{/*}
comment.
\end{footnote}
-Each comment\iref{lex.comment} is replaced by one \unicode{0020}{space} character. New-line characters are
-retained. Whether each nonempty sequence of whitespace characters other
-than new-line is retained or replaced by one \unicode{0020}{space} character is
-unspecified.
As characters from the source file are consumed
to form the next preprocessing token
(i.e., not being consumed as part of a comment or other forms of whitespace),
@@ -518,13 +514,38 @@
\indextext{comment!\tcode{//}}%
The characters \tcode{//} start a comment, which terminates immediately before the
next new-line character.
+Each comment is replaced by one \unicode{0020}{space} character;
+new-line characters are retained.
Comment on lines +517 to +518 (Contributor):

There should be no retention of new-line characters within C-style comments. Additionally, retention of new-line characters following the termination of a // comment does not require mention. The termination point of a comment of either form is made clear above. My understanding of the corresponding sentence in the original text is that it is meant to be bound to the next sentence in the original text (not the previous sentence as this rendition assumes).

Suggested change
-Each comment is replaced by one \unicode{0020}{space} character;
-new-line characters are retained.
+Each comment is replaced by one \unicode{0020}{space} character.

\begin{note}
The comment characters \tcode{//}, \tcode{/*},
and \tcode{*/} have no special meaning within a \tcode{//} comment and
are treated just like other characters. Similarly, the comment
characters \tcode{//} and \tcode{/*} have no special meaning within a
\tcode{/*} comment.
\end{note}

+\indextext{whitespace}%
+\pnum
+Preprocessing tokens can be separated by whitespace;
+this consists of comments, or whitespace characters
+(\unicode{0020}{space},
+\unicode{0009}{character tabulation},
+new-line,
+\unicode{000b}{line tabulation}, and
+\unicode{000c}{form feed}), or both.
+\begin{note}
+In certain circumstances during translation phase 4, as described in \ref{cpp},
+whitespace (or the absence thereof) serves as more than
+preprocessing token separation.
+Whitespace can appear within a preprocessing token only as part of
+a \grammarterm{header-name} or
+between the quotation characters in a character literal or string literal.
Comment on lines +540 to +542 (Contributor):

My hope is that we would strike this from the note in the future. I think we want to move to a model, like with "UCNs inside raw strings", where "whitespace" (as we define it) does not appear except between preprocessing tokens.

+\end{note}
+
+\pnum
+Whether each nonempty sequence of whitespace characters other than new-line
+is retained or replaced by one \unicode{0020}{space} character is unspecified.
Comment on lines +546 to +547 (Contributor):

For me, this is too far out of context to be worded this way. I would like a statement that such replacement may be done by an implementation before stating that whether it is done (for any specific instance) is unspecified.

Additionally, with this presentation, it is unclear to me when this replacement occurs (which affects whether comments act as separate U+0020 characters and whether both ends of a phase 1 line splice can remain separate U+0020 characters that replaced other whitespace characters). The behaviour can be observed if whitespace is retained by an implementation when forming a header-name token from < h-pp-tokens >.


\indextext{comment|)}

\rSec1[lex.pptoken]{Preprocessing tokens}
@@ -562,22 +583,6 @@
If a \unicode{0027}{apostrophe}, a \unicode{0022}{quotation mark},
or any character not in the basic character set
matches the last category, the program is ill-formed.
-Preprocessing tokens can be separated by
-\indextext{whitespace}%
-whitespace;
-\indextext{comment}%
-this consists of comments\iref{lex.comment}, or whitespace characters
-(\unicode{0020}{space},
-\unicode{0009}{character tabulation},
-new-line,
-\unicode{000b}{line tabulation}, and
-\unicode{000c}{form feed}), or both.
-As described in \ref{cpp}, in certain
-circumstances during translation phase 4, whitespace (or the absence
-thereof) serves as more than preprocessing token separation. Whitespace
-can appear within a preprocessing token only as part of a header name or
-between the quotation characters in a character literal or
-string literal.

\pnum
Each preprocessing token that is converted to a token\iref{lex.token}