CWG2870 [lex.string] `"" ""` (adjacent ordinary string literals) are ill-formed #511

Eisenwave · 2024-03-07T17:47:23Z

Reference (section label): [lex.string]

Issue description

Subclause 5.13.5 [lex.string] paragraph 7 states that:

If two string-literals have the same encoding-prefix, the common encoding-prefix is that encoding-prefix.
If one string-literal has no encoding-prefix, the common encoding-prefix is that of the other string-literal.
Any other combinations are ill-formed.

In the case "" "", i.e. when neither string-literal has an encoding-prefix:

The first sentence in the quote cannot apply because neither string-literal has an encoding-prefix, and encoding-prefix cannot be empty.
The second sentence in the quote cannot apply because the other string-literal has no encoding-prefix.

Therefore, this construct is ill-formed. Arguably, it is not possible to "fill in the blanks" and interpret the latter sentence as:

If at least one string-literal has no encoding-prefix the common encoding-prefix is that of the other string-literal, or none if neither has an encoding-prefix.

On another note, it is unusual that we talk about a "common encoding-prefix", even in the case where there is no encoding-prefix at all. The common prefix in this paragraph should not be formatted as a grammar rule.
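For concreteness, the three situations covered by the quoted sentences can be written down as compile-time checks. This is only an illustrative sketch, not part of the wording discussion: it assumes a C++11 or later compiler, `joined_length` and `demo` are made-up names, and the assertions record how implementations behave today (adjacent unprefixed literals are accepted), not what the strict reading above would require.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Both string-literals have the same encoding-prefix: the common one is L.
static_assert(std::is_same<decltype(L"a" L"b"), const wchar_t(&)[3]>::value,
              "L + L is a wide string literal");

// Only one string-literal has an encoding-prefix: the result adopts it.
static_assert(std::is_same<decltype(u"a" "b"), const char16_t(&)[3]>::value,
              "u + none is a UTF-16 string literal");

// Neither string-literal has an encoding-prefix: the case in this issue.
static_assert(std::is_same<decltype("" ""), const char(&)[1]>::value,
              "none + none is an ordinary string literal");

// Hypothetical helper: number of characters in the concatenated literal.
inline std::size_t joined_length() {
    return sizeof("ab" "cd") - 1;  // drop the terminating null character
}

// Runtime view of the same concatenation.
inline void demo() {
    assert(std::strcmp("ab" "cd", "abcd") == 0);
    assert(joined_length() == 4);
}
```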

Suggested resolution

Itemize subclause 5.13.5 [lex.string] paragraph 7, and update the result as follows:

The common ~~encoding-prefix~~ encoding prefix for a sequence of adjacent string-literals is determined pairwise as follows:

If two string-literals have the same encoding-prefix, the common ~~encoding-prefix~~ encoding prefix is that encoding-prefix.

~~If~~ Otherwise, if one string-literal has no encoding-prefix, the common ~~encoding-prefix~~ encoding prefix is that of the other string-literal.

Otherwise, if neither string-literal has an encoding-prefix, there is no common encoding prefix.

~~Any other combinations are~~ Otherwise, the program is ill-formed.
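The itemized rule can be modeled as a small function, which makes it easy to see that every pair of inputs lands in exactly one bullet. This is a rough sketch under assumptions: `Prefix`, `Common`, `common_prefix` and `demo` are hypothetical names, an empty `std::optional` stands for a string-literal without an encoding-prefix, the second bullet is modeled as "exactly one has a prefix", and C++17 is required.

```cpp
#include <cassert>
#include <optional>

// Hypothetical model of the four spellings of encoding-prefix.
enum class Prefix { u8, u, U, L };

// Outcome of combining two adjacent string-literals.
struct Common {
    bool ill_formed;               // true if the pair is rejected
    std::optional<Prefix> prefix;  // empty: no common encoding prefix
};

// An empty optional models a string-literal without an encoding-prefix.
inline Common common_prefix(std::optional<Prefix> lhs,
                            std::optional<Prefix> rhs) {
    if (lhs && rhs && *lhs == *rhs)          // bullet 1: same prefix
        return {false, lhs};
    if (lhs.has_value() != rhs.has_value())  // bullet 2: exactly one prefix
        return {false, lhs ? lhs : rhs};
    if (!lhs && !rhs)                        // bullet 3: no prefix at all
        return {false, std::nullopt};
    return {true, std::nullopt};             // bullet 4: mismatched prefixes
}

// Usage sketch: the case from this issue is accepted and has no prefix.
inline void demo() {
    const Common r = common_prefix(std::nullopt, std::nullopt);
    assert(!r.ill_formed && !r.prefix.has_value());
}
```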

Alternative resolution (not proposed, but worth considering)

In subclause 5.13.5 [lex.string] paragraph 7, replace all occurrences of encoding-prefix with "encoding prefix". This legitimizes applying paragraph 7, sentence 1 or 2 to the case "" "".


Eisenwave · 2024-03-07T19:16:16Z

It's been alleged that "" "" is valid because these string literals have a "none" prefix ([tab:lex.string.literal]), so the aforementioned sentence in paragraph 7 would apply here.

However, the wording specifically mentions encoding-prefix, not "encoding prefix", and the grammar rule never produces the empty word.
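To illustrate the point about the grammar: encoding-prefix is "one of" `u8`, `u`, `U`, `L`, so a recognizer for that production rejects the empty word. A minimal sketch, assuming C++17; `is_encoding_prefix` and `demo` are hypothetical names.

```cpp
#include <cassert>
#include <string_view>

// encoding-prefix: one of u8 u U L (the four spellings the grammar lists).
constexpr bool is_encoding_prefix(std::string_view s) {
    return s == "u8" || s == "u" || s == "U" || s == "L";
}

// The empty word is not among the alternatives, so "no prefix" is not an
// encoding-prefix in the grammatical sense.
static_assert(!is_encoding_prefix(""), "encoding-prefix is never empty");
static_assert(is_encoding_prefix("u8"), "u8 is an encoding-prefix");

// Runtime usage sketch.
inline void demo() {
    assert(is_encoding_prefix("L"));
    assert(!is_encoding_prefix(""));
}
```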

frederick-vs-ja · 2024-03-11T02:25:21Z

Perhaps what we want to say is

~~Any other combinations are ill-formed.~~ Otherwise, there is no common encoding-prefix and both string-literals shall have no encoding-prefix.

This looks somehow editorial to me...

Eisenwave · 2024-03-11T07:53:40Z

> This looks somehow editorial to me...

You're suggesting to turn "the program is ill-formed" into "the program is well-formed with this behavior ..."; how is that editorial?

frederick-vs-ja · 2024-03-12T01:20:34Z

> > This looks somehow editorial to me...
>
> You're suggesting to turn "the program is ill-formed" into "the program is well-formed with this behavior ..."; how is that editorial?

The major issue seems to be that the second sentence may be treated as

If one string-literal has no encoding-prefix and the other has one, the common encoding-prefix is that of the other string-literal.

But my reading is that such treatment isn't or at least shouldn't be valid. The whole precondition should be "one string-literal has no encoding-prefix", so concatenation of adjacent ordinary string literals falls into this case and thus is well-formed.

The issue I see is that it's unclear whether "the common encoding-prefix is that of the other string-literal" can imply that "the common encoding-prefix does not exist if the other string-literal has no encoding-prefix". I believe such implication is intended, but I'm not sure whether it's valid.

Eisenwave · 2024-03-12T20:31:50Z

> I believe such implication is intended, but I'm not sure whether it's valid.

Well, yeah, that's the crux of the issue. I don't believe that such a reading is correct because saying "the encoding-prefix of the other literal" cannot be applied when the other has no encoding-prefix at all.
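The two readings of the current paragraph 7 can be put side by side. The sketch below is only a model of the disagreement, under assumptions: all names (`Prefix`, `Verdict`, `MaybePrefix`, `lenient_reading`, `strict_reading`, `demo`) are hypothetical, an empty optional stands for a string-literal without an encoding-prefix, and C++17 is required. The readings differ only for the pair where neither literal has a prefix.

```cpp
#include <cassert>
#include <optional>

enum class Prefix { u8, u, U, L };          // hypothetical model
enum class Verdict { well_formed, ill_formed };
using MaybePrefix = std::optional<Prefix>;  // empty: no encoding-prefix

// Reading argued by frederick-vs-ja: the whole precondition of sentence 2
// is "one string-literal has no encoding-prefix"; "that of the other" may
// turn out to be nothing.
inline Verdict lenient_reading(MaybePrefix a, MaybePrefix b) {
    if (a && b && *a == *b) return Verdict::well_formed;  // sentence 1
    if (!a || !b) return Verdict::well_formed;            // sentence 2
    return Verdict::ill_formed;                           // sentence 3
}

// Reading argued by Eisenwave: "that of the other" only makes sense when
// the other string-literal actually has an encoding-prefix.
inline Verdict strict_reading(MaybePrefix a, MaybePrefix b) {
    if (a && b && *a == *b) return Verdict::well_formed;  // sentence 1
    if (a.has_value() != b.has_value())                   // sentence 2
        return Verdict::well_formed;
    return Verdict::ill_formed;                           // sentence 3
}

// Usage sketch: the two readings disagree on the unprefixed pair.
inline void demo() {
    assert(lenient_reading(std::nullopt, std::nullopt) == Verdict::well_formed);
    assert(strict_reading(std::nullopt, std::nullopt) == Verdict::ill_formed);
}
```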

jensmaurer · 2024-03-17T12:11:38Z

CWG2870

jensmaurer changed the title ~~[lex.string] "" "" (adjacent ordinary string literals) are ill-formed~~ CWG2870 [lex.string] "" "" (adjacent ordinary string literals) are ill-formed Mar 17, 2024
