CWG2870 [lex.string] "" "" (adjacent ordinary string literals) are ill-formed #511

Open
Eisenwave opened this issue Mar 7, 2024 · 6 comments

Comments

@Eisenwave

Eisenwave commented Mar 7, 2024

Reference (section label): [lex.string]

Issue description

Subclause 5.13.5 [lex.string] paragraph 7 states that:

If two string-literals have the same encoding-prefix, the common encoding-prefix is that encoding-prefix.
If one string-literal has no encoding-prefix, the common encoding-prefix is that of the other string-literal.
Any other combinations are ill-formed.

In the case "" "", i.e. when neither string-literal has an encoding-prefix:

  • The first sentence in the quote cannot apply because neither string-literal has an encoding-prefix, and the encoding-prefix grammar rule cannot produce an empty prefix.
  • The second sentence in the quote cannot apply because "that of the other string-literal" refers to nothing: the other string-literal has no encoding-prefix either.

Therefore, this construct is ill-formed. Arguably, it is not possible to "fill in the blanks" and interpret the latter sentence as:

If at least one string-literal has no encoding-prefix the common encoding-prefix is that of the other string-literal, or none if neither has an encoding-prefix.

On another note, it is unusual that we talk about a "common encoding-prefix" even in the case where there is no encoding-prefix at all. The term "common encoding prefix" in this paragraph should not be formatted as a grammar rule.
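
For reference, the adjacent-literal cases discussed above can be checked against a compiler. The following is a minimal sketch, assuming a C++17 or later compiler. It shows how implementations treat these cases in practice, which is not necessarily what the quoted wording literally says. The name joined_size is only illustrative and does not come from the standard.

```cpp
#include <type_traits>

// Neither literal has an encoding-prefix: the case this issue is about.
// Implementations treat the result as a single ordinary string literal.
static_assert(sizeof("" "") == 1);  // only the terminating '\0'
static_assert(std::is_same_v<decltype("" ""), const char (&)[1]>);

// Both literals have the same encoding-prefix (sentence 1 of paragraph 7).
static_assert(std::is_same_v<decltype(L"a" L"b"), const wchar_t (&)[3]>);

// Exactly one literal has an encoding-prefix (sentence 2 of paragraph 7):
// the concatenated literal takes the prefix of the other literal.
static_assert(std::is_same_v<decltype(u"a" "b"), const char16_t (&)[3]>);
static_assert(std::is_same_v<decltype("a" U"b"), const char32_t (&)[3]>);

// Two different encoding-prefixes ("any other combinations"):
// ill-formed under the quoted wording, so it is left commented out.
// auto bad = u"a" U"b";

// Illustrative constant: "ab" "cd" behaves like "abcd" (4 chars plus '\0').
constexpr auto joined_size = sizeof("ab" "cd");
```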

Suggested resolution

Itemize subclause 5.13.5 [lex.string] paragraph 7, and update the result as follows (deletions are marked with <del>, insertions with <ins>):

The common <del>encoding-prefix</del> <ins>encoding prefix</ins> for a sequence of adjacent string-literals is determined pairwise as follows:

  • If two string-literals have the same encoding-prefix, the common <del>encoding-prefix</del> <ins>encoding prefix</ins> is that encoding-prefix.
  • <del>If</del> <ins>Otherwise, if</ins> one string-literal has no encoding-prefix, the common <del>encoding-prefix</del> <ins>encoding prefix</ins> is that of the other string-literal.
  • <ins>Otherwise, if neither string-literal has an encoding-prefix, there is no common encoding prefix.</ins>
  • <del>Any other combinations are</del> <ins>Otherwise, the program is</ins> ill-formed.
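
The pairwise rule in the suggested resolution can be modeled as a small function. This is only a sketch for illustration: the names Prefix, Common, and common_prefix are hypothetical, and the second bullet is read here as "exactly one string-literal has no encoding-prefix", which is an assumption about the intent rather than something the wording states.

```cpp
// Hypothetical model: None stands for "the literal has no encoding-prefix".
enum class Prefix { None, U8, U16, U32, Wide };

struct Common {
    bool well_formed;  // false models "the program is ill-formed"
    Prefix prefix;     // Prefix::None models "there is no common encoding prefix"
};

// Pairwise determination, following the four bullets in order.
constexpr Common common_prefix(Prefix a, Prefix b) {
    // Bullet 1: both literals have the same encoding-prefix.
    if (a != Prefix::None && a == b) return {true, a};
    // Bullet 2 (assumed to mean "exactly one" literal lacks a prefix).
    if (a == Prefix::None && b != Prefix::None) return {true, b};
    if (b == Prefix::None && a != Prefix::None) return {true, a};
    // Bullet 3: neither literal has an encoding-prefix.
    if (a == Prefix::None && b == Prefix::None) return {true, Prefix::None};
    // Bullet 4: two different encoding-prefixes.
    return {false, Prefix::None};
}

// The case from the issue title, "" "", is well-formed under this model.
static_assert(common_prefix(Prefix::None, Prefix::None).well_formed);
```

For a longer sequence of literals, the function would be applied left to right, feeding each result's prefix into the next pairwise step.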

Alternative resolution (not proposed, but worth considering)

In subclause 5.13.5 [lex.string] paragraph 7, replace all occurrences of encoding-prefix with "encoding prefix". This legitimizes applying paragraph 7, sentence 1 or 2 to the case "" "".

@Eisenwave
Author

Eisenwave commented Mar 7, 2024

It's been alleged that "" "" is valid because these string literals have a "none" prefix ([tab:lex.string.literal]), so the aforementioned sentence in paragraph 7 would apply here.

However, the wording specifically mentions encoding-prefix, not "encoding prefix", and the grammar rule never produces the empty word.

@frederick-vs-ja

Perhaps what we want to say is

<del>Any other combinations are ill-formed.</del> <ins>Otherwise, there is no common encoding-prefix and both string-literals shall have no encoding-prefix.</ins>

This looks somehow editorial to me...

@Eisenwave
Author

> This looks somehow editorial to me...

You're suggesting to turn "the program is ill-formed" into "the program is well-formed with this behavior ..."; how is that editorial?

@frederick-vs-ja

> > This looks somehow editorial to me...
>
> You're suggesting to turn "the program is ill-formed" into "the program is well-formed with this behavior ..."; how is that editorial?

The major issue seems to be that the second sentence may be treated as

If one string-literal has no encoding-prefix and the other has one, the common encoding-prefix is that of the other string-literal.

But my reading is that such treatment isn't or at least shouldn't be valid. The whole precondition should be "one string-literal has no encoding-prefix", so concatenation of adjacent ordinary string literals falls into this case and thus is well-formed.

The issue I see is that it's unclear whether "the common encoding-prefix is that of the other string-literal" can imply that "the common encoding-prefix does not exist if the other string-literal has no encoding-prefix". I believe such implication is intended, but I'm not sure whether it's valid.

@Eisenwave
Author

> I believe such implication is intended, but I'm not sure whether it's valid.

Well, yeah, that's the crux of the issue. I don't believe that such a reading is correct because saying "the encoding-prefix of the other literal" cannot be applied when the other has no encoding-prefix at all.
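
The two readings of the second sentence debated in this thread can be contrasted in a small model. This is a sketch under stated assumptions, not wording from the standard: Prefix, MaybePrefix, Outcome, lenient_reading, and strict_reading are hypothetical names. An absent prefix is modeled as an empty std::optional because the grammar rule has no empty production.

```cpp
#include <optional>

// The four productions of the encoding-prefix grammar rule.
enum class Prefix { U8, U16, U32, Wide };
// An empty optional models a string-literal without an encoding-prefix.
using MaybePrefix = std::optional<Prefix>;

struct Outcome {
    bool well_formed;    // false models "ill-formed"
    MaybePrefix common;  // empty models "no common encoding-prefix"
};

// Sentence 1 and the final sentence: both literals carry a prefix.
constexpr Outcome both_prefixed(Prefix a, Prefix b) {
    return a == b ? Outcome{true, a} : Outcome{false, std::nullopt};
}

// Lenient reading: "that of the other string-literal" may denote an
// absent prefix, so "" "" is well-formed with no common prefix.
constexpr Outcome lenient_reading(MaybePrefix a, MaybePrefix b) {
    if (a && b) return both_prefixed(*a, *b);
    return Outcome{true, a ? a : b};
}

// Strict reading: "that of the other string-literal" requires the other
// literal to have a prefix, so "" "" falls into "any other combinations".
constexpr Outcome strict_reading(MaybePrefix a, MaybePrefix b) {
    if (a && b) return both_prefixed(*a, *b);
    if (!a && !b) return Outcome{false, std::nullopt};
    return Outcome{true, a ? a : b};
}

// The two readings differ only when neither literal has a prefix.
static_assert(lenient_reading(std::nullopt, std::nullopt).well_formed);
static_assert(!strict_reading(std::nullopt, std::nullopt).well_formed);
```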

@jensmaurer changed the title from "[lex.string] "" "" (adjacent ordinary string literals) are ill-formed" to "CWG2870 [lex.string] "" "" (adjacent ordinary string literals) are ill-formed" on Mar 17, 2024
@jensmaurer
Member

CWG2870
