Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

string literals: disallow backslash before non-escapes #21284

Closed
StefanKarpinski opened this issue Apr 5, 2017 · 10 comments
Closed

string literals: disallow backslash before non-escapes #21284

StefanKarpinski opened this issue Apr 5, 2017 · 10 comments
Labels
breaking This change will break code decision A decision on this change is needed help wanted Indicates that a maintainer wants help on an issue or pull request parser Language parsing and surface syntax strings "Strings!"
Milestone

Comments

@StefanKarpinski
Copy link
Sponsor Member

StefanKarpinski commented Apr 5, 2017

See: http://stackoverflow.com/questions/43114125/how-to-replace-string-literals-back-front-slashes-in-julia. There are a couple of issues here:

  1. Since Python behaves differently here for backslash followed by non-escapes, allowing these sequences can cause confusion for Python programmers. It would be better to give an error when a user writes something like "ab\c" telling them to write "ab\\c" instead if they wanted a literal backslash or to write "abc" if they don't want a literal backslash.

  2. If "ab\c" is an error, that leaves the syntax open in the future for new meanings, giving us room for extensibility. This does not sacrifice any expressiveness since writing \c for non-control characters is otherwise completely useless. For example, we could then introduce something like the syntax "foo\(bar)baz" that has previously been proposed for interpolation (this is Swift's interpolation syntax) without fear of breaking anyone's code.

@stevengj
Copy link
Member

stevengj commented Apr 5, 2017

Sounds good to me.

@stevengj stevengj added strings "Strings!" breaking This change will break code decision A decision on this change is needed labels Apr 5, 2017
@JeffBezanson
Copy link
Sponsor Member

Sounds ok to me too.

@stevengj
Copy link
Member

stevengj commented Apr 5, 2017

Disallowing unrecognized escapes seems pretty common: C, Java, C#, R, ...?

@StefanKarpinski
Copy link
Sponsor Member Author

I was under the impression that C allowed unrecognized escapes, but perhaps I'm misremembering.

@StefanKarpinski
Copy link
Sponsor Member Author

I was misremembering:

$ echo 'int main() { puts("ab\\c"); return 0; }' | gcc -xc -
<stdin>:1:14: warning: implicit declaration of function 'puts' is invalid in C99
      [-Wimplicit-function-declaration]
int main() { puts("ab\c"); return 0; }
             ^
<stdin>:1:22: warning: unknown escape sequence '\c' [-Wunknown-escape-sequence]
int main() { puts("ab\c"); return 0; }
                     ^~
2 warnings generated.

$ julia -e 'println(repr(readstring(`./a.out`)))'
"abc\n"

Clang warns about unknown escapes in C string literals and then ignores the backslash. We should definitely change this.

@stevengj
Copy link
Member

stevengj commented Apr 6, 2017

@StefanKarpinski, according to C99 (ISO/IEC 9899:1999), section 6.4.4.4, "If any other character follows a backslash, the result is not a token and a diagnostic is required." In 6.11.4, it says "Lowercase letters as escape sequences are reserved for future standardization."

@StefanKarpinski
Copy link
Sponsor Member Author

Yeah, I saw that but I figured I should also try it since compilers don't always follow specs :)

@stevengj
Copy link
Member

stevengj commented Apr 11, 2017

If I understand the parser code correctly, the processing of escape sequences is actually done in the flisp parser by this read_string function, which in turn calls this utf8.c function.

I think it should be sufficient to change this line to something like:

char esc = read_escape_control_char((char)c);
if (esc == (char)c && esc != '\\' && esc != '"' && esc != '\'') {
    free(buf);
    lerror(fl_ctx, fl_ctx->ParseError, "read: invalid escape sequence");
}
buf[i++] = esc;

(utf8.c also includes a redundant u8_read_escape_sequence function that seems to be unused, and which should maybe be deleted.)

@StefanKarpinski StefanKarpinski added the help wanted Indicates that a maintainer wants help on an issue or pull request label Apr 12, 2017
@JeffBezanson JeffBezanson added this to the 1.0 milestone Apr 18, 2017
@JeffBezanson JeffBezanson added the parser Language parsing and surface syntax label May 10, 2017
@StefanKarpinski
Copy link
Sponsor Member Author

@stevengj, would you be willing to make this change? Everyone on the triage call is in favor.

@stevengj
Copy link
Member

Looking into it now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking This change will break code decision A decision on this change is needed help wanted Indicates that a maintainer wants help on an issue or pull request parser Language parsing and surface syntax strings "Strings!"
Projects
None yet
Development

No branches or pull requests

3 participants