Remove \xXX char escapes from the language #12769

Closed
Valloric opened this issue Mar 8, 2014 · 7 comments

Comments

@Valloric
Contributor

Valloric commented Mar 8, 2014

\xXX is very misleading in Rust, since it actually works exactly like \u00XX (a Unicode codepoint) instead of the way it works in C, C++ and other languages (a raw byte). Example:

// this FAILS because left is [195, 191]
assert_eq!(bytes!("\xFF"), bytes!(255));
// this SUCCEEDS
assert_eq!(bytes!("\xFF"), bytes!("\u00FF"));

I understand the reasoning behind this (Rust strings are always UTF-8), but then \xXX shouldn't exist in the language at all. It brings nothing but confusion, and its functionality as implemented is the same as \u00XX.
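For readers on current Rust: the `bytes!` macro used above has since been removed, and Unicode escapes are now written `\u{FF}` rather than `\u00FF`. A minimal sketch, assuming those modern forms and a hypothetical helper `utf8_bytes`, showing where [195, 191] comes from: U+00FF lies above U+007F, so UTF-8 encodes it as two bytes rather than the single byte 0xFF.

```rust
/// Hypothetical helper: the UTF-8 encoding of a single `char`.
fn utf8_bytes(c: char) -> Vec<u8> {
    let mut buf = [0u8; 4]; // UTF-8 needs at most 4 bytes per char
    c.encode_utf8(&mut buf).as_bytes().to_vec()
}

fn main() {
    // U+00FF written as a Unicode escape; a &str is always stored as UTF-8.
    let s = "\u{FF}";
    assert_eq!(s.as_bytes(), &[195u8, 191]);
    assert_eq!(utf8_bytes('\u{FF}'), s.as_bytes());
    // It is not the single raw byte 0xFF that "\xFF" denotes in C.
    assert_ne!(s.as_bytes(), &[255u8]);
    println!("{:?}", s.as_bytes()); // prints [195, 191]
}
```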

@alexcrichton
Member

Closing; this was previously decided in #2800 to be working as intended.

@Valloric
Contributor Author

Valloric commented Mar 9, 2014

#2800 was about changing \xXX to mean a UTF-8 code unit instead of a Unicode codepoint. I agree with the conclusion of that issue that such a change is not a good idea, since it isn't useful.

But this issue is separate: it's about removing \xXX from the language entirely. \uXXXX has to exist because it covers a larger range of values and means "Unicode codepoint" in every language. \xXX in Rust has no use, because it's equivalent to \u00XX, yet it causes confusion because \xXX in C and C++ means a raw byte written in hex.

So in aggregate it's worse than useless; it's a net negative. It addresses no use case and provides no benefit, but it comes with a cost: the only thing it does successfully is confuse users coming from Rust's primary market, C and C++ developers.

I honestly can't see why it's being kept.

@lilyball
Contributor

I'm strongly in favor of a modified form of this, where we allow \xXX for ASCII characters but disallow it for non-ASCII. This was suggested recently in rust-lang/rfcs#69, and apparently was also suggested back in #2800.

Keeping \xXX for codepoints U+0000 through U+007F seems like a good idea, because in that range it means the same thing regardless of whether \xXX is interpreted as a codepoint or a code unit, and it's a convenient syntax for referring to ASCII characters. But interpreting \x80-\xFF as Unicode codepoints only serves to confuse. And not just C/C++ programmers: even though I've been using Rust for quite a while, the other day I caught myself using \x80 in a string and expecting to get the byte 0x80.
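The ASCII argument can be checked mechanically. A sketch in present-day Rust (the helper names `utf8_bytes` and `codepoint_matches_byte` are hypothetical, for illustration only) that compares, for every value 0x00 through 0xFF, the UTF-8 encoding of codepoint U+00XX against the single raw byte 0xXX:

```rust
/// Hypothetical helper: the UTF-8 encoding of a single `char`.
fn utf8_bytes(c: char) -> Vec<u8> {
    let mut buf = [0u8; 4]; // UTF-8 needs at most 4 bytes per char
    c.encode_utf8(&mut buf).as_bytes().to_vec()
}

/// Hypothetical helper: does reading 0xXX as codepoint U+00XX
/// produce exactly the raw byte 0xXX once UTF-8 encoded?
fn codepoint_matches_byte(x: u8) -> bool {
    // `char::from(u8)` maps 0xXX to U+00XX.
    let as_codepoint = utf8_bytes(char::from(x));
    as_codepoint == [x]
}

fn main() {
    // U+0000..=U+007F: UTF-8 uses one byte equal to the codepoint.
    assert!((0x00..=0x7Fu8).all(codepoint_matches_byte));
    // U+0080..=U+00FF: UTF-8 needs two bytes, so the two readings diverge.
    assert!((0x80..=0xFFu8).all(|x| !codepoint_matches_byte(x)));
    // For example, U+0080 becomes 0xC2 0x80, not 0x80.
    assert_eq!(utf8_bytes('\u{80}'), vec![0xC2, 0x80]);
}
```

Only the first 128 values agree, which is why limiting \xXX to that range removes the ambiguity while keeping the ASCII shorthand.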

Restricting it to ASCII characters now would also make the behavior of the proposed byte string literals (rust-lang/rfcs#69) more sensible, since there \x80 will definitely need to refer to the byte 0x80.
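A sketch of the distinction, written against byte string literals as they exist in Rust today (the `b"..."` syntax; the function names `raw_bytes` and `text_bytes` are hypothetical): in a byte string, \x80 is the raw byte 0x80, while the string escape for codepoint U+0080 yields two UTF-8 bytes.

```rust
/// Hypothetical helper: a byte string literal, where \x80 is one raw byte.
fn raw_bytes() -> &'static [u8] {
    b"\x80"
}

/// Hypothetical helper: a string literal, where U+0080 is stored as UTF-8.
fn text_bytes() -> &'static [u8] {
    "\u{80}".as_bytes()
}

fn main() {
    assert_eq!(raw_bytes(), &[0x80u8]);
    assert_eq!(text_bytes(), &[0xC2u8, 0x80]);
    // In the ASCII range both readings agree ('A' is 0x41 either way).
    assert_eq!(b"\x41", "\x41".as_bytes());
    println!("{:?} vs {:?}", raw_bytes(), text_bytes()); // prints [128] vs [194, 128]
}
```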

@lilyball lilyball reopened this May 20, 2014
@chris-morgan
Member

I am also in favour of restricting it in non-byte-string literals.

@emberian
Member

emberian commented Jun 3, 2014

This is surprising; I assumed this was only a byte literal. I agree with @kballard here.

@Valloric
Contributor Author

Valloric commented Jun 3, 2014

As I mentioned on rust-lang/rfcs#69, I agree with @kballard's proposal.

@rust-highfive
Collaborator

This issue has been moved to the RFCs repo: rust-lang/rfcs#312
