Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set charset in .editorconfig #20541

Merged
merged 5 commits into from Jun 29, 2017
Merged

Set charset in .editorconfig #20541

merged 5 commits into from Jun 29, 2017

Conversation

sharwell
Copy link
Member

@sharwell sharwell commented Jun 29, 2017

  • Set charset = utf-8-bom for C# and VB sources in .editorconfig
  • This encoding was the ~72% majority of existing code
  • Fixes ~15 cases where File.ReadAllText would fail to correctly read the file contents on en-US installations (impact on other regions unclear, but all will correctly read all but one file now)

馃摑 Files were converted using the C# Interactive script window, then manually reviewed to verify accuracy.

> #r "System.ValueTuple.dll"
> Directory.GetFiles(Environment.CurrentDirectory, "*.csx", SearchOption.AllDirectories).Select(path => (path, File.ReadAllText(path))).Select(pair => { File.WriteAllText(pair.Item1, pair.Item2, Encoding.UTF8); return true; }).Count()
11
> Directory.GetFiles(Environment.CurrentDirectory, "*.vb", SearchOption.AllDirectories).Select(path => (path, File.ReadAllText(path))).Select(pair => { File.WriteAllText(pair.Item1, pair.Item2, Encoding.UTF8); return true; }).Count()
3460
> Directory.GetFiles(Environment.CurrentDirectory, "*.cs", SearchOption.AllDirectories).Select(path => (path, File.ReadAllText(path))).Select(pair => { File.WriteAllText(pair.Item1, pair.Item2, Encoding.UTF8); return true; }).Count()
9379

Copy link
Member

@Pilchie Pilchie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only looked at the first commit, but I'm in favor of this in general.

Slight preference for UTF-8 without BOM, but not enough to bikeshed over it.

@sharwell
Copy link
Member Author

Slight preference for UTF-8 without BOM, but not enough to bikeshed over it.

There are many cultures which will fail to read these files correctly using APIs like File.ReadAllText, and editors which will try to open and resave the files in the wrong encodings. I've seen it have effects like dropping accented characters from people's names, etc., which go overlooked and eventually are lost. UTF-8 without BOM only works in a world where it's everyone's editor default.

@tannergooding
Copy link
Member

@sharwell, I agree with Pilchie. Preference for UTF-8 without BOM.

While there are some scenarios that require it, In some cases (CMD/BAT files) the BOM can and will break the file.

The editorconfig website additionally recommends NOT using utf-8-bom, with a link to the following: https://stackoverflow.com/questions/2223882/whats-different-between-utf-8-and-utf-8-without-bom/2223926#2223926 (which itself links to the Unicode standard which also indicates using the BOM is NOT required or recommended).

@jaredpar
Copy link
Member

This is fantastic. @sharwell will now be listed as the last to edit all of our source files. We can blame him for any bug as he's the last to edit that file 馃槃

On a serious note: I'm ambivalent as to which encoding we use.

@sharwell
Copy link
Member Author

@jaredpar Want me to set the author of the commits to @dotnet-bot ?

@tannergooding
Copy link
Member

At the very least, there are certain files where UTF-8-BOM breaks things and/or is considered illegal. For those files, we should explicitly set the encoding to be non-bom (regardless of what we do for other files).

BAT/CMD only support ASCII characters (specifically Code Page 437) unless launched with /U, in which case it supports "Unicode" (not sure what encodings though).

Shell scripts do (read as "tend to") not support BOM, as the #! is considered a magic identifier.

I'm pretty sure Powershell had issues at one point, but that no longer appears to be the case.

The RFC for JSON indicates the BOM is illegal...

We can add any others when/if we come across them.

@Pilchie
Copy link
Member

Pilchie commented Jun 29, 2017

This PR only sets it for cs/vb/csx.

@tannergooding
Copy link
Member

@Pilchie, yes. I was saying, now that it is supported, we should explicitly set it in places where we know it is broken (otherwise VS adds the BOM by default... I've logged several bugs over them breaking my BAT files).

@AdamSpeight2008
Copy link
Contributor

So the Squirrel is vacating @jaredpar desk to @sharwell desk, but for how long?

@jaredpar
Copy link
Member

Want me to set the author of the commits to @dotnet-bot ?

Nah. Set it to @Pilchie so we can blame him 馃槃

Seriously: I would just keep as is.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants