New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.
Already on GitHub? Sign in to your account
Set charset in .editorconfig #20541
Set charset in .editorconfig #20541
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only looked at the first commit, but I'm in favor of this in general.
Slight preference for UTF-8 without BOM, but not enough to bikeshed over it.
There are many cultures which will fail to read these files correctly using APIs like |
@sharwell, I agree with Pilchie. Preference for UTF-8 without BOM. While there are some scenarios that require it, In some cases (CMD/BAT files) the BOM can and will break the file. The editorconfig website additionally recommends NOT using utf-8-bom, with a link to the following: https://stackoverflow.com/questions/2223882/whats-different-between-utf-8-and-utf-8-without-bom/2223926#2223926 (which itself links to the Unicode standard which also indicates using the BOM is NOT required or recommended). |
This is fantastic. @sharwell will now be listed as the last to edit all of our source files. We can blame him for any bug as he's the last to edit that file 馃槃 On a serious note: I'm ambivalent as to which encoding we use. |
@jaredpar Want me to set the author of the commits to @dotnet-bot ? |
At the very least, there are certain files where UTF-8-BOM breaks things and/or is considered illegal. For those files, we should explicitly set the encoding to be non-bom (regardless of what we do for other files). BAT/CMD only support ASCII characters (specifically Code Page 437) unless launched with /U, in which case it supports "Unicode" (not sure what encodings though). Shell scripts do (read as "tend to") not support BOM, as the #! is considered a magic identifier. I'm pretty sure Powershell had issues at one point, but that no longer appears to be the case. The RFC for JSON indicates the BOM is illegal... We can add any others when/if we come across them. |
This PR only sets it for cs/vb/csx. |
@Pilchie, yes. I was saying, now that it is supported, we should explicitly set it in places where we know it is broken (otherwise VS adds the BOM by default... I've logged several bugs over them breaking my BAT files). |
Nah. Set it to @Pilchie so we can blame him 馃槃 Seriously: I would just keep as is. |
charset = utf-8-bom
for C# and VB sources in .editorconfigFile.ReadAllText
would fail to correctly read the file contents on en-US installations (impact on other regions unclear, but all will correctly read all but one file now)馃摑 Files were converted using the C# Interactive script window, then manually reviewed to verify accuracy.