Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't add Unicode BOM by default #712

Closed
zjturner opened this issue Aug 10, 2015 · 5 comments
Closed

Don't add Unicode BOM by default #712

zjturner opened this issue Aug 10, 2015 · 5 comments
Assignees
Labels
Milestone

Comments

@zjturner
Copy link

PTVS 2.2
Visual Studio 2013

File -> New -> File
Choose Python -> Python Class

type a single word consisting of only ascii characters and hit Save. The UTF-8 BOM is added.

This does not happen for any other types of files. Notice that if, instead of hitting Save in the final step, you go to File -> Advanced Save Options, you can see that the default encoding is Unicode UTF-8 (with signature). But for other types of files including .cpp files, it defaults to Western European.

This also didn't happen prior to PTVS 2.2 either, so it appears to be a regression.

This is a really big pain point as we have to manually adjust this every time we save a file to get the BOM out.

@zooba
Copy link
Member

zooba commented Aug 11, 2015

This also didn't happen prior to PTVS 2.2 either, so it appears to be a regression

You're right, and I know what it is too.

We switched the default encoding from ASCII (which breaks basically anyone who isn't American) to UTF-8 (which adds a BOM by default because that's what .NET does). Since Python 3 uses UTF-8 by default, none of the other encodings make sense, but the BOM is certainly an issue.

I suspect we'll need to provide our own custom encoding that does UTF-8 with no BOM. If it's as simple as I think it might be, we should do this in 2.2.1 because it is a regression.

@zooba zooba added this to the 2.2.1 milestone Aug 11, 2015
@zooba zooba self-assigned this Aug 11, 2015
zooba added a commit that referenced this issue Aug 11, 2015
Fixes #712: Don't add Unicode BOM by default
@zjturner
Copy link
Author

zjturner commented Sep 1, 2015

This still happens to me with the latest dev build from 4 days ago. Is this supposed to be fixed? I can't test the official 2.2.1 release because there is no download for VS 2013.

@zooba
Copy link
Member

zooba commented Sep 1, 2015

There's no official 2.2.1 release - that's 2.1.1 (annoyingly similar, I admit) and it doesn't have this fix.

The change was to the detection, so if your file is starting with a BOM then it will keep it. Are you sure we're adding it to files that previously don't have one? (We may be adding it to files that need it too - not sure we can override that behavior as easily...)

@zjturner
Copy link
Author

zjturner commented Sep 1, 2015

I tried it on a few more files and those automatically defaulted to the correct encoding (UTF-8 without signature). Perhaps there was a file that accidentally got a BOM written into it and it was just keeping that, as you suggested. I'll keep an eye out for if the problem persists.

@M0sc
Copy link

M0sc commented Mar 14, 2016

Zooba wrote:

We switched the default encoding from ASCII (which breaks basically anyone who isn't American) to UTF-8 (which adds a BOM by default because that's what .NET does).

Actually your old implementation adds BOM, not .NET core:

c0d2d63
https://github.com/dotnet/coreclr/blob/master/src/mscorlib/src/System/Text/Encoding.cs#L1542
https://github.com/dotnet/coreclr/blob/master/src/mscorlib/src/System/Text/UTF8Encoding.cs#L69

Seems that the new implementation

new UTF8Encoding(false)

have been chosen for clarity as

new UTF8Encoding()

produced the same result.
😉

And this annoying issue exists in version 2.1.1 (build 2.1.30810.00):
https://github.com/Microsoft/PTVS/blob/v2.1/Python/Product/Analysis/Parsing/Parser.cs#L4689
😠

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants