Make newly created PowerShell files default to UTF-8 *with BOM* to avoid encoding misinterpretation #1771
Comments
The current big issue here is that the extension is third-party software in the eyes of both VSCode and PowerShell. There's no exposed API for it to configure this VSCode setting on installation. See microsoft/vscode#824. Anyone seeing issues here, please lend your 👍 to microsoft/vscode#824 and take a look at MicrosoftDocs/PowerShell-Docs#3743 (will update to the doc link when it's merged). @mklement0 I assume that document prompted this issue?
Thanks, @rjmholt - it's unfortunate that there's still no API for this (I've since given the linked issue a thumbs-up).
No, I wasn't aware of that document (thanks for the link). It was my own experience and seeing people run into the problem on SO (Stack Overflow) that prompted me to create this issue. I've now (hopefully) given it wider exposure with this SO answer.
Ah! I'll link to it in the new doc
With the new doc on handling encoding WRT PowerShell and text editors, can we close this?
I suggest keeping this open with an
Windows PowerShell just can't pull a single change of the default. Better users would be making files with BOM for no reason.
@mklement0 nowadays it is possible for the extension to supply a default configuration for PowerShell files:
"[powershell]": {
    "files.encoding": "utf8bom",
    "files.autoGuessEncoding": true
}
This could go here: vscode-powershell/package.json Lines 885 to 890 in 8f7649d
I can't think of anything it would break... as you pointed out, PowerShell Core readily accepts UTF8BOM, and it fixes issues with Windows PowerShell. @rjmholt can you think of any reasons not to do this now?
It will break people relying on shebangs on Linux if this change were to happen. Shebangs rely on the first 2 bytes of the file being 0x23 0x21, and the BOM changes that, so a file with a BOM will break that setup.
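The byte-level effect described above can be checked with a short sketch. Python is used purely for illustration; the helper name `starts_with_shebang` and the sample script text are hypothetical and not part of the extension or of PowerShell.

```python
SHEBANG = b"#!"  # bytes 0x23 0x21, which must be the very first bytes of the file


def starts_with_shebang(data: bytes) -> bool:
    """Return True if the raw file bytes begin with '#!'."""
    return data[:2] == SHEBANG


# Hypothetical script content, used only to produce sample bytes.
script = "#!/usr/bin/env pwsh\nWrite-Output 'hi'\n"

plain = script.encode("utf-8")         # UTF-8 without BOM
with_bom = script.encode("utf-8-sig")  # UTF-8 with BOM (EF BB BF prepended)

print(starts_with_shebang(plain))     # True
print(starts_with_shebang(with_bom))  # False
print(with_bom[:5].hex(" "))          # ef bb bf 23 21
```

With the BOM in place, `#!` sits at byte offsets 3 and 4 instead of 0 and 1, which is why the file is no longer recognized as a shebang script.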
Thanks, @andschwa - this manual configuration option is already a part of the OP; the point of the issue was to have the PowerShell extension apply it automatically. @jborean93, while breaking shebang functionality with a BOM is a good point in general:
It's a significant item and one that would be very difficult to troubleshoot, so I think this should be a toggleable opt-in option at best, not an automatic default, especially since Windows PowerShell (5.1) is on deprecated life support.
Why not, it's a perfectly valid thing to do on Linux to be able to do
It breaks the workflow where people create a script in vscode and want to do Both sides have disadvantages, I'm sure you can argue both ways but the question was asked what could it break and this is one of them. Personally I think trying to cater to an effectively EOL product at the expense of the new way forward is digging yourself into a hole you eventually need to get out of in the future.
That's essentially where it already is. You can just change the encoding (or the default) yourself in settings.
I agree with this, and yes breaking shebang would be big IMHO. Thank you for pointing that out... we had this vague feeling there was something big on Linux that it broke but couldn't remember what!
Text file creation affects more than a compiler. I have also seen multiple free Windows programs that do not support a BOM. Altering VS Code behavior that the user has set or knows (by way of an extension) is basically more unexpected than unusual characters being misinterpreted by a specific compiler.
Fair enough, but we know how slow and painful such demises are in the Windows world...
It's technically valid, but conceptually ill-advised (as creating shebang-based A shebang-based file with extension However, I see your point re starting out with a
In summary: it would break something that shouldn't be done to begin with. All that said, overall I do agree that BOM-less UTF-8 is the way forward.
Note that legacy PowerShell has issues with signed scripts that use BOM-less UTF-8 encoding and contain Unicode characters. One solution is to switch to another encoding recommended in the Windows world, UTF-16 LE (with BOM), in order to avoid the ill-fated UTF-8 with BOM. This has already been fixed in PowerShell 7, but not in the native powershell.exe. Would defaulting the encoding to UTF-16 LE break Unix shebangs?
Thank you for your comment, but please note that this issue has been closed for over a week. For better visibility, consider opening a new issue with a link to this instead.
It will: the shebang is checked in the kernel by reading the first few bytes as an ASCII-equivalent string. Any BOM on a file will break that.
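For the UTF-16 LE question specifically, a minimal sketch of the leading bytes is shown here. The sample script line is hypothetical; the BOM is prepended explicitly via `codecs.BOM_UTF16_LE` so the result does not depend on the platform's byte order.

```python
import codecs

# Hypothetical first line of a script, used only to produce sample bytes.
script = "#!/usr/bin/env pwsh\n"

utf16_le_no_bom = script.encode("utf-16-le")            # two bytes per character
utf16_le_bom = codecs.BOM_UTF16_LE + utf16_le_no_bom    # FF FE prepended

print(utf16_le_bom[:6].hex(" "))          # ff fe 23 00 21 00
print(utf16_le_no_bom[:4].hex(" "))       # 23 00 21 00
print(utf16_le_bom.startswith(b"#!"))     # False
print(utf16_le_no_bom.startswith(b"#!"))  # False
```

Under these assumptions the file fails the two-byte check even without the BOM, because UTF-16 places a 0x00 byte between `#` and `!`.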
Summary of the new feature
It seems that this has been a perennial pain point (whose root cause isn't obvious), as evidenced, for instance, by #629 or this Stack Overflow question.
Making the extension default all new PowerShell files to UTF-8 with BOM solves that problem.
Such files would be both cross-edition and cross-platform compatible (given that PowerShell Core still correctly interprets the BOM, even though it doesn't require it).
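The kind of misinterpretation at stake can be reproduced outside PowerShell. The sketch below assumes a reader that falls back to Windows-1252 when no BOM is present (the real fallback code page depends on the system locale). The helper `decode_like_legacy_reader` is hypothetical: a simplified stand-in for BOM-based detection, not Windows PowerShell's actual logic.

```python
import codecs


def decode_like_legacy_reader(data: bytes, ansi_codepage: str = "cp1252") -> str:
    """Decode as UTF-8 only when a UTF-8 BOM is present, else as the ANSI code page."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):].decode("utf-8")
    return data.decode(ansi_codepage)


# Hypothetical script line containing one non-ASCII character.
text = "Write-Output 'café'"

print(decode_like_legacy_reader(text.encode("utf-8")))      # Write-Output 'cafÃ©'
print(decode_like_legacy_reader(text.encode("utf-8-sig")))  # Write-Output 'café'
```

Pure-ASCII scripts decode identically either way, which may be part of why the root cause is easy to miss until a non-ASCII character appears.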
Proposed technical implementation details
I don't know how it works in the context of extension-specific settings, but in the general settings.json file you can simply add the following: