out-file -append (or ">>") can mix two encodings in the same file #9423

Closed
jszabo98 opened this issue Apr 19, 2019 · 8 comments
Labels
  • Issue-Enhancement: the issue is more of a feature request than a bug
  • Resolution-No Activity: issue has had no activity for 6 months or more
  • Up-for-Grabs: up-for-grabs issues are not high priorities, and may be opportunities for external contributors
  • WG-Cmdlets-Utility: cmdlets in the Microsoft.PowerShell.Utility module

Comments


jszabo98 commented Apr 19, 2019

Out-file -append (or ">>") can mix two encodings, like unicode and utf8, in the same file. Add-content doesn't seem to suffer from this. Note that out-file in PS 5.1 uniquely defaults to unicode (undocumented). But the problem itself occurs with out-file in both PS 5.1 and PS 6.

Steps to reproduce

PS /Users/js> write-output hi | set-content hi.txt -Encoding unicode
PS /Users/js> write-output hi | out-file -Append hi.txt
PS /Users/js> get-content hi.txt

(or write-output hi >> hi.txt for line 2)

Expected behavior

hi
hi

Actual behavior

Line 1 is unicode (UTF-16LE) and line 2 is utf8, so get-content, which follows the file's UTF-16LE BOM, misreads the appended bytes:

hi
楨
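
The stray character can be reproduced at the byte level without touching a file. This is a minimal sketch, assuming the appended line ends in a bare LF (as on macOS/Linux); the variable names are illustrative only:

```powershell
# Bytes that a UTF-8 writer appends for "hi" plus an LF newline: 0x68 0x69 0x0A.
$utf8Bytes = [System.Text.Encoding]::UTF8.GetBytes("hi`n")

# A reader that trusts the file's UTF-16LE BOM pairs bytes into 16-bit code units.
# 0x68 0x69 becomes the single code unit U+6968; the leftover 0x0A has no partner.
$misread = [System.Text.Encoding]::Unicode.GetString($utf8Bytes, 0, 2)

'{0} = U+{1:X4}' -f $misread, [int][char]$misread   # 楨 = U+6968
```

So the two UTF-8 bytes of "hi" are read as one UTF-16LE code unit, which is the character shown on line 2 of the output above.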

Environment data

Name                           Value
----                           -----
PSVersion                      6.2.0
PSEdition                      Core
GitCommitId                    6.2.0
OS                             Darwin 16.7.0 Darwin Kernel Version 16.7.0: Thu Jun 21 20:07:39 PDT 2018; root:xnu-3789.73.14~1/RELEASE_X86_64
Platform                       Unix
PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0…}
PSRemotingProtocolVersion      2.3
SerializationVersion           1.1.0.1
WSManStackVersion              3.0
jszabo98 added the Issue-Question label on Apr 19, 2019
Contributor

mklement0 commented Apr 20, 2019

To summarize the inconsistency:

  • Add-Content detects the encoding of existing content in a file (by BOM) and matches it when appending content.

  • Out-File / >> blindly use their default encoding when appending.
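
As a rough illustration of the BOM-matching approach, here is a sketch of a helper that appends text using the encoding indicated by the target file's BOM. The function name Add-LineMatchingBom is hypothetical (it is not a PowerShell cmdlet), it only distinguishes the UTF-16 BOMs from everything else, and it is not a description of how Add-Content is actually implemented:

```powershell
# Hypothetical helper, not part of PowerShell: append a line using the encoding
# implied by the file's BOM, falling back to BOM-less UTF-8 when none is found.
function Add-LineMatchingBom {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [Parameter(Mandatory)] [string] $Text
    )

    # .NET resolves relative paths against the process directory, not PowerShell's
    # current location, so resolve the path through the session state first.
    $fullPath = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($Path)

    # Read at most the first two bytes of an existing file.
    $bom = [byte[]]::new(2)
    $count = 0
    if (Test-Path -LiteralPath $fullPath) {
        $stream = [System.IO.File]::OpenRead($fullPath)
        try { $count = $stream.Read($bom, 0, 2) } finally { $stream.Dispose() }
    }

    # Simplification: UTF-32 BOMs are not handled; a UTF-8 BOM falls through to UTF-8.
    $encoding = [System.Text.UTF8Encoding]::new($false)
    if ($count -eq 2 -and $bom[0] -eq 0xFF -and $bom[1] -eq 0xFE) {
        $encoding = [System.Text.Encoding]::Unicode            # UTF-16LE
    }
    elseif ($count -eq 2 -and $bom[0] -eq 0xFE -and $bom[1] -eq 0xFF) {
        $encoding = [System.Text.Encoding]::BigEndianUnicode   # UTF-16BE
    }

    [System.IO.File]::AppendAllText($fullPath, $Text + [Environment]::NewLine, $encoding)
}
```

Usage would look like `Add-LineMatchingBom -Path hi.txt -Text hi` in place of the second step of the repro.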

While Add-Content's behavior is helpful and preferable, note that Out-File / >> behaves the same way as >> in POSIX-like shells such as Bash:

Generally, unlike in the Windows world, Unix shells and utilities assume that one encoding is in use everywhere, as reflected in the LC_CTYPE locale category - which is virtually always UTF-8 these days. >> therefore blindly uses that encoding when appending.


As for the Windows PowerShell situation:

Note that out-file in PS 5.1 uniquely defaults to unicode (undocumented).

Out-File notably differs from Set-Content in its default encoding (Unicode (UTF-16LE) vs. ANSI), but there are other cmdlets that produce UTF-16LE by default: Export-CliXml and New-ModuleManifest (which doesn't support -Encoding), possibly others.
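
Since the defaults differ between cmdlets and editions, one way to avoid the mismatch is to name the encoding on every write rather than rely on any default. A small sketch, assuming a scratch file in the temp directory (the file name is made up for the example):

```powershell
# Scratch file for the demonstration; the name is arbitrary.
$file = Join-Path ([System.IO.Path]::GetTempPath()) 'hi-encoding-demo.txt'

# Create the file as UTF-16LE ("unicode"), then append with the same encoding named explicitly.
'hi' | Set-Content -LiteralPath $file -Encoding unicode
'hi' | Out-File -LiteralPath $file -Encoding unicode -Append

Get-Content -LiteralPath $file   # both lines should now decode as "hi"
```

The same idea applies with any other encoding, as long as the creating and appending commands agree.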

The behavior is technically not undocumented, but misdocumented: please see MicrosoftDocs/PowerShell-Docs#4155

@jszabo98
Author

I assume "Out-Content" should be "Out-File" at that link (different noun).

@mklement0
Contributor

Thanks, @jszabo98 - fixed; I've also added info about a few more cmdlets.

iSazonov added the Issue-Enhancement and WG-Cmdlets-Utility labels and removed the Issue-Question label on Jan 15, 2021
@iSazonov
Collaborator

Out-File could detect a file's encoding at open time, like Add-Content does.
(We would only need to measure the performance impact.)

iSazonov added the Up-for-Grabs label on Jan 15, 2021
Contributor

This issue has not had any activity in 6 months. If this is a bug, please try to reproduce it on the latest version of PowerShell, then open a new issue and reference this one if it is still a blocker for you.


microsoft-github-policy-service bot added the Resolution-No Activity label on Nov 16, 2023
Contributor

This issue has been marked as "No Activity" as there has been no activity for 6 months. It has been closed for housekeeping purposes.


3 participants