Let us specify EOL when using out-file #2145

Closed
kumarharsh opened this issue Aug 31, 2016 · 21 comments

Comments

@kumarharsh

commented Aug 31, 2016

Right now, PowerShell on Windows writes CRLF line endings when using Out-File. A simple use case where this fails is when I use

git format-patch HEAD~3 | Out-File patch.patch -Encoding utf8

This produces a file that looks OK, but git apply can't accept it because it has CRLF line endings. So I'd like Out-File to be able to write files with LF line endings.

This feature might also be useful on Linux.

@GeeLaw


commented Aug 31, 2016

Native utilities shouldn't be used with PowerShell pipelines -- there are not only line-ending issues, but also encoding issues. PowerShell "smartly" converts the output into an array of strings (guessing the encoding and splitting at line breaks). It's worse when your command outputs a true "binary" stream.

To use native utilities properly, use Start-Process with -RedirectStandardInput, -RedirectStandardOutput and -RedirectStandardError.

To avoid writing an array to a file with CRLF, use -join to join the elements with LF before sending the result to Set-Content or Out-File.

Ah, did I mention that you shouldn't use Set-Content or Out-File if you want to get rid of the BOM? Use [IO.File]::WriteAllLines or [IO.File]::WriteAllText.
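A minimal sketch of the last two suggestions combined, assuming the lines are already in memory as strings. The sample lines and the temp-file path are placeholders, not part of the original report.

```powershell
# Stand-in for output that was already captured as an array of strings.
$lines = @('first line', 'second line')

# Join with LF only, and add the trailing LF that text files usually end with.
$text = ($lines -join "`n") + "`n"

# Hypothetical target path for the demo.
$path = Join-Path ([IO.Path]::GetTempPath()) 'lf-demo.txt'

# UTF8Encoding($false) means "do not emit a BOM".
$utf8NoBom = New-Object System.Text.UTF8Encoding($false)
[IO.File]::WriteAllText($path, $text, $utf8NoBom)
```

This only controls how the file is written. It does not undo whatever PowerShell did when it decoded the native command's output into strings.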

@rkeithhill

Contributor

commented Aug 31, 2016

@GeeLaw If a Linux user can't execute git format-patch HEAD~3 > patch.patch then I view that as a major FAIL for PowerShell on Linux. There needs to be a preference variable or some other mechanism to define the default encoding for Out-File (which > uses). In 5.1 and higher you can set $PSDefaultParameterValues["Out-File:Encoding"] = "Ascii" and Out-File will honor that; perhaps that should be set by default on Linux? There also needs to be an EOL preference/setting that defaults to just LF on Linux. It would also be nice for Out-File/Set-Content to get a -NewLine parameter that takes CRLF or LF.

@GeeLaw


commented Aug 31, 2016

@rkeithhill No... I don't think you got me... Even if you get that option, output from native utilities can still be silently broken. I'm not against the proposed -NewLine parameter. I'm against using PowerShell's OO pipes with native utilities. PowerShell has already done bad things to the output of git before you do anything more -- it guesses the encoding, interprets the output as a string and splits it by line. If the output originally had mixed or CRLF line endings, it gets broken when you re-output it with LF.

There should be, and will be, a binary pipe, I think. With a binary pipe, native utilities will work happily with PowerShell.

@lzybkr

Member

commented Aug 31, 2016

Binary pipes are part of #559.
File redirection using Windows style newlines on *nix is just a bug - it should just work w/o any extra options/settings.

@GeeLaw


commented Aug 31, 2016

@lzybkr I don't think there is "file redirection" in PowerShell. Redirecting the different object streams (output, verbose, warning etc.) is equivalent to storing them and then calling Set-Content. File redirection is about saving the content of a binary stream to a file, while PS redirection serializes objects into files.

Again, before you "redirect" the output of git to a file, the stdout has been reinterpreted by PS.

@lzybkr

Member

commented Aug 31, 2016

File redirection is absolutely a language feature of PowerShell. The implementation may rely on piping to Set-Content today, but that's an implementation detail that could change if necessary, e.g. to write binary data or whatever.

@GeeLaw


commented Aug 31, 2016

@lzybkr that'll be breaking... It's best to have the binary pipe, and users should use that for native utilities. Mind you, the fact that writing to a file with > is equivalent to piping the object to Set-Content (or perhaps Out-File, I don't remember which) is NOT an implementation detail; it's documented. And again, the corruption of the output of a native utility happens *BEFORE* "redirecting" to a file.

Could you do the following experiment? I think you'll then understand why the current syntax/standard (the documented behaviour) won't allow real "file redirection".

# suppose that git command will output more than 2 lines.
$output = git format-patch HEAD~3
$output.GetType()
$output | % { $_.GetType() }

The second command should give System.Object[]. That is, before PowerShell ever writes the file, the stream output by git is already lost. As @kumarharsh has shown us, you have to use -Encoding UTF8. Why? The reason is again simple. Though git outputs in a specific encoding, the PowerShell engine reads its stdout as a string (guessing the encoding), splits it by line, then gives the runtime an Object[]. The encoding, the line-ending style and other possible information have been lost. There is no correct way to recover the encoding or the line-ending character sequence from that array of objects. That's why you have to specify the encoding again.

You already know one half of it (the encoding); the line-ending character sequence is just the other half. Both follow from the OO nature of PS.

I suggest you use Start-Process as a workaround and wait for the binary pipe.
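A minimal sketch of that workaround, assuming the goal is to let the process write its stdout straight to a file so the bytes never pass through the object pipeline. Save-NativeOutput is a hypothetical helper name, not an existing cmdlet.

```powershell
# Hypothetical helper: run a native command and send its raw stdout to a file.
function Save-NativeOutput {
    param(
        [Parameter(Mandatory)] [string] $FilePath,
        [string[]] $ArgumentList = @(),
        [Parameter(Mandatory)] [string] $OutFile
    )

    $params = @{
        FilePath               = $FilePath
        RedirectStandardOutput = $OutFile   # the process output goes directly to this file
        Wait                   = $true      # block until the process exits
    }
    # Start-Process rejects an empty argument list, so only pass it when needed.
    if ($ArgumentList.Count -gt 0) { $params.ArgumentList = $ArgumentList }

    Start-Process @params
}

# Usage (assumes git is on PATH and the repository has at least three commits;
# --stdout makes format-patch print the patches instead of writing one file each):
# Save-NativeOutput -FilePath git -ArgumentList 'format-patch', 'HEAD~3', '--stdout' -OutFile (Join-Path $PWD 'patch.patch')
```

Because the file is filled by the process itself, the line endings and encoding are whatever the native utility emitted.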

@kumarharsh

Author

commented Aug 31, 2016

Thank you for the detailed explanation @GeeLaw. I didn't even know half of it. Although I must point out that if PowerShell is guessing the encoding, it's guessing wrong: using Out-File or the sugared > writes files in UTF16LE, which is very far from the UTF8 / ASCII it should be deducing from the output of git commands. Or is it always using its default encoding?

@GeeLaw


commented Sep 1, 2016

@kumarharsh

Short Explanation

PS guesses the encoding to transform the output into objects, and then uses the default encoding to output. After the transformation, no guessing is needed and no encoding information is stored.

Long Explanation

The guessing happens when PowerShell transforms the stdout of git, it seems PS got this one correct (you have valid strings in memory now). After the transformation, there is no "encoding" anywhere -- it's stored as string objects (internally it'll be UTF16 on Windows, I guess CoreCLR uses the same internal encoding). At this point PS has "forgotten" the encoding. The default encoding for Set-Content or Out-File (the one used for "redirection") is UTF16LE. You can change this by supplying entries in $PSDefaultParameterValues.

The whole process is:

  1. PS executes git format-patch HEAD~3;
  2. PS reads its output stream (stdout) as a string;
  3. PS splits the string by line and returns the split result as the value of that invocation.

If this is still too abstract, let's say that line of git outputs

Hello
World
This is surely not output by git.

Then the line is equivalent to

# no encoding information can be seen by Out-File
@('Hello', 'World', 'This is surely not output by git.') |
    Out-File patch.patch -Encoding utf8

If you didn't supply -Encoding utf8, the encoding defaults to UTF16LE.

@rkeithhill

Contributor

commented Sep 1, 2016

@GeeLaw One minor correction. Set-Content encoding defaults to ASCII. Also, I believe > is syntax sugar for Out-File which brings up another issue. Out-File always appends a newline seq to the last string it writes to the file e.g.:

38> 'hello' > foo.txt
39> fhex foo.txt

Address:  0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F ASCII
-------- ----------------------------------------------- ----------------
00000000 FF FE 68 00 65 00 6C 00 6C 00 6F 00 0D 00 0A 00 ..h.e.l.l.o.....

If you use Out-File directly, you can avoid that final newline with the NoNewline parameter. No such luck with >.

> the corruption of output of a native utility happens *BEFORE* "redirecting" to a file.

Yes. However, I would like to be able to operate on that output as strings so it is very useful to have the output of a utility like git converted to string objects (instead of having to deal with a raw byte array - sans encoding info). Perhaps with PowerShell's ETS magic, strings could carry along their origin encoding info??

@GeeLaw


commented Sep 1, 2016

@rkeithhill (edited 11:18 AM UTC+8, was @-mentioning the wrong person) yeah, you're right. Didn't check the docs, sorry. You can avoid the newline for > by setting $PSDefaultParameterValues['Out-File:NoNewline'] = $true. (Just tested on Windows PowerShell 5)
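A short sketch of the two $PSDefaultParameterValues entries mentioned in this thread. It calls Out-File directly rather than > so that it does not depend on how a given PowerShell version implements redirection; the temp-file path is a placeholder.

```powershell
# Make Out-File default to ASCII and suppress the trailing newline.
$PSDefaultParameterValues['Out-File:Encoding']  = 'ascii'
$PSDefaultParameterValues['Out-File:NoNewline'] = $true

# Hypothetical demo file.
$demo = Join-Path ([IO.Path]::GetTempPath()) 'defaults-demo.txt'
'hello' | Out-File $demo

# Inspect the raw bytes that were written.
$written = [IO.File]::ReadAllBytes($demo)
$written -join ' '

# Remove the entries again so later commands use the built-in defaults.
$PSDefaultParameterValues.Remove('Out-File:Encoding')
$PSDefaultParameterValues.Remove('Out-File:NoNewline')
```

These defaults apply to every later Out-File call in the session, which is why the sketch removes them at the end.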

The idea of operating on the output as strings is absolutely great! However, if we directly change what a git invocation returns, that'll be a breaking change. With binary pipes, we can receive a byte[], and we can have cmdlets like Convert-ByteArrayToString [-Encoding ...]. This will give us full control over how the output of a native utility is interpreted. Also, the idea of using ETS to record the encoding information on System.String is innovative! I'm with you on these.

@GeeLaw


commented Sep 3, 2016

I just wrote a workaround for this. Didn't test it out on Mac though, but it should work for @kumarharsh as he uses Windows PowerShell. Check out Save-Module -Name 'Use-RawPipeline'.

@joeyaiello joeyaiello added the Issue-Bug label Sep 5, 2016
@mklement0

Contributor

commented Apr 4, 2017

@rkeithhill:

Good points, and great idea to carry the input encoding info forward.

Note that Set-Content - despite what the help topic states - uses Default encoding by default, which in Windows PowerShell is the active "ANSI" code page (a culture-specific, 8-bit superset of ASCII, as implied by the legacy system locale).

As of this writing, the plan is for PowerShell Core on Windows to default to the same, and on Unix to default to UTF-8 (without BOM).

@powercode

Collaborator

commented Apr 22, 2017

Isn't it time for Windows to start defaulting to UTF8 too?

Enough of this nonsense! :) We are in a hole, but we can at least stop digging!

@iSazonov

Collaborator

commented Apr 22, 2017

@powercode We are discussing this in PowerShell/PowerShell-RFC#71

@be5invis


commented May 7, 2017

@iSazonov Could we customize the behavior of (>)? I.e., replace this operator with another cmdlet?

@be5invis


commented May 7, 2017

@GeeLaw Do not quibble for the past stupidity. WRONG IS WRONG.

@GeeLaw


commented May 8, 2017

In reply to @be5invis

> @GeeLaw Do not quibble for the past stupidity. WRONG IS WRONG.

Could you please quote the part of the post you're replying to? I had several posts in this thread and couldn't work out which part you are criticising.

If I understand correctly, you meant the part about guessing the encoding? It's already stupid enough to mix two worlds without control over what happens in between -- in the past, the programmer just hoped PowerShell would convert the byte stream from/to string[] the way they expected. Even if you can specify the EOL, there are more problems, for example the assumption that "all people speak ASCII".

The cure is the long-awaited binary pipe + conversion cmdlets.

@mklement0

Contributor

commented May 24, 2017

As for explicitly specifying a newline sequence with Out-File / Set-Content: I've created #3855, which more generically asks for a -Delimiter parameter (to parallel the existing Get-Content -Delimiter) that would also cover this use case.

@iSazonov

Collaborator

commented Aug 27, 2018

I think we should close this and continue in #3855

/cc @SteveL-MSFT @mklement0

@SteveL-MSFT

Member

commented Aug 30, 2018

@iSazonov agree
