Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preserve stdout byte stream for native commands #17857

Merged
merged 37 commits into from
Apr 28, 2023

Conversation

SeeminglyScience
Copy link
Collaborator

@SeeminglyScience SeeminglyScience commented Aug 5, 2022

With this change, any native commands in a pipeline will be able to detect if the downstream command is also native. In that case (and when stderr is not merged into stdout), the native command will write directly to the input stream of the downstream command.

Also added in this change is the ability to pipe bytes directly to a native command, e.g.

0xFFuy | native
[byte[]](0..255) | native

# Passing as a strongly typed byte array with unary comma also works for performance
,[byte[]](0..255) | native

PR Summary

PR Context

Left to do:

  • Hook up provider path resolution and error handling
  • See if it's feasible to wrap as an experimental feature
  • Fix handling for manually created pipelines (e.g. PowerShell.AddCommand)
  • Investigate Stream.CopyToAsync (thank you @daxian-dbw!)
  • Fix steppable pipelines
  • Tests
  • Clean up

PR Checklist

Also preserve for `native > file.txt`.
@AE1020
Copy link

AE1020 commented Aug 6, 2022

Glad to see this being taken seriously! It is important for generic shells--especially ones that aspire to work cross platform--to be able to pass data through agnostically.

(I cloned and built Powershell just to try this PR! 😃)

The stdout redirection did appear to work on my tests (which require not introducing CR into LF-only streams). That is good! I was able to pipe | and redirect > successfully.

However, the powershell I built seems to put the redirected native outputs wherever powershell was started up, not in the current directory. So if run from some directory like:

C:\projects\> C:\projects\powershell\src\powershell-win-core\bin\Debug\net7.0\win7-x64\publish\pwsh
PowerShell 7.3.0-preview.3-267-g4faa5f99e3f6c7f959da35ce264959bb43017126
PS C:\Projects> cd test
PS C:\Projects\test> git --version > gitver.txt
PS C:\Projects\test> cat gitver.txt
Get-Content: Cannot find path 'C:\Projects\test\gitver.txt' because it does not exist.

But we find the file in the original directory:

PS C:\Projects\test> cat ..\gitver.txt
git version 2.36.1.windows.1

Piping commands that aren't native output, e.g. echo "hello" > echotest.txt act as expected, and go to the current directory.

Redirecting 2> for errors is introducing CRs when CMD.EXE does not give them, and does not have the directory issue...so I assume it is not running the new handling. (As per my post on the discussion thread, my vote is very much that stderr be covered by the same handling!)

AE1020 added a commit to AE1020/PowerShell that referenced this pull request Aug 6, 2022
I do not know much about PowerShell, but wanted to try [building a PR to help test it](PowerShell#17857 (comment)).  (The PR addresses one big reason why I *can't* use PowerShell...[pipe/redirect corruption](PowerShell#1908).)

It is nice that there is a script which will automate the build process.  But the default disposition of powershell is to reject it.  And the link suggested by the warning message given goes to a very long page, with no obvious prescription for the exact incantation to make to run the script.

I don't know if what I did is the "right" answer: (`pwsh -ExecutionPolicy Unrestricted`).  But it worked--so I'm offering it as a first draft.  If there's a better way to advise first-time-builders, then that should be said instead.  But some sort of guidance here would be very helpful.
@SeeminglyScience
Copy link
Collaborator Author

Glad to see this being taken seriously! It is important for generic shells--especially ones that aspire to work cross platform--to be able to pass data through agnostically.

(I cloned and built Powershell just to try this PR! 😃)

❤️

However, the powershell I built seems to put the redirected native outputs wherever powershell was started up, not in the current directory. So if run from some directory like:

Ah yeah that makes sense. That'd be part of what gets fixed with my comment above about hooking up path resolution via FileSystemProvider.

Redirecting 2> for errors is introducing CRs when CMD.EXE does not give them, and does not have the directory issue...so I assume it is not running the new handling. (As per my post on the discussion thread, my vote is very much that stderr be covered by the same handling!)

Yeah I can definitely understand the desire, but the only real way to do that with consistency would be to rewrite the Process class from scratch with p/invoke code specific to every platform. While that does genuinely sound like a fun thing to write, that's a ton of extra code to maintain for the level of impact it would have. I asked in that thread because if there were show stoppers around reverting to current behavior for stderr, I'd likely have to scrap the idea entirely.

Maybe in the future dotnet will add something to the SD.Process class to make it feasible, but for now this is an acceptable compromise imo.

@AE1020
Copy link

AE1020 commented Aug 6, 2022

the only real way to do that with consistency would be to rewrite the Process class from scratch with p/invoke code specific to every platform

Can you explain in more detail why this is significantly different from the stdout case?

@SeeminglyScience
Copy link
Collaborator Author

Can you explain in more detail why this is significantly different from the stdout case?

The biggest technical issue is around merging stderr into stdout. So the ideal way this would work (using Windows APIs as an example) is I would create a single pipe (or file for redirection) with kernel32!CreatePipe (or kernel32!CreateFile) that I could pass to kernel32!CreateProcess for both stdout and stderr. I don't have a generic way to do this without writing a lot of platform specific code so instead I'm reading from the Stream created by System.Diagnostics.Process for stdout.

Doing the same for stderr and trying to flush them both into the downstream's stdin Stream leads to a significant amount of inconsistency in order. That inconsistency already exists in the current string reading behavior but it's made worse when dealing with bytes directly as you can end up writing half of a line (or even half of a unicode glyph) from stdout and then another half from stderr.

This doesn't directly exclude lighting up the native 2> log.txt scenario and reverting only for native 2>&1, but it does make it less clear cut. It's easy to understand that "error is text, so it's treated like strings, and when you merge it makes stdout also like strings". It's harder to understand "neither are treated like strings unless you merge them together" if that makes sense.

It's also not necessarily a simple switch I'd be flipping to enable it for stderr as well. There are a lot of places it needs to be special cased in pipeline creation similarly to how I'm doing it for stdout but subtly different. If/when this PR is merged I'll create a separate issue that is specifically for preserving bytes when redirecting stderr so the engine WG can discuss it. This PR will be for stdout specifically, but that doesn't necessarily mean another PR can't address it.

@AE1020
Copy link

AE1020 commented Aug 7, 2022

The stdout case probably would satisfy most people, whose concerns are more about binary preservation with things like curl/zip. My issue with cross-platform consistency on CRLF is its own crusade, which is distinct from this if one is going to actually "use" PowerShell in a bigger sense. But it happens to be measurably better with this change.

I can see that interleaving the streams raises a lot of issues. I'm not clear on how it ever works (are UNIX shells at risk of doing half-codepoints in UTF-8? If not, how? Would their guarantee be accomplished by line buffering vs. understanding UTF-8 encodings specifically? If so, what happens when the line buffer size is exceeded?) 😦 Sounds like something one would find a lot of inconvenient truths by reading up on...

@SeeminglyScience
Copy link
Collaborator Author

I can see that interleaving the streams raises a lot of issues. I'm not clear on how it ever works (are UNIX shells at risk of doing half-codepoints in UTF-8? If not, how? Would their guarantee be accomplished by line buffering vs. understanding UTF-8 encodings specifically? If so, what happens when the line buffer size is exceeded?) 😦 Sounds like something one would find a lot of inconvenient truths by reading up on...

The trouble mostly comes from utilizing System.Diagnostics.Process. When you redirect a stream using that API, it creates a pipe for each stream you've redirected. Then I would be asynchronously reading from both of these separate streams and trying to merge the output after the fact. These streams will also have different buffer sizes and flush intervals.

It's much easier when you are synchronously reading from and writing to a single pipe, as most platform specific shells with less output processing will do.

@ghost ghost added the Waiting on Author The PR was reviewed and requires changes or comments from the author before being accept label Aug 11, 2022
@ghost ghost added the Stale label Aug 26, 2022
@ghost
Copy link

ghost commented Aug 26, 2022

This pull request has been automatically marked as stale because it has been marked as requiring author feedback but has not had any activity for 15 days. It will be closed if no further activity occurs within 10 days of this comment.

@AE1020
Copy link

AE1020 commented Aug 29, 2022

@adityapatwardhan What sort of feedback would be needed for PowerShell to prioritize reviewing this?

It is a rather popular request (see #1908, and touched upon by some other issues), where these seem like things people expect to work in a shell:

curl.exe http://whatever/a.png > a.png

node a.js | gzip -c > out.gz

Modulo the output directory I found when trying it, it seemed to work for that.

@ghost ghost removed the Stale label Aug 29, 2022
@SeeminglyScience
Copy link
Collaborator Author

What sort of feedback would be needed for PowerShell to prioritize reviewing this?

It's not finished yet, it's a work in progress PR. You can ignore the bot, if it closed it then I'd just reopen

@ghost ghost added Waiting on Author The PR was reviewed and requires changes or comments from the author before being accept and removed Waiting on Author The PR was reviewed and requires changes or comments from the author before being accept labels Aug 29, 2022
Copy link
Member

@daxian-dbw daxian-dbw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Left one more suggestion, mostly ready to merge now 🚀

Copy link
Member

@JamesWTruher JamesWTruher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a couple of small suggestions
being overly descriptive here isn't necessarily a desired thing. It increases our payload size and ingestion. Think of the string as a simple key that can be queried for in the telemetry data (terabytes of it)
It essentially turns into a column title, but I think the string as is still flows on the wire.

src/System.Management.Automation/engine/BytePipe.cs Outdated Show resolved Hide resolved
src/System.Management.Automation/engine/pipeline.cs Outdated Show resolved Hide resolved
@daxian-dbw daxian-dbw merged commit 2424ad8 into PowerShell:master Apr 28, 2023
@daxian-dbw
Copy link
Member

@SeeminglyScience Thanks for pushing this change through!

@ghost
Copy link

ghost commented Jun 29, 2023

🎉v7.4.0-preview.4 has been released which incorporates this pull request.:tada:

Handy links:

@hez2010
Copy link
Contributor

hez2010 commented Jul 1, 2023

Wondering will the feature also be enabled by default in 7.4.0 GA eventually? So I don't have to detect PowerShell version -> set ExperimentalFeature every time in my PS script.

@iSazonov
Copy link
Collaborator

iSazonov commented Jul 3, 2023

@hez2010 Will if we get no negative feedbacks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CL-Engine Indicates that a PR should be marked as an engine change in the Change Log Extra Large PowerShell-Docs needed The PR was reviewed and a PowerShell Docs update is needed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet