Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How should Proc::Async.new, run and shell call cmd.exe? #20

Closed
patrickbkr opened this issue Apr 29, 2019 · 33 comments · Fixed by MoarVM/MoarVM#1265 or Raku/nqp#610
Closed

How should Proc::Async.new, run and shell call cmd.exe? #20

patrickbkr opened this issue Apr 29, 2019 · 33 comments · Fixed by MoarVM/MoarVM#1265 or Raku/nqp#610
Assignees
Labels
rakudo Big changes to Rakudo

Comments

@patrickbkr
Copy link
Member

On Windows it is not possible to call cmd.exe with complex arguments. Calling .bat files is affected as well. Rakudo bug r#2005 also reports this.

Short problem description

On Windows the API to start programs (CreateProcess) does not take an array, but a single string. There is a convention nearly all programs adhere to of how to turn an array of arguments into a single string (a well defined quoting). This convention is what rakudo currently implements (via libuv, which does the actual quoting).
Not all programs adhere to this convention. Most prominently cmd.exe and as a result .bat scripts.
When trying to call a .bat file rakudo applies the usual quoting to the commandline arguments and as a result makes it impossible to pass some arguments.

Perl 6 functionality affected: Proc::Async, shell and run.

How could the API look that in addition to the current behaviour also allows to call cmd.exe and any other program not adhering to the commandline quoting convention?

@patrickbkr
Copy link
Member Author

patrickbkr commented Apr 29, 2019

How command line processing on Windows works

Do read this awesome writeup of how commandline processing works on Windows.

Also don't miss this bugreport that already has a proposal of how to handle this in Perl 6.

Windows does not have the concept of an ARGV array. At the API level there is only one single string that is passed. There is a convention (CommandLineToArgv() implements that) of how a string array is converted to a single string and back. With rare exceptions all programs adhere to this convention. The most prominent exception is cmd.exe. cmd.exe just directly processes its argument string as if you'd have pasted it into a cmd.exe window. When calling a .bat file, Windows implicitly wraps the call in cmd.exe /C <your.bat stuff>. (Not entirely sure what exact command Windows actually puts around the command, it's difficult to find out.) Thus calling any .bat file is subject to cmd.exe processing. One usually doesn't want that.

When calling an .exe file through cmd.exe, then cmd.exe will process the argument string first and the output is afterwards dequoted using CommandLineToArgv(). When calling a .bat file with cmd.exe (irrespective of implicitly or explicitly), then CommandLineToArgv() will not be called. The arguments will arrive in the .bat file as cmd.exe processed them.

The CommandLineToArgv() quoting (CLTA)

If the argument contains \n\t\v" it needs quotation as follows. Put it in "" and put a \ before every " the argument contains. If a contained " is preceded by one or more \ , duplicate all the \ . Also duplicate all \ at the end of the argument. One must not duplicate any other \ . All the so quoted and unquoted arguments are joined with a single space in between.

The cmd quoting

  • Either: Prefix all cmd.exe metacharacters (those are ( ) % ! ^ " < > & |) with the escape character ^. It's not possible to escape spaces using ^.
  • Or: Wrap arguments in " and prefix all contained " with ^

@patrickbkr
Copy link
Member Author

patrickbkr commented Apr 29, 2019

Five possible usage scenarios

In decreasing order of commonness.

  1. I want to run a exe file
    • Then I don't use cmd.exe
    • I want CLTA quoting
  2. I want to run a bat file
    • I don't want any cmd.exe side effects
    • I thus need cmd quoting, but no CLTA quoting
  3. I want to do shell stuff (or use one of the other programs that have different argument parsing, e.g. nmake)
    • Irrespective whether I want to start a .exe or .bat or something else altogether: I want the side effects cmd.exe has
    • I want no quoting at all. Otherwise I'd circumvent the functionality of cmd.exe
  4. I want to do something bare metal. I know what i do:
    • I want no quoting at all
  5. I want to run a exe file via cmd.exe, but don't want any cmd.exe side effects
    • I'd want CLTA quoting first, then cmd quoting on the result
    • That's what the Microsoft blog recommends.
    • Why would one want to do such a thing and not just call the exe file directly?

Number 4. and 5. won't happen in any normal usage scenarios.

What others do

libuv quotes arguments compatible to CommandLineToArgv(). When passing the UV_PROCESS_WINDOWS_VERBATIM_ARGUMENTS argument this quoting is disabled and the arguments are just joined with a space.

node exposes a shell flag which wraps the call in cmd.exe /d /s /c "<command>" by hand and enables UV_PROCESS_WINDOWS_VERBATIM_ARGUMENTS.

javas ProcessBuilder has a very sophisticated and complex processing of arguments. e.g. it detects whether the user called a .bat file or .cmd file and acts on that.

@patrickbkr
Copy link
Member Author

patrickbkr commented Apr 29, 2019

Initial proposal

A good API should:

  • Work in the common case (CLTA quoting) without the user having to know about the ugly details.
  • Give enough control to the user to call any program the way he wants.
  • Make sure the changes don't make the API more complex for the use cases not affected by this.

Proc::Async idea 1

Add a new enum WindowsProcQuoting and add an additional argument to Proc::Async to select the quoting to use.

enum WindowsProcQuoting < normal, cmd, off >;
multi method new(:$path, :@args,  :$w, :$enc, :$translate-nl, WindowsProcQuoting :$win-quote = normal --> Proc::Async:D)

Proc::Async idea 2

Add a new flag to the Proc::Async constructor to turn quoting off. Add quoting subs.

multi method new(:$path, :@args,  :$w, :$enc, :$translate-nl, :$win-quote-args = True --> Proc::Async:D)
sub win-quote-normal(@args --> List)
sub win-quote-cmd(@args --> List)

run idea

run should be changed analogous to Proc::Async.

shell idea

shell should never quote its arguments.

@patrickbkr
Copy link
Member Author

ping @jnthn, @ugexe
Can I encourage you to give your opinion?
I'm willing to work on this, but I'd like to have consensus first on how the API should look first.

@ugexe
Copy link
Contributor

ugexe commented Feb 27, 2020

I like the java way of doing it (+ a flag to disable it if they don't do that) because it doesn't require the programmer to have as much of an understanding of e.g. Windows to write platform-independent code. Not thrilled about the idea of speccing some potentially complex and heuristic-based logic though.

The :windows-specific-flag alternative is acceptable, but I do feel like exposing the platform-specificness is a bit weird.

@raiph
Copy link

raiph commented Feb 28, 2020

(Edited to simplify.)

This comment is my 2c on the API / paintwork, either reflecting my confusion or, if I'm not confused, reflecting my concerns and attempts to reduce confusion:

enum arg-processing < default CommandLineArgvW cmd-exe WINDOWS_VERBATIM >;
:arg-processing = ...

I named the overall enum "arg-processing" rather than something Windows specific to establish an overall context, so someone encountering it gets that that's the overall context. And then:

  • default (could be auto) instead of normal more strongly suggests Rakudo has a default enum if you don't specify a choice.

  • CommandLineArgvW is named so that the first page listed when googling for it says "Parses a Unicode command line string and returns an array of pointers to the command line arguments, along with a count of such arguments, in a way that is similar to the standard C run-time argv and argc values." I consider it useful that a google will immediately produce a result that is both exactly the right direction and both obviously and authoritatively exactly the right direction.

  • WINDOWS_VERBATIM is so that a google for "WINDOWS_VERBATIM" provides good results.

Putting these together, our doc might say something like:

On unix default does what the standard C run-time does. On Windows it does the nearest Windows equivalent ("CommandLineArgvW") most of the time, cmd-exe when that makes sense, and WINDOWS_VERBATIM (which corresponds to libuv's option) if you say so.

@patrickbkr
Copy link
Member Author

@raiph

enum arg-processing < default CommandLineArgvW cmd-exe WINDOWS_VERBATIM >; :arg-processing = ...
As I understand it you either get the default, and live with it, or use one of three Windows specific options. Is that right?
If I understand correctly the above is your proposed alternative for the following I proposed:
enum WindowsProcQuoting < normal, cmd, off >;

Here is how I intended the respective parameters to influence processing:

  • normal: Does CommandLineArgvW quoting.
  • cmd: Escapes the string so cmd.exe doesn't transform it.
  • off: Concatenates the arguments with a single space, nothing else. (The flag to trigger this mode of quoting in libuv is called UV_PROCESS_WINDOWS_VERBATIM_ARGUMENTS).

Now on to your proposal.
I'm unsure what default should do on Windows. Multiple strings need to be concatenated in some way. There is no "we just don't do anything to the parameters" option.
If I understand correctly the other three options you list are different names for the options I listed:

  • CommandLineArgvW corresponds to normal
  • cmd-exe corresponds to cmd
  • WINDOWS_VERBATIM corresponds to off
    I do like the idea of picking names with a high "googleability". The names you chose are definitely better than the ones I chose in that respect.
    CommandLineArgvW does not apply on other platforms though (in contrast to normal). We might be able to solve this by simply providing different methods on different OSes. Only the one on Windows would accept the enum parameter.

@patrickbkr
Copy link
Member Author

@ugexe I fully agree with all of your points. Sadly I haven't been able to come up with a solution that fits all bills. I'm open for other suggestions!

@raiph
Copy link

raiph commented Feb 28, 2020

Some links I found during my research of this issue that might be useful.

A key comment in a libuv issue: libuv/libuv#2627 (comment)

Two SOs that may only be useful because they're so monumental and scary that it emphasizes the need for a "Give enough control to the user to call any program the way he wants." escape hatch: https://stackoverflow.com/questions/6828751/batch-character-escaping and https://stackoverflow.com/questions/4094699/how-does-the-windows-command-interpreter-cmd-exe-parse-scripts

A link to fellow PL folk (Julia) in the process of figuring out related issues: JuliaLang/julia#34111

@raiph
Copy link

raiph commented Feb 28, 2020

@patrickbkr You have it about right for my paintjob. To sort the one point of confusion, my intent was that default and CommandLineArgvW are aliases and are just a rename of your normal option. Now I'm thinking it might make sense to rename normal to portable or unix-conventions or just unix.

Is your cmd option in essence rakudo-escapes-and-or-quotes-user's-command-line-so-that-when-cmd-exe-processes-it-it-turns-the-command-line-back-into-what-it-was-before-rakudo-escaped-and-or-quoted-it?

@patrickbkr
Copy link
Member Author

patrickbkr commented Feb 29, 2020

Is your cmd option in essence rakudo-escapes-and-or-quotes-user's-command-line-so-that-when-cmd-exe-processes-it-it-turns-the-command-line-back-into-what-it-was-before-rakudo-escaped-and-or-quoted-it?

Exactly.

@raiph
Copy link

raiph commented Mar 2, 2020

Thanks for following up in that reddit @patrickbkr. I suspect you got nothing at all out of it but I wanted to try. Also, I promised folk in my reddit post I'd follow up today if no one else had replied to them. There is one remaining as-yet-unanswered follow up to a comment you made @patrickbkr, and one other that didn't get any response so I said I'd post it here:

Do you have an API to just pass command line as a string?

@patrickbkr
Copy link
Member Author

@raiph As far as I know we do not have any API to directly pass a string. Because of that it's sometimes impossible to work around this issue - there is no API to fall back to.

@raiph
Copy link

raiph commented Mar 2, 2020

@patrickbkr To try to be clear, I presumed there was no API to fall back on at the moment. But I also thought that your proposal, which I imagine is what the poster was asking about, was to provide an API that, among other cases, covered this passthru case so that a user could indeed do whatever they wanted provided they called the code they needed to call. I thought this was perhaps (probably?) available via the off / VERBATIM option, but I confess to being confused enough about this that I decided I would spend hours reading online articles and comments by MS and others, and write a reddit post, to see if that led me to a place in which I better understood. I'm not there yet but continue to hope I can usefully play a role in confirming your API is the right one by not only playing the role of being a bear of very little brain but actually being one.

@patrickbkr
Copy link
Member Author

@raiph Yes, I was talking about our current situation. The new API will hopefully provide the means to be able to just pass a single string that won't be processed (via the off / VERBATIM option). You summed it up nicely.

@patrickbkr
Copy link
Member Author

Thinking about this some more. I'm a bit uneasy about my no. 5 use case.

  1. I want to run a exe file via cmd.exe, but don't want any cmd.exe side effects
    • I'd want CLTA quoting first, then cmd quoting on the result
    • That's what the Microsoft blog recommends.
    • Why would one want to do such a thing and not just call the exe file directly?

I don't see when one would actually need such double quoting. But that's exactly what the Microsoft blog promotes. So I guess I miss the point there. It would be a bad idea to just ignore that use case then.

@patrickbkr
Copy link
Member Author

patrickbkr commented Mar 7, 2020

My two API ideas each have a tradeoff:

Idea 1: We are limited to three given built in quoting systems that are provided. Given you need a different one, you have to set it to off and quote the command line manually. I find this especially annoying, because it seems double quoting (first CLTA, then cmd quoting) is necessary in some situations and the quoting possibilities I listed don't include that option.

Idea 2: This proposal uses separate functions. Of the top off my head I can't remember any other place in the Raku setting where there are such dependent functions. i. e. One needs to know about the existence of the win-quote-normal and win-quote-cmd functions. I don't like that.

I currently lean toward Idea 1. We can add other options later when actual use cases show up. And it's still possible to turn automatic quoting off entirely and do it all by yourself.

@raiph
Copy link

raiph commented Mar 16, 2020

I miss the point there. It would be a bad idea to just ignore that use case

It seems like you're saying that you don't get why that MS person recommend this, but you nevertheless think they must surely have a point, perhaps even a good one.

As another data point, let me quote a 2017 perlmonks post:

Everyone quotes command line arguments the wrong way is quite funny, in a sad way, and it is wrong.

afoken's profile page includes plenty of evidence they're to be taken seriously on this topic. More importantly, their comment has "Reputation: 22 (no significant downvotes)?" and no rebuttal. For a comment like it, that voting/reply/rebuttal status on perlmonks speaks very loudly.

I grant that it's probably written from the perspective of someone whose main focus is perl but, to the degree there is any conflict between the perspective the MS writer provides, and the one afoken provides, but you must make simplifying presumptions (even if it's the presumption "I can't afford to make such-and-such a presumption"), I'd recommend erring on the side of being consistent with afoken's perspective, rather than trusting the perspective of a random MS blog poster.

@patrickbkr
Copy link
Member Author

More thoughts:

There is no reliable way to escape parameters for use with CMD. CMD splits its argument string on space (and ,, ; and =) to create the arguments. ^ can not be used to escape a space to include that space in an argument and not split on it. One can wrap the individual arguments in " " to include a space in an argument, but then the " end up in that argument!
Because of that I don't any more think it makes sense to support CMD quoting.

So I once again change my mind and now favour my Idea No. 2: Provide a toggle to turn off argument quoting on Windows. But I wouldn't provide separate quoting subs anymore. The CLTR quoting sub is not needed as that quoting is already active by default. A CMD quoting sub should be omitted because it can't be made to work right.

This is pretty much what node.js decided to do.

I have a basic implementation of this running.
A problem I ran in to though: The Java API provides no way to influence how the arguments of a process are put together. So, apart from reimplementing the Java process handling, there is no way to make Java provide the functionality we would need. I'm not sure how to proceed here.

@patrickbkr
Copy link
Member Author

I have looked around a bit. It seems there really is no sensible way to make Java cooperate. So I'll just fall back to using Javas built in logic when running on the JVM.

patrickbkr pushed a commit to patrickbkr/rakudo that referenced this issue Mar 27, 2020
patrickbkr pushed a commit to patrickbkr/MoarVM that referenced this issue Mar 27, 2020
patrickbkr pushed a commit to patrickbkr/rakudo that referenced this issue Mar 28, 2020
@patrickbkr
Copy link
Member Author

I have created the above two draft PRs. They implement a $win-verbatim-args argument for sub run() and Proc::spawn(). Quoting on Windows is done in Rakudo and does not rely on the backend anymore.

Feedback welcome!

patrickbkr pushed a commit to patrickbkr/nqp that referenced this issue Apr 3, 2020
@patrickbkr
Copy link
Member Author

Above is another PR for NQP that adapts the JS backend to the proposed solution.

@patrickbkr
Copy link
Member Author

I think there is nothing I can do to push this any further. So this ticket is now blocking on feedback.

@ugexe
Copy link
Contributor

ugexe commented Apr 6, 2020

I wonder if its worth having a test that checks the command line arguments come out correct via something like is run("echo","q|foo|").command, (...);. Not so much to test what it does, but more as a form of behavior documentation one can look at and see the before and after.

Do you happen to know if we can get the escaped command the JVM would run in such a way that we could set Proc.command to that value? If so then the aforementioned test could potentially work on the JVM backend.

I wanted to modify zefs AppVeyor to use a path with a space to see what would happen, but I have not gotten around to that. However -- this all appears sane to me.

@patrickbkr
Copy link
Member Author

If I understand your comment correctly you'd like to have tests to check the argument string that is handed to CreateProcess() (or the resulting string we created in win-quote-CommandLineToArgvW(), should be the same). According to the docs (and also a quick glance over the code) that's not what Proc.command returns. That returns the arguments it was given by the user.
Do we want to have a method on the Proc object returning the string sent to the other process? Proc.win-command-string?

Alternatively we could expose the win-quote-CommandLineToArgvW function and just test it directly.

As far as I know the JVM has no way of retrieving the thing it sent out.

I wrote a set of very platform specific tests on Windows using a custom bat script that dumps its args and a native executable doing the same. But that only checks that the args we expect arrive in the called application (so already deserialized). With a bit of bodging I might be able to create a native executable that directly prints the single argument string it is passed. One could then check for that. Not sure how good an idea it is to have the test suite create native executables for testing though.

None of the above ideas are particularly pretty.

@ugexe
Copy link
Contributor

ugexe commented Apr 7, 2020

Edit: it does make sense to be able to run .command back through run, in which case the escaped .command would not do the right thing. So definitively scratch that.

win-quote-CommandLineToArgvW could be marked as an implementation-detail to allow testing without making the routine explicitly public.

@niner
Copy link

niner commented Apr 8, 2020

Why not run a raku script to dump those arguments? I'd think our handling of incoming arguments is pretty solid by now.

@patrickbkr
Copy link
Member Author

@niner: I can do that (I can then even reuse the tests I already have). But when doing this naively I only test what arguments arrive in the target process (the already deserialized array of command arguments), not how the single quoted argument string looks like. To actually look at that quoted argument string that was actually passed to CreateProcess() one can use GetCommandLineW() The string GetCommandLineW() returns might still not be entirely unchanged, but good enough.
Is it OK to do nativecall stuff in tests?

@niner
Copy link

niner commented Apr 8, 2020

Sure, NativeCall seems like the right tool for the job

@patrickbkr
Copy link
Member Author

Tests added to a roast PR.

@patrickbkr
Copy link
Member Author

Documentation added in a doc PR. I think that's all I can do.

So this ticket now needs feedback to progress further.

patrickbkr pushed a commit to patrickbkr/problem-solving that referenced this issue Apr 23, 2020
@patrickbkr
Copy link
Member Author

I read the manual for the problem-solving repo again and realized a PR with a solution description is necessary. So here it is: Windows-process-argument-quoting.md

patrickbkr pushed a commit to patrickbkr/rakudo that referenced this issue Apr 24, 2020
patrickbkr pushed a commit to patrickbkr/nqp that referenced this issue Apr 24, 2020
jnthn added a commit that referenced this issue Jun 3, 2020
Solution for #20: Windows process argument quoting
patrickbkr pushed a commit to patrickbkr/rakudo that referenced this issue Jun 5, 2020
patrickbkr pushed a commit to patrickbkr/nqp that referenced this issue Jun 5, 2020
patrickbkr pushed a commit to patrickbkr/MoarVM that referenced this issue Jun 5, 2020
@patrickbkr
Copy link
Member Author

The solution document was approved and merged. So this issue is now solved. The implementation will be merged shortly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
rakudo Big changes to Rakudo
Projects
None yet
6 participants