Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How should Proc::Async.new, run and shell call cmd.exe? #20

Open
patrickbkr opened this issue Apr 29, 2019 · 24 comments · May be fixed by MoarVM/MoarVM#1265 or Raku/nqp#610
Open

How should Proc::Async.new, run and shell call cmd.exe? #20

patrickbkr opened this issue Apr 29, 2019 · 24 comments · May be fixed by MoarVM/MoarVM#1265 or Raku/nqp#610
Assignees
Labels

Comments

@patrickbkr
Copy link
Member

@patrickbkr patrickbkr commented Apr 29, 2019

On Windows it is not possible to call cmd.exe with complex arguments. Calling .bat files is affected as well. Rakudo bug r#2005 also reports this.

Short problem description

On Windows the API to start programs (CreateProcess) does not take an array, but a single string. There is a convention nearly all programs adhere to of how to turn an array of arguments into a single string (a well defined quoting). This convention is what rakudo currently implements (via libuv, which does the actual quoting).
Not all programs adhere to this convention. Most prominently cmd.exe and as a result .bat scripts.
When trying to call a .bat file rakudo applies the usual quoting to the commandline arguments and as a result makes it impossible to pass some arguments.

Perl 6 functionality affected: Proc::Async, shell and run.

How could the API look that in addition to the current behaviour also allows to call cmd.exe and any other program not adhering to the commandline quoting convention?

@patrickbkr

This comment has been minimized.

Copy link
Member Author

@patrickbkr patrickbkr commented Apr 29, 2019

How command line processing on Windows works

Do read this awesome writeup of how commandline processing works on Windows.

Also don't miss this bugreport that already has a proposal of how to handle this in Perl 6.

Windows does not have the concept of an ARGV array. At the API level there is only one single string that is passed. There is a convention (CommandLineToArgv() implements that) of how a string array is converted to a single string and back. With rare exceptions all programs adhere to this convention. The most prominent exception is cmd.exe. cmd.exe just directly processes its argument string as if you'd have pasted it into a cmd.exe window. When calling a .bat file, Windows implicitly wraps the call in cmd.exe /C <your.bat stuff>. (Not entirely sure what exact command Windows actually puts around the command, it's difficult to find out.) Thus calling any .bat file is subject to cmd.exe processing. One usually doesn't want that.

When calling an .exe file through cmd.exe, then cmd.exe will process the argument string first and the output is afterwards dequoted using CommandLineToArgv(). When calling a .bat file with cmd.exe (irrespective of implicitly or explicitly), then CommandLineToArgv() will not be called. The arguments will arrive in the .bat file as cmd.exe processed them.

The CommandLineToArgv() quoting (CLTA)

If the argument contains \n\t\v" it needs quotation as follows. Put it in "" and put a \ before every " the argument contains. If a contained " is preceded by one or more \ , duplicate all the \ . Also duplicate all \ at the end of the argument. One must not duplicate any other \ . All the so quoted and unquoted arguments are joined with a single space in between.

The cmd quoting

  • Either: Prefix all cmd.exe metacharacters (those are ( ) % ! ^ " < > & |) with the escape character ^. It's not possible to escape spaces using ^.
  • Or: Wrap arguments in " and prefix all contained " with ^
@patrickbkr

This comment has been minimized.

Copy link
Member Author

@patrickbkr patrickbkr commented Apr 29, 2019

Five possible usage scenarios

In decreasing order of commonness.

  1. I want to run a exe file
    • Then I don't use cmd.exe
    • I want CLTA quoting
  2. I want to run a bat file
    • I don't want any cmd.exe side effects
    • I thus need cmd quoting, but no CLTA quoting
  3. I want to do shell stuff (or use one of the other programs that have different argument parsing, e.g. nmake)
    • Irrespective whether I want to start a .exe or .bat or something else altogether: I want the side effects cmd.exe has
    • I want no quoting at all. Otherwise I'd circumvent the functionality of cmd.exe
  4. I want to do something bare metal. I know what i do:
    • I want no quoting at all
  5. I want to run a exe file via cmd.exe, but don't want any cmd.exe side effects
    • I'd want CLTA quoting first, then cmd quoting on the result
    • That's what the Microsoft blog recommends.
    • Why would one want to do such a thing and not just call the exe file directly?

Number 4. and 5. won't happen in any normal usage scenarios.

What others do

libuv quotes arguments compatible to CommandLineToArgv(). When passing the UV_PROCESS_WINDOWS_VERBATIM_ARGUMENTS argument this quoting is disabled and the arguments are just joined with a space.

node exposes a shell flag which wraps the call in cmd.exe /d /s /c "<command>" by hand and enables UV_PROCESS_WINDOWS_VERBATIM_ARGUMENTS.

javas ProcessBuilder has a very sophisticated and complex processing of arguments. e.g. it detects whether the user called a .bat file or .cmd file and acts on that.

@patrickbkr

This comment has been minimized.

Copy link
Member Author

@patrickbkr patrickbkr commented Apr 29, 2019

Initial proposal

A good API should:

  • Work in the common case (CLTA quoting) without the user having to know about the ugly details.
  • Give enough control to the user to call any program the way he wants.
  • Make sure the changes don't make the API more complex for the use cases not affected by this.

Proc::Async idea 1

Add a new enum WindowsProcQuoting and add an additional argument to Proc::Async to select the quoting to use.

enum WindowsProcQuoting < normal, cmd, off >;
multi method new(:$path, :@args,  :$w, :$enc, :$translate-nl, WindowsProcQuoting :$win-quote = normal --> Proc::Async:D)

Proc::Async idea 2

Add a new flag to the Proc::Async constructor to turn quoting off. Add quoting subs.

multi method new(:$path, :@args,  :$w, :$enc, :$translate-nl, :$win-quote-args = True --> Proc::Async:D)
sub win-quote-normal(@args --> List)
sub win-quote-cmd(@args --> List)

run idea

run should be changed analogous to Proc::Async.

shell idea

shell should never quote its arguments.

@patrickbkr

This comment has been minimized.

Copy link
Member Author

@patrickbkr patrickbkr commented Feb 27, 2020

ping @jnthn, @ugexe
Can I encourage you to give your opinion?
I'm willing to work on this, but I'd like to have consensus first on how the API should look first.

@ugexe

This comment has been minimized.

Copy link
Contributor

@ugexe ugexe commented Feb 27, 2020

I like the java way of doing it (+ a flag to disable it if they don't do that) because it doesn't require the programmer to have as much of an understanding of e.g. Windows to write platform-independent code. Not thrilled about the idea of speccing some potentially complex and heuristic-based logic though.

The :windows-specific-flag alternative is acceptable, but I do feel like exposing the platform-specificness is a bit weird.

@raiph

This comment has been minimized.

Copy link

@raiph raiph commented Feb 28, 2020

Despite reading a bunch, I'm still confused about this issue or have some minor concerns. This comment provides input from others and from myself. Please read both parts of it (about the reddit post, and then my 2c) before posting anything. Thanks.

First, please read the following reddit thread that I started yesterday to get feedback on this issue:

https://www.reddit.com/r/programming/comments/fakzl4/link_to_old_ms_article_everyone_quotes_command/

Some of the comments in the thread so far correspond to some of my areas of confusion/concern.

If you wish, please respond to comments directly on reddit. If you prefer not to comment on reddit, please respond to their questions and comments in this thread and I will then follow up on reddit.

In the meantime, the rest of this comment is my 2c on the API / paintwork, either reflecting my confusion or, if I'm not confused, reflecting my concerns and attempts to reduce confusion:

enum arg-processing < default CommandLineArgvW cmd-exe WINDOWS_VERBATIM >;
:arg-processing = ...

As I understand it you either get the default, and live with it, or use one of three Windows specific options. Is that right?

Calling the default "normal" is imo weighted slightly wrongly in several ways. It has an accidental hint of derogatory attitude toward Windows that, while arguably deserved, equally arguably is out of place. It accidentally implies a reason for anxiety about switching away from it to an "abnormal" alternative. Again, that's arguably deserved but doesn't belong in the API. And "normal" doesn't hint at any particular relationship with the alternatives except inasmuch as it hints that it is perhaps notably different to all of them.

The word "default" conveys something close to "normal" so it's not a win in that regard. And it also has an unclear relationship with the alternatives. But -- and this is why I suggest it -- it slightly biases toward the sense it might just be the same as one of them. I consider this a good thing because...

Calling the next listed option "CommandLineArgvW" is so that the first page listed when googling for it says:

Parses a Unicode command line string and returns an array of pointers to the command line arguments, along with a count of such arguments, in a way that is similar to the standard C run-time argv and argc values.

I consider it useful in its own right that a google will immediately produce a result that is both exactly the right direction and both obviously and authoritatively exactly the right direction. Which is one of the reasons I'm suggesting it as the enum identifier.

But more pertinent to my point about "default", I think the word "similar" in MS's doc helps point an investigating user in the right direction inasmuch as it is simultaneously similar to the standard C run-time processing and not quite the same. And it makes it easy for our doc to leverage that when describing what the "default" option does, something like:

On unix it does what the standard C run-time does. On Windows it does the nearest Windows equivalent ("CommandLineArgvW") most of the time, "cmd-exe" when that makes sense, and WINDOWS_VERBATIM (which corresponds to libuv's option) if you say so.

@patrickbkr

This comment has been minimized.

Copy link
Member Author

@patrickbkr patrickbkr commented Feb 28, 2020

@raiph

enum arg-processing < default CommandLineArgvW cmd-exe WINDOWS_VERBATIM >; :arg-processing = ...
As I understand it you either get the default, and live with it, or use one of three Windows specific options. Is that right?
If I understand correctly the above is your proposed alternative for the following I proposed:
enum WindowsProcQuoting < normal, cmd, off >;

Here is how I intended the respective parameters to influence processing:

  • normal: Does CommandLineArgvW quoting.
  • cmd: Escapes the string so cmd.exe doesn't transform it.
  • off: Concatenates the arguments with a single space, nothing else. (The flag to trigger this mode of quoting in libuv is called UV_PROCESS_WINDOWS_VERBATIM_ARGUMENTS).

Now on to your proposal.
I'm unsure what default should do on Windows. Multiple strings need to be concatenated in some way. There is no "we just don't do anything to the parameters" option.
If I understand correctly the other three options you list are different names for the options I listed:

  • CommandLineArgvW corresponds to normal
  • cmd-exe corresponds to cmd
  • WINDOWS_VERBATIM corresponds to off
    I do like the idea of picking names with a high "googleability". The names you chose are definitely better than the ones I chose in that respect.
    CommandLineArgvW does not apply on other platforms though (in contrast to normal). We might be able to solve this by simply providing different methods on different OSes. Only the one on Windows would accept the enum parameter.
@patrickbkr

This comment has been minimized.

Copy link
Member Author

@patrickbkr patrickbkr commented Feb 28, 2020

@ugexe I fully agree with all of your points. Sadly I haven't been able to come up with a solution that fits all bills. I'm open for other suggestions!

@raiph

This comment has been minimized.

Copy link

@raiph raiph commented Feb 28, 2020

Some links I found during my research of this issue that might be useful.

A key comment in a libuv issue: libuv/libuv#2627 (comment)

Two SOs that may only be useful because they're so monumental and scary that it emphasizes the need for a "Give enough control to the user to call any program the way he wants." escape hatch: https://stackoverflow.com/questions/6828751/batch-character-escaping and https://stackoverflow.com/questions/4094699/how-does-the-windows-command-interpreter-cmd-exe-parse-scripts

A link to fellow PL folk (Julia) in the process of figuring out related issues: JuliaLang/julia#34111

@raiph

This comment has been minimized.

Copy link

@raiph raiph commented Feb 28, 2020

@patrickbkr You have it about right for my paintjob. To sort the one point of confusion, my intent was that default and CommandLineArgvW are aliases and are just a rename of your normal option. Now I'm thinking it might make sense to rename normal to portable or unix-conventions or just unix.

Is your cmd option in essence rakudo-escapes-and-or-quotes-user's-command-line-so-that-when-cmd-exe-processes-it-it-turns-the-command-line-back-into-what-it-was-before-rakudo-escaped-and-or-quoted-it?

@patrickbkr

This comment has been minimized.

Copy link
Member Author

@patrickbkr patrickbkr commented Feb 29, 2020

Is your cmd option in essence rakudo-escapes-and-or-quotes-user's-command-line-so-that-when-cmd-exe-processes-it-it-turns-the-command-line-back-into-what-it-was-before-rakudo-escaped-and-or-quoted-it?

Exactly.

@raiph

This comment has been minimized.

Copy link

@raiph raiph commented Mar 2, 2020

Thanks for following up in that reddit @patrickbkr. I suspect you got nothing at all out of it but I wanted to try. Also, I promised folk in my reddit post I'd follow up today if no one else had replied to them. There is one remaining as-yet-unanswered follow up to a comment you made @patrickbkr, and one other that didn't get any response so I said I'd post it here:

Do you have an API to just pass command line as a string?

@patrickbkr

This comment has been minimized.

Copy link
Member Author

@patrickbkr patrickbkr commented Mar 2, 2020

@raiph As far as I know we do not have any API to directly pass a string. Because of that it's sometimes impossible to work around this issue - there is no API to fall back to.

@raiph

This comment has been minimized.

Copy link

@raiph raiph commented Mar 2, 2020

@patrickbkr To try to be clear, I presumed there was no API to fall back on at the moment. But I also thought that your proposal, which I imagine is what the poster was asking about, was to provide an API that, among other cases, covered this passthru case so that a user could indeed do whatever they wanted provided they called the code they needed to call. I thought this was perhaps (probably?) available via the off / VERBATIM option, but I confess to being confused enough about this that I decided I would spend hours reading online articles and comments by MS and others, and write a reddit post, to see if that led me to a place in which I better understood. I'm not there yet but continue to hope I can usefully play a role in confirming your API is the right one by not only playing the role of being a bear of very little brain but actually being one.

@patrickbkr

This comment has been minimized.

Copy link
Member Author

@patrickbkr patrickbkr commented Mar 2, 2020

@raiph Yes, I was talking about our current situation. The new API will hopefully provide the means to be able to just pass a single string that won't be processed (via the off / VERBATIM option). You summed it up nicely.

@patrickbkr

This comment has been minimized.

Copy link
Member Author

@patrickbkr patrickbkr commented Mar 7, 2020

Thinking about this some more. I'm a bit uneasy about my no. 5 use case.

  1. I want to run a exe file via cmd.exe, but don't want any cmd.exe side effects
    • I'd want CLTA quoting first, then cmd quoting on the result
    • That's what the Microsoft blog recommends.
    • Why would one want to do such a thing and not just call the exe file directly?

I don't see when one would actually need such double quoting. But that's exactly what the Microsoft blog promotes. So I guess I miss the point there. It would be a bad idea to just ignore that use case then.

@patrickbkr

This comment has been minimized.

Copy link
Member Author

@patrickbkr patrickbkr commented Mar 7, 2020

My two API ideas each have a tradeoff:

Idea 1: We are limited to three given built in quoting systems that are provided. Given you need a different one, you have to set it to off and quote the command line manually. I find this especially annoying, because it seems double quoting (first CLTA, then cmd quoting) is necessary in some situations and the quoting possibilities I listed don't include that option.

Idea 2: This proposal uses separate functions. Of the top off my head I can't remember any other place in the Raku setting where there are such dependent functions. i. e. One needs to know about the existence of the win-quote-normal and win-quote-cmd functions. I don't like that.

I currently lean toward Idea 1. We can add other options later when actual use cases show up. And it's still possible to turn automatic quoting off entirely and do it all by yourself.

@raiph

This comment has been minimized.

Copy link

@raiph raiph commented Mar 16, 2020

I miss the point there. It would be a bad idea to just ignore that use case

It seems like you're saying that you don't get why that MS person recommend this, but you nevertheless think they must surely have a point, perhaps even a good one.

As another data point, let me quote a 2017 perlmonks post:

Everyone quotes command line arguments the wrong way is quite funny, in a sad way, and it is wrong.

afoken's profile page includes plenty of evidence they're to be taken seriously on this topic. More importantly, their comment has "Reputation: 22 (no significant downvotes)?" and no rebuttal. For a comment like it, that voting/reply/rebuttal status on perlmonks speaks very loudly.

I grant that it's probably written from the perspective of someone whose main focus is perl but, to the degree there is any conflict between the perspective the MS writer provides, and the one afoken provides, but you must make simplifying presumptions (even if it's the presumption "I can't afford to make such-and-such a presumption"), I'd recommend erring on the side of being consistent with afoken's perspective.

@patrickbkr

This comment has been minimized.

Copy link
Member Author

@patrickbkr patrickbkr commented Mar 21, 2020

More thoughts:

There is no reliable way to escape parameters for use with CMD. CMD splits its argument string on space (and ,, ; and =) to create the arguments. ^ can not be used to escape a space to include that space in an argument and not split on it. One can wrap the individual arguments in " " to include a space in an argument, but then the " end up in that argument!
Because of that I don't any more think it makes sense to support CMD quoting.

So I once again change my mind and now favour my Idea No. 2: Provide a toggle to turn off argument quoting on Windows. But I wouldn't provide separate quoting subs anymore. The CLTR quoting sub is not needed as that quoting is already active by default. A CMD quoting sub should be omitted because it can't be made to work right.

This is pretty much what node.js decided to do.

I have a basic implementation of this running.
A problem I ran in to though: The Java API provides no way to influence how the arguments of a process are put together. So, apart from reimplementing the Java process handling, there is no way to make Java provide the functionality we would need. I'm not sure how to proceed here.

@patrickbkr

This comment has been minimized.

Copy link
Member Author

@patrickbkr patrickbkr commented Mar 21, 2020

I have looked around a bit. It seems there really is no sensible way to make Java cooperate. So I'll just fall back to using Javas built in logic when running on the JVM.

patrickbkr pushed a commit to patrickbkr/rakudo that referenced this issue Mar 27, 2020
Patrick Böker
patrickbkr pushed a commit to patrickbkr/MoarVM that referenced this issue Mar 27, 2020
patrickbkr pushed a commit to patrickbkr/rakudo that referenced this issue Mar 28, 2020
Patrick Böker
@patrickbkr

This comment has been minimized.

Copy link
Member Author

@patrickbkr patrickbkr commented Mar 28, 2020

I have created the above two draft PRs. They implement a $win-verbatim-args argument for sub run() and Proc::spawn(). Quoting on Windows is done in Rakudo and does not rely on the backend anymore.

Feedback welcome!

patrickbkr pushed a commit to patrickbkr/nqp that referenced this issue Apr 3, 2020
Together with a change in rakudo this fixes Raku/problem-solving#20.
@patrickbkr patrickbkr linked a pull request that will close this issue Apr 3, 2020
@patrickbkr

This comment has been minimized.

Copy link
Member Author

@patrickbkr patrickbkr commented Apr 3, 2020

Above is another PR for NQP that adapts the JS backend to the proposed solution.

@patrickbkr

This comment has been minimized.

Copy link
Member Author

@patrickbkr patrickbkr commented Apr 4, 2020

I think there is nothing I can do to push this any further. So this ticket is now blocking on feedback.

@ugexe

This comment has been minimized.

Copy link
Contributor

@ugexe ugexe commented Apr 6, 2020

I wonder if its worth having a test that checks the command line arguments come out correct via something like is run("echo","q|foo|").command, (...);. Not so much to test what it does, but more as a form of behavior documentation one can look at and see the before and after.

Do you happen to know if we can get the escaped command the JVM would run in such a way that we could set Proc.command to that value? If so then the aforementioned test could potentially work on the JVM backend.

I wanted to modify zefs AppVeyor to use a path with a space to see what would happen, but I have not gotten around to that. However -- this all appears sane to me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

5 participants
You can’t perform that action at this time.