Create RFC####-Improve-generation-of-argument-string-for-executables #90

Closed
wants to merge 10 commits into from


TSlivede

@TSlivede TSlivede commented May 5, 2017

This RFC suggests a change that fixes PowerShell/PowerShell#3049 / PowerShell/PowerShell#1995.


PS:
The pull request template suggests that a new RFC should go into the "Draft" folder; however,
https://github.com/PowerShell/PowerShell-RFC/blob/master/RFC0000-RFC-Process.md
mentions:
"New proposed drafts should be submitted as a Pull Request from the Author's fork into the Draft-Accepted folder."
So should I create a second pull request for the "Draft-Accepted" folder?


Link to file: https://github.com/TSlivede/PowerShell-RFC/blob/master/1-Draft/RFC%23%23%23%23-Improve-generation-of-argument-string-for-executables.md



@vors
Contributor

vors commented May 14, 2017

So great to see this RFC written! Thank you for taking this one step forward.

  • I'd love to see a list of things (i.e. workarounds) that would break
  • I'm not a fan of the new --= syntax, although it's a clever trick. I think that for anyone who needs it, the existing --% escaping would be a sufficient workaround.

@TSlivede
Author

TSlivede commented May 14, 2017

Workarounds that break:
Whenever quotes had to be passed to an executable, they needed to be escaped manually. With the proposed changes, that escaping is done by PowerShell itself. If the caller additionally escapes the arguments manually, they end up double escaped.
=> Any workaround that manually escapes quotes and doesn't pass arguments via --% will break.
Examples that break:

Regarding --=
I must admit that I absolutely don't like --%: it breaks PowerShell's syntax rules, and dynamic content can only be passed by first assigning it to environment variables?!
--= wouldn't break PowerShell syntax, and any PowerShell variable can easily be used with it.

@TSlivede
Author

TSlivede commented May 14, 2017

One worse thing that just came to my mind:

The rare cases where no escaping was necessary (because the executables don't follow the typical rules) will also break. Probably the most common case is
cmd.exe /c '"C:\path with spaces\file.exe" "argument with spaces"'
This will break. (Although I don't know why one would use cmd for this anyway...)
EDIT:
However, something like
cmd.exe /c mklink "C:\path with space\file 1.txt" "C:\path with space\file 2.txt"
will still work.

However, if the argument itself contains quotes, these are not escaped. Therefore, the corresponding element in `argv[]` has no quotes and, depending on the actual string, the argument might be split into multiple `argv[]` elements. It can even happen that the following arguments are not handled correctly.

This RFC suggests making the NativeCommandParameterBinder compatible with
[the typical CommandLine escaping rules](https://msdn.microsoft.com/en-us/library/17w5ykft.aspx).
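The effect described above can be reproduced with a simplified Python model of the linked MSVC splitting rules. It is a sketch, not the CRT's actual code: the function name is hypothetical, it ignores the special treatment of `argv[0]` and of `""` inside a quoted region, and it only treats space and tab as separators.

```python
def split_command_line(cmdline: str) -> list[str]:
    """Split a command line roughly the way MSVC startup code builds argv (sketch)."""
    args: list[str] = []
    current: list[str] = []
    in_quotes = False
    have_arg = False  # lets an empty quoted "" still count as an argument
    i, n = 0, len(cmdline)
    while i < n:
        c = cmdline[i]
        if c == "\\":
            j = i
            while j < n and cmdline[j] == "\\":
                j += 1
            count = j - i
            have_arg = True
            if j < n and cmdline[j] == '"':
                # 2N backslashes + " -> N backslashes, the quote stays a delimiter
                # 2N+1 backslashes + " -> N backslashes and a literal "
                current.append("\\" * (count // 2))
                if count % 2:
                    current.append('"')
                    j += 1
            else:
                # Backslashes not followed by " are literal
                current.append("\\" * count)
            i = j
        elif c == '"':
            in_quotes = not in_quotes
            have_arg = True
            i += 1
        elif c in " \t" and not in_quotes:
            if have_arg:
                args.append("".join(current))
                current, have_arg = [], False
            i += 1
        else:
            current.append(c)
            have_arg = True
            i += 1
    if have_arg:
        args.append("".join(current))
    return args


# A naive wrapper that only adds outer quotes around: he said "hi there"
print(split_command_line('"he said "hi there""'))  # ['he said hi', 'there']
```

Here one intended argument arrives as two `argv[]` elements, and its quotes are lost.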
Contributor

First things first: Great RFC.

The linked topic's most recent version is now available at https://docs.microsoft.com/en-us/cpp/cpp/parsing-cpp-command-line-arguments

Author

Updated.


### Linux

While this change is important on Windows, it's absolutely necessary on Linux: on Windows, the CommandLine is split by the called executable, and most executables follow the described rules. On Linux, the CommandLine is not split by the called executable; the .NET Core runtime splits the string. Therefore, on Linux the described rules don't just apply to many calls of external executables, they apply to ALL of them. When the proposed changes are implemented, the arguments from within PowerShell always arrive, as expected, as the `argv[]` array in called executables.
Contributor

@mklement0 mklement0 Aug 27, 2017


It is hard to believe that CoreFX currently first requires assembling the arguments into a pseudo shell "command line" to assign a single string to ProcessStartInfo.Arguments, only to split them back into arguments later - I've created https://github.com/dotnet/corefx/issues/23592 to address this.

Sadly, one of the few exceptions to those typical parsing rules is `cmd.exe`. Because of this, one cannot reliably call batch files with arbitrary arguments. (This is not a PowerShell problem; it's a cmd design problem.) Some arguments are impossible to pass -- an odd number of double quotes can only be sent in the last argument. To my knowledge there is no clean way to deal with this, so I think using the typical escaping rules is still the way to go. In many cases this is correct anyway, as many batch files simply forward their arguments to other executables, and in those cases [these rules](https://msdn.microsoft.com/en-us/library/17w5ykft.aspx) apply.

**Edit:**
Optionally, a special rule for batch files could be added: `"` in arguments of batch files won't be escaped according to the rules described in [Specification->Quoting](#quoting); instead, each literal `"` will be replaced by `""`. Many batch files seem to expect this, and this way arguments won't be split into multiple `%1`, `%2`, ... variables.
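The optional batch-file rule can be sketched in Python. The helper name is hypothetical, and reusing the general "needs quoting" test for batch files is an assumption; the rule above only specifies how `"` is escaped.

```python
def quote_batch_argument(arg: str) -> str:
    """Hypothetical batch-file variant: double every literal " instead of backslash-escaping it."""
    # Assumption: same "needs quoting" test as the general rules (empty, ", ' or whitespace).
    needs_quotes = arg == "" or any(c in "\"'" or c.isspace() for c in arg)
    doubled = arg.replace('"', '""')
    # Backslashes are left alone, since cmd.exe does not use them as escape characters.
    return '"' + doubled + '"' if needs_quotes else doubled


print(quote_batch_argument('say "hi"'))  # "say ""hi"""
```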
Contributor

Just to state the perhaps obvious: in case of extension-less command names (e.g. foo instead of foo.cmd / foo.bat), PS first needs to determine if the command invoked happens to refer to a batch file or not.


### Quoting

The decision whether an argument needs to be quoted will be simplified: any argument that contains `"`, `'` or a character that matches `char.IsWhiteSpace` will be quoted. To quote an argument (compatible with [MSVC rules](https://msdn.microsoft.com/en-us/library/17w5ykft.aspx) and `CommandLineToArgvW`):
Contributor

@mklement0 mklement0 Aug 27, 2017


It seems to me that '-containing tokens don't need quoting per se (I'm talking about embedded ' chars. in the literal resulting from PowerShell's own parsing).

Contributor

@mklement0 mklement0 Aug 28, 2017


Given PowerShell/PowerShell#4661, it's probably worth pointing out that whatever argument evaluates to the empty string must be represented as "".

Author

Yes, I was also talking about "embedded ' chars". It is correct that these rules don't require quoting arguments containing '.

The reason I included ' in the list of chars that require quoting is that some applications mostly follow the MSVC rules but additionally allow ' as a string delimiter (AFAIK cygwin, ruby). As I see no disadvantage for apps that strictly follow the MSVC rules, I thought it might be worth also quoting arguments containing '.

Author

empty string

-> New proposal text:

Empty arguments and any argument that contains ", ' or a character that matches char.IsWhiteSpace will be quoted.

- Every occurrence of N times `\` followed by `"` will be replaced by (2*N+1) times `\` followed by `"`. (N ∈ {0,1,2,...})
- N times `\` at the end of the string is replaced by (2*N) times `\`. (N ∈ {0,1,2,...})
- `"` is added to the beginning and to the end of the string

Contributor

It seems that CoreFX already has an internal implementation of the above rules: PasteArguments.Paste() takes an enumerable collection of individual arguments and returns them as a "command line" string: https://github.com/dotnet/corefx/blob/5d3e42f831ec3d4c5964040e235824f779d5db53/src/Common/src/System/PasteArguments.cs

@joeyaiello
Contributor

Thanks for the RFC, @TSlivede! Sorry it took us so long to finally take a look at it.

No quorum today (just @JamesWTruher, @SteveL-MSFT, and me), but we think that this makes a ton of sense irrespective of existing behavior and without regard for the breaking-ness of it. Now that we have experimental features, we think it's perfectly reasonable to go and implement this today behind an experimental feature flag, and we can figure out further down the line whether this is opt-in vs. opt-out behavior, whether there's some transition path, and if a preference variable is the right mechanism for turning it on and off.

Certainly, implementing it behind an experimental feature flag will allow us to play with it and understand the extent of the breakage so we can make a call.

@TSlivede given this is years later, are you still potentially interested in implementing this?

@TSlivede
Author

TSlivede commented Jul 9, 2019

potentially interested: Yes.

But after almost 2 years, I need to make some slight adjustments. I can say more in a few days.

@TSlivede
Author

TSlivede commented Jul 15, 2019

Before changing the RFC, I'll ask for your thoughts here, so I don't have to change the RFC that often :-)


I no longer want to add the --= syntax. I still think that we need something better than --% to pass a custom CommandLine to the called executable, but I don't think that should be done with more special syntax. (I have some ideas and will open another issue about that.)


In the RFC, I did not write anything about the behavior of --%. I thought it was clear that the arguments before --% are handled according to the rules defined in the RFC and that everything after it is copied verbatim to the CommandLine (after replacing environment variables).

I no longer think that this is a good option: now that ProcessStartInfo.ArgumentList exists, I'd like to use that API instead of replicating the quoting code within PowerShell. But to quote some arguments with that quoting code and append a string verbatim to the CommandLine, I would have to copy that code.

There is another reason why handling --% this way never really made sense: one of the use cases for --% is to pass a custom formatted/escaped CommandLine to an executable that doesn't follow the typical rules. In that case it would simply be wrong to escape the preceding arguments according to those typical rules.

Option A: I therefore suggest this possible RFC text:

If a command contains the stop-parsing-symbol (--%), PowerShell assumes that the called executable doesn't follow the typical rules. Because of that, the arguments before the stop-parsing-symbol are processed according to the following simplified rules:

  • If the argument contains ", it is copied to the argument string exactly as it is, without escaping or adding quotes around it. (PowerShell assumes the user did that himself.)
  • Otherwise, if the argument contains a character that matches char.IsWhiteSpace, " is added to the beginning and to the end of the argument.
  • If the argument contains neither whitespace nor ", it is copied as it is.
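A minimal Python sketch of Option A follows. The function names are hypothetical, and the environment-variable replacement for the text after --% is deliberately left out.

```python
def option_a_argument(arg: str) -> str:
    """Simplified Option A handling for one argument before --% (sketch)."""
    if '"' in arg:
        # Assume the user already quoted/escaped the argument: copy verbatim
        return arg
    if any(c.isspace() for c in arg):
        # Whitespace but no ": wrap the whole argument in quotes
        return '"' + arg + '"'
    # Neither whitespace nor ": copy as-is
    return arg


def option_a_command_line(before: list[str], verbatim_tail: str) -> str:
    """Join the arguments before --% and append the text after --% unchanged.

    Environment-variable replacement in the tail is omitted in this sketch.
    """
    parts = [option_a_argument(a) for a in before]
    if verbatim_tail:
        parts.append(verbatim_tail)
    return " ".join(parts)


print(option_a_command_line(["/passive", 'TARGETDIR="C:\\Program Files\\7-Zip"'], "/i 7z1900-x64.msi"))
# /passive TARGETDIR="C:\Program Files\7-Zip" /i 7z1900-x64.msi
```

Because the rules only look for `"` and whitespace, the resulting command line is easy to predict, which is the documentation advantage claimed for Option A.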

The text after --% is handled the same way as it is handled now: environment variables are replaced, nothing else is changed.

Option B: Alternatively, one could leave that behavior as it is:

If a command contains the stop-parsing-symbol (--%), powershell assumes that the called executable doesn't follow the typical rules. Because of that, the arguments before the stop-parsing-symbol are processed like they were handled before this RFC - powershell tries to smartly determine if an argument needs to be quoted.

The text after --% is handled the same way as it is handled now: environment variables are replaced, nothing else is changed.

Option C: I could consider actually replicating the quoting code to handle arguments before --% according to the typical rules (as described in the RFC), if this is what everyone wants.

The advantage of option A over option B is that it is much easier to document and explain.


As the ParameterBinder now has the information about which part of an argument was quoted, we could also consider generating a CommandLine that contains partially quoted arguments (of course only for arguments that were partially quoted in the PowerShell command). This would also require having the quoting/escaping code in PowerShell, but it gives a real benefit:

Currently this doesn't work: If you type in powershell:

msiexec /passive TARGETDIR="C:\Program Files\7-Zip" /i 7z1900-x64.msi

then powershell sends this CommandLine to msiexec:

msiexec /passive "TARGETDIR=C:\Program Files\7-Zip" /i 7z1900-x64.msi

Because msiexec doesn't follow the usual rules, this doesn't work: msiexec needs the opening quote after the equals sign.

The current workaround is this powershell command:

msiexec /passive 'TARGETDIR="C:\Program Files\7-Zip"' /i 7z1900-x64.msi

This works because PowerShell tries to determine whether arguments already contain enough matched quotes, and if so, it doesn't add additional quotes. This workaround would no longer work after this RFC.

The "correct"™ solution of this problem would be that Microsoft fixes msiexec to accept arguments that are quoted according to the typical rules.

However, we could fix this in PowerShell by quoting only the part of the argument that was quoted in the PowerShell command. AFAIK, that information is available in the ParameterBinder because of the strange way globbing is implemented.
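One possible shape for the partial-quoting idea, as a Python sketch. The `(text, was_quoted)` segment list is a hypothetical stand-in for whatever the ParameterBinder actually records; quoted segments are escaped with the typical rules, and unquoted segments are copied unchanged on the assumption that they contain no whitespace or quotes.

```python
def escape_quoted_segment(text: str) -> str:
    """Wrap one segment in quotes using the typical (MSVC-style) escaping rules."""
    out, backslashes = [], 0
    for c in text:
        if c == "\\":
            backslashes += 1
            continue
        if c == '"':
            # N backslashes followed by " -> 2N+1 backslashes followed by "
            out.append("\\" * (2 * backslashes + 1) + '"')
        else:
            out.append("\\" * backslashes + c)
        backslashes = 0
    # Double trailing backslashes so they don't escape the closing quote
    out.append("\\" * (2 * backslashes))
    return '"' + "".join(out) + '"'


def partially_quoted_argument(segments: list[tuple[str, bool]]) -> str:
    """Rebuild one argument, quoting only the segments that were quoted in the PowerShell source."""
    return "".join(escape_quoted_segment(t) if quoted else t for t, quoted in segments)


# PowerShell token: TARGETDIR="C:\Program Files\7-Zip"
arg = partially_quoted_argument([("TARGETDIR=", False), ("C:\\Program Files\\7-Zip", True)])
print(arg)  # TARGETDIR="C:\Program Files\7-Zip"
```

The opening quote ends up right after the equals sign, which is the form msiexec expects.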

@mklement0
Contributor

mklement0 commented Nov 10, 2019

Good points, @TSlivede.

The "correct"™ solution of this problem would be that Microsoft fixes msiexec to accept arguments that are quoted according to the typical rules.

I agree, so I don't think we should go out of our way to accommodate "rogue" programs - this should be a legitimate reason - perhaps the only remaining one - to use --%.

Unfortunately, as wonderful as ProcessStartInfo.ArgumentList is, it doesn't account for another rogue program that we do have to accommodate: cmd.exe itself, especially given that there is popular software out there that uses it implicitly, via batch files, notably the az.cmd Azure CLI and the npm CLI, npm.cmd, along with the CLIs that come with packages from the npm registry.

Take @SteveL-MSFT's example:

az "myargs&b"  # az is az.cmd - a batch file - on Windows

This breaks not only currently, but will also break if we use ProcessStartInfo.ArgumentList, because the command line that is built behind the scenes - on Windows - will also not double-quote myargs&b, because the value contains neither spaces nor embedded double quotes.

Note that this problem is limited to (a) Windows and (b) cases where the command line is (ultimately) parsed by another shell, cmd.exe, where & is a metacharacter.

Sadly, this means that we cannot use ProcessStartInfo.ArgumentList on Windows, at least not unconditionally, if we want to be batch-file-friendly - and given the widespread implicit use, I think we should.

To be clear: I don't think we need to be batch-file-friendly in the escaping-"-as-"" sense for pure batch code (I think it is fine to use --% for that), but in the sense of passing arguments through with %* to components that do expect and understand \-escaped " instances.

In short: On Windows, double-quoting must be applied not only if an argument contains a <space> or embedded ", but also if one of the following characters is present:

& | < > ^ , ;

Note:

  • % is not on that list, because expansion of %...% tokens (references to defined environment variables) fundamentally cannot be prevented (there's no escaping mechanism when called from APIs or the command prompt; %% only works inside batch files).

  • ! is not on that list, because it is only special if setlocal enabledelayedexpansion is in effect inside a batch file. If it is in effect, ! characters are either interpreted as environment-variable delimiters or simply discarded; passing ! as a literal would require not only double-quoting but additionally ^-escaping, yet if delayed expansion is not in effect, such a ^ would become a literal. In short: it is virtually impossible to handle ! correctly in all scenarios, and passing ! through properly in the default case (delayed expansion off) is the best approach.

  • , and ; are included only to be a little more batch-code-friendly, because unquoted use of these characters separates arguments as seen by batch files; pass-through use with %*, however, is not affected.
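The difference between the two quoting predicates can be sketched in Python. The first function approximates the behavior described above for command lines built from `ProcessStartInfo.ArgumentList` (only empty values, whitespace, or an embedded `"` trigger quoting); the second adds the proposed cmd.exe metacharacters. Both function names are hypothetical.

```python
CMD_METACHARS = frozenset("&|<>^,;")


def needs_quoting_default(arg: str) -> bool:
    """Quote only empty arguments and those containing whitespace or a double quote."""
    return arg == "" or '"' in arg or any(c.isspace() for c in arg)


def needs_quoting_batch_friendly(arg: str) -> bool:
    """Additionally quote arguments containing cmd.exe metacharacters (% and ! excluded)."""
    return needs_quoting_default(arg) or any(c in CMD_METACHARS for c in arg)


for a in ["myargs&b", "a,b", "100%", "hi!", "a b"]:
    print(f"{a!r}: default={needs_quoting_default(a)}, batch={needs_quoting_batch_friendly(a)}")
```

For `myargs&b` only the batch-friendly predicate requests quotes, which is exactly the `az "myargs&b"` case.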

@mklement0
Contributor

As for the Options A vs. B vs. C:

I think C is the best choice, though, as you point out, it requires our own implementation of the typical rules; given the need to accommodate cmd.exe discussed above, however, it sounds like we may have to do that anyway.

@joeyaiello
Contributor

Tagging in @theJasonHelmick to give a read over this one when he gets the chance

@TSlivede
Author

TSlivede commented Jun 5, 2020

As it is now again (almost) a year later, and there are still no answers from PowerShell team members to the questions in my comment, I have to say that I am no longer interested in implementing this, sorry. 🙁

@TSlivede
Author

TSlivede commented Jun 5, 2020

Some more thoughts if someone wants to implement something:

@mklement0

In short: On Windows, double-quoting must be applied not only if an argument contains a <space> or embedded ", but also if one of the following characters is present:

& | < > ^ , ;

While I agree that this is definitely a good idea for arguments to batch files, I think the case is less clear for calling cmd.exe directly: if calling cmd.exe isn't made a special case (and I'd suggest that it shouldn't be), then a call like

cmd /c '"C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Auxiliary\Build\vcvars64.bat" && dumpbin "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\Common7\IDE\devenv.exe"'

will no longer work. That's fine because even now I prefer to use

cmd /c@ "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Auxiliary\Build\vcvars64.bat" "&&" dumpbin "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\Common7\IDE\devenv.exe"

(I've described the /c@ workaround here)
This works now, and it would continue to work after the argument escaping has been fixed, but it requires that arguments without spaces but with @ or && are not quoted...

If arguments with @ or && are quoted, then one has to use the ugly --% syntax (which should probably never have been added in the first place), so one can no longer use variables in an easy way, etc. etc.

@TSlivede
Author

TSlivede commented Jun 5, 2020

Another problem/inconsistency on Windows: when one launches a script file by its name (instead of explicitly calling the interpreter with the script file as an argument), e.g.

& "C:\path\to\script.py" "some argument"

then powershell has to use one of the ShellExecute... API functions in some way. These functions look up which exe file to actually execute and how to pass the parameters. For example: For my python installation the key HKEY_CLASSES_ROOT\Python.File\Shell\open\command looks like this

"C:\WINDOWS\py.exe" "%L" %*

where %L is the "long path" to the original script ("C:\path\to\script.py" in the example above) and %* are the remaining arguments quoted exactly as they were given in the ShellExecute... call.

So far this is completely fine. However, that registry key can also contain placeholders of the form %2, %3, and so on. In this case the argument string is split into the tokens %2, %3, and so on by some internal code of (I guess) the Windows shell.

The problem is that this code follows neither the rules of the startup code of C/C++ applications compiled with MSVC, nor those of the CommandLineToArgvW API function, nor those of cmd.exe. To my knowledge, the exact way in which the arguments are split in this case is not documented.


Luckily this feature is AFAIK not used very often, so I don't think that it's necessary for powershell to consider this. But maybe someone else thinks differently about this, so I thought I'd mention it.

@TSlivede
Author

TSlivede commented Jun 5, 2020

As I'm giving up on this topic, I thought I could at least mention why:

  • Seemingly little interest (at least it seems that way to me) from PowerShell team members in this topic, even though the issue thread is already really long and old.
  • Microsoft still releases new applications, and doesn't fix old ones, that don't follow any common rules at all for how the command line is split into the argument array (msiexec, cmdkey.exe, wsl, ...)

If for some reason I get motivated enough in the future to approach this topic again, I will likely just implement something and create a pull request instead of again trying to discuss things in an RFC, sorry. 😟

@musm

musm commented Jun 26, 2020

@TSlivede Despite the PowerShell team's little interest, this is one of the most problematic issues with PowerShell. It's a major pain point for me. I plead with the PowerShell team to strongly consider this PR and its implications.

@mklement0
Contributor

mklement0 commented Jun 26, 2020

Fully agreed, @musm.

@TSlivede: Thanks for all the work you've done so far.

In the hopes that this will pick up momentum again, let me address your cmd concern:

If calling cmd.exe won't be made a special case

Thinking about this some more: I suggest special-casing calls to batch files only (commands that resolve to *.cmd or *.bat files), by selectively double-quoting arguments without spaces that contain cmd metacharacters; that way, all other programs, including direct calls to cmd.exe, wouldn't be affected.

Special-casing is never a great solution, but I think it would ease a lot of pain that comes from calls such as az "myargs&b" to popular tools that happen to have batch-file wrappers (purely an implementation detail).

(It is highly unfortunate that when a batch file is called from outside cmd, the arguments are still subject to cmd's internal parsing rules; instead, they should be treated verbatim (i.e., with double quotes as the only syntactic elements). But we're clearly stuck with the current behavior.)

It is unfortunate that we therefore cannot take advantage of .ArgumentList for batch files, given that it passes something like verbatim myargs&b invariably as-is (unquoted), but I still think implementing this exception is worthwhile.
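A Python sketch of the proposed special-casing, assuming command resolution (including extension-less names such as `az`) has already produced the full path of the target. The function names and example paths are hypothetical.

```python
from pathlib import PureWindowsPath

BATCH_EXTENSIONS = {".cmd", ".bat"}
CMD_METACHARS = frozenset("&|<>^,;")


def is_batch_file(resolved_path: str) -> bool:
    """True if the already-resolved command path has a .cmd or .bat extension."""
    return PureWindowsPath(resolved_path).suffix.lower() in BATCH_EXTENSIONS


def should_quote(resolved_path: str, arg: str) -> bool:
    """Apply the extra cmd-metacharacter rule only when the target is a batch file."""
    basic = arg == "" or '"' in arg or any(c.isspace() for c in arg)
    if is_batch_file(resolved_path):
        return basic or any(c in CMD_METACHARS for c in arg)
    return basic


print(should_quote(r"C:\tools\az.cmd", "myargs&b"))        # True
print(should_quote(r"C:\Windows\System32\cmd.exe", "&&"))  # False
```

Direct calls to cmd.exe keep the general rule, so workarounds that pass an unquoted `&&` continue to work.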

@mklement0
Contributor

@joeyaiello, re the following and the way forward in general:

we can figure out further down the line whether this is opt-in vs. opt-out behavior, whether there's some transition path, and if a preference variable is the right mechanism for turning it on and off.

I've posted thoughts here.

@joeyaiello
Contributor

Apologies, @TSlivede. We're still working to get a handle on this older backlog of RFCs.

Unfortunately, it's been very difficult to form a consensus around this issue (PowerShell/PowerShell#1995), particularly for the reasons you described with regards to inconsistent Windows executables.

Right now, we're looking to improve this functionality within the work in PowerShell/PowerShell#14692 by driving an experimental feature to understand the real-world implications of the proposed fix, to be followed by a new RFC.

More rationale posted in PowerShell/PowerShell#14747

@joeyaiello
Contributor

Given all this, we're going to close this PR. Thank you for getting the conversation started with this proposal, @TSlivede

@joeyaiello joeyaiello closed this Mar 2, 2021
Projects
RFCs
Community plans to implement
Development

Successfully merging this pull request may close these issues.

double quotes in string literal arguments get removed when calling native commands
8 participants