RFC:
Author: Timo Schwarte
Status: Draft
Version: 1.0
Area: Calling external executables
Comments Due: 2017-07-07
Plan to implement: true

Improve generation of argument-string for executables

When PowerShell calls an external executable, the NativeCommandParameterBinder takes all parameters of the command and generates the argument-string (also called the "CommandLine"), which is passed to the executable as ProcessStartInfo.Arguments. The NativeCommandParameterBinder appears to build this string so that argv[] of the called executable matches the given parameters. This works well if an argument contains spaces: the binder adds quotes around the argument, so it arrives in argv[] as a single element instead of being split into parts. However, if the argument itself contains quotes, they are not escaped. The corresponding argv[] element then arrives without its quotes and, depending on the actual string, the argument may be split into multiple argv[] elements. As a result, subsequent arguments may also be handled incorrectly.

This RFC suggests making the NativeCommandParameterBinder compatible with the typical CommandLine escaping rules. Additionally, it suggests adding a verbatim-argument symbol that inserts a custom-formatted string into the CommandLine (for executables that don't follow the typical escaping rules).

This is essentially a bug fix for PowerShell/PowerShell#1995 and PowerShell/PowerShell#3049, and for behavior reported on Stack Overflow ("This seems like a bug to me. If I am passing the correct escaped strings to PowerShell, then PowerShell should take care of whatever escaping may be necessary for however it invokes the command." and "... and I think this is a bug Powershell doesn't escape any double quotes that appear inside the arguments."). However, because this bug is so longstanding, workarounds exist for some cases. These will no longer work once the proposed changes are applied, so this document suggests adding a preference variable that restores the old behavior for anyone who needs it.

Motivation

As a PowerShell user who wants to call external executables,
I can pass any arguments to these executables,
so that they reach the `argv[]` array unchanged.

Specification

Quoting

The decision whether an argument needs to be quoted will be simplified: any argument that contains ", ', or a character for which char.IsWhiteSpace returns true will be quoted. To quote an argument (compatible with the MSVC rules and CommandLineToArgvW):

  • Every run of N backslashes (\) followed by " is replaced by 2*N+1 backslashes followed by ". (N ∈ {0,1,2,...})
  • A run of N backslashes at the end of the string is replaced by 2*N backslashes. (N ∈ {0,1,2,...})
  • " is added to the beginning and to the end of the string.
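The three steps above can be sketched in Python as follows. This is a minimal illustration, not the PowerShell implementation: the function names `needs_quoting` and `quote_argument` are hypothetical, and `str.isspace` is used only as an approximation of .NET's char.IsWhiteSpace.

```python
def needs_quoting(arg: str) -> bool:
    """Quoting decision from this section: ", ' or any whitespace character.

    Assumption: str.isspace() stands in for .NET's char.IsWhiteSpace.
    """
    return any(ch in "\"'" or ch.isspace() for ch in arg)


def quote_argument(arg: str) -> str:
    """Quote one argument following the three rules listed above."""
    if not needs_quoting(arg):
        return arg

    out = ['"']            # rule 3: opening quote
    backslashes = 0        # length of the current run of backslashes
    for ch in arg:
        if ch == "\\":
            backslashes += 1
        elif ch == '"':
            # rule 1: N backslashes + " becomes 2*N+1 backslashes + "
            out.append("\\" * (2 * backslashes + 1) + '"')
            backslashes = 0
        else:
            # backslashes not followed by " are copied unchanged
            out.append("\\" * backslashes + ch)
            backslashes = 0
    # rule 2: N trailing backslashes become 2*N backslashes
    out.append("\\" * (2 * backslashes))
    out.append('"')        # rule 3: closing quote
    return "".join(out)
```

With this sketch, the argument `say "hi"` becomes `"say \"hi\""`, and `C:\my dir\` becomes `"C:\my dir\\"`, so the trailing backslash does not escape the closing quote.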

Verbatim Argument

When the parser detects the 'Verbatim-Argument-Symbol' (--=), the next argument is copied to the argument-string exactly as it is, without escaping or adding quotes around the argument. The symbol is detected in the parser, so the string literal '--=' or any other object that expands to that string is not interpreted as 'Verbatim-Argument-Symbol'.
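How a verbatim argument could flow into the argument-string can be sketched in Python. Everything named here is hypothetical: `VERBATIM` is a sentinel object standing in for the parser-level `--=` token (so the plain string '--=' is not mistaken for it), `build_argument_string` is an illustrative helper, and joining arguments with a single space is an assumption. `subprocess.list2cmdline` is used only as a stand-in for the quoting step; it follows the Microsoft C runtime escaping rules but decides when to add quotes slightly differently from this RFC.

```python
import subprocess

# Hypothetical stand-in for the parser-level `--=` token. Using a unique
# object mirrors the rule that the string literal '--=' is NOT the symbol.
VERBATIM = object()


def build_argument_string(tokens, quote=lambda s: subprocess.list2cmdline([s])):
    """Join tokens into one argument-string.

    The token that follows VERBATIM is copied as-is; all others go through
    `quote`. Joining with a single space is an assumption of this sketch.
    """
    parts = []
    verbatim_next = False
    for tok in tokens:
        if tok is VERBATIM:
            verbatim_next = True
            continue
        parts.append(tok if verbatim_next else quote(tok))
        verbatim_next = False
    return " ".join(parts)
```

For example, `build_argument_string(['a b', VERBATIM, '/k "x y"', '--='])` quotes the first token, copies the second unchanged, and treats the trailing string '--=' as an ordinary argument.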

Preference Variable

If the preference variable $PsUseLegacyArgumentStringGeneration is set to true, quoting is done as it is today (only for compatibility with old scripts). The variable is false by default.

Alternate Proposals and Considerations

Other shells

This may not be the strongest argument, but many other modern command shells on Windows create the argument-string in a way that is compatible with these rules.

Wine, although not a shell, also faced this problem and also creates the CommandLine in a way that is compatible with these rules.

These rules are widely accepted

Most compilers on Windows generate executables that split the CommandLine in a way that is compatible with these rules, for example the compiler shipped with Visual Studio and the MinGW compiler suite. The .NET runtime also splits the CommandLine in a compatible way. The Windows API function CommandLineToArgvW is (although incorrectly documented) compatible with these rules as well. Therefore, PowerShell should build the CommandLine so that it matches these rules, instead of only sometimes adding quotes around arguments.
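To make the receiving side concrete, here is a simplified Python sketch of how a called executable might split the argument part of a CommandLine under these rules. It is not the code of any particular runtime: `split_command_line` is a hypothetical name, the special handling of the program name (argv[0]) is left out, and the `""`-inside-quotes case that some runtimes treat specially is ignored.

```python
from typing import List


def split_command_line(cmdline: str) -> List[str]:
    """Split an argument-string into argv-style elements (simplified)."""
    args: List[str] = []
    current: List[str] = []
    in_quotes = False
    have_arg = False      # True once the current argument has started
    i, n = 0, len(cmdline)
    while i < n:
        ch = cmdline[i]
        if ch == "\\":
            # Count the run of backslashes and look at what follows it.
            j = i
            while j < n and cmdline[j] == "\\":
                j += 1
            count = j - i
            if j < n and cmdline[j] == '"':
                current.append("\\" * (count // 2))
                if count % 2 == 1:
                    current.append('"')        # odd run: literal quote
                else:
                    in_quotes = not in_quotes  # even run: quote toggles
                i = j + 1
            else:
                current.append("\\" * count)   # backslashes are literal
                i = j
            have_arg = True
        elif ch == '"':
            in_quotes = not in_quotes
            have_arg = True
            i += 1
        elif ch in " \t" and not in_quotes:
            if have_arg:
                args.append("".join(current))
                current, have_arg = [], False
            i += 1
        else:
            current.append(ch)
            have_arg = True
            i += 1
    if have_arg:
        args.append("".join(current))
    return args
```

Under this sketch, the unescaped string `"a "b c" d"` (quotes added around the single argument `a "b c" d` without escaping) is split into the two elements `a b` and `c d`, while the escaped form `"a \"b c\" d"` arrives as the one original argument.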

Linux

While this change is important on Windows, it is absolutely necessary on Linux. On Windows, the CommandLine is split by the called executable, and most executables follow the described rules. On Linux, the CommandLine is not split by the called executable; the .NET Core runtime splits the string instead. Therefore, on Linux the described rules apply not just to many calls of external executables but to ALL of them. Once the proposed changes are implemented, arguments passed from within PowerShell always arrive, as expected, in the argv[] array of the called executable.

Batch files

Sadly, one of the few exceptions to these typical parsing rules is cmd.exe. Because of this, one cannot reliably call batch files with arbitrary arguments. (This is not a PowerShell problem; it is a cmd design problem.) Some arguments are impossible to pass: an odd number of double quotes can only be sent in the last argument. To my knowledge there is no clean way to deal with this, so I think using the typical escaping rules is still the way to go. In many cases this is correct anyway, because many batch files simply forward their arguments to other executables, and in those cases these rules apply.

Edit: Optionally, a special rule for batch files could be added: a " in an argument to a batch file is not escaped according to the rules described under Specification, Quoting; instead, each literal " is replaced by "". Many batch files seem to expect this, and this way arguments are not split across multiple %1, %2, ... variables.
End edit
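The optional batch-file rule could look like the following Python sketch. The name `quote_batch_argument` is hypothetical, and two details are assumptions because the text above does not settle them: the quoting decision is taken to be the same as in the Quoting section, and backslashes are left untouched.

```python
def quote_batch_argument(arg: str) -> str:
    """Quote one argument for a batch file by doubling each literal ".

    Assumptions of this sketch: same quoting decision as in the Quoting
    section (", ' or whitespace), and backslashes are copied unchanged.
    """
    if not any(ch in "\"'" or ch.isspace() for ch in arg):
        return arg
    return '"' + arg.replace('"', '""') + '"'
```

For example, the argument `say "hi"` would be passed as `"say ""hi"""`.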

Considerations outside of the scope of this RFC

On Linux, --% and --= could perhaps be deprecated, as they make little sense there: the CommandLine always needs to be escaped according to these rules (see the "Linux" paragraph).

As far as I know, the main purpose of --% was not to influence the CommandLine directly, but to disable many special characters in a PowerShell command. In the future, a --$ token could be added that disables all special characters except quotes and $, as a proper option that does what --% was originally meant for.