std.process: Command escaping functions #457

Merged
merged 8 commits into from Apr 29, 2012

Conversation

Projects
None yet
6 participants
Owner

CyberShadow commented Feb 24, 2012

This adds functions to escape shell commands for use with the system and shell functions.

This is a cleanup of the code I've written for rdmd. Let me know how I did with docs / semantics. "Real-life" unit tests are, unfortunately, hard to write.

@ghost

ghost commented Feb 24, 2012

Have you looked at http://blogs.msdn.com/b/twistylittlepassagesallalike/archive/2011/04/23/everyone-quotes-arguments-the-wrong-way.aspx ? I've tried implementing that C# algorithm in D but with no luck. I definitely need this sort of thing. :)

Owner

CyberShadow commented Feb 24, 2012

No, I haven't (yet), but I think my unit tests are quite comprehensive! They literally brute-force a string and pass the output through the Win32 parsing function.

Owner

CyberShadow commented Feb 24, 2012

I think that blog post is wrong about the way cmd handles double quotes on its command line. From what I understood, it contradicts the output of cmd /?, and his description of cmd's algorithm makes it sound like cmd /C ""test" "hello world"" wouldn't pass "hello world" as a single string to "test".

Owner

CyberShadow commented Feb 25, 2012

Thanks for the link, Andrej! I looked at the code again today and did some more tests and turns out I was wrong. I'll need to do more changes and testing to make sure I got this right.

Member

kyllingstad commented Feb 25, 2012

I am not sure if we're talking about the same thing, but @schveiguy wrote something like this for Tango's process module, and he has reimplemented it for the new std.process. (BTW, Walter included the necessary changes to snn.lib in 2.058, which means we can finally submit the new std.process for Phobos review!)

Owner

CyberShadow commented Feb 25, 2012

It seems to be the same idea, but his implementation seems to be a good deal behind mine (original research which seems to miss some subtle issues, suboptimal code which needlessly uses unicode iteration, no tests).

which means we can finally submit the new std.process for Phobos review!

Great! I'm really excited about popen functionality on Windows. You should be able to practically copy-paste the bottom of the current module into the new one to get my changes.

@ghost

ghost commented Feb 25, 2012

The last time I've ran into issues with spaces it was with Optlink. When you have paths with spaces you have to use quotation marks, but this is problematic. E.g.:

link ".\cache\win32\wingdi.obj" ,D:\dev\projects\adl\.\main.exe,nul,"D:\dev\newprojects\empty space\xfbuild\";

Here optlink missed the last backslash and is treating the import path as a path to a library file:

Warning 2: File Not Found D:\dev\newprojects\empty space\xfbuild.lib

To work around it, I have to pass an extra backslash:

link ".\cache\win32\wingdi.obj" ,D:\dev\projects\adl\.\main.exe,nul,"D:\dev\newprojects\empty space\xfbuild\\";

Of course this wouldn't even be an issue if optlink didn't use such a silly command-line syntax where it differentiates between import paths and file paths based on whether there's a backslash at the end.

Owner

CyberShadow commented Feb 25, 2012

Yes, that sounds like the usual CommandLineToArgvW algorithm. Backslashes at the end of quoted strings are treaded differently.

Owner

CyberShadow commented Feb 25, 2012

OK, should be solid now.

@ghost

ghost commented Feb 25, 2012

Fantastic, escapeWindowsArgument works great! The resulting string is quite a weird-looking thing, a before and after of raw strings:

"D:\dev\newprojects\empty space\xfbuild\"
"\"D:\dev\newprojects\empty space\xfbuild\\\""

Thanks for the hard work.

You may want to use the WEB macro which does not need http://

Owner

CyberShadow replied Feb 26, 2012

The WEB macro was deprecated by Denis, because the same macro name was used for hyperlinks and web-only content, causing confusion. All files that use the WEB macro now declare it locally.

Why does escaping start at true?

Explanatory comment? I assume the two 1s come from the surrounding quotes?

Owner

CyberShadow commented Feb 26, 2012

Clarified.

@CyberShadow CyberShadow std.process: Code comments
* Remove self-contradiction
* Clarify escapeWindowsArgument implementation
* Remove a redundant if subcondition in a unit test.
c9fa9f1
Member

schveiguy commented Feb 29, 2012

My std.process rules are correct, I spent a lot of time testing them. The MSDN docs are useless/wrong! But my implementation is lacking. In particular I don't handle repeated backslashes properly (contrary to what my rules state!). Good call about not doing unicode. I think when I did that I didn't understand unicode, and so I did what I thought was the most robust.

I'll also cite a limitation in your code, your function returns a string per argument, but if you are building a command line you want a string for the whole command (CreateProcess requires a single command line string). This is a waste of allocations. I'd suggest having an internal function that takes an output range that it appends the data to, and wrap that with a version that returns a string. Then I'll probably use it in my std.process rewrite :)

Owner

CyberShadow commented Feb 29, 2012

My std.process rules are correct, I spent a lot of time testing them.

I don't see any unit tests.

I'd like to say something from my history of writing this code. Initially, some of these functions were written for rdmd. I thought my unit test was comprehensive enough - turns out it was not. I had omitted the case of backslashes preceding quotes in the middle of a string.

After I posted this pull request, I found another problem with the escapeShellCommandString function. The problem made me realize that my initial approach at allowing users to stack multiple commands with shell redirection was wrong, so I had to rewrite it again, and add a truly comprehensive test that left no room for error. Even though the test requires special effort (a helper program) to run, it is included as part of this module.

Therefore, I reserve my right to remain skeptical about statements like "don't worry, I tested it".

The MSDN docs are useless/wrong! But my implementation is lacking. In particular I don't handle repeated backslashes properly (contrary to what my rules state!).

I admit, I skipped the text and looked at the code.

I'll also cite a limitation in your code, your function returns a string per argument, but if you are building a command line you want a string for the whole command (CreateProcess requires a single command line string). This is a waste of allocations. I'd suggest having an internal function that takes an output range that it appends the data to, and wrap that with a version that returns a string.

The cost of a few heap allocations is negligible compared to that of creating a process.

Comparably, your code does a character append for each character. (Yes, I know escapeWindowsShellCommand does that as well - but see above.)

@ghost

ghost commented Feb 29, 2012

If your string is so big that you worry about allocations you're going to have to output that long command to a response file anyway. And that is where File IO is probably going to overshadow any optimizations you had.

Speaking of which, is there any way to programmatically figure out the maximum length of a command on Posixes? On win32 it's pretty simple since it's predefined..

Owner

CyberShadow commented Feb 29, 2012

I think Autotools configure scripts find it as part of the configuration process.

Member

schveiguy commented Feb 29, 2012

I don't see any unit tests.

The testing was from a previous life (when I wrote the code implementing these rules in tango). In fact, I don't mean that the implementation is well tested, I mean the rules are well tested. Clearly the implementation is poorly tested since I had an obvious error in it :) The only way to test the rules is to spawn processes and see if windows thinks you got it right. There is no accurate or valid documentation anywhere. It was about 2 or 3 days of playing with cmd.exe to try and nail down the Windows argument parsing black-box.

Your implementation follows my rules, so I think we are in agreement on that.

The cost of a few heap allocations is negligible compared to that of creating a process.

Creating processes is not occurring in your argument processing function, nor is it a prerequisite for using the function. My point is, if it can be better implemented, it should be. I dislike the philosophy that one can be sloppy in writing software because most users won't notice. Especially in library code. If it's a design for performance tradeoff, that's one thing, but this should be free.

Member

schveiguy commented Feb 29, 2012

Just to clarify, I'm perfectly happy having someone else implement this horrible convention of windows, but I think it should be as efficient as possible. Your code is certainly more advanced than mine, and I'm happy to use it.

Owner

CyberShadow commented Feb 29, 2012

I dislike the philosophy that one can be sloppy in writing software because most users won't notice.

There is nothing philosophical about premature optimization. It is a misallocation of resources, and nothing more. You are forgetting about the cost of maintaining more code, or more complicated code, as well as the cost to actually do the change - my time would be better spent working on something else D-related.

Member

schveiguy commented Mar 1, 2012

You understand that as library code, application writers don't have a choice to optimize this.

And this isn't just an optimization, it's providing extra design flexibility.

It's ok by me to pull this, I'll just fix it when the new std.process is ready to use it. Not a big deal.

@JakobOvrum JakobOvrum commented on an outdated diff Mar 1, 2012

std/process.d
+ or passed to cmd /C - this one may contain shell
+ control characters, e.g. > or | for redirection /
+ piping - thus, yet another layer of escaping is
+ used to distinguish them from program arguments.
+
+ Except for escapeWindowsArgument, the intermediary
+ format (2) is hidden away from the user in this module.
+*/
+
+/**
+ Quote an argument in a manner conforming to the behavior of
+ $(LINK2 http://msdn.microsoft.com/en-us/library/windows/desktop/bb776391(v=vs.85).aspx,
+ CommandLineToArgvW).
+*/
+
+string escapeWindowsArgument(string arg)
@JakobOvrum

JakobOvrum Mar 1, 2012

Member

This should be in char[], the result is always a new buffer.
After a quick look, it looks like it should be pure, nothrow and @trusted too (it looks like assumeUnique isn't @safe)

@JakobOvrum JakobOvrum commented on an outdated diff Mar 1, 2012

std/process.d
+
+ foreach (s; testStrings)
+ {
+ auto q = escapeWindowsArgument(s);
+ LPWSTR lpCommandLine = (to!(wchar[])("Dummy.exe " ~ q) ~ "\0"w).ptr;
+ int numArgs;
+ LPWSTR* args = CommandLineToArgvW(lpCommandLine, &numArgs);
+ scope(exit) LocalFree(args);
+ assert(numArgs==2, s ~ " => " ~ q ~ " #" ~ text(numArgs-1));
+ auto arg = to!string(args[1][0..wcslen(args[1])]);
+ assert(arg == s, s ~ " => " ~ q ~ " => " ~ arg);
+ }
+ }
+}
+
+private string escapePosixArgument(string arg)
@JakobOvrum

JakobOvrum Mar 1, 2012

Member

This should be using inout. (oh, this function is private, missed that, sorry. Below still applies)

Also, it's unnecessarily producing an extra garbage string for every call.
I'm not sure about the current quality of Appender or if we have an efficient
concatenation function (to combine N amounts of concatenations), but surely this garbage can easily be eliminated.

@JakobOvrum JakobOvrum and 1 other commented on an outdated diff Mar 1, 2012

std/process.d
+
+ version (Windows)
+ return escapeWindowsArgument(arg);
+ else
+ return escapePosixArgument(arg);
+}
+
+private string escapeShellArguments(string[] args)
+{
+ auto result = args.dup;
+ foreach (ref arg; result)
+ arg = escapeShellArgument(arg);
+ return result.join(" ");
+}
+
+string escapeWindowsShellCommand(string command)
@JakobOvrum

JakobOvrum Mar 1, 2012

Member

should be in char[] command, though I guess since it's not documented it's not really supposed to be used? Maybe add a comment? Or was this also supposed to be private?

@CyberShadow

CyberShadow Mar 1, 2012

Owner

I'm undecided if it should be public. It'd be useful for creating Windows batch files on all platforms, but I don't know if that's a use case common enough to add a public, documented function to the standard library.

@JakobOvrum JakobOvrum commented on an outdated diff Mar 1, 2012

std/process.d
+ ">" ~
+ escapeShellFileName("D download links.txt"));
+---
+*/
+
+string escapeShellCommand(string[] args...)
+{
+ return escapeShellCommandString(escapeShellArguments(args));
+}
+
+/**
+ Escape a filename to be used for shell redirection with
+ the $(D system) or $(D shell) functions.
+*/
+
+string escapeShellFileName(string fn)
@JakobOvrum

JakobOvrum Mar 1, 2012

Member

This also produces immediate garbage, and should be using inout

Member

JakobOvrum commented Mar 1, 2012

The cost of a few heap allocations is negligible compared to that of creating a process.

In terms of computational cost when allocating, indeed it's negligible.

But the memory cost is significant, since we don't have any compacting garbage collection. Please don't produce more garbage than necessary, especially not in the standard library - you can never predict the demands of the user.

edit:

Oh, and, it looks you haven't considered @safe, pure or nothrow for any of these functions, so I didn't remark on it beyond the first comment.

Owner

CyberShadow commented Mar 1, 2012

heap fragmentation

Good point. I'll take care of it.

CyberShadow added some commits Mar 1, 2012

@CyberShadow CyberShadow std.process: Optimize allocations in new functions
Optimized functions:
* escapeWindowsArgument
* escapePosixArgument
* escapeWindowsShellCommand
4ccfb65
@CyberShadow CyberShadow std.process: Fix constness and attributes 026444c
Owner

CyberShadow commented Mar 1, 2012

Now?

Did you mean ==? I think = still does what you want, but it looks awfully suspicious...

Owner

CyberShadow replied Mar 1, 2012

No, I meant =. I want to test if the result is writable.

Ah, of course! :)

I'm a little sad that this is done manually. join? Appender? Do we really have nothing appropriate for this task?

Nevermind, I see that the issue is with using replace, sorry about that :)

Owner

CyberShadow replied Mar 1, 2012

Appender does from O(log(N)) to O(N/M) allocations. Using split / join requires a second allocation. std.array.replace uses Appender internally. I don't really know what the goal is here.

Owner

CyberShadow commented Mar 9, 2012

Oops, forgot to remove the [WIP] tag.

No further objections?

Member

schveiguy commented Mar 9, 2012

Fix the test failures, I have no problem with the design. The allocator is perfect hook for my purposes.

http://d.puremagic.com/test-results/pull.ghtml?runid=89494

Owner

CyberShadow commented Mar 16, 2012

Thanks. Should be fixed now.

jmdavis merged commit b08a0f9 into dlang:master Apr 29, 2012

Member

jmdavis commented Apr 29, 2012

Merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment