Skip to content

std.process: Command escaping functions #457

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Apr 29, 2012

Conversation

CyberShadow
Copy link
Member

This adds functions to escape shell commands for use with the system and shell functions.

This is a cleanup of the code I've written for rdmd. Let me know how I did with docs / semantics. "Real-life" unit tests are, unfortunately, hard to write.

@ghost
Copy link

ghost commented Feb 24, 2012

Have you looked at http://blogs.msdn.com/b/twistylittlepassagesallalike/archive/2011/04/23/everyone-quotes-arguments-the-wrong-way.aspx ? I've tried implementing that C# algorithm in D but with no luck. I definitely need this sort of thing. :)

@CyberShadow
Copy link
Member Author

No, I haven't (yet), but I think my unit tests are quite comprehensive! They literally brute-force a string and pass the output through the Win32 parsing function.

@CyberShadow
Copy link
Member Author

I think that blog post is wrong about the way cmd handles double quotes on its command line. From what I understood, it contradicts the output of cmd /?, and his description of cmd's algorithm makes it sound like cmd /C ""test" "hello world"" wouldn't pass "hello world" as a single string to "test".

@CyberShadow
Copy link
Member Author

Thanks for the link, Andrej! I looked at the code again today and did some more tests and turns out I was wrong. I'll need to do more changes and testing to make sure I got this right.

@kyllingstad
Copy link
Member

I am not sure if we're talking about the same thing, but @schveiguy wrote something like this for Tango's process module, and he has reimplemented it for the new std.process. (BTW, Walter included the necessary changes to snn.lib in 2.058, which means we can finally submit the new std.process for Phobos review!)

@CyberShadow
Copy link
Member Author

It seems to be the same idea, but his implementation seems to be a good deal behind mine (original research which seems to miss some subtle issues, suboptimal code which needlessly uses unicode iteration, no tests).

which means we can finally submit the new std.process for Phobos review!

Great! I'm really excited about popen functionality on Windows. You should be able to practically copy-paste the bottom of the current module into the new one to get my changes.

@ghost
Copy link

ghost commented Feb 25, 2012

The last time I've ran into issues with spaces it was with Optlink. When you have paths with spaces you have to use quotation marks, but this is problematic. E.g.:

link ".\cache\win32\wingdi.obj" ,D:\dev\projects\adl\.\main.exe,nul,"D:\dev\newprojects\empty space\xfbuild\";

Here optlink missed the last backslash and is treating the import path as a path to a library file:

Warning 2: File Not Found D:\dev\newprojects\empty space\xfbuild.lib

To work around it, I have to pass an extra backslash:

link ".\cache\win32\wingdi.obj" ,D:\dev\projects\adl\.\main.exe,nul,"D:\dev\newprojects\empty space\xfbuild\\";

Of course this wouldn't even be an issue if optlink didn't use such a silly command-line syntax where it differentiates between import paths and file paths based on whether there's a backslash at the end.

@CyberShadow
Copy link
Member Author

Yes, that sounds like the usual CommandLineToArgvW algorithm. Backslashes at the end of quoted strings are treaded differently.

@CyberShadow
Copy link
Member Author

OK, should be solid now.

@ghost
Copy link

ghost commented Feb 25, 2012

Fantastic, escapeWindowsArgument works great! The resulting string is quite a weird-looking thing, a before and after of raw strings:

"D:\dev\newprojects\empty space\xfbuild\"
"\"D:\dev\newprojects\empty space\xfbuild\\\""

Thanks for the hard work.

@CyberShadow
Copy link
Member Author

Clarified.

* Remove self-contradiction
* Clarify escapeWindowsArgument implementation
* Remove a redundant if subcondition in a unit test.
@schveiguy
Copy link
Member

My std.process rules are correct, I spent a lot of time testing them. The MSDN docs are useless/wrong! But my implementation is lacking. In particular I don't handle repeated backslashes properly (contrary to what my rules state!). Good call about not doing unicode. I think when I did that I didn't understand unicode, and so I did what I thought was the most robust.

I'll also cite a limitation in your code, your function returns a string per argument, but if you are building a command line you want a string for the whole command (CreateProcess requires a single command line string). This is a waste of allocations. I'd suggest having an internal function that takes an output range that it appends the data to, and wrap that with a version that returns a string. Then I'll probably use it in my std.process rewrite :)

@CyberShadow
Copy link
Member Author

My std.process rules are correct, I spent a lot of time testing them.

I don't see any unit tests.

I'd like to say something from my history of writing this code. Initially, some of these functions were written for rdmd. I thought my unit test was comprehensive enough - turns out it was not. I had omitted the case of backslashes preceding quotes in the middle of a string.

After I posted this pull request, I found another problem with the escapeShellCommandString function. The problem made me realize that my initial approach at allowing users to stack multiple commands with shell redirection was wrong, so I had to rewrite it again, and add a truly comprehensive test that left no room for error. Even though the test requires special effort (a helper program) to run, it is included as part of this module.

Therefore, I reserve my right to remain skeptical about statements like "don't worry, I tested it".

The MSDN docs are useless/wrong! But my implementation is lacking. In particular I don't handle repeated backslashes properly (contrary to what my rules state!).

I admit, I skipped the text and looked at the code.

I'll also cite a limitation in your code, your function returns a string per argument, but if you are building a command line you want a string for the whole command (CreateProcess requires a single command line string). This is a waste of allocations. I'd suggest having an internal function that takes an output range that it appends the data to, and wrap that with a version that returns a string.

The cost of a few heap allocations is negligible compared to that of creating a process.

Comparably, your code does a character append for each character. (Yes, I know escapeWindowsShellCommand does that as well - but see above.)

@ghost
Copy link

ghost commented Feb 29, 2012

If your string is so big that you worry about allocations you're going to have to output that long command to a response file anyway. And that is where File IO is probably going to overshadow any optimizations you had.

Speaking of which, is there any way to programmatically figure out the maximum length of a command on Posixes? On win32 it's pretty simple since it's predefined..

@CyberShadow
Copy link
Member Author

I think Autotools configure scripts find it as part of the configuration process.

@schveiguy
Copy link
Member

I don't see any unit tests.

The testing was from a previous life (when I wrote the code implementing these rules in tango). In fact, I don't mean that the implementation is well tested, I mean the rules are well tested. Clearly the implementation is poorly tested since I had an obvious error in it :) The only way to test the rules is to spawn processes and see if windows thinks you got it right. There is no accurate or valid documentation anywhere. It was about 2 or 3 days of playing with cmd.exe to try and nail down the Windows argument parsing black-box.

Your implementation follows my rules, so I think we are in agreement on that.

The cost of a few heap allocations is negligible compared to that of creating a process.

Creating processes is not occurring in your argument processing function, nor is it a prerequisite for using the function. My point is, if it can be better implemented, it should be. I dislike the philosophy that one can be sloppy in writing software because most users won't notice. Especially in library code. If it's a design for performance tradeoff, that's one thing, but this should be free.

@schveiguy
Copy link
Member

Just to clarify, I'm perfectly happy having someone else implement this horrible convention of windows, but I think it should be as efficient as possible. Your code is certainly more advanced than mine, and I'm happy to use it.

@CyberShadow
Copy link
Member Author

I dislike the philosophy that one can be sloppy in writing software because most users won't notice.

There is nothing philosophical about premature optimization. It is a misallocation of resources, and nothing more. You are forgetting about the cost of maintaining more code, or more complicated code, as well as the cost to actually do the change - my time would be better spent working on something else D-related.

@schveiguy
Copy link
Member

You understand that as library code, application writers don't have a choice to optimize this.

And this isn't just an optimization, it's providing extra design flexibility.

It's ok by me to pull this, I'll just fix it when the new std.process is ready to use it. Not a big deal.

CommandLineToArgvW).
*/

string escapeWindowsArgument(string arg)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be in char[], the result is always a new buffer.
After a quick look, it looks like it should be pure, nothrow and @trusted too (it looks like assumeUnique isn't @safe)

@JakobOvrum
Copy link
Member

The cost of a few heap allocations is negligible compared to that of creating a process.

In terms of computational cost when allocating, indeed it's negligible.

But the memory cost is significant, since we don't have any compacting garbage collection. Please don't produce more garbage than necessary, especially not in the standard library - you can never predict the demands of the user.

edit:

Oh, and, it looks you haven't considered @safe, pure or nothrow for any of these functions, so I didn't remark on it beyond the first comment.

@CyberShadow
Copy link
Member Author

heap fragmentation

Good point. I'll take care of it.

Optimized functions:
* escapeWindowsArgument
* escapePosixArgument
* escapeWindowsShellCommand
@CyberShadow
Copy link
Member Author

Now?

@CyberShadow
Copy link
Member Author

Oops, forgot to remove the [WIP] tag.

No further objections?

@schveiguy
Copy link
Member

Fix the test failures, I have no problem with the design. The allocator is perfect hook for my purposes.

http://d.puremagic.com/test-results/pull.ghtml?runid=89494

@CyberShadow
Copy link
Member Author

Thanks. Should be fixed now.

jmdavis added a commit that referenced this pull request Apr 29, 2012
@jmdavis jmdavis merged commit b08a0f9 into dlang:master Apr 29, 2012
@jmdavis
Copy link
Member

jmdavis commented Apr 29, 2012

Merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants