-
-
Notifications
You must be signed in to change notification settings - Fork 609
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
allow subdirectories on Windows #5337
Conversation
|
Test please? |
|
Don't you already test it on Posix? The point here is really just to get consistent cross-platform behavior. The contents of the test directory is meaningless to me, so I can't tell what's there and what's not. If there is a Posix specific test, all we have to do is expand it to run on Windows too. |
Yes, so there should be a Windows test to make sure this doesn't regress.
I'm sorry you think so, but everyone else who contributed to DMD before you clearly found their way around it.
Yeah. |
|
I added a test and all green on the board. Let's get this merged. |
| @@ -472,12 +472,21 @@ struct FileName | |||
| { | |||
| version (Windows) | |||
| { | |||
| /* Disallow % / \ : and .. in name characters | |||
| // don't allow loading / because it might be an absolute | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
loading -> leading
|
|
|
Please rebase to get rid of the merge commit |
|
Cheers. LGTM |
|
It is peculiar to allow / in Windows paths but not \ |
|
|
|
This change ignores the advice in https://www.securecoding.cert.org/confluence/display/c/FIO02-C.+Canonicalize+path+names+originating+from+tainted+sources I believe we must take this advice seriously, and not have the D compiler become a malware vector. |
|
Who cares? It makes no difference in any realistic scenario. This is why I don't like trying to contribute to anything here. D is run by a bunch of short-sighted fools. |
I do. I believe we have a responsibility to do what we can to not create software that opens back doors to malicious people.
All I can say to that is "famous last words". And people do host instances of the compiler and allow remote users to run it - those paste bins, for example. Heck, there's our very own autotester which tries to compile anything someone submits via a pull request.
I understand the sentiment. I also read most of that document, and it suggested things like not allowing // embedded in the middle of paths, which was not done. I don't think either of us (and certainly not me) should be declaring our practices secure without a much better understanding of best practices. Keep in mind that dmd can run arbitrary code at compile time. Combining that with reading arbitrary files from the file system sounds like a large opportunity for the equivalent of a cross-site scripting attack. The |
|
On Mon, Apr 18, 2016 at 04:03:59PM -0700, Walter Bright wrote:
From the document you posted: "The best advice is to try to avoid making decisions based on a path, directory, or file name [Howard 2002]. Alternatively, use operating-system-based mechanisms, such as access control lists (ACLs) or other authorization techniques." When I set up a "run this code" thing, I used an expendable virtual machine because I didn't trust user input. That's the way to do it, not string sanitization. |
That's because you know about this capability of the compiler. It isn't a common language feature, and many users may not expect that they'd need to harden the use of it. Recall that we promote the use of D as a "scripting" language, and people may be using the compiler to run random scripts they download from the internet. Consider all the ugly surprises people have gotten from Word documents, pdf files, etc., where the readers for those files allow arbitrary code execution. |
That article does not apply to our situation. Ignore it. We do not need to resolve symbolic links, because creating these links is "on the other side of the airtight hatchway". For our purposes, is not any harder to validate Windows paths than on any other platform. |
|
On Mon, Apr 18, 2016 at 04:33:58PM -0700, Walter Bright wrote:
If they are running random scripts they download from the internet, The compiled program is always a security threat greater than or equal The only possible scenario in which the compiler's "holes" would be a |
|
Also, we should not be doing |
There doesn't appear to be any validation code on the Windows path. |
|
What do you mean? There's 18 lines of code in the |
|
Does that follow any of the recommendations in the article? |
To the best of my knowledge it follows all the recommendations that apply to us. It forbids absolute paths and escaping via |
|
On Mon, Apr 18, 2016 at 05:21:04PM -0700, Vladimir Panteleev wrote:
Which, by the way, is the status quo. All this PR does is allow '/' to appear after the first character Everything else is left unmodified. |
Where does it say that? To the best of my knowledge, two consecutive slashes or backslashes in the middle of a string will behave as one. But there's also no harm in checking for them. |
I know. But what about invalid UTF code points? What about backspace characters? All the weird Unicode non-printing values? The API docs I've read are utterly silent on this, which means who knows what they do. We don't even disallow clearly illegal characters like *.
I don't share your confidence that since some APIs strip off the leading
That's not a justification for adding more vectors.
It is not, as these strings will be coming from D source files, or can even be the result of CTFE execution. This is where the checks should be; there isn't any other place. Here's where I'm coming from. Reading arbitrary files is risky. Hackers have shown themselves over and over to be very clever in exploiting holes that never occurred to the programmer. The idea is to be as conservative as possible, not as expansive as possible. Tighten down the screws and only loosen them as experience shows that loosening is both safe and the extra capability is shown to be worth while. |
I know. If I was doing a do-over, D wouldn't allow non-ASCII characters in identifiers (which is why they are allowed in module names). |
There are no OS file APIs that parse this path format!!!!!
No! Then you would need to forbid any such characters to appear ANYWHERE in a compiled program - source code, calculated CTFE expressions, etc. Mr. Bright, I'm sorry but in this particular situation, you don't know what you're talking about. Please allow someone else to review this instead. |
|
On Mon, Apr 18, 2016 at 08:18:45PM -0700, Walter Bright wrote:
It is never stripped off, that's a legal file name on Linux and will work in the current directory, and an illegal one on Windows and the operating system will not allow you to open it. This is such a silly discussion. |
Can't you just deprecate that feature? Print a warning for one or two versions and then remove it?
+1 |
It will not solve any technical issues. Some filesystems (notably HFS+) do hairy stuff with Unicode normalization, but it is not any more of a concern in this situation than case insensitivity. The "problems" from using Unicode characters remain as they can occur from numerous other sources. |
|
Just to elaborate on this for a bit. Here is the situation: We have string imports which take arbitrary strings, and we need to make sure that these strings are file names under one of the -J switches. Right? What is our concern here?
So, what is the security impact? Here's my understanding of what's happening:
So, what's the attack vectors? I see:
As things are, currently I don't see any issues with the current code, except that it is overly restrictive (e.g. But really, I'm with Adam on how completely absurd this thread is. Currently most programs are not distributed as mere It's shameful how much time we're wasting on this. You've just earned yourself one fewer Digger feature. |
|
Paths are fundamentally just 16-bit words on Windows and bytes on Posix. That's how they should be treated although today we mostly get away with assuming Unicode encodings. E.g. When you insert a CD-ROM the charset is converted to Unicode. We can also make all compiler interactions with file names require a portable ASCII subset, including module names, if file systems are not Unicode enough yet. |
That's actually precisely the point. I do know that opening up files is risky and should be restricted to a subdirectory of the -Jpath. What I don't know is a reliable way of restricting this, given the menagerie of things that can appear in paths. I'm skeptical of the "don't worry about it" attitude here. Are any of us particularly knowledgeable/experienced about what kinds of exploits are possible? I'd like to know a compelling reason why this should allow unusual (to put it mildly) characters in a filename. (Control characters, malformed Unicode, wildcards, paragraph separators, etc.) This is not a general purpose utility to read any file - it's likely to be a file generated for the purpose of being read by the D compiler. What awesome feature are we giving up in order to not support paths that consist of x86 binary instructions being inserted as part of a buffer overflow attack? I can't think of any. Just because we can turn on "read any filename" doesn't necessarily mean we should. Where we don't know, we should err on the side of caution. |
|
Did my big post above not get through? Because it already answers those questions. It's pointless, it's dangerous, and it's none of our business. |
|
You did make some good points, but some I disagree with. For example, dub allowing any path with -J is not a justification for the compiler allowing any path to be appended to a -Jpath. Secondly, I still don't see a justification for allowing any random binary data to be allowed as filenames, even if the OS API accepts them. Note that import path/file names are restricted to being identifier characters. Sometimes people grumble about that, but in general that has turned out to be a good decision with little (if any) downside. What is the upside to allowing random binary data as the string import filename? |
|
No justification is needed because there is no identifiable downside. The answer is the same as to the question "Why should we not forbid paths that contain the string "virus"?" (or any other string). Anyway, since it shouldn't make any difference in practice, I'm not strongly opposed the idea (I just strongly believe it's unnecessary). However, introducing such checks now would technically be a breaking change at this point. It is also orthogonal to this pull request, since it should probably apply to all platforms, and should be done regardless if the path string contains a directory separator. |
It's not that bad. I've programmed through 40 years of additions to how paths evolve, and there are certain predictable things about it. One is that alphanumerics will always work as filename characters. It's the other characters that shift around.
I don't understand how creating restrictions would be dangerous. |
From the command line, I agree. From within the compiler, not so much, as this is arbitrary user input data that is sent to the operating system filesystem API. This can happen on things like pastebin - anytime the compiler is on one system and untrusted users are sending it source code to compile. (The system shouldn't be set up to allow untrusted users sending it arbitrary command line switches.) I'm no expert on exploits, but they often follow the pattern of "find a bug, and exploit it by sending a carefully crafted string." By sanitizing the filename string, this makes it very hard to create a carefully crafted string. This is not like any other source string, as I don't know of any other part of D source code that gets sent to operating system APIs other than as I/O buffered data or the aforementioned module names (which are sanitized). You've characterized being concerned about imaginary bugs as FUD, but all exploits exploit bugs that nobody knew existed, and there doesn't seem to be any shortage of new ones being found all the time. |
I don't agree. This discussion about security and the right way to do this is one we should be having (even if it concludes with we don't have a problem). Too many organizations don't give a thought to security until after the barn burned down. |
I don't think path syntax has changed on either Windows or POSIX for the past 20 years.
This is still far from being an exploitable scenario.
We are writing code to ward off imaginary vulnerabilities we can only hypothesize the nature of. I think it's more likely we'll introduce a vulnerability than close one, by means of stack / heap buffer overflows (the code uses C strings) or obscuring the actually important logic (thus introducing a vulnerability now or later when something else changes, and it's not obvious why).
No, it's just FUD. You haven't provided one concrete reason for this.
Then perhaps you should leave this conversation to someone more experienced with the subject. Myself, Steven and Lars have worked on the path escaping functions in std.process. I've discovered a number of curious properties on how the Windows command processor escapes paths during my work on that. I had identified and created proof-of-concept exploits for some buffer overflow and transclusion exploits in the past. If you trust my expertise on this subject, you should trust my judgement. Otherwise you're just wasting our time. I've also asked a few friends (incl. an infosec person) to look at this patch. They haven't found any issues.
Your concerns are misplaced. I can name one much more serious security concern for D right now (though I won't do so publicly). |
|
BTW this discussion is still off-topic for this pull request. You are discussing a hypothetical breaking change orthogonal to the one presenting here. |
I appreciate that. Email me privately so we can fix it. |
I'm glad you've done this, it does make me feel a lot better that we aren't missing something. I still don't understand, however, how anyone can be so sure there are no buffer overflow exploits in the filesystem API that could not possibly be exploitable with a specially crafted filename string containing malicious binary code, now or in the future or on some system we may port the compiler to.
That is true. But I am reasonably satisfied now that there aren't other means besides .. and leading / of opening any file on the filesystem. |
The length of a path has increased dramatically on Windows. Support for / as path separator remains erratic. I don't know when the ?\ appeared. There was the abandonment of Win95 and its 'A' APIs. Possibly some changes to deal with the appearance of surrogate pairs. |
Only for UNC paths using the Unicode API. We forbid UNC paths (as they start with a backslash) and don't use the Unicode API (though we should), so we are not affected.
I think the change has been mostly in userspace utilities, not OS functions.
(I'm guessing you mean
I believe the underlying handling remains the same. The specification may have changed from UCS-2 to UTF-16, but that did not affect how they were handled. I don't think anything changed regarding directory traversal? |
Well, by simple fact that everyone would be in a LOT of trouble if there was a vulnerability in the OS for handling file names. There are many networked applications that access user-specified file names. It is many times more likely that such a vulnerability would be present in userspace (libc, dmd, or some other library). By the way, it's perfectly possible to create a vulnerability payload using only printable or even alphanumeric characters. See: https://en.wikipedia.org/wiki/Alphanumeric_shellcode Anyway, as I said, I'm not strongly against such validation, I just don't see the necessity. Userspace programs generally don't need to do this. I haven't seen any other project do this sort of validation, and I don't see why we should. For example, |
What I remember is that dealing with quotations on the command line was very bizarre, and did not follow the documentation (or the documentation was incomplete). I came up with a correct generator through trial and error. As far as paths, I don't remember that update specifically, but I wouldn't doubt it was hairy. As far as allowing I think continuing to disallow These are my opinions on the matter, I am not a security expert or have any experience with exploiting bugs at all. |
Yes. Github helpfully removed the leading \
That is a good point.
I had forgotten about that, thanks for reminding me.
I agree it is unlikely there is a vulnerability in this for widely used systems like Windows and Linux. But I keep reading about vulnerabilities being discovered in unexpected places in things people assumed were bulletproof because they were widely used. Furthermore, D may get ported to other operating systems that may not be so well hardened. I would like to turn this around, and ask again what need is served by supporting string import filenames with backspaces in them? This is not a grep program or a git program that needs to be able to deal with any filename supported by the operating system. You have made a good case for this PR, and I'll pull it. But I will open another one that will sanitize the filename, both the types of characters accepted and restricting the length to PATH_MAX, and we can talk about that there. |
|
Auto-merge toggled on |
Thanks.
Well, I'm out of arguments regarding this question, and I'm not strongly against such a restriction, so if if you'd still like to do it, go ahead. I agree that there is no strong use case to support unusual file names, so restricting the paths to what is supported on Windows and Linux would probably be fine. |
https://issues.dlang.org/show_bug.cgi?id=14349
I actually think the import restrictions are silly anyway, but this is just too restricting for use and creates a platform incompatibility between Windows and posix. This slight loosening of the rules should let us do the same without really opening anything else up (though even if it does, meh, users can already access files on their own computer so what difference does it make if some random code does too?)