allow subdirectories on Windows #5337

Merged: 2 commits, Apr 20, 2016

adamdruppe
Contributor

https://issues.dlang.org/show_bug.cgi?id=14349

I actually think the import restrictions are silly anyway, but this is just too restricting for use and creates a platform incompatibility between Windows and posix. This slight loosening of the rules should let us do the same without really opening anything else up (though even if it does, meh, users can already access files on their own computer so what difference does it make if some random code does too?)

@CyberShadow
Member

Test please?

@adamdruppe
Contributor Author

Don't you already test it on Posix? The point here is really just to get consistent cross-platform behavior.

The contents of the test directory are meaningless to me, so I can't tell what's there and what's not.

If there is a Posix specific test, all we have to do is expand it to run on Windows too.

@CyberShadow
Member

Don't you already test it on Posix? The point here is really just to get consistent cross-platform behavior.

Yes, so there should be a Windows test to make sure this doesn't regress.

The contents of the test directory are meaningless to me, so I can't tell what's there and what's not.

I'm sorry you think so, but everyone else who contributed to DMD before you clearly found their way around it.

If there is a Posix specific test, all we have to do is expand it to run on Windows too.

Yeah.

@adamdruppe
Contributor Author

I added a test and all green on the board. Let's get this merged.

@@ -472,12 +472,21 @@ struct FileName
{
version (Windows)
{
/* Disallow % / \ : and .. in name characters
// don't allow loading / because it might be an absolute
Member

loading -> leading

@CyberShadow
Member

CyberShadow commented Apr 18, 2016

LGTM

@CyberShadow
Member

Please rebase to get rid of the merge commit

@CyberShadow
Member

Cheers. LGTM

@WalterBright
Member

It is peculiar to allow / in Windows paths but not \

@adamdruppe
Contributor Author

\ paths do not work on other systems either. This patch just makes the rules consistent cross-platform.

@WalterBright
Member

WalterBright commented Apr 18, 2016

This change ignores the advice in https://www.securecoding.cert.org/confluence/display/c/FIO02-C.+Canonicalize+path+names+originating+from+tainted+sources

I believe we must take this advice seriously, and not have the D compiler become a malware vector.

@adamdruppe
Contributor Author

Who cares? It makes no difference in any realistic scenario.

This is why I don't like trying to contribute to anything here. D is run by a bunch of short-sighted fools.

@WalterBright
Member

Who cares?

I do. I believe we have a responsibility to do what we can to not create software that opens back doors to malicious people.

It makes no difference in any realistic scenario.

All I can say to that is "famous last words". And people do host instances of the compiler and allow remote users to run it - those paste bins, for example. Heck, there's our very own autotester which tries to compile anything someone submits via a pull request.

D is run by a bunch of short-sighted fools.

I understand the sentiment. I also read most of that document, and it suggested things like not allowing // embedded in the middle of paths, which was not done. I don't think either of us (and certainly not me) should be declaring our practices secure without a much better understanding of best practices.

Keep in mind that dmd can run arbitrary code at compile time. Combining that with reading arbitrary files from the file system sounds like a large opportunity for the equivalent of a cross-site scripting attack.

The -Jpath is there for getting files from a path.

@adamdruppe
Contributor Author

On Mon, Apr 18, 2016 at 04:03:59PM -0700, Walter Bright wrote:

All I can say to that is "famous last words". And people do host instances of the compiler and allow remote users to run it - those paste bins, for example.

From the document you posted:

"The best advice is to try to avoid making decisions based on a path, directory, or file name [Howard 2002]. Alternatively, use operating-system-based mechanisms, such as access control lists (ACLs) or other authorization techniques."

When I set up a "run this code" thing, I used an expendable virtual machine because I didn't trust user input. That's the way to do it, not string sanitization.

@WalterBright
Member

When I set up a "run this code" thing, I used an expendable virtual machine

That's because you know about this capability of the compiler. It isn't a common language feature, and many users may not expect that they'd need to harden the use of it. Recall that we promote the use of D as a "scripting" language, and people may be using the compiler to run random scripts they download from the internet.

Consider all the ugly surprises people have gotten from Word documents, pdf files, etc., where the readers for those files allow arbitrary code execution.

@CyberShadow
Member

CyberShadow commented Apr 18, 2016

This change ignores the advice in https://www.securecoding.cert.org/confluence/display/c/FIO02-C.+Canonicalize+path+names+originating+from+tainted+sources

That article does not apply to our situation. Ignore it. We do not need to resolve symbolic links, because creating these links is "on the other side of the airtight hatchway". For our purposes, it is not any harder to validate Windows paths than on any other platform.

@CyberShadow CyberShadow reopened this Apr 18, 2016
@adamdruppe
Contributor Author

On Mon, Apr 18, 2016 at 04:33:58PM -0700, Walter Bright wrote:

That's because you know about this capability of the compiler. It isn't a common language feature, and many users may not expect that they'd need to harden the use of it. Recall that we promote the use of D as a "scripting" language, and people may be using the compiler to run random scripts they download from the internet.

If they are running random scripts they download from the internet,
the program can already do anything! It doesn't have to trick
the compiler, it can just call fopen("whatever/it/wants"); - moreover,
it can even dial out and transmit the files to some other machine using
regular network calls.

The compiled program is always a security threat greater than or equal
to the compiler.

The only possible scenario in which the compiler's "holes" would be a
problem is if the compiler is run in a more sensitive context than the
program it is compiling... and how often does that happen? The pasties
and testers actually run the program too, so any attacker would just
write malicious code in an ordinary fashion. Most developers run programs
they are testing locally too. Why import("") when you can fopen("")?

@CyberShadow
Member

Also, we should not be doing realpath either. If a user created a symbolic link under a -J directory, then the expected behavior is that DMD will follow it. It does not make sense to resolve symbolic links as compiling a program with DMD does not offer the capability to create them. Symbolic links are also just one type of redirection available, as there are also hard links, bind mounts, etc. Thus the POSIX behavior is also currently wrong.

@WalterBright
Member

That article does not apply to our situation. Ignore it. We do not need to resolve symbolic links, because creating these links is "on the other side of the airtight hatchway". For our purposes, it is not any harder to validate Windows paths than on any other platform.

There doesn't appear to be any validation code on the Windows path.

@CyberShadow
Member

What do you mean? There are 18 lines of code in the version(Windows) block.

@WalterBright
Member

Does that follow any of the recommendations in the article?

@CyberShadow
Member

Does that follow any of the recommendations in the article?

To the best of my knowledge it follows all the recommendations that apply to us. It forbids absolute paths and escaping via "..".

@adamdruppe
Contributor Author

On Mon, Apr 18, 2016 at 05:21:04PM -0700, Vladimir Panteleev wrote:

To the best of my knowledge it follows all the recommendations that apply to us. It forbids absolute paths and escaping via "..".

Which, by the way, is the status quo.

All this PR does is allow '/' to appear after the first character
in the string.

Everything else is left unmodified.
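
As a rough illustration of the rule described above, here is a minimal Python sketch. It is reconstructed only from what this thread says (reject a leading /, and reject %, \, : and .. anywhere) and is not DMD's actual D code; the function name is hypothetical.

```python
def windows_import_name_allowed(name: str) -> bool:
    """Approximate the Windows string-import rule as described in this thread.

    Assumption: the rule is inferred from the discussion, not copied from DMD.
    """
    # A leading '/' might start an absolute path, so it is rejected outright.
    if name.startswith("/"):
        return False
    # These characters stay forbidden anywhere in the name.
    if any(ch in name for ch in ("%", "\\", ":")):
        return False
    # Any ".." is rejected, even inside an otherwise harmless file name.
    if ".." in name:
        return False
    # A '/' after the first character is the only thing newly permitted.
    return True
```

Under these assumptions, a name such as data/table.txt passes, while /data/table.txt, data\table.txt, C:/secret.txt and ../secret.txt are all rejected.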

@CyberShadow
Member

and it suggested things like not allowing // embedded in the middle of paths

Where does it say that?

To the best of my knowledge, two consecutive slashes or backslashes in the middle of a string will behave as one.

But there's also no harm in checking for them.
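
For readers who want to see the collapsing rule in action, Python's path modules apply it lexically. This is only an analogy for how the OS treats such paths when opening a file, and the helper name is made up for this example. Note that normpath also collapses . and .. components, which goes beyond the point being made here.

```python
import ntpath
import posixpath


def lexical_normalize(path: str, windows: bool = False) -> str:
    """Normalize a path purely as a string, without touching the filesystem."""
    flavor = ntpath if windows else posixpath
    # normpath collapses repeated separators in the middle of a path
    # (and, separately, "." and ".." components).
    return flavor.normpath(path)


print(lexical_normalize("sub//file.txt"))                    # POSIX flavor
print(lexical_normalize("sub\\\\file.txt", windows=True))    # Windows flavor
```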

@WalterBright
Member

It's totally valid to have non-ascii in a filename.

I know. But what about invalid UTF code points? What about backspace characters? All the weird Unicode non-printing values? The API docs I've read are utterly silent on this, which means who knows what they do. We don't even disallow clearly illegal characters like *.

It is completely pointless complexity which obscures the actually important tests.

I don't share your confidence that since some APIs strip off the leading file: off that no other APIs will. I would put this one in the category of "disallow // anywhere in the string."

there are far more simple attack vector

That's not a justification for adding more vectors.

E.g. all strings coming from user input should be quoted and have special characters represented as escape sequences. This is a separate concern from this PR.

It is not, as these strings will be coming from D source files, or can even be the result of CTFE execution. This is where the checks should be; there isn't any other place.


Here's where I'm coming from. Reading arbitrary files is risky. Hackers have shown themselves over and over to be very clever in exploiting holes that never occurred to the programmer. The idea is to be as conservative as possible, not as expansive as possible. Tighten down the screws and only loosen them as experience shows that loosening is both safe and the extra capability is shown to be worth while.

@WalterBright
Member

Currently module names with international characters are allowed.

I know. If I was doing a do-over, D wouldn't allow non-ASCII characters in identifiers (which is why they are allowed in module names).

@CyberShadow
Member

I don't share your confidence that since some APIs strip off the leading file: off that no other APIs will. I would put this one in the category of "disallow // anywhere in the string."

There are no OS file APIs that parse this path format!!!!!

It is not, as these strings will be coming from D source files, or can even be the result of CTFE execution. This is where the checks should be; there isn't any other place.

No! Then you would need to forbid any such characters to appear ANYWHERE in a compiled program - source code, calculated CTFE expressions, etc.

Mr. Bright, I'm sorry but in this particular situation, you don't know what you're talking about.

Please allow someone else to review this instead.

@adamdruppe
Contributor Author

On Mon, Apr 18, 2016 at 08:18:45PM -0700, Walter Bright wrote:

I know. But what about invalid UTF code points? What about backspace characters? All the weird Unicode non-printing values? The API docs I've read are utterly silent on this, which means who knows what they do. We don't even disallow clearly illegal characters like *.

* is a perfectly legal character on Linux filesystems, and there's no need to check it on Windows, since the operating system will do it for you.

I don't share your confidence that since some APIs strip off the leading file: off that no other APIs will. I would put this one in the category of "disallow // anywhere in the string."

It is never stripped off, that's a legal file name on Linux and will work in the current directory, and an illegal one on Windows and the operating system will not allow you to open it.

This is such a silly discussion.

@CyberShadow
Member

CC @kyllingstad @schveiguy

@wilzbach
Member

I know. If I was doing a do-over, D wouldn't allow non-ASCII characters in identifiers (which is why they are allowed in module names).

Can't you just deprecate that feature? Print a warning for one or two versions and then remove it?

No! Then you would need to forbid any such characters to appear ANYWHERE in a compiled program - source code, calculated CTFE expressions, etc.

+1

@CyberShadow
Member

CyberShadow commented Apr 19, 2016

Can't you just deprecate that feature? Print a warning for one or two versions and then remove it?

It will not solve any technical issues.

Some filesystems (notably HFS+) do hairy stuff with Unicode normalization, but it is not any more of a concern in this situation than case insensitivity. The "problems" from using Unicode characters remain as they can occur from numerous other sources.

@CyberShadow
Member

CyberShadow commented Apr 19, 2016

Just to elaborate on this for a bit.

Here is the situation:

We have string imports which take arbitrary strings, and we need to make sure that these strings are file names under one of the -J switches. Right?

What is our concern here?

  • Do we care about directory traversal attacks (using absolute paths, or using ../ to escape into parent directories)? YES. Our goal is to keep the read files under the -J paths, and allowing directory traversal clearly does not satisfy that goal.
  • Do we care about resolving symbolic links? NO. Module path name resolution does not resolve or care about symbolic links. Breaking that would break a lot of code, and would be completely pointless. Symbolic links are useful for -J paths, for the same reason they are useful for module imports and elsewhere. It is not possible to create a symbolic link by compiling a D program. If an attacker can create a symbolic link to the drive root, he can already steal any file. As I mentioned, there are other avenues of filesystem aliasing (mounts, bind mounts, hard links), to which we do not need to pay any concern for the same reasons.
  • Do we care about canonicalizing the file names (fold case, Unicode normalization)? NO. Other than collapsing .., there is simply no reason to. Git has a valid reason in that it needs to forbid checking out files that, after canonicalization, end up under its .git. We have no such requirement - we have no such special directory, and we are reading files, not writing them. For symbolic link resolution (normally part of canonicalization), see above.
  • Do we care about possible imaginary OS bugs, or fictional future OS changes? NO. This is just FUD. We have no way of knowing what the bugs or changes would be like. Creating arbitrary restrictions based on what we think might be dangerous is in itself dangerous. The correct approach is to gain a complete understanding of the current situation and implement the correct solution.
  • Do we care about validating the strings against a blacklist of bad words? NO. I hope I shouldn't need to explain this one.
  • Do we care about Unicode / non-printable characters in the string? NO. With the exception of NUL and special path characters such as :, /, \ and .., such characters do not have any special roles in path name resolution. The documentation doesn't mention them precisely because they are not treated specially. The OS or filesystem may forbid some characters from appearing in file names, but this is not different from forbidding paths that are too long or too short or have too many path components, and it is not our business to perform this validation, but the OS's. As I wrote above, there is the concern that these characters may end up on the screen and mislead the user about where they came from, but this is currently also doable via pragma(msg, ...) and many other means, and is not a concern for this PR.
  • Do we care if the strings will somehow end up being used in a way we don't use them? NO. Again, one can think of any number of ways that any piece of data can be misused by some hypothetical inconceivable future change in the program. What if the string somehow ends up being read by a web browser? What if the string somehow gets written to /dev/sda? What if the string gets sent to the Pope? Same magnitude of absurdity.
  • Do we care whether we do exactly what a page on the Internet says without trying to fully understand the problem? NO. That's cargo-cult programming.
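
To make the first two points concrete, here is a minimal Python sketch of a purely lexical traversal check: it collapses .. components as text and never consults the filesystem, so symbolic links are not resolved. The function name is hypothetical and this is not what DMD did at the time (per this thread, DMD simply rejects any ..). The sketch also assumes POSIX-style / separators, and treating a/.. as harmless only holds if a is not itself a symbolic link.

```python
def stays_within_base(name: str) -> bool:
    """Return True if a relative, '/'-separated name cannot climb above its base.

    Purely lexical: no filesystem access, so symbolic links are not resolved.
    """
    # An absolute path ignores the base directory entirely.
    if name.startswith("/"):
        return False
    depth = 0
    for part in name.split("/"):
        if part in ("", "."):
            continue  # empty components (from "//") and "." do not move
        if part == "..":
            depth -= 1
            if depth < 0:
                return False  # climbed above the base directory
        else:
            depth += 1
    return True
```

With this check, a/../b.txt and a..txt are accepted, while ../b.txt, a/../../b.txt and /etc/passwd are rejected.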

So, what is the security impact?

Here's my understanding of what's happening:

  1. The user-supplied data (the string import paths from source code) will be parsed by the compiler.
  2. The compiler will validate the path to prevent directory traversal. During this stage it is treated as a C null-terminated string, which causes any embedded null characters to cause the string to be truncated. From a security point of view, this is harmless, though it should probably be an explicit error.
  3. The compiler will then search for the string import path under all directories supplied via -J switches, and load the first hit, using the POSIX open call or the Windows CreateFileA call. (That ought to be changed to use CreateFileW BTW.)
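
The steps above can be sketched roughly as follows in Python. This is an illustration under stated assumptions, not DMD's code: the function names are invented, the traversal guard is a simplified stand-in for the compiler's validation, and the is_file parameter exists only so the search order can be exercised without a real directory tree.

```python
import os
import posixpath


def c_string_view(name):
    """What code treating `name` as a NUL-terminated C string would see."""
    return name.split("\0", 1)[0]


def find_string_import(name, j_dirs, is_file=os.path.isfile):
    """Return the first existing j_dir/name, or None if there is no hit."""
    # Make an embedded NUL an explicit error rather than a silent truncation.
    if "\0" in name:
        raise ValueError("embedded NUL in string import name")
    # Simplified stand-in for the compiler's directory traversal check.
    if name.startswith("/") or ".." in name:
        return None
    # Search the -J directories in the order given and load the first hit.
    for directory in j_dirs:
        candidate = posixpath.join(directory, name)
        if is_file(candidate):
            return candidate
    return None
```

Here c_string_view shows why an embedded NUL truncates the name in step 2, and find_string_import shows the first-hit search of step 3.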

So, what are the attack vectors?

I see:

  • Buffer overflows and such during validation. Using C strings here doesn't help.
  • Improperly sanitized paths to open / CreateFileA which lead to loading files from paths not under those specified via -J.

As things stand, I don't see any issues with the current code, except that it is overly restrictive (e.g. import("a..txt"), backslashes on Windows, symbolic links on POSIX etc. won't work even though they're harmless).

But really, I'm with Adam on how completely absurd this thread is. Currently most programs are not distributed as mere .d files, but as dub packages, or accompanied with makefiles, etc. Dub allows specifying arbitrary -J switches in its build configuration, and makefiles of course allow unrestricted execution. Furthermore, even the case where a program is ONLY compiled on a machine, and the binary is then transferred somewhere else, which is the entire reason for -J's existence (and it was me who suggested its introduction!), is so remote as to border on hypothetical.

It's shameful how much time we're wasting on this. You've just earned yourself one fewer Digger feature.

@mleise
Contributor

mleise commented Apr 19, 2016

Paths are fundamentally just 16-bit words on Windows and bytes on Posix. That's how they should be treated although today we mostly get away with assuming Unicode encodings. E.g. When you insert a CD-ROM the charset is converted to Unicode. We can also make all compiler interactions with file names require a portable ASCII subset, including module names, if file systems are not Unicode enough yet.
Removal of non-ASCII identifiers on the other hand has little to do with this pull request aside from a small overlap with module identifiers. It's a major breakage. I'd have to replace 1900 occurrences of ℕ with size_t and change a bunch of functions and ddoc using α, β, γ, Δ, ... denoting angles, differences and such. I don't see what this solves.

@WalterBright
Member

you don't know what you're talking about.

That's actually precisely the point. I do know that opening up files is risky and should be restricted to a subdirectory of the -Jpath. What I don't know is a reliable way of restricting this, given the menagerie of things that can appear in paths. I'm skeptical of the "don't worry about it" attitude here. Are any of us particularly knowledgeable/experienced about what kinds of exploits are possible?

I'd like to know a compelling reason why this should allow unusual (to put it mildly) characters in a filename. (Control characters, malformed Unicode, wildcards, paragraph separators, etc.) This is not a general purpose utility to read any file - it's likely to be a file generated for the purpose of being read by the D compiler. What awesome feature are we giving up in order to not support paths that consist of x86 binary instructions being inserted as part of a buffer overflow attack? I can't think of any.

Just because we can turn on "read any filename" doesn't necessarily mean we should. Where we don't know, we should err on the side of caution.

@CyberShadow
Member

CyberShadow commented Apr 19, 2016

Did my big post above not get through? Because it already answers those questions. It's pointless, it's dangerous, and it's none of our business.

@WalterBright
Member

You did make some good points, but some I disagree with. For example, dub allowing any path with -J is not a justification for the compiler allowing any path to be appended to a -Jpath. Secondly, I still don't see a justification for allowing any random binary data to be allowed as filenames, even if the OS API accepts them.

Note that import path/file names are restricted to being identifier characters. Sometimes people grumble about that, but in general that has turned out to be a good decision with little (if any) downside.

What is the upside to allowing random binary data as the string import filename?

@CyberShadow
Member

CyberShadow commented Apr 19, 2016

No justification is needed because there is no identifiable downside. The answer is the same as to the question "Why should we not forbid paths that contain the string "virus"?" (or any other string).

Anyway, since it shouldn't make any difference in practice, I'm not strongly opposed to the idea (I just strongly believe it's unnecessary). However, introducing such checks now would technically be a breaking change. It is also orthogonal to this pull request, since it should probably apply to all platforms, and should be done regardless of whether the path string contains a directory separator.

@WalterBright
Member

We have no way of knowing what the bugs or changes would be like.

It's not that bad. I've programmed through 40 years of additions to how paths evolve, and there are certain predictable things about it. One is that alphanumerics will always work as filename characters. It's the other characters that shift around.

Creating arbitrary restrictions based on what we think might be dangerous is in itself dangerous.

I don't understand how creating restrictions would be dangerous.

@WalterBright
Member

No justification is needed because there is no identifiable downside. The answer is the same as to the question "Why should we not forbid paths that contain the string "virus" (or any other string)?".

From the command line, I agree. From within the compiler, not so much, as this is arbitrary user input data that is sent to the operating system filesystem API. This can happen on things like pastebin - anytime the compiler is on one system and untrusted users are sending it source code to compile. (The system shouldn't be set up to allow untrusted users sending it arbitrary command line switches.)

I'm no expert on exploits, but they often follow the pattern of "find a bug, and exploit it by sending a carefully crafted string." By sanitizing the filename string, this makes it very hard to create a carefully crafted string.

This is not like any other source string, as I don't know of any other part of D source code that gets sent to operating system APIs other than as I/O buffered data or the aforementioned module names (which are sanitized).

You've characterized being concerned about imaginary bugs as FUD, but all exploits exploit bugs that nobody knew existed, and there doesn't seem to be any shortage of new ones being found all the time.

@WalterBright
Member

It's shameful how much time we're wasting on this.

I don't agree. This discussion about security and the right way to do this is one we should be having (even if it concludes with we don't have a problem). Too many organizations don't give a thought to security until after the barn burned down.

@CyberShadow
Member

It's not that bad. I've programmed through 40 years of additions to how paths evolve, and there are certain predictable things about it. One is that alphanumerics will always work as filename characters. It's the other characters that shift around.

I don't think path syntax has changed on either Windows or POSIX for the past 20 years.

This can happen on things like pastebin - anytime the compiler is on one system and untrusted users are sending it source code to compile.

This is still far from being an exploitable scenario.

I don't understand how creating restrictions would be dangerous.

We are writing code to ward off imaginary vulnerabilities we can only hypothesize the nature of. I think it's more likely we'll introduce a vulnerability than close one, by means of stack / heap buffer overflows (the code uses C strings) or obscuring the actually important logic (thus introducing a vulnerability now or later when something else changes, and it's not obvious why).

You've characterized being concerned about imaginary bugs as FUD, but all exploits exploit bugs that nobody knew existed, and there doesn't seem to be any shortage of new ones being found all the time.

No, it's just FUD. You haven't provided one concrete reason for this.

I'm no expert on exploits

Then perhaps you should leave this conversation to someone more experienced with the subject.

Myself, Steven and Lars have worked on the path escaping functions in std.process. I've discovered a number of curious properties on how the Windows command processor escapes paths during my work on that. I had identified and created proof-of-concept exploits for some buffer overflow and transclusion exploits in the past. If you trust my expertise on this subject, you should trust my judgement. Otherwise you're just wasting our time.

I've also asked a few friends (incl. an infosec person) to look at this patch. They haven't found any issues.

This discussion about security and the right way to do this is one we should be having (even if it concludes with we don't have a problem).

Your concerns are misplaced. I can name one much more serious security concern for D right now (though I won't do so publicly).

@CyberShadow
Member

BTW this discussion is still off-topic for this pull request. You are discussing a hypothetical breaking change orthogonal to the one presented here.

@WalterBright
Member

Your concerns are misplaced. I can name one much more serious security concern for D right now (though I won't do so publicly).

I appreciate that. Email me privately so we can fix it.

@WalterBright
Member

I've also asked a few friends (incl. an infosec person) to look at this patch. They haven't found any issues.

I'm glad you've done this, it does make me feel a lot better that we aren't missing something. I still don't understand, however, how anyone can be so sure that there are no buffer overflow bugs in the filesystem API that could be exploited with a specially crafted filename string containing malicious binary code, now or in the future or on some system we may port the compiler to.

You are discussing a hypothetical breaking change orthogonal to the one presented here.

That is true. But I am reasonably satisfied now that there aren't other means besides .. and leading / of opening any file on the filesystem.

@WalterBright
Member

I don't think path syntax has changed on either Windows or POSIX for the past 20 years.

The length of a path has increased dramatically on Windows. Support for / as path separator remains erratic. I don't know when the ?\ appeared. There was the abandonment of Win95 and its 'A' APIs. Possibly some changes to deal with the appearance of surrogate pairs.

@CyberShadow
Member

The length of a path has increased dramatically on Windows.

Only for UNC paths using the Unicode API. We forbid UNC paths (as they start with a backslash) and don't use the Unicode API (though we should), so we are not affected.

Support for / as path separator remains erratic.

I think the change has been mostly in userspace utilities, not OS functions.

I don't know when the ?\ appeared

(I'm guessing you mean \\?\ and \\.\ prefixes) In Windows NT I think, but that's just a variant of UNC paths.

Possibly some changes to deal with the appearance of surrogate pairs.

I believe the underlying handling remains the same. The specification may have changed from UCS-2 to UTF-16, but that did not affect how they were handled.

I don't think anything changed regarding directory traversal?

@CyberShadow
Member

CyberShadow commented Apr 19, 2016

I still don't understand, however, how anyone can be so sure that there are no buffer overflow bugs in the filesystem API that could be exploited with a specially crafted filename string containing malicious binary code, now or in the future or on some system we may port the compiler to.

Well, by simple fact that everyone would be in a LOT of trouble if there was a vulnerability in the OS for handling file names. There are many networked applications that access user-specified file names. It is many times more likely that such a vulnerability would be present in userspace (libc, dmd, or some other library).

By the way, it's perfectly possible to create a vulnerability payload using only printable or even alphanumeric characters. See:

https://en.wikipedia.org/wiki/Alphanumeric_shellcode

Anyway, as I said, I'm not strongly against such validation, I just don't see the necessity. Userspace programs generally don't need to do this. I haven't seen any other project do this sort of validation, and I don't see why we should.

For example, git allows checking in files with control characters in the filenames, and they've been pretty careful about path security considering the recent incidents.

@schveiguy
Member

Myself, Steven and Lars have worked on the path escaping functions in std.process.

What I remember is that dealing with quotations on the command line was very bizarre, and did not follow the documentation (or the documentation was incomplete). I came up with a correct generator through trial and error.

As far as paths, I don't remember that update specifically, but I wouldn't doubt it was hairy.

As far as allowing /, I am 100% on board with that. We don't have to support all the different Windows APIs here, just file reading, and I'm sure / support is here to stay. I don't see any reason to disallow \ if you are going to allow /. It's already not cross-platform due to filesystem differences (e.g. capitalization).

I think continuing to disallow .. anywhere is fine. The use case is questionable, and no existing code relies on it.

These are my opinions on the matter. I am not a security expert, nor do I have any experience with exploiting bugs.

@WalterBright
Member

(I'm guessing you mean ?\ and .\ prefixes)

Yes. Github helpfully removed the leading \

everyone would be in a LOT of trouble if there was a vulnerability in the OS for handling file names.

That is a good point.

it's perfectly possible to create a vulnerability using only printable or even alphanumeric characters.

I had forgotten about that, thanks for reminding me.

I haven't seen any other project do this sort of validation, and I don't see why we should.

I agree it is unlikely there is a vulnerability in this for widely used systems like Windows and Linux. But I keep reading about vulnerabilities being discovered in unexpected places in things people assumed were bulletproof because they were widely used. Furthermore, D may get ported to other operating systems that may not be so well hardened.

I would like to turn this around, and ask again what need is served by supporting string import filenames with backspaces in them? This is not a grep program or a git program that needs to be able to deal with any filename supported by the operating system.

You have made a good case for this PR, and I'll pull it. But I will open another one that will sanitize the filename, both the types of characters accepted and restricting the length to PATH_MAX, and we can talk about that there.
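
A conservative sanitizer of the kind proposed here might look roughly like the Python sketch below. Everything specific in it is an assumption made for illustration: the thread names neither an exact set of accepted characters nor a concrete PATH_MAX value (the limit varies by platform), and the function and constant names are hypothetical.

```python
import re

# Assumed limit for illustration only; the real PATH_MAX is platform-specific.
ASSUMED_PATH_MAX = 4096

# Assumed allowlist: ASCII alphanumerics plus a few common punctuation marks.
ALLOWED_NAME = re.compile(r"[A-Za-z0-9_.\-/]+")


def conservative_name_ok(name: str) -> bool:
    """Accept only short names built from the assumed allowlist.

    This does not replace traversal checks: "../x" passes the character
    test, so rejecting ".." and a leading "/" is still required separately.
    """
    # Measure the encoded length, since path limits are counted in bytes.
    if len(name.encode("utf-8")) >= ASSUMED_PATH_MAX:
        return False
    # fullmatch also rejects the empty string because of the "+" quantifier.
    return ALLOWED_NAME.fullmatch(name) is not None
```

An allowlist like this rejects backspaces, wildcards and other unusual characters by construction, at the cost of also rejecting legitimate non-ASCII file names.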

@WalterBright
Member

Auto-merge toggled on

@WalterBright WalterBright merged commit ce91301 into dlang:master Apr 20, 2016
@CyberShadow
Member

You have made a good case for this PR, and I'll pull it

Thanks.

But I will open another one that will sanitize the filename, both the types of characters accepted and restricting the length to PATH_MAX, and we can talk about that there.

Well, I'm out of arguments regarding this question, and I'm not strongly against such a restriction, so if you'd still like to do it, go ahead. I agree that there is no strong use case to support unusual file names, so restricting the paths to what is supported on Windows and Linux would probably be fine.

7 participants