BC breaching behavior change in URL parsing with file:// scheme on Windows. #1187

Closed
weltling opened this Issue Jan 2, 2017 · 12 comments

Projects

None yet

4 participants

@weltling
Contributor
weltling commented Jan 2, 2017

I did this

In PHP, local paths can be wrapped with file://, which works cross platform. The change in the commit
3463408 causes a regression on the Windows side in the PHP test suite, when the following code is used

<?php

$of = "file://" . dirname(__FILE__) . DIRECTORY_SEPARATOR . "curl.out";
$h = fopen("some_file", "r");

$c = curl_init();
curl_setopt($c, CURLOPT_URL, $of);
curl_setopt($c, CURLOPT_UPLOAD, 1);
curl_setopt($c, CURLOPT_INFILE, $h);
curl_exec($c);
curl_close($c);

fclose($h);

I expected the following

The $of file is written with the contents of the file to be uploaded. The filename under Windows will look like file://c:\some\path\curl.out. With the new parsing mechanism, the URL becomes invalid on Windows. This change is sudden and will break many PHP apps, at least. Of course, the change itself is correct, but given it has a high breach potential, IMO the compatibility parts should be kept.

Especially, as there is no particular reason to not to keep BC. I've also checked other software, fe with a path like file://c:\windows, both Firefox and Edge, will convert it to file:///c:\windows or similar. Same when used file://127.0.0.1/c:\windows

curl/libcurl version

The change happened between 7.51.0 and 7.52.1.

operating system

Windows

Thanks.

@weltling weltling referenced this issue Jan 2, 2017
@bagder bagder URL-parser: for file://[host]/ URLs, the [host] must be localhost
Previously, the [host] part was just ignored which made libcurl accept
strange URLs misleading users. like "file://etc/passwd" which might've
looked like it refers to "/etc/passwd" but is just "/passwd" since the
"etc" is an ignored host name.

Reported-by: Mike Crowe
Assisted-by: Kamil Dudka
3463408
@bagder
Member
bagder commented Jan 3, 2017

After the initial two slashes comes a hostname or nothing at all. Putting the directory name immediately following the two slashes was always wrong and is shown by the browsers correcting it to the three slashes version. Is there a reason why you can't just fix the PHP tests to instead use three slashes instead of two?

given it has a high breach potential, IMO the compatibility parts should be kept

We learned that the previous approached mislead users so it had "a high breach potential" and now many months after the change you say the change is bad. How do you then suggest we do this in a way that doesn't mislead users and yet keeps old users using the wrong syntax happy?

@jay
Member
jay commented Jan 3, 2017

It is true that the browsers do adjust for Windows paths like file://c:/tmp => file:///c:/tmp where in Linux something like file://tmp is just bad format. Related is an issue I have yet to fix with the --proto-default file where something like c:\abc and c is mistakenly considered a scheme

@weltling
Contributor
weltling commented Jan 3, 2017 edited

Currently 7.51.0 is used in PHP, so that's not far ago, not many months. Seems the change was introduced in the subsequent version, which had a followup short after. OFC, there's no upgrade to every version, but you see it's quite close to the latest. I was not telling that the change is bad, and you see it also in the description :) I'm only telling, that there's a BC breach.

Fe imagine there are two paths like /some/dir/file.php on Linux and c:\some\dir\file.php on Windows. The particular snippet like "file://" . dirname(__FILE__) would automatically evaluate to file:///some/dir and file://c:\some\dir respectively, depending on the actual OS the script runs on. So with 7.52.x, without any change to the script but the libcurl version only, it becomes incorrect.

If it stays by the current handling, it'll probably have to be worked around internally in PHP by auto correcting the supplied URL, which costs some processor cycles OFC. Even then, one branch goes into the security only mode soon, so there'll be no time to put a fix and ensure the stability, so we'd have to stick to a patched libs versions :(

I fully understand, that you don't have to care about any third party things depending on libcurl. I think fixing tests is inconvenient, as tests are not only in PHP core, but also in any arbitrary frameworks, etc. and quite possibly some patterns are in use in the running PHP code. This is a Windows specific thing, and the only this specific case, that I'd suggest to handle same way as browsers do. AFM, it's suitable to keep this backward compatibility piece, otherwise please ignore the noise.

Thanks.

@Jan-E Jan-E referenced this issue in winlibs/cURL Jan 3, 2017
Open

Enable NGHTTP2 #5

@bagder
Member
bagder commented Jan 4, 2017 edited

(I was referring to commit 3463408, which is almost two months old by now. But that's not "many months", I was wrong.)

I'm open for fixing it to support file://c:\blablabla on windows again (and ideally correcting the URL internally) as I think this case is not as vulnerable for mistakes or tickery as the unix case we mostly discussed and targeted with the change. The biggest downside is probably that the code is likely to become messier and will bring more target specific changes.

IMHO: You should still consider fixing your URLs in PHP tests and elsewhere, as it seems wrong to rely on non-speced URL formats...

@weltling
Contributor
weltling commented Jan 7, 2017

@badger great! Would you mind then to apply the patch i've linked before or any other you'd see sufficient? I could open a PR, if required, anyway.

That's an unfortunate construct, that beats usage at quite a few places. Another case was some time ago with libxml 2.9.2, that decided to go by prefix file:/ when returning paths. Weird, but still works with Unix style paths in some way. Lucikily, libxml also supports pure paths without scheme, so it was possible to do a cheap fix by just sliding the path over file:/ part :) The crossplatform usage in PHP is file://<some_path>, so all the scripts would require to be rewritten to either use file:// or file:/// depending on the platform. While it's correct, it would make things unnecessary sophisticated for the cross platform code. And there's indeed no security impact with the discussed case, anyway.

Please let me know about the further procedure.

Thanks.

@bagder
Member
bagder commented Jan 7, 2017

On the patch

  1. is uint8_t safe to use for all windows compilers with no other includes? Feels safer to use unsigned char...
  2. Wrong indent at one place, run make checksrc and it'll complain.
  3. We don't #ifdef on _WIN32, we use WIN32 - see existing code.
  4. Most importantly perhaps, the #ifdef magic in lib/file.c that allows "[letter]:" constructs is more complicated: https://github.com/curl/curl/blob/master/lib/file.c#L69 and it feels like we should perhaps consider using the same in both places?

(maybe doing a pull-request with it would be easier to facilitate easier discussions on the code)

That's an unfortunate construct, that beats usage at quite a few places

That may be true, but that doesn't change the fact that you (the plural you as in everyone writing code that uses URLs written like this) are relying on a badly formatted URLs that aren't following the spec and that's just... wrong. We get interoperability by following specs.

If you'd really wanted the URL spec to allow that weirdo format, you should take that discussion to the IETF and the work on updating the FILE: URI format: https://tools.ietf.org/html/draft-ietf-appsawg-file-scheme-16 (tirelessly worked on by @phluid61)

@jay
Member
jay commented Jan 7, 2017

This may be able to be worked in to the other issue I'm working on with the proto-default file. Also I think we should cover the case of c:/foo/bar as well as c:\foo\bar. Let me take a look at this first, it might be better to land mine before he continues on his

@weltling
Contributor
weltling commented Jan 8, 2017

Thanks for the comments @badger. So then, i'm going to wait for @jay's ping back and produce a patch with the mentioned corrections.

@phluid61 i'm willing to help, if the topic sounds plausible to you and you can give me a direction :)

Thanks.

@phluid61
phluid61 commented Jan 8, 2017

@weltling thanks for the offer. This particular draft is already in the IETF editor's queue, so we shouldn't be making any changes at the moment. I'm pretty sick of it right now, to be honest, but I'm not opposed to the idea of drafting a replacement spec once this one's published. More buy-in from interested parties would certainly make it a different authoring experience.

Back in the glorious days of yore (before April last year) earlier drafts did make mention of the "file://c:..." construct: [1]

Dubious encodings:

o "file://c:/path/to/file"

o "file://c/path/to/file"

    An encoding that includes a Windows drive letter as the
    authority field.  This encoding exists in some extant
    implementations, and is supported by the grammar for historical
    reasons.  New URIs of this form SHOULD NOT be generated.

For some reason that was dropped and never made it back into what is now the "non-standard syntax variations" section. There is explicit mention of backslashes, though:

E.4. Backslash as Separator

Historically some usages have copied entire file paths into the path
components of file URIs. Where DOS or Windows file paths were thus
copied the resulting URI strings contained unencoded backslash ""
characters, which are forbidden by both [RFC1738] and [RFC3986].

It may be possible to translate or update such an invalid file URI by
replacing all backslashes "" with slashes "/", if it can be
determined with reasonable certainty that the backslashes are
intended as path separators.

So there's that.

Personally I'm firmly of the opinion that standard libraries should provide APIs to convert between equivalent data types (i.e. file paths and file: URIs). That way people won't need to jury rig solutions like 'file://' . dirname(__FILE__).

They still will, of course, but at least that way when it goes wrong you can say: "Use the official API"

@weltling
Contributor
weltling commented Jan 8, 2017

@phluid61 many thanks for this extended infos. Yeah, i can imagine one'd need a pause after finishing the amount of work like that :) In the current draft, there's also a mention of the other prefixes in the win32 namespace like \\?\ for the long path and others. It would probably make sense to handle all that stuff, once the work on the new draft continues. I'm going also to poke my colleges at MSFT, as this is for sure a thing for interoperability that should be handled centralized. Please ping in any case, when there's a continuation.

Thanks.

@jay
Member
jay commented Jan 9, 2017

@weltling I was able to cover this usage in my changes for #1124, refer to draft2.

@weltling
Contributor
weltling commented Jan 9, 2017

Closing in favor of #1124.

Thanks.

@weltling weltling closed this Jan 9, 2017
@jay jay added a commit that referenced this issue Jan 12, 2017
@jay jay url: Fix parsing for when 'file' is the default protocol
Follow-up to 3463408.

Prior to 3463408 file:// hostnames were silently stripped.

Prior to this commit it did not work when a schemeless url was used with
file as the default protocol.

Ref: https://curl.haxx.se/mail/lib-2016-11/0081.html
Closes #1124

Also fix for drive letters:

- Support --proto-default file c:/foo/bar.txt

- Support file://c:/foo/bar.txt

- Fail when a file:// drive letter is detected and not MSDOS/Windows.

Bug: #1187
Reported-by: Anatol Belski
Assisted-by: Anatol Belski
1d4202a
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment