Skip to content
Permalink
Browse files

URL-parser: for file://[host]/ URLs, the [host] must be localhost

Previously, the [host] part was just ignored which made libcurl accept
strange URLs misleading users. like "file://etc/passwd" which might've
looked like it refers to "/etc/passwd" but is just "/passwd" since the
"etc" is an ignored host name.

Reported-by: Mike Crowe
Assisted-by: Kamil Dudka
  • Loading branch information...
bagder committed Nov 11, 2016
1 parent 8c15e0d commit 346340808c89db33803ef7461dee191ff7c3d07f
Showing with 30 additions and 25 deletions.
  1. +30 −25 lib/url.c
@@ -4068,33 +4068,38 @@ static CURLcode parseurlandfillconn(struct Curl_easy *data,
* the URL protocols specified in RFC 1738
*/
if(path[0] != '/') {
/* the URL included a host name, we ignore host names in file:// URLs
as the standards don't define what to do with them */
char *ptr=strchr(path, '/');
if(ptr) {
/* there was a slash present
RFC1738 (section 3.1, page 5) says:
The rest of the locator consists of data specific to the scheme,
and is known as the "url-path". It supplies the details of how the
specified resource can be accessed. Note that the "/" between the
host (or port) and the url-path is NOT part of the url-path.
As most agents use file://localhost/foo to get '/foo' although the
slash preceding foo is a separator and not a slash for the path,
a URL as file://localhost//foo must be valid as well, to refer to
the same file with an absolute path.
*/
/* the URL includes a host name, it must match "localhost" or
"127.0.0.1" to be valid */
char *ptr;
if(!checkprefix("localhost/", path) &&
!checkprefix("127.0.0.1/", path)) {
failf(data, "Valid host name with slash missing in URL");
return CURLE_URL_MALFORMAT;
}
ptr = &path[9]; /* now points to the slash after the host */

if(ptr[1] && ('/' == ptr[1]))
/* if there was two slashes, we skip the first one as that is then
used truly as a separator */
ptr++;
/* there was a host name and slash present
/* This cannot be made with strcpy, as the memory chunks overlap! */
memmove(path, ptr, strlen(ptr)+1);
}
RFC1738 (section 3.1, page 5) says:
The rest of the locator consists of data specific to the scheme,
and is known as the "url-path". It supplies the details of how the
specified resource can be accessed. Note that the "/" between the
host (or port) and the url-path is NOT part of the url-path.
As most agents use file://localhost/foo to get '/foo' although the
slash preceding foo is a separator and not a slash for the path,
a URL as file://localhost//foo must be valid as well, to refer to
the same file with an absolute path.
*/

if('/' == ptr[1])
/* if there was two slashes, we skip the first one as that is then
used truly as a separator */
ptr++;

/* This cannot be made with strcpy, as the memory chunks overlap! */
memmove(path, ptr, strlen(ptr)+1);
}

protop = "file"; /* protocol string */

5 comments on commit 3463408

@weltling

This comment has been minimized.

Copy link
Contributor

replied Jan 2, 2017

This change breaks the Windows use case, as the paths there usually don't start with /. Furthermore, it is unclear, how to handle it in general with the strictness. I guess, that the path like file://c:\windows should be handled like file:///c:\windows. Likewise, a path like file://127.0.0.1/c:\windows should strip the part, preceding the actual path. I might be wrong though, as the third slash after the scheme is actually something, expected to belong tho the path, while on Windows it wouldn't.

I discovered this, while preparing 7.52.1 for updates in PHP. Here's my first reaction patch winlibs/cURL@b8f0151 . This only handles the Windows absolute path variant, as described in the paragraph above. There are several other path variants, that might want to be handled as well. However, this form of the absolute path is the most one used on Windows and with PHP, other path variants could probably be ignored for now.

@bagder Could you please check the patch? IMO, it is important to keep the backward compatibility in this case. At least in PHP, we'd rather have to stick to 7.51.0 for a while or use patched versions, to avoid compatibility breach in arbitrary apps.

Thanks.

@jay

This comment has been minimized.

Copy link
Member

replied Jan 2, 2017

We decided to be explicit about what we'd accept as the file hostname. Can you review the mailing list thread and if you think you have a bug then file it. This works fine for me both before and after the changes in Windows 7: curl file:///c:\foo\bar.txt

@bagder

This comment has been minimized.

Copy link
Member Author

replied Jan 2, 2017

I'm with @jay here: please spell out exactly what the bug is you have with current curl and we can discuss that. We decided that the old lenient way of parsing mislead users due to the hostname part being ignored. If you have a URL (that is actually a valid and legal URL) that now doesn't parse properly in 7.52.1, please file a bug and tell us the details.

@weltling

This comment has been minimized.

Copy link
Contributor

replied Jan 2, 2017

Thanks for checking. Being explicit is good, of course. However 7.52.1 causes some dozen test fails in the PHP test suite, so the change is a bit sudden :) I'm not deep into the RFCs yet, but i see, that there's quite some software keeping this parts compatible. I'm going to file a bug.

Thanks.

@weltling

This comment has been minimized.

Copy link
Contributor

replied Jan 2, 2017

Here is the ticket #1187.

Thanks.

Please sign in to comment.
You can’t perform that action at this time.