I did this
/tmp/repro.c:
#include <stdio.h>
#include <curl/curl.h>
static const char URL[] = "file:?q";
int main(void) {
CURLU *u = curl_url();
CURLU *v = curl_url();
char *out = NULL;
CURLUcode rc;
rc = curl_url_set(u, CURLUPART_URL, URL, 0);
printf("Set url to \"%s\" with rc=%d\n", URL, rc);
rc = curl_url_get(u, CURLUPART_URL, &out, 0);
printf("Got url \"%s\" with rc=%d\n", out, rc);
rc = curl_url_set(v, CURLUPART_URL, out, 0);
printf("Reparsed url \"%s\" with rc=%d\n", out, rc);
curl_free(out);
curl_url_cleanup(u);
curl_url_cleanup(v);
return 0;
}
> export CURL_PATH="/path/to/curl/repo"
> gcc "-I$CURL_PATH/include" /tmp/repro.c "-L$CURL_PATH/lib/.libs" "-Wl,-rpath,$CURL_PATH/lib/.libs" -lcurl -o /tmp/repro && /tmp/repro
Set url to "file:?q" with rc=0
Got url "file://(nil)?q" with rc=0
Reparsed url "file://(nil)?q" with rc=19
I expected the following
Either curl_url_set or curl_url_get to return a non-zero rc, or for (nil) not to appear in the retrieved URL.
Following the pseudocode in 5.3 of RFC3986 I'd expect file:?q.
parse_file in lib/urlapi.c checks for authority components, but doesn't confirm that things that aren't authority components (i.e. that don't start with //) start with at least a single /.
The (nil) output occurs because for inputs with empty paths (e.g. file:?q, file:#f), handle_path doesn't allocate u->path when the path length is zero, so u->path stays NULL. urlget_url then passes that NULL into its format string. The hardcoded file:// in the output URL is the same kind of issue in that the format string assumes the empty-authority form with a slash-prefixed path, but parse_file lets stuff in that violates that assumption.
This bug occurs with a variety of URLs, e.g. in the below table, in ways which don't always cause a (nil) to be emitted, but do make the retrieved URL unparseable by Curl. (As in, calling curl_url_set with the URL retrieved with curl_url_get results in CURLUE_BAD_FILE_URL for all the URLs in the second column below.)
| Input |
curl_url_get |
file:?q |
file://(nil)?q |
file:#f |
file://(nil)#f |
file:... |
file://... |
file:foo |
file://foo |
file:foo/bar |
file://foo/bar |
file:./ |
file:// |
file:.. |
file:// |
file:./foo |
file://foo |
file:../foo |
file://foo |
file:a/../b |
file://a/b |
If you agree this is a bug, a fix might be to update the parse_file logic such that inputs that aren't file:// or file:/ return CURLUE_BAD_FILE_URL. This also fixes the (nil) issue, as no input can have pathlen less than one since there'll always be at least one /.
For context, I found this by running some of the WPT URL tests against curl. The test case below was one of the ones that helped me find this issue (though it came up in many of them).
{
"input": "file:?q=v",
"base": null,
"href": "file:///?q=v",
"protocol": "file:",
"username": "",
"password": "",
"host": "",
"hostname": "",
"port": "",
"pathname": "/",
"search": "?q=v",
"hash": ""
},
curl/libcurl version
230a986
operating system
7.0.9-arch1-1
I did this
/tmp/repro.c:
I expected the following
Either
curl_url_setorcurl_url_getto return a non-zero rc, or for(nil)not to appear in the retrieved URL.Following the pseudocode in 5.3 of RFC3986 I'd expect
file:?q.parse_fileinlib/urlapi.cchecks for authority components, but doesn't confirm that things that aren't authority components (i.e. that don't start with//) start with at least a single/.The
(nil)output occurs because for inputs with empty paths (e.g.file:?q,file:#f),handle_pathdoesn't allocateu->pathwhen the path length is zero, sou->pathstays NULL.urlget_urlthen passes that NULL into its format string. The hardcodedfile://in the output URL is the same kind of issue in that the format string assumes the empty-authority form with a slash-prefixed path, butparse_filelets stuff in that violates that assumption.This bug occurs with a variety of URLs, e.g. in the below table, in ways which don't always cause a
(nil)to be emitted, but do make the retrieved URL unparseable by Curl. (As in, callingcurl_url_setwith the URL retrieved withcurl_url_getresults inCURLUE_BAD_FILE_URLfor all the URLs in the second column below.)curl_url_getfile:?qfile://(nil)?qfile:#ffile://(nil)#ffile:...file://...file:foofile://foofile:foo/barfile://foo/barfile:./file://file:..file://file:./foofile://foofile:../foofile://foofile:a/../bfile://a/bIf you agree this is a bug, a fix might be to update the
parse_filelogic such that inputs that aren'tfile://orfile:/returnCURLUE_BAD_FILE_URL. This also fixes the(nil)issue, as no input can havepathlenless than one since there'll always be at least one/.For context, I found this by running some of the WPT URL tests against curl. The test case below was one of the ones that helped me find this issue (though it came up in many of them).
curl/libcurl version
230a986
operating system
7.0.9-arch1-1