Skip to content

Some file URLs improperly parsed. #21743

@tiymat

Description

@tiymat

I did this

/tmp/repro.c:

#include <stdio.h>
#include <curl/curl.h>

static const char URL[] = "file:?q";

int main(void) {
  CURLU *u = curl_url();
  CURLU *v = curl_url();
  char *out = NULL;
  CURLUcode rc;

  rc = curl_url_set(u, CURLUPART_URL, URL, 0);
  printf("Set url to \"%s\" with rc=%d\n", URL, rc);

  rc = curl_url_get(u, CURLUPART_URL, &out, 0);
  printf("Got url \"%s\" with rc=%d\n", out, rc);

  rc = curl_url_set(v, CURLUPART_URL, out, 0);
  printf("Reparsed url \"%s\" with rc=%d\n", out, rc);

  curl_free(out);
  curl_url_cleanup(u);
  curl_url_cleanup(v);

  return 0;
}
> export CURL_PATH="/path/to/curl/repo"
> gcc "-I$CURL_PATH/include" /tmp/repro.c "-L$CURL_PATH/lib/.libs" "-Wl,-rpath,$CURL_PATH/lib/.libs" -lcurl -o /tmp/repro && /tmp/repro
Set url to "file:?q" with rc=0
Got url "file://(nil)?q" with rc=0
Reparsed url "file://(nil)?q" with rc=19

I expected the following

Either curl_url_set or curl_url_get to return a non-zero rc, or for (nil) not to appear in the retrieved URL.


Following the pseudocode in 5.3 of RFC3986 I'd expect file:?q.

parse_file in lib/urlapi.c checks for authority components, but doesn't confirm that things that aren't authority components (i.e. that don't start with //) start with at least a single /.

The (nil) output occurs because for inputs with empty paths (e.g. file:?q, file:#f), handle_path doesn't allocate u->path when the path length is zero, so u->path stays NULL. urlget_url then passes that NULL into its format string. The hardcoded file:// in the output URL is the same kind of issue in that the format string assumes the empty-authority form with a slash-prefixed path, but parse_file lets stuff in that violates that assumption.

This bug occurs with a variety of URLs, e.g. in the below table, in ways which don't always cause a (nil) to be emitted, but do make the retrieved URL unparseable by Curl. (As in, calling curl_url_set with the URL retrieved with curl_url_get results in CURLUE_BAD_FILE_URL for all the URLs in the second column below.)

Input curl_url_get
file:?q file://(nil)?q
file:#f file://(nil)#f
file:... file://...
file:foo file://foo
file:foo/bar file://foo/bar
file:./ file://
file:.. file://
file:./foo file://foo
file:../foo file://foo
file:a/../b file://a/b

If you agree this is a bug, a fix might be to update the parse_file logic such that inputs that aren't file:// or file:/ return CURLUE_BAD_FILE_URL. This also fixes the (nil) issue, as no input can have pathlen less than one since there'll always be at least one /.


For context, I found this by running some of the WPT URL tests against curl. The test case below was one of the ones that helped me find this issue (though it came up in many of them).

{
  "input": "file:?q=v",
  "base": null,
  "href": "file:///?q=v",
  "protocol": "file:",
  "username": "",
  "password": "",
  "host": "",
  "hostname": "",
  "port": "",
  "pathname": "/",
  "search": "?q=v",
  "hash": ""
},

curl/libcurl version

230a986

operating system

7.0.9-arch1-1

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions