Should colons in paths skip url-encoding? #17977

@jeroen

After we switched the R bindings to libcurl's curl_url_set() for building URLs, a lot of users reported the same problem: many public HTTP servers expect colons : in the path NOT to be url-encoded. This affects almost all of the AI chat APIs, which use URL patterns with colons in the path.

The core issue is similar to the curl_url_set() treatment of slashes:

When setting the path component with URL encoding enabled, the slash character
is skipped.

Perhaps colons deserve similar treatment, because they are commonly used as URI resource separators. For example:

https://firestore.googleapis.com/v1beta1/projects/{myprojectid}/databases/(default)/documents/Users:runQuery

Here we want {myprojectid} and (default) to get URL encoded, but Users:runQuery should not be. Right now there is no good way to do this: if I call curl_url_set() with CURLU_URLENCODE, the colon gets encoded and the server does not understand the request.

However, there is no simple workaround either: without CURLU_URLENCODE we have to escape the input ourselves beforehand, and then we run into the documented problem that the slashes get escaped as well:

URLs are by definition *URL encoded*. To create a proper URL from a set of
components that may not be URL encoded already, you cannot just URL encode the
entire URL string with curl_easy_escape(3), because it then also converts
colons, slashes and other symbols that you probably want untouched.

Interestingly, the above already mentions colons as a special case, but perhaps this is about the colon in the scheme prefix, not the path?
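To make the desired behavior concrete, here is a rough sketch of a hand-rolled path encoder that leaves both / and : alone. This is not libcurl code: `keep_in_path` and `encode_path` are hypothetical names, and the set of characters kept (the RFC 3986 unreserved characters plus / and :) is my assumption about what the rule would look like. RFC 3986 itself permits : inside path segments of an absolute URL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of libcurl: returns nonzero when c may
   stay unencoded in a path under the rule proposed in this issue. */
static int keep_in_path(unsigned char c)
{
  if((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
     (c >= '0' && c <= '9'))
    return 1;
  /* RFC 3986 unreserved marks, plus '/' (already skipped by libcurl)
     and ':' (the addition asked about here) */
  return c != '\0' && strchr("-._~/:", c) != NULL;
}

/* Percent-encode src into dst (capacity dstlen, including the NUL).
   Returns 0 on success, -1 if dst is too small. */
static int encode_path(const char *src, char *dst, size_t dstlen)
{
  static const char hex[] = "0123456789ABCDEF";
  size_t o = 0;
  assert(src != NULL && dst != NULL);
  if(dstlen == 0)
    return -1;
  for(; *src; src++) {
    unsigned char c = (unsigned char)*src;
    size_t need = keep_in_path(c) ? 1 : 3;
    /* always leave room for the terminating NUL */
    if(o + need + 1 > dstlen)
      return -1;
    if(need == 1)
      dst[o++] = (char)c;
    else {
      dst[o++] = '%';
      dst[o++] = hex[c >> 4];
      dst[o++] = hex[c & 15];
    }
  }
  dst[o] = '\0';
  return 0;
}
```

The output of such a helper could then be handed to curl_url_set() without CURLU_URLENCODE. It works as a stopgap, but every binding would have to carry its own copy, which is why handling this inside libcurl seems preferable.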

Searching further, quite a few services use colons in the URL path, for example Wikipedia. In that case, however, the server is smart enough to also understand the url-encoded variant and responds with a redirect:

https://en.wikipedia.org/wiki/Wikipedia:Contents/Human_activities

I am not convinced there is any practical need for url-encoding the : in paths. What purpose does it serve? All servers I found seem to accept or even require an unencoded : in the path. Is it just to distinguish it from the colon after the scheme prefix of the URL? I doubt any server will mistake the entire part of the URL up to the last : for the scheme.
