After we switched the R bindings to libcurl's `curl_url_set()` for building URLs, a lot of users reported the same problem: many public HTTP servers expect colons (`:`) in the path NOT to be URL-encoded. This affects almost all of these AI chat APIs, which use URL paths containing colons.
The core issue is similar to the `curl_url_set()` treatment of slashes (curl/docs/libcurl/curl_url_set.md, lines 187 to 188 in 7c23e88):

> When setting the path component with URL encoding enabled, the slash character
> is skipped.

Perhaps colons deserve similar treatment, because they are commonly used as URI resource separators. For example:
https://firestore.googleapis.com/v1beta1/projects/{myprojectid}/databases/(default)/documents/Users:runQuery
Here we want `{myprojectid}` and `(default)` to get URL-encoded, but `Users:runQuery` should not be. Right now there is no good way to do this: if I call `curl_url_set()` with `CURLU_URLENCODE`, the colon gets encoded and the server does not understand the request.
However, there is also no workaround: if we do not use `CURLU_URLENCODE`, we need to escape the input ourselves beforehand, but then we run into the documented problem that the slashes get escaped as well (curl/docs/libcurl/curl_easy_escape.md, lines 54 to 57 in 7c23e88):

> URLs are by definition *URL encoded*. To create a proper URL from a set of
> components that may not be URL encoded already, you cannot just URL encode the
> entire URL string with curl_easy_escape(3), because it then also converts
> colons, slashes and other symbols that you probably want untouched.

Interestingly, the above already mentions colons as a special case, but perhaps this is about the colon in the scheme prefix, not the path?
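
To make the desired behavior concrete, here is a minimal sketch of what a manual escaper would have to do: percent-encode everything in the path except the unreserved characters, the slash, and the colon. The function name `encode_path_keep_colon` is hypothetical and not part of libcurl; this only illustrates the request, it is not how libcurl implements `CURLU_URLENCODE`.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (NOT a libcurl function): percent-encode a path,
 * leaving ASCII letters, digits, "-._~", '/' and ':' untouched.
 * Returns a malloc'ed string the caller must free(), or NULL on failure. */
static char *encode_path_keep_colon(const char *path)
{
  static const char hex[] = "0123456789ABCDEF";
  static const char keep[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~/:";
  /* worst case: every byte becomes "%XX" */
  char *out = malloc(strlen(path) * 3 + 1);
  char *p = out;

  if(!out)
    return NULL;

  for(; *path; path++) {
    unsigned char c = (unsigned char)*path;
    if(strchr(keep, c)) {
      *p++ = (char)c;           /* copy allowed characters as-is */
    }
    else {
      *p++ = '%';               /* everything else becomes %XX */
      *p++ = hex[c >> 4];
      *p++ = hex[c & 0x0F];
    }
  }
  *p = '\0';
  return out;
}
```

With the Firestore path from above, this would turn `{myprojectid}` into `%7Bmyprojectid%7D` and `(default)` into `%28default%29` while leaving `Users:runQuery` alone. The result could then be passed to `curl_url_set()` with `CURLUPART_PATH` and without `CURLU_URLENCODE`. Note that the sketch also encodes any `%` already present in the input, so it only works for input that is not yet encoded, and every binding would have to carry its own copy of such a helper.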
Searching further, there are quite a few services that use colons in their URL paths, for example Wikipedia. In that case, however, the server is smart enough to understand the URL-encoded variant as well, and responds with a redirect:
https://en.wikipedia.org/wiki/Wikipedia:Contents/Human_activities
I am not convinced there is any practical need for URL-encoding the `:` in paths. What purpose does it serve? All servers I found seem to accept or even require an unencoded `:` in the path. Is it just to distinguish it from the colon in the scheme prefix of the URL? I doubt any server would mistake the entire part of the URL up to the last `:` for the scheme.