handling URL with http:/// (3 slashes between protocol and domain) #791
Comments
I can confirm that both Firefox and Chrome will skip seemingly any arbitrary number of slashes after the scheme. Is that some legacy issue? I don't think it's correct to do that.
I agree that it isn't correct, and RFC 7230 now makes a missing host in an absolute http URI explicitly invalid (MUST NOT send; if received, MUST treat as invalid). Earlier RFCs apparently never made that explicit or clear enough.

Worse, this comment implies that Firefox used to reject HTTP URLs with the wrong number of slashes after the scheme: http://superuser.com/questions/352133/why-do-file-urls-start-with-3-slashes#comment388378_352134

Even worse: this autocorrection appears intentional in Chrome and is wontfix.

I don't have a strong opinion on whether you should implement the same autocorrection in curl after researching the above. Like I said, it would be nice and it would save me some time, but I can code a workaround.
bagder added the HTTP label May 5, 2016
First, let's not add file:/// to the confusion: it has three slashes in the correct case. http:// is different; an http:// URI cannot work without a host name. RFC 3986 dictates how URIs work, and there's no room for three slashes for HTTP.

But sure, Chrome and Firefox most probably do this for "web compatibility", as we like to call it, meaning that enough other clients break the specs to make you want to do so as well, as otherwise users get upset and think your product is flawed. It's similar to handling spaces in Location: headers, which we already do for exactly that reason.

So, I would not be against a patch that makes curl act like the popular browsers in this regard. After all, people use curl to mimic browsers to a large extent, and not acting like browsers in this respect makes curl not deliver on that promise for these users.
Firefox does in fact accept one or more slashes for HTTP and HTTPS redirects. I just tested redirects with one slash and with ten slashes using Firefox; they all redirect fine.
tomuta commented May 6, 2016
On that note, it would be nice to have a CURLOPT_REDIRECTFUNCTION option so that an application could easily implement its own behavior by rewriting the redirection URL, or by rejecting it and causing the transfer to fail. With such an option, one could handle malformed URLs like these.
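For illustration only: the proposed option does not exist in libcurl, but the callback might look something along these lines (everything here is hypothetical):

#include <string.h>
#include <stdlib.h>
#include <curl/curl.h>

/* hypothetical callback type for the proposed CURLOPT_REDIRECTFUNCTION:
   return a malloc'ed URL to follow (possibly rewritten), or NULL to
   reject the redirect and fail the transfer */
typedef char *(*curl_redirect_callback)(CURL *handle, const char *location,
                                        void *userdata);

static char *fix_redirect(CURL *handle, const char *location, void *userdata)
{
  (void)handle;
  (void)userdata;
  /* collapse "http:///" to "http://" before following */
  if(!strncmp(location, "http:///", 8)) {
    char *fixed = malloc(strlen(location)); /* one char shorter, plus NUL */
    if(!fixed)
      return NULL;
    strcpy(fixed, "http://");
    strcat(fixed, location + 8);
    return fixed;
  }
  return strdup(location);
}

/* hypothetical usage, if the option existed:
   curl_easy_setopt(curl, CURLOPT_REDIRECTFUNCTION, fix_redirect); */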
@tomuta You can use CURLINFO_REDIRECT_URL to manually redirect:

#define MAXREDIRS 50

int redir_count;
/* note: CURLOPT_FOLLOWLOCATION is left at its default (off) so that
   curl_easy_perform() returns after each redirect response */
for(redir_count = 0; redir_count < MAXREDIRS; ++redir_count) {
  char *url = NULL;
  res = curl_easy_perform(curl);
  if(res || curl_easy_getinfo(curl, CURLINFO_REDIRECT_URL, &url) || !url)
    break;
  /* redirect needed. this is where you could make a copy of the url
     and modify that */
  curl_easy_setopt(curl, CURLOPT_URL, url);
}
if(redir_count == MAXREDIRS) {
  fprintf(stderr, "\nError: Maximum (%d) redirects followed\n", MAXREDIRS);
}
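Fleshing that fragment out into a self-contained sketch, with a made-up fix_slashes() helper standing in for whatever rewrite the application needs, and a placeholder starting URL:

#include <stdio.h>
#include <string.h>
#include <curl/curl.h>

#define MAXREDIRS 50

/* hypothetical helper: collapse "http:///" to "http://" */
static void fix_slashes(char *url)
{
  if(!strncmp(url, "http:///", 8))
    memmove(url + 7, url + 8, strlen(url + 8) + 1);
}

int main(void)
{
  CURL *curl = curl_easy_init();
  if(curl) {
    int redir_count;
    CURLcode res;
    char fixed[2048];
    /* CURLOPT_FOLLOWLOCATION stays off so each redirect comes back to us */
    curl_easy_setopt(curl, CURLOPT_URL, "http://example.com/");
    for(redir_count = 0; redir_count < MAXREDIRS; ++redir_count) {
      char *url = NULL;
      res = curl_easy_perform(curl);
      if(res || curl_easy_getinfo(curl, CURLINFO_REDIRECT_URL, &url) || !url)
        break;
      /* copy, repair and follow the redirect URL ourselves; libcurl
         copies the string, so a stack buffer is fine here */
      snprintf(fixed, sizeof(fixed), "%s", url);
      fix_slashes(fixed);
      curl_easy_setopt(curl, CURLOPT_URL, fixed);
    }
    if(redir_count == MAXREDIRS)
      fprintf(stderr, "Error: maximum (%d) redirects followed\n", MAXREDIRS);
    curl_easy_cleanup(curl);
  }
  return 0;
}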
To allow one, two or three slashes, something like this could be applied:

diff --git a/lib/url.c b/lib/url.c
index 70ccd0f..f07dd39 100644
--- a/lib/url.c
+++ b/lib/url.c
@@ -4133,16 +4133,22 @@ static CURLcode parseurlandfillconn(struct SessionHandle *data,
     protop = "file"; /* protocol string */
   }
   else {
     /* clear path */
+    char slashbuf[4];
     path[0]=0;
-    if(2 > sscanf(data->change.url,
-                  "%15[^\n:]://%[^\n/?]%[^\n]",
-                  protobuf,
-                  conn->host.name, path)) {
+    rc = sscanf(data->change.url,
+                "%15[^\n:]:%3[/]%[^\n/?]%[^\n]",
+                protobuf, slashbuf, conn->host.name, path);
+    if(2 == rc) {
+      failf(data, "Bad URL");
+      return CURLE_URL_MALFORMAT;
+    }
+    if(3 > rc) {
       /*
        * The URL was badly formatted, let's try the browser-style _without_
        * protocol specified like 'http://'.
        */
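For anyone curious what the new scanf pattern actually matches, here is a small standalone demo (not part of the patch; the buffer sizes are arbitrary) that runs the format against a few inputs:

#include <stdio.h>

int main(void)
{
  const char *urls[] = { "http:/example.com/path",
                         "http://example.com/path",
                         "http:///example.com/path",
                         "http://",
                         "example.com/path" };
  int i;
  for(i = 0; i < 5; i++) {
    char proto[16], slashes[4], host[256], path[512];
    int rc;
    path[0] = 0;
    /* %3[/] consumes at most three slashes. rc == 2 means a scheme and
       slashes matched but no host followed ("Bad URL" in the patch);
       rc < 2 falls through to curl's browser-style guessing. */
    rc = sscanf(urls[i], "%15[^\n:]:%3[/]%[^\n/?]%[^\n]",
                proto, slashes, host, path);
    printf("%-25s rc=%d", urls[i], rc);
    if(rc >= 3)
      printf("  proto=%s slashes=%s host=%s path=%s",
             proto, slashes, host, path);
    printf("\n");
  }
  return 0;
}

With one, two or three slashes the host parses out the same way; a bare "http://" hits the rc == 2 "Bad URL" branch, and the scheme-less URL falls back to the browser-style guess.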
Apparently browsers support any number of slashes. They do that because their spec says so. And the spec says so because they do that.
Lukasa referenced this issue May 11, 2016: Consider replacing urlparse with something that can handle the WHATWG's 'generous' definition of a URL #859 (closed)
bagder changed the title from "curl handling of HTTP 301 redirection fails when response location header starts with http:///<domain> (3 slashes between protocol and domain)" to "handling URL with http:/// (3 slashes between protocol and domain)" May 16, 2016
bagder self-assigned this May 17, 2016
The rant

curl has actually never been very strict or particular with its URL parsing. I mean, it even accepts URLs on the command line with the "scheme://" part completely left out, which has never been considered a URL by anyone. It also only parses the "bare minimum" for it to be able to do what it needs to, which means that it accepts other sorts of illegal URLs as well if you want.

I've given this a lot of thought and I've discussed the WHATWG-URL "standard" widely and intensely over the last couple of days. I think it would be a tactical mistake to give up completely and say we accept the WHATWG-URL as a standard. They run and write their "standard" as they see fit, only for browsers, without proper concern for the entire web and ecosystem. That said, hopefully they will come around at some point and we can work on converting their doc into a "real" standard. That would be of huge benefit for the web.

This said, I think we need to be realists and adapt to the world around us, and when the WHATWG clearly says these URLs are fine and a huge portion of browsers accept them, it forces us to act. Sure, we can say they're not RFC 3986 compliant and refuse to work with them. But who'd be happy with that in the long run? I don't think we in the curl project have enough power to make such a stance have any effect on the servers and providers that send back broken URLs in headers. They will just curse curl and continue to successfully use browsers against said servers.

The intent

I intend to merge a patch similar to what I described above, after the curl 7.49.0 release, to give us time to test it out and get a feel for it. It will accept one, two or three slashes only, it will complain in the verbose output about anything that isn't two slashes, and it will rewrite the URL internally to the correct form so that extracting the URL or passing it onward to proxies etc. will still use the correct format.
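As a rough sketch of that internal rewrite (not the actual patch; the function name, warning text and buffer sizes are invented for illustration):

#include <stdio.h>
#include <string.h>

/* illustration only: rewrite "scheme:/host", "scheme://host" or
   "scheme:///host" into the canonical "scheme://host" and warn when
   the slash count was off */
static int normalize_slashes(const char *in, char *out, size_t outlen)
{
  char proto[16], slashes[4], rest[512];
  rest[0] = 0;
  if(sscanf(in, "%15[^\n:]:%3[/]%511[^\n]", proto, slashes, rest) < 2)
    return 1; /* no scheme/slash part found; leave it to other logic */
  if(strlen(slashes) != 2)
    fprintf(stderr, "* rewriting URL with %d slashes after the scheme\n",
            (int)strlen(slashes));
  snprintf(out, outlen, "%s://%s", proto, rest);
  return 0;
}

int main(void)
{
  char fixed[600];
  if(!normalize_slashes("http:///www.bozardford.net", fixed, sizeof(fixed)))
    printf("%s\n", fixed); /* prints http://www.bozardford.net */
  return 0;
}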
How is /// any more likely than ////? They both seem really, really unlikely. Is there some common server configuration error which causes the former?
Very anecdotal "evidence" only, so more of a hunch or a guess. We've seen the former (in this bug report) and not the latter. When I've complained to WHATWG people, some of them have hinted that URLs "like this" (unclear how many slashes that implies) are being found to at least a measurable extent, and finally, I'm just guessing that fewer slashes are more likely than more: /// as a typo is more common than //// simply because the first is a single-character mistake and the second is twice as many mistakes. If we at a later point reconsider and have a reason to start accepting more slashes, then there's nothing preventing us from revisiting this topic.
Could see "///*" for misconfigured CMSs or otherwise auto-generated
There has been no data provided in this discussion, just random people making up random statements, me included. I've mentioned that it would be possible to add a counter in Firefox or similar, but (A) I'm not sure it would be accepted by the maintainers of said code, (B) I don't feel like writing that code, and (C) I fear whatever number came out of it wouldn't make a difference in the end.
cjbern commented May 5, 2016 (edited)
I did this
The URL points at a live webserver that returns a malformed location field in the 301 HTTP response.
I expected the following
Successful redirect to http://www.bozardford.net, despite the extra slash between the protocol and the domain name in the location field of the 301 response header: "http:///www.bozardford.net".
Firefox 45 and Chrome 49 both handle the malformed location field by ignoring the extra slash and redirecting according to what was meant.
What I got
curl/libcurl version
Also happens in pycurl:
Haven't had time to build a more recent libcurl version to test this on, but I haven't found any previous mention of this problem in the issues on GitHub or in an internet search.
operating system
Ubuntu 14.04 LTS
Like I said, it appears that current browsers handle location fields malformed in this way in a manner that makes implicit sense. It would be nice for curl to do this as well, instead of my having to build out specialized redirection handling in my code.
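A minimal libcurl program exercising this path might look like the sketch below; the starting URL is a placeholder for the server that answers with the malformed Location header:

#include <stdio.h>
#include <curl/curl.h>

int main(void)
{
  /* placeholder for a server whose 301 response carries
     "Location: http:///www.bozardford.net" */
  const char *start_url = "http://example.com/redirecting-page";
  CURL *curl = curl_easy_init();
  CURLcode res = CURLE_FAILED_INIT;

  if(curl) {
    curl_easy_setopt(curl, CURLOPT_URL, start_url);
    /* let libcurl follow the 301 itself */
    curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L);
    res = curl_easy_perform(curl);
    if(res != CURLE_OK)
      /* with the three-slash Location, the transfer fails here instead
         of landing on http://www.bozardford.net */
      fprintf(stderr, "curl_easy_perform() failed: %s\n",
              curl_easy_strerror(res));
    curl_easy_cleanup(curl);
  }
  return (int)res;
}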