handling URL with http:/// (3 slashes between protocol and domain) #791
Comments
I can confirm both Firefox and Chrome will skip seemingly any number of arbitrary slashes after the scheme. Is that some legacy issue? I don't think it's correct to do that. |
I agree that it isn't correct, and RFC 7230 makes a missing host in an absolute http URI explicitly invalid now (MUST NOT send, if received, MUST treat as invalid). Earlier RFCs never seemed to make that explicit or clear enough, apparently. Worse, this comment implies that Firefox used to reject HTTP URLs with the wrong number of slashes after the scheme: http://superuser.com/questions/352133/why-do-file-urls-start-with-3-slashes#comment388378_352134 Even worse: this autocorrection appears intentional in Chrome and is wontfix: I don't have a strong opinion on whether you should implement the same autocorrection in curl after researching the above. Like I said it would be nice and it would save me some time, but I can code a workaround. |
First, let's not add file:/// to the confusion. It has three slashes in the correct case. http:// is different. An http:// URI cannot work without a host name. RFC 3986 dictates how URIs work and there's no room for three slashes for HTTP. But sure, Chrome and Firefox most probably do this for "web compatibility" as we like to call it. Meaning that enough other clients break the specs to make you want to do it as well, as otherwise users get upset and think your product is flawed. Similar to handling space in Location: headers, which we already do for exactly that reason. So, I would not be against a patch that makes curl act like the popular browsers in this regard. After all, people use curl to mimic browsers to a large extent and not acting like browsers in this aspect makes curl not deliver that promise for these users. |
Firefox does in fact accept one or more slashes for HTTP and HTTPS redirects. I just tested redirects with one slash and I tested with 10 slashes using Firefox. They all redirect fine. |
On that note, it would be nice to have a CURLOPT_REDIRECTFUNCTION option so that an application could easily implement its own behavior by rewriting the redirection URL or rejecting it, causing the transfer to fail. If we had this, an application could use it to support such malformed URLs. |
@tomuta You can use CURLINFO_REDIRECT_URL to manually redirect:
#define MAXREDIRS 50
int redir_count;
for(redir_count = 0; redir_count < MAXREDIRS; ++redir_count) {
char *url = NULL;
res = curl_easy_perform(curl);
if(res || curl_easy_getinfo(curl, CURLINFO_REDIRECT_URL, &url) || !url)
break;
/* redirect needed. this is where you could make a copy of the url and modify that */
curl_easy_setopt(curl, CURLOPT_URL, url);
}
if(redir_count == MAXREDIRS) {
fprintf(stderr, "\nError: Maximum (%d) redirects followed\n", MAXREDIRS);
} |
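For the "make a copy of the url and modify that" step in the loop above, something like the following might do as a starting point. It is only a sketch: `fix_redirect_slashes` and `is_http_scheme` are made-up names, not libcurl functions, and it assumes you want the browser behavior of ignoring any number of extra slashes, limited to http and https so that file:/// is left alone.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: true if s[0..len) is "http" or "https" in any case. */
static int is_http_scheme(const char *s, size_t len)
{
  static const char https[] = "https";
  size_t i;
  if(len != 4 && len != 5)
    return 0;
  for(i = 0; i < len; i++)
    if(tolower((unsigned char)s[i]) != https[i])
      return 0;
  return 1;
}

/*
 * Hypothetical helper (not part of libcurl): return a malloc'ed copy of
 * `url` in which the run of slashes after "http:" or "https:" is replaced
 * by exactly two slashes. Returns NULL for any other input or if the
 * allocation fails. The caller frees the result.
 */
static char *fix_redirect_slashes(const char *url)
{
  const char *colon = strchr(url, ':');
  const char *rest;
  size_t schemelen, restlen;
  char *out;

  if(!colon || colon[1] != '/')
    return NULL;
  schemelen = (size_t)(colon - url);
  if(!is_http_scheme(url, schemelen))
    return NULL;

  /* skip every slash that follows the colon */
  rest = colon + 1;
  while(*rest == '/')
    rest++;

  restlen = strlen(rest);
  out = malloc(schemelen + 3 + restlen + 1);
  if(!out)
    return NULL;

  memcpy(out, url, schemelen);
  memcpy(out + schemelen, "://", 3);
  memcpy(out + schemelen + 3, rest, restlen + 1); /* includes the NUL */
  return out;
}
```

In the loop you would call it on `url`, pass the result to CURLOPT_URL if it is non-NULL (libcurl copies the string, so it can be freed right after curl_easy_setopt), and fall back to the original `url` otherwise. I have not checked that curl reports the malformed Location through CURLINFO_REDIRECT_URL in every version, so treat that part as an assumption.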
To allow one, two or three slashes, something like this could be applied:
diff --git a/lib/url.c b/lib/url.c
index 70ccd0f..f07dd39 100644
--- a/lib/url.c
+++ b/lib/url.c
@@ -4133,16 +4133,22 @@ static CURLcode parseurlandfillconn(struct SessionHandle *data,
protop = "file"; /* protocol string */
}
else {
/* clear path */
+ char slashbuf[4];
path[0]=0;
- if(2 > sscanf(data->change.url,
- "%15[^\n:]://%[^\n/?]%[^\n]",
- protobuf,
- conn->host.name, path)) {
+ rc = sscanf(data->change.url,
+ "%15[^\n:]:%3[/]%[^\n/?]%[^\n]",
+ protobuf, slashbuf, conn->host.name, path);
+ if(2 == rc) {
+ failf(data, "Bad URL");
+ return CURLE_URL_MALFORMAT;
+ }
+ if(3 > rc) {
/*
* The URL was badly formatted, let's try the browser-style _without_
* protocol specified like 'http://'.
*/ |
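To see what the changed sscanf() format in the diff returns for different inputs, here is a small standalone sketch. It is not curl's code: `split_url` is a made-up name, and the field widths for host and path (255 and 1023) are my own additions so the buffers cannot overflow.

```c
#include <stdio.h>

/*
 * Sketch of the parsing idea in the diff, with bounded buffers.
 * Returns the number of fields sscanf() assigned:
 *   4 = scheme, slashes, host and path
 *   3 = scheme, slashes and host (no path)
 *   2 = scheme and one to three slashes, but no host right after them
 *   1 or less = no "scheme:/" prefix
 */
static int split_url(const char *url, char scheme[16], char slashes[4],
                     char host[256], char path[1024])
{
  scheme[0] = slashes[0] = host[0] = path[0] = 0;
  return sscanf(url, "%15[^\n:]:%3[/]%255[^\n/?]%1023[^\n]",
                scheme, slashes, host, path);
}
```

Because `%3[/]` consumes at most three slashes and the host field refuses to start with a slash, "http:////example.com/" stops after two fields. That is the case the diff turns into "Bad URL", while a result below 2 falls through to the existing guess-the-scheme code.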
Apparently browsers support any amount of slashes. They do that because their spec says so. And the spec says so because they do that. |
The rant
curl has actually never been very strict or particular with its URL parsing. I mean, it even accepts URLs on the command line with the "scheme://" part completely left out, which has never been considered a URL by anyone. It also only parses the "bare minimum" for it to be able to do what it needs to, which means that it accepts other sorts of illegal URLs as well if you just want to.
I've given this a lot of thought and I've discussed the WHATWG-URL "standard" widely and intensely over the last couple of days. I think it would be a tactical mistake to give up completely and say we accept the WHATWG-URL as a standard. They run and write their "standard" as they see fit, only for browsers, without proper concern for the entire web and ecosystem. That said, hopefully they will come around at some point and we can work on converting their doc into a "real" standard. That would be of huge benefit for the web.
This said, I think we need to be realists and adapt to the world around us, and when the WHATWG clearly says these URLs are fine and a huge portion of browsers accept them, it forces us to act. Sure, we can say they're not RFC 3986 compliant and refuse to work with them. But who'd be happy with that in the long run? I don't think we in the curl project have enough power to make such a stance have any effect on the servers and providers that send back broken URLs in headers. They will just curse curl and continue to successfully use browsers against said servers.
The intent
I intend to merge a patch similar to what I described above, after the curl 7.49.0 release, to give us time to test it out and get a feel for it. It will accept one, two or three slashes only, it will complain in the verbose output about anything that isn't two slashes, and it will rewrite the URL internally to the correct form so that extracting the URL or passing it on to proxies etc. will still use the correct format. |
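The behavior described under "The intent" could be sketched roughly like this. It is an illustration only, not the patch that was merged: `normalize_slashes` is a made-up name, the warning text is invented, and the caller is assumed to have already ruled out schemes such as file: where three slashes are correct.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical sketch, not curl's actual code. Rewrites `url` in place so
 * that "scheme:" is followed by exactly two slashes.
 * Returns  0 if the URL already had two slashes,
 *          1 if it had one or three and was rewritten (warning printed),
 *         -1 if it was rejected (no colon, no slash, four or more
 *            slashes, or not enough room in the buffer).
 */
static int normalize_slashes(char *url, size_t bufsize)
{
  char *colon = strchr(url, ':');
  size_t nslashes, len;
  char *rest;

  if(!colon)
    return -1;
  nslashes = strspn(colon + 1, "/");
  if(nslashes < 1 || nslashes > 3)
    return -1;
  if(nslashes == 2)
    return 0;

  /* going from one slash to two grows the string by one byte */
  len = strlen(url);
  if(len - nslashes + 2 + 1 > bufsize)
    return -1;

  rest = colon + 1 + nslashes;
  memmove(colon + 3, rest, strlen(rest) + 1); /* regions may overlap */
  memcpy(colon + 1, "//", 2);
  fprintf(stderr, "* Found %u slash(es) after the scheme, using two\n",
          (unsigned)nslashes);
  return 1;
}
```

Rewriting the stored URL, instead of only tolerating the extra slash while parsing, is what keeps anything that later reads the URL back or forwards it to a proxy on the two-slash form.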
How is /// any more likely than //// ? They both seem really really unlikely. Is there some common server configuration error which causes the former? |
Very anecdotal "evidence" only, so more of a hunch or a guess. We've seen the former (in this bug report) and not the latter. When I've complained to WHATWG people, some of them have hinted that URLs "like this" (unclear how many slashes that implies) are being found to at least a measurable extent, and finally I'm just guessing that fewer slashes are more likely than more. Like /// as a typo is more common than //// just because the first is a single-character mistake and the second means twice as many mistakes. If we at a later point reconsider and have a reason to start accepting more slashes, then there's nothing preventing us from revisiting this topic. |
On 5/17/16, Daniel Stenberg notifications@github.com wrote:
Could see "///*" for misconfigured CMSs or otherwise auto-generated
|
There has been no data provided in this discussion, just random people making up random statements. Me included. I've mentioned that it would be possible to add a counter in Firefox or similar, but (A) I'm not sure it would be accepted by the maintainers of said code, (B) I don't feel like writing that code and (C) I fear whatever number would get out of that won't make a difference in the end. |
I did this
The url is communicating with a live webserver that is returning a malformed location field in the 301 HTTP response.
I expected the following
Successful redirect to http://www.bozardford.net
despite the fact that there was an extra slash between the protocol and the domain name in the location field of the 301 response header "http:///www.bozardford.net"
Firefox 45 and Chrome 49 both handle the malformed location field by ignoring the extra slash and redirected according to what was meant.
What I got
curl/libcurl version
Also happens in pycurl:
Haven't had time to build a more recent libcurl version to test this on, but I haven't found any previous mention of this problem in the issues on github or in an internet search.
operating system
Ubuntu 14.04 LTS
Like I said, it appears that current browsers handle location fields malformed in this way in a manner that makes implicit sense. It would be nice for curl to do this as well, instead of my having to build out specialized redirection handling in my code.