Error when location param is not url encoded in redirects #473

Closed
p-developer opened this Issue Oct 7, 2015 · 12 comments

@p-developer

Some web servers (IIS, for example) respond with an error if the request URL is not URL-encoded.

We can URL-encode the URL before the first request, but if the website redirects to another page and the Location value in the HTTP header is not URL-encoded, we get an error.

Browsers (Chrome, for example), however, URL-encode the Location value from the HTTP headers before sending the request to the destination.

I'm using cURL in PHP.

Is there any solution to this problem? We are seeing lots of these errors lately.

Is there any way to URL-encode the Location value in the HTTP header before redirecting? It's very hard for me to change my program to disable redirects and handle this situation outside cURL.

@bagder
Member
bagder commented Oct 7, 2015

URLs are per definition URL encoded already. Otherwise it is not a URL.

HTTP redirects should by definition redirect to URLs and they MUST be URL encoded already. Not doing so is a violation of the HTTP spec.

So, stupid servers act stupidly and browsers are weak and follow the lowest common denominator (for reasons), so when someone started this ill-thought habit other browsers followed, and so yeah, I'm sure Chrome and others do this. libcurl actually already does this to a very small extent by URL-encoding plain spaces that appear in redirect URLs, just because it is so common and the browsers allow it, so sites continue doing it.

"This" being trying to take junk thrown at it that claims to be a URL and magically transposing it into valid URL syntax. But it's not really a conversion that is defined anywhere and it's not foolproof. It is not a "URL encoding" that needs to be applied, because a blind URL encoding of something that is already supposed to be URL encoded will just ruin it completely.
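
To illustrate the last point, here is a minimal Python sketch of what a blind URL encoding does to a URL that is already encoded. The `example.com` URL is made up for the demonstration, and Python's `urllib.parse.quote` stands in for "a blind URL encoding"; it is not what libcurl or any browser does internally.

```python
from urllib.parse import quote, unquote

# Hypothetical URL whose path is already %-encoded (UTF-8 bytes D8 A2 D8 BA).
already_encoded = "http://example.com/news/%D8%A2%D8%BA"

# Blindly encoding it again escapes every "%" as "%25".
blindly_encoded = quote(already_encoded, safe=":/")

print(blindly_encoded)
# The server would now decode the path to the literal text "%D8%A2%D8%BA"
# instead of the two characters the original URL referred to.
print(unquote(blindly_encoded) == already_encoded)
```

So any fix-up has to leave existing `%XX` sequences alone and touch only the bytes that could never be part of a valid URL.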

So, let's talk specifics. What's the exact string your server sends in the redirect that chrome (and others?) deal with properly that libcurl doesn't do the same thing with?

@bagder bagder self-assigned this Oct 7, 2015
@bagder bagder added the HTTP label Oct 7, 2015
@p-developer

Thanks for your quick response.
I completely agree with you. Everybody should obey the specs.

The URL below is an example; it's not my website.
http://ifnaa.ir/fa/news/27973/%D8%A2%D8%BA%D8%A7%D8%B2-%D8%B3%D9%85-%D8%B2%D8%AF%D8%A7%DB%8C%DB%8C-%D8%A7%D8%B2-%D8%A8%D8%A7%D8%B2%D8%A7%D8%B1-%D9%BE%D9%88%D9%84

You can see the response headers below:

HTTP/1.0 301 Moved Permanently
Date: Wed, 07 Oct 2015 05:03:33 GMT
Server: Microsoft-IIS/6.0
Vary: Accept-Encoding
X-Powered-By-Plesk: PleskWin
X-Powered-By: ASP.NET
X-AspNet-Version: 4.0.30319
Location: http://www.ifnaa.ir/fa/news/27973/آغاز-سم-زدایی-از-بازار-پول
Cache-Control: private
Content-Length: 0
Connection: close

HTTP/1.0 500 Internal Server Error
Content-Length: 91
Content-Type: text/html
Server: Microsoft-IIS/6.0
Date: Wed, 07 Oct 2015 05:03:36 GMT
Connection: close

@bagder
Member
bagder commented Oct 7, 2015

Gah, this is completely horrible. The browser makes a character set conversion somehow?

@p-developer

I'm not an expert on this issue.
But it seems the browser detects the character set and then encodes the URL!

@bagder
Member
bagder commented Oct 7, 2015

That's a genuinely tricky problem and not something we can easily add to libcurl, even if we wanted to. :-(

@p-developer

Every day, lots of non-English websites add their page titles to the URL in order to get better search rankings.
I'm wondering why this issue was not brought up before, because we are seeing it more and more.
I'm not very familiar with how libcurl works.
If we had a callback that let us change the Location value before the redirect, maybe we could URL-encode it into a URL that web servers accept.

@bagder
Member
bagder commented Oct 7, 2015

In this specific case it seems that it is UTF-8 data that comes in the Location: header and then gets converted into %-encoded bytes by the browser.

I could see us attempting to do that if we believe it has a fair chance of working on sites like this.

Regarding a callback for redirects, I don't think we need that. libcurl already offers a rather good API that returns enough info to allow a caller to do the redirects themselves if the built-in logic isn't good enough.

@MoSal
Contributor
MoSal commented Oct 7, 2015

I agree that there is no need for a dedicated callback.

One can stop libcurl from following redirects, use a header function to get the redirect URL from 'Location:', and then do whatever he/she pleases with that URL.
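
The caller-side approach described above could look roughly like the following Python sketch. It does not use libcurl at all: `extract_location`, `next_request_url`, and the `fix` parameter are hypothetical names invented here, and the header lines are assumed to have been collected already (for example by a header callback) with automatic redirects switched off.

```python
from urllib.parse import quote, urljoin

# Printable ASCII except space; these characters are passed through untouched,
# so existing %XX escapes in the Location value are not encoded a second time.
ASCII_PRINTABLE = "".join(chr(c) for c in range(0x21, 0x7F))


def extract_location(header_lines):
    """Return the value of the first Location header, or None if absent."""
    for line in header_lines:
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "location":
            return value.strip()
    return None


def next_request_url(current_url, header_lines, fix=lambda url: url):
    """Work out the URL to request next, applying a caller-chosen fix-up."""
    location = extract_location(header_lines)
    if location is None:
        return None
    # Resolve relative redirects against the URL that was just requested.
    return urljoin(current_url, fix(location))


# Example fix-up: %-encode spaces and everything outside printable ASCII.
headers = [
    "HTTP/1.0 301 Moved Permanently",
    "Server: Microsoft-IIS/6.0",
    "Location: /caf\u00e9 menu",
    "Content-Length: 0",
]
target = next_request_url(
    "http://example.com/a/b",
    headers,
    fix=lambda url: quote(url, safe=ASCII_PRINTABLE),
)
print(target)
```

Keeping the fix-up as a plain function argument means the redirect loop itself stays generic, and each application decides how much "junk" it is willing to repair.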

@devrimbaris

I also had the same problem with redirect URLs containing Turkish characters. I tested with wget and it seemed to handle the redirect. I am using the command-line curl compiled from the GitHub repo.

@bagder
Member
bagder commented Oct 8, 2015

Well, wget struggles with the same problems, but they've gone further than we have in this area. It doesn't really help us though.

@devrimbaris in your case then, was it also just "pure" UTF8 that had to be %-encoded?

@bagder
Member
bagder commented Nov 2, 2015

Converting the bytes >= 0x80 to %-encoded strings seems to be what it expects. I'm leaning towards making curl do this since the browsers do. I'll test some more before I make a decision.
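
As a rough illustration of that conversion (not libcurl's actual implementation), the Python sketch below %-encodes every byte >= 0x80 and leaves all ASCII bytes as they are. The function name `percent_encode_high_bytes` is made up here, the input is assumed to be UTF-8, and the uppercase hex digits are an arbitrary choice.

```python
def percent_encode_high_bytes(url: str) -> str:
    """%-encode bytes >= 0x80; ASCII bytes (including "%") pass through."""
    out = []
    for byte in url.encode("utf-8"):  # assumption: the redirect target is UTF-8
        if byte >= 0x80:
            out.append("%{:02X}".format(byte))
        else:
            out.append(chr(byte))
    return "".join(out)


# First word of the path from the Location header quoted earlier in this thread,
# written with escapes so the code points are unambiguous.
redirect = "http://www.ifnaa.ir/fa/news/27973/\u0622\u063a\u0627\u0632"
print(percent_encode_high_bytes(redirect))
```

Because only high bytes are rewritten, a Location value that is already correctly %-encoded comes out unchanged.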

@bagder bagder added a commit that closed this issue Nov 2, 2015
@bagder bagder http redirects: %-encode bytes outside of ascii range
Apparently there are sites out there that do redirects to URLs they
provide in plain UTF-8 or similar. Browsers and wget %-encode such
headers when doing a subsequent request. Now libcurl does too.

Added test 1138 to verify.

Closes #473
3f7b1bb
@bagder bagder closed this in 3f7b1bb Nov 2, 2015
@p-developer

Thank you kindly for considering and solving the issue. I am looking forward to the changes in the next release of curl.
