Some web servers (IIS, for example) respond with an error if the URL is not URL-encoded.
We can URL-encode the URL before the request, but if the website redirects to another page and the Location value in the HTTP response header is not URL-encoded, we get an error.
However, browsers (Chrome, for example) URL-encode the Location value from the HTTP headers before sending the request to the destination.
I'm using CURL in PHP.
Is there any solution to this problem? We are seeing lots of these errors lately.
Is there any way to URL-encode the Location value in the HTTP header before redirecting? It's very hard for me to change my program to disable redirects and handle this situation outside CURL.
URLs are by definition URL encoded already. Otherwise it is not a URL.
HTTP redirects should by definition redirect to URLs and they MUST be URL encoded already. Not doing so is a violation of the HTTP spec.
So, stupid servers act stupidly and browsers are weak and follow the lowest common denominator (for reasons), so when someone started this ill-thought habit other browsers followed, and so yeah, I'm sure Chrome and others do this. libcurl actually already does this to a very small extent by URL-encoding plain spaces that appear in redirect URLs, just because it is so common and the browsers allow it, so sites continue doing it.
"This" being trying to take junk thrown at it claiming to be a URL and magically transforming it into valid URL syntax. But it's not really a conversion that is defined anywhere and it's not foolproof. It is not a "URL encoding" that needs to be applied, because a blind URL encoding of something that is already supposed to be URL encoded will just ruin it completely.
So, let's talk specifics. What's the exact string your server sends in the redirect that chrome (and others?) deal with properly that libcurl doesn't do the same thing with?
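To illustrate the point above about blind URL encoding, here is a minimal Python sketch. The path is a made-up example, not taken from the site in question, and Python's `urllib.parse.quote` merely stands in for "a generic URL-encode function".

```python
from urllib.parse import quote

# Hypothetical redirect target that is already correctly %-encoded.
already_encoded = "/search?q=caf%C3%A9"

# Encoding it again turns every '%' into '%25', so the server would
# now see the literal text "%C3%A9" instead of the two bytes C3 A9.
blind = quote(already_encoded, safe="/?=")

print(blind)
```

This is why any fix has to leave existing %-sequences alone and touch only the bytes that are invalid in a URL.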
Thanks for your quick response.
I completely agree with you. Everybody should obey the specs.
URL below is an example and it's not my website.
You can see the response headers below:
HTTP/1.0 301 Moved Permanently
Date: Wed, 07 Oct 2015 05:03:33 GMT
HTTP/1.0 500 Internal Server Error
Date: Wed, 07 Oct 2015 05:03:36 GMT
Gah, this is completely horrible. The browser makes a character set conversion somehow?
I'm not an expert on this issue.
But it seems the browser is detecting the character set and then encodes the URL!
That's a genuinely tricky problem and not something we can easily add to libcurl, even if we wanted to. :-(
Every day lots of non-English websites add their page title to the URL in order to get better search rankings.
I'm wondering why this issue was not brought up before, because we are seeing this more and more.
I'm not very familiar with how libcurl works.
If we had a callback that let us change the Location value before the redirect, maybe we could URL-encode it into a URL that web servers can accept.
In this specific case it seems that it is UTF-8 data that comes in the Location: header, which then gets converted into %-encoded bytes by the browser.
I could see us attempt to do that if we believe it has a fair chance of working on sites like this.
Regarding a callback for redirects, I don't think we need that. libcurl already offers a rather good API that returns enough info to allow a caller to do the redirects themselves if the built-in logic isn't good enough.
I agree that there is no need for a dedicated callback.
One can stop libcurl from following redirects, use a header function to get the redirect URL from 'Location:', and then do whatever one pleases with that URL.
I also had the same problem with redirect URLs containing Turkish characters. I tested with wget and it seemed to handle the redirect. I am using the command line curl compiled from the GitHub repo.
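As a rough sketch of that approach: with redirect following turned off (CURLOPT_FOLLOWLOCATION) and a header callback installed (CURLOPT_HEADERFUNCTION), the caller sees each response header line and can pick out the Location value itself. The Python below does not call libcurl; it only shows the parsing step on raw header lines, and `extract_location` and the sample headers are hypothetical.

```python
from typing import Iterable, Optional


def extract_location(header_lines: Iterable[bytes]) -> Optional[bytes]:
    """Return the raw Location value from response header lines, or None.

    The value is returned as bytes, untouched, so the caller can decide
    how to repair or encode it before issuing the next request.
    """
    for raw in header_lines:
        name, sep, value = raw.partition(b":")
        # Header names are case-insensitive; the status line has no ':'.
        if sep and name.strip().lower() == b"location":
            return value.strip()
    return None


# Made-up header lines, in the form a header callback would receive them.
sample = [
    b"HTTP/1.0 301 Moved Permanently\r\n",
    b"Date: Wed, 07 Oct 2015 05:03:33 GMT\r\n",
    b"Location: /new/path\r\n",
    b"\r\n",
]
print(extract_location(sample))
```

The caller would then fix up the returned value and start a new transfer with it, repeating until no Location header comes back.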
Well, wget struggles with the problem in just the same way, but then they've gone further than we have in this area. It doesn't really help us though.
@devrimbaris in your case then, was it also just "pure" UTF8 that had to be %-encoded?
Converting the >= 0x80 bytes to a %-encoded string seems to be what it expects. I'm leaning towards making curl do this since the browsers do. I'll test some more before I make a decision.
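The conversion described here can be sketched in a few lines of Python. This is only an illustration of the idea, not libcurl's actual code; the function name and the uppercase hex digits are assumptions, and it deliberately ignores the separate handling of plain spaces mentioned earlier.

```python
def encode_high_bytes(location: bytes) -> str:
    """%-encode every byte >= 0x80 and leave all ASCII bytes as they are.

    Existing %-sequences consist of ASCII only, so they pass through
    unchanged and are not double-encoded.
    """
    out = []
    for b in location:
        if b >= 0x80:
            out.append("%{:02X}".format(b))
        else:
            out.append(chr(b))
    return "".join(out)


# Hypothetical Location value sent as raw UTF-8 ("é" is bytes C3 A9).
raw = "/caf\u00e9?x=%20y".encode("utf-8")
print(encode_high_bytes(raw))
```

Because it works on bytes rather than characters, the same loop applies whatever character set the server used; it does not try to detect or convert the encoding.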
http redirects: %-encode bytes outside of ascii range
Apparently there are sites out there that do redirects to URLs they
provide in plain UTF-8 or similar. Browsers and wget %-encode such
headers when doing a subsequent request. Now libcurl does too.
Added test 1138 to verify.
Thank you kindly for considering and solving the issue. I am looking forward to the changes in the next release of curl.