Error when location param is not url encoded in redirects #473

Closed
p-developer opened this Issue Oct 7, 2015 · 12 comments

@p-developer

p-developer commented Oct 7, 2015

Some web servers (IIS, for example) respond with an error if the request URL is not URL-encoded.

We can URL-encode the URL before sending the request, but if the site redirects to another page and the Location header in the HTTP response is not URL-encoded, we get an error.

Browsers (Chrome, for example), however, URL-encode the Location header value before requesting the destination.

I'm using CURL in PHP.

Is there any solution to this problem? We are seeing lots of these errors lately.

Is there any way to URL-encode the Location header value before the redirect is followed? It would be very hard for me to change my program to disable redirects and handle this situation outside cURL.

@bagder

Member

bagder commented Oct 7, 2015

URLs are per definition URL encoded already. Otherwise it is not a URL.

HTTP redirects should by definition redirect to URLs and they MUST be URL encoded already. Not doing so is a violation of the HTTP spec.

So, stupid servers act stupidly, and browsers are weak and follow the lowest common denominator (for reasons). When someone started this ill-thought habit, other browsers followed, so yeah, I'm sure Chrome and others do this. libcurl actually already does this to a very small extent: it URL-encodes plain spaces that appear in redirect URLs, just because it is so common and the browsers allow it, so sites continue doing it.

"This" being trying to take junk thrown at it that claims to be a URL and magically transforming it into valid URL syntax. But it's not really a conversion that is defined anywhere, and it's not foolproof. It is not a "URL encoding" that needs to be applied, because blindly URL-encoding something that is already supposed to be URL-encoded will just ruin it completely.
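
To illustrate why a blind second pass of URL encoding is harmful, here is a minimal Python sketch using the standard library's `urllib.parse.quote`. It is not libcurl code, and the sample path is a made-up, shortened example chosen only for illustration.

```python
from urllib.parse import quote, unquote

# A path that is already percent-encoded (the two UTF-8 bytes 0xD8 0xA2).
already_encoded = "/fa/news/%D8%A2"

# Blindly encoding it again escapes each "%" itself as "%25",
# so the server would now see a literal "%D8%A2" instead of the character.
double_encoded = quote(already_encoded)

print(double_encoded)
```

Decoding `double_encoded` once only gets back to the percent-encoded form, not to the original character, which is why the input has to be treated as "already a URL" and repaired selectively.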

So, let's talk specifics. What's the exact string your server sends in the redirect that Chrome (and others?) deal with properly but libcurl doesn't?

@bagder bagder self-assigned this Oct 7, 2015

@bagder bagder added the HTTP label Oct 7, 2015

@p-developer

p-developer commented Oct 7, 2015

Thanks for your quick response.
I completely agree with you. Everybody should obey the specs.

The URL below is an example; it's not my website.
http://ifnaa.ir/fa/news/27973/%D8%A2%D8%BA%D8%A7%D8%B2-%D8%B3%D9%85-%D8%B2%D8%AF%D8%A7%DB%8C%DB%8C-%D8%A7%D8%B2-%D8%A8%D8%A7%D8%B2%D8%A7%D8%B1-%D9%BE%D9%88%D9%84

You can see the response headers below:

HTTP/1.0 301 Moved Permanently
Date: Wed, 07 Oct 2015 05:03:33 GMT
Server: Microsoft-IIS/6.0
Vary: Accept-Encoding
X-Powered-By-Plesk: PleskWin
X-Powered-By: ASP.NET
X-AspNet-Version: 4.0.30319
Location: http://www.ifnaa.ir/fa/news/27973/آغاز-سم-زدایی-از-بازار-پول
Cache-Control: private
Content-Length: 0
Connection: close

HTTP/1.0 500 Internal Server Error
Content-Length: 91
Content-Type: text/html
Server: Microsoft-IIS/6.0
Date: Wed, 07 Oct 2015 05:03:36 GMT
Connection: close

@bagder

Member

bagder commented Oct 7, 2015

Gah, this is completely horrible. The browser makes a character set conversion somehow?

@p-developer

p-developer commented Oct 7, 2015

I'm not an expert on this issue.
But it seems the browser detects the character set and then encodes the URL!

@bagder

Member

bagder commented Oct 7, 2015

That's a genuinely tricky problem and not something we can easily add to libcurl, even if we wanted to. :-(

@p-developer

p-developer commented Oct 7, 2015

Every day, more non-English websites add their page title to the URL to get better search rankings.
I'm wondering why this issue hasn't been brought up before, because we are seeing it more and more.
I'm not very familiar with how libcurl works.
If we had a callback that let us change the Location value before the redirect, maybe we could URL-encode it into a URL that web servers accept.

@bagder

Member

bagder commented Oct 7, 2015

In this specific case it seems that it is UTF-8 data that comes in the Location: header, which the browser then converts into %-encoded bytes.

I could see us attempt to do that if we believe it has a fair chance of working on sites like this.

Regarding a callback for redirects, I don't think we need that. libcurl already offers a rather good API that returns enough info to allow a caller to do the redirects themselves if the built-in logic isn't good enough.

@MoSal

Contributor

MoSal commented Oct 7, 2015

I agree that there is no need for a dedicated callback.

One can stop libcurl from following redirects, use a header function to get the redirect URL from 'Location:', and then do whatever one pleases with that URL.
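
The same "follow the redirect yourself" approach can be sketched outside libcurl. The Python example below is only an illustration under stated assumptions, not the PHP/cURL or libcurl API: it uses the standard library's `http.client`, which never follows redirects on its own, and the helper names `fix_location` and `fetch` are hypothetical. It relies on `http.client` decoding header bytes as ISO-8859-1, so re-encoding with `latin-1` recovers the bytes the server sent.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = (301, 302, 303, 307, 308)

def fix_location(location: str) -> str:
    """Percent-encode the non-ASCII bytes of a Location header value."""
    # Recover the raw header bytes (http.client decoded them as ISO-8859-1).
    raw = location.encode("latin-1")
    return "".join(chr(b) if b < 0x80 else "%%%02X" % b for b in raw)

def fetch(url: str, max_redirects: int = 5) -> http.client.HTTPResponse:
    """GET a URL, handling each redirect manually (hypothetical helper)."""
    for _ in range(max_redirects + 1):
        parts = urlsplit(url)
        conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                    else http.client.HTTPConnection)
        conn = conn_cls(parts.netloc, timeout=10)
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path)
        resp = conn.getresponse()
        location = resp.getheader("Location")
        if resp.status not in REDIRECT_CODES or location is None:
            return resp
        resp.read()   # discard the redirect body
        conn.close()
        # Repair the target, then resolve it against the current URL.
        url = urljoin(url, fix_location(location))
    raise RuntimeError("too many redirects")
```

A call such as `fetch("http://example.com/")` would then return the final response. The sketch ignores details a real client needs, such as userinfo in the URL, method changes on 303, and cookies.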

@devrimbaris

devrimbaris commented Oct 8, 2015

I also had the same problem with redirect URLs containing Turkish characters. I tested with wget and it seemed to handle the redirect. I am using the command-line curl compiled from the GitHub repo.

@bagder

Member

bagder commented Oct 8, 2015

Well, wget struggles with the same problems, but they've gone further than we have in this area. It doesn't really help us, though.

@devrimbaris in your case then, was it also just "pure" UTF-8 that had to be %-encoded?

@bagder

Member

bagder commented Nov 2, 2015

Converting the >= 0x80 bytes to %-encoded strings seems to be what the server expects. I'm leaning towards making curl do this since the browsers do. I'll test some more before I make a decision.
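
The byte-level conversion described above can be sketched in a few lines of Python. This illustrates the idea only and is not curl's actual implementation; the function name `escape_high_bytes` and the choice of uppercase hex digits are assumptions made for the example.

```python
def escape_high_bytes(location: bytes) -> bytes:
    """Percent-encode every byte >= 0x80; leave all ASCII bytes untouched."""
    out = bytearray()
    for b in location:
        if b >= 0x80:
            out += b"%%%02X" % b   # e.g. 0xD8 becomes b"%D8"
        else:
            out.append(b)          # ASCII, including existing "%XX", is kept
    return bytes(out)

# Raw UTF-8 bytes as a server might send them in a Location header.
raw = "http://www.example.com/news/\u0622\u063a".encode("utf-8")
print(escape_high_bytes(raw).decode("ascii"))
```

Because only bytes outside the ASCII range are touched, sequences that are already percent-encoded pass through unchanged, which avoids the double-encoding problem. No character set is assumed here: the bytes are escaped exactly as received.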

@p-developer

p-developer commented Nov 17, 2015

Thank you kindly for considering and solving the issue. I am looking forward to the changes in the next release of curl.

@lock lock bot locked as resolved and limited conversation to collaborators May 7, 2018
