curl doesn't follow Refresh: header redirects #3657

Closed
wdmcclen opened this Issue Mar 8, 2019 · 15 comments

wdmcclen commented Mar 8, 2019

I did this

I used the attached script with CURL (curlbugphp.txt is the PHP code).

URL to reproduce is:

http://localhost/curlbug.php?id=32211102999336

Attached: curlbigphp.txt

I expected the following

I expected to get back the webpage, but instead got the response shown in the attached file (curlbugresponse.txt), both from PHP and from the "curl" command line.

curl/libcurl version

Attached: curlbugresponse.txt

curl 7.54.0 (x86_64-apple-darwin18.0) libcurl/7.54.0 LibreSSL/2.6.4 zlib/1.2.11 nghttp2/1.24.1
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtsp smb smbs smtp smtps telnet tftp
Features: AsynchDNS IPv6 Largefile GSS-API Kerberos SPNEGO NTLM NTLM_WB SSL libz HTTP2 UnixSockets HTTPS-proxy

operating system

OS X 10.13 and OS X 10.11

Member

bagder commented Mar 8, 2019

You're reporting a problem with a really old curl version but more importantly, the target site your PHP script uses doesn't respond:

* Failed to connect to libcat.pspl.org port 8480: Connection timed out

It would be more helpful if you instead would show us the exact response headers in the case of the problem.

bagder added the HTTP label on Mar 8, 2019

Author

wdmcclen commented Mar 8, 2019

I included that in the "curlbugresponse.txt" file. I'll check the URL, but I know it's correct, as you can see from the curlbugresponse.txt file.

If you use the URL in the attached script, you will see it. The one above is only for running the test script I sent to reproduce it.

FYI, I'm using the delivered version of "php" with Mojave. But I have tried it on other releases at the 7.x level.

Happy to try it on newer versions, but I see no reports that match and no results to indicate it might work in later releases.

Author

wdmcclen commented Mar 8, 2019

Here is the URL inside the PHP script - if you want to try it outside the example script.

https://libcat.pspl.org:8480/TLCScripts/interpac.dll?Search&Config=pac&SearchField=8388608&ItemsPerPage=10&SearchData=32211102999336

Author

wdmcclen commented Mar 8, 2019

The PHP file had to be renamed to ".txt" as the interface would not let me drag and drop the ".php" file.

Member

bagder commented Mar 8, 2019

Ah, thanks I missed the response file you attached. It does indeed highlight exactly what the problem is. These are the response headers:

HTTP/1.1 200 OK
Content-Type: text/html
Server: Microsoft-IIS/8.5
Refresh: 0; URL=interpac.dll?LabelDisplay&Config=PAC&Branch=,0,&FormId=1869006&RecordNumber=593377
X-Powered-By: ASP.NET
Date: Fri, 08 Mar 2019 15:05:50 GMT
Content-Length: 74

This "redirect" is done using the Refresh: header, which is not a header curl supports, and it never has supported it either. Refresh is one of those evil headers that aren't actually defined by the HTTP standard for redirects.

Historically, it seems it was considered for HTTP 1.1 but never made it!

I also found more useful info on the header and support for it in this blog post. Given that we've managed without support for this header for so long, I'm not 100% convinced adding support now is necessary.

Or is it?

bagder changed the title from "CURL 7.54.0 Not following this IIS response" to "curl doesn't follow Refresh: header redirects" on Mar 8, 2019

Member

jay commented Mar 9, 2019

I'm not convinced either. Has anyone else reported this? OTOH Firefox 67 and Chrome 72 process the header. Empirical: Number required; comma, semi-colon or space; with or without url=; and url. Probably like Refresh: ([0-9]+)[,; ] *(?:url=)?(.+)
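
As a rough illustration of the empirical rule above, here is a minimal Python sketch that applies that pattern to a Refresh value and resolves a relative target against the request URL. The name `parse_refresh` is made up for this example, matching `url=` case-insensitively is an assumption, and none of this reflects curl's behavior or any browser's actual parser.

```python
import re
from urllib.parse import urljoin

# The pattern suggested in the comment above. IGNORECASE is an assumption
# so that both "URL=" and "url=" are accepted.
REFRESH_RE = re.compile(r"([0-9]+)[,; ] *(?:url=)?(.+)", re.IGNORECASE)


def parse_refresh(value, base_url):
    """Return (delay_seconds, absolute_url), or None if the value has no URL.

    `parse_refresh` is a hypothetical helper, not part of curl or libcurl.
    """
    match = REFRESH_RE.fullmatch(value.strip())
    if match is None:
        # Covers values that are only a number, e.g. "5".
        return None
    delay = int(match.group(1))
    # The target may be relative, as in the response quoted in this thread.
    target = urljoin(base_url, match.group(2).strip())
    return delay, target


if __name__ == "__main__":
    request_url = (
        "https://libcat.pspl.org:8480/TLCScripts/interpac.dll"
        "?Search&Config=pac&SearchField=8388608&ItemsPerPage=10"
        "&SearchData=32211102999336"
    )
    header_value = (
        "0; URL=interpac.dll?LabelDisplay&Config=PAC&Branch=,0,"
        "&FormId=1869006&RecordNumber=593377"
    )
    print(parse_refresh(header_value, request_url))
```

With the header and request URL from this thread, the sketch yields a delay of 0 and an absolute URL under /TLCScripts/. A value that contains only a number does not match the pattern and yields None.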

Member

bagder commented Mar 10, 2019

Random data point: it seems the request library doesn't support it either, and their earliest report is from 2011.

Member

bagder commented Mar 11, 2019

(I just posted this email to the HTTPbis mailing list, pasted here and slightly reformatted for looks)

Hi friends!

The other day someone filed a bug on curl that we don't support redirects with the Refresh header. This took me down a rabbit hole and I figured I would share with you what I learned down there.

As you all know, redirects in HTTP are specified to use 3xx response codes and a Location: header to point out the URL (I'll use the term URL here but you know what I mean). This has been the case since RFC 1945 (HTTP/1.0). According to an old mail from Roy, Refresh "didn't make it" into that spec.

The little detail that it never made it into the 1.0 spec (nor any later one) doesn't seem to have affected the browsers. Still today, browsers keep supporting the Refresh header as a sort of Location: replacement even though it seems to never have been present in an HTTP spec.

How frequent is the use of the Refresh header? I decided to make an attempt to figure that out, and for this venture I used the Rapid7 data trove. The method that data is collected with may not be the best, but it is still 52+ million HTTP responses from different current HTTP servers (52254873 exactly in my data dump).

My counts show

  • Location is used in 18.49% of the responses
  • Refresh is used in 0.01738% of the responses
  • Location is thus used 1064 times more often than Refresh
  • In 35% of the cases when Refresh is used, Location is also used
  • curl thus handles 99.9939% of the redirects in this test
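
The derived figures in the list above can be rechecked from the rounded percentages with a few lines of Python. This is only a sanity check under two assumptions: that every response carrying Location counts as a redirect curl handles, and that the rounded values are close enough to the raw counts, which are not given in the thread. From these inputs the ratio comes out at roughly 1064, while the handled share comes out at about 99.94%, slightly lower than the last bullet states.

```python
# Rounded percentages quoted in the list above (share of all responses).
location_pct = 18.49
refresh_pct = 0.01738
# Share of Refresh responses that also carry Location.
both_share = 0.35

# How many times more often Location appears than Refresh.
ratio = location_pct / refresh_pct

# Responses where Refresh is the only redirect header.
refresh_only_pct = refresh_pct * (1 - both_share)

# Assumption: a "redirect" is any response with Location or Refresh,
# and curl handles exactly those that include Location.
any_redirect_pct = location_pct + refresh_only_pct
handled_share = location_pct / any_redirect_pct

print(f"Location vs Refresh ratio: {ratio:.0f}")
print(f"Refresh-only responses: {refresh_only_pct:.5f}% of all responses")
print(f"Redirects handled via Location: {handled_share:.4%}")
```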

Other random notes

  • When Refresh is the only redirect header, the response code is usually 200
    (with 404 being the second most)
  • When both headers are used, the response code is almost always 30x
  • When both are used, it is common to redirect to the same target and it is also common for the Refresh header value to only contain a number (for the number of seconds until "refresh").

Contents

Redirects can also be done by meta tags and sending the refresh that way, but I have not investigated how common that is, as it isn't strictly speaking HTTP and so it is outside of my research (and interest) here.

Conclusion

Nah, sorry, I don't have any. Yet another undocumented quirky corner of the web I suppose.

Member

bagder commented Mar 20, 2019

There's no documentation for this header. It is not present in the HTTP standard. It is not implemented by other non-browser HTTP toolkits. I think the sane thing to do here is to push the browsers into dropping support for this abomination. Right now, I don't think it is in our interest to implement support for this header and just extend the pain in the world.

Do contact the site you interact with and urge that they switch to a standard HTTP header!
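
For comparison, a standard redirect as described earlier in this thread is a 3xx status plus a Location: header. The sketch below shows that shape with Python's built-in `http.server`. It is not how the IIS application in this report is configured; the handler name, the `make_server` helper, and the target path are all made up for illustration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class RedirectHandler(BaseHTTPRequestHandler):
    # Hypothetical target path, loosely modeled on the URL in this thread.
    TARGET = "/interpac.dll?LabelDisplay&Config=PAC"

    def do_GET(self):
        if self.path.startswith("/interpac.dll?LabelDisplay"):
            # The page the client should end up on.
            body = b"record page\n"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
            return
        # Standard redirect: 3xx status and Location, instead of
        # "200 OK" with a Refresh header.
        self.send_response(302)
        self.send_header("Location", self.TARGET)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        # Keep the example quiet; remove to see request logs.
        pass


def make_server(port=0):
    """Bind to localhost; port 0 lets the OS pick a free port."""
    return HTTPServer(("127.0.0.1", port), RedirectHandler)
```

Running `make_server(8000).serve_forever()` and requesting any other path with `curl -L http://127.0.0.1:8000/` should end up on the record page, since `-L` makes curl follow Location: redirects.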

(My post here and to the list was also subsequently turned into a blogpost of mine that has some additional feedback that I've received when discussing this topic outside of the curl project.)

bagder closed this on Mar 20, 2019

Author

wdmcclen commented Mar 20, 2019

Just want to add that, while I totally agree that this is NOT the standard and should not be done, the browsers all support this because they must. It's in the field and they get it all the time.
FYI, this is an IIS webserver (v 8.5) and says it is using HTTP 1.1. Therefore, there will be many such cases where this is going to be pumped out - until a certain Redmond firm fixes their issue. Good luck with that. I can think of a good reason for a browser to support it. I can't think of ANY good reasons for a webserver to support and use it. Nonetheless....

Member

bagder commented Mar 20, 2019

"they get it all the time"

At 0.02% usage it can of course be qualified as "all the time" at Internet scale but I think I'd phrase it differently. I'm looking forward to the use counter results from Chrome, mentioned in my blog post.

Author

wdmcclen commented Mar 20, 2019

Interesting. What is the sample size? 0.02% seems quite small to have warranted the browser code base to have supported it.

Perhaps this is the end of this particular issue: as it nears 0.00%, it may be that it's been phased out and is awaiting complete annihilation.

Contributor

jzakrzewski commented Mar 20, 2019

@wdmcclen The analysis (and the numbers) is not only in the blog post @bagder linked but also just a few comments above!

Author

wdmcclen commented Mar 20, 2019

I read the info that was posted earlier the first time. 52M is a pittance of total traffic. It's why I asked about the sample size. 52M tells me nothing without telling me the rate and the period. Is that 52M per minute? Per Second? Over the course of 1 hour? Only US traffic? Lots of other questions. Worldwidewebsize reports that on 2/17/2019 the number of "Google" indexed webpages was ~64.5B pages.

As a measure of raw throughput, Internet Live Stats shows at this moment total internet traffic today is 3.885 Billion GB's SO FAR.

That is not to say that bagder's response wasn't fantastic - it was. It was quick, used a common source of data, and provided a good analysis of that traffic as it related to this issue and curl's thoroughness.

Fascinating to look into.

Member

bagder commented Mar 20, 2019

"52M tells me nothing without telling me the rate and the period. Is that 52M per minute? Per"

Then read the blog post or check the source where I got the data. It is explained in both places, more detailed in the latter. Those are 52 million different origins.

"total internet traffic today is 3.885 Billion GB's SO FAR"

By all means do your own measurements. I doubt you'll find a significantly different usage level. I focused on responses from different origins; you can focus on something else.
