cURL incorrectly removes the trailing dot from HTTP Host headers #716

Closed
AlexYst opened this Issue Mar 16, 2016 · 18 comments

@AlexYst

AlexYst commented Mar 16, 2016

I did this

I ran "curl --insecure https://alice.sni.velox.ch./" on the command line

I expected the following

I expected cURL to send the following information:
SNI host name: alice.sni.velox.ch
HTTP Host header: alice.sni.velox.ch.

curl/libcurl version

7.38.0

curl 7.38.0 (x86_64-pc-linux-gnu) libcurl/7.38.0 OpenSSL/1.0.1k zlib/1.2.8 libidn/1.29 libssh2/1.4.3 librtmp/2.3
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtmp rtsp scp sftp smtp smtps telnet tftp
Features: AsynchDNS IDN IPv6 Largefile GSS-API SPNEGO NTLM NTLM_WB SSL libz TLS-SRP

operating system

Debian 8

To quote a couple specifications:
https://tools.ietf.org/html/rfc6066#section-3 (SNI)
"HostName" contains the fully qualified DNS hostname of the server,
as understood by the client. The hostname is represented as a byte
string using ASCII encoding without a trailing dot.

https://tools.ietf.org/html/rfc7230#section-5.4 (HTTP)
A client MUST send a Host header field in all HTTP/1.1 request
messages. If the target URI includes an authority component, then a
client MUST send a field-value for Host that is identical to that
authority component, excluding any userinfo subcomponent and its "@"
delimiter (Section 2.7.1).

That means that the SNI host name and HTTP Host header do not always match. The SNI host name must never have a trailing dot, but the HTTP Host header must reflect a host name that is identical to the host name of the URI, so if the URI's host has a trailing dot, the HTTP Host header must include that trailing dot.

For example, if the URI of a page is https://alice.sni.velox.ch./, the following values should be sent by the Web browser:
SNI host: alice.sni.velox.ch
HTTP host: alice.sni.velox.ch.

However, while cURL properly strips the trailing dot off of the SNI host name as per RFC 6066, it also incorrectly strips the trailing dot off of the HTTP Host header.
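The split described above can be sketched in a few lines of Python. This is only an illustration of the two RFC passages as quoted in this report, not curl's code; `names_for_url` is a hypothetical helper, and ports and IDN handling are ignored for brevity.

```python
from urllib.parse import urlsplit


def names_for_url(url):
    """Return (sni_name, host_header) for a URL, per the RFC text quoted above.

    Hypothetical helper for illustration; ports and IDN handling are ignored.
    """
    host = urlsplit(url).hostname  # lowercased host, without userinfo or port
    sni_name = host.rstrip(".")    # RFC 6066: no trailing dot in the SNI name
    host_header = host             # RFC 7230: identical to the URI authority
    return sni_name, host_header


print(names_for_url("https://alice.sni.velox.ch./"))
# prints ('alice.sni.velox.ch', 'alice.sni.velox.ch.')
```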

@jay

Member

jay commented Mar 16, 2016

Ref https://savannah.gnu.org/bugs/index.php?47408

As I noted there, none of the browsers strip off the trailing dot. Frankly, I think it's better to strip it; I'd guess it provides better compatibility, but really I don't have anything to back that up.

@bagder

Member

bagder commented Mar 16, 2016

I disagree. I think it makes the most sense to treat SNI and Host: the same and that removing the dot is the sensible thing. But also, web servers such as Apache already strip that trailing dot themselves, so sending the dot or not doesn't make a difference there.

As you already know, but for newcomers: I took this issue to the HTTP WG mailing list earlier today, as I hope we can agree in a wider community on what the "correct" handling of these fields should be.

@bagder

Member

bagder commented Mar 17, 2016

I've tested a bunch of sites by adding a dot to the Host: name while not using the dot in the SNI field, and that causes breakage. Lots of sites simply show different content than without the dot (presumably because the name no longer matches a known site, so I get the default virtual host's content instead), and for all IIS-hosted sites I've tested I get HTTP/1.1 400 Bad Request if the names differ.
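The "default virtual host" guess above can be modeled with a small Python sketch. It assumes a server that picks a site by exact string match on the Host value; `VHOSTS`, `pick_vhost`, and the site names are made up for illustration and do not describe any particular server product.

```python
# Hypothetical name-based virtual host table; names are illustrative only.
VHOSTS = {"www.example.com": "example-site"}


def pick_vhost(host_header, vhosts=VHOSTS, default="default-site"):
    """Select a site by exact (case-insensitive) match on the Host header.

    Assumption: the server does not normalize a trailing dot before matching.
    """
    return vhosts.get(host_header.lower(), default)


print(pick_vhost("www.example.com"))   # matches the configured name
print(pick_vhost("www.example.com."))  # no exact match, falls to the default
```

A server that strips the dot before the lookup would return the same site for both values, which may be why results differ so much between hosts.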

@bagder bagder self-assigned this Mar 17, 2016

@AlexYst

AlexYst commented Mar 17, 2016

Can you please give examples of such sites? When testing myself, I didn't find that error in any site that could handle a trailing dot in the HTTP Host header without TLS. Admittedly, I think IIS chokes on the trailing dot over HTTP or HTTPS. I've only seen the 400 error from Apache when sending a trailing dot in the SNI host name. NGINX doesn't even seem to care if the SNI host name is malformed.

@bagder

Member

bagder commented Mar 17, 2016

I didn't test HTTP-only. I wanted to see how it works with HTTPS if we followed both RFCs, as you say in your original post you think we should (strip dot + include dot). It is clear that doing so leads to 400 Bad Request responses or wrong content in a large number of cases. I think that would be worse than what we're doing now.

The other alternative I think is to switch (back) to how the browsers work and include the dot in both cases. I'm not sure but I suspect that has a slightly higher failure rate. We should run some tests with that as well.
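For anyone who wants to repeat this kind of test without curl, here is a rough Python sketch of the nodot+dot combination: SNI without the trailing dot, Host with it. `build_request` and `probe` are hypothetical helper names, the request is deliberately minimal, and results will vary from server to server.

```python
import socket
import ssl


def build_request(host_header, path="/"):
    """Build a minimal HTTP/1.1 GET request with the given Host value."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host_header}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")


def probe(name, dotted_host_header=True, port=443, timeout=10.0):
    """Connect with an undotted SNI name and return the response status line.

    When dotted_host_header is True, the Host header carries the trailing dot.
    """
    sni_name = name.rstrip(".")
    host_header = sni_name + "." if dotted_host_header else sni_name
    context = ssl.create_default_context()
    with socket.create_connection((sni_name, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=sni_name) as tls:
            tls.sendall(build_request(host_header))
            reply = tls.recv(4096)
    # The first line of the reply is the status line, e.g. "HTTP/1.1 400 ...".
    return reply.split(b"\r\n", 1)[0].decode("latin-1")
```

Calling `probe(name)` and `probe(name, dotted_host_header=False)` for the same site gives the two status lines to compare.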

@bagder

Member

bagder commented Mar 17, 2016

The list of sites returning a 400 on a nodot+dot combo included https://skandia.se, https://microsoft.se/ and https://sls.senecta.se/

@AlexYst

AlexYst commented Mar 17, 2016

It looks like all three of those websites return a 400 error over HTTP with a trailing dot on the domain as well. Those sites aren't choking on a "mismatch", they're choking on the trailing dot in the HTTP Host header, which is a bug on their end.

@bagder

Member

bagder commented Mar 17, 2016

Bug or not, it means (some) IIS servers will do this on a trailing dot.

A popular site showing significantly different responses with the dot present or not in the Host: header:

$ curl -v https://www.yahoo.com -H "Host: www.yahoo.com" 

vs

$ curl -v https://www.yahoo.com -H "Host: www.yahoo.com." 
@AlexYst

AlexYst commented Mar 18, 2016

The Web server for that site doesn't have a response for that HTTP Host header. It's a bug in the server, in this case, but blaming cURL for that and stripping off the trailing dot is like blaming cURL for the server sending a 404 error when you request https://example.com/not/an/existing/file.xhtml and trying to avoid requesting such pages. If the user isn't looking for the site that uses the trailing dot, they just won't put the trailing dot in the domain name of the URI that they ask cURL to request.

@bagder

Member

bagder commented Mar 18, 2016

But can you show us a few sites that break when we remove the dot from the Host: header?

@AlexYst

AlexYst commented Mar 18, 2016

It's less common than sites that don't use "www.", that's for sure. I don't have any live examples at the moment.

My old website used to canonicalize URIs with redirects to the dotted version of the domain. That was back when I used HTTP instead of HTTPS though, and things seemed to break when I switched to HTTPS. I let people talk me into believing that the trailing dot is invalid in HTTPS URIs, so I removed the redirect. Now years later, when I can actually understand the specifications when I read them, I see that this wasn't an error in my URIs, but an error in most clients.

With that in mind, I'm reporting the bug in as many free software Web clients as I can test in, then going back to my preferred canonicalization as soon as I am self-hosted again.

In the case of stripping the trailing dot, this will cause a canonicalization redirect to become a redirect loop. As I'm following the specification to the letter, the bug isn't in my configuration, but in cURL's misinterpretation of such URIs. If you're comfortable with this misinterpretation, by all means, keep it. You should just be aware that that implementation does not follow the standards laid out in the RFCs. Anyone that doesn't want a trailing dot sent in the HTTP Host header won't be adding a trailing dot to domains in their URIs though, will they? Fixing this bug would allow cURL to follow the standards while remaining compatible with existing URIs.
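The redirect loop can be shown with a small simulation in Python. Nothing here is real server or cURL code: `server` stands in for a hypothetical site that redirects undotted Host values to the dotted name, and `fetch` stands in for a client that either keeps or strips the dot before sending the Host header.

```python
from urllib.parse import urlsplit


def server(host_header):
    """Hypothetical site that canonicalizes to the dotted host name."""
    if not host_header.endswith("."):
        return 301, "https://" + host_header + "./"
    return 200, None


def fetch(url, strip_dot, max_redirects=5):
    """Follow redirects; return the final status, or "loop" at the limit."""
    for _ in range(max_redirects + 1):
        host = urlsplit(url).hostname
        if strip_dot:
            host = host.rstrip(".")  # client drops the dot from Host
        status, location = server(host)
        if status != 301:
            return status
        url = location
    return "loop"


print(fetch("https://example.org/", strip_dot=False))  # prints 200
print(fetch("https://example.org/", strip_dot=True))   # prints loop
```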

@bagder

Member

bagder commented Mar 18, 2016

The specs tell us how to behave. But when virtually nobody is following the specs, being the only one that holds the strict position and does things a certain way will not benefit curl users, and frankly it won't benefit the Internet at large either. Users will expect curl to behave mostly like the browsers do, in this aspect as well.

And the difference in behavior between curl and the browsers is a concern, and that's why we're discussing this on the http-wg list. The discussion on that list will then be used as feedback for me and us to make a decision on what curl should do with this matter in the future. Right now I maintain that following both specs (strip + nonstrip) seems to be the worst option, web compat wise.

As basically nobody uses trailing dots on domain names in URLs anyway (possibly partly for this reason), this isn't a very big deal. And a user can in fact provide their own Host: header, so curl can still be made to follow RFC 7230 manually.

@bagder

Member

bagder commented Apr 22, 2016

I'm aiming for just documenting the situation as the discussion in httpbis didn't go anywhere, the browsers don't follow the specs (as mentioned) and I don't see any benefit in changing curl to start doing so.

@AlexYst

AlexYst commented Apr 22, 2016

If you want to document it and leave the code in cURL alone, that's fine. It's a bit misleading to say that Web browsers aren't following the standard though.

Most Web browsers aren't currently following the standard, but I did manage to get the Qt developers to fix the bug in their networking code, so any Web browser that relies on Qt for this sort of thing will be fixed upon the next Qt release (Qt might have even released a new version by now, I'm not sure). The GNOME developers have acknowledged the bug as well, believing it to be in glib. They aren't acting as quickly as the Qt developers, but a fix is likely coming at some point, so Web browsers dependent on glib will likely also end up fixed. Likewise, Google's known about this issue for a while as part of a larger issue in their code, and it sounds like a patch for that problem is in the works for Chromium/Chrome too. From the sounds of it, Firefox might not fix the problem (despite already having a patch for it) and I'm not sure that Microsoft knows that Internet Explorer has the problem, but many Web browsers are likely going to be following the standard in the nearish future.

@bagder

Member

bagder commented Apr 22, 2016

I was referring to the major browsers. Chrome, IE, Safari, Firefox. The Qt HTTP stack is not used in any of those.

As mentioned already, I opened a discussion about this on the httpbis mailing list and there was no particular interest in the HTTP community to work on this.

Google's known about this issue for a while as part of a larger issue in their code, and it sounds like a patch for that problem is in the works for Chromium

Link?

Finally, you're working very hard to "fix" this with no particular use case that shouts for a fix. If you would be able to show a few different places where this causes a problem in real life right now, I think I and others would be more inclined to work on changing things.

@AlexYst

AlexYst commented Apr 22, 2016

https://bugs.chromium.org./p/chromium/issues/detail?id=496472 says:

This is a tracking bug for the clean-up tasks, in case anyone wants to grab it / yell at me:

That means that it's not a bug report, but rather a note from one of the developers saying that this needs to be fixed. It also says:

The only usage of this is for validating that the QUIC SNI information is well-formed. QUIC SNI follows the rules of TLS's SNI (RFC 6066), which dictates the use of IDNA A-Labels without the trailing '.'

... and:

  1. Ensure it's normalized to strip off any trailing '.' that may have been in the URL host as a DNS resolver hint (if sending) or that it doesn't have a trailing '.' if handling as a server (this is handled by #3)

It sounds to me like they have admitted a need to fix the bug, but some parts of that bug page are too technical for me to understand, so I could be wrong.

I think that a lot of places don't use the dotted host name for one of three reasons: they don't think that their users are tech-savvy enough to add a dot, they don't realize that the dot is even an option, or Web clients aren't handling it properly, so it can't be reliably used yet. (There's also the personal preference crowd. Some people begin their domains with "www.", some don't. The same applies to the trailing dot.) Website owners shouldn't be forced to choose the undotted version of their domain for canonicalization just because Web clients are too buggy to handle it. I think that this is part of why the XHTML Content-Type header still hasn't seen the adoption that it deserves: Most versions of Internet Explorer can't handle it.

I'm going to be setting up my canonicalization to use the dotted version of the domain, as it is the absolute form, the fully-qualified domain name. To redirect from the dotted version to the non-dotted version seems incorrect, as a non-dotted domain can be, and in some cases is, resolved against a local domain. While I respect the decision of some websites to canonicalize to the undotted form, it seems like this should be the decision of the website owner, not the decision of faulty Web clients that choose to be buggy. I doubt that cURL will be an issue on my website, as my site's too small for cURL use to be likely.

However, choosing not to follow standards seems like a bad idea. You say that you don't want to follow the standards because most clients don't follow it, but that very inertia is the problem. Many of them might also not be fixing their bugs because none of the other clients follow the standard. "Why should we follow the standards if no one else does?" Instead of choosing to be part of the solution, you choose to be part of what prevents standards from being, well, standard.

@bagder

Member

bagder commented Apr 22, 2016

Standards aren't there to be followed blindly. They should be there for a purpose. Right now, following the standards breaks more things than holding off. Compatibility with "the browsers" and "not breaking stuff" is more important to me than basically anything else.

@AlexYst

AlexYst commented Apr 22, 2016

You think that I'm trying to blindly follow standards, and I get that, but I think that you're blindly following what other Web client developers are doing. If you want to leave the bug in place, I have no problem with that. It's your software, not mine.

However, you've yet to mention even one use case that this fix would break. You mentioned Yahoo! and sites like it that are hosted on servers that can't handle the trailing dot, but those same sites aren't redirecting to a trailing-dot domain, linking to their trailing-dot version, or interacting with trailing dots in any way. Websites that don't use the trailing dot can continue not using the trailing dot and cURL will continue to successfully download pages from them.

By all means, if you think that cURL should strip the trailing dot, have it continue doing that. Just please don't blame other Web clients or Web servers for your decision. It is your decision and your decision alone. (Well, and the decision of any other cURL developers, but not the decision of developers of other projects.) If other developers were adding DRM that prevented their Web clients from being copied, would you do that too?

bagder added a commit that referenced this issue Apr 25, 2016

test1322: verify stripping of trailing dot from host name
While being debated (in #716) and a violation of RFC 7230 section 5.4,
this test verifies that the existing functionality works as intended. It
strips the dot from the host name and uses the host without dot
throughout the internals.

@bagder bagder closed this in 3a61428 Apr 25, 2016

@lock lock bot locked as resolved and limited conversation to collaborators May 7, 2018
