Inconsistent support of IDNA hostnames in Client #1444
Description
Long story short
aiohttp's client handles IDNA hostnames in a way that seems inconsistent: the Host header always contains a decoded UTF-8 value, which seems problematic.
For instance:
session.get("http://éé.com/") makes a request with Host: éé.com
session.get("http://xn--9caa.com/") also makes a request with Host: éé.com
While it's unclear to me whether a Unicode hostname should always be IDNA encoded (see below), it should at least not be decoded when the caller has explicitly encoded it.
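To make the relationship between the two spellings concrete, here is a minimal sketch using Python's built-in "idna" codec (which implements the older IDNA 2003 rules). It only shows that the two hostnames above are the same name in two encodings; it is not what aiohttp does internally, and the constant names are made up for the example.

```python
# Both hostnames from the example above, written with escapes so the
# snippet does not depend on the file encoding.
UNICODE_HOST = "\u00e9\u00e9.com"  # "éé.com"
ENCODED_HOST = "xn--9caa.com"

# Unicode -> IDNA (ASCII-compatible) form
encoded = UNICODE_HOST.encode("idna").decode("ascii")

# IDNA form -> Unicode
decoded = ENCODED_HOST.encode("ascii").decode("idna")

print(encoded)
print(decoded)
```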
IDNA or not?
The newest HTTP/1 RFCs don't specify the encoding of the headers, but recommend handling them as US-ASCII characters only, for security reasons (see: https://tools.ietf.org/html/rfc7230#section-3, especially the last paragraph of 3.2.4).
Most of the resources I read from the W3C or the IETF (normative or not) say that the hostname should always be encoded. For instance, https://www.w3.org/International/articles/idn-and-iri/#resolvedomain says:
Finally the user agent sends the request for the page. Since punycode contains no characters outside those normally allowed for protocols such as HTTP, there is no issue with the transmission of the address. This should simply match against a registered domain name.
Browsers I tested (Firefox, Chromium) always encode the hostname in IDNA.
I ran some tests against a random hostname with Unicode characters served by nginx. Nginx doesn't care about the encoding and applies the virtual host rules by matching the exact string, i.e. with xn--9caa.com I see the right website, while éé.com returns a 404, probably because only the IDNA-encoded version is specified in the configuration.
Expected behaviour
session.get("http://xn--9caa.com/") must make a request with Host: xn--9caa.com (encoded host).
session.get("http://éé.com/") should make a request with Host: xn--9caa.com (encoded host).
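The expectation above can be sketched with the standard library alone. The helper below, expected_host_header, is hypothetical and is not part of aiohttp or yarl; it assumes the stdlib "idna" codec is an acceptable stand-in for whatever encoder the client uses, and it ignores the port for brevity.

```python
from urllib.parse import urlsplit


def expected_host_header(url: str) -> str:
    """Hypothetical helper: the Host value this issue expects for `url`."""
    hostname = urlsplit(url).hostname  # host part only, lowercased, no port
    if hostname is None:
        raise ValueError(f"no host in {url!r}")
    # The "idna" codec converts non-ASCII labels to their xn-- form and
    # passes labels that are already ASCII (including xn-- ones) through.
    return hostname.encode("idna").decode("ascii")


# Both spellings of the hostname should yield the same encoded value.
print(expected_host_header("http://\u00e9\u00e9.com/"))
print(expected_host_header("http://xn--9caa.com/"))
```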
Actual behaviour
session.get("http://xn--9caa.com/") makes a request with a decoded host: Host: éé.com (UTF-8 encoded host).
session.get("http://éé.com/") makes a request with Host: éé.com too.
Suggested fix
It seems that self.url.raw_host should be used rather than self.url.host in ClientRequest:
https://github.com/KeepSafe/aiohttp/blob/master/aiohttp/client_reqrep.py#L168
(According to my quick test, yarl.URL.raw_host always returns the IDNA-encoded version, regardless of the encoding of the input URL.)