Host header position causes certain sites to not respond #3265
Closed
Description
aiohttp adds the Host header last rather than first to outbound requests, which breaks fetching certain sites. Header order normally shouldn't matter, but I'm coming across servers that won't respond unless Host comes first, as it does in a browser.
Example:
This request comes from a browser and gets a successful response:
```http
GET / HTTP/1.1
Host: www.accuweather.com
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:61.0) Gecko/20100101 Firefox/61.0
Accept: */*
Accept-Encoding: gzip, deflate
Connection: close
```
This request comes from aiohttp and gets no response (note the position of the Host header):
```http
GET / HTTP/1.1
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:61.0) Gecko/20100101 Firefox/61.0
Accept: */*
Accept-Encoding: gzip, deflate
Host: www.accuweather.com
Connection: close
```
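For comparison with the two captures above, here is a minimal sketch that serializes a request head in the browser's order, with Host immediately after the request line. `build_request_head` is a hypothetical helper written only for this illustration; it is not part of aiohttp and just shows the byte layout the browser sends.

```python
from typing import Mapping


def build_request_head(method: str, path: str, host: str,
                       headers: Mapping[str, str]) -> bytes:
    """Hypothetical helper: build an HTTP/1.1 request head with Host first."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    for name, value in headers.items():
        # Drop any caller-supplied Host so it is not sent twice.
        if name.lower() == "host":
            continue
        lines.append(f"{name}: {value}")
    # An empty line (CRLF CRLF) terminates the header section.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


if __name__ == "__main__":
    head = build_request_head(
        "GET", "/", "www.accuweather.com",
        {
            "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:61.0) "
                          "Gecko/20100101 Firefox/61.0",
            "Accept": "*/*",
            "Accept-Encoding": "gzip, deflate",
            "Connection": "close",
        },
    )
    print(head.decode("ascii"))
```

Host is written before the loop, so caller-supplied headers keep their relative order but can never precede it.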
To reproduce this behavior, run the following code; the request times out.
```python
import asyncio

import aiohttp


async def req(url):
    # Session-wide default headers; aiohttp adds Host and its own
    # default headers to each request itself.
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:61.0) Gecko/20100101 Firefox/61.0'
    }
    # Give up after 10 seconds in total; against the affected server
    # this is where the request times out.
    timeout = aiohttp.ClientTimeout(total=10)
    async with aiohttp.ClientSession(timeout=timeout, headers=headers) as session:
        # ssl=False disables certificate verification.
        async with session.get(url, ssl=False) as resp:
            print(resp.status)
            print(await resp.text())


asyncio.run(req("https://www.accuweather.com"))
```
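To check the ordering hypothesis without depending on a third-party site, here is a sketch of a hypothetical local server (standard-library asyncio only) that mimics the behavior described above: it answers only when Host is the first header after the request line. This is an assumption about how such servers behave, not the actual logic of any real site. To keep the check fast, it also closes the connection immediately instead of hanging until a timeout. The two request heads mirror the captures above, trimmed to a few headers.

```python
import asyncio


async def handle(reader: asyncio.StreamReader,
                 writer: asyncio.StreamWriter) -> None:
    # Read the request head up to the blank line that ends the headers.
    head = await reader.readuntil(b"\r\n\r\n")
    lines = head.decode("latin-1").split("\r\n")
    first_header = lines[1] if len(lines) > 1 else ""
    # Hypothetical strict rule: respond only if Host comes first.
    if first_header.lower().startswith("host:"):
        writer.write(b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n"
                     b"Connection: close\r\n\r\nok")
        await writer.drain()
    # Otherwise close without sending anything.
    writer.close()


async def send_raw(port: int, head: bytes) -> bytes:
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(head)
    await writer.drain()
    data = await reader.read()  # read until the server closes the socket
    writer.close()
    return data


async def main() -> dict:
    # Port 0 lets the OS pick a free port.
    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    host_first = (b"GET / HTTP/1.1\r\nHost: localhost\r\n"
                  b"Accept: */*\r\nConnection: close\r\n\r\n")
    host_last = (b"GET / HTTP/1.1\r\nAccept: */*\r\n"
                 b"Host: localhost\r\nConnection: close\r\n\r\n")
    results = {
        "host_first": await send_raw(port, host_first),
        "host_last": await send_raw(port, host_last),
    }
    server.close()
    await server.wait_closed()
    return results


if __name__ == "__main__":
    for name, reply in asyncio.run(main()).items():
        status = reply.split(b"\r\n", 1)[0] or b"<no response>"
        print(name, status.decode("latin-1"))
```

To test a client library's header order against this stand-in, you could point the client at the local port instead of using the raw `send_raw` heads.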
Tested on:
Windows 7 x64
Python 3.7.0
aiohttp 3.3.2