Encoding detection causes a performance gap compared with other clients #1811
Description
Long story short
When using aiohttp to fetch pages, I ran into strange performance problems. We compared timings with other clients such as curl and requests, and the difference was significant: with the other clients, fetching a page was 6 times faster than with aiohttp. After some digging, we found that the problem was the method "text", which uses chardet for encoding detection.
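To make the cost concrete, here is a simplified model of the decode path described above (a sketch, not aiohttp's actual source): when the Content-Type carries no charset, the whole body is handed to a detector. The name `decode_text` and the pluggable `detect` callable are hypothetical; in practice the detector would be something like `chardet.detect(body)["encoding"]`. The `utf-8` fallback when detection returns nothing is also an assumption of this sketch.

```python
from email.message import Message
from typing import Callable, Optional

# Hypothetical detector type; e.g. lambda b: chardet.detect(b)["encoding"]
Detector = Callable[[bytes], Optional[str]]


def decode_text(body: bytes, content_type: Optional[str], detect: Detector) -> str:
    """Simplified model of text(): use the header charset, else sniff the body.

    The detector scans the entire body, so its cost grows with page size and
    is paid on every response whose Content-Type names no charset.
    """
    msg = Message()
    if content_type:
        msg["Content-Type"] = content_type
    encoding = msg.get_content_charset()  # lower-cased charset, or None
    if encoding is None:
        # No charset in the header: fall back to (expensive) detection.
        encoding = detect(body) or "utf-8"  # utf-8 default is an assumption
    return body.decode(encoding)
```

With "Content-Type: text/html; charset=utf-8" the detector is never called; with plain "text/html" it runs over the full body.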
Requests also uses chardet, but it skips detection when the Content-Type contains the word "text" and names no charset, using 'ISO-8859-1' instead. https://github.com/kennethreitz/requests/blob/master/requests/utils.py#L362
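The requests heuristic from the linked utils.py can be approximated as below. This is a simplified sketch, not requests' exact code: it parses the header with the standard library's `email.message.Message` and checks the main type instead of doing a substring search for "text". The function name `encoding_from_content_type` is hypothetical.

```python
from email.message import Message
from typing import Optional

# Fallback requests uses for text types without a charset (per the linked utils.py).
TEXT_FALLBACK = "ISO-8859-1"


def encoding_from_content_type(content_type: Optional[str]) -> Optional[str]:
    """Pick an encoding from the Content-Type header alone, without sniffing.

    Returns None when the header gives no hint, i.e. when a caller would
    still need content-based detection such as chardet.
    """
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset()  # lower-cased, or None if absent
    if charset:
        return charset
    if msg.get_content_maintype() == "text":
        # Skip detection entirely for text/* types, as requests does.
        return TEXT_FALLBACK
    return None
```

ISO-8859-1 maps every byte to a character, so decoding with it is cheap and never raises, at the price of garbling pages that are really UTF-8. With aiohttp, a caller could pass a non-None result to `await resp.text(encoding=...)`, which skips detection when an encoding is given.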
Expected behaviour
Behaviour that matches other popular clients.
Actual behaviour
A huge performance hit for pages without an explicit encoding, for example "Content-Type: text/html".
For a 300 kB page, fetching takes 8 s with encoding detection (using the method "text") versus 2 s without it (using "read"); for a 1.3 MB page, it takes 33 s versus 4.5 s.
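Working the reported timings into ratios shows that the gap widens with page size. The calculation below uses only the numbers above and assumes the extra time is entirely due to the "text" path, since everything else about the two requests is the same.

```python
# Reported timings in seconds: (text() with detection, read() without).
REPORTED = {"300 kB": (8.0, 2.0), "1.3 MB": (33.0, 4.5)}


def slowdown(with_detection: float, without_detection: float) -> float:
    """How many times slower the detecting path is."""
    return with_detection / without_detection


for page, (slow, fast) in REPORTED.items():
    print(f"{page}: {slowdown(slow, fast):.1f}x slower, {slow - fast:.1f} s extra")
# 300 kB: 4.0x slower, 6.0 s extra
# 1.3 MB: 7.3x slower, 28.5 s extra
```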
Steps to reproduce
I'm sorry, but I cannot disclose the page I used for testing.
Your environment
I tested on two environments:
Linux 4.8 Ubuntu 16.10 Python 3.6
aiohttp==1.0.5
chardet==2.3.0
Linux 4.8 Ubuntu 16.10 Python 3.5.2
aiohttp==2.0.6
chardet==3.0.1
^ In that environment, the problem was about 30% smaller.