Encoding detection results in performance difference with other clients #1811

Closed
@serathius

Description

Long story short

When using aiohttp to fetch pages, I ran into strange performance problems. We compared timings with other clients such as curl and requests, and the difference was significant: fetching a page with the other clients was 6 times faster than with aiohttp. After some digging we found that the cause was the "text" method, which runs encoding detection from chardet.

Requests also uses chardet, but it skips detection when the content-type contains the word "text" and falls back to 'ISO-8859-1' instead. https://github.com/kennethreitz/requests/blob/master/requests/utils.py#L362
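To make the comparison concrete, here is a minimal sketch of that kind of header-based shortcut. It is loosely modeled on the Requests behavior described above, not copied from it, and the function name `encoding_from_content_type` is hypothetical.

```python
def encoding_from_content_type(content_type):
    """Pick an encoding from a Content-Type header value without sniffing the body.

    Hypothetical helper: returns the declared charset if there is one,
    'ISO-8859-1' for text/* types without a charset, and None otherwise
    (the caller would then fall back to detection, e.g. chardet).
    """
    if not content_type:
        return None

    # Split "text/html; charset=utf-8" into the media type and its parameters.
    media_type, *params = content_type.split(";")

    for param in params:
        key, _, value = param.partition("=")
        if key.strip().lower() == "charset" and value.strip():
            # Drop optional quotes around the charset value.
            return value.strip().strip("'\"")

    # No explicit charset: assume the HTTP/1.1 default for text types
    # instead of running a costly detector over the whole body.
    if "text" in media_type.lower():
        return "ISO-8859-1"

    return None
```

With this shortcut, a response sent as "Content-Type: text/html" never reaches the detector. The trade-off is that the assumed encoding may be wrong for pages that are actually UTF-8.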

Expected behaviour

Matching behavior to other popular clients.

Actual behaviour

Huge performance hit for pages without an explicit encoding, for example "Content-Type: text/html".
For a 300 kB page, fetching takes about 8 s with encoding detection (method "text") versus 2 s without it (method "read"). For a 1.3 MB page the difference is 33 s versus 4.5 s.
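As a possible workaround on the caller's side, the encoding can be passed to "text" explicitly, which as far as I can tell means aiohttp does not have to guess it. A minimal sketch, assuming the page is known to be UTF-8; the helper name `fetch_text` and its default are hypothetical, and `session` is expected to be an `aiohttp.ClientSession`.

```python
async def fetch_text(session, url, encoding="utf-8"):
    """Fetch a page and decode it with a known encoding.

    `session` is assumed to behave like aiohttp.ClientSession.
    The default encoding is an assumption about the target site,
    not something read from the response.
    """
    async with session.get(url) as resp:
        # An explicit encoding avoids running detection over the body.
        return await resp.text(encoding=encoding)
```

It would be called from inside a coroutine that has opened an `aiohttp.ClientSession()`, for example `body = await fetch_text(session, url)`. Reading the raw bytes with `read()` and decoding them manually is equivalent when the encoding is known in advance.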

Steps to reproduce

I'm sorry I cannot disclose the page that I used for testing.

Your environment

I tested this in two environments:
Linux 4.8 Ubuntu 16.10 Python 3.6
aiohttp==1.0.5
chardet==2.3.0

Linux 4.8 Ubuntu 16.10 Python 3.5.2
aiohttp==2.0.6
chardet==3.0.1
^ In this environment the slowdown was about 30% smaller.
