New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prevent ResponseHandler from parsing unexpected payloads #3979
base: master
Are you sure you want to change the base?
Prevent ResponseHandler from parsing unexpected payloads #3979
Conversation
…responses to a single request
This pull request fixes 1 alert when merging 0672191 into c5a80c8 - view on LGTM.com fixed alerts:
|
Codecov Report
@@ Coverage Diff @@
## master #3979 +/- ##
==========================================
- Coverage 97.77% 97.75% -0.02%
==========================================
Files 43 43
Lines 8763 8778 +15
Branches 1375 1377 +2
==========================================
+ Hits 8568 8581 +13
- Misses 83 84 +1
- Partials 112 113 +1
Continue to review full report at Codecov.
|
This patch may break case when we have not read completely the first response, but someone has sent second request in the same connection from other coroutine, and next we may see two responses. I don't know if it is possible. |
@socketpair Based on my understanding of the code, I think the case you describe is possible, as connections are released and can be reused when an EOF is received, and the buffer is read after that. |
@@ -0,0 +1 @@ | |||
Prevent ResponseHandler to parse unexpected payloads. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prevent ResponseHandler to parse unexpected payloads. | |
Prevent ResponseHandler from parsing unexpected payloads. |
@NicolasAubry I'd like to see a test for what @socketpair described and also some negative tests. |
Sending a request before the previous response is fully read would qualify as HTTP pipelining. Documentation is quite clear on this subject:
|
What do these changes do?
They fix an edge-case bug causing high memory consumption that occurs when sending requests to a few specific Web hosts.
We use aiohttp for our scanning tools that run in docker containers. While they work perfectly most of the time, we sometimes experienced repeated OOM cycles that caused the container to be killed over and over again because its memory usage spiked to the 2GB limit as soon as a scan started.
We discovered that some of the host we scan sends more than one response per request, usually around 100-200, but sometimes up to a few thousands, resulting in high memory consumption.
For example, making a GET request to one of those hosts with this simple script:
produces the following output:
The response headers and the content is what we expect.
The TCP dump for this single request, however, is more surprising:
The server (www.example.com) sends the same response over and over again. Those additional payloads aren't tied to a
ClientResponse
and stay there, filling memory. As we do thousands of requests, those payloads take a lot of memory.Putting a print at line 189 in aiohttp/client_proto.py, as shown below, produce the following output:
There is more payload than what is added to the
ClientResponse
the user gets.When a lot of requests is performed and even more data is received for some requests, the memory usage becomes an issue.
Are there changes in behavior for the user?
Hopefully, none apart from reduced memory consumption when hitting this strange server-side behavior. We have tested this fix internally for more than a week, and it fixed this particular problem without causing any noticeable undesired behaviors.
Related issue number
None that I'm aware of.
Checklist
CONTRIBUTORS.txt
CHANGES
folder<issue_id>.<type>
for example (588.bugfix)issue_id
change it to the pr id after creating the pr.feature
: Signifying a new feature..bugfix
: Signifying a bug fix..doc
: Signifying a documentation improvement..removal
: Signifying a deprecation or removal of public API..misc
: A ticket has been closed, but it is not of interest to users.