Chunked stream memory leak #3631

Closed
mvanderkroon opened this issue Feb 28, 2019 · 10 comments


mvanderkroon commented Feb 28, 2019

Long story short

I'm trying to stream data via POST to an aiohttp server instance (potentially hundreds of gigabytes); my data is not typically stored in files and can be 'generated' in client processes on the fly. I can't use a multipart upload as it does not fit my needs, nor do I use the data = await request.post() shorthand (which the docs clearly warn will OOM for large files).

I'm trying to use the underlying StreamReader (request._payload) to allow line-by-line iteration over the stream. In doing so, aiohttp (server) consumes more and more memory until the application OOMs.

Expected behaviour

Processing a stream of data in an aiohttp server should not cause OOMs

Actual behaviour

aiohttp OOMs on large streams of data

Steps to reproduce

aiohttp server

# server.py
from aiohttp import web

async def process_stream(request):
    # request._payload is the underlying StreamReader (request.content is the public alias)
    async for line in request._payload:
        pass
    return web.json_response({"STATUS": "OK"})

app = web.Application()
app.router.add_post("/", process_stream)
web.run_app(app)

requests client

# client.py
import requests

def generate_data():
    while True:
        yield "hello world\n".encode("utf-8")

r = requests.post("http://localhost:8080/", data=generate_data())
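
(Note: because the body is a generator, requests sends it with Transfer-Encoding: chunked, which is what exercises aiohttp's per-chunk bookkeeping on the server side.)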

Additional info

I found a resource relating to asyncio and StreamReader/StreamWriter backpressure. I have done my best to read through the aiohttp source, but it looks like the fixes mentioned in that document are already in place, so I'm not sure why this is not working.

In fact, I'm not sure whether the memory increase is due to aiohttp (or an underlying library) holding references to elements in memory, or whether the producing process is simply pushing data into the queue faster than aiohttp is consuming it (the latter case would suggest a problem with backpressure).
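
For context, the flow-control pattern that document describes looks roughly like this (a minimal sketch, not aiohttp's actual code; BackpressureProtocol, HIGH_WATER, and consume are made-up names):

import asyncio

class BackpressureProtocol(asyncio.Protocol):
    HIGH_WATER = 64 * 1024  # hypothetical threshold

    def __init__(self):
        self._buffer = bytearray()
        self._paused = False
        self._transport = None

    def connection_made(self, transport):
        self._transport = transport

    def data_received(self, data):
        self._buffer.extend(data)
        # Once the buffer is too large, stop reading from the socket;
        # TCP flow control then slows the producer down.
        if not self._paused and len(self._buffer) > self.HIGH_WATER:
            self._transport.pause_reading()
            self._paused = True

    def consume(self, n):
        # Called by whatever drains the buffer; resume reading
        # once enough data has been consumed.
        del self._buffer[:n]
        if self._paused and len(self._buffer) <= self.HIGH_WATER:
            self._transport.resume_reading()
            self._paused = False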

Your environment

server
aiohttp 3.5.4
alpine 3.7.0

@aio-libs-bot

GitMate.io thinks the contributors most likely able to help are @fafhrd91 and @asvetlov.

Possibly related issues are #1656 (Memory leak), #271 (Memory leak in request), #2566 (how to get post data), #469 (how to get the post data), and #133 (Memory leak when doing https request).


mvanderkroon commented Mar 1, 2019

Found what looks like a bona fide bug. Analysis using tracemalloc showed that memory was leaking in streams.py:238.

This was confusing at first, but closer inspection reveals that on streams.py:276 the self.total_bytes value is appended to a list (_http_chunk_splits) that grows without bound: hence the memory leak.
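
For reference, the tracemalloc workflow that surfaces this kind of leak looks roughly like this (a sketch; the file and line numbers in the output depend on the aiohttp version):

import tracemalloc

tracemalloc.start(25)  # keep up to 25 frames per allocation

before = tracemalloc.take_snapshot()
# ... feed the handler a large chunked upload for a while ...
after = tracemalloc.take_snapshot()

# The top entries point into aiohttp/streams.py when the list leaks.
for stat in after.compare_to(before, "lineno")[:10]:
    print(stat)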

As far as I can tell, this _http_chunk_splits list is only used when iterating over the data stream in chunks via AsyncStreamReaderMixin.iter_chunks, and in that case it also seems to be properly cleaned up (elements are popped as they are consumed). Of the other three methods of iterating over the stream, at least AsyncStreamReaderMixin.__aiter__ seems vulnerable to the bug.
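
For illustration, the two iteration styles side by side (a sketch using the public request.content alias, which is the same StreamReader as request._payload; only one loop would be used in practice):

# Inside an aiohttp request handler:

# iter_chunks() pops each boundary off _http_chunk_splits as it yields,
# so the list stays bounded:
async for data, end_of_http_chunk in request.content.iter_chunks():
    pass

# __aiter__ delegates to readline(), which never consumes
# _http_chunk_splits, so on a chunked upload the list grows forever:
async for line in request.content:
    pass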

My proposed fix would be to ensure the _http_chunk_splits list is properly cleaned up when the readline method is used to iterate over the stream line by line. Concretely I suggest the following:

# streams.py
async def readline(self) -> bytes:
    if self._exception is not None:
        raise self._exception

    line = []
    line_size = 0
    not_enough = True

    while not_enough:
        while self._buffer and not_enough:
            offset = self._buffer_offset
            ichar = self._buffer[0].find(b"\n", offset) + 1
            # Read from current offset to found b'\n' or to the end.
            data = self._read_nowait_chunk(ichar - offset if ichar else -1)
            line.append(data)
            line_size += len(data)
            if ichar:
                not_enough = False

            if line_size > self._high_water:
                raise ValueError("Line is too long")

        if self._eof:
            break

        if not_enough:
            await self._wait("readline")

    # fixes memory leak: https://github.com/aio-libs/aiohttp/issues/3631
    self._http_chunk_splits = []

    return b"".join(line)

@socketpair (Contributor)

@mvanderkroon could you please revert the quote and whitespace changes in your commit?

@mvanderkroon (Author)

@socketpair sure thing; the latest commit should be fine

@socketpair (Contributor)

@mvanderkroon I wouldn't say so:

[screenshot]


mvanderkroon commented Mar 2, 2019

I think somehow my latest commit is not being picked up here: da7bcaa

If all else fails I'll simply redo the process; let me know.

@socketpair (Contributor)

Well, please squash (fixup) the commits and make a PR.

mvanderkroon mentioned this issue Mar 3, 2019
socketpair changed the title from "POST stream data memory leak" to "Chunked stream memory leak" Mar 3, 2019

odysseusmax commented Mar 30, 2019

I'm also experiencing a memory leak with chunked stream responses.

I'm downloading files over HTTP using aiohttp; my code looks somewhat like this:

import aiohttp

async def download(url, filename):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as r:
            if r.status < 400:
                with open(filename, 'wb') as fd:
                    async for chunk in r.content.iter_chunked(1024):
                        fd.write(chunk)

With concurrent downloads I get an out-of-memory error.
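
For bounding memory use across many simultaneous downloads, a minimal sketch using asyncio.Semaphore (MAX_CONCURRENCY and the urls list are illustrative; download is the coroutine above):

import asyncio

MAX_CONCURRENCY = 4  # illustrative limit

async def bounded_download(sem, url, filename):
    # The semaphore caps how many downloads run at once.
    async with sem:
        await download(url, filename)

async def main(urls):
    sem = asyncio.Semaphore(MAX_CONCURRENCY)
    await asyncio.gather(*(
        bounded_download(sem, url, "file%d" % i)
        for i, url in enumerate(urls)
    ))

# asyncio.run(main(urls))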

@socketpair (Contributor)

@odysseusmax please report a separate issue and please prepare a complete program that reproduces the bug. Every detail is important: the Python version, and especially a web server which I can use to reproduce.


socketpair commented Mar 30, 2019

The fix has landed in both the master and 3.5 branches.

lock bot added the outdated label Apr 2, 2020
lock bot locked as resolved and limited conversation to collaborators Apr 2, 2020