
Chunked stream memory leak #3631

Closed
mvanderkroon opened this issue Feb 28, 2019 · 10 comments

Comments

@mvanderkroon commented Feb 28, 2019

Long story short

I'm trying to stream data in a POST request to an aiohttp server instance (potentially hundreds of gigabytes). My data is not typically stored in files and can be generated on the fly in client processes. A multipart upload does not fit my needs, nor do I use the data = await request.post() shorthand (the docs are clear that it will OOM for large files).

I'm trying to use the underlying StreamReader (request._payload) to iterate over the stream line by line. In doing so, the aiohttp server consumes more and more memory until the application OOMs.

Expected behaviour

Processing a stream of data in aiohttp server should not cause OOMs

Actual behaviour

aiohttp OOMs on large streams of data

Steps to reproduce

aiohttp server

# server.py
from aiohttp import web

async def process_stream(request):
    # Iterate over the raw request body line by line via the StreamReader.
    async for line in request._payload:
        pass

    return web.json_response({"STATUS": "OK"})

app = web.Application()
app.router.add_post("/", process_stream)
web.run_app(app)

requests client

# client.py
import requests

def generate_data():
    # Infinite generator: requests sends this as a chunked upload.
    while True:
        yield b"hello world\n"

r = requests.post("http://localhost:8080/", data=generate_data())
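For comparison, the same streaming upload can be sketched with aiohttp's own client, which accepts an async generator as a chunked request body (a hypothetical equivalent of the requests client above; the URL and port assume the server.py instance is running):

```python
# client_aiohttp.py -- hypothetical aiohttp equivalent of the requests
# client above (assumes server.py is listening on localhost:8080).
import asyncio

import aiohttp

async def generate_data():
    # An async generator is accepted as a streaming (chunked) request body.
    while True:
        yield b"hello world\n"

async def main():
    async with aiohttp.ClientSession() as session:
        async with session.post("http://localhost:8080/",
                                data=generate_data()) as resp:
            return await resp.json()

# Run only against a live server instance:
# asyncio.run(main())
```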

Additional info

I found a resource on asyncio and StreamReader/StreamWriter backpressure. I have done my best to read through the aiohttp source, but it looks like the fixes mentioned in that document are already in place, so I'm not sure why this is not working.

In fact, I'm not sure whether the memory increase is due to aiohttp (or an underlying lib) holding references to elements in memory, or whether the producing process is simply pushing data into the queue faster than aiohttp is consuming it (the latter case would suggest a problem with backpressure).
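The second hypothesis (producer outrunning consumer) is exactly what backpressure is meant to prevent. As a generic illustration in plain asyncio (not aiohttp internals), a bounded queue suspends the producer whenever the consumer falls behind:

```python
# Generic backpressure sketch with a bounded asyncio.Queue; names and
# sizes are illustrative, not taken from aiohttp.
import asyncio

async def producer(queue):
    # With maxsize set on the queue, put() suspends once the buffer is
    # full, so the producer cannot outrun the consumer without bound.
    for _ in range(100):
        await queue.put(b"hello world\n")
    await queue.put(None)  # sentinel: end of stream

async def consumer(queue):
    total = 0
    while True:
        chunk = await queue.get()
        if chunk is None:
            break
        total += len(chunk)
    return total

async def main():
    queue = asyncio.Queue(maxsize=8)  # bounded buffer = backpressure
    _, total = await asyncio.gather(producer(queue), consumer(queue))
    return total

print(asyncio.run(main()))  # 100 chunks * 12 bytes = 1200
```

Without the maxsize bound, the queue (like an unbounded internal buffer) would grow as fast as the producer can push.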

Your environment

server
aiohttp 3.5.4
alpine 3.7.0

@aio-libs-bot (Collaborator) commented Feb 28, 2019

GitMate.io thinks the contributors most likely able to help are @fafhrd91, and @asvetlov.

Possibly related issues are #1656 (Memory leak), #271 (Memory leak in request), #2566 (how to get post data), #469 (how to get the post data), and #133 (Memory leak when doing https request).

@mvanderkroon (Author) commented Mar 1, 2019

Found what looks like a bona fide bug. Analysis using tracemalloc showed that memory was leaking in streams:238.

Confusing at first, but closer inspection reveals that on streams:276 the self.total_bytes value is appended to a list (_http_chunk_splits) of ever-increasing size: hence the memory leak.

As far as I can tell, this _http_chunk_splits list is only used when iterating over the data stream in chunks via AsyncStreamReaderMixin.iter_chunks; in that case it also seems to be properly cleaned up (elements are popped). Of the other three ways of iterating over the stream, at least AsyncStreamReaderMixin.__aiter__ appears vulnerable to the bug.
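The kind of tracemalloc analysis described above can be sketched standalone; here a deliberately leaky list stands in for StreamReader._http_chunk_splits (the real leak lives in aiohttp's streams.py; the names below are illustrative):

```python
# Standalone tracemalloc sketch: a list that is appended to but never
# drained, mimicking the _http_chunk_splits growth pattern.
import tracemalloc

leaky_splits = []  # stand-in for StreamReader._http_chunk_splits

def feed(chunk, splits):
    # Mimics the leaking pattern: every chunk appends an offset that is
    # never popped unless something like iter_chunks() consumes it.
    splits.append(len(chunk))

tracemalloc.start()
before = tracemalloc.take_snapshot()
for _ in range(100_000):
    feed(b"hello world\n", leaky_splits)
after = tracemalloc.take_snapshot()
tracemalloc.stop()

# Comparing snapshots points at the append site of the ever-growing list.
for stat in after.compare_to(before, "lineno")[:3]:
    print(stat)
```

Comparing two snapshots by line number is how a leak like streams:238 shows up: the allocation site with the largest positive size_diff is the growing structure.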

My proposed fix is to ensure the _http_chunk_splits list is properly cleaned up when the readline method is used to iterate over the stream line by line. Concretely, I suggest the following:

# streams.py
async def readline(self) -> bytes:
    if self._exception is not None:
        raise self._exception

    line = []
    line_size = 0
    not_enough = True

    while not_enough:
        while self._buffer and not_enough:
            offset = self._buffer_offset
            ichar = self._buffer[0].find(b"\n", offset) + 1
            # Read from current offset to found b'\n' or to the end.
            data = self._read_nowait_chunk(ichar - offset if ichar else -1)
            line.append(data)
            line_size += len(data)
            if ichar:
                not_enough = False

            if line_size > self._high_water:
                raise ValueError("Line is too long")

        if self._eof:
            break

        if not_enough:
            await self._wait("readline")

    # fixes memory leak: https://github.com/aio-libs/aiohttp/issues/3631
    self._http_chunk_splits = []

    return b"".join(line)

@socketpair (Contributor) commented Mar 1, 2019

@mvanderkroon could you please revert the quote and whitespace changes in your commit?

@mvanderkroon (Author) commented Mar 1, 2019

@socketpair sure thing - latest commit should be fine

@socketpair (Contributor) commented Mar 1, 2019

@mvanderkroon I wouldn't say so:

[screenshot of the diff omitted]

@mvanderkroon (Author) commented Mar 2, 2019

I think somehow my latest commit is not being picked up here: da7bcaa

If all else fails I'll simply redo the process; let me know.

@socketpair (Contributor) commented Mar 2, 2019

Well, please squash (fixup) the commits and open a PR.

@mvanderkroon referenced this issue Mar 3, 2019 (0 of 2 tasks complete)

@socketpair changed the title from "POST stream data memory leak" to "Chunked stream memory leak" Mar 3, 2019

@randydu referenced this issue Mar 5, 2019

socketpair added commits referencing this issue between Mar 3 and Mar 29, 2019
@odysseusmax commented Mar 30, 2019

I'm also experiencing a memory leak with chunked stream responses.

I'm downloading files over HTTP using aiohttp; my code looks something like this:

async with aiohttp.ClientSession() as session:
    async with session.get(url) as r:
        if r.status < 400:
            with open(filename, 'wb') as fd:
                async for chunk in r.content.iter_chunked(1024):
                    fd.write(chunk)

With concurrent downloads I get an out-of-memory error.

@socketpair (Contributor) commented Mar 30, 2019

@odysseusmax please open a separate issue, and please prepare a complete program that reproduces the bug. Every detail is important: the Python version, and especially a web server I can use to reproduce it.

@socketpair closed this Mar 30, 2019

@socketpair (Contributor) commented Mar 30, 2019

The fix has landed in both the master and 3.5 branches.
