Description
Long story short
I'm trying to stream data to an aiohttp server instance via POST (potentially hundreds of gigabytes). The data is typically not stored in files and is 'generated' on the fly in client processes. A multipart upload does not fit my needs, and I'm not using the data = await request.post() shorthand either (the docs are clear that it will OOM on large bodies).
I'm trying to use the underlying StreamReader (request._payload) to iterate over the stream line by line. In doing so, the aiohttp server consumes more and more memory until the application OOMs.
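For context, the shape of handler I'm after is roughly the sketch below. It relies on the public request.content attribute, which as far as I can tell exposes the same StreamReader as request._payload; the handler name and the byte counting are just placeholders for real per-line processing.

# handler sketch
from aiohttp import web

async def consume(request):
    total = 0
    # pull the body line by line straight off the StreamReader,
    # never materialising the full payload in memory
    async for line in request.content:
        total += len(line)  # stand-in for real per-line processing
    return web.json_response({"STATUS": "OK", "bytes_seen": total})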
Expected behaviour
Processing a stream of data in aiohttp server should not cause OOMs
Actual behaviour
aiohttp OOMs on large streams of data
Steps to reproduce
aiohttp server
# server.py
from aiohttp import web

async def process_stream(request):
    # iterate the raw payload stream line by line without buffering it whole
    async for line in request._payload:
        pass
    return web.json_response({"STATUS": "OK"})

app = web.Application()
app.router.add_post("/", process_stream)
web.run_app(app)

requests client
# client.py
import requests

def generate_data():
    # endless producer; requests sends a generator as a chunked request body
    while True:
        yield b"hello world\n"

r = requests.post("http://localhost:8080/", data=generate_data())

Additional info
I found a resource on asyncio and StreamReader/StreamWriter backpressure. I have done my best to read through the aiohttp source, and it looks like the fixes mentioned in that document are already in place, so I'm not sure why this is not working.
In fact, I'm not sure whether the memory increase is due to aiohttp (or an underlying library) holding references to objects in memory, or whether the producing process is simply pushing data into the queue faster than aiohttp can consume it (the latter would point to a backpressure problem).
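One way to narrow this down from the handler side is sketched below: count the bytes actually consumed and print peak RSS every so often. This is only a diagnostic sketch; the chunk size, log interval, and use of resource.getrusage are my own arbitrary choices, not anything aiohttp provides.

# diagnostic sketch
import resource
from aiohttp import web

async def process_stream(request):
    consumed = 0
    chunks = 0
    # read the body in fixed-size chunks instead of lines
    async for chunk in request.content.iter_chunked(64 * 1024):
        consumed += len(chunk)
        chunks += 1
        if chunks % 1000 == 0:
            # ru_maxrss is reported in KiB on Linux
            rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
            print("consumed=%d bytes, peak RSS=%d KiB" % (consumed, rss))
    return web.json_response({"STATUS": "OK", "consumed": consumed})

If peak RSS keeps climbing while consumed grows at a steady rate, the server side is buffering the body; if the client slows down instead and RSS stays flat, backpressure is working.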
Your environment
server
aiohttp 3.5.4
alpine 3.7.0