
StreamReader.iter_chunked(n) yields chunk_sizes < n before end of stream #7929

nathom opened this issue Dec 2, 2023 · 5 comments

@nathom

nathom commented Dec 2, 2023

Describe the bug

I was having compatibility issues when moving from requests to aiohttp in my application. The issue seems to be that aiohttp's iter_chunked(n) method may yield chunks smaller than n in the middle of the stream, unlike requests.Response.iter_content(n).

Not sure if this behavior is intended, but it was not indicated by the documentation or the source code.

To Reproduce

Run this when downloading a large file.

async for chunk in resp.content.iter_chunked(2048*3):
    print(len(chunk))
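For reference, a self-contained version of the reproduction might look roughly like this (the URL is a placeholder; any sufficiently large download shows the behavior):

import asyncio
import aiohttp

async def main():
    async with aiohttp.ClientSession() as session:
        # Placeholder URL: substitute any sufficiently large file.
        async with session.get("https://example.com/large-file.bin") as resp:
            async for chunk in resp.content.iter_chunked(2048 * 3):
                print(len(chunk))

asyncio.run(main())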

Expected behavior

All chunk sizes are 6144 except the last one.

Logs/tracebacks

Actual output (sample) from the "To Reproduce" code:


6144
6144
6144
6144
6144
6144
6144
6144
6144
6144
6144
6144
6144
6144
6144
6144
6144
6144
6144
6144
6144
6144
6144
110
6144
6144
6144
6144
6144
2048
6144
6144
6144
5904
6144
6144
6144
1928
6144
6144
4096
6144
6144
6144
5904
6144


Python Version

$ python --version
Python 3.10.0

aiohttp Version

$ python -m pip show aiohttp
Name: aiohttp
Version: 3.9.1
Summary: Async http client/server framework (asyncio)
...

multidict Version

$ python -m pip show multidict
Name: multidict
Version: 6.0.4

yarl Version

$ python -m pip show yarl
Name: yarl
Version: 1.9.3
Summary: Yet another URL library

OS

macOS 12

Related component

Client

Additional context

No response

Code of Conduct

  • I agree to follow the aio-libs Code of Conduct
nathom added the bug label Dec 2, 2023
@Dreamsorcerer
Member

> Not sure if this behavior is intended, but it was not indicated by the documentation or the source code.

It says 'with maximum size limit', which at least to me suggests that chunks may be smaller. But feel free to make a PR to make it clearer.

I read that as a deliberate design: yielding chunks of a fixed size would require holding back data that is already available, which could cause unnecessary delays.

@nathom
Author

nathom commented Dec 2, 2023

Got it, thanks! So in order to get precise chunks of size N, is it recommended to use readexactly(N)? That was several orders of magnitude slower for my use case, so I ended up writing to a tempfile and reading out exactly-sized chunks from there.

I think it's important to address this, since chunked reading is a widely used feature of requests and a lot of people expect exact analogues in aiohttp.

I can make a PR updating docs. Let me know if there's any other relevant information/context.
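For reference, wrapping readexactly() could look like the following sketch (iter_exact is a hypothetical helper, not part of aiohttp; readexactly() raises asyncio.IncompleteReadError at end of stream, with the leftover bytes in its .partial attribute):

import asyncio

async def iter_exact(content, n):
    # Yield blocks of exactly n bytes from an aiohttp StreamReader;
    # only the final block (whatever remains at end of stream) may be shorter.
    while True:
        try:
            yield await content.readexactly(n)
        except asyncio.IncompleteReadError as e:
            if e.partial:
                yield e.partial
            break

Usage would then be something like: async for block in iter_exact(resp.content, 6144): ...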

@Dreamsorcerer
Member

I guess readexactly() works, or just adding chunks together yourself, maybe something like:

for new_chunk in ...:
    chunk += new_chunk
    if len(chunk) >= n:
        process(chunk[:n])
        chunk = chunk[n:]

I guess it's possible we could add an option if there are some solid use cases for it, but I'm not clear what those are yet. I think the main use case for the chunk size is to limit the amount of memory the application uses: e.g. when downloading a large file and writing it to disk, a tiny embedded system may want to cap the memory used for this at 1KB or so, while a desktop application might be fine with a 10MB limit.
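Fleshed out, the re-buffering approach above might look something like this (iter_fixed is a hypothetical helper, not part of aiohttp):

async def iter_fixed(content, n):
    # Re-buffer iter_chunked() output into blocks of exactly n bytes;
    # only the final block may be shorter.
    buf = b""
    async for new_chunk in content.iter_chunked(n):
        buf += new_chunk
        while len(buf) >= n:
            yield buf[:n]
            buf = buf[n:]
    if buf:
        yield buf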

@nathom
Author

nathom commented Dec 6, 2023

My use case was Blowfish decryption, which was done on a chunk-by-chunk basis. This behavior was an issue because, for any chunk with length n < chunk_size,

# from iteration i   from iteration i+1
decrypt(chunk[:n]) + decrypt(chunk[n:])

is not the same as

# from iteration i
decrypt(chunk)

which is what I want. I don't know any other use cases off the top of my head, but I feel as though reading constant-size blocks from a stream should be common enough.

@Dreamsorcerer
Member

Dreamsorcerer commented Dec 6, 2023

This feels to me like an itertools job.
Something like batched(chain.from_iterable(resp.content.iter_chunked(n)), n) should do it.

Looks like there are async versions of itertools at https://asyncstdlib.readthedocs.io/en/stable/source/api/itertools.html and https://aioitertools.omnilib.dev/en/stable/api.html (the former doesn't look like it has batched() yet, and the latter appears to have named it chunked()).
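As a rough sketch of that idea (assuming aioitertools' chain.from_iterable and chunked mirror the synchronous itertools/more-itertools semantics), something like the following might work, though flattening the stream byte by byte is likely to be slow for large downloads:

from aioitertools import chain
from aioitertools.more_itertools import chunked

async def iter_batched(content, n):
    # chain.from_iterable flattens the incoming chunks into individual byte
    # values; chunked() regroups them into runs of exactly n items (only the
    # last run may be shorter); bytes() turns each run back into a bytes object.
    async for group in chunked(chain.from_iterable(content.iter_chunked(n)), n):
        yield bytes(group)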
