
StreamReader.iter_chunked(n) yields chunk_sizes < n before end of stream #7929

nathom opened this issue Dec 2, 2023 · 5 comments

@nathom

nathom commented Dec 2, 2023

Describe the bug

I was having compatibility issues when moving from requests to aiohttp in my application. The issue seems to be that aiohttp's iter_chunked(n) method may yield chunks smaller than n in the middle of the stream, unlike requests.Response.iter_content(n).

Not sure if this behavior is intended, but it was not indicated by the documentation or the source code.

To Reproduce

Run this when downloading a large file.

async for chunk in resp.content.iter_chunked(2048*3):
    print(len(chunk))
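For reference, a self-contained version of the reproduction might look roughly like this (the URL is a placeholder; any sufficiently large download shows the behavior):

import asyncio
import aiohttp

async def main():
    async with aiohttp.ClientSession() as session:
        # Placeholder URL: substitute any sufficiently large file.
        async with session.get("https://example.com/large-file.bin") as resp:
            async for chunk in resp.content.iter_chunked(2048 * 3):
                print(len(chunk))

asyncio.run(main())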

Expected behavior

All chunk sizes are 6144 except the last one.

Logs/tracebacks

Actual output (sample) from the "To Reproduce" code:


6144
6144
6144
6144
6144
6144
6144
6144
6144
6144
6144
6144
6144
6144
6144
6144
6144
6144
6144
6144
6144
6144
6144
110
6144
6144
6144
6144
6144
2048
6144
6144
6144
5904
6144
6144
6144
1928
6144
6144
4096
6144
6144
6144
5904
6144


Python Version

$ python --version
Python 3.10.0

aiohttp Version

$ python -m pip show aiohttp
Name: aiohttp
Version: 3.9.1
Summary: Async http client/server framework (asyncio)
...

multidict Version

$ python -m pip show multidict
Name: multidict
Version: 6.0.4

yarl Version

$ python -m pip show yarl
Name: yarl
Version: 1.9.3
Summary: Yet another URL library

OS

macOS 12

Related component

Client

Additional context

No response

Code of Conduct

  • I agree to follow the aio-libs Code of Conduct
nathom added the bug label Dec 2, 2023
@Dreamsorcerer
Member

> Not sure if this behavior is intended, but it was not indicated by the documentation or the source code.

It says 'with maximum size limit', which at least to me suggests that chunks may be smaller. But feel free to make a PR to make it clearer.

I read that as a deliberate design: yielding chunks of a fixed size would require holding back data that is already available, which could cause unnecessary delays.

@nathom
Author

nathom commented Dec 2, 2023

Got it, thanks! So in order to get precise chunks of size N, is it recommended to use readexactly(N)? That was several orders of magnitude slower for my use case, so I ended up writing to a tempfile and reading out exactly-sized chunks from there.

I think it's important to address this, since chunked reading is a widely used feature of requests and a lot of people expect exact analogues in aiohttp.

I can make a PR updating docs. Let me know if there's any other relevant information/context.
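For reference, wrapping readexactly() could look like the following sketch (iter_exact is a hypothetical helper, not part of aiohttp; readexactly() raises asyncio.IncompleteReadError at end of stream, with the leftover bytes in its .partial attribute):

import asyncio

async def iter_exact(content, n):
    # Yield blocks of exactly n bytes from an aiohttp StreamReader;
    # only the final block (whatever remains at end of stream) may be shorter.
    while True:
        try:
            yield await content.readexactly(n)
        except asyncio.IncompleteReadError as e:
            if e.partial:
                yield e.partial
            break

Usage would then be something like: async for block in iter_exact(resp.content, 6144): ...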

@Dreamsorcerer
Member

I guess readexactly() works, or just adding chunks together yourself, maybe something like:

for new_chunk in ...:
    chunk += new_chunk
    if len(chunk) >= n:
        process(chunk[:n])
        chunk = chunk[n:]

I guess it's possible we could add an option if there are some solid use cases for it, but I'm not clear what those are yet. I think the main use case for the chunk size is to limit the amount of memory the application uses: e.g. when downloading a large file and writing it to disk, a tiny embedded system may want to cap the memory used for this at 1KB or so, while a desktop application might be fine with a 10MB limit.
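Fleshed out, the re-buffering approach above might look something like this (iter_fixed is a hypothetical helper, not part of aiohttp):

async def iter_fixed(content, n):
    # Re-buffer iter_chunked() output into blocks of exactly n bytes;
    # only the final block may be shorter.
    buf = b""
    async for new_chunk in content.iter_chunked(n):
        buf += new_chunk
        while len(buf) >= n:
            yield buf[:n]
            buf = buf[n:]
    if buf:
        yield buf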

@nathom
Author

nathom commented Dec 6, 2023

My use case was Blowfish decryption, which was done on a chunk-by-chunk basis. This behavior was an issue because, for any chunk with length n < chunk_size,

# from iteration i   from iteration i+1
decrypt(chunk[:n]) + decrypt(chunk[n:])

is not the same as

# from iteration i
decrypt(chunk)

which is what I want. I don't know any other use cases off the top of my head, but I feel as though reading constant-size blocks from a stream should be common enough.

@Dreamsorcerer
Member

Dreamsorcerer commented Dec 6, 2023

This feels to me like an itertools job.
Something like batched(chain.from_iterable(resp.content.iter_chunked(n)), n) should do it.

Looks like there are async versions of itertools at https://asyncstdlib.readthedocs.io/en/stable/source/api/itertools.html and https://aioitertools.omnilib.dev/en/stable/api.html (the former doesn't look like it has batched() yet, and the latter appears to have named it chunked()).
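As a rough sketch of that idea (assuming aioitertools' chain.from_iterable and chunked mirror the synchronous itertools/more-itertools semantics), something like the following might work, though flattening the stream byte by byte is likely to be slow for large downloads:

from aioitertools import chain
from aioitertools.more_itertools import chunked

async def iter_batched(content, n):
    # chain.from_iterable flattens the incoming chunks into individual byte
    # values; chunked() regroups them into runs of exactly n items (only the
    # last run may be shorter); bytes() turns each run back into a bytes object.
    async for group in chunked(chain.from_iterable(content.iter_chunked(n)), n):
        yield bytes(group)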
