
Concurrent put_object_stream chunk uploads are still unstable #463

@robinfriedli

Describe the bug
This is a continuation of #346 and #404. Depending on the speed or behaviour of the provided reader, requests often retry or even fail. Since the current implementation no longer creates a request to upload a part before the corresponding chunk is loaded, this seems to be a different issue from before, where slow uploads caused certain parts to go without data for too long and disconnect. Rather, the issue could be caused by bursty, backpressure-sensitive producers and by rate limiting when multiple simultaneous uploads use 100 concurrent chunks each.

I run a server with two main use cases:

  • Client upload proxying: The server processes a client upload and streams it to S3. In my local testing this case usually works, but it still occasionally fails when processing multiple uploads at once. In production, the speed of the reader depends on the client and is unpredictable.
  • ffmpeg output streaming: Streaming ffmpeg output from a Unix FIFO / named pipe for HLS video transcoding. This case almost always fails for the larger video files I'm testing. ffmpeg writes multiple HLS segments that are uploaded simultaneously.

It seems to me that the main issue is simultaneous uploads with 100 concurrent chunks each running into rate limiting problems. Unfortunately, the errors I logged aren't very helpful.
First, retries for a couple of parts:

[WARN][2026-05-16 17:49:00][s3::request::tokio_backend] Retrying reqwest: error sending request for url (https://foo.s3.eu-central-2.amazonaws.com/26d35133-971a-47ab-99fc-d95107c6b783/stream_0.ts?partNumber=37&uploadId=...)
[WARN][2026-05-16 17:49:00][s3::request::tokio_backend] Retrying reqwest: error sending request for url (https://foo.s3.eu-central-2.amazonaws.com/26d35133-971a-47ab-99fc-d95107c6b783/stream_0.ts?partNumber=38&uploadId=...)
[WARN][2026-05-16 17:49:00][s3::request::tokio_backend] Retrying reqwest: error sending request for url (https://foo.s3.eu-central-2.amazonaws.com/26d35133-971a-47ab-99fc-d95107c6b783/stream_0.ts?partNumber=40&uploadId=...)
[WARN][2026-05-16 17:49:00][s3::request::tokio_backend] Retrying reqwest: error sending request for url (https://foo.s3.eu-central-2.amazonaws.com/26d35133-971a-47ab-99fc-d95107c6b783/stream_0.ts?partNumber=39&uploadId=...)
[WARN][2026-05-16 17:49:00][s3::request::tokio_backend] Retrying reqwest: error sending request for url (https://foo.s3.eu-central-2.amazonaws.com/26d35133-971a-47ab-99fc-d95107c6b783/stream_0.ts?partNumber=46&uploadId=...)
[WARN][2026-05-16 17:49:00][s3::request::tokio_backend] Retrying reqwest: error sending request for url (https://foo.s3.eu-central-2.amazonaws.com/26d35133-971a-47ab-99fc-d95107c6b783/stream_0.ts?partNumber=111&uploadId=...)
[WARN][2026-05-16 17:49:00][s3::request::tokio_backend] Retrying reqwest: error sending request for url (https://foo.s3.eu-central-2.amazonaws.com/26d35133-971a-47ab-99fc-d95107c6b783/stream_0.ts?partNumber=88&uploadId=...)

and then, eventually:

reqwest: error sending request for url (https://foo.s3.eu-central-2.amazonaws.com/26d35133-971a-47ab-99fc-d95107c6b783/stream_0.ts?partNumber=88&uploadId=...)

To Reproduce
Try uploading multiple large files with put_object_stream_with_content_type simultaneously on a system with enough memory to support several hundred concurrent chunks total.
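
To give a feel for what "several hundred concurrent chunks total" means, here is a rough back-of-the-envelope sketch. The function names are made up for illustration, and the 8 MiB chunk size is an assumption on my part rather than a value I verified against the library, so treat the numbers as an order-of-magnitude estimate only.

```rust
/// Number of part uploads that can be in flight at once across all uploads.
/// Hypothetical helper for illustration, not part of the library.
fn in_flight_parts(simultaneous_uploads: u64, chunks_per_upload: u64) -> u64 {
    simultaneous_uploads * chunks_per_upload
}

/// Upper bound on the bytes buffered in memory if every in-flight part
/// holds one fully loaded chunk.
fn buffered_bytes(simultaneous_uploads: u64, chunks_per_upload: u64, chunk_size_bytes: u64) -> u64 {
    in_flight_parts(simultaneous_uploads, chunks_per_upload) * chunk_size_bytes
}

fn main() {
    const MIB: u64 = 1024 * 1024;
    // Assumed chunk size, for illustration only.
    let chunk_size = 8 * MIB;

    for uploads in [1u64, 4, 8] {
        for chunks in [10u64, 100] {
            println!(
                "{} upload(s) x {} chunks: {} requests in flight, up to {} MiB buffered",
                uploads,
                chunks,
                in_flight_parts(uploads, chunks),
                buffered_bytes(uploads, chunks, chunk_size) / MIB
            );
        }
    }
}
```

Under these assumptions, a handful of simultaneous uploads at 100 chunks each already means hundreds of parallel requests against the same bucket, while the same uploads at 10 chunks each stay in the tens.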

Expected behavior
Ideally, the number of concurrent chunks should be configurable so that library users can tune it to their needs. My use cases do not benefit from aggressive concurrency anyway, because reading from the producer is relatively slow. Additionally, 100 concurrent chunks per upload seems excessive. I know the number of concurrent chunks was deliberately changed from 10 to 100 in commit d8fb703 (btw, the doc for the function still says 10), but in my testing, going from 10 to 100 chunks does not improve throughput or upload duration, even when uploading a local file and streaming it through the server. I tried uploading a local 4.7 GB video file: it took 174 seconds with 10 concurrent chunks, while 100 chunks were actually slightly slower at 183 seconds because they were more likely to trigger a retry for a part. In my opinion, 10 chunks was a more sensible default.
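
To illustrate the kind of knob I mean, here is a minimal sketch of a caller-configurable concurrency limit. It is not the library's API or its implementation: `upload_with_limit` and its `upload` callback are hypothetical, and it uses plain OS threads and a rendezvous channel instead of async tasks so that it stays self-contained. The point is only that the producer is not asked for the next chunk until one of `max_concurrent` workers is free.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc::sync_channel;
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

/// Hypothetical helper: uploads chunks with at most `max_concurrent` in flight.
/// Returns the sorted part numbers that completed and the peak concurrency seen.
fn upload_with_limit<I, F>(chunks: I, max_concurrent: usize, upload: F) -> (Vec<usize>, usize)
where
    I: Iterator<Item = Vec<u8>>,
    F: Fn(usize, &[u8]) + Send + Sync + 'static,
{
    assert!(max_concurrent > 0, "need at least one worker");

    // Capacity 0 makes this a rendezvous channel: `send` blocks until a
    // worker is ready, so the producer is only polled when there is room.
    let (tx, rx) = sync_channel::<(usize, Vec<u8>)>(0);
    let rx = Arc::new(Mutex::new(rx));
    let upload = Arc::new(upload);
    let active = Arc::new(AtomicUsize::new(0));
    let peak = Arc::new(AtomicUsize::new(0));
    let done = Arc::new(Mutex::new(Vec::<usize>::new()));

    let workers: Vec<_> = (0..max_concurrent)
        .map(|_| {
            let (rx, upload) = (Arc::clone(&rx), Arc::clone(&upload));
            let (active, peak, done) = (Arc::clone(&active), Arc::clone(&peak), Arc::clone(&done));
            thread::spawn(move || loop {
                // The lock is held only while waiting for the next chunk.
                let next = rx.lock().unwrap().recv();
                let Ok((part, data)) = next else { break };

                let now = active.fetch_add(1, Ordering::SeqCst) + 1;
                peak.fetch_max(now, Ordering::SeqCst);
                upload(part, &data);
                active.fetch_sub(1, Ordering::SeqCst);
                done.lock().unwrap().push(part);
            })
        })
        .collect();

    // Part numbers start at 1, as in S3 multipart uploads.
    for (index, chunk) in chunks.enumerate() {
        tx.send((index + 1, chunk)).unwrap();
    }
    drop(tx); // Closing the channel lets the workers exit their loops.

    for worker in workers {
        worker.join().unwrap();
    }

    let mut parts = std::mem::take(&mut *done.lock().unwrap());
    parts.sort_unstable();
    (parts, peak.load(Ordering::SeqCst))
}

fn main() {
    // Simulated producer and a fake upload that just sleeps briefly.
    let chunks = (0..20u8).map(|i| vec![i; 16]);
    let (parts, peak) = upload_with_limit(chunks, 3, |_part, _data| {
        thread::sleep(Duration::from_millis(2));
    });
    println!("uploaded {} parts, peak concurrency {}", parts.len(), peak);
}
```

With a limit like this exposed to the caller, a slow producer such as a FIFO fed by ffmpeg could run with a small value, while users with fast local readers could raise it.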

Environment

  • Rust version: 1.95.0
  • lib version: 0.37.2
