Contributor

@GarrettBeatty GarrettBeatty commented Nov 24, 2025

Previously, in #4130, we buffered every part, including the first one. That was suboptimal: the first part is always in order, so it can be streamed directly to the user. Similarly, when a part's response comes back and that part is the next expected one, it can be streamed straight to the user instead of being buffered.

This change streams the part directly to the user in those cases instead of buffering it in memory.
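
The ordering rule described above can be sketched roughly as follows. This is a simplified model, not the SDK code: `PartRouter`, `OnPartDownloaded`, and the log strings are hypothetical names made up for illustration, and the real handler works on response streams rather than byte arrays.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified model of the stream-vs-buffer decision.
// None of these types are the SDK's real classes.
public sealed class PartRouter
{
    private readonly SortedDictionary<int, byte[]> _buffered = new SortedDictionary<int, byte[]>();
    private readonly List<string> _log = new List<string>();

    // Parts are numbered from 1, so part 1 is always the first one expected.
    public int NextExpectedPartNumber { get; private set; } = 1;

    public IReadOnlyList<string> Log => _log;

    public void OnPartDownloaded(int partNumber, byte[] data)
    {
        if (partNumber == NextExpectedPartNumber)
        {
            // In order: hand the data straight to the reader, no buffer is held.
            _log.Add($"stream {partNumber}");
            NextExpectedPartNumber++;
            DrainBuffered();
        }
        else
        {
            // Out of order: hold the data until the earlier parts are delivered.
            _buffered[partNumber] = data;
            _log.Add($"buffer {partNumber}");
        }
    }

    // Deliver any buffered parts that have now become the next expected part.
    private void DrainBuffered()
    {
        while (_buffered.ContainsKey(NextExpectedPartNumber))
        {
            _log.Add($"drain {NextExpectedPartNumber}");
            _buffered.Remove(NextExpectedPartNumber);
            NextExpectedPartNumber++;
        }
    }
}
```

In this model, only parts that arrive ahead of their turn ever occupy a buffer; a download whose parts arrive strictly in order never buffers at all.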

Changes

  1. Add support for direct streaming
  2. Fix potential resource leaks in existing code. I discovered these issues while working on this PR and figured they were small enough to include here

Motivation and Context

#3806

Testing

  1. Unit tests validating that no additional array pools are created for this scenario and that the part is streamed to the user directly
  2. Existing integration tests pass

Re-ran performance tests and got similar results

Total bytes per run: 5,368,709,120

Run:1 Secs:2.661879 Gb/s:16.135098
Run:2 Secs:1.455994 Gb/s:29.498529
Run:3 Secs:1.204284 Gb/s:35.664076
Run:4 Secs:1.119768 Gb/s:38.355867
Run:5 Secs:1.057126 Gb/s:40.628725
Run:6 Secs:1.055039 Gb/s:40.709082
Run:7 Secs:1.060168 Gb/s:40.512150
Run:8 Secs:1.053934 Gb/s:40.751759
Run:9 Secs:1.062503 Gb/s:40.423092
Run:10 Secs:1.094352 Gb/s:39.246664

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist

  • My code follows the code style of this project
  • My change requires a change to the documentation
  • I have updated the documentation accordingly
  • I have read the README document
  • I have added tests to cover my changes
  • All new and existing tests passed

License

  • I confirm that this pull request can be released under the Apache 2 license

Contributor

Copilot AI left a comment

Pull request overview

This PR optimizes the S3 multipart download manager to reduce memory usage by streaming responses directly to the consumer when parts arrive in sequential order, while falling back to buffering for out-of-order parts. The optimization is transparent to callers and maintains backward compatibility.

Key Changes:

  • Introduced StreamingDataSource class to stream GetObjectResponse directly without buffering
  • Modified BufferedPartDataHandler to intelligently choose between streaming (in-order parts) and buffering (out-of-order parts)
  • Added IPartBufferManager.AddBufferAsync(IPartDataSource) overload to support both streaming and buffered data sources
  • Updated response disposal logic to handle ownership transfer for streaming vs immediate disposal for buffering

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| generator/.DevConfigs/9d07dc1e-d82d-4f94-8700-c7b57f872045.json | Dev config with patch version bump and changelog (contains spelling error) |
| sdk/src/Services/S3/Custom/Transfer/Internal/StreamingDataSource.cs | New class implementing IPartDataSource for direct streaming from GetObjectResponse without buffering |
| sdk/src/Services/S3/Custom/Transfer/Internal/BufferedPartDataHandler.cs | Enhanced to decide between streaming vs buffering based on part arrival order |
| sdk/src/Services/S3/Custom/Transfer/Internal/IPartBufferManager.cs | Added AddBufferAsync(IPartDataSource) method to support both streaming and buffered sources |
| sdk/src/Services/S3/Custom/Transfer/Internal/PartBufferManager.cs | Implemented new AddBufferAsync overload to handle IPartDataSource types |
| sdk/src/Services/S3/Custom/Transfer/Internal/MultipartDownloadManager.cs | Updated response disposal logic to account for ownership transfer in streaming path |
| sdk/test/Services/S3/UnitTests/Custom/StreamingDataSourceTests.cs | Comprehensive unit tests for new StreamingDataSource class (708 lines) |
| sdk/test/Services/S3/UnitTests/Custom/BufferedPartDataHandlerTests.cs | Updated tests to cover streaming vs buffering decision logic and mixed scenarios |
| sdk/test/Services/S3/UnitTests/Custom/PartBufferManagerTests.cs | Added integration tests for StreamingDataSource with PartBufferManager |

@GarrettBeatty GarrettBeatty force-pushed the gcbeatty/optimization branch 2 times, most recently from e75f10a to 6a42d97 Compare November 24, 2025 20:38
@GarrettBeatty GarrettBeatty changed the title from "Optimize part streaming" to "Optimize Part Stream for Multi part Download" Nov 24, 2025
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.

/// </list>
/// <para><strong>Response Ownership:</strong></para>
/// <para>
/// This method takes ownership of the response and is responsible for disposing it in ALL cases,
Contributor Author

The reason for this is that MultipartDownloadManager previously disposed of the response in its finally block. With StreamingDataSource added, keeping that dispose in MultipartDownloadManager would mean the response is already disposed by the time StreamingDataSource has to read it back to the user. StreamingDataSource therefore needs to keep ownership of the response.
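
A minimal sketch of that ownership hand-off, under stated assumptions: `FakeResponse`, `StreamingSource`, and `OwnershipSketch.Handle` are hypothetical stand-ins invented for this example, not the SDK's types. The point it illustrates is that the finally block disposes the response only when nothing else has taken it over.

```csharp
using System;

// Hypothetical stand-in for a response that holds a network stream.
public sealed class FakeResponse : IDisposable
{
    public bool Disposed { get; private set; }
    public void Dispose() => Disposed = true;
}

// Hypothetical streaming source: it owns the response until the reader is done.
public sealed class StreamingSource : IDisposable
{
    private readonly FakeResponse _response;
    public StreamingSource(FakeResponse response) => _response = response;
    public void Dispose() => _response.Dispose();
}

public static class OwnershipSketch
{
    // Returns a source when the part is streamed (the caller disposes it after
    // reading), or null when the part was buffered and the response is already
    // disposed.
    public static StreamingSource Handle(FakeResponse response, bool inOrder)
    {
        bool ownershipTransferred = false;
        try
        {
            if (inOrder)
            {
                var source = new StreamingSource(response);
                ownershipTransferred = true;
                return source;
            }

            // Buffered path: the bytes would be copied out here (omitted).
            return null;
        }
        finally
        {
            // Dispose only if nobody else has taken the response over.
            if (!ownershipTransferred)
            {
                response.Dispose();
            }
        }
    }
}
```

With this shape, the streamed response stays readable until the source itself is disposed, while the buffered path still releases the response immediately.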

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.

@GarrettBeatty GarrettBeatty force-pushed the gcbeatty/optimization branch 2 times, most recently from 12b92d1 to e6e17e3 Compare November 24, 2025 21:24
{
discoveryResult.InitialResponse.WriteObjectProgressEvent -= wrappedCallback;
// Always detach the event handler to prevent memory leak
// This runs whether ProcessPartAsync succeeds or throws
Contributor Author

This is not related to the change itself; it is a potential issue I found along the way, so I am fixing it here.
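
For context, the detach-in-finally pattern referred to in the comments above looks roughly like this. It is a sketch only: `ProgressSource` and `DetachSketch.Process` are hypothetical names, and the real code unsubscribes from `WriteObjectProgressEvent` around `ProcessPartAsync`.

```csharp
using System;

// Hypothetical event source standing in for a response that reports progress.
public sealed class ProgressSource
{
    public event EventHandler<int> Progress;

    // Number of handlers currently attached (0 when none).
    public int SubscriberCount => Progress?.GetInvocationList().Length ?? 0;

    public void Report(int bytes) => Progress?.Invoke(this, bytes);
}

public static class DetachSketch
{
    public static void Process(ProgressSource source, EventHandler<int> callback, Action work)
    {
        source.Progress += callback;
        try
        {
            work();
        }
        finally
        {
            // Runs whether work() succeeds or throws, so the handler never lingers.
            source.Progress -= callback;
        }
    }
}
```

Because the unsubscribe sits in the finally block, a failure inside the work delegate cannot leave the callback attached to the source.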

// This is critical because GetObjectResponse holds unmanaged resources that
// won't be cleaned up by GC - must be explicitly disposed to return HTTP
// connection to the pool and close network streams
_discoveryResult?.InitialResponse?.Dispose();
Contributor Author

Unrelated to optimizing streaming for multipart download, but I discovered this potential resource leak while working on it.

@GarrettBeatty GarrettBeatty marked this pull request as ready for review November 24, 2025 21:34
@GarrettBeatty GarrettBeatty force-pushed the gcbeatty/directoryresponse branch from a522b17 to 135eacc Compare November 25, 2025 14:56
{
Logger.DebugFormat("BufferedPartDataHandler: [Part {0}] Starting to buffer part from response stream - ContentLength={1}",
partNumber, response.ContentLength);
if (partNumber == _partBufferManager.NextExpectedPartNumber)
Contributor Author

It felt a bit odd to put this logic here, but I think it is fine for now.

"serviceName": "S3",
"type": "patch",
"changeLogMessages": [
"Optimized multipart download manager to stream responses directly where applicable."
Contributor

Isn't this an optimization for a feature we haven't released yet? Maybe we don't need a changelog entry in this case?

Contributor Author

Yeah, you are right. I can remove this dev config.

Contributor Author

Removed it.

@GarrettBeatty GarrettBeatty force-pushed the gcbeatty/directoryresponse branch from 135eacc to 105d5c6 Compare December 1, 2025 22:05
Base automatically changed from gcbeatty/directoryresponse to feature/transfermanager December 1, 2025 22:07
remove dev config
@GarrettBeatty GarrettBeatty merged commit dd150cb into feature/transfermanager Dec 1, 2025
1 check passed
@GarrettBeatty GarrettBeatty deleted the gcbeatty/optimization branch December 1, 2025 22:33
3 participants