Support multipart upload of S3, OSS, COS and OBS #17447
Thanks for the improvement; it could benefit the write case a lot! Which user interface are you using to trigger the write (Alluxio, FUSE, S3)?
Set
Can you explain the difference between the previously implemented streaming upload and the current implementation of multipart upload, as well as why there is a performance improvement? To me, they seem to be the same.
The referenced document about S3 MD5 says that the S3 server calculates the MD5 of the uploaded parts after they are uploaded; it does not say that the client calculates the MD5 before uploading.
@zhezhidashi Thanks for your reply.
Thank you! I will rethink these problems.
LGTM
alluxio-bot, merge this please
merge failed:
a99b796 to e23f3cc
alluxio-bot, merge this please
What changes are proposed in this pull request?
Support multipart upload of S3, OSS, COS and OBS.
Why are the changes needed?
partition size, and use Netty zero-copy technology for splicing.
Some experiments were conducted on a Mac laptop and an AWS instance. Simple upload (the default) and streaming upload are Alluxio's two original upload methods.
File Size: 4.8GB
MinIO (Mac Laptop):
Simple Upload: 28 seconds
Streaming Upload: 20 seconds
Multipart Upload: 12 seconds
AWS same region (r6a.xlarge):
Simple Upload: 25 seconds
Streaming Upload: 18 seconds
Multipart Upload: 12 seconds
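In an environment with sufficient bandwidth (or an intranet environment), the speedup is clear: in the runs above, multipart upload was roughly 1.5x to 1.7x faster than streaming upload and more than 2x faster than simple upload.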
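The partition size mentioned above determines how many parts a file is split into. The following is a minimal sketch of that splitting step only, not the code from this pull request: the class `MultipartPlanSketch` and its `plan` method are hypothetical names, and the sketch assumes fixed-size parts with a shorter final part.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: a hypothetical helper, not Alluxio's actual implementation. */
class MultipartPlanSketch {
  /** Returns {offset, length} pairs that cover [0, fileSize) in chunks of partSize bytes. */
  static List<long[]> plan(long fileSize, long partSize) {
    if (fileSize < 0 || partSize <= 0) {
      throw new IllegalArgumentException("fileSize must be >= 0 and partSize must be > 0");
    }
    List<long[]> parts = new ArrayList<>();
    for (long offset = 0; offset < fileSize; offset += partSize) {
      // The last part may be shorter than partSize.
      long length = Math.min(partSize, fileSize - offset);
      parts.add(new long[] {offset, length});
    }
    return parts;
  }

  public static void main(String[] args) {
    long mb = 1024L * 1024L;
    // Assumed example sizes: a 200MB file with the 64MB default partition size.
    List<long[]> parts = plan(200 * mb, 64 * mb);
    for (int i = 0; i < parts.size(); i++) {
      System.out.printf("part %d: offset=%d length=%d%n",
          i + 1, parts.get(i)[0], parts.get(i)[1]);
    }
  }
}
```

Each {offset, length} pair could then be uploaded as an independent part, which is what allows the parts to be sent concurrently.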
Does this PR introduce any user facing changes?
alluxio.underfs.object.store.multipart.upload.timeout: Timeout for uploading a part when using multipart upload.

S3:
alluxio.underfs.s3.multipart.upload.enabled: Whether to enable multipart upload for S3. If it is true, multipart upload for S3 is enabled. The default value is false.
alluxio.underfs.s3.multipart.upload.partition.size: Multipart upload partition size for S3. The default partition size is 64MB.

OSS:
alluxio.underfs.oss.multipart.upload.enabled: Whether to enable multipart upload for OSS.
alluxio.underfs.oss.multipart.upload.threads: Thread pool size for OSS multipart upload.
alluxio.underfs.oss.multipart.upload.partition.size: Multipart upload partition size for OSS. The default partition size is 64MB.

COS:
alluxio.underfs.cos.multipart.upload.enabled: Whether to enable multipart upload for COS.
alluxio.underfs.cos.multipart.upload.threads: Thread pool size for COS multipart upload.
alluxio.underfs.cos.multipart.upload.partition.size: Multipart upload partition size for COS. The default partition size is 64MB.

OBS:
alluxio.underfs.obs.multipart.upload.enabled: Whether to enable multipart upload for OBS.
alluxio.underfs.obs.multipart.upload.threads: Thread pool size for OBS multipart upload.
alluxio.underfs.obs.multipart.upload.partition.size: Multipart upload partition size for OBS. The default partition size is 64MB.
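
As an illustration of how these properties might be set, here is a hedged fragment for conf/alluxio-site.properties that turns on multipart upload for S3. The property names come from the list above; the partition size simply repeats the stated default, and no other properties are assumed.

```properties
# Illustrative fragment only; adjust values for your deployment.
# Multipart upload for S3 is disabled by default, so enable it explicitly.
alluxio.underfs.s3.multipart.upload.enabled=true
# Stated default partition size, shown here for clarity.
alluxio.underfs.s3.multipart.upload.partition.size=64MB
```

The OSS, COS and OBS properties follow the same naming pattern with the corresponding prefix.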