Implement parallelization to speed up S3 artifacts upload and download #12442

Open
nitays opened this issue Jan 2, 2024 · 2 comments
Labels
area/artifacts S3/GCP/OSS/Git/HDFS etc type/feature Feature request

Comments

nitays commented Jan 2, 2024

Summary

Implement a client that supports parallelism, such as s5cmd, to enable much faster S3 artifact I/O, which is crucial when working with many and/or large files.

Use Cases

When working with many and/or large artifacts via an S3 artifact repository in a Workflow, uploading and downloading artifacts at the beginning of every step can take longer than the step's own processing time.
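To make the request concrete, here is a minimal sketch of the kind of bounded parallelism being asked for: a fixed pool of goroutines pulling object keys from a channel. It is not taken from Argo Workflows or s5cmd. `transferAll`, `transferFunc`, and `errSimulated` are hypothetical names, and the per-object transfer is simulated with a sleep rather than a real S3 call.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// errSimulated is a hypothetical sentinel used only by the demo below.
var errSimulated = errors.New("simulated transfer failure")

// transferFunc is a hypothetical stand-in for one S3 upload or download.
type transferFunc func(key string) error

// transferAll runs fn for every key using at most `workers` goroutines
// and returns the failures it collected, keyed by object key.
func transferAll(keys []string, workers int, fn transferFunc) map[string]error {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan string)
	errs := make(map[string]error)
	var mu sync.Mutex
	var wg sync.WaitGroup

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Each worker drains the shared queue until it is closed.
			for key := range jobs {
				if err := fn(key); err != nil {
					mu.Lock()
					errs[key] = err
					mu.Unlock()
				}
			}
		}()
	}

	for _, k := range keys {
		jobs <- k
	}
	close(jobs)
	wg.Wait()
	return errs
}

func main() {
	keys := []string{"a.bin", "b.bin", "c.bin", "d.bin"}
	// Simulated transfer: every object takes 50ms, and one of them fails.
	simulated := func(key string) error {
		time.Sleep(50 * time.Millisecond)
		if key == "c.bin" {
			return errSimulated
		}
		return nil
	}

	start := time.Now()
	errs := transferAll(keys, 4, simulated)
	fmt.Printf("failed=%d elapsed=%v\n", len(errs), time.Since(start))
}
```

With four workers the four simulated transfers overlap instead of running back to back, which is the effect the proposal is after for steps with many artifacts. A real implementation would also need cancellation and retry handling, which this sketch leaves out.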


Message from the maintainers:

Love this enhancement proposal? Give it a 👍. We prioritise the proposals with the most 👍.

@nitays nitays added the type/feature Feature request label Jan 2, 2024
tooptoop4 (Contributor) commented

One caution is that a lot of these libs don't handle files larger than 5 GB, e.g. peak/s5cmd#29

@agilgur5 agilgur5 added the area/artifacts S3/GCP/OSS/Git/HDFS etc label Jan 5, 2024
pranavarnav2 commented Jan 8, 2024

> One caution is that a lot of these libs don't handle files larger than 5 GB, e.g. peak/s5cmd#29

That is only when copying files from bucket to bucket, which I don't think is a use case with Argo Workflows artifacts. I've just copied a 10GB file from my local computer to an S3 bucket with s5cmd with no problem.
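For context on why uploads from local disk are not bound by that limit: as far as I know, S3 caps a single PUT or single-operation copy at 5 GB, while a multipart upload splits the object into parts of 5 MiB to 5 GiB each (the last part may be smaller), up to 10,000 parts. The sketch below only does the part arithmetic under those assumed limits. `splitIntoParts` and `byteRange` are hypothetical names, and no S3 calls are made.

```go
package main

import "fmt"

const (
	miB = int64(1) << 20
	giB = int64(1) << 30

	// Assumed S3 multipart limits; check the AWS documentation before relying on them.
	minPartSize = 5 * miB
	maxParts    = 10000
)

// byteRange is a half-open range [Start, End) within the object.
type byteRange struct {
	Start, End int64
}

// splitIntoParts divides an object of `size` bytes into consecutive parts of
// `partSize` bytes; only the final part may be shorter.
func splitIntoParts(size, partSize int64) ([]byteRange, error) {
	if size <= 0 {
		return nil, fmt.Errorf("size must be positive, got %d", size)
	}
	if partSize < minPartSize {
		return nil, fmt.Errorf("part size %d is below the assumed minimum %d", partSize, minPartSize)
	}
	n := (size + partSize - 1) / partSize // ceiling division
	if n > maxParts {
		return nil, fmt.Errorf("%d parts exceeds the assumed maximum %d", n, maxParts)
	}
	parts := make([]byteRange, 0, n)
	for start := int64(0); start < size; start += partSize {
		end := start + partSize
		if end > size {
			end = size
		}
		parts = append(parts, byteRange{Start: start, End: end})
	}
	return parts, nil
}

func main() {
	// A 10 GiB object with 64 MiB parts.
	parts, err := splitIntoParts(10*giB, 64*miB)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("parts:", len(parts)) // prints "parts: 160"
}
```

Each part can be uploaded independently, so the same splitting that gets past the single-request size limit is also what lets a client parallelise the transfer of one large file.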
