Datacomp dataset download #3

Closed
mactavish91 opened this issue Apr 6, 2024 · 2 comments

Comments

@mactavish91

Hello,

I am attempting to download the complete datacomp dataset but have encountered significant difficulties. As of now, more than a third of the links I’ve attempted to access are invalid. I’m curious to know how others have successfully downloaded the entire dataset. Could you share the method you used? I greatly appreciate any guidance and look forward to your response.

@Beckschen
Owner

Hello, and thank you for expressing your interest!

I share your concerns. We downloaded the dataset around May 2023, with a download success rate of approximately 95% to 97%. I am not sure what the success rate is now.
The Datacomp-1B dataset card can also be found here: https://huggingface.co/datasets/mlfoundations/datacomp_1b

Best
Jieneng

@Beckschen
Owner

Here are some similar issues for your reference:

mlfoundations/datacomp#68

mlfoundations/datacomp#39

mlfoundations/datacomp#41
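
For anyone who wants to compare their own run against the 95% to 97% figure mentioned above, here is a minimal sketch of how an overall download success rate could be computed from per-shard counts. The record layout (the `attempted` and `succeeded` keys) and the function name are hypothetical, chosen only for illustration; they are not taken from the DataComp tooling, so adapt them to whatever statistics your downloader actually writes.

```python
from typing import Iterable, Mapping


def download_success_rate(shard_stats: Iterable[Mapping[str, int]]) -> float:
    """Return the fraction of attempted downloads that succeeded.

    Each record is assumed (hypothetically) to carry two integer fields:
    'attempted' (URLs tried) and 'succeeded' (samples actually saved).
    """
    attempted = 0
    succeeded = 0
    for stats in shard_stats:
        attempted += stats["attempted"]
        succeeded += stats["succeeded"]
    if attempted == 0:
        # Nothing was attempted, so report 0.0 instead of dividing by zero.
        return 0.0
    return succeeded / attempted
```

Summing the counts across shards before dividing weights every URL equally, which matters when shards differ in size; averaging per-shard rates would give a slightly different number.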
