Add functionality to load/save distisets to/from disk #673
…to distiset-to-disk
@rasdani did you have time to try saving/loading to s3 with this branch?
yes, saving and loading with S3 works so far! :) However one needs to keep in mind, to append EDIT: hold on, I just realised, I tested with
…n calling Distiset.load_from_disk
LGTM!
…to distiset-to-disk
Co-authored-by: Gabriel Martín Blázquez <gmartinbdev@gmail.com>
Description
This PR includes two new methods on `Distiset` to save the content to disk (or load from it), and also to a remote storage based on the content of the `storage_options` passed, following the implementation of Hugging Face's datasets: https://huggingface.co/docs/datasets/filesystems#saving-serialized-datasets
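To illustrate the round trip described above, here is a minimal, standard-library-only sketch. It is not the code from this PR: `ToyDistiset`, the `distiset_path` parameter name, the `data.jsonl` file name and the one-subdirectory-per-subset layout are all assumptions made for the example. Only the method names mirror the `save_to_disk` / `load_from_disk` pair discussed here, and remote storage via `storage_options` is deliberately left out.

```python
import json
import tempfile
from pathlib import Path


class ToyDistiset(dict):
    """Hypothetical stand-in: maps a subset name to a list of row dicts."""

    def save_to_disk(self, distiset_path):
        # Assumed layout: one subdirectory per subset, one JSON Lines file inside.
        root = Path(distiset_path)
        for name, rows in self.items():
            subset_dir = root / name
            subset_dir.mkdir(parents=True, exist_ok=True)
            with open(subset_dir / "data.jsonl", "w", encoding="utf-8") as f:
                for row in rows:
                    f.write(json.dumps(row) + "\n")

    @classmethod
    def load_from_disk(cls, distiset_path):
        # Rebuild the mapping from whatever subset directories are found.
        root = Path(distiset_path)
        loaded = cls()
        for subset_dir in sorted(p for p in root.iterdir() if p.is_dir()):
            with open(subset_dir / "data.jsonl", encoding="utf-8") as f:
                loaded[subset_dir.name] = [
                    json.loads(line) for line in f if line.strip()
                ]
        return loaded


def round_trip(distiset):
    """Save to a temporary directory and load the result back."""
    with tempfile.TemporaryDirectory() as tmp:
        distiset.save_to_disk(tmp)
        return ToyDistiset.load_from_disk(tmp)


original = ToyDistiset(default=[{"instruction": "hi", "generation": "hello"}])
restored = round_trip(original)
```

In this sketch the load is a classmethod, so a caller needs only the path and no existing instance. A remote variant would additionally have to forward something like `storage_options` to the filesystem layer, which this local-only example does not attempt.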