Add new serialization strategy #288
…feat/serializer
from distilabel.utils.imports import _ARGILLA_AVAILABLE
from distilabel.utils.serialization import load_task_from_disk
Shouldn't we be able to replace load_task_from_disk too?
What do you mean exactly?
We call self.task.save, and I thought we might also add load_task_from_disk in that load_from_disk function, but now I see it is inherited from Dataset, so that does not make sense.
Yes, those two are different: one is for the dataset, which we don't really handle ourselves, and the other is for the task, which is saved as a JSON file.
Still need to review some private variables from the dump
Description
This PR adds a new module, distilabel.utils.serialization, to allow custom serialization of tasks (to be extended to the other distilabel components in the future). When we save to or load from disk, we will have a task-distilabel.json file instead of a task.pkl (.pickle) file, making it safer to move the files around.

Sample task from UltraFeedbackTask:

Closes #261.
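For readers who want a feel for the approach, the JSON round trip described above could be sketched roughly as follows. This is only an illustration and not the code in distilabel.utils.serialization: SketchTask, its two fields, the "__type__" key, TASK_REGISTRY and load_task are all hypothetical names. Only the task-distilabel.json file name comes from the description.

```python
import json
import tempfile
from pathlib import Path

# File name taken from the PR description; everything else here is hypothetical.
TASK_FILE_NAME = "task-distilabel.json"


class SketchTask:
    """Hypothetical stand-in for a task; not the real distilabel class."""

    def __init__(self, system_prompt: str, task_description: str) -> None:
        self.system_prompt = system_prompt
        self.task_description = task_description

    def dump(self) -> dict:
        # Store plain JSON-friendly values plus the class name needed to rebuild.
        return {
            "__type__": type(self).__name__,
            "system_prompt": self.system_prompt,
            "task_description": self.task_description,
        }

    def save(self, directory) -> Path:
        # Write the dumped dict as human-readable JSON inside `directory`.
        path = Path(directory) / TASK_FILE_NAME
        path.write_text(json.dumps(self.dump(), indent=2))
        return path


# Assumption: an explicit allow-list of loadable classes. Unlike unpickling,
# loading then never executes code chosen by the file's contents.
TASK_REGISTRY = {"SketchTask": SketchTask}


def load_task(directory):
    # Read the JSON file back and rebuild the task from the registered class.
    data = json.loads((Path(directory) / TASK_FILE_NAME).read_text())
    cls = TASK_REGISTRY[data.pop("__type__")]
    return cls(**data)


# Round trip through a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    SketchTask("You are a helpful assistant.", "Rate the answers.").save(tmp)
    restored = load_task(tmp)
```

A JSON file like this can be inspected and edited by hand, and loading it only instantiates classes that were registered explicitly, which is one reason such a file is safer to move around than a pickle.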