Add functionality to push tasks to the HuggingFace hub and download them automatically. #297

Merged 24 commits on Jan 30, 2024

plaguss
Contributor

@plaguss plaguss commented Jan 24, 2024

Description

This PR adds the functionality to push tasks to the hub and download them automatically if available. It will simplify working with CustomDataset and keep track of the tasks that come with them out of the box.

An example can be seen here.

Closes #290.

@plaguss plaguss self-assigned this Jan 24, 2024
Member

@davidberenstein1957 davidberenstein1957 left a comment

@plaguss looking good! I added some small comments.
Shouldn't we also test whether we can actually push to and pull from the hub?

Also, I think the classes for Test*TaskSerialization can be generalized a bit w.r.t. having the same base class and different init args.
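
One way to read this suggestion, as a rough sketch only: a shared base class holds the round-trip test, and each concrete test class overrides nothing but the init args. `DummyTask`, `BaseTaskSerialization`, the `init_kwargs` attribute, and the `task.json` file name are hypothetical stand-ins, not the classes in `tests/tasks/test_serialization.py`.

```python
import json
import os
import tempfile


class DummyTask:
    """Hypothetical stand-in for a task that can be saved to and loaded from JSON."""

    def __init__(self, system_prompt="You are a helpful assistant.", **extra):
        self.system_prompt = system_prompt
        self.extra = extra

    def to_dict(self):
        return {"system_prompt": self.system_prompt, **self.extra}

    def save(self, path):
        with open(path, "w", encoding="utf-8") as f:
            json.dump(self.to_dict(), f)

    @classmethod
    def load(cls, path):
        with open(path, encoding="utf-8") as f:
            return cls(**json.load(f))


class BaseTaskSerialization:
    """Shared test logic; not collected by pytest because the name lacks the Test prefix."""

    task_class = DummyTask
    init_kwargs = {}

    def test_round_trip(self):
        task = self.task_class(**self.init_kwargs)
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "task.json")
            task.save(path)
            loaded = self.task_class.load(path)
        assert loaded.to_dict() == task.to_dict()


class TestDefaultTaskSerialization(BaseTaskSerialization):
    init_kwargs = {}


class TestCustomPromptTaskSerialization(BaseTaskSerialization):
    init_kwargs = {"system_prompt": "Be concise.", "temperature": 0.2}
```

With this layout, covering another task would only need a new subclass that sets `task_class` and `init_kwargs`.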

src/distilabel/dataset.py (outdated)
path_in_repo=TASK_FILE_NAME,
repo_id=repo_id,
repo_type="dataset",
token=kwargs.get("token"),

If this is None, will it default to the HF token internally?

Contributor Author

The HfApi().upload_file function should take care of that on its own. This is the same implementation we have in argilla to update the dataset card here.
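
For illustration, a minimal sketch of the call under discussion, assuming the task is serialized to JSON before upload. `push_task_file` and the `"task.json"` value given to `TASK_FILE_NAME` are placeholders (the thread does not show the real value); the keyword arguments mirror the reviewed snippet and the `upload_file` parameters in `huggingface_hub`. The `api` argument is expected to be an `HfApi` instance, and when `token` is `None` the client is expected to fall back to the locally saved token.

```python
import json
from typing import Any, Optional

# Placeholder value: the PR uses a constant named TASK_FILE_NAME, but its
# actual value is not shown in this thread.
TASK_FILE_NAME = "task.json"


def push_task_file(
    api: Any, repo_id: str, task_config: dict, token: Optional[str] = None
) -> None:
    """Upload a serialized task to a dataset repo (hypothetical helper).

    `api` is expected to be a `huggingface_hub.HfApi` instance. Passing
    token=None leaves token resolution to the client.
    """
    payload = json.dumps(task_config, indent=2).encode("utf-8")
    api.upload_file(
        path_or_fileobj=payload,  # raw bytes are accepted as file content
        path_in_repo=TASK_FILE_NAME,
        repo_id=repo_id,
        repo_type="dataset",
        token=token,
    )
```

A call would look like `push_task_file(HfApi(), "<user>/<dataset>", task_config)`, with `HfApi` imported from `huggingface_hub` and `kwargs.get("token")` passed as `token` when the caller supplies one.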

src/distilabel/dataset.py (outdated)
src/distilabel/dataset.py (outdated)
tests/tasks/test_serialization.py (outdated)
@plaguss plaguss merged commit 6bf3ee8 into main Jan 30, 2024
4 checks passed
@plaguss plaguss deleted the feat/tasks-to-hub branch January 30, 2024 15:26
Successfully merging this pull request may close these issues.

[FEATURE] Include the tasks when pushing the CustomDataset to HuggingFace hub
2 participants