[Feedback welcome] CLI to upload arbitrary huge folder #2254
Co-authored-by: Lysandre Debut <hi@lysand.re>
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Feedback so far:
EDIT:
IMO, it would make sense for this not to default to uploading as a model repo, i.e. require this:
huggingface-cli large-upload <repo-id> <local-path> --repo-type dataset
If a user runs:
they should get an error along the lines of "Please specify the repo type you want to use". Quite a few people using this tool have accidentally uploaded a dataset to a model repo, and currently it's not easy to move this to a dataset repo. I know that many of the
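To make the suggestion above concrete, here is a minimal sketch of a parser that refuses to run without an explicit repo type. It is a hypothetical stand-in built with `argparse`, not the actual `huggingface-cli` implementation; the function names `build_parser` and `parse_or_none` are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in parser; NOT the real huggingface-cli code.
    parser = argparse.ArgumentParser(prog="large-upload")
    parser.add_argument("repo_id")
    parser.add_argument("local_path")
    parser.add_argument(
        "--repo-type",
        choices=["model", "dataset", "space"],
        required=True,  # no silent default to "model"
        help="Please specify the repo type you want to use.",
    )
    return parser


def parse_or_none(argv):
    """Return parsed args, or None when argparse rejects the command line."""
    try:
        return build_parser().parse_args(argv)
    except SystemExit:
        # argparse has already printed the usage and error message to stderr.
        return None


if __name__ == "__main__":
    print(parse_or_none(["user/repo", "./data"]))
    print(parse_or_none(["user/repo", "./data", "--repo-type", "dataset"]))
```

With `required=True`, omitting `--repo-type` stops the command before anything is uploaded, which is the behavior requested in the comment.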
ah i rather agree with @davanstrien here
Can the parameters of "large-upload" be aligned with those of "upload"?
@wanng-ide Agreed, we should aim for consistency. Which parameters/options would you specifically change? So far we have:

$ huggingface-cli large-upload --help
usage: huggingface-cli <command> [<args>] large-upload [-h] [--repo-type {model,dataset,space}]
                                                       [--revision REVISION] [--private]
                                                       [--include [INCLUDE ...]] [--exclude [EXCLUDE ...]]
                                                       [--token TOKEN] [--num-workers NUM_WORKERS]
                                                       repo_id local_path

$ huggingface-cli upload --help
usage: huggingface-cli <command> [<args>] upload [-h] [--repo-type {model,dataset,space}]
                                                 [--revision REVISION] [--private] [--include [INCLUDE ...]]
                                                 [--exclude [EXCLUDE ...]] [--delete [DELETE ...]]
                                                 [--commit-message COMMIT_MESSAGE]
                                                 [--commit-description COMMIT_DESCRIPTION] [--create-pr]
                                                 [--every EVERY] [--token TOKEN] [--quiet]
                                                 repo_id [local_path] [path_in_repo]
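To see exactly where the two commands diverge, the long options can be extracted from the two usage strings quoted above and compared as sets. This is only an illustrative sketch: the usage text is copied from the help output in this thread, and the helper `option_names` is a hypothetical name.

```python
import re

# Usage strings copied from the two `--help` outputs quoted in this thread.
LARGE_UPLOAD_USAGE = """
usage: huggingface-cli <command> [<args>] large-upload [-h] [--repo-type {model,dataset,space}]
    [--revision REVISION] [--private]
    [--include [INCLUDE ...]] [--exclude [EXCLUDE ...]]
    [--token TOKEN] [--num-workers NUM_WORKERS]
    repo_id local_path
"""

UPLOAD_USAGE = """
usage: huggingface-cli <command> [<args>] upload [-h] [--repo-type {model,dataset,space}]
    [--revision REVISION] [--private] [--include [INCLUDE ...]]
    [--exclude [EXCLUDE ...]] [--delete [DELETE ...]]
    [--commit-message COMMIT_MESSAGE]
    [--commit-description COMMIT_DESCRIPTION] [--create-pr]
    [--every EVERY] [--token TOKEN] [--quiet]
    repo_id [local_path] [path_in_repo]
"""


def option_names(usage):
    """Return the set of long options (``--name``) mentioned in a usage string."""
    return set(re.findall(r"--[a-z][a-z-]*", usage))


only_in_upload = sorted(option_names(UPLOAD_USAGE) - option_names(LARGE_UPLOAD_USAGE))
only_in_large_upload = sorted(option_names(LARGE_UPLOAD_USAGE) - option_names(UPLOAD_USAGE))

print("only in upload:", only_in_upload)
print("only in large-upload:", only_in_large_upload)
```

Besides the options, the positional arguments also differ: `upload` accepts an optional `local_path` and `path_in_repo`, while `large-upload` requires `local_path` and has no `path_in_repo`.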
what about: huggingface-cli large-upload [local_path] [path_in_repo]
I'm not sure I understand the purpose of the
What for?
Upload arbitrarily large folders in a single command line!
How to use it?
Install
Upload folder
Every minute a report is printed to the terminal with the current status. Apart from that, progress bars and errors are still displayed.
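As an illustration of the periodic status report described above, here is a minimal sketch of a background reporter. It is not taken from this PR: the class name `PeriodicReporter` and its parameters are assumptions, and the real implementation may work differently.

```python
import threading
import time


class PeriodicReporter:
    """Call emit(get_status()) every `interval` seconds in a background thread."""

    def __init__(self, get_status, interval=60.0, emit=print):
        self._get_status = get_status
        self._interval = interval
        self._emit = emit
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        # Event.wait returns False on timeout and True once stop() has been called.
        while not self._stop.wait(self._interval):
            self._emit(self._get_status())

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()


if __name__ == "__main__":
    progress = {"files": 0}
    reporter = PeriodicReporter(lambda: f"{progress['files']} files done", interval=0.1)
    reporter.start()
    for _ in range(5):
        time.sleep(0.1)  # stand-in for real upload work
        progress["files"] += 1
    reporter.stop()
```

Running the report in its own thread keeps it independent of the workers, so progress bars and errors from the upload itself can still be displayed in between.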
Run huggingface-cli large-upload --help to see all options.
What does it do?
This CLI is intended to upload arbitrarily large folders in a single command:
A .huggingface/ folder will be created at the root of your folder to keep track of the progress. Please do not modify these files manually. If you feel this folder got corrupted, please report it here, delete the .huggingface/ folder entirely and then restart your command. Some intermediate steps will be lost but the upload process should be able to continue correctly.
Known limitations
- path_in_repo => always upload files at root of the folder. If you want to upload to a subfolder, you need to set the proper structure locally.
- hf_transfer (though it works) => better to set --num-workers to 2, otherwise the CPU will be overloaded
- revision
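Since there is no path_in_repo here, the subfolder has to exist locally before the upload. The sketch below shows one way to do that by moving the folder under a staging root; the helper `nest_for_upload` and the staging layout are assumptions for illustration, not part of this PR. Moving (rather than copying) is chosen because the folders involved are expected to be very large.

```python
import shutil
from pathlib import Path


def nest_for_upload(local_path, staging_root, path_in_repo):
    """Move `local_path` to `staging_root/path_in_repo` and return `staging_root`.

    Uploading `staging_root` then places the files under `path_in_repo`
    in the repo, because files are always uploaded relative to the folder root.
    """
    local_path = Path(local_path)
    staging_root = Path(staging_root)
    target = staging_root / path_in_repo
    if target.exists():
        # Avoid silently merging into (or nesting inside) an existing folder.
        raise FileExistsError(f"{target} already exists")
    target.parent.mkdir(parents=True, exist_ok=True)
    # On the same filesystem this is a cheap rename, not a copy.
    shutil.move(str(local_path), str(target))
    return staging_root
```

The returned staging root is then what gets passed as local_path, for example: huggingface-cli large-upload <repo-id> <staging-root> --repo-type dataset --num-workers 2. Note that the .huggingface/ progress folder lives at the root of the uploaded folder, so it is safest to restructure before the first run rather than between runs.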
What to review?
Nothing yet.
For now the goal is to gather as much feedback as possible. If it proves successful, I will clean up the implementation and make it more production-ready. Also, this PR is built on top of #2223, which is not merged yet and makes the diff very long.
For curious people, here is the logic to decide what should be the next task to perform.