Huggingface dataset integration #101
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
…r-organization Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
# Conflicts:
#   resources_servers/comp_coding/configs/comp_coding.yaml
Hey @fsiino-nvidia, can we resolve the conflict as well? I left some comments there too. Thx!
        print(f"[Nemo-Gym] - Repo '{repo_id}' already exists")
    except HfHubHTTPError as e:
        if e.response is not None and e.response.status_code == 404:
            client.create_repo(repo_id=repo_id, token=config.hf_token, repo_type="dataset", private=True)
This seems like a slightly unnatural way to check for an existing repo and then create one. Doesn't create_repo check for existence automatically?
Added exist_ok=True to simplify this.
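For reference, a minimal sketch of what the simplified call might look like. The helper name `ensure_dataset_repo` and its parameters are hypothetical and not taken from the PR; only the `create_repo` keyword arguments come from the snippet above, plus `exist_ok=True`. The client is passed in so that the sketch does not depend on how the PR constructs it.

```python
def ensure_dataset_repo(client, repo_id: str, token: str) -> None:
    """Create a private dataset repo; do nothing if it already exists.

    `client` is assumed to be a huggingface_hub.HfApi instance (or any
    object exposing a compatible `create_repo` method).
    """
    # exist_ok=True tells create_repo not to raise when the repo is already
    # there, so no separate existence check or 404 handling is needed.
    client.create_repo(
        repo_id=repo_id,
        token=token,
        repo_type="dataset",
        private=True,
        exist_ok=True,
    )
```

With a real client this would be called as `ensure_dataset_repo(HfApi(), repo_id, config.hf_token)`, replacing the try/except block quoted in the review.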
Naming convention for Huggingface datasets is as follows:
`{hf_organization}/{hf_collection_name}-{domain}-{resource_server_name}-{your dataset name}`
Can we decouple the collection from the dataset prefix? There might be cases where we want to put a dataset under the "Nemo-Gym" collection without having the "Nemo-Gym" prefix in its name, so it would be nice to have two separate fields for these.
I changed this to hf_dataset_prefix. It defaults to NeMo-Gym- and can be overridden, even with an empty string if desired.
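To illustrate the effect of the new field, here is a rough sketch of how a dataset name could be assembled. The function `build_dataset_repo_id` is hypothetical, and it assumes that `hf_dataset_prefix` simply takes the place of `{hf_collection_name}-` in the naming convention quoted above; the actual implementation in the PR may differ.

```python
def build_dataset_repo_id(
    hf_organization: str,
    domain: str,
    resource_server_name: str,
    dataset_name: str,
    hf_dataset_prefix: str = "NeMo-Gym-",
) -> str:
    """Assemble a dataset repo id from its parts (illustrative only)."""
    # The prefix already carries its own trailing hyphen, so an empty
    # string drops the prefix entirely without leaving a stray "-".
    name = f"{hf_dataset_prefix}{domain}-{resource_server_name}-{dataset_name}"
    return f"{hf_organization}/{name}"
```

Keeping the trailing hyphen inside the prefix value is what lets an empty-string override produce a clean name.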
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
# Conflicts:
#   .pre-commit-config.yaml
#   README.md
#   nemo_gym/config_types.py
#   resources_servers/google_search/configs/google_search.yaml
#   scripts/update_resource_servers.py
8af32c4 to f8fd7bc
f8fd7bc to 7d668f4
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
# Conflicts:
#   README.md
#   nemo_gym/config_types.py
#   scripts/update_resource_servers.py
Hey @fsiino-nvidia, the PR should be ready to merge. Could you please resolve the current conflicts?
f5fb64b to c52f84a
This change adds support for Huggingface dataset management (upload/download/delete Gitlab artifact(s))
Addresses items 2 and 3 from #81
---------
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>
This change adds support for Huggingface dataset management (upload/download/delete Gitlab artifact(s))
Addresses items 2 and 3 from #81