I would lean towards representing the specific branch, tag, or commit in the URL (or in separate dedicated fields), and not just in hash fields like cr:md5, cr:sha1, or sc:sha256. Those fields are intended to provide a checksum to verify content, not to serve as a mechanism to address content, as git does with hashes. On the exact representation:
Context
Croissant supports git-based repositories (e.g., GitHub or Hugging Face). We already define a way to extract either from the default branch or from a ref.
Proposal
1. Reference a specific branch or a tag. We could encode this in the URL just as we encoded the ref. This encoding would be specific to each repository.
   - https://github.com/<username>/<repository_name>/tree/<branch_name>, where branch_name is encoded (e.g., feature/new updates -> feature%2Fnew%20updates).
   - https://huggingface.co/datasets/<dataset_id>/tree/<branch_name>.
2. Reference a specific commit.
   - cr:sha1 (just like we added cr:md5).
   - https://github.com/<username>/<repository_name>/commit/<commit>. The drawback of this method is that we lose the information about the branch or the ref.