feat: metadata on dataset creation #850
I tried the following: (1) didn't realize that the (2) Didn't realize But then
I don't quite understand |
|
Not gonna dig into this one now; can see it going either way (i.e. if the directory shows up, users will think they can add files to the directory without
What's the definition of a "valid Renku name"?
What about the inconsistency between whether it's named by |
f05610b to
c0c7d21
Compare
Please create an issue for this if you think we need to discuss it.
A valid Git reference. Basically, one can use alphanumeric characters, ., -, and _. Git allows some other characters as well, but we should (and will) disallow them.
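To make that rule concrete, here is a minimal sketch of such a check, assuming the allowed set is exactly the characters listed above. The helper name `is_valid_short_name` is hypothetical and not taken from the Renku code base. Git's own reference rules also reject things like `..` or a trailing `.lock`, which this sketch does not enforce.

```python
import re

# Assumption: only ASCII alphanumerics, ".", "-" and "_" are allowed,
# as described in the comment above. Not Renku's actual implementation.
_ALLOWED_CHARACTERS = re.compile(r"[A-Za-z0-9._-]+")


def is_valid_short_name(name: str) -> bool:
    """Return True if ``name`` is non-empty and uses only allowed characters."""
    return bool(_ALLOWED_CHARACTERS.fullmatch(name))
```

With this check, `my-dataset_1.0` would pass, while a name containing spaces or quotes would be rejected.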
The |
|
Since we are now using |
renku/core/management/datasets.py
Outdated
    dataset.to_yaml()

    def create_dataset(
        self, name, internal_name='', description='', creators=()
Should this allow for specifying an identifier?
Also, +1 on this refactor. :)
rokroskar
left a comment
This looks like a good improvement, thanks! I'm still not 100% sure how the names should be handled exactly, but I find this much less confusing than before.
We already have a |
|
Right - I'm wondering if we could follow the same convention for newly-created datasets - if the name given by the user matches the "allowed characters" regex, then we keep it as is and then this would become |
|
I believe this will confuse some users as they expect to the name |
|
What if we followed up the dataset creation with a helpful message that said something like: We also need something like this to follow the import command anyway. |
bdfc87c to
d482ec2
Compare
|
I had to force-push to resolve merge conflicts. Please review the last two commits. |
|
Cool! I think that works nicely: @emmjab what do you think? |
|
Hah -- I think this is still confusing. Why does the name have to change? Does it have to be a valid Git name? Why? It's not a submodule anymore; was that why? |
|
Because it's annoying to type something that is super long and (potentially) contains special characters? |
|
It's not obvious in this example, clearly, but if you import a dataset from Zenodo it's likely that the name will be annoyingly long. You don't want to have to type or copy/paste that every time you use it. |
|
hmm... autocomplete? 😅
Why is that? I can google this |
|
Take https://zenodo.org/record/3549866 for example. The name is "Synthetic dataset used in "The maximum weighted submatrix coverage problem: A CP approach"" - how do you propose to use that on the command line? |
|
why not use the DOI? <-- but we're not just talking about the imported datasets, are we? |
|
Ok so @emmjab and I have been discussing this a bit offline. I find "internal_name" a bit confusing - what if we called it "short_name"? We could store this using So it would look like or |
rokroskar
left a comment
This is definitely a step in the right direction! Thanks!
Description
When creating a dataset, users can provide its description, creators, and display name using command-line options. Each creator is a string in "Name " format, and users can pass multiple creators.
This also makes consistent use of short_name in the code. short_name is used as the dataset's data directory name and as the dataset's reference name. name is not used by Renku other than being stored as metadata. If users don't provide a short name when creating a dataset, one is created automatically.
Fixes #515
Fixes #791
Fixes #840
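For illustration only, the automatic short-name creation mentioned in the description could look roughly like the sketch below. This is an assumption about one reasonable approach, not the code in this PR: the function name `make_short_name`, the 30-character limit, and the use of `_` as the replacement character are all hypothetical.

```python
import re


def make_short_name(name: str, max_length: int = 30) -> str:
    """Derive a short, command-line-friendly name from a free-form name.

    Hypothetical sketch: lower-case the name, collapse every run of
    characters outside [A-Za-z0-9._-] into one "_", then truncate.
    """
    slug = re.sub(r"[^A-Za-z0-9._-]+", "_", name.strip().lower())
    # Drop separators left at the edges by quotes, spaces or punctuation.
    slug = slug.strip("._-")
    # Truncate, then strip again in case the cut landed on a separator.
    return slug[:max_length].rstrip("._-")
```

Applied to the long Zenodo title quoted in the conversation, this sketch would yield a name of at most 30 characters that can be typed without quoting. A real implementation would also need to handle names that reduce to an empty string and collisions between datasets.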