This repository has been archived by the owner on Jan 14, 2022. It is now read-only.

make dandiset.yaml a proper file in the archive + canonical source for dandiset metadata #491

Closed
yarikoptic opened this issue Aug 31, 2020 · 3 comments
Labels
severity-important major effect on the usability of a package, without rendering it completely unusable to everyone

Comments

@yarikoptic
Member

I have voiced this opinion before, and I feel increasingly strongly enough about it to give it a priority (thus the label):
let's discuss, so that either I am convinced that the current implementation is indeed "a better way to go", or we introduce the necessary changes before I breed more "dandiset.yaml"-specific code (e.g. within the datalad crawler etc).

Giving dandiset.yaml special treatment has already

  • added burden within dandi-cli code for download/upload operations
  • created a "dichotomy" in how a dandiset looks locally (when downloaded) and on the archive (where there is no dandiset.yaml)
  • IIRC made tracking changes to the dandiset.yaml file trickier than, and different from, tracking changes to the rest of the files

As long as "dandiset metadata" is intended to be downloaded/instantiated as dandiset.yaml locally, I really do not see why dandiset.yaml needs any special treatment: in a similar fashion, we already have metadata extracted and uploaded for each file. The situation is the same with dandiset.yaml, except that the metadata is assigned not to dandiset.yaml's Item on girder but to the corresponding folder. Thus, as part of the change, I would even go further and leave the "folder" metadata untouched: AFAIK the metadata could be loaded from the dandiset.yaml item (file) metadata just as well, with few changes, even now. That would result in a paradigm (consistent between the dandi backend and local instances) in which dandiset.yaml is the canonical source of metadata for the dandiset.

NB I am keeping any "stats" away from the picture ATM

If the dandiset metadata record is edited in the web UI: save a new copy of dandiset.yaml upon "Save" and copy the metadata into the metadata record (or just use dandi-cli to upload it ;))
Sounds like a burden for the web UI ATM. But in the longer run: since we are working on the API, I think it should just be a matter of the API providing an endpoint to "set metadata for a dandiset" and doing the necessary update to dandiset.yaml at that single point (an upload of dandiset.yaml would need to set metadata as for any other file: similar procedure, just a different type of metadata record).
Alternatively, we could even not bother providing an API for setting the dandiset metadata record, and just rely on upload of a new dandiset.yaml.
For upload (PUT an asset): as for any other (file) upload operation, the API backend should ensure that an updated metadata record is uploaded with the file, and if not, reject that upload with an error. So again, the situation is the same for dandiset.yaml and any other file.
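
To illustrate the upload rule above, here is a minimal sketch in Python. It is not the actual girder or dandi API: `Upload`, `UploadRejected`, `accept_upload`, and the in-memory `store` dict are all hypothetical names made up for this example. The only point is that the check has no branch on the file name.

```python
from dataclasses import dataclass
from typing import Optional


class UploadRejected(Exception):
    """Raised when an upload arrives without a usable metadata record."""


@dataclass
class Upload:
    path: str                        # path of the asset within the dandiset
    content: bytes                   # raw file content
    metadata: Optional[dict] = None  # metadata record sent along with the file


def accept_upload(upload: Upload, store: dict) -> None:
    """Store an upload only if it carries a non-empty metadata record.

    Hypothetical backend hook: there is no special case for dandiset.yaml,
    so it goes through the same check as any other asset.
    """
    if not upload.metadata:
        raise UploadRejected(f"{upload.path}: no metadata record supplied")
    store[upload.path] = {"content": upload.content, "metadata": upload.metadata}
```

With something like this, uploading dandiset.yaml without its metadata record would fail in exactly the same way as uploading an .nwb file without one.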

If such "non-special" treatment is introduced within the API, then dandi-cli could remove its custom code for both download and upload operations entirely. Any other client which would work without the API would not need any special code to "instantiate" a local copy of dandiset.yaml: it would be just yet another asset to download, etc.
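
A rough sketch of what the client side could then look like, again in Python and again under assumptions: `download_dandiset` is a hypothetical helper, and the `assets` mapping (relative path to content) stands in for whatever listing the archive would actually provide. It is not dandi-cli code.

```python
from pathlib import Path
from typing import Dict, List


def download_dandiset(assets: Dict[str, bytes], target: Path) -> List[str]:
    """Write every asset under `target` and return the sorted relative paths.

    dandiset.yaml is not mentioned anywhere: if the archive lists it as an
    asset, it is written to disk by the same loop as every other file.
    """
    written = []
    for rel_path, content in assets.items():
        dest = target / rel_path
        dest.parent.mkdir(parents=True, exist_ok=True)  # create subdirectories as needed
        dest.write_bytes(content)
        written.append(rel_path)
    return sorted(written)
```

The local copy then matches the archive listing one to one, which is the consistency argued for above.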

@yarikoptic yarikoptic added the severity-important major effect on the usability of a package, without rendering it completely unusable to everyone label Aug 31, 2020
@satra
Member

satra commented Aug 31, 2020

my quick response to this:

i don't think dandiset.yaml as a separate file for the database is a good idea still. that information is still in the metadata of the dandiset object. yes, it gets serialized to disk, but i don't think it should be a separate file in the db. on publish, yes, on disk yes, but not in db.

the api should still return metadata when you do a GET on the dandiset and you should be able to POST metadata to a dandiset - neither of which i believe are in the current form, but should be in the unified API.

much of the metadata will eventually be "functional", i.e. generated from the assets in realtime on change/from provenance. we are starting with a fairly static form just because it is easier.

@yarikoptic
Member Author

So no specific cons stated so far ;-)

> the api should still return metadata when you do a GET on the dandiset and you should be able to POST metadata to a dandiset - neither of which i believe are in the current form, but should be in the unified API.

That is OK with me, and in line with the non-"Alternatively we could" approach.

> much of the metadata will eventually be "functional", i.e. generated from the assets in realtime on change/from provenance

That is OK, and IMHO not directly a pro or con here, as long as that "metadata" needs to be "serialized" into a standardized form (i.e. dandiset.yaml in this case) at some point. It would indeed be a different situation when thinking about notions of "subjects" (which will likely be directories, possibly in multiple dandisets), or sessions etc. They would all have some metadata aggregated across files or directories and would not have a direct "file serialization" (besides some "one way" extracts, e.g. possibly participants.tsv in BIDS).

@waxlamp waxlamp added this to Needs triage in Migration checklist via automation Mar 3, 2021
@waxlamp waxlamp removed this from Needs triage in Migration checklist Mar 8, 2021
@yarikoptic
Member Author

ok, the current state of things is that it will remain just in the dandiset metadata, and it will be up to dandi-cli to treat it separately. I think we might come back to this later on, but I will close it in the meantime
