RFC: annex-metadata-outputs procedure wishlist #77

Open
yarikoptic opened this issue Sep 20, 2019 · 2 comments

@yarikoptic

I am thinking about creating a procedure such as annex-metadata-outputs, which would take as its parameters the options for git annex metadata -s call(s) to be run on the (annexed) files in the last commit.

The idea came from the fact that we are unlikely to extend datalad save with options for specifying git annex metadata to be set on the saved files. So maybe I could use --proc-post to do that instead. Such a procedure should get the list of files modified in the last commit and run git annex metadata -s on them. Then we should be able to use it with a regular datalad save or datalad run (maybe, since run might not save any new results; not yet sure what to do about that). It is also partially motivated by the inability to specify such metadata via .gitattributes: https://git-annex.branchable.com/git-annex-metadata/#comment-fde59930f108af0fff842f5e25351e93
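The two steps described above (list the files touched by the last commit, then call git annex metadata -s on them) could be sketched roughly as follows. This is only an illustration, not an existing DataLad procedure: the function names `changed_files`, `metadata_command` and `main` are hypothetical, and the sketch assumes the procedure is invoked with the dataset path as its first argument followed by the `field=value` assignments. It also assumes git annex metadata simply skips paths that are not annexed, and it does not handle merge commits.

```python
import subprocess


def changed_files(dataset, run=subprocess.run):
    """Return paths added or modified by the last commit (HEAD) of `dataset`."""
    out = run(
        ["git", "-C", dataset, "diff-tree", "--no-commit-id", "--name-only",
         "-r", "-z", "--diff-filter=AM", "--root", "HEAD"],
        check=True, capture_output=True, text=True,
    ).stdout
    # -z separates paths with NUL bytes, so unusual file names survive intact
    return [p for p in out.split("\0") if p]


def metadata_command(dataset, assignments, paths):
    """Build a single `git annex metadata` call, or None if there is nothing to do."""
    if not paths or not assignments:
        return None
    cmd = ["git", "-C", dataset, "annex", "metadata"]
    for assignment in assignments:
        if "=" not in assignment:
            raise ValueError(f"expected field=value, got {assignment!r}")
        cmd += ["-s", assignment]
    return cmd + list(paths)


def main(argv):
    """Assumed calling convention: argv = [script, dataset, field=value, ...]."""
    dataset, assignments = argv[1], argv[2:]
    cmd = metadata_command(dataset, assignments, changed_files(dataset))
    if cmd is not None:
        subprocess.run(cmd, check=True)
```

A real procedure script would call `main(sys.argv)` under the usual `if __name__ == "__main__":` guard. Returning None when the commit changed no files is one way to cope with a run that did not save any new results.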

Sample use cases

  • adding materials which should be annotated with some distribution-restrictions git annex metadata, e.g.
    $ datalad --proc-post annex-metadata-outputs distribution-restrictions=sensitive save -m "Added various license files" licenses/*
  • annotating sensitive logs
    $ datalad --proc-post annex-metadata-outputs distribution-restrictions=sensitive containers-run -m "Running subject X" --output logs/* -n containers/repronim-ptb-3 scripts/myexperiment.m

I still feel that simply specifying in .gitattributes some action to perform on the matching files would be the most consistent and reliable way. That is why I am still wondering whether such a procedure is worth pursuing. Maybe in the scope of metalad it needs to be generalized even further (to attach not only git annex metadata) anyway.

So -- the issue is open for discussion

@mih

mih commented Sep 21, 2019

The question to me is whether git-annex metadata is the right receptacle for this information, but that would depend on the desired use cases. In metalad there is already a custom extractor that can pull metadata for individual files from a configurable location. This approach could be generalized.

However, if the desire is to make annex aware of the metadata, e.g. to be able to use it in wanted expressions, that wouldn't help much.

@yarikoptic

Yes, annex wanted has been my primary use case for this field for a while now. Quite a number of datasets on /// are set up that way, and it generally works great! I hope that eventually we will come back to our discussion on create-sibling (#925); I have also now found an earlier sibling of this issue (#921).
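For illustration, a wanted expression keyed on such a metadata field might be assembled as in the sketch below. The helper names `wanted_expression` and `wanted_command` and the remote name `public` are hypothetical; the sketch only builds the command line and relies on git-annex's `metadata=field=glob` matching in preferred content expressions. It is not necessarily how the datasets mentioned above are configured.

```python
def wanted_expression(field):
    """Preferred content: everything except files carrying any value for `field`."""
    return f"include=* and (not metadata={field}=*)"


def wanted_command(dataset, remote, field):
    """Build the `git annex wanted` call for `remote` (a hypothetical sibling name)."""
    return ["git", "-C", dataset, "annex", "wanted", remote, wanted_expression(field)]


# Example: keep files annotated with distribution-restrictions off the remote "public"
print(" ".join(wanted_command(".", "public", "distribution-restrictions")))
```

Passing the resulting list to `subprocess.run` would apply the setting. With such an expression in place, files annotated via git annex metadata -s distribution-restrictions=sensitive would not be wanted by that remote.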

@bpoldrack bpoldrack transferred this issue from datalad/datalad Oct 27, 2022