Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH (sugaring): -c | --cfg-proc option for rev-create to specify cfg_ procedure(s) to run #3353

Merged
merged 7 commits into from Apr 29, 2019

Conversation

@yarikoptic
Copy link
Member

commented Apr 24, 2019

ATM there is no easy way to specify procedures to run on a freshly created dataset. But in my case it is a common use case to quickly create datasets where I know what type they should be - should not
place text files to git, or might be BIDS datasets, etc. So it would be nice to be able to succinctly specify procedure(s) to be ran on the new dataset as to configure it.

Alternative implementation could be adding --proc-post to the rev-create command itself. Pros then would be:

  • consistent with --proc-post for datalad command itself
  • could then run any procedure, not only cfg_ one

Cons:

  • would be notably more long/tedius to type - i.e.

    rev-create --proc-post cfg_text2git
    

    vs

    rev-create -c text2git
    

I also added a lgr.info call within run_procedure to signal what procedure is actually about to be ran. Although might be considered a bit oxymoronic in the case of direct run_procedure invocation, makes sense everywhere else procedures could be invoked.

So now it would look like

$> datalad rev-create -c text2git -f /tmp/newds2        
[INFO   ] Creating a new annex repo at /tmp/newds2 
create(ok): /tmp/newds2 (dataset)                                                              
[INFO   ] Running procedure cfg_text2git 
[INFO   ] == Command start (output follows) ===== 
[INFO   ] == Command exit (modification check follows) ===== 

With something like this I think I wouldn't miss any other option of current create too much happen we just decide to replace create with rev-create.

TODOs:

  • Document special purpose/treatment of cfg_ procedures
  • Decide on cfg_ vs setup_?

yarikoptic added some commits Apr 24, 2019

ENH: --cfg option for rev-create to specify cfg_ procedure(s) to run
ATM there is no easy way to specify procedures to run on a freshly
created dataset. But in my case it is a common use case to quickly
create datasets where I know what type they should be - should not
place text files to git, or might be BIDS datasets, etc.  So it would
be nice to be able to specify succinctly procedure(s) to be ran on
the new dataset.

Alternative implementation could adding  --proc-post to the rev-create
command itself.

Pros would be:

  - consistent with --proc-post for datalad command itself
  - could then run any procedure, not only cfg_ one

Cons:

  - would be notably more long/tedius to type - i.e.

    rev-create --proc-post cfg_text2git

    vs

    rev-create -c text2git
@codecov

This comment has been minimized.

Copy link

commented Apr 24, 2019

Codecov Report

Merging #3353 into master will decrease coverage by 0.02%.
The diff coverage is 92.3%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3353      +/-   ##
==========================================
- Coverage   91.15%   91.12%   -0.03%     
==========================================
  Files         263      263              
  Lines       34162    34170       +8     
==========================================
- Hits        31139    31138       -1     
- Misses       3023     3032       +9
Impacted Files Coverage Δ
datalad/interface/run_procedure.py 89.93% <100%> (+0.69%) ⬆️
datalad/interface/tests/test_run_procedure.py 100% <100%> (ø) ⬆️
datalad/core/local/tests/test_create.py 100% <100%> (ø) ⬆️
datalad/core/local/create.py 98.37% <75%> (-0.8%) ⬇️
datalad/downloaders/tests/test_http.py 88.86% <0%> (-2.23%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update dbe9d30...02c9006. Read the comment docs.

@mih

mih approved these changes Apr 25, 2019

Copy link
Member

left a comment

I like this.

datalad/interface/run_procedure.py Show resolved Hide resolved
@kyleam

This comment has been minimized.

Copy link
Member

commented Apr 25, 2019

Should we decide the fate of command hooks (#3264) before going forward with this?

@yarikoptic

This comment has been minimized.

Copy link
Member Author

commented Apr 25, 2019

Should we decide the fate of command hooks (#3264) before going forward with this?

as I have expressed myself hooks specification doesn't allow for quick/concise option to achieve trivial dataset customization upon create. Although might be more generic and useful (although besides this create use case I do not have immediate others yet), it would probably largely be complimentary. May be even this sugaring functionality could be refactored later to (ab)use hooks for some reason instead of invoking run_procedure directly.

Overall, I do not think there is need to decide on hooks before starting to provide this convenience -c option. Or am I missing something?

@yarikoptic

This comment has been minimized.

Copy link
Member Author

commented Apr 25, 2019

My concern with this PR is necessity to agree on the cfg_ prefix for this purpose and us already having at least two - cfg_ and setup_ which IMHO are the same thing here semantically:

$> datalad run-procedure --discover                                        
setup_bids_dataset (datalad_hirni/resources/procedures/setup_bids_dataset.py) [python_script]
setup_hirni_dataset (datalad_hirni/resources/procedures/setup_hirni_dataset.py) [python_script]
cfg_metadatatypes (../datalad-master/datalad/resources/procedures/cfg_metadatatypes.py) [python_script]
cfg_text2git (../datalad-master/datalad/resources/procedures/cfg_text2git.py) [python_script]
setup_yoda_dataset (../datalad-master/datalad/resources/procedures/setup_yoda_dataset.py) [python_script]

Do you think we could rename setup_ to cfg_ or there is some principled difference?

@kyleam

This comment has been minimized.

Copy link
Member

commented Apr 25, 2019

@mih

This comment has been minimized.

Copy link
Member

commented Apr 28, 2019

I am fine with cfg_ and adjusted accordingly: datalad/datalad-neuroimaging#60

@mih

This comment has been minimized.

Copy link
Member

commented Apr 28, 2019

TODO: Document special purpose/treatment of cfg_ procedures

What do you envision here, it seem done already?

@mih

This comment has been minimized.

Copy link
Member

commented Apr 28, 2019

@yarikoptic Pushed a small test too

mih added a commit to psychoinformatics-de/datalad-hirni that referenced this pull request Apr 29, 2019

@yarikoptic

This comment has been minimized.

Copy link
Member Author

commented Apr 29, 2019

Thank you @mih for all the tune ups! I am ok to merge it as is or we could may be discuss the issue about possible need of allowing for parametrization which I initially brough up in datalad/datalad-neuroimaging#60 (comment) and may be we better decide upon right away to not require changing API later on:

... if we figure out how to "parametrize" additional actions/parametrizations scalably/flexibly. E.g. here bids_derivative is probably just a flavor of bids, so we could may be better encode/interface parameters of the procedures? i.e. I want bids which supports derivatives specification and may be some other specifics which might be orthogonal to being bids raw or derivative (e.g. multisession vs single session) and thus ideally be in a single command and not elevating one aspect (raw-vs-derivative, or single-vs-multisession) into procedure name.
In the case of -c we probably then should follow (for consistency) what you have already for --proc-pre/post: --proc-pre <PROCEDURE NAME> [ARGS ...]: -c bids derivative or -c bids flavor=derivative, although I dislike those open ended options and preferred some encoding within the name (-c bids[derivative] or -c bids[flavor=derivative]).

What do you think?

@mih

This comment has been minimized.

Copy link
Member

commented Apr 29, 2019

I think this is good for now. Any parameterization syntax should be discussed and implemented at a more global scope, as run_procedure(), the hooks, and storage of such specs in the config would all benefit from this.

@mih mih merged commit 7121e8c into datalad:master Apr 29, 2019

5 checks passed

WIP ready for review
Details
codecov/patch 92.3% of diff hit (target 91.15%)
Details
codecov/project Absolute coverage decreased by -0.02% but relative coverage increased by +1.15% compared to dbe9d30
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@yarikoptic

This comment has been minimized.

Copy link
Member Author

commented Apr 29, 2019

Fwiw run_procedure already has one in place AFAIK, that is what I thought to follow. Similar treatment is established by crawl-init

yarikoptic added a commit to yarikoptic/datalad that referenced this pull request May 1, 2019

Merge remote-tracking branch 'origin/master' into temp-test-devel-annex
* origin/master: (152 commits)
  DOC: Touch up docstring addition from previous commit
  ENH: Add low-level switch to prevent file type evaluation
  BF+TST: Adjust tests for renamed procedure
  RF+TST: Workaround to get cross-platform compatible test
  RF: setup_yoda_dataset -> cfg_yoda
  TST: Simple test for rev_create(cfg_proc=...)
  ENH: Improve logging as suggested in datalad#3353
  ENH: --cfg option for rev-create to specify cfg_ procedure(s) to run
  ENH: lgr.info what procedure is about to be ran
  BF: empty list means no files for get_changed_files (+fix generate_file_chunks)
  RF: extract code into helper generate_file_chunks and use it in get_changed_files
  RF/BF: commit(..., files=...) - use get_changed_files also on a list of provided files
  ENH: be able to pass "files" to get_changed_files
  TEMP: ENH(TST): test to demonstrate save issue on subdirectories
  BF(BK): use ls-files to expand list of provided files
  DOC: gitrepo: Tailor status and diff's descriptions of untracked
  DOC: gitrepo: Fix a docstring typo
  DOC: gitrepo.diff: Mention None as possible value for fr
  CLN: gitrepo: Cosmetics
  CLN: gitrepo: Drop unused/repeated imports
  ...

@yarikoptic yarikoptic added this to the Release 0.12.0 milestone May 2, 2019

mih added a commit to psychoinformatics-de/datalad-hirni that referenced this pull request May 9, 2019

mih added a commit to psychoinformatics-de/datalad-hirni that referenced this pull request May 9, 2019

mih added a commit to bpoldrack/datalad-hirni that referenced this pull request Jun 1, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.