Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH (sugaring): -c | --cfg-proc option for rev-create to specify cfg_ procedure(s) to run #3353

Merged
merged 7 commits into from Apr 29, 2019

Conversation

@yarikoptic
Copy link
Member

@yarikoptic yarikoptic commented Apr 24, 2019

ATM there is no easy way to specify procedures to run on a freshly created dataset. But in my case it is a common use case to quickly create datasets where I know what type they should be - should not
place text files to git, or might be BIDS datasets, etc. So it would be nice to be able to succinctly specify procedure(s) to be ran on the new dataset as to configure it.

Alternative implementation could be adding --proc-post to the rev-create command itself. Pros then would be:

  • consistent with --proc-post for datalad command itself
  • could then run any procedure, not only cfg_ one

Cons:

  • would be notably more long/tedius to type - i.e.

    rev-create --proc-post cfg_text2git
    

    vs

    rev-create -c text2git
    

I also added a lgr.info call within run_procedure to signal what procedure is actually about to be ran. Although might be considered a bit oxymoronic in the case of direct run_procedure invocation, makes sense everywhere else procedures could be invoked.

So now it would look like

$> datalad rev-create -c text2git -f /tmp/newds2        
[INFO   ] Creating a new annex repo at /tmp/newds2 
create(ok): /tmp/newds2 (dataset)                                                              
[INFO   ] Running procedure cfg_text2git 
[INFO   ] == Command start (output follows) ===== 
[INFO   ] == Command exit (modification check follows) ===== 

With something like this I think I wouldn't miss any other option of current create too much happen we just decide to replace create with rev-create.

TODOs:

  • Document special purpose/treatment of cfg_ procedures
  • Decide on cfg_ vs setup_?
yarikoptic added 2 commits Apr 24, 2019
ATM there is no easy way to specify procedures to run on a freshly
created dataset. But in my case it is a common use case to quickly
create datasets where I know what type they should be - should not
place text files to git, or might be BIDS datasets, etc.  So it would
be nice to be able to specify succinctly procedure(s) to be ran on
the new dataset.

Alternative implementation could adding  --proc-post to the rev-create
command itself.

Pros would be:

  - consistent with --proc-post for datalad command itself
  - could then run any procedure, not only cfg_ one

Cons:

  - would be notably more long/tedius to type - i.e.

    rev-create --proc-post cfg_text2git

    vs

    rev-create -c text2git
@codecov
Copy link

@codecov codecov bot commented Apr 24, 2019

Codecov Report

Merging #3353 into master will decrease coverage by 0.02%.
The diff coverage is 92.3%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3353      +/-   ##
==========================================
- Coverage   91.15%   91.12%   -0.03%     
==========================================
  Files         263      263              
  Lines       34162    34170       +8     
==========================================
- Hits        31139    31138       -1     
- Misses       3023     3032       +9
Impacted Files Coverage Δ
datalad/interface/run_procedure.py 89.93% <100%> (+0.69%) ⬆️
datalad/interface/tests/test_run_procedure.py 100% <100%> (ø) ⬆️
datalad/core/local/tests/test_create.py 100% <100%> (ø) ⬆️
datalad/core/local/create.py 98.37% <75%> (-0.8%) ⬇️
datalad/downloaders/tests/test_http.py 88.86% <0%> (-2.23%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update dbe9d30...02c9006. Read the comment docs.

mih
mih approved these changes Apr 25, 2019
Copy link
Member

@mih mih left a comment

I like this.

datalad/interface/run_procedure.py Show resolved Hide resolved
@kyleam
Copy link
Contributor

@kyleam kyleam commented Apr 25, 2019

Should we decide the fate of command hooks (#3264) before going forward with this?

@yarikoptic
Copy link
Member Author

@yarikoptic yarikoptic commented Apr 25, 2019

Should we decide the fate of command hooks (#3264) before going forward with this?

as I have expressed myself hooks specification doesn't allow for quick/concise option to achieve trivial dataset customization upon create. Although might be more generic and useful (although besides this create use case I do not have immediate others yet), it would probably largely be complimentary. May be even this sugaring functionality could be refactored later to (ab)use hooks for some reason instead of invoking run_procedure directly.

Overall, I do not think there is need to decide on hooks before starting to provide this convenience -c option. Or am I missing something?

@yarikoptic
Copy link
Member Author

@yarikoptic yarikoptic commented Apr 25, 2019

My concern with this PR is necessity to agree on the cfg_ prefix for this purpose and us already having at least two - cfg_ and setup_ which IMHO are the same thing here semantically:

$> datalad run-procedure --discover                                        
setup_bids_dataset (datalad_hirni/resources/procedures/setup_bids_dataset.py) [python_script]
setup_hirni_dataset (datalad_hirni/resources/procedures/setup_hirni_dataset.py) [python_script]
cfg_metadatatypes (../datalad-master/datalad/resources/procedures/cfg_metadatatypes.py) [python_script]
cfg_text2git (../datalad-master/datalad/resources/procedures/cfg_text2git.py) [python_script]
setup_yoda_dataset (../datalad-master/datalad/resources/procedures/setup_yoda_dataset.py) [python_script]

Do you think we could rename setup_ to cfg_ or there is some principled difference?

@kyleam
Copy link
Contributor

@kyleam kyleam commented Apr 25, 2019

@mih
Copy link
Member

@mih mih commented Apr 28, 2019

I am fine with cfg_ and adjusted accordingly: datalad/datalad-neuroimaging#60

@mih
Copy link
Member

@mih mih commented Apr 28, 2019

TODO: Document special purpose/treatment of cfg_ procedures

What do you envision here, it seem done already?

@mih
Copy link
Member

@mih mih commented Apr 28, 2019

@yarikoptic Pushed a small test too

mih added a commit to psychoinformatics-de/datalad-hirni that referenced this issue Apr 29, 2019
@yarikoptic
Copy link
Member Author

@yarikoptic yarikoptic commented Apr 29, 2019

Thank you @mih for all the tune ups! I am ok to merge it as is or we could may be discuss the issue about possible need of allowing for parametrization which I initially brough up in datalad/datalad-neuroimaging#60 (comment) and may be we better decide upon right away to not require changing API later on:

... if we figure out how to "parametrize" additional actions/parametrizations scalably/flexibly. E.g. here bids_derivative is probably just a flavor of bids, so we could may be better encode/interface parameters of the procedures? i.e. I want bids which supports derivatives specification and may be some other specifics which might be orthogonal to being bids raw or derivative (e.g. multisession vs single session) and thus ideally be in a single command and not elevating one aspect (raw-vs-derivative, or single-vs-multisession) into procedure name.
In the case of -c we probably then should follow (for consistency) what you have already for --proc-pre/post: --proc-pre <PROCEDURE NAME> [ARGS ...]: -c bids derivative or -c bids flavor=derivative, although I dislike those open ended options and preferred some encoding within the name (-c bids[derivative] or -c bids[flavor=derivative]).

What do you think?

@mih
Copy link
Member

@mih mih commented Apr 29, 2019

I think this is good for now. Any parameterization syntax should be discussed and implemented at a more global scope, as run_procedure(), the hooks, and storage of such specs in the config would all benefit from this.

@mih mih merged commit 7121e8c into datalad:master Apr 29, 2019
5 checks passed
@yarikoptic
Copy link
Member Author

@yarikoptic yarikoptic commented Apr 29, 2019

Fwiw run_procedure already has one in place AFAIK, that is what I thought to follow. Similar treatment is established by crawl-init

yarikoptic added a commit to yarikoptic/datalad that referenced this issue May 1, 2019
* origin/master: (152 commits)
  DOC: Touch up docstring addition from previous commit
  ENH: Add low-level switch to prevent file type evaluation
  BF+TST: Adjust tests for renamed procedure
  RF+TST: Workaround to get cross-platform compatible test
  RF: setup_yoda_dataset -> cfg_yoda
  TST: Simple test for rev_create(cfg_proc=...)
  ENH: Improve logging as suggested in datalad#3353
  ENH: --cfg option for rev-create to specify cfg_ procedure(s) to run
  ENH: lgr.info what procedure is about to be ran
  BF: empty list means no files for get_changed_files (+fix generate_file_chunks)
  RF: extract code into helper generate_file_chunks and use it in get_changed_files
  RF/BF: commit(..., files=...) - use get_changed_files also on a list of provided files
  ENH: be able to pass "files" to get_changed_files
  TEMP: ENH(TST): test to demonstrate save issue on subdirectories
  BF(BK): use ls-files to expand list of provided files
  DOC: gitrepo: Tailor status and diff's descriptions of untracked
  DOC: gitrepo: Fix a docstring typo
  DOC: gitrepo.diff: Mention None as possible value for fr
  CLN: gitrepo: Cosmetics
  CLN: gitrepo: Drop unused/repeated imports
  ...
@yarikoptic yarikoptic added this to the Release 0.12.0 milestone May 2, 2019
mih added a commit to psychoinformatics-de/datalad-hirni that referenced this issue May 9, 2019
mih added a commit to psychoinformatics-de/datalad-hirni that referenced this issue May 9, 2019
mih added a commit to bpoldrack/datalad-hirni that referenced this issue Jun 1, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

3 participants