Command to execute "prepared procedures" #2603

Merged
merged 15 commits into from
Jun 5, 2018

mih
Member

@mih mih commented Jun 3, 2018

The rationale is in #2593, AKA:

  • dataset templates
  • run-before/run-after
  • configuration snippets

The current state of the PR implements the procedural aspects of #1462.

This will not be complete until `run` gets the feature extension discussed in #2593, but it already does meaningful things (limited to clean datasets for now).

Desired functionality is not limited to dataset management/setup use cases (for which datalad could ship a bunch of procedures), but also targets totally custom user-provided functionality. Example:

A dataset can provide implementations of a bunch of algorithms. Another dataset can include this dataset as a subdataset and configure its own datalad.locations.dataset-procedures config to extend the search for procedures to the subdataset directory. This can be used to expose a "toolchain" of arbitrary procedures to the new parent dataset. Such procedures can be full-blown data processing pipelines...
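
The setup described above could look roughly like the following sketch. It is an illustration under stated assumptions, not the authoritative workflow: `toolbox`, its `procedures` directory, and `hello.sh` are hypothetical names; a temporary directory stands in for the parent dataset; and the sketch assumes that the setting belongs in the dataset-level `.datalad/config` file and that a relative path is resolved against the dataset root.

```shell
# Sketch only: "toolbox" and "hello.sh" are hypothetical names, and a
# temporary directory stands in for a real parent dataset.
parent=$(mktemp -d)
mkdir -p "$parent/.datalad" "$parent/toolbox/procedures"

# A trivial executable "procedure" provided by the (stand-in) subdataset.
cat > "$parent/toolbox/procedures/hello.sh" <<'EOF'
#!/bin/sh
echo "hello from the toolbox"
EOF
chmod +x "$parent/toolbox/procedures/hello.sh"

# Record the extra search location in the dataset-level config file
# (assumption: .datalad/config is where this setting should live).
git config -f "$parent/.datalad/config" \
    datalad.locations.dataset-procedures toolbox/procedures

# Read the setting back to confirm what was stored.
git config -f "$parent/.datalad/config" --get datalad.locations.dataset-procedures
```

Keeping the setting in a file that is tracked with the dataset, rather than in a user's personal Git configuration, would let every clone of the parent dataset see the same toolchain.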

TODOs (not necessarily all in this PR):

  • revive the run-before/run-after functionality that is commented out right now. This will re-enable the configuration handling to run arbitrary procedures before and after other commands, based on user configuration. Use case: "create --run-after mydatasetsetuproutine" -> dataset templates.
  • any datalad extension can provide additional procedures
  • dataset template for YODA datasets (demo stage)
  • make run work with dirty datasets @kyleam do you have some extra juice for that?
    run: Add --explicit flag #2607
  • test to pull procedure config from git config and run before a command

mih added 2 commits June 3, 2018 17:23
Both replicate and replace functionality that is currently only
available as part of `create`, but is frequently needed at a later
stage in the life span of a dataset.
@codecov

codecov bot commented Jun 3, 2018

Codecov Report

Merging #2603 into master will decrease coverage by 0.01%.
The diff coverage is 87.01%.

@@            Coverage Diff             @@
##           master    #2603      +/-   ##
==========================================
- Coverage   90.32%   90.31%   -0.02%     
==========================================
  Files         241      242       +1     
  Lines       30265    30375     +110     
==========================================
+ Hits        27337    27432      +95     
- Misses       2928     2943      +15
| Impacted Files | Coverage Δ |
|---|---|
| datalad/tests/utils.py | 90.73% <ø> (-0.27%) ⬇️ |
| datalad/interface/common_cfg.py | 100% <ø> (ø) ⬆️ |
| datalad/interface/__init__.py | 100% <ø> (ø) ⬆️ |
| datalad/distribution/tests/test_create.py | 100% <100%> (ø) ⬆️ |
| datalad/cmdline/main.py | 78% <100%> (+0.18%) ⬆️ |
| datalad/interface/utils.py | 91.39% <64.7%> (-1.62%) ⬇️ |
| datalad/interface/run_procedure.py | 85.18% <85.18%> (ø) |
| datalad/utils.py | 89.79% <95.34%> (+0.33%) ⬆️ |
| datalad/interface/base.py | 95.55% <0%> (+0.44%) ⬆️ |

... and 1 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 744daa9...5dc63ee.

mih added 3 commits June 3, 2018 20:49
Example:

datalad \
    -c datalad.locations.system-procedures=/bin \
    -c 'datalad.clean.run-before=echo COMPLAIN' \
    clean

The above prints "COMPLAIN" just before the `clean` command is executed.
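
One way to read the example above: with the system procedure location pointed at /bin, the first word of the hook value (`echo`) is looked up as an executable in that directory and the remaining words are passed to it as arguments. The following is a minimal sketch of that reading, not datalad's actual implementation; `find_procedure` and `run_hook` are hypothetical helper names introduced only for illustration.

```shell
# Hypothetical helper, not part of datalad: print the path of the first
# executable called NAME found in the given directories, searched in order.
find_procedure() {
    name=$1
    shift
    for dir in "$@"; do
        if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
            return 0
        fi
    done
    return 1
}

# Hypothetical helper: treat SPEC (e.g. "echo COMPLAIN") as a procedure
# name followed by arguments, resolve the name in SEARCH_DIR, and run it.
run_hook() {
    search_dir=$1
    spec=$2
    # Deliberate word splitting: first word is the name, the rest are arguments.
    set -- $spec
    name=$1
    shift
    proc=$(find_procedure "$name" "$search_dir") || {
        echo "procedure not found: $name" >&2
        return 1
    }
    "$proc" "$@"
}

# Mirrors the example above on systems where /bin/echo exists.
run_hook /bin "echo COMPLAIN"
```

Splitting the hook value on whitespace keeps the sketch short, but it means arguments containing spaces are not supported here.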
@kyleam
Contributor

kyleam commented Jun 4, 2018

make run work with dirty datasets @kyleam do you have some extra juice for that?

I'll take a look at it today.

@mih
Member Author

mih commented Jun 4, 2018

The OSX failure looks like a spurious connection failure.

kyleam added a commit to kyleam/datalad that referenced this pull request Jun 4, 2018
This feature will be used for dataset templates, and the ability to
run things with a dirty dataset and to save specific outputs is useful
in general.

Re: datalad#2593, datalad#2603
@mih
Member Author

mih commented Jun 5, 2018

All the controversy seems to be around #2607, hence merging this one. It is impaired by the lack of functionality equivalent to #2607, but nevertheless functional.

@mih mih merged commit 97f990c into datalad:master Jun 5, 2018
@mih mih deleted the nf-procedures branch June 5, 2018 05:07
@mih
Member Author

mih commented Jun 5, 2018

Arrg, forgot that docs were pending... Coming shortly.

2 participants