run: Add dry run mode #5539

kyleam · 2021-03-31T17:29:37Z

This PR implements the dry-run operation proposed in gh-5538 by adding a --dry-run flag to run along with a custom result renderer. I went light on the reported details, but, if desired, I think it should be straightforward to extend this approach later with more information.

I decided to go with the name --dry-run because I think that's the most familiar/obvious name, though --report would align with rerun. Also, --report might be a bit more natural if this option is later extended to take an optional value (e.g., to restrict or extend what is reported).

$ touch foo
$ datalad save

$ datalad run --dry-run -i 'fo*' -o out 'cat {inputs[0]} >{outputs[0]}'
Dry run information
 location: /tmp/dl-vaZ53PN
 expanded inputs:
  ['foo']
 expanded outputs:
  ['out']
 command:
  cat {inputs[0]} >{outputs[0]}
 expanded command:
  cat foo >out
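
The expansion in the transcript above can be approximated with Python's `str.format`, because the placeholders follow format-field syntax. This is only a minimal sketch: `expand_command` is a hypothetical helper, the inputs and outputs are assumed to be already glob-expanded lists, and datalad's own formatting may differ in details such as quoting.

```python
def expand_command(template, inputs, outputs):
    """Fill {inputs[...]} and {outputs[...]} placeholders (illustrative only)."""
    return template.format(inputs=inputs, outputs=outputs)


# Values taken from the dry-run report above.
expanded = expand_command("cat {inputs[0]} >{outputs[0]}", ["foo"], ["out"])
print(expanded)  # cat foo >out
```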

codecov · 2021-03-31T17:47:31Z

Codecov Report

Merging #5539 (6416161) into master (941c615) will increase coverage by 0.01%.
The diff coverage is 98.78%.

@@            Coverage Diff             @@
##           master    #5539      +/-   ##
==========================================
+ Coverage   90.32%   90.33%   +0.01%     
==========================================
  Files         305      305              
  Lines       41577    41654      +77     
==========================================
+ Hits        37554    37630      +76     
- Misses       4023     4024       +1

| Impacted Files | Coverage Δ | |
|---|---|---|
| datalad/core/local/run.py | 98.15% <97.61%> (-0.14%) | ⬇️ |
| datalad/core/local/tests/test_run.py | 100.00% <100.00%> (ø) | |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 941c615...6416161.

yarikoptic · 2021-03-31T18:35:21Z

Thank you @kyleam - yet to review changes etc, I see myself using it so good.

yarikoptic

Looks good to me. Didn't try, but left some comments to consider.

yarikoptic · 2021-03-31T23:53:33Z

datalad/core/local/run.py

@@ -231,6 +237,13 @@ class Run(Interface):
            '.datalad/runinfo' directory (customizable via the
            'datalad.run.record-directory' configuration variable).""",
            constraints=EnsureNone() | EnsureBool()),
+        dry_run=Parameter(
+            # Leave out common -n short flag to avoid confusion with
+            # `containers-run [-n|--container-name]`.


yeah, --dry-run is good enough for me -- to be used sparingly anyways, and better to be this explicit.

yarikoptic · 2021-03-31T23:57:37Z

datalad/core/local/run.py

+            lines = [fmt_line("location", dry_run_info["pwd_full"])]
+
+            # TODO: Inputs and outputs could be pretty long. These may be worth
+            # truncating.


oh well... at least expanded command is printed last, and it is the main target IMHO. Otherwise it would need some additional option or configuration parameter to either truncate or not and at what length etc - could be done later if/when need arises. There is also always > /tmp/file; vim /tmp/file to navigate conveniently ;)


yarikoptic · 2021-04-01T00:07:22Z

datalad/core/local/run.py

+            args=("--dry-run",),
+            action="store_true",
+            doc="""Do not run the command; just display details about the
+            command execution."""),


I wonder if it is worth adding a note that globs in inputs/outputs might not expand (at all or fully) if they point to paths within not yet installed subdatasets.
I do not think we should be getting all those inputs etc. A "perfect" --dry-run could have included a listing of (sub)datasets to install and the number/size of files to get, but I think that would increase its run time considerably, making it less convenient for typical use cases, so it is better left undone until a need is expressed.


I've added a sentence to the docstring mentioning that globs from uninstalled datasets aren't expanded.

datalad/core/local/tests/test_run.py

mih · 2021-04-01T07:10:46Z

First an impulse comment after having played with it for a bit....YES!

This alone will have enormous impact:

# wtf?!
% datalad run "for m in 1 2 3; do echo $m; done"
[INFO   ] == Command start (output follows) ===== 



[INFO   ] == Command exit (modification check follows) =====

# ah!
% datalad run --dry-run "for m in 1 2 3; do echo $m; done"
Dry run information
 location: /tmp/dummy
 command:
  for m in 1 2 3; do echo ; done

Oh, yes...

(datalad-dev) mih@meiner /tmp/dummy (git)-[master] % dl run --dry-run -i [1-9]  some body
Dry run information
 location: /tmp/dummy
 expanded inputs:
  ['1']
 command:
  2 3 some body
(datalad-dev) mih@meiner /tmp/dummy (git)-[master] % dl run --dry-run -i '[1-9]'  some body
Dry run information
 location: /tmp/dummy
 expanded inputs:
  ['1', '2', '3']
 command:
  some body
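
The difference between the two invocations above comes down to who expands the glob. Unquoted, the shell replaces `[1-9]` with the matching file names before `run` ever sees the arguments; quoted, the pattern arrives intact and is expanded as an input. A rough Python sketch of that effect, using a temporary directory with files `1`, `2`, and `3` (the argument splitting below is a simplification for illustration, not datalad's parser):

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    for name in ("1", "2", "3"):
        with open(os.path.join(tmp, name), "w"):
            pass
    # Stand-in for what the shell does with an unquoted [1-9].
    pattern = os.path.join(glob.escape(tmp), "[1-9]")
    shell_expanded = sorted(os.path.basename(p) for p in glob.glob(pattern))

# Unquoted: the program receives the already expanded names.
argv_unquoted = ["-i"] + shell_expanded + ["some", "body"]
input_unquoted = argv_unquoted[1]      # only the first name goes to -i
command_unquoted = argv_unquoted[2:]   # the rest leaks into the command

# Quoted: the pattern itself is the value of -i.
argv_quoted = ["-i", "[1-9]", "some", "body"]
input_quoted = argv_quoted[1]
command_quoted = argv_quoted[2:]

print(input_unquoted, command_unquoted)
print(input_quoted, command_quoted)
```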

mih

This is great! Thx @kyleam

I am still struggling with the API and (limited) potential for extending with different reporting strategies, as you already pointed out. I tried a few alternatives that all came out inferior to yours. It would be great to already have this implemented as an option that could take values, but doesn't have to. I have no good approach to suggest right now, though...

mih · 2021-04-01T07:30:14Z

datalad/core/local/run.py

+            args=("--dry-run",),
+            action="store_true",
+            doc="""Do not run the command; just display details about the
+            command execution."""),


Good point, but I fear that the overhead to be able to state this clearly would be substantial. I would lean towards keeping it simple, for a start.

mih · 2021-04-01T07:54:18Z

datalad/core/local/run.py

+            lines = [fmt_line("location", dry_run_info["pwd_full"])]
+
+            # TODO: Inputs and outputs could be pretty long. These may be worth
+            # truncating.


I think the expansion of the inputs/outputs in conjunction with the command is instrumental, not just the command. I gave an example in #5539 (comment)

kyleam · 2021-04-01T22:00:15Z

Thanks for the feedback, @yarikoptic and @mih.

@mih:

> It would be great to already have this implemented as an option that could take values, but doesn't have to. I have no good approach to suggest right now, though...

Moving to an option with values sounds fine to me. My thinking was that using a bare option (until there are concrete ideas for other modes) doesn't block adding values because we could tack on optional values later (in the same way status --annex maps to status --annex=basic).

mih · 2021-04-02T06:20:21Z

But it is difficult here. The command will pretty much always be last. I tried and the potential for confusion is high (swallow first item of a command).

kyleam · 2021-04-02T12:26:13Z

> I tried and the potential for confusion is high (swallow first item of a command).

Ah, true, that's a good reason to avoid the optional value approach.
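
The swallowing problem can be reproduced with a standalone `argparse` sketch. The parser below is hypothetical (datalad's real command-line setup is more involved); it only contrasts an option with an optional value against one that always requires a value.

```python
import argparse


def make_parser(optional_value):
    """Build a toy parser; `optional_value` selects the --dry-run style."""
    parser = argparse.ArgumentParser(prog="run-sketch")
    if optional_value:
        # A bare --dry-run is allowed and falls back to "basic".
        parser.add_argument("--dry-run", nargs="?", const="basic", default=None)
    else:
        # A value must always be given.
        parser.add_argument("--dry-run", default=None)
    parser.add_argument("cmd", nargs="+")
    return parser


# Optional value: the first word of the command is taken as the mode.
swallowed = make_parser(True).parse_args(["--dry-run", "echo", "hi"])
print(swallowed.dry_run, swallowed.cmd)  # echo ['hi']

# Required value: the command stays intact.
explicit = make_parser(False).parse_args(["--dry-run=basic", "echo", "hi"])
print(explicit.dry_run, explicit.cmd)  # basic ['echo', 'hi']
```

Because the command is nearly always the last thing on the line, the optional-value form would misparse the most common invocation, which is the reason given above for avoiding it.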

kyleam · 2021-04-02T14:57:21Z

> It would be great to already have this implemented as an option that could take values, but doesn't have to

I've switched the option over to taking values and added a mode that shows only the expanded command. Given the issue you pointed out with extending a flag later in this case, I think supporting values right away is the way to go, though now I'm on the fence about whether --report would be a better name than --dry-run.

kyleam · 2021-04-02T15:18:14Z

The hirni failure is the same one that has been occurring in gh-5534.

https://github.com/datalad/datalad/pull/5539/checks?check_run_id=2254534141

error

datalad_hirni.tests.test_demos.test_demo_repro_analysis ... /tmp/datalad_temp_test_demo_repro_analysis9hlb8uij/sourcedata/code/hirni-toolbox/procedures/copy-converter.sh: line 9: $1: unbound variable
ERROR
[...]
======================================================================
ERROR: datalad_hirni.tests.test_demos.test_demo_repro_analysis
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/hostedtoolcache/Python/3.7.10/x64/lib/python3.7/site-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/opt/hostedtoolcache/Python/3.7.10/x64/lib/python3.7/site-packages/datalad/tests/utils.py", line 757, in _wrap_with_tempfile
    return t(*(arg + (filename,)), **kw)
  File "/opt/hostedtoolcache/Python/3.7.10/x64/lib/python3.7/site-packages/datalad/tests/utils.py", line 757, in _wrap_with_tempfile
    return t(*(arg + (filename,)), **kw)
  File "/opt/hostedtoolcache/Python/3.7.10/x64/lib/python3.7/site-packages/datalad_hirni/tests/utils.py", line 257, in newfunc
    return f(*(arg + (new_url,)), **kw)
  File "/opt/hostedtoolcache/Python/3.7.10/x64/lib/python3.7/site-packages/datalad_hirni/tests/test_demos.py", line 156, in test_demo_repro_analysis
    anonymize=True)
  File "/opt/hostedtoolcache/Python/3.7.10/x64/lib/python3.7/site-packages/datalad/distribution/dataset.py", line 505, in apply_func
    return f(**kwargs)
  File "/opt/hostedtoolcache/Python/3.7.10/x64/lib/python3.7/site-packages/datalad/interface/utils.py", line 482, in eval_func
    return return_func(generator_func)(*args, **kwargs)
  File "/opt/hostedtoolcache/Python/3.7.10/x64/lib/python3.7/site-packages/datalad/interface/utils.py", line 470, in return_func
    results = list(results)
  File "/opt/hostedtoolcache/Python/3.7.10/x64/lib/python3.7/site-packages/datalad/interface/utils.py", line 401, in generator_func
    allkwargs):
  File "/opt/hostedtoolcache/Python/3.7.10/x64/lib/python3.7/site-packages/datalad/interface/utils.py", line 557, in _process_results
    for res in results:
  File "/opt/hostedtoolcache/Python/3.7.10/x64/lib/python3.7/site-packages/datalad_hirni/commands/spec2bids.py", line 268, in __call__
    return_type='generator'
  File "/opt/hostedtoolcache/Python/3.7.10/x64/lib/python3.7/site-packages/datalad/interface/utils.py", line 401, in generator_func
    allkwargs):
  File "/opt/hostedtoolcache/Python/3.7.10/x64/lib/python3.7/site-packages/datalad/interface/utils.py", line 557, in _process_results
    for res in results:
  File "/opt/hostedtoolcache/Python/3.7.10/x64/lib/python3.7/site-packages/datalad/interface/run_procedure.py", line 451, in __call__
    return_type='generator'
  File "/opt/hostedtoolcache/Python/3.7.10/x64/lib/python3.7/site-packages/datalad/interface/utils.py", line 401, in generator_func
    allkwargs):
  File "/opt/hostedtoolcache/Python/3.7.10/x64/lib/python3.7/site-packages/datalad/interface/utils.py", line 557, in _process_results
    for res in results:
  File "/opt/hostedtoolcache/Python/3.7.10/x64/lib/python3.7/site-packages/datalad/core/local/run.py", line 275, in __call__
    dry_run=dry_run):
  File "/opt/hostedtoolcache/Python/3.7.10/x64/lib/python3.7/site-packages/datalad/core/local/run.py", line 791, in run_command
    raise exc
  File "/opt/hostedtoolcache/Python/3.7.10/x64/lib/python3.7/site-packages/datalad/core/local/run.py", line 556, in _execute_command
    command
  File "/opt/hostedtoolcache/Python/3.7.10/x64/lib/python3.7/site-packages/datalad/cmd.py", line 412, in run
    **results,
datalad.support.exceptions.CommandError: CommandError: 'bash /tmp/datalad_temp_test_demo_repro_analysis9hlb8uij/sourcedata/code/hirni-toolbox/procedures/copy-converter.sh ' failed with exitcode 1 under /tmp/datalad_temp_test_demo_repro_analysis9hlb8uij

yarikoptic · 2021-04-07T22:59:22Z

windows fails seems to be legit but then why in only one of the runs?

======================================================================
FAIL: datalad.core.local.tests.test_run.test_dry_run
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Python39-x64\lib\site-packages\nose\case.py", line 198, in runTest
    self.test(*self.arg)
  File "C:\Python39-x64\lib\site-packages\datalad\tests\utils.py", line 578, in _wrap_with_tree
    return t(*(arg + (d,)), **kw)
  File "C:\Python39-x64\lib\site-packages\datalad\core\local\tests\test_run.py", line 628, in test_dry_run
    assert_in(
AssertionError: 'blah "sub/baz"' not found in 'Dry run information\n location: C:\\DLTMP\\datalad_temp_tree_6m7w12oz\n\n expanded inputs:\n\n  [\'sub\\\\baz\']\n\n command:\n\n  blah {inputs}\n\n expanded command:\n\n  blah "sub\\baz"\n'
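
The assertion above appears to fail because the expected string hard-codes a forward slash while the reported path uses the Windows separator. One common way to make such an expectation portable is to build the path with the platform's own path class. The sketch below uses `pathlib`'s pure path classes so that both variants can be shown on any platform; `expected_command` is a hypothetical helper and not necessarily the fix adopted in this PR.

```python
from pathlib import PurePosixPath, PureWindowsPath


def expected_command(path_cls):
    """Build the expected command using the separator of `path_cls`."""
    return 'blah "{}"'.format(path_cls("sub", "baz"))


posix_expected = expected_command(PurePosixPath)
windows_expected = expected_command(PureWindowsPath)

print(posix_expected)
print(windows_expected)
```

In a real test, `pathlib.PurePath` (or `os.path.join`) would pick the right flavor for the running platform.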

kyleam · 2021-04-07T23:21:45Z

> windows fails seems to be legit

I somehow overlooked that failure. Thanks!

> but then why in only one of the runs?

Of the Windows jobs, I think that's the only one that runs the datalad.core tests.

run() puts the run information into a dictionary right before it formats the JSON record for the commit message or sidecar file. Instead do it before the command execution so that this information can be used in the dry run mode added by the next commit.


It's likely that the need for more modes will come up in the future, and, as pointed out by @mih, tacking on optional values later is awkward/confusing because '--dry-run command arg ...' would take "command" as the argument to --dry-run. So, put the current mode under "basic" (inspired by `status --annex=basic`), and add a mode that shows just the expanded command. If --dry-run=basic ends up being too verbose, we can add an empty string value as a shortcut.

As of 56bc402 (Use dedicated action name for dry-run operation, 2021-04-08), create_sibling_github(..., dry_run=True) appends "[dry-run]" to its action name. Do the same for consistency and to reduce the chance that result hooks are unintentionally triggered.
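
The effect of the suffix on result hooks can be illustrated with a toy matcher. Everything here is an assumption for illustration: the exact action string (`"run [dry-run]"`), the result fields, and the `hook_matches` helper, which only models an exact match on the action name rather than datalad's actual hook matching.

```python
def action_name(dry_run):
    """Return the action name, with a suffix when a dry-run mode is set."""
    return "run [dry-run]" if dry_run else "run"


def hook_matches(result, wanted_action="run"):
    """Hypothetical hook matcher: fire only on an exact action name."""
    return result.get("action") == wanted_action


real_result = {"action": action_name(None), "status": "ok"}
dry_result = {"action": action_name("basic"), "status": "ok"}

print(hook_matches(real_result))  # True
print(hook_matches(dry_result))   # False
```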

kyleam · 2021-04-26T21:26:29Z

I've rebased to resolve conflicts and also appended "[dry-run]" to the action name to follow the recent change in create-sibling-github.

kyleam · 2021-04-30T21:25:03Z

Thanks for the review, @yarikoptic.

@mih Are you okay with the current state, in particular with --dry-run={basic,command}?

yarikoptic · 2021-04-30T22:11:53Z

eh, codespell already started to negatively affect the DX... hopefully would not happen that often ;-)

mih · 2021-05-04T09:53:29Z

I like it! thx much!

kyleam force-pushed the run-dryrun branch from 15ef80f to b62e432 on March 31, 2021 18:31

yarikoptic reviewed Apr 1, 2021


mih mentioned this pull request Apr 1, 2021

Mention run --dry-run datalad-handbook/book#698

Closed

mih reviewed Apr 1, 2021


This was referenced Apr 2, 2021

update: support reset --hard action #5534

Merged

hirni: test_demo_repro_analysis failure in recent PRs #5549

Closed

kyleam force-pushed the run-dryrun branch from 89125a6 to 11a32d1 on April 7, 2021 23:17

kyleam added 6 commits April 26, 2021 16:59

CLN: test_run: Drop a now unused import

9f2da69

NF: run: Add dry run mode

7adeb0e

As suggested by @mih, adding a way to easily view information about the execution, particularly the expanded command, 'could shorten the "design phase" of a run command'. Closes dataladgh-5538

DOC: run: Document and test dry-run subdataset handling

245639a

Thanks, @yarikoptic, for noting that this was untested/undocumented.

kyleam force-pushed the run-dryrun branch from 11a32d1 to e32556e on April 26, 2021 21:22

MNT: Pacify codespell

6416161

yarikoptic approved these changes Apr 28, 2021


mih merged commit ff61d7b into datalad:master May 4, 2021

kyleam deleted the run-dryrun branch May 4, 2021 22:22
