run: Add dry run mode #5539

Merged
merged 7 commits into from
May 4, 2021

@kyleam kyleam commented Mar 31, 2021

This PR implements the dry-run operation proposed in gh-5538 by adding a --dry-run flag to run along with a custom result renderer. I went light on the reported details, but, if desired, I think it should be straightforward to extend this approach later with more information.

I decided to go with the name --dry-run because I think that's the most familiar/obvious name, though --report would align with rerun. Also, --report might be a bit more natural if this option is later extended to take an optional value (e.g., to restrict or extend what is reported).


$ touch foo
$ datalad save

$ datalad run --dry-run -i 'fo*' -o out 'cat {inputs[0]} >{outputs[0]}'
Dry run information
 location: /tmp/dl-vaZ53PN
 expanded inputs:
  ['foo']
 expanded outputs:
  ['out']
 command:
  cat {inputs[0]} >{outputs[0]}
 expanded command:
  cat foo >out
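The placeholder expansion shown in the report above can be approximated with Python's `str.format`. This is only a minimal sketch under assumptions: `expand_command` is a hypothetical helper, not DataLad's implementation, and the real `run` also deals with glob expansion, quoting, and further placeholders.

```python
def expand_command(template, inputs, outputs):
    """Fill {inputs[...]} and {outputs[...]} placeholders in a command template.

    Hypothetical helper for illustration only; DataLad's own formatter does more.
    """
    return template.format(inputs=inputs, outputs=outputs)


# Inputs and outputs are assumed to be already glob-expanded lists.
expanded = expand_command("cat {inputs[0]} >{outputs[0]}", ["foo"], ["out"])
print(expanded)  # cat foo >out
```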

codecov bot commented Mar 31, 2021

Codecov Report

Merging #5539 (6416161) into master (941c615) will increase coverage by 0.01%.
The diff coverage is 98.78%.

@@            Coverage Diff             @@
##           master    #5539      +/-   ##
==========================================
+ Coverage   90.32%   90.33%   +0.01%     
==========================================
  Files         305      305              
  Lines       41577    41654      +77     
==========================================
+ Hits        37554    37630      +76     
- Misses       4023     4024       +1     
Impacted Files Coverage Δ
datalad/core/local/run.py 98.15% <97.61%> (-0.14%) ⬇️
datalad/core/local/tests/test_run.py 100.00% <100.00%> (ø)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@yarikoptic
Member

Thank you, @kyleam. I have yet to review the changes, but I can see myself using this, so that's good.

@yarikoptic yarikoptic left a comment

Looks good to me. Didn't try, but left some comments to consider.

@@ -231,6 +237,13 @@ class Run(Interface):
'.datalad/runinfo' directory (customizable via the
'datalad.run.record-directory' configuration variable).""",
constraints=EnsureNone() | EnsureBool()),
dry_run=Parameter(
# Leave out common -n short flag to avoid confusion with
# `containers-run [-n|--container-name]`.
Member

Yeah, --dry-run is good enough for me. It is to be used sparingly anyway, and it is better to be this explicit.

lines = [fmt_line("location", dry_run_info["pwd_full"])]

# TODO: Inputs and outputs could be pretty long. These may be worth
# truncating.
Member

Oh well... at least the expanded command is printed last, and it is the main target IMHO. Otherwise it would need an additional option or configuration parameter to control whether to truncate, at what length, etc. That could be done later if/when the need arises. There is also always > /tmp/file; vim /tmp/file to navigate conveniently ;)

Member

I think the expansion of the inputs/outputs in conjunction with the command is instrumental, not just the command. I gave an example in #5539 (comment)

args=("--dry-run",),
action="store_true",
doc="""Do not run the command; just display details about the
command execution."""),
Member

I wonder if it is worth adding a note that globs in inputs/outputs might not expand (at all or fully) if they point to paths within subdatasets that are not yet installed.
I do not think we should be getting all those inputs etc. A "perfect" --dry-run could list the (sub)datasets to install and the number/size of files to get, but I think that would increase its run time considerably, making it less convenient for typical use cases, so it need not be done until a need is expressed.

Member

Good point, but I fear that the overhead to be able to state this clearly would be substantial. I would lean towards keeping it simple, for a start.

Contributor Author

I've added a sentence to the docstring mentioning that globs from uninstalled datasets aren't expanded.

mih commented Apr 1, 2021

First an impulse comment after having played with it for a bit....YES!

This alone will have enormous impact:

# wtf?!
% datalad run "for m in 1 2 3; do echo $m; done"
[INFO   ] == Command start (output follows) ===== 



[INFO   ] == Command exit (modification check follows) ===== 
# ah!
% datalad run --dry-run "for m in 1 2 3; do echo $m; done"
Dry run information
 location: /tmp/dummy
 command:
  for m in 1 2 3; do echo ; done
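The dry run makes visible that the calling shell, not `datalad`, replaced `$m` inside the double quotes before the command was handed over. The sketch below reproduces that effect by asking a POSIX `sh` what a program would receive for a given quoted word. It assumes `sh` is on `PATH`, and `as_seen_by_program` is a hypothetical helper written for this illustration.

```python
import os
import subprocess


def as_seen_by_program(shell_word):
    """Return the argument a program receives after sh processes shell_word.

    shell_word is one shell word including its quotes. Assumes `sh` is on PATH.
    """
    # Drop `m` from the environment so the variable is unset, as in the example.
    env = {k: v for k, v in os.environ.items() if k != "m"}
    result = subprocess.run(
        ["sh", "-c", "printf '%s' " + shell_word],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


double_quoted = as_seen_by_program('"for m in 1 2 3; do echo $m; done"')
single_quoted = as_seen_by_program("'for m in 1 2 3; do echo $m; done'")
print(double_quoted)  # for m in 1 2 3; do echo ; done
print(single_quoted)  # for m in 1 2 3; do echo $m; done
```

Single quotes keep `$m` literal, so the loop variable survives until the command is actually executed.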

Oh, yes...

(datalad-dev) mih@meiner /tmp/dummy (git)-[master] % dl run --dry-run -i [1-9]  some body
Dry run information
 location: /tmp/dummy
 expanded inputs:
  ['1']
 command:
  2 3 some body
(datalad-dev) mih@meiner /tmp/dummy (git)-[master] % dl run --dry-run -i '[1-9]'  some body
Dry run information
 location: /tmp/dummy
 expanded inputs:
  ['1', '2', '3']
 command:
  some body
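In the first invocation the unquoted `[1-9]` is expanded by the shell, so `-i` receives only `1` while `2 3` end up in the command. Quoting the pattern leaves the expansion to `run`. The snippet below approximates that second case with Python's `glob` module in a temporary directory; `expand_pattern` is a hypothetical helper and not DataLad's own glob handling.

```python
import glob
import os
import tempfile


def expand_pattern(directory, pattern):
    """Expand a glob pattern relative to directory and return sorted matches."""
    # Escape the directory part so only `pattern` is treated as a glob.
    matches = glob.glob(os.path.join(glob.escape(directory), pattern))
    return sorted(os.path.relpath(match, directory) for match in matches)


with tempfile.TemporaryDirectory() as tmp:
    # Recreate the files from the example: 1, 2 and 3.
    for name in ("1", "2", "3"):
        open(os.path.join(tmp, name), "w").close()
    expanded = expand_pattern(tmp, "[1-9]")

print(expanded)  # ['1', '2', '3']
```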

@mih mih left a comment

This is great! Thx @kyleam

I am still struggling with the API and (limited) potential for extending with different reporting strategies, as you already pointed out. I tried a few alternatives that all came out inferior to yours. It would be great to already have this implemented as an option that could take values, but doesn't have to. I have no good approach to suggest right now, though...

kyleam commented Apr 1, 2021

Thanks for the feedback, @yarikoptic and @mih.

@mih:

It would be great to already have this implemented as an option that could take values, but doesn't have to. I have no good approach to suggest right now, though...

Moving to an option with values sounds fine to me. My thinking was that using a bare option (until there are concrete ideas for other modes) doesn't block adding values because we could tack on optional values later (in the same way status --annex maps to status --annex=basic).

mih commented Apr 2, 2021

But it is difficult here. The command will pretty much always be last. I tried and the potential for confusion is high (swallow first item of a command).

kyleam commented Apr 2, 2021

I tried and the potential for confusion is high (swallow first item of a command).

Ah, true, that's a good reason to avoid the optional value approach.
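The "swallowing" problem can be reproduced with plain `argparse`. This is a toy parser, not DataLad's real command-line setup: the program name `run-sketch`, the `make_parser` helper, and the use of `const="basic"` are assumptions made for illustration.

```python
import argparse


def make_parser(optional_value):
    """Build a toy parser; this is NOT DataLad's actual parser."""
    parser = argparse.ArgumentParser(prog="run-sketch")
    if optional_value:
        # Bare --dry-run would mean "basic", but a value may follow it.
        parser.add_argument("--dry-run", nargs="?", const="basic", default=None)
    else:
        # A value is always required, so nothing is consumed by accident.
        parser.add_argument("--dry-run", choices=["basic", "command"], default=None)
    # Everything after the options is the command to run.
    parser.add_argument("command", nargs=argparse.REMAINDER)
    return parser


swallowed = make_parser(optional_value=True).parse_args(["--dry-run", "cat", "foo"])
explicit = make_parser(optional_value=False).parse_args(
    ["--dry-run", "basic", "cat", "foo"]
)

print(swallowed.dry_run, swallowed.command)  # cat ['foo']
print(explicit.dry_run, explicit.command)    # basic ['cat', 'foo']
```

With an optional value, the first word of the command is taken as the option's argument, which is the confusion described above. Requiring a value avoids the ambiguity.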

kyleam commented Apr 2, 2021

It would be great to already have this implemented as an option that could take values, but doesn't have to

I've switched the option over to taking values and added a mode that shows only the expanded command. Given the issue you pointed out with extending a flag later in this case, I think supporting values right away is the way to go, though now I'm on the fence about whether --report would be a better name than --dry-run.

kyleam commented Apr 2, 2021

The hirni failure is the same one that has been occurring in gh-5534.

https://github.com/datalad/datalad/pull/5539/checks?check_run_id=2254534141

error
datalad_hirni.tests.test_demos.test_demo_repro_analysis ... /tmp/datalad_temp_test_demo_repro_analysis9hlb8uij/sourcedata/code/hirni-toolbox/procedures/copy-converter.sh: line 9: $1: unbound variable
ERROR
[...]
======================================================================
ERROR: datalad_hirni.tests.test_demos.test_demo_repro_analysis
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/hostedtoolcache/Python/3.7.10/x64/lib/python3.7/site-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/opt/hostedtoolcache/Python/3.7.10/x64/lib/python3.7/site-packages/datalad/tests/utils.py", line 757, in _wrap_with_tempfile
    return t(*(arg + (filename,)), **kw)
  File "/opt/hostedtoolcache/Python/3.7.10/x64/lib/python3.7/site-packages/datalad/tests/utils.py", line 757, in _wrap_with_tempfile
    return t(*(arg + (filename,)), **kw)
  File "/opt/hostedtoolcache/Python/3.7.10/x64/lib/python3.7/site-packages/datalad_hirni/tests/utils.py", line 257, in newfunc
    return f(*(arg + (new_url,)), **kw)
  File "/opt/hostedtoolcache/Python/3.7.10/x64/lib/python3.7/site-packages/datalad_hirni/tests/test_demos.py", line 156, in test_demo_repro_analysis
    anonymize=True)
  File "/opt/hostedtoolcache/Python/3.7.10/x64/lib/python3.7/site-packages/datalad/distribution/dataset.py", line 505, in apply_func
    return f(**kwargs)
  File "/opt/hostedtoolcache/Python/3.7.10/x64/lib/python3.7/site-packages/datalad/interface/utils.py", line 482, in eval_func
    return return_func(generator_func)(*args, **kwargs)
  File "/opt/hostedtoolcache/Python/3.7.10/x64/lib/python3.7/site-packages/datalad/interface/utils.py", line 470, in return_func
    results = list(results)
  File "/opt/hostedtoolcache/Python/3.7.10/x64/lib/python3.7/site-packages/datalad/interface/utils.py", line 401, in generator_func
    allkwargs):
  File "/opt/hostedtoolcache/Python/3.7.10/x64/lib/python3.7/site-packages/datalad/interface/utils.py", line 557, in _process_results
    for res in results:
  File "/opt/hostedtoolcache/Python/3.7.10/x64/lib/python3.7/site-packages/datalad_hirni/commands/spec2bids.py", line 268, in __call__
    return_type='generator'
  File "/opt/hostedtoolcache/Python/3.7.10/x64/lib/python3.7/site-packages/datalad/interface/utils.py", line 401, in generator_func
    allkwargs):
  File "/opt/hostedtoolcache/Python/3.7.10/x64/lib/python3.7/site-packages/datalad/interface/utils.py", line 557, in _process_results
    for res in results:
  File "/opt/hostedtoolcache/Python/3.7.10/x64/lib/python3.7/site-packages/datalad/interface/run_procedure.py", line 451, in __call__
    return_type='generator'
  File "/opt/hostedtoolcache/Python/3.7.10/x64/lib/python3.7/site-packages/datalad/interface/utils.py", line 401, in generator_func
    allkwargs):
  File "/opt/hostedtoolcache/Python/3.7.10/x64/lib/python3.7/site-packages/datalad/interface/utils.py", line 557, in _process_results
    for res in results:
  File "/opt/hostedtoolcache/Python/3.7.10/x64/lib/python3.7/site-packages/datalad/core/local/run.py", line 275, in __call__
    dry_run=dry_run):
  File "/opt/hostedtoolcache/Python/3.7.10/x64/lib/python3.7/site-packages/datalad/core/local/run.py", line 791, in run_command
    raise exc
  File "/opt/hostedtoolcache/Python/3.7.10/x64/lib/python3.7/site-packages/datalad/core/local/run.py", line 556, in _execute_command
    command
  File "/opt/hostedtoolcache/Python/3.7.10/x64/lib/python3.7/site-packages/datalad/cmd.py", line 412, in run
    **results,
datalad.support.exceptions.CommandError: CommandError: 'bash /tmp/datalad_temp_test_demo_repro_analysis9hlb8uij/sourcedata/code/hirni-toolbox/procedures/copy-converter.sh ' failed with exitcode 1 under /tmp/datalad_temp_test_demo_repro_analysis9hlb8uij

@yarikoptic
Member

The Windows failure seems to be legit, but then why in only one of the runs?

======================================================================
FAIL: datalad.core.local.tests.test_run.test_dry_run
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Python39-x64\lib\site-packages\nose\case.py", line 198, in runTest
    self.test(*self.arg)
  File "C:\Python39-x64\lib\site-packages\datalad\tests\utils.py", line 578, in _wrap_with_tree
    return t(*(arg + (d,)), **kw)
  File "C:\Python39-x64\lib\site-packages\datalad\core\local\tests\test_run.py", line 628, in test_dry_run
    assert_in(
AssertionError: 'blah "sub/baz"' not found in 'Dry run information\n location: C:\\DLTMP\\datalad_temp_tree_6m7w12oz\n\n expanded inputs:\n\n  [\'sub\\\\baz\']\n\n command:\n\n  blah {inputs}\n\n expanded command:\n\n  blah "sub\\baz"\n'

kyleam commented Apr 7, 2021

The Windows failure seems to be legit

I somehow overlooked that failure. Thanks!

but then why in only one of the runs?

Of the Windows jobs, I think that's the only one that runs the datalad.core tests.
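The assertion in that traceback expected `blah "sub/baz"`, while the report on Windows contained `blah "sub\baz"`. The difference comes down to the platform's path separator, which the following sketch shows with `pathlib`'s pure path classes so it behaves the same on any system. `expanded_command` is a hypothetical helper, and this thread does not show how the test was actually fixed.

```python
from pathlib import PurePosixPath, PureWindowsPath


def expanded_command(path_cls):
    """Format `blah "<path>"` using the separator of the given path flavour."""
    path = path_cls("sub") / "baz"
    return 'blah "{}"'.format(path)


posix_cmd = expanded_command(PurePosixPath)
windows_cmd = expanded_command(PureWindowsPath)
print(posix_cmd)    # blah "sub/baz"
print(windows_cmd)  # blah "sub\baz"
```

A test that builds its expected string from the platform's own separator (for example via `os.path.join`) would match on both systems.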

run() puts the run information into a dictionary right before it
formats the JSON record for the commit message or sidecar file.
Instead do it before the command execution so that this information
can be used in the dry run mode added by the next commit.

As suggested by @mih, adding a way to easily view information about
the execution, particularly the expanded command, 'could shorten the
"design phase" of a run command'.

Closes dataladgh-5538

Thanks, @yarikoptic, for noting that this was untested/undocumented.

It's likely that the need for more modes will come up in the future,
and, as pointed out by @mih, tacking on optional values later is
awkward/confusing because '--dry-run command arg ...' would take
"command" as the argument to --dry-run.

So, put the current mode under "basic" (inspired by `status
--annex=basic`), and add a mode that shows just the expanded command.

If --dry-run=basic ends up being too verbose, we can add an empty
string value as a shortcut.

As of 56bc402 (Use dedicated action name for dry-run operation,
2021-04-08), create_sibling_github(..., dry_run=True) appends
"[dry-run]" to its action name.  Do the same for consistency and to
reduce the chance that result hooks are unintentionally triggered.

kyleam commented Apr 26, 2021

I've rebased to resolve conflicts and also appended "[dry-run]" to the action name to follow the recent change in create-sibling-github.
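The two structural ideas from the commit messages, assembling the run information before execution and reporting dry runs under a separate action name, can be sketched as follows. Everything here is hypothetical: `run_sketch`, `fake_execute`, and the result keys are assumptions for illustration and do not reproduce the code in `datalad/core/local/run.py`.

```python
def run_sketch(cmd, inputs, outputs, execute, dry_run=None):
    """Yield a result record, or only describe the command when dry_run is set.

    Hypothetical structure; function name and record keys are assumptions.
    """
    # Assemble the run information up front so both code paths can use it.
    run_info = {
        "cmd": cmd,
        "inputs": inputs,
        "outputs": outputs,
        "expanded_cmd": cmd.format(inputs=inputs, outputs=outputs),
    }
    if dry_run:
        # A distinct action name keeps hooks that match "run" from firing.
        yield {"action": "run [dry-run]", "status": "ok", "dry_run_info": run_info}
        return
    exit_code = execute(run_info["expanded_cmd"])
    status = "ok" if exit_code == 0 else "error"
    yield {"action": "run", "status": status, "run_info": run_info}


executed = []


def fake_execute(command):
    """Stand-in for real execution: record the command and report success."""
    executed.append(command)
    return 0


results = list(
    run_sketch(
        "cat {inputs[0]} >{outputs[0]}",
        ["foo"],
        ["out"],
        execute=fake_execute,
        dry_run="basic",
    )
)
print(results[0]["action"])                        # run [dry-run]
print(results[0]["dry_run_info"]["expanded_cmd"])  # cat foo >out
print(executed)                                    # []
```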

kyleam commented Apr 30, 2021

Thanks for the review, @yarikoptic.

@mih Are you okay with the current state, in particular with --dry-run={basic,command}?

@yarikoptic
Member

Eh, codespell has already started to negatively affect the DX... hopefully that will not happen often ;-)

mih commented May 4, 2021

I like it! thx much!

@mih mih merged commit ff61d7b into datalad:master May 4, 2021
@kyleam kyleam deleted the run-dryrun branch May 4, 2021 22:22