Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OPT+ENH: determine config to query for procedure once, add more "log(2," to eval_func #3576

Merged
merged 4 commits into from Aug 2, 2019

Conversation

@yarikoptic
Copy link
Member

commented Aug 1, 2019

Original OPT issue: #3575 . This PR addresses only one aspect, and doesn't touch upon relatively slow .config and .is_installed which are yet to be figured out and probably should be properly "lazied" at the dataset level since we are doing this over and over for every decorated function in the same dataset!

Those added debug statements seems to have no cost but could help determining unexpected
slow downs

For me it is cutting down time for a dummy datalad run from 2.9 to 2.2 sec (so ~20% boost! ;))

~datalad/datalad > time DATALAD_LOG_TIMESTAMP=1 datalad run --explicit --input benchmarks echo nothing todo 2>&1 | tools/dtime 
 1280 2019-08-01 11:40:50,695 [INFO] Making sure inputs are available (this may take some time) 
    - nothing todo
    3 2019-08-01 11:40:51,975 [INFO] == Command start (output follows) ===== 
----- 2019-08-01 11:40:51,978 [INFO] == Command exit (modification check follows) ===== 
DATALAD_LOG_TIMESTAMP=1 datalad run --explicit --input benchmarks echo nothin  2.02s user 0.21s system 100% cpu 2.221 total
tools/dtime  0.01s user 0.01s system 0% cpu 2.225 total

and I do not expect negative effect on other usecases.

OPT+ENH: determine config to query for procedure once, add more "log(…
…2," to eval_func

Original OPT issue: #3575
Those debug statements seems to have no cost but could help determining unexpected
slow downs

@yarikoptic yarikoptic added this to the Release 0.11.7 milestone Aug 1, 2019

BF: eval_func: Fix spec value from previous commit (ce4c66f)
The next line adds `proc_cfg` and expects `spec` to be a plain list.
@codecov

This comment has been minimized.

Copy link

commented Aug 2, 2019

Codecov Report

Merging #3576 into 0.11.x will decrease coverage by 18.2%.
The diff coverage is 86.66%.

Impacted file tree graph

@@             Coverage Diff             @@
##           0.11.x    #3576       +/-   ##
===========================================
- Coverage   77.29%   59.08%   -18.21%     
===========================================
  Files         255      255               
  Lines       33641    33645        +4     
===========================================
- Hits        26002    19879     -6123     
- Misses       7639    13766     +6127
Impacted Files Coverage Δ
datalad/interface/utils.py 75.47% <86.66%> (-0.6%) ⬇️
datalad/interface/tests/test_save.py 16.3% <0%> (-83.7%) ⬇️
datalad/ui/tests/test_base.py 16.66% <0%> (-83.34%) ⬇️
datalad/distribution/tests/test_dataset.py 17.6% <0%> (-82.4%) ⬇️
datalad/metadata/tests/test_aggregation.py 17.84% <0%> (-81.23%) ⬇️
datalad/distribution/tests/test_siblings.py 19.14% <0%> (-80.86%) ⬇️
datalad/distribution/tests/test_dataset_binding.py 19.23% <0%> (-80.77%) ⬇️
datalad/interface/tests/test_base.py 19.48% <0%> (-80.52%) ⬇️
datalad/interface/tests/test_diff.py 22.03% <0%> (-77.97%) ⬇️
datalad/interface/tests/test_ls_webui.py 15.5% <0%> (-77.31%) ⬇️
... and 105 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ebb8b93...fe9bff6. Read the comment docs.

@codecov

This comment has been minimized.

Copy link

commented Aug 2, 2019

Codecov Report

Merging #3576 into 0.11.x will decrease coverage by 18.2%.
The diff coverage is 86.66%.

Impacted file tree graph

@@             Coverage Diff             @@
##           0.11.x    #3576       +/-   ##
===========================================
- Coverage   77.29%   59.08%   -18.21%     
===========================================
  Files         255      255               
  Lines       33641    33645        +4     
===========================================
- Hits        26002    19879     -6123     
- Misses       7639    13766     +6127
Impacted Files Coverage Δ
datalad/interface/utils.py 75.47% <86.66%> (-0.6%) ⬇️
datalad/interface/tests/test_save.py 16.3% <0%> (-83.7%) ⬇️
datalad/ui/tests/test_base.py 16.66% <0%> (-83.34%) ⬇️
datalad/distribution/tests/test_dataset.py 17.6% <0%> (-82.4%) ⬇️
datalad/metadata/tests/test_aggregation.py 17.84% <0%> (-81.23%) ⬇️
datalad/distribution/tests/test_siblings.py 19.14% <0%> (-80.86%) ⬇️
datalad/distribution/tests/test_dataset_binding.py 19.23% <0%> (-80.77%) ⬇️
datalad/interface/tests/test_base.py 19.48% <0%> (-80.52%) ⬇️
datalad/interface/tests/test_diff.py 22.03% <0%> (-77.97%) ⬇️
datalad/interface/tests/test_ls_webui.py 15.5% <0%> (-77.31%) ⬇️
... and 105 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ebb8b93...4601749. Read the comment docs.

@kyleam
kyleam approved these changes Aug 2, 2019
@@ -363,33 +364,39 @@ def eval_func(wrapped, instance, args, kwargs):
def _result_filter(res):
return result_filter(res, **allkwargs)

def _get_procedure_specs(param_key=None, cfg_key=None, ds=None):
def _get_procedure_specs(param_key=None, cfg_key=None, ds=None, proc_cfg=None):

This comment has been minimized.

Copy link
@kyleam

kyleam Aug 2, 2019

Member

The additional parameter is fine by me, though I have a slight preference for just defining proc_cfg above _get_procedure_specs.

This comment has been minimized.

Copy link
@yarikoptic

yarikoptic Aug 2, 2019

Author Member

that is how I did it first, but then it should be not a pure variable since you can't rebind it inside, but some kind of a container (I had list). Just looks ugly etc, so decided to make it explicit

@yarikoptic

This comment has been minimized.

Copy link
Member Author

commented Aug 2, 2019

no travis report here, but it is there and I seems screwed up

1114======================================================================
1115ERROR: datalad.distribution.tests.test_create.test_create_withprocedure
1116----------------------------------------------------------------------
1117Traceback (most recent call last):
1118  File "/home/travis/virtualenv/python2.7.15/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
1119    self.test(*self.arg)
1120  File "/home/travis/virtualenv/python2.7.15/lib/python2.7/site-packages/datalad/tests/utils.py", line 615, in newfunc
1121    return t(*(arg + (filename,)), **kw)
1122  File "/home/travis/virtualenv/python2.7.15/lib/python2.7/site-packages/datalad/distribution/tests/test_create.py", line 320, in test_create_withprocedure
1123    proc_post=[['cfg_metadatatypes', 'xmp', 'datacite']])
1124  File "/home/travis/virtualenv/python2.7.15/lib/python2.7/site-packages/datalad/interface/utils.py", line 399, in eval_func
1125    proc_cfg=proc_cfg)
1126ValueError: need more than 1 value to unpack

so will fix it now

@kyleam

This comment has been minimized.

Copy link
Member

commented Aug 2, 2019

so will fix it now

I've already fixed it in fe9bff6.

Edit: Travis is just starting to run for that push here.

@kyleam

This comment has been minimized.

Copy link
Member

commented Aug 2, 2019

so will fix it now

I've already fixed it in fe9bff6.

Sorry, ignore me. I fixed another error that the initial commit caused in test_run_procedure, but it looks like this is a different source.

@yarikoptic

This comment has been minimized.

Copy link
Member Author

commented Aug 2, 2019

then I guess there more! I should have indeed went the "outside variable" way to make it less fragile :-/

$> git show                                        
commit 4601749db64e9d0e45b81b68f8086f9feb422d4c (HEAD -> opt-doitonce)
Author: Yaroslav Halchenko <debian@onerussian.com>
Date:   Fri Aug 2 09:42:04 2019 -0400

    BF: eval_func: One more fix to return value

diff --git a/datalad/interface/utils.py b/datalad/interface/utils.py
index ea510f598..0a169aaae 100644
--- a/datalad/interface/utils.py
+++ b/datalad/interface/utils.py
@@ -369,7 +369,7 @@ def eval_results(func):
             spec = common_params.get(param_key, None)
             if spec is not None:
                 # this is already a list of lists
-                return spec
+                return spec, proc_cfg
 
             if not proc_cfg:
                 # .is_installed and .config can be costly, so ensure we do
changes on filesystem:                                                                                                                                                                                                            
 .asv | 0
(dev) 1 15626.....................................:Fri 02 Aug 2019 09:50:14 AM EDT:.
(git)hopa:~datalad/datalad[opt-doitonce]git
$> git range-diff gh-yarikoptic/opt-doitonce...HEAD
-:  --------- > 1:  4601749db BF: eval_func: One more fix to return value

why range-diff doesn't show the diff? here is no rewrite but a new commit

pushing

@kyleam

This comment has been minimized.

Copy link
Member

commented Aug 2, 2019

why range-diff doesn't show the diff? here is no rewrite but a new commit

range-diff won't show the diff for added or dropped commits, only those for matched patches, which means it's not particularly useful (or needed) when you're not rewriting a series.

@kyleam

This comment has been minimized.

Copy link
Member

commented Aug 2, 2019

commit 4601749db64e9d0e45b81b68f8086f9feb422d4c (HEAD -> opt-doitonce)
Author: Yaroslav Halchenko <debian@onerussian.com>
Date:   Fri Aug 2 09:42:04 2019 -0400

    BF: eval_func: One more fix to return value

diff --git a/datalad/interface/utils.py b/datalad/interface/utils.py
index ea510f598..0a169aaae 100644
--- a/datalad/interface/utils.py
+++ b/datalad/interface/utils.py
@@ -369,7 +369,7 @@ def eval_results(func):
             spec = common_params.get(param_key, None)
             if spec is not None:
                 # this is already a list of lists
-                return spec
+                return spec, proc_cfg

Looks good. I missed that in my fixup commit.

@kyleam kyleam merged commit 4601749 into datalad:0.11.x Aug 2, 2019

2 of 3 checks passed

continuous-integration/appveyor/pr AppVeyor build failed
Details
WIP Ready for review
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
kyleam added a commit that referenced this pull request Aug 2, 2019

@yarikoptic yarikoptic deleted the yarikoptic:opt-doitonce branch Aug 9, 2019

yarikoptic added a commit that referenced this pull request Sep 6, 2019
Merge tag '0.11.7' into debian
0.11.7 (Sep 02, 2019) -- python2-we-still-love-you-but-...

Primarily bugfixes with some optimizations and refactorings.

 Fixes

- [addurls][]
  - now provides better handling when the URL file isn't in the
    expected format.  ([#3579][])
  - always considered a relative file for the URL file argument as
    relative to the current working directory, which goes against the
    convention used by other commands of taking relative paths as
    relative to the dataset argument.  ([#3582][])

- [run-procedure][]
  - hard coded "python" when formatting the command for non-executable
    procedures ending with ".py".  `sys.executable` is now used.
    ([#3624][])
  - failed if arguments needed more complicated quoting than simply
    surrounding the value with double quotes.  This has been resolved
    for systems that support `shlex.quote`, but note that on Windows
    values are left unquoted. ([#3626][])

- [siblings][] now displays an informative error message if a local
  path is given to `--url` but `--name` isn't specified.  ([#3555][])

- [sshrun][], the command DataLad uses for `GIT_SSH_COMMAND`, didn't
  support all the parameters that Git expects it to.  ([#3616][])

- Fixed a number of Unicode py2-compatibility issues. ([#3597][])

 Enhancements and new features

- The [annotate-paths][] helper now caches subdatasets it has seen to
  avoid unnecessary calls.  ([#3570][])

- A repeated configuration query has been dropped from the handling of
  `--proc-pre` and `--proc-post`.  ([#3576][])

- Calls to `git annex find` now use `--in=.` instead of the alias
  `--in=here` to take advantage of an optimization that git-annex (as
  of the current release, 7.20190730) applies only to the
  former. ([#3574][])

- [addurls][] now suggests close matches when the URL or file format
  contains an unknown field.  ([#3594][])

- Shared logic used in the setup.py files of Datalad and its
  extensions has been moved to modules in the _datalad_build_support/
  directory.  ([#3600][])

- Get ready for upcoming git-annex dropping support for direct mode
  ([#3631][])

* tag '0.11.7': (87 commits)
  DOC: Added an entry to changelogn on merged 3631
  ENH: finalizing changelog for 0.11.7
  TST: Update tests for a git-annex without direct mode
  TST: utils: Add decorator that skips when direct mode is unsupported
  ENH: annexrepo: Refuse to initialize in direct mode if unsupported
  ENH: annexrepo: Add check_direct_mode_support method
  BF+TST: Avoid leaking patched git-annex version
  TST+RF: test_annexrepo: Split up a test
  CHANGELOG.md: Second batch for 0.11.7
  TST: run_procedure: Mark test_spaces() as known Windows failure
  TST: run_procedure: Mark test_quoting as known windows failure
  TST: run_procedure: Test more arguments that need quoting
  BF(py2): run_procedure: Avoid encoding error in log message
  TST: add run_procedure test with spaces in file name
  TST/RF: non-hardcoded Python executable
  RF: newline at end of file
  RF: helper instead of conditional
  RF: remove superfluous imports
  BF/TST: remove quoting
  ENH: replace conditionals with helper function
  ...
yarikoptic added a commit that referenced this pull request Sep 6, 2019
Merge tag '0.11.7' into debian
0.11.7 (Sep 02, 2019) -- python2-we-still-love-you-but-...

Primarily bugfixes with some optimizations and refactorings.

 Fixes

- [addurls][]
  - now provides better handling when the URL file isn't in the
    expected format.  ([#3579][])
  - always considered a relative file for the URL file argument as
    relative to the current working directory, which goes against the
    convention used by other commands of taking relative paths as
    relative to the dataset argument.  ([#3582][])

- [run-procedure][]
  - hard coded "python" when formatting the command for non-executable
    procedures ending with ".py".  `sys.executable` is now used.
    ([#3624][])
  - failed if arguments needed more complicated quoting than simply
    surrounding the value with double quotes.  This has been resolved
    for systems that support `shlex.quote`, but note that on Windows
    values are left unquoted. ([#3626][])

- [siblings][] now displays an informative error message if a local
  path is given to `--url` but `--name` isn't specified.  ([#3555][])

- [sshrun][], the command DataLad uses for `GIT_SSH_COMMAND`, didn't
  support all the parameters that Git expects it to.  ([#3616][])

- Fixed a number of Unicode py2-compatibility issues. ([#3597][])

- [download-url][] now will create leading directories of the output path
  if they do not exist ([#3646][])

 Enhancements and new features

- The [annotate-paths][] helper now caches subdatasets it has seen to
  avoid unnecessary calls.  ([#3570][])

- A repeated configuration query has been dropped from the handling of
  `--proc-pre` and `--proc-post`.  ([#3576][])

- Calls to `git annex find` now use `--in=.` instead of the alias
  `--in=here` to take advantage of an optimization that git-annex (as
  of the current release, 7.20190730) applies only to the
  former. ([#3574][])

- [addurls][] now suggests close matches when the URL or file format
  contains an unknown field.  ([#3594][])

- Shared logic used in the setup.py files of Datalad and its
  extensions has been moved to modules in the _datalad_build_support/
  directory.  ([#3600][])

- Get ready for upcoming git-annex dropping support for direct mode
  ([#3631][])

* tag '0.11.7':
  Changelog entry for download-url paths handling
  ENH: downloaders: Ensure directories for target exist
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants
You can’t perform that action at this time.