Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BF+ENH+OPT: @memoize, correct config target for logging outputs #4194

Merged
merged 3 commits into from Feb 28, 2020

Conversation

yarikoptic
Copy link
Member

@yarikoptic yarikoptic commented Feb 25, 2020

Since CI is too busy, decided to just bundle in a single PR. If needed -- could split into two or three if needed (ideally should have added a test for the DATALAD_LOG_OUTPUTS env var actually having an effect).

See individual commits for possibly more info ;-)

datalad/utils.py Outdated
@@ -1081,6 +1081,31 @@ def wrapped_func(*args, **kwargs):
return wrapped_func


@optional_args
def memoize(f):
Copy link
Collaborator

@kyleam kyleam Feb 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use @functools.lru_cache rather than adding custom code?

Copy link
Member Author

@yarikoptic yarikoptic Feb 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ha ha -- because I forgot about that one! will abandon this bastard child here and RF to use it

datalad/config.py Outdated Show resolved Hide resolved
@codecov
Copy link

codecov bot commented Feb 25, 2020

Codecov Report

No coverage uploaded for pull request base (maint@8da33b0). Click here to learn what that means.
The diff coverage is 100%.

Impacted file tree graph

@@           Coverage Diff            @@
##             maint    #4194   +/-   ##
========================================
  Coverage         ?   89.26%           
========================================
  Files            ?      275           
  Lines            ?    36791           
  Branches         ?        0           
========================================
  Hits             ?    32841           
  Misses           ?     3950           
  Partials         ?        0
Impacted Files Coverage Δ
datalad/cmd.py 89.62% <ø> (ø)
datalad/config.py 96.54% <100%> (ø)
datalad/support/external_versions.py 95.62% <100%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8da33b0...d1e3761. Read the comment docs.

…f logging parameters

Took me awhile to dig it out although bug was really there in the code staring into my face
@kyleam kyleam changed the base branch from master to maint Feb 28, 2020
@kyleam kyleam force-pushed the enh-memoize-git-version branch from f74acab to 9bffa97 Compare Feb 28, 2020
@kyleam
Copy link
Collaborator

kyleam commented Feb 28, 2020

I've switched this to use lru_cache. Note that the previous version with the custom decorator wasn't actually effective because each ConfigManager instance passed a new GitRunner, so there were no common arguments to cache. To give some numbers: when I ran core/local/tests/test_save.py off the old base of this PR (6fe0041), there were 241 calls to get_git_version. With your change in d76b4be, there were 239. The current push gets it down to 2.

range-diff
1:  1a5fff13b2 < -:  ---------- NF: simplistic @memoize
3:  f74acab69f = 1:  1a578f4789 BF: it is datalad.log not datalad.log.cmd section to decide destiny of logging parameters
-:  ---------- > 2:  bc244f9acf RF: config: Make passing a runner to get_git_version() optional
2:  d76b4bee80 ! 3:  9bffa97917 OPT: use @memoize for config.get_git_version
    @@ Metadata
     Author: Yaroslav Halchenko <debian@onerussian.com>
     
      ## Commit message ##
    -    OPT: use @memoize for config.get_git_version
    +    OPT: use lru_cache for config.get_git_version
     
         I saw us invoking git version  over and over again.  There really should be
         no reason to sample that "densly" and IMHO it should be safe to assume that
    @@ Commit message
         2016 .  I think it is most sensible to assume that we would have more recent
         one, especially given that we need git annex >= 201905 ATM
     
    +    Modified-by: Kyle Meyer <kyle@kyleam.com>
    +      Switched from custom decorator to lru_cache.
    +
      ## datalad/config.py ##
     @@
    - )
    - from datalad.cmd import GitRunner
    - from datalad.dochelpers import exc_str
    -+from datalad.utils import memoize
    - from distutils.version import LooseVersion
    + """
    + """
      
    - import re
    ++from functools import lru_cache
    + import datalad
    + from datalad.consts import (
    +     DATASET_CONFIG_FILE,
     @@
      
      # we cannot import external_versions here, as the cfg comes before anything
      # and we would have circular imports
    -+@memoize
    - def get_git_version(runner):
    ++@lru_cache()
    + def get_git_version(runner=None):
          """Return version of available git"""
    -     return runner.run('git version'.split())[0].split()[2]
    +     runner = runner or GitRunner()

@yarikoptic
Copy link
Member Author

yarikoptic commented Feb 28, 2020

Heh heh, I guess I checked/saw those 2 saved ones on a simple invocation and got satisfied ;-) thank you!

kyleam and others added 2 commits Feb 28, 2020
ConfigManager.__init__() and external_versions._get_git_version()
instantiate GitRunner() and pass it as an argument to
config.get_git_version().  This is fine, but not really necessary.
There's nothing distinct across these instances.  (The current working
directory can change, but that doesn't matter for 'git version'.)

The next commit will wrap get_git_version() with lru_cache().
Continuing to pass separate GitRunner instances would prevent the
cache from being effective because _every_ ConfigManager() instance
passes a unique runner.

Make the runner argument optional, and update ConfigManager.__init__()
and external_versions._get_git_version() to not use it.  Note that
reason we need to keep the argument around at all is because
external_versions._get_system_git_version() relies on passing it a
plain Runner() instance.
I saw us invoking git version  over and over again.  There really should be
no reason to sample that "densly" and IMHO it should be safe to assume that
it would not drastically change through the life time of the process.
Moreover, we are using it seems only to check if git  >= 2.8, which is from
2016 .  I think it is most sensible to assume that we would have more recent
one, especially given that we need git annex >= 201905 ATM

Modified-by: Kyle Meyer <kyle@kyleam.com>
  Switched from custom decorator to lru_cache.
@kyleam
Copy link
Collaborator

kyleam commented Feb 28, 2020

Windows builds should be back to a green state, so I'll do a no-op rewrite to trigger the tests and make sure no new windows issues were masked.

I've canceled the Travis build given an identical tree already passed here.

@kyleam kyleam force-pushed the enh-memoize-git-version branch from 9bffa97 to d1e3761 Compare Feb 28, 2020
@kyleam
Copy link
Collaborator

kyleam commented Feb 28, 2020

I'll do a no-op rewrite to trigger the tests and make sure no new windows issues were masked

Checks out.

My update of this branch to use lru_cache hasn't sat for very long, but the idea is the same as the initial iteration of the PR, which didn't receive any objections over a few days, so I'll go ahead with the merge.

@kyleam kyleam merged commit 6adb746 into datalad:maint Feb 28, 2020
14 of 15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants