Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Absorb rev-diff #3366

Merged
merged 8 commits into from May 2, 2019
Merged

Absorb rev-diff #3366

merged 8 commits into from May 2, 2019

Conversation

@kyleam
Copy link
Contributor

@kyleam kyleam commented Apr 30, 2019

Replace core's diff with -revolution's.  Update the few diff callers
to use the new variant, with the exception of annotate_paths.  Keep
the older diff around but hidden (not listed in an interface group and
named with an underscore) for annotate_paths.  We could update it to
use the new diff, but it's not worth the effort because it is only
used by add, create, and the save, which are on their way out.

This is copied from revolution's 38df2ddd2858fa8786b6fcca6c472ef9be8a53e8.
You can verify that there are no substantial changes by checking
datalad-revolution out at that commit, pointing $DL_REV_DIR to the
repository, and then running

  % git diff --no-index -- \
    $DL_REV_DIR/datalad_revolution/revdiff.py \
    datalad/core/local/diff.py

  % git diff --no-index -- \
    $DL_REV_DIR/datalad_revolution/tests/test_diff.py \
    datalad/core/local/tests/test_diff.py

Closes #3190.

kyleam added a commit to kyleam/datalad-revolution that referenced this issue Apr 30, 2019
@kyleam
Copy link
Contributor Author

@kyleam kyleam commented Apr 30, 2019

The tests are going to have at least one failure:


======================================================================
FAIL: datalad.tests.test_api.test_consistent_order_of_args(<class 'datalad.core.local.diff.Diff'>, {'path'})
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/kyle/src/python/venvs/datalad/lib/python3.5/site-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/home/kyle/src/python/datalad/datalad/tests/test_api.py", line 59, in _test_consistent_order_of_args
    eq_(set(args[:len(spec_posargs)]), spec_posargs)
AssertionError: {'fr'} != {'path'}

----------------------------------------------------------------------

I'll leave that to @yarikoptic and @mih to revisit and hash out.

@mih
Copy link
Member

@mih mih commented May 1, 2019

Thx @kyleam

Re arg order: I maintain my opinion that the tested condition is undesirable, but it is far more important to get this merged soon, hence I am OK with changing the order.

@codecov
Copy link

@codecov codecov bot commented May 1, 2019

Codecov Report

Merging #3366 into master will decrease coverage by 47.14%.
The diff coverage is 20.28%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master    #3366       +/-   ##
===========================================
- Coverage   91.13%   43.99%   -47.15%     
===========================================
  Files         263      265        +2     
  Lines       34210    34440      +230     
===========================================
- Hits        31177    15151    -16026     
- Misses       3033    19289    +16256
Impacted Files Coverage Δ
datalad/interface/__init__.py 100% <ø> (ø) ⬆️
datalad/distribution/publish.py 16.42% <0%> (-71.43%) ⬇️
datalad/interface/tests/test_diff.py 20.9% <0%> (-79.1%) ⬇️
datalad/interface/rerun.py 18.48% <0%> (-77.73%) ⬇️
datalad/interface/tests/test_run.py 37.7% <0%> (-62.14%) ⬇️
datalad/interface/annotate_paths.py 84.35% <100%> (-12.59%) ⬇️
datalad/interface/diff.py 84.37% <100%> (-13.13%) ⬇️
datalad/core/local/tests/test_diff.py 15.75% <15.75%> (ø)
datalad/core/local/diff.py 31.32% <31.32%> (ø)
datalad/core/local/tests/test_save.py 12.13% <0%> (-87.87%) ⬇️
... and 225 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 02e344a...a1a7028. Read the comment docs.

@codecov
Copy link

@codecov codecov bot commented May 1, 2019

Codecov Report

Merging #3366 into master will increase coverage by 0.11%.
The diff coverage is 98.16%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3366      +/-   ##
==========================================
+ Coverage   91.13%   91.24%   +0.11%     
==========================================
  Files         263      265       +2     
  Lines       34210    34452     +242     
==========================================
+ Hits        31177    31436     +259     
+ Misses       3033     3016      -17
Impacted Files Coverage Δ
datalad/interface/__init__.py 100% <ø> (ø) ⬆️
datalad/distribution/publish.py 87.85% <100%> (ø) ⬆️
datalad/interface/tests/test_diff.py 100% <100%> (ø) ⬆️
datalad/interface/rerun.py 96.2% <100%> (ø) ⬆️
datalad/interface/tests/test_run.py 99.83% <100%> (ø) ⬆️
datalad/interface/annotate_paths.py 96.94% <100%> (+0.01%) ⬆️
datalad/interface/diff.py 95.62% <100%> (-1.88%) ⬇️
datalad/core/local/diff.py 95% <95%> (ø)
datalad/core/local/tests/test_diff.py 99.39% <99.39%> (ø)
... and 6 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 02e344a...362d4ec. Read the comment docs.

mih
mih approved these changes May 1, 2019
Copy link
Member

@mih mih left a comment

Push the arg order change. Good to go.

@kyleam
Copy link
Contributor Author

@kyleam kyleam commented May 1, 2019

In the commit message I say that annotate_paths is only used in add, create, and save. I'm not sure how I came to that conclusion because, as indicated by #3368 and simple grepping, it's used in many other spots. I'll drop that misinformation from the message with the merge or with any needed updates (since I know @yarikoptic is currently reviewing these changes).

Copy link
Member

@yarikoptic yarikoptic left a comment

Largely minor comments and questions but I feel that some should better be addressed to avoid going fishing for the "cause" later on


# convert cmdline args into plain labels
if isinstance(fr, list):
fr = fr[0]
Copy link
Member

@yarikoptic yarikoptic May 1, 2019

I guess just for my own comprehension -- why do we need nargs=1 above which is what I guess possibly leads to a list here? (note that the if body isn't even triggered by the tests)

Copy link
Member

@mih mih May 2, 2019

I cannot recall, or construct a reason. Removed.

yield r

@staticmethod
def custom_result_renderer(res, **kwargs): # pragma: no cover
Copy link
Member

@yarikoptic yarikoptic May 1, 2019

I dislike those pragma: no cover for pieces of code which can and should be tested. If we do not test them, that is "ok" but I think we shouldn't pretend that they shouldn't be

Copy link
Member

@mih mih May 2, 2019

I recently started doing that. The reason is that I can see right in the code what does not have a dedicated test (i.e. a test that is intentionally written for a code piece). I find that coverage has fooled me frequently into thinking things are being tested. I dont see it as pretending, rather than a declaration of test absence. I cannot come up with a reason why any piece of code should not be tested.

Copy link
Member

@yarikoptic yarikoptic May 2, 2019

Just visit codecov and you will see all lines which aren't covered by tests. Such pragma comments is intended to be used for code which isn't/can't be tested - those are ignored while computing % of code covered

Copy link
Member

@mih mih May 3, 2019

FTR: I know that I can get lines that are not executed at all from coverage reports. What I described is something different. I want to mark a code block that I know is not tested systematically (it may be executed, hence "covered", but not under all relevant circumstances). An increasingly relevant issue for me are list comprehensions (that I started to use much more frequently). They are generally full of conditionals, but regardless of whether I test them, the list comprehension is always covered completely. For the case that you commented on that distinction is not relevant, because as of now that code is not executed by the tests (I think). But I don't care much if it is. The point is that even if executed, it is not tested.

Copy link
Member

@yarikoptic yarikoptic May 3, 2019

ah, thanks for the explanation of the purpose. So we should then come up with some additional/different annotation for those to not confuse with "exclude from what needs to be covered" which is what pragma: no cover is used for. May be pragma: more cover or alike?

BTW, just discovered to myself ["branch coverage"](https://coverage.readthedocs.io/en/coverage-4.3.4/branch.html#branch) in coverage.py.
$> python-coverage run cov.py         
(dev) 1 10091.....................................:Fri 03 May 2019 11:32:29 AM EDT:.
hopa:/tmp
$> python-coverage report    
Name     Stmts   Miss  Cover
----------------------------
cov.py       8      0   100%
(dev) 1 10092.....................................:Fri 03 May 2019 11:32:31 AM EDT:.
hopa:/tmp
$> python-coverage run --branch cov.py
(dev) 1 10093.....................................:Fri 03 May 2019 11:32:38 AM EDT:.
hopa:/tmp
$> python-coverage report             
Name     Stmts   Miss Branch BrPart  Cover
------------------------------------------
cov.py       8      0      2      1    90%
(dev) 1 10094.....................................:Fri 03 May 2019 11:32:40 AM EDT:.
hopa:/tmp
$> python-coverage annotate           
(dev) 1 10095.....................................:Fri 03 May 2019 11:32:55 AM EDT:.
hopa:/tmp
$> cat cov.py,cover        
> def my_partial_fn(x):       # line 1
>     if x:                   #      2
>         y = 10              #      3
>     return y                #      4
  
> y = 1
> my_partial_fn(y)
  # list comprehension where portion is never triggered as well
> [i for i in range(2, 10) if i > y]
  # or assignment
> z = y or None

Copy link
Member

@yarikoptic yarikoptic May 3, 2019

#3382 to continue on this topic


def _diff_ds(ds, fr, to, constant_refs, recursion_level, origpaths, untracked,
annexinfo, eval_file_type, cache):
if not ds.repo:
Copy link
Member

@yarikoptic yarikoptic May 1, 2019

Suggested change
if not ds.repo:
if not ds.is_installed():

Copy link
Member

@mih mih May 2, 2019

Is it faster?

Copy link
Member

@mih mih May 2, 2019

Speed seems to be exactly the same.

Copy link
Member

@mih mih May 2, 2019

Adjusted.

Copy link
Member

@yarikoptic yarikoptic May 2, 2019

my comment was not about speed but about coded semantic/assumptions -- are you really care about .repo being there or dataset being installed (whatever that means and regardless how it is assessed)? I thought that it about it being installed.

untracked=untracked,
eval_file_type=eval_file_type,
_cache=cache)
except ValueError as e:
Copy link
Member

@yarikoptic yarikoptic May 1, 2019

I would say we better define a custom InvalidReferenceError(ValueError) (to be thrown in GitRepo.get_content_info) and avoid matching string representation of an exception here - fragile etc.

Copy link
Member

@mih mih May 2, 2019

If you know how to do it without string matching, please let me know. Git doesn't give me an exception.

Copy link
Member

@mih mih May 2, 2019

status='impossible',
message=estr,
)
return
Copy link
Member

@yarikoptic yarikoptic May 1, 2019

shouldn't it be reraised otherwise?

Copy link
Member

@mih mih May 2, 2019

Done.

)
return

if annexinfo and hasattr(ds.repo, 'get_content_annexinfo'):
Copy link
Member

@yarikoptic yarikoptic May 1, 2019

why is it not just

Suggested change
if annexinfo and hasattr(ds.repo, 'get_content_annexinfo'):
if annexinfo and isinstance(ds.repo, AnnexRepo):

? (the same check exists also in datalad/core/local/status.py) Is it left for some compatibility with pre-0.12rc datalad which did not have it?

Copy link
Member

@mih mih May 2, 2019

There was a time, when AnnexRepo was kept intentionally uneducated ;-)

Copy link
Member

@mih mih May 2, 2019

I thought about this, and I want to keep it as-is. The test checks exactly what is needed: an object with this method. It is good as is, and changing it only has the potential to be stricter than necessary.

Copy link
Member

@yarikoptic yarikoptic May 2, 2019

Should we move all code to similar duck typing for uniformity? ;-) but whatever, let's call it stylistic and use interchangeably

if annexinfo and hasattr(ds.repo, 'get_content_annexinfo'):
# this will ammend `status`
ds.repo.get_content_annexinfo(
paths=paths if paths else None,
Copy link
Member

@yarikoptic yarikoptic May 1, 2019

so empty list (well -- dict) here (and below) would get a special treatment as "no paths provided"? Shouldn't this not be done and all the semantic of empty list -vs- None be handled in those lower level functions consistently?

Copy link
Member

@mih mih May 2, 2019

True. However, I changed it in a different way to mitigate the confusion that may result from passing a dict where a list is needed -- really no other information then the keys is used.

datalad/interface/annotate_paths.py Show resolved Hide resolved
kyleam and others added 8 commits May 2, 2019
Replace core's diff with -revolution's.  Update the few diff callers
to use the new variant, with the exception of annotate_paths.  Keep
the older diff around but hidden (not listed in an interface group and
named with an underscore) for annotate_paths.  We could update it to
use the new diff, but it's not worth the effort because the plan is to
retire annotate_paths.

This is copied from revolution's 38df2ddd2858fa8786b6fcca6c472ef9be8a53e8.
You can verify that there are no substantial changes by checking
datalad-revolution out at that commit, pointing $DL_REV_DIR to the
repository, and then running

  % git diff --no-index -- \
    $DL_REV_DIR/datalad_revolution/revdiff.py \
    datalad/core/local/diff.py

  % git diff --no-index -- \
    $DL_REV_DIR/datalad_revolution/tests/test_diff.py \
    datalad/core/local/tests/test_diff.py

Closes datalad#3190.
and stop relying on implicit knowledge that passing a dict where the
function assumes a list, in a world where nobody reads the docs ;-)

https://github.com/datalad/datalad/pull/3366/files#r280068735
@kyleam kyleam force-pushed the absorb-rev-diff branch from f6e34dc to 362d4ec May 2, 2019
@kyleam kyleam merged commit 362d4ec into datalad:master May 2, 2019
1 of 3 checks passed
@kyleam
Copy link
Contributor Author

@kyleam kyleam commented May 2, 2019

Looks like all the main points have been addressed, and the other bits are just stylistic differences. Merged. @yarikoptic, please open a dedicated issue if I'm mistaken.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

3 participants