Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NF: GitRepo.for_each_ref_() #3705

Merged
merged 9 commits into from Sep 25, 2019
Merged

NF: GitRepo.for_each_ref_() #3705

merged 9 commits into from Sep 25, 2019

Conversation

mih
Copy link
Member

@mih mih commented Sep 24, 2019

Work towards gh-3703

It seems that the present code doesn't conveniently handle things like annotated tags that point to an actual commit we are interested in, but actually have a separate objectname (or SHA1).

mih and others added 3 commits Sep 24, 2019
Thx @kyleam for the suggestion of using `git symbolic-ref`
When replacing the custom "git for-each-ref" call with the new
GitRepo.for_each_ref_(), ca34bc6 incorrectly changed the namespace
pattern from 'refs/tags' to 'refs/heads'.
@codecov
Copy link

codecov bot commented Sep 24, 2019

Codecov Report

Merging #3705 into master will decrease coverage by 48.24%.
The diff coverage is 13.33%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master    #3705       +/-   ##
===========================================
- Coverage   82.88%   34.64%   -48.25%     
===========================================
  Files         273      273               
  Lines       35783    35792        +9     
===========================================
- Hits        29658    12399    -17259     
- Misses       6125    23393    +17268
Impacted Files Coverage Δ
datalad/support/repodates.py 18.75% <0%> (-77.17%) ⬇️
datalad/support/gitrepo.py 22.35% <14.28%> (-62.23%) ⬇️
datalad/support/tests/test_fileinfo.py 11.34% <0%> (-88.66%) ⬇️
datalad/local/tests/test_subdataset.py 11.82% <0%> (-88.18%) ⬇️
datalad/core/local/tests/test_save.py 12.12% <0%> (-87.07%) ⬇️
datalad/support/tests/test_repodates.py 12.96% <0%> (-87.04%) ⬇️
datalad/support/tests/test_stats.py 13.11% <0%> (-86.89%) ⬇️
datalad/interface/tests/test_ls_webui.py 14.28% <0%> (-85.72%) ⬇️
datalad/tests/test_protocols.py 14.81% <0%> (-85.19%) ⬇️
datalad/tests/test_config.py 14.88% <0%> (-85.12%) ⬇️
... and 206 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0050f09...5934e78. Read the comment docs.

Copy link
Contributor

@kyleam kyleam left a comment

Most of the comments are reading-along thoughts. The main thing that I think ought to be changed is replacing taggerdate with creatordate.


Parameters
----------
fields : list or str
Copy link
Contributor

@kyleam kyleam Sep 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For most these parameters, I'm okay with not having a description because it is just a direct map to the git for-each-ref option. But I think the description for fields should mention that it is used to construct the --format value.

Also, I suspect you're using a tuple as the default to avoid the bad practice/gotcha of using a mutable list as the default value (quickly scanning, I don't think a list would actually be problematic here, though it should still be avoided). I'm fine with that (and would be fine with the more standard "None default, set list" as well), but I think if the default value is a tuple, the type shouldn't be documented as "list or str". Perhaps "str or iterable of str"?

Copy link
Member Author

@mih mih Sep 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will add a description to all of them.

I used a tuple for those reasons, I used it over None to make it obvious from the signature what dict keys one can expect. I will fix the description.

# self.repo.git.branch(r=True).splitlines()]
return [
b['refname'][13:] # strip 'refs/remotes/'
for b in self.for_each_ref_(fields='refname', pattern='refs/remotes')
Copy link
Contributor

@kyleam kyleam Sep 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going to remark that, if a "/HEAD" symbolic ref exists, this will return it, which we probably don't want. But it turns out that the gitpy variant does the same thing. OK.

hexsha=t['object'] if t['object'] else t['objectname'],
)
for t in self.for_each_ref_(
fields=['refname:lstrip=2', 'objectname', 'object'],
Copy link
Contributor

@kyleam kyleam Sep 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, neat, I didn't realize fields for object (and other header values) existed. So this is equivalent to dereferencing objectname: ['refname:lstrip=2', 'objectname', '*objectname'].

For sorting, taggerdate isn't ideal because it only applies to annotated tags. creatordate is better here.

I was a bit surprised to see you use ":lstrip" here rather than splicing (so [10:]) like you do elsewhere. I think either is fine, though if we're going to use one consistently, I have a weak preference for using ":lstrip", or actually ":strip". ":strip" has been around since Git v2.7.1, while the synonym ":lstrip" has only been around since v2.13.0. While I personally don't think we should worry about supporting v2.13.0, we might as well use the older spelling. (We really should specify a minimum Git version so that we can make clear decisions in these cases.)

My other related thought was whether you should use --short for symbolic-ref and :short for for-each-ref to let git handle the truncation, keeping what is needed for the ref to be unambiguous (e.g., refs/heads/foo -> heads/foo and refs/tags/foo -> tags/foo). But on second thought, there are at least two reasons not to qualify ambiguous refs: (1) the gitpy versions you are replacing do not and (2) get_active_branch(), get_{remote_,}branches}(), and get_tags() are by definition limited to a particular namespace, so we do not need to qualify that the ref is ambiguous in some other namespace.

Copy link
Member Author

@mih mih Sep 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ha, I didn't know that you can dereference objectname! ;-)

I now uniformly do :strip=2. Thx for illustrating the problem space.

I agree re --short, and left it as-is.

@mih
Copy link
Member Author

mih commented Sep 25, 2019

@kyleam I think I have addressed all points you have raised. Thx for the review!

kyleam
kyleam approved these changes Sep 25, 2019
Copy link
Contributor

@kyleam kyleam left a comment

@kyleam I think I have addressed all points you have raised.

Thanks.

@mih mih merged commit f541ed1 into datalad:master Sep 25, 2019
1 of 3 checks passed
@mih mih deleted the nf-foreachref branch Sep 25, 2019
@kyleam
Copy link
Contributor

kyleam commented Sep 25, 2019

test_get_tags is failing in the run that overrides the git author/date variables: https://travis-ci.org/datalad/datalad/jobs/589416798#L1138

I'll look into it and fix it.

We're seeing red frequently due to gh-3653, which makes it hard to spot the failures triggered by the PR. I'm still not able to trigger 3653 locally and don't understand why the failure has emerged, but I'll open a PR later today proposing a workaround for it.

kyleam added a commit to kyleam/datalad that referenced this issue Sep 25, 2019
Prior to the for_each_ref_() series (dataladgh-3705), get_tags() returned
tags sorted by the committer date.  For annotated tags, this means
they were sorted by the object pointed to rather than the tagger date.
Following dataladgh-3705, the sorted value is by for-each-ref's "creatordate"
field (which exists so that lightweight and annotated tags can be
sorted together), and annotated tags are now sorted based on the
tagger date.

This change seems harmless enough.  No current callers in the code
base rely on annotated tags being sorted by committer date, and it
seems unlikely that outside callers rely on this behavior.  But it
does mean that test_get_tags needs to patch GIT_COMMITTER_DATE when
creating annotated tags so that the tagger date is not affected by the
environment of the test run.

If we really want to preserve the old behavior, note that using
"committerdate" (as originally done by dataladgh-3705's bb17d0e) is not
correct.  It will sort lightweight tags by their committer date and
leave annotated tags sorted alphabetically at the top of the list.
Instead we'd have to use a more involved approach, such as including
both "committerdate" and the dereferenced "*committerdate" as fields
and sorting by whichever value is non-empty for a given entry.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants