NF: GitRepo.for_each_ref_() #3705
Codecov Report
@@ Coverage Diff @@
## master #3705 +/- ##
===========================================
- Coverage 82.88% 34.64% -48.25%
===========================================
Files 273 273
Lines 35783 35792 +9
===========================================
- Hits 29658 12399 -17259
- Misses 6125 23393 +17268
Most of the comments are reading-along thoughts. The main thing that I think ought to be changed is replacing taggerdate with creatordate.
datalad/support/gitrepo.py
Parameters
----------
fields : list or str
For most of these parameters, I'm okay with not having a description because each is just a direct map to the `git for-each-ref` option. But I think the description for `fields` should mention that it is used to construct the `--format` value.
Also, I suspect you're using a tuple as the default to avoid the bad practice/gotcha of using a mutable list as the default value (quickly scanning, I don't think a list would actually be problematic here, though it should still be avoided). I'm fine with that (and would be fine with the more standard "None default, set list" as well), but I think if the default value is a tuple, the type shouldn't be documented as "list or str". Perhaps "str or iterable of str"?
I will add a description to all of them.
I used a tuple for those reasons. I chose it over None to make it obvious from the signature what dict keys one can expect. I will fix the description.
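To illustrate the point about `fields` feeding the `--format` value, here is a minimal sketch. The helper names `build_format` and `parse_record` are hypothetical and are not taken from the PR; the only Git behavior assumed is that `%(fieldname)` and the `%00` (NUL) hex escape are valid in a `for-each-ref` format string.

```python
# Hypothetical helpers for illustration; not DataLad's actual implementation.

def build_format(fields):
    """Build a value for `git for-each-ref --format` from field names."""
    if isinstance(fields, str):
        # Accept a single field name as well as an iterable of names.
        fields = (fields,)
    # `%00` is for-each-ref's hex escape for a NUL byte, used here as a
    # separator that is unlikely to occur inside a field value.
    return '%00'.join('%({})'.format(f) for f in fields)


def parse_record(line, fields):
    """Split one output line back into a dict keyed by field name."""
    if isinstance(fields, str):
        fields = (fields,)
    return dict(zip(fields, line.split('\0')))


fields = ('refname', 'objectname')
fmt = build_format(fields)
# 'abc123' is a made-up placeholder, not a real object name.
record = parse_record('refs/heads/master\0abc123', fields)
```

With a tuple default in the signature, the keys of each returned dict are visible to the caller up front, which is the motivation given above.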
datalad/support/gitrepo.py
# self.repo.git.branch(r=True).splitlines()]
return [
    b['refname'][13:]  # strip 'refs/remotes/'
    for b in self.for_each_ref_(fields='refname', pattern='refs/remotes')
I was going to remark that, if a "/HEAD" symbolic ref exists, this will return it, which we probably don't want. But it turns out that the gitpy variant does the same thing. OK.
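A small sketch of the slicing discussed here, using made-up refnames rather than real `for-each-ref` output. The `without_head` filter is only an illustration of what excluding the symbolic ref could look like; the PR keeps the entry, matching the previous behavior.

```python
PREFIX = 'refs/remotes/'  # 13 characters, hence the [13:] slice

# Made-up refnames of the kind reported under refs/remotes.
refnames = [
    'refs/remotes/origin/HEAD',
    'refs/remotes/origin/master',
]

# Same effect as b['refname'][13:] in the snippet above.
branches = [r[len(PREFIX):] for r in refnames]

# Optional: drop symbolic HEAD entries, if a caller does not want them.
without_head = [b for b in branches if not b.endswith('/HEAD')]
```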
datalad/support/gitrepo.py
hexsha=t['object'] if t['object'] else t['objectname'],
)
for t in self.for_each_ref_(
    fields=['refname:lstrip=2', 'objectname', 'object'],
Ah, neat, I didn't realize fields for `object` (and other header values) existed. So this is equivalent to dereferencing `objectname`: `['refname:lstrip=2', 'objectname', '*objectname']`.
For sorting, `taggerdate` isn't ideal because it only applies to annotated tags. `creatordate` is better here.
I was a bit surprised to see you use ":lstrip" here rather than slicing (so `[10:]`) like you do elsewhere. I think either is fine, though if we're going to use one consistently, I have a weak preference for using ":lstrip", or actually ":strip". ":strip" has been around since Git v2.7.1, while the synonym ":lstrip" has only been around since v2.13.0. While I personally don't think we should worry about supporting v2.13.0, we might as well use the older spelling. (We really should specify a minimum Git version so that we can make clear decisions in these cases.)
My other related thought was whether you should use `--short` for symbolic-ref and `:short` for for-each-ref to let git handle the truncation, keeping what is needed for the ref to be unambiguous (e.g., refs/heads/foo -> heads/foo and refs/tags/foo -> tags/foo). But on second thought, there are at least two reasons not to qualify ambiguous refs: (1) the gitpy versions you are replacing do not and (2) get_active_branch(), get_{remote_,}branches(), and get_tags() are by definition limited to a particular namespace, so we do not need to qualify that the ref is ambiguous in some other namespace.
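The two ideas in this comment can be sketched in plain Python. The records below are made up for illustration (the 40-character strings are placeholders, not real object names), and `strip_components` is a hypothetical helper that only mimics what `:strip=N` does for a well-formed ref. It does not call Git.

```python
def strip_components(refname, n):
    """Mimic `%(refname:strip=N)`: drop the first N path components."""
    return '/'.join(refname.split('/')[n:])


def tag_hexsha(record):
    """Return the tagged object's ID.

    For an annotated tag, 'object' holds the ID of the object the tag
    points to. For a lightweight tag that field is empty, so fall back
    to 'objectname', which is then the commit itself.
    """
    return record['object'] if record['object'] else record['objectname']


COMMIT = 'c' * 40   # placeholder commit ID
TAG_OBJ = 'a' * 40  # placeholder ID of an annotated tag object

records = [
    {'refname': 'refs/tags/light', 'objectname': COMMIT, 'object': ''},
    {'refname': 'refs/tags/annotated', 'objectname': TAG_OBJ, 'object': COMMIT},
]

tags = [(strip_components(r['refname'], 2), tag_hexsha(r)) for r in records]
```

Both tags resolve to the same commit here, even though the annotated tag has its own, separate `objectname`.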
Ha, I didn't know that you can dereference objectname! ;-)
I now uniformly do `:strip=2`. Thx for illustrating the problem space.
I agree re `--short`, and left it as-is.
Thx @kyleam for pointing out that the prev approach only works for annotated tags.
@kyleam I think I have addressed all points you have raised. Thx for the review!
test_get_tags is failing in the run that overrides the git author/date variables: https://travis-ci.org/datalad/datalad/jobs/589416798#L1138 I'll look into it and fix it.
We're seeing red frequently due to gh-3653, which makes it hard to spot the failures triggered by the PR. I'm still not able to trigger 3653 locally and don't understand why the failure has emerged, but I'll open a PR later today proposing a workaround for it.
Prior to the for_each_ref_() series (dataladgh-3705), get_tags() returned tags sorted by the committer date. For annotated tags, this means they were sorted by the committer date of the object pointed to rather than by the tagger date. Following dataladgh-3705, the sort key is for-each-ref's "creatordate" field (which exists so that lightweight and annotated tags can be sorted together), and annotated tags are now sorted based on the tagger date.

This change seems harmless enough. No current callers in the code base rely on annotated tags being sorted by committer date, and it seems unlikely that outside callers rely on this behavior. But it does mean that test_get_tags needs to patch GIT_COMMITTER_DATE when creating annotated tags so that the tagger date is not affected by the environment of the test run.

If we really want to preserve the old behavior, note that using "committerdate" (as originally done by dataladgh-3705's bb17d0e) is not correct. It will sort lightweight tags by their committer date and leave annotated tags sorted alphabetically at the top of the list. Instead we'd have to use a more involved approach, such as including both "committerdate" and the dereferenced "*committerdate" as fields and sorting by whichever value is non-empty for a given entry.
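The "more involved approach" mentioned at the end could look roughly like the following sketch. It is not code from the PR: the records are made up, and the dates are assumed to have already been obtained as Unix-timestamp strings, with an empty string where a field does not apply.

```python
# Made-up records: for a lightweight tag only 'committerdate' is set; for
# an annotated tag only the dereferenced '*committerdate' is set.
records = [
    {'refname': 'light-new', 'committerdate': '300', '*committerdate': ''},
    {'refname': 'annotated', 'committerdate': '', '*committerdate': '200'},
    {'refname': 'light-old', 'committerdate': '100', '*committerdate': ''},
]


def commit_date(record):
    """Return the committer date of the commit a tag ultimately points to."""
    # Prefer the dereferenced value; fall back to the direct one.
    return int(record['*committerdate'] or record['committerdate'])


by_commit_date = [r['refname'] for r in sorted(records, key=commit_date)]

# Sorting on 'committerdate' alone puts the annotated tag (empty value)
# first, which is the problem described above.
naive = [r['refname'] for r in sorted(records, key=lambda r: r['committerdate'])]
```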
Work towards gh-3703
for_each_ref() helper
git for-each-ref #3703
repodates test fails
km: 477e23a fixes the test_repodates.py:test_check_dates failure.
It seems that the present code doesn't conveniently handle things like annotated tags that point to an actual commit we are interested in, but actually have a separate `objectname` (or SHA1).