New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
dev-cmd/contributions: Use GitHub APIs for commit author info #14737
Conversation
- Using `git log` was brittle with name changes and email address changes for contributors over the years unless we made a Git `mailmap` file which brings with it its own updatedness overhead. - Let's use the GitHub commits API (importantly _not_ the search API) so that we can give it a username and it will return contributions associated with every email address on that user's account: https://docs.github.com/en/rest/commits/commits?apiVersion=2022-11-28#list-commits--parameters. - This is quite significantly slower, but it's worth it for correctness especially when we get to all maintainers' contributions (in a separate PR). - The commits API does not (yet?) support trailers or commit "committer"s, just authors.
Review period will end on 2023-02-21 at 23:49:23 UTC. |
end | ||
|
||
results[repo] = { | ||
commits: git_log_author_cmd(T.must(repo_path), args), | ||
commits: GitHub.repo_commit_count_for_user(tap.full_name, args.named.first), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For backwards compatibility (and accuracy of "First Last" name queries), maybe we should keep git_log_author_cmd
around until everything is made to use the GitHub APIs and call a different method for commit counts depending on if args.named.first
is a GitHub username or a full name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need backwards compatibility for this command FWIW.
8c0b763
to
0a6c59c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking great so far!
@@ -19,11 +19,11 @@ module Homebrew | |||
sig { returns(CLI::Parser) } | |||
def contributions_args | |||
Homebrew::CLI::Parser.new do | |||
usage_banner "`contributions` <email|name> [<--repositories>`=`] [<--csv>]" | |||
usage_banner "`contributions` <email|username> [<--repositories>`=`] [<--csv>]" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this should only ever take the username for consistency?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's an open question about this and backwards compatibility at #14737 (comment). I think until we have trailers implemented at API level (or some querying of our own to filter to get them), we have to support both? Especially if we're doing #14722 that queries by name because not everyone uses their GitHub username in their Git committer attributes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool. I guess the problem I'm hoping to figure out a solution to (even if a manual one) is: how do I combine the results from the various different outputs into a cohesive one without double counting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or some querying of our own to filter to get them
Can we ask the API for email addresses associated with a user name and then use those to filter git log
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or some querying of our own to filter to get them
Can we ask the API for email addresses associated with a user name and then use those to filter
git log
?
Nope, only the authenticated user can get their own "private" email addresses: https://docs.github.com/en/rest/users/emails?apiVersion=2022-11-28#list-email-addresses-for-the-authenticated-user. Which makes sense.
I suppose what we could do is get all the commits for an author, slurp the email addresses and then, discounting the commits we already have, scan again for coauthors and signoffs ourselves? But that feels like a lot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but there's only ever one "primary" email address. 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool. I guess the problem I'm hoping to figure out a solution to (even if a manual one) is: how do I combine the results from the various different outputs into a cohesive one without double counting?
To perhaps be even more explicit (and this addresses backwards compatibility too): my understanding is the goal of this command is to be able to have me, and ideally any maintainer, see which maintainers are likely to be asked to step down due to inactivity.
Personally, I think that means:
- there's no need for any backwards compatibility
- a single path and fewer options are better, even if the output is inferior and if we need e.g. to run against both email and username and check both: this tool should be doing the aggregation and not the user
Given this: I think it'd make sense for something like a bare brew contributions
(i.e. the default options) to spit out the contributions for:
- all current maintainers
- over the last year
- using their username and/or email
- on primary repositories, grouped by repo
- sorted from most to least active
@issyl0 I can privately share the spreadsheet I filled in manually to calculate this last time if it helps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, that'd be useful. This definitely needs work. Sorry for the back and forth/confusion/diversions. The last "makes sense for" gives me more to work with, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@issyl0 Absolutely no problem on back-and-forth here! I've not been terribly clear/explicit here and wasn't sure if we were agreed on the goals here!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gonna post that comment in the main tracking isue and merge this as-is, since it's useful (and more accurate) than the previous implementation (for a GitHub username). Then I'll iterate on the maintainers
stuff more soon.
- Functionally it doesn't matter that the URL will have an `&` at the end if `additional_query_params` is `nil`, because it doesn't affect the URL at all.
Review period ended. |
brew style
with your changes locally?brew typecheck
with your changes locally?brew tests
with your changes locally?git log
was brittle with name changes and email address changes for contributors over the years unless we made a Gitmailmap
file which brings with it its own updatedness overhead.For some context: I am lucky enough to have
issyl0
in all the email addresses that I've used in the past to commit to Homebrew, soissyl0
worked as a wide enough search string before this change. But searching for "Issy Long" omits some commits from when my Git committer name was not that.I noticed this when doing #14722 which is moving towards using
user.full_name
(because people don't usually have their GitHub username as their committer name or email address). The numbers didn't match up between there andmaster
for myself (and probably others too, since name changes are common for a variety of reasons).This is not just about me. 😆 Moving towards using the GitHub APIs will make this more accurate at counting contributions, mean that we don't have to remember email addresses for people to accurately count every contribution they've made, and also once everything is migrated it means that we don't have to tap all the taps locally since that's time-consuming. Rate limits could potentially be a problem, but that's one of the reasons we're not using the commit search API.
Before:
After: