Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dev-cmd/contributions: Use GitHub APIs for commit author info #14737

Merged
merged 3 commits into from Feb 22, 2023

Conversation

issyl0
Copy link
Member

@issyl0 issyl0 commented Feb 20, 2023

  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same change?
  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your changes? Here's an example.
  • Have you successfully run brew style with your changes locally?
  • Have you successfully run brew typecheck with your changes locally?
  • Have you successfully run brew tests with your changes locally?

  • Using git log was brittle with name changes and email address changes for contributors over the years unless we made a Git mailmap file which brings with it its own updatedness overhead.
  • Let's use the GitHub commits API (importantly not the search API) so that we can give it a username and it will return contributions associated with every email address on that user's account: https://docs.github.com/en/rest/commits/commits?apiVersion=2022-11-28#list-commits--parameters.
  • This is quite significantly slower, but it's worth it for correctness especially when we get to all maintainers' contributions (in a separate PR).
  • The list commits API does not (yet? 🤔) support filtering by trailers or commit "committer"s, just authors.

For some context: I am lucky enough to have issyl0 in all the email addresses that I've used in the past to commit to Homebrew, so issyl0 worked as a wide enough search string before this change. But searching for "Issy Long" omits some commits from when my Git committer name was not that.

I noticed this when doing #14722 which is moving towards using user.full_name (because people don't usually have their GitHub username as their committer name or email address). The numbers didn't match up between there and master for myself (and probably others too, since name changes are common for a variety of reasons).

This is not just about me. 😆 Moving towards using the GitHub APIs will make this more accurate at counting contributions, mean that we don't have to remember email addresses for people to accurately count every contribution they've made, and also once everything is migrated it means that we don't have to tap all the taps locally since that's time-consuming. Rate limits could potentially be a problem, but that's one of the reasons we're not using the commit search API.


Before:

$ brew contributions "Issy Long"
The user Issy Long has made 1112 contributions in all time.

After:

$ brew contributions issyl0
The user issyl0 has made 1222 contributions in all time.

- Using `git log` was brittle with name changes and email address changes for
  contributors over the years unless we made a Git `mailmap` file which brings
  with it its own updatedness overhead.
- Let's use the GitHub commits API (importantly _not_ the search API) so that
  we can give it a username and it will return contributions associated with
  every email address on that user's account:
  https://docs.github.com/en/rest/commits/commits?apiVersion=2022-11-28#list-commits--parameters.
- This is quite significantly slower, but it's worth it for correctness
  especially when we get to all maintainers' contributions (in a separate PR).
- The commits API does not (yet?) support trailers or commit "committer"s, just
  authors.
@BrewTestBot
Copy link
Member

Review period will end on 2023-02-21 at 23:49:23 UTC.

@BrewTestBot BrewTestBot added the waiting for feedback Merging is blocked until sufficient time has passed for review label Feb 20, 2023
Library/Homebrew/tap.rb Outdated Show resolved Hide resolved
end

results[repo] = {
commits: git_log_author_cmd(T.must(repo_path), args),
commits: GitHub.repo_commit_count_for_user(tap.full_name, args.named.first),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For backwards compatibility (and accuracy of "First Last" name queries), maybe we should keep git_log_author_cmd around until everything is made to use the GitHub APIs and call a different method for commit counts depending on if args.named.first is a GitHub username or a full name?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need backwards compatibility for this command FWIW.

Library/Homebrew/tap.rb Outdated Show resolved Hide resolved
Library/Homebrew/dev-cmd/contributions.rb Outdated Show resolved Hide resolved
Copy link
Member

@carlocab carlocab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice

Library/Homebrew/utils/github.rb Outdated Show resolved Hide resolved
Copy link
Member

@MikeMcQuaid MikeMcQuaid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking great so far!

@@ -19,11 +19,11 @@ module Homebrew
sig { returns(CLI::Parser) }
def contributions_args
Homebrew::CLI::Parser.new do
usage_banner "`contributions` <email|name> [<--repositories>`=`] [<--csv>]"
usage_banner "`contributions` <email|username> [<--repositories>`=`] [<--csv>]"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this should only ever take the username for consistency?

Copy link
Member Author

@issyl0 issyl0 Feb 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's an open question about this and backwards compatibility at #14737 (comment). I think until we have trailers implemented at API level (or some querying of our own to filter to get them), we have to support both? Especially if we're doing #14722 that queries by name because not everyone uses their GitHub username in their Git committer attributes?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool. I guess the problem I'm hoping to figure out a solution to (even if a manual one) is: how do I combine the results from the various different outputs into a cohesive one without double counting?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or some querying of our own to filter to get them

Can we ask the API for email addresses associated with a user name and then use those to filter git log?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or some querying of our own to filter to get them

Can we ask the API for email addresses associated with a user name and then use those to filter git log?

Nope, only the authenticated user can get their own "private" email addresses: https://docs.github.com/en/rest/users/emails?apiVersion=2022-11-28#list-email-addresses-for-the-authenticated-user. Which makes sense.

I suppose what we could do is get all the commits for an author, slurp the email addresses and then, discounting the commits we already have, scan again for coauthors and signoffs ourselves? But that feels like a lot.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but there's only ever one "primary" email address. 🤔

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool. I guess the problem I'm hoping to figure out a solution to (even if a manual one) is: how do I combine the results from the various different outputs into a cohesive one without double counting?

To perhaps be even more explicit (and this addresses backwards compatibility too): my understanding is the goal of this command is to be able to have me, and ideally any maintainer, see which maintainers are likely to be asked to step down due to inactivity.

Personally, I think that means:

  • there's no need for any backwards compatibility
  • a single path and fewer options are better, even if the output is inferior and if we need e.g. to run against both email and username and check both: this tool should be doing the aggregation and not the user

Given this: I think it'd make sense for something like a bare brew contributions (i.e. the default options) to spit out the contributions for:

  • all current maintainers
  • over the last year
  • using their username and/or email
  • on primary repositories, grouped by repo
  • sorted from most to least active

@issyl0 I can privately share the spreadsheet I filled in manually to calculate this last time if it helps.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, that'd be useful. This definitely needs work. Sorry for the back and forth/confusion/diversions. The last "makes sense for" gives me more to work with, thanks!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@issyl0 Absolutely no problem on back-and-forth here! I've not been terribly clear/explicit here and wasn't sure if we were agreed on the goals here!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gonna post that comment in the main tracking isue and merge this as-is, since it's useful (and more accurate) than the previous implementation (for a GitHub username). Then I'll iterate on the maintainers stuff more soon.

Library/Homebrew/utils/github/api.rb Outdated Show resolved Hide resolved
- Functionally it doesn't matter that the URL will have an `&` at the
  end if `additional_query_params` is `nil`, because it doesn't affect
  the URL at all.
@BrewTestBot BrewTestBot removed the waiting for feedback Merging is blocked until sufficient time has passed for review label Feb 22, 2023
@BrewTestBot
Copy link
Member

Review period ended.

@issyl0 issyl0 merged commit 93ce211 into Homebrew:master Feb 22, 2023
@issyl0 issyl0 deleted the api-commits-for-person branch February 22, 2023 14:12
@github-actions github-actions bot added the outdated PR was locked due to age label Mar 25, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 25, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
outdated PR was locked due to age
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants