Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[VOTE] Adopt round-robin assignment of reviewers for GitHub pull request reviewer assignment. #9057

Closed
jroesch opened this issue Sep 21, 2021 · 21 comments

Comments

@jroesch
Copy link
Member

jroesch commented Sep 21, 2021

Dear Community:

We recently started using GitHub's CODEOWNERS to assign reviewers automatically but many committers have complained that they are struggling with the default settings assigning far too many pull requests to far too many people and not providing fair scheduling across all reviewers.

For example a relatively "normal" patch ended up assigning nearly all reviewers to review it which makes the CODEOWNERS way of assigning reviewers nearly useless for us.

We would love to enable more advanced review settings for the Apache TVM team, but it appears to be an org-only setting according to GH documentation: https://docs.github.com/en/organizations/organizing-members-into-teams/managing-code-review-assignment-for-your-team#routing-algorithms.

In order to do this we need to explicitly execute a vote for Apache Infra.

The desired outcome is to configure the above setting for our team so that we can use this feature with 2 reviewers per PR, and the round-robin assignment algorithm.

This is a formal voting thread for the community to adopt the change to our code review process.

Please vote

Vote:
+1
0 - I don't feel strongly about it, but don't object
-1 - because..

The VOTE will open for a week.

This thread is mirrored to dev@, please vote by replying to this thread.

@jroesch
Copy link
Member Author

jroesch commented Sep 21, 2021

See: https://issues.apache.org/jira/browse/INFRA-22324 for more context.

@denise-k
Copy link
Collaborator

+1

@tqchen
Copy link
Member

tqchen commented Sep 21, 2021

+1 TQ

@masahi
Copy link
Member

masahi commented Sep 21, 2021

+1, let's try this

@Hzfengsy
Copy link
Member

+1

6 similar comments
@csullivan
Copy link
Contributor

+1

@vinx13
Copy link
Member

vinx13 commented Sep 21, 2021

+1

@jwfromm
Copy link
Contributor

jwfromm commented Sep 21, 2021

+1

@junrushao
Copy link
Member

+1

@mehrdadh
Copy link
Member

+1

@areusch
Copy link
Contributor

areusch commented Sep 22, 2021

+1

@tmoreau89
Copy link
Contributor

It's worth giving this a try!

+1

@liangfu
Copy link
Member

liangfu commented Sep 22, 2021

+1

@Mousius
Copy link
Member

Mousius commented Sep 26, 2021

-1

I don't believe this is will solve the problem of "assigning far too many pull requests to far too many people and not providing fair scheduling across all reviewers" and may introduce other issues. I believe this will result in is code owners missing pull requests which are relevant to them and recreating the "review-by-request" we had previously, when a less active code owner isn't available then help will be requested from the more active code owners in the community.

This is based on analysis on the last 1000 merge commits in main, using the associated responsibility of reviewing and merging code into the codebase. The analysis shows over those 1000 commits there were a set of active committers doing the majority of the merging, whereas others had other priorities, which I believe is fair in an open source project. Those active committers didn't change significantly after changes #8500 or #8512.

I'm therefore concerned that this change reduces the visibility of pull requests to active committers, decreases the likelihood of a given pull request being reviewed and not materially effecting the workload for active committers.

@manupak
Copy link
Contributor

manupak commented Sep 27, 2021

-1

It feels like a wrong solution to a valid problem. The objection would be for mainly two reasons as follows :

  1. The round-robin assignment could miss out interested reviewers. Giving everyone (who have working and contributed the specific component) the opportunity for review, IMO, should be the fair solution.

If the problem is that the granularity is too coarse, the actual action might have been make the CODEOWNERS bit more specific. It would be interesting to know why the committers are getting pinged for far too many pull requests between (a) there are lot of activity going on the specific component the committer owning vs (b) there is lot of activity going on many components that are grouped to a single bucket where committers may not necessarily be interested in. Are we sure its (a) ?

  1. I feel picking two reviewers per PR makes it bit non-voluntary.

@leandron
Copy link
Contributor

-1

I think we should try other measures before an enforced round-robin, such as:

  • tuning the list of maintainers in a more detailed way, with perhaps less people per directory so that lots of people don't get e-mails for simple reviews
  • providing more guidance to incentive smaller, incremental patches, which won't touch many areas of the code at once, therefore reaching out to less reviewers

Other points:

  1. If people feel like they are getting too many e-mails for reviews, GitHub e-mails are very descriptive, and people can filter them easily i.e. there is always the reason on why you're getting involved such as "Review requested" (the ones added automatically), "Mention" (if you're quoted by name), "Comment" (on the threads you comment).

  2. How to match automatic assignment with the time people can dedicate to reviews, which can be variable in different periods of the year (holidays, deadlines, other priorities, personal, etc...). On this point, I think it would be fair to have a documented way for people to decline assigned reviews without necessarily specifying a reason.

@electriclilies
Copy link
Contributor

electriclilies commented Sep 29, 2021

Thanks @manupa-arm @Mousius and @leandron for bringing these points up.


Building off what you said, I just wanted to point out that I think that this vote is really two questions rolled into one:
Q1: Whether the current system of tagging all code owners is working, and should we revert it
Q2: Whether we want to move to a system where we are tagging only 2 of the code owners for round-robin review

From what I’ve observed, tagging all code owners is not working. Right now, there’s a small subset of committers who do a large amount of shepherding code. These people are the ones whose workflows have been most affected because they are being tagged in nearly everything and can’t sort through the noise. I think that this is a good reason to revert the system where we tag all code owners in the short term. 

However, I don’t think that moving to the round robin review where we tag 2 people out of the current code owners list is a good long-term solution. Most of the code owners are not active so the result will pretty much be going back to the way things are where we just tag relevant people and hope they respond.

Backing up a bit, I think the three main goals of tagging all code owners was:
G1. Spread the load of “close reviewing” to more committers
G2. Increase visibility across TVM subprojects (i.e., we might want someone who works on automation to look at microTVM PRs (and vice-versa) to make sure that the goals of those subproject are consistent)
G3. Give new community members who don’t have a network inside TVM visibility

Tagging all code owners did not accomplish G1, but I think it is helpful in getting us closer to G2 and G3. (Although it is worth noting that I only think it helps us get closer to G2 since it tags so many people rather indiscriminately). If I understand correctly, Manupa, Christopher and Leandro’s concern is that the round robin solution does not accomplish G1 and moves us further from accomplishing G2 and G3.

I think that (G1) and (G2 and G3) require separate but complimentary solutions. Before talking about those solutions, though, it’s helpful to consider the kinds of reviews that there are in TVM and their purpose:

R1. Review from someone who works closely on this aspect of the project and can provide very specific feedback based on a “close reading”
(Purpose of R1: Ensuring good-quality code and getting it in in time (code shepherding))

R2. Review from someone who is not familiar with this subproject but is doing a “close reading” with the goal of that person gaining domain knowledge of that subproject
(Purpose of R2: Knowledge spreading; creating more people who can do “close readings” of this subproject (creating more code shepherds))

R3. Review from someone with expertise in another part of the project for consistency with overall TVM goals and cross-company / cross-subproject networking
(Purpose of R3: Ensuring project consistency and preventing siloing of groups of contributors)

To spread the reviewing load out (G1), we need close reviews from more code shepherds (R1). I think that the easiest way to do this is to drastically increase the number of committers. To do this, we should make it a lot easier to become a committer. Specifically, I think we should consider allowing committers to nominate reviewers. Additionally, we should consider having a bot that notifies the community when contributors and reviewers hit certain milestones (like number of PRs submitted, LOC submitted, etc). It would still be up to a committer / PMC member to do the nomination, but this would essentially create a short-list of people to nominate. Another benefit of this is that it will reduce cognitive load on PMC and committers because they don’t need to be watching for who to nominate.

Additionally, for community members to gain domain knowledge, they need to be able to do R2s, which means they need visibility into PRs relevant to a certain area. From this standpoint, I think that fine-tuning the code owners of a certain part of the codebase and then notifying all of those code owners is the way to go. 
(Also, I think that“code owners” is not the right concept here— ideally the people notified would be everyone who wants to be notified of PRs in that part of the codebase, whether they are contributor, reviewer or committer. This way everyone gets notified for what they want to be notified. As a reviewer I am not pinged as a code owner, even if I would like to be).

To increase R3s, it would be nice to do a round-robin where we randomly tag people who are not “code owners” for the part of the project the PR is relevant to (in addition to tagging all the code owners). Ideally we’d limit the number of times someone gets tagged for an R3 review to once or twice a month so it’s not too much and they can adequately engage in the conversation if they want to.

Finally, I just wanted to clarify that I don't think reviewing people's code should be mandatory in any way. (But like @leandron said it would be helpful to have a way to communicate to the community if you are unavailable). And, I think that auto-tagging people is supplementary to contributors tagging other contributors for review -- we still want people to have working relationships with other contributors).

In summary, I think we should:

  1. Increase the number of committers by making it easier to become a reviewer and creating a bot that reports contributor milestones.
  2. Fine-tune who is notified for parts of the project via some sort of opt-in / opt-out system and notify all of them for PRs in that part of the project. Additionally, we should allow people who are reviewers and contributors to opt-in so that they can keep up with what is going on.
  3. Occasionally notify people on PRs for subprojects they don’t regularly work in to facilitate cross pollination and consistency across the project.

I’m not saying we should implement all of this right now, but I'd like to see us moving towards a world like this (though my thoughts are by no means finalized, so if you have any suggestions / feedback / thoughts please share them!)
cc @denise-k @hogepodge Also interested to see what you guys think!

@areusch
Copy link
Contributor

areusch commented Sep 30, 2021

thanks @manupa-arm @leandron @Mousius @electriclilies for the comments!

I have a few more comments based on more detailed analysis of the relationship between CONTRIBUTORS.md and CODEOWNERS. My original +1 was based on cursory look at CODEOWNERS in which I thought only a smaller subset of CONTRIBUTORS (e.g. maybe those who would opt in to lots of code-review notifications) were listed. In fact that's not true; instead:

  • there are 42 codeowners, one of them being apache/tvm-committers
  • there are 43 committers
  • nearly everyone in CODEOWNERS is a committer.

In general, having the problems raised by this thread is a positive thing as it means the community and the pace of PRs is growing as adoption increases :). I think many of @electriclilies proposals reflect a need to consider how we might scale the whole review process as the project grows--increasing the number of committers is a way to help with the review load, as are modifications to the way we notify and assign people to code-reviews.

However, I would like to request we focus this thread on the issue being voted on, which I'll attempt to qualify more below. We should absolutely continue discussing on another thread e.g. in the discuss forum or perhaps also at Community Meeting/TVMConf. I personally feel the pain of the current system--I get about 50 notifications per day and properly reviewing them often takes about 3h of time--and would be happy to discuss the broader code-review process more there. There are many days I simply don't have this time, as evidenced by my higher response latency recently.

To provide more context on the current thread: there are a number of things contributing to the current situation:

  1. Prior to updating CODEOWNERS in Keep CODEOWNERS file up to date. #8500, mentions were the primary way to get a committer's attention. The change was intended to provide a more explicit indication to contributors of who might be a reasonable reviewer.
  2. In addition to mentions, we also have the "assign review" feature of GitHub. Assign is typically used manually to indicate who is shepherding a PR.
  3. Then there are review requests, which are similar to assignments but which can be added by non-committers. Prior to Keep CODEOWNERS file up to date. #8500, at least my personal experience was that review requests were infrequent.
  4. After Keep CODEOWNERS file up to date. #8500 landed, review requests became quite frequent (many PRs wound up requesting reviews from a large cross-section of the CODEOWNERS file because the way we've chosen the ownership means that a typical change which spans e.g. python, src, and tests has a good chance of triggering CODEOWNERS lines underneath those directories. For example, a relay change which includes an incidental microTVM test fix might result in requesting from a microTVM CODEOWNER. Prior to Keep CODEOWNERS file up to date. #8500, an assigned reviewer (either explicitly through GH assignment or implicitly through mention/thread history) could bring in others as was needed.
  5. We now have 3 categories which reviewers need to monitor: mentions, assignments, review requests. It seems the GH solution here is notifications; however, both review requests and assignments trigger cause GH to subscribe and thereby notify you. And, we haven't assigned meaning behind a review-request vs an assignment.

The root problem exacerbated by #8500 is not so much getting review from a wider set of audience--it's assigning a committer to shepherd the review and own the process of merging it. Now, many PRs have 10+ committers listed as "reviewers" (e.g. review-requested). Even with two committers assigned, there is ambiguity in who's shepherding the PR. And, with both review requests and assignments in play, I think people can be getting confused between the two. Compounding the problem, we aren't consistent in our use of assignments--so, there is no authoritative place to look for PRs which a committer needs to shepherd.

There are a couple of ways to address this shepherd-assignment problem:

  1. Through CODEOWNERS, in which review requests suggest an assigned shepherd and one of the reviewers self-assigns.
  2. Through mentions or side-channel pings, also a self-assignment.
  3. Through project leadership combing through incoming PRs and issuing assignments.

#8500 was an attempt to move from 2 and 3 towards 1. However, I think it has been a bit hampered because CODEOWNERS operates at the directory level but we have not really correctly organized the project to support this style of OWNERS. For instance, the test structure is different than the actual code structure.

This vote was originally intended to further address the shepherd-assignment problem by reducing the number of suggested reviewers auto-requested on a PR. Any open source project is going to have committers which are more and less active. My concern with this proposal was primarily that a PR may get a varying speed of review if it's round-robin'd to a less active contributor. Originally I had thought this wasn't likely to be a problem immediately because I'd thought that CODEOWNERS was a subset of the committers. It looks like this actually could happen--but, I think there are actually a couple of remedies:

  1. First off, a familiar contributor who merely wants a fast review can always mention a committer they work with regularly to nudge them to look.
  2. Any committer is free to reassign the review to take shepherding responsibilities.
  3. We could consider the triage role to help here.

There has been suggestion that contributors use e-mail filters to help sort the incoming review requests. First off this is something I do and I'm supportive of it; however there are a couple of cases which may cause the filters to be hard to compose (e.g. at least in gmail which many of us use):

  1. Some committers get mentioned on tons of reviews which they never find time to comment on.
  2. With the current CODEOWNERS, I don't think it's possible to differentiate between a requested review due to CODEOWNERS and a review requested explicitly.
  3. There are situations where e.g. you neither were mentioned on a review nor assigned/review-requested but would like to watch a PR.

Compounding these challenges is time--these conditions may come and go throughout the life of a review. GitHub supports subscriptions for this reason, I believe.

In summary, I've gone back and forth in supporting this change (from supportive to unsupportive and now back to supportive given the remedies). I'm not completely sure this is the right thing to do, but wiling to try as I don't think the current fully-automatic system is working that well. I would further support codifying the purpose of assignments/review-requests and considering adopting a project-wide methodology of triaging new review requests (e.g. perhaps utilizing a triage role and indicating who is responsible for this). And, we should definitely continue discussing the process of scaling code-review in other threads, as this is merely one aspect to discuss.

@hogepodge
Copy link
Contributor

+0 as someone who is not actively reviewing a large body of code, I don't have a strong opinion on if this is the right move. It's clear that the current "tag all code owners" is not workable, and I fully support removing that. To me the big questions is: how do we fairly allocate work amongst the community and code owners. We can look at how other communities have handled this.

O1) Within the Kubernetes community, code ownership is allocated to Special Interest Groups, and SIG leaders are round-robin allocated to review code under their ownership. Other reviewers can be automatically added also based on codeowners files, and reviewers and assignees can be manually added through tags to bots.
O2) The Rust community allows for volunteers to join a "high five" program that automatically round robin assigns them for code review. It's opt-in, and is a way to gain experience and merit within the community.

Echoing what @areusch was saying, since every pull request needs a review from a committer to merge, I feel we need a way to allocate committers, but it does no good if committers have become inactive and we're just spamming them. Going back to Kubernetes as an example, their three-month release cycle creates a natural period of time where contributors can volunteer for work but be able to step back gracefully if work or personal demands take them away from the project. Looking at O2, I could imagine a quarterly "call for reviewers" to give us space to build an equivalent of that "high five" list, and for the PMC to review contributions by community members and identify who is active, and who is inactive.

Within a community ladder I like to think of what roles and responsibilities people have to be regarded as active in the community. For example, an active reviewer or committer should be maintaining a certain level of reviews, and similarly somone who is not an active reviewer but wants to be promoted should be able to demonstrate an ability to do that. This is somewhat at odds with The Apache Way of doing things, but in my mind it helps to distinguish between active an inactive, as it helps the community in general identify who is there to help, and how to share the work amongst them.

@electriclilies
Copy link
Contributor

electriclilies commented Sep 30, 2021

@areusch I definitely agree with everything you said. To clarify, I'm in favor of this going forward given the impact it has on the quality of life of the code shepherds, so I guess I'll officially vote +1.
I just wanted to mention these concerns in a place where we already have some discussion :) and also point out that this does not really improve the quality of life for people who are not code shepherds-- there's still a lot of work to be done there, but it doesn't need to be done now.

Also I really think we should explore the triage role as a way to spread some of the logistical load away from code shepherds!

@hogepodge Yeah I think that looking at what other open source communities are doing could be quite instructive for us, since scaling TVM is going to be difficult. I think we should also really consider what purpose reviews serve in the community, both from a technical code shepherding standpoint and a knowledge sharing standpoint.

As Andrew said, this vote thread is probably not the best place to have this discussion though :) @hogepodge @areusch Let's figure out what a good format for this discussion is elsewhere

@masahi
Copy link
Member

masahi commented Jan 9, 2022

This is stale.

@masahi masahi closed this as completed Jan 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests