Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SIP-13] Proposal for Code Review Process #6132

Closed
kristw opened this issue Oct 17, 2018 · 9 comments
Closed

[SIP-13] Proposal for Code Review Process #6132

kristw opened this issue Oct 17, 2018 · 9 comments
Labels
inactive Inactive for >= 30 days sip Superset Improvement Proposal
Projects

Comments

@kristw
Copy link
Contributor

kristw commented Oct 17, 2018

Motivation

The Superset community has a large number of contributors and stakeholders, which leads to many people trying to modify the codebase at the same time.

For project maintainer this introduces a lot of challenges:

  • A large number of issues and pull requests, which require significant amount of time to triage, review and shepherd until merging. As of Oct 15, 2018, there are 864 open issues and 161 open PRs. Many of these are a few years old and remain in limbo.
  • Keeping the code base healthy. It is great to receive many community contributions, but the overall project stability and code quality must come first. If the PR idea/work is interesting, but the quality does not match the standard or introduces risky changes without justified reasons, the project maintainers have a hard time handling these PRs. With the proposal for frequent release plan (SIP-12), master stability becomes even more important.
  • Merging a PR that ends up breaking master branch. Old community PRs that are finally merged sometimes break the master branch (commonly CI) because they are not rebased against master and tested again before merging. The responsibility then falls to the maintainer to resolve the broken state.

For contributors, these are the common questions:

  • Who will review my PR? Currently, it is unclear who should participate in the code review of each PR. Contributors sometimes manually tag github username of project maintainers for reviews. This does not work well for beginners who do not know anyone, perhaps tagging @mistercrunch. Even experienced contributors can have this issue when touching the code in different sections of the project. It is also a manual process.
  • When will someone review my PR? There are several uncertainties around the code review process. Some are implied (e.g.,no review until the PR pass all CI tests, PRs with [WIP] in the title are considered work-in-progress and will not be reviewed, etc.). However, there are many PRs that are not ready for review, but do not have [WIP] and cause confusion.
  • When will someone merge my PR? After PRs are approved, sometimes they are left in that state for a while. It is not explained when it will be merged. This left the contributors hanging.
  • Which release will include my PR? Currently, it is unclear when will the change make it to an official release, leading to different organizations maintaining their own branches. Please also see a proposal for release plan in SIP-12 for more details.

To be more efficient at maintaining Superset and improving it in the long-term, we need more workflow clarity that will increase code quality and improve project stability. This proposal wishes to reduce operational workload for the project maintainers, set the right expectations for everyone and make the project keep growing with happy community.

Proposed Changes

[P.1] Project maintainers can request an issue to be created before a PR is reviewed.

A philosophy we would like to strong encourage is

"Before creating a PR, create an issue."

We would like to update the contributing guidelines for Superset as follows.

Bug fixes: If you’re only fixing a small bug, it’s fine to submit a pull request right away but we highly recommend to file an issue detailing what you’re fixing. This is helpful in case we don’t accept that specific fix but want to keep track of the issue. Please keep in mind that the project maintainers reserve the rights to accept or reject incoming PRs, so it is better to separate the issue and the code to fix it from each other. In some cases, project maintainers may request you to create a separate issue from PR before proceeding.

Refactor: For small refactors, it can be a standalone PR itself detailing what you are refactoring and why. If there are concerns, project maintainers may request you to create a SIP for the PR before proceeding.

Feature/Large changes: If you intend to change the public API, or make any non-trivial changes to the implementation, we requires you to file a new issue as SIP (Superset Improvement Proposal). This lets us reach an agreement on your proposal before you put significant effort into it. You are welcome to submit a PR along with the SIP (sometimes necessary for demonstration), but we will not review/merge the code until the SIP is approved.

In general, small PRs are always easier to review than large PRs. The best practice is to break your work into smaller independent PRs and refer to the same issue. This will greatly reduce turnaround time.

Finally, never submit a PR that will put master branch in broken state.

[P.2] Create multiple new templates for new issues/PRs

Currently we only have an issue template but not a PR template.

To facilitate the issue and PR creation, we will provide templates for different scenarios. Github supports multiple templates which users can choose from a menu when creating.

image

An issue will be one of the following types:

  1. Question
  2. Bug
  3. Feature request
  4. SIP (Superset Improvement Proposal)

A PR will be one of the following types

  1. Bugfix - Fix bugs
  2. Feature - Add new features
  3. Refactor - Refactor code to improve code quality
  4. Test - Add new test cases

A PR should generally includes the following:

  • Summary
  • Before/after screenshots or animated GIF (when applicable)
  • Test Plan
  • Checkboxes
    • Has associated issue(s):
    • Changes UI
    • Includes screenshot
    • Requires DB Migration
    • Introduces new feature or API
    • Removes existing feature or API

[P.3] Pre-define reviewers with CODEOWNERS

Github allows us to list mandatory reviewers for each directory in a CODEOWNERS file. When a new PR is opened, the corresponding reviewers will be assigned to review that PR automatically. Mandatory reviewers do not have to be committers.

CODEOWNERS will be defined as a team. A team contains one or more people, such as (python-reviewers, js-reviewers, vis-reviewers) See an example.

Note: CODEOWNERS requires changing github repo settings (via Apache) to enable protected branch option. Teams are defined in organization settings. Need to investigate if it has to be the same github organization (Apache) to define the teams. Fallback plan is using another github org or macro/alias.

[P.4] Define a protocol when a PR is in progress

Authoring

  • Fill in all sections of the PR template.
  • Add label “WIP” if not ready for review (WIP = work-in-progress)
  • Add category label: bugfix, refactor, test or feature
  • Remove label “WIP” and add “READY” when ready for review
  • Changes to user interface require before/after screenshots, or GIF for interactions
    • Recommended capture tools (Kapture, LICEcap, Skitch)
    • If no screenshot is provided, the mandatory reviewers will mark the PR with “missing:screenshot’ label and will not review until screenshot is provided.
  • Reviewers will not review the code until all CI tests are passed. Sometimes there can be flaky tests. You can close and open PR to re-run CI test. Please report if the issue persists and rebase your PR after the fix has been deployed to master.
  • If the PR was not ready for review and inactive for > 30 days, we will close it due to inactivity. The author is welcome to re-open and update. Will set up a stale bot to do this.

Reviewing

  • Triage and add labels: Here are some example labels. Please keep in mind that these are examples. The final list of labels and adding a bot to add them will become action items if this SIP is approved.
    • Missing: screenshot, issue, test
    • Language: Javascript, Python, Others
    • Components: Dashboard, SQL Lab, Explore, Charts
  • Use constructive tone when writing reviews.
  • If there are changes required, state clearly what needs to be done before the PR can be approved.

Merging

  • At least one mandatory reviewer is required for merging a PR.
  • A PR with label “READY” and approved will be merged by a committer. The committer adds label(s) of release branches to go into, i.e., release/1.1, release/1.2. (See SIP-12)
  • After the PR is merged, close the corresponding issue(s).

Post-merge Responsibility

  • Project maintainers may contact the PR author if new issues are introduced by the PR.
  • Project maintainers may revert your changes if a critical issue is found, such as breaking master branch CI.

Note about labels

Non-committers cannot assign label by default. We will setup bots that allow contributors to add whitelisted labels via bot. User can type in comments such as

-label WIP 
+label READY

If this does not work, we can fallback to adding [WIP] or [READY] to PR titles.

[P.5] New changes to CI

Here are a few proposed improvements to the CI

[P.5.1] Add mandatory reviewers hook

The mandatory reviewers are defined in CODEOWNERS file

[P.5.2] Enable Stale branch check.

Pros

- Prevents broken master when a stale branch is approved and merged.
- Parallelized merging of PRs.

Cons

- Serialized merging of PRs impies adding overhead (having to re-sync either via the UI or CLI) and rerun the CI.
- Improvements in CI performance would help reduce some of the burden.

Note: Require this change to the github repo settings

image

[P.5.3] Automatic checks for code coverage project/patch

Similar to scikit-learn, this enforces the contributing guidelines of unit testing. Commits pushed to master should not make the overall project coverage decrease by more than 1%:

This will be enabled, though initially as a non-required check as we iterate on the configuration settings.

Note: These CI changes require Apache admin changes to the Github repo.

Action items

Please comment and refer to [P.x.x] if you have any suggestion.

If this SIP is approved, here are the follow-up tasks.

  • Update CONTRIBUTING.md
  • Create new issues and PRs templates
  • Create CODEOWNERS file, teams, and list mandatory reviewers
  • Update CI configuration
  • Setup bots for handling labels
  • Setup a bot for closing stale issues and PR
  • Triage the current PRs and issues.

References

@betodealmeida @mistercrunch

@mistercrunch
Copy link
Member

This is all super great. Some thoughts:

  • this seems to very much reflect the general consensus, nothing too controversial here as far as I'm concerned
  • guidelines or hard rules? I'm siding towards guidelines unless we can actually enforce with tooling (github or GH integrations)
  • I'm not sure whether not-rebasing prior to merge is a real common issue (how many times did that really happen, and how much work is it to fix a basic merge issue like import ordering?), I'd rather deal with the occasional issue with hard reverts than making everyone constantly rebase. I'm thinking code should settle quite a bit after this huge round of refactor that's been taking place (py2 deprecation, major JS restructuring, plugins, ...)
  • happy to give code owners a try (haven't used it in the past)
  • I used a stale bot in the past, but it was a bit of a manual process (running a script, used my Github personal API token ...). I really think we need this. I'm not sure if Apache will let use probot, they have strict restrictions on giving perms to outside orgs. I can dig out the info as to the thing I used in the past, but it wasn't great...

@mistercrunch
Copy link
Member

Reminder that part of the SIP process should be to notify the Apache Superset dev@ mailing list.

@kristw
Copy link
Contributor Author

kristw commented Oct 18, 2018

  • this seems to very much reflect the general consensus, nothing too controversial here as far as I'm concerned
  • guidelines or hard rules? I'm siding towards guidelines unless we can actually enforce with tooling (github or GH integrations)

Glad to hear that. We intend for this to be a written-down version of the unwritten agreement of how it currently work, with some more explicit instructions of what to do, so there will be less surprise why something is not reviewed.

  • I'm not sure whether not-rebasing prior to merge is a real common issue (how many times did that really happen, and how much work is it to fix a basic merge issue like import ordering?), I'd rather deal with the occasional issue with hard reverts than making everyone constantly rebase. I'm thinking code should settle quite a bit after this huge round of refactor that's been taking place (py2 deprecation, major JS restructuring, plugins, ...)

We can extract this bullet point [P.5.2] into a separate issue if the community has different opinions about it.

  • happy to give code owners a try (haven't used it in the past)

I have never used CODEOWNERS specifically, but used OWNERS file per directory at my previous job. This can automate some manual taggings we are doing in PRs and is useful for maintaining large codebase in general. An alternative is welcome too. We choose CODEOWNERS in the proposal because it is already built-in by github.

  • I used a stale bot in the past, but it was a bit of a manual process (running a script, used my Github personal API token ...). I really think we need this. I'm not sure if Apache will let use probot, they have strict restrictions on giving perms to outside orgs. I can dig out the info as to the thing I used in the past, but it wasn't great...

Your experience with this will be super helpful.

@michellethomas
Copy link
Contributor

I created a pull request template to follow up on [P.2] Create multiple new templates for new issues/PRs. Having multiple Pr templates is difficult at this point because you need to use the ?template= query param to use them, so I just created one PR template for now, but once they improve that feature we can add multiple as you suggested.

@kristw
Copy link
Contributor Author

kristw commented Apr 19, 2019

Remaining tasks are

  • Create CODEOWNERS file, teams, and list mandatory reviewers. (May consider using the pull-approve bot instead because it allows more flexible way to define reviewers.)
  • Update CI configuration
  • Setup bots for handling labels

@mistercrunch mistercrunch added this to pre-DISCUSS in SIPs Feb 24, 2020
@rusackas
Copy link
Member

@kristw I'm closing this due to age, but please do reopen the issue if you intend to put it up for a DISCUSS or VOTE thread. Thank you!

@srinify srinify moved this from pre-DISCUSS to VOTING in SIPs Feb 4, 2022
@rusackas rusackas reopened this Feb 4, 2022
@rusackas
Copy link
Member

rusackas commented Feb 4, 2022

Apologies for the prior confusing comment. This SIP is still technically in the VOTE process. It has sufficient votes to pass. We should post a [RESULT] thread to close this out @kristw

Vote thread: https://lists.apache.org/thread/wvmzbczwbrmoo4y9y631f9cz5jq7lq3s

@stale
Copy link

stale bot commented Apr 16, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. For admin, please label this issue .pinned to prevent stale bot from closing the issue.

@stale stale bot added the inactive Inactive for >= 30 days label Apr 16, 2022
@john-bodley
Copy link
Member

Although this seems to be stuck in an open vote, some aspects of the SIP have been completed whereas others likely outdated, hence we sense closing this SIP makes the most sense. @kristw if you feel otherwise please re-open the SIP with additional context.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
inactive Inactive for >= 30 days sip Superset Improvement Proposal
Projects
Status: Denied / Closed / Discarded
SIPs
VOTING (in the mailing list)
Development

No branches or pull requests

5 participants