Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-6122] Update committer guidelines #7129

Merged
merged 3 commits into from Dec 4, 2018

Conversation

Projects
None yet
8 participants
@tweise
Copy link
Contributor

commented Nov 26, 2018

Update guidelines per discussion in https://lists.apache.org/thread.html/6d922820d6fc352479f88e5c8737f2c8893ddb706a1e578b50d28948@%3Cdev.beam.apache.org%3E


Follow this checklist to help us incorporate your contribution quickly and easily:

  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

It will help us expedite review of your Pull Request if you tag someone (e.g. @username) to look at it.

Post-Commit Tests Status (on master branch)

Lang SDK Apex Dataflow Flink Gearpump Samza Spark
Go Build Status --- --- --- --- --- ---
Java Build Status Build Status Build Status Build Status Build Status Build Status Build Status Build Status
Python Build Status --- Build Status
Build Status
Build Status --- --- ---

@tweise tweise requested review from kennknowles and robertwb Nov 26, 2018

@tweise tweise force-pushed the tweise:BEAM-6122.committer-guidelines branch from 1752050 to 2305a28 Nov 26, 2018

@tweise tweise force-pushed the tweise:BEAM-6122.committer-guidelines branch from 2305a28 to 9c018ac Nov 26, 2018

* We prefer small independent, incremental PRs with descriptive, isolated commits. Each commit is a single clear change
* It is OK to keep separate commits for different logical pieces of the code, if they make reviewing and revisiting code easier
* Making commits isolated is a good practice, authors should be able to relatively easily split the PR upon reviewer's request
* When there are multiple commits in a single PR, every commit that gets merged should compile and pass tests

This comment has been minimized.

Copy link
@robertwb

robertwb Nov 26, 2018

Contributor

This is too strong, sometimes it is really helpful to split out completely automated (but possibly broken) refactorings from the manual fixups that were required.

This comment has been minimized.

Copy link
@tweise

tweise Nov 26, 2018

Author Contributor

Do you have a suggestion how to word it? The challenge would be to come up with something that is clear and actionable for the common case.

This comment has been minimized.

Copy link
@swegner

swegner Nov 26, 2018

Contributor

Note that GitHub doesn't easily support this workflow: GitHub runs validation against the HEAD commit for a PR, but not on other commits. And for reviewing: GitHub displays the validation status of the HEAD commit, but the intermediate commit status is hidden away.

It's non-trivial work to ensure every commit is green, and I think the value provided (rollback granularity) is minimal. I would prefer that our default policy be for the PR as a whole to be green, and when rollback is necessary, rollback the entire PR.

I agree with the other criteria and I believe we could cut this bullet and the above goal of granular rollback.

This comment has been minimized.

Copy link
@kennknowles

kennknowles Nov 26, 2018

Member

I actually experimented with git aliases / scripts to do this automatically. It isn't terribly hard for precommits - you iterate over all unmerged commits and force push them one by one. This might flood Jenkins a bit. It would be nice to have this for postcommits. It seems like it ought to be a checkbox in Jenkins config actually.

I also want to mention that the rollback PR should be green except in rare cases. So a green rollback of one commit is just great.

This comment has been minimized.

Copy link
@kennknowles

kennknowles Nov 26, 2018

Member

I would simply remove this bullet. I don't think there's anything that is both reasonable and actionable.

In practice, one may often revert a whole PR as Scott suggests. You can always open multiple reverts for the whole PR, various commits in the PR, etc, and choose one (the first?) that restores signal. It is then easy enough to roll forward innocent commits that were caught in the dragnet.

This comment has been minimized.

Copy link
@tweise

tweise Nov 26, 2018

Author Contributor

Looks like there is agreement to just drop this item.

This comment has been minimized.

Copy link
@mxm

mxm Nov 28, 2018

Contributor

The beauty of every PR compiling and passing tests is that you can use git-bisect without any trouble. There might be a way to bisect only with merge commits; individual broken commits wouldn't matter then.

Instead of dropping the paragraph we could rephrase it to "Ideally, every commit should compile and pass tests" but I generally agree that it is not practical nor do we have the right test infrastructure to verify it.

This comment has been minimized.

Copy link
@robertwb

robertwb Nov 29, 2018

Contributor

I was also about to suggest softening this to "Generally, every commit should compile and past tests." This makes it more of a (valuable IMHO) guideline rather than something we require or enforce.

This comment has been minimized.

Copy link
@tweise

tweise Nov 30, 2018

Author Contributor

Changed wording as suggested by Robert.

This comment has been minimized.

Copy link
@tweise

tweise Nov 30, 2018

Author Contributor

I also want to mention that the rollback PR should be green except in rare cases.

Added that under rollback.


## Merging it!

After all the tests pass, there should be a green merge button at the bottom of
the pull request. There are multiple choices and you should choose "Merge pull
the pull request. There are multiple choices. Unless you want to squash commits
as part of the merge (see above) you should choose "Merge pull
request" (the default). This preserves the commit history and adds a merge

This comment has been minimized.

Copy link
@apilloud

apilloud Nov 26, 2018

Member

Merge pull request actually defaults to whatever operation you did last time. Can you remove (the default) and replace it with and ensure "Create a merge commit" is selected from the drop down.

This comment has been minimized.

Copy link
@tweise

tweise Nov 30, 2018

Author Contributor

Done!

* We prefer small independent, incremental PRs with descriptive, isolated commits. Each commit is a single clear change
* It is OK to keep separate commits for different logical pieces of the code, if they make reviewing and revisiting code easier
* Making commits isolated is a good practice, authors should be able to relatively easily split the PR upon reviewer's request
* When there are multiple commits in a single PR, every commit that gets merged should compile and pass tests

This comment has been minimized.

Copy link
@swegner

swegner Nov 26, 2018

Contributor

Note that GitHub doesn't easily support this workflow: GitHub runs validation against the HEAD commit for a PR, but not on other commits. And for reviewing: GitHub displays the validation status of the HEAD commit, but the intermediate commit status is hidden away.

It's non-trivial work to ensure every commit is green, and I think the value provided (rollback granularity) is minimal. I would prefer that our default policy be for the PR as a whole to be green, and when rollback is necessary, rollback the entire PR.

I agree with the other criteria and I believe we could cut this bullet and the above goal of granular rollback.

rollback; it is easy to roll back many commits, or a whole pull request, but
harder to roll back part of a commit.
* Commit messages should tag JIRAs and be otherwise descriptive.
It should later not be necessary to find a merge or first PR commit to find out what caused a change.

This comment has been minimized.

Copy link
@swegner

swegner Nov 26, 2018

Contributor

FYI, GitHub's UI makes it easy to find the associated PR, which I find to be most useful as it contains the discussion / metadata around a change.

image

This comment has been minimized.

Copy link
@kennknowles

kennknowles Nov 26, 2018

Member

Yea, GitHub has really stepped up this UI support. That said, git log is much more powerful in a lot of ways so it should be first class too.

@@ -100,3 +128,6 @@ Instead, pull it all into the subject line:
Merge pull request #1234: [BEAM-7873] Fix the foo bizzle bazzle

If you have comments to add, put them in the body of the commit message.

After merging the pull request, the committer is responsible for resolving associated JIRA(s)

This comment has been minimized.

Copy link
@swegner

swegner Nov 26, 2018

Contributor

Do you mean the contributor's responsibility, or the JIRA assignee? The contributor has the most context on whether the PR completes the associated JIRA. It might be that a JIRA requires multiple PR's (or something external, like a release) to complete.

I think it would be difficult for the committer to take this responsibility.

This comment has been minimized.

Copy link
@kennknowles

kennknowles Nov 26, 2018

Member

Yea, I think it should be the contributor who did the change and presumably has the Jira assigned to them. We do have a problem with Jira's staying open until they get re-triaged but not sure that fits with the discussion we had on list.

This comment has been minimized.

Copy link
@amaliujia

amaliujia Nov 26, 2018

Contributor

It's a trade off between full context of a JIRA and a well managed JIRA task list. Contributors could have more context on a JIRA but contributors could leave the JIRA open unintentionally.

A compromise could be that committer asks in a JIRA if it can be closed after associated PR is approved, and close the JIRA if no reply shows up within a reasonable time (like 3 days). Contributor could always reopen the JIRA or add more context later.

This comment has been minimized.

Copy link
@amaliujia

amaliujia Nov 26, 2018

Contributor

And yes, seems like this point is not on the discussion thread. We might need to have a discussion for it since it affects both roles (contributor and committer).

This comment has been minimized.

Copy link
@tweise

tweise Nov 26, 2018

Author Contributor

We should move this line out of the PR and discuss it separately. To review a PR, I probably need to sufficiently understand the scope of the JIRA and what the change I just merge addresses or not. Typically the committer merging the PR takes the last action in the chain and the contributor may not return just to close out the JIRA. The committer is also more likely to be aware of the versioning.

This comment has been minimized.

Copy link
@robertwb

robertwb Nov 29, 2018

Contributor

I'm also in favor of removing this line. I think it's nice to do (especially if it's obvious and you have the context in your head) but not essential.

This comment has been minimized.

Copy link
@tweise

tweise Nov 30, 2018

Author Contributor

Removed.

* We prefer small independent, incremental PRs with descriptive, isolated commits. Each commit is a single clear change
* It is OK to keep separate commits for different logical pieces of the code, if they make reviewing and revisiting code easier
* Making commits isolated is a good practice, authors should be able to relatively easily split the PR upon reviewer's request
* When there are multiple commits in a single PR, every commit that gets merged should compile and pass tests

This comment has been minimized.

Copy link
@kennknowles

kennknowles Nov 26, 2018

Member

I would simply remove this bullet. I don't think there's anything that is both reasonable and actionable.

In practice, one may often revert a whole PR as Scott suggests. You can always open multiple reverts for the whole PR, various commits in the PR, etc, and choose one (the first?) that restores signal. It is then easy enough to roll forward innocent commits that were caught in the dragnet.

rollback; it is easy to roll back many commits, or a whole pull request, but
harder to roll back part of a commit.
* Commit messages should tag JIRAs and be otherwise descriptive.
It should later not be necessary to find a merge or first PR commit to find out what caused a change.

This comment has been minimized.

Copy link
@kennknowles

kennknowles Nov 26, 2018

Member

Yea, GitHub has really stepped up this UI support. That said, git log is much more powerful in a lot of ways so it should be first class too.

Show resolved Hide resolved website/src/contribute/committer-guide.md
@@ -100,3 +128,6 @@ Instead, pull it all into the subject line:
Merge pull request #1234: [BEAM-7873] Fix the foo bizzle bazzle

If you have comments to add, put them in the body of the commit message.

After merging the pull request, the committer is responsible for resolving associated JIRA(s)

This comment has been minimized.

Copy link
@kennknowles

kennknowles Nov 26, 2018

Member

Yea, I think it should be the contributor who did the change and presumably has the Jira assigned to them. We do have a problem with Jira's staying open until they get re-triaged but not sure that fits with the discussion we had on list.

Show resolved Hide resolved website/src/contribute/index.md
Show resolved Hide resolved website/src/contribute/index.md
@@ -172,6 +174,9 @@ To contribute code, you need
Use `R: @username` in the pull request to notify a reviewer.
1. If you don't get any response in 3 business days, email the dev@ list to ask for someone to look at your pull
request.
1. Review feedback typically leads to follow-up changes. Please add these changes as additional "fixup" commits to the

This comment has been minimized.

Copy link
@kennknowles

kennknowles Nov 26, 2018

Member

I don't think we agreed on this, did we? This used to be critical for GitHub to not lose comments, but I think that is no longer the case. I see a lot of PRs that don't do this and it seems OK. In fact, when a large PR is reviewed incrementally like this I think it tends to let in worse changes because the increments after the first round look OK even though the resulting code is not so good.

FWIW reviewable.io does not require this and has its own incremental review that we have toyed with. Anyone can choose to use it on a per-review basis.

This comment has been minimized.

Copy link
@tweise

tweise Nov 26, 2018

Author Contributor

I don't think it was part of the discussion, rather it seems to be the undocumented status quo from looking at what happens currently (changes being added incrementally to PRs)? I don't have a strong preference one way or the other.

This comment has been minimized.

Copy link
@robertwb

robertwb Nov 29, 2018

Contributor

IIRC, github comments are tied to diffs, so they can be lost if the code changes too much and the commits are gone. I personally like seeing just the changes, as it gives me the choice of viewing everything vs. the incremental diff (and the default is still everything). If consensus isn't obvious here, maybe back to the list?

This comment has been minimized.

Copy link
@tweise

tweise Nov 30, 2018

Author Contributor

Changed wording.

@swegner

This comment has been minimized.

Copy link
Contributor

commented Nov 26, 2018

In general, I believe that manual processes should be made as simple as possible and have failure-modes which give timely and actionable feedback.

Some of these policies (like tagging a JIRA on every commit) would be very easy to validate in a pre-commit (the website build.gradle gives an example of interacting with Git from Gradle). But, the workflow for fixing up commit messages takes some degree of Git mastery (git rebase -i). It's not a huge leap, but it adds the requirement that all contributors known intermediate/advanced Git (or always write compliant commit messages on the first try).

A simpler failure mode would be something like: "If each commit is tagged with a JIRA, individual commits will be retained during merge; otherwise commiters should squash and merge a single commit with JIRA information from the PR."

@kennknowles

This comment has been minimized.

Copy link
Member

commented Nov 26, 2018

I really like Scott's latest points. Another aspect here is that the instructions for new contributors should be primarily helpful and welcoming, and avoid being prescriptive or pedantic. And concise. We want them to have an easy onramp, so we need enough tips, but we also don't want to discourage their contributor either by excessive demands or an intimidatingly large amount of instruction. My biggest goal is for them to open a PR and have a good interaction with a reviewer. I don't want to put anything in the way of that. That's why rules for policing our more experienced members should be clearly separate. As an extremely common special case, if someone thinks they already know how to open a PR, I want to get out of the way and let them do that, and work out the rest once we are in conversation. Putting this stuff in Jenkins is great not only for the usual reasons for automation but also because it makes iteration on it self-service.

Knowledgeable committers can take on additional chores and/or chat with the contributor about things on the PR. I do think Beam committers can be expected to know rebase -i and how to push to a contributor's branch, or else abstain from merging PRs.

On the automation front, just noting that we have 1575 non-merge commits on master since Oct 1 (choosing a recent date to avoid any differences in older policies) of which 454 are tagged with a Jira. I don't think that 2/3 of the commits are out of line. But I also think forcing folks to just tag them with the Jira they happen to also be addressing is fine. It is a really nice way to handle the "should I [ask for] squash or not?" problem.

tweise added some commits Nov 30, 2018

@tweise

This comment has been minimized.

Copy link
Contributor Author

commented Dec 3, 2018

@swegner

swegner approved these changes Dec 4, 2018

@tweise tweise merged commit e5d9cf4 into apache:master Dec 4, 2018

3 checks passed

RAT ("Run RAT PreCommit") SUCCESS
Details
Website ("Run Website PreCommit") SUCCESS
Details
Website_Stage_GCS ("Run Website_Stage_GCS PreCommit") SUCCESS
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.