[dev] provide a python merge script for merging pull requests #2526

sijie · 2018-09-05T20:51:28Z

Motivation

Currently we don't have a good tool for managing the merge process for pull requests.
Labels and milestones are not marked correctly. Changes are not cherry-picked to branches.
This causes a lot of trouble for release managers when they do a release.

Changes

This PR adopts the BookKeeper merge script and changes it to use the GitHub pull request merge API instead of git push.

This script handles:

- labeling with component/ labels
- labeling with type/ labels
- marking the milestone
- using the GitHub pull request merge API to do a 'squash' merge
- cherry-picking changes to branches and labeling them with the corresponding release/ labels. The cherry-picked commits carry back-references to the original commit, so when you navigate the commits on GitHub you can quickly navigate to the original commit.
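
As a rough illustration of the squash-merge step listed above, the sketch below builds a request for GitHub's "merge a pull request" endpoint (`PUT /repos/{owner}/{repo}/pulls/{number}/merge`) with `merge_method` set to `squash`. It is written for Python 3 rather than the Python 2 of the script under review. The helper name `build_squash_merge_request`, its parameters, and the `GITHUB_API_BASE` constant are hypothetical and not taken from the script. Nothing is sent over the network.

```python
import json
import urllib.request

# Assumed repository URL, used only for illustration.
GITHUB_API_BASE = "https://api.github.com/repos/apache/pulsar"


def build_squash_merge_request(pr_number, title, body, token):
    """Build (but do not send) a squash-merge request for one pull request.

    Hypothetical helper; not part of the script under review.
    """
    payload = {
        # Use the PR title and description so the squashed commit is readable.
        "commit_title": "%s (#%d)" % (title, pr_number),
        "commit_message": body,
        "merge_method": "squash",
    }
    return urllib.request.Request(
        "%s/pulls/%d/merge" % (GITHUB_API_BASE, pr_number),
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": "token %s" % token,
            "Content-Type": "application/json",
        },
        method="PUT",
    )
```

Passing the PR description as `commit_message` is one way to avoid the concatenated per-commit log that the merge button produces by default.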

ivankelly · 2018-09-06T08:01:30Z

dev/pulsar-merge-pr.py

+# Remote name which points to the GitHub site
+PR_REMOTE_NAME = os.environ.get("PR_REMOTE_NAME", "apache")
+# Remote name which points to Apache git
+PUSH_REMOTE_NAME = os.environ.get("PUSH_REMOTE_NAME", "apache")


Having PR_REMOTE_NAME and PUSH_REMOTE_NAME no longer makes sense, since github is the canonical repo.

ivankelly · 2018-09-06T08:04:23Z

dev/pulsar-merge-pr.py

+TEMP_BRANCH_PREFIX = "PR_TOOL"
+RELEASE_BRANCH_PREFIX = "branch-"
+
+DEFAULT_FIX_VERSION = os.environ.get("DEFAULT_FIX_VERSION", "0.9.1.0")


Where does this version come from?

I didn't change this when porting the script from BookKeeper, so I think it comes from Spark, since the BookKeeper script was itself ported from Spark.

ivankelly · 2018-09-06T08:04:43Z

dev/pulsar-merge-pr.py

+def get_json(url, preview_api = False):
+    try:
+        request = urllib2.Request(url)
+        if GITHUB_OAUTH_KEY:


Should never be null.

ivankelly · 2018-09-06T08:06:49Z

dev/pulsar-merge-pr.py

+                  "GitHub requests."
+        else:
+            print "Unable to fetch URL, exiting: %s - %s" % (url, e)
+        sys.exit(-1)


call fail()

ivankelly · 2018-09-06T08:06:57Z

dev/pulsar-merge-pr.py

+        else:
+            print "Unable to fetch URL, exiting: %s - %s" % (url, e)
+            print e
+        sys.exit(-1)


call fail()

ivankelly · 2018-09-06T08:07:01Z

dev/pulsar-merge-pr.py

+                  "GitHub requests."
+        else:
+            print "Unable to fetch URL, exiting: %s" % url
+        sys.exit(-1)


call fail()
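
The review comments above ask that every error path go through `fail()` and note that `GITHUB_OAUTH_KEY` should never be null. A minimal Python 3 sketch of that shape follows. The body of `fail()` and the `token` and `opener` parameters are assumptions made for illustration; the script's own `fail()` is not shown in the diff excerpts and may do more, such as cleaning up temporary branches.

```python
import json
import sys
import urllib.error
import urllib.request


def fail(msg):
    """Print the error and exit; the real fail() may also clean up."""
    print(msg, file=sys.stderr)
    sys.exit(-1)


def get_json(url, token, opener=urllib.request.urlopen):
    """Fetch and decode JSON from url, routing every error through fail()."""
    if not token:
        # Treat a missing token as a hard error instead of an optional branch.
        fail("A GitHub OAuth token is required")
    request = urllib.request.Request(url)
    request.add_header("Authorization", "token %s" % token)
    try:
        with opener(request) as response:
            return json.load(response)
    except urllib.error.URLError as e:  # HTTPError is a subclass of URLError
        fail("Unable to fetch URL, exiting: %s - %s" % (url, e))
```

The `opener` argument exists only so the function can be exercised without network access.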

ivankelly · 2018-09-06T08:14:25Z

dev/pulsar-merge-pr.py

+        if line.startswith('>'):
+            continue
+        modified_body = modified_body + line + "\n"
+    if modified_body != body:


This will ask every time, because it will add an extra '\n' at the end.
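
To make the trailing-newline problem concrete, the sketch below contrasts the append-per-line approach from the diff with a `"\n".join` variant. It assumes the loop iterates over `body.split("\n")`, which is not visible in the excerpt, and both function names are hypothetical.

```python
def strip_quoted_lines_naive(body):
    """Mirror of the reviewed loop: appends a newline after every kept line."""
    modified_body = ""
    for line in body.split("\n"):
        if line.startswith(">"):
            continue
        modified_body = modified_body + line + "\n"
    return modified_body


def strip_quoted_lines(body):
    """Drop lines starting with '>' without adding a trailing newline."""
    kept = [line for line in body.split("\n") if not line.startswith(">")]
    return "\n".join(kept)


body = "Motivation\n\nChanges"
print(strip_quoted_lines_naive(body) != body)  # True: an extra "\n" was added
print(strip_quoted_lines(body) == body)        # True: unchanged when nothing is quoted
```

With the join-based version, the `modified_body != body` check only fires when a quoted line was actually removed.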

ivankelly · 2018-09-06T08:22:30Z

dev/pulsar-merge-pr.py

+
+    # Merged pull requests don't appear as merged in the GitHub API;
+    # Instead, they're closed by asfgit.
+    merge_commits = \


This looks like an artifact of the days when the ASF had its own repos and GitHub was just a mirror.

A lot of stuff can be removed from this script by not using a local repo at all, and using the github merge api.
https://developer.github.com/v3/pulls/#merge-a-pull-request-merge-button

The logic here checks whether a PR can be merged or not. The real merge action has been replaced by the merge API, as I stated in the description. I can clean this up if needed.

The merge api should throw an error if you can't merge, no?

@ivankelly yes, I said that I can clean this up if needed. but I don't want to make any changes until the community wants to adopt this script.
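
On the question of whether the merge API reports unmergeable pull requests: the GitHub REST documentation lists HTTP 405 for a merge that cannot be performed and HTTP 409 when a `sha` was supplied and the pull request head did not match it. A sketch of relying on those responses instead of a local mergeability check could look like this. The function name and messages are hypothetical.

```python
def describe_merge_failure(status_code):
    """Map a merge-endpoint HTTP status to a message, or None on success."""
    if status_code == 200:
        return None
    if status_code == 405:
        return "pull request is not mergeable"
    if status_code == 409:
        return "pull request head changed since it was last checked"
    return "unexpected response from GitHub: HTTP %d" % status_code
```

A caller would pass the message to its error handler whenever the result is not `None`.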

maskit

In short, I think the root cause of the mess is not lack of tools but lack of clear policies.

> Labels and milestones are not marked correctly.

Where's the rule book?

> Changes are not cherry-picked to branches.

Do we have a maintenance policy?
What commits have to be cherry-picked? To which branches? Until when? By Who?

I understand the motivation, however, I think we should discuss rules and policies, and should build a consensus first. Otherwise who can review and maintain the script correctly?

I'm fine with automating boring steps and preventing mistakes by using scripts, but it's questionable to me that we need complicated rules that we cannot manually apply correctly.

sijie · 2018-09-06T15:48:13Z

> In short, I think the root cause of the mess is not lack of tools but lack of clear policies.

Agreed. That's what I stated in the mail thread. What I need is a clear policy. The tool here is adding enforcement.

> Where's the rule book?
> Do we have a maintenance policy?
> What commits have to be cherry-picked? To which branches? Until when? By Who?

I started the mail thread almost a month ago, but almost no one in the community is interested in discussing the rules.

> I understand the motivation, however, I think we should discuss rules and policies, and should build a consensus first. Otherwise who can review and maintain the script correctly?

I have started the discussion in the mail thread. This PR just demonstrates a proposal to follow what other projects are doing: Spark, BookKeeper, Kafka and many other ASF projects. If the community likes this approach, we can adopt it. If the community doesn't like it, that is okay, since it is just a proposal. If there are other good approaches, please raise them in the mail thread.

> I'm fine with automating boring steps and preventing mistakes by using scripts, but it's questionable to me that we need complicated rules that we cannot manually apply correctly.

First, labeling the issues and marking the milestone. This is the easiest part. If everyone follows the rules, that is totally fine. However, people are just lazy and steps are usually missed.

Second, managing the commit message. When you click the 'squash and merge' button, if you don't pay attention to copying the PR description into the merge text box, you will end up with a commit without a description or with a mix of random commits.

Last, managing the cherry-picks. That is the difficult part. Currently we use milestones for tracking releases; however, a PR can be linked to only one milestone, so there is no accurate view of how a pull request lands in the git repo. A release manager has to either 1) manually cherry-pick to branches and leave a comment in the original pull request, or 2) manually create a new PR for it, which I think is a burden on the release manager.
For 1), those individual cherry-picks are not interconnected. When you look at the git log of the branch, you never know whether a change is a cherry-pick or a brand-new commit. There is almost no way for people to trace back to the original change.
For 2), it has to go through another review cycle, and the problems listed above will occur again during merging.

I don't mean to introduce any overhead to the merging and releasing process. However, I do think that for Pulsar to become a successful project, it has to do something about this management.

In the end, to really know the pain points, I would suggest that every committer in the community do at least one bugfix release.

merlimat · 2018-09-06T17:12:57Z

> Second, managing the commit message. When you click the 'squash and merge' button, if you don't pay attention to copying the PR description into the merge text box, you will end up with a commit without a description or with a mix of random commits.

Agreed that the commit message is not uniformly enforced. Also, most people (myself included) only fill in the longer description in the PR, not in the commit log.

> or with a mix of random commits

The current setting in github is to only allow the option to do "squash & rebase", so at least this is already enforced.

sijie · 2018-09-06T17:25:24Z

> The current setting in github is to only allow the option to do "squash & rebase", so at least this is already enforced.

Sorry, I meant commit messages. Basically, GitHub's squash and merge concatenates the commit logs from all commits. If you don't pay attention and replace that text, it becomes a mess in the git log.

maskit · 2018-09-07T14:52:23Z

> Agreed. That's what I stated in the mail thread. What I need is a clear policy. The tool here is adding enforcement.

> I started the mail thread almost a month ago, but almost no one in the community is interested in discussing the rules.

> I have started the discussion in the mail thread. This PR just demonstrates a proposal to follow what other projects are doing: Spark, BookKeeper, Kafka and many other ASF projects. If the community likes this approach, we can adopt it. If the community doesn't like it, that is okay, since it is just a proposal. If there are other good approaches, please raise them in the mail thread.

I'm sorry for not responding on the mail thread. However, as you commented, a clear policy is what we need. If we just borrowed scripts from another project, the policy would remain unclear to people who are not participating in that project.

I'd like to see the policy that is coded into the script. We can enforce the policy later by using the script.

> First, labeling the issues and marking the milestone. This is the easiest part. If everyone follows the rules, that is totally fine. However, people are just lazy and steps are usually missed.

I think we need to get used to doing it, and committers should be responsible for checking these things. The ATS community is doing this pretty well without tools. A clear policy would help.

> Second, managing the commit message. When you click the 'squash and merge' button, if you don't pay attention to copying the PR description into the merge text box, you will end up with a commit without a description or with a mix of random commits.

The same as above. Committers should take care of this. I don't think automating it always ends up with the best result.

> Last, managing the cherry-picks. That is the difficult part. Currently we use milestones for tracking releases; however, a PR can be linked to only one milestone.

I understand the difficulty. The ATS community is facing the same issue. To deal with it, they also use Project for managing backport requests. It's still too early to say whether it's working well, but I think it is worth trying.

> For 1), those individual cherry-picks are not interconnected. When you look at the git log of the branch, you never know whether a change is a cherry-pick or a brand-new commit. There is almost no way for people to trace back to the original change.

In the ATS community, they use -x when they cherry-pick commits. This option appends the original commit hash to the commit log.
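
For reference, `git cherry-pick -x` appends a line of the form `(cherry picked from commit <sha>)` to the new commit message. The sketch below shows how a tool could issue that command and later recover the original hash from a commit message. The helper names are hypothetical, and nothing here is taken from the script under review or from ATS tooling.

```python
import re

# Line that `git cherry-pick -x` appends to the new commit message.
CHERRY_PICK_TRAILER = re.compile(r"\(cherry picked from commit ([0-9a-f]{40})\)")


def cherry_pick_command(commit_sha):
    """Return the git command that cherry-picks with a back-reference."""
    return ["git", "cherry-pick", "-x", commit_sha]


def original_commit(message):
    """Return the source commit hash recorded by -x, or None if absent."""
    match = CHERRY_PICK_TRAILER.search(message)
    return match.group(1) if match else None
```

The command list can be handed to `subprocess.run`, and `original_commit` can be applied to messages read from `git log` on a release branch to tell cherry-picks from brand-new commits.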

> For 2), it has to go through another review cycle, and the problems listed above will occur again during merging.

In the ATS community, they make PRs for backporting only if the commit cannot be cherry-picked cleanly due to conflicts. And the RM basically doesn't make PRs himself but asks the original author to do so, because the RM isn't familiar with the change/module in some cases. The RM just checks that PRs comply with the policy and merges them.

> I don't mean to introduce any overhead to the merging and releasing process. However, I do think that for Pulsar to become a successful project, it has to do something about this management.

I understand and agree.

> In the end, to really know the pain points, I would suggest that every committer in the community do at least one bugfix release.

I don't disagree, but that would take time. Since you are trying to enforce a policy, I guess you already have good examples for explaining the pain and how the policy solves it, right? Why don't you share those examples?

sijie · 2018-09-07T15:49:22Z

I think there is some confusion here. There is no policy coded into the script. It is still left to the community to come up with a policy. If you want me to make a proposal about the policy, I am happy to do that.

What the script really offers is automating (and enforcing) the steps that committers need to run manually. It is still the committer’s responsibility to choose labels and milestones and to decide on cherry-picks. That is to say, the tool is not the policy; it tries to standardize what people are doing manually. The discussion of a policy should happen regardless of whether the steps are manual or automated.

As you said, different projects have different ways of doing things, and both of us have biases. ATS may have its own practices that work for its community; however, I do not see that they would apply to the community here. Even though some of our committers know the “steps”, they sometimes just forget them during their daily engineering life. The tool here is only meant to help automate those steps. It automates the “steps” that committers currently perform manually; when doing them manually, people really have to think about each step. And again, the discussion about the policy (or the steps for merging a PR) should happen regardless of whether it is scripted or manual.

Regarding the pain points, I have stated what I have observed and encountered in my previous comments. The pain comes from committers doing things manually: different things get missed during merging, and that causes problems at release time. The solution I am proposing is to use a script to automate (and enforce) the steps that committers are doing manually. I have provided what I can; I am not sure what else I can provide.

maskit · 2018-09-07T17:51:17Z

Alright, it seems like I need to read what the script actually does.

Although I haven't read the script at all, I don't understand how we can automate something without policies, rules, or whatever. I guess the script does its task based on something; it is expecting something. I think that something is a part of the policies ("policy" and "rule" are different, but I used them to mean "something we should follow").

And I guess the script eases the pain you stated in your motivation below.

> Labels and milestones are not marked correctly. Changes are not cherry-picked to branches.
> This causes a lot of trouble for release managers when they do a release.

These are why I thought some policy is coded into the script.

Anyways, I hope our comments here made people think about policies.

As for ways of doing things, I agree. How things can be done depends on people in the community. I was suggesting that automating is not the only way. Because I believe steps are based on some policy, I think people can do some of the things without enforcing with scripts once we define clear policies.

As for the pain points, my point is that showing what you exactly saw during releases and explaining how labels, milestones, etc help RM is more effective than just saying missing things cause problems.

dave2wave · 2018-09-07T18:07:41Z

I think that the topic being discussed - how to tag and build a release including the policies should be surfaced on dev@pulsar with a clear subject.

sijie · 2018-09-07T18:18:43Z

@dave2wave yes. There was already an email thread about "how to label milestone". I was hoping the discussion regarding the policy or whatever rules would happen in that email thread. The PR here is just to showcase how a script can automate the steps the committers are currently using. It is not meant to be used for discussing the policy or rules about releases and milestones.

Descriptions of the changes in this PR:

### Motivation

The development of bookkeeper has been moved from JIRA to Github. So we can clean up the merge script to remove those instructions assuming we are still on apache mirror and JIRA.

### Changes

This change follows the changes that I have adjusted at apache/pulsar#2526

- remove related logic about JIRA from merge script
- change merge script to use github merge api

Author:
Reviewers: Enrico Olivelli <eolivelli@gmail.com>

This closes #1663 from sijie/improve_merge_script_2

maskit · 2018-09-10T11:27:01Z

dev/pulsar-merge-pr.py

+    # Find the github issues to close
+    github_issues = re.findall("#[0-9]{3,6}", title)
+
+    if len(github_issues) != 0:


This block is expecting that all referred issues in a PR title will be resolved when the PR is closed.

I'm OK with this rule; however, it needs to be discussed and clearly stated.
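
To illustrate what the pattern in the diff picks up, the sketch below applies the same regular expression to a couple of titles. The wrapper function `issues_in_title` is hypothetical; only the pattern string comes from the excerpt.

```python
import re

# Pattern quoted from the diff excerpt: '#' followed by 3 to 6 digits.
ISSUE_PATTERN = "#[0-9]{3,6}"


def issues_in_title(title):
    """Return every '#NNN'-style reference found in a PR title."""
    return re.findall(ISSUE_PATTERN, title)


print(issues_in_title("[dev] provide a python merge script for merging pull requests #2526"))
# ['#2526']
print(issues_in_title("Fix typo, see #42"))
# []
```

Any reference of three to six digits in the title is matched, whether or not the PR actually resolves it, which is why the rule the reviewer describes would need to be stated explicitly.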

sijie · 2018-11-19T20:44:54Z

Closing this PR due to inactivity.

@merlimat is working on proposing a workflow for this.

sijie added 2 commits September 5, 2018 13:38

revert unneeded change

cf60c99

sijie added area/build type/task labels Sep 5, 2018

sijie self-assigned this Sep 5, 2018

sijie requested review from ivankelly, merlimat, maskit, srkukarni, jerrypeng, jiazhai, joefk and yush1ga September 5, 2018 20:51

Fix license header

35073c9

ivankelly requested changes Sep 6, 2018

maskit reviewed Sep 6, 2018

sijie mentioned this pull request Sep 7, 2018

[dev] clean up merge script apache/bookkeeper#1663

Merged

maskit reviewed Sep 10, 2018

sijie closed this Nov 19, 2018
