Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[dev] provide a python merge script for merging pull requests #2526

Closed
wants to merge 3 commits into from

Conversation

sijie
Copy link
Member

@sijie sijie commented Sep 5, 2018

Motivation

Currently we don't have a good tool on managing the merge process for pull requests.
Labels and milestones are not marked correctly. Changes are not cherry-picked to branches.
It causes a lot of troubles for release managers to do a release.

Changes

This PR adopts the bookkeeper merge script and change it to use github pull request merge api instead of git push.

This script handles:

  • labeling component/
  • labeling type/
  • mark the milestone
  • use github pull request merge api to do 'squash' merge
  • allow cherry-picking changes to branches and label them with corresponding release/ labels. in the cherry-picked commits, it will add back-references to the original commit. so when you negative the commits on github, it should be able to quickly negative the original commit.

 ### Motivation

Currently we don't have a good tool on managing the merge process for pull requests.
Labels and milestones are not marked correctly. Changes are not cherry-picked to branches.
It causes a lot of troubles for release managers to do a release.

 ### Changes

This PR adopts the bookkeeper
merge script and change it to use github pull request merge api instead of git push.
# Remote name which points to the GitHub site
PR_REMOTE_NAME = os.environ.get("PR_REMOTE_NAME", "apache")
# Remote name which points to Apache git
PUSH_REMOTE_NAME = os.environ.get("PUSH_REMOTE_NAME", "apache")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having PR_REMOTE_NAME and PUSH_REMOTE_NAME no longer makes sense, since github is the canonical repo.

TEMP_BRANCH_PREFIX = "PR_TOOL"
RELEASE_BRANCH_PREFIX = "branch-"

DEFAULT_FIX_VERSION = os.environ.get("DEFAULT_FIX_VERSION", "0.9.1.0")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does this version come from?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't change this when porting this from bookkeeper. so I think it is coming from spark, since the bk script was ported from spark.

def get_json(url, preview_api = False):
try:
request = urllib2.Request(url)
if GITHUB_OAUTH_KEY:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should never be null.

"GitHub requests."
else:
print "Unable to fetch URL, exiting: %s - %s" % (url, e)
sys.exit(-1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

call fail()

else:
print "Unable to fetch URL, exiting: %s - %s" % (url, e)
print e
sys.exit(-1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

call fail()

"GitHub requests."
else:
print "Unable to fetch URL, exiting: %s" % url
sys.exit(-1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

call fail()

if line.startswith('>'):
continue
modified_body = modified_body + line + "\n"
if modified_body != body:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will ask every time, because it will add an extra '\n' at the end.


# Merged pull requests don't appear as merged in the GitHub API;
# Instead, they're closed by asfgit.
merge_commits = \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like an artifact of the days when asf had it's own repos and github was just mirrors.

A lot of stuff can be removed from this script by not using a local repo at all, and using the github merge api.
https://developer.github.com/v3/pulls/#merge-a-pull-request-merge-button

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the logic here is checking whether a PR can be merged or not. the real merge action has been replaced by the merge api. I have stated that in the description. I can clean this up if needed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The merge api should throw an error if you can't merge, no?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ivankelly yes, I said that I can clean this up if needed. but I don't want to make any changes until the community wants to adopt this script.

Copy link
Member

@maskit maskit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In short, I think the root cause of the mess is not lack of tools but lack of clear policies.

Labels and milestones are not marked correctly.

Where's the rule book?

Changes are not cherry-picked to branches.

Do we have a maintenance policy?
What commits have to be cherry-picked? To which branches? Until when? By Who?

I understand the motivation, however, I think we should discuss rules and policies, and should build a consensus first. Otherwise who can review and maintain the script correctly?

I'm fine with automating boring steps and preventing mistakes by using scripts, but it's questionable to me that we need complicated rules that we cannot manually apply correctly.

@sijie
Copy link
Member Author

sijie commented Sep 6, 2018

In short, I think the root cause of the mess is not lack of tools but lack of clear policies.

agreed. that's what I stated in the mail thread. what I need is a clear policy. the tool here is adding enforcement.

Where's the rule book?
Do we have a maintenance policy?
What commits have to be cherry-picked? To which branches? Until when? By Who?

I have started the mail thread for almost a month, but really almost no one in the community is interested in discussing the rules.

I understand the motivation, however, I think we should discuss rules and policies, and should build a consensus first. Otherwise who can review and maintain the script correctly?

I have started the discussion in the mail thread. This PR is just to demonstrate a proposal to follow what other project is doing spark, bookkeeper, kafka and many other ASF projects. If the community likes this approach, we can adopt it. If the community doesn't like the approach, it is okay since it is just a proposal, if there is other good approaches, please raise it up in the mail thread.

I'm fine with automating boring steps and preventing mistakes by using scripts, but it's questionable to me that we need complicated rules that we cannot manually apply correctly.

First, Labeling the issues and marking milestone. It is the easiest part. If everyone can follow the rules, that is totally fine. However people are just lazy and steps are usually missed.

Second, manage the commit message. when you click 'squash and merge' button, if you don't pay attention to copying the description in the PR to the merge text box, you will end up with a commit without description or with a mixed of random commits

Last, manage the cherry-picks. that is the difficult part. currently we are using milestone for tracking the releases, however a PR can be linked to only one milestone. so if there is no accurate view on how a pull request lands at the git repo. for a release manager, 1) he has to either manually cherry-pick to branches and left a comment in the original pull request, or 2) he has to manually create a new PR for it, which I think it is a load to the release manager.
For 1), those individual cherry-picks are not interconnected, when you look into a git log in the branch, you will never know if the change is a cherry-pick or a brand new commit. There is almost no way for people to traverse back the original change.
For 2), it has to going to another review cycle and the problem listed above during merging will occur.

I don't meant to introduce any overhead to the merging and releasing process. However I do think for Pulsar to become a success project, it has to do something on this management.

At the end, to really know the pain points, I would suggest every committer in the community should do at least one bugfix release.

@merlimat
Copy link
Contributor

merlimat commented Sep 6, 2018

Second, manage the commit message. when you click 'squash and merge' button, if you don't pay attention to copying the description in the PR to the merge text box, you will end up with a commit without description or with a mixed of random commits

agree with commit message that is not uniformly enforced. Also most people (myself included) only fill the longer description in the PR but not on the commit log.

or with a mixed of random commits

The current setting in github is to only allow the option to do "squash & rebase", so at least this is already enforced.

@sijie
Copy link
Member Author

sijie commented Sep 6, 2018

The current setting in github is to only allow the option to do "squash & rebase", so at least this is already enforced.

Sorry I was trying to say commit messages. Basically GitHub squash and merge all commit logs from all commits together. If you don’t pay attention to replace it, it becomes a mess in the git log.

@maskit
Copy link
Member

maskit commented Sep 7, 2018

agreed. that's what I stated in the mail thread. what I need is a clear policy. the tool here is adding enforcement.

I have started the mail thread for almost a month, but really almost no one in the community is interested in discussing the rules.

I have started the discussion in the mail thread. This PR is just to demonstrate a proposal to follow what other project is doing spark, bookkeeper, kafka and many other ASF projects. If the community likes this approach, we can adopt it. If the community doesn't like the approach, it is okay since it is just a proposal, if there is other good approaches, please raise it up in the mail thread.

I'm sorry for not responding on the mail thread. However, as you commented, a clear policy is what we need. If we just borrowed scripts from another project, the policy would keep unclear to people who are not participating the project.

I'd like to see the policy that is coded into the script. We can enforce the policy later by using the script.

First, Labeling the issues and marking milestone. It is the easiest part. If everyone can follow the rules, that is totally fine. However people are just lazy and steps are usually missed.

I think we need to get used to doing it, and committers should be responsible for checking these things. ATS community is doing this pretty well without tools. A clear policy would be a help.

Second, manage the commit message. when you click 'squash and merge' button, if you don't pay attention to copying the description in the PR to the merge text box, you will end up with a commit without description or with a mixed of random commits

The same as above. Committers should take care of this. I don't think automating it always ends up with the best result.

Last, manage the cherry-picks. that is the difficult part. currently we are using milestone for tracking the releases, however a PR can be linked to only one milestone.

I understand the difficulty. ATS community is facing the same issue. To deal with it, they also use Project for backporting request management. It's still too early to say it's working well, but I think it worth to try.

For 1), those individual cherry-picks are not interconnected, when you look into a git log in the branch, you will never know if the change is a cherry-pick or a brand new commit. There is almost no way for people to traverse back the original change.

In ATS community, they uses -x when they cherry-pick commits. This option appends the original commit hash to the commit log.

For 2), it has to going to another review cycle and the problem listed above during merging will occur.

In ATS community, they make PRs for backporting only if the commit cannot be cherry-picked cleanly due to conflicts. And RM basically doesn't make PRs by himself but ask the original author to do so because RM isn't familiar with the change/module in some cases. RM just check that PRs are in policy and merge them.

I don't meant to introduce any overhead to the merging and releasing process. However I do think for Pulsar to become a success project, it has to do something on this management.

I understand and agree.

At the end, to really know the pain points, I would suggest every committer in the community should do at least one bugfix release.

I don't disagree but that would take time. Since you are trying to enforce a policy, I guess you already have good examples for explaining the pain and how the policy solve it, right? Why don't you share those examples?

@sijie
Copy link
Member Author

sijie commented Sep 7, 2018

i think there is a confusion here. There is no policy coded into the script. It is still left to the community to come up with a policy. If you want me to make a proposal about the policy, I am happy to do that.

What the script is really offering is to automating (end enforcing) the steps that committers need to run manually. It is still the committer’s responsibility for choosing labels, milestones and deciding cherrypicks. That says the tool is not the policy, it is trying to standardize what people is doing manually. The discussion of a policy should happen regardless it is manually or automated.

As you said, different projects have different way for doing things. Both of us have biases on doing things. ATS might have its practice doing things and it applies to the community. However I do not see that will apply to the community here. Even some of our committers know the “steps”, but they are just forgetting sometimes during their daily engineering life. The tool here is only meant to help automate the steps. It automated the “steps” currently the committers are using manually, so people will have to really think of the steps when doing manually. And again the discussion about the policy (or the steps for merging a PR) should happen regardless it is scripted or manually.

Regarding the pain points, I have stated what I have observed and encountered in my previous comments. The pain point comes from committer doing things manually, different things are missing during merging and that causes the problems in release. The solution I am proposing is to use a script to automate (and enforce) the steps that committers are doing manually. I provide what I can provide. Not sure what else I can provide.

@maskit
Copy link
Member

maskit commented Sep 7, 2018

Alright, it seems like I need to read what the script actually does.

Although I haven't read the script at all, I don't understand how we can automate something without policies, rules, whatever. I guess the script does the task based on something. It is expecting something. I think the something is apart of policies ("policy" and "rule" are different but I used it as "something we should follow").

And I guess the script ease the pain you stated in your motivation below.

Labels and milestones are not marked correctly. Changes are not cherry-picked to branches.
It causes a lot of troubles for release managers to do a release.

These are why I though some policy is coded into the script.

Anyways, I hope our comments here made people think about policies.

As for ways of doing things, I agree. How things can be done depends on people in the community. I was suggesting that automating is not the only way. Because I believe steps are based on some policy, I think people can do some of the things without enforcing with scripts once we define clear policies.

As for the pain points, my point is that showing what you exactly saw during releases and explaining how labels, milestones, etc help RM is more effective than just saying missing things cause problems.

@dave2wave
Copy link
Member

I think that the topic being discussed - how to tag and build a release including the policies should be surfaced on dev@pulsar with a clear subject.

@sijie
Copy link
Member Author

sijie commented Sep 7, 2018

@dave2wave yes. there was already an email thread about "how to label milestone". I was hoping the discussion regarding the policy or whatever rules should happen at the email thread. The PR here is just to showcase how a script can automate the steps the committers are currently using. No meant to use this for discussing the policy or whatever rules about release and milestone.

sijie added a commit to apache/bookkeeper that referenced this pull request Sep 7, 2018
Descriptions of the changes in this PR:

 ### Motivation

The development of bookkeeper has been moved from JIRA to Github.
So we can clean up the merge script to remove those instructions
assuming we are still on apache mirror and JIRA.

 ### Changes

This change follows the changes that I have adjusted at apache/pulsar#2526

- remove related logic about JIRA from merge script
- change merge script to use github merge api



Author: 

Reviewers: Enrico Olivelli <eolivelli@gmail.com>

This closes #1663 from sijie/improve_merge_script_2
# Find the github issues to close
github_issues = re.findall("#[0-9]{3,6}", title)

if len(github_issues) != 0:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This block is expecting that all referred issues in a PR title will be resolved when the PR is closed.

I'm OK with this rule, however, it need to be discussed and clearly stated.

@sijie
Copy link
Member Author

sijie commented Nov 19, 2018

close this PR due to in-activities.

@merlimat is working on proposing a workflow for this.

@sijie sijie closed this Nov 19, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants