-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[dev] provide a python merge script for merging pull requests #2526
Conversation
### Motivation Currently we don't have a good tool on managing the merge process for pull requests. Labels and milestones are not marked correctly. Changes are not cherry-picked to branches. It causes a lot of troubles for release managers to do a release. ### Changes This PR adopts the bookkeeper merge script and change it to use github pull request merge api instead of git push.
# Remote name which points to the GitHub site | ||
PR_REMOTE_NAME = os.environ.get("PR_REMOTE_NAME", "apache") | ||
# Remote name which points to Apache git | ||
PUSH_REMOTE_NAME = os.environ.get("PUSH_REMOTE_NAME", "apache") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having PR_REMOTE_NAME and PUSH_REMOTE_NAME no longer makes sense, since github is the canonical repo.
TEMP_BRANCH_PREFIX = "PR_TOOL" | ||
RELEASE_BRANCH_PREFIX = "branch-" | ||
|
||
DEFAULT_FIX_VERSION = os.environ.get("DEFAULT_FIX_VERSION", "0.9.1.0") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where does this version come from?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't change this when porting this from bookkeeper. so I think it is coming from spark, since the bk script was ported from spark.
def get_json(url, preview_api = False): | ||
try: | ||
request = urllib2.Request(url) | ||
if GITHUB_OAUTH_KEY: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should never be null.
"GitHub requests." | ||
else: | ||
print "Unable to fetch URL, exiting: %s - %s" % (url, e) | ||
sys.exit(-1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
call fail()
else: | ||
print "Unable to fetch URL, exiting: %s - %s" % (url, e) | ||
print e | ||
sys.exit(-1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
call fail()
"GitHub requests." | ||
else: | ||
print "Unable to fetch URL, exiting: %s" % url | ||
sys.exit(-1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
call fail()
if line.startswith('>'): | ||
continue | ||
modified_body = modified_body + line + "\n" | ||
if modified_body != body: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will ask every time, because it will add an extra '\n' at the end.
|
||
# Merged pull requests don't appear as merged in the GitHub API; | ||
# Instead, they're closed by asfgit. | ||
merge_commits = \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like an artifact of the days when asf had it's own repos and github was just mirrors.
A lot of stuff can be removed from this script by not using a local repo at all, and using the github merge api.
https://developer.github.com/v3/pulls/#merge-a-pull-request-merge-button
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the logic here is checking whether a PR can be merged or not. the real merge action has been replaced by the merge api. I have stated that in the description. I can clean this up if needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The merge api should throw an error if you can't merge, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ivankelly yes, I said that I can clean this up if needed. but I don't want to make any changes until the community wants to adopt this script.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In short, I think the root cause of the mess is not lack of tools but lack of clear policies.
Labels and milestones are not marked correctly.
Where's the rule book?
Changes are not cherry-picked to branches.
Do we have a maintenance policy?
What commits have to be cherry-picked? To which branches? Until when? By Who?
I understand the motivation, however, I think we should discuss rules and policies, and should build a consensus first. Otherwise who can review and maintain the script correctly?
I'm fine with automating boring steps and preventing mistakes by using scripts, but it's questionable to me that we need complicated rules that we cannot manually apply correctly.
agreed. that's what I stated in the mail thread. what I need is a clear policy. the tool here is adding enforcement.
I have started the mail thread for almost a month, but really almost no one in the community is interested in discussing the rules.
I have started the discussion in the mail thread. This PR is just to demonstrate a proposal to follow what other project is doing spark, bookkeeper, kafka and many other ASF projects. If the community likes this approach, we can adopt it. If the community doesn't like the approach, it is okay since it is just a proposal, if there is other good approaches, please raise it up in the mail thread.
First, Labeling the issues and marking milestone. It is the easiest part. If everyone can follow the rules, that is totally fine. However people are just lazy and steps are usually missed. Second, manage the commit message. when you click 'squash and merge' button, if you don't pay attention to copying the description in the PR to the merge text box, you will end up with a commit without description or with a mixed of random commits Last, manage the cherry-picks. that is the difficult part. currently we are using milestone for tracking the releases, however a PR can be linked to only one milestone. so if there is no accurate view on how a pull request lands at the git repo. for a release manager, 1) he has to either manually cherry-pick to branches and left a comment in the original pull request, or 2) he has to manually create a new PR for it, which I think it is a load to the release manager. I don't meant to introduce any overhead to the merging and releasing process. However I do think for Pulsar to become a success project, it has to do something on this management. At the end, to really know the pain points, I would suggest every committer in the community should do at least one bugfix release. |
agree with commit message that is not uniformly enforced. Also most people (myself included) only fill the longer description in the PR but not on the commit log.
The current setting in github is to only allow the option to do "squash & rebase", so at least this is already enforced. |
Sorry I was trying to say commit messages. Basically GitHub squash and merge all commit logs from all commits together. If you don’t pay attention to replace it, it becomes a mess in the git log. |
I'm sorry for not responding on the mail thread. However, as you commented, a clear policy is what we need. If we just borrowed scripts from another project, the policy would keep unclear to people who are not participating the project. I'd like to see the policy that is coded into the script. We can enforce the policy later by using the script.
I think we need to get used to doing it, and committers should be responsible for checking these things. ATS community is doing this pretty well without tools. A clear policy would be a help.
The same as above. Committers should take care of this. I don't think automating it always ends up with the best result.
I understand the difficulty. ATS community is facing the same issue. To deal with it, they also use Project for backporting request management. It's still too early to say it's working well, but I think it worth to try.
In ATS community, they uses
In ATS community, they make PRs for backporting only if the commit cannot be cherry-picked cleanly due to conflicts. And RM basically doesn't make PRs by himself but ask the original author to do so because RM isn't familiar with the change/module in some cases. RM just check that PRs are in policy and merge them.
I understand and agree.
I don't disagree but that would take time. Since you are trying to enforce a policy, I guess you already have good examples for explaining the pain and how the policy solve it, right? Why don't you share those examples? |
i think there is a confusion here. There is no policy coded into the script. It is still left to the community to come up with a policy. If you want me to make a proposal about the policy, I am happy to do that. What the script is really offering is to automating (end enforcing) the steps that committers need to run manually. It is still the committer’s responsibility for choosing labels, milestones and deciding cherrypicks. That says the tool is not the policy, it is trying to standardize what people is doing manually. The discussion of a policy should happen regardless it is manually or automated. As you said, different projects have different way for doing things. Both of us have biases on doing things. ATS might have its practice doing things and it applies to the community. However I do not see that will apply to the community here. Even some of our committers know the “steps”, but they are just forgetting sometimes during their daily engineering life. The tool here is only meant to help automate the steps. It automated the “steps” currently the committers are using manually, so people will have to really think of the steps when doing manually. And again the discussion about the policy (or the steps for merging a PR) should happen regardless it is scripted or manually. Regarding the pain points, I have stated what I have observed and encountered in my previous comments. The pain point comes from committer doing things manually, different things are missing during merging and that causes the problems in release. The solution I am proposing is to use a script to automate (and enforce) the steps that committers are doing manually. I provide what I can provide. Not sure what else I can provide. |
Alright, it seems like I need to read what the script actually does. Although I haven't read the script at all, I don't understand how we can automate something without policies, rules, whatever. I guess the script does the task based on something. It is expecting something. I think the something is apart of policies ("policy" and "rule" are different but I used it as "something we should follow"). And I guess the script ease the pain you stated in your motivation below.
These are why I though some policy is coded into the script. Anyways, I hope our comments here made people think about policies. As for ways of doing things, I agree. How things can be done depends on people in the community. I was suggesting that automating is not the only way. Because I believe steps are based on some policy, I think people can do some of the things without enforcing with scripts once we define clear policies. As for the pain points, my point is that showing what you exactly saw during releases and explaining how labels, milestones, etc help RM is more effective than just saying missing things cause problems. |
I think that the topic being discussed - how to tag and build a release including the policies should be surfaced on dev@pulsar with a clear subject. |
@dave2wave yes. there was already an email thread about "how to label milestone". I was hoping the discussion regarding the policy or whatever rules should happen at the email thread. The PR here is just to showcase how a script can automate the steps the committers are currently using. No meant to use this for discussing the policy or whatever rules about release and milestone. |
Descriptions of the changes in this PR: ### Motivation The development of bookkeeper has been moved from JIRA to Github. So we can clean up the merge script to remove those instructions assuming we are still on apache mirror and JIRA. ### Changes This change follows the changes that I have adjusted at apache/pulsar#2526 - remove related logic about JIRA from merge script - change merge script to use github merge api Author: Reviewers: Enrico Olivelli <eolivelli@gmail.com> This closes #1663 from sijie/improve_merge_script_2
# Find the github issues to close | ||
github_issues = re.findall("#[0-9]{3,6}", title) | ||
|
||
if len(github_issues) != 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This block is expecting that all referred issues in a PR title will be resolved when the PR is closed.
I'm OK with this rule, however, it need to be discussed and clearly stated.
close this PR due to in-activities. @merlimat is working on proposing a workflow for this. |
Motivation
Currently we don't have a good tool on managing the merge process for pull requests.
Labels and milestones are not marked correctly. Changes are not cherry-picked to branches.
It causes a lot of troubles for release managers to do a release.
Changes
This PR adopts the bookkeeper merge script and change it to use github pull request merge api instead of git push.
This script handles:
component/
type/
release/
labels. in the cherry-picked commits, it will add back-references to the original commit. so when you negative the commits on github, it should be able to quickly negative the original commit.