STORM-1885. python script for squashing and merging prs. #1468

harshach · 2016-06-06T17:48:33Z

No description provided.

ptgoetz · 2016-06-06T18:26:37Z

I'm a little on the fence in terms of squashing the commits of others vs. asking the contributor to do so. There are a lot of situations where spreading out a big patch over multiple commits makes sense and makes the history more consumable.

A couple of questions:

How does this preserve authorship in a pull request that has commits from multiple authors?
How would this work with our current branch model? Specifically, applying a pull request to multiple branches.

harshach · 2016-06-06T18:31:30Z

commits can be preserved in contributors branch as they seem fit but it doesn't help having them in the main repo. As a contributor they know what those commits means but everyone else will doesn't have any knowledge of individual commits and why they made them.

How does this preserve authorship in a pull request that has commits from multiple authors?
It will ask for primary authors and the user who is merging this can input more than one author at the time of merge.

How would this work with our current branch model? Specifically, applying a pull request to multiple branches
It will allow you to choose a main branch and allows you to pick the squashed commits onto other branches as well by cherry-picking.

ptgoetz · 2016-06-06T18:58:16Z

It will ask for primary authors and the user who is merging this can input more than one author at the time of merge.

That means it removes authorship information. If we tag a squashed commit as coming from multiple authors, we still wouldn't be able to differentiate what code was contributed by the individual authors.

So if I merged a pull request with multiple authors, the result would be a single commit from me with a message listing the contributing authors, is that correct?

harshach · 2016-06-06T22:23:26Z

"That means it removes authorship information. If we tag a squashed commit as coming from multiple authors, we still wouldn't be able to differentiate what code was contributed by the individual authors."
We won't be able capture this in JIRA either. I am not sure how much of this is important to have all the commits from each contributor for a single JIRA which in itself is rare unless its a big patch. It does have ability to give each contributor credit in the commit log.

"So if I merged a pull request with multiple authors, the result would be a single commit from me with a message listing the contributing authors, is that correct?"
Yes. If its important we could squash commits into single commit per contributor.

HeartSaVioR · 2016-06-06T23:20:44Z

dev-tools/storm-merge-pr.py

+# TODO Introduce a convention as this is too brittle
+RELEASE_BRANCH_PREFIX = "0."
+
+DEV_BRANCH_NAME = "trunk"


trunk is not used for Storm project.

HeartSaVioR · 2016-06-06T23:30:21Z

What's the difference between Spark script vs Kafka script?
Spark script is origin of Kafka and Zeppelin, so unless there're specific improvements, I think picking Spark script is more promising. For example, trunk is often not used for git project but Kafka is using that.
We're using branches which doesn't fully represent current version for branch. So our script should determine version more smart. One way to determine is looking at pom.xml.

harshach · 2016-06-06T23:37:29Z

@HeartSaVioR Not aware of spark script. I am ok with using either one or make this one work as we needed.

HeartSaVioR · 2016-06-06T23:48:12Z

Actually I was the one which claims separated credits from other project. (OpenTSDB/asynchbase#122)

But there was a strong reason to do so, and I think it's not the normal case we can see it often. As I addressed from mailing list, many big Apache projects already used this approach.

If there're cases which squashing really hurts then we can have exceptional case.

HeartSaVioR · 2016-06-06T23:52:07Z

@harshach Yeah, I don't know what things Kafka improve from Spark script so I wanted to see the benefit if you know about it. As I commented earlier, just adopting script doesn't work since we use different branch model (master, minor, bugfix) so it should be fully tested (including JIRA integration) before adopting.

harshach · 2016-06-06T23:53:58Z

@HeartSaVioR already ran a simple tests. I like it because it allows us to tag the reviewers and additional committers in the tag message and can be picked onto other branches as well. It does work with JIRA as well.

HeartSaVioR · 2016-06-07T00:00:13Z

@harshach
I skimmed a bit, and guess determining fix version will not work since branch names we use are different from Spark and Kafka and so on. We can still input them by hand so there's no issue on it but if we modify it to fit for Storm that would be great. (optional)

FYI: Spark script is having same issue.
https://github.com/apache/spark/blob/master/dev/merge_spark_pr.py

ptgoetz · 2016-06-07T00:49:27Z

We won't be able capture this in JIRA either. I am not sure how much of this is important to have all the commits from each contributor for a single JIRA which in itself is rare unless its a big patch. It does have ability to give each contributor credit in the commit log.

From a legal perspective it's very important that we be able to track the provenance of all code that lands in an ASF repository and could potentially be released.

For example:

Bob is a committer. Alice and Charles are not. Alice and Charles collaborate on a patch, both making commits. In the process Charles commits some code that he doesn't have the legal rights to (its proprietary, etc.). Later Bob uses this script to merge the pull request, and squash all the commits. Alice and Charles are listed as authors of the patch, but there is no history regarding how the code that the ASF doesn't have rights to get there. Was it Charles or Alice?

That may seem like an edge case, but one that we should absolutely consider.

ptgoetz · 2016-06-07T01:41:31Z

@harshach The source of the file is referenced here:
https://github.com/apache/storm/pull/1468/files#diff-da45fe3972445a9f82ef768808dd8853R20

I'd like to get clearance that what this script does or enables is okay before proceeding.

harshach · 2016-06-07T02:45:57Z

@ptgoetz makes sense. We can make explicit case for not merging PRs using this tool if the origin PR has commits from multiple authors and also can be integrated into the tool to not to proceed if thats the case.

HeartSaVioR · 2016-06-22T09:56:02Z

I'd really like to go forward with automated tools for developers / committers. What I've stated from dev@ mailing list, many projects already use specific tools for merging, and the merge script originated from Spark is well used for Spark, Kafka, Zeppelin (now TLP).

HeartSaVioR · 2016-06-22T09:58:48Z

I'll take a deep look and describe what this script actually does.

HeartSaVioR · 2016-06-23T07:03:55Z

Here's my understanding regarding this script.

main()

get latest branch (highest version) from github mirror
- it assumes that prefix of release branch is release version
get information regarding pull request and events of pull request
prompt user(committer) to input commit title: default value is title of pull request, and user can replace them
standardize commit title (via standardize_jira_ref)
prompt user to use title as modified vs original
check that pull request is already merged (closed by asfgit)
- if it is, check merge commit is fetched to local, and cherry-pick to latest branch assuming user wants backport
prompt user to continue if PR seems to resolve conflicts (by seeing flag from PR information)
print information of pull request, and commit title, and so on, and prompt user to go on
merge PR into target branch of PR: get commit hash afterwards
prompt user to see user wants cherry-pick
- if yes, cherry-pick to latest branch
prompt user to update associated JIRA
- if yes, resolve issue as fixed with leaving comment

merge_pr()

fetch branch which pull request is referring
fetch and checkout branch which pull request targets (from asf-git)
merge pull request branch with squash option
- if there're some conflicts, guide user to fix it and mark as resolved (via git add)
prompt user to input main author: default value is who has most of commits (via extracting authors from pull request branch and sort)
prompt user to input reviewers: can be blank
prompt user to see user wants to list all commits into commit message
construct commit message by
- commit title
- body of pull request if presented
- all authors
- all reviewers if presented
- user if user resolves merge conflict manually
- auto close message for pull request
- list of commits if user want to
commit with passing main author as author, and commit message
prompt user to push, or stop
push changes to remote repository (to asf-git)
clean temporary branches
get commit hash and return

cherry_pick()

prompt user to input branch name to cherry-pick: default value is latest branch which is passed from main()
fetch and checkout branch which cherry-pick targets (from asf-git)
cherry-pick with commit hash
- if there're some conflicts, guide user to fix it and complete cherry-pick manually
prompt user to push, or stop
push changes to remote repository (to asf-git)
clean temporary branches
get commit hash and return

resolve_jira_issue()

prompt user to input JIRA issue ID: default value is extracted value from commit title
get information of the issue
check status and stop if issue is already marked as 'Resolved'
print information of the issue
prompt user to input comma-separated fix versions
- default values are unreleased versions matched to target branch for merge() / cherry_pick()
  - develop branch is treated as default version
mark issue as 'Resolved' with setting fix versions and leaving comment

HeartSaVioR · 2016-06-24T00:49:09Z

We may have to modify lots of part of script since...

We don't have develop branch so all about develop branch should be modified. Spark merge script also doesn't have handling with develop branch since they don't have develop branch, too. Maybe adopting spark script would be easier than adopting kafka script.
Branch policy is not compatible with projects which uses this script. They have branches per version but we just maintain version lines (major, minor, bugfix) so we should do something while determining fix versions from merged branches.
We're having master and 1.x / 1.0.x branches heavily diverged, so there're often two or more pull requests submitted per one issue. (Committers don't cherry-pick between master and 1.x for storm-core manually since it's easy to see merge conflict.) It should be tested (at least unit test and integration test) individually, and issue should be closed when all of pull requests are checked in. It means that we're having different merging step which other projects don't have.
- Moreover, commit message hook (closing PR) doesn't work if PR is not against master.
We should update CHANGELOG while merging step. Personally I don't like updating CHANGELOG so opened thread for discussion but we didn't decide something clearly.
Commit message will contain body of pull request which is free format for now and tends to be meaningless for commit message. We need to guide contributor to write meaningful information. Thanks for Github we can have body template of the pull request which many projects have been using already.

So without arranging our branch policy and merging step, it will be hard to get merge script fit for us.

harshach · 2016-07-10T21:06:09Z

@HeartSaVioR Thanks for documenting the script.
"Branch policy is not compatible with projects which uses this script. They have branches per version but we just maintain version lines (major, minor, bugfix) so we should do something while determining fix versions from merged branches."
Agree but than again we need modify either script to get this done . So spark or kafka script doesn't matter much.
"We're having master and 1.x / 1.0.x branches heavily diverged, so there're often two or more pull requests submitted per one issue. (Committers don't cherry-pick between master and 1.x for storm-core manually since it's easy to see merge conflict.) It should be tested (at least unit test and integration test) individually, and issue should be closed when all of pull requests are checked in. It means that we're having different merging step which other projects don't have.
Moreover, commit message hook (closing PR) doesn't work if PR is not against master."
I am +1 on adding unit tests, integration tests to be run as part of the script but have a manual validation from the user to say yes/no to go-ahead with merge. As the unit tests can be flaky there can be false negatives.
"We should update CHANGELOG while merging step. Personally I don't like updating CHANGELOG so opened thread for discussion but we didn't decide something clearly."
Agree with you on this. This extra-step adds unnecessary commits to the log. As long as we update the JIRA fixVersions before the release its easy to generate a change log.

"Commit message will contain body of pull request which is free format for now and tends to be meaningless for commit message."
With the above script one can edit the commit title and it will have JIRA number & title that will give us much more meaningful message about the work done as part of th ejira.

"So without arranging our branch policy and merging step, it will be hard to get merge script fit for us."
I disagree with this. We can have minimal modifications to the script to work with our current branches.

knusbaum · 2016-10-04T22:07:07Z

-1
I am generally opposed to this. Most PRs only have a small number of commits and aren't a problem. For PRs with a large number of commits, it's simple enough to ask the contributor to squash their own commits.

I'm not sure I see the benefit in adding another script, which we will have to maintain, in order to do something we should rarely be doing.

Also, I worry that having this script will lead to a sharp rise in totally-squashed PR merges, even when there's not really any benefit (and in fact, loss of authorship info) since some people are likely just going to use the script whenever they're doing a merge.

ptgoetz · 2016-10-04T22:41:37Z

I'll second @knusbaum's -1. Based on points I made earlier. This has the potntial to automatically destroy code provenance, especially if more than one contributor is involved in a pull request. From a legal perspective, the ASF need to be able to determine the origin of all code changes.

I would recommend any project that uses a variant of this script to double-check with ASF legal that it is okay. I could be totally wrong, but I'd rather play it safe.

I'd also argue that a well thought out series of commits cane make reviews easier. For example, separating maven build changes from code changes. I'd rather we encourage devs to think out the partitioning of commits themselves and squash appropriately if necessary.

HeartSaVioR · 2016-10-04T22:58:29Z

I'm totally +1 to this approach, even though I think script should be modified to Storm's project style.

Like I said to dev@ mailing list, I have been doing reviewing and merging pending pull requests for weeks and months, and it was painful enough to merge and port back to each branches, even though I ignored cleaning up commits. (Pain is amplified when tiny commit should be merged to all branches.) If I want to clean up commits before merging it should be more painful. CHANGELOG is subject to not in sync among branches, but we need to write it manually because it's hard to filter merge commits to see the change list. (We could just rely on JIRA issues for alternative.)

Regarding commits, I don't want to keep commits like 'kicking travis', 'address review comments', etc. which is not helpful at any chances. For my last 2 years of development of Storm, I didn't utilize individual commit. If something is wrong with recent merge, we rollback the merge, not individual commit. Squashing commits is widely used strategy and already shows success story to many big projects. Even Github provides the squash merge mode (recently rebase mode too) in GUI.

If we want to merge in squashed commit, it should be done in merging process, not reviewing process. For me, ideal review process should be contributor-friendly. While we can't put efforts to only maintain Storm project (by reviewing pull requests, etc.), contributors also can't. Once we create a script which also squashes the commits, we don't need to make contributors bothering with rebasing and squashing commits. If not, all individuals including us should do it just after merger said 'please rebase and squash in order to merge.', which is also bothering for mergers, too. Moreover, there's a chance for contributors to be busy at the moment, and pull request goes stale. Pull requests which need upmerging are the case.

I understand and agree the authorship issue, but we can treat it as exceptional case. Many pull requests are authored by one.

Let's make merging phase as painless, or at least less painful thing.

ptgoetz · 2016-10-04T23:02:12Z

I'll also point out that the "if other Apache projects do it, it is oaky" stance is particularly dangerous. PMC members must understand ASF policy and not rely on what other projects do. If what another project does is wrong, then doing the same thing in our project just introduces liability.

ptgoetz · 2016-10-04T23:12:52Z

I'm okay with automating the merge process, just not the way it is implemented here. Perhaps we shouldmove the discussion to the dev list.

harshach · 2016-10-18T20:40:31Z

@knusbaum maintaining the script shouldn't push us from adopting a better approach. It's another piece of code that we need to maintain just like entire repo that we are maintaining right now.
Lets continue this discussion over mailing list.

We are closing stale Pull Requests to make the list more manageable. Please re-open any Pull Request that has been closed in error. Closes apache#608 Closes apache#639 Closes apache#640 Closes apache#648 Closes apache#662 Closes apache#668 Closes apache#692 Closes apache#705 Closes apache#724 Closes apache#728 Closes apache#730 Closes apache#753 Closes apache#803 Closes apache#854 Closes apache#922 Closes apache#986 Closes apache#992 Closes apache#1019 Closes apache#1040 Closes apache#1041 Closes apache#1043 Closes apache#1046 Closes apache#1051 Closes apache#1078 Closes apache#1146 Closes apache#1164 Closes apache#1165 Closes apache#1178 Closes apache#1213 Closes apache#1225 Closes apache#1258 Closes apache#1259 Closes apache#1268 Closes apache#1272 Closes apache#1277 Closes apache#1278 Closes apache#1288 Closes apache#1296 Closes apache#1328 Closes apache#1342 Closes apache#1353 Closes apache#1370 Closes apache#1376 Closes apache#1391 Closes apache#1395 Closes apache#1399 Closes apache#1406 Closes apache#1410 Closes apache#1422 Closes apache#1427 Closes apache#1443 Closes apache#1462 Closes apache#1468 Closes apache#1483 Closes apache#1506 Closes apache#1509 Closes apache#1515 Closes apache#1520 Closes apache#1521 Closes apache#1525 Closes apache#1527 Closes apache#1544 Closes apache#1550 Closes apache#1566 Closes apache#1569 Closes apache#1570 Closes apache#1575 Closes apache#1580 Closes apache#1584 Closes apache#1591 Closes apache#1600 Closes apache#1611 Closes apache#1613 Closes apache#1639 Closes apache#1703 Closes apache#1711 Closes apache#1719 Closes apache#1737 Closes apache#1760 Closes apache#1767 Closes apache#1768 Closes apache#1785 Closes apache#1799 Closes apache#1822 Closes apache#1824 Closes apache#1844 Closes apache#1874 Closes apache#1918 Closes apache#1928 Closes apache#1937 Closes apache#1942 Closes apache#1951 Closes apache#1957 Closes apache#1963 Closes apache#1964 Closes apache#1965 Closes apache#1967 Closes apache#1968 Closes apache#1971 Closes apache#1985 Closes apache#1986 Closes apache#1998 Closes apache#2031 Closes apache#2032 Closes apache#2071 Closes apache#2076 Closes apache#2108 Closes apache#2119 Closes apache#2128 Closes apache#2142 Closes apache#2174 Closes apache#2206 Closes apache#2297 Closes apache#2322 Closes apache#2332 Closes apache#2341 Closes apache#2377 Closes apache#2414 Closes apache#2469

harshach force-pushed the STORM-1885 branch from 7321353 to 700742a Compare June 6, 2016 18:01

STORM-1885. python script for squashing and merging prs.

39e42f0

harshach force-pushed the STORM-1885 branch from 700742a to 39e42f0 Compare June 6, 2016 18:15

harshach changed the title ~~STORM-1865. python script for squashing and merging prs.~~ STORM-1885. python script for squashing and merging prs. Jun 6, 2016

HeartSaVioR reviewed Jun 6, 2016
View reviewed changes

HeartSaVioR mentioned this pull request Jan 15, 2018

[STORM-2894] fix some random typos #2512

Merged

d2r mentioned this pull request Oct 16, 2018

STORM-3250: Closes Pull Requests unchanged in 2018 #2880

Merged

asfgit closed this in #2880 Oct 22, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

STORM-1885. python script for squashing and merging prs. #1468

STORM-1885. python script for squashing and merging prs. #1468

harshach commented Jun 6, 2016

ptgoetz commented Jun 6, 2016

harshach commented Jun 6, 2016

ptgoetz commented Jun 6, 2016

harshach commented Jun 6, 2016 •

edited

Loading

HeartSaVioR Jun 6, 2016

HeartSaVioR commented Jun 6, 2016

harshach commented Jun 6, 2016

HeartSaVioR commented Jun 6, 2016

HeartSaVioR commented Jun 6, 2016

harshach commented Jun 6, 2016

HeartSaVioR commented Jun 7, 2016 •

edited

Loading

ptgoetz commented Jun 7, 2016

ptgoetz commented Jun 7, 2016

harshach commented Jun 7, 2016 •

edited

Loading

HeartSaVioR commented Jun 22, 2016

HeartSaVioR commented Jun 22, 2016

HeartSaVioR commented Jun 23, 2016

HeartSaVioR commented Jun 24, 2016 •

edited

Loading

harshach commented Jul 10, 2016

knusbaum commented Oct 4, 2016

ptgoetz commented Oct 4, 2016

HeartSaVioR commented Oct 4, 2016

ptgoetz commented Oct 4, 2016

ptgoetz commented Oct 4, 2016

harshach commented Oct 18, 2016

STORM-1885. python script for squashing and merging prs. #1468

STORM-1885. python script for squashing and merging prs. #1468

Conversation

harshach commented Jun 6, 2016

ptgoetz commented Jun 6, 2016

harshach commented Jun 6, 2016

ptgoetz commented Jun 6, 2016

harshach commented Jun 6, 2016 • edited Loading

HeartSaVioR Jun 6, 2016

Choose a reason for hiding this comment

HeartSaVioR commented Jun 6, 2016

harshach commented Jun 6, 2016

HeartSaVioR commented Jun 6, 2016

HeartSaVioR commented Jun 6, 2016

harshach commented Jun 6, 2016

HeartSaVioR commented Jun 7, 2016 • edited Loading

ptgoetz commented Jun 7, 2016

ptgoetz commented Jun 7, 2016

harshach commented Jun 7, 2016 • edited Loading

HeartSaVioR commented Jun 22, 2016

HeartSaVioR commented Jun 22, 2016

HeartSaVioR commented Jun 23, 2016

HeartSaVioR commented Jun 24, 2016 • edited Loading

harshach commented Jul 10, 2016

knusbaum commented Oct 4, 2016

ptgoetz commented Oct 4, 2016

HeartSaVioR commented Oct 4, 2016

ptgoetz commented Oct 4, 2016

ptgoetz commented Oct 4, 2016

harshach commented Oct 18, 2016

harshach commented Jun 6, 2016 •

edited

Loading

HeartSaVioR commented Jun 7, 2016 •

edited

Loading

harshach commented Jun 7, 2016 •

edited

Loading

HeartSaVioR commented Jun 24, 2016 •

edited

Loading