Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

STORM-1885. python script for squashing and merging prs. #1468

Closed
wants to merge 1 commit into from

Conversation

harshach
Copy link
Contributor

@harshach harshach commented Jun 6, 2016

No description provided.

@harshach harshach changed the title STORM-1865. python script for squashing and merging prs. STORM-1885. python script for squashing and merging prs. Jun 6, 2016
@ptgoetz
Copy link
Member

ptgoetz commented Jun 6, 2016

I'm a little on the fence in terms of squashing the commits of others vs. asking the contributor to do so. There are a lot of situations where spreading out a big patch over multiple commits makes sense and makes the history more consumable.

A couple of questions:

  • How does this preserve authorship in a pull request that has commits from multiple authors?
  • How would this work with our current branch model? Specifically, applying a pull request to multiple branches.

@harshach
Copy link
Contributor Author

harshach commented Jun 6, 2016

commits can be preserved in contributors branch as they seem fit but it doesn't help having them in the main repo. As a contributor they know what those commits means but everyone else will doesn't have any knowledge of individual commits and why they made them.

How does this preserve authorship in a pull request that has commits from multiple authors?
It will ask for primary authors and the user who is merging this can input more than one author at the time of merge.

How would this work with our current branch model? Specifically, applying a pull request to multiple branches
It will allow you to choose a main branch and allows you to pick the squashed commits onto other branches as well by cherry-picking.

@ptgoetz
Copy link
Member

ptgoetz commented Jun 6, 2016

It will ask for primary authors and the user who is merging this can input more than one author at the time of merge.

That means it removes authorship information. If we tag a squashed commit as coming from multiple authors, we still wouldn't be able to differentiate what code was contributed by the individual authors.

So if I merged a pull request with multiple authors, the result would be a single commit from me with a message listing the contributing authors, is that correct?

@harshach
Copy link
Contributor Author

harshach commented Jun 6, 2016

"That means it removes authorship information. If we tag a squashed commit as coming from multiple authors, we still wouldn't be able to differentiate what code was contributed by the individual authors."
We won't be able capture this in JIRA either. I am not sure how much of this is important to have all the commits from each contributor for a single JIRA which in itself is rare unless its a big patch. It does have ability to give each contributor credit in the commit log.

"So if I merged a pull request with multiple authors, the result would be a single commit from me with a message listing the contributing authors, is that correct?"
Yes. If its important we could squash commits into single commit per contributor.

# TODO Introduce a convention as this is too brittle
RELEASE_BRANCH_PREFIX = "0."

DEV_BRANCH_NAME = "trunk"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trunk is not used for Storm project.

@HeartSaVioR
Copy link
Contributor

  1. What's the difference between Spark script vs Kafka script?
    Spark script is origin of Kafka and Zeppelin, so unless there're specific improvements, I think picking Spark script is more promising. For example, trunk is often not used for git project but Kafka is using that.
  2. We're using branches which doesn't fully represent current version for branch. So our script should determine version more smart. One way to determine is looking at pom.xml.

@harshach
Copy link
Contributor Author

harshach commented Jun 6, 2016

@HeartSaVioR Not aware of spark script. I am ok with using either one or make this one work as we needed.

@HeartSaVioR
Copy link
Contributor

Actually I was the one which claims separated credits from other project. (OpenTSDB/asynchbase#122)

But there was a strong reason to do so, and I think it's not the normal case we can see it often. As I addressed from mailing list, many big Apache projects already used this approach.

If there're cases which squashing really hurts then we can have exceptional case.

@HeartSaVioR
Copy link
Contributor

@harshach Yeah, I don't know what things Kafka improve from Spark script so I wanted to see the benefit if you know about it. As I commented earlier, just adopting script doesn't work since we use different branch model (master, minor, bugfix) so it should be fully tested (including JIRA integration) before adopting.

@harshach
Copy link
Contributor Author

harshach commented Jun 6, 2016

@HeartSaVioR already ran a simple tests. I like it because it allows us to tag the reviewers and additional committers in the tag message and can be picked onto other branches as well. It does work with JIRA as well.

@HeartSaVioR
Copy link
Contributor

HeartSaVioR commented Jun 7, 2016

@harshach
I skimmed a bit, and guess determining fix version will not work since branch names we use are different from Spark and Kafka and so on. We can still input them by hand so there's no issue on it but if we modify it to fit for Storm that would be great. (optional)

FYI: Spark script is having same issue.
https://github.com/apache/spark/blob/master/dev/merge_spark_pr.py

@ptgoetz
Copy link
Member

ptgoetz commented Jun 7, 2016

We won't be able capture this in JIRA either. I am not sure how much of this is important to have all the commits from each contributor for a single JIRA which in itself is rare unless its a big patch. It does have ability to give each contributor credit in the commit log.

From a legal perspective it's very important that we be able to track the provenance of all code that lands in an ASF repository and could potentially be released.

For example:

Bob is a committer. Alice and Charles are not. Alice and Charles collaborate on a patch, both making commits. In the process Charles commits some code that he doesn't have the legal rights to (its proprietary, etc.). Later Bob uses this script to merge the pull request, and squash all the commits. Alice and Charles are listed as authors of the patch, but there is no history regarding how the code that the ASF doesn't have rights to get there. Was it Charles or Alice?

That may seem like an edge case, but one that we should absolutely consider.

@ptgoetz
Copy link
Member

ptgoetz commented Jun 7, 2016

@harshach The source of the file is referenced here:
https://github.com/apache/storm/pull/1468/files#diff-da45fe3972445a9f82ef768808dd8853R20

I'd like to get clearance that what this script does or enables is okay before proceeding.

@harshach
Copy link
Contributor Author

harshach commented Jun 7, 2016

@ptgoetz makes sense. We can make explicit case for not merging PRs using this tool if the origin PR has commits from multiple authors and also can be integrated into the tool to not to proceed if thats the case.

@HeartSaVioR
Copy link
Contributor

I'd really like to go forward with automated tools for developers / committers. What I've stated from dev@ mailing list, many projects already use specific tools for merging, and the merge script originated from Spark is well used for Spark, Kafka, Zeppelin (now TLP).

@HeartSaVioR
Copy link
Contributor

I'll take a deep look and describe what this script actually does.

@HeartSaVioR
Copy link
Contributor

Here's my understanding regarding this script.

main()

  • get latest branch (highest version) from github mirror
    • it assumes that prefix of release branch is release version
  • get information regarding pull request and events of pull request
  • prompt user(committer) to input commit title: default value is title of pull request, and user can replace them
  • standardize commit title (via standardize_jira_ref)
  • prompt user to use title as modified vs original
  • check that pull request is already merged (closed by asfgit)
    • if it is, check merge commit is fetched to local, and cherry-pick to latest branch assuming user wants backport
  • prompt user to continue if PR seems to resolve conflicts (by seeing flag from PR information)
  • print information of pull request, and commit title, and so on, and prompt user to go on
  • merge PR into target branch of PR: get commit hash afterwards
  • prompt user to see user wants cherry-pick
    • if yes, cherry-pick to latest branch
  • prompt user to update associated JIRA
    • if yes, resolve issue as fixed with leaving comment

merge_pr()

  • fetch branch which pull request is referring
  • fetch and checkout branch which pull request targets (from asf-git)
  • merge pull request branch with squash option
    • if there're some conflicts, guide user to fix it and mark as resolved (via git add)
  • prompt user to input main author: default value is who has most of commits (via extracting authors from pull request branch and sort)
  • prompt user to input reviewers: can be blank
  • prompt user to see user wants to list all commits into commit message
  • construct commit message by
    • commit title
    • body of pull request if presented
    • all authors
    • all reviewers if presented
    • user if user resolves merge conflict manually
    • auto close message for pull request
    • list of commits if user want to
  • commit with passing main author as author, and commit message
  • prompt user to push, or stop
  • push changes to remote repository (to asf-git)
  • clean temporary branches
  • get commit hash and return

cherry_pick()

  • prompt user to input branch name to cherry-pick: default value is latest branch which is passed from main()
  • fetch and checkout branch which cherry-pick targets (from asf-git)
  • cherry-pick with commit hash
    • if there're some conflicts, guide user to fix it and complete cherry-pick manually
  • prompt user to push, or stop
  • push changes to remote repository (to asf-git)
  • clean temporary branches
  • get commit hash and return

resolve_jira_issue()

  • prompt user to input JIRA issue ID: default value is extracted value from commit title
  • get information of the issue
  • check status and stop if issue is already marked as 'Resolved'
  • print information of the issue
  • prompt user to input comma-separated fix versions
    • default values are unreleased versions matched to target branch for merge() / cherry_pick()
      • develop branch is treated as default version
  • mark issue as 'Resolved' with setting fix versions and leaving comment

@HeartSaVioR
Copy link
Contributor

HeartSaVioR commented Jun 24, 2016

We may have to modify lots of part of script since...

  • We don't have develop branch so all about develop branch should be modified. Spark merge script also doesn't have handling with develop branch since they don't have develop branch, too. Maybe adopting spark script would be easier than adopting kafka script.
  • Branch policy is not compatible with projects which uses this script. They have branches per version but we just maintain version lines (major, minor, bugfix) so we should do something while determining fix versions from merged branches.
  • We're having master and 1.x / 1.0.x branches heavily diverged, so there're often two or more pull requests submitted per one issue. (Committers don't cherry-pick between master and 1.x for storm-core manually since it's easy to see merge conflict.) It should be tested (at least unit test and integration test) individually, and issue should be closed when all of pull requests are checked in. It means that we're having different merging step which other projects don't have.
    • Moreover, commit message hook (closing PR) doesn't work if PR is not against master.
  • We should update CHANGELOG while merging step. Personally I don't like updating CHANGELOG so opened thread for discussion but we didn't decide something clearly.
  • Commit message will contain body of pull request which is free format for now and tends to be meaningless for commit message. We need to guide contributor to write meaningful information. Thanks for Github we can have body template of the pull request which many projects have been using already.

So without arranging our branch policy and merging step, it will be hard to get merge script fit for us.

@harshach
Copy link
Contributor Author

@HeartSaVioR Thanks for documenting the script.
"Branch policy is not compatible with projects which uses this script. They have branches per version but we just maintain version lines (major, minor, bugfix) so we should do something while determining fix versions from merged branches."
Agree but than again we need modify either script to get this done . So spark or kafka script doesn't matter much.
"We're having master and 1.x / 1.0.x branches heavily diverged, so there're often two or more pull requests submitted per one issue. (Committers don't cherry-pick between master and 1.x for storm-core manually since it's easy to see merge conflict.) It should be tested (at least unit test and integration test) individually, and issue should be closed when all of pull requests are checked in. It means that we're having different merging step which other projects don't have.
Moreover, commit message hook (closing PR) doesn't work if PR is not against master."
I am +1 on adding unit tests, integration tests to be run as part of the script but have a manual validation from the user to say yes/no to go-ahead with merge. As the unit tests can be flaky there can be false negatives.
"We should update CHANGELOG while merging step. Personally I don't like updating CHANGELOG so opened thread for discussion but we didn't decide something clearly."
Agree with you on this. This extra-step adds unnecessary commits to the log. As long as we update the JIRA fixVersions before the release its easy to generate a change log.

"Commit message will contain body of pull request which is free format for now and tends to be meaningless for commit message."
With the above script one can edit the commit title and it will have JIRA number & title that will give us much more meaningful message about the work done as part of th ejira.

"So without arranging our branch policy and merging step, it will be hard to get merge script fit for us."
I disagree with this. We can have minimal modifications to the script to work with our current branches.

@knusbaum
Copy link
Contributor

knusbaum commented Oct 4, 2016

-1
I am generally opposed to this. Most PRs only have a small number of commits and aren't a problem. For PRs with a large number of commits, it's simple enough to ask the contributor to squash their own commits.

I'm not sure I see the benefit in adding another script, which we will have to maintain, in order to do something we should rarely be doing.

Also, I worry that having this script will lead to a sharp rise in totally-squashed PR merges, even when there's not really any benefit (and in fact, loss of authorship info) since some people are likely just going to use the script whenever they're doing a merge.

@ptgoetz
Copy link
Member

ptgoetz commented Oct 4, 2016

I'll second @knusbaum's -1. Based on points I made earlier. This has the potntial to automatically destroy code provenance, especially if more than one contributor is involved in a pull request. From a legal perspective, the ASF need to be able to determine the origin of all code changes.

I would recommend any project that uses a variant of this script to double-check with ASF legal that it is okay. I could be totally wrong, but I'd rather play it safe.

I'd also argue that a well thought out series of commits cane make reviews easier. For example, separating maven build changes from code changes. I'd rather we encourage devs to think out the partitioning of commits themselves and squash appropriately if necessary.

@HeartSaVioR
Copy link
Contributor

I'm totally +1 to this approach, even though I think script should be modified to Storm's project style.

Like I said to dev@ mailing list, I have been doing reviewing and merging pending pull requests for weeks and months, and it was painful enough to merge and port back to each branches, even though I ignored cleaning up commits. (Pain is amplified when tiny commit should be merged to all branches.) If I want to clean up commits before merging it should be more painful. CHANGELOG is subject to not in sync among branches, but we need to write it manually because it's hard to filter merge commits to see the change list. (We could just rely on JIRA issues for alternative.)

Regarding commits, I don't want to keep commits like 'kicking travis', 'address review comments', etc. which is not helpful at any chances. For my last 2 years of development of Storm, I didn't utilize individual commit. If something is wrong with recent merge, we rollback the merge, not individual commit. Squashing commits is widely used strategy and already shows success story to many big projects. Even Github provides the squash merge mode (recently rebase mode too) in GUI.

If we want to merge in squashed commit, it should be done in merging process, not reviewing process. For me, ideal review process should be contributor-friendly. While we can't put efforts to only maintain Storm project (by reviewing pull requests, etc.), contributors also can't. Once we create a script which also squashes the commits, we don't need to make contributors bothering with rebasing and squashing commits. If not, all individuals including us should do it just after merger said 'please rebase and squash in order to merge.', which is also bothering for mergers, too. Moreover, there's a chance for contributors to be busy at the moment, and pull request goes stale. Pull requests which need upmerging are the case.

I understand and agree the authorship issue, but we can treat it as exceptional case. Many pull requests are authored by one.

Let's make merging phase as painless, or at least less painful thing.

@ptgoetz
Copy link
Member

ptgoetz commented Oct 4, 2016

I'll also point out that the "if other Apache projects do it, it is oaky" stance is particularly dangerous. PMC members must understand ASF policy and not rely on what other projects do. If what another project does is wrong, then doing the same thing in our project just introduces liability.

@ptgoetz
Copy link
Member

ptgoetz commented Oct 4, 2016

I'm okay with automating the merge process, just not the way it is implemented here. Perhaps we shouldmove the discussion to the dev list.

@harshach
Copy link
Contributor Author

@knusbaum maintaining the script shouldn't push us from adopting a better approach. It's another piece of code that we need to maintain just like entire repo that we are maintaining right now.
Lets continue this discussion over mailing list.

d2r pushed a commit to d2r/storm that referenced this pull request Oct 16, 2018
We are closing stale Pull Requests to make the list more manageable.

Please re-open any Pull Request that has been closed in error.

Closes apache#608
Closes apache#639
Closes apache#640
Closes apache#648
Closes apache#662
Closes apache#668
Closes apache#692
Closes apache#705
Closes apache#724
Closes apache#728
Closes apache#730
Closes apache#753
Closes apache#803
Closes apache#854
Closes apache#922
Closes apache#986
Closes apache#992
Closes apache#1019
Closes apache#1040
Closes apache#1041
Closes apache#1043
Closes apache#1046
Closes apache#1051
Closes apache#1078
Closes apache#1146
Closes apache#1164
Closes apache#1165
Closes apache#1178
Closes apache#1213
Closes apache#1225
Closes apache#1258
Closes apache#1259
Closes apache#1268
Closes apache#1272
Closes apache#1277
Closes apache#1278
Closes apache#1288
Closes apache#1296
Closes apache#1328
Closes apache#1342
Closes apache#1353
Closes apache#1370
Closes apache#1376
Closes apache#1391
Closes apache#1395
Closes apache#1399
Closes apache#1406
Closes apache#1410
Closes apache#1422
Closes apache#1427
Closes apache#1443
Closes apache#1462
Closes apache#1468
Closes apache#1483
Closes apache#1506
Closes apache#1509
Closes apache#1515
Closes apache#1520
Closes apache#1521
Closes apache#1525
Closes apache#1527
Closes apache#1544
Closes apache#1550
Closes apache#1566
Closes apache#1569
Closes apache#1570
Closes apache#1575
Closes apache#1580
Closes apache#1584
Closes apache#1591
Closes apache#1600
Closes apache#1611
Closes apache#1613
Closes apache#1639
Closes apache#1703
Closes apache#1711
Closes apache#1719
Closes apache#1737
Closes apache#1760
Closes apache#1767
Closes apache#1768
Closes apache#1785
Closes apache#1799
Closes apache#1822
Closes apache#1824
Closes apache#1844
Closes apache#1874
Closes apache#1918
Closes apache#1928
Closes apache#1937
Closes apache#1942
Closes apache#1951
Closes apache#1957
Closes apache#1963
Closes apache#1964
Closes apache#1965
Closes apache#1967
Closes apache#1968
Closes apache#1971
Closes apache#1985
Closes apache#1986
Closes apache#1998
Closes apache#2031
Closes apache#2032
Closes apache#2071
Closes apache#2076
Closes apache#2108
Closes apache#2119
Closes apache#2128
Closes apache#2142
Closes apache#2174
Closes apache#2206
Closes apache#2297
Closes apache#2322
Closes apache#2332
Closes apache#2341
Closes apache#2377
Closes apache#2414
Closes apache#2469
d2r pushed a commit to d2r/storm that referenced this pull request Oct 16, 2018
We are closing stale Pull Requests to make the list more manageable.

Please re-open any Pull Request that has been closed in error.

Closes apache#608
Closes apache#639
Closes apache#640
Closes apache#648
Closes apache#662
Closes apache#668
Closes apache#692
Closes apache#705
Closes apache#724
Closes apache#728
Closes apache#730
Closes apache#753
Closes apache#803
Closes apache#854
Closes apache#922
Closes apache#986
Closes apache#992
Closes apache#1019
Closes apache#1040
Closes apache#1041
Closes apache#1043
Closes apache#1046
Closes apache#1051
Closes apache#1078
Closes apache#1146
Closes apache#1164
Closes apache#1165
Closes apache#1178
Closes apache#1213
Closes apache#1225
Closes apache#1258
Closes apache#1259
Closes apache#1268
Closes apache#1272
Closes apache#1277
Closes apache#1278
Closes apache#1288
Closes apache#1296
Closes apache#1328
Closes apache#1342
Closes apache#1353
Closes apache#1370
Closes apache#1376
Closes apache#1391
Closes apache#1395
Closes apache#1399
Closes apache#1406
Closes apache#1410
Closes apache#1422
Closes apache#1427
Closes apache#1443
Closes apache#1462
Closes apache#1468
Closes apache#1483
Closes apache#1506
Closes apache#1509
Closes apache#1515
Closes apache#1520
Closes apache#1521
Closes apache#1525
Closes apache#1527
Closes apache#1544
Closes apache#1550
Closes apache#1566
Closes apache#1569
Closes apache#1570
Closes apache#1575
Closes apache#1580
Closes apache#1584
Closes apache#1591
Closes apache#1600
Closes apache#1611
Closes apache#1613
Closes apache#1639
Closes apache#1703
Closes apache#1711
Closes apache#1719
Closes apache#1737
Closes apache#1760
Closes apache#1767
Closes apache#1768
Closes apache#1785
Closes apache#1799
Closes apache#1822
Closes apache#1824
Closes apache#1844
Closes apache#1874
Closes apache#1918
Closes apache#1928
Closes apache#1937
Closes apache#1942
Closes apache#1951
Closes apache#1957
Closes apache#1963
Closes apache#1964
Closes apache#1965
Closes apache#1967
Closes apache#1968
Closes apache#1971
Closes apache#1985
Closes apache#1986
Closes apache#1998
Closes apache#2031
Closes apache#2032
Closes apache#2071
Closes apache#2076
Closes apache#2108
Closes apache#2119
Closes apache#2128
Closes apache#2142
Closes apache#2174
Closes apache#2206
Closes apache#2297
Closes apache#2322
Closes apache#2332
Closes apache#2341
Closes apache#2377
Closes apache#2414
Closes apache#2469
@asfgit asfgit closed this in #2880 Oct 22, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants