Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support squash and merge as well as fast-forward and merge #194

Open
jniles opened this issue May 25, 2017 · 30 comments

Comments

@jniles
Copy link
Contributor

commented May 25, 2017

It would be nice if the commits in a PR could be squashed with bors instead of simply fast fowarded. This would allow projects to accept PRs even if the contributor had not followed commit guidelines correctly, without having to ask the contributor to rebase and squash correctly. I imagine this implementation would have a "merge type" flag in bors.toml that might change the default behavior to squash-and-merge.

@notriddle

This comment has been minimized.

Copy link
Member

commented May 25, 2017

Is this a duplicate of #138? Or would you still like the merge commit at the end, just with the user's PR commits squished?

@jniles

This comment has been minimized.

Copy link
Contributor Author

commented May 25, 2017

I had seen #138, but wasn't sure if it included squashing the individual PR's commit history. If I understand correctly, if a PR has two commits on it is merged, it will look like this:

Commit #2 - PR Author
|
Commit #1 - PR Author
|
Original Master Head

Ideally, it would be nice to squash the commits,

Merge #PR (Comment #1 and #2 Inside)
|
Original Master Head

I can see that this might not be the bors-ng way though, if PRs are to be batched. It also would destroy the contributor's commit history, since @bors[bot] would always be the author. The goal though, would be to remove the burden of correct commit message syntax from the contributor since their messages would be collapsed anyway.

@notriddle

This comment has been minimized.

Copy link
Member

commented May 25, 2017

I had seen #138, but wasn't sure if it included squashing the individual PR's commit history. If I understand correctly, if a PR has two commits on it is merged, it will look like this

Yeah, that's how I imagined #138 working,

Ideally, it would be nice to squash the commits,

I see what you mean, now. Yeah, they're not the same thing. They both constitute "messing with the history," but they're not the same.


What we end up with is this matrix:

Squash No squash
Rebase ? #138
No rebase #194 Current

Both involve messing with the history, so they have some implementation overlap, but you could rebase the commits without squashing them (which means you don't need a merge commit) and you could squash the commits without rebasing them (which means you still need a merge commit).

Do you think we'd need to implement the whole matrix, though?

@jniles

This comment has been minimized.

Copy link
Contributor Author

commented May 27, 2017

Since I'm just getting my feet wet with bors-ng, I'm hesitant to recommend a lot of development in this area. My original motivation for this was to make sure the history is clean, but having a history of Merge #3, Merge #2, Merge #1 is likely not informative.

I'll get a little more experience playing around with this before coming up with a solid proposal. I do appreciate your time considering it though!

@notriddle

This comment has been minimized.

Copy link
Member

commented May 27, 2017

I do agree that the commit history could use some work, and doing a better job of it is definitely getting top priority over other feature development.

@ebkalderon

This comment has been minimized.

Copy link

commented Sep 7, 2017

I would love to have squash-and-merge support in Bors. It's a very desirable feature, considering how much clutter conventional merge commits tend to leave behind. I personally use this pattern frequently in Amethyst. I'm in the process of integrating Bors right now, and the lack of squash-and-merge feels like a step backward to me, honestly.

@zimbatm

This comment has been minimized.

Copy link
Contributor

commented Jan 4, 2018

I would up for giving it a go, can you give me pointers at where to look in the source code to get an idea of the work involved?

@notriddle

This comment has been minimized.

Copy link
Member

commented Jan 4, 2018

Right now, the place I'd start is the Batcher. That's what actually drives the queue forward, calling all the other parts that create the commit and receiving the events when the CI system builds the merge commit.

@kamalmarhubi

This comment has been minimized.

Copy link
Contributor

commented Jan 27, 2018

Before I spend too much time on #138, I wanted to think about this matrix a bit more. I think it's actually a 2x2x2 matrix, not a 2x2 matrix. There are three orthogonal options:

  • squash vs not
  • merge vs not
  • rebase vs not

Merge vs no merge means the creation of a merge commit. A fast-forward merge, eg if the base branch hasn't moved since the PR, involves neither a merge nor a rebase. However it's not possible in the general case where PRs are opened and approved concurrently, so I'm ignoring cells 0 and 4 below, which are marked with asterisks.

Lacking three dimensional tables in GitHub-flavored markdown, I'll just make two 2x2 tables. I semi-arbitrarily split them by squash no/squash. Some cells are not actually accessible in the general case, marked with asterisk.

No squash

No merge Merge
No rebase 0* 1
Rebase 2 3

Squash

No merge Merge
No rebase 4* 5
Rebase 6 7

(There are graphs of all these modes below.)

Deciding which combinations are useful

Here are a few properties that I think are good to consider:

  • green history
  • original commits
  • tidy history

Green history: this is what bors gets you: every commit in git rev-list --first-parent master will have passed CI. This is a really great property, as it naturally encodes the tested commits in the commit history.

Original commits: some projects may prefer to never rewrite commits from contributors. I can see this being true if, eg, the commits were signed.

Tidy history: "how wide is git log --graph?" some projects may prefer to keep their commit history linear or mostly linear. This can make the output of git log easier to work with in some cases.

Now, comparing the options! 0, 4 are not interesting because they aren't generally applicable.

My take on the useful combinations, in my personal order of preference:

  • 6 [green history, tidy history] for most use cases
  • 3 [green history, tidy history] for cases where the author or approver would like to preserve the intervening commits. Eg if vendoring and patcing a library, it can be nice to add the file unchanged in one commit, and the patch as a second. This way the local changes are easily visible in the history.
  • 1 [green history, original commits] is the current (and reasonable!) default and so no implementation work is needed

The others are less useful in my opinion:

  • 2 loses the green history property
  • 5 strictly worse than 1: you lose original commits, and still have a messy history.
  • 7 similarly, I don't see the point of this: if you're going to squash, the merge bubbles are not adding any useful structure to the commit history

All the graphs

There are pictures here, as well as rough git commands to achieve them. The git commands are intended to set up the staging branch.

Assume default branch is master and that pr1 through pr3 are commit sha1s of branch HEADs for the first, second, and third branch in a batch. All this assumes that the default branch is never rewound, ie, its history is append-only, so that the --fork-point mode of git merge-base is never needed.

All steps preceded with

git fetch origin
git branch --force staging.tmp origin/master
git checkout staging.tmp

and followed by

`git push --force origin staging.tmp:staging`.

Any time git rebase --interactive is used, it means we are squashing commits.

0 only fast forward merge

Only works with batch size one; may fail if master has moved on.

Rough git command equivalent:

git merge --ff-only pr

What it looks like:

              E---F---G  PR
             /                   ---->      A---B---C---D---E---F---G  master
A---B---C---D  master



          E---F---G  PR
         /                       ---->     <FAIL>
A---B---C---D  master

1 octopus merge

This is what bors does now.

Rough git command equivalent:

git merge --strategy=octopus --no-ff pr1 pr2 pr3

What it looks like:

            --E---F---G  PR1                           --E---F---G---,
           /                                          /               \
          /                                          /                 |
         /    H---I---J  PR2                        /    H---I---J--,  |
        /    /                                     /    /            \ |
       /    |                                     /    |              \|
      /     | K---L---M  PR3                     /     | K---L---M---, |
     /      |/                   ---->          /      |/             \|
A---B---C---D  master                      A---B---C---D---------------o  master

2 rebase without squashing, fast-forward merge

Rough git command equivalent:

for pr in pr1 pr2 pr3; do
  git checkout $pr
  git rebase staging.tmp
  git checkout staging.tmp
  git merge --ff-only $pr
done

What it looks like:

            --E---F---G  PR1
           /
          /
         /    H---I---J  PR2
        /    /
       /    |
      /     | K---L---M  PR3
     /      |/
A---B---C---D  master

                  |
                  |
                  v

A---B---C---D---E'--F'--G'--H'--I'--J'--K'--L'--M'  master

3 rebase without squashing, non-fast-forward merge

Rough git command equivalent:

for pr in pr1 pr2 pr3; do
  git checkout $pr
  git rebase staging.tmp
  git checkout staging.tmp
  git merge --no-ff $pr
done

What it looks like:

            --E---F---G  PR1
           /
          /
         /    H---I---J  PR2
        /    /
       /    |
      /     | K---L---M  PR3
     /      |/
A---B---C---D  master

                  |
                  |
                  v

              E'--F'--G'  H'--I'--J'  K'--L'--M'
             /         \ /         \ /         \
A---B---C---D-----------o'----------o-----------o  master

4 squash and fast-forward merge.

Only works with batch size one; may fail if master has moved on.

Rough git command equivalent:

git checkout pr
rebase --interactive $(git merge-base staging.tmp pr)
git checkout staging.tmp
git merge --ff-only pr

What it looks like:

              E---F---G  PR
             /                   ---->      A---B---C---D---[EFG]  master
A---B---C---D  master





          E---F---G  PR
         /                       ---->     <FAIL>
A---B---C---D  master

5 squash and octopus merge

Rough git command equivalent:

for pr in pr1 pr2 pr3; do
  git checkout "$pr"
  git rebase --interactive $(git merge-base staging.tmp $pr)
done
git checkout staging.tmp
git merge --no-ff pr1 pr2 pr3

What it looks like:

            --E---F---G  PR1                           ,---[EFG]-,
           /                                          /           \
          /                                          /             |
         /    H---I---J  PR2      ---->             /    ,-[HIJ]-, |
        /    /                                     /    /         \|
       /    |                                     /    |           |
      /     | K---L---M  PR3                     /     | ,-[KLM]-, |
     /      |/                                  /      |/         \|
A---B---C---D  master                      A---B---C---D-- --------o  master

6 squash and rebase, fast-forward merge

Rough git command equivalent:

for pr in pr1 pr2 pr3; do
  git checkout $pr
  git rebase --interactive staging.tmp
  git checkout staging.tmp
  git merge --ff-only $pr
done

What it looks like:

            --E---F---G  PR1
           /
          /
         /    H---I---J  PR2
        /    /
       /    |
      /     | K---L---M  PR3
     /      |/
A---B---C---D  master

                  |
                  |
                  v

A---B---C---D---[EFG]---[HIJ]---[KLM]  master

7 squash and rebase, with merge --no-ff for merge bubbles

Rough git command equivalent:

for pr in pr1 pr2 pr3; do
  git checkout $pr
  git rebase --interactive staging.tmp
  git checkout staging.tmp
  git merge --no-ff $pr
done

What it looks like:

            --E---F---G  PR1
           /
          /
         /    H---I---J  PR2
        /    /
       /    |
      /     | K---L---M  PR3
     /      |/
A---B---C---D  master

                  |
                  |
                  v

              ,-[EFG]-,   ,-[HIJ]-,   ,-[KLM]-,
             /         \ /         \ /         \
A---B---C---D-----------o'----------o-----------o  master
@jniles

This comment has been minimized.

Copy link
Contributor Author

commented Jan 28, 2018

@kamalmarhubi , this is a really great analysis! Having seen these diagrams, I agree with you that #6 is the most desirable.

@zimbatm

This comment has been minimized.

Copy link
Contributor

commented Jan 29, 2018

Agreed, only #1 and #6 are really useful. #1 is great because it preserves the original commits. If you're willing to touch the history one might as well work on PR chunk sized and #6 does that.

@notriddle

This comment has been minimized.

Copy link
Member

commented Jan 30, 2018

@kamalmarhubi are you having trouble with anything else related to squash-and-rebase-and-fast-forward-merge?

@zimbatm

This comment has been minimized.

Copy link
Contributor

commented Jan 31, 2018

What would happen if bors r+ was expanded with merge or squash attributes to override the default behavior? Then potentially you could have a batch that contain both types.

By the way, squashed PRs should be tracked and closed by the batch processor because github won't detect it as merged.

@notriddle

This comment has been minimized.

Copy link
Member

commented Jan 31, 2018

What would happen if bors r+ was expanded with merge or squash attributes to override the default behavior?

That would be fine. I don't think it's necessary for the first implementation, though.

By the way, squashed PRs should be tracked and closed by the batch processor because github won't detect it as merged.

Yeah, and the title should probably be changed to have a "[MERGED]" prefix or something.

@zimbatm

This comment has been minimized.

Copy link
Contributor

commented Feb 20, 2018

@kamalmarhubi any progress on this? I want to avoid re-implementing if it's already in progress.

@kamalmarhubi

This comment has been minimized.

Copy link
Contributor

commented Apr 30, 2018

Apologies for the really long delay here, without any update. I started off trying to figure out how to rebase in github without requiring the repo on the bors-ng machine. I took a wrong turn and briefly went down a path of trying to use Github as a libgit2 backend, which had associated yaks. Here's a quick update on how I think this can work.

First off, the problem: the github API lets you merge commits via the API, and lets you create new commits from a given tree with arbitrary parents. These are both used by bors-ng now. I realised you can get the result of a rebase via those operations. To rebase+squash a PR is easy:

  1. reset staging.tmp to master
  2. create a merge with base staging.tmp and head pr.head.sha (this updates staging.tmp)
  3. create a commit with parent master, the tree of staging.tmp, author and message from the PR, and bors as comitter
  4. reset staging.tmp to the commit from the previous step

To rebase without squashing (to achieve case 3 in my earlier comment) is a little more tedious, as you have to do these steps one commit at a time. But that's what loops are for.

There are a couple of open questions that I'd appreciate thoughts on:

  • if squashing a PR that includes commits from more than one author, which do you use for the squashed commit? Or should the squash strategy simply not be allowed for that case? (this can happen if the original PR author stops responding, and someone else picks it up and finishes it)
  • how should signed commits be handled? (this applies to rebase with or without squashing)
  • how should PRs be closed? I could see a few options:
    • change title and close with a comment pointing to the commit for that PR
    • if the PR author allows maintainers to push to their branch, you could push the commit to their branch so that Github's auto-merge marking works. (This is a bit iffy: it'll involve force pushing to a branch in someone else's fork, which seems pretty rude.)

In terms of actually implementing this I'm looking at the code again to see where this would fit in. My current thought is to extract the code that builds the staging / trying commits into an interface like create_candidate or something. This is really fuzzy right now, because ideally the way this gets implemented doesn't block later adding the r+ squash / r+ merge on a per-PR basis as @zimbatm suggested.

I'll write some more thoughts down tomorrow, but just wanted to get this out there. :-)

@notriddle

This comment has been minimized.

Copy link
Member

commented Apr 30, 2018

if squashing a PR that includes commits from more than one author, which do you use for the squashed commit? Or should the squash strategy simply not be allowed for that case? (this can happen if the original PR author stops responding, and someone else picks it up and finishes it)

You can use the co-author trailers that're used for the merge commits now.

how should signed commits be handled? (this applies to rebase with or without squashing)

We should add a general feature for signing commits, including signing the merge commits that are being made now, with a bors-private key. Not sure how to do that, but in any case, it doesn't seem really related to rebasing.

how should PRs be closed?

Do what homu does now; close the PR and change the title. The pushing trick seems rude, as you say, and it falls apart if the PR submitter unchecks the box that would allow it.

@kamalmarhubi

This comment has been minimized.

Copy link
Contributor

commented Apr 30, 2018

it doesn't seem really related to rebasing.

If an author signs their commits, rebasing will lose their signature.

@notriddle

This comment has been minimized.

Copy link
Member

commented Apr 30, 2018

Yeah, that's true. That's true even if you rebase from the GitHub merge button, though.

@kamalmarhubi

This comment has been minimized.

Copy link
Contributor

commented Apr 30, 2018

Fair point!

@kamalmarhubi

This comment has been minimized.

Copy link
Contributor

commented May 1, 2018

Ok, some more thoughts on how to implement this! I was thinking of making a type that represents an operation, eg, something like

  • {:merge, [pr1, pr2, pr3]}—the current operation
  • {:squash, pr}—the rebase operation described above
  • {:rebase_merge, pr}—the "case 3" rebase-without-squashing operation

Now the batcher picks up the waiting PRs and creates a list of operations. PRs are sorted by time of r+ (or the later of r+ and reaching required_approvals if #390 is implemented). The initial implementation will just just take a default from bors.toml, but a later iteration could allow each PR to specify how it wants to be landed. This list then gets processed one operation at a time by some other "thing", to form the batch's commit. (I really don't know how to structure stuff in Elixir!)

I have not yet thought about how this affects the try / attemptor stuff.

@notriddle does this seem like a reasonable appraoch?

@notriddle

This comment has been minimized.

Copy link
Member

commented May 1, 2018

That all sounds good to me.

@rjmk

This comment has been minimized.

Copy link

commented May 1, 2018

Hi @kamalmarhubi! Thanks for looking into this. At work we're pretty keen on 3, (or rebasing each branch and then octopus merging). It sounds like this might not be your top priority. If you end up implementing something that doesn't quite cover 3, it would be great if you'd leave a comment suggesting what would be needed to get 3 working and any pointers to the codebase where we could pick it up.

Thanks!

(The alternative rebase strategy we were thinking about was like:

o---o---o---M---------------M-----------M-------M  master
            |\             /|\         /|\     /
            | --o---o---o-- | ----o---- | --o--
            |\             /|\         / 
            | ----o---o---- | --o---o--  
             \             /
              --o---o---o--

)

@IsaacWoods IsaacWoods referenced this issue May 24, 2018
Merged
@kamalmarhubi

This comment has been minimized.

Copy link
Contributor

commented May 26, 2018

@rjmk I can't tell exactly what's going on in your diagram. I hope that it won't be hard to implement on top of what I'm doing.

@cmr

This comment has been minimized.

Copy link

commented Jun 1, 2018

@kamalmarhubi how's it going? I'm very interested in using this :)

@marcoabreu

This comment has been minimized.

Copy link

commented Jul 12, 2018

Same! We'd like to use method 6 in https://github.com/apache/incubator-mxnet/

@jerry-rubikloud

This comment has been minimized.

Copy link

commented Sep 18, 2018

@kamalmarhubi Any update on this issue? This issue is blocking us from using bors to projects that want to keep clean history and I think many people are actually waiting for it as well. If you can provide us with a update, that would be fantastic. Thanks!

@jerry-rubikloud

This comment has been minimized.

Copy link

commented Sep 18, 2018

@notriddle Is this feature will be added to bors soon? Just want to know if there is a timeline for this feature to add to this wonderful tool. Thank you!

@vvuk

This comment has been minimized.

Copy link

commented Feb 25, 2019

Likewise very interested in this -- happy to support development or help in any way needed.

@MarkSRobinson

This comment has been minimized.

Copy link

commented Jun 19, 2019

My company is very interested in the squash and merge functionality. Is someone currently working on this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
You can’t perform that action at this time.