Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PR Merge strategy #554

Closed
gotmax23 opened this issue Oct 10, 2023 · 12 comments
Closed

PR Merge strategy #554

gotmax23 opened this issue Oct 10, 2023 · 12 comments
Labels
needs_triage Needs a first human triage before being processed. tooling This PR affects tooling (CI, pr_labeler, noxfile, linters, etc.) but not the docs builds themselves.

Comments

@gotmax23
Copy link
Collaborator

gotmax23 commented Oct 10, 2023

Currently, the repositories pull request merge strategy settings (https://github.com/ansible/ansible-documentation/settings#merge-button-settings) are as follows:

PR merge strategies

We usually use squash merges which takes the PR and compresses it down into one commit which is applied on top of the repository. For many PRs, this works out well, but for others, such as #529 and #524, each commit has its own semantic purpose which should be preserved instead of squashing the PR into one commit. In these cases, we could use the rebase strategy which takes each commit from the PR and applies it separately. This poses a problem, because patchback does not support (sanitizers/patchback-github-app#35) the rebase strategy, so we cannot use these PRs with the backport-* labels that we rely on. Merge commits do not have the problems.

Therefore, I suggest enabling merge commits and disabling rebases. For most PRs, we should keep using squash to keep the Git history clean, but we should enable merge commits so we can use them when it makes sense to keep commits separated and preserve metadata.

@gotmax23 gotmax23 added the tooling This PR affects tooling (CI, pr_labeler, noxfile, linters, etc.) but not the docs builds themselves. label Oct 10, 2023
@github-actions github-actions bot added the needs_triage Needs a first human triage before being processed. label Oct 10, 2023
@oraNod
Copy link
Contributor

oraNod commented Oct 11, 2023

Hmm. I guess I'm going to be annoying here but I dislike merge commits because they add some extra noise to log files.

As you say the current strategy works well with most PRs. Is it worth changing strategy to accommodate cases like #529 and #524 which are more related to tooling? We can just backport those manually and don't really need patchback imo.

@webknjaz
Copy link
Member

webknjaz commented Oct 11, 2023

Personally, I strongly dislike non-merge commits because squashes completely ruin carefully curated history and replace it with something that often makes zero sense. Such commits are harmful, as people who do squashes more often than not forget to compose meaningful commit titles and descriptions.

https://blog.jaraco.com/dont-use-squash-and-merge/

@gotmax23
Copy link
Collaborator Author

gotmax23 commented Oct 11, 2023

Hmm. I guess I'm going to be annoying here but I dislike merge commits because they add some extra noise to log files.

I don't think merge commits are that bad. There's always git log --graph or various GUIs if you want a better way to visualize changes. I think it's useful that, unlike rebases, merges preserve metadata about where the commits came from.

As you say the current strategy works well with most PRs. Is it worth changing strategy to accommodate cases like #529 and #524 which are more related to tooling? We can just backport those manually and don't really need patchback imo.

I'd prefer not to have to use a separate backport workflow for those PRs. The tooling PRs are almost always backported, as I want to have a consistent developer experience and CI results across branches.


Personally, I strongly dislike non-merge commits because squashes completely ruin carefully curated history and replace it with something that often makes zero sense. Such commits are harmful, as people who do squashes more often than not forget to compose meaningful commit titles and descriptions.

I strongly agree. If the PR's git history is messy to begin with, squash commits are probably better than "exporting" that mess onto main, but otherwise, they make the final history confusing and not particularly useful.

@samccann
Copy link
Contributor

squash commits - I'm against removing this. We get a lot of noise in the individual commits of a PR. I create some of that myself with "fix spelling error" "fix ci problem" "doh, really fix ci problem this time" etc etc. And I'm not alone in this. I see a lot of individual commit messages that are just noise, not something specific like the two example PRs noted in the description above.
While I could educate myself and create better commit messages each time, we're still left with a lot of noise from me and other docs contributors. Is it really important to the commit log to know each spelling error a contributor fixed? each spacing problem fixed, etc? Not to mention commiting a batch of suggested edits in the github UI creates a useless description in the commit message as well.

What this feels like to me just reading the comments - coders are better at having solid individual commits (and messages) that are worth keeping. Docs contributors are not.

So we either stay the way we are, and the few code commits are handled differently for manual backports...(onus on coders)
or we remove squash and merge, and our git log becomes an unreadable mess.

I'm also unsure how a manual backport works if the bot can't backport a docs PR. Today, I know I can go into the log, search for the PR # and find one commit that I cherry-pick. If the bot can't backport, then in the new world, I'd have to I guess find the merge commit, then scroll down and manually cherry-pick multiple commits, ensuring I do them in 'the correct order' etc etc.

So, it feels painful either way - either painful for the folks here maintaining the tooling/build stuff, or painful for the writers, especially if the backport bot can't auto backport for us.

@gotmax23
Copy link
Collaborator Author

I'm proposing enabling both squashes and merges. We can keep using squashing for most docs changes and fallback to merge commits where they make sense. Github allows choosing one or the other when merging. Both of these options work with Patchback.

@acozine
Copy link
Contributor

acozine commented Oct 11, 2023

I'm in favor of only allowing squash-commits when merging PRs (leaving the repo as it is now).

For folks who do not keep branches full of beautifully crafted commit messages, the current policy enforces at least a little tidiness.

For folks who are merging PRs, including PRs from folks who craft precise and useful commit messages, the current policy encourages good descriptions at the level of the whole PR. For example, in #524 one commit enables spellcheck and the other fixes existing spelling mistakes. But they were changed in the same PR for a reason - they enable a test and ensure that it passes. The PR commit, once merged, would ideally reflect that. For each PR, why were these changes made, and why were they made together? So something like "enables spellcheck with clean tests". One PR, one goal, one commit.

@gotmax23
Copy link
Collaborator Author

I'm in favor of only allowing squash-commits when merging PRs (leaving the repo as it is now).

Currently, we allow squash commits and rebases. I'm proposing allowing squash commits and merge commits instead.

For folks who are merging PRs, including PRs from folks who craft precise and useful commit messages, the current policy encourages good descriptions at the level of the whole PR. For example, in #524 one commit enables spellcheck and the other fixes existing spelling mistakes. But they were changed in the same PR for a reason - they enable a test and ensure that it passes. The PR commit, once merged, would ideally reflect that. For each PR, why were these changes made, and why were they made together? So something like "enables spellcheck with clean tests". One PR, one goal, one commit.

Commit messages and PR descriptions are different things. Commit messages provide context for each atomic change and refer to the code itself. PR descriptions provide rationale for the change as a whole and may include details or context for the reviewer that aren't appropriate for the git history itself. Squashing all the commit messages into one big blob takes away their value and makes it difficult to understand to what they're referring.

For the reasons @samccann and others mentioned, I think we should keep squashing as the default, but I'd like to allow merge commits for tooling/code and other changes where recording the individual commits is valuable.

Re. the enabling spell check example, splitting the commits to configure the rules and to actually apply the rules makes it much easier to understand what's going on. Squashing them into a large commit obscures the configuration changes by mixing them together with the large amount of changes to fix the spelling errors. They definitely need to be part of the same PR, though, as we wouldn't want to merge tests that fail.

@samccann
Copy link
Contributor

Okay if I'm reading the comments correctly, we can have our cake and squash it too :-)

I can keep squash and merge for docs PRs. @gotmax and others can choose to do merge commits on those PRs that make sense (likely most of the tooling/doc-build prs).

The patchbot works with both, and if I get a backport that fails, I can still go in and cherrypick the one squashed commit on docs prs.. right?

@gotmax23
Copy link
Collaborator Author

Okay if I'm reading the comments correctly, we can have our cake and squash it too :-)

I can keep squash and merge for docs PRs. @gotmax and others can choose to do merge commits on those PRs that make sense (likely most of the tooling/doc-build prs).

The patchbot works with both, and if I get a backport that fails, I can still go in and cherrypick the one squashed commit on docs prs.. right?

Yes, exactly!

@gotmax23
Copy link
Collaborator Author

gotmax23 commented Oct 11, 2023

I've gone ahead and applied the settings changes. The default behavior (squashing) should stay the same as before. You can always click the dropdown menu to change the merge strategy when applying a PR:

Dropdown to select merge strategy

We can of course modify the settings if we change our minds :).

@webknjaz
Copy link
Member

@samccann

While I could educate myself and create better commit messages each time, we're still left with a lot of noise from me and other docs contributors.

Educating oneself is always a good idea. Regarding the noise, that's fixable with responsible squash merges that don't leave bad commit messages in place, I discuss some of this below.

I happen to have a collection of resources in my Gists, on the topic. Feel free to browse them and ask me questions if you're interested. They are very comprehensive, so you'd probably want to pick random links from there, whatever feels important:

There's also linters for commits, like Gitlint that have native integrations with code editors and IDEs and can make hints as you compose commits.

Is it really important to the commit log to know each spelling error a contributor fixed? each spacing problem fixed, etc?

It is, because the history is only useful when it brings value. It's important to understand who's “the audience” of your commit messages. It's rarely a present-time-you. It's you-in-the-future. It's some Git paleontologist trying to understand what was happening, the motivations and stuff, five years from now. This is the audience that will be either grateful or frustrated when then get to actually needing to go through history. This is not the kind of thing you can know and disregard upfront.

I know that it's hard to describe some of the seemingly trivial patches. I hear you! Though, having understanding of why this is important and what's expected there could help with this. For example, many people think of the wrong kind of context to mention in commits. Most of the time, it's enough to answer the question “What's the effect of this change is? Where's it applied?”

With that, it's possible to have commit templates for typical kinds of trivial commits. For typo fixes, I like specifying where the typo was (the scope / document / dir) and what was it. If it's a mass-fix of the same typo, I'd probably put that into the commit title and mention the scope in the long part of the commit description. If it's multiple different typos in one specific document, maybe the document would go into the title instead.
Either way, such context is much more useful than not having any.

Providing such commit messages would allow somebody looking at the Git tree / graph, make educated guesses regarding which commits to skip and which commits to look into deeper, when they are looking for something. Same with git blame.

Not to mention commiting a batch of suggested edits in the github UI creates a useless description in the commit message as well.

That's up to the person accepting those suggested changes, really. GitHub web UI allows typing in a proper commit message. The mobile app, sadly, doesn't. You can also have multiple separate batches and group the suggestions into individual commits.

Though, I can understand that in the case of docs, you may opt into using a squash “merge”. And this is fine for as long as it's done responsibly.

What this feels like to me just reading the comments - coders are better at having solid individual commits (and messages) that are worth keeping. Docs contributors are not.

I think there's some truth in this observation. Though, I believe that gaining the understanding of why this is important and how to do better has a greater effect on what people can do, than the type of contribution.

So we either stay the way we are, and the few code commits are handled differently for manual backports...(onus on coders)
or we remove squash and merge, and our git log becomes an unreadable mess.

I've demonstrated below that it's not the case at all... Also, for backports, I think Maxwell may have confused you into thinking that he suggested backporting individual commits separately. AFAIK nobody does that — each backport is usually one commit regardless of what form it ended up in on the devel branch. The backport context was brought up solely because of the patchback's current capabilities.

One thing to mention — I saw some projects giving the choice to the contributors in how they want their PRs to be handled. They'd use a checkbox in PRs to specify if they prefer their PR branches to be squashed or they'd be willing to make them look presentable. I think, it's a good idea — letting people declare the expectations transparently and going from there.

The patchbot works with both, and if I get a backport that fails, I can still go in and cherrypick the one squashed commit on docs prs.. right?

patchback uses a regular git cherry-pick -x basically (with -m1 for merge commits), so it always produces a single resulting backported commit. It just doesn't know how many commits ended up on the target branch with rebase “merges” so it'll attempt to pick just the last one. Eventually, I'll fix that too. So for now, it's good to have that one disabled, just like @gotmax23 suggested.

@acozine

For folks who do not keep branches full of beautifully crafted commit messages, the current policy enforces at least a little tidiness.

That's not really the case, so I'll probably leave some examples of when having the “squash merge” button available didn't improve anything, from this same repository we're talking about. Here are some commits produced by merging PRs where whoever clicked “merge” didn't seem to feel the responsibility for their commit:

solves issue
Spellingfix
update banner html
Update ansible_tips_tricks.rst
minor ref fix issue
ansible_tips_trics: update
Update README.md
Create README.md
Update regex_* docs

* Added references to inline regex flags
update repo name in conf

Whoever ends up merging a PR, is ultimately responsible for editing out garbage and making sure the history makes sense. With squash “merges”, it's achieved by composing a better commit message, since GitHub just dumps bits of individual commits into one but doesn't magically make them nice. For true merges, this means making sure that the individual commits are well-maintained and optionally writing a summary in that merge commit.
This is when one may look at those commits and decide to use the real merge strategy vs. squashing. And I agree that squashing may be the best solution, since most maintainers wouldn't have time to fix individual commits in the PR branches. But then, making that new resulting commit look good is still their responsibility — just because nobody else would have power/access to edit the immutable thing afterward.

So to re-iterate — no, mandating squash merges doesn't enforce tidiness, it enforces the absence of merge commits (aka “linear history”) which is orthogonal to the problem and doesn't equate tidiness or messiness — these are the things connected to what humans do, not what programs generate.

One PR, one goal, one commit.

I'd like to separately comment on this. Maxwell is right — commits don't equate PRs. It's just implemented in a way that PR descriptions are sometimes populated from commits. I like to explain this through different levels of abstractions/atomicity — there's a concept of atomic commits that are self-sufficient bundles of changes. But for complex changes, they may consist of multiple such chunks, each of them would be atomic on the low level. Though, PRs can be atomic too — when they achieve one specific goal declared in the title, but that doesn't necessarily/always mean just one commit. The atomicity in case of PRs would be high-level.

@gotmax23
Copy link
Collaborator Author

I'll close this for now. If we want to continue discussing general git usage, the forum would be a good place for that 😉. Thanks everyone!

gotmax23 added a commit to gotmax23/ansible-documentation that referenced this issue Apr 28, 2024
This documents the current strategy for using the Patchback bot (at
least as I perceive it) and the conclusion from the PR merge strategies
discussion in
ansible#554. I've
noticed some confusion about this lately, so I wanted to clear it up.
gotmax23 added a commit to gotmax23/ansible-documentation that referenced this issue Apr 28, 2024
This documents the current strategy for using the Patchback bot (at
least as I perceive it) and the conclusion from the PR merge strategies
discussion in
ansible#554. I've
noticed some confusion about this lately, so I wanted to clear it up.
gotmax23 added a commit to gotmax23/ansible-documentation that referenced this issue May 2, 2024
This documents the current strategy for using the Patchback bot (at
least as I perceive it) and the conclusion from the PR merge strategies
discussion in
ansible#554. I've
noticed some confusion about this lately, so I wanted to clear it up.
gotmax23 added a commit to gotmax23/ansible-documentation that referenced this issue May 2, 2024
This documents the current strategy for using the Patchback bot (at
least as I perceive it) and the conclusion from the PR merge strategies
discussion in
ansible#554. I've
noticed some confusion about this lately, so I wanted to clear it up.
samccann pushed a commit that referenced this issue May 2, 2024
This documents the current strategy for using the Patchback bot (at
least as I perceive it) and the conclusion from the PR merge strategies
discussion in
#554. I've
noticed some confusion about this lately, so I wanted to clear it up.
patchback bot pushed a commit that referenced this issue May 2, 2024
This documents the current strategy for using the Patchback bot (at
least as I perceive it) and the conclusion from the PR merge strategies
discussion in
#554. I've
noticed some confusion about this lately, so I wanted to clear it up.

(cherry picked from commit cbb89d7)
patchback bot pushed a commit that referenced this issue May 2, 2024
This documents the current strategy for using the Patchback bot (at
least as I perceive it) and the conclusion from the PR merge strategies
discussion in
#554. I've
noticed some confusion about this lately, so I wanted to clear it up.

(cherry picked from commit cbb89d7)
patchback bot pushed a commit that referenced this issue May 2, 2024
This documents the current strategy for using the Patchback bot (at
least as I perceive it) and the conclusion from the PR merge strategies
discussion in
#554. I've
noticed some confusion about this lately, so I wanted to clear it up.

(cherry picked from commit cbb89d7)
samccann pushed a commit that referenced this issue May 2, 2024
This documents the current strategy for using the Patchback bot (at
least as I perceive it) and the conclusion from the PR merge strategies
discussion in
#554. I've
noticed some confusion about this lately, so I wanted to clear it up.

(cherry picked from commit cbb89d7)

Co-authored-by: Maxwell G <maxwell@gtmx.me>
samccann pushed a commit that referenced this issue May 2, 2024
This documents the current strategy for using the Patchback bot (at
least as I perceive it) and the conclusion from the PR merge strategies
discussion in
#554. I've
noticed some confusion about this lately, so I wanted to clear it up.

(cherry picked from commit cbb89d7)

Co-authored-by: Maxwell G <maxwell@gtmx.me>
samccann pushed a commit that referenced this issue May 6, 2024
This documents the current strategy for using the Patchback bot (at
least as I perceive it) and the conclusion from the PR merge strategies
discussion in
#554. I've
noticed some confusion about this lately, so I wanted to clear it up.

(cherry picked from commit cbb89d7)

Co-authored-by: Maxwell G <maxwell@gtmx.me>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs_triage Needs a first human triage before being processed. tooling This PR affects tooling (CI, pr_labeler, noxfile, linters, etc.) but not the docs builds themselves.
Projects
None yet
Development

No branches or pull requests

5 participants