Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improving our git workflow #266

Closed
abrokenjester opened this issue Jul 27, 2016 · 17 comments
Closed

Improving our git workflow #266

abrokenjester opened this issue Jul 27, 2016 · 17 comments

Comments

@abrokenjester
Copy link
Contributor

Our current git workflow is outlined in the contributing guidelines.

Summarized, it's this:

  1. each enhancement or bug fix gets its own GitHub issue.
  2. each enhancement or bug fix gets its own branch, named issues/#<issue>-<keywords>
  3. issues for bug fixes or small (backward compatible) improvements are branched off from the latest release branch (currently releases/2.0.x).
  4. issues for larger improvements are branched off from master branch.
  5. every issue branch, once complete, is merged back into the same branch from which it originally came.

I have noticed that several people, in particular new developers, struggle with picking the correct branch to start fixing things and/or with picking the right target branch to merge their fixes back into.

Partly this could be caused by unclear explanation in our docs, or by inattentive reading on the part of the contributor, but I feel that perhaps our workflow itself is not as simple and straightforward as it could be.

In particular, I think that people struggle to remember step 5 (perhaps they simply forgot where their branch originated after a while), and this is not something that is trivially simply to see in most git tools.

So, open question: does anybody have any suggestions on how to improve our workflow and/or our workflow documentation? Any ideas to make it easier, both for new contributors and for existing contributors and full committers, are most welcome.

@ansell
Copy link
Contributor

ansell commented Jul 27, 2016

It is much more difficult than the previous workflow where we generally had a single minor version that was active and a single minor version that was only given major bug fixes. The introduction of the master branch and the lack of any releases yet to give a baseline for what makes a change significant has made it a bit confusing about where things should be targeted.

In the case of issue#264 today, I opened its pull request against master because it isn't a critical fix (data loss/stability/etc.) and I didn't want to interrupt the upcoming review of the first 2.0/1.0 releases.

Having a clear roadmap for which versions are upcoming along with expected maintenance dates for major/minor version combinations, with the name of the current git branch to start on would be useful. Then when you want to lock off a branch from features, such as when 2.0.0 gets released, you would change the roadmap to indicate that even small features get put on master.

I have also found it a little confusing that the two release branches have had -M1/-M2 releases and then are bumped back to the same snapshot build as previously. Bumping back to 2.0-SNAPSHOT rather than 2.0-M3-SNAPSHOT, for example, makes it difficult to easily visualise which milestone we are currently up to on each branch

@abrokenjester
Copy link
Contributor Author

abrokenjester commented Jul 27, 2016

It is much more difficult than the previous workflow where we generally had a single minor version that was active and a single minor version that was only given major bug fixes. The introduction of the master branch and the lack of any releases yet to give a baseline for what makes a change significant has made it a bit confusing about where things should be targeted.

I don't think the workflow is fundamentally different from what we did before. master is the branch for "the next minor/major release". 'releases/2.0.x' is the branch for all fixes to 2.0.

The thing that perhaps is confusing is that I created the 2.0.x branch before we released 2.0 - but that is because we needed to allow development of new features that would not be included in the 2.0 release (so that we could be ready in a pinch once we get through review). Hence the split from master. I'm not sure I can come up with a simpler model that still allows us to do this.

As an aside I am deliberately leaving the 1.0 branch out of all this, as it's a backport and really should not be used for any sort of fixes or enhancements.

In the case of issue#264 today, I opened its pull request against master because it isn't a critical fix (data loss/stability/etc.) and I didn't want to interrupt the upcoming review of the first 2.0/1.0 releases.

That seems very reasonable to me. But for what it's worth it's entirely acceptable to do bug fixes on the release branch, even during review.

Having a clear roadmap for which versions are upcoming along with expected maintenance dates for major/minor version combinations, with the name of the current git branch to start on would be useful. Then when you want to lock off a branch from features, such as when 2.0.0 gets released, you would change the roadmap to indicate that even small features get put on master.

A clearer roadmap is definitely a good idea. What I struggle with here though is everybody's limited availability. I can come up with a basic timeline and fill in some of the issues/features, but I will need input (and more importantly commitment) from other devs to plan the rest. This has traditionally not been a strong point of our little band :)

I have also found it a little confusing that the two release branches have had -M1/-M2 releases and then are bumped back to the same snapshot build as previously. Bumping back to 2.0-SNAPSHOT rather than 2.0-M3-SNAPSHOT, for example, makes it difficult to easily visualise which milestone we are currently up to on each branch

This was mainly done because milestone builds are not considered "proper" releases, and I sort of had to shoehorn the Milestone naming into our workflow (not to mention that pending legal approval it was unclear when we could do a proper release, so I wanted to keep our options open). But I can understand how it's confusing. I won't do it that way for the next major release (which is when I would first expect us to do milestones again).

@abrokenjester
Copy link
Contributor Author

Here's an interesting article, proposing an alternative approach to gitflow (which the author claims is needlessly complex):

http://endoflineblog.com/gitflow-considered-harmful

@ansell
Copy link
Contributor

ansell commented Jul 27, 2016

My impression is that the full Gitflow is most suited to web applications that have a quick release and deploy cycle which doesn't exactly fit this case. The tool support would be the main reason for going with it. but may make it difficult for others just using the basic git binary to contribute.

The anti-gitflow argument is almost as difficult to work through as the pro-gitflow argument in my view. They advocate rebasing and squashing simply to get a linear history, but there are just as many conflicts with git merges using rebasing/squashing as there are with branched/merged features. I personally don't mind seeing the full non-linear history as it actually occurred, but that mostly relies on the branch names being useful and not simply issue-key numbers.

Once the first legal review is done, could we highly recommend that both features and bug fixes be on "master", and cherry-pick the appropriate bug fixes back onto the releases/2.0.x branch ourselves to relieve users of that responsibility.

Is it a legal requirement from Eclipse that we put all of the contribution information into a single file? If not, can we add some direct instructions to README.md to get people started.

If it all has to be in the relatively large CONTRIBUTING.md file, can we rearrange or add to the file to give the information about a single recommended branch to start on and the git commands to get them started to it? For example, adding the commands "git checkout master" and "git checkout -B issues/#101-turtle-trig-line-numbers" might stand out to them where the current prose version of that doesn't.

@abrokenjester
Copy link
Contributor Author

My impression is that the full Gitflow is most suited to web applications that have a quick release and deploy cycle which doesn't exactly fit this case. The tool support would be the main reason for going with it. but may make it difficult for others just using the basic git binary to contribute.

That was my thinking as well. Much as I like adopting a well-documented standard, I feel gitflow is overkill for our situation.

The anti-gitflow argument is almost as difficult to work through as the pro-gitflow argument in my view. They advocate rebasing and squashing simply to get a linear history, but there are just as many conflicts with git merges using rebasing/squashing as there are with branched/merged features. I personally don't mind seeing the full non-linear history as it actually occurred, but that mostly relies on the branch names being useful and not simply issue-key numbers.

I'm not convinced about their linear history arguments to be honest, but a lot of the rest of it I do like. Especially the notion that only master is permanent, and hotfixes are done by branching off from the relevant version tag (rather than keeping several parallel release branches up in the air all the time).

Let me stew on this a bit and then propose an alternative workflow (which also takes your idea about doing both features and hotfixes on master into account).

Is it a legal requirement from Eclipse that we put all of the contribution information into a single file? If not, can we add some direct instructions to README.md to get people started.

Oh no, not at all, we have complete freedom in how we organize this.

@ansell
Copy link
Contributor

ansell commented Sep 1, 2016

Another solution when we come to the stage where we are only maintaining a single release stream may be to keep master as the next major.minor release, but allow the opening up of short-lived patch release branches for previous versions as necessary, and replace them with just a tag when the version is released (the tag doesn't show up in "git branch -a" so as not to confuse people).

E.g., we could have master at 2.4.1-SNAPSHOT, but if we wanted to release a bug fix targeted to 2.3.4 we would checkout the 2.3.3 tag as a new branch named "releases/2.3.4", bump the version on it to 2.3.4-SNAPSHOT, and work on that for a short time until its release, upon which time the releases/2.3.4 branch is replaced by a tag "2.3.4" and deleted, after merging the relevant changes back into 2.4.1-SNAPSHOT on master. If 2.3.5 is necessary, then we checkout the 2.3.4 tag and do a similar short-lived process on it.

We are currently maintaining two major release streams so it doesn't quite fit yet as the current model simplifies the three-stream situation a little compared to that model.

If we are able to regularly release new minor versions the short-lived branches may reduce confusion, as almost everything would be done against master and there wouldn't be other long-lived branches floating around the main eclipse/rdf4j git repository. It should be possible to start regularly bumping minor versions again now that we are past the large SPARQL-1.1/RDF-1.1/Java-8 changes that hampered that for a little while.

@abrokenjester
Copy link
Contributor Author

That could work quite well actually. The two major releases we are doing could be fitted in by thinking of the 1.0 branch as a "secondary master" branch: we occassionally merge release branches into it to do patch release but nothing else.

I wasn't really planning on doing a 1.1 release by the way. For me, 1.0 is the end of the line of Java 7 support (barring patch releases in sync with 2.0 patches).

@ansell
Copy link
Contributor

ansell commented Sep 1, 2016

To be fair, the ideas (branch-off-tag, then delete-after-tag-and-merge) above aren't my own, I read them somewhere recently but forgot to comment here at the time so I have lost the source.

@abrokenjester
Copy link
Contributor Author

Ok, so if I can summarize this we would have the following workflow:

  1. we have a single permanent branch: master. All feature development for the next minor/major release takes place here.
  2. a few days (max a week) before the next release is due, we branch off a release branch. This branch is only intended for "finishing touches" for release prep and last-minute bug fixes.
    after release is done (and the release is tagged in git), the release branch is merged back into master and closed.
  3. when we want to do a patch version of a previous release, we branch off a temporary patch release branch from the relevant version tag to do the fix(es), then release, then tag again and merge back into master, closing the branch.
  4. it's the responsibility of the main committers to decide what can go into a patch version, and to cherry-pick those changes if necessary, from the master branch (instead of having to ask casual contributors to close their PR and open a new one against 'the correct branch'). In effect, for casual contributors, the default branch to put PRs up against is always master.

I like this approach, but given that we will be using cherry-picking a lot more, it does mean that we should think about using merge squashing more. I'd hate to have to cherry-pick a fix with 30 individual commits. GitHub has the option for committers to squash a PR before merging, so we don't necessarily have to force contributors to do the squashing.

In all of this the Java 7 backport is an anomaly. It just becomes a separate branch (permanent for as long was decide to actually bring out backports) to which we regularly merge new fixes and keep things J7-compatible.

I think it's certainly easier to explain this workflow. What do you think is the ideal time to implement it? Directly after the 2.0.1 release?

@pulquero
Copy link

pulquero commented Sep 5, 2016

Sounds easier to me.

abrokenjester added a commit that referenced this issue Sep 7, 2016
abrokenjester added a commit that referenced this issue Sep 7, 2016
* issue #266: added guideline summary to README

* issue #266: simplified branching model, added info on release cycle

*  #266 reorganized: put the essentials at the top

* #266 fixes from review feedback
@abrokenjester
Copy link
Contributor Author

abrokenjester commented Sep 7, 2016

One or two minor tweaks I'd like to suggest to the workflow:

  1. Due to the use of 'squash and merge' the title of the Pull Request becomes important (because that becomes the first line of the squashed commit's message). Can we make it a requirement to have the PR title start with "Issue #: " followed by a short meaningful description of the fix?
  2. If we do the above, having the issue number in every single commit message within the PR becomes less necessary/useful, so we can drop that requirement.

Thoughts? Also keen to hear what you think of using 'squash and merge' instead of standard merge commits.

@pulquero
Copy link

pulquero commented Sep 7, 2016

Modifying/correcting the Pull Request title is also much easier than a commit message.

On Wednesday, 7 September 2016, 9:36, Jeen Broekstra <notifications@github.com> wrote:

One or two minor tweaks I'd like to suggest to the workflow:

  • Due to the use of 'squash and merge' the title of the Pull Request becomes important (because that becomes the first line of the squashed commit's message). Can we make it a requirement to have the PR title start with "Issue #: " followed by a short meaningful description of the fix?
  • If we do the above, having the issue number in every single commit message within the PR becomes less necessary/useful, so we can drop that requirement.
    Thoughts? Also keen to hear what you think of using 'squash and merge' instead of standard merge commits. —
    You are receiving this because you commented.
    Reply to this email directly, view it on GitHub, or mute the thread.

@abrokenjester
Copy link
Contributor Author

@ansell pointed out some major downsides to using squash-and-merge (see comments on #325). The most important of which is probably that we lose sign-off information - which is a big requirement from Eclipse's point of view.

This goes hand-in-hand with individual commits having issue numbers: if we don't squash, we should still require that. Of course, we can advise contributors to clean up/squash their commit history themselves before presenting a PR.

@ansell
Copy link
Contributor

ansell commented Sep 7, 2016

So the branching model seems to be acceptable to everyone. The reference to squashing I read as being an optional thing that the user would do locally which is okay by me.

Jeen has now proposed using the inbuilt GitHub squash and merge feature that is separate to that and has a few minor side-effects that we would need to work through.

Specifically:

  1. Are the original commits which contained the Signed-Off-By tags preserved?
  2. Is it possible to easily have Signed-Off-By added to the pull request message so that it is available on the resulting squash and merge commit.

@ansell
Copy link
Contributor

ansell commented Sep 7, 2016

And, 3. Should it be allowable for users to do squash and merge locally to preserve GPG signing (-S) and then not have the GitHub squash and merge used.

@abrokenjester
Copy link
Contributor Author

Ad 1. They are, but only for as long as the third-party repo (from which the PR came) is accessible to us. It's fine if a user deletes a branch, but if they remove their forked repo completely, we lose that bit of the history.
Ad 2. AFAICT this is not easy to change. I would need to manually copy-paste the sign-off message, which is (apart from busywork) error-prone.
Ad 3. I am fine with contributors doing all the squashing they want on their local fork - it won't affect the history of the central repo. For committers with issues branches in the central repo I am a bit hesitant about it.

@abrokenjester
Copy link
Contributor Author

The sign-off thing is a blocker for me. Unless someone has a clever solution to that I suggest we just stick with 'normal' merge commits, after all.

abrokenjester added a commit that referenced this issue Sep 8, 2016
Got rid of the redundant link to the guidelines (GitHub shows it right at the top itself) and the tick boxes for formatting and tests, etc (nobody really uses them anyway it seems).
abrokenjester added a commit that referenced this issue Sep 8, 2016
abrokenjester added a commit that referenced this issue Sep 8, 2016
Issue #266: simplified Pull Request template
abrokenjester pushed a commit that referenced this issue Aug 22, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants