WW3 Gitflow

Jessica Meixner edited this page Jul 27, 2023 · 3 revisions

Code Management of WW3

GitFlow & A Combination of An Authoritative repository with Trusted Institutional Forks

The WW3 development package will be moving to GitHub, employing the git distributed repository paradigm. Often the application of standards is left to the individual code managers of different forks, which leads to a plethora of visions and philosophies. Repository code management can get messy very quickly if uniform standards are not applied across all repositories. Therefore, with the transition to an open development platform with multiple trusted forks, it is critical that we apply the same standard across all the repositories. This serves two purposes: setting strict standards for code managers, with clear instructions on how code management should be done; and providing developers with procedures for creating branches in the different authoritative repositories that they work in.

What is Gitflow?

The idea of Gitflow originated in a blog post written by Vincent Driessen in 2010, titled "A successful Git branching model". Gitflow is a philosophy for managing branches in a development repository. It is clean and straightforward and has been widely adopted as a standard for code management. The figure from Vincent’s blog is reproduced below.

This is not a new idea: we have already followed parts of it, and many code development communities are well aware of this approach. At its heart, Gitflow is a series of branches with strict functions, as follows:

  • develop: This is the main development trunk. All code development uses this branch for syncing and coordinating. This is different from the master in very definite ways. For example, updates to develop can only be done by the code manager of a trusted institutional fork after the code has been approved by a team of assigned reviewers.

  • main: This is the branch that contains the mature part of the development (it is referred to as master in the rest of this page). It is the one that will either be considered for a public release or be put into operations at NCEP, etc. It is updated much less frequently than the develop branch and follows a strict testing protocol before it is updated. Updates to master can only be done by the code manager after all the tests have been completed and reviewed by assigned reviewers. In addition, updates to master will only be done by the authoritative fork (NCEP).

  • feature: These are the branches where developers work. They are always created off of the develop branch to develop new ideas.

  • release: This branch is created once development reaches a stage where you are ready to “freeze” the code for release to the community. A release branch is where detailed/exhaustive testing happens and bugs are fixed. This branch is regularly merged back to develop to ensure that bug fixes get back to the main development trunk.

  • hotfix: This is a bugfix branch that is only created from the master to address immediate fixes that were overlooked in the testing process (it will happen, rest assured).
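The branch roles above can be summarised as a small lookup table. The following is a minimal sketch, not part of any WW3 tooling: the names `BranchRule`, `RULES`, `may_create` and `merge_targets` are hypothetical, "master" stands for the branch the list labels "main", and the merge targets anticipate the workflow described later on this page.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class BranchRule:
    created_from: Tuple[str, ...]  # branches this kind may be created from
    merges_into: Tuple[str, ...]   # branches this kind is merged back into


# One entry per temporary branch kind; develop and master are the permanent branches.
RULES = {
    "feature": BranchRule(created_from=("develop",), merges_into=("develop",)),
    "release": BranchRule(created_from=("develop",), merges_into=("develop", "master")),
    "hotfix": BranchRule(created_from=("master",), merges_into=("master", "develop")),
}


def may_create(kind: str, parent: str) -> bool:
    """Return True if a branch of this kind may be created from `parent`."""
    rule = RULES.get(kind)
    return rule is not None and parent in rule.created_from


def merge_targets(kind: str) -> Tuple[str, ...]:
    """Return the branches a finished branch of this kind is merged into."""
    return RULES[kind].merges_into


print(may_create("hotfix", "master"))   # hotfix branches start from master
print(may_create("feature", "master"))  # feature branches do not
print(merge_targets("release"))
```

The table deliberately leaves out the exceptional cases discussed later, such as a feature branch created off of a release branch during testing.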

Figure 1: A Git Flow branching Strategy taken from Vincent Driessen’s blog

So how does it work?

In Figure 1 we started from a release version 0.1 in the master, from which a develop branch was created. You can alternatively start from a develop branch and then create a master at the end of the development process. Either way is fine as long as there is a starting point.

Once development starts, you create your feature branches off of develop and follow the standard development process. Once a feature branch has been incorporated into develop, it should be deleted. This is critical. A feature branch should be just that: a branch addressing one specific development (or fix). Feature branches that are not deleted tend to become permanent, and it becomes difficult to isolate issues. For example, if developer A works on multiple capabilities in a single feature branch, merges them all back into develop at once, and code functionality breaks, it becomes very difficult to identify which of the developments was responsible. Temporary feature branches also ensure that new development starts from the top of the develop branch, because developers are forced to create new branches. Monitoring dormant feature branches is a large and burdensome task for the code managers, so it is imperative that developers police themselves to some degree. During development, developers should regularly sync their feature branch with the develop branch so that they stay abreast of other developments and avoid nasty surprises when merging back into develop.
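As an illustration of the feature branch life cycle, the sketch below builds the git commands a developer might run. It only assembles command strings and does not execute them. The function name, the remote name `origin`, the example name `swell`, and the assumption that the FB prefix is joined directly to the name are all hypothetical; the merge into develop itself is left to the code manager, as described above.

```python
from typing import Dict, List


def feature_branch_commands(name: str, remote: str = "origin") -> Dict[str, List[str]]:
    """Build (but do not run) git commands for one feature branch's life cycle."""
    branch = f"FB{name}"  # assumes the FB prefix is joined directly to the name
    return {
        # Start from the top of develop.
        "start": [
            "git checkout develop",
            f"git pull {remote} develop",
            f"git checkout -b {branch}",
        ],
        # Repeat regularly while the feature is in progress.
        "sync": [
            f"git checkout {branch}",
            f"git fetch {remote}",
            f"git merge {remote}/develop",
        ],
        # Run only after the code manager has merged the branch into develop.
        "cleanup": [
            "git checkout develop",
            f"git pull {remote} develop",
            f"git branch -d {branch}",
            f"git push {remote} --delete {branch}",
        ],
    }


for step, commands in feature_branch_commands("swell").items():
    print(step, commands)
```

Keeping the cleanup step explicit mirrors the point made above: a feature branch is temporary, and deleting it locally and on the remote is part of finishing the work.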

At some point in the development process, the development team feels comfortable enough to make a release to the public. At this point, a release branch is created. The purpose of the release branch is to isolate the release from regular development. In that sense, the release branch should be created only once from the develop branch and not regularly updated from develop, except under strict circumstances. The release branch is where extensive testing occurs. In the operational environment, this is where the retrospectives are run. The release branch is used primarily to prepare the model for final release, so ideally it contains only bug fixes and enhanced documentation, all of which should regularly go back to develop. Occasionally, testing will reveal that a feature is needed in the release branch. In those instances, it is acceptable to create a feature branch off of the release branch, ensure that it works, and put it back into the develop branch as well. In rare instances you may want to update the release branch with the develop branch during the testing process. This should be done very carefully, as answer-changing results may require restarting the testing process. Due to the risks involved, we do not recommend regularly updating the release branch with the develop branch. [Note that the other way around, that is, regularly updating the develop branch with the release branch, should continue to happen.]

Once all testing is complete, the release branch should be merged into the master and a new tag created. Note that the master is not updated directly from develop but follows the release branch approach to ensure thorough testing. After the merge with master and the creation of the new tag, the release branch should be deleted for the same reasons given in the feature branch discussion.
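The isolation that the release branch provides can be seen in a toy model where each branch is simply the set of changes it contains. This is an illustrative sketch only: the change labels and the tag name are made up, and real git history is of course richer than a set.

```python
# Each branch is modelled as the set of change labels it contains (hypothetical labels).
master = {"v0.1"}
develop = {"v0.1", "featA", "featB"}

release = set(develop)   # release branch is created once, from develop
develop.add("featC")     # development continues on develop after the freeze
release.add("fix1")      # a bug is found and fixed during release testing
develop |= release       # release fixes regularly go back to develop

master |= release        # after testing, the release branch is merged into master
tags = {"v1.0": frozenset(master)}  # a new tag records the released state

print(sorted(master))            # released content, including the fix
print(sorted(develop - master))  # work on develop that was not released
```

In this model the fix made during testing ends up in both master and develop, while the feature added to develop after the freeze stays out of the release, which is the behaviour the workflow above is designed to guarantee.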

Once the code is released or is in operations or both, there will be instances when an immediate bugfix is needed (this is guaranteed to occur). The hotfix branch is for that purpose: it originates only from the master. In fact, it is the only branch that is ever created from the master. Its purpose is to address a critical issue, and it should be treated as an emergency fix. Once the fix is ready, it is propagated to both the master and the develop branch. [Note: if the hotfix was made in the middle of release branch testing, you may have to put the fix in the release branch as well.] Once again, the hotfix branch is deleted as soon as it has been fully merged into the other branches.
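The hotfix rules can be sketched as a function that lists the git commands a code manager might run, including the extra merge when a release branch is under test. The commands are only assembled as strings, not executed. The function name, the remote name `origin`, and the example branch names are hypothetical, and the sketch assumes the HF prefix is joined directly to the name.

```python
from typing import List, Optional


def hotfix_commands(name: str, active_release: Optional[str] = None,
                    remote: str = "origin") -> List[str]:
    """Build (but do not run) git commands for creating and finishing a hotfix."""
    branch = f"HF{name}"  # assumes the HF prefix is joined directly to the name

    # A hotfix always goes to master and develop, plus any release branch in testing.
    targets = ["master", "develop"]
    if active_release is not None:
        targets.append(active_release)

    commands = [
        "git checkout master",
        f"git pull {remote} master",
        f"git checkout -b {branch}",
        # ... the fix itself is committed on the hotfix branch here ...
    ]
    for target in targets:
        commands += [f"git checkout {target}", f"git merge --no-ff {branch}"]
    commands.append(f"git branch -d {branch}")  # delete once fully merged
    return commands


for command in hotfix_commands("restart", active_release="RBv7"):
    print(command)
```

Computing the merge targets first makes the one conditional rule explicit: the release branch is added only when release testing is in progress.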

The master and develop branches are the two main branches of an authoritative repository, and updates to these branches can only be made by the code manager. The develop branch is what we have traditionally referred to as the trunk in svn, and it serves the same purpose: it is the branch from which all development happens and is tracked. The master is the very stable version of develop; it will not have all the latest developments, but the features in master will have been thoroughly tested. Separating the “master trunk” into two branches, develop and master, also makes for much easier management, as release tags will always come from the master and new developments will always begin from develop.

Naming convention

Following a strict naming convention across all authoritative repositories is recommended for several reasons, for example:

  • It provides a habit and discipline to follow the Gitflow process enumerated above,
  • It allows developers to clearly identify which branch is which without having to regularly engage with the code manager,
  • In repositories with multiple code managers, confusion is avoided and there are then no single points of failure.

With that in mind the following naming convention is proposed:

  • develop for the develop branch,
  • master for the master branch,
  • FB[name] for feature branches,
  • RB[name] for release branches,
  • HF[name] for hotfix branches.
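A naming convention like this is easy to check mechanically. The sketch below classifies a branch name according to the proposed prefixes; the function name `classify_branch` is hypothetical, and it assumes that `[name]` is a non-empty string joined directly to the prefix.

```python
import re
from typing import Optional

_PREFIXES = {"FB": "feature", "RB": "release", "HF": "hotfix"}
_PATTERN = re.compile(r"^(FB|RB|HF)(.+)$")  # prefix followed by a non-empty name


def classify_branch(name: str) -> Optional[str]:
    """Return the branch kind for `name`, or None if it breaks the convention."""
    if name in ("develop", "master"):
        return name
    match = _PATTERN.match(name)
    return _PREFIXES[match.group(1)] if match else None


for candidate in ["develop", "FBswell", "RBv7", "HFrestart", "my-branch", "FB"]:
    print(candidate, "->", classify_branch(candidate))
```

A check of this kind could let a code manager spot non-conforming branch names without inspecting each branch by hand.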

Sounds like too much work, is all this really necessary?

Past experience in working with different repositories has shown that, while many repositories follow this process in spirit, some are more disciplined than others. And even where repositories are disciplined, indiscipline sets in with time. This is understandable, as scientists work under multiple constraints and deadlines.

Ensuring strict standards across the repositories will help counter this process. Even if standards slip under deadline pressure, having them in place ensures that such lapses are temporary. The naming convention may seem like an unnecessary regulation, but in an environment where development will span multiple agencies and the broader research community, having such an approach will clarify development and avoid pitfalls. In addition to adopting standards, their enforcement will be ensured through a distributed structure with buy-in from the "trusted institutional repository" fork managers and tight, frequent communication with the NCEP authoritative fork (see below).

There are multiple tools that have been developed (e.g. hubflow and gitflow) that make implementing the Git Flow branching system straightforward for code managers and developers. Our hope is that this approach will be adopted across all trusted forks of WAVEWATCH III.

What are trusted institutional repositories and what is their role?

Instead of following the classical approach in which all development happens in one repository, WAVEWATCH III development, with its transition to GitHub, will take advantage of GitHub's forking capability. There will be a central authoritative repository (still maintained by NCEP) together with “trusted institutional forks”, whose designated code managers will continue to code-manage their forks so that they reflect the authoritative repository.

Code managers of trusted institutional forks are expected to do the following:

  • Keep their fork up to date with the authoritative repository.
  • Manage their branches and merging in accordance with the Gitflow concept.
  • Remain in constant contact with NCEP about development.
      • To begin, we will have bi-weekly meetings organized between NCEP and the code manager or a representative from each trusted fork.
  • Enforce regression testing, in particular of feature and bugfix branches, before merging them into their develop branch.
  • Ensure that contributions from other forks to a trusted fork are made via pull requests into feature or bugfix branches.
      • They should not take pull requests directly into their develop branch.
  • Take responsibility for frequently pushing trusted-fork develop branches back to the authoritative fork.
      • Only trusted forks will be allowed to make pull requests to the authoritative repo’s develop branch.
      • NCEP will only take updates into the NCEP develop branch from a trusted institutional fork’s develop branch. Development from a non-trusted institutional fork is only accepted via a feature or bugfix branch.
  • Take responsibility for branch management (i.e., deleting branches after merges).
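The pull request rules in this list can be read as a small policy check. The sketch below is one possible interpretation and not an official rule set: the function name `pr_allowed` and the category labels are hypothetical, and it assumes that any fork may target a feature or bugfix branch while master never receives pull requests from forks.

```python
def pr_allowed(source_repo: str, source_branch: str,
               target_repo: str, target_branch: str) -> bool:
    """Return True if a pull request between the given categories is acceptable.

    Repositories are "authoritative", "trusted" or "other".
    Branches are "develop", "master", "feature" or "bugfix".
    """
    # Contributions from any fork go into feature or bugfix branches.
    if target_branch in ("feature", "bugfix"):
        return True
    # The authoritative develop branch only accepts a trusted fork's develop branch;
    # trusted forks do not take pull requests directly into their own develop branch.
    if target_branch == "develop":
        return (target_repo == "authoritative"
                and source_repo == "trusted"
                and source_branch == "develop")
    # master is updated only by the authoritative code manager, never by a fork's PR.
    return False


print(pr_allowed("trusted", "develop", "authoritative", "develop"))
print(pr_allowed("other", "feature", "authoritative", "develop"))
print(pr_allowed("other", "feature", "trusted", "feature"))
```

Encoding the rules this way makes the two-tier structure visible: only develop-to-develop requests from trusted forks reach the authoritative develop branch, and everything else is routed through feature or bugfix branches.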

The only difference between the authoritative and the institutional repositories is that all master updates and public release tags are created from the authoritative repository. Other than that, these repositories act as development repositories with independent but consistent active code management and testing. Using this approach as opposed to a single repository has two main advantages: first, it gives the institutions some control over their development activities; and second, it distributes the coordination and management of branches across multiple code managers. Both are important features of an open development environment.

How to become a trusted institutional repository?

To become a trusted institutional repository, contact the NCEP code manager, who will then discuss the request with the representatives/code managers of the other institutional forks at a future meeting. By asking to create a new institutional fork, you agree to follow and enforce the code-management rules and responsibilities of a trusted fork, as listed above. While the authoritative repository will be open and publicly viewable, the trusted institutional repositories may or may not be publicly viewable, depending on the institution’s choice. We ask that the Gitflow guidelines outlined here be followed for all forks to make coordination and syncing between the different forks straightforward.

What is the authoritative repository and what is its role?

The authoritative repository (located at NCEP) will be the only fork that updates the master branch and creates public release tags. In addition, it will coordinate public releases and communication between trusted forks.

Open WAVEWATCH III development calls

NCEP will continue to host monthly (or bi-monthly) development calls. These will be open to anyone who wishes to present recent work or developments. They will be announced on the NCEP GitHub repo.

Alpha/Beta Testing

Alpha/beta testing in its previous form will not be pursued, because in the open-development paradigm anyone will be able to test a release branch before a public release tag is made.