This repository has been archived by the owner on Jan 11, 2023. It is now read-only.

[Proposal] Release process #667

Merged
merged 3 commits into from
May 31, 2017

@seanknox seanknox commented May 23, 2017

Background


A constant pain point for acs-engine users and internal teams at Microsoft using acs-engine has been the lack of clearly defined versions that can be tracked for stability and feature completeness. Now that the project has agreed to use SemVer for version numbers, I'm proposing the following approach to defining releases of acs-engine.

For the first release, I propose the following:

  1. Define/flesh out high-level roadmap (included in this PR) to be updated through the course of the project toward a "stable" v1.0.0 release.
  2. Define the first official acs-engine release, v0.1.0, using GitHub Milestones
  3. Define a special CI release job that 1) runs all unit tests, functional/regression tests, and any other tests pertinent to release verification, 2) uses the GitHub API to tag the repo and create a release, and 3) builds and deploys any artifacts such as Docker images (not clear this is needed)
  4. Once we believe all issues/features are accepted for the v0.1.0 release, kick off automated release testing
  5. In addition to automated testing, conduct manual testing according to a regression test matrix (TBD)
  6. If testing produces an issue needing a fix, PR the fix, get it reviewed and merged into master
  7. Use automated CI job above to trigger release

Going forward, the release process would be:

  1. Bi-monthly planning meetings to review project Roadmap and define release milestone
  2. Once all issues/features are accepted for the release milestone, kick off automated release testing as well as manual testing
  3. If testing produces an issue needing a fix, PR the fix, get it reviewed and merged into master
  4. Use automated CI job to trigger release

Fixes #55

### Step 3: Tag and Create a Release

TBD

Contributor Author

@jchauncey do you have a lightweight recommendation for using CI to do this?

Contributor

Typically you have a foo-release job that watches for tags pushed to a repo. You can then build whatever artifact you want with that given tag.
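As a rough illustration of the tag-watching idea, a release job could first decide whether a pushed ref is a release tag at all. This is a minimal sketch, not the project's actual CI: the function name `releaseVersion` and the plain `vMAJOR.MINOR.PATCH` tag pattern are assumptions made for the example.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// releaseTag matches tags such as v0.1.0 (plain MAJOR.MINOR.PATCH only).
// The exact tag format is an assumption for this sketch.
var releaseTag = regexp.MustCompile(`^v\d+\.\d+\.\d+$`)

// releaseVersion returns the tag name and true when ref is a pushed release tag.
func releaseVersion(ref string) (string, bool) {
	tag := strings.TrimPrefix(ref, "refs/tags/")
	if tag == ref || !releaseTag.MatchString(tag) {
		// Either not a tag ref at all, or a tag that is not a release version.
		return "", false
	}
	return tag, true
}

func main() {
	refs := []string{"refs/tags/v0.1.0", "refs/heads/master", "refs/tags/nightly"}
	for _, ref := range refs {
		if tag, ok := releaseVersion(ref); ok {
			fmt.Println("build and publish artifacts for", tag)
		} else {
			fmt.Println("ignore", ref)
		}
	}
}
```

Only the first ref above would trigger a build; branch pushes and non-version tags are ignored.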

### Step 4: Close GitHub Milestones

TBD

Contributor Author

Will flesh this out a bit later, before merging this PR.

Contributor

We have used our "seed-repo" job in the past to close milestones and keep the labels in sync with each other.

Contributor Author

Considering it's a single repo (just acs-engine) do you think that'd be helpful here?

@seanknox
Contributor Author

First crack at a release process. Please take a gander and comment. Thanks!

cc @anhowe @rgardler @dmitsh @lachie83 @jchauncey @sgoings

@seanknox
Contributor Author

@ritazh @wbuchwalter would love your thoughts as well.

@seanknox seanknox changed the title from "Release process" to "[Proposal] Release process" May 24, 2017
@wbuchwalter
Contributor

Great stuff, two questions:

  • Development would happen directly on the master branch like today, but releases would now happen more or less every two months.
    What would be the process if an urgent fix needs to be made to the previous release?
    Would it be easier to have development happen on a dev branch and merge into master for each release?
  • Also, what would a release look like exactly? Would the acs-engine binaries be stored in a GitHub release?

@seanknox
Contributor Author

> Development would happen directly on the master branch like today, but releases would now happen more or less every two months.

> Also, what would a release look like exactly? Would the acs-engine binaries be stored in a GitHub release?

Master development will largely continue as it already does. Releases will occur monthly, driven by milestones defined in GitHub. Release milestones follow the roadmap toward v1.0.0. The roadmap itself is still TBD, though I've added some examples (GPU support, support for dedicated etcd clusters, etc.).

> What would be the process if an urgent fix needs to be made to the previous release?

Patch fixes can happen at any time, based on need. We're using SemVer for version numbers, which works like:

Given a version number MAJOR.MINOR.PATCH, increment the:

MAJOR version when you make incompatible API changes,
MINOR version when you add functionality in a backwards-compatible manner, and
PATCH version when you make backwards-compatible bug fixes.
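The increment rules quoted above can be sketched in a few lines of Go. The `Version` type and `Bump` method are hypothetical names for illustration and are not part of acs-engine; the sketch also follows the SemVer rule that lower-order fields reset to zero when a higher one is incremented.

```go
package main

import "fmt"

// Version is a hypothetical MAJOR.MINOR.PATCH triple.
type Version struct {
	Major, Minor, Patch int
}

// String renders the version with the "v" prefix used for tags like v0.1.0.
func (v Version) String() string {
	return fmt.Sprintf("v%d.%d.%d", v.Major, v.Minor, v.Patch)
}

// Bump returns the next version for the given kind of change.
// Unknown kinds leave the version unchanged.
func (v Version) Bump(change string) Version {
	switch change {
	case "major": // incompatible API changes
		return Version{Major: v.Major + 1}
	case "minor": // backwards-compatible functionality
		return Version{Major: v.Major, Minor: v.Minor + 1}
	case "patch": // backwards-compatible bug fixes
		return Version{Major: v.Major, Minor: v.Minor, Patch: v.Patch + 1}
	}
	return v
}

func main() {
	v := Version{Major: 0, Minor: 1, Patch: 0}
	fmt.Println(v.Bump("patch")) // urgent fix to the previous release
	fmt.Println(v.Bump("minor")) // next planned milestone
}
```

Under this scheme an urgent fix to v0.1.0 ships as a PATCH release, independent of the planned milestone cadence.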

> Would it be easier to have development happen on a dev branch and merge into master for each release?

I think this would be harder, actually. Trunk-based development (committing to master) discourages long-lived development branches to avoid merge hell and broken builds.

@jchauncey
Contributor

Yeah, you really, really want to avoid long-running "feature/dev" branches in this workflow, so we are constantly merging into master.

Let's say we find a bug that we need to fix today and release immediately. The workflow would look something like the following:

  • PR gets submitted for bug fix and is reviewed
  • Merge PR into master
  • Check out the last release tag and create a local branch
  • Cherry-pick the new commit from master onto the local branch (advancing that tree by 1 commit)
  • Cut a tag in git and push it to GitHub, which will kick off the release pipeline in CI
  • This pipeline validates the bug fix is good
  • Publish artifact with version of new tag

This process can keep going for subsequent PATCH releases and even works if we need to port fixes onto older releases (for whatever reason that occurs).
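The git side of the steps above might look roughly like the command list this sketch prints. It is illustrative only: the `hotfix-` branch name, the `origin` remote, and the commit ID `abc1234` are assumptions, and the helper `hotfixCommands` is hypothetical.

```go
package main

import "fmt"

// hotfixCommands lists the git commands for a PATCH release cut from the last
// release tag. The branch name and the "origin" remote are assumptions.
func hotfixCommands(lastTag, newTag, commit string) []string {
	branch := "hotfix-" + newTag
	return []string{
		"git checkout -b " + branch + " " + lastTag, // local branch at the last release tag
		"git cherry-pick " + commit,                 // bring over the fix already merged to master
		"git tag " + newTag,                         // cut the new tag
		"git push origin " + newTag,                 // pushing the tag kicks off the release pipeline
	}
}

func main() {
	// abc1234 stands in for the bug-fix commit on master.
	for _, cmd := range hotfixCommands("v0.1.0", "v0.1.1", "abc1234") {
		fmt.Println(cmd)
	}
}
```

Because the branch starts from the release tag rather than from master, only the cherry-picked fix ends up in the patch release.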

@wbuchwalter
Contributor

Totally missed the SemVer part in your first explanation, makes sense. 👍 Forget what I said about git flow then.
Looking forward to this. I have a project that has acs-engine as a dependency, and that would make life much easier.

@seanknox seanknox mentioned this pull request May 26, 2017
@seanknox
Contributor Author

Folks, I'm planning to merge this to get us moving. We'll adjust the process as we learn what works best.

@seanknox seanknox added this to the v0.1.0 milestone May 26, 2017
@JackQuincy
Contributor

JackQuincy commented May 26, 2017

LGTM. But I don't have much experience with this sort of thing, so I'm not going to approve; someone else needs to look at this as well. @colemickens @anhowe @sgoings can any of you approve/look?

@amanohar
Contributor

If I have a multi-week feature that I am working on, is the proposal that there are regular merges into master for it? What if a release was scheduled for, say, 6/15 but the feature is not ready (but other features are)? How will a commit be picked to be tagged as a version? Will the partial change for a feature in development still be released?

@jchauncey
Contributor

It largely depends on how you want to divide up the feature development. Let's say it's a sufficiently large feature; then we might put it behind some type of conditional flag that is off by default. This way we can continue to merge and ship off master without it being live. Then, when the feature is ready for use, we turn it on by default. If a release was scheduled for 6/15 and we aren't ready for people to use the feature, we should still be able to ship.

Nothing should ever block master from being released (in theory ;))
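A minimal sketch of the "conditional flag, off by default" idea, assuming a hypothetical `Features` struct and `provisionPlan` function (neither is acs-engine's actual API). The zero value of the struct leaves every flag off, so code merged behind the flag stays inactive until someone opts in.

```go
package main

import "fmt"

// Features holds hypothetical feature flags; the zero value leaves every flag off.
type Features struct {
	DedicatedEtcd bool // in-development feature, off by default
}

// provisionPlan returns the resources to create. The new code path is only
// reached when its flag has been switched on explicitly.
func provisionPlan(f Features) []string {
	plan := []string{"master VMs", "agent VMs"}
	if f.DedicatedEtcd {
		plan = append(plan, "etcd VMs")
	}
	return plan
}

func main() {
	fmt.Println(provisionPlan(Features{}))                    // default: existing behavior
	fmt.Println(provisionPlan(Features{DedicatedEtcd: true})) // opt-in: new behavior
}
```

Flipping the default later is a one-line change, which is what lets master keep shipping while the feature is still in progress.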

@amanohar
Contributor

In theory, yes :). I am using a feature flag so it's hidden even today.

My question/concern is around shipping partial code/features that might not have been fully tested. In theory nothing should even hit the code path of a new feature, but that's not always the case. I am in favor of the one-branch approach, but it's not without risks.

@jchauncey
Contributor

Sure, there are risks, but we can definitely minimize those. Feature flags help a lot, and for areas where we are refactoring code and removing functionality in favor of something else, we should have adequate test coverage to validate that something is OK to merge.

@seanknox
Contributor Author

seanknox commented May 27, 2017

> If I have a multi-week feature that I am working on, is the proposal that there are regular merges into master for it? What if a release was scheduled for, say, 6/15 but the feature is not ready (but other features are)? How will a commit be picked to be tagged as a version? Will the partial change for a feature in development still be released?

These are great questions, so let's unpack this a bit.

> If I have a multi-week feature that I am working on, is the proposal that there are regular merges into master for it?

Yes, and feature flags are a great way to enable this. This allows you as the engineer working on the long-running feature to stay current with HEAD without fear that the code path can hit your still-in-development work. As a corollary, I want to encourage us to seek to deliver value in chunks. If a feature is large and complex, break the feature down into discrete deliverables and put them behind feature flags. You can see an example of the "smallest thing that adds value" mindset in Lachie's issue about creating a separate etcd cluster:

  • a user can provision a single etcd member on a separate VM
  • a user can provision an etcd cluster on separate VMs
  • a user can provision an etcd instance with SSD disks

The feature as a whole is "dedicated etcd cluster", which is a lot of work to complete. We can provide value to users right away and avoid merge conflict hell by merging each portion to master as it is completed. Use feature flags to rapidly iterate without jeopardizing safety.

> What if a release was scheduled for, say, 6/15 but the feature is not ready (but other features are)?
> How will a commit be picked to be tagged as a version?

Core to our ability to ship software at a regular cadence is planning. Periodically maintainers will meet—every month is what I proposed—to identify goals for the next version milestone. We do our best to estimate what we think we can get done. If we're wrong—say a feature was slated for the v0.2.0 release but isn't ready, then the feature won't be released. This is where a backlog and regular communication about the state of work can help us adjust course as we learn more. I'd like us to start using a backlog to provide transparency about feature and bug fix work (probably GitHub issues).

> Will the partial change for a feature in development still be released?

Use feature flags. We want to constantly be delivering value and providing guardrails to ensure safety. This includes tests, linting, CI, and code review. If a commit can pass all of these gates and is behind a feature flip, then we have great confidence that the change set can be committed in a safe fashion.

> My question/concern is around shipping partial code/features that might not have been fully tested. In theory nothing should even hit the code path of a new feature, but that's not always the case.

I've heard this fear many times in my career and can assure you there is no magic to making it work: a conditional evaluates to either true or false. If you have code behind a feature flag that only runs when the flag's conditional is true, it can't be executed unless it passes the truth test.

We can increase our confidence that this is all true by ensuring we include unit and functional tests at every opportunity. When something breaks (CI fails, or a regression is actually released), we update our automated tests to keep it from happening again. We create a delivery pipeline that all code must pass through before being committed:

  • unit tests
  • lint/style
  • PMs working side by side with engineers to ensure we all share the same idea about the feature
  • code review
  • functional/regression/smoke tests
  • CI as a gating mechanism to enforce all of this

@amanohar
Contributor

> I've heard this fear many times in my career and can assure you there is no magic to making it work:

Makes sense. Thanks! Having done this before I understand that testing and CI are important. Although no fear involved here :) . I want to make sure I understand the proposal as it relates to my work and finding a good balance between quality and agility.

> a conditional evaluates to either true or false. If you have code behind a feature flag that only runs when the flag's conditional is true, it can't be executed unless it passes the truth test.

As mentioned, the code is behind a condition, so we should be good. And based on my experience, it is not always possible to write code that can be cleanly put behind a condition and turned off. There is a possibility of some interactions/intersections between old and new code in some code paths.

@jdumars

jdumars commented May 30, 2017

@seanknox - very well put. Working at Rally Software showed me how powerful Agile principles are when properly applied. It's not necessarily about dogma. It's about a spirit of partnership with the customer where delivery is as rapid as possible so course-correcting feedback can be reconciled with the plan. And, in every action, is the curiosity about how we might do it better next time. These two driving forces coupled with some of the practices you mentioned above can positively transform the entire landscape of development processes.

@ritazh
Member

ritazh commented May 30, 2017

Great discussion! Completely agree with the approach.

> We can increase our confidence that this is all true by ensuring we include unit and functional tests at every opportunity.

We should probably start with this: creating issues where we are currently missing tests. Most of the tests today are around templates.

> CI as a gating mechanism to enforce all of this

I'm sure there are already discussions around CI. As a developer, it would be really nice to be able to see which tests caused the CI to fail (especially the E2E tests and logs) and be able to fix it without bugging the maintainers. 😄

@seanknox seanknox added the ready label May 30, 2017
@seanknox
Contributor Author

@ritazh love your idea to track gaps in test coverage. Would you mind opening an issue around missing coverage you're aware of?

> I'm sure there are already discussions around CI. As a developer, it would be really nice to be able to see which tests caused the CI to fail (especially the E2E tests and logs) and be able to fix it without bugging the maintainers. 😄

Great observation. We're seeking to make CI public for this very reason.

## CI dashboard available to the public
## Native Azure VNET CNI support
## Multi-GPU support
## Support for dedicated etcd cluster VMs
Contributor

Can we instead replace the above 4 items with a "feature" issue tag?

Contributor Author

Per our discussion over using the wiki to track Roadmap, I'm going to delete this roadmap.md file.

This section leads a maintainer through creating an acs-engine release.

### Step 1: Assemble Master Changelog
A change log is a file which contains a curated, chronologically ordered list of changes
Contributor

Is the changelog generation automated?

Contributor Author

It probably should be, though we'd want to adopt semantic commit messages to ensure commits are in a format useful for the changelog. @jchauncey thoughts?

Contributor

We have a changelog generator script that can make pretty changelogs but you must follow a commit style for it to work.

But it generates them to look like this - https://github.com/deis/workflow/releases/tag/v2.13.0
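To show why a commit style matters for changelog generation, here is a small sketch that groups commit subjects by a `type(scope): summary` prefix. The pattern, the `groupCommits` helper, and the sample subjects are assumptions for illustration; this is not the generator script mentioned above, and the exact style it expects may differ.

```go
package main

import (
	"fmt"
	"regexp"
)

// subject matches "type(scope): summary"; the scope is optional.
// This commit style is an assumed convention for the sketch.
var subject = regexp.MustCompile(`^(\w+)(?:\([^)]*\))?: (.+)$`)

// groupCommits buckets commit summaries by their type prefix and skips
// subjects that do not follow the convention.
func groupCommits(subjects []string) map[string][]string {
	groups := make(map[string][]string)
	for _, s := range subjects {
		m := subject.FindStringSubmatch(s)
		if m == nil {
			continue // not a conforming commit, so it cannot be categorized
		}
		groups[m[1]] = append(groups[m[1]], m[2])
	}
	return groups
}

func main() {
	groups := groupCommits([]string{
		"feat(etcd): add dedicated cluster option",
		"fix: handle empty agent pool profile",
		"wip",
	})
	// Print sections in a fixed order so the output is stable.
	for _, kind := range []string{"feat", "fix"} {
		fmt.Println(kind + ":")
		for _, summary := range groups[kind] {
			fmt.Println("  - " + summary)
		}
	}
}
```

Commits that do not follow the convention (like `wip` above) simply drop out of the changelog, which is why the style needs to be enforced before generation can be automated.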

Contributor Author

Roger that, then no need to specify in this PR. FWIW, I have another PR around enforcing go conventions which includes using semantic commits.


When showstopper-level bugs are found, the process is as follows:

1. Create an issue that describes the bug.
Contributor

write a test that covers the test gap?

Contributor Author

I like it. Will update.


## Release Planning Meetings

Major decisions affecting the Roadmap are discussed during Release Planning Meetings on the first Thursday of each month, aligned with the [Release Schedule][].
Contributor

These would also be aligned with the monthly objectives.

@@ -0,0 +1,28 @@
# Planning Process

acs-engine features a lightweight process that emphasizes openness and ensures every community member can be an integral part of planning for the future.
Contributor

Maybe delete this sentence (further description in slack)


## The Role of Maintainers

[Maintainers][] lead the acs-engine project. Their duties include proposing the Roadmap, reviewing and integrating contributions and maintaining the vision of the project.
Contributor

We need to include the current roles covered by the "GitHub Issues" Role, and the "GitHub Code Review" role.

Contributor

@anhowe anhowe left a comment

lgtm, please answer the comments/questions, then it should be good

@jackfrancis
Member

LGTM. My only opinion would be that for the initial v0.1.0 release, we stick to mostly the overhead of this new release process itself: tooling, orchestration, docs, etc. And that we deprioritize any add'l code changes during this "administrative" spike.

In other words, in my view, the purpose of this upcoming release is exclusively to "package" the existing code into a rational, versioned, process-approved publication.

Great work!

@seanknox
Contributor Author

> My only opinion would be that for the initial v0.1.0 release, we stick to mostly the overhead of this new release process itself: tooling, orchestration, docs, etc. And that we deprioritize any add'l code changes during this "administrative" spike.
>
> In other words, in my view, the purpose of this upcoming release is exclusively to "package" the existing code into a rational, versioned, process-approved publication.

Well said @jackfrancis. I removed the roadmap.md in favor of using the wiki to manage the roadmap: https://github.com/Azure/acs-engine/wiki/Roadmap

FWIW, the "stretch" goals listed will assuredly be part of the release. In fact, 2 of them are already merged and GPU support is close behind.

@seanknox seanknox merged commit d6608f3 into master May 31, 2017
@seanknox seanknox removed the ready label May 31, 2017
@seanknox seanknox deleted the release-process branch May 31, 2017 02:57
@seanknox
Contributor Author

Thanks for the considerate feedback everyone!
