CircleCI Orb: Workflow Manager

Manage workflow concurrency and job state using an external store.

This orb allows you to:

Limit the number of concurrently running workflows. This is useful when you want to only allow one batch of changes at a time or use AWS CloudFormation and need to wait for the previous deploy to finish.
Squash commits that are deployed in a workflow when the workflow is allowed to proceed
Store and retreive data from a key-value store in jobs, even if they're run in parallel
Track the status of a workflow from the command line
Send a Slack message when a workflow succeeds after failing

Rationale

Even though we tested it in different jobs, the queue orb did not consistently block the deploy workflow from executing more than one commit at a time. We don't think the issue is necessarily with the queue orb, it's most likely with the CircleCI API.

Instead of using the CircleCI API to determine if the workflow can continue, we can use a remote key-value store (DynamoDB) that acts as a first-in-first-out (FIFO) queue. This will allow us to process deploys one at a time, in the order the commits were merged.

Installation

Requirements

Install the AWS CLI
Install jq
Set your AWS user profile or AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_DEFAULT_REGION in your shell

Deploying the Stack

This orb uses a DynamoDB table to store state. Deploy the CloudFormation stack with infrastructure to support this orb using the AWS CLI:

aws cloudformation deploy \
    --template-file cloudformation_template.yml \
    --stack-name my-new-stack

This will create a DynamoDB table. You can tweak that template to align with your organization's standards or use the AWS console instead.

IAM Permissions

You need to allow your CircleCI AWS IAM user to interact with the DynamoDB table. Create a managed or inline policy and attach it to the IAM user:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowTableAccess",
            "Effect": "Allow",
            "Action": [
                "dynamodb:PutItem",
                "dynamodb:UpdateItem",
                "dynamodb:GetItem",
                "dynamodb:Query"
            ],
            "Resource": [
                "arn:aws:dynamodb:*:*:table/name-of-the-dynamodb-table"
            ]
        }
    ]
}

CircleCI Context

For each workflow that uses this workflow-manager orb, create a CircleCI Context with an environment variable named WORKFLOW_LOCK_KEY set to the name of a workflow. Then, set context: my-new-context for every job in the workflow.

Usage

Entering the Queue: `wait-in-queue`

We run the wait-in-queue job first in the deploy workflow. For example:

jobs:
    - workflow-manager/wait-in-queue:
        context: my-deploy
        filters:
            branches:
            only: master
        check_previous_commit: true
        do_not_cancel_workflow_if_tag_in_commit: "[force deploy]"

It starts by adding an item to the table with some attributes pertaining to the workflow instance:

Field	Type	Description
key	string	Hash key. The lock "key" that unifies different workflow instances. You must set a `WORKFLOW_LOCK_KEY` environment variable in the your project settings or in a CircleCI context used by your workflow.
committed_at	number	Sort/range key. Unix timestamp of when this commit was committed
created_at	number	Unix timestamp of when this item was added to the table
expires_at	number	Unix timestamp of when this item expires and gets removed from the table
build_num	number	CircleCI build number
commit	string	Git commit SHA
username	string	GitHub username of commit author
workflow_id	string	Workflow instance ID
status	string	Localized secondary index. One of `QUEUED`, `RUNNING`, `SUCCESS`, `FAILED`. Starts out as `QUEUED`.

The job then starts polling the table for the oldest item that doesn't have a status attribute of SUCCESS or FAILED. If that item has the same workflow_id as the job, that means the job is at the "front" of the queue and can continue; we set the item's status to RUNNING and the workflow transitions to the next job. It will wait in the queue for up to 4 hours before failing. When the workflow is allowed to continue, it can be said to have a "lock".

Exiting the Queue: `exit-queue`

The exit-queue job updates the table item's status attribute to be SUCCESS or FAILED depending on the value of the exit_condition parameter. By updating the job status, it allows other workflows to continue. This job should be called as the last job of every "branch" in a given workflow. For example:

jobs:
    # ...
    - workflow-manager/exit-queue:
        context: my-deploy
        requires:
            - deploy-stack
        filters:
        branches:
            only: master
        send_slack_on_recovery: true

Even if we add the exit-queue job in all the right places in the workflow, we can still get into a state in which the lock is not released when a job fails. CircleCI does not have support for "run a job if some other job failed" so we have to add boilerplate to do so. This takes the form of adding the following step at the bottom of every job that the deploy workflow uses:

- exit-queue:
    exit_condition: on_fail

The exit-queue command will then release the lock only if a previous step in the job failed. It was modeled after the slack/status command source.

Instead of passing a lock key parameter down through the job/command parameter stack, we can instead leverage a CircleCI Context that sets an environment variable called WORKFLOW_LOCK_KEY. If all jobs in the workflow include the context: <context name> parameter, all commands will have access to that environment variable. The lock-related commands can source that variable to set the lock key.

Squashing Commits

The deploy worklow has the capability to "squash commits", which replicates a feature of Jenkins that didn't come for free in CircleCI.

Why? The primary goal of this is to reduce the time it takes for developers to get their merged code into production. A secondary goal is to reduce the cost of a bunch of containers burning CircleCI credits while trying to acquire a lock on the workflow.

How does it work? When you merge, your commit enters the queue. However, if someone else merges and your commit is still waiting to proceed, it will detect that it's no longer last in the queue and self-cancel. Since we build and deploy everything during the workflow, your changes will be included when that later commit starts running. Once a workflow passes the wait-in-queue job, it is considered to be in the "running" state and will not be squashed.

What if I don't want my commit squashed? There are known cases in which a commit should not be squashed, in these cases add [force deploy] in your merged commit message. This is most common when a commit has database migrations included or the workflow has specific conditions that require commits to be executed without being squashed.

Note: As of now, if there's a failure in the deploy workflow, the Slack message sent to the channel will only include the author of that commit, i.e. it won't contain the list of authors of commits that were squashed. We can see how it plays out before deciding if this behavior should be added.

Workflows CLI

The ./workflows-cli.sh script is a CLI for working with CircleCI workflows. It's mostly a wrapper on top of the aws CLI and primarily queries the workflows table in DynamoDB (see: Deploy Queue).

Usage:
        ./workflows-cli.sh <command> [<arg>...] [options]

Commands:
        ls                      List workflows
        cancel <workflow-id>    Cancel a workflow

List Options:
        -c <column>     Include specific columns in the table. Either provide 'all' or a comma-separated list
                        including one or more of: acquired_at,commit,committed_at,created_at,released_at,status,username,workflow_id
                        (default: workflow_id,status,username,committed_at,acquired_at,released_at,commit)
        -l <count>      Limit the number of workflows returned. Must be between 1 and 100 (default: 100)
        -p <seconds>    Polling interval when watching for updates. Must be between 1 and 60 (default: 2)
        -s <status>     Filter workflows by status. Either provide 'all' or a comma-separated list
                        including one or more of: RUNNING,QUEUED,SUCCESS,FAILED,CANCELLED
                        (default: QUEUED,RUNNING)
        -w              Watch for updates. This clears the terminal.

Common Options:
        -k <key>        DynamoDB primary key value
        -t <table>      DynamoDB table

Recipes

# Watch the list of currently running or queued workflows
./workflows-cli.sh ls -w

# Watch the list of currently running or queued workflows for a different workflow lock key
./workflows-cli.sh ls -w -k other-workflow-lock-key

# List the all completed workflows
./workflows-cli.sh ls -s SUCCESS,FAILED,CANCELLED

# Manually release the lock if something gets stuck
# List workflows...
./workflows-cli.sh ls
# ...then copy the workflow ID and pass it to the `cancel` command:
./workflows-cli.sh cancel <id>

Name		Name	Last commit message	Last commit date
Latest commit History 39 Commits
src		src
.gitignore		.gitignore
LICENSE.md		LICENSE.md
README.md		README.md
cloudformation_template.yml		cloudformation_template.yml
publish-alpha.sh		publish-alpha.sh
release.sh		release.sh
workflows-cli.sh		workflows-cli.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

src

src

.gitignore

.gitignore

LICENSE.md

LICENSE.md

README.md

README.md

cloudformation_template.yml

cloudformation_template.yml

publish-alpha.sh

publish-alpha.sh

release.sh

release.sh

workflows-cli.sh

workflows-cli.sh

Repository files navigation

CircleCI Orb: Workflow Manager

Rationale

Installation

Requirements

Deploying the Stack

IAM Permissions

CircleCI Context

Usage

Entering the Queue: `wait-in-queue`

Exiting the Queue: `exit-queue`

Squashing Commits

Workflows CLI

Recipes

About

Releases 11

Packages

Contributors 3

Languages

License

NarrativeScience-old/circleci-orb-workflow-manager

Folders and files

Latest commit

History

Repository files navigation

CircleCI Orb: Workflow Manager

Rationale

Installation

Requirements

Deploying the Stack

IAM Permissions

CircleCI Context

Usage

Entering the Queue: wait-in-queue

Exiting the Queue: exit-queue

Squashing Commits

Workflows CLI

Recipes

About

Topics

Resources

License

Stars

Watchers

Forks

Languages

Entering the Queue: `wait-in-queue`

Exiting the Queue: `exit-queue`