Build Snakemake workflows that are command line utilities and can run on a Kubernetes cluster, with Travis CI tests that use minikube. byok8s = Bring Your Own Kubernetes

2019-snakemake-byok8s

Badges: Travis CI, license, minikube 0.32, k8s 0.12, Ubuntu bionic, Ubuntu xenial

Overview

This is an example of a Snakemake workflow that:

  • is a command line utility called byok8s
  • is bundled as an installable Python package
  • is designed to run on a Kubernetes (k8s) cluster
  • can be tested with Travis CI (and/or locally) using minikube

What is byok8s?

byok8s = Bring Your Own Kubernetes (cluster)

k8s = Kubernetes

byok8s is a command line utility that launches a Snakemake workflow on an existing Kubernetes cluster. This allows you to do something like this (also see the Installation and Quickstart guides in the documentation):

# Install byok8s
python setup.py build install

# Create virtual k8s cluster
minikube start

# Run the workflow on the k8s cluster
cd /path/to/workflow/
byok8s my-workflowfile my-paramsfile --s3-bucket=my-bucket

# Clean up the virtual k8s cluster
minikube stop
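
If the byok8s step above is driven from a Python script rather than typed by hand, the invocation can be assembled as an argument list. This is only a sketch: `build_byok8s_command` and `run_byok8s` are hypothetical helpers, not part of byok8s, and they use only the two positional arguments and the `--s3-bucket` flag shown above.

```python
import subprocess


def build_byok8s_command(workflowfile, paramsfile, s3_bucket):
    """Return the byok8s command as a list of arguments (hypothetical helper)."""
    return ["byok8s", workflowfile, paramsfile, f"--s3-bucket={s3_bucket}"]


def run_byok8s(workflow_dir, workflowfile, paramsfile, s3_bucket):
    """Run byok8s from inside the workflow directory.

    Assumes byok8s is installed and a Kubernetes cluster is already running.
    Raises subprocess.CalledProcessError if the command exits with an error.
    """
    cmd = build_byok8s_command(workflowfile, paramsfile, s3_bucket)
    return subprocess.run(cmd, cwd=workflow_dir, check=True)
```

Passing the command as a list avoids shell quoting problems if a file or bucket name contains unusual characters.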

Getting Up and Running

See the Quickstart Guide to get up and running with byok8s.

How does byok8s work?

The command line utility requires the user to provide three input files:

  • A Snakemake workflow, via a Snakefile
  • A workflow configuration file (JSON)
  • A workflow parameters file (JSON)

Additionally, the user must create the following resources:

  • A running Kubernetes cluster
  • An S3 bucket (and AWS credentials to read/write)

A sample Snakefile, workflow config file, and workflow params file are provided in the test/ directory.

The workflow config file specifies which workflow targets and input files to use.

The workflow parameters file specifies which parameters to use for the workflow steps.
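
As a rough illustration of the two JSON inputs, the sketch below writes a workflow config file and a workflow params file with Python's standard library. Every key shown (`workflow_target`, `name`) is a hypothetical placeholder rather than the schema byok8s expects; the samples in the test/ directory show the real layout.

```python
import json
from pathlib import Path


def write_json(path, data):
    """Write `data` to `path` as indented JSON and return the path."""
    path = Path(path)
    path.write_text(json.dumps(data, indent=4) + "\n")
    return path


# Hypothetical contents: these keys are placeholders for illustration only.
workflow_config = {"workflow_target": "target1"}  # which target(s) to build
workflow_params = {"name": "blue"}  # values consumed by workflow steps
```

Calling `write_json("my-workflowfile.json", workflow_config)` and `write_json("my-paramsfile.json", workflow_params)` would then produce one file of each kind; the file names here are also placeholders.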

Why S3 buckets?

AWS credentials and an S3 bucket are required to run workflows because of restrictions on file I/O on nodes in a Kubernetes cluster. The Snakemake workflows use AWS S3 buckets as remote providers for the Kubernetes nodes, but this can be changed to any other remote provider that Snakemake supports.

AWS credentials are set with two environment variables:

AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY

These are passed into the Kubernetes cluster by byok8s and Snakemake.
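
A small pre-flight check can catch missing credentials before a workflow is launched. The sketch below is not part of byok8s: the helper name `missing_aws_credentials` is hypothetical, and it only verifies that the two variables named above are set and non-empty, not that the credentials are valid.

```python
import os

REQUIRED_AWS_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_aws_credentials(env=None):
    """Return the names of required AWS variables that are unset or empty.

    `env` defaults to the current process environment; passing a plain
    dict makes the check easy to test.
    """
    if env is None:
        env = os.environ
    return [name for name in REQUIRED_AWS_VARS if not env.get(name)]
```

A launcher script could call `missing_aws_credentials()` first and stop with a clear message if the returned list is not empty.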

Kubernetes and Minikube

Kubernetes is a technology that uses Docker containers to orchestrate a cluster of compute nodes. These compute nodes are usually real machines requested and managed via a cloud provider, like AWS or Google Cloud.

But the compute nodes can also be virtual, which is where minikube comes in. It creates a Kubernetes cluster that is entirely local and virtual, which makes testing easy. See the byok8s Minikube Guide for details about how to use minikube with byok8s.

The Travis CI tests also use minikube to run test workflows. See byok8s Travis Tests for more information.

Cloud Providers

For real workflows, Kubernetes clusters are typically provisioned through a cloud provider. We have guides for the following:

  • AWS EKS (Elastic Kubernetes Service)
  • GCP GKE (Google Kubernetes Engine)
  • Digital Ocean Kubernetes service

Kubernetes + byok8s: In Practice

| Cloud Provider | Kubernetes Service | Guide | State |
|---|---|---|---|
| Minikube (on AWS EC2) | Minikube | byok8s Minikube Guide | Finished |
| Google Cloud Platform (GCP) | Google Kubernetes Engine (GKE) | byok8s GCP GKE Guide | Finished |
| Amazon Web Services (AWS) | Elastic Kubernetes Service (EKS) | byok8s AWS EKS Guide | Unfinished |
| Digital Ocean (DO) | DO Kubernetes (DOK) | byok8s DO DOK Guide | Unfinished |