Archival and automation of bioinformatic pipelines and data

Kive

Kive is an accessible computing framework for the version control of bioinformatic pipelines, along with their input and output datasets.

Background

  • Bioinformatic "pipelines" are collections of software programs that are used to process and analyze biological data.
  • Pipelines have become essential tools in modern biomedical and clinical laboratories.
  • Most pipelines are customized to meet the requirements of each lab and project, so they are usually under constant development.
  • The end-users are often unaware of revisions being made to pipelines.
  • It can be difficult to determine which version of a pipeline was used to process a given data set, especially when there are multiple copies of results.
  • This makes it difficult to reproduce results for method validation or publication.
  • Clinical laboratory accreditation programs (such as the College of American Pathologists, CAP) have issued new requirements for the validation and version tracking of bioinformatic pipelines.
  • A system for tracking this information should make it possible to look up the pipeline history of any data set. It should be easy to use, with an intuitive graphical interface, and with as much of the "bookkeeping" automated as possible. We could not find a system that met these criteria.

What does Kive do?

We developed our new framework ("Kive") as a Django application. Django is a Python framework for developing web applications.

Kive is built on a PostgreSQL relational database. The database records the digital "fingerprint" (MD5 checksum) of every version of each pipeline component and data set, their locations in the filesystem, and their relations to each other.
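As an illustration of the "fingerprint" idea, here is a minimal sketch of computing an MD5 checksum for a file with Python's standard library. The function name `md5_fingerprint` and the chunk size are assumptions made for this example; this is not Kive's own code.

```python
import hashlib


def md5_fingerprint(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`.

    Hypothetical helper for illustration; not taken from Kive.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so a large data set is never held in memory whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Two files with identical contents yield the same digest, so a stored checksum can be compared against a fresh one to check whether a script or data set has changed since it was registered.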

Kive fully automates the execution of a pipeline version on a data set: it distributes jobs across computing resources (such as a computing cluster) and records every intermediate step in the database. When an intermediate result from an earlier run can be re-used by a subsequent pipeline version, Kive loads it instead of recomputing it, which minimizes computing time.
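The re-use of intermediate steps can be pictured as a lookup keyed by checksums. The sketch below is a simplified, in-memory stand-in and assumes that a step can be re-used when both its code and its input are unchanged; the `StepCache` class and its method names are hypothetical, and Kive's actual bookkeeping lives in its database.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Hex MD5 digest of a byte string."""
    return hashlib.md5(data).hexdigest()


class StepCache:
    """Hypothetical in-memory record of previously executed steps."""

    def __init__(self):
        self._outputs = {}
        self.executions = 0  # how many times a step was actually run

    def run_step(self, step_code: bytes, step_func, input_data: bytes) -> bytes:
        # The same code applied to the same input gives the same key.
        key = (fingerprint(step_code), fingerprint(input_data))
        if key in self._outputs:
            # Re-use the stored result instead of recomputing it.
            return self._outputs[key]
        self.executions += 1
        output = step_func(input_data)
        self._outputs[key] = output
        return output


cache = StepCache()
first = cache.run_step(b"to-upper v1", bytes.upper, b"acgt")
second = cache.run_step(b"to-upper v1", bytes.upper, b"acgt")  # served from the cache
```

If a later pipeline version changes only a downstream step, the upstream keys are unchanged, so only the modified step and the steps after it need to run again.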

Read/write privileges to pipelines and data sets in Kive are specific to users and groups.

Kive also features a web-based graphical user interface, including a point-and-click toolkit for assembling and running pipelines that is implemented in HTML5 Canvas and JavaScript.

We used Kive to track versions of pipelines being developed in-house for processing and interpreting raw data sets from an Illumina MiSeq. This pipeline comprises 8 scripts written in Python, Ruby, and R. For more information, read about how we fixed a problem with bad cycles in our example application.

Client requirements

The following browsers are supported:

Browser            Basic Support   Bulk Upload Feature
Google Chrome      version 4       version 5
Firefox            version 4       version 4
Safari             version 3.1     version 7
Internet Explorer  version 10      version 10

RESTful API

You can upload data, launch pipelines, and update pipelines through Kive's API. You can also use our Python library to script calls to the API.
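To give a rough idea of what a scripted call might look like without the Python library, the sketch below builds (but does not send) an HTTP request using only the standard library. The endpoint path `/api/runs/`, the payload fields, the token header, and the function name `build_run_request` are all assumptions made for illustration; consult Kive's API documentation for the real routes and authentication scheme.

```python
import json
import urllib.request


def build_run_request(base_url, token, pipeline_id, dataset_ids):
    """Build a POST request that would ask the server to launch a pipeline.

    Every route, field name, and header value here is hypothetical.
    """
    payload = {"pipeline": pipeline_id, "inputs": list(dataset_ids)}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/runs/",  # hypothetical endpoint
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Token " + token,  # hypothetical auth scheme
        },
        method="POST",
    )


request = build_run_request("https://kive.example.org/", "secret", 3, [10, 11])
```

Passing `request` to `urllib.request.urlopen` would actually send it; the sketch stops short of that so it can be inspected without a running server.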

What are we working on?

You can see active tasks on our project board, or look at the current milestone's burndown.