sb10 edited this page Feb 18, 2011 · 4 revisions

This wiki provides meta-instructions and help on using the internally-developed software of the Vertebrate Resequencing group at the Sanger Institute. We write and run software that tracks, processes and analyzes next-generation sequencing data.

vr-codebase is our main repository. It holds all our pipelines and their associated methods for tasks such as mapping, QC and SNP calling.


The first thing you'll want to do is get a copy of our software. If you don't plan on developing our code and contributing improvements back to us, simply clone the repository and use the master branch (or just make use of the 'Downloads' button). If you want to contribute, make your own fork of the repository and do your development on the develop branch before sending us a pull request. If you're new to git and the preceding doesn't mean much to you, read our brief git guide.
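The two routes above can be sketched as a pair of small shell helpers. This is only an illustration: the function names are made up for this example, and the repository or fork URL is something you supply yourself. The only facts taken from the text are the branch names, master for users and develop for contributors.

```shell
#!/bin/sh
# Sketch only: clone_for_use and clone_for_development are hypothetical
# helper names, and the URLs are supplied by the caller.

# Users: clone the repository and check out the master branch.
clone_for_use() {
    repo_url=$1
    dest=$2
    git clone --quiet --branch master "$repo_url" "$dest"
}

# Contributors: clone your own fork and check out the develop branch,
# which is where development happens before a pull request is sent.
clone_for_development() {
    fork_url=$1
    dest=$2
    git clone --quiet --branch develop "$fork_url" "$dest"
}
```

For example, `clone_for_development <url-of-your-fork> vr-codebase` would leave you on the develop branch of your fork, ready to commit changes.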

Our pipelines are currently hard-coded to work only with LSF, so you'll need it installed on your compute farm. A limited amount of our software is still useful and usable without LSF.
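If you are unsure whether LSF is available on a machine, one quick check is to look for its client commands on your PATH. The sketch below assumes that the presence of `bsub` and `bjobs` (the standard LSF submit and job-status commands) is a good enough indicator. The helper names are hypothetical and are not part of our code.

```shell
#!/bin/sh
# Sketch only: have_command and lsf_status are hypothetical helper names.

# Succeeds (exit status 0) if the named command can be found on PATH.
have_command() {
    command -v "$1" >/dev/null 2>&1
}

# Reports whether the LSF client commands appear to be installed.
lsf_status() {
    if have_command bsub && have_command bjobs; then
        echo "LSF client commands found"
    else
        echo "LSF client commands not found"
    fi
}
```

Calling `lsf_status` on a login node prints one of the two messages. It does not verify that the cluster itself is reachable or correctly configured.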

Now follow the advice in the README.

Getting Started

You can poke around in modules/VertRes and read the POD documentation on the various modules there. There are lots of generally useful methods that you can make use of in your own code.

You can also take a look inside the scripts directory and see if anything sounds useful to you.

Our current pipeline system is built around the run-pipeline script. We either run it from a cron job or leave it running in a loop; it submits the jobs of a given pipeline to a cluster and tracks them. You give the script a config file that defines the dataset you want to use, the pipeline you want to run, and any parameters the pipeline might need.
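As a rough illustration of the "leave it running in a loop" approach, the sketch below re-runs an arbitrary command at a fixed interval. The `run_in_loop` helper is hypothetical and not part of our code. It makes no assumptions about run-pipeline's own options, so take the actual invocation and config file format from the README and the Pipeline Tutorial.

```shell
#!/bin/sh
# Sketch only: run_in_loop is a hypothetical wrapper.
# Usage: run_in_loop PASSES SLEEP_SECONDS COMMAND [ARGS...]
run_in_loop() {
    passes=$1
    interval=$2
    shift 2
    i=0
    while [ "$i" -lt "$passes" ]; do
        # Each pass gives the wrapped command a chance to submit new jobs
        # and check on the ones it submitted earlier.
        "$@" || echo "pass $i exited with a non-zero status" >&2
        i=$((i + 1))
        # Wait between passes, but not after the final one.
        if [ "$i" -lt "$passes" ]; then
            sleep "$interval"
        fi
    done
    return 0
}
```

For instance, `run_in_loop 100 300 <your run-pipeline command>` would invoke the command 100 times, five minutes apart. A cron entry achieves much the same effect without keeping a shell open.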

Read through the Pipeline Tutorial to get a better idea of how our pipelines are supposed to be set up and run.