Skip to content

Snakemake profile to execute workflows on the Broad Institute UGER cluster

License

Notifications You must be signed in to change notification settings

broadinstitute/snakemake-broad-uger

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

12 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Snakemake profile for Broad Institute UGER cluster

Snakemake is a Pythonic workflow description language, that is easily configurable to run in all sorts of environments. Since version 4.1, Snakemake contains a feature called 'profiles', for easy exchange of configuration presets for running in a certain environment. This repository contains a snakemake profile to run your workflow on the Broad's UGER cluster.

Installation

Preparing a conda environment

The recommended way to use this Snakemake profile is to create a separate conda environment for your project. This environment will contain a separate Python installation specifically for your project, where you control which packages are installed. In the example below we will create an environment named snakemake (with the -n switch), but you can name it anything you want. Furthermore, Snakemake requires Python>=3.5, so we install Python 3 along with two additional packages: Snakemake itself and the package cookiecutter (used to install this profile).

use Anaconda3

# Create new conda environment with up to date snakemake
conda create -n snakemake "python>=3.6"
source activate snakemake

pip install snakemake cookiecutter

# (Optional) You can now install additional dependencies specific to your 
# project
conda install numpy scipy ...

NB: Conda creates the environment by default in your home directory. At Broad, your home directory is limited to 5GB so this may fill up quickly. It's probably a good idea to store the Conda environment in some other place. This can be done by replacing -n snakemake with --prefix /path/where/env/will/be/stored, and also specify the path to your conda environment when issuing the source activate command.

Install the Snakemake profile

Change to the directory containing your Snakefile and issue the following command:

cookiecutter gh:broadinstitute/snakemake-broad-uger

This command will ask a few questions:

  1. You can optionally specify a different profile name than the default (broad-uger).
  2. Which cluster to use, both UGER and UGES are supported.
  3. Whether to use the --immediate-submit option of Snakemake. With this option snakemake will submit every job immediately to the cluster, configuring which job depends on which (using -hold_jid option of qsub), and exit immediately afterwards. Not recommended, it's better to just run the snakemake master process on the cluster login node.
  4. Last but not least, specify the name (when using -n above) or the path (when using --prefix above) to the conda environment you want to use.

Using the Snakemake profile

We're ready to go! To use this profile invoke Snakemake as follows:

snakemake --profile broad-uger ...

If you're not using --immediate-submit, the Snakemake master process must be alive for the whole duration of your workflow (i.e. until all jobs have finished). My recommendation would be to start the Snakemake process on one of the login nodes, in a screen session. This makes sure the Snakemake master process doesn't get killed when you lose your SSH connection.

Example:

# Start screen session with snakemake in the background
screen -dmS snakemake snakemake --profile broad-uger ...

# View output:
screen -x snakemake

The Snakemake master process is light weight so it shouldn't be a problem to run this on the login node.

Resource specification

This profile determines the runtime, memory and amount of cores as follows:

  • Runtime: specify in your --cluster-config file, with key runtime
  • Memory: Specify in your rule under resources with key mem_mb. Can be overridden by specifying a value in your --cluster-config file.
  • Cores/CPUs: specify using threads per rule.
  • UGER project: specify in --cluster-config file with key project

Read more about:

Acknowledgements

The cluster submission and jobscripts are partly taken/inspired by the corresponding files in the broadinstitute/viral-ngs repository.

About

Snakemake profile to execute workflows on the Broad Institute UGER cluster

Resources

License

Code of conduct

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published