an automated RNA-Seq workflow system

eelpond

                           ___
                        .-'   `'.
                       /         \
                      |           ;
                      |           |           ___.--,
             _.._     |O)  ~  (O) |    _.---'`__.-( (_.       
      __.--'`_.. '.__.\      '--. \_.-' ,.--'`     `""`
     ( ,.--'`   ',__ /./;     ;, '.__.'`    __
     _`) )  .---.__.' / |     |\   \__..--""  """--.,_
    `---' .'.''-._.-'`_./    /\ '.  \_.-~~~````~~~-.__`-.__.'
          | |  .' _.-' |    |  \  \  '.
           \ \/ .'     \    \   '. '-._)
            \/ /        \    \    `=.__`-~-.
            / /\         `)   )     / / `"".`\
      , _.-'.'\ \        /   /     (  (   /  /
       `--~`  )  )    .-'  .'       '.'. |  (
             (/`     (   (`           ) ) `-;
              `       '--;            (' 

eelpond started as a snakemake update of the Eel Pond Protocol for de novo RNAseq analysis. It has since grown to support a number of workflows for (mostly) RNA data, all of which can be run via the eelpond workflow wrapper. eelpond uses snakemake for workflow management and conda for software installation. The code can be found here.

Getting Started

Linux is the recommended OS. Nearly everything also works on MacOSX, but some programs (fastqc, Trinity) are troublesome.

If you don't have conda yet, install miniconda (these commands were written for an Ubuntu 16.04 Jetstream image):

wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b
echo 'export PATH="$HOME/miniconda3/bin:$PATH"' >> ~/.bash_profile
source ~/.bash_profile
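Before moving on, it can help to confirm that conda is actually on your PATH. The snippet below is a minimal sketch of such a check; the `have_cmd` helper is a hypothetical name introduced here for illustration and is not part of eelpond or conda.

```shell
# have_cmd is a hypothetical helper (not part of eelpond or conda).
# It returns 0 if the named command can be found on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd conda; then
    # Print the installed conda version
    conda --version
else
    echo "conda not found on PATH; check ~/.bash_profile and open a new shell" >&2
fi
```

If the check fails, the PATH line may not have been written to `~/.bash_profile`, or the profile may not have been sourced in the current shell.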

Now, get the eelpond code:

git clone https://github.com/dib-lab/eelpond.git
cd eelpond

Create a conda environment with all the dependencies for eelpond:

conda env create --file environment.yml -n eelpond

Activate that environment. You'll need to do this any time you want to run eelpond:

conda activate eelpond
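Since the environment has to be active for every run, a quick check like the following sketch may be useful in scripts. It relies on the `CONDA_DEFAULT_ENV` variable that conda sets on activation; the `active_env` helper is a hypothetical name used only for this example.

```shell
# active_env is a hypothetical helper: it prints the name of the active
# conda environment, or an empty line if none is active.
active_env() {
    printf '%s\n' "${CONDA_DEFAULT_ENV:-}"
}

if [ "$(active_env)" = "eelpond" ]; then
    echo "eelpond environment is active"
else
    echo "eelpond environment is not active; run: conda activate eelpond" >&2
fi
```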

Now you can start running workflows on test data!

Default workflow: Eel Pond Protocol for de novo RNAseq analysis

The Eel Pond protocol (which inspired the eelpond name) included line-by-line commands that the user could follow along with using a test dataset provided in the instructions. We have re-implemented the protocol here to enable automated de novo transcriptome assembly, annotation, and quick differential expression analysis on a set of short-read Illumina data using a single command. See more about this protocol here.

To test the default workflow:

./run_eelpond examples/nema.yaml default

This will download a small set of Nematostella vectensis test data (from Tulin et al., 2013) and run the default workflow on it.

Running Your Own Data

To run your own data, you'll need to create two files:

  • a tsv file containing your sample info
  • a yaml file containing basic configuration info

Generate these by following instructions here: Understanding and Configuring Workflows.

Available Workflows

You can see the available workflows (and which programs they run) by using the --print_workflows flag:

./run_eelpond examples/nema.yaml --print_workflows

subworkflows:

  • preprocess: Read Quality Trimming and Filtering (fastqc, trimmomatic)
  • kmer_trim: Kmer Trimming and/or Digital Normalization (khmer)
  • assemble: Transcriptome Assembly (trinity)
  • assemblyinput: Specify assembly for downstream steps
  • annotate: Annotate the transcriptome (dammit, sourmash)
  • quantify: Quantify transcripts (salmon)
  • plass_assemble: Assemble at the protein level (PLASS)
  • paladin_map: Map to a protein assembly (paladin)

main workflows:

  • default: preprocess, kmer_trim, assemble, annotate, quantify
  • protein assembly: preprocess, kmer_trim, plass_assemble, paladin_map

Each included tool can also be run independently, if appropriate input files are provided. See each tool's documentation for details.
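As a rough sketch, a single subworkflow could be run by passing its name in place of `default`. This assumes subworkflow names are accepted positionally in the same way as `default` in the test command, which should be confirmed against `./run_eelpond -h`. The `run_workflow` wrapper is a hypothetical helper introduced here, not part of eelpond.

```shell
# run_workflow is a hypothetical wrapper (not part of eelpond).
# Assumption: a subworkflow name can be given positionally, the same way
# "default" is given in the test command.
# Usage: run_workflow examples/nema.yaml preprocess
run_workflow() {
    if [ ! -x ./run_eelpond ]; then
        echo "run_eelpond not found; run this from the eelpond directory" >&2
        return 1
    fi
    # $1 is the YAML config file, $2 is the workflow or subworkflow name
    ./run_eelpond "$1" "$2"
}
```

Under that assumption, `run_workflow examples/nema.yaml preprocess` would run only read quality trimming and filtering on the test data.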

Additional Info

To see the help message, run:

./run_eelpond -h

References: