eelpond started as a snakemake update of the Eel Pond Protocol for de novo RNAseq analysis. It has evolved slightly to enable a number of workflows for (mostly) RNA data, all of which can be run via the eelpond workflow wrapper.
eelpond uses snakemake for workflow management and conda for software installation. The code can be found at https://github.com/dib-lab/eelpond.
Linux is the recommended OS. Nearly everything also works on MacOSX, but some programs (fastqc, Trinity) are troublesome.
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b
echo 'export PATH="$HOME/miniconda3/bin:$PATH"' >> ~/.bash_profile
source ~/.bash_profile
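Since nearly everything also works on MacOSX, the installer filename can be chosen from the operating system name rather than hard-coded. The following is a minimal sketch, not part of eelpond: `installer_for` is a hypothetical helper, and it assumes an x86_64 machine (other architectures need a different installer).

```shell
# installer_for OS: print the Miniconda installer filename that matches
# a `uname -s` value. Hypothetical helper; assumes an x86_64 machine.
installer_for() {
    case "$1" in
        Linux)  echo "Miniconda3-latest-Linux-x86_64.sh" ;;
        Darwin) echo "Miniconda3-latest-MacOSX-x86_64.sh" ;;
        *)      echo "unsupported OS: $1" >&2; return 1 ;;
    esac
}

# Possible usage (left commented out so that nothing is downloaded here):
#   installer=$(installer_for "$(uname -s)") &&
#       wget "https://repo.continuum.io/miniconda/$installer" &&
#       bash "$installer" -b
```

The OS name is passed in as an argument instead of being read inside the function, so the mapping can be checked without running an installer.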
Now, get the eelpond code:
git clone https://github.com/dib-lab/eelpond.git
cd eelpond
Create a conda environment with all the dependencies for eelpond:
conda env create --file environment.yml -n eelpond
Activate that environment. You'll need to do this any time you want to run eelpond:
conda activate eelpond
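If these setup steps are scripted, running `conda env create` a second time fails because the environment already exists. One way around that, sketched below, is to check the output of `conda env list` first. `env_listed` and `ensure_eelpond_env` are hypothetical helpers, not part of eelpond, and the sketch assumes the environment name appears in the first column of that output.

```shell
# env_listed NAME: read `conda env list`-style text on stdin and succeed
# if NAME appears in the first column. Hypothetical helper.
env_listed() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

# ensure_eelpond_env: create the eelpond environment only if it is missing.
# Must be run from the eelpond directory, where environment.yml lives.
ensure_eelpond_env() {
    if conda env list | env_listed eelpond; then
        echo "eelpond environment already exists"
    else
        conda env create --file environment.yml -n eelpond
    fi
}
```

Calling `ensure_eelpond_env` followed by `conda activate eelpond` then behaves the same on a fresh machine and on one that is already set up.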
Now you can start running workflows on test data!
Default workflow: Eel Pond Protocol for de novo RNAseq analysis
The Eel Pond protocol (which inspired the eelpond name) included line-by-line commands that the user could follow along with, using a test dataset provided in the instructions. We have re-implemented the protocol here to enable automated de novo transcriptome assembly, annotation, and quick differential expression analysis on short-read Illumina data using a single command. See more about this protocol here.
To test the default workflow:
./run_eelpond examples/nema.yaml default
This will download and run a small set of Nematostella vectensis test data (from Tulin et al., 2013).
Running Your Own Data
To run your own data, you'll need to create two files:
- a tsv file containing your sample info
- a yaml file containing basic configuration info
Generate these by following instructions here: Understanding and Configuring Workflows.
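Before starting a run, it can be worth checking that the sample file really is tab-separated and that no row is missing a field. The sketch below makes no assumption about which columns eelpond expects (see the linked instructions for that). It only checks that every non-empty line has as many tab-separated fields as the first one. `check_tsv` is a hypothetical helper, not part of eelpond.

```shell
# check_tsv FILE: succeed only if every non-empty line of FILE has the
# same number of tab-separated fields as the first non-empty line.
# Hypothetical helper; it does not validate column names or contents.
check_tsv() {
    awk -F '\t' '
        NF == 0 { next }                       # skip empty lines
        !seen   { expected = NF; seen = 1; next }
        NF != expected {
            printf "line %d: expected %d fields, found %d\n", NR, expected, NF
            bad = 1
        }
        END { exit bad + 0 }
    ' "$1"
}
```

For example, `check_tsv my_samples.tsv && echo "sample file looks consistent"`, where `my_samples.tsv` stands in for your own file.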
You can see the available workflows (and which programs they run) by using the --print_workflows flag:
./run_eelpond examples/nema.yaml --print_workflows
- preprocess: Read Quality Trimming and Filtering (fastqc, trimmomatic)
- kmer_trim: Kmer Trimming and/or Digital Normalization (khmer)
- assemble: Transcriptome Assembly (trinity)
- assemblyinput: Specify assembly for downstream steps
- annotate: Annotate the transcriptome (dammit, sourmash)
- quantify: Quantify transcripts (salmon)
- plass_assemble: Assemble at the protein level with PLASS
- paladin_map: Map to a protein assembly using paladin
- default: preprocess, kmer_trim, assemble, annotate, quantify
- protein assembly: preprocess, kmer_trim, plass_assemble, paladin_map
Each included tool can also be run independently, if appropriate input files are provided. See each tool's documentation for details.
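The components of the default workflow can also be stepped through one at a time, which makes it easier to see where a run fails. The sketch below assumes that each workflow name in the list above is accepted in the same position as `default` in the test command. That is an inference from this page, not confirmed behavior. `run_steps` and `DEFAULT_STEPS` are hypothetical names, not part of eelpond.

```shell
# Component workflows of `default`, in the order listed above.
DEFAULT_STEPS="preprocess kmer_trim assemble annotate quantify"

# run_steps CONFIG [RUNNER]: run each component workflow in turn and stop
# at the first failure. RUNNER defaults to ./run_eelpond; passing `echo`
# instead prints the commands' arguments without running anything.
run_steps() {
    config="$1"
    runner="${2:-./run_eelpond}"
    for step in $DEFAULT_STEPS; do
        echo "running workflow: $step"
        "$runner" "$config" "$step" || return 1
    done
}
```

For example, `run_steps examples/nema.yaml echo` previews the five invocations, and `run_steps examples/nema.yaml` would attempt them for real.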
See the help, here: