dib-MMETSP

Output files available for download:

Transcriptome assemblies (fasta): DOI

Annotations (gff): DOI

Table of the best annotation name for each transcript ID, where best is the top hit after sorting by e-value (e-value < 1e-05) (.csv): DOI

Peptide translations (fasta): DOI

Expression quantification (salmon output): DOI

All files combined: DOI

Pipeline scripts: DOI

Citation:

Johnson, Lisa K., Alexander, Harriet, & Brown, C. Titus. (2018). MMETSP re-assemblies [Data set]. Zenodo. https://doi.org/10.5281/zenodo.740440

MMETSP pipeline

This repository contains the pipeline code used to generate re-assemblies of the Marine Microbial Eukaryote Transcriptome Sequencing Project (MMETSP). Original repository: https://github.com/ljcohen/MMETSP

This pipeline was built to automate the eel pond khmer protocols over a large-scale RNAseq data set. The data set comes from the Marine Microbial Eukaryote Transcriptome Sequencing Project (MMETSP), which contains 678 cultured samples of 306 pelagic and endosymbiotic marine eukaryotic species representing more than 40 phyla (Keeling et al. 2014).

The input file is SraRunInfo.csv, a metadata spreadsheet downloaded from NCBI SRA that contains the URL and sample ID for each record. The scripts were designed for the high-performance computing cluster at Michigan State University (iCER) and are launched in parallel through the Portable Batch System (PBS) scheduler. They use the SraRunInfo.csv metadata to download and extract the data, run quality control (QC), trim reads, apply digital normalization (diginorm), and then assemble with Trinity. If you want to use these scripts, be aware that they will need modifications specific to your system.
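As a rough illustration of the first step, the sketch below reads a run-info table and maps each run accession to its download URL and sample ID. It is not taken from getdata.py. The column names (Run, download_path, SampleName, ScientificName) are assumed to match a typical SraRunInfo.csv export, and the accessions and URLs are placeholders.

```python
import csv
import io

# Tiny stand-in for SraRunInfo.csv; a real export has many more columns.
# Column names are an assumption; accessions and URLs are placeholders.
SAMPLE_CSV = """Run,download_path,SampleName,ScientificName
SRR0000001,https://example.org/SRR0000001,MMETSP0001,Genus species
SRR0000002,https://example.org/SRR0000002,MMETSP0002,Another species
"""


def read_run_info(handle):
    """Map each run accession to its download URL, sample ID, and organism."""
    runs = {}
    for row in csv.DictReader(handle):
        runs[row["Run"]] = {
            "url": row["download_path"],
            "sample": row["SampleName"],
            "organism": row["ScientificName"],
        }
    return runs


runs = read_run_info(io.StringIO(SAMPLE_CSV))
for accession, info in runs.items():
    print(accession, info["sample"], info["url"])
```

To read a real file, pass an open file handle instead of the in-memory string, for example `read_run_info(open("SraRunInfo.csv", newline=""))`.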

The main pipeline scripts in this repository:

  • getdata.py, downloads data from NCBI and organizes it into individual directories for each sample/accession ID
  • trim_qc.py, trims reads for quality and interleaves reads
  • diginorm_mmetsp.py, runs normalize-by-median and filter-abund from khmer, renames reads, and combines orphans
  • assembly.py, runs the Trinity de novo transcriptome assembly software
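For readers unfamiliar with digital normalization, here is a toy sketch of the idea behind normalize-by-median: a read is kept only if the median count of its k-mers, among the reads kept so far, is below a cutoff. This is not khmer's implementation. It uses an exact in-memory counter, ignores reverse complements, and the function names, k, and cutoff values are chosen only for illustration.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Return all overlapping k-mers of a sequence, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def normalize_by_median(reads, k=4, cutoff=2):
    """Keep a read only while the median count of its k-mers is below cutoff."""
    counts = Counter()  # k-mer counts from the reads kept so far
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read is shorter than k
        if median(counts[km] for km in read_kmers) < cutoff:
            kept.append(read)
            counts.update(read_kmers)
    return kept


# Five copies of a high-coverage read plus one read seen only once.
reads = ["ACGTTGCAAG"] * 5 + ["TTTTGGGGCC"]
kept = normalize_by_median(reads, k=4, cutoff=2)
print(len(reads), "reads in,", len(kept), "reads kept")
```

Redundant copies of the high-coverage read are discarded once its k-mers reach the cutoff, while the low-coverage read is retained. That reduction in redundancy is what shrinks the data set before assembly.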

Annotation and expression counts (run separately):

Additional scripts (run separately):

Usage:

  1. Clone this repo:
git clone https://github.com/dib-lab/dib-MMETSP.git
  2. Edit dibMMETSP_configuration.py with absolute path names specific to your system. The file SraRunInfo.csv was obtained from NCBI for BioProject accession PRJNA231566. This code could be used with SraRunInfo.csv input from any collection of SRA records from NCBI or ENA.

  3. Run the main Python script:

python main.py
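Because the configuration expects absolute paths, a quick check before launching can catch a relative path early. The sketch below is a generic illustration only. The setting names (data_dir, run_info) and the dictionary layout are hypothetical and are not taken from dibMMETSP_configuration.py.

```python
import os

# Hypothetical settings; the real dibMMETSP_configuration.py may use
# different names and a different layout.
EXAMPLE_CONFIG = {
    "data_dir": "/mnt/scratch/user/mmetsp/",
    "run_info": "SraRunInfo.csv",  # relative on purpose, to show detection
}


def relative_entries(config):
    """Return the names of settings whose paths are not absolute."""
    return sorted(
        name for name, path in config.items() if not os.path.isabs(path)
    )


problems = relative_entries(EXAMPLE_CONFIG)
if problems:
    print("Not absolute:", ", ".join(problems))
```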

References

Keeling et al. 2014: http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1001889

Supporting information with methods description: http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1001889#s6

Preliminary assembly protocol run by NCGR: https://github.com/ncgr/rbpa

MMETSP website: http://marinemicroeukaryotes.org/

iMicrobe project with data and combined assembly downloads: ftp://ftp.imicrobe.us/projects/104/

Blog posts: https://monsterbashseq.wordpress.com/2016/09/13/mmetsp-re-assemblies/

http://ivory.idyll.org/blog/2016-mmetsp-a-first-look.html
