Transcript assembly

This data set is part of the Polistes dominula genome project, and details the assembly of the P. dominula transcriptome, as described in (Standage et al., Molecular Ecology, 2016). Included in this data set is the final transcript assembly itself, as well as documentation providing complete disclosure of the assembly procedure.

Synopsis

RNA-Seq reads were processed using Trimmomatic version 0.22 to remove adapter contamination and low-confidence base calls, and the processed reads were assembled using Trinity version r20131110. The assembled transcripts were then processed with mRNAmarkup version 10-3-2013 to remove contaminants and correct erroneously assembled chimeric transcripts.

Data access

Raw Illumina data is available from the NCBI Sequence Read Archive under the accession number PRJNA291219. The GetTranscriptomeSRA.make script automates the process of downloading these data files and converting them from SRA format to Fastq format. This script in turn depends on the fastq-dump command included in the SRA toolkit binary distribution.

./GetTranscriptomeSRA.make

Procedure

Short read quality control

First, we designated the number of processors available, and provided the path of the trimmomatic-0.22.jar file distributed with the Trimmomatic source code distribution.

NumProcs=16
TrimJar=/usr/local/src/Trimmomatic-0.22/trimmomatic-0.22.jar

We then applied the following filters to each read pair (see run-trim.sh for details).

remove adapter contamination
remove any nucleotides at either end of the read whose quality score is below 3
trim the read once the average quality in a 5bp sliding window falls below 20
discard any reads which, after processing, fall below the 40bp length threshold

for caste in q w
do
  for rep in {1..6}
  do
    sample=${caste}${rep}
    ./run-trim.sh $sample $TrimJar $NumProcs
  done
done

Assembly with Trinity

Trinity requires a pair of input files for paired-end data.

cat pdom-rnaseq-*-trim-1.fq > pdom-rnaseq-all-trim-1.fq
cat pdom-rnaseq-*-trim-2.fq > pdom-rnaseq-all-trim-2.fq

We then executed the Trinity assember using the --CuffFly algorithm.

Trinity.pl --seqType fq \
           --JM 100G \
           --bflyHeapSpaceMax 50G \
           --output pdom-trinity \
           --CPU $NumProcs \
           --left pdom-rnaseq-all-trim-1.fq \
           --right pdom-rnaseq-all-trim-2.fq \
           --full_cleanup \
           --jaccard_clip \
           --CuffFly

Post-processing with mRNAmarkup

Contaminant, reference protein, and miRNA databases were collected as described in the mRNAmarkup documentation (db/0README and db/0README-hy). The mRNAmarkup procedure was then run on the Trinity output. Be sure to edit the mRNAmarkup.conf file with the correct paths to the databases.

mRNAmarkup -c mRNAmarkup.conf \
           -i pdom-trinity/Trinity.fasta \
           -o output-mRNAmarkup

This work is licensed under a Creative Commons Attribution 4.0 International License.

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
GetTranscriptomeSRA.make		GetTranscriptomeSRA.make
LICENSE.txt		LICENSE.txt
README.md		README.md
illumina-std-adapters.fa		illumina-std-adapters.fa
pdom-tsa-r1.2.fa.gz		pdom-tsa-r1.2.fa.gz
run-trim.sh		run-trim.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

GetTranscriptomeSRA.make

GetTranscriptomeSRA.make

LICENSE.txt

LICENSE.txt

README.md

README.md

illumina-std-adapters.fa

illumina-std-adapters.fa

pdom-tsa-r1.2.fa.gz

pdom-tsa-r1.2.fa.gz

run-trim.sh

run-trim.sh

Repository files navigation

Transcript assembly

Synopsis

Data access

Procedure

Short read quality control

Assembly with Trinity

Post-processing with mRNAmarkup

About

Releases

Packages

Languages

License

PdomGenomeProject/transcript-assembly

Folders and files

Latest commit

History

Repository files navigation

Transcript assembly

Synopsis

Data access

Procedure

Short read quality control

Assembly with Trinity

Post-processing with mRNAmarkup

About

Resources

License

Stars

Watchers

Forks

Languages