6. De novo transcriptome assembly using TRINITY
When no reference genome is available to guide the assembly, the transcriptome has to be assembled de novo. This is often the case for non-model organisms.
We used TRINITY for the de novo assembly:
/path_to_trinity/Trinity --seqType fq --SS_lib_type RF \
--left /path_to_normalized_data/reads_left_P_qtrim.fq.normalized_K25_C50_pctSD200.fq \
--right /path_to_normalized_data/reads_right_P_qtrim.fq.normalized_K25_C50_pctSD200.fq \
--CPU 8 --max_memory 100G --output /path_to_transcriptome
The parameters are defined as follows:
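- --seqType fq: the input reads are in FASTQ format.
- --SS_lib_type RF: strand-specific library orientation for paired-end reads. RF means that the first read of each pair (--left) is antisense to the transcript and the second read (--right) is sense, as in dUTP-based library protocols. Omit this option for unstranded libraries.
- --left reads_left_P_qtrim.fq.normalized_K25_C50_pctSD200.fq: left (first-in-pair) reads.
- --right reads_right_P_qtrim.fq.normalized_K25_C50_pctSD200.fq: right (second-in-pair) reads.
- --CPU 8: number of threads to use.
- --max_memory 100G: maximum amount of RAM that TRINITY should use (here 100 GB).
- --output: name of the output directory.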
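Because --left and --right are expected to hold the two mates of the same read pairs, it can be worth confirming that both files contain the same number of reads before starting a long TRINITY run. The following is a minimal sketch of such a check, not part of TRINITY: the function names count_fastq_reads and check_paired_counts are hypothetical, and the sketch assumes uncompressed FASTQ files with exactly four lines per read and a trailing newline.

```shell
# Count reads in an uncompressed FASTQ file.
# Assumption: every record spans exactly 4 lines and the file ends with a newline.
count_fastq_reads() {
    local n_lines
    n_lines=$(wc -l < "$1")
    echo $(( n_lines / 4 ))
}

# Report both read counts and succeed only if they are equal.
check_paired_counts() {
    local left_n right_n
    left_n=$(count_fastq_reads "$1")
    right_n=$(count_fastq_reads "$2")
    echo "left: ${left_n} reads, right: ${right_n} reads"
    [ "$left_n" -eq "$right_n" ]
}

# Example call (file names as in the TRINITY command above):
# check_paired_counts \
#     /path_to_normalized_data/reads_left_P_qtrim.fq.normalized_K25_C50_pctSD200.fq \
#     /path_to_normalized_data/reads_right_P_qtrim.fq.normalized_K25_C50_pctSD200.fq \
#     || echo "Read counts differ: check the trimming/normalization output"
```

Equal counts do not prove that the mates are in the same order, so this only catches the most obvious pairing problems.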
As a result, we obtained a FASTA file containing the assembled transcripts:
Trinity.fasta
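A quick way to get a first impression of the assembly is to count the transcripts and the TRINITY "genes" in this file. The sketch below is only an illustration: the function names count_transcripts and count_genes are hypothetical, and the gene count assumes that the headers follow the TRINITY_DN..._c..._g..._i... naming pattern, in which the part before the final _i<number> identifies the gene. Header naming can differ between TRINITY versions, so check a few headers first.

```shell
# Number of assembled transcripts = number of FASTA header lines.
count_transcripts() {
    grep -c '^>' "$1"
}

# Number of distinct TRINITY "genes".
# Assumption: the first word of each header ends in an isoform suffix
# (_i1, _i2, ...); stripping it leaves the gene identifier.
count_genes() {
    awk '/^>/ {
             id = substr($1, 2)          # drop the leading ">"
             sub(/_i[0-9]+$/, "", id)    # drop the isoform suffix
             seen[id] = 1
         }
         END {
             n = 0
             for (k in seen) n++
             print n
         }' "$1"
}

# Example calls:
# count_transcripts Trinity.fasta
# count_genes Trinity.fasta
```

A transcript count that is much larger than the gene count indicates that many genes are represented by several isoforms.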
To reduce redundancy among transcripts, we kept only the longest isoform of each “gene” identified by TRINITY (unigene), using the “get_longest_isoform_seq_per_trinity_gene.pl” utility distributed with TRINITY:
perl /path_to_trinity/util/misc/get_longest_isoform_seq_per_trinity_gene.pl Trinity.fasta > Trinity.longest.fasta
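After filtering, each gene should be represented by a single sequence in Trinity.longest.fasta. The sketch below is one possible sanity check, not a TRINITY tool: the function name check_one_isoform_per_gene is hypothetical, and it assumes the TRINITY_DN..._c..._g..._i... header pattern, where removing the final _i<number> gives the gene identifier.

```shell
# Print every gene ID that occurs more than once in a FASTA file and
# return a non-zero status if any such gene is found.
# Assumption: the first word of each header ends in an isoform suffix (_i1, _i2, ...).
check_one_isoform_per_gene() {
    awk '/^>/ {
             id = substr($1, 2)          # drop the leading ">"
             sub(/_i[0-9]+$/, "", id)    # drop the isoform suffix
             if (++seen[id] == 2) {      # report each duplicated gene once
                 print id
                 dup = 1
             }
         }
         END { exit (dup ? 1 : 0) }' "$1"
}

# Example call:
# check_one_isoform_per_gene Trinity.longest.fasta && echo "one sequence per gene"
```

If the function prints any gene identifiers, the filtering step did not behave as expected for those genes, or the header format differs from the assumed pattern.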
..................................................................................................................................................
END OF THIS SECTION
..................................................................................................................................................
Next step: 7. Assembly validation using BOWTIE, BUSCO & BLASTX
This practical guide is maintained by Santiago Montero-Mendieta.