6. De novo transcriptome assembly using TRINITY

When no reference genome is available to guide the assembly, the transcriptome has to be assembled de novo. This is often the case for non-model organisms.

We used TRINITY for the de-novo assembly:

/path_to_trinity/Trinity --seqType fq --SS_lib_type RF \
--left /path_to_normalized_data/reads_left_P_qtrim.fq.normalized_K25_C50_pctSD200.fq \
--right /path_to_normalized_data/reads_right_P_qtrim.fq.normalized_K25_C50_pctSD200.fq \
--CPU 8 --max_memory 100G --output /path_to_transcriptome

The parameters are defined as follows:

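--seqType fq: the reads are in FASTQ format.
--SS_lib_type RF: orientation of a strand-specific paired-end library. RF means the first read of each pair is antisense to the transcript and the second read is sense, as is typical of dUTP-based protocols.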
--left reads_left_P_qtrim.fq.normalized_K25_C50_pctSD200.fq: left reads (first read of each pair).
--right reads_right_P_qtrim.fq.normalized_K25_C50_pctSD200.fq: right reads (second read of each pair).
--CPU 8: number of threads to use.
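--max_memory 100G: upper limit on the RAM that TRINITY should use (here, 100 GB).
--output /path_to_transcriptome: directory where the output is written. Recent TRINITY versions expect the directory name to contain the word "trinity".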
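
Because --left and --right must describe the same read pairs, it can be worth checking that both files hold the same number of reads before starting a long assembly. The following is a minimal sketch, not part of TRINITY: the function names count_fastq_reads and check_paired_counts are hypothetical, and it assumes uncompressed FASTQ files with exactly four lines per record.

```shell
# Number of reads in an uncompressed FASTQ file.
# Assumption: every record spans exactly 4 lines.
count_fastq_reads() {
    echo $(( $(wc -l < "$1") / 4 ))
}

# Print both read counts and succeed only if they are equal.
check_paired_counts() {
    left_n=$(count_fastq_reads "$1")
    right_n=$(count_fastq_reads "$2")
    echo "left=$left_n right=$right_n"
    [ "$left_n" -eq "$right_n" ]
}

# Usage (paths as in the TRINITY command above):
#   check_paired_counts \
#     /path_to_normalized_data/reads_left_P_qtrim.fq.normalized_K25_C50_pctSD200.fq \
#     /path_to_normalized_data/reads_right_P_qtrim.fq.normalized_K25_C50_pctSD200.fq
```

Equal counts do not prove that the pairs are in the same order, so this is only a quick first check.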

As a result, we obtained a FASTA file containing the assembled transcripts:

Trinity.fasta
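
A quick way to inspect this file is to count the assembled transcripts and the TRINITY "genes" they belong to. The sketch below makes one assumption about the headers: that they follow the TRINITY v2 style, for example >TRINITY_DN1000_c115_g5_i1, where the trailing _i<number> marks the isoform. Older TRINITY releases name sequences differently, and the function names are hypothetical.

```shell
# Count assembled transcripts: one FASTA header line per transcript.
count_transcripts() {
    grep -c '^>' "$1"
}

# Count TRINITY "genes": take the ID from each header, strip the
# trailing _i<N> isoform suffix, and count the distinct remainders.
# Assumption: headers look like >TRINITY_DN<n>_c<n>_g<n>_i<n> ...
count_genes() {
    grep '^>' "$1" \
        | awk '{print $1}' \
        | sed -E 's/^>//; s/_i[0-9]+$//' \
        | sort -u \
        | wc -l \
        | tr -d ' '
}

# Usage:
#   count_transcripts Trinity.fasta
#   count_genes Trinity.fasta
```

The gap between the two numbers gives a rough idea of how many sequences are alternative isoforms of the same gene.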

To avoid redundant transcripts, we kept the longest isoform for each “gene” identified by TRINITY (unigene) using the “get_longest_isoform_seq_per_trinity_gene.pl” utility in TRINITY:

perl /path_to_trinity/util/misc/get_longest_isoform_seq_per_trinity_gene.pl Trinity.fasta > Trinity.longest.fasta
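
To make the selection rule concrete, the idea of "longest isoform per gene" can be sketched in awk. This is an illustration only and not how the TRINITY script itself is implemented. It assumes TRINITY v2-style headers in which the gene ID is the sequence ID without its trailing _i<number> suffix; the function name is hypothetical.

```shell
# Illustration only: keep the longest isoform for each TRINITY gene.
# Assumption: header IDs look like TRINITY_DN<n>_c<n>_g<n>_i<n>.
longest_isoform_per_gene() {
    awk '
        # Remember the current record if it is the longest seen for its gene.
        function keep_if_longest() {
            if (id != "" && length(seq) > best_len[gene]) {
                best_len[gene] = length(seq)
                best_hdr[gene] = hdr
                best_seq[gene] = seq
            }
        }
        /^>/ {
            keep_if_longest()              # finish the previous record
            hdr = $0
            id = substr($1, 2)             # ID without the leading ">"
            gene = id
            sub(/_i[0-9]+$/, "", gene)     # drop the isoform suffix
            seq = ""
            next
        }
        { seq = seq $0 }                   # join wrapped sequence lines
        END {
            keep_if_longest()
            # Output order across genes is not defined.
            for (g in best_hdr) print best_hdr[g] "\n" best_seq[g]
        }
    ' "$1"
}

# Usage:
#   longest_isoform_per_gene Trinity.fasta | grep -c '^>'
```

The count printed by the usage line should match the number of sequences in Trinity.longest.fasta if the header assumption holds.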

..................................................................................................................................................

                                   END OF THIS SECTION

..................................................................................................................................................

Next step: 7. Assembly validation using BOWTIE, BUSCO & BLASTX
