6. De novo transcriptome assembly using TRINITY

Santiago Montero-Mendieta edited this page Mar 22, 2017 · 3 revisions

When no reference genome is available to guide the assembly, the transcriptome has to be assembled de novo. This is often the case for non-model organisms.

We used TRINITY for the de novo assembly:

/path_to_trinity/Trinity --seqType fq --SS_lib_type RF \
--left /path_to_normalized_data/reads_left_P_qtrim.fq.normalized_K25_C50_pctSD200.fq \
--right /path_to_normalized_data/reads_right_P_qtrim.fq.normalized_K25_C50_pctSD200.fq \
--CPU 8 --max_memory 100G --output /path_to_transcriptome

The parameters are defined as follows:

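  • --seqType fq: the input reads are in FASTQ format.
  • --SS_lib_type RF: strand-specific library type for paired-end reads. RF means the left (/1) read is antisense to the transcript and the right (/2) read is sense, as produced by dUTP-based library protocols.
  • --left reads_left_P_qtrim.fq.normalized_K25_C50_pctSD200.fq: left (/1) reads.
  • --right reads_right_P_qtrim.fq.normalized_K25_C50_pctSD200.fq: right (/2) reads.
  • --CPU 8: number of CPU threads to use.
  • --max_memory 100G: upper limit on the RAM that TRINITY should use where limiting is possible (here 100 GB).
  • --output /path_to_transcriptome: directory where the results are written.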
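
Because --left and --right must contain the two mates of the same fragments, it can be worth confirming that both files hold the same number of reads before starting a long run. The following is only a minimal sketch: the function names count_fastq_reads and check_pairs are hypothetical, and it assumes uncompressed FASTQ files with exactly four lines per record.

```shell
# Count reads in an uncompressed FASTQ file.
# Assumption: every record spans exactly 4 lines and the file ends with a newline.
count_fastq_reads() {
    echo $(( $(wc -l < "$1") / 4 ))
}

# Compare the read counts of the left and right files.
# Prints the number of pairs on success; returns 1 on a mismatch.
check_pairs() {
    cp_left_n=$(count_fastq_reads "$1")
    cp_right_n=$(count_fastq_reads "$2")
    if [ "$cp_left_n" != "$cp_right_n" ]; then
        echo "read count mismatch: left=$cp_left_n right=$cp_right_n" >&2
        return 1
    fi
    echo "$cp_left_n read pairs"
}
```

For example, check_pairs could be called with the two normalized files passed to --left and --right, and TRINITY launched only if it succeeds.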

As a result, we obtained a FASTA file containing the assembled transcripts:

Trinity.fasta 
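
A quick way to summarize this file is to count the assembled transcripts and the number of TRINITY "genes" they belong to. The sketch below is illustrative: the function names are hypothetical, and it assumes that each transcript ID (the first word of the FASTA header) ends in an isoform suffix of the form _i<N>, so that removing the suffix yields the gene ID.

```shell
# Number of assembled transcripts = number of FASTA header lines.
count_transcripts() {
    grep -c '^>' "$1"
}

# Number of genes: take the first word of each header, strip the
# trailing _i<N> isoform suffix, and count the unique IDs that remain.
count_genes() {
    grep '^>' "$1" \
        | awk '{ id = substr($1, 2); sub(/_i[0-9]+$/, "", id); print id }' \
        | sort -u \
        | wc -l \
        | tr -d ' '
}
```

Running count_transcripts Trinity.fasta and count_genes Trinity.fasta gives the two totals; the difference between them indicates how many sequences are additional isoforms.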

To avoid redundant transcripts, we kept the longest isoform for each “gene” identified by TRINITY (unigene) using the “get_longest_isoform_seq_per_trinity_gene.pl” utility in TRINITY:

perl /path_to_trinity/util/misc/get_longest_isoform_seq_per_trinity_gene.pl Trinity.fasta > Trinity.longest.fasta
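
To make the selection rule concrete, the following sketch reproduces the idea of "longest isoform per gene" with awk. It is not the TRINITY utility and may differ from it in details such as tie-breaking and output formatting. The function name is hypothetical, and it assumes that transcript IDs end in _i<N> and that removing this suffix gives the gene ID.

```shell
# Keep the longest isoform per gene from a Trinity-style FASTA file.
# Ties keep the first isoform seen; each sequence is written on one line.
longest_isoform_per_gene() {
    awk '
        # Store the record just read if it is the longest so far for its gene.
        function save_record() {
            if (header == "") return
            gene = id
            sub(/_i[0-9]+$/, "", gene)              # drop the isoform suffix
            if (!(gene in best_len)) order[++n] = gene   # remember first-seen order
            if (!(gene in best_len) || length(seq) > best_len[gene]) {
                best_len[gene] = length(seq)
                best_hdr[gene] = header
                best_seq[gene] = seq
            }
        }
        /^>/ { save_record(); header = $0; id = substr($1, 2); seq = ""; next }
        { seq = seq $0 }                            # join wrapped sequence lines
        END {
            save_record()
            for (i = 1; i <= n; i++) {
                g = order[i]
                print best_hdr[g]
                print best_seq[g]
            }
        }
    ' "$1"
}
```

The number of sequences written by this sketch equals the number of distinct gene IDs in the input, which is also a useful check on Trinity.longest.fasta.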

..................................................................................................................................................

                                   END OF THIS SECTION

..................................................................................................................................................

Next step: 7. Assembly validation using BOWTIE, BUSCO & BLASTX