

Filter and Trim High-throughput Sequencing Reads with Trimmomatic

Goal

A high-throughput sequencing run generates large files that may contain tens of millions of individual sequencing reads. After assessing sequencing quality with a tool such as FastQC, filtering and trimming steps can remove populations of low-quality reads, remove sequencing adaptors, and trim low-quality regions of individual reads. Trimmomatic is a popular program that performs several of these manipulations to prepare reads for downstream analysis.


Prerequisites

Downloads, access, and services

To complete this tutorial, you will need access to the following services/software:

Prerequisite | Preparation/Notes | Link/Download
CyVerse account | You will need a CyVerse account to complete this exercise | Register

Platform(s)

We will use the following CyVerse platform(s):

Platform | Interface | Link | Platform Documentation | Learning Center Documentation
Discovery Environment | Web/Point-and-click | Discovery Environment | DE Manual | Guide

Input and example data

To complete this quickstart, you will need to have the following inputs prepared:

Input File(s) | Format | Preparation/Notes | Example Data
High-throughput sequencing reads | Compressed FASTQ (.fq.gz or .fastq.gz) | No pre-processing of these reads is necessary. | See Trimmomatic inputs
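
Before uploading, it can help to confirm that each file matches the expected input format. The following is a minimal sketch and not part of the Trimmomatic App: the function names are hypothetical, and the content check assumes standard four-line FASTQ records.

```python
import gzip

# Extensions the tutorial lists for compressed FASTQ input.
FASTQ_SUFFIXES = (".fq.gz", ".fastq.gz")


def has_supported_suffix(filename):
    """Return True if the file name ends in .fq.gz or .fastq.gz."""
    return filename.endswith(FASTQ_SUFFIXES)


def looks_like_fastq(lines):
    """Check the first record of an iterable of text lines.

    Assumes four-line records: '@' header, sequence, '+' separator, qualities.
    """
    record = [line.rstrip("\n") for _, line in zip(range(4), lines)]
    if len(record) < 4:
        return False
    header, sequence, separator, qualities = record
    return (
        header.startswith("@")
        and separator.startswith("+")
        and len(sequence) == len(qualities)
    )


def check_file(path):
    """Hypothetical helper: apply both checks to a gzip-compressed file on disk."""
    if not has_supported_suffix(str(path)):
        return False
    with gzip.open(path, "rt") as handle:
        return looks_like_fastq(handle)
```

Calling `check_file` on each file in the folder you plan to select is a quick way to catch an uncompressed or mislabeled file before launching the analysis.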

Get started: Filter, Trim, and Process High-throughput Sequencing Reads with Trimmomatic

Several of the most popular options for Trimmomatic are shown here. For all of the options, and additional details including the ordering of cleaning/filtering steps, see the full Trimmomatic documentation.

  1. Log in to the Discovery Environment.

  2. Click on the 'Data' panel. In the desired directory, click the 'File' menu, select 'Create' and then 'New Plain Text File'. Create a Trimmomatic Settings file by entering the desired Trimmomatic functions (one per line); these set the options used by the Trimmomatic program. Click 'Save' and save the file with a '.txt' extension in the desired directory. See an example Trimmomatic Settings file.

    Hint

    Trimmomatic has several individual functions (see the full Trimmomatic documentation). To specify a function and its parameters, you will usually give the function name followed by a colon-separated set of parameters. Commonly used functions include:

    • "SLIDINGWINDOW:<windowSize>:<requiredQuality>": Perform a sliding window trimming, cutting once the average quality within the window falls below a threshold
    • "LEADING:<quality>": Cut bases off the start of a read, if below a threshold quality
    • "TRAILING:<quality>": Cut bases off the end of a read, if below a threshold quality
    • "MINLEN:<length>": Drop the read if it is below a specified length
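
To make the four functions above more concrete, here is a simplified sketch of what each one does to a read's per-base quality scores. It illustrates the idea only and is not Trimmomatic's actual implementation: the function names are hypothetical, edge cases are handled more simply (for example, reads shorter than the window are returned unchanged), and the thresholds in the example are illustrative.

```python
def trim_leading(quals, quality):
    """LEADING: drop bases from the start while they are below the threshold."""
    start = 0
    while start < len(quals) and quals[start] < quality:
        start += 1
    return quals[start:]


def trim_trailing(quals, quality):
    """TRAILING: drop bases from the end while they are below the threshold."""
    end = len(quals)
    while end > 0 and quals[end - 1] < quality:
        end -= 1
    return quals[:end]


def sliding_window(quals, window_size, required_quality):
    """SLIDINGWINDOW (simplified): cut at the first window whose mean is too low."""
    for i in range(len(quals) - window_size + 1):
        window = quals[i:i + window_size]
        if sum(window) / window_size < required_quality:
            return quals[:i]
    return quals


def passes_minlen(quals, length):
    """MINLEN: keep the read only if it still has at least `length` bases."""
    return len(quals) >= length


# Example: steps are applied in the order they are listed.
read_quals = [2, 30, 32, 35, 34, 30, 12, 10, 6, 2]
step1 = trim_leading(read_quals, 3)
step2 = trim_trailing(step1, 3)
step3 = sliding_window(step2, 4, 15)
print(step3, passes_minlen(step3, 3))
```

In this example the low-quality first and last bases are removed, and the sliding window then cuts the read where the four-base average drops below 15.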

    Additionally, you can provide Trimmomatic with a file containing a list of adaptor sequences to be trimmed.

    • "ILLUMINACLIP:<fastaWithAdaptersEtc>:<seed mismatches>:<palindrome clip threshold>:<simple clip threshold>": Cut adapter and other Illumina-specific sequences from the read.
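
As a rough illustration of the settings file described in step 2, the sketch below assembles one function per line and checks that each line follows the name-plus-colon-separated-parameters form. The helper name and the pattern are assumptions made for this example, and the threshold values are illustrative only, not recommendations.

```python
import re

# One step per line: a function name followed by colon-separated parameters.
STEP_PATTERN = re.compile(r"^[A-Z]+(:[^:\s]+)+$")

# Illustrative values only; choose thresholds that suit your own data.
EXAMPLE_STEPS = [
    "LEADING:3",
    "TRAILING:3",
    "SLIDINGWINDOW:4:15",
    "MINLEN:36",
]


def build_settings(steps):
    """Return the settings file text, one Trimmomatic function per line."""
    for step in steps:
        if not STEP_PATTERN.match(step):
            raise ValueError(f"Not in NAME:param[:param...] form: {step!r}")
    return "\n".join(steps) + "\n"


settings_text = build_settings(EXAMPLE_STEPS)
print(settings_text, end="")
```

The printed lines can be pasted into the plain text file created in step 2, or written to a file with a '.txt' extension and uploaded.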

  3. Click 'Apps' and open the Trimmomatic App: Trimmomatic-programmable-0.36. Name your analysis and, if desired, enter comments and select or adjust the output folder.

  4. Under settings, select 'paired-ended' or 'single-ended'. Under 'Enter a folder of sequencing files:' select a folder containing one or more sequencing files (.fq.gz or .fastq.gz).

  5. Under 'Trimmer settings file in text format' browse to the location of the Trimmomatic settings file you created in step 2.

  6. If you are using the 'ILLUMINACLIP' function, browse to the location of the FASTA file containing Illumina adaptor sequences. (Some relevant Illumina adaptor sequences are available; see Illumina adaptors.)

  7. Click 'Launch Analysis' to launch the analysis. Click the 'Analysis' button to view job status and obtain results.


Summary

Once completed, the Discovery Environment Trimmomatic App will return the trimmed reads:

Paired End Outputs - 4 outputs for each pair (R1/R2) of reads:

Output:
  • trmPr_readname_R1.fq/.fastq (output_forward_paired)
  • trmPr_readname_R2.fq/.fastq (output_reverse_paired)
  • trmS_readname_R1.fq/.fastq (output_forward_unpaired)
  • trmS_readname_R2.fq/.fastq (output_reverse_unpaired)
Description: Every pair of sequence reads will generate a set of paired reads that have been trimmed according to the functions specified in the provided Trimmomatic settings file.
Example: See Example outputs

Single End Outputs - 2 outputs for each pair (R1/R2) of reads:

Output | Description | Example
trimreadname_R1.fq/.fastq | Every sequence will generate a trimmed file. | None provided.

Next Steps:

To confirm that Trimmomatic processing has achieved the desired results, you may wish to evaluate the reads using FastQC.
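
For a quick sanity check before running FastQC, a short script can report how many reads survived and their mean length. This is a minimal sketch, not a replacement for FastQC: the function names are hypothetical, and it assumes standard four-line FASTQ records.

```python
import gzip


def open_fastq(path):
    """Open a FASTQ file as text, whether or not it is gzip-compressed."""
    path = str(path)
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def fastq_stats(lines):
    """Return (read count, mean read length) for an iterable of FASTQ lines.

    Assumes four-line records, so the sequence is the second line of each record.
    """
    n_reads = 0
    total_bases = 0
    for index, line in enumerate(lines):
        if index % 4 == 1:
            n_reads += 1
            total_bases += len(line.strip())
    mean_length = total_bases / n_reads if n_reads else 0.0
    return n_reads, mean_length


# Example use on a downloaded output file (file name follows the pattern above):
# with open_fastq("trmPr_readname_R1.fq") as handle:
#     print(fastq_stats(handle))
```

For paired-end runs, the two trmPr files should report the same number of reads, since they hold the pairs in which both reads survived trimming.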


Additional information, help

Search for an answer: CyVerse Learning Center or CyVerse Wiki

Post your question to the user forum: Ask CyVerse


