Bioinformatic pipelines for mumps genome sequencing
Louise Moncla1, Allison Black1,2, Trevor Bedford1
1Department of Epidemiology, University of Washington, Seattle, WA, USA, 2Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
Overview of bioinformatic processing of mumps sequencing reads
- Adapter and quality trimming with Trimmomatic
- Mapping with bowtie2
- Manual inspection of mapping and consensus genome calling with Geneious
- Re-mapping fastq files to the called consensus with bowtie2
Trimming was performed with Trimmomatic to remove Illumina adapter sequences and low-quality read ends. Reads were trimmed using a 5 bp sliding window with a minimum average quality score of Q30, and trimmed reads shorter than 100 bp were discarded, using the following command:
java -jar Trimmomatic-0.36/trimmomatic-0.36.jar SE input.fastq output.fastq ILLUMINACLIP:Nextera_XT_adapter.fa:1:30:10 SLIDINGWINDOW:5:30 MINLEN:100
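As a quick sanity check on the trimming output, the sketch below counts the reads in a FASTQ file and how many of them fall under a minimum length. It is an illustration, not part of the original pipeline: the function name `fastq_length_check` is hypothetical, and it assumes a standard four-line-per-record FASTQ with unwrapped sequence lines.

```shell
# Hypothetical helper (not from the original pipeline): report the number of
# reads in a FASTQ file and the number shorter than a minimum length.
# Assumes 4 lines per record with the sequence on the second line.
fastq_length_check() {
  # $1: FASTQ file
  # $2: minimum read length (defaults to 100, mirroring MINLEN:100)
  awk -v min="${2:-100}" '
    NR % 4 == 2 {                      # second line of each record is the sequence
      reads++
      if (length($0) < min) short++
    }
    END { printf "%d\t%d\n", reads, short }   # total reads, reads below min
  ' "$1"
}
```

Running `fastq_length_check output.fastq` on a file produced with MINLEN:100 should report zero in the second column.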
We used a genome from the mumps outbreak in Massachusetts as a reference sequence. We mapped our trimmed reads to that reference with bowtie2 in local alignment mode (the -x argument is the basename of a bowtie2 index built from the reference with bowtie2-build), with the following command:
bowtie2 -x reference_sequence.fasta -U read1.trimmed.fastq,read2.trimmed.fastq -S output.sam --local
The mapping (bam) file was manually inspected in Geneious.
Consensus sequence calling
Consensus sequences were called in Geneious, with sites covered at less than 100x depth called as N. Consensus genomes were exported in fasta format.
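The coverage masking rule can be sketched outside Geneious as follows. This is only an illustration of the <100x rule, not the procedure actually used: the function name `mask_low_coverage` is hypothetical, the depth table is assumed to be three tab-separated columns (sequence name, 1-based position, depth, the layout written by `samtools depth -a`), and the FASTA is assumed to hold a single record in the same coordinates as the depth table.

```shell
# Hypothetical helper (not from the original pipeline): replace bases at
# positions with depth below a threshold by N.
# Assumes a non-empty depth table and a single-record FASTA that share
# one coordinate system.
mask_low_coverage() {
  # $1: depth table (name <TAB> 1-based position <TAB> depth)
  # $2: single-record FASTA
  # $3: minimum depth (defaults to 100)
  awk -v min="${3:-100}" '
    NR == FNR { depth[$2] = $3; next }   # first file: store depth per position
    /^>/      { print; next }            # FASTA header passes through unchanged
    {
      out = ""
      for (i = 1; i <= length($0); i++) {
        pos++                            # running position across wrapped lines
        base = substr($0, i, 1)
        # positions missing from the depth table are treated as depth 0
        out = out ((depth[pos] + 0 < min) ? "N" : base)
      }
      print out
    }
  ' "$1" "$2"
}
```

A position with depth exactly equal to the threshold is kept, since only sites below 100x are masked.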
To avoid artifacts from mapping to a poorly matched reference sequence, we then remapped each sample's fastq files to its own consensus sequence. These bam files were again manually inspected in Geneious, and a final consensus sequence was called. Consensus genomes with at least 80% full-genome coverage are available here as fasta files.
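One way to check the 80% threshold is to compute, for each consensus, the fraction of bases that are not N. The sketch below does this under stated assumptions: the function name `genome_coverage` is hypothetical, and coverage is taken to be called (non-N) bases divided by the length of the consensus record, which may differ from how coverage was measured for the released genomes.

```shell
# Hypothetical helper (not from the original pipeline): print each FASTA
# record name with the fraction of its bases that are not N.
# Reads the files given as arguments, or standard input if none are given.
genome_coverage() {
  awk '
    function report() {
      if (name != "") printf "%s\t%.3f\n", name, (len ? called / len : 0)
    }
    /^>/ { report(); name = substr($1, 2); len = 0; called = 0; next }
    {
      seq = toupper($0)
      len += length(seq)           # total bases, including N
      gsub(/N/, "", seq)
      called += length(seq)        # bases remaining after removing N
    }
    END { report() }
  ' "$@"
}
```

Piping the output through `awk '$2 >= 0.8 { print $1 }'` lists the records that meet the 80% cutoff.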