Bioinformatic pipelines for mumps genome sequencing

Louise Moncla¹, Allison Black¹,², Trevor Bedford¹

¹Department of Epidemiology, University of Washington, Seattle, WA, USA; ²Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA

Overview of bioinformatic processing of mumps sequencing reads

  1. Adapter and quality trimming with Trimmomatic
  2. Mapping with bowtie2
  3. Manual inspection of mapping and consensus genome calling with Geneious
  4. Re-mapping fastq files to the called consensus with bowtie2

Trimming

Trimming was performed with Trimmomatic to remove Illumina adapter sequences and low-quality read ends. Reads were scanned with a 5 bp sliding window and cut where the average quality within the window dropped below Q30. Trimmed reads shorter than 100 bp were discarded. The following command was used:

    java -jar Trimmomatic-0.36/trimmomatic-0.36.jar SE input.fastq output.fastq ILLUMINACLIP:Nextera_XT_adapter.fa:1:30:10 SLIDINGWINDOW:5:30 MINLEN:100
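To apply the same trimming settings to several samples, the command can be wrapped in a small shell function. This is only a sketch: the jar and adapter paths are taken from the command above, while the function name `trim_one`, the `.trimmed.fastq` output naming, and the `raw_reads/` directory are assumptions made for illustration.

```shell
# Assumed locations; adjust to your installation.
TRIMMOMATIC_JAR="Trimmomatic-0.36/trimmomatic-0.36.jar"
ADAPTERS="Nextera_XT_adapter.fa"

# Trim one single-end FASTQ file with the settings quoted above.
# $1 = input FASTQ; output is written as <name>.trimmed.fastq (assumed naming).
trim_one() {
    java -jar "$TRIMMOMATIC_JAR" SE "$1" "${1%.fastq}.trimmed.fastq" \
        "ILLUMINACLIP:${ADAPTERS}:1:30:10" SLIDINGWINDOW:5:30 MINLEN:100
}

# Usage (hypothetical directory of raw reads):
#   for fq in raw_reads/*.fastq; do trim_one "$fq"; done
```

In the `ILLUMINACLIP` step, `1:30:10` sets the seed mismatches, palindrome clip threshold, and simple clip threshold, in that order.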

Mapping

We used a genome from the mumps outbreak in Massachusetts as a reference sequence. We performed a local mapping of our trimmed reads to that reference using bowtie2, with the following command:

    bowtie2 -x reference_sequence.fasta -U read1.trimmed.fastq,read2.trimmed.fastq -S output.sam --local
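bowtie2 expects `-x` to name an index built beforehand with `bowtie2-build`. The README does not show that step, so the sketch below assumes the index was built with the FASTA filename as its basename, which is one way to make `-x reference_sequence.fasta` resolve. The function names are hypothetical.

```shell
# Build a bowtie2 index whose basename equals the FASTA filename,
# producing files such as reference_sequence.fasta.1.bt2.
# $1 = reference FASTA
build_index() {
    bowtie2-build "$1" "$1"
}

# Local alignment of unpaired reads, mirroring the command in the text.
# $1 = index basename, $2 = comma-separated FASTQ files, $3 = output SAM
map_local() {
    bowtie2 -x "$1" -U "$2" -S "$3" --local
}

# Usage:
#   build_index reference_sequence.fasta
#   map_local reference_sequence.fasta read1.trimmed.fastq,read2.trimmed.fastq output.sam
```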

The mapping (bam) file was manually inspected in Geneious.
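The bowtie2 command writes SAM, while the file inspected is described as bam. The README does not say how the conversion was done. One common route, shown here as an assumption rather than the authors' method, is to sort and index with samtools (version 1.0 or later). The function name and file names are illustrative.

```shell
# Sort a SAM file by coordinate, write it as BAM, and index it.
# $1 = input SAM, $2 = output BAM
sam_to_sorted_bam() {
    samtools sort -o "$2" "$1" && samtools index "$2"
}

# Usage:
#   sam_to_sorted_bam output.sam output.sorted.bam
```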

Consensus sequence calling

Consensus sequences were called in Geneious, with nucleotide sites with <100x coverage called as Ns. Consensus genomes were exported in fasta format.
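Consensus calling was done in Geneious, so there is no command to reproduce here. As a rough command-line cross-check of which sites fall below the 100x threshold, per-site depth from `samtools depth -a` can be filtered with awk. This sketch only lists the low-coverage sites as BED intervals; it is not the authors' procedure, and the function name is hypothetical.

```shell
# Print BED intervals (0-based start, 1-based end) for sites whose depth is
# below a threshold. Reads three-column text on stdin, as produced by
# `samtools depth -a` (reference name, 1-based position, depth).
# $1 = minimum depth (defaults to 100, the threshold stated in the text)
low_coverage_bed() {
    awk -v min="${1:-100}" 'BEGIN { OFS = "\t" } $3 < min { print $1, $2 - 1, $2 }'
}

# Usage:
#   samtools depth -a sample.sorted.bam | low_coverage_bed 100 > low_coverage.bed
```

The resulting BED file could then be passed to a masking tool, for example `bedtools maskfasta`, to replace those sites with Ns.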

Remapping

To avoid issues with mapping to improper reference sequences, we then remapped each sample's fastq files to its own consensus sequence. These bam files were again manually inspected in Geneious, and a final consensus sequence was called. Consensus genomes that cover at least 80% of the full genome are available here as fasta files.
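The 80% criterion can be checked directly on a consensus fasta file. The sketch below assumes that coverage means the fraction of bases that are not N and that the consensus spans the full reference length. Both are interpretations, since the README does not define the calculation. Function and file names are hypothetical.

```shell
# Print the fraction of non-N bases in a FASTA file (0.0000 if it is empty).
# $1 = consensus FASTA
fraction_called() {
    awk '!/^>/ { seq = toupper($0); total += length(seq); n += gsub(/N/, "", seq) }
         END { if (total > 0) printf "%.4f\n", (total - n) / total; else print "0.0000" }' "$1"
}

# Succeed (exit status 0) if the called fraction is at least the threshold.
# $1 = consensus FASTA, $2 = minimum fraction (defaults to 0.80)
meets_coverage() {
    fraction_called "$1" | awk -v min="${2:-0.80}" '{ exit !($1 >= min) }'
}

# Usage:
#   for fa in consensus/*.fasta; do
#       meets_coverage "$fa" && echo "$fa passes"
#   done
```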