---
title: "Whole Genome Sequencing of <i>Ascochyta rabiei</i> Isolates"
author: "Ido Bar"
date: "18 July 2017"
always_allow_html: true
output:
bibliography: style/Fungal_genomes.bib
csl: style/springer-basic-improved-author-date-with-italic-et-al-period.csl
---
In 2017, DNA was extracted from 21 strains of *Ascochyta rabiei* and sent for whole-genome sequencing (WGS) on an Illumina HiSeq2500, producing 100 bp paired-end short reads (Macrogen, Korea).
In the following year (2018), DNA from 20 additional *A. rabiei* isolates was extracted and sent for WGS, first to AgriBio, Centre for AgriBioscience, Agriculture Victoria Research (AgriVic), for sequencing on a HiSeq3000, producing 150 bp paired-end reads. Because library preparation and sequencing were substantially delayed, 18 DNA samples, mostly overlapping with the 20 samples sent to AgriVic, were also sent for sequencing at the Australian Genome Research Facility (AGRF, Melbourne) on 4 lanes of a NextSeq500 flowcell, producing 150 bp paired-end reads (run name CAGRF19461).
Details of the sequenced isolates are provided in Table 1.
Isolate | Site | State | Collection Year | Host Cultivar | Host | Region | Haplotype | Pathotype |
---|---|---|---|---|---|---|---|---|
TR9529 | Chinchilla | QLD | 2017 | PBA Seamer | Seamer | Unknown | Unknown | Extreme |
TR9571 | Gurley | NSW | 2017 | PBA Seamer | Seamer | Reg3 | ARH09 | Extreme |
TR9573 | Gurley | NSW | 2017 | PBA Seamer | Seamer | Reg2 | ARH01 | Extreme |
F17191-1 | Pt Broughton | SA | 2017 | Genesis090 | Genesis090 | Unknown | Unknown | Extreme |
F17076-2 | Finley | NSW | 2017 | Genesis090 | Genesis090 | Unknown | Unknown | Very High |
TR9543 | Fox Holes | QLD | 2017 | PBA Seamer | Seamer | Unknown | Unknown | Very High |
16CUR018 | Curyo | VIC | 2016 | Genesis090 | Genesis090 | Unknown | Unknown | Very High |
15CUR002 | Curyo | VIC | 2015 | Genesis090 | Genesis090 | Reg5 | ARH02 | Very High |
15CUR005 | Curyo | VIC | 2015 | Genesis090 | Genesis090 | Reg5 | ARH20 | Very High |
TR6417 | Yallaroi | NSW | 2014 | PBA HatTrick | HatTrick | Reg3 | ARH01 | Very High |
FT13092-2 | Kingsford | SA | 2013 | Genesis090 | Genesis090 | Reg6 | ARH04 | Very High |
17CUR007 | Curyo | VIC | 2017 | Genesis090 | Genesis090 | Reg5 | ARH01 | High |
TR9568 | Gurley | NSW | 2017 | PBA Seamer | Seamer | Reg3 | ARH09 | High |
16CUR017 | Curyo | VIC | 2016 | Genesis090 | Genesis090 | Reg5 | ARH01 | High |
16CUR019 | Curyo | VIC | 2016 | Genesis090 | Genesis090 | Reg5 | ARH01 | High |
F16083-1 | Moonta | SA | 2016 | Genesis090 | Genesis090 | Reg6 | ARH01 | High |
F16253-1 | Pt Broughton | SA | 2016 | Genesis090 | Genesis090 | Unknown | Unknown | High |
15DON007 | Donald | VIC | 2015 | Slasher | Slasher | Reg5 | ARH01 | High |
FT15023 | Moonta | SA | 2015 | Genesis090 | Genesis090 | Unknown | Unknown | High |
FT15025 | Moonta | SA | 2015 | Genesis090 | Genesis090 | Unknown | Unknown | High |
FT15028 | Weetula | SA | 2015 | Genesis090 | Genesis090 | Unknown | Unknown | High |
FT15029 | Weetula | SA | 2015 | Genesis090 | Genesis090 | Unknown | Unknown | High |
FT15030 | Weetula | SA | 2015 | Genesis090 | Genesis090 | Reg6 | ARH01 | High |
FT13092-4 | Kingsford | SA | 2013 | Genesis090 | Genesis090 | Reg6 | ARH01 | High |
16CUR015 | Curyo | VIC | 2016 | Genesis090 | Genesis090 | Reg5 | ARH14 | Moderate |
TR8102 | Narromine | NSW | 2016 | PBA HatTrick | HatTrick | Reg4 | ARH01 | Moderate |
15CUR001 | Curyo | VIC | 2015 | Genesis090 | Genesis090 | Unknown | Unknown | Moderate |
FT13092-6 | Kingsford | SA | 2013 | Genesis090 | Genesis090 | Reg6 | ARH01 | Moderate |
16RUP012 | Rupanyup | VIC | 2016 | Genesis090 | Genesis090 | Reg5 | ARH01 | Medium |
16RUP013 | Rupanyup | VIC | 2016 | Genesis090 | Genesis090 | Reg5 | ARH01 | Medium |
TR8105 | Strathdoon, Narromine | NSW | 2016 | PBA HatTrick | HatTrick | Reg4 | ARH01 | Medium |
15CUR003 | Curyo | VIC | 2015 | Genesis090 | Genesis090 | Reg5 | ARH20 | Medium |
14DON003 | Donald | VIC | 2014 | Slasher | Slasher | Reg5 | ARH01 | Medium |
TR6400 | Yallaroi | NSW | 2014 | PBA HatTrick | HatTrick | Reg3 | ARH04 | Medium |
F17067-1 | Coonalpyn | SA | 2017 | Genesis090 | Genesis090 | Unknown | Unknown | Low |
F17175-1 | Elmore | VIC | 2017 | Genesis090 | Genesis090 | Unknown | Unknown | Low |
TR9544 | Fox Holes | QLD | 2017 | PBA Seamer | Seamer | Unknown | Unknown | Low |
TR9538 | Gravel Pit Hill | QLD | 2017 | PBA Seamer | Seamer | Unknown | Unknown | Low |
15DON001 | Donald | VIC | 2015 | Genesis090 | Genesis090 | Reg5 | ARH01 | Low |
TR6408 | Yallaroi | NSW | 2014 | PBA HatTrick | HatTrick | Reg3 | ARH01 | Low |
The main objectives of this analysis are to:

- Identify strain-unique variants to develop detection methods
- Associate aggressiveness with specific variants
The analysis workflow consists of the following steps:

- Data pre-processing:
    a. Quality check
    b. Adaptor trimming
    c. Post-trim quality check
- Mapping reads to a reference genome (keeping unmapped reads)
- Read deduplication
- Variant calling and filtration
- Variant annotation
- Variant-pathogenicity association
- Variant statistics and assessment
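Read deduplication can be illustrated with a minimal Python sketch of the idea behind duplicate-marking tools such as Picard MarkDuplicates: read pairs whose two mates start at the same 5' positions on the same strands are treated as PCR duplicates of a single fragment, and only the pair with the highest summed base quality is kept. The `ReadPair` record and its fields are hypothetical simplifications (no soft-clip adjustment, library grouping, optical-duplicate detection or unpaired reads), and the sketch is not the deduplication code used in this workflow.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ReadPair:
    """Hypothetical, simplified record for an aligned read pair."""
    name: str
    chrom: str
    pos1: int      # 5' alignment position of mate 1 (end coordinate if on '-')
    strand1: str   # '+' or '-'
    pos2: int      # 5' alignment position of mate 2
    strand2: str
    qual_sum: int  # sum of base qualities across both mates


def dedup_key(pair: ReadPair) -> Tuple:
    """Pairs sharing this key are treated as duplicates of one fragment.

    The two mate ends are sorted so the key does not depend on which
    mate was sequenced as read 1.
    """
    ends = sorted([(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)])
    return (pair.chrom, ends[0], ends[1])


def remove_duplicates(pairs: List[ReadPair]) -> List[ReadPair]:
    """Keep only the highest-quality pair from each duplicate group."""
    best: Dict[Tuple, ReadPair] = {}
    for pair in pairs:
        key = dedup_key(pair)
        if key not in best or pair.qual_sum > best[key].qual_sum:
            best[key] = pair
    return list(best.values())


if __name__ == "__main__":
    reads = [
        ReadPair("r1", "chr1", 100, "+", 350, "-", 6000),
        ReadPair("r2", "chr1", 100, "+", 350, "-", 6400),  # duplicate of r1, higher quality
        ReadPair("r3", "chr1", 350, "-", 100, "+", 5000),  # same fragment, mates swapped
        ReadPair("r4", "chr1", 120, "+", 380, "-", 5900),  # distinct fragment
    ]
    kept = remove_duplicates(reads)
    print(sorted(p.name for p in kept))  # ['r2', 'r4']
```

Sorting the mate ends makes the grouping independent of read order, which is why `r3` collapses into the same group as `r1` and `r2`.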
DNA-Seq data processing, mapping and variant calling were performed on the Griffith University Gowonda HPC Cluster (using the Torque scheduler), following the methods described by @hagiwara_whole-genome_2014 (see details in Appendix 2), @haas_approaches_2011, @hittalmani_novo_2016 and @verma_draft_2016, modified to use FreeBayes v1.2.0 [@garrison_haplotype-based_2012] to assign variant probability scores and call variants.
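FreeBayes reports its variant probability score as a Phred-scaled value in the `QUAL` column of its VCF output; following the VCF convention, this is approximately $\mathrm{QUAL} = -10 \log_{10} P(\text{site is not variant})$. The minimal Python sketch below converts between probabilities and Phred scores and applies a `QUAL` threshold to a list of calls. The `(chrom, pos, QUAL)` tuples and the default threshold of 20 (a 1% probability that the site is not variant) are illustrative assumptions, not the filtration settings used in this analysis.

```python
import math
from typing import List, Tuple

# Hypothetical minimal representation of a variant call: (chrom, pos, QUAL).
Call = Tuple[str, int, float]


def phred_from_prob(p_error: float) -> float:
    """Phred-scale an error probability: Q = -10 * log10(p_error)."""
    if not 0.0 < p_error <= 1.0:
        raise ValueError("p_error must be in (0, 1]")
    return -10.0 * math.log10(p_error)


def prob_from_phred(qual: float) -> float:
    """Invert the Phred scale: p_error = 10 ** (-Q / 10)."""
    return 10.0 ** (-qual / 10.0)


def filter_by_qual(calls: List[Call], min_qual: float = 20.0) -> List[Call]:
    """Keep calls whose QUAL meets the threshold.

    min_qual=20 is an illustrative default, not the project's setting.
    """
    return [call for call in calls if call[2] >= min_qual]


if __name__ == "__main__":
    print(phred_from_prob(0.01))  # 20.0
    print(prob_from_phred(30))    # approximately 0.001
    calls = [("contig1", 1500, 12.3), ("contig1", 2200, 45.0)]
    print(filter_by_qual(calls))  # [('contig1', 2200, 45.0)]
```

Because the scale is logarithmic, each additional 10 QUAL units corresponds to a tenfold lower probability that the call is wrong.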
An alternative approach was tested using a complete suite of tools from BBTools [v38.22; @bushnell_bbmap:_2014] (see the official download page on SourceForge, the user guide and the SEQanswers thread).
Detailed methods, including code for running each of the analysis steps, are provided in the associated A_rabiei_WGS_analysis GitHub repository.
- Whole-Genome Comparison of *Aspergillus fumigatus* Strains Serially Isolated from Patients with Aspergillosis [@hagiwara_whole-genome_2014]:
Sequence analysis: The Illumina data sets were trimmed using fastq-mcf in ea-utils (version 1.1.2-484), i.e., sequencing adapters and sequences with low quality scores (Phred score [Q], <30) were removed (24). The data sets were mapped to the genome sequence of the A. fumigatus genome reference strain Af293 (29,420,142 bp, genome version s03-m04-r03) (25, 26) using Bowtie 2 (version 2.0.0-beta7) with the very sensitive option in end-to-end mode (27). Duplicated reads were removed using Picard (version 1.112) (http://picard.sourceforge.net). The programs mpileup and bcftools from SAMtools (version 0.1.19-44428cd) were used to perform further quality controls. In mpileup, the -q20 argument was used to trim reads with low-quality mapping, whereas the argument -Q30 was used to trim low-quality bases at the 3' end (28). The bcftools setting was set to -c in order to call variants using Bayesian inference. Consensus and single nucleotide polymorphisms (SNPs) were excluded if they did not meet a minimum coverage of 5x or if the variant was present in <90% of the base calls (29, 30). The genotype field in the variant call format (VCF) files indicates homozygote and heterozygote probabilities as Phred-scaled likelihoods. SNPs were excluded if they were called as heterozygous genotypes using SAMtools. The mapping results were visualized in the Integrative Genomics Viewer (version 2.3.3) (31, 32). The reference genome data included information on open reading frames and annotations, from which the SNPs were designated non-synonymous or synonymous.
Single nucleotide mutations were confirmed by Sanger sequencing. Regions of approximately 400 bp that contained a mutation were amplified with appropriately designed primer pairs and then sequenced. The primer sequences are listed in Table S1 in the supplemental material and were named as follows. For verification of the SNPs in strains from patient I or patient II, PaI or PaII was added to the primer name, respectively. For non-synonymous SNPs, synonymous SNPs, or SNPs in a non-coding region, NS, Syno, or NonC was added to the primer name, respectively.
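The site-level filters in the quoted Sequence analysis paragraph (minimum mapping quality of 20, minimum base quality of 30, minimum coverage of 5x, and the variant present in at least 90% of base calls) can be sketched in Python for a single site. The sketch assumes the reads covering the site have already been reduced to hypothetical `(base, base_quality, mapping_quality)` tuples; it is not how SAMtools/bcftools implement these filters. For a haploid fungus, the 90% rule also rejects the mixed, heterozygous-looking sites that the quoted method discards.

```python
from collections import Counter
from typing import List, Optional, Tuple

# Hypothetical per-read observation at one site: (base, base_quality, mapping_quality).
Observation = Tuple[str, int, int]


def call_snp(
    ref_base: str,
    observations: List[Observation],
    min_mapq: int = 20,         # mapping-quality threshold from the quoted methods
    min_baseq: int = 30,        # base-quality threshold from the quoted methods
    min_depth: int = 5,         # minimum coverage of 5x
    min_fraction: float = 0.9,  # variant must be in >= 90% of base calls
) -> Optional[str]:
    """Return the alternative base if the site passes all filters, else None."""
    # Discard bases from poorly mapped reads and low-quality base calls.
    kept = [
        base.upper()
        for base, baseq, mapq in observations
        if mapq >= min_mapq and baseq >= min_baseq
    ]
    if len(kept) < min_depth:
        return None  # insufficient coverage after filtering
    base, count = Counter(kept).most_common(1)[0]
    if base == ref_base.upper():
        return None  # majority allele matches the reference
    if count / len(kept) < min_fraction:
        return None  # mixed site, e.g. a heterozygous-looking call
    return base


if __name__ == "__main__":
    # 9 high-quality T calls, 1 reference C call, 1 low-quality T that is dropped.
    site = [("T", 35, 60)] * 9 + [("C", 32, 60)] + [("T", 10, 60)]
    print(call_snp("C", site))  # T
```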
Analysis of unmapped reads: De novo assembly of the unmapped reads was conducted using the Newbler assembler 2.9 (Roche), with default parameters. The contigs were selected based on size/depth criteria: those of <500 bp and/or with a depth of <30x coverage were removed. To investigate whether unique genome sequences were present in strains isolated from the same patient, the unmapped reads of each strain were mapped to the contigs generated from all the strains in the same patient by the Bowtie 2 software. The coverage of the mapped regions was then evaluated. Gene predictions were performed using the gene prediction tool AUGUSTUS (version 2.5.5), with a training set of A. fumigatus (33). The parameters of AUGUSTUS were `--species=aspergillus_fumigatus`, `--strand=both`, `--genemodel=partial`, `--singlestrand=false`, `--protein=on`, `--introns=on`, `--start=on`, `--stop=on`, `--cds=on`, and `--gff3=on`. To compare all the predicted genes with Aspergillus genes, consisting of 244,811 genes available on AspGD (34), a reciprocal BLAST best hit approach was performed by BLASTp (35), with an E value of 1.0e-4. All BLASTp results were filtered based on a BLASTp identity of $\geq 80\%$ and an aligned length coverage of $\geq 80\%$.
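The unmapped-read filters quoted above can be sketched in the same way: contigs are kept only if they are at least 500 bp long with at least 30x depth, and predicted genes are paired with Aspergillus genes by reciprocal best BLASTp hits after applying the E-value, identity and coverage cut-offs. In this minimal Python sketch the `Contig` and `Hit` records are hypothetical (coverage is assumed to be precomputed as a percentage), best hits are ranked by bit score, and the cut-offs are applied before best hits are chosen; the quoted method does not state these last two details.

```python
from typing import Dict, Iterable, List, NamedTuple, Set, Tuple


class Contig(NamedTuple):
    """Hypothetical assembled contig summary."""
    name: str
    length: int   # bp
    depth: float  # mean read depth (x)


class Hit(NamedTuple):
    """Hypothetical, simplified BLASTp hit; coverage is a precomputed percentage."""
    query: str
    subject: str
    pident: float
    coverage: float
    evalue: float
    bitscore: float


def keep_contigs(contigs: Iterable[Contig], min_len: int = 500,
                 min_depth: float = 30.0) -> List[Contig]:
    """Drop contigs shorter than 500 bp and/or with depth below 30x."""
    return [c for c in contigs if c.length >= min_len and c.depth >= min_depth]


def best_hits(hits: Iterable[Hit], max_evalue: float = 1e-4,
              min_pident: float = 80.0, min_cov: float = 80.0) -> Dict[str, str]:
    """Map each query to its highest-bit-score subject among passing hits."""
    best: Dict[str, Hit] = {}
    for h in hits:
        if h.evalue > max_evalue or h.pident < min_pident or h.coverage < min_cov:
            continue  # fails the E-value, identity or coverage cut-off
        if h.query not in best or h.bitscore > best[h.query].bitscore:
            best[h.query] = h
    return {q: h.subject for q, h in best.items()}


def reciprocal_best_hits(forward: Iterable[Hit],
                         reverse: Iterable[Hit]) -> Set[Tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    fwd = best_hits(forward)
    rev = best_hits(reverse)
    return {(a, b) for a, b in fwd.items() if rev.get(b) == a}


if __name__ == "__main__":
    fwd = [
        Hit("g1", "A1", 95, 90, 1e-50, 300),
        Hit("g1", "A2", 85, 85, 1e-30, 200),
        Hit("g2", "A2", 70, 90, 1e-20, 150),  # fails the 80% identity cut-off
    ]
    rev = [
        Hit("A1", "g1", 95, 90, 1e-50, 300),
        Hit("A2", "g1", 85, 85, 1e-30, 200),
    ]
    print(reciprocal_best_hits(fwd, rev))  # {('g1', 'A1')}
```

Requiring the best hit in both directions discards one-sided matches such as `A2`, whose best hit is `g1` even though `g1` matches `A1` better.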
This document was last updated at 2019-03-10 02:28:05 using R Markdown (built with R version 3.5.1 (2018-07-02)).