Whole-Genome-Sequencing analysis pipeline of Ascochyta rabiei isolates

IdoBar/A_rabiei_WGS_analysis

title: Whole Genome Sequencing of <i>Ascochyta rabiei</i> Isolates
author: Ido Bar
date: 18 July 2017
always_allow_html: true
output:
  bookdown::html_document2:
    toc: true
    toc_depth: 3
    keep_md: true
bibliography: style/Fungal_genomes.bib
csl: style/springer-basic-improved-author-date-with-italic-et-al-period.csl

Whole Genome Sequencing of Ascochyta rabiei Isolates

Experimental Design

In 2017, DNA was extracted from 21 strains of Ascochyta rabiei and sent for Whole-Genome-Sequencing (WGS) on an Illumina HiSeq2500, producing 100 bp paired-end reads (Macrogen, Korea).
In the following year (2018), DNA from 20 additional A. rabiei isolates was extracted and sent for WGS, first to AgriBio, Centre for AgriBioscience, Agriculture Victoria Research (AgriVic), for sequencing on a HiSeq3000 producing 150 bp paired-end reads. Because library preparation and sequencing at AgriVic were substantially delayed, 18 DNA samples, mostly overlapping with the 20 samples sent to AgriVic, were sent for sequencing at the Australian Genome Research Facility (AGRF, Melbourne) on 4 lanes of a NextSeq500 flowcell, producing 150 bp paired-end reads (run name CAGRF19461).
Details of the sequenced isolates are provided in Table 1.

Table 1: Ascochyta rabiei isolates used for DNA sequencing.
| Isolate | Site | State | Collection_Year | Host Cultivar | Host | Region | Haplotype | Pathotype |
|---|---|---|---|---|---|---|---|---|
| TR9529 | Chinchilla | QLD | 2017 | PBA Seamer | Seamer | Unknown | Unknown | Extreme |
| TR9571 | Gurley | NSW | 2017 | PBA Seamer | Seamer | Reg3 | ARH09 | Extreme |
| TR9573 | Gurley | NSW | 2017 | PBA Seamer | Seamer | Reg2 | ARH01 | Extreme |
| F17191-1 | Pt Broughton | SA | 2017 | Genesis090 | Genesis090 | Unknown | Unknown | Extreme |
| F17076-2 | Finley | NSW | 2017 | Genesis090 | Genesis090 | Unknown | Unknown | Very High |
| TR9543 | Fox Holes | QLD | 2017 | PBA Seamer | Seamer | Unknown | Unknown | Very High |
| 16CUR018 | Curyo | VIC | 2016 | Genesis090 | Genesis090 | Unknown | Unknown | Very High |
| 15CUR002 | Curyo | VIC | 2015 | Genesis090 | Genesis090 | Reg5 | ARH02 | Very High |
| 15CUR005 | Curyo | VIC | 2015 | Genesis090 | Genesis090 | Reg5 | ARH20 | Very High |
| TR6417 | Yallaroi | NSW | 2014 | PBA HatTrick | HatTrick | Reg3 | ARH01 | Very High |
| FT13092-2 | Kingsford | SA | 2013 | Genesis090 | Genesis090 | Reg6 | ARH04 | Very High |
| 17CUR007 | Curyo | VIC | 2017 | Genesis090 | Genesis090 | Reg5 | ARH01 | High |
| TR9568 | Gurley | NSW | 2017 | PBA Seamer | Seamer | Reg3 | ARH09 | High |
| 16CUR017 | Curyo | VIC | 2016 | Genesis090 | Genesis090 | Reg5 | ARH01 | High |
| 16CUR019 | Curyo | VIC | 2016 | Genesis090 | Genesis090 | Reg5 | ARH01 | High |
| F16083-1 | Moonta | SA | 2016 | Genesis090 | Genesis090 | Reg6 | ARH01 | High |
| F16253-1 | Pt Broughton | SA | 2016 | Genesis090 | Genesis090 | Unknown | Unknown | High |
| 15DON007 | Donald | VIC | 2015 | Slasher | Slasher | Reg5 | ARH01 | High |
| FT15023 | Moonta | SA | 2015 | Genesis090 | Genesis090 | Unknown | Unknown | High |
| FT15025 | Moonta | SA | 2015 | Genesis090 | Genesis090 | Unknown | Unknown | High |
| FT15028 | Weetula | SA | 2015 | Genesis090 | Genesis090 | Unknown | Unknown | High |
| FT15029 | Weetula | SA | 2015 | Genesis090 | Genesis090 | Unknown | Unknown | High |
| FT15030 | Weetula | SA | 2015 | Genesis090 | Genesis090 | Reg6 | ARH01 | High |
| FT13092-4 | Kingsford | SA | 2013 | Genesis090 | Genesis090 | Reg6 | ARH01 | High |
| 16CUR015 | Curyo | VIC | 2016 | Genesis090 | Genesis090 | Reg5 | ARH14 | Moderate |
| TR8102 | Narromine | NSW | 2016 | PBA HatTrick | HatTrick | Reg4 | ARH01 | Moderate |
| 15CUR001 | Curyo | VIC | 2015 | Genesis090 | Genesis090 | Unknown | Unknown | Moderate |
| FT13092-6 | Kingsford | SA | 2013 | Genesis090 | Genesis090 | Reg6 | ARH01 | Moderate |
| 16RUP012 | Rupanyup | VIC | 2016 | Genesis090 | Genesis090 | Reg5 | ARH01 | Medium |
| 16RUP013 | Rupanyup | VIC | 2016 | Genesis090 | Genesis090 | Reg5 | ARH01 | Medium |
| TR8105 | Strathdoon, Narromine | NSW | 2016 | PBA HatTrick | HatTrick | Reg4 | ARH01 | Medium |
| 15CUR003 | Curyo | VIC | 2015 | Genesis090 | Genesis090 | Reg5 | ARH20 | Medium |
| 14DON003 | Donald | VIC | 2014 | Slasher | Slasher | Reg5 | ARH01 | Medium |
| TR6400 | Yallaroi | NSW | 2014 | PBA HatTrick | HatTrick | Reg3 | ARH04 | Medium |
| F17067-1 | Coonalpyn | SA | 2017 | Genesis090 | Genesis090 | Unknown | Unknown | Low |
| F17175-1 | Elmore | VIC | 2017 | Genesis090 | Genesis090 | Unknown | Unknown | Low |
| TR9544 | Fox Holes | QLD | 2017 | PBA Seamer | Seamer | Unknown | Unknown | Low |
| TR9538 | Gravel Pit Hill | QLD | 2017 | PBA Seamer | Seamer | Unknown | Unknown | Low |
| 15DON001 | Donald | VIC | 2015 | Genesis090 | Genesis090 | Reg5 | ARH01 | Low |
| TR6408 | Yallaroi | NSW | 2014 | PBA HatTrick | HatTrick | Reg3 | ARH01 | Low |

Aims

  • Identify strain-unique variants to develop detection methods
  • Associate aggressiveness with specific variants
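
As a rough illustration of the first aim, a strain-unique variant can be treated as one that is called in exactly one isolate and in no other. The sketch below assumes the variant calls have already been reduced to one set of (contig, position, ref, alt) keys per isolate. The helper name `unique_variants`, the contig names, positions and alleles are hypothetical placeholders, not results from this project.

```python
from collections import Counter


def unique_variants(calls):
    """Return {isolate: set of variants called in that isolate only}.

    `calls` maps an isolate name to a set of variant keys
    (contig, position, ref, alt).
    """
    # Count in how many isolates each variant key was called.
    seen = Counter(v for variants in calls.values() for v in variants)
    return {
        isolate: {v for v in variants if seen[v] == 1}
        for isolate, variants in calls.items()
    }


# Placeholder calls for illustration only (not real data).
calls = {
    "TR9529": {("ctg01", 1200, "A", "G"), ("ctg02", 560, "C", "T")},
    "TR9571": {("ctg01", 1200, "A", "G"), ("ctg03", 88, "G", "A")},
    "TR9573": {("ctg01", 1200, "A", "G")},
}
result = unique_variants(calls)
for isolate in sorted(result):
    print(isolate, sorted(result[isolate]))
```

In practice a missing call can also reflect low coverage rather than true absence, so candidates found this way would still need checking against per-site depth in the other isolates.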

Analysis Pipeline

General overview:

  1. Data pre-processing:
     a. Quality check
     b. Adaptor trimming
     c. Post-trim quality check
  2. Mapping reads to a reference genome (keep unmapped)
  3. Read deduplication
  4. Variant calling and filtration
  5. Variant annotation
  6. Variant-Pathogenicity association
  7. Produce variant statistics and assessment
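
Step 6 does not name a statistical test here. As one possible sketch, Fisher's exact test (an assumption swapped in for illustration, not necessarily the method used in this project) can check whether a variant is over-represented among highly aggressive isolates. The 2x2 counts below are made-up placeholders, and grouping pathotypes into two classes is also an assumption.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]].

    Rows: variant present / variant absent.
    Columns: high-aggressiveness / low-aggressiveness isolates.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x isolates in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(
        prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9)
    )


# Placeholder counts: the variant is present in 4 high and 0 low isolates,
# and absent from 1 high and 5 low isolates.
p_value = fisher_exact_two_sided(4, 0, 1, 5)
print(f"p = {p_value:.4f}")
```

With many variants tested across only about 40 isolates, any such p-values would need multiple-testing correction before interpretation.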

Methods

DNA-Seq data processing, mapping and variant calling were performed on the Griffith University Gowonda HPC Cluster (using the Torque scheduler), following the methods specified by @hagiwara_whole-genome_2014 (see details in Appendix 1), @haas_approaches_2011, @hittalmani_novo_2016 and @verma_draft_2016, with one modification: FreeBayes v1.2.0 [@garrison_haplotype-based_2012] was used to assign variant probability scores and call variants.
An alternative approach was also tested, using a complete suite of tools from BBtools v38.22 [@bushnell_bbmap:_2014]. See the official download page on SourceForge, the user guide and the SEQanswers thread.
Detailed methods, including code for running each of the analysis steps, are provided in the associated A_rabiei_WGS_analysis GitHub repository.

Appendices

Appendix 1. Useful resources

  • Whole-Genome Comparison of Aspergillus fumigatus Strains Serially Isolated from Patients with Aspergillosis. [@hagiwara_whole-genome_2014]:

Sequence analysis: The Illumina data sets were trimmed using fastq-mcf in ea-utils (version 1.1.2-484), i.e., sequencing adapters and sequences with low quality scores (Phred score [Q], <30) were removed (24). The data sets were mapped to the genome sequence of the A. fumigatus genome reference strain Af293 (29,420,142 bp, genome version s03-m04-r03) (25, 26) using Bowtie 2 (version 2.0.0-beta7) with the very sensitive option in end-to-end mode (27). Duplicated reads were removed using Picard (version 1.112) (http://picard.sourceforge.net). The programs mpileup and bcftools from SAMtools (version 0.1.19-44428cd) were used to perform further quality controls. In mpileup, the -q20 argument was used to trim reads with low-quality mapping, whereas the argument -q30 was used to trim low-quality bases at the 3' end (28). The bcftools setting was set to -c in order to call variants using Bayesian inference. Consensus and single nucleotide polymorphisms (SNPs) were excluded if they did not meet a minimum coverage of 5x or if the variant was present in <90% of the base calls (29, 30). The genotype field in the variant call format (VCF) files indicates homozygote and heterozygote probabilities as Phred-scaled likelihoods. SNPs were excluded if they were called as heterozygous genotypes using SAMtools. The mapping results were visualized in the Integrative Genomics Viewer (version 2.3.3) (31, 32). The reference genome data included information on open reading frames and annotations, from which the SNPs were designated non-synonymous or synonymous.
Single nucleotide mutations were confirmed by Sanger sequencing. Regions of approximately 400 bp that contained a mutation were amplified with appropriately designed primer pairs and then sequenced. The primer sequences are listed in Table S1 in the supplemental material, which were named as follows. For verification of the SNPs in strains from patient I or patient II, PaI or PaII was added to the primer name, respectively. For non-synonymous SNPs, synonymous SNPs, or SNPs in a non-coding region, (NS, Syno, NonC) was added to the primer name, respectively.
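
The exclusion rules quoted in the sequence analysis paragraph above (minimum 5x coverage, variant present in at least 90% of base calls, no heterozygous genotypes) can be expressed as a small filter. This is a minimal sketch under stated assumptions: the `Snp` record and `passes_filter` helper are hypothetical names, and the example sites are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Snp:
    depth: int      # total reads covering the site
    alt_reads: int  # reads supporting the variant allele
    genotype: str   # VCF-style genotype string, e.g. "1/1" or "0/1"


def passes_filter(snp, min_depth=5, min_alt_fraction=0.90):
    """Apply the coverage, allele-fraction and homozygosity rules."""
    if snp.depth < min_depth:
        return False
    if snp.alt_reads / snp.depth < min_alt_fraction:
        return False
    # A genotype is heterozygous when its alleles differ.
    alleles = snp.genotype.replace("|", "/").split("/")
    return len(set(alleles)) == 1


# Invented example sites (not real data).
sites = [
    Snp(depth=20, alt_reads=19, genotype="1/1"),  # kept
    Snp(depth=4, alt_reads=4, genotype="1/1"),    # coverage below 5x
    Snp(depth=20, alt_reads=15, genotype="1/1"),  # variant in 75% of calls
    Snp(depth=30, alt_reads=29, genotype="0/1"),  # heterozygous call
]
kept = [s for s in sites if passes_filter(s)]
print(len(kept), "of", len(sites), "sites kept")
```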
Analysis of unmapped reads: De novo assembly of the unmapped reads was conducted using the Newbler assembler 2.9 (Roche), with default parameters. The contigs were selected based on size/depth criteria: those of <500 bp and/or with a depth of <30x coverage were removed. To investigate whether unique genome sequences were present in strains isolated from the same patient, the unmapped reads of each strain were mapped to the contigs generated from all the strains in the same patient by the Bowtie 2 software. The coverage of the mapped regions was then evaluated. Gene predictions were performed using the gene prediction tool AUGUSTUS (version 2.5.5), with a training set of A. fumigatus (33). The parameters of AUGUSTUS were -species = aspergillus_fumigatus, -strand = both, -genemodel = partial, -singlestrand = false, -protein = on, -introns = on, -start = on, -stop = on, -cds = on, and -gff3 = on. To compare all the predicted genes with Aspergillus genes, consisting of 244,811 genes available on AspGD (34), a reciprocal BLAST best hit approach was performed by BLASTp (35), with an E value of 1.0e-4. All BLASTp results were filtered based on a BLASTp identity of $\ge80$% and an aligned length coverage of $\ge80$%.
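
The size/depth and BLASTp thresholds quoted above can be sketched in the same way. The function names are hypothetical, and the excerpt does not say which sequence length the "aligned length coverage" is measured against, so the query length is assumed here.

```python
def keep_contig(length_bp, depth, min_length=500, min_depth=30):
    """Keep a contig unless it is <500 bp and/or has <30x depth."""
    return length_bp >= min_length and depth >= min_depth


def keep_hit(identity_pct, aligned_len, query_len,
             min_identity=80.0, min_coverage=80.0):
    """Keep a BLASTp hit with >=80% identity and >=80% aligned coverage.

    Coverage is computed against the query length (an assumption).
    """
    coverage_pct = 100.0 * aligned_len / query_len
    return identity_pct >= min_identity and coverage_pct >= min_coverage


# Invented values for illustration only.
print(keep_contig(1200, 45.0))   # long and deep enough
print(keep_contig(450, 80.0))    # shorter than 500 bp
print(keep_hit(92.5, 270, 300))  # 90% of the query aligned
print(keep_hit(85.0, 150, 300))  # only 50% of the query aligned
```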

Appendix 2. General information

This document was last updated on 2019-03-10 at 02:28:05 using R Markdown (built with R version 3.5.1 (2018-07-02)). Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. It is especially useful for documents and reports that include code, since it can execute the code and use the results in the output. For more details on using R Markdown see http://rmarkdown.rstudio.com and the R Markdown cheatsheet.


Bibliography
