---
title: "Whole Genome Sequencing of <i>Ascochyta rabiei</i> Isolates"
author: "Ido Bar"
date: "18 July 2017"
always_allow_html: true
output:
  bookdown::html_document2:
    toc: true
    toc_depth: 3
    keep_md: true
bibliography: style/Fungal_genomes.bib
csl: style/springer-basic-improved-author-date-with-italic-et-al-period.csl
---
Whole Genome Sequencing of Ascochyta rabiei Isolates

Experimental Design

In 2017, DNA was extracted from 21 strains of Ascochyta rabiei and sent for whole-genome sequencing (WGS) on an Illumina HiSeq2500, producing 100 bp short paired-end reads (Macrogen, Korea).
In the following year (2018), DNA from 20 additional A. rabiei isolates was extracted and sent for WGS, first to AgriBio, Centre for AgriBioscience (Agriculture Victoria Research, AgriVic), for sequencing on a HiSeq3000 producing 150 bp short paired-end reads. Because library preparation and sequencing there were substantially delayed, 18 DNA samples, mostly overlapping with the 20 samples sent to AgriVic, were also sent for sequencing at the Australian Genome Research Facility (AGRF, Melbourne) on 4 lanes of a NextSeq500 flowcell, producing 150 bp paired-end reads (run name CAGRF19461).
Details of the sequenced isolates are provided in Table 1.

Table 1: Ascochyta rabiei isolates used for DNA sequencing.

Isolate	Site	State	Collection_Year	Host Cultivar	Host	Region	Haplotype	Pathotype
TR9529	Chinchilla	QLD	2017	PBA Seamer	Seamer	Unknown	Unknown	Extreme
TR9571	Gurley	NSW	2017	PBA Seamer	Seamer	Reg3	ARH09	Extreme
TR9573	Gurley	NSW	2017	PBA Seamer	Seamer	Reg2	ARH01	Extreme
F17191-1	Pt Broughton	SA	2017	Genesis090	Genesis090	Unknown	Unknown	Extreme
F17076-2	Finley	NSW	2017	Genesis090	Genesis090	Unknown	Unknown	Very High
TR9543	Fox Holes	QLD	2017	PBA Seamer	Seamer	Unknown	Unknown	Very High
16CUR018	Curyo	VIC	2016	Genesis090	Genesis090	Unknown	Unknown	Very High
15CUR002	Curyo	VIC	2015	Genesis090	Genesis090	Reg5	ARH02	Very High
15CUR005	Curyo	VIC	2015	Genesis090	Genesis090	Reg5	ARH20	Very High
TR6417	Yallaroi	NSW	2014	PBA HatTrick	HatTrick	Reg3	ARH01	Very High
FT13092-2	Kingsford	SA	2013	Genesis090	Genesis090	Reg6	ARH04	Very High
17CUR007	Curyo	VIC	2017	Genesis090	Genesis090	Reg5	ARH01	High
TR9568	Gurley	NSW	2017	PBA Seamer	Seamer	Reg3	ARH09	High
16CUR017	Curyo	VIC	2016	Genesis090	Genesis090	Reg5	ARH01	High
16CUR019	Curyo	VIC	2016	Genesis090	Genesis090	Reg5	ARH01	High
F16083-1	Moonta	SA	2016	Genesis090	Genesis090	Reg6	ARH01	High
F16253-1	Pt Broughton	SA	2016	Genesis090	Genesis090	Unknown	Unknown	High
15DON007	Donald	VIC	2015	Slasher	Slasher	Reg5	ARH01	High
FT15023	Moonta	SA	2015	Genesis090	Genesis090	Unknown	Unknown	High
FT15025	Moonta	SA	2015	Genesis090	Genesis090	Unknown	Unknown	High
FT15028	Weetula	SA	2015	Genesis090	Genesis090	Unknown	Unknown	High
FT15029	Weetula	SA	2015	Genesis090	Genesis090	Unknown	Unknown	High
FT15030	Weetula	SA	2015	Genesis090	Genesis090	Reg6	ARH01	High
FT13092-4	Kingsford	SA	2013	Genesis090	Genesis090	Reg6	ARH01	High
16CUR015	Curyo	VIC	2016	Genesis090	Genesis090	Reg5	ARH14	Moderate
TR8102	Narromine	NSW	2016	PBA HatTrick	HatTrick	Reg4	ARH01	Moderate
15CUR001	Curyo	VIC	2015	Genesis090	Genesis090	Unknown	Unknown	Moderate
FT13092-6	Kingsford	SA	2013	Genesis090	Genesis090	Reg6	ARH01	Moderate
16RUP012	Rupanyup	VIC	2016	Genesis090	Genesis090	Reg5	ARH01	Medium
16RUP013	Rupanyup	VIC	2016	Genesis090	Genesis090	Reg5	ARH01	Medium
TR8105	Strathdoon, Narromine	NSW	2016	PBA HatTrick	HatTrick	Reg4	ARH01	Medium
15CUR003	Curyo	VIC	2015	Genesis090	Genesis090	Reg5	ARH20	Medium
14DON003	Donald	VIC	2014	Slasher	Slasher	Reg5	ARH01	Medium
TR6400	Yallaroi	NSW	2014	PBA HatTrick	HatTrick	Reg3	ARH04	Medium
F17067-1	Coonalpyn	SA	2017	Genesis090	Genesis090	Unknown	Unknown	Low
F17175-1	Elmore	VIC	2017	Genesis090	Genesis090	Unknown	Unknown	Low
TR9544	Fox Holes	QLD	2017	PBA Seamer	Seamer	Unknown	Unknown	Low
TR9538	Gravel Pit Hill	QLD	2017	PBA Seamer	Seamer	Unknown	Unknown	Low
15DON001	Donald	VIC	2015	Genesis090	Genesis090	Reg5	ARH01	Low
TR6408	Yallaroi	NSW	2014	PBA HatTrick	HatTrick	Reg3	ARH01	Low

Aims

Identify strain-unique variants to develop detection methods
Associate aggressiveness with specific variants
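The first aim can be illustrated with a minimal sketch, assuming the variant calls have already been reduced to a mapping from each variant to the set of isolates carrying the alternate allele. The function name, the data layout and the example calls are hypothetical and are not taken from the project's code or results.

```python
from collections import defaultdict


def strain_unique_variants(calls):
    """Return {isolate: [variant, ...]} for variants carried by exactly one isolate.

    `calls` maps a variant key (contig, position, ref, alt) to the set of
    isolates in which the alternate allele was called (assumed layout).
    """
    unique = defaultdict(list)
    for variant, carriers in sorted(calls.items()):
        if len(carriers) == 1:
            (isolate,) = carriers  # unpack the single carrier
            unique[isolate].append(variant)
    return dict(unique)


# Hypothetical calls, for illustration only (not real data from this project).
example_calls = {
    ("contig01", 1204, "A", "G"): {"TR9529"},
    ("contig01", 5310, "C", "T"): {"TR9529", "TR9571"},
    ("contig02", 88, "G", "A"): {"15CUR002"},
}

unique_by_isolate = strain_unique_variants(example_calls)
print(unique_by_isolate)
```

Variants shared by two or more isolates are ignored here; a detection assay would likely also need to confirm that the site is confidently called as reference in all other isolates, which this sketch does not check.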

Analysis Pipeline

General overview:

1. Data pre-processing:
   a. Quality check
   b. Adaptor trimming
   c. Post-trim quality check
2. Mapping reads to a reference genome (keep unmapped)
3. Reads deduplication
4. Variant calling and filtration
5. Variant annotation
6. Variant-Pathogenicity association
7. Produce variant statistics and assessment
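The first four steps of the overview could be sketched as a list of shell commands built per sample. This is only an illustration under stated assumptions: the tool choices (FastQC, BBDuk, Bowtie 2, SAMtools, Picard, FreeBayes), the parameter values, the haploid ploidy setting and all file names are assumptions made for the example, and the commands actually used are those in the project repository.

```python
import shlex


def pipeline_commands(sample, r1, r2, ref_fasta, bt2_index, adapters):
    """Build one shell command string per pipeline step for a paired-end sample."""
    trimmed1 = f"{sample}_R1.trimmed.fq.gz"
    trimmed2 = f"{sample}_R2.trimmed.fq.gz"
    steps = [
        # 1a. Quality check of the raw reads (output directory must exist)
        ["fastqc", "-o", "qc_raw", r1, r2],
        # 1b. Adaptor trimming (assumed BBDuk settings)
        ["bbduk.sh", f"in1={r1}", f"in2={r2}", f"out1={trimmed1}",
         f"out2={trimmed2}", f"ref={adapters}", "ktrim=r", "k=23",
         "mink=11", "hdist=1", "tpe", "tbo"],
        # 1c. Post-trim quality check
        ["fastqc", "-o", "qc_trimmed", trimmed1, trimmed2],
        # 2. Mapping to the reference, writing unaligned pairs to a separate file
        ["bowtie2", "--very-sensitive", "--end-to-end", "-x", bt2_index,
         "-1", trimmed1, "-2", trimmed2,
         "--un-conc-gz", f"{sample}.unmapped.fq.gz", "-S", f"{sample}.sam"],
        # Coordinate-sort the alignments before deduplication
        ["samtools", "sort", "-o", f"{sample}.sorted.bam", f"{sample}.sam"],
        # 3. Read deduplication
        ["picard", "MarkDuplicates", f"I={sample}.sorted.bam",
         f"O={sample}.dedup.bam", f"M={sample}.dup_metrics.txt",
         "REMOVE_DUPLICATES=true"],
        # 4. Variant calling (ploidy 1 is an assumption for a haploid fungus)
        ["freebayes", "-f", ref_fasta, "-p", "1",
         "--vcf", f"{sample}.vcf", f"{sample}.dedup.bam"],
    ]
    return [shlex.join(cmd) for cmd in steps]


# Hypothetical file names, for illustration only.
commands = pipeline_commands(
    "TR9529", "TR9529_R1.fq.gz", "TR9529_R2.fq.gz",
    "A_rabiei_ref.fa", "A_rabiei_ref", "adapters.fa",
)
for command in commands:
    print(command)
```

Building the commands as argument lists and joining them with `shlex.join` keeps quoting safe if a sample name or path contains unusual characters; on a Torque cluster each string could be written into a job script rather than executed directly.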

Methods

DNA-Seq data processing, mapping and variant calling were performed on the Griffith University Gowonda HPC Cluster (using the Torque scheduler), following the methods specified by @hagiwara_whole-genome_2014 (see details in Appendix 1), @haas_approaches_2011, @hittalmani_novo_2016 and @verma_draft_2016, modified to use FreeBayes v1.2.0 [@garrison_haplotype-based_2012] to assign variant probability scores and call variants.
An alternative approach was also tested, using the complete suite of tools from BBTools [v38.22; @bushnell_bbmap:_2014]. See the official download page on SourceForge, the user guide and the SEQanswers thread.
Detailed methods, including code for running each of the analysis steps, are provided in the associated A_rabiei_WGS_analysis GitHub repository.
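Variant probability scores are conventionally stored in the VCF QUAL field on the Phred scale, where QUAL = -10 log10(P) and P is the probability that the call is wrong. A minimal sketch of the conversion is shown below; the function names are illustrative and are not part of FreeBayes or of the project code.

```python
import math


def phred_to_error_prob(qual):
    """Convert a Phred-scaled quality to the probability that the call is wrong."""
    return 10 ** (-qual / 10)


def error_prob_to_phred(prob):
    """Convert an error probability (0 < prob <= 1) to a Phred-scaled quality."""
    return -10 * math.log10(prob)


for qual in (10, 20, 30):
    print(qual, phred_to_error_prob(qual))
```

On this scale each increase of 10 corresponds to a tenfold drop in the error probability.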

Appendices

Appendix 1. Useful resources

Whole-Genome Comparison of Aspergillus fumigatus Strains Serially Isolated from Patients with Aspergillosis [@hagiwara_whole-genome_2014]:

Sequence analysis: The Illumina data sets were trimmed using fastq-mcf in ea-utils (version 1.1.2-484), i.e., sequencing adapters and sequences with low quality scores (Phred score [Q], <30) were removed (24). The data sets were mapped to the genome sequence of the A. fumigatus genome reference strain Af293 (29,420,142 bp, genome version s03-m04-r03) (25, 26) using Bowtie 2 (version 2.0.0-beta7) with the very sensitive option in end-to-end mode (27). Duplicated reads were removed using Picard (version 1.112) (http://picard.sourceforge.net). The programs mpileup and bcftools from SAMtools (version 0.1.19-44428cd) were used to perform further quality controls. In mpileup, the -q20 argument was used to trim reads with low-quality mapping, whereas the argument -Q30 was used to trim low-quality bases at the 3' end (28). The bcftools setting was set to -c in order to call variants using Bayesian inference. Consensus and single nucleotide polymorphisms (SNPs) were excluded if they did not meet a minimum coverage of 5x or if the variant was present in <90% of the base calls (29, 30). The genotype field in the variant call format (VCF) files indicates homozygote and heterozygote probabilities as Phred-scaled likelihoods. SNPs were excluded if they were called as heterozygous genotypes using SAMtools. The mapping results were visualized in the Integrative Genomics Viewer (version 2.3.3) (31, 32). The reference genome data included information on open reading frames and annotations, from which the SNPs were designated non-synonymous or synonymous.
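The SNP exclusion criteria described above (minimum coverage of 5x, variant present in at least 90% of base calls, no heterozygous genotypes) can be sketched as a simple predicate. The record layout and names below are hypothetical; a real implementation would read these values from the VCF fields produced by the variant caller.

```python
from dataclasses import dataclass


@dataclass
class VariantCall:
    depth: int          # total base calls covering the site
    alt_reads: int      # base calls supporting the variant allele
    heterozygous: bool  # genotype called as heterozygous


def passes_filters(call, min_depth=5, min_alt_fraction=0.90):
    """Apply the coverage, allele-fraction and heterozygosity criteria."""
    if call.heterozygous:
        return False
    if call.depth < min_depth:
        return False
    # depth >= min_depth >= 1 here, so the division is safe
    return call.alt_reads / call.depth >= min_alt_fraction


# Hypothetical calls, for illustration only.
calls = [
    VariantCall(depth=20, alt_reads=19, heterozygous=False),
    VariantCall(depth=4, alt_reads=4, heterozygous=False),
    VariantCall(depth=20, alt_reads=17, heterozygous=False),
    VariantCall(depth=20, alt_reads=20, heterozygous=True),
]
kept = [passes_filters(call) for call in calls]
print(kept)
```

Excluding heterozygous calls is a reasonable sanity check for a haploid organism, where an apparently heterozygous site more likely reflects mis-mapping or a repeat than true variation.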
Single nucleotide mutations were confirmed by Sanger sequencing. Regions of approximately 400 bp that contained a mutation were amplified with appropriately designed primer pairs and then sequenced. The primer sequences are listed in Table S1 in the supplemental material, which were named as follows. For verification of the SNPs in strains from patient I or patient II, PaI or PaII was added to the primer name, respectively. For non-synonymous SNPs, synonymous SNPs, or SNPs in a non-coding region, (NS, Syno, NonC) was added to the primer name, respectively.
Analysis of unmapped reads: De novo assembly of the unmapped reads was conducted using the Newbler assembler 2.9 (Roche), with default parameters. The contigs were selected based on size/depth criteria: those of <500 bp and/or with a depth of <30x coverage were removed. To investigate whether unique genome sequences were present in strains isolated from the same patient, the unmapped reads of each strain were mapped to the contigs generated from all the strains in the same patient by the Bowtie 2 software. The coverage of the mapped regions was then evaluated. Gene predictions were performed using the gene prediction tool AUGUSTUS (version 2.5.5), with a training set of A. fumigatus (33). The parameters of AUGUSTUS were -species = aspergillus_fumigatus, -strand = both, -genemodel = partial, -singlestrand = false, -protein = on, -introns = on, -start = on, -stop = on, -cds = on, and -gff3 = on. To compare all the predicted genes with Aspergillus genes, consisting of 244,811 genes available on AspGD (34), a reciprocal BLAST best hit approach was performed by BLASTp (35), with an E value of $1.0 \times 10^{-4}$. All BLASTp results were filtered based on a BLASTp identity of $\ge80$% and an aligned length coverage of $\ge80$%.
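The contig selection and BLASTp filtering rules in the paragraph above can be sketched as follows. The function names and example identifiers are hypothetical, and computing coverage relative to the query length is an assumption, since the text does not say which sequence the aligned-length coverage refers to.

```python
def keep_contig(length_bp, depth, min_length=500, min_depth=30):
    """Keep contigs that are at least 500 bp long and at least 30x deep."""
    return length_bp >= min_length and depth >= min_depth


def passes_blast_filter(identity_pct, aligned_len, query_len, evalue,
                        max_evalue=1.0e-4, min_identity=80.0, min_coverage=80.0):
    """Apply the E-value, identity and aligned-length coverage thresholds."""
    coverage_pct = 100.0 * aligned_len / query_len  # assumed: relative to query
    return (evalue <= max_evalue
            and identity_pct >= min_identity
            and coverage_pct >= min_coverage)


def reciprocal_best_hits(forward, reverse):
    """Return query -> subject pairs that are each other's best hit.

    `forward` maps each query to its best subject; `reverse` maps each
    subject to its best query in the opposite search.
    """
    return {q: s for q, s in forward.items() if reverse.get(s) == q}


# Hypothetical best hits, for illustration only.
forward_hits = {"geneA": "Afu1", "geneB": "Afu2", "geneC": "Afu2"}
reverse_hits = {"Afu1": "geneA", "Afu2": "geneC"}
rbh = reciprocal_best_hits(forward_hits, reverse_hits)
print(rbh)
```

Here `geneB` is dropped because its best subject, `Afu2`, has `geneC` as its own best hit in the reverse search.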

Appendix 2. General information

This document was last updated on 2019-03-10 02:28:05 using R Markdown (built with R version 3.5.1 (2018-07-02)). Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. It is especially useful for documents and reports that include code, since the code can be executed and its results included in the output. For more details on using R Markdown see http://rmarkdown.rstudio.com and the R Markdown cheatsheet.
