The Zambian wheat blast origin (2021 project)

Update 31 March 2021: I have updated the analyses below to include the very recently released whole-genome sequencing datasets from the Zambian wheat blast outbreak! Please find the report on Zenodo and information below.

Update 3 October 2016: Our publication reporting the origin of the Bangladesh wheat blast outbreak is now freely available on the website of BMC Biology

Update 20 June 2016: We have made a biorxiv preprint available of our collaborative analyses of Bangladesh wheat blast.

Update 2 May 2016: The arguments for (biorxiv, blog post) and against (comment) renaming the rice/wheat blast pathogens.

Update 29 April 2016: The Open Wheat Blast website shared a timeline of the recent developments.

Update 28 April 2016: Nature News article about the wheat blast outbreak.

The Zambian wheat blast origin (2021 project)

What is wheat blast? Is it dangerous?

Please see below for further insights and links.

What happened in Zambia? What has been done so far?

Recent morphological and molecular analyses using a SNP genotyping assay have confirmed that at least since 2018, Zambia was affected by outbreaks of wheat blast.

Tembo et al. (2020, PLOS One) first described symptoms observed in Zambian wheat fields and used a diagnostic PCR assay.

Later, Tembo et al. (2021, Zenodo DOI) showed that a set of 84 single nucleotide polymorphism (SNP) markers is highly effective in assigning unknown genotypes to diverse Magnaporthe oryzae lineages.

In a follow-up, Win et al. (2021, Zenodo DOI) and Latorre and Burbano (2021, Zenodo DOI) independently confirmed that the pathogen strains collected on Zambian wheat between 2018 and 2020 were indistinguishable from wheat blast strains detected in Bangladesh 2016 (see also Islam et al. 2016).

On the 25 March 2021, Tembo et al. released 13 whole-genome sequencing datasets of Zambian strains collected 2018-2020 (Zenodo DOI)

How did I analyze the whole-genome datasets?

Please see below for details on the procedure.

What other genomes did I analyze?

The genetic diversity of Magnaporthe oryzae across the different host-associated lineages has been extremely well studied. To ascertain the position of the Zambian genomes, I included the totality of the publicly available (NCBI SRA) and well identifiable isolates.

The total of included Magnaporthe genomes is 273 in addition 13 from new genomes from Zambia.

What did I find analyzing the Zambian wheat blast genomes?

Please have a look at my Zenodo dataset and description.

Croll D. Whole-genome analyses of 286 Magnaporthe oryzae genomes suggest that an independent introduction of a global pandemic lineage is at the origin of the Zambia wheat blast outbreak [dataset]. Zenodo https://doi.org/10.5281/zenodo.4650215

The origin of wheat blast in Bangladesh

What is wheat blast? Is it dangerous?

Wheat blast is a fungal disease that leads to large yield losses. You can find some more information at Open Wheat Blast and from Kansas State University

What happened in February 2016?

Wheat blast was first found in Bangladesh. This was the first report of the disease in Asia. The disease already caused large yield losses and there is a significant worry that the disease will rapidly spread to wheat production areas in India and beyond.

Why was there an outbreak of wheat blast in Bangladesh?

Currently, nobody knows. The key question is whether wheat blast was introduced to Bangladesh from elsewhere or whether it originated locally (e.g. through natural mutations in local fungal strains).

Wheat blast and rice blast are closely related pathogen lineages and currently identified as one species. All rice blast pathogen strains are genetically very similar. Wheat blast pathogen strains are genetically diverse and distinct from rice blast, as shown here.

Who are we?

We are plant pathologists at the ETH Zurich (Switzerland) using genomics tools to identify how plant pathogens cause disease. You can read more about Daniel Croll's research here and Bruce McDonald's research here.

UPDATE: The laboratory of Daniel Croll is now at the University of Neuchatel (website)

How do we find out where wheat blast came from?

We analyzed genome datasets generated around the world for rice and wheat blast strains. Though the rice blast strains came from many countries, all wheat blast strains came from Brazil. If the wheat blast strains found in Bangladesh are closely related to previously known strains, we can identify the likely source of the Bangladesh outbreak.

What data did we use?

The key piece of information was gathered by the teams of Sophien Kamoun at The Sainsbury Laboratory, Diane Saunders The Genome Analysis Centre (TGAC), and Tofazzal Islam of the Bangabandhu Sheikh Mujibur Rahman Agricultural University. These teams collected infected leaves from the field, sequenced a significant portion of the pathogen's genome and made the data instantly available.

Then, we added rice and wheat blast data from multiple sources including publicly available genomes on NCBI and genomes available from Nick Talbot and Darren Soanes (Exeter University).

We also generated our own genome sequence data. Paulo Ceresini’s group including Vanina Castroagudin (UNESP, Brazil) and João Nunes Maciel (EMBRAPA Wheat, Brazil) collected wheat blast strains across Brazil. Bruce McDonald and Daniel Croll (both ETH Zurich) generated genome sequencing data. We made the genome sequence of a Brazilian wheat blast strain available. Download the genome, gene models and protein sequences.

What did we find out?

The wheat blast outbreak in Bangladesh was not caused by a mutated rice blast strain. We found that the strains isolated from Bangladesh grouped very tightly with all known wheat blast strains from Brazil and not with any known rice blast strains.
The wheat blast strains in Bangladesh are genetically very similar to wheat blast strains previously identified in Brazil. The genetically most similar strains were collected in Brazilian wheat fields and on associated weeds Eleusine indica (goose grass) and Cenchrus echinatus collected in Brazil.
One of the genetically closest strains known from Brazil is PY0925. The genome sequence and annotation can be downloaded here. We made the genome sequence of an additional Brazilian wheat blast strain available. This is strain 205 isolated in São Borja (RS, Brazil) and is slightly more distant to the Bangladesh blast strains. Download the genome, gene models and protein sequences.
The high similarity among the wheat blast strains from Bangladesh and Brazil suggests that wheat blast was introduced into Bangladesh from Brazil. Many fungal diseases can be transmitted via grains and previous research indicated that wheat blast can be seed-transmitted. A transmission of the disease from Brazil is plausible because Bangladesh is one of the largest Asian importers of wheat and Brazil is one of the major suppliers of wheat to Bangladesh.
Other Asian countries that received wheat from Brazil, including Thailand, Philippines and Vietnam should increase surveillance efforts to learn if wheat blast has entered into their wheat fields.

Figure - wheat and rice blast: We constructed a maximum likelihood phylogeny of all available rice and wheat blast genome sequences. The samples included 47 rice blast strains, 32 Brazilian blast strains, 5 strains from various additional hosts and 4 samples collected in Bangladesh in spring 2016. The dataset contained only SNPs reliably called in the transcriptomic sequences of the Bangladesh sample 12 and genotyped in at least 90% of all other strains. We retained 55'041 informative SNPs. The phylogeny was constructed using RAxML version 8.2.8 with a GTR substitution matrix and 100 bootstrap replicates.

Figure - Brazilian and Bangladesh wheat blast: We assessed phylogenetic relationships among Brazilian (n=32) and Bangladesh (n=4) blast strains using a Neighbor-Net. The dataset contained only SNPs reliably called in the transcriptomic sequences of the Bangladesh sample 12 and genotyped in at least 90% of all other strains. We retained 7411 informative SNPs. We performed the analysis using splitsTree version 4.14.2. Phylogenetic distances are uncorrected p distances.

What are the genome sequences of the most closely related Brazilian wheat blast strains?

One of the most closely related genomes is available from Nick Talbot and Darren Soanes (Exeter University). This is strain PY0925 isolated in Perdizes (MG, Brazil) in 2009.

We made the genome sequence of an additional Brazilian wheat blast strain available. This is strain 205 isolated in São Borja (RS, Brazil) and is slightly more distant to the Bangladesh blast strains. Download the genome, gene models and protein sequences.

How did we analyze the data?

We used standard bioinformatic procedures to analyze genomic sequences. The most important steps are detailed below.

We performed all analyses using the genome of the rice blast strain 70-15 as a reference. The genome sequence is available at Ensembl Fungi.

Alignment of sequencing data against the reference genome

For all completely sequenced genomes, we aligned quality-trimmed Illumina short read data against the reference genome. We used bowtie2.

bowtie2 --very-sensitive-local \
	--phred33 -x reference-genome-index \
	-1 R1.fq.gz -2 R2.fq.gz -X 1000 \
	-S alignedreads.sam

For all strains collected from the outbreak, an approach called "Field pathogenomics" was used (see above). This data represents transcriptome sequences (RNA) and must be analyzed differently than complete genome sequencing data. We used the RNA-seq aligner tophat2.

tophat -G reference.gff --min-intron-length 10 \
   --rg-id strain.id --rg-sample strain.id --rg-library strain.id \
   --keep-fasta-order --output-dir strain.id \
   reference-genome-index R1.fq.gz R2.fq.gz

Raw variant calling using the Genome Analysis Toolkit GATK

We identified variants in the genomes of the different strains using the Genome Analysis Toolkit (GATK) from the Broad Institute.

Completely sequenced genomes

Perform variant calling using the GATK HaplotypeCaller.

java -Xmx8g -jar GenomeAnalysisTK.jar \
	-T HaplotypeCaller \
	-R reference-genome.fa \
	-ploidy 1 \
	--input_file alignedreads.bam \
	--emitRefConfidence GVCF \
	--variant_index_type LINEAR \
	--variant_index_parameter 128000 \
	-o alignedreads.g.vcf

Bangladesh strains (Field pathogenomics)

To prepare the aligned reads from tophat for variant calling.

java -Xmx8g -jar GenomeAnalysisTK.jar \
	-T SplitNCigarReads \
   -R reference-genome.fa -I RawData/$seqid/accepted_hits.bam -o RawData/$seqid/accepted_hits.cigarN.bam \
   -rf ReassignOneMappingQuality -RMQF 255 -RMQT 60 \
   -U ALLOW_N_CIGAR_READS

Perform variant calling using the GATK HaplotypeCaller.

java -Xmx8g -jar GenomeAnalysisTK.jar \
	-T HaplotypeCaller \
	-R reference-genome.fa \
	-ploidy 1 \
	--input_file alignedreads.bam \
	--emitRefConfidence GVCF \
	--variant_index_type LINEAR \
	--variant_index_parameter 128000 \
	-o alignedreads.g.vcf

Joint variant calling and filtration using the Genome Analysis Toolkit GATK

We generated a joint variant call set using the GATK GenotypeGVCFs.

java -Xmx35g -jar GenomeAnalysisTK.jar \
	-T GenotypeGVCFs \
	-R reference-genome.fa \
	--variant g.vcf.list \
	-maxAltAlleles 2 \
	-o genotyped.vcf

We removed all non-SNP variants.

java -Xmx16g -jar ~/software/$GATKVERSION/GenomeAnalysisTK.jar \
	-T SelectVariants \
	-R reference-genome.fa \
	-selectType SNP \
	--variant genotyped.vcf \
	-o genotyped.snp-only.vcf

Finally, we applied several hard filters to remove low-quality SNPs.

QUAL=5000.0

QD=5.0
MQ=20.0
ReadPosRankSum_lower=-2.0
ReadPosRankSum_upper=2.0
MQRankSum_lower=-2.0
MQRankSum_upper=2.0
BaseQRankSum_lower=-2.0
BaseQRankSum_upper=2.0

java -jar ~/software/$GATKVERSION/GenomeAnalysisTK.jar \
	-T VariantFiltration \
	-R reference-genome.fa \
	--variant genotyped.snp-only.vcf \
	--filterExpression "AN < $AN" --filterName "nSamples" \
	--filterExpression "QD < $QD" --filterName "QDFilter" \
	--filterExpression "QUAL < $QUAL" --filterName "QualFilter" \
	--filterExpression "MQ < $MQ" --filterName "MQ" \
	--filterExpression "ReadPosRankSum < $ReadPosRankSum_lower" --filterName "ReadPosRankSum" \
	--filterExpression "ReadPosRankSum > $ReadPosRankSum_upper" --filterName "ReadPosRankSum" \
	--filterExpression "MQRankSum < $MQRankSum_lower" --filterName "MQRankSum" \
	--filterExpression "MQRankSum > $MQRankSum_upper" --filterName "MQRankSum" \
	--filterExpression "BaseQRankSum < $BaseQRankSum_lower" --filterName "BaseQRankSum" \
	--filterExpression "BaseQRankSum > $BaseQRankSum_upper" --filterName "BaseQRankSum" \
	-o genotyped.snp-only.filters.vcf

Selection of SNPs for phylogenomic analyses using vcftools

We used vcftools to generate a SNP dataset for phylogenomic analyses. To avoid biases in the phylogenetic reconstruction, we only retained SNPs that were called in at least 90% of all analyzed strains. Furthermore, we retained a SNP only if the SNP was called in the best sequenced Bangladesh sample 12.

vcftools --vcf genotyped.snp-only.filters.vcf \
	--positions bangladesh_snps.txt
	--max-missing 0.9 --remove-filtered-all --recode \
	--out genotyped.snp-only.filters.subset.vcf

Phylogenomic analyses using RAxML and Splitstree

We built a phylogenetic tree from a SNP supermatrix using RAxML version 8.2.8. For detailed parameters see above.

We also constructed a NeighborNet among all wheat blast strains using splitsTree version 4.14.2. For detailed parameters see above.

Name		Name	Last commit message	Last commit date
Latest commit History 15 Commits
images		images
.gitignore		.gitignore
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

images

images

.gitignore

.gitignore

README.md

README.md

Repository files navigation

The Zambian wheat blast origin (2021 project)

What is wheat blast? Is it dangerous?

What happened in Zambia? What has been done so far?

How did I analyze the whole-genome datasets?

What other genomes did I analyze?

What did I find analyzing the Zambian wheat blast genomes?

The origin of wheat blast in Bangladesh

What is wheat blast? Is it dangerous?

What happened in February 2016?

Why was there an outbreak of wheat blast in Bangladesh?

Who are we?

How do we find out where wheat blast came from?

What data did we use?

What did we find out?

What are the genome sequences of the most closely related Brazilian wheat blast strains?

How did we analyze the data?

About

Releases

Packages

crolllab/wheat-blast

Folders and files

Latest commit

History

Repository files navigation

The Zambian wheat blast origin (2021 project)

What is wheat blast? Is it dangerous?

What happened in Zambia? What has been done so far?

How did I analyze the whole-genome datasets?

What other genomes did I analyze?

What did I find analyzing the Zambian wheat blast genomes?

The origin of wheat blast in Bangladesh

What is wheat blast? Is it dangerous?

What happened in February 2016?

Why was there an outbreak of wheat blast in Bangladesh?

Who are we?

How do we find out where wheat blast came from?

What data did we use?

What did we find out?

What are the genome sequences of the most closely related Brazilian wheat blast strains?

How did we analyze the data?

About

Resources

Stars

Watchers

Forks