The pattern of polymorphism in Arabidopsis thaliana (NSF DEB 0519961)

Ümit Seren edited this page Mar 10, 2015 · 2 revisions

Introduction

The primary goal of this project, which is a continuation of a previous collaborative project between the Bergelson, Kreitman, and Nordborg labs, was to enable genome-wide association mapping in A. thaliana by genotyping sufficiently many lines using sufficiently many markers. Based on our analyses of existing data (Nordborg et al., 2005; Aranzana et al., 2005; Zhao et al., 2007; Kim et al., 2007) we decided to genotype on the order of 1,300 lines using a custom Affymetrix 250k SNP chip developed from the Perlegen re-sequencing data (Clark et al., 2007; Kim et al., 2007). This represented a considerable increase in effort over the original proposal, and was made possible by the ever-decreasing costs of genotyping and by combining forces with the Borevitz lab (supported by NIH GM073822).

Project status

We genotyped over 6,000 lines (including all common stock-center accessions) using 149 genome-wide SNPs. Our primary purpose was to detect identical and heterozygous individuals, but we also sought to get a better picture of population structure. A first paper describing these data has been published.

The sample for 250k SNP genotyping was selected based on the 149-SNP data, and contains several large regional "population" samples as well as a geographically diverse selection of lines. The latest version of the data is available here (all accessions have been submitted to the stock center). In Atwell et al. (2010) we described an initial attempt at GWAS using a subset of these data, thus realizing the major objective of this project. Further publications utilizing the full data are in preparation.

Original project abstract

The main objective of this project is to develop tools and resources that will enable the community of Arabidopsis geneticists to carry out population surveys for marker-trait associations, so-called linkage disequilibrium mapping. The basic idea is simple: rather than mapping genes by studying crosses, one types a large number of unrelated individuals (1,152 in the present case) with respect to a large number of variable marker loci (6,144 in the present case) distributed across the genome, in order to identify chromosomal regions that appear to be shared by individuals that are phenotypically similar (in the sense of being resistant to a particular pathogen, for example). This approach potentially leads to much faster gene identification than traditional methods, but has only become practicable as a result of advances in technology for studying genetic variation. The project is part of the effort to understand the genetic basis for phenotypic variation – arguably the greatest challenge facing modern biology, and central to genetic epidemiology (what explains why some people are more susceptible to asthma?), plant and animal breeding (why are some strains of rice more tolerant to drought?), as well as basic evolutionary biology (what kinds of genetic changes underlie adaptation to a novel habitat?). While the project is directed toward the model plant Arabidopsis thaliana, the methods that will be developed are broadly applicable (including to humans). Furthermore, Arabidopsis is a model for plant biology, and is often used to study agriculturally important traits indirectly.
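The basic idea in the abstract, scoring each marker by how well its alleles separate phenotypically similar individuals, can be illustrated with a minimal sketch. This is not the method used in the project: it assumes fully inbred lines (one allele, coded 0 or 1, per marker), a binary phenotype (for example 0 = susceptible, 1 = resistant), and a plain Pearson chi-square statistic on a 2x2 table, and it ignores confounding by population structure entirely. The function names and the toy data are hypothetical.

```python
from typing import List, Sequence


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A monomorphic marker or a constant phenotype carries no signal.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def association_scan(
    genotypes: Sequence[Sequence[int]], phenotypes: Sequence[int]
) -> List[float]:
    """Score every marker against a binary phenotype.

    genotypes[i][j] is the allele (0 or 1) of line i at marker j;
    phenotypes[i] is the phenotype (0 or 1) of line i.
    """
    n_markers = len(genotypes[0])
    stats = []
    for j in range(n_markers):
        # counts[allele][phenotype] is the 2x2 contingency table for marker j.
        counts = [[0, 0], [0, 0]]
        for line, pheno in zip(genotypes, phenotypes):
            counts[line[j]][pheno] += 1
        stats.append(
            chi_square_2x2(counts[0][0], counts[0][1], counts[1][0], counts[1][1])
        )
    return stats


# Toy data (hypothetical): 8 lines typed at 3 markers.
genotypes = [
    [0, 0, 1],
    [1, 0, 0],
    [0, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
    [1, 1, 1],
    [0, 1, 0],
    [1, 1, 1],
]
phenotypes = [0, 0, 0, 0, 1, 1, 1, 1]

stats = association_scan(genotypes, phenotypes)
best_marker = max(range(len(stats)), key=stats.__getitem__)
print(stats, best_marker)
```

In this toy data only the second marker tracks the phenotype, so it receives the highest score. With real accessions, relatedness and regional population structure inflate such naive statistics, so a genome-wide scan would need a model that accounts for them.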