As sequencing and genotyping costs continue to decrease, genome-wide association studies have emerged as a powerful, general approach for identifying the alleles and loci responsible for natural variation. Although association mapping has received the most attention for its application to human disease, it has tremendous potential in a wide range of organisms. Because it naturally occurs as inbred lines, A. thaliana is almost ideally suited for association mapping: once a set of lines has been genotyped, the entire community can phenotype them over and over, for the same or for different traits. A multi-group effort to realize this potential has been under way for some time:
- With funding from the NSF 2010 Project (DEB-0115062), the Bergelson, Kreitman, and Nordborg labs set out to sequence 1,500 short fragments in a panel of 96 lines using standard PCR-based dideoxy sequencing. The 1,214 manually curated sequence alignments generated by the project to date are available for download here.
- Based on the results of the project just described, the Ecker and Weigel labs selected a subset of 20 "maximally diverse" lines for whole-genome re-sequencing using Perlegen technology. The results were published in 2007 (Clark et al., 2007; Kim et al., 2007), and the data are available here.
- With continued support from the NSF 2010 Project (DEB-0519961), the Bergelson and Nordborg labs joined forces with the Borevitz lab (supported by NIH GM073822) to develop a 250k Affymetrix genotyping chip based on SNPs discovered in the Perlegen re-sequencing data (Kim et al., 2007) and to use it to genotype around 1,300 lines. This project is essentially finished, and the latest results (for 1,307 genotypes) are here.
- An effort to sequence the complete genomes of over a thousand lines (including many of those genotyped with the 250k SNP chip) is under way and should be completed during 2011. See the "1001 Genomes Project" for more information. Several projects to integrate all these data are also in progress, including a final one funded by the NSF 2010 Project (DEB-0723935); see below for further information.
Currently available resources
All lines have been submitted to the stock center.
In chronological order:
- manually curated, old-school sequencing data for 96 lines and 1,214 loci;
- the Perlegen re-sequencing data for 20 lines;
- the 149-SNP data used by Platt et al. (2010);
- the 250k SNP data used in Atwell et al. (2010);
- the latest version of the 250k SNP data; and
- the latest whole-genome sequencing data.
We make all data publicly available as they are generated. In return, we ask that the community adhere to standard practices for publishing results based on genome sequencing data: specifically, you may not publish results based on data that have not yet been published. At present, this restriction covers most of the 250k SNP data (except the data published in Atwell et al., 2010, and Li et al., 2010) and all of the sequencing data.
We currently provide the following browsers:
- the accession browser described in Platt et al. (2010);
- the GWAS browser described in Atwell et al. (2010);
- a browser that lets you search the results of Atwell et al. (2010) by gene name; and
- the latest version of the 250k SNP data of Atwell et al. (2010).
The database and web application are described in Huang et al. (2011), and a database dump is available for download. Updated versions of the browsers and data are forthcoming. For sequence browsers, see the 1001 Genomes Project site and Joe Ecker's 1001 Genomes site at Salk.
Plans for the future
Sequencing, integration: forward in all directions!