Short tandem repeats (STRs) are genomic sequences composed of repetitions of a short motif (typically 2-6 bp). Many STRs are implicated in genetic disorders such as Huntington's disease, amyotrophic lateral sclerosis (ALS), and fragile X syndrome. Thanks to recent improvements in Next-Generation Sequencing (NGS) technologies and computational methods (e.g., ExpansionHunter), it is now possible to identify large STR expansions from Whole Genome Sequencing (WGS) data.
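To make the definition concrete, here is a minimal Python sketch that checks whether a sequence is a perfect tandem repeat of a 2-6 bp motif. The function name `find_tandem_repeat` is hypothetical and the code only illustrates the definition: real STR loci are often interrupted or imperfect, and this is not how the catalog was built.

```python
from typing import Optional, Tuple


def find_tandem_repeat(seq: str, min_motif: int = 2,
                       max_motif: int = 6) -> Optional[Tuple[str, int]]:
    """Return (motif, copies) if seq is a perfect tandem repeat, else None.

    Motif lengths are tried from shortest to longest, so "ATATATAT" is
    reported as ("AT", 4) rather than ("ATAT", 2). Motifs made of a single
    repeated base (e.g. "AA") are skipped, because the sequence would then
    be a homopolymer, i.e. a 1 bp motif outside the 2-6 bp range.
    """
    seq = seq.upper()
    for k in range(min_motif, max_motif + 1):
        # Require at least two full copies and no partial trailing unit.
        if len(seq) < 2 * k or len(seq) % k != 0:
            continue
        motif = seq[:k]
        if len(set(motif)) == 1:
            continue
        copies = len(seq) // k
        if motif * copies == seq:
            return motif, copies
    return None


if __name__ == "__main__":
    print(find_tandem_repeat("CAGCAGCAGCAG"))  # ('CAG', 4)
    print(find_tandem_repeat("ACGTTGCA"))      # None
```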
Accurate repeat genotyping requires a high-quality STR catalog that specifies the reference coordinates and structure of each locus. Here, we generated a well-curated STR catalog to facilitate genome-wide analysis of STRs.
The catalog is intended to be used with ExpansionHunter to genotype its STRs in WGS data. We provide a quick introduction on how to do this.
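As a hedged sketch, the Python snippet below assembles an ExpansionHunter command line for one sample using its `--reads`, `--reference`, `--variant-catalog` and `--output-prefix` options. The helper names and file names (`sample.bam`, `GRCh38.fa`, `hg38/catalog.json`) are hypothetical placeholders; use the catalog file from the `hg38/` or `hg19/` directory that matches your reference build, and check `ExpansionHunter --help` for the options supported by your version.

```python
import shlex
import subprocess


def expansion_hunter_command(reads: str, reference: str, catalog: str,
                             output_prefix: str,
                             binary: str = "ExpansionHunter") -> list:
    """Build (but do not run) an ExpansionHunter command for one WGS sample."""
    return [
        binary,
        "--reads", reads,              # aligned, indexed BAM/CRAM for the sample
        "--reference", reference,      # FASTA of the same build as the catalog
        "--variant-catalog", catalog,  # JSON catalog (hypothetical file name below)
        "--output-prefix", output_prefix,
    ]


def run_expansion_hunter(cmd: list) -> None:
    """Run the command, raising CalledProcessError if ExpansionHunter fails."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical file names; replace them with your own paths.
    cmd = expansion_hunter_command("sample.bam", "GRCh38.fa",
                                   "hg38/catalog.json", "sample_str")
    print(shlex.join(cmd))
    # run_expansion_hunter(cmd)  # uncomment once ExpansionHunter is installed
```

Building the arguments as a list rather than a single shell string avoids quoting problems with unusual paths; `shlex.join` is used only to display the command.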
The catalog contains 174,293 STRs that are polymorphic across populations, as well as 30+ known pathogenic STRs.
The genome-wide polymorphic STR catalog is a comprehensive catalog focused on functional STRs, generated using population sequencing data. It is the catalog to use if you would like to analyze STRs genome-wide. You can find more details here on how the genome-wide polymorphic STR catalog differs from other catalogs and how it was generated.
To facilitate the use of the catalog, we genotyped its STRs across samples from the 1000 Genomes Project using ExpansionHunter 5.0.0. The genotype histograms are available here. We also summarized polymorphism by population in the documentation.
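A per-locus histogram can be condensed into simple polymorphism statistics. The sketch below assumes a histogram has already been read into a Python dict mapping repeat number to allele count (that representation and the name `summarize_histogram` are assumptions) and computes expected heterozygosity, 1 - Σ p_i², where p_i is the frequency of allele size i. This is a standard diversity measure, not necessarily the metric used in the population summaries.

```python
def summarize_histogram(hist: dict) -> dict:
    """Summarize one STR's allele-size histogram (repeat number -> allele count)."""
    total = sum(hist.values())
    if total == 0:
        raise ValueError("histogram contains no alleles")
    # Allele frequency of each repeat number.
    freqs = {repeats: count / total for repeats, count in hist.items()}
    # Expected heterozygosity: probability that two alleles drawn at random
    # (with replacement) have different sizes.
    het = 1.0 - sum(p * p for p in freqs.values())
    # Most common size; ties are broken toward the shorter allele.
    modal = max(hist, key=lambda repeats: (hist[repeats], -repeats))
    return {
        "alleles": total,
        "distinct_sizes": sum(1 for c in hist.values() if c > 0),
        "expected_heterozygosity": het,
        "modal_repeats": modal,
    }


if __name__ == "__main__":
    # Hypothetical toy histogram, not real 1000 Genomes data.
    toy = {15: 60, 17: 30, 19: 10}
    print(summarize_histogram(toy))
```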
- `hg38/`: catalog in JSON format for hg38
- `hg19/`: catalog in JSON format for hg19
- Catalogs are stored in the JSON variant catalog format specified by ExpansionHunter.
- Genotypes are summarized as histograms of size distribution where repeat numbers and corresponding allele counts are provided for each STR.
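To inspect a catalog programmatically, here is a minimal Python sketch. It assumes the ExpansionHunter variant catalog layout: a JSON list of loci, each with `LocusId`, `LocusStructure` (for example `(CAG)*`), `ReferenceRegion` (a string, or a list for loci with several repeats) and `VariantType`. The helper names and the toy loci are hypothetical.

```python
import json
import re
from collections import Counter

# A repeat unit in LocusStructure is written as "(MOTIF)*" or "(MOTIF)+".
REPEAT_UNIT = re.compile(r"\(([A-Z]+)\)[*+]")


def load_catalog(path: str) -> list:
    """Read an ExpansionHunter-style variant catalog (a JSON list of loci)."""
    with open(path) as fh:
        return json.load(fh)


def regions(locus: dict) -> list:
    """Return ReferenceRegion as a list, whether stored as a string or a list."""
    region = locus["ReferenceRegion"]
    return region if isinstance(region, list) else [region]


def parse_region(region: str) -> tuple:
    """Split "chrom:start-end" into (chrom, start, end)."""
    chrom, span = region.rsplit(":", 1)
    start, end = span.split("-")
    return chrom, int(start), int(end)


def motif_length_counts(catalog: list) -> Counter:
    """Count repeat units across the catalog by motif length."""
    counts = Counter()
    for locus in catalog:
        for motif in REPEAT_UNIT.findall(locus["LocusStructure"]):
            counts[len(motif)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical toy entries; for a real file use load_catalog(<path>).
    toy = [
        {"LocusId": "LOCUS_A", "LocusStructure": "(CAG)*CAACAG(CCG)*",
         "ReferenceRegion": ["chr1:100-130", "chr1:136-160"],
         "VariantType": ["Repeat", "Repeat"]},
        {"LocusId": "LOCUS_B", "LocusStructure": "(AC)*",
         "ReferenceRegion": "chr2:5000-5020", "VariantType": "Repeat"},
    ]
    print(dict(motif_length_counts(toy)))  # {3: 2, 2: 1}
    print([parse_region(r) for r in regions(toy[0])])
```

Normalizing `ReferenceRegion` to a list lets the same code handle single-repeat and multi-repeat loci.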
- Yunjiang Qiu
- Viraj Deshpande
- Pavel Avdeyev
- Egor Dolzhenko
- Michael A. Eberle
If you have any questions or suggestions, you can either create an issue or reach me by email.
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.