Illumina/RepeatCatalogs

STR Catalogs

Short tandem repeats (STRs) are genomic sequences composed of repetitions of a short motif (typically 2-6 bp). Many STRs are implicated in genetic disorders such as Huntington's disease, amyotrophic lateral sclerosis (ALS), and fragile X syndrome. Thanks to recent improvements in Next-Generation Sequencing (NGS) technologies and computational methods (e.g., ExpansionHunter), it is now possible to identify large STR expansions using Whole Genome Sequencing (WGS).

Accurate repeat genotyping requires a high-quality STR catalog that specifies the reference coordinates and structure of each locus. Here, we provide a well-curated STR catalog that facilitates genome-wide analysis of STRs.

Documentation

Analyze STRs using ExpansionHunter

To use the catalog, genotype the STRs it contains in WGS data using ExpansionHunter. We provide a quick introduction to doing that.
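As a rough illustration of the genotyping step, the sketch below assembles an ExpansionHunter command line from Python. The flags `--reads`, `--reference`, `--variant-catalog`, and `--output-prefix` are ExpansionHunter's standard options; the helper name `build_expansionhunter_command` and all file paths (including the catalog file name) are hypothetical placeholders, not names taken from this repository.

```python
import shlex


def build_expansionhunter_command(reads, reference, catalog, output_prefix,
                                  executable="ExpansionHunter"):
    """Return the argument list for a single ExpansionHunter run.

    All paths are supplied by the caller; nothing here is specific
    to this repository's layout.
    """
    return [
        executable,
        "--reads", reads,                  # aligned reads (BAM/CRAM)
        "--reference", reference,          # reference genome FASTA
        "--variant-catalog", catalog,      # STR catalog in JSON format
        "--output-prefix", output_prefix,  # prefix for the output files
    ]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    command = build_expansionhunter_command(
        reads="sample.bam",
        reference="hg38.fa",
        catalog="hg38/catalog.json",
        output_prefix="sample_str",
    )
    # Print a shell-safe version of the command for inspection.
    print(shlex.join(command))
    # To actually run it: subprocess.run(command, check=True)
```

The reference genome passed to `--reference` should match the build of the catalog (hg38 or hg19), since the catalog coordinates are build-specific.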

Catalogs

The catalog contains 174,293 STRs that are polymorphic across populations and 30+ known pathogenic STRs.

Overview

The genome-wide polymorphic STR catalog is a comprehensive catalog focused on functional STRs, generated using population sequencing data. It is the catalog to use if you would like to analyze STRs genome-wide. More details about how the genome-wide polymorphic STR catalog differs from other catalogs and how it was generated can be found here.

Population distribution

To facilitate the use of the catalog, we genotyped STRs across samples in the 1000 Genomes Project using ExpansionHunter 5.0.0. The histograms of genotypes are here. We also summarized polymorphism by population in the documentation.
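To illustrate how such a histogram could be used, here is a minimal sketch that summarizes the allele size distribution of one STR. It assumes the histogram has already been loaded into a Python dict mapping repeat number to allele count; that in-memory representation, the function name `summarize_histogram`, and the example numbers are assumptions made for illustration, not the on-disk format or data of this repository.

```python
def summarize_histogram(histogram):
    """Summarize one STR's allele size histogram.

    `histogram` maps repeat number (int) to allele count (int).
    This dict layout is an assumption for illustration.
    """
    total = sum(histogram.values())
    if total == 0:
        raise ValueError("histogram contains no alleles")

    # Most common allele size; ties are broken in favor of the shorter allele.
    modal_size = max(histogram, key=lambda size: (histogram[size], -size))

    # Allele-count-weighted mean repeat number.
    mean_size = sum(size * count for size, count in histogram.items()) / total

    # Fraction of alleles that differ from the modal size: a simple,
    # informal indicator of how polymorphic the locus is.
    non_modal_fraction = (total - histogram[modal_size]) / total

    return {
        "total_alleles": total,
        "modal_size": modal_size,
        "mean_size": mean_size,
        "non_modal_fraction": non_modal_fraction,
    }


if __name__ == "__main__":
    # Made-up counts, not real 1000 Genomes data.
    example = {10: 6, 11: 3, 15: 1}
    print(summarize_histogram(example))
```

The non-modal fraction is only one of several possible polymorphism measures; it is used here because it is easy to compute directly from allele counts.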

Folder structure and file format

  • hg38/: catalog in JSON format with hg38 coordinates
  • hg19/: catalog in JSON format with hg19 coordinates

File format

  • Catalogs are stored in JSON format, as specified by ExpansionHunter.
  • Genotypes are summarized as histograms of allele sizes: for each STR, the repeat numbers and their corresponding allele counts are provided.
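As a sketch of working with the catalog format, the example below parses a small ExpansionHunter-style catalog and tallies repeat motif lengths. It assumes the catalog is a JSON array of locus records with the fields `LocusId`, `LocusStructure`, `ReferenceRegion`, and `VariantType`, and that repeat units appear in `LocusStructure` as parenthesized motifs followed by `*` or `+`. The two records, their coordinates, and the helper names are made up for illustration.

```python
import json
import re
from collections import Counter

# Illustrative records only: locus IDs and coordinates are placeholders.
EXAMPLE_CATALOG = """
[
  {
    "LocusId": "EXAMPLE1",
    "LocusStructure": "(CAG)*",
    "ReferenceRegion": "chr1:1000-1030",
    "VariantType": "Repeat"
  },
  {
    "LocusId": "EXAMPLE2",
    "LocusStructure": "(A)*(GCC)+",
    "ReferenceRegion": ["chr2:2000-2010", "chr2:2010-2040"],
    "VariantType": ["Repeat", "Repeat"]
  }
]
"""

# A parenthesized motif followed by a repeat quantifier, e.g. "(CAG)*".
MOTIF_PATTERN = re.compile(r"\(([ACGTN]+)\)[*+]")


def parse_catalog(text):
    """Parse catalog JSON text and return the list of locus records."""
    catalog = json.loads(text)
    if not isinstance(catalog, list):
        raise ValueError("expected a JSON array of locus records")
    return catalog


def repeat_motifs(locus_structure):
    """Return the repeat motifs found in a LocusStructure string."""
    return MOTIF_PATTERN.findall(locus_structure)


def motif_length_counts(catalog):
    """Count repeat units in the catalog by motif length in bp."""
    counts = Counter()
    for locus in catalog:
        for motif in repeat_motifs(locus["LocusStructure"]):
            counts[len(motif)] += 1
    return counts


if __name__ == "__main__":
    catalog = parse_catalog(EXAMPLE_CATALOG)
    print(len(catalog), "loci")
    print(dict(motif_length_counts(catalog)))
```

For a real catalog file, the text can be read from disk and passed to `parse_catalog`; the full genome-wide catalog is large, so loading it once and reusing the parsed list is likely preferable.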

Contributors

  • Yunjiang Qiu
  • Viraj Deshpande
  • Pavel Avdeyev
  • Egor Dolzhenko
  • Michael A. Eberle

Contacts

If you have any questions or suggestions, you can either create an issue or reach me by email.

License


This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
