
repeat catalog v0.9


bw2 released this on 26 Jul 22:08
Commit: c254526

This draft catalog was built by combining the following 4 catalogs, in this order:

  1. Known disease-associated loci
  2. Catalog of all perfect repeats in hg38 that span at least 9bp in the reference and consist of at least 3 copies of a motif between 2bp (dinucleotide) and 1000bp in size, computed using ColabRepeatFinder
  3. Illumina catalog of 174k polymorphic repeats
  4. Catalog of polymorphic loci in 51 HPRC samples computed using the methods described in [Weisburd 2023]
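The selection criteria for catalog 2 can be illustrated with a small scanner. This is a minimal sketch, not ColabRepeatFinder's algorithm: it assumes a "perfect repeat" means a run of whole, exact copies of a primitive motif (partial trailing copies are not counted, and motifs such as `ATAT` that are themselves repeats of a shorter unit are skipped). The function names `find_perfect_repeats` and `is_primitive` are hypothetical.

```python
from typing import List, Tuple


def is_primitive(motif: str) -> bool:
    """True if motif is not itself a repeat of a shorter unit (e.g. 'ATAT' is not)."""
    # The first re-occurrence of motif inside motif+motif is at its smallest period.
    return (motif + motif).find(motif, 1) == len(motif)


def find_perfect_repeats(seq: str,
                         min_motif: int = 2,
                         max_motif: int = 1000,
                         min_repeats: int = 3,
                         min_span: int = 9) -> List[Tuple[int, int, str]]:
    """Return (start, end, motif) for runs of whole, exact motif copies.

    Coordinates are 0-based, half-open. Partial trailing copies are NOT included
    (an assumption of this sketch).
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    for k in range(min_motif, max_motif + 1):
        i = 0
        while i + k <= n:
            motif = seq[i:i + k]
            if not is_primitive(motif):
                i += 1
                continue
            # Extend the run one whole motif copy at a time.
            j = i + k
            while j + k <= n and seq[j:j + k] == motif:
                j += k
            if (j - i) // k >= min_repeats and j - i >= min_span:
                hits.append((i, j, motif))
                i = j  # resume scanning after this run
            else:
                i += 1
    return hits
```

For example, `find_perfect_repeats("TTCAGCAGCAGTT")` returns `[(2, 11, 'CAG')]`, while `"CACACA"` yields nothing because its 6bp span is below the 9bp minimum.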

The merging procedure took all loci from the 1st catalog, then added loci from each subsequent catalog in turn, skipping any locus that both
A) overlapped a previously-added locus by 66% or more, and B) had the same motif as that locus up to cyclic shift (for example, CAG, AGC, and GCA count as the same motif).
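The merge rule can be sketched as follows. This is an illustrative sketch under stated assumptions, not the code used to build the catalog: the `Locus` record and the helper names are hypothetical, coordinates are assumed 0-based half-open, the 66% overlap is assumed to be measured as a fraction of the incoming locus's length (the notes do not specify the denominator), and motifs are compared only up to rotation, as described above.

```python
from typing import List, NamedTuple


class Locus(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    motif: str
    source: int  # which source catalog (1-4) the locus came from


def canonical_motif(motif: str) -> str:
    """Smallest rotation of the motif, so that CAG, AGC and GCA compare equal."""
    m = motif.upper()
    return min(m[i:] + m[:i] for i in range(len(m)))


def overlap_fraction(new: Locus, kept: Locus) -> float:
    """Overlap as a fraction of the NEW locus's length (assumed denominator)."""
    if new.chrom != kept.chrom:
        return 0.0
    overlap = min(new.end, kept.end) - max(new.start, kept.start)
    return max(overlap, 0) / (new.end - new.start)


def merge_catalogs(catalogs: List[List[Locus]], min_overlap: float = 0.66) -> List[Locus]:
    merged: List[Locus] = []
    for catalog_index, catalog in enumerate(catalogs):
        for locus in catalog:
            # All loci from the 1st catalog are kept unconditionally; later loci
            # are skipped only if BOTH the overlap and motif conditions hold.
            is_redundant = catalog_index > 0 and any(
                overlap_fraction(locus, kept) >= min_overlap
                and canonical_motif(locus.motif) == canonical_motif(kept.motif)
                for kept in merged
            )
            if not is_redundant:
                merged.append(locus)
    return merged
```

The linear scan over `merged` is quadratic and only suitable for small examples; merging millions of loci would call for an interval index or a sorted sweep per chromosome.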

The number (and percentage) of loci in the combined catalog contributed by each source catalog:

          82 out of 3,289,806 ( 0.0%) from 1. known disease-associated loci
   3,220,632 out of 3,289,806 (97.9%) from 2. perfect repeats in hg38
      10,645 out of 3,289,806 ( 0.3%) from 3. Illumina catalog of 174k polymorphic loci
      58,447 out of 3,289,806 ( 1.8%) from 4. polymorphic loci in 51 HPRC samples
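As a quick arithmetic check, the per-source counts above sum to the stated total of 3,289,806, and the percentages follow from dividing each count by that total. A minimal sketch that reproduces the summary lines from the counts (the `counts` dictionary simply restates the numbers above):

```python
# Per-source locus counts, copied from the summary above.
counts = {
    "1. known disease-associated loci": 82,
    "2. perfect repeats in hg38": 3_220_632,
    "3. Illumina catalog of 174k polymorphic loci": 10_645,
    "4. polymorphic loci in 51 HPRC samples": 58_447,
}

total = sum(counts.values())

for name, n in counts.items():
    # Right-align the count with thousands separators; one decimal for the percentage.
    print(f"{n:>12,} out of {total:,} ({100 * n / total:4.1f}%) from {name}")
```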