Counts for all frequency data (sans 1KGP)
Daniel Standage committed Oct 24, 2023
1 parent 668fdb3 commit f0b9b5b
Showing 16 changed files with 88,713 additions and 88,698 deletions.
22 changes: 12 additions & 10 deletions dbbuild/README.md
@@ -112,20 +112,21 @@ It includes the following fields.
- `Population`: the unique identifier of the population
- `Allele`: the allele of each variant in the microhap, separated by pipe symbols
- `Frequency`: the frequency of the allele in the specified population (a real number between 0.0 and 1.0)
+- `Count`: the total number of alleles (denominator) used to compute the given population frequency estimate

For example, the first few lines of the `frequency.tsv` for van der Gaag (2018) look like this.

```csv
-Marker,Population,Allele,Frequency
-mh06PK-24844,MHDBP-383d86606a,T|C|G|C|C|C|A|A|G|A,0.000
-mh06PK-24844,MHDBP-936bc36f79,T|C|G|C|C|C|A|A|G|A,0.000
-mh06PK-24844,MHDBP-3dab7bdd14,T|C|G|C|C|C|A|A|G|A,0.123
-mh06PK-24844,MHDBP-383d86606a,T|C|G|C|C|T|A|A|G|G,0.566
-mh06PK-24844,MHDBP-936bc36f79,T|C|G|C|C|T|A|A|G|G,0.586
-mh06PK-24844,MHDBP-3dab7bdd14,T|C|G|C|C|T|A|A|G|G,0.425
-mh06PK-24844,MHDBP-383d86606a,C|C|G|C|C|C|A|A|G|A,0.071
-mh06PK-24844,MHDBP-936bc36f79,C|C|G|C|C|C|A|A|G|A,0.000
-mh06PK-24844,MHDBP-3dab7bdd14,C|C|G|C|C|C|A|A|G|A,0.329
+Marker,Population,Allele,Frequency,Count
+mh06PK-24844,MHDBP-383d86606a,T|C|G|C|C|C|A|A|G|A,0.000,99
+mh06PK-24844,MHDBP-936bc36f79,T|C|G|C|C|C|A|A|G|A,0.000,87
+mh06PK-24844,MHDBP-3dab7bdd14,T|C|G|C|C|C|A|A|G|A,0.123,73
+mh06PK-24844,MHDBP-383d86606a,T|C|G|C|C|T|A|A|G|G,0.566,99
+mh06PK-24844,MHDBP-936bc36f79,T|C|G|C|C|T|A|A|G|G,0.586,87
+mh06PK-24844,MHDBP-3dab7bdd14,T|C|G|C|C|T|A|A|G|G,0.425,73
+mh06PK-24844,MHDBP-383d86606a,C|C|G|C|C|C|A|A|G|A,0.071,99
+mh06PK-24844,MHDBP-936bc36f79,C|C|G|C|C|C|A|A|G|A,0.000,87
+mh06PK-24844,MHDBP-3dab7bdd14,C|C|G|C|C|C|A|A|G|A,0.329,73
```

### `population.csv`
@@ -200,6 +201,7 @@ They can be installed using pip and/or conda.
- UCSC liftover chain files
- hg19ToHg38
- hg38ToHg19
+- UCSC RepeatMasker track

The following command will download data files required for the database build.

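Because the new `Count` field is the denominator behind each `Frequency`, multiplying the two recovers approximately how many times an allele was observed in a population (approximately, since frequencies are rounded). The sketch below is a hypothetical helper, not part of dbbuild; it assumes the comma-separated layout of the README example and treats a blank `Count` (as written for chen2019 in this commit) as unknown.

```python
import csv
import io


def allele_observations(frequency_csv):
    """Yield (marker, population, allele, observations) for each row.

    Hypothetical helper: observations = round(Frequency * Count), or None
    when the Count field is blank or missing from the header.
    """
    for row in csv.DictReader(io.StringIO(frequency_csv)):
        count = (row.get("Count") or "").strip()
        if not count:
            observations = None  # denominator not reported for this source
        else:
            observations = round(float(row["Frequency"]) * int(count))
        yield row["Marker"], row["Population"], row["Allele"], observations


# Two rows from the README example plus one chen2019 row with a blank Count.
example = """Marker,Population,Allele,Frequency,Count
mh06PK-24844,MHDBP-3dab7bdd14,T|C|G|C|C|C|A|A|G|A,0.123,73
mh06PK-24844,MHDBP-383d86606a,T|C|G|C|C|T|A|A|G|G,0.566,99
mh01CP-016,MHDBP-48c2cfb2aa,T|G|G,0.2166,
"""

records = list(allele_observations(example))
for record in records:
    print(record)
```

Rounding is used because a three- or four-decimal frequency times its denominator is rarely an exact integer (0.123 × 73 = 8.979, reported here as 9).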
208 changes: 104 additions & 104 deletions dbbuild/sources/chen2019/frequency.csv
@@ -1,104 +1,104 @@
-Marker,Population,Allele,Frequency
-mh01CP-016,MHDBP-48c2cfb2aa,T|G|G,0.2166
-mh01CP-016,MHDBP-48c2cfb2aa,T|G|A,0.2916
-mh01CP-016,MHDBP-48c2cfb2aa,T|A|A,0.4250
-mh01CP-016,MHDBP-48c2cfb2aa,A|G|A,0.0666
-mh02CP-004,MHDBP-48c2cfb2aa,C|T|A,0.4166
-mh02CP-004,MHDBP-48c2cfb2aa,C|T|G,0.0833
-mh02CP-004,MHDBP-48c2cfb2aa,T|T|G,0.1500
-mh02CP-004,MHDBP-48c2cfb2aa,C|G|G,0.3500
-mh04CP-002,MHDBP-48c2cfb2aa,C|G|A,0.0584
-mh04CP-002,MHDBP-48c2cfb2aa,C|G|T,0.1056
-mh04CP-002,MHDBP-48c2cfb2aa,C|A|A,0.0544
-mh04CP-002,MHDBP-48c2cfb2aa,G|A|A,0.0400
-mh04CP-003,MHDBP-48c2cfb2aa,A|C|C,0.2250
-mh04CP-003,MHDBP-48c2cfb2aa,G|C|C,0.1583
-mh04CP-003,MHDBP-48c2cfb2aa,A|C|T,0.3416
-mh04CP-003,MHDBP-48c2cfb2aa,G|T|C,0.2750
-mh05CP-004,MHDBP-48c2cfb2aa,C|T|T,0.1250
-mh05CP-004,MHDBP-48c2cfb2aa,C|T|C,0.2333
-mh05CP-004,MHDBP-48c2cfb2aa,C|C|T,0.4500
-mh05CP-004,MHDBP-48c2cfb2aa,C|C|C,0.0250
-mh05CP-004,MHDBP-48c2cfb2aa,A|T|T,0.1666
-mh06CP-003,MHDBP-48c2cfb2aa,A|A|C,0.2750
-mh06CP-003,MHDBP-48c2cfb2aa,A|G|C,0.2416
-mh06CP-003,MHDBP-48c2cfb2aa,G|G|C,0.1916
-mh06CP-003,MHDBP-48c2cfb2aa,A|A|A,0.2833
-mh06CP-003,MHDBP-48c2cfb2aa,G|A|C,0.0083
-mh07CP-004,MHDBP-48c2cfb2aa,T|T|A|T|C,0.1333
-mh07CP-004,MHDBP-48c2cfb2aa,A|A|T|A|T,0.4083
-mh07CP-004,MHDBP-48c2cfb2aa,A|A|T|A|C,0.4583
-mh10CP-003,MHDBP-48c2cfb2aa,C|C|A,0.3416
-mh10CP-003,MHDBP-48c2cfb2aa,T|C|A,0.2666
-mh10CP-003,MHDBP-48c2cfb2aa,C|C|C,0.3333
-mh10CP-003,MHDBP-48c2cfb2aa,C|T|A,0.0583
-mh11CP-003,MHDBP-48c2cfb2aa,A|G|A,0.3250
-mh11CP-003,MHDBP-48c2cfb2aa,A|A|A,0.2000
-mh11CP-003,MHDBP-48c2cfb2aa,A|A|C,0.2916
-mh11CP-003,MHDBP-48c2cfb2aa,C|A|A,0.1833
-mh14CP-003,MHDBP-48c2cfb2aa,G|G|G,0.1583
-mh14CP-003,MHDBP-48c2cfb2aa,A|G|G,0.0833
-mh14CP-003,MHDBP-48c2cfb2aa,G|G|A,0.4500
-mh14CP-003,MHDBP-48c2cfb2aa,G|A|A,0.0333
-mh14CP-003,MHDBP-48c2cfb2aa,G|A|G,0.2750
-mh17CP-001,MHDBP-48c2cfb2aa,G|A|G,0.1583
-mh17CP-001,MHDBP-48c2cfb2aa,G|G|G,0.3500
-mh17CP-001,MHDBP-48c2cfb2aa,G|G|A,0.1250
-mh17CP-001,MHDBP-48c2cfb2aa,C|A|G,0.3666
-mh17CP-006,MHDBP-48c2cfb2aa,G|T|C,0.3750
-mh17CP-006,MHDBP-48c2cfb2aa,C|T|T,0.1750
-mh17CP-006,MHDBP-48c2cfb2aa,C|C|T,0.0916
-mh17CP-006,MHDBP-48c2cfb2aa,C|T|C,0.3583
-mh18CP-003,MHDBP-48c2cfb2aa,A|T|T,0.2666
-mh18CP-003,MHDBP-48c2cfb2aa,A|C|T,0.2500
-mh18CP-003,MHDBP-48c2cfb2aa,A|T|C,0.3500
-mh18CP-003,MHDBP-48c2cfb2aa,G|T|C,0.1333
-mh18CP-005,MHDBP-48c2cfb2aa,A|C|G|C,0.2666
-mh18CP-005,MHDBP-48c2cfb2aa,A|T|A|C,0.2916
-mh18CP-005,MHDBP-48c2cfb2aa,G|C|A|C,0.1333
-mh18CP-005,MHDBP-48c2cfb2aa,A|C|A|T,0.3083
-mh12CP-003,MHDBP-48c2cfb2aa,G|T|A,0.5166
-mh12CP-003,MHDBP-48c2cfb2aa,G|C|G,0.1916
-mh12CP-003,MHDBP-48c2cfb2aa,T|T|A,0.1833
-mh12CP-003,MHDBP-48c2cfb2aa,G|C|A,0.1083
-mh08CP-009,MHDBP-48c2cfb2aa,C|G|A,0.0500
-mh08CP-009,MHDBP-48c2cfb2aa,T|G|A,0.2750
-mh08CP-009,MHDBP-48c2cfb2aa,C|T|A,0.4333
-mh08CP-009,MHDBP-48c2cfb2aa,T|G|C,0.2416
-mh11CP-004,MHDBP-48c2cfb2aa,C|G|G,0.5583
-mh11CP-004,MHDBP-48c2cfb2aa,C|G|A,0.1500
-mh11CP-004,MHDBP-48c2cfb2aa,C|T|A,0.0916
-mh11CP-004,MHDBP-48c2cfb2aa,T|G|G,0.2000
-mh12CP-007,MHDBP-48c2cfb2aa,A|T|C,0.1583
-mh12CP-007,MHDBP-48c2cfb2aa,C|T|C,0.2166
-mh12CP-007,MHDBP-48c2cfb2aa,A|C|T,0.4500
-mh12CP-007,MHDBP-48c2cfb2aa,A|T|T,0.1666
-mh12CP-007,MHDBP-48c2cfb2aa,A|C|C,0.0083
-mh10CP-005,MHDBP-48c2cfb2aa,T|T|T,0.3833
-mh10CP-005,MHDBP-48c2cfb2aa,C|T|T,0.1083
-mh10CP-005,MHDBP-48c2cfb2aa,C|T|C,0.3833
-mh10CP-005,MHDBP-48c2cfb2aa,C|C|C,0.1250
-mh17CP-002,MHDBP-48c2cfb2aa,G|G|T,0.2250
-mh17CP-002,MHDBP-48c2cfb2aa,A|G|C,0.1333
-mh17CP-002,MHDBP-48c2cfb2aa,A|G|T,0.2666
-mh17CP-002,MHDBP-48c2cfb2aa,G|T|T,0.3750
-mh04CP-004,MHDBP-48c2cfb2aa,C|C|C,0.4916
-mh04CP-004,MHDBP-48c2cfb2aa,C|C|T,0.1500
-mh04CP-004,MHDBP-48c2cfb2aa,T|C|C,0.1250
-mh04CP-004,MHDBP-48c2cfb2aa,T|T|C,0.2333
-mh01CP-010,MHDBP-48c2cfb2aa,T|C|A,0.3750
-mh01CP-010,MHDBP-48c2cfb2aa,C|C|G,0.4750
-mh01CP-010,MHDBP-48c2cfb2aa,C|C|A,0.0666
-mh01CP-010,MHDBP-48c2cfb2aa,C|T|G,0.0833
-mh13CP-010,MHDBP-48c2cfb2aa,G|G|A,0.3833
-mh13CP-010,MHDBP-48c2cfb2aa,A|G|A,0.0416
-mh13CP-010,MHDBP-48c2cfb2aa,A|G|G,0.2333
-mh13CP-010,MHDBP-48c2cfb2aa,A|A|A,0.3416
-mh16CP-001,MHDBP-48c2cfb2aa,G|C|C|C|T,0.3416
-mh16CP-001,MHDBP-48c2cfb2aa,T|C|G|C|T,0.4833
-mh16CP-001,MHDBP-48c2cfb2aa,G|A|G|C|T,0.1750
-mh10CP-006,MHDBP-48c2cfb2aa,A|T|G|A|G,0.0916
-mh10CP-006,MHDBP-48c2cfb2aa,A|C|G|G|G,0.2000
-mh10CP-006,MHDBP-48c2cfb2aa,A|C|G|G|T,0.5333
-mh10CP-006,MHDBP-48c2cfb2aa,C|T|G|A|G,0.0750
-mh10CP-006,MHDBP-48c2cfb2aa,C|C|G|A|G,0.1000
+Marker,Population,Allele,Frequency,Count
+mh01CP-016,MHDBP-48c2cfb2aa,T|G|G,0.2166,
+mh01CP-016,MHDBP-48c2cfb2aa,T|G|A,0.2916,
+mh01CP-016,MHDBP-48c2cfb2aa,T|A|A,0.4250,
+mh01CP-016,MHDBP-48c2cfb2aa,A|G|A,0.0666,
+mh02CP-004,MHDBP-48c2cfb2aa,C|T|A,0.4166,
+mh02CP-004,MHDBP-48c2cfb2aa,C|T|G,0.0833,
+mh02CP-004,MHDBP-48c2cfb2aa,T|T|G,0.1500,
+mh02CP-004,MHDBP-48c2cfb2aa,C|G|G,0.3500,
+mh04CP-002,MHDBP-48c2cfb2aa,C|G|A,0.0584,
+mh04CP-002,MHDBP-48c2cfb2aa,C|G|T,0.1056,
+mh04CP-002,MHDBP-48c2cfb2aa,C|A|A,0.0544,
+mh04CP-002,MHDBP-48c2cfb2aa,G|A|A,0.0400,
+mh04CP-003,MHDBP-48c2cfb2aa,A|C|C,0.2250,
+mh04CP-003,MHDBP-48c2cfb2aa,G|C|C,0.1583,
+mh04CP-003,MHDBP-48c2cfb2aa,A|C|T,0.3416,
+mh04CP-003,MHDBP-48c2cfb2aa,G|T|C,0.2750,
+mh05CP-004,MHDBP-48c2cfb2aa,C|T|T,0.1250,
+mh05CP-004,MHDBP-48c2cfb2aa,C|T|C,0.2333,
+mh05CP-004,MHDBP-48c2cfb2aa,C|C|T,0.4500,
+mh05CP-004,MHDBP-48c2cfb2aa,C|C|C,0.0250,
+mh05CP-004,MHDBP-48c2cfb2aa,A|T|T,0.1666,
+mh06CP-003,MHDBP-48c2cfb2aa,A|A|C,0.2750,
+mh06CP-003,MHDBP-48c2cfb2aa,A|G|C,0.2416,
+mh06CP-003,MHDBP-48c2cfb2aa,G|G|C,0.1916,
+mh06CP-003,MHDBP-48c2cfb2aa,A|A|A,0.2833,
+mh06CP-003,MHDBP-48c2cfb2aa,G|A|C,0.0083,
+mh07CP-004,MHDBP-48c2cfb2aa,T|T|A|T|C,0.1333,
+mh07CP-004,MHDBP-48c2cfb2aa,A|A|T|A|T,0.4083,
+mh07CP-004,MHDBP-48c2cfb2aa,A|A|T|A|C,0.4583,
+mh10CP-003,MHDBP-48c2cfb2aa,C|C|A,0.3416,
+mh10CP-003,MHDBP-48c2cfb2aa,T|C|A,0.2666,
+mh10CP-003,MHDBP-48c2cfb2aa,C|C|C,0.3333,
+mh10CP-003,MHDBP-48c2cfb2aa,C|T|A,0.0583,
+mh11CP-003,MHDBP-48c2cfb2aa,A|G|A,0.3250,
+mh11CP-003,MHDBP-48c2cfb2aa,A|A|A,0.2000,
+mh11CP-003,MHDBP-48c2cfb2aa,A|A|C,0.2916,
+mh11CP-003,MHDBP-48c2cfb2aa,C|A|A,0.1833,
+mh14CP-003,MHDBP-48c2cfb2aa,G|G|G,0.1583,
+mh14CP-003,MHDBP-48c2cfb2aa,A|G|G,0.0833,
+mh14CP-003,MHDBP-48c2cfb2aa,G|G|A,0.4500,
+mh14CP-003,MHDBP-48c2cfb2aa,G|A|A,0.0333,
+mh14CP-003,MHDBP-48c2cfb2aa,G|A|G,0.2750,
+mh17CP-001,MHDBP-48c2cfb2aa,G|A|G,0.1583,
+mh17CP-001,MHDBP-48c2cfb2aa,G|G|G,0.3500,
+mh17CP-001,MHDBP-48c2cfb2aa,G|G|A,0.1250,
+mh17CP-001,MHDBP-48c2cfb2aa,C|A|G,0.3666,
+mh17CP-006,MHDBP-48c2cfb2aa,G|T|C,0.3750,
+mh17CP-006,MHDBP-48c2cfb2aa,C|T|T,0.1750,
+mh17CP-006,MHDBP-48c2cfb2aa,C|C|T,0.0916,
+mh17CP-006,MHDBP-48c2cfb2aa,C|T|C,0.3583,
+mh18CP-003,MHDBP-48c2cfb2aa,A|T|T,0.2666,
+mh18CP-003,MHDBP-48c2cfb2aa,A|C|T,0.2500,
+mh18CP-003,MHDBP-48c2cfb2aa,A|T|C,0.3500,
+mh18CP-003,MHDBP-48c2cfb2aa,G|T|C,0.1333,
+mh18CP-005,MHDBP-48c2cfb2aa,A|C|G|C,0.2666,
+mh18CP-005,MHDBP-48c2cfb2aa,A|T|A|C,0.2916,
+mh18CP-005,MHDBP-48c2cfb2aa,G|C|A|C,0.1333,
+mh18CP-005,MHDBP-48c2cfb2aa,A|C|A|T,0.3083,
+mh12CP-003,MHDBP-48c2cfb2aa,G|T|A,0.5166,
+mh12CP-003,MHDBP-48c2cfb2aa,G|C|G,0.1916,
+mh12CP-003,MHDBP-48c2cfb2aa,T|T|A,0.1833,
+mh12CP-003,MHDBP-48c2cfb2aa,G|C|A,0.1083,
+mh08CP-009,MHDBP-48c2cfb2aa,C|G|A,0.0500,
+mh08CP-009,MHDBP-48c2cfb2aa,T|G|A,0.2750,
+mh08CP-009,MHDBP-48c2cfb2aa,C|T|A,0.4333,
+mh08CP-009,MHDBP-48c2cfb2aa,T|G|C,0.2416,
+mh11CP-004,MHDBP-48c2cfb2aa,C|G|G,0.5583,
+mh11CP-004,MHDBP-48c2cfb2aa,C|G|A,0.1500,
+mh11CP-004,MHDBP-48c2cfb2aa,C|T|A,0.0916,
+mh11CP-004,MHDBP-48c2cfb2aa,T|G|G,0.2000,
+mh12CP-007,MHDBP-48c2cfb2aa,A|T|C,0.1583,
+mh12CP-007,MHDBP-48c2cfb2aa,C|T|C,0.2166,
+mh12CP-007,MHDBP-48c2cfb2aa,A|C|T,0.4500,
+mh12CP-007,MHDBP-48c2cfb2aa,A|T|T,0.1666,
+mh12CP-007,MHDBP-48c2cfb2aa,A|C|C,0.0083,
+mh10CP-005,MHDBP-48c2cfb2aa,T|T|T,0.3833,
+mh10CP-005,MHDBP-48c2cfb2aa,C|T|T,0.1083,
+mh10CP-005,MHDBP-48c2cfb2aa,C|T|C,0.3833,
+mh10CP-005,MHDBP-48c2cfb2aa,C|C|C,0.1250,
+mh17CP-002,MHDBP-48c2cfb2aa,G|G|T,0.2250,
+mh17CP-002,MHDBP-48c2cfb2aa,A|G|C,0.1333,
+mh17CP-002,MHDBP-48c2cfb2aa,A|G|T,0.2666,
+mh17CP-002,MHDBP-48c2cfb2aa,G|T|T,0.3750,
+mh04CP-004,MHDBP-48c2cfb2aa,C|C|C,0.4916,
+mh04CP-004,MHDBP-48c2cfb2aa,C|C|T,0.1500,
+mh04CP-004,MHDBP-48c2cfb2aa,T|C|C,0.1250,
+mh04CP-004,MHDBP-48c2cfb2aa,T|T|C,0.2333,
+mh01CP-010,MHDBP-48c2cfb2aa,T|C|A,0.3750,
+mh01CP-010,MHDBP-48c2cfb2aa,C|C|G,0.4750,
+mh01CP-010,MHDBP-48c2cfb2aa,C|C|A,0.0666,
+mh01CP-010,MHDBP-48c2cfb2aa,C|T|G,0.0833,
+mh13CP-010,MHDBP-48c2cfb2aa,G|G|A,0.3833,
+mh13CP-010,MHDBP-48c2cfb2aa,A|G|A,0.0416,
+mh13CP-010,MHDBP-48c2cfb2aa,A|G|G,0.2333,
+mh13CP-010,MHDBP-48c2cfb2aa,A|A|A,0.3416,
+mh16CP-001,MHDBP-48c2cfb2aa,G|C|C|C|T,0.3416,
+mh16CP-001,MHDBP-48c2cfb2aa,T|C|G|C|T,0.4833,
+mh16CP-001,MHDBP-48c2cfb2aa,G|A|G|C|T,0.1750,
+mh10CP-006,MHDBP-48c2cfb2aa,A|T|G|A|G,0.0916,
+mh10CP-006,MHDBP-48c2cfb2aa,A|C|G|G|G,0.2000,
+mh10CP-006,MHDBP-48c2cfb2aa,A|C|G|G|T,0.5333,
+mh10CP-006,MHDBP-48c2cfb2aa,C|T|G|A|G,0.0750,
+mh10CP-006,MHDBP-48c2cfb2aa,C|C|G|A|G,0.1000,
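For chen2019 the new `Count` column is present but blank on every row, since `reformat_frequencies` fills it with `None`. A minimal sketch of how a downstream pandas consumer might load such a file, assuming pandas is installed: the nullable `"Int64"` dtype keeps the column integer-typed while representing the blanks as `<NA>`. The four rows are copied from the updated file; the variable names are illustrative only.

```python
import io

import pandas as pd

# First four rows of the updated chen2019 frequency.csv; Count is blank.
rows = """Marker,Population,Allele,Frequency,Count
mh01CP-016,MHDBP-48c2cfb2aa,T|G|G,0.2166,
mh01CP-016,MHDBP-48c2cfb2aa,T|G|A,0.2916,
mh01CP-016,MHDBP-48c2cfb2aa,T|A|A,0.4250,
mh01CP-016,MHDBP-48c2cfb2aa,A|G|A,0.0666,
"""

# "Int64" is pandas' nullable integer dtype: blank fields become <NA>
# instead of forcing the whole column to float NaN.
freq = pd.read_csv(io.StringIO(rows), dtype={"Count": "Int64"})
missing = freq["Count"].isna()
print(f"{missing.sum()} of {len(freq)} rows have no allele count")
# prints: 4 of 4 rows have no allele count
```

With the nullable dtype, sources that do report counts and sources that do not (like chen2019) can be concatenated into one table without the `Count` column silently becoming floating point.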
1 change: 1 addition & 0 deletions dbbuild/sources/chen2019/util.py
@@ -39,4 +39,5 @@ def reformat_frequencies(infile, outfile):
entry = (standardname, "MHDBP-48c2cfb2aa", haplotype, row.Frequency)
freqdata.append(entry)
freqtable = pd.DataFrame(freqdata, columns=["Marker", "Population", "Allele", "Frequency"])
+freqtable["Count"] = None
freqtable.to_csv(outfile, index=False, float_format="%.4f")
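The one-line change to `reformat_frequencies` relies on pandas broadcasting a scalar assignment to every row, and on `to_csv` writing missing values as empty strings (its default `na_rep`). That is why every chen2019 row above now ends in a trailing comma. A small reproduction, assuming pandas is installed; `freqdata` here is a hand-built stand-in for the list the function assembles from its input, which the diff does not show.

```python
import io

import pandas as pd

# Hypothetical stand-in for the (Marker, Population, Allele, Frequency)
# tuples that reformat_frequencies() appends to freqdata.
freqdata = [
    ("mh01CP-016", "MHDBP-48c2cfb2aa", "T|G|G", 0.2166),
    ("mh01CP-016", "MHDBP-48c2cfb2aa", "T|G|A", 0.2916),
]
freqtable = pd.DataFrame(freqdata, columns=["Marker", "Population", "Allele", "Frequency"])
# Same assignment as in util.py: the scalar None fills the whole column.
freqtable["Count"] = None

buffer = io.StringIO()
freqtable.to_csv(buffer, index=False, float_format="%.4f")
print(buffer.getvalue())
```

`float_format` only affects floating-point columns, so the object-typed `Count` column is unaffected and its `None` values are emitted as empty fields.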