
Large-scale ID panel from de la Puente, Phillips, et al 2020 paper #49

Merged: 9 commits, Jan 14, 2020


standage (Member)

standage commented on Jan 13, 2020

This update brings in 118 additional marker definitions recently published by de la Puente, Phillips, and colleagues in FSI: Genetics. The marker definitions are provided in Supplementary File S1 of the paper. I used pdfminer to extract the text from this file, then wrote text2table.py to organize the data into a table with one row per marker. Finally, the Snakefile defines a short Snakemake workflow that uses rsIDs to look up the GRCh38 coordinates of each marker and compiles the marker definitions into the format required by MicroHapDB.
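As a rough illustration of that last step, the sketch below resolves each marker's rsIDs against a coordinate lookup and combines them into a single definition. It is not the code in the Snakefile: `MarkerDefinition`, `compile_marker`, the layout of the lookup table, and the placeholder rsIDs and positions are all assumptions made for the example.

```python
from typing import Dict, NamedTuple, Sequence, Tuple


class MarkerDefinition(NamedTuple):
    """Hypothetical record type; not the actual MicroHapDB table layout."""
    name: str
    chrom: str
    positions: Tuple[int, ...]


def compile_marker(
    name: str,
    rsids: Sequence[str],
    coords: Dict[str, Tuple[str, int]],
) -> MarkerDefinition:
    """Resolve each rsID to (chromosome, position) and build one definition."""
    # Fail loudly on rsIDs with no coordinates so they can be reviewed by hand.
    missing = [rsid for rsid in rsids if rsid not in coords]
    if missing:
        raise KeyError(f"no coordinates for {missing} in marker {name}")
    # All SNPs of a microhaplotype are expected to sit on one chromosome.
    chroms = {coords[rsid][0] for rsid in rsids}
    if len(chroms) != 1:
        raise ValueError(f"marker {name} spans chromosomes {sorted(chroms)}")
    positions = tuple(sorted(coords[rsid][1] for rsid in rsids))
    return MarkerDefinition(name, chroms.pop(), positions)


# Placeholder lookup: these rsIDs and positions are made up for illustration.
example_coords = {
    "rs_demo1": ("chr1", 200),
    "rs_demo2": ("chr1", 100),
}
example_marker = compile_marker("demo", ["rs_demo1", "rs_demo2"], example_coords)
```

Raising an error on a missing rsID, rather than skipping it, is a deliberate choice in this sketch: unresolved identifiers then surface during pre-processing instead of silently producing a shorter marker.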

NOTE: Four rsIDs needed manual attention during pre-processing.

  • The first SNP in marker "XqD" was marked as "nors", but manual examination confirmed that rs772115763 refers to the SNP at the correct position with the expected alleles.
  • rs74898010 is deprecated and has been merged into rs73151289.
  • rs28970291 is deprecated and has been merged into rs4076758.
  • rs72629020 is deprecated and has been merged into rs36190610.
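
The four corrections listed above could be recorded as small lookup tables and applied before the coordinate lookup. This is only a sketch of one way to do it, not necessarily how the pre-processing was actually performed: the rsIDs come from the list above, while `MERGED_RSIDS`, `MANUAL_RSIDS`, `current_rsid`, `resolve_rsid`, and the zero-based SNP index are hypothetical names and conventions.

```python
# Deprecated rsID -> rsID it was merged into (taken from the notes above).
MERGED_RSIDS = {
    "rs74898010": "rs73151289",
    "rs28970291": "rs4076758",
    "rs72629020": "rs36190610",
}

# (marker, zero-based SNP index) -> rsID assigned after manual review.
MANUAL_RSIDS = {
    ("XqD", 0): "rs772115763",
}


def current_rsid(rsid, merged=MERGED_RSIDS):
    """Follow merge records until reaching an rsID that is not deprecated."""
    seen = set()
    while rsid in merged:
        if rsid in seen:
            raise ValueError(f"cycle in merge table at {rsid}")
        seen.add(rsid)
        rsid = merged[rsid]
    return rsid


def resolve_rsid(marker, index, label):
    """Return a usable rsID for one SNP of a marker."""
    if label == "nors":
        # A KeyError here means the SNP has not been manually reviewed yet.
        return MANUAL_RSIDS[(marker, index)]
    return current_rsid(label)
```

For example, `resolve_rsid("XqD", 0, "nors")` returns the manually confirmed rs772115763, and an rsID that appears in neither table is passed through unchanged.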

This update also drops pytest as an install dependency. Closes #41.

A special thanks to Chris Phillips for responding to queries about this work!

standage added the datasources label (References to existing data sources or proposals for new sources) on Jan 13, 2020
standage merged commit 0ff4729 into master on Jan 14, 2020
standage deleted the puentephillips branch on January 14, 2020 19:14
standage added a commit that referenced this pull request Jan 15, 2020
This update changes the "lab designator" from PP (Puente/Phillips) to USC (Universidade de Santiago de Compostela) for the marker collection most recently added to MicroHapDB in #49.
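
A designator change like this one amounts to a rename over marker names. The sketch below assumes names of the form `mh<chromosome><designator>-<id>`, such as `mh0XPP-XqD`; that name format, the function `rename_designator`, and the example names are assumptions for illustration and are not taken from the referenced commit.

```python
import re


def rename_designator(name, old="PP", new="USC"):
    """Swap the lab designator in a marker name, leaving other names untouched."""
    # Assumed layout: "mh", two chromosome characters, designator, "-", identifier.
    pattern = re.compile(rf"^(mh[0-9XY]{{2}}){re.escape(old)}(-.+)$")
    # A function replacement avoids any escape handling in the new designator.
    return pattern.sub(lambda m: f"{m.group(1)}{new}{m.group(2)}", name)


renamed = rename_designator("mh0XPP-XqD")
unchanged = rename_designator("mh01KK-001")  # different designator, not modified
```

Anchoring the pattern at both ends keeps the rename from touching a name that merely contains the letters "PP" somewhere in its identifier.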