Populate missing RSIDs #19

Open
apriha opened this issue Jun 25, 2019 · 0 comments
Labels
enhancement New feature or request

apriha commented Jun 25, 2019

Utilize NCBI's Variation Services API to populate missing RSIDs.

One idea for how this could be done: for all SNPs in a dataset, look up RSIDs in batches of 50k SNPs (1 request / second) using the VCF endpoint, then use the returned rows to populate any missing RSIDs. Note that this method requires the REF allele and at least one valid ALT allele for each SNP.
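The batching step could be sketched roughly as follows. This is only an illustration: the helper names `vcf_row` and `batches` and the `(chrom, pos, ref, alts)` tuple layout are assumptions made for the example, not part of any existing API. The 50k batch size comes from the idea above.

```python
from itertools import islice


def vcf_row(chrom, pos, ref, alts, rsid="."):
    """Format one SNP as a tab-separated VCF data row: CHROM, POS, ID, REF, ALT."""
    return "\t".join([str(chrom), str(pos), rsid, ref, ",".join(alts)])


def batches(rows, size=50_000):
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


# Hypothetical input: (chrom, pos, ref, alts) tuples for SNPs lacking an RSID.
snps = [("1", 817186, "G", ["A", "C", "G", "T"])]
rows = [vcf_row(*snp) for snp in snps]

# One request body per batch; the caller would send these one at a time,
# pausing about a second between requests to stay within the rate limit.
payloads = ["\n".join(chunk) for chunk in batches(rows)]
```

Each element of `payloads` would be the body of a single request, so a dataset of 600k SNPs would need 12 requests at the 50k batch size.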

As an example, one row for this query (using GRCh38) could be constructed as follows, with tab-separated CHROM, POS, ID, REF and ALT fields and "." standing in for the unknown ID:

1 817186 . G A,C,G,T

Query (bash; the $'...' quoting turns each \t into a real tab character, which plain double quotes would not do):

curl -X POST "https://api.ncbi.nlm.nih.gov/variation/v0/vcf/file/set_rsids?assembly=GCF_000001405.38" -H "accept: text/plain; charset=utf-8" -H "Content-Type: text/plain; charset=utf-8" -d $'1\t817186\t.\tG\tA,C,G,T'

Which would return:

1 817186 rs3094315 G A,C,G,T
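A rough Python sketch of the same round trip is shown below. The URL and headers are copied from the query above; the function names (`build_request`, `set_rsids`, `parse_rsids`, `populate_missing`) and the dict layout for SNPs are hypothetical. It also assumes the response consists of tab-separated rows shaped like the example, possibly with `#` header lines, and a single ID per row. The real response format should be checked against the API before relying on this.

```python
import urllib.request

URL = (
    "https://api.ncbi.nlm.nih.gov/variation/v0/vcf/file/set_rsids"
    "?assembly=GCF_000001405.38"
)


def build_request(vcf_text):
    """Build the POST request for one batch of tab-separated VCF rows."""
    return urllib.request.Request(
        URL,
        data=vcf_text.encode("utf-8"),
        headers={
            "accept": "text/plain; charset=utf-8",
            "Content-Type": "text/plain; charset=utf-8",
        },
        method="POST",
    )


def set_rsids(vcf_text, timeout=60):
    """Send one batch and return the response body as text."""
    with urllib.request.urlopen(build_request(vcf_text), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_rsids(vcf_text):
    """Map (chrom, pos) -> ID for rows whose ID column was filled in."""
    found = {}
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # not a data row
        chrom, pos, rsid = fields[:3]
        if rsid != ".":
            found[(chrom, int(pos))] = rsid
    return found


def populate_missing(snps, found):
    """Fill in rsid for SNP dicts where it is None; existing values are kept."""
    for snp in snps:
        if snp["rsid"] is None:
            snp["rsid"] = found.get((snp["chrom"], snp["pos"]))
    return snps
```

With the example row, `parse_rsids` would map `("1", 817186)` to `rs3094315`, and `populate_missing` would leave any SNP the service could not resolve with `rsid` still set to `None`.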

This could also be used to verify SNP positions and help with #6.

Thanks to @gedankenstuecke for helping develop the idea!
