Populate missing RSIDs #19

apriha · 2019-06-25T15:35:18Z

Utilize NCBI's Variation Services API to populate missing RSIDs.

One idea for how this could be done: for all SNPs in a dataset, look up RSIDs using the VCF endpoint in batches of 50k SNPs (1 request / second), then use the results to populate any missing RSIDs. Note, however, that this method requires the REF allele and a valid ALT allele for each SNP.

As an example, one row for this query (using GRCh38) could be constructed as follows:

1 817186 . G A,C,G,T
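The row construction and the 50k batching described above could be sketched roughly as follows. This is only an illustration: `vcf_row`, `batches`, and `BATCH_SIZE` are hypothetical names, and listing all four bases in the ALT column simply mirrors the example row in this issue.

```python
from itertools import islice

BATCH_SIZE = 50_000  # batch size proposed in this issue


def vcf_row(chrom, pos, ref, alts=("A", "C", "G", "T")):
    """Build one tab-separated VCF row with '.' as the ID placeholder."""
    return "\t".join([str(chrom), str(pos), ".", ref, ",".join(alts)])


def batches(rows, size=BATCH_SIZE):
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk
```

Each batch can then be joined with newlines to form the body of one request.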

Query:

curl -X POST "https://api.ncbi.nlm.nih.gov/variation/v0/vcf/file/set_rsids?assembly=GCF_000001405.38" -H "accept: text/plain; charset=utf-8" -H "Content-Type: text/plain; charset=utf-8" -d $'1\t817186\t.\tG\tA,C,G,T'
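For use from Python, the same request could be issued with the standard library. This is a minimal sketch under a few assumptions: `build_request` and `annotate_batches` are hypothetical helpers, the endpoint, assembly, and headers are copied from the curl command above, and the one-second pause reflects the rate mentioned in this issue rather than a documented limit.

```python
import time
import urllib.parse
import urllib.request

# Endpoint taken from the curl command in this issue.
ENDPOINT = "https://api.ncbi.nlm.nih.gov/variation/v0/vcf/file/set_rsids"


def build_request(vcf_text, assembly="GCF_000001405.38"):
    """Build (but do not send) a POST request carrying VCF rows as plain text."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"assembly": assembly})
    headers = {
        "accept": "text/plain; charset=utf-8",
        "Content-Type": "text/plain; charset=utf-8",
    }
    return urllib.request.Request(
        url, data=vcf_text.encode("utf-8"), headers=headers, method="POST"
    )


def annotate_batches(batches, delay=1.0):
    """Yield the response text for each batch, pausing `delay` seconds between requests."""
    for i, rows in enumerate(batches):
        if i:
            time.sleep(delay)  # stay at roughly 1 request / second
        request = build_request("\n".join(rows) + "\n")
        with urllib.request.urlopen(request) as response:
            yield response.read().decode("utf-8")
```

Error handling and retries are left out here; a real implementation would likely need both.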

Which would return:

1 817186 rs3094315 G A,C,G,T
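Populating the missing RSIDs from rows like the one above might then look like this. It is a sketch only: it assumes the response is tab-separated VCF text with the RSID in the third column, and `parse_rsids`, `populate_missing`, and the dict-based SNP records are hypothetical, not part of any existing API.

```python
def parse_rsids(vcf_text):
    """Map (chrom, pos) -> RSID for rows whose ID column was filled in."""
    found = {}
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # not a data row
        chrom, pos, rsid = fields[:3]
        if rsid != ".":
            found[(chrom, int(pos))] = rsid
    return found


def populate_missing(snps, found):
    """Return SNP records with empty 'rsid' values filled in where a match exists."""
    result = []
    for snp in snps:
        if not snp["rsid"]:
            key = (snp["chrom"], snp["pos"])
            snp = {**snp, "rsid": found.get(key, snp["rsid"])}
        result.append(snp)
    return result
```

Records that already have an RSID are left untouched, so existing identifiers are never overwritten.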

This could also be used to verify SNP positions and help with #6.

Thanks to @gedankenstuecke for helping develop the idea!

PhilPalmer mentioned this issue Oct 28, 2019

Questions: regarding using VCF & PLINK input files #34

Closed

apriha mentioned this issue Nov 28, 2019

Consolidate code to resolve SNP issues #42

Open

willgdjones mentioned this issue Dec 1, 2019

Assign SNPs on chromosome 0 #13

Open

apriha added the enhancement label Dec 20, 2019

apriha pushed a commit that referenced this issue Jun 24, 2021

Merge pull request #19 from sanogenetics/hotfix/grch-build-detection

5609138

Fix and expand grch build detection
