Improvements for creating the NCBI schema on uta #210

Merged
merged 18 commits into from Nov 4, 2017

@andreasprlic andreasprlic commented Sep 11, 2017

I have had several requests for Entrez gene ID to RefSeq transcript associations in uta. This PR makes several improvements to the ncbi schema and adds a new table to it that contains the following identifiers: HGNC gene symbol (hgnc), transcript accession (tx_ac), Entrez gene ID (gene_id), and protein accession (pro_ac), along with the origin of each record.

Here is example data for BRCA1:

| hgnc | tx_ac | gene_id | pro_ac | origin |
| --- | --- | --- | --- | --- |
| BRCA1 | NM_007294.3 | 672 | NP_009225.1 | NCBI |
| BRCA1 | NM_007297.3 | 672 | NP_009228.2 | NCBI |
| BRCA1 | NM_007298.3 | 672 | NP_009229.2 | NCBI |
| BRCA1 | NM_007299.3 | 672 | NP_009230.2 | NCBI |
| BRCA1 | NM_007300.3 | 672 | NP_009231.2 | NCBI |
| BRCA1 | NR_027676.1 | 672 | - | NCBI |
| BRCA1 | NM_007295.2 | 672 | - | NCBI |
| BRCA1 | NM_007296.2 | 672 | - | NCBI |
| BRCA1 | NM_007301.2 | 672 | - | NCBI |
| BRCA1 | NM_007302.2 | 672 | - | NCBI |
| BRCA1 | NM_007303.2 | 672 | - | NCBI |
| BRCA1 | NM_007305.2 | 672 | - | NCBI |
| BRCA1 | NM_007306.2 | 672 | - | NCBI |
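
To illustrate the kind of lookup this table is meant to support, here is a minimal sketch that loads the BRCA1 rows above into an in-memory SQLite database and fetches all transcripts for an Entrez gene ID. The table name `tx_gene_assoc`, the column types, and the helper functions are hypothetical and chosen only for this example; SQLite stands in for the real uta database, and rows without a protein accession (`-`) are stored as NULL.

```python
import sqlite3

# Rows copied from the BRCA1 example; a missing protein accession ("-") becomes None.
ROWS = [
    ("BRCA1", "NM_007294.3", 672, "NP_009225.1", "NCBI"),
    ("BRCA1", "NM_007297.3", 672, "NP_009228.2", "NCBI"),
    ("BRCA1", "NM_007298.3", 672, "NP_009229.2", "NCBI"),
    ("BRCA1", "NM_007299.3", 672, "NP_009230.2", "NCBI"),
    ("BRCA1", "NM_007300.3", 672, "NP_009231.2", "NCBI"),
    ("BRCA1", "NR_027676.1", 672, None, "NCBI"),
    ("BRCA1", "NM_007295.2", 672, None, "NCBI"),
    ("BRCA1", "NM_007296.2", 672, None, "NCBI"),
    ("BRCA1", "NM_007301.2", 672, None, "NCBI"),
    ("BRCA1", "NM_007302.2", 672, None, "NCBI"),
    ("BRCA1", "NM_007303.2", 672, None, "NCBI"),
    ("BRCA1", "NM_007305.2", 672, None, "NCBI"),
    ("BRCA1", "NM_007306.2", 672, None, "NCBI"),
]


def build_demo_db(rows):
    """Create an in-memory database with a hypothetical association table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tx_gene_assoc ("
        "hgnc TEXT, tx_ac TEXT, gene_id INTEGER, pro_ac TEXT, origin TEXT)"
    )
    conn.executemany("INSERT INTO tx_gene_assoc VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def transcripts_for_gene_id(conn, gene_id):
    """Return (tx_ac, pro_ac) pairs for one Entrez gene ID, sorted by transcript."""
    cur = conn.execute(
        "SELECT tx_ac, pro_ac FROM tx_gene_assoc WHERE gene_id = ? ORDER BY tx_ac",
        (gene_id,),
    )
    return cur.fetchall()


conn = build_demo_db(ROWS)
hits = transcripts_for_gene_id(conn, 672)
for tx_ac, pro_ac in hits:
    print(tx_ac, pro_ac or "-")
```

The same query shape should carry over to the actual table once its real name and schema are substituted.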
