Remove CHEBI:CHEBI:41423 from PUBCHEM.COMPOUND:2662 results #69

Closed
gaurav opened this issue Sep 23, 2022 · 2 comments · Fixed by #78
@gaurav
Collaborator

gaurav commented Sep 23, 2022

Querying PUBCHEM.COMPOUND:2662 returns CHEBI:CHEBI:41423 as an equivalent ID, which is wrong: https://nodenormalization-sri.renci.org/1.3/get_normalized_nodes?curie=PUBCHEM.COMPOUND%3A2662&conflate=true

Note also that MESH:C105934 is labeled as [OBSOLETE] celecoxib -- are we definitely identifying obsolete MESH terms correctly?
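To make the report easier to reproduce, here is a minimal sketch that rebuilds the query URL quoted above and flags identifiers whose prefix is repeated. The helper names `build_query_url` and `doubled_prefix_ids` are hypothetical and not part of this repository; the second helper takes a plain list of identifier strings, so it makes no assumption about the shape of the service's response.

```python
from urllib.parse import urlencode

# Endpoint taken from the URL in this issue.
BASE_URL = "https://nodenormalization-sri.renci.org/1.3/get_normalized_nodes"


def build_query_url(curie: str, conflate: bool = True) -> str:
    """Build the get_normalized_nodes URL for a single CURIE (hypothetical helper)."""
    params = {"curie": curie, "conflate": "true" if conflate else "false"}
    return f"{BASE_URL}?{urlencode(params)}"


def doubled_prefix_ids(identifiers):
    """Return identifiers whose prefix appears twice, e.g. CHEBI:CHEBI:41423."""
    bad = []
    for identifier in identifiers:
        parts = identifier.split(":")
        # A doubled prefix shows up as at least three parts with the first two equal.
        if len(parts) >= 3 and parts[0] == parts[1]:
            bad.append(identifier)
    return bad
```

Passing the equivalent identifiers returned for PUBCHEM.COMPOUND:2662 to `doubled_prefix_ids` should single out CHEBI:CHEBI:41423 while leaving well-formed IDs alone.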

@colleenXu

A similar issue: @cbizon noticed that in this screenshot, there is a CHEBI:CHEBI: ID but no correct CHEBI ID. I also noticed that the top synonym seems to be the UNII ID rather than a human-readable name (vitamin B12).

[Image: Screen Shot 2022-09-23 at 1 48 56 PM]

@gaurav
Collaborator Author

gaurav commented Sep 27, 2022

The CHEBI:CHEBI: bug seems to be coming from pull_drugcentral in drugcentral.py, which creates an xref file whose second column is expected to be an identifier only; however, for CHEBI (and only CHEBI) this column includes the CHEBI: prefix, and so we double-concatenate it later on. Working on a fix now.
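The double concatenation described above can be illustrated with a small sketch. This is not the code in drugcentral.py: the `build_curie` helper and the two example rows are hypothetical, chosen only to show what happens when the identifier column already carries its prefix and a later step joins prefix and identifier again.

```python
def build_curie(prefix: str, local_id: str) -> str:
    """Join a prefix and an identifier, assuming the identifier is bare (hypothetical)."""
    return f"{prefix}:{local_id}"


# Illustrative (prefix, identifier) pairs; only the CHEBI row already includes its prefix.
rows = [
    ("CHEBI", "CHEBI:41423"),
    ("PUBCHEM.COMPOUND", "2662"),
]

curies = [build_curie(prefix, local_id) for prefix, local_id in rows]
```

Because the join step trusts the second column to be prefix-free, the CHEBI row comes out with the prefix repeated while the other row is unaffected.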

gaurav added a commit that referenced this issue Oct 11, 2022
gaurav added a commit that referenced this issue Oct 13, 2022
The CHEBI:CHEBI: bug is caused by DrugCentral adding a `CHEBI:` prefix to its identifier. This PR checks for that prefix and removes it so the rest of our code can work correctly. Since it only works on `CHEBI` entries with the prefix, it should be fairly safe to leave in even if DrugCentral removes the prefix at a later date. Fixes #69.

This also deletes the non-functional `get_drugcentralx` target, which refers to a function that no longer exists in this repository. Closes #77.
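A minimal sketch of the kind of check the commit message describes, assuming the prefix and identifier are available as separate strings. The function name `strip_redundant_prefix` is hypothetical and the actual change in the PR may look different.

```python
def strip_redundant_prefix(prefix: str, local_id: str) -> str:
    """Drop a leading 'CHEBI:' from CHEBI identifiers; leave everything else untouched."""
    redundant = "CHEBI:"
    if prefix == "CHEBI" and local_id.startswith(redundant):
        return local_id[len(redundant):]
    return local_id
```

Since the function only changes CHEBI identifiers that actually start with the prefix, it is a no-op on already-bare identifiers, which is why such a check should stay harmless if DrugCentral later drops the prefix.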