Generate URLs using the Bioregistry #5

Merged on Jan 25, 2022 (2 commits)

@cthoyt cthoyt commented Jan 24, 2022

Currently, the id2url function in the BERN2 Flask web application uses hard-coded logic for converting from entity identifiers (which I call compact URIs, or CURIEs, throughout this pull request) into URLs that can be used in the web pages of the application.

def id2url(_id):
    if "MESH" in _id:
        t_id = _id.split(":")[1]
        return "https://id.nlm.nih.gov/mesh/{}.html".format(t_id)
    elif "OMIM" in _id:
        t_id = _id.split(":")[1]
        return "https://omim.org/entry/{}".format(t_id)
    elif "EntrezGene" in _id:
        t_id = _id.split(":")[1]
        return "https://www.ncbi.nlm.nih.gov/gene/{}".format(t_id)
    elif "CVCL" in _id:
        return "https://web.expasy.org/cellosaurus/{}".format(_id)
    elif "NCBI" in _id:
        t_id = _id.split(":")[1]
        return "https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id={}".format(t_id)
    else:
        return ""

Related to the discussion in #3, the bioregistry package already has a more general version of this function built in, which can be used like this:

>>> import bioregistry
>>> bioregistry.get_iri("MESH:C063233")
'https://meshb.nlm.nih.gov/record/ui?ui=C063233'

Therefore, I updated the implementation of id2url to do a small amount of string preprocessing on NCBI Taxonomy identifiers (as motivated by #3 (comment)) and defer to bioregistry.get_iri for the remainder of the implementation.
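For readers following along, here is a minimal sketch of what such an implementation could look like. It is not the code from the merged commit. It assumes that BERN2 emits NCBI Taxonomy identifiers in the form NCBI:txid9606 and that rewriting them to the NCBITaxon prefix is the preprocessing needed; the normalize_curie helper and the resolver parameter are hypothetical names added for illustration (the latter so the function can be exercised without the bioregistry package installed).

```python
from typing import Callable, Optional


def normalize_curie(curie: str) -> str:
    """Rewrite an assumed 'NCBI:txid<N>' identifier to 'NCBITaxon:<N>'.

    The 'NCBI:txid' input format is an assumption about BERN2's output;
    every other identifier is returned unchanged.
    """
    assumed_prefix = "NCBI:txid"
    if curie.startswith(assumed_prefix):
        return "NCBITaxon:" + curie[len(assumed_prefix):]
    return curie


def id2url(
    curie: str,
    resolver: Optional[Callable[[str], Optional[str]]] = None,
) -> str:
    """Return a URL for a CURIE, or an empty string if none is known."""
    if resolver is None:
        # Imported lazily so the sketch can be tested with a stub resolver.
        import bioregistry

        resolver = bioregistry.get_iri
    # The resolver may return None for unknown prefixes; keep the original
    # function's contract of returning "" in that case.
    return resolver(normalize_curie(curie)) or ""
```

Falling back to an empty string keeps the return contract of the original hard-coded function, so callers in the web application would not need to change.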

More documentation on bioregistry.get_iri can be found here. I'm the creator of the Bioregistry and would be happy to answer any questions about it.

Related to the discussion in dmis-lab#3, the Bioregistry has the logic for generating URLs given CURIEs
@mjeensung mjeensung merged commit 1ca4fa9 into dmis-lab:main Jan 25, 2022

mjeensung commented Jan 25, 2022

Thank you so much for your contribution @cthoyt!

@cthoyt cthoyt deleted the use-bioregistry branch January 25, 2022 07:55
@mjeensung

Hi @cthoyt,

I've noticed that when I use the Bioregistry library, the response time for annotating an article increases dramatically (e.g., from 300 ms to 900 ms).
I'm guessing bioregistry.get_iri() takes a while to process.
Therefore, I have commented out the parts you implemented for now because of the added latency.
Do you have any thoughts on this?

cthoyt commented Jan 28, 2022

Hmm, that's a good point - as the Bioregistry has grown, the bioregistry.get_iri() function has also become a bit slower. An alternative would be to use the Bioregistry's resolver to generate URLs. For example, https://bioregistry.io/chebi:1234 redirects to the right ChEBI page, and such a URL can be built either with

>>> import bioregistry as br
>>> br.get_bioregistry_iri("chebi", "1234")
'https://bioregistry.io/chebi:1234'

or by simple string concatenation of https://bioregistry.io/ and your newly normalized CURIEs.
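The concatenation approach needs no lookup in the registry at all, which is why it avoids the latency discussed above. A minimal sketch follows; the function name curie_to_resolver_url and the basic shape check are illustrative assumptions, not part of the bioregistry package.

```python
BIOREGISTRY_BASE = "https://bioregistry.io/"


def curie_to_resolver_url(curie: str) -> str:
    """Build a Bioregistry resolver URL from an already normalized CURIE.

    Returns an empty string when the input does not look like
    'prefix:identifier'. No check is made that the prefix is actually
    registered; the resolver handles that when the link is followed.
    """
    prefix, sep, local_id = curie.partition(":")
    if not (prefix and sep and local_id):
        return ""
    return BIOREGISTRY_BASE + curie


print(curie_to_resolver_url("chebi:1234"))
```

The trade-off is that an unrecognized CURIE still yields a link, so any error only shows up when a user clicks it rather than when the page is rendered.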
