Generate URLs using the Bioregistry #5

cthoyt · 2022-01-24T10:07:38Z

Currently, the id2url function in the BERN2 Flask web application uses hard-coded logic for converting from entity identifiers (which I call compact URIs, or CURIEs, throughout this pull request) into URLs that can be used in the web pages of the application.

BERN2/app/result_parser.py

Lines 13 to 29 in a16b9c7

    def id2url(_id):
        if "MESH" in _id:
            t_id = _id.split(":")[1]
            return "https://id.nlm.nih.gov/mesh/{}.html".format(t_id)
        elif "OMIM" in _id:
            t_id = _id.split(":")[1]
            return "https://omim.org/entry/{}".format(t_id)
        elif "EntrezGene" in _id:
            t_id = _id.split(":")[1]
            return "https://www.ncbi.nlm.nih.gov/gene/{}".format(t_id)
        elif "CVCL" in _id:
            return "https://web.expasy.org/cellosaurus/{}".format(_id)
        elif "NCBI" in _id:
            t_id = _id.split(":")[1]
            return "https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id={}".format(t_id)
        else:
            return ""

As discussed in #3, the bioregistry package already has a more general version of this function built in. It can be used like this:

>>> import bioregistry
>>> bioregistry.get_iri("MESH:C063233")
'https://meshb.nlm.nih.gov/record/ui?ui=C063233'

Therefore, I updated the implementation of id2url to do a small amount of string preprocessing on NCBI Taxonomy identifiers (as motivated by #3 (comment)) and defer to bioregistry.get_iri for the remainder of the implementation.
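Below is a minimal sketch of the approach described above. It does not reproduce the merged commits. It assumes that BERN2 species IDs arrive in a form like `NCBI:txid9606` and must be rewritten to the `NCBITaxon` prefix before lookup. `PREFIX_FIXES`, `normalize_curie` and the `txid` rule are all hypothetical. `bioregistry.get_iri` returns `None` when it cannot build a URL, so the sketch maps that result to `""`. This keeps the contract of the original `id2url`.

```python
# Hypothetical mapping from BERN2-style prefixes to Bioregistry prefixes.
PREFIX_FIXES = {"NCBI": "NCBITaxon"}


def normalize_curie(curie: str) -> str:
    """Rewrite a BERN2 identifier into a CURIE the Bioregistry can parse."""
    prefix, sep, local_id = curie.partition(":")
    if not sep:
        return curie  # no prefix separator: leave untouched
    prefix = PREFIX_FIXES.get(prefix, prefix)
    # Hypothetical rule: drop an NCBI-style "txid" marker from taxonomy IDs.
    if prefix == "NCBITaxon" and local_id.startswith("txid"):
        local_id = local_id[len("txid"):]
    return f"{prefix}:{local_id}"


def id2url(curie: str) -> str:
    """Return a URL for the entity, or "" if the Bioregistry cannot build one."""
    import bioregistry  # imported here so normalize_curie works without it

    iri = bioregistry.get_iri(normalize_curie(curie))
    return iri or ""  # get_iri returns None for unresolvable CURIEs
```

The import sits inside `id2url` so that the pure string-rewriting step can be tested on its own. It also makes the Bioregistry lookup the only part that touches the library.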

More documentation on bioregistry.get_iri can be found here. I'm the creator of the Bioregistry and would be happy to answer any questions about it.

Related to the discussion in dmis-lab#3, the Bioregistry has the logic for generating URLs given CURIEs

mjeensung · 2022-01-25T01:51:27Z

Thank you so much for your contribution @cthoyt!

mjeensung · 2022-01-27T16:59:46Z

Hi @cthoyt,

I've noticed that when I use the Bioregistry library, the response time for annotating an article increases dramatically (e.g., from 300 ms to 900 ms).
I'm guessing that bioregistry.get_iri() takes a while to process.
Therefore, for now, I have commented out the parts you implemented because of the added latency.
Do you have any thoughts on this?
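One way to check whether `get_iri()` really accounts for the extra time is to time the URL conversion on its own, separately from the rest of the annotation pipeline. Below is a minimal sketch using the standard library's `time.perf_counter`. `mean_latency_ms` is a hypothetical helper. The stand-in converter in the `__main__` block only lets the snippet run without BERN2 or the Bioregistry installed; pass BERN2's `id2url` instead to measure the real cost.

```python
import time
from typing import Callable, Iterable


def mean_latency_ms(convert: Callable[[str], str], ids: Iterable[str], repeats: int = 5) -> float:
    """Return the mean wall-clock time, in milliseconds, of one convert() call."""
    ids = list(ids)
    if not ids or repeats < 1:
        raise ValueError("need at least one identifier and one repeat")
    start = time.perf_counter()
    for _ in range(repeats):
        for curie in ids:
            convert(curie)
    elapsed = time.perf_counter() - start
    return elapsed * 1000 / (repeats * len(ids))


if __name__ == "__main__":
    sample = ["MESH:C063233", "chebi:1234"]
    # Stand-in converter; swap in id2url to time the Bioregistry-backed version.
    print(f"{mean_latency_ms(lambda c: c.lower(), sample):.4f} ms per call")
```

Multiplying the per-call figure by the number of entities in an article shows how much of the 300 ms to 900 ms increase the conversion step explains.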

cthoyt · 2022-01-28T09:44:02Z

Hmm, that's a good point: as the Bioregistry has grown, the bioregistry.get_iri() function has also become a bit slower. An alternative would be to generate URLs that point to the Bioregistry's resolver. For example, https://bioregistry.io/chebi:1234 redirects to the right ChEBI page, and such a URL can be made either with

>>> import bioregistry as br
>>> br.get_bioregistry_iri("chebi", "1234")
'https://bioregistry.io/chebi:1234'

or by simple string concatenation of https://bioregistry.io/ and your newly normalized CURIEs.
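The string-concatenation variant avoids calling into the Bioregistry when a page is rendered. Below is a minimal sketch, assuming the CURIEs have already been normalized. `BIOREGISTRY_RESOLVER` and `id2url_via_resolver` are hypothetical names. The call to `urllib.parse.quote` is an added precaution for local IDs that contain characters unsafe in a URL path. Passing `safe=":"` keeps the `prefix:id` shape intact.

```python
from urllib.parse import quote

BIOREGISTRY_RESOLVER = "https://bioregistry.io/"


def id2url_via_resolver(curie: str) -> str:
    """Build a resolver URL such as https://bioregistry.io/chebi:1234."""
    if ":" not in curie:
        return ""  # mirror the original id2url, which returns "" for unrecognized IDs
    # Percent-encode everything except ":" so "prefix:id" survives unchanged.
    return BIOREGISTRY_RESOLVER + quote(curie, safe=":")


print(id2url_via_resolver("chebi:1234"))  # https://bioregistry.io/chebi:1234
```

The trade-off is an extra HTTP redirect when a reader follows the link, in exchange for no per-entity lookup while the page is built. Identifiers without a prefix separator fall back to `""` here. The sketch does not cover whether BERN2's cell-line IDs need rewriting first.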

cthoyt added 2 commits January 24, 2022 11:01

Use Bioregistry for URL generation

df0e6d0


Update result_parser.py

931652b

mjeensung merged commit 1ca4fa9 into dmis-lab:main Jan 25, 2022

cthoyt deleted the use-bioregistry branch January 25, 2022 07:55
