In [1]:
# Parameters to set
URL_node_normalizer = 'https://nodenormalization-sri.renci.org/get_normalized_nodes'
CURIE_OPRM1_HGNC = "HGNC:8156"
CURIE_OPRM1_NCBI = "NCBIGene:4988"
CURIE_OPRM1_UMLS = "UMLS:C1417965"

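from urllib.parse import quote

def URL_name_resolution_search(search_string):
    # Percent-encode the search string so spaces and special characters are safe in the URL
    return f'https://name-resolution-sri.renci.org/lookup?string={quote(search_string)}&offset=0&limit=10'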

In [2]:
import requests
import json

This notebook demonstrates two tools that convert between `labels` and `IDs`, one for each direction.  The Name Resolver takes a name and returns a set of related IDs, while the Node Normalizer takes an ID and returns its label and other equivalent identifiers.

The Name Resolver comes first.  Its API (https://name-resolution-sri.renci.org/docs) provides a lookup function that takes a string and returns potential identifiers.  Here, we look up the string `tremor`.

In [3]:
results = requests.post(URL_name_resolution_search('tremor')).json()

In [4]:
print(json.dumps(results,indent=4))

[
    {
        "curie": "HP:0001337",
        "label": "Tremor",
        "synonyms": [
            "shake",
            "shakes",
            "quiver",
            "Tremor",
            "TREMOR",
            "Shakes",
            "tremor",
            "tremors",
            "Tremble",
            "tremble",
            "Shaking",
            "shaking",
            "Tremors",
            "TREMORS",
            "SHAKING",
            "quivers",
            "Trembles",
            "Trembled",
            "Quivered",
            "trembles",
            "Trembling",
            "trembling",
            "TREMBLING",
            "d tremors",
            "tremulous",
            "Quivering",
            "quivering",
            "TREMULOUS",
            "Tremor NOS",
            "the shakes",
            "The shakes",
            "Tremor, NOS",
            "Has a tremor",
            "Shaking/Tremors",
            "Tremor (finding)",
            "Shaking all over",
            "tremors as symp
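
Each hit in the lookup response carries a `curie`, a `label`, and a list of `synonyms`, so the long printout can be condensed to one line per hit. The following is a minimal sketch, not part of the original notebook: `summarize_hits` is a hypothetical helper, and `sample_hits` is a hand-trimmed copy of the record shown above so that the cell runs without a network call.

```python
# Sample record trimmed from the lookup output above (synonym list shortened by hand)
sample_hits = [
    {
        "curie": "HP:0001337",
        "label": "Tremor",
        "synonyms": ["shake", "shakes", "quiver"],
    }
]

def summarize_hits(hits):
    """Return (curie, label, synonym count) for each lookup hit (hypothetical helper)."""
    return [
        (hit["curie"], hit["label"], len(hit.get("synonyms", [])))
        for hit in hits
    ]

for curie, label, n_synonyms in summarize_hits(sample_hits):
    print(f"{curie}\t{label}\t{n_synonyms} synonyms")
```

In the notebook itself, `summarize_hits(results)` could be called on the live response from cell 3 instead of the sample.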

The Node Normalizer (https://nodenormalization-sri.renci.org/docs) takes any CURIE as input and returns the preferred CURIE along with all other equivalent CURIEs in the same group as the input. It also returns labels for the node, the Biolink classes of the node, and often the information content of the node.

In [5]:
nn_query = {
  "curies": [
    CURIE_OPRM1_HGNC
  ],
  "conflate": True
}
results_nn_true = requests.post(URL_node_normalizer,json=nn_query)

In [6]:
print(json.dumps(results_nn_true.json(),indent=4))

{
    "HGNC:8156": {
        "id": {
            "identifier": "NCBIGene:4988",
            "label": "OPRM1"
        },
        "equivalent_identifiers": [
            {
                "identifier": "NCBIGene:4988",
                "label": "OPRM1"
            },
            {
                "identifier": "ENSEMBL:ENSG00000112038"
            },
            {
                "identifier": "HGNC:8156",
                "label": "OPRM1"
            },
            {
                "identifier": "OMIM:600018"
            },
            {
                "identifier": "UMLS:C1417965",
                "label": "OPRM1 gene"
            },
            {
                "identifier": "UniProtKB:B8K2Q5",
                "label": "B8K2Q5_HUMAN Mu opioid receptor splice variant MOR-1H (Fragment) (trembl)"
            },
            {
                "identifier": "UniProtKB:G8XRH4",
                "label": "G8XRH4_HUMAN Mu opioid receptor splice variant hMOR-1S (trembl)"
            },
        

Note that when `conflate` is set to `True`, both gene and protein identifiers are included in the results.  When `conflate` is set to `False`, gene and protein identifiers are not merged together in the output.  In the run below, where `conflate` is `False`, only the 5 gene-level entries are present in `equivalent_identifiers`.

In [7]:
nn_query = {
  "curies": [
    CURIE_OPRM1_HGNC
  ],
  "conflate": False
}
results_nn_false = requests.post(URL_node_normalizer,json=nn_query)

In [8]:
print(json.dumps(results_nn_false.json(),indent=4))

{
    "HGNC:8156": {
        "id": {
            "identifier": "NCBIGene:4988",
            "label": "OPRM1"
        },
        "equivalent_identifiers": [
            {
                "identifier": "NCBIGene:4988",
                "label": "OPRM1"
            },
            {
                "identifier": "ENSEMBL:ENSG00000112038"
            },
            {
                "identifier": "HGNC:8156",
                "label": "OPRM1"
            },
            {
                "identifier": "OMIM:600018"
            },
            {
                "identifier": "UMLS:C1417965",
                "label": "OPRM1 gene"
            }
        ],
        "type": [
            "biolink:Gene",
            "biolink:BiologicalEntity",
            "biolink:NamedThing",
            "biolink:Entity",
            "biolink:GeneOrGeneProduct",
            "biolink:GenomicEntity",
            "biolink:ChemicalEntityOrGeneOrGeneProduct",
            "biolink:PhysicalEssence",
            "biolink:Ont
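
One way to see what conflation adds is to tally the `equivalent_identifiers` by CURIE prefix and subtract the two tallies. The sketch below is illustrative only: `prefix_counts` is a hypothetical helper, and the two lists are copied by hand from the outputs above (the conflated list is only the excerpt that is visible, since that printout is truncated).

```python
from collections import Counter

def prefix_counts(identifiers):
    """Count equivalent identifiers by CURIE prefix (the text before the first colon)."""
    return Counter(item["identifier"].split(":", 1)[0] for item in identifiers)

# The 5 entries returned with conflate=False, copied from the output above
gene_only = [
    {"identifier": "NCBIGene:4988"},
    {"identifier": "ENSEMBL:ENSG00000112038"},
    {"identifier": "HGNC:8156"},
    {"identifier": "OMIM:600018"},
    {"identifier": "UMLS:C1417965"},
]

# Visible excerpt of the conflate=True output (the full list is longer)
conflated_excerpt = gene_only + [
    {"identifier": "UniProtKB:B8K2Q5"},
    {"identifier": "UniProtKB:G8XRH4"},
]

# Counter subtraction keeps only the prefixes that gained entries
extra = prefix_counts(conflated_excerpt) - prefix_counts(gene_only)
print(dict(extra))
```

With live data, the same helper could be applied to `results_nn_true.json()["HGNC:8156"]["equivalent_identifiers"]` and its `results_nn_false` counterpart.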

Different CURIEs may carry the same or similar labels, and in these cases there may be a preferred identifier to use.  To illustrate this, three different identifiers for the gene `OPRM1` are submitted to the Node Normalizer.

In [9]:
nn_query = {
  "curies": [
    CURIE_OPRM1_NCBI,
    CURIE_OPRM1_UMLS,
    CURIE_OPRM1_HGNC
  ],
  "conflate": False
}
results_nn_multiple_inputs = requests.post(URL_node_normalizer,json=nn_query)

In [10]:
results_json = results_nn_multiple_inputs.json()
print(json.dumps(results_json,indent=4))

{
    "NCBIGene:4988": {
        "id": {
            "identifier": "NCBIGene:4988",
            "label": "OPRM1"
        },
        "equivalent_identifiers": [
            {
                "identifier": "NCBIGene:4988",
                "label": "OPRM1"
            },
            {
                "identifier": "ENSEMBL:ENSG00000112038"
            },
            {
                "identifier": "HGNC:8156",
                "label": "OPRM1"
            },
            {
                "identifier": "OMIM:600018"
            },
            {
                "identifier": "UMLS:C1417965",
                "label": "OPRM1 gene"
            }
        ],
        "type": [
            "biolink:Gene",
            "biolink:BiologicalEntity",
            "biolink:NamedThing",
            "biolink:Entity",
            "biolink:GeneOrGeneProduct",
            "biolink:GenomicEntity",
            "biolink:ChemicalEntityOrGeneOrGeneProduct",
            "biolink:PhysicalEssence",
            "biolink

The preferred CURIE for each entry is shown below.  In this case, for all three of the CURIEs submitted for `OPRM1`, the preferred CURIE is the same: `NCBIGene:4988`.

In [11]:
for curie, result in results_json.items():
    print(f"CURIE: {curie}")
    print(f'Preferred ID: {result["id"]["identifier"]}\n')

CURIE: NCBIGene:4988
Preferred ID: NCBIGene:4988

CURIE: UMLS:C1417965
Preferred ID: NCBIGene:4988

CURIE: HGNC:8156
Preferred ID: NCBIGene:4988



In [12]:
# Collect the distinct preferred identifiers across all submitted CURIEs
id_list = []
for curie, result in results_json.items():
    if result['id']['identifier'] not in id_list:
        id_list.append(result['id']['identifier'])
        
print(id_list)

['NCBIGene:4988']
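
The de-duplication above can also be written as a grouping step that records which input CURIEs map to each preferred identifier. This is a sketch under one stated assumption: that an unrecognized CURIE comes back with a `null` value (`None` in Python), a case this notebook does not exercise. `group_by_preferred` and the `EXAMPLE:0000` entry are hypothetical, and `sample_results` mirrors only the `id` portion of the response shown earlier.

```python
def group_by_preferred(results):
    """Map each preferred identifier to the input CURIEs that resolve to it.

    Inputs whose value is None (assumed to mean "not recognized") are
    returned separately instead of raising a TypeError.
    """
    grouped = {}
    unresolved = []
    for curie, result in results.items():
        if result is None:
            unresolved.append(curie)
            continue
        grouped.setdefault(result["id"]["identifier"], []).append(curie)
    return grouped, unresolved

# Trimmed stand-in for results_json, plus one hypothetical unrecognized CURIE
sample_results = {
    "NCBIGene:4988": {"id": {"identifier": "NCBIGene:4988", "label": "OPRM1"}},
    "UMLS:C1417965": {"id": {"identifier": "NCBIGene:4988", "label": "OPRM1"}},
    "HGNC:8156": {"id": {"identifier": "NCBIGene:4988", "label": "OPRM1"}},
    "EXAMPLE:0000": None,  # hypothetical, for illustration only
}

grouped, unresolved = group_by_preferred(sample_results)
print(grouped)
print(unresolved)
```

The keys of `grouped` are the same distinct preferred identifiers that `id_list` collects.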
