In [30]:
# Parameters to set
URL_node_normalizer = 'https://nodenormalization-sri.renci.org/get_normalized_nodes'
CURIE_OPRM1_HGNC = "HGNC:8156"
CURIE_OPRM1_NCBI = "NCBIGene:4988"
CURIE_OPRM1_UMLS = "UMLS:C1417965"

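from urllib.parse import quote

def URL_name_resolution_search(search_string):
    # Percent-encode the search string so spaces and characters such as '&' or '#' survive in the query
    return f'https://name-resolution-sri.renci.org/lookup?string={quote(search_string)}&offset=0&limit=10'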

In [31]:
import requests
import json

This notebook demonstrates two separate tools that convert between `labels` and `IDs`, one for each direction.  The Name Resolver takes a name and returns a set of related IDs, while the Node Normalizer takes IDs and returns their labels and other equivalent identifiers.

We start with the Name Resolver (https://name-resolution-sri.renci.org/docs), whose lookup function takes a string and returns potential identifiers.  Here, we look up the string `tremor`.

In [37]:
results = requests.post(URL_name_resolution_search('tremor')).json()
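
The lookup response is a JSON object that maps each candidate CURIE to the list of names matched for it. As a minimal sketch, assuming that shape, a small helper can condense the response to one line per CURIE before the full output is printed. The helper name `summarize_lookup` and the `sample` data (a hand-shortened excerpt of the `tremor` results) are illustrative and not part of the Name Resolver API.

```python
def summarize_lookup(results):
    """Return (curie, first_name, name_count) for each CURIE in a lookup response."""
    summary = []
    for curie, names in results.items():
        # Guard against an empty name list rather than assuming one never occurs
        first_name = names[0] if names else None
        summary.append((curie, first_name, len(names)))
    return summary

# Hand-shortened excerpt of the `tremor` lookup output, so the sketch runs offline
sample = {
    "NCIT:C146780": ["Tremor", "Tremor, CTCAE_5", "Tremor, CTCAE 5.0"],
    "HP:0001337": ["tremor", "Tremor", "TREMOR", "tremors"],
}

for curie, first_name, count in summarize_lookup(sample):
    print(f"{curie}: {first_name} ({count} names)")
```

In the notebook itself, `summarize_lookup(results)` would be called on the live response instead of `sample`.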

In [38]:
print(json.dumps(results,indent=4))

{
    "NCIT:C146780": [
        "Tremor",
        "Tremor, CTCAE_5",
        "Tremor, CTCAE 5.0"
    ],
    "HP:0001337": [
        "tremor",
        "Tremor",
        "TREMOR",
        "tremors",
        "TREMORS",
        "Tremors",
        "d tremors",
        "Tremor NOS",
        "Tremor, NOS",
        "Has a tremor",
        "A46-A47 TREMORS",
        "Shaking/Tremors",
        "Tremor (finding)",
        "tremor (diagnosis)",
        "tremors as symptom",
        "Tremor, unspecified",
        "tremor (physical finding)",
        "motor exam involuntary movements tremor trembles",
        "involuntary shaking or trembling movements (tremor)",
        "shake",
        "quiver",
        "Shakes",
        "shakes",
        "quivers",
        "Shaking",
        "SHAKING",
        "tremble",
        "Tremble",
        "shaking",
        "Trembled",
        "Trembles",
        "trembles",
        "Quivered",
        "quivering",
        "trembling",
        "Quivering",
        "TREMU

The Node Normalizer (https://nodenormalization-sri.renci.org/docs) takes any CURIE as input and returns the preferred CURIE along with all other CURIEs it treats as equivalent, including the input CURIE itself. It also returns labels for the node, the Biolink classes of the node, and often the information content of the node.
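
To make the response structure concrete, here is a minimal sketch of a helper that pulls those fields out of a single result entry. The `id`, `equivalent_identifiers`, and `type` keys match the outputs printed later in this notebook. The `information_content` key is an assumption, since the printed outputs are cut off before that field would appear, so it is read with `.get()`. The helper name `summarize_node` and the shortened `sample_entry` are illustrative.

```python
def summarize_node(entry):
    """Pull the main fields out of one Node Normalizer result entry."""
    return {
        "preferred_id": entry["id"]["identifier"],
        "label": entry["id"].get("label"),
        "equivalent_ids": [e["identifier"] for e in entry.get("equivalent_identifiers", [])],
        "types": entry.get("type", []),
        # Assumed key name; returns None when the field is absent
        "information_content": entry.get("information_content"),
    }

# Entry for HGNC:8156 copied from this notebook's output, with the type list shortened
sample_entry = {
    "id": {"identifier": "NCBIGene:4988", "label": "OPRM1"},
    "equivalent_identifiers": [
        {"identifier": "NCBIGene:4988", "label": "OPRM1"},
        {"identifier": "ENSEMBL:ENSG00000112038"},
        {"identifier": "HGNC:8156", "label": "OPRM1"},
        {"identifier": "OMIM:600018"},
        {"identifier": "UMLS:C1417965", "label": "OPRM1 gene"},
    ],
    "type": ["biolink:Gene", "biolink:BiologicalEntity", "biolink:NamedThing"],
}

summary = summarize_node(sample_entry)
print(summary["preferred_id"], summary["label"], len(summary["equivalent_ids"]))
```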

In [None]:
nn_query = {
  "curies": [
    CURIE_OPRM1_HGNC
  ],
  "conflate": True
}
results_nn_true = requests.post(URL_node_normalizer,json=nn_query)

In [None]:
print(json.dumps(results_nn_true.json(),indent=4))

Note that when `conflate` is set to `True`, both gene and protein identifiers are included in the results.  When `conflate` is set to `False`, gene and protein identifiers are not merged in the output.  In the run below, where `conflate` is `False`, only 5 entries are present in `equivalent_identifiers`.
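
One way to see what conflation adds is to diff the `equivalent_identifiers` lists of the two responses. The following is a sketch under stated assumptions: the helper name `added_by_conflation` is hypothetical, the unconflated identifiers are taken from this notebook's output, and the extra `EXAMPLE:protein-1` identifier is a placeholder standing in for whatever protein identifiers the service actually returns.

```python
def added_by_conflation(conflated_entry, unconflated_entry):
    """List identifiers that appear only in the conflated entry, in response order."""
    base = {e["identifier"] for e in unconflated_entry["equivalent_identifiers"]}
    return [
        e["identifier"]
        for e in conflated_entry["equivalent_identifiers"]
        if e["identifier"] not in base
    ]

# The five identifiers reported for OPRM1 when conflate is False
unconflated = {
    "equivalent_identifiers": [
        {"identifier": "NCBIGene:4988"},
        {"identifier": "ENSEMBL:ENSG00000112038"},
        {"identifier": "HGNC:8156"},
        {"identifier": "OMIM:600018"},
        {"identifier": "UMLS:C1417965"},
    ]
}

# Placeholder data: the same list plus one made-up protein identifier
conflated = {
    "equivalent_identifiers": unconflated["equivalent_identifiers"]
    + [{"identifier": "EXAMPLE:protein-1"}]
}

print(added_by_conflation(conflated, unconflated))
```

Once both live responses are available, the same helper could be applied as `added_by_conflation(results_nn_true.json()[CURIE_OPRM1_HGNC], results_nn_false.json()[CURIE_OPRM1_HGNC])`.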

In [10]:
nn_query = {
  "curies": [
    CURIE_OPRM1_HGNC
  ],
  "conflate": False
}
results_nn_false = requests.post(URL_node_normalizer,json=nn_query)

In [11]:
print(json.dumps(results_nn_false.json(),indent=4))

{
    "HGNC:8156": {
        "id": {
            "identifier": "NCBIGene:4988",
            "label": "OPRM1"
        },
        "equivalent_identifiers": [
            {
                "identifier": "NCBIGene:4988",
                "label": "OPRM1"
            },
            {
                "identifier": "ENSEMBL:ENSG00000112038"
            },
            {
                "identifier": "HGNC:8156",
                "label": "OPRM1"
            },
            {
                "identifier": "OMIM:600018"
            },
            {
                "identifier": "UMLS:C1417965",
                "label": "OPRM1 gene"
            }
        ],
        "type": [
            "biolink:Gene",
            "biolink:BiologicalEntity",
            "biolink:NamedThing",
            "biolink:Entity",
            "biolink:GeneOrGeneProduct",
            "biolink:GenomicEntity",
            "biolink:ChemicalEntityOrGeneOrGeneProduct",
            "biolink:PhysicalEssence",
            "biolink:Ont

In [25]:
nn_query = {
  "curies": [
    CURIE_OPRM1_NCBI,
    CURIE_OPRM1_UMLS,
    CURIE_OPRM1_HGNC
  ],
  "conflate": False
}
results_nn_multiple_inputs = requests.post(URL_node_normalizer,json=nn_query)

In [29]:
results_json = results_nn_multiple_inputs.json()
print(json.dumps(results_json,indent=4))

{
    "NCBIGene:4988": {
        "id": {
            "identifier": "NCBIGene:4988",
            "label": "OPRM1"
        },
        "equivalent_identifiers": [
            {
                "identifier": "NCBIGene:4988",
                "label": "OPRM1"
            },
            {
                "identifier": "ENSEMBL:ENSG00000112038"
            },
            {
                "identifier": "HGNC:8156",
                "label": "OPRM1"
            },
            {
                "identifier": "OMIM:600018"
            },
            {
                "identifier": "UMLS:C1417965",
                "label": "OPRM1 gene"
            }
        ],
        "type": [
            "biolink:Gene",
            "biolink:BiologicalEntity",
            "biolink:NamedThing",
            "biolink:Entity",
            "biolink:GeneOrGeneProduct",
            "biolink:GenomicEntity",
            "biolink:ChemicalEntityOrGeneOrGeneProduct",
            "biolink:PhysicalEssence",
            "biolink

In [27]:
for curie, result in results_json.items():
    print(f"CURIE: {curie}")
    print(f'Preferred ID: {result["id"]["identifier"]}\n')

CURIE: NCBIGene:4988
Preferred ID: NCBIGene:4988

CURIE: UMLS:C1417965
Preferred ID: NCBIGene:4988

CURIE: HGNC:8156
Preferred ID: NCBIGene:4988



In [28]:
id_list = []
for curie, result in results_json.items():
    if result['id']['identifier'] not in id_list:
        id_list.append(result['id']['identifier'])
        
print(id_list)

['NCBIGene:4988']
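
The de-duplicated list above shows that all three inputs share one preferred identifier, but it drops the record of which input mapped where. As a minimal sketch, the mapping can be kept by grouping input CURIEs under their preferred ID. The helper name `group_by_preferred_id` is hypothetical. Skipping `None` values is an assumption about how an unrecognized CURIE would be reported, which this notebook does not show. The sample data keeps only the `id` field of each result.

```python
from collections import defaultdict

def group_by_preferred_id(results_json):
    """Map each preferred identifier to the input CURIEs that normalize to it."""
    groups = defaultdict(list)
    for curie, result in results_json.items():
        if result is None:
            # Assumed representation of a CURIE the service does not recognize
            continue
        groups[result["id"]["identifier"]].append(curie)
    return dict(groups)

# Results for the three OPRM1 inputs, trimmed to the `id` field
sample_results = {
    "NCBIGene:4988": {"id": {"identifier": "NCBIGene:4988", "label": "OPRM1"}},
    "UMLS:C1417965": {"id": {"identifier": "NCBIGene:4988", "label": "OPRM1"}},
    "HGNC:8156": {"id": {"identifier": "NCBIGene:4988", "label": "OPRM1"}},
}

print(group_by_preferred_id(sample_results))
# {'NCBIGene:4988': ['NCBIGene:4988', 'UMLS:C1417965', 'HGNC:8156']}
```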
