# NodeNormalization

## Introduction

[Node normalization](https://nodenormalization-sri.renci.org/apidocs) takes a CURIE and returns:

* The preferred CURIE for this entity
* All other known equivalent identifiers for the entity
* Semantic types for the entity as defined by the [Biolink Model](https://biolink.github.io/biolink-model/)

The data currently served by NodeNormalization is produced by the prototype project [Babel](https://github.com/TranslatorIIPrototypes/Babel), which attempts to find identifier equivalences and ensures that CURIE prefixes are Biolink Model compliant. NodeNormalization itself is independent of Babel, so results from improved identifier-equivalence tools can be incorporated as they are developed.

## Metadata

Two metadata services report what has been incorporated into NodeNormalization: one returns the semantic types that are included, and the other returns the prefixes included for each type.

Which types have been normalized?

In [1]:
import json
import requests

result = requests.get('https://nodenormalization-sri.renci.org/get_semantic_types')
print(json.dumps(result.json(), indent=2))

{
  "semantic_types": {
    "types": [
      "biolink:ChemicalSubstance",
      "biolink:OntologyClass",
      "biolink:MolecularEntity",
      "biolink:AnatomicalEntity",
      "biolink:GenomicEntity",
      "biolink:CellularComponent",
      "biolink:GeneOrGeneProduct",
      "biolink:PhenotypicFeature",
      "biolink:BiologicalProcess",
      "biolink:OrganismTaxon",
      "biolink:GeneFamily",
      "biolink:DiseaseOrPhenotypicFeature",
      "biolink:Gene",
      "biolink:BiologicalProcessOrActivity",
      "biolink:OrganismalEntity",
      "biolink:Disease",
      "biolink:BiologicalEntity",
      "biolink:MolecularActivity",
      "biolink:MacromolecularMachine",
      "biolink:Cell",
      "biolink:NamedThing",
      "biolink:Pathway"
    ]
  }
}


Even if a semantic type has some identifier equivalence, not every vocabulary has been included.  To see which vocabularies are likely to give useful results, call:

In [3]:
result = requests.get('https://nodenormalization-sri.renci.org/get_curie_prefixes/',
                      params={'semantic_type': "biolink:ChemicalSubstance"})
print(json.dumps(result.json(), indent=2))

{
  "biolink:ChemicalSubstance": {
    "curie_prefix": {
      "PUBCHEM.COMPOUND": "70567467",
      "INCHIKEY": "70408859",
      "CHEMBL.COMPOUND": "1380728",
      "MESH": "200431",
      "CHEBI": "91359",
      "KEGG": "13684",
      "HMDB": "83303",
      "UNII": "60587",
      "DRUGBANK": "7843",
      "gtpo": "5958"
    }
  }
}


More than one type can be queried:

In [4]:
result = requests.get('https://nodenormalization-sri.renci.org/get_curie_prefixes/',
                      params={'semantic_type': ["biolink:ChemicalSubstance", "biolink:Disease"]})
print(json.dumps(result.json(), indent=2))

{
  "biolink:ChemicalSubstance": {
    "curie_prefix": {
      "PUBCHEM.COMPOUND": "70567467",
      "INCHIKEY": "70408859",
      "CHEMBL.COMPOUND": "1380728",
      "MESH": "200431",
      "CHEBI": "91359",
      "KEGG": "13684",
      "HMDB": "83303",
      "UNII": "60587",
      "DRUGBANK": "7843",
      "gtpo": "5958"
    }
  },
  "biolink:Disease": {
    "curie_prefix": {
      "UMLS": "109600",
      "SNOMEDCT": "75365",
      "NCIT": "18041",
      "MEDDRA": "10946",
      "MONDO": "22251",
      "DOID": "8995",
      "OMIM": "8762",
      "ORPHANET": "9182",
      "MESH": "10636",
      "HP": "1834",
      "EFO": "1914",
      "medgen": "3",
      "ICD10": "12",
      "ICD9": "5",
      "MP": "2"
    }
  }
}
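
The counts in these responses are JSON strings rather than numbers, so comparing them directly would order them lexicographically (for example, `"9182"` would sort above `"109600"`). The following is a minimal sketch, assuming the response shape shown above, of ranking the prefixes for one semantic type by numeric count. The helper name `rank_prefixes` and the abbreviated `sample` dictionary are illustrative and not part of the service.

```python
def rank_prefixes(prefix_response, semantic_type):
    """Return (prefix, count) pairs for one semantic type, largest count first.

    Assumes the response shape shown above:
    {semantic_type: {"curie_prefix": {prefix: "count-as-string"}}}
    """
    counts = prefix_response[semantic_type]["curie_prefix"]
    # Counts arrive as strings, so convert to int before sorting.
    pairs = [(prefix, int(count)) for prefix, count in counts.items()]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Abbreviated copy of the biolink:Disease output above (illustrative only).
sample = {
    "biolink:Disease": {
        "curie_prefix": {"MONDO": "22251", "ORPHANET": "9182", "UMLS": "109600", "ICD9": "5"}
    }
}

ranked = rank_prefixes(sample, "biolink:Disease")
for prefix, count in ranked:
    print(f"{prefix:10s} {count}")
```

With a live response, `result.json()` from the cell above could be passed in place of `sample`.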


## Normalization

Given one or more Compact URIs (CURIEs), `get_normalized_nodes` returns a list of equivalent identifiers for each entity, along with the Translator-preferred identifier and the semantic type(s) for the entity. The service only returns pre-computed values and performs no equivalence inference of its own. If a CURIE is unknown to it, null is returned for that CURIE.

In this example, `get_normalized_nodes` is called with a MeSH identifier. MeSH contains terms of many different semantic types, but the service correctly identifies this term as a chemical substance.

In [5]:
result = requests.get('https://nodenormalization-sri.renci.org/get_normalized_nodes',
                      params={'curie': "MESH:D014867"})
print(json.dumps(result.json(), indent=2))

{
  "MESH:D014867": {
    "id": {
      "identifier": "CHEBI:15377",
      "label": "water"
    },
    "equivalent_identifiers": [
      {
        "identifier": "CHEBI:15377",
        "label": "water"
      },
      {
        "identifier": "CHEMBL.COMPOUND:CHEMBL1098659",
        "label": "WATER"
      },
      {
        "identifier": "DRUGBANK:DB09145"
      },
      {
        "identifier": "PUBCHEM.COMPOUND:22247451"
      },
      {
        "identifier": "PUBCHEM.COMPOUND:962"
      },
      {
        "identifier": "MESH:D014867",
        "label": "Water"
      },
      {
        "identifier": "HMDB:HMDB0002111"
      },
      {
        "identifier": "INCHIKEY:IKBQPNVYXHKVJS-LVZFUZTISA-N"
      },
      {
        "identifier": "UNII:059QF0KO0R"
      },
      {
        "identifier": "KEGG:C00001",
        "label": "H2O"
      }
    ],
    "type": [
      "biolink:ChemicalSubstance",
      "biolink:MolecularEntity",
      "biolink:BiologicalEntity",
      "biolink:NamedThing"
    ]
  }
}

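Because unknown CURIEs come back as null, code that reads the response usually needs to handle both cases. The sketch below, assuming the response shape shown above, maps each queried CURIE to its preferred identifier or to `None`. The function name `preferred_ids`, the abbreviated `sample` data, and the placeholder CURIE `EXAMPLE:UNKNOWN` are hypothetical and used only for illustration.

```python
def preferred_ids(response_json):
    """Map each queried CURIE to its preferred identifier, or None if unknown."""
    mapping = {}
    for curie, entry in response_json.items():
        # An unknown CURIE is returned as null, which decodes to None.
        mapping[curie] = entry["id"]["identifier"] if entry else None
    return mapping


# Abbreviated response: one known CURIE and one hypothetical unknown CURIE.
sample = {
    "MESH:D014867": {
        "id": {"identifier": "CHEBI:15377", "label": "water"},
        "equivalent_identifiers": [
            {"identifier": "CHEBI:15377", "label": "water"},
            {"identifier": "MESH:D014867", "label": "Water"},
        ],
        "type": ["biolink:ChemicalSubstance", "biolink:NamedThing"],
    },
    "EXAMPLE:UNKNOWN": None,
}

mapping = preferred_ids(sample)
unknown = [curie for curie, preferred in mapping.items() if preferred is None]
print(mapping)
print("unknown:", unknown)
```
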
To improve performance, multiple CURIEs may be batched into a single POST request:

In [11]:
result = requests.post('https://nodenormalization-sri.renci.org/get_normalized_nodes',
                       json={"curies": ["HP:0007354", "HGNC:613", "CURIE:NOTHING"]})
print(json.dumps(result.json(), indent=2))

{
  "HP:0007354": {
    "id": {
      "identifier": "MONDO:0004976",
      "label": "amyotrophic lateral sclerosis"
    },
    "equivalent_identifiers": [
      {
        "identifier": "MONDO:0004976",
        "label": "amyotrophic lateral sclerosis"
      },
      {
        "identifier": "DOID:332"
      },
      {
        "identifier": "ORPHANET:803"
      },
      {
        "identifier": "EFO:0000253",
        "label": "amyotrophic lateral sclerosis"
      },
      {
        "identifier": "UMLS:C0393554"
      },
      {
        "identifier": "UMLS:C0543859"
      },
      {
        "identifier": "UMLS:C0002736"
      },
      {
        "identifier": "MESH:D000690"
      },
      {
        "identifier": "MEDDRA:10002026"
      },
      {
        "identifier": "NCIT:C34373"
      },
      {
        "identifier": "SNOMEDCT:230258005"
      },
      {
        "identifier": "SNOMEDCT:86044005"
      },
      {
        "identifier": "HP:0007354",
        "label": "Amyotrophic lateral scl
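
For long identifier lists, it may be convenient to split the CURIEs into several POST bodies rather than sending one very large request. The sketch below only builds the payloads, using the `{"curies": [...]}` body from the cell above. The helper names and the default batch size of 1000 are assumptions for illustration; the source does not state any limit imposed by the service.

```python
def chunked(items, size):
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_payloads(curies, size=1000):
    """Build one POST body per batch. The default size is an arbitrary assumption."""
    return [{"curies": batch} for batch in chunked(list(curies), size)]


payloads = build_payloads(["HP:0007354", "HGNC:613", "CURIE:NOTHING"], size=2)
print(payloads)
# [{'curies': ['HP:0007354', 'HGNC:613']}, {'curies': ['CURIE:NOTHING']}]
```

Each payload could then be sent with `requests.post(url, json=payload)` as in the cell above, and the per-batch dictionaries merged with `dict.update`.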

## TRAPI

NodeNormalization now also operates on TRAPI (Translator Reasoner API) messages (version 1.0).

In [12]:
trapi_message = {
  "query_graph": {
    "nodes": {
      "n1": {
        "id": "HGNC:11603",
        "category": [
          "biolink:Gene"
        ]
      },
      "n2": {
        "id": "NCBIGene:9496",
        "category": [
          "biolink:Gene"
        ]
      },
      "n3": {
        "id": "MONDO:0005002",
        "category": [
          "biolink:Disease"
        ]
      },
      "n4": {
        "id": "DOID:3083",
        "category": [
          "biolink:Disease"
        ]
      },
      "n5": {
        "category": [
          "biolink:Disease"
        ]
      }
    },
    "edges": {
      "e1": {
        "subject": "n1",
        "object": "n3"
      },
      "e2": {
        "subject": "n2",
        "object": "n4",
        "predicate": "biolink:related_to"
      },
      "e3": {
        "subject": "n1",
        "object": "n5"
      }
    }
  },
  "knowledge_graph": {
    "nodes": {
      "HGNC:11603": {
        "name": "TBX4",
        "category": [
          "biolink:Gene"
        ]
      },
      "NCBIGene:9496": {
        "name": "T-box transcription factor 4",
        "category": [
          "biolink:Gene"
        ]
      },
      "MONDO:0005002": {
        "name": "chronic obstructive pulmonary disease",
        "category": [
          "biolink:Disease"
        ]
      },
      "DOID:3083": {
        "name": "chronic obstructive pulmonary disease",
        "category": [
          "biolink:Disease"
        ]
      },
      "UMLS:CN202575": {
        "name": "heritable pulmonary arterial hypertension",
        "category": [
          "biolink:Disease"
        ]
      }
    },
    "edges": {
      "a8575c4e-61a6-428a-bf09-fcb3e8d1644d": {
        "subject": "HGNC:11603",
        "object": "MONDO:0005002",
        "predicate": "biolink:related_to",
        "relation": "RO:0003304"
      },
      "2d38345a-e9bf-4943-accb-dccba351dd04": {
        "subject": "NCBIGene:9496",
        "object": "DOID:3083",
        "predicate": "biolink:related_to",
        "relation": "RO:0003304"
      },
      "044a7916-fba9-4b4f-ae48-f0815b0b222d": {
        "subject": "HGNC:11603",
        "object": "UMLS:CN202575",
        "predicate": "biolink:related_to",
        "relation": "RO:0004013"
      }
    }
  },
  "results": [
    {
      "node_bindings": {
        "n1": [
          {
            "id": "HGNC:11603"
          }
        ],
        "n3": [
          {
            "id": "MONDO:0005002"
          }
        ]
      },
      "edge_bindings": {
        "e1": [
          {
            "id": "a8575c4e-61a6-428a-bf09-fcb3e8d1644d"
          }
        ]
      }
    },
    {
      "node_bindings": {
        "n2": [
          {
            "id": "NCBIGene:9496"
          }
        ],
        "n4": [
          {
            "id": "DOID:3083"
          }
        ]
      },
      "edge_bindings": {
        "e2": [
          {
            "id": "2d38345a-e9bf-4943-accb-dccba351dd04"
          }
        ]
      }
    },
    {
      "node_bindings": {
        "n1": [
          {
            "id": "HGNC:11603"
          }
        ],
        "n5": [
          {
            "id": "UMLS:CN202575"
          }
        ]
      },
      "edge_bindings": {
        "e3": [
          {
            "id": "044a7916-fba9-4b4f-ae48-f0815b0b222d"
          }
        ]
      }
    }
  ]
}
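
In a message of this shape, CURIEs appear in three places: the `id` of query graph nodes, the keys of the knowledge graph `nodes` dictionary, and the `id` of each node binding in `results`. The sketch below collects them into one set, which can be useful for checking a message before and after normalization. The function name `collect_curies` and the small `example_message` are illustrative; how the service treats each location is not described here.

```python
def collect_curies(message):
    """Collect the node CURIEs found in a TRAPI 1.0-style message dictionary."""
    curies = set()

    # Query graph nodes: "id" is optional and may be a string or a list.
    for node in message.get("query_graph", {}).get("nodes", {}).values():
        node_id = node.get("id")
        if isinstance(node_id, str):
            curies.add(node_id)
        elif isinstance(node_id, list):
            curies.update(node_id)

    # Knowledge graph nodes are keyed by CURIE.
    curies.update(message.get("knowledge_graph", {}).get("nodes", {}))

    # Each result binds query node keys to lists of {"id": CURIE} objects.
    for result in message.get("results", []):
        for bindings in result.get("node_bindings", {}).values():
            curies.update(binding["id"] for binding in bindings)

    return curies


# A reduced message with the same structure as the one above.
example_message = {
    "query_graph": {"nodes": {"n1": {"id": "HGNC:11603"}, "n5": {"category": ["biolink:Disease"]}}},
    "knowledge_graph": {"nodes": {"HGNC:11603": {}, "UMLS:CN202575": {}}},
    "results": [{"node_bindings": {"n5": [{"id": "UMLS:CN202575"}]}}],
}

print(sorted(collect_curies(example_message)))
# ['HGNC:11603', 'UMLS:CN202575']
```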

In [16]:
result = requests.post('https://nodenormalization-sri.renci.org/message', json=trapi_message)
print(result.status_code)
print(json.dumps(result.json(), indent=2))

500


JSONDecodeError: Expecting value: line 1 column 1 (char 0)
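
The call above returned HTTP 500 with a body that was not JSON, so `result.json()` raised `JSONDecodeError`. A more defensive pattern is to check the status code before decoding. The sketch below is one way to do that; `json_or_none` is a hypothetical helper, and `FakeResponse` is a stand-in with the same `status_code` attribute and `json()` method as a `requests` response, so the example runs without contacting the service.

```python
import json


def json_or_none(response):
    """Return the decoded JSON body for a 2xx response, otherwise None."""
    if not 200 <= response.status_code < 300:
        return None
    try:
        return response.json()
    except ValueError:
        # JSON decode errors are subclasses of ValueError.
        return None


class FakeResponse:
    """Minimal stand-in for a requests response, for illustration only."""

    def __init__(self, status_code, body):
        self.status_code = status_code
        self._body = body

    def json(self):
        return json.loads(self._body)


failed = FakeResponse(500, "")
succeeded = FakeResponse(200, '{"message": {}}')

print(json_or_none(failed))
print(json_or_none(succeeded))
```

With a real call, `json_or_none(result)` would return `None` for the 500 response shown above instead of raising.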