![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/nlu/blob/master/examples/release_notebooks/NLU_3_0_2_release_notebook.ipynb)


# Entity Resolution
**Named entities** are sub-strings in a text that can be classified into categories. For example, in the string   
`"Tesla is a great stock to invest  in "`, the sub-string `"Tesla"` is a named entity that an ML algorithm can classify with the label `company`.  
**Named entities** can be extracted with the pre-trained, deep-learning-based NER algorithms provided by NLU. 



After extracting **named entities**, an **entity resolution algorithm** can be applied to them. The resolution algorithm classifies each extracted entity into a class, which reduces the dimensionality of the data and has many useful applications. 
For example: 
- "**Tesla** is a great stock to invest in "
- "**TSLA**  is a great stock to invest  in "
- "**Tesla, Inc** is a great company to invest in"    

The sub-strings `Tesla`, `TSLA` and `Tesla, Inc` are all named entities that the NER algorithm classifies with the label `company`. This tells us that all three sub-strings are of type `company`, but we cannot yet infer that they refer to the same company.    

Resolver algorithms solve exactly this problem: they resolve all three entities to a common identifier, such as a company ID. Every reference to Tesla, regardless of how the string is written, is mapped to the same ID.
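
As a toy illustration of the idea (not how NLU's resolvers are implemented), the mapping from surface forms to a shared ID can be sketched as a lookup over normalized strings. The alias table, the helper names and the ID `COMPANY_0001` are made up for this example.

```python
import re

# Hypothetical alias table: normalized surface form -> shared company ID.
ALIASES = {
    "tesla": "COMPANY_0001",
    "tsla": "COMPANY_0001",
    "tesla inc": "COMPANY_0001",
}


def normalize(mention: str) -> str:
    """Lowercase, replace punctuation with spaces and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", mention.lower())
    return " ".join(cleaned.split())


def resolve(mention: str):
    """Return the shared ID for a mention, or None if the mention is unknown."""
    return ALIASES.get(normalize(mention))


mentions = ["Tesla", "TSLA", "Tesla, Inc"]
resolved = {m: resolve(m) for m in mentions}
print(resolved)
```

A real resolver replaces the hand-written alias table with a trained model, so it can also handle spellings it has never seen.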

This example extends analogously to healthcare and other text domains. In medical documents, the same disease can be referenced in many different ways. 

With NLU Healthcare you can leverage state-of-the-art pre-trained NER models to extract **Medical Named Entities** (Diseases, Treatments, Posology, etc.) and **resolve these** to common **healthcare disease codes**.


These algorithms are provided by **Spark NLP for Healthcare's** [SentenceEntityResolver](https://nlp.johnsnowlabs.com/docs/en/licensed_annotators#sentenceentityresolver) and [ChunkEntityResolver](https://nlp.johnsnowlabs.com/docs/en/licensed_annotators#chunkentityresolver).
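
One way to picture a sentence-embedding-based resolver is as a nearest-neighbour lookup: the text is embedded, and the candidate code whose embedding is most similar wins. The sketch below uses made-up 3-dimensional vectors and placeholder codes (`CODE_A`, ...); it is a conceptual illustration under those assumptions, not the Spark NLP implementation.

```python
import math

# Made-up 3-dimensional "embeddings" for candidate concepts. Real sentence
# embeddings have hundreds of dimensions and come from a trained model.
CANDIDATES = {
    "CODE_A": ("diabetes mellitus", [0.9, 0.1, 0.0]),
    "CODE_B": ("pancreatitis", [0.1, 0.9, 0.1]),
    "CODE_C": ("obesity", [0.0, 0.2, 0.9]),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def resolve(query_vec):
    """Return (code, description, similarity) of the closest candidate."""
    code, (desc, vec) = max(
        CANDIDATES.items(), key=lambda item: cosine(query_vec, item[1][1])
    )
    return code, desc, cosine(query_vec, vec)


# Hypothetical embedding of a mention that is close to "diabetes mellitus".
code, desc, score = resolve([0.8, 0.2, 0.1])
print(code, desc, round(score, 3))
```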

## New Entity Resolvers In NLU 3.0.2 

| NLU REF                           | NLP REF                                 |
|-----------------------------------|-----------------------------------------|
|[`en.resolve.umls`                    ](https://nlp.johnsnowlabs.com/2021/05/16/sbiobertresolve_umls_major_concepts_en.html)| [`sbiobertresolve_umls_major_concepts`](https://nlp.johnsnowlabs.com/2021/05/16/sbiobertresolve_umls_major_concepts_en.html)     |
|[`en.resolve.umls.findings`           ](https://nlp.johnsnowlabs.com/2021/05/16/sbiobertresolve_umls_findings_en.html)| [`sbiobertresolve_umls_findings`](https://nlp.johnsnowlabs.com/2021/05/16/sbiobertresolve_umls_findings_en.html)           |
|[`en.resolve.loinc`                   ](https://nlp.johnsnowlabs.com/2021/04/29/sbiobertresolve_loinc_en.html)| [`sbiobertresolve_loinc`](https://nlp.johnsnowlabs.com/2021/04/29/sbiobertresolve_loinc_en.html)                   |
|[`en.resolve.loinc.biobert`           ](https://nlp.johnsnowlabs.com/2021/04/29/sbiobertresolve_loinc_en.html)| [`sbiobertresolve_loinc`](https://nlp.johnsnowlabs.com/2021/04/29/sbiobertresolve_loinc_en.html)                   |
|[`en.resolve.loinc.bluebert`          ](https://nlp.johnsnowlabs.com/2021/04/29/sbluebertresolve_loinc_en.html)| [`sbluebertresolve_loinc`](https://nlp.johnsnowlabs.com/2021/04/29/sbluebertresolve_loinc_en.html)                  |
|[`en.resolve.HPO`                     ](https://nlp.johnsnowlabs.com/2021/05/16/sbiobertresolve_HPO_en.html)| [`sbiobertresolve_HPO`](https://nlp.johnsnowlabs.com/2021/05/16/sbiobertresolve_HPO_en.html)                     |
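
The table can also be kept as a small lookup in code. The sketch below copies the six references from the table and builds the space-separated string that the cells in this notebook pass to `nlu.load`; the helper `pipeline_ref` is a hypothetical convenience, not part of NLU.

```python
# NLU reference -> Spark NLP model name, copied from the table above.
RESOLVERS = {
    "en.resolve.umls": "sbiobertresolve_umls_major_concepts",
    "en.resolve.umls.findings": "sbiobertresolve_umls_findings",
    "en.resolve.loinc": "sbiobertresolve_loinc",
    "en.resolve.loinc.biobert": "sbiobertresolve_loinc",
    "en.resolve.loinc.bluebert": "sbluebertresolve_loinc",
    "en.resolve.HPO": "sbiobertresolve_HPO",
}

# NER reference used together with every resolver in this notebook.
NER_REF = "med_ner.jsl.wip.clinical"


def pipeline_ref(resolver_ref: str, ner_ref: str = NER_REF) -> str:
    """Build the 'NER resolver' string for a resolver listed in the table."""
    if resolver_ref not in RESOLVERS:
        raise KeyError(f"unknown resolver reference: {resolver_ref}")
    return f"{ner_ref} {resolver_ref}"


for ref, model in RESOLVERS.items():
    print(f"{pipeline_ref(ref)!r} loads {model}")
```

Note that `en.resolve.loinc` and `en.resolve.loinc.biobert` point to the same model.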



In [None]:
# Upload and add your spark_nlp_for_healthcare.json
!wget http://setup.johnsnowlabs.com/nlu/colab.sh -O - | bash
import nlu
nlu.auth(SPARK_NLP_LICENSE,AWS_ACCESS_KEY_ID,AWS_SECRET_ACCESS_KEY,JSL_SECRET)

--2021-06-01 17:05:29--  http://setup.johnsnowlabs.com/nlu/colab.sh
Resolving setup.johnsnowlabs.com (setup.johnsnowlabs.com)... 51.158.130.125
Connecting to setup.johnsnowlabs.com (setup.johnsnowlabs.com)|51.158.130.125|:80... connected.
HTTP request sent, awaiting response... 302 Moved Temporarily
Location: https://raw.githubusercontent.com/JohnSnowLabs/nlu/master/scripts/colab_setup.sh [following]
--2021-06-01 17:05:29--  https://raw.githubusercontent.com/JohnSnowLabs/nlu/master/scripts/colab_setup.sh
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1662 (1.6K) [text/plain]
Saving to: ‘STDOUT’


2021-06-01 17:05:29 (30.8 MB/s) - written to stdout [1662/1662]

Installing  NLU 3.0.1 with  PySpark 3.0.2 and Spark NLP 3.0.1 for Google Colab ...
Get:1 http://se

<module 'nlu' from '/usr/local/lib/python3.7/dist-packages/nlu/__init__.py'>

#### [Sentence Entity Resolver for UMLS CUI Codes (Major Concepts)](https://nlp.johnsnowlabs.com/2021/05/16/sbiobertresolve_umls_major_concepts_en.html)

In [None]:
nlu.load('med_ner.jsl.wip.clinical en.resolve.umls').predict("""A 28-year-old female with a history of gestational diabetes mellitus diagnosed eight years prior to presentation and
subsequent type two diabetes mellitus (TSS2DM), one prior episode of HTG-induced pancreatitis three years prior to presentation, associated with an acute 
hepatitis, and obesity with a body mass index (BMI) of 33.5 kg/m2, presented with a one-week history of polyuria, polydipsia, poor appetite, and vomiting."""
,output_level = 'sentence')

jsl_ner_wip_clinical download started this may take some time.
Approximate size to download 14.5 MB
[OK!]
sbiobertresolve_umls_major_concepts download started this may take some time.
Approximate size to download 825.5 MB
[OK!]
embeddings_clinical download started this may take some time.
Approximate size to download 1.6 GB
[OK!]
sbiobert_base_cased_mli download started this may take some time.
Approximate size to download 384.3 MB
[OK!]
sentence_detector_dl download started this may take some time.
Approximate size to download 354.6 KB
[OK!]


|   | sentence_embedding_biobert | word_embedding_glove | sentence_resolution_umls_confidence | sentence_resolution_umls_code | entities_class | sentence | entities | sentence_resolution_umls | text | entities_confidence |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | [0.20559579133987427, -0.12945963442325592, -0... | [[0.040217556059360504, 0.4003961980342865, 0.... | 0.3357 | C1969413 | [Age, Gender, Diabetes, RelativeDate, Modifier... | A 28-year-old female with a history of gestati... | [28-year-old, female, gestational diabetes mel... | onset of periodic paralysis (mean) 5 years (ra... | A 28-year-old female with a history of gestati... | [0.9999, 0.9992, 0.33813334, 0.18636668, 0.197... |
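
The resolved code and its confidence sit in the `sentence_resolution_umls_code` and `sentence_resolution_umls_confidence` columns. Since the confidence above is fairly low (0.3357), it can be useful to filter on it. The sketch below works on a plain list of dicts standing in for the result rows (with pandas, something like `df.to_dict("records")` would produce that shape); the helper `confident_codes` and the 0.5 threshold are assumptions for illustration.

```python
# Stand-in for the prediction result: one dict per row, using the column names
# and values shown in the output above (long columns are left out).
rows = [
    {
        "sentence_resolution_umls_code": "C1969413",
        "sentence_resolution_umls_confidence": 0.3357,
    },
]


def confident_codes(rows, min_confidence, prefix="sentence_resolution_umls"):
    """Return the resolved codes whose confidence is at least min_confidence."""
    kept = []
    for row in rows:
        # float() guards against confidences that arrive as strings.
        confidence = float(row[f"{prefix}_confidence"])
        if confidence >= min_confidence:
            kept.append(row[f"{prefix}_code"])
    return kept


print(confident_codes(rows, min_confidence=0.5))  # threshold is an arbitrary choice
print(confident_codes(rows, min_confidence=0.3))
```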


In [None]:
nlu.load('med_ner.jsl.wip.clinical en.resolve.umls').viz("""A 28-year-old female with a history of gestational diabetes mellitus diagnosed eight years prior to presentation and
subsequent type two diabetes mellitus (TSS2DM), one prior episode of HTG-induced pancreatitis three years prior to presentation, associated with an acute 
hepatitis, and obesity with a body mass index (BMI) of 33.5 kg/m2, presented with a one-week history of polyuria, polydipsia, poor appetite, and vomiting.""")

jsl_ner_wip_clinical download started this may take some time.
Approximate size to download 14.5 MB
[OK!]
sbiobertresolve_umls_major_concepts download started this may take some time.
Approximate size to download 825.5 MB
[OK!]
embeddings_clinical download started this may take some time.
Approximate size to download 1.6 GB
[OK!]
sbiobert_base_cased_mli download started this may take some time.
Approximate size to download 384.3 MB
[OK!]
sentence_detector_dl download started this may take some time.
Approximate size to download 354.6 KB
[OK!]


#### [Sentence Entity Resolver for UMLS CUI Codes (Findings)](https://nlp.johnsnowlabs.com/2021/05/16/sbiobertresolve_umls_findings_en.html)





In [None]:
nlu.load('med_ner.jsl.wip.clinical en.resolve.umls.findings').predict("""A 28-year-old female with a history of gestational diabetes mellitus diagnosed eight years prior to presentation and
subsequent type two diabetes mellitus (TSS2DM), one prior episode of HTG-induced pancreatitis three years prior to presentation, associated with an acute 
hepatitis, and obesity with a body mass index (BMI) of 33.5 kg/m2, presented with a one-week history of polyuria, polydipsia, poor appetite, and vomiting."""
,output_level = 'sentence')

jsl_ner_wip_clinical download started this may take some time.
Approximate size to download 14.5 MB
[OK!]
sbiobertresolve_umls_findings download started this may take some time.
Approximate size to download 541.9 MB
[OK!]
embeddings_clinical download started this may take some time.
Approximate size to download 1.6 GB
[OK!]
sbiobert_base_cased_mli download started this may take some time.
Approximate size to download 384.3 MB
[OK!]
sentence_detector_dl download started this may take some time.
Approximate size to download 354.6 KB
[OK!]


|   | entities_confidence | entities | sentence_resolution_umls_confidence | word_embedding_glove | text | entities_class | sentence_resolution_umls | sentence | sentence_resolution_umls_code | sentence_embedding_biobert |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | [0.9999, 0.9992, 0.33813334, 0.18636668, 0.197... | [28-year-old, female, gestational diabetes mel... | 0.3376 | [[0.040217556059360504, 0.4003961980342865, 0.... | A 28-year-old female with a history of gestati... | [Age, Gender, Diabetes, RelativeDate, Modifier... | hair loss begins in middle of first decade and... | A 28-year-old female with a history of gestati... | C4538462 | [0.20559579133987427, -0.12945963442325592, -0... |


In [None]:
nlu.load('med_ner.jsl.wip.clinical en.resolve.umls.findings').viz("""A 28-year-old female with a history of gestational diabetes mellitus diagnosed eight years prior to presentation and
subsequent type two diabetes mellitus (TSS2DM), one prior episode of HTG-induced pancreatitis three years prior to presentation, associated with an acute 
hepatitis, and obesity with a body mass index (BMI) of 33.5 kg/m2, presented with a one-week history of polyuria, polydipsia, poor appetite, and vomiting."""
)

jsl_ner_wip_clinical download started this may take some time.
Approximate size to download 14.5 MB
[OK!]
sbiobertresolve_umls_findings download started this may take some time.
Approximate size to download 541.9 MB
[OK!]
embeddings_clinical download started this may take some time.
Approximate size to download 1.6 GB
[OK!]
sbiobert_base_cased_mli download started this may take some time.
Approximate size to download 384.3 MB
[OK!]
sentence_detector_dl download started this may take some time.
Approximate size to download 354.6 KB
[OK!]


#### [LOINC Sentence Entity Resolver (BioBERT)](https://nlp.johnsnowlabs.com/2021/04/29/sbiobertresolve_loinc_en.html)

In [None]:
nlu.load('med_ner.jsl.wip.clinical en.resolve.loinc').predict("""A 28-year-old female with a history of gestational diabetes mellitus diagnosed eight years prior to presentation and
subsequent type two diabetes mellitus (TSS2DM), one prior episode of HTG-induced pancreatitis three years prior to presentation, associated with an acute 
hepatitis, and obesity with a body mass index (BMI) of 33.5 kg/m2, presented with a one-week history of polyuria, polydipsia, poor appetite, and vomiting."""
,output_level = 'sentence')

jsl_ner_wip_clinical download started this may take some time.
Approximate size to download 14.5 MB
[OK!]
sbiobertresolve_loinc download started this may take some time.
Approximate size to download 215.1 MB
[OK!]
sbiobert_base_cased_mli download started this may take some time.
Approximate size to download 384.3 MB
[OK!]
embeddings_clinical download started this may take some time.
Approximate size to download 1.6 GB
[OK!]
sentence_detector_dl download started this may take some time.
Approximate size to download 354.6 KB
[OK!]


|   | sentence_resolution_loinc_code | sentence_embedding_biobert | text | sentence_resolution_loinc_confidence | sentence_resolution_loinc | entities_confidence | entities | entities_class | word_embedding_glove | sentence |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 90383-1 | [0.20559579133987427, -0.12945963442325592, -0... | A 28-year-old female with a history of gestati... | 0.3348 | Considering your shortness of breath over the ... | [0.9999, 0.9992, 0.33813334, 0.18636668, 0.197... | [28-year-old, female, gestational diabetes mel... | [Age, Gender, Diabetes, RelativeDate, Modifier... | [[0.040217556059360504, 0.4003961980342865, 0.... | A 28-year-old female with a history of gestati... |


In [None]:
nlu.load('med_ner.jsl.wip.clinical en.resolve.loinc.biobert').viz("""A 28-year-old female with a history of gestational diabetes mellitus diagnosed eight years prior to presentation and
subsequent type two diabetes mellitus (TSS2DM), one prior episode of HTG-induced pancreatitis three years prior to presentation, associated with an acute 
hepatitis, and obesity with a body mass index (BMI) of 33.5 kg/m2, presented with a one-week history of polyuria, polydipsia, poor appetite, and vomiting.""")

jsl_ner_wip_clinical download started this may take some time.
Approximate size to download 14.5 MB
[OK!]
sbiobertresolve_loinc download started this may take some time.
Approximate size to download 215.1 MB
[OK!]
sbiobert_base_cased_mli download started this may take some time.
Approximate size to download 384.3 MB
[OK!]
embeddings_clinical download started this may take some time.
Approximate size to download 1.6 GB
[OK!]
sentence_detector_dl download started this may take some time.
Approximate size to download 354.6 KB
[OK!]


#### [LOINC Sentence Entity Resolver (BlueBERT)](https://nlp.johnsnowlabs.com/2021/04/29/sbluebertresolve_loinc_en.html)

In [None]:
nlu.load('med_ner.jsl.wip.clinical en.resolve.loinc.bluebert').predict("""A 28-year-old female with a history of gestational diabetes mellitus diagnosed eight years prior to presentation and
subsequent type two diabetes mellitus (TSS2DM), one prior episode of HTG-induced pancreatitis three years prior to presentation, associated with an acute 
hepatitis, and obesity with a body mass index (BMI) of 33.5 kg/m2, presented with a one-week history of polyuria, polydipsia, poor appetite, and vomiting."""
,output_level = 'sentence')

jsl_ner_wip_clinical download started this may take some time.
Approximate size to download 14.5 MB
[OK!]
sbluebertresolve_loinc download started this may take some time.
Approximate size to download 216.2 MB
[OK!]
sbluebert_base_uncased_mli download started this may take some time.
Approximate size to download 388.1 MB
[OK!]
embeddings_clinical download started this may take some time.
Approximate size to download 1.6 GB
[OK!]
sentence_detector_dl download started this may take some time.
Approximate size to download 354.6 KB
[OK!]


|   | sentence_resolution_loinc_code | text | sentence_resolution_loinc_confidence | sentence_resolution_loinc | entities_confidence | sentence_embedding_bluebert | entities | entities_class | word_embedding_glove | sentence |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 54795-0 | A 28-year-old female with a history of gestati... | 0.3363 | Diabetes mellitus in last 7D | [0.9999, 0.9992, 0.33813334, 0.18636668, 0.197... | [-0.6638715267181396, -0.5056581497192383, 0.3... | [28-year-old, female, gestational diabetes mel... | [Age, Gender, Diabetes, RelativeDate, Modifier... | [[0.040217556059360504, 0.4003961980342865, 0.... | A 28-year-old female with a history of gestati... |
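
The two LOINC resolvers return different codes for the same sentence (`90383-1` with BioBERT, `54795-0` with BlueBERT). A cheap sanity check on any returned code is its trailing check digit, which to our knowledge follows the Mod 10 (Luhn) scheme; both codes from this notebook pass it. The helper names below are hypothetical, and the check only catches malformed codes; it says nothing about whether the resolution is clinically right.

```python
def loinc_check_digit(body: str) -> int:
    """Compute the Mod 10 (Luhn) check digit for the numeric part of a code."""
    total = 0
    for position, char in enumerate(reversed(body)):
        digit = int(char)
        # Double the rightmost digit and every second digit to its left.
        if position % 2 == 0:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return (10 - total % 10) % 10


def is_valid_loinc(code: str) -> bool:
    """Check the 'digits-checkdigit' shape and the check digit itself."""
    body, separator, check = code.partition("-")
    if not (separator and body.isdigit() and len(check) == 1 and check.isdigit()):
        return False
    return loinc_check_digit(body) == int(check)


for code in ["90383-1", "54795-0"]:
    print(code, is_valid_loinc(code))
```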


In [None]:
nlu.load('med_ner.jsl.wip.clinical en.resolve.loinc.bluebert').viz("""A 28-year-old female with a history of gestational diabetes mellitus diagnosed eight years prior to presentation and
subsequent type two diabetes mellitus (TSS2DM), one prior episode of HTG-induced pancreatitis three years prior to presentation, associated with an acute 
hepatitis, and obesity with a body mass index (BMI) of 33.5 kg/m2, presented with a one-week history of polyuria, polydipsia, poor appetite, and vomiting.""")

jsl_ner_wip_clinical download started this may take some time.
Approximate size to download 14.5 MB
[OK!]
sbluebertresolve_loinc download started this may take some time.
Approximate size to download 216.2 MB
[OK!]
sbluebert_base_uncased_mli download started this may take some time.
Approximate size to download 388.1 MB
[OK!]
embeddings_clinical download started this may take some time.
Approximate size to download 1.6 GB
[OK!]
sentence_detector_dl download started this may take some time.
Approximate size to download 354.6 KB
[OK!]


#### [Entity Resolver for Human Phenotype Ontology](https://nlp.johnsnowlabs.com/2021/05/16/sbiobertresolve_HPO_en.html)

In [None]:
nlu.load('med_ner.jsl.wip.clinical en.resolve.HPO').predict("""These disorders include cancer, bipolar disorder, schizophrenia, autism, Cri-du-chat syndrome,
 myopia, cortical cataract-linked Alzheimer's disease, and infectious diseases"""
,output_level = 'sentence')

jsl_ner_wip_clinical download started this may take some time.
Approximate size to download 14.5 MB
[OK!]
sbiobertresolve_HPO download started this may take some time.
Approximate size to download 98.9 MB
[OK!]
sbiobert_base_cased_mli download started this may take some time.
Approximate size to download 384.3 MB
[OK!]
embeddings_clinical download started this may take some time.
Approximate size to download 1.6 GB
[OK!]
sentence_detector_dl download started this may take some time.
Approximate size to download 354.6 KB
[OK!]


|   | sentence_embedding_biobert | sentence_resolution_HPO | text | entities_confidence | sentence_resolution_HPO_code | sentence_resolution_HPO_confidence | entities | entities_class | word_embedding_glove | sentence |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | [-0.05154234543442726, -0.18823659420013428, -... | papillary renal cell carcinoma type 1 | These disorders include cancer, bipolar disord... | [0.9311, 0.8449, 0.9917, 0.8014, 0.43535, 0.98... | HP:0011797 | 0.3354 | [cancer, bipolar disorder, schizophrenia, auti... | [Oncological, Psychological_Condition, Psychol... | [[-0.19218440353870392, -0.11771761626005173, ... | These disorders include cancer, bipolar disord... |


In [None]:
nlu.load('med_ner.jsl.wip.clinical en.resolve.HPO').viz("""These disorders include cancer, bipolar disorder, schizophrenia, autism, Cri-du-chat syndrome,
 myopia, cortical cataract-linked Alzheimer's disease, and infectious diseases""")

jsl_ner_wip_clinical download started this may take some time.
Approximate size to download 14.5 MB
[OK!]
sbiobertresolve_HPO download started this may take some time.
Approximate size to download 98.9 MB
[OK!]
sbiobert_base_cased_mli download started this may take some time.
Approximate size to download 384.3 MB
[OK!]
embeddings_clinical download started this may take some time.
Approximate size to download 1.6 GB
[OK!]
sentence_detector_dl download started this may take some time.
Approximate size to download 354.6 KB
[OK!]
