![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/nlu/blob/master/examples/colab/healthcare/entity_resolution/entity_resolvers_overview.ipynb)



# Entity Resolution
**Named entities** are sub-strings in a text that can be classified into categories. For example, in the string   
`"Tesla is a great stock to invest in"`, the sub-string `"Tesla"` is a named entity that an ML algorithm can classify with the label `company`.  
**Named entities** can be extracted with the various pre-trained, deep-learning-based NER algorithms provided by NLU.



After extracting **named entities**, an **entity resolution algorithm** can be applied to them. The resolution algorithm maps each extracted entity to a class, which reduces the dimensionality of the data and has many useful applications.
For example:
- "**Tesla** is a great stock to invest in"
- "**TSLA** is a great stock to invest in"
- "**Tesla, Inc** is a great company to invest in"

The sub-strings `Tesla`, `TSLA` and `Tesla, Inc` are all named entities that the NER algorithm classifies with the label `company`. This tells us that all three sub-strings are of type `company`, but we cannot yet infer that they refer to the same company.

The resolver algorithms solve exactly this problem: they resolve all three entities to a common identifier, such as a company ID. Every reference to Tesla, regardless of how the string is written, is mapped to the same ID.
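
As a toy illustration of this idea (not how the NLU resolvers work internally), the sketch below maps the three surface forms to one identifier using a hand-written alias table. The names `ALIASES`, `normalize`, `resolve` and the ID `COMPANY_0001` are hypothetical and exist only for this example.

```python
import re

# Hypothetical alias table: normalized surface form -> canonical ID.
ALIASES = {
    "tesla": "COMPANY_0001",
    "tsla": "COMPANY_0001",
    "tesla inc": "COMPANY_0001",
}


def normalize(mention: str) -> str:
    """Lower-case the mention, replace punctuation with spaces, collapse whitespace."""
    cleaned = re.sub(r"[^\w\s]", " ", mention.lower())
    return " ".join(cleaned.split())


def resolve(mention: str):
    """Return the canonical ID for a mention, or None if it is not in the table."""
    return ALIASES.get(normalize(mention))


for mention in ["Tesla", "TSLA", "Tesla, Inc"]:
    print(mention, "->", resolve(mention))
```

A lookup table like this only covers spellings listed in advance, which is why learned resolvers are useful for free text.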

This example extends analogously to healthcare and other text problems. In medical documents, the same disease can be referenced in many different ways.

With NLU Healthcare you can leverage state-of-the-art pre-trained NER models to extract **Medical Named Entities** (diseases, treatments, posology, etc.) and **resolve these** to common **healthcare disease codes**.


These algorithms are provided by **Spark NLP for Healthcare's** [SentenceEntityResolver](https://nlp.johnsnowlabs.com/docs/en/licensed_annotators#sentenceentityresolver) and [ChunkEntityResolver](https://nlp.johnsnowlabs.com/docs/en/licensed_annotators#chunkentityresolver).
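
An embedding-based resolver can be pictured as a nearest-neighbour search: the entity text is embedded, and the codes whose reference embeddings lie closest are returned. The sketch below shows that idea with made-up 3-dimensional vectors and placeholder codes. It is an illustration under those assumptions, not the Spark NLP implementation; `REFERENCE`, `cosine_distance` and `top_k_codes` are hypothetical names.

```python
import math

# Made-up reference embeddings keyed by placeholder codes (assumption for illustration).
REFERENCE = {
    "CODE_A": [1.0, 0.0, 0.0],
    "CODE_B": [0.0, 1.0, 0.0],
    "CODE_C": [0.7, 0.7, 0.0],
}


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def top_k_codes(query, k=2):
    """Return the k (code, distance) pairs closest to the query, nearest first."""
    scored = [(code, cosine_distance(query, vec)) for code, vec in REFERENCE.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


query = [0.9, 0.1, 0.0]  # stand-in for the embedding of an extracted entity
for code, dist in top_k_codes(query):
    print(code, round(dist, 4))
```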


## Available models

The available models are:


| Language | nlp.load() reference                                         | Spark NLP Model reference          |
| -------- | ------------------------------------------------------------ | ------------------------------------------------------------ |
| English  | embed_sentence.biobert.mli | sbiobert_base_cased_mli          |
| English  | resolve | sbiobertresolve_cpt          |
| English  | resolve.cpt | sbiobertresolve_cpt          |
| English  | resolve.cpt.augmented | sbiobertresolve_cpt_augmented          |
| English  | resolve.cpt.procedures_augmented | sbiobertresolve_cpt_procedures_augmented          |
| English  | resolve.hcc.augmented | sbiobertresolve_hcc_augmented          |
| English  | [resolve.icd10cm](https://nlp.johnsnowlabs.com/2020/11/27/sbiobertresolve_icd10cm_en.html) | [sbiobertresolve_icd10cm](https://nlp.johnsnowlabs.com/2020/11/27/sbiobertresolve_icd10cm_en.html)                   |
| English  | [resolve.icd10cm.augmented](https://nlp.johnsnowlabs.com/2020/12/13/sbiobertresolve_icd10cm_augmented_en.html) | [sbiobertresolve_icd10cm_augmented](https://nlp.johnsnowlabs.com/2020/12/13/sbiobertresolve_icd10cm_augmented_en.html)                   |
| English  | [resolve.icd10cm.augmented_billable](https://nlp.johnsnowlabs.com/2021/02/06/sbiobertresolve_icd10cm_augmented_billable_hcc_en.html) | [sbiobertresolve_icd10cm_augmented_billable_hcc](https://nlp.johnsnowlabs.com/2021/02/06/sbiobertresolve_icd10cm_augmented_billable_hcc_en.html)                   |
| English  | [resolve.icd10pcs](https://nlp.johnsnowlabs.com/2020/11/27/sbiobertresolve_icd10pcs_en.html) | [sbiobertresolve_icd10pcs](https://nlp.johnsnowlabs.com/2020/11/27/sbiobertresolve_icd10pcs_en.html)                   |
| English  | [resolve.icdo](https://nlp.johnsnowlabs.com/2020/11/27/sbiobertresolve_icdo_en.html) | [sbiobertresolve_icdo](https://nlp.johnsnowlabs.com/2020/11/27/sbiobertresolve_icdo_en.html)                   |
| English  | [resolve.rxcui](https://nlp.johnsnowlabs.com/2020/12/11/sbiobertresolve_rxcui_en.html) | [sbiobertresolve_rxcui](https://nlp.johnsnowlabs.com/2020/12/11/sbiobertresolve_rxcui_en.html)                   |
| English  | [resolve.rxnorm](https://nlp.johnsnowlabs.com/2020/11/27/sbiobertresolve_rxnorm_en.html) | [sbiobertresolve_rxnorm](https://nlp.johnsnowlabs.com/2020/11/27/sbiobertresolve_rxnorm_en.html)                   |
| English  | [resolve.snomed](https://nlp.johnsnowlabs.com/2020/11/27/sbiobertresolve_snomed_auxConcepts_en.html) | [sbiobertresolve_snomed_auxConcepts](https://nlp.johnsnowlabs.com/2020/11/27/sbiobertresolve_snomed_auxConcepts_en.html)                   |
| English  | [resolve.snomed.aux_concepts](https://nlp.johnsnowlabs.com/2020/11/27/sbiobertresolve_snomed_auxConcepts_en.html) | [sbiobertresolve_snomed_auxConcepts](https://nlp.johnsnowlabs.com/2020/11/27/sbiobertresolve_snomed_auxConcepts_en.html)                   |
| English  | [resolve.snomed.aux_concepts_int](https://nlp.johnsnowlabs.com/2020/11/27/sbiobertresolve_snomed_auxConcepts_int_en.html) | [sbiobertresolve_snomed_auxConcepts_int](https://nlp.johnsnowlabs.com/2020/11/27/sbiobertresolve_snomed_auxConcepts_int_en.html)                   |
| English  | [resolve.snomed.findings](https://nlp.johnsnowlabs.com/2020/11/27/sbiobertresolve_snomed_findings_en.html) | [sbiobertresolve_snomed_findings](https://nlp.johnsnowlabs.com/2020/11/27/sbiobertresolve_snomed_findings_en.html)                   |
| English  | [resolve.snomed.findings_int](https://nlp.johnsnowlabs.com/2020/11/27/sbiobertresolve_snomed_findings_int_en.html) | [sbiobertresolve_snomed_findings_int](https://nlp.johnsnowlabs.com/2020/11/27/sbiobertresolve_snomed_findings_int_en.html)                   |
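
The cells in this notebook pass `nlp.load()` a single string containing a NER reference and a language-prefixed resolver reference separated by a space. The sketch below builds such a string from a subset of the table above and rejects references that are not listed. `RESOLVER_MODELS` and `build_load_string` are hypothetical helpers, not part of the library.

```python
# Subset of the table above: nlp.load() reference -> Spark NLP model name.
RESOLVER_MODELS = {
    "resolve.cpt": "sbiobertresolve_cpt",
    "resolve.icd10cm": "sbiobertresolve_icd10cm",
    "resolve.icd10pcs": "sbiobertresolve_icd10pcs",
    "resolve.rxcui": "sbiobertresolve_rxcui",
    "resolve.rxnorm": "sbiobertresolve_rxnorm",
    "resolve.snomed.findings_int": "sbiobertresolve_snomed_findings_int",
}


def build_load_string(ner_ref: str, resolver_ref: str, lang: str = "en") -> str:
    """Join a NER reference and a resolver reference into one space-separated string."""
    if resolver_ref not in RESOLVER_MODELS:
        raise KeyError(f"Unknown resolver reference: {resolver_ref}")
    return f"{ner_ref} {lang}.{resolver_ref}"


print(build_load_string("med_ner.jsl.wip.clinical", "resolve.icd10cm"))
```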


In [None]:
from johnsnowlabs import nlp

# After uploading your license, run this to install all licensed Python wheels and pre-download the JARs for the Spark session JVM
nlp.install()

#### [Sentence Entity Resolver for ICD10-CM (sbiobert_base_cased_mli embeddings)](https://nlp.johnsnowlabs.com/2020/11/27/sbiobertresolve_icd10cm_en.html)

In [None]:
nlp.load("med_ner.jsl.wip.clinical en.resolve.icd10cm").predict("""This is an 82 - year-old male with a history of prior tobacco use , hypertension , chronic renal insufficiency , COPD ,
gastritis , and TIA who initially presented to Braintree with a non-ST elevation MI and Guaiac positive stools , transferred to St . Margaret\'s Center for Women & Infants for cardiac
catheterization with PTCA to mid LAD lesion complicated by hypotension and bradycardia requiring Atropine , IV fluids and transient dopamine possibly secondary to vagal reaction ,
subsequently transferred to CCU for close monitoring , hemodynamically stable at the time of admission to the CCU .""",output_level =  "sentence")

ner_wikiner_glove_840B_300 download started this may take some time.
Approximate size to download 14.8 MB
[OK!]
sbiobertresolve_icd10cm download started this may take some time.
[OK!]
setInputCols in ENTITY_05f4bc95f0c5 expecting 1 columns. Provided column amount: 2. Which should be columns from the following annotators: ['sentence_embeddings']
sbiobert_base_cased_mli download started this may take some time.
Approximate size to download 384.3 MB
[OK!]
glove_840B_300 download started this may take some time.
Approximate size to download 2.3 GB
[OK!]

Unnamed: 0,entities_wikiner_glove_840B_300,entities_wikiner_glove_840B_300_class,entities_wikiner_glove_840B_300_confidence,entities_wikiner_glove_840B_300_origin_chunk,entities_wikiner_glove_840B_300_origin_sentence,resolution_icd10cm_code,resolution_icd10cm_confidence,resolution_icd10cm_distance,resolution_icd10cm_k_codes,resolution_icd10cm_k_confidences,resolution_icd10cm_k_cos_distances,resolution_icd10cm_k_distances,resolution_icd10cm_k_resolution,resolution_icd10cm_origin_sentence,resolution_icd10cm_resolved_text,resolution_icd10cm_target_text,resolution_icd10cm_token,sentence,sentence_embedding_biobert,word_embedding_glove
0,"[Braintree, non-ST, Guaiac, St, Margaret's Cen...","[LOC, MISC, MISC, LOC, ORG, ORG, ORG]","[0.8775, 0.873, 0.3519, 0.5045, 0.43828332, 0....","[0, 1, 2, 3, 4, 5, 6]","[0, 0, 0, 0, 0, 0, 0]","[L946, I781, B483, R061, O3402, A57, A57]","[0.3423, 0.3456, 0.3525, 0.3601, 0.3349, 0.334...","[0.5333, 0.3553, 0.3299, 0.3006, 0.4539, 0.484...","[[L946, L0213, M5145], [I781, Y66, S1191XS], [...","[[0.3423, 0.3289, 0.3288], [0.3456, 0.3329, 0....","[[0.2845, 0.3285, 0.3288], [0.1263, 0.1542, 0....","[[0.5333, 0.5731, 0.5734], [0.3553, 0.3927, 0....","[[Ainhum, Carbuncle of neck, Schmorl's nodes, ...","[0, 0, 0, 0, 0, 0, 0]","[Ainhum, Nevus, non-neoplastic, Geotrichosis, ...","[Braintree, non-ST, Guaiac, St, Margaret's Cen...","[Braintree, non-ST, Guaiac, St, Margaret's Cen...",This is an 82 - year-old male with a history o...,"[[0.41123586893081665, 0.3981207609176636, -0....","[[-0.27410000562667847, 0.22980999946594238, 0..."
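
Each list-valued column in the output above holds one entry per extracted entity, so the columns can be aligned position by position. The sketch below does this in plain Python for the first four entries visible in the row, then keeps only entities above a NER confidence threshold. The shortened keys, the helper names `explode_row` and `filter_by_confidence`, and the threshold of 0.8 are assumptions for illustration.

```python
# Values copied from the first four entries of the output row above;
# the dictionary keys are shortened, hypothetical column names.
row = {
    "entity": ["Braintree", "non-ST", "Guaiac", "St"],
    "entity_class": ["LOC", "MISC", "MISC", "LOC"],
    "entity_confidence": [0.8775, 0.873, 0.3519, 0.5045],
    "icd10cm_code": ["L946", "I781", "B483", "R061"],
}


def explode_row(row):
    """Turn a dict of equal-length lists into a list of per-entity dicts."""
    keys = list(row)
    return [dict(zip(keys, values)) for values in zip(*(row[k] for k in keys))]


def filter_by_confidence(records, threshold):
    """Keep records whose NER confidence is at least the threshold."""
    return [r for r in records if r["entity_confidence"] >= threshold]


records = explode_row(row)
for record in filter_by_confidence(records, threshold=0.8):
    print(record)
```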


#### [Sentence Entity Resolver for ICD10-PCS (sbiobert_base_cased_mli embeddings)](https://nlp.johnsnowlabs.com/2020/11/27/sbiobertresolve_icd10pcs_en.html)

In [None]:
nlp.load("med_ner.jsl.wip.clinical en.resolve.icd10pcs").predict("""This is an 82 - year-old male with a history of prior tobacco use , hypertension , chronic renal insufficiency , COPD ,
gastritis , and TIA who initially presented to Braintree with a non-ST elevation MI and Guaiac positive stools , transferred to St . Margaret\'s Center for Women & Infants for cardiac
catheterization with PTCA to mid LAD lesion complicated by hypotension and bradycardia requiring Atropine , IV fluids and transient dopamine possibly secondary to vagal reaction ,
subsequently transferred to CCU for close monitoring , hemodynamically stable at the time of admission to the CCU .""",output_level =  "sentence")

ner_wikiner_glove_840B_300 download started this may take some time.
Approximate size to download 14.8 MB
[OK!]
sbiobertresolve_icd10pcs download started this may take some time.
[OK!]
setInputCols in ENTITY_7090bfd98dcf expecting 1 columns. Provided column amount: 2. Which should be columns from the following annotators: ['sentence_embeddings']
sbiobert_base_cased_mli download started this may take some time.
Approximate size to download 384.3 MB
[OK!]
glove_840B_300 download started this may take some time.
Approximate size to download 2.3 GB
[OK!]

Unnamed: 0,entities_wikiner_glove_840B_300,entities_wikiner_glove_840B_300_class,entities_wikiner_glove_840B_300_confidence,entities_wikiner_glove_840B_300_origin_chunk,entities_wikiner_glove_840B_300_origin_sentence,resolution_icd10pcs_code,resolution_icd10pcs_confidence,resolution_icd10pcs_distance,resolution_icd10pcs_k_codes,resolution_icd10pcs_k_confidences,resolution_icd10pcs_k_cos_distances,resolution_icd10pcs_k_distances,resolution_icd10pcs_k_resolution,resolution_icd10pcs_origin_sentence,resolution_icd10pcs_resolved_text,resolution_icd10pcs_target_text,resolution_icd10pcs_token,sentence,sentence_embedding_biobert,word_embedding_glove
0,"[Braintree, non-ST, Guaiac, St, Margaret's Cen...","[LOC, MISC, MISC, LOC, ORG, ORG, ORG]","[0.8775, 0.873, 0.3519, 0.5045, 0.43828332, 0....","[0, 1, 2, 3, 4, 5, 6]","[0, 0, 0, 0, 0, 0, 0]","[B040ZZZ, F00Z7ZZ, F00ZDZZ, F01ZCZZ, BY4DZZZ, ...","[0.3347, 0.3430, 0.3399, 0.3471, 0.3350, 0.338...","[0.5942, 0.4194, 0.4798, 0.4559, 0.5078, 0.524...","[[B040ZZZ, 2W65XZZ, F14Z3ZZ], [F00Z7ZZ, 5A1905...","[[0.3347, 0.3336, 0.3317], [0.3430, 0.3306, 0....","[[0.3531, 0.3571, 0.3639], [0.1759, 0.2082, 0....","[[0.5942, 0.5976, 0.6032], [0.4194, 0.4563, 0....","[[Ultrasonography of Brain, Traction of Back, ...","[0, 0, 0, 0, 0, 0, 0]","[Ultrasonography of Brain, Nonspoken Language ...","[Braintree, non-ST, Guaiac, St, Margaret's Cen...","[Braintree, non-ST, Guaiac, St, Margaret's Cen...",This is an 82 - year-old male with a history o...,"[[0.41123586893081665, 0.3981207609176636, -0....","[[-0.27410000562667847, 0.22980999946594238, 0..."


#### [Sentence Entity Resolver for RxCUI (sbiobert_base_cased_mli embeddings)](https://nlp.johnsnowlabs.com/2020/12/11/sbiobertresolve_rxcui_en.html)

In [None]:
nlp.load("med_ner.jsl.wip.clinical en.resolve.rxcui").predict("He was seen by the endocrinology service and she was discharged on 50 mg of eltrombopag oral at night, 5 mg amlodipine with meals, and metformin 1000 mg two times a day",output_level =  "sentence")

ner_wikiner_glove_840B_300 download started this may take some time.
Approximate size to download 14.8 MB
[OK!]
sbiobertresolve_rxcui download started this may take some time.
[OK!]
setInputCols in ENTITY_57d31cc0027b expecting 1 columns. Provided column amount: 2. Which should be columns from the following annotators: ['sentence_embeddings']
sbiobert_base_cased_mli download started this may take some time.
Approximate size to download 384.3 MB
[OK!]
glove_840B_300 download started this may take some time.
Approximate size to download 2.3 GB
[OK!]

Unnamed: 0,entities_wikiner_glove_840B_300,sentence,word_embedding_glove
0,[],He was seen by the endocrinology service and s...,"[[0.02472599968314171, 0.273499995470047, 0.06..."
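
In the output above, the entity list is empty and no `resolution_*` columns are present. A guard on the column names can catch this case before downstream code reads resolver columns. The sketch below is a minimal example; `resolution_columns` is a hypothetical helper, and the second column list is abbreviated from the ICD10-CM output earlier in this notebook.

```python
def resolution_columns(columns):
    """Return the column names produced by a resolver (prefixed with 'resolution_')."""
    return [c for c in columns if c.startswith("resolution_")]


# Column names from the RxCUI output above.
rxcui_columns = ["entities_wikiner_glove_840B_300", "sentence", "word_embedding_glove"]

# Abbreviated column names from the ICD10-CM output.
icd10cm_columns = [
    "entities_wikiner_glove_840B_300",
    "resolution_icd10cm_code",
    "resolution_icd10cm_confidence",
    "sentence",
]

for name, cols in [("rxcui", rxcui_columns), ("icd10cm", icd10cm_columns)]:
    found = resolution_columns(cols)
    print(name, "has resolver output" if found else "has no resolver output")
```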


#### [Sentence Entity Resolver for RxNorm (sbiobert_base_cased_mli embeddings)](https://nlp.johnsnowlabs.com/2020/11/27/sbiobertresolve_rxnorm_en.html)

In [None]:
nlp.load("med_ner.jsl.wip.clinical en.resolve.rxnorm").predict("""This is an 82 - year-old male with a history of prior tobacco use , hypertension , chronic renal insufficiency , COPD ,
gastritis , and TIA who initially presented to Braintree with a non-ST elevation MI and Guaiac positive stools , transferred to St . Margaret\'s Center for Women & Infants for cardiac
catheterization with PTCA to mid LAD lesion complicated by hypotension and bradycardia requiring Atropine , IV fluids and transient dopamine possibly secondary to vagal reaction ,
subsequently transferred to CCU for close monitoring , hemodynamically stable at the time of admission to the CCU .""",output_level =  "sentence")

ner_wikiner_glove_840B_300 download started this may take some time.
Approximate size to download 14.8 MB
[OK!]
sbiobertresolve_rxnorm download started this may take some time.
[OK!]
setInputCols in ENTITY_e8e8480527b5 expecting 1 columns. Provided column amount: 2. Which should be columns from the following annotators: ['sentence_embeddings']
sbiobert_base_cased_mli download started this may take some time.
Approximate size to download 384.3 MB
[OK!]
glove_840B_300 download started this may take some time.
Approximate size to download 2.3 GB
[OK!]

Unnamed: 0,entities_wikiner_glove_840B_300,entities_wikiner_glove_840B_300_class,entities_wikiner_glove_840B_300_confidence,entities_wikiner_glove_840B_300_origin_chunk,entities_wikiner_glove_840B_300_origin_sentence,resolution_rxnorm_code,resolution_rxnorm_confidence,resolution_rxnorm_distance,resolution_rxnorm_k_codes,resolution_rxnorm_k_confidences,resolution_rxnorm_k_cos_distances,resolution_rxnorm_k_distances,resolution_rxnorm_k_resolution,resolution_rxnorm_origin_sentence,resolution_rxnorm_resolved_text,resolution_rxnorm_target_text,resolution_rxnorm_token,sentence,sentence_embedding_biobert,word_embedding_glove
0,"[Braintree, non-ST, Guaiac, St, Margaret's Cen...","[LOC, MISC, MISC, LOC, ORG, ORG, ORG]","[0.8775, 0.873, 0.3519, 0.5045, 0.43828332, 0....","[0, 1, 2, 3, 4, 5, 6]","[0, 0, 0, 0, 0, 0, 0]","[607707, 1314253, 1373150, 583344, 1856515, 22...","[0.3374, 0.3438, 0.3529, 0.3344, 0.3399, 0.335...","[0.4892, 0.2549, 0.1804, 0.2870, 0.5064, 0.459...","[[607707, 215715, 352461], [1314253, 1363896, ...","[[0.3374, 0.3329, 0.3297], [0.3438, 0.3295, 0....","[[0.2393, 0.2524, 0.2623], [0.0650, 0.0885, 0....","[[0.4892, 0.5024, 0.5122], [0.2549, 0.2975, 0....","[[bionect, branchamin, baytet], [nonanal, berr...","[0, 0, 0, 0, 0, 0, 0]","[bionect, nonanal, guaiac, accu, levonorgestre...","[Braintree, non-ST, Guaiac, St, Margaret's Cen...","[Braintree, non-ST, Guaiac, St, Margaret's Cen...",This is an 82 - year-old male with a history o...,"[[0.41123586893081665, 0.3981207609176636, -0....","[[-0.27410000562667847, 0.22980999946594238, 0..."


#### [Sentence Entity Resolver for Snomed Concepts, INT version (sbiobert_base_cased_mli embeddings)](https://nlp.johnsnowlabs.com/2020/11/27/sbiobertresolve_snomed_findings_int_en.html)

In [None]:
nlp.load("med_ner.jsl.wip.clinical en.resolve.snomed.findings_int").predict("""This is an 82 - year-old male with a history of prior tobacco use , hypertension , chronic renal insufficiency , COPD ,
gastritis , and TIA who initially presented to Braintree with a non-ST elevation MI and Guaiac positive stools , transferred to St . Margaret\'s Center for Women & Infants for cardiac
catheterization with PTCA to mid LAD lesion complicated by hypotension and bradycardia requiring Atropine , IV fluids and transient dopamine possibly secondary to vagal reaction ,
subsequently transferred to CCU for close monitoring , hemodynamically stable at the time of admission to the CCU .""",output_level =  "sentence")

ner_wikiner_glove_840B_300 download started this may take some time.
Approximate size to download 14.8 MB
[OK!]
sbiobertresolve_snomed_findings_int download started this may take some time.
[OK!]
setInputCols in ENTITY_dcb03018cc1b expecting 1 columns. Provided column amount: 2. Which should be columns from the following annotators: ['sentence_embeddings']
sbiobert_base_cased_mli download started this may take some time.
Approximate size to download 384.3 MB
[OK!]
glove_840B_300 download started this may take some time.
Approximate size to download 2.3 GB
[OK!]

Unnamed: 0,entities_wikiner_glove_840B_300,entities_wikiner_glove_840B_300_class,entities_wikiner_glove_840B_300_confidence,entities_wikiner_glove_840B_300_origin_chunk,entities_wikiner_glove_840B_300_origin_sentence,resolution_snomed_code,resolution_snomed_confidence,resolution_snomed_distance,resolution_snomed_k_codes,resolution_snomed_k_confidences,resolution_snomed_k_cos_distances,resolution_snomed_k_distances,resolution_snomed_k_resolution,resolution_snomed_origin_sentence,resolution_snomed_resolved_text,resolution_snomed_target_text,resolution_snomed_token,sentence,sentence_embedding_biobert,word_embedding_glove
0,"[Braintree, non-ST, Guaiac, St, Margaret's Cen...","[LOC, MISC, MISC, LOC, ORG, ORG, ORG]","[0.8775, 0.873, 0.3519, 0.5045, 0.43828332, 0....","[0, 1, 2, 3, 4, 5, 6]","[0, 0, 0, 0, 0, 0, 0]","[70139002, 255199001, 770667002, 72670004, 289...","[0.3347, 0.3356, 0.3460, 0.3353, 0.3338, 0.333...","[0.5149, 0.2477, 0.3089, 0.2943, 0.4468, 0.432...","[[70139002, 49233005, 971211000000102], [25519...","[[0.3347, 0.3327, 0.3326], [0.3356, 0.3327, 0....","[[0.2652, 0.2713, 0.2717], [0.0613, 0.0658, 0....","[[0.5149, 0.5208, 0.5213], [0.2477, 0.2566, 0....","[[harara, dyspnea, reads bosnian], [benign tum...","[0, 0, 0, 0, 0, 0, 0]","[harara, benign tumor of pituitary and hypotha...","[Braintree, non-ST, Guaiac, St, Margaret's Cen...","[Braintree, non-ST, Guaiac, St, Margaret's Cen...",This is an 82 - year-old male with a history o...,"[[0.41123586893081665, 0.3981207609176636, -0....","[[-0.27410000562667847, 0.22980999946594238, 0..."
