![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/nlu/blob/master/examples/colab/healthcare/entity_resolution/entity_resolvers_overview.ipynb)



# Entity Resolution
**Named entities** are sub-strings in a text that can be classified into categories. For example, in the string   
`"Tesla is a great stock to invest in"`, the sub-string `"Tesla"` is a named entity that an ML algorithm can classify with the label `company`.  
**Named entities** can easily be extracted by the various pre-trained, deep-learning-based NER algorithms provided by NLU. 



After extracting **named entities**, an **entity resolution algorithm** can be applied to them. The resolution algorithm maps each extracted entity to a class, which reduces the dimensionality of the data and has many useful applications. 
For example: 
- "**Tesla** is a great stock to invest in"
- "**TSLA** is a great stock to invest in"
- "**Tesla, Inc** is a great company to invest in"

The sub-strings `Tesla`, `TSLA` and `Tesla, Inc` are all named entities that the NER algorithm classifies with the label `company`. This tells us that all three sub-strings are of type `company`, but we cannot yet infer that they actually refer to the same company.

Resolver algorithms solve exactly this problem: they resolve all three entities to a common identifier, such as a company ID. Every reference to Tesla, regardless of how the string is written, is mapped to the same ID.
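To make the idea concrete, here is a minimal toy sketch of resolution as a lookup from surface forms to one shared ID. The alias table, the ID `COMPANY_0001` and the `resolve` helper are all made up for illustration; the pre-trained NLU resolvers do not work from a hand-written table like this.

```python
from typing import Optional

# Hypothetical alias table: every known spelling points to one canonical ID.
ALIAS_TO_ID = {
    "tesla": "COMPANY_0001",
    "tsla": "COMPANY_0001",
    "tesla, inc": "COMPANY_0001",
}


def resolve(mention: str) -> Optional[str]:
    """Map a surface form to its canonical ID, or None if it is unknown."""
    # Normalize case, surrounding whitespace and a trailing period.
    key = mention.strip().lower().rstrip(".")
    return ALIAS_TO_ID.get(key)


mentions = ["Tesla", "TSLA", "Tesla, Inc"]
resolved = {mention: resolve(mention) for mention in mentions}
print(resolved)
```

All three mentions end up with the same ID, which is what lets downstream code count or join on the company rather than on its spelling.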

This example extends analogously to healthcare and other text problems. In medical documents, the same disease can be referenced in many different ways. 

With NLU Healthcare you can leverage state-of-the-art pre-trained NER models to extract **Medical Named Entities** (Diseases, Treatments, Posology, etc.) and **resolve these** to common **healthcare disease codes**.


These algorithms are provided by **Spark NLP for Healthcare's** [SentenceEntityResolver](https://nlp.johnsnowlabs.com/docs/en/licensed_annotators#sentenceentityresolver) and [ChunkEntityResolver](https://nlp.johnsnowlabs.com/docs/en/licensed_annotators#chunkentityresolver).
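One common way to build a resolver on top of sentence embeddings is a nearest-neighbour search: embed the extracted chunk, then pick the code whose reference embedding is most similar. The sketch below illustrates only that general idea. The three-dimensional vectors, the `CODE_*` labels and the helper names are invented, and this is not a description of how Spark NLP for Healthcare implements its annotators.

```python
import math

# Hypothetical reference embeddings, one per code. Real resolvers would use
# high-dimensional sentence embeddings rather than 3 hand-picked numbers.
CODE_EMBEDDINGS = {
    "CODE_HYPERTENSION": [0.9, 0.1, 0.0],
    "CODE_COPD": [0.1, 0.9, 0.1],
    "CODE_GASTRITIS": [0.0, 0.2, 0.9],
}


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def resolve_embedding(vector):
    """Return the code whose reference embedding is most similar to `vector`."""
    return max(CODE_EMBEDDINGS, key=lambda code: cosine(vector, CODE_EMBEDDINGS[code]))


# Made-up embedding standing in for a chunk such as "high blood pressure".
query = [0.8, 0.2, 0.1]
print(resolve_embedding(query))
```

Because the match is by similarity rather than exact string equality, differently worded mentions can still land on the same code, provided their embeddings are close.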


## Available models

All available models are:


| Language | nlu.load() reference                                         | Spark NLP Model reference          |
| -------- | ------------------------------------------------------------ | ------------------------------------------------------------ |
| English  | embed_sentence.biobert.mli | sbiobert_base_cased_mli          |
| English  | resolve | sbiobertresolve_cpt          |
| English  | resolve.cpt | sbiobertresolve_cpt          |
| English  | resolve.cpt.augmented | sbiobertresolve_cpt_augmented          |
| English  | resolve.cpt.procedures_augmented | sbiobertresolve_cpt_procedures_augmented          |
| English  | resolve.hcc.augmented | sbiobertresolve_hcc_augmented          |
| English  | [resolve.icd10cm](https://nlp.johnsnowlabs.com/2020/11/27/sbiobertresolve_icd10cm_en.html) | [sbiobertresolve_icd10cm](https://nlp.johnsnowlabs.com/2020/11/27/sbiobertresolve_icd10cm_en.html)                   |
| English  | [resolve.icd10cm.augmented](https://nlp.johnsnowlabs.com/2020/12/13/sbiobertresolve_icd10cm_augmented_en.html) | [sbiobertresolve_icd10cm_augmented](https://nlp.johnsnowlabs.com/2020/12/13/sbiobertresolve_icd10cm_augmented_en.html)                   |
| English  | [resolve.icd10cm.augmented_billable](https://nlp.johnsnowlabs.com/2021/02/06/sbiobertresolve_icd10cm_augmented_billable_hcc_en.html) | [sbiobertresolve_icd10cm_augmented_billable_hcc](https://nlp.johnsnowlabs.com/2021/02/06/sbiobertresolve_icd10cm_augmented_billable_hcc_en.html)                   |
| English  | [resolve.icd10pcs](https://nlp.johnsnowlabs.com/2020/11/27/sbiobertresolve_icd10pcs_en.html) | [sbiobertresolve_icd10pcs](https://nlp.johnsnowlabs.com/2020/11/27/sbiobertresolve_icd10pcs_en.html)                   |
| English  | [resolve.icdo](https://nlp.johnsnowlabs.com/2020/11/27/sbiobertresolve_icdo_en.html) | [sbiobertresolve_icdo](https://nlp.johnsnowlabs.com/2020/11/27/sbiobertresolve_icdo_en.html)                   |
| English  | [resolve.rxcui](https://nlp.johnsnowlabs.com/2020/12/11/sbiobertresolve_rxcui_en.html) | [sbiobertresolve_rxcui](https://nlp.johnsnowlabs.com/2020/12/11/sbiobertresolve_rxcui_en.html)                   |
| English  | [resolve.rxnorm](https://nlp.johnsnowlabs.com/2020/11/27/sbiobertresolve_rxnorm_en.html) | [sbiobertresolve_rxnorm](https://nlp.johnsnowlabs.com/2020/11/27/sbiobertresolve_rxnorm_en.html)                   |
| English  | [resolve.snomed](https://nlp.johnsnowlabs.com/2020/11/27/sbiobertresolve_snomed_auxConcepts_en.html) | [sbiobertresolve_snomed_auxConcepts](https://nlp.johnsnowlabs.com/2020/11/27/sbiobertresolve_snomed_auxConcepts_en.html)                   |
| English  | [resolve.snomed.aux_concepts](https://nlp.johnsnowlabs.com/2020/11/27/sbiobertresolve_snomed_auxConcepts_en.html) | [sbiobertresolve_snomed_auxConcepts](https://nlp.johnsnowlabs.com/2020/11/27/sbiobertresolve_snomed_auxConcepts_en.html)                   |
| English  | [resolve.snomed.aux_concepts_int](https://nlp.johnsnowlabs.com/2020/11/27/sbiobertresolve_snomed_auxConcepts_int_en.html) | [sbiobertresolve_snomed_auxConcepts_int](https://nlp.johnsnowlabs.com/2020/11/27/sbiobertresolve_snomed_auxConcepts_int_en.html)                   |
| English  | [resolve.snomed.findings](https://nlp.johnsnowlabs.com/2020/11/27/sbiobertresolve_snomed_findings_en.html) | [sbiobertresolve_snomed_findings](https://nlp.johnsnowlabs.com/2020/11/27/sbiobertresolve_snomed_findings_en.html)                   |
| English  | [resolve.snomed.findings_int](https://nlp.johnsnowlabs.com/2020/11/27/sbiobertresolve_snomed_findings_int_en.html) | [sbiobertresolve_snomed_findings_int](https://nlp.johnsnowlabs.com/2020/11/27/sbiobertresolve_snomed_findings_int_en.html)                   |
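
The cells further down combine an NER reference with a language-prefixed resolver reference from the table, separated by a space (for example `med_ner.jsl.wip.clinical en.resolve.icd10cm`). A small sketch of composing that string follows; `build_nlu_reference` is a hypothetical helper written for this illustration, not part of NLU, and the `nlu.load` call is left commented out because it needs a licensed NLU Healthcare installation.

```python
def build_nlu_reference(ner_ref: str, resolver_ref: str, lang: str = "en") -> str:
    """Join an NER reference and a language-prefixed resolver reference."""
    return f"{ner_ref} {lang}.{resolver_ref}"


reference = build_nlu_reference("med_ner.jsl.wip.clinical", "resolve.icd10cm")
print(reference)  # med_ner.jsl.wip.clinical en.resolve.icd10cm

# With NLU Healthcare installed and licensed, the string would be used like this:
# import nlu
# nlu.load(reference).predict("some clinical text", output_level="sentence")
```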


In [None]:
%%capture
# Install NLU
# Upload your spark_nlp_for_healthcare.json license file first
!wget http://setup.johnsnowlabs.com/nlu/colab.sh -O - | bash
import nlu

#### [Athena Conditions Entity Resolver (Healthcare)](https://nlp.johnsnowlabs.com/2020/09/16/chunkresolve_athena_conditions_healthcare_en.html)

In [2]:
data ="""The patient is a 5-month-old infant who presented initially on Monday with a cold, cough, and runny nose for 2 days. Mom states she had no fever. Her appetite was good but she was spitting up a lot. She had no difficulty breathing and her cough was described as dry and hacky. At that time, physical exam showed a right TM, which was red. Left TM was okay. She was fairly congested but looked happy and playful. She was started on Amoxil and Aldex and we told to recheck in 2 weeks to recheck her ear. Mom returned to clinic again today because she got much worse overnight. She was having difficulty breathing. She was much more congested and her appetite had decreased significantly today. She also spiked a temperature yesterday of 102.6 and always having trouble sleeping secondary to congestion."""
nlu.load('med_ner.jsl.wip.clinical en.resolve_chunk.cpt_clinical').predict(data, output_level='chunk')

ner_wikiner_glove_840B_300 download started this may take some time.
Approximate size to download 14.8 MB
[OK!]


Exception: ignored

#### [Sentence Entity Resolver for ICD10-CM (sbiobert_base_cased_mli embeddings)](https://nlp.johnsnowlabs.com/2020/11/27/sbiobertresolve_icd10cm_en.html)

In [None]:
nlu.load("med_ner.jsl.wip.clinical en.resolve.icd10cm").predict("""This is an 82 - year-old male with a history of prior tobacco use , hypertension , chronic renal insufficiency , COPD ,
gastritis , and TIA who initially presented to Braintree with a non-ST elevation MI and Guaiac positive stools , transferred to St . Margaret\'s Center for Women & Infants for cardiac
catheterization with PTCA to mid LAD lesion complicated by hypotension and bradycardia requiring Atropine , IV fluids and transient dopamine possibly secondary to vagal reaction , 
subsequently transferred to CCU for close monitoring , hemodynamically stable at the time of admission to the CCU .""", output_level="sentence")

#### [Sentence Entity Resolver for ICD10-PCS (sbiobert_base_cased_mli embeddings)](https://nlp.johnsnowlabs.com/2020/11/27/sbiobertresolve_icd10pcs_en.html)

In [None]:
nlu.load("med_ner.jsl.wip.clinical en.resolve.icd10pcs").predict("""This is an 82 - year-old male with a history of prior tobacco use , hypertension , chronic renal insufficiency , COPD ,
gastritis , and TIA who initially presented to Braintree with a non-ST elevation MI and Guaiac positive stools , transferred to St . Margaret\'s Center for Women & Infants for cardiac
catheterization with PTCA to mid LAD lesion complicated by hypotension and bradycardia requiring Atropine , IV fluids and transient dopamine possibly secondary to vagal reaction , 
subsequently transferred to CCU for close monitoring , hemodynamically stable at the time of admission to the CCU .""", output_level="sentence")

#### [Sentence Entity Resolver for RxCUI (sbiobert_base_cased_mli embeddings)](https://nlp.johnsnowlabs.com/2020/12/11/sbiobertresolve_rxcui_en.html)

In [None]:
nlu.load("med_ner.jsl.wip.clinical en.resolve.rxcui").predict("He was seen by the endocrinology service and she was discharged on 50 mg of eltrombopag oral at night, 5 mg amlodipine with meals, and metformin 1000 mg two times a day", output_level="sentence")

#### [Sentence Entity Resolver for RxNorm (sbiobert_base_cased_mli embeddings)](https://nlp.johnsnowlabs.com/2020/11/27/sbiobertresolve_rxnorm_en.html)

In [None]:
import nlu
nlu.load("med_ner.jsl.wip.clinical en.resolve.rxnorm").predict("""This is an 82 - year-old male with a history of prior tobacco use , hypertension , chronic renal insufficiency , COPD ,
gastritis , and TIA who initially presented to Braintree with a non-ST elevation MI and Guaiac positive stools , transferred to St . Margaret\'s Center for Women & Infants for cardiac
catheterization with PTCA to mid LAD lesion complicated by hypotension and bradycardia requiring Atropine , IV fluids and transient dopamine possibly secondary to vagal reaction , 
subsequently transferred to CCU for close monitoring , hemodynamically stable at the time of admission to the CCU .""", output_level="sentence")

#### [Sentence Entity Resolver for Snomed Concepts, INT version (sbiobert_base_cased_mli embeddings)](https://nlp.johnsnowlabs.com/2020/11/27/sbiobertresolve_snomed_findings_int_en.html)

In [None]:
nlu.load("med_ner.jsl.wip.clinical en.resolve.snomed.findings_int").predict("""This is an 82 - year-old male with a history of prior tobacco use , hypertension , chronic renal insufficiency , COPD ,
gastritis , and TIA who initially presented to Braintree with a non-ST elevation MI and Guaiac positive stools , transferred to St . Margaret\'s Center for Women & Infants for cardiac
catheterization with PTCA to mid LAD lesion complicated by hypotension and bradycardia requiring Atropine , IV fluids and transient dopamine possibly secondary to vagal reaction , 
subsequently transferred to CCU for close monitoring , hemodynamically stable at the time of admission to the CCU .""", output_level="sentence")