![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

# **SentenceEntityResolverModel**

This notebook covers the parameters and usage of `SentenceEntityResolverModel`. This annotator takes the sentence embeddings of entity chunks and resolves them to codes in a particular ontology or curated dataset, such as RxNorm.

**📖 Learning Objectives:**

1. Understand the application and relevance of these models in healthcare data analysis, particularly in coding and classification tasks related to healthcare ontologies like ICD-10, RxNorm, SNOMED, etc.

2. Become comfortable using the different parameters of the annotator.


**🔗 Helpful Links:**

- Documentation : [SentenceEntityResolverModel](https://nlp.johnsnowlabs.com/docs/en/licensed_annotators#sentenceentityresolver)

- Python Docs : [SentenceEntityResolverModel](https://nlp.johnsnowlabs.com/licensed/api/python/reference/autosummary/sparknlp_jsl/annotator/resolution/sentence_entity_resolver/index.html)

- Scala Docs : [SentenceEntityResolverModel](https://nlp.johnsnowlabs.com/licensed/api/com/johnsnowlabs/finance/chunk_classification/resolution/SentenceEntityResolverModel.html)

- For extended examples of usage, see the [Spark NLP Workshop repository](https://github.com/JohnSnowLabs/spark-nlp-workshop/tree/master/tutorials/Certification_Trainings/Healthcare).

## **🎬 Colab Setup**

In [1]:
!pip install -q johnsnowlabs


In [None]:
from google.colab import files
print('Please Upload your John Snow Labs License using the button below')
license_keys = files.upload()

Please Upload your John Snow Labs License using the button below


In [None]:
from johnsnowlabs import nlp

nlp.install()

In [None]:
from johnsnowlabs import nlp, medical
import pyspark.sql.functions as F
import pandas as pd

spark = nlp.start()

## **🖨️ Input/Output Annotation Types**

- Input: `SENTENCE_EMBEDDINGS`

- Output: `ENTITY`

## **🔎 Parameters**


- `distanceFunction`: Determines how the distance between the query embedding and the stored entity embeddings is calculated. Accepted values are `"EUCLIDEAN"` (the default) and `"COSINE"`.
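To see why this choice matters, the following pure-Python sketch ranks two candidate codes against a query vector under each distance. It only illustrates nearest-neighbour lookup and is not the annotator's internal implementation. The codes and 2-D vectors are hypothetical; real sentence embeddings such as those from `sbiobert_base_cased_mli` have hundreds of dimensions.

```python
import math

def euclidean(a, b):
    # Straight-line distance between two vectors (sensitive to magnitude)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    # 1 - cosine similarity: compares direction only, ignores magnitude
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def nearest(query, candidates, distance):
    # Return the candidate code whose embedding is closest to the query
    return min(candidates, key=lambda code: distance(query, candidates[code]))

# Hypothetical toy embeddings
candidates = {
    "CODE_A": [2.0, 2.0],  # same direction as the query, larger magnitude
    "CODE_B": [1.0, 0.0],  # closer in space, different direction
}
query = [0.5, 0.5]

print(nearest(query, candidates, euclidean))        # CODE_B
print(nearest(query, candidates, cosine_distance))  # CODE_A
```

Because Euclidean distance is sensitive to vector magnitude while cosine distance only compares direction, the two settings can pick different nearest codes for the same query.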


### `setDistanceFunction()`



In [None]:
documentAssembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("ner_chunk")

sbert_embedder = nlp.BertSentenceEmbeddings.pretrained('sbiobert_base_cased_mli', 'en','clinical/models')\
    .setInputCols(["ner_chunk"])\
    .setOutputCol("sentence_embeddings")\
    .setCaseSensitive(False)

rxnorm_resolver = medical.SentenceEntityResolverModel.pretrained("sbiobertresolve_rxnorm_augmented","en", "clinical/models") \
    .setInputCols(["sentence_embeddings"]) \
    .setOutputCol("rxnorm_code")\
    .setDistanceFunction("EUCLIDEAN")

rxnorm_pipelineModel = nlp.PipelineModel(
    stages = [
        documentAssembler,
        sbert_embedder,
        rxnorm_resolver])

sbiobert_base_cased_mli download started this may take some time.
Approximate size to download 384.3 MB
[OK!]
sbiobertresolve_rxnorm_augmented download started this may take some time.
[OK!]


In [None]:
text = 'metformin 100 mg'

In [None]:
df = spark.createDataFrame([[text]]).toDF("text")

In [None]:
results = rxnorm_pipelineModel.transform(df)

In [None]:
results.show(truncate=100)
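Besides the top code in its `result` field, each resolver annotation carries its candidate codes in the metadata key `all_k_results`, joined by `:::`. Once an annotation's metadata has been pulled into a Python dict (for example after `results.collect()`), a small helper such as the hypothetical `parse_k_results` below can split it into a list. This is a sketch, not a library function.

```python
from typing import Dict, List

def parse_k_results(metadata: Dict[str, str], key: str = "all_k_results", sep: str = ":::") -> List[str]:
    # Split a ":::"-joined metadata value into a list of candidates,
    # skipping empty pieces (e.g. when the key is missing)
    value = metadata.get(key, "")
    return [item for item in value.split(sep) if item]

# Metadata shaped like a resolver annotation's; the codes are illustrative
metadata = {"all_k_results": "801432:::219624:::220251"}
print(parse_k_results(metadata))  # ['801432', '219624', '220251']
```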

# **SentenceEntityResolverApproach**

This notebook covers the parameters and usage of `SentenceEntityResolverApproach`. This annotator trains a `SentenceEntityResolverModel` that maps sentence embeddings to entities in a knowledge base.

**📖 Learning Objectives:**

1. Understand the application and relevance of these models in healthcare data analysis, particularly in coding and classification tasks related to healthcare ontologies like ICD-10, RxNorm, SNOMED, etc.

2. Become comfortable using the different parameters of the annotator.


**🔗 Helpful Links:**

- Documentation : [SentenceEntityResolverApproach](https://nlp.johnsnowlabs.com/docs/en/licensed_annotators#sentenceentityresolver)

- Python Docs : [SentenceEntityResolverApproach](https://nlp.johnsnowlabs.com/licensed/api/python/reference/autosummary/sparknlp_jsl/annotator/resolution/sentence_entity_resolver/index.html)

- Scala Docs : [SentenceEntityResolverApproach](https://nlp.johnsnowlabs.com/licensed/api/com/johnsnowlabs/finance/chunk_classification/resolution/SentenceEntityResolverApproach.html)

- For extended examples of usage, see the [Spark NLP Workshop repository](https://github.com/JohnSnowLabs/spark-nlp-workshop/tree/master/tutorials/Certification_Trainings/Healthcare).

## **🖨️ Input/Output Annotation Types**

- Input: `SENTENCE_EMBEDDINGS`

- Output: `ENTITY`

## **🔎 Parameters**


- `labelCol` : Column name for the value we are trying to resolve. Usually this contains the entity ID in the knowledge base (e.g., the ICD-10 code).

- `normalizedCol`: Column name for the original, normalized description.

- `auxLabelCol`: Column name for an auxiliary label that maps resolved entities to additional labels.

- `useAuxLabel`: Whether to use the auxiliary label column. Defaults to `False`.

- `distanceFunction`: Determines how the distance between the query embedding and the stored entity embeddings is calculated, either `"EUCLIDEAN"` (the default) or `"COSINE"`.
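These column parameters describe the training DataFrame. The plain-Python sketch below mimics how the columns could relate at lookup time: each row contributes a code (`labelCol`), a normalized description (`normalizedCol`), and an auxiliary label (`auxLabelCol`) that is returned only when `useAuxLabel` is enabled. The `KBEntry` class, the `resolve` function, its output keys, and the codes and vectors are all hypothetical; the real annotator's internals may differ.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class KBEntry:
    # Hypothetical stand-in for one row of the training DataFrame
    label: str               # labelCol: the code to return
    normalized: str          # normalizedCol: canonical description
    aux_label: str           # auxLabelCol: extra label carried with the code
    embedding: List[float]   # sentence embedding of the normalized description

def resolve(query: List[float], kb: List[KBEntry], use_aux_label: bool = False) -> Dict[str, str]:
    # Pick the entry whose embedding is closest by squared Euclidean distance
    best = min(kb, key=lambda e: sum((q - x) ** 2 for q, x in zip(query, e.embedding)))
    result = {"code": best.label, "resolved_text": best.normalized}
    if use_aux_label:
        # Only expose the auxiliary label when useAuxLabel is enabled
        result["aux_label"] = best.aux_label
    return result

kb = [
    KBEntry("CODE_1", "type 2 diabetes mellitus", "AUX_1", [1.0, 0.0]),
    KBEntry("CODE_2", "essential hypertension", "AUX_2", [0.0, 1.0]),
]

print(resolve([0.9, 0.1], kb))
print(resolve([0.9, 0.1], kb, use_aux_label=True))
```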

In [None]:
documentAssembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

sentenceDetector = nlp.SentenceDetector()\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = nlp.Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

bertEmbeddings = nlp.BertSentenceEmbeddings.pretrained("sent_biobert_pubmed_base_cased")\
    .setInputCols(["sentence"])\
    .setOutputCol("embeddings")

data_pipeline = nlp.Pipeline(stages=[
   documentAssembler,
   sentenceDetector,
   tokenizer,
   bertEmbeddings
])

prepared_data = data_pipeline.fit(df).transform(df)

sent_biobert_pubmed_base_cased download started this may take some time.
Approximate size to download 386.4 MB
[OK!]


In [None]:
bertExtractor = medical.SentenceEntityResolverApproach()\
    .setNeighbours(25)\
    .setThreshold(1000)\
    .setInputCols(["embeddings"])\
    .setNormalizedCol("normalized_text")\
    .setLabelCol("text")\
    .setOutputCol("snomed_code")\
    .setDistanceFunction("EUCLIDEAN")\
    .setCaseSensitive(False)\
    .setUseAuxLabel(True)\
    .setAuxLabelCol("ground_truth")
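The `setNeighbours(25)` and `setThreshold(1000)` calls above control the candidate search. Assuming `neighbours` caps how many nearest candidates are considered and `threshold` acts as a maximum-distance cut-off (the exact semantics are defined in the Python docs linked above), their interplay can be sketched in plain Python as follows. The function name, codes, and vectors are hypothetical.

```python
import math
from typing import Dict, List, Tuple

def top_k_within_threshold(
    query: List[float],
    kb: Dict[str, List[float]],
    neighbours: int,
    threshold: float,
) -> List[Tuple[str, float]]:
    # Rank every stored (code, embedding) pair by Euclidean distance to the query
    scored = sorted((math.dist(query, emb), code) for code, emb in kb.items())
    # Keep at most `neighbours` candidates, then drop any farther than `threshold`
    return [(code, d) for d, code in scored[:neighbours] if d <= threshold]

# Hypothetical codes and 2-D embeddings
kb = {"C1": [0.0, 0.0], "C2": [3.0, 4.0], "C3": [1.0, 0.0]}

print(top_k_within_threshold([0.0, 0.0], kb, neighbours=2, threshold=1000))
print(top_k_within_threshold([0.0, 0.0], kb, neighbours=25, threshold=1000))
print(top_k_within_threshold([0.0, 0.0], kb, neighbours=25, threshold=0.5))
```

Under this assumption, a threshold as large as 1000 would rarely filter anything out, so the number of candidates would effectively be governed by `neighbours` alone.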