

![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://githubtocolab.com/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/streamlit_notebooks/healthcare/ER_LOINC.ipynb)




# **LOINC coding**

To run this notebook yourself, you need to upload your license keys. Run the cell below to do that. Alternatively, open the file explorer on the left side of the screen and upload `license_keys.json` to the folder that opens.
Otherwise, you can look at the example outputs at the bottom of the notebook.



## 1. Colab Setup

Import license keys

In [None]:
import os
import json

from google.colab import files

license_keys = files.upload()

with open(list(license_keys.keys())[0]) as f:
    license_keys = json.load(f)

sparknlp_version = license_keys["PUBLIC_VERSION"]
jsl_version = license_keys["JSL_VERSION"]

print('SparkNLP Version:', sparknlp_version)
print('SparkNLP-JSL Version:', jsl_version)

Saving License Keys 3.0.2.json to License Keys 3.0.2.json
SparkNLP Version: 3.0.2
SparkNLP-JSL Version: 3.0.2
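
Outside Colab, `files.upload()` is not available, so the same keys can be read straight from disk. The following is a minimal sketch under that assumption: `REQUIRED_KEYS`, `missing_license_keys`, and `load_license_keys` are hypothetical helper names, and the check covers only the three keys this notebook reads directly (`PUBLIC_VERSION`, `JSL_VERSION`, `SECRET`).

```python
import json

# The three keys this notebook reads directly from the license file.
REQUIRED_KEYS = ("PUBLIC_VERSION", "JSL_VERSION", "SECRET")


def missing_license_keys(keys):
    """Return the required keys that are absent from a parsed license dict."""
    return [k for k in REQUIRED_KEYS if k not in keys]


def load_license_keys(path="license_keys.json"):
    """Load the license JSON from disk and fail early if a key is missing."""
    with open(path) as f:
        keys = json.load(f)
    missing = missing_license_keys(keys)
    if missing:
        raise KeyError(f"license file is missing keys: {missing}")
    return keys
```

With the file in the working directory, `license_keys = load_license_keys()` could stand in for the upload step above.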


Install dependencies

In [None]:
%%capture
# Export each license key as an environment variable
for k, v in license_keys.items():
    %set_env $k=$v

!wget https://raw.githubusercontent.com/JohnSnowLabs/spark-nlp-workshop/master/jsl_colab_setup.sh
!bash jsl_colab_setup.sh

# Install Spark NLP Display for visualization
!pip install --ignore-installed spark-nlp-display

Import dependencies into Python

In [None]:
import pandas as pd
from pyspark.ml import Pipeline
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

import sparknlp
from sparknlp.annotator import *
from sparknlp_jsl.annotator import *
from sparknlp.base import *
import sparknlp_jsl

Start the Spark session

In [None]:
spark = sparknlp_jsl.start(license_keys['SECRET'])

# manually configure the session
# params = {"spark.driver.memory" : "16G",
#           "spark.kryoserializer.buffer.max" : "2000M",
#           "spark.driver.maxResultSize" : "2000M"}

# spark = sparknlp_jsl.start(license_keys['SECRET'],params=params)

## 2. Select the Entity Resolver model and construct the pipeline

Select the models:

**LOINC Entity Resolver models:**

1.   **chunkresolve_loinc_clinical** is a Chunk Resolver
2.   **sbiobertresolve_loinc** is a Sentence Resolver


For more details: https://github.com/JohnSnowLabs/spark-nlp-models#pretrained-models---spark-nlp-for-healthcare
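
The two models are wired into a pipeline differently. As a rough summary, the sketch below records each model's resolver kind and the input columns it is given in the pipelines later in this notebook. `RESOLVERS` and `resolver_info` are hypothetical names used only for illustration and are not part of Spark NLP.

```python
# Summary of the two LOINC resolvers as they are used in this notebook.
# The input column names match the pipelines built further down.
RESOLVERS = {
    "chunkresolve_loinc_clinical": {
        "kind": "chunk",
        "input_cols": ["tokens", "chunk_embeddings"],
    },
    "sbiobertresolve_loinc": {
        "kind": "sentence",
        "input_cols": ["ner_chunk", "sbert_embeddings"],
    },
}


def resolver_info(model_name):
    """Look up the resolver kind and input columns for a model name."""
    try:
        return RESOLVERS[model_name]
    except KeyError:
        raise ValueError(f"unknown LOINC resolver: {model_name!r}") from None
```

For example, `resolver_info("sbiobertresolve_loinc")["kind"]` evaluates to `"sentence"`.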

# **Chunk Resolver**

In [None]:
# Change this to the model you want to use and re-run the cells below.
ER_MODEL_NAME = "chunkresolve_loinc_clinical"
NER_MODEL_NAME = "ner_clinical"

Create the pipeline

In [None]:
document_assembler = DocumentAssembler() \
    .setInputCol('text')\
    .setOutputCol('document')

sentence_detector = SentenceDetector() \
    .setInputCols(['document'])\
    .setOutputCol('sentences')

tokenizer = Tokenizer()\
    .setInputCols(['sentences']) \
    .setOutputCol('tokens')

embeddings = WordEmbeddingsModel.pretrained('embeddings_clinical', 'en', 'clinical/models')\
    .setInputCols(["sentences", "tokens"])\
    .setOutputCol("embeddings")

ner_model = MedicalNerModel.pretrained(NER_MODEL_NAME, "en", "clinical/models") \
    .setInputCols(["sentences", "tokens", "embeddings"])\
    .setOutputCol("ner_tags")   

ner_chunker = NerConverter()\
    .setInputCols(["sentences", "tokens", "ner_tags"])\
    .setOutputCol("ner_chunk")

chunk_embeddings = ChunkEmbeddings()\
    .setInputCols("ner_chunk", "embeddings")\
    .setOutputCol("chunk_embeddings")

entity_resolver = \
    ChunkEntityResolverModel.pretrained(ER_MODEL_NAME,"en","clinical/models")\
    .setInputCols("tokens","chunk_embeddings").setOutputCol("resolution")
    
pipeline = Pipeline(stages=[
    document_assembler, 
    sentence_detector,
    tokenizer,
    embeddings,
    ner_model,
    ner_chunker,
    chunk_embeddings,
    entity_resolver])

empty_df = spark.createDataFrame([['']]).toDF("text")
pipeline_model = pipeline.fit(empty_df)

light_pipeline = sparknlp.base.LightPipeline(pipeline_model)

embeddings_clinical download started this may take some time.
Approximate size to download 1.6 GB
[OK!]
ner_clinical download started this may take some time.
Approximate size to download 13.9 MB
[OK!]
chunkresolve_loinc_clinical download started this may take some time.
Approximate size to download 130.1 MB
[OK!]


## 3. Create example inputs

In [None]:
# Enter examples as strings in this array
input_list = [
"""A 28-year-old female with a history of gestational diabetes mellitus diagnosed eight years prior to presentation and subsequent type two diabetes mellitus (T2DM), one prior episode of HTG-induced pancreatitis three years prior to presentation, associated with an acute hepatitis, and obesity with a body mass index (BMI) of 33.5 kg/m2, presented with a one-week history of polyuria, polydipsia, poor appetite, and vomiting.""",
]

## 4. Run the pipeline

In [None]:
df = spark.createDataFrame(pd.DataFrame({"text": input_list}))
result = pipeline_model.transform(df)
light_result = light_pipeline.fullAnnotate(input_list[0])
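
`fullAnnotate` returns one dictionary per input text, mapping each output column to a list of annotation objects. A minimal sketch of pairing each chunk with its resolved code follows. It assumes each annotation exposes `result`, `begin`, `end`, and `metadata`, and that `ner_chunk` and `resolution` line up by position. `pair_chunks_with_codes` is a hypothetical helper, and `demo` is a stand-in built from the first row of the example output so the block runs without Spark.

```python
from types import SimpleNamespace


def pair_chunks_with_codes(annotated):
    """Zip NER chunks with their resolutions into plain dictionaries."""
    rows = []
    for chunk, res in zip(annotated["ner_chunk"], annotated["resolution"]):
        rows.append({
            "chunk": chunk.result,
            "begin": chunk.begin,
            "end": chunk.end,
            "entity": chunk.metadata.get("entity"),
            "loinc_code": res.result,
            "loinc_description": res.metadata.get("resolved_text"),
        })
    return rows


# Stand-in for light_result[0]; values copied from the example output.
demo = {
    "ner_chunk": [SimpleNamespace(
        result="gestational diabetes mellitus", begin=39, end=67,
        metadata={"entity": "PROBLEM"})],
    "resolution": [SimpleNamespace(
        result="44877-9", begin=39, end=67,
        metadata={"resolved_text": "Insulin dependent diabetes mellitus Ql"})],
}
rows = pair_chunks_with_codes(demo)
```

On a live session the call would be `pair_chunks_with_codes(light_result[0])`.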

In [None]:
result.show(truncate=50)

+--------------------------------------------------+--------------------------------------------------+--------------------------------------------------+--------------------------------------------------+--------------------------------------------------+--------------------------------------------------+--------------------------------------------------+--------------------------------------------------+--------------------------------------------------+
|                                              text|                                          document|                                         sentences|                                            tokens|                                        embeddings|                                          ner_tags|                                         ner_chunk|                                  chunk_embeddings|                                        resolution|
+--------------------------------------------------+--------------------------

## 5. Visualize

In [None]:
result.select(
    F.explode(
        F.arrays_zip('ner_chunk.result', 
                     'ner_chunk.begin',
                     'ner_chunk.end',
                     'ner_chunk.metadata',
                     'resolution.metadata', 'resolution.result')
    ).alias('cols')
).select(
    F.expr("cols['0']").alias('chunk'),
    F.expr("cols['1']").alias('begin'),
    F.expr("cols['2']").alias('end'),
    F.expr("cols['3']['entity']").alias('entity'),
    F.expr("cols['4']['resolved_text']").alias('LOINC_description'),
    F.expr("cols['5']").alias('LOINC_Code'),
).show(truncate=False)

+-------------------------------------+-----+---+-------+---------------------------------------------------------+----------+
|chunk                                |begin|end|entity |LOINC_description                                        |LOINC_Code|
+-------------------------------------+-----+---+-------+---------------------------------------------------------+----------+
|gestational diabetes mellitus        |39   |67 |PROBLEM|Insulin dependent diabetes mellitus Ql                   |44877-9   |
|subsequent type two diabetes mellitus|117  |153|PROBLEM|Insulin dependent diabetes mellitus Ql                   |44877-9   |
|T2DM                                 |156  |159|PROBLEM|Cholesterol in VLDL [Mass or moles/Vol]                  |35199-9   |
|HTG-induced pancreatitis             |184  |207|PROBLEM|Cryoproteins identified Nom                              |15176-1   |
|an acute hepatitis                   |260  |277|PROBLEM|Acute hepatitis 2000 panel (S)                        

In [None]:
from sparknlp_display import EntityResolverVisualizer

vis = EntityResolverVisualizer()

## To set custom label colors:
vis.set_label_colors({'TREATMENT':'#0077b6', 'TEST':'#2a9d8f', 'PROBLEM':'#00b4d8'})

vis.display(light_result[0], 'ner_chunk', 'resolution', 'document')
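
The `begin` and `end` columns in the table above are character offsets into the input text, and `end` is inclusive. A quick way to confirm this is to slice the text with `end + 1`. The sketch below uses the opening of the example input and the first row of the table. `chunk_text` is a hypothetical helper name.

```python
# Opening of the example input used in the Chunk Resolver section.
text = ("A 28-year-old female with a history of gestational diabetes "
        "mellitus diagnosed eight years prior to presentation")


def chunk_text(text, begin, end):
    """Slice out a chunk given inclusive begin/end character offsets."""
    return text[begin:end + 1]


# First row of the table: begin=39, end=67.
first_chunk = chunk_text(text, 39, 67)
print(first_chunk)  # gestational diabetes mellitus
```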

# **Sentence Resolver**

In [None]:
# Change this to the model you want to use and re-run the cells below.
ER_MODEL_NAME = "sbiobertresolve_loinc"
NER_MODEL_NAME = "ner_clinical"

Create the pipeline

In [None]:
document_assembler = DocumentAssembler() \
    .setInputCol('text')\
    .setOutputCol('document')

sentence_detector = SentenceDetector() \
    .setInputCols(['document'])\
    .setOutputCol('sentences')

tokenizer = Tokenizer()\
    .setInputCols(['sentences']) \
    .setOutputCol('tokens')

word_embeddings = WordEmbeddingsModel.pretrained('embeddings_clinical', 'en', 'clinical/models')\
    .setInputCols(["sentences", "tokens"])\
    .setOutputCol("embeddings")

clinical_ner = MedicalNerModel.pretrained(NER_MODEL_NAME, "en", "clinical/models") \
    .setInputCols(["sentences", "tokens", "embeddings"])\
    .setOutputCol("ner_tags")   

ner_converter = NerConverter()\
    .setInputCols(["sentences", "tokens", "ner_tags"])\
    .setOutputCol("ner_chunk")

chunk2doc = Chunk2Doc().setInputCols("ner_chunk").setOutputCol("ner_chunk_doc")

sbert_embedder = BertSentenceEmbeddings.pretrained("sbiobert_base_cased_mli",'en','clinical/models')\
     .setInputCols(["ner_chunk_doc"])\
     .setOutputCol("sbert_embeddings")

entity_resolver = SentenceEntityResolverModel.pretrained(ER_MODEL_NAME, "en", "clinical/models") \
     .setInputCols(["ner_chunk", "sbert_embeddings"]) \
     .setOutputCol("resolution")\
     .setDistanceFunction("EUCLIDEAN")

pipeline_loinc = Pipeline(stages = [document_assembler, sentence_detector, tokenizer, word_embeddings, clinical_ner, ner_converter, chunk2doc, sbert_embedder, entity_resolver])

empty_df = spark.createDataFrame([['']]).toDF("text")

sentence_loinc_pipeline = pipeline_loinc.fit(empty_df)

light_pipeline = sparknlp.base.LightPipeline(sentence_loinc_pipeline)

embeddings_clinical download started this may take some time.
Approximate size to download 1.6 GB
[OK!]
ner_clinical download started this may take some time.
Approximate size to download 13.9 MB
[OK!]
sbiobert_base_cased_mli download started this may take some time.
Approximate size to download 384.3 MB
[OK!]
sbiobertresolve_loinc download started this may take some time.
Approximate size to download 215.1 MB
[OK!]
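
The resolver above is configured with `setDistanceFunction("EUCLIDEAN")`. Conceptually, this means a chunk's sentence embedding is compared with candidate embeddings and the closest one wins. The toy sketch below illustrates only that idea. The vectors and the `code_A`/`code_B`/`code_C` labels are made up, `euclidean` and `nearest_code` are hypothetical helpers, and the real model's internals (index structure, number of candidates returned) may differ.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_code(query, candidates):
    """Return the label whose vector is closest to the query vector."""
    return min(candidates, key=lambda code: euclidean(query, candidates[code]))


# Made-up 2-D "embeddings"; real sentence embeddings have many more dimensions.
toy_index = {
    "code_A": [0.0, 0.0],
    "code_B": [1.0, 1.0],
    "code_C": [3.0, 0.0],
}
best = nearest_code([0.9, 1.2], toy_index)
print(best)  # code_B
```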


## 3. Create example inputs

In [None]:
# Enter examples as strings in this array
input_list = [
"""FINDINGS: The patient was found upon excision of the cyst that it contained a large Prolene suture, which is multiply knotted as it always is; beneath this was a very small incisional hernia, the hernia cavity, which contained omentum; the hernia was easily repaired DESCRIPTION OF PROCEDURE: The patient was identified, then taken into the operating room, where after induction of an LMA anesthetic, his abdomen was prepped with Betadine solution and draped in sterile fashion. The puncta of the wound lesion was infiltrated with methylene blue and peroxide. The lesion was excised and the existing scar was excised using an ellipse and using a tenotomy scissors, the cyst was excised down to its base. In doing so, we identified a large Prolene suture within the wound and followed this cyst down to its base at which time we found that it contained omentum and was in fact overlying a small incisional hernia. The cyst was removed in its entirety, divided from the omentum using a Metzenbaum and tying with 2-0 silk ties. The hernia repair was undertaken with interrupted 0 Vicryl suture with simple sutures. The wound was then irrigated and closed with 3-0 Vicryl subcutaneous and 4-0 Vicryl subcuticular and Steri-Strips. Patient tolerated the procedure well. Dressings were applied and he was taken to recovery room in stable condition. """,
]

## 4. Run the pipeline

In [None]:
df = spark.createDataFrame(pd.DataFrame({"text": input_list}))
result = sentence_loinc_pipeline.transform(df)
light_result = light_pipeline.fullAnnotate(input_list[0])

In [None]:
result.show(truncate=50)

+--------------------------------------------------+--------------------------------------------------+--------------------------------------------------+--------------------------------------------------+--------------------------------------------------+--------------------------------------------------+--------------------------------------------------+--------------------------------------------------+--------------------------------------------------+--------------------------------------------------+
|                                              text|                                          document|                                         sentences|                                            tokens|                                        embeddings|                                          ner_tags|                                         ner_chunk|                                     ner_chunk_doc|                                  sbert_embeddings|                            

## 5. Visualize

In [None]:
result.select(
    F.explode(
        F.arrays_zip('ner_chunk_doc.result', 
                     'ner_chunk_doc.begin',
                     'ner_chunk_doc.end',
                     'ner_chunk_doc.metadata',
                     'resolution.metadata', 'resolution.result')
    ).alias('cols')
).select(
    F.expr("cols['0']").alias('chunk'),
    F.expr("cols['1']").alias('begin'),
    F.expr("cols['2']").alias('end'),
    F.expr("cols['3']['entity']").alias('entity'),
    F.expr("cols['4']['resolved_text']").alias('LOINC_description'),
    F.expr("cols['5']").alias('LOINC_Code'),
).show(truncate=False)

+------------------------------+-----+---+---------+----------------------------------------------+----------+
|chunk                         |begin|end|entity   |LOINC_description                             |LOINC_Code|
+------------------------------+-----+---+---------+----------------------------------------------+----------+
|excision                      |37   |44 |TREATMENT|Exfoliation                                   |32971-4   |
|the cyst                      |49   |56 |PROBLEM  |View for cyst                                 |65799-9   |
|a large Prolene suture        |76   |97 |TREATMENT|Platelets.large                               |34167-7   |
|multiply knotted              |109  |124|TREATMENT|Lot number                                    |43156-9   |
|a very small incisional hernia|160  |189|PROBLEM  |Lipoprotein.beta.subparticle.very small-a     |92715-2   |
|the hernia                    |192  |201|PROBLEM  |Regurgitation degree                          |77919-9   |

In [None]:
from sparknlp_display import EntityResolverVisualizer

vis = EntityResolverVisualizer()

## To set custom label colors:
vis.set_label_colors({'TREATMENT':'#0077b6', 'TEST':'#2a9d8f', 'PROBLEM':'#f4a261'})

vis.display(light_result[0], 'ner_chunk_doc', 'resolution', 'document')