![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/streamlit_notebooks/healthcare_jsl/ER_CPT.ipynb)

# `sbiobertresolve_cpt_procedures_measurements_augmented` **Models**

This model maps medical entities to CPT codes using Sentence BERT embeddings. Its corpus has been extended to include measurements, so it can map both procedure and measurement concepts/entities to CPT codes. Measurement codes help codify medical entities related to tests and their results.

## 1. Colab Setup

**Import license keys**

In [None]:
# Install the johnsnowlabs library to access Spark-OCR and Spark-NLP for Healthcare, Finance, and Legal.
! pip install -q johnsnowlabs

In [None]:
from google.colab import files
print("Please Upload your John Snow Labs License using the button below")
license_keys = files.upload()

In [None]:
from johnsnowlabs import *

# After uploading your license, run this to install all licensed Python wheels and pre-download the JARs for the Spark session JVM.
# Restart your notebook afterwards for the changes to take effect.

jsl.install()

## 2. Start Session

In [None]:
from johnsnowlabs import *
# Automatically load the license data and start a session with all JARs the user has access to
spark = jsl.start()

## 3. Select the model and construct the pipeline

In [None]:
MODEL_NAME = "sbiobertresolve_cpt_procedures_measurements_augmented"

**Create the pipeline**

In [None]:
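# Import Pipeline explicitly so the cell does not rely on the wildcard import exposing it
from pyspark.ml import Pipeline

document_assembler = nlp.DocumentAssembler()\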
    .setInputCol("text")\
    .setOutputCol("document")

sentence_detector = nlp.SentenceDetector() \
    .setInputCols(['document'])\
    .setOutputCol('sentence')

tokenizer = nlp.Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

word_embeddings = nlp.WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("embeddings")

ner = medical.NerModel.pretrained("ner_jsl", "en", "clinical/models") \
    .setInputCols(["sentence", "token", "embeddings"]) \
    .setOutputCol("ner")

ner_converter = medical.NerConverterInternal()\
    .setInputCols(["sentence", "token", "ner"])\
    .setOutputCol("ner_chunk")\
    .setWhiteList(["Procedure"])

c2doc = nlp.Chunk2Doc()\
    .setInputCols("ner_chunk")\
    .setOutputCol("ner_chunk_doc") 

sbert_embedder = nlp.BertSentenceEmbeddings.pretrained("sbiobert_base_cased_mli",'en','clinical/models')\
    .setInputCols(["ner_chunk_doc"])\
    .setOutputCol("sbert_embeddings")\
    .setCaseSensitive(False)

resolver = medical.SentenceEntityResolverModel.pretrained(MODEL_NAME, "en", "clinical/models") \
    .setInputCols(["ner_chunk_doc", "sbert_embeddings"]) \
    .setOutputCol("resolution")\
    .setDistanceFunction("EUCLIDEAN")


nlp_pipeline = Pipeline(
    stages = [
        document_assembler,
        sentence_detector,
        tokenizer,
        word_embeddings,
        ner,
        ner_converter,
        c2doc,
        sbert_embedder,
        resolver
  ])


embeddings_clinical download started this may take some time.
Approximate size to download 1.6 GB
[OK!]
ner_jsl download started this may take some time.
[OK!]
sbiobert_base_cased_mli download started this may take some time.
Approximate size to download 384.3 MB
[OK!]
sbiobertresolve_cpt_procedures_measurements_augmented download started this may take some time.
[OK!]
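
The resolver above is configured with `setDistanceFunction("EUCLIDEAN")`, so candidate concepts are compared with the chunk embedding by Euclidean distance. The sketch below illustrates only that ranking idea in plain Python. The two-dimensional vectors, the candidate labels, and the helper `rank_by_euclidean` are made up for illustration; this is not the library's internal implementation.

```python
import math


def rank_by_euclidean(query, candidates, k=3):
    """Return the k candidate labels closest to `query`, nearest first."""
    scored = [(math.dist(query, vec), label) for label, vec in candidates.items()]
    scored.sort()
    return [(label, round(dist, 3)) for dist, label in scored[:k]]


# Toy 2-D "embeddings" (hypothetical); real sentence embeddings have hundreds of dimensions.
toy_candidates = {
    "candidate_a": (0.0, 1.0),
    "candidate_b": (1.0, 1.0),
    "candidate_c": (3.0, 4.0),
}
toy_query = (0.0, 0.0)

ranking = rank_by_euclidean(toy_query, toy_candidates, k=2)
print(ranking)
```

A smaller distance means a closer match, so the first entry plays the role of the top resolution and the remaining entries correspond to the lower-ranked alternatives.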


## 4. Create example inputs

In [None]:
sample_text = [
"""he was admitted to the hospital with chest pain and found to have bilateral pleural effusion, the right greater than the left. CT scan of the chest also revealed a large mediastinal lymph node. We reviewed the pathology obtained from the pericardectomy in March 2006, which was diagnostic of mesothelioma. At this time, chest tube placement for drainage of the fluid occurred and thoracoscopy, which were performed, which revealed epithelioid malignant mesothelioma.""",               
"""Management of pain medications, This is a 60-year-old white male with history of coronary artery disease. He is currently on dialysis due to end-stage renal disease. He also has been started on Seroquel 12.5 mg p.o. at bedtime and will receive his first dose on the evening of Monday, February 12, 2007.  He denies any other psychiatric symptoms including auditory or visual hallucinations or delusions.  His wife was present in the room and both him and his wife seemed to be offended by the suggestion of any psychiatric history or any psychiatric problems.,PAST MEDICAL HISTORY:,1.  DVT in December 2005.,2.  Three MI's (1996, 2005, and 2006).,3.  Diabetes for 5 years.,4.  Coronary artery disease for 10 years.,PAST SURGERIES:,1.  Appendectomy as a child.,2.  Sternal rewiring, December 2005.,MEDICATIONS:,1. Aspirin 81 mg p.o. daily.,2.  Metastron injection 4 mg IV q.6h. p.r.n. nausea.,3.  Albumin IV p.r.n. hemodialysis., 4. Ipratropium solution for nebulizer.,ALLERGIES:,  No known drug allergies.,PAST PSYCHIATRIC HISTORY:,  The patient denies any past psychiatric problems.  No medications.  He denies any outpatient visits or inpatient hospitalizations for psychiatric reasons., Mini mental status exams not completed.,ASSESSMENT:,AXIS I:  Pain with physical symptoms and possibly psychological symptoms.""",
"""PROCEDURE: , Esophagogastroduodenoscopy with biopsy ,INDICATIONS FOR PROCEDURE: , A 17-year-old with history of 40-pound weight loss, abdominal pain, status post appendectomy with recurrent abscess formation and drainage.  Currently, he has a fistula from his anterior abdominal wall out.   CT scans show thickened terminal ileum, which suggest that we are dealing with Crohn's disease.,MEDICATIONS:  ,General anesthesia.,INSTRUMENT:,  Olympus GIF-160 and PCF-160.,COMPLICATIONS: , None.,FINDINGS: , With the patient in the supine position, intubated under general anesthesia. The endoscope was inserted without difficulty into the hypopharynx.  The scope was advanced down the esophagus, which had normal mucosal coloration and vascular pattern.  It appeared normal and appeared to function normally.  The endoscope was advanced into the stomach, which was distended with excess air.  Rugal folds were flattened completely.  There were multiple superficial erosions scattered throughout the fundus, body, and antral portions consistent with Crohn's involvement of the stomach. Biopsies were obtained x2 in the second portion of the duodenum, antrum, body, and distal esophagus from the central incisors for histology.  Two additional biopsies were obtained in the antrum for CLO testing.  Excess air was evacuated from the stomach.  The scope was removed from the patient who tolerated that part of the procedure well.,The cecal area had multiple ulcers with exudate.  The ileocecal valve was markedly distorted.  Biopsies were obtained x2 in the cecal area and then the scope was withdrawn through the ascending, transverse, descending, sigmoid, and rectum.  The colonic mucosa in these areas was well seen and there were a few scattered aphthous ulcers in the ascending and descending colon.  Biopsies were obtained in the cecum at 65 cm, transverse colon 50 cm and rectosigmoid 20 cmNo fistulas were noted in the colon.  Excess air was evacuated from the colon.  The scope was removed.  
The patient tolerated the procedure well and was taken to recovery in satisfactory condition.""",
"""Right lower quadrant abdominal pain, rule out acute appendicitis.,POSTOPERATIVE DIAGNOSIS:,  Acute suppurative appendicitis.,PROCEDURE PERFORMED:,1.  Diagnostic laparoscopy.,2.  Laparoscopic appendectomy.,ANESTHESIA: , General endotracheal and injectable 0.25% Marcaine.,ESTIMATED BLOOD LOSS: , Minimal.,SPECIMEN: , Appendix.,COMPLICATIONS: , None.,BRIEF HISTORY: , This is a 37-year-old Caucasian female with progressively worsening suprapubic and right lower quadrant abdominal pain, which progressed throughout its course starting approximately 12 hours prior to presentation.  She admits to some nausea associated with it.  There have been no fevers, chills, and/or genitourinary symptoms.  The patient had right lower quadrant tenderness with rebound and percussion tenderness in the right lower quadrant. Given the severity of her abdominal examination and her persistence of her symptoms, we recommend the patient undergo diagnostic laparoscopy with probable need for laparoscopic appendectomy and possible open appendectomy.  The risks, benefits, complications of the procedure, she gave us informed consent to proceed.""",
"""PREOPERATIVE DIAGNOSES: , Left obstructed renal ureteropelvic junction obstruction status post pyeloplasty, percutaneous procedure and status post Pseudomonas pyelonephritis x6, renal insufficiency, and solitary kidney., POSTOPERATIVE DIAGNOSES:,  Left obstructed renal ureteropelvic junction obstruction status post pyeloplasty, percutaneous procedure and status post Pseudomonas pyelonephritis x6, renal insufficiency, and solitary kidney., PROCEDURE:  ,Cystoscopy under anesthesia, left ureteropelvic junction obstruction, difficult and open renal biopsy.,FLUIDS RECEIVED:  ,1000 mL crystalloid.,ESTIMATED BLOOD LOSS:  ,Less than 10 mL.,SPECIMENS: , Tissue sent to pathology is a renal biopsy.,ABNORMAL FINDINGS: , A stenotic scarred ureteropelvic junction with dilated ureter and dilated renal pelvis., TUBES AND DRAINS:  ,A 10-French silicone Foley catheter with 3 mL in balloon and a 4.7-French ureteral double J-stent multilength.,INDICATIONS FOR OPERATION:  ,The patient is a 3-1/2-year-old boy, who has a solitary left kidney with renal insufficiency with creatinine of 1.2, who has had a ureteropelvic junction repair performed by Dr. Chang.  It was subsequently obstructed with multiple episodes of pyelonephritis, two percutaneous tube placements and continued obstruction.  Plan is for co surgeons due to the complexity of the situation and the solitary kidney to do surgical procedure to correct the obstruction."""
]

In [None]:
from pyspark.sql.types import StringType

df = spark.createDataFrame(sample_text, StringType()).toDF('text')

df.show(truncate = 100)

+----------------------------------------------------------------------------------------------------+
|                                                                                                text|
+----------------------------------------------------------------------------------------------------+
|he was admitted to the hospital with chest pain and found to have bilateral pleural effusion, the...|
|Management of pain medications, This is a 60-year-old white male with history of coronary artery ...|
|PROCEDURE: , Esophagogastroduodenoscopy with biopsy ,INDICATIONS FOR PROCEDURE: , A 17-year-old w...|
|Right lower quadrant abdominal pain, rule out acute appendicitis.,POSTOPERATIVE DIAGNOSIS:,  Acut...|
|PREOPERATIVE DIAGNOSES: , Left obstructed renal ureteropelvic junction obstruction status post py...|
+----------------------------------------------------------------------------------------------------+



## 5. Use the pipeline to create outputs

In [None]:
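# Import F explicitly so the cell does not rely on the wildcard import exposing it
import pyspark.sql.functions as F

limited_df = df.limit(2)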

result = nlp_pipeline.fit(limited_df).transform(limited_df)

result.select(F.explode(F.arrays_zip(result.ner_chunk.result, 
                                     result.ner_chunk.begin, 
                                     result.ner_chunk.end,
                                     result.ner_chunk.metadata,
                                     result.resolution.result,
                                     result.resolution.metadata,)).alias("cols"))\
      .select(F.expr("cols['0']").alias("chunk"),
                    F.expr("cols['1']").alias("begin"),
                    F.expr("cols['2']").alias("end"),
                    F.expr("cols['3']['entity']").alias("entity"),
                    F.expr("cols['4']").alias("cpt_code"),
                    F.expr("cols['5']['all_k_results']").alias("all_codes"),
                    F.expr("cols['5']['all_k_resolutions']").alias("resolutions")).show(truncate=40)

+---------------------+-----+---+---------+--------+----------------------------------------+----------------------------------------+
|                chunk|begin|end|   entity|cpt_code|                               all_codes|                             resolutions|
+---------------------+-----+---+---------+--------+----------------------------------------+----------------------------------------+
|       pericardectomy|  238|251|Procedure|   33030|33030:::33020:::64746:::49250:::27350...|Pericardectomy [Pericardiectomy, subt...|
| chest tube placement|  320|339|Procedure|   39503|39503:::96440:::32553:::35820:::32100...|Insertion of chest tube [Repair, neon...|
|drainage of the fluid|  345|365|Procedure|   10140|10140:::40800:::61108:::41006:::62180...|Drainage of blood or fluid accumulati...|
|         thoracoscopy|  380|391|Procedure| 1020900|1020900:::32654:::32668:::1006014:::3...|Thoracoscopy [Thoracoscopy]:::Thoraco...|
|             dialysis|  125|132|Procedure| 1019071|101
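
In the table above, `all_codes` and `resolutions` are single strings whose entries are separated by `:::`. The `begin` and `end` values are character offsets, and their span matches the chunk length (238 to 251 for the 14-character `pericardectomy`), which suggests that `end` is inclusive. A minimal sketch of how one might unpack these values in plain Python follows. The helper names `split_candidates` and `slice_chunk` are hypothetical, and the descriptions are placeholders because the real ones are truncated in the output.

```python
def split_candidates(all_codes, resolutions, sep=":::"):
    """Pair each candidate code with its description, keeping the ranked order."""
    codes = all_codes.split(sep)
    names = resolutions.split(sep)
    return list(zip(codes, names))


def slice_chunk(text, begin, end):
    """Recover a chunk from its offsets, treating `end` as inclusive."""
    return text[begin:end + 1]


# First three codes copied from the `pericardectomy` row above.
codes = "33030:::33020:::64746"
# Placeholder descriptions (hypothetical), not the model's actual resolutions.
descriptions = "description 1:::description 2:::description 3"
pairs = split_candidates(codes, descriptions)
print(pairs[0])

# Offsets are computed on a short excerpt here rather than on the full note.
excerpt = "pathology obtained from the pericardectomy in March 2006"
begin = excerpt.find("pericardectomy")
end = begin + len("pericardectomy") - 1
print(slice_chunk(excerpt, begin, end))
```

Because `zip` stops at the shorter list, a mismatch between the number of codes and descriptions would silently drop entries; comparing the two lengths first is a reasonable safeguard.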

## 6. Visualize results

In [None]:
from sparknlp_display import EntityResolverVisualizer

resolver_viz = EntityResolverVisualizer()


# Collect once instead of re-running the pipeline for every row
rows = result.collect()

for row in rows:
    resolver_viz.display(result = row, label_col = "ner_chunk", resolution_col="resolution")
    print("\n\n")