

![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/streamlit_notebooks/healthcare_jsl/Visit_Normalization.ipynb)


## 1. Colab Setup

Import license keys

In [None]:
# Install the johnsnowlabs library to access Spark-OCR and Spark-NLP for Healthcare, Finance, and Legal.
! pip install -q johnsnowlabs

In [None]:
from google.colab import files
print("Please Upload your John Snow Labs License using the button below")
license_keys = files.upload()

In [None]:
from johnsnowlabs import *

# After uploading your license, run this to install all licensed Python wheels and pre-download the JARs for the Spark session JVM.
# Restart the notebook afterwards for the changes to take effect.

jsl.install()

## Start Session

In [None]:
from johnsnowlabs import *
# Automatically load the license data and start a session with all the JARs the user has access to
spark = jsl.start()

## 2. Select the NER model and construct the pipeline

In [None]:
model = 'visit_normalization'

In [None]:
input_list = ["""Sample Name: Mesothelioma - Pleural Biopsy
Description: Right pleural effusion and suspected malignant mesothelioma. (Medical Transcription Sample Report)
PREOPERATIVE DIAGNOSIS:  Right pleural effusion and suspected malignant mesothelioma.
POSTOPERATIVE DIAGNOSIS: Right pleural effusion, suspected malignant mesothelioma.
PROCEDURE:  Right VATS pleurodesis and pleural biopsy.
ANESTHESIA: General double-lumen endotracheal.
DESCRIPTION OF FINDINGS:  Right pleural effusion, firm nodules, diffuse scattered throughout the right pleura and diaphragmatic surface.
SPECIMEN:  Pleural biopsies for pathology and microbiology.
INDICATIONS:  Briefly, this is a 66-year-old gentleman who has been transferred from an outside hospital after a pleural effusion had been drained and biopsies taken from the right chest that were thought to be consistent with mesothelioma. Upon transfer, he had a right pleural effusion demonstrated on x-ray as well as some shortness of breath and dyspnea on exertion. The risks, benefits, and alternatives to right VATS pleurodesis and pleural biopsy were discussed with the patient and his family and they wished to proceed.
Dr. X was present for the entire procedure which was right VATS pleurodesis and pleural biopsies.The counts were correct x2 at the end of the case.""",
             """Sample Name: MediPort Placement
Description: Rhabdomyosarcoma of the left orbit. Left subclavian vein MediPort placement. Needs chemotherapy. (Medical Transcription Sample Report)
PREOPERATIVE DIAGNOSIS: Rhabdomyosarcoma of the left orbit.
POSTOPERATIVE DIAGNOSIS: Rhabdomyosarcoma of the left orbit.
PROCEDURE:  Left subclavian vein MediPort placement (7.5-French single-lumen).
INDICATIONS FOR PROCEDURE:  This patient is a 16-year-old girl, with newly diagnosed rhabdomyosarcoma of the left orbit. The patient is being taken to the operating room for MediPort placement. She needs chemotherapy.
DESCRIPTION OF PROCEDURE:  The patient was taken to the operating room, placed supine, put under general endotracheal anesthesia. The patient's neck, chest, and shoulders were prepped and draped in usual sterile fashion. An incision was made on the left shoulder area. The left subclavian vein was cannulated. The wire was passed, which was in good position under fluoro, using Seldinger Technique. Near wire incision site made a pocket above the fascia and sutured in a size 7.5-French single-lumen MediPort into the pocket in 4 places using 3-0 Nurolon. I then sized the catheter under fluoro and placed introducer and dilator over the wire, removed the wire and dilator, placed the catheter through the introducer and removed the introducer. The line tip was in good position under fluoro. It withdrew and flushed well. I then closed the incision using 4-0 Vicryl, 5-0 Monocryl for the skin, and dressed with Steri-Strips. Accessed the ports with a 1-inch 20-gauge Huber needle, and it withdrew and flushed well with final heparin flush. We secured this with Tegaderm. The patient is then to undergo bilateral bone marrow biopsy and lumbar puncture by Oncology.""",
             """Sample Name: Leiomyosarcoma
Description: Discharge summary of patient with leiomyosarcoma and history of pulmonary embolism, subdural hematoma, pancytopenia, and pneumonia. (Medical Transcription Sample Report)
ADMITTING DIAGNOSES: 1. Leiomyosarcoma.2. History of pulmonary embolism.3. History of subdural hematoma.4. Pancytopenia.5. History of pneumonia.
PROCEDURES DURING HOSPITALIZATION: 1. Cycle six of CIVI-CAD (Cytoxan, Adriamycin, and DTIC) from 07/22/2008 to 07/29/2008.2. CTA, chest PE study showing no evidence for pulmonary embolism.
3. Head CT showing no evidence of acute intracranial abnormalities.4. Sinus CT, normal mini-CT of the paranasal sinuses.
HOSPITAL COURSE: 1. Leiomyosarcoma, the patient was admitted to Hem/Onco B Service under attending Dr. XYZ for cycle six of continuous IV infusion Cytoxan, Adriamycin, and DTIC, which she tolerated well.2. History of pulmonary embolism. Upon admission, the patient reported an approximate two-week history of dyspnea on exertion and some mild chest pain. She underwent a CTA, which showed no evidence of pulmonary embolism and the patient was started on prophylactic doses of Lovenox at 40 mg a day. She had no further complaints throughout the hospitalization with any shortness of breath or chest pain.3. History of subdural hematoma, also on admission the patient noted some mild intermittent headaches that were fleeting in nature, several a day that would resolve on their own. Her headaches were not responding to pain medication and so on 07/24/2008, we obtained a head CT that showed no evidence of acute intracranial abnormalities. The patient also had a history of sinusitis and so a sinus CT scan was obtained, which was normal.4. Pancytopenia. On admission, the patient's white blood count was 3.4, hemoglobin 11.3, platelet count 82, and ANC of 2400. The patient's counts were followed throughout admission. She did not require transfusion of red blood cells or platelets; however, on 07/26/2008 her ANC did dip to 900 and she was placed on neutropenic diet. At discharge her ANC is back up to 1100 and she is taken off neutropenic diet. Her white blood cell count at discharge was 1.4 and her hemoglobin was 11.2 with a platelet count of 140.
5. History of pneumonia. During admission, the patient did not exhibit any signs or symptoms of pneumonia.
DISPOSITION:  Home in stable condition.
DIET:  Regular and less neutropenic.
ACTIVITY:  Resume same activity.
FOLLOWUP: The patient will have lab work at Dr. XYZ on 08/05/2008 and she will also return to the cancer center on 08/12/2008 at 10:20 a.m. The patient is also advised to monitor for any fevers greater than 100.5 and should she have any further problems in the meantime to please call in to be seen sooner.""",
             """Sample Name: Sickle Cell Anemia - ER Visit
Description: A 19-year-old known male with sickle cell anemia comes to the emergency room on his own with 3-day history of back pain.
(Medical Transcription Sample Report)
HISTORY OF PRESENT ILLNESS:  This is a 19-year-old known male with sickle cell anemia. He comes to the emergency room on his own with 3-day history of back pain. He is on no medicines. He does live with a room mate. Appetite is decreased. No diarrhea, vomiting. Voiding well. Bowels have been regular. Denies any abdominal pain. Complains of a slight headaches, but his main concern is back ache that extends from above the lower T-spine to the lumbosacral spine. The patient is not sure of his immunizations. The patient does have sickle cell and hemoglobin is followed in the Hematology Clinic.
ALLERGIES:  THE PATIENT IS ALLERGIC TO TYLENOL WITH CODEINE, but he states he can get morphine along with Benadryl.
MEDICATIONS:  He was previously on folic acid. None at the present time.
PAST SURGICAL HISTORY:  He has had no surgeries in the past.
FAMILY HISTORY:  Positive for diabetes, hypertension and cancer.
SOCIAL HISTORY:  He denies any smoking or drug usage.
PHYSICAL EXAMINATION:  VITAL SIGNS: On examination, the patient has a temp of 37 degrees tympanic, pulse was recorded at 37 per minute, but subsequently it was noted to be 66 per minute, respiratory rate is 24 per minute and blood pressure is 149/66, recheck blood pressure was 132/72.""",
             """Sample Name: Consult - Breast Cancer - 1
Description: The patient is a 57-year-old female with invasive ductal carcinoma of the left breast, T1c, Nx, M0 left breast carcinoma.
(Medical Transcription Sample Report)
CHIEF COMPLAINT:  Left breast cancer.
MEDICATION: She is currently on omeprazole for reflux and indigestion.
ALLERGIES: SHE HAS NO KNOWN DRUG ALLERGIES.
REVIEW OF SYSTEMS: Negative for any recent febrile illnesses, chest pains or shortness of breath. Positive for restless leg syndrome. Negative for any unexplained weight loss and no change in bowel or bladder habits.
FAMILY HISTORY: Positive for breast cancer in her mother and also mesothelioma from possible asbestosis or asbestos exposure.
SOCIAL HISTORY: The patient works as a school teacher and teaching high school.
PHYSICAL EXAMINATION: The patient is a white female, alert and oriented x 3, appears her stated age of 57. Head is atraumatic and normocephalic. Sclerae are anicteric. Regular rate and rhythm. 
RECOMMENDATIONS:  I have discussed with the patient in detail about the diagnosis of breast cancer and the surgical options, and medical oncologist has discussed with her issues about adjuvant or neoadjuvant chemotherapy. We have decided to recommend to the patient breast conservation surgery with left breast lumpectomy with preoperative sentinel lymph node injection and mapping and left axillary dissection. The possibility of further surgery requiring wider lumpectomy or even completion mastectomy was explained to the patient. The procedure and risks of the surgery were explained to include, but not limited to extra bleeding, infection, unsightly scar formation, the possibility of local recurrence, the possibility of left upper extremity lymphedema was explained. Local numbness, paresthesias or chronic pain was explained. The patient was given an educational brochure and several brochures about the diagnosis and treatment of breast cancers. She was certainly encouraged to obtain further surgical medical opinions prior to proceeding. I believe the patient has given full informed consent and desires to proceed with the above.""",
             """Sample Name: Consult - Breast Cancer - 1
Description: A female with a history of peritoneal mesothelioma who has received prior intravenous chemotherapy.
(Medical Transcription Sample Report)
REASON FOR ADMISSION: Intraperitoneal chemotherapy.
MEDICATIONS:  Norco 10 per 325 one to two p.o. q.4h. p.r.n. pain, atenolol 50 mg p.o. b.i.d., Levoxyl 75 mcg p.o. daily, Phenergan 25 mg p.o. q.4-6h. p.r.n. nausea, lorazepam 0.5 mg every 8 hours as needed for anxiety, Ventolin HFA 2 puffs q.6h. p.r.n., Plavix 75 mg p.o. daily, Norvasc 10 mg p.o. daily, Cymbalta 60 mg p.o. daily, and Restoril 30 mg at bedtime as needed for sleep.
ALLERGIES:  THE PATIENT STATES THAT ON OCCASION LORAZEPAM DOSE PRODUCE HALLUCINATIONS, AND SHE HAD DIFFICULTY TOLERATING ATIVAN.
PHYSICAL EXAMINATION: Vital Signs : The patient's height is 165 cm, weight is 77 kg. BSA is 1.8 sq m. The vital signs reveal blood pressure to be 158/75, heart rate 61 per minute with a regular sinus rhythm, temperature of 96.6 degrees, respiratory rate 18 with an SpO2 of 100% on room air. She is normally developed; well nourished; very cooperative; oriented to person, place, and time; and in no distress at this time. She is anicteric.
DIAGNOSTIC IMPRESSION:
1. Intraperitoneal mesothelioma, partial remission, as noted by CT scan of the abdomen.2. Presumed left lower pole kidney hemorrhagic cyst.3. History of hypertension.4. Type 1 bipolar disease.
PLAN:  The patient will have appropriate laboratory studies done. A left renal ultrasound is requested to further delineate the possible hemorrhagic cyst in the lower left pole of the left kidney. Interventional radiology will access for ports in the abdomen. She will receive chemotherapy intraperitoneally. The plan will be to use intraperitoneal Taxol."""]

In [None]:
documentAssembler = nlp.DocumentAssembler()\
      .setInputCol("text")\
      .setOutputCol("document")
 
sentenceDetector = nlp.SentenceDetectorDLModel.pretrained()\
      .setInputCols(["document"])\
      .setOutputCol("sentence")
 
tokenizer = nlp.Tokenizer()\
      .setInputCols(["sentence"])\
      .setOutputCol("token")
 
word_embeddings = nlp.WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
      .setInputCols(["sentence", "token"])\
      .setOutputCol("embeddings")
 
c2doc = nlp.Chunk2Doc()\
      .setInputCols("ner_chunk")\
      .setOutputCol("ner_chunk_doc") 
 
clinical_ner = medical.NerModel.pretrained("ner_jsl_slim", "en", "clinical/models") \
      .setInputCols(["sentence", "token", "embeddings"]) \
      .setOutputCol("ner")
 
ner_converter = nlp.NerConverter() \
      .setInputCols(["sentence", "token", "ner"]) \
      .setOutputCol("ner_chunk")\
      .setWhiteList(["Header"])


pipeline = Pipeline(
    stages = [
        documentAssembler,
        sentenceDetector,
        tokenizer,
        word_embeddings,
        clinical_ner,
        ner_converter
        ])
 
empty_df = spark.createDataFrame([[""]]).toDF('text')
pipeline_model = pipeline.fit(empty_df)

sentence_detector_dl download started this may take some time.
Approximate size to download 354.6 KB
[OK!]
embeddings_clinical download started this may take some time.
Approximate size to download 1.6 GB
[OK!]
ner_jsl_slim download started this may take some time.
[OK!]


In [None]:
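# One synthetic file name per document. Note: this rebinds `files`, shadowing the google.colab module imported earlier.
files = [f"{i}.txt" for i in range(1, len(input_list) + 1)]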

In [None]:
import pandas as pd

df = spark.createDataFrame(pd.DataFrame({'text': input_list, 'file' : files}))

In [None]:
light_pipeline = LightPipeline(pipeline_model)

## 3. NER Visualization

In [None]:
result_ner = pipeline_model.transform(df)

result_ner.select(F.explode(F.arrays_zip(result_ner.ner_chunk.result, 
                                         result_ner.ner_chunk.metadata)).alias("cols")) \
          .select(F.expr("cols['0']").alias("chunk"),
                  F.expr("cols['1']['entity']").alias("ner_label")).show(truncate=False)
          

+----------------------------------+---------+
|chunk                             |ner_label|
+----------------------------------+---------+
|Description:                      |Header   |
|PREOPERATIVE DIAGNOSIS:           |Header   |
|POSTOPERATIVE DIAGNOSIS:          |Header   |
|PROCEDURE:                        |Header   |
|ANESTHESIA:                       |Header   |
|DESCRIPTION OF FINDINGS:          |Header   |
|SPECIMEN:                         |Header   |
|INDICATIONS:                      |Header   |
|Description:                      |Header   |
|PREOPERATIVE DIAGNOSIS:           |Header   |
|POSTOPERATIVE DIAGNOSIS:          |Header   |
|PROCEDURE:                        |Header   |
|INDICATIONS FOR PROCEDURE:        |Header   |
|DESCRIPTION OF PROCEDURE:         |Header   |
|Description:                      |Header   |
|ADMITTING DIAGNOSES:              |Header   |
|PROCEDURES DURING HOSPITALIZATION:|Header   |
|HOSPITAL COURSE:                  |Header   |
|DISPOSITION:

In [None]:
from sparknlp_display import NerVisualizer

for i, sample in enumerate(input_list[:1]):
    
    print("*"*30)
    print(i+1)
    print("*"*30)
    
    light_result = light_pipeline.fullAnnotate(sample)
    
    visualiser = NerVisualizer()

    visualiser.display(light_result[0], label_col='ner_chunk', document_col='document')

******************************
1
******************************


## 4. Sections

In [None]:
light_pipeline = LightPipeline(pipeline_model)

result = light_pipeline.transform(df).toPandas()

In [None]:
result_exploded = result[["text", 'file', "ner_chunk"]].explode("ner_chunk")

In [None]:
result_exploded['start'] = result_exploded.ner_chunk.apply(lambda x: x[1])
result_exploded['end'] = result_exploded.ner_chunk.apply(lambda x: x[2])
result_exploded['section_header'] = result_exploded.ner_chunk.apply(lambda x: x[3])
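
The positional indexes used above (`x[1]`, `x[2]`, `x[3]`) rely on the field order of a Spark NLP annotation. The sketch below is a minimal stand-in built with a `namedtuple`, assuming the order annotator type, begin, end, result, metadata, embeddings; `Annotation` and the sample values here are hypothetical and only illustrate that positional and named access pick out the same fields.

```python
from collections import namedtuple

# Hypothetical stand-in for an annotation row; the field order is an assumption
# based on the Spark NLP annotation schema.
Annotation = namedtuple(
    "Annotation",
    ["annotatorType", "begin", "end", "result", "metadata", "embeddings"],
)

chunk = Annotation("chunk", 0, 9, "PROCEDURE:", {"entity": "Header"}, [])

# Positional access, as in the cell above.
start, end, header = chunk[1], chunk[2], chunk[3]

# Named access reads more clearly and returns the same values.
named = (chunk.begin, chunk.end, chunk.result)
print(start, end, header)
```

If the rows returned by `toPandas()` expose these field names, `x.begin`, `x.end`, and `x.result` could replace the numeric indexes for readability.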

In [None]:
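def get_text(text, start, end):
    """Return the slice of `text` between two character offsets."""
    return text[start:end]

# Collect one frame per file and concatenate once (DataFrame.append was removed in pandas 2.0).
sections = []
for file, group in result_exploded.groupby("file"):
    group = group.copy()
    # A section starts right after its header and ends where the next header begins.
    group['section_start'] = group.end + 1
    # The last section runs to the end of the document instead of dropping its final character.
    group['section_end'] = group.start.shift(-1).fillna(len(group.text.iloc[0])).astype(int)
    group['section_text'] = group.apply(lambda x: get_text(x.text, x.section_start, x.section_end), axis=1)
    sections.append(group[['file', 'section_header', 'section_text']])
df = pd.concat(sections)
df.section_header = df.section_header.str.replace(":", "", regex=False)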
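
The slicing logic in the cell above can be tried without Spark. The sketch below is a minimal, standalone illustration that assumes the header offsets are already available as `(begin, end, text)` tuples with an inclusive `end`; `split_sections` and the sample note are hypothetical, not part of the notebook. In this sketch the last section runs to the end of the text.

```python
def split_sections(text, headers):
    """Map each header to the text between it and the next header.

    `headers` is a list of (begin, end, header_text) tuples sorted by `begin`,
    where `end` is the inclusive index of the header's last character.
    """
    sections = {}
    for i, (begin, end, header) in enumerate(headers):
        section_start = end + 1
        # The last section runs to the end of the document.
        section_end = headers[i + 1][0] if i + 1 < len(headers) else len(text)
        sections[header.replace(":", "")] = text[section_start:section_end]
    return sections


# Hypothetical two-section note; offsets are computed rather than predicted by a model.
note = "PROCEDURE: Right VATS pleurodesis.\nANESTHESIA: General."
headers = []
for header in ("PROCEDURE:", "ANESTHESIA:"):
    begin = note.index(header)
    headers.append((begin, begin + len(header) - 1, header))

sections = split_sections(note, headers)
print(sections)
```

Keeping the function free of pandas makes the offset arithmetic easy to check on a short string before running it over full transcriptions.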

In [None]:
OUTPUT_FILE_PATH = "../content/"
# Create the output folder

!mkdir -p $OUTPUT_FILE_PATH

In [None]:
# Export one JSON file per document, with section headers as keys
for file, group in df.groupby("file"):
    group.index = group.section_header
    group[['section_text']].T.to_json(OUTPUT_FILE_PATH + "HEADER_" + file.split(".")[0] + ".json")
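
Because the exported frame is a single `section_text` row with one column per header, pandas' default `to_json` orientation writes a nested object of the form `{header: {"section_text": text}}`. The sketch below shows one way such a file could be read back with the standard library and flattened; the payload, the file name `HEADER_example.json`, and the variable names are hypothetical.

```python
import json
import os
import tempfile

# Hypothetical payload in the same column-oriented shape the export cell writes.
payload = {
    "PROCEDURE": {"section_text": " Right VATS pleurodesis and pleural biopsy.\n"},
    "ANESTHESIA": {"section_text": " General double-lumen endotracheal.\n"},
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "HEADER_example.json")
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(payload, handle)

    with open(path, encoding="utf-8") as handle:
        loaded = json.load(handle)

# Flatten {header: {"section_text": text}} into {header: text}, trimming whitespace.
flat_sections = {header: cell["section_text"].strip() for header, cell in loaded.items()}
print(flat_sections)
```

A flat mapping like this is usually easier to pass to downstream code than the nested form.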

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
pd.read_json(OUTPUT_FILE_PATH + 'HEADER_1.json')

| | Description | PREOPERATIVE DIAGNOSIS | POSTOPERATIVE DIAGNOSIS | PROCEDURE | ANESTHESIA | DESCRIPTION OF FINDINGS | SPECIMEN | INDICATIONS |
|---|---|---|---|---|---|---|---|---|
| section_text | Right pleural effusion and suspected malignan... | Right pleural effusion and suspected maligna... | Right pleural effusion, suspected malignant m... | Right VATS pleurodesis and pleural biopsy.\n | General double-lumen endotracheal.\n | Right pleural effusion, firm nodules, diffus... | Pleural biopsies for pathology and microbiol... | Briefly, this is a 66-year-old gentleman who... |
