![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/streamlit_notebooks/healthcare/MEDICAL_TEXT_SUMMARIZATION.ipynb)

# **T5 Clinical Summarization / QA model**

# **Colab Setup**

In [None]:
import json, os
from google.colab import files

if 'spark_jsl.json' not in os.listdir():
    license_keys = files.upload()
    os.rename(list(license_keys.keys())[0], 'spark_jsl.json')

with open('spark_jsl.json') as f:
    license_keys = json.load(f)

# Defining license key-value pairs as local variables
locals().update(license_keys)
os.environ.update(license_keys)
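Before installing anything, it can help to confirm that the uploaded file contains the keys the install and session cells rely on. This is a minimal sketch, assuming `spark_jsl.json` is a flat JSON object. The helper name `missing_license_keys` is hypothetical, and the key list covers only the three names this notebook reads (`SECRET`, `PUBLIC_VERSION`, `JSL_VERSION`). It is not an official schema for the license file.

```python
import json

# Keys this notebook reads later; an assumption based on the cells below,
# not an official schema for spark_jsl.json.
REQUIRED_KEYS = ("SECRET", "PUBLIC_VERSION", "JSL_VERSION")


def missing_license_keys(keys, required=REQUIRED_KEYS):
    """Return the required key names that are absent or empty in `keys`."""
    return [name for name in required if not keys.get(name)]


# Hypothetical file contents with placeholder values (not real keys)
example = json.loads('{"SECRET": "xxx", "PUBLIC_VERSION": "4.2.8"}')
print(missing_license_keys(example))  # ['JSL_VERSION']
```

In the notebook, you could call `missing_license_keys(license_keys)` right after loading the file and raise an error if the result is non-empty. That stops the run before the `pip` commands expand an empty `$JSL_VERSION` or `$SECRET`.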

In [None]:
# Installing pyspark and spark-nlp
! pip install --upgrade -q pyspark==3.1.2 spark-nlp==$PUBLIC_VERSION

# Installing Spark NLP Healthcare
! pip install --upgrade -q spark-nlp-jsl==$JSL_VERSION --extra-index-url https://pypi.johnsnowlabs.com/$SECRET

# Installing Spark NLP Display Library for visualization
! pip install -q spark-nlp-display

In [3]:
import sparknlp
import sparknlp_jsl

from sparknlp.base import *
from sparknlp.annotator import *
from sparknlp_jsl.annotator import *

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.ml import Pipeline, PipelineModel
from pyspark.sql.types import StringType, IntegerType

import pandas as pd
pd.set_option('display.max_colwidth', 200)

import warnings
warnings.filterwarnings('ignore')

params = {"spark.driver.memory": "16G",
          "spark.kryoserializer.buffer.max": "2000M",
          "spark.driver.maxResultSize": "2000M"}

spark = sparknlp_jsl.start(license_keys['SECRET'], params=params)

print("Spark NLP Version :", sparknlp.version())
print("Spark NLP_JSL Version :", sparknlp_jsl.version())

spark

Spark NLP Version : 4.2.8
Spark NLP_JSL Version : 4.2.8
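The three `params` values above are size strings, and a typo there usually only shows up as a failure much later. This is a minimal sanity-check sketch that assumes binary units (k/m/g/t read as KiB/MiB/GiB/TiB, as Spark and the JVM do). The 2048m ceiling for `spark.kryoserializer.buffer.max` comes from the Spark configuration docs. The "result size below driver memory" rule is only a heuristic. The helpers `size_to_bytes` and `check_params` are hypothetical and are not part of Spark or Spark NLP.

```python
# Binary multipliers: Spark (like the JVM) reads k/m/g/t as KiB/MiB/GiB/TiB
_UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3, "t": 1024 ** 4}


def size_to_bytes(size):
    """Convert a size string with a unit suffix, e.g. '16G' or '2000M', to bytes."""
    s = size.strip().lower()
    if s.endswith("b"):  # also accept forms such as '2000mb'
        s = s[:-1]
    if not s or s[-1] not in _UNITS:
        # Bare numbers mean different units for different Spark settings,
        # so this sketch refuses to guess.
        raise ValueError(f"expected a unit suffix (k, m, g, t): {size!r}")
    return int(s[:-1]) * _UNITS[s[-1]]


def check_params(params):
    """Return warnings for the three settings used in this notebook (hypothetical helper)."""
    found = []
    driver = size_to_bytes(params["spark.driver.memory"])
    max_result = size_to_bytes(params["spark.driver.maxResultSize"])
    kryo_max = size_to_bytes(params["spark.kryoserializer.buffer.max"])
    if kryo_max >= 2048 * _UNITS["m"]:
        found.append("spark.kryoserializer.buffer.max must be below 2048m")
    if max_result >= driver:  # heuristic, not a rule enforced by Spark
        found.append("spark.driver.maxResultSize should be smaller than spark.driver.memory")
    return found


# Same values as the `params` dict in the session cell above
notebook_params = {"spark.driver.memory": "16G",
                   "spark.kryoserializer.buffer.max": "2000M",
                   "spark.driver.maxResultSize": "2000M"}
print(check_params(notebook_params))  # []
```

Raising on bare numbers is a deliberate choice. Spark interprets an unsuffixed value as bytes for some settings and as MiB for others, so guessing would hide mistakes.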


# **`t5_base_pubmedqa`**

📌 This model is trained specifically on medical data for text summarization and question answering.

⛓️ https://nlp.johnsnowlabs.com/2022/10/25/t5_base_pubmedqa_en.html
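T5 is a text-to-text model. A single checkpoint can serve several tasks, and a plain-text prefix in front of the input selects the task. This is why the pipeline below calls `setTask("summarize medical questions:")`. The sketch below shows the string the model conceptually receives. It assumes the prefix and the document are joined with a single space; this notebook does not show the exact joining that Spark NLP performs internally. The helper `build_t5_input` is hypothetical.

```python
# Same prefix string that is passed to setTask() in the pipeline below
TASK_PREFIX = "summarize medical questions:"


def build_t5_input(text, task=TASK_PREFIX):
    """Prepend the T5 task prefix to a document (single-space join assumed)."""
    return f"{task.strip()} {text.strip()}"


print(build_t5_input("My organs never matured."))
# summarize medical questions: My organs never matured.
```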

In [4]:
text_list = [
    """A 44-year-old man underwent cryolipolysis for unwanted fat in the pectoral region. At 4 month follow-up, the patient had well-demarcated tissue growth in the treatment areas. He elected to undergo additional cryolipolysis treatment to the areas. Two months later, he was found to have further tissue growth in the treatment areas. The patient then underwent corrective treatment with liposuction. A 52-year-old man underwent cryolipolysis for unwanted lower abdominal fat. At one year follow-up, he had a well-demarcated, subcutaneous mass on the lower abdomen corresponding to the treatment site. The patient elected to undergo corrective treatment with liposuction. Adipose tissue samples from the treated and non-treated areas, for control, were collected, processed, and stained to evaluate cellularity and tissue structure. In our practice, the incidence of PAH is 0.47% or 2 in 422 cryolipolysis treatments. This is 100 times greater than the device manufacturer's reported incidence. Histopathologic examination of the subcutaneous tissue mass showed an increased number of adipocytes, fibrosis, and scar tissue in the treated areas when compared to controls. No lipoblasts, a marker of malignant neoplastic proliferation, were identified on the histopathologic examination of the affected tissues.""",
    """A 48-year-old right handed gardener presented with a white discoloration and numbness of her left ring finger. She reported cutting her roses without protection gloves so repetitive scratchy lesions especially of her left hand occurred. On examination the pulse of the left radial artery was absent. Allen's test showed a dominant ulnar supply of the palmar arch. Duplex ultrasound demonstrated an occluded aneurysm of the distal portion of the left radial artery. Furthermore there were occlusions of the first and fourth digital artery on MR angiography probably due to distal emboli of the radial aneurysm. After exclusion of systemic disease or vasculitis, an repetitive trauma due to rose thorns was supposed to be the cause of the radial aneurysm. Anticoagulation therapy was initiated and infusion of prostaglanden E1 was performed over 7 days. The digital ischemia resolved within a few days. Therefore a surgical procedure was not recommended.""",
    """Normal physical traits but no period MESSAGE: I'm a 40 yr. old woman that has infantile reproductive organs and have never experienced a mensus. I have had Doctors look but they all say I just have infantile female reproductive organs. When I try to look for answers on the internet I cannot find anything. ALL my \"girly\" parts are normal. My organs never matured. Could you give me more information please.""",
    """Giant cell arteritis (GCA) of the breast is one of the less recognized variants of this vasculitis and may represent an isolated finding or a manifestation of a more widespread disease. We present the case of a 74-year-old woman with malaise and a 14-day persistent fever, reaching 38 degrees C. There was a bilateral, painless and mobile axillary lymphadenopathy and a slight tenderness over the medial and lateral upper quadrants of her left breast, as well as an independent palpable tender mass in the upper outer quadrant of the same breast measuring 2 cm in its greatest diameter. Constitutional symptoms, anemia and an elevated erythrocyte sedimentation rate suggestive of polymyalgia rheumatica were also present. An invasive ductal carcinoma of the breast with coincidental pathologic findings of GCA in the same biopsy specimen was revealed. In this case, arteritis was limited to the breast and presented with diffuse breast tenderness. No other artery was involved by GCA. All arteritis-related symptoms disappeared after the removal of the tumor.""",
    """Lichen aureus is localized variant of persistent pigmented purpuric dermatitis that typically affects the legs and can be associated with delayed hypersensitivity reactions or vascular abnormalities. Plasma cell vulvitis (Zoon's vulvitis) is a rare condition that frequently contains hemosiderin deposits and is suspected to be a mucosal reaction pattern due to variety of insults, most often local irritation or trauma. A 50-year-old female with longstanding complaints of spotting, vulvar dryness, irritation, and dyspareunia presented with circumscribed, purpuric, erythematous vulvar patches. Past estrogen cream treatment evoked symptoms of discomfort. On biopsy, siderophages and extravasated red blood cells were found in conjunction with a lichenoid, lymphocyte and plasma cell infiltrate, and dilated dermal and intraepithelial vessels."""
]

data = spark.createDataFrame(text_list, StringType()).toDF('text')
data.show(truncate=50)

+--------------------------------------------------+
|                                              text|
+--------------------------------------------------+
|A 44-year-old man underwent cryolipolysis for u...|
|A 48-year-old right handed gardener presented w...|
|Normal physical traits but no period MESSAGE: I...|
|Giant cell arteritis (GCA) of the breast is one...|
|Lichen aureus is localized variant of persisten...|
+--------------------------------------------------+
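The `truncate=50` argument is why every row above ends in `...`. Below is a minimal pure-Python sketch of that rule, mirroring Spark's `Dataset.showString` logic as we understand it. A cell longer than `n` characters is cut to its first `n - 3` characters followed by an ellipsis. For `n < 4`, no ellipsis is added. The function name `spark_truncate` is hypothetical.

```python
def spark_truncate(value, truncate=50):
    """Mimic how DataFrame.show(truncate=n) shortens a long cell."""
    if truncate <= 0 or len(value) <= truncate:
        return value  # short cells (or truncate disabled) are shown as-is
    if truncate < 4:
        return value[:truncate]  # too narrow for an ellipsis
    return value[: truncate - 3] + "..."


first = "A 44-year-old man underwent cryolipolysis for unwanted fat in the pectoral region."
print(spark_truncate(first))  # A 44-year-old man underwent cryolipolysis for u...
```

The printed string matches the first row of the table above.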



In [5]:
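# Wrap the raw text column into DOCUMENT annotations
document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("documents")

# pretrained() is a static method, so it is called on the class itself.
# setTask() sets the prompt prefix; setMaxOutputLength() caps the generated text.
t5 = T5Transformer.pretrained("t5_base_pubmedqa", "en", "clinical/models")\
    .setInputCols(["documents"])\
    .setOutputCol("t5_output")\
    .setTask("summarize medical questions:")\
    .setMaxOutputLength(200)

pipeline = Pipeline(stages=[document_assembler, t5])

results = pipeline.fit(data).transform(data)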

t5_base_pubmedqa download started this may take some time.
Approximate size to download 874.2 MB
[OK!]


In [6]:
results.select("text", "t5_output.result").show(truncate=100)

+----------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------+
|                                                                                                text|                                                                                         result|
+----------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------+
|A 44-year-old man underwent cryolipolysis for unwanted fat in the pectoral region. At 4 month fol...|                                         [Is there a risk of liposuction in the treated areas?]|
|A 48-year-old right handed gardener presented with a white discoloration and numbness of her left...|                           [Do repetitive trauma to the palmar arch cause the radial aneurysm?]|
|Norm