![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

# **MedicalSummarizer**

This notebook will cover the different parameters and usages of `MedicalSummarizer`.

**📖 Learning Objectives:**

1. Background: Understand the `MedicalSummarizer` Annotator.

2. Colab setup.

3. Become comfortable with using the different parameters of the annotator.

**🔗 Helpful Links:**

- Python Docs : [MedicalSummarizer](https://nlp.johnsnowlabs.com/licensed/api/python/reference/autosummary/sparknlp_jsl/annotator/seq2seq/medical_summarizer/index.html#sparknlp_jsl.annotator.seq2seq.medical_summarizer.MedicalSummarizer)

- Scala Docs: [MedicalSummarizer](https://nlp.johnsnowlabs.com/licensed/api/com/johnsnowlabs/nlp/annotators/seq2seq/MedicalSummarizer.html)

- For extended examples of usage, see [Spark NLP Workshop repository](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/healthcare-nlp/24.0.Medical_Text_Summarization.ipynb).


## **📜 Background**

🔎 **Text Summarization** is a natural language processing (NLP) task that involves condensing a lengthy text document into a shorter, more compact version while still retaining the most important information and meaning. The goal is to produce a summary that accurately represents the content of the original text in a concise form.

🔎 There are different approaches to text summarization, including `extractive methods` that identify and extract important sentences or phrases from the text, and `abstractive methods` that generate new text based on the content of the original text.
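To make the extractive idea concrete, here is a minimal sketch in plain Python. It is not how `MedicalSummarizer` works (that annotator lives in the library's `seq2seq` module and generates new text). The function name `extractive_summary` and the word-frequency scoring are assumptions chosen only for illustration.

```python
import re
from collections import Counter


def extractive_summary(text, num_sentences=1):
    """Return the highest-scoring sentences, kept in their original order."""
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Word frequencies over the whole document (lowercased, letters and digits only).
    freq = Counter(re.findall(r"[a-z0-9]+", text.lower()))

    def score(sentence):
        words = re.findall(r"[a-z0-9]+", sentence.lower())
        # Average frequency, so long sentences are not favoured automatically.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Rank sentence indices by score, keep the best ones, restore document order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:num_sentences])
    return " ".join(sentences[i] for i in keep)


note = "The patient has right upper quadrant pain. The pain started yesterday. He denies chills."
summary = extractive_summary(note, num_sentences=1)
print(summary)
```

An extractive summary can only reuse sentences that already exist in the input, whereas an abstractive model may rephrase or merge them.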

## **🎬 Colab Setup**

In [None]:
import os, json

from google.colab import files

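# Upload the license file once; later runs reuse the local copy.
if 'spark_jsl.json' not in os.listdir():
    license_keys = files.upload()
    os.rename(list(license_keys.keys())[0], 'spark_jsl.json')

with open('spark_jsl.json') as f:
    license_keys = json.load(f)

# Expose the license keys as notebook variables and as environment variables.
locals().update(license_keys)
os.environ.update(license_keys)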


In [None]:
!pip install -q johnsnowlabs

In [None]:
from johnsnowlabs import nlp, medical
nlp.install()

👌 Detected license file /content/spark_jsl.json
📋 Stored John Snow Labs License in /root/.johnsnowlabs/licenses/license_number_0_for_Spark-Healthcare_Spark-OCR.json
👷 Setting up  John Snow Labs home in /root/.johnsnowlabs, this might take a few minutes.
Downloading 🐍+🚀 Python Library spark_nlp-5.2.2-py2.py3-none-any.whl
Downloading 🐍+💊 Python Library spark_nlp_jsl-5.2.1-py3-none-any.whl
Downloading 🫘+🚀 Java Library spark-nlp-assembly-5.2.2.jar
Downloading 🫘+💊 Java Library spark-nlp-jsl-5.2.1.jar
🙆 JSL Home setup in /root/.johnsnowlabs
👌 Detected license file /content/spark_jsl.json
Installing /root/.johnsnowlabs/py_installs/spark_nlp_jsl-5.2.1-py3-none-any.whl to /usr/bin/python3
Installed 1 products:
💊 Spark-Healthcare==5.2.1 installed! ✅ Heal the planet with NLP! 


In [None]:
from johnsnowlabs import nlp, medical
spark = nlp.start()


👌 Detected license file /content/spark_jsl.json
👌 Launched cpu optimized session with with: 🚀Spark-NLP==5.2.2, 💊Spark-Healthcare==5.2.1, running on ⚡ PySpark==3.4.0


In [None]:
spark

In [None]:
import textwrap

def f(text):
  # Wrap long text at 120 columns and prefix a newline so printed output stays readable.
  text = textwrap.fill(text, width=120)
  return '\n' + text

## **🖨️ Input/Output Annotation Types**



- Input:  ``DOCUMENT``  
- Output:  ``CHUNK``  

## **🔎 Parameters**

`maxNewTokens`: Maximum number of new tokens to be generated (Default: 30)

`maxTextLength`: Maximum length of context text.

`doSample`: Whether or not to use sampling; greedy decoding is used otherwise (Default: False)

`noRepeatNgramSize`: If set to an integer greater than 0, every n-gram of that size can occur only once in the generated text (Default: 0)

`refineSummary`: Set to True to perform refined summarization at increased computation cost.

`refineSummaryTargetLength`: Target length for refined summary.

`refineChunkSize`: Size of the refined chunks. It should equal the LLM context window size in tokens. Takes effect only when `refineSummary=True`.

`refineMaxAttempts`: How many times chunks are re-summarized while they remain above `refineSummaryTargetLength` before stopping.

`topK`: The number of highest probability vocabulary tokens to keep for top-k-filtering (Default: 50)

`stopAtEos`: Stop text generation when the end-of-sentence token is encountered (Default: False)

`caseSensitive`: Whether to ignore case in tokens for embeddings matching (Default: True)





# ✍  Explaining MedicalSummarizer with an Example

## **💻Pipeline**

In [None]:
document_assembler = nlp.DocumentAssembler()\
    .setInputCol('text')\
    .setOutputCol('document')

sentenceDetector = nlp.SentenceDetectorDLModel\
    .pretrained("sentence_detector_dl_healthcare","en","clinical/models")\
    .setInputCols(["document"])\
    .setOutputCol("sentence")\
    .setCustomBounds(["\n"])\
    .setUseCustomBoundsOnly(True)

summarizer = medical.Summarizer\
    .pretrained("summarizer_clinical_jsl")\
    .setInputCols(['sentence'])\
    .setOutputCol('summary')\
    .setMaxTextLength(512)\
    .setMaxNewTokens(512)

pipeline = nlp.Pipeline(
    stages=[
        document_assembler,
        sentenceDetector,
        summarizer
])

model = pipeline.fit(spark.createDataFrame([[""]]).toDF("text"))

sentence_detector_dl_healthcare download started this may take some time.
Approximate size to download 367.3 KB
[OK!]
summarizer_clinical_jsl download started this may take some time.
[OK!]


In [None]:
summarizer.extractParamMap()

{Param(parent='MedicalSummarizer_3c37cacbb43e', name='batchSize', doc='Size of every batch'): 4,
 Param(parent='MedicalSummarizer_3c37cacbb43e', name='caseSensitive', doc='whether to ignore case in tokens for embeddings matching'): True,
 Param(parent='MedicalSummarizer_3c37cacbb43e', name='doSample', doc='Whether or not to use sampling; use greedy decoding otherwise'): False,
 Param(parent='MedicalSummarizer_3c37cacbb43e', name='ignoreTokenIds', doc="A list of token ids which are ignored in the decoder's output"): [],
 Param(parent='MedicalSummarizer_3c37cacbb43e', name='lazyAnnotator', doc='Whether this AnnotatorModel acts as lazy in RecursivePipelines'): False,
 Param(parent='MedicalSummarizer_3c37cacbb43e', name='maxNewTokens', doc='Maximum number of new tokens to be generated'): 512,
 Param(parent='MedicalSummarizer_3c37cacbb43e', name='maxTextLength', doc='Max text length to process'): 512,
 Param(parent='MedicalSummarizer_3c37cacbb43e', name='mlFrameworkType', doc='ML framework 

In [None]:
text = """PRESENT ILLNESS: The patient is a 28-year-old, who is status post gastric bypass surgery nearly one year ago. He has lost about 200 pounds and was otherwise doing well until yesterday evening around 7:00-8:00 when he developed nausea and right upper quadrant pain, which apparently wrapped around toward his right side and back. He feels like he was on it but has not done so. He has overall malaise and a low-grade temperature of 100.3. He denies any prior similar or lesser symptoms. His last normal bowel movement was yesterday. He denies any outright chills or blood per rectum.

PHYSICAL EXAMINATION: His temperature is 100.3, blood pressure 129/59, respirations 16, heart rate 84. He is drowsy, but easily arousable and appropriate with conversation. He is oriented to person, place, and situation. He is normocephalic, atraumatic. His sclerae are anicteric. His mucous membranes are somewhat tacky. His neck is supple and symmetric. His respirations are unlabored and clear. He has a regular rate and rhythm. His abdomen is soft. He has diffuse right upper quadrant tenderness, worse focally, but no rebound or guarding. He otherwise has no organomegaly, masses, or abdominal hernias evident. His extremities are symmetrical with no edema. His posterior tibial pulses are palpable and symmetric. He is grossly nonfocal neurologically.

PLAN: He will be admitted and placed on IV antibiotics. We will get an ultrasound this morning. He will need his gallbladder out, probably with intraoperative cholangiogram. Hopefully, the stone will pass this way. Due to his anatomy, an ERCP would prove quite difficult if not impossible unless laparoscopic assisted. Dr. X will see him later this morning and discuss the plan further. The patient understands."""

light_model = nlp.LightPipeline(model)
light_result = light_model.annotate(text)

for i in range(len(light_result['sentence'])):
    print("■ SENTENCE ■: ", f(light_result['sentence'][i]),"\n")
    print("■ SUMMARY ■: ", f(light_result['summary'][i]),"\n\n")
    print("-"*120)


CPU times: user 739 ms, sys: 105 ms, total: 844 ms
Wall time: 1min 43s


## ▶ `setMaxNewTokens`:

Maximum number of new tokens to be generated (Default: 30).

In [None]:
for setMaxNewTokens in [512,30]:
  summarizer = medical.Summarizer\
      .pretrained("summarizer_clinical_jsl")\
      .setInputCols(['sentence'])\
      .setOutputCol('summary')\
      .setMaxNewTokens(setMaxNewTokens)  # default:30

  pipeline = nlp.Pipeline(stages=[document_assembler, sentenceDetector, summarizer])
  model = pipeline.fit(spark.createDataFrame([[""]]).toDF("text"))

  text = """PRESENT ILLNESS: The patient is a 28-year-old, who is status post gastric bypass surgery nearly one year ago. He has lost about 200 pounds and was otherwise doing well until yesterday evening around 7:00-8:00 when he developed nausea and right upper quadrant pain, which apparently wrapped around toward his right side and back. He feels like he was on it but has not done so. He has overall malaise and a low-grade temperature of 100.3. He denies any prior similar or lesser symptoms. His last normal bowel movement was yesterday. He denies any outright chills or blood per rectum.
  PHYSICAL EXAMINATION: His temperature is 100.3, blood pressure 129/59, respirations 16, heart rate 84. He is drowsy, but easily arousable and appropriate with conversation. He is oriented to person, place, and situation. He is normocephalic, atraumatic. His sclerae are anicteric. His mucous membranes are somewhat tacky. His neck is supple and symmetric. His respirations are unlabored and clear. He has a regular rate and rhythm. His abdomen is soft. He has diffuse right upper quadrant tenderness, worse focally, but no rebound or guarding. He otherwise has no organomegaly, masses, or abdominal hernias evident. His extremities are symmetrical with no edema. His posterior tibial pulses are palpable and symmetric. He is grossly nonfocal neurologically.
  PLAN: He will be admitted and placed on IV antibiotics. We will get an ultrasound this morning. He will need his gallbladder out, probably with intraoperative cholangiogram. Hopefully, the stone will pass this way. Due to his anatomy, an ERCP would prove quite difficult if not impossible unless laparoscopic assisted. Dr. X will see him later this morning and discuss the plan further. The patient understands."""

  light_model = nlp.LightPipeline(model)
  light_result = light_model.annotate(text)

  for i in range(len(light_result['sentence'])):
      print("■ SENTENCE ■: ", f(light_result['sentence'][i]),"\n")
  print("■"*120)

  for i in range(len(light_result['sentence'])):
      print("■ SUMMARY ■: ", f(light_result['summary'][i]),"\n\n")

  print("■"*120)

summarizer_clinical_jsl download started this may take some time.
[OK!]
■ SENTENCE ■:  
PRESENT ILLNESS: The patient is a 28-year-old, who is status post gastric bypass surgery nearly one year ago. He has
lost about 200 pounds and was otherwise doing well until yesterday evening around 7:00-8:00 when he developed nausea and
right upper quadrant pain, which apparently wrapped around toward his right side and back. He feels like he was on it
but has not done so. He has overall malaise and a low-grade temperature of 100.3. He denies any prior similar or lesser
symptoms. His last normal bowel movement was yesterday. He denies any outright chills or blood per rectum. 

■ SENTENCE ■:  
PHYSICAL EXAMINATION: His temperature is 100.3, blood pressure 129/59, respirations 16, heart rate 84. He is drowsy, but
easily arousable and appropriate with conversation. He is oriented to person, place, and situation. He is normocephalic,
atraumatic. His sclerae are anicteric. His mucous membranes are somew

## ▶ `setMaxTextLength`:
Maximum length of context text to be processed; default: 512.
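As a rough mental model of what a small `setMaxTextLength` value does, the sketch below cuts the input down to a fixed number of tokens before it would reach the model. This is an assumption for illustration: the helper `truncate_context` is hypothetical, it counts whitespace-delimited words, and the annotator itself may count subword tokens and truncate differently.

```python
def truncate_context(text, max_text_length):
    """Keep at most `max_text_length` whitespace-delimited tokens of the input."""
    tokens = text.split()
    return " ".join(tokens[:max_text_length])


sentence = "He has diffuse right upper quadrant tenderness, worse focally, but no rebound or guarding."
short_context = truncate_context(sentence, 5)
print(short_context)  # He has diffuse right upper
```

With a very small limit, most of the section never reaches the model, so the summary can only reflect the first few words.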


In [None]:
for setMaxTextLength in [1024, 10]:
  summarizer = medical.Summarizer\
      .pretrained("summarizer_clinical_jsl")\
      .setInputCols(['sentence'])\
      .setOutputCol('summary')\
      .setMaxNewTokens(512)\
      .setMaxTextLength(setMaxTextLength)  # default:1024

  pipeline = nlp.Pipeline(stages=[document_assembler, sentenceDetector, summarizer])

  model = pipeline.fit(spark.createDataFrame([[""]]).toDF("text"))

  text = """PRESENT ILLNESS: The patient is a 28-year-old, who is status post gastric bypass surgery nearly one year ago. He has lost about 200 pounds and was otherwise doing well until yesterday evening around 7:00-8:00 when he developed nausea and right upper quadrant pain, which apparently wrapped around toward his right side and back. He feels like he was on it but has not done so. He has overall malaise and a low-grade temperature of 100.3. He denies any prior similar or lesser symptoms. His last normal bowel movement was yesterday. He denies any outright chills or blood per rectum.
  PHYSICAL EXAMINATION: His temperature is 100.3, blood pressure 129/59, respirations 16, heart rate 84. He is drowsy, but easily arousable and appropriate with conversation. He is oriented to person, place, and situation. He is normocephalic, atraumatic. His sclerae are anicteric. His mucous membranes are somewhat tacky. His neck is supple and symmetric. His respirations are unlabored and clear. He has a regular rate and rhythm. His abdomen is soft. He has diffuse right upper quadrant tenderness, worse focally, but no rebound or guarding. He otherwise has no organomegaly, masses, or abdominal hernias evident. His extremities are symmetrical with no edema. His posterior tibial pulses are palpable and symmetric. He is grossly nonfocal neurologically.
  PLAN: He will be admitted and placed on IV antibiotics. We will get an ultrasound this morning. He will need his gallbladder out, probably with intraoperative cholangiogram. Hopefully, the stone will pass this way. Due to his anatomy, an ERCP would prove quite difficult if not impossible unless laparoscopic assisted. Dr. X will see him later this morning and discuss the plan further. The patient understands."""

  light_model = nlp.LightPipeline(model)
  light_result = light_model.annotate(text)

  for i in range(len(light_result['sentence'])):
      print(i+1, "   ■ SENTENCE ■: ", f(light_result['sentence'][i]),"\n")
      print(i+1, "   ■ SUMMARY ■: ", f(light_result['summary'][i]),"\n\n")
  print("■"*120)

summarizer_clinical_jsl download started this may take some time.
[OK!]
1    ■ SENTENCE ■:  
PRESENT ILLNESS: The patient is a 28-year-old, who is status post gastric bypass surgery nearly one year ago. He has
lost about 200 pounds and was otherwise doing well until yesterday evening around 7:00-8:00 when he developed nausea and
right upper quadrant pain, which apparently wrapped around toward his right side and back. He feels like he was on it
but has not done so. He has overall malaise and a low-grade temperature of 100.3. He denies any prior similar or lesser
symptoms. His last normal bowel movement was yesterday. He denies any outright chills or blood per rectum. 

1    ■ SUMMARY ■:  
A 28-year-old patient who had gastric bypass surgery nearly one year ago developed nausea and right upper quadrant pain,
which wrapped around his right side and back. He has malaise and a low-grade temperature of 100.3. He denies any
previous symptoms and has no other symptoms. 


2    ■ SENTENCE ■:  

## ▶ `setDoSample`:

Whether or not to use sampling; when disabled, greedy decoding is used. Default: False.
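The difference between the two decoding modes can be sketched on a toy next-token distribution. The helper `pick_next_token` and the probabilities below are made up for illustration and do not come from the model.

```python
import random


def pick_next_token(probs, do_sample=False, rng=None):
    """Choose the next token from a {token: probability} mapping."""
    if not do_sample:
        # Greedy decoding: always take the most probable token.
        return max(probs, key=probs.get)
    rng = rng or random.Random()
    tokens = list(probs)
    # Sampling: draw one token in proportion to its probability.
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


toy_probs = {"pain": 0.6, "nausea": 0.3, "fever": 0.1}
greedy = pick_next_token(toy_probs)
sampled = [pick_next_token(toy_probs, do_sample=True, rng=random.Random(seed)) for seed in range(200)]
print(greedy)                # pain
print(sorted(set(sampled)))  # more than one distinct token
```

Greedy decoding is deterministic, so repeated runs give the same summary. Sampling can produce a different summary on each run.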

In [None]:
summarizer = medical.Summarizer\
    .pretrained("summarizer_clinical_jsl")\
    .setInputCols(['sentence'])\
    .setOutputCol('summary')\
    .setMaxNewTokens(50)\
    .setMaxTextLength(512)\
    .setStopAtEos(True)

for setDoSample in [False, True]:
  summarizer = summarizer.setDoSample(setDoSample)  #default:False

  pipeline = nlp.Pipeline(stages=[document_assembler, sentenceDetector, summarizer])

  model = pipeline.fit(spark.createDataFrame([[""]]).toDF("text"))

  text = """PRESENT ILLNESS: The patient is a 28-year-old, who is status post gastric bypass surgery nearly one year ago. He has lost about 200 pounds and was otherwise doing well until yesterday evening around 7:00-8:00 when he developed nausea and right upper quadrant pain, which apparently wrapped around toward his right side and back. He feels like he was on it but has not done so. He has overall malaise and a low-grade temperature of 100.3. He denies any prior similar or lesser symptoms. His last normal bowel movement was yesterday. He denies any outright chills or blood per rectum.

  PHYSICAL EXAMINATION: His temperature is 100.3, blood pressure 129/59, respirations 16, heart rate 84. He is drowsy, but easily arousable and appropriate with conversation. He is oriented to person, place, and situation. He is normocephalic, atraumatic. His sclerae are anicteric. His mucous membranes are somewhat tacky. His neck is supple and symmetric. His respirations are unlabored and clear. He has a regular rate and rhythm. His abdomen is soft. He has diffuse right upper quadrant tenderness, worse focally, but no rebound or guarding. He otherwise has no organomegaly, masses, or abdominal hernias evident. His extremities are symmetrical with no edema. His posterior tibial pulses are palpable and symmetric. He is grossly nonfocal neurologically.

  PLAN: He will be admitted and placed on IV antibiotics. We will get an ultrasound this morning. He will need his gallbladder out, probably with intraoperative cholangiogram. Hopefully, the stone will pass this way. Due to his anatomy, an ERCP would prove quite difficult if not impossible unless laparoscopic assisted. Dr. X will see him later this morning and discuss the plan further. The patient understands."""

  light_model = nlp.LightPipeline(model)
  light_result = light_model.annotate(text)

  print("■"*40, 'setDoSample = ', setDoSample, "■"*60)
  for i in range(len(light_result['sentence'])):
      print("■ SENTENCE ■: ", f(light_result['sentence'][i]),"\n")
      print("■ SUMMARY ■: ", f(light_result['summary'][i]),"\n\n")



summarizer_clinical_jsl download started this may take some time.
[OK!]
■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ setDoSample =  False ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
■ SENTENCE ■:  
PRESENT ILLNESS: The patient is a 28-year-old, who is status post gastric bypass surgery nearly one year ago. He has
lost about 200 pounds and was otherwise doing well until yesterday evening around 7:00-8:00 when he developed nausea and
right upper quadrant pain, which apparently wrapped around toward his right side and back. He feels like he was on it
but has not done so. He has overall malaise and a low-grade temperature of 100.3. He denies any prior similar or lesser
symptoms. His last normal bowel movement was yesterday. He denies any outright chills or blood per rectum. 

■ SUMMARY ■:  
A 28-year-old patient who had gastric bypass surgery nearly one year ago developed nausea and right upper quadrant pain,
which wrapped around his right side and back. He has malaise and a l

## ▶ `noRepeatNgramSize`:

If set to an integer greater than 0, every n-gram of that size can occur only once in the generated text; default: 0.
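One common way to enforce such a constraint during decoding is to ban, at each step, every token that would complete an n-gram already present in the output. The sketch below shows that bookkeeping. The function `banned_next_tokens` is a hypothetical helper, and the annotator's internal implementation is not shown in this notebook.

```python
def banned_next_tokens(generated, ngram_size):
    """Return tokens that would complete an n-gram already present in `generated`."""
    if ngram_size <= 0 or len(generated) < ngram_size - 1:
        return set()
    # The last (n-1) tokens form the prefix of the n-gram about to be completed.
    prefix = tuple(generated[len(generated) - (ngram_size - 1):])
    banned = set()
    for i in range(len(generated) - ngram_size + 1):
        ngram = tuple(generated[i:i + ngram_size])
        if ngram[:-1] == prefix:
            banned.add(ngram[-1])
    return banned


generated = ["the", "patient", "has", "pain", "and", "the"]
print(banned_next_tokens(generated, 2))  # {'patient'}
print(banned_next_tokens(generated, 0))  # set()
```

With size 2, the bigram "the patient" has already been produced, so "patient" may not follow "the" again. With size 0, nothing is banned.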

In [None]:
for NoRepeatNgramSize in [0,2]:
  summarizer = medical.Summarizer\
      .pretrained("summarizer_clinical_jsl")\
      .setInputCols(['sentence'])\
      .setOutputCol('summary')\
      .setMaxNewTokens(512)\
      .setMaxTextLength(512)\
      .setNoRepeatNgramSize(NoRepeatNgramSize)   # default:0

  pipeline = nlp.Pipeline(stages=[document_assembler, sentenceDetector, summarizer])
  model = pipeline.fit(spark.createDataFrame([[""]]).toDF("text"))

  text = """PRESENT ILLNESS: The patient is a 28-year-old, who is status post gastric bypass surgery nearly one year ago. He has lost about 200 pounds and was otherwise doing well until yesterday evening around 7:00-8:00 when he developed nausea and right upper quadrant pain, which apparently wrapped around toward his right side and back. He feels like he was on it but has not done so. He has overall malaise and a low-grade temperature of 100.3. He denies any prior similar or lesser symptoms. His last normal bowel movement was yesterday. He denies any outright chills or blood per rectum.
  PHYSICAL EXAMINATION: His temperature is 100.3, blood pressure 129/59, respirations 16, heart rate 84. He is drowsy, but easily arousable and appropriate with conversation. He is oriented to person, place, and situation. He is normocephalic, atraumatic. His sclerae are anicteric. His mucous membranes are somewhat tacky. His neck is supple and symmetric. His respirations are unlabored and clear. He has a regular rate and rhythm. His abdomen is soft. He has diffuse right upper quadrant tenderness, worse focally, but no rebound or guarding. He otherwise has no organomegaly, masses, or abdominal hernias evident. His extremities are symmetrical with no edema. His posterior tibial pulses are palpable and symmetric. He is grossly nonfocal neurologically.
  PLAN: He will be admitted and placed on IV antibiotics. We will get an ultrasound this morning. He will need his gallbladder out, probably with intraoperative cholangiogram. Hopefully, the stone will pass this way. Due to his anatomy, an ERCP would prove quite difficult if not impossible unless laparoscopic assisted. Dr. X will see him later this morning and discuss the plan further. The patient understands."""

  light_model = nlp.LightPipeline(model)
  light_result = light_model.annotate(text)

  for i in range(len(light_result['sentence'])):
      print(i+1, "   ■ SENTENCE ■: ", f(light_result['sentence'][i]),"\n")
      print(i+1, "   ■ SUMMARY ■: ", f(light_result['summary'][i]),"\n\n")
  print("■"*120)

summarizer_clinical_jsl download started this may take some time.
[OK!]
1    ■ SENTENCE ■:  
PRESENT ILLNESS: The patient is a 28-year-old, who is status post gastric bypass surgery nearly one year ago. He has
lost about 200 pounds and was otherwise doing well until yesterday evening around 7:00-8:00 when he developed nausea and
right upper quadrant pain, which apparently wrapped around toward his right side and back. He feels like he was on it
but has not done so. He has overall malaise and a low-grade temperature of 100.3. He denies any prior similar or lesser
symptoms. His last normal bowel movement was yesterday. He denies any outright chills or blood per rectum. 

1    ■ SUMMARY ■:  
A 28-year-old patient who had gastric bypass surgery nearly one year ago developed nausea and right upper quadrant pain,
which wrapped around his right side and back. He has malaise and a low-grade temperature of 100.3. He denies any
previous symptoms and has no other symptoms. 


2    ■ SENTENCE ■:  

## ▶ `refineSummary`:

This group of parameters is active only if `refineSummary` is set to True.
```
setRefineSummary: Set to True for refined summarization with increased computational cost.
setRefineSummaryTargetLength: Define the target length of summarizations in tokens (delimited by whitespace). Effective only when setRefineSummary=True.
setRefineChunkSize: Specify the desired size of refined chunks. Should correspond to the LLM context window size in tokens. Effective only when setRefineSummary=True.
setRefineMaxAttempts: Determine the number of attempts for re-summarizing chunks exceeding the setRefineSummaryTargetLength before discontinuing. Effective only when setRefineSummary=True.
```
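How these parameters interact can be pictured as a loop: split the text into chunks, summarize each chunk, join the results, and repeat while the result is still longer than the target and attempts remain. The sketch below is only a reading of the parameter descriptions above, not the library's actual algorithm. `refine_summary`, `split_into_chunks` and the stand-in `keep_first_half` are hypothetical names.

```python
def split_into_chunks(tokens, chunk_size):
    """Split a token list into consecutive chunks of at most `chunk_size` tokens."""
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


def refine_summary(text, summarize, target_length, chunk_size, max_attempts):
    """Re-summarize chunk by chunk until the text fits `target_length` tokens."""
    current = text
    for _ in range(max_attempts):
        tokens = current.split()
        if len(tokens) <= target_length:
            break  # already short enough
        chunks = split_into_chunks(tokens, chunk_size)
        current = " ".join(summarize(" ".join(chunk)) for chunk in chunks)
    return current


def keep_first_half(chunk_text):
    """Hypothetical stand-in for a model call: keeps the first half of the words."""
    words = chunk_text.split()
    return " ".join(words[:max(1, len(words) // 2)])


long_text = " ".join(f"w{i}" for i in range(40))
refined = refine_summary(long_text, keep_first_half, target_length=10, chunk_size=8, max_attempts=3)
print(len(refined.split()))  # 10
```

Each pass calls the summarizer once per chunk, which is why refined summarization costs more computation than a single pass.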



In [None]:
summarizer = medical.Summarizer\
    .pretrained("summarizer_clinical_jsl")\
    .setInputCols(['sentence'])\
    .setOutputCol('summary')\
    .setMaxNewTokens(512)\
    .setMaxTextLength(512)\
    .setRefineSummary(True)\
    .setRefineSummaryTargetLength(100)\
    .setRefineMaxAttempts(3)\
    .setRefineChunkSize(512)

pipeline = nlp.Pipeline(stages=[document_assembler, sentenceDetector, summarizer])
model = pipeline.fit(spark.createDataFrame([[""]]).toDF("text"))

text = """PRESENT ILLNESS: The patient is a 28-year-old, who is status post gastric bypass surgery nearly one year ago. He has lost about 200 pounds and was otherwise doing well until yesterday evening around 7:00-8:00 when he developed nausea and right upper quadrant pain, which apparently wrapped around toward his right side and back. He feels like he was on it but has not done so. He has overall malaise and a low-grade temperature of 100.3. He denies any prior similar or lesser symptoms. His last normal bowel movement was yesterday. He denies any outright chills or blood per rectum.
PHYSICAL EXAMINATION: His temperature is 100.3, blood pressure 129/59, respirations 16, heart rate 84. He is drowsy, but easily arousable and appropriate with conversation. He is oriented to person, place, and situation. He is normocephalic, atraumatic. His sclerae are anicteric. His mucous membranes are somewhat tacky. His neck is supple and symmetric. His respirations are unlabored and clear. He has a regular rate and rhythm. His abdomen is soft. He has diffuse right upper quadrant tenderness, worse focally, but no rebound or guarding. He otherwise has no organomegaly, masses, or abdominal hernias evident. His extremities are symmetrical with no edema. His posterior tibial pulses are palpable and symmetric. He is grossly nonfocal neurologically.
PLAN: He will be admitted and placed on IV antibiotics. We will get an ultrasound this morning. He will need his gallbladder out, probably with intraoperative cholangiogram. Hopefully, the stone will pass this way. Due to his anatomy, an ERCP would prove quite difficult if not impossible unless laparoscopic assisted. Dr. X will see him later this morning and discuss the plan further. The patient understands."""

light_model = nlp.LightPipeline(model)
light_result = light_model.annotate(text)

for i in range(len(light_result['sentence'])):
    print(i+1, "   ■ SENTENCE ■: ", f(light_result['sentence'][i]),"\n")
    print(i+1, "   ■ SUMMARY ■: ", f(light_result['summary'][i]),"\n\n")
    print("■"*120)

summarizer_clinical_jsl download started this may take some time.
[OK!]
1    ■ SENTENCE ■:  
PRESENT ILLNESS: The patient is a 28-year-old, who is status post gastric bypass surgery nearly one year ago. He has
lost about 200 pounds and was otherwise doing well until yesterday evening around 7:00-8:00 when he developed nausea and
right upper quadrant pain, which apparently wrapped around toward his right side and back. He feels like he was on it
but has not done so. He has overall malaise and a low-grade temperature of 100.3. He denies any prior similar or lesser
symptoms. His last normal bowel movement was yesterday. He denies any outright chills or blood per rectum. 

1    ■ SUMMARY ■:  
An YMJCP doctor notes the patient with surgical surgery one years ago is experiencing malignant nausea, lower
extremities Pain which may relate to other conditions. The discomfort appears unrelated only due and it'll not have any
significant effect on their health levels or physical exams in recent we

## ▶ `topK`:

The number of highest probability vocabulary tokens to keep for top-k-filtering (Default: 50).
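Top-k filtering keeps only the `k` most probable candidate tokens and renormalizes their probabilities before the next token is drawn. It mainly matters when sampling is enabled. The sketch below uses a made-up distribution and a hypothetical helper `top_k_filter` to show the effect.

```python
def top_k_filter(probs, k):
    """Keep the k most probable tokens and renormalize their probabilities."""
    kept = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


toy_probs = {"pain": 0.5, "nausea": 0.25, "fever": 0.125, "rash": 0.125}
filtered = top_k_filter(toy_probs, k=2)
print(sorted(filtered))  # ['nausea', 'pain']
```

A smaller `k` narrows the candidate pool and makes sampled text more conservative. A larger `k` lets less likely tokens through.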


In [None]:
for topK in [50,10]:
  summarizer = medical.Summarizer\
      .pretrained("summarizer_clinical_jsl")\
      .setInputCols(['sentence'])\
      .setOutputCol('summary')\
      .setMaxNewTokens(512)\
      .setMaxTextLength(512)\
      .setTopK(topK)   # default:50

  pipeline = nlp.Pipeline(stages=[document_assembler, sentenceDetector, summarizer])
  model = pipeline.fit(spark.createDataFrame([[""]]).toDF("text"))

  text = """PRESENT ILLNESS: The patient is a 28-year-old, who is status post gastric bypass surgery nearly one year ago. He has lost about 200 pounds and was otherwise doing well until yesterday evening around 7:00-8:00 when he developed nausea and right upper quadrant pain, which apparently wrapped around toward his right side and back. He feels like he was on it but has not done so. He has overall malaise and a low-grade temperature of 100.3. He denies any prior similar or lesser symptoms. His last normal bowel movement was yesterday. He denies any outright chills or blood per rectum.
  PHYSICAL EXAMINATION: His temperature is 100.3, blood pressure 129/59, respirations 16, heart rate 84. He is drowsy, but easily arousable and appropriate with conversation. He is oriented to person, place, and situation. He is normocephalic, atraumatic. His sclerae are anicteric. His mucous membranes are somewhat tacky. His neck is supple and symmetric. His respirations are unlabored and clear. He has a regular rate and rhythm. His abdomen is soft. He has diffuse right upper quadrant tenderness, worse focally, but no rebound or guarding. He otherwise has no organomegaly, masses, or abdominal hernias evident. His extremities are symmetrical with no edema. His posterior tibial pulses are palpable and symmetric. He is grossly nonfocal neurologically.
  PLAN: He will be admitted and placed on IV antibiotics. We will get an ultrasound this morning. He will need his gallbladder out, probably with intraoperative cholangiogram. Hopefully, the stone will pass this way. Due to his anatomy, an ERCP would prove quite difficult if not impossible unless laparoscopic assisted. Dr. X will see him later this morning and discuss the plan further. The patient understands."""

  light_model = nlp.LightPipeline(model)
  light_result = light_model.annotate(text)

  for i in range(len(light_result['sentence'])):
      print(i+1, "   ■ SENTENCE ■: ", f(light_result['sentence'][i]),"\n")
      print(i+1, "   ■ SUMMARY ■: ", f(light_result['summary'][i]),"\n\n")
  print("■"*120)

summarizer_clinical_jsl download started this may take some time.
[OK!]
1    ■ SENTENCE ■:  
PRESENT ILLNESS: The patient is a 28-year-old, who is status post gastric bypass surgery nearly one year ago. He has
lost about 200 pounds and was otherwise doing well until yesterday evening around 7:00-8:00 when he developed nausea and
right upper quadrant pain, which apparently wrapped around toward his right side and back. He feels like he was on it
but has not done so. He has overall malaise and a low-grade temperature of 100.3. He denies any prior similar or lesser
symptoms. His last normal bowel movement was yesterday. He denies any outright chills or blood per rectum. 

1    ■ SUMMARY ■:  
One child may consider placing breast devices to maintain upper end anatom function as soon after starting up, before
moving back at overnight from birth again given repeated replacement options prior on Monday of March 2102. An infant
experiencing comfort can not present itself fully either until one

## ▶ `stopAtEos`:

Stop text generation when the end-of-sentence token is encountered; default: False.
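The toy loop below sketches how an EOS stop and a token budget such as `maxNewTokens` can interact. Everything here is an assumption for illustration: `generate` is a hypothetical function, `"</s>"` is a placeholder EOS marker, and skipping EOS when the flag is off is a simplification rather than the annotator's documented behaviour.

```python
def generate(token_stream, max_new_tokens, stop_at_eos=False, eos="</s>"):
    """Collect tokens from `token_stream`, honoring the token budget and EOS."""
    output = []
    for token in token_stream:
        if len(output) >= max_new_tokens:
            break  # token budget exhausted
        if token == eos:
            if stop_at_eos:
                break  # end generation at the first EOS token
            continue  # simplification: skip EOS and keep generating
        output.append(token)
    return output


toy_stream = ["He", "has", "pain", ".", "</s>", "He", "has", "pain", "."]
print(generate(toy_stream, 25, stop_at_eos=True))   # ['He', 'has', 'pain', '.']
print(generate(toy_stream, 25, stop_at_eos=False))  # eight tokens, repeated sentence
print(generate(toy_stream, 3))                      # ['He', 'has', 'pain']
```

In this toy setup, stopping at EOS ends the output cleanly after one sentence. Without it, generation continues until the budget runs out, which may cut the text mid-sentence.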


In [None]:
# Run the same pipeline with StopAtEos disabled and then enabled to compare the outputs.
for opt in [False, True]:
  summarizer = medical.Summarizer\
      .pretrained("summarizer_clinical_jsl")\
      .setInputCols(['sentence'])\
      .setOutputCol('summary')\
      .setCaseSensitive(True)\
      .setMaxNewTokens(25)\
      .setDoSample(False)\
      .setTopK(5)\
      .setStopAtEos(opt)

  # Fit on an empty placeholder DataFrame to obtain a PipelineModel.
  pipeline = nlp.Pipeline(stages=[document_assembler, sentenceDetector, summarizer])
  model = pipeline.fit(spark.createDataFrame([[""]]).toDF("text"))

  text = """PRESENT ILLNESS: The patient is a 28-year-old, who is status post gastric bypass surgery nearly one year ago. He has lost about 200 pounds and was otherwise doing well until yesterday evening around 7:00-8:00 when he developed nausea and right upper quadrant pain, which apparently wrapped around toward his right side and back. He feels like he was on it but has not done so. He has overall malaise and a low-grade temperature of 100.3. He denies any prior similar or lesser symptoms. His last normal bowel movement was yesterday. He denies any outright chills or blood per rectum.
  PHYSICAL EXAMINATION: His temperature is 100.3, blood pressure 129/59, respirations 16, heart rate 84. He is drowsy, but easily arousable and appropriate with conversation. He is oriented to person, place, and situation. He is normocephalic, atraumatic. His sclerae are anicteric. His mucous membranes are somewhat tacky. His neck is supple and symmetric. His respirations are unlabored and clear. He has a regular rate and rhythm. His abdomen is soft. He has diffuse right upper quadrant tenderness, worse focally, but no rebound or guarding. He otherwise has no organomegaly, masses, or abdominal hernias evident. His extremities are symmetrical with no edema. His posterior tibial pulses are palpable and symmetric. He is grossly nonfocal neurologically.
  PLAN: He will be admitted and placed on IV antibiotics. We will get an ultrasound this morning. He will need his gallbladder out, probably with intraoperative cholangiogram. Hopefully, the stone will pass this way. Due to his anatomy, an ERCP would prove quite difficult if not impossible unless laparoscopic assisted. Dr. X will see him later this morning and discuss the plan further. The patient understands."""

  light_model = nlp.LightPipeline(model)
  light_result = light_model.annotate(text)
  print("■"*60, f'StopAtEos={opt}',"■"*60)

  for i in range(len(light_result['sentence'])):
      print(i+1, "   ■ SENTENCE ■: ", f(light_result['sentence'][i]),"\n")
      print(i+1, "   ■ SUMMARY ■: ", f(light_result['summary'][i]),"\n\n")

summarizer_clinical_jsl download started this may take some time.
[OK!]
■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ StopAtEos=False ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
1    ■ SENTENCE ■:  
PRESENT ILLNESS: The patient is a 28-year-old, who is status post gastric bypass surgery nearly one year ago. He has
lost about 200 pounds and was otherwise doing well until yesterday evening around 7:00-8:00 when he developed nausea and
right upper quadrant pain, which apparently wrapped around toward his right side and back. He feels like he was on it
but has not done so. He has overall malaise and a low-grade temperature of 100.3. He denies any prior similar or lesser
symptoms. His last normal bowel movement was yesterday. He denies any outright chills or blood per rectum. 

1    ■ SUMMARY ■:  
A 28-year old patient who had gastric bypass surgery nearly a year ago developed nausea and right upper quadrant pain 


2    ■ SENTENCE ■:  
PHYSICAL EXAMINATION: 

## ▶ `setCaseSensitive`:

Whether tokens are matched case-sensitively (set to False to ignore case); default: True.
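
As a rough intuition for what a case-sensitivity flag changes, the toy lookup below lowercases a token before matching it against a vocabulary when the flag is off. The `VOCAB` table, the `UNK` id, and the `lookup` function are hypothetical and only illustrate the general idea, not how the annotator works internally.

```python
# Toy vocabulary lookup; hypothetical, not the annotator's internals.
VOCAB = {"sob": 0, "cp": 1, "SOB": 2}
UNK = -1  # id returned for tokens missing from the vocabulary


def lookup(token, case_sensitive):
    """Return the vocabulary id of a token, optionally ignoring case."""
    key = token if case_sensitive else token.lower()
    return VOCAB.get(key, UNK)


print(lookup("CP", case_sensitive=True))    # "CP" is not in VOCAB as written
print(lookup("CP", case_sensitive=False))   # matched after lowercasing to "cp"
print(lookup("SOB", case_sensitive=True))   # exact-case entry "SOB"
print(lookup("SOB", case_sensitive=False))  # lowercased entry "sob"
```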


In [None]:
# Run the same pipeline with case sensitivity enabled and then disabled to compare the outputs.
for cs in [True, False]:
  summarizer = medical.Summarizer\
      .pretrained("summarizer_clinical_jsl")\
      .setInputCols(['sentence'])\
      .setOutputCol('summary')\
      .setStopAtEos(False)\
      .setCaseSensitive(cs)  # default: True

  # Fit on an empty placeholder DataFrame to obtain a PipelineModel.
  pipeline = nlp.Pipeline(stages=[document_assembler, sentenceDetector, summarizer])
  model = pipeline.fit(spark.createDataFrame([[""]]).toDF("text"))

  text = "Patient presented with SOB (shortness of breath) and CP (chest pain). Vital signs stable, O2 sat 95%. CXR revealed bilateral infiltrates. ABG showed hypoxemia. Started on O2 and IV antibiotics. Labs pending. Follow up in 24 hours."

  light_model = nlp.LightPipeline(model)
  light_result = light_model.annotate(text)

  for i in range(len(light_result['sentence'])):
    # print("■ SENTENCE ■: ", f(light_result['sentence'][i]),"\n")
    print("■ SUMMARY ■: ", f(light_result['summary'][i]),"\n\n")
    print("-"*120)
  print("■"*120)

summarizer_clinical_jsl download started this may take some time.
[OK!]
■ SUMMARY ■:  
The patient presented with shortness of breath and chest pain. Vital signs were stable, but oxygen saturation was 95% on
C 


------------------------------------------------------------------------------------------------------------------------
■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
summarizer_clinical_jsl download started this may take some time.
[OK!]
■ SUMMARY ■:  
The patient presented with shortness of breath and chest pain. Vital signs were stable, but oxygen saturation was 95% on
C 


------------------------------------------------------------------------------------------------------------------------
■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
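
In the output above, both `setCaseSensitive` settings print the same summary for this input. When sweeping a parameter, a small helper can make such ties explicit. The sketch below is plain Python and independent of Spark NLP; `group_by_output` is a hypothetical helper, and the `results` dictionary is filled by hand with the summary text printed above rather than read from `light_result`.

```python
def group_by_output(results):
    """Map each distinct summary to the parameter values that produced it."""
    groups = {}
    for setting, summary in results.items():
        groups.setdefault(summary.strip(), []).append(setting)
    return groups


# Summary text copied from the printed output above (line wrap removed).
summary_text = (
    "The patient presented with shortness of breath and chest pain. "
    "Vital signs were stable, but oxygen saturation was 95% on C"
)
results = {True: summary_text, False: summary_text + " "}

groups = group_by_output(results)
for summary, settings in groups.items():
    print(settings, "->", summary)
```

A single group containing both settings indicates that the parameter had no visible effect on this particular text.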
