![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/Spark_NLP_Udemy_MOOC/Healthcare_NLP/REChunkMerger.ipynb)

#   **📜 MedicalLLM**

**`MedicalLLM`** is designed to load and run large language models (LLMs) in GGUF format with scalable performance. It targets clinical and healthcare applications and supports tasks such as medical entity extraction, summarization, Q&A, Retrieval Augmented Generation (RAG), and conversational AI. It integrates into Spark NLP pipelines and exposes customizable batch sizes, prediction settings, and chat templates, with optional GPU optimization for high-performance environments. It can also be used to link medical entities and perform other complex NLP tasks. MedicalLLM is accessed through the `AutoGGUFModel` annotator.

**📖 Learning Objectives:**

1. Understand how to use the annotator.

2. Become comfortable using the different parameters of the annotator.

**🔗 Helpful Links:**

- Reference Documentation: [MedicalLLM](https://nlp.johnsnowlabs.com/docs/en/licensed_annotators#medicalllm)

- Python Docs : [MedicalLLM](https://nlp.johnsnowlabs.com/licensed/api/python/reference/autosummary/sparknlp_jsl/annotator/medical_llm/medical_llm/index.html)

- For extended examples of usage, see the [MedicalLLM](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/Certification_Trainings/Healthcare/46.Loading_Medical_and_Open-Souce_LLMs.ipynb#scrollTo=USxzeevIrPMX) notebooks.


## **🎬 Colab Setup**

In [None]:
# Install the johnsnowlabs library to access Spark-NLP for Healthcare
! pip install -q johnsnowlabs

In [None]:
from google.colab import files
print('Please Upload your John Snow Labs License using the button below')
license_keys = files.upload()

In [None]:
from johnsnowlabs import nlp, medical

# After uploading your license, run this to install all licensed Python wheels and pre-download the JARs for the Spark session JVM
nlp.settings.enforce_versions=True
nlp.install()

In [4]:
# Automatically load license data and start a session with all jars user has access to
spark = nlp.start()

👌 Detected license file /content/spark_nlp_for_healthcare_spark_ocr_9596 (2).json
👌 Launched cpu optimized session with with: 🚀Spark-NLP==5.5.0, 💊Spark-Healthcare==5.5.0, running on ⚡ PySpark==3.4.0


In [5]:
spark

## **🖨️ Input/Output Annotation Types**

- Input: `DOCUMENT`

- Output: `DOCUMENT`

## **🔎 Parameters**


- `setNGpuLayers`: Sets the number of model layers to store in VRAM when a GPU is available (`-1` uses the default).

- `setTemperature`: Sets the sampling temperature, which adjusts the randomness of token selection during text generation. Values typically range from 0 (deterministic) to 1, with higher values giving more varied output.

      
  

# Pipeline

The JSL_MedM model is trained to perform summarization and Q&A based on a given context.
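Because the model works from an instruction plus a context passage, it can help to build prompts with a small helper. The sketch below mirrors the layout of the summarization prompt used in this notebook; the `build_prompt` function and the Q&A wording are assumptions made for illustration, not a documented prompt format.

```python
def build_prompt(instruction: str, content: str) -> str:
    """Combine an instruction and a context passage into one prompt string."""
    # Same shape as the prompt in this notebook: the instruction first,
    # then the context under a "content:" label.
    return f"{instruction.strip()}\n\ncontent:\n{content.strip()}\n"


context = (
    "KISUNLA is an amyloid beta-directed antibody indicated for the "
    "treatment of Alzheimer's disease."
)

summary_prompt = build_prompt("summarize the following content.", context)

# Hypothetical Q&A variant: the question wording is an illustration only.
qa_prompt = build_prompt(
    "Answer the question using only the content below. "
    "Question: What is KISUNLA indicated for?",
    context,
)

print(summary_prompt)
print(qa_prompt)
```

Either string can then be placed in the `text` column of the input DataFrame in the same way as `medm_prompt`.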

In [6]:
medm_prompt = """
summarize the following content.

 content:
 ---------------------------- INDICATIONS AND USAGE ---------------------------
 KISUNLA is an amyloid beta-directed antibody indicated for the
 treatment of Alzheimer’s disease. Treatment with KISUNLA should be
 initiated in patients with mild cognitive impairment or mild dementia
 stage of disease, the population in which treatment was initiated in the
 clinical trials. (1)
 ------------------------DOSAGE AND ADMINISTRATION-----------------------
 • Confirm the presence of amyloid beta pathology prior to initiating
 treatment. (2.1)
 • The recommended dosage of KISUNLA is 700 mg administered as
 an intravenous infusion over approximately 30 minutes every four
 weeks for the first three doses, followed by 1400 mg every four
 weeks. (2.2)
 • Consider stopping dosing with KISUNLA based on reduction of
 amyloid plaques to minimal levels on amyloid PET imaging. (2.2)
 • Obtain a recent baseline brain MRI prior to initiating treatment.
 (2.3, 5.1)
 • Obtain an MRI prior to the 2nd, 3rd, 4th, and 7th infusions. If
 radiographically observed ARIA occurs, treatment
 recommendations are based on type, severity, and presence of
 symptoms. (2.3, 5.1)
 • Dilution to a final concentration of 4 mg/mL to 10 mg/mL with 0.9%
 Sodium Chloride Injection, is required prior to administration. (2.4)
 ----------------------DOSAGE FORMS AND STRENGTHS---------------------
 Injection: 350 mg/20 mL (17.5 mg/mL) in a single-dose vial. (3)
 ------------------------------- CONTRAINDICATIONS ------------------------------
 KISUNLA is contraindicated in patients with known serious
 hypersensitivity to donanemab-azbt or to any of the excipients. (4, 5.2)
 ------------------------WARNINGS AND PRECAUTIONS-----------------------
 • Amyloid Related Imaging Abnormalities (ARIA): Enhanced clinical
 vigilance for ARIA is recommended during the first 24 weeks of
 treatment with KISUNLA. Risk of ARIA, including symptomatic
 ARIA, was increased in apolipoprotein E ε4 (ApoE ε4)
 homozygotes compared to heterozygotes and noncarriers. The risk
 of ARIA-E and ARIA-H is increased in KISUNLA-treated patients
 with pretreatment microhemorrhages and/or superficial siderosis. If
 a patient experiences symptoms suggestive of ARIA, clinical
 evaluation should be performed, including MRI scanning if
 indicated. (2.3, 5.1)
 • Infusion-Related Reactions: The infusion rate may be reduced, or
 the infusion may be discontinued, and appropriate therapy initiated
 as clinically indicated. Consider pre-treatment with antihistamines,
 acetaminophen, or corticosteroids prior to subsequent dosing. (5.3)
 -------------------------------ADVERSE REACTIONS------------------------------
 Most common adverse reactions (at least 10% and higher incidence
 compared to placebo): ARIA-E, ARIA-H microhemorrhage, ARIA-H
 superficial siderosis, and headache. (6.1)
"""

data = spark.createDataFrame([[medm_prompt]]).toDF("text")
data.show(truncate=100)

+----------------------------------------------------------------------------------------------------+
|                                                                                                text|
+----------------------------------------------------------------------------------------------------+
|\nsummarize the following content.\n\n content:\n ---------------------------- INDICATIONS AND US...|
+----------------------------------------------------------------------------------------------------+



## Parameters :



### 📌 `setTemperature()`
- **Definition**: Adjusts the randomness of token selection during text generation. Values typically range from 0 (deterministic) to 1, with higher values giving more varied output.
- **Effect**:
  - **Low Values**: Produces more focused and predictable outputs.
  - **High Values**: Encourages more varied and creative responses, but may lead to less coherent text.
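To make the effect of temperature concrete, here is a minimal sketch of the standard temperature-scaled softmax applied to a toy set of token scores. The scores are made up, and the sketch assumes the underlying GGUF runtime follows this common scheme; it is not taken from the annotator's implementation.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0 here; 0 is handled as greedy decoding")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


toy_logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens

for t in (0.1, 0.7, 1.0):
    probs = softmax_with_temperature(toy_logits, t)
    print(f"temperature={t}: {[round(p, 3) for p in probs]}")
```

As the temperature drops, the probability mass concentrates on the highest-scoring token; as it rises, the distribution flattens and lower-ranked tokens are chosen more often.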


#### Temperature : 0.0

In [None]:
# Document Assembler
documentAssembler = nlp.DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

medical_llm_temp0 = medical.AutoGGUFModel.pretrained("jsl_medm_q8_v1", "en", "clinical/models") \
    .setInputCols(["document"]) \
    .setOutputCol("completions") \
    .setBatchSize(1)\
    .setTemperature(0.0)  # Lower temperature for more deterministic results

pipeline_temp0 = nlp.Pipeline(stages=[documentAssembler, medical_llm_temp0])

results_temp0 = pipeline_temp0.fit(data).transform(data)


jsl_medm_q8_v1 download started this may take some time.
[OK!]


In [None]:
print("First Run Results with Temperature = 0:")
print(results_temp0.select("completions").collect()[0].completions[0].result)

First Run Results with Temperature = 0:
KISUNLA is an amyloid beta-directed antibody indicated for the treatment of Alzheimer's disease. It is recommended to initiate treatment in patients with mild cognitive impairment or mild dementia stage of disease. The recommended dosage is 700 mg administered as an intravenous infusion over approximately 30 minutes every four weeks for the first three doses, followed by 1400 mg every four weeks. Patients should have a recent baseline brain MRI prior to initiating treatment and obtain an MRI prior to the 2nd, 3rd, 4th, and 7th infusions. KISUNLA is contraindicated in patients with known serious hypersensitivity to donanemab-azbt or to any of the excipients. Common adverse reactions include ARIA-E, ARIA-H microhemorrhage, ARIA-H superficial siderosis, and headache.


In [None]:
print("Second Run Results with Temperature = 0:")
print(results_temp0.select("completions").collect()[0].completions[0].result)

Second Run Results with Temperature = 0:
KISUNLA is an amyloid beta-directed antibody indicated for the treatment of Alzheimer's disease. It is recommended to initiate treatment in patients with mild cognitive impairment or mild dementia stage of disease. The recommended dosage is 700 mg administered as an intravenous infusion over approximately 30 minutes every four weeks for the first three doses, followed by 1400 mg every four weeks. Patients should have a recent baseline brain MRI prior to initiating treatment and obtain an MRI prior to the 2nd, 3rd, 4th, and 7th infusions. KISUNLA is contraindicated in patients with known serious hypersensitivity to donanemab-azbt or to any of the excipients. Common adverse reactions include ARIA-E, ARIA-H microhemorrhage, ARIA-H superficial siderosis, and headache.


 🔎 In the demonstration above, we set the temperature to 0, which makes the model's responses deterministic. Running the pipeline twice with this setting produces identical outputs, showing that a zero temperature yields the same result for the same input every time.
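One way to see why a zero temperature gives repeatable output is to treat it as greedy decoding, where the highest-scoring token is always chosen. The sketch below is a toy single-step decoder under that assumption; `pick_token` and the scores are hypothetical and do not reflect the annotator's internals.

```python
import math
import random


def pick_token(logits, temperature, rng):
    """Return the index of the token chosen for one decoding step."""
    if temperature == 0:
        # Greedy decoding: always the highest-scoring token, no randomness involved.
        return max(range(len(logits)), key=lambda i: logits[i])
    peak = max(logits)
    weights = [math.exp((x - peak) / temperature) for x in logits]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


toy_logits = [2.0, 1.8, 0.5]  # made-up scores for three candidate tokens
rng = random.Random()         # deliberately unseeded

greedy_picks = {pick_token(toy_logits, 0, rng) for _ in range(100)}
sampled_picks = [pick_token(toy_logits, 0.7, rng) for _ in range(100)]

print("distinct tokens at temperature 0:  ", greedy_picks)
print("distinct tokens at temperature 0.7:", set(sampled_picks))
```

At temperature 0 the set of chosen tokens always contains a single index, whereas sampling at 0.7 will usually pick more than one token across repeated steps.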

#### Temperature : 0.7

In [None]:
# Reuse the same annotator and pipeline objects, now with a higher temperature
medical_llm_temp0.setTemperature(0.7)
results_temp07 = pipeline_temp0.fit(data).transform(data)

In [None]:
print("First Run Results with Temperature = 0.7:")
print(results_temp07.select("completions").collect()[0].completions[0].result)

First Run Results with Temperature = 0.7:
Here is a summary of the content:



In [None]:
print("Second Run Results with Temperature = 0.7:")
print(results_temp07.select("completions").collect()[0].completions[0].result)

Second Run Results with Temperature = 0.7:
KISUNLA is an amyloid beta-directed antibody indicated for the treatment of Alzheimer's disease. It is recommended for patients with mild cognitive impairment or mild dementia. The dosage is 700mg intravenous infusion over 30 minutes every four weeks for the first three doses, followed by 1400mg every four weeks. Amyloid plaques should be reduced to minimal levels on amyloid PET imaging, and a recent baseline brain MRI should be obtained prior to treatment. Patients with a history of serious hypersensitivity to the drug or its excipients should not receive KISUNLA.


🔎 This time we set the temperature to 0.7, which allows variation in the model's responses. With this setting the model introduces a degree of randomness, so the output differs between runs: here the first run produced only an introductory line, while the second produced a full summary worded differently from the zero-temperature result. This variability shows how a higher temperature encourages the model to explore alternative wordings, at the cost of less predictable output.
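When comparing runs, it can be useful to quantify how much two outputs differ instead of reading them side by side. The following sketch uses `difflib.SequenceMatcher` from the Python standard library; the two strings are shortened stand-ins, and in practice you would pass the `result` text collected from each run.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a ratio between 0 and 1 describing how closely two strings match."""
    return SequenceMatcher(None, a, b).ratio()


# Shortened stand-ins for the text returned by two separate runs.
run_1 = "KISUNLA is indicated for the treatment of Alzheimer's disease."
run_2 = "KISUNLA is recommended for patients with mild cognitive impairment."

print(f"identical text: {similarity(run_1, run_1):.2f}")
print(f"different runs: {similarity(run_1, run_2):.2f}")
```

A ratio of 1.0 means the outputs are character-for-character identical, which is what the zero-temperature runs should give; lower values indicate increasing divergence.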

### 📌 `setNGpuLayers` Parameter
- **Definition**: Specifies the number of model layers to store in GPU memory for processing.
- **Effect**:
  - **Positive Impact**: Increases performance and speed when using a GPU, allowing for faster inference and handling larger models.
  - **Negative Impact**: Setting it too high may lead to out-of-memory errors if the GPU cannot accommodate the specified layers.
  - **Default**: Setting it to `-1` uses the default settings.
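To reason about the out-of-memory risk, a rough estimate of how many layers fit in VRAM can be helpful. The sketch below is a back-of-the-envelope calculation that assumes all layers are about the same size and keeps some headroom for caches and buffers. The function name, the 1 GB reserve, and the example sizes are all hypothetical and are not properties of `jsl_medm_q8_v1`.

```python
def layers_that_fit(model_size_gb: float, total_layers: int,
                    free_vram_gb: float, reserve_gb: float = 1.0) -> int:
    """Estimate how many layers fit in VRAM, assuming equally sized layers."""
    if total_layers <= 0 or model_size_gb <= 0:
        raise ValueError("model size and layer count must be positive")
    per_layer_gb = model_size_gb / total_layers
    # Keep some headroom for the KV cache and scratch buffers (assumed value).
    usable_gb = max(free_vram_gb - reserve_gb, 0.0)
    return min(total_layers, int(usable_gb // per_layer_gb))


# Hypothetical numbers: an 8.5 GB model with 32 layers.
print(layers_that_fit(8.5, 32, free_vram_gb=15.0))  # ample VRAM
print(layers_that_fit(8.5, 32, free_vram_gb=6.0))   # constrained VRAM
```

An estimate like this is only a starting point; the reliable check is to run the pipeline and watch GPU memory usage.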

### CPU
First, we time the pipeline on the CPU.
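Since Spark transformations are lazy, a timing measurement only means something if the timed code includes an action such as `show()` or `collect()`. The helper below is a small sketch of that pattern; `timed` is a hypothetical name, and the workload is a stand-in so the example runs without Spark.

```python
import time


def timed(action):
    """Run a zero-argument callable and return (result, elapsed_seconds)."""
    start = time.perf_counter()  # monotonic clock, suited to measuring intervals
    result = action()
    return result, time.perf_counter() - start


# Stand-in workload. In this notebook the callable would wrap an action,
# for example: lambda: model_cpu.transform(data).collect()
result, seconds = timed(lambda: sum(i * i for i in range(100_000)))
print(f"processing time: {seconds:.4f} seconds")
```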

In [7]:
import time

documentAssembler = nlp.DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

# NGpuLayers is not set, so the default (-1) applies; this session is CPU-only
medical_llm_cpu = medical.AutoGGUFModel.pretrained("jsl_medm_q8_v1", "en", "clinical/models") \
    .setInputCols(["document"]) \
    .setOutputCol("completions")\
    .setBatchSize(1)

pipeline_cpu = nlp.Pipeline(stages=[documentAssembler, medical_llm_cpu])
model_cpu = pipeline_cpu.fit(data)

start_time = time.time()
results_cpu = model_cpu.transform(data)
results_cpu.select("completions.result").show(truncate=False)
cpu_time = time.time() - start_time

print(f"CPU processing time: {cpu_time:.2f} seconds")

jsl_medm_q8_v1 download started this may take some time.
[OK!]

### GPU
We now switch the runtime to a GPU machine and start Spark with the `hardware_target="gpu"` parameter, which lets us observe the effect of `setNGpuLayers()`.

In [None]:
# Automatically load license data and start a session with all jars user has access to
spark = nlp.start(hardware_target="gpu")

👌 Detected license file /content/spark_nlp_for_healthcare_spark_ocr_9596 (2).json
🤓 Looks like you are missing some jars, trying fetching them ...
👌 Detected license file /content/spark_nlp_for_healthcare_spark_ocr_9596 (2).json
Downloading 🫘+🚀 Java Library spark-nlp-gpu-assembly-5.5.0.jar
🙆 JSL Home setup in /root/.johnsnowlabs
👌 Detected license file /content/spark_nlp_for_healthcare_spark_ocr_9596 (2).json
👌 Launched gpu optimized session with with: 🚀Spark-NLP==5.5.0, 💊Spark-Healthcare==5.5.0, running on ⚡ PySpark==3.4.0


In [None]:
spark

In [None]:
# NGpuLayers = 400 (Using GPU)
medical_llm_gpu = medical.AutoGGUFModel.pretrained("jsl_medm_q8_v1", "en", "clinical/models") \
    .setInputCols(["document"]) \
    .setOutputCol("completions") \
    .setNGpuLayers(400)

pipeline_gpu = nlp.Pipeline(stages=[documentAssembler, medical_llm_gpu])
model_gpu = pipeline_gpu.fit(data)

start_time = time.time()
results_gpu = model_gpu.transform(data)
results_gpu.select("completions.result").show(truncate=False)
gpu_time = time.time() - start_time

print(f"\nGPU processing time with setNGpuLayers(400): {gpu_time:.2f} seconds")

jsl_medm_q8_v1 download started this may take some time.
[OK!]

🔎  **Exploring the Effect of the `setNGpuLayers()` Parameter**

The `setNGpuLayers()` parameter directly impacts processing speed. Here is a comparison of CPU and GPU timing:

**CPU vs. GPU with `setNGpuLayers(400)`**:
- **CPU Only**: 120.24 seconds — slower due to sequential processing.
- **GPU (400 layers)**: 40.82 seconds — faster by leveraging the GPU’s parallel processing power.

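These results show that offloading layers to the GPU with `setNGpuLayers(400)` cut inference time for this input from 120.24 to 40.82 seconds, roughly a threefold speedup. Note that this is a single measurement, so the exact ratio will vary with hardware, input length, and batch size.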
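The speedup can be computed directly from the two timings reported above. This is plain arithmetic on those figures; the variable names are chosen here for illustration.

```python
cpu_seconds = 120.24  # CPU-only run reported above
gpu_seconds = 40.82   # GPU run with setNGpuLayers(400) reported above

speedup = cpu_seconds / gpu_seconds
time_saved_pct = (1 - gpu_seconds / cpu_seconds) * 100

print(f"speedup: {speedup:.2f}x")
print(f"time saved: {time_saved_pct:.1f}%")
```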