![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)


# **MedicalQuestionAnswering**

This notebook covers the parameters and usage of the `MedicalQuestionAnswering` annotator.

**📖 Learning Objectives:**

1. Understand how to use `MedicalQuestionAnswering`.

2. Become comfortable using the different parameters of the annotator.

3. Generate short and long answers with a pretrained `MedicalQuestionAnswering` model.


**🔗 Helpful Links:**

- Documentation : [MedicalQuestionAnswering](https://nlp.johnsnowlabs.com/docs/en/licensed_annotators#questionanswering)

- Python Docs : [MedicalQuestionAnswering](https://nlp.johnsnowlabs.com/licensed/api/python/reference/autosummary/sparknlp_jsl/annotator/qa/medical_qa/index.html#sparknlp_jsl.annotator.qa.medical_qa.MedicalQuestionAnswering)

- Scala Docs : [MedicalQuestionAnswering](https://nlp.johnsnowlabs.com/licensed/api/com/johnsnowlabs/nlp/annotators/qa/MedicalQuestionAnswering.html)

- For extended examples of usage, see the [Spark NLP Workshop repository](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/healthcare-nlp/23.0.Medical_Question_Answering.ipynb).

## **📜 Background**


`MedicalQuestionAnswering` is a GPT-based model for answering questions given a context. Unlike span-based models, it generates the answer rather than selecting phrases from the given context. Two question types are supported: "short" (yes/no/maybe answers) and "long" (full-text answers).

## **🎬 Colab Setup**

### !!! Important note:

To run this notebook, you need to configure your environment with high RAM.

In [None]:
# Install the johnsnowlabs library to access Spark-NLP for Healthcare
! pip install -q johnsnowlabs


In [None]:
from google.colab import files
print('Please Upload your John Snow Labs License using the button below')
license_keys = files.upload()

Please Upload your John Snow Labs License using the button below


Saving 5.3.3.spark_nlp_for_healthcare.json to 5.3.3.spark_nlp_for_healthcare.json


In [None]:
from johnsnowlabs import nlp, medical

# After uploading your license, run this to install all licensed Python wheels and pre-download the JARs for the Spark session JVM
nlp.install()

👌 Detected license file /content/5.3.3.spark_nlp_for_healthcare.json
🚨 Outdated Medical Secrets in license file. Version=5.3.3 but should be Version=5.3.2
🚨 Outdated OCR Secrets in license file. Version=5.1.2 but should be Version=5.3.2
👷 Trying to install compatible secrets. Use nlp.settings.enforce_versions=False if you want to install outdated secrets.
📋 Stored John Snow Labs License in /root/.johnsnowlabs/licenses/license_number_0_for_Spark-Healthcare_Spark-OCR.json
👷 Setting up  John Snow Labs home in /root/.johnsnowlabs, this might take a few minutes.
Downloading 🐍+🚀 Python Library spark_nlp-5.3.2-py2.py3-none-any.whl
Downloading 🐍+💊 Python Library spark_nlp_jsl-5.3.2-py3-none-any.whl
Downloading 🫘+🚀 Java Library spark-nlp-assembly-5.3.2.jar
Downloading 🫘+💊 Java Library spark-nlp-jsl-5.3.2.jar
🙆 JSL Home setup in /root/.johnsnowlabs

In [None]:
# Automatically load the license data and start a session with all JARs the user has access to
spark = nlp.start()
spark

👌 Detected license file /content/5.3.3.spark_nlp_for_healthcare.json
👷 Trying to install compatible secrets. Use nlp.settings.enforce_versions=False if you want to install outdated secrets.
👌 Launched cpu optimized session with with: 🚀Spark-NLP==5.3.2, 💊Spark-Healthcare==5.3.2, running on ⚡ PySpark==3.4.0


## **🖨️ Input/Output Annotation Types**

- Input: `DOCUMENT`, `DOCUMENT`

- Output: `CHUNK`

## **🔎 Parameters**


- `questionType`: Question type, e.g. “short” or “long”. The question types depend on the model.
- `maxNewTokens`: Maximum number of new tokens to generate, by default 30.
- `maxContextLength`: Maximum length of context text.
- `configProtoBytes`: ConfigProto from tensorflow, serialized into byte array.
- `doSample`: Whether or not to use sampling; use greedy decoding otherwise, by default False.
- `topK`: The number of highest probability vocabulary tokens to consider, by default 1.
- `noRepeatNgramSize`: The number of tokens that can’t be repeated in the same order. Useful for preventing loops. The default is 0.
- `ignoreTokenIds`: A list of token ids which are ignored in the decoder’s output, by default [].
- `randomSeed`: Set to positive integer to get reproducible results, by default None.
- `customPrompt`: Custom prompt template. Available variables {QUESTION} and {CONTEXT}.

Each parameter can be set with the corresponding setter method in camel case, for example `.setInputCols()`.
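The naming convention can be sketched in plain Python. The helper `setter_name` below is a hypothetical name used only for illustration (it is not part of the library), and it assumes the usual Spark ML pattern of `set` followed by the parameter name with its first letter capitalized.

```python
def setter_name(param: str) -> str:
    """Return the conventional setter name for a camelCase parameter name."""
    # Assumption: setters follow the "set" + capitalized-parameter pattern.
    return "set" + param[0].upper() + param[1:]


for param in ["questionType", "maxNewTokens", "topK", "customPrompt"]:
    print(f"{param} -> .{setter_name(param)}()")
# questionType -> .setQuestionType()
# maxNewTokens -> .setMaxNewTokens()
# topK -> .setTopK()
# customPrompt -> .setCustomPrompt()
```

All four setters printed here are the ones called in the cells of this notebook.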

### `questionType`

Let's define a pipeline that takes a raw question and its context and produces an answer according to `questionType`.

#### Long Answer

In [None]:
document_assembler = nlp.MultiDocumentAssembler()\
    .setInputCols("question", "context")\
    .setOutputCols("document_question", "document_context")

med_qa = medical.MedicalQuestionAnswering.pretrained("medical_qa_biogpt","en","clinical/models")\
    .setInputCols(["document_question", "document_context"])\
    .setOutputCol("answer")\
    .setQuestionType("long")

pipeline = nlp.Pipeline(stages=[document_assembler,
                                med_qa])

medical_qa_biogpt download started this may take some time.
[OK!]


In [None]:
paper_abstract = [
    "We have previously reported the feasibility of diagnostic and therapeutic peritoneoscopy including liver biopsy, gastrojejunostomy, and tubal ligation by an oral transgastric approach. We present results of per-oral transgastric splenectomy in a porcine model. The goal of this study was to determine the technical feasibility of per-oral transgastric splenectomy using a flexible endoscope. We performed acute experiments on 50-kg pigs. All animals were fed liquids for 3 days prior to procedure. The procedures were performed under general anesthesia with endotracheal intubation. The flexible endoscope was passed per orally into the stomach and puncture of the gastric wall was performed with a needle knife. The puncture was extended to create a 1.5-cm incision using a pull-type sphincterotome, and a double-channel endoscope was advanced into the peritoneal cavity. The peritoneal cavity was insufflated with air through the endoscope. The spleen was visualized. The splenic vessels were ligated with endoscopic loops and clips, and then mesentery was dissected using electrocautery. Endoscopic splenectomy was performed on six pigs. There were no complications during gastric incision and entrance into the peritoneal cavity. Visualization of the spleen and other intraperitoneal organs was very good. Ligation of the splenic vessels and mobilization of the spleen were achieved using commercially available devices and endoscopic accessories."
]

question = ["Transgastric endoscopic splenectomy: is it possible?"]

In [None]:
data = spark.createDataFrame([[paper_abstract[0],  question[0]]]).toDF("context","question")

data.show(truncate = 100)

+----------------------------------------------------------------------------------------------------+----------------------------------------------------+
|                                                                                             context|                                            question|
+----------------------------------------------------------------------------------------------------+----------------------------------------------------+
|We have previously reported the feasibility of diagnostic and therapeutic peritoneoscopy includin...|Transgastric endoscopic splenectomy: is it possible?|
+----------------------------------------------------------------------------------------------------+----------------------------------------------------+



In [None]:
result = pipeline.fit(data).transform(data)

result.selectExpr("document_question.result as Question", "answer.result as Long_Answer").show(truncate=False)

+------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|Question                                              |Long_Answer                                                                                                                                                               |
+------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|[Transgastric endoscopic splenectomy: is it possible?]|[per - oral transgastric splenectomy was technically feasible in a porcine model. further studies are necessary to determine the safety and efficacy of this procedure in]|
+------------------------------------------------------+--------------------------------

#### Short Answer

In [None]:
med_qa = medical.MedicalQuestionAnswering.pretrained("medical_qa_biogpt","en","clinical/models")\
    .setInputCols(["document_question", "document_context"])\
    .setOutputCol("answer")\
    .setQuestionType("short")

pipeline = nlp.Pipeline(stages=[document_assembler,
                                med_qa])

medical_qa_biogpt download started this may take some time.
[OK!]


In [None]:
result = pipeline.fit(data).transform(data)

result.selectExpr("document_question.result as Question", "answer.result as Short_Answer").show(truncate=False)

+------------------------------------------------------+------------+
|Question                                              |Short_Answer|
+------------------------------------------------------+------------+
|[Transgastric endoscopic splenectomy: is it possible?]|[yes]       |
+------------------------------------------------------+------------+
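
Because `answer.result` is an array of strings, a short answer usually needs a small post-processing step before it can be used as a label. The sketch below is one possible way to do that in plain Python. The helper `parse_short_answer` and the `SHORT_LABELS` set are hypothetical names introduced for illustration, and the label set assumes the yes/no/maybe answers described in the Background section.

```python
from typing import List, Optional

# Assumption: "short" answers are one of these three labels.
SHORT_LABELS = {"yes", "no", "maybe"}


def parse_short_answer(result: List[str]) -> Optional[str]:
    """Return the normalized label from an `answer.result` array, or None."""
    if not result:
        return None
    label = result[0].strip().lower()
    # Anything outside the expected label set is treated as unparseable.
    return label if label in SHORT_LABELS else None


print(parse_short_answer(["yes"]))      # yes
print(parse_short_answer(["unclear"]))  # None
```

Returning `None` for unexpected strings keeps downstream code from silently treating a malformed generation as a valid label.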



### `maxNewTokens`

In [None]:
med_qa = medical.MedicalQuestionAnswering.pretrained("medical_qa_biogpt","en","clinical/models")\
    .setInputCols(["document_question", "document_context"])\
    .setOutputCol("answer")\
    .setQuestionType("long")\
    .setMaxNewTokens(20)

pipeline = nlp.Pipeline(stages=[document_assembler,
                                med_qa])

medical_qa_biogpt download started this may take some time.
[OK!]


In [None]:
result = pipeline.fit(data).transform(data)

result.selectExpr("document_question.result as Question", "answer.result as Answer").show(truncate=False)

+------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
|Question                                              |Answer                                                                                                          |
+------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
|[Transgastric endoscopic splenectomy: is it possible?]|[per - oral transgastric splenectomy was technically feasible in a porcine model. further studies are necessary]|
+------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
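
To make the effect of a token cap concrete, here is a minimal sketch of a generation loop that stops at a maximum number of new tokens. It is an illustration only, not the annotator's implementation: the `generate` function and the `</s>` end-of-sequence marker are hypothetical, and the real model counts subword tokens rather than whitespace-separated words.

```python
from typing import Iterable, List


def generate(next_tokens: Iterable[str], max_new_tokens: int, eos: str = "</s>") -> List[str]:
    """Collect tokens until `eos` appears or `max_new_tokens` have been produced."""
    out: List[str] = []
    for token in next_tokens:
        if token == eos or len(out) >= max_new_tokens:
            break
        out.append(token)
    return out


stream = "further studies are necessary to determine the safety </s>".split()

# The cap is reached before the end-of-sequence marker, so the text is cut off.
print(generate(stream, max_new_tokens=4))
# ['further', 'studies', 'are', 'necessary']

# The cap is large enough, so generation ends at the end-of-sequence marker.
print(len(generate(stream, max_new_tokens=30)))
# 8
```

A cap that is too small cuts the answer mid-sentence, which is consistent with the truncated answer in the table above.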



### `maxContextLength`

In [None]:
med_qa = medical.MedicalQuestionAnswering.pretrained("medical_qa_biogpt","en","clinical/models")\
    .setInputCols(["document_question", "document_context"])\
    .setOutputCol("answer")\
    .setQuestionType("long")\
    .setMaxContextLength(300)

pipeline = nlp.Pipeline(stages=[document_assembler,
                                med_qa])

medical_qa_biogpt download started this may take some time.
[OK!]


In [None]:
result = pipeline.fit(data).transform(data)

result.selectExpr("document_question.result as Question", "answer.result as Answer").show(truncate=False)

+------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
|Question                                              |Answer                                                                                                          |
+------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
|[Transgastric endoscopic splenectomy: is it possible?]|[per - oral transgastric splenectomy was technically feasible in a porcine model. further studies are necessary]|
+------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
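
The idea of limiting the context can be sketched as a simple truncation step. This is a hedged illustration: the parameter list does not state the unit of `maxContextLength`, so the hypothetical `truncate_context` helper below assumes whitespace-separated tokens, which may differ from what the annotator actually counts.

```python
def truncate_context(context: str, max_context_length: int) -> str:
    """Keep at most `max_context_length` whitespace-separated tokens."""
    # Assumption: length is measured in whitespace tokens, not subwords or characters.
    tokens = context.split()
    return " ".join(tokens[:max_context_length])


context = "The spleen was visualized. The splenic vessels were ligated with endoscopic loops."
print(truncate_context(context, 4))
# The spleen was visualized.
```

Anything after the cut is invisible to the model, so a limit that is too small can drop the sentences that contain the answer.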



### `topK`

In [None]:
med_qa = medical.MedicalQuestionAnswering.pretrained("medical_qa_biogpt","en","clinical/models")\
    .setInputCols(["document_question", "document_context"])\
    .setOutputCol("answer")\
    .setQuestionType("long")\
    .setTopK(2)

pipeline = nlp.Pipeline(stages=[document_assembler,
                                med_qa])

medical_qa_biogpt download started this may take some time.
[OK!]


In [None]:
result = pipeline.fit(data).transform(data)

result.selectExpr("document_question.result as Question", "answer.result as Answer").show(truncate=False)

+------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
|Question                                              |Answer                                                                                                          |
+------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
|[Transgastric endoscopic splenectomy: is it possible?]|[per - oral transgastric splenectomy was technically feasible in a porcine model. further studies are necessary]|
+------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
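
The interaction between `doSample`, `topK`, and `randomSeed` can be illustrated with a toy next-token picker. This is a sketch of generic top-k sampling under stated assumptions, not the annotator's code: `pick_next_token` and the probability table are hypothetical.

```python
import random
from typing import Dict, Optional


def pick_next_token(
    probs: Dict[str, float],
    top_k: int = 1,
    do_sample: bool = False,
    seed: Optional[int] = None,
) -> str:
    """Pick the next token greedily, or sample among the `top_k` most likely tokens."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    if not do_sample:
        # Greedy decoding: always take the single most likely token.
        return ranked[0][0]
    candidates = ranked[:top_k]
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    tokens = [token for token, _ in candidates]
    weights = [p for _, p in candidates]
    return rng.choices(tokens, weights=weights, k=1)[0]


probs = {"feasible": 0.6, "possible": 0.3, "unclear": 0.1}

print(pick_next_token(probs, top_k=2))  # greedy, prints: feasible
sampled = pick_next_token(probs, top_k=2, do_sample=True, seed=42)
print(sampled in {"feasible", "possible"})  # True
```

In this sketch `top_k` has no effect while `do_sample` is `False`. If the annotator behaves the same way, changing `topK` alone with `doSample` left at its default would not be expected to change the generated answer.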



### `noRepeatNgramSize`

In [None]:
med_qa = medical.MedicalQuestionAnswering.pretrained("medical_qa_biogpt","en","clinical/models")\
    .setInputCols(["document_question", "document_context"])\
    .setOutputCol("answer")\
    .setQuestionType("long")\
    .setNoRepeatNgramSize(2)

pipeline = nlp.Pipeline(stages=[document_assembler,
                                med_qa])

medical_qa_biogpt download started this may take some time.
[OK!]


In [None]:
result = pipeline.fit(data).transform(data)

result.selectExpr("document_question.result as Question", "answer.result as Answer").show(truncate=False)

+------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------+
|Question                                              |Answer                                                                                                                       |
+------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------+
|[Transgastric endoscopic splenectomy: is it possible?]|[per oral endoscopic transgastrically assisted splenectomy is technically feasible in pigs. further studies are necessary to]|
+------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------+
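
The mechanism behind an n-gram repetition constraint can be sketched as follows. This is a generic illustration of no-repeat n-gram blocking, not the annotator's implementation, and `banned_next_tokens` is a hypothetical helper: given the tokens generated so far, it returns the tokens that would complete an n-gram that has already appeared.

```python
from typing import List, Set


def banned_next_tokens(generated: List[str], n: int) -> Set[str]:
    """Tokens that would complete an n-gram already present in `generated`."""
    if n <= 0 or len(generated) < n:
        return set()
    # The last n-1 tokens form the prefix of the n-gram being built next.
    prefix = tuple(generated[len(generated) - (n - 1):]) if n > 1 else ()
    banned: Set[str] = set()
    for i in range(len(generated) - n + 1):
        ngram = generated[i:i + n]
        if tuple(ngram[:-1]) == prefix:
            banned.add(ngram[-1])
    return banned


generated = ["further", "studies", "are", "further"]
# "further studies" has already occurred, so "studies" may not follow "further" again.
print(banned_next_tokens(generated, n=2))
# {'studies'}
```

A decoder that applies this rule removes the banned tokens from the candidates at each step, which is what prevents the model from looping over the same phrase.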



### `customPrompt`

In [None]:
med_qa = medical.MedicalQuestionAnswering.pretrained("medical_qa_biogpt","en","clinical/models")\
    .setInputCols(["document_question", "document_context"])\
    .setOutputCol("answer")\
    .setCustomPrompt("CONTEXT")

pipeline = nlp.Pipeline(stages=[document_assembler,
                                med_qa])

medical_qa_biogpt download started this may take some time.
[OK!]


In [None]:
result = pipeline.fit(data).transform(data)

result.selectExpr("document_question.result as Question", "answer.result as Result").show(truncate=False)

+------------------------------------------------------+------------------------------------------------------------------------------------------------------------+
|Question                                              |Result                                                                                                      |
+------------------------------------------------------+------------------------------------------------------------------------------------------------------------+
|[Transgastric endoscopic splenectomy: is it possible?]|[context: to evaluate the effect of a short - term exercise program on the quality of life of breast cancer]|
+------------------------------------------------------+------------------------------------------------------------------------------------------------------------+



In [None]:
med_qa = medical.MedicalQuestionAnswering.pretrained("medical_qa_biogpt","en","clinical/models")\
    .setInputCols(["document_question", "document_context"])\
    .setOutputCol("answer")\
    .setCustomPrompt("QUESTION")

pipeline = nlp.Pipeline(stages=[document_assembler,
                                med_qa])

medical_qa_biogpt download started this may take some time.
[OK!]


In [None]:
result = pipeline.fit(data).transform(data)

result.selectExpr("document_question.result as Question", "answer.result as Result").show(truncate=False)

+------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------+
|Question                                              |Result                                                                                                                           |
+------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------+
|[Transgastric endoscopic splenectomy: is it possible?]|[question: does the timing of adjuvant therapy affect survival in patients with resected pancreatic cancer? context: the optimal]|
+------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------+
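
The parameter list describes `{QUESTION}` and `{CONTEXT}` as the variables available in a custom prompt template. The sketch below shows how such a template would presumably be filled before it reaches the model. It is an assumption-based illustration: `fill_prompt` and the example template wording are hypothetical, and the annotator's actual substitution logic may differ.

```python
def fill_prompt(template: str, question: str, context: str) -> str:
    """Substitute the {QUESTION} and {CONTEXT} placeholders in a prompt template."""
    return template.replace("{QUESTION}", question).replace("{CONTEXT}", context)


question = "Transgastric endoscopic splenectomy: is it possible?"
context = "Endoscopic splenectomy was performed on six pigs."

# Hypothetical template wording; only the two placeholders come from the parameter list.
template = "question: {QUESTION} context: {CONTEXT} answer:"
print(fill_prompt(template, question, context))

# A bare word without braces contains no placeholder and passes through unchanged.
print(fill_prompt("CONTEXT", question, context))
# CONTEXT
```

Under this sketch, a template such as `"CONTEXT"` carries neither the question nor the context to the model, which may be why the two results above are unrelated to the input. A template that includes both `{QUESTION}` and `{CONTEXT}` in braces is likely closer to the intended use.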

