![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/streamlit_notebooks/healthcare/MEDICAL_QUESTION_ANSWERING.ipynb)

# QUESTION ANSWERING

In [None]:
# Installing pyspark and spark-nlp
! pip install --upgrade -q pyspark==3.1.2 spark-nlp==4.2.4

In [2]:
import sparknlp

from sparknlp.base import *
from sparknlp.annotator import *

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.ml import Pipeline, PipelineModel

# gpu=True requests the GPU build of Spark NLP; use sparknlp.start() on a CPU-only machine.
spark = sparknlp.start(gpu=True)

spark

## Load Sample Data

In [3]:
!wget https://raw.githubusercontent.com/vgupta123/sumpubmed/master/text/text_10000.txt
!wget https://raw.githubusercontent.com/vgupta123/sumpubmed/master/text/text_10001.txt
!wget https://raw.githubusercontent.com/vgupta123/sumpubmed/master/text/text_10002.txt
!wget https://raw.githubusercontent.com/vgupta123/sumpubmed/master/text/text_10003.txt
!wget https://raw.githubusercontent.com/vgupta123/sumpubmed/master/text/text_10004.txt

--2023-02-06 23:21:35--  https://raw.githubusercontent.com/vgupta123/sumpubmed/master/text/text_10000.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.108.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 17087 (17K) [text/plain]
Saving to: ‘text_10000.txt’


2023-02-06 23:21:35 (30.4 MB/s) - ‘text_10000.txt’ saved [17087/17087]

--2023-02-06 23:21:35--  https://raw.githubusercontent.com/vgupta123/sumpubmed/master/text/text_10001.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 14588 (14K) [text/plain]
Saving to: ‘text_10001.txt’


2023-02-06 23:21:35 (31.7 MB/s) - ‘text_10001.
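
The `wget` calls above differ only in the file index, so the same downloads can also be driven from Python with the standard library. The following is a minimal sketch, assuming the repository layout shown in the URLs above; `sumpubmed_url` and `download_samples` are hypothetical helper names introduced here for illustration, not part of any library.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Base URL copied from the wget commands above.
BASE_URL = "https://raw.githubusercontent.com/vgupta123/sumpubmed/master/text"


def sumpubmed_url(index):
    """Return the raw GitHub URL for text_<index>.txt (hypothetical helper)."""
    return f"{BASE_URL}/text_{index}.txt"


def download_samples(indices, target_dir="."):
    """Download each file unless it already exists; return the local paths."""
    paths = []
    for index in indices:
        path = Path(target_dir) / f"text_{index}.txt"
        if not path.exists():
            urlretrieve(sumpubmed_url(index), str(path))
        paths.append(path)
    return paths


# Usage (requires network access):
# download_samples(range(10000, 10005))
```

Skipping files that already exist keeps re-runs of the cell from creating duplicate copies such as `text_10000.txt.1`, which is what `wget` does by default when the target file is already present.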

## English RoBertaForQuestionAnswering (from nlpunibo)


In [4]:
new_text = []
questions = {0: ["What is the most prevalent cause of human dermatophytoses?", 
                 "How does T. Rubrum respond to the the environmental pH?",
                 "What does trichophyton rubrum colonize?"],
             1: ["Which organ detects pheromones?", 
                 "What do the results indicate that v1r functions as?", 
                 "In which animals are putative functional v2r genes present?"],
             2: ["Which cell types are able to express nitric oxide?", 
                 "What can the release of NO restrict?", 
                 "Melioidosis is an emerging infectious disease in which areas?"],
             3: ["What mediates the exchange of GTP for GDP?", 
                 "How are ras-like gtpases turned on?", 
                 "When are ras-like gtpases turned off?"],
             4: ["What is used to study gene function in vivo?", 
                 "What is constitutive gene deletion lethal for?", 
                 "Why is the deletion of the gene of interest frequently incomplete?"]}
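
# Pair each downloaded text with its three questions (one row per question).
for i in range(5):
    with open(f'text_1000{i}.txt', 'r') as f:
        pbmed = f.read()
    # Drop the first 11 characters of each file before using it as context.
    context = pbmed[11:]
    for x in questions[i]:
        new_text.append([context, x])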

example = spark.createDataFrame(new_text).toDF("context", "question")
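
The fixed slice `pbmed[11:]` assumes every file starts with the same 11-character prefix. A slightly more defensive variant strips a named prefix only when it is actually present. The sketch below uses toy strings rather than the downloaded files; `strip_prefix`, `build_rows`, and the `"HEADER: "` prefix are hypothetical and exist only for illustration.

```python
def strip_prefix(text, prefix):
    """Remove `prefix` from the start of `text` if present, else return `text` unchanged."""
    return text[len(prefix):] if text.startswith(prefix) else text


def build_rows(contexts, questions, prefix=""):
    """Pair each context with each of its questions as [context, question] rows."""
    rows = []
    for i, context in enumerate(contexts):
        cleaned = strip_prefix(context, prefix)
        for question in questions[i]:
            rows.append([cleaned, question])
    return rows


# Toy data standing in for the downloaded files (not real abstracts).
toy_contexts = ["HEADER: first document text", "second document text"]
toy_questions = {0: ["Q1?", "Q2?"], 1: ["Q3?"]}
rows = build_rows(toy_contexts, toy_questions, prefix="HEADER: ")
print(rows)
```

A list built this way has the same shape as `new_text`, so it can be passed to `spark.createDataFrame(...).toDF("context", "question")` in the same manner.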

In [5]:
example.show(truncate=40)

+----------------------------------------+----------------------------------------+
|                                 context|                                question|
+----------------------------------------+----------------------------------------+
|trichophyton rubrum is a cosmopolitan...|What is the most prevalent cause of h...|
|trichophyton rubrum is a cosmopolitan...|How does T. Rubrum respond to the the...|
|trichophyton rubrum is a cosmopolitan...| What does trichophyton rubrum colonize?|
|pheromones are chemical substances th...|         Which organ detects pheromones?|
|pheromones are chemical substances th...|What do the results indicate that v1r...|
|pheromones are chemical substances th...|In which animals are putative functio...|
|nitric oxide  is a free radical molec...|Which cell types are able to express ...|
|nitric oxide  is a free radical molec...|    What can the release of NO restrict?|
|nitric oxide  is a free radical molec...|Melioidosis is an emerging infecti

In [6]:
document_assembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

answer_roberta = RoBertaForQuestionAnswering.pretrained("roberta_qa_nlpunibo_roberta","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer_roberta") \
    .setCaseSensitive(True) \
    .setMaxSentenceLength(512)

answer_distilbert = DistilBertForQuestionAnswering.pretrained("distilbert_qa_transformers_qa","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer_distilbert") \
    .setCaseSensitive(True) \
    .setMaxSentenceLength(512)

answer_bert = BertForQuestionAnswering.pretrained("bert_qa_spanbert_recruit_qa","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer_bert") \
    .setCaseSensitive(True) \
    .setMaxSentenceLength(512)

pipeline = Pipeline().setStages([
    document_assembler,
    answer_roberta,
    answer_distilbert,
    answer_bert,
])

roberta_qa_nlpunibo_roberta download started this may take some time.
Approximate size to download 442 MB
[OK!]
distilbert_qa_transformers_qa download started this may take some time.
Approximate size to download 232.8 MB
[OK!]
bert_qa_spanbert_recruit_qa download started this may take some time.
Approximate size to download 1.1 GB
[OK!]


In [7]:
result = pipeline.fit(example).transform(example)

In [8]:
result.select('question', 'answer_roberta.result').show(truncate=100)

+------------------------------------------------------------------+-----------------------------------------------------------------+
|                                                          question|                                                           result|
+------------------------------------------------------------------+-----------------------------------------------------------------+
|        What is the most prevalent cause of human dermatophytoses?|                                            [trichophyton rubrum]|
|           How does T. Rubrum respond to the the environmental pH?|                        [by altering its gene expression profile]|
|                           What does trichophyton rubrum colonize?|                                           [human skin and nails]|
|                                   Which organ detects pheromones?|                                              [vomeronasal organ]|
|               What do the results indicate that v1r f

In [9]:
result.select('question', 'answer_distilbert.result').show(truncate=100)

+------------------------------------------------------------------+----------------------------------------------------------------------------------------------------+
|                                                          question|                                                                                              result|
+------------------------------------------------------------------+----------------------------------------------------------------------------------------------------+
|        What is the most prevalent cause of human dermatophytoses?|                                                                               [trichophyton rubrum]|
|           How does T. Rubrum respond to the the environmental pH?|                                                              [altering its gene expression profile]|
|                           What does trichophyton rubrum colonize?|                                                                              [hum

In [10]:
result.select('question', 'answer_bert.result').show(truncate=100)

+------------------------------------------------------------------+--------------------------------------------------+
|                                                          question|                                            result|
+------------------------------------------------------------------+--------------------------------------------------+
|        What is the most prevalent cause of human dermatophytoses?|                             [trichophyton rubrum]|
|           How does T. Rubrum respond to the the environmental pH?|         [by altering its gene expression profile]|
|                           What does trichophyton rubrum colonize?|                            [human skin and nails]|
|                                   Which organ detects pheromones?|                               [vomeronasal organ]|
|               What do the results indicate that v1r functions as?|                              [pheromone receptor]|
|       In which animals are putative fu
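
The three models largely agree on the answers shown above and differ mainly in small details such as a leading "by". One way to summarize that agreement is to normalize each answer and take a majority vote. The following is a minimal sketch in plain Python, assuming the answers have already been collected from the `result` DataFrame into Python strings; `normalize` and `majority_answer` are hypothetical helpers, and the list of ignored leading words is an arbitrary choice for illustration.

```python
from collections import Counter

# Leading words ignored when comparing answers (an illustrative assumption).
LEADING_STOPWORDS = {"by", "the", "a", "an"}


def normalize(answer):
    """Lowercase, collapse whitespace, and drop leading stop words."""
    tokens = answer.lower().split()
    while tokens and tokens[0] in LEADING_STOPWORDS:
        tokens.pop(0)
    return " ".join(tokens)


def majority_answer(answers):
    """Return (normalized answer, votes) for the most common answer."""
    counts = Counter(normalize(a) for a in answers)
    return counts.most_common(1)[0]


# Answers to the second question, taken from the outputs above.
answers = [
    "by altering its gene expression profile",  # answer_roberta
    "altering its gene expression profile",     # answer_distilbert
    "by altering its gene expression profile",  # answer_bert
]
print(majority_answer(answers))
# ('altering its gene expression profile', 3)
```

A vote count below the number of models flags questions where the models disagree and the answer deserves a manual check.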