# Medical Demo

In this demo, we will create a system that simulates a doctor, answering medical questions from patients, using a large medical knowledge base.

First, let's download a dataset of medical information. For our purposes, we use the [ChatDoctor Instruction Dataset](https://huggingface.co/datasets/LinhDuong/chatdoctor-200k).

In [3]:
from datasets import load_dataset

In [4]:
data_url = "https://huggingface.co/datasets/LinhDuong/chatdoctor-200k/resolve/main/chatdoctor200k.json"

In [5]:
data_save_path = "medical_data"

In [6]:
import os
os.makedirs(data_save_path, exist_ok=True)

In [9]:
!cd {data_save_path} && wget {data_url}

Will not apply HSTS. The HSTS database must be a regular and non-world-writable file.
ERROR: could not open HSTS store at '/store/.wget-hsts'. HSTS will be disabled.
--2023-07-16 12:50:04--  https://huggingface.co/datasets/LinhDuong/chatdoctor-200k/resolve/main/chatdoctor200k.json
Resolving proxy-chain.intel.com (proxy-chain.intel.com)... 10.24.221.149, 10.24.221.149
Connecting to proxy-chain.intel.com (proxy-chain.intel.com)|10.24.221.149|:912... connected.
Proxy request sent, awaiting response... 302 Found
Location: https://cdn-lfs.huggingface.co/repos/95/6f/956f15a7b1e421e62ea685edd9e63f4f6a1a04b04f84b2c1640d2316dec510d2/818e03761c948d74feb19f781edf28c4637dfe45266e0dc921d1a2dca6f956de?response-content-disposition=attachment%3B+filename*%3DUTF-8%27%27chatdoctor200k.json%3B+filename%3D%22chatdoctor200k.json%22%3B&response-content-type=application%2Fjson&Expires=1689760204&Policy=eyJTdGF0ZW1lbnQiOlt7IkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTY4OTc2MDIwNH19LCJSZXNvdXJ

In [7]:
import json

In [8]:
with open(f"{data_save_path}/chatdoctor200k.json", "r") as f:
    data = json.load(f)

In [9]:
len(data)

207408

Let's look at an example from the dataset:

In [10]:
print(json.dumps(data[1], indent=4))

{
    "instruction": "If you are a doctor, please answer the medical questions based on the patient's description.",
    "input": "I woke up this morning feeling the whole room is spinning when i was sitting down. I went to the bathroom walking unsteadily, as i tried to focus i feel nauseous. I try to vomit but it wont come out.. After taking panadol and sleep for few hours, i still feel the same.. By the way, if i lay down or sit down, my head do not spin, only when i want to move around then i feel the whole world is spinning.. And it is normal stomach discomfort at the same time? Earlier after i relieved myself, the spinning lessen so i am not sure whether its connected or coincidences.. Thank you doc!",
    "output": "Hi, Thank you for posting your query. The most likely cause for your symptoms is benign paroxysmal positional vertigo (BPPV), a type of peripheral vertigo. In this condition, the most common symptom is dizziness or giddiness, which is made worse with movements. Accomp

For this demo, let's take a subset: the first 1,000 documents.

In [11]:
data = data[:1000]
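
Before building documents from these records, it can help to confirm that every record carries the three fields seen in the example above. The following is a minimal sketch, assuming each record is a dict whose `instruction`, `input`, and `output` values are non-empty strings. The helper name `find_malformed_records` is hypothetical and not part of the dataset or of Haystack.

```python
REQUIRED_KEYS = ("instruction", "input", "output")

def find_malformed_records(records):
    """Return the indices of records that miss a required key or hold an empty value."""
    bad = []
    for i, record in enumerate(records):
        for key in REQUIRED_KEYS:
            value = record.get(key)
            # A field counts as malformed if it is absent, not a string, or blank.
            if not isinstance(value, str) or not value.strip():
                bad.append(i)
                break
    return bad

# Toy records (made up for illustration, not taken from the dataset).
sample = [
    {"instruction": "Answer the question.", "input": "I feel dizzy.", "output": "Please rest."},
    {"instruction": "Answer the question.", "input": "", "output": "Please rest."},
    {"instruction": "Answer the question.", "output": "Please rest."},
]
print(find_malformed_records(sample))  # [1, 2]
```

In the notebook, calling `find_malformed_records(data)` would list any records worth dropping before they are turned into documents.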

In [12]:
from haystack import Document

We can transform the dataset into a document collection, where each document has the patient's question and the doctor's answer as the content.

In [13]:
from tqdm import tqdm

In [14]:
def get_document_text(item):
    return f"A patient asked: {item['input']} A doctor answered: {item['output']}"

document_collection = [Document(content=get_document_text(item)) for item in tqdm(data)]

100%|██████████| 1000/1000 [00:00<00:00, 5963.06it/s]


Now, we can store our documents in an index. For this example, we will use a simple InMemoryDocumentStore. 

In [20]:
from haystack.document_stores import InMemoryDocumentStore

data_store = InMemoryDocumentStore(use_bm25=True)

Next, we write the documents to the index:

In [21]:
data_store.write_documents(document_collection)

Updating BM25 representation...:   0%|          | 0/1000 [00:00<?, ? docs/s]

In [22]:
data_store.get_document_count()

1000

On top of the data store, we define a retriever that fetches documents using the BM25 algorithm.

In [23]:
from haystack.nodes.retriever import BM25Retriever

retriever = BM25Retriever(document_store = data_store)
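
To give a feel for what BM25 ranking does, here is a minimal pure-Python sketch of one common Okapi BM25 variant. It is not the implementation Haystack uses: the whitespace tokenizer, the IDF formula, the parameter values `k1=1.5` and `b=0.75`, and the function name `bm25_scores` are all assumptions made for illustration.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with a common Okapi BM25 variant."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            tf = term_freq[term]
            # Term frequency saturates with k1 and is normalized by document length via b.
            norm = tf + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores

# Toy corpus (made up for illustration).
docs = [
    "the room is spinning and i feel nauseous",
    "i have a sore throat and a mild fever",
    "my knee hurts when i climb stairs",
]
scores = bm25_scores("room spinning", docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(best)  # 0
```

Documents that share no term with the query score zero, which is why a purely lexical retriever benefits from a second, semantic ranking stage.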

Now that we have stored the documents, we can search over them.
Let's take the retrieval stage further and add a deep cross-encoder to rerank the retrieved documents.

In [25]:
from haystack.nodes.ranker import SentenceTransformersRanker

reranker = SentenceTransformersRanker(
    batch_size= 32,
    model_name_or_path= "cross-encoder/ms-marco-MiniLM-L-6-v2",
    use_gpu= False
)
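
The retrieve-then-rerank pattern can be sketched without any model: a cheap scorer narrows the collection to a candidate set, and a more precise (and more expensive) scorer orders only those candidates. In this demo the cheap stage is BM25 and the precise stage is the cross-encoder. In the sketch below, `overlap_count` and `jaccard` are toy stand-ins for them, and all function names are hypothetical.

```python
def retrieve_then_rerank(query, docs, cheap_score, precise_score, retrieve_k, rerank_k):
    """Keep the retrieve_k best docs by cheap_score, then the rerank_k best of those by precise_score."""
    candidates = sorted(docs, key=lambda d: cheap_score(query, d), reverse=True)[:retrieve_k]
    return sorted(candidates, key=lambda d: precise_score(query, d), reverse=True)[:rerank_k]

def overlap_count(query, doc):
    # Stand-in for BM25: number of query words that occur in the document.
    doc_words = set(doc.lower().split())
    return sum(word in doc_words for word in query.lower().split())

def jaccard(query, doc):
    # Stand-in for a cross-encoder: shared words divided by the size of the combined vocabulary.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q | d)

# Toy corpus (made up for illustration).
docs = [
    "room spinning",
    "the room is spinning and i also have a long list of unrelated complaints",
    "sore throat and fever",
    "spinning class schedule",
]
top = retrieve_then_rerank("room spinning", docs, overlap_count, jaccard, retrieve_k=3, rerank_k=1)
print(top)  # ['room spinning']
```

The first stage cannot separate the two documents that both contain "room" and "spinning"; the second stage prefers the one that is more focused on the query.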



After the reranker, we have the final set of relevant documents for our query.
Let's use a large language model to answer the question, using the given documents.

First, we create the prompt model that will answer our question. In our case, we use [Flan-ShareGPT-XL](https://huggingface.co/declare-lab/flan-sharegpt-xl), a model from the Flan-Alpaca family.

In [26]:
from haystack.nodes import PromptModel

prompt_model = PromptModel(
    model_name_or_path= "declare-lab/flan-sharegpt-xl",
    use_gpu= True,
    model_kwargs= dict(
      model_max_length= 100000,
      load_in_8bit=True,
      device_map= {"": 6}  # place the whole model on GPU index 6; adjust to your hardware
    )
)

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Now, let's define a prompt template for the model to use. The `{query}` and `{join(documents)}` expressions are placeholders for
the question and the retrieved documents, respectively.

In [27]:
prompt_template = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
You are a medical doctor, giving medical advice to patients. 
Give a diagnosis to the patient that asked the Question below.
Use only the information provided in the Input to respond.
Do not use any prior knowledge to respond.

### Input:
{join(documents)}

Question: {query}

### Response:"""
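
To make the role of the two placeholders concrete, here is a simplified sketch of how such a template could be filled. This is not Haystack's implementation: the helper `fill_prompt`, the single-space delimiter, and the shortened `toy_template` are assumptions made for illustration.

```python
import re

def fill_prompt(template, query, documents, delimiter=" "):
    """Substitute the joined documents and the question into the template in a single pass."""
    values = {
        "{join(documents)}": delimiter.join(documents),
        "{query}": query,
    }
    # One regex pass, so placeholder-like text inside a document is never substituted again.
    pattern = re.compile("|".join(re.escape(key) for key in values))
    return pattern.sub(lambda match: values[match.group(0)], template)

# Shortened stand-in for the full template above.
toy_template = "### Input:\n{join(documents)}\n\nQuestion: {query}\n\n### Response:"
filled = fill_prompt(
    toy_template,
    query="Why is the room spinning?",
    documents=["Doc one.", "Doc two."],
)
print(filled)
```

The model therefore sees the retrieved documents as the "Input" section, followed by the patient's question.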

Here, we create the prompt_template object:

In [28]:
from haystack.nodes.prompt.prompt_template import PromptTemplate
from haystack.nodes import AnswerParser

prompt_template_object = PromptTemplate(
    name="lfqa",
    prompt_text=prompt_template,
    output_parser= AnswerParser()
)


To use the prompt model, we define a prompt node that combines the prompt model with the chosen template:

In [29]:
from haystack.nodes.prompt import PromptNode

prompter = PromptNode(
    model_name_or_path= prompt_model,
    default_prompt_template=prompt_template_object
)


Now that we have all our components, let's combine them into a single end-to-end pipeline:

In [30]:
from haystack import Pipeline

pipeline = Pipeline()

pipeline.add_node(component=retriever, name= 'Retriever',inputs= ["Query"])

pipeline.add_node(component=reranker, name= 'Reranker',inputs= ["Retriever"])

pipeline.add_node(component=prompter, name= 'Prompter',inputs= ["Reranker"])

We can now use the pipeline to answer our question:

In [31]:
result = pipeline.run(
    query="I feel the whole room is spinning, what is going on?",
    params = {
        "Retriever": {
            "top_k": 50
        },
        "Reranker": {
            "top_k": 3
        },
        "generation_kwargs": {
            "max_new_tokens": 1000,
        },
    },
)

Both `max_new_tokens` (=1000) and `max_length`(=100) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


Here is the answer:

In [32]:
print(result["answers"][0].answer)

The patient's symptoms of feeling the whole room spinning and experiencing nausea and vomiting are consistent with benign paroxysmal positional vertigo (BPPV). BPPV is a type of peripheral vertigo, which is caused by a problem in the inner ear. The symptoms are usually worse with movements and can be accompanied by nausea and vomiting. Betahistine tablets would help relieve your symptoms. Doing vestibular rehabilitation or adaptation exercises would prevent the recurrence of these symptoms. An ENT evaluation would also help. I hope it helps. Best wishes, Chat Doctor.


And these are the supporting documents:

In [33]:
for doc in result["documents"]:
    print(f"{doc.content}\n\n")

A patient asked: I woke up this morning feeling the whole room is spinning when i was sitting down. I went to the bathroom walking unsteadily, as i tried to focus i feel nauseous. I try to vomit but it wont come out.. After taking panadol and sleep for few hours, i still feel the same.. By the way, if i lay down or sit down, my head do not spin, only when i want to move around then i feel the whole world is spinning.. And it is normal stomach discomfort at the same time? Earlier after i relieved myself, the spinning lessen so i am not sure whether its connected or coincidences.. Thank you doc! A doctor answered: Hi, Thank you for posting your query. The most likely cause for your symptoms is benign paroxysmal positional vertigo (BPPV), a type of peripheral vertigo. In this condition, the most common symptom is dizziness or giddiness, which is made worse with movements. Accompanying nausea and vomiting are common. The condition is due to problem in the ear, and improves in a few days on