Medical ChatBot using RAG with BioMistral Open Source LLM

In [None]:
from google.colab import drive
drive.mount("/content/drive")

Mounted at /content/drive


Mounting Google Drive, where the Healthy Heart PDF is stored

In [None]:
!pip install langchain sentence-transformers chromadb llama-cpp-python langchain_community pypdf

Collecting langchain
  Downloading langchain-0.3.0-py3-none-any.whl.metadata (7.1 kB)
Collecting sentence-transformers
  Downloading sentence_transformers-3.1.1-py3-none-any.whl.metadata (10 kB)
Collecting chromadb
  Downloading chromadb-0.5.7-py3-none-any.whl.metadata (6.8 kB)
Collecting llama-cpp-python
  Downloading llama_cpp_python-0.2.90.tar.gz (63.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 63.8/63.8 MB 11.5 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
  Preparing metadata (pyproject.toml) ... done
Collecting langchain_community
  Downloading langchain_community-0.3.0-py3-none-any.whl.metadata (2.8 kB)
Collecting pypdf
  Downloading pypdf-5.0.0-py3-none-any.whl.metadata (7.4 kB)
Collecting langchain-core<0.4.0,>=0.3.0 (from langchain)
  Downloading langchain_core-0.3.5-py3-none-

Installing the GenAI packages
chromadb as the vector store
langchain for loading and splitting the document
sentence-transformers for generating the embeddings
llama-cpp-python for loading the LLM
pypdf for reading the PDF

In [None]:
from langchain_community.document_loaders import PyPDFDirectoryLoader #reads every PDF in a directory
from langchain.text_splitter import RecursiveCharacterTextSplitter #divides the text into chunks
from langchain_community.embeddings import SentenceTransformerEmbeddings #generates embeddings for the chunks
from langchain_community.vectorstores import Chroma #stores the chunks and their embeddings in a vector store
from langchain_community.llms import LlamaCpp #loads the LLM
from langchain.chains import RetrievalQA, LLMChain #building blocks for the end-to-end application





Loading the document so it can be read and divided into chunks

In [None]:
loader = PyPDFDirectoryLoader("/content/drive/MyDrive/DataHeart/file")
docs = loader.load()

In [None]:
len(docs)

95

In [None]:
docs[1]

Document(metadata={'source': '/content/drive/MyDrive/DataHeart/file/healthyheart.pdf', 'page': 1}, page_content='YOUR GUIDE TO\nA Healthy Heart\nU.S. D EPARTMENT OF HEALTH AND HUMAN SERVICES\nNational Institutes of Health\nNational Heart, Lung, and Blood InstituteNIH Publication No. 06-5269December 2005')

Chunking the data

In [None]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=300, chunk_overlap=50) #at most 300 characters per chunk, with a 50-character overlap
chunks = text_splitter.split_documents(docs)
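
The effect of chunk_size and chunk_overlap can be sketched in plain Python. This is a simplified fixed-window illustration, not the splitter's actual algorithm: RecursiveCharacterTextSplitter prefers to break on separators such as paragraphs and sentences, so its chunks are usually shorter than the limit. The helper name sliding_window_chunks is hypothetical.

```python
def sliding_window_chunks(text, chunk_size=300, chunk_overlap=50):
    """Cut text into fixed-size windows; neighbours share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


# Toy input: 700 characters of repeating digits
sample = "".join(str(i % 10) for i in range(700))
pieces = sliding_window_chunks(sample)
print([len(p) for p in pieces])
```

The overlap means the last 50 characters of one chunk reappear at the start of the next, so a sentence cut at a boundary is still seen whole in at least one chunk.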

In [None]:
len(chunks)

747

In [None]:

chunks[601]

Document(metadata={'source': '/content/drive/MyDrive/DataHeart/file/healthyheart.pdf', 'page': 74}, page_content='How To Choose a Weight-Loss Program\nSome people lose weight on their own, while others like the supportof a structured program. If you decide to participate in a weight-loss program, here are some questions to ask before you join:\nDoes the program provide counseling to help you change your eat-')

Embedding Creation

In [None]:
import os
from getpass import getpass
os.environ['HUGGINGFACEHUB_API_TOKEN'] = getpass("Hugging Face token: ") #prompt for the token instead of hard-coding it in the notebook

In [None]:
embeddings = SentenceTransformerEmbeddings(model_name="NeuML/pubmedbert-base-embeddings") #generating embeddings

Vector Store Creation

In [None]:
vectorstore = Chroma.from_documents(chunks, embeddings) #stores each chunk together with its embedding
#the embedding model is applied to every chunk to create its embedding vector
#lookups use vector similarity search over those embeddings

In [None]:
query = "what are the symptoms for heart disease?"

search_results = vectorstore.similarity_search(query)

In [None]:
search_results

[Document(metadata={'page': 5, 'source': '/content/drive/MyDrive/DataHeart/healthyheart.pdf'}, page_content='■As early as age 45, a man’s risk of heart disease begins to rise significantly. For a woman, risk starts to increase at age 55.\n■Fifty percent of men and 64 percent of women who die suddenlyof heart disease have no previous symptoms of the disease.1Heart Disease: Why Should You Care?Heart Disease:'),
 Document(metadata={'page': 5, 'source': '/content/drive/MyDrive/DataHeart/healthyheart.pdf'}, page_content='■As early as age 45, a man’s risk of heart disease begins to rise significantly. For a woman, risk starts to increase at age 55.\n■Fifty percent of men and 64 percent of women who die suddenlyof heart disease have no previous symptoms of the disease.1Heart Disease: Why Should You Care?Heart Disease:'),
 Document(metadata={'page': 5, 'source': '/content/drive/MyDrive/DataHeart/file/healthyheart.pdf'}, page_content='■As early as age 45, a man’s risk of heart disease begins to

In [None]:
retriever = vectorstore.as_retriever(search_kwargs={'k':2}) #wraps the store in the retriever interface used by chains
#returns only the top two chunks, ranked by nearest-neighbour similarity to the query
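
The top-k lookup can be sketched without any library. This is a minimal illustration, assuming cosine similarity and a brute-force scan; the vector store's actual distance metric and index may differ. The three-dimensional vectors and the names cosine_similarity, top_k and toy_store are made up for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def top_k(query_vec, stored, k=2):
    """Return the texts of the k stored vectors most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in stored]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Toy "vector store": (chunk text, made-up embedding)
toy_store = [
    ("risk factors", [0.9, 0.1, 0.0]),
    ("weight loss", [0.1, 0.9, 0.0]),
    ("cholesterol", [0.7, 0.3, 0.1]),
]
nearest = top_k([1.0, 0.0, 0.0], toy_store, k=2)
print(nearest)
```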

In [None]:
retriever.invoke(query) #invoke replaces the deprecated get_relevant_documents

[Document(metadata={'page': 5, 'source': '/content/drive/MyDrive/DataHeart/healthyheart.pdf'}, page_content='■As early as age 45, a man’s risk of heart disease begins to rise significantly. For a woman, risk starts to increase at age 55.\n■Fifty percent of men and 64 percent of women who die suddenlyof heart disease have no previous symptoms of the disease.1Heart Disease: Why Should You Care?Heart Disease:'),
 Document(metadata={'page': 5, 'source': '/content/drive/MyDrive/DataHeart/file/healthyheart.pdf'}, page_content='■As early as age 45, a man’s risk of heart disease begins to rise significantly. For a woman, risk starts to increase at age 55.\n■Fifty percent of men and 64 percent of women who die suddenlyof heart disease have no previous symptoms of the disease.1Heart Disease: Why Should You Care?Heart Disease:')]
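
The two results above contain the same text under two different source paths, which suggests (an assumption, not confirmed by the notebook) that the PDF was indexed more than once. A minimal sketch of dropping such repeats before they reach the prompt follows; Doc is a hypothetical stand-in for the retrieved document type and drop_duplicate_chunks is an illustrative helper.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def drop_duplicate_chunks(docs):
    """Keep the first document for each distinct page_content, preserving order."""
    seen = set()
    unique = []
    for doc in docs:
        if doc.page_content not in seen:
            seen.add(doc.page_content)
            unique.append(doc)
    return unique


hits = [
    Doc("risk begins to rise at age 45", {"source": "DataHeart/healthyheart.pdf"}),
    Doc("risk begins to rise at age 45", {"source": "DataHeart/file/healthyheart.pdf"}),
    Doc("how to choose a weight-loss program", {"source": "DataHeart/file/healthyheart.pdf"}),
]
unique_hits = drop_duplicate_chunks(hits)
print(len(hits), "->", len(unique_hits))
```

With k=2, duplicates otherwise take up both retrieved slots, so the model sees only one distinct passage.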

LLM Loading

In [None]:
!pip install transformers torch



In [None]:
model_dir = '/content/drive/MyDrive/DataHeart'

In [None]:
import os
import requests

# Define the model repository and the destination folder in Google Drive
model_repo = "MaziyarPanahi/BioMistral-7B-GGUF"
drive_path = "/content/drive/MyDrive/DataHeart"  # same folder the model is loaded from later

# Create the directory if it doesn't exist
os.makedirs(drive_path, exist_ok=True)

# Files to download from the Hugging Face repository
# You might need to adjust these names based on the files actually available there
file_names = ["BioMistral-7B.Q2_K.gguf"]  # Add other necessary files if required

for file_name in file_names:
    url = f"https://huggingface.co/{model_repo}/resolve/main/{file_name}"
    # Stream the download so the large model file is never held in memory at once
    with requests.get(url, stream=True, timeout=60) as response:
        response.raise_for_status()  # fail loudly instead of saving an error page
        with open(os.path.join(drive_path, file_name), 'wb') as f:
            for block in response.iter_content(chunk_size=1024 * 1024):
                f.write(block)

print("Model files downloaded to Drive.")


Model files downloaded to Drive.
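
A failed download can leave an HTML error page saved under the model's file name. A quick sanity check, sketched here under the assumption that a valid GGUF file starts with the four magic bytes GGUF, is to read the first bytes before loading. The helper looks_like_gguf and the temporary demo files are illustrative only.

```python
import os
import tempfile

GGUF_MAGIC = b"GGUF"  # assumed file signature at offset 0


def looks_like_gguf(path):
    """Return True if the file begins with the GGUF magic bytes."""
    with open(path, "rb") as f:
        return f.read(len(GGUF_MAGIC)) == GGUF_MAGIC


# Demo on two throwaway files instead of the real model
with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "model.gguf")
    bad = os.path.join(tmp, "error.html")
    with open(good, "wb") as f:
        f.write(GGUF_MAGIC + b"\x00" * 16)
    with open(bad, "wb") as f:
        f.write(b"<!DOCTYPE html>")
    good_ok = looks_like_gguf(good)
    bad_ok = looks_like_gguf(bad)

print(good_ok, bad_ok)
```

In the notebook the same check could be pointed at the downloaded file in drive_path before the LlamaCpp cell runs.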


In [None]:
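llm=LlamaCpp(model_path="/content/drive/MyDrive/DataHeart/BioMistral-7B.Q2_K.gguf",
             temperature=0.2, #low temperature keeps the answers close to deterministic
             max_tokens=2048, #upper bound on the number of generated tokens
             top_p=1) #no nucleus-sampling cutoff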

llama_model_loader: loaded meta data with 21 key-value pairs and 291 tensors from /content/drive/MyDrive/DataHeart/BioMistral-7B.Q2_K.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = hub
llama_model_loader: - kv   2:                       llama.context_length u32              = 32768
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   4:                          llama.block_count u32              = 32
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 14336
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv   7:                 llama.attent

Generating a response by passing the retrieved context and the query to the LLM

In [None]:

template = """
<|context|>
You are a medical assistant that follows the instructions and generates an accurate response based on the query and the context provided.
Please be truthful and give direct answers.

{context}

<|user|>
{query}

<|assistant|>
"""

In [None]:
from langchain_core.runnables import RunnablePassthrough #passes the query through unchanged
from langchain_core.output_parsers import StrOutputParser #returns the model output as a plain string
from langchain_core.prompts import ChatPromptTemplate #to use the template created

In [None]:
prompt = ChatPromptTemplate.from_template(template)

In [None]:
rag_chain = (
    {"context": retriever, "query": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)#the input question goes to the retriever to fetch context and is also passed through unchanged as the query
#the prompt combines the retrieved context with the query
#the llm generates the response, which StrOutputParser returns as a string
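
What the chain does before the model is called can be sketched in plain Python: join the text of the retrieved chunks, then fill the template slots. Doc, format_docs, build_prompt and the shortened PROMPT string are hypothetical stand-ins, not LangChain's own classes.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document."""
    page_content: str


def format_docs(docs):
    """Join the chunk texts so only readable text reaches the prompt."""
    return "\n\n".join(doc.page_content for doc in docs)


# Shortened version of the template, with the same two slots
PROMPT = "<|context|>\n{context}\n\n<|user|>\n{query}\n\n<|assistant|>\n"


def build_prompt(docs, query):
    """Fill the context and query slots, as the prompt step of the chain does."""
    return PROMPT.format(context=format_docs(docs), query=query)


retrieved = [
    Doc("Smoking raises heart disease risk."),
    Doc("So does high blood pressure."),
]
filled = build_prompt(retrieved, "what are the risk factors?")
print(filled)
```

In the real chain a helper like format_docs could be piped after the retriever (retriever | format_docs), so the prompt receives joined text rather than a printed list of Document objects.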

In [None]:
response = rag_chain.invoke(query)


llama_print_timings:        load time =    2641.79 ms
llama_print_timings:      sample time =      39.47 ms /    73 runs   (    0.54 ms per token,  1849.74 tokens per second)
llama_print_timings: prompt eval time =   25967.04 ms /    68 tokens (  381.87 ms per token,     2.62 tokens per second)
llama_print_timings:        eval time =   37938.91 ms /    72 runs   (  526.93 ms per token,     1.90 tokens per second)
llama_print_timings:       total time =   64012.36 ms /   140 tokens


In [None]:
response

'The symptoms of type 1 diabetes include: (1) Increased thirst and urination, (2) Increased appetite, (3) Weight loss, (4) Blurred vision, (5) Weakness, (6) Irritability, (7) Ketoacidosis, (8) High blood sugar levels.'

Running the query loop as a conversation until the user types exit

In [None]:
while True:
  user_input = input("Input query: ")
  if user_input == 'exit':
    print("Exiting...")
    break #leave the loop without raising SystemExit in the notebook
  if user_input=="":
    continue #ignore empty input and ask again
  result = rag_chain.invoke(user_input)
  print("Answer: ", result)

Input query: what is this document about


Llama.generate: 52 prefix-match hit, remaining 14 prompt tokens to eval

llama_print_timings:        load time =    2641.79 ms
llama_print_timings:      sample time =      68.29 ms /   114 runs   (    0.60 ms per token,  1669.33 tokens per second)
llama_print_timings: prompt eval time =    4671.64 ms /    14 tokens (  333.69 ms per token,     3.00 tokens per second)
llama_print_timings:        eval time =   60838.23 ms /   113 runs   (  538.39 ms per token,     1.86 tokens per second)
llama_print_timings:       total time =   65686.97 ms /   127 tokens


Answer:  This document is about a research study conducted by a team of researchers from different universities in the United States. The study was designed to investigate the effectiveness of a new treatment for a rare genetic disorder called LGMD2A. LGMD2A is a disease that affects muscle and can cause weakness and difficulty with movement. The treatment used in the study was a type of exercise therapy, which involved a combination of exercises aimed at improving strength and mobility. The researchers wanted to know if this treatment would improve symptoms and function in people with LGMD2A.
Input query: what are the risk factors of heart diseases


Llama.generate: 53 prefix-match hit, remaining 16 prompt tokens to eval

llama_print_timings:        load time =    2641.79 ms
llama_print_timings:      sample time =      21.77 ms /    39 runs   (    0.56 ms per token,  1791.13 tokens per second)
llama_print_timings: prompt eval time =    6666.11 ms /    16 tokens (  416.63 ms per token,     2.40 tokens per second)
llama_print_timings:        eval time =   19992.39 ms /    38 runs   (  526.12 ms per token,     1.90 tokens per second)
llama_print_timings:       total time =   26710.05 ms /    54 tokens


Answer:  The risk factors for heart disease include smoking, high blood pressure, high cholesterol levels, overweight and obesity, a poor diet, physical inactivity, and diabetes.
Input query: what is the age where the symptoms become prominent


Llama.generate: 53 prefix-match hit, remaining 17 prompt tokens to eval

llama_print_timings:        load time =    2641.79 ms
llama_print_timings:      sample time =      26.46 ms /    47 runs   (    0.56 ms per token,  1776.27 tokens per second)
llama_print_timings: prompt eval time =    5850.30 ms /    16 tokens (  365.64 ms per token,     2.73 tokens per second)
llama_print_timings:        eval time =   24916.33 ms /    47 runs   (  530.13 ms per token,     1.89 tokens per second)
llama_print_timings:       total time =   30831.34 ms /    63 tokens


Answer:  The symptoms of Alzheimer’s disease typically become apparent in a person’s early 50s, but they can occur as early as in one’s mid-30s or late 40s.
Input query: how blood cholestrol effects heart health


Llama.generate: 52 prefix-match hit, remaining 18 prompt tokens to eval

llama_print_timings:        load time =    2641.79 ms
llama_print_timings:      sample time =      28.51 ms /    50 runs   (    0.57 ms per token,  1754.02 tokens per second)
llama_print_timings: prompt eval time =    6715.14 ms /    18 tokens (  373.06 ms per token,     2.68 tokens per second)
llama_print_timings:        eval time =   26424.28 ms /    49 runs   (  539.27 ms per token,     1.85 tokens per second)
llama_print_timings:       total time =   33208.90 ms /    67 tokens


Answer:  Blood cholesterol is a type of fat that circulates in your bloodstream. It can be measured by blood tests. High levels of blood cholesterol are linked to a higher risk of heart disease and other health problems.
Input query: exit
Exiting...


