BioMistral Medical RAG Chatbot using BioMistral Open Source LLM

In [1]:
#Load the Google Drive
from google.colab import drive
drive.mount("/content/drive")

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [2]:
#Installation of Required Packages
!pip install langchain sentence-transformers chromadb llama-cpp-python langchain_community pypdf



In [3]:
!pip install gradio



In [4]:
#Importing Required Libraries
#1. Loading the Dataset Document and Read the Text from the Document
from langchain_community.document_loaders import PyPDFDirectoryLoader
#2. Divide the Text into Chunks
from langchain.text_splitter import RecursiveCharacterTextSplitter
#3. Generating Embeddings for the Chunks
from langchain_community.embeddings import SentenceTransformerEmbeddings
#4. Store the embeddings into the vector store
from langchain.vectorstores import Chroma
#5. Indexing Completed
#6. LLM Model
from langchain_community.llms import LlamaCpp
#7. Setting Up the Retrieval-Augmented Generation (RAG) Pipeline
from langchain.chains import RetrievalQA, LLMChain
from langchain.prompts import ChatPromptTemplate
from langchain.schema.runnable import RunnablePassthrough
from langchain.schema.output_parser import StrOutputParser
import os
#8. Web Interface Using Gradio
import gradio as gr

In [5]:
# Load the medical documents
loader = PyPDFDirectoryLoader("/content/drive/MyDrive/HeartHealth")
docs = loader.load()

In [6]:
len(docs)

95

In [7]:
docs[5]

Document(metadata={'producer': 'Acrobat Distiller 6.0.1 for Macintosh', 'creator': 'QuarkXPress(tm) 6.5', 'creationdate': '2006-02-16T11:30:29-05:00', 'subject': 'Heart disease', 'author': 'NHLBI', 'keywords': 'heart disease, prevention, risk factors, chd, coronary artery disease, corornary heart disease, cad', 'moddate': '2006-02-23T09:58:15-05:00', 'title': 'Your Guide to A Healthy Heart', 'source': '/content/drive/MyDrive/HeartHealth/healthyheart.pdf', 'total_pages': 95, 'page': 5, 'page_label': '6'}, page_content='If you’re like many people, you may think of heart disease as a\nproblem that happens to other folks. “I feel fine,” you may think,\n“so I have nothing to worry about.” If you’re a woman, you may\nalso believe that being female protects you from heart disease.\nIf you’re a man, you may think you’re not old enough to have a\nserious heart condition.\nWrong on all counts. In the United States, heart disease is the #1\nkiller of both women and men. It affects many people at 

In [8]:
# Split text into chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=300, chunk_overlap=50)
chunks = text_splitter.split_documents(docs)

In [9]:
len(chunks)

585

In [10]:
chunks[543]

Document(metadata={'producer': 'Acrobat Distiller 6.0.1 for Macintosh', 'creator': 'QuarkXPress(tm) 6.5', 'creationdate': '2006-02-16T11:30:29-05:00', 'subject': 'Heart disease', 'author': 'NHLBI', 'keywords': 'heart disease, prevention, risk factors, chd, coronary artery disease, corornary heart disease, cad', 'moddate': '2006-02-23T09:58:15-05:00', 'title': 'Your Guide to A Healthy Heart', 'source': '/content/drive/MyDrive/HeartHealth/healthyheart.pdf', 'total_pages': 95, 'page': 86, 'page_label': '87'}, page_content='82Your Guide to a Healthy Heart\nAspirin:Take With Caution\nThis well-known “wonder drug” is an antiplatelet\nmedicine that can help to lower the risk of a heart\nattack or stroke for those who have already had\none. Aspirin also can help to keep arteries open')

In [11]:
os.environ['HUGGINGFACEHUB_API_TOKEN'] = "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"

In [12]:
# Generate embeddings
embeddings = SentenceTransformerEmbeddings(model_name="NeuML/pubmedbert-base-embeddings")

  embeddings = SentenceTransformerEmbeddings(model_name="NeuML/pubmedbert-base-embeddings")
The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


In [14]:
# Perform a semantic search in the vector store to find relevant text chunks
# related to the query "Who is at risk of Heart disease?"
# The search retrieves document chunks that are most similar in meaning to the query.
query = "Who is at risk of Heart disease?"
search_results = vectorstore.similarity_search(query)

# Display the retrieved document chunks containing relevant information.
search_results

[Document(metadata={'author': 'NHLBI', 'creationdate': '2006-02-16T11:30:29-05:00', 'creator': 'QuarkXPress(tm) 6.5', 'keywords': 'heart disease, prevention, risk factors, chd, coronary artery disease, corornary heart disease, cad', 'moddate': '2006-02-23T09:58:15-05:00', 'page': 8, 'page_label': '9', 'producer': 'Acrobat Distiller 6.0.1 for Macintosh', 'source': '/content/drive/MyDrive/HeartHealth/healthyheart.pdf', 'subject': 'Heart disease', 'title': 'Your Guide to A Healthy Heart', 'total_pages': 95}, page_content='4\nWho Is at Risk?\nRisk factors are conditions or habits that make a person more likely\nto develop a disease. They can also increase the chances that an\nexisting disease will get worse. Important risk factors for heart dis-\nease that you can do something about are cigarette smoking, high'),
 Document(metadata={'author': 'NHLBI', 'creationdate': '2006-02-16T11:30:29-05:00', 'creator': 'QuarkXPress(tm) 6.5', 'keywords': 'heart disease, prevention, risk factors, chd, co

In [15]:
# Step 1: Create a vector store (ChromaDB) from document chunks and their embeddings
vectorstore = Chroma.from_documents(chunks, embeddings)

# Step 2: Convert the vector store into a retriever
# The retriever will return the top 'k' most relevant document chunks for a given query
retriever = vectorstore.as_retriever(search_kwargs={'k': 3})

# Step 3: Retrieve the top 3 most relevant documents for the query
# Display the retrieved document chunks
retriever.get_relevant_documents(query)

  retriever.get_relevant_documents(query)


[Document(metadata={'author': 'NHLBI', 'creationdate': '2006-02-16T11:30:29-05:00', 'creator': 'QuarkXPress(tm) 6.5', 'keywords': 'heart disease, prevention, risk factors, chd, coronary artery disease, corornary heart disease, cad', 'moddate': '2006-02-23T09:58:15-05:00', 'page': 8, 'page_label': '9', 'producer': 'Acrobat Distiller 6.0.1 for Macintosh', 'source': '/content/drive/MyDrive/HeartHealth/healthyheart.pdf', 'subject': 'Heart disease', 'title': 'Your Guide to A Healthy Heart', 'total_pages': 95}, page_content='4\nWho Is at Risk?\nRisk factors are conditions or habits that make a person more likely\nto develop a disease. They can also increase the chances that an\nexisting disease will get worse. Important risk factors for heart dis-\nease that you can do something about are cigarette smoking, high'),
 Document(metadata={'author': 'NHLBI', 'creationdate': '2006-02-16T11:30:29-05:00', 'creator': 'QuarkXPress(tm) 6.5', 'keywords': 'heart disease, prevention, risk factors, chd, co

In [16]:
# Load Llama model
llm = LlamaCpp(
    model_path="/content/drive/MyDrive/BioMistral-7B.Q4_K_M.gguf",
    temperature=0.2,
    max_tokens=2048,
    top_p=1
)

llama_model_loader: loaded meta data with 21 key-value pairs and 291 tensors from /content/drive/MyDrive/BioMistral-7B.Q4_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = hub
llama_model_loader: - kv   2:                       llama.context_length u32              = 32768
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   4:                          llama.block_count u32              = 32
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 14336
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv   7:                 llama.attention.head

In [17]:
# Define Prompt Template
template = """
<|context|>
You are Medical Assistant that follows the instructions and generate the accurate response based on the query and the context provided.
Be truthful and give direct answers.
</s>
{query}
</s>
<|assistant|>
"""

prompt = ChatPromptTemplate.from_template(template)

In [18]:
# Create a Retrieval-Augmented Generation (RAG) pipeline
# Step 1: Retrieve relevant document chunks using the retriever
# Step 2: Pass the query through as-is using RunnablePassthrough()
# Step 3: Format the retrieved context and query into a prompt using the prompt template
# Step 4: Send the formatted prompt to the LLM for generating a response
# Step 5: Parse the model's output into a string for readability

rag_chain = (
    {"context": retriever, "query": RunnablePassthrough()}  # Retrieve documents and pass query
    | prompt  # Format prompt with retrieved context
    | llm  # Generate response using LLM
    | StrOutputParser()  # Convert output into a readable string
)

In [19]:
response = rag_chain.invoke(query)# Get the final response from the RAG system

llama_perf_context_print:        load time =   27759.81 ms
llama_perf_context_print: prompt eval time =   27759.67 ms /    65 tokens (  427.07 ms per token,     2.34 tokens per second)
llama_perf_context_print:        eval time =   31581.59 ms /    49 runs   (  644.52 ms per token,     1.55 tokens per second)
llama_perf_context_print:       total time =   59391.20 ms /   114 tokens


In [20]:
response

'The risk factors for heart disease include high blood pressure, high cholesterol levels, smoking, family history of heart disease, diabetes, overweight and obesity. People who have these risk factors are more likely to develop heart disease.'

In [21]:
def medical_chatbot(query):
    """
    Function to process user queries using the RAG-based medical chatbot.

    Args:
        query (str): The user's input question.

    Returns:
        str: The generated response from the chatbot.
    """

    # Step 1: Validate the input query
    if not query.strip():
        return "⚠️ Please enter a valid query."

    # Step 2: Process the query through the RAG pipeline
    response = rag_chain.invoke(query)

    # Step 3: Return the chatbot's response
    return response

In [22]:
# Create Gradio Interface
chat_interface = gr.Interface(
    fn=medical_chatbot,
    inputs=gr.Textbox(label="Enter your medical question"),
    outputs=gr.Textbox(label="AI Response"),
    title="🩺 AI-Powered Heart Health Assistant",
    description="Ask any heart health-related questions, and I'll provide medical insights based on trusted sources.",
    theme="default",
    flagging_mode="never"
)

In [23]:
# Launch the app
chat_interface.launch(share=True)

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://3deff09d5975e32d48.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)


