# Qdrant DSPy Medical Chatbot

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/qdrant/examples/blob/master/DSPy-medical-bot/medical_bot_DSPy_Qdrant.ipynb)

This notebook demonstrates how to build a chatbot grounded on medical data with simple guardrails to prevent the bot from responding to non-medical questions.

### Requirements
- Qdrant
- DSPy
- A free [Qdrant Cloud account](https://qdrant.tech/cloud/)

The first step is to install all the required packages:

In [1]:
!pip install "qdrant-client[fastembed]" dspy-ai dspy-qdrant


In [5]:
!pip install -U datasets



Import all the modules needed for this project.

In [6]:
from datasets import load_dataset
from google.colab import userdata
import dspy
from dspy_qdrant import QdrantRM
from qdrant_client import QdrantClient, models
from qdrant_client.models import Filter, FieldCondition, MatchValue

# Connect to Qdrant Cloud

Obtain your Qdrant API key and URL from the Qdrant Cloud dashboard and save them as secrets on Colab. If running locally, create a `.env` file and save them there, then modify the part below accordingly.

In [7]:
# userdata.get() takes the NAME of a Colab secret, not its value.
# "QDRANT_URL" and "QDRANT_API_KEY" are example names; use the names you saved.
client = QdrantClient(
    url=userdata.get("QDRANT_URL"),
    api_key=userdata.get("QDRANT_API_KEY"),
    timeout=60.0,
    prefer_grpc=True
)
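
If you are running outside Colab, the credentials can come from environment variables instead of Colab secrets. The following is a minimal sketch, assuming the variables are named `QDRANT_URL` and `QDRANT_API_KEY`; the helper `load_qdrant_settings` is a hypothetical name introduced here for illustration.

```python
import os


def load_qdrant_settings(env=os.environ):
    """Return (url, api_key) read from environment variables.

    QDRANT_URL and QDRANT_API_KEY are assumed names; replace them with
    whatever names you exported or stored in your .env file.
    """
    required = ("QDRANT_URL", "QDRANT_API_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        # Fail early with a clear message instead of a connection error later
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return env["QDRANT_URL"], env["QDRANT_API_KEY"]
```

The returned pair can then be passed as `url=` and `api_key=` to `QdrantClient` in place of the `userdata.get(...)` calls. If you keep the values in a `.env` file, load that file into the environment before calling the helper.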

# Load MIRIAD dataset (sample for demo)

Below, we load a sample of the MIRIAD medical dataset. You can modify this to use the [entire dataset](https://huggingface.co/datasets/miriad/miriad-5.8M), but uploading the vectors will take longer, and Colab does not provide enough resources to store the full download.

In [None]:
ds = load_dataset("mwitiderrick/miriad-1k", split="train")

## Generate embeddings

Next, encode the medical data:

In [None]:
dense_documents = [
    models.Document(text=doc, model="BAAI/bge-small-en")
    for doc in ds['passage_text']
]

colbert_documents = [
    models.Document(text=doc, model="colbert-ir/colbertv2.0")
    for doc in ds['passage_text']
]

# Create collection

Then create a Qdrant collection with the dense and ColBERT vectors. We leave HNSW indexing on for the dense vector but turn it off (`m=0`) for the ColBERT vector, which is used only for reranking. Check out the [How to Effectively Use Multivector Representations in Qdrant for Reranking](https://qdrant.tech/documentation/advanced-tutorials/using-multivector-representations/) tutorial to learn more about this.


In [None]:
collection_name = "medical_QA"

if not client.collection_exists(collection_name):
    client.create_collection(
        collection_name=collection_name,
        vectors_config={
            "dense": models.VectorParams(size=384, distance=models.Distance.COSINE),
            "colbert": models.VectorParams(
                size=128,
                distance=models.Distance.COSINE,
                multivector_config=models.MultiVectorConfig(
                    comparator=models.MultiVectorComparator.MAX_SIM
                ),
                hnsw_config=models.HnswConfigDiff(m=0)  # reranker: no indexing
            )
        }
    )

    # Create payload indexes
    client.create_payload_index(
        collection_name=collection_name,
        field_name="specialty",
        field_schema="keyword",
    )

    client.create_payload_index(
        collection_name=collection_name,
        field_name="year",
        field_schema="integer",
    )

We create payload indexes on the `specialty` and `year` fields because we will filter results from the vector database on them, and payload indexes speed up filtering. Check out our [Indexing guide](https://qdrant.tech/documentation/concepts/indexing/) to learn more.
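
To make the `MAX_SIM` comparator on the `colbert` vector more concrete, here is a minimal pure-Python sketch of the usual ColBERT-style MaxSim score: for each query token vector, take the best cosine similarity against any document token vector, then sum those maxima. The function names and the toy 2-dimensional vectors are illustrative assumptions; Qdrant's own implementation is optimized and may differ in detail.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def max_sim(query_vectors, doc_vectors):
    """Sum, over query token vectors, of the best match among document token vectors."""
    return sum(max(cosine(q, d) for d in doc_vectors) for q in query_vectors)


# Toy multivectors: one small vector per "token"
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # covers both query tokens
doc_b = [[1.0, 0.0], [-1.0, 0.0]]             # covers only the first query token

score_a = max_sim(query, doc_a)
score_b = max_sim(query, doc_b)
```

In this toy case `doc_a` scores higher than `doc_b` because every query token finds a close match in it, which is the behavior that makes late-interaction scores useful for reranking.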

# Batch upload in chunks

ColBERT produces ~1k vectors per point, so we will batch upload the vectors to avoid hitting API limits.

In [None]:
BATCH_SIZE = 3
points_batch = []

for i in range(len(ds['passage_text'])):
    point = models.PointStruct(
        id=i,
        vector={
            "dense": dense_documents[i],
            "colbert": colbert_documents[i]
        },
        payload={
            "passage_text": ds['passage_text'][i],
            "year": ds['year'][i],
            "specialty": ds['specialty'][i],
        }
    )
    points_batch.append(point)

    if len(points_batch) == BATCH_SIZE:
        client.upsert(collection_name=collection_name, points=points_batch)
        print(f"Uploaded batch ending at index {i}")
        points_batch = []

# Final flush
if points_batch:
    client.upsert(collection_name=collection_name, points=points_batch)
    print("Uploaded final batch.")
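
The accumulate-and-flush pattern above can also be factored into a small reusable generator. This is a sketch using only the standard library; `batched` is a hypothetical helper name and is not part of the Qdrant client.

```python
from itertools import islice


def batched(items, batch_size):
    """Yield consecutive lists of at most `batch_size` items from any iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return  # iterable exhausted; the last partial batch was already yielded
        yield batch


# Seven items in batches of three: two full batches and one partial batch
chunks = list(batched(range(7), 3))
```

With such a helper, the upload loop reduces to building the full list of points first and then calling `client.upsert(...)` once per batch, with no separate final flush.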

# DSPy setup

Next, set up [DSPy and Qdrant](https://qdrant.tech/documentation/frameworks/dspy/) to work together.

In [4]:
# userdata.get() takes the NAME of a Colab secret, not its value.
# "OPENAI_API_KEY", "QDRANT_URL", and "QDRANT_API_KEY" are example names; use the names you saved.
lm = dspy.LM("gpt-4", max_tokens=512, api_key=userdata.get("OPENAI_API_KEY"))
client = QdrantClient(url=userdata.get("QDRANT_URL"), api_key=userdata.get("QDRANT_API_KEY"))

## Setup QdrantRM

QdrantRM is a retrieval module that uses Qdrant to return the top passages for a given query. DSPy can then use these passages to provide the best response.



In [None]:
rm = QdrantRM(
    qdrant_collection_name=collection_name,
    qdrant_client=client,
    vector_name="dense",                 # <-- MATCHES your vector field in upsert
    document_field="passage_text",        # <-- MATCHES your payload field in upsert
    k=20)

dspy.settings.configure(lm=lm, rm=rm)

# Manual reranker using Qdrant’s native prefetch + ColBERT query

Next, we define a function using Qdrant's multi-vector search capabilities with both dense and late-interaction (ColBERT-style) embeddings for retrieval and reranking.

We also filter by year and specialty, so results can be restricted to a specific specialty and a year range. This helps when you want the most recent information. Check out our [Filtering guide](https://qdrant.tech/documentation/concepts/filtering/) to learn more about filtering data from Qdrant.

In [None]:
def rerank_with_colbert(query_text, min_year, max_year, specialty):
    from fastembed import TextEmbedding, LateInteractionTextEmbedding

    # Encode query once with both models
    dense_model = TextEmbedding("BAAI/bge-small-en")
    colbert_model = LateInteractionTextEmbedding("colbert-ir/colbertv2.0")

    dense_query = list(dense_model.embed(query_text))[0]
    colbert_query = list(colbert_model.embed(query_text))[0]

    # Combined query: retrieve with dense, rerank with ColBERT
    results = client.query_points(
        collection_name=collection_name,
        prefetch=models.Prefetch(
            query=dense_query,
            using="dense"
        ),
        query=colbert_query,
        using="colbert",
        limit=5,
        with_payload=True,
        query_filter=Filter(
            must=[
                FieldCondition(key="specialty", match=MatchValue(value=specialty)),
                # Inclusive year range: min_year <= year <= max_year
                FieldCondition(key="year", range=models.Range(gte=min_year, lte=max_year)),
            ]
        )
    )

    points = results.points
    docs = []

    for point in points:
        docs.append(point.payload['passage_text'])

    return docs
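
The filter in the function above keeps a point only when both `must` conditions hold: an exact match on `specialty` and a `year` inside the inclusive range. The following pure-Python sketch mimics that logic on plain payload dictionaries to show the semantics; `matches_filter` and the sample payloads are illustrative assumptions, and the real filtering happens inside Qdrant.

```python
def matches_filter(payload, specialty, min_year, max_year):
    """Return True when the payload satisfies both conditions (logical AND)."""
    year = payload.get("year")
    return (
        payload.get("specialty") == specialty   # exact keyword match
        and year is not None                    # points without a year never match
        and min_year <= year <= max_year        # inclusive range, like gte/lte
    )


# Made-up payloads for illustration only
payloads = [
    {"specialty": "Rheumatology", "year": 2015},
    {"specialty": "Rheumatology", "year": 1985},  # too old
    {"specialty": "Cardiology", "year": 2015},    # wrong specialty
]

kept = [p for p in payloads if matches_filter(p, "Rheumatology", 1990, 2021)]
```

Only the first payload survives, since the other two each fail one of the two conditions.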


# DSPy Signature and Module

Define a DSPy signature:

In [None]:
class MedicalAnswer(dspy.Signature):
    # Inputs
    question = dspy.InputField(desc="The medical question to answer")
    min_year = dspy.InputField(desc="The minimum year of the medical paper")
    max_year = dspy.InputField(desc="The maximum year of the medical paper")
    specialty = dspy.InputField(desc="The specialty of the medical paper")
    # The retrieved passages are passed in by MedicalRAG, so this is an input
    context = dspy.InputField(desc="Retrieved passages used to ground the answer")
    # Outputs
    is_medical = dspy.OutputField(desc="Answer 'Yes' if the question is medical, otherwise 'No'")
    final_answer = dspy.OutputField(desc="The answer to the medical question")


## Setup Guardrail

Set up a simple guardrail to ensure the system doesn't respond to non-medical questions.

In [None]:
class MedicalGuardrail(dspy.Module):
    def forward(self, question):
        prompt = (
            "Is the following question a medical question? Answer with 'Yes' or 'No'.\n"
            f"Question: {question}\n"
            "Answer:"
        )
        response = dspy.settings.lm(prompt)
        answer = response[0].strip().lower()
        return answer.startswith("yes")
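
The guardrail's decision logic can be tried without calling a real model. Below is a minimal sketch, assuming `lm` is any callable that takes a prompt string and returns a list of completion strings, as the language model object is used above. `is_medical_question` and `fake_lm` are hypothetical names, and the keyword check inside `fake_lm` is only a stand-in for a model.

```python
def is_medical_question(question, lm):
    """Ask `lm` for a Yes/No verdict and return True only for a 'yes' answer."""
    prompt = (
        "Is the following question a medical question? Answer with 'Yes' or 'No'.\n"
        f"Question: {question}\n"
        "Answer:"
    )
    completions = lm(prompt)
    if not completions:
        return False  # fail closed when the model returns nothing
    return completions[0].strip().lower().startswith("yes")


def fake_lm(prompt):
    """Stand-in for a language model: a crude keyword check, for demonstration only."""
    question = prompt.split("Question:", 1)[1]
    return ["Yes"] if "symptoms" in question.lower() else [" no."]


medical = is_medical_question("What are the most common symptoms of lupus?", fake_lm)
weather = is_medical_question("How is the weather today?", fake_lm)
```

Returning `False` on an empty response is a design choice made for this sketch: when the check cannot be completed, the question is treated as non-medical and is not answered.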

# Configure MedicalRAG

Set up the medical RAG with chain-of-thought reasoning:

In [None]:
class MedicalRAG(dspy.Module):
    def __init__(self):
        super().__init__()
        self.guardrail = MedicalGuardrail()
        # Build the predictor once so it is registered as a sub-module
        self.answer = dspy.ChainOfThought(MedicalAnswer)

    def forward(self, question, min_year, max_year, specialty):
        # Reject non-medical questions before doing any retrieval
        if not self.guardrail(question):
            return dspy.Prediction(
                is_medical="No",
                final_answer="Sorry, I can only answer medical questions. Please ask a question related to medicine or healthcare.",
            )
        # Retrieve with the dense vector, rerank with ColBERT, then answer
        reranked_docs = rerank_with_colbert(question, min_year, max_year, specialty)
        context_str = "\n".join(reranked_docs)
        return self.answer(
            question=question,
            min_year=min_year,
            max_year=max_year,
            specialty=specialty,
            context=context_str
        )


## Define Sample Question

Define some sample questions:

In [None]:
sample_questions = [
    "What are the most common symptoms of lupus?",
    "How is type 2 diabetes usually managed in adults?",
    "What are the first-line medications for treating hypertension?",
]

List all the specialties in the dataset for reference:

In [None]:
specialty_options = [
                "Rheumatology", "Psychiatry", "Pulmonology & Respiratory Medicine", "Nephrology", "Public Health & Epidemiology",
                "Medical Research & Methodology", "Pharmacy & Pharmacology", "Hematology", "Oncology", "Medical Ethics & Law",
                "Medical Technology & Informatics", "Infectious Disease", "Basic Medical Sciences", "Allergology", "Geriatrics",
                "Cardiology", "Gastroenterology & Hepatology", "General Surgery", "General Pediatrics", "Endocrinology & Metabolism",
                "Vascular Surgery", "Radiology & Imaging", "Obstetrics & Gynecology", "Orthopedic Surgery", "Neurology",
                "Family Medicine & Primary Care", "Psychology & Behavioral Health", "Otorhinolaryngology (ENT)", "General Internal Medicine",
                "Anesthesiology", "Physical & Rehabilitation Medicine", "Medical Education", "Healthcare Administration & Management",
                "Non-Medical Sciences & Disciplines", "Dermatology", "Critical Care & Intensive Care", "Urology", "Complementary & Alternative Medicine",
                "Cardiothoracic Surgery", "Neurosurgery", "Pediatric Subspecialties", "Occupational & Environmental Health", "Ophthalmology",
                "Emergency Medicine", "Dental & Oral Medicine", "Biomedical Engineering", "Pathology & Laboratory Medicine", "Transplant Surgery",
                "Preventive Medicine", "Genetics", "Nursing", "Allied Health Professions", "Plastic & Reconstructive Surgery", "Others",
                "Toxicology", "General Medicine"
            ]

Run the medical RAG with a sample question:

In [None]:
rag_chain = MedicalRAG()
min_year = 1990
max_year = 2021
specialty = specialty_options[0]
result = rag_chain(sample_questions[0], min_year, max_year, specialty)

In [None]:
print(result.reasoning)

The question is asking for the most common symptoms of a specific medical condition, lupus, which falls under the specialty of Rheumatology. Therefore, the question is medical in nature and requires a response based on medical knowledge and research.


In [None]:
print(result.is_medical)

Yes


In [None]:
print(result.final_answer)

The most common symptoms of lupus are fatigue, joint pain and swelling, skin rashes (particularly a butterfly-shaped rash across the cheeks and nose), fever, chest pain, hair loss, mouth sores, sensitivity to sunlight, and swollen lymph nodes.


Test the medical RAG application with a non-medical question:

In [None]:
result = rag_chain("How is the weather today?", min_year, max_year, specialty)

In [None]:
print(result.final_answer)

Sorry, I can only answer medical questions. Please ask a question related to medicine or healthcare.


# Conclusion

Multivector search is one of the most powerful features for building a medical RAG application. With this functionality in Qdrant, you can:

- Store token-level embeddings natively. Disable indexing to reduce overhead.

- Run fast retrieval and accurate reranking in one API call.

- Efficiently scale late interaction. Combining FastEmbed and Qdrant leads to a production-ready pipeline for ColBERT-style reranking without wasting resources.

You can do this locally or use Qdrant Cloud. Qdrant offers an easy-to-use API to get started with your search engine, so if you're ready to dive in, sign up for free at [Qdrant Cloud](https://qdrant.tech/documentation/cloud-intro/) and start building.