# 📘 RAG with Qdrant, Google Gemini, and LangChain

This project demonstrates how to build a **Retrieval-Augmented Generation (RAG)** pipeline using **LangChain Expression Language (LCEL)**, **Qdrant** as a vector store, and **Google Gemini** as the LLM. It allows you to query knowledge from your own PDF documents, combining **retrieval** and **generation** in a clean, modular pipeline.

## 1️⃣ Load Environment Variables

Load the API keys, Qdrant URL, and collection name from a `.env` file so that secrets stay out of the notebook itself.

In [1]:
import os
from dotenv import load_dotenv

# Load environment variables
load_dotenv()
QDRANT_URL = os.getenv("QDRANT_URL")
QDRANT_API_KEY = os.getenv("QDRANT_API_KEY")
QDRANT_COLLECTION = os.getenv("QDRANT_COLLECTION")
GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")
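
When a variable is missing, `os.getenv` returns `None`, which tends to surface later as an opaque connection or authentication error. Below is a minimal sketch of a fail-fast check, assuming all of the variables are required. The helper name `require_env` is hypothetical and not part of this project, and the URL in the sample mapping is only a placeholder.

```python
import os


def require_env(names, env=None):
    """Return {name: value} for every name, or raise listing what is missing."""
    env = os.environ if env is None else env
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in names}


# Explicit mapping (placeholder values) so the check can be tried without a real .env
sample = {"QDRANT_URL": "http://localhost:6333", "QDRANT_COLLECTION": "my_rag_collection"}
try:
    require_env(["QDRANT_URL", "QDRANT_API_KEY"], env=sample)
except RuntimeError as err:
    print(err)  # names the variable that is not set
```

Calling `require_env([...])` without `env` would check the real process environment after `load_dotenv()` has run.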

## 2️⃣ Load and Split PDFs

Load the PDF documents and split them into chunks for embedding. This step converts raw PDFs into smaller text segments that can be embedded and retrieved individually.

In [2]:
from utils.loader import load_and_split_pdfs
from utils.embeddings import get_embedding_model
from utils.ingest import ingest_documents
from utils.retriever import get_retriever, format_docs
from utils.llm import get_llm_model
from utils.prompt import get_rag_prompt, get_chat_rag_prompt
# Import runnables and the output parser from langchain_core (the legacy langchain.schema paths are deprecated)
from langchain_core.runnables import RunnableLambda, RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

# List of PDF files to process
PDFS = [
    "data/eldoria_history.pdf",
    "data/quantum_drive_specs.pdf",
    "data/zirconia_recipes.pdf"
]

# Step 1: Load and split PDFs into chunks
docs = load_and_split_pdfs(PDFS)

📚 Loaded and split 3 PDF(s) into 9 chunks
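
The implementation of `load_and_split_pdfs` is not shown here. To illustrate the general idea of chunking, the following is a simplified sketch of a fixed-size character splitter with overlap. The function name `split_text` and the parameters `chunk_size` and `chunk_overlap` are assumptions for illustration, and the project's utility may well split differently (for example on paragraph boundaries).

```python
def split_text(text, chunk_size=500, chunk_overlap=50):
    """Split text into windows of chunk_size characters that overlap by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


chunks = split_text("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)  # ['abcd', 'defg', 'ghij']
```

The overlap repeats the tail of one chunk at the head of the next, so a sentence that straddles a boundary is still partly visible in both chunks.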


## 3️⃣ Generate Embeddings

Use a Google Generative AI embedding model to convert each text chunk into a vector representation for similarity search.

In [3]:
# Step 2: Initialize embedding model
embedding_model = get_embedding_model(GOOGLE_API_KEY)
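
Retrieval works by comparing the query vector with the stored chunk vectors. The sketch below ranks toy vectors by cosine similarity. It assumes cosine distance, which is a common choice but is not confirmed by this notebook, and the three-dimensional vectors are made up for illustration (real embeddings have hundreds of dimensions).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


# Toy "embeddings" for a query and three chunks
query = [1.0, 0.0, 1.0]
chunk_vectors = {
    "chunk_a": [1.0, 0.0, 1.0],
    "chunk_b": [0.0, 1.0, 0.0],
    "chunk_c": [1.0, 1.0, 0.0],
}
ranked = sorted(
    chunk_vectors,
    key=lambda name: cosine_similarity(query, chunk_vectors[name]),
    reverse=True,
)
print(ranked)  # ['chunk_a', 'chunk_c', 'chunk_b']
```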

## 4️⃣ Ingest Documents into Qdrant

Store the embeddings in the Qdrant vector database. Ingestion is skipped if the collection already contains data, unless `force_update=True` is passed.

In [4]:
# Step 3: Ingest documents into Qdrant vectorstore
ingest_documents(QDRANT_URL, QDRANT_API_KEY, QDRANT_COLLECTION, embedding_model, docs)

ℹ️ Using existing Qdrant collection 'my_rag_collection'




✅ Collection 'my_rag_collection' already has 9 vectors. Skipping ingestion.


<langchain_qdrant.qdrant.QdrantVectorStore at 0x71fc5e575640>
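
The skip-unless-forced behaviour described above can be sketched without a running Qdrant instance. The code below uses an in-memory stand-in for a collection. The class `InMemoryCollection`, the function `ingest`, and the choice to clear the collection before a forced re-ingest are all assumptions for illustration, not the actual `ingest_documents` implementation.

```python
class InMemoryCollection:
    """Stand-in for a vector collection; it only keeps vectors in a list."""

    def __init__(self):
        self.vectors = []

    def count(self):
        return len(self.vectors)

    def clear(self):
        self.vectors = []

    def add(self, vectors):
        self.vectors.extend(vectors)


def ingest(collection, vectors, force_update=False):
    """Write vectors unless the collection is already populated."""
    if collection.count() > 0 and not force_update:
        return "skipped"
    collection.clear()  # assumed: avoid duplicate points when re-ingesting
    collection.add(vectors)
    return "ingested"


collection = InMemoryCollection()
print(ingest(collection, [[0.1, 0.2], [0.3, 0.4]]))        # ingested
print(ingest(collection, [[0.1, 0.2], [0.3, 0.4]]))        # skipped
print(ingest(collection, [[0.5, 0.6]], force_update=True))  # ingested
```

Making ingestion idempotent this way means the notebook can be re-run from the top without duplicating vectors or paying for embeddings again.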

## 5️⃣ Build Retriever and LLM Pipeline

Set up the retriever to fetch relevant documents and the LLM to generate answers, then compose them into a RAG pipeline.

In [5]:
# Step 4: Build retriever + LLM
retriever = get_retriever(QDRANT_URL, QDRANT_API_KEY, QDRANT_COLLECTION, embedding_model, k=3)
llm = get_llm_model(GOOGLE_API_KEY, model="models/gemini-flash-latest")

# Step 5: Build prompt template
prompt = get_chat_rag_prompt()

# Step 6: Build RAG pipeline with LangChain Expression Language (LCEL)
rag_chain = (
    {
        "context": retriever | RunnableLambda(format_docs),  # retrieve and format docs
        "question": RunnablePassthrough()  # pass the question as is
    }
    | prompt  # add the RAG prompt
    | llm     # generate the answer
    | StrOutputParser()  # parse output to string
)

ℹ️ Using existing Qdrant collection 'my_rag_collection'
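
The dictionary at the head of the chain runs both branches on the same input: one branch retrieves and formats context, the other passes the question through unchanged. The following plain-Python sketch mirrors that step without LangChain. `Doc`, `fake_retriever`, `format_docs_sketch`, and `build_prompt_inputs` are hypothetical names, and joining chunks with blank lines is an assumed (though common) formatting choice, since the project's `format_docs` is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Minimal stand-in for a retrieved document chunk."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs_sketch(docs):
    """Join chunk texts into one context string, separated by blank lines."""
    return "\n\n".join(doc.page_content for doc in docs)


def fake_retriever(question):
    # Hypothetical retriever: ignores the question and returns fixed chunks
    return [Doc("First chunk."), Doc("Second chunk.")]


def build_prompt_inputs(question):
    """Plain-Python equivalent of the {'context': ..., 'question': ...} step."""
    return {
        "context": format_docs_sketch(fake_retriever(question)),
        "question": question,  # passed through unchanged
    }


inputs = build_prompt_inputs("What is in the documents?")
print(inputs["context"])
```

The resulting dictionary is what the prompt template receives, which is why its keys must match the template's placeholders.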


## 6️⃣ Test RAG Pipeline

Run a few example queries to see how the RAG pipeline generates answers using retrieved documents.

In [6]:
# Example queries to test the pipeline
queries = [
    "What led to the fall of the Kingdom of Eldoria?",
    "How does the Starlink Quantum Drive create thrust?",
    "What are the ingredients and effects of Crystal Soup from Zirconia?"
]

for q in queries:
    print("\n[QUERY]", q)
    result = rag_chain.invoke(q)
    print("[ANSWER]", result)


[QUERY] What led to the fall of the Kingdom of Eldoria?




[ANSWER] The fall of the Kingdom of Eldoria was caused by internal strife among rival noble houses, sparking decades of civil war that drained the nation's strength. This was further destabilized by the rise of the Great Shadow War, where creatures of darkness invaded the borderlands, and the withdrawal of dragons to their mountain strongholds.

[QUERY] How does the Starlink Quantum Drive create thrust?
[ANSWER] The Starlink Quantum Drive creates a localized warp bubble by manipulating entangled states within the Quantum Resonance Chamber (QRC). Within this bubble, space contracts in front of the ship and expands behind it, propelling the craft forward without traditional thrust.

[QUERY] What are the ingredients and effects of Crystal Soup from Zirconia?
[ANSWER] Crystal Soup is made from ground luminous crystals and spiced with plasma herbs. Locals believe the soup enhances stamina and sharpens the senses.


## 7️⃣ Evaluate Generated Answers

Evaluate RAG-generated answers against expected answers using multiple text metrics: Exact Match, ROUGE, BERTScore, BLEU, and METEOR.
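
These metrics differ in strictness. Exact Match requires the two answers to be identical, while overlap metrics such as ROUGE-1 reward shared words. Below is a minimal sketch of both ideas using only the standard library. The tokenizer and the function names are assumptions for illustration, and library implementations count and tokenize differently, so their scores will not match this sketch exactly.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep alphanumeric runs only (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def exact_match(expected, generated):
    """1 if the token sequences are identical, else 0."""
    return int(tokenize(expected) == tokenize(generated))


def unigram_overlap(expected, generated):
    """ROUGE-1-style recall, precision, and F1 over unigram counts."""
    ref = Counter(tokenize(expected))
    hyp = Counter(tokenize(generated))
    overlap = sum((ref & hyp).values())  # clipped count of shared tokens
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(hyp.values()) if hyp else 0.0
    total = recall + precision
    f1 = 2 * recall * precision / total if total else 0.0
    return {"r": recall, "p": precision, "f": f1}


expected = "the soup sharpens the senses"
generated = "the soup enhances stamina and sharpens the senses"
print(exact_match(expected, generated))      # 0
print(unigram_overlap(expected, generated))  # recall 1.0, precision 0.625
```

A longer but correct answer keeps high recall and loses precision, which is one reason a RAG answer can score poorly on n-gram metrics while still being semantically close to the reference.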

In [7]:
from utils.evaluate import evaluate_rag

# Evaluation queries with expected answers
eval_queries = [
    {
        "query": "What led to the fall of the Kingdom of Eldoria?",
        "expected_answer": "The Kingdom of Eldoria fell due to internal strife, economic decline, and invasions by neighboring realms."
    },
    {
        "query": "How does the Starlink Quantum Drive create thrust?",
        "expected_answer": "The Starlink Quantum Drive generates thrust by manipulating quantum fields to produce directed energy propulsion."
    },
    {
        "query": "What are the ingredients and effects of Crystal Soup from Zirconia?",
        "expected_answer": "Crystal Soup contains Zirconia crystals, moon herbs, and ethereal broth, granting temporary enhanced vision and clarity."
    }
]

# Evaluate RAG pipeline
results = evaluate_rag(
    rag_chain=rag_chain,
    eval_queries=eval_queries,
    metrics=["exact_match", "rouge", "bert", "bleu", "meteor"]
)

# Display results for each query
for r in results:
    print("\n---")
    print(f"Query: {r['query']}")
    print(f"Expected: {r['expected']}")
    print(f"Generated: {r['generated']}")
    print("Metrics:")
    for metric, score in r["metrics"].items():
        print(f"  {metric}: {score}")

Some weights of RobertaModel were not initialized from the model checkpoint at roberta-large and are newly initialized: ['pooler.dense.bias', 'pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



---
Query: What led to the fall of the Kingdom of Eldoria?
Expected: The Kingdom of Eldoria fell due to internal strife, economic decline, and invasions by neighboring realms.
Generated: The fall of Eldoria was caused by internal strife among rival noble houses, sparking decades of civil war, and the rise of the Great Shadow War, during which creatures of darkness invaded the borderlands. Additionally, the dragons, once allies, withdrew to their mountain strongholds, refusing to participate in the human conflict.
Metrics:
  exact_match: 0
  rouge: {'rouge-1': {'r': 0.4375, 'p': 0.16279069767441862, 'f': 0.23728813164033327}, 'rouge-2': {'r': 0.06666666666666667, 'p': 0.02, 'f': 0.03076922721893532}, 'rouge-l': {'r': 0.375, 'p': 0.13953488372093023, 'f': 0.20338982655558752}}
  bert_f1: 0.895494818687439
  bleu: 0.010393938326032186
  meteor: "hypothesis" expects pre-tokenized hypothesis (Iterable[str]): The fall of Eldoria was caused by internal strife among rival noble houses, sparki
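
The `meteor` entry above is an error message rather than a score: NLTK's `meteor_score` expects pre-tokenized input (lists of tokens), not raw strings. Since the code of `utils.evaluate` is not shown, the following is only a sketch of the missing tokenization step. `meteor_from_strings` is a hypothetical helper, whitespace splitting is an assumed tokenizer, and the scorer is passed in as an argument so the wrapper can be tried without NLTK installed.

```python
def meteor_from_strings(reference, hypothesis, scorer):
    """Tokenize both strings, then call a METEOR-style scorer.

    scorer must accept (list_of_reference_token_lists, hypothesis_tokens),
    the calling convention of nltk.translate.meteor_score.meteor_score.
    """
    reference_tokens = reference.lower().split()
    hypothesis_tokens = hypothesis.lower().split()
    return scorer([reference_tokens], hypothesis_tokens)


def recording_scorer(references, hypothesis):
    # Stand-in scorer for demonstration only: reports the arguments it received
    return {"n_references": len(references), "hypothesis": hypothesis}


print(meteor_from_strings("Eldoria fell", "Eldoria was lost", recording_scorer))
```

With NLTK and its WordNet data available, passing `meteor_score` from `nltk.translate.meteor_score` as `scorer` should yield a numeric score instead of the error.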

## 8️⃣ Evaluate Retriever Performance

Measure how well the retriever surfaces the correct source documents using **Recall@k**, **Precision@k**, and **Mean Reciprocal Rank (MRR)**.

In [8]:
from utils.evaluate import recall_at_k, precision_at_k, mean_reciprocal_rank

# True documents by source path for each query
true_docs = [
    ["data/eldoria_history.pdf"],
    ["data/quantum_drive_specs.pdf"],
    ["data/zirconia_recipes.pdf"]
]

# Predicted docs by retriever
pred_docs = []
for q in eval_queries:
    retrieved = retriever.invoke(q["query"])  # invoke() replaces the deprecated get_relevant_documents()
    pred_docs.append([doc.metadata.get("source") for doc in retrieved])

# Compute retriever metrics
print("Recall@3:", recall_at_k(pred_docs, true_docs, k=3))
print("Precision@3:", precision_at_k(pred_docs, true_docs, k=3))
print("MRR:", mean_reciprocal_rank(pred_docs, true_docs))

Recall@3: 1.0
Precision@3: 0.3333333333333333
MRR: 1.0
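
The three retriever metrics can be sketched in a few lines. The functions below reuse the names of the imported helpers for readability, but their bodies are assumptions: the project's `utils.evaluate` is not shown, and this sketch uses one possible set-based definition over source paths. The file names in the example are made up.

```python
def recall_at_k(pred_docs, true_docs, k):
    """Mean fraction of relevant sources that appear in the top-k results."""
    scores = []
    for pred, true in zip(pred_docs, true_docs):
        relevant = set(true)
        hits = relevant & set(pred[:k])
        scores.append(len(hits) / len(relevant) if relevant else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


def precision_at_k(pred_docs, true_docs, k):
    """Mean of (distinct relevant sources in the top-k) divided by k."""
    scores = []
    for pred, true in zip(pred_docs, true_docs):
        hits = set(true) & set(pred[:k])
        scores.append(len(hits) / k)
    return sum(scores) / len(scores) if scores else 0.0


def mean_reciprocal_rank(pred_docs, true_docs):
    """Mean of 1/rank of the first relevant result (0 if none is relevant)."""
    scores = []
    for pred, true in zip(pred_docs, true_docs):
        relevant = set(true)
        reciprocal = 0.0
        for rank, doc in enumerate(pred, start=1):
            if doc in relevant:
                reciprocal = 1.0 / rank
                break
        scores.append(reciprocal)
    return sum(scores) / len(scores) if scores else 0.0


# Made-up example: each query retrieves three chunks, identified by source file
pred = [["a.pdf", "a.pdf", "a.pdf"], ["b.pdf", "c.pdf", "b.pdf"]]
true = [["a.pdf"], ["c.pdf"]]
print(recall_at_k(pred, true, k=3))     # 1.0
print(precision_at_k(pred, true, k=3))  # one distinct hit out of k=3 for each query
print(mean_reciprocal_rank(pred, true)) # 0.75
```

Under this set-based definition, three chunks from the same PDF count as a single hit, so Precision@3 comes out at 1/3 even when every retrieved chunk is relevant. That would be consistent with the output above, though it cannot be confirmed without the actual implementation. Scoring at chunk level rather than by source path would avoid this effect.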
