### Building a RAG System with LangChain and ChromaDB

#### Introduction
Retrieval-Augmented Generation (RAG) is a powerful technique that combines the capabilities of large
language models with external knowledge retrieval. This notebook walks through building a complete RAG system using:

- LangChain: A framework for developing applications powered by language models
- ChromaDB: An open-source vector database for storing and retrieving embeddings
- OpenAI: For embeddings and the language model (you can substitute other providers; the cells below use a Hugging Face embeddings endpoint and a local Ollama model)

In [3]:
import os
from dotenv import load_dotenv
load_dotenv()

True

In [2]:
## LangChain imports
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import TextLoader, PyPDFLoader
from langchain_huggingface import HuggingFaceEndpointEmbeddings
from langchain.schema import Document

# vector store
from langchain.vectorstores import Chroma

# utility imports
import numpy as np
from typing import List

## Document Splitting

In [3]:
from typing import List, Any
import re

class SmartPDF:
    def __init__(self, chunk_size: int = 1000, chunk_overlap: int = 150):
        self.chunk_size = chunk_size
        self.chunk_overlap = chunk_overlap
        self.text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=self.chunk_size,
            chunk_overlap=self.chunk_overlap,
            separators=[" "]
        )
        
    def process_pdf(self, pdf_path: str) -> List[Document]:
        """Load a PDF, clean each page, and split it into chunks that carry page metadata."""
        
        # Loading the PDF
        loader = PyPDFLoader(pdf_path)
        pages = loader.load()
        
        processed_chunks = []
        
        for page_num, page in enumerate(pages):
            ## Clean the text
            cleaned_text = self._clean_text(page.page_content)
            
            # Extract the OPS code from the first line of the page
            ops_codes = self._get_codes(page.page_content)
            
            ## Skip nearly empty page
            if len(cleaned_text.strip()) < 50:
                continue
            
            chunks = self.text_splitter.create_documents(
                texts=[cleaned_text],
                metadatas=[{
                    **page.metadata,
                    "page": page_num + 1,
                    "total_pages": len(pages),
                    "chunk_method": "smar_pdf_processor",
                    "char_count": len(cleaned_text),
                    "ops_code": ops_codes
                }]                                 
            )        
            
            processed_chunks.extend(chunks)
            
        return processed_chunks
    
    
    def _clean_text(self, text: str) -> str:
        """Clean extracted text"""
        
        text = " ".join(text.split())
        
        return text
    
    def _get_codes(self, text: str) -> Any:
        first_line = text.strip().split("\n")[0]
        pattern = r"^\s*(\d{1,2}-\d{2,3}(?:\.\d{1,3})?)"
        match = re.match(pattern, first_line)
        if match:
            code = match.group(1)
            # Only accept valid OPS codes (starting with 1- through 9-)
            if re.match(r"^[1-9]-\d{2,3}(?:\.\d{1,3})?$", code):
                return code
        return None
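
To see which first lines `_get_codes` accepts, the same two patterns can be tried outside the class. This is a standalone sketch: the function name `extract_ops_code` and the sample lines are made up for illustration and are not taken from the PDF.

```python
import re
from typing import Optional

# The same two patterns used in SmartPDF._get_codes, compiled once.
CANDIDATE = re.compile(r"^\s*(\d{1,2}-\d{2,3}(?:\.\d{1,3})?)")
VALID = re.compile(r"^[1-9]-\d{2,3}(?:\.\d{1,3})?$")


def extract_ops_code(page_text: str) -> Optional[str]:
    """Return the code at the start of the first line, or None if there is none."""
    first_line = page_text.strip().split("\n")[0]
    match = CANDIDATE.match(first_line)
    if match and VALID.match(match.group(1)):
        return match.group(1)
    return None


# Hypothetical sample lines, chosen to exercise each branch.
samples = [
    "5-903 Sample heading",        # plain code
    "5-903.12 Sample heading",     # code with a decimal suffix
    "8-98f.10 Sample heading",     # letters are not covered by the pattern
    "12-345 Sample heading",       # two leading digits fail the second check
    "Text without a code",
]
for line in samples:
    print(repr(line), "->", extract_ops_code(line))
```

Because both patterns only allow digits, a code that contains a letter is cut off at that letter. If such codes occur in the catalogue, the patterns would need to be widened.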

In [4]:
pdf_processor = SmartPDF()

try:
    chunks = pdf_processor.process_pdf("data/ops_2025.pdf")
    print(f"Processed {len(chunks)} chunks")
    for key, value in chunks[1254].metadata.items():
        print(f"{key}: {value}")   
except Exception as e:
    print(f"Error: {e}")
    

Processed 2097 chunks
producer: LibreOffice 7.5
creator: Writer
creationdate: 2024-10-16T11:52:46+02:00
author: BfArM
moddate: 2024-10-21T17:45:24+02:00
title: OPS Version 2025 Systematisches Verzeichnis
source: data/ops_2025.pdf
total_pages: 662
page: 407
page_label: 407
chunk_method: smar_pdf_processor
char_count: 2541
ops_code: 5-903
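
The `chunk_size` and `chunk_overlap` arguments given to the splitter decide how long each chunk may be and how much text neighbouring chunks share. The sketch below is a deliberately simplified, word-based stand-in written for illustration. It is not the algorithm `RecursiveCharacterTextSplitter` uses, and the name `split_on_spaces` is hypothetical.

```python
from typing import List


def split_on_spaces(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Greedy splitter on whitespace with a character-based overlap (illustrative only)."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    chunks: List[str] = []
    current: List[str] = []
    for word in text.split():
        if current and len(" ".join(current + [word])) > chunk_size:
            chunks.append(" ".join(current))
            # Keep trailing words (at most chunk_overlap characters) as the overlap.
            carried: List[str] = []
            for prev in reversed(current):
                if len(" ".join([prev] + carried)) > chunk_overlap:
                    break
                carried.insert(0, prev)
            # Drop overlap words from the front until the new word fits.
            while carried and len(" ".join(carried + [word])) > chunk_size:
                carried.pop(0)
            current = carried
        current.append(word)
    if current:
        chunks.append(" ".join(current))
    return chunks


sample = "alpha beta gamma delta epsilon zeta eta theta"
for piece in split_on_spaces(sample, chunk_size=20, chunk_overlap=8):
    print(repr(piece))
```

With an overlap of 0 the chunks partition the text. With a larger overlap, each chunk starts with the last few words of the previous one, which helps when a relevant sentence straddles a chunk boundary.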


## Embedding

In [None]:
from langchain_community.document_loaders import WebBaseLoader

In [None]:
# Text Embeddings Inference
embeddings = HuggingFaceEndpointEmbeddings(model="http://localhost:8080")

def batched(chunks, batch_size: int = 32):
    for i in range(0, len(chunks), batch_size):
        yield chunks[i:i + batch_size]

texts = [text.page_content for text in chunks]
all_vectors = []

for batch in batched(texts, 32):
    vecs = embeddings.embed_documents(batch)
    all_vectors.extend(vecs)
    
all_vectors
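
The batching loop can be checked without a running embeddings server. The sketch below repeats the `batched` helper and swaps in a stub: `FakeEmbeddings` is a hypothetical class that only mimics the `embed_documents` method and returns toy two-dimensional vectors.

```python
from typing import Iterator, List, Sequence


def batched(items: Sequence[str], batch_size: int = 32) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most batch_size items."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]


class FakeEmbeddings:
    """Hypothetical stand-in that counts requests instead of calling a server."""

    def __init__(self) -> None:
        self.calls = 0

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        self.calls += 1
        # Toy vector per text: [character count, word count]
        return [[float(len(t)), float(len(t.split()))] for t in texts]


fake = FakeEmbeddings()
sample_texts = [f"chunk number {i}" for i in range(70)]
vectors: List[List[float]] = []
for batch in batched(sample_texts, 32):
    vectors.extend(fake.embed_documents(list(batch)))

print(len(vectors), "vectors from", fake.calls, "requests")
```

Sending 32 texts per request keeps each call small while still producing exactly one vector per chunk, in the original order.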

## Vectorstore
Build the vector index and store the chunks as vectors. Note that the cell below uses FAISS rather than ChromaDB.

In [8]:
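from langchain.vectorstores import FAISS

# Build the index once from the first batch, then append the remaining batches.
# Calling FAISS.from_documents inside the loop would replace the index on every
# iteration, leaving only the last batch searchable.
vectorestore = None
for batch in batched(chunks, 32):
    if vectorestore is None:
        vectorestore = FAISS.from_documents(
            documents=batch,
            embedding=embeddings,
        )
    else:
        vectorestore.add_documents(batch)

vectorestore.save_local("ops-2025")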

### Test the similarity search

In [None]:
### Similarity search
# The query is in German because the indexed OPS catalogue is in German
query = "Was ist OPS 1-20"

result = vectorestore.similarity_search(query=query, k=10)
result

[Document(id='1913f407-9b06-494f-b9c2-cb57a0ea35fc', metadata={'producer': 'LibreOffice 7.5', 'creator': 'Writer', 'creationdate': '2024-10-16T11:52:46+02:00', 'author': 'BfArM', 'moddate': '2024-10-21T17:45:24+02:00', 'title': 'OPS Version 2025 Systematisches Verzeichnis', 'source': 'data/ops_2025.pdf', 'total_pages': 662, 'page': 660, 'page_label': '660', 'chunk_method': 'smar_pdf_processor', 'char_count': 1960, 'ops_code': None}, page_content='Katheter liegen! 4 20. Invasives Kreislaufmonitoring HZV-Messungen mittels PiCCO oder PA-Katheter oder FATD (femoral artery thermodilution) 5 21. Dialyse-Verfahren Hier sind alle Nierenersatzverfahren gemeint. Ein entsprechender OPS-Kode muss gesondert angegeben werden. 6 22. Intrakranielle Druckmessung (invasives Verfahren) 4 23. Therapie einer Alkalose oder Azidose 4 24. Spezielle Interventionen auf der Intensivstation z.B. Tracheotomie, Kardioversion Diese Punkte können nur einmal pro Tag angerechnet werden. 8 25. Aktionen außerhalb der Int

### Understanding similarity scores
The similarity score represents how closely related a document chunk is to your query. How to read it depends on the distance
metric used.

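ChromaDB default: L2 (Euclidean) distance. The LangChain FAISS store used in this notebook also defaults to Euclidean distance.

- Lower scores = MORE similar (closer in vector space)  
- Score of 0 = identical vectors  
- Typical range: 0 to 2 for unit-length embeddings (it can be higher for unnormalized vectors or when the store reports squared distances)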


Cosine similarity (if configured):  

- Higher scores = More similar  
- Range: -1 to 1 (1 being identical)
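
The two metrics can be compared on toy vectors. This is a minimal sketch using only the standard library; the vectors and the function names are illustrative and do not come from the notebook's embeddings.

```python
import math
from typing import Sequence


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance: 0 for identical vectors, larger means less similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b: 1 means same direction, -1 opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


query_vec = [1.0, 0.0]
candidates = {
    "same direction": [1.0, 0.0],
    "orthogonal": [0.0, 1.0],
    "opposite": [-1.0, 0.0],
}
for name, vec in candidates.items():
    d = l2_distance(query_vec, vec)
    c = cosine_similarity(query_vec, vec)
    print(f"{name}: L2={d:.3f}, cosine={c:.3f}")
```

For unit-length vectors the two are linked by `L2**2 == 2 - 2 * cosine`, so both produce the same ranking and only the direction of "better" differs.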

In [14]:
### Initialize LLM, RAG Chain, Prompt Template, Query the RAG System
from langchain_ollama import ChatOllama
from langchain_groq import ChatGroq

llm = ChatOllama(
    model="llama3.2:3b",    
    temperature=0.2,
    top_p=0.6,
    num_ctx=8192, 
    reasoning=False  
)

In [None]:
test_response = llm.stream("Was ist unter den OPS Codes 5-38 zu finden?")

for text in test_response:
    print(text.content, end='')

## Modern RAG Chain

In [10]:
from langchain.chains import create_retrieval_chain
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains.combine_documents import create_stuff_documents_chain

In [11]:
## Convert vectorestore to retriever
retriever = vectorestore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 10},
)
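
Conceptually, a similarity retriever with `k` set returns the `k` stored vectors closest to the query vector. The brute-force sketch below illustrates that idea on toy data. It is not how FAISS is implemented, and `top_k` is a hypothetical helper.

```python
import math
from typing import List, Sequence, Tuple


def top_k(
    query_vec: Sequence[float],
    doc_vecs: Sequence[Sequence[float]],
    k: int,
) -> List[Tuple[int, float]]:
    """Return (index, L2 distance) pairs for the k stored vectors closest to query_vec."""
    scored = [(i, math.dist(query_vec, vec)) for i, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:k]


# Toy two-dimensional "embeddings"
stored = [[0.0, 0.0], [1.0, 1.0], [0.9, 1.2], [5.0, 5.0]]
for index, distance in top_k([1.0, 1.0], stored, k=2):
    print(index, round(distance, 3))
```

A larger `k` gives the language model more context to work with, at the cost of a longer prompt and more loosely related chunks.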

In [15]:
## Create a prompt template
system_prompt = """
You are an assistant for question-answering tasks.  
Answer the following question **only** based on the provided context.  

Rules:  
- Use only the text from the context. Do not add any external knowledge.  
- If the information is not contained in the context, reply with: "I don’t know."  
- Always include the OPS codes mentioned in the context.  
- Always provide the page number(s) if available.  
- Answer in German.  

Context:  
{context}  

Question:  
{input}  

Answer format (always use this structure):  
OPS-Code: <code(s)>  
Seite(n): <number(s)>  
Beschreibung:\n 

<answer in German>
"""

prompt = ChatPromptTemplate.from_messages([
    ("system", system_prompt),
    ("user", "{input}")   
])

In [16]:
## Create document chain
document_chain = create_stuff_documents_chain(llm, prompt)
document_chain

RunnableBinding(bound=RunnableBinding(bound=RunnableAssign(mapper={
  context: RunnableLambda(format_docs)
}), kwargs={}, config={'run_name': 'format_inputs'}, config_factories=[])
| ChatPromptTemplate(input_variables=['context', 'input'], input_types={}, partial_variables={}, messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'input'], input_types={}, partial_variables={}, template='\nYou are an assistant for question-answering tasks.  \nAnswer the following question **only** based on the provided context.  \n\nRules:  \n- Use only the text from the context. Do not add any external knowledge.  \n- If the information is not contained in the context, reply with: "I don’t know."  \n- Always include the OPS codes mentioned in the context.  \n- Always provide the page number(s) if available.  \n- Answer in German.  \n\nContext:  \n{context}  \n\nQuestion:  \n{input}  \n\nAnswer format (always use this structure):  \nOPS-Code: <code(s)>  \nSeite(n): <num

**This chain**
- Takes retrieved documents
- "Stuffs" them into the prompt's {context} placeholder
- Sends the complete prompt to the LLM
- Returns the LLM's response
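
The "stuffing" step in the list above can be sketched in plain Python. The `Doc` class, `TEMPLATE`, and both functions are hypothetical stand-ins written for illustration; they are not LangChain's implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Doc:
    """Minimal stand-in for a LangChain Document (illustrative only)."""
    page_content: str
    metadata: Dict[str, object] = field(default_factory=dict)


TEMPLATE = "Context:\n{context}\n\nQuestion:\n{input}"


def format_doc(doc: Doc, with_page: bool = False) -> str:
    """Render one document, optionally prefixed with its page number."""
    if with_page and "page" in doc.metadata:
        return f"[page {doc.metadata['page']}] {doc.page_content}"
    return doc.page_content


def stuff_documents(
    docs: List[Doc],
    question: str,
    with_page: bool = False,
    separator: str = "\n\n",
) -> str:
    """Join all documents into one context string and fill the template."""
    context = separator.join(format_doc(d, with_page) for d in docs)
    return TEMPLATE.format(context=context, input=question)


docs = [Doc("First chunk.", {"page": 1}), Doc("Second chunk.", {"page": 2})]
print(stuff_documents(docs, "What do the chunks say?", with_page=True))
```

The `with_page` option is there because, as far as I know, the chain's default document prompt inserts only `page_content`. If that holds, metadata such as `page` reaches the model only when the chunk text happens to contain it or when a custom `document_prompt` is passed, which matters for a prompt that asks for page numbers.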


In [22]:
from pprint import pprint


### Create the final RAG Chain
rag_chain = create_retrieval_chain(retriever, document_chain)
result = rag_chain.invoke({"input":"Was ist neu im OPS Katalog 2025"})

print(result['answer'])

OPS-Code: 9-999.92 .93 
Seite(n): 655/662 
Beschreibung: Nicht belegte Schlüsselnummer, Liste 9.80 

Ich weiß nicht, was genau neu in dem OPS-Katalog 2025 ist.
