
# Tracing and Evaluating a RAG Pipeline with Netra

This notebook walks you through adding full observability and systematic evaluation to a Retrieval-Augmented Generation (RAG) pipeline using Netra, tracing every stage from document ingestion to answer generation.

## What You'll Learn

- Build a RAG pipeline that processes PDFs and answers questions.
- Trace every stage: chunking, embedding, retrieval, and generation.
- Track token usage, costs, and latency per query.
- Evaluate retrieval quality, answer correctness, and faithfulness.

## Prerequisites

- OpenAI API key
- Netra API key and OTLP endpoint (see the [Netra quick-start guide](https://docs.getnetra.ai/quick-start/Overview))

---
## Step 0: Install Packages

In [None]:
%pip install netra-sdk openai chromadb pypdf reportlab

---
## Step 1: Set Environment Variables

In [None]:
import os
from getpass import getpass

os.environ["OPENAI_API_KEY"] = getpass("Enter your OpenAI API Key:")
os.environ["NETRA_API_KEY"] = getpass("Enter your Netra API Key:")
os.environ["NETRA_OTLP_ENDPOINT"] = getpass("Enter your Netra OTLP Endpoint:")

print("API keys configured!")
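
Before moving on, it can help to confirm that every variable was actually set, since an empty value typically only surfaces later as an authentication error. The sketch below is a minimal check; `missing_env_vars` and `REQUIRED_VARS` are hypothetical names introduced here for illustration, not part of Netra or OpenAI.

```python
import os
from typing import Iterable, List, Mapping

# Names taken from the cell above.
REQUIRED_VARS = ("OPENAI_API_KEY", "NETRA_API_KEY", "NETRA_OTLP_ENDPOINT")


def missing_env_vars(names: Iterable[str], env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names whose value is unset or blank in env."""
    return [name for name in names if not env.get(name, "").strip()]


missing = missing_env_vars(REQUIRED_VARS)
# Only names are printed, never the secret values themselves.
print("Missing variables:", missing if missing else "none")
```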

---
## Step 2: Initialize Netra for Observability

In [None]:
import os
from netra import Netra, SpanType, UsageModel

# Confirm the key is present without printing the secret itself
print("NETRA_API_KEY set:", bool(os.getenv("NETRA_API_KEY")))

Netra.init(
    app_name="pdf-qa-chatbot",
    headers=f"x-api-key={os.getenv('NETRA_API_KEY')}",
    environment="development",
    trace_content=True,
)

print("Netra initialized!")

---
## Step 3: Import Libraries and Initialize Clients

In [None]:
import os
import uuid
from typing import List, Optional

import chromadb
from openai import OpenAI
from pypdf import PdfReader

# Initialize clients
openai_client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
chroma_client = chromadb.Client()

print("Clients initialized!")

---
## Step 4: Define Helper Functions

These functions handle PDF loading, text chunking, and embedding generation.

In [None]:
def load_pdf(file_path: str) -> str:
    """Extract text from a PDF file."""
    reader = PdfReader(file_path)
    text = ""
    for page in reader.pages:
        # Guard against pages with no extractable text (e.g. scanned images)
        text += (page.extract_text() or "") + "\n"
    return text


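def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> List[str]:
    """Split text into overlapping chunks."""
    if not 0 <= overlap < chunk_size:
        # A stride of zero or less would make the loop below run forever.
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        end = start + chunk_size
        chunks.append(text[start:end])
        if end >= len(text):
            break  # The final chunk already reaches the end of the text.
        start = end - overlap
    return chunks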


def generate_embeddings(texts: List[str]) -> List[List[float]]:
    """Generate embeddings for a list of texts."""
    response = openai_client.embeddings.create(
        model="text-embedding-3-small",
        input=texts
    )
    return [item.embedding for item in response.data]


print("Helper functions defined!")
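
To see how `chunk_size` and `overlap` interact, the sketch below computes only the start and end offsets that a sliding window would produce, without touching any text. `window_offsets` is a hypothetical helper introduced for illustration; it assumes a stride of `chunk_size - overlap` and stops as soon as a window reaches the end of the text.

```python
from typing import List, Tuple


def window_offsets(text_len: int, chunk_size: int = 1000, overlap: int = 200) -> List[Tuple[int, int]]:
    """Return (start, end) offsets of overlapping windows over text_len characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    stride = chunk_size - overlap  # how far each window advances
    offsets = []
    start = 0
    while start < text_len:
        end = min(start + chunk_size, text_len)
        offsets.append((start, end))
        if end == text_len:
            break  # the last window already covers the tail
        start += stride
    return offsets


# A 2,500-character text with the defaults yields three windows,
# each sharing 200 characters with its predecessor.
print(window_offsets(2500))  # [(0, 1000), (800, 1800), (1600, 2500)]
```

Each consecutive pair of windows shares `overlap` characters, which reduces the chance that a sentence relevant to a query is split across a chunk boundary.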

---
## Step 5: Create Traced Ingestion Pipeline

This function ingests a PDF with full tracing for each stage.

In [None]:
def ingest_pdf_traced(file_path: str, collection_name: str = "pdf_qa") -> dict:
    """Ingest a PDF with full tracing."""
    with Netra.start_span("pdf-ingestion") as parent_span:
        parent_span.set_attribute("pdf.path", file_path)

        # Step 1: Load PDF
        with Netra.start_span("load-pdf") as load_span:
            pdf_text = load_pdf(file_path)
            load_span.set_attribute("pdf.characters", len(pdf_text))
            load_span.set_success()
            print(f"Loaded PDF: {len(pdf_text)} characters")

        # Step 2: Chunk text
        with Netra.start_span("chunk-text") as chunk_span:
            chunks = chunk_text(pdf_text, chunk_size=1000, overlap=200)
            chunk_span.set_attribute("chunks.count", len(chunks))
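            # Average over the actual chunk lengths; max() avoids dividing by zero for an empty PDF
            chunk_span.set_attribute("chunks.avg_size", sum(len(c) for c in chunks) // max(len(chunks), 1))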
            chunk_span.set_success()
            print(f"Created {len(chunks)} chunks")

        # Step 3: Generate embeddings
        with Netra.start_span("generate-embeddings", as_type=SpanType.EMBEDDING) as embed_span:
            embed_span.set_model("text-embedding-3-small")
            embed_span.set_llm_system("openai")

            embeddings = generate_embeddings(chunks)

            # Estimate embedding cost: roughly 1.3 tokens per word, at the
            # $0.02 per 1M tokens rate implied by the formula below
            total_tokens = sum(len(chunk.split()) * 1.3 for chunk in chunks)
            embed_span.set_usage([
                UsageModel(
                    model="text-embedding-3-small",
                    cost_in_usd=total_tokens * 0.00002 / 1000,
                    usage_type="embedding",
                    units_used=int(total_tokens)
                )
            ])
            embed_span.set_success()
            print(f"Generated {len(embeddings)} embeddings")

        # Step 4: Store in vector DB
        with Netra.start_span("store-vectors") as store_span:
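            # Delete the collection if it already exists
            try:
                chroma_client.delete_collection(name=collection_name)
            except Exception:
                pass  # Nothing to delete on the first run

            # Use cosine distance so that `1 - distance` is a cosine similarity
            # (Chroma's default space is squared L2).
            collection = chroma_client.create_collection(
                name=collection_name,
                metadata={"hnsw:space": "cosine"},
            )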
            collection.add(
                documents=chunks,
                embeddings=embeddings,
                ids=[f"chunk_{i}" for i in range(len(chunks))]
            )
            store_span.set_attribute("vectors.count", len(chunks))
            store_span.set_attribute("collection.name", collection_name)
            store_span.set_success()
            print(f"Stored {len(chunks)} vectors in ChromaDB")

        parent_span.set_attribute("ingestion.chunks_created", len(chunks))
        parent_span.set_success()

        return {
            "chunks_count": len(chunks),
            "collection": collection,
            "chunks": chunks
        }


print("Ingestion function defined!")
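
`ingest_pdf_traced` sends every chunk to the embeddings endpoint in a single request, which is fine for a short PDF but may run into per-request input limits on a large one. A minimal sketch of splitting the chunks into batches follows; `batched` and the batch size of 100 are assumptions chosen for illustration, not values taken from the OpenAI or Netra documentation.

```python
from typing import Iterator, List


def batched(items: List[str], batch_size: int = 100) -> Iterator[List[str]]:
    """Yield consecutive slices of items, each holding at most batch_size entries."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Stand-in chunks so the sketch runs without a PDF or an API key.
chunks = [f"chunk {i}" for i in range(250)]
batch_sizes = [len(batch) for batch in batched(chunks, batch_size=100)]
print(batch_sizes)  # [100, 100, 50]
```

In the notebook, each batch could be passed to `generate_embeddings` and the returned lists concatenated in order, so that embedding `i` still corresponds to chunk `i`.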

---
## Step 6: Create Traced Retrieval Function

This function retrieves relevant chunks with similarity score tracking.

In [None]:
def retrieve_chunks_traced(query: str, collection, top_k: int = 3) -> List[dict]:
    """Retrieve chunks with full tracing."""
    with Netra.start_span("retrieval", as_type=SpanType.TOOL) as span:
        span.set_attribute("query", query)
        span.set_attribute("top_k", top_k)

        # Generate query embedding
        with Netra.start_span("query-embedding", as_type=SpanType.EMBEDDING) as embed_span:
            embed_span.set_model("text-embedding-3-small")
            query_embedding = generate_embeddings([query])[0]
            embed_span.set_success()

        # Perform vector search
        with Netra.start_span("vector-search") as search_span:
            results = collection.query(
                query_embeddings=[query_embedding],
                n_results=top_k,
                include=["documents", "distances"]
            )
            search_span.set_attribute("results.count", len(results["documents"][0]))
            search_span.set_success()

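        # Process results. Chroma returns distances, so `1 - distance` is a
        # cosine similarity only when the collection uses the cosine space.
        retrieved = []
        for i, doc in enumerate(results["documents"][0]):
            similarity = 1 - results["distances"][0][i]
            retrieved.append({
                "content": doc,
                "similarity_score": similarity,
                "chunk_id": results["ids"][0][i]  # ID assigned at ingestion, not the result rank
            })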

        # Log retrieval quality metrics
        if retrieved:
            span.set_attribute("retrieval.avg_similarity",
                             sum(r["similarity_score"] for r in retrieved) / len(retrieved))
            span.set_attribute("retrieval.max_similarity",
                             max(r["similarity_score"] for r in retrieved))
            span.set_attribute("retrieval.min_similarity",
                             min(r["similarity_score"] for r in retrieved))

        span.add_event("chunks-retrieved", {
            "count": len(retrieved),
            "similarity_scores": [r["similarity_score"] for r in retrieved]
        })

        span.set_success()
        return retrieved


print("Retrieval function defined!")
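
How a distance maps to a similarity score depends on the distance function the collection uses. As I understand the Chroma documentation, the `cosine` space returns `1 - cosine similarity` and the `l2` space returns the squared Euclidean distance. The sketch below shows both conversions in plain Python; `similarity_from_distance` is a hypothetical helper, and the `l2` branch assumes unit-length embeddings, for which the squared distance equals `2 - 2 * cosine similarity`.

```python
import math
from typing import List


def normalize(vector: List[float]) -> List[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two unit-length vectors (their dot product)."""
    return sum(x * y for x, y in zip(a, b))


def squared_l2(a: List[float], b: List[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def similarity_from_distance(distance: float, space: str) -> float:
    """Convert a distance to a cosine similarity (assumes unit-length vectors for 'l2')."""
    if space == "cosine":
        return 1.0 - distance
    if space == "l2":
        return 1.0 - distance / 2.0
    raise ValueError(f"unsupported space: {space}")


a = normalize([1.0, 2.0, 3.0])
b = normalize([2.0, 1.0, 0.5])
expected = cosine_similarity(a, b)
from_l2 = similarity_from_distance(squared_l2(a, b), "l2")
from_cosine = similarity_from_distance(1.0 - expected, "cosine")
# Both conversions recover the same cosine similarity.
print(math.isclose(expected, from_l2, abs_tol=1e-9), math.isclose(expected, from_cosine, abs_tol=1e-9))
```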

---
## Step 7: Create Traced Answer Generation Function

This function generates answers with token and cost tracking.

In [None]:
def generate_answer_traced(query: str, context_chunks: List[dict]) -> dict:
    """Generate answer with full tracing and cost tracking."""
    with Netra.start_span("answer-generation", as_type=SpanType.GENERATION) as span:
        span.set_model("gpt-4o-mini")
        span.set_llm_system("openai")

        context = "\n\n".join([chunk["content"] for chunk in context_chunks])

        prompt = f"""Context:
{context}

Question: {query}

Answer the question based only on the provided context. If the answer is not in the context, say "I cannot find this information in the document."
"""

        span.set_prompt(prompt)
        span.set_attribute("context.chunks_count", len(context_chunks))
        span.set_attribute("context.total_chars", len(context))

        span.add_event("generation-started")

        response = openai_client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {
                    "role": "system",
                    "content": "You are a helpful assistant that answers questions based on provided context."
                },
                {"role": "user", "content": prompt}
            ],
            temperature=0.1
        )

        answer = response.choices[0].message.content

        # Track token usage and cost
        prompt_tokens = response.usage.prompt_tokens
        completion_tokens = response.usage.completion_tokens

        # GPT-4o-mini pricing: $0.15/1M input, $0.60/1M output
        cost = (prompt_tokens * 0.00015 / 1000) + (completion_tokens * 0.0006 / 1000)

        span.set_usage([
            UsageModel(
                model="gpt-4o-mini",
                cost_in_usd=cost,
                usage_type="chat",
                units_used=prompt_tokens + completion_tokens
            )
        ])

        span.set_attribute("tokens.prompt", prompt_tokens)
        span.set_attribute("tokens.completion", completion_tokens)
        span.set_attribute("tokens.total", prompt_tokens + completion_tokens)
        span.set_attribute("cost.usd", cost)

        span.add_event("generation-completed", {
            "answer_length": len(answer),
            "tokens_used": prompt_tokens + completion_tokens
        })

        span.set_success()

        return {
            "answer": answer,
            "token_usage": {
                "prompt": prompt_tokens,
                "completion": completion_tokens,
                "total": prompt_tokens + completion_tokens
            },
            "cost_usd": cost
        }


print("Answer generation function defined!")
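
The cost arithmetic in `generate_answer_traced` can be checked in isolation. The sketch below reuses the per-million-token prices quoted in the code comment above ($0.15 input, $0.60 output for `gpt-4o-mini`); `PRICES_PER_MILLION_USD` and `chat_cost_usd` are hypothetical names, and the prices should be treated as a snapshot that may change.

```python
from typing import Dict

# Prices in USD per one million tokens, as quoted in the notebook comment.
PRICES_PER_MILLION_USD: Dict[str, Dict[str, float]] = {
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
}


def chat_cost_usd(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Return the cost in USD of one chat completion for the given token counts."""
    price = PRICES_PER_MILLION_USD[model]
    total = prompt_tokens * price["input"] + completion_tokens * price["output"]
    return total / 1_000_000


# 1,200 prompt tokens and 150 completion tokens:
# (1200 * 0.15 + 150 * 0.60) / 1,000,000 = 0.00027 USD
print(f"${chat_cost_usd('gpt-4o-mini', 1200, 150):.6f}")  # $0.000270
```

Keeping prices in a table keyed by model makes it easier to update them in one place if the pipeline later switches models.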

---
## Step 8: Create the Complete Traced Chatbot Class

This class combines all the traced functions into a complete chatbot with session support.

In [None]:
class TracedPDFChatbot:
    """RAG Pipeline with full Netra tracing."""

    def __init__(self, pdf_path: str):
        self.pdf_path = pdf_path
        self.session_id = str(uuid.uuid4())
        self.collection = None
        self.chunks = []
        self.conversation_history = []

    def initialize(self):
        """Initialize with tracing."""
        print(f"Initializing chatbot for: {self.pdf_path}")
        print(f"Session ID: {self.session_id}")

        result = ingest_pdf_traced(self.pdf_path, f"pdf_{self.session_id[:8]}")
        self.collection = result["collection"]
        self.chunks = result["chunks"]

        print(f"\nInitialization complete! {result['chunks_count']} chunks indexed.")

    def chat(self, query: str, user_id: Optional[str] = None) -> dict:
        """Process a chat message with full tracing."""
        # Set session and user context
        Netra.set_session_id(self.session_id)
        if user_id:
            Netra.set_user_id(user_id)

        with Netra.start_span("pdf-qa-query", as_type=SpanType.AGENT) as span:
            span.set_attribute("query", query)
            span.set_attribute("session_id", self.session_id)
            if user_id:
                span.set_attribute("user_id", user_id)

            # Retrieve relevant chunks
            retrieved_chunks = retrieve_chunks_traced(query, self.collection)

            # Generate answer
            result = generate_answer_traced(query, retrieved_chunks)

            # Update conversation history
            self.conversation_history.append({
                "role": "user",
                "content": query
            })
            self.conversation_history.append({
                "role": "assistant",
                "content": result["answer"]
            })

            span.set_attribute("response.length", len(result["answer"]))
            span.set_attribute("cost.total_usd", result["cost_usd"])
            span.set_success()

            return {
                "query": query,
                "answer": result["answer"],
                "retrieved_chunks": retrieved_chunks,
                "token_usage": result["token_usage"],
                "cost_usd": result["cost_usd"],
                "session_id": self.session_id
            }


print("TracedPDFChatbot class defined!")
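
`TracedPDFChatbot` records `conversation_history`, but `generate_answer_traced` does not read it, so each question is answered independently of earlier turns. One possible way to feed recent turns into the model is sketched below; `build_messages` and its `max_turns` parameter are hypothetical and not part of the notebook or of any library.

```python
from typing import Dict, List


def build_messages(
    system_prompt: str,
    history: List[Dict[str, str]],
    prompt: str,
    max_turns: int = 3,
) -> List[Dict[str, str]]:
    """Return system message + the last max_turns user/assistant pairs + the new prompt."""
    # Each turn is two messages (user, assistant); guard against history[-0:],
    # which would return the whole list instead of nothing.
    recent = history[-2 * max_turns:] if max_turns > 0 else []
    return [
        {"role": "system", "content": system_prompt},
        *recent,
        {"role": "user", "content": prompt},
    ]


history = []
for i in range(1, 5):
    history.append({"role": "user", "content": f"q{i}"})
    history.append({"role": "assistant", "content": f"a{i}"})

messages = build_messages("You answer from the provided context.", history, "q5", max_turns=2)
print([m["content"] for m in messages[1:]])  # ['q3', 'a3', 'q4', 'a4', 'q5']
```

Capping the number of turns keeps the prompt size, and therefore the per-query cost, bounded as a session grows.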

---
## Step 9: Upload or Create a Sample PDF

Upload your own PDF or create a sample one for testing.

In [None]:
# Option 1: Upload your own PDF (uncomment to use)
# from google.colab import files
# uploaded = files.upload()
# pdf_path = list(uploaded.keys())[0]

# Option 2: Create a sample PDF for testing
from reportlab.lib.pagesizes import letter
from reportlab.pdfgen import canvas

def create_sample_pdf(filename: str = "sample_document.pdf"):
    """Create a sample PDF for testing."""
    c = canvas.Canvas(filename, pagesize=letter)
    width, height = letter

    # Page 1: Introduction
    c.setFont("Helvetica-Bold", 24)
    c.drawString(100, height - 100, "Introduction to Machine Learning")

    c.setFont("Helvetica", 12)
    text = """
    Machine learning is a subset of artificial intelligence that enables systems to learn
    and improve from experience without being explicitly programmed. The term was coined
    by Arthur Samuel in 1959 while at IBM.

    Machine learning algorithms build a mathematical model based on sample data, known as
    training data, in order to make predictions or decisions without being explicitly
    programmed to perform the task.

    The primary aim is to allow computers to learn automatically without human intervention
    or assistance and adjust actions accordingly.
    """

    y = height - 150
    for line in text.strip().split("\n"):
        c.drawString(100, y, line.strip())
        y -= 20

    # Page 2: Types of ML
    c.showPage()
    c.setFont("Helvetica-Bold", 18)
    c.drawString(100, height - 100, "Types of Machine Learning")

    c.setFont("Helvetica", 12)
    text = """
    There are three main types of machine learning:

    1. Supervised Learning: The algorithm learns from labeled training data and makes
       predictions based on that data. Examples include classification and regression.

    2. Unsupervised Learning: The algorithm learns patterns from unlabeled data without
       any guidance. Examples include clustering and dimensionality reduction.

    3. Reinforcement Learning: The algorithm learns through interaction with an environment,
       receiving rewards or penalties for actions. Used in robotics and game playing.

    Each type has its own applications and is suited for different kinds of problems.
    """

    y = height - 150
    for line in text.strip().split("\n"):
        c.drawString(100, y, line.strip())
        y -= 20

    # Page 3: Applications
    c.showPage()
    c.setFont("Helvetica-Bold", 18)
    c.drawString(100, height - 100, "Applications of Machine Learning")

    c.setFont("Helvetica", 12)
    text = """
    Machine learning has numerous real-world applications:

    - Image Recognition: Identifying objects, faces, and scenes in images
    - Natural Language Processing: Translation, sentiment analysis, chatbots
    - Recommendation Systems: Netflix, Amazon, Spotify recommendations
    - Autonomous Vehicles: Self-driving cars use ML for navigation
    - Healthcare: Disease diagnosis, drug discovery, personalized treatment
    - Financial Services: Fraud detection, algorithmic trading, credit scoring

    The field continues to grow rapidly with new applications emerging regularly.
    """

    y = height - 150
    for line in text.strip().split("\n"):
        c.drawString(100, y, line.strip())
        y -= 20

    c.save()
    print(f"Created sample PDF: {filename}")
    return filename

# Create the sample PDF
pdf_path = create_sample_pdf()

---
## Step 10: Initialize and Test the Chatbot

Now let's create and test our traced PDF chatbot!

In [None]:
# Initialize the chatbot
chatbot = TracedPDFChatbot(pdf_path)
chatbot.initialize()

In [None]:
response = chatbot.chat(
    "What are the key findings?",
    user_id="user-123"
)
print(f"Answer: {response['answer']}")
print(f"Cost: ${response['cost_usd']:.6f}")

In [None]:
# Test with a few questions
questions = [
    "What is machine learning?",
    "What are the three types of machine learning?",
    "Who coined the term machine learning and when?",
    "What are some applications of machine learning?"
]

print("=" * 60)
print("Testing the RAG Pipeline")
print("=" * 60)

for i, question in enumerate(questions, 1):
    print(f"\n--- Question {i} ---")
    print(f"Q: {question}")

    response = chatbot.chat(question, user_id="test-user")

    print(f"\nA: {response['answer']}")
    print(f"\n[Tokens: {response['token_usage']['total']} | Cost: ${response['cost_usd']:.6f}]")
    print(f"[Retrieved {len(response['retrieved_chunks'])} chunks | "
          f"Max similarity: {max(c['similarity_score'] for c in response['retrieved_chunks']):.3f}]")
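
The loop above reports tokens and cost but not latency. A minimal way to measure wall-clock time per query is sketched below. `timed` is a hypothetical helper, and `fake_chat` is a stand-in for `chatbot.chat` so that the sketch runs without API keys.

```python
import time
from typing import Any, Callable, Tuple


def timed(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Call fn and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def fake_chat(question: str) -> dict:
    """Stand-in for chatbot.chat; sleeps briefly to simulate work."""
    time.sleep(0.01)
    return {"answer": f"echo: {question}", "cost_usd": 0.0001}


latencies = []
total_cost = 0.0
for question in ["What is machine learning?", "Who coined the term?"]:
    response, seconds = timed(fake_chat, question)
    latencies.append(seconds)
    total_cost += response["cost_usd"]

print(f"Average latency: {sum(latencies) / len(latencies):.3f}s | Total cost: ${total_cost:.6f}")
```

In the notebook, `timed(chatbot.chat, question, user_id="test-user")` would wrap the real call in the same way.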

---
## Step 12: Run Evaluation

Run the chatbot against all test cases and collect results.

In [None]:
def run_evaluation(dataset_id: str = "local-eval"):
    """Run the Netra test suite against every item in the given dataset."""
    dataset = Netra.evaluation.get_dataset(dataset_id)
    # `task` is called with each evaluator input as defined in the dataset.
    # The lambda below is only a placeholder: replace it with a function that
    # fits your evaluation needs, such as one that calls the chatbot.
    return Netra.evaluation.run_test_suite(
        name="Evaluation Test Demo",
        data=dataset,
        task=lambda eval_input: print(eval_input),
    )

# Run evaluation
print("\n" + "=" * 60)
print("Running Evaluation")
print("=" * 60 + "\n")

eval_results = run_evaluation("replace the id with your netra dataset id")
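
For a quick local sanity check before running a hosted evaluation, retrieval quality and answer correctness can be approximated with a keyword check. The sketch below is a crude proxy under stated assumptions, not Netra's evaluators: `keyword_recall` and `score_case` are hypothetical helpers, and the response dictionary only mirrors the shape returned by `TracedPDFChatbot.chat`.

```python
from typing import Dict, List


def keyword_recall(text: str, expected_keywords: List[str]) -> float:
    """Fraction of expected keywords that occur in text (case-insensitive)."""
    if not expected_keywords:
        return 0.0
    lowered = text.lower()
    hits = sum(1 for keyword in expected_keywords if keyword.lower() in lowered)
    return hits / len(expected_keywords)


def score_case(response: Dict, expected_keywords: List[str]) -> Dict[str, float]:
    """Score one response: keyword recall over retrieved chunks and over the answer."""
    context = " ".join(chunk["content"] for chunk in response["retrieved_chunks"])
    return {
        "retrieval_recall": keyword_recall(context, expected_keywords),
        "answer_recall": keyword_recall(response["answer"], expected_keywords),
    }


# Hand-written sample in the shape returned by chatbot.chat().
sample_response = {
    "answer": "The term was coined by Arthur Samuel in 1959.",
    "retrieved_chunks": [
        {"content": "The term was coined by Arthur Samuel in 1959 while at IBM."}
    ],
}
scores = score_case(sample_response, ["Arthur Samuel", "1959", "IBM"])
print(scores)
```

Here the retrieved context contains all three keywords while the answer contains two of them, so a gap between the two scores points at the generation step rather than at retrieval.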

---
## Step 14: Shutdown Netra

Flush all pending traces and shut down gracefully.

In [None]:
# Shut down Netra gracefully
Netra.shutdown()
print("Netra shutdown complete!")
print("\nView your traces at: https://app.getnetra.ai")

---
## View Details on Evaluations and Test Runs

For more details on running evaluations and test runs for this RAG project, refer to the [Tracing and Evaluating a RAG Pipeline Cookbook](https://docs.getnetra.ai/Cookbooks/pdf-qa-rag-chatbot).

## Documentation Links

- [Netra Documentation](https://docs.getnetra.ai)
- [Evaluators Guide](https://docs.getnetra.ai/Evaluation/Evaluators)
- [Datasets Guide](https://docs.getnetra.ai/Evaluation/Datasets)
- [Manual Tracing](https://docs.getnetra.ai/Observability/Traces/manual-tracing)