# Context Engineering with Chroma — RAG Visualization

This notebook demonstrates how to visualize Chroma RAG workflows using the context-engineering-dashboard.

**Key Features:**
- See similarity scores in VIEW mode
- Explore full document metadata in EXPLORE mode
- Understand the "available pool" vs "selected context" distinction

In [None]:
import chromadb
from context_engineering_dashboard import trace_chroma, ContextWindow, ContextDiff

## 1. Setup: Create a Documentation Collection

In [None]:
# Initialize Chroma
client = chromadb.Client()

# Create collection with sample documentation
collection = client.get_or_create_collection(
    name="chroma_docs",
    metadata={"description": "ChromaDB documentation chunks"}
)

# Add documentation chunks
docs = [
    {
        "id": "install_01",
        "text": "To install ChromaDB, run: pip install chromadb. ChromaDB requires Python 3.8 or higher.",
        "metadata": {"section": "installation", "page": 1}
    },
    {
        "id": "collection_01",
        "text": "Collections are the core abstraction in Chroma. Create one with client.create_collection('name'). Each collection stores documents, embeddings, and metadata.",
        "metadata": {"section": "collections", "page": 5}
    },
    {
        "id": "collection_02",
        "text": "To add documents: collection.add(ids=['id1'], documents=['text'], metadatas=[{'key': 'value'}]). IDs must be unique strings.",
        "metadata": {"section": "collections", "page": 6}
    },
    {
        "id": "query_01",
        "text": "Query collections with collection.query(query_texts=['your question'], n_results=5). Returns documents ranked by semantic similarity.",
        "metadata": {"section": "querying", "page": 10}
    },
    {
        "id": "query_02",
        "text": "Filter results using where={'key': 'value'}. Supports operators like $eq, $ne, $gt, $lt for complex queries.",
        "metadata": {"section": "querying", "page": 11}
    },
    {
        "id": "embed_01",
        "text": "ChromaDB supports multiple embedding functions: OpenAI, Cohere, HuggingFace, or custom implementations.",
        "metadata": {"section": "embeddings", "page": 15}
    },
    {
        "id": "persist_01",
        "text": "For persistence, use PersistentClient: client = chromadb.PersistentClient(path='./chroma_db'). Data is saved to disk automatically.",
        "metadata": {"section": "persistence", "page": 20}
    },
    {
        "id": "advanced_01",
        "text": "For production deployments, use ChromaDB's client-server mode. Start the server with chroma run --path /db_path.",
        "metadata": {"section": "advanced", "page": 30}
    },
]

collection.add(
    ids=[d["id"] for d in docs],
    documents=[d["text"] for d in docs],
    metadatas=[d["metadata"] for d in docs],
)

print(f"Added {collection.count()} documents to collection")

## 2. Trace a RAG Query

Wrap the collection with `trace_chroma()` to capture retrieval details.

In [None]:
# Wrap collection for tracing
traced_collection = trace_chroma(collection)

# Perform a query — this captures ALL returned results
user_query = "How do I create and use a collection?"

results = traced_collection.query(
    query_texts=[user_query],
    n_results=6,  # Retrieve more than we'll use
)

print("Retrieved documents:")
for rank, (doc_id, doc, dist) in enumerate(zip(
    results['ids'][0],
    results['documents'][0],
    results['distances'][0]
), start=1):
    # Heuristic: map distance (lower = closer) into (0, 1], where higher = more similar
    score = 1 / (1 + dist)
    print(f"{rank}. [{doc_id}] score={score:.3f}")
    print(f"   {doc[:80]}...")
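
The `1 / (1 + dist)` conversion above is a display heuristic, not something Chroma defines. As a hedged sketch, the helper below (the name `distance_to_similarity` is hypothetical) shows two common mappings: the reciprocal form used in this notebook, which works for any non-negative distance, and `1 - dist`, which recovers cosine similarity when the collection uses cosine distance. Which one applies depends on the collection's distance metric, and nothing here is assumed about how the dashboard computes its own scores.

```python
def distance_to_similarity(dist: float, space: str = "l2") -> float:
    """Map a distance (lower = closer) to a similarity (higher = closer)."""
    if dist < 0:
        raise ValueError("distance must be non-negative")
    if space == "cosine":
        # Cosine distance is 1 - cosine similarity, so invert it directly.
        return 1.0 - dist
    # Generic monotone mapping into (0, 1]; a distance of 0 gives 1.0.
    return 1.0 / (1.0 + dist)


distances = [0.0, 0.5, 1.0, 3.0]  # made-up values for illustration
scores = [distance_to_similarity(d) for d in distances]
print([round(s, 3) for s in scores])
```

Either mapping preserves the ranking, so the choice only affects how the numbers read, not which documents come first.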

## 3. Mark Selected Documents

In a real RAG pipeline, you'd select the top-k documents that fit your context budget.

In [None]:
# Select only the top 3 documents for our context
selected_ids = results['ids'][0][:3]
traced_collection.mark_selected(selected_ids)

print(f"Selected {len(selected_ids)} documents: {selected_ids}")
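
The slice above takes a fixed top 3. A budget-aware variant might walk the ranked results and keep each document only while it fits a token allowance. The sketch below is illustrative: `select_within_budget` and `estimate_tokens` are hypothetical helpers, and the four-characters-per-token estimate is a rough assumption, not the dashboard's tokenizer.

```python
def estimate_tokens(text: str) -> int:
    # Rough assumption: about 4 characters per token. Use a real tokenizer in practice.
    return max(1, len(text) // 4)


def select_within_budget(ids, documents, budget_tokens):
    """Greedily keep ranked documents while they fit the token budget."""
    selected, used = [], 0
    for doc_id, text in zip(ids, documents):
        cost = estimate_tokens(text)
        if used + cost > budget_tokens:
            break  # Stop at the first document that does not fit (keeps rank order)
        selected.append(doc_id)
        used += cost
    return selected, used


ranked_ids = ["collection_01", "collection_02", "query_01"]
ranked_docs = ["a" * 160, "b" * 120, "c" * 200]  # 40, 30, and 50 estimated tokens
chosen, used = select_within_budget(ranked_ids, ranked_docs, budget_tokens=80)
print(chosen, used)
```

With the query results from this notebook, the inputs would be `results['ids'][0]` and `results['documents'][0]`, and the returned IDs could be passed to `mark_selected`.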

## 4. Add Other Context Components

In [None]:
# Add system prompt
traced_collection.add_system_prompt(
    "You are a helpful assistant for ChromaDB documentation. "
    "Answer questions based on the provided context. "
    "If the answer isn't in the context, say so."
)

# Add user message
traced_collection.add_user_message(user_query)

## 5. Visualize the Context Window

### VIEW Mode
Shows component sizes and **Chroma scores** as badges.

In [None]:
trace = traced_collection.get_trace(context_limit=128_000)

ctx = ContextWindow(trace=trace, mode="view")
ctx.display()

### EXPLORE Mode

**Double-click** any component to see:
- Full text content
- Similarity score
- All metadata fields
- Collection name

In [None]:
ctx = ContextWindow(trace=trace, mode="explore")
ctx.display()

## 6. Understanding the Available Pool

The trace captures **all** documents Chroma returned, not just the ones selected. This helps you understand:
- What was available to choose from
- Which documents were cut, and how their scores compare with the selected ones
- Score distribution across candidates

In [None]:
# Examine the Chroma query trace
chroma_query = trace.chroma_queries[0]

print(f"Query: {chroma_query.query_text}")
print(f"Collection: {chroma_query.collection}")
print(f"Results requested: {chroma_query.n_results}")
print()
print("All retrieved documents:")
print("-" * 60)

for result in chroma_query.results:
    status = "✓ SELECTED" if result.selected else "✗ cut"
    print(f"  [{result.id}] score={result.score:.3f} tokens={result.token_count} {status}")

In [None]:
# Visualize with available pool
ctx = ContextWindow(trace=trace, mode="explore", show_available_pool=True)
ctx.display()
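
One way to read the score distribution is to look for the largest drop between consecutive candidates, which can suggest a natural cutoff. The sketch below works on a plain list of scores (for example, `[r.score for r in chroma_query.results]`). `largest_gap_cutoff` is a hypothetical helper, not part of the dashboard, and the example scores are made up.

```python
def largest_gap_cutoff(scores):
    """Return how many top-ranked items sit above the largest score drop."""
    ranked = sorted(scores, reverse=True)
    if len(ranked) < 2:
        return len(ranked)
    gaps = [ranked[i] - ranked[i + 1] for i in range(len(ranked) - 1)]
    # Position of the biggest drop; every item up to and including it is kept.
    return gaps.index(max(gaps)) + 1


example_scores = [0.62, 0.60, 0.58, 0.41, 0.40, 0.39]  # made-up values for illustration
keep = largest_gap_cutoff(example_scores)
print(keep)
```

A cutoff found this way is only a starting point; a fixed top-k or a token budget may still be the better constraint for a given pipeline.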

## 7. Context Budget Analysis

Understand how much of your context window is used and by what.

In [None]:
from context_engineering_dashboard import ComponentType

print("Context Window Analysis")
print("=" * 40)
print(f"Limit: {trace.context_limit:,} tokens")
print(f"Used: {trace.total_tokens:,} tokens")
print(f"Unused: {trace.unused_tokens:,} tokens")
print(f"Utilization: {trace.utilization:.1f}%")
print()
print("By component type:")
print("-" * 40)

for comp_type in ComponentType:
    components = trace.get_components_by_type(comp_type)
    if components:
        total = sum(c.token_count for c in components)
        pct = (total / trace.total_tokens) * 100
        print(f"  {comp_type.value:<20} {total:>6,} tokens ({pct:>5.1f}%)")
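
The arithmetic behind a report like this can be sketched with plain dictionaries. The version below assumes utilization means used tokens divided by the context limit, expressed as a percentage; `budget_report`, the component names, and the token counts are all hypothetical and chosen only for illustration.

```python
def budget_report(token_counts: dict, context_limit: int) -> dict:
    """Summarize token usage per component type against a context limit."""
    used = sum(token_counts.values())
    return {
        "used": used,
        "unused": context_limit - used,
        "utilization_pct": 100 * used / context_limit,
        # Share of the used tokens per component; with nothing used, shares are 0.0.
        "share_pct": {
            name: (100 * count / used if used else 0.0)
            for name, count in token_counts.items()
        },
    }


# Made-up counts for illustration only.
report = budget_report(
    {"system_prompt": 30, "rag_document": 90, "user_message": 8},
    context_limit=128_000,
)
print(report["used"], report["unused"], f"{report['utilization_pct']:.2f}%")
```

Note that the per-component shares are relative to the tokens used, not to the limit, which matches how the loop above divides by `trace.total_tokens`.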

## 8. Experiment: What If We Selected Fewer Documents?

Create a second trace that selects only the top document, then compare it with the original top-3 trace.

In [None]:
# Reset and create a new trace with only top-1 document
traced_collection.reset()

results = traced_collection.query(query_texts=[user_query], n_results=6)
traced_collection.mark_selected(results['ids'][0][:1])  # Only top 1
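# Reuse the original system prompt so the diff isolates the change in selected documents
traced_collection.add_system_prompt(
    "You are a helpful assistant for ChromaDB documentation. "
    "Answer questions based on the provided context. "
    "If the answer isn't in the context, say so."
)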
traced_collection.add_user_message(user_query)

minimal_trace = traced_collection.get_trace(context_limit=128_000)

# Compare with the original (top-3) trace
diff = ContextDiff(
    before=trace,
    after=minimal_trace,
    before_label="Top 3 docs",
    after_label="Top 1 doc",
)
diff.sankey()

In [None]:
diff.summary()

## 9. Save Trace for Later Analysis

In [None]:
# Export to JSON
trace.to_json("chroma_rag_trace.json")
print("Trace saved to chroma_rag_trace.json")

# Peek at the structure
import json

with open("chroma_rag_trace.json") as f:
    data = json.load(f)

print("\nTrace contains:")
print(f"  - {len(data['components'])} components")
print(f"  - {len(data['chroma_queries'])} Chroma queries")
print(f"  - {data['total_tokens']} total tokens")

## Key Takeaways

1. **Scores visible in VIEW mode**: Quickly assess retrieval quality
2. **Full metadata in EXPLORE mode**: Debug document selection
3. **Available pool tracking**: See which candidates were cut and how they scored
4. **Diff views**: Compare retrieval strategies visually
5. **Serialization**: Save traces for reproducibility