# Deep Dive: Context Engineering Dashboard

A comprehensive walkthrough of every feature in the dashboard.

**What you will learn:**

| Section | Feature |
|---|---|
| 1 | Chroma RAG tracing with `trace_chroma()` |
| 2 | Available pool vs. selected context |
| 3 | Gesture-based interactions (hover, click, click-to-edit) |
| 4 | Horizontal vs. Treemap layouts |
| 5 | Live OpenAI tracing with `trace_openai()` |
| 6 | Multi-turn conversation tracing |
| 7 | Sankey diff for prompt optimization |
| 8 | Context budget analysis |
| 9 | Serialization & reproducibility |

In [None]:
# Setup: works with pip install OR local development
import sys
from pathlib import Path

# Try importing the package - if it fails, add parent directory to path
try:
    import context_engineering_dashboard
    print(f"Using installed package: {context_engineering_dashboard.__file__}")
except ImportError:
    # Running from local clone - add parent directory to path
    repo_root = Path().resolve().parent
    if str(repo_root) not in sys.path:  # sys.path holds strings, not Path objects
        sys.path.insert(0, str(repo_root))
    print(f"Using local development: {repo_root}")

# Installation options (run one if package not found):
# Option 1: Install from GitHub release
# !pip install git+https://github.com/cp71-dlai/context-engineering-dashboard.git@v0.1.0

# Option 2: Install locally for development (from repo root)
# !pip install -e ".[dev,all]"

In [None]:
import os
from dotenv import load_dotenv

load_dotenv()  # reads OPENAI_API_KEY from .env

---
## 1 | Chroma RAG tracing

`trace_chroma()` wraps a Chroma collection so that every `.query()` call
is automatically recorded: documents, distances, scores, and metadata.
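
To make the idea of a recording wrapper concrete, here is a minimal sketch of the pattern. It is not the library's implementation: `RecordingCollection` and `FakeCollection` are hypothetical names, and the stand-in collection returns canned results so the cell runs without Chroma.

```python
from typing import Any, Dict, List


class FakeCollection:
    """Hypothetical stand-in that mimics the shape of a Chroma query result."""

    def query(self, query_texts: List[str], n_results: int = 2) -> Dict[str, Any]:
        ids = ["doc_a", "doc_b", "doc_c"][:n_results]
        return {
            "ids": [ids],
            "documents": [[f"text of {i}" for i in ids]],
            "distances": [[0.25 * (k + 1) for k in range(len(ids))]],
        }


class RecordingCollection:
    """Hypothetical wrapper: forwards .query() and keeps a log of every call."""

    def __init__(self, collection: Any) -> None:
        self._collection = collection
        self.calls: List[Dict[str, Any]] = []

    def query(self, **kwargs: Any) -> Dict[str, Any]:
        results = self._collection.query(**kwargs)
        # Store the arguments together with the results for later inspection.
        self.calls.append({"kwargs": kwargs, "results": results})
        return results


recorded = RecordingCollection(FakeCollection())
out = recorded.query(query_texts=["what is RAG?"], n_results=2)
print(len(recorded.calls), out["ids"][0])
```

Because the wrapper returns the underlying result unchanged, calling code does not need to know it is being recorded.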

In [None]:
import chromadb

client = chromadb.Client()
collection = client.get_or_create_collection(
    name="context_eng_docs",
    metadata={"description": "Context engineering reference docs"},
)

# Populate with realistic documentation chunks
docs = [
    {
        "id": "ce_overview",
        "text": (
            "Context engineering is the discipline of designing and optimizing the information "
            "provided to a large language model within its context window. Unlike prompt engineering, "
            "which focuses on instruction phrasing, context engineering considers the entire input: "
            "system prompts, retrieved documents, chat history, tool outputs, and user messages. "
            "The goal is to maximize the signal-to-noise ratio so the model produces accurate, "
            "grounded responses."
        ),
        "meta": {"section": "overview", "page": 1},
    },
    {
        "id": "ce_components",
        "text": (
            "A typical LLM context window contains several component types: (1) System Prompt -- "
            "defines the assistant persona and rules. (2) RAG Documents -- retrieved passages providing "
            "factual grounding. (3) Chat History -- prior turns for conversational continuity. "
            "(4) Tool Outputs -- results from function calls. (5) User Message -- the current query. "
            "Each component competes for limited token budget."
        ),
        "meta": {"section": "components", "page": 3},
    },
    {
        "id": "ce_rag_best",
        "text": (
            "RAG best practices: (1) Retrieve more than you need, then re-rank and prune. "
            "(2) Prefer smaller, focused chunks (200-400 tokens) over large passages. "
            "(3) Include metadata (source, date, score) so the model can weigh relevance. "
            "(4) Place the highest-scoring documents closest to the user query for recency bias. "
            "(5) Monitor retrieval quality with similarity-score thresholds."
        ),
        "meta": {"section": "rag", "page": 7},
    },
    {
        "id": "ce_history",
        "text": (
            "Chat history management strategies: (1) Sliding window -- keep only the last N turns. "
            "(2) Summarization -- compress older turns into a summary paragraph. "
            "(3) Selective inclusion -- only include turns relevant to the current query. "
            "(4) Token budgeting -- allocate a fixed percentage of the context window to history. "
            "Excessive history leads to distraction and increased latency."
        ),
        "meta": {"section": "history", "page": 12},
    },
    {
        "id": "ce_tools",
        "text": (
            "Tool integration patterns: Function calling lets the model invoke external APIs. "
            "Each tool definition consumes tokens from the context window. Best practices: "
            "(1) Only include tools relevant to the current task. (2) Keep descriptions concise. "
            "(3) Provide few-shot examples for complex tools. "
            "(4) Monitor tool-output token counts to avoid budget overruns."
        ),
        "meta": {"section": "tools", "page": 18},
    },
    {
        "id": "ce_eval",
        "text": (
            "Evaluating context quality: (1) Measure answer accuracy with and without specific "
            "components. (2) Track token utilization (used vs. available). (3) Compare retrieval "
            "strategies using before/after diffs. (4) Log context traces for every production call "
            "to identify patterns (e.g., which docs are never selected, which prompts are too long)."
        ),
        "meta": {"section": "evaluation", "page": 22},
    },
    {
        "id": "chroma_setup",
        "text": (
            "ChromaDB setup: pip install chromadb. Create an in-memory client with chromadb.Client() "
            "or a persistent one with chromadb.PersistentClient(path='./data'). Collections store "
            "documents and their embeddings. Default embedding function uses sentence-transformers."
        ),
        "meta": {"section": "chroma", "page": 25},
    },
    {
        "id": "chroma_query",
        "text": (
            "Querying ChromaDB: collection.query(query_texts=['your question'], n_results=10). "
            "Returns IDs, documents, distances, and metadata. Use where={'field': 'value'} for "
            "metadata filtering. Distances are L2 by default; convert to similarity with 1/(1+d)."
        ),
        "meta": {"section": "chroma", "page": 27},
    },
]

collection.add(
    ids=[d["id"] for d in docs],
    documents=[d["text"] for d in docs],
    metadatas=[d["meta"] for d in docs],
)

print(f"Collection '{collection.name}' has {collection.count()} documents")

In [None]:
from context_engineering_dashboard import trace_chroma, ContextWindow

# Wrap the collection -- all queries are now traced
traced = trace_chroma(collection)

# Query: retrieve 6 documents (more than we will select)
user_question = "What are the best practices for RAG in context engineering?"

results = traced.query(
    query_texts=[user_question],
    n_results=6,
)

print(f"Retrieved {len(results['ids'][0])} documents:\n")
for doc_id, doc, dist in zip(
    results["ids"][0],
    results["documents"][0],
    results["distances"][0],
):
    score = 1.0 / (1.0 + dist)
    print(f"  [{doc_id}]  score={score:.3f}")
    print(f"    {doc[:90]}...\n")
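
The loop above turns each distance into a score with `1 / (1 + d)`. The sketch below applies the same mapping to a few made-up distances to show how it behaves; the helper name `distance_to_score` is hypothetical.

```python
def distance_to_score(distance: float) -> float:
    """Map a non-negative distance to a score in (0, 1]; smaller distance, higher score."""
    return 1.0 / (1.0 + distance)


# Illustrative distances, not real query output.
for d in (0.0, 0.5, 1.0, 3.0):
    print(f"distance={d:.1f} -> score={distance_to_score(d):.3f}")
```

A distance of zero maps to a score of 1, and the score shrinks toward 0 as the distance grows, so ranking by score descending is the same as ranking by distance ascending.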

In [None]:
# Select top-3 documents for the context window
selected_ids = results["ids"][0][:3]
traced.mark_selected(selected_ids)

# Add the rest of the prompt
traced.add_system_prompt(
    "You are an expert on context engineering for LLM applications. "
    "Answer questions using only the provided documents. "
    "Cite the source section when possible."
)
traced.add_user_message(user_question)

# Build the trace
rag_trace = traced.get_trace(context_limit=128_000)

print(f"Selected: {selected_ids}")
print(f"Total tokens: {rag_trace.total_tokens:,}")

In [None]:
# Visualize the context window
ContextWindow(trace=rag_trace)

---
## 2 | Available pool vs. selected context

When `show_available_pool=True`, the dashboard shows **all** documents
Chroma returned (left panel) next to what actually made it into the
context window (right panel). Green = selected, grey = cut.
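
The split between the two panels can be sketched without the dashboard. The example below assumes a simple top-k selection by score; the `Retrieved` record, the `split_pool` helper, and all numbers are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Retrieved:
    """Hypothetical record for one retrieved document."""
    doc_id: str
    score: float
    token_count: int


def split_pool(pool: List[Retrieved], top_k: int) -> Tuple[List[Retrieved], List[Retrieved]]:
    """Return (selected, cut): the top_k highest-scoring documents and the rest."""
    ranked = sorted(pool, key=lambda r: r.score, reverse=True)
    return ranked[:top_k], ranked[top_k:]


# Made-up scores and token counts for illustration.
pool = [
    Retrieved("a", 0.71, 120),
    Retrieved("b", 0.64, 95),
    Retrieved("c", 0.58, 110),
    Retrieved("d", 0.41, 80),
]
selected, cut = split_pool(pool, top_k=2)
print("selected:", [r.doc_id for r in selected])
print("cut:     ", [r.doc_id for r in cut])
print("tokens saved by cutting:", sum(r.token_count for r in cut))
```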

In [None]:
ContextWindow(trace=rag_trace, show_available_pool=True)

In [None]:
# Inspect the raw query trace
query = rag_trace.chroma_queries[0]

print(f"Query: \"{query.query_text}\"")
print(f"Collection: {query.collection}")
print(f"Requested n_results: {query.n_results}")
print()
print(f"{'ID':<20} {'Score':>6} {'Tokens':>7} {'Status'}")
print("-" * 50)
for r in query.results:
    status = "SELECTED" if r.selected else "cut"
    print(f"{r.id:<20} {r.score:>6.3f} {r.token_count:>7,} {status}")

---
## 3 | Gesture-based interactions

The dashboard responds to three mouse gestures:

| Gesture | Action |
|---------|--------|
| **Hover** | Tooltip showing component type and token count |
| **Click** | Opens a modal with full content and metadata |
| **Click text in modal** | Switch to edit mode (Save button appears in header) |

In [None]:
# Try the gestures on this visualization:
# - Hover over any block to see the tooltip
# - Click a RAG block to see its full content and metadata
# - Click on the text content to edit (Save button appears in header)
ContextWindow(trace=rag_trace)

---
## 4 | Horizontal vs. Treemap layouts

Two layout algorithms are available:

- **Horizontal** (default) -- flex row where each block's width is proportional to its token count.
  Great for comparing a few large components.
- **Treemap** -- squarified treemap (Bruls-Huizing-van Wijk). Better when you have
  many components of varying size, since small blocks remain visible.
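
To see why small blocks are hard to spot in a proportional row, the sketch below computes each block's share of the width from the token counts used in this section's example. It assumes widths are normalized by the tokens used rather than by the context limit, and the 800-pixel container is an arbitrary illustrative value.

```python
# Token counts taken from the ten-component example in this section.
token_counts = {
    "sys": 3500, "rag_1": 6200, "rag_2": 4100, "rag_3": 2800,
    "hist_1": 1500, "hist_2": 1200, "tool_1": 800, "few_1": 600,
    "mem": 400, "user": 250,
}

total_used = sum(token_counts.values())
container_px = 800  # assumed container width, for illustration only

# Each block's width as a percentage of the row.
widths = {name: 100 * count / total_used for name, count in token_counts.items()}

for name, pct in widths.items():
    print(f"{name:<7} {pct:5.2f}%  ~{container_px * pct / 100:6.1f}px")
```

The smallest component gets just over 1% of the row, which is only a few pixels in a typical notebook cell.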

In [None]:
from context_engineering_dashboard import ComponentType, ContextComponent, ContextTrace

# Build a trace with many components to highlight layout differences
many_components = [
    ContextComponent("sys",     ComponentType.SYSTEM_PROMPT, "System instructions.",     token_count=3500),
    ContextComponent("rag_1",   ComponentType.RAG_DOCUMENT,  "Architecture overview.",   token_count=6200),
    ContextComponent("rag_2",   ComponentType.RAG_DOCUMENT,  "API reference.",           token_count=4100),
    ContextComponent("rag_3",   ComponentType.RAG_DOCUMENT,  "Deployment guide.",        token_count=2800),
    ContextComponent("hist_1",  ComponentType.CHAT_HISTORY,  "Turn 1.",                  token_count=1500),
    ContextComponent("hist_2",  ComponentType.CHAT_HISTORY,  "Turn 2.",                  token_count=1200),
    ContextComponent("tool_1",  ComponentType.TOOL,          "DB query result.",         token_count=800),
    ContextComponent("few_1",   ComponentType.FEW_SHOT,      "Example Q&A pair.",        token_count=600),
    ContextComponent("mem",     ComponentType.MEMORY,        "User preferences.",        token_count=400),
    ContextComponent("user",    ComponentType.USER_MESSAGE,  "What is the API limit?",   token_count=250),
]

rich_trace = ContextTrace(
    context_limit=128_000,
    components=many_components,
    total_tokens=sum(c.token_count for c in many_components),
)

print(f"Components: {len(many_components)}, Tokens: {rich_trace.total_tokens:,}")

In [None]:
# Horizontal
ContextWindow(trace=rich_trace, layout="horizontal")

In [None]:
# Treemap -- notice how small components (Memory, Few-Shot) are easier to see
ContextWindow(trace=rich_trace, layout="treemap")

---
## 5 | Live OpenAI tracing

`trace_openai()` monkey-patches the OpenAI client to capture every
`chat.completions.create` call inside the `with` block. Token counts
come from the API's own usage data.
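
The general pattern of patching a method for the duration of a `with` block can be sketched as follows. This is an illustration of the technique, not the library's code: `record_calls` and `FakeCompletions` are hypothetical, and the fake client stands in for a real API so the cell runs offline.

```python
import time
from contextlib import contextmanager
from typing import Any, Dict, Iterator, List


class FakeCompletions:
    """Hypothetical stand-in for a client object with a create() method."""

    def create(self, **kwargs: Any) -> Dict[str, Any]:
        return {"echo": kwargs["messages"][-1]["content"]}


@contextmanager
def record_calls(cls: type, method_name: str) -> Iterator[List[Dict[str, Any]]]:
    """Temporarily replace cls.method_name with a version that logs each call."""
    original = getattr(cls, method_name)
    log: List[Dict[str, Any]] = []

    def wrapper(self: Any, *args: Any, **kwargs: Any) -> Any:
        start = time.perf_counter()
        result = original(self, *args, **kwargs)
        log.append({
            "kwargs": kwargs,
            "latency_ms": (time.perf_counter() - start) * 1000,
        })
        return result

    setattr(cls, method_name, wrapper)
    try:
        yield log
    finally:
        # Restore the original method even if the block raised.
        setattr(cls, method_name, original)


completions = FakeCompletions()
with record_calls(FakeCompletions, "create") as call_log:
    completions.create(model="demo", messages=[{"role": "user", "content": "hi"}])

print(len(call_log), call_log[0]["kwargs"]["model"])
```

Restoring the method in `finally` keeps the patch scoped to the block, so calls made outside it are not recorded.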

In [None]:
from openai import OpenAI
from context_engineering_dashboard import trace_openai

client = OpenAI()

system_prompt = (
    "You are an expert on context engineering for large language models. "
    "When answering, structure your response with clear headings and examples."
)

with trace_openai() as tracer:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user",   "content": (
                "Compare three chat-history management strategies for LLM context windows: "
                "sliding window, summarization, and selective inclusion. "
                "Include trade-offs for each."
            )},
        ],
        temperature=0.7,
    )

print(response.choices[0].message.content)

In [None]:
openai_trace = tracer.result

print(f"Model:       {openai_trace.llm_trace.model}")
print(f"Provider:    {openai_trace.llm_trace.provider}")
print(f"Prompt:      {openai_trace.llm_trace.usage.get('prompt_tokens', '?')} tokens")
print(f"Completion:  {openai_trace.llm_trace.usage.get('completion_tokens', '?')} tokens")
print(f"Latency:     {openai_trace.llm_trace.latency_ms:.0f} ms")
print(f"Context:     {openai_trace.context_limit:,} tokens")

# Click any component to see details
ContextWindow(trace=openai_trace)

---
## 6 | Multi-turn conversation tracing

Trace a live call whose `messages` list already contains earlier turns to
see how chat history accumulates in the context window.
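
A rough sketch of that accumulation, independent of any API call: each new turn is appended to the message list, so the prompt grows with the conversation. The `rough_tokens` helper is a hypothetical, crude proxy that counts whitespace-separated words; a real tokenizer will give different numbers.

```python
from typing import Dict, List


def rough_tokens(text: str) -> int:
    """Crude proxy: count whitespace-separated words. A real tokenizer will differ."""
    return len(text.split())


conversation: List[Dict[str, str]] = [
    {"role": "system", "content": "You are a helpful assistant. Be concise."},
    {"role": "user", "content": "What is ChromaDB?"},
    {"role": "assistant", "content": "ChromaDB is an open-source embedding database."},
    {"role": "user", "content": "How do I add documents to a collection?"},
]

running_total = 0
totals = []
for message in conversation:
    running_total += rough_tokens(message["content"])
    totals.append(running_total)
    print(f"{message['role']:<10} cumulative ~{running_total} tokens")
```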

In [None]:
with trace_openai() as tracer:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system",    "content": "You are a helpful assistant. Be concise."},
            {"role": "user",      "content": "What is ChromaDB?"},
            {"role": "assistant", "content": "ChromaDB is an open-source embedding database designed for AI applications. It stores and retrieves documents using vector similarity search."},
            {"role": "user",      "content": "How do I add documents to a collection?"},
            {"role": "assistant", "content": "Use collection.add(ids=['id1'], documents=['text'], metadatas=[{'key': 'val'}]). IDs must be unique strings."},
            {"role": "user",      "content": "Now show me how to query with metadata filtering."},
        ],
        temperature=0.7,
    )

print("Response:", response.choices[0].message.content[:200], "...")

In [None]:
multi_turn_trace = tracer.result

print(f"Components: {len(multi_turn_trace.components)}")
for c in multi_turn_trace.components:
    print(f"  {c.id:<15} {c.type.value:<15} {c.token_count:>5} tokens")

# Click components to see chat history content
ContextWindow(trace=multi_turn_trace)

---
## 7 | Sankey diff for prompt optimization

`ContextDiff` renders an SVG Sankey diagram showing how token allocation
changed between two versions of a prompt.
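
The numbers behind such a diagram are per-component token deltas. The sketch below computes them with plain dictionaries, using the token counts from this section's before/after example; it does not use `ContextDiff` and makes no claim about how the library computes its flows.

```python
from typing import Dict

# Token counts per component id, mirroring the before/after example in this section.
tokens_before: Dict[str, int] = {
    "sys": 5000, "rag_1": 8000, "rag_2": 6000, "rag_3": 4000,
    "hist": 22000, "tools": 3000, "user": 500,
}
tokens_after: Dict[str, int] = {
    "sys": 2500, "rag_1": 8000, "rag_2": 6000,
    "hist": 5000, "tools": 3000, "user": 500,
}

# A component missing on one side counts as zero tokens there.
deltas = {}
for comp_id in tokens_before.keys() | tokens_after.keys():
    deltas[comp_id] = tokens_after.get(comp_id, 0) - tokens_before.get(comp_id, 0)

for comp_id, delta in sorted(deltas.items(), key=lambda kv: kv[1]):
    if delta:
        print(f"{comp_id:<6} {delta:+,}")
print(f"total  {sum(deltas.values()):+,}")
```

Chat history accounts for most of the saving (17,000 of the 23,500 tokens removed), which is the kind of conclusion the diff view is meant to surface.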

In [None]:
from context_engineering_dashboard import ContextDiff

# Scenario: optimize a production RAG prompt
before = ContextTrace(
    context_limit=128_000,
    components=[
        ContextComponent("sys",    ComponentType.SYSTEM_PROMPT, "...", token_count=5000),
        ContextComponent("rag_1",  ComponentType.RAG_DOCUMENT,  "...", token_count=8000),
        ContextComponent("rag_2",  ComponentType.RAG_DOCUMENT,  "...", token_count=6000),
        ContextComponent("rag_3",  ComponentType.RAG_DOCUMENT,  "...", token_count=4000),
        ContextComponent("hist",   ComponentType.CHAT_HISTORY,  "...", token_count=22000),
        ContextComponent("tools",  ComponentType.TOOL,          "...", token_count=3000),
        ContextComponent("user",   ComponentType.USER_MESSAGE,  "...", token_count=500),
    ],
    total_tokens=48500,
)

# After optimization:
# - Summarized chat history (22K -> 5K)
# - Dropped lowest-scoring RAG doc
# - Trimmed system prompt
after = ContextTrace(
    context_limit=128_000,
    components=[
        ContextComponent("sys",    ComponentType.SYSTEM_PROMPT, "...", token_count=2500),
        ContextComponent("rag_1",  ComponentType.RAG_DOCUMENT,  "...", token_count=8000),
        ContextComponent("rag_2",  ComponentType.RAG_DOCUMENT,  "...", token_count=6000),
        ContextComponent("hist",   ComponentType.CHAT_HISTORY,  "...", token_count=5000),
        ContextComponent("tools",  ComponentType.TOOL,          "...", token_count=3000),
        ContextComponent("user",   ComponentType.USER_MESSAGE,  "...", token_count=500),
    ],
    total_tokens=25000,
)

diff = ContextDiff(
    before=before,
    after=after,
    before_label="v1 (Verbose)",
    after_label="v2 (Optimized)",
)

diff.sankey()

In [None]:
diff.summary()

In [None]:
# Side by side: before and after
from IPython.display import display, HTML

display(HTML("<h3>BEFORE</h3>"))
display(ContextWindow(trace=before))
display(HTML("<h3>AFTER</h3>"))
display(ContextWindow(trace=after))

---
## 8 | Context budget analysis

Programmatically inspect where your tokens go.
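
The arithmetic behind a budget report is simple enough to sketch without the library. The example assumes utilization is expressed as a percentage of the context limit, as the print format in the following cell suggests; the component values are illustrative only.

```python
from collections import defaultdict

context_limit = 128_000
# (component type, token count) pairs, illustrative values only.
parts = [
    ("system_prompt", 3500),
    ("rag_document", 6200),
    ("rag_document", 4100),
    ("chat_history", 1500),
    ("user_message", 250),
]

used = sum(tokens for _, tokens in parts)

# Aggregate token counts per component type.
by_type = defaultdict(int)
for kind, tokens in parts:
    by_type[kind] += tokens

utilization_pct = 100 * used / context_limit
print(f"used={used:,} free={context_limit - used:,} utilization={utilization_pct:.1f}%")
for kind, tokens in by_type.items():
    print(f"{kind:<14} {100 * tokens / used:5.1f}% of used")
```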

In [None]:
# Analyze the RAG trace from Section 1
trace = rag_trace

print("CONTEXT BUDGET ANALYSIS")
print("=" * 50)
print(f"Context limit:  {trace.context_limit:>10,} tokens")
print(f"Tokens used:    {trace.total_tokens:>10,} tokens")
print(f"Tokens free:    {trace.unused_tokens:>10,} tokens")
print(f"Utilization:    {trace.utilization:>9.1f}%")
print()
print(f"{'Component Type':<22} {'Tokens':>8} {'% of Used':>10} {'Count':>6}")
print("-" * 50)

for comp_type in ComponentType:
    comps = trace.get_components_by_type(comp_type)
    if comps:
        total = sum(c.token_count for c in comps)
        pct = (total / trace.total_tokens) * 100 if trace.total_tokens else 0
        print(f"  {comp_type.value:<20} {total:>8,} {pct:>9.1f}% {len(comps):>6}")

In [None]:
# Individual component breakdown
print(f"\n{'ID':<20} {'Type':<18} {'Tokens':>7}")
print("-" * 50)
for c in trace.components:
    print(f"{c.id:<20} {c.type.value:<18} {c.token_count:>7,}")

---
## 9 | Serialization & reproducibility

Traces serialize to JSON so you can save snapshots, share with teammates,
and compare across time.
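
A JSON round trip of this kind can be sketched with standard-library dataclasses. `Part` and `Snapshot` are hypothetical, simplified stand-ins and do not reflect the real schema written by `to_json()`.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class Part:
    """Hypothetical, simplified component record."""
    id: str
    type: str
    token_count: int


@dataclass
class Snapshot:
    """Hypothetical, simplified trace record."""
    context_limit: int
    parts: List[Part]

    def dumps(self) -> str:
        # asdict() recursively converts nested dataclasses to plain dicts.
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def loads(cls, text: str) -> "Snapshot":
        raw = json.loads(text)
        return cls(
            context_limit=raw["context_limit"],
            parts=[Part(**p) for p in raw["parts"]],
        )


snapshot = Snapshot(
    128_000,
    [Part("sys", "system_prompt", 3500), Part("user", "user_message", 250)],
)
restored = Snapshot.loads(snapshot.dumps())
print(restored == snapshot)
```

Checking that the restored object equals the original is a quick way to confirm that nothing is lost in serialization.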

In [None]:
import json

# Save to JSON
rag_trace.to_json("deep_dive_trace.json")
print("Saved trace to deep_dive_trace.json")

# Peek at the JSON structure
with open("deep_dive_trace.json") as f:
    data = json.load(f)

print(f"\nJSON keys: {list(data.keys())}")
print(f"Components: {len(data['components'])}")
print(f"Chroma queries: {len(data.get('chroma_queries', []))}")
print(f"Total tokens: {data['total_tokens']:,}")

In [None]:
# Reload and visualize
reloaded = ContextTrace.from_json("deep_dive_trace.json")

print(f"Reloaded: {len(reloaded.components)} components, {reloaded.total_tokens:,} tokens")
print(f"Chroma queries: {len(reloaded.chroma_queries)}")

ContextWindow(trace=reloaded, show_available_pool=True)

In [None]:
# Convert to dict for programmatic access
trace_dict = rag_trace.to_dict()

# Example: extract all RAG scores
for comp in trace_dict["components"]:
    if comp["type"] == "rag_document" and comp.get("metadata"):
        score = comp["metadata"].get("chroma_score", "N/A")
        print(f"  {comp['id']}: score={score}")

In [None]:
# Clean up
os.remove("deep_dive_trace.json")

---

## Summary

| Feature | How to use it |
|---|---|
| Chroma tracing | `traced = trace_chroma(collection)` then `.query()` / `.mark_selected()` |
| OpenAI tracing | `with trace_openai() as t:` wraps any `client.chat.completions.create` call |
| Visualization | `ContextWindow(trace=trace, layout="horizontal")` |
| Interactions | Hover (tooltip), Click (modal), Click text (edit) |
| Available pool | `ContextWindow(trace=trace, show_available_pool=True)` |
| Diff view | `ContextDiff(before=before, after=after).sankey()` |
| Save/load | `trace.to_json('file.json')` / `ContextTrace.from_json('file.json')` |
| Budget analysis | `trace.utilization`, `trace.unused_tokens`, `trace.get_components_by_type()` |