# 10: Observability Dashboard 🔭

This notebook serves as the **"Flight Recorder"** for the SalesOps Agent Suite (Day 9).

To win the Capstone, we must prove that our agent is **Deterministic, Measurable, and Auditable**. This dashboard visualizes the JSONL telemetry logs generated by the `observability` package.

### 🎯 Goals
1.  **Audit Runs:** See a history of all Coordinator executions (Success/Failure).
2.  **Visualize Traces:** View a Gantt chart of the agent workflow (Ingest → Detect → Explain → Act).
3.  **Analyze Performance:** Track LLM latency and estimated token costs.
4.  **Verify Actions:** Confirm that downstream actions (Jira/Email) were executed correctly.

### 🏗️ Components Used
* `observability.collector.LogCollector`: Aggregates logs from `outputs/observability/`.
* `plotly`: Interactive charts for Traces and Metrics.
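
JSONL stores one JSON object per line, which keeps the telemetry logs easy to append to and audit. As a rough illustration of the kind of aggregation a collector performs, the sketch below reads every `*.jsonl` file in a directory into a list of records. The helper name `read_jsonl_dir` and the policy of skipping malformed lines are assumptions made for illustration; this is not the actual `LogCollector` implementation.

```python
import json
from pathlib import Path


def read_jsonl_dir(directory):
    """Read every *.jsonl file in `directory` into a list of dicts.

    Hypothetical helper for illustration; not part of the `observability` package.
    """
    records = []
    for path in sorted(Path(directory).glob("*.jsonl")):
        with path.open("r", encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if not line:
                    continue  # tolerate blank lines between records
                try:
                    records.append(json.loads(line))
                except json.JSONDecodeError:
                    # Assumption: a partially written line is skipped, not fatal
                    continue
    return records
```

A list of dictionaries like this can be handed straight to `pd.DataFrame(records)` when a tabular view is needed.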

## 1: Imports

In [1]:
import sys
import os
import json
import pandas as pd
import plotly.express as px
from IPython.display import display, Markdown

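# Add project root to path. Notebooks do not define __file__, so resolve the
# root relative to the current working directory (the notebook's folder).
project_root = os.path.abspath(os.path.join(os.getcwd(), ".."))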
if project_root not in sys.path:
    sys.path.append(project_root)

os.environ["OBSERVABILITY_DIR"] = os.path.join(project_root, "outputs", "observability")

from observability.collector import LogCollector

# Initialize Collector
OBS_DIR = os.environ["OBSERVABILITY_DIR"]
collector = LogCollector(OBS_DIR)

print(f"✅ Dashboard Connected to: {os.path.abspath(OBS_DIR)}")

✅ Dashboard Connected to: d:\01. Github\salesops-suite\outputs\observability


## 2: Runs Overview

In [2]:
df_runs = collector.get_runs()

if not df_runs.empty:
    print(f"📊 Total Runs: {len(df_runs)}")

    # Status Breakdown
    status_counts = df_runs["status"].value_counts().reset_index()
    status_counts.columns = ["Status", "Count"]
    display(status_counts)

    # Show Table
    display(df_runs[["run_id", "status", "start_ts", "duration_sec"]].tail())
else:
    print("❌ No runs found. Please execute 'python main.py' first.")

📊 Total Runs: 6


| | Status | Count |
|---|---|---|
| 0 | completed | 6 |


| | run_id | status | start_ts | duration_sec |
|---|---|---|---|---|
| 1 | run_20251129T062235Z_e50586 | completed | 2025-11-29 06:22:35.678508+00:00 | 27.209853 |
| 2 | run_20251129T062235Z_e50586 | completed | 2025-11-29 06:24:56.430967+00:00 | 26.903636 |
| 3 | run_20251129T064142Z_aeb5fe | completed | 2025-11-29 06:41:42.434285+00:00 | 28.521615 |
| 4 | run_20251129T133320Z_6aa263 | completed | 2025-11-29 13:33:20.674748+00:00 | 28.975827 |
| 5 | run_20251129T133320Z_6aa263 | completed | 2025-11-29 13:34:20.904851+00:00 | 27.303216 |
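
The status breakdown can be extended into a few headline numbers for the audit. The sketch below computes a success rate and duration statistics from run records. It assumes each record carries the `status` and `duration_sec` fields shown in the table and that `completed` is the only success status; `summarize_runs` is a hypothetical helper, written against plain dictionaries so it runs without the collector.

```python
from collections import Counter
from statistics import mean


def summarize_runs(runs):
    """Summarize run records (dicts with `status` and `duration_sec`)."""
    if not runs:
        return {"total": 0, "by_status": {}, "success_rate": None,
                "mean_duration_sec": None, "max_duration_sec": None}
    by_status = Counter(r["status"] for r in runs)
    durations = [r["duration_sec"] for r in runs]
    return {
        "total": len(runs),
        "by_status": dict(by_status),
        # Assumption: "completed" is the only status that counts as success
        "success_rate": by_status.get("completed", 0) / len(runs),
        "mean_duration_sec": mean(durations),
        "max_duration_sec": max(durations),
    }


# The five runs listed in the table
runs = [
    {"status": "completed", "duration_sec": 27.209853},
    {"status": "completed", "duration_sec": 26.903636},
    {"status": "completed", "duration_sec": 28.521615},
    {"status": "completed", "duration_sec": 28.975827},
    {"status": "completed", "duration_sec": 27.303216},
]
summary = summarize_runs(runs)
print(f"Success rate: {summary['success_rate']:.0%}")
print(f"Mean duration: {summary['mean_duration_sec']:.2f} s")
```

In the notebook itself, `summarize_runs(df_runs.to_dict("records"))` would feed the helper the full run history.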


## 3: Trace Visualization (Gantt Chart)

In [3]:
df_spans = collector.get_traces()

if not df_spans.empty and not df_runs.empty:
    current_spans = df_spans.copy()

    if not current_spans.empty:
        # Sort for Gantt
        current_spans = current_spans.sort_values("start_ts")

        # Floor duration_ms at 1 ms so the hover tooltip never reports 0.
        # Bar widths come from start_ts/end_ts, so the bars are unaffected.
        current_spans["duration_ms"] = current_spans["duration_ms"].clip(lower=1)

        fig = px.timeline(
            current_spans,
            x_start="start_ts",
            x_end="end_ts",
            y="name",
            color="component",
            title="Execution Trace Waterfall",
            hover_data=["duration_ms", "status", "error"],
            height=400,
        )

        fig.update_traces(marker_line_width=1, opacity=1)
        fig.update_yaxes(autorange="reversed")
        fig.show()
    else:
        print("No spans found.")
else:
    print("❌ No trace data available.")
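
The cell above plots every span the collector returns, so spans from different runs share one timeline. If the span records carry a `run_id` field (an assumption; the columns returned by `get_traces()` are not shown here), narrowing the chart to a single run could look like the sketch below. `spans_for_run` is a hypothetical helper, and the sample spans are made up for illustration using the stage names from the Goals list.

```python
def spans_for_run(spans, run_id):
    """Return the spans that belong to `run_id`, ordered by start time.

    Assumption: each span is a dict with `run_id` and a sortable `start_ts`.
    """
    selected = [s for s in spans if s.get("run_id") == run_id]
    return sorted(selected, key=lambda s: s["start_ts"])


# Made-up sample spans; timestamps are ISO strings in one format, so they sort correctly
sample_spans = [
    {"run_id": "run_a", "name": "Detect", "start_ts": "2025-11-29T06:22:37Z"},
    {"run_id": "run_b", "name": "Ingest", "start_ts": "2025-11-29T06:41:42Z"},
    {"run_id": "run_a", "name": "Ingest", "start_ts": "2025-11-29T06:22:35Z"},
]

for span in spans_for_run(sample_spans, "run_a"):
    print(span["start_ts"], span["name"])
```

With a DataFrame, the equivalent filter would be `df_spans[df_spans["run_id"] == run_id]`, again assuming that column exists.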

## 4: LLM Metrics (Cost & Latency)

In [4]:
df_llm = collector.get_llm_calls()

if not df_llm.empty:
    # Latency Distribution
    fig_hist = px.histogram(
        df_llm,
        x="latency_ms",
        nbins=20,
        color="model",
        title="LLM Latency Distribution (ms)",
        marginal="box",
    )
    fig_hist.show()

    # KPIs
    total_calls = len(df_llm)
    total_tokens = df_llm["est_tokens"].sum() if "est_tokens" in df_llm.columns else 0
    avg_latency = df_llm["latency_ms"].mean()

    md = f"""
### 🤖 AI Metrics
* **Total Calls:** {total_calls}
* **Est. Tokens:** {total_tokens:,.0f}
* **Avg Latency:** {avg_latency:.0f} ms
    """
    display(Markdown(md))
else:
    print("❌ No LLM calls recorded.")


### 🤖 AI Metrics
* **Total Calls:** 41
* **Est. Tokens:** 9,559
* **Avg Latency:** 2015 ms
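
These KPIs report token volume but stop short of a currency figure. A rough cost estimate multiplies the token count by a per-token price, and a nearest-rank p95 latency is often more telling than the mean when a few calls are slow. The sketch below shows both. The price is a placeholder, not a real rate for any model, the sample latencies are made up, and `estimate_cost` and `percentile` are hypothetical helpers.

```python
import math


def estimate_cost(total_tokens, usd_per_1k_tokens):
    """Rough cost estimate: tokens / 1000 * price per 1K tokens."""
    return total_tokens / 1000 * usd_per_1k_tokens


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of the data at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


PLACEHOLDER_USD_PER_1K = 0.001  # hypothetical price; substitute the real rate

# 9,559 is the estimated token total reported above
cost = estimate_cost(9_559, PLACEHOLDER_USD_PER_1K)
print(f"Estimated cost: ${cost:.4f}")

sample_latencies_ms = [1800, 1950, 2100, 2200, 4100]  # made-up values
print(f"p95 latency: {percentile(sample_latencies_ms, 95)} ms")
```

In the notebook, `percentile(df_llm["latency_ms"].tolist(), 95)` would apply the same calculation to the recorded calls.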
    

## 5: Action Audit

In [5]:
df_actions = collector.get_actions()

if not df_actions.empty:
    # Parse nested result status if needed
    if "result" in df_actions.columns:
        # Safe extraction
        df_actions["status_code"] = df_actions["result"].apply(
            lambda x: x.get("http_code") if isinstance(x, dict) else None
        )
        df_actions["outcome"] = df_actions["result"].apply(
            lambda x: x.get("status") if isinstance(x, dict) else None
        )

    fig_bar = px.bar(
        df_actions,
        x="type",
        color="outcome",
        title="Actions Executed by Type",
        barmode="group",
    )
    fig_bar.show()

    print("Recent Actions:")

    # Handle missing timestamp column gracefully (Legacy logs compatibility)
    cols_to_show = ["action_id", "type", "outcome"]
    if "timestamp" in df_actions.columns:
        cols_to_show.insert(0, "timestamp")

    display(df_actions[cols_to_show].tail())
else:
    print("❌ No actions recorded.")

Recent Actions:


| | timestamp | action_id | type | outcome |
|---|---|---|---|---|
| 8 | 2025-11-29T13:28:24.296904+00:00 | 0521be34-2df4-4948-bcd9-9fb92fcb65ff | create_ticket | success |
| 9 | 2025-11-29T13:28:26.346371+00:00 | ed78edec-50b9-42d8-ad66-27b98c251f63 | create_ticket | success |
| 10 | 2025-11-29T13:28:28.403090+00:00 | b47a3702-0ded-47b2-adfb-328c2a825bad | create_ticket | success |
| 11 | 2025-11-29T13:28:30.469580+00:00 | da6d7287-9193-421b-862d-73ae179d4a94 | create_ticket | success |
| 12 | 2025-11-29T13:28:32.547927+00:00 | 0c9cb42c-0e1c-4e77-b2f6-15d58f840bd2 | create_ticket | success |
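
The bar chart groups actions by type and outcome. The same tally can be produced as plain numbers, which is convenient for an automated check. The sketch below assumes action records shaped like the ones parsed in the cell above (a `type` field and a `result` dict with a `status` key). `tally_outcomes` is a hypothetical helper, and the `send_email` records in the usage are made up for illustration.

```python
from collections import Counter


def tally_outcomes(actions):
    """Count actions per (type, outcome) pair.

    Records without a dict `result` or without a `status` are counted as "unknown".
    """
    tally = Counter()
    for action in actions:
        result = action.get("result")
        outcome = result.get("status") if isinstance(result, dict) else None
        tally[(action.get("type"), outcome or "unknown")] += 1
    return tally


sample_actions = [
    {"type": "create_ticket", "result": {"status": "success", "http_code": 201}},
    {"type": "create_ticket", "result": {"status": "success", "http_code": 201}},
    {"type": "send_email", "result": {"status": "failed", "http_code": 500}},  # made up
    {"type": "send_email", "result": None},  # made up: legacy record without a result
]

for (action_type, outcome), count in sorted(tally_outcomes(sample_actions).items()):
    print(f"{action_type:<14} {outcome:<8} {count}")
```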


## 6: Deep Dive Evidence

In [6]:
if not df_llm.empty:
    last_call = df_llm.iloc[-1]
    prompt_hash = last_call.get("prompt_hash")

    print(f"🔍 Inspecting Last AI Call: {last_call['anomaly_id']}")

    # Raw prompt/response pairs are looked up on disk by prompt hash
    raw_path = os.path.join(OBS_DIR, "responses", f"{prompt_hash}.json")
    if os.path.exists(raw_path):
        # Read as UTF-8 explicitly; the platform default encoding may differ on Windows
        with open(raw_path, "r", encoding="utf-8") as f:
            raw_data = json.load(f)

        print("\n--- 📝 Prompt (Truncated) ---")
        prompt = raw_data["prompt"]
        print(prompt[:1000] + "..." if len(prompt) > 1000 else prompt)

        print("\n--- 💡 Model Response ---")
        print(json.dumps(raw_data["response"], indent=2))
    else:
        print(f"⚠️ Raw response file not found: {raw_path}")

🔍 Inspecting Last AI Call: new_event_123

--- 📝 Prompt (Truncated) ---
You are a Senior SalesOps Analyst. Analyze this sales anomaly.

DATA CONTEXT:
- Entity: Technology (category)
- Metric: Sales
- Value: 25,000.00
- Expected: 5,000.00
- Score: 4.80

STATISTICAL CONTEXT:
window_mean: 5000

HISTORICAL CONTEXT (From Memory Bank):
**Relevant Past Events (Learned History):**
- [2025-11-29] (Sim: 0.71) Historical: Technology sales dipped in 2014 due to supply chain.
- [2025-11-29] (Sim: 0.67) Anomaly in Technology (Sales). Severity: 5.5. Explanation: Spike caused by bulk laptop order from Acme Corp.. Action Taken: create_ticket.
- [2025-11-29] (Sim: 0.67) Anomaly in Technology (Sales). Severity: 5.5. Explanation: Spike caused by bulk laptop order from Acme Corp.. Action Taken: create_ticket.

OUTPUT FORMAT:
Return valid JSON with these exact keys:
{
    "explanation_short": "1 sentence summary",
    "explanation_full": "2-3 sentence detailed analysis. Reference history if relevant.",

## ⏭️ Next Step: Proving Quality (Evaluation)

Success! We have built the **Observability Dashboard**.
* We can see the **Trace Waterfall** of our agents.
* We can audit every **Action** taken.
* We can measure **LLM Latency and Cost**.

**But... does it actually work?**
Tracing shows *what* happened, but not *how good* it was.
* Did the detector find all the anomalies? (Recall)
* Did the explainer give accurate reasons? (Quality)
* Did the system survive errors? (Robustness)

In **Day 10**, we will build the **Evaluation Pipeline**.
We will use **Synthetic Golden Datasets**, **Automated Regression Tests**, and **Human-in-the-loop Scoring** to generate a final "Report Card" for our submission.

👉 **Proceed to `../evaluation/99_evaluation_report.ipynb`.**