# CERT Dashboard - Authenticated LLM Monitoring

This notebook demonstrates how to use the **CERT Dashboard** as a standalone SaaS-like tool for monitoring and evaluating LLM applications.

## What is CERT Dashboard?

CERT (Compliance Evaluation and Reporting Tool) Dashboard is a **real-time LLM monitoring platform** that provides:

| Feature | Description |
|---------|-------------|
| **Trace Collection** | Capture each traced LLM call with its inputs, outputs, token counts, and latency |
| **Persistent Storage** | Traces stored in your database, linked to your account |
| **Quality Evaluations** | LLM Judge, Grounding Check, Human Review |
| **Cost Analysis** | Track spending across providers and models |
| **Performance Monitoring** | Latency, throughput, error rates |

## Architecture

```
Your Notebook/App                    CERT Dashboard
     │                                    │
     │  1. Register & get API key         │
     │ ─────────────────────────────────► │
     │                                    │
     │  2. Send traces with API key       │
     │ ─────────────────────────────────► │  ──► Supabase DB
     │    POST /api/v1/traces             │      (persistent)
     │    Header: X-API-Key: xxx          │
     │                                    │
     │  3. View & evaluate in dashboard   │
     │ ◄───────────────────────────────── │
```
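
Step 2 of the diagram is a plain HTTP POST. The sketch below builds that request with only the standard library, so you can see the wire format without the `CERTTracer` class. It assumes the `/api/v1/traces` path, the `{"traces": [...]}` body, and the `X-API-Key` header used later in this notebook. `build_trace_request` is a hypothetical helper, and it only constructs the request; it does not send it.

```python
import json
import urllib.request


def build_trace_request(dashboard_url: str, api_key: str, traces: list) -> urllib.request.Request:
    """Build (but do not send) a POST /api/v1/traces request.

    Assumes the endpoint path and body shape used by CERTTracer in this notebook.
    """
    body = json.dumps({"traces": traces}).encode("utf-8")
    return urllib.request.Request(
        url=f"{dashboard_url.rstrip('/')}/api/v1/traces",
        data=body,
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )


req = build_trace_request("https://your-cert-dashboard.com/", "cert_example", [{"id": "t1"}])
print(req.get_method(), req.full_url)
# Sending it would be: urllib.request.urlopen(req, timeout=10)
```

Note that `urllib` normalizes header names with `str.capitalize()` (stored as `X-api-key`). HTTP header names are case-insensitive, so the server sees the same header.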

## This Demo

We'll run a **multi-agent document extraction pipeline** using:
- **Claude Sonnet 4.5** - Extract structured data from Apple's 10-K SEC filing
- **GPT-4o** - Generate an executive briefing report

All LLM calls will be traced to CERT Dashboard for monitoring and evaluation.

---
## Step 1: CERT Dashboard Setup

### 1.1 Register for an Account

Before running this notebook, you need to:

1. **Go to your CERT Dashboard** (e.g., `https://your-cert-dashboard.com`)
2. **Click "Sign In"** in the top right
3. **Click "Create Account"** or go to `/register`
4. **Fill in your details:**
   - Full Name
   - Email
   - Company (optional)
   - Password
5. **After registration, go to `/account`** to see your API key

### 1.2 Copy Your API Key

Your API key looks like: `cert_a1b2c3d4e5f6...`

This key:
- Authenticates your notebook requests
- Links all traces to your account
- Enables persistent storage in the database
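
Because this key authenticates every request, avoid printing it in full. Notebook output is often saved and shared. The following is a minimal sketch with two hypothetical helpers. `mask_api_key` shows only a short prefix. `load_cert_api_key` reads the key from an environment variable. The `cert_` prefix check is an assumption based on the example key format above, so it only warns.

```python
import os


def mask_api_key(key: str, visible: int = 8) -> str:
    """Return a display-safe version of an API key (first `visible` chars only)."""
    if len(key) <= visible:
        return "*" * len(key)
    return key[:visible] + "..."


def load_cert_api_key(env_var: str = "CERT_API_KEY") -> str:
    """Read the key from the environment; warn if it lacks the assumed 'cert_' prefix."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"{env_var} is not set")
    if not key.startswith("cert_"):
        print(f"Warning: {env_var} does not start with 'cert_'; is it the right key?")
    return key


print(mask_api_key("cert_a1b2c3d4e5f6"))  # cert_a1b...
```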

In [None]:
# Install required packages
!pip install -q langchain langchain-openai langchain-anthropic langchain-community
!pip install -q langchain-text-splitters
!pip install -q pypdf tiktoken
!pip install -q pydantic requests

print("Packages installed successfully!")

In [None]:
import os
from getpass import getpass

# =============================================================
# CERT DASHBOARD CONFIGURATION
# =============================================================

# Your CERT Dashboard URL
CERT_DASHBOARD_URL = "https://dashboard.cert-framework.com"  # Change to your URL

# Your CERT API Key (get this from /account page after registration)
if 'CERT_API_KEY' not in os.environ:
    os.environ['CERT_API_KEY'] = getpass('Enter your CERT API Key: ')

CERT_API_KEY = os.environ['CERT_API_KEY']

print(f"CERT Dashboard: {CERT_DASHBOARD_URL}")
# Print only a short prefix so the full key never lands in saved notebook output
print(f"API Key: {CERT_API_KEY[:8]}..." if len(CERT_API_KEY) > 8 else "API Key: ****")

In [None]:
# =============================================================
# LLM API KEYS
# =============================================================

# Set your LLM provider API keys
if 'OPENAI_API_KEY' not in os.environ:
    os.environ['OPENAI_API_KEY'] = getpass('Enter your OpenAI API key: ')

if 'ANTHROPIC_API_KEY' not in os.environ:
    os.environ['ANTHROPIC_API_KEY'] = getpass('Enter your Anthropic API key: ')

print("LLM API keys configured.")

---
## Step 2: CERT Tracer with Authentication

The `CERTTracer` class sends LLM call data to your CERT Dashboard.

**Key features:**
- Uses `X-API-Key` header for authentication
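- Traces are stored persistently in the dashboard database, and the response's `stored_in_db` flag reports whether that happened (otherwise they are held in memory)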
- Supports source context for Grounding Check evaluation
- No third-party telemetry dependencies
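
`CERTTracer` sends one trace per request, but the body wraps it in a list (`{"traces": [...]}`). If the server accepts several traces per request (an assumption, not something this notebook confirms), you could buffer traces and flush them in batches to cut HTTP round trips. The following is a minimal sketch of the batching step. `batch_trace_payloads` is a hypothetical helper, and the default batch size of 50 is arbitrary.

```python
from typing import Any, Dict, Iterator, List


def batch_trace_payloads(traces: List[Dict[str, Any]], batch_size: int = 50) -> Iterator[Dict[str, Any]]:
    """Yield request bodies of the form {"traces": [...]} with at most batch_size traces each."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(traces), batch_size):
        yield {"traces": traces[start:start + batch_size]}


# 5 traces in batches of 2 -> 3 request bodies
payloads = list(batch_trace_payloads([{"id": str(i)} for i in range(5)], batch_size=2))
print([len(p["traces"]) for p in payloads])  # [2, 2, 1]
```

Each payload would then be posted to the same endpoint with the same `X-API-Key` header that `_get_headers()` builds.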

In [None]:
import requests
import time
import uuid
from datetime import datetime, timezone
from typing import List, Optional, Dict, Any

class CERTTracer:
    """
    Authenticated tracer for CERT Dashboard.
    
    Sends LLM call traces to your CERT Dashboard with:
    - API key authentication (traces linked to your account)
    - Persistent storage in database
    - Support for Grounding Check (source context)
    """
    
    def __init__(self, dashboard_url: str, api_key: str, project_name: str = "default"):
        """
        Initialize the CERT tracer.
        
        Args:
            dashboard_url: Your CERT Dashboard URL (e.g., https://your-cert.com)
            api_key: Your CERT API key (from /account page)
            project_name: Name for this project/experiment
        """
        self.endpoint = f"{dashboard_url.rstrip('/')}/api/v1/traces"
        self.api_key = api_key
        self.project_name = project_name
        self.session_id = str(uuid.uuid4())[:8]
        self.traces_sent = 0
    
    @staticmethod
    def _utc_timestamp() -> str:
        """Current UTC time as ISO 8601 with a trailing 'Z' (datetime.utcnow() is deprecated since Python 3.12)."""
        return datetime.now(timezone.utc).isoformat().replace("+00:00", "Z")
    
    def _masked_key(self) -> str:
        """Short key prefix that is safe to print in notebook output."""
        return f"{self.api_key[:8]}..." if len(self.api_key) > 8 else "****"
        
    def _get_headers(self) -> Dict[str, str]:
        """Get request headers with authentication."""
        return {
            "Content-Type": "application/json",
            "X-API-Key": self.api_key  # Authentication header
        }
    
    def send_trace(self, trace_data: dict) -> bool:
        """
        Send a single trace to CERT Dashboard.
        
        Returns True if successful, False otherwise.
        """
        try:
            response = requests.post(
                self.endpoint,
                json={"traces": [trace_data]},
                headers=self._get_headers(),
                timeout=10
            )
            
            # Treat any 2xx as success; tolerate an empty body
            if 200 <= response.status_code < 300:
                result = response.json() if response.content else {}
                self.traces_sent += 1
                storage = "database" if result.get('stored_in_db') else "memory"
                print(f"  [CERT] Trace #{self.traces_sent} sent -> stored in {storage}")
                return True
            elif response.status_code == 401:
                print("  [CERT] Authentication failed. Check your API key.")
                return False
            else:
                print(f"  [CERT] Error {response.status_code}: {response.text[:200]}")
                return False
                
        except requests.exceptions.ConnectionError:
            print("  [CERT] Connection failed. Is the dashboard running?")
            return False
        except Exception as e:
            print(f"  [CERT] Error: {e}")
            return False
    
    def trace_llm_call(
        self,
        provider: str,
        model: str,
        input_text: str,
        output_text: str,
        duration_ms: float,
        prompt_tokens: int = 0,
        completion_tokens: int = 0,
        context: Optional[List[str]] = None,
        status: str = "ok",
        metadata: Optional[Dict[str, Any]] = None
    ) -> dict:
        """
        Record an LLM call trace and send to CERT Dashboard.
        
        Args:
            provider: LLM provider (e.g., "anthropic", "openai")
            model: Model name (e.g., "claude-sonnet-4-5-20250929")
            input_text: The prompt/input sent to the LLM
            output_text: The response from the LLM
            duration_ms: Time taken for the call in milliseconds
            prompt_tokens: Number of input tokens
            completion_tokens: Number of output tokens
            context: Source documents/chunks for Grounding Check evaluation
            status: "ok" or "error"
            metadata: Additional metadata to include
            
        Returns:
            The trace dictionary that was sent
        """
        trace = {
            "id": f"{self.session_id}-{str(uuid.uuid4())[:8]}",
            "provider": provider,
            "model": model,
            "input": input_text,
            "output": output_text,
            "promptTokens": prompt_tokens,
            "completionTokens": completion_tokens,
            "durationMs": round(duration_ms, 2),
            "timestamp": self._utc_timestamp(),
            "status": status,
            "metadata": {
                "project": self.project_name,
                "session_id": self.session_id,
                **(metadata or {})
            }
        }
        
        # Add context for Grounding Check evaluation
        # This allows CERT to verify outputs are grounded in source documents
        if context:
            trace["context"] = context
            
        self.send_trace(trace)
        return trace
    
    def test_connection(self) -> bool:
        """Test the connection and authentication to CERT Dashboard."""
        print(f"Testing connection to: {self.endpoint}")
        print(f"Using API key: {self._masked_key()}")
        
        try:
            # Send a test trace
            test_trace = {
                "id": f"test-{str(uuid.uuid4())[:8]}",
                "provider": "test",
                "model": "connection-test",
                "input": "Connection test",
                "output": "Test successful",
                "promptTokens": 0,
                "completionTokens": 0,
                "durationMs": 0,
                "timestamp": self._utc_timestamp(),
                "status": "ok",
                "metadata": {"test": True}
            }
            
            response = requests.post(
                self.endpoint,
                json={"traces": [test_trace]},
                headers=self._get_headers(),
                timeout=10
            )
            
            if 200 <= response.status_code < 300:
                result = response.json() if response.content else {}
                print("Connection successful!")
                print(f"  Traces received: {result.get('received', 'unknown')}")
                print(f"  Stored in DB: {result.get('stored_in_db', False)}")
                print(f"  User ID: {result.get('user_id', 'anonymous')}")
                return True
            elif response.status_code == 401:
                print("Authentication failed!")
                print("  Please check your API key at /account")
                return False
            else:
                print(f"Connection failed: {response.status_code}")
                print(f"  Response: {response.text[:200]}")
                return False
                
        except requests.exceptions.ConnectionError:
            print(f"Connection failed: Could not reach {self.endpoint}")
            print("  Is your CERT Dashboard running?")
            return False
        except Exception as e:
            print(f"Connection failed: {e}")
            return False

In [None]:
# =============================================================
# INITIALIZE CERT TRACER
# =============================================================

cert_tracer = CERTTracer(
    dashboard_url=CERT_DASHBOARD_URL,
    api_key=CERT_API_KEY,
    project_name="apple-10k-extraction"
)

print("=" * 60)
print("CERT Dashboard - Authenticated Tracer")
print("=" * 60)
print(f"Dashboard URL: {CERT_DASHBOARD_URL}")
print(f"Project: {cert_tracer.project_name}")
print(f"Session ID: {cert_tracer.session_id}")
print()
print("Testing connection...")
print("-" * 40)

connection_ok = cert_tracer.test_connection()

if connection_ok:
    print()
    print("View your traces at:")
    print(f"  {CERT_DASHBOARD_URL}/quality")
    print(f"  {CERT_DASHBOARD_URL}/operational/observability")
else:
    print()
    print("Troubleshooting:")
    print("  1. Verify your CERT Dashboard is running")
    print("  2. Check your API key at /account")
    print("  3. Ensure the URL is correct")
print("=" * 60)

---
## Step 3: Load Document for Extraction

We'll extract data from Apple's 10-K SEC filing as a demo.
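
`PyPDFLoader` needs a real PDF. A filing saved from a web page under a `.pdf` name will fail to load or yield no text. A quick check of the file's first bytes catches this before loading. This is a minimal sketch. `looks_like_pdf` is a hypothetical helper that relies only on the `%PDF-` header that PDF files begin with.

```python
import os
import tempfile


def looks_like_pdf(path: str) -> bool:
    """Return True if the file starts with the PDF magic bytes b"%PDF-"."""
    with open(path, "rb") as f:
        return f.read(5) == b"%PDF-"


# Demo on two throwaway files
with tempfile.TemporaryDirectory() as tmpdir:
    pdf_path = os.path.join(tmpdir, "real.pdf")
    html_path = os.path.join(tmpdir, "filing.pdf")  # HTML saved under a .pdf name
    with open(pdf_path, "wb") as f:
        f.write(b"%PDF-1.7\n")
    with open(html_path, "wb") as f:
        f.write(b"<html><body>10-K</body></html>")
    results = (looks_like_pdf(pdf_path), looks_like_pdf(html_path))

print(results)  # (True, False)
```

In the notebook you could call `looks_like_pdf(PDF_PATH)` right after choosing the file and stop with a clear message if it returns `False`.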

In [None]:
# For Google Colab: Upload the PDF
try:
    from google.colab import files
    print("Running in Google Colab")
    print("Please upload Apple's 10-K as a PDF file:")
    print("(Filings index: https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0000320193&type=10-K)")
    print("(EDGAR serves the 10-K as HTML; save it as PDF before uploading)")
    uploaded = files.upload()
    PDF_PATH = list(uploaded.keys())[0]
    print(f"\nUploaded: {PDF_PATH}")
except ImportError:
    # Running locally
    print("Running locally - specify the PDF path:")
    PDF_PATH = input("Enter path to 10-K PDF: ") or "apple_10k.pdf"
    print(f"Using: {PDF_PATH}")

In [None]:
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load the PDF
print(f"Loading PDF: {PDF_PATH}")
loader = PyPDFLoader(PDF_PATH)
documents = loader.load()
print(f"Loaded {len(documents)} pages")

# Split into chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=4000,
    chunk_overlap=200,
    separators=["\n\n", "\n", " ", ""]
)

chunks = text_splitter.split_documents(documents)
print(f"Split into {len(chunks)} chunks")

# Preview first chunk
print(f"\nFirst chunk preview ({len(chunks[0].page_content)} chars):")
print("-" * 40)
print(chunks[0].page_content[:500] + "...")

---
## Step 4: Define Data Models

Pydantic models for structured extraction.
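
In this notebook the models are used only to produce the JSON schema embedded in the prompt (`ExtractedData.model_json_schema()`). Nothing checks the model's reply against them. Full validation is available in Pydantic v2 via `ExtractedData.model_validate(data)`. The following is a lighter, standard-library sketch that reports which top-level `required` keys from a schema dict are missing. `missing_required_keys` is a hypothetical helper, and it does not check nested models or types.

```python
from typing import Any, Dict, List


def missing_required_keys(data: Dict[str, Any], schema: Dict[str, Any]) -> List[str]:
    """Return top-level keys listed in schema["required"] that are absent from data."""
    return [key for key in schema.get("required", []) if key not in data]


# Hand-written stand-in for a schema dict; in the notebook you would pass
# ExtractedData.model_json_schema() instead.
example_schema = {"required": ["company_name", "fiscal_year", "financial_metrics"]}
reply = {"company_name": "...", "fiscal_year": "..."}
print(missing_required_keys(reply, example_schema))  # ['financial_metrics']
```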

In [None]:
from pydantic import BaseModel, Field
from typing import List, Optional

class FinancialMetrics(BaseModel):
    """Key financial metrics from 10-K"""
    total_revenue: Optional[str] = Field(description="Total net revenue/sales")
    net_income: Optional[str] = Field(description="Net income")
    gross_margin: Optional[str] = Field(description="Gross margin percentage")
    operating_income: Optional[str] = Field(description="Operating income")
    eps: Optional[str] = Field(description="Earnings per share (diluted)")
    total_assets: Optional[str] = Field(description="Total assets")
    total_debt: Optional[str] = Field(description="Total debt/liabilities")
    cash_and_equivalents: Optional[str] = Field(description="Cash and cash equivalents")

class BusinessSegments(BaseModel):
    """Revenue by segment"""
    iphone_revenue: Optional[str] = Field(description="iPhone revenue")
    mac_revenue: Optional[str] = Field(description="Mac revenue")
    ipad_revenue: Optional[str] = Field(description="iPad revenue")
    wearables_revenue: Optional[str] = Field(description="Wearables, Home and Accessories revenue")
    services_revenue: Optional[str] = Field(description="Services revenue")

class RiskFactors(BaseModel):
    """Key risk factors"""
    risks: List[str] = Field(description="List of key risk factors")

class ExtractedData(BaseModel):
    """Complete extracted data from 10-K"""
    company_name: str = Field(description="Company name")
    fiscal_year: str = Field(description="Fiscal year")
    financial_metrics: FinancialMetrics
    business_segments: BusinessSegments
    risk_factors: RiskFactors
    business_overview: str = Field(description="Brief business description")
    key_developments: List[str] = Field(description="Key developments")

print("Data models defined.")

---
## Step 5: Initialize LLM Models

In [None]:
from langchain_anthropic import ChatAnthropic
from langchain_openai import ChatOpenAI

# Claude Sonnet 4.5 for extraction (excellent at structured data)
extractor_llm = ChatAnthropic(
    model="claude-sonnet-4-5-20250929",
    temperature=0,
    max_tokens=4096
)

# GPT-4o for report generation (strong narrative writing)
report_llm = ChatOpenAI(
    model="gpt-4o",
    temperature=0.3,
    max_tokens=2048
)

print("LLM Models:")
print("  Agent 1 (Extractor): Claude Sonnet 4.5")
print("  Agent 2 (Reporter):  GPT-4o")

---
## Step 6: Agent 1 - Document Extractor (Claude Sonnet 4.5)

This agent extracts structured financial data from the 10-K filing.

**CERT Features Used:**
- `context` parameter: Sends source document chunks for **Grounding Check** evaluation
- Token tracking: Monitors input/output tokens for cost analysis
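
Cost analysis from token counts is a weighted sum. Each direction is multiplied by its per-token price, and providers usually quote prices per million tokens. The following is a minimal sketch. `estimate_cost_usd` is a hypothetical helper, and the prices and token counts in the example are placeholders, not quoted rates. Use your provider's current pricing page and the counts from `usage_metadata`.

```python
def estimate_cost_usd(
    prompt_tokens: int,
    completion_tokens: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
) -> float:
    """Estimate the USD cost of one call from token counts and per-million-token prices."""
    return (
        prompt_tokens * input_price_per_mtok
        + completion_tokens * output_price_per_mtok
    ) / 1_000_000


# Placeholder numbers for illustration only
cost = estimate_cost_usd(40_000, 2_000, input_price_per_mtok=3.0, output_price_per_mtok=15.0)
print(f"${cost:.4f}")  # $0.1500
```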

In [None]:
from langchain_core.prompts import ChatPromptTemplate
import json

# Extraction prompt
extraction_prompt = ChatPromptTemplate.from_messages([
    ("system", """You are an expert financial analyst specializing in SEC filings.
Extract key information from Apple's 10-K filing and return it as valid JSON.

Extract:
- Company name and fiscal year
- Financial metrics (revenue, net income, margins, EPS, assets, debt, cash)
- Revenue by segment (iPhone, Mac, iPad, Wearables, Services)
- Top 5 risk factors
- Brief business overview (2-3 sentences)
- Key developments (3-5 bullet points)

Use null for missing values. Include units (e.g., "$394.3 billion").

Return ONLY valid JSON matching this schema:
{schema}"""),
    ("human", """Analyze these sections from Apple's 10-K:

---
{document_text}
---

Return extracted data as JSON:""")
])

print("Extraction prompt configured.")

In [None]:
def extract_with_tracing(chunks, llm, prompt, tracer):
    """
    Extract data from document chunks with CERT tracing.
    
    Sends source chunks as 'context' for Grounding Check evaluation,
    which verifies extracted data comes from source documents.
    """
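    # Select whole chunks up to a character budget, so the text the model sees
    # and the 'context' sent for Grounding Check are exactly the same chunks
    max_chars = 100_000
    separator = "\n\n---\n\n"
    source_chunks = []
    total_chars = 0
    for chunk in chunks[:30]:
        text = chunk.page_content
        added = len(text) + (len(separator) if source_chunks else 0)
        if source_chunks and total_chars + added > max_chars:
            break
        source_chunks.append(text)
        total_chars += added
    combined_text = separator.join(source_chunks)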
    
    print(f"Processing {len(combined_text):,} characters")
    print(f"Sending {len(source_chunks)} source chunks for Grounding Check")
    
    # Create chain and run
    chain = prompt | llm
    
    start_time = time.time()
    response = chain.invoke({
        # Serialize as JSON; passing the dict directly would embed a Python repr
        "schema": json.dumps(ExtractedData.model_json_schema(), indent=2),
        "document_text": combined_text
    })
    duration_ms = (time.time() - start_time) * 1000
    
    # Get token counts
    prompt_tokens = 0
    completion_tokens = 0
    if hasattr(response, 'usage_metadata') and response.usage_metadata:
        prompt_tokens = response.usage_metadata.get('input_tokens', 0)
        completion_tokens = response.usage_metadata.get('output_tokens', 0)
    
    print(f"\nExtraction complete in {duration_ms/1000:.1f}s")
    print(f"Tokens: {prompt_tokens:,} input, {completion_tokens:,} output")
    
    # Send trace to CERT Dashboard with source context
    tracer.trace_llm_call(
        provider="anthropic",
        model="claude-sonnet-4-5-20250929",
        input_text=f"Extract financial data from Apple 10-K ({len(combined_text):,} chars)",
        output_text=response.content[:1000] + "..." if len(response.content) > 1000 else response.content,
        duration_ms=duration_ms,
        prompt_tokens=prompt_tokens,
        completion_tokens=completion_tokens,
        context=source_chunks,  # Enable Grounding Check!
        metadata={"task": "extraction", "doc_type": "10-K"}
    )
    
    return response.content

In [None]:
print("=" * 60)
print("AGENT 1: Document Extractor (Claude Sonnet 4.5)")
print("=" * 60)
print("\nExtracting from 10-K filing...")
print("-" * 40)

# Run extraction
extraction_result = extract_with_tracing(
    chunks, 
    extractor_llm, 
    extraction_prompt,
    cert_tracer
)

# Parse JSON
try:
    json_start = extraction_result.find('{')
    json_end = extraction_result.rfind('}') + 1
    json_str = extraction_result[json_start:json_end]
    extracted_data = json.loads(json_str)
    print("\nExtraction successful!")
except json.JSONDecodeError:
    print("\nWarning: Could not parse JSON, using raw response")
    extracted_data = {"raw_response": extraction_result}

print("\n" + "-" * 60)
print("EXTRACTED DATA:")
print("-" * 60)
print(json.dumps(extracted_data, indent=2))

---
## Step 7: Agent 2 - Report Generator (GPT-4o)

This agent generates an executive briefing from the extracted data.
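
Both agents time their LLM call the same way: record `time.time()`, invoke the chain, and subtract. The following is a minimal sketch of factoring that into a context manager. `timed` is a hypothetical helper. It uses `time.perf_counter()`, which is monotonic and better suited to measuring intervals than `time.time()`.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


@contextmanager
def timed() -> Iterator[Dict[str, float]]:
    """Measure the wall-clock duration of the with-block in milliseconds."""
    result = {"duration_ms": 0.0}
    start = time.perf_counter()
    try:
        yield result
    finally:
        # Recorded even if the block raises, so failed calls can be traced too
        result["duration_ms"] = (time.perf_counter() - start) * 1000


with timed() as t:
    time.sleep(0.05)  # stand-in for report_chain.invoke(...)
print(f"{t['duration_ms']:.0f} ms")
```

The measured value would then be passed as `duration_ms=t["duration_ms"]` to `trace_llm_call`.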

In [None]:
# Report generation prompt
report_prompt = ChatPromptTemplate.from_messages([
    ("system", """You are a senior financial analyst writing an executive briefing.
Create a concise, professional report with:
1. Key financial performance metrics
2. Notable trends and changes
3. Material risks summary
4. Brief outlook

Use professional language, clear sections, and bullet points."""),
    ("human", """Based on this extracted Apple 10-K data, generate an executive briefing (500-700 words):

EXTRACTED DATA:
{extracted_data}

Generate the report:""")
])

print("Report prompt configured.")

In [None]:
print("=" * 60)
print("AGENT 2: Report Generator (GPT-4o)")
print("=" * 60)
print("\nGenerating executive briefing...")
print("-" * 40)

# Create chain and run
report_chain = report_prompt | report_llm
report_input = json.dumps(extracted_data, indent=2)

start_time = time.time()
report_response = report_chain.invoke({"extracted_data": report_input})
duration_ms = (time.time() - start_time) * 1000

final_report = report_response.content

# Get token counts
prompt_tokens = 0
completion_tokens = 0
if hasattr(report_response, 'usage_metadata') and report_response.usage_metadata:
    prompt_tokens = report_response.usage_metadata.get('input_tokens', 0)
    completion_tokens = report_response.usage_metadata.get('output_tokens', 0)

print(f"\nReport generated in {duration_ms/1000:.1f}s")
print(f"Tokens: {prompt_tokens:,} input, {completion_tokens:,} output")

# Send trace to CERT Dashboard
cert_tracer.trace_llm_call(
    provider="openai",
    model="gpt-4o",
    input_text="Generate executive briefing from extracted Apple 10-K data",
    output_text=final_report[:1000] + "..." if len(final_report) > 1000 else final_report,
    duration_ms=duration_ms,
    prompt_tokens=prompt_tokens,
    completion_tokens=completion_tokens,
    metadata={"task": "report_generation", "doc_type": "executive_briefing"}
)

In [None]:
print("\n" + "=" * 60)
print("FINAL EXECUTIVE BRIEFING REPORT")
print("=" * 60)
print(final_report)

---
## Step 8: View Results in CERT Dashboard

Your traces are now stored in CERT Dashboard. Here's what you can do:

In [None]:
print("=" * 60)
print("CERT DASHBOARD - Next Steps")
print("=" * 60)
print(f"\nSession ID: {cert_tracer.session_id}")
print(f"Traces sent: {cert_tracer.traces_sent}")
print()
print("View your traces:")
print(f"  {CERT_DASHBOARD_URL}/quality")
print()
print("Available evaluations:")
print()
print("1. LLM Judge (/quality/judge)")
print("   - Automated quality scoring using another LLM")
print("   - Evaluates: Accuracy, Completeness, Coherence, Relevance")
print()
print("2. Grounding Check (/quality/judge)")
print("   - Verifies outputs are grounded in source documents")
print("   - Perfect for document extraction tasks!")
print("   - Uses the 'context' field we sent with traces")
print()
print("3. Human Review (/quality/review)")
print("   - Manual review interface for quality assurance")
print("   - Pass/Fail/Review decisions")
print()
print("Operational monitoring:")
print(f"  {CERT_DASHBOARD_URL}/operational/performance - Latency & throughput")
print(f"  {CERT_DASHBOARD_URL}/operational/costs - Cost analysis by model")
print(f"  {CERT_DASHBOARD_URL}/operational/observability - Full trace details")
print()
print("Account management:")
print(f"  {CERT_DASHBOARD_URL}/account - View API key & usage stats")
print("=" * 60)

---
## Step 9: Save Results Locally
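
The cell below writes the JSON file directly. If the kernel is interrupted mid-write, that can leave a truncated file. The following is a minimal sketch of a safer pattern. It writes to a temporary file in the same directory, then swaps it into place with `os.replace`, which is atomic on POSIX filesystems when source and target share a filesystem. `save_json_atomic` is a hypothetical helper.

```python
import json
import os
import tempfile
from typing import Any


def save_json_atomic(obj: Any, path: str) -> None:
    """Write obj as JSON to path via a temporary file and os.replace."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(obj, f, indent=2)
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the partial temp file, then re-raise
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


with tempfile.TemporaryDirectory() as tmpdir:
    out_path = os.path.join(tmpdir, "analysis.json")
    save_json_atomic({"executive_report": "..."}, out_path)
    with open(out_path, encoding="utf-8") as f:
        reloaded = json.load(f)
    leftovers = [name for name in os.listdir(tmpdir) if name.endswith(".tmp")]

print(reloaded, leftovers)
```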

In [None]:
from datetime import datetime

# Create output
output = {
    "metadata": {
        "source_document": PDF_PATH,
        "extraction_model": "claude-sonnet-4-5-20250929",
        "report_model": "gpt-4o",
        "timestamp": datetime.now().isoformat(),
        "cert_session_id": cert_tracer.session_id,
        "cert_traces_sent": cert_tracer.traces_sent
    },
    "extracted_data": extracted_data,
    "executive_report": final_report
}

# Save to JSON
output_filename = f"apple_10k_analysis_{datetime.now().strftime('%Y%m%d_%H%M%S')}.json"
with open(output_filename, 'w') as f:
    json.dump(output, f, indent=2)

print(f"Results saved to: {output_filename}")

# Download in Colab
try:
    from google.colab import files
    files.download(output_filename)
except ImportError:
    print(f"(Running locally - file saved to current directory)")

---
## Summary

### What We Did

1. **Registered** for a CERT Dashboard account and got an API key
2. **Configured** the `CERTTracer` with authentication
3. **Ran** a multi-agent pipeline:
   - **Claude Sonnet 4.5** extracted structured data from Apple's 10-K
   - **GPT-4o** generated an executive briefing report
4. **Traced** all LLM calls to CERT Dashboard with:
   - Source context for Grounding Check
   - Token counts for cost analysis
   - Timing for performance monitoring

### CERT Dashboard Features

| Feature | Description | URL |
|---------|-------------|-----|
| Quality Overview | Summary of all evaluations | `/quality` |
| LLM Judge | Automated quality scoring | `/quality/judge` |
| Grounding Check | Verify outputs match sources | `/quality/judge` |
| Human Review | Manual quality review | `/quality/review` |
| Performance | Latency & throughput metrics | `/operational/performance` |
| Cost Analysis | Spending by model/provider | `/operational/costs` |
| Observability | Full trace details | `/operational/observability` |
| Account | API key & usage stats | `/account` |

### Authentication Flow

```python
# 1. Get your API key from /account after registration
CERT_API_KEY = "cert_a1b2c3d4e5f6..."

# 2. Initialize tracer
tracer = CERTTracer(
    dashboard_url="https://your-cert-dashboard.com",
    api_key=CERT_API_KEY,
    project_name="my-project"
)

# 3. Trace LLM calls
tracer.trace_llm_call(
    provider="openai",
    model="gpt-4o",
    input_text="...",
    output_text="...",
    duration_ms=1234,
    prompt_tokens=100,
    completion_tokens=200,
    context=["source1", "source2"]  # For Grounding Check
)
```

### Benefits of Persistent Storage

- **Traces persist** across page refreshes and sessions
- **Linked to your account** for team collaboration
- **Historical analysis** of LLM performance over time
- **Cost tracking** across all your projects