<div align="center">
<img src="https://imagedelivery.net/Dr98IMl5gQ9tPkFM5JRcng/3e5f6fbd-9bc6-4aa1-368e-e8bb1d6ca100/Ultra" alt="Contextual AI Logo" width="160" />
</div>

<br/>

# RAG Agent Monitoring & Analytics Dashboard

## Overview

This notebook demonstrates how to use the **Contextual AI Metrics API** to monitor, analyze, and optimize Retrieval-Augmented Generation (RAG) agents in production. It covers methods for tracking performance metrics, analyzing user feedback, and assessing system health to ensure consistent, high-quality responses.

### What is Covered

- **Real-time Metrics Collection**: Access conversation data and user feedback
- **Performance Analytics**: Analyze response quality, user satisfaction, and system usage patterns
- **Time Series Analysis**: Visualize performance over time
- **Quality Monitoring**: Examine feedback distributions, flagged responses, and other quality indicators
- **Operational Insights**: Identify usage peaks and engagement patterns

### Prerequisites

- **Contextual AI Account**: Active subscription with API access
- **API Key**: Valid API key with metrics access permissions
- **Python Environment**: Python 3.8+ with required dependencies
- **Active RAG Agent**: At least one deployed agent with conversation history

You can run this notebook entirely in Colab:

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ContextualAI/examples/blob/main/14-monitoring/monitoring_intro.ipynb)
     

### Resources

- [Contextual AI Documentation](https://docs.contextual.ai/)
- [API Reference](https://docs.contextual.ai/api-reference/datastores/list-datastores)


## Environment Setup & Configuration

### API Key Configuration

To access the Metrics API, you'll need a valid Contextual AI API key:

1. **Log In**: Sign in to your tenant at [app.contextual.ai](https://app.contextual.ai)
2. **Navigate to API Keys**: Go to Settings → API Keys
3. **Create New Key**: Click "Create API Key" and copy the generated key
4. **Secure Storage**: Store the key in an environment variable (this notebook reads `CONTEXTUAL_API_KEY`) rather than hard-coding it
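
As a minimal sketch of the secure-storage step, the helper below reads the key from the environment and fails with a readable message when it is missing. The function name `load_api_key` is hypothetical and not part of the Contextual AI client; only the `CONTEXTUAL_API_KEY` variable name comes from this notebook.

```python
import os


def load_api_key(env_var: str = "CONTEXTUAL_API_KEY") -> str:
    """Return the API key stored in `env_var`, or raise a clear error."""
    # Hypothetical helper: treats a missing or blank variable the same way.
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set. Export it in your shell or set it with "
            "os.environ before creating the client."
        )
    return key
```

Checking for the key up front gives a clearer failure than the bare `KeyError` raised by `os.environ["CONTEXTUAL_API_KEY"]`.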

### Required Dependencies

Install the necessary packages for data analysis and visualization:


In [None]:
# Install required packages (uncomment if needed)
# !pip install contextual-client pandas plotly matplotlib seaborn

In [None]:
import os
import requests
import json
from pathlib import Path
from typing import List, Optional, Dict
from IPython.display import display, JSON
import pandas as pd
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from contextual import ContextualAI

# Set display options for better notebook experience
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)

print("✅ All dependencies imported successfully!")

In [None]:
# Initialize Contextual AI client
# The client reads the key from the CONTEXTUAL_API_KEY environment variable.
# If the variable is not already set, uncomment the next line (demo use only;
# avoid committing a real key to a notebook):
# os.environ["CONTEXTUAL_API_KEY"] = "your-api-key-here"

client = ContextualAI(
    api_key=os.environ["CONTEXTUAL_API_KEY"]
)

print("Contextual AI client initialized successfully!")


In [None]:
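def fetch_file(filepath):
    """Download a file from the examples repo unless it already exists locally."""
    if os.path.exists(filepath):
        return
    # Create the parent directory (e.g. data/) when the path has one
    parent = os.path.dirname(filepath)
    if parent:
        os.makedirs(parent, exist_ok=True)
    print(f"Fetching {filepath}")
    response = requests.get(
        f"https://raw.githubusercontent.com/ContextualAI/examples/main/14-monitoring/{filepath}",
        timeout=30,  # avoid hanging indefinitely on a stalled connection
    )
    if response.ok:
        with open(filepath, 'wb') as f:
            f.write(response.content)
        print(f"Saved {filepath}")
    else:
        print(f"Failed to fetch {filepath} (HTTP {response.status_code})")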

fetch_file('data/synthetic_data.csv')

## Metrics API: Data Extraction & Overview

The Contextual AI Metrics API provides access to your RAG agent's conversation data, including:

### Available Metrics

- **Conversation Data**: Questions, answers, and timestamps
- **User Feedback**: Thumbs up/down, flagged content, and custom feedback
- **Retrieval Metrics**: Number of documents retrieved, relevance scores
- **Usage Analytics**: User patterns, peak times, and engagement metrics

### Query Parameters

The Metrics API supports filtering options:

- `agent_id`: Target specific RAG agents
- `created_after` / `created_before`: Date range filtering
- `conversation_id`: Filter by specific conversations
- `user_id`: Analyze individual user interactions
- `limit` / `offset`: Pagination for large datasets

Let's start by extracting metrics data from an active Contextual AI RAG agent (if you don't have an active agent, just skip ahead to the next section).

In [None]:
# Specify your RAG agent ID
# Replace with your actual agent ID
agent_id = "fd4ebb74-"

# Extract metrics data with date filtering
# Adjust the date range based on your agent's activity
metrics = client.agents.query.metrics(
    agent_id=agent_id,
    created_after="2025-07-30"
)

print(f"📊 Total conversations retrieved: {metrics.total_count}")
print(f"🔗 Agent ID: {agent_id}")


In [None]:
# Convert metrics data to pandas DataFrame for analysis
messages_list = metrics.messages

# Create DataFrame from messages list
df = pd.DataFrame(messages_list)

print(f"📋 DataFrame shape: {df.shape}")
print(f"📊 Available columns: {list(df.columns)}")
print("\n🔍 First few rows:")
df.head()


## Demo Data: Synthetic Metrics for Analysis

For demonstration purposes, we'll use synthetic metrics data that showcases RAG agent performance patterns.

### Dataset Overview

Our synthetic dataset includes:
- **441 conversations** over a 9-day period
- **Feedback data** (thumbs up/down, flagged content)
- **Time-based usage variations** with peak hours and daily variations
- **Quality metrics** including response length and content flags
- **Deliberate errors** included so that issues such as unavailable information can be identified

Let's load and explore this demonstration data:


In [None]:
# Load synthetic metrics data for demonstration
df = pd.read_csv('data/synthetic_data.csv')
df['created_at'] = pd.to_datetime(df['created_at'], format='%Y-%m-%d %H:%M:%S.%f')

print(f"📊 Dataset loaded successfully!")
print(f"📋 Shape: {df.shape[0]} rows × {df.shape[1]} columns")
print(f"📅 Date range: {df['created_at'].min().strftime('%Y-%m-%d')} to {df['created_at'].max().strftime('%Y-%m-%d')}")
print(f"⏰ Total duration: {(df['created_at'].max() - df['created_at'].min()).days} days")

print("\n🔍 Sample data:")
df.tail()


## Feature Engineering & Data Preparation

To enable analytics, we'll create additional derived features that provide deeper insights into agent performance and user behavior.

### Calculated Metrics

We'll engineer features for:
- **Content Analysis**: Word counts, response complexity
- **Temporal Patterns**: Hourly/daily usage, peak activity times
- **Quality Indicators**: Feedback patterns, flagged content detection
- **Engagement Metrics**: User interaction patterns and satisfaction rates


In [None]:
# Calculate metrics and features
print("🔧 Engineering features for enhanced analysis...")

# Basic counts and statistics
total_messages = len(df)
feedback_counts = df['feedback'].value_counts().to_dict()
feedback_counts['no_feedback'] = df['feedback'].isna().sum()
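# Count messages with a non-empty issues set (assumes '{}' denotes "no issues")
flagged_count = (df['issues'].notna() & (df['issues'].astype(str) != '{}')).sum()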

# Content analysis features
df['question_word_count'] = df['question'].astype(str).apply(lambda x: len(x.split()))
df['answer_word_count'] = df['answer'].astype(str).apply(lambda x: len(x.split()))
df['no_relevant_docs_flag'] = df['answer'].str.contains("I don't have relevant documentation", case=False, na=False)

# Temporal features for time series analysis
df['date'] = df['created_at'].dt.date
df['hour'] = df['created_at'].dt.hour
df['day_of_week'] = df['created_at'].dt.day_name()
df['feedback_category'] = df['feedback'].fillna('no_feedback')
df['has_feedback'] = df['feedback_category'] != 'no_feedback'

print("✅ Feature engineering complete!")
print(f"📊 Total messages processed: {total_messages}")
print(f"📈 New features added: {len(['question_word_count', 'answer_word_count', 'no_relevant_docs_flag', 'date', 'hour', 'day_of_week', 'feedback_category', 'has_feedback'])} features")


## Feedback Analysis & Quality Metrics

Understanding user feedback is crucial for monitoring RAG agent performance. This section provides insights into user satisfaction, quality issues, and areas for improvement.

### Key Metrics Analyzed

- **Feedback Distribution**: Overall satisfaction rates and feedback patterns
- **Quality Indicators**: Flagged content and problematic responses
- **Response Analysis**: Content length, complexity, and relevance flags
- **Trend Analysis**: How feedback patterns change over time
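
As a rough sketch of the trend-analysis item, the function below computes per-day feedback rates. It assumes a DataFrame with the `created_at` and `feedback` columns used in this notebook and a `thumbs_up` feedback value; the function name `daily_feedback_rates` and the sample rows are hypothetical.

```python
import pandas as pd


def daily_feedback_rates(df: pd.DataFrame) -> pd.DataFrame:
    """Per-day message count, feedback rate, and thumbs-up share of rated messages."""
    frame = pd.DataFrame({
        "day": pd.to_datetime(df["created_at"]).dt.date,
        "rated": df["feedback"].notna(),
        "positive": df["feedback"] == "thumbs_up",
    })
    out = frame.groupby("day").agg(
        messages=("rated", "size"),
        rated=("rated", "sum"),
        positive=("positive", "sum"),
    )
    out["feedback_rate"] = out["rated"] / out["messages"]
    # Days without any rated message get NaN rather than a misleading 0.
    out["positive_share"] = out["positive"] / out["rated"].where(out["rated"] > 0)
    return out


# Hypothetical sample rows, for illustration only
sample = pd.DataFrame({
    "created_at": [
        "2025-08-01 09:00:00", "2025-08-01 10:00:00",
        "2025-08-02 09:00:00", "2025-08-02 11:00:00",
    ],
    "feedback": ["thumbs_up", None, "thumbs_down", "thumbs_up"],
})
print(daily_feedback_rates(sample))
```

Dividing by rated messages rather than all messages keeps the satisfaction figure from being diluted on days when few users leave feedback.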


In [None]:
from IPython.display import HTML, display

print("📊 FEEDBACK ANALYSIS DASHBOARD")
print("=" * 50)
print(f"📈 Total Messages: {total_messages:,}")
print(f"📊 Feedback Rate: {len(df[df['has_feedback']])/len(df)*100:.1f}%")
print(f"📅 Analysis Period: {df['created_at'].min().strftime('%Y-%m-%d')} to {df['created_at'].max().strftime('%Y-%m-%d')}")

print("\n📋 Feedback Distribution:")
for feedback, count in feedback_counts.items():
    percentage = (count / total_messages) * 100
    print(f"   • {feedback.replace('_', ' ').title()}: {count:,} ({percentage:.1f}%)")

print(f"\n🚩 Quality Issues:")
print(f"   • Flagged Messages: {flagged_count:,}")
print(f"   • No Relevant Docs Responses: {df['no_relevant_docs_flag'].sum():,}")

# Create professional feedback distribution visualization
plt.figure(figsize=(12, 8))

# Main feedback distribution plot
plt.subplot(2, 2, 1)
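# Map colors by category name so each bar keeps its meaning regardless of count order
color_map = {'thumbs_up': '#2ca02c', 'thumbs_down': '#d62728', 'flagged': '#ffd700', 'no_feedback': '#7f7f7f'}
feedback_series = pd.Series(feedback_counts)
colors = [color_map.get(name, '#1f77b4') for name in feedback_series.index]  # blue fallback for other categories
bars = plt.bar(feedback_series.index, feedback_series.values, color=colors, alpha=0.8)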
plt.title('Feedback Distribution', fontsize=14, fontweight='bold')
plt.xlabel('Feedback Type', fontsize=12)
plt.ylabel('Number of Messages', fontsize=12)
plt.xticks(rotation=45)

# Add value labels on bars
for bar, value in zip(bars, feedback_series.values):
    plt.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 1,
             f'{value:,}', ha='center', va='bottom', fontweight='bold')

# Response length analysis
plt.subplot(2, 2, 2)
plt.hist(df['answer_word_count'], bins=30, alpha=0.7, color='#1f77b4', edgecolor='black')
plt.title('Answer Length Distribution', fontsize=14, fontweight='bold')
plt.xlabel('Word Count', fontsize=12)
plt.ylabel('Frequency', fontsize=12)
plt.axvline(df['answer_word_count'].mean(), color='red', linestyle='--',
            label=f'Mean: {df["answer_word_count"].mean():.1f}')
plt.legend()

# Question length analysis
plt.subplot(2, 2, 3)
plt.hist(df['question_word_count'], bins=20, alpha=0.7, color='#ff7f0e', edgecolor='black')
plt.title('Question Length Distribution', fontsize=14, fontweight='bold')
plt.xlabel('Word Count', fontsize=12)
plt.ylabel('Frequency', fontsize=12)
plt.axvline(df['question_word_count'].mean(), color='red', linestyle='--',
            label=f'Mean: {df["question_word_count"].mean():.1f}')
plt.legend()

# Daily activity pattern
plt.subplot(2, 2, 4)
daily_counts = df.groupby('day_of_week').size()
days_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
daily_counts = daily_counts.reindex(days_order, fill_value=0)
plt.bar(daily_counts.index, daily_counts.values, color='#9467bd', alpha=0.8)
plt.title('Daily Activity Pattern', fontsize=14, fontweight='bold')
plt.xlabel('Day of Week', fontsize=12)
plt.ylabel('Number of Messages', fontsize=12)
plt.xticks(rotation=45)

plt.tight_layout()
plt.show()


## Time Series Analysis & Usage Patterns

Understanding temporal patterns in RAG agent usage is essential for capacity planning, performance optimization, and user experience improvement. This section builds an interactive visualization for time-based analytics.

### Analysis Features

- **Hourly Usage Patterns**: Identify peak usage times and quiet periods
- **Feedback Trends**: Track how user satisfaction changes over time
- **Stacked Visualizations**: See feedback distribution within each time period
- **Interactive Elements**: Hover for detailed information and zoom capabilities


In [None]:
# Enhanced time series analysis with interactive visualizations
print("📊 ENHANCED TIME SERIES ANALYSIS")
print("=" * 50)
print(f"📈 Total Messages: {len(df):,}")
print(f"📊 Feedback Rate: {len(df[df['has_feedback']])/len(df)*100:.1f}%")
print(f"📅 Date Range: {df['created_at'].min().strftime('%Y-%m-%d')} to {df['created_at'].max().strftime('%Y-%m-%d')}")

# Create interactive time series visualization
fig = make_subplots(
    rows=1, cols=1,
    specs=[[{"type": "bar"}]]
)

# Prepare hourly feedback data
hourly_feedback_data = df.groupby([df['created_at'].dt.floor('h'), 'feedback_category']).size().unstack(fill_value=0)

# Ensure all feedback categories exist
for category in ['no_feedback', 'thumbs_up', 'thumbs_down', 'flagged']:
    if category not in hourly_feedback_data.columns:
        hourly_feedback_data[category] = 0


feedback_colors = {
    'no_feedback': 'silver',     # Gray
    'thumbs_up': '#2ca02c',      # Green
    'thumbs_down': '#d62728',    # Red
    'flagged': '#ffd700'         # Yellow
}

for category, color in feedback_colors.items():
    fig.add_trace(
        go.Bar(
            x=hourly_feedback_data.index,
            y=hourly_feedback_data[category],
            name=category.replace('_', ' ').title(),
            marker_color=color,
            opacity=1,
            hovertemplate='<b>%{x}</b><br>' +
                          f'{category.replace("_", " ").title()}: %{{y}}<br>' +
                          '<extra></extra>'
        ),
        row=1, col=1
    )


fig.update_layout(
    title={
        'text': '📊 Messages Over Time - Stacked by Feedback Type',
        'x': 0.5,
        'xanchor': 'center',
        'font': {'size': 20, 'color': '#2c3e50', 'family': 'Arial, sans-serif'}
    },
    height=600,
    showlegend=True,
    template='plotly_white',
    barmode='stack',
    plot_bgcolor='rgba(248, 249, 250, .6)',
    paper_bgcolor='ghostwhite',
    font=dict(family="Arial, sans-serif", size=12, color="#2c3e50"),
    margin=dict(l=60, r=60, t=80, b=60),
    hovermode='x unified',
    hoverlabel=dict(
        bgcolor="ghostwhite",
        font_size=12,
        font_family="Arial"
    )
)

# Update axes with enhanced styling
fig.update_xaxes(
    title_text="Time Period",
    title_font=dict(size=14, color="#2c3e50"),
    tickfont=dict(size=11),
    gridcolor='rgba(128, 128, 128, 0.2)',
    zeroline=False
)

fig.update_yaxes(
    title_text="Messages per Hour",
    title_font=dict(size=14, color="#2c3e50"),
    tickfont=dict(size=11),
    gridcolor='rgba(128, 128, 128, 0.2)',
    zeroline=False
)

# Display the interactive dashboard
fig.show()

# Comprehensive analytics summary
print("\n" + "="*50)
print("📈 KEY METRICS & INSIGHTS")
print("="*50)

total_messages = len(df)
feedback_messages = len(df[df['has_feedback']])
feedback_rate = (feedback_messages / total_messages) * 100

print(f"📊 Total Messages: {total_messages:,}")
print(f"📈 Feedback Rate: {feedback_rate:.1f}% ({feedback_messages:,}/{total_messages:,})")

if feedback_messages > 0:
    positive_rate = (len(df[df['feedback_category'] == 'thumbs_up']) / feedback_messages) * 100
    negative_rate = (len(df[df['feedback_category'] == 'thumbs_down']) / feedback_messages) * 100
    flagged_rate = (len(df[df['feedback_category'] == 'flagged']) / feedback_messages) * 100

    print(f"👍 Positive Feedback: {positive_rate:.1f}%")
    print(f"👎 Negative Feedback: {negative_rate:.1f}%")
    print(f"🚩 Flagged Content: {flagged_rate:.1f}%")

print(f"\n⏰ Temporal Analysis:")
print(f"   • Peak Hour: {df.groupby('hour').size().idxmax()}:00")
print(f"   • Most Active Day: {df.groupby('day_of_week').size().idxmax()}")
print(f"   • Analysis Period: {(df['created_at'].max() - df['created_at'].min()).days} days")

print(f"\n📝 Content Analysis:")
print(f"   • Avg Question Length: {df['question_word_count'].mean():.1f} words")
print(f"   • Avg Answer Length: {df['answer_word_count'].mean():.1f} words")
print(f"   • No Relevant Docs: {df['no_relevant_docs_flag'].sum():,} responses")

## Advanced Analytics & Next Steps

The metrics data can also feed additional quality metrics:

### 1. **Advanced Quality Assessment with External Libraries**
- **RAGAS Integration**: Use libraries such as RAGAS, or LLM judges, to compute metrics such as:
  - Response Relevancy
  - Answer Correctness
  - Semantic Similarity
  - Factual Correctness

### 2. **Enhanced Monitoring with Retrieved Context**
If you save or sample your queries along with their retrieved context, you can compute further metrics, including:
- Additional RAGAS metrics (e.g., Faithfulness)
- Groundedness Metric from Contextual AI
- Number of retrievals for a query
- Number of attributions for a query

### Next Steps for Production Monitoring

1. **Real-time Dashboards**: Set up automated data pipelines
2. **Alerting Systems**: Monitor quality degradation in real time
3. **A/B Testing**: Compare different RAG configurations
4. **Custom Metrics**: Develop domain-specific quality indicators
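
The alerting step could start from something as simple as a rolling thumbs-down rate. The sketch below is one possible approach, not a Contextual AI feature: the function name, window size, and threshold are all hypothetical, and feedback values are assumed to be `thumbs_up`, `thumbs_down`, or `None` as in the data above.

```python
from collections import deque
from typing import Deque, Iterable, List, Optional


def negative_rate_alerts(
    feedback_stream: Iterable[Optional[str]],
    window: int = 50,
    threshold: float = 0.3,
) -> List[int]:
    """Return stream positions where the rolling thumbs-down share exceeds `threshold`.

    Only rated messages (non-None feedback) enter the window.
    """
    recent: Deque[bool] = deque(maxlen=window)
    alerts: List[int] = []
    for position, feedback in enumerate(feedback_stream):
        if feedback is None:
            continue
        recent.append(feedback == "thumbs_down")
        # Wait for a full window so a single early thumbs-down does not trigger.
        if len(recent) == window and sum(recent) / window > threshold:
            alerts.append(position)
    return alerts


stream = ["thumbs_up"] * 4 + ["thumbs_down"] * 3
print(negative_rate_alerts(stream, window=4, threshold=0.5))  # [6]
```

In production the returned positions would more likely trigger a notification than be collected in a list, and the window and threshold would need tuning against real traffic.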

### Additional Resources

- [Contextual AI Documentation](https://docs.contextual.ai/user-guides/beginner-guide)
- [RAGAS Evaluation](https://github.com/ContextualAI/examples/tree/main/07-evaluation-ragas)
- [Retrieval Analysis](https://github.com/ContextualAI/examples/tree/main/11-retrieval-analysis)
