In [None]:
# Fix: Upload Dataset with Proper Environment Loading
!cd /Users/mramanindia/work/NovaEval/noveum_customer_support_bt && source .env && python upload_dataset.py --dataset-json split_datasets/agent.rag_evaluation_metrics_dataset.json --item-type conversation


# Noveum AI Agent with RAG + Web Search

An intelligent conversational agent that dynamically routes queries between **RAG (Retrieval-Augmented Generation)** for Noveum.ai-specific information and **Web Search** for external knowledge, providing comprehensive answers with full observability.

## 🚀 What This Agent Does

### Core Functionality
- **Intelligent Query Routing**: Automatically determines whether to use RAG or Web Search based on query content
- **Dual Knowledge Sources**: 
  - **RAG Mode**: Answers questions about the Noveum.ai platform using scraped documentation
  - **Web Search Mode**: Handles external queries using real-time web search
- **Comprehensive Tracing**: Full observability with detailed metrics and performance tracking
- **Modular Architecture**: Clean separation of concerns for easy maintenance and extension

### Key Capabilities
- 🧠 **Document Intelligence**: Scrapes and indexes Noveum.ai website content for semantic search
- 🌐 **Real-time Web Search**: Uses DuckDuckGo for current events and external knowledge
- 🎯 **Smart Classification**: LLM-powered query routing with keyword fallback
- 📊 **Performance Monitoring**: Detailed metrics on response quality, latency, and token usage
- 🔄 **Scalable Design**: Easy to extend with new data sources or routing logic

## 📋 Prerequisites & Requirements

### Required Environment Variables
```bash
NOVEUM_API_KEY=your_noveum_api_key_here
OPENAI_API_KEY=your_openai_api_key_here
```
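
Checking for these variables before any API call fails early with a clear message. The following is a minimal sketch; the helper name `missing_env_vars` is hypothetical and not part of the notebook.

```python
import os
from typing import List, Mapping, Sequence

# The two variables listed above; the helper itself is an illustrative assumption.
REQUIRED_VARS = ("NOVEUM_API_KEY", "OPENAI_API_KEY")


def missing_env_vars(
    env: Mapping[str, str] = os.environ,
    required: Sequence[str] = REQUIRED_VARS,
) -> List[str]:
    """Return the names of required variables that are unset or blank."""
    return [name for name in required if not env.get(name, "").strip()]


missing = missing_env_vars()
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
else:
    print("All required environment variables are set.")
```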

### Required Python Packages
- `requests` - HTTP requests for web scraping
- `beautifulsoup4` - HTML parsing
- `trafilatura` - Advanced text extraction
- `langchain` - LLM framework and vector operations
- `langchain-openai` - OpenAI integration
- `langchain-community` - Community tools (FAISS, DuckDuckGo); these integrations typically also need the `faiss-cpu` and `duckduckgo-search` packages installed
- `noveum-trace` - Observability and tracing
- `python-dotenv` - Environment variable management

### System Requirements
- Python 3.8+
- Internet connection for web scraping and API calls
- ~500MB disk space for vector store and scraped data

## 🏗️ Architecture Overview

### 1. **Website Scraper** (`NoveumWebsiteScraper`)
- Recursively scrapes the noveum.ai website and its sub-pages
- Extracts clean text content using trafilatura
- Discovers internal links automatically
- Saves scraped data to JSON for persistence
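
The crawl described above can be sketched as a breadth-first traversal with a page limit and a same-domain filter. This is a simplified illustration, not the notebook's scraper: `crawl`, `fetch_links`, and the in-memory `FAKE_SITE` are hypothetical stand-ins for real HTTP requests.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List, Sequence
from urllib.parse import urlparse


def crawl(
    start_url: str,
    fetch_links: Callable[[str], Iterable[str]],
    max_pages: int = 50,
    allowed_hosts: Sequence[str] = ("noveum.ai", "www.noveum.ai"),
) -> List[str]:
    """Visit pages breadth-first, staying on allowed hosts, up to max_pages."""
    seen = {start_url}          # URLs already queued, to avoid revisiting
    queue = deque([start_url])
    visited: List[str] = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in fetch_links(url):
            if urlparse(link).netloc in allowed_hosts and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


# Hypothetical link graph used in place of real network calls.
FAKE_SITE: Dict[str, List[str]] = {
    "https://noveum.ai": [
        "https://noveum.ai/docs",
        "https://noveum.ai/pricing",
        "https://example.com/external",
    ],
    "https://noveum.ai/docs": ["https://noveum.ai", "https://noveum.ai/docs/trace"],
    "https://noveum.ai/pricing": [],
    "https://noveum.ai/docs/trace": [],
}

pages = crawl("https://noveum.ai", lambda url: FAKE_SITE.get(url, []))
print(pages)
```

The `seen` set is what keeps cyclic links (such as the docs page linking back to the home page) from being queued twice.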

### 2. **RAG System** (`NoveumRAGSystem`)
- Loads scraped documents and creates vector embeddings
- Uses FAISS for fast similarity search
- Generates context-aware responses using OpenAI GPT-4o-mini
- Tracks retrieval effectiveness and response quality
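
To make the idea of similarity search concrete, here is a small pure-Python sketch that ranks toy vectors by cosine similarity and keeps the top `k`. It only illustrates the concept; the notebook delegates this step to FAISS, whose indexing and distance metric may differ, and the vectors below are made up.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query_vec: Sequence[float],
    doc_vecs: Sequence[Sequence[float]],
    k: int = 5,
) -> List[Tuple[int, float]]:
    """Return (document index, similarity) pairs for the k most similar documents."""
    scored = [(i, cosine_similarity(query_vec, vec)) for i, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2-dimensional "embeddings"; real embeddings have many more dimensions.
doc_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for index, score in top_k([1.0, 0.1], doc_vectors, k=2):
    print(f"doc {index}: similarity {score:.3f}")
```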

### 3. **Web Search System** (`NoveumWebSearchSystem`)
- Integrates DuckDuckGo search for external queries
- Synthesizes information from multiple web sources
- Handles real-time information and current events
- Formats search results into coherent responses

### 4. **Query Router** (`NoveumQueryRouter`)
- **Keyword-based classification**: Matches queries against predefined keyword lists
- **LLM-based classification**: Uses GPT-4o-mini for complex query analysis
- **Confidence scoring**: Evaluates routing decision quality
- **Fallback handling**: Defaults to Web Search for ambiguous queries
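
The keyword-based part of the routing, including the fallback to Web Search, can be sketched as follows. The keyword lists and the confidence formula are illustrative assumptions, not the router's actual configuration.

```python
from typing import Dict, Union

# Hypothetical keyword lists; the real router keeps its own lists.
RAG_KEYWORDS = ("noveum", "trace", "pricing", "integrate")
WEB_KEYWORDS = ("news", "weather", "today", "latest")


def route_query(query: str) -> Dict[str, Union[str, float, int]]:
    """Pick "rag" or "web_search" by counting keyword hits in the query."""
    text = query.lower()
    rag_score = sum(keyword in text for keyword in RAG_KEYWORDS)
    web_score = sum(keyword in text for keyword in WEB_KEYWORDS)
    total = rag_score + web_score
    # Ties and queries with no keyword hits fall back to web search.
    mode = "rag" if rag_score > web_score else "web_search"
    confidence = max(rag_score, web_score) / total if total else 0.0
    return {
        "mode": mode,
        "confidence": confidence,
        "rag_score": rag_score,
        "web_score": web_score,
    }


for question in ("How do I integrate Noveum Trace?", "What's the weather like today?"):
    decision = route_query(question)
    print(f"{question!r} -> {decision['mode']} (confidence {decision['confidence']:.2f})")
```

A low confidence value is a natural signal for handing the decision to the LLM-based classifier instead.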

### 5. **Main Agent** (`NoveumAIAgent`)
- Orchestrates all components
- Manages system initialization and data loading
- Provides unified interface for query processing
- Handles error recovery and response formatting

## 🎯 How to Use

### Quick Start
```python
# 1. Initialize the system (first time only)
noveum_agent.initialize_system(force_scrape=True)

# 2. Ask questions
response = noveum_agent.process_query("What is Noveum and what does it do?")
noveum_agent.display_response(response)

# 3. Or use convenience function
ask_question("How do I integrate Noveum Trace?")
```

### Advanced Usage
```python
# Run full demo with 20 test queries
demo_noveum_agent()

# Process queries programmatically
response = noveum_agent.process_query("What are the latest AI news?")
print(f"Mode: {response['mode']}")
print(f"Answer: {response['answer']}")
print(f"Sources: {response['sources']}")
```

### Query Types

#### RAG Queries (Noveum-specific)
- "What is Noveum and what does it do?"
- "How do I integrate Noveum Trace?"
- "What are Noveum's pricing plans?"
- "What features does Noveum Trace offer?"
- "How do I set up observability with Noveum?"

#### Web Search Queries (External knowledge)
- "What are the latest AI news today?"
- "What's the weather like today?"
- "Tell me about recent developments in machine learning"
- "What are the current trends in observability tools?"
- "What happened in tech news this week?"

## 📊 Observability & Monitoring

### Traced Operations
- **System Initialization**: Website scraping and vector store creation
- **Query Processing**: End-to-end query handling with performance metrics
- **RAG Operations**: Document retrieval, context generation, and response creation
- **Web Search Operations**: Search execution, result synthesis, and response generation
- **Query Routing**: Classification decision making and confidence scoring

### Key Metrics Tracked
- **Performance**: Response latency, processing time, token usage
- **Quality**: Response length, source diversity, context utilization
- **Routing**: Classification confidence, keyword scores, decision rationale
- **Model Usage**: Token consumption, cost estimation, efficiency scores
- **Retrieval**: Document relevance, context quality, source effectiveness
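
As a rough sketch of how the performance metrics can be derived, the helper below times a call and estimates token counts from text length. The 4-characters-per-token estimate and the 2 s / 5 s tier thresholds follow the notebook's RAG code; `measure_call` and `fake_llm` are hypothetical names used only for illustration.

```python
import time
from typing import Callable, Dict, Union


def measure_call(
    llm_call: Callable[[str], str],
    prompt: str,
    chars_per_token: int = 4,
) -> Dict[str, Union[str, int, float]]:
    """Time one call and estimate token usage from character counts."""
    start = time.perf_counter()
    answer = llm_call(prompt)
    latency = time.perf_counter() - start

    # Rough estimate: about 4 characters per token for English text.
    prompt_tokens = len(prompt) // chars_per_token
    completion_tokens = len(answer) // chars_per_token
    total_tokens = prompt_tokens + completion_tokens

    if latency < 2.0:
        tier = "fast"
    elif latency < 5.0:
        tier = "medium"
    else:
        tier = "slow"

    return {
        "answer": answer,
        "latency_seconds": latency,
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "total_tokens": total_tokens,
        "tokens_per_second": total_tokens / latency if latency > 0 else 0.0,
        "performance_tier": tier,
    }


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call."""
    return "Noveum Trace records spans for each traced operation."


metrics = measure_call(fake_llm, "What does Noveum Trace record?")
print(metrics["performance_tier"], metrics["total_tokens"])
```

When the provider reports real token usage, those numbers should take precedence over the character-based estimate.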

### Noveum Trace Integration
- All operations are automatically traced with detailed spans
- Comprehensive attribute tracking for debugging and optimization
- Real-time monitoring through the Noveum.ai dashboard
- Export capabilities for further analysis

## 🔧 Configuration

### Default Settings
```python
CONFIG = {
    "noveum_base_url": "https://noveum.ai",
    "max_pages_to_scrape": 50,
    "chunk_size": 1000,
    "chunk_overlap": 200,
    "max_search_results": 5,
    "rag_threshold": 0.7,
    "noveum_docs_file": "noveum_docs.json",
    "vector_store_path": "noveum_vectorstore"
}
```

### Customization Options
- **Scraping**: Adjust `max_pages_to_scrape` for more/less content
- **RAG**: Modify `chunk_size` and `chunk_overlap` for different text splitting
- **Search**: Change `max_search_results` for more/fewer sources
- **Routing**: Add keywords to `rag_keywords` or `web_keywords` lists
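
To see how `chunk_size` and `chunk_overlap` interact, consider this simplified fixed-window splitter. It is an illustration only: the notebook uses LangChain's `RecursiveCharacterTextSplitter`, which also tries to break on separators such as paragraph boundaries, and `split_with_overlap` is a hypothetical helper.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> List[str]:
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap  # distance between consecutive chunk starts
    last_start_limit = max(len(text) - chunk_overlap, 1)
    return [text[start:start + chunk_size] for start in range(0, last_start_limit, step)]


sample = "".join(str(i % 10) for i in range(2500))
chunks = split_with_overlap(sample, chunk_size=1000, chunk_overlap=200)
print([len(chunk) for chunk in chunks])
```

Larger overlaps reduce the chance of splitting a relevant sentence across chunks, at the cost of more chunks to embed.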

## 🚨 Error Handling

### Common Issues
- **API Key Missing**: Ensure `NOVEUM_API_KEY` and `OPENAI_API_KEY` are set
- **Network Errors**: Check internet connection for scraping and API calls
- **Vector Store Issues**: Delete `noveum_vectorstore` folder to regenerate
- **Scraping Failures**: Set `force_scrape=True` to re-scrape the website

### Recovery Strategies
- Automatic fallback to Web Search for RAG failures
- Graceful error handling with informative messages
- Retry mechanisms for transient network issues
- Detailed error logging for debugging
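
One way to combine retries with the Web Search fallback is sketched below. The function names, retry count, and delays are assumptions chosen for illustration; the agent's actual recovery logic may differ.

```python
import time
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


def with_retries(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying with exponential backoff; re-raise the last error."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: propagate the last error
            sleep(base_delay * 2 ** attempt)  # 0.5s, 1.0s, 2.0s, ...
    raise RuntimeError("unreachable")


def answer_with_fallback(
    query: str,
    rag_fn: Callable[[str], str],
    web_fn: Callable[[str], str],
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Try RAG with retries; if it still fails, fall back to web search."""
    try:
        return {"mode": "rag", "answer": with_retries(lambda: rag_fn(query), sleep=sleep)}
    except Exception:
        return {"mode": "web_search", "answer": web_fn(query)}


def broken_rag(query: str) -> str:
    """Stand-in for a RAG call that always fails."""
    raise RuntimeError("vector store unavailable")


result = answer_with_fallback(
    "What is Noveum?", broken_rag, lambda q: f"web result for {q}", sleep=lambda s: None
)
print(result["mode"])
```

Passing `sleep` as a parameter keeps the example fast to run and easy to test without real delays.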

## 🔄 Maintenance

### Regular Tasks
- **Update Scraped Content**: Run with `force_scrape=True` periodically
- **Monitor Performance**: Check the Noveum Trace dashboard for metrics
- **Review Routing**: Analyze query classification accuracy
- **Update Keywords**: Add new terms to routing keyword lists

### Scaling Considerations
- **Vector Store**: Can be shared across multiple agent instances
- **Scraped Data**: JSON file can be versioned and distributed
- **API Limits**: Monitor OpenAI token usage and costs
- **Performance**: Consider caching for frequently asked questions
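
Caching for frequently asked questions could look like the sketch below, which normalizes the query text and memoizes answers with `functools.lru_cache`. The wrapper `make_cached_answerer` is hypothetical; a production cache would likely also need expiry so answers do not go stale after a re-scrape.

```python
from functools import lru_cache
from typing import Callable, List


def normalize(query: str) -> str:
    """Lowercase and collapse whitespace so trivially different queries share a cache entry."""
    return " ".join(query.lower().split())


def make_cached_answerer(answer_fn: Callable[[str], str], maxsize: int = 256) -> Callable[[str], str]:
    """Wrap answer_fn so repeated (normalized) queries are served from memory."""

    @lru_cache(maxsize=maxsize)
    def _cached(normalized_query: str) -> str:
        return answer_fn(normalized_query)

    def answer(query: str) -> str:
        return _cached(normalize(query))

    answer.cache_info = _cached.cache_info  # expose hit/miss statistics
    return answer


seen_queries: List[str] = []


def slow_answer(query: str) -> str:
    """Stand-in for an expensive agent call."""
    seen_queries.append(query)
    return f"answer to: {query}"


cached_answer = make_cached_answerer(slow_answer)
cached_answer("What is Noveum?")
cached_answer("  what IS   noveum? ")
print(len(seen_queries), cached_answer.cache_info().hits)
```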


In [None]:
!pip3 install -r ./noveum_agent_requirements.txt


In [28]:
# Cell 1: Setup & Imports
import os
import json
import time
from typing import List, Dict, Any, Optional, Tuple
from urllib.parse import urljoin, urlparse
import requests
from bs4 import BeautifulSoup
import trafilatura

# LangChain ecosystem
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import FAISS
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.schema import Document
from langchain_community.tools import DuckDuckGoSearchRun

# Noveum Trace integration
import noveum_trace
from noveum_trace.context_managers import trace_operation, trace_agent

# Load environment variables
try:
    from dotenv import load_dotenv
    load_dotenv()
except ImportError:
    print("python-dotenv not installed. Environment variables will be read from system only.")

print("✅ All imports loaded successfully!")


✅ All imports loaded successfully!


In [None]:


# Before running the cells below, set the following.
# These are required for the project:
#   - OpenAI API key
#   - Gemini API key
#   - Noveum API key
#   - environment
#   - project


In [None]:
# Cell 2: Noveum Trace Integration & Configuration
# Initialize the Noveum Trace SDK
noveum_trace.init(
    project="customer_support_agent",
    api_key=os.getenv("NOVEUM_API_KEY"),
    environment="dev-aman",
)

# Configuration
CONFIG = {
    "noveum_base_url": "https://noveum.ai",
    "max_pages_to_scrape": 50,
    "chunk_size": 1000,
    "chunk_overlap": 200,
    "max_search_results": 5,
    "rag_threshold": 0.7,  # Similarity threshold for RAG retrieval
    "noveum_docs_file": "noveum_docs.json",
    "vector_store_path": "noveum_vectorstore"
}

# Initialize LLM and embeddings
llm = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0.1,
    api_key=os.getenv("OPENAI_API_KEY")
)

embeddings = OpenAIEmbeddings(
    model="text-embedding-3-small",
    api_key=os.getenv("OPENAI_API_KEY")
)

# Initialize web search tool
web_search = DuckDuckGoSearchRun()

print("✅ Noveum Trace initialized and configuration loaded!")
print(f"🔧 Configuration: {CONFIG}")


✅ Noveum Trace initialized and configuration loaded!
🔧 Configuration: {'noveum_base_url': 'https://noveum.ai', 'max_pages_to_scrape': 50, 'chunk_size': 1000, 'chunk_overlap': 200, 'max_search_results': 5, 'rag_threshold': 0.7, 'noveum_docs_file': 'noveum_docs.json', 'vector_store_path': 'noveum_vectorstore'}


In [50]:
# Cell 3: Website Scraper - Extract content from noveum.ai and sub-URLs
class NoveumWebsiteScraper:
    def __init__(self, base_url: str, max_pages: int = 50):
        self.base_url = base_url
        self.max_pages = max_pages
        self.scraped_urls = set()
        self.scraped_content = []
        self.session = requests.Session()
        self.session.headers.update({
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
        })
    
    def is_valid_url(self, url: str) -> bool:
        """Check if URL is valid and belongs to noveum.ai domain"""
        skipped_extensions = ('.pdf', '.jpg', '.png', '.gif', '.css', '.js', '.xml', '.txt')
        try:
            parsed = urlparse(url)
            return (
                parsed.netloc in ['noveum.ai', 'www.noveum.ai'] and
                # Compare against the end of the path; a substring test would also
                # reject pages whose URL merely contains e.g. '.js'
                not parsed.path.lower().endswith(skipped_extensions) and
                '#' not in url
            )
        except Exception:
            return False
    
    def extract_text_content(self, html_content: str, url: str) -> str:
        """Extract clean text content from HTML"""
        try:
            # Use trafilatura for better text extraction
            extracted = trafilatura.extract(html_content)
            if extracted:
                return extracted.strip()
            
            # Fallback to BeautifulSoup
            soup = BeautifulSoup(html_content, 'html.parser')
            
            # Remove script and style elements
            for script in soup(["script", "style"]):
                script.decompose()
            
            # Get text and clean up
            text = soup.get_text()
            lines = (line.strip() for line in text.splitlines())
            chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
            text = ' '.join(chunk for chunk in chunks if chunk)
            
            return text.strip()
        except Exception as e:
            print(f"Error extracting text from {url}: {e}")
            return ""
    
    def find_internal_links(self, html_content: str, current_url: str) -> List[str]:
        """Find all internal links from the current page"""
        try:
            soup = BeautifulSoup(html_content, 'html.parser')
            links = []
            
            for link in soup.find_all('a', href=True):
                href = link['href']
                full_url = urljoin(current_url, href)
                
                if self.is_valid_url(full_url) and full_url not in self.scraped_urls:
                    links.append(full_url)
            
            return links
        except Exception as e:
            print(f"Error finding links in {current_url}: {e}")
            return []
    
    def scrape_page(self, url: str) -> Optional[Dict[str, Any]]:
        """Scrape a single page and return content"""
        try:
            print(f"🔍 Scraping: {url}")
            response = self.session.get(url, timeout=10)
            response.raise_for_status()
            
            # Extract text content
            text_content = self.extract_text_content(response.text, url)
            
            if not text_content or len(text_content) < 100:  # Skip pages with too little content
                print(f"⚠️  Skipping {url} - insufficient content")
                return None
            
            # Find internal links
            internal_links = self.find_internal_links(response.text, url)
            
            page_data = {
                "url": url,
                "title": self.extract_title(response.text),
                "content": text_content,
                "content_length": len(text_content),
                "internal_links": internal_links,
                "scraped_at": time.time()
            }
            
            print(f"✅ Scraped {url} - {len(text_content)} chars, {len(internal_links)} internal links")
            return page_data
            
        except Exception as e:
            print(f"❌ Error scraping {url}: {e}")
            return None
    
    def extract_title(self, html_content: str) -> str:
        """Extract page title"""
        try:
            soup = BeautifulSoup(html_content, 'html.parser')
            title_tag = soup.find('title')
            return title_tag.get_text().strip() if title_tag else "Untitled"
        except Exception:
            return "Untitled"
    
    def scrape_website(self) -> List[Dict[str, Any]]:
        """Main scraping function - scrape noveum.ai recursively"""
        print(f"🚀 Starting to scrape {self.base_url}")
        
        urls_to_scrape = [self.base_url]
        self.scraped_urls.add(self.base_url)
        
        with trace_operation("noveum_website_scraping") as scrape_span:
            scrape_span.set_attributes({
                "scraper.base_url": self.base_url,
                "scraper.max_pages": self.max_pages,
                "input_query": f"Scrape website: {self.base_url}"
            })

            while urls_to_scrape and len(self.scraped_content) < self.max_pages:
                current_url = urls_to_scrape.pop(0)

                # Scrape the current page
                page_data = self.scrape_page(current_url)

                if page_data:
                    self.scraped_content.append(page_data)

                    # Add new internal links to the queue
                    for link in page_data["internal_links"]:
                        if link not in self.scraped_urls and len(urls_to_scrape) < 100:  # Cap the queue at 100 pending URLs
                            urls_to_scrape.append(link)
                            self.scraped_urls.add(link)

                    # Add page data to span
                    scrape_span.add_event("page_scraped", {
                        "input_query": f"Scrape page: {current_url}",
                        "output_response": f"Page scraped successfully: {page_data['content_length']} characters, {len(page_data['internal_links'])} internal links found",
                        "url": current_url,
                        "content_length": page_data["content_length"],
                        "internal_links_found": len(page_data["internal_links"])
                    })

                # Small delay to be respectful
                time.sleep(0.5)

            # Final metrics. output_response is set here, after the loop, because
            # computing it before scraping would always report 0 pages.
            total_content_length = sum(page["content_length"] for page in self.scraped_content)
            scrape_span.set_attributes({
                "scraper.pages_scraped": len(self.scraped_content),
                "scraper.total_urls_found": len(self.scraped_urls),
                "scraper.total_content_length": total_content_length,
                "output_response": f"Scraping completed: {len(self.scraped_content)} pages scraped, {total_content_length} total characters extracted"
            })
        
        print(f"✅ Scraping complete! Scraped {len(self.scraped_content)} pages")
        return self.scraped_content
    
    def save_to_json(self, filename: str) -> None:
        """Save scraped content to JSON file"""
        with open(filename, 'w', encoding='utf-8') as f:
            json.dump(self.scraped_content, f, indent=2, ensure_ascii=False)
        print(f"💾 Saved scraped content to {filename}")

# Initialize scraper
scraper = NoveumWebsiteScraper(CONFIG["noveum_base_url"], CONFIG["max_pages_to_scrape"])
print("✅ Website scraper initialized!")


✅ Website scraper initialized!


In [51]:
# Cell 4: RAG System - Vector search and retrieval over scraped content
class NoveumRAGSystem:
    def __init__(self, embeddings, llm, config):
        self.embeddings = embeddings
        self.llm = llm
        self.config = config
        self.vectorstore = None
        self.documents = []
        self.text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=config["chunk_size"],
            chunk_overlap=config["chunk_overlap"]
        )
    
    def load_documents_from_json(self, json_file: str) -> List[Document]:
        """Load documents from scraped JSON file"""
        try:
            with open(json_file, 'r', encoding='utf-8') as f:
                scraped_data = json.load(f)
            
            documents = []
            for page in scraped_data:
                # Create document from page content
                doc = Document(
                    page_content=page["content"],
                    metadata={
                        "url": page["url"],
                        "title": page["title"],
                        "content_length": page["content_length"],
                        "scraped_at": page["scraped_at"]
                    }
                )
                documents.append(doc)
            
            print(f"✅ Loaded {len(documents)} documents from {json_file}")
            return documents
            
        except FileNotFoundError:
            print(f"❌ File {json_file} not found. Please run the scraper first.")
            return []
        except Exception as e:
            print(f"❌ Error loading documents: {e}")
            return []
    
    def create_vectorstore(self, documents: List[Document]) -> None:
        """Create FAISS vector store from documents"""
        if not documents:
            print("❌ No documents to create vector store")
            return
        
        print("🔄 Creating vector store...")
        
        # Split documents into chunks
        split_docs = self.text_splitter.split_documents(documents)
        print(f"📄 Split into {len(split_docs)} chunks")
        
        # Create vector store
        self.vectorstore = FAISS.from_documents(split_docs, self.embeddings)
        
        # Save vector store
        self.vectorstore.save_local(self.config["vector_store_path"])
        print(f"💾 Vector store saved to {self.config['vector_store_path']}")
    
    def load_vectorstore(self) -> bool:
        """Load existing vector store from disk"""
        try:
            self.vectorstore = FAISS.load_local(
                self.config["vector_store_path"], 
                self.embeddings,
                allow_dangerous_deserialization=True
            )
            print(f"✅ Loaded existing vector store from {self.config['vector_store_path']}")
            return True
        except Exception as e:
            print(f"❌ Error loading vector store: {e}")
            return False
    
    def search_relevant_docs(self, query: str, k: int = 5) -> List[Document]:
        """Search for relevant documents using similarity search"""
        if not self.vectorstore:
            print("❌ Vector store not initialized")
            return []
        
        try:
            # Perform similarity search
            docs = self.vectorstore.similarity_search(query, k=k)
            
            # Note: similarity_search() returns documents without scores, so the
            # "rag_threshold" setting is not applied here. To filter by score,
            # similarity_search_with_score() could be used instead.
            
            print(f"🔍 Found {len(docs)} relevant documents for query: '{query}'")
            return docs
            
        except Exception as e:
            print(f"❌ Error searching documents: {e}")
            return []
    
    def retrieve_context(self, query: str, max_docs: int = 5) -> str:
        """Retrieve and format context for the query"""
        relevant_docs = self.search_relevant_docs(query, max_docs)
        
        if not relevant_docs:
            return "No relevant information found in Noveum documentation."
        
        context_parts = []
        for i, doc in enumerate(relevant_docs, 1):
            context_parts.append(f"Source {i} ({doc.metadata.get('url', 'Unknown URL')}):\n{doc.page_content[:500]}...")
        
        return "\n\n".join(context_parts)
    
    def generate_rag_response(self, query: str) -> Dict[str, Any]:
        """Generate response using RAG"""
        with trace_agent(
            agent_type="rag_agent",
            operation="llm-rag",
            capabilities=["document_retrieval", "context_generation", "response_generation"],
            attributes={
                "agent.id": "noveum_rag_agent",
                "input_query": query,
                "query_length": len(query)
            }
        ) as rag_span:
            
            # Retrieve relevant context
            context = self.retrieve_context(query, self.config["max_search_results"])
            
            # Create prompt for RAG
            rag_prompt = f"""You are a helpful assistant for Noveum.ai. Answer the user's question based on the provided context from Noveum's documentation.

Context from Noveum documentation:
{context}

User Question: {query}

Instructions:
1. Answer based primarily on the provided context
2. If the context doesn't contain enough information, say so clearly
3. Be specific and cite sources when possible
4. Keep responses concise but informative
5. If the question is not related to Noveum, politely redirect to ask about Noveum

Answer:"""

            # Extract model parameters and metadata
            model_name = getattr(self.llm, 'model_name', 'unknown')
            model_temperature = getattr(self.llm, 'temperature', 0.0)
            model_max_tokens = getattr(self.llm, 'max_tokens', None)
            model_top_p = getattr(self.llm, 'top_p', None)
            model_frequency_penalty = getattr(self.llm, 'frequency_penalty', None)
            model_presence_penalty = getattr(self.llm, 'presence_penalty', None)
            
            # Model Details Span - Track model-specific information
            with trace_agent(
                agent_type="model_details",
                operation="llm_model_execution",
                capabilities=["model_invocation", "parameter_tracking", "latency_measurement"],
                attributes={
                    "agent.id": "noveum_model_details",
                    "input_query": f"Model execution for query: {query[:100]}{'...' if len(query) > 100 else ''}",
                    "query_length": len(query)
                }
            ) as model_span:
                
                # Record start time for latency measurement
                model_start_time = time.time()
                
                # Generate response
                response = self.llm.invoke(rag_prompt)
                
                # Record end time and calculate latency
                model_end_time = time.time()
                model_latency = model_end_time - model_start_time

                if response.content:
                    answer = response.content
                else:
                    answer = str(response)

                # Extract token usage metadata - Enhanced extraction
                prompt_tokens = 0
                completion_tokens = 0
                total_tokens = 0
                
                # Try multiple ways to extract token usage
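                if hasattr(response, 'usage_metadata') and response.usage_metadata:
                    usage = response.usage_metadata
                    # usage_metadata is normally a dict, so getattr() would always
                    # return the default; read dicts with .get() and keep getattr()
                    # only for object-style metadata
                    if isinstance(usage, dict):
                        read_usage = usage.get
                    else:
                        read_usage = lambda key, default=0: getattr(usage, key, default)
                    prompt_tokens = read_usage("input_tokens", 0) or read_usage("prompt_tokens", 0)
                    completion_tokens = read_usage("output_tokens", 0) or read_usage("completion_tokens", 0)
                    total_tokens = read_usage("total_tokens", 0)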
                elif hasattr(response, 'response_metadata') and response.response_metadata:
                    metadata = response.response_metadata
                    if 'token_usage' in metadata:
                        token_usage = metadata['token_usage']
                        prompt_tokens = token_usage.get('prompt_tokens', 0)
                        completion_tokens = token_usage.get('completion_tokens', 0)
                        total_tokens = token_usage.get('total_tokens', 0)
                elif hasattr(response, 'token_usage'):
                    token_usage = response.token_usage
                    prompt_tokens = getattr(token_usage, "prompt_tokens", 0)
                    completion_tokens = getattr(token_usage, "completion_tokens", 0)
                    total_tokens = getattr(token_usage, "total_tokens", 0)
                
                # If still no tokens found, try to estimate from content length
                if total_tokens == 0:
                    # Rough estimation: ~4 characters per token for English text
                    estimated_prompt_tokens = len(rag_prompt) // 4
                    estimated_completion_tokens = len(answer) // 4
                    prompt_tokens = estimated_prompt_tokens
                    completion_tokens = estimated_completion_tokens
                    total_tokens = prompt_tokens + completion_tokens

                # Set model details span attributes
                model_span.set_attributes({
                    # Input metrics
                    "input_query": f"Model execution for query: {query[:100]}{'...' if len(query) > 100 else ''}",
                    "query_length": len(query),
                    "query_type": "rag_model_query",
                    
                    # Model parameters and configuration
                    "model.name": model_name,
                    "model.temperature": model_temperature,
                    "model.max_tokens": model_max_tokens,
                    "model.top_p": model_top_p,
                    "model.frequency_penalty": model_frequency_penalty,
                    "model.presence_penalty": model_presence_penalty,
                    "model.provider": "openai",
                    "model.type": "chat_completion",
                    "model.version": "gpt-4o-mini",
                    
                    # Latency and performance metrics
                    "model.latency_seconds": model_latency,
                    "model.latency_ms": model_latency * 1000,
                    "model.start_time": model_start_time,
                    "model.end_time": model_end_time,
                    "model.performance_tier": "fast" if model_latency < 2.0 else "medium" if model_latency < 5.0 else "slow",
                    
                    # Token usage and cost metrics
                    "model.prompt_tokens": prompt_tokens,
                    "model.completion_tokens": completion_tokens,
                    "model.total_tokens": total_tokens,
                    "model.tokens_per_second": total_tokens / model_latency if model_latency > 0 else 0,
                    "model.estimated_cost": total_tokens * 0.00003,  # Rough estimate using a flat placeholder rate per token; adjust to the model's current pricing
                    "model.efficiency_score": len(answer) / total_tokens if total_tokens > 0 else 0,
                    
                    # Response characteristics
                    "model.response_length": len(answer),
                    "model.response_quality": "high" if len(answer) > 200 else "medium" if len(answer) > 100 else "low",
                    "model.output_response": f"Model Response: {answer[:200]}{'...' if len(answer) > 200 else ''}",
                    
                    # Model configuration details
                    "model.config": {
                        "name": model_name,
                        "temperature": model_temperature,
                        "max_tokens": model_max_tokens,
                        "top_p": model_top_p,
                        "frequency_penalty": model_frequency_penalty,
                        "presence_penalty": model_presence_penalty,
                        "provider": "openai",
                        "type": "chat_completion"
                    }
                })

            # Other Details Span - Track retrieval, response quality, and evaluation metrics
            with trace_agent(
                agent_type="other_details",
                operation="rag_evaluation_metrics",
                capabilities=["retrieval_analysis", "response_evaluation", "quality_assessment"],
                attributes={
                    "agent.id": "noveum_other_details",
                    "input_query": f"Evaluation for query: {query[:100]}{'...' if len(query) > 100 else ''}",
                    "query_length": len(query)
                }
            ) as rag_node:
                
                # Calculate additional evaluation metrics
                context_length = len(context)
                answer_length = len(answer)
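                # Count the "Source N (url):" header lines added by retrieve_context;
                # splitting on "Source" would also count the word inside page text
                sources_count = sum(
                    1 for line in context.splitlines()
                    if line.startswith("Source ") and line.rstrip().endswith("):")
                )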
                
                # Set other details span attributes
                rag_node.set_attributes({
                    # Input metrics
                    "input_query": f"Evaluation for query: {query[:100]}{'...' if len(query) > 100 else ''}",
                    "query_length": len(query),
                    "query_type": "rag_evaluation_query",
                    
                    # Retrieval metrics
                    "retrieval.context_retrieved": f"Context: {context[:300]}{'...' if len(context) > 300 else ''}",
                    "retrieval.context_length": context_length,
                    "retrieval.sources_count": sources_count,
                    "retrieval.context_quality": "high" if context_length > 500 else "medium" if context_length > 200 else "low",
                    "retrieval.effectiveness": sources_count / 5.0,  # Normalized to max expected sources
                    "retrieval.context_utilization": context_length / 1000.0,  # Normalized context usage
                    
                    # Prompt engineering metrics
                    "prompt.complete_prompt": rag_prompt,
                    "prompt.prompt_length": len(rag_prompt),
                    "prompt.context_injection": f"Context injected: {context[:200]}{'...' if len(context) > 200 else ''}",
                    "prompt.instruction_following": "rag_optimized",
                    
                    # Response quality metrics
                    "response.answer_length": answer_length,
                    "response.answer_completeness": "complete" if answer_length > 100 else "brief",
                    "response.response_quality": "high" if answer_length > 200 and sources_count > 2 else "medium" if answer_length > 100 else "low",
                    "response.source_citation": sources_count,
                    "response.context_utilization": answer_length / context_length if context_length > 0 else 0,
                    "output_response": f"RAG Answer: {answer[:200]}{'...' if len(answer) > 200 else ''}",
                    
                    # Evaluation metrics
                    "evaluation.retrieval_effectiveness": sources_count / 5.0,
                    "evaluation.response_completeness": "complete" if answer_length > 150 else "partial",
                    "evaluation.source_diversity": sources_count,
                    "evaluation.context_relevance": "high" if context_length > 500 else "medium" if context_length > 200 else "low",
                    "evaluation.overall_quality": "high" if answer_length > 200 and sources_count > 2 and context_length > 500 else "medium" if answer_length > 100 and sources_count > 1 else "low",
                    "evaluation.ready_for_production": True,
                    
                    # RAG-specific metrics
                    "rag.retrieval_strategy": "semantic_similarity",
                    "rag.vector_search_results": sources_count,
                    "rag.context_synthesis": "multi_source" if sources_count > 1 else "single_source",
                    "rag.document_coverage": sources_count / 5.0,  # Normalized coverage
                    "rag.information_density": answer_length / context_length if context_length > 0 else 0
                })

            # Set main RAG span attributes (simplified)
            rag_span.set_attributes({
                "input_query": query,
                "query_length": len(query),
                "query_type": "rag_query",
                "output_response": f"RAG Answer: {answer[:200]}{'...' if len(answer) > 200 else ''}",
                "rag.context_length": context_length,
                "rag.sources_count": sources_count,
                "rag.answer_length": answer_length,
                "rag.mode": "retrieval_augmented_generation"
            })

            return {
                "answer": answer,
                "context": context,
                "mode": "RAG",
                "sources": [doc.metadata.get('url', 'Unknown') for doc in self.search_relevant_docs(query, CONFIG["max_search_results"])],
                "model_info": {
                    "name": model_name,
                    "tokens_used": total_tokens,
                    "prompt_tokens": prompt_tokens,
                    "completion_tokens": completion_tokens,
                    "latency": model_latency
                }
            }

# Initialize RAG system
rag_system = NoveumRAGSystem(embeddings, llm, CONFIG)
print("✅ RAG system initialized!")


✅ RAG system initialized!
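
The evaluation span in the cell above derives its metrics from string lengths and a count of `Source` markers rather than from any semantic judgment. As a minimal sketch, those heuristics could be factored into a pure function so they can be checked without a tracer or an LLM. The name `rag_heuristics` and the `max_sources` parameter are hypothetical and not part of the notebook; the thresholds (500 and 200 characters, 5 expected sources) are copied from the span attributes.

```python
from typing import Dict, Union


def rag_heuristics(
    context: str, answer: str, max_sources: int = 5
) -> Dict[str, Union[int, float, str]]:
    """Length- and marker-based heuristics mirroring the evaluation span."""
    context_length = len(context)
    answer_length = len(answer)

    # Counts every literal "Source" in the context, so body text that
    # happens to contain the word inflates the count (same as the notebook).
    sources_count = context.count("Source")

    if context_length > 500:
        context_quality = "high"
    elif context_length > 200:
        context_quality = "medium"
    else:
        context_quality = "low"

    return {
        "context_length": context_length,
        "answer_length": answer_length,
        "sources_count": sources_count,
        "context_quality": context_quality,
        # Not clamped: more than max_sources markers yields a value above 1.0
        "retrieval_effectiveness": sources_count / max_sources,
        "context_utilization": answer_length / context_length if context_length else 0,
    }


metrics = rag_heuristics("Source 1: docs page\n\nSource 2: pricing page", "Short answer")
print(metrics["sources_count"], metrics["context_quality"])
```

These values describe size, not correctness: a long but irrelevant context still scores "high", so they are best read as sanity checks alongside a real evaluation.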


In [52]:
# Cell 5: Web Search Integration - DuckDuckGo search for external queries
class NoveumWebSearchSystem:
    def __init__(self, web_search_tool, llm, config):
        self.web_search = web_search_tool
        self.llm = llm
        self.config = config
    
    def search_web(self, query: str, max_results: int = 5) -> List[Dict[str, Any]]:
        """Perform web search and return formatted results"""
        try:
            from urllib.parse import quote_plus  # standard library; imported here so the cell stays self-contained

            # Perform web search
            search_results = self.web_search.run(query)
            
            # Parse results (the DuckDuckGo tool returns a plain string)
            results = []
            if isinstance(search_results, str):
                # Drop blank lines first so they do not consume result slots
                lines = [line.strip() for line in search_results.split('\n') if line.strip()]
                for i, line in enumerate(lines[:max_results], 1):
                    results.append({
                        "title": f"Search Result {i}",
                        "snippet": line,
                        # No per-result URL is available here, so link to the URL-encoded search page
                        "url": f"https://duckduckgo.com/?q={quote_plus(query)}"
                    })
            else:
                # Otherwise assume a list of result dicts and truncate it
                results = search_results[:max_results]
            
            print(f"🔍 Found {len(results)} web search results for: '{query}'")
            return results
            
        except Exception as e:
            print(f"❌ Error performing web search: {e}")
            return []
    
    def format_search_context(self, search_results: List[Dict[str, Any]]) -> str:
        """Format search results into context string"""
        if not search_results:
            return "No search results found."
        
        context_parts = []
        for i, result in enumerate(search_results, 1):
            title = result.get('title', f'Result {i}')
            snippet = result.get('snippet', 'No description available')
            url = result.get('url', 'No URL available')
            
            context_parts.append(f"Source {i} - {title}:\n{snippet}\nURL: {url}")
        
        return "\n\n".join(context_parts)
    
    def generate_web_response(self, query: str) -> Dict[str, Any]:
        """Generate response using web search"""
        with trace_agent(
            agent_type="web_search_agent",
            operation="web_search_generation",
            capabilities=["web_search", "content_synthesis", "response_generation"],
            attributes={
                "agent.id": "noveum_web_search_agent",
                "input_query": query,
                "query_length": len(query)
            }
        ) as web_span:
            
            # Perform web search
            search_results = self.search_web(query, self.config["max_search_results"])
            
            # Format context
            context = self.format_search_context(search_results)
            
            # Create prompt for web search response
            web_prompt = f"""You are a helpful assistant. Answer the user's question based on the provided web search results.

Web Search Results:
{context}

User Question: {query}

Instructions:
1. Answer based on the provided web search results
2. Synthesize information from multiple sources when relevant
3. Be informative and accurate
4. If the results don't contain enough information, say so clearly
5. Keep responses concise but comprehensive
6. Cite sources when possible

Answer:"""

            # Extract model parameters and metadata
            model_name = getattr(self.llm, 'model_name', 'unknown')
            model_temperature = getattr(self.llm, 'temperature', 0.0)
            model_max_tokens = getattr(self.llm, 'max_tokens', None)
            model_top_p = getattr(self.llm, 'top_p', None)
            model_frequency_penalty = getattr(self.llm, 'frequency_penalty', None)
            model_presence_penalty = getattr(self.llm, 'presence_penalty', None)

            # Model Details Span - Track model-specific information
            with trace_agent(
                agent_type="model_details",
                operation="llm_model_execution",
                capabilities=["model_invocation", "parameter_tracking", "latency_measurement"],
                attributes={
                    "agent.id": "noveum_model_details",
                    "input_query": f"Model execution for query: {query[:100]}{'...' if len(query) > 100 else ''}",
                    "query_length": len(query)
                }
            ) as model_span:
                
                # Record start time for latency measurement
                model_start_time = time.time()
                
                # Generate response
                response = self.llm.invoke(web_prompt)
                
                # Record end time and calculate latency
                model_end_time = time.time()
                model_latency = model_end_time - model_start_time

                # Handle response content extraction
                if hasattr(response, 'content'):
                    # When response is a proper SDK object
                    answer = response.content
                elif isinstance(response, dict):
                    # When response is returned as a plain dict
                    answer = response.get('content', '')
                else:
                    # Fallback to string
                    answer = str(response)

                # Extract token usage metadata - Enhanced extraction
                prompt_tokens = 0
                completion_tokens = 0
                total_tokens = 0
                
                # Try multiple ways to extract token usage
                if hasattr(response, 'usage_metadata') and response.usage_metadata:
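                    usage = response.usage_metadata
                    # LangChain exposes usage_metadata as a dict, so getattr() alone always returns 0;
                    # use dict access when possible and fall back to attribute access otherwise
                    read = usage.get if isinstance(usage, dict) else (lambda key, default=0: getattr(usage, key, default))
                    prompt_tokens = read("input_tokens", 0) or read("prompt_tokens", 0)
                    completion_tokens = read("output_tokens", 0) or read("completion_tokens", 0)
                    total_tokens = read("total_tokens", 0)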
                elif hasattr(response, 'response_metadata') and response.response_metadata:
                    metadata = response.response_metadata
                    if 'token_usage' in metadata:
                        token_usage = metadata['token_usage']
                        prompt_tokens = token_usage.get('prompt_tokens', 0)
                        completion_tokens = token_usage.get('completion_tokens', 0)
                        total_tokens = token_usage.get('total_tokens', 0)
                elif hasattr(response, 'token_usage'):
                    token_usage = response.token_usage
                    prompt_tokens = getattr(token_usage, "prompt_tokens", 0)
                    completion_tokens = getattr(token_usage, "completion_tokens", 0)
                    total_tokens = getattr(token_usage, "total_tokens", 0)
                
                # If still no tokens found, try to estimate from content length
                if total_tokens == 0:
                    # Rough estimation: ~4 characters per token for English text
                    estimated_prompt_tokens = len(web_prompt) // 4
                    estimated_completion_tokens = len(answer) // 4
                    prompt_tokens = estimated_prompt_tokens
                    completion_tokens = estimated_completion_tokens
                    total_tokens = prompt_tokens + completion_tokens

                # Set model details span attributes
                model_span.set_attributes({
                    # Input metrics
                    "input_query": f"Model execution for query: {query[:100]}{'...' if len(query) > 100 else ''}",
                    "query_length": len(query),
                    "query_type": "web_search_model_query",
                    
                    # Model parameters and configuration
                    "model.name": model_name,
                    "model.temperature": model_temperature,
                    "model.max_tokens": model_max_tokens,
                    "model.top_p": model_top_p,
                    "model.frequency_penalty": model_frequency_penalty,
                    "model.presence_penalty": model_presence_penalty,
                    "model.provider": "openai",
                    "model.type": "chat_completion",
                    "model.version": model_name,  # report the configured model rather than a hard-coded string
                    
                    # Latency and performance metrics
                    "model.latency_seconds": model_latency,
                    "model.latency_ms": model_latency * 1000,
                    "model.start_time": model_start_time,
                    "model.end_time": model_end_time,
                    "model.performance_tier": "fast" if model_latency < 2.0 else "medium" if model_latency < 5.0 else "slow",
                    
                    # Token usage and cost metrics
                    "model.prompt_tokens": prompt_tokens,
                    "model.completion_tokens": completion_tokens,
                    "model.total_tokens": total_tokens,
                    "model.tokens_per_second": total_tokens / model_latency if model_latency > 0 else 0,
                    "model.estimated_cost": total_tokens * 0.00003,  # Rough cost estimate
                    "model.efficiency_score": len(answer) / total_tokens if total_tokens > 0 else 0,
                    
                    # Response characteristics
                    "model.response_length": len(answer),
                    "model.response_quality": "high" if len(answer) > 200 else "medium" if len(answer) > 100 else "low",
                    "model.output_response": f"Model Response: {answer[:200]}{'...' if len(answer) > 200 else ''}",
                    
                    # Model configuration details
                    "model.config": {
                        "name": model_name,
                        "temperature": model_temperature,
                        "max_tokens": model_max_tokens,
                        "top_p": model_top_p,
                        "frequency_penalty": model_frequency_penalty,
                        "presence_penalty": model_presence_penalty,
                        "provider": "openai",
                        "type": "chat_completion"
                    }
                })

            # Other Details Span - Track web search, response quality, and evaluation metrics
            with trace_agent(
                agent_type="other_details",
                operation="web_search_evaluation_metrics",
                capabilities=["web_search_analysis", "response_evaluation", "quality_assessment"],
                attributes={
                    "agent.id": "noveum_other_details",
                    "input_query": f"Evaluation for query: {query[:100]}{'...' if len(query) > 100 else ''}",
                    "query_length": len(query)
                }
            ) as other_span:
                
                # Calculate additional evaluation metrics
                search_results_count = len(search_results)
                context_length = len(context)
                answer_length = len(answer or "")
                
                # Set other details span attributes
                other_span.set_attributes({
                    # Input metrics
                    "input_query": f"Evaluation for query: {query[:100]}{'...' if len(query) > 100 else ''}",
                    "query_length": len(query),
                    "query_type": "web_search_evaluation_query",
                    
                    # Web search metrics
                    "web_search.results_count": search_results_count,
                    "web_search.context_length": context_length,
                    "web_search.context_synthesized": f"Context from {search_results_count} web sources",
                    "web_search.search_effectiveness": search_results_count / 5.0,  # Normalized to max expected results
                    "web_search.context_quality": "high" if context_length > 800 else "medium" if context_length > 400 else "low",
                    "web_search.source_diversity": search_results_count,
                    "web_search.information_synthesis": "high" if search_results_count > 3 and answer_length > 200 else "medium" if search_results_count > 1 else "low",
                    "web_search.external_knowledge_utilization": context_length / 1000.0,  # Normalized context usage
                    
                    # Prompt engineering metrics
                    "prompt.complete_prompt": web_prompt,
                    "prompt.prompt_length": len(web_prompt),
                    "prompt.context_injection": f"Web context injected: {context[:200]}{'...' if len(context) > 200 else ''}",
                    "prompt.instruction_following": "web_search_optimized",
                    
                    # Response quality metrics
                    "response.answer_length": answer_length,
                    "response.answer_completeness": "complete" if answer_length > 150 else "brief",
                    "response.response_quality": "high" if answer_length > 300 and search_results_count > 3 else "medium" if answer_length > 150 else "low",
                    "response.source_citation": search_results_count,
                    "response.context_utilization": answer_length / context_length if context_length > 0 else 0,
                    "output_response": f"Web Search Answer: {answer[:200]}{'...' if len(answer or '') > 200 else ''}" if answer else "No answer generated",
                    
                    # Evaluation metrics
                    "evaluation.search_effectiveness": search_results_count / 5.0,
                    "evaluation.response_completeness": "complete" if answer_length > 200 else "partial",
                    "evaluation.source_diversity": search_results_count,
                    "evaluation.context_relevance": "high" if context_length > 800 else "medium" if context_length > 400 else "low",
                    "evaluation.overall_quality": "high" if answer_length > 300 and search_results_count > 3 and context_length > 800 else "medium" if answer_length > 150 and search_results_count > 1 else "low",
                    "evaluation.ready_for_production": True,
                    
                    # Web search specific metrics
                    "web_search.search_strategy": "duckduckgo_api",
                    "web_search.real_time_data": True,
                    "web_search.external_sources": search_results_count,
                    "web_search.information_freshness": "current",
                    "web_search.knowledge_synthesis": "multi_source" if search_results_count > 1 else "single_source"
                })

            # Set main Web Search span attributes (simplified)
            web_span.set_attributes({
                "input_query": query,
                "query_length": len(query),
                "query_type": "web_search_query",
                "output_response": f"Web Search Answer: {answer[:200]}{'...' if len(answer or '') > 200 else ''}" if answer else "No answer generated",
                "web_search.results_count": search_results_count,
                "web_search.context_length": context_length,
                "web_search.response_length": answer_length,
                "web_search.mode": "external_web_search"
            })

            return {
                "answer": answer,
                "context": context,
                "mode": "Web Search",
                "sources": [result.get('url', 'Unknown') for result in search_results],
                "model_info": {
                    "name": model_name,
                    "tokens_used": total_tokens,
                    "prompt_tokens": prompt_tokens,
                    "completion_tokens": completion_tokens,
                    "latency": model_latency
                }
            }

# Initialize web search system
web_search_system = NoveumWebSearchSystem(web_search, llm, CONFIG)
print("✅ Web search system initialized!")


✅ Web search system initialized!
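
The token-usage extraction in the cell above is repeated for every LLM call in this notebook. A minimal sketch of a shared helper follows, assuming the response may carry `usage_metadata` (a dict in current LangChain releases, to my understanding) or `response_metadata["token_usage"]`, and falling back to the same rough four-characters-per-token estimate. The name `extract_token_usage` and the `estimated` flag are hypothetical additions, not part of the notebook.

```python
from typing import Any, Dict


def extract_token_usage(response: Any, prompt: str, answer: str) -> Dict[str, Any]:
    """Return token counts from a response, estimating them if none are reported."""

    def read(container: Any, key: str) -> int:
        # Usage data may be a dict or an object with attributes.
        if isinstance(container, dict):
            return container.get(key, 0) or 0
        return getattr(container, key, 0) or 0

    usage = getattr(response, "usage_metadata", None)
    if not usage:
        metadata = getattr(response, "response_metadata", None) or {}
        usage = metadata.get("token_usage") if isinstance(metadata, dict) else None

    prompt_tokens = completion_tokens = total_tokens = 0
    if usage:
        prompt_tokens = read(usage, "input_tokens") or read(usage, "prompt_tokens")
        completion_tokens = read(usage, "output_tokens") or read(usage, "completion_tokens")
        total_tokens = read(usage, "total_tokens") or (prompt_tokens + completion_tokens)

    # Rough fallback: about 4 characters per token for English text.
    estimated = total_tokens == 0
    if estimated:
        prompt_tokens = len(prompt) // 4
        completion_tokens = len(answer) // 4
        total_tokens = prompt_tokens + completion_tokens

    return {
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "total_tokens": total_tokens,
        "estimated": estimated,
    }


# A plain string carries no usage data, so the counts are estimated.
print(extract_token_usage("plain text response", "p" * 40, "a" * 8))
```

Recording the `estimated` flag as a span attribute would make it possible to tell measured token counts from guessed ones when reading the trace.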


In [53]:
# Cell 6: Query Router - Intelligent decision making between RAG and Web Search
class NoveumQueryRouter:
    def __init__(self, llm, config):
        self.llm = llm
        self.config = config
        
        # Keywords that suggest RAG should be used
        self.rag_keywords = [
            "noveum", "platform", "product", "feature", "api", "documentation",
            "trace", "observability", "monitoring", "agent", "system", "tool",
            "integration", "setup", "configuration", "usage", "guide", "tutorial",
            "pricing", "plan", "subscription", "account", "dashboard", "metrics"
        ]
        
        # Keywords that suggest Web Search should be used
        self.web_keywords = [
            "recent", "latest", "news", "update", "announcement", "release",
            "today", "yesterday", "this week", "this month", "current",
            "trending", "popular", "viral", "breaking", "live", "real-time",
            "weather", "stock", "price", "market", "cryptocurrency", "bitcoin",
            "election", "politics", "sports", "entertainment", "celebrity"
        ]
    
    def classify_query(self, query: str) -> str:
        """Classify query to determine whether to use RAG or Web Search"""
        query_lower = query.lower()
        
        # Check for RAG keywords
        rag_score = sum(1 for keyword in self.rag_keywords if keyword in query_lower)
        
        # Check for Web Search keywords
        web_score = sum(1 for keyword in self.web_keywords if keyword in query_lower)
        
        # Check for explicit mentions of Noveum
        if "noveum" in query_lower:
            return "RAG"
        
        # If both scores are 0, use LLM-based classification
        if rag_score == 0 and web_score == 0:
            return self._llm_classify_query(query)
        
        # Return the mode with higher score
        return "RAG" if rag_score >= web_score else "Web Search"
    
    def _llm_classify_query(self, query: str) -> str:
        """Use LLM to classify query when keyword matching is inconclusive"""
        try:
            classification_prompt = f"""Classify the following user query to determine the best response method:

Query: "{query}"

Choose between:
- RAG: Use when the query is about Noveum.ai platform, products, features, documentation, or internal information
- Web Search: Use when the query is about recent events, news, general knowledge, or external topics

Respond with only "RAG" or "Web Search"."""

            # Extract model parameters for tracking
            model_name = getattr(self.llm, 'model_name', 'unknown')
            model_temperature = getattr(self.llm, 'temperature', 0.0)
            model_max_tokens = getattr(self.llm, 'max_tokens', None)
            model_top_p = getattr(self.llm, 'top_p', None)
            model_frequency_penalty = getattr(self.llm, 'frequency_penalty', None)
            model_presence_penalty = getattr(self.llm, 'presence_penalty', None)

            # Model Details Span for classification
            with trace_agent(
                agent_type="model_details",
                operation="llm_model_execution",
                capabilities=["model_invocation", "parameter_tracking", "latency_measurement"],
                attributes={
                    "agent.id": "noveum_model_details",
                    "input_query": f"Model execution for classification: {query[:100]}{'...' if len(query) > 100 else ''}",
                    "query_length": len(query)
                }
            ) as model_span:
                
                # Record start time for latency measurement
                model_start_time = time.time()
                
                response = self.llm.invoke(classification_prompt)
                
                # Record end time and calculate latency
                model_end_time = time.time()
                model_latency = model_end_time - model_start_time
            
                if hasattr(response, 'content'):
                    result = response.content.strip().upper()
                else:
                    result = str(response).strip().upper()

                # Extract token usage for classification
                prompt_tokens = 0
                completion_tokens = 0
                total_tokens = 0
                
                if hasattr(response, 'usage_metadata') and response.usage_metadata:
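                    usage = response.usage_metadata
                    # LangChain exposes usage_metadata as a dict, so getattr() alone always returns 0;
                    # use dict access when possible and fall back to attribute access otherwise
                    read = usage.get if isinstance(usage, dict) else (lambda key, default=0: getattr(usage, key, default))
                    prompt_tokens = read("input_tokens", 0) or read("prompt_tokens", 0)
                    completion_tokens = read("output_tokens", 0) or read("completion_tokens", 0)
                    total_tokens = read("total_tokens", 0)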
                elif hasattr(response, 'response_metadata') and response.response_metadata:
                    metadata = response.response_metadata
                    if 'token_usage' in metadata:
                        token_usage = metadata['token_usage']
                        prompt_tokens = token_usage.get('prompt_tokens', 0)
                        completion_tokens = token_usage.get('completion_tokens', 0)
                        total_tokens = token_usage.get('total_tokens', 0)
                
                # If still no tokens found, estimate
                if total_tokens == 0:
                    estimated_prompt_tokens = len(classification_prompt) // 4
                    estimated_completion_tokens = len(result) // 4
                    prompt_tokens = estimated_prompt_tokens
                    completion_tokens = estimated_completion_tokens
                    total_tokens = prompt_tokens + completion_tokens

                # Set model details span attributes
                model_span.set_attributes({
                    # Input metrics
                    "input_query": f"Model execution for classification: {query[:100]}{'...' if len(query) > 100 else ''}",
                    "query_length": len(query),
                    "query_type": "classification_model_query",
                    
                    # Model parameters and configuration
                    "model.name": model_name,
                    "model.temperature": model_temperature,
                    "model.max_tokens": model_max_tokens,
                    "model.top_p": model_top_p,
                    "model.frequency_penalty": model_frequency_penalty,
                    "model.presence_penalty": model_presence_penalty,
                    "model.provider": "openai",
                    "model.type": "chat_completion",
                    "model.version": model_name,  # report the configured model rather than a hard-coded string
                    
                    # Latency and performance metrics
                    "model.latency_seconds": model_latency,
                    "model.latency_ms": model_latency * 1000,
                    "model.start_time": model_start_time,
                    "model.end_time": model_end_time,
                    "model.performance_tier": "fast" if model_latency < 1.0 else "medium" if model_latency < 3.0 else "slow",
                    
                    # Token usage and cost metrics
                    "model.prompt_tokens": prompt_tokens,
                    "model.completion_tokens": completion_tokens,
                    "model.total_tokens": total_tokens,
                    "model.tokens_per_second": total_tokens / model_latency if model_latency > 0 else 0,
                    "model.estimated_cost": total_tokens * 0.00003,  # Rough cost estimate
                    "model.efficiency_score": len(result) / total_tokens if total_tokens > 0 else 0,
                    
                    # Response characteristics
                    "model.response_length": len(result),
                    "model.response_quality": "high" if len(result) > 10 else "medium" if len(result) > 5 else "low",
                    "model.output_response": f"Classification Result: {result}",
                    
                    # Model configuration details
                    "model.config": {
                        "name": model_name,
                        "temperature": model_temperature,
                        "max_tokens": model_max_tokens,
                        "top_p": model_top_p,
                        "frequency_penalty": model_frequency_penalty,
                        "presence_penalty": model_presence_penalty,
                        "provider": "openai",
                        "type": "chat_completion"
                    }
                })

            # Log classification details for debugging
            print(f"🔍 LLM Classification - Model: {model_name}, Tokens: {total_tokens}, Result: {result}")

            # Anything other than an explicit "RAG" answer defaults to Web Search
            return "RAG" if "RAG" in result else "Web Search"
                
        except Exception as e:
            print(f"❌ Error in LLM classification: {e}")
            # Default to Web Search on error
            return "Web Search"
    
    def route_query(self, query: str) -> Tuple[str, Dict[str, Any]]:
        """Route query to appropriate system and return response"""
        with trace_agent(
            agent_type="query_router",
            operation="query_routing",
            capabilities=["query_classification", "routing_decision"],
            attributes={
                "agent.id": "noveum_query_router",
                "input_query": query,
                "query_length": len(query)
            }
        ) as router_span:
            
            # Define classification prompt for tracing
            classification_prompt = f"""Classify the following user query to determine the best response method:

Query: "{query}"

Choose between:
- RAG: Use when the query is about Noveum.ai platform, products, features, documentation, or internal information
- Web Search: Use when the query is about recent events, news, general knowledge, or external topics

Respond with only "RAG" or "Web Search"."""
            
            # Classify the query
            mode = self.classify_query(query)
            
            # Calculate routing evaluation metrics
            query_lower = query.lower()
            # Reuse the keyword lists defined in __init__ so routing and metrics stay in sync
            rag_keywords = self.rag_keywords
            web_keywords = self.web_keywords
            
            rag_score = sum(1 for keyword in rag_keywords if keyword in query_lower)
            web_score = sum(1 for keyword in web_keywords if keyword in query_lower)
            confidence_score = abs(rag_score - web_score) / max(rag_score + web_score, 1)
            
            # Other Details Span - Track routing analysis and decision metrics
            with trace_agent(
                agent_type="other_details",
                operation="routing_evaluation_metrics",
                capabilities=["routing_analysis", "decision_evaluation", "quality_assessment"],
                attributes={
                    "agent.id": "noveum_other_details",
                    "input_query": f"Routing evaluation for query: {query[:100]}{'...' if len(query) > 100 else ''}",
                    "query_length": len(query)
                }
            ) as other_span:
                
                # Set other details span attributes
                other_span.set_attributes({
                    # Input metrics
                    "input_query": f"Routing evaluation for query: {query[:100]}{'...' if len(query) > 100 else ''}",
                    "query_length": len(query),
                    "query_type": "routing_evaluation_query",
                    
                    # Classification metrics
                    "classification.mode": mode,
                    "classification.rag_keyword_score": rag_score,
                    "classification.web_keyword_score": web_score,
                    "classification.confidence_score": confidence_score,
                    "classification.confidence_level": "high" if confidence_score > 0.5 else "medium" if confidence_score > 0.2 else "low",
                    "classification.method": "llm_based" if rag_score == 0 and web_score == 0 else "keyword_based",
                    
                    # Query analysis metrics
                    "query.complexity": "complex" if len(query) > 50 else "medium" if len(query) > 20 else "simple",
                    "query.intent": "noveum_specific" if "noveum" in query_lower else "general_knowledge" if web_score > rag_score else "documentation",
                    "query.keyword_density": (rag_score + web_score) / max(len(query.split()), 1),  # guard against empty queries
                    "query.domain_affinity": "noveum" if rag_score > web_score else "general" if web_score > rag_score else "neutral",
                    
                    # Routing decision metrics
                    "routing.decision": f"Routed to {mode} based on analysis",
                    "routing.rationale": f"RAG score: {rag_score}, Web score: {web_score}, Confidence: {confidence_score:.2f}",
                    "routing.expected_performance": "high" if confidence_score > 0.5 else "medium" if confidence_score > 0.2 else "low",
                    "routing.alternative_mode": "Web Search" if mode == "RAG" else "RAG",
                    "routing.decision_confidence": confidence_score,
                    
                    # Evaluation metrics
                    "evaluation.routing_accuracy": "high" if confidence_score > 0.5 else "medium" if confidence_score > 0.2 else "low",
                    "evaluation.keyword_coverage": (rag_score + web_score) / len(rag_keywords + web_keywords),
                    "evaluation.query_understanding": "clear" if confidence_score > 0.5 else "ambiguous" if confidence_score > 0.2 else "unclear",
                    "evaluation.ready_for_production": True,
                    
                    # Router-specific metrics
                    "router.classification_strategy": "hybrid_keyword_llm",
                    "router.keyword_matching": "used" if rag_score > 0 or web_score > 0 else "bypassed",
                    "router.llm_fallback": "used" if rag_score == 0 and web_score == 0 else "not_needed",
                    "router.decision_time": "instant" if rag_score > 0 or web_score > 0 else "llm_required",
                    "output_response": f"Routing Decision: {mode} (Confidence: {confidence_score:.2f})"
                })

            # Set main router span attributes (simplified)
            router_span.set_attributes({
                "input_query": query,
                "query_length": len(query),
                "query_type": "routing_query",
                "output_response": f"Routed to {mode} for query processing",
                "router.classification": mode,
                "router.confidence_score": confidence_score,
                "router.rag_keyword_score": rag_score,
                "router.web_keyword_score": web_score,
                "router.mode": "intelligent_routing"
            })
            
            # Route to appropriate system
            if mode == "RAG":
                print(f"🧠 Routing to RAG system for: '{query}'")
                response = rag_system.generate_rag_response(query)
            else:
                print(f"🌐 Routing to Web Search for: '{query}'")
                response = web_search_system.generate_web_response(query)
            
            return mode, response

# Initialize query router
query_router = NoveumQueryRouter(llm, CONFIG)
print("✅ Query router initialized!")


✅ Query router initialized!
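
The keyword-based confidence score used by the router can be checked in isolation. The following is a minimal sketch, not the notebook's router: the function names `keyword_confidence` and `confidence_level` and the shortened keyword lists are hypothetical, and only the scoring formula and the 0.5 / 0.2 thresholds are taken from the cell above.

```python
def keyword_confidence(query, rag_keywords, web_keywords):
    """Return (rag_score, web_score, confidence) using substring keyword matching."""
    query_lower = query.lower()
    rag_score = sum(1 for kw in rag_keywords if kw in query_lower)
    web_score = sum(1 for kw in web_keywords if kw in query_lower)
    # Same formula as the router: score gap normalised by total hits (at least 1).
    confidence = abs(rag_score - web_score) / max(rag_score + web_score, 1)
    return rag_score, web_score, confidence


def confidence_level(confidence):
    """Map a confidence score to the labels used in the span attributes."""
    if confidence > 0.5:
        return "high"
    if confidence > 0.2:
        return "medium"
    return "low"


# Hypothetical, shortened keyword lists for illustration only.
RAG_KEYWORDS = ["noveum", "trace", "pricing"]
WEB_KEYWORDS = ["latest", "news", "today", "weather"]

print(keyword_confidence("What are the latest AI news today?", RAG_KEYWORDS, WEB_KEYWORDS))
print(keyword_confidence("Latest Noveum pricing news", RAG_KEYWORDS, WEB_KEYWORDS))
```

Because matching is by substring, a query that hits both lists equally yields a confidence of 0 and falls into the "low" band, as does a query that hits neither list.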


In [None]:
# Cell 7: Main Executor - Orchestrates the complete agent workflow
class NoveumAIAgent:
    def __init__(self, scraper, rag_system, web_search_system, query_router, config):
        self.scraper = scraper
        self.rag_system = rag_system
        self.web_search_system = web_search_system
        self.query_router = query_router
        self.config = config
        self.is_initialized = False
    
    def initialize_system(self, force_scrape: bool = False) -> bool:
        """Initialize the system by setting up RAG with scraped data"""
        print("🚀 Initializing Noveum AI Agent...")
        
        with trace_operation("system_initialization") as init_span:
            init_span.set_attributes({
                "system.force_scrape": force_scrape,
                "system.config": self.config,
                "input_query": f"Initialize system with force_scrape={force_scrape}",
                "output_response": "System initialization: RAG system loaded, vector store ready, agent operational"
            })
            
            # Check if we need to scrape or if data already exists
            if force_scrape or not os.path.exists(self.config["noveum_docs_file"]):
                print("📥 Scraping Noveum website...")
                
                # Scrape the website
                scraped_data = self.scraper.scrape_website()
                
                if not scraped_data:
                    print("❌ Failed to scrape website data")
                    return False
                
                # Save scraped data
                self.scraper.save_to_json(self.config["noveum_docs_file"])
                
                init_span.add_event("website_scraped", {
                    "input_query": f"Scrape website: {self.config['noveum_base_url']}",
                    "output_response": f"Website scraping completed: {len(scraped_data)} pages scraped, {sum(page['content_length'] for page in scraped_data)} total characters extracted for RAG system",
                    "pages_scraped": len(scraped_data),
                    "total_content_length": sum(page["content_length"] for page in scraped_data)
                })
            else:
                print("📁 Using existing scraped data...")
            
            # Load documents and create/load vector store
            documents = self.rag_system.load_documents_from_json(self.config["noveum_docs_file"])
            
            if not documents:
                print("❌ Failed to load documents")
                return False
            
            # Try to load existing vector store, create if doesn't exist
            if not self.rag_system.load_vectorstore():
                print("🔄 Creating new vector store...")
                self.rag_system.create_vectorstore(documents)
            
            self.is_initialized = True
            print("✅ Noveum AI Agent initialized successfully!")
            
            init_span.set_attributes({
                "system.initialized": True,
                "system.documents_loaded": len(documents),
                "system.vectorstore_ready": self.rag_system.vectorstore is not None
            })
            
            return True
    
    def process_query(self, query: str) -> Dict[str, Any]:
        """Process a user query and return response"""
        if not self.is_initialized:
            print("❌ System not initialized. Please run initialize_system() first.")
            return {
                "answer": "System not initialized. Please run initialize_system() first.",
                "mode": "Error",
                "sources": [],
                "error": "System not initialized"
            }
        
        print(f"\n🎯 Processing query: '{query}'")
        
        with trace_operation("tool-orchestrator") as process_span:
            process_span.set_attributes({
                "input_query": query,
                "query.length": len(query)
            })
            
            start_time = time.time()
            
            try:
                # Route query and get response
                mode, response = self.query_router.route_query(query)
                
                # Add processing metrics
                end_time = time.time()
                processing_time = end_time - start_time
                
                response.update({
                    "processing_time": processing_time,
                    "timestamp": time.time()
                })
                
                # Add metrics to span
                process_span.set_attributes({
                    "processing.mode": mode,
                    "processing.time_seconds": processing_time,
                    "processing.response_length": len(response.get("answer", "")),
                    "processing.sources_count": len(response.get("sources", [])),
                    "output_response": f"Final Answer: {response.get('answer', '')[:200]}{'...' if len(response.get('answer', '')) > 200 else ''}",
                    "final_answer_mode": mode,
                    "query_processed.input_query": query,
                    "query_processed.output_response": f"Successfully processed query using {mode}, generated {len(response.get('answer', ''))} character response",
                    "query_processed.mode": mode,
                    "query_processed.processing_time": processing_time,
                    "query_processed.response_length": len(response.get("answer", ""))
                })
                
                print(f"✅ Query processed in {processing_time:.2f}s using {mode}")
                return response
                
            except Exception as e:
                error_msg = f"Error processing query: {str(e)}"
                print(f"❌ {error_msg}")
                
                process_span.add_event("query_processing_error", {
                    "error": str(e),
                    "input_query": query,
                    "output_response": f"I encountered an error while processing your query: {str(e)}"
                })
                
                return {
                    "answer": f"I encountered an error while processing your query: {str(e)}",
                    "mode": "Error",
                    "sources": [],
                    "error": str(e),
                    "processing_time": time.time() - start_time
                }
    
    def display_response(self, response: Dict[str, Any]) -> None:
        """Display the response in a formatted way"""
        print("\n" + "="*80)
        print("🤖 NOVEUM AI AGENT RESPONSE")
        print("="*80)
        print(f"📊 Mode: {response.get('mode', 'Unknown')}")
        print(f"⏱️  Processing Time: {response.get('processing_time', 0):.2f}s")
        print(f"📅 Timestamp: {time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(response.get('timestamp', time.time())))}")
        
        if response.get('sources'):
            print(f"📚 Sources ({len(response['sources'])}):")
            for i, source in enumerate(response['sources'][:3], 1):  # Show first 3 sources
                print(f"   {i}. {source}")
            if len(response['sources']) > 3:
                print(f"   ... and {len(response['sources']) - 3} more")
        
        print("\n💬 Answer:")
        print("-" * 40)
        print(response.get('answer', 'No answer provided'))
        print("="*80)

# Initialize the main agent
noveum_agent = NoveumAIAgent(scraper, rag_system, web_search_system, query_router, CONFIG)
print("✅ Noveum AI Agent initialized!")


✅ Noveum AI Agent initialized!


In [56]:
# Cell 8: Usage Examples and Demo
def demo_noveum_agent():
    """Demo function showing how to use the Noveum AI Agent"""
    
    print("🎬 NOVEUM AI AGENT DEMO")
    print("="*50)
    
    # Step 1: Initialize the system
    print("\n1️⃣ Initializing the system...")
    success = noveum_agent.initialize_system(force_scrape=False)  # Set to True to force re-scraping
    
    if not success:
        print("❌ Failed to initialize system")
        return
    
    # Step 2: Demo queries - 20 comprehensive test questions
    demo_queries = [
        # RAG Queries (Noveum-specific)
        "What is Noveum and what does it do?",  # Basic product info
        "How do I integrate Noveum Trace in my application?",  # Technical integration
        "What are Noveum's pricing plans?",  # Pricing information
        "What features does Noveum Trace offer?",  # Feature overview
        "How do I set up observability with Noveum?",  # Setup guidance
        "What APIs are available in Noveum platform?",  # API documentation
        "How does Noveum handle agent tracing?",  # Technical details
        "What monitoring capabilities does Noveum provide?",  # Capabilities
        "How do I configure Noveum for my system?",  # Configuration
        "What are the benefits of using Noveum Trace?",  # Value proposition
        
        # Web Search Queries (External/Recent information)
        "What are the latest AI news today?",  # Recent news
        "What's the weather like today?",  # Current weather
        "Tell me about recent developments in machine learning",  # Recent developments
        "What are the current trends in observability tools?",  # Industry trends
        "What happened in tech news this week?",  # Weekly tech news
        "What are the latest updates in Python programming?",  # Recent updates
        "What's the current status of cryptocurrency markets?",  # Market information
        "What are the newest features in cloud computing?",  # Recent features
        "What's happening in the software development world today?",  # Current events
        "What are the latest breakthroughs in artificial intelligence?"  # Recent breakthroughs
    ]
    
    print(f"\n2️⃣ Running {len(demo_queries)} demo queries...")
    
    for i, query in enumerate(demo_queries, 1):
        print(f"\n--- Demo Query {i} ---")
        response = noveum_agent.process_query(query)
        noveum_agent.display_response(response)
        
        # Small delay between queries
        time.sleep(1)
    
    print("\n🎉 Demo completed! Check Noveum Trace dashboard for detailed observability data.")
    print("💡 You can now use noveum_agent.process_query('your question') for your own queries!")

# Interactive query function
def ask_question(question: str):
    """Convenience function to ask a single question"""
    if not noveum_agent.is_initialized:
        print("⚠️  System not initialized. Initializing now...")
        if not noveum_agent.initialize_system():
            print("❌ Failed to initialize system")
            return
    
    response = noveum_agent.process_query(question)
    noveum_agent.display_response(response)
    return response

print("✅ Demo functions ready!")
print("\n🚀 To get started:")
print("1. Run: demo_noveum_agent()  # For a full demo")
print("2. Run: ask_question('Your question here')  # For a single question")
print("3. Or use: noveum_agent.process_query('Your question')  # For programmatic access")


✅ Demo functions ready!

🚀 To get started:
1. Run: demo_noveum_agent()  # For a full demo
2. Run: ask_question('Your question here')  # For a single question
3. Or use: noveum_agent.process_query('Your question')  # For programmatic access


In [None]:
demo_noveum_agent()

## Downloading the dataset

In [None]:
!python noveum_customer_support_bt/traces/fetch_traces_api.py 50

This script fetches traces for our project and saves them locally.

In [None]:
!python NovaEval/noveum_customer_support_bt/traces/combine_spans_api_compat.py

## Data filtering and mapping

In [None]:
!python preprocess_map.py ./traces/dataset_filtered.json

In [None]:
!python preprocess_map.py NovaEval/noveum_customer_support_bt/traces/traces/dataset_filtered.json



## Running eval on the dataset

In [None]:
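# 1. Setup and 2. Create Dataset
# Each `!` command runs in its own subshell, so the virtualenv activation and `cd`
# are chained with the command that depends on them.
!source .venv/bin/activate && cd noveum_customer_support_bt && python create_dataset.py --dataset-type agent --description "Customer Support Agent Evaluation Dataset" --pretty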



In [None]:
# 3. Create Version
!python create_dataset_version.py --pretty


In [None]:
# Getting scores
from demo_utils import run_complete_agent_evaluation
import os

# Process all JSON files in split_datasets directory
for file in os.listdir('split_datasets'):
    if file.endswith('.json'):
        print(f'Processing {file}...')
        run_complete_agent_evaluation(
            f'split_datasets/{file}', 
            sample_size=25, 
            evaluation_name=file.replace('.json', ''),
            output_dir='./demo_results'
        )
        print(f'Completed {file}\n')

In [14]:

import pandas as pd
import json

# Read the CSV file
df = pd.read_csv('demo_results/agent.query_routing_dataset/agent_evaluation_results.csv')

# Create API data structure with all task_ids
api_data = {
    'items': [
        {
            'item_key': str(task_id),
            'item_id': f'item_{i}'  # Generate unique, 1-based item IDs
        }
        for i, task_id in enumerate(df['task_id'], start=1)
    ]
}

# Save to JSON
with open('api_data.json', 'w') as f:
    json.dump(api_data, f, indent=2)

print('Created api_data.json with', len(api_data['items']), 'items')
print('Sample items:')
for item in api_data['items'][:3]:
    print(f'  {item}')


Created api_data.json with 20 items
Sample items:
  {'item_key': 'eda4fe22-9a2b-4b73-856b-f4f3309bf719', 'item_id': 'item_1'}
  {'item_key': '0ffffba1-8a37-443c-8866-d53ffbfa7718', 'item_id': 'item_2'}
  {'item_key': 'f1f37bd7-0851-4659-b493-b80d3800d920', 'item_id': 'item_3'}


In [None]:
!python upload_scores.py demo_results/agent.query_routing_dataset/agent_evaluation_results.csv --item-key-col task_id --score-col context_relevancy --reasoning-col context_relevancy_reasoning --api-data api_data.json --scorer-id context_relevancy_scorer

In [None]:
!python upload_scores.py demo_results/agent.query_routing_dataset/agent_evaluation_results.csv --item-key-col task_id --score-col role_adherence --reasoning-col role_adherence_reasoning --api-data api_data.json --scorer-id role_adherence_scorer

In [None]:
!python upload_scores.py demo_results/agent.query_routing_dataset/agent_evaluation_results.csv --item-key-col task_id --score-col parameter_correctness --reasoning-col parameter_correctness_reasoning --api-data api_data.json --scorer-id parameter_correctness_scorer
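
The three upload commands above differ only in the metric name. A minimal sketch that builds the same argument lists in a loop; it assumes each metric follows the `<metric>`, `<metric>_reasoning`, `<metric>_scorer` naming seen above, and the helper `build_upload_command` is hypothetical. The commands are only printed here.

```python
import shlex

RESULTS_CSV = "demo_results/agent.query_routing_dataset/agent_evaluation_results.csv"
METRICS = ["context_relevancy", "role_adherence", "parameter_correctness"]


def build_upload_command(metric, results_csv=RESULTS_CSV, api_data="api_data.json"):
    """Build the upload_scores.py argument list for one metric."""
    return [
        "python", "upload_scores.py", results_csv,
        "--item-key-col", "task_id",
        "--score-col", metric,
        "--reasoning-col", f"{metric}_reasoning",
        "--api-data", api_data,
        "--scorer-id", f"{metric}_scorer",
    ]


for metric in METRICS:
    # Print a shell-quoted version of each command for inspection.
    print(shlex.join(build_upload_command(metric)))
```

To execute the uploads rather than print them, each list could be passed to `subprocess.run(cmd, check=True)` from the directory that contains `upload_scores.py`.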