In [None]:
# Fix: Upload Dataset with Proper Environment Loading
!cd /Users/mramanindia/work/NovaEval/noveum_customer_support_bt && source .env && python upload_dataset.py --dataset-json split_datasets/agent.rag_evaluation_metrics_dataset.json --item-type conversation


# Noveum AI Agent with RAG + Web Search

An intelligent conversational agent that dynamically routes queries between **RAG (Retrieval-Augmented Generation)** for Noveum.ai-specific information and **Web Search** for external knowledge, providing comprehensive answers with full observability.

## 🚀 What This Agent Does

### Core Functionality
- **Intelligent Query Routing**: Automatically determines whether to use RAG or Web Search based on query content
- **Dual Knowledge Sources**: 
  - **RAG Mode**: Answers questions about Noveum.ai platform using scraped documentation
  - **Web Search Mode**: Handles external queries using real-time web search
- **Comprehensive Tracing**: Full observability with detailed metrics and performance tracking
- **Modular Architecture**: Clean separation of concerns for easy maintenance and extension

### Key Capabilities
- 🧠 **Document Intelligence**: Scrapes and indexes Noveum.ai website content for semantic search
- 🌐 **Real-time Web Search**: Uses DuckDuckGo for current events and external knowledge
- 🎯 **Smart Classification**: LLM-powered query routing with keyword fallback
- 📊 **Performance Monitoring**: Detailed metrics on response quality, latency, and token usage
- 🔄 **Scalable Design**: Easy to extend with new data sources or routing logic

## 📋 Prerequisites & Requirements

### Required Environment Variables
```bash
NOVEUM_API_KEY=your_noveum_api_key_here
OPENAI_API_KEY=your_openai_api_key_here
```

### Required Python Packages
- `requests` - HTTP requests for web scraping
- `beautifulsoup4` - HTML parsing
- `trafilatura` - Advanced text extraction
- `langchain` - LLM framework and vector operations
- `langchain-openai` - OpenAI integration
- `langchain-community` - Community tools (FAISS, DuckDuckGo)
- `noveum-trace` - Observability and tracing
- `python-dotenv` - Environment variable management

### System Requirements
- Python 3.8+
- Internet connection for web scraping and API calls
- ~500MB disk space for vector store and scraped data

## 🏗️ Architecture Overview

### 1. **Website Scraper** (`NoveumWebsiteScraper`)
- Recursively scrapes noveum.ai website and sub-pages
- Extracts clean text content using trafilatura
- Discovers internal links automatically
- Saves scraped data to JSON for persistence

### 2. **RAG System** (`NoveumRAGSystem`)
- Loads scraped documents and creates vector embeddings
- Uses FAISS for fast similarity search
- Generates context-aware responses using OpenAI GPT-4o-mini
- Tracks retrieval effectiveness and response quality

### 3. **Web Search System** (`NoveumWebSearchSystem`)
- Integrates DuckDuckGo search for external queries
- Synthesizes information from multiple web sources
- Handles real-time information and current events
- Formats search results into coherent responses

### 4. **Query Router** (`NoveumQueryRouter`)
- **Keyword-based classification**: Matches queries against predefined keyword lists
- **LLM-based classification**: Uses GPT-4o-mini for complex query analysis
- **Confidence scoring**: Evaluates routing decision quality
- **Fallback handling**: Defaults to Web Search for ambiguous queries

### 5. **Main Agent** (`NoveumAIAgent`)
- Orchestrates all components
- Manages system initialization and data loading
- Provides unified interface for query processing
- Handles error recovery and response formatting

## 🎯 How to Use

### Quick Start
```python
# 1. Initialize the system (first time only)
noveum_agent.initialize_system(force_scrape=True)

# 2. Ask questions
response = noveum_agent.process_query("What is Noveum and what does it do?")
noveum_agent.display_response(response)

# 3. Or use convenience function
ask_question("How do I integrate Noveum Trace?")
```

### Advanced Usage
```python
# Run full demo with 20 test queries
demo_noveum_agent()

# Process queries programmatically
response = noveum_agent.process_query("What are the latest AI news?")
print(f"Mode: {response['mode']}")
print(f"Answer: {response['answer']}")
print(f"Sources: {response['sources']}")
```

### Query Types

#### RAG Queries (Noveum-specific)
- "What is Noveum and what does it do?"
- "How do I integrate Noveum Trace?"
- "What are Noveum's pricing plans?"
- "What features does Noveum Trace offer?"
- "How do I set up observability with Noveum?"

#### Web Search Queries (External knowledge)
- "What are the latest AI news today?"
- "What's the weather like today?"
- "Tell me about recent developments in machine learning"
- "What are the current trends in observability tools?"
- "What happened in tech news this week?"

## 📊 Observability & Monitoring

### Traced Operations
- **System Initialization**: Website scraping and vector store creation
- **Query Processing**: End-to-end query handling with performance metrics
- **RAG Operations**: Document retrieval, context generation, and response creation
- **Web Search Operations**: Search execution, result synthesis, and response generation
- **Query Routing**: Classification decision making and confidence scoring

### Key Metrics Tracked
- **Performance**: Response latency, processing time, token usage
- **Quality**: Response length, source diversity, context utilization
- **Routing**: Classification confidence, keyword scores, decision rationale
- **Model Usage**: Token consumption, cost estimation, efficiency scores
- **Retrieval**: Document relevance, context quality, source effectiveness

### Noveum Trace Integration
- All operations are automatically traced with detailed spans
- Comprehensive attribute tracking for debugging and optimization
- Real-time monitoring through Noveum.ai dashboard
- Export capabilities for further analysis

## 🔧 Configuration

### Default Settings
```python
CONFIG = {
    "noveum_base_url": "https://noveum.ai",
    "max_pages_to_scrape": 50,
    "chunk_size": 1000,
    "chunk_overlap": 200,
    "max_search_results": 5,
    "rag_threshold": 0.7,
    "noveum_docs_file": "noveum_docs.json",
    "vector_store_path": "noveum_vectorstore"
}
```

### Customization Options
- **Scraping**: Adjust `max_pages_to_scrape` for more/less content
- **RAG**: Modify `chunk_size` and `chunk_overlap` for different text splitting
- **Search**: Change `max_search_results` for more/fewer sources
- **Routing**: Add keywords to `rag_keywords` or `web_keywords` lists

## 🚨 Error Handling

### Common Issues
- **API Key Missing**: Ensure `NOVEUM_API_KEY` and `OPENAI_API_KEY` are set
- **Network Errors**: Check internet connection for scraping and API calls
- **Vector Store Issues**: Delete `noveum_vectorstore` folder to regenerate
- **Scraping Failures**: Set `force_scrape=True` to re-scrape website

### Recovery Strategies
- Automatic fallback to Web Search for RAG failures
- Graceful error handling with informative messages
- Retry mechanisms for transient network issues
- Detailed error logging for debugging

## 🔄 Maintenance

### Regular Tasks
- **Update Scraped Content**: Run with `force_scrape=True` periodically
- **Monitor Performance**: Check Noveum Trace dashboard for metrics
- **Review Routing**: Analyze query classification accuracy
- **Update Keywords**: Add new terms to routing keyword lists

### Scaling Considerations
- **Vector Store**: Can be shared across multiple agent instances
- **Scraped Data**: JSON file can be versioned and distributed
- **API Limits**: Monitor OpenAI token usage and costs
- **Performance**: Consider caching for frequently asked questions


In [20]:
!pip3 install -r ./noveum_agent_requirements.txt


Collecting requests==2.32.3 (from -r ./noveum_agent_requirements.txt (line 3))
  Using cached requests-2.32.3-py3-none-any.whl.metadata (4.6 kB)
Collecting beautifulsoup4==4.12.3 (from -r ./noveum_agent_requirements.txt (line 4))
  Using cached beautifulsoup4-4.12.3-py3-none-any.whl.metadata (3.8 kB)
Collecting python-dotenv==1.0.1 (from -r ./noveum_agent_requirements.txt (line 5))
  Using cached python_dotenv-1.0.1-py3-none-any.whl.metadata (23 kB)
Collecting langchain==0.3.26 (from -r ./noveum_agent_requirements.txt (line 8))
  Using cached langchain-0.3.26-py3-none-any.whl.metadata (7.8 kB)
Collecting langchain-community==0.3.18 (from -r ./noveum_agent_requirements.txt (line 9))
  Using cached langchain_community-0.3.18-py3-none-any.whl.metadata (2.4 kB)
Collecting langchain-core==0.3.66 (from -r ./noveum_agent_requirements.txt (line 10))
  Using cached langchain_core-0.3.66-py3-none-any.whl.metadata (5.8 kB)
Collecting langchain-openai==0.3.25 (from -r ./noveum_agent_requirements.t

In [28]:
# Cell 1: Setup & Imports
import os
import json
import time
from typing import List, Dict, Any, Optional, Tuple
from urllib.parse import urljoin, urlparse
import requests
from bs4 import BeautifulSoup
import trafilatura

# LangChain ecosystem
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import FAISS
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.schema import Document
from langchain_community.tools import DuckDuckGoSearchRun

# Noveum Trace integration
import noveum_trace
from noveum_trace.context_managers import trace_operation, trace_agent

# Load environment variables
try:
    from dotenv import load_dotenv
    load_dotenv()
except ImportError:
    print("python-dotenv not installed. Environment variables will be read from system only.")

print("✅ All imports loaded successfully!")


✅ All imports loaded successfully!


In [None]:


## set openai api key
## set gemini api key
## set noveum api key
## set environment
## set project'

# These are required for the project


In [None]:
# Cell 2: Noveum Trace Integration & Configuration
# Initialize the Noveum Trace SDK
noveum_trace.init(
    project="customer_support_agent",
    api_key=os.getenv("NOVEUM_API_KEY"),
    environment="dev-aman",
)

# Configuration
CONFIG = {
    "noveum_base_url": "https://noveum.ai",
    "max_pages_to_scrape": 50,
    "chunk_size": 1000,
    "chunk_overlap": 200,
    "max_search_results": 5,
    "rag_threshold": 0.7,  # Similarity threshold for RAG retrieval
    "noveum_docs_file": "noveum_docs.json",
    "vector_store_path": "noveum_vectorstore"
}

# Initialize LLM and embeddings
llm = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0.1,
    api_key=os.getenv("OPENAI_API_KEY")
)

embeddings = OpenAIEmbeddings(
    model="text-embedding-3-small",
    api_key=os.getenv("OPENAI_API_KEY")
)

# Initialize web search tool
web_search = DuckDuckGoSearchRun()

print("✅ Noveum Trace initialized and configuration loaded!")
print(f"🔧 Configuration: {CONFIG}")


✅ Noveum Trace initialized and configuration loaded!
🔧 Configuration: {'noveum_base_url': 'https://noveum.ai', 'max_pages_to_scrape': 50, 'chunk_size': 1000, 'chunk_overlap': 200, 'max_search_results': 5, 'rag_threshold': 0.7, 'noveum_docs_file': 'noveum_docs.json', 'vector_store_path': 'noveum_vectorstore'}


In [50]:
# Cell 3: Website Scraper - Extract content from noveum.ai and sub-URLs
class NoveumWebsiteScraper:
    def __init__(self, base_url: str, max_pages: int = 50):
        self.base_url = base_url
        self.max_pages = max_pages
        self.scraped_urls = set()
        self.scraped_content = []
        self.session = requests.Session()
        self.session.headers.update({
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
        })
    
    def is_valid_url(self, url: str) -> bool:
        """Check if URL is valid and belongs to noveum.ai domain"""
        try:
            parsed = urlparse(url)
            return (
                parsed.netloc in ['noveum.ai', 'www.noveum.ai'] and
                not any(ext in url.lower() for ext in ['.pdf', '.jpg', '.png', '.gif', '.css', '.js', '.xml', '.txt']) and
                '#' not in url
            )
        except:
            return False
    
    def extract_text_content(self, html_content: str, url: str) -> str:
        """Extract clean text content from HTML"""
        try:
            # Use trafilatura for better text extraction
            extracted = trafilatura.extract(html_content)
            if extracted:
                return extracted.strip()
            
            # Fallback to BeautifulSoup
            soup = BeautifulSoup(html_content, 'html.parser')
            
            # Remove script and style elements
            for script in soup(["script", "style"]):
                script.decompose()
            
            # Get text and clean up
            text = soup.get_text()
            lines = (line.strip() for line in text.splitlines())
            chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
            text = ' '.join(chunk for chunk in chunks if chunk)
            
            return text.strip()
        except Exception as e:
            print(f"Error extracting text from {url}: {e}")
            return ""
    
    def find_internal_links(self, html_content: str, current_url: str) -> List[str]:
        """Find all internal links from the current page"""
        try:
            soup = BeautifulSoup(html_content, 'html.parser')
            links = []
            
            for link in soup.find_all('a', href=True):
                href = link['href']
                full_url = urljoin(current_url, href)
                
                if self.is_valid_url(full_url) and full_url not in self.scraped_urls:
                    links.append(full_url)
            
            return links
        except Exception as e:
            print(f"Error finding links in {current_url}: {e}")
            return []
    
    def scrape_page(self, url: str) -> Optional[Dict[str, Any]]:
        """Scrape a single page and return content"""
        try:
            print(f"🔍 Scraping: {url}")
            response = self.session.get(url, timeout=10)
            response.raise_for_status()
            
            # Extract text content
            text_content = self.extract_text_content(response.text, url)
            
            if not text_content or len(text_content) < 100:  # Skip pages with too little content
                print(f"⚠️  Skipping {url} - insufficient content")
                return None
            
            # Find internal links
            internal_links = self.find_internal_links(response.text, url)
            
            page_data = {
                "url": url,
                "title": self.extract_title(response.text),
                "content": text_content,
                "content_length": len(text_content),
                "internal_links": internal_links,
                "scraped_at": time.time()
            }
            
            print(f"✅ Scraped {url} - {len(text_content)} chars, {len(internal_links)} internal links")
            return page_data
            
        except Exception as e:
            print(f"❌ Error scraping {url}: {e}")
            return None
    
    def extract_title(self, html_content: str) -> str:
        """Extract page title"""
        try:
            soup = BeautifulSoup(html_content, 'html.parser')
            title_tag = soup.find('title')
            return title_tag.get_text().strip() if title_tag else "Untitled"
        except:
            return "Untitled"
    
    def scrape_website(self) -> List[Dict[str, Any]]:
        """Main scraping function - scrape noveum.ai recursively"""
        print(f"🚀 Starting to scrape {self.base_url}")
        
        urls_to_scrape = [self.base_url]
        self.scraped_urls.add(self.base_url)
        
        with trace_operation("noveum_website_scraping") as scrape_span:
            scrape_span.set_attributes({
                "scraper.base_url": self.base_url,
                "scraper.max_pages": self.max_pages,
                "input_query": f"Scrape website: {self.base_url}",
                "output_response": f"Scraping completed: {len(self.scraped_content)} pages scraped, {sum(page['content_length'] for page in self.scraped_content)} total characters extracted"
            })
            
            while urls_to_scrape and len(self.scraped_content) < self.max_pages:
                current_url = urls_to_scrape.pop(0)
                
                # Scrape the current page
                page_data = self.scrape_page(current_url)
                
                if page_data:
                    self.scraped_content.append(page_data)
                    
                    # Add new internal links to the queue
                    for link in page_data["internal_links"]:
                        if link not in self.scraped_urls and len(urls_to_scrape) < 100:  # Prevent infinite loops
                            urls_to_scrape.append(link)
                            self.scraped_urls.add(link)
                    
                    # Add page data to span
                    scrape_span.add_event("page_scraped", {
                        "input_query": f"Scrape page: {current_url}",
                        "output_response": f"Page scraped successfully: {page_data['content_length']} characters, {len(page_data['internal_links'])} internal links found",
                        "url": current_url,
                        "content_length": page_data["content_length"],
                        "internal_links_found": len(page_data["internal_links"])
                    })
                
                # Small delay to be respectful
                time.sleep(0.5)
            
            # Final metrics
            scrape_span.set_attributes({
                "scraper.pages_scraped": len(self.scraped_content),
                "scraper.total_urls_found": len(self.scraped_urls),
                "scraper.total_content_length": sum(page["content_length"] for page in self.scraped_content)
            })
        
        print(f"✅ Scraping complete! Scraped {len(self.scraped_content)} pages")
        return self.scraped_content
    
    def save_to_json(self, filename: str) -> None:
        """Save scraped content to JSON file"""
        with open(filename, 'w', encoding='utf-8') as f:
            json.dump(self.scraped_content, f, indent=2, ensure_ascii=False)
        print(f"💾 Saved scraped content to {filename}")

# Initialize scraper
scraper = NoveumWebsiteScraper(CONFIG["noveum_base_url"], CONFIG["max_pages_to_scrape"])
print("✅ Website scraper initialized!")


✅ Website scraper initialized!


In [51]:
# Cell 4: RAG System - Vector search and retrieval over scraped content
class NoveumRAGSystem:
    def __init__(self, embeddings, llm, config):
        self.embeddings = embeddings
        self.llm = llm
        self.config = config
        self.vectorstore = None
        self.documents = []
        self.text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=config["chunk_size"],
            chunk_overlap=config["chunk_overlap"]
        )
    
    def load_documents_from_json(self, json_file: str) -> List[Document]:
        """Load documents from scraped JSON file"""
        try:
            with open(json_file, 'r', encoding='utf-8') as f:
                scraped_data = json.load(f)
            
            documents = []
            for page in scraped_data:
                # Create document from page content
                doc = Document(
                    page_content=page["content"],
                    metadata={
                        "url": page["url"],
                        "title": page["title"],
                        "content_length": page["content_length"],
                        "scraped_at": page["scraped_at"]
                    }
                )
                documents.append(doc)
            
            print(f"✅ Loaded {len(documents)} documents from {json_file}")
            return documents
            
        except FileNotFoundError:
            print(f"❌ File {json_file} not found. Please run the scraper first.")
            return []
        except Exception as e:
            print(f"❌ Error loading documents: {e}")
            return []
    
    def create_vectorstore(self, documents: List[Document]) -> None:
        """Create FAISS vector store from documents"""
        if not documents:
            print("❌ No documents to create vector store")
            return
        
        print("🔄 Creating vector store...")
        
        # Split documents into chunks
        split_docs = self.text_splitter.split_documents(documents)
        print(f"📄 Split into {len(split_docs)} chunks")
        
        # Create vector store
        self.vectorstore = FAISS.from_documents(split_docs, self.embeddings)
        
        # Save vector store
        self.vectorstore.save_local(self.config["vector_store_path"])
        print(f"💾 Vector store saved to {self.config['vector_store_path']}")
    
    def load_vectorstore(self) -> bool:
        """Load existing vector store from disk"""
        try:
            self.vectorstore = FAISS.load_local(
                self.config["vector_store_path"], 
                self.embeddings,
                allow_dangerous_deserialization=True
            )
            print(f"✅ Loaded existing vector store from {self.config['vector_store_path']}")
            return True
        except Exception as e:
            print(f"❌ Error loading vector store: {e}")
            return False
    
    def search_relevant_docs(self, query: str, k: int = 5) -> List[Document]:
        """Search for relevant documents using similarity search"""
        if not self.vectorstore:
            print("❌ Vector store not initialized")
            return []
        
        try:
            # Perform similarity search
            docs = self.vectorstore.similarity_search(query, k=k)
            
            # Filter by similarity threshold if needed
            # Note: FAISS doesn't return scores by default, but we can add that if needed
            
            print(f"🔍 Found {len(docs)} relevant documents for query: '{query}'")
            return docs
            
        except Exception as e:
            print(f"❌ Error searching documents: {e}")
            return []
    
    def retrieve_context(self, query: str, max_docs: int = 5) -> str:
        """Retrieve and format context for the query"""
        relevant_docs = self.search_relevant_docs(query, max_docs)
        
        if not relevant_docs:
            return "No relevant information found in Noveum documentation."
        
        context_parts = []
        for i, doc in enumerate(relevant_docs, 1):
            context_parts.append(f"Source {i} ({doc.metadata.get('url', 'Unknown URL')}):\n{doc.page_content[:500]}...")
        
        return "\n\n".join(context_parts)
    
    def generate_rag_response(self, query: str) -> Dict[str, Any]:
        """Generate response using RAG"""
        with trace_agent(
            agent_type="rag_agent",
            operation="llm-rag",
            capabilities=["document_retrieval", "context_generation", "response_generation"],
            attributes={
                "agent.id": "noveum_rag_agent",
                "input_query": query,
                "query_length": len(query)
            }
        ) as rag_span:
            
            # Retrieve relevant context
            context = self.retrieve_context(query, CONFIG["max_search_results"])
            
            # Create prompt for RAG
            rag_prompt = f"""You are a helpful assistant for Noveum.ai. Answer the user's question based on the provided context from Noveum's documentation.

Context from Noveum documentation:
{context}

User Question: {query}

Instructions:
1. Answer based primarily on the provided context
2. If the context doesn't contain enough information, say so clearly
3. Be specific and cite sources when possible
4. Keep responses concise but informative
5. If the question is not related to Noveum, politely redirect to ask about Noveum

Answer:"""

            # Extract model parameters and metadata
            model_name = getattr(self.llm, 'model_name', 'unknown')
            model_temperature = getattr(self.llm, 'temperature', 0.0)
            model_max_tokens = getattr(self.llm, 'max_tokens', None)
            model_top_p = getattr(self.llm, 'top_p', None)
            model_frequency_penalty = getattr(self.llm, 'frequency_penalty', None)
            model_presence_penalty = getattr(self.llm, 'presence_penalty', None)
            
            # Model Details Span - Track model-specific information
            with trace_agent(
                agent_type="model_details",
                operation="llm_model_execution",
                capabilities=["model_invocation", "parameter_tracking", "latency_measurement"],
                attributes={
                    "agent.id": "noveum_model_details",
                    "input_query": f"Model execution for query: {query[:100]}{'...' if len(query) > 100 else ''}",
                    "query_length": len(query)
                }
            ) as model_span:
                
                # Record start time for latency measurement
                model_start_time = time.time()
                
                # Generate response
                response = self.llm.invoke(rag_prompt)
                
                # Record end time and calculate latency
                model_end_time = time.time()
                model_latency = model_end_time - model_start_time

                if response.content:
                    answer = response.content
                else:
                    answer = str(response)

                # Extract token usage metadata - Enhanced extraction
                prompt_tokens = 0
                completion_tokens = 0
                total_tokens = 0
                
                # Try multiple ways to extract token usage
                if hasattr(response, 'usage_metadata') and response.usage_metadata:
                    usage = response.usage_metadata
                    prompt_tokens = getattr(usage, "input_tokens", 0) or getattr(usage, "prompt_tokens", 0)
                    completion_tokens = getattr(usage, "output_tokens", 0) or getattr(usage, "completion_tokens", 0)
                    total_tokens = getattr(usage, "total_tokens", 0)
                elif hasattr(response, 'response_metadata') and response.response_metadata:
                    metadata = response.response_metadata
                    if 'token_usage' in metadata:
                        token_usage = metadata['token_usage']
                        prompt_tokens = token_usage.get('prompt_tokens', 0)
                        completion_tokens = token_usage.get('completion_tokens', 0)
                        total_tokens = token_usage.get('total_tokens', 0)
                elif hasattr(response, 'token_usage'):
                    token_usage = response.token_usage
                    prompt_tokens = getattr(token_usage, "prompt_tokens", 0)
                    completion_tokens = getattr(token_usage, "completion_tokens", 0)
                    total_tokens = getattr(token_usage, "total_tokens", 0)
                
                # If still no tokens found, try to estimate from content length
                if total_tokens == 0:
                    # Rough estimation: ~4 characters per token for English text
                    estimated_prompt_tokens = len(rag_prompt) // 4
                    estimated_completion_tokens = len(answer) // 4
                    prompt_tokens = estimated_prompt_tokens
                    completion_tokens = estimated_completion_tokens
                    total_tokens = prompt_tokens + completion_tokens

                # Set model details span attributes
                model_span.set_attributes({
                    # Input metrics
                    "input_query": f"Model execution for query: {query[:100]}{'...' if len(query) > 100 else ''}",
                    "query_length": len(query),
                    "query_type": "rag_model_query",
                    
                    # Model parameters and configuration
                    "model.name": model_name,
                    "model.temperature": model_temperature,
                    "model.max_tokens": model_max_tokens,
                    "model.top_p": model_top_p,
                    "model.frequency_penalty": model_frequency_penalty,
                    "model.presence_penalty": model_presence_penalty,
                    "model.provider": "openai",
                    "model.type": "chat_completion",
                    "model.version": "gpt-4o-mini",
                    
                    # Latency and performance metrics
                    "model.latency_seconds": model_latency,
                    "model.latency_ms": model_latency * 1000,
                    "model.start_time": model_start_time,
                    "model.end_time": model_end_time,
                    "model.performance_tier": "fast" if model_latency < 2.0 else "medium" if model_latency < 5.0 else "slow",
                    
                    # Token usage and cost metrics
                    "model.prompt_tokens": prompt_tokens,
                    "model.completion_tokens": completion_tokens,
                    "model.total_tokens": total_tokens,
                    "model.tokens_per_second": total_tokens / model_latency if model_latency > 0 else 0,
                    "model.estimated_cost": total_tokens * 0.00003,  # Rough cost estimate
                    "model.efficiency_score": len(answer) / total_tokens if total_tokens > 0 else 0,
                    
                    # Response characteristics
                    "model.response_length": len(answer),
                    "model.response_quality": "high" if len(answer) > 200 else "medium" if len(answer) > 100 else "low",
                    "model.output_response": f"Model Response: {answer[:200]}{'...' if len(answer) > 200 else ''}",
                    
                    # Model configuration details
                    "model.config": {
                        "name": model_name,
                        "temperature": model_temperature,
                        "max_tokens": model_max_tokens,
                        "top_p": model_top_p,
                        "frequency_penalty": model_frequency_penalty,
                        "presence_penalty": model_presence_penalty,
                        "provider": "openai",
                        "type": "chat_completion"
                    }
                })

            # Other Details Span - Track retrieval, response quality, and evaluation metrics
            with trace_agent(
                agent_type="other_details",
                operation="rag_evaluation_metrics",
                capabilities=["retrieval_analysis", "response_evaluation", "quality_assessment"],
                attributes={
                    "agent.id": "noveum_other_details",
                    "input_query": f"Evaluation for query: {query[:100]}{'...' if len(query) > 100 else ''}",
                    "query_length": len(query)
                }
            ) as rag_node:
                
                # Calculate additional evaluation metrics
                context_length = len(context)
                answer_length = len(answer)
                sources_count = len(context.split("Source")) - 1 if "Source" in context else 0
                
                # Set other details span attributes
                rag_node.set_attributes({
                    # Input metrics
                    "input_query": f"Evaluation for query: {query[:100]}{'...' if len(query) > 100 else ''}",
                    "query_length": len(query),
                    "query_type": "rag_evaluation_query",
                    
                    # Retrieval metrics
                    "retrieval.context_retrieved": f"Context: {context[:300]}{'...' if len(context) > 300 else ''}",
                    "retrieval.context_length": context_length,
                    "retrieval.sources_count": sources_count,
                    "retrieval.context_quality": "high" if context_length > 500 else "medium" if context_length > 200 else "low",
                    "retrieval.effectiveness": sources_count / 5.0,  # Normalized to max expected sources
                    "retrieval.context_utilization": context_length / 1000.0,  # Normalized context usage
                    
                    # Prompt engineering metrics
                    "prompt.complete_prompt": rag_prompt,
                    "prompt.prompt_length": len(rag_prompt),
                    "prompt.context_injection": f"Context injected: {context[:200]}{'...' if len(context) > 200 else ''}",
                    "prompt.instruction_following": "rag_optimized",
                    
                    # Response quality metrics
                    "response.answer_length": answer_length,
                    "response.answer_completeness": "complete" if answer_length > 100 else "brief",
                    "response.response_quality": "high" if answer_length > 200 and sources_count > 2 else "medium" if answer_length > 100 else "low",
                    "response.source_citation": sources_count,
                    "response.context_utilization": answer_length / context_length if context_length > 0 else 0,
                    "output_response": f"RAG Answer: {answer[:200]}{'...' if len(answer) > 200 else ''}",
                    
                    # Evaluation metrics
                    "evaluation.retrieval_effectiveness": sources_count / 5.0,
                    "evaluation.response_completeness": "complete" if answer_length > 150 else "partial",
                    "evaluation.source_diversity": sources_count,
                    "evaluation.context_relevance": "high" if context_length > 500 else "medium" if context_length > 200 else "low",
                    "evaluation.overall_quality": "high" if answer_length > 200 and sources_count > 2 and context_length > 500 else "medium" if answer_length > 100 and sources_count > 1 else "low",
                    "evaluation.ready_for_production": True,
                    
                    # RAG-specific metrics
                    "rag.retrieval_strategy": "semantic_similarity",
                    "rag.vector_search_results": sources_count,
                    "rag.context_synthesis": "multi_source" if sources_count > 1 else "single_source",
                    "rag.document_coverage": sources_count / 5.0,  # Normalized coverage
                    "rag.information_density": answer_length / context_length if context_length > 0 else 0
                })

            # Set main RAG span attributes (simplified)
            rag_span.set_attributes({
                "input_query": query,
                "query_length": len(query),
                "query_type": "rag_query",
                "output_response": f"RAG Answer: {answer[:200]}{'...' if len(answer) > 200 else ''}",
                "rag.context_length": context_length,
                "rag.sources_count": sources_count,
                "rag.answer_length": answer_length,
                "rag.mode": "retrieval_augmented_generation"
            })

            return {
                "answer": answer,
                "context": context,
                "mode": "RAG",
                "sources": [doc.metadata.get('url', 'Unknown') for doc in self.search_relevant_docs(query, CONFIG["max_search_results"])],
                "model_info": {
                    "name": model_name,
                    "tokens_used": total_tokens,
                    "prompt_tokens": prompt_tokens,
                    "completion_tokens": completion_tokens,
                    "latency": model_latency
                }
            }

# Initialize RAG system
rag_system = NoveumRAGSystem(embeddings, llm, CONFIG)
print("✅ RAG system initialized!")


✅ RAG system initialized!


In [52]:
# Cell 5: Web Search Integration - DuckDuckGo search for external queries
class NoveumWebSearchSystem:
    def __init__(self, web_search_tool, llm, config):
        self.web_search = web_search_tool
        self.llm = llm
        self.config = config
    
    def search_web(self, query: str, max_results: int = 5) -> List[Dict[str, Any]]:
        """Perform web search and return formatted results"""
        try:
            # Perform web search
            search_results = self.web_search.run(query)
            
            # Parse results (DuckDuckGo returns a string, need to parse it)
            results = []
            if isinstance(search_results, str):
                # Split by lines and parse each result
                lines = search_results.split('\n')
                for i, line in enumerate(lines[:max_results]):
                    if line.strip():
                        results.append({
                            "title": f"Search Result {i+1}",
                            "snippet": line.strip(),
                            "url": f"https://duckduckgo.com/?q={query.replace(' ', '+')}"
                        })
            else:
                # If it's already a list/dict format
                results = search_results[:max_results]
            
            print(f"🔍 Found {len(results)} web search results for: '{query}'")
            return results
            
        except Exception as e:
            print(f"❌ Error performing web search: {e}")
            return []
    
    def format_search_context(self, search_results: List[Dict[str, Any]]) -> str:
        """Format search results into context string"""
        if not search_results:
            return "No search results found."
        
        context_parts = []
        for i, result in enumerate(search_results, 1):
            title = result.get('title', f'Result {i}')
            snippet = result.get('snippet', 'No description available')
            url = result.get('url', 'No URL available')
            
            context_parts.append(f"Source {i} - {title}:\n{snippet}\nURL: {url}")
        
        return "\n\n".join(context_parts)
    
    def generate_web_response(self, query: str) -> Dict[str, Any]:
        """Generate response using web search"""
        with trace_agent(
            agent_type="web_search_agent",
            operation="web_search_generation",
            capabilities=["web_search", "content_synthesis", "response_generation"],
            attributes={
                "agent.id": "noveum_web_search_agent",
                "input_query": query,
                "query_length": len(query)
            }
        ) as web_span:
            
            # Perform web search
            search_results = self.search_web(query, self.config["max_search_results"])
            
            # Format context
            context = self.format_search_context(search_results)
            
            # Create prompt for web search response
            web_prompt = f"""You are a helpful assistant. Answer the user's question based on the provided web search results.

Web Search Results:
{context}

User Question: {query}

Instructions:
1. Answer based on the provided web search results
2. Synthesize information from multiple sources when relevant
3. Be informative and accurate
4. If the results don't contain enough information, say so clearly
5. Keep responses concise but comprehensive
6. Cite sources when possible

Answer:"""

            # Extract model parameters and metadata
            model_name = getattr(self.llm, 'model_name', 'unknown')
            model_temperature = getattr(self.llm, 'temperature', 0.0)
            model_max_tokens = getattr(self.llm, 'max_tokens', None)
            model_top_p = getattr(self.llm, 'top_p', None)
            model_frequency_penalty = getattr(self.llm, 'frequency_penalty', None)
            model_presence_penalty = getattr(self.llm, 'presence_penalty', None)

            # Model Details Span - Track model-specific information
            with trace_agent(
                agent_type="model_details",
                operation="llm_model_execution",
                capabilities=["model_invocation", "parameter_tracking", "latency_measurement"],
                attributes={
                    "agent.id": "noveum_model_details",
                    "input_query": f"Model execution for query: {query[:100]}{'...' if len(query) > 100 else ''}",
                    "query_length": len(query)
                }
            ) as model_span:
                
                # Record start time for latency measurement
                model_start_time = time.time()
                
                # Generate response
                response = self.llm.invoke(web_prompt)
                
                # Record end time and calculate latency
                model_end_time = time.time()
                model_latency = model_end_time - model_start_time

                # Handle response content extraction
                if hasattr(response, 'content'):
                    # When response is a proper SDK object
                    answer = response.content
                elif isinstance(response, dict):
                    # When response is returned as a plain dict
                    answer = response.get('content', '')
                else:
                    # Fallback to string
                    answer = str(response)

                # Extract token usage metadata - Enhanced extraction
                prompt_tokens = 0
                completion_tokens = 0
                total_tokens = 0
                
                # Try multiple ways to extract token usage
                if hasattr(response, 'usage_metadata') and response.usage_metadata:
                    usage = response.usage_metadata
                    prompt_tokens = getattr(usage, "input_tokens", 0) or getattr(usage, "prompt_tokens", 0)
                    completion_tokens = getattr(usage, "output_tokens", 0) or getattr(usage, "completion_tokens", 0)
                    total_tokens = getattr(usage, "total_tokens", 0)
                elif hasattr(response, 'response_metadata') and response.response_metadata:
                    metadata = response.response_metadata
                    if 'token_usage' in metadata:
                        token_usage = metadata['token_usage']
                        prompt_tokens = token_usage.get('prompt_tokens', 0)
                        completion_tokens = token_usage.get('completion_tokens', 0)
                        total_tokens = token_usage.get('total_tokens', 0)
                elif hasattr(response, 'token_usage'):
                    token_usage = response.token_usage
                    prompt_tokens = getattr(token_usage, "prompt_tokens", 0)
                    completion_tokens = getattr(token_usage, "completion_tokens", 0)
                    total_tokens = getattr(token_usage, "total_tokens", 0)
                
                # If still no tokens found, try to estimate from content length
                if total_tokens == 0:
                    # Rough estimation: ~4 characters per token for English text
                    estimated_prompt_tokens = len(web_prompt) // 4
                    estimated_completion_tokens = len(answer) // 4
                    prompt_tokens = estimated_prompt_tokens
                    completion_tokens = estimated_completion_tokens
                    total_tokens = prompt_tokens + completion_tokens

                # Set model details span attributes
                model_span.set_attributes({
                    # Input metrics
                    "input_query": f"Model execution for query: {query[:100]}{'...' if len(query) > 100 else ''}",
                    "query_length": len(query),
                    "query_type": "web_search_model_query",
                    
                    # Model parameters and configuration
                    "model.name": model_name,
                    "model.temperature": model_temperature,
                    "model.max_tokens": model_max_tokens,
                    "model.top_p": model_top_p,
                    "model.frequency_penalty": model_frequency_penalty,
                    "model.presence_penalty": model_presence_penalty,
                    "model.provider": "openai",
                    "model.type": "chat_completion",
                    "model.version": "gpt-4o-mini",
                    
                    # Latency and performance metrics
                    "model.latency_seconds": model_latency,
                    "model.latency_ms": model_latency * 1000,
                    "model.start_time": model_start_time,
                    "model.end_time": model_end_time,
                    "model.performance_tier": "fast" if model_latency < 2.0 else "medium" if model_latency < 5.0 else "slow",
                    
                    # Token usage and cost metrics
                    "model.prompt_tokens": prompt_tokens,
                    "model.completion_tokens": completion_tokens,
                    "model.total_tokens": total_tokens,
                    "model.tokens_per_second": total_tokens / model_latency if model_latency > 0 else 0,
                    "model.estimated_cost": total_tokens * 0.00003,  # Rough cost estimate
                    "model.efficiency_score": len(answer) / total_tokens if total_tokens > 0 else 0,
                    
                    # Response characteristics
                    "model.response_length": len(answer),
                    "model.response_quality": "high" if len(answer) > 200 else "medium" if len(answer) > 100 else "low",
                    "model.output_response": f"Model Response: {answer[:200]}{'...' if len(answer) > 200 else ''}",
                    
                    # Model configuration details
                    "model.config": {
                        "name": model_name,
                        "temperature": model_temperature,
                        "max_tokens": model_max_tokens,
                        "top_p": model_top_p,
                        "frequency_penalty": model_frequency_penalty,
                        "presence_penalty": model_presence_penalty,
                        "provider": "openai",
                        "type": "chat_completion"
                    }
                })

            # Other Details Span - Track web search, response quality, and evaluation metrics
            with trace_agent(
                agent_type="other_details",
                operation="web_search_evaluation_metrics",
                capabilities=["web_search_analysis", "response_evaluation", "quality_assessment"],
                attributes={
                    "agent.id": "noveum_other_details",
                    "input_query": f"Evaluation for query: {query[:100]}{'...' if len(query) > 100 else ''}",
                    "query_length": len(query)
                }
            ) as other_span:
                
                # Calculate additional evaluation metrics
                search_results_count = len(search_results)
                context_length = len(context)
                answer_length = len(answer or "")
                
                # Set other details span attributes
                other_span.set_attributes({
                    # Input metrics
                    "input_query": f"Evaluation for query: {query[:100]}{'...' if len(query) > 100 else ''}",
                    "query_length": len(query),
                    "query_type": "web_search_evaluation_query",
                    
                    # Web search metrics
                    "web_search.results_count": search_results_count,
                    "web_search.context_length": context_length,
                    "web_search.context_synthesized": f"Context from {search_results_count} web sources",
                    "web_search.search_effectiveness": search_results_count / 5.0,  # Normalized to max expected results
                    "web_search.context_quality": "high" if context_length > 800 else "medium" if context_length > 400 else "low",
                    "web_search.source_diversity": search_results_count,
                    "web_search.information_synthesis": "high" if search_results_count > 3 and answer_length > 200 else "medium" if search_results_count > 1 else "low",
                    "web_search.external_knowledge_utilization": context_length / 1000.0,  # Normalized context usage
                    
                    # Prompt engineering metrics
                    "prompt.complete_prompt": web_prompt,
                    "prompt.prompt_length": len(web_prompt),
                    "prompt.context_injection": f"Web context injected: {context[:200]}{'...' if len(context) > 200 else ''}",
                    "prompt.instruction_following": "web_search_optimized",
                    
                    # Response quality metrics
                    "response.answer_length": answer_length,
                    "response.answer_completeness": "complete" if answer_length > 150 else "brief",
                    "response.response_quality": "high" if answer_length > 300 and search_results_count > 3 else "medium" if answer_length > 150 else "low",
                    "response.source_citation": search_results_count,
                    "response.context_utilization": answer_length / context_length if context_length > 0 else 0,
                    "output_response": f"Web Search Answer: {answer[:200]}{'...' if len(answer or '') > 200 else ''}" if answer else "No answer generated",
                    
                    # Evaluation metrics
                    "evaluation.search_effectiveness": search_results_count / 5.0,
                    "evaluation.response_completeness": "complete" if answer_length > 200 else "partial",
                    "evaluation.source_diversity": search_results_count,
                    "evaluation.context_relevance": "high" if context_length > 800 else "medium" if context_length > 400 else "low",
                    "evaluation.overall_quality": "high" if answer_length > 300 and search_results_count > 3 and context_length > 800 else "medium" if answer_length > 150 and search_results_count > 1 else "low",
                    "evaluation.ready_for_production": True,
                    
                    # Web search specific metrics
                    "web_search.search_strategy": "duckduckgo_api",
                    "web_search.real_time_data": True,
                    "web_search.external_sources": search_results_count,
                    "web_search.information_freshness": "current",
                    "web_search.knowledge_synthesis": "multi_source" if search_results_count > 1 else "single_source"
                })

            # Set main Web Search span attributes (simplified)
            web_span.set_attributes({
                "input_query": query,
                "query_length": len(query),
                "query_type": "web_search_query",
                "output_response": f"Web Search Answer: {answer[:200]}{'...' if len(answer or '') > 200 else ''}" if answer else "No answer generated",
                "web_search.results_count": search_results_count,
                "web_search.context_length": context_length,
                "web_search.response_length": answer_length,
                "web_search.mode": "external_web_search"
            })

            return {
                "answer": answer,
                "context": context,
                "mode": "Web Search",
                "sources": [result.get('url', 'Unknown') for result in search_results],
                "model_info": {
                    "name": model_name,
                    "tokens_used": total_tokens,
                    "prompt_tokens": prompt_tokens,
                    "completion_tokens": completion_tokens,
                    "latency": model_latency
                }
            }

# Initialize web search system
web_search_system = NoveumWebSearchSystem(web_search, llm, CONFIG)
print("✅ Web search system initialized!")


✅ Web search system initialized!


In [53]:
# Cell 6: Query Router - Intelligent decision making between RAG and Web Search
class NoveumQueryRouter:
    def __init__(self, llm, config):
        self.llm = llm
        self.config = config
        
        # Keywords that suggest RAG should be used
        self.rag_keywords = [
            "noveum", "platform", "product", "feature", "api", "documentation",
            "trace", "observability", "monitoring", "agent", "system", "tool",
            "integration", "setup", "configuration", "usage", "guide", "tutorial",
            "pricing", "plan", "subscription", "account", "dashboard", "metrics"
        ]
        
        # Keywords that suggest Web Search should be used
        self.web_keywords = [
            "recent", "latest", "news", "update", "announcement", "release",
            "today", "yesterday", "this week", "this month", "current",
            "trending", "popular", "viral", "breaking", "live", "real-time",
            "weather", "stock", "price", "market", "cryptocurrency", "bitcoin",
            "election", "politics", "sports", "entertainment", "celebrity"
        ]
    
    def classify_query(self, query: str) -> str:
        """Classify query to determine whether to use RAG or Web Search"""
        query_lower = query.lower()
        
        # Check for RAG keywords
        rag_score = sum(1 for keyword in self.rag_keywords if keyword in query_lower)
        
        # Check for Web Search keywords
        web_score = sum(1 for keyword in self.web_keywords if keyword in query_lower)
        
        # Check for explicit mentions of Noveum
        if "noveum" in query_lower:
            return "RAG"
        
        # If both scores are 0, use LLM-based classification
        if rag_score == 0 and web_score == 0:
            return self._llm_classify_query(query)
        
        # Return the mode with higher score
        return "RAG" if rag_score >= web_score else "Web Search"
    
    def _llm_classify_query(self, query: str) -> str:
        """Use LLM to classify query when keyword matching is inconclusive"""
        try:
            classification_prompt = f"""Classify the following user query to determine the best response method:

Query: "{query}"

Choose between:
- RAG: Use when the query is about Noveum.ai platform, products, features, documentation, or internal information
- Web Search: Use when the query is about recent events, news, general knowledge, or external topics

Respond with only "RAG" or "Web Search"."""

            # Extract model parameters for tracking
            model_name = getattr(self.llm, 'model_name', 'unknown')
            model_temperature = getattr(self.llm, 'temperature', 0.0)
            model_max_tokens = getattr(self.llm, 'max_tokens', None)
            model_top_p = getattr(self.llm, 'top_p', None)
            model_frequency_penalty = getattr(self.llm, 'frequency_penalty', None)
            model_presence_penalty = getattr(self.llm, 'presence_penalty', None)

            # Model Details Span for classification
            with trace_agent(
                agent_type="model_details",
                operation="llm_model_execution",
                capabilities=["model_invocation", "parameter_tracking", "latency_measurement"],
                attributes={
                    "agent.id": "noveum_model_details",
                    "input_query": f"Model execution for classification: {query[:100]}{'...' if len(query) > 100 else ''}",
                    "query_length": len(query)
                }
            ) as model_span:
                
                # Record start time for latency measurement
                model_start_time = time.time()
                
                response = self.llm.invoke(classification_prompt)
                
                # Record end time and calculate latency
                model_end_time = time.time()
                model_latency = model_end_time - model_start_time
            
                if hasattr(response, 'content'):
                    result = response.content.strip().upper()
                else:
                    result = str(response).strip().upper()

                # Extract token usage for classification
                prompt_tokens = 0
                completion_tokens = 0
                total_tokens = 0
                
                if hasattr(response, 'usage_metadata') and response.usage_metadata:
                    usage = response.usage_metadata
                    prompt_tokens = getattr(usage, "input_tokens", 0) or getattr(usage, "prompt_tokens", 0)
                    completion_tokens = getattr(usage, "output_tokens", 0) or getattr(usage, "completion_tokens", 0)
                    total_tokens = getattr(usage, "total_tokens", 0)
                elif hasattr(response, 'response_metadata') and response.response_metadata:
                    metadata = response.response_metadata
                    if 'token_usage' in metadata:
                        token_usage = metadata['token_usage']
                        prompt_tokens = token_usage.get('prompt_tokens', 0)
                        completion_tokens = token_usage.get('completion_tokens', 0)
                        total_tokens = token_usage.get('total_tokens', 0)
                
                # If still no tokens found, estimate
                if total_tokens == 0:
                    estimated_prompt_tokens = len(classification_prompt) // 4
                    estimated_completion_tokens = len(result) // 4
                    prompt_tokens = estimated_prompt_tokens
                    completion_tokens = estimated_completion_tokens
                    total_tokens = prompt_tokens + completion_tokens

                # Set model details span attributes
                model_span.set_attributes({
                    # Input metrics
                    "input_query": f"Model execution for classification: {query[:100]}{'...' if len(query) > 100 else ''}",
                    "query_length": len(query),
                    "query_type": "classification_model_query",
                    
                    # Model parameters and configuration
                    "model.name": model_name,
                    "model.temperature": model_temperature,
                    "model.max_tokens": model_max_tokens,
                    "model.top_p": model_top_p,
                    "model.frequency_penalty": model_frequency_penalty,
                    "model.presence_penalty": model_presence_penalty,
                    "model.provider": "openai",
                    "model.type": "chat_completion",
                    "model.version": "gpt-4o-mini",
                    
                    # Latency and performance metrics
                    "model.latency_seconds": model_latency,
                    "model.latency_ms": model_latency * 1000,
                    "model.start_time": model_start_time,
                    "model.end_time": model_end_time,
                    "model.performance_tier": "fast" if model_latency < 1.0 else "medium" if model_latency < 3.0 else "slow",
                    
                    # Token usage and cost metrics
                    "model.prompt_tokens": prompt_tokens,
                    "model.completion_tokens": completion_tokens,
                    "model.total_tokens": total_tokens,
                    "model.tokens_per_second": total_tokens / model_latency if model_latency > 0 else 0,
                    "model.estimated_cost": total_tokens * 0.00003,  # Rough cost estimate
                    "model.efficiency_score": len(result) / total_tokens if total_tokens > 0 else 0,
                    
                    # Response characteristics
                    "model.response_length": len(result),
                    "model.response_quality": "high" if len(result) > 10 else "medium" if len(result) > 5 else "low",
                    "model.output_response": f"Classification Result: {result}",
                    
                    # Model configuration details
                    "model.config": {
                        "name": model_name,
                        "temperature": model_temperature,
                        "max_tokens": model_max_tokens,
                        "top_p": model_top_p,
                        "frequency_penalty": model_frequency_penalty,
                        "presence_penalty": model_presence_penalty,
                        "provider": "openai",
                        "type": "chat_completion"
                    }
                })

            # Log classification details for debugging
            print(f"🔍 LLM Classification - Model: {model_name}, Tokens: {total_tokens}, Result: {result}")
            
            if "RAG" in result:
                return "RAG"
            elif "WEB" in result or "SEARCH" in result:
                return "Web Search"
            else:
                # Default to Web Search if unclear
                return "Web Search"
                
        except Exception as e:
            print(f"❌ Error in LLM classification: {e}")
            # Default to Web Search on error
            return "Web Search"
    
    def route_query(self, query: str) -> Tuple[str, Dict[str, Any]]:
        """Route query to appropriate system and return response"""
        with trace_agent(
            agent_type="query_router",
            operation="query_routing",
            capabilities=["query_classification", "routing_decision"],
            attributes={
                "agent.id": "noveum_query_router",
                "input_query": query,
                "query_length": len(query)
            }
        ) as router_span:
            
            # Define classification prompt for tracing
            classification_prompt = f"""Classify the following user query to determine the best response method:

Query: "{query}"

Choose between:
- RAG: Use when the query is about Noveum.ai platform, products, features, documentation, or internal information
- Web Search: Use when the query is about recent events, news, general knowledge, or external topics

Respond with only "RAG" or "Web Search"."""
            
            # Classify the query
            mode = self.classify_query(query)
            
            # Calculate routing evaluation metrics
            query_lower = query.lower()
            rag_keywords = ["noveum", "platform", "product", "feature", "api", "documentation", "trace", "observability", "monitoring", "agent", "system", "tool", "integration", "setup", "configuration", "usage", "guide", "tutorial", "pricing", "plan", "subscription", "account", "dashboard", "metrics"]
            web_keywords = ["recent", "latest", "news", "update", "announcement", "release", "today", "yesterday", "this week", "this month", "current", "trending", "popular", "viral", "breaking", "live", "real-time", "weather", "stock", "price", "market", "cryptocurrency", "bitcoin", "election", "politics", "sports", "entertainment", "celebrity"]
            
            rag_score = sum(1 for keyword in rag_keywords if keyword in query_lower)
            web_score = sum(1 for keyword in web_keywords if keyword in query_lower)
            confidence_score = abs(rag_score - web_score) / max(rag_score + web_score, 1)
            
            # Other Details Span - Track routing analysis and decision metrics
            with trace_agent(
                agent_type="other_details",
                operation="routing_evaluation_metrics",
                capabilities=["routing_analysis", "decision_evaluation", "quality_assessment"],
                attributes={
                    "agent.id": "noveum_other_details",
                    "input_query": f"Routing evaluation for query: {query[:100]}{'...' if len(query) > 100 else ''}",
                    "query_length": len(query)
                }
            ) as other_span:
                
                # Set other details span attributes
                other_span.set_attributes({
                    # Input metrics
                    "input_query": f"Routing evaluation for query: {query[:100]}{'...' if len(query) > 100 else ''}",
                    "query_length": len(query),
                    "query_type": "routing_evaluation_query",
                    
                    # Classification metrics
                    "classification.mode": mode,
                    "classification.rag_keyword_score": rag_score,
                    "classification.web_keyword_score": web_score,
                    "classification.confidence_score": confidence_score,
                    "classification.confidence_level": "high" if confidence_score > 0.5 else "medium" if confidence_score > 0.2 else "low",
                    "classification.method": "llm_based" if rag_score == 0 and web_score == 0 else "keyword_based",
                    
                    # Query analysis metrics
                    "query.complexity": "complex" if len(query) > 50 else "medium" if len(query) > 20 else "simple",
                    "query.intent": "noveum_specific" if "noveum" in query_lower else "general_knowledge" if web_score > rag_score else "documentation",
                    "query.keyword_density": (rag_score + web_score) / len(query.split()),
                    "query.domain_affinity": "noveum" if rag_score > web_score else "general" if web_score > rag_score else "neutral",
                    
                    # Routing decision metrics
                    "routing.decision": f"Routed to {mode} based on analysis",
                    "routing.rationale": f"RAG score: {rag_score}, Web score: {web_score}, Confidence: {confidence_score:.2f}",
                    "routing.expected_performance": "high" if confidence_score > 0.5 else "medium" if confidence_score > 0.2 else "low",
                    "routing.alternative_mode": "Web Search" if mode == "RAG" else "RAG",
                    "routing.decision_confidence": confidence_score,
                    
                    # Evaluation metrics
                    "evaluation.routing_accuracy": "high" if confidence_score > 0.5 else "medium" if confidence_score > 0.2 else "low",
                    "evaluation.keyword_coverage": (rag_score + web_score) / len(rag_keywords + web_keywords),
                    "evaluation.query_understanding": "clear" if confidence_score > 0.5 else "ambiguous" if confidence_score > 0.2 else "unclear",
                    "evaluation.ready_for_production": True,
                    
                    # Router-specific metrics
                    "router.classification_strategy": "hybrid_keyword_llm",
                    "router.keyword_matching": "used" if rag_score > 0 or web_score > 0 else "bypassed",
                    "router.llm_fallback": "used" if rag_score == 0 and web_score == 0 else "not_needed",
                    "router.decision_time": "instant" if rag_score > 0 or web_score > 0 else "llm_required",
                    "output_response": f"Routing Decision: {mode} (Confidence: {confidence_score:.2f})"
                })

            # Set main router span attributes (simplified)
            router_span.set_attributes({
                "input_query": query,
                "query_length": len(query),
                "query_type": "routing_query",
                "output_response": f"Routed to {mode} for query processing",
                "router.classification": mode,
                "router.confidence_score": confidence_score,
                "router.rag_keyword_score": rag_score,
                "router.web_keyword_score": web_score,
                "router.mode": "intelligent_routing"
            })
            
            # Route to appropriate system
            if mode == "RAG":
                print(f"🧠 Routing to RAG system for: '{query}'")
                response = rag_system.generate_rag_response(query)
            else:
                print(f"🌐 Routing to Web Search for: '{query}'")
                response = web_search_system.generate_web_response(query)
            
            return mode, response

# Initialize query router
query_router = NoveumQueryRouter(llm, CONFIG)
print("✅ Query router initialized!")


✅ Query router initialized!


In [54]:
# Cell 7: Main Executor - Orchestrates the complete agent workflow
class NoveumAIAgent:
    def __init__(self, scraper, rag_system, web_search_system, query_router, config):
        self.scraper = scraper
        self.rag_system = rag_system
        self.web_search_system = web_search_system
        self.query_router = query_router
        self.config = config
        self.is_initialized = False
    
    def initialize_system(self, force_scrape: bool = False) -> bool:
        """Initialize the system by setting up RAG with scraped data"""
        print("🚀 Initializing Noveum AI Agent...")
        
        with trace_operation("system_initialization") as init_span:
            init_span.set_attributes({
                "system.force_scrape": force_scrape,
                "system.config": self.config,
                "input_query": f"Initialize system with force_scrape={force_scrape}",
                "output_response": "System initialization: RAG system loaded, vector store ready, agent operational"
            })
            
            # Check if we need to scrape or if data already exists
            if force_scrape or not os.path.exists(self.config["noveum_docs_file"]):
                print("📥 Scraping Noveum website...")
                
                # Scrape the website
                scraped_data = self.scraper.scrape_website()
                
                if not scraped_data:
                    print("❌ Failed to scrape website data")
                    return False
                
                # Save scraped data
                self.scraper.save_to_json(self.config["noveum_docs_file"])
                
                init_span.add_event("website_scraped", {
                    "input_query": f"Scrape website: {self.config['noveum_base_url']}",
                    "output_response": f"Website scraping completed: {len(scraped_data)} pages scraped, {sum(page['content_length'] for page in scraped_data)} total characters extracted for RAG system",
                    "pages_scraped": len(scraped_data),
                    "total_content_length": sum(page["content_length"] for page in scraped_data)
                })
            else:
                print("📁 Using existing scraped data...")
            
            # Load documents and create/load vector store
            documents = self.rag_system.load_documents_from_json(self.config["noveum_docs_file"])
            
            if not documents:
                print("❌ Failed to load documents")
                return False
            
            # Try to load existing vector store, create if doesn't exist
            if not self.rag_system.load_vectorstore():
                print("🔄 Creating new vector store...")
                self.rag_system.create_vectorstore(documents)
            
            self.is_initialized = True
            print("✅ Noveum AI Agent initialized successfully!")
            
            init_span.set_attributes({
                "system.initialized": True,
                "system.documents_loaded": len(documents),
                "system.vectorstore_ready": self.rag_system.vectorstore is not None
            })
            
            return True
    
    def process_query(self, query: str) -> Dict[str, Any]:
        """Process a user query and return response"""
        if not self.is_initialized:
            print("❌ System not initialized. Please run initialize_system() first.")
            return {
                "answer": "System not initialized. Please run initialize_system() first.",
                "mode": "Error",
                "sources": [],
                "error": "System not initialized"
            }
        
        print(f"\n🎯 Processing query: '{query}'")
        
        with trace_operation("tool-orchestator") as process_span:
            process_span.set_attributes({
                "input_query": query,
                "query.length": len(query)
            })
            
            start_time = time.time()
            
            try:
                # Route query and get response
                mode, response = self.query_router.route_query(query)
                
                # Add processing metrics
                end_time = time.time()
                processing_time = end_time - start_time
                
                response.update({
                    "processing_time": processing_time,
                    "timestamp": time.time()
                })
                
                # Add metrics to span
                process_span.set_attributes({
                    "processing.mode": mode,
                    "processing.time_seconds": processing_time,
                    "processing.response_length": len(response.get("answer", "")),
                    "processing.sources_count": len(response.get("sources", [])),
                    "output_response": f"Final Answer: {response.get('answer', '')[:200]}{'...' if len(response.get('answer', '')) > 200 else ''}",
                    "final_answer_mode": mode,
                    "query_processed.input_query": query,
                    "query_processed.output_response": f"Successfully processed query using {mode}, generated {len(response.get('answer', ''))} character response",
                    "query_processed.mode": mode,
                    "query_processed.processing_time": processing_time,
                    "query_processed.response_length": len(response.get("answer", ""))
                })
                
                print(f"✅ Query processed in {processing_time:.2f}s using {mode}")
                return response
                
            except Exception as e:
                error_msg = f"Error processing query: {str(e)}"
                print(f"❌ {error_msg}")
                
                process_span.add_event("query_processing_error", {
                    "error": str(e),
                    "input_query": query,
                    "output_response": f"I encountered an error while processing your query: {str(e)}"
                })
                
                return {
                    "answer": f"I encountered an error while processing your query: {str(e)}",
                    "mode": "Error",
                    "sources": [],
                    "error": str(e),
                    "processing_time": time.time() - start_time
                }
    
    def display_response(self, response: Dict[str, Any]) -> None:
        """Display the response in a formatted way"""
        print("\n" + "="*80)
        print(f"🤖 NOVEUM AI AGENT RESPONSE")
        print("="*80)
        print(f"📊 Mode: {response.get('mode', 'Unknown')}")
        print(f"⏱️  Processing Time: {response.get('processing_time', 0):.2f}s")
        print(f"📅 Timestamp: {time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(response.get('timestamp', time.time())))}")
        
        if response.get('sources'):
            print(f"📚 Sources ({len(response['sources'])}):")
            for i, source in enumerate(response['sources'][:3], 1):  # Show first 3 sources
                print(f"   {i}. {source}")
            if len(response['sources']) > 3:
                print(f"   ... and {len(response['sources']) - 3} more")
        
        print("\n💬 Answer:")
        print("-" * 40)
        print(response.get('answer', 'No answer provided'))
        print("="*80)

# Initialize the main agent
noveum_agent = NoveumAIAgent(scraper, rag_system, web_search_system, query_router, CONFIG)
print("✅ Noveum AI Agent initialized!")


✅ Noveum AI Agent initialized!


In [56]:
# Cell 8: Usage Examples and Demo
def demo_noveum_agent():
    """Demo function showing how to use the Noveum AI Agent"""
    
    print("🎬 NOVEUM AI AGENT DEMO")
    print("="*50)
    
    # Step 1: Initialize the system
    print("\n1️⃣ Initializing the system...")
    success = noveum_agent.initialize_system(force_scrape=False)  # Set to True to force re-scraping
    
    if not success:
        print("❌ Failed to initialize system")
        return
    
    # Step 2: Demo queries - 20 comprehensive test questions
    demo_queries = [
        # RAG Queries (Noveum-specific)
        "What is Noveum and what does it do?",  # Basic product info
        "How do I integrate Noveum Trace in my application?",  # Technical integration
        "What are Noveum's pricing plans?",  # Pricing information
        "What features does Noveum Trace offer?",  # Feature overview
        "How do I set up observability with Noveum?",  # Setup guidance
        "What APIs are available in Noveum platform?",  # API documentation
        "How does Noveum handle agent tracing?",  # Technical details
        "What monitoring capabilities does Noveum provide?",  # Capabilities
        "How do I configure Noveum for my system?",  # Configuration
        "What are the benefits of using Noveum Trace?",  # Value proposition
        
        # Web Search Queries (External/Recent information)
        "What are the latest AI news today?",  # Recent news
        "What's the weather like today?",  # Current weather
        "Tell me about recent developments in machine learning",  # Recent developments
        "What are the current trends in observability tools?",  # Industry trends
        "What happened in tech news this week?",  # Weekly tech news
        "What are the latest updates in Python programming?",  # Recent updates
        "What's the current status of cryptocurrency markets?",  # Market information
        "What are the newest features in cloud computing?",  # Recent features
        "What's happening in the software development world today?",  # Current events
        "What are the latest breakthroughs in artificial intelligence?"  # Recent breakthroughs
    ]
    
    print(f"\n2️⃣ Running {len(demo_queries)} demo queries...")
    
    for i, query in enumerate(demo_queries, 1):
        print(f"\n--- Demo Query {i} ---")
        response = noveum_agent.process_query(query)
        noveum_agent.display_response(response)
        
        # Small delay between queries
        time.sleep(1)
    
    print(f"\n🎉 Demo completed! Check Noveum Trace dashboard for detailed observability data.")
    print("💡 You can now use noveum_agent.process_query('your question') for your own queries!")

# Interactive query function
def ask_question(question: str):
    """Convenience function to ask a single question"""
    if not noveum_agent.is_initialized:
        print("⚠️  System not initialized. Initializing now...")
        if not noveum_agent.initialize_system():
            print("❌ Failed to initialize system")
            return
    
    response = noveum_agent.process_query(question)
    noveum_agent.display_response(response)
    return response

print("✅ Demo functions ready!")
print("\n🚀 To get started:")
print("1. Run: demo_noveum_agent()  # For a full demo")
print("2. Run: ask_question('Your question here')  # For a single question")
print("3. Or use: noveum_agent.process_query('Your question')  # For programmatic access")


✅ Demo functions ready!

🚀 To get started:
1. Run: demo_noveum_agent()  # For a full demo
2. Run: ask_question('Your question here')  # For a single question
3. Or use: noveum_agent.process_query('Your question')  # For programmatic access


In [58]:
demo_noveum_agent()

🎬 NOVEUM AI AGENT DEMO

1️⃣ Initializing the system...
🚀 Initializing Noveum AI Agent...
📁 Using existing scraped data...
✅ Loaded 38 documents from noveum_docs.json


2025-10-22 22:26:35 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_system_initialization (ID: 9af2b4b0-0c2f-4d2e-ae99-30df0d2f0fa6) - 1 spans
2025-10-22 22:26:35 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_system_initialization (ID: 9af2b4b0-0c2f-4d2e-ae99-30df0d2f0fa6) - 1 spans
2025-10-22 22:26:35 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 9af2b4b0-0c2f-4d2e-ae99-30df0d2f0fa6
2025-10-22 22:26:35 - noveum_trace.transport.http_transport - INFO - ✅ Trace 9af2b4b0-0c2f-4d2e-ae99-30df0d2f0fa6 successfully queued for export


✅ Loaded existing vector store from noveum_vectorstore
✅ Noveum AI Agent initialized successfully!

2️⃣ Running 20 demo queries...

--- Demo Query 1 ---

🎯 Processing query: 'What is Noveum and what does it do?'
🧠 Routing to RAG system for: 'What is Noveum and what does it do?'
🔍 Found 5 relevant documents for query: 'What is Noveum and what does it do?'


2025-10-22 22:26:43 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_tool-orchestator (ID: 3ad192e5-bbd4-4236-8144-019d53512769) - 6 spans
2025-10-22 22:26:43 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_tool-orchestator (ID: 3ad192e5-bbd4-4236-8144-019d53512769) - 6 spans
2025-10-22 22:26:43 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 3ad192e5-bbd4-4236-8144-019d53512769
2025-10-22 22:26:43 - noveum_trace.transport.http_transport - INFO - ✅ Trace 3ad192e5-bbd4-4236-8144-019d53512769 successfully queued for export


🔍 Found 5 relevant documents for query: 'What is Noveum and what does it do?'
✅ Query processed in 8.68s using RAG

🤖 NOVEUM AI AGENT RESPONSE
📊 Mode: RAG
⏱️  Processing Time: 8.68s
📅 Timestamp: 2025-10-22 22:26:43
📚 Sources (5):
   1. https://noveum.ai/en/blog/noveum-ai-your-one-stop-ai-evaluation-platform
   2. https://noveum.ai/en/docs/getting-started/overview
   3. https://noveum.ai/docs/getting-started/overview
   ... and 2 more

💬 Answer:
----------------------------------------
Noveum.ai is a comprehensive tracing and observability platform specifically designed for AI applications, including LLM-powered chatbots, RAG systems, and multi-agent workflows. It provides the necessary insights to understand, debug, and optimize these systems, addressing the complexities involved in building production AI applications, such as LLM calls, vector searches, and agent reasoning. Noveum.ai enhances visibility into these workflows, making debugging easier and optimization more effective (Sou

2025-10-22 22:26:53 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_tool-orchestator (ID: bc60c122-f30b-4114-9f78-ee40a024b525) - 6 spans
2025-10-22 22:26:53 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_tool-orchestator (ID: bc60c122-f30b-4114-9f78-ee40a024b525) - 6 spans
2025-10-22 22:26:53 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace bc60c122-f30b-4114-9f78-ee40a024b525
2025-10-22 22:26:53 - noveum_trace.transport.http_transport - INFO - ✅ Trace bc60c122-f30b-4114-9f78-ee40a024b525 successfully queued for export


🔍 Found 5 relevant documents for query: 'How do I integrate Noveum Trace in my application?'
✅ Query processed in 8.83s using RAG

🤖 NOVEUM AI AGENT RESPONSE
📊 Mode: RAG
⏱️  Processing Time: 8.83s
📅 Timestamp: 2025-10-22 22:26:53
📚 Sources (5):
   1. https://noveum.ai/en/docs
   2. https://noveum.ai/docs
   3. https://noveum.ai/docs/getting-started/sdk-integration
   ... and 2 more

💬 Answer:
----------------------------------------
To integrate Noveum Trace into your application, you can follow these steps based on the SDK Integration Guide:

1. **Create Your Account & Get API Key**:
   - Sign up at [noveum.ai](https://noveum.ai).
   - Create a project in your dashboard.
   - Generate an API key.

2. **Choose Your SDK**:
   - For Python applications, use the **Python SDK** (`noveum-trace`), which offers decorator-based tracing for LLM calls, agents, and RAG pipelines.
   - For TypeScript applications, use the **TypeScript SDK** (`@noveum/trace`), designed for frameworks like Next.js, 

2025-10-22 22:26:58 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_tool-orchestator (ID: 29a862bf-d7aa-4cf9-963c-0a4244343bbe) - 6 spans
2025-10-22 22:26:58 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_tool-orchestator (ID: 29a862bf-d7aa-4cf9-963c-0a4244343bbe) - 6 spans
2025-10-22 22:26:58 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 29a862bf-d7aa-4cf9-963c-0a4244343bbe
2025-10-22 22:26:58 - noveum_trace.transport.http_transport - INFO - ✅ Trace 29a862bf-d7aa-4cf9-963c-0a4244343bbe successfully queued for export


🔍 Found 5 relevant documents for query: 'What are Noveum's pricing plans?'
✅ Query processed in 3.49s using RAG

🤖 NOVEUM AI AGENT RESPONSE
📊 Mode: RAG
⏱️  Processing Time: 3.49s
📅 Timestamp: 2025-10-22 22:26:58
📚 Sources (5):
   1. https://noveum.ai/docs/getting-started/overview
   2. https://noveum.ai/en/docs/getting-started/overview
   3. https://noveum.ai/en/blog/noveum-ai-your-one-stop-ai-evaluation-platform
   ... and 2 more

💬 Answer:
----------------------------------------
The provided context does not include specific information about Noveum's pricing plans. It mentions that all users will have access to the Observability suite for free, and early users will receive free evaluation jobs and premium support for the first year (Source 1 and Source 2). However, details on ongoing pricing or additional plans are not available in the documentation. If you have further questions about Noveum, feel free to ask!

--- Demo Query 4 ---

🎯 Processing query: 'What features does Noveum T

2025-10-22 22:27:05 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_tool-orchestator (ID: df0c6344-1a2e-4c53-bb7d-4e43b3bce1b8) - 6 spans
2025-10-22 22:27:05 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_tool-orchestator (ID: df0c6344-1a2e-4c53-bb7d-4e43b3bce1b8) - 6 spans
2025-10-22 22:27:05 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace df0c6344-1a2e-4c53-bb7d-4e43b3bce1b8
2025-10-22 22:27:05 - noveum_trace.transport.http_transport - INFO - ✅ Trace df0c6344-1a2e-4c53-bb7d-4e43b3bce1b8 successfully queued for export


🔍 Found 5 relevant documents for query: 'What features does Noveum Trace offer?'
✅ Query processed in 6.38s using RAG

🤖 NOVEUM AI AGENT RESPONSE
📊 Mode: RAG
⏱️  Processing Time: 6.38s
📅 Timestamp: 2025-10-22 22:27:05
📚 Sources (5):
   1. https://noveum.ai/en/docs
   2. https://noveum.ai/docs
   3. https://noveum.ai/en/docs/getting-started/sdk-integration
   ... and 2 more

💬 Answer:
----------------------------------------
Noveum Trace offers several key features through its Python SDK (`noveum-trace`) and TypeScript SDK (`@noveum/trace`):

1. **Decorator-based Tracing**: It provides a simple way to trace LLM calls, agents, and RAG pipelines using decorators.
2. **Automatic Instrumentation**: The SDK automatically instruments popular AI frameworks, making it easier to integrate tracing without extensive manual setup.
3. **Context Propagation**: It supports context propagation across asynchronous operations, ensuring that trace data remains consistent throughout the execution flow.
4. 

2025-10-22 22:27:12 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_tool-orchestator (ID: 93bd37c4-4289-43c9-b34e-19eaaef610c2) - 6 spans
2025-10-22 22:27:12 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_tool-orchestator (ID: 93bd37c4-4289-43c9-b34e-19eaaef610c2) - 6 spans
2025-10-22 22:27:12 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 93bd37c4-4289-43c9-b34e-19eaaef610c2
2025-10-22 22:27:12 - noveum_trace.transport.http_transport - INFO - ✅ Trace 93bd37c4-4289-43c9-b34e-19eaaef610c2 successfully queued for export


🔍 Found 5 relevant documents for query: 'How do I set up observability with Noveum?'
✅ Query processed in 5.72s using RAG

🤖 NOVEUM AI AGENT RESPONSE
📊 Mode: RAG
⏱️  Processing Time: 5.72s
📅 Timestamp: 2025-10-22 22:27:12
📚 Sources (5):
   1. https://noveum.ai/en/blog/noveum-ai-your-one-stop-ai-evaluation-platform
   2. https://noveum.ai/en/docs/getting-started/tracing-concepts
   3. https://noveum.ai/docs/getting-started/tracing-concepts
   ... and 2 more

💬 Answer:
----------------------------------------
To set up observability with Noveum, follow these steps:

1. **Sign Up**: Go to [noveum.ai](https://noveum.ai) and create an account.
2. **Create a Project**: After signing up, create a project to obtain your API key.
3. **Install the SDK**: Choose your preferred programming language and install the corresponding SDK.
4. **Add Tracing**: Integrate tracing into your AI workflows using the SDK.
5. **Explore Insights**: Use the Noveum.ai dashboard to analyze the insights gathered from 

2025-10-22 22:27:18 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_tool-orchestator (ID: fee3e6d3-8da1-4ef3-8214-a7d7323d3a49) - 6 spans
2025-10-22 22:27:18 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_tool-orchestator (ID: fee3e6d3-8da1-4ef3-8214-a7d7323d3a49) - 6 spans
2025-10-22 22:27:18 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace fee3e6d3-8da1-4ef3-8214-a7d7323d3a49
2025-10-22 22:27:18 - noveum_trace.transport.http_transport - INFO - ✅ Trace fee3e6d3-8da1-4ef3-8214-a7d7323d3a49 successfully queued for export


🔍 Found 5 relevant documents for query: 'What APIs are available in Noveum platform?'
✅ Query processed in 5.00s using RAG

🤖 NOVEUM AI AGENT RESPONSE
📊 Mode: RAG
⏱️  Processing Time: 5.00s
📅 Timestamp: 2025-10-22 22:27:18
📚 Sources (5):
   1. https://noveum.ai/en/docs
   2. https://noveum.ai/docs
   3. https://noveum.ai/docs/getting-started/overview
   ... and 2 more

💬 Answer:
----------------------------------------
The provided context does not specify the APIs available in the Noveum platform. For detailed information about the APIs, I recommend checking the official Noveum documentation or the SDK Integration Guide. If you have any other questions about Noveum, feel free to ask!

--- Demo Query 7 ---

🎯 Processing query: 'How does Noveum handle agent tracing?'
🧠 Routing to RAG system for: 'How does Noveum handle agent tracing?'
🔍 Found 5 relevant documents for query: 'How does Noveum handle agent tracing?'


2025-10-22 22:27:23 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_tool-orchestator (ID: 3bbd0b36-c30f-4126-b62a-dd0fae1c701d) - 6 spans
2025-10-22 22:27:23 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_tool-orchestator (ID: 3bbd0b36-c30f-4126-b62a-dd0fae1c701d) - 6 spans
2025-10-22 22:27:23 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 3bbd0b36-c30f-4126-b62a-dd0fae1c701d
2025-10-22 22:27:23 - noveum_trace.transport.http_transport - INFO - ✅ Trace 3bbd0b36-c30f-4126-b62a-dd0fae1c701d successfully queued for export


🔍 Found 5 relevant documents for query: 'How does Noveum handle agent tracing?'
✅ Query processed in 4.48s using RAG

🤖 NOVEUM AI AGENT RESPONSE
📊 Mode: RAG
⏱️  Processing Time: 4.48s
📅 Timestamp: 2025-10-22 22:27:23
📚 Sources (5):
   1. https://noveum.ai/en/blog/noveum-ai-your-one-stop-ai-evaluation-platform
   2. https://noveum.ai/docs/advanced/multi-agent-tracing
   3. https://noveum.ai/en/docs/advanced/multi-agent-tracing
   ... and 2 more

💬 Answer:
----------------------------------------
Noveum.ai handles agent tracing through its specialized Multi-Agent Tracing capabilities, which are designed to observe complex workflows and inter-agent communications within multi-agent systems. These systems often involve multiple agents that coordinate, communicate, and collaborate to achieve shared goals, presenting unique observability challenges. Noveum.ai provides comprehensive tracing to help users understand and optimize these intricate workflows, ensuring better visibility into agent 

2025-10-22 22:27:31 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_tool-orchestator (ID: fb0d31fc-2465-4ecd-973c-4cd443d9befa) - 6 spans
2025-10-22 22:27:31 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_tool-orchestator (ID: fb0d31fc-2465-4ecd-973c-4cd443d9befa) - 6 spans
2025-10-22 22:27:31 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace fb0d31fc-2465-4ecd-973c-4cd443d9befa
2025-10-22 22:27:31 - noveum_trace.transport.http_transport - INFO - ✅ Trace fb0d31fc-2465-4ecd-973c-4cd443d9befa successfully queued for export


🔍 Found 5 relevant documents for query: 'What monitoring capabilities does Noveum provide?'
✅ Query processed in 6.97s using RAG

🤖 NOVEUM AI AGENT RESPONSE
📊 Mode: RAG
⏱️  Processing Time: 6.97s
📅 Timestamp: 2025-10-22 22:27:31
📚 Sources (5):
   1. https://noveum.ai/en
   2. https://noveum.ai/
   3. https://noveum.ai
   ... and 2 more

💬 Answer:
----------------------------------------
Noveum provides comprehensive monitoring capabilities for AI applications, including the ability to monitor, trace, and optimize AI agents across various frameworks such as LangChain, CrewAI, AutoGen, and custom implementations. The platform features a unified dashboard that allows users to monitor everything in their AI ecosystem, capturing every trace and span from simple LLM calls to complex multi-agent interactions. This is facilitated by lightweight SDKs that ensure no detail is missed, enabling users to evaluate and improve their AI agents effectively (Sources 1, 2, 4).

--- Demo Query 9 ---

🎯 Pr

2025-10-22 22:27:37 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_tool-orchestator (ID: 1ae0d6c4-d1fb-4239-b536-3333187d44aa) - 6 spans
2025-10-22 22:27:37 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_tool-orchestator (ID: 1ae0d6c4-d1fb-4239-b536-3333187d44aa) - 6 spans
2025-10-22 22:27:37 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 1ae0d6c4-d1fb-4239-b536-3333187d44aa
2025-10-22 22:27:37 - noveum_trace.transport.http_transport - INFO - ✅ Trace 1ae0d6c4-d1fb-4239-b536-3333187d44aa successfully queued for export


🔍 Found 5 relevant documents for query: 'How do I configure Noveum for my system?'
✅ Query processed in 4.69s using RAG

🤖 NOVEUM AI AGENT RESPONSE
📊 Mode: RAG
⏱️  Processing Time: 4.69s
📅 Timestamp: 2025-10-22 22:27:37
📚 Sources (5):
   1. https://noveum.ai/en/docs
   2. https://noveum.ai/docs
   3. https://noveum.ai/docs/getting-started/overview
   ... and 2 more

💬 Answer:
----------------------------------------
To configure Noveum for your system, you can refer to the SDK Integration Guide available in the Noveum documentation. This guide will help you trace your AI applications with minimal code changes. The SDKs provided for Python and TypeScript are designed to facilitate this integration. 

For detailed steps and specific configurations, please check the relevant sections in the documentation. If you need further assistance, you can also reach out to the Noveum community via Discord or email support at [email protected] (Sources: Source 1, Source 5).

--- Demo Query 10 ---

🎯 

2025-10-22 22:27:46 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_tool-orchestator (ID: 823b7927-10bb-49ef-a94e-1b914c298bfb) - 6 spans
2025-10-22 22:27:46 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_tool-orchestator (ID: 823b7927-10bb-49ef-a94e-1b914c298bfb) - 6 spans
2025-10-22 22:27:46 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 823b7927-10bb-49ef-a94e-1b914c298bfb
2025-10-22 22:27:46 - noveum_trace.transport.http_transport - INFO - ✅ Trace 823b7927-10bb-49ef-a94e-1b914c298bfb successfully queued for export


🔍 Found 5 relevant documents for query: 'What are the benefits of using Noveum Trace?'
✅ Query processed in 7.97s using RAG

🤖 NOVEUM AI AGENT RESPONSE
📊 Mode: RAG
⏱️  Processing Time: 7.97s
📅 Timestamp: 2025-10-22 22:27:46
📚 Sources (5):
   1. https://noveum.ai/docs/platform/dashboard
   2. https://noveum.ai/en/docs
   3. https://noveum.ai/docs
   ... and 2 more

💬 Answer:
----------------------------------------
Using Noveum Trace offers several benefits, including:

1. **Real-Time Monitoring**: The Noveum platform provides a real-time dashboard for analyzing traces and performance, with live updates as new traces arrive and real-time status monitoring of your trace ingestion pipeline (Source 1).

2. **Advanced Analysis**: Users can perform comparative analysis across multiple traces, allowing for performance comparisons and insights into system health through live metrics (Source 1).

3. **Cost Analysis and Optimization**: Noveum Trace includes features for cost analysis and optimiz

  with DDGS() as ddgs:



--- Demo Query 11 ---

🎯 Processing query: 'What are the latest AI news today?'
🌐 Routing to Web Search for: 'What are the latest AI news today?'
🔍 Found 1 web search results for: 'What are the latest AI news today?'


2025-10-22 22:27:50 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_tool-orchestator (ID: e4f3cc20-c137-4905-80fe-2d29b03c2231) - 6 spans
2025-10-22 22:27:50 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_tool-orchestator (ID: e4f3cc20-c137-4905-80fe-2d29b03c2231) - 6 spans
2025-10-22 22:27:50 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace e4f3cc20-c137-4905-80fe-2d29b03c2231
2025-10-22 22:27:50 - noveum_trace.transport.http_transport - INFO - ✅ Trace e4f3cc20-c137-4905-80fe-2d29b03c2231 successfully queued for export


✅ Query processed in 3.48s using Web Search

🤖 NOVEUM AI AGENT RESPONSE
📊 Mode: Web Search
⏱️  Processing Time: 3.48s
📅 Timestamp: 2025-10-22 22:27:50
📚 Sources (1):
   1. https://duckduckgo.com/?q=What+are+the+latest+AI+news+today?

💬 Answer:
----------------------------------------
The provided web search results do not contain specific information about the latest AI news today. They primarily focus on general news coverage from India and around the world, including politics, business, and entertainment, but do not mention any recent developments or updates in artificial intelligence.

For the latest AI news, I recommend checking dedicated technology news websites or platforms that specialize in AI developments, as they would provide more relevant and up-to-date information.

--- Demo Query 12 ---

🎯 Processing query: 'What's the weather like today?'
🌐 Routing to Web Search for: 'What's the weather like today?'


  with DDGS() as ddgs:


🔍 Found 1 web search results for: 'What's the weather like today?'


2025-10-22 22:27:56 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_tool-orchestator (ID: b1b13e60-5b32-44a0-9769-197a944989d0) - 6 spans
2025-10-22 22:27:56 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_tool-orchestator (ID: b1b13e60-5b32-44a0-9769-197a944989d0) - 6 spans
2025-10-22 22:27:56 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace b1b13e60-5b32-44a0-9769-197a944989d0
2025-10-22 22:27:56 - noveum_trace.transport.http_transport - INFO - ✅ Trace b1b13e60-5b32-44a0-9769-197a944989d0 successfully queued for export


✅ Query processed in 4.93s using Web Search

🤖 NOVEUM AI AGENT RESPONSE
📊 Mode: Web Search
⏱️  Processing Time: 4.93s
📅 Timestamp: 2025-10-22 22:27:56
📚 Sources (1):
   1. https://duckduckgo.com/?q=What's+the+weather+like+today?

💬 Answer:
----------------------------------------
The web search results do not provide specific information about today's weather. To find out the current weather conditions, I recommend checking a reliable weather website or app for detailed updates, including temperature, wind, and precipitation forecasts.


  with DDGS() as ddgs:



--- Demo Query 13 ---

🎯 Processing query: 'Tell me about recent developments in machine learning'
🌐 Routing to Web Search for: 'Tell me about recent developments in machine learning'
🔍 Found 1 web search results for: 'Tell me about recent developments in machine learning'


2025-10-22 22:28:03 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_tool-orchestator (ID: c5a64d08-d00f-46cf-ab3c-d8d473c10eca) - 6 spans
2025-10-22 22:28:03 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_tool-orchestator (ID: c5a64d08-d00f-46cf-ab3c-d8d473c10eca) - 6 spans
2025-10-22 22:28:03 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace c5a64d08-d00f-46cf-ab3c-d8d473c10eca
2025-10-22 22:28:03 - noveum_trace.transport.http_transport - INFO - ✅ Trace c5a64d08-d00f-46cf-ab3c-d8d473c10eca successfully queued for export


✅ Query processed in 5.97s using Web Search

🤖 NOVEUM AI AGENT RESPONSE
📊 Mode: Web Search
⏱️  Processing Time: 5.97s
📅 Timestamp: 2025-10-22 22:28:03
📚 Sources (1):
   1. https://duckduckgo.com/?q=Tell+me+about+recent+developments+in+machine+learning

💬 Answer:
----------------------------------------
The web search results did not provide specific information about recent developments in machine learning. Therefore, I cannot summarize any recent advancements based on the provided data.

However, if you're interested in general trends in machine learning, I can mention that recent developments often include advancements in deep learning architectures, improvements in natural language processing (NLP) models, and increased applications of machine learning in various industries such as healthcare, finance, and autonomous systems. For the latest updates, I recommend checking reputable tech news sources or academic journals focused on artificial intelligence and machine learning.

--- Dem

2025-10-22 22:28:10 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_tool-orchestator (ID: 45efae74-ee25-4004-a509-6e0ff628a0cd) - 6 spans
2025-10-22 22:28:10 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_tool-orchestator (ID: 45efae74-ee25-4004-a509-6e0ff628a0cd) - 6 spans
2025-10-22 22:28:10 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 45efae74-ee25-4004-a509-6e0ff628a0cd
2025-10-22 22:28:10 - noveum_trace.transport.http_transport - INFO - ✅ Trace 45efae74-ee25-4004-a509-6e0ff628a0cd successfully queued for export


🔍 Found 5 relevant documents for query: 'What are the current trends in observability tools?'
✅ Query processed in 5.15s using RAG

🤖 NOVEUM AI AGENT RESPONSE
📊 Mode: RAG
⏱️  Processing Time: 5.15s
📅 Timestamp: 2025-10-22 22:28:10
📚 Sources (5):
   1. https://noveum.ai/en/blog/from-logs-to-intelligent-choices-inside-noveum-ais-evaluation-process
   2. https://noveum.ai/docs/getting-started/tracing-concepts
   3. https://noveum.ai/en/docs/getting-started/tracing-concepts
   ... and 2 more

💬 Answer:
----------------------------------------
The provided context does not explicitly outline current trends in observability tools. However, it highlights the unique challenges faced by AI applications that traditional monitoring tools cannot adequately address. For instance, Noveum.ai emphasizes the importance of comprehensive tracing and collecting the right data to understand, debug, and optimize AI systems, rather than just gathering all possible data (Source 2 and Source 4).

Additionally,

  with DDGS() as ddgs:



--- Demo Query 15 ---

🎯 Processing query: 'What happened in tech news this week?'
🌐 Routing to Web Search for: 'What happened in tech news this week?'
🔍 Found 1 web search results for: 'What happened in tech news this week?'


2025-10-22 22:28:14 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_tool-orchestator (ID: e30e9ba6-ffd1-4eb9-b7ae-6d19e04343d6) - 6 spans
2025-10-22 22:28:14 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_tool-orchestator (ID: e30e9ba6-ffd1-4eb9-b7ae-6d19e04343d6) - 6 spans
2025-10-22 22:28:14 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace e30e9ba6-ffd1-4eb9-b7ae-6d19e04343d6
2025-10-22 22:28:14 - noveum_trace.transport.http_transport - INFO - ✅ Trace e30e9ba6-ffd1-4eb9-b7ae-6d19e04343d6 successfully queued for export


✅ Query processed in 3.22s using Web Search

🤖 NOVEUM AI AGENT RESPONSE
📊 Mode: Web Search
⏱️  Processing Time: 3.22s
📅 Timestamp: 2025-10-22 22:28:14
📚 Sources (1):
   1. https://duckduckgo.com/?q=What+happened+in+tech+news+this+week?

💬 Answer:
----------------------------------------
The web search results do not provide specific information about recent events in tech news for this week. They primarily discuss the grammatical usage of the word "happened" and its variations, such as "what happened" and "what's happened." 

To find out what happened in tech news this week, I recommend checking reliable tech news websites or platforms that aggregate current events in technology. If you have specific topics or companies in mind, I can help guide you on where to look for that information.

--- Demo Query 16 ---

🎯 Processing query: 'What are the latest updates in Python programming?'
🌐 Routing to Web Search for: 'What are the latest updates in Python programming?'


  with DDGS() as ddgs:


🔍 Found 1 web search results for: 'What are the latest updates in Python programming?'


2025-10-22 22:28:18 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_tool-orchestator (ID: 51f4534f-23e4-4be7-8d6e-e93391d6e0e2) - 6 spans
2025-10-22 22:28:18 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_tool-orchestator (ID: 51f4534f-23e4-4be7-8d6e-e93391d6e0e2) - 6 spans
2025-10-22 22:28:18 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 51f4534f-23e4-4be7-8d6e-e93391d6e0e2
2025-10-22 22:28:18 - noveum_trace.transport.http_transport - INFO - ✅ Trace 51f4534f-23e4-4be7-8d6e-e93391d6e0e2 successfully queued for export


✅ Query processed in 3.62s using Web Search

🤖 NOVEUM AI AGENT RESPONSE
📊 Mode: Web Search
⏱️  Processing Time: 3.62s
📅 Timestamp: 2025-10-22 22:28:18
📚 Sources (1):
   1. https://duckduckgo.com/?q=What+are+the+latest+updates+in+Python+programming?

💬 Answer:
----------------------------------------
The provided web search results do not contain specific information about the latest updates in Python programming. They primarily focus on general news coverage and updates from India and around the world, without addressing programming or Python specifically.

For the latest updates in Python programming, I recommend checking official sources such as the Python Software Foundation's website or popular programming news platforms like Real Python or Python Weekly. These sources typically provide information on new releases, features, and enhancements in the Python language.

--- Demo Query 17 ---

🎯 Processing query: 'What's the current status of cryptocurrency markets?'
🌐 Routing to Web Se

  with DDGS() as ddgs:


🔍 Found 1 web search results for: 'What's the current status of cryptocurrency markets?'


2025-10-22 22:28:25 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_tool-orchestator (ID: 3f7b6a62-8888-4a5d-b12c-37a3829aea49) - 6 spans
2025-10-22 22:28:25 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_tool-orchestator (ID: 3f7b6a62-8888-4a5d-b12c-37a3829aea49) - 6 spans
2025-10-22 22:28:25 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 3f7b6a62-8888-4a5d-b12c-37a3829aea49
2025-10-22 22:28:25 - noveum_trace.transport.http_transport - INFO - ✅ Trace 3f7b6a62-8888-4a5d-b12c-37a3829aea49 successfully queued for export


✅ Query processed in 5.65s using Web Search

🤖 NOVEUM AI AGENT RESPONSE
📊 Mode: Web Search
⏱️  Processing Time: 5.65s
📅 Timestamp: 2025-10-22 22:28:25
📚 Sources (1):
   1. https://duckduckgo.com/?q=What's+the+current+status+of+cryptocurrency+markets?

💬 Answer:
----------------------------------------
The provided web search results do not contain specific information regarding the current status of cryptocurrency markets. To get the latest updates on cryptocurrency prices, trends, and market analysis, I recommend checking financial news websites, cryptocurrency exchanges, or market tracking platforms like CoinMarketCap or CoinGecko. If you have any other questions or need further assistance, feel free to ask!

--- Demo Query 18 ---

🎯 Processing query: 'What are the newest features in cloud computing?'
🧠 Routing to RAG system for: 'What are the newest features in cloud computing?'
🔍 Found 5 relevant documents for query: 'What are the newest features in cloud computing?'


2025-10-22 22:28:31 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_tool-orchestator (ID: 343764b8-7ac9-4549-93cf-8abc11ceba6f) - 6 spans
2025-10-22 22:28:31 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_tool-orchestator (ID: 343764b8-7ac9-4549-93cf-8abc11ceba6f) - 6 spans
2025-10-22 22:28:31 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 343764b8-7ac9-4549-93cf-8abc11ceba6f
2025-10-22 22:28:31 - noveum_trace.transport.http_transport - INFO - ✅ Trace 343764b8-7ac9-4549-93cf-8abc11ceba6f successfully queued for export


🔍 Found 5 relevant documents for query: 'What are the newest features in cloud computing?'
✅ Query processed in 5.25s using RAG

🤖 NOVEUM AI AGENT RESPONSE
📊 Mode: RAG
⏱️  Processing Time: 5.25s
📅 Timestamp: 2025-10-22 22:28:31
📚 Sources (5):
   1. https://noveum.ai/en/changelog
   2. https://noveum.ai/en/changelog
   3. https://noveum.ai/docs/platform/dashboard
   ... and 2 more

💬 Answer:
----------------------------------------
The provided context does not specifically mention any new features in cloud computing as a general topic. However, it does highlight several recent enhancements related to the Noveum.ai platform, which may involve cloud computing aspects. These include:

1. **Enhanced Dashboard Analytics**: Real-time request tracking, latency monitoring, and cost analysis (Source 1).
2. **Improved Logs Interface**: Better search functionality and enhanced debugging capabilities (Source 1).
3. **Advanced Metrics Collection**: A comprehensive telemetry system with custom metri

  with DDGS() as ddgs:


🔍 Found 1 web search results for: 'What's happening in the software development world today?'


2025-10-22 22:28:41 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_tool-orchestator (ID: ec7c6cae-9bbd-47f4-a8ae-2d6d9b5011ba) - 6 spans
2025-10-22 22:28:41 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_tool-orchestator (ID: ec7c6cae-9bbd-47f4-a8ae-2d6d9b5011ba) - 6 spans
2025-10-22 22:28:41 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace ec7c6cae-9bbd-47f4-a8ae-2d6d9b5011ba
2025-10-22 22:28:41 - noveum_trace.transport.http_transport - INFO - ✅ Trace ec7c6cae-9bbd-47f4-a8ae-2d6d9b5011ba successfully queued for export


✅ Query processed in 8.28s using Web Search

🤖 NOVEUM AI AGENT RESPONSE
📊 Mode: Web Search
⏱️  Processing Time: 8.28s
📅 Timestamp: 2025-10-22 22:28:41
📚 Sources (1):
   1. https://duckduckgo.com/?q=What's+happening+in+the+software+development+world+today?

💬 Answer:
----------------------------------------
The web search did not yield any specific results regarding current events in the software development world. Therefore, I cannot provide detailed information on the latest trends, technologies, or news in this field.

However, generally speaking, the software development landscape is often influenced by several ongoing trends, such as:

1. **Increased Adoption of AI and Machine Learning**: Many companies are integrating AI into their software solutions to enhance functionality and user experience.

2. **Remote Work and Collaboration Tools**: The shift to remote work has led to a surge in the development and use of collaboration tools and platforms.

3. **DevOps and Continuous Integr

  with DDGS() as ddgs:



--- Demo Query 20 ---

🎯 Processing query: 'What are the latest breakthroughs in artificial intelligence?'
🌐 Routing to Web Search for: 'What are the latest breakthroughs in artificial intelligence?'
🔍 Found 1 web search results for: 'What are the latest breakthroughs in artificial intelligence?'


2025-10-22 22:28:44 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_tool-orchestator (ID: 6fe93abf-1413-49e8-8464-58558e25ab7b) - 6 spans
2025-10-22 22:28:44 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_tool-orchestator (ID: 6fe93abf-1413-49e8-8464-58558e25ab7b) - 6 spans
2025-10-22 22:28:44 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 6fe93abf-1413-49e8-8464-58558e25ab7b
2025-10-22 22:28:44 - noveum_trace.transport.http_transport - INFO - ✅ Trace 6fe93abf-1413-49e8-8464-58558e25ab7b successfully queued for export


✅ Query processed in 2.68s using Web Search

🤖 NOVEUM AI AGENT RESPONSE
📊 Mode: Web Search
⏱️  Processing Time: 2.68s
📅 Timestamp: 2025-10-22 22:28:44
📚 Sources (1):
   1. https://duckduckgo.com/?q=What+are+the+latest+breakthroughs+in+artificial+intelligence?

💬 Answer:
----------------------------------------
The web search did not yield specific results regarding the latest breakthroughs in artificial intelligence. However, I can provide a general overview based on knowledge up to October 2023.

Recent breakthroughs in artificial intelligence include advancements in natural language processing (NLP), particularly with models like GPT-4, which have improved understanding and generation of human-like text. Additionally, there have been significant developments in computer vision, enabling AI systems to better interpret and analyze visual data.

Another notable area of progress is in reinforcement learning, where AI systems are increasingly capable of learning complex tasks through tria

## Downloading the data set

## 📋 Complete End-to-End Workflow Overview

This notebook demonstrates a **complete end-to-end workflow** for AI agent evaluation:

### 🔄 Workflow Steps:

1. **🤖 Agent Creation & Demo** - Build and test the Noveum AI agent with RAG + Web Search
2. **📊 Trace Collection** - Download traces from Noveum platform 
3. **🔄 Data Processing** - Combine, filter, and map trace data
4. **📈 Dataset Management** - Create dataset and upload items to Noveum
5. **📊 Score Upload** - Upload all evaluation metrics to the platform

### 🎯 What You'll Learn:

- How to build a sophisticated AI agent with dual knowledge sources
- Complete observability and tracing implementation
- End-to-end evaluation pipeline from traces to scores
- Noveum platform integration for dataset and score management

### 📊 Expected Results:

- **50 traces** with **279 spans** collected
- **12 conversation items** uploaded to dataset
- **5 evaluation metrics** successfully uploaded
- Complete observability pipeline operational

Let's start! 🚀


In [None]:
!python noveum_customer_support_bt/traces/fetch_traces_api.py 50

#. This script fetches traces for our project and saves them locally.

  pid, fd = os.forkpty()


Fetching 50 traces for project: noveum-ai-agent-rag-websearch
Cleaning existing traces directory: /Users/mramanindia/work/NovaEval/noveum_customer_support_bt/traces/traces
Created traces directory: /Users/mramanindia/work/NovaEval/noveum_customer_support_bt/traces/traces
Will fetch in 1 batch(es) of up to 100 traces each

--- Batch 1/1 ---
Fetching traces: size=50, from=0
Successfully fetched 50 traces
Saved batch 1 to: /Users/mramanindia/work/NovaEval/noveum_customer_support_bt/traces/traces/traces_batch_001.json
Batch 1 complete: 50 traces
Total fetched so far: 50/50
Reached target of 50 traces

=== Summary ===
Total traces fetched: 50
Batches created: 1
Traces directory: /Users/mramanindia/work/NovaEval/noveum_customer_support_bt/traces/traces
Created files: traces_batch_001.json


In [None]:
!python NovaEval/noveum_customer_support_bt/traces/combine_spans_api_compat.py

Processing traces from: /Users/mramanindia/work/NovaEval/noveum_customer_support_bt/traces/traces
Found 1 trace files: ['traces_batch_001.json']
Processing traces_batch_001.json...
Combined 290 spans total
Saved combined spans to: /Users/mramanindia/work/NovaEval/noveum_customer_support_bt/traces/traces/dataset.json

Sample of first span keys: ['span_id', 'trace_id', 'parent_span_id', 'name', 'start_time', 'end_time', 'duration_ms', 'status', 'status_message', 'attributes', 'events', 'links', 'trace_trace_id', 'trace_name', 'project', 'environment', 'trace_status', 'trace_status_message', 'trace_start_time', 'trace_end_time', 'trace_duration_ms', 'span_count', 'error_count', 'sdk', 'trace_attributes', 'metadata', 'created_at', 'updated_at']
Total spans: 290

Span types distribution:
  agent.llm-rag: 26
  agent.llm_model_execution: 48
  agent.query_routing: 48
  agent.rag_evaluation_metrics: 26
  agent.routing_evaluation_metrics: 48
  agent.web_search_evaluation_metrics: 22
  agent.web_

  pid, fd = os.forkpty()


## Data Filteration and mapping

In [62]:
!python preprocess_filter.py ./traces/traces/dataset.json



Reading ./traces/traces/dataset.json...
Original dataset: 290 records
Filtering spans...
After filtering: 290 records
Converting tool output format...
Writing ./traces/traces/dataset_filtered.json...
Filtering complete! Output: ./traces/traces/dataset_filtered.json

Success! Created ./traces/traces/dataset_filtered.json


  pid, fd = os.forkpty()


In [63]:
!python preprocess_map.py ./traces/dataset_filtered.json

Error: File ./traces/dataset_filtered.json not found


  pid, fd = os.forkpty()


In [None]:
!python preprocess_map.py NovaEval/noveum_customer_support_bt/traces/traces/dataset_filtered.json



Reading /Users/mramanindia/work/NovaEval/noveum_customer_support_bt/traces/traces/dataset_filtered.json...
Input dataset: 290 records
Mapping spans...
Writing /Users/mramanindia/work/NovaEval/noveum_customer_support_bt/traces/traces/dataset_filtered_mapped.json...
Mapping complete! Output: /Users/mramanindia/work/NovaEval/noveum_customer_support_bt/traces/traces/dataset_filtered_mapped.json

Success! Created /Users/mramanindia/work/NovaEval/noveum_customer_support_bt/traces/traces/dataset_filtered_mapped.json


  pid, fd = os.forkpty()


## Running eval on the dataset

In [72]:
# 1. Setup
!cd /Users/mramanindia/work/NovaEval
!source .venv/bin/activate
!cd noveum_customer_support_bt

# 2. Create Dataset
!python create_dataset.py --dataset-type agent --description "Customer Support Agent Evaluation Dataset" --pretty



  pid, fd = os.forkpty()


zsh:source:1: no such file or directory: .venv/bin/activate


  pid, fd = os.forkpty()


zsh:cd:1: no such file or directory: noveum_customer_support_bt


  pid, fd = os.forkpty()
  pid, fd = os.forkpty()


   The API will return a slug that you should set as NOVEUM_DATASET_SLUG in your .env file.

Creating dataset at: https://noveum.ai/api/v1/datasets
Organization: magic-api
Dataset name: customersupportagentdemo_new
Dataset type: agent
Description: Customer Support Agent Evaluation Dataset
Visibility: org
Environment: 
Error creating dataset: 409 Client Error: Conflict for url: https://noveum.ai/api/v1/datasets
Response status: 409
Response text: DATASET_SLUG_EXISTS


In [1]:
# 3. Create Version
!python create_dataset_version.py --pretty


Creating dataset version at: https://noveum.ai/api/v1/datasets/customersupportagentdemo-new/versions?organizationSlug=magic-api
Organization: magic-api
Dataset: customersupportagentdemo-new
Version: 0.0.2
Successfully created dataset version
Response status: 201

Response saved to: dataset_version_response.json

Response data:
{
  "success": true,
  "version": "0.0.2"
}


In [7]:
from demo_utils import run_complete_agent_evaluation
import os

# Process all JSON files in split_datasets directory
for file in os.listdir('split_datasets'):
    if file.endswith('.json'):
        print(f'Processing {file}...')
        run_complete_agent_evaluation(
            f'split_datasets/{file}', 
            sample_size=25, 
            evaluation_name=file.replace('.json', ''),
            output_dir='./demo_results'
        )
        print(f'Completed {file}\n')

2025-10-22 22:59:42 - noveum_trace.transport.batch_processor - INFO - 🔄 Batch processor background thread started (batch_size=100, timeout=5.0s)
2025-10-22 22:59:42 - noveum_trace.transport.batch_processor - INFO - Batch processor started with batch_size=100
2025-10-22 22:59:42 - noveum_trace.transport.http_transport - INFO - HTTP transport initialized for endpoint: https://api.noveum.ai/api
2025-10-22 22:59:42 - noveum_trace.core.client - INFO - Noveum Trace client initialized
2025-10-22 22:59:42,946 - INFO - novaeval.models.base - Noveum tracing initialized successfully


✅ All imports successful!
✅ list_dataset_files function defined!
✅ load_and_analyze_dataset function defined!
✅ parse_tools_from_prompt function defined!
✅ parse_params function defined!
✅ identify_span_type function defined!
✅ map_span_to_agent_data function defined!
✅ convert_spans_to_agent_dataset function defined!
✅ analyze_dataset_statistics function defined!
✅ setup_gemini_model function defined!
✅ setup_agent_evaluator function defined!
✅ run_evaluation function defined!
✅ analyze_agent_behavior_patterns function defined!
✅ export_processed_dataset function defined!
✅ setup_logging function defined!
✅ validate_environment function defined!
✅ print_demo_summary function defined!
✅ run_complete_agent_evaluation function defined!
Processing agent.llm_model_execution_dataset.json...
🚀 Starting Complete Agent Evaluation Pipeline
📁 Processing file: split_datasets/agent.llm_model_execution_dataset.json

📋 Step 1: Environment Setup
✅ Logging configured at INFO level
🔍 Environment valida

Evaluating samples: 0it [00:00, ?it/s]

2025-10-22 22:59:43 - INFO - novaeval.evaluators.agent_evaluator - Saving final results
2025-10-22 22:59:43 - INFO - novaeval.evaluators.agent_evaluator - Agent evaluation completed

✅ Evaluation completed!
❌ Results file not found
❌ Evaluation failed

📋 Step 7: Exporting Dataset
💾 Exporting processed dataset...
✅ Exported to ./processed_datasets/agent.llm_model_execution_dataset_processed_dataset.json
✅ Exported to ./processed_datasets/agent.llm_model_execution_dataset_processed_dataset.csv
✅ Export completed successfully!

🎉 EVALUATION PIPELINE COMPLETED!
📊 Final Results:
  - File processed: split_datasets/agent.llm_model_execution_dataset.json
  - Spans loaded: 20
  - Dataset size: 20
  - Evaluation completed: False
  - Export successful: True
  - Errors encountered: 1
Completed agent.llm_model_execution_dataset.json

Processing agent.query_routing_dataset.json...
🚀 Starting Complete Agent Evaluation Pipeline
📁 Processing file: split_datasets/agent.query_routing_dataset.json

📋 Step


Evaluating samples: 0it [00:00, ?it/s]

2025-10-22 22:59:43 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 22:59:44 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: 62c88c1e-760c-437e-a9d7-84a84e258789) - 1 spans
2025-10-22 22:59:44 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: 62c88c1e-760c-437e-a9d7-84a84e258789) - 1 spans
2025-10-22 22:59:44 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 62c88c1e-760c-437e-a9d7-84a84e258789
2025-10-22 22:59:44 - noveum_trace.transport.http_transport - INFO - ✅ Trace 62c88c1e-760c-437e-a9d7-84a84e258789 successfully queued for export


2025-10-22 22:59:44 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 22:59:46 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: cf8d91be-c525-4e98-b2ab-b6dead66b872) - 1 spans
2025-10-22 22:59:46 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: cf8d91be-c525-4e98-b2ab-b6dead66b872) - 1 spans
2025-10-22 22:59:46 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace cf8d91be-c525-4e98-b2ab-b6dead66b872
2025-10-22 22:59:46 - noveum_trace.transport.http_transport - INFO - ✅ Trace cf8d91be-c525-4e98-b2ab-b6dead66b872 successfully queued for export


2025-10-22 22:59:46 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 22:59:47 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: 087e523c-cf91-49e6-8a3f-63d7e6ea817a) - 1 spans
2025-10-22 22:59:47 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: 087e523c-cf91-49e6-8a3f-63d7e6ea817a) - 1 spans
2025-10-22 22:59:47 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 087e523c-cf91-49e6-8a3f-63d7e6ea817a
2025-10-22 22:59:47 - noveum_trace.transport.http_transport - INFO - ✅ Trace 087e523c-cf91-49e6-8a3f-63d7e6ea817a successfully queued for export


2025-10-22 22:59:47 - INFO - novaeval.evaluators.agent_evaluator - Saving intermediate results after 1 samples
2025-10-22 22:59:47 - INFO - novaeval.evaluators.agent_evaluator - Intermediate results saved to demo_results/agent.query_routing_dataset/agent_evaluation_results.csv


Evaluating samples: 1it [00:04,  4.56s/it]

2025-10-22 22:59:47 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 22:59:48 - noveum_trace.transport.batch_processor - INFO - ⏰ TIMEOUT TRIGGER: Sending batch due to timeout (5.1s >= 5.0s)
2025-10-22 22:59:48 - noveum_trace.transport.batch_processor - INFO - 📤 SENDING BATCH: 3 traces via send_callback
2025-10-22 22:59:48 - noveum_trace.transport.http_transport - INFO - 🚀 SENDING BATCH: 3 traces to https://api.noveum.ai/api/v1/traces
2025-10-22 22:59:48 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: f5d0b1e3-779c-4804-8cb6-c37743638b63) - 1 spans
2025-10-22 22:59:48 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: f5d0b1e3-779c-4804-8cb6-c37743638b63) - 1 spans
2025-10-22 22:59:48 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace f5d0b1e3-779c-4804-8cb6-c37743638b63
2025-10-22 22:59:48 - noveum_trace.transport.http_transport - INFO - ✅ Trace f5d0b1e3-779c-4804-8cb6-c37743638b63 successfully queued for export


2025-10-22 22:59:48 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 22:59:49 - noveum_trace.transport.http_transport - INFO - 📡 HTTP RESPONSE: Status 200 from https://api.noveum.ai/api/v1/traces
2025-10-22 22:59:49 - noveum_trace.transport.http_transport - INFO - ✅ Successfully sent batch of 3 traces
2025-10-22 22:59:49 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully sent batch of 3 traces via callback
2025-10-22 22:59:50 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: 9c192a1f-994d-4dba-a03d-79b70c731c6f) - 1 spans
2025-10-22 22:59:50 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: 9c192a1f-994d-4dba-a03d-79b70c731c6f) - 1 spans
2025-10-22 22:59:50 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 9c192a1f-994d-4dba-a03d-79b70c731c6f
2025-10-22 22:59:50 - noveum_trace.transport.http_transport - INFO - ✅ Trace 9c192a1f-994d-4dba-a03d-79b70c731c6f successfully queued for export


2025-10-22 22:59:50 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 22:59:52 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: 4d902ae0-f8b9-4d47-b1d8-cfcc59d71ef9) - 1 spans
2025-10-22 22:59:52 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: 4d902ae0-f8b9-4d47-b1d8-cfcc59d71ef9) - 1 spans
2025-10-22 22:59:52 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 4d902ae0-f8b9-4d47-b1d8-cfcc59d71ef9
2025-10-22 22:59:52 - noveum_trace.transport.http_transport - INFO - ✅ Trace 4d902ae0-f8b9-4d47-b1d8-cfcc59d71ef9 successfully queued for export


2025-10-22 22:59:52 - INFO - novaeval.evaluators.agent_evaluator - Saving intermediate results after 2 samples
2025-10-22 22:59:52 - INFO - novaeval.evaluators.agent_evaluator - Intermediate results saved to demo_results/agent.query_routing_dataset/agent_evaluation_results.csv


Evaluating samples: 2it [00:09,  5.04s/it]

2025-10-22 22:59:52 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 22:59:53 - noveum_trace.transport.batch_processor - INFO - ⏰ TIMEOUT TRIGGER: Sending batch due to timeout (5.4s >= 5.0s)
2025-10-22 22:59:53 - noveum_trace.transport.batch_processor - INFO - 📤 SENDING BATCH: 3 traces via send_callback
2025-10-22 22:59:53 - noveum_trace.transport.http_transport - INFO - 🚀 SENDING BATCH: 3 traces to https://api.noveum.ai/api/v1/traces
2025-10-22 22:59:53 - noveum_trace.transport.http_transport - INFO - 📡 HTTP RESPONSE: Status 200 from https://api.noveum.ai/api/v1/traces
2025-10-22 22:59:53 - noveum_trace.transport.http_transport - INFO - ✅ Successfully sent batch of 3 traces
2025-10-22 22:59:53 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully sent batch of 3 traces via callback
2025-10-22 22:59:55 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: d6f4e5b6-7b0c-4cc5-a6ab-0d7f6ee82997) - 1 spans
2025-10-22 22:59:55 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEU

2025-10-22 22:59:55 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 22:59:58 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: 1791b898-a25e-406e-9946-13403e6ea2a5) - 1 spans
2025-10-22 22:59:58 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: 1791b898-a25e-406e-9946-13403e6ea2a5) - 1 spans
2025-10-22 22:59:58 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 1791b898-a25e-406e-9946-13403e6ea2a5
2025-10-22 22:59:58 - noveum_trace.transport.http_transport - INFO - ✅ Trace 1791b898-a25e-406e-9946-13403e6ea2a5 successfully queued for export


2025-10-22 22:59:58 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 22:59:58 - noveum_trace.transport.batch_processor - INFO - ⏰ TIMEOUT TRIGGER: Sending batch due to timeout (5.1s >= 5.0s)
2025-10-22 22:59:58 - noveum_trace.transport.batch_processor - INFO - 📤 SENDING BATCH: 2 traces via send_callback
2025-10-22 22:59:58 - noveum_trace.transport.http_transport - INFO - 🚀 SENDING BATCH: 2 traces to https://api.noveum.ai/api/v1/traces
2025-10-22 22:59:58 - noveum_trace.transport.http_transport - INFO - 📡 HTTP RESPONSE: Status 200 from https://api.noveum.ai/api/v1/traces
2025-10-22 22:59:58 - noveum_trace.transport.http_transport - INFO - ✅ Successfully sent batch of 2 traces
2025-10-22 22:59:58 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully sent batch of 2 traces via callback
2025-10-22 22:59:59 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: 93cd53ab-4aee-474b-80a1-b2e056179d42) - 1 spans
2025-10-22 22:59:59 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEU

2025-10-22 22:59:59 - INFO - novaeval.evaluators.agent_evaluator - Saving intermediate results after 3 samples
2025-10-22 22:59:59 - INFO - novaeval.evaluators.agent_evaluator - Intermediate results saved to demo_results/agent.query_routing_dataset/agent_evaluation_results.csv


Evaluating samples: 3it [00:16,  5.53s/it]

2025-10-22 22:59:59 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:00:01 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: bdd95aaf-99e9-4926-ae57-f3846a34dfe2) - 1 spans
2025-10-22 23:00:01 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: bdd95aaf-99e9-4926-ae57-f3846a34dfe2) - 1 spans
2025-10-22 23:00:01 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace bdd95aaf-99e9-4926-ae57-f3846a34dfe2
2025-10-22 23:00:01 - noveum_trace.transport.http_transport - INFO - ✅ Trace bdd95aaf-99e9-4926-ae57-f3846a34dfe2 successfully queued for export


2025-10-22 23:00:01 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:00:03 - noveum_trace.transport.batch_processor - INFO - ⏰ TIMEOUT TRIGGER: Sending batch due to timeout (5.2s >= 5.0s)
2025-10-22 23:00:03 - noveum_trace.transport.batch_processor - INFO - 📤 SENDING BATCH: 2 traces via send_callback
2025-10-22 23:00:03 - noveum_trace.transport.http_transport - INFO - 🚀 SENDING BATCH: 2 traces to https://api.noveum.ai/api/v1/traces
2025-10-22 23:00:03 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: 14a0cc75-5abe-4057-8b8e-474c095520bd) - 1 spans
2025-10-22 23:00:03 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: 14a0cc75-5abe-4057-8b8e-474c095520bd) - 1 spans
2025-10-22 23:00:03 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 14a0cc75-5abe-4057-8b8e-474c095520bd
2025-10-22 23:00:03 - noveum_trace.transport.http_transport - INFO - ✅ Trace 14a0cc75-5abe-4057-8b8e-474c095520bd successfully queued for export


2025-10-22 23:00:03 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:00:04 - noveum_trace.transport.http_transport - INFO - 📡 HTTP RESPONSE: Status 200 from https://api.noveum.ai/api/v1/traces
2025-10-22 23:00:04 - noveum_trace.transport.http_transport - INFO - ✅ Successfully sent batch of 2 traces
2025-10-22 23:00:04 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully sent batch of 2 traces via callback
2025-10-22 23:00:05 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: 3cdcd252-7600-4247-8557-f67f1a66beae) - 1 spans
2025-10-22 23:00:05 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: 3cdcd252-7600-4247-8557-f67f1a66beae) - 1 spans
2025-10-22 23:00:05 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 3cdcd252-7600-4247-8557-f67f1a66beae
2025-10-22 23:00:05 - noveum_trace.transport.http_transport - INFO - ✅ Trace 3cdcd252-7600-4247-8557-f67f1a66beae successfully queued for export


2025-10-22 23:00:05 - INFO - novaeval.evaluators.agent_evaluator - Saving intermediate results after 4 samples
2025-10-22 23:00:05 - INFO - novaeval.evaluators.agent_evaluator - Intermediate results saved to demo_results/agent.query_routing_dataset/agent_evaluation_results.csv


Evaluating samples: 4it [00:21,  5.69s/it]

2025-10-22 23:00:05 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:00:06 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: 57555bab-ecd2-4006-a353-7d27bef7df0b) - 1 spans
2025-10-22 23:00:06 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: 57555bab-ecd2-4006-a353-7d27bef7df0b) - 1 spans
2025-10-22 23:00:06 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 57555bab-ecd2-4006-a353-7d27bef7df0b
2025-10-22 23:00:06 - noveum_trace.transport.http_transport - INFO - ✅ Trace 57555bab-ecd2-4006-a353-7d27bef7df0b successfully queued for export


2025-10-22 23:00:06 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:00:07 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: 6b673f69-57ca-406b-8795-a74e96080081) - 1 spans
2025-10-22 23:00:07 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: 6b673f69-57ca-406b-8795-a74e96080081) - 1 spans
2025-10-22 23:00:07 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 6b673f69-57ca-406b-8795-a74e96080081
2025-10-22 23:00:07 - noveum_trace.transport.http_transport - INFO - ✅ Trace 6b673f69-57ca-406b-8795-a74e96080081 successfully queued for export


2025-10-22 23:00:07 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:00:08 - noveum_trace.transport.batch_processor - INFO - ⏰ TIMEOUT TRIGGER: Sending batch due to timeout (5.0s >= 5.0s)
2025-10-22 23:00:08 - noveum_trace.transport.batch_processor - INFO - 📤 SENDING BATCH: 4 traces via send_callback
2025-10-22 23:00:08 - noveum_trace.transport.http_transport - INFO - 🚀 SENDING BATCH: 4 traces to https://api.noveum.ai/api/v1/traces
2025-10-22 23:00:08 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: 47efa03a-15f4-4818-b062-db186df9847c) - 1 spans
2025-10-22 23:00:08 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: 47efa03a-15f4-4818-b062-db186df9847c) - 1 spans
2025-10-22 23:00:08 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 47efa03a-15f4-4818-b062-db186df9847c
2025-10-22 23:00:08 - noveum_trace.transport.http_transport - INFO - ✅ Trace 47efa03a-15f4-4818-b062-db186df9847c successfully queued for export


2025-10-22 23:00:08 - INFO - novaeval.evaluators.agent_evaluator - Saving intermediate results after 5 samples
2025-10-22 23:00:08 - INFO - novaeval.evaluators.agent_evaluator - Intermediate results saved to demo_results/agent.query_routing_dataset/agent_evaluation_results.csv


Evaluating samples: 5it [00:25,  5.03s/it]

2025-10-22 23:00:08 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:00:09 - noveum_trace.transport.http_transport - INFO - 📡 HTTP RESPONSE: Status 200 from https://api.noveum.ai/api/v1/traces
2025-10-22 23:00:09 - noveum_trace.transport.http_transport - INFO - ✅ Successfully sent batch of 4 traces
2025-10-22 23:00:09 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully sent batch of 4 traces via callback
2025-10-22 23:00:10 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: a1e9ac36-8c30-4784-9f33-93ed2497ee97) - 1 spans
2025-10-22 23:00:10 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: a1e9ac36-8c30-4784-9f33-93ed2497ee97) - 1 spans
2025-10-22 23:00:10 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace a1e9ac36-8c30-4784-9f33-93ed2497ee97
2025-10-22 23:00:10 - noveum_trace.transport.http_transport - INFO - ✅ Trace a1e9ac36-8c30-4784-9f33-93ed2497ee97 successfully queued for export


2025-10-22 23:00:10 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:00:11 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: 62a690ab-2806-4619-92df-eec54d03b859) - 1 spans
2025-10-22 23:00:11 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: 62a690ab-2806-4619-92df-eec54d03b859) - 1 spans
2025-10-22 23:00:11 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 62a690ab-2806-4619-92df-eec54d03b859
2025-10-22 23:00:11 - noveum_trace.transport.http_transport - INFO - ✅ Trace 62a690ab-2806-4619-92df-eec54d03b859 successfully queued for export


2025-10-22 23:00:11 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:00:12 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: 239c63de-f804-41f5-98af-463f42dd4089) - 1 spans
2025-10-22 23:00:12 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: 239c63de-f804-41f5-98af-463f42dd4089) - 1 spans
2025-10-22 23:00:12 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 239c63de-f804-41f5-98af-463f42dd4089
2025-10-22 23:00:12 - noveum_trace.transport.http_transport - INFO - ✅ Trace 239c63de-f804-41f5-98af-463f42dd4089 successfully queued for export


2025-10-22 23:00:12 - INFO - novaeval.evaluators.agent_evaluator - Saving intermediate results after 6 samples
2025-10-22 23:00:12 - INFO - novaeval.evaluators.agent_evaluator - Intermediate results saved to demo_results/agent.query_routing_dataset/agent_evaluation_results.csv


Evaluating samples: 6it [00:29,  4.54s/it]

2025-10-22 23:00:12 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:00:13 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: 4b6d4dfb-7c55-4296-9ad6-0490385e162d) - 1 spans
2025-10-22 23:00:13 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: 4b6d4dfb-7c55-4296-9ad6-0490385e162d) - 1 spans
2025-10-22 23:00:13 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 4b6d4dfb-7c55-4296-9ad6-0490385e162d
2025-10-22 23:00:13 - noveum_trace.transport.http_transport - INFO - ✅ Trace 4b6d4dfb-7c55-4296-9ad6-0490385e162d successfully queued for export


2025-10-22 23:00:13 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:00:14 - noveum_trace.transport.batch_processor - INFO - ⏰ TIMEOUT TRIGGER: Sending batch due to timeout (5.4s >= 5.0s)
2025-10-22 23:00:14 - noveum_trace.transport.batch_processor - INFO - 📤 SENDING BATCH: 5 traces via send_callback
2025-10-22 23:00:14 - noveum_trace.transport.http_transport - INFO - 🚀 SENDING BATCH: 5 traces to https://api.noveum.ai/api/v1/traces
2025-10-22 23:00:14 - noveum_trace.transport.http_transport - INFO - 📡 HTTP RESPONSE: Status 200 from https://api.noveum.ai/api/v1/traces
2025-10-22 23:00:14 - noveum_trace.transport.http_transport - INFO - ✅ Successfully sent batch of 5 traces
2025-10-22 23:00:14 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully sent batch of 5 traces via callback
2025-10-22 23:00:14 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: b6ec3f3c-f9ec-4714-b99e-034d4fa7655b) - 1 spans
2025-10-22 23:00:14 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEU

2025-10-22 23:00:14 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:00:16 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: ed49c38e-e2d4-4cb2-8049-ec501c7407d7) - 1 spans
2025-10-22 23:00:16 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: ed49c38e-e2d4-4cb2-8049-ec501c7407d7) - 1 spans
2025-10-22 23:00:16 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace ed49c38e-e2d4-4cb2-8049-ec501c7407d7
2025-10-22 23:00:16 - noveum_trace.transport.http_transport - INFO - ✅ Trace ed49c38e-e2d4-4cb2-8049-ec501c7407d7 successfully queued for export


2025-10-22 23:00:16 - INFO - novaeval.evaluators.agent_evaluator - Saving intermediate results after 7 samples
2025-10-22 23:00:16 - INFO - novaeval.evaluators.agent_evaluator - Intermediate results saved to demo_results/agent.query_routing_dataset/agent_evaluation_results.csv


Evaluating samples: 7it [00:33,  4.32s/it]

2025-10-22 23:00:16 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:00:17 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: 1df4cc5a-2027-401e-8a59-c0bc394f0463) - 1 spans
2025-10-22 23:00:17 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: 1df4cc5a-2027-401e-8a59-c0bc394f0463) - 1 spans
2025-10-22 23:00:17 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 1df4cc5a-2027-401e-8a59-c0bc394f0463
2025-10-22 23:00:17 - noveum_trace.transport.http_transport - INFO - ✅ Trace 1df4cc5a-2027-401e-8a59-c0bc394f0463 successfully queued for export


2025-10-22 23:00:17 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:00:18 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: 75755b07-45c9-4ad8-a59e-f6b04685a2be) - 1 spans
2025-10-22 23:00:18 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: 75755b07-45c9-4ad8-a59e-f6b04685a2be) - 1 spans
2025-10-22 23:00:18 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 75755b07-45c9-4ad8-a59e-f6b04685a2be
2025-10-22 23:00:18 - noveum_trace.transport.http_transport - INFO - ✅ Trace 75755b07-45c9-4ad8-a59e-f6b04685a2be successfully queued for export


2025-10-22 23:00:18 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:00:19 - noveum_trace.transport.batch_processor - INFO - ⏰ TIMEOUT TRIGGER: Sending batch due to timeout (5.3s >= 5.0s)
2025-10-22 23:00:19 - noveum_trace.transport.batch_processor - INFO - 📤 SENDING BATCH: 4 traces via send_callback
2025-10-22 23:00:19 - noveum_trace.transport.http_transport - INFO - 🚀 SENDING BATCH: 4 traces to https://api.noveum.ai/api/v1/traces
2025-10-22 23:00:19 - noveum_trace.transport.http_transport - INFO - 📡 HTTP RESPONSE: Status 200 from https://api.noveum.ai/api/v1/traces
2025-10-22 23:00:19 - noveum_trace.transport.http_transport - INFO - ✅ Successfully sent batch of 4 traces
2025-10-22 23:00:19 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully sent batch of 4 traces via callback
2025-10-22 23:00:20 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: dea54b30-9b78-45a1-861c-acef72c97a3f) - 1 spans
2025-10-22 23:00:20 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEU

2025-10-22 23:00:20 - INFO - novaeval.evaluators.agent_evaluator - Saving intermediate results after 8 samples
2025-10-22 23:00:20 - INFO - novaeval.evaluators.agent_evaluator - Intermediate results saved to demo_results/agent.query_routing_dataset/agent_evaluation_results.csv


Evaluating samples: 8it [00:37,  4.16s/it]

2025-10-22 23:00:20 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:00:21 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: 9cb16557-f687-4b6d-aa25-798f27e9615c) - 1 spans
2025-10-22 23:00:21 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: 9cb16557-f687-4b6d-aa25-798f27e9615c) - 1 spans
2025-10-22 23:00:21 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 9cb16557-f687-4b6d-aa25-798f27e9615c
2025-10-22 23:00:21 - noveum_trace.transport.http_transport - INFO - ✅ Trace 9cb16557-f687-4b6d-aa25-798f27e9615c successfully queued for export


2025-10-22 23:00:21 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:00:22 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: 39999cf9-16cd-4b1f-8167-39953b328da6) - 1 spans
2025-10-22 23:00:22 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: 39999cf9-16cd-4b1f-8167-39953b328da6) - 1 spans
2025-10-22 23:00:22 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 39999cf9-16cd-4b1f-8167-39953b328da6
2025-10-22 23:00:22 - noveum_trace.transport.http_transport - INFO - ✅ Trace 39999cf9-16cd-4b1f-8167-39953b328da6 successfully queued for export


2025-10-22 23:00:22 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:00:24 - noveum_trace.transport.batch_processor - INFO - ⏰ TIMEOUT TRIGGER: Sending batch due to timeout (5.1s >= 5.0s)
2025-10-22 23:00:24 - noveum_trace.transport.batch_processor - INFO - 📤 SENDING BATCH: 3 traces via send_callback
2025-10-22 23:00:24 - noveum_trace.transport.http_transport - INFO - 🚀 SENDING BATCH: 3 traces to https://api.noveum.ai/api/v1/traces
2025-10-22 23:00:24 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: 0829f590-d837-4dee-9e65-feee419011b2) - 1 spans
2025-10-22 23:00:24 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: 0829f590-d837-4dee-9e65-feee419011b2) - 1 spans
2025-10-22 23:00:24 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 0829f590-d837-4dee-9e65-feee419011b2
2025-10-22 23:00:24 - noveum_trace.transport.http_transport - INFO - ✅ Trace 0829f590-d837-4dee-9e65-feee419011b2 successfully queued for export


2025-10-22 23:00:24 - INFO - novaeval.evaluators.agent_evaluator - Saving intermediate results after 9 samples
2025-10-22 23:00:24 - INFO - novaeval.evaluators.agent_evaluator - Intermediate results saved to demo_results/agent.query_routing_dataset/agent_evaluation_results.csv


Evaluating samples: 9it [00:41,  4.21s/it]

2025-10-22 23:00:24 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:00:24 - noveum_trace.transport.http_transport - INFO - 📡 HTTP RESPONSE: Status 200 from https://api.noveum.ai/api/v1/traces
2025-10-22 23:00:24 - noveum_trace.transport.http_transport - INFO - ✅ Successfully sent batch of 3 traces
2025-10-22 23:00:24 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully sent batch of 3 traces via callback
2025-10-22 23:00:26 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: 4e2bb2ea-fcec-476a-8505-3f15ad8243ca) - 1 spans
2025-10-22 23:00:26 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: 4e2bb2ea-fcec-476a-8505-3f15ad8243ca) - 1 spans
2025-10-22 23:00:26 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 4e2bb2ea-fcec-476a-8505-3f15ad8243ca
2025-10-22 23:00:26 - noveum_trace.transport.http_transport - INFO - ✅ Trace 4e2bb2ea-fcec-476a-8505-3f15ad8243ca successfully queued for export


2025-10-22 23:00:26 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:00:28 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: a091f50f-c6f6-4e99-a812-c427266b63f3) - 1 spans
2025-10-22 23:00:28 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: a091f50f-c6f6-4e99-a812-c427266b63f3) - 1 spans
2025-10-22 23:00:28 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace a091f50f-c6f6-4e99-a812-c427266b63f3
2025-10-22 23:00:28 - noveum_trace.transport.http_transport - INFO - ✅ Trace a091f50f-c6f6-4e99-a812-c427266b63f3 successfully queued for export


2025-10-22 23:00:28 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:00:29 - noveum_trace.transport.batch_processor - INFO - ⏰ TIMEOUT TRIGGER: Sending batch due to timeout (5.4s >= 5.0s)
2025-10-22 23:00:29 - noveum_trace.transport.batch_processor - INFO - 📤 SENDING BATCH: 3 traces via send_callback
2025-10-22 23:00:29 - noveum_trace.transport.http_transport - INFO - 🚀 SENDING BATCH: 3 traces to https://api.noveum.ai/api/v1/traces
2025-10-22 23:00:30 - noveum_trace.transport.http_transport - INFO - 📡 HTTP RESPONSE: Status 200 from https://api.noveum.ai/api/v1/traces
2025-10-22 23:00:30 - noveum_trace.transport.http_transport - INFO - ✅ Successfully sent batch of 3 traces
2025-10-22 23:00:30 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully sent batch of 3 traces via callback
2025-10-22 23:00:30 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: 236d1ec0-daf8-46f7-be50-ea50e4a5e1c6) - 1 spans
2025-10-22 23:00:30 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEU

2025-10-22 23:00:30 - INFO - novaeval.evaluators.agent_evaluator - Saving intermediate results after 10 samples
2025-10-22 23:00:30 - INFO - novaeval.evaluators.agent_evaluator - Intermediate results saved to demo_results/agent.query_routing_dataset/agent_evaluation_results.csv


Evaluating samples: 10it [00:47,  4.74s/it]

2025-10-22 23:00:30 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:00:31 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: 609077f6-7021-4b16-83e5-d3ed68573359) - 1 spans
2025-10-22 23:00:31 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: 609077f6-7021-4b16-83e5-d3ed68573359) - 1 spans
2025-10-22 23:00:31 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 609077f6-7021-4b16-83e5-d3ed68573359
2025-10-22 23:00:31 - noveum_trace.transport.http_transport - INFO - ✅ Trace 609077f6-7021-4b16-83e5-d3ed68573359 successfully queued for export


2025-10-22 23:00:31 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:00:32 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: c2bcfc15-818d-4d23-91ef-c38392c4ce9a) - 1 spans
2025-10-22 23:00:32 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: c2bcfc15-818d-4d23-91ef-c38392c4ce9a) - 1 spans
2025-10-22 23:00:32 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace c2bcfc15-818d-4d23-91ef-c38392c4ce9a
2025-10-22 23:00:32 - noveum_trace.transport.http_transport - INFO - ✅ Trace c2bcfc15-818d-4d23-91ef-c38392c4ce9a successfully queued for export


2025-10-22 23:00:32 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:00:34 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: 027b030d-d0d7-49f7-a9e8-8933f7829573) - 1 spans
2025-10-22 23:00:34 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: 027b030d-d0d7-49f7-a9e8-8933f7829573) - 1 spans
2025-10-22 23:00:34 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 027b030d-d0d7-49f7-a9e8-8933f7829573
2025-10-22 23:00:34 - noveum_trace.transport.http_transport - INFO - ✅ Trace 027b030d-d0d7-49f7-a9e8-8933f7829573 successfully queued for export


2025-10-22 23:00:34 - INFO - novaeval.evaluators.agent_evaluator - Saving intermediate results after 11 samples
2025-10-22 23:00:34 - INFO - novaeval.evaluators.agent_evaluator - Intermediate results saved to demo_results/agent.query_routing_dataset/agent_evaluation_results.csv


Evaluating samples: 11it [00:51,  4.41s/it]

2025-10-22 23:00:34 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:00:35 - noveum_trace.transport.batch_processor - INFO - ⏰ TIMEOUT TRIGGER: Sending batch due to timeout (5.2s >= 5.0s)
2025-10-22 23:00:35 - noveum_trace.transport.batch_processor - INFO - 📤 SENDING BATCH: 4 traces via send_callback
2025-10-22 23:00:35 - noveum_trace.transport.http_transport - INFO - 🚀 SENDING BATCH: 4 traces to https://api.noveum.ai/api/v1/traces
2025-10-22 23:00:35 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: 3b7a2d72-a669-4745-939e-e3eba5fb166b) - 1 spans
2025-10-22 23:00:35 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: 3b7a2d72-a669-4745-939e-e3eba5fb166b) - 1 spans
2025-10-22 23:00:35 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 3b7a2d72-a669-4745-939e-e3eba5fb166b
2025-10-22 23:00:35 - noveum_trace.transport.http_transport - INFO - ✅ Trace 3b7a2d72-a669-4745-939e-e3eba5fb166b successfully queued for export


2025-10-22 23:00:35 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:00:36 - noveum_trace.transport.http_transport - INFO - 📡 HTTP RESPONSE: Status 200 from https://api.noveum.ai/api/v1/traces
2025-10-22 23:00:36 - noveum_trace.transport.http_transport - INFO - ✅ Successfully sent batch of 4 traces
2025-10-22 23:00:36 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully sent batch of 4 traces via callback
2025-10-22 23:00:36 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: 3155450e-76b4-4a55-b15b-5c369a6cea5c) - 1 spans
2025-10-22 23:00:36 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: 3155450e-76b4-4a55-b15b-5c369a6cea5c) - 1 spans
2025-10-22 23:00:36 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 3155450e-76b4-4a55-b15b-5c369a6cea5c
2025-10-22 23:00:36 - noveum_trace.transport.http_transport - INFO - ✅ Trace 3155450e-76b4-4a55-b15b-5c369a6cea5c successfully queued for export


2025-10-22 23:00:36 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:00:38 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: 21337429-011a-4aba-a4e5-83ac5be321ce) - 1 spans
2025-10-22 23:00:38 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: 21337429-011a-4aba-a4e5-83ac5be321ce) - 1 spans
2025-10-22 23:00:38 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 21337429-011a-4aba-a4e5-83ac5be321ce
2025-10-22 23:00:38 - noveum_trace.transport.http_transport - INFO - ✅ Trace 21337429-011a-4aba-a4e5-83ac5be321ce successfully queued for export


2025-10-22 23:00:38 - INFO - novaeval.evaluators.agent_evaluator - Saving intermediate results after 12 samples
2025-10-22 23:00:38 - INFO - novaeval.evaluators.agent_evaluator - Intermediate results saved to demo_results/agent.query_routing_dataset/agent_evaluation_results.csv


Evaluating samples: 12it [00:55,  4.40s/it]

2025-10-22 23:00:38 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:00:40 - noveum_trace.transport.batch_processor - INFO - ⏰ TIMEOUT TRIGGER: Sending batch due to timeout (5.4s >= 5.0s)
2025-10-22 23:00:40 - noveum_trace.transport.batch_processor - INFO - 📤 SENDING BATCH: 3 traces via send_callback
2025-10-22 23:00:40 - noveum_trace.transport.http_transport - INFO - 🚀 SENDING BATCH: 3 traces to https://api.noveum.ai/api/v1/traces
2025-10-22 23:00:40 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: 4ebfd611-8645-4781-a3fa-adc5413e7ef9) - 1 spans
2025-10-22 23:00:40 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: 4ebfd611-8645-4781-a3fa-adc5413e7ef9) - 1 spans
2025-10-22 23:00:40 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 4ebfd611-8645-4781-a3fa-adc5413e7ef9
2025-10-22 23:00:40 - noveum_trace.transport.http_transport - INFO - ✅ Trace 4ebfd611-8645-4781-a3fa-adc5413e7ef9 successfully queued for export


2025-10-22 23:00:40 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:00:41 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: 622d34b2-775e-4533-9b86-f3a269f7a87d) - 1 spans
2025-10-22 23:00:41 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: 622d34b2-775e-4533-9b86-f3a269f7a87d) - 1 spans
2025-10-22 23:00:41 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 622d34b2-775e-4533-9b86-f3a269f7a87d
2025-10-22 23:00:41 - noveum_trace.transport.http_transport - INFO - ✅ Trace 622d34b2-775e-4533-9b86-f3a269f7a87d successfully queued for export


2025-10-22 23:00:41 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:00:43 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: 094c79f8-e0c9-4ce7-b493-3ead460412da) - 1 spans
2025-10-22 23:00:43 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: 094c79f8-e0c9-4ce7-b493-3ead460412da) - 1 spans
2025-10-22 23:00:43 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 094c79f8-e0c9-4ce7-b493-3ead460412da
2025-10-22 23:00:43 - noveum_trace.transport.http_transport - INFO - ✅ Trace 094c79f8-e0c9-4ce7-b493-3ead460412da successfully queued for export


2025-10-22 23:00:43 - INFO - novaeval.evaluators.agent_evaluator - Saving intermediate results after 13 samples
2025-10-22 23:00:43 - INFO - novaeval.evaluators.agent_evaluator - Intermediate results saved to demo_results/agent.query_routing_dataset/agent_evaluation_results.csv


Evaluating samples: 13it [01:00,  4.59s/it]

2025-10-22 23:00:43 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:00:43 - noveum_trace.transport.http_transport - INFO - 📡 HTTP RESPONSE: Status 200 from https://api.noveum.ai/api/v1/traces
2025-10-22 23:00:43 - noveum_trace.transport.http_transport - INFO - ✅ Successfully sent batch of 3 traces
2025-10-22 23:00:43 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully sent batch of 3 traces via callback
2025-10-22 23:00:45 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: ba787a5a-c805-4920-bcb6-54f73a11e9be) - 1 spans
2025-10-22 23:00:45 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: ba787a5a-c805-4920-bcb6-54f73a11e9be) - 1 spans
2025-10-22 23:00:45 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace ba787a5a-c805-4920-bcb6-54f73a11e9be
2025-10-22 23:00:45 - noveum_trace.transport.http_transport - INFO - ✅ Trace ba787a5a-c805-4920-bcb6-54f73a11e9be successfully queued for export


2025-10-22 23:00:45 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:00:45 - noveum_trace.transport.batch_processor - INFO - ⏰ TIMEOUT TRIGGER: Sending batch due to timeout (5.5s >= 5.0s)
2025-10-22 23:00:45 - noveum_trace.transport.batch_processor - INFO - 📤 SENDING BATCH: 4 traces via send_callback
2025-10-22 23:00:45 - noveum_trace.transport.http_transport - INFO - 🚀 SENDING BATCH: 4 traces to https://api.noveum.ai/api/v1/traces
2025-10-22 23:00:46 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: f7978f02-3999-4b3e-adc7-86d57c952fff) - 1 spans
2025-10-22 23:00:46 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: f7978f02-3999-4b3e-adc7-86d57c952fff) - 1 spans
2025-10-22 23:00:46 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace f7978f02-3999-4b3e-adc7-86d57c952fff
2025-10-22 23:00:46 - noveum_trace.transport.http_transport - INFO - ✅ Trace f7978f02-3999-4b3e-adc7-86d57c952fff successfully queued for export


2025-10-22 23:00:46 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:00:47 - noveum_trace.transport.http_transport - INFO - 📡 HTTP RESPONSE: Status 200 from https://api.noveum.ai/api/v1/traces
2025-10-22 23:00:47 - noveum_trace.transport.http_transport - INFO - ✅ Successfully sent batch of 4 traces
2025-10-22 23:00:47 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully sent batch of 4 traces via callback
2025-10-22 23:00:47 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: f7c102f2-ae59-4acf-8a11-c8763070bc2d) - 1 spans
2025-10-22 23:00:47 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: f7c102f2-ae59-4acf-8a11-c8763070bc2d) - 1 spans
2025-10-22 23:00:47 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace f7c102f2-ae59-4acf-8a11-c8763070bc2d
2025-10-22 23:00:47 - noveum_trace.transport.http_transport - INFO - ✅ Trace f7c102f2-ae59-4acf-8a11-c8763070bc2d successfully queued for export


2025-10-22 23:00:47 - INFO - novaeval.evaluators.agent_evaluator - Saving intermediate results after 14 samples
2025-10-22 23:00:47 - INFO - novaeval.evaluators.agent_evaluator - Intermediate results saved to demo_results/agent.query_routing_dataset/agent_evaluation_results.csv


Evaluating samples: 14it [01:04,  4.50s/it]

2025-10-22 23:00:47 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:00:48 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: 499bdbdf-4ddd-4197-a8e4-8fa056cf2c81) - 1 spans
2025-10-22 23:00:48 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: 499bdbdf-4ddd-4197-a8e4-8fa056cf2c81) - 1 spans
2025-10-22 23:00:48 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 499bdbdf-4ddd-4197-a8e4-8fa056cf2c81
2025-10-22 23:00:48 - noveum_trace.transport.http_transport - INFO - ✅ Trace 499bdbdf-4ddd-4197-a8e4-8fa056cf2c81 successfully queued for export


2025-10-22 23:00:48 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:00:50 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: 9941d150-0874-4f64-8348-a5e8a777128e) - 1 spans
2025-10-22 23:00:50 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: 9941d150-0874-4f64-8348-a5e8a777128e) - 1 spans
2025-10-22 23:00:50 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 9941d150-0874-4f64-8348-a5e8a777128e
2025-10-22 23:00:50 - noveum_trace.transport.http_transport - INFO - ✅ Trace 9941d150-0874-4f64-8348-a5e8a777128e successfully queued for export


2025-10-22 23:00:50 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:00:51 - noveum_trace.transport.batch_processor - INFO - ⏰ TIMEOUT TRIGGER: Sending batch due to timeout (5.2s >= 5.0s)
2025-10-22 23:00:51 - noveum_trace.transport.batch_processor - INFO - 📤 SENDING BATCH: 4 traces via send_callback
2025-10-22 23:00:51 - noveum_trace.transport.http_transport - INFO - 🚀 SENDING BATCH: 4 traces to https://api.noveum.ai/api/v1/traces
2025-10-22 23:00:51 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: b4820be8-ae51-49bc-97d5-6aaf3f26ccbf) - 1 spans
2025-10-22 23:00:51 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: b4820be8-ae51-49bc-97d5-6aaf3f26ccbf) - 1 spans
2025-10-22 23:00:51 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace b4820be8-ae51-49bc-97d5-6aaf3f26ccbf
2025-10-22 23:00:51 - noveum_trace.transport.http_transport - INFO - ✅ Trace b4820be8-ae51-49bc-97d5-6aaf3f26ccbf successfully queued for export


2025-10-22 23:00:51 - INFO - novaeval.evaluators.agent_evaluator - Saving intermediate results after 15 samples
2025-10-22 23:00:51 - INFO - novaeval.evaluators.agent_evaluator - Intermediate results saved to demo_results/agent.query_routing_dataset/agent_evaluation_results.csv


Evaluating samples: 15it [01:08,  4.18s/it]

2025-10-22 23:00:51 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:00:52 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: 58a8930f-83c2-4717-9f71-5a34660d8fa2) - 1 spans
2025-10-22 23:00:52 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: 58a8930f-83c2-4717-9f71-5a34660d8fa2) - 1 spans
2025-10-22 23:00:52 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 58a8930f-83c2-4717-9f71-5a34660d8fa2
2025-10-22 23:00:52 - noveum_trace.transport.http_transport - INFO - ✅ Trace 58a8930f-83c2-4717-9f71-5a34660d8fa2 successfully queued for export


2025-10-22 23:00:52 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:00:52 - noveum_trace.transport.http_transport - INFO - 📡 HTTP RESPONSE: Status 200 from https://api.noveum.ai/api/v1/traces
2025-10-22 23:00:52 - noveum_trace.transport.http_transport - INFO - ✅ Successfully sent batch of 4 traces
2025-10-22 23:00:52 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully sent batch of 4 traces via callback
2025-10-22 23:00:53 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: 9572f06a-e0ba-4e67-81eb-eaad1774cc15) - 1 spans
2025-10-22 23:00:53 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: 9572f06a-e0ba-4e67-81eb-eaad1774cc15) - 1 spans
2025-10-22 23:00:53 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 9572f06a-e0ba-4e67-81eb-eaad1774cc15
2025-10-22 23:00:53 - noveum_trace.transport.http_transport - INFO - ✅ Trace 9572f06a-e0ba-4e67-81eb-eaad1774cc15 successfully queued for export


2025-10-22 23:00:53 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:00:54 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: 11aa6f5b-1e71-440f-b4d7-cfa0b1a3ae56) - 1 spans
2025-10-22 23:00:54 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: 11aa6f5b-1e71-440f-b4d7-cfa0b1a3ae56) - 1 spans
2025-10-22 23:00:54 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 11aa6f5b-1e71-440f-b4d7-cfa0b1a3ae56
2025-10-22 23:00:54 - noveum_trace.transport.http_transport - INFO - ✅ Trace 11aa6f5b-1e71-440f-b4d7-cfa0b1a3ae56 successfully queued for export


2025-10-22 23:00:54 - INFO - novaeval.evaluators.agent_evaluator - Saving intermediate results after 16 samples
2025-10-22 23:00:54 - INFO - novaeval.evaluators.agent_evaluator - Intermediate results saved to demo_results/agent.query_routing_dataset/agent_evaluation_results.csv


Evaluating samples: 16it [01:11,  3.92s/it]

2025-10-22 23:00:54 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:00:56 - noveum_trace.transport.batch_processor - INFO - ⏰ TIMEOUT TRIGGER: Sending batch due to timeout (5.4s >= 5.0s)
2025-10-22 23:00:56 - noveum_trace.transport.batch_processor - INFO - 📤 SENDING BATCH: 4 traces via send_callback
2025-10-22 23:00:56 - noveum_trace.transport.http_transport - INFO - 🚀 SENDING BATCH: 4 traces to https://api.noveum.ai/api/v1/traces
2025-10-22 23:00:56 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: 1700f8dd-b141-4677-9778-db8440931e33) - 1 spans
2025-10-22 23:00:56 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: 1700f8dd-b141-4677-9778-db8440931e33) - 1 spans
2025-10-22 23:00:56 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 1700f8dd-b141-4677-9778-db8440931e33
2025-10-22 23:00:56 - noveum_trace.transport.http_transport - INFO - ✅ Trace 1700f8dd-b141-4677-9778-db8440931e33 successfully queued for export


2025-10-22 23:00:56 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:00:57 - noveum_trace.transport.http_transport - INFO - 📡 HTTP RESPONSE: Status 200 from https://api.noveum.ai/api/v1/traces
2025-10-22 23:00:57 - noveum_trace.transport.http_transport - INFO - ✅ Successfully sent batch of 4 traces
2025-10-22 23:00:57 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully sent batch of 4 traces via callback
2025-10-22 23:00:58 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: 4ba7a1c6-56f2-40bb-bb89-bbcd9bf864b9) - 1 spans
2025-10-22 23:00:58 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: 4ba7a1c6-56f2-40bb-bb89-bbcd9bf864b9) - 1 spans
2025-10-22 23:00:58 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 4ba7a1c6-56f2-40bb-bb89-bbcd9bf864b9
2025-10-22 23:00:58 - noveum_trace.transport.http_transport - INFO - ✅ Trace 4ba7a1c6-56f2-40bb-bb89-bbcd9bf864b9 successfully queued for export


2025-10-22 23:00:58 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:00:59 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: c25eaffa-a215-4fb8-af0d-0d0fcf0acde9) - 1 spans
2025-10-22 23:00:59 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: c25eaffa-a215-4fb8-af0d-0d0fcf0acde9) - 1 spans
2025-10-22 23:00:59 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace c25eaffa-a215-4fb8-af0d-0d0fcf0acde9
2025-10-22 23:00:59 - noveum_trace.transport.http_transport - INFO - ✅ Trace c25eaffa-a215-4fb8-af0d-0d0fcf0acde9 successfully queued for export


2025-10-22 23:00:59 - INFO - novaeval.evaluators.agent_evaluator - Saving intermediate results after 17 samples
2025-10-22 23:00:59 - INFO - novaeval.evaluators.agent_evaluator - Intermediate results saved to demo_results/agent.query_routing_dataset/agent_evaluation_results.csv


Evaluating samples: 17it [01:16,  4.19s/it]

2025-10-22 23:00:59 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:01:00 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: b925a4e7-7af5-4039-a7ab-0fdb05e03c71) - 1 spans
2025-10-22 23:01:00 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: b925a4e7-7af5-4039-a7ab-0fdb05e03c71) - 1 spans
2025-10-22 23:01:00 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace b925a4e7-7af5-4039-a7ab-0fdb05e03c71
2025-10-22 23:01:00 - noveum_trace.transport.http_transport - INFO - ✅ Trace b925a4e7-7af5-4039-a7ab-0fdb05e03c71 successfully queued for export


2025-10-22 23:01:00 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:01:01 - noveum_trace.transport.batch_processor - INFO - ⏰ TIMEOUT TRIGGER: Sending batch due to timeout (5.0s >= 5.0s)
2025-10-22 23:01:01 - noveum_trace.transport.batch_processor - INFO - 📤 SENDING BATCH: 4 traces via send_callback
2025-10-22 23:01:01 - noveum_trace.transport.http_transport - INFO - 🚀 SENDING BATCH: 4 traces to https://api.noveum.ai/api/v1/traces
2025-10-22 23:01:01 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: 9639d4eb-e562-4372-a019-e6eb2f867d0f) - 1 spans
2025-10-22 23:01:01 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: 9639d4eb-e562-4372-a019-e6eb2f867d0f) - 1 spans
2025-10-22 23:01:01 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 9639d4eb-e562-4372-a019-e6eb2f867d0f
2025-10-22 23:01:01 - noveum_trace.transport.http_transport - INFO - ✅ Trace 9639d4eb-e562-4372-a019-e6eb2f867d0f successfully queued for export


2025-10-22 23:01:01 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:01:01 - noveum_trace.transport.http_transport - INFO - 📡 HTTP RESPONSE: Status 200 from https://api.noveum.ai/api/v1/traces
2025-10-22 23:01:01 - noveum_trace.transport.http_transport - INFO - ✅ Successfully sent batch of 4 traces
2025-10-22 23:01:01 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully sent batch of 4 traces via callback
2025-10-22 23:01:03 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: 02b5f22f-eb99-4514-9a0d-fb74709ade16) - 1 spans
2025-10-22 23:01:03 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: 02b5f22f-eb99-4514-9a0d-fb74709ade16) - 1 spans
2025-10-22 23:01:03 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 02b5f22f-eb99-4514-9a0d-fb74709ade16
2025-10-22 23:01:03 - noveum_trace.transport.http_transport - INFO - ✅ Trace 02b5f22f-eb99-4514-9a0d-fb74709ade16 successfully queued for export


2025-10-22 23:01:03 - INFO - novaeval.evaluators.agent_evaluator - Saving intermediate results after 18 samples
2025-10-22 23:01:03 - INFO - novaeval.evaluators.agent_evaluator - Intermediate results saved to demo_results/agent.query_routing_dataset/agent_evaluation_results.csv


Evaluating samples: 18it [01:20,  4.06s/it]

2025-10-22 23:01:03 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:01:04 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: ed8a64df-f92b-4a38-b236-02cd5720da91) - 1 spans
2025-10-22 23:01:04 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: ed8a64df-f92b-4a38-b236-02cd5720da91) - 1 spans
2025-10-22 23:01:04 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace ed8a64df-f92b-4a38-b236-02cd5720da91
2025-10-22 23:01:04 - noveum_trace.transport.http_transport - INFO - ✅ Trace ed8a64df-f92b-4a38-b236-02cd5720da91 successfully queued for export


2025-10-22 23:01:04 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:01:06 - noveum_trace.transport.batch_processor - INFO - ⏰ TIMEOUT TRIGGER: Sending batch due to timeout (5.3s >= 5.0s)
2025-10-22 23:01:06 - noveum_trace.transport.batch_processor - INFO - 📤 SENDING BATCH: 3 traces via send_callback
2025-10-22 23:01:06 - noveum_trace.transport.http_transport - INFO - 🚀 SENDING BATCH: 3 traces to https://api.noveum.ai/api/v1/traces
2025-10-22 23:01:06 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: 65462cbf-b85d-40b9-b50d-a71850336ebe) - 1 spans
2025-10-22 23:01:06 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: 65462cbf-b85d-40b9-b50d-a71850336ebe) - 1 spans
2025-10-22 23:01:06 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 65462cbf-b85d-40b9-b50d-a71850336ebe
2025-10-22 23:01:06 - noveum_trace.transport.http_transport - INFO - ✅ Trace 65462cbf-b85d-40b9-b50d-a71850336ebe successfully queued for export


2025-10-22 23:01:06 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:01:07 - noveum_trace.transport.http_transport - INFO - 📡 HTTP RESPONSE: Status 200 from https://api.noveum.ai/api/v1/traces
2025-10-22 23:01:07 - noveum_trace.transport.http_transport - INFO - ✅ Successfully sent batch of 3 traces
2025-10-22 23:01:07 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully sent batch of 3 traces via callback
2025-10-22 23:01:08 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: a174d9f5-fa2d-4c6d-a674-341e8382a11d) - 1 spans
2025-10-22 23:01:08 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: a174d9f5-fa2d-4c6d-a674-341e8382a11d) - 1 spans
2025-10-22 23:01:08 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace a174d9f5-fa2d-4c6d-a674-341e8382a11d
2025-10-22 23:01:08 - noveum_trace.transport.http_transport - INFO - ✅ Trace a174d9f5-fa2d-4c6d-a674-341e8382a11d successfully queued for export


2025-10-22 23:01:08 - INFO - novaeval.evaluators.agent_evaluator - Saving intermediate results after 19 samples
2025-10-22 23:01:08 - INFO - novaeval.evaluators.agent_evaluator - Intermediate results saved to demo_results/agent.query_routing_dataset/agent_evaluation_results.csv


Evaluating samples: 19it [01:25,  4.36s/it]

2025-10-22 23:01:08 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:01:09 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: 672a16dd-589f-4d48-a8e4-c9e6bb5d871b) - 1 spans
2025-10-22 23:01:09 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: 672a16dd-589f-4d48-a8e4-c9e6bb5d871b) - 1 spans
2025-10-22 23:01:09 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 672a16dd-589f-4d48-a8e4-c9e6bb5d871b
2025-10-22 23:01:09 - noveum_trace.transport.http_transport - INFO - ✅ Trace 672a16dd-589f-4d48-a8e4-c9e6bb5d871b successfully queued for export


2025-10-22 23:01:09 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:01:11 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: 89085ca6-0aa1-44f9-9d93-804b9725c2c0) - 1 spans
2025-10-22 23:01:11 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: 89085ca6-0aa1-44f9-9d93-804b9725c2c0) - 1 spans
2025-10-22 23:01:11 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 89085ca6-0aa1-44f9-9d93-804b9725c2c0
2025-10-22 23:01:11 - noveum_trace.transport.http_transport - INFO - ✅ Trace 89085ca6-0aa1-44f9-9d93-804b9725c2c0 successfully queued for export


2025-10-22 23:01:11 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:01:12 - noveum_trace.transport.batch_processor - INFO - ⏰ TIMEOUT TRIGGER: Sending batch due to timeout (5.3s >= 5.0s)
2025-10-22 23:01:12 - noveum_trace.transport.batch_processor - INFO - 📤 SENDING BATCH: 4 traces via send_callback
2025-10-22 23:01:12 - noveum_trace.transport.http_transport - INFO - 🚀 SENDING BATCH: 4 traces to https://api.noveum.ai/api/v1/traces
2025-10-22 23:01:12 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: b7a046e6-46d4-41f7-a47b-35dae73bb923) - 1 spans
2025-10-22 23:01:12 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: b7a046e6-46d4-41f7-a47b-35dae73bb923) - 1 spans
2025-10-22 23:01:12 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace b7a046e6-46d4-41f7-a47b-35dae73bb923
2025-10-22 23:01:12 - noveum_trace.transport.http_transport - INFO - ✅ Trace b7a046e6-46d4-41f7-a47b-35dae73bb923 successfully queued for export


2025-10-22 23:01:12 - INFO - novaeval.evaluators.agent_evaluator - Saving intermediate results after 20 samples
2025-10-22 23:01:12 - INFO - novaeval.evaluators.agent_evaluator - Intermediate results saved to demo_results/agent.query_routing_dataset/agent_evaluation_results.csv


Evaluating samples: 20it [01:29,  4.46s/it]

2025-10-22 23:01:12 - INFO - novaeval.evaluators.agent_evaluator - Saving final results
2025-10-22 23:01:12 - INFO - novaeval.evaluators.agent_evaluator - Reloaded 20 results from CSV
2025-10-22 23:01:12 - INFO - novaeval.evaluators.agent_evaluator - Agent evaluation completed

✅ Evaluation completed!

📊 Results Summary:
  - task_progression: 1.27
  - context_relevancy: 7.36
  - role_adherence: 5.45
  - tool_relevancy: 0.00
  - parameter_correctness: 0.00

🔍 Individual Scores:

  Record 1 (Task: eda4fe22-9a2b-4b73-856b-f4f3309bf719):
    - task_progression: 1.0
    - context_relevancy: 7.5
    - role_adherence: 1.0
    - tool_relevancy: 0.0
    - parameter_correctness: 0.0

  Record 2 (Task: 0ffffba1-8a37-443c-8866-d53ffbfa7718):
    - task_progression: 2.8
    - context_relevancy: 7.8
    - role_adherence: 10.0
    - tool_relevancy: 0.0
    - parameter_correctness: 0.0

  Record 3 (Task: f1f37bd7-0851-4659-b493-b80d3800d920):
    - task_progression: 1.0
    - context_relevancy: 7.5
  




✅ Gemini model initialized
✅ Initialized 5 scoring functions:
  - task_progression_scorer
  - context_relevancy_scorer
  - role_adherence_scorer
  - tool_relevancy_scorer
  - parameter_correctness_scorer

✅ AgentEvaluator created with Gemini model and scoring functions
✅ Evaluation components ready!

📋 Step 6: Running Evaluation
🎯 Evaluating 25 samples...
🚀 Running evaluation on sample data...

📊 Evaluating 0 sample records...
2025-10-22 23:01:12 - INFO - novaeval.evaluators.agent_evaluator - Starting agent evaluation process


Evaluating samples: 0it [00:00, ?it/s]

2025-10-22 23:01:12 - INFO - novaeval.evaluators.agent_evaluator - Saving final results
2025-10-22 23:01:12 - INFO - novaeval.evaluators.agent_evaluator - Agent evaluation completed

✅ Evaluation completed!
❌ Results file not found
❌ Evaluation failed

📋 Step 7: Exporting Dataset
💾 Exporting processed dataset...
✅ Exported to ./processed_datasets/tool-orchestator_dataset_processed_dataset.json
✅ Exported to ./processed_datasets/tool-orchestator_dataset_processed_dataset.csv
✅ Export completed successfully!

🎉 EVALUATION PIPELINE COMPLETED!
📊 Final Results:
  - File processed: split_datasets/tool-orchestator_dataset.json
  - Spans loaded: 20
  - Dataset size: 20
  - Evaluation completed: False
  - Export successful: True
  - Errors encountered: 1
Completed tool-orchestator_dataset.json

Processing agent.web_search_generation_dataset.json...
🚀 Starting Complete Agent Evaluation Pipeline
📁 Processing file: split_datasets/agent.web_search_generation_dataset.json

📋 Step 1: Environment Setu


Evaluating samples: 0it [00:00, ?it/s]

2025-10-22 23:01:12 - INFO - novaeval.evaluators.agent_evaluator - Saving final results
2025-10-22 23:01:12 - INFO - novaeval.evaluators.agent_evaluator - Agent evaluation completed

✅ Evaluation completed!
❌ Results file not found
❌ Evaluation failed

📋 Step 7: Exporting Dataset
💾 Exporting processed dataset...
✅ Exported to ./processed_datasets/agent.web_search_generation_dataset_processed_dataset.json
✅ Exported to ./processed_datasets/agent.web_search_generation_dataset_processed_dataset.csv
✅ Export completed successfully!

🎉 EVALUATION PIPELINE COMPLETED!
📊 Final Results:
  - File processed: split_datasets/agent.web_search_generation_dataset.json
  - Spans loaded: 8
  - Dataset size: 8
  - Evaluation completed: False
  - Export successful: True
  - Errors encountered: 1
Completed agent.web_search_generation_dataset.json

Processing agent.rag_evaluation_metrics_dataset.json...
🚀 Starting Complete Agent Evaluation Pipeline
📁 Processing file: split_datasets/agent.rag_evaluation_metr


Evaluating samples: 0it [00:00, ?it/s]

2025-10-22 23:01:12 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:01:12 - noveum_trace.transport.http_transport - INFO - 📡 HTTP RESPONSE: Status 200 from https://api.noveum.ai/api/v1/traces
2025-10-22 23:01:12 - noveum_trace.transport.http_transport - INFO - ✅ Successfully sent batch of 4 traces
2025-10-22 23:01:12 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully sent batch of 4 traces via callback
2025-10-22 23:01:13 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: eb775ca5-3cc0-47f9-b26b-0e3e770dadd5) - 1 spans
2025-10-22 23:01:13 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: eb775ca5-3cc0-47f9-b26b-0e3e770dadd5) - 1 spans
2025-10-22 23:01:13 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace eb775ca5-3cc0-47f9-b26b-0e3e770dadd5
2025-10-22 23:01:13 - noveum_trace.transport.http_transport - INFO - ✅ Trace eb775ca5-3cc0-47f9-b26b-0e3e770dadd5 successfully queued for export


2025-10-22 23:01:13 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:01:16 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: cce316e9-4f3f-4d2b-bdba-b2856b9c4181) - 1 spans
2025-10-22 23:01:16 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: cce316e9-4f3f-4d2b-bdba-b2856b9c4181) - 1 spans
2025-10-22 23:01:16 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace cce316e9-4f3f-4d2b-bdba-b2856b9c4181
2025-10-22 23:01:16 - noveum_trace.transport.http_transport - INFO - ✅ Trace cce316e9-4f3f-4d2b-bdba-b2856b9c4181 successfully queued for export


2025-10-22 23:01:16 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:01:17 - noveum_trace.transport.batch_processor - INFO - ⏰ TIMEOUT TRIGGER: Sending batch due to timeout (5.3s >= 5.0s)
2025-10-22 23:01:17 - noveum_trace.transport.batch_processor - INFO - 📤 SENDING BATCH: 3 traces via send_callback
2025-10-22 23:01:17 - noveum_trace.transport.http_transport - INFO - 🚀 SENDING BATCH: 3 traces to https://api.noveum.ai/api/v1/traces
2025-10-22 23:01:17 - noveum_trace.transport.http_transport - INFO - 📡 HTTP RESPONSE: Status 200 from https://api.noveum.ai/api/v1/traces
2025-10-22 23:01:17 - noveum_trace.transport.http_transport - INFO - ✅ Successfully sent batch of 3 traces
2025-10-22 23:01:17 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully sent batch of 3 traces via callback
2025-10-22 23:01:18 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: ef7447dd-1534-4496-9aff-02d51839cd0c) - 1 spans
2025-10-22 23:01:18 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEU

2025-10-22 23:01:18 - INFO - novaeval.evaluators.agent_evaluator - Saving intermediate results after 1 samples
2025-10-22 23:01:18 - INFO - novaeval.evaluators.agent_evaluator - Intermediate results saved to demo_results/agent.rag_evaluation_metrics_dataset/agent_evaluation_results.csv


Evaluating samples: 1it [00:06,  6.46s/it]

2025-10-22 23:01:18 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:01:20 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: d32ecc4b-d331-4e62-9ef2-ec61e6fc52bf) - 1 spans
2025-10-22 23:01:20 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: d32ecc4b-d331-4e62-9ef2-ec61e6fc52bf) - 1 spans
2025-10-22 23:01:20 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace d32ecc4b-d331-4e62-9ef2-ec61e6fc52bf
2025-10-22 23:01:20 - noveum_trace.transport.http_transport - INFO - ✅ Trace d32ecc4b-d331-4e62-9ef2-ec61e6fc52bf successfully queued for export


2025-10-22 23:01:20 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:01:22 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: 37646e1d-3bea-43ae-af16-71bf5ba46b9c) - 1 spans
2025-10-22 23:01:22 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: 37646e1d-3bea-43ae-af16-71bf5ba46b9c) - 1 spans
2025-10-22 23:01:22 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 37646e1d-3bea-43ae-af16-71bf5ba46b9c
2025-10-22 23:01:22 - noveum_trace.transport.http_transport - INFO - ✅ Trace 37646e1d-3bea-43ae-af16-71bf5ba46b9c successfully queued for export


2025-10-22 23:01:22 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:01:22 - noveum_trace.transport.batch_processor - INFO - ⏰ TIMEOUT TRIGGER: Sending batch due to timeout (5.2s >= 5.0s)
2025-10-22 23:01:22 - noveum_trace.transport.batch_processor - INFO - 📤 SENDING BATCH: 3 traces via send_callback
2025-10-22 23:01:22 - noveum_trace.transport.http_transport - INFO - 🚀 SENDING BATCH: 3 traces to https://api.noveum.ai/api/v1/traces
2025-10-22 23:01:22 - noveum_trace.transport.http_transport - INFO - 📡 HTTP RESPONSE: Status 200 from https://api.noveum.ai/api/v1/traces
2025-10-22 23:01:22 - noveum_trace.transport.http_transport - INFO - ✅ Successfully sent batch of 3 traces
2025-10-22 23:01:22 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully sent batch of 3 traces via callback
2025-10-22 23:01:23 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: 309e9e0e-8ee6-46e4-86c2-60922d034adb) - 1 spans
2025-10-22 23:01:23 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEU

2025-10-22 23:01:23 - INFO - novaeval.evaluators.agent_evaluator - Saving intermediate results after 2 samples
2025-10-22 23:01:23 - INFO - novaeval.evaluators.agent_evaluator - Intermediate results saved to demo_results/agent.rag_evaluation_metrics_dataset/agent_evaluation_results.csv


Evaluating samples: 2it [00:10,  5.24s/it]

2025-10-22 23:01:23 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:01:24 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: 7876eb8f-cae3-442a-b676-588d0e06a56b) - 1 spans
2025-10-22 23:01:24 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: 7876eb8f-cae3-442a-b676-588d0e06a56b) - 1 spans
2025-10-22 23:01:24 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 7876eb8f-cae3-442a-b676-588d0e06a56b
2025-10-22 23:01:24 - noveum_trace.transport.http_transport - INFO - ✅ Trace 7876eb8f-cae3-442a-b676-588d0e06a56b successfully queued for export


2025-10-22 23:01:24 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:01:26 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: a1a2f4ae-eaf9-4742-94d2-b2b650faf225) - 1 spans
2025-10-22 23:01:26 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: a1a2f4ae-eaf9-4742-94d2-b2b650faf225) - 1 spans
2025-10-22 23:01:26 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace a1a2f4ae-eaf9-4742-94d2-b2b650faf225
2025-10-22 23:01:26 - noveum_trace.transport.http_transport - INFO - ✅ Trace a1a2f4ae-eaf9-4742-94d2-b2b650faf225 successfully queued for export


2025-10-22 23:01:26 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:01:27 - noveum_trace.transport.batch_processor - INFO - ⏰ TIMEOUT TRIGGER: Sending batch due to timeout (5.1s >= 5.0s)
2025-10-22 23:01:27 - noveum_trace.transport.batch_processor - INFO - 📤 SENDING BATCH: 3 traces via send_callback
2025-10-22 23:01:27 - noveum_trace.transport.http_transport - INFO - 🚀 SENDING BATCH: 3 traces to https://api.noveum.ai/api/v1/traces
2025-10-22 23:01:27 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: 7fd55fac-bb86-4db1-9e20-e5ce69a772fb) - 1 spans
2025-10-22 23:01:27 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: 7fd55fac-bb86-4db1-9e20-e5ce69a772fb) - 1 spans
2025-10-22 23:01:27 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 7fd55fac-bb86-4db1-9e20-e5ce69a772fb
2025-10-22 23:01:27 - noveum_trace.transport.http_transport - INFO - ✅ Trace 7fd55fac-bb86-4db1-9e20-e5ce69a772fb successfully queued for export


2025-10-22 23:01:27 - INFO - novaeval.evaluators.agent_evaluator - Saving intermediate results after 3 samples
2025-10-22 23:01:27 - INFO - novaeval.evaluators.agent_evaluator - Intermediate results saved to demo_results/agent.rag_evaluation_metrics_dataset/agent_evaluation_results.csv


Evaluating samples: 3it [00:15,  4.93s/it]

2025-10-22 23:01:27 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:01:28 - noveum_trace.transport.http_transport - INFO - 📡 HTTP RESPONSE: Status 200 from https://api.noveum.ai/api/v1/traces
2025-10-22 23:01:28 - noveum_trace.transport.http_transport - INFO - ✅ Successfully sent batch of 3 traces
2025-10-22 23:01:28 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully sent batch of 3 traces via callback
2025-10-22 23:01:28 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: 3e3b5c0c-04e1-4bf5-8ea2-c70387c8e327) - 1 spans
2025-10-22 23:01:28 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: 3e3b5c0c-04e1-4bf5-8ea2-c70387c8e327) - 1 spans
2025-10-22 23:01:28 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 3e3b5c0c-04e1-4bf5-8ea2-c70387c8e327
2025-10-22 23:01:28 - noveum_trace.transport.http_transport - INFO - ✅ Trace 3e3b5c0c-04e1-4bf5-8ea2-c70387c8e327 successfully queued for export


2025-10-22 23:01:28 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:01:30 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: 3e9b223d-42c3-4f70-8541-e8b7c27201fc) - 1 spans
2025-10-22 23:01:30 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: 3e9b223d-42c3-4f70-8541-e8b7c27201fc) - 1 spans
2025-10-22 23:01:30 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 3e9b223d-42c3-4f70-8541-e8b7c27201fc
2025-10-22 23:01:30 - noveum_trace.transport.http_transport - INFO - ✅ Trace 3e9b223d-42c3-4f70-8541-e8b7c27201fc successfully queued for export


2025-10-22 23:01:30 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:01:31 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: f822ee4c-33fa-4a46-8a1c-a9fd7a749983) - 1 spans
2025-10-22 23:01:31 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: f822ee4c-33fa-4a46-8a1c-a9fd7a749983) - 1 spans
2025-10-22 23:01:31 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace f822ee4c-33fa-4a46-8a1c-a9fd7a749983
2025-10-22 23:01:31 - noveum_trace.transport.http_transport - INFO - ✅ Trace f822ee4c-33fa-4a46-8a1c-a9fd7a749983 successfully queued for export


2025-10-22 23:01:31 - INFO - novaeval.evaluators.agent_evaluator - Saving intermediate results after 4 samples
2025-10-22 23:01:31 - INFO - novaeval.evaluators.agent_evaluator - Intermediate results saved to demo_results/agent.rag_evaluation_metrics_dataset/agent_evaluation_results.csv


Evaluating samples: 4it [00:18,  4.36s/it]

2025-10-22 23:01:31 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:01:32 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: b9ec0566-acc2-4c49-8fa4-fa58362b7056) - 1 spans
2025-10-22 23:01:32 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: b9ec0566-acc2-4c49-8fa4-fa58362b7056) - 1 spans
2025-10-22 23:01:32 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace b9ec0566-acc2-4c49-8fa4-fa58362b7056
2025-10-22 23:01:32 - noveum_trace.transport.http_transport - INFO - ✅ Trace b9ec0566-acc2-4c49-8fa4-fa58362b7056 successfully queued for export


2025-10-22 23:01:32 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:01:32 - noveum_trace.transport.batch_processor - INFO - ⏰ TIMEOUT TRIGGER: Sending batch due to timeout (5.3s >= 5.0s)
2025-10-22 23:01:32 - noveum_trace.transport.batch_processor - INFO - 📤 SENDING BATCH: 5 traces via send_callback
2025-10-22 23:01:32 - noveum_trace.transport.http_transport - INFO - 🚀 SENDING BATCH: 5 traces to https://api.noveum.ai/api/v1/traces
2025-10-22 23:01:33 - noveum_trace.transport.http_transport - INFO - 📡 HTTP RESPONSE: Status 200 from https://api.noveum.ai/api/v1/traces
2025-10-22 23:01:33 - noveum_trace.transport.http_transport - INFO - ✅ Successfully sent batch of 5 traces
2025-10-22 23:01:33 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully sent batch of 5 traces via callback
2025-10-22 23:01:34 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: 4a7712cd-d5db-43d4-b9f9-3bf2854fd36d) - 1 spans
2025-10-22 23:01:34 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEU

2025-10-22 23:01:34 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:01:35 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: d3a4f8d6-9ac8-4312-982f-ed0ba66a9792) - 1 spans
2025-10-22 23:01:35 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: d3a4f8d6-9ac8-4312-982f-ed0ba66a9792) - 1 spans
2025-10-22 23:01:35 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace d3a4f8d6-9ac8-4312-982f-ed0ba66a9792
2025-10-22 23:01:35 - noveum_trace.transport.http_transport - INFO - ✅ Trace d3a4f8d6-9ac8-4312-982f-ed0ba66a9792 successfully queued for export


2025-10-22 23:01:35 - INFO - novaeval.evaluators.agent_evaluator - Saving intermediate results after 5 samples
2025-10-22 23:01:35 - INFO - novaeval.evaluators.agent_evaluator - Intermediate results saved to demo_results/agent.rag_evaluation_metrics_dataset/agent_evaluation_results.csv


Evaluating samples: 5it [00:22,  4.25s/it]

2025-10-22 23:01:35 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:01:36 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: 089525d7-5531-478e-b4f3-fec4d1ac533d) - 1 spans
2025-10-22 23:01:36 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: 089525d7-5531-478e-b4f3-fec4d1ac533d) - 1 spans
2025-10-22 23:01:36 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 089525d7-5531-478e-b4f3-fec4d1ac533d
2025-10-22 23:01:36 - noveum_trace.transport.http_transport - INFO - ✅ Trace 089525d7-5531-478e-b4f3-fec4d1ac533d successfully queued for export


2025-10-22 23:01:36 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:01:37 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: 2214e8fb-a31c-4111-8735-0c6c8057a66c) - 1 spans
2025-10-22 23:01:37 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: 2214e8fb-a31c-4111-8735-0c6c8057a66c) - 1 spans
2025-10-22 23:01:37 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 2214e8fb-a31c-4111-8735-0c6c8057a66c
2025-10-22 23:01:37 - noveum_trace.transport.http_transport - INFO - ✅ Trace 2214e8fb-a31c-4111-8735-0c6c8057a66c successfully queued for export


2025-10-22 23:01:37 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:01:38 - noveum_trace.transport.batch_processor - INFO - ⏰ TIMEOUT TRIGGER: Sending batch due to timeout (5.3s >= 5.0s)
2025-10-22 23:01:38 - noveum_trace.transport.batch_processor - INFO - 📤 SENDING BATCH: 4 traces via send_callback
2025-10-22 23:01:38 - noveum_trace.transport.http_transport - INFO - 🚀 SENDING BATCH: 4 traces to https://api.noveum.ai/api/v1/traces
2025-10-22 23:01:38 - noveum_trace.transport.http_transport - INFO - 📡 HTTP RESPONSE: Status 200 from https://api.noveum.ai/api/v1/traces
2025-10-22 23:01:38 - noveum_trace.transport.http_transport - INFO - ✅ Successfully sent batch of 4 traces
2025-10-22 23:01:38 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully sent batch of 4 traces via callback
2025-10-22 23:01:38 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: 4d917e77-4f61-46e5-8aed-f705e23cf42b) - 1 spans
2025-10-22 23:01:38 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEU

2025-10-22 23:01:38 - INFO - novaeval.evaluators.agent_evaluator - Saving intermediate results after 6 samples
2025-10-22 23:01:38 - INFO - novaeval.evaluators.agent_evaluator - Intermediate results saved to demo_results/agent.rag_evaluation_metrics_dataset/agent_evaluation_results.csv


Evaluating samples: 6it [00:26,  4.03s/it]

2025-10-22 23:01:38 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:01:40 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: f08a5cf5-f699-499a-bf6f-7c5ab49bee74) - 1 spans
2025-10-22 23:01:40 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: f08a5cf5-f699-499a-bf6f-7c5ab49bee74) - 1 spans
2025-10-22 23:01:40 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace f08a5cf5-f699-499a-bf6f-7c5ab49bee74
2025-10-22 23:01:40 - noveum_trace.transport.http_transport - INFO - ✅ Trace f08a5cf5-f699-499a-bf6f-7c5ab49bee74 successfully queued for export


2025-10-22 23:01:40 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:01:41 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: 22fad5c7-5c05-46dc-be41-410da97ae840) - 1 spans
2025-10-22 23:01:41 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: 22fad5c7-5c05-46dc-be41-410da97ae840) - 1 spans
2025-10-22 23:01:41 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 22fad5c7-5c05-46dc-be41-410da97ae840
2025-10-22 23:01:41 - noveum_trace.transport.http_transport - INFO - ✅ Trace 22fad5c7-5c05-46dc-be41-410da97ae840 successfully queued for export


2025-10-22 23:01:41 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:01:42 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: 81742ed5-9524-4203-aaf6-837045f5d1ce) - 1 spans
2025-10-22 23:01:42 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: 81742ed5-9524-4203-aaf6-837045f5d1ce) - 1 spans
2025-10-22 23:01:42 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 81742ed5-9524-4203-aaf6-837045f5d1ce
2025-10-22 23:01:42 - noveum_trace.transport.http_transport - INFO - ✅ Trace 81742ed5-9524-4203-aaf6-837045f5d1ce successfully queued for export


2025-10-22 23:01:42 - INFO - novaeval.evaluators.agent_evaluator - Saving intermediate results after 7 samples
2025-10-22 23:01:42 - INFO - novaeval.evaluators.agent_evaluator - Intermediate results saved to demo_results/agent.rag_evaluation_metrics_dataset/agent_evaluation_results.csv


Evaluating samples: 7it [00:30,  3.93s/it]

2025-10-22 23:01:42 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:01:43 - noveum_trace.transport.batch_processor - INFO - ⏰ TIMEOUT TRIGGER: Sending batch due to timeout (5.4s >= 5.0s)
2025-10-22 23:01:43 - noveum_trace.transport.batch_processor - INFO - 📤 SENDING BATCH: 4 traces via send_callback
2025-10-22 23:01:43 - noveum_trace.transport.http_transport - INFO - 🚀 SENDING BATCH: 4 traces to https://api.noveum.ai/api/v1/traces
2025-10-22 23:01:43 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: 8dd29a67-8d57-4f95-b0bd-9c1fa22f55da) - 1 spans
2025-10-22 23:01:43 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: 8dd29a67-8d57-4f95-b0bd-9c1fa22f55da) - 1 spans
2025-10-22 23:01:43 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 8dd29a67-8d57-4f95-b0bd-9c1fa22f55da
2025-10-22 23:01:43 - noveum_trace.transport.http_transport - INFO - ✅ Trace 8dd29a67-8d57-4f95-b0bd-9c1fa22f55da successfully queued for export


2025-10-22 23:01:43 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:01:44 - noveum_trace.transport.http_transport - INFO - 📡 HTTP RESPONSE: Status 200 from https://api.noveum.ai/api/v1/traces
2025-10-22 23:01:44 - noveum_trace.transport.http_transport - INFO - ✅ Successfully sent batch of 4 traces
2025-10-22 23:01:44 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully sent batch of 4 traces via callback
2025-10-22 23:01:45 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: ae9c03e4-9a18-4c40-a616-34ba6df7216e) - 1 spans
2025-10-22 23:01:45 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: ae9c03e4-9a18-4c40-a616-34ba6df7216e) - 1 spans
2025-10-22 23:01:45 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace ae9c03e4-9a18-4c40-a616-34ba6df7216e
2025-10-22 23:01:45 - noveum_trace.transport.http_transport - INFO - ✅ Trace ae9c03e4-9a18-4c40-a616-34ba6df7216e successfully queued for export


2025-10-22 23:01:45 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:01:46 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: aefaba52-0c8a-428c-af02-72f1145c5c99) - 1 spans
2025-10-22 23:01:46 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: aefaba52-0c8a-428c-af02-72f1145c5c99) - 1 spans
2025-10-22 23:01:46 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace aefaba52-0c8a-428c-af02-72f1145c5c99
2025-10-22 23:01:46 - noveum_trace.transport.http_transport - INFO - ✅ Trace aefaba52-0c8a-428c-af02-72f1145c5c99 successfully queued for export


2025-10-22 23:01:46 - INFO - novaeval.evaluators.agent_evaluator - Saving intermediate results after 8 samples
2025-10-22 23:01:46 - INFO - novaeval.evaluators.agent_evaluator - Intermediate results saved to demo_results/agent.rag_evaluation_metrics_dataset/agent_evaluation_results.csv


Evaluating samples: 8it [00:34,  3.87s/it]

2025-10-22 23:01:46 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:01:47 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: 6d0b6d2c-d570-40cb-8d9d-c0a1187d7ec5) - 1 spans
2025-10-22 23:01:47 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: 6d0b6d2c-d570-40cb-8d9d-c0a1187d7ec5) - 1 spans
2025-10-22 23:01:47 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 6d0b6d2c-d570-40cb-8d9d-c0a1187d7ec5
2025-10-22 23:01:47 - noveum_trace.transport.http_transport - INFO - ✅ Trace 6d0b6d2c-d570-40cb-8d9d-c0a1187d7ec5 successfully queued for export


2025-10-22 23:01:47 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:01:49 - noveum_trace.transport.batch_processor - INFO - ⏰ TIMEOUT TRIGGER: Sending batch due to timeout (5.4s >= 5.0s)
2025-10-22 23:01:49 - noveum_trace.transport.batch_processor - INFO - 📤 SENDING BATCH: 4 traces via send_callback
2025-10-22 23:01:49 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: af1cb192-c53d-4c68-bcf1-63ae80a319ab) - 1 spans
2025-10-22 23:01:49 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: af1cb192-c53d-4c68-bcf1-63ae80a319ab) - 1 spans
2025-10-22 23:01:49 - noveum_trace.transport.http_transport - INFO - 🚀 SENDING BATCH: 4 traces to https://api.noveum.ai/api/v1/traces
2025-10-22 23:01:49 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace af1cb192-c53d-4c68-bcf1-63ae80a319ab
2025-10-22 23:01:49 - noveum_trace.transport.http_transport - INFO - ✅ Trace af1cb192-c53d-4c68-bcf1-63ae80a319ab successfully queued for export


2025-10-22 23:01:49 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:01:49 - noveum_trace.transport.http_transport - INFO - 📡 HTTP RESPONSE: Status 200 from https://api.noveum.ai/api/v1/traces
2025-10-22 23:01:49 - noveum_trace.transport.http_transport - INFO - ✅ Successfully sent batch of 4 traces
2025-10-22 23:01:49 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully sent batch of 4 traces via callback
2025-10-22 23:01:50 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: 6dc7d782-6f29-481b-9738-d22d793c9d53) - 1 spans
2025-10-22 23:01:50 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: 6dc7d782-6f29-481b-9738-d22d793c9d53) - 1 spans
2025-10-22 23:01:50 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 6dc7d782-6f29-481b-9738-d22d793c9d53
2025-10-22 23:01:50 - noveum_trace.transport.http_transport - INFO - ✅ Trace 6dc7d782-6f29-481b-9738-d22d793c9d53 successfully queued for export


2025-10-22 23:01:50 - INFO - novaeval.evaluators.agent_evaluator - Saving intermediate results after 9 samples
2025-10-22 23:01:50 - INFO - novaeval.evaluators.agent_evaluator - Intermediate results saved to demo_results/agent.rag_evaluation_metrics_dataset/agent_evaluation_results.csv


Evaluating samples: 9it [00:38,  3.94s/it]

2025-10-22 23:01:50 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:01:51 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: 79ae16c8-3b0e-4e3c-bc10-9eae81586163) - 1 spans
2025-10-22 23:01:51 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: 79ae16c8-3b0e-4e3c-bc10-9eae81586163) - 1 spans
2025-10-22 23:01:51 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 79ae16c8-3b0e-4e3c-bc10-9eae81586163
2025-10-22 23:01:51 - noveum_trace.transport.http_transport - INFO - ✅ Trace 79ae16c8-3b0e-4e3c-bc10-9eae81586163 successfully queued for export


2025-10-22 23:01:51 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:01:52 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: e69630b6-5e34-4ae7-9195-929d04ef31a5) - 1 spans
2025-10-22 23:01:52 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: e69630b6-5e34-4ae7-9195-929d04ef31a5) - 1 spans
2025-10-22 23:01:52 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace e69630b6-5e34-4ae7-9195-929d04ef31a5
2025-10-22 23:01:52 - noveum_trace.transport.http_transport - INFO - ✅ Trace e69630b6-5e34-4ae7-9195-929d04ef31a5 successfully queued for export


2025-10-22 23:01:52 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:01:53 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: ef939d49-86d5-4ab4-b8a3-918e10f86639) - 1 spans
2025-10-22 23:01:53 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: ef939d49-86d5-4ab4-b8a3-918e10f86639) - 1 spans
2025-10-22 23:01:53 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace ef939d49-86d5-4ab4-b8a3-918e10f86639
2025-10-22 23:01:53 - noveum_trace.transport.http_transport - INFO - ✅ Trace ef939d49-86d5-4ab4-b8a3-918e10f86639 successfully queued for export


2025-10-22 23:01:53 - INFO - novaeval.evaluators.agent_evaluator - Saving intermediate results after 10 samples
2025-10-22 23:01:53 - INFO - novaeval.evaluators.agent_evaluator - Intermediate results saved to demo_results/agent.rag_evaluation_metrics_dataset/agent_evaluation_results.csv


Evaluating samples: 10it [00:41,  3.77s/it]

2025-10-22 23:01:53 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:01:54 - noveum_trace.transport.batch_processor - INFO - ⏰ TIMEOUT TRIGGER: Sending batch due to timeout (5.4s >= 5.0s)
2025-10-22 23:01:54 - noveum_trace.transport.batch_processor - INFO - 📤 SENDING BATCH: 5 traces via send_callback
2025-10-22 23:01:54 - noveum_trace.transport.http_transport - INFO - 🚀 SENDING BATCH: 5 traces to https://api.noveum.ai/api/v1/traces
2025-10-22 23:01:54 - noveum_trace.transport.http_transport - INFO - 📡 HTTP RESPONSE: Status 200 from https://api.noveum.ai/api/v1/traces
2025-10-22 23:01:54 - noveum_trace.transport.http_transport - INFO - ✅ Successfully sent batch of 5 traces
2025-10-22 23:01:54 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully sent batch of 5 traces via callback
2025-10-22 23:01:57 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: 926cf84f-fcb5-4e44-b36b-dfa5f9c28c55) - 1 spans
2025-10-22 23:01:57 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEU

2025-10-22 23:01:57 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:01:58 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: 5bb8255c-0f41-4735-a818-52f6115b7de1) - 1 spans
2025-10-22 23:01:58 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: 5bb8255c-0f41-4735-a818-52f6115b7de1) - 1 spans
2025-10-22 23:01:58 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 5bb8255c-0f41-4735-a818-52f6115b7de1
2025-10-22 23:01:58 - noveum_trace.transport.http_transport - INFO - ✅ Trace 5bb8255c-0f41-4735-a818-52f6115b7de1 successfully queued for export


2025-10-22 23:01:58 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:01:59 - noveum_trace.transport.batch_processor - INFO - ⏰ TIMEOUT TRIGGER: Sending batch due to timeout (5.5s >= 5.0s)
2025-10-22 23:01:59 - noveum_trace.transport.batch_processor - INFO - 📤 SENDING BATCH: 2 traces via send_callback
2025-10-22 23:01:59 - noveum_trace.transport.http_transport - INFO - 🚀 SENDING BATCH: 2 traces to https://api.noveum.ai/api/v1/traces
2025-10-22 23:02:00 - noveum_trace.transport.http_transport - INFO - 📡 HTTP RESPONSE: Status 200 from https://api.noveum.ai/api/v1/traces
2025-10-22 23:02:00 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: 96472f81-b98d-4453-97e8-b0000f4ac7b6) - 1 spans
2025-10-22 23:02:00 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: 96472f81-b98d-4453-97e8-b0000f4ac7b6) - 1 spans
2025-10-22 23:02:00 - noveum_trace.transport.http_transport - INFO - ✅ Successfully sent batch of 2 traces
2025-10-22 23:02:00 - noveum_trace.transport.

2025-10-22 23:02:00 - INFO - novaeval.evaluators.agent_evaluator - Saving intermediate results after 11 samples
2025-10-22 23:02:00 - INFO - novaeval.evaluators.agent_evaluator - Intermediate results saved to demo_results/agent.rag_evaluation_metrics_dataset/agent_evaluation_results.csv


Evaluating samples: 11it [00:47,  4.60s/it]

2025-10-22 23:02:00 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:02:02 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: 3ec56361-a2aa-45bd-ad9e-12139cd5b25b) - 1 spans
2025-10-22 23:02:02 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: 3ec56361-a2aa-45bd-ad9e-12139cd5b25b) - 1 spans
2025-10-22 23:02:02 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 3ec56361-a2aa-45bd-ad9e-12139cd5b25b
2025-10-22 23:02:02 - noveum_trace.transport.http_transport - INFO - ✅ Trace 3ec56361-a2aa-45bd-ad9e-12139cd5b25b successfully queued for export


2025-10-22 23:02:02 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:02:03 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: bd7c6734-9502-43af-9227-7e3256eac18c) - 1 spans
2025-10-22 23:02:03 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: bd7c6734-9502-43af-9227-7e3256eac18c) - 1 spans
2025-10-22 23:02:03 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace bd7c6734-9502-43af-9227-7e3256eac18c
2025-10-22 23:02:03 - noveum_trace.transport.http_transport - INFO - ✅ Trace bd7c6734-9502-43af-9227-7e3256eac18c successfully queued for export


2025-10-22 23:02:03 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:02:05 - noveum_trace.transport.batch_processor - INFO - ⏰ TIMEOUT TRIGGER: Sending batch due to timeout (5.5s >= 5.0s)
2025-10-22 23:02:05 - noveum_trace.transport.batch_processor - INFO - 📤 SENDING BATCH: 3 traces via send_callback
2025-10-22 23:02:05 - noveum_trace.transport.http_transport - INFO - 🚀 SENDING BATCH: 3 traces to https://api.noveum.ai/api/v1/traces
2025-10-22 23:02:05 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: 8c77b5d2-4e09-4e86-89db-9c4432c6a8f9) - 1 spans
2025-10-22 23:02:05 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: 8c77b5d2-4e09-4e86-89db-9c4432c6a8f9) - 1 spans
2025-10-22 23:02:05 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 8c77b5d2-4e09-4e86-89db-9c4432c6a8f9
2025-10-22 23:02:05 - noveum_trace.transport.http_transport - INFO - ✅ Trace 8c77b5d2-4e09-4e86-89db-9c4432c6a8f9 successfully queued for export


2025-10-22 23:02:05 - INFO - novaeval.evaluators.agent_evaluator - Saving intermediate results after 12 samples
2025-10-22 23:02:05 - INFO - novaeval.evaluators.agent_evaluator - Intermediate results saved to demo_results/agent.rag_evaluation_metrics_dataset/agent_evaluation_results.csv


Evaluating samples: 12it [00:52,  4.42s/it]

2025-10-22 23:02:05 - INFO - novaeval.evaluators.agent_evaluator - Saving final results
2025-10-22 23:02:05 - INFO - novaeval.evaluators.agent_evaluator - Reloaded 12 results from CSV
2025-10-22 23:02:05 - INFO - novaeval.evaluators.agent_evaluator - Agent evaluation completed

✅ Evaluation completed!

📊 Results Summary:
  - task_progression: 4.27
  - context_relevancy: 7.90
  - role_adherence: 9.08
  - tool_relevancy: 0.00
  - parameter_correctness: 0.00

🔍 Individual Scores:

  Record 1 (Task: f1f37bd7-0851-4659-b493-b80d3800d920):
    - task_progression: 3.8
    - context_relevancy: 7.8
    - role_adherence: 9.0
    - tool_relevancy: 0.0
    - parameter_correctness: 0.0

  Record 2 (Task: 52aacb67-c361-4445-9b72-c157f79f47d6):
    - task_progression: 2.8
    - context_relevancy: 7.8
    - role_adherence: 9.0
    - tool_relevancy: 0.0
    - parameter_correctness: 0.0

  Record 3 (Task: 2218f641-604c-491a-9710-b51a9941b982):
    - task_progression: 4.5
    - context_relevancy: 7.9
   


Evaluating samples: 0it [00:00, ?it/s]

2025-10-22 23:02:05 - INFO - novaeval.evaluators.agent_evaluator - Saving final results
2025-10-22 23:02:05 - INFO - novaeval.evaluators.agent_evaluator - Agent evaluation completed

✅ Evaluation completed!
❌ Results file not found
❌ Evaluation failed

📋 Step 7: Exporting Dataset
💾 Exporting processed dataset...
✅ Exported to ./processed_datasets/agent.routing_evaluation_metrics_dataset_processed_dataset.json
✅ Exported to ./processed_datasets/agent.routing_evaluation_metrics_dataset_processed_dataset.csv
✅ Export completed successfully!

🎉 EVALUATION PIPELINE COMPLETED!
📊 Final Results:
  - File processed: split_datasets/agent.routing_evaluation_metrics_dataset.json
  - Spans loaded: 20
  - Dataset size: 20
  - Evaluation completed: False
  - Export successful: True
  - Errors encountered: 1
Completed agent.routing_evaluation_metrics_dataset.json

Processing agent.llm-rag_dataset.json...
🚀 Starting Complete Agent Evaluation Pipeline
📁 Processing file: split_datasets/agent.llm-rag_data


Evaluating samples: 0it [00:00, ?it/s]

2025-10-22 23:02:05 - INFO - novaeval.evaluators.agent_evaluator - Saving final results
2025-10-22 23:02:05 - INFO - novaeval.evaluators.agent_evaluator - Agent evaluation completed

✅ Evaluation completed!
❌ Results file not found
❌ Evaluation failed

📋 Step 7: Exporting Dataset
💾 Exporting processed dataset...
✅ Exported to ./processed_datasets/agent.llm-rag_dataset_processed_dataset.json
✅ Exported to ./processed_datasets/agent.llm-rag_dataset_processed_dataset.csv
✅ Export completed successfully!

🎉 EVALUATION PIPELINE COMPLETED!
📊 Final Results:
  - File processed: split_datasets/agent.llm-rag_dataset.json
  - Spans loaded: 12
  - Dataset size: 12
  - Evaluation completed: False
  - Export successful: True
  - Errors encountered: 1
Completed agent.llm-rag_dataset.json

Processing agent.web_search_evaluation_metrics_dataset.json...
🚀 Starting Complete Agent Evaluation Pipeline
📁 Processing file: split_datasets/agent.web_search_evaluation_metrics_dataset.json

📋 Step 1: Environment 




✅ Gemini model initialized
✅ Initialized 5 scoring functions:
  - task_progression_scorer
  - context_relevancy_scorer
  - role_adherence_scorer
  - tool_relevancy_scorer
  - parameter_correctness_scorer

✅ AgentEvaluator created with Gemini model and scoring functions
✅ Evaluation components ready!

📋 Step 6: Running Evaluation
🎯 Evaluating 25 samples...
🚀 Running evaluation on sample data...

📊 Evaluating 8 sample records...
2025-10-22 23:02:05 - INFO - novaeval.evaluators.agent_evaluator - Starting agent evaluation process


Evaluating samples: 0it [00:00, ?it/s]

2025-10-22 23:02:05 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:02:05 - noveum_trace.transport.http_transport - INFO - 📡 HTTP RESPONSE: Status 200 from https://api.noveum.ai/api/v1/traces
2025-10-22 23:02:05 - noveum_trace.transport.http_transport - INFO - ✅ Successfully sent batch of 3 traces
2025-10-22 23:02:05 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully sent batch of 3 traces via callback
2025-10-22 23:02:07 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: e7588539-fcf8-44a3-9f31-efe82ba97700) - 1 spans
2025-10-22 23:02:07 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: e7588539-fcf8-44a3-9f31-efe82ba97700) - 1 spans
2025-10-22 23:02:07 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace e7588539-fcf8-44a3-9f31-efe82ba97700
2025-10-22 23:02:07 - noveum_trace.transport.http_transport - INFO - ✅ Trace e7588539-fcf8-44a3-9f31-efe82ba97700 successfully queued for export


2025-10-22 23:02:07 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:02:08 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: a47d3d33-371b-4045-9fb1-378424928ada) - 1 spans
2025-10-22 23:02:08 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: a47d3d33-371b-4045-9fb1-378424928ada) - 1 spans
2025-10-22 23:02:08 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace a47d3d33-371b-4045-9fb1-378424928ada
2025-10-22 23:02:08 - noveum_trace.transport.http_transport - INFO - ✅ Trace a47d3d33-371b-4045-9fb1-378424928ada successfully queued for export


2025-10-22 23:02:08 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:02:09 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: 69314585-f591-4d20-ba7c-81dcd5fa7aee) - 1 spans
2025-10-22 23:02:09 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: 69314585-f591-4d20-ba7c-81dcd5fa7aee) - 1 spans
2025-10-22 23:02:09 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 69314585-f591-4d20-ba7c-81dcd5fa7aee
2025-10-22 23:02:09 - noveum_trace.transport.http_transport - INFO - ✅ Trace 69314585-f591-4d20-ba7c-81dcd5fa7aee successfully queued for export


2025-10-22 23:02:09 - INFO - novaeval.evaluators.agent_evaluator - Saving intermediate results after 1 samples
2025-10-22 23:02:09 - INFO - novaeval.evaluators.agent_evaluator - Intermediate results saved to demo_results/agent.web_search_evaluation_metrics_dataset/agent_evaluation_results.csv


Evaluating samples: 1it [00:04,  4.12s/it]

2025-10-22 23:02:09 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:02:10 - noveum_trace.transport.batch_processor - INFO - ⏰ TIMEOUT TRIGGER: Sending batch due to timeout (5.4s >= 5.0s)
2025-10-22 23:02:10 - noveum_trace.transport.batch_processor - INFO - 📤 SENDING BATCH: 4 traces via send_callback
2025-10-22 23:02:10 - noveum_trace.transport.http_transport - INFO - 🚀 SENDING BATCH: 4 traces to https://api.noveum.ai/api/v1/traces
2025-10-22 23:02:10 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: a05b5d92-06f6-4fec-9673-f33c267ef809) - 1 spans
2025-10-22 23:02:10 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: a05b5d92-06f6-4fec-9673-f33c267ef809) - 1 spans
2025-10-22 23:02:10 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace a05b5d92-06f6-4fec-9673-f33c267ef809
2025-10-22 23:02:10 - noveum_trace.transport.http_transport - INFO - ✅ Trace a05b5d92-06f6-4fec-9673-f33c267ef809 successfully queued for export


2025-10-22 23:02:10 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:02:11 - noveum_trace.transport.http_transport - INFO - 📡 HTTP RESPONSE: Status 200 from https://api.noveum.ai/api/v1/traces
2025-10-22 23:02:11 - noveum_trace.transport.http_transport - INFO - ✅ Successfully sent batch of 4 traces
2025-10-22 23:02:11 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully sent batch of 4 traces via callback
2025-10-22 23:02:12 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: 032c73be-2c3f-4980-9cdf-762de61c437e) - 1 spans
2025-10-22 23:02:12 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: 032c73be-2c3f-4980-9cdf-762de61c437e) - 1 spans
2025-10-22 23:02:12 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 032c73be-2c3f-4980-9cdf-762de61c437e
2025-10-22 23:02:12 - noveum_trace.transport.http_transport - INFO - ✅ Trace 032c73be-2c3f-4980-9cdf-762de61c437e successfully queued for export


2025-10-22 23:02:12 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:02:13 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: 4ccea784-7905-4081-b536-1a6fade9180f) - 1 spans
2025-10-22 23:02:13 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: 4ccea784-7905-4081-b536-1a6fade9180f) - 1 spans
2025-10-22 23:02:13 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 4ccea784-7905-4081-b536-1a6fade9180f
2025-10-22 23:02:13 - noveum_trace.transport.http_transport - INFO - ✅ Trace 4ccea784-7905-4081-b536-1a6fade9180f successfully queued for export


2025-10-22 23:02:13 - INFO - novaeval.evaluators.agent_evaluator - Saving intermediate results after 2 samples
2025-10-22 23:02:13 - INFO - novaeval.evaluators.agent_evaluator - Intermediate results saved to demo_results/agent.web_search_evaluation_metrics_dataset/agent_evaluation_results.csv


Evaluating samples: 2it [00:07,  3.79s/it]

2025-10-22 23:02:13 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:02:14 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: 4a768f59-41a4-4da7-b4fb-1530bd85f1bc) - 1 spans
2025-10-22 23:02:14 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: 4a768f59-41a4-4da7-b4fb-1530bd85f1bc) - 1 spans
2025-10-22 23:02:14 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 4a768f59-41a4-4da7-b4fb-1530bd85f1bc
2025-10-22 23:02:14 - noveum_trace.transport.http_transport - INFO - ✅ Trace 4a768f59-41a4-4da7-b4fb-1530bd85f1bc successfully queued for export


2025-10-22 23:02:14 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:02:15 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: 490c1659-4aca-401f-bf9d-34891b18717a) - 1 spans
2025-10-22 23:02:15 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: 490c1659-4aca-401f-bf9d-34891b18717a) - 1 spans
2025-10-22 23:02:15 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 490c1659-4aca-401f-bf9d-34891b18717a
2025-10-22 23:02:15 - noveum_trace.transport.http_transport - INFO - ✅ Trace 490c1659-4aca-401f-bf9d-34891b18717a successfully queued for export


2025-10-22 23:02:15 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:02:16 - noveum_trace.transport.batch_processor - INFO - ⏰ TIMEOUT TRIGGER: Sending batch due to timeout (5.5s >= 5.0s)
2025-10-22 23:02:16 - noveum_trace.transport.batch_processor - INFO - 📤 SENDING BATCH: 5 traces via send_callback
2025-10-22 23:02:16 - noveum_trace.transport.http_transport - INFO - 🚀 SENDING BATCH: 5 traces to https://api.noveum.ai/api/v1/traces
2025-10-22 23:02:16 - noveum_trace.transport.http_transport - INFO - 📡 HTTP RESPONSE: Status 200 from https://api.noveum.ai/api/v1/traces
2025-10-22 23:02:16 - noveum_trace.transport.http_transport - INFO - ✅ Successfully sent batch of 5 traces
2025-10-22 23:02:16 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully sent batch of 5 traces via callback
2025-10-22 23:02:17 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: c83bf958-6430-47bc-9407-5933e25f4892) - 1 spans
2025-10-22 23:02:17 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEU

2025-10-22 23:02:17 - INFO - novaeval.evaluators.agent_evaluator - Saving intermediate results after 3 samples
2025-10-22 23:02:17 - INFO - novaeval.evaluators.agent_evaluator - Intermediate results saved to demo_results/agent.web_search_evaluation_metrics_dataset/agent_evaluation_results.csv


Evaluating samples: 3it [00:11,  3.78s/it]

2025-10-22 23:02:17 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:02:18 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: eacc2b6e-8d42-4c12-82df-b0298723c50e) - 1 spans
2025-10-22 23:02:18 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: eacc2b6e-8d42-4c12-82df-b0298723c50e) - 1 spans
2025-10-22 23:02:18 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace eacc2b6e-8d42-4c12-82df-b0298723c50e
2025-10-22 23:02:18 - noveum_trace.transport.http_transport - INFO - ✅ Trace eacc2b6e-8d42-4c12-82df-b0298723c50e successfully queued for export


2025-10-22 23:02:18 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:02:19 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: 1f3f745f-2735-4197-b27d-0e54cd4fdd9b) - 1 spans
2025-10-22 23:02:19 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: 1f3f745f-2735-4197-b27d-0e54cd4fdd9b) - 1 spans
2025-10-22 23:02:19 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 1f3f745f-2735-4197-b27d-0e54cd4fdd9b
2025-10-22 23:02:19 - noveum_trace.transport.http_transport - INFO - ✅ Trace 1f3f745f-2735-4197-b27d-0e54cd4fdd9b successfully queued for export


2025-10-22 23:02:19 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:02:20 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: 04df8130-9aa6-4c7c-b184-df6b8d827620) - 1 spans
2025-10-22 23:02:20 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: 04df8130-9aa6-4c7c-b184-df6b8d827620) - 1 spans
2025-10-22 23:02:21 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 04df8130-9aa6-4c7c-b184-df6b8d827620
2025-10-22 23:02:21 - noveum_trace.transport.http_transport - INFO - ✅ Trace 04df8130-9aa6-4c7c-b184-df6b8d827620 successfully queued for export


2025-10-22 23:02:21 - INFO - novaeval.evaluators.agent_evaluator - Saving intermediate results after 4 samples
2025-10-22 23:02:21 - INFO - novaeval.evaluators.agent_evaluator - Intermediate results saved to demo_results/agent.web_search_evaluation_metrics_dataset/agent_evaluation_results.csv


Evaluating samples: 4it [00:15,  3.84s/it]

2025-10-22 23:02:21 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:02:21 - noveum_trace.transport.batch_processor - INFO - ⏰ TIMEOUT TRIGGER: Sending batch due to timeout (5.2s >= 5.0s)
2025-10-22 23:02:21 - noveum_trace.transport.batch_processor - INFO - 📤 SENDING BATCH: 4 traces via send_callback
2025-10-22 23:02:21 - noveum_trace.transport.http_transport - INFO - 🚀 SENDING BATCH: 4 traces to https://api.noveum.ai/api/v1/traces
2025-10-22 23:02:22 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: 25522bcf-a4e6-4f72-b6b1-be6e12c6b628) - 1 spans
2025-10-22 23:02:22 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: 25522bcf-a4e6-4f72-b6b1-be6e12c6b628) - 1 spans
2025-10-22 23:02:22 - noveum_trace.transport.http_transport - INFO - 📡 HTTP RESPONSE: Status 200 from https://api.noveum.ai/api/v1/traces
2025-10-22 23:02:22 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 25522bcf-a4e6-4f72-b6b1-be6e12c6b628
2025-10-22 23:02:

2025-10-22 23:02:22 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:02:23 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: 8d0257b7-8b4a-43d2-b3f8-c331f9442e9a) - 1 spans
2025-10-22 23:02:23 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: 8d0257b7-8b4a-43d2-b3f8-c331f9442e9a) - 1 spans
2025-10-22 23:02:23 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 8d0257b7-8b4a-43d2-b3f8-c331f9442e9a
2025-10-22 23:02:23 - noveum_trace.transport.http_transport - INFO - ✅ Trace 8d0257b7-8b4a-43d2-b3f8-c331f9442e9a successfully queued for export


2025-10-22 23:02:23 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:02:26 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: df6eff5c-8fa9-4c0a-9f8b-4fb5eb71694a) - 1 spans
2025-10-22 23:02:26 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: df6eff5c-8fa9-4c0a-9f8b-4fb5eb71694a) - 1 spans
2025-10-22 23:02:26 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace df6eff5c-8fa9-4c0a-9f8b-4fb5eb71694a
2025-10-22 23:02:26 - noveum_trace.transport.http_transport - INFO - ✅ Trace df6eff5c-8fa9-4c0a-9f8b-4fb5eb71694a successfully queued for export


2025-10-22 23:02:26 - INFO - novaeval.evaluators.agent_evaluator - Saving intermediate results after 5 samples
2025-10-22 23:02:26 - INFO - novaeval.evaluators.agent_evaluator - Intermediate results saved to demo_results/agent.web_search_evaluation_metrics_dataset/agent_evaluation_results.csv


Evaluating samples: 5it [00:21,  4.57s/it]

2025-10-22 23:02:26 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:02:27 - noveum_trace.transport.batch_processor - INFO - ⏰ TIMEOUT TRIGGER: Sending batch due to timeout (5.9s >= 5.0s)
2025-10-22 23:02:27 - noveum_trace.transport.batch_processor - INFO - 📤 SENDING BATCH: 3 traces via send_callback
2025-10-22 23:02:27 - noveum_trace.transport.http_transport - INFO - 🚀 SENDING BATCH: 3 traces to https://api.noveum.ai/api/v1/traces
2025-10-22 23:02:27 - noveum_trace.transport.http_transport - INFO - 📡 HTTP RESPONSE: Status 200 from https://api.noveum.ai/api/v1/traces
2025-10-22 23:02:27 - noveum_trace.transport.http_transport - INFO - ✅ Successfully sent batch of 3 traces
2025-10-22 23:02:27 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully sent batch of 3 traces via callback
2025-10-22 23:02:28 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: 9fca5686-3b49-4bba-961f-452fba26f5e6) - 1 spans
2025-10-22 23:02:28 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEU

2025-10-22 23:02:28 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:02:29 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: d7502a23-efaa-437e-ab40-138c0a1c3cc9) - 1 spans
2025-10-22 23:02:29 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: d7502a23-efaa-437e-ab40-138c0a1c3cc9) - 1 spans
2025-10-22 23:02:29 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace d7502a23-efaa-437e-ab40-138c0a1c3cc9
2025-10-22 23:02:29 - noveum_trace.transport.http_transport - INFO - ✅ Trace d7502a23-efaa-437e-ab40-138c0a1c3cc9 successfully queued for export


2025-10-22 23:02:29 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:02:30 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: 539d7f80-b556-441a-b288-08da522ef2ed) - 1 spans
2025-10-22 23:02:30 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: 539d7f80-b556-441a-b288-08da522ef2ed) - 1 spans
2025-10-22 23:02:30 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 539d7f80-b556-441a-b288-08da522ef2ed
2025-10-22 23:02:30 - noveum_trace.transport.http_transport - INFO - ✅ Trace 539d7f80-b556-441a-b288-08da522ef2ed successfully queued for export


2025-10-22 23:02:30 - INFO - novaeval.evaluators.agent_evaluator - Saving intermediate results after 6 samples
2025-10-22 23:02:30 - INFO - novaeval.evaluators.agent_evaluator - Intermediate results saved to demo_results/agent.web_search_evaluation_metrics_dataset/agent_evaluation_results.csv


Evaluating samples: 6it [00:25,  4.33s/it]

2025-10-22 23:02:30 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:02:31 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: f4853141-1b6b-493e-94c2-ebd82a327a63) - 1 spans
2025-10-22 23:02:31 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: f4853141-1b6b-493e-94c2-ebd82a327a63) - 1 spans
2025-10-22 23:02:31 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace f4853141-1b6b-493e-94c2-ebd82a327a63
2025-10-22 23:02:31 - noveum_trace.transport.http_transport - INFO - ✅ Trace f4853141-1b6b-493e-94c2-ebd82a327a63 successfully queued for export


2025-10-22 23:02:31 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:02:32 - noveum_trace.transport.batch_processor - INFO - ⏰ TIMEOUT TRIGGER: Sending batch due to timeout (5.5s >= 5.0s)
2025-10-22 23:02:32 - noveum_trace.transport.batch_processor - INFO - 📤 SENDING BATCH: 4 traces via send_callback
2025-10-22 23:02:32 - noveum_trace.transport.http_transport - INFO - 🚀 SENDING BATCH: 4 traces to https://api.noveum.ai/api/v1/traces
2025-10-22 23:02:33 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: dc0ec2ed-2ab0-4e6f-9005-08c47f6be881) - 1 spans
2025-10-22 23:02:33 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: dc0ec2ed-2ab0-4e6f-9005-08c47f6be881) - 1 spans
2025-10-22 23:02:33 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace dc0ec2ed-2ab0-4e6f-9005-08c47f6be881
2025-10-22 23:02:33 - noveum_trace.transport.http_transport - INFO - ✅ Trace dc0ec2ed-2ab0-4e6f-9005-08c47f6be881 successfully queued for export


2025-10-22 23:02:33 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:02:33 - noveum_trace.transport.http_transport - INFO - 📡 HTTP RESPONSE: Status 200 from https://api.noveum.ai/api/v1/traces
2025-10-22 23:02:33 - noveum_trace.transport.http_transport - INFO - ✅ Successfully sent batch of 4 traces
2025-10-22 23:02:33 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully sent batch of 4 traces via callback
2025-10-22 23:02:34 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: 6be3cc26-2f44-4534-8c17-af00776df15d) - 1 spans
2025-10-22 23:02:34 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: 6be3cc26-2f44-4534-8c17-af00776df15d) - 1 spans
2025-10-22 23:02:34 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 6be3cc26-2f44-4534-8c17-af00776df15d
2025-10-22 23:02:34 - noveum_trace.transport.http_transport - INFO - ✅ Trace 6be3cc26-2f44-4534-8c17-af00776df15d successfully queued for export


2025-10-22 23:02:34 - INFO - novaeval.evaluators.agent_evaluator - Saving intermediate results after 7 samples
2025-10-22 23:02:34 - INFO - novaeval.evaluators.agent_evaluator - Intermediate results saved to demo_results/agent.web_search_evaluation_metrics_dataset/agent_evaluation_results.csv


Evaluating samples: 7it [00:28,  4.12s/it]

2025-10-22 23:02:34 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:02:36 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: d900cab2-2ff3-4c05-b0c0-7d5ed8303c40) - 1 spans
2025-10-22 23:02:36 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: d900cab2-2ff3-4c05-b0c0-7d5ed8303c40) - 1 spans
2025-10-22 23:02:36 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace d900cab2-2ff3-4c05-b0c0-7d5ed8303c40
2025-10-22 23:02:36 - noveum_trace.transport.http_transport - INFO - ✅ Trace d900cab2-2ff3-4c05-b0c0-7d5ed8303c40 successfully queued for export


2025-10-22 23:02:36 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:02:37 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: 2337a5e3-f182-40a2-8172-7b973fd6f67f) - 1 spans
2025-10-22 23:02:37 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEUE: auto_trace_generate (ID: 2337a5e3-f182-40a2-8172-7b973fd6f67f) - 1 spans
2025-10-22 23:02:37 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully queued trace 2337a5e3-f182-40a2-8172-7b973fd6f67f
2025-10-22 23:02:37 - noveum_trace.transport.http_transport - INFO - ✅ Trace 2337a5e3-f182-40a2-8172-7b973fd6f67f successfully queued for export


2025-10-22 23:02:37 - INFO - google_genai.models - AFC is enabled with max remote calls: 10.


2025-10-22 23:02:38 - noveum_trace.transport.batch_processor - INFO - ⏰ TIMEOUT TRIGGER: Sending batch due to timeout (5.4s >= 5.0s)
2025-10-22 23:02:38 - noveum_trace.transport.batch_processor - INFO - 📤 SENDING BATCH: 4 traces via send_callback
2025-10-22 23:02:38 - noveum_trace.transport.http_transport - INFO - 🚀 SENDING BATCH: 4 traces to https://api.noveum.ai/api/v1/traces
2025-10-22 23:02:38 - noveum_trace.transport.http_transport - INFO - 📡 HTTP RESPONSE: Status 200 from https://api.noveum.ai/api/v1/traces
2025-10-22 23:02:38 - noveum_trace.transport.http_transport - INFO - ✅ Successfully sent batch of 4 traces
2025-10-22 23:02:38 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully sent batch of 4 traces via callback
2025-10-22 23:02:38 - noveum_trace.transport.http_transport - INFO - 📤 EXPORTING TRACE: auto_trace_generate (ID: f49c469d-893d-4a5a-8cf0-b5112321d143) - 1 spans
2025-10-22 23:02:38 - noveum_trace.transport.batch_processor - INFO - 📥 ADDING TRACE TO QUEU

2025-10-22 23:02:38 - INFO - novaeval.evaluators.agent_evaluator - Saving intermediate results after 8 samples
2025-10-22 23:02:38 - INFO - novaeval.evaluators.agent_evaluator - Intermediate results saved to demo_results/agent.web_search_evaluation_metrics_dataset/agent_evaluation_results.csv


Evaluating samples: 8it [00:33,  4.15s/it]

2025-10-22 23:02:38 - INFO - novaeval.evaluators.agent_evaluator - Saving final results
2025-10-22 23:02:38 - INFO - novaeval.evaluators.agent_evaluator - Reloaded 8 results from CSV
2025-10-22 23:02:38 - INFO - novaeval.evaluators.agent_evaluator - Agent evaluation completed

✅ Evaluation completed!

📊 Results Summary:
  - task_progression: 3.75
  - context_relevancy: 7.84
  - role_adherence: 8.97
  - tool_relevancy: 0.00
  - parameter_correctness: 0.00

🔍 Individual Scores:

  Record 1 (Task: eda4fe22-9a2b-4b73-856b-f4f3309bf719):
    - task_progression: 4.2
    - context_relevancy: 7.8
    - role_adherence: 9.0
    - tool_relevancy: 0.0
    - parameter_correctness: 0.0

  Record 2 (Task: 0ffffba1-8a37-443c-8866-d53ffbfa7718):
    - task_progression: 3.8
    - context_relevancy: 7.8
    - role_adherence: 9.0
    - tool_relevancy: 0.0
    - parameter_correctness: 0.0

  Record 3 (Task: 43cdf081-4f01-49cd-b566-dbd1619e6cd2):
    - task_progression: 4.2
    - context_relevancy: 7.8
    




2025-10-22 23:02:43 - noveum_trace.transport.batch_processor - INFO - ⏰ TIMEOUT TRIGGER: Sending batch due to timeout (5.1s >= 5.0s)
2025-10-22 23:02:43 - noveum_trace.transport.batch_processor - INFO - 📤 SENDING BATCH: 1 traces via send_callback
2025-10-22 23:02:43 - noveum_trace.transport.http_transport - INFO - 🚀 SENDING BATCH: 1 traces to https://api.noveum.ai/api/v1/traces
2025-10-22 23:02:44 - noveum_trace.transport.http_transport - INFO - 📡 HTTP RESPONSE: Status 200 from https://api.noveum.ai/api/v1/traces
2025-10-22 23:02:44 - noveum_trace.transport.http_transport - INFO - ✅ Successfully sent batch of 1 traces
2025-10-22 23:02:44 - noveum_trace.transport.batch_processor - INFO - ✅ Successfully sent batch of 1 traces via callback


In [14]:

import pandas as pd
import json

# Read the CSV file
df = pd.read_csv('demo_results/agent.query_routing_dataset/agent_evaluation_results.csv')

# Create API data structure with all task_ids
api_data = {
    'items': [
        {
            'item_key': str(row['task_id']),
            'item_id': f'item_{i+1}'  # Generate unique item IDs
        }
        for i, row in df.iterrows()
    ]
}

# Save to JSON
with open('api_data.json', 'w') as f:
    json.dump(api_data, f, indent=2)

print('Created api_data.json with', len(api_data['items']), 'items')
print('Sample items:')
for item in api_data['items'][:3]:
    print(f'  {item}')


Created api_data.json with 20 items
Sample items:
  {'item_key': 'eda4fe22-9a2b-4b73-856b-f4f3309bf719', 'item_id': 'item_1'}
  {'item_key': '0ffffba1-8a37-443c-8866-d53ffbfa7718', 'item_id': 'item_2'}
  {'item_key': 'f1f37bd7-0851-4659-b493-b80d3800d920', 'item_id': 'item_3'}


In [16]:
!python upload_scores.py demo_results/agent.query_routing_dataset/agent_evaluation_results.csv --item-key-col task_id --score-col context_relevancy --reasoning-col context_relevancy_reasoning --api-data api_data.json --scorer-id context_relevancy_scorer

Loaded environment variables
Loading API data from api_data.json...
Loaded 20 item mappings
Reading CSV from demo_results/agent.query_routing_dataset/agent_evaluation_results.csv...
Read 20 rows from CSV
Created 20 results for upload

Uploading 20 results in 1 batches...
✗ Batch 1/1 failed: 201
  Response: {"success":true,"created":20,"failed":0,"results":[{"result_id":"c92dbfca-8820-45d4-be82-a970e7e45b14","organization_slug":"magic-api","dataset_slug":"customersupportagentdemo-new","item_id":"item_1","scorer_id":"context_relevancy_scorer","score":7.5,"passed":1,"metadata":"{\"details\":\"The agent's response should provide information about the latest breakthroughs in AI. Without seeing the actual response, it's impossible to confirm alignment with role and task. However, the premise indicates that the content should be factually correct and presented in an informative manner, suggesting an appropriate tone for the given role.\"}","error":"","execution_time_ms":0,"created_at":"2025-1

In [17]:
!python upload_scores.py demo_results/agent.query_routing_dataset/agent_evaluation_results.csv --item-key-col task_id --score-col role_adherence --reasoning-col role_adherence_reasoning --api-data api_data.json --scorer-id role_adherence_scorer

Loaded environment variables
Loading API data from api_data.json...
Loaded 20 item mappings
Reading CSV from demo_results/agent.query_routing_dataset/agent_evaluation_results.csv...
Read 20 rows from CSV
Created 20 results for upload

Uploading 20 results in 1 batches...
✗ Batch 1/1 failed: 201
  Response: {"success":true,"created":20,"failed":0,"results":[{"result_id":"bcc832a3-fda1-4129-8988-5ba5a30ed2f0","organization_slug":"magic-api","dataset_slug":"customersupportagentdemo-new","item_id":"item_1","scorer_id":"role_adherence_scorer","score":1,"passed":1,"metadata":"{\"details\":\"The agent completely fails to adhere to its role as it provided no response and made no tool calls. There is no evidence of an attempt to answer the question about AI breakthroughs. The agent's lack of any output indicates a total disregard for the task and role instructions.\"}","error":"","execution_time_ms":0,"created_at":"2025-10-22 18:01:48.477000000","updated_at":"2025-10-22 18:01:48.477000000","sco

In [20]:
!python upload_scores.py demo_results/agent.query_routing_dataset/agent_evaluation_results.csv --item-key-col task_id --score-col parameter_correctness --reasoning-col parameter_correctness_reasoning --api-data api_data.json --scorer-id parameter_correctness_scorer

Loaded environment variables
Loading API data from api_data.json...
Loaded 20 item mappings
Reading CSV from demo_results/agent.query_routing_dataset/agent_evaluation_results.csv...
Read 20 rows from CSV
Created 20 results for upload

Uploading 20 results in 1 batches...
✗ Batch 1/1 failed: 201
  Response: {"success":true,"created":20,"failed":0,"results":[{"result_id":"35b42464-7989-4caa-90cb-5636319d4fd1","organization_slug":"magic-api","dataset_slug":"customersupportagentdemo-new","item_id":"item_1","scorer_id":"parameter_correctness_scorer","score":0,"passed":0,"metadata":"{\"details\":\"Error: Missing required fields: ['tool_calls'], out of all required fields: ['tool_calls', 'parameters_passed', 'tool_call_results']\"}","error":"","execution_time_ms":0,"created_at":"2025-10-22 18:03:01.353000000","updated_at":"2025-10-22 18:03:01.353000000","scorer_name":"Parameter Correctness Scorer"},{"result_id":"28380f39-f4e4-41a3-a5c7-01241174fb78","organization_slug":"magic-api","dataset_sl