# Agentic RAG System Implementation

This notebook implements a complete Agentic RAG (Retrieval-Augmented Generation) system that goes beyond traditional RAG by incorporating intelligent agents that can reason, plan, and make decisions about how to retrieve and use information.

## Table of Contents

1. [Setup and Dependencies](#setup)  
2. [Traditional RAG vs Agentic RAG](#comparison)  
3. [Document Processing and Indexing](#processing)  
4. [Multi-Modal Indexing System](#optimization)  
5. [Agent Implementation](#agents)  
6. [Orchestration Implementation](#orchestration)  
7. [Testing Framework](#testing1)  
8. [Performance Analysis](#analysis)
9. [Comprehensive Test Framework](#testing2)

## 1. Setup and Dependencies <a id="setup"></a>

First, let's install and import all necessary libraries for our Agentic RAG system.


In [None]:
# Install required packages
%pip install langchain langchain-community langchain-openai
%pip install chromadb sentence-transformers
%pip install PyPDF2 tiktoken
%pip install streamlit plotly
%pip install openai anthropic
%pip install faiss-cpu
%pip install rank-bm25


In [1]:
import os
import sys
import json
import time
import re
from typing import List, Dict, Any, Optional, Tuple
from dataclasses import dataclass, asdict
from abc import ABC, abstractmethod
import logging
from pathlib import Path

# LangChain imports (package-specific paths; the older `langchain.*` locations are deprecated)
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma, FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.document_loaders import PyPDFLoader  # relies on the `pypdf` package
from langchain_core.documents import Document
from langchain_core.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain_openai import OpenAI

# Vector search and embeddings
import chromadb
from sentence_transformers import SentenceTransformer
import faiss
import numpy as np
from rank_bm25 import BM25Okapi

# PDF processing
import PyPDF2

# Evaluation and metrics
from sklearn.metrics.pairwise import cosine_similarity
import matplotlib.pyplot as plt
import seaborn as sns

# Setup logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)





## 2. Traditional RAG vs Agentic RAG Comparison <a id="comparison"></a>

Let's understand the key differences between traditional RAG and Agentic RAG systems:

### Traditional RAG:
- **Static Pipeline**: Query → Retrieve → Generate  
- **Single Retrieval**: One-shot retrieval based on query  
- **No Reasoning**: Direct mapping from query to documents  
- **Limited Context**: Fixed context window  

### Agentic RAG:
- **Dynamic Planning**: Agents decide retrieval strategy  
- **Multi-step Reasoning**: Can perform multiple retrievals  
- **Self-Reflection**: Agents evaluate retrieval quality  
- **Adaptive Context**: Dynamically adjusts based on task complexity  
- **Tool Integration**: Can use multiple tools and strategies  
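
To make the contrast concrete, here is a minimal sketch of the two control flows. It is an illustration only: the retrievers are toy stand-in functions, and the names `traditional_rag`, `agentic_rag`, and `min_score` are hypothetical and not part of this notebook's implementation.

```python
from typing import Callable, Dict, List, Tuple

# A toy "retriever" maps a query to (document, score) pairs.
Retriever = Callable[[str], List[Tuple[str, float]]]


def traditional_rag(query: str, retrieve: Retriever) -> List[Tuple[str, float]]:
    """Static pipeline: a single retrieval call, no evaluation of the result."""
    return retrieve(query)


def agentic_rag(query: str, strategies: Dict[str, Retriever], min_score: float = 0.5):
    """Try strategies in order, keep the best result, stop once it is good enough."""
    trace = []                      # which strategies were tried, and how well they did
    best: List[Tuple[str, float]] = []
    best_top = float("-inf")
    for name, retrieve in strategies.items():
        results = retrieve(query)
        top = max((score for _, score in results), default=0.0)
        trace.append((name, top))
        if top > best_top:
            best, best_top = results, top
        if top >= min_score:        # self-assessment: good enough, stop retrying
            break
    return best, trace


# Toy retrievers standing in for real lexical and semantic search.
toy_strategies = {
    "lexical": lambda q: [("doc A", 0.2)],
    "semantic": lambda q: [("doc B", 0.9)],
}

one_shot = traditional_rag("what is attention?", toy_strategies["lexical"])
best, trace = agentic_rag("what is attention?", toy_strategies)
print(one_shot)
print(best, trace)
```

The one-shot pipeline returns whatever the first retrieval produces, while the loop records each attempt and moves on when the top score falls below the threshold.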

In [2]:
from dataclasses import dataclass, field
from typing import Optional, List, Dict, Any
from abc import ABC, abstractmethod
import time
import logging

# Set up a basic logger
logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)

@dataclass
class RetrievalResult:
    """Data class to store retrieval results with metadata"""
    content: str
    source: str
    score: float
    page_number: Optional[int] = None
    chunk_id: Optional[str] = None
    retrieval_method: Optional[str] = None
    # default_factory stamps each instance at creation time; a plain
    # `time.time()` default would be evaluated once, at class definition
    timestamp: float = field(default_factory=time.time)

@dataclass
class AgentDecision:
    """Data class to store agent decision-making process"""
    action: str
    reasoning: str
    confidence: float
    next_steps: List[str]
    metadata: Dict[str, Any]

class BaseAgent(ABC):
    """Abstract base class for all agents in the system"""
    
    def __init__(self, name: str, description: str):
        self.name = name
        self.description = description
        self.decision_history: List[AgentDecision] = []
    
    @abstractmethod
    def execute(self, input_data: Any) -> Any:
        """Execute the agent's main function"""
        pass
    
    def log_decision(self, decision: AgentDecision):
        """Log agent decisions for analysis"""
        self.decision_history.append(decision)
        logger.info(f"{self.name}: {decision.action} - {decision.reasoning}")


## 3. Document Processing and Indexing <a id="processing"></a>

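This section implements the document processing step: load PDF pages, split them into overlapping chunks, and attach metadata (page number, chunk ID, word and character counts) that the retrieval indexes rely on.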
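
As a rough illustration of what chunk size and chunk overlap mean, the sketch below splits a string with a fixed-width sliding window. This is a simplification and an assumption for illustration: `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraph and sentence boundaries, so its chunks will differ. The function name `sliding_chunks` is hypothetical.

```python
from typing import List


def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap   # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window already reached the end
            break
    return chunks


demo_chunks = sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=1)
print(demo_chunks)  # ['abcd', 'defg', 'ghij']
```

Each chunk repeats the last character of the previous one, which is the role the overlap plays at larger sizes: a sentence cut at a chunk boundary still appears intact in one of the two neighbors.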


In [None]:
class DocumentProcessor:
    def __init__(self, chunk_size: int = 1000, chunk_overlap: int = 200):
        self.chunk_size = chunk_size
        self.chunk_overlap = chunk_overlap
        self.text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=chunk_size,
            chunk_overlap=chunk_overlap,
            separators=["\n\n", "\n", ".", "!", "?", ",", " ", ""]
        )
    
    def load_pdf(self, file_path: str) -> List[Document]:
        try:
            loader = PyPDFLoader(file_path)
            documents = loader.load()
            
            # Add metadata
            for i, doc in enumerate(documents):
                doc.metadata.update({
                    'source_file': file_path,
                    'page_number': i + 1,
                    'total_pages': len(documents)
                })
            
            logger.info(f"Loaded {len(documents)} pages from {file_path}")
            return documents
            
        except Exception as e:
            logger.error(f"Error loading PDF {file_path}: {str(e)}")
            return []
    
    def create_chunks(self, documents: List[Document]) -> List[Document]:
        chunks = []
        
        for doc in documents:
            doc_chunks = self.text_splitter.split_documents([doc])

            for i, chunk in enumerate(doc_chunks):
                chunk.metadata.update({
                    'chunk_id': f"{doc.metadata.get('page_number', 0)}-{i}",
                    'chunk_index': i,
                    'total_chunks_in_page': len(doc_chunks),
                    'word_count': len(chunk.page_content.split()),
                    'char_count': len(chunk.page_content)
                })
                chunks.append(chunk)
        
        logger.info(f"Created {len(chunks)} chunks from {len(documents)} documents")
        return chunks
    
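    def extract_key_phrases(self, text: str) -> List[str]:
        """Return up to 10 non-stop-word keywords and up to 5 capitalized phrases."""
        stop_words = {'the', 'a', 'an', 'and', 'or', 'but', 'in', 'on', 'at', 'to', 'for', 'of', 'with', 'by'}
        words = re.findall(r'\b[A-Za-z]{3,}\b', text.lower())
        keywords = [word for word in words if word not in stop_words]

        # Runs of capitalized words, e.g. "Neural Information Processing"
        phrases = re.findall(r'\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b', text)

        # dict.fromkeys removes duplicates while keeping first-seen order
        return list(dict.fromkeys(keywords[:10] + phrases[:5]))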


In [None]:
# Load and process the attention paper
processor = DocumentProcessor(chunk_size=800, chunk_overlap=100)

# Load the PDF
pdf_path = "data/NIPS-2017-attention-is-all-you-need-Paper.pdf"
documents = processor.load_pdf(pdf_path)

# Create chunks
chunks = processor.create_chunks(documents)

print(f"Processed {len(documents)} pages into {len(chunks)} chunks")
print(f"Average chunk size: {np.mean([len(chunk.page_content) for chunk in chunks]):.0f} characters")

# Display a sample chunk (guard against documents with fewer than six chunks)
if chunks:
    sample_chunk = chunks[min(5, len(chunks) - 1)]
    print("\nSample chunk:")
    print(f"Content preview: {sample_chunk.page_content[:200]}...")
    print(f"Metadata: {sample_chunk.metadata}")


INFO:__main__:Loaded 11 pages from data/NIPS-2017-attention-is-all-you-need-Paper.pdf
INFO:__main__:Created 51 chunks from 11 documents


Processed 11 pages into 51 chunks
Average chunk size: 690 characters

Sample chunk:
Content preview: constraint of sequential computation, however, remains.
Attention mechanisms have become an integral part of compelling sequence modeling and transduc-
tion models in various tasks, allowing modeling ...
Metadata: {'producer': 'PyPDF2', 'creator': 'PyPDF', 'creationdate': '', 'subject': 'Neural Information Processing Systems http://nips.cc/', 'publisher': 'Curran Associates, Inc.', 'language': 'en-US', 'created': '2017', 'eventtype': 'Poster', 'description-abstract': 'The dominant sequence transduction models are based on complex recurrent orconvolutional neural networks in an encoder and decoder configuration. The best performing such models also connect the encoder and decoder through an attentionm echanisms.  We propose a novel, simple network architecture based solely onan attention mechanism, dispensing with recurrence and convolutions entirely.Experiments on two machine translatio

## 4. Multi-Modal Indexing System <a id="optimization"></a>

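This section implements several retrieval methods that the agents can choose from based on the query type: semantic search over a FAISS index, lexical search with BM25, contextual expansion to chunks on neighboring pages, and a weighted hybrid of semantic and lexical scores.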
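
The semantic index in the next cell stores L2-normalized embeddings in an inner-product index (`faiss.IndexFlatIP`). The reason this yields cosine similarity can be checked with a small standard-library sketch; the vectors and helper names here are made up for illustration.

```python
import math
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(dot(v, v))
    return list(v) if norm == 0 else [x / norm for x in v]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a, b = [3.0, 4.0], [4.0, 3.0]
inner_of_normalized = dot(l2_normalize(a), l2_normalize(b))
print(inner_of_normalized, cosine(a, b))  # both are 24/25 = 0.96 up to rounding
```

Because the two quantities agree, normalizing once at indexing time and once per query lets a plain inner-product search rank by cosine similarity.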


In [None]:
class HybridRetriever:
    def __init__(self, chunks: List[Document]):
        self.chunks = chunks
        self.chunk_texts = [chunk.page_content for chunk in chunks]
        self.embeddings_model = SentenceTransformer('all-MiniLM-L6-v2')
        self._build_semantic_index()
        self._build_lexical_index()
        self._build_contextual_index()
        
        logger.info("Hybrid retriever initialized with all indexes")
    
    def _build_semantic_index(self):
        print("Building semantic index...")
        
        # Generate embeddings
        self.embeddings = self.embeddings_model.encode(self.chunk_texts)
        
        # Build FAISS index
        dimension = self.embeddings.shape[1]
        self.faiss_index = faiss.IndexFlatIP(dimension)  
        
        # Normalize embeddings for cosine similarity
        normalized_embeddings = self.embeddings / np.linalg.norm(self.embeddings, axis=1, keepdims=True)
        self.faiss_index.add(normalized_embeddings.astype('float32'))
        
        print(f"Semantic index built with {len(self.chunk_texts)} chunks")
    
    def _build_lexical_index(self):
        print("Building lexical index...")
        
        tokenized_chunks = [chunk.lower().split() for chunk in self.chunk_texts]
        self.bm25_index = BM25Okapi(tokenized_chunks)
        
        print("Lexical index built")
    
    def _build_contextual_index(self):
        print("Building contextual index...")
        
        self.contextual_map = {}
        
        for i, chunk in enumerate(self.chunks):
            chunk_id = chunk.metadata.get('chunk_id', str(i))
            page_num = chunk.metadata.get('page_number', 1)
            
            related_chunks = []
            for j, other_chunk in enumerate(self.chunks):
                if i != j:
                    other_page = other_chunk.metadata.get('page_number', 1)
                    if abs(page_num - other_page) <= 1:  
                        related_chunks.append(j)
            
            self.contextual_map[i] = related_chunks
        
        print("Contextual index built")
    
    def semantic_search(self, query: str, k: int = 5) -> List[RetrievalResult]:
        query_embedding = self.embeddings_model.encode([query])
        query_embedding = query_embedding / np.linalg.norm(query_embedding)
        
        scores, indices = self.faiss_index.search(query_embedding.astype('float32'), k)
        
        results = []
        for score, idx in zip(scores[0], indices[0]):
            # FAISS pads with -1 when fewer than k vectors are available
            if 0 <= idx < len(self.chunks):
                chunk = self.chunks[idx]
                result = RetrievalResult(
                    content=chunk.page_content,
                    source=chunk.metadata.get('source_file', 'unknown'),
                    score=float(score),
                    page_number=chunk.metadata.get('page_number'),
                    chunk_id=chunk.metadata.get('chunk_id'),
                    retrieval_method='semantic'
                )
                results.append(result)
        
        return results
    
    def lexical_search(self, query: str, k: int = 5) -> List[RetrievalResult]:
        query_tokens = query.lower().split()
        scores = self.bm25_index.get_scores(query_tokens)
        top_indices = np.argsort(scores)[::-1][:k]
        
        results = []
        for idx in top_indices:
            if scores[idx] > 0: 
                chunk = self.chunks[idx]
                result = RetrievalResult(
                    content=chunk.page_content,
                    source=chunk.metadata.get('source_file', 'unknown'),
                    score=float(scores[idx]),
                    page_number=chunk.metadata.get('page_number'),
                    chunk_id=chunk.metadata.get('chunk_id'),
                    retrieval_method='lexical'
                )
                results.append(result)
        
        return results
    
    def contextual_search(self, base_results: List[RetrievalResult], expand_count: int = 2) -> List[RetrievalResult]:
        expanded_results = list(base_results)
        
        for result in base_results:
            chunk_idx = None
            for i, chunk in enumerate(self.chunks):
                if chunk.metadata.get('chunk_id') == result.chunk_id:
                    chunk_idx = i
                    break
            
            if chunk_idx is not None and chunk_idx in self.contextual_map:
                related_indices = self.contextual_map[chunk_idx][:expand_count]
                
                for rel_idx in related_indices:
                    rel_chunk = self.chunks[rel_idx]
                    contextual_result = RetrievalResult(
                        content=rel_chunk.page_content,
                        source=rel_chunk.metadata.get('source_file', 'unknown'),
                        score=result.score * 0.8,  
                        page_number=rel_chunk.metadata.get('page_number'),
                        chunk_id=rel_chunk.metadata.get('chunk_id'),
                        retrieval_method='contextual'
                    )
                    expanded_results.append(contextual_result)
        
        return expanded_results
    
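    def hybrid_search(self, query: str, k: int = 5, weights: Optional[Dict[str, float]] = None) -> List[RetrievalResult]:
        # Note: cosine scores (at most 1) and BM25 scores (unbounded) are combined
        # without normalization, so the lexical term can dominate the ranking
        if weights is None:
            weights = {'semantic': 0.6, 'lexical': 0.4}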
        
        semantic_results = self.semantic_search(query, k)
        lexical_results = self.lexical_search(query, k)
        combined_results = {}

        for result in semantic_results:
            key = result.chunk_id
            if key not in combined_results:
                combined_results[key] = result
                combined_results[key].score = result.score * weights.get('semantic', 0.6)
            else:
                combined_results[key].score += result.score * weights.get('semantic', 0.6)
 
        for result in lexical_results:
            key = result.chunk_id
            if key not in combined_results:
                combined_results[key] = result
                combined_results[key].score = result.score * weights.get('lexical', 0.4)
            else:
                combined_results[key].score += result.score * weights.get('lexical', 0.4)
       
        sorted_results = sorted(combined_results.values(), key=lambda x: x.score, reverse=True)[:k]

        for result in sorted_results:
            result.retrieval_method = 'hybrid'
        
        return sorted_results


In [None]:
retriever = HybridRetriever(chunks)

test_query = "What is the attention mechanism in transformers?"

print("Testing different retrieval methods:")
print("=" * 50)

semantic_results = retriever.semantic_search(test_query, k=3)
print(f"\nSemantic Search Results ({len(semantic_results)} results):")
for i, result in enumerate(semantic_results):
    print(f"{i+1}. Score: {result.score:.4f} | Page: {result.page_number}")
    print(f"   Content: {result.content[:100]}...\n")

lexical_results = retriever.lexical_search(test_query, k=3)
print(f"\nLexical Search Results ({len(lexical_results)} results):")
for i, result in enumerate(lexical_results):
    print(f"{i+1}. Score: {result.score:.4f} | Page: {result.page_number}")
    print(f"   Content: {result.content[:100]}...\n")

hybrid_results = retriever.hybrid_search(test_query, k=3)
print(f"\nHybrid Search Results ({len(hybrid_results)} results):")
for i, result in enumerate(hybrid_results):
    print(f"{i+1}. Score: {result.score:.4f} | Page: {result.page_number}")
    print(f"   Content: {result.content[:100]}...\n")


INFO:sentence_transformers.SentenceTransformer:Use pytorch device_name: cpu
INFO:sentence_transformers.SentenceTransformer:Load pretrained SentenceTransformer: all-MiniLM-L6-v2



Building semantic index...



INFO:__main__:Hybrid retriever initialized with all indexes


Semantic index built with 51 chunks
Building lexical index...
Lexical index built
Building contextual index...
Contextual index built
Testing different retrieval methods:




Semantic Search Results (3 results):
1. Score: 0.5548 | Page: 2
   Content: it more difﬁcult to learn dependencies between distant positions [ 11]. In the Transformer this is
r...

2. Score: 0.5530 | Page: 9
   Content: We are excited about the future of attention-based models and plan to apply them to other tasks. We
...

3. Score: 0.4634 | Page: 5
   Content: MultiHead(Q,K,V ) = Concat(head1,..., headh)WO
where headi = Attention(QWQ
i ,KW K
i ,VW V
i )
Where...


Lexical Search Results (3 results):
1. Score: 6.1750 | Page: 2
   Content: it more difﬁcult to learn dependencies between distant positions [ 11]. In the Transformer this is
r...

2. Score: 5.8413 | Page: 2
   Content: constraint of sequential computation, however, remains.
Attention mechanisms have become an integral...

3. Score: 5.4722 | Page: 2
   Content: textual entailment and learning task-independent sentence representations [4, 22, 23, 19].
End-to-en...





Hybrid Search Results (3 results):
1. Score: 2.8029 | Page: 2
   Content: it more difﬁcult to learn dependencies between distant positions [ 11]. In the Transformer this is
r...

2. Score: 2.3365 | Page: 2
   Content: constraint of sequential computation, however, remains.
Attention mechanisms have become an integral...

3. Score: 2.1889 | Page: 2
   Content: textual entailment and learning task-independent sentence representations [4, 22, 23, 19].
End-to-en...
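
In the output above, the hybrid ranking is identical to the lexical ranking. The hybrid scores are a weighted sum of raw cosine scores (about 0.5) and raw BM25 scores (about 6), so the BM25 term dominates. One common remedy is to rescale each score list before fusing. The sketch below uses min-max scaling with the same 0.6/0.4 weights. The chunk labels (`p2-a`, `p9-a`, and so on) are hypothetical stand-ins for the real chunk IDs, and the helper names are assumptions, not part of `HybridRetriever`.

```python
from typing import Dict, List, Tuple


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; if all scores are equal, map them to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {key: 1.0 for key in scores}
    return {key: (value - lo) / (hi - lo) for key, value in scores.items()}


def fuse(semantic: Dict[str, float], lexical: Dict[str, float],
         w_sem: float = 0.6, w_lex: float = 0.4) -> List[Tuple[str, float]]:
    """Weighted sum of min-max normalized scores, highest first."""
    sem_n, lex_n = min_max(semantic), min_max(lexical)
    fused = {
        key: w_sem * sem_n.get(key, 0.0) + w_lex * lex_n.get(key, 0.0)
        for key in set(sem_n) | set(lex_n)
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Scores copied from the output above; the keys are placeholder labels.
semantic_scores = {"p2-a": 0.5548, "p9-a": 0.5530, "p5-a": 0.4634}
lexical_scores = {"p2-a": 6.1750, "p2-b": 5.8413, "p2-c": 5.4722}

ranked = fuse(semantic_scores, lexical_scores)
for chunk_label, score in ranked:
    print(f"{chunk_label}: {score:.3f}")
```

With rescaling, the page 9 chunk that semantic search ranked second moves into second place instead of being pushed out by lexical-only hits. Min-max scaling over a short top-k list has its own drawback, since the lowest item in each list always maps to 0; rank-based fusion is another option to consider.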



## 5. Agent Implementation <a id="agents"></a>

Implement the core agents that make our RAG system "agentic". Each agent has specific responsibilities and can make intelligent decisions.
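
Keyword-based query routing of the kind used by the query analysis agent is sensitive to how patterns are matched. The sketch below, which borrows a subset of the agent's patterns, compares plain substring matching with word-boundary matching. The function names and the example queries are hypothetical.

```python
import re
from typing import Dict, List

PATTERNS: Dict[str, List[str]] = {
    "comparison": ["compare", "difference", "versus", "vs"],
    "specific": ["when", "where", "who", "which"],
}


def classify_substring(query: str) -> str:
    """Return the first type with a pattern that occurs anywhere in the query."""
    query_lower = query.lower()
    for query_type, patterns in PATTERNS.items():
        if any(pattern in query_lower for pattern in patterns):
            return query_type
    return "general"


def classify_word_boundary(query: str) -> str:
    """Return the first type with a pattern that occurs as a whole word or phrase."""
    query_lower = query.lower()
    for query_type, patterns in PATTERNS.items():
        if any(re.search(rf"\b{re.escape(pattern)}\b", query_lower) for pattern in patterns):
            return query_type
    return "general"


tricky = "How do devs deploy models"   # 'vs' appears inside the word 'devs'
print(classify_substring(tricky))       # comparison
print(classify_word_boundary(tricky))   # general
print(classify_word_boundary("Transformer vs RNN"))  # comparison
```

Substring matching labels the first query a comparison because of the letters inside "devs", while the word-boundary version only fires on the standalone token.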


In [None]:
class QueryAnalysisAgent(BaseAgent):
    def __init__(self):
        super().__init__("QueryAnalyzer", "Analyzes queries and determines retrieval strategy")
        self.query_patterns = {
            'factual': ['what is', 'define', 'explain', 'describe'],
            'comparison': ['compare', 'difference', 'versus', 'vs'],
            'procedural': ['how to', 'steps', 'process', 'method'],
            'analytical': ['why', 'analyze', 'evaluate', 'assess'],
            'specific': ['when', 'where', 'who', 'which']
        }
    
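    def execute(self, query: str) -> AgentDecision:
        query_lower = query.lower()

        query_type = 'general'
        confidence = 0.5

        # Match on word boundaries so that short patterns such as 'vs' or 'who'
        # do not fire inside longer words
        for q_type, patterns in self.query_patterns.items():
            if any(re.search(rf'\b{re.escape(pattern)}\b', query_lower) for pattern in patterns):
                query_type = q_type
                confidence = 0.8
                break

        # Compare whole tokens: a substring test would find 'or' in 'transformers'
        complexity = 'simple'
        query_words = re.findall(r'\w+', query_lower)
        if len(query_words) > 10 or any(word in query_words for word in ['and', 'or', 'but', 'however']):
            complexity = 'complex'
            confidence = min(confidence + 0.1, 1.0)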

        if query_type == 'factual':
            strategy = 'semantic'
            reasoning = "Factual queries benefit from semantic understanding"
        elif query_type == 'specific':
            strategy = 'lexical'
            reasoning = "Specific queries need exact term matching"
        elif complexity == 'complex':
            strategy = 'hybrid'
            reasoning = "Complex queries require multi-modal retrieval"
        else:
            strategy = 'hybrid'
            reasoning = "Default to hybrid approach for balanced results"
        
        decision = AgentDecision(
            action=f"recommend_{strategy}_retrieval",
            reasoning=reasoning,
            confidence=confidence,
            next_steps=[f"Execute {strategy} search", "Evaluate results", "Refine if needed"],
            metadata={
                'query_type': query_type,
                'complexity': complexity,
                'recommended_strategy': strategy,
                'query_length': len(query.split())
            }
        )
        
        self.log_decision(decision)
        return decision

class RetrievalAgent(BaseAgent):
    def __init__(self, retriever: HybridRetriever):
        super().__init__("RetrievalAgent", "Executes document retrieval with various strategies")
        self.retriever = retriever
        self.retrieval_history = []
    
    def execute(self, query: str, strategy: str = 'hybrid', k: int = 5) -> list[RetrievalResult]:
        start_time = time.time()
        
        if strategy == 'semantic':
            results = self.retriever.semantic_search(query, k)
        elif strategy == 'lexical':
            results = self.retriever.lexical_search(query, k)
        else:
            # 'hybrid' and any unrecognized strategy use hybrid search
            if strategy != 'hybrid':
                logger.warning(f"Unknown strategy '{strategy}', falling back to hybrid")
            results = self.retriever.hybrid_search(query, k)
        
        retrieval_time = time.time() - start_time
        
        self.retrieval_history.append({
            'query': query,
            'strategy': strategy,
            'results_count': len(results),
            'retrieval_time': retrieval_time,
            'timestamp': time.time()
        })
        
        decision = AgentDecision(
            action=f"retrieved_{len(results)}_documents",
            reasoning=f"Used {strategy} strategy to find relevant documents",
            confidence=0.8 if results else 0.3,
            next_steps=["Evaluate result quality", "Consider refinement"] if results else ["Try alternative strategy"],
            metadata={
                'strategy_used': strategy,
                'retrieval_time': retrieval_time,
                'results_count': len(results)
            }
        )
        
        self.log_decision(decision)
        return results

class QualityAssessmentAgent(BaseAgent):
    def __init__(self):
        super().__init__("QualityAssessor", "Evaluates and scores retrieval result quality")
        self.embeddings_model = SentenceTransformer('all-MiniLM-L6-v2')
    
    def execute(self, query: str, results: list[RetrievalResult]) -> dict:
        if not results:
            decision = AgentDecision(
                action="low_quality_assessment",
                reasoning="No results to evaluate",
                confidence=1.0,
                next_steps=["Recommend alternative retrieval strategy"],
                metadata={'quality_score': 0.0, 'issues': ['no_results']}
            )
            self.log_decision(decision)
            return {'quality_score': 0.0, 'issues': ['no_results'], 'recommendation': 'retry_with_different_strategy'}
        
        quality_metrics = self._calculate_quality_metrics(query, results)
        overall_score = self._calculate_overall_score(quality_metrics)
        issues = self._identify_issues(quality_metrics)
        recommendation = self._generate_recommendation(overall_score, issues)
        
        decision = AgentDecision(
            action=f"quality_score_{overall_score:.2f}",
            reasoning=f"Assessed {len(results)} results with score {overall_score:.2f}",
            confidence=0.9,
            next_steps=[recommendation] if recommendation else ["Proceed with current results"],
            metadata={
                'quality_score': overall_score,
                'issues': issues,
                'metrics': quality_metrics
            }
        )
        
        self.log_decision(decision)
        
        return {
            'quality_score': overall_score,
            'issues': issues,
            'recommendation': recommendation,
            'metrics': quality_metrics
        }
    
    def _calculate_quality_metrics(self, query: str, results: list[RetrievalResult]) -> dict:
        # Raw retrieval scores are averaged as-is; BM25 scores are unbounded,
        # so relevance (and the overall score) can exceed 1.0 for lexical or hybrid results
        relevance_score = np.mean([result.score for result in results]) if results else 0.0

        # Encode result contents once and reuse the embeddings for diversity and coverage
        result_embeddings = (
            self.embeddings_model.encode([r.content for r in results]) if results else None
        )

        if len(results) > 1:
            similarity_matrix = cosine_similarity(result_embeddings)
            mask = ~np.eye(similarity_matrix.shape[0], dtype=bool)  # ignore self-similarity
            avg_similarity = similarity_matrix[mask].mean()
            diversity_score = 1.0 - avg_similarity  # lower pairwise similarity means higher diversity
        else:
            diversity_score = 1.0 if results else 0.0

        if results:
            query_embedding = self.embeddings_model.encode([query])
            coverage_scores = cosine_similarity(query_embedding, result_embeddings)[0]
            coverage_score = np.max(coverage_scores)  # best single-result match to the query
        else:
            coverage_score = 0.0

        if results:
            lengths = [len(r.content) for r in results]
            avg_length = np.mean(lengths)
            if 200 <= avg_length <= 2000:
                length_score = 1.0
            elif avg_length < 200:
                length_score = avg_length / 200.0
            else:
                length_score = max(0.5, 2000 / avg_length)
        else:
            length_score = 0.0
        
        return {
            'relevance': relevance_score,
            'diversity': diversity_score,
            'coverage': coverage_score,
            'length_consistency': length_score
        }
    
    def _calculate_overall_score(self, metrics: dict) -> float:
        weights = {
            'relevance': 0.4,
            'diversity': 0.2,
            'coverage': 0.3,
            'length_consistency': 0.1
        }
        
        return sum(metrics[key] * weights[key] for key in weights.keys())
    
    def _identify_issues(self, metrics: dict) -> list[str]:
        issues = []
        
        if metrics['relevance'] < 0.3:
            issues.append('low_relevance')
        if metrics['diversity'] < 0.2:
            issues.append('low_diversity')
        if metrics['coverage'] < 0.4:
            issues.append('poor_coverage')
        if metrics['length_consistency'] < 0.5:
            issues.append('inconsistent_length')
        
        return issues
    
    def _generate_recommendation(self, score: float, issues: list[str]) -> str | None:
        if score < 0.5:
            if 'low_relevance' in issues:
                return 'try_different_retrieval_strategy'
            elif 'poor_coverage' in issues:
                return 'expand_search_or_rephrase_query'
            else:
                return 'refine_retrieval_parameters'
        elif score < 0.7 and 'low_diversity' in issues:
            return 'add_contextual_expansion'
        
        return None  

class CitationAgent(BaseAgent):
    def __init__(self):
        super().__init__("CitationAgent", "Generates citations and tracks sources")
        self.citation_style = 'academic' 
    
    def execute(self, results: list[RetrievalResult]) -> dict:
        citations = []
        source_map = {}
        
        for i, result in enumerate(results):
            citation = self._generate_citation(result, i + 1)
            citations.append(citation)
            
            source_key = f"source_{i + 1}"
            source_map[source_key] = {
                'file': result.source,
                'page': result.page_number,
                'chunk_id': result.chunk_id,
                'retrieval_method': result.retrieval_method,
                'score': result.score
            }
        
        decision = AgentDecision(
            action=f"generated_{len(citations)}_citations",
            reasoning="Created proper citations for all retrieved sources",
            confidence=0.95,
            next_steps=["Include citations in response"],
            metadata={
                'citation_count': len(citations),
                'citation_style': self.citation_style
            }
        )
        
        self.log_decision(decision)
        
        return {
            'citations': citations,
            'source_map': source_map,
            'bibliography': self._generate_bibliography(results)
        }
    
    def _generate_citation(self, result: RetrievalResult, ref_number: int) -> str:
        filename = Path(result.source).stem if result.source else "Unknown"
        
        if self.citation_style == 'academic':
            if result.page_number:
                return f"[{ref_number}] {filename}, page {result.page_number}"
            else:
                return f"[{ref_number}] {filename}"
        else:
            return f"({ref_number}) {filename}"
    
    def _generate_bibliography(self, results: List[RetrievalResult]) -> List[str]:
        unique_sources = set()
        bibliography = []
        
        for result in results:
            source_key = (result.source, result.page_number)
            if source_key not in unique_sources:
                unique_sources.add(source_key)
                filename = Path(result.source).stem if result.source else "Unknown Source"
                entry = f"{filename}"
                if result.page_number:
                    entry += f", page {result.page_number}"
                bibliography.append(entry)
        
        return bibliography
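
To make the scoring in `QualityAssessmentAgent` concrete, the following standalone sketch recomputes the diversity metric and the weighted overall score in plain Python. The toy embedding vectors and metric values are made-up illustrative numbers, and `cosine` and `diversity` are hypothetical helper names; only the weights are taken from `_calculate_overall_score` above.

```python
import math
from itertools import permutations

def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def diversity(vectors):
    """1 - mean cosine similarity over all ordered pairs of distinct results."""
    pairs = list(permutations(range(len(vectors)), 2))
    avg_similarity = sum(cosine(vectors[i], vectors[j]) for i, j in pairs) / len(pairs)
    return 1.0 - avg_similarity

# Toy "embeddings" (assumed values): two identical results and one orthogonal result
toy_embeddings = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
toy_diversity = diversity(toy_embeddings)

# Weights copied from _calculate_overall_score; the other metric values are illustrative
weights = {'relevance': 0.4, 'diversity': 0.2, 'coverage': 0.3, 'length_consistency': 0.1}
metrics = {'relevance': 0.5, 'diversity': toy_diversity, 'coverage': 0.8, 'length_consistency': 1.0}
overall = sum(metrics[key] * weights[key] for key in weights)

print(f"diversity={toy_diversity:.3f}, overall={overall:.3f}")  # diversity=0.667, overall=0.673
```

Two of the three toy results are duplicates, so a third of the pairwise similarities are 1.0 and the diversity drops to about 0.67. With the weights above, that still leaves the overall score just under the 0.7 cut-off used in `_generate_recommendation`.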

## 6. Orchestration Agent - The Central Controller <a id="orchestration"></a>

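The orchestration agent is the central controller of the system. It runs query analysis, iterates over retrieval and quality assessment while keeping the best-scoring results, generates citations, and compiles the final response.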
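
As a rough sketch of that control flow, stripped of the notebook's classes: the function below tries one retrieval strategy per iteration, keeps the best-scoring results, and stops early once a quality threshold is reached. `run_agentic_loop`, `fake_retrieve`, and `fake_assess` are hypothetical names, and the stub scores are invented for illustration; the real system delegates these steps to `RetrievalAgent` and `QualityAssessmentAgent`.

```python
from typing import Callable

def run_agentic_loop(
    query: str,
    retrieve: Callable[[str, str], list[str]],
    assess: Callable[[str, list[str]], float],
    strategies: list[str],
    quality_threshold: float = 0.6,
) -> dict:
    """Try one strategy per iteration, keep the best results, stop early when good enough."""
    best_results: list[str] = []
    best_score = 0.0
    tried: list[str] = []

    for strategy in strategies:
        results = retrieve(query, strategy)
        score = assess(query, results)
        tried.append(strategy)

        # Remember the best round seen so far
        if score > best_score:
            best_results, best_score = results, score

        # Good enough: skip the remaining retrieval rounds
        if score >= quality_threshold:
            break

    return {'results': best_results, 'quality_score': best_score, 'strategies_tried': tried}


# Stub retriever and scorer (hypothetical stand-ins for the real agents)
def fake_retrieve(query: str, strategy: str) -> list[str]:
    return [f"{strategy} hit for '{query}'"]

def fake_assess(query: str, results: list[str]) -> float:
    stub_scores = {'semantic': 0.45, 'hybrid': 0.72, 'lexical': 0.30}
    return stub_scores[results[0].split()[0]]

outcome = run_agentic_loop(
    "attention mechanism", fake_retrieve, fake_assess,
    strategies=['semantic', 'hybrid', 'lexical'],
)
print(outcome['strategies_tried'], outcome['quality_score'])
```

In this run the loop stops after the second strategy, because the stubbed score for `hybrid` exceeds the threshold, so `lexical` is never tried.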


In [None]:
# class OrchestrationAgent(BaseAgent):
#     """Central agent that coordinates all other agents and implements agentic behavior"""
    
#     def __init__(self, retriever: HybridRetriever):
#         super().__init__("Orchestrator", "Central coordinator for agentic RAG system")
        
#         # Initialize all sub-agents
#         self.query_analyzer = QueryAnalysisAgent()
#         self.retrieval_agent = RetrievalAgent(retriever)
#         self.quality_assessor = QualityAssessmentAgent()
#         self.citation_agent = CitationAgent()
        
#         # System configuration
#         self.max_iterations = 3
#         self.quality_threshold = 0.6
#         self.conversation_history = []
    
#     def execute(self, query: str, context: Dict[str, Any] = None) -> Dict[str, Any]:
#         """Execute the full agentic RAG pipeline"""
#         context = context or {}
#         start_time = time.time()
        
#         # Store conversation
#         conversation_id = len(self.conversation_history)
#         conversation = {
#             'id': conversation_id,
#             'query': query,
#             'context': context,
#             'start_time': start_time,
#             'iterations': [],
#             'final_results': None
#         }
        
#         logger.info(f"Starting agentic RAG for query: {query[:50]}...")
        
#         # Phase 1: Query Analysis
#         analysis_decision = self.query_analyzer.execute(query)
        
#         # Phase 2: Iterative Retrieval and Quality Assessment
#         best_results = []
#         best_quality_score = 0.0
        
#         for iteration in range(self.max_iterations):
#             iteration_start = time.time()
            
#             # Determine retrieval strategy for this iteration
#             if iteration == 0:
#                 strategy = analysis_decision.metadata['recommended_strategy']
#             else:
#                 # Adapt strategy based on previous results
#                 strategy = self._adapt_strategy(iteration, best_quality_score)
            
#             # Execute retrieval
#             results = self.retrieval_agent.execute(query, strategy, k=5)
            
#             # Assess quality
#             quality_assessment = self.quality_assessor.execute(query, results)
#             current_quality_score = quality_assessment['quality_score']
            
#             # Store iteration results
#             iteration_data = {
#                 'iteration': iteration + 1,
#                 'strategy': strategy,
#                 'results_count': len(results),
#                 'quality_score': current_quality_score,
#                 'issues': quality_assessment['issues'],
#                 'recommendation': quality_assessment['recommendation'],
#                 'time': time.time() - iteration_start
#             }
#             conversation['iterations'].append(iteration_data)
            
#             # Update best results if current iteration is better
#             if current_quality_score > best_quality_score:
#                 best_results = results
#                 best_quality_score = current_quality_score
            
#             logger.info(f"Iteration {iteration + 1}: Strategy={strategy}, Quality={current_quality_score:.3f}")
            
#             # Check if we should stop iterating
#             if self._should_stop_iteration(current_quality_score, quality_assessment, iteration):
#                 break
        
#         # Phase 3: Generate Citations
#         citation_data = self.citation_agent.execute(best_results)
        
#         # Phase 4: Compile Final Response
#         final_response = self._compile_response(
#             query, best_results, best_quality_score, citation_data, conversation
#         )
        
#         # Store final results
#         conversation['final_results'] = final_response
#         conversation['total_time'] = time.time() - start_time
#         self.conversation_history.append(conversation)
        
#         # Log final decision
#         final_decision = AgentDecision(
#             action="completed_agentic_rag",
#             reasoning=f"Completed RAG with {len(conversation['iterations'])} iterations, final quality: {best_quality_score:.3f}",
#             confidence=min(best_quality_score + 0.2, 1.0),
#             next_steps=["Present results to user"],
#             metadata={
#                 'total_time': conversation['total_time'],
#                 'iterations_used': len(conversation['iterations']),
#                 'final_quality': best_quality_score,
#                 'conversation_id': conversation_id
#             }
#         )
#         self.log_decision(final_decision)
        
#         return final_response
    
#     def _adapt_strategy(self, iteration: int, previous_quality: float) -> str:
#         """Adapt retrieval strategy based on previous results"""
#         if previous_quality < 0.4:
#             # Low quality, try different approach
#             strategies = ['hybrid', 'semantic', 'lexical']
#             return strategies[iteration % len(strategies)]
#         elif previous_quality < 0.6:
#             # Medium quality, try hybrid approach
#             return 'hybrid'
#         else:
#             # Good quality, stick with what works
#             return 'semantic'
    
#     def _should_stop_iteration(self, quality_score: float, assessment: Dict, iteration: int) -> bool:
#         """Determine if we should stop iterating"""
#         # Stop if quality is good enough
#         if quality_score >= self.quality_threshold:
#             return True
        
#         # Stop if no recommendation for improvement
#         if not assessment['recommendation']:
#             return True
        
#         # Continue if we have iterations left and there's room for improvement
#         return False
    
#     def _compile_response(self, query: str, results: List[RetrievalResult], 
#                          quality_score: float, citation_data: Dict, 
#                          conversation: Dict) -> Dict[str, Any]:
#         """Compile the final response with all metadata"""
        
#         # Extract key information from results
#         content_summary = self._summarize_content(results)
        
#         return {
#             'query': query,
#             'results': [asdict(result) for result in results],
#             'content_summary': content_summary,
#             'quality_metrics': {
#                 'overall_score': quality_score,
#                 'confidence': min(quality_score + 0.2, 1.0),
#                 'result_count': len(results)
#             },
#             'citations': citation_data['citations'],
#             'bibliography': citation_data['bibliography'],
#             'process_metadata': {
#                 'iterations_used': len(conversation['iterations']),
#                 'total_time': conversation.get('total_time', 0),
#                 'strategies_tried': [iter_data['strategy'] for iter_data in conversation['iterations']],
#                 'conversation_id': conversation['id']
#             },
#             'agent_decisions': {
#                 'query_analysis': self.query_analyzer.decision_history[-1] if self.query_analyzer.decision_history else None,
#                 'retrieval_decisions': self.retrieval_agent.decision_history[-3:],  # Last 3 decisions
#                 'quality_assessments': self.quality_assessor.decision_history[-3:],
#                 'orchestration': self.decision_history[-1] if self.decision_history else None
#             }
#         }
    
#     def _summarize_content(self, results: List[RetrievalResult]) -> str:
#         """Create a summary of the retrieved content"""
#         if not results:
#             return "No relevant content found."
        
#         # Combine content from top results
#         combined_content = "\n\n".join([result.content for result in results[:3]])
        
#         # Simple extractive summary (can be enhanced with proper summarization models)
#         sentences = combined_content.split('. ')
#         # Return first few sentences as summary
#         summary = '. '.join(sentences[:3]) + '.' if len(sentences) >= 3 else combined_content
        
#         return summary[:500] + "..." if len(summary) > 500 else summary
    
#     def get_conversation_history(self) -> List[Dict]:
#         """Return conversation history for analysis"""
#         return self.conversation_history
    
#     def get_system_stats(self) -> Dict[str, Any]:
#         """Get system performance statistics"""
#         if not self.conversation_history:
#             return {'status': 'No conversations yet'}
        
#         total_conversations = len(self.conversation_history)
#         avg_iterations = np.mean([len(conv['iterations']) for conv in self.conversation_history])
#         avg_quality = np.mean([conv['final_results']['quality_metrics']['overall_score'] 
#                               for conv in self.conversation_history if conv['final_results']])
#         avg_time = np.mean([conv.get('total_time', 0) for conv in self.conversation_history])
        
#         return {
#             'total_conversations': total_conversations,
#             'average_iterations': avg_iterations,
#             'average_quality_score': avg_quality,
#             'average_response_time': avg_time,
#             'retrieval_agent_calls': len(self.retrieval_agent.retrieval_history),
#             'quality_assessments': len(self.quality_assessor.decision_history)
#         }



In [None]:
# 1. Free Hugging Face LLM Integration
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import torch

class HuggingFaceLLMGenerator(BaseAgent):
    def __init__(self, model_name: str = "microsoft/DialoGPT-medium"):
        super().__init__("HF_LLMGenerator", "Generates responses using Hugging Face models")
        
        self.available_models = {
            "microsoft/DialoGPT-medium": "Conversational model",
            "google/flan-t5-base": "Text-to-text model (recommended)",
            "facebook/opt-350m": "Lightweight causal LM",
            "distilgpt2": "Lightweight GPT-2 variant"
        }
        
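        # Defaults so these attributes exist even when no model is loaded
        self.model_name = model_name
        self.model_loaded = False
        self.generator = None

        if model_name == "fallback":
            # Sentinel passed by OrchestrationAgent: skip loading, use rule-based generation
            print("🔄 Using rule-based generation (no model requested)")
            return

        try:
            print(f"Loading model: {model_name}")

            # Causal LMs (GPT/OPT families) use text-generation; everything else is treated as seq2seq
            name = model_name.lower()
            task = "text-generation" if "gpt" in name or "opt" in name else "text2text-generation"

            device = 0 if torch.cuda.is_available() else -1
            self.generator = pipeline(
                task,
                model=model_name,
                device=device,
                torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32
            )

            self.model_loaded = True
            print(f"✅ Model {model_name} loaded successfully!")

        except Exception as e:
            print(f"❌ Failed to load {model_name}: {e}")
            print("🔄 Falling back to rule-based generation...")
            self.model_loaded = False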
    
    def execute(self, query: str, retrieved_content: List[RetrievalResult], 
                citations: List[str]) -> Dict[str, Any]:
       
        if self.model_loaded:
            response = self._generate_hf_response(query, retrieved_content, citations)
        else:
            response = self._generate_rule_based_response(query, retrieved_content, citations)
        
        decision = AgentDecision(
            action="generated_response",
            reasoning=f"Generated response using {'HF model' if self.model_loaded else 'rule-based'} approach",
            confidence=0.85 if self.model_loaded else 0.7,
            next_steps=["Present final response"],
            metadata={
                'response_length': len(response['content']),
                'sources_used': len(retrieved_content),
                'generation_method': response['method']
            }
        )
        
        self.log_decision(decision)
        return response
    
    def _generate_hf_response(self, query: str, retrieved_content: List[RetrievalResult], 
                             citations: List[str]) -> Dict[str, Any]:
        context_parts = []
        for i, result in enumerate(retrieved_content[:3]):  
            context_parts.append(f"Source {i+1}: {result.content[:200]}...")
        
        context = " ".join(context_parts)
        
        if "flan" in self.model_name.lower():
            prompt = f"Answer the question based on the context.\n\nContext: {context}\n\nQuestion: {query}\n\nAnswer:"
        else:
            prompt = f"Based on the research context: {context[:400]}...\n\nQuestion: {query}\nAnswer:"
        
        try:
            if "flan" in self.model_name.lower():
                outputs = self.generator(prompt, max_length=200, num_return_sequences=1)
                content = outputs[0]['generated_text'].strip()
            else:
                outputs = self.generator(
                    prompt, 
                    max_new_tokens=150,
                    num_return_sequences=1,
                    temperature=0.7,
                    do_sample=True,
                    pad_token_id=self.generator.tokenizer.eos_token_id
                )
                generated_text = outputs[0]['generated_text']
                content = generated_text[len(prompt):].strip()
            
            return {
                'content': content,
                'method': 'hf_generated',
                'model_used': self.model_name
            }
            
        except Exception as e:
            logger.error(f"HF generation failed: {e}")
            return self._generate_rule_based_response(query, retrieved_content, citations)
    
    def _generate_rule_based_response(self, query: str, retrieved_content: List[RetrievalResult], 
                                     citations: List[str]) -> Dict[str, Any]:
 
        if not retrieved_content:
            return {
                'content': "I couldn't find relevant information to answer your question.",
                'method': 'rule_based',
                'confidence': 0.1
            }
        
        key_sentences = []
        important_terms = []
        
        query_terms = set(word.lower() for word in query.split() if len(word) > 3)
        
        for result in retrieved_content[:3]:
            sentences = [s.strip() for s in result.content.split('.') if len(s.strip()) > 20]
            
            for sentence in sentences:
                sentence_terms = set(word.lower() for word in sentence.split() if len(word) > 3)
                overlap = len(query_terms.intersection(sentence_terms))
                
                if overlap >= 2:  
                    key_sentences.append(sentence)
                    caps_words = [w for w in sentence.split() if w[0].isupper() and len(w) > 3]
                    important_terms.extend(caps_words[:3])
        
        if key_sentences:
            response_parts = [
                f"Based on the retrieved research documents about {query.lower()}:",
                "",
                ". ".join(key_sentences[:2]) + ".",
                "",
                "Key concepts mentioned:",
                ", ".join(list(set(important_terms))[:5]) if important_terms else "Various technical concepts",
                "",
                f"Additional context from page {retrieved_content[0].page_number}:",
                retrieved_content[0].content[:150] + "..."
            ]
            
            content = "\n".join(response_parts)
        else:
            content = (
                f"Based on the available research documentation:\n\n"
                f"{retrieved_content[0].content[:200]}...\n\n"
                f"This information comes from page {retrieved_content[0].page_number} "
                f"of the research paper."
            )
        
        return {
            'content': content,
            'method': 'enhanced_rule_based',
            'sentences_used': len(key_sentences)
        }

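# 2. Quality Assessment Agent (recalibrated metrics)
# Note: this subclass deliberately reuses the parent's name, so it replaces the earlier definition in the notebook.
class QualityAssessmentAgent(QualityAssessmentAgent):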
    def _calculate_quality_metrics(self, query: str, results: list[RetrievalResult]) -> dict:
        if not results:
            return {
                'relevance': 0.0,
                'diversity': 0.0,
                'coverage': 0.0,
                'length_consistency': 0.0,
                'term_overlap': 0.0
            }
        
        query_terms = set(word.lower() for word in query.split() if len(word) > 2)
    
        # 1. Relevance: retriever scores clipped to [0, 1]
        relevance_scores = []
        for result in results:
            normalized_score = min(result.score, 1.0) if result.score > 0 else 0.0
            relevance_scores.append(normalized_score)
        
        relevance_score = np.mean(relevance_scores) if relevance_scores else 0.0
        
        # 2. Term overlap: share of query terms found in each result
        term_overlap_scores = []
        for result in results:
            content_terms = set(word.lower() for word in result.content.split() if len(word) > 2)
            if query_terms:
                overlap_ratio = len(query_terms.intersection(content_terms)) / len(query_terms)
                term_overlap_scores.append(overlap_ratio)
        
        term_overlap_score = np.mean(term_overlap_scores) if term_overlap_scores else 0.0
        
        # 3. Diversity: spread across pages plus variation in content length
        if len(results) > 1:
            # Check for different pages
            unique_pages = len(set(r.page_number for r in results if r.page_number))
            page_diversity = min(unique_pages / len(results), 1.0)
            
            # Check content diversity 
            content_lengths = [len(r.content) for r in results]
            length_variance = np.var(content_lengths) / np.mean(content_lengths) if content_lengths else 0
            length_diversity = min(length_variance, 1.0)
            
            diversity_score = (page_diversity + length_diversity) / 2
        else:
            diversity_score = 0.5  
        
        # 4. Coverage score 
        coverage_score = min((relevance_score + term_overlap_score) / 2, 1.0)
        
        # 5. Length consistency
        if results:
            lengths = [len(r.content) for r in results]
            avg_length = np.mean(lengths)
            # Prefer lengths between 100-1000 characters
            if 100 <= avg_length <= 1000:
                length_score = 1.0
            elif avg_length < 100:
                length_score = avg_length / 100.0
            else:
                length_score = max(0.3, 1000 / avg_length)
        else:
            length_score = 0.0
        
        return {
            'relevance': relevance_score,
            'diversity': diversity_score,
            'coverage': coverage_score,
            'length_consistency': length_score,
            'term_overlap': term_overlap_score
        }
    
    def _calculate_overall_score(self, metrics: dict) -> float:
        weights = {
            'relevance': 0.3,
            'term_overlap': 0.25, 
            'coverage': 0.2,
            'diversity': 0.15,
            'length_consistency': 0.1
        }
        
        score = sum(metrics.get(key, 0) * weights[key] for key in weights.keys())
        return min(score, 1.0) 
    
    def _identify_issues(self, metrics: dict) -> list[str]:
        issues = []
        
        if metrics['relevance'] < 0.4:
            issues.append('low_relevance')
        if metrics['term_overlap'] < 0.3:
            issues.append('poor_term_overlap')
        if metrics['diversity'] < 0.3:
            issues.append('low_diversity')
        if metrics['coverage'] < 0.4:
            issues.append('poor_coverage')
        if metrics['length_consistency'] < 0.5:
            issues.append('inconsistent_length')
        
        return issues

# 3. Orchestration Agent 
class OrchestrationAgent(BaseAgent):
    def __init__(self, retriever: HybridRetriever, use_hf_llm: bool = True):
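        # BaseAgent expects (name, description); the retriever is passed to RetrievalAgent instead
        super().__init__("Orchestrator", "Central coordinator for agentic RAG system")

        # Sub-agents and state used by execute()
        self.query_analyzer = QueryAnalysisAgent()
        self.retrieval_agent = RetrievalAgent(retriever)
        self.quality_assessor = QualityAssessmentAgent()
        self.citation_agent = CitationAgent()
        self.conversation_history = []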
        
        if use_hf_llm:
            models_to_try = [
                "google/flan-t5-base", 
                "distilgpt2",           
                "microsoft/DialoGPT-medium"  
            ]
            
            self.llm_generator = None
            for model in models_to_try:
                try:
                    self.llm_generator = HuggingFaceLLMGenerator(model)
                    if self.llm_generator.model_loaded:
                        break
                except Exception:
                    continue
            
            if not self.llm_generator or not self.llm_generator.model_loaded:
                print("⚠️ No HF model could be loaded, using enhanced rule-based generation")
                self.llm_generator = HuggingFaceLLMGenerator("fallback")  
        else:
            self.llm_generator = HuggingFaceLLMGenerator("fallback")
        
        # Fixed configuration
        self.max_iterations = 3
        self.quality_threshold = 0.5  
        self.min_iterations = 2
    
    def _should_stop_iteration(self, quality_score: float, assessment: Dict, iteration: int) -> bool:
        if iteration < self.min_iterations - 1:
            return False
        
        if quality_score >= 0.75:
            return True
        
        if iteration >= self.max_iterations - 1:
            return True
        
        return False
    
    def _adapt_strategy(self, iteration: int, previous_quality: float) -> str:
        if iteration == 1:
            return 'hybrid'
        elif iteration == 2:
            if previous_quality < 0.5:
                return 'lexical' 
            else:
                return 'semantic'  
        else:
            return 'hybrid'  
    
    def execute(self, query: str, context: Dict[str, Any] = None) -> Dict[str, Any]:
        context = context or {}
        start_time = time.time()
        
        conversation_id = len(self.conversation_history)
        conversation = {
            'id': conversation_id,
            'query': query,
            'context': context,
            'start_time': start_time,
            'iterations': [],
            'final_results': None
        }
        
        logger.info(f"🚀 Starting FIXED agentic RAG for query: {query[:50]}...")
 
        analysis_decision = self.query_analyzer.execute(query)
        best_results = []
        best_quality_score = 0.0
        
        for iteration in range(self.max_iterations):
            iteration_start = time.time()
            
            if iteration == 0:
                strategy = analysis_decision.metadata['recommended_strategy']
            else:
                strategy = self._adapt_strategy(iteration, best_quality_score)

            k = 5 + iteration  
            results = self.retrieval_agent.execute(query, strategy, k=k)
            quality_assessment = self.quality_assessor.execute(query, results)
            current_quality_score = quality_assessment['quality_score']
            
            iteration_data = {
                'iteration': iteration + 1,
                'strategy': strategy,
                'results_count': len(results),
                'quality_score': current_quality_score,
                'issues': quality_assessment['issues'],
                'recommendation': quality_assessment['recommendation'],
                'time': time.time() - iteration_start
            }
            conversation['iterations'].append(iteration_data)

            if current_quality_score > best_quality_score:
                best_results = results
                best_quality_score = current_quality_score
            
            logger.info(f"🔄 Iteration {iteration + 1}: Strategy={strategy}, Quality={current_quality_score:.3f}")
      
            if self._should_stop_iteration(current_quality_score, quality_assessment, iteration):
                logger.info(f"🛑 Stopping after {iteration + 1} iterations")
                break
 
        citation_data = self.citation_agent.execute(best_results)

        if self.llm_generator:
            llm_response = self.llm_generator.execute(query, best_results, citation_data['citations'])
        else:
            llm_response = {'content': self._summarize_content(best_results), 'method': 'summary'}

        final_response = self._compile_enhanced_response(
            query, best_results, best_quality_score, citation_data, llm_response, conversation
        )

        conversation['final_results'] = final_response
        conversation['total_time'] = time.time() - start_time
        self.conversation_history.append(conversation)
 
        final_decision = AgentDecision(
            action="completed_fixed_agentic_rag",
            reasoning=f"✅ Completed FIXED RAG with {len(conversation['iterations'])} iterations, final quality: {best_quality_score:.3f}",
            confidence=min(best_quality_score + 0.2, 1.0),
            next_steps=["Present results to user"],
            metadata={
                'total_time': conversation['total_time'],
                'iterations_used': len(conversation['iterations']),
                'final_quality': best_quality_score,
                'conversation_id': conversation_id
            }
        )
        self.log_decision(final_decision)
        
        return final_response
    
    def _compile_enhanced_response(self, query: str, results: List[RetrievalResult], 
                                  quality_score: float, citation_data: Dict, 
                                  llm_response: Dict, conversation: Dict) -> Dict[str, Any]:
        
        return {
            'query': query,
            'response': llm_response['content'],
            'generation_method': llm_response['method'],
            'results': [asdict(result) for result in results],
            'content_summary': self._summarize_content(results),
            'quality_metrics': {
                'overall_score': quality_score,
                'confidence': min(quality_score + 0.2, 1.0),
                'result_count': len(results)
            },
            'citations': citation_data['citations'],
            'bibliography': citation_data['bibliography'],
            'process_metadata': {
                'iterations_used': len(conversation['iterations']),
                'total_time': conversation.get('total_time', 0),
                'strategies_tried': [iter_data['strategy'] for iter_data in conversation['iterations']],
                'conversation_id': conversation['id']
            },
            'agent_decisions': {
                'query_analysis': self.query_analyzer.decision_history[-1] if self.query_analyzer.decision_history else None,
                'retrieval_decisions': self.retrieval_agent.decision_history[-3:],
                'quality_assessments': self.quality_assessor.decision_history[-3:],
                'llm_generation': self.llm_generator.decision_history[-1] if self.llm_generator and self.llm_generator.decision_history else None,
                'orchestration': self.decision_history[-1] if self.decision_history else None
            }
        }

    def _summarize_content(self, results: List[RetrievalResult]) -> str:
        # Simple extractive summary: first three sentences of the top three results
        if not results:
            return "No relevant content found."

        combined_content = "\n\n".join(result.content for result in results[:3])
        sentences = combined_content.split('. ')
        summary = '. '.join(sentences[:3]) + '.' if len(sentences) >= 3 else combined_content

        return summary[:500] + "..." if len(summary) > 500 else summary

    def get_system_stats(self) -> Dict[str, Any]:
        # Aggregate statistics over all conversations handled so far
        if not self.conversation_history:
            return {'status': 'No conversations yet'}

        return {
            'total_conversations': len(self.conversation_history),
            'average_iterations': np.mean([len(conv['iterations']) for conv in self.conversation_history]),
            'average_quality_score': np.mean([conv['final_results']['quality_metrics']['overall_score']
                                              for conv in self.conversation_history if conv['final_results']]),
            'average_response_time': np.mean([conv.get('total_time', 0) for conv in self.conversation_history]),
            'retrieval_agent_calls': len(self.retrieval_agent.retrieval_history),
            'quality_assessments': len(self.quality_assessor.decision_history)
        }

print("🔧 Installing required packages for HuggingFace integration...")

🔧 Installing required packages for HuggingFace integration...
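
The stopping rule in `_should_stop_iteration` can be exercised on its own. The sketch below mirrors its three checks (minimum number of rounds, early stop at high quality, hard cap) as a free function; `should_stop`, `iterations_used`, and the `early_stop_quality` parameter are hypothetical names, with `early_stop_quality` standing in for the hard-coded 0.75 in the class, and the score sequences are invented.

```python
def should_stop(quality_score: float, iteration: int,
                min_iterations: int = 2, max_iterations: int = 3,
                early_stop_quality: float = 0.75) -> bool:
    """Stopping rule with a zero-based iteration index."""
    if iteration < min_iterations - 1:
        return False  # always run at least min_iterations rounds
    if quality_score >= early_stop_quality:
        return True   # good enough once the minimum number of rounds is done
    return iteration >= max_iterations - 1  # hard cap on the number of rounds


def iterations_used(scores: list[float]) -> int:
    """Count how many rounds run for a given sequence of per-round quality scores."""
    used = 0
    for iteration, score in enumerate(scores):
        used = iteration + 1
        if should_stop(score, iteration):
            break
    return used

print(iterations_used([0.90, 0.80, 0.70]))  # a high first score still triggers a second round
print(iterations_used([0.40, 0.50, 0.60]))  # never reaches 0.75, so it runs to the cap
```

The first sequence stops after two rounds and the second after three. Note that the rule only takes effect if the calling loop actually exits when it returns `True`.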


In [None]:
print("🚀 Initializing Agentic RAG System with HuggingFace LLM...")

# Initialize the orchestrator
orchestrator = OrchestrationAgent(retriever, use_hf_llm=True)

# Test with the same queries
test_queries = [
    "What is the attention mechanism in transformers?",
    "How does multi-head attention work?",
    "Compare scaled dot-product attention with additive attention"
]

print("\n" + "="*60)
print("TESTING AGENTIC RAG SYSTEM")
print("="*60)

for idx, query in enumerate(test_queries, 1):
    print(f"\n🔍 Test {idx}: {query}")
    print("-" * 50)
    
    response = orchestrator.execute(query)
    
    print(f"✅ Quality Score: {response['quality_metrics']['overall_score']:.3f}")
    print(f"🔄 Iterations Used: {response['process_metadata']['iterations_used']}")
    print(f"🤖 Generation Method: {response['generation_method']}")
    print(f"⏱️ Total Time: {response['process_metadata']['total_time']:.2f}s")
    
    print("\n📝 Generated Response:")
    text = response['response']
    # Truncate long responses to 400 characters for display
    print((text[:400] + "...") if len(text) > 400 else text)
    
    print(f"\n📚 Citations:")
    for citation in response['citations'][:3]:
        print(f"  • {citation}")
    
    if idx < len(test_queries):
        print("\n" + "="*60)

print(f"\n📊 PERFORMANCE COMPARISON:")
print("="*40)

# Run performance analysis with the fixed system
fixed_analyzer = PerformanceAnalyzer(orchestrator)
fixed_comparison = fixed_analyzer.compare_with_traditional_rag(test_queries)

print(f" Agentic RAG Quality: {np.mean(fixed_comparison['agentic_quality']):.3f}")
print(f" Agentic RAG Avg Iterations: {np.mean(fixed_comparison['agentic_iterations']):.1f}")
print(f" Agentic RAG Avg Time: {np.mean(fixed_comparison['agentic_times']):.3f}s")

# Get system stats
stats = orchestrator.get_system_stats()
print(f"\n📈 System Statistics:")
for key, value in stats.items():
    if isinstance(value, float):
        print(f"  {key.replace('_', ' ').title()}: {value:.3f}")
    else:
        print(f"  {key.replace('_', ' ').title()}: {value}")

print("\n🎯 Expected Improvements:")
print("  ✅ Minimum 2 iterations guaranteed")
print("  ✅ Better quality scoring calibration")
print("  ✅ HuggingFace LLM integration (free)")
print("  ✅ Enhanced response generation")
print("  ✅ Improved strategy adaptation")

INFO:sentence_transformers.SentenceTransformer:Use pytorch device_name: cpu
INFO:sentence_transformers.SentenceTransformer:Load pretrained SentenceTransformer: all-MiniLM-L6-v2


🚀 Initializing FIXED Agentic RAG System with HuggingFace LLM...


INFO:sentence_transformers.SentenceTransformer:Use pytorch device_name: cpu
INFO:sentence_transformers.SentenceTransformer:Load pretrained SentenceTransformer: all-MiniLM-L6-v2


Loading model: google/flan-t5-base


To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
`torch_dtype` is deprecated! Use `dtype` instead!
Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`


Device set to use cpu
INFO:__main__:🚀 Starting FIXED agentic RAG for query: What is the attention mechanism in transformers?...
INFO:__main__:QueryAnalyzer: recommend_semantic_retrieval - Factual queries benefit from semantic understanding


✅ Model google/flan-t5-base loaded successfully!

TESTING FIXED AGENTIC RAG SYSTEM

🔍 Test 1: What is the attention mechanism in transformers?
--------------------------------------------------


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

INFO:__main__:RetrievalAgent: retrieved_5_documents - Used semantic strategy to find relevant documents
INFO:__main__:QualityAssessor: quality_score_0.59 - Assessed 5 results with score 0.59
INFO:__main__:🔄 Iteration 1: Strategy=semantic, Quality=0.587


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

INFO:__main__:RetrievalAgent: retrieved_6_documents - Used hybrid strategy to find relevant documents
INFO:__main__:QualityAssessor: quality_score_0.79 - Assessed 6 results with score 0.79
INFO:__main__:🔄 Iteration 2: Strategy=hybrid, Quality=0.787
INFO:__main__:🛑 Stopping after 2 iterations
INFO:__main__:CitationAgent: generated_6_citations - Created proper citations for all retrieved sources
Both `max_new_tokens` (=256) and `max_length`(=200) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
INFO:__main__:HF_LLMGenerator: generated_response - Generated response using HF model approach
INFO:__main__:Orchestrator: completed_fixed_agentic_rag - ✅ Completed FIXED RAG with 2 iterations, final quality: 0.787
INFO:__main__:🚀 Starting FIXED agentic RAG for query: How does multi-head attention work?...
INFO:__main__:QueryAnalyzer: recommend_hybrid

✅ Quality Score: 0.787
🔄 Iterations Used: 2
🤖 Generation Method: hf_generated
⏱️ Total Time: 0.00s

📝 Generated Response:
a recurrent attention mechanism instead of sequence- aligned recurren

📚 Citations:
  • [1] NIPS-2017-attention-is-all-you-need-Paper, page 2
  • [2] NIPS-2017-attention-is-all-you-need-Paper, page 2
  • [3] NIPS-2017-attention-is-all-you-need-Paper, page 2


🔍 Test 2: How does multi-head attention work?
--------------------------------------------------


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

INFO:__main__:RetrievalAgent: retrieved_5_documents - Used hybrid strategy to find relevant documents
INFO:__main__:QualityAssessor: quality_score_0.72 - Assessed 5 results with score 0.72
INFO:__main__:🔄 Iteration 1: Strategy=hybrid, Quality=0.724


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

INFO:__main__:RetrievalAgent: retrieved_6_documents - Used hybrid strategy to find relevant documents
INFO:__main__:QualityAssessor: quality_score_0.71 - Assessed 6 results with score 0.71
INFO:__main__:🔄 Iteration 2: Strategy=hybrid, Quality=0.711


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

INFO:__main__:RetrievalAgent: retrieved_7_documents - Used semantic strategy to find relevant documents
INFO:__main__:QualityAssessor: quality_score_0.53 - Assessed 7 results with score 0.53
INFO:__main__:🔄 Iteration 3: Strategy=semantic, Quality=0.526
INFO:__main__:🛑 Stopping after 3 iterations
INFO:__main__:CitationAgent: generated_5_citations - Created proper citations for all retrieved sources
Both `max_new_tokens` (=256) and `max_length`(=200) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
INFO:__main__:HF_LLMGenerator: generated_response - Generated response using HF model approach
INFO:__main__:Orchestrator: completed_fixed_agentic_rag - ✅ Completed FIXED RAG with 3 iterations, final quality: 0.724
INFO:__main__:🚀 Starting FIXED agentic RAG for query: Compare scaled dot-product attention with additive...
INFO:__main__:QueryAnalyze

✅ Quality Score: 0.724
🔄 Iterations Used: 3
🤖 Generation Method: hf_generated
⏱️ Total Time: 0.00s

📝 Generated Response:
Multi-Head Attention consists of several attention layers running in parallel.

📚 Citations:
  • [1] NIPS-2017-attention-is-all-you-need-Paper, page 4
  • [2] NIPS-2017-attention-is-all-you-need-Paper, page 4
  • [3] NIPS-2017-attention-is-all-you-need-Paper, page 7


🔍 Test 3: Compare scaled dot-product attention with additive attention
--------------------------------------------------


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

INFO:__main__:RetrievalAgent: retrieved_5_documents - Used hybrid strategy to find relevant documents
INFO:__main__:QualityAssessor: quality_score_0.86 - Assessed 5 results with score 0.86
INFO:__main__:🔄 Iteration 1: Strategy=hybrid, Quality=0.865


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

INFO:__main__:RetrievalAgent: retrieved_6_documents - Used hybrid strategy to find relevant documents
INFO:__main__:QualityAssessor: quality_score_0.86 - Assessed 6 results with score 0.86
INFO:__main__:🔄 Iteration 2: Strategy=hybrid, Quality=0.858
INFO:__main__:🛑 Stopping after 2 iterations
INFO:__main__:CitationAgent: generated_5_citations - Created proper citations for all retrieved sources
Both `max_new_tokens` (=256) and `max_length`(=200) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
INFO:__main__:HF_LLMGenerator: generated_response - Generated response using HF model approach
INFO:__main__:Orchestrator: completed_fixed_agentic_rag - ✅ Completed FIXED RAG with 2 iterations, final quality: 0.865


✅ Quality Score: 0.865
🔄 Iterations Used: 2
🤖 Generation Method: hf_generated
⏱️ Total Time: 0.00s

📝 Generated Response:
The relevant information to answer the above question is: Scaled Dot-Product Attention We call our particular attention "Scaled Dot-Proproduct Attention" (Figure 2). The input consists of queries and keys of dimension dk,... Source 3: of 1dk . Additive attention computes the compatibility function using a feed-forward network with a single hidden layer. While the two are similar in theoretical com...

📚 Citations:
  • [1] NIPS-2017-attention-is-all-you-need-Paper, page 4
  • [2] NIPS-2017-attention-is-all-you-need-Paper, page 3
  • [3] NIPS-2017-attention-is-all-you-need-Paper, page 4

📊 PERFORMANCE COMPARISON:


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

INFO:__main__:QualityAssessor: quality_score_0.59 - Assessed 5 results with score 0.59


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

INFO:__main__:QualityAssessor: quality_score_0.56 - Assessed 5 results with score 0.56


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

INFO:__main__:QualityAssessor: quality_score_0.70 - Assessed 5 results with score 0.70


Fixed Agentic RAG Quality: 0.865
Fixed Agentic RAG Avg Iterations: 2.0
Fixed Agentic RAG Avg Time: 16.772s

📈 System Statistics:
  Total Conversations: 3
  Average Iterations: 2.333
  Average Quality Score: 0.792
  Average Response Time: 9.775
  Retrieval Agent Calls: 7
  Quality Assessments: 10

🎯 Expected Improvements:
  ✅ Minimum 2 iterations guaranteed
  ✅ Better quality scoring calibration
  ✅ HuggingFace LLM integration (free)
  ✅ Enhanced response generation
  ✅ Improved strategy adaptation
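
The logs above are consistent with a simple stopping rule: every query runs at least two retrieval iterations, a third is attempted only when quality stays low, and the reported final quality is the best score seen (0.724 for the second query, whose third iteration scored only 0.526). They also show `Total Time: 0.00s` for every query, which is what the response builder would report if `total_time` were never stored on the conversation, since it falls back to `conversation.get('total_time', 0)`. The sketch below is one minimal way to express both behaviours. `run_iterations`, `retrieve_and_score`, the 0.75 threshold, and the three-iteration cap are illustrative assumptions, not the notebook's actual `OrchestrationAgent` code.

```python
import time
from typing import Callable, Dict, List, Tuple


def run_iterations(
    query: str,
    retrieve_and_score: Callable[[str, int], Tuple[str, float]],  # hypothetical callback
    min_iterations: int = 2,          # assumed lower bound
    max_iterations: int = 3,          # assumed upper bound
    quality_threshold: float = 0.75,  # assumed stopping threshold
) -> Dict:
    """Run retrieval rounds, keep the best score, and record elapsed time."""
    start = time.perf_counter()
    iterations: List[Dict] = []
    best_quality = 0.0

    for i in range(1, max_iterations + 1):
        strategy, quality = retrieve_and_score(query, i)
        iterations.append({"iteration": i, "strategy": strategy, "quality": quality})
        best_quality = max(best_quality, quality)
        # Stop only after the minimum number of rounds, and only if good enough.
        if i >= min_iterations and best_quality >= quality_threshold:
            break

    return {
        "iterations": iterations,
        "best_quality": best_quality,
        # Store the elapsed time so downstream reporting does not fall back to 0.
        "total_time": time.perf_counter() - start,
    }


# Replay the scores logged for the second test query.
logged = {1: ("hybrid", 0.724), 2: ("hybrid", 0.711), 3: ("semantic", 0.526)}
conversation = run_iterations("How does multi-head attention work?",
                              lambda q, i: logged[i])
print(len(conversation["iterations"]), conversation["best_quality"])  # prints: 3 0.724
```

Replaying the logged scores for the other queries with the same assumed threshold stops after two iterations, which matches the recorded runs, but the real threshold may differ.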


## 7. Testing the Agentic RAG System <a id="testing1"></a>

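The cell below runs every query in `test_queries` through `orchestrator.execute` and prints, for each one, the query type, recommended strategy, iterations used, total time, quality score, content summary, citations, and the top two retrieved results. It finishes with the aggregate statistics from `orchestrator.get_system_stats()`.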
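
Printed output is useful for inspection, but a test should also fail loudly when a response is malformed. The helper below is a minimal sketch that checks a response dict for the keys assembled by the orchestrator's response builder shown earlier. `validate_response` and its specific checks (including the assumption that `overall_score` should lie in [0, 1]) are additions for illustration, not part of the notebook's classes.

```python
from typing import Any, Dict, List

# Top-level keys produced by the orchestrator's response builder.
REQUIRED_KEYS = {
    "query", "response", "generation_method", "results", "content_summary",
    "quality_metrics", "citations", "bibliography", "process_metadata",
    "agent_decisions",
}


def validate_response(response: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a response dict (empty if none)."""
    problems = [f"missing key: {k}" for k in sorted(REQUIRED_KEYS - set(response))]
    if problems:
        return problems

    metrics = response["quality_metrics"]
    # Assumption: a calibrated quality score stays within [0, 1].
    if not 0.0 <= metrics["overall_score"] <= 1.0:
        problems.append(f"overall_score out of [0, 1]: {metrics['overall_score']}")
    if metrics["result_count"] != len(response["results"]):
        problems.append("result_count does not match len(results)")
    if response["process_metadata"]["iterations_used"] < 1:
        problems.append("no retrieval iterations recorded")
    if not response["citations"]:
        problems.append("no citations returned")
    return problems


# Hand-built example response with an out-of-range quality score.
example = {
    "query": "q", "response": "a", "generation_method": "hf_generated",
    "results": [{"page_number": 4, "score": 0.8, "content": "..."}],
    "content_summary": "...",
    "quality_metrics": {"overall_score": 1.748, "confidence": 1.0, "result_count": 1},
    "citations": ["[1] paper, page 4"], "bibliography": [],
    "process_metadata": {"iterations_used": 1, "total_time": 0.0,
                         "strategies_tried": ["hybrid"], "conversation_id": "x"},
    "agent_decisions": {},
}
print(validate_response(example))  # prints: ['overall_score out of [0, 1]: 1.748']
```

In a test loop, `assert not validate_response(response)` after each `orchestrator.execute(query)` call would turn the visual check into an automatic one.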


In [None]:

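import time  # needed for the pause between queries

# Assumes `orchestrator` and `test_queries` are already defined in another cell.
for i, query in enumerate(test_queries):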
    print(f"\n{'='*60}")
    print(f"Test Query {i+1}: {query}")
    print('='*60)
    
    response = orchestrator.execute(query)
    
    # Display results
    print(f"\nQuery Type: {orchestrator.query_analyzer.decision_history[-1].metadata['query_type']}")
    print(f"Recommended Strategy: {orchestrator.query_analyzer.decision_history[-1].metadata['recommended_strategy']}")
    print(f"Iterations Used: {response['process_metadata']['iterations_used']}")
    print(f"Total Time: {response['process_metadata']['total_time']:.3f} seconds")
    print(f"Quality Score: {response['quality_metrics']['overall_score']:.3f}")
    
    print(f"\nContent Summary:")
    print(response['content_summary'])
    
    print(f"\nCitations:")
    for citation in response['citations']:
        print(f"  {citation}")
    
    print(f"\nTop Retrieved Results:")
    for j, result in enumerate(response['results'][:2]): 
        print(f"  {j+1}. Page {result['page_number']} | Score: {result['score']:.3f}")
        print(f"     Content: {result['content'][:150]}...")
    
    # Pause for one second between queries (skipped after the last one)
    if i < len(test_queries) - 1:
        time.sleep(1)

print(f"\n{'='*60}")
print("SYSTEM PERFORMANCE STATISTICS")
print('='*60)

stats = orchestrator.get_system_stats()
for key, value in stats.items():
    if isinstance(value, float):
        print(f"{key.replace('_', ' ').title()}: {value:.3f}")
    else:
        print(f"{key.replace('_', ' ').title()}: {value}")

INFO:__main__:Starting agentic RAG for query: What is the attention mechanism in transformers?...
INFO:__main__:QueryAnalyzer: recommend_semantic_retrieval - Factual queries benefit from semantic understanding



Test Query 1: What is the attention mechanism in transformers?


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

INFO:__main__:RetrievalAgent: retrieved_5_documents - Used semantic strategy to find relevant documents


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

INFO:__main__:QualityAssessor: quality_score_0.56 - Assessed 5 results with score 0.56
INFO:__main__:Iteration 1: Strategy=semantic, Quality=0.559
INFO:__main__:CitationAgent: generated_5_citations - Created proper citations for all retrieved sources
INFO:__main__:Orchestrator: completed_agentic_rag - Completed RAG with 1 iterations, final quality: 0.559



Query Type: factual
Recommended Strategy: semantic
Iterations Used: 1
Total Time: 0.000 seconds
Quality Score: 0.559

Content Summary:
it more difﬁcult to learn dependencies between distant positions [ 11]. In the Transformer this is
reduced to a constant number of operations, albeit at the cost of reduced effective resolution due
to averaging attention-weighted positions, an effect we counteract with Multi-Head Attention as
described in section 3.2.
Self-attention, sometimes called intra-attention is an attention mechanism relating different positions
of a single sequence in order to compute a representation of the sequence. S...

Citations:
  [1] NIPS-2017-attention-is-all-you-need-Paper, page 2
  [2] NIPS-2017-attention-is-all-you-need-Paper, page 9
  [3] NIPS-2017-attention-is-all-you-need-Paper, page 5
  [4] NIPS-2017-attention-is-all-you-need-Paper, page 2
  [5] NIPS-2017-attention-is-all-you-need-Paper, page 5

Top Retrieved Results:
  1. Page 2 | Score: 0.555
     Content: it 

INFO:__main__:Starting agentic RAG for query: How does multi-head attention work?...
INFO:__main__:QueryAnalyzer: recommend_hybrid_retrieval - Complex queries require multi-modal retrieval



Test Query 2: How does multi-head attention work?


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

INFO:__main__:RetrievalAgent: retrieved_5_documents - Used hybrid strategy to find relevant documents


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

INFO:__main__:QualityAssessor: quality_score_0.92 - Assessed 5 results with score 0.92
INFO:__main__:Iteration 1: Strategy=hybrid, Quality=0.922
INFO:__main__:CitationAgent: generated_5_citations - Created proper citations for all retrieved sources
INFO:__main__:Orchestrator: completed_agentic_rag - Completed RAG with 1 iterations, final quality: 0.922



Query Type: general
Recommended Strategy: hybrid
Iterations Used: 1
Total Time: 0.000 seconds
Quality Score: 0.922

Content Summary:
.
3.2.2 Multi-Head Attention
Instead of performing a single attention function with dmodel-dimensional keys, values and queries,
we found it beneﬁcial to linearly project the queries, keys and values htimes with different, learned
linear projections to dk, dk and dv dimensions, respectively. On each of these projected versions of
queries, keys and values we then perform the attention function in parallel, yielding dv-dimensional
output values. These are concatenated and once again projected, res...

Citations:
  [1] NIPS-2017-attention-is-all-you-need-Paper, page 4
  [2] NIPS-2017-attention-is-all-you-need-Paper, page 4
  [3] NIPS-2017-attention-is-all-you-need-Paper, page 7
  [4] NIPS-2017-attention-is-all-you-need-Paper, page 2
  [5] NIPS-2017-attention-is-all-you-need-Paper, page 2

Top Retrieved Results:
  1. Page 4 | Score: 1.524
     Content: .
3.2

INFO:__main__:Starting agentic RAG for query: Compare scaled dot-product attention with additive...
INFO:__main__:QueryAnalyzer: recommend_hybrid_retrieval - Default to hybrid approach for balanced results



Test Query 3: Compare scaled dot-product attention with additive attention


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

INFO:__main__:RetrievalAgent: retrieved_5_documents - Used hybrid strategy to find relevant documents


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

INFO:__main__:QualityAssessor: quality_score_1.75 - Assessed 5 results with score 1.75
INFO:__main__:Iteration 1: Strategy=hybrid, Quality=1.748
INFO:__main__:CitationAgent: generated_5_citations - Created proper citations for all retrieved sources
INFO:__main__:Orchestrator: completed_agentic_rag - Completed RAG with 1 iterations, final quality: 1.748



Query Type: comparison
Recommended Strategy: hybrid
Iterations Used: 1
Total Time: 0.000 seconds
Quality Score: 1.748

Content Summary:
Scaled Dot-Product Attention
 Multi-Head Attention
Figure 2: (left) Scaled Dot-Product Attention. (right) Multi-Head Attention consists of several
attention layers running in parallel.
query with all keys, divide each by √dk, and apply a softmax function to obtain the weights on the
values.
In practice, we compute the attention function on a set of queries simultaneously, packed together
into a matrix Q. The keys and values are also packed together into matrices Kand V.

Citations:
  [1] NIPS-2017-attention-is-all-you-need-Paper, page 4
  [2] NIPS-2017-attention-is-all-you-need-Paper, page 3
  [3] NIPS-2017-attention-is-all-you-need-Paper, page 4
  [4] NIPS-2017-attention-is-all-you-need-Paper, page 3
  [5] NIPS-2017-attention-is-all-you-need-Paper, page 5

Top Retrieved Results:
  1. Page 4 | Score: 4.893
     Content: Scaled Dot-Product Attention
 M

INFO:__main__:Starting agentic RAG for query: What are the computational complexities of differe...
INFO:__main__:QueryAnalyzer: recommend_hybrid_retrieval - Default to hybrid approach for balanced results



Test Query 4: What are the computational complexities of different attention mechanisms?


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

INFO:__main__:RetrievalAgent: retrieved_5_documents - Used hybrid strategy to find relevant documents


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

INFO:__main__:QualityAssessor: quality_score_1.42 - Assessed 5 results with score 1.42
INFO:__main__:Iteration 1: Strategy=hybrid, Quality=1.421
INFO:__main__:CitationAgent: generated_5_citations - Created proper citations for all retrieved sources
INFO:__main__:Orchestrator: completed_agentic_rag - Completed RAG with 1 iterations, final quality: 1.421



Query Type: general
Recommended Strategy: hybrid
Iterations Used: 1
Total Time: 0.000 seconds
Quality Score: 1.421

Content Summary:
different layer types.
As noted in Table 1, a self-attention layer connects all positions with a constant number of sequentially
executed operations, whereas a recurrent layer requires O(n) sequential operations. In terms of
computational complexity, self-attention layers are faster than recurrent layers when the sequence
length n is smaller than the representation dimensionality d, which is most often the case with
sentence representations used by state-of-the-art models in machine translations...

Citations:
  [1] NIPS-2017-attention-is-all-you-need-Paper, page 6
  [2] NIPS-2017-attention-is-all-you-need-Paper, page 5
  [3] NIPS-2017-attention-is-all-you-need-Paper, page 4
  [4] NIPS-2017-attention-is-all-you-need-Paper, page 6
  [5] NIPS-2017-attention-is-all-you-need-Paper, page 5

Top Retrieved Results:
  1. Page 6 | Score: 3.039
     Content: diffe

INFO:__main__:Starting agentic RAG for query: Explain the positional encoding in the transformer...
INFO:__main__:QueryAnalyzer: recommend_semantic_retrieval - Factual queries benefit from semantic understanding



Test Query 5: Explain the positional encoding in the transformer model


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

INFO:__main__:RetrievalAgent: retrieved_5_documents - Used semantic strategy to find relevant documents


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

INFO:__main__:QualityAssessor: quality_score_0.56 - Assessed 5 results with score 0.56
INFO:__main__:Iteration 1: Strategy=semantic, Quality=0.559
INFO:__main__:CitationAgent: generated_5_citations - Created proper citations for all retrieved sources
INFO:__main__:Orchestrator: completed_agentic_rag - Completed RAG with 1 iterations, final quality: 0.559



Query Type: factual
Recommended Strategy: semantic
Iterations Used: 1
Total Time: 0.000 seconds
Quality Score: 0.559

Content Summary:
Figure 1: The Transformer - model architecture.
wise fully connected feed-forward network. We employ a residual connection [10] around each of
the two sub-layers, followed by layer normalization [ 1]. That is, the output of each sub-layer is
LayerNorm(x+ Sublayer(x)), where Sublayer(x) is the function implemented by the sub-layer
itself.

Citations:
  [1] NIPS-2017-attention-is-all-you-need-Paper, page 3
  [2] NIPS-2017-attention-is-all-you-need-Paper, page 5
  [3] NIPS-2017-attention-is-all-you-need-Paper, page 2
  [4] NIPS-2017-attention-is-all-you-need-Paper, page 2
  [5] NIPS-2017-attention-is-all-you-need-Paper, page 5

Top Retrieved Results:
  1. Page 3 | Score: 0.550
     Content: Figure 1: The Transformer - model architecture.
wise fully connected feed-forward network. We employ a residual connection [10] around each of
the two...
  2. Page 5 |
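
In the run above, hybrid retrieval returns raw result scores well above 1 (for example 4.893 and 3.039), and the quality score appears to inherit that scale (1.748, 1.421), while semantic-only queries stay near 0.56. One common remedy is to rescale each score list to [0, 1] before combining them, so that semantic and keyword scores are comparable. The sketch below shows min-max normalization followed by a weighted sum. It is an assumption about how such calibration could be done, not the notebook's retriever or `QualityAssessor` implementation; `min_max`, `fuse_scores`, `alpha`, and the sample scores are hypothetical.

```python
from typing import List


def min_max(scores: List[float]) -> List[float]:
    """Rescale scores to [0, 1]; a constant list maps to all zeros."""
    if not scores:
        return []
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def fuse_scores(semantic: List[float], keyword: List[float],
                alpha: float = 0.5) -> List[float]:
    """Weighted sum of normalized semantic and keyword scores, per document."""
    if len(semantic) != len(keyword):
        raise ValueError("score lists must be aligned per document")
    sem_n, kw_n = min_max(semantic), min_max(keyword)
    return [alpha * s + (1 - alpha) * k for s, k in zip(sem_n, kw_n)]


# Hypothetical scores for three candidate documents.
semantic = [0.55, 0.40, 0.25]  # cosine-style similarities
keyword = [4.0, 1.0, 3.0]      # unbounded keyword scores (e.g. BM25)
fused = fuse_scores(semantic, keyword)
print([round(x, 3) for x in fused])
```

With `alpha` between 0 and 1, every fused value lies in [0, 1], so any average over them does too. The trade-off is that min-max scores are relative to each result list, so they say which document ranks best within a query rather than how good the match is in absolute terms.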

In [None]:
orchestrator = OrchestrationAgent(retriever)

test_queries = [
    "What is the attention mechanism in transformers?",
    "How does multi-head attention work?",
    "Compare scaled dot-product attention with additive attention",
    "What are the computational complexities of different attention mechanisms?",
    "Explain the positional encoding in the transformer model"
]

print("Testing Agentic RAG System")
print("=" * 50)

for idx, query in enumerate(test_queries, 1):
    print(f"\nTest {idx}: Query: {query}")
    response = orchestrator.execute(query)
    print("Response Summary:")
    print(response['content_summary'])
    print("Quality Score:", response['quality_metrics']['overall_score'])
    print("Citations:", response['citations'])
    print("-" * 50)

INFO:sentence_transformers.SentenceTransformer:Use pytorch device_name: cpu
INFO:sentence_transformers.SentenceTransformer:Load pretrained SentenceTransformer: all-MiniLM-L6-v2
INFO:sentence_transformers.SentenceTransformer:Use pytorch device_name: cpu
INFO:sentence_transformers.SentenceTransformer:Load pretrained SentenceTransformer: all-MiniLM-L6-v2


Loading model: google/flan-t5-base


Device set to use cpu
INFO:__main__:🚀 Starting FIXED agentic RAG for query: What is the attention mechanism in transformers?...
INFO:__main__:QueryAnalyzer: recommend_semantic_retrieval - Factual queries benefit from semantic understanding


✅ Model google/flan-t5-base loaded successfully!
Testing Agentic RAG System

Test 1: Query: What is the attention mechanism in transformers?


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

INFO:__main__:RetrievalAgent: retrieved_5_documents - Used semantic strategy to find relevant documents
INFO:__main__:QualityAssessor: quality_score_0.59 - Assessed 5 results with score 0.59
INFO:__main__:🔄 Iteration 1: Strategy=semantic, Quality=0.587


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

INFO:__main__:RetrievalAgent: retrieved_6_documents - Used hybrid strategy to find relevant documents
INFO:__main__:QualityAssessor: quality_score_0.79 - Assessed 6 results with score 0.79
INFO:__main__:🔄 Iteration 2: Strategy=hybrid, Quality=0.787
INFO:__main__:🛑 Stopping after 2 iterations
INFO:__main__:CitationAgent: generated_6_citations - Created proper citations for all retrieved sources
Both `max_new_tokens` (=256) and `max_length`(=200) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
INFO:__main__:HF_LLMGenerator: generated_response - Generated response using HF model approach
INFO:__main__:Orchestrator: completed_fixed_agentic_rag - ✅ Completed FIXED RAG with 2 iterations, final quality: 0.787
INFO:__main__:🚀 Starting FIXED agentic RAG for query: How does multi-head attention work?...
INFO:__main__:QueryAnalyzer: recommend_hybrid

Response Summary:
it more difﬁcult to learn dependencies between distant positions [ 11]. In the Transformer this is
reduced to a constant number of operations, albeit at the cost of reduced effective resolution due
to averaging attention-weighted positions, an effect we counteract with Multi-Head Attention as
described in section 3.2.
Self-attention, sometimes called intra-attention is an attention mechanism relating different positions
of a single sequence in order to compute a representation of the sequence. S...
Quality Score: 0.7875
Citations: ['[1] NIPS-2017-attention-is-all-you-need-Paper, page 2', '[2] NIPS-2017-attention-is-all-you-need-Paper, page 2', '[3] NIPS-2017-attention-is-all-you-need-Paper, page 2', '[4] NIPS-2017-attention-is-all-you-need-Paper, page 5', '[5] NIPS-2017-attention-is-all-you-need-Paper, page 8', '[6] NIPS-2017-attention-is-all-you-need-Paper, page 5']
--------------------------------------------------

Test 2: Query: How does multi-head attention work?

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

INFO:__main__:RetrievalAgent: retrieved_5_documents - Used hybrid strategy to find relevant documents
INFO:__main__:QualityAssessor: quality_score_0.72 - Assessed 5 results with score 0.72
INFO:__main__:🔄 Iteration 1: Strategy=hybrid, Quality=0.724


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

INFO:__main__:RetrievalAgent: retrieved_6_documents - Used hybrid strategy to find relevant documents
INFO:__main__:QualityAssessor: quality_score_0.71 - Assessed 6 results with score 0.71
INFO:__main__:🔄 Iteration 2: Strategy=hybrid, Quality=0.711


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

INFO:__main__:RetrievalAgent: retrieved_7_documents - Used semantic strategy to find relevant documents
INFO:__main__:QualityAssessor: quality_score_0.53 - Assessed 7 results with score 0.53
INFO:__main__:🔄 Iteration 3: Strategy=semantic, Quality=0.526
INFO:__main__:🛑 Stopping after 3 iterations
INFO:__main__:CitationAgent: generated_5_citations - Created proper citations for all retrieved sources
Both `max_new_tokens` (=256) and `max_length`(=200) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
INFO:__main__:HF_LLMGenerator: generated_response - Generated response using HF model approach
INFO:__main__:Orchestrator: completed_fixed_agentic_rag - ✅ Completed FIXED RAG with 3 iterations, final quality: 0.724
INFO:__main__:🚀 Starting FIXED agentic RAG for query: Compare scaled dot-product attention with additive...
INFO:__main__:QueryAnalyze

Response Summary:
.
3.2.2 Multi-Head Attention
Instead of performing a single attention function with dmodel-dimensional keys, values and queries,
we found it beneﬁcial to linearly project the queries, keys and values htimes with different, learned
linear projections to dk, dk and dv dimensions, respectively. On each of these projected versions of
queries, keys and values we then perform the attention function in parallel, yielding dv-dimensional
output values. These are concatenated and once again projected, res...
Quality Score: 0.7243560651358123
Citations: ['[1] NIPS-2017-attention-is-all-you-need-Paper, page 4', '[2] NIPS-2017-attention-is-all-you-need-Paper, page 4', '[3] NIPS-2017-attention-is-all-you-need-Paper, page 7', '[4] NIPS-2017-attention-is-all-you-need-Paper, page 2', '[5] NIPS-2017-attention-is-all-you-need-Paper, page 2']
--------------------------------------------------

Test 3: Query: Compare scaled dot-product attention with additive attention


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

INFO:__main__:RetrievalAgent: retrieved_5_documents - Used hybrid strategy to find relevant documents
INFO:__main__:QualityAssessor: quality_score_0.86 - Assessed 5 results with score 0.86
INFO:__main__:🔄 Iteration 1: Strategy=hybrid, Quality=0.865


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

INFO:__main__:RetrievalAgent: retrieved_6_documents - Used hybrid strategy to find relevant documents
INFO:__main__:QualityAssessor: quality_score_0.86 - Assessed 6 results with score 0.86
INFO:__main__:🔄 Iteration 2: Strategy=hybrid, Quality=0.858
INFO:__main__:🛑 Stopping after 2 iterations
INFO:__main__:CitationAgent: generated_5_citations - Created proper citations for all retrieved sources
Both `max_new_tokens` (=256) and `max_length`(=200) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
INFO:__main__:HF_LLMGenerator: generated_response - Generated response using HF model approach
INFO:__main__:Orchestrator: completed_fixed_agentic_rag - ✅ Completed FIXED RAG with 2 iterations, final quality: 0.865
INFO:__main__:🚀 Starting FIXED agentic RAG for query: What are the computational complexities of differe...
INFO:__main__:QueryAnalyzer: r

Response Summary:
Scaled Dot-Product Attention
 Multi-Head Attention
Figure 2: (left) Scaled Dot-Product Attention. (right) Multi-Head Attention consists of several
attention layers running in parallel.
query with all keys, divide each by √dk, and apply a softmax function to obtain the weights on the
values.
In practice, we compute the attention function on a set of queries simultaneously, packed together
into a matrix Q. The keys and values are also packed together into matrices Kand V.
Quality Score: 0.865
Citations: ['[1] NIPS-2017-attention-is-all-you-need-Paper, page 4', '[2] NIPS-2017-attention-is-all-you-need-Paper, page 3', '[3] NIPS-2017-attention-is-all-you-need-Paper, page 4', '[4] NIPS-2017-attention-is-all-you-need-Paper, page 3', '[5] NIPS-2017-attention-is-all-you-need-Paper, page 5']
--------------------------------------------------

Test 4: Query: What are the computational complexities of different attention mechanisms?


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

INFO:__main__:RetrievalAgent: retrieved_5_documents - Used hybrid strategy to find relevant documents
INFO:__main__:QualityAssessor: quality_score_0.79 - Assessed 5 results with score 0.79
INFO:__main__:🔄 Iteration 1: Strategy=hybrid, Quality=0.795


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

INFO:__main__:RetrievalAgent: retrieved_6_documents - Used hybrid strategy to find relevant documents
INFO:__main__:QualityAssessor: quality_score_0.78 - Assessed 6 results with score 0.78
INFO:__main__:🔄 Iteration 2: Strategy=hybrid, Quality=0.780
INFO:__main__:🛑 Stopping after 2 iterations
INFO:__main__:CitationAgent: generated_5_citations - Created proper citations for all retrieved sources
Both `max_new_tokens` (=256) and `max_length`(=200) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
INFO:__main__:HF_LLMGenerator: generated_response - Generated response using HF model approach
INFO:__main__:Orchestrator: completed_fixed_agentic_rag - ✅ Completed FIXED RAG with 2 iterations, final quality: 0.795
INFO:__main__:🚀 Starting FIXED agentic RAG for query: Explain the positional encoding in the transformer...
INFO:__main__:QueryAnalyzer: r

Response Summary:
different layer types.
As noted in Table 1, a self-attention layer connects all positions with a constant number of sequentially
executed operations, whereas a recurrent layer requires O(n) sequential operations. In terms of
computational complexity, self-attention layers are faster than recurrent layers when the sequence
length n is smaller than the representation dimensionality d, which is most often the case with
sentence representations used by state-of-the-art models in machine translations...
Quality Score: 0.7949999999999999
Citations: ['[1] NIPS-2017-attention-is-all-you-need-Paper, page 6', '[2] NIPS-2017-attention-is-all-you-need-Paper, page 5', '[3] NIPS-2017-attention-is-all-you-need-Paper, page 4', '[4] NIPS-2017-attention-is-all-you-need-Paper, page 6', '[5] NIPS-2017-attention-is-all-you-need-Paper, page 5']
--------------------------------------------------

Test 5: Query: Explain the positional encoding in the transformer model


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

INFO:__main__:RetrievalAgent: retrieved_5_documents - Used semantic strategy to find relevant documents
INFO:__main__:QualityAssessor: quality_score_0.57 - Assessed 5 results with score 0.57
INFO:__main__:🔄 Iteration 1: Strategy=semantic, Quality=0.572


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

INFO:__main__:RetrievalAgent: retrieved_6_documents - Used hybrid strategy to find relevant documents
INFO:__main__:QualityAssessor: quality_score_0.81 - Assessed 6 results with score 0.81
INFO:__main__:🔄 Iteration 2: Strategy=hybrid, Quality=0.810
INFO:__main__:🛑 Stopping after 2 iterations
INFO:__main__:CitationAgent: generated_6_citations - Created proper citations for all retrieved sources
Both `max_new_tokens` (=256) and `max_length`(=200) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
INFO:__main__:HF_LLMGenerator: generated_response - Generated response using HF model approach
INFO:__main__:Orchestrator: completed_fixed_agentic_rag - ✅ Completed FIXED RAG with 2 iterations, final quality: 0.810


Response Summary:
3.5 Positional Encoding
Since our model contains no recurrence and no convolution, in order for the model to make use of the
order of the sequence, we must inject some information about the relative or absolute position of the
tokens in the sequence. To this end, we add "positional encodings" to the input embeddings at the
5

learned and ﬁxed [8].
In this work, we use sine and cosine functions of different frequencies:
PE(pos,2i) = sin(pos/100002i/dmodel )
PE(pos,2i+1) = cos(pos/100002i/dmodel )...
Quality Score: 0.8097222222222222
Citations: ['[1] NIPS-2017-attention-is-all-you-need-Paper, page 5', '[2] NIPS-2017-attention-is-all-you-need-Paper, page 6', '[3] NIPS-2017-attention-is-all-you-need-Paper, page 9', '[4] NIPS-2017-attention-is-all-you-need-Paper, page 5', '[5] NIPS-2017-attention-is-all-you-need-Paper, page 6', '[6] NIPS-2017-attention-is-all-you-need-Paper, page 7']
--------------------------------------------------
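
In the logs above, the reported final quality matches the highest per-iteration score rather than the last one (for Test 4, 0.795 from iteration 1 rather than 0.780 from iteration 2). One way such best-of-N selection could look is sketched below. `Attempt` and `select_best_attempt` are hypothetical names used for illustration, not the orchestrator's actual API, and the stopping rule is not modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Attempt:
    """One retrieval iteration (hypothetical structure for illustration)."""
    strategy: str
    quality: float
    num_documents: int


def select_best_attempt(attempts: List[Attempt]) -> Attempt:
    """Return the highest-quality attempt; ties go to the earliest iteration."""
    if not attempts:
        raise ValueError("at least one attempt is required")
    # max() keeps the first maximal element, so earlier iterations win ties
    return max(attempts, key=lambda a: a.quality)


# Per-iteration values copied from the Test 4 log lines above
attempts = [
    Attempt(strategy="hybrid", quality=0.795, num_documents=5),
    Attempt(strategy="hybrid", quality=0.780, num_documents=6),
]
best = select_best_attempt(attempts)
print(f"{best.strategy}: quality={best.quality:.3f}, documents={best.num_documents}")
```

Keeping the first iteration here is consistent with the five citations generated for Test 4, since that iteration retrieved five documents.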


## 8. Performance Analysis and Evaluation <a id="analysis"></a>

Let's measure how the Agentic RAG system performs against a traditional single-pass semantic-search baseline, comparing quality scores, latency, and the number of iterations the orchestrator uses.
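
For reference, the traditional baseline is scored by the `_assess_quality_simple` heuristic defined in the cell below, which can be written as:

$$
q_{\text{trad}} = 0.7 \cdot \min\bigl(\max(\bar{s}, 0),\, 1\bigr) + 0.3 \cdot \min\!\left(\frac{n}{5},\, 1\right)
$$

where $\bar{s}$ is the mean retrieval score and $n$ is the number of results returned. As a worked example, assuming all five requested results come back ($n = 5$), a baseline quality of 0.648 implies a mean retrieval score of $\bar{s} = (0.648 - 0.3) / 0.7 \approx 0.497$. The agentic score is read from the orchestrator's `quality_metrics` instead, so the two numbers are not necessarily computed the same way and the comparison should be treated as indicative.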

In [None]:
import numpy as np
import time
from typing import List, Dict, Any, Optional
import logging

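class PerformanceAnalyzer:
    """Compare traditional single-pass retrieval with the agentic orchestrator on quality and latency."""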
    def __init__(self, agentic_orchestrator, traditional_retriever):
        self.agentic_orchestrator = agentic_orchestrator
        self.traditional_retriever = traditional_retriever
        self.results_cache = {}
    
    def run_comprehensive_comparison(self, test_queries: List[str]) -> Dict[str, Any]:
        print("🔬 Running Comprehensive RAG Comparison...")
        print("="*60)
        
        results = {
            'traditional': {'times': [], 'quality_scores': [], 'result_counts': []},
            'agentic': {'times': [], 'quality_scores': [], 'result_counts': [], 'iterations': []},
            'improvements': {}
        }
        
        for i, query in enumerate(test_queries, 1):
            print(f"\n📝 Testing Query {i}: {query[:50]}...")
            print("   🔄 Running Traditional RAG...")
            trad_start = time.time()
            trad_results = self.traditional_retriever.semantic_search(query, k=5)
            trad_time = time.time() - trad_start
            trad_quality = self._assess_quality_simple(query, trad_results) 
            results['traditional']['times'].append(trad_time)
            results['traditional']['quality_scores'].append(trad_quality)
            results['traditional']['result_counts'].append(len(trad_results))
            print(f"      ⏱️ Time: {trad_time:.3f}s, Quality: {trad_quality:.3f}")
            print("   🤖 Running Agentic RAG...")
            agent_start = time.time()
            agent_response = self.agentic_orchestrator.execute(query)
            agent_time = time.time() - agent_start
            
            agent_quality = agent_response['quality_metrics']['overall_score']
            agent_iterations = agent_response['process_metadata']['iterations_used']
            
            results['agentic']['times'].append(agent_time)
            results['agentic']['quality_scores'].append(agent_quality)
            results['agentic']['result_counts'].append(agent_response['quality_metrics']['result_count'])
            results['agentic']['iterations'].append(agent_iterations)
            
            print(f"      ⏱️ Time: {agent_time:.3f}s, Quality: {agent_quality:.3f}, Iterations: {agent_iterations}")

            quality_improvement = agent_quality - trad_quality
            print(f"      📊 Quality Improvement: {quality_improvement:+.3f}")

        results['improvements'] = self._calculate_improvements(results)
        self._print_final_comparison(results)
        
        return results
    
    def _assess_quality_simple(self, query: str, results: List) -> float:
        """Heuristic baseline score: 70% mean retrieval score (clipped to [0, 1]), 30% result-count bonus."""
        if not results:
            return 0.0

        avg_score = float(np.mean([r.score for r in results]))
        result_bonus = min(len(results) / 5.0, 1.0)  # full bonus once 5 results are returned
        normalized_score = min(max(avg_score, 0.0), 1.0)

        return (normalized_score * 0.7) + (result_bonus * 0.3)
    
    def _calculate_improvements(self, results: Dict) -> Dict:
        """Aggregate per-query measurements into mean quality, latency, and consistency figures."""
        trad_avg_quality = np.mean(results['traditional']['quality_scores'])
        agent_avg_quality = np.mean(results['agentic']['quality_scores'])

        trad_avg_time = np.mean(results['traditional']['times'])
        agent_avg_time = np.mean(results['agentic']['times'])

        # Ratio of standard deviations; values below 1.0 mean the agentic scores vary less
        trad_std = np.std(results['traditional']['quality_scores'])
        agent_std = np.std(results['agentic']['quality_scores'])

        return {
            'quality_improvement': agent_avg_quality - trad_avg_quality,
            'quality_improvement_pct': ((agent_avg_quality - trad_avg_quality) / trad_avg_quality * 100) if trad_avg_quality > 0 else 0,
            'time_overhead': agent_avg_time - trad_avg_time,
            'avg_iterations': np.mean(results['agentic']['iterations']),
            'quality_consistency': agent_std / trad_std if trad_std > 0 else 1.0
        }
    
    def _print_final_comparison(self, results: Dict):
        print(f"\n{'='*60}")
        print("🏆 FINAL PERFORMANCE COMPARISON")
        print("="*60)

        trad_avg_quality = np.mean(results['traditional']['quality_scores'])
        agent_avg_quality = np.mean(results['agentic']['quality_scores'])
        quality_improvement = results['improvements']['quality_improvement']
        
        print(f"\n📊 QUALITY METRICS:")
        print(f"   Traditional RAG:  {trad_avg_quality:.3f}")
        print(f"   Agentic RAG:      {agent_avg_quality:.3f}")
        print(f"   Improvement:      {quality_improvement:+.3f} ({results['improvements']['quality_improvement_pct']:+.1f}%)")
        
        if quality_improvement > 0:
            print("   🎯 STATUS: ✅ AGENTIC RAG WINS!")
        else:
            print("   ⚠️  STATUS: ❌ TRADITIONAL RAG BETTER")
  
        trad_avg_time = np.mean(results['traditional']['times'])
        agent_avg_time = np.mean(results['agentic']['times'])
        
        print(f"\n⏱️ TIME METRICS:")
        print(f"   Traditional RAG:  {trad_avg_time:.3f}s")
        print(f"   Agentic RAG:      {agent_avg_time:.3f}s")
        print(f"   Overhead:         {results['improvements']['time_overhead']:+.3f}s")
        print(f"\n🔄 AGENTIC BEHAVIOR:")
        print(f"   Average Iterations: {results['improvements']['avg_iterations']:.1f}")
        print(f"   Quality Consistency: {results['improvements']['quality_consistency']:.3f}")
        print(f"\n🎯 OVERALL ASSESSMENT:")
        if quality_improvement > 0.05:
            print("   🏆 Agentic RAG provides significant quality improvement!")
        elif quality_improvement > 0.01:
            print("   ✅ Agentic RAG provides modest quality improvement")
        else:
            print("   ❌ Agentic RAG needs optimization - no meaningful quality improvement")

def run_fixed_comprehensive_test():
    print("🚀 RUNNING AGENTIC RAG COMPREHENSIVE TEST")
    print("="*60)
    if 'fixed_orchestrator' not in globals():
        print("⚠️ Creating Fixed Orchestrator...")
        fixed_orchestrator = OrchestrationAgent(retriever, use_hf_llm=True)
    else:
        fixed_orchestrator = globals()['fixed_orchestrator']
    
    test_queries = [
        "What is the attention mechanism in transformers?",
        "How does multi-head attention work?",
        "Compare scaled dot-product attention with additive attention",
        "What are the key innovations in transformer architecture?"
    ]
    
    analyzer = PerformanceAnalyzer(fixed_orchestrator, retriever)
    results = analyzer.run_comprehensive_comparison(test_queries)
    
    return results

def manual_verification_test():
    print("\n🔍 MANUAL VERIFICATION OF AGENTIC BEHAVIOR")
    print("="*50)
    
    test_query = "What is the attention mechanism in transformers?"

    print("\n1️⃣ Traditional RAG:")
    trad_start = time.time()
    trad_results = retriever.semantic_search(test_query, k=5)
    trad_time = time.time() - trad_start
    print(f"   Results: {len(trad_results)}")
    print(f"   Time: {trad_time:.3f}s")
    print(f"   Top result score: {trad_results[0].score:.3f}" if trad_results else "No results")

    print("\n2️⃣ Agentic RAG:")
    if 'fixed_orchestrator' not in globals():
        fixed_orchestrator = OrchestrationAgent(retriever, use_hf_llm=True)
    else:
        fixed_orchestrator = globals()['fixed_orchestrator']
    
    agent_start = time.time()
    agent_response = fixed_orchestrator.execute(test_query)
    agent_time = time.time() - agent_start
    
    print(f"   Iterations: {agent_response['process_metadata']['iterations_used']}")
    print(f"   Quality Score: {agent_response['quality_metrics']['overall_score']:.3f}")
    print(f"   Time: {agent_time:.3f}s")
    print(f"   Generation Method: {agent_response['generation_method']}")
    print(f"   Strategies Tried: {agent_response['process_metadata']['strategies_tried']}")
    
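    # Comparison. Note: this subtracts the raw similarity score of the top traditional
    # result from the agentic overall quality score, so the two values are on different
    # scales and the difference is only a rough indicator.
    print(f"\n📊 COMPARISON:")
    quality_diff = agent_response['quality_metrics']['overall_score'] - (trad_results[0].score if trad_results else 0)
    print(f"   Quality Improvement: {quality_diff:+.3f}")
    print(f"   Time Overhead: {agent_time - trad_time:+.3f}s")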
    print(f"   Iterations Used: {agent_response['process_metadata']['iterations_used']}")
    
    return {
        'traditional': {'time': trad_time, 'results': len(trad_results)},
        'agentic': {
            'time': agent_time, 
            'quality': agent_response['quality_metrics']['overall_score'],
            'iterations': agent_response['process_metadata']['iterations_used']
        }
    }

print("🔧 Installing transformers if needed...")
try:
    import transformers
    print("✅ Transformers already installed")
except ImportError:
    print("📦 Installing transformers...")
    !pip install transformers torch

print("\n" + "="*60)
print("🧪 STARTING COMPREHENSIVE FIXED TESTS")
print("="*60)

manual_results = manual_verification_test()

comprehensive_results = run_fixed_comprehensive_test()

print(f"\n{'='*60}")
print("✅ ALL TESTS COMPLETED!")
print("="*60)

✅ Transformers already installed

🧪 STARTING COMPREHENSIVE FIXED TESTS

🔍 MANUAL VERIFICATION OF AGENTIC BEHAVIOR

1️⃣ Traditional RAG:

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

INFO:__main__:🚀 Starting FIXED agentic RAG for query: What is the attention mechanism in transformers?...
INFO:__main__:QueryAnalyzer: recommend_semantic_retrieval - Factual queries benefit from semantic understanding


   Results: 5
   Time: 0.055s
   Top result score: 0.555

2️⃣ Fixed Agentic RAG:


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

INFO:__main__:RetrievalAgent: retrieved_5_documents - Used semantic strategy to find relevant documents
INFO:__main__:QualityAssessor: quality_score_0.59 - Assessed 5 results with score 0.59
INFO:__main__:🔄 Iteration 1: Strategy=semantic, Quality=0.587


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

INFO:__main__:RetrievalAgent: retrieved_6_documents - Used hybrid strategy to find relevant documents
INFO:__main__:QualityAssessor: quality_score_0.79 - Assessed 6 results with score 0.79
INFO:__main__:🔄 Iteration 2: Strategy=hybrid, Quality=0.787
INFO:__main__:🛑 Stopping after 2 iterations
INFO:__main__:CitationAgent: generated_6_citations - Created proper citations for all retrieved sources
Both `max_new_tokens` (=256) and `max_length`(=200) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
INFO:__main__:HF_LLMGenerator: generated_response - Generated response using HF model approach
INFO:__main__:Orchestrator: completed_fixed_agentic_rag - ✅ Completed FIXED RAG with 2 iterations, final quality: 0.787


   Iterations: 2
   Quality Score: 0.787
   Time: 2.691s
   Generation Method: hf_generated
   Strategies Tried: ['semantic', 'hybrid']

📊 COMPARISON:
   Quality Improvement: +0.233
   Time Overhead: +2.636s
   Iterations Used: 2
🚀 RUNNING FIXED AGENTIC RAG COMPREHENSIVE TEST
🔬 Running Comprehensive RAG Comparison...

📝 Testing Query 1: What is the attention mechanism in transformers?...
   🔄 Running Traditional RAG...


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

INFO:__main__:🚀 Starting FIXED agentic RAG for query: What is the attention mechanism in transformers?...
INFO:__main__:QueryAnalyzer: recommend_semantic_retrieval - Factual queries benefit from semantic understanding


      ⏱️ Time: 0.032s, Quality: 0.648
   🤖 Running Agentic RAG...


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

INFO:__main__:RetrievalAgent: retrieved_5_documents - Used semantic strategy to find relevant documents
INFO:__main__:QualityAssessor: quality_score_0.59 - Assessed 5 results with score 0.59
INFO:__main__:🔄 Iteration 1: Strategy=semantic, Quality=0.587


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

INFO:__main__:RetrievalAgent: retrieved_6_documents - Used hybrid strategy to find relevant documents
INFO:__main__:QualityAssessor: quality_score_0.79 - Assessed 6 results with score 0.79
INFO:__main__:🔄 Iteration 2: Strategy=hybrid, Quality=0.787
INFO:__main__:🛑 Stopping after 2 iterations
INFO:__main__:CitationAgent: generated_6_citations - Created proper citations for all retrieved sources
Both `max_new_tokens` (=256) and `max_length`(=200) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
INFO:__main__:HF_LLMGenerator: generated_response - Generated response using HF model approach
INFO:__main__:Orchestrator: completed_fixed_agentic_rag - ✅ Completed FIXED RAG with 2 iterations, final quality: 0.787


      ⏱️ Time: 2.410s, Quality: 0.787, Iterations: 2
      📊 Quality Improvement: +0.139

📝 Testing Query 2: How does multi-head attention work?...
   🔄 Running Traditional RAG...


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

INFO:__main__:🚀 Starting FIXED agentic RAG for query: How does multi-head attention work?...
INFO:__main__:QueryAnalyzer: recommend_hybrid_retrieval - Complex queries require multi-modal retrieval


      ⏱️ Time: 0.034s, Quality: 0.702
   🤖 Running Agentic RAG...


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

INFO:__main__:RetrievalAgent: retrieved_5_documents - Used hybrid strategy to find relevant documents
INFO:__main__:QualityAssessor: quality_score_0.72 - Assessed 5 results with score 0.72
INFO:__main__:🔄 Iteration 1: Strategy=hybrid, Quality=0.724


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

INFO:__main__:RetrievalAgent: retrieved_6_documents - Used hybrid strategy to find relevant documents
INFO:__main__:QualityAssessor: quality_score_0.71 - Assessed 6 results with score 0.71
INFO:__main__:🔄 Iteration 2: Strategy=hybrid, Quality=0.711


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

INFO:__main__:RetrievalAgent: retrieved_7_documents - Used semantic strategy to find relevant documents
INFO:__main__:QualityAssessor: quality_score_0.53 - Assessed 7 results with score 0.53
INFO:__main__:🔄 Iteration 3: Strategy=semantic, Quality=0.526
INFO:__main__:🛑 Stopping after 3 iterations
INFO:__main__:CitationAgent: generated_5_citations - Created proper citations for all retrieved sources
Both `max_new_tokens` (=256) and `max_length`(=200) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
INFO:__main__:HF_LLMGenerator: generated_response - Generated response using HF model approach
INFO:__main__:Orchestrator: completed_fixed_agentic_rag - ✅ Completed FIXED RAG with 3 iterations, final quality: 0.724


      ⏱️ Time: 2.322s, Quality: 0.724, Iterations: 3
      📊 Quality Improvement: +0.023

📝 Testing Query 3: Compare scaled dot-product attention with additive...
   🔄 Running Traditional RAG...


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

INFO:__main__:🚀 Starting FIXED agentic RAG for query: Compare scaled dot-product attention with additive...
INFO:__main__:QueryAnalyzer: recommend_hybrid_retrieval - Default to hybrid approach for balanced results


      ⏱️ Time: 0.034s, Quality: 0.756
   🤖 Running Agentic RAG...


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

INFO:__main__:RetrievalAgent: retrieved_5_documents - Used hybrid strategy to find relevant documents
INFO:__main__:QualityAssessor: quality_score_0.86 - Assessed 5 results with score 0.86
INFO:__main__:🔄 Iteration 1: Strategy=hybrid, Quality=0.865


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

INFO:__main__:RetrievalAgent: retrieved_6_documents - Used hybrid strategy to find relevant documents
INFO:__main__:QualityAssessor: quality_score_0.86 - Assessed 6 results with score 0.86
INFO:__main__:🔄 Iteration 2: Strategy=hybrid, Quality=0.858
INFO:__main__:🛑 Stopping after 2 iterations
INFO:__main__:CitationAgent: generated_5_citations - Created proper citations for all retrieved sources
Both `max_new_tokens` (=256) and `max_length`(=200) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
INFO:__main__:HF_LLMGenerator: generated_response - Generated response using HF model approach
INFO:__main__:Orchestrator: completed_fixed_agentic_rag - ✅ Completed FIXED RAG with 2 iterations, final quality: 0.865


      ⏱️ Time: 12.216s, Quality: 0.865, Iterations: 2
      📊 Quality Improvement: +0.109

📝 Testing Query 4: What are the key innovations in transformer archit...
   🔄 Running Traditional RAG...


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

INFO:__main__:🚀 Starting FIXED agentic RAG for query: What are the key innovations in transformer archit...
INFO:__main__:QueryAnalyzer: recommend_hybrid_retrieval - Complex queries require multi-modal retrieval


      ⏱️ Time: 0.033s, Quality: 0.568
   🤖 Running Agentic RAG...


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

INFO:__main__:RetrievalAgent: retrieved_5_documents - Used hybrid strategy to find relevant documents
INFO:__main__:QualityAssessor: quality_score_0.77 - Assessed 5 results with score 0.77
INFO:__main__:🔄 Iteration 1: Strategy=hybrid, Quality=0.765


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

INFO:__main__:RetrievalAgent: retrieved_6_documents - Used hybrid strategy to find relevant documents
INFO:__main__:QualityAssessor: quality_score_0.77 - Assessed 6 results with score 0.77
INFO:__main__:🔄 Iteration 2: Strategy=hybrid, Quality=0.771
INFO:__main__:🛑 Stopping after 2 iterations
INFO:__main__:CitationAgent: generated_6_citations - Created proper citations for all retrieved sources
Both `max_new_tokens` (=256) and `max_length`(=200) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
INFO:__main__:HF_LLMGenerator: generated_response - Generated response using HF model approach
INFO:__main__:Orchestrator: completed_fixed_agentic_rag - ✅ Completed FIXED RAG with 2 iterations, final quality: 0.771


      ⏱️ Time: 4.047s, Quality: 0.771, Iterations: 2
      📊 Quality Improvement: +0.203

🏆 FINAL PERFORMANCE COMPARISON

📊 QUALITY METRICS:
   Traditional RAG:  0.668
   Agentic RAG:      0.787
   Improvement:      +0.118 (+17.7%)
   🎯 STATUS: ✅ AGENTIC RAG WINS!

⏱️ TIME METRICS:
   Traditional RAG:  0.033s
   Agentic RAG:      5.249s
   Overhead:         +5.215s

🔄 AGENTIC BEHAVIOR:
   Average Iterations: 2.2
   Quality Consistency: 0.730

🎯 OVERALL ASSESSMENT:
   🏆 Agentic RAG provides significant quality improvement!

✅ ALL TESTS COMPLETED!
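
The summary figures can be cross-checked against the per-query lines printed above. The sketch below recomputes them with NumPy from the rounded values in the log, so the last digit may differ slightly from the report, which works from unrounded measurements.

```python
import numpy as np

# Per-query values copied from the log output above (rounded to 3 decimals)
trad_quality = np.array([0.648, 0.702, 0.756, 0.568])
agent_quality = np.array([0.787, 0.724, 0.865, 0.771])
agent_times = np.array([2.410, 2.322, 12.216, 4.047])
iterations = np.array([2, 3, 2, 2])

improvement = agent_quality.mean() - trad_quality.mean()
improvement_pct = improvement / trad_quality.mean() * 100
# Ratio of population standard deviations, as in _calculate_improvements
consistency = agent_quality.std() / trad_quality.std()
# The median is less sensitive than the mean to a single slow query
median_time = float(np.median(agent_times))

print(f"Quality improvement: {improvement:+.3f} ({improvement_pct:+.1f}%)")
print(f"Mean agentic time:   {agent_times.mean():.3f}s (median {median_time:.3f}s)")
print(f"Average iterations:  {iterations.mean():.2f}")
print(f"Quality consistency: {consistency:.3f}")
```

Query 3 alone accounts for about 12.2 s of agentic latency, which pulls the mean to roughly 5.2 s while the other three queries take between 2.3 s and 4.0 s. With only four queries, the averages are best read as a rough indication rather than a benchmark.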


## 9. Comprehensive Test Framework <a id="testing2"></a>

The test suite below exercises individual agents first (query analysis, retrieval quality, citation generation), then checks iterative improvement end to end and runs a handful of edge-case queries.

In [None]:
class AgenticRAGTestSuite:
    def __init__(self, orchestrator: OrchestrationAgent):
        self.orchestrator = orchestrator
        self.test_results = []
    
    def test_query_analysis_agent(self):
        """Test the Query Analysis Agent"""
        test_cases = [
            ("What is machine learning?", "factual", "semantic"),
            ("Compare supervised and unsupervised learning", "comparison", "hybrid"),
            ("How to implement a neural network?", "procedural", "hybrid"),
            ("When was the transformer invented?", "specific", "lexical"),
            ("Why does attention work better than RNNs for long sequences?", "analytical", "hybrid")
        ]
        
        results = []
        for query, expected_type, expected_strategy in test_cases:
            decision = self.orchestrator.query_analyzer.execute(query)
            actual_type = decision.metadata['query_type']
            actual_strategy = decision.metadata['recommended_strategy']
            
            type_correct = actual_type == expected_type
            strategy_correct = actual_strategy == expected_strategy
            
            results.append({
                'query': query,
                'expected_type': expected_type,
                'actual_type': actual_type,
                'expected_strategy': expected_strategy,
                'actual_strategy': actual_strategy,
                'type_correct': type_correct,
                'strategy_correct': strategy_correct,
                'confidence': decision.confidence
            })
        
        return results
    
    def test_retrieval_quality(self):
        """Run one query through each retrieval strategy and record the assessed quality."""
        test_query = "What is the attention mechanism in neural networks?"
        
        strategies = ['semantic', 'lexical', 'hybrid']
        results = {}
        
        for strategy in strategies:
            retrieval_results = self.orchestrator.retrieval_agent.execute(test_query, strategy, k=5)
            quality_assessment = self.orchestrator.quality_assessor.execute(test_query, retrieval_results)
            
            results[strategy] = {
                'results_count': len(retrieval_results),
                'quality_score': quality_assessment['quality_score'],
                'issues': quality_assessment['issues'],
                'metrics': quality_assessment['metrics']
            }
        
        return results
    
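    def test_citation_generation(self):
        """Check that citations, bibliography entries, and page numbers are produced for retrieved results."""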
        test_query = "Explain the transformer architecture"
        results = self.orchestrator.retrieval_agent.execute(test_query, 'hybrid', k=3)
        citations = self.orchestrator.citation_agent.execute(results)
        
        test_results = {
            'citation_count': len(citations['citations']),
            'bibliography_count': len(citations['bibliography']),
            'has_page_numbers': any('page' in citation for citation in citations['citations']),
            'unique_sources': len(set(frozenset(source.items()) for source in citations['source_map'].values())),
            'citations': citations['citations']
        }
        
        return test_results
    
    def test_iterative_improvement(self):
        """Run one query end to end and check whether quality held or rose across iterations."""
        test_query = "What are the key innovations in the attention mechanism?"
        response = self.orchestrator.execute(test_query)
        
        strategies_tried = response['process_metadata']['strategies_tried']
        conversation = self.orchestrator.get_conversation_history()[-1]
        
        quality_progression = [iteration['quality_score'] for iteration in conversation['iterations']]
        
        return {
            'iterations_used': len(strategies_tried),
            'strategies_tried': strategies_tried,
            'quality_progression': quality_progression,
            'final_quality': response['quality_metrics']['overall_score'],
            # Compares the last iteration with the first, not the best score seen
            'improved': len(quality_progression) > 1 and quality_progression[-1] >= quality_progression[0]
        }
    
    def test_edge_cases(self):
        """Run degenerate queries and record failures instead of raising."""
        edge_cases = [
            ("", "empty_query"),
            ("a", "very_short_query"),
            ("What is the meaning of life, the universe, and everything, and how does it relate to artificial intelligence, machine learning, deep learning, neural networks, transformers, attention mechanisms, and the future of humanity?", "very_long_query"),
            ("xyz123 qwerty asdfgh", "nonsense_query"),
            ("🤖 🧠 💻", "emoji_query")
        ]
        
        results = []
        for query, case_type in edge_cases:
            try:
                response = self.orchestrator.execute(query)
                results.append({
                    'case_type': case_type,
                    'query': query,
                    'success': True,
                    'quality_score': response['quality_metrics']['overall_score'],
                    'results_count': response['quality_metrics']['result_count'],
                    'error': None
                })
            except Exception as e:
                results.append({
                    'case_type': case_type,
                    'query': query,
                    'success': False,
                    'quality_score': 0,
                    'results_count': 0,
                    'error': str(e)
                })
        
        return results
    
    def run_all_tests(self):
        """Run every test, print a summary, and return the compiled report."""
        print("Running Comprehensive Test Suite")
        print("="*50)
        
        # Test 1: Query Analysis
        print("\n1. Testing Query Analysis Agent...")
        query_analysis_results = self.test_query_analysis_agent()
        type_accuracy = sum(r['type_correct'] for r in query_analysis_results) / len(query_analysis_results)
        strategy_accuracy = sum(r['strategy_correct'] for r in query_analysis_results) / len(query_analysis_results)
        
        print(f"   Query Type Accuracy: {type_accuracy:.2%}")
        print(f"   Strategy Recommendation Accuracy: {strategy_accuracy:.2%}")
        
        # Test 2: Retrieval Quality
        print("\n2. Testing Retrieval Quality...")
        retrieval_results = self.test_retrieval_quality()
        for strategy, metrics in retrieval_results.items():
            print(f"   {strategy.title()} Strategy - Quality: {metrics['quality_score']:.3f}")
        
        # Test 3: Citation Generation
        print("\n3. Testing Citation Generation...")
        citation_results = self.test_citation_generation()
        print(f"   Citations Generated: {citation_results['citation_count']}")
        print(f"   Bibliography Entries: {citation_results['bibliography_count']}")
        print(f"   Has Page Numbers: {citation_results['has_page_numbers']}")
        
        # Test 4: Iterative Improvement
        print("\n4. Testing Iterative Improvement...")
        improvement_results = self.test_iterative_improvement()
        print(f"   Iterations Used: {improvement_results['iterations_used']}")
        print(f"   Quality Improved: {improvement_results['improved']}")
        print(f"   Final Quality: {improvement_results['final_quality']:.3f}")
        
        # Test 5: Edge Cases
        print("\n5. Testing Edge Cases...")
        edge_case_results = self.test_edge_cases()
        success_rate = sum(r['success'] for r in edge_case_results) / len(edge_case_results)
        print(f"   Success Rate: {success_rate:.2%}")
        
        for result in edge_case_results:
            if not result['success']:
                print(f"   Failed: {result['case_type']} - {result['error']}")
        
        # Compile final test report
        self.test_results = {
            'query_analysis': {
                'type_accuracy': type_accuracy,
                'strategy_accuracy': strategy_accuracy,
                'details': query_analysis_results
            },
            'retrieval_quality': retrieval_results,
            'citation_generation': citation_results,
            'iterative_improvement': improvement_results,
            'edge_cases': {
                'success_rate': success_rate,
                'details': edge_case_results
            }
        }
        
        print(f"\n{'='*50}")
        print("Test Suite Completed Successfully!")
        return self.test_results

# Run the comprehensive test suite
test_suite = AgenticRAGTestSuite(orchestrator)
test_results = test_suite.run_all_tests()

INFO:__main__:QueryAnalyzer: recommend_semantic_retrieval - Factual queries benefit from semantic understanding
INFO:__main__:QueryAnalyzer: recommend_hybrid_retrieval - Complex queries require multi-modal retrieval
INFO:__main__:QueryAnalyzer: recommend_hybrid_retrieval - Complex queries require multi-modal retrieval
INFO:__main__:QueryAnalyzer: recommend_lexical_retrieval - Specific queries need exact term matching
INFO:__main__:QueryAnalyzer: recommend_hybrid_retrieval - Complex queries require multi-modal retrieval


Running Comprehensive Test Suite

1. Testing Query Analysis Agent...
   Query Type Accuracy: 100.00%
   Strategy Recommendation Accuracy: 100.00%

2. Testing Retrieval Quality...


INFO:__main__:RetrievalAgent: retrieved_5_documents - Used semantic strategy to find relevant documents
INFO:__main__:QualityAssessor: quality_score_0.60 - Assessed 5 results with score 0.60
INFO:__main__:RetrievalAgent: retrieved_5_documents - Used lexical strategy to find relevant documents
INFO:__main__:QualityAssessor: quality_score_0.76 - Assessed 5 results with score 0.76


INFO:__main__:RetrievalAgent: retrieved_5_documents - Used hybrid strategy to find relevant documents
INFO:__main__:QualityAssessor: quality_score_0.76 - Assessed 5 results with score 0.76


   Semantic Strategy - Quality: 0.598
   Lexical Strategy - Quality: 0.757
   Hybrid Strategy - Quality: 0.757

3. Testing Citation Generation...


INFO:__main__:RetrievalAgent: retrieved_3_documents - Used hybrid strategy to find relevant documents
INFO:__main__:CitationAgent: generated_3_citations - Created proper citations for all retrieved sources
INFO:__main__:🚀 Starting FIXED agentic RAG for query: What are the key innovations in the attention mech...
INFO:__main__:QueryAnalyzer: recommend_hybrid_retrieval - Default to hybrid approach for balanced results


   Citations Generated: 3
   Bibliography Entries: 1
   Has Page Numbers: True

4. Testing Iterative Improvement...


INFO:__main__:RetrievalAgent: retrieved_5_documents - Used hybrid strategy to find relevant documents
INFO:__main__:QualityAssessor: quality_score_0.80 - Assessed 5 results with score 0.80
INFO:__main__:🔄 Iteration 1: Strategy=hybrid, Quality=0.800


INFO:__main__:RetrievalAgent: retrieved_6_documents - Used hybrid strategy to find relevant documents
INFO:__main__:QualityAssessor: quality_score_0.79 - Assessed 6 results with score 0.79
INFO:__main__:🔄 Iteration 2: Strategy=hybrid, Quality=0.787
INFO:__main__:🛑 Stopping after 2 iterations
INFO:__main__:CitationAgent: generated_5_citations - Created proper citations for all retrieved sources
Both `max_new_tokens` (=256) and `max_length`(=200) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
INFO:__main__:HF_LLMGenerator: generated_response - Generated response using HF model approach
INFO:__main__:Orchestrator: completed_fixed_agentic_rag - ✅ Completed FIXED RAG with 2 iterations, final quality: 0.800
INFO:__main__:🚀 Starting FIXED agentic RAG for query: ...
INFO:__main__:QueryAnalyzer: recommend_hybrid_retrieval - Default to hybrid appr

   Iterations Used: 2
   Quality Improved: False
   Final Quality: 0.800

5. Testing Edge Cases...


INFO:__main__:RetrievalAgent: retrieved_5_documents - Used hybrid strategy to find relevant documents
INFO:__main__:QualityAssessor: quality_score_0.28 - Assessed 5 results with score 0.28
INFO:__main__:🔄 Iteration 1: Strategy=hybrid, Quality=0.277


INFO:__main__:RetrievalAgent: retrieved_6_documents - Used hybrid strategy to find relevant documents
INFO:__main__:QualityAssessor: quality_score_0.26 - Assessed 6 results with score 0.26
INFO:__main__:🔄 Iteration 2: Strategy=hybrid, Quality=0.264
INFO:__main__:RetrievalAgent: retrieved_0_documents - Used lexical strategy to find relevant documents
INFO:__main__:QualityAssessor: low_quality_assessment - No results to evaluate
INFO:__main__:🔄 Iteration 3: Strategy=lexical, Quality=0.000
INFO:__main__:🛑 Stopping after 3 iterations
INFO:__main__:CitationAgent: generated_5_citations - Created proper citations for all retrieved sources
Both `max_new_tokens` (=256) and `max_length`(=200) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
INFO:__main__:HF_LLMGenerator: generated_response - Generated response using HF model approach
INFO:__main__:O

INFO:__main__:RetrievalAgent: retrieved_5_documents - Used hybrid strategy to find relevant documents
INFO:__main__:QualityAssessor: quality_score_0.45 - Assessed 5 results with score 0.45
INFO:__main__:🔄 Iteration 1: Strategy=hybrid, Quality=0.452


INFO:__main__:RetrievalAgent: retrieved_6_documents - Used hybrid strategy to find relevant documents
INFO:__main__:QualityAssessor: quality_score_0.46 - Assessed 6 results with score 0.46
INFO:__main__:🔄 Iteration 2: Strategy=hybrid, Quality=0.460
INFO:__main__:RetrievalAgent: retrieved_7_documents - Used lexical strategy to find relevant documents
INFO:__main__:QualityAssessor: quality_score_0.62 - Assessed 7 results with score 0.62
INFO:__main__:🔄 Iteration 3: Strategy=lexical, Quality=0.618
INFO:__main__:🛑 Stopping after 3 iterations
INFO:__main__:CitationAgent: generated_7_citations - Created proper citations for all retrieved sources
Both `max_new_tokens` (=256) and `max_length`(=200) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
INFO:__main__:HF_LLMGenerator: generated_response - Generated response using HF model approach
INFO:__

INFO:__main__:RetrievalAgent: retrieved_5_documents - Used semantic strategy to find relevant documents
INFO:__main__:QualityAssessor: quality_score_0.33 - Assessed 5 results with score 0.33
INFO:__main__:🔄 Iteration 1: Strategy=semantic, Quality=0.325


INFO:__main__:RetrievalAgent: retrieved_6_documents - Used hybrid strategy to find relevant documents
INFO:__main__:QualityAssessor: quality_score_0.69 - Assessed 6 results with score 0.69
INFO:__main__:🔄 Iteration 2: Strategy=hybrid, Quality=0.689


INFO:__main__:RetrievalAgent: retrieved_7_documents - Used semantic strategy to find relevant documents
INFO:__main__:QualityAssessor: quality_score_0.37 - Assessed 7 results with score 0.37
INFO:__main__:🔄 Iteration 3: Strategy=semantic, Quality=0.375
INFO:__main__:🛑 Stopping after 3 iterations
INFO:__main__:CitationAgent: generated_6_citations - Created proper citations for all retrieved sources
Both `max_new_tokens` (=256) and `max_length`(=200) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
INFO:__main__:HF_LLMGenerator: generated_response - Generated response using HF model approach
INFO:__main__:Orchestrator: completed_fixed_agentic_rag - ✅ Completed FIXED RAG with 3 iterations, final quality: 0.689
INFO:__main__:🚀 Starting FIXED agentic RAG for query: xyz123 qwerty asdfgh...
INFO:__main__:QueryAnalyzer: recommend_hybrid_retrieval 

INFO:__main__:RetrievalAgent: retrieved_5_documents - Used hybrid strategy to find relevant documents
INFO:__main__:QualityAssessor: quality_score_0.29 - Assessed 5 results with score 0.29
INFO:__main__:🔄 Iteration 1: Strategy=hybrid, Quality=0.293


INFO:__main__:RetrievalAgent: retrieved_6_documents - Used hybrid strategy to find relevant documents
INFO:__main__:QualityAssessor: quality_score_0.29 - Assessed 6 results with score 0.29
INFO:__main__:🔄 Iteration 2: Strategy=hybrid, Quality=0.295
INFO:__main__:RetrievalAgent: retrieved_0_documents - Used lexical strategy to find relevant documents
INFO:__main__:QualityAssessor: low_quality_assessment - No results to evaluate
INFO:__main__:🔄 Iteration 3: Strategy=lexical, Quality=0.000
INFO:__main__:🛑 Stopping after 3 iterations
INFO:__main__:CitationAgent: generated_6_citations - Created proper citations for all retrieved sources
Both `max_new_tokens` (=256) and `max_length`(=200) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
INFO:__main__:HF_LLMGenerator: generated_response - Generated response using HF model approach
INFO:__main__:O

INFO:__main__:RetrievalAgent: retrieved_5_documents - Used hybrid strategy to find relevant documents
INFO:__main__:QualityAssessor: quality_score_0.26 - Assessed 5 results with score 0.26
INFO:__main__:🔄 Iteration 1: Strategy=hybrid, Quality=0.259


INFO:__main__:RetrievalAgent: retrieved_6_documents - Used hybrid strategy to find relevant documents
INFO:__main__:QualityAssessor: quality_score_0.26 - Assessed 6 results with score 0.26
INFO:__main__:🔄 Iteration 2: Strategy=hybrid, Quality=0.259
INFO:__main__:RetrievalAgent: retrieved_0_documents - Used lexical strategy to find relevant documents
INFO:__main__:QualityAssessor: low_quality_assessment - No results to evaluate
INFO:__main__:🔄 Iteration 3: Strategy=lexical, Quality=0.000
INFO:__main__:🛑 Stopping after 3 iterations
INFO:__main__:CitationAgent: generated_5_citations - Created proper citations for all retrieved sources
Both `max_new_tokens` (=256) and `max_length`(=200) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
INFO:__main__:HF_LLMGenerator: generated_response - Generated response using HF model approach
INFO:__main__:O

   Success Rate: 100.00%

Test Suite Completed Successfully!
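
The log above hints at how the orchestrator settles on a final quality score: in the run with per-iteration scores 0.325, 0.689, and 0.375 it reports a final quality of 0.689, and in the run with 0.800 and 0.787 it reports 0.800. That pattern is consistent with keeping the best-scoring iteration rather than the last one. The sketch below illustrates that selection under this assumption. `IterationResult`, `select_best_iteration`, and `quality_improved` are hypothetical names introduced here for illustration, not the orchestrator's actual API, and the rule used for "improved" (best score strictly above the first iteration's score) is likewise an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IterationResult:
    """One retrieval iteration (hypothetical structure, for illustration only)."""
    iteration: int
    strategy: str
    quality: float


def select_best_iteration(history: List[IterationResult]) -> IterationResult:
    """Return the iteration with the highest quality score.

    On ties, max() keeps the earliest iteration, so later iterations
    must strictly beat earlier ones to be selected.
    """
    if not history:
        raise ValueError("history must contain at least one iteration")
    return max(history, key=lambda r: r.quality)


def quality_improved(history: List[IterationResult]) -> bool:
    """Assumed rule: quality improved if the best score beats the first iteration."""
    return select_best_iteration(history).quality > history[0].quality


# Scores copied from the third edge-case run in the log above
history = [
    IterationResult(1, "semantic", 0.325),
    IterationResult(2, "hybrid", 0.689),
    IterationResult(3, "semantic", 0.375),
]

best = select_best_iteration(history)
print(f"Best iteration: {best.iteration} ({best.strategy}), quality={best.quality:.3f}")
print(f"Quality improved: {quality_improved(history)}")
```

Applied to the iterative-improvement test (0.800, then 0.787), the same rule selects the first iteration and reports no improvement, which matches the `Quality Improved: False` and `Final Quality: 0.800` lines in the output.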
