# Agentic Search - Exploration Notebook

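This notebook exercises the main components of the Agentic Search system: the individual retrievers, the document ingestion pipeline, the full agent, and its LangGraph workflow.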

## INFO 624: Intelligent Search and Language Models

### Course Concepts Demonstrated:
- Text Preprocessing and Chunking (Week 2)
- Vector Space Models (Week 4)
- Neural Language Models (Week 5)
- Query Processing (Week 6)
- Relevance Feedback (Week 9)
- RAG Systems (Week 11)
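
As a quick refresher on the first of these concepts, the vector space model represents the query and each document as term vectors and ranks documents by the cosine of the angle between them. Below is a minimal sketch that uses raw term frequencies and naive whitespace tokenization. Real systems usually add TF-IDF weighting or, for dense retrieval, learned embeddings. The helper names are illustrative and do not come from the `src` package.

```python
import math
from collections import Counter


def tf_vector(text: str) -> Counter:
    """Raw term-frequency vector from lowercase whitespace tokens."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


q_vec = tf_vector("vector space model")
doc_a = tf_vector("the vector space model represents documents as vectors")
doc_b = tf_vector("bm25 is a probabilistic retrieval function")
print(round(cosine_similarity(q_vec, doc_a), 3))  # 0.612
print(round(cosine_similarity(q_vec, doc_b), 3))  # 0.0 (no shared terms)
```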

In [None]:
# Setup
import sys
sys.path.insert(0, '..')

from dotenv import load_dotenv
load_dotenv('../.env')

import asyncio
from src.utils.config import get_settings

settings = get_settings()
print(f"OpenAI configured: {bool(settings.openai_api_key)}")
print(f"Tavily configured: {bool(settings.tavily_api_key)}")

## 1. Testing Individual Retrievers

Let's test each retriever independently.

In [None]:
# Test Tavily Web Search
from src.retrievers import get_tavily_retriever

tavily = get_tavily_retriever()
web_results = await tavily.search("What is RAG in AI?")

print(f"Found {len(web_results)} web results")
for r in web_results[:3]:
    print(f"\n- {r['title']}")
    print(f"  URL: {r['url']}")
    print(f"  Score: {r['score']:.3f}")

In [None]:
# Test arXiv Search
from src.retrievers import get_arxiv_retriever

arxiv = get_arxiv_retriever()
arxiv_results = await arxiv.search("retrieval augmented generation")

print(f"Found {len(arxiv_results)} academic papers")
for r in arxiv_results[:3]:
    print(f"\n- {r['title']}")
    print(f"  Authors: {r['metadata'].get('authors', 'N/A')}")
    print(f"  URL: {r['url']}")
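
Both retrievers expose `search` as a coroutine (hence the `await`), so they can be queried concurrently instead of one after another. The following is a minimal sketch of that pattern with `asyncio.gather`. It uses a hypothetical `StubRetriever` in place of the real Tavily and arXiv retrievers, so it runs without API keys. The `search_all` helper is also hypothetical. The cell is written as a standalone script. Inside Jupyter, where an event loop is already running, replace the `asyncio.run(...)` call with `await search_all(...)`.

```python
import asyncio
from typing import Any


class StubRetriever:
    """Hypothetical stand-in for a retriever with an async `search(query)` method."""

    def __init__(self, name: str, delay: float) -> None:
        self.name = name
        self.delay = delay

    async def search(self, query: str) -> list[dict[str, Any]]:
        await asyncio.sleep(self.delay)  # simulate network latency
        return [{"title": f"{self.name} hit for {query!r}",
                 "url": f"https://example.com/{self.name}",
                 "score": 1.0}]


async def search_all(retrievers: list[StubRetriever], query: str) -> dict[str, list[dict[str, Any]]]:
    # Launch every search at once; wall time is roughly the slowest call, not the sum.
    outcomes = await asyncio.gather(*(r.search(query) for r in retrievers),
                                    return_exceptions=True)
    merged: dict[str, list[dict[str, Any]]] = {}
    for retriever, outcome in zip(retrievers, outcomes):
        # A failing source yields an empty list instead of aborting the whole search.
        merged[retriever.name] = [] if isinstance(outcome, BaseException) else outcome
    return merged


stubs = [StubRetriever("web", 0.05), StubRetriever("arxiv", 0.1)]
results = asyncio.run(search_all(stubs, "retrieval augmented generation"))
for source, hits in results.items():
    print(source, len(hits))
```

Passing `return_exceptions=True` is a design choice: one unavailable source degrades the result set rather than raising out of the whole query.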

## 2. Document Ingestion Pipeline

This section demonstrates text preprocessing and chunking (Week 2 concepts) by splitting a short sample document with three different chunking strategies.

In [None]:
# Test chunking strategies
from src.ingestion import TextChunker, Document

sample_text = """
# Information Retrieval

Information retrieval (IR) is the process of obtaining information system resources 
that are relevant to an information need from a collection of those resources.

## Vector Space Model

The vector space model represents documents and queries as vectors in a 
high-dimensional space. Similarity is computed using cosine similarity.

## BM25

BM25 is a probabilistic retrieval function that ranks documents based on 
query terms appearing in each document.
"""

doc = Document(content=sample_text, metadata={"title": "IR Overview"})

# Compare chunking strategies
for strategy in ["fixed", "paragraph", "semantic"]:
    chunker = TextChunker(strategy=strategy, chunk_size=200)
    chunks = chunker.chunk_document(doc)
    print(f"\n{strategy.upper()} chunking: {len(chunks)} chunks")
    for i, chunk in enumerate(chunks[:2]):
        print(f"  Chunk {i+1}: {len(chunk.content)} chars")
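
The internals of `TextChunker` are not shown here. As a rough illustration of what `fixed` and `paragraph` chunking usually mean, here is a minimal sketch. It assumes `chunk_size` counts characters. The `overlap` parameter and both helper functions are hypothetical, and the notebook does not say whether `TextChunker` overlaps its chunks. A `semantic` strategy typically needs an embedding model to detect topic shifts, so it is left out.

```python
import re


def fixed_chunks(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Cut text into windows of chunk_size characters; consecutive windows share `overlap` chars."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    chunks, start = [], 0
    while start < len(text):
        end = start + chunk_size
        chunks.append(text[start:end])
        if end >= len(text):
            break
        start = end - overlap  # step back so context at the boundary is repeated
    return chunks


def paragraph_chunks(text: str, chunk_size: int = 200) -> list[str]:
    """Greedily pack blank-line-separated paragraphs into chunks of at most chunk_size chars.

    A single paragraph longer than chunk_size is kept whole as its own chunk.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks, current = [], ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if not current or len(candidate) <= chunk_size:
            current = candidate
        else:
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks


demo = ("First paragraph about IR.\n\n"
        "Second paragraph about the vector space model.\n\n"
        "Third about BM25.")
print([len(c) for c in fixed_chunks(demo, chunk_size=40, overlap=10)])  # [40, 40, 32]
print(paragraph_chunks(demo, chunk_size=60))
```

Fixed windows guarantee uniform sizes but can split sentences mid-word. Paragraph packing respects document structure at the cost of uneven chunk lengths.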

## 3. Running the Full Agent

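Run the complete agentic search pipeline end to end, first on a simple query and then on a comparison query that the agent is expected to decompose into sub-queries.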

In [None]:
from src.agent import run_search

# Simple query
result = await run_search("What is RAG in AI?")

print("Query Type:", result.get('query_type'))
print("Sub-queries:", [sq['query'] for sq in result.get('sub_queries', [])])
print("\nAnswer:")
# Fall back to the draft when the final answer is missing or None
answer = result.get('final_answer') or result.get('draft_answer') or 'No answer'
print(answer[:500])

In [None]:
# Complex query with decomposition
result = await run_search("Compare BM25 and dense retrieval for question answering")

print("Query Type:", result.get('query_type'))
print("\nSub-queries:")
for sq in result.get('sub_queries', []):
    print(f"  - {sq['query']} -> {sq.get('sources', [])}")

print("\nQuality Score:", result.get('overall_quality', 0))
print("Iterations:", result.get('iteration_count', 0))
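
The query above asks the agent to contrast BM25 with dense retrieval. For reference, BM25 scores a document by summing, over the query terms, an IDF weight multiplied by a saturating, length-normalized term frequency. Below is a minimal textbook sketch of Okapi BM25, assuming whitespace tokenization and the parameter values k1 = 1.5 and b = 0.75, which fall within commonly used ranges. It uses the smoothed IDF variant found in Lucene. This is an illustration, not code from the project's retrievers.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score every document against the query with Okapi BM25 (lowercase whitespace tokens)."""
    if not docs:
        return []
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    # Average document length; fall back to 1.0 so an all-empty corpus doesn't divide by zero.
    avgdl = sum(len(t) for t in tokenized) / n_docs or 1.0
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    query_terms = query.lower().split()
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Length normalization: longer-than-average documents get a larger denominator.
        norm = k1 * (1 - b + b * len(tokens) / avgdl)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            # Smoothed IDF with "+1" inside the log keeps every weight positive.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Term-frequency saturation: each extra occurrence adds less (bounded by k1 + 1).
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores


docs = [
    "bm25 ranks documents by exact query term matches",
    "dense retrieval embeds queries and documents as vectors",
    "bm25 bm25 is a strong sparse baseline",
]
for doc, score in zip(docs, bm25_scores("bm25 retrieval", docs)):
    print(f"{score:.3f}  {doc}")
```

Because BM25 only rewards exact term overlap, a paraphrased document with no shared terms scores 0. Dense retrieval addresses that gap by comparing learned embedding vectors instead.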

## 4. Visualizing the Agent Graph

Visualize the LangGraph workflow.

In [None]:
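from src.agent.graph import create_search_agent

agent = create_search_agent()
graph = agent.get_graph()

# Node IDs and edges of the compiled workflow
print("Agent Graph Nodes:")
print(list(graph.nodes))
print("\nEdges:")
for edge in graph.edges:
    print(f"  {edge.source} -> {edge.target}")

# Mermaid source for the diagram; paste it into any Mermaid renderer to view the graph
print(graph.draw_mermaid())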
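
The actual node names depend on `create_search_agent`. The result fields printed in Section 3 (`query_type`, `sub_queries`, `draft_answer`, `final_answer`, `overall_quality`, `iteration_count`) suggest a classify, decompose, retrieve, draft, and grade loop. Below is a plain-Python sketch of that kind of control flow. It assumes a hypothetical quality threshold of 0.7 and a cap of 3 iterations. Every function is a stub invented for illustration, not one of the project's LangGraph nodes.

```python
from typing import Any

QUALITY_THRESHOLD = 0.7  # hypothetical: accept an answer scoring at least this
MAX_ITERATIONS = 3       # hypothetical: hard cap on draft/grade cycles


def classify(query: str) -> str:
    # Stub: treat comparison questions as complex so they get decomposed.
    return "complex" if query.lower().startswith("compare") else "simple"


def decompose(query: str, query_type: str) -> list[dict[str, Any]]:
    # Stub: simple queries pass through; complex ones split into two sub-queries.
    if query_type == "simple":
        return [{"query": query, "sources": ["web"]}]
    return [{"query": f"{query} (aspect {i})", "sources": ["web", "arxiv"]} for i in (1, 2)]


def retrieve(sub_queries: list[dict[str, Any]]) -> list[str]:
    # Stub: one fake document per (sub-query, source) pair.
    return [f"{src}: {sq['query']}" for sq in sub_queries for src in sq["sources"]]


def draft(docs: list[str], iteration: int) -> str:
    return f"draft v{iteration} citing {len(docs)} documents"


def grade(answer: str, iteration: int) -> float:
    # Stub grader that pretends each revision improves quality.
    return min(1.0, 0.5 + 0.15 * iteration)


def run_search_sketch(query: str) -> dict[str, Any]:
    state: dict[str, Any] = {"query": query, "iteration_count": 0}
    state["query_type"] = classify(query)
    state["sub_queries"] = decompose(query, state["query_type"])
    docs = retrieve(state["sub_queries"])
    while True:
        state["iteration_count"] += 1
        state["draft_answer"] = draft(docs, state["iteration_count"])
        state["overall_quality"] = grade(state["draft_answer"], state["iteration_count"])
        # Conditional edge: finish if good enough or out of budget, otherwise loop back.
        if (state["overall_quality"] >= QUALITY_THRESHOLD
                or state["iteration_count"] >= MAX_ITERATIONS):
            break
    state["final_answer"] = state["draft_answer"]
    return state


sketch_result = run_search_sketch("Compare BM25 and dense retrieval for question answering")
print(sketch_result["query_type"], sketch_result["iteration_count"], sketch_result["overall_quality"])
print(sketch_result["final_answer"])
```

In a LangGraph workflow, the `if ... break` check would typically be expressed as a conditional edge (`add_conditional_edges`) that routes from the grading node either back to drafting or to the end node. The iteration cap ensures the loop terminates even if the grader never reaches the threshold.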