# FedQuery Demo: FOMC Agentic RAG Research Assistant

This notebook demonstrates the FedQuery system end-to-end:

1. **Raw retrieval** — bi-encoder similarity search over FOMC document chunks (no LLM)
2. **Reranker-enhanced retrieval** — cross-encoder reranking for improved relevance
3. **Full LangGraph agent** — assess (with date extraction + adaptive top_k) → two-pass search → evaluate confidence → synthesize → validate → respond

**Prerequisites:**
- Run `fedquery ingest --years 2024` first to populate the ChromaDB store (or `--years 2021 2022 2023 2024 2025` for the full corpus)
- Sections 1-2 work without an API key
- Section 3 requires `ANTHROPIC_API_KEY` in `.env`

## Section 1: Setup

Load the project settings, open the ChromaDB store, and initialize the embedding model. No API key is needed for this step.

In [None]:
import sys
from pathlib import Path

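# Put the current working directory on sys.path so `config` and `src` import cleanly.
# This assumes the notebook is launched from the project root.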
project_root = Path.cwd()
if str(project_root) not in sys.path:
    sys.path.insert(0, str(project_root))

from config.settings import get_settings
from src.vectorstore.chroma_store import ChromaStore
from src.embedding.sentence_transformer import SentenceTransformerEmbeddingProvider

settings = get_settings()
store = ChromaStore(path=str(settings.chroma_path))
embedding_provider = SentenceTransformerEmbeddingProvider(settings.fedquery_embedding_model)

print(f"Embedding model: {settings.fedquery_embedding_model} ({embedding_provider.dimension}d)")
print(f"ChromaDB path:   {settings.chroma_path}")
print(f"Chunks in store: {store.count}")

## Section 2: Retrieval Only (no LLM needed)

### Raw Bi-Encoder Retrieval

The first stage uses BAAI/bge-small-en-v1.5 to embed the query and find the closest chunks
by cosine similarity in ChromaDB. This is fast but can miss nuanced relevance.
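
To make the ranking step concrete, here is a minimal pure-Python sketch of top-k selection by cosine similarity. It is not how ChromaDB implements the search internally; the helper names and the 3-dimensional toy vectors are invented for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k_by_cosine(query_vec, chunk_vecs, k):
    """Return (chunk_id, score) pairs for the k most similar chunks."""
    scored = [(cid, cosine_similarity(query_vec, vec)) for cid, vec in chunk_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "embeddings" (made up; real vectors come from the embedding model)
toy_chunks = {
    "chunk_a": [1.0, 0.0, 0.0],
    "chunk_b": [0.6, 0.8, 0.0],
    "chunk_c": [0.0, 0.0, 1.0],
}
toy_results = top_k_by_cosine([1.0, 0.0, 0.0], toy_chunks, k=2)
print(toy_results)
```

The query vector points in the same direction as `chunk_a`, so that chunk ranks first, followed by `chunk_b`, which only partly overlaps with it.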

In [None]:
from src.agent.mcp_client import create_search_fn

search_fn = create_search_fn(store, embedding_provider)

queries = [
    "What was the federal funds rate target range in January 2024?",
    "How did the FOMC characterize inflation risks in 2024?",
]

for q in queries:
    print(f"\n{'='*80}")
    print(f"Query: {q}")
    print(f"{'='*80}")
    results = search_fn(q, top_k=10)
    print(f"{'Doc':<45} {'Date':<12} {'Section':<25} {'Score':>6}")
    print("-" * 92)
    for r in results:
        doc = r['document_name'][:44]
        date = r['document_date'][:11]
        sec = r['section_header'][:24]
        print(f"{doc:<45} {date:<12} {sec:<25} {r['relevance_score']:>6.3f}")
        print(f"  {r['chunk_text'][:120]}...\n")

### Cross-Encoder Reranking

The second stage over-fetches 3x as many candidates from the bi-encoder and rescores each one with a cross-encoder. This improves relevance at the cost of extra latency.

In [None]:
from src.retrieval.reranker import CrossEncoderReranker

reranker = CrossEncoderReranker(settings.fedquery_reranker_model)
reranked_search_fn = create_search_fn(store, embedding_provider, reranker=reranker)

print(f"Reranker model: {reranker.model_name}")
print("Strategy: over-fetch 3x from bi-encoder, rerank with cross-encoder\n")

for q in queries:
    print(f"\n{'='*80}")
    print(f"Query: {q}")
    print(f"{'='*80}")
    results = reranked_search_fn(q, top_k=10)
    print(f"{'Doc':<45} {'Date':<12} {'Section':<25} {'Score':>6}")
    print("-" * 92)
    for r in results:
        doc = r['document_name'][:44]
        date = r['document_date'][:11]
        sec = r['section_header'][:24]
        print(f"{doc:<45} {date:<12} {sec:<25} {r['relevance_score']:>6.3f}")
        print(f"  {r['chunk_text'][:120]}...\n")
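
The over-fetch-then-rerank strategy used above can be sketched in a few lines. This is an illustration, not the `CrossEncoderReranker` implementation: `rerank_search`, the toy first stage, and the word-overlap scorer are all hypothetical stand-ins, and the chunk texts are invented.

```python
def rerank_search(query, first_stage_fn, score_pair_fn, top_k=10, overfetch=3):
    """Fetch top_k * overfetch candidates, rescore each (query, text) pair, keep top_k."""
    candidates = first_stage_fn(query, top_k * overfetch)
    rescored = [
        {**c, "relevance_score": score_pair_fn(query, c["chunk_text"])}
        for c in candidates
    ]
    rescored.sort(key=lambda c: c["relevance_score"], reverse=True)
    return rescored[:top_k]


def toy_first_stage(query, n):
    """Hypothetical bi-encoder stage returning a fixed candidate list."""
    corpus = [
        {"chunk_text": "The Committee discussed balance sheet runoff.", "relevance_score": 0.70},
        {"chunk_text": "The target range for the federal funds rate was maintained.", "relevance_score": 0.65},
        {"chunk_text": "Staff reviewed labor market conditions.", "relevance_score": 0.60},
    ]
    return corpus[:n]


def toy_pair_score(query, text):
    """Fraction of query words found in the text (a crude stand-in for a cross-encoder)."""
    q_words = set(query.lower().split())
    t_words = set(text.lower().rstrip(".").split())
    return len(q_words & t_words) / len(q_words)


toy_reranked = rerank_search("federal funds rate", toy_first_stage, toy_pair_score, top_k=1)
print(toy_reranked[0]["chunk_text"])
```

With `top_k=1` the first stage alone would return the balance sheet chunk. Because the second stage sees three candidates, it can promote the chunk that actually mentions the query terms.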

## Section 3: Full LangGraph Agent (requires ANTHROPIC_API_KEY)

The agent workflow:

1. **assess_query** — LLM classifies whether retrieval is needed, extracts date ranges for temporal filtering, and estimates `top_k_hint` (how many results to fetch based on query scope)
2. **search_corpus** — two-pass retrieval when date hints are present: (1) filtered pass with date range, (2) unfiltered pass, merged and deduped. Uses adaptive `top_k` from the hint (default 10, up to 50 for multi-year queries).
3. **evaluate_confidence** — score thresholds (high >= 0.55, medium >= 0.40, low >= 0.25)
4. **reformulate_query** — if low confidence, LLM rephrases (up to 2 retries)
5. **synthesize_answer** — LLM generates answer grounded in retrieved chunks with [Source N] citations
6. **validate_citations** — verify each citation maps to an actual chunk
7. **respond** — format final answer with sources, or return uncertainty message
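
Steps 2 and 3 can be sketched as follows. The thresholds are the ones quoted in the list, but everything else is an assumption: the `chunk_id` dedup key, the rule of keeping the higher-scoring copy of a duplicate, and the `"insufficient"` fallback label (borrowed from the initial state used later in this notebook). Which score the real node compares against the thresholds is not stated here, so the helper simply takes a score as input.

```python
# Thresholds quoted from the workflow list; the "insufficient" fallback is an assumption.
CONFIDENCE_THRESHOLDS = [("high", 0.55), ("medium", 0.40), ("low", 0.25)]


def confidence_label(score):
    """Map a relevance score to a confidence label."""
    for label, minimum in CONFIDENCE_THRESHOLDS:
        if score >= minimum:
            return label
    return "insufficient"


def merge_two_pass(filtered_hits, unfiltered_hits, top_k):
    """Merge both passes, keep the best-scoring copy of each chunk, return top_k."""
    best = {}
    for hit in filtered_hits + unfiltered_hits:
        key = hit["chunk_id"]  # hypothetical dedup key
        if key not in best or hit["relevance_score"] > best[key]["relevance_score"]:
            best[key] = hit
    ranked = sorted(best.values(), key=lambda h: h["relevance_score"], reverse=True)
    return ranked[:top_k]


# Toy hits: chunk "a" appears in both passes with different scores
toy_filtered = [{"chunk_id": "a", "relevance_score": 0.50}]
toy_unfiltered = [
    {"chunk_id": "a", "relevance_score": 0.40},
    {"chunk_id": "b", "relevance_score": 0.60},
]
toy_merged = merge_two_pass(toy_filtered, toy_unfiltered, top_k=10)
print([(h["chunk_id"], confidence_label(h["relevance_score"])) for h in toy_merged])
```

Here chunk `a` survives once with its higher score, and the merged list is ordered by score before labels are applied.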

The cell below uses the same retrieval backend as the CLI (`fedquery ask`), selected by the `FEDQUERY_USE_MCP` setting:
- **`true` (default)**: Spawns the MCP server as a subprocess and communicates over stdio (same as Claude Desktop)
- **`false`**: Calls ChromaStore directly in-process (faster, no subprocess overhead)
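
How `config.settings` reads this flag is not shown in this notebook. As a rough sketch of one common approach, a boolean environment flag that defaults to `true` could be parsed like this; `env_flag` and the set of accepted spellings are hypothetical.

```python
import os


def env_flag(name, default=True, environ=None):
    """Read a boolean flag from an environment mapping, falling back to a default."""
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    if raw is None:
        return default
    # Accepted truthy spellings are an assumption, not the project's documented set
    return raw.strip().lower() in {"1", "true", "yes", "on"}


# Toy mappings stand in for the real environment so nothing global is modified
print(env_flag("FEDQUERY_USE_MCP", environ={}))
print(env_flag("FEDQUERY_USE_MCP", environ={"FEDQUERY_USE_MCP": "false"}))
```

Passing the mapping explicitly keeps the helper easy to test and avoids mutating `os.environ` from inside the notebook.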

In [None]:
from src.agent.graph import build_graph

# Use the same retrieval backend as the CLI (controlled by FEDQUERY_USE_MCP setting)
cleanup_fn = None

if settings.fedquery_use_mcp:
    from src.agent.mcp_client import MCPSearchClient, create_mcp_search_fn
    mcp_client = MCPSearchClient()
    mcp_client.connect()
    search_fn = create_mcp_search_fn(mcp_client)
    cleanup_fn = mcp_client.close
    retrieval_mode = "MCP (stdio subprocess)"
else:
    from src.agent.mcp_client import create_direct_search_fn
    search_fn = create_direct_search_fn(store, embedding_provider)
    retrieval_mode = "Direct (in-process ChromaStore)"

graph = build_graph(search_fn)


def ask(question: str):
    """Run the agent graph and display the result."""
    print(f"Question: {question}\n")
    result = graph.invoke({
        "query": question,
        "retrieved_chunks": [],
        "confidence": "insufficient",
        "reformulation_attempts": 0,
        "reformulated_query": None,
        "answer": None,
        "citations": [],
        "needs_retrieval": True,
        "metadata_hints": None,
        "top_k_hint": None,
    })
    print(f"Confidence: {result['confidence']}")
    print(f"Reformulations: {result['reformulation_attempts']}")
    print(f"Chunks retrieved: {len(result['retrieved_chunks'])}")
    print(f"Citations: {len(result['citations'])}")
    hints = result.get('metadata_hints')
    if hints:
        print(f"Date filter: {hints.get('date_start')} to {hints.get('date_end')}")
    top_k = result.get('top_k_hint')
    if top_k:
        print(f"Adaptive top_k: {top_k}")
    print(f"\n{'─'*80}")
    print(result["answer"])


print(f"Agent graph built. Nodes: {list(graph.get_graph().nodes)}")
print(f"Retrieval mode: {retrieval_mode}")

In [None]:
# Factual question
ask("What was the federal funds rate target range set by the FOMC in January 2024?")

In [None]:
# Cross-document question
ask("How did the FOMC's characterization of inflation change between January and September 2024?")

In [None]:
# Section-specific question
ask("What did FOMC participants discuss about quantitative tightening in 2024?")

In [None]:
# Out-of-scope question — should return uncertainty response
ask("What is the European Central Bank's current interest rate?")

In [None]:
# Clean up MCP server subprocess (if used)
if cleanup_fn:
    cleanup_fn()
    print("MCP server subprocess stopped.")
else:
    print("Direct mode — no cleanup needed.")

## Summary

This notebook demonstrated three levels of the FedQuery pipeline:

| Stage | LLM Required | What it does |
|-------|-------------|-------------|
| Bi-encoder retrieval | No | Fast cosine similarity search over FOMC chunks |
| + Cross-encoder reranker | No | Reranks candidates for better precision |
| Full LangGraph agent | Yes | Assesses, retrieves, synthesizes, cites, validates |

Key behaviors to look for when running the cells above:
- Factual questions get high-confidence, well-cited answers
- Cross-document questions pull from multiple meeting dates
- Out-of-scope questions correctly return uncertainty responses
- All citations are validated against actual retrieved chunks
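
The citation check can be illustrated with a small sketch. It assumes `[Source N]` markers are 1-based indexes into the retrieved chunk list; the actual `validate_citations` node may work differently, and the function name and toy answer below are invented.

```python
import re

# Matches the [Source N] citation format used in synthesized answers
SOURCE_PATTERN = re.compile(r"\[Source (\d+)\]")


def check_citations(answer, num_chunks):
    """Split cited source numbers into those that map to a chunk and those that do not."""
    cited = sorted({int(n) for n in SOURCE_PATTERN.findall(answer)})
    valid = [n for n in cited if 1 <= n <= num_chunks]  # assumes 1-based numbering
    invalid = [n for n in cited if n not in valid]
    return valid, invalid


toy_answer = "First claim [Source 1]. Second claim [Source 4]."
toy_valid, toy_invalid = check_citations(toy_answer, num_chunks=3)
print(f"valid={toy_valid} invalid={toy_invalid}")
```

With only three retrieved chunks, `[Source 4]` has nothing to point at, so a validator along these lines would flag it before the answer is returned.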

For retrieval benchmark data (HNSW vs IVF, chunking grid, reranker impact), see
[`optimization_efforts.md`](optimization_efforts.md).