# Fast Semantic Chunking Demo

**Approach**: Agno's native semantic chunking
- PDFReader for text extraction
- SemanticChunking (natural boundaries)
- PgVector hybrid search (vector + FTS)
- Gemini embeddings (text-embedding-004)

**Best for**: Fast prototyping, ~10 lines of code
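
For intuition about "natural boundaries", the idea behind semantic chunking can be sketched in plain Python: embed each sentence and start a new chunk where the similarity between neighbouring sentences drops. This is a toy illustration and not Agno's implementation; `toy_embed`, `cosine`, `semantic_chunks`, and the `0.2` threshold are hypothetical, and a bag-of-words vector stands in for a real embedding model.

```python
import math
import re
from collections import Counter


def toy_embed(sentence: str) -> Counter:
    """Hypothetical stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def semantic_chunks(text: str, threshold: float = 0.2) -> list[str]:
    """Group consecutive sentences; open a new chunk when similarity falls below threshold."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if not sentences:
        return []
    chunks = [[sentences[0]]]
    for prev, curr in zip(sentences, sentences[1:]):
        if cosine(toy_embed(prev), toy_embed(curr)) < threshold:
            chunks.append([curr])  # topic shift: start a new chunk
        else:
            chunks[-1].append(curr)  # same topic: extend the current chunk
    return [" ".join(chunk) for chunk in chunks]


text = (
    "Labor is divided among many workers. Divided labor makes workers more productive. "
    "Money is a medium of exchange. Money makes exchange easier."
)
for chunk in semantic_chunks(text):
    print(chunk)
```

With this sample text the two labor sentences end up in one chunk and the two money sentences in another, because the second and third sentences share no words.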

## Setup

In [1]:
import subprocess
import sys
from pathlib import Path

project_root = Path.cwd().parent
sys.path.insert(0, str(project_root))
pdf_dir = project_root / "data" / "pdfs"

print(f"✅ Project root: {project_root}")

✅ Project root: /home/vitor/contextual-rag-agno-supabase


## 1. Download Sample PDFs

In [2]:
# if pdf_dir.exists() and list(pdf_dir.glob("*.pdf")):
#     print(f"✅ PDFs already downloaded: {len(list(pdf_dir.glob('*.pdf')))} files")
# else:
#     print("📥 Downloading sample PDFs...")
#     result = subprocess.run([sys.executable, str(project_root / "scripts" / "download_pdfs.py")],
#                           capture_output=True, text=True, cwd=str(project_root))
#     print(result.stdout)

## 2. Initialize Knowledge Base

In [3]:
from src.storage.agno_knowledge import AgnoKnowledge

kb = AgnoKnowledge(table_name="economics_docs")
print("✅ Knowledge base initialized")
print("   - Semantic chunking: ON")
print("   - Hybrid search: ON")
print(f"   - Table: {kb.knowledge.vector_db.table_name}")

✅ Knowledge base initialized
   - Semantic chunking: ON
   - Hybrid search: ON
   - Table: economics_docs


## 3. Ingest PDFs

In [4]:
kb.ingest_directory(str(pdf_dir))
print("✅ All PDFs ingested")

TypeError: Can't instantiate abstract class _ChonkieEmbedderWrapper without an implementation for abstract methods 'count_tokens', 'count_tokens_batch', 'embed_batch', 'similarity'
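
The traceback above says that `_ChonkieEmbedderWrapper` inherits abstract methods it does not implement. One plausible cause (an assumption, not confirmed here) is a version mismatch in which the installed chunking library declares more abstract methods than the wrapper was written for. The sketch below reproduces the same kind of error with hypothetical classes (`BaseEmbedder`, `IncompleteWrapper`, `CompleteWrapper`) that are not part of Agno or Chonkie.

```python
from abc import ABC, abstractmethod


class BaseEmbedder(ABC):
    """Hypothetical base class standing in for a library's abstract embedder interface."""

    @abstractmethod
    def embed(self, text: str) -> list[float]: ...

    @abstractmethod
    def embed_batch(self, texts: list[str]) -> list[list[float]]: ...


class IncompleteWrapper(BaseEmbedder):
    """Implements only embed(), so the class is still abstract."""

    def embed(self, text: str) -> list[float]:
        return [float(len(text))]


class CompleteWrapper(IncompleteWrapper):
    """Adds the missing method, which makes the class instantiable."""

    def embed_batch(self, texts: list[str]) -> list[list[float]]:
        return [self.embed(t) for t in texts]


try:
    IncompleteWrapper()  # fails: embed_batch is not implemented
except TypeError as exc:
    print(f"TypeError: {exc}")

print(CompleteWrapper().embed_batch(["a", "bb"]))
```

If the cause is indeed a version mismatch, comparing the installed versions of the two packages (for example with `pip show`) against the versions the project was written for is a reasonable first check.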

## 4. Search with Hybrid Search

In [None]:
results = kb.search("What is the division of labor?", limit=3)

print("Search Results:\n" + "="*80)
for i, result in enumerate(results, 1):
    print(f"\n{i}. {result.content[:300]}...")
    print("-"*80)
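
As a rough picture of what "hybrid" means here, one common way to combine vector and full-text results is a weighted blend of the two scores per chunk. The sketch below is an assumption for illustration only: `hybrid_rank`, the `vector_weight` parameter, and the sample scores are hypothetical, and the scoring PgVector actually applies may differ.

```python
def hybrid_rank(
    vector_scores: dict[str, float],
    keyword_scores: dict[str, float],
    vector_weight: float = 0.5,
) -> list[tuple[str, float]]:
    """Blend two score maps (each assumed normalized to 0..1) and sort best-first."""
    doc_ids = set(vector_scores) | set(keyword_scores)
    blended = {
        doc_id: vector_weight * vector_scores.get(doc_id, 0.0)
        + (1.0 - vector_weight) * keyword_scores.get(doc_id, 0.0)
        for doc_id in doc_ids
    }
    return sorted(blended.items(), key=lambda item: item[1], reverse=True)


# Made-up scores: chunk_a is semantically close, chunk_b matches the keywords exactly.
vector_scores = {"chunk_a": 0.9, "chunk_b": 0.4, "chunk_c": 0.1}
keyword_scores = {"chunk_b": 1.0, "chunk_c": 0.2}

for doc_id, score in hybrid_rank(vector_scores, keyword_scores):
    print(f"{doc_id}: {score:.2f}")
```

With equal weights, `chunk_b` ranks first because it scores reasonably on both signals, while `chunk_a` has no keyword match at all.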

## 5. Query with LLM Agent

In [None]:
from agno.agent import Agent
from agno.models.google import Gemini

agent = Agent(
    model=Gemini(id="gemini-2.0-flash-exp"),
    knowledge=kb.knowledge,
    search_knowledge=True,
    show_tool_calls=True
)

agent.print_response(
    "Explain Adam Smith's concept of the invisible hand and its role in free markets",
    stream=True
)

## 6. Try Different Queries

In [None]:
queries = [
    "What are the principles of wealth building?",
    "How does money and credit work in the economy?",
    "What is the role of government in economics?"
]

for query in queries:
    print(f"\n{'='*80}")
    print(f"Q: {query}")
    print('='*80)
    agent.print_response(query, stream=True)
    print("\n")