# Fast Semantic Chunking Demo

**Approach**: Agno's native semantic chunking
- PDFReader for text extraction
- SemanticChunking (natural boundaries)
- PgVector hybrid search (vector similarity + full-text search)
- Gemini embeddings (text-embedding-004)

**Best for**: Fast prototyping
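
To make "natural boundaries" concrete, here is a toy sketch of the idea behind semantic chunking: adjacent sentences stay in the same chunk while they remain similar, and a new chunk starts when similarity drops. This is not Agno's `SemanticChunking` implementation. The helper names (`bag_of_words`, `cosine`, `semantic_chunks`) and the `0.2` threshold are hypothetical, and plain word counts stand in for a real embedding model such as text-embedding-004.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence: str) -> Counter:
    """Toy stand-in for an embedding: lowercase word counts (assumption, not a real model)."""
    return Counter(re.findall(r"\w+", sentence.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def semantic_chunks(text: str, threshold: float = 0.2) -> list[str]:
    """Start a new chunk whenever two adjacent sentences fall below the threshold."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if not sentences:
        return []
    chunks = [[sentences[0]]]
    for prev, cur in zip(sentences, sentences[1:]):
        if cosine(bag_of_words(prev), bag_of_words(cur)) < threshold:
            chunks.append([cur])  # topic shift: open a new chunk
        else:
            chunks[-1].append(cur)  # same topic: extend the current chunk
    return [" ".join(chunk) for chunk in chunks]


sample = (
    "Gold comes gladly to the saver. Gold works for the prudent saver. "
    "Rain fell on the city walls. The rain flooded the city streets."
)
chunks = semantic_chunks(sample)
for chunk in chunks:
    print(chunk)
```

With real embeddings the threshold would need tuning per corpus; the fixed value here only serves the example.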

## Setup

In [1]:
import subprocess
import sys
from pathlib import Path

project_root = Path.cwd().parent
sys.path.insert(0, str(project_root))
pdf_dir = project_root / "data" / "pdfs"

print(f"✅ Project root: {project_root}")

✅ Project root: /home/vitor/contextual-rag-agno-supabase


## 1. Download Sample PDFs

In [2]:
# if pdf_dir.exists() and list(pdf_dir.glob("*.pdf")):
#     print(f"✅ PDFs already downloaded: {len(list(pdf_dir.glob('*.pdf')))} files")
# else:
#     print("📥 Downloading sample PDFs...")
#     result = subprocess.run([sys.executable, str(project_root / "scripts" / "download_pdfs.py")],
#                           capture_output=True, text=True, cwd=str(project_root))
#     print(result.stdout)

## 2. Initialize Knowledge Base

In [3]:
from src.storage.agno_knowledge import AgnoKnowledge

kb = AgnoKnowledge(table_name="economics_docs_gemini")
print("✅ Knowledge base initialized")
print("   - Semantic chunking: ON")
print("   - Hybrid search: ON")
print(f"   - Table: {kb.knowledge.vector_db.table_name}")

✅ Knowledge base initialized
   - Semantic chunking: ON
   - Hybrid search: ON
   - Table: economics_docs_gemini


## 3. Ingest PDFs

In [4]:
kb.ingest_directory(str(pdf_dir))
print("✅ All PDFs ingested")

✅ All PDFs ingested


## 4. Search with Hybrid Search

In [6]:
results = kb.search("O que é a quinta lei de ouro?", limit=3)

print("Search Results:\n" + "="*80)
for i, result in enumerate(results, 1):
    print(f"\n{i}. {result.content[:300]}...")
    print("-"*80)

Search Results:
================================================================================

1. AS CINCO LEIS DE OURO I. O ouro vem de bom grado e numa quantidade crescente para todo homem que separa não menos de um décimo de seus ganhos, a fim de criar um fundo para o seu futuro e o de sua própria família. II.0 ouro trabalha diligente e satisfatoriamente para o homem prudente que, possuindo-o...
--------------------------------------------------------------------------------

2. ganhan do. Esse é o resultado da primeira lei." A Segunda Lei de Ouro O ouro trabalha diligente e satisfatoriamente para o homem pru dente que, possuindo-o, encontra para ele um emprego lucrativo, multiplicando-o como os flocos de algodão no campo. "O ouro realmente é um trabalhador bem-disposto. Es...
--------------------------------------------------------------------------------

3. que nove.' Debatam o assunto entre vocês. Se alguém puder provar que isso não é verdade, conversaremos a respeito amanhã quando estivermos juntos de novo."...
--------------------------------------------------------------------------------
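
Hybrid search has to merge two rankings: one from vector similarity and one from full-text search. One common way to do that is reciprocal rank fusion (RRF), sketched below. RRF is assumed here purely for illustration; nothing in this notebook confirms how PgVector or Agno combine the two scores. The function name, the chunk IDs, and the constant `k=60` are hypothetical.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked ID lists: each hit contributes 1 / (k + rank) to its ID's score."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical result lists, best match first
vector_hits = ["chunk_a", "chunk_b", "chunk_c"]   # from embedding similarity
keyword_hits = ["chunk_b", "chunk_d", "chunk_a"]  # from full-text search

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.4f}")
```

Chunks that appear in both lists (`chunk_a`, `chunk_b`) rise above chunks found by only one method, which is the behavior hybrid search is meant to provide.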

## 5. Query with LLM Agent

In [None]:
from agno.agent import Agent
from agno.models.google import Gemini

agent = Agent(
    model=Gemini(id="gemini-2.5-flash"),
    knowledge=kb.knowledge,
    search_knowledge=True,
    markdown=True,  # Optional: format the response as Markdown
    debug_mode=True,  # Optional: show execution details
)

agent.print_response(
    "Explique o que é a quinta lei de ouro",
    stream=True
)


## 6. Try Different Queries

In [None]:
# queries = [
#     "What are the principles of wealth building?",
#     "How does money and credit work in the economy?",
#     "What is the role of government in economics?"
# ]

# for query in queries:
#     print(f"\n{'='*80}")
#     print(f"Q: {query}")
#     print('='*80)
#     agent.print_response(query, stream=True)
#     print("\n")