
Eva02Daruma/AgentMyPdf


Agentic system for legal document compliance Q&A using Cloudflare Workers, AI, and RAG

This is an intelligent agentic system that can read and reason over legal documents, providing precise answers grounded in compliance requirements and regulations.


πŸ—οΈ Architecture Overview

High-Level System Design

┌──────────────────────────────────────────────────────────────────┐
│                        USER REQUEST                              │
│                   POST /question {"question": "..."}             │
└──────────────────────────┬───────────────────────────────────────┘
                           │
                           ▼
┌──────────────────────────────────────────────────────────────────┐
│                   CLOUDFLARE WORKER (Hono)                       │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │  1. Validate Request                                       │  │
│  │  2. Create runId                                           │  │
│  │  3. Queue to IntelligentAgent (Durable Object)             │  │
│  │  4. Return 202 Accepted + statusUrl                        │  │
│  └────────────────────────────────────────────────────────────┘  │
└──────────────────────────┬───────────────────────────────────────┘
                           │
                           ▼
┌──────────────────────────────────────────────────────────────────┐
│           INTELLIGENT AGENT (Durable Object + Agents SDK)        │
│                                                                  │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │  GUARDRAIL: Relevance Check                                │  │
│  │  • LLM classifies if question is compliance-related        │  │
│  │  • Rejects casual greetings, off-topic questions           │  │
│  └────────────────────────────────────────────────────────────┘  │
│                          │                                       │
│                          ▼                                       │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │  RAG PIPELINE (Structured Reasoning)                       │  │
│  │                                                            │  │
│  │  PHASE 1: EXTRACTION                                       │  │
│  │  └─ Tool: Text Embedding Generator                         │  │
│  │     └─ Workers AI (@cf/baai/bge-base-en-v1.5)              │  │
│  │                                                            │  │
│  │  PHASE 2: SEARCH                                           │  │
│  │  └─ Tool: Vector Similarity Search                         │  │
│  │     └─ Vectorize (top-10 semantic search)                  │  │
│  │                                                            │  │
│  │  PHASE 3: RETRIEVAL                                        │  │
│  │  └─ Tool: Document Retriever                               │  │
│  │     └─ D1 Database (structured docs + metadata)            │  │
│  │                                                            │  │
│  │  PHASE 4: GENERATION                                       │  │
│  │  └─ Tool: LLM Answer Generator                             │  │
│  │     ├─ Workers AI (@cf/meta/llama-4-scout-17b...)          │  │
│  │     └─ Chain-of-thought prompting                          │  │
│  │                                                            │  │
│  │  PHASE 5: EVALUATION                                       │  │
│  │  └─ Quality checks (content, citations, relevance)         │  │
│  └────────────────────────────────────────────────────────────┘  │
│                          │                                       │
│                          ▼                                       │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │  PERSISTENCE                                               │  │
│  │  • Store in agent_runs table (D1)                          │  │
│  │  • Track metrics, tools used, latency                      │  │
│  │  • Update agent state                                      │  │
│  └────────────────────────────────────────────────────────────┘  │
└──────────────────────────┬───────────────────────────────────────┘
                           │
                           ▼
┌──────────────────────────────────────────────────────────────────┐
│                   USER POLLS STATUS                              │
│              GET /status/:runId → returns answer                 │
└──────────────────────────────────────────────────────────────────┘

Core Components

  1. Cloudflare Worker (Hono): HTTP API layer
  2. IntelligentAgent (Durable Object): Agentic processing with state management
  3. Workers AI: Embeddings + LLM inference
  4. Vectorize: Semantic vector search
  5. D1 Database: Document storage + run history
  6. AI Gateway: Observability and caching

🎯 Design Decisions

1. Cloudflare Agents SDK over Custom Implementation

Decision: Use Cloudflare's official Agents SDK instead of building a custom agent system.

Rationale:

  • ✅ Built-in state management with automatic persistence
  • ✅ RPC support for direct method calls
  • ✅ WebSocket integration for real-time updates
  • ✅ SQL storage integrated (no external DB needed)
  • ✅ Follows Cloudflare best practices

Implementation: IntelligentAgent extends Agent<Env, RAGAgentState>
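The Agents SDK class itself needs the Workers runtime, so it is not reproduced here. As a dependency-free sketch of the kind of state an `IntelligentAgent` might keep, the block below assumes a hypothetical `RAGAgentState` shape (the field names are illustrative, not taken from the project) and shows immutable state transitions of the sort you would hand to the SDK's `setState`.

```typescript
// Hypothetical shape of the agent's persisted state; field names are illustrative only.
type RunStatus = "pending" | "running" | "completed" | "failed";

interface RunRecord {
  id: string;
  question: string;
  status: RunStatus;
  result?: string;
  createdAt: number; // epoch milliseconds
  completedAt?: number;
}

interface RAGAgentState {
  runs: Record<string, RunRecord>;
  totalRuns: number;
}

const initialState: RAGAgentState = { runs: {}, totalRuns: 0 };

// Returns a new state containing a fresh pending run; the input state is not mutated.
function createRun(state: RAGAgentState, id: string, question: string, now: number): RAGAgentState {
  const run: RunRecord = { id, question, status: "pending", createdAt: now };
  return { runs: { ...state.runs, [id]: run }, totalRuns: state.totalRuns + 1 };
}

// Returns a new state with the run marked finished; unknown ids return the state unchanged.
function finishRun(
  state: RAGAgentState,
  id: string,
  outcome: { status: "completed" | "failed"; result?: string },
  now: number,
): RAGAgentState {
  const run = state.runs[id];
  if (!run) return state;
  const updated: RunRecord = { ...run, ...outcome, completedAt: now };
  return { ...state, runs: { ...state.runs, [id]: updated } };
}
```

Inside an `Agent` subclass, each returned object would be passed to `this.setState(...)`, which persists it; keeping the transitions pure makes them easy to unit-test outside the runtime.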

2. Durable Objects for Stateful Agent

Decision: Use Durable Objects to manage agent state and ensure consistency.

Rationale:

  • ✅ Single instance ensures no race conditions
  • ✅ Built-in persistence (state survives restarts)
  • ✅ Co-location with data (SQL storage and state live in the same place)
  • ✅ Better suited to long-running operations than a stateless request handler

3. Asynchronous Processing

Decision: Return immediately (202 Accepted) and process in background.

Rationale:

  • ✅ Agent runs can take 10+ seconds
  • ✅ User doesn't need to keep connection open
  • ✅ Resilient to network issues
  • ✅ Can handle multiple requests concurrently

Implementation:

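// POST /question returns immediately with a run handle
{ "runId": "...", "statusUrl": "/status/run-xxx" }

// The agent processes the question in the background: ctx.waitUntil keeps the
// Worker alive until the promise settles, without blocking the response
// (awaiting the call here would hold the response until processing finished)
ctx.waitUntil(agent.processQuestion({ question, runId }));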
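The return-immediately, process-in-background pattern can be sketched without Hono or the Workers runtime. This sketch assumes a minimal `ExecutionContextLike` interface with a `waitUntil` method (mirroring the Workers `ExecutionContext`; in Hono the real one is `c.executionCtx`), an in-memory run map, and a hypothetical `processQuestion` callback standing in for the Durable Object call. The id generator is injected so the example stays deterministic.

```typescript
// Minimal stand-in for the Workers ExecutionContext; only waitUntil is needed here.
interface ExecutionContextLike {
  waitUntil(promise: Promise<unknown>): void;
}

interface RunRecord {
  status: "pending" | "completed" | "failed";
  result?: string;
  error?: string;
}

// Hypothetical signature of the background work (the Durable Object RPC call in the real system).
type ProcessQuestion = (input: { question: string; runId: string }) => Promise<string>;

function handleQuestion(
  question: string,
  ctx: ExecutionContextLike,
  runs: Map<string, RunRecord>,
  processQuestion: ProcessQuestion,
  newId: () => string,
): { status: 202; body: { runId: string; statusUrl: string } } {
  const runId = `run-${newId()}`;
  runs.set(runId, { status: "pending" });

  // Start the work but do not await it: the response below is returned immediately,
  // and waitUntil keeps the runtime alive until the promise settles.
  const work = processQuestion({ question, runId }).then(
    (result) => {
      runs.set(runId, { status: "completed", result });
    },
    (err: unknown) => {
      runs.set(runId, { status: "failed", error: String(err) });
    },
  );
  ctx.waitUntil(work);

  return { status: 202, body: { runId, statusUrl: `/status/${runId}` } };
}
```

Errors are caught inside the background promise and recorded on the run, so a failed run is visible through `/status/:runId` instead of surfacing as an unhandled rejection.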

4. Guardrails for Compliance Focus

Decision: Add LLM-based relevance filter before RAG pipeline.

Rationale:

  • ✅ Reduces the risk of hallucinated answers to irrelevant questions
  • ✅ Saves compute (no embedding/search for "Hello")
  • ✅ Clear user feedback for off-topic questions
  • ✅ Aligns with compliance Q&A purpose

Implementation:

  • Uses separate LLM call with binary classification
  • Low temperature (0.1) for consistent filtering
  • Fast response (~500ms)
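A relevance guardrail along these lines might look like the sketch below. The prompt wording, the YES/NO answer convention, and the injected `llm` function are assumptions for illustration; in the Worker, `llm` would wrap a Workers AI text-generation call made with `temperature: 0.1`.

```typescript
// Hypothetical LLM interface: system + user prompt and sampling options in, text out.
type Llm = (req: { system: string; user: string; temperature: number; maxTokens: number }) => Promise<string>;

// Illustrative classifier prompt; not the project's actual wording.
const GUARDRAIL_SYSTEM = [
  "You are a classifier for a legal compliance Q&A assistant.",
  "Answer YES if the user's message is a question about laws, regulations,",
  "compliance obligations, or legal documents. Otherwise answer NO.",
  "Reply with exactly one word: YES or NO.",
].join(" ");

interface GuardrailResult {
  relevant: boolean;
  raw: string; // the model's raw reply, kept for debugging and metrics
}

async function checkRelevance(question: string, llm: Llm): Promise<GuardrailResult> {
  const raw = await llm({
    system: GUARDRAIL_SYSTEM,
    user: question,
    temperature: 0.1, // low temperature for consistent classification
    maxTokens: 5, // a one-word answer is all that is needed
  });
  // Fail closed: only an explicit leading YES counts as relevant.
  const relevant = /^\s*yes\b/i.test(raw);
  return { relevant, raw };
}
```

Treating anything other than a clear YES as off-topic means an ambiguous model reply produces a rejection message rather than a RAG run, which fits a compliance-only assistant.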

5. Structured Reasoning with Tool Tracking

Decision: Explicit phases with tool usage metrics.

Rationale:

  • ✅ Observable reasoning process
  • ✅ Easy to debug which phase failed
  • ✅ Metrics for optimization
  • ✅ Matches interview requirements

Phases:

  1. Guardrail: Relevance Check (optional)
  2. Extraction: Generate question embedding
  3. Search: Vector similarity in Vectorize
  4. Retrieval: Fetch full documents from D1
  5. Generation: LLM answer with chain-of-thought
  6. Evaluation: Quality assessment
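The phase sequence above can be sketched as a small runner that times each phase and records its name. The `Tools` interface, the phase labels, the rejection message, and the evaluation heuristics are illustrative assumptions, not the project's actual types; in the real Worker each tool would wrap Workers AI, Vectorize, or D1.

```typescript
interface Match { id: string; score: number }
interface Doc { id: string; text: string }

// Hypothetical tool interface; in the Worker these would wrap Workers AI, Vectorize and D1.
interface Tools {
  isRelevant(question: string): Promise<boolean>;
  embed(question: string): Promise<number[]>;
  search(vector: number[], topK: number): Promise<Match[]>;
  fetchDocs(ids: string[]): Promise<Doc[]>;
  generate(question: string, docs: Doc[], matches: Match[]): Promise<string>;
}

interface PhaseMetric { phase: string; ms: number }

interface PipelineResult {
  answer: string;
  rejected: boolean;
  phases: PhaseMetric[];
  checks: { hasContent: boolean; citesSource: boolean };
}

async function runPipeline(question: string, tools: Tools, now: () => number = Date.now): Promise<PipelineResult> {
  const phases: PhaseMetric[] = [];

  // Runs one phase and records its name and duration.
  async function timed<T>(phase: string, fn: () => Promise<T>): Promise<T> {
    const start = now();
    const value = await fn();
    phases.push({ phase, ms: now() - start });
    return value;
  }

  const relevant = await timed("guardrail", () => tools.isRelevant(question));
  if (!relevant) {
    return {
      answer: "This assistant only answers compliance-related questions.",
      rejected: true,
      phases,
      checks: { hasContent: false, citesSource: false },
    };
  }
  const vector = await timed("extraction", () => tools.embed(question));
  const matches = await timed("search", () => tools.search(vector, 10)); // top-10, as in the diagram
  const docs = await timed("retrieval", () => tools.fetchDocs(matches.map((m) => m.id)));
  const answer = await timed("generation", () => tools.generate(question, docs, matches));
  // Evaluation: cheap heuristics (non-empty answer, cites at least one retrieved doc id).
  const checks = await timed("evaluation", async () => ({
    hasContent: answer.trim().length > 0,
    citesSource: docs.some((d) => answer.includes(d.id)),
  }));
  return { answer, rejected: false, phases, checks };
}
```

Because every phase goes through `timed`, the `phases` array doubles as the "tools used" list and the per-phase latency metrics that get persisted with the run.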

6. Chain-of-Thought Prompting

Decision: Structured prompts following Cloudflare best practices.

Rationale:

  • ✅ Better reasoning quality
  • ✅ Explicit citation requirements
  • ✅ Metadata context (relevance scores)
  • ✅ Output format specification
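A prompt builder in this spirit might sort the retrieved documents by relevance, include their scores as metadata, spell out the reasoning steps and citation rule, and fix the output format. The step list, the `[doc-id]` citation convention, and the output format below are assumptions for illustration, not the project's actual prompt.

```typescript
interface RetrievedDoc {
  id: string;
  title: string;
  text: string;
  score: number; // similarity score from the vector search
}

// Builds a chain-of-thought style prompt: context with metadata, explicit reasoning
// steps, a citation requirement, and a fixed output format.
function buildAnswerPrompt(question: string, docs: RetrievedDoc[]): { system: string; user: string } {
  const context = docs
    .slice() // copy so the caller's array is not reordered
    .sort((a, b) => b.score - a.score)
    .map((d) => `[${d.id}] ${d.title} (relevance: ${d.score.toFixed(2)})\n${d.text}`)
    .join("\n\n");

  const system = [
    "You are a legal compliance assistant. Answer only from the provided documents.",
    "Think step by step:",
    "1. Identify which documents are relevant to the question.",
    "2. Extract the specific obligations or rules they state.",
    "3. Combine them into an answer, citing each claim as [doc-id].",
    "If the documents do not contain the answer, say so instead of guessing.",
    "Output format: a short answer paragraph, then a 'Sources:' line listing the cited ids.",
  ].join("\n");

  const user = `Documents:\n${context}\n\nQuestion: ${question}`;
  return { system, user };
}
```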

📊 Data Extraction & RAG Pipeline

1. POST /question

Creates a new question for the agent. Returns a runId for tracking.

curl -X POST http://localhost:8787/question \
  -H 'Content-Type: application/json' \
  -d '{"question":"¿Qué obligaciones tengo sobre protección de datos personales?"}'

Response:

{
  "success": true,
  "runId": "run-uuid",
  "status": "pending",
  "message": "Agent run created. Processing asynchronously."
}
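The "Validate Request" step from the diagram might look like the following sketch, which checks that the body is JSON with a non-empty string `question`. The 2,000-character limit and the error messages are hypothetical; the project's actual validation rules are not documented here.

```typescript
type ValidationResult =
  | { ok: true; question: string }
  | { ok: false; status: 400; error: string };

const MAX_QUESTION_LENGTH = 2000; // hypothetical limit

// Validates a raw POST /question body before a run is created.
function validateQuestionBody(rawBody: string): ValidationResult {
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawBody);
  } catch {
    return { ok: false, status: 400, error: "Body must be valid JSON" };
  }
  if (typeof parsed !== "object" || parsed === null || !("question" in parsed)) {
    return { ok: false, status: 400, error: "Missing 'question' field" };
  }
  const question = (parsed as { question: unknown }).question;
  if (typeof question !== "string" || question.trim() === "") {
    return { ok: false, status: 400, error: "'question' must be a non-empty string" };
  }
  if (question.length > MAX_QUESTION_LENGTH) {
    return { ok: false, status: 400, error: `'question' exceeds ${MAX_QUESTION_LENGTH} characters` };
  }
  return { ok: true, question: question.trim() };
}
```

Rejecting bad input here, before a runId is created, keeps malformed requests out of the agent and out of the `agent_runs` table.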

2. GET /status/:runId

Queries the status of a specific run.

curl http://localhost:8787/status/run-uuid

Response:

{
  "success": true,
  "data": {
    "id": "run-uuid",
    "question": "...",
    "status": "completed",
    "result": "...",
    "createdAt": 1699999999999,
    "completedAt": 1699999999999
  }
}
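A client that polls `/status/:runId` until the run finishes could look like this sketch. It takes the fetch function as a parameter (so it works with the global `fetch` or a test double) and assumes the response shape shown above; the polling interval, the attempt limit, and treating `failed` as a terminal status are assumptions.

```typescript
interface StatusResponse {
  success: boolean;
  data: {
    id: string;
    question: string;
    status: "pending" | "running" | "completed" | "failed";
    result?: string;
    createdAt: number;
    completedAt?: number;
  };
}

// Only the subset of fetch that this sketch uses.
type FetchLike = (url: string) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls until the run reaches a terminal status or the attempt limit is hit.
async function waitForRun(
  baseUrl: string,
  runId: string,
  fetchFn: FetchLike,
  { intervalMs = 1000, maxAttempts = 60 } = {},
): Promise<StatusResponse["data"]> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await fetchFn(`${baseUrl}/status/${encodeURIComponent(runId)}`);
    if (!res.ok) throw new Error(`Status request failed with HTTP ${res.status}`);
    const body = (await res.json()) as StatusResponse;
    if (body.data.status === "completed" || body.data.status === "failed") return body.data;
    await sleep(intervalMs);
  }
  throw new Error(`Run ${runId} did not finish after ${maxAttempts} attempts`);
}
```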

3. POST /seed

Loads documents into the knowledge base (internal use).
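Seeding presumably involves embedding each document and writing the vectors to Vectorize alongside the document rows in D1. The sketch below covers only the record-building step: it packs paragraphs into chunks (the 800-character limit and the `docId#n` id scheme are hypothetical) and pairs each chunk with its embedding in the `{ id, values, metadata }` shape that Vectorize's `insert`/`upsert` accept. The embedding function is injected; in the Worker it would call `@cf/baai/bge-base-en-v1.5`, which returns 768-dimensional vectors.

```typescript
interface SeedDocument { id: string; title: string; text: string }

// Matches the shape Vectorize accepts for insert/upsert: id, values, optional metadata.
interface VectorRecord {
  id: string;
  values: number[];
  metadata: { docId: string; title: string; chunk: number };
}

// Hypothetical embedding function: one vector per input text, same order.
type Embed = (texts: string[]) => Promise<number[][]>;

const MAX_CHUNK_CHARS = 800; // hypothetical chunk size

// Greedily packs paragraphs into chunks of at most maxChars (an oversized paragraph stays whole).
function chunkText(text: string, maxChars = MAX_CHUNK_CHARS): string[] {
  const chunks: string[] = [];
  let current = "";
  for (const para of text.split(/\n\s*\n/).map((p) => p.trim()).filter(Boolean)) {
    if (current && current.length + 2 + para.length > maxChars) {
      chunks.push(current);
      current = para;
    } else {
      current = current ? `${current}\n\n${para}` : para;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

async function buildVectorRecords(docs: SeedDocument[], embed: Embed): Promise<VectorRecord[]> {
  const records: VectorRecord[] = [];
  for (const doc of docs) {
    const chunks = chunkText(doc.text);
    if (chunks.length === 0) continue;
    const vectors = await embed(chunks);
    for (let i = 0; i < chunks.length; i++) {
      const values = vectors[i];
      if (!values) throw new Error(`Missing embedding for ${doc.id} chunk ${i}`);
      records.push({ id: `${doc.id}#${i}`, values, metadata: { docId: doc.id, title: doc.title, chunk: i } });
    }
  }
  return records;
}
```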

4. GET /ws - WebSocket (⭐ NEW)

WebSocket connection for receiving real-time updates from the agent.

Protocol:

  • Connect to: ws://localhost:8787/ws
  • Specific subscription: {"type": "subscribe", "runId": "run-uuid"}
  • Global subscription (testing): {"type": "subscribe", "runId": "all"} ← ⭐ receives ALL runs
  • Receive automatic updates as the agent makes progress
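Server-side, the subscription protocol above boils down to a registry that maps run ids to sockets, with `"all"` acting as a wildcard. This sketch assumes a minimal socket interface with a `send(string)` method and keeps everything in memory; the class and method names are hypothetical.

```typescript
interface SocketLike { send(data: string): void }

class SubscriptionRegistry {
  private byRun = new Map<string, Set<SocketLike>>();

  // Subscribes the socket if raw is a valid {"type":"subscribe","runId":...} message; ignores anything else.
  handleMessage(socket: SocketLike, raw: string): void {
    let msg: unknown;
    try {
      msg = JSON.parse(raw);
    } catch {
      return;
    }
    if (typeof msg !== "object" || msg === null) return;
    const { type, runId } = msg as { type?: unknown; runId?: unknown };
    if (type !== "subscribe" || typeof runId !== "string" || runId === "") return;
    let subs = this.byRun.get(runId);
    if (!subs) {
      subs = new Set<SocketLike>();
      this.byRun.set(runId, subs);
    }
    subs.add(socket);
    socket.send(JSON.stringify({ type: "subscribed", runId, timestamp: Date.now() }));
  }

  // Sends a run_update to this run's subscribers plus "all" subscribers, each socket once.
  broadcast(runId: string, update: Record<string, unknown>): number {
    const targets = new Set<SocketLike>();
    this.byRun.get(runId)?.forEach((s) => targets.add(s));
    this.byRun.get("all")?.forEach((s) => targets.add(s));
    const payload = JSON.stringify({ type: "run_update", runId, update });
    targets.forEach((s) => s.send(payload));
    return targets.size;
  }

  // Removes a closed socket from every subscription.
  remove(socket: SocketLike): void {
    this.byRun.forEach((subs) => subs.delete(socket));
  }
}
```

Collecting targets into a `Set` first means a socket subscribed both to a specific run and to `"all"` receives each update only once.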

Example with websocat:

# Install websocat: brew install websocat
websocat ws://localhost:8787/ws

Then send:

{"type": "subscribe", "runId": "your-run-id-here"}

5. GET /ws-client - Web Client

A web interface for easily testing the WebSocket.

Usage:

  1. Open in your browser: http://localhost:8787/ws-client
  2. Type your legal question
  3. Click "Send Question"
  4. Watch the real-time updates! 🎉

⭐ Testing mode: on connect, the web client automatically subscribes to ALL runs (you don't need to specify a runId). This is convenient for testing: you will see updates for every question that runs.

🔌 WebSocket: Real-Time Updates

Instead of repeatedly polling /status/:runId, you can subscribe via WebSocket and receive updates automatically:

Advantages:

  • ✅ No polling: updates arrive instantly
  • ✅ Lower latency
  • ✅ Fewer requests to the server
  • ✅ Smoother experience

Flow:

1. POST /question → get a runId
2. Connect a WebSocket to /ws
3. Send {"type": "subscribe", "runId": "..."}
4. Receive updates automatically:
   - status: "running", step: "embedding_complete"
   - status: "running", step: "documents_retrieved"
   - status: "completed", result: "..."
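The client side of this flow can be sketched as a function that subscribes once the socket opens and resolves when a `completed` or `failed` update arrives for the run. It assumes a minimal socket interface modeled on the browser `WebSocket` (`send`, `close`, `onopen`, `onmessage`) so it can be exercised with a test double, and it must be attached before the connection opens; the function name is hypothetical.

```typescript
// Minimal interface modeled on the browser WebSocket; not the full DOM type.
interface ClientSocket {
  send(data: string): void;
  close(): void;
  onopen: (() => void) | null;
  onmessage: ((event: { data: string }) => void) | null;
}

interface RunUpdate {
  status: "running" | "completed" | "failed";
  step?: string;
  result?: string;
  totalTime?: number;
}

// Subscribes to one run and resolves with its final update; onProgress sees intermediate steps.
function followRun(socket: ClientSocket, runId: string, onProgress?: (u: RunUpdate) => void): Promise<RunUpdate> {
  return new Promise((resolve) => {
    socket.onopen = () => socket.send(JSON.stringify({ type: "subscribe", runId }));
    socket.onmessage = (event) => {
      let msg: { type?: string; runId?: string; update?: RunUpdate };
      try {
        msg = JSON.parse(event.data);
      } catch {
        return; // ignore frames that are not JSON
      }
      // Skip "connected"/"subscribed" frames and updates for other runs.
      if (msg.type !== "run_update" || msg.runId !== runId || !msg.update) return;
      if (msg.update.status === "running") {
        onProgress?.(msg.update);
      } else {
        socket.close();
        resolve(msg.update);
      }
    };
  });
}
```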

Message types:

// Connection
{ type: "connected", clientId: "...", timestamp: ... }

// Subscription confirmed
{ type: "subscribed", runId: "...", timestamp: ... }

// Run update
{ 
  type: "run_update",
  runId: "...",
  update: {
    status: "running" | "completed" | "failed",
    step: "started" | "embedding_complete" | "documents_retrieved",
    result?: "...",  // only when completed
    totalTime?: 2847  // milliseconds
  }
}
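Written as a TypeScript discriminated union, the message types above let a client narrow on `type` and reject malformed frames at runtime. The type names, `parseServerMessage`, and `describeMessage` are illustrative; the field lists follow the shapes documented above.

```typescript
type RunStatus = "running" | "completed" | "failed";
type RunStep = "started" | "embedding_complete" | "documents_retrieved";

type ServerMessage =
  | { type: "connected"; clientId: string; timestamp: number }
  | { type: "subscribed"; runId: string; timestamp: number }
  | {
      type: "run_update";
      runId: string;
      update: { status: RunStatus; step?: RunStep; result?: string; totalTime?: number };
    };

const STATUSES: readonly string[] = ["running", "completed", "failed"];

// Returns a typed message, or null if the frame doesn't match any documented shape.
function parseServerMessage(raw: string): ServerMessage | null {
  let m: any;
  try {
    m = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof m !== "object" || m === null) return null;
  switch (m.type) {
    case "connected":
      return typeof m.clientId === "string" && typeof m.timestamp === "number" ? m : null;
    case "subscribed":
      return typeof m.runId === "string" && typeof m.timestamp === "number" ? m : null;
    case "run_update":
      return typeof m.runId === "string" && typeof m.update === "object" && m.update !== null &&
        STATUSES.includes(m.update.status) ? m : null;
    default:
      return null;
  }
}

// Exhaustive handling: a new union member without a case here is a compile-time error.
function describeMessage(msg: ServerMessage): string {
  switch (msg.type) {
    case "connected":
      return `connected as ${msg.clientId}`;
    case "subscribed":
      return `subscribed to ${msg.runId}`;
    case "run_update":
      return `${msg.runId}: ${msg.update.status}${msg.update.step ? ` (${msg.update.step})` : ""}`;
    default: {
      const unreachable: never = msg;
      return unreachable;
    }
  }
}
```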

How to Load Documents (Seed)

There are 2 options:

  1. Using the seed-documents.ts script
  2. Using the /seed endpoint

About

RAG LLM agent with embeddings using Cloudflare
