Visual Retrieval-Augmented Generation — a framework for building visual search systems from any document type.
PixelRAG renders documents (web pages, PDFs, images) as screenshots, embeds them with a vision-language model, builds FAISS indexes, and serves a search API. Wikipedia's 8.28M articles are the primary benchmark, but the system is general-purpose.
Five packages, each independently installable:
| Package | What it does | Install |
|---|---|---|
| pixelrag-render | Document → image tiles (Playwright CDP, PDF) | uv sync --package pixelrag-render |
| pixelrag-embed | Tiles → vectors → FAISS index (three independent tools) | uv sync --package pixelrag-embed |
| pixelrag-index | Orchestrates the full pipeline: source → ingest → embed → index | uv sync --package pixelrag-index |
| pixelrag-serve | FAISS search API (FastAPI, CPU or GPU) | uv sync --package pixelrag-serve |
| pixelrag-train | LoRA/DoRA fine-tuning for Qwen3-VL-Embedding | uv sync --package pixelrag-train |
render ←── index ──→ embed
serve (independent)
train → serve (HTTP)
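To make the embed → index → search flow concrete, here is a minimal, dependency-free sketch of what a similarity index does at query time. It is not PixelRAG code and does not use FAISS: it is a brute-force stand-in that scores every stored vector against the query. The use of cosine similarity over normalized vectors, the toy 3-dimensional vectors, and the names `normalize`, `search`, and `tile_a` through `tile_c` are all assumptions made for illustration; only the `n_docs` parameter name is borrowed from the search request shown in this README.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def search(index, query, n_docs=5):
    """Brute-force top-k search: score every stored vector against the query.

    `index` maps a (hypothetical) tile id to a unit-length embedding.
    Returns up to `n_docs` (tile_id, score) pairs, best match first.
    """
    q = normalize(query)
    scored = [
        (sum(a * b for a, b in zip(q, vec)), tile_id)
        for tile_id, vec in index.items()
    ]
    scored.sort(reverse=True)
    return [(tile_id, score) for score, tile_id in scored[:n_docs]]


# Toy "tile embeddings"; real embeddings come from a vision-language model
# and have far more dimensions.
index = {
    "tile_a": normalize([1.0, 0.0, 0.0]),
    "tile_b": normalize([0.0, 1.0, 0.0]),
    "tile_c": normalize([0.7, 0.7, 0.0]),
}

results = search(index, [0.9, 0.1, 0.0], n_docs=2)
for tile_id, score in results:
    print(f"{tile_id}: {score:.3f}")
```

A real index library replaces the linear scan with optimized or approximate search so that queries stay fast across millions of tiles, but the input (a query vector) and the output (ranked ids with scores) have the same shape.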
uv sync --package pixelrag-serve
# Download a pre-built index
aws s3 sync s3://wiki-screenshot-tiles-backup/kiwix_tiles/text_search_index_1024/ ./index/
# Start the API
pixelrag-serve --index-dir ./index --port 30001
# Query
curl -X POST http://localhost:30001/search \
-H "Content-Type: application/json" \
-d '{"queries": [{"text": "What is the capital of France?"}], "n_docs": 5}'

uv sync --package pixelrag-index
# Create pixelrag.yaml
cat > pixelrag.yaml << 'EOF'
source:
  type: local
  path: ./my_docs
embed:
  model: Qwen/Qwen3-VL-Embedding-2B
  device: cuda
  gpu_ids: [0]
output: ./my_index
EOF
# Build
pixelrag-index build
# Serve
pixelrag-serve --index-dir ./my_index --port 30001

from pixelrag_render import render_url
tiles = render_url("https://en.wikipedia.org/wiki/Python", "./tiles")

Setup (one-time):
./plugin/setup.sh

Then copy-paste any of these:
# "What does Hacker News look like right now?"
claude --plugin-dir ./plugin -p "screenshot https://news.ycombinator.com and summarize the top stories"
# "Read a research paper visually"
claude --plugin-dir ./plugin -p "screenshot https://arxiv.org/abs/2404.12387 and explain the key findings"
# "Check if my site looks right"
claude --plugin-dir ./plugin -p "screenshot http://localhost:3000 and tell me if anything looks broken"

Or start an interactive session and use the slash command:
claude --plugin-dir ./plugin
# then type: /screenshot https://example.com

No MCP server, no backend required: the plugin teaches Claude to call pixelrag-render directly via Bash and read the resulting tile images.
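The one-shot prompts above all follow the same pattern, so they are easy to script over a list of URLs. The sketch below only builds the command lines and prints them; it uses the `--plugin-dir` and `-p` flags exactly as shown in the examples. The helper name `build_claude_command` and the `task` wording are hypothetical and not part of the plugin.

```python
import shlex


def build_claude_command(url: str, task: str, plugin_dir: str = "./plugin") -> list[str]:
    """Build the argv for a one-shot run that screenshots `url` and then does `task`."""
    prompt = f"screenshot {url} and {task}"
    return ["claude", "--plugin-dir", plugin_dir, "-p", prompt]


urls = ["https://news.ycombinator.com", "http://localhost:3000"]
commands = [build_claude_command(u, "summarize the page") for u in urls]

# Print shell-safe command lines; pass an argv list to subprocess.run to execute one.
for argv in commands:
    print(shlex.join(argv))
```

Keeping each command as an argument list rather than a single string avoids quoting problems when a URL or prompt contains spaces or shell metacharacters.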
Each tool works independently without the orchestrator:
pixelrag-chunk --tiles-dir ./tiles
pixelrag-embed --shard-dir ./tiles --output-dir ./embeddings --gpu-ids 0,1
pixelrag-build-index --embeddings-dir ./embeddings --output-dir ./index

Apache-2.0