PixelRAG

Visual Retrieval-Augmented Generation — a framework for building visual search systems from any document type.

PixelRAG renders documents (web pages, PDFs, images) as screenshots, embeds them with a vision-language model, builds FAISS indexes, and serves a search API. Wikipedia's 8.28M articles are the primary benchmark, but the system is general-purpose.
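To make the retrieval step concrete, here is a toy sketch of what searching the index amounts to: ranking tile vectors by similarity to a query vector. It uses exhaustive inner-product search over unit-normalized vectors in plain Python, as a stand-in for FAISS and the embedding model. Whether PixelRAG normalizes embeddings, and which FAISS index type it builds, is not stated here, so treat both as assumptions; `normalize` and `top_k` are illustrative names.

```python
import heapq
import math


def normalize(vector):
    """Scale a vector to unit length so inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def top_k(query, tile_vectors, k=5):
    """Return the k (tile_id, score) pairs with the highest inner product.

    tile_vectors maps a tile id to its embedding. Every vector is scanned,
    which is only practical for small collections.
    """
    q = normalize(query)
    scored = (
        (tile_id, sum(a * b for a, b in zip(q, normalize(vec))))
        for tile_id, vec in tile_vectors.items()
    )
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])
```

At Wikipedia scale a scan like this is too slow, which is why the real pipeline builds FAISS indexes instead.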

Architecture

Five packages, each independently installable:

Package	What it does	Install
pixelrag-render	Document → image tiles (Playwright CDP, PDF)	`uv sync --package pixelrag-render`
pixelrag-embed	Tiles → vectors → FAISS index (three independent tools)	`uv sync --package pixelrag-embed`
pixelrag-index	Orchestrates the full pipeline: source → ingest → embed → index	`uv sync --package pixelrag-index`
pixelrag-serve	FAISS search API (FastAPI, CPU or GPU)	`uv sync --package pixelrag-serve`
pixelrag-train	LoRA/DoRA fine-tuning for Qwen3-VL-Embedding	`uv sync --package pixelrag-train`

render ←── index ──→ embed       serve (independent)       train → serve (HTTP)

Quick Start

Search pre-built Wikipedia index

uv sync --package pixelrag-serve

# Download a pre-built index
aws s3 sync s3://wiki-screenshot-tiles-backup/kiwix_tiles/text_search_index_1024/ ./index/

# Start the API
pixelrag-serve --index-dir ./index --port 30001

# Query
curl -X POST http://localhost:30001/search \
  -H "Content-Type: application/json" \
  -d '{"queries": [{"text": "What is the capital of France?"}], "n_docs": 5}'
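
The same query can be sent from Python. This sketch uses only the standard library and mirrors the curl call above. The helper names `build_search_payload` and `search` are made up for illustration, and since the response schema is not documented here, the result is returned as parsed JSON without assuming any fields.

```python
import json
import urllib.request


def build_search_payload(texts, n_docs=5):
    """Build the JSON body used by the curl example: one entry per query text."""
    return {"queries": [{"text": t} for t in texts], "n_docs": n_docs}


def search(texts, n_docs=5, base_url="http://localhost:30001", timeout=30):
    """POST the queries to /search and return the decoded JSON response.

    The response shape is not specified in this README, so it is returned as-is.
    """
    body = json.dumps(build_search_payload(texts, n_docs)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With pixelrag-serve running on port 30001, `search(["What is the capital of France?"])` sends the same request as the curl command.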

Build an index from local documents

uv sync --package pixelrag-index

# Create pixelrag.yaml
cat > pixelrag.yaml << 'EOF'
source:
  type: local
  path: ./my_docs

embed:
  model: Qwen/Qwen3-VL-Embedding-2B
  device: cuda
  gpu_ids: [0]

output: ./my_index
EOF

# Build
pixelrag-index build

# Serve
pixelrag-serve --index-dir ./my_index --port 30001
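
If you build several indexes, it can help to generate pixelrag.yaml from a script instead of a heredoc. The sketch below writes exactly the keys shown above and nothing else; `render_config` and `write_config` are hypothetical helpers, not part of PixelRAG. Passing more than one id in `gpu_ids` assumes the config accepts a multi-element list, and paths containing YAML special characters would need quoting that this sketch does not do.

```python
from pathlib import Path


def render_config(docs_path, output_dir, model="Qwen/Qwen3-VL-Embedding-2B",
                  device="cuda", gpu_ids=(0,)):
    """Return pixelrag.yaml text using only the keys from the example above."""
    gpu_list = ", ".join(str(i) for i in gpu_ids)
    return (
        "source:\n"
        "  type: local\n"
        f"  path: {docs_path}\n"
        "\n"
        "embed:\n"
        f"  model: {model}\n"
        f"  device: {device}\n"
        f"  gpu_ids: [{gpu_list}]\n"
        "\n"
        f"output: {output_dir}\n"
    )


def write_config(config_path, docs_path, output_dir, **kwargs):
    """Write the config file and return its path."""
    path = Path(config_path)
    path.write_text(render_config(docs_path, output_dir, **kwargs), encoding="utf-8")
    return path
```

For example, `write_config("pixelrag.yaml", "./my_docs", "./my_index")` reproduces the file created by the heredoc, after which `pixelrag-index build` can be run as before.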

Render a single URL (agent use)

from pixelrag_render import render_url

tiles = render_url("https://en.wikipedia.org/wiki/Python", "./tiles")
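
The return value of `render_url` is not documented here, so an agent that wants to inspect the result can fall back to listing the output directory. The sketch below does that with `pathlib`. The set of image extensions is an assumption about what pixelrag-render writes, and `list_tiles` is a hypothetical helper.

```python
from pathlib import Path

# Assumed tile formats; adjust to whatever pixelrag-render actually writes.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}


def list_tiles(tiles_dir):
    """Return image files found anywhere under tiles_dir, sorted by path."""
    root = Path(tiles_dir)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
```

After the `render_url(...)` call above, `list_tiles("./tiles")` gives the files to hand to a vision model or to the embed tools.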

Claude Code plugin — give Claude eyes

Setup (one-time):

./plugin/setup.sh

Then copy-paste any of these:

# "What does Hacker News look like right now?"
claude --plugin-dir ./plugin -p "screenshot https://news.ycombinator.com and summarize the top stories"

# "Read a research paper visually"
claude --plugin-dir ./plugin -p "screenshot https://arxiv.org/abs/2404.12387 and explain the key findings"

# "Check if my site looks right"
claude --plugin-dir ./plugin -p "screenshot http://localhost:3000 and tell me if anything looks broken"

Or start an interactive session and use the slash command:

claude --plugin-dir ./plugin
# then type: /screenshot https://example.com

No MCP server, no backend required — the plugin teaches Claude to call pixelrag-render directly via Bash and read the resulting tile images.

Embed tools (standalone)

Each tool works independently without the orchestrator:

pixelrag-chunk --tiles-dir ./tiles
pixelrag-embed --shard-dir ./tiles --output-dir ./embeddings --gpu-ids 0,1
pixelrag-build-index --embeddings-dir ./embeddings --output-dir ./index
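
To run the three tools from a script, a thin `subprocess` wrapper is enough. This sketch uses only the commands and flags shown above; the function names and the stop-on-first-failure behavior are choices made for this example, not something the tools require.

```python
import subprocess


def standalone_commands(tiles_dir, embeddings_dir, index_dir, gpu_ids=(0, 1)):
    """Return the three commands shown above, in order, as argument lists."""
    gpus = ",".join(str(i) for i in gpu_ids)
    return [
        ["pixelrag-chunk", "--tiles-dir", tiles_dir],
        ["pixelrag-embed", "--shard-dir", tiles_dir,
         "--output-dir", embeddings_dir, "--gpu-ids", gpus],
        ["pixelrag-build-index", "--embeddings-dir", embeddings_dir,
         "--output-dir", index_dir],
    ]


def run_standalone(tiles_dir, embeddings_dir, index_dir, gpu_ids=(0, 1)):
    """Run chunk, embed, and build-index in sequence, stopping at the first failure."""
    for command in standalone_commands(tiles_dir, embeddings_dir, index_dir, gpu_ids):
        # check=True raises CalledProcessError when a tool exits non-zero.
        subprocess.run(command, check=True)
```

`run_standalone("./tiles", "./embeddings", "./index")` is equivalent to running the three commands by hand, provided the pixelrag-embed package is installed and its tools are on PATH.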

License

Apache-2.0
