A unified image search system using CLIP embeddings stored in Supabase Postgres (pgvector) with Cloudflare R2 storage.
This project consists of two main components in a single codebase:
- Syncer - Uploads images to R2 and generates embeddings via Replicate's CLIP API
- Server - Web UI for searching images by text using vector similarity
Additionally, there's a separate mlx-local directory for local Apple Silicon inference (optional).
image-browser/
├── src/
│ ├── shared/ # Shared utilities (db, r2, replicate)
│ ├── syncer/ # Upload and embedding logic
│ └── server/ # Web server and UI
├── public/ # Static assets (favicon)
├── mlx-local/ # Optional: Local MLX inference (Python)
├── package.json # Unified dependencies
├── Dockerfile # Production deployment
└── mise.toml # Task automation
- Bucket-based table naming: Each R2 bucket automatically gets its own database table (bucket_name_embeddings), enabling multiple deployments in the same database
- No prefix complexity: Files are stored directly in bucket root
- Shared configuration: Same .env works for both syncer and server (can deploy separately)
- Concurrent operations: Configurable workers for uploads and embedding generation
- Automatic retry logic: Handles Replicate API rate limits and database connection issues
- Production ready: Docker support with pnpm and multi-stage builds
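The "Concurrent operations" feature above can be pictured as a bounded worker pool. The sketch below is only an illustration of that idea, not the project's actual code: `mapWithConcurrency` is a hypothetical helper name, and the `concurrency` argument stands in for a setting such as UPLOAD_CONCURRENCY or CONCURRENCY.

```typescript
// Hypothetical helper (not taken from the project source): run `worker` over
// `items` with at most `concurrency` calls in flight, preserving result order.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  concurrency: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0; // index of the next item that has not been claimed yet

  // Each runner repeatedly claims the next unclaimed index until none remain.
  async function runWorker(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const workerCount = Math.max(1, Math.min(concurrency, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => runWorker()));
  return results;
}
```

Because JavaScript runs these runners on a single thread, claiming an index with `next++` needs no lock. If any worker call rejects, the whole call rejects, so callers that want to continue past failures would need to catch errors inside `worker`.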
- Node.js 20+ (via mise or nvm)
- pnpm (will be auto-enabled via packageManager field)
- Supabase account (for PostgreSQL with pgvector)
- Cloudflare R2 account
- Replicate API token
- Install dependencies:
pnpm install
- Copy env.example to .env and configure:
cp env.example .env
- Required environment variables:
# Database
SUPABASE_DB_URL=postgresql://...
# R2 Storage
R2_ACCOUNT_ID=your-account-id
R2_ACCESS_KEY_ID=your-key
R2_SECRET_ACCESS_KEY=your-secret
R2_BUCKET=my-images
IMAGE_BASE_URL=https://your-r2-public-url.com
# Replicate
REPLICATE_API_TOKEN=your-token
- Ensure database schema:
pnpm run ensure-schema
This will create a table named {sanitized_bucket_name}_embeddings with a 768-dimensional vector column.
Syncer operations:
# Upload images to R2 and create DB rows
pnpm run upload
# Fast upload (skip R2 HEAD checks)
SKIP_R2_HEAD=true pnpm run upload
# Generate embeddings for missing images
pnpm run embed
# Full sync: upload + embed loop
pnpm run sync
Server operations:
# Development server (with auto-reload)
pnpm run server:dev
# Production build
pnpm run build
# Start production server
pnpm start
# Syncer
mise run upload
mise run fast-upload
mise run embed
mise run sync
# Server
mise run server
# MLX local (Apple Silicon)
mise run run-mlx-local
- Upload (src/syncer/upload.ts):
  - Scans local images/ directory
  - Uploads new images to R2 bucket
  - Creates database rows with null embeddings
- Embed (src/syncer/embed.ts):
  - Fetches images with null embeddings
  - Generates 768-d CLIP embeddings via Replicate
  - Updates database with embeddings
  - Handles retries for API rate limits (429, 5xx)
- Sync (src/syncer/sync.ts):
  - Orchestrates upload + embed loop
  - Runs until all images have embeddings
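The retry handling mentioned for the embed step (429 and 5xx responses) might look roughly like the following. This is a minimal sketch under stated assumptions: the helper names, the exponential schedule, the 500 ms base delay, and the convention that failures carry a numeric `status` property are all hypothetical and not taken from src/syncer/embed.ts.

```typescript
// Assumption: failed API calls surface as thrown values with a numeric `status`.
function statusOf(err: unknown): number | undefined {
  if (typeof err === "object" && err !== null && "status" in err) {
    const status = (err as { status: unknown }).status;
    return typeof status === "number" ? status : undefined;
  }
  return undefined;
}

// Rate limits (429) and server errors (5xx) are treated as transient.
function isRetryable(status: number | undefined): boolean {
  return status === 429 || (status !== undefined && status >= 500 && status <= 599);
}

// Exponential backoff: base, 2x base, 4x base, ... capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 30_000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Call `fn` up to `maxAttempts` times, sleeping between retryable failures.
// `sleep` is injectable so the schedule can be tested without real waiting.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 5,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const lastAttempt = attempt + 1 >= maxAttempts;
      if (!isRetryable(statusOf(err)) || lastAttempt) throw err;
      await sleep(backoffDelayMs(attempt));
    }
  }
}
```

Non-retryable errors (for example a 400) are rethrown immediately, so a bad request does not burn through the retry budget.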
- Search: Converts text query to embedding via Replicate, finds nearest neighbors in database
- Browse: Lists recent images with embeddings
- Neighbors: Finds visually similar images using image embeddings
- Stats: Shows encoding progress (total/encoded/pending)
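To make the Search and Neighbors behaviour more concrete, here is a rough sketch of how a nearest-neighbour query against a pgvector column could be assembled. The column names (`id`, `filename`, `embedding`), the function names, and the choice of cosine distance (pgvector's `<=>` operator) are assumptions for illustration; the project's real schema and distance metric may differ.

```typescript
// Serialize a numeric array into pgvector's text form, e.g. "[0.1,0.2,0.3]".
function toVectorLiteral(embedding: readonly number[]): string {
  if (embedding.length === 0 || embedding.some((x) => !Number.isFinite(x))) {
    throw new Error("embedding must be a non-empty array of finite numbers");
  }
  return `[${embedding.join(",")}]`;
}

interface SqlQuery {
  text: string;
  values: unknown[];
}

// Build a parameterized nearest-neighbour query (hypothetical column names).
function buildSearchQuery(table: string, embedding: readonly number[], limit = 20): SqlQuery {
  // Table names cannot be bound as parameters, so validate before interpolating.
  if (!/^[a-z_][a-z0-9_]*$/.test(table)) {
    throw new Error(`unsafe table name: ${table}`);
  }
  return {
    text:
      `SELECT id, filename, embedding <=> $1::vector AS distance ` +
      `FROM ${table} WHERE embedding IS NOT NULL ` +
      `ORDER BY embedding <=> $1::vector LIMIT $2`,
    values: [toVectorLiteral(embedding), limit],
  };
}
```

The returned `{ text, values }` object follows the query-config shape accepted by node-postgres, so it should be usable as `client.query(buildSearchQuery(table, queryEmbedding))`. For Search the embedding would come from the text query, and for Neighbors from the stored embedding of the selected image.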
The table name is automatically derived from R2_BUCKET:
- my-images → my_images_embeddings
- photos → photos_embeddings
- vacation-2024 → vacation_2024_embeddings
This allows multiple independent collections in the same database.
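The mapping from bucket name to table name can be sketched as below. The exact sanitization rules are not spelled out here, so this version is an assumption that merely reproduces the examples above (lowercase, with runs of non-alphanumeric characters collapsed to an underscore); `tableNameForBucket` is a hypothetical name.

```typescript
// Hypothetical helper: derive "<sanitized_bucket>_embeddings" from a bucket name.
// The sanitization rules are an assumption chosen to match the documented examples.
function tableNameForBucket(bucket: string): string {
  const sanitized = bucket
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9_]+/g, "_") // any run of other characters becomes "_"
    .replace(/^_+|_+$/g, ""); // drop leading/trailing underscores
  if (sanitized === "") {
    throw new Error("bucket name is empty after sanitization");
  }
  return `${sanitized}_embeddings`;
}

console.log(tableNameForBucket("vacation-2024")); // vacation_2024_embeddings
```

Restricting the result to lowercase letters, digits, and underscores also keeps the derived name safe to interpolate into SQL, where identifiers cannot be passed as bound parameters.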
# Upload concurrency (default: 8)
UPLOAD_CONCURRENCY=16
# Skip R2 existence checks for faster uploads
SKIP_R2_HEAD=true
# Embedding concurrency (default: 3)
CONCURRENCY=5
# Batch size for embedding (default: 100)
EMBED_LIMIT=200
# Database pool settings
PG_MAX=10
PG_IDLE=30000
# Override default CLIP model
REPLICATE_TEXT_MODEL=your-text-model:version
REPLICATE_IMAGE_MODEL=your-image-model:version
# Adjust input keys if using different models
REPLICATE_TEXT_INPUT_KEY=prompt
REPLICATE_IMAGE_INPUT_KEY=url
# Match model output dimension
EXPECTED_VECTOR_DIM=512
Build and run:
docker build -t image-browser .
docker run -p 3000:3000 --env-file .env image-browser
The syncer and server can be deployed on different machines. They share the same .env configuration but run independently.
Syncer (e.g., local machine or cron job):
pnpm run sync
Server (e.g., Fly.io, Railway, Render):
pnpm start
Both connect to the same database and R2 bucket via shared .env.
For local inference on Apple Silicon without Replicate costs:
cd mlx-local
mise run install
mise run web
The MLX version also uses bucket-based table naming via db_utils.py.
If you have existing data in an image_embeddings table:
- Manually rename the table to match your bucket:
ALTER TABLE image_embeddings RENAME TO my_bucket_embeddings;
- Or export/import data to the new table name
- Update any references in mlx-local if using it
"R2_BUCKET is not set"
- Ensure .env contains R2_BUCKET=your-bucket-name
"Embedding dimension mismatch"
- Check that EXPECTED_VECTOR_DIM matches your model output (default: 768)
- Ensure database table was created with correct dimension
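For context, a dimension check of this kind could be as simple as the sketch below. Both function names are hypothetical, and the real syncer may validate differently; the only details taken from this document are the EXPECTED_VECTOR_DIM setting, its default of 768, and the error wording.

```typescript
// Hypothetical helper: parse EXPECTED_VECTOR_DIM, falling back to the
// documented default of 768 when the variable is unset or blank.
function parseExpectedDim(raw: string | undefined, fallback = 768): number {
  if (raw === undefined || raw.trim() === "") return fallback;
  const dim = Number(raw);
  if (!Number.isInteger(dim) || dim <= 0) {
    throw new Error(`EXPECTED_VECTOR_DIM must be a positive integer, got "${raw}"`);
  }
  return dim;
}

// Hypothetical helper: reject an embedding whose length does not match the
// dimension the table's vector column was created with.
function assertEmbeddingDim(embedding: readonly number[], expectedDim: number): void {
  if (embedding.length !== expectedDim) {
    throw new Error(
      `Embedding dimension mismatch: expected ${expectedDim}, got ${embedding.length}`,
    );
  }
}
```

Failing before the database write gives a clearer message than a pgvector insert error. Note that changing EXPECTED_VECTOR_DIM alone does not alter an existing table; the vector column has to be created with the same dimension.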
Rate limit errors (429)
- The syncer automatically retries with backoff
- Reduce CONCURRENCY to slow down requests
Database connection errors
- Check SUPABASE_DB_URL is correct
- Ensure pgvector extension is installed
- Verify network access to database
Watch mode:
pnpm run server:dev
Type checking:
pnpm exec tsc --noEmit
Clean start:
rm -rf node_modules dist
pnpm install
pnpm run build
Private project.
Built by thefocus.ai