Local, single-user photo-tagging and face-management tool. Scans a folder, tags every image with RAM++ (~4500 open-vocabulary labels), embeds with CLIP for semantic search, clusters with UMAP + HDBSCAN to surface the corpus's natural taxonomy, optionally detects + clusters faces via RetinaFace + ArcFace, and exposes everything through a CLI plus a single-page FastAPI UI.
Runs offline. No telemetry. No cloud. Your photos and embeddings never leave your machine.
New here? Start with GETTING_STARTED.md: a TL;DR install + daily-commands + UI shortcuts walkthrough.
Existing photo managers (digiKam, PhotoPrism, Apple/Google Photos) ask you to invent your own categories before they can help. RAM++ + CLIP flip that around: tag everything with an open vocabulary, then let the clustering surface the categories the corpus actually contains. You see "x-ray scans, vacation landscapes, screenshots, kids portraits" without having defined any of those buckets.
Faces are a parallel track: detect → embed → cluster → name → browse.
Strictly opt-in (`--i-understand` on first run) and strictly local: the
embeddings never leave the machine. Wipeable in one command.
- v1 (Discovery loop): shipped (scan, embed, cluster, HTML report).
- v1.5 (Search & maintenance): shipped (`query`, `list`, `stats`, `export`, `prune`, `doctor`, `backup`, EXIF + reverse-geocoding).
- v2 (Productivity): faces shipped (detect, cluster, validate, identity merge, hard-negative cannot-link, sticky corrections, triage queue, edge gallery, vectorized auto-attach). XMP sidecars + user categories shipped: `phototag xmp write/clean`, `phototag category add/rm/list/map/unmap`, plus an in-app categories editor that drives `lr:HierarchicalSubject` on save.

Live status table: specs/16-improvement-plan.md.
- Recursive walk; xxhash64 + mtime gating so re-runs are no-ops on unchanged files.
- RAM++ batched inference (GPU or CPU); per-tag scores stored so thresholds can be re-applied without re-running.
- EXIF extraction (date, camera, exposure, GPS) into a JSON field.
- Reverse-geocode GPS into city/country tags via offline cities-1000.
- CLIP image embeddings cached in SQLite; reused for clustering and semantic search.
- UMAP → HDBSCAN; cluster labels via TF-IDF on RAM tags.
- Static HTML report with per-cluster thumbnails.
- Re-cluster anytime; runs are versioned.
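
A minimal sketch of how hash + mtime gating can make a re-run a no-op, assuming the mtime is checked first and the content hash only when the mtime moved. The function names and the in-memory `seen` dict are hypothetical stand-ins for the SQLite store, and `hashlib.blake2b` is swapped in for xxhash64 so the example runs on the standard library alone.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash file contents in chunks (blake2b here; the tool itself uses xxhash64)."""
    h = hashlib.blake2b(digest_size=8)
    with path.open("rb") as fh:
        while chunk := fh.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()


def needs_processing(path: Path, seen: dict[str, tuple[int, str]]) -> bool:
    """Return True when the file is new or its content changed since the last run.

    `seen` maps path -> (mtime_ns, digest); it is a hypothetical stand-in
    for whatever the real store persists between runs.
    """
    key = str(path)
    mtime_ns = path.stat().st_mtime_ns
    previous = seen.get(key)
    if previous is not None and previous[0] == mtime_ns:
        return False  # cheap path: mtime unchanged, skip hashing entirely
    digest = file_digest(path)
    changed = previous is None or previous[1] != digest
    seen[key] = (mtime_ns, digest)  # remember the latest observation
    return changed
```

With this ordering, an unchanged library costs one `stat` call per file, and a file that was merely touched is hashed once but not re-tagged.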
- `phototag query "TEXT"`: CLIP semantic search over cached embeddings.
- `phototag list --tag X --tag Y`: AND across tags with score floor.
- `phototag stats --kind label`: corpus tag distribution; geo facts separated from model labels.
- `phototag export --format csv|json`: bulk dump.
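
To illustrate what "AND across tags with score floor" can mean in SQL terms, here is a small sketch using the standard `sqlite3` module. The `image_tags(image_path, tag, score)` table and the function name are assumptions made for the example, not the project's actual schema.

```python
import sqlite3


def images_with_all_tags(
    conn: sqlite3.Connection, tags: list[str], min_score: float = 0.5
) -> list[str]:
    """Return image paths carrying every requested tag at or above min_score."""
    unique = sorted(set(tags))
    if not unique:
        return []
    placeholders = ",".join("?" for _ in unique)
    # Hypothetical schema: image_tags(image_path TEXT, tag TEXT, score REAL).
    sql = f"""
        SELECT image_path
        FROM image_tags
        WHERE tag IN ({placeholders}) AND score >= ?
        GROUP BY image_path
        HAVING COUNT(DISTINCT tag) = ?
        ORDER BY image_path
    """
    rows = conn.execute(sql, (*unique, min_score, len(unique)))
    return [row[0] for row in rows]
```

Because the per-tag scores are stored, changing `min_score` only changes the query, not the tagging run.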
- RetinaFace detection + ArcFace embedding (InsightFace `buffalo_l`).
- UMAP + HDBSCAN clustering; identity carry-forward via Hungarian assignment + sample-weighted centroid blend (capped so identities can still drift on new evidence).
- Per-face validate / wrong / drop-dups / delete / re-detect.
- Cannot-link from `face_corrections`: once you say "wrong" on a face, the system never re-suggests that identity for that face.
- Triage queue: finite walk through photos with unverified named faces or duplicate-name overlays.
- Fringe view: per identity, the 9 most-uncertain faces (cluster edge), one-click verify or unassign.
- Bulk auto-attach: vectorized cosine match of every orphan face against every identity centroid in one matmul.
- Identity merge: collapse two identity rows into one, blending centroids by sample count.
- Re-cluster orphans only: `phototag faces refine-noise` runs on the noise/orphan pool; named clusters are never touched.
- FastAPI single-page app, same-origin (no CORS friction).
- Lightbox with face overlays (per-cluster colour, ⚠ for duplicate names, `?` for unverified, sim badge for auto-attached, ✓ implicit for validated).
- Sidebar with cluster filter (space-delimited AND tokens, bold matches, X clear), pinned noise / orphan + triage queue entries.
- Keyboard everywhere: `?` opens a help overlay listing every shortcut.
- Optional `APP_API_TOKEN[_FILE]` middleware for non-loopback binds (constant-time compare, hot-rotation via file watch).
- `phototag prune [--apply]`: drop DB rows for files missing on disk.
- `phototag doctor [--fix]`: health check (size mismatches, orphan identities, schema-version drift).
- `phototag backup --out PATH`: atomic SQLite snapshot.
- `phototag faces corrections-compact`: dedup the audit log per `face_id`.
- `phototag faces clear-noise-labels`: recovery for a historic bug where naming the noise cluster mass-tagged its members.
- `phototag xmp write PATH [--apply] [--include-people]`: write `<image>.xmp` sidecars (Dublin Core `dc:Subject`) so digiKam, Lightroom, darktable, Capture One can read the tags. Idempotent. Requires the `exiftool` system binary (`apt install libimage-exiftool-perl` / `brew install exiftool`). Example: `uv run phototag xmp write ./data/pictures --apply --include-people`.
- `phototag xmp clean PATH [--apply]`: remove every sidecar under PATH (default dry-run).
- `phototag category add/rm/list/map/unmap`: manage user categories that drive `lr:HierarchicalSubject` (the keyword-tree field digiKam / Lightroom show as a folder hierarchy). Bind tags or face-clusters to a category; on the next `phototag xmp write --apply` each photo gets hierarchical entries shaped `category|subject` for every applicable rule. Cluster rules win over tag rules. The same surface is editable in-app via the sidebar categories view. See `specs/08-xmp-categories.md`.
- `phototag info IMAGE_PATH`: inspect tags + metadata for one image (DB row + EXIF) without launching the UI.
- `phototag exif-backfill [--limit N]`: extract EXIF from disk into `images.exif_json` for legacy DB rows that pre-date EXIF capture.
- `phototag geo-tag [--limit N]`: reverse-geocode EXIF GPS into `city` + `country` tags (`model_name=geo_v1`). Offline via `reverse_geocoder` (`[geo]` extra).
- `phototag rename CLUSTER_ID [LABEL]`: set or clear `label_user` on one cluster (omit `LABEL` to clear).
- `phototag rename-bulk JSON_PATH`: apply `{cluster_id: label_user}` from a JSON file in one transaction.
- `phototag version`: print the installed version.

git clone <repo> && cd image-classifier
uv sync --extra all # core + RAM++ + CLIP + clustering + faces + UI
uv run pre-commit install # contributors only

Heavy ML deps live behind extras ([ram], [clip], [cluster],
[face], [heic], [raw], [exif], [geo], [ui], [report]).
Core install (uv sync without extras) works without any model.
Weights:
- RAM++ (~5 GB): download `ram_plus_swin_large_14m.pth` from the recognize-anything upstream into `$XDG_CACHE_HOME/phototag/models/` (default `~/.cache/phototag/models/`; override with `APP_MODELS_DIR`).
- InsightFace buffalo_l (~200 MB): auto-downloads on first `phototag faces detect`.
- open_clip ViT-B/32: auto-downloads on first `phototag embed`.

See GETTING_STARTED.md for the full TL;DR. The
30-second version:
ln -s /path/to/your/photos data/pictures # point at your library
uv run phototag scan ./data/pictures # tag with RAM++
uv run phototag embed # CLIP embeddings
uv run phototag cluster --min-size 20 # UMAP + HDBSCAN
uv run phototag report --out ./report # static HTML
uv run phototag serve --port 8000 # interactive UI

Faces (opt-in):
uv run phototag faces detect --i-understand # consent gate (one-time)
uv run phototag faces cluster --min-size 3 # group into people
uv run phototag faces auto-attach --persist # bulk-attach orphans
uv run phototag faces stats # see counts

[Scanner] -> [Queue] -+-> [Worker RAM] --+
                      +-> [Worker CLIP] -+-> [Store SQLite] -> [API/CLI/Export]
                                                   |
                                                   +-> [Clusterer UMAP+HDBSCAN]
                                                   +-> [Faces detect/cluster]
                                                   +-> [HTML report]

Single SQLite file (WAL mode, atomic numbered migrations,
thread-local connections + write lock for the FastAPI threadpool).
Models behind Tagger / Embedder / FaceDetector interfaces so
swapping a backend doesn't touch the rest of the pipeline.
Detail: specs/01-architecture.md,
specs/03-data-model.md,
specs/15-faces.md.
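
A minimal sketch of the "thread-local connections + write lock" pattern, assuming one connection per thread and a single process-wide lock around writes. The `Store` class here is illustrative only and does not reproduce the project's `store.py`.

```python
import sqlite3
import threading


class Store:
    """Tiny SQLite wrapper: one connection per thread, serialized writes."""

    def __init__(self, path: str) -> None:
        self._path = path
        self._local = threading.local()
        self._write_lock = threading.Lock()

    def _conn(self) -> sqlite3.Connection:
        conn = getattr(self._local, "conn", None)
        if conn is None:
            # sqlite3 connections are bound to the creating thread by default,
            # so each worker thread lazily opens its own.
            conn = sqlite3.connect(self._path)
            conn.execute("PRAGMA journal_mode=WAL")
            self._local.conn = conn
        return conn

    def read(self, sql: str, params: tuple = ()) -> list[tuple]:
        """Readers do not take the lock; WAL lets them run alongside a writer."""
        return self._conn().execute(sql, params).fetchall()

    def write(self, sql: str, params: tuple = ()) -> None:
        """Hold the lock for the whole transaction so writers never collide."""
        with self._write_lock:
            conn = self._conn()
            with conn:  # commits on success, rolls back on exception
                conn.execute(sql, params)
```

SQLite allows only one writer at a time, so serializing writes in the application avoids `database is locked` errors under a threadpool, while WAL keeps reads unblocked.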
This tool processes biometric data when face features are enabled. Hard rules baked in:
- Opt-in: `phototag scan` never triggers detection. Face commands require `--i-understand` on first run.
- Local only: every embedding stays in the SQLite file under `data/`. No network calls during inference.
- Wipeable: `phototag faces purge --yes` drops every face row, cluster, identity, and audit-log entry. `--keep-identities` keeps the names but drops the embeddings.
- Don't process other people's libraries without their consent.

GPS data is extracted into images.exif_json; if you share the DB,
GPS leaks. Sanitize before exporting.
Full statement: specs/15-faces.md §"Privacy &
ethics".
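
As a rough illustration of sanitizing before sharing, the sketch below drops GPS fields from one EXIF JSON blob. It assumes GPS keys start with `GPS` (as standard EXIF tag names such as `GPSLatitude` do); the actual key names inside `images.exif_json` are not documented here, so treat the prefix test and the function name as hypothetical.

```python
import json


def strip_gps(exif_json: str) -> str:
    """Return the EXIF JSON with every key that starts with 'GPS' removed."""

    def scrub(node):
        if isinstance(node, dict):
            return {
                key: scrub(value)
                for key, value in node.items()
                if not key.upper().startswith("GPS")  # assumed naming convention
            }
        if isinstance(node, list):
            return [scrub(item) for item in node]
        return node

    return json.dumps(scrub(json.loads(exif_json)), sort_keys=True)
```

Note that `city` and `country` tags produced by `phototag geo-tag` are stored as tags rather than EXIF, so they would need separate handling before a DB is shared.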
- Python 3.14 (uv-managed; modern type hints throughout).
- SQLite (WAL, JSON1, single-file portable).
- FastAPI + uvicorn for the UI; same-origin SPA, no bundler.
- PyTorch + Transformers for RAM++; open_clip for CLIP; InsightFace + onnxruntime for faces.
- UMAP + HDBSCAN for clustering; scipy for Hungarian assignment.
- structlog with TTY detection; typer for the CLI; pydantic-settings for config; xxhash for content hashing.
- pytest for testing (83+ tests; CLI / Store / API / face helpers covered; heavy ML paths exercised by slow integration marker).
- ruff + mypy --strict + pre-commit for the contributor loop.
phototag/
cli.py typer entry point — every command
pipeline.py scan + tag + embed orchestration (batched)
scanner.py recursive walk + xxhash + mtime
store.py SQLite wrapper (migrations, thread-local conns,
write lock, all queries)
exif.py EXIF extraction + sanitization
geo.py offline reverse-geocoding (cities-1000)
clustering.py UMAP + HDBSCAN + TF-IDF cluster naming
reporting.py static HTML report (Jinja2)
faces.py face detect, cluster, identity match, sticky
corrections, attach, refine-noise, auto-attach
ui.py FastAPI app + every endpoint
models/
base.py Tagger / Embedder Protocols
ram.py RAM++ wrapper
clip.py open_clip wrapper
logging.py
config.py
settings.py APP_* env-var bound via pydantic-settings
static/ ui.css + ui.js (esbuild bundle of static/src/*.js)
static/src/ ESM modules — state, api, lightbox, sidebar,
workspace, keyboard, runs, main
templates/ ui.html, cluster.html.j2, index.html.j2
specs/ design + roadmap + improvement plan
tests/ pytest suites (CLI / Store / API / faces / EXIF)
data/ gitignored: DB, model weights, caches, backups
make lint # pre-commit on all files
uv run pytest # fast tests (default)
uv run pytest -m slow # tests requiring downloaded models
make test-cov # term-missing + html + xml
make js-build # bundle static/src/*.js -> static/ui.js
make js-watch # same, in watch mode

The frontend lives in static/src/ as ESM modules and is bundled to a
single ES2020 IIFE at static/ui.js by esbuild. After editing anything
under static/src/, run make js-build (one-time npm install first to
fetch esbuild). The bundle output is committed, so contributors who don't
touch JS never need Node — node_modules/ is gitignored.
Project conventions: CLAUDE.md (overrides apply
project-wide; honored by both human and AI contributors).
Pre-commit hooks: ruff (lint + format), mypy strict, pyupgrade
--py314-plus, detect-secrets, trailing-whitespace,
end-of-file-fixer, check-yaml/toml.
.env overrides (all APP_ prefix, parsed by pydantic-settings):
| var | default | what |
|---|---|---|
| `APP_LOG_LEVEL` | `INFO` | structlog level |
| `APP_JSON_LOGS` | auto | force json/console; auto = TTY detect |
| `APP_DB_PATH` | `phototag.db` | SQLite file location |
| `APP_MODELS_DIR` | `$XDG_CACHE_HOME/phototag/models` | weights cache (per-user, outside library bundle) |
| `APP_DEVICE` | `auto` | auto / cpu / cuda |
| `APP_API_TOKEN` | (unset) | shared secret for the UI; empty disables auth |
| `APP_API_TOKEN_FILE` | (unset) | file path to a secret; re-read per request (hot rotation) |

- `00-overview.md`: goal, non-goals, version split
- `01-architecture.md`: components, data flow
- `02-stack.md`: tech choices, rationale
- `03-data-model.md`: SQLite schema (v1–v11)
- `04-pipeline-tagging.md`: scan/tag/persist
- `05-clustering.md`: UMAP + HDBSCAN + naming
- `06-search.md`: semantic search
- `07-reporting.md`: HTML report
- `08-xmp-categories.md`: XMP sidecars + user categories (shipped: `phototag xmp` + `phototag category` CLIs, in-app rule editor)
- `09-cli.md`: CLI surface
- `10-project-structure.md`: repo layout
- `11-roadmap.md`: milestones, status
- `12-performance.md`: perf targets
- `13-risks.md`: risks + mitigations
- `14-testing.md`: test strategy
- `15-faces.md`: face detection / recognition / clustering (full design)
- `16-improvement-plan.md`: forward backlog with status

Apache-2.0. See pyproject.toml.