
phototag

Local, single-user photo-tagging and face-management tool. Scans a folder, tags every image with RAM++ (~4500 open-vocabulary labels), embeds with CLIP for semantic search, clusters with UMAP + HDBSCAN to surface the corpus's natural taxonomy, optionally detects + clusters faces via RetinaFace + ArcFace, and exposes everything through a CLI plus a single-page FastAPI UI.

Runs offline. No telemetry. No cloud. Your photos and embeddings never leave your machine.

New here? Start with GETTING_STARTED.md — a TL;DR install + daily-commands + UI shortcuts walkthrough.

Why

Existing photo managers (digiKam, PhotoPrism, Apple/Google Photos) ask you to invent your own categories before they can help. RAM++ + CLIP flip that around: tag everything with an open vocabulary, then let the clustering surface the categories the corpus actually contains. You see "x-ray scans, vacation landscapes, screenshots, kids portraits" without having defined any of those buckets.

Faces are a parallel track: detect → embed → cluster → name → browse. Strictly opt-in (--i-understand on first run) and strictly local: the embeddings never leave the machine, and everything is wipeable in one command.

Status

  • v1 — Discovery loop: shipped (scan, embed, cluster, HTML report).
  • v1.5 — Search & maintenance: shipped (query, list, stats, export, prune, doctor, backup, EXIF + reverse-geocoding).
  • v2 — Productivity: faces shipped (detect, cluster, validate, identity merge, hard-negative cannot-link, sticky corrections, triage queue, edge gallery, vectorized auto-attach).
  • v2 — XMP sidecars & user categories: shipped (phototag xmp write/clean, phototag category add/rm/list/map/unmap, plus an in-app categories editor that drives lr:HierarchicalSubject on save).

Live status table: specs/16-improvement-plan.md.

What's in the box

Scan & tag

  • Recursive walk; xxhash64 + mtime gating so re-runs are no-ops on unchanged files.
  • RAM++ batched inference (GPU or CPU); per-tag scores stored so thresholds can be re-applied without re-running.
  • EXIF extraction (date, camera, exposure, GPS) into a JSON field.
  • Reverse-geocode GPS into city/country tags via offline cities-1000.
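The change gating above can be sketched in a few lines. This is an illustrative stand-in, not the project's scanner: it uses hashlib.blake2b where the real pipeline uses xxhash64, and a plain dict where the real pipeline consults SQLite.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Stand-in for xxhash64 (the real scanner uses the xxhash package)."""
    h = hashlib.blake2b(digest_size=8)
    h.update(path.read_bytes())
    return h.hexdigest()


def needs_rescan(path: Path, seen: dict[str, tuple[float, str]]) -> bool:
    """Cheap mtime check first; hash only when the mtime changed."""
    mtime = path.stat().st_mtime
    prev = seen.get(str(path))
    if prev is not None and prev[0] == mtime:
        return False                       # unchanged: re-run is a no-op
    digest = file_digest(path)
    if prev is not None and prev[1] == digest:
        seen[str(path)] = (mtime, digest)  # touched, but identical content
        return False
    seen[str(path)] = (mtime, digest)
    return True
```

The mtime fast path is what makes re-runs on a large unchanged library effectively free; hashing only happens for files whose timestamp moved.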

Embed, cluster, report

  • CLIP image embeddings cached in SQLite; reused for clustering and semantic search.
  • UMAP → HDBSCAN; cluster labels via TF-IDF on RAM tags.
  • Static HTML report with per-cluster thumbnails.
  • Re-cluster anytime; runs are versioned.
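The TF-IDF naming step can be illustrated in pure Python. This is a sketch, not the project's code — name_clusters and its inputs are hypothetical — but the scoring (term frequency within a cluster times inverse document frequency across clusters) is the standard technique the bullet names:

```python
import math
from collections import Counter


def name_clusters(cluster_tags: dict[int, list[str]], top_k: int = 3) -> dict[int, str]:
    """Label each cluster with its highest-scoring TF-IDF tags."""
    n = len(cluster_tags)
    df: Counter[str] = Counter()           # in how many clusters each tag appears
    for tags in cluster_tags.values():
        df.update(set(tags))
    labels = {}
    for cid, tags in cluster_tags.items():
        tf = Counter(tags)
        total = sum(tf.values())
        scored = sorted(
            tf,
            key=lambda t: (tf[t] / total) * math.log(n / df[t]),
            reverse=True,
        )
        labels[cid] = ", ".join(scored[:top_k])
    return labels
```

Tags shared by every cluster get an IDF of zero, so generic labels ("sky", "photo") never win the cluster name.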

Search

  • phototag query "TEXT" — CLIP semantic search over cached embeddings.
  • phototag list --tag X --tag Y — AND across tags with score floor.
  • phototag stats --kind label — corpus tag distribution; geo facts separated from model labels.
  • phototag export --format csv|json — bulk dump.
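Conceptually, phototag query is a cosine ranking of one text embedding against the cached image-embedding matrix. A minimal sketch with placeholder vectors (the real command first embeds TEXT with CLIP; the function name here is illustrative):

```python
import numpy as np


def semantic_search(query_vec: np.ndarray, image_vecs: np.ndarray,
                    paths: list[str], k: int = 5) -> list[tuple[str, float]]:
    """Rank cached image embeddings against a text embedding by cosine
    similarity. Vectors stand in for CLIP outputs."""
    q = query_vec / np.linalg.norm(query_vec)
    m = image_vecs / np.linalg.norm(image_vecs, axis=1, keepdims=True)
    sims = m @ q                     # one matrix-vector product ranks everything
    order = np.argsort(-sims)[:k]
    return [(paths[i], float(sims[i])) for i in order]
```

Because the embeddings are cached in SQLite, each query costs one matrix-vector product, not a model forward pass per image.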

Faces (opt-in, biometric)

  • RetinaFace detection + ArcFace embedding (InsightFace buffalo_l).
  • UMAP + HDBSCAN clustering; identity carry-forward via Hungarian assignment + sample-weighted centroid blend (sample weight capped so identities can still drift on new evidence).
  • Per-face validate / wrong / drop-dups / delete / re-detect.
  • Cannot-link from face_corrections — once you say "wrong" on a face, the system never re-suggests that identity for that face.
  • Triage queue — finite walk through photos with unverified named faces or duplicate-name overlays.
  • Fringe view — per identity, the 9 most-uncertain faces (cluster edge), one-click verify or unassign.
  • Bulk auto-attach — vectorized cosine match of every orphan face against every identity centroid in one matmul.
  • Identity merge — collapse two identity rows into one, blends centroids by sample count.
  • Re-cluster orphans only — phototag faces refine-noise runs on the noise/orphan pool; named clusters are never touched.
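The bulk auto-attach bullet boils down to one normalized matmul. A hedged sketch — function name, return shape, and the 0.5 threshold are illustrative, not the project's API:

```python
import numpy as np


def auto_attach(orphans: np.ndarray, centroids: np.ndarray,
                threshold: float = 0.5) -> tuple[np.ndarray, np.ndarray]:
    """Match every orphan face against every identity centroid at once."""
    o = orphans / np.linalg.norm(orphans, axis=1, keepdims=True)
    c = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
    sims = o @ c.T                                  # (n_orphans, n_identities) cosine matrix
    best = sims.argmax(axis=1)
    best_sim = sims[np.arange(len(o)), best]
    # attach only above the threshold; -1 means "stay orphan"
    return np.where(best_sim >= threshold, best, -1), best_sim
```

One matmul replaces an O(orphans × identities) Python loop, which is what makes the bulk pass cheap even on large face pools.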

UI

  • FastAPI single-page app, same-origin (no CORS friction).
  • Lightbox with face overlays (per-cluster colour, ⚠ for duplicate names, ? for unverified, sim badge for auto-attached, ✓ implicit for validated).
  • Sidebar with cluster filter (space-delimited AND tokens, bold matches, X clear), pinned noise / orphan + triage queue entries.
  • Keyboard everywhere: ? opens a help overlay listing every shortcut.
  • Optional APP_API_TOKEN[_FILE] middleware for non-loopback binds (constant-time compare, hot-rotation via file watch).
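The token middleware idea — constant-time compare, secret re-read per request so rotating the file takes effect without a restart — can be sketched with the stdlib. check_token is a hypothetical helper, not the project's code:

```python
import hmac
from pathlib import Path


def check_token(presented: str, token_file: Path) -> bool:
    """Compare a presented token against a secret file.

    Re-reading the file per request gives hot rotation; hmac.compare_digest
    gives a constant-time comparison that doesn't leak prefix length."""
    expected = token_file.read_text().strip()
    return hmac.compare_digest(presented.encode(), expected.encode())
```

A plain == comparison short-circuits on the first differing byte, which an attacker on a non-loopback bind could in principle time; compare_digest avoids that.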

Operations

  • phototag prune [--apply] — drop DB rows for files missing on disk.
  • phototag doctor [--fix] — health check (size mismatches, orphan identities, schema-version drift).
  • phototag backup --out PATH — atomic SQLite snapshot.
  • phototag faces corrections-compact — dedup the audit log per face_id.
  • phototag faces clear-noise-labels — recovery for a historic bug where naming the noise cluster mass-tagged its members.
  • phototag xmp write PATH [--apply] [--include-people] — write <image>.xmp sidecars (Dublin Core dc:Subject) so digiKam, Lightroom, darktable, Capture One can read the tags. Idempotent. Requires the exiftool system binary (apt install libimage-exiftool-perl / brew install exiftool). Example: uv run phototag xmp write ./data/pictures --apply --include-people.
  • phototag xmp clean PATH [--apply] — remove every sidecar under PATH (default dry-run).
  • phototag category add/rm/list/map/unmap — manage user categories that drive lr:HierarchicalSubject (the keyword-tree field digiKam / Lightroom show as a folder hierarchy). Bind tags or face-clusters to a category; on the next phototag xmp write --apply each photo gets hierarchical entries of the form category|subject for every applicable rule. Cluster rules win over tag rules. The same surface is editable in-app via the sidebar categories view. See specs/08-xmp-categories.md.
  • phototag info IMAGE_PATH — inspect tags + metadata for one image (DB row + EXIF) without launching the UI.
  • phototag exif-backfill [--limit N] — extract EXIF from disk into images.exif_json for legacy DB rows that pre-date EXIF capture.
  • phototag geo-tag [--limit N] — reverse-geocode EXIF GPS into city + country tags (model_name=geo_v1). Offline via reverse_geocoder ([geo] extra).
  • phototag rename CLUSTER_ID [LABEL] — set or clear label_user on one cluster (omit LABEL to clear).
  • phototag rename-bulk JSON_PATH — apply {cluster_id: label_user} from a JSON file in one transaction.
  • phototag version — print the installed version.
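The category commands above map tags and face clusters to lr:HierarchicalSubject entries. A toy model of the rule application — the rule shapes and function are hypothetical; the real schema lives in specs/08-xmp-categories.md:

```python
def hierarchical_subjects(photo_tags: list[str], photo_clusters: list[str],
                          tag_rules: dict[str, str],
                          cluster_rules: dict[str, tuple[str, str]]) -> list[str]:
    """Compute "category|subject" entries for one photo.

    tag_rules maps a tag to a category; cluster_rules maps a face-cluster id
    to (category, person name). Cluster rules win when both name a subject."""
    out: dict[str, str] = {}
    for tag in photo_tags:
        if tag in tag_rules:
            out[tag] = f"{tag_rules[tag]}|{tag}"
    for cluster in photo_clusters:
        if cluster in cluster_rules:
            cat, subject = cluster_rules[cluster]
            out[subject] = f"{cat}|{subject}"   # overrides any tag rule for the same subject
    return sorted(out.values())
```

digiKam and Lightroom render each "category|subject" string as a two-level keyword tree, which is why the pipe-delimited shape matters.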

Install

git clone <repo> && cd image-classifier
uv sync --extra all                     # core + RAM++ + CLIP + clustering + faces + UI
uv run pre-commit install               # contributors only

Heavy ML deps live behind extras ([ram], [clip], [cluster], [face], [heic], [raw], [exif], [geo], [ui], [report]). Core install (uv sync without extras) works without any model.

Weights:

  • RAM++ (~5 GB) — download ram_plus_swin_large_14m.pth from the recognize-anything upstream into $XDG_CACHE_HOME/phototag/models/ (default ~/.cache/phototag/models/; override with APP_MODELS_DIR).
  • InsightFace buffalo_l (~200 MB) — auto-downloads on first phototag faces detect.
  • open_clip ViT-B/32 — auto-downloads on first phototag embed.

Quick start

See GETTING_STARTED.md for the full TL;DR. The 30-second version:

ln -s /path/to/your/photos data/pictures       # point at your library

uv run phototag scan ./data/pictures               # tag with RAM++
uv run phototag embed                              # CLIP embeddings
uv run phototag cluster --min-size 20              # UMAP + HDBSCAN
uv run phototag report --out ./report              # static HTML
uv run phototag serve --port 8000                  # interactive UI

Faces (opt-in):

uv run phototag faces detect --i-understand        # consent gate (one-time)
uv run phototag faces cluster --min-size 3         # group into people
uv run phototag faces auto-attach --persist        # bulk-attach orphans
uv run phototag faces stats                        # see counts

Architecture

[Scanner] -> [Queue] -+-> [Worker RAM]   --+
                      +-> [Worker CLIP]   -+-> [Store SQLite] -> [API/CLI/Export]
                                                    |
                                                    +-> [Clusterer UMAP+HDBSCAN]
                                                    +-> [Faces detect/cluster]
                                                    +-> [HTML report]

Single SQLite file (WAL mode, atomic numbered migrations, thread-local connections + write lock for the FastAPI threadpool). Models behind Tagger / Embedder / FaceDetector interfaces so swapping a backend doesn't touch the rest of the pipeline.

Detail: specs/01-architecture.md, specs/03-data-model.md, specs/15-faces.md.
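The concurrency model described above — one connection per thread plus a single write lock — can be sketched like this. Store here is a minimal stand-in for the real store.py, with a toy schema:

```python
import sqlite3
import threading


class Store:
    """SQLite connections are not shareable across threads, so each thread
    gets its own via threading.local; a single lock serializes writes from
    the FastAPI threadpool while WAL keeps readers unblocked."""

    def __init__(self, path: str) -> None:
        self.path = path
        self._local = threading.local()
        self._write_lock = threading.Lock()
        with self._conn() as conn:
            conn.execute("PRAGMA journal_mode=WAL")
            conn.execute("CREATE TABLE IF NOT EXISTS images (path TEXT PRIMARY KEY)")

    def _conn(self) -> sqlite3.Connection:
        if not hasattr(self._local, "conn"):
            self._local.conn = sqlite3.connect(self.path)
        return self._local.conn

    def add(self, path: str) -> None:
        with self._write_lock, self._conn() as conn:   # conn as context manager commits
            conn.execute("INSERT OR IGNORE INTO images VALUES (?)", (path,))

    def count(self) -> int:
        return self._conn().execute("SELECT COUNT(*) FROM images").fetchone()[0]
```

WAL mode is what lets reads proceed while a write is in flight; the lock only orders the writers, it never blocks a reader.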

Privacy

This tool processes biometric data when face features are enabled. Hard rules baked in:

  1. Opt-in: phototag scan never triggers detection. Face commands require --i-understand on first run.
  2. Local only: every embedding stays in the SQLite file under data/. No network calls during inference.
  3. Wipeable: phototag faces purge --yes drops every face row, cluster, identity, and audit-log entry. --keep-identities keeps the names but drops the embeddings.
  4. Don't process other people's libraries without their consent.

GPS data is extracted into images.exif_json; if you share the DB, GPS leaks. Sanitize before exporting.
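One way to sanitize before exporting is to strip GPS keys from each exif_json blob. The field names below are illustrative — check them against what your EXIF extraction actually writes:

```python
import json

# Hypothetical GPS field names; adjust to the real exif_json schema.
GPS_KEYS = {"GPSLatitude", "GPSLongitude", "GPSAltitude", "GPSPosition"}


def sanitize_exif(exif_json: str) -> str:
    """Return an images.exif_json blob with location fields removed."""
    exif = json.loads(exif_json)
    return json.dumps({k: v for k, v in exif.items() if k not in GPS_KEYS})
```

Run something like this over every row before handing a DB copy to anyone; camera and exposure metadata survive, coordinates don't.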

Full statement: specs/15-faces.md §"Privacy & ethics".

Tech stack

  • Python 3.14 (uv-managed; modern type hints throughout).
  • SQLite (WAL, JSON1, single-file portable).
  • FastAPI + uvicorn for the UI; same-origin SPA, no bundler.
  • PyTorch + Transformers for RAM++; open_clip for CLIP; InsightFace + onnxruntime for faces.
  • UMAP + HDBSCAN for clustering; scipy for Hungarian assignment.
  • structlog with TTY detection; typer for the CLI; pydantic-settings for config; xxhash for content hashing.
  • pytest for testing (83+ tests covering CLI / Store / API / face helpers; heavy ML paths run under the slow integration marker).
  • ruff + mypy --strict + pre-commit for the contributor loop.

Project layout

phototag/
  cli.py            typer entry point — every command
  pipeline.py       scan + tag + embed orchestration (batched)
  scanner.py        recursive walk + xxhash + mtime
  store.py          SQLite wrapper (migrations, thread-local conns,
                    write lock, all queries)
  exif.py           EXIF extraction + sanitization
  geo.py            offline reverse-geocoding (cities-1000)
  clustering.py     UMAP + HDBSCAN + TF-IDF cluster naming
  reporting.py      static HTML report (Jinja2)
  faces.py          face detect, cluster, identity match, sticky
                    corrections, attach, refine-noise, auto-attach
  ui.py             FastAPI app + every endpoint
  models/
    base.py         Tagger / Embedder Protocols
    ram.py          RAM++ wrapper
    clip.py         open_clip wrapper
  logging.py
  config.py
  settings.py       APP_* env-var bound via pydantic-settings

static/             ui.css + ui.js (esbuild bundle of static/src/*.js)
static/src/         ESM modules — state, api, lightbox, sidebar,
                    workspace, keyboard, runs, main
templates/          ui.html, cluster.html.j2, index.html.j2
specs/              design + roadmap + improvement plan
tests/              pytest suites (CLI / Store / API / faces / EXIF)
data/               gitignored: DB, model weights, caches, backups

Development

make lint                  # pre-commit on all files
uv run pytest              # fast tests (default)
uv run pytest -m slow      # tests requiring downloaded models
make test-cov              # term-missing + html + xml
make js-build              # bundle static/src/*.js -> static/ui.js
make js-watch              # same, in watch mode

The frontend lives in static/src/ as ESM modules and is bundled to a single ES2020 IIFE at static/ui.js by esbuild. After editing anything under static/src/, run make js-build (one-time npm install first to fetch esbuild). The bundle output is committed, so contributors who don't touch JS never need Node — node_modules/ is gitignored.

Project conventions: CLAUDE.md (overrides apply project-wide; honored by both human and AI contributors).

Pre-commit hooks: ruff (lint + format), mypy strict, pyupgrade --py314-plus, detect-secrets, trailing-whitespace, end-of-file-fixer, check-yaml/toml.

Configuration

.env overrides (all APP_ prefix, parsed by pydantic-settings):

var                 default                          what
APP_LOG_LEVEL       INFO                             structlog level
APP_JSON_LOGS       auto                             force json/console; auto = TTY detect
APP_DB_PATH         phototag.db                      SQLite file location
APP_MODELS_DIR      $XDG_CACHE_HOME/phototag/models  weights cache (per-user, outside library bundle)
APP_DEVICE          auto                             auto / cpu / cuda
APP_API_TOKEN       (unset)                          shared secret for the UI; empty disables auth
APP_API_TOKEN_FILE  (unset)                          file path to a secret; re-read per request (hot rotation)
Specs

Design and roadmap documents live under specs/: 01-architecture, 03-data-model, 08-xmp-categories, 15-faces, 16-improvement-plan.

License

Apache-2.0. See pyproject.toml.
