snapsift

Sift the near-duplicate snaps Apple's built-in Duplicates detector misses — specifically the "manual burst" sequences where someone held the shutter and got 10+ near-identical shots.

Built for a real Photos library of 120K+ photos where Apple's Duplicates album was already empty, yet thousands of sub-second-apart shots remained.

There are two ways to use snapsift:

The macOS app (app/) — a native SwiftUI window. Scan, review each cluster side-by-side, and delete the extras. Recommended for most people.
The Python tools (repo root) — the original, hackable engine and CLI. Zero-dependency, reads the library directly. Great for scripting and tinkering.

The macOS app

A native, on-device SwiftUI app built on Apple's own frameworks — nothing ever leaves your Mac.

PhotoKit for enumeration, thumbnails (fetched from iCloud on demand, so it works even with "Optimize Mac Storage"), and deletion straight into Recently Deleted (recoverable 30 days) — no AppleScript.
Apple quality ranking — reads your library's own aesthetic scores so the keeper is the genuinely better frame, not just the biggest file.
Face-aware keeper (Vision) — re-picks the frame where people's eyes are open and everyone's in shot.
Cross-time look-alikes (Vision feature prints) — finds the same photo saved on different days, not just time bursts.
Favorites are never deleted; videos are off by default.

Build it (no Xcode needed):

cd app
./scripts/build-app.sh          # → ~/Desktop/snapsift.app, then double-click
swift run SnapsiftTests         # run the Core test suite

How it works (the engine)

Five small tools. The core (scan/pick/delete) has zero dependencies beyond Python 3 and macOS; the two optional passes use Pillow.

| Step | Tool | What it does |
| --- | --- | --- |
| 1 | `scan.py` | Reads `Photos.sqlite` directly (read-only, immutable). Walks every non-trashed photo (videos skipped by default) in date order and clusters them by `(width, height)` + sub-3s time gap + ±10% file size, capped at a 30s total span. Carries each frame's favorite flag and Apple's own quality scores. Emits `groups.json`. |
| 2 | `pick.py` | For each cluster, picks one keeper: favorites are never deleted, then Apple's quality score, then UTI priority (HEIC > JPG > PNG), then larger file. Emits `plan.json` and `delete-uuids.txt`. |
| 3 | `delete.applescript` | Reads `delete-uuids.txt` and tells `Photos.app` to delete the marked items in batches of 100. They go to "Recently Deleted" → recoverable for 30 days. |
| L3 | `hash.py` (opt.) | Cross-time near-duplicates: dHashes each photo's thumbnail and groups the matches via a BK-tree, so the same shot saved on different days collapses together. Emits a `groups.json`-shaped file that feeds straight back into `pick.py`. Needs `pip install "Pillow>=9"`. |
| UI | `review.py` (opt.) | A local web page to eyeball every cluster before deleting: keeper highlighted, click to re-pick, ★ favorites locked, then Export the reviewed delete list. Reads any `groups.json` (burst or perceptual). stdlib server; Pillow only sharpens the thumbnails. |
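
The keeper ordering in step 2 can be read as a single sort key. The sketch below is an illustration of that ordering, not `pick.py` itself: the `Frame` record, its field names, the `UTI_RANK` table, and the assumption that a higher quality score means a better frame are all made up for the example.

```python
from dataclasses import dataclass


# Hypothetical record; the field names are illustrative, not pick.py's schema.
@dataclass(frozen=True)
class Frame:
    uuid: str
    favorite: bool
    quality: float  # Apple's quality score; "higher is better" is assumed here
    uti: str        # uniform type identifier, e.g. "public.heic"
    size: int       # file size in bytes


# HEIC > JPG > PNG, as in the table above; unknown types rank last.
UTI_RANK = {"public.heic": 3, "public.jpeg": 2, "public.png": 1}


def keeper_key(frame):
    """Sort key: favorite first, then quality, then UTI priority, then size."""
    return (frame.favorite, frame.quality, UTI_RANK.get(frame.uti, 0), frame.size)


def plan_cluster(frames):
    """Return (keeper, deletable). Favorites never appear in deletable."""
    keeper = max(frames, key=keeper_key)
    deletable = [f for f in frames if f is not keeper and not f.favorite]
    return keeper, deletable
```

Because favorites are filtered out of `deletable` separately from the keeper choice, a cluster made up entirely of favorites yields an empty delete list.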

Why it works

Apple's Duplicates feature is conservative: it only flags photos with very similar perceptual hashes and matching metadata. Manual sequences ("I held the shutter for two seconds and got 15 frames") are intentional captures from Apple's point of view, so the algorithm leaves them all.

To the user, though, those 15 frames are duplicates, and only the best one is worth keeping. We detect them with the one signal that is both fast and nearly perfect: photos taken within seconds of each other on the same camera, with the same dimensions and a similar file size, are almost always near-duplicates.
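
As a rough illustration of that signal, here is a minimal clustering sketch. It is not `scan.py`: the `Shot` record and its fields are hypothetical, and comparing each frame against the previous one (rather than, say, the first frame of the cluster) is an assumption. The defaults mirror the flags quoted in this README (`--gap-sec 3`, `--size-tolerance 0.10`, `--max-span 30`).

```python
from dataclasses import dataclass


# Hypothetical record; scan.py's real row shape may differ.
@dataclass(frozen=True)
class Shot:
    uuid: str
    taken: float  # capture time in seconds (any epoch)
    width: int
    height: int
    size: int     # file size in bytes


def cluster(shots, gap_sec=3.0, size_tolerance=0.10, max_span=30.0):
    """Group date-ordered shots; return only clusters with 2+ frames."""
    clusters, current = [], []
    for shot in sorted(shots, key=lambda s: s.taken):
        if current:
            prev, first = current[-1], current[0]
            same_dims = (shot.width, shot.height) == (prev.width, prev.height)
            close = shot.taken - prev.taken < gap_sec
            similar = abs(shot.size - prev.size) <= size_tolerance * prev.size
            in_span = shot.taken - first.taken <= max_span
            if same_dims and close and similar and in_span:
                current.append(shot)
                continue
            # The chain is broken: keep the finished cluster if it has extras.
            if len(current) > 1:
                clusters.append(current)
        current = [shot]
    if len(current) > 1:
        clusters.append(current)
    return clusters
```

The `in_span` check is what stops a long, slow sequence from chaining indefinitely: once a cluster covers `max_span` seconds, the next frame starts a new one.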

Real-world hit rate on a 120K-photo library:

--gap-sec 3 --size-tolerance 0.10 (default): 4,142 clusters, 6,608 deletable, ≈19 GB recovered. Near-zero false positives in spot checks.
--gap-sec 5: more aggressive, ~38K candidates.
--gap-sec 10: most aggressive, ~46K candidates, with some false positives (people legitimately took multiple shots at an event).

Safety

Photos.sqlite is opened with ?mode=ro&immutable=1, so we never touch Apple's data file even while Photos.app is running.
Favorites are never deleted. A favorited frame always survives — and if a whole cluster is favorited, nothing in it is deleted.
Videos are skipped by default (two short clips shot back-to-back are rarely true duplicates). Opt in with scan.py --include-video.
Runaway clusters are capped by --max-span (default 30s) so a slow drift of near-identical frames can't silently chain across an unrelated session.
Deletion goes via Photos' own AppleScript bridge, so items land in "Recently Deleted" — fully recoverable for 30 days.
iCloud sync handles the rest: deleting on the Mac also clears the duplicates from iCloud and from every other device.
Run on a small --max-groups 10 plan first to validate.
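
For readers who want to poke at the database themselves, the read-only open described in the first point above might look like the following. This is a minimal sketch using Python's standard `sqlite3` module; `open_readonly` is a hypothetical helper name, not a function from this repo.

```python
import sqlite3
from pathlib import Path


def open_readonly(db_path):
    """Open a SQLite file read-only and immutable via a URI filename."""
    # as_uri() percent-encodes spaces, which matters for paths such as
    # "Photos Library.photoslibrary".
    uri = Path(db_path).expanduser().resolve().as_uri() + "?mode=ro&immutable=1"
    return sqlite3.connect(uri, uri=True)
```

Any write attempted through such a connection raises `sqlite3.OperationalError`, so a bug in a query cannot modify the library.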

Usage

# 1. Scan
python3 scan.py \
    --library ~/Pictures/Photos\ Library.photoslibrary \
    --output groups.json

# 2. Plan — start with 10 groups to validate
python3 pick.py --input groups.json --output plan.json \
    --uuid-out delete-uuids.txt --max-groups 10

# 3. Open Photos.app, then delete
osascript delete.applescript "$(pwd)/delete-uuids.txt"

# Validate: open Photos.app → "Recently Deleted" → confirm
# Then re-run without --max-groups and apply.

Optional: review visually before deleting

python3 review.py --groups groups.json     # opens http://127.0.0.1:8765
# click to re-pick keepers, then "Export" → writes delete-uuids.txt

Optional: L3 cross-time perceptual pass

pip install "Pillow>=9"
python3 hash.py --output hash-groups.json --max-distance 2
python3 review.py --groups hash-groups.json --uuid-out hash-delete-uuids.txt
osascript delete.applescript "$(pwd)/hash-delete-uuids.txt"
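
To give a feel for what `--max-distance 2` means, here is a minimal sketch of a difference hash (dHash) and the Hamming distance between two hashes. It is not `hash.py`: the function names are hypothetical, the input is a plain list of grayscale rows so the example needs no Pillow, and it compares hashes directly instead of using a BK-tree. With Pillow, a grid like this could be obtained from something like `Image.open(path).convert("L").resize((9, 8))`.

```python
def dhash_bits(gray, hash_size=8):
    """dHash of a grayscale grid with hash_size rows and hash_size + 1 columns.

    Each bit records whether a pixel is brighter than its right-hand neighbour.
    """
    bits = 0
    for row in gray[:hash_size]:
        for x in range(hash_size):
            bits = (bits << 1) | int(row[x] > row[x + 1])
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")
```

Two thumbnails would count as a match when `hamming(h1, h2) <= 2`, that is, when at most 2 of the 64 brightness comparisons disagree.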

Development

pip install pytest "Pillow>=9"
pytest                 # pure-logic tests — no Photos library needed

The clustering, keeper, hashing and grouping logic are all pure functions with unit tests; only the thin SQLite/thumbnail IO layer touches a real library.

Schema gotchas (for hackers)

ZASSET.ZDATECREATED is Cocoa epoch (seconds since 2001-01-01 UTC). Add 978307200 to get Unix epoch.
ZASSET.ZAVALANCHEUUID flags iOS-native burst groups — but on the test library this only accounts for 121 groups / 1,141 photos, ~10× less than what time-clustering finds.
ZADDITIONALASSETATTRIBUTES.ZORIGINALSTABLEHASH is Apple's own content hash. Exact matches are rare (the test library had 130) because most "duplicates" are visually identical but byte-different.
Apple already tracks ZDUPLICATEMETADATAMATCHINGALBUM and ZDUPLICATEPERCEPTUALMATCHINGALBUM. They're cleared after the user resolves Duplicates; check before relying on them.
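
The Cocoa-epoch conversion from the first gotcha, as a small worked example. The helper name `cocoa_to_datetime` is hypothetical; only the 978307200 offset comes from the note above.

```python
from datetime import datetime, timezone

# Seconds between the Unix epoch (1970-01-01) and the Cocoa epoch (2001-01-01), UTC.
COCOA_EPOCH_OFFSET = 978307200


def cocoa_to_datetime(cocoa_seconds):
    """Convert a ZDATECREATED-style value to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(cocoa_seconds + COCOA_EPOCH_OFFSET, tz=timezone.utc)
```

A value of `0` maps to midnight UTC on 2001-01-01, which is a quick sanity check when reading raw rows.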

Roadmap

L3: perceptual hashes over derivatives/ thumbnails to catch near-duplicates across time (different days, same photo) — hash.py.
Web review UI: clusters side-by-side, override the picker's choice — review.py.
Smarter keeper: weighted by ZCOMPUTEDASSETATTRIBUTES sharpness / framing / timing scores Apple already computes — pick.py.
Package as a single snapsift console entry point.
Face-aware keeper: prefer the frame where everyone's eyes are open.

License

MIT.

