Sift the near-duplicate snaps Apple's built-in Duplicates detector misses — specifically the "manual burst" sequences where someone held the shutter and got 10+ near-identical shots.
Built for a real Photos library of 120K+ photos where Apple's Duplicates
album was already empty, yet thousands of sub-second-apart shots remained.
There are two ways to use snapsift:
- The macOS app (`app/`): a native SwiftUI window. Scan, review each cluster side-by-side, and delete the extras. Recommended for most people.
- The Python tools (repo root): the original, hackable engine and CLI. Zero-dependency, reads the library directly. Great for scripting and tinkering.
A native, on-device SwiftUI app built on Apple's own frameworks — nothing ever leaves your Mac.
- PhotoKit for enumeration, thumbnails (fetched from iCloud on demand, so it works even with "Optimize Mac Storage"), and deletion straight into Recently Deleted (recoverable 30 days) — no AppleScript.
- Apple quality ranking — reads your library's own aesthetic scores so the keeper is the genuinely better frame, not just the biggest file.
- Face-aware keeper (Vision) — re-picks the frame where people's eyes are open and everyone's in shot.
- Cross-time look-alikes (Vision feature prints) — finds the same photo saved on different days, not just time bursts.
- Favorites are never deleted; videos are off by default.

Build it (no Xcode needed):

    cd app
    ./scripts/build-app.sh    # → ~/Desktop/snapsift.app, then double-click
    swift run SnapsiftTests   # run the Core test suite

The Python side is five small tools. The core (scan/pick/delete) has zero dependencies beyond Python 3 and macOS; the two optional passes use Pillow.

| Step | Tool | What it does |
|---|---|---|
| 1 | `scan.py` | Reads `Photos.sqlite` directly (read-only, immutable). Walks every non-trashed photo (videos skipped by default) in date order and clusters them by (width, height) + sub-3s time gap + ±10% file size, capped at a 30s total span. Carries each frame's favorite flag and Apple's own quality scores. Emits `groups.json`. |
| 2 | `pick.py` | For each cluster, picks one keeper: favorites are never deleted, then Apple's quality score, then UTI priority (HEIC > JPG > PNG), then larger file. Emits `plan.json` and `delete-uuids.txt`. |
| 3 | `delete.applescript` | Reads `delete-uuids.txt` and tells Photos.app to delete the marked items in batches of 100. They go to "Recently Deleted" → recoverable for 30 days. |
| L3 | `hash.py` (opt.) | Cross-time near-duplicates: dHashes each photo's thumbnail and groups the matches via a BK-tree, so the same shot saved on different days collapses together. Emits a `groups.json`-shaped file that feeds straight back into `pick.py`. Needs `pip install "Pillow>=9"`. |
| UI | `review.py` (opt.) | A local web page to eyeball every cluster before deleting: keeper highlighted, click to re-pick, ★ favorites locked, then Export the reviewed delete list. Reads any `groups.json` (burst or perceptual). stdlib server; Pillow only sharpens the thumbnails. |
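
The keeper rule in step 2 reads naturally as a lexicographic sort key. The sketch below is an illustration of that ordering, not the code in `pick.py`: the `Frame` fields, the `UTI_PRIORITY` table, the function names, and the assumption that a higher quality score is better are all made up for the example.

```python
from dataclasses import dataclass

# Hypothetical priority table: lower rank wins. Unknown types rank last.
UTI_PRIORITY = {"public.heic": 0, "public.jpeg": 1, "public.png": 2}


@dataclass(frozen=True)
class Frame:
    uuid: str
    favorite: bool
    quality: float    # assumed: higher means a better frame
    uti: str
    size_bytes: int


def keeper_key(frame):
    """Sort key under which the best frame sorts first."""
    return (
        not frame.favorite,                               # favorites first
        -frame.quality,                                   # then higher quality
        UTI_PRIORITY.get(frame.uti, len(UTI_PRIORITY)),   # then HEIC > JPG > PNG
        -frame.size_bytes,                                # then larger file
    )


def plan_cluster(frames):
    """Return (keeper, deletable); favorites never appear in deletable."""
    ranked = sorted(frames, key=keeper_key)
    keeper = ranked[0]
    deletable = [f for f in ranked[1:] if not f.favorite]
    return keeper, deletable
```

Filtering favorites out of `deletable` after ranking is what makes a fully favorited cluster produce an empty delete list.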
Apple's Duplicates feature is conservative: it only flags photos with very similar perceptual hashes and matching metadata. Manual sequences ("I held the shutter for two seconds and got 15 frames") are intentional captures from Apple's point of view, so the algorithm leaves them all.
But for users, those 15 frames are duplicates — the user just wants the best one. We detect them by relying on the only signal that's both fast and nearly perfect: photos taken within seconds of each other, same camera, same dimensions, similar file size, are nearly always near-duplicates.
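
As a rough illustration of that signal, here is a minimal clustering sketch. It is not the code in `scan.py`: the `Shot` type and function names are hypothetical, and two details are assumptions the text does not pin down, namely that file size is compared against the previous frame and that "sub-3s" means a strict less-than.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shot:
    uuid: str
    taken_at: float   # seconds since any fixed epoch
    width: int
    height: int
    size_bytes: int


def _continues(current, shot, gap_sec, size_tolerance, max_span):
    """True if `shot` extends the cluster being built."""
    first, prev = current[0], current[-1]
    same_dims = (shot.width, shot.height) == (prev.width, prev.height)
    close_in_time = shot.taken_at - prev.taken_at < gap_sec
    within_span = shot.taken_at - first.taken_at <= max_span
    similar_size = (
        abs(shot.size_bytes - prev.size_bytes) <= size_tolerance * prev.size_bytes
    )
    return same_dims and close_in_time and within_span and similar_size


def cluster_bursts(shots, gap_sec=3.0, size_tolerance=0.10, max_span=30.0):
    """Group shots into clusters of two or more frames, walking in date order."""
    clusters, current = [], []
    for shot in sorted(shots, key=lambda s: s.taken_at):
        if current and _continues(current, shot, gap_sec, size_tolerance, max_span):
            current.append(shot)
            continue
        if len(current) > 1:      # singletons are not duplicates of anything
            clusters.append(current)
        current = [shot]
    if len(current) > 1:
        clusters.append(current)
    return clusters
```

The span check is measured from the first frame of the cluster, so a long run of evenly spaced shots is split into several clusters instead of chaining indefinitely.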
Real-world hit rate on a 120K-photo library:
- `--gap-sec 3 --size-tolerance 0.10` (default): 4,142 clusters, 6,608 deletable, ≈19 GB recovered. Near-zero false positives in spot checks.
- `--gap-sec 5`: more aggressive, ~38K candidates.
- `--gap-sec 10`: aggressive, ~46K candidates, with some false positives (people legitimately took multiple shots at an event).

Safety:

- `Photos.sqlite` is opened with `?mode=ro&immutable=1`, so we never touch Apple's data file even while Photos.app is running.
- Favorites are never deleted. A favorited frame always survives, and if a whole cluster is favorited, nothing in it is deleted.
- Videos are skipped by default (two short clips shot back-to-back are rarely true duplicates). Opt in with `scan.py --include-video`.
- Runaway clusters are capped by `--max-span` (default 30s) so a slow drift of near-identical frames can't silently chain across an unrelated session.
- Deletion goes via Photos' own AppleScript bridge, so items land in "Recently Deleted", fully recoverable for 30 days.
- iCloud sync handles the rest: deleting on the Mac also clears the duplicates from iCloud and from every other device.
- Run on a small `--max-groups 10` plan first to validate.
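
For reference, the read-only, immutable open in the first point can be done with the standard library's `sqlite3` module and a SQLite URI filename. This is a minimal sketch; the helper name `open_readonly` is hypothetical and not necessarily how `scan.py` is structured.

```python
import sqlite3
from pathlib import Path


def open_readonly(db_path):
    """Open a SQLite file read-only and immutable via a URI filename."""
    # as_uri() percent-encodes spaces, which matters for paths such as
    # "Photos Library.photoslibrary".
    uri = Path(db_path).resolve().as_uri() + "?mode=ro&immutable=1"
    return sqlite3.connect(uri, uri=True)
```

Any write attempted through the returned connection raises `sqlite3.OperationalError`, so a bug in the scanning code cannot modify the library database.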

Usage:

    # 1. Scan
    python3 scan.py \
      --library ~/Pictures/Photos\ Library.photoslibrary \
      --output groups.json

    # 2. Plan: start with 10 groups to validate
    python3 pick.py --input groups.json --output plan.json \
      --uuid-out delete-uuids.txt --max-groups 10

    # 3. Open Photos.app, then delete
    osascript delete.applescript "$(pwd)/delete-uuids.txt"

    # Validate: open Photos.app → "Recently Deleted" → confirm
    # Then re-run without --max-groups and apply.

Review before deleting (optional):

    python3 review.py --groups groups.json   # opens http://127.0.0.1:8765
    # click to re-pick keepers, then "Export" → writes delete-uuids.txt

Cross-time near-duplicates (optional):

    pip install "Pillow>=9"
    python3 hash.py --output hash-groups.json --max-distance 2
    python3 review.py --groups hash-groups.json --uuid-out hash-delete-uuids.txt
    osascript delete.applescript "$(pwd)/hash-delete-uuids.txt"

Run the tests:

    pip install pytest "Pillow>=9"
    pytest   # pure-logic tests, no Photos library needed

The clustering, keeper, hashing and grouping logic are all pure functions with unit tests; only the thin SQLite/thumbnail IO layer touches a real library.
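
To show what a pure hashing function of this kind can look like, here is a small dHash sketch with no image library involved. It assumes the thumbnail has already been reduced to a grid of grayscale values (8 rows of 9 pixels gives a 64-bit hash); the function names are hypothetical and this is not the code in `hash.py`.

```python
def dhash_bits(gray_rows):
    """Difference hash of a grid of grayscale values.

    Each adjacent pixel pair in a row contributes one bit: 1 if the left
    pixel is brighter than its right neighbour, else 0.
    """
    value = 0
    for row in gray_rows:
        for left, right in zip(row, row[1:]):
            value = (value << 1) | int(left > right)
    return value


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")
```

Two thumbnails would then count as a match when `hamming(h1, h2)` is at or below the chosen threshold, which is the role `--max-distance 2` plays in the commands above.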

Notes on `Photos.sqlite`:

- `ZASSET.ZDATECREATED` is Cocoa epoch (seconds since 2001-01-01 UTC). Add 978307200 to get Unix epoch.
- `ZASSET.ZAVALANCHEUUID` flags iOS-native burst groups, but on the test library this only accounts for 121 groups / 1,141 photos, ~10× less than what time-clustering finds.
- `ZADDITIONALASSETATTRIBUTES.ZORIGINALSTABLEHASH` is Apple's own content hash. Exact matches are rare (the test library had 130) because most "duplicates" are visually identical but byte-different.
- Apple already tracks `ZDUPLICATEMETADATAMATCHINGALBUM` and `ZDUPLICATEPERCEPTUALMATCHINGALBUM`. They're cleared after the user resolves Duplicates; check before relying on them.
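
The Cocoa-epoch conversion from the first note, as a short sketch. The constant and function names are hypothetical.

```python
from datetime import datetime, timezone

# Seconds between the Unix epoch (1970-01-01) and the Cocoa epoch (2001-01-01), UTC.
COCOA_EPOCH_OFFSET = 978307200


def cocoa_to_datetime(zdatecreated):
    """Convert a ZDATECREATED value to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(zdatecreated + COCOA_EPOCH_OFFSET, tz=timezone.utc)
```

A value of `0` maps to midnight UTC on 2001-01-01, which is a quick sanity check when reading the column by hand.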

- L3: perceptual hashes over `derivatives/` thumbnails to catch near-duplicates across time (different days, same photo): `hash.py`.
- Web review UI: clusters side-by-side, override the picker's choice: `review.py`.
- Smarter keeper: weighted by `ZCOMPUTEDASSETATTRIBUTES` sharpness / framing / timing scores Apple already computes: `pick.py`.
- Package as a single `snapsift` console entry point.
- Face-aware keeper: prefer the frame where everyone's eyes are open.

MIT.