v1.0.5 — Two-phase captioning (caption only viable photos)
Your idea, made even smarter
You suggested: in "All" mode, do WD14 → Florence → JoyCaption in order.
That's exactly the new pipeline — but with a key improvement: the heavy
captions run last AND only on photos worth keeping.
The problem with the old behavior
In "All" mode, every image went face → CLIP → WD14 → Florence → JoyCaption
before the pipeline moved on to the next image. JoyCaption (~30-120 s/image)
ran on every photo, including the ones about to be rejected (blurry, wrong
person, duplicates).
New two-phase pipeline
- Phase 1: fast analysis on all images (face detection, CLIP, quality,
WD14 tags, AI detection, artifacts) → computes the viability verdict.
- Phase 2: Florence-2 / JoyCaption run only on viable / borderline
photos. Rejects are skipped entirely.
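The two phases can be sketched roughly as follows. This is a minimal illustration, not the app's actual code: `ImageResult`, `run_two_phase`, `KEEP_VERDICTS`, and the stub analyzer and captioner are all hypothetical names, and the verdict strings are assumed from the wording above.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List

# Assumed verdict labels; the real app may name them differently.
KEEP_VERDICTS = {"viable", "borderline"}


@dataclass
class ImageResult:
    path: str
    verdict: str
    captions: Dict[str, str] = field(default_factory=dict)


def run_two_phase(
    paths: Iterable[str],
    analyze: Callable[[str], str],
    captioners: Dict[str, Callable[[str], str]],
) -> List[ImageResult]:
    # Phase 1: run the fast analysis on every image to get a verdict.
    results = [ImageResult(path=p, verdict=analyze(p)) for p in paths]

    # Phase 2: run the slow captioners only on images worth keeping.
    for result in results:
        if result.verdict not in KEEP_VERDICTS:
            continue  # rejects never reach the slow step
        for name, caption in captioners.items():
            result.captions[name] = caption(result.path)
    return results


# Stub data standing in for the real analysis and caption models.
demo_verdicts = {"a.jpg": "viable", "b.jpg": "reject", "c.jpg": "borderline"}
demo_calls: List[str] = []


def demo_caption(path: str) -> str:
    demo_calls.append(path)  # record which images hit the slow step
    return f"caption for {path}"


demo_results = run_two_phase(
    list(demo_verdicts),
    demo_verdicts.__getitem__,
    {"florence": demo_caption, "joycaption": demo_caption},
)
```

In this sketch `b.jpg` is analyzed in phase 1 but never passed to either captioner, which is the behavior described above.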
Impact
On a 200-photo dataset with ~80 viable, JoyCaption now runs on 80 images
instead of 200 — roughly 60% less time on the slow step.
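As a rough check of that figure, the arithmetic can be worked through with the ~30-120 s/image range quoted earlier. The function name `slow_step_savings` and its parameters are hypothetical, and the result is an estimate for the JoyCaption step only, not a measured benchmark.

```python
from typing import Tuple


def slow_step_savings(
    total: int, viable: int, sec_low: int = 30, sec_high: int = 120
) -> Tuple[int, int, int, int]:
    """Return (skipped images, percent skipped, seconds saved low, seconds saved high)."""
    if total <= 0 or not 0 <= viable <= total:
        raise ValueError("expected 0 <= viable <= total and total > 0")
    skipped = total - viable
    percent = round(100 * skipped / total)
    return skipped, percent, skipped * sec_low, skipped * sec_high


skipped, percent, low_s, high_s = slow_step_savings(200, 80)
print(f"{skipped} images skipped ({percent}%), "
      f"about {low_s // 60}-{high_s // 60} minutes saved")
```

For the 200-photo example this gives 120 skipped images, 60%, and somewhere between 60 and 240 minutes of JoyCaption time avoided under the assumed per-image range.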
Bonus:
- Phase 2 has its own progress bar + live preview, so you see the fast
analysis results (and the dataset verdict) before the slow captioning.
- The cache stores phase-2 captions, so if you later keep more photos, only
the newly-kept ones get captioned.
- Per-target scores correctly reflect the captions.
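The caching behavior can be pictured with a small sketch. It assumes a simple in-memory dictionary keyed by image path; the app's real cache format and keys are not described here, and `caption_missing` is a hypothetical helper.

```python
from typing import Callable, Dict, Iterable, List


def caption_missing(
    kept: Iterable[str],
    cache: Dict[str, str],
    captioner: Callable[[str], str],
) -> List[str]:
    """Caption only kept images that have no cached caption yet.

    Returns the list of paths that were newly captioned.
    """
    newly_captioned = []
    for path in kept:
        if path in cache:
            continue  # already captioned in an earlier run
        cache[path] = captioner(path)
        newly_captioned.append(path)
    return newly_captioned


# First run keeps two photos; a later run keeps one more.
caption_cache: Dict[str, str] = {}
first_run = caption_missing(["a.jpg", "c.jpg"], caption_cache, str.upper)
second_run = caption_missing(["a.jpg", "b.jpg", "c.jpg"], caption_cache, str.upper)
```

Here `str.upper` stands in for a slow caption model. The second call only processes `b.jpg`, since the other two already have cached captions.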
Updating
- Git:
git pull
- In-app: ⚙ Config → 🔄 Check now → ⬇ Install update