v1.0.5 — Two-phase captioning (caption only viable photos)

akalavol released this 31 May 11:13
· 10 commits to main since this release

Your idea, made even smarter

You suggested: in "All" mode, do WD14 → Florence → JoyCaption in order.
That's exactly the new pipeline — but with a key improvement: the heavy
captions run last AND only on photos worth keeping.

The problem with the old behavior

In "All" mode, every image went through face → CLIP → WD14 → Florence →
JoyCaption before the next image started. JoyCaption (~30-120 s/image) ran on
every photo, including the ones about to be rejected (blurry, wrong person,
duplicates).

New two-phase pipeline

  • Phase 1 — fast analysis on all images: face detection, CLIP, quality,
    WD14 tags, AI detection, artifacts → computes the viability verdict.
  • Phase 2 — Florence-2 / JoyCaption run only on viable / borderline
    photos. Rejects are skipped entirely.
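
The two phases above can be sketched roughly as follows. This is a minimal illustration, not the app's actual code: `Analysis`, `run_two_phase`, and the `analyze` / `caption` callables are hypothetical names, and the verdict strings are assumed from the wording above.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class Analysis:
    """Phase-1 result for one image (hypothetical structure)."""
    path: str
    verdict: str  # assumed values: "viable", "borderline", "reject"
    captions: Dict[str, str] = field(default_factory=dict)


def run_two_phase(
    paths: Iterable[str],
    analyze: Callable[[str], Analysis],
    caption: Callable[[str], Dict[str, str]],
) -> List[Analysis]:
    # Phase 1: run the fast analysis on every image to get a verdict.
    results = [analyze(p) for p in paths]

    # Phase 2: run the slow captioners only on images worth keeping.
    for result in results:
        if result.verdict in ("viable", "borderline"):
            result.captions = caption(result.path)
        # Rejects are skipped entirely and keep an empty captions dict.

    return results
```

Passing the fast and slow steps in as callables keeps the sketch independent of any particular model; in the real app these would be the face/CLIP/WD14 stage and the Florence-2 / JoyCaption stage respectively.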

Impact

On a 200-photo dataset with ~80 viable, JoyCaption now runs on 80 images
instead of 200, which is roughly 60% less time on the slow step.
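
The 60% figure is simple arithmetic, shown here as a small sketch. It assumes every image takes about the same time in the slow step; the function names are illustrative only.

```python
def slow_step_reduction(total_images: int, captioned_images: int) -> float:
    """Fraction of slow-step work avoided, assuming equal time per image."""
    return (total_images - captioned_images) / total_images


def slow_step_minutes(images: int, seconds_per_image: float) -> float:
    """Total slow-step time in minutes for a given per-image cost."""
    return images * seconds_per_image / 60


print(slow_step_reduction(200, 80))  # 0.6
```

If the ~30-120 s/image estimate for JoyCaption holds, `slow_step_minutes` gives about 100 to 400 minutes for all 200 images versus about 40 to 160 minutes for the 80 viable ones.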

Bonus:

  • Phase 2 has its own progress bar + live preview, so you see the fast
    analysis results (and the dataset verdict) before the slow captioning.
  • The cache stores phase-2 captions, so if you later keep more photos, only
    the newly-kept ones get captioned.
  • Per-target scores correctly reflect the captions.
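
The caching behavior in the list above could work along these lines. This is only a sketch under assumptions: the real cache key and storage format are not described here, so a plain dict keyed by file path stands in for it, and `caption_kept` is a hypothetical helper.

```python
from typing import Callable, Dict, Iterable, List


def caption_kept(
    kept_paths: Iterable[str],
    cache: Dict[str, str],
    caption: Callable[[str], str],
) -> List[str]:
    """Caption only kept photos that are missing from the cache.

    Returns the paths that were newly captioned in this call.
    """
    newly_captioned = []
    for path in kept_paths:
        if path not in cache:
            # Slow step: runs only for photos not captioned before.
            cache[path] = caption(path)
            newly_captioned.append(path)
    return newly_captioned
```

Calling it a second time with a larger kept set captions only the additions, since earlier results are already in `cache`.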

Updating

  • Git: git pull
  • In-app: ⚙ Config → 🔄 Check now → ⬇ Install update