v1.0.4 — GPU acceleration for face + CLIP, device selector

@akalavol released this 31 May 10:49 · 11 commits to main since this release

Performance: now actually uses your GPU

You asked about coupling CPU+GPU. The honest answer: true data-parallel
splitting gives only ~1.25× (the CPU is ~4× slower than the GPU on these
models, so it adds only about a quarter of the GPU's throughput) for double
the memory and a lot of fragile code. Not worth it.
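
The ~1.25× figure follows from simple throughput arithmetic. A minimal
sketch, assuming an ideal split with no coordination overhead; the function
name and the rates are illustrative and not taken from the project:

```python
def combined_speedup(gpu_rate: float, cpu_rate: float) -> float:
    """Best-case speedup over GPU-only when both devices share the work.

    Assumes a perfect split with zero coordination overhead, so the
    combined throughput is simply the sum of the two rates.
    """
    return (gpu_rate + cpu_rate) / gpu_rate


# If the CPU is ~4x slower than the GPU (illustrative rates in images
# per second), it adds only a quarter of the GPU's throughput.
print(combined_speedup(gpu_rate=4.0, cpu_rate=1.0))  # 1.25
```

Any real split would land below this bound once scheduling and the second
copy of each model in memory are accounted for.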

But the investigation found the real problem: two models that run on every
single image were pinned to the CPU even when a GPU was available:

  • insightface face detection
  • CLIP (body + expression analysis)

This release puts them on the GPU when one is present → ~3-5× faster
on those stages, for every analysis.
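
As a rough illustration of "GPU when one is present", the sketch below orders
ONNX Runtime execution providers so that CUDA is tried first and CPU remains
as a fallback. CUDAExecutionProvider and CPUExecutionProvider are real ONNX
Runtime provider names; the helper pick_providers is hypothetical, and this
is not necessarily how the project wires it up:

```python
def pick_providers(available: list[str]) -> list[str]:
    """Return an execution-provider preference list (hypothetical helper).

    `available` would typically come from
    onnxruntime.get_available_providers().
    """
    providers = []
    if "CUDAExecutionProvider" in available:
        providers.append("CUDAExecutionProvider")
    # Always keep CPU last so inference still works without a GPU.
    providers.append("CPUExecutionProvider")
    return providers


print(pick_providers(["CUDAExecutionProvider", "CPUExecutionProvider"]))
print(pick_providers(["CPUExecutionProvider"]))
```

A list like this can be passed as the providers argument of
onnxruntime.InferenceSession. For a PyTorch model such as CLIP, the
comparable check is torch.cuda.is_available().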

New: device selector (⚙ Config tab)

  • Auto (GPU if available) — default, the smart choice
  • Force GPU (CUDA) — falls back to CPU safely if no GPU detected
  • Force CPU — hides the GPU from the whole subprocess (torch +
    onnxruntime + every captioner). Handy when ComfyUI is busy on the GPU.

The chosen device is shown in the progress log, and is honored by both the
analyzer and the LoRA evaluator.
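
One way the three modes could be resolved is sketched below. Everything here
is an assumption for illustration: the mode strings, resolve_device, and
subprocess_env are hypothetical, and the release does not say how the GPU is
hidden. Setting CUDA_VISIBLE_DEVICES to an empty value is a standard CUDA
mechanism that applies to every CUDA-based library in the process it is set
for:

```python
import os


def resolve_device(mode: str, gpu_available: bool) -> str:
    """Map a selector mode ("auto", "gpu", "cpu") to "cuda" or "cpu"."""
    if mode == "cpu":
        return "cpu"
    # "auto" and "gpu" both use CUDA when present; "gpu" falls back to CPU.
    return "cuda" if gpu_available else "cpu"


def subprocess_env(device: str, base_env=None) -> dict:
    """Build the environment for the analysis subprocess."""
    env = dict(os.environ if base_env is None else base_env)
    if device == "cpu":
        # An empty value hides all CUDA devices from the child process.
        env["CUDA_VISIBLE_DEVICES"] = ""
    return env


device = resolve_device("gpu", gpu_available=False)
print(f"Device: {device}")  # Device: cpu
```

The resulting dictionary could be handed to subprocess.run(..., env=env), so
torch, onnxruntime, and anything else started in that child process see the
same device restriction.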

Real-world impact

Combined with v1.0.3 (no more 10-min timeout) and the WD14-first workflow,
a 200-photo dataset is now far quicker:

  • WD14 mode on GPU: a few minutes
  • JoyCaption still benefits because face + CLIP no longer bottleneck on CPU
    before each caption.

Updating

  • Git: git pull
  • In-app: ⚙ Config → 🔄 Check now → ⬇ Install update