Release v1.0.4 — GPU acceleration for face + CLIP, device selector · akalavol/LoRA-Dataset-Coach

Performance: now actually uses your GPU

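You asked about running the CPU and GPU together. The honest answer: true
data-parallel splitting yields only about 1.25× the GPU-only throughput,
because the CPU is ~4× slower than the GPU on these models and so adds just
a quarter of the GPU's rate. That costs double the memory and a lot of
fragile code, so it is not worth it.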
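
The ~1.25× figure is plain throughput arithmetic. Here is a rough sketch, assuming an ideal split with zero coordination overhead (the function name is made up for illustration; it is not part of the app):

```python
def combined_speedup(cpu_slowdown: float) -> float:
    """Ideal speedup over GPU-only when images are split across GPU and CPU.

    cpu_slowdown: how many times slower the CPU is than the GPU (~4 here).
    Assumes a perfect split and no overhead, so real gains would be lower.
    """
    gpu_rate = 1.0                 # GPU throughput, normalized to 1
    cpu_rate = 1.0 / cpu_slowdown  # CPU throughput relative to the GPU
    return (gpu_rate + cpu_rate) / gpu_rate


print(combined_speedup(4.0))  # 1.25
```

Even in this best case the CPU only contributes a quarter of the GPU's rate, which is why the split was dropped.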

But the investigation found the real problem: two models that run on every
single image were pinned to CPU even when a GPU was available:

insightface face detection
CLIP (body + expression analysis)

This release runs both on the GPU when one is present, which makes those
stages roughly 3-5× faster in every analysis.
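
For the curious, "GPU when one is present" can be sketched roughly like this. It is an illustration, not the app's actual code: it assumes face detection runs through onnxruntime and CLIP through torch, and both helper names are hypothetical. Only the provider strings are real onnxruntime names.

```python
def pick_onnx_providers(available: list[str]) -> list[str]:
    """Prefer CUDA for onnxruntime models, keeping CPU as the fallback.

    `available` is what onnxruntime.get_available_providers() would return.
    """
    if "CUDAExecutionProvider" in available:
        return ["CUDAExecutionProvider", "CPUExecutionProvider"]
    return ["CPUExecutionProvider"]


def pick_torch_device(cuda_available: bool) -> str:
    """Device string for torch models; pass torch.cuda.is_available()."""
    return "cuda" if cuda_available else "cpu"


# Example with hard-coded inputs so the sketch runs without either library.
print(pick_onnx_providers(["CUDAExecutionProvider", "CPUExecutionProvider"]))
print(pick_torch_device(False))
```

The provider list has the shape that onnxruntime sessions accept through their `providers` argument, so a machine without CUDA simply ends up on the CPU provider.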

New: device selector (⚙ Config tab)

Auto (GPU if available) — default, the smart choice
Force GPU (CUDA) — falls back to CPU safely if no GPU detected
Force CPU — hides the GPU from the whole subprocess (torch +
onnxruntime + every captioner). Handy when ComfyUI is busy on the GPU.
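
A minimal sketch of how the three modes could map to a device and a subprocess environment. This is an assumption about the mechanism, not the app's real code: the mode strings and function names are hypothetical, and hiding the GPU is shown with the standard `CUDA_VISIBLE_DEVICES` variable, which CUDA-based libraries such as torch and onnxruntime respect.

```python
import os


def resolve_device(choice: str, gpu_available: bool) -> str:
    """Map a selector choice ("auto", "gpu", "cpu") to "cuda" or "cpu"."""
    if choice == "cpu":
        return "cpu"
    if choice in ("auto", "gpu"):
        # Force GPU falls back to CPU when no GPU is detected.
        return "cuda" if gpu_available else "cpu"
    raise ValueError(f"unknown device choice: {choice!r}")


def subprocess_env(choice: str, base_env=None) -> dict:
    """Copy the environment; for Force CPU, hide every CUDA device."""
    env = dict(os.environ if base_env is None else base_env)
    if choice == "cpu":
        # An invalid index leaves no GPU visible to the child process.
        env["CUDA_VISIBLE_DEVICES"] = "-1"
    return env


print(resolve_device("gpu", gpu_available=False))  # cpu
```

Setting the variable on the child's environment, rather than inside each library, is what lets a single switch cover everything the subprocess loads.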

The chosen device is shown in the progress log, and is honored by both the
analyzer and the LoRA evaluator.

Real-world impact

Combined with v1.0.3 (no more 10-min timeout) and the WD14-first workflow,
a 200-photo dataset is now far quicker:

WD14 mode on GPU: a few minutes
JoyCaption still benefits because face + CLIP no longer bottleneck on CPU
before each caption.

Updating

Git: git pull
In-app: ⚙ Config → 🔄 Check now → ⬇ Install update
