v0.55.0 — VRAM diet + lockup mitigation
What's new
Detailer and sampler hardening targeting an AMD/HIP hard-freeze fingerprint observed under multi-pass detailer runs (gfxhub page faults → VRAM exhaustion → wedge).
Smart Detailer
- Per-bbox soft_empty_cache removed — flushing the allocator every iteration defeated pooling and forced HIP alloc/free roundtrips (the wedge driver). Per-pass flush retained.
- fp32→fp16 cast hoisted to detail() so subsequent passes reuse the owned buffer instead of cloning per-pass (saves ~100-200 MB at hires resolution per pass).
- sam_mask_cache pruned between passes — drops bboxes no future pass will reuse. A 6-pass disjoint-detection run no longer holds full-resolution mask tensors for the whole call.
- SAM predictor offloads to CPU at end of detail(), reclaiming ~600 MB-1.5 GB VRAM between runs without invalidating the model cache.
Samplers (SDXL + Anima)
- Tile-step-down OOM retry for pixel-upscale-with-model and VAE encode/decode (256 → 128 → 64) instead of crashing.
- sampler_sdxl frees
decodedandupscaledbetween hires iterations (~200 MB of dead weight per iteration). - sampler_anima defragments allocator between sample and decode under fragmentation pressure.
Tests
+9 unit tests covering the new offload helper and the cache-pruning algorithm. 333 pass, 5 skip, 1 deselected (pre-existing unrelated comic_page test).