Skip to content

v0.55.0 — VRAM diet + lockup mitigation

Choose a tag to compare

@Deaththegrim Deaththegrim released this 08 May 06:36
· 36 commits to main since this release

What's new

Detailer and sampler hardening targeting an AMD/HIP hard-freeze fingerprint observed under multi-pass detailer runs (gfxhub page faults → VRAM exhaustion → wedge).

Smart Detailer

  • Per-bbox soft_empty_cache removed — flushing the allocator every iteration defeated pooling and forced HIP alloc/free roundtrips (the wedge driver). Per-pass flush retained.
  • fp32→fp16 cast hoisted to detail() so subsequent passes reuse the owned buffer instead of cloning per-pass (saves ~100-200 MB at hires resolution per pass).
  • sam_mask_cache pruned between passes — drops bboxes no future pass will reuse. A 6-pass disjoint-detection run no longer holds full-resolution mask tensors for the whole call.
  • SAM predictor offloads to CPU at end of detail(), reclaiming ~600 MB-1.5 GB VRAM between runs without invalidating the model cache.

Samplers (SDXL + Anima)

  • Tile-step-down OOM retry for pixel-upscale-with-model and VAE encode/decode (256 → 128 → 64) instead of crashing.
  • sampler_sdxl frees decoded and upscaled between hires iterations (~200 MB of dead weight per iteration).
  • sampler_anima defragments allocator between sample and decode under fragmentation pressure.

Tests

+9 unit tests covering the new offload helper and the cache-pruning algorithm. 333 pass, 5 skip, 1 deselected (pre-existing unrelated comic_page test).