
docs: mflux-generate --quantize 4 does NOT auto-cache; use mflux-save instead #2

@CryptoJones


The README and `configs/mflux-launch-snippet.sh` document an `mflux-generate --quantize 4` workflow that is supposed to write 4-bit weights under `~/.cache/mflux/` on the first generation and reuse them on subsequent runs. mflux 0.17.5 doesn't behave that way: `--quantize` quantizes in memory on every cold start, and `~/.cache/mflux/` is never created. Every cold generation therefore pays the full quantization cost, starting from the 31 GB FP16 cache.
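
A quick way to confirm this on a given install: after a first `mflux-generate --quantize 4` run, check whether `~/.cache/mflux/` exists. This is a minimal sketch; the function name `check_mflux_cache` is hypothetical, and the default path is simply the location the README claims.

```shell
#!/usr/bin/env bash
# Report whether the persistent 4-bit cache the README describes exists.
# Assumption: the claimed location is ~/.cache/mflux/; per this issue,
# mflux 0.17.5 never creates it, so "missing" is the expected result.
# check_mflux_cache is a hypothetical helper name.
check_mflux_cache() {
  local dir="${1:-$HOME/.cache/mflux}"
  if [ -d "$dir" ]; then
    echo "present: $dir"
  else
    echo "missing: $dir"
  fi
}

check_mflux_cache
```

A "missing" result after a completed generation is consistent with in-memory-only quantization.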

The persistent 4-bit workflow that actually exists:

1. Once: `mflux-save --model schnell --quantize 4 --path ~/mflux-models/schnell-4bit`
2. For every subsequent generation: `mflux-generate --model ~/mflux-models/schnell-4bit --base-model schnell ...`

After `mflux-save`, the 31 GB FP16 HuggingFace cache can be deleted; the 9 GB saved directory is self-contained.
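
The two-step workflow could be wrapped in a small bash script along these lines. This is a sketch, not the contents of `configs/mflux-launch-snippet.sh`. `MODEL_DIR`, `DRY_RUN`, `HF_REPO_CACHE`, and the function names are hypothetical; `HF_REPO_CACHE` assumes the default Hugging Face hub cache layout and that the FP16 weights came from the `black-forest-labs/FLUX.1-schnell` repo, so verify the path before deleting anything. With `DRY_RUN=1` the commands are printed instead of run.

```shell
#!/usr/bin/env bash
# Sketch: persist 4-bit schnell weights once with mflux-save, then reuse them.
# MODEL_DIR, DRY_RUN, HF_REPO_CACHE and the function names are hypothetical.

MODEL_DIR="${MODEL_DIR:-$HOME/mflux-models/schnell-4bit}"
DRY_RUN="${DRY_RUN:-0}"
# Assumption: default HF hub cache layout and the black-forest-labs/FLUX.1-schnell repo.
HF_REPO_CACHE="${HF_REPO_CACHE:-$HOME/.cache/huggingface/hub/models--black-forest-labs--FLUX.1-schnell}"

# Print the command instead of executing it when DRY_RUN=1.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Step 1 (once): quantize to 4 bits and write to disk; skipped if MODEL_DIR exists.
ensure_saved() {
  if [ ! -d "$MODEL_DIR" ]; then
    run mflux-save --model schnell --quantize 4 --path "$MODEL_DIR"
  fi
}

# Step 2 (every run): load the saved directory; extra arguments pass through.
generate() {
  run mflux-generate --model "$MODEL_DIR" --base-model schnell "$@"
}

# Optional: delete the FP16 cache, but only if the saved directory is non-empty.
drop_fp16_cache() {
  if [ -d "$MODEL_DIR" ] && [ -n "$(ls -A "$MODEL_DIR")" ]; then
    run rm -rf "$HF_REPO_CACHE"
  else
    echo "refusing: $MODEL_DIR is missing or empty" >&2
    return 1
  fi
}
```

Typical use: `source` the script, call `ensure_saved` once, then call `generate` with your usual `mflux-generate` arguments. Run `drop_fp16_cache` only after a generation from the saved directory has succeeded, ideally with `DRY_RUN=1` first to see which path it would remove.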

This issue covers documenting the corrected workflow, plus a related failure mode discovered while putting this together: quit oMLX and the other heavy GUI apps before warmup, or the machine will thrash in swap until it grinds to a halt.
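
A minimal macOS pre-warmup sketch. It assumes oMLX can be quit through AppleScript under the app name `oMLX`; `APPS_TO_QUIT`, `DRY_RUN`, the helper names, and the 1024 MB swap threshold are hypothetical choices, not anything defined by mflux or oMLX. It relies on `osascript` and on the `total = ...M  used = ...M  free = ...M` format printed by `sysctl vm.swapusage`.

```shell
#!/usr/bin/env bash
# Pre-warmup sketch for macOS. Assumption: oMLX answers to the AppleScript
# name "oMLX". APPS_TO_QUIT, DRY_RUN and the helpers are hypothetical.
APPS_TO_QUIT=("oMLX")   # add other heavy GUI apps here
DRY_RUN="${DRY_RUN:-0}"

# Ask each listed app to quit gracefully via AppleScript.
quit_apps() {
  local app
  for app in "${APPS_TO_QUIT[@]}"; do
    if [ "$DRY_RUN" = "1" ]; then
      echo "osascript -e 'quit app \"$app\"'"
    else
      osascript -e "quit app \"$app\"" || true
    fi
  done
}

# Extract the "used" figure (MB) from a `sysctl vm.swapusage` line, e.g.
# "vm.swapusage: total = 2048.00M  used = 1024.50M  free = 1023.50M  (encrypted)".
swap_used_mb() {
  sed -E 's/.*used = ([0-9.]+)M.*/\1/' <<<"$1"
}

# Succeed only if swap in use is at or below the threshold (default 1024 MB).
swap_ok() {
  local used threshold="${2:-1024}"
  used="$(swap_used_mb "$1")"
  awk -v u="$used" -v t="$threshold" 'BEGIN { exit !(u <= t) }'
}
```

For example: `quit_apps; swap_ok "$(sysctl vm.swapusage)" || echo "swap already in use; close more apps before warmup"`.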

Proudly Made in Nebraska. Go Big Red! 🌽 https://xkcd.com/2347/
