docs: mflux-generate --quantize 4 does NOT auto-cache; use mflux-save instead

The README and `configs/mflux-launch-snippet.sh` document an `mflux-generate --quantize 4` workflow that's supposed to write 4-bit weights under `~/.cache/mflux/` on first generation and reuse them on subsequent runs. mflux 0.17.5 doesn't actually behave that way: `--quantize` quantizes in-memory on every cold start, and `~/.cache/mflux/` is never created. Every cold generation pays the quantization cost from the 31 GB FP16 cache.
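One way to confirm this locally is to check for the directory after a cold generation. This is a minimal sketch: the `cache_dir_status` helper is a hypothetical name for illustration, not part of mflux, and the default path is the one the README describes.

```shell
# Report whether a directory exists; defaults to the cache path named in the README.
# cache_dir_status is an illustrative helper, not an mflux command.
cache_dir_status() {
    dir="${1:-$HOME/.cache/mflux}"
    if [ -d "$dir" ]; then
        echo "present: $dir"
    else
        echo "absent: $dir"
    fi
}
```

Running `cache_dir_status` after an `mflux-generate --quantize 4` run should report the directory as absent if the behavior described here holds on your install.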

The persistent 4-bit workflow that actually exists has two steps: run `mflux-save --model schnell --quantize 4 --path ~/mflux-models/schnell-4bit` once, then run `mflux-generate --model ~/mflux-models/schnell-4bit --base-model schnell ...` for every subsequent generation. After `mflux-save`, the 31 GB FP16 HuggingFace cache can be deleted; the 9 GB saved directory is self-contained.
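The two steps could be wrapped so the one-time save happens automatically. The sketch below is only an illustration: the `generate_4bit` function and the `MODEL_DIR` variable are hypothetical names, and the only mflux commands and flags used are the ones quoted above. Any further `mflux-generate` arguments are passed through untouched.

```shell
# Sketch: save 4-bit weights once, then reuse them for every generation.
# MODEL_DIR and generate_4bit are illustrative names, not part of mflux.
MODEL_DIR="${MODEL_DIR:-$HOME/mflux-models/schnell-4bit}"

generate_4bit() {
    # One-time step: quantize and write the weights if the directory is missing.
    if [ ! -d "$MODEL_DIR" ]; then
        mflux-save --model schnell --quantize 4 --path "$MODEL_DIR" || return 1
    fi
    # Every run: load the saved weights; remaining arguments pass through unchanged.
    mflux-generate --model "$MODEL_DIR" --base-model schnell "$@"
}
```

Call `generate_4bit` followed by whatever arguments you would normally give `mflux-generate`. The directory check is deliberately simple; it does not detect a partially written save, so remove the directory and rerun if `mflux-save` was interrupted.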

This issue covers documenting the corrected workflow, plus a related failure mode discovered while putting this together: quit oMLX and the heavy GUI apps before warmup, or it'll thrash to a halt in swap.

Proudly Made in Nebraska. Go Big Red! 🌽 https://xkcd.com/2347/
