Rust-native inference for generative image and video models on Apple MLX, built on mlx-rs.
Status: active — two dozen model provider crates with merged, parity-validated engines spanning image, video, upscaling, identity, and understanding models. Built as a Rust library workspace consumed in-process; not yet published to crates.io. See ARCHITECTURE.md for the design.
A from-scratch Rust reimplementation of the MLX image/video model stack (a divergence from the Python mflux / mlx-video lineage), collapsing on-device inference into a single statically-linked component with no Python sidecar. Each model family is its own provider crate registered through the core mlx-gen Generator contract.
Supported models
- Image: FLUX.1 (schnell/dev, incl. Hyper few-step), FLUX.2 (klein-9b and dev — txt2img, edit, ControlNet, KV-cache edit; Qwen3 text encoder + 32-ch VAE), Chroma (
chroma1_hd/base/flash), Qwen-Image (+ Edit, + ControlNet), Stable Diffusion XL (+ inpaint/outpaint, IP-Adapter, tile-ControlNet, LCM/Lightning/Hyper), Kolors (bilingual, ChatGLM3 text encoder), Z-Image (incl. ControlNet), SenseNova-U1 (unified understanding + generation: T2I, image-edit, VQA, interleaved document), Boogu-Image (Lumina-Image-2.0 / OmniGen2 lineage; base/turbo/edit, Qwen3-VL encoder), Ideogram 4.0 (+ Turbo), Lens / Lens-Turbo (Microsoft; gpt-oss-20b MoE encoder + dual-stream MMDiT) - Video: Wan2.2 (text/image/TI2V, incl. VACE and VACE-Fun), Bernini renderer (ByteDance; Wan2.2-A14B dual-expert MoE + source-id rotary + APG guidance), SCAIL-2 (controlled character animation / motion transfer; Wan2.1-14B I2V backbone), LTX-2.3 (text-to-video + audio), Stable Video Diffusion (image-to-video)
- Upscaling / restoration: SeedVR2 (one-step diffusion super-resolution, image and video; 3B/7B)
- Identity: PuLID-FLUX and InstantID, over a native MLX face stack (SCRFD + ArcFace + BiSeNet)
- Understanding & utility: JoyCaption (captioning), SAM2 / SAM3 (segmentation; SAM3 adds open-vocabulary concept segmentation + video tracking), prompt-refine (Llama-3.2-3B-Instruct prompt rewriting, with JSON grammar-constrained decoding)
- Adapters: LoRA, LoKr (reconstruct + forward-time residual + stacking, quant-safe), ControlNet, IP-Adapter
- Training: native MLX LoRA / LoKr fine-tuning for SDXL, Z-Image, Kolors, Wan2.2, LTX-2.3, and Lens (adamw / adam / rose / prodigy optimizers, dataset + checkpoint plumbing)
- Quantization: group-wise affine Q4 / Q8 (byte-identical to the reference packing)
- Weight loading: most models load directly from their Hugging Face / diffusers snapshot — no conversion step. Families that ship in a non-loadable source format include a native Rust converter (FLUX.2 single-file → diffusers; Wan2.2 torch
.pthreader, T2V/I2V/TI2V + VAE; LTX-2.3 single-file → split MLX). A few models that ship as fp8 or scattered torch checkpoints are provisioned by an offline Python converter undertools/(Ideogram 4 fp8, InstantID, and the face-stack sub-models)
Requires a Mac with full Xcode + the Metal Toolchain (MLX's Metal kernels compile from source).
mlx-gen is a Rust library workspace consumed in-process. Each model family lives in its own
provider crate that self-registers into the core mlx-gen registry at link time — so you depend
on mlx-gen plus whichever provider crates you want, then resolve models by id:
# Cargo.toml
[dependencies]
mlx-gen = { git = "https://github.com/michaeltrefry/mlx-gen" }
mlx-gen-z-image = { git = "https://github.com/michaeltrefry/mlx-gen" }use mlx_gen::{GenerationOutput, GenerationRequest, LoadSpec, Progress, WeightsSource};
// A provider crate registers itself only when it is actually linked. Reference it once
// so the linker keeps its `inventory::submit!` registration.
use mlx_gen_z_image as _;
fn main() -> mlx_gen::Result<()> {
// Load a model by id from a Hugging Face snapshot directory.
let spec = LoadSpec::new(WeightsSource::Dir("/path/to/Z-Image-Turbo".into()));
let model = mlx_gen::load("z_image_turbo", &spec)?;
let req = GenerationRequest {
prompt: "a red fox in a snowy forest".into(),
width: 1024,
height: 1024,
seed: Some(42),
..Default::default()
};
let out = model.generate(&req, &mut |p| {
if let Progress::Step { current, total } = p {
println!("step {current}/{total}");
}
})?;
if let GenerationOutput::Images(images) = out {
let img = &images[0];
// `img.pixels` is interleaved RGB (`img.width` × `img.height`); encode with any
// image crate (e.g. `image::save_buffer`) to write a PNG.
println!("generated {}×{}", img.width, img.height);
}
Ok(())
}Discover what is registered at runtime with mlx_gen::registry::generators() (SeedVR2 is
registered here as a Generator). The same link-time pattern backs the other entry points:
load_trainer (LoRA/LoKr fine-tuning), load_captioner (JoyCaption), and load_textllm
(prompt-refine). The SAM2 / SAM3 segmenters are plain utility APIs used directly, not through the
registry.
Apache License 2.0 — see LICENSE and NOTICE. You are free to use, modify, and distribute mlx-gen, including commercially, under those terms.
mlx-gen is an independent Rust reimplementation and includes no copied source, but it stands on the work of others:
- Apple MLX (MIT) and mlx-rs (Apache-2.0 OR MIT) — the on-device tensor stack
- mflux (MIT) — the MLX diffusion lineage mlx-gen diverged from and validates parity against
- Apple mlx-examples (MIT)
- Hugging Face Diffusers (Apache-2.0) — the upstream model architectures
See NOTICE for full attribution.