mlx-gen

Rust-native inference for generative image and video models on Apple MLX, built on mlx-rs.

Status: active, with two dozen model provider crates whose merged, parity-validated engines span image, video, upscaling, identity, and understanding models. mlx-gen is built as a Rust library workspace consumed in-process and is not yet published to crates.io. See ARCHITECTURE.md for the design.

A from-scratch Rust reimplementation of the MLX image/video model stack (a divergence from the Python mflux / mlx-video lineage), collapsing on-device inference into a single statically-linked component with no Python sidecar. Each model family is its own provider crate registered through the core mlx-gen Generator contract.

Supported models

Image: FLUX.1 (schnell/dev, incl. Hyper few-step), FLUX.2 (klein-9b and dev — txt2img, edit, ControlNet, KV-cache edit; Qwen3 text encoder + 32-ch VAE), Chroma (chroma1_hd/base/flash), Qwen-Image (+ Edit, + ControlNet), Stable Diffusion XL (+ inpaint/outpaint, IP-Adapter, tile-ControlNet, LCM/Lightning/Hyper), Kolors (bilingual, ChatGLM3 text encoder), Z-Image (incl. ControlNet), SenseNova-U1 (unified understanding + generation: T2I, image-edit, VQA, interleaved document), Boogu-Image (Lumina-Image-2.0 / OmniGen2 lineage; base/turbo/edit, Qwen3-VL encoder), Ideogram 4.0 (+ Turbo), Lens / Lens-Turbo (Microsoft; gpt-oss-20b MoE encoder + dual-stream MMDiT)
Video: Wan2.2 (text/image/TI2V, incl. VACE and VACE-Fun), Bernini renderer (ByteDance; Wan2.2-A14B dual-expert MoE + source-id rotary + APG guidance), SCAIL-2 (controlled character animation / motion transfer; Wan2.1-14B I2V backbone), LTX-2.3 (text-to-video + audio), Stable Video Diffusion (image-to-video)
Upscaling / restoration: SeedVR2 (one-step diffusion super-resolution, image and video; 3B/7B)
Identity: PuLID-FLUX and InstantID, over a native MLX face stack (SCRFD + ArcFace + BiSeNet)
Understanding & utility: JoyCaption (captioning), SAM2 / SAM3 (segmentation; SAM3 adds open-vocabulary concept segmentation + video tracking), prompt-refine (Llama-3.2-3B-Instruct prompt rewriting, with JSON grammar-constrained decoding)
Adapters: LoRA, LoKr (reconstruct + forward-time residual + stacking, quant-safe), ControlNet, IP-Adapter
Training: native MLX LoRA / LoKr fine-tuning for SDXL, Z-Image, Kolors, Wan2.2, LTX-2.3, and Lens (adamw / adam / rose / prodigy optimizers, dataset + checkpoint plumbing)
Quantization: group-wise affine Q4 / Q8 (byte-identical to the reference packing)
Weight loading: most models load directly from their Hugging Face / diffusers snapshot, with no conversion step. Families that ship in a format that cannot be loaded directly include a native Rust converter (FLUX.2 single-file → diffusers; Wan2.2 torch .pth reader for T2V/I2V/TI2V + VAE; LTX-2.3 single-file → split MLX). A few models that ship as fp8 or scattered torch checkpoints are provisioned by an offline Python converter under tools/ (Ideogram 4 fp8, InstantID, and the face-stack sub-models).
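
To make the quantization entry above concrete, here is a minimal sketch of group-wise affine quantization in plain Rust. It assumes the common min/max formulation (per group, scale = (max - min) / (2^bits - 1) and bias = min) and stores one code per byte. The names `QuantGroup`, `quantize`, and `dequantize` are hypothetical and are not part of mlx-gen; the sketch does not model the bit packing that the reference layout uses.

```rust
/// One quantized group: integer codes plus the affine parameters that decode them.
/// (Hypothetical type for illustration; not an mlx-gen API.)
struct QuantGroup {
    codes: Vec<u8>,
    scale: f32,
    bias: f32,
}

/// Quantize `weights` in groups of `group_size` elements to `bits`-bit codes.
/// Each code is kept in its own byte; real Q4 storage would pack two per byte.
fn quantize(weights: &[f32], group_size: usize, bits: u32) -> Vec<QuantGroup> {
    assert!(group_size > 0 && (1..=8).contains(&bits));
    let levels = ((1u32 << bits) - 1) as f32; // 15 for Q4, 255 for Q8
    weights
        .chunks(group_size)
        .map(|group| {
            let min = group.iter().copied().fold(f32::INFINITY, f32::min);
            let max = group.iter().copied().fold(f32::NEG_INFINITY, f32::max);
            // A constant group has zero range; scale 1 makes every code decode to `min`.
            let scale = if max > min { (max - min) / levels } else { 1.0 };
            let codes = group
                .iter()
                .map(|&w| ((w - min) / scale).round().clamp(0.0, levels) as u8)
                .collect();
            QuantGroup { codes, scale, bias: min }
        })
        .collect()
}

/// Decode every group back to floats: w ≈ code * scale + bias.
fn dequantize(groups: &[QuantGroup]) -> Vec<f32> {
    groups
        .iter()
        .flat_map(|g| g.codes.iter().map(move |&q| q as f32 * g.scale + g.bias))
        .collect()
}
```

Under these assumptions the round-trip error of each weight is bounded by half of its group's scale, which is why smaller groups or more bits give a closer reconstruction.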

Requires a Mac with a full Xcode installation plus the Metal Toolchain, because MLX's Metal kernels compile from source.

Usage

mlx-gen is a Rust library workspace consumed in-process. Each model family lives in its own provider crate that self-registers into the core mlx-gen registry at link time — so you depend on mlx-gen plus whichever provider crates you want, then resolve models by id:

# Cargo.toml
[dependencies]
mlx-gen = { git = "https://github.com/michaeltrefry/mlx-gen" }
mlx-gen-z-image = { git = "https://github.com/michaeltrefry/mlx-gen" }

use mlx_gen::{GenerationOutput, GenerationRequest, LoadSpec, Progress, WeightsSource};

// A provider crate registers itself only when it is actually linked. Reference it once
// so the linker keeps its `inventory::submit!` registration.
use mlx_gen_z_image as _;

fn main() -> mlx_gen::Result<()> {
    // Load a model by id from a Hugging Face snapshot directory.
    let spec = LoadSpec::new(WeightsSource::Dir("/path/to/Z-Image-Turbo".into()));
    let model = mlx_gen::load("z_image_turbo", &spec)?;

    let req = GenerationRequest {
        prompt: "a red fox in a snowy forest".into(),
        width: 1024,
        height: 1024,
        seed: Some(42),
        ..Default::default()
    };

    let out = model.generate(&req, &mut |p| {
        if let Progress::Step { current, total } = p {
            println!("step {current}/{total}");
        }
    })?;

    if let GenerationOutput::Images(images) = out {
        // Take the first image without indexing, so an empty result cannot panic.
        if let Some(img) = images.first() {
            // `img.pixels` is interleaved RGB (`img.width` × `img.height`); encode with any
            // image crate (e.g. `image::save_buffer`) to write a PNG.
            println!("generated {}×{}", img.width, img.height);
        }
    }
    Ok(())
}
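
The example above stops at printing the image size. As a dependency-free illustration of what "interleaved RGB" means for the output buffer, the sketch below writes such a buffer as a binary PPM (P6) file using only the standard library. It assumes 8-bit channels stored as `u8`, which the README does not confirm, and `write_ppm` is a hypothetical helper, not an mlx-gen function. For PNG output an image crate remains the practical choice.

```rust
use std::io::{self, Write};

/// Encode 8-bit interleaved RGB pixels (R, G, B, R, G, B, ...) as a binary PPM image.
/// Hypothetical helper for illustration; not part of mlx-gen.
fn write_ppm<W: Write>(mut out: W, pixels: &[u8], width: u32, height: u32) -> io::Result<()> {
    // Three bytes per pixel, row-major, no padding between rows.
    let expected = width as usize * height as usize * 3;
    if pixels.len() != expected {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            format!("expected {expected} bytes of RGB data, got {}", pixels.len()),
        ));
    }
    // PPM header: magic number, dimensions, maximum channel value, then the raw bytes.
    write!(out, "P6\n{width} {height}\n255\n")?;
    out.write_all(pixels)
}
```

Any `Write` target works, for example a `std::fs::File` created at `out.ppm` or an in-memory `Vec<u8>`. Passing the generated image's pixel buffer and dimensions would need whatever conversions their actual field types require.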

Discover what is registered at runtime with mlx_gen::registry::generators() (SeedVR2 is registered here as a Generator). The same link-time pattern backs the other entry points: load_trainer (LoRA/LoKr fine-tuning), load_captioner (JoyCaption), and load_textllm (prompt-refine). The SAM2 / SAM3 segmenters are plain utility APIs used directly, not through the registry.

License

Apache License 2.0 — see LICENSE and NOTICE. You are free to use, modify, and distribute mlx-gen, including commercially, under those terms.

Acknowledgements

mlx-gen is an independent Rust reimplementation and includes no copied source, but it stands on the work of others:

Apple MLX (MIT) and mlx-rs (Apache-2.0 OR MIT) — the on-device tensor stack
mflux (MIT) — the MLX diffusion lineage mlx-gen diverged from and validates parity against
Apple mlx-examples (MIT)
Hugging Face Diffusers (Apache-2.0) — the upstream model architectures

See NOTICE for full attribution.
