Release nano-xDiT v0.1.0 · Antlera/nano-xDiT

Minimal single-GPU Wan video-DiT inference + TeaCache / First-Block-Cache step-skipping, extracted from xDiT with all distributed/sequence-parallel machinery removed.

Highlights:

apply_cache_on_transformer hooks a diffusers WanTransformer3DModel via a block-stack wrapper. TeaCache uses the e0/e signal; FBCache uses the first-block residual.
NanoWanPipeline: an explicit, instrumentable denoising loop that drives a separate cache per CFG branch.
Verified on Wan2.1-T2V-1.3B (480x832, 33 frames, 30 steps): 1.57x speedup at thr=0.1 (37% skipped), 2.26x at thr=0.2 (57% skipped). Forced-compute mode is bit-exact against the no-cache baseline.
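
For readers new to First-Block-Cache, here is a minimal sketch of the skip decision that a block-stack wrapper of this kind typically makes. It is not the nano-xDiT implementation: the class name `FirstBlockCacheSketch`, the `force_compute` argument, and the relative-L1 metric are assumptions made for illustration, and plain Python lists stand in for tensors. The idea shown is that the first block always runs, its residual is compared with the residual from the last fully computed step, and the remaining blocks are replaced by a cached residual when the change is below the threshold.

```python
from typing import Callable, List, Optional, Sequence

Vector = List[float]
Block = Callable[[Vector], Vector]


def relative_l1(current: Sequence[float], previous: Sequence[float]) -> float:
    """Summed absolute change, normalized by the magnitude of `previous`."""
    num = sum(abs(c - p) for c, p in zip(current, previous))
    den = sum(abs(p) for p in previous)
    return num / den if den > 0 else float("inf")


class FirstBlockCacheSketch:
    """Hypothetical block-stack wrapper (illustrative, not the nano-xDiT class)."""

    def __init__(self, blocks: Sequence[Block], threshold: float) -> None:
        self.first, self.rest = blocks[0], list(blocks[1:])
        self.threshold = threshold
        self.prev_first_residual: Optional[Vector] = None
        self.cached_rest_residual: Optional[Vector] = None
        self.skipped = 0
        self.computed = 0

    def __call__(self, x: Vector, force_compute: bool = False) -> Vector:
        # The first block always runs; its residual is the cheap change signal.
        h = self.first(x)
        first_residual = [a - b for a, b in zip(h, x)]
        can_skip = (
            not force_compute
            and self.prev_first_residual is not None
            and self.cached_rest_residual is not None
            and relative_l1(first_residual, self.prev_first_residual) < self.threshold
        )
        if can_skip:
            # Reuse the cached contribution of the remaining blocks.
            self.skipped += 1
            return [a + r for a, r in zip(h, self.cached_rest_residual)]
        # Full compute: run the remaining blocks and refresh both caches.
        out = h
        for block in self.rest:
            out = block(out)
        self.prev_first_residual = first_residual
        self.cached_rest_residual = [a - b for a, b in zip(out, h)]
        self.computed += 1
        return out


# Toy "transformer": two blocks acting on a 2-element hidden state.
toy_blocks = [lambda v: [2.0 * a for a in v], lambda v: [a + 1.0 for a in v]]
cache = FirstBlockCacheSketch(toy_blocks, threshold=0.1)
y1 = cache([1.0, 2.0])   # nothing cached yet, so this step is computed
y2 = cache([1.01, 2.0])  # first-block residual barely moved, so this step is skipped
y3 = cache([3.0, 2.0])   # large change, so this step is computed again
print(cache.computed, cache.skipped)  # prints "2 1"
```

In this sketch one cache instance holds the state for one stream of steps, so a pipeline running classifier-free guidance would keep one instance per branch. Passing `force_compute=True` on every call bypasses the cache entirely, which is the condition under which a cached and an uncached run should agree exactly.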

import nanoxdit
