C++ implementations of popular samplers and schedulers used for image generation (e.g. Stable Diffusion and Flux).
The samplers have been ported to vanilla Python, but not yet tested. They are intended to be algorithmically identical to popular implementations, but may contain minor bugs.
The next step is to generate test values from an existing implementation (e.g. Forge WebUI) and verify that this port reproduces them.
Once the port passes the numerical tests, I'll port the vanilla Python implementation to C++.
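
The comparison step could look roughly like the sketch below. The helper name `assert_matches_reference` and the tolerances are assumptions for illustration, not part of this project; real reference values would be exported from the existing implementation.

```python
import math


def assert_matches_reference(actual, expected, rel_tol=1e-5, abs_tol=1e-6):
    """Raise AssertionError at the first element that differs beyond tolerance.

    Hypothetical helper: both arguments are flat sequences of floats.
    """
    if len(actual) != len(expected):
        raise AssertionError(f"length mismatch: {len(actual)} vs {len(expected)}")
    for i, (a, e) in enumerate(zip(actual, expected)):
        if not math.isclose(a, e, rel_tol=rel_tol, abs_tol=abs_tol):
            raise AssertionError(f"index {i}: got {a!r}, expected {e!r}")


# Placeholder numbers only, not real reference data.
assert_matches_reference([1.0, 0.5, 0.0], [1.0, 0.5000001, 0.0])
```

Reporting the first failing index makes it easier to see at which step a sampler starts to drift from the reference.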
- Separated Architecture: Schedulers (noise schedules) and Samplers (stepping algorithms) are independent
- (TBD) Numerically Verified: All implementations tested against reference values
- No Dependencies: Only standard library (no PyTorch/LibTorch needed)
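
As a rough illustration of the separation described above, the sketch below keeps the scheduler (a function that returns sigmas) apart from the sampler (a function that steps through them), using only the standard library. All names (`uniform_sigmas`, `euler_sample`, `toy_denoiser`) and the sigma range are assumptions made for this example. The linear-in-sigma spacing and the trailing `0.0` are also assumptions and may not match this project's Uniform scheduler.

```python
def uniform_sigmas(n_steps, sigma_min=0.03, sigma_max=14.6):
    """Scheduler: n_steps sigmas spaced linearly from sigma_max to sigma_min, then 0."""
    if n_steps == 1:
        return [sigma_max, 0.0]
    step = (sigma_max - sigma_min) / (n_steps - 1)
    return [sigma_max - i * step for i in range(n_steps)] + [0.0]


def euler_sample(denoise, x, sigmas):
    """Sampler: Euler steps on dx/dsigma = (x - denoise(x, sigma)) / sigma."""
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        denoised = denoise(x, sigma)
        d = [(xi - di) / sigma for xi, di in zip(x, denoised)]
        x = [xi + di * (sigma_next - sigma) for xi, di in zip(x, d)]
    return x


def toy_denoiser(x, sigma):
    # Stand-in for a real model: always predicts an all-zero clean sample.
    return [0.0 for _ in x]


sigmas = uniform_sigmas(4)
result = euler_sample(toy_denoiser, [1.0, -2.0], sigmas)
```

Because the sampler only sees a list of sigmas, any other scheduler can be swapped in without touching the stepping code.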
- Uniform
- Karras
- Exponential
- (easy to add more)
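
For reference, the Karras and Exponential schedules are commonly defined as in the sketch below: Karras interpolates linearly in `sigma ** (1 / rho)` (with `rho = 7` as the usual default), and Exponential interpolates linearly in `log(sigma)`. The function names and signatures are assumptions for this example and may differ from this project's code. Neither function appends a final `0.0`.

```python
import math


def karras_sigmas(n, sigma_min, sigma_max, rho=7.0):
    """Sigmas from sigma_max down to sigma_min, linear in sigma ** (1 / rho)."""
    if n == 1:
        return [sigma_max]
    hi = sigma_max ** (1.0 / rho)
    lo = sigma_min ** (1.0 / rho)
    return [(hi + i / (n - 1) * (lo - hi)) ** rho for i in range(n)]


def exponential_sigmas(n, sigma_min, sigma_max):
    """Sigmas from sigma_max down to sigma_min, linear in log(sigma)."""
    if n == 1:
        return [sigma_max]
    log_hi = math.log(sigma_max)
    log_lo = math.log(sigma_min)
    return [math.exp(log_hi + i / (n - 1) * (log_lo - log_hi)) for i in range(n)]


# Illustrative sigma range, not taken from this project.
karras = karras_sigmas(10, sigma_min=0.03, sigma_max=14.6)
exponential = exponential_sigmas(10, sigma_min=0.03, sigma_max=14.6)
```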
- ddim
- heun
- euler
- euler_a
- dpm2
- dpm2_a
- dpm_fast
- dpm_adaptive
- lms
- dpmpp_2s_a
- dpmpp_sde
- dpmpp_2m
- dpmpp_2m (v2)
- dpmpp_2m_sde
- dpmpp_2m_sde_heun
- dpmpp_3m_sde
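
To give a flavour of what one of these stepping algorithms looks like, here is a minimal sketch of a Heun (second-order) sampler without noise churn, written against plain Python lists. It is an assumption about the general shape of such a sampler, not a copy of this project's `heun` implementation. `heun_sample` and `toy_denoiser` are names invented for the example.

```python
def heun_sample(denoise, x, sigmas):
    """Heun's method on dx/dsigma = (x - denoise(x, sigma)) / sigma."""
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        denoised = denoise(x, sigma)
        d = [(xi - di) / sigma for xi, di in zip(x, denoised)]
        dt = sigma_next - sigma
        if sigma_next == 0:
            # Final step: plain Euler, since the slope is undefined at sigma = 0.
            x = [xi + di * dt for xi, di in zip(x, d)]
        else:
            # Predictor (Euler), then corrector using the averaged slope.
            x_pred = [xi + di * dt for xi, di in zip(x, d)]
            denoised_2 = denoise(x_pred, sigma_next)
            d_2 = [(xi - di) / sigma_next for xi, di in zip(x_pred, denoised_2)]
            x = [xi + 0.5 * (a + b) * dt for xi, a, b in zip(x, d, d_2)]
    return x


def toy_denoiser(x, sigma):
    # Stand-in for a real model: always predicts an all-zero clean sample.
    return [0.0 for _ in x]


result = heun_sample(toy_denoiser, [2.0], [10.0, 5.0, 1.0, 0.0])
```

Heun calls the model twice per step (except the last), which is the usual trade-off against Euler's single call.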
- Check whether "beta_schedule": "scaled_linear" and "steps_offset": 1 (both used in SD 1.5) are respected in the new implementation.
- Check timestep_spacing: "linspace", "trailing", etc. diffusers first creates the timesteps (using the spacing), then derives the sigmas, and then converts those sigmas to Karras, exponential, etc.
- Check whether any samplers lack model input or output scaling. I think the new DPMSolver implementation has a bug: it doesn't scale the model outputs, whereas diffusers does.
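
As a starting point for the first two checks, the sketch below reproduces my understanding of how a "scaled_linear" beta schedule turns into per-timestep sigmas, and how the "linspace", "leading", and "trailing" spacings pick timesteps. It is an approximation written from memory of how diffusers behaves, so it should be verified against the library before being trusted. The function names and the default values (which I believe match SD 1.5) are assumptions.

```python
import math


def scaled_linear_sigmas(beta_start=0.00085, beta_end=0.012, num_train_timesteps=1000):
    """Sigmas for every training timestep under a 'scaled_linear' beta schedule."""
    lo, hi = math.sqrt(beta_start), math.sqrt(beta_end)
    sigmas = []
    alpha_cumprod = 1.0
    for i in range(num_train_timesteps):
        # Betas are linear in sqrt(beta), then squared.
        beta = (lo + (hi - lo) * i / (num_train_timesteps - 1)) ** 2
        alpha_cumprod *= 1.0 - beta
        sigmas.append(math.sqrt((1.0 - alpha_cumprod) / alpha_cumprod))
    return sigmas


def make_timesteps(num_inference_steps, spacing, num_train_timesteps=1000, steps_offset=0):
    """Descending integer timesteps for the given spacing (assumed behaviour)."""
    n, total = num_inference_steps, num_train_timesteps
    if spacing == "linspace":
        if n == 1:
            return [0]
        return [round(i * (total - 1) / (n - 1)) for i in reversed(range(n))]
    if spacing == "leading":
        ratio = total // n
        # steps_offset is only applied for this spacing in this sketch.
        return [i * ratio + steps_offset for i in reversed(range(n))]
    if spacing == "trailing":
        ratio = total / n
        return [round(total - i * ratio) - 1 for i in range(n)]
    raise ValueError(f"unknown spacing: {spacing!r}")


train_sigmas = scaled_linear_sigmas()
timesteps = make_timesteps(10, "leading", steps_offset=1)
# Sigmas at the chosen timesteps; these would then be re-spaced (Karras, etc.).
step_sigmas = [train_sigmas[t] for t in timesteps]
```

Printing `timesteps` for each spacing and comparing against the reference implementation is a quick way to see whether the spacing and offset are being respected.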