Hello Professor, I noticed that distillation does not seem to be supported for dual-model architectures such as Wan2.2-I2V-14B. Is this because DMD distillation includes a step in which the student simulates the denoising process, and for the low-noise-stage model there is no straightforward way to construct the initial sample, since it cannot simply be initialized from Gaussian noise?