SVD is built on the pretrained SD2.1, which is trained with v-prediction. v-prediction is generally considered to work better than epsilon prediction, especially for tasks that emphasize consistency. However, SVD switches back to epsilon prediction, which is not an obvious design choice. Even though the model is fully fine-tuned, I would expect this switch to widen the gap from the pretrained weights. What is the reason behind this choice?
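
To make the question concrete, here is a minimal sketch of how the two targets relate. It assumes a variance-preserving schedule (`alpha**2 + sigma**2 == 1`) and the usual definition `v = alpha * eps - sigma * x0`. The function names are hypothetical and this is not taken from the SVD or SD2.1 code; it only illustrates that the two parameterizations carry the same information but ask the network to output a different quantity.

```python
import math


def v_from_eps(x0, eps, alpha, sigma):
    # v-prediction target under the assumed definition: v = alpha * eps - sigma * x0
    return alpha * eps - sigma * x0


def eps_from_v(x_t, v, alpha, sigma):
    # Recover the noise from a v output (valid when alpha^2 + sigma^2 = 1)
    return sigma * x_t + alpha * v


def x0_from_v(x_t, v, alpha, sigma):
    # Recover the clean sample from a v output (valid when alpha^2 + sigma^2 = 1)
    return alpha * x_t - sigma * v


# Illustrative scalar values; a real model works on tensors.
alpha, sigma = math.cos(0.3), math.sin(0.3)  # satisfies alpha^2 + sigma^2 = 1
x0, eps = 0.7, -1.2

x_t = alpha * x0 + sigma * eps               # forward noising step
v = v_from_eps(x0, eps, alpha, sigma)

eps_rec = eps_from_v(x_t, v, alpha, sigma)   # matches eps up to float error
x0_rec = x0_from_v(x_t, v, alpha, sigma)     # matches x0 up to float error
```

Since the conversion is exact, a network pretrained to output `v` would have to relearn its output mapping to output `eps` instead, which is the gap I am asking about.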