In the latent diffusion paper, the authors mention:
"A notable advantage of this approach is that we need to train the universal autoencoding stage only once and can therefore reuse it for multiple DM trainings or to explore possibly completely different tasks."
Does that mean we do not have to retrain the autoencoder stage for image <-> latent space encoding/decoding when we want to train a DM on a new dataset, i.e. that the autoencoder is general enough? This seems pretty strange to me.
I see that here:
huggingface/diffusers#356
```python
from diffusers import AutoencoderKL

# The VAE is loaded from pretrained weights, not trained in the script.
vae = AutoencoderKL.from_pretrained(
    args.pretrained_model_name_or_path,
    subfolder="vae",
    use_auth_token=args.use_auth_token,
)
```
the VAE is just loaded, which would suggest that this is the case.
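
To make the question concrete, here is a minimal sketch of what that reuse pattern looks like: a pretrained VAE is loaded, frozen, and only used to map images into and out of the latent space while a new DM is trained. The model id is just an example checkpoint, the random tensor stands in for a real batch, and 0.18215 is the scale factor used for the Stable Diffusion VAE; none of this is from the paper itself.

```python
import torch
from diffusers import AutoencoderKL

# Example checkpoint; any compatible pretrained KL autoencoder would do.
vae = AutoencoderKL.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="vae")
vae.requires_grad_(False)  # the autoencoder is never updated during DM training
vae.eval()

# Stand-in for a batch of images from the *new* dataset, scaled to [-1, 1].
images = torch.randn(4, 3, 512, 512)

with torch.no_grad():
    # Encode images into the latent space the DM is trained on.
    latents = vae.encode(images).latent_dist.sample() * 0.18215

# ... the diffusion model would be trained on `latents` here ...

with torch.no_grad():
    # At inference time, DM samples are decoded back to pixel space.
    decoded = vae.decode(latents / 0.18215).sample
```

If this is really all the training scripts do, then the autoencoder is being treated as a fixed, general-purpose image compressor, which is exactly what the quoted sentence seems to claim.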