
question about the visual autoencoder #55

Open
Junction4Nako opened this issue Dec 28, 2023 · 1 comment

Junction4Nako commented Dec 28, 2023

Thanks for the great work! I have some questions about the checkpoints:

  1. It seems that BAAI/Emu2 does not include the weights of the visual decoder (the diffusion UNet), but according to Section 2.2.3 of the paper, Emu2 should include the decoder trained in the autoencoding stage, right?
  2. Emu2-Gen does provide the weights of the visual decoder. Can the visual encoder and decoder in BAAI/Emu2-Gen work together as an autoencoder?

Looking forward to your reply~

ryanzhangfan (Collaborator) commented Dec 28, 2023

Thanks for your interest in our work!

  1. The visual decoder of Emu2.
    As stated in the paper, the visual encoder is frozen during the training of both Emu2-Gen and the visual decoder. Hence, Emu2 and Emu2-Gen share exactly the same visual decoder, and the visual decoder weights shipped with Emu2-Gen can be used directly with Emu2.

  2. The autoencoder paradigm
    Yes, the visual encoder and the visual decoder can work as an autoencoder. Our pipeline currently supports generating output in an autoencoding manner. You can find instructions in the HF version model or the native PyTorch version model (at the bottom of the example code); a rough sketch of the round trip is shown after this list.
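For illustration only, here is a minimal sketch of what that autoencoding round trip could look like when driven through a diffusers custom pipeline. The pipeline module name, call signature, and output attribute are assumptions rather than the repository's confirmed API; the example code linked above is the authoritative reference.

```python
# Hedged sketch of the autoencoding round trip:
# image -> frozen visual encoder -> image embeddings -> diffusion decoder -> reconstruction.
# The custom pipeline name, call signature, and output attribute are assumptions
# for illustration; follow the example code in the BAAI/Emu2-Gen model card
# (HF or native PyTorch version) for the supported interface.
import torch
from PIL import Image
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "BAAI/Emu2-Gen",                      # ships both the visual encoder and decoder weights
    custom_pipeline="pipeline_emu2_gen",  # assumed name of the custom pipeline module
    torch_dtype=torch.bfloat16,
).to("cuda")

image = Image.open("input.jpg").convert("RGB")

# Passing only an image (no text prompt) exercises the autoencoding path:
# the frozen visual encoder embeds the image and the diffusion decoder
# reconstructs it from those embeddings.
output = pipe(image)                      # assumed: the pipeline accepts a PIL image directly
output.image.save("reconstruction.jpg")   # assumed output attribute
```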
