
The vae encoder of the first_stage_model #345

Open
forgetable233 opened this issue Apr 10, 2024 · 4 comments

@forgetable233

I'm using the sv3d_p model, and I noticed that the VAE encoder of the first_stage_model is not provided in the checkpoint.
Which VAE encoder is used for the first_stage_model during training?

@JiuTongBro

Same question.

@pengc02

pengc02 commented May 12, 2024

Hi guys, I'm also focused on this. It seems that SV3D uses the same encoder and decoder as SVD, and SVD's encoder is released on Hugging Face. You can refer to https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt/tree/main/vae for the checkpoint, https://github.com/huggingface/diffusers/blob/v0.24.0-release/src/diffusers/models/autoencoder_kl_temporal_decoder.py for the model code, and https://github.com/huggingface/diffusers/blob/v0.24.0-release/src/diffusers/pipelines/stable_video_diffusion/pipeline_stable_video_diffusion.py for how to use it.
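
For reference, here is a minimal sketch (an editorial addition, not from the thread) of loading that VAE with diffusers and encoding frames into first-stage latents. The model ID and subfolder follow the Hugging Face link above; the input shape and the use of scaling_factor mirror the SVD pipeline and should be verified against your own setup.

import torch
from diffusers import AutoencoderKLTemporalDecoder

# Load the SVD VAE (encoder + temporal decoder) from the checkpoint linked above.
vae = AutoencoderKLTemporalDecoder.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt", subfolder="vae"
)
vae.eval().requires_grad_(False)

# Dummy batch of frames in [-1, 1]; replace with real, normalized video frames.
frames = torch.randn(1, 3, 576, 576)

with torch.no_grad():
    # Encode to first-stage latents, scaled as in the SVD pipeline.
    latents = vae.encode(frames).latent_dist.sample()
    latents = latents * vae.config.scaling_factor
    # Round trip: the temporal decoder needs the frame count of the batch.
    recon = vae.decode(latents / vae.config.scaling_factor, num_frames=1).sample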

@chenshuo20

@pengc02 thx! It helps.

@chenshuo20

> It seems that SV3D uses the same encoder and decoder as SVD, and SVD's encoder is released on Hugging Face. […]

Also, I found that you can use the following config to load the VAE model:

vae_encoder_config:
  target: src.diffusers.models.autoencoders.autoencoder_kl_temporal_decoder.AutoencoderKLTemporalDecoder
  params:
    block_out_channels: [128, 256, 512, 512]
    layers_per_block: 2
    in_channels: 3
    out_channels: 3
    down_block_types: ["DownEncoderBlock2D", "DownEncoderBlock2D", "DownEncoderBlock2D", "DownEncoderBlock2D"]
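
A hypothetical sketch of instantiating the encoder from a config like this, using the instantiate_from_config helper from this repo's sgm.util. The YAML filename is made up, and the target path assumes diffusers is vendored under src/; with a pip-installed diffusers you would point target at the corresponding diffusers.models module instead.

from omegaconf import OmegaConf
from sgm.util import instantiate_from_config

# Hypothetical YAML file containing the vae_encoder_config block above.
conf = OmegaConf.load("vae_encoder.yaml")
vae_encoder = instantiate_from_config(conf.vae_encoder_config)
# Weights still need to be loaded separately, e.g. from the Hugging Face VAE
# checkpoint linked earlier in this thread.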
