
Question about hyperparameters #287

@pseudo-usama

Description


I have quite a lot of questions about this model. I've successfully trained latent diffusion on the AFHQ dataset, but I'm having a hard time understanding many of the hyperparameters in the YAML files.

In the autoencoder YAML:

  • embed_dim: Why are we using embeddings in an autoencoder?
  • n_embed: What is this?
  • double_z: What is the purpose of this? I've noticed that it's True for the KL autoencoder and False for the VQ autoencoder. Why?
  • ch: I know it means channels, but how does this change the model architecture?
  • ch_mult: How does this work?
  • lossconfig.target: This is set to taming.modules.losses.vqperceptual.VQLPIPSWithDiscriminator. Is it using a discriminator (as in a GAN)? Why does an autoencoder need a discriminator?
  • lossconfig.params.disc_weight: Is that related to the discriminator in VQLPIPSWithDiscriminator, and how does it influence it?
  • lossconfig.params.codebook_weight: What is a codebook weight in a VQ autoencoder?
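
For context, here is roughly where these keys sit in a VQ autoencoder config. This is an illustrative sketch, not copied from any specific file in the repo; the target path and all values are assumptions:

```yaml
model:
  target: ldm.models.autoencoder.VQModel   # assumed target class
  params:
    embed_dim: 3            # dimensionality of each latent/codebook vector
    n_embed: 8192           # number of entries in the VQ codebook
    lossconfig:
      target: taming.modules.losses.vqperceptual.VQLPIPSWithDiscriminator
      params:
        disc_start: 50001       # step at which the discriminator term starts
        disc_weight: 0.8        # weight of the adversarial (discriminator) term
        codebook_weight: 1.0    # weight of the VQ codebook/commitment loss
    ddconfig:
      double_z: false     # false for VQ; KL autoencoders use true (mean + logvar)
      z_channels: 3
      resolution: 256
      in_channels: 3
      out_ch: 3
      ch: 128             # base channel count of the first conv block
      ch_mult: [1, 2, 4]  # per-downsampling-stage multipliers applied to ch
      num_res_blocks: 2
      attn_resolutions: []
      dropout: 0.0
```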

In the latent diffusion YAML:

  • first_stage_key: What is this? In every YAML file it's set to image.
  • num_timesteps_cond: What does this do? In every file it's set to 1.
  • log_every_t: How does this work?
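
For reference, these keys appear together under model.params in the diffusion config. A minimal sketch, with illustrative values and comments that reflect my assumptions rather than the repo's documentation:

```yaml
model:
  target: ldm.models.diffusion.ddpm.LatentDiffusion   # assumed target class
  params:
    first_stage_key: image    # which entry of the data batch the first stage encodes
    num_timesteps_cond: 1     # number of conditioning timesteps
    log_every_t: 200          # log intermediate samples every this many diffusion steps
    timesteps: 1000
    image_size: 64
    channels: 3
```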

I would be grateful for any form of assistance. Thank you!
