Issue with Changing embedding_dim in VQ-VAE Model #138

Open
arsh-rl opened this issue Apr 1, 2024 · 1 comment

arsh-rl commented Apr 1, 2024

Hi @clementchadebec.
Thank you for creating this repository.

I am trying to train a VQ-VAE model, but I could not find an embedding_dim argument in either the VQVAEConfig or the VQVAE class that would let me set it.
As far as I can tell, the only place embedding_dim is assigned is inside the _set_quantizer function of the VQVAE class, where it appears to be hard-coded to one.

    def _set_quantizer(self, model_config):
        if model_config.input_dim is None:
            raise AttributeError(
                "No input dimension provided !"
                "'input_dim' parameter of VQVAEConfig instance must be set to 'data_shape' where "
                "the shape of the data is (C, H, W ..). Unable to set quantizer."
            )

        x = torch.randn((2,) + self.model_config.input_dim)
        z = self.encoder(x).embedding
        if len(z.shape) == 2:
            z = z.reshape(z.shape[0], 1, 1, -1)

        z = z.permute(0, 2, 3, 1)

        self.model_config.embedding_dim = z.shape[-1]

z.shape[-1] always holds the value 1.
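
To make the observation above concrete, here is a shape-only trace of the reshape and permute steps in the snippet as quoted. It is a sketch, not library code: `reshape_shape`, `permute_shape` and `traced_embedding_dim` are hypothetical helpers that work on plain tuples, and the example shapes `(2, 16)` and `(2, 64, 4, 4)` are assumed encoder outputs chosen for illustration.

```python
def reshape_shape(shape, new_shape):
    """Resolve a single -1 in new_shape the way Tensor.reshape would."""
    total = 1
    for d in shape:
        total *= d
    known = 1
    for d in new_shape:
        if d != -1:
            known *= d
    return tuple(total // known if d == -1 else d for d in new_shape)


def permute_shape(shape, dims):
    """Output dimension i takes the size of input dimension dims[i]."""
    return tuple(shape[d] for d in dims)


def traced_embedding_dim(z_shape):
    """Follow the shape of z through the quoted lines of _set_quantizer."""
    if len(z_shape) == 2:
        # z = z.reshape(z.shape[0], 1, 1, -1)
        z_shape = reshape_shape(z_shape, (z_shape[0], 1, 1, -1))
    # z = z.permute(0, 2, 3, 1)
    z_shape = permute_shape(z_shape, (0, 2, 3, 1))
    # self.model_config.embedding_dim = z.shape[-1]
    return z_shape[-1]


# Assumed flat encoder output: (batch, latent_dim)
flat = traced_embedding_dim((2, 16))
# Assumed convolutional encoder output: (batch, C, H, W)
conv = traced_embedding_dim((2, 64, 4, 4))
print(flat, conv)  # 1 64
```

Under this trace, a flat `(batch, latent_dim)` output becomes `(batch, 1, latent_dim, 1)` after the permute, so the last dimension is 1 whatever latent_dim is, while a 4-D `(batch, C, H, W)` output yields C. That would be consistent with seeing 1 whenever the encoder returns a flat embedding.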

@clementchadebec
Owner

Hi @arsh-rl,

Actually, this value is set automatically to either the number of channels of your encoded sample or, for a flattened encoded input, the size of your latent space. This is needed to quantize the encoded sample within the codebook. To change this value, either adapt your encoder architecture so that it outputs a sample with the required embedding dimension (if you created your own), or change the latent_dim in your config (if you use the default nets). In the latter case, also do not forget to provide the input dimension of your data.

I hope this helps.

Best,

Clément
