
Training code? #1

Open
kabachuha opened this issue Jun 21, 2024 · 6 comments

Comments

@kabachuha

Hi!

This is extremely nice work. Do you have plans to release the training code?

@cornettoyu
Collaborator

Hi,

Thanks for your interest. For the training code and beyond, we are still working through the internal approval process. In the meantime, feel free to ask about any technical details and I'd be more than happy to address them :)

@reedscot

Awesome work! May I ask which VQGAN implementation was used for the proxy codes?

@cornettoyu
Collaborator

> Awesome work! May I ask which VQGAN implementation was used for the proxy codes?

Thanks for your interest! For the proxy codes in warm-up training, we used MaskGIT-VQGAN. The original implementation is in JAX and can be found at https://github.com/google-research/maskgit. We used the PyTorch version from Hugging Face's open-muse, which provides a PyTorch reimplementation with weights ported from JAX.

@vkramanuj

vkramanuj commented Jun 22, 2024

Thanks for the great work!

I have a detailed question about the proxy codes. The MaskGIT-VQGAN produces a fixed-length set of codes (256 or 1024). How do you distill that into 32 or 64 codes during the warm-up procedure for the smaller models? Perhaps I'm misunderstanding the paper. Thanks!

@cornettoyu
Collaborator

> Thanks for the great work!
>
> I have a detailed question about the proxy codes. The MaskGIT-VQGAN produces a fixed-length set of codes (256 or 1024). How do you distill that into 32 or 64 codes during the warm-up procedure for the smaller models? Perhaps I'm misunderstanding the paper. Thanks!

The MaskGIT-VQGAN codes are used to supervise the output of TiTok's de-tokenizer, similar to BEiT. Since we use a masked token sequence (BERT/MAE style) to reconstruct the target sequence, it does not matter how many tokens we use or how many they use. We do not apply any distillation loss between TiTok's codebook/embeddings and MaskGIT-VQGAN's codebook/embeddings. Hope this addresses your question.
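To make the BERT/MAE-style setup concrete, here is a minimal PyTorch sketch of how such proxy-code supervision could look. All names and shapes below are illustrative assumptions, not the authors' actual code: the de-tokenizer consumes the K latent tokens plus S learnable mask tokens (one per spatial position of the frozen VQGAN's grid) and predicts a distribution over the VQGAN codebook at each position, trained with plain cross-entropy — so K and S never need to match.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical sizes: K compact latent tokens from TiTok's encoder,
# S spatial code positions from the frozen MaskGIT-VQGAN (16x16 = 256),
# D model width, VOCAB codebook size.
K, S, D, VOCAB = 32, 256, 512, 1024

class ProxyCodeDeTokenizer(nn.Module):
    """Sketch: reconstruct the VQGAN code sequence from masked queries."""
    def __init__(self):
        super().__init__()
        self.mask_token = nn.Parameter(torch.zeros(1, 1, D))
        layer = nn.TransformerEncoderLayer(D, nhead=8, batch_first=True)
        self.decoder = nn.TransformerEncoder(layer, num_layers=2)
        self.to_logits = nn.Linear(D, VOCAB)

    def forward(self, latent_tokens):                # (B, K, D)
        B = latent_tokens.shape[0]
        masks = self.mask_token.expand(B, S, D)      # one query per position
        x = torch.cat([latent_tokens, masks], dim=1) # (B, K + S, D)
        x = self.decoder(x)
        return self.to_logits(x[:, K:])              # (B, S, VOCAB)

de_tokenizer = ProxyCodeDeTokenizer()
latents = torch.randn(2, K, D)                   # stand-in encoder output
target_codes = torch.randint(0, VOCAB, (2, S))   # frozen VQGAN code indices
logits = de_tokenizer(latents)
loss = F.cross_entropy(logits.reshape(-1, VOCAB), target_codes.reshape(-1))
```

Note that the loss compares predicted code *indices* only; no term ties TiTok's embedding space to the VQGAN's, matching the reply above.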

@jeasinema

Thank you for the reply. I think I get it, but please correct me if I'm wrong: during the warm-up stage, Eq. (4) in the main paper is different in two ways: 1) it produces the codes of MaskGIT-VQGAN instead of pixels directly; 2) those codes are then decoded into pixels by the MaskGIT-VQGAN decoder.

I would appreciate it if you could update the main paper with an equation clarifying these differences, if this is the case. That would help readers a lot!
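Under this reading, the warm-up pipeline could be written schematically as follows (symbols are hypothetical, not the paper's notation):

```latex
% Warm-up stage, as the commenter understands it:
% the de-tokenizer predicts VQGAN codes, which a frozen decoder maps to pixels.
\mathbf{z} = \mathrm{Enc}_{\text{TiTok}}(\mathbf{x}), \qquad
\hat{\mathbf{c}} = \mathrm{DeTok}_{\text{TiTok}}(\mathbf{z}), \qquad
\hat{\mathbf{x}} = \mathrm{Dec}_{\text{VQGAN}}(\hat{\mathbf{c}}),
```

where $\hat{\mathbf{c}}$ is supervised against the frozen MaskGIT-VQGAN's code sequence for $\mathbf{x}$, and $\mathrm{Dec}_{\text{VQGAN}}$ is frozen.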
