
Training code? #1

Open
kabachuha opened this issue Jun 21, 2024 · 6 comments

Comments

@kabachuha

Hi!

This is extremely nice work. Do you have plans to release the training code?

@cornettoyu
Collaborator

Hi,

Thanks for your interest. For the training code and beyond, we are still working through the internal approval process. In the meantime, feel free to ask about any technical details and I'd be more than happy to address them :)

@reedscot

Awesome work! May I ask which VQGAN implementation was used for the proxy codes?

@cornettoyu
Collaborator

> Awesome work! May I ask which VQGAN implementation was used for the proxy codes?

Thanks for your interest! For the proxy codes in warm-up training, we used MaskGIT-VQGAN. The original implementation is in JAX and can be found at https://github.com/google-research/maskgit. We used the PyTorch version from Hugging Face's open-muse, which provides a PyTorch reimplementation with weights ported from JAX.

@vkramanuj

vkramanuj commented Jun 22, 2024

Thanks for the great work!

I have a detailed question about the proxy codes. The MaskGIT-VQGAN produces a fixed-length set of codes (256 or 1024). How do you distill that into 32 or 64 codes during the warm-up procedure for the smaller models? Perhaps I'm misunderstanding the paper. Thanks!

@cornettoyu
Collaborator

> Thanks for the great work!
>
> I have a detailed question about the proxy codes. The MaskGIT-VQGAN produces a fixed-length set of codes (256 or 1024). How do you distill that into 32 or 64 codes during the warm-up procedure for the smaller models? Perhaps I'm misunderstanding the paper. Thanks!

The MaskGIT-VQGAN codes are used to supervise the output of TiTok's de-tokenizer, similar to BEiT. Since we use a masked token sequence (BERT/MAE style) to reconstruct the target sequence, it does not matter how many tokens we use or how many they use. We do not apply any distillation loss between TiTok's codebook/embeddings and MaskGIT-VQGAN's codebook/embeddings. Hope this addresses your question.
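To make the BERT/MAE-style setup concrete, here is a minimal PyTorch sketch of how such proxy-code supervision could look. All names and shapes below are illustrative assumptions, not the authors' actual code: the de-tokenizer consumes the K latent tokens plus S learnable mask tokens (one per spatial position of the frozen VQGAN's grid) and predicts a distribution over the VQGAN codebook at each position, trained with plain cross-entropy — so K and S never need to match.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical sizes: K compact latent tokens from TiTok's encoder,
# S spatial code positions from the frozen MaskGIT-VQGAN (16x16 = 256),
# D model width, VOCAB codebook size.
K, S, D, VOCAB = 32, 256, 512, 1024

class ProxyCodeDeTokenizer(nn.Module):
    """Sketch: reconstruct the VQGAN code sequence from masked queries."""
    def __init__(self):
        super().__init__()
        self.mask_token = nn.Parameter(torch.zeros(1, 1, D))
        layer = nn.TransformerEncoderLayer(D, nhead=8, batch_first=True)
        self.decoder = nn.TransformerEncoder(layer, num_layers=2)
        self.to_logits = nn.Linear(D, VOCAB)

    def forward(self, latent_tokens):                # (B, K, D)
        B = latent_tokens.shape[0]
        masks = self.mask_token.expand(B, S, D)      # one query per position
        x = torch.cat([latent_tokens, masks], dim=1) # (B, K + S, D)
        x = self.decoder(x)
        return self.to_logits(x[:, K:])              # (B, S, VOCAB)

de_tokenizer = ProxyCodeDeTokenizer()
latents = torch.randn(2, K, D)                   # stand-in encoder output
target_codes = torch.randint(0, VOCAB, (2, S))   # frozen VQGAN code indices
logits = de_tokenizer(latents)
loss = F.cross_entropy(logits.reshape(-1, VOCAB), target_codes.reshape(-1))
```

Note that the loss compares predicted code *indices* only; no term ties TiTok's embedding space to the VQGAN's, matching the reply above.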

@jeasinema

Thank you for the reply. I think I get it, but please correct me if I'm wrong: during the warm-up stage, Eq. (4) in the main paper is different in two ways: 1) it produces the codes of MaskGIT-VQGAN instead of pixels directly; 2) those codes are then decoded into pixels by the MaskGIT-VQGAN decoder.

I would appreciate it if you could update the main paper with an equation clarifying these differences, if this is the case. That would help readers a lot!
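Under this reading, the warm-up pipeline could be written schematically as follows (symbols are hypothetical, not the paper's notation):

```latex
% Warm-up stage, as the commenter understands it:
% the de-tokenizer predicts VQGAN codes, which a frozen decoder maps to pixels.
\mathbf{z} = \mathrm{Enc}_{\text{TiTok}}(\mathbf{x}), \qquad
\hat{\mathbf{c}} = \mathrm{DeTok}_{\text{TiTok}}(\mathbf{z}), \qquad
\hat{\mathbf{x}} = \mathrm{Dec}_{\text{VQGAN}}(\hat{\mathbf{c}}),
```

where $\hat{\mathbf{c}}$ is supervised against the frozen MaskGIT-VQGAN's code sequence for $\mathbf{x}$, and $\mathrm{Dec}_{\text{VQGAN}}$ is frozen.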
