Hi, thanks for sharing this great work and the code!
I’m currently experimenting with a similar setup and had a question regarding the behavior of the codebook during training.
Specifically, I’m wondering whether you observed any codebook collapse issues (e.g., low perplexity, low active code ratio, high code reuse) when training your model. In my own experiments, I sometimes see the model relying on only a small subset of codes.
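For reference, this is roughly how I'm tracking codebook usage on my side (a minimal PyTorch sketch; the function and variable names are just illustrative, not from your code):

```python
import torch
import torch.nn.functional as F

def codebook_usage_stats(code_indices: torch.Tensor, num_codes: int):
    """Perplexity and active-code ratio from a batch of selected code indices.

    code_indices: LongTensor of any shape, with values in [0, num_codes).
    """
    # Empirical usage distribution over the codebook for this batch
    one_hot = F.one_hot(code_indices.flatten(), num_classes=num_codes).float()
    probs = one_hot.mean(dim=0)

    # Perplexity = exp(entropy); equals num_codes when usage is perfectly uniform
    perplexity = torch.exp(-(probs * torch.log(probs + 1e-10)).sum())

    # Fraction of codes selected at least once in this batch
    active_ratio = (probs > 0).float().mean()

    return perplexity.item(), active_ratio.item()


# Example with a 1024-entry codebook and dummy indices
indices = torch.randint(0, 1024, (64, 16, 16))
ppl, active = codebook_usage_stats(indices, num_codes=1024)
print(f"perplexity: {ppl:.1f}, active code ratio: {active:.2%}")
```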
If you did encounter such issues, I would really appreciate it if you could share:
- Whether this was a problem in your setting
- What techniques you found effective to mitigate it
Thanks a lot for your time and for making your work available!