Questions about the image generation? #9
Comments
Thank you for your interest! Yes, Anole uses the same VQGAN as Chameleon. As you mentioned, the open-sourced version of Chameleon doesn't support vision generation. Anole fine-tunes Chameleon to restore its image and multimodal generation capabilities.
Thank you for the quick reply.
The VQGAN part seems to work pretty well. In our experiments, the reconstructed images look pretty much the same as the originals.
We did not change the VQGAN tokenizer.
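For readers unfamiliar with what the reconstruction test above is checking: a VQGAN tokenizer maps image features to discrete codebook ids (encode) and looks them back up (decode). The toy sketch below illustrates just that quantization round-trip with a random codebook; the codebook size, dimensions, and nearest-neighbor lookup are illustrative assumptions, not Chameleon's actual VQGAN (which uses learned conv encoders/decoders):

```python
import numpy as np

# Toy stand-in for a VQGAN tokenizer: each patch embedding is mapped to
# the nearest entry in a codebook ("encode" -> discrete token ids), and
# decoding looks the ids back up ("decode" -> embeddings). The codebook
# here is random and purely illustrative.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(16, 4))  # 16 codes, 4-dim embeddings

def encode(patches):
    # nearest-codebook-entry lookup per patch
    d = ((patches[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d.argmin(axis=1)          # discrete token ids

def decode(token_ids):
    return codebook[token_ids]       # embeddings recovered from ids

# Patches that sit close to codes 3, 7, 1 reconstruct almost exactly,
# which is the behavior the reconstruction experiment above reports.
patches = codebook[[3, 7, 1]] + 0.01 * rng.normal(size=(3, 4))
tokens = encode(patches)
recon = decode(tokens)
print(tokens)
print(np.abs(recon - patches).max())  # small reconstruction error
```

If the tokenizer is unchanged (as the maintainers say), reconstruction quality is inherited directly from Chameleon's VQGAN weights.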
I would love a semi-descriptive (ELI-a-40-year-old full-stack eng) writeup on how this is achieved.
@matbee-eth Thank you for your interest! This is our paper: https://arxiv.org/abs/2407.06135
Does Chameleon have to be fine-tuned, as Anole has done, to activate the intrinsic image generation capability that was disabled? Did you experiment with directly applying the original Chameleon weights to generate images without fine-tuning (since the VQGAN decoder weights are provided by Meta, Chameleon could theoretically generate images without fine-tuning)? Can you explain this further? Many thanks!
I'd also like to know whether you fine-tuned the VQGAN or used the weights from Chameleon directly. Many thanks!
Hi @mutonix , Chameleon doesn't support image generation. For more information, please see this issue. Anole is fine-tuned from Chameleon to facilitate image generation and multimodal generation.
In the issue you mentioned, he does not say that he commented out the following code (or similar code) in the original Chameleon implementation:
Maybe that is why he could not get correct images. Have you tried commenting out the above code to generate images directly? My confusion is whether fine-tuning is required to activate the image generation capability, or whether commenting out a few lines of code is enough.
@mutonix I can't find anything like that in the original Chameleon implementation, only in the transformers version distributed with Anole (for fine-tuning purposes?): modeling_chameleon.py#L1627. I tried swapping the original Chameleon 7B weights for the Anole 7B weights and running the original Chameleon Miniviewer. It appears to be capable of generating coherent images only when using the Anole weights.
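For what it's worth, the suppression mechanism being debated here (masking image-token logits so the sampler can never emit them) can be sketched in isolation. The vocabulary layout, sizes, and `-inf` masking below are illustrative assumptions for the demo, not Chameleon's or Anole's actual implementation:

```python
import numpy as np

# Illustrative sketch: if a model "bans" image generation by setting the
# logits of image tokens to -inf before sampling, those tokens receive
# exactly zero probability, so the decoder can only ever emit text
# tokens. Removing the mask (or fine-tuning so the model produces
# coherent image tokens, as Anole does) makes them reachable again.
TEXT_TOKENS = np.arange(0, 8)    # pretend ids 0-7 are text tokens
IMAGE_TOKENS = np.arange(8, 16)  # pretend ids 8-15 are image codes

def softmax(x):
    e = np.exp(x - np.max(x[np.isfinite(x)]))
    return e / e.sum()

rng = np.random.default_rng(0)
logits = rng.normal(size=16)     # one step of next-token logits

masked = logits.copy()
masked[IMAGE_TOKENS] = -np.inf   # the "ban": image tokens unreachable
p_masked = softmax(masked)
p_free = softmax(logits)

print(p_masked[IMAGE_TOKENS].sum())  # 0.0 -> image tokens never sampled
print(p_free[IMAGE_TOKENS].sum())    # > 0 once the mask is removed
```

This also illustrates why unmasking alone may not suffice: the mask only controls reachability, while whether the unmasked image-token logits actually form coherent images depends on the underlying weights.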
The original Chameleon release seems to include the VQGAN decoder, so how does Chameleon disable the image generation ability, and what does Anole do to reactivate it? For example, does Chameleon remove the logits corresponding to image tokens in the last layer, and does Anole add them back?
Thanks for this great work!
I have some questions about image generation in Anole. Does Anole use the VQGAN decoder from Chameleon? Since Chameleon has also released the VQGAN weights for image generation (though they claim the image generation function is disabled), what new things does Anole add?
Many thanks!