Questions about the image generation? #9
Comments
Thank you for your interest! Yes, Anole uses the same VQGAN as Chameleon. As you mentioned, the open-sourced version of Chameleon doesn't support vision generation. Anole fine-tunes Chameleon to restore its image and multimodal generation capabilities.
Thank you for the quick reply.
The VQGAN part seems to work pretty well. In our experiments, the reconstructed images look pretty much the same as the originals.
We did not change the VQGAN tokenizer.
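For readers unfamiliar with what the reconstruction test above is checking: a VQGAN tokenizer maps image features to discrete codebook ids (encode) and looks them back up (decode). The toy sketch below illustrates just that quantization round-trip with a random codebook; the codebook size, dimensions, and nearest-neighbor lookup are illustrative assumptions, not Chameleon's actual VQGAN (which uses learned conv encoders/decoders):

```python
import numpy as np

# Toy stand-in for a VQGAN tokenizer: each patch embedding is mapped to
# the nearest entry in a codebook ("encode" -> discrete token ids), and
# decoding looks the ids back up ("decode" -> embeddings). The codebook
# here is random and purely illustrative.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(16, 4))  # 16 codes, 4-dim embeddings

def encode(patches):
    # nearest-codebook-entry lookup per patch
    d = ((patches[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d.argmin(axis=1)          # discrete token ids

def decode(token_ids):
    return codebook[token_ids]       # embeddings recovered from ids

# Patches that sit close to codes 3, 7, 1 reconstruct almost exactly,
# which is the behavior the reconstruction experiment above reports.
patches = codebook[[3, 7, 1]] + 0.01 * rng.normal(size=(3, 4))
tokens = encode(patches)
recon = decode(tokens)
print(tokens)
print(np.abs(recon - patches).max())  # small reconstruction error
```

If the tokenizer is unchanged (as the maintainers say), reconstruction quality is inherited directly from Chameleon's VQGAN weights.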
I would love a semi-descriptive (ELI-a-40-year-old full-stack eng) writeup on how this is achieved.
@matbee-eth Thank you for your interest! This is our paper: https://arxiv.org/abs/2407.06135
Does Chameleon have to be fine-tuned, as Anole has done, to activate the intrinsic image generation capability that was disabled? Did you experiment with directly applying the original Chameleon weights to generate images without fine-tuning (since the VQGAN decoder weights are provided by Meta, Chameleon could theoretically generate images without fine-tuning)? Can you explain this further? Many thanks!
I'd also like to know whether you fine-tuned the VQGAN or used the weights from Chameleon directly. Many thanks!
Hi @mutonix , Chameleon doesn't support image generation. For more information, please see this issue. Anole is fine-tuned from Chameleon to facilitate image generation and multimodal generation.
In the issue you mentioned, he does not say that he commented out the following code (or similar code) in the original Chameleon implementation:
Maybe that is why he could not get correct images. Have you tried commenting out the above code to generate images directly? My confusion is whether fine-tuning is required to activate the image generation capability, or whether commenting out a few lines of code is enough.
@mutonix I can't find anything like that in the original Chameleon implementation, only in the transformers version distributed with Anole (for fine-tuning purposes?): modeling_chameleon.py#L1627. I tried swapping the original Chameleon 7B weights for the Anole 7B weights and running the original Chameleon Miniviewer. It appears to be capable of generating coherent images only when using the Anole weights.
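For what it's worth, the suppression mechanism being debated here (masking image-token logits so the sampler can never emit them) can be sketched in isolation. The vocabulary layout, sizes, and `-inf` masking below are illustrative assumptions for the demo, not Chameleon's or Anole's actual implementation:

```python
import numpy as np

# Illustrative sketch: if a model "bans" image generation by setting the
# logits of image tokens to -inf before sampling, those tokens receive
# exactly zero probability, so the decoder can only ever emit text
# tokens. Removing the mask (or fine-tuning so the model produces
# coherent image tokens, as Anole does) makes them reachable again.
TEXT_TOKENS = np.arange(0, 8)    # pretend ids 0-7 are text tokens
IMAGE_TOKENS = np.arange(8, 16)  # pretend ids 8-15 are image codes

def softmax(x):
    e = np.exp(x - np.max(x[np.isfinite(x)]))
    return e / e.sum()

rng = np.random.default_rng(0)
logits = rng.normal(size=16)     # one step of next-token logits

masked = logits.copy()
masked[IMAGE_TOKENS] = -np.inf   # the "ban": image tokens unreachable
p_masked = softmax(masked)
p_free = softmax(logits)

print(p_masked[IMAGE_TOKENS].sum())  # 0.0 -> image tokens never sampled
print(p_free[IMAGE_TOKENS].sum())    # > 0 once the mask is removed
```

This also illustrates why unmasking alone may not suffice: the mask only controls reachability, while whether the unmasked image-token logits actually form coherent images depends on the underlying weights.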
The original Chameleon release seems to include the VQGAN decoder, so how does Chameleon disable the image generation ability, and what does Anole do to reactivate it? For example, does Chameleon remove the logits corresponding to image tokens in the last layer, and does Anole add them back?
Thanks for this great work!
I have some questions about image generation in Anole. Does Anole use the VQGAN decoder from Chameleon? Since Chameleon has also released the VQGAN weights for image generation (though they claim the image generation function is disabled), what new things does Anole add?
Many thanks!