
About the choices in LLaVA+S^2 implementation #10

Open
jungle-gym-ac opened this issue May 26, 2024 · 2 comments

Comments

@jungle-gym-ac

Great work! I've read the paper, and it seems LLaVA+S^2 is implemented with the OpenCLIP vision encoder and the LLM is fine-tuned with LoRA. However, the LLaVA baseline you compare against uses the OpenAI CLIP vision encoder, and its LLM is fully fine-tuned (without LoRA).

If I'm right, I wonder whether you have tried using the same vision encoder, or fully fine-tuning the LLM, and what the results are in those settings? Thank you.
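
For reference, a minimal sketch of the two LLM-tuning regimes discussed here, using HuggingFace Transformers and PEFT; the base model name, LoRA rank/alpha, and target modules are illustrative assumptions, not necessarily what either setup used:

```python
# Sketch only: contrasts full fine-tuning with LoRA fine-tuning of the LLM.
# Model name and LoRA hyperparameters are assumptions for illustration.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

llm = AutoModelForCausalLM.from_pretrained("lmsys/vicuna-7b-v1.5")

# Full fine-tuning: every LLM parameter receives gradients.
for p in llm.parameters():
    p.requires_grad = True

# LoRA fine-tuning: freeze the base weights and train low-rank adapters only.
lora_cfg = LoraConfig(
    r=128,
    lora_alpha=256,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed targets
    task_type="CAUSAL_LM",
)
llm_lora = get_peft_model(llm, lora_cfg)
llm_lora.print_trainable_parameters()  # only adapter weights are trainable
```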

bfshi (Owner) commented May 26, 2024

Hi @jungle-gym-ac, yeah, good question. In the scaling experiment on LLaVA (Fig. 3 in the paper), all the models, including the baselines, use OpenCLIP. The experiment comparing LLaVA-S^2 to official LLaVA (Table 11 in the Appendix) uses OpenAI CLIP.

And you are right: all the models we trained on LLaVA use LoRA, while the official LLaVA checkpoint we compare to uses full fine-tuning. According to the official LLaVA repo, LLaVA's performance with full fine-tuning vs. LoRA doesn't differ much on average, but yeah, comparing against an official LoRA checkpoint would be fairer. We will include this in a later version of the paper. We didn't try LLaVA-S^2 with full fine-tuning.
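
A rough sketch of the multiscale forward that S^2 wraps around a vision tower, based on the s2wrapper README; the encoder choice (OpenAI CLIP here), the scales, and the shapes in the comments are illustrative assumptions, not the exact LLaVA-S^2 training configuration:

```python
# Sketch only: extract S^2 multiscale features from a CLIP vision tower.
import torch
from transformers import CLIPVisionModel
from s2wrapper import forward as multiscale_forward

vision_tower = CLIPVisionModel.from_pretrained("openai/clip-vit-large-patch14-336")
vision_tower.eval()

def patch_tokens(x):
    # Return patch-token features (dropping the CLS token) as (B, N, C).
    return vision_tower(x).last_hidden_state[:, 1:]

images = torch.randn(1, 3, 336, 336)

with torch.no_grad():
    # Single-scale baseline: (1, 576, 1024) for a 336px, patch-14 ViT-L.
    base = patch_tokens(images)

    # S^2 with scales [1, 2]: the 672px version is split into 336px crops,
    # each crop goes through the same encoder, and the merged large-scale
    # features are concatenated channel-wise with the small-scale ones,
    # so the token count stays 576 while channels double to 2048.
    feats = multiscale_forward(patch_tokens, images, scales=[1, 2])

print(base.shape, feats.shape)
```

The point of the split-and-merge is that the encoder only ever sees inputs at its pretraining resolution, which is why S^2 can reuse an off-the-shelf CLIP or OpenCLIP tower without retraining it for larger images.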

bfshi (Owner) commented May 28, 2024 via email
