I understand that the output of the audio model is the input expected by the 7B model, not the 13B model. So can the audio branch use the 7B model while the LLM uses the 13B model? If not, what is the point of releasing vicuna13b-v2?
When launching the latest version of Video-LLaMA, you should use VL and AL checkpoints that were aligned with the same language decoder, because the VL branch and the AL branch share that decoder. So the answer to your first question is no.
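To make the shared-decoder constraint concrete, here is a minimal sketch. It assumes (as an illustration, not a description of Video-LLaMA's actual code) that each branch ends in a projection into the decoder's hidden space, and it uses the standard LLaMA/Vicuna hidden sizes (4096 for 7B, 5120 for 13B). `BranchCheckpoint`, `check_compatible`, and the checkpoint labels are hypothetical names. A branch aligned with the 7B decoder produces 4096-dimensional embeddings, which a 13B decoder cannot consume.

```python
from dataclasses import dataclass

# Hidden sizes of the LLaMA/Vicuna language decoders.
DECODER_HIDDEN_SIZE = {"vicuna-7b": 4096, "vicuna-13b": 5120}


@dataclass
class BranchCheckpoint:
    """Hypothetical description of a VL or AL branch checkpoint."""
    name: str
    aligned_decoder: str  # decoder the branch was aligned with during training
    proj_out_dim: int     # assumed output width of the branch's final projection


def check_compatible(decoder: str, *branches: BranchCheckpoint) -> list:
    """Return a list of problems; an empty list means every branch fits the decoder."""
    expected = DECODER_HIDDEN_SIZE[decoder]
    problems = []
    for b in branches:
        # Every branch must have been aligned with the same shared decoder.
        if b.aligned_decoder != decoder:
            problems.append(f"{b.name}: aligned with {b.aligned_decoder}, not {decoder}")
        # The branch's output embeddings must match the decoder's hidden size.
        if b.proj_out_dim != expected:
            problems.append(f"{b.name}: projects to {b.proj_out_dim} dims, decoder expects {expected}")
    return problems


# Hypothetical pairing from the question: a 13B-aligned VL branch and a 7B-aligned AL branch.
vl_13b = BranchCheckpoint("VL (vicuna13b-v2)", "vicuna-13b", 5120)
al_7b = BranchCheckpoint("AL (7B)", "vicuna-7b", 4096)

for problem in check_compatible("vicuna-13b", vl_13b, al_7b):
    print(problem)
```

Both reported problems come from the AL branch. The check passes only when every branch was aligned with the one decoder being loaded, which is why the 13B VL checkpoint cannot be paired with a 7B-aligned audio branch.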
As for your second question, the *-vicuna13b-v2 checkpoints (for the VL branch) were uploaded to Hugging Face in an earlier release, when Video-LLaMA contained only the VL branch (i.e., without audio support).