About choosing dataset format and pre-training weights #306

xmc-andy · 2023-11-13T07:29:01Z

Hello, authors! I have a question about choosing a dataset format and the corresponding weights. I am working on a classification task that takes multiple images and a prompt as input. If the multiple images are treated as a video, there are two options: SD format (a single <image> + a single <Users>, where <image> represents all images) and DC mode (a single <image> + multiple <Users>). As I understand it, they differ in how the prompt is used: DC mode is better suited to giving each picture its own detailed prompt, while SD mode is suited to a unified prompt for all pictures. Is my understanding correct?

In addition, I previously used the Image-MPT7B weights in SD mode, but it seems that the Video-LLaMA7B-DenseCaption weights in DC/SD mode are better suited to video frames. Is that correct?

Luodian · 2023-11-13T07:33:15Z

Yes, that's pretty much correct! I suggest using DC mode with the video-pretrained weights. As you can see in our web demo, the backend model is Video-LLaMA7B-DC.

Remember to put the multiple images as frames along the F dimension of the [B, T, F, C, H, W] tensor (debug at vision_x to see the actual dimensions during your training).
I also suggest trying both templates:

1. <image> + prompt
2. <image><image>...<image> + prompt

For training DC, we use the first.
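
A minimal sketch of the advice above, assuming NumPy arrays stand in for the preprocessed image tensors. The helper names `stack_frames` and `build_prompt` are hypothetical and not part of the repository, and the repository's actual prompt formatting may wrap the instruction in additional role or answer tokens that are omitted here. It shows the images stacked along the F dimension of a [B, T, F, C, H, W] array, plus the two prompt templates.

```python
import numpy as np


def stack_frames(images):
    """Stack a list of [C, H, W] arrays into one [B=1, T=1, F, C, H, W] array.

    Hypothetical helper: each image becomes one frame along the F axis.
    """
    frames = np.stack(images, axis=0)       # [F, C, H, W]
    return frames[np.newaxis, np.newaxis]   # [1, 1, F, C, H, W]


def build_prompt(instruction, num_images, repeat_image_token=False):
    """Build one of the two templates suggested in the thread.

    Hypothetical helper:
    - template 1: a single <image> token followed by the prompt
    - template 2: one <image> token per image followed by the prompt
    """
    count = num_images if repeat_image_token else 1
    return "<image>" * count + instruction


# Four dummy 3x224x224 images standing in for real preprocessed inputs.
images = [np.zeros((3, 224, 224), dtype=np.float32) for _ in range(4)]
vision_x = stack_frames(images)
print(vision_x.shape)

template_1 = build_prompt("Classify these images.", len(images))
template_2 = build_prompt("Classify these images.", len(images), repeat_image_token=True)
print(template_1)
print(template_2)
```

With PyTorch tensors the same layout could be produced with `torch.stack` followed by two `unsqueeze(0)` calls. Printing the shape of vision_x inside the training loop, as suggested above, is the way to confirm what the model actually receives.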

xmc-andy · 2023-11-13T07:35:47Z

Thank you so much!
