Zero-shot Evaluation Results

Tip 🚀: During inference we decrease lora_alpha from 32 to 20 to restore the model's language capabilities, which noticeably helps on question-answering (QA) style benchmarks. See FAQ.md for details.
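The arithmetic behind this tip is shown below. It assumes the standard LoRA formulation, as used in Hugging Face PEFT, where the low-rank update BA is scaled by lora_alpha / r. It also assumes the rank r is the same at training and inference:

```math
W = W_0 + \frac{\alpha}{r}\,BA,
\qquad
\frac{\alpha_{\text{infer}}/r}{\alpha_{\text{train}}/r} = \frac{20}{32} = 0.625
```

At inference, the adapter's contribution is therefore about 62.5% of its training-time strength. This plausibly explains the effect: the model is pulled partway back toward the base LLM's language behavior. The ratio does not depend on r. It would also be the same under rsLoRA's α/√r scaling.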

TempCompass

link: https://github.com/llyx97/TempCompass

leaderboard: https://huggingface.co/spaces/lyx97/TempCompass

evaluation scripts:

First, point MODEL_DIR, ANNO_DIR, and VIDEO_DIR in eval_tempcompass.sh to your local model, annotation, and video directories. Then run:
```
cd benchmark
sh eval_tempcompass.sh
```
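You can also set the three variables without opening the script in an editor, by rewriting them with sed. This is a minimal sketch, not part of the repository. It assumes each variable is assigned on its own line as `VAR=value`; we have not confirmed the script's layout. The `set_eval_dirs` helper is hypothetical. `sed -i.bak` edits the file in place and keeps a backup copy.

```sh
# Hypothetical helper (not part of the repo): rewrite the MODEL_DIR,
# ANNO_DIR, and VIDEO_DIR assignments in an eval script. Assumes each is
# defined on its own line as VAR=value; the original is saved as <script>.bak.
set_eval_dirs() {
    [ $# -eq 4 ] || {
        echo "usage: set_eval_dirs SCRIPT MODEL_DIR ANNO_DIR VIDEO_DIR" >&2
        return 2
    }
    script=$1 model_dir=$2 anno_dir=$3 video_dir=$4
    # | is the sed delimiter so paths containing / need no escaping;
    # paths must therefore not contain |, &, or backslashes.
    sed -i.bak \
        -e "s|^MODEL_DIR=.*|MODEL_DIR=$model_dir|" \
        -e "s|^ANNO_DIR=.*|ANNO_DIR=$anno_dir|" \
        -e "s|^VIDEO_DIR=.*|VIDEO_DIR=$video_dir|" \
        "$script"
}
```

For example, from inside `benchmark`, run `set_eval_dirs eval_tempcompass.sh /path/to/model /path/to/anno /path/to/videos` before `sh eval_tempcompass.sh`. The paths here are placeholders.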

results:

MVBench

link: https://github.com/OpenGVLab/Ask-Anything/tree/main/video_chat2

leaderboard: https://huggingface.co/spaces/OpenGVLab/MVBench_Leaderboard

evaluation scripts:

First, point MODEL_DIR, ANNO_DIR, and VIDEO_DIR in eval_mvbench.sh to your local model, annotation, and video directories. Then run:
```
cd benchmark
sh eval_mvbench.sh
```
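A forgotten or mistyped path usually surfaces only after the run has started. A quick preflight check can catch it earlier. This is a hypothetical sketch, not part of the repository. It assumes the three variables are plain `VAR=value` lines and that each should name an existing directory. `check_eval_dirs` reads the values back out of the script without executing it.

```sh
# Hypothetical preflight check (not part of the repo): read MODEL_DIR,
# ANNO_DIR, and VIDEO_DIR from an eval script (assumed VAR=value lines)
# and verify each names an existing directory. Returns 1 if any does not.
check_eval_dirs() {
    status=0
    for var in MODEL_DIR ANNO_DIR VIDEO_DIR; do
        # Take the last assignment of $var and drop any quote characters.
        value=$(sed -n "s/^$var=//p" "$1" | tail -n 1 | tr -d "\"'")
        if [ -z "$value" ] || [ ! -d "$value" ]; then
            echo "$var is unset or not a directory: '$value'" >&2
            status=1
        fi
    done
    return $status
}
```

Usage, from inside `benchmark`: `check_eval_dirs eval_mvbench.sh && sh eval_mvbench.sh`.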

results:

EgoSchema

link: https://github.com/egoschema/EgoSchema

leaderboard: https://www.kaggle.com/competitions/egoschema-public/overview

evaluation scripts:

First, point MODEL_DIR, ANNO_DIR, and VIDEO_DIR in eval_egoschema.sh to your local model, annotation, and video directories. Then run:
```
cd benchmark
sh eval_egoschema.sh
```
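Once all three scripts are configured, the benchmarks can be run back to back. The wrapper below is a hypothetical convenience, not part of the repository, and the `<name>.log` file naming is our own choice. It runs `eval_<name>.sh` for each name given. Each script's output goes to `<name>.log`, and the wrapper stops at the first script that exits with a non-zero status.

```sh
# Hypothetical wrapper (not part of the repo): run eval_<name>.sh for each
# argument from the current directory, logging to <name>.log and stopping
# at the first script that fails.
run_evals() {
    for name in "$@"; do
        echo "== $name ==" >&2
        if ! sh "eval_$name.sh" > "$name.log" 2>&1; then
            echo "eval_$name.sh failed; see $name.log" >&2
            return 1
        fi
    done
}
```

Usage: `cd benchmark && run_evals tempcompass mvbench egoschema`.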

results:

VideoMME

TBD