
Zero-shot Evaluation Results

Tip 🚀: At inference time we lower lora_alpha from 32 to 20 to restore the model's language capabilities, which helps considerably on QA-style benchmarks. See FAQ.md for details.
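
For intuition about the size of this change, here is a minimal sketch. It assumes the usual LoRA convention in which the adapter update is multiplied by a factor proportional to lora_alpha, so the rank cancels when comparing two alpha values. The variable names are illustrative and not taken from the repository.

```shell
# Assumption: the LoRA update is scaled proportionally to lora_alpha,
# so the ratio of inference-time to training-time scale is alpha_infer / alpha_train.
train_alpha=32
infer_alpha=20

# POSIX sh has no floating-point arithmetic, so awk does the division.
ratio=$(awk -v a="$infer_alpha" -v b="$train_alpha" 'BEGIN { printf "%.3f", a / b }')
echo "adapter contribution relative to training: ${ratio}"
```

Under that assumption, the adapter's contribution at inference is 0.625 of its training-time strength, which shifts the output toward the base language model.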

TempCompass

link: https://github.com/llyx97/TempCompass

leaderboard: https://huggingface.co/spaces/lyx97/TempCompass

evaluation scripts:

  1. First, set MODEL_DIR, ANNO_DIR, and VIDEO_DIR in eval_tempcompass.sh.

  2. run:

    cd benchmark
    sh eval_tempcompass.sh
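
Since each evaluation script depends on three directories, it can help to confirm they exist before starting a long run. The following is a minimal sketch of such a pre-flight check. `check_dirs` is a hypothetical helper, not part of the repository, and the paths in the usage comment are placeholders.

```shell
# Hypothetical helper: report every argument that is not an existing directory.
# Returns 0 if all directories exist, 1 otherwise.
check_dirs() {
    missing=0
    for dir in "$@"; do
        if [ ! -d "$dir" ]; then
            echo "missing directory: $dir" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage (placeholder paths; use the values you set in the evaluation script):
# check_dirs /path/to/model /path/to/annotations /path/to/videos && sh eval_tempcompass.sh
```

The same check applies unchanged to the other benchmarks, since their scripts use the same three variables.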

results:

MVBench

link: https://github.com/OpenGVLab/Ask-Anything/tree/main/video_chat2

leaderboard: https://huggingface.co/spaces/OpenGVLab/MVBench_Leaderboard

evaluation scripts:

  1. First, set MODEL_DIR, ANNO_DIR, and VIDEO_DIR in eval_mvbench.sh.

  2. run:

    cd benchmark
    sh eval_mvbench.sh

results:

EgoSchema

link: https://github.com/egoschema/EgoSchema

leaderboard: https://www.kaggle.com/competitions/egoschema-public/overview

evaluation scripts:

  1. First, set MODEL_DIR, ANNO_DIR, and VIDEO_DIR in eval_egoschema.sh.

  2. run:

    cd benchmark
    sh eval_egoschema.sh

results:

VideoMME

TBD