From aadcaf8bd0469f63a1e93af95eb91f42ecd678d1 Mon Sep 17 00:00:00 2001
From: liuyuanxin
Date: Tue, 12 Mar 2024 21:39:24 +0800
Subject: [PATCH] add TempCompass

---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index d3c33d7..8c6caf4 100644
--- a/README.md
+++ b/README.md
@@ -552,6 +552,7 @@ The first work to correct hallucinations in MLLMs. :sparkles:
 | **LAMM-Benchmark** | [LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark](https://arxiv.org/pdf/2306.06687.pdf) | [Link](https://github.com/OpenLAMM/LAMM#lamm-benchmark) | A benchmark for evaluating the quantitative performance of MLLMs on various 2D/3D vision tasks |
 | **M3Exam** | [M3Exam: A Multilingual, Multimodal, Multilevel Benchmark for Examining Large Language Models](https://arxiv.org/pdf/2306.05179.pdf) | [Link](https://github.com/DAMO-NLP-SG/M3Exam) | A multilingual, multimodal, multilevel benchmark for evaluating MLLMs |
 | **OwlEval** | [mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality](https://arxiv.org/pdf/2304.14178.pdf) | [Link](https://github.com/X-PLUG/mPLUG-Owl/tree/main/OwlEval) | A dataset for evaluating multiple capabilities |
+| **TempCompass** | [TempCompass: Do Video LLMs Really Understand Videos?](https://arxiv.org/pdf/2403.00476.pdf) | [Link](https://github.com/llyx97/TempCompass) | A benchmark for evaluating the temporal perception ability of Video LLMs |
 
 ## Others
 | Name | Paper | Link | Notes |