🔥🔥🔥 A Survey on Multimodal Large Language Models
Project Page [This Page] | Paper
The first comprehensive survey of Multimodal Large Language Models (MLLMs). ✨
Add WeChat ID (wmd_ustc) to join our MLLM discussion group! 🌟
🔥🔥🔥 VITA: Towards Open-Source Interactive Omni Multimodal LLM
You can try our Basic Demo directly on ModelScope. The Real-Time Interactive Demo must first be configured according to the instructions.
🔥🔥🔥 MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs
🔥🔥🔥 Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
Project Page | Paper | GitHub | Dataset | Leaderboard
We are very proud to launch Video-MME, the first-ever comprehensive evaluation benchmark of MLLMs in Video Analysis! 🌟
It includes short (< 2 min), medium (4 to 15 min), and long (30 to 60 min) videos, ranging from 11 seconds to 1 hour in length. All data are newly collected and annotated by humans rather than taken from any existing video dataset. ✨
🔥🔥🔥 MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models
Paper | Download | Eval Tool | ✒️ Citation
A representative evaluation benchmark for MLLMs. ✨
🔥🔥🔥 Woodpecker: Hallucination Correction for Multimodal Large Language Models
Paper | GitHub
This is the first work to correct hallucination in multimodal large language models. ✨
🔥🔥🔥 Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM
Project Page | Paper | GitHub
A speech-to-speech dialogue model that combines low latency with high intelligence and is trained on top of a frozen LLM. ✨
Title | Venue | Date | Code | Demo |
---|---|---|---|---|
Look Twice Before You Answer: Memory-Space Visual Retracing for Hallucination Mitigation in Multimodal Large Language Models | arXiv | 2024-10-04 | Github | - |
Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations | arXiv | 2024-10-03 | Github | - |
FIHA: Autonomous Hallucination Evaluation in Vision-Language Models with Davidson Scene Graphs | arXiv | 2024-09-20 | Link | - |
Alleviating Hallucination in Large Vision-Language Models with Active Retrieval Augmentation | arXiv | 2024-08-01 | - | - |
Paying More Attention to Image: A Training-Free Method for Alleviating Hallucination in LVLMs | ECCV | 2024-07-31 | Github | - |
Evaluating and Analyzing Relationship Hallucinations in LVLMs | ICML | 2024-06-24 | Github | - |