Year | Source | Name | Author | Content | Class |
---|---|---|---|---|---|
2024 | ArXiv | Hallucinations in Neural Automatic Speech Recognition: Identifying Errors and Hallucinatory Models | Rita Frieske | This paper defines hallucinations in ASR as transcriptions generated by a model that are semantically unrelated to the source utterance, yet still fluent and coherent. | Audio |
2022 | ACCV | Thinking Hallucination for Video Captioning | Nasib Ullah | To alleviate object hallucination and action hallucination, this paper proposes COAHA to comprehensively assess the extent of these two types of hallucination. | Video |
2019 | ArXiv | Object Hallucination in Image Captioning | Anna Rohrbach | This paper proposes a new image relevance metric to evaluate current models against veridical visual labels and assess their rate of object hallucination. | Image |
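As a toy illustration of the kind of object-hallucination rate such metrics measure (a simplified sketch with assumed names; the paper's actual metric additionally handles synonyms and relies on ground-truth visual labels), one can count the caption's object mentions that are absent from the image:

```python
def object_hallucination_rate(caption_objects, image_objects):
    """Fraction of object mentions in a caption that do not appear in the image.

    caption_objects: list of object words extracted from the generated caption
    image_objects:   collection of ground-truth objects present in the image
    """
    if not caption_objects:
        return 0.0
    present = set(image_objects)
    hallucinated = [obj for obj in caption_objects if obj not in present]
    return len(hallucinated) / len(caption_objects)
```

For example, a caption mentioning "dog", "frisbee", and "car" over an image containing only a dog and a frisbee would score 1/3 under this simplified rate.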
Year | Source | Name | Author | Content | Class |
---|---|---|---|---|---|
2024 | ArXiv | Survey on Factuality in Large Language Models: Knowledge, Retrieval and Domain-Specificity | Cunxiang Wang | This paper defines factuality as the capability of large language models to generate content that follows factual information, encompassing commonsense, world knowledge, and domain facts. | Language |
2017 | ArXiv | Overcoming catastrophic forgetting in neural networks | James Kirkpatrick | | normal |
Year | Source | Name | Author | Content | Class |
---|---|---|---|---|---|
2024 | ArXiv | AMBER: An LLM-free Multi-dimensional Benchmark for MLLMs Hallucination Evaluation | Junyang Wang | | Image |
Year | Source | Name | Author | Content | Class |
---|---|---|---|---|---|
2024 | ArXiv | MLLM-Protector: Ensuring MLLM’s Safety without Hurting Performance | Renjie Pi | | Image |
2023 | ArXiv | Investigating the Catastrophic Forgetting in Multimodal Large Language Models | Yuexiang Zhai | | Image |
2023 | ArXiv | Speciality vs Generality: An Empirical Study on Catastrophic Forgetting in Fine-tuning Foundation Models | Yong Lin | | Language |
2023 | ArXiv | t-DGR: A Trajectory-Based Deep Generative Replay Method for Continual Learning in Decision Making | William Yue | | Agent |
Year | Source | Name | Author | Content | Class |
---|---|---|---|---|---|
2024 | ArXiv | AMBER: An LLM-free Multi-dimensional Benchmark for MLLMs Hallucination Evaluation | Junyang Wang | | Audio |
Year | Source | Name | Author | Content |
---|---|---|---|---|
2023 | ArXiv | VideoChat: Chat-Centric Video Understanding | KunChang Li | They utilize detailed video descriptions to reduce hallucinations and introduce spatiotemporal reasoning, event localization, and causal relationships to enrich the semantic expression of video-text, setting a standard for future research.
2023 | ArXiv | Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding | Peng Jin | They pioneer the integration of images and videos, proposing the concept of dynamic visual tokens and employing both high-level semantic features and low-level visual detail features in a unified method.
2023 | ArXiv | Deficiency-Aware Masked Transformer for Video Inpainting | Yongsheng Yu | This paper introduces a dual-modality-compatible inpainting framework called the Deficiency-aware Masked Transformer (DMT). They pretrain an image inpainting model, DMTimg, to serve as a prior for distilling the video model DMTvid, thereby benefiting the hallucination of deficiency cases.
2023 | ArXiv | Video-LLaMA An Instruction-tuned Audio-Visual Language Model for Video Understanding | Hang Zhang | This paper introduces a Video Q-former to better address the issue of inconsistencies in the temporal understanding of videos, and employs an Audio Q-former to further capture audio features. With the use of an adapter, multimodal and natural language integration is achieved, effectively mitigating the problem of hallucinations. |
2023 | ArXiv | Unified Model for Image, Video, Audio and Language Tasks | Mustafa Shukor | This model is efficiently pretrained on many tasks based on task balancing and multimodal curriculum learning. The authors also propose a novel study on multimodal model merging via weight interpolation of models trained on different multimodal tasks, showing its benefits in particular for out-of-distribution generalization.
2022 | ArXiv | Information-Theoretic Text Hallucination Reduction for Video-grounded Dialogue | Sunjae Yoon | This paper proposes a THR regularization loss to alleviate hallucinations, mitigating feature-level hallucinations by reducing the mutual information between text features and visual features.
2023 | AAAI | Knowledge-Constrained Answer Generation for Open-Ended Video Question Answering | Yao Jin | |
2023 | ArXiv | Woodpecker: Hallucination Correction for Multimodal Large Language Models | Shukang Yin |
Year | Source | Name | Author | Content |
---|---|---|---|---|
2019 | CVPR | Learning 3D Human Dynamics from Video | Angjoo Kanazawa | This paper learns a representation of the 3D dynamics of humans from video via a simple but effective temporal encoding of image features, which can reduce hallucinations.
2023 | ArXiv | 3D-LLM: Injecting the 3D World into Large Language Models | Yining Hong | 3D-LLMs can take 3D point clouds and their features as input and perform a diverse set of 3D-related tasks, including captioning, dense captioning, 3D question answering, task decomposition, 3D grounding, 3D-assisted dialog, navigation, and so on. |
2023 | ArXiv | M3DBench: Let’s Instruct Large Models with Multi-modal 3D Prompts | Mingsheng Li | This paper introduces M3DBench, a comprehensive, large-scale 3D instruction-following dataset with over 320k instruction-response pairs. It supports general multimodal instructions interleaved with text, images, 3D objects, and other visual prompts, and unifies diverse 3D tasks at both region and scene levels, covering a variety of fundamental abilities in real-world 3D environments.
Year | Source | Name | Author | Content |
---|---|---|---|---|
2023 | AAAI | Certified Policy Smoothing for Cooperative Multi-Agent Reinforcement Learning | Ronghui Mu | |
2023 | ArXiv | Audio Visual Language Maps for Robot Navigation | Chenguang Huang |
Year | Source | Name | Author | Content |
---|---|---|---|---|
2022 | ACCV | Thinking Hallucination for Video Captioning | Nasib Ullah | To alleviate object hallucination and action hallucination, this paper proposes COAHA to comprehensively assess the extent of these two types of hallucination.
2022 | ArXiv | Audio-visual video face hallucination with frequency supervision and cross modality support by speech based lip reading loss | Shailza Sharma | |
2022 | ArXiv | Efficient Human Vision Inspired Action Recognition Using Adaptive Spatiotemporal Sampling | Khoi-Nguyen C. Mac | This method pre-scans the global scene context at low resolution and decides whether to skip or request high-resolution features at salient regions for further processing. Based on the pre-scanned features, a temporal sampler decides whether to process each frame fully or skip it and propagate past information, while a spatial sampler selects RoIs from the high-resolution input to augment the low-resolution features.
2022 | ArXiv | Video Question Answering: Datasets, Algorithms and Challenges | Yaoyao Zhong | Challenges include fine-grained to coarse-grained understanding in both the temporal and spatial domains, learning from noisy web-scale vision-text data, and multi-step reasoning.
2023 | ArXiv | Retrieval-Based Video Language Model for Efficient Long Video Question Answering | Jiaqi Xu | Long videos and long texts can introduce noise into the video QA process.
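The frame-level decision in the adaptive spatiotemporal sampling entry above can be sketched as a simple threshold rule (a toy sketch; the actual sampler is learned end-to-end, and the threshold and function names here are assumptions):

```python
def temporal_sampling_decisions(saliency_scores, threshold=0.5):
    """Toy temporal sampler: process a frame fully when its low-resolution
    pre-scan saliency exceeds the threshold; otherwise skip the frame and
    propagate past information instead of recomputing features."""
    return ["full" if s >= threshold else "skip" for s in saliency_scores]
```

For instance, saliency scores `[0.9, 0.2, 0.6]` would yield decisions `["full", "skip", "full"]`, so only the salient frames pay the high-resolution cost.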
Year | Source | Name | Author | Content |
---|---|---|---|---|
2022 | ACCV | PoseTriplet: Co-evolving 3D Human Pose Estimation, Imitation, and Hallucination under Self-supervision | Kehong Gong | |
2014 | IEEE | 3D Face Hallucination from a Single Depth Frame | Shu Liang | |
2022 | AAAI | Texture Generation Using Dual-Domain Feature Flow with Multi-View Hallucinations | Seunggyu Chang | |
2023 | ArXiv | M3DBench: Let’s Instruct Large Models with Multi-modal 3D Prompts | Mingsheng Li
Year | Source | Name | Author | Content |
---|---|---|---|---|
2022 | IEEE | Hallucination of Speech Recognition Errors With Sequence to Sequence Learning | Prashant Serai | They present novel end-to-end models to directly predict hallucinated ASR word sequence outputs, conditioning on an input word sequence as well as a corresponding phoneme sequence. |
2023 | ArXiv | Parameter Efficient Audio Captioning With Faithful Guidance Using Audio-Text Shared Latent Representation | Arvind Krishna Sridhar | This paper proposes a data augmentation technique for generating hallucinated audio captions and shows that similarity based on an audio-text shared latent space is suitable for detecting hallucination. It also proposes a parameter-efficient, inference-time faithful decoding algorithm that enables smaller audio captioning models to match the performance of larger models trained with more data.
2023 | ArXiv | Factual Consistency Oriented Speech Recognition | Naoyuki Kanda | This paper presents a novel optimization framework for automatic speech recognition (ASR) with the aim of reducing hallucinations produced by an ASR model. The proposed framework optimizes the ASR model to maximize an expected factual consistency score between ASR hypotheses and groundtruth transcriptions, where the factual consistency score is computed by a separately trained estimator. |
2023 | ArXiv | LP-MusicCaps: LLM-Based Pseudo Music Captioning | SeungHeon Doh | |
2020 | ArXiv | Identifying Audio Adversarial Examples via Anomalous Pattern Detection | Victor Akinwande | Audio processing models based on deep neural networks are susceptible to adversarial attacks even when the adversarial audio waveform is 99.9% similar to a benign sample; this paper proposes a method to detect such audio adversarial samples.
2023 | ArXiv | Listen, Think, and Understand | Yuan Gong | This paper creates a new OpenAQA-5M dataset consisting of 1.9 million closed-ended and 3.7 million open-ended, diverse (audio, question, answer) tuples, and uses an autoregressive training framework with a perception-to-understanding curriculum. LTU demonstrates strong performance and generalization on conventional audio tasks such as classification and captioning, and can greatly mitigate the hallucination issue.
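The objective in the Factual Consistency Oriented Speech Recognition entry above can be sketched as a negative expected consistency score over an n-best list (a minimal sketch under assumed names; the real framework trains a separate neural estimator for the consistency score, while here it is just a supplied list):

```python
import math

def expected_consistency_loss(hyp_log_probs, consistency_scores):
    """Negative expected factual-consistency score over an n-best list.

    hyp_log_probs:      unnormalized log-probabilities of ASR hypotheses
    consistency_scores: factual-consistency estimates in [0, 1], one per
                        hypothesis (from a separately trained estimator)
    """
    # Softmax-normalize the hypothesis scores (shift by the max for stability),
    # then return the negative expectation of the consistency scores.
    m = max(hyp_log_probs)
    exps = [math.exp(s - m) for s in hyp_log_probs]
    z = sum(exps)
    probs = [e / z for e in exps]
    return -sum(p * c for p, c in zip(probs, consistency_scores))
```

Minimizing this loss shifts probability mass toward hypotheses the estimator judges factually consistent with the ground-truth transcription.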
Year | Source | Name | Author | Content |
---|---|---|---|---|
2023 | ArXiv | Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models | Yue Zhang | |
2022 | ArXiv | Survey of Hallucination in Natural Language Generation | Ziwei Ji | |
2023 | ArXiv | Theory of Hallucinations based on Equivariance | Hisaichi Shibata | |
2023 | ArXiv | Cognitive Mirage: A Review of Hallucinations in Large Language Models | Hongbin Ye | |
2023 | ArXiv | Factuality Challenges in the Era of Large Language Models | Isabelle Augenstein |
Year | Source | Name | Author | Content |
---|---|---|---|---|
2023 | ArXiv | Learning Perceptual Hallucination for Multi-Robot Navigation in Narrow Hallways | Jin-Soo Park | |
2023 | ArXiv | Robots That Ask For Help: Uncertainty Alignment for Large Language Model Planners | Allen Z. Ren | |
2021 | ArXiv | Toward Agile Maneuvers in Highly Constrained Spaces: Learning from Hallucination | Xuesu Xiao | |
2023 | ArXiv | Large Language Models as Generalizable Policies for Embodied Tasks | Andrew Szot | |
2023 | ArXiv | CogAgent: A Visual Language Model for GUI Agents | Wenyi Hong |
Year | Source | Name | Author | Content |
---|---|---|---|---|
2023 | ArXiv | FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation | Sewon Min | |
2023 | ArXiv | Generating Benchmarks for Factuality Evaluation of Language Models | Dor Muhlgay | |
2022 | ArXiv | Teaching models to express their uncertainty in words | Stephanie Lin |
Year | Source | Name | Author | Content |
---|---|---|---|---|
2023 | ArXiv | A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity | Yejin Bang | |
2023 | ArXiv | AMBER: An LLM-free Multi-dimensional Benchmark for MLLMs Hallucination Evaluation | Junyang Wang | |
2023 | ArXiv | A Survey of Hallucination in “Large” Foundation Models | Vipula Rawte | |
2023 | ArXiv | HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models | Junyi Li | |
2023 | ArXiv | Evaluating Object Hallucination in Large Vision-Language Models | Yifan Li | |
2023 | ArXiv | GAIA: A Benchmark for General AI Assistants | Grégoire Mialon | |
Year | Source | Name | Author | Content |
---|---|---|---|---|
2023 | ArXiv | A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity | Yejin Bang |
Year | Source | Name | Author | Content |
---|---|---|---|---|
2023 | ArXiv | Models See Hallucinations: Evaluating the Factuality in Video Captioning | Hui Liu | |
2023 | ArXiv | Video-CSR: Complex Video Digest Creation for Visual-Language Models | Tingkai Liu
Year | Source | Name | Author | Content |
---|---|---|---|---|
2023 | ArXiv | Evaluating VLMs for Score-Based, Multi-Probe Annotation of 3D Objects | Rishabh Kabra |
Year | Source | Name | Author | Content |
---|---|---|---|---|
2020 | ACL | Asking and Answering Questions to Evaluate the Factual Consistency of Summaries | Alex Wang |
While mitigating hallucinations is essential in the era of AGI, it is also important to note that not all such occurrences are detrimental. In some scenarios, hallucinations can enhance a model's creativity. Striking a balance between hallucination and creativity is a crucial challenge.
Year | Source | Name | Author | Content |
---|---|---|---|---|
2023 | ArXiv | LLM Lies: Hallucinations Are Not Bugs, but Features as Adversarial Examples | Jia-Yu Yao | |
2023 | IEEE | Intentional Biases in LLM Responses | Nicklaus Badyal | |
2023 | ArXiv | User-Controlled Knowledge Fusion in Large Language Models: Balancing Creativity and Hallucination | Chen Zhang | |
2022 | ArXiv | Embedding Hallucination for Few-Shot Language Fine-tuning | Yiren Jian |
Year | Source | Name | Author | Content |
---|---|---|---|---|
2023 | ArXiv | Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization | Zhiyuan Zhao | |
2023 | ArXiv | Iterative Teaching by Data Hallucination | Zeju Qiu | |
2023 | ArXiv | Hallucination Improves the Performance of Unsupervised Visual Representation Learning | Jing Wu | |
2023 | ArXiv | Scene Graph as Pivoting: Inference-time Image-free Unsupervised Multimodal Machine Translation with Visual Scene Hallucination | Hao Fei |
Year | Source | Name | Author | Content |
---|---|---|---|---|
2023 | ArXiv | Putting People in Their Place: Affordance-Aware Human Insertion into Scenes | Sumith Kulal | This paper proposes a method for inserting people into scenes, enabling the model to hallucinate both the person and the scene, resulting in compositions that are both harmonious and creative.
2023 | ArXiv | Multi-Object Tracking with Hallucinated and Unlabeled Videos | Daniel McKee |
Year | Source | Name | Author | Content |
---|---|---|---|---|
2023 | ArXiv | HalluAudio: Hallucinate Frequency as Concepts for Few-Shot Audio Classification | Zhongjie Yu
Year | Source | Name | Author | Content |
---|---|---|---|---|
2021 | ArXiv | Agile Robot Navigation through Hallucinated Learning and Sober Deployment | Xuesu Xiao | |
2021 | ArXiv | From Agile Ground to Aerial Navigation: Learning from Learned Hallucination | Zizhao Wang | |
2023 | ArXiv | HaLP: Hallucinating Latent Positives for Skeleton-based Self-Supervised Learning of Actions | Anshul Shah |
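The positive-hallucination idea in the HaLP entry above can be sketched as interpolating a latent feature toward a class prototype to synthesize an extra positive for contrastive learning (a toy sketch under assumed names; the paper's actual generation operates in a learned latent space with more careful control over the interpolation):

```python
def hallucinate_latent_positive(feature, prototype, alpha=0.3):
    """Toy latent-positive generation: move a skeleton feature part-way
    toward its class prototype, producing a synthetic 'hallucinated'
    positive sample for contrastive self-supervised learning."""
    return [(1 - alpha) * f + alpha * p for f, p in zip(feature, prototype)]
```

With `alpha=0` the original feature is returned unchanged, while larger `alpha` pulls the synthetic positive closer to the prototype, trading diversity for semantic reliability.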