Year | Source | Name | Author | Content | Class |
---|---|---|---|---|---|
2024 | ArXiv | Hallucinations in Neural Automatic Speech Recognition: Identifying Errors and Hallucinatory Models | Rita Frieske | This paper defines hallucinations in ASR as transcriptions generated by a model that are semantically unrelated to the source utterance, yet still fluent and coherent. | Audio |
2022 | ACCV | Thinking Hallucination for Video Captioning | Nasib Ullah | To alleviate object hallucination and action hallucination, this paper proposes COAHA to comprehensively assess the extent of these two types of hallucination. | Video |
2019 | ArXiv | Object Hallucination in Image Captioning | Anna Rohrbach | This paper proposes a new image relevance metric to evaluate current models against veridical visual labels and assess their rate of object hallucination. | Image |
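As a toy illustration of the kind of object-hallucination rate such metrics measure (a simplified sketch with assumed names; the paper's actual metric additionally handles synonyms and relies on ground-truth visual labels), one can count the caption's object mentions that are absent from the image:

```python
def object_hallucination_rate(caption_objects, image_objects):
    """Fraction of object mentions in a caption that do not appear in the image.

    caption_objects: list of object words extracted from the generated caption
    image_objects:   collection of ground-truth objects present in the image
    """
    if not caption_objects:
        return 0.0
    present = set(image_objects)
    hallucinated = [obj for obj in caption_objects if obj not in present]
    return len(hallucinated) / len(caption_objects)
```

For example, a caption mentioning "dog", "frisbee", and "car" over an image containing only a dog and a frisbee would score 1/3 under this simplified rate.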
Year | Source | Name | Author | Content | Class |
---|---|---|---|---|---|
2024 | ArXiv | Survey on Factuality in Large Language Models: Knowledge, Retrieval and Domain-Specificity | Cunxiang Wang | This paper defines factuality as the capability of large language models to generate content that follows factual information, encompassing commonsense, world knowledge, and domain facts. | Language |
2017 | ArXiv | Overcoming catastrophic forgetting in neural networks | James Kirkpatrick | | normal |
Year | Source | Name | Author | Content | Class |
---|---|---|---|---|---|
2024 | ArXiv | AMBER: An LLM-free Multi-dimensional Benchmark for MLLMs Hallucination Evaluation | Junyang Wang | | Image |
Year | Source | Name | Author | Content | Class |
---|---|---|---|---|---|
2024 | ArXiv | MLLM-Protector: Ensuring MLLM’s Safety without Hurting Performance | Renjie Pi | | Image |
2023 | ArXiv | Investigating the Catastrophic Forgetting in Multimodal Large Language Models | Yuexiang Zhai | | Image |
2023 | ArXiv | Speciality vs Generality: An Empirical Study on Catastrophic Forgetting in Fine-tuning Foundation Models | Yong Lin | | Language |
2023 | ArXiv | t-DGR: A Trajectory-Based Deep Generative Replay Method for Continual Learning in Decision Making | William Yue | | Agent |
Year | Source | Name | Author | Content | Class |
---|---|---|---|---|---|
2024 | ArXiv | AMBER: An LLM-free Multi-dimensional Benchmark for MLLMs Hallucination Evaluation | Junyang Wang | | Audio |
Year | Source | Name | Author | Content |
---|---|---|---|---|
2023 | ArXiv | VideoChat: Chat-Centric Video Understanding | KunChang Li | They utilize detailed video descriptions to reduce hallucinations and introduce spatiotemporal reasoning, event localization, and causal relationships to enrich the semantic expression of video-text, setting a standard for future research.
2023 | ArXiv | Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding | Peng Jin | They pioneer the integration of images and videos, proposing the concept of dynamic visual tokens and employing both high-level semantic features and low-level visual detail features in a unified method.
2023 | ArXiv | Deficiency-Aware Masked Transformer for Video Inpainting | Yongsheng Yu | This paper introduces a dual-modality-compatible inpainting framework called the Deficiency-aware Masked Transformer (DMT). They pretrain an image inpainting model, DMTimg, to serve as a prior for distilling the video model DMTvid, thereby benefiting the hallucination of deficiency cases.
2023 | ArXiv | Video-LLaMA An Instruction-tuned Audio-Visual Language Model for Video Understanding | Hang Zhang | This paper introduces a Video Q-former to better address the issue of inconsistencies in the temporal understanding of videos, and employs an Audio Q-former to further capture audio features. With the use of an adapter, multimodal and natural language integration is achieved, effectively mitigating the problem of hallucinations. |
2023 | ArXiv | Unified Model for Image, Video, Audio and Language Tasks | Mustafa Shukor | This model is efficiently pretrained on many tasks based on task balancing and multimodal curriculum learning. The authors also propose a novel study on multimodal model merging via weight interpolation of models trained on different multimodal tasks, showing its benefits in particular for out-of-distribution generalization.
2022 | ArXiv | Information-Theoretic Text Hallucination Reduction for Video-grounded Dialogue | Sunjae Yoon | This paper proposes a THR regularization loss to alleviate hallucinations, mitigating feature-level hallucinations by reducing the mutual information between text features and visual features.
2023 | AAAI | Knowledge-Constrained Answer Generation for Open-Ended Video Question Answering | Yao Jin | |
2023 | ArXiv | Woodpecker: Hallucination Correction for Multimodal Large Language Models | Shukang Yin |
Year | Source | Name | Author | Content |
---|---|---|---|---|
2019 | CVPR | Learning 3D Human Dynamics from Video | Angjoo Kanazawa | This paper learns a representation of the 3D dynamics of humans from video via a simple but effective temporal encoding of image features, which can reduce hallucinations.
2023 | ArXiv | 3D-LLM: Injecting the 3D World into Large Language Models | Yining Hong | 3D-LLMs can take 3D point clouds and their features as input and perform a diverse set of 3D-related tasks, including captioning, dense captioning, 3D question answering, task decomposition, 3D grounding, 3D-assisted dialog, navigation, and so on. |
2023 | ArXiv | M3DBench: Let’s Instruct Large Models with Multi-modal 3D Prompts | Mingsheng Li | This paper introduces M3DBench, a comprehensive, large-scale 3D instruction-following dataset with over 320k instruction-response pairs. It supports general multimodal instructions interleaved with text, images, 3D objects, and other visual prompts, and unifies diverse 3D tasks at both region and scene levels, covering a variety of fundamental abilities in real-world 3D environments.
Year | Source | Name | Author | Content |
---|---|---|---|---|
2023 | AAAI | Certified Policy Smoothing for Cooperative Multi-Agent Reinforcement Learning | Ronghui Mu | |
2023 | ArXiv | Audio Visual Language Maps for Robot Navigation | Chenguang Huang |
Year | Source | Name | Author | Content |
---|---|---|---|---|
2022 | ACCV | Thinking Hallucination for Video Captioning | Nasib Ullah | To alleviate object hallucination and action hallucination, this paper proposes COAHA to comprehensively assess the extent of these two types of hallucination.
2022 | ArXiv | Audio-visual video face hallucination with frequency supervision and cross modality support by speech based lip reading loss | Shailza Sharma | |
2022 | ArXiv | Efficient Human Vision Inspired Action Recognition Using Adaptive Spatiotemporal Sampling | Khoi-Nguyen C. Mac | This method pre-scans the global scene context at low resolution and decides whether to skip or request high-resolution features at salient regions for further processing. Based on the pre-scanned features, a temporal sampler decides whether to process each frame fully or skip it and propagate past information, while a spatial sampler selects RoIs from the high-resolution input to augment the low-resolution features.
2022 | ArXiv | Video Question Answering: Datasets, Algorithms and Challenges | Yaoyao Zhong | Challenges include fine-grained to coarse-grained understanding in both the temporal and spatial domains, learning from noisy web-scale vision-text data, and multi-step reasoning.
2023 | ArXiv | Retrieval-Based Video Language Model for Efficient Long Video Question Answering | Jiaqi Xu | Long videos and long texts can introduce noise into the video QA process.
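The frame-level decision in the adaptive spatiotemporal sampling entry above can be sketched as a simple threshold rule (a toy sketch; the actual sampler is learned end-to-end, and the threshold and function names here are assumptions):

```python
def temporal_sampling_decisions(saliency_scores, threshold=0.5):
    """Toy temporal sampler: process a frame fully when its low-resolution
    pre-scan saliency exceeds the threshold; otherwise skip the frame and
    propagate past information instead of recomputing features."""
    return ["full" if s >= threshold else "skip" for s in saliency_scores]
```

For instance, saliency scores `[0.9, 0.2, 0.6]` would yield decisions `["full", "skip", "full"]`, so only the salient frames pay the high-resolution cost.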
Year | Source | Name | Author | Content |
---|---|---|---|---|
2022 | ACCV | PoseTriplet: Co-evolving 3D Human Pose Estimation, Imitation, and Hallucination under Self-supervision | Kehong Gong | |
2014 | IEEE | 3D Face Hallucination from a Single Depth Frame | Shu Liang | |
2022 | AAAI | Texture Generation Using Dual-Domain Feature Flow with Multi-View Hallucinations | Seunggyu Chang | |
2023 | ArXiv | M3DBench: Let’s Instruct Large Models with Multi-modal 3D Prompts | Mingsheng Li
Year | Source | Name | Author | Content |
---|---|---|---|---|
2022 | IEEE | Hallucination of Speech Recognition Errors With Sequence to Sequence Learning | Prashant Serai | They present novel end-to-end models to directly predict hallucinated ASR word sequence outputs, conditioning on an input word sequence as well as a corresponding phoneme sequence. |
2023 | ArXiv | Parameter Efficient Audio Captioning With Faithful Guidance Using Audio-Text Shared Latent Representation | Arvind Krishna Sridhar | This paper proposes a data augmentation technique for generating hallucinated audio captions and shows that similarity based on an audio-text shared latent space is suitable for detecting hallucination. It also proposes a parameter-efficient, inference-time faithful decoding algorithm that enables smaller audio captioning models to match the performance of larger models trained with more data.
2023 | ArXiv | Factual Consistency Oriented Speech Recognition | Naoyuki Kanda | This paper presents a novel optimization framework for automatic speech recognition (ASR) with the aim of reducing hallucinations produced by an ASR model. The proposed framework optimizes the ASR model to maximize an expected factual consistency score between ASR hypotheses and groundtruth transcriptions, where the factual consistency score is computed by a separately trained estimator. |
2023 | ArXiv | LP-MusicCaps: LLM-Based Pseudo Music Captioning | SeungHeon Doh | |
2020 | ArXiv | Identifying Audio Adversarial Examples via Anomalous Pattern Detection | Victor Akinwande | Audio processing models based on deep neural networks are susceptible to adversarial attacks even when the adversarial audio waveform is 99.9% similar to a benign sample; this paper proposes a method to detect such audio adversarial samples.
2023 | ArXiv | Listen, Think, and Understand | Yuan Gong | This paper creates a new OpenAQA-5M dataset consisting of 1.9 million closed-ended and 3.7 million open-ended, diverse (audio, question, answer) tuples, and uses an autoregressive training framework with a perception-to-understanding curriculum. LTU demonstrates strong performance and generalization on conventional audio tasks such as classification and captioning, and can greatly mitigate the hallucination issue.
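The objective in the Factual Consistency Oriented Speech Recognition entry above can be sketched as a negative expected consistency score over an n-best list (a minimal sketch under assumed names; the real framework trains a separate neural estimator for the consistency score, while here it is just a supplied list):

```python
import math

def expected_consistency_loss(hyp_log_probs, consistency_scores):
    """Negative expected factual-consistency score over an n-best list.

    hyp_log_probs:      unnormalized log-probabilities of ASR hypotheses
    consistency_scores: factual-consistency estimates in [0, 1], one per
                        hypothesis (from a separately trained estimator)
    """
    # Softmax-normalize the hypothesis scores (shift by the max for stability),
    # then return the negative expectation of the consistency scores.
    m = max(hyp_log_probs)
    exps = [math.exp(s - m) for s in hyp_log_probs]
    z = sum(exps)
    probs = [e / z for e in exps]
    return -sum(p * c for p, c in zip(probs, consistency_scores))
```

Minimizing this loss shifts probability mass toward hypotheses the estimator judges factually consistent with the ground-truth transcription.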
Year | Source | Name | Author | Content |
---|---|---|---|---|
2023 | ArXiv | Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models | Yue Zhang | |
2022 | ArXiv | Survey of Hallucination in Natural Language Generation | Ziwei Ji | |
2023 | ArXiv | Theory of Hallucinations based on Equivariance | Hisaichi Shibata | |
2023 | ArXiv | Cognitive Mirage: A Review of Hallucinations in Large Language Models | Hongbin Ye | |
2023 | ArXiv | Factuality Challenges in the Era of Large Language Models | Isabelle Augenstein |
Year | Source | Name | Author | Content |
---|---|---|---|---|
2023 | ArXiv | Learning Perceptual Hallucination for Multi-Robot Navigation in Narrow Hallways | Jin-Soo Park | |
2023 | ArXiv | Robots That Ask For Help: Uncertainty Alignment for Large Language Model Planners | Allen Z. Ren | |
2021 | ArXiv | Toward Agile Maneuvers in Highly Constrained Spaces: Learning from Hallucination | Xuesu Xiao | |
2023 | ArXiv | Large Language Models as Generalizable Policies for Embodied Tasks | Andrew Szot | |
2023 | ArXiv | CogAgent: A Visual Language Model for GUI Agents | Wenyi Hong |
Year | Source | Name | Author | Content |
---|---|---|---|---|
2023 | ArXiv | FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation | Sewon Min | |
2023 | ArXiv | Generating Benchmarks for Factuality Evaluation of Language Models | Dor Muhlgay | |
2022 | ArXiv | Teaching models to express their uncertainty in words | Stephanie Lin |
Year | Source | Name | Author | Content |
---|---|---|---|---|
2023 | ArXiv | A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity | Yejin Bang | |
2023 | ArXiv | AMBER: An LLM-free Multi-dimensional Benchmark for MLLMs Hallucination Evaluation | Junyang Wang | |
2023 | ArXiv | A Survey of Hallucination in “Large” Foundation Models | Vipula Rawte | |
2023 | ArXiv | HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models | Junyi Li | |
2023 | ArXiv | Evaluating Object Hallucination in Large Vision-Language Models | Yifan Li | |
2023 | ArXiv | GAIA: A Benchmark for General AI Assistants | Grégoire Mialon | |
Year | Source | Name | Author | Content |
---|---|---|---|---|
2023 | ArXiv | A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity | Yejin Bang |
Year | Source | Name | Author | Content |
---|---|---|---|---|
2023 | ArXiv | Models See Hallucinations: Evaluating the Factuality in Video Captioning | Hui Liu | |
2023 | ArXiv | Video-CSR: Complex Video Digest Creation for Visual-Language Models | Tingkai Liu
Year | Source | Name | Author | Content |
---|---|---|---|---|
2023 | ArXiv | Evaluating VLMs for Score-Based, Multi-Probe Annotation of 3D Objects | Rishabh Kabra |
Year | Source | Name | Author | Content |
---|---|---|---|---|
2020 | ACL | Asking and Answering Questions to Evaluate the Factual Consistency of Summaries | Alex Wang |
While mitigating hallucinations is essential in the era of AGI, it is also important to note that not all such occurrences are detrimental. In some scenarios, hallucinations can enhance a model's creativity. Striking a balance between hallucination and creativity is a crucial challenge.
Year | Source | Name | Author | Content |
---|---|---|---|---|
2023 | ArXiv | LLM Lies: Hallucinations Are Not Bugs, but Features as Adversarial Examples | Jia-Yu Yao | |
2023 | IEEE | Intentional Biases in LLM Responses | Nicklaus Badyal | |
2023 | ArXiv | User-Controlled Knowledge Fusion in Large Language Models: Balancing Creativity and Hallucination | Chen Zhang | |
2022 | ArXiv | Embedding Hallucination for Few-Shot Language Fine-tuning | Yiren Jian |
Year | Source | Name | Author | Content |
---|---|---|---|---|
2023 | ArXiv | Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization | Zhiyuan Zhao | |
2023 | ArXiv | Iterative Teaching by Data Hallucination | Zeju Qiu | |
2023 | ArXiv | Hallucination Improves the Performance of Unsupervised Visual Representation Learning | Jing Wu | |
2023 | ArXiv | Scene Graph as Pivoting: Inference-time Image-free Unsupervised Multimodal Machine Translation with Visual Scene Hallucination | Hao Fei |
Year | Source | Name | Author | Content |
---|---|---|---|---|
2023 | ArXiv | Putting People in Their Place: Affordance-Aware Human Insertion into Scenes | Sumith Kulal | This paper proposes a method for inserting people into scenes, enabling the model to hallucinate both the person and the scene, resulting in compositions that are both harmonious and creative.
2023 | ArXiv | Multi-Object Tracking with Hallucinated and Unlabeled Videos | Daniel McKee |
Year | Source | Name | Author | Content |
---|---|---|---|---|
2023 | ArXiv | HalluAudio: Hallucinate Frequency as Concepts for Few-Shot Audio Classification | Zhongjie Yu
Year | Source | Name | Author | Content |
---|---|---|---|---|
2021 | ArXiv | Agile Robot Navigation through Hallucinated Learning and Sober Deployment | Xuesu Xiao | |
2021 | ArXiv | From Agile Ground to Aerial Navigation: Learning from Learned Hallucination | Zizhao Wang | |
2023 | ArXiv | HaLP: Hallucinating Latent Positives for Skeleton-based Self-Supervised Learning of Actions | Anshul Shah |
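The positive-hallucination idea in the HaLP entry above can be sketched as interpolating a latent feature toward a class prototype to synthesize an extra positive for contrastive learning (a toy sketch under assumed names; the paper's actual generation operates in a learned latent space with more careful control over the interpolation):

```python
def hallucinate_latent_positive(feature, prototype, alpha=0.3):
    """Toy latent-positive generation: move a skeleton feature part-way
    toward its class prototype, producing a synthetic 'hallucinated'
    positive sample for contrastive self-supervised learning."""
    return [(1 - alpha) * f + alpha * p for f, p in zip(feature, prototype)]
```

With `alpha=0` the original feature is returned unchanged, while larger `alpha` pulls the synthetic positive closer to the prototype, trading diversity for semantic reliability.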