
## 📖 Definition for AGI Hallucination

### Conflict in Intrinsic Knowledge of Models

| Year | Source | Name | Author | Content | Class |
| --- | --- | --- | --- | --- | --- |
| 2024 | ArXiv | Hallucinations in Neural Automatic Speech Recognition: Identifying Errors and Hallucinatory Models | Rita Frieske | Defines hallucinations in ASR as transcriptions generated by a model that are semantically unrelated to the source utterance, yet still fluent and coherent. | Audio |
| 2022 | ACCV | Thinking Hallucination for Video Captioning | Nasib Ullah | Proposes COAHA, a metric that comprehensively assesses the extent of object hallucination and action hallucination, with the aim of alleviating both. | Video |
| 2019 | ArXiv | Object Hallucination in Image Captioning | Anna Rohrbach | Proposes a new image-relevance metric to evaluate current models with veridical visual labels and assess their rate of object hallucination. | Image |

### Factual Conflict in Information Forgetting and Updating

| Year | Source | Name | Author | Content | Class |
| --- | --- | --- | --- | --- | --- |
| 2024 | ArXiv | Survey on Factuality in Large Language Models: Knowledge, Retrieval and Domain-Specificity | Cunxiang Wang | Defines factuality as the capability of large language models to generate content that follows factual information, encompassing commonsense, world knowledge, and domain facts. | Language |
| 2017 | ArXiv | Overcoming catastrophic forgetting in neural networks | James Kirkpatrick | | normal |

### Conflict in Multimodal Fusion

| Year | Source | Name | Author | Content | Class |
| --- | --- | --- | --- | --- | --- |
| 2024 | ArXiv | AMBER: An LLM-free Multi-dimensional Benchmark for MLLMs Hallucination Evaluation | Junyang Wang | | Image |

## 🌱 Emergence for AGI Hallucination

### Training Data Distribution

| Year | Source | Name | Author | Content | Class |
| --- | --- | --- | --- | --- | --- |
| 2021 | ArXiv | How much do language models copy from their training data? Evaluating linguistic novelty in text generation using RAVEN | R. Thomas McCoy | | Language |
| 2023 | ArXiv | Explanation Shift: How Did the Distribution Shift Impact the Model? | Carlos Mougan | | Language |
| 2023 | ArXiv | Scaling Instruction-Finetuned Language Models | Hyung Won Chung | | Language |
| 2023 | ArXiv | How Abilities in Large Language Models Are Affected by Supervised Fine-tuning Data Composition | Guanting Dong | | Language |
| 2023 | ArXiv | On the Origin of Hallucinations in Conversational Models: Is it the Datasets or the Models? | Nouha Dziri | | Language |
| 2023 | ArXiv | Sources of Hallucination by Large Language Models on Inference Tasks | Nick McKenna | | Language |
| 2023 | ArXiv | LIMA: Less Is More for Alignment | Chunting Zhou | | Language |
| 2020 | ArXiv | Overfitting or Underfitting? Understand Robustness Drop in Adversarial Training | Zichao Li | | Language |
| 2020 | ArXiv | Data augmentation techniques for the Video Question Answering task | Alex Falcon | | Video |

### Timeliness of Information

| Year | Source | Name | Author | Content | Class |
| --- | --- | --- | --- | --- | --- |
| 2024 | ArXiv | MLLM-Protector: Ensuring MLLM’s Safety without Hurting Performance | Renjie Pi | | Image |
| 2023 | ArXiv | Investigating the Catastrophic Forgetting in Multimodal Large Language Models | Yuexiang Zhai | | Image |
| 2023 | ArXiv | Speciality vs Generality: An Empirical Study on Catastrophic Forgetting in Fine-tuning Foundation Models | Yong Lin | | Language |
| 2023 | ArXiv | t-DGR: A Trajectory-Based Deep Generative Replay Method for Continual Learning in Decision Making | William Yue | | Agent |

### Ambiguity in Different Modalities

| Year | Source | Name | Author | Content | Class |
| --- | --- | --- | --- | --- | --- |
| 2024 | ArXiv | AMBER: An LLM-free Multi-dimensional Benchmark for MLLMs Hallucination Evaluation | Junyang Wang | | Audio |

## 🛠️ Solution for AGI Hallucination

### Language Hallucination

| Year | Source | Name | Author | Content |
| --- | --- | --- | --- | --- |
| 2022 | ArXiv | Training language models to follow instructions with human feedback | Long Ouyang | |
| 2023 | ArXiv | Faithful Persona-based Conversational Dataset Generation with Large Language Models | Pegah Jandaghi | |
| 2023 | ArXiv | SpecInfer: Accelerating Generative Large Language Model Serving with Speculative Inference and Token Tree Verification | Xupeng Miao | |
| 2023 | ArXiv | Post Hoc Explanations of Language Models Can Improve Language Models | Satyapriya Krishna | |
| 2023 | ArXiv | Are Large Language Models Post Hoc Explainers? | Nicholas Kroeger | |
| 2023 | ArXiv | Tell Your Model Where to Attend: Post-hoc Attention Steering for LLMs | Qingru Zhang | |
| 2023 | ArXiv | Personalized Soups: Personalized Large Language Model Alignment via Post-Hoc Parameter Merging | Joel Jang | |
| 2023 | ArXiv | Knowledge Solver: Teaching LLMs to Search for Domain Knowledge from Knowledge Graphs | Chao Feng | |
| 2023 | ArXiv | Enhancing LLM Intelligence with ARM-RAG: Auxiliary Rationale Memory for Retrieval Augmented Generation | Eric Melz | |
| 2023 | ArXiv | Entity-Augmented Code Generation | Anton Shapkin | |

### Video-Text Hallucination

| Year | Source | Name | Author | Content |
| --- | --- | --- | --- | --- |
| 2023 | ArXiv | VideoChat: Chat-Centric Video Understanding | KunChang Li | Utilizes detailed video descriptions to reduce hallucinations and introduces spatiotemporal reasoning, event localization, and causal relationships to enrich the semantic expression of video-text, setting a standard for future research. |
| 2023 | ArXiv | Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding | Peng Jin | Pioneers the integration of images and videos, proposing the concept of dynamic visual tokens and employing both high-level semantic features and low-level visual detail features in a unified method. |
| 2023 | ArXiv | Deficiency-Aware Masked Transformer for Video Inpainting | Yongsheng Yu | Introduces a dual-modality-compatible inpainting framework called the Deficiency-aware Masked Transformer (DMT); an image inpainting model (DMT-img) is pretrained to serve as a prior for distilling the video model (DMT-vid), thereby benefiting the hallucination of deficiency cases. |
| 2023 | ArXiv | Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding | Hang Zhang | Introduces a Video Q-Former to better address inconsistencies in the temporal understanding of videos, and employs an Audio Q-Former to further capture audio features. With an adapter, multimodal and natural-language integration is achieved, effectively mitigating hallucinations. |
| 2023 | ArXiv | Unified Model for Image, Video, Audio and Language Tasks | Mustafa Shukor | Efficiently pretrains the model on many tasks based on task balancing and multimodal curriculum learning, and proposes a novel study on multimodal model merging via weight interpolation of models trained on different multimodal tasks, showing benefits in particular for out-of-distribution generalization. |
| 2022 | ArXiv | Information-Theoretic Text Hallucination Reduction for Video-grounded Dialogue | Sunjae Yoon | Proposes a THR regularization loss to alleviate hallucinations, mitigating feature-level hallucinations by reducing the mutual information between text features and image features. |
| 2023 | AAAI | Knowledge-Constrained Answer Generation for Open-Ended Video Question Answering | Yao Jin | |
| 2023 | ArXiv | Woodpecker: Hallucination Correction for Multimodal Large Language Models | Shukang Yin | |

### 3D Hallucination

| Year | Source | Name | Author | Content |
| --- | --- | --- | --- | --- |
| 2019 | CVPR | Learning 3D Human Dynamics from Video | Angjoo Kanazawa | Learns a representation of the 3D dynamics of humans from video via a simple but effective temporal encoding of image features, which can reduce hallucinations. |
| 2023 | ArXiv | 3D-LLM: Injecting the 3D World into Large Language Models | Yining Hong | 3D-LLMs take 3D point clouds and their features as input and perform a diverse set of 3D-related tasks, including captioning, dense captioning, 3D question answering, task decomposition, 3D grounding, 3D-assisted dialog, navigation, and more. |
| 2023 | ArXiv | M3DBench: Let’s Instruct Large Models with Multi-modal 3D Prompts | Mingsheng Li | Introduces M3DBench, a large-scale, comprehensive 3D instruction-following dataset with over 320k instruction-response pairs. It supports general multimodal instructions interleaved with text, images, 3D objects, and other visual prompts, and unifies diverse 3D tasks at both region and scene levels, covering a variety of fundamental abilities in real-world 3D environments. |

### Image-Text Hallucination

| Year | Source | Name | Author | Content |
| --- | --- | --- | --- | --- |
| 2023 | ArXiv | Aligning Large Multimodal Models with Factually Augmented RLHF | Zhiqing Sun | |
| 2023 | ArXiv | Mitigating Fine-Grained Hallucination by Fine-Tuning Large Vision-Language Models with Caption Rewrites | Lei Wang | |
| 2023 | ArXiv | Improved Baselines with Visual Instruction Tuning | Haotian Liu | |
| 2017 | ArXiv | SmoothGrad: removing noise by adding noise | Daniel Smilkov | |
| 2023 | ArXiv | Mitigating Hallucination in Visual Language Models with Visual Supervision | Zhiyang Chen | |
| 2023 | ArXiv | Analyzing and Mitigating Object Hallucination in Large Vision-Language Models | Yiyang Zhou | |
| 2023 | ArXiv | Multimodal Entity Tagging with Multimodal Knowledge Base | Hao Peng | |
| 2023 | ArXiv | Boosting Entity-aware Image Captioning with Multi-modal Knowledge Graph | Wentian Zhao | |
| 2023 | ArXiv | RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback | Tianyu Yu | |
| 2023 | ArXiv | Learning by Hallucinating: Vision-Language Pre-training with Weak Supervision | Tzu-Jui Julius Wang | |
| 2023 | ArXiv | MLLM-Protector: Ensuring MLLM’s Safety without Hurting Performance | Renjie Pi | |
| 2023 | ArXiv | Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model | Kai Yang | |

### Robotic & Agent Hallucination

| Year | Source | Name | Author | Content |
| --- | --- | --- | --- | --- |
| 2023 | AAAI | Certified Policy Smoothing for Cooperative Multi-Agent Reinforcement Learning | Ronghui Mu | |
| 2023 | ArXiv | Audio Visual Language Maps for Robot Navigation | Chenguang Huang | |

### Video-Text Hallucination

| Year | Source | Name | Author | Content |
| --- | --- | --- | --- | --- |
| 2022 | ACCV | Thinking Hallucination for Video Captioning | Nasib Ullah | Proposes COAHA, a metric that comprehensively assesses the extent of object hallucination and action hallucination, with the aim of alleviating both. |
| 2022 | ArXiv | Audio-visual video face hallucination with frequency supervision and cross modality support by speech based lip reading loss | Shailza Sharma | |
| 2022 | ArXiv | Efficient Human Vision Inspired Action Recognition Using Adaptive Spatiotemporal Sampling | Khoi-Nguyen C. Mac | Pre-scans the global scene context at low resolution and decides to skip, or to request high-resolution features at salient regions for further processing. Based on the pre-scanned features, a temporal sampler decides whether to process a frame fully or to skip it and propagate past information; a spatial sampler then selects RoIs from the high-resolution input to augment the low-resolution features. |
| 2022 | ArXiv | Video Question Answering: Datasets, Algorithms and Challenges | Yaoyao Zhong | Covers fine-grained to coarse-grained information in both temporal and spatial domains, information from noisy web-scale vision-text data, and multi-step reasoning. |
| 2023 | ArXiv | Retrieval-Based Video Language Model for Efficient Long Video Question Answering | Jiaqi Xu | Long videos and long texts can introduce noise into the video QA process. |

### 3D Hallucination

| Year | Source | Name | Author | Content |
| --- | --- | --- | --- | --- |
| 2022 | ACCV | PoseTriplet: Co-evolving 3D Human Pose Estimation, Imitation, and Hallucination under Self-supervision | Kehong Gong | |
| 2014 | IEEE | 3D Face Hallucination from a Single Depth Frame | Shu Liang | |
| 2022 | AAAI | Texture Generation Using Dual-Domain Feature Flow with Multi-View Hallucinations | Seunggyu Chang | |
| 2023 | ArXiv | M3DBench: Let’s Instruct Large Models with Multi-modal 3D Prompts | Mingsheng Li | |

### Audio Hallucination

| Year | Source | Name | Author | Content |
| --- | --- | --- | --- | --- |
| 2022 | IEEE | Hallucination of Speech Recognition Errors With Sequence to Sequence Learning | Prashant Serai | Presents novel end-to-end models that directly predict hallucinated ASR word-sequence outputs, conditioned on an input word sequence as well as a corresponding phoneme sequence. |
| 2023 | ArXiv | Parameter Efficient Audio Captioning with Faithful Guidance Using Audio-Text Shared Latent Representation | Arvind Krishna Sridhar | Proposes a data-augmentation technique for generating hallucinated audio captions and shows that similarity based on an audio-text shared latent space is suitable for detecting hallucination. Also proposes a parameter-efficient, inference-time faithful decoding algorithm that enables smaller audio-captioning models to match the performance of larger models trained with more data. |
| 2023 | ArXiv | Factual Consistency Oriented Speech Recognition | Naoyuki Kanda | Presents a novel optimization framework for automatic speech recognition (ASR) aimed at reducing the hallucinations produced by an ASR model. The framework optimizes the model to maximize an expected factual-consistency score between ASR hypotheses and ground-truth transcriptions, where the score is computed by a separately trained estimator. |
| 2023 | ArXiv | LP-MusicCaps: LLM-Based Pseudo Music Captioning | SeungHeon Doh | |
| 2020 | ArXiv | Identifying Audio Adversarial Examples via Anomalous Pattern Detection | Victor Akinwande | Shows that audio-processing models based on deep neural networks are susceptible to adversarial attacks even when the adversarial waveform is 99.9% similar to a benign sample, and proposes a method to detect audio adversarial samples. |
| 2023 | ArXiv | Listen, Think, and Understand | Yuan Gong | Creates the OpenAQA-5M dataset of 1.9 million closed-ended and 3.7 million open-ended, diverse (audio, question, answer) tuples, and uses an autoregressive training framework with a perception-to-understanding curriculum. LTU demonstrates strong performance and generalization on conventional audio tasks such as classification and captioning, and can greatly mitigate the hallucination issue. |

### Language Hallucination

| Year | Source | Name | Author | Content |
| --- | --- | --- | --- | --- |
| 2023 | ArXiv | Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models | Yue Zhang | |
| 2022 | ArXiv | Survey of Hallucination in Natural Language Generation | Ziwei Ji | |
| 2023 | ArXiv | Theory of Hallucinations based on Equivariance | Hisaichi Shibata | |
| 2023 | ArXiv | Cognitive Mirage: A Review of Hallucinations in Large Language Models | Hongbin Ye | |
| 2023 | ArXiv | Factuality Challenges in the Era of Large Language Models | Isabelle Augenstein | |

### Robotic & Agent Hallucination

| Year | Source | Name | Author | Content |
| --- | --- | --- | --- | --- |
| 2023 | ArXiv | Learning Perceptual Hallucination for Multi-Robot Navigation in Narrow Hallways | Jin-Soo Park | |
| 2023 | ArXiv | Robots That Ask For Help: Uncertainty Alignment for Large Language Model Planners | Allen Z. Ren | |
| 2021 | ArXiv | Toward Agile Maneuvers in Highly Constrained Spaces: Learning from Hallucination | Xuesu Xiao | |
| 2023 | ArXiv | Large Language Models as Generalizable Policies for Embodied Tasks | Andrew Szot | |
| 2023 | ArXiv | CogAgent: A Visual Language Model for GUI Agents | Wenyi Hong | |

## 📊 Evaluation for AGI Hallucination

### LLMs

| Year | Source | Name | Author | Content |
| --- | --- | --- | --- | --- |
| 2023 | ArXiv | FACTSCORE: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation | Sewon Min | |
| 2023 | ArXiv | Generating Benchmarks for Factuality Evaluation of Language Models | Dor Muhlgay | |
| 2022 | ArXiv | Teaching models to express their uncertainty in words | Stephanie Lin | |

### MLLMs

| Year | Source | Name | Author | Content |
| --- | --- | --- | --- | --- |
| 2023 | ArXiv | A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity | Yejin Bang | |
| 2023 | ArXiv | AMBER: An LLM-free Multi-dimensional Benchmark for MLLMs Hallucination Evaluation | Junyang Wang | |
| 2023 | ArXiv | A Survey of Hallucination in “Large” Foundation Models | Vipula Rawte | |
| 2023 | ArXiv | HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models | Junyi Li | |
| 2023 | ArXiv | Evaluating Object Hallucination in Large Vision-Language Models | Yifan Li | |
| 2023 | ArXiv | A Benchmark for General AI Assistants | Grégoire Mialon | |

### Image-Text Hallucination

| Year | Source | Name | Author | Content |
| --- | --- | --- | --- | --- |
| 2023 | ArXiv | A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity | Yejin Bang | |

### Video-Text Hallucination

| Year | Source | Name | Author | Content |
| --- | --- | --- | --- | --- |
| 2023 | ArXiv | Models See Hallucinations: Evaluating the Factuality in Video Captioning | Hui Liu | |
| 2023 | ArXiv | Video-CSR: Complex Video Digest Creation for Visual-Language Models | Tingkai Liu | |

### 3D Hallucination

| Year | Source | Name | Author | Content |
| --- | --- | --- | --- | --- |
| 2023 | ArXiv | Evaluating VLMs for Score-Based, Multi-Probe Annotation of 3D Objects | Rishabh Kabra | |

### Audio Hallucination

| Year | Source | Name | Author | Content |
| --- | --- | --- | --- | --- |
| 2020 | ACL | Asking and Answering Questions to Evaluate the Factual Consistency of Summaries | Alex Wang | |

## 🍬 Discourse for AGI Hallucination

While mitigating hallucinations is essential in the AGI era, it is also important to note that not all such occurrences are detrimental. In some scenarios, hallucination can spark a model's creativity. Striking a balance between hallucination and creation is a crucial challenge.

### Text Hallucination

| Year | Source | Name | Author | Content |
| --- | --- | --- | --- | --- |
| 2023 | ArXiv | LLM Lies: Hallucinations Are Not Bugs, but Features as Adversarial Examples | Jia-Yu Yao | |
| 2023 | IEEE | Intentional Biases in LLM Responses | Nicklaus Badyal | |
| 2023 | ArXiv | User-Controlled Knowledge Fusion in Large Language Models: Balancing Creativity and Hallucination | Chen Zhang | |
| 2022 | ArXiv | Embedding Hallucination for Few-Shot Language Fine-tuning | Yiren Jian | |

### Image-Text Hallucination

| Year | Source | Name | Author | Content |
| --- | --- | --- | --- | --- |
| 2023 | ArXiv | Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization | Zhiyuan Zhao | |
| 2023 | ArXiv | Iterative Teaching by Data Hallucination | Zeju Qiu | |
| 2023 | ArXiv | Hallucination Improves the Performance of Unsupervised Visual Representation Learning | Jing Wu | |
| 2023 | ArXiv | Scene Graph as Pivoting: Inference-time Image-free Unsupervised Multimodal Machine Translation with Visual Scene Hallucination | Hao Fei | |

### Video-Text Hallucination

| Year | Source | Name | Author | Content |
| --- | --- | --- | --- | --- |
| 2023 | ArXiv | Putting People in Their Place: Affordance-Aware Human Insertion into Scenes | Sumith Kulal | Proposes a method for inserting people into scenes, enabling the model to hallucinate both person and scene content and produce compositions that are harmonious and creative. |
| 2023 | ArXiv | Multi-Object Tracking with Hallucinated and Unlabeled Videos | Daniel McKee | |

### Audio Hallucination

| Year | Source | Name | Author | Content |
| --- | --- | --- | --- | --- |
| 2023 | ArXiv | HalluAudio: Hallucinate Frequency as Concepts for Few-Shot Audio Classification | Zhongjie Yu | |

### Robotic & Agent Hallucination

| Year | Source | Name | Author | Content |
| --- | --- | --- | --- | --- |
| 2021 | ArXiv | Agile Robot Navigation through Hallucinated Learning and Sober Deployment | Xuesu Xiao | |
| 2021 | ArXiv | From Agile Ground to Aerial Navigation: Learning from Learned Hallucination | Zizhao Wang | |
| 2023 | ArXiv | HaLP: Hallucinating Latent Positives for Skeleton-based Self-Supervised Learning of Actions | Anshul Shah | |