Reliable LLM: Hallucination & Knowledge & Uncertainty (From Factuality Perception to Expression)



Introduction

This project presents the background of LLM hallucination 👻 and mitigation methods based on uncertainty 🤔 and knowledge 📓. Research works are collected and systematically clustered by direction and method for reliable AI development. The project provides a framework for improving LLMs' factuality perception and eliciting factual expressions to address the hallucination issue.

You are welcome to contribute to this project by sharing valuable papers and exchanging ideas!

Outline

👻 Hallucination & Factuality

Definition of LLM Hallucination

The definition of hallucination varies and depends on the specific task. This project focuses on hallucination in knowledge-intensive tasks (closed-book QA, dialogue, RAG, commonsense reasoning, translation, etc.), where hallucinations refer to non-factual, incorrect statements in generations that are unfaithful to world knowledge.

Causes of LLM Hallucination

The causes of hallucination include unfiltered incorrect statements in pretraining data, the limited input length of model architectures, the maximum-likelihood training objective, and diverse decoding strategies.

The architectures, input lengths, pretraining data, and training strategies of released LLMs are fixed, and tracing incorrect texts in massive pretraining corpora is challenging. This project therefore focuses on detecting hallucinations by probing what LLMs have learned during pretraining, and on mitigating hallucinations during fine-tuning and decoding.

Compared with open-ended generation tasks, knowledge-intensive tasks have a specific ground-truth reference: world knowledge. Therefore, we can estimate the knowledge boundary of an LLM to specify what it knows. Ensuring the certainty level, or honesty, of an LLM about a piece of factual knowledge is crucial for hallucination detection (moving from the grey area to the green area).

📓 LLM Knowledge

The diagram above is a rough, simplified representation of the knowledge boundary. In reality, like humans, LLMs hold much of their knowledge with some degree of uncertainty rather than simply knowing or not knowing it. Moreover, maximum-likelihood prediction during pretraining makes LLMs prone to generating over-confident responses. Even if an LLM knows a fact, making it accurately tell what it knows is equally important.

This adds complexity to determining the knowledge boundary, which leads to two challenging questions:

  1. How to accurately perceive (Perception) the knowledge boundary?

    (Example: Given a question, such as "What is the capital of France?", the model is required to provide its confidence level for this question.)

  2. How to accurately express (Expression) knowledge when the boundary is somewhat vague? (The earlier work U2Align is one method to enhance expression; current interest in this second "expression" stage also lies in "alignment" methods.)

    (Example: If the model's confidence in answering "Paris" to the above question is 40%, should it refuse to answer or provide a response? A minimal answer/abstain sketch is given after this list.)
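
As a toy illustration of the expression question, here is a minimal sketch of a confidence-thresholded answer/abstain policy. The `Prediction` class, the `express` function, and the 0.5 threshold are hypothetical choices for illustration, assuming a calibrated confidence score is already available; real systems may instead verbalize the confidence or hedge the wording.

```python
from dataclasses import dataclass

@dataclass
class Prediction:
    answer: str        # candidate answer, e.g. "Paris"
    confidence: float  # calibrated confidence in [0, 1]

def express(pred: Prediction, threshold: float = 0.5) -> str:
    """Toy answer/abstain policy: answer only when confidence clears the threshold."""
    if pred.confidence >= threshold:
        return pred.answer
    return "I'm not sure."  # abstain / express uncertainty instead of guessing

print(express(Prediction(answer="Paris", confidence=0.4)))  # -> I'm not sure.
print(express(Prediction(answer="Paris", confidence=0.9)))  # -> Paris
```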

🤔 Uncertainty Estimation

Traditional Model Calibration

  • Models trained with maximum likelihood estimation (MLE) are prone to over-confident predictions, so estimating a confidence score or uncertainty measure is crucial for reliable AI applications.
  • A model is considered well-calibrated if the confidence scores of its predictions (SoftMax probabilities) are well aligned with the actual probability that the answers are correct.
  • Expected Calibration Error (ECE) and reliability diagrams are used to measure calibration performance (a minimal ECE sketch follows the figure below).

Figure: uncalibrated (left), over-confident (middle), and well-calibrated (right) models.
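
For reference, a minimal NumPy sketch of equal-width-bin ECE, computed from confidence scores and per-answer correctness labels (the binning scheme and toy numbers are illustrative assumptions, not tied to any particular paper):

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE: weighted average of |accuracy - mean confidence| per bin."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap  # weight = fraction of samples in the bin
    return ece

# Toy usage: predicted confidences vs. whether each answer was actually correct.
print(expected_calibration_error([0.9, 0.8, 0.6, 0.55], [1, 1, 0, 1]))
```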

Uncertainty Estimation of Generative Models

  • To calibrate generative LLMs, we need to quantify confidence and uncertainty over generated sentences.
  • Uncertainty: categorized into aleatoric (data) and epistemic (model) uncertainty; frequently measured by the entropy of the prediction, which indicates the dispersion of the model's predictive distribution (a minimal sketch follows this list).
  • Confidence: generally associated with both the input and the prediction.
  • The terms uncertainty and confidence are often used interchangeably.
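
As a concrete reading of the entropy-based measure, here is a minimal sketch that averages token-level entropy over a generated sequence; the per-token probability vectors and the toy numbers are assumptions for illustration, since how they are obtained depends on the model API:

```python
import numpy as np

def mean_token_entropy(token_distributions):
    """Average token-level entropy over a generated sequence.

    `token_distributions`: one probability vector per generated token (over the
    vocabulary); higher values indicate a more dispersed, more uncertain prediction.
    """
    entropies = []
    for p in token_distributions:
        p = np.asarray(p, dtype=float)
        p = p / p.sum()  # normalize defensively
        entropies.append(-np.sum(p * np.log(p + 1e-12)))
    return float(np.mean(entropies))

# Toy usage: one confident step and one uncertain step.
print(mean_token_entropy([[0.97, 0.01, 0.01, 0.01], [0.4, 0.3, 0.2, 0.1]]))
```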

Although the knowledge boundary is important for knowledge-intensive tasks, previous works give no specific definition of the concept. Current methods for estimating knowledge boundaries build on confidence/uncertainty estimation, including ① logit-based methods using token-level probabilities; ② prompt-based methods that make LLMs express confidence in words; ③ sampling-based methods that measure the consistency of multiple samples; and ④ training-based methods that learn the ability to express uncertainty. Minimal sketches of ① and ③ follow.
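
To make families ① and ③ concrete, the sketch below uses a length-normalized sequence probability as a logit-based confidence proxy and answer agreement as a sampling-based consistency score; the function names and toy inputs are illustrative assumptions, not tied to any specific model API:

```python
import math
from collections import Counter

def logit_confidence(token_logprobs):
    """① Logit-based: length-normalized sequence probability as a confidence proxy."""
    return math.exp(sum(token_logprobs) / len(token_logprobs))

def sampling_consistency(sampled_answers):
    """③ Sampling-based: fraction of samples agreeing with the most frequent answer."""
    counts = Counter(answer.strip().lower() for answer in sampled_answers)
    return counts.most_common(1)[0][1] / len(sampled_answers)

# Toy usage with made-up log-probabilities and sampled answers.
print(logit_confidence([-0.1, -0.2, -0.05]))                      # close to 1.0 -> confident
print(sampling_consistency(["Paris", "Paris", "Lyon", "Paris"]))  # 0.75
```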





Related Works on Hallucination & Knowledge & Uncertainty

👻 Hallucination & Factuality

Hallucination Detection

Consistency-based Detection

| Title | Conference/Journal |
| --- | --- |
| SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models | EMNLP 2023 |
| RCOT: Detecting and Rectifying Factual Inconsistency in Reasoning by Reversing Chain-of-Thought | Preprint |

Internal State-based Detection

| Title | Conference/Journal |
| --- | --- |
| The Internal State of an LLM Knows When It's Lying | Preprint |
| Unsupervised Real-Time Hallucination Detection based on the Internal States of Large Language Models | Preprint |
| On the Universal Truthfulness Hyperplane Inside LLMs | Preprint |
| INSIDE: LLMs' Internal States Retain the Power of Hallucination Detection | Preprint |
| LLM Internal States Reveal Hallucination Risk Faced With a Query | Preprint |
| Discovering Latent Knowledge in Language Models Without Supervision | Preprint |

📓 LLM Knowledge

Knowledge Boundary

| Title | Conference/Journal |
| --- | --- |
| Knowledge of Knowledge: Exploring Known-Unknowns Uncertainty with Large Language Models | Preprint |
| Can AI Assistants Know What They Don't Know? | Preprint |
| Do Large Language Models Know What They Don't Know? | Preprint |
| Investigating the Factual Knowledge Boundary of Large Language Models with Retrieval Augmentation | EMNLP 2023 |
| Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations? | Preprint |

🤔 Uncertainty Estimation

Survey & Investigation

| Title | Conference/Journal |
| --- | --- |
| A Survey of Confidence Estimation and Calibration in Large Language Models | Preprint |
| Uncertainty Quantification with Pre-trained Language Models: A Large-Scale Empirical Analysis | EMNLP 2022 |
| Uncertainty Estimation and Quantification for LLMs: A Simple Supervised Approach | Preprint |
| Confidence Under the Hood: An Investigation into the Confidence-Probability Alignment in Large Language Models | Preprint |
| Large Language Models Must Be Taught to Know What They Don't Know | Preprint |

Uncertainty Quantification

| Title | Conference/Journal |
| --- | --- |
| Language Models (Mostly) Know What They Know | Preprint |
| Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in Natural Language Generation | ICLR 2023 |
| Generating with Confidence: Uncertainty Quantification for Black-box Large Language Models | Preprint |
| When Quantization Affects Confidence of Large Language Models? | Preprint |
| Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs | ICLR 2024 |
| Kernel Language Entropy: Fine-grained Uncertainty Quantification for LLMs from Semantic Similarities | Preprint |
| Semantically Diverse Language Generation for Uncertainty Estimation in Language Models | Preprint |
| Uncertainty is Fragile: Manipulating Uncertainty in Large Language Models | Preprint |

Linguistic Uncertainty Expressions

| Title | Conference/Journal |
| --- | --- |
| Navigating the Grey Area: Expressions of Overconfidence and Uncertainty in Language Models | EMNLP 2023 |
| Teaching Models to Express Their Uncertainty in Words | TMLR 2022 |
| Relying on the Unreliable: The Impact of Language Models' Reluctance to Express Uncertainty | Preprint |
| "I'm Not Sure, But...": Examining the Impact of Large Language Models' Uncertainty Expression on User Reliance and Trust | FAccT 2024 |
| Can Large Language Models Faithfully Express Their Intrinsic Uncertainty in Words? | Preprint |

Confidence Expression Improvements

This line of work improves the confidence expression of LLMs in a two-stage form: 1) self-prompting the LLM to generate responses to queries and collecting the samples into a dataset with specific features (e.g., confidence labels), and 2) fine-tuning the LLM on the collected dataset to improve the targeted capability. A high-level sketch of stage 1 follows.
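
As a rough illustration of stage 1, the sketch below builds a dataset whose targets append a consistency-based confidence statement to each answer; the `generate` callable, the consistency-based labels, and the target format are assumptions for illustration, and the cited works differ in how labels are constructed and how stage 2 fine-tuning is done:

```python
import random
from collections import Counter

def build_confidence_dataset(generate, queries, n_samples=5):
    """Stage 1: self-prompt the model, then label each distinct answer with a
    consistency-based confidence (fraction of samples agreeing with it)."""
    dataset = []
    for query in queries:
        answers = [generate(query) for _ in range(n_samples)]
        counts = Counter(a.strip().lower() for a in answers)
        for answer in sorted(set(answers)):
            confidence = counts[answer.strip().lower()] / n_samples
            dataset.append({"query": query,
                            "target": f"{answer} (Confidence: {confidence:.0%})"})
    return dataset

# Stage 2 (not shown): fine-tune the LLM on the (query, target) pairs so it learns
# to express a calibrated confidence alongside its answer.

# Toy usage with a stand-in "model" that returns canned answers.
toy_generate = lambda q: random.choice(["Paris", "Paris", "Lyon"])
print(build_confidence_dataset(toy_generate, ["What is the capital of France?"]))
```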

| Title | Conference/Journal |
| --- | --- |
| Enhancing Confidence Expression in Large Language Models Through Learning from Past Experience | Preprint |
| Improving the Reliability of Large Language Models by Leveraging Uncertainty-Aware In-Context Learning | Preprint |
| Uncertainty in Language Models: Assessment through Rank-Calibration | Preprint |
| SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales | Preprint |
| Linguistic Calibration of Language Models | Preprint |
| R-Tuning: Instructing Large Language Models to Say 'I Don't Know' | Preprint |

Hallucination Detection by Uncertainty

| Title | Conference/Journal |
| --- | --- |
| On Hallucination and Predictive Uncertainty in Conditional Language Generation | EACL 2021 |
| Learning Confidence for Transformer-based Neural Machine Translation | ACL 2022 |
| Towards Reliable Misinformation Mitigation: Generalization, Uncertainty, and GPT-4 | EMNLP 2023 |
| SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models | EMNLP 2023 |
| Detecting Hallucinations in Large Language Models using Semantic Entropy | Nature |
| LLM Internal States Reveal Hallucination Risk Faced With a Query | Preprint |

Factuality Alignment by Confidence

| Title | Conference/Journal |
| --- | --- |
| When to Trust LLMs: Aligning Confidence with Response Quality | Preprint |
| Fine-tuning Language Models for Factuality | ICLR 2024 |
| Uncertainty Aware Learning for Language Model Alignment | ACL 2024 |
| FLAME: Factuality-Aware Alignment for Large Language Models | Preprint |
| Learning to Trust Your Feelings: Leveraging Self-awareness in LLMs for Hallucination Mitigation | Preprint |
| Self-Alignment for Factuality: Mitigating Hallucinations in LLMs via Self-Evaluation | ACL 2024 |

Generative Model Calibration

| Title | Conference/Journal |
| --- | --- |
| Reducing Conversational Agents' Overconfidence Through Linguistic Calibration | TACL 2022 |
| Preserving Pre-trained Features Helps Calibrate Fine-tuned Language Models | ICLR 2023 |
| Calibrating the Confidence of Large Language Models by Eliciting Fidelity | Preprint |
| Few-Shot Recalibration of Language Models | Preprint |
| How Can We Know When Language Models Know? On the Calibration of Language Models for Question Answering | TACL 2022 |
| Knowing More About Questions Can Help: Improving Calibration in Question Answering | ACL 2021 Findings |
| Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback | EMNLP 2023 |
| Re-Examining Calibration: The Case of Question Answering | TACL 2021 |
| Calibrating Large Language Models Using Their Generations Only | Preprint |
| Calibrating Large Language Models with Sample Consistency | Preprint |
| Linguistic Calibration of Language Models | Preprint |


🔭 Future Directions

  1. More advanced methods to assist LLM hallucination detection and human decision-making. (A new paradigm)
  2. Confidence estimation for long-form generation such as code, novels, etc. (Benchmark)
  3. Teaching models to explain and clarify their confidence estimates and calibration. (Natural language)
  4. Calibration on human variation (misalignment between LM measures and human disagreement).
  5. Confidence estimation and calibration for multi-modal LLMs.