TRISKEL10N/HCPD

Zero-source LLM Hallucination Detection with Human-like Criteria Probing

Jiahao Yang, Shuhai Zhang, Hailong Kang, Feng Liu, Qi Chen, Mingkui Tan

✨ Abstract

Large language models (LLMs) often hallucinate by generating factually incorrect or unfaithful content, posing significant risks to their safe use. Detecting such hallucinations is particularly challenging under the zero-source constraint, where no model internals or external references are available and detection must rely solely on the textual query–answer pair. In this paper, we propose Human-like Criteria Probing for Hallucination Detection (HCPD), a paradigm that emulates the multi-faceted reasoning of human evaluators. Its core is a Human-like Criteria Probing (HCP) mechanism, in which an LLM agent adaptively decomposes its judgment into a weighted set of interpretable criteria and aggregates the criterion-specific scores into a final truthfulness measure. To achieve this adaptive capability, we introduce a reward-based alignment scheme that uses only weak supervision derived from semantic consistency. At inference, we employ a multi-sampling aggregation strategy to ensure robust decisions while preserving full interpretability. We further provide a theoretical analysis supporting the reliability of our approach. Extensive experiments show that HCPD consistently outperforms state-of-the-art baselines, offering an effective and explainable solution for zero-source hallucination detection.
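The abstract describes HCP at a high level: the judge LLM proposes weighted criteria, scores the answer against each one, and aggregates those scores, and at inference several sampled judgments are combined. The Python sketch below illustrates only that aggregation arithmetic under stated assumptions. The `Criterion` structure, the normalized weighted sum, the plain mean across samples, and the example criterion names are all hypothetical and are not taken from the HCPD code; the LLM calls that would produce the criteria and the reward-based alignment are omitted.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Criterion:
    """One interpretable criterion proposed by the judge (hypothetical structure)."""
    name: str
    weight: float  # non-negative importance weight assigned by the judge
    score: float   # criterion-specific truthfulness score in [0, 1]


def aggregate_criteria(criteria: list[Criterion]) -> float:
    """Combine criterion scores into one truthfulness value via a normalized weighted sum."""
    total_weight = sum(c.weight for c in criteria)
    if total_weight <= 0:
        raise ValueError("criterion weights must sum to a positive value")
    return sum(c.weight * c.score for c in criteria) / total_weight


def multi_sample_truthfulness(samples: list[list[Criterion]]) -> float:
    """Average per-sample truthfulness over several sampled judgments (assumed: plain mean)."""
    if not samples:
        raise ValueError("need at least one sampled judgment")
    return mean(aggregate_criteria(s) for s in samples)


if __name__ == "__main__":
    # Two hypothetical sampled judgments for the same query-answer pair;
    # each sample may propose a different set of criteria and weights.
    sample_a = [
        Criterion("factual accuracy", weight=0.6, score=0.9),
        Criterion("relevance to the query", weight=0.3, score=0.8),
        Criterion("internal consistency", weight=0.1, score=1.0),
    ]
    sample_b = [
        Criterion("factual accuracy", weight=0.7, score=0.7),
        Criterion("specificity", weight=0.3, score=0.6),
    ]
    score = multi_sample_truthfulness([sample_a, sample_b])
    print(f"truthfulness = {score:.3f}")
```

Normalizing by the total weight keeps each per-sample value in [0, 1] even when the sampled weights do not sum to one; the released scripts may weight, normalize, or aggregate samples differently.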

⚙️ Requirements

  • GPU: 2 × NVIDIA RTX GPUs with 80 GB memory
  • CUDA: 12.4
  • Python: 3.11
  • PyTorch: 2.6.0

💡 Virtual Environment

Create a virtual environment and install all required dependencies for training and evaluation.

bash setup.sh
conda activate HCPD

📂 Data and Pre-trained Models

  • Dataset: We use four widely adopted QA benchmarks (TriviaQA, SciQ (train split), NQ Open, and CoQA) to construct the hallucination detection datasets. Run the command below to generate the datasets; they are stored in ./generated_datasets:
bash generate_datasets.sh

The datasets and pre-trained models are downloaded automatically to ./.cache. Alternatively, they can be downloaded manually from their official repositories. After downloading, set MODEL_PATH in the run scripts to point to the model location.

🚀 Quick Start

Pretrained checkpoints are available on Google Drive. The results can be quickly verified with the following script:

bash quick_validation.sh

▶️ Main Experiments

Training and evaluation pipelines are provided through the following bash scripts.

  • TriviaQA:
bash run_TriviaQA.sh
  • SciQ:
bash run_SciQ.sh
  • NQ Open:
bash run_NQOpen.sh
  • CoQA:
bash run_CoQA.sh

Output Directory: Model checkpoints generated during training are saved to ./data_{metric}. Evaluation logs and test results are saved to ./logs.

📖 Citation

If you find this work useful in your research, please consider citing:

@inproceedings{yang2026zerosource,
  title={Zero-source {LLM} Hallucination Detection with Human-like Criteria Probing},
  author={Jiahao Yang and Shuhai Zhang and Hailong Kang and Feng Liu and Qi Chen and Mingkui Tan},
  booktitle={Forty-third International Conference on Machine Learning},
  year={2026},
  url={https://openreview.net/forum?id=s4Jn6bKYGI}
}

About

[ICML 2026] "Zero-source LLM Hallucination Detection with Human-like Criteria Probing"
