Large language models (LLMs) often hallucinate by generating factually incorrect or unfaithful content, posing significant risks to their safe use. Detecting such hallucinations is particularly challenging under the zero-source constraint, where no model internals or external references are available, and detection must rely solely on the textual query–answer pair. In this paper, we propose Human-like Criteria Probing for Hallucination Detection (HCPD), a paradigm that emulates the multi-faceted reasoning of human evaluators. Its core is a Human-like Criteria Probing (HCP) mechanism, in which an LLM agent adaptively decomposes its judgment into a weighted set of interpretable criteria and aggregates criterion-specific scores into a final truthfulness measure. To achieve this adaptive capability, we introduce a reward-based alignment scheme using only weak supervision from semantic consistency. At inference, we employ a multi-sampling aggregation strategy to ensure robust decisions while preserving full interpretability. We further provide theoretical analysis supporting the reliability of our approach. Extensive experiments show that HCPD consistently outperforms state-of-the-art baselines, offering an effective and explainable solution for zero-source hallucination detection.
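The weighted aggregation and multi-sampling steps described above can be illustrated with a minimal sketch. It is not the HCPD implementation: the function names, the criterion names, the [0, 1] score range, the weight normalization, and the plain mean across samples are all assumptions made for illustration.

```python
from statistics import mean


def aggregate_criteria(scores, weights):
    """Weighted average of criterion-specific scores (hypothetical helper)."""
    if scores.keys() != weights.keys():
        raise ValueError("scores and weights must cover the same criteria")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Normalize by the weight total so the result stays in the score range.
    return sum(scores[c] * weights[c] for c in scores) / total


def multi_sample_truthfulness(samples):
    """Average the aggregated score over several sampled judgments.

    Each sample is a (scores, weights) pair; criteria may differ per sample.
    A plain mean is an assumption, not necessarily the paper's aggregation.
    """
    return mean(aggregate_criteria(s, w) for s, w in samples)


# Two illustrative samples; criterion names and numbers are made up.
samples = [
    (
        {"factual_accuracy": 0.9, "internal_consistency": 0.7},
        {"factual_accuracy": 0.6, "internal_consistency": 0.4},
    ),
    (
        {"factual_accuracy": 0.8, "relevance": 0.6},
        {"factual_accuracy": 0.5, "relevance": 0.5},
    ),
]
score = multi_sample_truthfulness(samples)
```

Because each sample keeps its own criteria and weights, the per-criterion contributions remain inspectable after aggregation, which is the interpretability property the abstract emphasizes.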
- GPU: 2 × NVIDIA RTX GPUs with 80 GB memory
- CUDA: 12.4
- Python: 3.11
- PyTorch: 2.6.0
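To compare a local setup against the versions listed above, a small check along the following lines may help. This is a sketch rather than part of the repository: `check_environment` is a hypothetical helper, and it only reports mismatches without enforcing anything.

```python
import importlib.util
import sys

# Versions taken from the environment list above.
EXPECTED_PYTHON = (3, 11)
EXPECTED_TORCH = "2.6.0"
EXPECTED_CUDA = "12.4"


def check_environment():
    """Return a dict describing how the local setup compares (hypothetical helper)."""
    report = {"python_ok": sys.version_info[:2] == EXPECTED_PYTHON}
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not installed yet, e.g. before running setup.sh.
        report["torch_installed"] = False
        return report
    import torch

    report["torch_installed"] = True
    # torch.__version__ may carry a suffix such as "+cu124".
    report["torch_ok"] = torch.__version__.startswith(EXPECTED_TORCH)
    # torch.version.cuda is None for CPU-only builds.
    report["cuda_ok"] = torch.version.cuda == EXPECTED_CUDA
    report["gpu_count"] = torch.cuda.device_count()
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```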
Create a virtual environment and install all required dependencies for training and evaluation.
bash setup.sh
conda activate HCPD
- Dataset: We use four widely adopted QA benchmarks (TriviaQA, SciQ (train), NQ Open, and CoQA) to construct the hallucination detection datasets. The generated datasets can be obtained and stored in ./generated_datasets by running the command below:
bash generate_datasets.sh
- Pre-trained models: We adopt Qwen2.5-7B-Instruct as the scoring agent and choose Llama-3.1-8B and Qwen3-8B as the evaluated target LLMs in the main experiments.
The datasets and pre-trained models will be automatically downloaded to ./.cache.
Alternatively, they can be downloaded manually from the corresponding official repositories. After downloading, please configure the MODEL_PATH in the run scripts.
Pretrained checkpoints are provided in Google Drive. The results can be quickly verified using the following bash scripts.
bash quick_validation.sh
Training and evaluation pipelines are provided through the following bash scripts.
- TriviaQA:
bash run_TriviaQA.sh
- SciQ:
bash run_SciQ.sh
- NQ Open:
bash run_NQOpen.sh
- CoQA:
bash run_CoQA.sh
Output Directory:
Model checkpoints generated during training are saved to ./data_{metric}. Evaluation logs and test results are saved to ./logs.
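After a run, the outputs can be located with a short helper such as the one below. It is a sketch under assumptions: `find_outputs` is a hypothetical name, and the `data_*` pattern assumes that `{metric}` in `./data_{metric}` is replaced by a metric name at run time.

```python
from pathlib import Path


def find_outputs(root="."):
    """List checkpoint directories and log files under root (hypothetical helper)."""
    root = Path(root)
    # Assumption: checkpoint directories are named data_<metric>.
    checkpoint_dirs = sorted(p for p in root.glob("data_*") if p.is_dir())
    log_dir = root / "logs"
    log_files = (
        sorted(p for p in log_dir.rglob("*") if p.is_file())
        if log_dir.is_dir()
        else []
    )
    return checkpoint_dirs, log_files


if __name__ == "__main__":
    checkpoints, logs = find_outputs()
    for path in checkpoints:
        print("checkpoint dir:", path)
    for path in logs:
        print("log file:", path)
```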
If you find this work useful in your research, please consider citing:
@inproceedings{yang2026zerosource,
  title={Zero-source {LLM} Hallucination Detection with Human-like Criteria Probing},
  author={Jiahao Yang and Shuhai Zhang and Hailong Kang and Feng Liu and Qi Chen and Mingkui Tan},
  booktitle={Forty-third International Conference on Machine Learning},
  year={2026},
  url={https://openreview.net/forum?id=s4Jn6bKYGI}
}