CheckRLM: Effective Knowledge-Thought Coherence Checking in Retrieval-Augmented Reasoning

Dingling Xu¹, Ruobing Wang², Qingfei Zhao², Yukun Yan³, Zhichun Wang¹, Daren Zha², Shi Yu³, Zhenghao Liu⁴, Shuo Wang³, Xu Han³, Maosong Sun³

¹Beijing Normal University, ²Institute of Information Engineering, Chinese Academy of Sciences, ³Tsinghua University, ⁴Northeastern University

📖 Introduction

We propose CheckRLM, a framework that employs Retrieval-Augmented Generation (RAG) to promptly identify and correct factual errors within long reasoning chains, thereby aligning them with external knowledge. CheckRLM comprises two components: in-process knowledge claim recognition and localized knowledge coherence correction via retrieval.
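The two components can be pictured with a toy sketch of the control flow. This is only an illustration under strong assumptions, not the repository's implementation: `KNOWLEDGE`, `extract_claims`, `retrieve`, `check_step`, and `reason_with_checks` are hypothetical names, the "claim recognizer" is a string match rather than a language model, and the "retriever" is a dictionary lookup.

```python
from typing import List, Optional, Tuple

# Toy stand-in for an external knowledge source (hypothetical data).
KNOWLEDGE = {"Eiffel Tower": "Paris", "Colosseum": "Rome"}


def extract_claims(step: str) -> List[Tuple[str, str]]:
    """Hypothetical claim recognizer: finds 'X is in Y' statements."""
    claims = []
    for sentence in step.split("."):
        if " is in " in sentence:
            subject, _, value = sentence.strip().partition(" is in ")
            claims.append((subject.removeprefix("The ").strip(), value.strip()))
    return claims


def retrieve(subject: str) -> Optional[str]:
    """Hypothetical retriever: returns evidence for a subject, if any."""
    return KNOWLEDGE.get(subject)


def check_step(step: str) -> str:
    """Rewrite only the claims that conflict with retrieved evidence."""
    for subject, value in extract_claims(step):
        evidence = retrieve(subject)
        if evidence is not None and evidence != value:
            step = step.replace(
                f"{subject} is in {value}", f"{subject} is in {evidence}"
            )
    return step


def reason_with_checks(steps: List[str]) -> List[str]:
    """Check each reasoning step as it arrives, before moving on."""
    checked = []
    for step in steps:
        # A real system would continue generating from the corrected step;
        # this toy version only patches the step itself.
        checked.append(check_step(step))
    return checked
```

For example, `reason_with_checks(["The Eiffel Tower is in Berlin."])` rewrites only the conflicting claim and leaves the rest of the step untouched.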

⚙️ Installation

Environment Setup

conda create --name checkrlm python=3.12
conda activate checkrlm

git clone https://github.com/AI9Stars/CheckRLM.git
cd CheckRLM
pip install -r requirement.txt

Elasticsearch Setup

Install Elasticsearch for BM25 retrieval:

wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.10.2-linux-x86_64.tar.gz
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.10.2-linux-x86_64.tar.gz.sha512
shasum -a 512 -c elasticsearch-7.10.2-linux-x86_64.tar.gz.sha512
tar -xzf elasticsearch-7.10.2-linux-x86_64.tar.gz
cd elasticsearch-7.10.2/
./bin/elasticsearch # Start the server (runs in the foreground)
pkill -f elasticsearch # Stop the server (run from another shell)
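Before building indices, it can help to confirm that the server is actually reachable. The helper below is a minimal sketch that assumes Elasticsearch listens on its default address, http://localhost:9200; `fetch_root` and `wait_for_elasticsearch` are hypothetical names and not part of this repository.

```python
import json
import time
import urllib.request
from typing import Callable, Optional


def fetch_root(url: str) -> dict:
    """GET the Elasticsearch root endpoint and parse its JSON body."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def wait_for_elasticsearch(
    url: str = "http://localhost:9200",  # assumed default host and port
    retries: int = 30,
    delay: float = 2.0,
    fetch: Callable[[str], dict] = fetch_root,
) -> Optional[str]:
    """Poll the server; return its version string, or None if it never answers."""
    for _ in range(retries):
        try:
            info = fetch(url)
            return info["version"]["number"]
        except (OSError, KeyError, ValueError):
            # Connection refused, unexpected payload, or invalid JSON: retry.
            time.sleep(delay)
    return None

# Usage: print(wait_for_elasticsearch())  # expected to report 7.10.2 for this setup
```

The `fetch` argument exists only so the polling logic can be exercised without a running server.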

📁 Dataset Preparation

We evaluate on five benchmarks: hotpotqa, 2wikimultihopqa, simpleqa, musique, and iirc. For hotpotqa, 2wikimultihopqa, musique, and iirc, the evaluation splits and retrieval corpora follow the IRCoT setup. For simpleqa, we randomly sample 500 instances from the official test set and retrieve against the KILT Wikipedia knowledge source.

All per-benchmark files are under data/{dataset_name}/, with subsampled questions in data/{dataset_name}/test_subsampled.jsonl.
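A quick way to inspect a subsampled split is to read the JSON Lines file directly. The sketch below relies only on the path layout described above; `read_jsonl` and `load_questions` are hypothetical helper names, and no assumption is made about the fields inside each record.

```python
import json
from pathlib import Path
from typing import Iterator, List


def read_jsonl(path) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line."""
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def load_questions(dataset_name: str, data_root: str = "data") -> List[dict]:
    """Load data/{dataset_name}/test_subsampled.jsonl into a list."""
    path = Path(data_root) / dataset_name / "test_subsampled.jsonl"
    return list(read_jsonl(path))

# Usage (from the repository root):
#   questions = load_questions("hotpotqa")
#   print(len(questions), sorted(questions[0].keys()))
```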

To fetch retrieval corpora, run the following from the repository root (the script follows IRCoT-style sources and may install helpers such as gdown):

cd CheckRLM
bash src/download_corpus.sh

🛠️ Configuration

Set the project root and the dataset, model, retrieval, and DPO parameters in src/scripts/config.sh.

💾 Build Indices

For hotpotqa, 2wikimultihopqa, musique, and iirc, we build two indices: a BM25 index based on Elasticsearch, and a dense FAISS index using embeddings from bge-large-en-v1.5.

For simpleqa, we build only the dense FAISS index using embeddings from bge-large-en-v1.5.

BM25

cd src/scripts
bash run_build_index_bm25.sh
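For readers unfamiliar with BM25, the following sketch scores tokenized documents with the textbook Okapi BM25 formula. It is not what the script above runs: Elasticsearch computes scores internally and may differ in constant factors. The function name `bm25_scores` is hypothetical; `k1 = 1.2` and `b = 0.75` match Elasticsearch's documented defaults.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(
    query: List[str], docs: List[List[str]], k1: float = 1.2, b: float = 0.75
) -> List[float]:
    """Return one BM25 score per document for a tokenized query."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: long documents are penalized via b.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

A document that matches more query terms, or rarer ones, receives a higher score; documents with no matching term score 0.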

Dense Retrieval

cd src/scripts
bash run_build_index_embedding.sh
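Conceptually, dense retrieval ranks passages by the similarity between a query embedding and each passage embedding. The sketch below does this exhaustively with cosine similarity in pure Python, using toy 3-dimensional vectors in place of bge-large-en-v1.5 embeddings. It does not use FAISS, and the index type built by the script above is not assumed; `normalize` and `top_k_passages` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def top_k_passages(
    query_vec: Sequence[float], doc_vecs: Sequence[Sequence[float]], k: int = 2
) -> List[Tuple[int, float]]:
    """Return (passage index, cosine similarity) for the k closest passages."""
    q = normalize(query_vec)
    scored = []
    for i, d in enumerate(doc_vecs):
        # Inner product of unit vectors equals cosine similarity.
        similarity = sum(a * b for a, b in zip(q, normalize(d)))
        scored.append((i, similarity))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]
```

An approximate-nearest-neighbor library such as FAISS serves the same purpose at corpus scale, where an exhaustive Python loop would be far too slow.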

CheckRLM Inference

We implement four methods from the paper: Direct Reasoning, Vanilla RAG, Post-reasoning Check, and In-reasoning Check. In-reasoning Check achieves the strongest results in our experiments.

Decoding hyperparameters

Configure the reasoning model in src/config/reasoning_model.yaml and the check model in src/config/check_model.yaml. The decoding parameters used for each reasoning and check model in our experiments are as follows:

Reasoning backbone(s)	Temperature	top_p	top_k
Qwen3-8B, Qwen3-32B	0.7	0.8	20
QwQ-32B	0.6	0.95	40
DeepSeek-R1-Distill-Llama-70B	0.6	0.95	-1

Check backbone(s)	Temperature	top_p	top_k
Qwen3-8B	0.7	0.8	20
Qwen2.5-14B-Instruct, Qwen2.5-32B-Instruct, Llama-3.3-70B-Instruct	0	0.95	-1
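To make the roles of top_k and top_p concrete, the sketch below filters a toy next-token distribution the way these two parameters are commonly applied. It is an illustration, not the inference engine's code. Two assumptions are made: a top_k of -1 (or any non-positive value) disables the top-k cut-off, and a temperature of 0 corresponds to greedy decoding, which this sketch does not model. `filter_candidates` is a hypothetical name.

```python
from typing import Dict


def filter_candidates(
    probs: Dict[str, float], top_k: int = -1, top_p: float = 1.0
) -> Dict[str, float]:
    """Apply top-k, then nucleus (top-p) filtering, and renormalize."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    if top_k > 0:  # assumption: non-positive top_k means "no limit"
        ranked = ranked[:top_k]
    total = sum(p for _, p in ranked)
    ranked = [(token, p / total) for token, p in ranked]

    # Keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break

    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}
```

With `top_k=20, top_p=0.8`, at most 20 candidates survive the first cut, and of those only the most probable tokens covering 80% of the remaining mass are sampled from.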

Direct Reasoning

bash src/scripts/run_base.sh

Vanilla RAG

bash src/scripts/run_vanilla.sh

Post-reasoning Check

bash src/scripts/run_check_think_offline.sh

In-reasoning Check

bash src/scripts/run_check_think_online.sh

DPO Training (Optional)

Training Data Construction

Before running the data-construction scripts, edit src/config/check_model.yaml and set temperature and top_p to lists of numeric values.

bash src/scripts/gen_dpo_data.sh

Training

bash src/scripts/train_dpo.sh
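For orientation, the standard DPO objective for a single preference pair can be written in a few lines. This is a sketch of the published loss, not of the training script above, which may rely on a training library and different settings. `dpo_loss` is a hypothetical name, the inputs are summed log-probabilities of the chosen and rejected responses under the policy and the frozen reference model, and `beta=0.1` is an illustrative value only.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # illustrative value, not taken from the repository
) -> float:
    """DPO loss for one (chosen, rejected) pair: -log(sigmoid(margin))."""
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(m)) equals softplus(-m); computed in a numerically stable form.
    x = -margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))
```

When the policy and the reference agree, the loss equals log 2; it decreases as the policy assigns relatively more probability to the chosen response than the reference does.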

We also provide our DPO training data and Qwen-2.5-14B-Instruct_DPO model.

📄 Acknowledgement

We acknowledge the following open-source projects that informed our code:

🥰 Citation

@inproceedings{xu-etal-2026-checkrlm,
    title = "{C}heck{RLM}: Effective Knowledge{--}Thought Coherence Checking in Retrieval-Augmented Reasoning",
    author = "Xu, Dingling  and
      Wang, Ruobing  and
      Zhao, Qingfei  and
      Yan, Yukun  and
      Wang, Zhichun  and
      Zha, Daren  and
      Yu, Shi  and
      Liu, Zhenghao  and
      Wang, Shuo  and
      Han, Xu  and
      Sun, Maosong",
    editor = "Liakata, Maria  and
      Moreira, Viviane P.  and
      Zhang, Jiajun  and
      Jurgens, David",
    booktitle = "Proceedings of the 64th Annual Meeting of the {A}ssociation for {C}omputational {L}inguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2026",
    address = "San Diego, California, United States",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2026.acl-long.1780/",
    doi = "10.18653/v1/2026.acl-long.1780",
    pages = "38403--38426",
    ISBN = "979-8-89176-390-6",
    abstract = "Reasoning Language Models (RLMs) have significantly improved performance on complex tasks by extending the reasoning chain. However, these chains are prone to containing factual errors, particularly in knowledge-intensive tasks. To address this issue, we propose **CheckRLM**, a framework that improves the reliability of the reasoning process through Retrieval-Augmented Generation (RAG) by timely checking and correcting factual errors. Specifically, CheckRLM extracts factual claims from the reasoning chain to identify and localize subtle knowledge inconsistencies during inference. Upon detection of errors, a refinement mechanism performs minimal-cost yet precise corrections by leveraging external knowledge, ensuring coherence between the reasoning chain and correct knowledge. Extensive experiments demonstrate that CheckRLM substantially outperforms existing baselines, exhibiting a strong capability to mitigate error accumulation in long-horizon reasoning with lower costs. The code and data are available at https://github.com/AI9Stars/CheckRLM."
}
