# ChatR1: Reinforcement Learning for Conversational Reasoning and Retrieval Augmented Question Answering
## ChatR1 environment

```bash
conda create -n chatr1 python=3.9
conda activate chatr1
pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu121
pip3 install vllm==0.6.3
# verl
pip install -e .
# flash attention 2
pip3 install flash-attn --no-build-isolation
```
```bash
pip install wandb
```

## Retriever environment

```bash
conda create -n retriever python=3.10
conda activate retriever
conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.1 -c pytorch -c nvidia
pip install transformers datasets pyserini
conda install -c pytorch -c nvidia faiss-gpu=1.8.0
```
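The `e5_Flat.index` used below is a flat (exhaustive) inner-product FAISS index over E5 embeddings. As a dependency-free sketch of what such a search computes — score every corpus vector against the query and keep the top-k — with toy vectors standing in for real embeddings:

```python
# Sketch of flat (exhaustive) inner-product search, the semantics of
# faiss.IndexFlatIP. Corpus vectors here are toy placeholders, not E5 output.
def flat_ip_search(index_vectors, query, topk=2):
    scores = [
        (sum(q * d for q, d in zip(query, doc)), i)
        for i, doc in enumerate(index_vectors)
    ]
    scores.sort(key=lambda pair: (-pair[0], pair[1]))  # highest score first
    return [(i, s) for s, i in scores[:topk]]

corpus = [
    [1.0, 0.0],  # doc 0
    [0.0, 1.0],  # doc 1
    [0.7, 0.7],  # doc 2
]
top = flat_ip_search(corpus, [1.0, 0.1], topk=2)  # doc 0 and doc 2 score highest
```

A flat index trades memory and query time for exact results, which is why the prebuilt index is shipped in parts and concatenated rather than trained.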
## API function
```bash
pip install uvicorn fastapi
```

## Training

Train a reasoning + search LLM on the TopiOCQA dataset with E5 as the retriever tool.
(1) Download the index and corpus.

```bash
save_path=collection
python scripts/download.py --save_path $save_path
rm "$save_path"/data/*.parquet
cat "$save_path"/index/part_* > "$save_path"/topiocqa/index/e5_Flat.index
rm "$save_path"/index/part_*
```

(2) Download the TopiOCQA dataset.

```bash
python scripts/download_data_conv.py
```

(3) Train ChatR1. This launches a local retrieval server in the background and then starts training.

```bash
sbatch train_ppo_3b_topiocqa_retrieval.sh
```

(This uses the address http://127.0.0.1:8002/retrieve for retrieval.)
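During rollouts the trainer queries the retrieval server over HTTP at the address above. A minimal stdlib-only client sketch — the JSON schema (a `"queries"` list plus a `"topk"` field) is an assumption modeled on Search-R1-style retrieval servers, so check the launched server's actual schema before relying on it:

```python
import json
import urllib.request

RETRIEVER_URL = "http://127.0.0.1:8002/retrieve"  # address used by the training script

def build_payload(queries, topk=3):
    # Field names are an assumption (Search-R1-style server), not confirmed here.
    return json.dumps({"queries": queries, "topk": topk}).encode("utf-8")

def retrieve(queries, topk=3):
    req = urllib.request.Request(
        RETRIEVER_URL,
        data=build_payload(queries, topk),
        headers={"Content-Type": "application/json"},
    )
    # Requires the retrieval server launched above to be running.
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

# Example, with the server up:
# retrieve(["when was topiocqa released"], topk=2)
```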
## Evaluation

(1) Download the ChatR1 model weights.

```bash
hf download \
  slupart/ChatR1-topiocqa-qwen2.5-3b-it-ppo \
  --local-dir verl_checkpoints/chatr1-topiocqa-qwen2.5-3b-it-ppo \
  --local-dir-use-symlinks False
```

(2) Launch retrieval and run ChatR1.

```bash
sbatch eval_topiocqa.sh
```

(3) Evaluate with F1 score against the reference answers.

```bash
run="verl_checkpoints/chatr1-topiocqa-qwen2.5-3b-it-ppo/val_topiocqa/validation_results_step0.json"
python scripts/eval_f1.py --predictions "$run"
```

## Inference

(1) Launch a local retrieval server.
```bash
conda activate retriever
bash retrieval_launch_topiocqa.sh &
```

(2) Run inference on a single question.

```bash
conda activate chatr1
python infer.py
```

## Resources

Additional ChatR1 resources are available in the ChatR1 Hugging Face collection:
- All datasets in JSONL format for TopiOCQA, QReCC, INSCIT, MultiDoc2Dial, and FaithDial.
- E5 retrieval indexes for all of the datasets above.
- ChatR1 checkpoints trained on QReCC and TopiOCQA with 3B and 7B backbones.
- A unified dataset format for training and evaluation conversations of TopiOCQA, QReCC, INSCIT, MultiDoc2Dial, and FaithDial, available here.
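The evaluation step above uses `scripts/eval_f1.py` to score predictions against reference answers. As a minimal sketch of token-level F1 — the normalization shown (lowercasing, stripping punctuation and English articles) is the common SQuAD-style convention and an assumption here, not necessarily the script's exact rules:

```python
import re
import string
from collections import Counter

def normalize(text):
    # SQuAD-style normalization (assumed): lowercase, drop punctuation/articles.
    text = "".join(ch for ch in text.lower() if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def token_f1(prediction, reference):
    pred, ref = normalize(prediction).split(), normalize(reference).split()
    common = Counter(pred) & Counter(ref)  # multiset intersection of tokens
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

For conversational QA, token-level F1 rewards partial overlap with the reference answer rather than demanding an exact string match.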
## Acknowledgments

ChatR1 is built upon Search-R1, using veRL and RAGEN, and is inspired by DeepSeek-R1 and TinyZero.
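Rollouts in this family of models interleave reasoning and tool calls via tags. A hedged sketch of extracting the search queries and final answer from a rollout string — the tag names (`<think>`, `<search>`, `<information>`, `<answer>`) assume the Search-R1 convention that ChatR1 inherits:

```python
import re

# Tag names assume the Search-R1 rollout convention: the model emits
# <search>query</search> to call the retriever and wraps its final
# response in <answer>...</answer>.
SEARCH_RE = re.compile(r"<search>(.*?)</search>", re.DOTALL)
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)

def extract_searches(rollout):
    return [m.strip() for m in SEARCH_RE.findall(rollout)]

def extract_answer(rollout):
    m = ANSWER_RE.search(rollout)
    return m.group(1).strip() if m else None

rollout = (
    "<think>I need the release year.</think>"
    "<search>topiocqa release year</search>"
    "<information>TopiOCQA was released in 2022.</information>"
    "<answer>2022</answer>"
)
```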
## Citation

```bibtex
@article{lupart2025chatr1,
  title={ChatR1: Reinforcement learning for conversational reasoning and retrieval augmented question answering},
  author={Lupart, Simon and Aliannejadi, Mohammad and Kanoulas, Evangelos},
  journal={arXiv preprint arXiv:2510.13312},
  year={2025}
}
```