The official repository for the paper "VCORE: Variance-Controlled Optimization-based Reweighting for Chain-of-Thought Supervision"
- Our paper has been accepted to ACL 2026 Main Conference 6/4/2026
- Evaluation code and all scripts
- Basic training code based on the LLaMA-Factory framework uploaded
- Preprint Paper.
- Training dataset in Hugging Face format uploaded
- Beyond heuristics: treat token weighting as an optimization problem, not guesswork.
- Improving both in-domain accuracy and out-of-domain generalization.
- Serves as a more effective initialization for subsequent RL.
git clone https://github.com/coder-gx/VCORE.git
cd VCORE

Download the training data from Hugging Face, then change the data path in the data_info.json file of the LLaMA-Factory framework.

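
The data path change can also be scripted. The following is a minimal sketch, not part of the repository: it assumes the registry file is a JSON object keyed by dataset name, with the local path stored under a `file_name` field (the layout LLaMA-Factory uses for its dataset registry). The helper name `set_dataset_path`, the dataset key, and the paths in the usage comment are hypothetical placeholders.

```python
import json
from pathlib import Path


def set_dataset_path(info_file, dataset_name, new_path):
    """Point one dataset entry at a new local file and save the registry."""
    info_path = Path(info_file)
    registry = json.loads(info_path.read_text(encoding="utf-8"))
    # Assumed schema: {"<dataset name>": {"file_name": "<path>", ...}, ...}
    # The entry is created if it does not exist; other fields are left untouched.
    entry = registry.setdefault(dataset_name, {})
    entry["file_name"] = str(new_path)
    info_path.write_text(
        json.dumps(registry, indent=2, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
    return registry


# Hypothetical usage; replace the file location, dataset key, and data path
# with the ones used in your checkout:
# set_dataset_path("path/to/data_info.json", "my_dataset", "/path/to/train.json")
```

Editing the file by hand works just as well; the script is only convenient when the path changes often, for example across machines.
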
conda create -n vcore python==3.10
conda activate vcore
pip install torch==2.7.0 torchvision==0.22.0 torchaudio==2.7.0 --index-url https://download.pytorch.org/whl/cu128
pip install -r requirements.txt
pip install -e ./llama_factory
pip install -e ./transformers-4.52.4

There are two ways to run VCORE: multi-process and single-process.

Training command examples for the single-process mode are in train_single.sh; change the hyperparameters to run different training settings.
bash train_multi_single.sh

Training command examples for the multi-process mode are in train_multi_main.sh and train_multi_branch.sh; change the hyperparameters to run different training settings.
bash train_multi_main.sh
bash train_multi_branch.sh # run in a separate shell

New Perspective on CoT Supervision:
(1) Optimization-Derived Weighting.
(2) Variance-Controlled Stabilization.
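
To make the two ideas concrete, here is a generic, dependency-free sketch of a token-weighted loss. It is not the VCORE algorithm, whose weighting rule is defined in the paper and the training code. As stand-ins, it assumes hypothetical per-token importance scores, turns them into mean-one weights with a softmax, and tames their variance by mixing them with uniform weights. All function names and the `mix` and `temperature` parameters are illustrative assumptions.

```python
import math


def softmax_weights(scores, temperature=1.0):
    """Map per-token scores to positive weights whose mean is exactly 1."""
    peak = max(scores)  # subtract the max for numerical stability
    raw = [math.exp((s - peak) / temperature) for s in scores]
    mean_raw = sum(raw) / len(raw)
    return [r / mean_raw for r in raw]


def shrink_toward_uniform(weights, mix):
    """Blend mean-one weights with uniform weights.

    mix=0 gives uniform weights (plain token-averaged SFT), mix=1 keeps the
    weights unchanged. The mean stays 1 and the variance scales by mix**2.
    """
    return [(1.0 - mix) + mix * w for w in weights]


def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def weighted_nll(token_logprobs, weights):
    """Weighted negative log-likelihood over the tokens of one sequence."""
    total = sum(w * lp for w, lp in zip(weights, token_logprobs))
    return -total / sum(weights)


# Toy inputs: log-probabilities of four target tokens and made-up scores.
logprobs = [-0.2, -1.5, -0.4, -2.3]
scores = [0.1, 2.0, 0.5, 3.0]

sharp = softmax_weights(scores, temperature=0.5)
damped = shrink_toward_uniform(sharp, mix=0.5)

print("variance (sharp): ", variance(sharp))
print("variance (damped):", variance(damped))
print("loss (uniform):", weighted_nll(logprobs, [1.0] * len(logprobs)))
print("loss (damped): ", weighted_nll(logprobs, damped))
```

A low temperature produces sharp weights that concentrate the loss on a few tokens; the mixing step shows one simple way to keep such weights from destabilizing training while preserving their ranking.
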
- VCORE demonstrates the best overall performance, achieving strong in-domain accuracy and robust out-of-domain generalization across different models and domains.
- VCORE yields larger improvements on smaller and less capable models, with gains scaling positively with the strength of larger models.
- As the training dataset scales up, VCORE consistently maintains its advantage over the DFT method.
- Optimization-derived reweighting hyperparameter sensitivity
- Variance control is critical for stabilizing sharp reweighting and ensuring reliable convergence.
- VCORE offers a more capable foundation model to support reasoning tasks in reinforcement learning.





