VCORE

The official repository for the paper "VCORE: Variance-Controlled Optimization-based Reweighting for Chain-of-Thought Supervision"

🔥 News

Our paper has been accepted to the ACL 2026 Main Conference (6/4/2026).
Evaluation code and all scripts uploaded.
Basic training code based on the LLaMA-Factory framework uploaded.
Preprint paper available.
Training dataset () in Hugging Face format uploaded.

🌟 Key Highlights

Beyond heuristics: treats token weighting as an optimization problem, not guesswork.
Improves both in-domain accuracy and out-of-domain generalization.
Serves as a more effective initialization for subsequent RL.

🚀 Quick Start

Prepare Code and Data

git clone https://github.com/coder-gx/VCORE.git
cd VCORE

Download the training data from Hugging Face.

Change the data path in the data_info.json file of the bundled LLaMA-Factory framework (llama_factory/).
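If you prefer to script the path change, the helper below rewrites one entry of the dataset registry. In upstream LLaMA-Factory this registry is data/dataset_info.json and a local dataset is referenced through its entry's file_name field; this sketch assumes the bundled llama_factory/ copy uses the same layout. The function name, the dataset key, and the paths in the usage line are hypothetical.

```python
import json
from pathlib import Path


def set_dataset_file(info_path, dataset_name, data_file):
    """Point an already-registered dataset entry at a new local file.

    Hypothetical helper: assumes LLaMA-Factory's registry layout, where each
    entry stores its local data file under the "file_name" key.
    """
    path = Path(info_path)
    info = json.loads(path.read_text(encoding="utf-8"))
    if dataset_name not in info:
        raise KeyError(f"{dataset_name!r} is not registered in {path}")
    # Only the file path changes; formatting/columns settings are left untouched.
    info[dataset_name]["file_name"] = str(data_file)
    path.write_text(json.dumps(info, indent=2, ensure_ascii=False) + "\n", encoding="utf-8")
    return info[dataset_name]
```

Example (hypothetical key and paths): `set_dataset_file("llama_factory/data/dataset_info.json", "vcore_train", "/path/to/vcore_train.json")`. If the entry instead loads from the Hub (an hf_hub_url field), edit that field rather than file_name.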

Environment Setup

conda create -n vcore python=3.10
conda activate vcore
pip install torch==2.7.0 torchvision==0.22.0 torchaudio==2.7.0 --index-url https://download.pytorch.org/whl/cu128
pip install -r requirements.txt
pip install -e ./llama_factory
pip install -e ./transformers-4.52.4
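A quick way to confirm the environment matches the pins above is to compare installed versions with the standard-library importlib.metadata. This is a hypothetical check, not part of the repo. It assumes the editable install of ./transformers-4.52.4 reports version 4.52.4, and it strips PyPI-style local tags such as "+cu128" that the CUDA wheels carry.

```python
from importlib import metadata

# Versions pinned in the setup commands above.
PINS = {
    "torch": "2.7.0",
    "torchvision": "0.22.0",
    "torchaudio": "2.7.0",
    "transformers": "4.52.4",  # assumed version of the bundled editable copy
}


def find_mismatches(pins, get_version=metadata.version):
    """Return {package: installed_version_or_None} for every pin that is not met."""
    problems = {}
    for pkg, expected in pins.items():
        try:
            installed = get_version(pkg)
        except metadata.PackageNotFoundError:
            problems[pkg] = None  # not installed at all
            continue
        base = installed.split("+", 1)[0]  # drop local tags like "+cu128"
        if base != expected:
            problems[pkg] = installed
    return problems
```

Run `print(find_mismatches(PINS))` inside the activated vcore environment; an empty dict means every pin is satisfied.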

Start training

We provide two ways to run VCORE: a single-process mode and a multi-process mode.

1. Single process

Example training commands are in train_multi_single.sh; adjust the hyperparameters there to run different training settings.

bash train_multi_single.sh

2. Multi process

Example training commands are in train_multi_main.sh and train_multi_branch.sh; adjust the hyperparameters there to run different training settings. The two scripts run at the same time, each in its own shell:

bash train_multi_main.sh
bash train_multi_branch.sh # run in a separate shell
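If you would rather start both multi-process scripts from one terminal instead of two shells, a small launcher like the one below is one option. It is a hypothetical helper, not part of the repo. It assumes the scripts only need to run concurrently; the optional stagger_s delay is a guess in case the main process should come up first, and the helper stops the remaining process as soon as either one exits with an error.

```python
import subprocess
import time
from typing import List, Sequence


def launch_together(
    commands: Sequence[Sequence[str]], stagger_s: float = 0.0, poll_s: float = 0.5
) -> List[int]:
    """Start each command as its own process and wait until all succeed or one fails.

    Returns the exit codes in launch order. A process terminated because another
    one failed reports a negative code (the signal number) on POSIX.
    """
    procs = []
    try:
        for i, cmd in enumerate(commands):
            if i > 0 and stagger_s > 0:
                time.sleep(stagger_s)  # optional delay between launches
            procs.append(subprocess.Popen(list(cmd)))
        while True:
            codes = [p.poll() for p in procs]
            if any(c not in (None, 0) for c in codes):
                break  # one process failed; the rest are stopped in `finally`
            if all(c == 0 for c in codes):
                return codes
            time.sleep(poll_s)
    finally:
        # Clean up anything still running (after a failure or an exception).
        for p in procs:
            if p.poll() is None:
                p.terminate()
                p.wait()
    return [p.returncode for p in procs]
```

Usage, from the directory containing the scripts: `launch_together([["bash", "train_multi_main.sh"], ["bash", "train_multi_branch.sh"]], stagger_s=5.0)`. Polling instead of calling `wait()` on each process in turn avoids hanging on a healthy process while its partner has already crashed.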

📖 Methodology

New Perspective on CoT Supervision:

(1) Optimization-Derived Weighting.

(2) Variance-Controlled Stabilization.
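The README only names the two components; the actual objective is derived in the paper. As a generic illustration (not the paper's derivation), the sketch below shows one standard way a token weight can come out of an optimization problem instead of a hand-picked heuristic, plus one simple variance-control device. Given per-token losses ℓ_t, maximizing Σ_t w_t ℓ_t − (1/β)·KL(w ‖ uniform) over the probability simplex has the closed-form solution w = softmax(βℓ). Shrinking toward uniform, w̃ = (1 − λ)w + λ/T, then scales the variance of the weights by exactly (1 − λ)². All function names and the parameters beta and lam are hypothetical.

```python
import math
from typing import List, Sequence


def optimal_weights(token_losses: Sequence[float], beta: float) -> List[float]:
    """Maximizer of sum_t w_t*l_t - (1/beta)*KL(w || uniform) over the simplex.

    Closed form: w = softmax(beta * l). beta = 0 gives uniform weights (plain
    averaging); larger beta concentrates weight on high-loss tokens.
    """
    scaled = [beta * l for l in token_losses]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]


def shrink_to_uniform(weights: Sequence[float], lam: float) -> List[float]:
    """Mix weights with the uniform distribution: (1 - lam) * w + lam / T.

    The uniform part is constant, so the variance of the weights is multiplied
    by (1 - lam) ** 2 while the weights still sum to 1.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    t = len(weights)
    return [(1.0 - lam) * w + lam / t for w in weights]


def variance(xs: Sequence[float]) -> float:
    """Population variance of a list of weights."""
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)


def weighted_loss(token_losses: Sequence[float], weights: Sequence[float]) -> float:
    """Sequence loss as a weighted sum of token losses (weights treated as constants)."""
    return sum(w * l for w, l in zip(weights, token_losses))
```

For example, with losses `[0.1, 0.5, 2.0, 0.05]`, `optimal_weights(losses, beta=2.0)` puts most of the mass on the third token, and `shrink_to_uniform(w, 0.5)` keeps that ordering while cutting the weight variance to a quarter. In a real training loop the weights would normally be computed without gradient so that only the token losses are differentiated.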

📊 Results

1. Generalization

VCORE demonstrates the best overall performance, achieving strong in-domain accuracy and robust out-of-domain generalization across different models and domains.
VCORE yields larger improvements on smaller, less capable models, while still delivering gains on larger, stronger models.

2. Ablation

As the training dataset scales up, VCORE consistently maintains its advantage over the DFT method.
Hyperparameter sensitivity of optimization-derived reweighting.

Variance control is critical for stabilizing sharp reweighting and ensuring reliable convergence.

3. Foundation for RL

VCORE provides a stronger starting point for subsequent reinforcement learning on reasoning tasks.
