cisnlp/COPSD

Crosslingual On-Policy Self-Distillation for Multilingual Reasoning

Official code release for Crosslingual On-Policy Self-Distillation for Multilingual Reasoning.

Links: Paper · TrainData1 · TrainData2

Overview

Large language models have achieved strong mathematical reasoning performance in English, but this ability is not equally accessible across languages. In particular, low-resource languages often show much lower reasoning accuracy, even when the underlying reasoning problem is equivalent.

We propose Crosslingual On-Policy Self-Distillation (COPSD), a framework that transfers a model's own high-resource reasoning behavior to low-resource languages. COPSD uses the same model as both student and teacher:

During training:

  • the student receives only the low-resource or target-language problem and generates an on-policy reasoning trajectory;
  • the teacher receives privileged crosslingual context, including the English problem translation and the English reference solution;
  • training minimizes a full-distribution token-level divergence between the teacher and student policies on the student's own rollouts.
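The training objective above can be illustrated numerically. Below is a minimal NumPy sketch, not the repository's actual (PyTorch-based) implementation; all names are hypothetical. For each token of the student's own rollout, it compares the teacher's full next-token distribution (conditioned on privileged English context) with the student's, and averages the token-level KL divergence:

```python
import numpy as np

def softmax(logits, axis=-1):
    # Numerically stable softmax over the vocabulary axis.
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def token_level_kl(teacher_logits, student_logits):
    """Mean KL(teacher || student) over rollout positions.

    Both logit arrays have shape (seq_len, vocab_size) and are scored
    on the SAME student-generated rollout tokens.
    """
    p = softmax(teacher_logits)   # teacher's full next-token distribution
    q = softmax(student_logits)   # student's full next-token distribution
    kl_per_token = (p * (np.log(p) - np.log(q))).sum(axis=-1)
    return kl_per_token.mean()

# Identical teacher and student policies give zero divergence.
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 10))
print(token_level_kl(logits, logits))  # 0.0
```

Because the divergence is computed over the full distribution at every token (not just the sampled token), the student receives a dense learning signal on its own rollouts.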

This repository builds on the original OPSD codebase and adapts it for crosslingual and low-resource multilingual mathematical reasoning.

Method Sketch

Low-resource target-language problem
        │
        ▼
 Student policy generates an on-policy reasoning trajectory
        │
        ▼
 Same model as teacher, conditioned on privileged English context
        ├── English problem translation
        └── English reference solution
        │
        ▼
 Full-distribution token-level self-distillation loss on the student's own rollout

Repository Structure

.
├── multilingual_opsd_train.py          # Main COPSD training entry point
├── multilingual_opsd_trainer.py        # COPSD trainer and self-distillation losses
├── multilingual_data_collator.py       # Multilingual student/teacher prompt construction
├── multilingual_grpo_train.py          # Multilingual GRPO baseline training
├── language_config.py                  # Language-specific prompts, labels, and thinking prefixes
├── accelerate.yaml                     # Accelerate/DeepSpeed launch config
├── environment.yml                     # Conda environment
├── multilingual_scripts/
│   └── run_all_opsd_4b_3000.sh         # Qwen3-4B multilingual COPSD training script
├── african_langs_scripts/
│   ├── run_all_opsd_1.7b_train.sh      # Qwen3-1.7B AfriMGSM training script
│   ├── run_all_opsd_4b_train.sh        # Qwen3-4B AfriMGSM training script
│   └── run_all_opsd_8b_train.sh        # Qwen3-8B AfriMGSM training script
├── polymath_eval/
│   ├── evaluate_math.py                # PolyMath evaluation
│   └── run_eval_4b_all_checkpoints.sh  # Evaluate Qwen3-4B checkpoints
└── african_langs_eval/
    ├── evaluate_math.py                # AfriMGSM evaluation
    ├── run_afrimgsm_one_lang_all_ckpts_1.7b.sh
    ├── run_afrimgsm_one_lang_all_ckpts_4b.sh
    └── run_afrimgsm_one_lang_all_ckpts_8b.sh

Supported Languages

The current codebase includes language-specific prompts and labels for:

  • PolyMath / multilingual math languages: BN, DE, EN, ES, FR, JA, RU, SW, TE, TH, ZH
  • AfriMGSM languages: AMH, EWE, HAU, IBO, KIN, LIN, LUG, ORM, SNA, SOT, SWA, TWI, VAI, WOL, XHO, YOR, ZUL

Installation

conda env create -f environment.yml
conda activate opsd

The training scripts use FlashAttention 2 by default. If your environment does not already provide it, install a version compatible with your CUDA and PyTorch setup, for example:

pip install flash-attn --no-build-isolation

Before running training, update the placeholder paths in the training files/scripts:

CACHE_ROOT = "YOUR PATH"

and in shell scripts:

PROJECT_ROOT="YOUR PATH"

Data Format

Our current code expects translated JSON files with English source reasoning and target-language problem fields. A minimal example is:

{
  "problem": "English source problem here.",
  "solution": "English reference solution here.",
  "problem_de": "German translated problem here."
}

At runtime, multilingual_opsd_train.py adds:

  • target_lang
  • problem_en

You can adapt the script to the structure of the training sets we released on HuggingFace: Train Data for AfriMGSM Languages · Train Data for PolyMath Languages.
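As a sketch of how such a record can be prepared, the snippet below derives the runtime fields from the JSON example above. The helper name is hypothetical; it only assumes that translated problems are stored under `problem_<lang>` keys, as shown:

```python
def prepare_example(record: dict, target_lang: str) -> dict:
    """Add the runtime fields (target_lang, problem_en) to a translated record.

    Assumes the translated problem is stored under 'problem_<lang>'
    (lowercase language code), as in the JSON example above.
    """
    key = f"problem_{target_lang.lower()}"
    if key not in record:
        raise KeyError(f"no translated problem for language {target_lang!r}")
    out = dict(record)
    out["target_lang"] = target_lang
    out["problem_en"] = record["problem"]  # English source problem
    return out

example = {
    "problem": "English source problem here.",
    "solution": "English reference solution here.",
    "problem_de": "German translated problem here.",
}
prepared = prepare_example(example, "DE")
print(prepared["problem_en"])  # English source problem here.
```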

Training

Multilingual COPSD on Qwen3-4B

Edit multilingual_scripts/run_all_opsd_4b_3000.sh to set the data directory, output directory, GPU IDs, and port, then run:

bash multilingual_scripts/run_all_opsd_4b_3000.sh

The default script trains separate models for the following languages:

BN, SW, TE, TH, ZH, ES, RU, JA

AfriMGSM COPSD

For African-language experiments, use one of the scripts in african_langs_scripts/:

bash african_langs_scripts/run_all_opsd_1.7b_train.sh
bash african_langs_scripts/run_all_opsd_4b_train.sh
bash african_langs_scripts/run_all_opsd_8b_train.sh

These scripts train separate models for the supported AfriMGSM languages.

Important Training Options

| Argument | Description |
| --- | --- |
| `--train_language` | Target language code, e.g. `DE`, `ZH`, `SWA`, `YOR`. |
| `--translated_data_path` | Path to the translated JSON file. |
| `--fixed_teacher` | Use the base model without LoRA adapters as a fixed teacher. Requires `--use_peft`. |
| `--student_enable_thinking` | Enable the target-language thinking prefix in the student prompt. |
| `--include_problem_en` | Include the English source problem in the teacher context. |
| `--include_reference_solution_en` | Include the English reference solution in the teacher context. |
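The two teacher-context flags control which pieces of privileged English information the teacher sees. A minimal illustration of that behavior (a hypothetical function, not the repository's actual collator in multilingual_data_collator.py):

```python
from typing import Optional

def build_teacher_prompt(problem_target: str,
                         problem_en: Optional[str],
                         solution_en: Optional[str],
                         include_problem_en: bool = True,
                         include_reference_solution_en: bool = True) -> str:
    """Assemble the teacher's privileged English context (illustrative sketch)."""
    parts = [f"Problem: {problem_target}"]
    if include_problem_en and problem_en:
        parts.append(f"English translation: {problem_en}")
    if include_reference_solution_en and solution_en:
        parts.append(f"English reference solution: {solution_en}")
    return "\n".join(parts)

# The teacher sees the privileged English context; the student would see
# only the first line (the target-language problem).
print(build_teacher_prompt("Ein deutsches Problem", "A German problem", "x = 42"))
```

With both flags disabled, teacher and student would condition on the same input and the distillation signal degenerates, which is why the English context is the crosslingual ingredient of COPSD.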

Evaluation

PolyMath

cd polymath_eval
bash run_eval_4b_all_checkpoints.sh

The script evaluates the base model and available COPSD checkpoints across the configured languages.
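Math-benchmark evaluation of this kind typically extracts a final answer from each model output and compares it to the reference. The sketch below is a hypothetical simplification, not the repository's evaluate_math.py:

```python
import re

def extract_final_number(text: str):
    """Return the last number in the text, or None if there is none."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return float(matches[-1]) if matches else None

def accuracy(predictions, references):
    """Fraction of predictions whose final number matches the reference."""
    correct = sum(
        1 for pred, ref in zip(predictions, references)
        if (ans := extract_final_number(pred)) is not None
        and abs(ans - float(ref)) < 1e-6
    )
    return correct / len(predictions)

preds = ["The answer is 42.", "So we get 3.5", "no number"]
refs = [42, 3.5, 7]
print(accuracy(preds, refs))  # 0.666... (2 of 3 correct)
```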

AfriMGSM

cd african_langs_eval
bash run_afrimgsm_one_lang_all_ckpts_1.7b.sh
bash run_afrimgsm_one_lang_all_ckpts_4b.sh
bash run_afrimgsm_one_lang_all_ckpts_8b.sh

Evaluation outputs are saved under eval_results/.

Released Assets

Citation

If you find this repository useful, please cite our paper:

@misc{liu2026crosslingualonpolicyselfdistillation,
      title={Crosslingual On-Policy Self-Distillation for Multilingual Reasoning}, 
      author={Yihong Liu and Raoyuan Zhao and Michael A. Hedderich and Hinrich SchΓΌtze},
      year={2026},
      eprint={2605.09548},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2605.09548}, 
}

This codebase is adapted from OPSD. Please also consider citing the original OPSD work:

@misc{zhao2026selfdistilledreasoneronpolicyselfdistillation,
      title={Self-Distilled Reasoner: On-Policy Self-Distillation for Large Language Models}, 
      author={Siyan Zhao and Zhihui Xie and Mengchen Liu and Jing Huang and Guan Pang and Feiyu Chen and Aditya Grover},
      year={2026},
      eprint={2601.18734},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2601.18734}, 
}

Acknowledgements

This repository builds on the excellent OPSD implementation for on-policy self-distillation. We thank the authors for releasing their code.
