ENHANCING MULTILINGUAL REASONING IN LLMS: INSIGHTS FROM CROSS-LINGUISTIC CORRELATIONS AND OPTIMAL DATA PROPORTIONS
📃 Paper | 🤗 Huggingface | 📭 Contact
- This repository shares the code and dataset of our latest work on multilingual reasoning. We present a novel dataset construction method that performs targeted language alignment to make the best use of an LLM's English reasoning abilities.
- Using this dataset, you can fine-tune open-source LLMs into strong multilingual reasoning systems. For example, our fine-tuned LLaMA2-7B achieves superior multilingual performance, significantly outperforming baseline models of the same size.
- Overall, our method effectively narrows the performance gap of LLMs between English and non-English languages, offering a new paradigm for unlocking LLMs' capabilities on multilingual tasks.
- Please note that the core contribution of our work is the dataset construction idea and the open-sourced HighMath dataset. The code in this repository is only an example of how to use it; please adapt it to your own setup.
Below we present LLMs' average zero-shot answer accuracy on multilingual reasoning benchmarks. With HighMath, our fine-tuned LLM surpasses both its unaligned counterpart and the translate-training baseline by a large margin.
| System (7B) | Monolingual Supervision | Multilingual Supervision | mGSM | mSVAMP |
|---|---|---|---|---|
| HighMath (ours) | - | HighMath | 52.5 | 65.2 |
| MetaMath | MetaMathQA | - | 38.4 | 46.2 |
| MathOctopus | - | GSM8KInstruct | 40.0 | 44.1 |
| WizardMath | GSM8K & MATH | - | 23.0 | 32.5 |
| MAmmoTH | MathInstruct | - | 21.3 | 26.3 |
| RFT | GSM8K-ScRel | - | 20.6 | 31.3 |
| SFT | GSM8K | - | 22.6 | 30.9 |
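For reference, answer accuracy on mGSM and mSVAMP is computed as exact match on the final numeric answer. Below is a minimal sketch of such an evaluation loop; the `generate_fn` callable and the `question`/`answer` field names are illustrative assumptions, not the repository's actual evaluation code.

```python
import re

def extract_final_number(text: str):
    """Heuristic: take the last number that appears in the model's response."""
    matches = re.findall(r"-?\d[\d,]*(?:\.\d+)?", text)
    return matches[-1].replace(",", "") if matches else None

def zero_shot_accuracy(examples, generate_fn):
    """Exact-match accuracy over a list of {"question": str, "answer": number} dicts.

    `generate_fn` is a placeholder for your decoding routine, e.g. greedy
    decoding with the fine-tuned model.
    """
    correct = 0
    for ex in examples:
        pred = extract_final_number(generate_fn(ex["question"]))
        gold = str(ex["answer"]).replace(",", "")
        correct += int(pred is not None and float(pred) == float(gold))
    return correct / len(examples)
```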
In the table below, we list the datasets used in this project. All datasets are publicly available via their respective URLs.
| Dataset | Usage | Size (samples) | Languages |
|---|---|---|---|
| HighMath | Training | 395,000 | En, Bn, Th, Sw, Ja, Zh, De, Fr, Ru, Es |
| MetaMathQA | Training | 395,000 | En |
| GSM8KInstruct | Training | 73,559 | En, Bn, Th, Sw, Ja, Zh, De, Fr, Ru, Es |
| mGSM | Evaluation | 2,500 | En, Bn, Th, Sw, Ja, Zh, De, Fr, Ru, Es |
| mSVAMP | Evaluation | 10,000 | En, Bn, Th, Sw, Ja, Zh, De, Fr, Ru, Es |
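Since the training pipeline follows stanford_alpaca (see below), a natural way to consume HighMath is as alpaca-style JSON records. The sketch below assumes `instruction`/`output` fields and a local path `data/HighMath.json`; check the released files for the actual schema and adjust accordingly.

```python
import json

# Assumed location of the downloaded training file; adjust to your setup.
DATA_PATH = "data/HighMath.json"

# Standard alpaca-style prompt template (no separate input field).
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)

def load_examples(path: str = DATA_PATH):
    """Render assumed alpaca-style records into prompt/response pairs."""
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    return [
        {
            "prompt": PROMPT_TEMPLATE.format(instruction=r["instruction"]),
            "response": r["output"],
        }
        for r in records
    ]
```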
We develop our training pipeline based on the stanford_alpaca repository.
To fine-tune a pre-trained LLM, use the command below. When fine-tuning the 70B model, we use DeepSpeed to save memory; you can find our DeepSpeed configuration in the repo.
Please note that the training configuration in this repository is provided as a reference example; customize the settings according to your requirements and hardware.
The recommended setup is 8x A100 GPUs.
- Example: fine-tuning LLaMA2-7B
```bash
bash ./ds_run.sh
```
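For orientation, here is a rough sketch of what an alpaca-style supervised fine-tuning entry point driven by `ds_run.sh` could look like, using the Hugging Face `Trainer` with a DeepSpeed config. The model id, hyperparameters, config path `ds_config.json`, and the `load_examples` helper (from the data-loading sketch above) are illustrative assumptions; refer to the scripts in the repo for the actual settings.

```python
import torch
from torch.utils.data import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForSeq2Seq, Trainer, TrainingArguments)

MODEL_ID = "meta-llama/Llama-2-7b-hf"  # assumed base model

class SFTDataset(Dataset):
    """Tokenizes prompt + response and masks prompt tokens out of the loss."""

    def __init__(self, examples, tokenizer, max_len=1024):
        self.items = []
        for ex in examples:
            # Tokenizing the prompt separately is an approximation at the
            # prompt/response boundary, but is common in alpaca-style scripts.
            prompt_ids = tokenizer(ex["prompt"], add_special_tokens=False)["input_ids"]
            full_ids = tokenizer(ex["prompt"] + ex["response"] + tokenizer.eos_token,
                                 add_special_tokens=False)["input_ids"][:max_len]
            labels = [-100] * min(len(prompt_ids), len(full_ids)) + full_ids[len(prompt_ids):]
            self.items.append({"input_ids": full_ids, "labels": labels})

    def __len__(self):
        return len(self.items)

    def __getitem__(self, i):
        return self.items[i]

def main():
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    tokenizer.pad_token = tokenizer.eos_token  # LLaMA has no pad token by default
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)

    args = TrainingArguments(
        output_dir="out/highmath-llama2-7b",
        per_device_train_batch_size=4,   # illustrative values; tune for your hardware
        gradient_accumulation_steps=4,
        num_train_epochs=3,
        learning_rate=2e-5,
        bf16=True,
        deepspeed="ds_config.json",      # assumed path to the DeepSpeed config
        logging_steps=10,
        save_strategy="epoch",
    )
    trainer = Trainer(
        model=model,
        args=args,
        train_dataset=SFTDataset(load_examples(), tokenizer),  # load_examples: see the data sketch above
        data_collator=DataCollatorForSeq2Seq(tokenizer, label_pad_token_id=-100),
    )
    trainer.train()

if __name__ == "__main__":
    main()
```

Such a script would typically be launched with the `deepspeed` or `torchrun` launcher across the 8 GPUs, which is presumably what `ds_run.sh` wraps.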
If you find this repository helpful, feel free to cite our paper:
```bibtex
@inproceedings{wang2025enhancing,
  title={Enhancing Multilingual Reasoning in {LLM}s: Insights from Cross-Linguistic Correlations and Optimal Data Proportions},
  author={Jiangkuo Wang and Suyv Ma and Mingpeng Wei},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025},
  url={https://openreview.net/forum?id=S6cBH99BhB}
}
```