Code for the paper "Align to the Pivot: Dual Alignment with Self-Feedback for Multilingual Math Reasoning"

Rover912/PASMR

Align to the Pivot: Dual Alignment with Self-Feedback for Multilingual Math Reasoning

This project builds on the OpenRLHF repository. The key modifications are in the experience-making logic.

🔧 Main Modification

  • Modified file: ./openrlhf/trainer/ppo_utils/experience_maker.py
  • Description: We revised the experience-making logic. See line 873 for the main changes.
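To inspect the modified region without opening an editor, a small helper like the one below may be convenient. This is only a sketch: `show_around` is a hypothetical helper name (not part of OpenRLHF or this repository), and the 20-line default context window is an arbitrary assumption. The file path and line number are the ones given above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): print the lines around a
# target line, each prefixed with its line number.
# Usage: show_around FILE LINE [CONTEXT]
show_around() {
  local file="$1" line="$2" context="${3:-20}"
  local start=$(( line - context ))
  local end=$(( line + context ))
  # Clamp the window so it never starts before the first line.
  if [ "$start" -lt 1 ]; then
    start=1
  fi
  awk -v s="$start" -v e="$end" \
    'NR >= s && NR <= e { printf "%d: %s\n", NR, $0 }' "$file"
}

# Run from the repository root; skipped silently if the file is not there.
target=./openrlhf/trainer/ppo_utils/experience_maker.py
if [ -f "$target" ]; then
  show_around "$target" 873
fi
```

Passing a third argument widens or narrows the window, for example `show_around "$target" 873 50`.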

🧩 Installation and Execution

1. Install the Modified OpenRLHF Package

cd Code4PASMR
pip install -e .

2. Start the Ray Cluster

ray start --head --node-ip-address 0.0.0.0 --num-gpus 8
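The command above requests 8 GPUs. On a machine with a different GPU count, it may help to check what is visible before starting Ray. The following is a minimal sketch, assuming NVIDIA GPUs and that `nvidia-smi` is on the PATH; `count_gpus` is a hypothetical helper, not part of Ray or this repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count GPU entries in `nvidia-smi -L`-style output
# read from stdin (one "GPU <index>: ..." line per device).
count_gpus() {
  # grep -c exits non-zero when it counts 0 matches; `|| true` keeps the
  # exit status clean while still printing the count.
  grep -c '^GPU [0-9]' || true
}

# Assumption: NVIDIA driver tools are installed; otherwise this is skipped.
if command -v nvidia-smi >/dev/null 2>&1; then
  n=$(nvidia-smi -L | count_gpus)
  echo "Detected ${n} GPU(s); the ray start command requests 8."
fi
```

If the detected count differs from 8, adjust `--num-gpus` accordingly. The training script may also contain GPU-related settings that need to match.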

3. Run the Training Script

bash run_ray_reinforce_final.sh

🗂️ Project Structure (Simplified)

/Code4PASMR/
├── openrlhf/
│   ├── trainer/
│   │   └── ppo_utils/
│   │       └── experience_maker.py  # 👈 Main logic modified at line 873
│   └── ...
├── run_ray_reinforce_final.sh       # 👈 Main script to execute
├── setup.py
└── ...
