Align to the Pivot: Dual Alignment with Self-Feedback for Multilingual Reasoning

This project builds on the OpenRLHF repository. The key modifications are in the experience-making logic.

🔧 Main Modification

Modified file: ./openrlhf/trainer/ppo_utils/experience_maker.py
Description: We revised the experience-making logic. See line 873 for the main changes.

🧩 Installation and Execution

1. Install the Modified OpenRLHF Package

cd Code4PASMR
pip install -e .
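
After the editable install, it can be useful to confirm that the package is visible to the interpreter that will run training. The snippet below is a minimal sketch, not part of this repository: the helper name `is_installed` is hypothetical, and it assumes the package is importable under the name `openrlhf` (as the directory layout suggests).

```python
import importlib.util


def is_installed(package: str = "openrlhf") -> bool:
    """Return True if a top-level package can be located on the current Python path."""
    # find_spec returns None when a top-level module cannot be found.
    return importlib.util.find_spec(package) is not None


if __name__ == "__main__":
    # Assumption: the editable install exposes the package as `openrlhf`.
    print("openrlhf importable:", is_installed())
```

If this prints `False`, the install most likely ran in a different Python environment from the one being checked.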

2. Start the Ray Cluster

ray start --head --node-ip-address 0.0.0.0 --num-gpus 8
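
Once the head node is up, one way to confirm that the cluster exposes the expected number of GPUs is to query it from Python. This is a hedged sketch under the assumption that Ray is installed in the same environment; the helper names `has_enough_gpus` and `check_cluster` are hypothetical, and the threshold of 8 simply mirrors the `--num-gpus 8` flag above.

```python
def has_enough_gpus(resources: dict, required: float = 8) -> bool:
    """Return True if the resource mapping reports at least `required` GPUs."""
    # Ray reports GPUs under the "GPU" key; the key is absent when there are none.
    return resources.get("GPU", 0) >= required


def check_cluster(required: float = 8) -> bool:
    """Connect to the already-running Ray cluster and check its GPU count."""
    import ray  # imported lazily so the helper above works without Ray installed

    ray.init(address="auto")  # attach to the cluster started with `ray start --head`
    try:
        return has_enough_gpus(ray.cluster_resources(), required)
    finally:
        ray.shutdown()
```

Calling `check_cluster()` before launching the training script gives an early signal if fewer GPUs were registered than expected.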

3. Run the Training Script

bash run_ray_reinforce_final.sh

🗂️ Project Structure (Simplified)

/Code4PASMR/
├── openrlhf/
│   ├── trainer/
│   │   └── ppo_utils/
│   │       └── experience_maker.py  # 👈 Main logic modified at line 873
│   └── ...
├── run_ray_reinforce_final.sh       # 👈 Main script to execute
├── setup.py
└── ...

