This project is built on the OpenRLHF repository, with the key modifications made to the experience-making logic.
- Modified file: ./openrlhf/trainer/ppo_utils/experience_maker.py
- Description: We revised the experience-making logic. See line 873 for the main changes.
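
To look at the modified region without opening an editor, a small helper along the following lines can print the code surrounding line 873. This is only a minimal sketch: the function name show_context and its radius parameter are hypothetical and are not part of this repository or of OpenRLHF.

```python
from pathlib import Path


def show_context(path, line_no, radius=5):
    """Return the lines around `line_no` (1-indexed), each prefixed with its number.

    Hypothetical helper for illustration; not part of the repository.
    """
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    if not 1 <= line_no <= len(lines):
        raise ValueError(
            f"{path} has {len(lines)} lines; line {line_no} is out of range"
        )
    # Clamp the window to the bounds of the file.
    start = max(1, line_no - radius)
    end = min(len(lines), line_no + radius)
    out = []
    for n in range(start, end + 1):
        # Mark the requested line with ">" so it stands out.
        marker = ">" if n == line_no else " "
        out.append(f"{marker}{n:5d}  {lines[n - 1]}")
    return "\n".join(out)


# Example, assuming the current directory is the repository root:
# print(show_context("openrlhf/trainer/ppo_utils/experience_maker.py", 873))
```

The line number refers to the file as shipped in this repository; if the file is edited locally, the relevant code may have shifted.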

Installation and usage:

    cd Code4PASMR
    pip install -e .
    ray start --head --node-ip-address 0.0.0.0 --num-gpus 8
    bash run_ray_reinforce_final.sh

Project structure:

Code4PASMR/
├── openrlhf/
│   ├── trainer/
│   │   └── ppo_utils/
│   │       └── experience_maker.py  # 👈 Main logic modified at line 873
│   └── ...
├── run_ray_reinforce_final.sh # 👈 Main script to execute
├── setup.py
└── ...