MemPO is a self-memory policy optimization algorithm for long-horizon agents, enabling the policy model to autonomously summarize, manage, and selectively retain crucial memory during environment interaction.
We have open-sourced all RL training and evaluation code, as well as the data and models. To reproduce our results, refer to training.md or Quick Start for training and evaluation.md for evaluation; both provide detailed instructions.
⭐️ Please star this repository if it is helpful for you!
The version of VeRL we used is 0.5.0.dev, which supports multi-turn asynchronous training.
- Method 1: Build from Docker (STRONGLY Recommended!)
You can get the official Docker image of VeRL from Docker Hub. The version we used is:
docker pull verlai/verl:app-verl0.5-transformers4.55.4-sglang0.4.10.post2-mcore0.13.0-te2.2
- Method 2: Build with conda & pip
You can follow the official documentation of VeRL. We provide a requirements.txt for reference.
Note: The RAG server for training is different from the one for evaluation.
- Environment
conda create -n retriever python=3.10
conda activate retriever
conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.1 -c pytorch -c nvidia
pip install transformers datasets pyserini
conda install -c pytorch -c nvidia faiss-gpu=1.8.0
pip install uvicorn fastapi
- Downloading
Download the wiki files. Change the save_path below to your target path. If you run into problems with the script, you can download the files directly from the Hugging Face repository of Search-R1.
cd mempo
save_path=/the/path/to/save
python sglang_multiturn/local_dense_retriever/download.py --save_path $save_path
cat $save_path/part_* > $save_path/e5_Flat.index
gzip -d $save_path/wiki-18.jsonl.gz
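The `cat` step above simply concatenates the downloaded index shards, in sorted order, into one FAISS index file. A minimal Python equivalent (function name and shard pattern mirror the shell command; this is a sketch, not part of the repository):

```python
import glob
import shutil

def merge_index_shards(save_path: str, out_name: str = "e5_Flat.index") -> str:
    """Concatenate part_* shards (in sorted order) into a single index file."""
    shards = sorted(glob.glob(f"{save_path}/part_*"))
    out_path = f"{save_path}/{out_name}"
    with open(out_path, "wb") as out:
        for shard in shards:
            with open(shard, "rb") as f:
                # Stream bytes shard-by-shard so large shards are never
                # loaded into memory all at once.
                shutil.copyfileobj(f, out)
    return out_path
```

This is useful on systems without `cat` (e.g. plain Windows shells); the byte-level result is identical to the shell command.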
- Launch
Change the file_path to your own path in sglang_multiturn/retrieval_launch.sh. Then launch the RAG server by:
cd mempo
conda activate retriever
bash sglang_multiturn/retrieval_launch.sh
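Once the server is up, you can sanity-check it with a small client. The sketch below assumes the server follows Search-R1's retrieval API (a POST endpoint such as `/retrieve` on the port set in retrieval_launch.sh, taking `queries`, `topk`, and `return_scores`); check the launch script if your endpoint or field names differ:

```python
import json
import urllib.request

def build_payload(queries, topk=3):
    """Request body in the shape Search-R1's retrieval server expects.
    Field names are an assumption; verify against your server code."""
    return {"queries": list(queries), "topk": topk, "return_scores": True}

def retrieve(queries, topk=3, url="http://127.0.0.1:8000/retrieve"):
    """POST the queries to the local RAG server and return the parsed JSON."""
    req = urllib.request.Request(
        url,
        data=json.dumps(build_payload(queries, topk)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

For example, `retrieve(["who wrote Hamlet"], topk=5)` should return the top-5 wiki passages per query if the server is running on the assumed port.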
- Downloading
Download the SFT model MemPO_Qwen2.5-SFT and the dataset.
- Launch
Change actor_rollout_ref.model.path and data.train_files to your own paths in sglang_multiturn/run_train.sh, and then train the model by:
cd mempo
bash sglang_multiturn/run_train.sh

The code is implemented with reference to VeRL, Search-R1, ASearcher, and MEM1.
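The two overrides to edit in run_train.sh look roughly like the config fragment below. This is a hypothetical excerpt, not the actual script: the variable names, the dataset filename, and the surrounding VeRL launch command are placeholders you should adapt to your checkout.

```shell
# Hypothetical excerpt of sglang_multiturn/run_train.sh -- adjust to your setup.
MODEL_PATH=/your/path/to/MemPO_Qwen2.5-SFT   # downloaded SFT checkpoint
TRAIN_FILE=/your/path/to/dataset/train.parquet  # downloaded training data

# The overrides are passed to VeRL's trainer via Hydra-style key=value pairs.
python -m verl.trainer.main_ppo \
    actor_rollout_ref.model.path=$MODEL_PATH \
    data.train_files=$TRAIN_FILE
```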
