MemPO: Self-Memory Policy Optimization for Long-Horizon Agents

[Paper] [Models and Datasets]

MemPO is a self-memory policy optimization algorithm for long-horizon agents, enabling the policy model to autonomously summarize, manage, and selectively retain crucial memory during environment interaction.

We have open-sourced all RL training and evaluation code, as well as the data and models. To reproduce our results, refer to training.md or the Quick Start below for training and to evaluation.md for evaluation; both provide detailed instructions.

⭐️ Please star this repository if it is helpful for you!


Quick Start

Environment

We used VeRL 0.5.0.dev, which supports multi-round asynchronous training.

  • Method 1: Build from Docker (strongly recommended)

You can pull the official VeRL Docker image from Docker Hub. The version we used is:

docker pull verlai/verl:app-verl0.5-transformers4.55.4-sglang0.4.10.post2-mcore0.13.0-te2.2
  • Method 2: Build with conda & pip

You can follow the official VeRL documentation. We have provided a requirements.txt for reference.
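If you go with the Docker route, a minimal way to start a container from the image above is sketched below. The mount path, working directory, and flags are our assumptions for a typical single-node GPU setup, not commands from this repo; adjust them to your machine.

```shell
# Start an interactive container from the VeRL image pulled above.
# Assumptions: NVIDIA Container Toolkit is installed and the repo is cloned into $PWD.
docker run --gpus all -it --rm --shm-size=32g \
    -v "$PWD":/workspace/mempo \
    -w /workspace/mempo \
    verlai/verl:app-verl0.5-transformers4.55.4-sglang0.4.10.post2-mcore0.13.0-te2.2 \
    bash
```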

RAG Server for training

Note: The RAG server for training is different from the one for evaluation.

  • Environment
conda create -n retriever python=3.10
conda activate retriever

conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.1 -c pytorch -c nvidia
pip install transformers datasets pyserini

conda install -c pytorch -c nvidia faiss-gpu=1.8.0

pip install uvicorn fastapi
  • Downloading

Download the wiki files. Change save_path below to your target path. If you run into problems with the script, you can download the files directly from the Search-R1 Hugging Face repository.

cd mempo
save_path=/the/path/to/save
python sglang_multiturn/local_dense_retriever/download.py --save_path $save_path
cat $save_path/part_* > $save_path/e5_Flat.index
gzip -d $save_path/wiki-18.jsonl.gz
  • Launch

Change the file_path in sglang_multiturn/retrieval_launch.sh to your own path. Then you can launch the RAG server by:

cd mempo
conda activate retriever
bash sglang_multiturn/retrieval_launch.sh
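Once the server is up, the agent queries it over HTTP. Below is a minimal client sketch, assuming the server follows Search-R1's retrieval-server conventions (FastAPI on port 8000 with a POST /retrieve route taking queries, topk, and return_scores); verify the host, port, and payload shape against retrieval_launch.sh before relying on it.

```python
import json
import urllib.request

# Assumed endpoint; check retrieval_launch.sh for the actual host and port.
SERVER_URL = "http://127.0.0.1:8000/retrieve"

def build_request(queries, topk=3):
    """Package a batch of queries into the JSON payload the server expects."""
    return {"queries": list(queries), "topk": topk, "return_scores": True}

def retrieve(queries, topk=3):
    """POST the queries to the RAG server and return the parsed JSON response."""
    data = json.dumps(build_request(queries, topk)).encode("utf-8")
    req = urllib.request.Request(
        SERVER_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

For example, `retrieve(["Who wrote The Odyssey?"], topk=3)` should return the top-3 passages (with scores) for the query.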

Training

  • Downloading

Download the SFT model MemPO_Qwen2.5-SFT and the dataset.

  • Launch

Change actor_rollout_ref.model.path and data.train_files in sglang_multiturn/run_train.sh to your own paths, and then you can train the model by:

cd mempo
bash sglang_multiturn/run_train.sh

Acknowledgements

The code is implemented with reference to VeRL, Search-R1, Asearcher, and MEM1.

