MemPO is a self-memory policy optimization algorithm for long-horizon agents, enabling the policy model to autonomously summarize, manage, and selectively retain crucial memory during environment interaction.
We have open-sourced all RL training and evaluation code, as well as the data and models. To reproduce our results, refer to training.md or Quick Start for training and evaluation.md for evaluation; both provide detailed instructions.
⭐️ Please star this repository if it is helpful for you!
The version of VeRL we used is 0.5.0.dev, which supports multi-turn asynchronous training.
- Method 1: Build from Docker (STRONGLY Recommended!)
You can get the official Docker image of VeRL from Docker Hub. The version we used is:
docker pull verlai/verl:app-verl0.5-transformers4.55.4-sglang0.4.10.post2-mcore0.13.0-te2.2
- Method 2: Build with conda & pip
You can follow the official documentation of VeRL. We provide a requirements.txt for reference.
Note: The RAG server for training is different from the one for evaluation.
- Environment
conda create -n retriever python=3.10
conda activate retriever
conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.1 -c pytorch -c nvidia
pip install transformers datasets pyserini
conda install -c pytorch -c nvidia faiss-gpu=1.8.0
pip install uvicorn fastapi
- Downloading
Download the wiki files. Change the save_path below to your target path. If you run into problems with the script, you can download the files directly from the Hugging Face repository of Search-R1.
cd mempo
save_path=/the/path/to/save
python sglang_multiturn/local_dense_retriever/download.py --save_path $save_path
cat $save_path/part_* > $save_path/e5_Flat.index
gzip -d $save_path/wiki-18.jsonl.gz
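The `cat` step above simply concatenates the downloaded index shards, in sorted order, into one FAISS index file. A minimal Python equivalent (function name and shard pattern mirror the shell command; this is a sketch, not part of the repository):

```python
import glob
import shutil

def merge_index_shards(save_path: str, out_name: str = "e5_Flat.index") -> str:
    """Concatenate part_* shards (in sorted order) into a single index file."""
    shards = sorted(glob.glob(f"{save_path}/part_*"))
    out_path = f"{save_path}/{out_name}"
    with open(out_path, "wb") as out:
        for shard in shards:
            with open(shard, "rb") as f:
                # Stream bytes shard-by-shard so large shards are never
                # loaded into memory all at once.
                shutil.copyfileobj(f, out)
    return out_path
```

This is useful on systems without `cat` (e.g. plain Windows shells); the byte-level result is identical to the shell command.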
- Launch
Change the file_path to your own path in sglang_multiturn/retrieval_launch.sh. Then launch the RAG server by:
cd mempo
conda activate retriever
bash sglang_multiturn/retrieval_launch.sh
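Once the server is up, you can sanity-check it with a small client. The sketch below assumes the server follows Search-R1's retrieval API (a POST endpoint such as `/retrieve` on the port set in retrieval_launch.sh, taking `queries`, `topk`, and `return_scores`); check the launch script if your endpoint or field names differ:

```python
import json
import urllib.request

def build_payload(queries, topk=3):
    """Request body in the shape Search-R1's retrieval server expects.
    Field names are an assumption; verify against your server code."""
    return {"queries": list(queries), "topk": topk, "return_scores": True}

def retrieve(queries, topk=3, url="http://127.0.0.1:8000/retrieve"):
    """POST the queries to the local RAG server and return the parsed JSON."""
    req = urllib.request.Request(
        url,
        data=json.dumps(build_payload(queries, topk)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

For example, `retrieve(["who wrote Hamlet"], topk=5)` should return the top-5 wiki passages per query if the server is running on the assumed port.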
- Downloading
Download the SFT model MemPO_Qwen2.5-SFT and the dataset.
- Launch
Change actor_rollout_ref.model.path and data.train_files to your own paths in sglang_multiturn/run_train.sh, and then train the model by:
cd mempo
bash sglang_multiturn/run_train.sh

The code is implemented with reference to VeRL, Search-R1, ASearcher, and MEM1.
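The two overrides to edit in run_train.sh look roughly like the config fragment below. This is a hypothetical excerpt, not the actual script: the variable names, the dataset filename, and the surrounding VeRL launch command are placeholders you should adapt to your checkout.

```shell
# Hypothetical excerpt of sglang_multiturn/run_train.sh -- adjust to your setup.
MODEL_PATH=/your/path/to/MemPO_Qwen2.5-SFT   # downloaded SFT checkpoint
TRAIN_FILE=/your/path/to/dataset/train.parquet  # downloaded training data

# The overrides are passed to VeRL's trainer via Hydra-style key=value pairs.
python -m verl.trainer.main_ppo \
    actor_rollout_ref.model.path=$MODEL_PATH \
    data.train_files=$TRAIN_FILE
```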
