This repository provides the implementation of the Mean Flow Policy Optimization (MFPO) algorithm.
To get started, install the required dependencies:
conda create -n MFPO python=3.9
conda activate MFPO
pip install --upgrade pip
pip install -r requirements.txt
pip install -e .
To reproduce the results in the paper, run the training script for each environment:
XLA_PYTHON_CLIENT_MEM_FRACTION=.1 python3 train_online.py --config configs/mfpo_config.py --env_name HalfCheetah-v3
XLA_PYTHON_CLIENT_MEM_FRACTION=.1 python3 train_online.py --config configs/mfpo_config.py --env_name Humanoid-v3
XLA_PYTHON_CLIENT_MEM_FRACTION=.1 python3 train_online.py --config configs/mfpo_config.py --env_name Ant-v3
XLA_PYTHON_CLIENT_MEM_FRACTION=.1 python3 train_online.py --config configs/mfpo_config.py --env_name Walker2d-v3
XLA_PYTHON_CLIENT_MEM_FRACTION=.1 python3 train_online.py --config configs/mfpo_config.py --env_name Hopper-v3
When running on multiple GPUs, the batch size (default 256) must be divisible by the number of devices.
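The divisibility requirement can be checked up front. The helper below is a hypothetical sketch (not part of this repository) of the validation a multi-device training setup typically performs before sharding the batch; the function name and error message are our own:

```python
def per_device_batch_size(batch_size: int, num_devices: int) -> int:
    """Split a global batch evenly across devices, failing fast otherwise.

    Hypothetical helper: illustrates why the global batch size (default 256)
    must be divisible by the number of devices when data-parallel training
    shards each batch across GPUs.
    """
    if batch_size % num_devices != 0:
        raise ValueError(
            f"batch size {batch_size} is not divisible by "
            f"{num_devices} devices"
        )
    return batch_size // num_devices
```

For example, the default batch of 256 splits cleanly across 1, 2, 4, or 8 devices, but not across 3.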
The code is built on the QSM and MaxEntDP implementations.