ACRE is a model-free, off-policy RL algorithm specifically designed to incorporate extra exploration signals without blurring the environmental rewards. ACRE is shipped with a Gaussian Mixture Model (GMM) to calculate the instantaneous novelty.
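To make the idea of a GMM-based novelty score concrete, here is a minimal sketch in plain Python. It assumes that novelty is measured as the negative log-likelihood of a state under an already fitted mixture with diagonal covariances; the function names, the mixture parameters, and the choice of score are illustrative assumptions and may differ from what utils/gmm.py actually does.

```python
import math


def diag_gaussian_logpdf(x, mean, var):
    """Log-density of a Gaussian with diagonal covariance at point x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_density(x, weights, means, variances):
    """Log-density of x under the mixture, using a stable log-sum-exp."""
    terms = [
        math.log(w) + diag_gaussian_logpdf(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def novelty(x, weights, means, variances):
    """Assumed novelty score: negative log-likelihood of the state."""
    return -gmm_log_density(x, weights, means, variances)


# Hypothetical two-component mixture over 2-D states (not ACRE's parameters).
weights = [0.5, 0.5]
means = [[0.0, 0.0], [1.0, 1.0]]
variances = [[0.1, 0.1], [0.1, 0.1]]

visited = novelty([0.0, 0.0], weights, means, variances)
unseen = novelty([5.0, 5.0], weights, means, variances)
print(visited < unseen)  # states far from every component score as more novel
```

Under this assumption, states that the mixture already explains well receive a low score, while rarely visited regions receive a high one.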
- Install the required Ubuntu libraries
sudo apt install libpython3.7-dev
sudo apt install libopenmpi-dev
- Install MuJoCo (Optional)
If you want to use the MuJoCo environments, follow the mujoco-py README instructions to install mujoco-py.
- Clone this repository
git clone https://github.com/athakapo/ACRE.git
- Enter the project's repository and create a new Python environment of your choice. Here we provide a venv example; the installation instructions for a conda environment are very similar.
cd ACRE
python3.7 -m venv venv
- Activate the environment
. venv/bin/activate
- Install the needed dependencies*
python -m pip install -r requirements.txt
*If you encounter any problem installing mpi4py, please check this guide. You will probably need to find the current path to mpicc (sudo find / -name mpicc) and then run:
env MPICC=path_to_mpicc/mpicc python -m pip install mpi4py==3.0.3
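As a lighter alternative to searching the whole filesystem, the mpicc lookup can be sketched in Python. This is only an illustration: it assumes mpicc is on your PATH, and `mpi4py_install_command` is a hypothetical helper that rebuilds the command from the footnote above, using the current interpreter in place of `python`.

```python
import shutil
import sys


def mpi4py_install_command(mpicc_path, version="3.0.3"):
    """Build the pip command from the footnote for a given mpicc path."""
    return [
        "env", f"MPICC={mpicc_path}",
        sys.executable, "-m", "pip", "install", f"mpi4py=={version}",
    ]


mpicc = shutil.which("mpicc")  # None when mpicc is not on PATH
if mpicc is None:
    print("mpicc not found on PATH; fall back to: sudo find / -name mpicc")
else:
    # Print the command rather than running it, so nothing is installed here.
    print(" ".join(mpi4py_install_command(mpicc)))
```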
- Open a terminal, navigate to the project's repository, and activate the Python environment
. venv/bin/activate
- Add the [ACRE] project to your PYTHONPATH
export PYTHONPATH="$PWD"
- Execute a Python script
  - [1st Example] Run the ACRE algorithm for the MountainCarContinuous-v0 environment
    python algos/acre/acre.py --env MountainCarContinuous-v0
  - [2nd Example] After defining the values in run_experiment_grid.py, execute
    python run_experiment_grid.py
- Monitor learning progress through Tensorboard*
tensorboard --logdir tensorboard/
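The PYTHONPATH and 1st-example steps above can also be driven from a small Python launcher. The sketch below only assembles the command and environment; the helper names are hypothetical, and it assumes you run it from the repository root with the virtual environment already activated.

```python
import os
import sys


def acre_command(env_name, script="algos/acre/acre.py"):
    """Command line for the 1st example, using the current interpreter."""
    return [sys.executable, script, "--env", env_name]


def env_with_pythonpath(repo_root, base_env=None):
    """Copy of the environment with PYTHONPATH pointing at the repository."""
    env = dict(os.environ if base_env is None else base_env)
    env["PYTHONPATH"] = repo_root
    return env


cmd = acre_command("MountainCarContinuous-v0")
run_env = env_with_pythonpath(os.getcwd())
print(" ".join(cmd))

# To actually start training, uncomment the following lines:
# import subprocess
# subprocess.run(cmd, env=run_env, check=True)
```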
Following Spinning Up nomenclature:
├── README.md <- You are here!
├── algos <- All supported RL algorithms
│ ├── acre <- ACRE folder
│ │ ├── acre.py <- Algorithm logic and learning process
│ │ ├── acre_MountainCarContinuous-v0.py <- Saved ACRE parameters for MountainCarContinuous-v0 environment
│ │ ├── acre_Swimmer-v2.py <- Saved ACRE parameters for Swimmer-v2 environment
│ │ └── core.py <- Neural network definitions and various ACRE utilities
│ ├── acre_rnd <- ACRE+RND [ACRE + https://arxiv.org/abs/1810.12894]
│ ├── ddpg <- DDPG https://arxiv.org/abs/1509.02971
│ ├── ppo <- PPO https://arxiv.org/abs/1707.06347
│ ├── ppo_gmm <- PPO+GMM
│ ├── ppo_rnd <- PPO+RND https://arxiv.org/abs/1810.12894
│ ├── sac <- SAC https://arxiv.org/abs/1801.01290
│ └── td3 <- TD3 https://arxiv.org/abs/1802.09477
│
├── data <- Data folder for each algorithm to save checkpoints
│ and reproduce experiments
│
├── images <- Generated graphics and figures for the repository README
│
├── tensorboard <- Monitor the progress of learning curves in real-time
│ with the power of tensorboard
│
├── utils <- Collection of several supplementary utilities
│ ├── gmm.py <- Gaussian Mixture Model definition and functionality
│ ├── logx.py <- A general-purpose logger
│ ├── ModifiedTensorBoard.py <- Tensorboard
│ ├── mpi_pytorch.py <- Data-parallel PyTorch optimization across MPI processes
│ ├── mpi_tools.py <- MPI tools
│ ├── plot.py <- Plot handling
│ ├── run_utils.py <- Utilities for running experiments
│ └── serialization_utils.py <- Serialization utilities
│
├── run_experiment_grid.py <- Run the same algorithm with many possible hyperparameters
├── requirements.txt <- The requirements file for reproducing the python environment
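Running the same algorithm over many hyperparameter values amounts to expanding a grid into individual configurations. The following is a generic sketch of that expansion; it does not reproduce the contents of run_experiment_grid.py, and the parameter names and values are placeholders rather than ACRE defaults.

```python
from itertools import product


def expand_grid(grid):
    """Return one dict per combination of the listed hyperparameter values."""
    keys = list(grid)
    return [dict(zip(keys, combo)) for combo in product(*(grid[k] for k in keys))]


# Hypothetical grid; names and values are placeholders, not ACRE defaults.
grid = {
    "env": ["MountainCarContinuous-v0"],
    "seed": [0, 1, 2],
    "lr": [1e-3, 3e-4],
}

variants = expand_grid(grid)
print(len(variants))  # 1 * 3 * 2 = 6 combinations
for variant in variants:
    print(variant)
```

Each resulting dictionary describes one run, so the number of experiments grows as the product of the list lengths.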
State-space coverage study:
ACRE was evaluated on 12 continuous control tasks drawn from the most widely used OpenAI Gym-style collections, using the Tonic RL library. The evaluation was grouped into 3 bundles:
- Standard OpenAI Gym control tasks
  - BipedalWalker-v3
  - LunarLanderContinuous-v2
  - MountainCarContinuous-v0
  - Pendulum-v0
- MuJoCo advanced physics simulator environments
  - Ant-v3
  - Hopper-v3
  - Swimmer-v3
  - Walker2d-v3
- DeepMind Control Suite
  - ball_in_cup-catch
  - cartpole-two_poles
  - finger-turn_easy
  - quadruped-walk
The performance of ACRE in comparison with A2C, DDPG, PPO, SAC, TD3 and TRPO is illustrated in the following figure:
Contributions, issues, and feature requests are welcome! Feel free to use the issues page.
Kapoutsis, A. C., Koutras, D. I., Korkas, C. D., & Kosmatopoulos, E. B. (2023). ACRE: Actor-Critic with Reward-Preserving Exploration. Neural Computing and Applications, 1-14. [Link]
@article{kapoutsis2023acre,
title={ACRE: Actor-Critic with Reward-Preserving Exploration},
author={Kapoutsis, Athanasios Ch and Koutras, Dimitrios I and Korkas, Christos D and Kosmatopoulos, Elias B},
journal={Neural Computing and Applications},
pages={1--14},
year={2023},
publisher={Springer}
}