ACRE: Actor-Critic with Reward-Preserving Exploration

ACRE is a model-free, off-policy RL algorithm designed to incorporate extra exploration signals without blurring the environmental rewards. ACRE ships with a Gaussian Mixture Model (GMM) that calculates the instantaneous novelty.
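The sketch below shows one way to turn a GMM into a novelty score. It assumes novelty is the negative log-likelihood of a state under an already-fitted mixture with diagonal covariances. This is a common choice, but the exact definition in utils/gmm.py may differ. Fitting is out of scope here. If you prefer a library, scikit-learn's `GaussianMixture.score_samples` returns the same per-sample log-likelihood. The function names are hypothetical.

```python
import math
from typing import Sequence


def diag_gaussian_logpdf(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log-density of x under a Gaussian with diagonal covariance."""
    log_p = 0.0
    for xi, mi, vi in zip(x, mean, var):
        log_p += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return log_p


def gmm_novelty(x, weights, means, variances) -> float:
    """Assumed novelty definition: -log p(x) under the mixture (hypothetical helper)."""
    # Per-component term: log(w_k) + log N(x | mu_k, diag(var_k))
    logs = [math.log(w) + diag_gaussian_logpdf(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    # Log-sum-exp keeps the mixture log-likelihood numerically stable
    peak = max(logs)
    log_p = peak + math.log(sum(math.exp(term - peak) for term in logs))
    return -log_p


# Illustrative mixture fitted (hypothetically) on previously visited 2-D states
weights = [0.5, 0.5]
means = [[0.0, 0.0], [5.0, 5.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
print(gmm_novelty([0.0, 0.0], weights, means, variances))    # familiar state: low novelty
print(gmm_novelty([10.0, -10.0], weights, means, variances))  # unseen state: high novelty
```

States close to a component mean score low, and states far from everything seen so far score high.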

Installation

[Tested with Python 3.7 and Ubuntu 18.04 & 20.04]

Install the required Ubuntu libraries

sudo apt install libpython3.7-dev
sudo apt install libopenmpi-dev

Install MuJoCo (Optional)

If you want to use the MuJoCo environments, follow the instructions in the mujoco-py README to install mujoco-py.

Clone this repository

git clone https://github.com/athakapo/ACRE.git

Enter the project's directory and create a new Python environment of your choice. Here we provide a venv example; the installation steps for a conda environment are very similar.

cd ACRE
python3.7 -m venv venv

Activate the environment

. venv/bin/activate

Install the required dependencies*

python -m pip install -r requirements.txt

*If you run into problems installing mpi4py, please check this guide. You probably need to find the path to mpicc (sudo find / -name mpicc) and then run: env MPICC=path_to_mpicc/mpicc python -m pip install mpi4py==3.0.3
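If `mpicc` is already on your PATH, Python's `shutil.which` offers a quicker alternative to searching the whole filesystem. This sketch prints the mpi4py install command with the discovered path filled in. If nothing is found, fall back to the `find` command. The helper names are hypothetical.

```python
import shutil
from typing import Optional


def find_mpicc() -> Optional[str]:
    """Return the full path to mpicc if it is on PATH, else None (hypothetical helper)."""
    # shutil.which only searches the directories listed in PATH
    return shutil.which("mpicc")


def mpi4py_install_command(mpicc_path: str, version: str = "3.0.3") -> str:
    """Build the install command, pointing MPICC at the given compiler wrapper."""
    return f"env MPICC={mpicc_path} python -m pip install mpi4py=={version}"


if __name__ == "__main__":
    path = find_mpicc()
    if path is None:
        print("mpicc not found on PATH; try: sudo find / -name mpicc")
    else:
        print(mpi4py_install_command(path))
```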

Example Usage

Open a terminal, navigate to the project's directory and activate the Python environment

. venv/bin/activate

Add the ACRE project to your PYTHONPATH

export PYTHONPATH="$PWD"
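You may launch code from an IDE or notebook where the shell `export` is not applied. In that case, prepending the repository root to `sys.path` inside Python has a similar effect. This is a minimal sketch; the helper name and the checkout path are placeholders.

```python
import os
import sys


def add_repo_to_path(repo_root: str) -> None:
    """Prepend the ACRE checkout to sys.path (in-process stand-in for PYTHONPATH)."""
    root = os.path.abspath(repo_root)
    # Avoid duplicate entries if this is called more than once
    if root not in sys.path:
        sys.path.insert(0, root)


# Usage (placeholder path, adjust to your clone):
# add_repo_to_path("/path/to/ACRE")
```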

Execute a Python script
1. [1st Example] Run the ACRE algorithm on the MountainCarContinuous-v0 environment
```
python algos/acre/acre.py --env MountainCarContinuous-v0
```
2. [2nd Example] After defining the values in run_experiment_grid.py, execute
```
python run_experiment_grid.py
```
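In the project structure, run_experiment_grid.py is listed as running the same algorithm with many possible hyperparameters. The core idea, as in Spinning Up's `ExperimentGrid`, is to expand lists of candidate values into their Cartesian product and launch one run per combination. This standalone sketch covers only that expansion step. The keys and values are hypothetical and are not the script's actual parameters.

```python
import itertools
from typing import Any, Dict, List


def expand_grid(grid: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Return one config dict per combination of the listed values."""
    keys = list(grid)
    # itertools.product enumerates every combination, last key varying fastest
    return [dict(zip(keys, combo)) for combo in itertools.product(*(grid[k] for k in keys))]


# Hypothetical grid: names and values are illustrative only.
grid = {
    "env": ["MountainCarContinuous-v0"],
    "seed": [0, 10, 20],
    "lr": [1e-3, 3e-4],
}
for cfg in expand_grid(grid):
    print(cfg)  # 1 env x 3 seeds x 2 learning rates = 6 runs
```

Keep in mind that the number of runs grows multiplicatively with every added key.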
Monitor learning progress through TensorBoard*

*Run this command after the training script has started.

tensorboard --logdir tensorboard/

Project Structure

Following the Spinning Up nomenclature:

├── README.md                       <- You are here!
├── algos                           <- All supported RL algorithms
│   ├── acre                        <- ACRE folder
│   │   ├── acre.py                 <- Algorithm logic and learning process
│   │   ├── acre_MountainCarContinuous-v0.py <- Saved ACRE parameters for MountainCarContinuous-v0 environment
│   │   ├── acre_Swimmer-v2.py      <- Saved ACRE parameters for Swimmer-v2 environment
│   │   └── core.py                 <- Neural network definitions and various ACRE utilities
│   ├── acre_rnd                    <- ACRE+RND [ACRE + https://arxiv.org/abs/1810.12894]
│   ├── ddpg                        <- DDPG https://arxiv.org/abs/1509.02971
│   ├── ppo                         <- PPO https://arxiv.org/abs/1707.06347
│   ├── ppo_gmm                     <- PPO+GMM
│   ├── ppo_rnd                     <- PPO+RND https://arxiv.org/abs/1810.12894 
│   ├── sac                         <- SAC https://arxiv.org/abs/1801.01290    
│   └── td3                         <- TD3 https://arxiv.org/abs/1802.09477
│
├── data                            <- Data folder for each algorithm to save checkpoints 
│                                      and reproduce experiments
│
├── images                          <- Generated graphics and figures for the repository README
│
├── tensorboard                     <- Monitor the progress of learning curves in real-time
│                                      with the power of tensorboard
│
├── utils                           <- Collection of several supplementary utilities
│   ├── gmm.py                      <- Gaussian Mixture Model definition and functionality
│   ├── logx.py                     <- A general-purpose logger
│   ├── ModifiedTensorBoard.py      <- Tensorboard
│   ├── mpi_pytorch.py              <- Data-parallel PyTorch optimization across MPI processes
│   ├── mpi_tools.py                <- MPI tools
│   ├── plot.py                     <- Plot handling
│   ├── run_utils.py                <- Utilities for running experiments
│   └── serialization_utils.py      <- Serialization utilities
│
├── run_experiment_grid.py          <- Run the same algorithm with many possible hyperparameters
├── requirements.txt                <- The requirements file for reproducing the python environment
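In the project structure, mpi_pytorch.py is described as data-parallel PyTorch optimization across MPI processes. Assuming it follows the Spinning Up version it mirrors, each process computes gradients on its own batch. The gradients are then averaged with an MPI allreduce (sum, then divide by the number of processes) before the optimizer step. The sketch below simulates only that averaging step in plain Python, without MPI or PyTorch. The function name is hypothetical.

```python
from typing import List, Sequence


def average_gradients(per_worker_grads: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean over workers.

    With real MPI this equals allreduce(SUM) divided by the number of processes,
    and every process ends up holding the same averaged gradient.
    """
    n_workers = len(per_worker_grads)
    # zip(*...) groups the i-th gradient entry of every worker together
    return [sum(entries) / n_workers for entries in zip(*per_worker_grads)]


# Two simulated workers, each with a 3-parameter gradient from its own minibatch.
grads = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]
print(average_gradients(grads))  # [2.0, 3.0, 4.0]
```

Averaging rather than summing keeps the gradient scale from growing with the number of processes. Because all processes then apply the same update, their parameters stay in sync.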

Evaluation results

Investigating ACRE Novelty Signal Integration Mechanism

Performance comparison:

State-space coverage study:

Investigating Gaussian Mixture Model as Novelty Estimator

Performance comparison:

State-space coverage study:

Extensive Analysis on ACRE Performance

The ACRE algorithm was evaluated on 12 continuous control tasks from the most widely used OpenAI Gym-style collections, using the Tonic RL library. The evaluation was grouped into three bundles:

Standard OpenAI Gym control tasks
1. BipedalWalker-v3
2. LunarLanderContinuous-v2
3. MountainCarContinuous-v0
4. Pendulum-v0
MuJoCo environments (advanced physics simulator)
1. Ant-v3
2. Hopper-v3
3. Swimmer-v3
4. Walker2d-v3
DeepMind Control Suite
1. ball_in_cup-catch
2. cartpole-two_poles
3. finger-turn_easy
4. quadruped-walk

The performance of ACRE in comparison with A2C, DDPG, PPO, SAC, TD3 and TRPO is illustrated in the following figure:

Contributing

Contributions, issues and feature requests are welcome! Feel free to use the issues page.

Cite as:

Kapoutsis, A. C., Koutras, D. I., Korkas, C. D., & Kosmatopoulos, E. B. (2023). ACRE: Actor-Critic with Reward-Preserving Exploration. Neural Computing and Applications, 1-14. [Link]

@article{kapoutsis2023acre,
title={ACRE: Actor-Critic with Reward-Preserving Exploration},
author={Kapoutsis, Athanasios Ch and Koutras, Dimitrios I and Korkas, Christos D and Kosmatopoulos, Elias B},
journal={Neural Computing and Applications},
pages={1--14},
year={2023},
publisher={Springer}
}
