ACRE: Actor-Critic with Reward-Preserving Exploration


ACRE is a model-free, off-policy RL algorithm specifically designed to incorporate extra exploration signals without blurring the environmental rewards. ACRE is shipped with a Gaussian Mixture Model (GMM) to calculate the instantaneous novelty.
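As a rough illustration of how a mixture model can yield an instantaneous novelty value, the sketch below scores a state by its negative log-likelihood under a 1-D Gaussian mixture. This is a minimal sketch under stated assumptions: the scoring rule, the function names (`gaussian_pdf`, `gmm_novelty`) and the mixture parameters are hypothetical and are not taken from `utils/gmm.py`.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian with the given mean and std at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def gmm_novelty(x, weights, means, stds):
    """Negative log-likelihood of x under a 1-D Gaussian mixture.

    Higher values mean x is less likely under the mixture, i.e. more novel.
    Assumption: this scoring rule is illustrative, not the repository's.
    """
    density = sum(
        w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, stds)
    )
    return -math.log(density + 1e-12)  # small constant guards against log(0)


# Hypothetical mixture summarizing previously visited states
weights = [0.5, 0.5]
means = [-1.0, 1.0]
stds = [0.5, 0.5]

familiar = gmm_novelty(1.0, weights, means, stds)  # at a mixture mode
novel = gmm_novelty(4.0, weights, means, stds)     # far from both modes
print("familiar state novelty:", familiar)
print("novel state novelty:", novel)
```

A state far from every mixture component receives a larger score than one sitting on a mode, which is the property a novelty estimator needs.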

[Figure: ACRE performance insights]

Installation

[Tested with Python 3.7 on Ubuntu 18.04 and 20.04]
  1. Install the required Ubuntu libraries
sudo apt install libpython3.7-dev
sudo apt install libopenmpi-dev
  2. Install MuJoCo (optional)

    If you want to use the MuJoCo environments, follow the mujoco-py README instructions to install mujoco-py

  3. Clone this repository

git clone https://github.com/athakapo/ACRE.git
  4. Enter the project's directory and create a new Python environment of your choice. Here we provide a venv example; the installation instructions for a conda environment are very similar.
cd ACRE
python3.7 -m venv venv
  5. Activate the environment
. venv/bin/activate
  6. Install the required dependencies*
python -m pip install -r requirements.txt

*If you encounter any problem installing mpi4py, please check this guide. You will probably need to find the path to mpicc (sudo find / -name mpicc) and then run: env MPICC=path_to_mpicc/mpicc python -m pip install mpi4py==3.0.3
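Before installing the dependencies, it can help to confirm the interpreter version and check whether mpicc is already on the PATH, which avoids a filesystem-wide search. The sketch below uses only the Python standard library; the helper names `check_python` and `find_mpicc` are hypothetical and not part of this repository.

```python
import shutil
import sys


def check_python(expected=(3, 7)):
    """Return True if the running interpreter matches the tested major.minor version."""
    return sys.version_info[:2] == expected


def find_mpicc():
    """Return the full path to mpicc if it is on PATH, otherwise None."""
    return shutil.which("mpicc")


mpicc = find_mpicc()
print("Running Python 3.7:", check_python())
if mpicc is None:
    print("mpicc not found on PATH; check that libopenmpi-dev is installed")
else:
    # Same fallback command as in the note above, with the detected path filled in
    print("env MPICC=" + mpicc + " python -m pip install mpi4py==3.0.3")
```

If mpicc is installed outside the PATH, `find_mpicc` returns None and the `sudo find / -name mpicc` search from the note above is still needed.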

Example Usage

  1. Open a terminal, navigate to the project's directory, and activate the Python environment
. venv/bin/activate
  2. Add the ACRE project to your PYTHONPATH
export PYTHONPATH="$PWD"
  3. Execute a Python script
    1. [1st example] Run the ACRE algorithm on the MountainCarContinuous-v0 environment
      python algos/acre/acre.py --env MountainCarContinuous-v0
      
    2. [2nd example] After defining the values in run_experiment_grid.py, execute
      python run_experiment_grid.py
      
  4. Monitor learning progress through TensorBoard*
*Execute this command after having started the training script (Step 3)
tensorboard --logdir tensorboard/
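The second example above runs one algorithm over many hyperparameter values. As a rough idea of what such a grid expands to, the sketch below enumerates every combination of the listed values. It is an assumption-laden illustration: `expand_grid`, the `seed` parameter and its values are hypothetical and do not reflect the actual contents of run_experiment_grid.py.

```python
import itertools


def expand_grid(grid):
    """Yield one dict per combination of the listed hyperparameter values."""
    keys = sorted(grid)  # fixed key order makes the enumeration deterministic
    for values in itertools.product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


# Hypothetical grid: the environment names appear in this README,
# while "seed" and its values are assumptions made for illustration.
grid = {
    "env": ["MountainCarContinuous-v0", "Swimmer-v2"],
    "seed": [0, 1],
}

for config in expand_grid(grid):
    print(config)
```

Each printed dictionary corresponds to one training run, so the number of runs is the product of the list lengths.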

Project Structure

Following Spinning Up nomenclature:

├── README.md                       <- You are here!
├── algos                           <- All supported RL algorithms
│   ├── acre                        <- ACRE folder
│   │   ├── acre.py                 <- Algorithm logic and learning process
│   │   ├── acre_MountainCarContinuous-v0.py <- Saved ACRE parameters for MountainCarContinuous-v0 environment
│   │   ├── acre_Swimmer-v2.py      <- Saved ACRE parameters for Swimmer-v2 environment
│   │   └── core.py                 <- Neural network definitions and various ACRE utilities
│   ├── acre_rnd                    <- ACRE+RND [ACRE + https://arxiv.org/abs/1810.12894]
│   ├── ddpg                        <- DDPG https://arxiv.org/abs/1509.02971
│   ├── ppo                         <- PPO https://arxiv.org/abs/1707.06347
│   ├── ppo_gmm                     <- PPO+GMM
│   ├── ppo_rnd                     <- PPO+RND https://arxiv.org/abs/1810.12894 
│   ├── sac                         <- SAC https://arxiv.org/abs/1801.01290    
│   └── td3                         <- TD3 https://arxiv.org/abs/1802.09477
│
├── data                            <- Data folder for each algorithm to save checkpoints 
│                                      and reproduce experiments
│
├── images                          <- Generated graphics and figures for the repository README
│
├── tensorboard                     <- Monitor the progress of learning curves in real-time
│                                      with the power of tensorboard
│
├── utils                           <- Collection of several supplementary utilities
│   ├── gmm.py                      <- Gaussian Mixture Model definition and functionality
│   ├── logx.py                     <- A general-purpose logger
│   ├── ModifiedTensorBoard.py      <- Tensorboard
│   ├── mpi_pytorch.py              <- Data-parallel PyTorch optimization across MPI processes
│   ├── mpi_tools.py                <- MPI tools
│   ├── plot.py                     <- Plot handling
│   ├── run_utils.py                <- Utilities for running experiments
│   └── serialization_utils.py      <- Serialization utilities
│
├── run_experiment_grid.py          <- Run the same algorithm with many possible hyperparameters
├── requirements.txt                <- The requirements file for reproducing the python environment

Evaluation results

Investigating ACRE Novelty Signal Integration Mechanism

Performance comparison: ACRE experiments

State-space coverage study:

Investigating Gaussian Mixture Model as Novelty Estimator

Performance comparison: ACRE experiments

State-space coverage study:

Extensive Analysis on ACRE Performance

The ACRE algorithm was evaluated on 12 continuous control tasks drawn from the most widely used OpenAI Gym-style collections, using the Tonic RL library. The evaluation was grouped into 3 bundles:

  1. Standard OpenAI Gym control tasks
    1. BipedalWalker-v3
    2. LunarLanderContinuous-v2
    3. MountainCarContinuous-v0
    4. Pendulum-v0
  2. MuJoCo advanced physics simulator environments
    1. Ant-v3
    2. Hopper-v3
    3. Swimmer-v3
    4. Walker2d-v3
  3. DeepMind Control Suite
    1. ball_in_cup-catch
    2. cartpole-two_poles
    3. finger-turn_easy
    4. quadruped-walk

The performance of ACRE in comparison with A2C, DDPG, PPO, SAC, TD3 and TRPO is illustrated in the following figure:

[Figure: ACRE experiments]

Contributing

Contributions, issues, and feature requests are welcome! Feel free to use the issues page.

Cite as:

Kapoutsis, A. C., Koutras, D. I., Korkas, C. D., & Kosmatopoulos, E. B. (2023). ACRE: Actor-Critic with Reward-Preserving Exploration. Neural Computing and Applications, 1-14.

@article{kapoutsis2023acre,
title={ACRE: Actor-Critic with Reward-Preserving Exploration},
author={Kapoutsis, Athanasios Ch and Koutras, Dimitrios I and Korkas, Christos D and Kosmatopoulos, Elias B},
journal={Neural Computing and Applications},
pages={1--14},
year={2023},
publisher={Springer}
}
