

Codebase for Collaborative Evolutionary Reinforcement Learning (CERL), accepted for publication in the Proceedings of the 36th International Conference on Machine Learning, Long Beach, California, PMLR 97, 2019. Copyright 2019 by the author(s).

Guide to set up and run CERL Experiments

  1. Set up Conda

    • Install Anaconda3
    • conda create -n $ENV_NAME$ python=3.6.1
    • source activate $ENV_NAME$
  2. Install PyTorch version 1.0

    • Refer to the official PyTorch installation instructions
    • conda install pytorch torchvision -c pytorch [GPU-version]
  3. Install Numpy, Cython and Scipy

    • pip install numpy==1.15.4
    • pip install cython==0.29.2
    • pip install scipy==1.1.0
  4. Install MuJoCo and OpenAI Gym

    • Download mjpro150 from
    • Unzip mjpro150 and place it, together with mjkey.txt (the license file), in ~/.mujoco/ (create the .mujoco directory in your home folder if it does not exist)
    • pip install -U 'mujoco-py<1.50.2,>=1.50.1'
    • pip install 'gym[all]'
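
Once the steps above are done, a quick sanity check of the environment can save a failed run later. The following is a minimal sketch, not part of the CERL codebase: the `check_environment` helper and the `EXPECTED` table are hypothetical names, and the check assumes each package exposes a `__version__` attribute (it falls back to "unknown" if not). The syntax is kept compatible with Python 3.6.

```python
import importlib

# Versions pinned in the setup steps above; keys are import names.
# mujoco-py is pinned to a range (>=1.50.1,<1.50.2), so only its prefix is checked.
EXPECTED = {
    "numpy": "1.15.4",
    "Cython": "0.29.2",
    "scipy": "1.1.0",
    "torch": "1.0",
    "mujoco_py": "1.50.1",
}


def check_environment(expected=EXPECTED):
    """Return {import_name: status} without raising on missing packages."""
    report = {}
    for name, wanted in expected.items():
        try:
            module = importlib.import_module(name)
        except Exception as exc:  # ImportError, or a build error raised on import
            report[name] = "missing ({})".format(type(exc).__name__)
            continue
        found = getattr(module, "__version__", "unknown")
        status = "ok" if found.startswith(wanted) else "expected " + wanted
        report[name] = "{} ({})".format(found, status)
    return report


if __name__ == "__main__":
    for name, status in check_environment().items():
        print("{:10s} {}".format(name, status))
```

Run it inside the activated Conda environment; any line that does not end in "(ok)" points at the install step to revisit.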

Code labels

Main script: runs everything

core/ Rollout worker

core/ Upper Confidence Bound implemented for learner selection by the resource-manager

core/ Portfolio of learners which can vary in their hyperparameters

core/ Learner agent encapsulating the algorithm and its summary statistics

core/ Cyclic Replay buffer

core/ Wrapper around the Mujoco env

core/ Actor/Critic model

core/ Implements Neuroevolution

core/ Implements the off-policy gradient learner (TD3)

core/ Helper functions
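
To illustrate the idea behind Upper Confidence Bound learner selection mentioned in the list above, here is a minimal, generic UCB1-style sketch. It is not the repository's implementation: the `UCBSelector` class, its `exploration` constant, and the fixed returns in the demo are all assumptions made for illustration, and the actual resource manager may normalize or discount its statistics differently.

```python
import math


class UCBSelector:
    """UCB1-style allocator over a fixed set of learners (illustrative only)."""

    def __init__(self, num_learners, exploration=1.0):
        self.exploration = exploration
        self.counts = [0] * num_learners    # times each learner was selected
        self.values = [0.0] * num_learners  # running mean return per learner

    def select(self):
        # Try every learner once before trusting the statistics.
        for idx, count in enumerate(self.counts):
            if count == 0:
                return idx
        total = sum(self.counts)
        scores = [
            value + self.exploration * math.sqrt(math.log(total) / count)
            for value, count in zip(self.values, self.counts)
        ]
        return scores.index(max(scores))

    def update(self, idx, observed_return):
        # Incremental mean: no need to store the full return history.
        self.counts[idx] += 1
        self.values[idx] += (observed_return - self.values[idx]) / self.counts[idx]


if __name__ == "__main__":
    selector = UCBSelector(num_learners=3)
    fixed_returns = [0.2, 0.8, 0.5]  # hypothetical returns, for the demo only
    for _ in range(200):
        chosen = selector.select()
        selector.update(chosen, fixed_returns[chosen])
    print(selector.counts)
```

The exploration bonus shrinks as a learner is selected more often, so rarely used learners are still revisited occasionally while most of the budget flows to the learner with the highest observed return.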

Reproduce Results

python -env HalfCheetah-v2 -portfolio {10,14} -total_steps 2 -seed {2018,2022}

python -env Hopper-v2 -portfolio {10,14} -total_steps 1.5 -seed {2018,2022}

python -env Humanoid-v2 -portfolio {10,14} -total_steps 1 -seed {2018,2022}

python -env Walker2d-v2 -portfolio {10,14} -total_steps 2 -seed {2018,2022}

python -env Swimmer-v2 -portfolio {10,14} -total_steps 2 -seed {2018,2022}

python -env Hopper-v2 -portfolio {100,102} -total_steps 5 -seed {2018,2022}

where {} represents an inclusive discrete range: {10, 14} --> {10, 11, 12, 13, 14}
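
The range notation can be expanded programmatically when scripting a sweep. The sketch below is an illustration under two assumptions: that every portfolio id is crossed with every seed (the text above does not say so explicitly), and that only the flag arguments are built, since the script name is not given in the commands above. `expand_range` and `sweep_arguments` are hypothetical helper names.

```python
import itertools
import re


def expand_range(spec):
    """Expand '{lo,hi}' into the inclusive list [lo, ..., hi]."""
    match = re.fullmatch(r"\{\s*(\d+)\s*,\s*(\d+)\s*\}", spec)
    if match is None:
        raise ValueError("expected '{lo,hi}', got " + repr(spec))
    low, high = int(match.group(1)), int(match.group(2))
    return list(range(low, high + 1))


def sweep_arguments(env, portfolio_spec, total_steps, seed_spec):
    """Return one flag list per (portfolio, seed) pair (assumed cross product)."""
    runs = []
    for portfolio, seed in itertools.product(
        expand_range(portfolio_spec), expand_range(seed_spec)
    ):
        runs.append([
            "-env", env,
            "-portfolio", str(portfolio),
            "-total_steps", str(total_steps),
            "-seed", str(seed),
        ])
    return runs


if __name__ == "__main__":
    for args in sweep_arguments("Hopper-v2", "{10,14}", 1.5, "{2018,2022}"):
        print(" ".join(args))
```

Each printed line holds the flags for one run; prepend the interpreter and the main script to launch it.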


All roll-outs (evaluation of actors in the evolutionary population and the explorative roll-outs conducted by the learners) run in parallel. They are farmed out to different CPU cores and write asynchronously to the collective replay buffer. As a result, slight variations in results are observed even with the same seed.
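
The effect can be seen in a toy setting. The sketch below is not CERL code: it uses threads as a stand-in for the worker processes, and `rollout` and `collect` are hypothetical names. Every worker is seeded deterministically, yet results are appended to the shared buffer in completion order, which depends on scheduling.

```python
import random
from concurrent.futures import ThreadPoolExecutor, as_completed


def rollout(worker_id, seed, length=5):
    """Stand-in for one roll-out: a seeded list of fake transitions."""
    rng = random.Random(seed * 1000 + worker_id)
    return [(worker_id, step, rng.random()) for step in range(length)]


def collect(seed, num_workers=4):
    """Run roll-outs concurrently and append results in completion order."""
    buffer = []
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        futures = [pool.submit(rollout, w, seed) for w in range(num_workers)]
        for future in as_completed(futures):
            buffer.extend(future.result())  # arrival order depends on scheduling
    return buffer


if __name__ == "__main__":
    first, second = collect(2018), collect(2018)
    print("same contents:", sorted(first) == sorted(second))
    print("same order:   ", first == second)
```

The two runs always contain the same transitions, but their order in the buffer is not guaranteed to match. A learner that samples minibatches by buffer index can therefore see different batches across runs with an identical seed.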

