Lifelong Hanabi


This repo contains code and models for Continuous Coordination As a Realistic Scenario for Lifelong Learning, a multi-agent lifelong learning testbed that supports both zero-shot and few-shot settings. Our setup is based on Hanabi, a partially observable, fully cooperative multi-agent game.


Lifelong Hanabi consists of 3 phases: 1- Pre-training, 2- Continual training, 3- Testing.

The code is built on top of the Other-Play & Simplified Action Decoder in Hanabi repo.

Requirements and Installation

The build process has been tested with Python 3.7, PyTorch 1.5.1, CUDA 10.1, cuDNN 7.6, and NCCL 2.4.

# clone the repo
git clone --recursive
cd Lifelong-Hanabi

# create new conda env
conda create -n lifelong_hanabi python=3.7
conda activate lifelong_hanabi
pip install -r requirements.txt

# build 
mkdir build
cd build
cmake ..
mv ..
mv rela/ ..
mv hanabi-learning-environment/ ../hanabi-learning-environment/

Once the build is done and the .so files have been moved to their required locations as described above, you only need to run the following in each new session:

conda activate lifelong_hanabi
export PYTHONPATH=/path/to/lifelong_hanabi:$PYTHONPATH


1- Pre-Trained Agents

Run the following command to download the pre-trained agents used in the paper.

pip install gdown
gdown --id 1rpmTPIT-g026pdQfAwHoE4i8tP7Qj2vI

A detailed description of each agent's configs and architectures is in results/Pre-trained agents pool for Continual Hanabi.xlsx, which lists the pre-trained agents we used in our experiments (the pool can be extended by training more expert Hanabi players).

Before running any .sh file, update <path-to-pretrained-model-pool-dir> and <save-dir> in it accordingly. The important flags are:

Flag              Description
--sad             enables the Simplified Action Decoder
--pred_weight     weight for the auxiliary task (typically 0.25)
--shuffle_color   enables Other-Play
--seed            random seed

For details of the other hyperparameters, refer to the code and/or the paper.
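As a rough illustration of what a color-shuffling flag such as --shuffle_color stands for, the sketch below relabels card colors with a random permutation, which is the symmetry that Other-Play exploits in Hanabi. It is a minimal, self-contained sketch and not the repo's implementation: the color names, the (color, rank) card representation, and the helper names sample_color_permutation, invert, and permute_cards are all assumptions made for this example.

```python
import random

# Assumption: five colors labeled by single letters; the real code may use indices.
COLORS = ["R", "Y", "G", "W", "B"]


def sample_color_permutation(rng):
    """Return a random one-to-one mapping old_color -> new_color."""
    shuffled = COLORS[:]
    rng.shuffle(shuffled)
    return dict(zip(COLORS, shuffled))


def invert(perm):
    """Return the inverse mapping new_color -> old_color."""
    return {new: old for old, new in perm.items()}


def permute_cards(cards, perm):
    """Relabel the color of each (color, rank) card; ranks are left unchanged."""
    return [(perm[color], rank) for color, rank in cards]


rng = random.Random(0)
perm = sample_color_permutation(rng)

hand = [("R", 1), ("G", 3), ("B", 5)]
seen_by_agent = permute_cards(hand, perm)               # what the agent would observe
restored = permute_cards(seen_by_agent, invert(perm))   # map back to the true colors
```

Because the permutation is a bijection, applying its inverse recovers the original hand; an agent trained under many such relabelings cannot rely on arbitrary color conventions.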

* Pre-train a new agent through self-play:

A sample script is provided in pyhanabi/tools/ and can be run as follows:

cd pyhanabi
sh tools/

* Reproduce the cross-play matrix:

To evaluate all the agents against each other, run:

cd pyhanabi

The cross-play matrix from our runs can be found in results/scores_data_100_nrun5.csv (results/sem_data_100_nrun5.csv contains the standard error of the mean, s.e.m.).
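The sketch below shows one way such numbers could be computed and summarized: the mean and s.e.m. of a pair's scores over repeated runs, and the average self-play (diagonal) versus cross-play (off-diagonal) score of a matrix. The scores and the 3x3 matrix are made-up toy values, five runs per pair is assumed from the "nrun5" in the file names, and the layout of the CSV files is not assumed here, so the functions work on plain Python lists.

```python
import math
import statistics


def mean_and_sem(scores):
    """Mean and standard error of the mean (sample std / sqrt(n)) of a list of scores."""
    mean = statistics.mean(scores)
    sem = statistics.stdev(scores) / math.sqrt(len(scores))
    return mean, sem


def summarize_cross_play(matrix):
    """Return (average self-play score, average cross-play score) of a square matrix.

    matrix[i][j] is the score of agent i paired with agent j.
    """
    n = len(matrix)
    diagonal = [matrix[i][i] for i in range(n)]
    off_diagonal = [matrix[i][j] for i in range(n) for j in range(n) if i != j]
    return statistics.mean(diagonal), statistics.mean(off_diagonal)


# Toy scores of one agent pair over five runs (hypothetical values).
runs = [20.0, 22.0, 21.0, 19.0, 23.0]
mean, sem = mean_and_sem(runs)

# Toy 3-agent cross-play matrix (hypothetical values).
toy_matrix = [
    [24.0, 10.0, 12.0],
    [11.0, 23.0, 9.0],
    [13.0, 8.0, 22.0],
]
self_play, cross_play = summarize_cross_play(toy_matrix)
```

A large gap between the diagonal and off-diagonal averages indicates agents that coordinate well with themselves but poorly with other partners.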

2- Continual Training

To train the learner with a set of 5 partners using, for example, the ER method, run:

cd pyhanabi
sh tools/continual_learning_scripts/

Zero-shot and few-shot checkpoints will be stored in <save-dir>. Similar scripts are available for all the other algorithms described in the paper.

In order to log the continual training results (from the above checkpoints stored in <save-dir>), run:

cd pyhanabi
sh tools/

* Add your lifelong algorithm:

In order to implement a new lifelong learning algorithm, depending on the type of the algorithm you can modify one of the following:

Memory based methods: episodic_memory is a list of the replay buffers from previous tasks. You can change how the replay batch is collected from it, or how the replayed batch constrains the current gradients.

Regularization based methods: The Fisher information matrix is estimated at the end of each task. You can modify how the corresponding regularization loss is calculated and added to the original loss.

Training regimes: The following hyper-parameters have been shown to have a high impact on the performance of lifelong learning algorithms.

Flag              Description
--optim_name      optimizer
--batchsize       batch size
--decay_lr        learning rate decay
--initial_lr      initial learning rate
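To make the memory-based and regularization-based extension points above more concrete, here is a minimal sketch in plain Python. It is not the repo's code: sample_replay_batch, ewc_penalty, the round-robin sampling strategy, and the strength coefficient are assumptions for illustration, and the real implementation operates on PyTorch tensors and replay buffer objects. The first function draws a replay batch from an episodic_memory list of per-task buffers; the second computes an EWC-style quadratic penalty weighted by a diagonal Fisher estimate.

```python
import random


def sample_replay_batch(episodic_memory, batch_size, rng):
    """Draw batch_size items, cycling over the non-empty buffers of previous tasks.

    episodic_memory is a list with one buffer (here: a plain list) per previous task.
    """
    buffers = [buf for buf in episodic_memory if buf]
    if not buffers:
        return []
    batch = []
    for i in range(batch_size):
        buf = buffers[i % len(buffers)]  # round-robin over tasks (an assumption)
        batch.append(rng.choice(buf))
    return batch


def ewc_penalty(params, anchor_params, fisher, strength):
    """EWC-style penalty: strength / 2 * sum_i fisher_i * (param_i - anchor_i) ** 2."""
    return 0.5 * strength * sum(
        f * (p - a) ** 2 for p, a, f in zip(params, anchor_params, fisher)
    )


rng = random.Random(0)

# Two previous tasks, each with a small buffer of (hypothetical) transitions.
episodic_memory = [["t0_a", "t0_b"], ["t1_a", "t1_b", "t1_c"]]
replay_batch = sample_replay_batch(episodic_memory, batch_size=4, rng=rng)

# Parameters that drift along a high-Fisher direction are penalized more.
penalty = ewc_penalty(
    params=[1.0, 2.0],
    anchor_params=[0.0, 2.0],  # parameters stored at the end of the previous task
    fisher=[4.0, 1.0],         # diagonal Fisher estimate (toy values)
    strength=0.5,
)
task_loss = 1.25               # stand-in for the loss on the current task
total_loss = task_loss + penalty
```

A new memory-based method would typically change how the batch is drawn or used, while a new regularization-based method would change the penalty term that is added to the task loss.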

3- Testing

To evaluate the learner against a set of unseen agents, run:

cd pyhanabi
sh tools/

Logging the continual training results and testing both require a wandb account, which is used to plot the results.

Plot results

All the plots and experiment details are available in the wandb report.

  • Other code used to reproduce figures in the paper can be found in results


If you found this work useful, please consider citing our paper.

      title={Continuous Coordination As a Realistic Scenario for Lifelong Learning},
      author={Hadi Nekoei and Akilesh Badrinaaraayanan and Aaron Courville and Sarath Chandar},


A Continual Multi-agent RL testbed based on Hanabi






