This repository contains implementation and analysis code for the following paper: Modeling Moral Choices in Social Dilemmas with Multi-Agent Reinforcement Learning, IJCAI'23. https://doi.org/10.24963/ijcai.2023/36
If you use this code, please cite the following paper:
@INPROCEEDINGS{Tennant-ijcai2023p36,
title = {Modeling Moral Choices in Social Dilemmas with Multi-Agent Reinforcement Learning},
author = {Tennant, Elizaveta and Hailes, Stephen and Musolesi, Mirco},
booktitle = {Proceedings of the Thirty-Second International Joint Conference on
Artificial Intelligence, {IJCAI-23}},
publisher = {International Joint Conferences on Artificial Intelligence Organization},
editor = {Edith Elkind},
pages = {317--325},
year = {2023},
month = {8},
note = {Main Track},
doi = {10.24963/ijcai.2023/36},
url = {https://doi.org/10.24963/ijcai.2023/36},
}
You can contact the authors at: l.karmannaya.16@ucl.ac.uk
Install the packages listed in requirements.txt into a Python environment:
pip install -r requirements.txt
This code can be used to run a simulation of social dilemma games between two agents: a learning moral agent M and a learning opponent O. We use a Reinforcement Learning paradigm in which each agent learns according to a reward signal;
the reward is defined by the agent's payoff in the game. In particular, we use three social dilemma games (Iterated Prisoner's Dilemma - IPD, Iterated Volunteer's Dilemma - IVD, Iterated Stag Hunt - ISH), with the following payoffs:
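The payoff tables themselves are not reproduced in this text. As an illustration only, the canonical payoff structures of these games look like the following sketch; the exact values used in the paper and code may differ:

```python
# Illustrative payoff matrices (canonical textbook values, NOT necessarily
# the values used in this repository). Each entry maps
# (action_M, action_O) -> (payoff_M, payoff_O), with C = Cooperate, D = Defect.
IPD = {  # Prisoner's Dilemma: Temptation > Reward > Punishment > Sucker
    ("C", "C"): (3, 3), ("C", "D"): (0, 4),
    ("D", "C"): (4, 0), ("D", "D"): (1, 1),
}
ISH = {  # Stag Hunt: mutual cooperation is best, defecting is the safe option
    ("C", "C"): (4, 4), ("C", "D"): (0, 3),
    ("D", "C"): (3, 0), ("D", "D"): (1, 1),
}
IVD = {  # Volunteer's Dilemma: someone must volunteer (C); no volunteer is worst
    ("C", "C"): (2, 2), ("C", "D"): (2, 4),
    ("D", "C"): (4, 2), ("D", "D"): (0, 0),
}
```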
The experiments systematically compare interactions between pairs of moral learning agents in each of the dilemma games. The moral agents are defined using the following reward functions:
After installing the required packages (see 'Setup'), you can run the experiments for each dilemma game separately. Run the following steps:
- From your Python environment, change to the directory for the specific dilemma game - e.g. IPD
cd IPD
- Run a test - a single run between two baseline players. Provide the required argparse arguments --title1 and --title2, and set --num_runs to 1 to do just a single run. The output will be saved in a directory called 'results' within the IPD parent directory.
python3 main.py --title1 QLS --title2 QLS --num_runs 1
- To run the main experiments, run sets of commands from script_for_bash.sh. For example, to run the IPD experiments from the main paper in parallel:
python3 main.py --title1 QLS --title2 QLS --eps0 1.0 --epsdecay True &
python3 main.py --title1 QLUT --title2 QLS --eps0 1.0 --epsdecay True &
python3 main.py --title1 QLDE --title2 QLS --eps0 1.0 --epsdecay True &
python3 main.py --title1 QLVE_e --title2 QLS --eps0 1.0 --epsdecay True &
python3 main.py --title1 QLVE_k --title2 QLS --eps0 1.0 --epsdecay True &
python3 main.py --title1 QLVM --title2 QLS --eps0 1.0 --epsdecay True &
python3 main.py --title1 QLUT --title2 QLUT --eps0 1.0 --epsdecay True &
python3 main.py --title1 QLDE --title2 QLUT --eps0 1.0 --epsdecay True &
python3 main.py --title1 QLDE --title2 QLDE --eps0 1.0 --epsdecay True &
python3 main.py --title1 QLVE_e --title2 QLUT --eps0 1.0 --epsdecay True &
python3 main.py --title1 QLVE_e --title2 QLDE --eps0 1.0 --epsdecay True &
python3 main.py --title1 QLVE_e --title2 QLVE_e --eps0 1.0 --epsdecay True &
python3 main.py --title1 QLVE_k --title2 QLUT --eps0 1.0 --epsdecay True &
python3 main.py --title1 QLVE_k --title2 QLDE --eps0 1.0 --epsdecay True &
python3 main.py --title1 QLVE_k --title2 QLVE_e --eps0 1.0 --epsdecay True &
python3 main.py --title1 QLVE_k --title2 QLVE_k --eps0 1.0 --epsdecay True &
python3 main.py --title1 QLVM --title2 QLUT --eps0 1.0 --epsdecay True &
python3 main.py --title1 QLVM --title2 QLDE --eps0 1.0 --epsdecay True &
python3 main.py --title1 QLVM --title2 QLVE_e --eps0 1.0 --epsdecay True &
python3 main.py --title1 QLVM --title2 QLVE_k --eps0 1.0 --epsdecay True &
python3 main.py --title1 QLVM --title2 QLVM --eps0 1.0 --epsdecay True
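The 21 commands above cover every unordered pairing (including self-play) of the six agent types. As an illustrative sketch, the same command list can be generated programmatically; script_for_bash.sh remains the authoritative listing:

```python
from itertools import combinations_with_replacement

# The six agent types used in the main IPD experiments.
agents = ["QLS", "QLUT", "QLDE", "QLVE_e", "QLVE_k", "QLVM"]

# One command per unordered pair (self-play included): C(6,2) + 6 = 21 runs.
commands = [
    f"python3 main.py --title1 {b} --title2 {a} --eps0 1.0 --epsdecay True"
    for a, b in combinations_with_replacement(agents, 2)
]

print(len(commands))  # 21
```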
- To run experiments for a different dilemma game:
cd ../IVD
#run main_volunteer.py with each pair of agents as above
cd ../ISH
#run main_staghunt.py with each pair of agents as above
The following hyperparameters are specified manually:
--eps0 1.0 (initial exploration rate)
--epsdecay True (whether linear exploration decay is applied)
The following are set by default within main.py:
master_seed = 1 (initial seed for SeedSequence in the random number generator)
alpha0 = 0.01 & decay = 0.0005 (learning rate and its decay for Q-Learning)
num_iterations = 10000 (number of iterations within a single run)
num_runs = 100 (number of runs with different seeds)
gamma = 0.9 (discount factor for Q-Learning)
mixed_beta = 0.5 (for the Virtue-mixed agent)
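To show how these parameters fit together, here is a generic tabular Q-Learning update with decaying learning and exploration rates. This is an illustrative sketch of the standard algorithm; the exact decay schedules implemented in main.py are assumptions here:

```python
def epsilon(t, eps0=1.0, num_iterations=10000):
    """Linear exploration decay from eps0 to 0 over a run (assumed schedule)."""
    return max(0.0, eps0 * (1.0 - t / num_iterations))

def q_update(Q, s, a, r, s_next, t, alpha0=0.01, decay=0.0005, gamma=0.9,
             actions=("C", "D")):
    """One tabular Q-Learning step; Q maps (state, action) -> value.

    alpha0/decay/gamma match the defaults listed above; the learning-rate
    schedule alpha0 / (1 + decay * t) is an assumption for illustration.
    """
    alpha = alpha0 / (1.0 + decay * t)
    best_next = max(Q.get((s_next, b), 0.0) for b in actions)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (r + gamma * best_next - old)
    return Q
```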
We recommend running specific sections from plotting.py in an IPython environment. Plots will be saved within the specific game's directory, e.g. 'IPD/results/QLS_QLS/plots'.