# CPR appropriation with policy gradient

This notebook trains Harvest agents with each of the implemented policy gradient methods. The environment is a custom implementation of Harvest.

## Pre-requisites

The cells below install and import the libraries needed to run the notebook examples.

In [1]:
import sys
sys.path.append('../')

In [41]:
%%capture
!pip install -r ../init/requirements.txt
!pip install ../src/gym_cpr_grid

In [48]:
import numpy as np
import gym

from src import memory, models, policies

%reload_ext autoreload
%autoreload 2


## Utilities

The cells below define the environment and the common variables used throughout the notebook.

In [49]:
env = gym.make(
    'gym_cpr_grid:CPRGridEnv-v0', 
    n_agents=11, 
    grid_width=39, 
    grid_height=19,
    tagging_ability=True,
    gifting_mechanism=None
)

In [53]:
observation_space_size = env.observation_space_size()
action_space_size = env.action_space_size()
epochs = 4000
steps_per_epoch = 4000
save_every = 100
hidden_sizes = [32, 32]
checkpoints_path = "../checkpoints"
render_every = 100
wandb_config = {
    "api_key": open("../wandb_api_key_file", "r").read().strip(),
    "project": "cpr-appropriation",
    "entity": "wadaboa",
}
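Each training run in this notebook passes the shared `wandb_config` plus a run-specific `group`. A minimal sketch of that dictionary-unpacking pattern follows; the `base_config` values and the `with_group` helper are placeholders introduced for illustration and are not part of the notebook.

```python
# Placeholder values: the real notebook reads the API key from a file.
base_config = {
    "api_key": "<redacted>",
    "project": "cpr-appropriation",
    "entity": "wadaboa",
}


def with_group(config, group):
    """Return a new dict that adds a run group, leaving `config` untouched."""
    return {**config, "group": group}


vpg_config = with_group(base_config, "VPG")
print(vpg_config["group"])     # VPG
print("group" in base_config)  # False: the shared config is not mutated
```

Building a fresh dictionary per run keeps the shared configuration reusable for the other policy gradient methods.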

## VPG

This section trains a set of Harvest agents with our custom Vanilla Policy Gradient (VPG) implementation.
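For orientation, the core quantities behind VPG with a learned baseline can be sketched in plain Python. This is a generic textbook-style sketch, not the code in `src.policies`: the function names, the discount factor `gamma`, and the toy inputs are all assumptions made for illustration.

```python
def rewards_to_go(rewards, gamma=0.99):
    """Discounted sum of future rewards at every timestep of one trajectory."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def policy_loss(log_probs, returns, baseline_values):
    """Surrogate loss: minus the mean of log-probability times advantage.

    The baseline values are treated as constants here; in an autodiff
    framework they would be detached from the policy's gradient.
    """
    advantages = [g - b for g, b in zip(returns, baseline_values)]
    weighted = [lp * adv for lp, adv in zip(log_probs, advantages)]
    return -sum(weighted) / len(weighted)


def baseline_loss(baseline_values, returns):
    """Mean squared error between predicted values and observed returns."""
    errors = [(b - g) ** 2 for b, g in zip(baseline_values, returns)]
    return sum(errors) / len(errors)


# Toy trajectory with made-up rewards.
returns = rewards_to_go([1.0, 0.0, 1.0], gamma=0.5)
print(returns)  # [1.25, 0.5, 1.0]
```

Subtracting a baseline leaves the expected gradient unchanged while typically reducing its variance, which is the usual motivation for training a second network alongside the policy.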

In [46]:
vpg_policy_nn = models.MLP(observation_space_size, hidden_sizes, action_space_size)
vpg_baseline_nn = models.MLP(observation_space_size, hidden_sizes, 1, log_softmax=False)
vpg_policy = policies.VPGPolicy(env, vpg_policy_nn, baseline_nn=vpg_baseline_nn)
vpg_policy.train(
    epochs,
    steps_per_epoch,
    enable_wandb=True,
    wandb_config={**wandb_config, "group": "VPG"},
    render_every=render_every
)

wandb: Currently logged in as: wadaboa (use `wandb login --relogin` to force relogin)


2021-08-25 13:05:12.502 | INFO     | src.policies:train:103 - Epoch 1 / 4000
2021-08-25 13:05:12.503 | INFO     | src.policies:train:110 - Episode 1
2021-08-25 13:05:30.874 | DEBUG    | src.policies:execute_episode:270 - Early stopping, all agents done
2021-08-25 13:05:30.876 | INFO     | src.policies:train:117 - Episode infos: {'efficiency': 210.0909090909091, 'equality': 0.9288776995411495, 'sustainability': 474.1942691329546, 'peace': 756.9090909090909}
2021-08-25 13:05:30.877 | INFO     | src.policies:train:122 - Mean episode return: 210.0909090909091
2021-08-25 13:05:30.877 | INFO     | src.policies:train:123 - Last 100 episodes mean return: 210.0909090909091
2021-08-25 13:05:38.303 | INFO     | src.policies:train:159 - Total loss: 0.9996551871299744
2021-08-25 13:05:38.304 | INFO     | src.policies:train:164 - Epoch infos: {'efficiency': 210.0909090909091, 'equality': 0.9288776995411495, 'sustainability': 474.1942691329546, 'peace': 756.9090909090909}

2021-08-25 13:08:40.733 | INFO     | src.policies:train:159 - Total loss: 0.9971657991409302
2021-08-25 13:08:40.734 | INFO     | src.policies:train:164 - Epoch infos: {'efficiency': 192.0909090909091, 'equality': 0.8630985673135332, 'sustainability': 481.78660196007417, 'peace': 653.1818181818181}
2021-08-25 13:08:40.781 | INFO     | src.policies:train:103 - Epoch 9 / 4000
2021-08-25 13:08:40.782 | INFO     | src.policies:train:110 - Episode 9
2021-08-25 13:08:59.288 | DEBUG    | src.policies:execute_episode:270 - Early stopping, all agents done
2021-08-25 13:08:59.309 | INFO     | src.policies:train:117 - Episode infos: {'efficiency': 215.0, 'equality': 0.9331923890076266, 'sustainability': 505.6465004460492, 'peace': 812.8181818181819}
2021-08-25 13:08:59.310 | INFO     | src.policies:train:122 - Mean episode return: 215.0
2021-08-25 13:08:59.310 | INFO     | src.policies:train:123 - Last 100 episodes mean return: 207.2828282828283

2021-08-25 13:12:05.012 | INFO     | src.policies:train:122 - Mean episode return: 194.27272727272728
2021-08-25 13:12:05.012 | INFO     | src.policies:train:123 - Last 100 episodes mean return: 208.84090909090907
2021-08-25 13:12:13.399 | INFO     | src.policies:train:159 - Total loss: 0.9996477365493774
2021-08-25 13:12:13.400 | INFO     | src.policies:train:164 - Epoch infos: {'efficiency': 194.27272727272728, 'equality': 0.8995192921278871, 'sustainability': 478.79792458808674, 'peace': 725.3636363636364}
2021-08-25 13:12:13.447 | INFO     | src.policies:train:103 - Epoch 17 / 4000
2021-08-25 13:12:13.448 | INFO     | src.policies:train:110 - Episode 17
2021-08-25 13:12:32.802 | DEBUG    | src.policies:execute_episode:270 - Early stopping, all agents done
2021-08-25 13:12:32.824 | INFO     | src.policies:train:117 - Episode infos: {'efficiency': 207.9090909090909, 'equality': 0.9270978256563362, 'sustainability': 487.7497772200393, 'peace': 669.0}

2021-08-25 13:15:30.666 | INFO     | src.policies:train:103 - Epoch 24 / 4000
2021-08-25 13:15:30.667 | INFO     | src.policies:train:110 - Episode 24
2021-08-25 13:15:51.046 | DEBUG    | src.policies:execute_episode:270 - Early stopping, all agents done
2021-08-25 13:15:51.068 | INFO     | src.policies:train:117 - Episode infos: {'efficiency': 197.36363636363637, 'equality': 0.9317449018062112, 'sustainability': 493.5729859192556, 'peace': 706.1818181818181}
2021-08-25 13:15:51.069 | INFO     | src.policies:train:122 - Mean episode return: 197.36363636363637
2021-08-25 13:15:51.069 | INFO     | src.policies:train:123 - Last 100 episodes mean return: 209.82196969696966
2021-08-25 13:15:59.356 | INFO     | src.policies:train:159 - Total loss: 0.9995957016944885
2021-08-25 13:15:59.356 | INFO     | src.policies:train:164 - Epoch infos: {'efficiency': 197.36363636363637, 'equality': 0.9317449018062112, 'sustainability': 493.5729859192556, 'peace': 706.1818181818181}

2021-08-25 13:19:23.961 | INFO     | src.policies:train:159 - Total loss: 1.0068039894104004
2021-08-25 13:19:23.961 | INFO     | src.policies:train:164 - Epoch infos: {'efficiency': 201.27272727272728, 'equality': 0.9343844953614522, 'sustainability': 481.00459614865974, 'peace': 689.1818181818181}
2021-08-25 13:19:24.008 | INFO     | src.policies:train:103 - Epoch 32 / 4000
2021-08-25 13:19:24.008 | INFO     | src.policies:train:110 - Episode 32
2021-08-25 13:19:45.644 | DEBUG    | src.policies:execute_episode:270 - Early stopping, all agents done
2021-08-25 13:19:45.667 | INFO     | src.policies:train:117 - Episode infos: {'efficiency': 213.0, 'equality': 0.9480076048588055, 'sustainability': 497.9819523633874, 'peace': 797.7272727272727}
2021-08-25 13:19:45.668 | INFO     | src.policies:train:122 - Mean episode return: 213.0
2021-08-25 13:19:45.668 | INFO     | src.policies:train:123 - Last 100 episodes mean return: 209.78977272727272

2021-08-25 13:23:22.596 | INFO     | src.policies:train:117 - Episode infos: {'efficiency': 212.27272727272728, 'equality': 0.9247031341264415, 'sustainability': 497.69277664037327, 'peace': 649.4545454545455}
2021-08-25 13:23:22.597 | INFO     | src.policies:train:122 - Mean episode return: 212.27272727272728
2021-08-25 13:23:22.597 | INFO     | src.policies:train:123 - Last 100 episodes mean return: 209.96270396270396
2021-08-25 13:23:31.632 | INFO     | src.policies:train:159 - Total loss: 1.001400351524353
2021-08-25 13:23:31.633 | INFO     | src.policies:train:164 - Epoch infos: {'efficiency': 212.27272727272728, 'equality': 0.9247031341264415, 'sustainability': 497.69277664037327, 'peace': 649.4545454545455}
2021-08-25 13:23:31.680 | INFO     | src.policies:train:103 - Epoch 40 / 4000
2021-08-25 13:23:31.681 | INFO     | src.policies:train:110 - Episode 40
2021-08-25 13:23:54.392 | DEBUG    | src.policies:execute_episode:270 - Early stopping, all agents done

2021-08-25 13:27:13.340 | INFO     | src.policies:train:164 - Epoch infos: {'efficiency': 212.27272727272728, 'equality': 0.937161767569843, 'sustainability': 497.7415083601114, 'peace': 640.7272727272727}
2021-08-25 13:27:13.391 | INFO     | src.policies:train:103 - Epoch 47 / 4000
2021-08-25 13:27:13.391 | INFO     | src.policies:train:110 - Episode 47
2021-08-25 13:27:35.457 | DEBUG    | src.policies:execute_episode:270 - Early stopping, all agents done
2021-08-25 13:27:35.484 | INFO     | src.policies:train:117 - Episode infos: {'efficiency': 219.9090909090909, 'equality': 0.9539253635996482, 'sustainability': 487.9896388131372, 'peace': 768.8181818181819}
2021-08-25 13:27:35.485 | INFO     | src.policies:train:122 - Mean episode return: 219.9090909090909
2021-08-25 13:27:35.485 | INFO     | src.policies:train:123 - Last 100 episodes mean return: 210.20309477756285
2021-08-25 13:27:44.547 | INFO     | src.policies:train:159 - Total loss: 1.0042535066604614

2021-08-25 13:31:21.857 | INFO     | src.policies:train:122 - Mean episode return: 217.8181818181818
2021-08-25 13:31:21.857 | INFO     | src.policies:train:123 - Last 100 episodes mean return: 210.35185185185185
2021-08-25 13:31:31.444 | INFO     | src.policies:train:159 - Total loss: 1.000842809677124
2021-08-25 13:31:31.445 | INFO     | src.policies:train:164 - Epoch infos: {'efficiency': 217.8181818181818, 'equality': 0.9244194870252614, 'sustainability': 489.18477372369665, 'peace': 717.8181818181819}
2021-08-25 13:31:31.495 | INFO     | src.policies:train:103 - Epoch 55 / 4000
2021-08-25 13:31:31.495 | INFO     | src.policies:train:110 - Episode 55
2021-08-25 13:31:54.230 | DEBUG    | src.policies:execute_episode:270 - Early stopping, all agents done
2021-08-25 13:31:54.255 | INFO     | src.policies:train:117 - Episode infos: {'efficiency': 196.36363636363637, 'equality': 0.9030303030323437, 'sustainability': 479.37800590856585, 'peace': 659.5454545454545}

2021-08-25 13:35:33.107 | DEBUG    | src.policies:execute_episode:270 - Early stopping, all agents done
2021-08-25 13:35:33.132 | INFO     | src.policies:train:117 - Episode infos: {'efficiency': 202.8181818181818, 'equality': 0.8948698097083478, 'sustainability': 483.87186448110066, 'peace': 703.5454545454545}
2021-08-25 13:35:33.132 | INFO     | src.policies:train:122 - Mean episode return: 202.8181818181818
2021-08-25 13:35:33.133 | INFO     | src.policies:train:123 - Last 100 episodes mean return: 209.32697947214078
2021-08-25 13:35:42.191 | INFO     | src.policies:train:159 - Total loss: 1.004791021347046
2021-08-25 13:35:42.192 | INFO     | src.policies:train:164 - Epoch infos: {'efficiency': 202.8181818181818, 'equality': 0.8948698097083478, 'sustainability': 483.87186448110066, 'peace': 703.5454545454545}
2021-08-25 13:35:42.240 | INFO     | src.policies:train:103 - Epoch 63 / 4000
2021-08-25 13:35:42.241 | INFO     | src.policies:train:110 - Episode 63

2021-08-25 13:39:23.815 | INFO     | src.policies:train:159 - Total loss: 1.0009381771087646
2021-08-25 13:39:23.815 | INFO     | src.policies:train:164 - Epoch infos: {'efficiency': 199.8181818181818, 'equality': 0.9262966333045269, 'sustainability': 466.8555455640329, 'peace': 701.5454545454545}
2021-08-25 13:39:23.865 | INFO     | src.policies:train:103 - Epoch 70 / 4000
2021-08-25 13:39:23.866 | INFO     | src.policies:train:110 - Episode 70
2021-08-25 13:39:46.689 | DEBUG    | src.policies:execute_episode:270 - Early stopping, all agents done
2021-08-25 13:39:46.717 | INFO     | src.policies:train:117 - Episode infos: {'efficiency': 213.45454545454547, 'equality': 0.9468019203974988, 'sustainability': 494.0652500967135, 'peace': 763.9090909090909}
2021-08-25 13:39:46.718 | INFO     | src.policies:train:122 - Mean episode return: 213.45454545454547
2021-08-25 13:39:46.718 | INFO     | src.policies:train:123 - Last 100 episodes mean return: 208.5194805194805

2021-08-25 13:43:28.370 | INFO     | src.policies:train:122 - Mean episode return: 198.27272727272728
2021-08-25 13:43:28.370 | INFO     | src.policies:train:123 - Last 100 episodes mean return: 208.03896103896105
2021-08-25 13:43:37.445 | INFO     | src.policies:train:159 - Total loss: 0.9987347722053528
2021-08-25 13:43:37.446 | INFO     | src.policies:train:164 - Epoch infos: {'efficiency': 198.27272727272728, 'equality': 0.9363928139732318, 'sustainability': 470.1142073718709, 'peace': 700.6363636363636}
2021-08-25 13:43:37.495 | INFO     | src.policies:train:103 - Epoch 78 / 4000
2021-08-25 13:43:37.496 | INFO     | src.policies:train:110 - Episode 78
2021-08-25 13:43:59.854 | DEBUG    | src.policies:execute_episode:270 - Early stopping, all agents done
2021-08-25 13:43:59.877 | INFO     | src.policies:train:117 - Episode infos: {'efficiency': 201.45454545454547, 'equality': 0.9405973088295743, 'sustainability': 476.0885174774863, 'peace': 720.2727272727273}

2021-08-25 13:47:15.990 | INFO     | src.policies:train:110 - Episode 85
2021-08-25 13:47:37.701 | DEBUG    | src.policies:execute_episode:270 - Early stopping, all agents done
2021-08-25 13:47:37.725 | INFO     | src.policies:train:117 - Episode infos: {'efficiency': 194.0909090909091, 'equality': 0.9129657228036414, 'sustainability': 502.9879926180079, 'peace': 766.0}
2021-08-25 13:47:37.725 | INFO     | src.policies:train:122 - Mean episode return: 194.0909090909091
2021-08-25 13:47:37.726 | INFO     | src.policies:train:123 - Last 100 episodes mean return: 207.97112299465238
2021-08-25 13:47:46.830 | INFO     | src.policies:train:159 - Total loss: 1.0055216550827026
2021-08-25 13:47:46.831 | INFO     | src.policies:train:164 - Epoch infos: {'efficiency': 194.0909090909091, 'equality': 0.9129657228036414, 'sustainability': 502.9879926180079, 'peace': 766.0}
2021-08-25 13:47:46.882 | INFO     | src.policies:train:103 - Epoch 86 / 4000

2021-08-25 13:51:32.719 | INFO     | src.policies:train:164 - Epoch infos: {'efficiency': 218.63636363636363, 'equality': 0.9092799092816239, 'sustainability': 510.7701598743622, 'peace': 743.8181818181819}
2021-08-25 13:51:32.775 | INFO     | src.policies:train:103 - Epoch 93 / 4000
2021-08-25 13:51:32.776 | INFO     | src.policies:train:110 - Episode 93
2021-08-25 13:51:58.501 | DEBUG    | src.policies:execute_episode:270 - Early stopping, all agents done
2021-08-25 13:51:58.528 | INFO     | src.policies:train:117 - Episode infos: {'efficiency': 200.36363636363637, 'equality': 0.9674146180504988, 'sustainability': 470.52826071218317, 'peace': 734.0909090909091}
2021-08-25 13:51:58.529 | INFO     | src.policies:train:122 - Mean episode return: 200.36363636363637
2021-08-25 13:51:58.530 | INFO     | src.policies:train:123 - Last 100 episodes mean return: 207.80938416422288
2021-08-25 13:52:09.545 | INFO     | src.policies:train:159 - Total loss: 1.0045057535171509

2021-08-25 13:55:44.680 | INFO     | src.policies:train:122 - Mean episode return: 184.63636363636363
2021-08-25 13:55:44.681 | INFO     | src.policies:train:123 - Last 100 episodes mean return: 207.06909090909093
2021-08-25 13:55:54.212 | INFO     | src.policies:train:159 - Total loss: 1.0026956796646118
2021-08-25 13:55:54.213 | INFO     | src.policies:train:164 - Epoch infos: {'efficiency': 184.63636363636363, 'equality': 0.9287408799980139, 'sustainability': 443.5497396882569, 'peace': 595.9090909090909}
2021-08-25 13:55:54.264 | INFO     | src.policies:train:103 - Epoch 101 / 4000
2021-08-25 13:55:54.265 | INFO     | src.policies:train:110 - Episode 101
2021-08-25 13:56:19.664 | DEBUG    | src.policies:execute_episode:270 - Early stopping, all agents done
2021-08-25 13:56:19.690 | INFO     | src.policies:train:117 - Episode infos: {'efficiency': 213.54545454545453, 'equality': 0.9232942451348098, 'sustainability': 503.95696799951696, 'peace': 785.1818181818181}

2021-08-25 13:59:54.972 | INFO     | src.policies:train:110 - Episode 108
2021-08-25 14:00:18.262 | DEBUG    | src.policies:execute_episode:270 - Early stopping, all agents done
2021-08-25 14:00:18.286 | INFO     | src.policies:train:117 - Episode infos: {'efficiency': 199.54545454545453, 'equality': 0.9006833713004622, 'sustainability': 465.8076161078238, 'peace': 721.6363636363636}
2021-08-25 14:00:18.286 | INFO     | src.policies:train:122 - Mean episode return: 199.54545454545453
2021-08-25 14:00:18.287 | INFO     | src.policies:train:123 - Last 100 episodes mean return: 206.89818181818183
2021-08-25 14:00:27.957 | INFO     | src.policies:train:159 - Total loss: 0.9994522929191589
2021-08-25 14:00:27.957 | INFO     | src.policies:train:164 - Epoch infos: {'efficiency': 199.54545454545453, 'equality': 0.9006833713004622, 'sustainability': 465.8076161078238, 'peace': 721.6363636363636}
2021-08-25 14:00:28.012 | INFO     | src.policies:train:103 - Epoch 109 / 4000

2021-08-25 14:04:40.656 | INFO     | src.policies:train:159 - Total loss: 0.9985252618789673
2021-08-25 14:04:40.657 | INFO     | src.policies:train:164 - Epoch infos: {'efficiency': 208.9090909090909, 'equality': 0.9355170504008324, 'sustainability': 509.1051252571682, 'peace': 742.2727272727273}
2021-08-25 14:04:40.711 | INFO     | src.policies:train:103 - Epoch 116 / 4000
2021-08-25 14:04:40.712 | INFO     | src.policies:train:110 - Episode 116
2021-08-25 14:05:07.004 | DEBUG    | src.policies:execute_episode:270 - Early stopping, all agents done
2021-08-25 14:05:07.032 | INFO     | src.policies:train:117 - Episode infos: {'efficiency': 199.27272727272728, 'equality': 0.9255142667566872, 'sustainability': 490.7914187613911, 'peace': 764.9090909090909}
2021-08-25 14:05:07.032 | INFO     | src.policies:train:122 - Mean episode return: 199.27272727272728
2021-08-25 14:05:07.033 | INFO     | src.policies:train:123 - Last 100 episodes mean return: 206.5127272727273

2021-08-25 14:09:21.001 | INFO     | src.policies:train:122 - Mean episode return: 207.9090909090909
2021-08-25 14:09:21.002 | INFO     | src.policies:train:123 - Last 100 episodes mean return: 205.7563636363637
2021-08-25 14:09:30.323 | INFO     | src.policies:train:159 - Total loss: 0.9974158406257629
2021-08-25 14:09:30.324 | INFO     | src.policies:train:164 - Epoch infos: {'efficiency': 207.9090909090909, 'equality': 0.9100846682849687, 'sustainability': 477.1056953834955, 'peace': 675.6363636363636}
2021-08-25 14:09:30.374 | INFO     | src.policies:train:103 - Epoch 124 / 4000
2021-08-25 14:09:30.374 | INFO     | src.policies:train:110 - Episode 124
2021-08-25 14:09:53.591 | DEBUG    | src.policies:execute_episode:270 - Early stopping, all agents done
2021-08-25 14:09:53.616 | INFO     | src.policies:train:117 - Episode infos: {'efficiency': 211.27272727272728, 'equality': 0.9343608199042724, 'sustainability': 487.49962352414934, 'peace': 774.2727272727273}

2021-08-25 14:13:38.547 | INFO     | src.policies:train:103 - Epoch 131 / 4000
2021-08-25 14:13:38.548 | INFO     | src.policies:train:110 - Episode 131
2021-08-25 14:14:02.012 | DEBUG    | src.policies:execute_episode:270 - Early stopping, all agents done
2021-08-25 14:14:02.038 | INFO     | src.policies:train:117 - Episode infos: {'efficiency': 221.36363636363637, 'equality': 0.9255553481440069, 'sustainability': 471.8954606985578, 'peace': 731.4545454545455}
2021-08-25 14:14:02.038 | INFO     | src.policies:train:122 - Mean episode return: 221.36363636363637
2021-08-25 14:14:02.039 | INFO     | src.policies:train:123 - Last 100 episodes mean return: 206.03363636363636
2021-08-25 14:14:12.011 | INFO     | src.policies:train:159 - Total loss: 1.002609133720398
2021-08-25 14:14:12.012 | INFO     | src.policies:train:164 - Epoch infos: {'efficiency': 221.36363636363637, 'equality': 0.9255553481440069, 'sustainability': 471.8954606985578, 'peace': 731.4545454545455}

2021-08-25 14:18:30.135 | INFO     | src.policies:train:159 - Total loss: 1.0050582885742188
2021-08-25 14:18:30.136 | INFO     | src.policies:train:164 - Epoch infos: {'efficiency': 217.0909090909091, 'equality': 0.9615501751187462, 'sustainability': 493.8885265814053, 'peace': 794.4545454545455}
2021-08-25 14:18:30.190 | INFO     | src.policies:train:103 - Epoch 139 / 4000
2021-08-25 14:18:30.191 | INFO     | src.policies:train:110 - Episode 139
2021-08-25 14:18:55.485 | DEBUG    | src.policies:execute_episode:270 - Early stopping, all agents done
2021-08-25 14:18:55.512 | INFO     | src.policies:train:117 - Episode infos: {'efficiency': 207.36363636363637, 'equality': 0.9320074927278306, 'sustainability': 471.05786822758404, 'peace': 729.4545454545455}
2021-08-25 14:18:55.513 | INFO     | src.policies:train:122 - Mean episode return: 207.36363636363637
2021-08-25 14:18:55.514 | INFO     | src.policies:train:123 - Last 100 episodes mean return: 205.16090909090912

2021-08-25 14:22:57.836 | INFO     | src.policies:train:117 - Episode infos: {'efficiency': 199.36363636363637, 'equality': 0.916594121794209, 'sustainability': 476.41264874360405, 'peace': 659.4545454545455}
2021-08-25 14:22:57.837 | INFO     | src.policies:train:122 - Mean episode return: 199.36363636363637
2021-08-25 14:22:57.838 | INFO     | src.policies:train:123 - Last 100 episodes mean return: 205.1127272727273
2021-08-25 14:23:08.737 | INFO     | src.policies:train:159 - Total loss: 1.003564476966858
2021-08-25 14:23:08.738 | INFO     | src.policies:train:164 - Epoch infos: {'efficiency': 199.36363636363637, 'equality': 0.916594121794209, 'sustainability': 476.41264874360405, 'peace': 659.4545454545455}
2021-08-25 14:23:08.801 | INFO     | src.policies:train:103 - Epoch 147 / 4000
2021-08-25 14:23:08.802 | INFO     | src.policies:train:110 - Episode 147
2021-08-25 14:23:34.859 | DEBUG    | src.policies:execute_episode:270 - Early stopping, all agents done

2021-08-25 14:27:19.743 | INFO     | src.policies:train:103 - Epoch 154 / 4000
2021-08-25 14:27:19.743 | INFO     | src.policies:train:110 - Episode 154
2021-08-25 14:27:45.590 | DEBUG    | src.policies:execute_episode:270 - Early stopping, all agents done
2021-08-25 14:27:45.617 | INFO     | src.policies:train:117 - Episode infos: {'efficiency': 206.72727272727272, 'equality': 0.927240745184152, 'sustainability': 498.0924375206096, 'peace': 742.4545454545455}
2021-08-25 14:27:45.618 | INFO     | src.policies:train:122 - Mean episode return: 206.72727272727272
2021-08-25 14:27:45.618 | INFO     | src.policies:train:123 - Last 100 episodes mean return: 204.85636363636365
2021-08-25 14:27:55.839 | INFO     | src.policies:train:159 - Total loss: 0.9998180270195007
2021-08-25 14:27:55.840 | INFO     | src.policies:train:164 - Epoch infos: {'efficiency': 206.72727272727272, 'equality': 0.927240745184152, 'sustainability': 498.0924375206096, 'peace': 742.4545454545455}

2021-08-25 14:32:14.494 | INFO     | src.policies:train:159 - Total loss: 1.003471851348877
2021-08-25 14:32:14.495 | INFO     | src.policies:train:164 - Epoch infos: {'efficiency': 198.1818181818182, 'equality': 0.9045037531295975, 'sustainability': 481.2017994650568, 'peace': 692.8181818181819}
2021-08-25 14:32:14.548 | INFO     | src.policies:train:103 - Epoch 162 / 4000
2021-08-25 14:32:14.548 | INFO     | src.policies:train:110 - Episode 162
2021-08-25 14:32:38.652 | DEBUG    | src.policies:execute_episode:270 - Early stopping, all agents done
2021-08-25 14:32:38.676 | INFO     | src.policies:train:117 - Episode infos: {'efficiency': 222.0909090909091, 'equality': 0.931232091691824, 'sustainability': 505.56914786191487, 'peace': 809.2727272727273}
2021-08-25 14:32:38.677 | INFO     | src.policies:train:122 - Mean episode return: 222.0909090909091
2021-08-25 14:32:38.677 | INFO     | src.policies:train:123 - Last 100 episodes mean return: 205.34636363636366

2021-08-25 14:36:48.734 | INFO     | src.policies:train:117 - Episode infos: {'efficiency': 214.54545454545453, 'equality': 0.946456086287626, 'sustainability': 489.45420894759667, 'peace': 804.8181818181819}
2021-08-25 14:36:48.735 | INFO     | src.policies:train:122 - Mean episode return: 214.54545454545453
2021-08-25 14:36:48.735 | INFO     | src.policies:train:123 - Last 100 episodes mean return: 205.70909090909092
2021-08-25 14:36:58.986 | INFO     | src.policies:train:159 - Total loss: 0.9986152052879333
2021-08-25 14:36:58.987 | INFO     | src.policies:train:164 - Epoch infos: {'efficiency': 214.54545454545453, 'equality': 0.946456086287626, 'sustainability': 489.45420894759667, 'peace': 804.8181818181819}
2021-08-25 14:36:59.044 | INFO     | src.policies:train:103 - Epoch 170 / 4000
2021-08-25 14:36:59.045 | INFO     | src.policies:train:110 - Episode 170
2021-08-25 14:37:25.906 | DEBUG    | src.policies:execute_episode:270 - Early stopping, all agents done

2021-08-25 14:41:19.064 | INFO     | src.policies:train:103 - Epoch 177 / 4000
2021-08-25 14:41:19.065 | INFO     | src.policies:train:110 - Episode 177
2021-08-25 14:41:46.706 | DEBUG    | src.policies:execute_episode:270 - Early stopping, all agents done
2021-08-25 14:41:46.735 | INFO     | src.policies:train:117 - Episode infos: {'efficiency': 208.1818181818182, 'equality': 0.947121873760478, 'sustainability': 491.33049439048494, 'peace': 745.3636363636364}
2021-08-25 14:41:46.735 | INFO     | src.policies:train:122 - Mean episode return: 208.1818181818182
2021-08-25 14:41:46.736 | INFO     | src.policies:train:123 - Last 100 episodes mean return: 205.57818181818186
2021-08-25 14:41:57.745 | INFO     | src.policies:train:159 - Total loss: 0.9979227781295776
2021-08-25 14:41:57.746 | INFO     | src.policies:train:164 - Epoch infos: {'efficiency': 208.1818181818182, 'equality': 0.947121873760478, 'sustainability': 491.33049439048494, 'peace': 745.3636363636364}

2021-08-25 14:46:17.212 | INFO     | src.policies:train:159 - Total loss: 1.0010466575622559
2021-08-25 14:46:17.213 | INFO     | src.policies:train:164 - Epoch infos: {'efficiency': 205.1818181818182, 'equality': 0.9266121561218308, 'sustainability': 509.5150748403839, 'peace': 718.1818181818181}
2021-08-25 14:46:17.271 | INFO     | src.policies:train:103 - Epoch 185 / 4000
2021-08-25 14:46:17.271 | INFO     | src.policies:train:110 - Episode 185
2021-08-25 14:46:44.930 | DEBUG    | src.policies:execute_episode:270 - Early stopping, all agents done
2021-08-25 14:46:44.958 | INFO     | src.policies:train:117 - Episode infos: {'efficiency': 207.63636363636363, 'equality': 0.9359974526362045, 'sustainability': 495.58513055400476, 'peace': 735.8181818181819}
2021-08-25 14:46:44.958 | INFO     | src.policies:train:122 - Mean episode return: 207.63636363636363
2021-08-25 14:46:44.959 | INFO     | src.policies:train:123 - Last 100 episodes mean return: 205.5009090909091

2021-08-25 14:50:54.251 | INFO     | src.policies:train:117 - Episode infos: {'efficiency': 224.63636363636363, 'equality': 0.9416504175721708, 'sustainability': 487.15878192360566, 'peace': 778.5454545454545}
2021-08-25 14:50:54.252 | INFO     | src.policies:train:122 - Mean episode return: 224.63636363636363
2021-08-25 14:50:54.252 | INFO     | src.policies:train:123 - Last 100 episodes mean return: 205.71181818181816
2021-08-25 14:51:04.164 | INFO     | src.policies:train:159 - Total loss: 1.0041038990020752
2021-08-25 14:51:04.165 | INFO     | src.policies:train:164 - Epoch infos: {'efficiency': 224.63636363636363, 'equality': 0.9416504175721708, 'sustainability': 487.15878192360566, 'peace': 778.5454545454545}
2021-08-25 14:51:04.217 | INFO     | src.policies:train:103 - Epoch 193 / 4000
2021-08-25 14:51:04.218 | INFO     | src.policies:train:110 - Episode 193
2021-08-25 14:51:28.869 | DEBUG    | src.policies:execute_episode:270 - Early stopping, all agents done

2021-08-25 14:55:19.186 | INFO     | src.policies:train:164 - Epoch infos: {'efficiency': 221.63636363636363, 'equality': 0.9295249459331507, 'sustainability': 474.3807650481449, 'peace': 727.9090909090909}
2021-08-25 14:55:19.238 | INFO     | src.policies:train:103 - Epoch 200 / 4000
2021-08-25 14:55:19.239 | INFO     | src.policies:train:110 - Episode 200
2021-08-25 14:55:42.727 | DEBUG    | src.policies:execute_episode:270 - Early stopping, all agents done
2021-08-25 14:55:42.754 | INFO     | src.policies:train:117 - Episode infos: {'efficiency': 202.0909090909091, 'equality': 0.9567333251552624, 'sustainability': 485.59020120034506, 'peace': 722.9090909090909}
2021-08-25 14:55:42.755 | INFO     | src.policies:train:122 - Mean episode return: 202.0909090909091
2021-08-25 14:55:42.755 | INFO     | src.policies:train:123 - Last 100 episodes mean return: 206.77999999999997
2021-08-25 14:55:52.843 | INFO     | src.policies:train:159 - Total loss: 1.0034147500991821

2021-08-25 14:59:56.784 | INFO     | src.policies:train:122 - Mean episode return: 221.27272727272728
2021-08-25 14:59:56.785 | INFO     | src.policies:train:123 - Last 100 episodes mean return: 207.19818181818184
2021-08-25 15:00:08.584 | INFO     | src.policies:train:159 - Total loss: 0.9984179139137268
2021-08-25 15:00:08.585 | INFO     | src.policies:train:164 - Epoch infos: {'efficiency': 221.27272727272728, 'equality': 0.9313513109746143, 'sustainability': 500.098585497859, 'peace': 746.1818181818181}
2021-08-25 15:00:08.647 | INFO     | src.policies:train:103 - Epoch 208 / 4000
2021-08-25 15:00:08.648 | INFO     | src.policies:train:110 - Episode 208
2021-08-25 15:00:33.604 | DEBUG    | src.policies:execute_episode:270 - Early stopping, all agents done
2021-08-25 15:00:33.629 | INFO     | src.policies:train:117 - Episode infos: {'efficiency': 202.72727272727272, 'equality': 0.891642886263929, 'sustainability': 480.5284666882954, 'peace': 679.8181818181819}

2021-08-25 15:04:08.211 | INFO     | src.policies:train:103 - Epoch 215 / 4000
2021-08-25 15:04:08.211 | INFO     | src.policies:train:110 - Episode 215
2021-08-25 15:04:33.287 | DEBUG    | src.policies:execute_episode:270 - Early stopping, all agents done
2021-08-25 15:04:33.311 | INFO     | src.policies:train:117 - Episode infos: {'efficiency': 209.1818181818182, 'equality': 0.9356011220430722, 'sustainability': 486.6440744913134, 'peace': 747.7272727272727}
2021-08-25 15:04:33.312 | INFO     | src.policies:train:122 - Mean episode return: 209.1818181818182
2021-08-25 15:04:33.312 | INFO     | src.policies:train:123 - Last 100 episodes mean return: 206.7018181818182
2021-08-25 15:04:42.917 | INFO     | src.policies:train:159 - Total loss: 1.0021817684173584
2021-08-25 15:04:42.918 | INFO     | src.policies:train:164 - Epoch infos: {'efficiency': 209.1818181818182, 'equality': 0.9356011220430722, 'sustainability': 486.6440744913134, 'peace': 747.7272727272727}

2021-08-25 15:08:43.996 | INFO     | src.policies:train:159 - Total loss: 1.0047566890716553
2021-08-25 15:08:43.997 | INFO     | src.policies:train:164 - Epoch infos: {'efficiency': 226.27272727272728, 'equality': 0.9363015449808922, 'sustainability': 480.4049378109557, 'peace': 755.8181818181819}
2021-08-25 15:08:44.053 | INFO     | src.policies:train:103 - Epoch 223 / 4000
2021-08-25 15:08:44.054 | INFO     | src.policies:train:110 - Episode 223
2021-08-25 15:09:09.678 | DEBUG    | src.policies:execute_episode:270 - Early stopping, all agents done
2021-08-25 15:09:09.704 | INFO     | src.policies:train:117 - Episode infos: {'efficiency': 197.8181818181818, 'equality': 0.9258021390389831, 'sustainability': 476.0016012852038, 'peace': 736.0}
2021-08-25 15:09:09.705 | INFO     | src.policies:train:122 - Mean episode return: 197.8181818181818
2021-08-25 15:09:09.706 | INFO     | src.policies:train:123 - Last 100 episodes mean return: 207.3572727272727

2021-08-25 15:13:26.910 | INFO     | src.policies:train:117 - Episode infos: {'efficiency': 186.0, 'equality': 0.8958499955590542, 'sustainability': 472.0381053232267, 'peace': 584.3636363636364}
2021-08-25 15:13:26.911 | INFO     | src.policies:train:122 - Mean episode return: 186.0
2021-08-25 15:13:26.911 | INFO     | src.policies:train:123 - Last 100 episodes mean return: 207.02545454545452
2021-08-25 15:13:38.845 | INFO     | src.policies:train:159 - Total loss: 0.9978119134902954
2021-08-25 15:13:38.846 | INFO     | src.policies:train:164 - Epoch infos: {'efficiency': 186.0, 'equality': 0.8958499955590542, 'sustainability': 472.0381053232267, 'peace': 584.3636363636364}
2021-08-25 15:13:38.906 | INFO     | src.policies:train:103 - Epoch 231 / 4000
2021-08-25 15:13:38.907 | INFO     | src.policies:train:110 - Episode 231
2021-08-25 15:14:05.114 | DEBUG    | src.policies:execute_episode:270 - Early stopping, all agents done

2021-08-25 15:17:53.570 | INFO     | src.policies:train:103 - Epoch 238 / 4000
2021-08-25 15:17:53.571 | INFO     | src.policies:train:110 - Episode 238
2021-08-25 15:18:23.330 | DEBUG    | src.policies:execute_episode:270 - Early stopping, all agents done
2021-08-25 15:18:23.359 | INFO     | src.policies:train:117 - Episode infos: {'efficiency': 205.54545454545453, 'equality': 0.9152426520864613, 'sustainability': 447.0805781034421, 'peace': 648.2727272727273}
2021-08-25 15:18:23.360 | INFO     | src.policies:train:122 - Mean episode return: 205.54545454545453
2021-08-25 15:18:23.361 | INFO     | src.policies:train:123 - Last 100 episodes mean return: 207.07909090909092
2021-08-25 15:18:35.010 | INFO     | src.policies:train:159 - Total loss: 0.9999114871025085
2021-08-25 15:18:35.011 | INFO     | src.policies:train:164 - Epoch infos: {'efficiency': 205.54545454545453, 'equality': 0.9152426520864613, 'sustainability': 447.0805781034421, 'peace': 648.2727272727273}

2021-08-25 15:22:52.616 | INFO     | src.policies:train:159 - Total loss: 1.0047941207885742
2021-08-25 15:22:52.616 | INFO     | src.policies:train:164 - Epoch infos: {'efficiency': 214.1818181818182, 'equality': 0.9503781447763857, 'sustainability': 475.3867803480223, 'peace': 716.0}
2021-08-25 15:22:52.675 | INFO     | src.policies:train:103 - Epoch 246 / 4000
2021-08-25 15:22:52.676 | INFO     | src.policies:train:110 - Episode 246
2021-08-25 15:23:19.206 | DEBUG    | src.policies:execute_episode:270 - Early stopping, all agents done
2021-08-25 15:23:19.238 | INFO     | src.policies:train:117 - Episode infos: {'efficiency': 209.45454545454547, 'equality': 0.937500000001233, 'sustainability': 489.21100306028126, 'peace': 756.0}
2021-08-25 15:23:19.239 | INFO     | src.policies:train:122 - Mean episode return: 209.45454545454547
2021-08-25 15:23:19.240 | INFO     | src.policies:train:123 - Last 100 episodes mean return: 207.06181818181813

2021-08-25 15:27:43.557 | INFO     | src.policies:train:117 - Episode infos: {'efficiency': 200.54545454545453, 'equality': 0.8660677491167463, 'sustainability': 453.40920089074785, 'peace': 674.4545454545455}
2021-08-25 15:27:43.558 | INFO     | src.policies:train:122 - Mean episode return: 200.54545454545453
2021-08-25 15:27:43.559 | INFO     | src.policies:train:123 - Last 100 episodes mean return: 206.21545454545452
2021-08-25 15:27:53.142 | INFO     | src.policies:train:159 - Total loss: 1.0043574571609497
2021-08-25 15:27:53.143 | INFO     | src.policies:train:164 - Epoch infos: {'efficiency': 200.54545454545453, 'equality': 0.8660677491167463, 'sustainability': 453.40920089074785, 'peace': 674.4545454545455}
2021-08-25 15:27:53.194 | INFO     | src.policies:train:103 - Epoch 254 / 4000
2021-08-25 15:27:53.195 | INFO     | src.policies:train:110 - Episode 254
2021-08-25 15:28:18.215 | DEBUG    | src.policies:execute_episode:270 - Early stopping, all agents done

2021-08-25 15:32:06.225 | INFO     | src.policies:train:164 - Epoch infos: {'efficiency': 210.8181818181818, 'equality': 0.9489591908748097, 'sustainability': 476.81395197122634, 'peace': 718.4545454545455}
2021-08-25 15:32:06.281 | INFO     | src.policies:train:103 - Epoch 261 / 4000
2021-08-25 15:32:06.282 | INFO     | src.policies:train:110 - Episode 261
2021-08-25 15:32:32.559 | DEBUG    | src.policies:execute_episode:270 - Early stopping, all agents done
2021-08-25 15:32:32.587 | INFO     | src.policies:train:117 - Episode infos: {'efficiency': 210.45454545454547, 'equality': 0.9255448655031309, 'sustainability': 500.97081130162223, 'peace': 706.5454545454545}
2021-08-25 15:32:32.588 | INFO     | src.policies:train:122 - Mean episode return: 210.45454545454547
2021-08-25 15:32:32.588 | INFO     | src.policies:train:123 - Last 100 episodes mean return: 206.2663636363636
2021-08-25 15:32:42.853 | INFO     | src.policies:train:159 - Total loss: 0.9965837597846985

2021-08-25 15:36:38.493 | INFO     | src.policies:train:122 - Mean episode return: 184.8181818181818
2021-08-25 15:36:38.494 | INFO     | src.policies:train:123 - Last 100 episodes mean return: 206.05181818181816
2021-08-25 15:36:48.041 | INFO     | src.policies:train:159 - Total loss: 0.9994921684265137
2021-08-25 15:36:48.042 | INFO     | src.policies:train:164 - Epoch infos: {'efficiency': 184.8181818181818, 'equality': 0.9001028484572708, 'sustainability': 470.2970257137829, 'peace': 631.8181818181819}
2021-08-25 15:36:48.091 | INFO     | src.policies:train:103 - Epoch 269 / 4000
2021-08-25 15:36:48.091 | INFO     | src.policies:train:110 - Episode 269
2021-08-25 15:37:11.673 | DEBUG    | src.policies:execute_episode:270 - Early stopping, all agents done
2021-08-25 15:37:11.698 | INFO     | src.policies:train:117 - Episode infos: {'efficiency': 226.1818181818182, 'equality': 0.9394182987441644, 'sustainability': 498.86067704833874, 'peace': 667.4545454545455}

2021-08-25 15:40:41.083 | INFO     | src.policies:train:103 - Epoch 276 / 4000
2021-08-25 15:40:41.083 | INFO     | src.policies:train:110 - Episode 276
2021-08-25 15:41:05.777 | DEBUG    | src.policies:execute_episode:270 - Early stopping, all agents done
2021-08-25 15:41:05.801 | INFO     | src.policies:train:117 - Episode infos: {'efficiency': 200.36363636363637, 'equality': 0.9276522026083226, 'sustainability': 485.40339366649965, 'peace': 723.0909090909091}
2021-08-25 15:41:05.802 | INFO     | src.policies:train:122 - Mean episode return: 200.36363636363637
2021-08-25 15:41:05.803 | INFO     | src.policies:train:123 - Last 100 episodes mean return: 206.31181818181815
2021-08-25 15:41:15.705 | INFO     | src.policies:train:159 - Total loss: 1.0018130540847778
2021-08-25 15:41:15.706 | INFO     | src.policies:train:164 - Epoch infos: {'efficiency': 200.36363636363637, 'equality': 0.9276522026083226, 'sustainability': 485.40339366649965, 'peace': 723.0909090909091}

2021-08-25 15:45:07.019 | INFO     | src.policies:train:159 - Total loss: 1.0008656978607178
2021-08-25 15:45:07.019 | INFO     | src.policies:train:164 - Epoch infos: {'efficiency': 203.63636363636363, 'equality': 0.9086850649369179, 'sustainability': 476.89596476138826, 'peace': 603.0909090909091}
2021-08-25 15:45:07.070 | INFO     | src.policies:train:103 - Epoch 284 / 4000
2021-08-25 15:45:07.070 | INFO     | src.policies:train:110 - Episode 284
2021-08-25 15:45:31.360 | DEBUG    | src.policies:execute_episode:270 - Early stopping, all agents done
2021-08-25 15:45:31.383 | INFO     | src.policies:train:117 - Episode infos: {'efficiency': 217.63636363636363, 'equality': 0.9691653375869757, 'sustainability': 503.6884296178568, 'peace': 747.7272727272727}
2021-08-25 15:45:31.384 | INFO     | src.policies:train:122 - Mean episode return: 217.63636363636363
2021-08-25 15:45:31.385 | INFO     | src.policies:train:123 - Last 100 episodes mean return: 205.85454545454547

2021-08-25 15:49:26.908 | INFO     | src.policies:train:117 - Episode infos: {'efficiency': 185.27272727272728, 'equality': 0.917477027390546, 'sustainability': 470.42485744614754, 'peace': 617.6363636363636}
2021-08-25 15:49:26.909 | INFO     | src.policies:train:122 - Mean episode return: 185.27272727272728
2021-08-25 15:49:26.909 | INFO     | src.policies:train:123 - Last 100 episodes mean return: 205.8890909090909
2021-08-25 15:49:36.524 | INFO     | src.policies:train:159 - Total loss: 0.9987404346466064
2021-08-25 15:49:36.525 | INFO     | src.policies:train:164 - Epoch infos: {'efficiency': 185.27272727272728, 'equality': 0.917477027390546, 'sustainability': 470.42485744614754, 'peace': 617.6363636363636}
2021-08-25 15:49:36.576 | INFO     | src.policies:train:103 - Epoch 292 / 4000
2021-08-25 15:49:36.576 | INFO     | src.policies:train:110 - Episode 292
2021-08-25 15:49:59.945 | DEBUG    | src.policies:execute_episode:270 - Early stopping, all agents done

2021-08-25 15:53:27.211 | INFO     | src.policies:train:164 - Epoch infos: {'efficiency': 208.0909090909091, 'equality': 0.9250963104188987, 'sustainability': 495.3781557221297, 'peace': 741.2727272727273}
2021-08-25 15:53:27.261 | INFO     | src.policies:train:103 - Epoch 299 / 4000
2021-08-25 15:53:27.262 | INFO     | src.policies:train:110 - Episode 299
2021-08-25 15:53:51.439 | DEBUG    | src.policies:execute_episode:270 - Early stopping, all agents done
2021-08-25 15:53:51.461 | INFO     | src.policies:train:117 - Episode infos: {'efficiency': 229.0, 'equality': 0.9324407232319384, 'sustainability': 501.4908628747795, 'peace': 740.8181818181819}
2021-08-25 15:53:51.462 | INFO     | src.policies:train:122 - Mean episode return: 229.0
2021-08-25 15:53:51.462 | INFO     | src.policies:train:123 - Last 100 episodes mean return: 205.7363636363636
2021-08-25 15:54:01.078 | INFO     | src.policies:train:159 - Total loss: 0.9986935257911682

2021-08-25 15:57:42.348 | INFO     | src.policies:train:122 - Mean episode return: 204.36363636363637
2021-08-25 15:57:42.348 | INFO     | src.policies:train:123 - Last 100 episodes mean return: 205.61000000000004
2021-08-25 15:57:51.945 | INFO     | src.policies:train:159 - Total loss: 1.0019819736480713
2021-08-25 15:57:51.946 | INFO     | src.policies:train:164 - Epoch infos: {'efficiency': 204.36363636363637, 'equality': 0.9054513102574925, 'sustainability': 482.29919815820284, 'peace': 746.3636363636364}
2021-08-25 15:57:51.996 | INFO     | src.policies:train:103 - Epoch 307 / 4000
2021-08-25 15:57:51.996 | INFO     | src.policies:train:110 - Episode 307
2021-08-25 15:58:15.199 | DEBUG    | src.policies:execute_episode:270 - Early stopping, all agents done
2021-08-25 15:58:15.223 | INFO     | src.policies:train:117 - Episode infos: {'efficiency': 229.72727272727272, 'equality': 0.9537360146786751, 'sustainability': 508.4250799943256, 'peace': 770.4545454545455}

2021-08-25 16:01:43.628 | INFO     | src.policies:train:103 - Epoch 314 / 4000
2021-08-25 16:01:43.629 | INFO     | src.policies:train:110 - Episode 314
2021-08-25 16:02:05.974 | DEBUG    | src.policies:execute_episode:270 - Early stopping, all agents done
2021-08-25 16:02:05.996 | INFO     | src.policies:train:117 - Episode infos: {'efficiency': 189.8181818181818, 'equality': 0.8912399860699399, 'sustainability': 470.1360844368368, 'peace': 655.9090909090909}
2021-08-25 16:02:05.998 | INFO     | src.policies:train:122 - Mean episode return: 189.8181818181818
2021-08-25 16:02:05.998 | INFO     | src.policies:train:123 - Last 100 episodes mean return: 206.14
2021-08-25 16:02:15.356 | INFO     | src.policies:train:159 - Total loss: 1.0017091035842896
2021-08-25 16:02:15.357 | INFO     | src.policies:train:164 - Epoch infos: {'efficiency': 189.8181818181818, 'equality': 0.8912399860699399, 'sustainability': 470.1360844368368, 'peace': 655.9090909090909}

2021-08-25 16:06:07.045 | INFO     | src.policies:train:159 - Total loss: 0.9965023398399353
2021-08-25 16:06:07.046 | INFO     | src.policies:train:164 - Epoch infos: {'efficiency': 200.45454545454547, 'equality': 0.9215007215023397, 'sustainability': 512.2696829330854, 'peace': 714.9090909090909}
2021-08-25 16:06:07.095 | INFO     | src.policies:train:103 - Epoch 322 / 4000
2021-08-25 16:06:07.096 | INFO     | src.policies:train:110 - Episode 322
2021-08-25 16:06:30.786 | DEBUG    | src.policies:execute_episode:270 - Early stopping, all agents done
2021-08-25 16:06:30.809 | INFO     | src.policies:train:117 - Episode infos: {'efficiency': 201.27272727272728, 'equality': 0.8906955736246469, 'sustainability': 473.2305487800431, 'peace': 647.4545454545455}
2021-08-25 16:06:30.809 | INFO     | src.policies:train:122 - Mean episode return: 201.27272727272728
2021-08-25 16:06:30.810 | INFO     | src.policies:train:123 - Last 100 episodes mean return: 205.4481818181818

2021-08-25 16:10:20.527 | INFO     | src.policies:train:117 - Episode infos: {'efficiency': 202.54545454545453, 'equality': 0.9298188346676632, 'sustainability': 488.26826452530526, 'peace': 634.8181818181819}
2021-08-25 16:10:20.528 | INFO     | src.policies:train:122 - Mean episode return: 202.54545454545453
2021-08-25 16:10:20.528 | INFO     | src.policies:train:123 - Last 100 episodes mean return: 205.57636363636365
2021-08-25 16:10:29.882 | INFO     | src.policies:train:159 - Total loss: 1.0008634328842163
2021-08-25 16:10:29.882 | INFO     | src.policies:train:164 - Epoch infos: {'efficiency': 202.54545454545453, 'equality': 0.9298188346676632, 'sustainability': 488.26826452530526, 'peace': 634.8181818181819}
2021-08-25 16:10:29.933 | INFO     | src.policies:train:103 - Epoch 330 / 4000
2021-08-25 16:10:29.934 | INFO     | src.policies:train:110 - Episode 330
2021-08-25 16:10:52.659 | DEBUG    | src.policies:execute_episode:270 - Early stopping, all agents done

2021-08-25 16:14:18.737 | INFO     | src.policies:train:164 - Epoch infos: {'efficiency': 200.72727272727272, 'equality': 0.921607378130731, 'sustainability': 505.5755852168917, 'peace': 708.7272727272727}
2021-08-25 16:14:18.786 | INFO     | src.policies:train:103 - Epoch 337 / 4000
2021-08-25 16:14:18.787 | INFO     | src.policies:train:110 - Episode 337
2021-08-25 16:14:42.756 | DEBUG    | src.policies:execute_episode:270 - Early stopping, all agents done
2021-08-25 16:14:42.781 | INFO     | src.policies:train:117 - Episode infos: {'efficiency': 199.1818181818182, 'equality': 0.9410812829355404, 'sustainability': 500.1308971295854, 'peace': 687.4545454545455}
2021-08-25 16:14:42.782 | INFO     | src.policies:train:122 - Mean episode return: 199.1818181818182
2021-08-25 16:14:42.782 | INFO     | src.policies:train:123 - Last 100 episodes mean return: 206.05090909090904
2021-08-25 16:14:52.695 | INFO     | src.policies:train:159 - Total loss: 0.999764084815979

2021-08-25 16:18:34.418 | INFO     | src.policies:train:122 - Mean episode return: 207.72727272727272
2021-08-25 16:18:34.419 | INFO     | src.policies:train:123 - Last 100 episodes mean return: 205.91272727272724
2021-08-25 16:18:44.008 | INFO     | src.policies:train:159 - Total loss: 0.9997689127922058
2021-08-25 16:18:44.009 | INFO     | src.policies:train:164 - Epoch infos: {'efficiency': 207.72727272727272, 'equality': 0.9459717525373788, 'sustainability': 496.63421895499266, 'peace': 728.1818181818181}
2021-08-25 16:18:44.060 | INFO     | src.policies:train:103 - Epoch 345 / 4000
2021-08-25 16:18:44.061 | INFO     | src.policies:train:110 - Episode 345
2021-08-25 16:19:07.939 | DEBUG    | src.policies:execute_episode:270 - Early stopping, all agents done
2021-08-25 16:19:07.965 | INFO     | src.policies:train:117 - Episode infos: {'efficiency': 215.27272727272728, 'equality': 0.9097051597068929, 'sustainability': 492.2969224462307, 'peace': 749.3636363636364}

2021-08-25 16:22:46.667 | INFO     | src.policies:train:103 - Epoch 352 / 4000
2021-08-25 16:22:46.668 | INFO     | src.policies:train:110 - Episode 352
2021-08-25 16:23:12.195 | DEBUG    | src.policies:execute_episode:270 - Early stopping, all agents done
2021-08-25 16:23:12.223 | INFO     | src.policies:train:117 - Episode infos: {'efficiency': 209.8181818181818, 'equality': 0.9216165117393726, 'sustainability': 487.31368803654, 'peace': 759.5454545454545}
2021-08-25 16:23:12.224 | INFO     | src.policies:train:122 - Mean episode return: 209.8181818181818
2021-08-25 16:23:12.224 | INFO     | src.policies:train:123 - Last 100 episodes mean return: 206.6781818181818
2021-08-25 16:23:22.435 | INFO     | src.policies:train:159 - Total loss: 0.998066782951355
2021-08-25 16:23:22.436 | INFO     | src.policies:train:164 - Epoch infos: {'efficiency': 209.8181818181818, 'equality': 0.9216165117393726, 'sustainability': 487.31368803654, 'peace': 759.5454545454545}

2021-08-25 16:27:29.506 | INFO     | src.policies:train:159 - Total loss: 0.9946997761726379
2021-08-25 16:27:29.507 | INFO     | src.policies:train:164 - Epoch infos: {'efficiency': 202.27272727272728, 'equality': 0.9141164453541549, 'sustainability': 488.8432280931396, 'peace': 679.9090909090909}
2021-08-25 16:27:29.557 | INFO     | src.policies:train:103 - Epoch 360 / 4000
2021-08-25 16:27:29.557 | INFO     | src.policies:train:110 - Episode 360
2021-08-25 16:27:52.956 | DEBUG    | src.policies:execute_episode:270 - Early stopping, all agents done
2021-08-25 16:27:52.978 | INFO     | src.policies:train:117 - Episode infos: {'efficiency': 210.0909090909091, 'equality': 0.9195153613170309, 'sustainability': 488.383878248964, 'peace': 735.0}
2021-08-25 16:27:52.979 | INFO     | src.policies:train:122 - Mean episode return: 210.0909090909091
2021-08-25 16:27:52.980 | INFO     | src.policies:train:123 - Last 100 episodes mean return: 206.84909090909088

2021-08-25 16:31:49.184 | INFO     | src.policies:train:117 - Episode infos: {'efficiency': 217.54545454545453, 'equality': 0.9475743646250888, 'sustainability': 504.96602063847945, 'peace': 729.3636363636364}
2021-08-25 16:31:49.185 | INFO     | src.policies:train:122 - Mean episode return: 217.54545454545453
2021-08-25 16:31:49.186 | INFO     | src.policies:train:123 - Last 100 episodes mean return: 206.94454545454548
2021-08-25 16:31:58.828 | INFO     | src.policies:train:159 - Total loss: 0.9983823299407959
2021-08-25 16:31:58.829 | INFO     | src.policies:train:164 - Epoch infos: {'efficiency': 217.54545454545453, 'equality': 0.9475743646250888, 'sustainability': 504.96602063847945, 'peace': 729.3636363636364}
2021-08-25 16:31:58.879 | INFO     | src.policies:train:103 - Epoch 368 / 4000
2021-08-25 16:31:58.879 | INFO     | src.policies:train:110 - Episode 368
2021-08-25 16:32:22.309 | DEBUG    | src.policies:execute_episode:270 - Early stopping, all agents done

2021-08-25 16:36:17.053 | INFO     | src.policies:train:164 - Epoch infos: {'efficiency': 198.9090909090909, 'equality': 0.9398371281381952, 'sustainability': 477.8405902564451, 'peace': 710.0909090909091}
2021-08-25 16:36:17.107 | INFO     | src.policies:train:103 - Epoch 375 / 4000
2021-08-25 16:36:17.108 | INFO     | src.policies:train:110 - Episode 375
2021-08-25 16:36:43.255 | DEBUG    | src.policies:execute_episode:270 - Early stopping, all agents done
2021-08-25 16:36:43.279 | INFO     | src.policies:train:117 - Episode infos: {'efficiency': 230.0, 'equality': 0.9458857348195133, 'sustainability': 494.88911944142825, 'peace': 746.0}
2021-08-25 16:36:43.280 | INFO     | src.policies:train:122 - Mean episode return: 230.0
2021-08-25 16:36:43.281 | INFO     | src.policies:train:123 - Last 100 episodes mean return: 207.55636363636364
2021-08-25 16:36:52.878 | INFO     | src.policies:train:159 - Total loss: 0.9985874891281128

2021-08-25 16:40:32.218 | INFO     | src.policies:train:123 - Last 100 episodes mean return: 208.12454545454548
2021-08-25 16:40:41.053 | INFO     | src.policies:train:159 - Total loss: 1.0026084184646606
2021-08-25 16:40:41.054 | INFO     | src.policies:train:164 - Epoch infos: {'efficiency': 200.36363636363637, 'equality': 0.9240224385430618, 'sustainability': 477.4693446089714, 'peace': 716.5454545454545}
2021-08-25 16:40:41.106 | INFO     | src.policies:train:103 - Epoch 383 / 4000
2021-08-25 16:40:41.107 | INFO     | src.policies:train:110 - Episode 383
2021-08-25 16:41:04.754 | DEBUG    | src.policies:execute_episode:270 - Early stopping, all agents done
2021-08-25 16:41:04.779 | INFO     | src.policies:train:117 - Episode infos: {'efficiency': 194.27272727272728, 'equality': 0.8904156208812181, 'sustainability': 497.466122590218, 'peace': 696.7272727272727}
2021-08-25 16:41:04.780 | INFO     | src.policies:train:122 - Mean episode return: 194.27272727272728

2021-08-25 16:44:41.276 | DEBUG    | src.policies:execute_episode:270 - Early stopping, all agents done
2021-08-25 16:44:41.298 | INFO     | src.policies:train:117 - Episode infos: {'efficiency': 211.27272727272728, 'equality': 0.9112814895964778, 'sustainability': 486.9548381510411, 'peace': 714.1818181818181}
2021-08-25 16:44:41.299 | INFO     | src.policies:train:122 - Mean episode return: 211.27272727272728
2021-08-25 16:44:41.300 | INFO     | src.policies:train:123 - Last 100 episodes mean return: 207.98545454545456
2021-08-25 16:44:49.801 | INFO     | src.policies:train:159 - Total loss: 1.0043658018112183
2021-08-25 16:44:49.802 | INFO     | src.policies:train:164 - Epoch infos: {'efficiency': 211.27272727272728, 'equality': 0.9112814895964778, 'sustainability': 486.9548381510411, 'peace': 714.1818181818181}
2021-08-25 16:44:49.847 | INFO     | src.policies:train:103 - Epoch 391 / 4000
2021-08-25 16:44:49.848 | INFO     | src.policies:train:110 - Episode 391

2021-08-25 16:48:16.852 | INFO     | src.policies:train:159 - Total loss: 0.9996019601821899
2021-08-25 16:48:16.853 | INFO     | src.policies:train:164 - Epoch infos: {'efficiency': 212.9090909090909, 'equality': 0.9470538001718218, 'sustainability': 502.32502450448214, 'peace': 746.2727272727273}
2021-08-25 16:48:16.901 | INFO     | src.policies:train:103 - Epoch 398 / 4000
2021-08-25 16:48:16.901 | INFO     | src.policies:train:110 - Episode 398
2021-08-25 16:48:38.078 | DEBUG    | src.policies:execute_episode:270 - Early stopping, all agents done
2021-08-25 16:48:38.102 | INFO     | src.policies:train:117 - Episode infos: {'efficiency': 187.63636363636363, 'equality': 0.8816067653303029, 'sustainability': 465.1264923501723, 'peace': 696.0909090909091}
2021-08-25 16:48:38.102 | INFO     | src.policies:train:122 - Mean episode return: 187.63636363636363
2021-08-25 16:48:38.103 | INFO     | src.policies:train:123 - Last 100 episodes mean return: 207.67363636363635

2021-08-25 16:52:00.108 | INFO     | src.policies:train:122 - Mean episode return: 220.8181818181818
2021-08-25 16:52:00.109 | INFO     | src.policies:train:123 - Last 100 episodes mean return: 207.64454545454547
2021-08-25 16:52:08.694 | INFO     | src.policies:train:159 - Total loss: 1.0016230344772339
2021-08-25 16:52:08.695 | INFO     | src.policies:train:164 - Epoch infos: {'efficiency': 220.8181818181818, 'equality': 0.9484262135568616, 'sustainability': 489.91286999509737, 'peace': 783.4545454545455}
2021-08-25 16:52:08.744 | INFO     | src.policies:train:103 - Epoch 406 / 4000
2021-08-25 16:52:08.745 | INFO     | src.policies:train:110 - Episode 406
2021-08-25 16:52:29.409 | DEBUG    | src.policies:execute_episode:270 - Early stopping, all agents done
2021-08-25 16:52:29.433 | INFO     | src.policies:train:117 - Episode infos: {'efficiency': 202.27272727272728, 'equality': 0.921144024516422, 'sustainability': 482.3733103547447, 'peace': 706.1818181818181}

2021-08-25 16:55:54.669 | INFO     | src.policies:train:110 - Episode 413
2021-08-25 16:56:16.841 | DEBUG    | src.policies:execute_episode:270 - Early stopping, all agents done
2021-08-25 16:56:16.863 | INFO     | src.policies:train:117 - Episode infos: {'efficiency': 210.36363636363637, 'equality': 0.9266912862432881, 'sustainability': 486.3670301768071, 'peace': 694.8181818181819}
2021-08-25 16:56:16.863 | INFO     | src.policies:train:122 - Mean episode return: 210.36363636363637
2021-08-25 16:56:16.864 | INFO     | src.policies:train:123 - Last 100 episodes mean return: 207.7681818181818
2021-08-25 16:56:25.464 | INFO     | src.policies:train:159 - Total loss: 1.0011311769485474
2021-08-25 16:56:25.465 | INFO     | src.policies:train:164 - Epoch infos: {'efficiency': 210.36363636363637, 'equality': 0.9266912862432881, 'sustainability': 486.3670301768071, 'peace': 694.8181818181819}
2021-08-25 16:56:25.514 | INFO     | src.policies:train:103 - Epoch 414 / 4000

2021-08-25 16:59:46.755 | INFO     | src.policies:train:159 - Total loss: 1.000838041305542
2021-08-25 16:59:46.756 | INFO     | src.policies:train:164 - Epoch infos: {'efficiency': 217.8181818181818, 'equality': 0.9102291698302051, 'sustainability': 481.82976579858826, 'peace': 758.2727272727273}
2021-08-25 16:59:46.802 | INFO     | src.policies:train:103 - Epoch 421 / 4000
2021-08-25 16:59:46.802 | INFO     | src.policies:train:110 - Episode 421
2021-08-25 17:00:07.456 | DEBUG    | src.policies:execute_episode:270 - Early stopping, all agents done
2021-08-25 17:00:07.476 | INFO     | src.policies:train:117 - Episode infos: {'efficiency': 202.0909090909091, 'equality': 0.9197644460818761, 'sustainability': 481.3401972909041, 'peace': 755.1818181818181}
2021-08-25 17:00:07.477 | INFO     | src.policies:train:122 - Mean episode return: 202.0909090909091
2021-08-25 17:00:07.477 | INFO     | src.policies:train:123 - Last 100 episodes mean return: 208.2745454545454

2021-08-25 17:03:19.163 | INFO     | src.policies:train:122 - Mean episode return: 204.72727272727272
2021-08-25 17:03:19.163 | INFO     | src.policies:train:123 - Last 100 episodes mean return: 208.13090909090909
2021-08-25 17:03:27.316 | INFO     | src.policies:train:159 - Total loss: 1.0023552179336548
2021-08-25 17:03:27.317 | INFO     | src.policies:train:164 - Epoch infos: {'efficiency': 204.72727272727272, 'equality': 0.9139350880043207, 'sustainability': 470.3312917863063, 'peace': 726.7272727272727}
2021-08-25 17:03:27.362 | INFO     | src.policies:train:103 - Epoch 429 / 4000
2021-08-25 17:03:27.362 | INFO     | src.policies:train:110 - Episode 429
2021-08-25 17:03:49.068 | DEBUG    | src.policies:execute_episode:270 - Early stopping, all agents done
2021-08-25 17:03:49.133 | INFO     | src.policies:train:117 - Episode infos: {'efficiency': 199.54545454545453, 'equality': 0.8955477324519456, 'sustainability': 498.3027846010424, 'peace': 691.8181818181819}

2021-08-25 17:06:45.407 | INFO     | src.policies:train:103 - Epoch 436 / 4000
2021-08-25 17:06:45.408 | INFO     | src.policies:train:110 - Episode 436
2021-08-25 17:07:05.206 | DEBUG    | src.policies:execute_episode:270 - Early stopping, all agents done
2021-08-25 17:07:05.227 | INFO     | src.policies:train:117 - Episode infos: {'efficiency': 205.36363636363637, 'equality': 0.9267576160021176, 'sustainability': 487.6090440613702, 'peace': 724.0}
2021-08-25 17:07:05.227 | INFO     | src.policies:train:122 - Mean episode return: 205.36363636363637
2021-08-25 17:07:05.228 | INFO     | src.policies:train:123 - Last 100 episodes mean return: 208.0127272727273
2021-08-25 17:07:12.937 | INFO     | src.policies:train:159 - Total loss: 0.9978480935096741
2021-08-25 17:07:12.938 | INFO     | src.policies:train:164 - Epoch infos: {'efficiency': 205.36363636363637, 'equality': 0.9267576160021176, 'sustainability': 487.6090440613702, 'peace': 724.0}

2021-08-25 17:10:21.634 | INFO     | src.policies:train:159 - Total loss: 1.002406120300293
2021-08-25 17:10:21.635 | INFO     | src.policies:train:164 - Epoch infos: {'efficiency': 201.36363636363637, 'equality': 0.9059306382124783, 'sustainability': 497.1208653638523, 'peace': 656.0}
2021-08-25 17:10:21.681 | INFO     | src.policies:train:103 - Epoch 444 / 4000
2021-08-25 17:10:21.682 | INFO     | src.policies:train:110 - Episode 444
2021-08-25 17:10:40.672 | DEBUG    | src.policies:execute_episode:270 - Early stopping, all agents done
2021-08-25 17:10:40.693 | INFO     | src.policies:train:117 - Episode infos: {'efficiency': 191.72727272727272, 'equality': 0.9258588732288923, 'sustainability': 489.50254141548925, 'peace': 716.7272727272727}
2021-08-25 17:10:40.694 | INFO     | src.policies:train:122 - Mean episode return: 191.72727272727272
2021-08-25 17:10:40.695 | INFO     | src.policies:train:123 - Last 100 episodes mean return: 207.7090909090909

2021-08-25 17:13:49.979 | INFO     | src.policies:train:122 - Mean episode return: 214.72727272727272
2021-08-25 17:13:49.979 | INFO     | src.policies:train:123 - Last 100 episodes mean return: 207.38636363636363
2021-08-25 17:13:57.843 | INFO     | src.policies:train:159 - Total loss: 0.9981465935707092
2021-08-25 17:13:57.844 | INFO     | src.policies:train:164 - Epoch infos: {'efficiency': 214.72727272727272, 'equality': 0.9481949041654185, 'sustainability': 489.32514546848086, 'peace': 745.1818181818181}
2021-08-25 17:13:57.895 | INFO     | src.policies:train:103 - Epoch 452 / 4000
2021-08-25 17:13:57.896 | INFO     | src.policies:train:110 - Episode 452
2021-08-25 17:14:17.125 | DEBUG    | src.policies:execute_episode:270 - Early stopping, all agents done
2021-08-25 17:14:17.146 | INFO     | src.policies:train:117 - Episode infos: {'efficiency': 215.9090909090909, 'equality': 0.9141818181834607, 'sustainability': 475.2605217996542, 'peace': 593.0909090909091}

2021-08-25 17:17:04.679 | INFO     | src.policies:train:103 - Epoch 459 / 4000
2021-08-25 17:17:04.680 | INFO     | src.policies:train:110 - Episode 459
2021-08-25 17:17:24.019 | DEBUG    | src.policies:execute_episode:270 - Early stopping, all agents done
2021-08-25 17:17:24.041 | INFO     | src.policies:train:117 - Episode infos: {'efficiency': 218.0, 'equality': 0.931458033210792, 'sustainability': 497.5436715045045, 'peace': 560.4545454545455}
2021-08-25 17:17:24.041 | INFO     | src.policies:train:122 - Mean episode return: 218.0
2021-08-25 17:17:24.042 | INFO     | src.policies:train:123 - Last 100 episodes mean return: 207.53272727272727
2021-08-25 17:17:31.750 | INFO     | src.policies:train:159 - Total loss: 1.0022579431533813
2021-08-25 17:17:31.750 | INFO     | src.policies:train:164 - Epoch infos: {'efficiency': 218.0, 'equality': 0.931458033210792, 'sustainability': 497.5436715045045, 'peace': 560.4545454545455}

2021-08-25 17:20:51.095 | INFO     | src.policies:train:159 - Total loss: 1.0025179386138916
2021-08-25 17:20:51.096 | INFO     | src.policies:train:164 - Epoch infos: {'efficiency': 209.63636363636363, 'equality': 0.9035717101651113, 'sustainability': 471.49309729936516, 'peace': 748.3636363636364}
2021-08-25 17:20:51.147 | INFO     | src.policies:train:103 - Epoch 467 / 4000
2021-08-25 17:20:51.148 | INFO     | src.policies:train:110 - Episode 467
2021-08-25 17:21:11.603 | DEBUG    | src.policies:execute_episode:270 - Early stopping, all agents done
2021-08-25 17:21:11.629 | INFO     | src.policies:train:117 - Episode infos: {'efficiency': 215.36363636363637, 'equality': 0.9585555854031514, 'sustainability': 469.51767296253547, 'peace': 707.7272727272727}
2021-08-25 17:21:11.630 | INFO     | src.policies:train:122 - Mean episode return: 215.36363636363637
2021-08-25 17:21:11.631 | INFO     | src.policies:train:123 - Last 100 episodes mean return: 207.4527272727273
2021-08-25 17:21:20

2021-08-25 17:24:35.681 | INFO     | src.policies:train:117 - Episode infos: {'efficiency': 214.0, 'equality': 0.9050745346430626, 'sustainability': 489.2240399955357, 'peace': 747.0909090909091}
2021-08-25 17:24:35.681 | INFO     | src.policies:train:122 - Mean episode return: 214.0
2021-08-25 17:24:35.682 | INFO     | src.policies:train:123 - Last 100 episodes mean return: 207.5036363636364
2021-08-25 17:24:43.355 | INFO     | src.policies:train:159 - Total loss: 0.9998958706855774
2021-08-25 17:24:43.356 | INFO     | src.policies:train:164 - Epoch infos: {'efficiency': 214.0, 'equality': 0.9050745346430626, 'sustainability': 489.2240399955357, 'peace': 747.0909090909091}
2021-08-25 17:24:43.410 | INFO     | src.policies:train:103 - Epoch 475 / 4000
2021-08-25 17:24:43.410 | INFO     | src.policies:train:110 - Episode 475
2021-08-25 17:25:03.614 | DEBUG    | src.policies:execute_episode:270 - Early stopping, all agents done
2021-08-25 17:25:03.636 | INFO     | src.policies:train:117 

2021-08-25 17:27:58.685 | INFO     | src.policies:train:164 - Epoch infos: {'efficiency': 212.27272727272728, 'equality': 0.9208876776343997, 'sustainability': 481.3227532526727, 'peace': 710.6363636363636}
2021-08-25 17:27:58.733 | INFO     | src.policies:train:103 - Epoch 482 / 4000
2021-08-25 17:27:58.734 | INFO     | src.policies:train:110 - Episode 482
2021-08-25 17:28:17.950 | DEBUG    | src.policies:execute_episode:270 - Early stopping, all agents done
2021-08-25 17:28:17.973 | INFO     | src.policies:train:117 - Episode infos: {'efficiency': 204.1818181818182, 'equality': 0.8737958390699871, 'sustainability': 478.41441990569837, 'peace': 701.7272727272727}
2021-08-25 17:28:17.974 | INFO     | src.policies:train:122 - Mean episode return: 204.1818181818182
2021-08-25 17:28:17.974 | INFO     | src.policies:train:123 - Last 100 episodes mean return: 207.50181818181824
2021-08-25 17:28:25.944 | INFO     | src.policies:train:159 - Total loss: 1.001261830329895
2021-08-25 17:28:25.94

2021-08-25 17:31:32.760 | INFO     | src.policies:train:122 - Mean episode return: 195.9090909090909
2021-08-25 17:31:32.760 | INFO     | src.policies:train:123 - Last 100 episodes mean return: 207.5954545454546
2021-08-25 17:31:40.961 | INFO     | src.policies:train:159 - Total loss: 1.0029457807540894
2021-08-25 17:31:40.962 | INFO     | src.policies:train:164 - Epoch infos: {'efficiency': 195.9090909090909, 'equality': 0.9012866483884985, 'sustainability': 448.64522671352063, 'peace': 627.0}
2021-08-25 17:31:41.013 | INFO     | src.policies:train:103 - Epoch 490 / 4000
2021-08-25 17:31:41.014 | INFO     | src.policies:train:110 - Episode 490
2021-08-25 17:32:01.700 | DEBUG    | src.policies:execute_episode:270 - Early stopping, all agents done
2021-08-25 17:32:01.726 | INFO     | src.policies:train:117 - Episode infos: {'efficiency': 228.1818181818182, 'equality': 0.9552336110113141, 'sustainability': 487.15975279073973, 'peace': 796.8181818181819}
2021-08-25 17:32:01.727 | INFO    

2021-08-25 17:35:06.434 | INFO     | src.policies:train:110 - Episode 497
2021-08-25 17:35:27.338 | DEBUG    | src.policies:execute_episode:270 - Early stopping, all agents done
2021-08-25 17:35:27.362 | INFO     | src.policies:train:117 - Episode infos: {'efficiency': 205.54545454545453, 'equality': 0.9119456395015892, 'sustainability': 487.1775688662811, 'peace': 665.7272727272727}
2021-08-25 17:35:27.363 | INFO     | src.policies:train:122 - Mean episode return: 205.54545454545453
2021-08-25 17:35:27.364 | INFO     | src.policies:train:123 - Last 100 episodes mean return: 207.83272727272728
2021-08-25 17:35:35.984 | INFO     | src.policies:train:159 - Total loss: 1.000747561454773
2021-08-25 17:35:35.985 | INFO     | src.policies:train:164 - Epoch infos: {'efficiency': 205.54545454545453, 'equality': 0.9119456395015892, 'sustainability': 487.1775688662811, 'peace': 665.7272727272727}
2021-08-25 17:35:36.037 | INFO     | src.policies:train:103 - Epoch 498 / 4000
2021-08-25 17:35:36.0

2021-08-25 17:39:06.553 | INFO     | src.policies:train:159 - Total loss: 1.000697135925293
2021-08-25 17:39:06.553 | INFO     | src.policies:train:164 - Epoch infos: {'efficiency': 198.36363636363637, 'equality': 0.9232563953019903, 'sustainability': 490.92758214744697, 'peace': 647.9090909090909}
2021-08-25 17:39:06.603 | INFO     | src.policies:train:103 - Epoch 505 / 4000
2021-08-25 17:39:06.603 | INFO     | src.policies:train:110 - Episode 505
2021-08-25 17:39:28.957 | DEBUG    | src.policies:execute_episode:270 - Early stopping, all agents done
2021-08-25 17:39:28.978 | INFO     | src.policies:train:117 - Episode infos: {'efficiency': 200.0909090909091, 'equality': 0.9041757878669989, 'sustainability': 495.05250537341107, 'peace': 702.1818181818181}
2021-08-25 17:39:28.979 | INFO     | src.policies:train:122 - Mean episode return: 200.0909090909091
2021-08-25 17:39:28.980 | INFO     | src.policies:train:123 - Last 100 episodes mean return: 207.68818181818185
2021-08-25 17:39:37.5

2021-08-25 17:43:09.857 | INFO     | src.policies:train:122 - Mean episode return: 193.0909090909091
2021-08-25 17:43:09.858 | INFO     | src.policies:train:123 - Last 100 episodes mean return: 207.05272727272728
2021-08-25 17:43:18.859 | INFO     | src.policies:train:159 - Total loss: 0.9945150017738342
2021-08-25 17:43:18.860 | INFO     | src.policies:train:164 - Epoch infos: {'efficiency': 193.0909090909091, 'equality': 0.8655196028106164, 'sustainability': 491.9531938803957, 'peace': 676.5454545454545}
2021-08-25 17:43:18.912 | INFO     | src.policies:train:103 - Epoch 513 / 4000
2021-08-25 17:43:18.913 | INFO     | src.policies:train:110 - Episode 513
2021-08-25 17:43:42.312 | DEBUG    | src.policies:execute_episode:270 - Early stopping, all agents done
2021-08-25 17:43:42.335 | INFO     | src.policies:train:117 - Episode infos: {'efficiency': 228.27272727272728, 'equality': 0.9254190652053615, 'sustainability': 490.46745455773817, 'peace': 805.5454545454545}
2021-08-25 17:43:42.3

2021-08-25 17:47:13.988 | INFO     | src.policies:train:103 - Epoch 520 / 4000
2021-08-25 17:47:13.988 | INFO     | src.policies:train:110 - Episode 520
2021-08-25 17:47:38.669 | DEBUG    | src.policies:execute_episode:270 - Early stopping, all agents done
2021-08-25 17:47:38.694 | INFO     | src.policies:train:117 - Episode infos: {'efficiency': 192.45454545454547, 'equality': 0.9312062524169878, 'sustainability': 446.0349382458822, 'peace': 548.2727272727273}
2021-08-25 17:47:38.694 | INFO     | src.policies:train:122 - Mean episode return: 192.45454545454547
2021-08-25 17:47:38.695 | INFO     | src.policies:train:123 - Last 100 episodes mean return: 207.13727272727274
2021-08-25 17:47:48.645 | INFO     | src.policies:train:159 - Total loss: 0.9994122385978699
2021-08-25 17:47:48.646 | INFO     | src.policies:train:164 - Epoch infos: {'efficiency': 192.45454545454547, 'equality': 0.9312062524169878, 'sustainability': 446.0349382458822, 'peace': 548.2727272727273}
2021-08-25 17:47:48.

2021-08-25 17:52:04.468 | INFO     | src.policies:train:159 - Total loss: 1.0051045417785645
2021-08-25 17:52:04.469 | INFO     | src.policies:train:164 - Epoch infos: {'efficiency': 206.0, 'equality': 0.9477653855422498, 'sustainability': 491.0232116942009, 'peace': 624.0}
2021-08-25 17:52:04.537 | INFO     | src.policies:train:103 - Epoch 528 / 4000
2021-08-25 17:52:04.538 | INFO     | src.policies:train:110 - Episode 528
2021-08-25 17:52:35.501 | DEBUG    | src.policies:execute_episode:270 - Early stopping, all agents done
2021-08-25 17:52:35.533 | INFO     | src.policies:train:117 - Episode infos: {'efficiency': 196.1818181818182, 'equality': 0.924003707137838, 'sustainability': 476.1330106341124, 'peace': 600.8181818181819}
2021-08-25 17:52:35.534 | INFO     | src.policies:train:122 - Mean episode return: 196.1818181818182
2021-08-25 17:52:35.534 | INFO     | src.policies:train:123 - Last 100 episodes mean return: 207.60636363636365
2021-08-25 17:52:47.346 | INFO     | src.policie

2021-08-25 17:56:55.924 | INFO     | src.policies:train:117 - Episode infos: {'efficiency': 208.27272727272728, 'equality': 0.9004007777488909, 'sustainability': 489.5854352726565, 'peace': 648.2727272727273}
2021-08-25 17:56:55.925 | INFO     | src.policies:train:122 - Mean episode return: 208.27272727272728
2021-08-25 17:56:55.925 | INFO     | src.policies:train:123 - Last 100 episodes mean return: 207.75727272727275


KeyboardInterrupt: 

## TRPO

This section trains a set of Harvest agents using our custom Trust Region Policy Optimization (TRPO) implementation.

In [51]:
beta = 1.0
kl_target = 0.01
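
The names `beta` and `kl_target` suggest a KL-penalized surrogate objective whose penalty coefficient is adapted towards a target divergence. The internals of `policies.TRPOPolicy` are not shown in this notebook, so the sketch below is only an assumption about how such a pair of hyperparameters is commonly used. The helpers `kl_penalized_objective` and `adapt_beta`, as well as the `tolerance` and `factor` arguments, are hypothetical and not part of `src.policies`.

```python
def kl_penalized_objective(surrogate, kl, beta):
    """Surrogate advantage minus a KL penalty scaled by beta (assumed form)."""
    return surrogate - beta * kl


def adapt_beta(beta, kl, kl_target, tolerance=1.5, factor=2.0):
    """Hypothetical helper: rescale beta so the measured KL tracks kl_target."""
    if kl > kl_target * tolerance:
        return beta * factor  # policy moved too far: penalize more
    if kl < kl_target / tolerance:
        return beta / factor  # policy barely moved: penalize less
    return beta


# Same starting values as the cell above
beta = 1.0
kl_target = 0.01

# Made-up KL measurements, used only to show how beta reacts
for measured_kl in (0.05, 0.02, 0.001):
    beta = adapt_beta(beta, measured_kl, kl_target)
    print(f"kl={measured_kl:.3f} -> beta={beta}")
```

Under this assumed rule, a large initial `beta` makes early updates conservative, and the coefficient then drifts until the observed divergence settles near `kl_target`.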

In [None]:
trpo_policy_nn = models.MLP(observation_space_size, hidden_sizes, action_space_size)
trpo_baseline_nn = models.MLP(observation_space_size, hidden_sizes, 1, log_softmax=False)
trpo_policy = policies.TRPOPolicy(env, trpo_policy_nn, trpo_baseline_nn, beta=beta, kl_target=kl_target)
trpo_policy.train(
    epochs,
    steps_per_epoch,
    enable_wandb=True,
    wandb_config={**wandb_config, "group": "TRPO"},
    render_every=render_every
)

2021-08-25 18:03:30.853 | DEBUG    | src.policies:execute_episode:294 - Early stopping, all agents done
2021-08-25 18:03:30.856 | INFO     | src.policies:train:124 - Episode infos: {'efficiency': 26.09090909090909, 'equality': 0.8131137155823387, 'sustainability': 443.1172088229232, 'peace': 641.9090909090909}
2021-08-25 18:03:30.856 | INFO     | src.policies:train:129 - Mean episode return: 26.09090909090909
2021-08-25 18:03:30.857 | INFO     | src.policies:train:130 - Last 100 episodes mean return: 26.09090909090909
2021-08-25 18:03:39.446 | INFO     | src.policies:train:166 - Total loss: 1.1647214889526367
2021-08-25 18:03:39.447 | INFO     | src.policies:train:171 - Epoch infos: {'efficiency': 26.09090909090909, 'equality': 0.8131137155823387, 'sustainability': 443.1172088229232, 'peace': 641.9090909090909}
2021-08-25 18:03:39.489 | INFO     | src.policies:train:106 - Epoch 2 / 4000
2021-08-25 18:03:39.489 | INFO     | src.policies:train:113 - Episode 2
2021-08-25 18:04:02.500 | DE

2021-08-25 18:07:36.595 | INFO     | src.policies:train:171 - Epoch infos: {'efficiency': 25.818181818181817, 'equality': 0.8201024328072819, 'sustainability': 408.41192621129335, 'peace': 605.1818181818181}
2021-08-25 18:07:36.645 | INFO     | src.policies:train:106 - Epoch 9 / 4000
2021-08-25 18:07:36.645 | INFO     | src.policies:train:113 - Episode 9
2021-08-25 18:07:59.038 | DEBUG    | src.policies:execute_episode:294 - Early stopping, all agents done
2021-08-25 18:07:59.062 | INFO     | src.policies:train:124 - Episode infos: {'efficiency': 30.181818181818183, 'equality': 0.8247535597173119, 'sustainability': 460.3603476851406, 'peace': 662.2727272727273}
2021-08-25 18:07:59.062 | INFO     | src.policies:train:129 - Mean episode return: 30.181818181818183
2021-08-25 18:07:59.063 | INFO     | src.policies:train:130 - Last 100 episodes mean return: 24.242424242424242
2021-08-25 18:08:08.462 | INFO     | src.policies:train:166 - Total loss: 1.060709834098816
2021-08-25 18:08:08.463 

2021-08-25 18:11:58.159 | INFO     | src.policies:train:129 - Mean episode return: 24.818181818181817
2021-08-25 18:11:58.160 | INFO     | src.policies:train:130 - Last 100 episodes mean return: 25.011363636363633
2021-08-25 18:12:07.815 | INFO     | src.policies:train:166 - Total loss: 1.0007688999176025
2021-08-25 18:12:07.815 | INFO     | src.policies:train:171 - Epoch infos: {'efficiency': 24.818181818181817, 'equality': 0.8068598068919648, 'sustainability': 403.14813624376325, 'peace': 582.2727272727273}
2021-08-25 18:12:07.865 | INFO     | src.policies:train:106 - Epoch 17 / 4000
2021-08-25 18:12:07.866 | INFO     | src.policies:train:113 - Episode 17
2021-08-25 18:12:35.220 | DEBUG    | src.policies:execute_episode:294 - Early stopping, all agents done
2021-08-25 18:12:35.247 | INFO     | src.policies:train:124 - Episode infos: {'efficiency': 30.181818181818183, 'equality': 0.8849945235644859, 'sustainability': 410.3584748312393, 'peace': 589.0}
2021-08-25 18:12:35.248 | INFO   

2021-08-25 18:16:06.999 | INFO     | src.policies:train:106 - Epoch 24 / 4000
2021-08-25 18:16:07.000 | INFO     | src.policies:train:113 - Episode 24
2021-08-25 18:16:33.516 | DEBUG    | src.policies:execute_episode:294 - Early stopping, all agents done
2021-08-25 18:16:33.541 | INFO     | src.policies:train:124 - Episode infos: {'efficiency': 28.454545454545453, 'equality': 0.8083067092930138, 'sustainability': 455.67153466838187, 'peace': 616.1818181818181}
2021-08-25 18:16:33.542 | INFO     | src.policies:train:129 - Mean episode return: 28.454545454545453
2021-08-25 18:16:33.543 | INFO     | src.policies:train:130 - Last 100 episodes mean return: 25.03030303030303
2021-08-25 18:16:43.018 | INFO     | src.policies:train:166 - Total loss: 1.000118374824524
2021-08-25 18:16:43.019 | INFO     | src.policies:train:171 - Epoch infos: {'efficiency': 28.454545454545453, 'equality': 0.8083067092930138, 'sustainability': 455.67153466838187, 'peace': 616.1818181818181}
2021-08-25 18:16:43.06

2021-08-25 18:21:14.950 | INFO     | src.policies:train:166 - Total loss: 0.9999059438705444
2021-08-25 18:21:14.951 | INFO     | src.policies:train:171 - Epoch infos: {'efficiency': 23.181818181818183, 'equality': 0.7632798574397006, 'sustainability': 425.50709694346057, 'peace': 637.0}
2021-08-25 18:21:15.009 | INFO     | src.policies:train:106 - Epoch 32 / 4000
2021-08-25 18:21:15.009 | INFO     | src.policies:train:113 - Episode 32
2021-08-25 18:21:45.739 | DEBUG    | src.policies:execute_episode:294 - Early stopping, all agents done
2021-08-25 18:21:45.764 | INFO     | src.policies:train:124 - Episode infos: {'efficiency': 25.454545454545453, 'equality': 0.8175324675620889, 'sustainability': 437.1229949650111, 'peace': 569.9090909090909}
2021-08-25 18:21:45.765 | INFO     | src.policies:train:129 - Mean episode return: 25.454545454545453
2021-08-25 18:21:45.766 | INFO     | src.policies:train:130 - Last 100 episodes mean return: 25.03409090909091
2021-08-25 18:21:56.204 | INFO    

2021-08-25 18:27:14.951 | INFO     | src.policies:train:124 - Episode infos: {'efficiency': 24.09090909090909, 'equality': 0.683018867978899, 'sustainability': 416.72371924106375, 'peace': 561.6363636363636}
2021-08-25 18:27:14.952 | INFO     | src.policies:train:129 - Mean episode return: 24.09090909090909
2021-08-25 18:27:14.953 | INFO     | src.policies:train:130 - Last 100 episodes mean return: 24.8951048951049
2021-08-25 18:27:27.407 | INFO     | src.policies:train:166 - Total loss: 0.999906599521637
2021-08-25 18:27:27.408 | INFO     | src.policies:train:171 - Epoch infos: {'efficiency': 24.09090909090909, 'equality': 0.683018867978899, 'sustainability': 416.72371924106375, 'peace': 561.6363636363636}
2021-08-25 18:27:27.470 | INFO     | src.policies:train:106 - Epoch 40 / 4000
2021-08-25 18:27:27.471 | INFO     | src.policies:train:113 - Episode 40
2021-08-25 18:28:03.388 | DEBUG    | src.policies:execute_episode:294 - Early stopping, all agents done
2021-08-25 18:28:03.425 | IN

## PPO

This section trains a set of Harvest agents using our custom Proximal Policy Optimization (PPO) implementation.

In [None]:
c1 = 1.0
c2 = 0.01
eps = 0.2

In [None]:
ppo_policy_nn = models.MLP(observation_space_size, hidden_sizes, action_space_size)
ppo_baseline_nn = models.MLP(observation_space_size, hidden_sizes, 1, log_softmax=False)
ppo_policy = policies.PPOPolicy(env, ppo_policy_nn, ppo_baseline_nn, c1=c1, c2=c2, eps=eps)
ppo_policy.train(
    epochs,
    steps_per_epoch,
    enable_wandb=True,
    wandb_config={**wandb_config, "group": "PPO"},
    render_every=render_every
)
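
To give an idea of how `c1`, `c2` and `eps` typically enter a PPO update, here is a minimal single-sample sketch of the clipped surrogate loss. It assumes the usual roles from the PPO formulation: `c1` weights the value-function error, `c2` weights the entropy bonus and `eps` is the clipping range. Whether `policies.PPOPolicy` combines them in exactly this way is not shown in this notebook, and the function `ppo_loss` and its arguments are hypothetical.

```python
import math


def ppo_loss(new_logp, old_logp, advantage, value, ret, entropy,
             c1=1.0, c2=0.01, eps=0.2):
    """Clipped PPO loss for one (state, action) sample (assumed form)."""
    # Probability ratio between the new and the old policy
    ratio = math.exp(new_logp - old_logp)
    # Keep the ratio inside [1 - eps, 1 + eps]
    clipped_ratio = max(1.0 - eps, min(1.0 + eps, ratio))
    # Pessimistic (lower) bound of the policy improvement
    policy_term = min(ratio * advantage, clipped_ratio * advantage)
    # Squared error of the baseline against the observed return
    value_term = (value - ret) ** 2
    # The objective is maximized, so the loss is its negation
    return -(policy_term - c1 * value_term + c2 * entropy)


# The ratio is 2 here, so the policy term is clipped at 1 + eps
print(ppo_loss(math.log(2.0), 0.0, advantage=1.0, value=0.0, ret=0.0, entropy=0.0))
```

With a positive advantage the clipping removes any incentive to push the ratio above `1 + eps`. With a negative advantage the unclipped term is the smaller one, so large harmful moves are still fully penalized.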