# CartPole tests with policy gradient

This notebook contains a simple test for each implemented policy gradient method. To check that they work properly, we rely on the [CartPole](https://gym.openai.com/envs/CartPole-v0/) environment, which OpenAI Gym provides out of the box. According to Gym's documentation, CartPole-v0 is considered "solved" when the agent obtains an average return of at least 195 over 100 consecutive episodes (the maximum return of a single episode is 200).
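To make the "solved" criterion concrete, here is a minimal sketch of a rolling-mean check over the most recent episodes. The helper name `make_solved_checker` and its parameters are hypothetical and not part of `src`; the training loop in this repository may track the statistic differently.

```python
from collections import deque


def make_solved_checker(threshold=195.0, window=100):
    """Return a callable that records episode returns and reports whether
    the mean over the last `window` episodes has reached `threshold`."""
    recent = deque(maxlen=window)  # keeps only the most recent `window` returns

    def record(episode_return):
        recent.append(episode_return)
        mean_return = sum(recent) / len(recent)
        # Only declare success once a full window has been observed.
        solved = len(recent) == window and mean_return >= threshold
        return mean_return, solved

    return record


record = make_solved_checker()
print(record(47.0))  # (47.0, False)
print(record(13.0))  # (30.0, False)
```

Before a full window is available, the mean is taken over the episodes seen so far, which is why the second value is 30.0.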

## Pre-requisites

The cells below install and import the libraries needed to run the notebook examples.

In [1]:
import sys
sys.path.append('../')

In [2]:
%%capture
!pip install -r ../init/requirements.txt

In [3]:
import numpy as np
import gym

from src import models, policies

%load_ext autoreload
%autoreload 2

## Utilities

The cells below define the environment and the common variables used throughout the notebook.

In [None]:
env = gym.make('CartPole-v0')

In [3]:
observation_space_size = 4
action_space_size = 2
hidden_sizes = [32, 32]
epochs = 800
steps_per_epoch = 200
episodes_mean_reward = 100
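The observation and action sizes are hard-coded for CartPole (4 observation components, 2 discrete actions). As a hedged alternative, they could be read from the environment's spaces. The sketch below assumes a flat `Box` observation space and a `Discrete` action space; `space_sizes` is a hypothetical helper, and a stand-in object replaces the real environment so the snippet runs without Gym.

```python
from types import SimpleNamespace


def space_sizes(env):
    """Read the observation and action sizes from a Gym-style environment.

    Assumes a flat Box observation space (with a `shape` tuple) and a
    Discrete action space (with an `n` attribute), as in CartPole.
    """
    observation_size = env.observation_space.shape[0]
    action_size = env.action_space.n
    return observation_size, action_size


# Stand-in object mimicking CartPole's spaces, so the sketch runs without Gym.
fake_env = SimpleNamespace(
    observation_space=SimpleNamespace(shape=(4,)),
    action_space=SimpleNamespace(n=2),
)
print(space_sizes(fake_env))  # (4, 2)
```

With the real environment, the call would be `space_sizes(env)`.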

## VPG

This section trains a CartPole agent using our custom Vanilla Policy Gradient (VPG) implementation, with a separate network acting as the baseline.
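As background, policy gradient methods with a baseline typically weight each action's log-probability by an advantage estimate, that is, the return-to-go minus the baseline's prediction. The following is a minimal sketch of that computation in plain Python. It is not taken from `src.policies`, whose internals may differ, and the function names and the `gamma` parameter are assumptions made for illustration.

```python
def rewards_to_go(rewards, gamma=1.0):
    """Discounted sum of future rewards for every step of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the sum already computed for the next one.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def advantages(returns, baseline_values):
    """Advantage estimate: return-to-go minus the baseline's prediction."""
    return [g - b for g, b in zip(returns, baseline_values)]


episode_rewards = [1.0, 1.0, 1.0]  # CartPole gives +1 for every step survived
rtg = rewards_to_go(episode_rewards)
print(rtg)                               # [3.0, 2.0, 1.0]
print(advantages(rtg, [2.5, 2.0, 0.5]))  # [0.5, 0.0, 0.5]
```

The baseline values here are made-up numbers; in the cell below they would come from the baseline network.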

In [10]:
vpg_policy_nn = models.MLP(observation_space_size, hidden_sizes, action_space_size)
vpg_baseline_nn = models.MLP(observation_space_size, hidden_sizes, 1, log_softmax=False)
vpg_policy = policies.VPGPolicy(env, vpg_policy_nn, baseline_nn=vpg_baseline_nn)
vpg_policy.train(
    epochs,
    steps_per_epoch,
    enable_wandb=False,
    episodes_mean_reward=episodes_mean_reward
)

2021-08-25 10:46:14.849 | INFO     | src.policies:train:103 - Epoch 1 / 800
2021-08-25 10:46:14.850 | INFO     | src.policies:train:109 - Episode 1
2021-08-25 10:46:14.864 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:14.865 | INFO     | src.policies:train:121 - Mean episode return: 47.0
2021-08-25 10:46:14.865 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 47.0
2021-08-25 10:46:14.866 | INFO     | src.policies:train:109 - Episode 2
2021-08-25 10:46:14.872 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:14.873 | INFO     | src.policies:train:121 - Mean episode return: 13.0
2021-08-25 10:46:14.873 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 30.0
2021-08-25 10:46:14.874 | INFO     | src.policies:train:109 - Episode 3
2021-08-25 10:46:14.882 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:14.882 

2021-08-25 10:46:15.106 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:15.107 | INFO     | src.policies:train:121 - Mean episode return: 18.0
2021-08-25 10:46:15.108 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 25.523809523809526
2021-08-25 10:46:15.109 | INFO     | src.policies:train:109 - Episode 22
2021-08-25 10:46:15.117 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:15.118 | INFO     | src.policies:train:121 - Mean episode return: 21.0
2021-08-25 10:46:15.119 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 25.318181818181817
2021-08-25 10:46:15.119 | INFO     | src.policies:train:109 - Episode 23
2021-08-25 10:46:15.130 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:15.131 | INFO     | src.policies:train:121 - Mean episode return: 33.0
2021-08-25 10:46:15.132 | INFO     | src.policies:trai

2021-08-25 10:46:15.353 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:15.354 | INFO     | src.policies:train:121 - Mean episode return: 22.0
2021-08-25 10:46:15.354 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 25.4390243902439
2021-08-25 10:46:15.355 | INFO     | src.policies:train:109 - Episode 42
2021-08-25 10:46:15.360 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:15.361 | INFO     | src.policies:train:121 - Mean episode return: 12.0
2021-08-25 10:46:15.362 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 25.11904761904762
2021-08-25 10:46:15.363 | INFO     | src.policies:train:109 - Episode 43
2021-08-25 10:46:15.372 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:15.373 | INFO     | src.policies:train:121 - Mean episode return: 22.0
2021-08-25 10:46:15.373 | INFO     | src.policies:train:1

2021-08-25 10:46:15.585 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:15.586 | INFO     | src.policies:train:121 - Mean episode return: 21.0
2021-08-25 10:46:15.586 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 24.557377049180328
2021-08-25 10:46:15.587 | INFO     | src.policies:train:109 - Episode 62
2021-08-25 10:46:15.598 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:15.599 | INFO     | src.policies:train:121 - Mean episode return: 34.0
2021-08-25 10:46:15.600 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 24.70967741935484
2021-08-25 10:46:15.605 | INFO     | src.policies:train:157 - Total loss: 259.30316162109375
2021-08-25 10:46:15.606 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:46:15.608 | INFO     | src.policies:train:103 - Epoch 8 / 800
2021-08-25 10:46:15.609 | INFO     | src.policies:train:109 - Episode 63


2021-08-25 10:46:15.801 | INFO     | src.policies:train:121 - Mean episode return: 9.0
2021-08-25 10:46:15.802 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 23.049382716049383
2021-08-25 10:46:15.803 | INFO     | src.policies:train:109 - Episode 82
2021-08-25 10:46:15.812 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:15.813 | INFO     | src.policies:train:121 - Mean episode return: 15.0
2021-08-25 10:46:15.814 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 22.951219512195124
2021-08-25 10:46:15.815 | INFO     | src.policies:train:109 - Episode 83
2021-08-25 10:46:15.834 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:15.835 | INFO     | src.policies:train:121 - Mean episode return: 48.0
2021-08-25 10:46:15.836 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 23.253012048192772
2021-08-25 10:46:15.837 | INFO     | src.polici

2021-08-25 10:46:16.090 | INFO     | src.policies:train:121 - Mean episode return: 17.0
2021-08-25 10:46:16.091 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 23.83
2021-08-25 10:46:16.091 | INFO     | src.policies:train:109 - Episode 101
2021-08-25 10:46:16.099 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:16.099 | INFO     | src.policies:train:121 - Mean episode return: 18.0
2021-08-25 10:46:16.100 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 23.54
2021-08-25 10:46:16.101 | INFO     | src.policies:train:109 - Episode 102
2021-08-25 10:46:16.108 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:16.109 | INFO     | src.policies:train:121 - Mean episode return: 17.0
2021-08-25 10:46:16.110 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 23.58
2021-08-25 10:46:16.111 | INFO     | src.policies:train:109 - Episode 103
2021-08-2

2021-08-25 10:46:16.319 | INFO     | src.policies:train:109 - Episode 121
2021-08-25 10:46:16.326 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:16.327 | INFO     | src.policies:train:121 - Mean episode return: 14.0
2021-08-25 10:46:16.327 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 22.72
2021-08-25 10:46:16.328 | INFO     | src.policies:train:109 - Episode 122
2021-08-25 10:46:16.346 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:16.347 | INFO     | src.policies:train:121 - Mean episode return: 56.0
2021-08-25 10:46:16.348 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 23.07
2021-08-25 10:46:16.349 | INFO     | src.policies:train:109 - Episode 123
2021-08-25 10:46:16.358 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:16.359 | INFO     | src.policies:train:121 - Mean episode return: 24.0
2021

2021-08-25 10:46:16.562 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 21.84
2021-08-25 10:46:16.563 | INFO     | src.policies:train:109 - Episode 142
2021-08-25 10:46:16.568 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:16.569 | INFO     | src.policies:train:121 - Mean episode return: 11.0
2021-08-25 10:46:16.570 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 21.83
2021-08-25 10:46:16.571 | INFO     | src.policies:train:109 - Episode 143
2021-08-25 10:46:16.578 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:16.579 | INFO     | src.policies:train:121 - Mean episode return: 16.0
2021-08-25 10:46:16.579 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 21.77
2021-08-25 10:46:16.580 | INFO     | src.policies:train:109 - Episode 144
2021-08-25 10:46:16.588 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agent

2021-08-25 10:46:16.799 | INFO     | src.policies:train:121 - Mean episode return: 20.0
2021-08-25 10:46:16.800 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 21.18
2021-08-25 10:46:16.800 | INFO     | src.policies:train:109 - Episode 163
2021-08-25 10:46:16.812 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:16.813 | INFO     | src.policies:train:121 - Mean episode return: 32.0
2021-08-25 10:46:16.814 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 21.05
2021-08-25 10:46:16.814 | INFO     | src.policies:train:109 - Episode 164
2021-08-25 10:46:16.822 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:16.823 | INFO     | src.policies:train:121 - Mean episode return: 21.0
2021-08-25 10:46:16.824 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 21.09
2021-08-25 10:46:16.825 | INFO     | src.policies:train:109 - Episode 165
2021-08-2

2021-08-25 10:46:17.052 | INFO     | src.policies:train:109 - Episode 183
2021-08-25 10:46:17.068 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:17.069 | INFO     | src.policies:train:121 - Mean episode return: 46.0
2021-08-25 10:46:17.069 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 22.5
2021-08-25 10:46:17.070 | INFO     | src.policies:train:109 - Episode 184
2021-08-25 10:46:17.079 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:17.079 | INFO     | src.policies:train:121 - Mean episode return: 19.0
2021-08-25 10:46:17.080 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 22.57
2021-08-25 10:46:17.081 | INFO     | src.policies:train:109 - Episode 185
2021-08-25 10:46:17.091 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:17.092 | INFO     | src.policies:train:121 - Mean episode return: 28.0
2021-

2021-08-25 10:46:17.312 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 22.09
2021-08-25 10:46:17.312 | INFO     | src.policies:train:109 - Episode 204
2021-08-25 10:46:17.318 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:17.320 | INFO     | src.policies:train:121 - Mean episode return: 14.0
2021-08-25 10:46:17.320 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 22.12
2021-08-25 10:46:17.321 | INFO     | src.policies:train:109 - Episode 205
2021-08-25 10:46:17.331 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:17.332 | INFO     | src.policies:train:121 - Mean episode return: 26.0
2021-08-25 10:46:17.332 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 22.0
2021-08-25 10:46:17.333 | INFO     | src.policies:train:109 - Episode 206
2021-08-25 10:46:17.344 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents

2021-08-25 10:46:17.551 | INFO     | src.policies:train:121 - Mean episode return: 14.0
2021-08-25 10:46:17.552 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 21.48
2021-08-25 10:46:17.552 | INFO     | src.policies:train:109 - Episode 225
2021-08-25 10:46:17.561 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:17.561 | INFO     | src.policies:train:121 - Mean episode return: 21.0
2021-08-25 10:46:17.562 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 21.54
2021-08-25 10:46:17.563 | INFO     | src.policies:train:109 - Episode 226
2021-08-25 10:46:17.573 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:17.574 | INFO     | src.policies:train:121 - Mean episode return: 25.0
2021-08-25 10:46:17.574 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 21.63
2021-08-25 10:46:17.575 | INFO     | src.policies:train:109 - Episode 227
2021-08-2

2021-08-25 10:46:17.807 | INFO     | src.policies:train:109 - Episode 245
2021-08-25 10:46:17.819 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:17.819 | INFO     | src.policies:train:121 - Mean episode return: 32.0
2021-08-25 10:46:17.820 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 22.41
2021-08-25 10:46:17.827 | INFO     | src.policies:train:157 - Total loss: 169.75656127929688
2021-08-25 10:46:17.827 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:46:17.830 | INFO     | src.policies:train:103 - Epoch 27 / 800
2021-08-25 10:46:17.831 | INFO     | src.policies:train:109 - Episode 246
2021-08-25 10:46:17.836 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:17.837 | INFO     | src.policies:train:121 - Mean episode return: 11.0
2021-08-25 10:46:17.837 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 22.42
2021-08-25 10:46:17.83

2021-08-25 10:46:18.107 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 23.44
2021-08-25 10:46:18.108 | INFO     | src.policies:train:109 - Episode 265
2021-08-25 10:46:18.113 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:18.114 | INFO     | src.policies:train:121 - Mean episode return: 9.0
2021-08-25 10:46:18.115 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 23.4
2021-08-25 10:46:18.116 | INFO     | src.policies:train:109 - Episode 266
2021-08-25 10:46:18.125 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:18.126 | INFO     | src.policies:train:121 - Mean episode return: 25.0
2021-08-25 10:46:18.126 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 23.55
2021-08-25 10:46:18.127 | INFO     | src.policies:train:109 - Episode 267
2021-08-25 10:46:18.137 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents 

2021-08-25 10:46:18.424 | INFO     | src.policies:train:121 - Mean episode return: 19.0
2021-08-25 10:46:18.425 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 24.11
2021-08-25 10:46:18.426 | INFO     | src.policies:train:109 - Episode 286
2021-08-25 10:46:18.441 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:18.442 | INFO     | src.policies:train:121 - Mean episode return: 40.0
2021-08-25 10:46:18.443 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 24.18
2021-08-25 10:46:18.451 | INFO     | src.policies:train:157 - Total loss: 466.3680419921875
2021-08-25 10:46:18.451 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:46:18.455 | INFO     | src.policies:train:103 - Epoch 32 / 800
2021-08-25 10:46:18.456 | INFO     | src.policies:train:109 - Episode 287
2021-08-25 10:46:18.464 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:18.465

2021-08-25 10:46:18.755 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:18.756 | INFO     | src.policies:train:121 - Mean episode return: 12.0
2021-08-25 10:46:18.757 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 25.27
2021-08-25 10:46:18.758 | INFO     | src.policies:train:109 - Episode 306
2021-08-25 10:46:18.769 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:18.771 | INFO     | src.policies:train:121 - Mean episode return: 32.0
2021-08-25 10:46:18.771 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 25.3
2021-08-25 10:46:18.773 | INFO     | src.policies:train:109 - Episode 307
2021-08-25 10:46:18.780 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:18.781 | INFO     | src.policies:train:121 - Mean episode return: 11.0
2021-08-25 10:46:18.782 | INFO     | src.policies:train:122 - Last 100 episodes

2021-08-25 10:46:19.063 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 26.66
2021-08-25 10:46:19.063 | INFO     | src.policies:train:109 - Episode 326
2021-08-25 10:46:19.073 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:19.074 | INFO     | src.policies:train:121 - Mean episode return: 17.0
2021-08-25 10:46:19.075 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 26.58
2021-08-25 10:46:19.076 | INFO     | src.policies:train:109 - Episode 327
2021-08-25 10:46:19.084 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:19.085 | INFO     | src.policies:train:121 - Mean episode return: 14.0
2021-08-25 10:46:19.085 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 26.56
2021-08-25 10:46:19.094 | INFO     | src.policies:train:157 - Total loss: 123.33795928955078
2021-08-25 10:46:19.095 | INFO     | src.policies:train:161 - Epoch infos: {}


2021-08-25 10:46:19.389 | INFO     | src.policies:train:121 - Mean episode return: 71.0
2021-08-25 10:46:19.390 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 28.04
2021-08-25 10:46:19.391 | INFO     | src.policies:train:109 - Episode 346
2021-08-25 10:46:19.400 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:19.401 | INFO     | src.policies:train:121 - Mean episode return: 22.0
2021-08-25 10:46:19.401 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 28.15
2021-08-25 10:46:19.402 | INFO     | src.policies:train:109 - Episode 347
2021-08-25 10:46:19.408 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:19.409 | INFO     | src.policies:train:121 - Mean episode return: 12.0
2021-08-25 10:46:19.410 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 28.02
2021-08-25 10:46:19.411 | INFO     | src.policies:train:109 - Episode 348
2021-08-2

2021-08-25 10:46:19.655 | INFO     | src.policies:train:109 - Episode 366
2021-08-25 10:46:19.666 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:19.667 | INFO     | src.policies:train:121 - Mean episode return: 26.0
2021-08-25 10:46:19.667 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 27.91
2021-08-25 10:46:19.668 | INFO     | src.policies:train:109 - Episode 367
2021-08-25 10:46:19.679 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:19.680 | INFO     | src.policies:train:121 - Mean episode return: 26.0
2021-08-25 10:46:19.681 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 27.89
2021-08-25 10:46:19.688 | INFO     | src.policies:train:157 - Total loss: 165.20335388183594
2021-08-25 10:46:19.689 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:46:19.691 | INFO     | src.policies:train:103 - Epoch 42 / 800
2021-08-25 10:46:19.69

2021-08-25 10:46:19.938 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 27.04
2021-08-25 10:46:19.938 | INFO     | src.policies:train:109 - Episode 386
2021-08-25 10:46:19.945 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:19.946 | INFO     | src.policies:train:121 - Mean episode return: 12.0
2021-08-25 10:46:19.947 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 26.76
2021-08-25 10:46:19.948 | INFO     | src.policies:train:109 - Episode 387
2021-08-25 10:46:19.956 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:19.957 | INFO     | src.policies:train:121 - Mean episode return: 15.0
2021-08-25 10:46:19.957 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 26.71
2021-08-25 10:46:19.958 | INFO     | src.policies:train:109 - Episode 388
2021-08-25 10:46:19.971 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agent

2021-08-25 10:46:20.227 | INFO     | src.policies:train:121 - Mean episode return: 14.0
2021-08-25 10:46:20.227 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 26.25
2021-08-25 10:46:20.228 | INFO     | src.policies:train:109 - Episode 407
2021-08-25 10:46:20.235 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:20.236 | INFO     | src.policies:train:121 - Mean episode return: 13.0
2021-08-25 10:46:20.236 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 26.27
2021-08-25 10:46:20.237 | INFO     | src.policies:train:109 - Episode 408
2021-08-25 10:46:20.255 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:20.256 | INFO     | src.policies:train:121 - Mean episode return: 50.0
2021-08-25 10:46:20.257 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 26.66
2021-08-25 10:46:20.258 | INFO     | src.policies:train:109 - Episode 409
2021-08-2

2021-08-25 10:46:20.524 | INFO     | src.policies:train:109 - Episode 427
2021-08-25 10:46:20.536 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:20.537 | INFO     | src.policies:train:121 - Mean episode return: 31.0
2021-08-25 10:46:20.537 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 26.88
2021-08-25 10:46:20.544 | INFO     | src.policies:train:157 - Total loss: 129.65711975097656
2021-08-25 10:46:20.544 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:46:20.547 | INFO     | src.policies:train:103 - Epoch 49 / 800
2021-08-25 10:46:20.548 | INFO     | src.policies:train:109 - Episode 428
2021-08-25 10:46:20.565 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:20.566 | INFO     | src.policies:train:121 - Mean episode return: 51.0
2021-08-25 10:46:20.567 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 27.18
2021-08-25 10:46:20.56

2021-08-25 10:46:20.825 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 26.0
2021-08-25 10:46:20.826 | INFO     | src.policies:train:109 - Episode 447
2021-08-25 10:46:20.833 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:20.834 | INFO     | src.policies:train:121 - Mean episode return: 12.0
2021-08-25 10:46:20.835 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 26.0
2021-08-25 10:46:20.836 | INFO     | src.policies:train:109 - Episode 448
2021-08-25 10:46:20.844 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:20.845 | INFO     | src.policies:train:121 - Mean episode return: 19.0
2021-08-25 10:46:20.846 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 25.95
2021-08-25 10:46:20.846 | INFO     | src.policies:train:109 - Episode 449
2021-08-25 10:46:20.857 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents 

2021-08-25 10:46:21.128 | INFO     | src.policies:train:121 - Mean episode return: 10.0
2021-08-25 10:46:21.129 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 26.89
2021-08-25 10:46:21.130 | INFO     | src.policies:train:109 - Episode 467
2021-08-25 10:46:21.140 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:21.142 | INFO     | src.policies:train:121 - Mean episode return: 17.0
2021-08-25 10:46:21.142 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 26.8
2021-08-25 10:46:21.143 | INFO     | src.policies:train:109 - Episode 468
2021-08-25 10:46:21.151 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:21.152 | INFO     | src.policies:train:121 - Mean episode return: 14.0
2021-08-25 10:46:21.153 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 26.8
2021-08-25 10:46:21.154 | INFO     | src.policies:train:109 - Episode 469
2021-08-25 

2021-08-25 10:46:21.423 | INFO     | src.policies:train:121 - Mean episode return: 35.0
2021-08-25 10:46:21.424 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 27.11
2021-08-25 10:46:21.425 | INFO     | src.policies:train:109 - Episode 488
2021-08-25 10:46:21.439 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:21.440 | INFO     | src.policies:train:121 - Mean episode return: 36.0
2021-08-25 10:46:21.441 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 27.1
2021-08-25 10:46:21.442 | INFO     | src.policies:train:109 - Episode 489
2021-08-25 10:46:21.449 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:21.450 | INFO     | src.policies:train:121 - Mean episode return: 12.0
2021-08-25 10:46:21.451 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 26.98
2021-08-25 10:46:21.452 | INFO     | src.policies:train:109 - Episode 490
2021-08-25

2021-08-25 10:46:21.721 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:21.723 | INFO     | src.policies:train:121 - Mean episode return: 44.0
2021-08-25 10:46:21.723 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 27.45
2021-08-25 10:46:21.724 | INFO     | src.policies:train:109 - Episode 508
2021-08-25 10:46:21.736 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:21.737 | INFO     | src.policies:train:121 - Mean episode return: 29.0
2021-08-25 10:46:21.738 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 27.24
2021-08-25 10:46:21.739 | INFO     | src.policies:train:109 - Episode 509
2021-08-25 10:46:21.747 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:21.748 | INFO     | src.policies:train:121 - Mean episode return: 17.0
2021-08-25 10:46:21.748 | INFO     | src.policies:train:122 - Last 100 episode

2021-08-25 10:46:21.993 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 26.41
2021-08-25 10:46:21.994 | INFO     | src.policies:train:109 - Episode 528
2021-08-25 10:46:22.001 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:22.002 | INFO     | src.policies:train:121 - Mean episode return: 17.0
2021-08-25 10:46:22.003 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 26.07
2021-08-25 10:46:22.004 | INFO     | src.policies:train:109 - Episode 529
2021-08-25 10:46:22.014 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:22.015 | INFO     | src.policies:train:121 - Mean episode return: 24.0
2021-08-25 10:46:22.016 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 26.12
2021-08-25 10:46:22.017 | INFO     | src.policies:train:109 - Episode 530
2021-08-25 10:46:22.027 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agent

2021-08-25 10:46:22.260 | INFO     | src.policies:train:121 - Mean episode return: 27.0
2021-08-25 10:46:22.260 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 25.35
2021-08-25 10:46:22.261 | INFO     | src.policies:train:109 - Episode 549
2021-08-25 10:46:22.271 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:22.272 | INFO     | src.policies:train:121 - Mean episode return: 23.0
2021-08-25 10:46:22.273 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 25.33
2021-08-25 10:46:22.274 | INFO     | src.policies:train:109 - Episode 550
2021-08-25 10:46:22.280 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:22.281 | INFO     | src.policies:train:121 - Mean episode return: 12.0
2021-08-25 10:46:22.282 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 25.32
2021-08-25 10:46:22.283 | INFO     | src.policies:train:109 - Episode 551
2021-08-2

2021-08-25 10:46:22.533 | INFO     | src.policies:train:109 - Episode 569
2021-08-25 10:46:22.541 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:22.542 | INFO     | src.policies:train:121 - Mean episode return: 19.0
2021-08-25 10:46:22.543 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 24.32
2021-08-25 10:46:22.544 | INFO     | src.policies:train:109 - Episode 570
2021-08-25 10:46:22.552 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:22.553 | INFO     | src.policies:train:121 - Mean episode return: 18.0
2021-08-25 10:46:22.554 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 24.2
2021-08-25 10:46:22.560 | INFO     | src.policies:train:157 - Total loss: 114.65805053710938
2021-08-25 10:46:22.561 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:46:22.564 | INFO     | src.policies:train:103 - Epoch 66 / 800
2021-08-25 10:46:22.565

2021-08-25 10:46:22.834 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 24.73
2021-08-25 10:46:22.835 | INFO     | src.policies:train:109 - Episode 589
2021-08-25 10:46:22.843 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:22.844 | INFO     | src.policies:train:121 - Mean episode return: 18.0
2021-08-25 10:46:22.845 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 24.79
2021-08-25 10:46:22.846 | INFO     | src.policies:train:109 - Episode 590
2021-08-25 10:46:22.853 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:22.854 | INFO     | src.policies:train:121 - Mean episode return: 12.0
2021-08-25 10:46:22.854 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 24.68
2021-08-25 10:46:22.855 | INFO     | src.policies:train:109 - Episode 591
2021-08-25 10:46:22.867 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agent

2021-08-25 10:46:23.119 | INFO     | src.policies:train:121 - Mean episode return: 26.0
2021-08-25 10:46:23.120 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 24.34
2021-08-25 10:46:23.120 | INFO     | src.policies:train:109 - Episode 610
2021-08-25 10:46:23.131 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:23.132 | INFO     | src.policies:train:121 - Mean episode return: 23.0
2021-08-25 10:46:23.133 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 24.43
2021-08-25 10:46:23.134 | INFO     | src.policies:train:109 - Episode 611
2021-08-25 10:46:23.147 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:23.148 | INFO     | src.policies:train:121 - Mean episode return: 32.0
2021-08-25 10:46:23.148 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 24.58
2021-08-25 10:46:23.155 | INFO     | src.policies:train:157 - Total loss: 155.15640

2021-08-25 10:46:23.466 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:23.467 | INFO     | src.policies:train:121 - Mean episode return: 36.0
2021-08-25 10:46:23.468 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 26.69
2021-08-25 10:46:23.469 | INFO     | src.policies:train:109 - Episode 630
2021-08-25 10:46:23.479 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:23.480 | INFO     | src.policies:train:121 - Mean episode return: 21.0
2021-08-25 10:46:23.480 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 26.65
2021-08-25 10:46:23.487 | INFO     | src.policies:train:157 - Total loss: 642.825439453125
2021-08-25 10:46:23.488 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:46:23.490 | INFO     | src.policies:train:103 - Epoch 74 / 800
2021-08-25 10:46:23.491 | INFO     | src.policies:train:109 - Episode 631

2021-08-25 10:46:23.742 | INFO     | src.policies:train:103 - Epoch 76 / 800
2021-08-25 10:46:23.743 | INFO     | src.policies:train:109 - Episode 649
2021-08-25 10:46:23.758 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:23.759 | INFO     | src.policies:train:121 - Mean episode return: 46.0
2021-08-25 10:46:23.760 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 27.51
2021-08-25 10:46:23.761 | INFO     | src.policies:train:109 - Episode 650
2021-08-25 10:46:23.767 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:23.768 | INFO     | src.policies:train:121 - Mean episode return: 9.0
2021-08-25 10:46:23.769 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 27.48
2021-08-25 10:46:23.770 | INFO     | src.policies:train:109 - Episode 651
2021-08-25 10:46:23.778 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done

2021-08-25 10:46:24.024 | INFO     | src.policies:train:121 - Mean episode return: 19.0
2021-08-25 10:46:24.025 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 27.46
2021-08-25 10:46:24.026 | INFO     | src.policies:train:109 - Episode 670
2021-08-25 10:46:24.033 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:24.034 | INFO     | src.policies:train:121 - Mean episode return: 15.0
2021-08-25 10:46:24.035 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 27.43
2021-08-25 10:46:24.036 | INFO     | src.policies:train:109 - Episode 671
2021-08-25 10:46:24.044 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:24.045 | INFO     | src.policies:train:121 - Mean episode return: 20.0
2021-08-25 10:46:24.046 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 27.45
2021-08-25 10:46:24.047 | INFO     | src.policies:train:109 - Episode 672

2021-08-25 10:46:24.333 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:24.334 | INFO     | src.policies:train:121 - Mean episode return: 51.0
2021-08-25 10:46:24.334 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 27.74
2021-08-25 10:46:24.340 | INFO     | src.policies:train:157 - Total loss: 217.71441650390625
2021-08-25 10:46:24.341 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:46:24.344 | INFO     | src.policies:train:103 - Epoch 81 / 800
2021-08-25 10:46:24.345 | INFO     | src.policies:train:109 - Episode 691
2021-08-25 10:46:24.362 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:24.363 | INFO     | src.policies:train:121 - Mean episode return: 50.0
2021-08-25 10:46:24.364 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 27.97
2021-08-25 10:46:24.365 | INFO     | src.policies:train:109 - Episode 692

2021-08-25 10:46:24.630 | INFO     | src.policies:train:109 - Episode 710
2021-08-25 10:46:24.647 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:24.648 | INFO     | src.policies:train:121 - Mean episode return: 47.0
2021-08-25 10:46:24.648 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 28.81
2021-08-25 10:46:24.649 | INFO     | src.policies:train:109 - Episode 711
2021-08-25 10:46:24.656 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:24.657 | INFO     | src.policies:train:121 - Mean episode return: 13.0
2021-08-25 10:46:24.658 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 28.62
2021-08-25 10:46:24.659 | INFO     | src.policies:train:109 - Episode 712
2021-08-25 10:46:24.666 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:24.667 | INFO     | src.policies:train:121 - Mean episode return: 16.0

2021-08-25 10:46:24.933 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 27.13
2021-08-25 10:46:24.939 | INFO     | src.policies:train:157 - Total loss: 187.53970336914062
2021-08-25 10:46:24.940 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:46:24.943 | INFO     | src.policies:train:103 - Epoch 86 / 800
2021-08-25 10:46:24.944 | INFO     | src.policies:train:109 - Episode 731
2021-08-25 10:46:24.951 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:24.952 | INFO     | src.policies:train:121 - Mean episode return: 17.0
2021-08-25 10:46:24.953 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 26.98
2021-08-25 10:46:24.954 | INFO     | src.policies:train:109 - Episode 732
2021-08-25 10:46:24.961 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:24.962 | INFO     | src.policies:train:121 - Mean episode return: 16.0

2021-08-25 10:46:25.249 | INFO     | src.policies:train:121 - Mean episode return: 12.0
2021-08-25 10:46:25.250 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 26.93
2021-08-25 10:46:25.250 | INFO     | src.policies:train:109 - Episode 751
2021-08-25 10:46:25.273 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:25.274 | INFO     | src.policies:train:121 - Mean episode return: 64.0
2021-08-25 10:46:25.275 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 27.35
2021-08-25 10:46:25.276 | INFO     | src.policies:train:109 - Episode 752
2021-08-25 10:46:25.292 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:25.293 | INFO     | src.policies:train:121 - Mean episode return: 45.0
2021-08-25 10:46:25.294 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 27.59
2021-08-25 10:46:25.295 | INFO     | src.policies:train:109 - Episode 753

2021-08-25 10:46:25.565 | INFO     | src.policies:train:109 - Episode 771
2021-08-25 10:46:25.572 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:25.573 | INFO     | src.policies:train:121 - Mean episode return: 14.0
2021-08-25 10:46:25.574 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 28.0
2021-08-25 10:46:25.580 | INFO     | src.policies:train:157 - Total loss: 247.47854614257812
2021-08-25 10:46:25.580 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:46:25.583 | INFO     | src.policies:train:103 - Epoch 91 / 800
2021-08-25 10:46:25.584 | INFO     | src.policies:train:109 - Episode 772
2021-08-25 10:46:25.596 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:25.598 | INFO     | src.policies:train:121 - Mean episode return: 32.0
2021-08-25 10:46:25.598 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 28.2

2021-08-25 10:46:25.903 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 28.61
2021-08-25 10:46:25.904 | INFO     | src.policies:train:109 - Episode 791
2021-08-25 10:46:25.914 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:25.914 | INFO     | src.policies:train:121 - Mean episode return: 20.0
2021-08-25 10:46:25.915 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 28.31
2021-08-25 10:46:25.916 | INFO     | src.policies:train:109 - Episode 792
2021-08-25 10:46:25.933 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:25.934 | INFO     | src.policies:train:121 - Mean episode return: 44.0
2021-08-25 10:46:25.934 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 28.55
2021-08-25 10:46:25.941 | INFO     | src.policies:train:157 - Total loss: 272.3721008300781
2021-08-25 10:46:25.941 | INFO     | src.policies:train:161 - Epoch infos: {}

2021-08-25 10:46:26.244 | INFO     | src.policies:train:121 - Mean episode return: 31.0
2021-08-25 10:46:26.245 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 29.06
2021-08-25 10:46:26.246 | INFO     | src.policies:train:109 - Episode 811
2021-08-25 10:46:26.254 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:26.255 | INFO     | src.policies:train:121 - Mean episode return: 19.0
2021-08-25 10:46:26.256 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 29.12
2021-08-25 10:46:26.257 | INFO     | src.policies:train:109 - Episode 812
2021-08-25 10:46:26.267 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:26.268 | INFO     | src.policies:train:121 - Mean episode return: 25.0
2021-08-25 10:46:26.269 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 29.21
2021-08-25 10:46:26.270 | INFO     | src.policies:train:109 - Episode 813

2021-08-25 10:46:26.521 | INFO     | src.policies:train:109 - Episode 831
2021-08-25 10:46:26.528 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:26.529 | INFO     | src.policies:train:121 - Mean episode return: 14.0
2021-08-25 10:46:26.530 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 28.46
2021-08-25 10:46:26.537 | INFO     | src.policies:train:157 - Total loss: 168.76844787597656
2021-08-25 10:46:26.538 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:46:26.541 | INFO     | src.policies:train:103 - Epoch 99 / 800
2021-08-25 10:46:26.542 | INFO     | src.policies:train:109 - Episode 832
2021-08-25 10:46:26.550 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:26.552 | INFO     | src.policies:train:121 - Mean episode return: 18.0
2021-08-25 10:46:26.553 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 28.48

2021-08-25 10:46:26.866 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 29.53
2021-08-25 10:46:26.866 | INFO     | src.policies:train:109 - Episode 851
2021-08-25 10:46:26.877 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:26.878 | INFO     | src.policies:train:121 - Mean episode return: 27.0
2021-08-25 10:46:26.879 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 29.16
2021-08-25 10:46:26.880 | INFO     | src.policies:train:109 - Episode 852
2021-08-25 10:46:26.891 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:26.892 | INFO     | src.policies:train:121 - Mean episode return: 25.0
2021-08-25 10:46:26.893 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 28.96
2021-08-25 10:46:26.894 | INFO     | src.policies:train:109 - Episode 853
2021-08-25 10:46:26.909 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done

2021-08-25 10:46:27.215 | INFO     | src.policies:train:121 - Mean episode return: 10.0
2021-08-25 10:46:27.216 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 29.89
2021-08-25 10:46:27.217 | INFO     | src.policies:train:109 - Episode 871
2021-08-25 10:46:27.224 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:27.225 | INFO     | src.policies:train:121 - Mean episode return: 14.0
2021-08-25 10:46:27.226 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 29.89
2021-08-25 10:46:27.226 | INFO     | src.policies:train:109 - Episode 872
2021-08-25 10:46:27.244 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:27.245 | INFO     | src.policies:train:121 - Mean episode return: 47.0
2021-08-25 10:46:27.246 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 30.04
2021-08-25 10:46:27.247 | INFO     | src.policies:train:109 - Episode 873

2021-08-25 10:46:27.604 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:27.605 | INFO     | src.policies:train:121 - Mean episode return: 15.0
2021-08-25 10:46:27.606 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 30.17
2021-08-25 10:46:27.607 | INFO     | src.policies:train:109 - Episode 891
2021-08-25 10:46:27.633 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:27.635 | INFO     | src.policies:train:121 - Mean episode return: 60.0
2021-08-25 10:46:27.636 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 30.57
2021-08-25 10:46:27.637 | INFO     | src.policies:train:109 - Episode 892
2021-08-25 10:46:27.645 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:27.646 | INFO     | src.policies:train:121 - Mean episode return: 10.0
2021-08-25 10:46:27.647 | INFO     | src.policies:train:122 - Last 100 episode

2021-08-25 10:46:28.006 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 30.6
2021-08-25 10:46:28.007 | INFO     | src.policies:train:109 - Episode 910
2021-08-25 10:46:28.022 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:28.023 | INFO     | src.policies:train:121 - Mean episode return: 29.0
2021-08-25 10:46:28.025 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 30.58
2021-08-25 10:46:28.026 | INFO     | src.policies:train:109 - Episode 911
2021-08-25 10:46:28.048 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:28.049 | INFO     | src.policies:train:121 - Mean episode return: 54.0
2021-08-25 10:46:28.050 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 30.93
2021-08-25 10:46:28.050 | INFO     | src.policies:train:109 - Episode 912
2021-08-25 10:46:28.071 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done

2021-08-25 10:46:28.386 | INFO     | src.policies:train:157 - Total loss: 187.78887939453125
2021-08-25 10:46:28.387 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:46:28.390 | INFO     | src.policies:train:103 - Epoch 113 / 800
2021-08-25 10:46:28.392 | INFO     | src.policies:train:109 - Episode 930
2021-08-25 10:46:28.408 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:28.409 | INFO     | src.policies:train:121 - Mean episode return: 39.0
2021-08-25 10:46:28.410 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 31.75
2021-08-25 10:46:28.411 | INFO     | src.policies:train:109 - Episode 931
2021-08-25 10:46:28.430 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:28.431 | INFO     | src.policies:train:121 - Mean episode return: 43.0
2021-08-25 10:46:28.432 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 32.04

2021-08-25 10:46:28.768 | INFO     | src.policies:train:109 - Episode 950
2021-08-25 10:46:28.781 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:28.783 | INFO     | src.policies:train:121 - Mean episode return: 26.0
2021-08-25 10:46:28.783 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 31.85
2021-08-25 10:46:28.784 | INFO     | src.policies:train:109 - Episode 951
2021-08-25 10:46:28.793 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:28.794 | INFO     | src.policies:train:121 - Mean episode return: 18.0
2021-08-25 10:46:28.795 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 31.76
2021-08-25 10:46:28.801 | INFO     | src.policies:train:157 - Total loss: 137.84075927734375
2021-08-25 10:46:28.802 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:46:28.805 | INFO     | src.policies:train:103 - Epoch 116 / 800

2021-08-25 10:46:29.165 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 33.2
2021-08-25 10:46:29.166 | INFO     | src.policies:train:109 - Episode 969
2021-08-25 10:46:29.177 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:29.178 | INFO     | src.policies:train:121 - Mean episode return: 23.0
2021-08-25 10:46:29.179 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 33.1
2021-08-25 10:46:29.180 | INFO     | src.policies:train:109 - Episode 970
2021-08-25 10:46:29.189 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:29.191 | INFO     | src.policies:train:121 - Mean episode return: 19.0
2021-08-25 10:46:29.191 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 33.19
2021-08-25 10:46:29.192 | INFO     | src.policies:train:109 - Episode 971
2021-08-25 10:46:29.204 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done

2021-08-25 10:46:29.522 | INFO     | src.policies:train:121 - Mean episode return: 29.0
2021-08-25 10:46:29.523 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 32.61
2021-08-25 10:46:29.529 | INFO     | src.policies:train:157 - Total loss: 220.91995239257812
2021-08-25 10:46:29.530 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:46:29.533 | INFO     | src.policies:train:103 - Epoch 122 / 800
2021-08-25 10:46:29.534 | INFO     | src.policies:train:109 - Episode 990
2021-08-25 10:46:29.551 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:29.552 | INFO     | src.policies:train:121 - Mean episode return: 44.0
2021-08-25 10:46:29.553 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 32.9
2021-08-25 10:46:29.554 | INFO     | src.policies:train:109 - Episode 991
2021-08-25 10:46:29.563 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done

2021-08-25 10:46:29.865 | INFO     | src.policies:train:109 - Episode 1009
2021-08-25 10:46:29.873 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:29.874 | INFO     | src.policies:train:121 - Mean episode return: 18.0
2021-08-25 10:46:29.875 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 32.02
2021-08-25 10:46:29.876 | INFO     | src.policies:train:109 - Episode 1010
2021-08-25 10:46:29.884 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:29.885 | INFO     | src.policies:train:121 - Mean episode return: 16.0
2021-08-25 10:46:29.886 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 31.89
2021-08-25 10:46:29.893 | INFO     | src.policies:train:157 - Total loss: 180.14476013183594
2021-08-25 10:46:29.893 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:46:29.896 | INFO     | src.policies:train:103 - Epoch 125 / 800

2021-08-25 10:46:30.217 | INFO     | src.policies:train:121 - Mean episode return: 67.0
2021-08-25 10:46:30.217 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 32.48
2021-08-25 10:46:30.218 | INFO     | src.policies:train:109 - Episode 1029
2021-08-25 10:46:30.227 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:30.229 | INFO     | src.policies:train:121 - Mean episode return: 19.0
2021-08-25 10:46:30.229 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 32.5
2021-08-25 10:46:30.236 | INFO     | src.policies:train:157 - Total loss: 231.87820434570312
2021-08-25 10:46:30.236 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:46:30.239 | INFO     | src.policies:train:103 - Epoch 128 / 800
2021-08-25 10:46:30.240 | INFO     | src.policies:train:109 - Episode 1030
2021-08-25 10:46:30.248 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done

2021-08-25 10:46:30.573 | INFO     | src.policies:train:157 - Total loss: 217.85464477539062
2021-08-25 10:46:30.574 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:46:30.576 | INFO     | src.policies:train:103 - Epoch 131 / 800
2021-08-25 10:46:30.577 | INFO     | src.policies:train:109 - Episode 1048
2021-08-25 10:46:30.597 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:30.598 | INFO     | src.policies:train:121 - Mean episode return: 54.0
2021-08-25 10:46:30.599 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 33.45
2021-08-25 10:46:30.600 | INFO     | src.policies:train:109 - Episode 1049
2021-08-25 10:46:30.621 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:30.622 | INFO     | src.policies:train:121 - Mean episode return: 55.0
2021-08-25 10:46:30.623 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 33.65

2021-08-25 10:46:30.953 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 33.16
2021-08-25 10:46:30.954 | INFO     | src.policies:train:109 - Episode 1067
2021-08-25 10:46:30.960 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:30.961 | INFO     | src.policies:train:121 - Mean episode return: 9.0
2021-08-25 10:46:30.961 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 32.88
2021-08-25 10:46:30.962 | INFO     | src.policies:train:109 - Episode 1068
2021-08-25 10:46:30.969 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:30.970 | INFO     | src.policies:train:121 - Mean episode return: 11.0
2021-08-25 10:46:30.971 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 32.66
2021-08-25 10:46:30.972 | INFO     | src.policies:train:109 - Episode 1069
2021-08-25 10:46:30.999 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done

2021-08-25 10:46:31.317 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 33.67
2021-08-25 10:46:31.318 | INFO     | src.policies:train:109 - Episode 1087
2021-08-25 10:46:31.329 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:31.331 | INFO     | src.policies:train:121 - Mean episode return: 31.0
2021-08-25 10:46:31.331 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 33.87
2021-08-25 10:46:31.332 | INFO     | src.policies:train:109 - Episode 1088
2021-08-25 10:46:31.343 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:31.344 | INFO     | src.policies:train:121 - Mean episode return: 23.0
2021-08-25 10:46:31.344 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 33.69
2021-08-25 10:46:31.345 | INFO     | src.policies:train:109 - Episode 1089
2021-08-25 10:46:31.359 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done

2021-08-25 10:46:31.659 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:31.660 | INFO     | src.policies:train:121 - Mean episode return: 14.0
2021-08-25 10:46:31.660 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 33.51
2021-08-25 10:46:31.661 | INFO     | src.policies:train:109 - Episode 1107
2021-08-25 10:46:31.671 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:31.672 | INFO     | src.policies:train:121 - Mean episode return: 20.0
2021-08-25 10:46:31.673 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 33.34
2021-08-25 10:46:31.674 | INFO     | src.policies:train:109 - Episode 1108
2021-08-25 10:46:31.693 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:31.694 | INFO     | src.policies:train:121 - Mean episode return: 48.0
2021-08-25 10:46:31.694 | INFO     | src.policies:train:122 - Last 100 episo

2021-08-25 10:46:31.991 | INFO     | src.policies:train:103 - Epoch 143 / 800
2021-08-25 10:46:31.992 | INFO     | src.policies:train:109 - Episode 1126
2021-08-25 10:46:31.999 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:32.001 | INFO     | src.policies:train:121 - Mean episode return: 16.0
2021-08-25 10:46:32.001 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 33.39
2021-08-25 10:46:32.002 | INFO     | src.policies:train:109 - Episode 1127
2021-08-25 10:46:32.015 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:32.016 | INFO     | src.policies:train:121 - Mean episode return: 30.0
2021-08-25 10:46:32.016 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 33.43
2021-08-25 10:46:32.017 | INFO     | src.policies:train:109 - Episode 1128
2021-08-25 10:46:32.032 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done

2021-08-25 10:46:32.329 | INFO     | src.policies:train:121 - Mean episode return: 9.0
2021-08-25 10:46:32.330 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 32.63
2021-08-25 10:46:32.331 | INFO     | src.policies:train:109 - Episode 1147
2021-08-25 10:46:32.364 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:32.365 | INFO     | src.policies:train:121 - Mean episode return: 93.0
2021-08-25 10:46:32.366 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 32.91
2021-08-25 10:46:32.373 | INFO     | src.policies:train:157 - Total loss: 337.5208740234375
2021-08-25 10:46:32.374 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:46:32.376 | INFO     | src.policies:train:103 - Epoch 146 / 800
2021-08-25 10:46:32.377 | INFO     | src.policies:train:109 - Episode 1148
2021-08-25 10:46:32.385 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done

2021-08-25 10:46:32.683 | INFO     | src.policies:train:109 - Episode 1166
2021-08-25 10:46:32.710 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:32.711 | INFO     | src.policies:train:121 - Mean episode return: 72.0
2021-08-25 10:46:32.712 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 31.83
2021-08-25 10:46:32.713 | INFO     | src.policies:train:109 - Episode 1167
2021-08-25 10:46:32.724 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:32.725 | INFO     | src.policies:train:121 - Mean episode return: 26.0
2021-08-25 10:46:32.726 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 32.0
2021-08-25 10:46:32.727 | INFO     | src.policies:train:109 - Episode 1168
2021-08-25 10:46:32.744 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:32.746 | INFO     | src.policies:train:121 - Mean episode return: 44.0

2021-08-25 10:46:33.052 | INFO     | src.policies:train:109 - Episode 1186
2021-08-25 10:46:33.059 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:33.060 | INFO     | src.policies:train:121 - Mean episode return: 12.0
2021-08-25 10:46:33.061 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 31.38
2021-08-25 10:46:33.062 | INFO     | src.policies:train:109 - Episode 1187
2021-08-25 10:46:33.069 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:33.070 | INFO     | src.policies:train:121 - Mean episode return: 11.0
2021-08-25 10:46:33.071 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 31.18
2021-08-25 10:46:33.072 | INFO     | src.policies:train:109 - Episode 1188
2021-08-25 10:46:33.079 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:33.080 | INFO     | src.policies:train:121 - Mean episode return: 13.0

2021-08-25 10:46:33.395 | INFO     | src.policies:train:121 - Mean episode return: 56.0
2021-08-25 10:46:33.396 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 31.39
2021-08-25 10:46:33.397 | INFO     | src.policies:train:109 - Episode 1206
2021-08-25 10:46:33.411 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:33.412 | INFO     | src.policies:train:121 - Mean episode return: 35.0
2021-08-25 10:46:33.413 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 31.6
2021-08-25 10:46:33.414 | INFO     | src.policies:train:109 - Episode 1207
2021-08-25 10:46:33.424 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:33.425 | INFO     | src.policies:train:121 - Mean episode return: 22.0
2021-08-25 10:46:33.425 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 31.62
2021-08-25 10:46:33.426 | INFO     | src.policies:train:109 - Episode 1208

2021-08-25 10:46:33.731 | INFO     | src.policies:train:109 - Episode 1225
2021-08-25 10:46:33.751 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:33.752 | INFO     | src.policies:train:121 - Mean episode return: 54.0
2021-08-25 10:46:33.753 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 31.95
2021-08-25 10:46:33.754 | INFO     | src.policies:train:109 - Episode 1226
2021-08-25 10:46:33.789 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:33.790 | INFO     | src.policies:train:121 - Mean episode return: 99.0
2021-08-25 10:46:33.791 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 32.78
2021-08-25 10:46:33.792 | INFO     | src.policies:train:109 - Episode 1227
2021-08-25 10:46:33.802 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:33.803 | INFO     | src.policies:train:121 - Mean episode return: 19.0

2021-08-25 10:46:34.127 | INFO     | src.policies:train:121 - Mean episode return: 35.0
2021-08-25 10:46:34.128 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 33.67
2021-08-25 10:46:34.129 | INFO     | src.policies:train:109 - Episode 1245
2021-08-25 10:46:34.139 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:34.140 | INFO     | src.policies:train:121 - Mean episode return: 21.0
2021-08-25 10:46:34.141 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 33.33
2021-08-25 10:46:34.142 | INFO     | src.policies:train:109 - Episode 1246
2021-08-25 10:46:34.152 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:34.154 | INFO     | src.policies:train:121 - Mean episode return: 21.0
2021-08-25 10:46:34.154 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 33.45
2021-08-25 10:46:34.155 | INFO     | src.policies:train:109 - Episode 1247

2021-08-25 10:46:34.475 | INFO     | src.policies:train:109 - Episode 1264
2021-08-25 10:46:34.493 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:34.494 | INFO     | src.policies:train:121 - Mean episode return: 47.0
2021-08-25 10:46:34.495 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 34.1
2021-08-25 10:46:34.496 | INFO     | src.policies:train:109 - Episode 1265
2021-08-25 10:46:34.516 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:34.517 | INFO     | src.policies:train:121 - Mean episode return: 52.0
2021-08-25 10:46:34.518 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 33.83
2021-08-25 10:46:34.519 | INFO     | src.policies:train:109 - Episode 1266
2021-08-25 10:46:34.539 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:34.540 | INFO     | src.policies:train:121 - Mean episode return: 51.0

2021-08-25 10:46:34.838 | INFO     | src.policies:train:121 - Mean episode return: 22.0
2021-08-25 10:46:34.839 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 33.49
2021-08-25 10:46:34.840 | INFO     | src.policies:train:109 - Episode 1284
2021-08-25 10:46:34.857 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:34.858 | INFO     | src.policies:train:121 - Mean episode return: 43.0
2021-08-25 10:46:34.858 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 33.68
2021-08-25 10:46:34.859 | INFO     | src.policies:train:109 - Episode 1285
2021-08-25 10:46:34.871 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:34.872 | INFO     | src.policies:train:121 - Mean episode return: 23.0
2021-08-25 10:46:34.873 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 33.26
2021-08-25 10:46:34.873 | INFO     | src.policies:train:109 - Episode 1286

2021-08-25 10:46:35.260 | INFO     | src.policies:train:109 - Episode 1303
2021-08-25 10:46:35.266 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:35.267 | INFO     | src.policies:train:121 - Mean episode return: 12.0
2021-08-25 10:46:35.268 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 35.25
2021-08-25 10:46:35.269 | INFO     | src.policies:train:109 - Episode 1304
2021-08-25 10:46:35.282 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:35.283 | INFO     | src.policies:train:121 - Mean episode return: 28.0
2021-08-25 10:46:35.284 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 35.24
2021-08-25 10:46:35.285 | INFO     | src.policies:train:109 - Episode 1305
2021-08-25 10:46:35.295 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:35.296 | INFO     | src.policies:train:121 - Mean episode return: 21.0

2021-08-25 10:46:35.573 | INFO     | src.policies:train:121 - Mean episode return: 14.0
2021-08-25 10:46:35.573 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 33.86
2021-08-25 10:46:35.574 | INFO     | src.policies:train:109 - Episode 1324
2021-08-25 10:46:35.588 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:35.589 | INFO     | src.policies:train:121 - Mean episode return: 33.0
2021-08-25 10:46:35.589 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 34.02
2021-08-25 10:46:35.590 | INFO     | src.policies:train:109 - Episode 1325
2021-08-25 10:46:35.600 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:35.601 | INFO     | src.policies:train:121 - Mean episode return: 20.0
2021-08-25 10:46:35.602 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 33.68
2021-08-25 10:46:35.603 | INFO     | src.policies:train:109 - Episode 1326

2021-08-25 10:46:35.883 | INFO     | src.policies:train:109 - Episode 1344
2021-08-25 10:46:35.893 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:35.894 | INFO     | src.policies:train:121 - Mean episode return: 20.0
2021-08-25 10:46:35.895 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 31.4
2021-08-25 10:46:35.896 | INFO     | src.policies:train:109 - Episode 1345
2021-08-25 10:46:35.915 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:35.916 | INFO     | src.policies:train:121 - Mean episode return: 52.0
2021-08-25 10:46:35.917 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 31.71
2021-08-25 10:46:35.924 | INFO     | src.policies:train:157 - Total loss: 97.375244140625
2021-08-25 10:46:35.925 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:46:35.928 | INFO     | src.policies:train:103 - Epoch 174 / 800

2021-08-25 10:46:36.206 | INFO     | src.policies:train:121 - Mean episode return: 33.0
2021-08-25 10:46:36.207 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 30.54
2021-08-25 10:46:36.208 | INFO     | src.policies:train:109 - Episode 1364
2021-08-25 10:46:36.228 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:36.229 | INFO     | src.policies:train:121 - Mean episode return: 47.0
2021-08-25 10:46:36.229 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 30.54
2021-08-25 10:46:36.230 | INFO     | src.policies:train:109 - Episode 1365
2021-08-25 10:46:36.244 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:36.245 | INFO     | src.policies:train:121 - Mean episode return: 29.0
2021-08-25 10:46:36.245 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 30.31
2021-08-25 10:46:36.246 | INFO     | src.policies:train:109 - Episode 1366

2021-08-25 10:46:36.552 | INFO     | src.policies:train:157 - Total loss: 74.5862045288086
2021-08-25 10:46:36.553 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:46:36.556 | INFO     | src.policies:train:103 - Epoch 179 / 800
2021-08-25 10:46:36.557 | INFO     | src.policies:train:109 - Episode 1384
2021-08-25 10:46:36.573 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:36.574 | INFO     | src.policies:train:121 - Mean episode return: 43.0
2021-08-25 10:46:36.575 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 30.0
2021-08-25 10:46:36.576 | INFO     | src.policies:train:109 - Episode 1385
2021-08-25 10:46:36.587 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:36.588 | INFO     | src.policies:train:121 - Mean episode return: 24.0
2021-08-25 10:46:36.589 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 30.01

2021-08-25 10:46:36.902 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:46:36.905 | INFO     | src.policies:train:103 - Epoch 182 / 800
2021-08-25 10:46:36.906 | INFO     | src.policies:train:109 - Episode 1403
2021-08-25 10:46:36.935 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:36.936 | INFO     | src.policies:train:121 - Mean episode return: 80.0
2021-08-25 10:46:36.937 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 29.18
2021-08-25 10:46:36.938 | INFO     | src.policies:train:109 - Episode 1404
2021-08-25 10:46:36.946 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:36.947 | INFO     | src.policies:train:121 - Mean episode return: 17.0
2021-08-25 10:46:36.948 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 29.07
2021-08-25 10:46:36.949 | INFO     | src.policies:train:109 - Episode 1405

2021-08-25 10:46:37.270 | INFO     | src.policies:train:157 - Total loss: 122.9393539428711
2021-08-25 10:46:37.271 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:46:37.274 | INFO     | src.policies:train:103 - Epoch 185 / 800
2021-08-25 10:46:37.275 | INFO     | src.policies:train:109 - Episode 1423
2021-08-25 10:46:37.281 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:37.282 | INFO     | src.policies:train:121 - Mean episode return: 11.0
2021-08-25 10:46:37.283 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 29.86
2021-08-25 10:46:37.284 | INFO     | src.policies:train:109 - Episode 1424
2021-08-25 10:46:37.293 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:37.294 | INFO     | src.policies:train:121 - Mean episode return: 18.0
2021-08-25 10:46:37.295 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 29.71

2021-08-25 10:46:37.664 | INFO     | src.policies:train:109 - Episode 1442
2021-08-25 10:46:37.680 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:37.681 | INFO     | src.policies:train:121 - Mean episode return: 38.0
2021-08-25 10:46:37.683 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 32.02
2021-08-25 10:46:37.684 | INFO     | src.policies:train:109 - Episode 1443
2021-08-25 10:46:37.711 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:37.713 | INFO     | src.policies:train:121 - Mean episode return: 70.0
2021-08-25 10:46:37.714 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 32.43
2021-08-25 10:46:37.715 | INFO     | src.policies:train:109 - Episode 1444
2021-08-25 10:46:37.725 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:37.727 | INFO     | src.policies:train:121 - Mean episode return: 19.0

2021-08-25 10:46:38.111 | INFO     | src.policies:train:121 - Mean episode return: 15.0
2021-08-25 10:46:38.112 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 34.53
2021-08-25 10:46:38.113 | INFO     | src.policies:train:109 - Episode 1462
2021-08-25 10:46:38.131 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:38.132 | INFO     | src.policies:train:121 - Mean episode return: 40.0
2021-08-25 10:46:38.133 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 34.64
2021-08-25 10:46:38.134 | INFO     | src.policies:train:109 - Episode 1463
2021-08-25 10:46:38.167 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:38.168 | INFO     | src.policies:train:121 - Mean episode return: 92.0
2021-08-25 10:46:38.169 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 35.23
2021-08-25 10:46:38.177 | INFO     | src.policies:train:157 - Total loss: 246.567

2021-08-25 10:46:38.518 | INFO     | src.policies:train:109 - Episode 1481
2021-08-25 10:46:38.530 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:38.532 | INFO     | src.policies:train:121 - Mean episode return: 23.0
2021-08-25 10:46:38.533 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 35.12
2021-08-25 10:46:38.534 | INFO     | src.policies:train:109 - Episode 1482
2021-08-25 10:46:38.556 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:38.558 | INFO     | src.policies:train:121 - Mean episode return: 48.0
2021-08-25 10:46:38.559 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 35.48
2021-08-25 10:46:38.560 | INFO     | src.policies:train:109 - Episode 1483
2021-08-25 10:46:38.569 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:38.570 | INFO     | src.policies:train:121 - Mean episode return: 14.0

2021-08-25 10:46:38.963 | INFO     | src.policies:train:121 - Mean episode return: 40.0
2021-08-25 10:46:38.965 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 36.09
2021-08-25 10:46:38.966 | INFO     | src.policies:train:109 - Episode 1501
2021-08-25 10:46:38.974 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:38.975 | INFO     | src.policies:train:121 - Mean episode return: 14.0
2021-08-25 10:46:38.976 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 36.03
2021-08-25 10:46:38.976 | INFO     | src.policies:train:109 - Episode 1502
2021-08-25 10:46:38.990 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:38.992 | INFO     | src.policies:train:121 - Mean episode return: 31.0
2021-08-25 10:46:38.993 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 35.64
2021-08-25 10:46:39.002 | INFO     | src.policies:train:157 - Total loss: 97.5756

2021-08-25 10:46:39.379 | INFO     | src.policies:train:109 - Episode 1519
2021-08-25 10:46:39.389 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:39.390 | INFO     | src.policies:train:121 - Mean episode return: 19.0
2021-08-25 10:46:39.391 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 37.21
2021-08-25 10:46:39.392 | INFO     | src.policies:train:109 - Episode 1520
2021-08-25 10:46:39.402 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:39.404 | INFO     | src.policies:train:121 - Mean episode return: 20.0
2021-08-25 10:46:39.404 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 37.26
2021-08-25 10:46:39.405 | INFO     | src.policies:train:109 - Episode 1521
2021-08-25 10:46:39.426 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:39.428 | INFO     | src.policies:train:121 - Mean episode return: 51.0

2021-08-25 10:46:39.777 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:46:39.780 | INFO     | src.policies:train:103 - Epoch 204 / 800
2021-08-25 10:46:39.781 | INFO     | src.policies:train:109 - Episode 1539
2021-08-25 10:46:39.792 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:39.793 | INFO     | src.policies:train:121 - Mean episode return: 26.0
2021-08-25 10:46:39.794 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 37.71
2021-08-25 10:46:39.795 | INFO     | src.policies:train:109 - Episode 1540
2021-08-25 10:46:39.839 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:39.840 | INFO     | src.policies:train:121 - Mean episode return: 122.0
2021-08-25 10:46:39.841 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 38.52
2021-08-25 10:46:39.842 | INFO     | src.policies:train:109 - Episode 1541

2021-08-25 10:46:40.194 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:40.195 | INFO     | src.policies:train:121 - Mean episode return: 84.0
2021-08-25 10:46:40.196 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 37.37
2021-08-25 10:46:40.197 | INFO     | src.policies:train:109 - Episode 1559
2021-08-25 10:46:40.210 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:40.211 | INFO     | src.policies:train:121 - Mean episode return: 33.0
2021-08-25 10:46:40.212 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 37.26
2021-08-25 10:46:40.213 | INFO     | src.policies:train:109 - Episode 1560
2021-08-25 10:46:40.229 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:40.230 | INFO     | src.policies:train:121 - Mean episode return: 36.0
2021-08-25 10:46:40.230 | INFO     | src.policies:train:122 - Last 100 episo

2021-08-25 10:46:40.575 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 37.25
2021-08-25 10:46:40.576 | INFO     | src.policies:train:109 - Episode 1578
2021-08-25 10:46:40.596 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:40.597 | INFO     | src.policies:train:121 - Mean episode return: 49.0
2021-08-25 10:46:40.598 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 37.46
2021-08-25 10:46:40.599 | INFO     | src.policies:train:109 - Episode 1579
2021-08-25 10:46:40.614 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:40.615 | INFO     | src.policies:train:121 - Mean episode return: 35.0
2021-08-25 10:46:40.616 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 37.63
2021-08-25 10:46:40.622 | INFO     | src.policies:train:157 - Total loss: 132.21112060546875
2021-08-25 10:46:40.623 | INFO     | src.policies:train:161 - Epoch infos: {

2021-08-25 10:46:41.009 | INFO     | src.policies:train:121 - Mean episode return: 95.0
2021-08-25 10:46:41.010 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 39.03
2021-08-25 10:46:41.017 | INFO     | src.policies:train:157 - Total loss: 208.55772399902344
2021-08-25 10:46:41.018 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:46:41.021 | INFO     | src.policies:train:103 - Epoch 214 / 800
2021-08-25 10:46:41.022 | INFO     | src.policies:train:109 - Episode 1598
2021-08-25 10:46:41.045 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:41.046 | INFO     | src.policies:train:121 - Mean episode return: 58.0
2021-08-25 10:46:41.047 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 39.0
2021-08-25 10:46:41.048 | INFO     | src.policies:train:109 - Episode 1599
2021-08-25 10:46:41.064 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done

2021-08-25 10:46:41.386 | INFO     | src.policies:train:157 - Total loss: 122.40425109863281
2021-08-25 10:46:41.387 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:46:41.390 | INFO     | src.policies:train:103 - Epoch 217 / 800
2021-08-25 10:46:41.391 | INFO     | src.policies:train:109 - Episode 1617
2021-08-25 10:46:41.403 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:41.404 | INFO     | src.policies:train:121 - Mean episode return: 27.0
2021-08-25 10:46:41.405 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 36.99
2021-08-25 10:46:41.406 | INFO     | src.policies:train:109 - Episode 1618
2021-08-25 10:46:41.427 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:41.428 | INFO     | src.policies:train:121 - Mean episode return: 51.0
2021-08-25 10:46:41.429 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 37.39

2021-08-25 10:46:41.790 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 37.96
2021-08-25 10:46:41.791 | INFO     | src.policies:train:109 - Episode 1636
2021-08-25 10:46:41.804 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:41.805 | INFO     | src.policies:train:121 - Mean episode return: 27.0
2021-08-25 10:46:41.806 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 37.97
2021-08-25 10:46:41.807 | INFO     | src.policies:train:109 - Episode 1637
2021-08-25 10:46:41.832 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:41.833 | INFO     | src.policies:train:121 - Mean episode return: 61.0
2021-08-25 10:46:41.834 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 38.29
2021-08-25 10:46:41.839 | INFO     | src.policies:train:157 - Total loss: 136.14698791503906
2021-08-25 10:46:41.839 | INFO     | src.policies:train:161 - Epoch infos: {

2021-08-25 10:46:42.193 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:46:42.196 | INFO     | src.policies:train:103 - Epoch 224 / 800
2021-08-25 10:46:42.197 | INFO     | src.policies:train:109 - Episode 1655
2021-08-25 10:46:42.205 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:42.206 | INFO     | src.policies:train:121 - Mean episode return: 16.0
2021-08-25 10:46:42.207 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 37.51
2021-08-25 10:46:42.208 | INFO     | src.policies:train:109 - Episode 1656
2021-08-25 10:46:42.221 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:42.222 | INFO     | src.policies:train:121 - Mean episode return: 27.0
2021-08-25 10:46:42.223 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 37.64
2021-08-25 10:46:42.224 | INFO     | src.policies:train:109 - Episode 1657

2021-08-25 10:46:42.612 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:42.614 | INFO     | src.policies:train:121 - Mean episode return: 86.0
2021-08-25 10:46:42.614 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 38.5
2021-08-25 10:46:42.615 | INFO     | src.policies:train:109 - Episode 1675
2021-08-25 10:46:42.625 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:42.626 | INFO     | src.policies:train:121 - Mean episode return: 21.0
2021-08-25 10:46:42.627 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 38.34
2021-08-25 10:46:42.628 | INFO     | src.policies:train:109 - Episode 1676
2021-08-25 10:46:42.655 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:42.656 | INFO     | src.policies:train:121 - Mean episode return: 69.0
2021-08-25 10:46:42.657 | INFO     | src.policies:train:122 - Last 100 episod

2021-08-25 10:46:42.973 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 37.27
2021-08-25 10:46:42.974 | INFO     | src.policies:train:109 - Episode 1694
2021-08-25 10:46:42.989 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:42.990 | INFO     | src.policies:train:121 - Mean episode return: 37.0
2021-08-25 10:46:42.991 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 37.29
2021-08-25 10:46:42.992 | INFO     | src.policies:train:109 - Episode 1695
2021-08-25 10:46:43.005 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:43.006 | INFO     | src.policies:train:121 - Mean episode return: 28.0
2021-08-25 10:46:43.007 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 36.95
2021-08-25 10:46:43.007 | INFO     | src.policies:train:109 - Episode 1696
2021-08-25 10:46:43.016 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all ag

2021-08-25 10:46:43.313 | INFO     | src.policies:train:121 - Mean episode return: 13.0
2021-08-25 10:46:43.314 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 34.92
2021-08-25 10:46:43.315 | INFO     | src.policies:train:109 - Episode 1714
2021-08-25 10:46:43.324 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:43.325 | INFO     | src.policies:train:121 - Mean episode return: 20.0
2021-08-25 10:46:43.326 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 34.74
2021-08-25 10:46:43.327 | INFO     | src.policies:train:109 - Episode 1715
2021-08-25 10:46:43.365 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:43.366 | INFO     | src.policies:train:121 - Mean episode return: 100.0
2021-08-25 10:46:43.367 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 35.46
2021-08-25 10:46:43.367 | INFO     | src.policies:train:109 - Episode 1716

2021-08-25 10:46:43.638 | INFO     | src.policies:train:109 - Episode 1734
2021-08-25 10:46:43.648 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:43.649 | INFO     | src.policies:train:121 - Mean episode return: 18.0
2021-08-25 10:46:43.650 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 32.69
2021-08-25 10:46:43.651 | INFO     | src.policies:train:109 - Episode 1735
2021-08-25 10:46:43.665 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:43.666 | INFO     | src.policies:train:121 - Mean episode return: 28.0
2021-08-25 10:46:43.667 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 32.45
2021-08-25 10:46:43.668 | INFO     | src.policies:train:109 - Episode 1736
2021-08-25 10:46:43.679 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:43.680 | INFO     | src.policies:train:121 - Mean episode return: 18.0

2021-08-25 10:46:43.971 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:46:43.974 | INFO     | src.policies:train:103 - Epoch 238 / 800
2021-08-25 10:46:43.975 | INFO     | src.policies:train:109 - Episode 1754
2021-08-25 10:46:43.999 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:44.000 | INFO     | src.policies:train:121 - Mean episode return: 65.0
2021-08-25 10:46:44.001 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 30.54
2021-08-25 10:46:44.002 | INFO     | src.policies:train:109 - Episode 1755
2021-08-25 10:46:44.011 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:44.012 | INFO     | src.policies:train:121 - Mean episode return: 15.0
2021-08-25 10:46:44.013 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 30.53
2021-08-25 10:46:44.014 | INFO     | src.policies:train:109 - Episode 1756

2021-08-25 10:46:44.350 | INFO     | src.policies:train:157 - Total loss: 80.36029815673828
2021-08-25 10:46:44.350 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:46:44.353 | INFO     | src.policies:train:103 - Epoch 241 / 800
2021-08-25 10:46:44.355 | INFO     | src.policies:train:109 - Episode 1774
2021-08-25 10:46:44.368 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:44.370 | INFO     | src.policies:train:121 - Mean episode return: 37.0
2021-08-25 10:46:44.370 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 28.98
2021-08-25 10:46:44.371 | INFO     | src.policies:train:109 - Episode 1775
2021-08-25 10:46:44.402 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:44.403 | INFO     | src.policies:train:121 - Mean episode return: 76.0
2021-08-25 10:46:44.404 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 29.53

2021-08-25 10:46:44.768 | INFO     | src.policies:train:109 - Episode 1793
2021-08-25 10:46:44.799 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:44.800 | INFO     | src.policies:train:121 - Mean episode return: 82.0
2021-08-25 10:46:44.800 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 31.03
2021-08-25 10:46:44.801 | INFO     | src.policies:train:109 - Episode 1794
2021-08-25 10:46:44.814 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:44.815 | INFO     | src.policies:train:121 - Mean episode return: 23.0
2021-08-25 10:46:44.816 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 30.89
2021-08-25 10:46:44.817 | INFO     | src.policies:train:109 - Episode 1795
2021-08-25 10:46:44.829 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:44.830 | INFO     | src.policies:train:121 - Mean episode return: 27.0

2021-08-25 10:46:45.149 | INFO     | src.policies:train:121 - Mean episode return: 41.0
2021-08-25 10:46:45.149 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 31.65
2021-08-25 10:46:45.150 | INFO     | src.policies:train:109 - Episode 1813
2021-08-25 10:46:45.174 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:45.175 | INFO     | src.policies:train:121 - Mean episode return: 61.0
2021-08-25 10:46:45.176 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 32.13
2021-08-25 10:46:45.177 | INFO     | src.policies:train:109 - Episode 1814
2021-08-25 10:46:45.190 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:45.192 | INFO     | src.policies:train:121 - Mean episode return: 31.0
2021-08-25 10:46:45.192 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 32.24
2021-08-25 10:46:45.193 | INFO     | src.policies:train:109 - Episode 1815

2021-08-25 10:46:45.559 | INFO     | src.policies:train:109 - Episode 1832
2021-08-25 10:46:45.569 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:45.570 | INFO     | src.policies:train:121 - Mean episode return: 19.0
2021-08-25 10:46:45.571 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 34.33
2021-08-25 10:46:45.578 | INFO     | src.policies:train:157 - Total loss: 101.92375183105469
2021-08-25 10:46:45.579 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:46:45.582 | INFO     | src.policies:train:103 - Epoch 251 / 800
2021-08-25 10:46:45.583 | INFO     | src.policies:train:109 - Episode 1833
2021-08-25 10:46:45.604 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:45.606 | INFO     | src.policies:train:121 - Mean episode return: 57.0
2021-08-25 10:46:45.607 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 34.72

2021-08-25 10:46:45.950 | INFO     | src.policies:train:121 - Mean episode return: 16.0
2021-08-25 10:46:45.951 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 35.81
2021-08-25 10:46:45.952 | INFO     | src.policies:train:109 - Episode 1852
2021-08-25 10:46:45.968 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:45.969 | INFO     | src.policies:train:121 - Mean episode return: 40.0
2021-08-25 10:46:45.970 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 35.83
2021-08-25 10:46:45.971 | INFO     | src.policies:train:109 - Episode 1853
2021-08-25 10:46:45.982 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:45.983 | INFO     | src.policies:train:121 - Mean episode return: 21.0
2021-08-25 10:46:45.984 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 35.88
2021-08-25 10:46:45.991 | INFO     | src.policies:train:157 - Total loss: 173.415

2021-08-25 10:46:46.322 | INFO     | src.policies:train:109 - Episode 1871
2021-08-25 10:46:46.336 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:46.338 | INFO     | src.policies:train:121 - Mean episode return: 32.0
2021-08-25 10:46:46.338 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 36.45
2021-08-25 10:46:46.339 | INFO     | src.policies:train:109 - Episode 1872
2021-08-25 10:46:46.350 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:46.351 | INFO     | src.policies:train:121 - Mean episode return: 22.0
2021-08-25 10:46:46.352 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 36.35
2021-08-25 10:46:46.359 | INFO     | src.policies:train:157 - Total loss: 225.27938842773438
2021-08-25 10:46:46.359 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:46:46.362 | INFO     | src.policies:train:103 - Epoch 257 / 800

2021-08-25 10:46:46.670 | INFO     | src.policies:train:121 - Mean episode return: 37.0
2021-08-25 10:46:46.671 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 33.9
2021-08-25 10:46:46.672 | INFO     | src.policies:train:109 - Episode 1891
2021-08-25 10:46:46.681 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:46.683 | INFO     | src.policies:train:121 - Mean episode return: 16.0
2021-08-25 10:46:46.684 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 33.7
2021-08-25 10:46:46.685 | INFO     | src.policies:train:109 - Episode 1892
2021-08-25 10:46:46.693 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:46.694 | INFO     | src.policies:train:121 - Mean episode return: 15.0
2021-08-25 10:46:46.695 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 33.68
2021-08-25 10:46:46.696 | INFO     | src.policies:train:109 - Episode 1893

2021-08-25 10:46:47.056 | INFO     | src.policies:train:109 - Episode 1910
2021-08-25 10:46:47.063 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:47.065 | INFO     | src.policies:train:121 - Mean episode return: 13.0
2021-08-25 10:46:47.065 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 34.22
2021-08-25 10:46:47.066 | INFO     | src.policies:train:109 - Episode 1911
2021-08-25 10:46:47.103 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:47.104 | INFO     | src.policies:train:121 - Mean episode return: 101.0
2021-08-25 10:46:47.105 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 35.12
2021-08-25 10:46:47.106 | INFO     | src.policies:train:109 - Episode 1912
2021-08-25 10:46:47.116 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:47.117 | INFO     | src.policies:train:121 - Mean episode return: 19.0


2021-08-25 10:46:47.463 | INFO     | src.policies:train:121 - Mean episode return: 26.0
2021-08-25 10:46:47.464 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 34.12
2021-08-25 10:46:47.465 | INFO     | src.policies:train:109 - Episode 1930
2021-08-25 10:46:47.481 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:47.482 | INFO     | src.policies:train:121 - Mean episode return: 34.0
2021-08-25 10:46:47.483 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 34.27
2021-08-25 10:46:47.491 | INFO     | src.policies:train:157 - Total loss: 64.1790542602539
2021-08-25 10:46:47.492 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:46:47.495 | INFO     | src.policies:train:103 - Epoch 266 / 800
2021-08-25 10:46:47.496 | INFO     | src.policies:train:109 - Episode 1931
2021-08-25 10:46:47.508 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done

2021-08-25 10:46:47.871 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:47.873 | INFO     | src.policies:train:121 - Mean episode return: 11.0
2021-08-25 10:46:47.873 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 35.27
2021-08-25 10:46:47.875 | INFO     | src.policies:train:109 - Episode 1949
2021-08-25 10:46:47.882 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:47.883 | INFO     | src.policies:train:121 - Mean episode return: 11.0
2021-08-25 10:46:47.884 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 34.53
2021-08-25 10:46:47.885 | INFO     | src.policies:train:109 - Episode 1950
2021-08-25 10:46:47.897 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:47.898 | INFO     | src.policies:train:121 - Mean episode return: 26.0
2021-08-25 10:46:47.899 | INFO     | src.policies:train:122 - Last 100 episo

2021-08-25 10:46:48.378 | INFO     | src.policies:train:103 - Epoch 273 / 800
2021-08-25 10:46:48.379 | INFO     | src.policies:train:109 - Episode 1967
2021-08-25 10:46:48.385 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:48.386 | INFO     | src.policies:train:121 - Mean episode return: 13.0
2021-08-25 10:46:48.387 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 38.75
2021-08-25 10:46:48.388 | INFO     | src.policies:train:109 - Episode 1968
2021-08-25 10:46:48.413 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:48.414 | INFO     | src.policies:train:121 - Mean episode return: 61.0
2021-08-25 10:46:48.415 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 38.92
2021-08-25 10:46:48.416 | INFO     | src.policies:train:109 - Episode 1969
2021-08-25 10:46:48.448 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done

2021-08-25 10:46:48.885 | INFO     | src.policies:train:121 - Mean episode return: 20.0
2021-08-25 10:46:48.886 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 41.54
2021-08-25 10:46:48.894 | INFO     | src.policies:train:157 - Total loss: 175.09800720214844
2021-08-25 10:46:48.895 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:46:48.899 | INFO     | src.policies:train:103 - Epoch 277 / 800
2021-08-25 10:46:48.900 | INFO     | src.policies:train:109 - Episode 1987
2021-08-25 10:46:48.923 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:48.925 | INFO     | src.policies:train:121 - Mean episode return: 62.0
2021-08-25 10:46:48.926 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 41.83
2021-08-25 10:46:48.926 | INFO     | src.policies:train:109 - Episode 1988
2021-08-25 10:46:48.936 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done

2021-08-25 10:46:49.340 | INFO     | src.policies:train:157 - Total loss: 84.1073989868164
2021-08-25 10:46:49.341 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:46:49.345 | INFO     | src.policies:train:103 - Epoch 280 / 800
2021-08-25 10:46:49.346 | INFO     | src.policies:train:109 - Episode 2006
2021-08-25 10:46:49.360 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:49.361 | INFO     | src.policies:train:121 - Mean episode return: 31.0
2021-08-25 10:46:49.362 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 42.15
2021-08-25 10:46:49.363 | INFO     | src.policies:train:109 - Episode 2007
2021-08-25 10:46:49.373 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:49.374 | INFO     | src.policies:train:121 - Mean episode return: 18.0
2021-08-25 10:46:49.375 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 42.13

2021-08-25 10:46:49.733 | INFO     | src.policies:train:121 - Mean episode return: 19.0
2021-08-25 10:46:49.734 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 41.48
2021-08-25 10:46:49.735 | INFO     | src.policies:train:109 - Episode 2026
2021-08-25 10:46:49.756 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:49.757 | INFO     | src.policies:train:121 - Mean episode return: 55.0
2021-08-25 10:46:49.758 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 41.83
2021-08-25 10:46:49.765 | INFO     | src.policies:train:157 - Total loss: 58.39814376831055
2021-08-25 10:46:49.766 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:46:49.769 | INFO     | src.policies:train:103 - Epoch 283 / 800
2021-08-25 10:46:49.770 | INFO     | src.policies:train:109 - Episode 2027
2021-08-25 10:46:49.776 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done

2021-08-25 10:46:50.131 | INFO     | src.policies:train:109 - Episode 2045
2021-08-25 10:46:50.172 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:50.173 | INFO     | src.policies:train:121 - Mean episode return: 113.0
2021-08-25 10:46:50.174 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 41.58
2021-08-25 10:46:50.181 | INFO     | src.policies:train:157 - Total loss: 238.256103515625
2021-08-25 10:46:50.182 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:46:50.185 | INFO     | src.policies:train:103 - Epoch 286 / 800
2021-08-25 10:46:50.186 | INFO     | src.policies:train:109 - Episode 2046
2021-08-25 10:46:50.193 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:50.195 | INFO     | src.policies:train:121 - Mean episode return: 15.0
2021-08-25 10:46:50.195 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 41.58

2021-08-25 10:46:50.550 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 40.84
2021-08-25 10:46:50.551 | INFO     | src.policies:train:109 - Episode 2064
2021-08-25 10:46:50.569 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:50.570 | INFO     | src.policies:train:121 - Mean episode return: 45.0
2021-08-25 10:46:50.571 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 40.82
2021-08-25 10:46:50.572 | INFO     | src.policies:train:109 - Episode 2065
2021-08-25 10:46:50.602 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:50.603 | INFO     | src.policies:train:121 - Mean episode return: 83.0
2021-08-25 10:46:50.604 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 41.52
2021-08-25 10:46:50.605 | INFO     | src.policies:train:109 - Episode 2066
2021-08-25 10:46:50.649 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all ag

2021-08-25 10:46:50.968 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:50.969 | INFO     | src.policies:train:121 - Mean episode return: 18.0
2021-08-25 10:46:50.970 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 38.51
2021-08-25 10:46:50.971 | INFO     | src.policies:train:109 - Episode 2084
2021-08-25 10:46:50.979 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:50.980 | INFO     | src.policies:train:121 - Mean episode return: 14.0
2021-08-25 10:46:50.981 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 37.67
2021-08-25 10:46:50.982 | INFO     | src.policies:train:109 - Episode 2085
2021-08-25 10:46:50.990 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:50.991 | INFO     | src.policies:train:121 - Mean episode return: 15.0
2021-08-25 10:46:50.992 | INFO     | src.policies:train:122 - Last 100 episo

2021-08-25 10:46:51.342 | INFO     | src.policies:train:103 - Epoch 295 / 800
2021-08-25 10:46:51.343 | INFO     | src.policies:train:109 - Episode 2103
2021-08-25 10:46:51.364 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:51.365 | INFO     | src.policies:train:121 - Mean episode return: 56.0
2021-08-25 10:46:51.366 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 37.29
2021-08-25 10:46:51.367 | INFO     | src.policies:train:109 - Episode 2104
2021-08-25 10:46:51.377 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:51.378 | INFO     | src.policies:train:121 - Mean episode return: 17.0
2021-08-25 10:46:51.379 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 37.32
2021-08-25 10:46:51.380 | INFO     | src.policies:train:109 - Episode 2105
2021-08-25 10:46:51.395 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done

2021-08-25 10:46:51.802 | INFO     | src.policies:train:121 - Mean episode return: 14.0
2021-08-25 10:46:51.803 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 39.32
2021-08-25 10:46:51.809 | INFO     | src.policies:train:157 - Total loss: 269.7599792480469
2021-08-25 10:46:51.809 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:46:51.813 | INFO     | src.policies:train:103 - Epoch 299 / 800
2021-08-25 10:46:51.814 | INFO     | src.policies:train:109 - Episode 2123
2021-08-25 10:46:51.831 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:51.832 | INFO     | src.policies:train:121 - Mean episode return: 42.0
2021-08-25 10:46:51.833 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 39.5
2021-08-25 10:46:51.834 | INFO     | src.policies:train:109 - Episode 2124
2021-08-25 10:46:51.841 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done

2021-08-25 10:46:52.295 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:52.296 | INFO     | src.policies:train:121 - Mean episode return: 15.0
2021-08-25 10:46:52.297 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 42.26
2021-08-25 10:46:52.298 | INFO     | src.policies:train:109 - Episode 2142
2021-08-25 10:46:52.324 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:52.325 | INFO     | src.policies:train:121 - Mean episode return: 70.0
2021-08-25 10:46:52.326 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 42.78
2021-08-25 10:46:52.333 | INFO     | src.policies:train:157 - Total loss: 171.0751953125
2021-08-25 10:46:52.333 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:46:52.336 | INFO     | src.policies:train:103 - Epoch 303 / 800
2021-08-25 10:46:52.338 | INFO     | src.policies:train:109 - Episode 2143

2021-08-25 10:46:52.794 | INFO     | src.policies:train:103 - Epoch 307 / 800
2021-08-25 10:46:52.795 | INFO     | src.policies:train:109 - Episode 2159
2021-08-25 10:46:52.831 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:52.832 | INFO     | src.policies:train:121 - Mean episode return: 103.0
2021-08-25 10:46:52.833 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 46.43
2021-08-25 10:46:52.834 | INFO     | src.policies:train:109 - Episode 2160
2021-08-25 10:46:52.845 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:52.846 | INFO     | src.policies:train:121 - Mean episode return: 24.0
2021-08-25 10:46:52.846 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 46.23
2021-08-25 10:46:52.847 | INFO     | src.policies:train:109 - Episode 2161
2021-08-25 10:46:52.869 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done

2021-08-25 10:46:53.349 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 49.67
2021-08-25 10:46:53.350 | INFO     | src.policies:train:109 - Episode 2178
2021-08-25 10:46:53.371 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:53.372 | INFO     | src.policies:train:121 - Mean episode return: 48.0
2021-08-25 10:46:53.373 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 49.73
2021-08-25 10:46:53.374 | INFO     | src.policies:train:109 - Episode 2179
2021-08-25 10:46:53.385 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:53.387 | INFO     | src.policies:train:121 - Mean episode return: 26.0
2021-08-25 10:46:53.387 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 49.84
2021-08-25 10:46:53.394 | INFO     | src.policies:train:157 - Total loss: 196.21240234375
2021-08-25 10:46:53.394 | INFO     | src.policies:train:161 - Epoch infos: {}

2021-08-25 10:46:53.954 | INFO     | src.policies:train:157 - Total loss: 389.82415771484375
2021-08-25 10:46:53.955 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:46:53.958 | INFO     | src.policies:train:103 - Epoch 317 / 800
2021-08-25 10:46:53.959 | INFO     | src.policies:train:109 - Episode 2195
2021-08-25 10:46:53.991 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:53.993 | INFO     | src.policies:train:121 - Mean episode return: 95.0
2021-08-25 10:46:53.994 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 58.5
2021-08-25 10:46:53.994 | INFO     | src.policies:train:109 - Episode 2196
2021-08-25 10:46:54.021 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:54.022 | INFO     | src.policies:train:121 - Mean episode return: 68.0
2021-08-25 10:46:54.023 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 59.05

2021-08-25 10:46:54.698 | INFO     | src.policies:train:109 - Episode 2211
2021-08-25 10:46:54.737 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:54.738 | INFO     | src.policies:train:121 - Mean episode return: 110.0
2021-08-25 10:46:54.739 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 70.28
2021-08-25 10:46:54.740 | INFO     | src.policies:train:109 - Episode 2212
2021-08-25 10:46:54.783 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:54.784 | INFO     | src.policies:train:121 - Mean episode return: 110.0
2021-08-25 10:46:54.785 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 71.17
2021-08-25 10:46:54.792 | INFO     | src.policies:train:157 - Total loss: 297.5151062011719
2021-08-25 10:46:54.793 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:46:54.795 | INFO     | src.policies:train:103 - Epoch 324 / 800

2021-08-25 10:46:55.508 | INFO     | src.policies:train:109 - Episode 2227
2021-08-25 10:46:55.548 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:55.549 | INFO     | src.policies:train:121 - Mean episode return: 109.0
2021-08-25 10:46:55.550 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 81.93
2021-08-25 10:46:55.556 | INFO     | src.policies:train:157 - Total loss: 304.3032531738281
2021-08-25 10:46:55.557 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:46:55.560 | INFO     | src.policies:train:103 - Epoch 331 / 800
2021-08-25 10:46:55.561 | INFO     | src.policies:train:109 - Episode 2228
2021-08-25 10:46:55.610 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:55.611 | INFO     | src.policies:train:121 - Mean episode return: 141.0
2021-08-25 10:46:55.612 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 81.87

2021-08-25 10:46:56.313 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 90.97
2021-08-25 10:46:56.322 | INFO     | src.policies:train:157 - Total loss: 222.52041625976562
2021-08-25 10:46:56.323 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:46:56.328 | INFO     | src.policies:train:103 - Epoch 338 / 800
2021-08-25 10:46:56.329 | INFO     | src.policies:train:109 - Episode 2243
2021-08-25 10:46:56.375 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:56.376 | INFO     | src.policies:train:121 - Mean episode return: 132.0
2021-08-25 10:46:56.377 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 91.89
2021-08-25 10:46:56.378 | INFO     | src.policies:train:109 - Episode 2244
2021-08-25 10:46:56.432 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:56.433 | INFO     | src.policies:train:121 - Mean episode return: 155.0

2021-08-25 10:46:57.090 | INFO     | src.policies:train:121 - Mean episode return: 88.0
2021-08-25 10:46:57.091 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 99.85
2021-08-25 10:46:57.092 | INFO     | src.policies:train:109 - Episode 2259
2021-08-25 10:46:57.161 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:57.162 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 10:46:57.163 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 100.82
2021-08-25 10:46:57.170 | INFO     | src.policies:train:157 - Total loss: 446.3867492675781
2021-08-25 10:46:57.171 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:46:57.174 | INFO     | src.policies:train:103 - Epoch 345 / 800
2021-08-25 10:46:57.175 | INFO     | src.policies:train:109 - Episode 2260
2021-08-25 10:46:57.223 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done

2021-08-25 10:46:58.045 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 115.24
2021-08-25 10:46:58.051 | INFO     | src.policies:train:157 - Total loss: 456.7421569824219
2021-08-25 10:46:58.052 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:46:58.055 | INFO     | src.policies:train:103 - Epoch 353 / 800
2021-08-25 10:46:58.056 | INFO     | src.policies:train:109 - Episode 2274
2021-08-25 10:46:58.123 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:58.124 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 10:46:58.125 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 116.24
2021-08-25 10:46:58.130 | INFO     | src.policies:train:157 - Total loss: 590.6574096679688
2021-08-25 10:46:58.130 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:46:58.134 | INFO     | src.policies:train:103 - Epoch 354 / 800

2021-08-25 10:46:58.961 | INFO     | src.policies:train:109 - Episode 2288
2021-08-25 10:46:59.034 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:59.036 | INFO     | src.policies:train:121 - Mean episode return: 199.0
2021-08-25 10:46:59.037 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 127.59
2021-08-25 10:46:59.037 | INFO     | src.policies:train:109 - Episode 2289
2021-08-25 10:46:59.066 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:46:59.068 | INFO     | src.policies:train:121 - Mean episode return: 68.0
2021-08-25 10:46:59.069 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 127.34
2021-08-25 10:46:59.076 | INFO     | src.policies:train:157 - Total loss: 385.54119873046875
2021-08-25 10:46:59.077 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:46:59.080 | INFO     | src.policies:train:103 - Epoch 363 / 800

2021-08-25 10:47:00.003 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:47:00.005 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 10:47:00.006 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 134.08
2021-08-25 10:47:00.014 | INFO     | src.policies:train:157 - Total loss: 353.6727294921875
2021-08-25 10:47:00.016 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:47:00.020 | INFO     | src.policies:train:103 - Epoch 372 / 800
2021-08-25 10:47:00.021 | INFO     | src.policies:train:109 - Episode 2304
2021-08-25 10:47:00.095 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:47:00.097 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 10:47:00.098 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 134.84
2021-08-25 10:47:00.104 | INFO     | src.policies:train:157 - Total loss: 456.9593811035156

2021-08-25 10:47:00.976 | INFO     | src.policies:train:103 - Epoch 380 / 800
2021-08-25 10:47:00.977 | INFO     | src.policies:train:109 - Episode 2318
2021-08-25 10:47:01.029 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:47:01.030 | INFO     | src.policies:train:121 - Mean episode return: 142.0
2021-08-25 10:47:01.031 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 140.45
2021-08-25 10:47:01.032 | INFO     | src.policies:train:109 - Episode 2319
2021-08-25 10:47:01.072 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:47:01.073 | INFO     | src.policies:train:121 - Mean episode return: 108.0
2021-08-25 10:47:01.074 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 141.3
2021-08-25 10:47:01.080 | INFO     | src.policies:train:157 - Total loss: 266.49078369140625
2021-08-25 10:47:01.081 | INFO     | src.policies:train:161 - Epoch infos: {}

2021-08-25 10:47:01.967 | INFO     | src.policies:train:121 - Mean episode return: 119.0
2021-08-25 10:47:01.968 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 144.36
2021-08-25 10:47:01.974 | INFO     | src.policies:train:157 - Total loss: 242.70516967773438
2021-08-25 10:47:01.975 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:47:01.978 | INFO     | src.policies:train:103 - Epoch 388 / 800
2021-08-25 10:47:01.979 | INFO     | src.policies:train:109 - Episode 2334
2021-08-25 10:47:02.026 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:47:02.028 | INFO     | src.policies:train:121 - Mean episode return: 125.0
2021-08-25 10:47:02.029 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 144.61
2021-08-25 10:47:02.030 | INFO     | src.policies:train:109 - Episode 2335
2021-08-25 10:47:02.074 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done

2021-08-25 10:47:02.878 | INFO     | src.policies:train:157 - Total loss: 251.0276641845703
2021-08-25 10:47:02.879 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:47:02.882 | INFO     | src.policies:train:103 - Epoch 395 / 800
2021-08-25 10:47:02.883 | INFO     | src.policies:train:109 - Episode 2349
2021-08-25 10:47:02.949 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:47:02.951 | INFO     | src.policies:train:121 - Mean episode return: 188.0
2021-08-25 10:47:02.952 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 149.53
2021-08-25 10:47:02.953 | INFO     | src.policies:train:109 - Episode 2350
2021-08-25 10:47:03.003 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:47:03.004 | INFO     | src.policies:train:121 - Mean episode return: 141.0
2021-08-25 10:47:03.005 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 150.23

2021-08-25 10:47:03.898 | INFO     | src.policies:train:121 - Mean episode return: 126.0
2021-08-25 10:47:03.899 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 154.32
2021-08-25 10:47:03.905 | INFO     | src.policies:train:157 - Total loss: 224.23544311523438
2021-08-25 10:47:03.906 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:47:03.909 | INFO     | src.policies:train:103 - Epoch 404 / 800
2021-08-25 10:47:03.910 | INFO     | src.policies:train:109 - Episode 2364
2021-08-25 10:47:03.982 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:47:03.983 | INFO     | src.policies:train:121 - Mean episode return: 198.0
2021-08-25 10:47:03.984 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 154.36
2021-08-25 10:47:03.985 | INFO     | src.policies:train:109 - Episode 2365
2021-08-25 10:47:04.051 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done

2021-08-25 10:47:04.926 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:47:04.928 | INFO     | src.policies:train:121 - Mean episode return: 126.0
2021-08-25 10:47:04.929 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 153.97
2021-08-25 10:47:04.929 | INFO     | src.policies:train:109 - Episode 2379
2021-08-25 10:47:04.976 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:47:04.977 | INFO     | src.policies:train:121 - Mean episode return: 125.0
2021-08-25 10:47:04.978 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 154.16
2021-08-25 10:47:04.985 | INFO     | src.policies:train:157 - Total loss: 157.6848602294922
2021-08-25 10:47:04.986 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:47:04.989 | INFO     | src.policies:train:103 - Epoch 412 / 800
2021-08-25 10:47:04.990 | INFO     | src.policies:train:109 - Episode 2380

2021-08-25 10:47:05.833 | INFO     | src.policies:train:103 - Epoch 420 / 800
2021-08-25 10:47:05.834 | INFO     | src.policies:train:109 - Episode 2393
2021-08-25 10:47:05.899 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:47:05.901 | INFO     | src.policies:train:121 - Mean episode return: 183.0
2021-08-25 10:47:05.902 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 154.48
2021-08-25 10:47:05.902 | INFO     | src.policies:train:109 - Episode 2394
2021-08-25 10:47:05.975 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:47:05.976 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 10:47:05.977 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 155.84
2021-08-25 10:47:05.984 | INFO     | src.policies:train:157 - Total loss: 311.01300048828125
2021-08-25 10:47:05.985 | INFO     | src.policies:train:161 - Epoch infos: {}

2021-08-25 10:47:06.950 | INFO     | src.policies:train:109 - Episode 2408
2021-08-25 10:47:07.011 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:47:07.013 | INFO     | src.policies:train:121 - Mean episode return: 170.0
2021-08-25 10:47:07.014 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 159.68
2021-08-25 10:47:07.020 | INFO     | src.policies:train:157 - Total loss: 178.64529418945312
2021-08-25 10:47:07.021 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:47:07.025 | INFO     | src.policies:train:103 - Epoch 429 / 800
2021-08-25 10:47:07.026 | INFO     | src.policies:train:109 - Episode 2409
2021-08-25 10:47:07.097 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:47:07.099 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 10:47:07.100 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 159.83

2021-08-25 10:47:08.067 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 10:47:08.068 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 162.27
2021-08-25 10:47:08.073 | INFO     | src.policies:train:157 - Total loss: 391.68865966796875
2021-08-25 10:47:08.074 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:47:08.077 | INFO     | src.policies:train:103 - Epoch 438 / 800
2021-08-25 10:47:08.077 | INFO     | src.policies:train:109 - Episode 2423
2021-08-25 10:47:08.098 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:47:08.100 | INFO     | src.policies:train:121 - Mean episode return: 53.0
2021-08-25 10:47:08.101 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 161.72
2021-08-25 10:47:08.102 | INFO     | src.policies:train:109 - Episode 2424
2021-08-25 10:47:08.170 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done

2021-08-25 10:47:09.037 | INFO     | src.policies:train:109 - Episode 2437
2021-08-25 10:47:09.107 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:47:09.109 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 10:47:09.110 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 165.63
2021-08-25 10:47:09.115 | INFO     | src.policies:train:157 - Total loss: 199.7146759033203
2021-08-25 10:47:09.115 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:47:09.119 | INFO     | src.policies:train:103 - Epoch 448 / 800
2021-08-25 10:47:09.120 | INFO     | src.policies:train:109 - Episode 2438
2021-08-25 10:47:09.152 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:47:09.154 | INFO     | src.policies:train:121 - Mean episode return: 83.0
2021-08-25 10:47:09.155 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 165.04

2021-08-25 10:47:10.108 | INFO     | src.policies:train:103 - Epoch 459 / 800
2021-08-25 10:47:10.109 | INFO     | src.policies:train:109 - Episode 2451
2021-08-25 10:47:10.185 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:47:10.186 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 10:47:10.187 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 168.92
2021-08-25 10:47:10.193 | INFO     | src.policies:train:157 - Total loss: 269.92742919921875
2021-08-25 10:47:10.194 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:47:10.199 | INFO     | src.policies:train:103 - Epoch 460 / 800
2021-08-25 10:47:10.200 | INFO     | src.policies:train:109 - Episode 2452
2021-08-25 10:47:10.270 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:47:10.272 | INFO     | src.policies:train:121 - Mean episode return: 189.0

2021-08-25 10:47:11.283 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:47:11.286 | INFO     | src.policies:train:103 - Epoch 471 / 800
2021-08-25 10:47:11.287 | INFO     | src.policies:train:109 - Episode 2465
2021-08-25 10:47:11.358 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:47:11.360 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 10:47:11.361 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 172.92
2021-08-25 10:47:11.365 | INFO     | src.policies:train:157 - Total loss: 989.5156860351562
2021-08-25 10:47:11.366 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:47:11.369 | INFO     | src.policies:train:103 - Epoch 472 / 800
2021-08-25 10:47:11.370 | INFO     | src.policies:train:109 - Episode 2466
2021-08-25 10:47:11.438 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done

2021-08-25 10:47:12.426 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 10:47:12.427 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 176.51
2021-08-25 10:47:12.432 | INFO     | src.policies:train:157 - Total loss: 297.9515686035156
2021-08-25 10:47:12.433 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:47:12.436 | INFO     | src.policies:train:103 - Epoch 481 / 800
2021-08-25 10:47:12.436 | INFO     | src.policies:train:109 - Episode 2480
2021-08-25 10:47:12.508 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:47:12.510 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 10:47:12.511 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 177.32
2021-08-25 10:47:12.515 | INFO     | src.policies:train:157 - Total loss: 783.3734741210938
2021-08-25 10:47:12.516 | INFO     | src.policies:train:161 - Epoch infos: {}

2021-08-25 10:47:13.536 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:47:13.537 | INFO     | src.policies:train:121 - Mean episode return: 178.0
2021-08-25 10:47:13.538 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 181.38
2021-08-25 10:47:13.539 | INFO     | src.policies:train:109 - Episode 2494
2021-08-25 10:47:13.611 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:47:13.612 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 10:47:13.613 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 181.38
2021-08-25 10:47:13.620 | INFO     | src.policies:train:157 - Total loss: 888.3699340820312
2021-08-25 10:47:13.621 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:47:13.625 | INFO     | src.policies:train:103 - Epoch 493 / 800
2021-08-25 10:47:13.626 | INFO     | src.policies:train:109 - Episode 2495

2021-08-25 10:47:14.615 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:47:14.617 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 10:47:14.618 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 182.05
2021-08-25 10:47:14.622 | INFO     | src.policies:train:157 - Total loss: 722.765380859375
2021-08-25 10:47:14.623 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:47:14.627 | INFO     | src.policies:train:103 - Epoch 504 / 800
2021-08-25 10:47:14.628 | INFO     | src.policies:train:109 - Episode 2508
2021-08-25 10:47:14.701 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:47:14.702 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 10:47:14.703 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 182.35
2021-08-25 10:47:14.708 | INFO     | src.policies:train:157 - Total loss: 383.5592346191406


2021-08-25 10:47:15.700 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:47:15.703 | INFO     | src.policies:train:103 - Epoch 516 / 800
2021-08-25 10:47:15.704 | INFO     | src.policies:train:109 - Episode 2521
2021-08-25 10:47:15.776 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:47:15.777 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 10:47:15.778 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 185.02
2021-08-25 10:47:15.783 | INFO     | src.policies:train:157 - Total loss: 620.53662109375
2021-08-25 10:47:15.783 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:47:15.786 | INFO     | src.policies:train:103 - Epoch 517 / 800
2021-08-25 10:47:15.787 | INFO     | src.policies:train:109 - Episode 2522
2021-08-25 10:47:15.859 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done

2021-08-25 10:47:16.862 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:47:16.864 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 10:47:16.865 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 189.89
2021-08-25 10:47:16.870 | INFO     | src.policies:train:157 - Total loss: 555.3593139648438
2021-08-25 10:47:16.870 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:47:16.873 | INFO     | src.policies:train:103 - Epoch 530 / 800
2021-08-25 10:47:16.874 | INFO     | src.policies:train:109 - Episode 2535
2021-08-25 10:47:16.934 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:47:16.936 | INFO     | src.policies:train:121 - Mean episode return: 166.0
2021-08-25 10:47:16.937 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 190.06
2021-08-25 10:47:16.937 | INFO     | src.policies:train:109 - Episode 2536

2021-08-25 10:47:17.928 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:47:17.930 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 10:47:17.931 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 191.15
2021-08-25 10:47:17.935 | INFO     | src.policies:train:157 - Total loss: 589.1818237304688
2021-08-25 10:47:17.936 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:47:17.940 | INFO     | src.policies:train:103 - Epoch 541 / 800
2021-08-25 10:47:17.940 | INFO     | src.policies:train:109 - Episode 2549
2021-08-25 10:47:18.015 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:47:18.016 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 10:47:18.017 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 191.15
2021-08-25 10:47:18.022 | INFO     | src.policies:train:157 - Total loss: 435.5231018066406

2021-08-25 10:47:19.029 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 192.06
2021-08-25 10:47:19.034 | INFO     | src.policies:train:157 - Total loss: 504.1355285644531
2021-08-25 10:47:19.035 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:47:19.038 | INFO     | src.policies:train:103 - Epoch 554 / 800
2021-08-25 10:47:19.038 | INFO     | src.policies:train:109 - Episode 2562
2021-08-25 10:47:19.111 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:47:19.113 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 10:47:19.114 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 192.06
2021-08-25 10:47:19.119 | INFO     | src.policies:train:157 - Total loss: 655.6625366210938
2021-08-25 10:47:19.120 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:47:19.123 | INFO     | src.policies:train:103 - Epoch 555 / 800

2021-08-25 10:47:20.094 | INFO     | src.policies:train:109 - Episode 2575
2021-08-25 10:47:20.166 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:47:20.167 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 10:47:20.169 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 192.98
2021-08-25 10:47:20.173 | INFO     | src.policies:train:157 - Total loss: 325.7951965332031
2021-08-25 10:47:20.174 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:47:20.178 | INFO     | src.policies:train:103 - Epoch 567 / 800
2021-08-25 10:47:20.179 | INFO     | src.policies:train:109 - Episode 2576
2021-08-25 10:47:20.251 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:47:20.252 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 10:47:20.253 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 192.98

2021-08-25 10:47:21.267 | INFO     | src.policies:train:103 - Epoch 578 / 800
2021-08-25 10:47:21.268 | INFO     | src.policies:train:109 - Episode 2589
2021-08-25 10:47:21.343 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:47:21.345 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 10:47:21.346 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 193.53
2021-08-25 10:47:21.352 | INFO     | src.policies:train:157 - Total loss: 701.2713012695312
2021-08-25 10:47:21.352 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:47:21.355 | INFO     | src.policies:train:103 - Epoch 579 / 800
2021-08-25 10:47:21.356 | INFO     | src.policies:train:109 - Episode 2590
2021-08-25 10:47:21.429 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:47:21.430 | INFO     | src.policies:train:121 - Mean episode return: 200.0

2021-08-25 10:47:22.461 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 10:47:22.462 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 195.8
2021-08-25 10:47:22.467 | INFO     | src.policies:train:157 - Total loss: 728.1613159179688
2021-08-25 10:47:22.468 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:47:22.471 | INFO     | src.policies:train:103 - Epoch 592 / 800
2021-08-25 10:47:22.472 | INFO     | src.policies:train:109 - Episode 2603
2021-08-25 10:47:22.542 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:47:22.544 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 10:47:22.545 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 195.8
2021-08-25 10:47:22.550 | INFO     | src.policies:train:157 - Total loss: 707.6143188476562
2021-08-25 10:47:22.550 | INFO     | src.policies:train:161 - Epoch infos: {}

2021-08-25 10:47:23.565 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:47:23.567 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 10:47:23.568 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 194.33
2021-08-25 10:47:23.573 | INFO     | src.policies:train:157 - Total loss: 497.6723937988281
2021-08-25 10:47:23.573 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:47:23.577 | INFO     | src.policies:train:103 - Epoch 604 / 800
2021-08-25 10:47:23.578 | INFO     | src.policies:train:109 - Episode 2617
2021-08-25 10:47:23.648 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:47:23.649 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 10:47:23.650 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 194.33
2021-08-25 10:47:23.655 | INFO     | src.policies:train:157 - Total loss: 592.2322387695312

2021-08-25 10:47:24.617 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:47:24.620 | INFO     | src.policies:train:103 - Epoch 616 / 800
2021-08-25 10:47:24.621 | INFO     | src.policies:train:109 - Episode 2630
2021-08-25 10:47:24.681 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:47:24.683 | INFO     | src.policies:train:121 - Mean episode return: 163.0
2021-08-25 10:47:24.684 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 192.69
2021-08-25 10:47:24.685 | INFO     | src.policies:train:109 - Episode 2631
2021-08-25 10:47:24.758 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:47:24.759 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 10:47:24.760 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 192.69
2021-08-25 10:47:24.767 | INFO     | src.policies:train:157 - Total loss: 598.2463989257812

2021-08-25 10:47:25.775 | INFO     | src.policies:train:121 - Mean episode return: 176.0
2021-08-25 10:47:25.775 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 193.69
2021-08-25 10:47:25.783 | INFO     | src.policies:train:157 - Total loss: 599.33837890625
2021-08-25 10:47:25.784 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:47:25.786 | INFO     | src.policies:train:103 - Epoch 627 / 800
2021-08-25 10:47:25.788 | INFO     | src.policies:train:109 - Episode 2645
2021-08-25 10:47:25.835 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:47:25.837 | INFO     | src.policies:train:121 - Mean episode return: 132.0
2021-08-25 10:47:25.838 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 193.01
2021-08-25 10:47:25.839 | INFO     | src.policies:train:109 - Episode 2646
2021-08-25 10:47:25.904 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done

2021-08-25 10:47:26.727 | INFO     | src.policies:train:157 - Total loss: 455.884521484375
2021-08-25 10:47:26.728 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:47:26.731 | INFO     | src.policies:train:103 - Epoch 634 / 800
2021-08-25 10:47:26.732 | INFO     | src.policies:train:109 - Episode 2660
2021-08-25 10:47:26.790 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:47:26.792 | INFO     | src.policies:train:121 - Mean episode return: 166.0
2021-08-25 10:47:26.793 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 186.14
2021-08-25 10:47:26.794 | INFO     | src.policies:train:109 - Episode 2661
2021-08-25 10:47:26.857 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:47:26.858 | INFO     | src.policies:train:121 - Mean episode return: 177.0
2021-08-25 10:47:26.859 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 185.91

2021-08-25 10:47:27.644 | INFO     | src.policies:train:109 - Episode 2675
2021-08-25 10:47:27.694 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:47:27.695 | INFO     | src.policies:train:121 - Mean episode return: 137.0
2021-08-25 10:47:27.696 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 178.58
2021-08-25 10:47:27.697 | INFO     | src.policies:train:109 - Episode 2676
2021-08-25 10:47:27.759 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:47:27.760 | INFO     | src.policies:train:121 - Mean episode return: 169.0
2021-08-25 10:47:27.761 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 178.27
2021-08-25 10:47:27.768 | INFO     | src.policies:train:157 - Total loss: 442.4185485839844
2021-08-25 10:47:27.769 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:47:27.772 | INFO     | src.policies:train:103 - Epoch 642 / 800

2021-08-25 10:47:28.509 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:47:28.512 | INFO     | src.policies:train:103 - Epoch 648 / 800
2021-08-25 10:47:28.514 | INFO     | src.policies:train:109 - Episode 2691
2021-08-25 10:47:28.564 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:47:28.566 | INFO     | src.policies:train:121 - Mean episode return: 137.0
2021-08-25 10:47:28.567 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 167.46
2021-08-25 10:47:28.567 | INFO     | src.policies:train:109 - Episode 2692
2021-08-25 10:47:28.584 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:47:28.585 | INFO     | src.policies:train:121 - Mean episode return: 37.0
2021-08-25 10:47:28.586 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 165.83
2021-08-25 10:47:28.587 | INFO     | src.policies:train:109 - Episode 2693

2021-08-25 10:47:29.347 | INFO     | src.policies:train:157 - Total loss: 294.1414794921875
2021-08-25 10:47:29.348 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:47:29.351 | INFO     | src.policies:train:103 - Epoch 655 / 800
2021-08-25 10:47:29.352 | INFO     | src.policies:train:109 - Episode 2707
2021-08-25 10:47:29.405 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:47:29.406 | INFO     | src.policies:train:121 - Mean episode return: 147.0
2021-08-25 10:47:29.407 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 154.02
2021-08-25 10:47:29.408 | INFO     | src.policies:train:109 - Episode 2708
2021-08-25 10:47:29.462 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:47:29.463 | INFO     | src.policies:train:121 - Mean episode return: 148.0
2021-08-25 10:47:29.464 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 153.5

2021-08-25 10:47:30.237 | INFO     | src.policies:train:109 - Episode 2722
2021-08-25 10:47:30.307 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:47:30.309 | INFO     | src.policies:train:121 - Mean episode return: 195.0
2021-08-25 10:47:30.310 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 147.02
2021-08-25 10:47:30.311 | INFO     | src.policies:train:109 - Episode 2723
2021-08-25 10:47:30.365 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:47:30.367 | INFO     | src.policies:train:121 - Mean episode return: 152.0
2021-08-25 10:47:30.367 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 146.54
2021-08-25 10:47:30.374 | INFO     | src.policies:train:157 - Total loss: 388.5416259765625
2021-08-25 10:47:30.375 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:47:30.378 | INFO     | src.policies:train:103 - Epoch 663 / 800

2021-08-25 10:47:31.269 | INFO     | src.policies:train:157 - Total loss: 304.82952880859375
2021-08-25 10:47:31.269 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:47:31.273 | INFO     | src.policies:train:103 - Epoch 670 / 800
2021-08-25 10:47:31.274 | INFO     | src.policies:train:109 - Episode 2738
2021-08-25 10:47:31.325 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:47:31.327 | INFO     | src.policies:train:121 - Mean episode return: 143.0
2021-08-25 10:47:31.328 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 141.96
2021-08-25 10:47:31.329 | INFO     | src.policies:train:109 - Episode 2739
2021-08-25 10:47:31.399 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:47:31.401 | INFO     | src.policies:train:121 - Mean episode return: 187.0
2021-08-25 10:47:31.402 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 141.86

2021-08-25 10:47:32.293 | INFO     | src.policies:train:157 - Total loss: 279.0473937988281
2021-08-25 10:47:32.294 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:47:32.297 | INFO     | src.policies:train:103 - Epoch 678 / 800
2021-08-25 10:47:32.298 | INFO     | src.policies:train:109 - Episode 2753
2021-08-25 10:47:32.358 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:47:32.360 | INFO     | src.policies:train:121 - Mean episode return: 169.0
2021-08-25 10:47:32.361 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 139.86
2021-08-25 10:47:32.362 | INFO     | src.policies:train:109 - Episode 2754
2021-08-25 10:47:32.412 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:47:32.413 | INFO     | src.policies:train:121 - Mean episode return: 134.0
2021-08-25 10:47:32.414 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 139.52

2021-08-25 10:47:33.291 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 140.31
2021-08-25 10:47:33.292 | INFO     | src.policies:train:109 - Episode 2768
2021-08-25 10:47:33.364 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:47:33.365 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 10:47:33.366 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 140.84
2021-08-25 10:47:33.373 | INFO     | src.policies:train:157 - Total loss: 320.2264709472656
2021-08-25 10:47:33.374 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:47:33.377 | INFO     | src.policies:train:103 - Epoch 686 / 800
2021-08-25 10:47:33.378 | INFO     | src.policies:train:109 - Episode 2769
2021-08-25 10:47:33.432 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:47:33.433 | INFO     | src.policies:train:121 - Mean episode return: 149.0

2021-08-25 10:47:34.360 | INFO     | src.policies:train:109 - Episode 2782
2021-08-25 10:47:34.430 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:47:34.432 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 10:47:34.433 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 148.22
2021-08-25 10:47:34.438 | INFO     | src.policies:train:157 - Total loss: 457.1365051269531
2021-08-25 10:47:34.438 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:47:34.441 | INFO     | src.policies:train:103 - Epoch 696 / 800
2021-08-25 10:47:34.442 | INFO     | src.policies:train:109 - Episode 2783
2021-08-25 10:47:34.512 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:47:34.514 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 10:47:34.515 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 149.41

2021-08-25 10:47:35.525 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 159.27
2021-08-25 10:47:35.530 | INFO     | src.policies:train:157 - Total loss: 509.78399658203125
2021-08-25 10:47:35.530 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:47:35.533 | INFO     | src.policies:train:103 - Epoch 709 / 800
2021-08-25 10:47:35.534 | INFO     | src.policies:train:109 - Episode 2796
2021-08-25 10:47:35.606 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:47:35.608 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 10:47:35.609 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 160.93
2021-08-25 10:47:35.614 | INFO     | src.policies:train:157 - Total loss: 679.3924560546875
2021-08-25 10:47:35.615 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:47:35.619 | INFO     | src.policies:train:103 - Epoch 710 / 800

2021-08-25 10:47:36.640 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:47:36.642 | INFO     | src.policies:train:103 - Epoch 722 / 800
2021-08-25 10:47:36.643 | INFO     | src.policies:train:109 - Episode 2809
2021-08-25 10:47:36.717 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:47:36.718 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 10:47:36.720 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 170.23
2021-08-25 10:47:36.725 | INFO     | src.policies:train:157 - Total loss: 497.3808898925781
2021-08-25 10:47:36.726 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:47:36.729 | INFO     | src.policies:train:103 - Epoch 723 / 800
2021-08-25 10:47:36.730 | INFO     | src.policies:train:109 - Episode 2810
2021-08-25 10:47:36.803 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done

2021-08-25 10:47:37.819 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:47:37.821 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 10:47:37.822 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 177.52
2021-08-25 10:47:37.827 | INFO     | src.policies:train:157 - Total loss: 575.942626953125
2021-08-25 10:47:37.828 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:47:37.831 | INFO     | src.policies:train:103 - Epoch 736 / 800
2021-08-25 10:47:37.833 | INFO     | src.policies:train:109 - Episode 2823
2021-08-25 10:47:37.903 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:47:37.905 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 10:47:37.906 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 178.0
2021-08-25 10:47:37.910 | INFO     | src.policies:train:157 - Total loss: 586.8952026367188

2021-08-25 10:47:38.905 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 183.62
2021-08-25 10:47:38.910 | INFO     | src.policies:train:157 - Total loss: 588.3477172851562
2021-08-25 10:47:38.910 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:47:38.914 | INFO     | src.policies:train:103 - Epoch 749 / 800
2021-08-25 10:47:38.914 | INFO     | src.policies:train:109 - Episode 2836
2021-08-25 10:47:38.986 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:47:38.987 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 10:47:38.988 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 183.99
2021-08-25 10:47:38.993 | INFO     | src.policies:train:157 - Total loss: 573.646484375
2021-08-25 10:47:38.994 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:47:38.997 | INFO     | src.policies:train:103 - Epoch 750 / 800

2021-08-25 10:47:40.000 | INFO     | src.policies:train:103 - Epoch 762 / 800
2021-08-25 10:47:40.000 | INFO     | src.policies:train:109 - Episode 2849
2021-08-25 10:47:40.076 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:47:40.078 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 10:47:40.079 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 189.6
2021-08-25 10:47:40.085 | INFO     | src.policies:train:157 - Total loss: 586.1451416015625
2021-08-25 10:47:40.086 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:47:40.090 | INFO     | src.policies:train:103 - Epoch 763 / 800
2021-08-25 10:47:40.091 | INFO     | src.policies:train:109 - Episode 2850
2021-08-25 10:47:40.170 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:47:40.172 | INFO     | src.policies:train:121 - Mean episode return: 200.0

2021-08-25 10:47:41.150 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 10:47:41.151 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 195.8
2021-08-25 10:47:41.155 | INFO     | src.policies:train:157 - Total loss: 540.7841186523438
2021-08-25 10:47:41.156 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:47:41.159 | INFO     | src.policies:train:103 - Epoch 776 / 800
2021-08-25 10:47:41.159 | INFO     | src.policies:train:109 - Episode 2863
2021-08-25 10:47:41.227 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:47:41.229 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 10:47:41.230 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 196.28
2021-08-25 10:47:41.234 | INFO     | src.policies:train:157 - Total loss: 551.5658569335938
2021-08-25 10:47:41.235 | INFO     | src.policies:train:161 - Epoch infos: {}

2021-08-25 10:47:42.191 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:47:42.194 | INFO     | src.policies:train:103 - Epoch 789 / 800
2021-08-25 10:47:42.194 | INFO     | src.policies:train:109 - Episode 2876
2021-08-25 10:47:42.263 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:47:42.264 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 10:47:42.265 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 200.0
2021-08-25 10:47:42.270 | INFO     | src.policies:train:157 - Total loss: 596.014892578125
2021-08-25 10:47:42.270 | INFO     | src.policies:train:161 - Epoch infos: {}
2021-08-25 10:47:42.274 | INFO     | src.policies:train:103 - Epoch 790 / 800
2021-08-25 10:47:42.275 | INFO     | src.policies:train:109 - Episode 2877
2021-08-25 10:47:42.342 | DEBUG    | src.policies:execute_episode:266 - Early stopping, all agents done
2021-08-25 10:47:42.344 | INFO     | src.policies:t
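
The "Last 100 episodes mean return" value logged after each episode behaves like a rolling mean over the most recent episode returns. The sketch below reproduces that bookkeeping under that assumption. `rolling_mean_returns` and `is_solved` are hypothetical helpers written for illustration, not part of `src.policies`, and the default threshold of 200 is the figure quoted in the introduction of this notebook.

```python
from collections import deque


def rolling_mean_returns(episode_returns, window=100):
    """Mean of the most recent `window` returns, computed after every episode."""
    recent = deque(maxlen=window)  # older returns drop out automatically
    means = []
    for episode_return in episode_returns:
        recent.append(episode_return)
        means.append(sum(recent) / len(recent))
    return means


def is_solved(episode_returns, window=100, threshold=200.0):
    """True once a full window of episodes has a mean return >= threshold."""
    if len(episode_returns) < window:
        return False
    return rolling_mean_returns(episode_returns, window)[-1] >= threshold


# Two episodes with returns 73.0 and 13.0 average to 43.0 after the second one.
print(rolling_mean_returns([73.0, 13.0]))
```

Requiring a full window in `is_solved` is a design choice of this sketch: it avoids declaring success on the strength of a handful of lucky early episodes.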

## TRPO

This section deals with training a Cartpole agent using our custom Trust Region Policy Optimization implementation.
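
The next cell defines `beta` and `kl_target`. These names are commonly associated with a KL-penalised surrogate objective whose penalty coefficient is adapted towards a target KL divergence. Whether `policies.TRPOPolicy` works this way is an assumption, since its source is not shown in this notebook. The sketch below illustrates that general technique with NumPy only; `categorical_kl`, `penalized_surrogate` and `adapt_beta` are hypothetical names, and the 1.5 and 2.0 constants are a widely used heuristic rather than values taken from this repository.

```python
import numpy as np


def categorical_kl(old_logp, new_logp):
    """Mean KL(old || new) over a batch of categorical log-probabilities."""
    old_p = np.exp(old_logp)
    return float(np.mean(np.sum(old_p * (old_logp - new_logp), axis=-1)))


def penalized_surrogate(ratio, advantages, kl, beta):
    """Loss to minimise: negative surrogate advantage plus a KL penalty."""
    return float(-np.mean(ratio * advantages) + beta * kl)


def adapt_beta(beta, kl, kl_target, factor=1.5, scale=2.0):
    """Grow beta when the policy moved too far, shrink it when it barely moved."""
    if kl > kl_target * factor:
        return beta * scale
    if kl < kl_target / factor:
        return beta / scale
    return beta


# Illustrative values only: an old uniform policy and a sharper new policy.
old_logp = np.log(np.array([[0.5, 0.5]]))
new_logp = np.log(np.array([[0.9, 0.1]]))
kl = categorical_kl(old_logp, new_logp)
print(kl, adapt_beta(1.0, kl, kl_target=0.01))
```

With this rule, a measured KL well above `kl_target` increases the penalty for the next update, which pulls the new policy back towards the old one.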

In [None]:
# Hyperparameters forwarded to policies.TRPOPolicy in the next cell
beta = 1.0
kl_target = 0.01

In [18]:
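# Policy network: 4 observation features in, one output per action (2)
trpo_policy_nn = models.MLP(observation_space_size, hidden_sizes, action_space_size)
# Baseline network: one scalar output per observation, so log_softmax is turned off
trpo_baseline_nn = models.MLP(observation_space_size, hidden_sizes, 1, log_softmax=False)
trpo_policy = policies.TRPOPolicy(env, trpo_policy_nn, trpo_baseline_nn, beta=beta, kl_target=kl_target)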
trpo_policy.train(
    epochs,
    steps_per_epoch,
    enable_wandb=False,
    episodes_mean_reward=episodes_mean_reward
)

2021-08-25 11:17:27.623 | INFO     | src.policies:train:103 - Epoch 1 / 800
2021-08-25 11:17:27.624 | INFO     | src.policies:train:109 - Episode 1
2021-08-25 11:17:27.647 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:27.648 | INFO     | src.policies:train:121 - Mean episode return: 73.0
2021-08-25 11:17:27.649 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 73.0
2021-08-25 11:17:27.650 | INFO     | src.policies:train:109 - Episode 2
2021-08-25 11:17:27.658 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:27.659 | INFO     | src.policies:train:121 - Mean episode return: 13.0
2021-08-25 11:17:27.660 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 43.0
2021-08-25 11:17:27.661 | INFO     | src.policies:train:109 - Episode 3
2021-08-25 11:17:27.669 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done

2021-08-25 11:17:27.958 | INFO     | src.policies:train:121 - Mean episode return: 22.0
2021-08-25 11:17:27.959 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 28.333333333333332
2021-08-25 11:17:27.960 | INFO     | src.policies:train:109 - Episode 22
2021-08-25 11:17:27.967 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:27.969 | INFO     | src.policies:train:121 - Mean episode return: 12.0
2021-08-25 11:17:27.970 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 27.59090909090909
2021-08-25 11:17:27.971 | INFO     | src.policies:train:109 - Episode 23
2021-08-25 11:17:27.987 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:27.989 | INFO     | src.policies:train:121 - Mean episode return: 43.0
2021-08-25 11:17:27.989 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 28.26086956521739
2021-08-25 11:17:27.997 | INFO     | src.policie

2021-08-25 11:17:28.261 | INFO     | src.policies:train:157 - Total loss: 1.1009654998779297
2021-08-25 11:17:28.263 | INFO     | src.policies:train:103 - Epoch 6 / 800
2021-08-25 11:17:28.264 | INFO     | src.policies:train:109 - Episode 42
2021-08-25 11:17:28.274 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:28.275 | INFO     | src.policies:train:121 - Mean episode return: 22.0
2021-08-25 11:17:28.275 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 26.666666666666668
2021-08-25 11:17:28.276 | INFO     | src.policies:train:109 - Episode 43
2021-08-25 11:17:28.283 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:28.284 | INFO     | src.policies:train:121 - Mean episode return: 12.0
2021-08-25 11:17:28.285 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 26.325581395348838
2021-08-25 11:17:28.286 | INFO     | src.policies:train:109 - Episode 44

2021-08-25 11:17:28.536 | INFO     | src.policies:train:103 - Epoch 8 / 800
2021-08-25 11:17:28.537 | INFO     | src.policies:train:109 - Episode 62
2021-08-25 11:17:28.555 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:28.556 | INFO     | src.policies:train:121 - Mean episode return: 50.0
2021-08-25 11:17:28.557 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 25.548387096774192
2021-08-25 11:17:28.558 | INFO     | src.policies:train:109 - Episode 63
2021-08-25 11:17:28.589 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:28.591 | INFO     | src.policies:train:121 - Mean episode return: 89.0
2021-08-25 11:17:28.591 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 26.555555555555557
2021-08-25 11:17:28.592 | INFO     | src.policies:train:109 - Episode 64
2021-08-25 11:17:28.624 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done

2021-08-25 11:17:28.885 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:28.886 | INFO     | src.policies:train:121 - Mean episode return: 34.0
2021-08-25 11:17:28.887 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 26.98780487804878
2021-08-25 11:17:28.894 | INFO     | src.policies:train:157 - Total loss: 1.0239838361740112
2021-08-25 11:17:28.896 | INFO     | src.policies:train:103 - Epoch 11 / 800
2021-08-25 11:17:28.897 | INFO     | src.policies:train:109 - Episode 83
2021-08-25 11:17:28.915 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:28.916 | INFO     | src.policies:train:121 - Mean episode return: 50.0
2021-08-25 11:17:28.917 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 27.265060240963855
2021-08-25 11:17:28.918 | INFO     | src.policies:train:109 - Episode 84
2021-08-25 11:17:28.926 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done

2021-08-25 11:17:29.147 | INFO     | src.policies:train:103 - Epoch 13 / 800
2021-08-25 11:17:29.148 | INFO     | src.policies:train:109 - Episode 102
2021-08-25 11:17:29.157 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:29.158 | INFO     | src.policies:train:121 - Mean episode return: 19.0
2021-08-25 11:17:29.159 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 25.6
2021-08-25 11:17:29.159 | INFO     | src.policies:train:109 - Episode 103
2021-08-25 11:17:29.167 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:29.168 | INFO     | src.policies:train:121 - Mean episode return: 15.0
2021-08-25 11:17:29.169 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 25.61
2021-08-25 11:17:29.170 | INFO     | src.policies:train:109 - Episode 104
2021-08-25 11:17:29.184 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done

2021-08-25 11:17:29.479 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 26.41
2021-08-25 11:17:29.480 | INFO     | src.policies:train:109 - Episode 123
2021-08-25 11:17:29.499 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:29.500 | INFO     | src.policies:train:121 - Mean episode return: 48.0
2021-08-25 11:17:29.501 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 26.46
2021-08-25 11:17:29.502 | INFO     | src.policies:train:109 - Episode 124
2021-08-25 11:17:29.509 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:29.510 | INFO     | src.policies:train:121 - Mean episode return: 16.0
2021-08-25 11:17:29.511 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 26.37
2021-08-25 11:17:29.512 | INFO     | src.policies:train:109 - Episode 125
2021-08-25 11:17:29.522 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done

2021-08-25 11:17:29.752 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 25.46
2021-08-25 11:17:29.753 | INFO     | src.policies:train:109 - Episode 144
2021-08-25 11:17:29.764 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:29.765 | INFO     | src.policies:train:121 - Mean episode return: 22.0
2021-08-25 11:17:29.765 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 25.45
2021-08-25 11:17:29.766 | INFO     | src.policies:train:109 - Episode 145
2021-08-25 11:17:29.772 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:29.773 | INFO     | src.policies:train:121 - Mean episode return: 10.0
2021-08-25 11:17:29.774 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 25.36
2021-08-25 11:17:29.775 | INFO     | src.policies:train:109 - Episode 146
2021-08-25 11:17:29.794 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done

2021-08-25 11:17:30.042 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 24.12
2021-08-25 11:17:30.043 | INFO     | src.policies:train:109 - Episode 165
2021-08-25 11:17:30.051 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:30.052 | INFO     | src.policies:train:121 - Mean episode return: 17.0
2021-08-25 11:17:30.053 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 24.14
2021-08-25 11:17:30.060 | INFO     | src.policies:train:157 - Total loss: 0.9953268766403198
2021-08-25 11:17:30.062 | INFO     | src.policies:train:103 - Epoch 20 / 800
2021-08-25 11:17:30.064 | INFO     | src.policies:train:109 - Episode 166
2021-08-25 11:17:30.070 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:30.072 | INFO     | src.policies:train:121 - Mean episode return: 17.0
2021-08-25 11:17:30.072 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 24.07

2021-08-25 11:17:30.376 | INFO     | src.policies:train:109 - Episode 185
2021-08-25 11:17:30.384 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:30.385 | INFO     | src.policies:train:121 - Mean episode return: 14.0
2021-08-25 11:17:30.386 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 25.16
2021-08-25 11:17:30.387 | INFO     | src.policies:train:109 - Episode 186
2021-08-25 11:17:30.396 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:30.397 | INFO     | src.policies:train:121 - Mean episode return: 20.0
2021-08-25 11:17:30.398 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 25.21
2021-08-25 11:17:30.399 | INFO     | src.policies:train:109 - Episode 187
2021-08-25 11:17:30.409 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:30.410 | INFO     | src.policies:train:121 - Mean episode return: 25.0

2021-08-25 11:17:30.677 | INFO     | src.policies:train:109 - Episode 206
2021-08-25 11:17:30.686 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:30.687 | INFO     | src.policies:train:121 - Mean episode return: 15.0
2021-08-25 11:17:30.688 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 26.02
2021-08-25 11:17:30.697 | INFO     | src.policies:train:157 - Total loss: 0.9953267574310303
2021-08-25 11:17:30.701 | INFO     | src.policies:train:103 - Epoch 25 / 800
2021-08-25 11:17:30.703 | INFO     | src.policies:train:109 - Episode 207
2021-08-25 11:17:30.710 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:30.711 | INFO     | src.policies:train:121 - Mean episode return: 13.0
2021-08-25 11:17:30.712 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 25.25
2021-08-25 11:17:30.713 | INFO     | src.policies:train:109 - Episode 208

2021-08-25 11:17:30.990 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:30.991 | INFO     | src.policies:train:121 - Mean episode return: 11.0
2021-08-25 11:17:30.992 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 24.16
2021-08-25 11:17:30.993 | INFO     | src.policies:train:109 - Episode 227
2021-08-25 11:17:31.011 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:31.012 | INFO     | src.policies:train:121 - Mean episode return: 49.0
2021-08-25 11:17:31.013 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 24.49
2021-08-25 11:17:31.014 | INFO     | src.policies:train:109 - Episode 228
2021-08-25 11:17:31.027 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:31.028 | INFO     | src.policies:train:121 - Mean episode return: 31.0
2021-08-25 11:17:31.029 | INFO     | src.policies:train:122 - Last 100 episode

2021-08-25 11:17:31.300 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:31.301 | INFO     | src.policies:train:121 - Mean episode return: 19.0
2021-08-25 11:17:31.301 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 25.2
2021-08-25 11:17:31.302 | INFO     | src.policies:train:109 - Episode 248
2021-08-25 11:17:31.311 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:31.312 | INFO     | src.policies:train:121 - Mean episode return: 18.0
2021-08-25 11:17:31.313 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 25.15
2021-08-25 11:17:31.313 | INFO     | src.policies:train:109 - Episode 249
2021-08-25 11:17:31.324 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:31.325 | INFO     | src.policies:train:121 - Mean episode return: 23.0
2021-08-25 11:17:31.326 | INFO     | src.policies:train:122 - Last 100 episodes

2021-08-25 11:17:31.603 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:31.604 | INFO     | src.policies:train:121 - Mean episode return: 28.0
2021-08-25 11:17:31.605 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 25.48
2021-08-25 11:17:31.611 | INFO     | src.policies:train:157 - Total loss: 0.9955157041549683
2021-08-25 11:17:31.614 | INFO     | src.policies:train:103 - Epoch 32 / 800
2021-08-25 11:17:31.615 | INFO     | src.policies:train:109 - Episode 269
2021-08-25 11:17:31.624 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:31.625 | INFO     | src.policies:train:121 - Mean episode return: 22.0
2021-08-25 11:17:31.626 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 25.29
2021-08-25 11:17:31.626 | INFO     | src.policies:train:109 - Episode 270
2021-08-25 11:17:31.644 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done

2021-08-25 11:17:31.905 | INFO     | src.policies:train:121 - Mean episode return: 24.0
2021-08-25 11:17:31.906 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 24.56
2021-08-25 11:17:31.907 | INFO     | src.policies:train:109 - Episode 289
2021-08-25 11:17:31.914 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:31.915 | INFO     | src.policies:train:121 - Mean episode return: 14.0
2021-08-25 11:17:31.916 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 24.55
2021-08-25 11:17:31.917 | INFO     | src.policies:train:109 - Episode 290
2021-08-25 11:17:31.926 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:31.927 | INFO     | src.policies:train:121 - Mean episode return: 18.0
2021-08-25 11:17:31.927 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 24.35
2021-08-25 11:17:31.928 | INFO     | src.policies:train:109 - Episode 291

2021-08-25 11:17:32.199 | INFO     | src.policies:train:121 - Mean episode return: 32.0
2021-08-25 11:17:32.199 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 25.0
2021-08-25 11:17:32.200 | INFO     | src.policies:train:109 - Episode 310
2021-08-25 11:17:32.211 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:32.212 | INFO     | src.policies:train:121 - Mean episode return: 27.0
2021-08-25 11:17:32.213 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 25.12
2021-08-25 11:17:32.214 | INFO     | src.policies:train:109 - Episode 311
2021-08-25 11:17:32.231 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:32.232 | INFO     | src.policies:train:121 - Mean episode return: 47.0
2021-08-25 11:17:32.233 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 25.21
2021-08-25 11:17:32.240 | INFO     | src.policies:train:157 - Total loss: 0.99567055

2021-08-25 11:17:32.532 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 25.73
2021-08-25 11:17:32.533 | INFO     | src.policies:train:109 - Episode 330
2021-08-25 11:17:32.545 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:32.546 | INFO     | src.policies:train:121 - Mean episode return: 26.0
2021-08-25 11:17:32.547 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 25.77
2021-08-25 11:17:32.547 | INFO     | src.policies:train:109 - Episode 331
2021-08-25 11:17:32.569 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:32.570 | INFO     | src.policies:train:121 - Mean episode return: 52.0
2021-08-25 11:17:32.571 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 26.09
2021-08-25 11:17:32.572 | INFO     | src.policies:train:109 - Episode 332
2021-08-25 11:17:32.585 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done

2021-08-25 11:17:32.857 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 26.1
2021-08-25 11:17:32.858 | INFO     | src.policies:train:109 - Episode 351
2021-08-25 11:17:32.870 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:32.871 | INFO     | src.policies:train:121 - Mean episode return: 30.0
2021-08-25 11:17:32.872 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 26.18
2021-08-25 11:17:32.873 | INFO     | src.policies:train:109 - Episode 352
2021-08-25 11:17:32.883 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:32.884 | INFO     | src.policies:train:121 - Mean episode return: 21.0
2021-08-25 11:17:32.884 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 26.2
2021-08-25 11:17:32.885 | INFO     | src.policies:train:109 - Episode 353
2021-08-25 11:17:32.914 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done

2021-08-25 11:17:33.187 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 26.17
2021-08-25 11:17:33.196 | INFO     | src.policies:train:157 - Total loss: 0.9954748749732971
2021-08-25 11:17:33.200 | INFO     | src.policies:train:103 - Epoch 44 / 800
2021-08-25 11:17:33.201 | INFO     | src.policies:train:109 - Episode 372
2021-08-25 11:17:33.208 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:33.209 | INFO     | src.policies:train:121 - Mean episode return: 11.0
2021-08-25 11:17:33.210 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 26.08
2021-08-25 11:17:33.211 | INFO     | src.policies:train:109 - Episode 373
2021-08-25 11:17:33.221 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:33.222 | INFO     | src.policies:train:121 - Mean episode return: 12.0
2021-08-25 11:17:33.223 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 25.94

2021-08-25 11:17:33.524 | INFO     | src.policies:train:109 - Episode 392
2021-08-25 11:17:33.530 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:33.531 | INFO     | src.policies:train:121 - Mean episode return: 10.0
2021-08-25 11:17:33.532 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 25.89
2021-08-25 11:17:33.533 | INFO     | src.policies:train:109 - Episode 393
2021-08-25 11:17:33.539 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:33.540 | INFO     | src.policies:train:121 - Mean episode return: 12.0
2021-08-25 11:17:33.541 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 25.87
2021-08-25 11:17:33.542 | INFO     | src.policies:train:109 - Episode 394
2021-08-25 11:17:33.552 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:33.553 | INFO     | src.policies:train:121 - Mean episode return: 21.0

2021-08-25 11:17:33.868 | INFO     | src.policies:train:109 - Episode 413
2021-08-25 11:17:33.879 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:33.881 | INFO     | src.policies:train:121 - Mean episode return: 20.0
2021-08-25 11:17:33.882 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 25.9
2021-08-25 11:17:33.882 | INFO     | src.policies:train:109 - Episode 414
2021-08-25 11:17:33.894 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:33.895 | INFO     | src.policies:train:121 - Mean episode return: 24.0
2021-08-25 11:17:33.896 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 25.56
2021-08-25 11:17:33.897 | INFO     | src.policies:train:109 - Episode 415
2021-08-25 11:17:33.908 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:33.909 | INFO     | src.policies:train:121 - Mean episode return: 17.0

2021-08-25 11:17:34.231 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:34.232 | INFO     | src.policies:train:121 - Mean episode return: 20.0
2021-08-25 11:17:34.233 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 25.95
2021-08-25 11:17:34.234 | INFO     | src.policies:train:109 - Episode 434
2021-08-25 11:17:34.248 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:34.249 | INFO     | src.policies:train:121 - Mean episode return: 31.0
2021-08-25 11:17:34.250 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 25.59
2021-08-25 11:17:34.251 | INFO     | src.policies:train:109 - Episode 435
2021-08-25 11:17:34.260 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:34.261 | INFO     | src.policies:train:121 - Mean episode return: 13.0
2021-08-25 11:17:34.262 | INFO     | src.policies:train:122 - Last 100 episode

2021-08-25 11:17:34.581 | INFO     | src.policies:train:121 - Mean episode return: 12.0
2021-08-25 11:17:34.581 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 26.58
2021-08-25 11:17:34.582 | INFO     | src.policies:train:109 - Episode 454
2021-08-25 11:17:34.588 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:34.589 | INFO     | src.policies:train:121 - Mean episode return: 12.0
2021-08-25 11:17:34.590 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 26.37
2021-08-25 11:17:34.591 | INFO     | src.policies:train:109 - Episode 455
2021-08-25 11:17:34.598 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:34.599 | INFO     | src.policies:train:121 - Mean episode return: 14.0
2021-08-25 11:17:34.599 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 26.2
2021-08-25 11:17:34.600 | INFO     | src.policies:train:109 - Episode 456

2021-08-25 11:17:34.876 | INFO     | src.policies:train:121 - Mean episode return: 12.0
2021-08-25 11:17:34.876 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 26.77
2021-08-25 11:17:34.877 | INFO     | src.policies:train:109 - Episode 475
2021-08-25 11:17:34.889 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:34.890 | INFO     | src.policies:train:121 - Mean episode return: 30.0
2021-08-25 11:17:34.891 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 26.92
2021-08-25 11:17:34.892 | INFO     | src.policies:train:109 - Episode 476
2021-08-25 11:17:34.900 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:34.901 | INFO     | src.policies:train:121 - Mean episode return: 17.0
2021-08-25 11:17:34.902 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 26.92
2021-08-25 11:17:34.903 | INFO     | src.policies:train:109 - Episode 477

2021-08-25 11:17:35.198 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 28.28
2021-08-25 11:17:35.199 | INFO     | src.policies:train:109 - Episode 495
2021-08-25 11:17:35.207 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:35.208 | INFO     | src.policies:train:121 - Mean episode return: 16.0
2021-08-25 11:17:35.209 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 28.09
2021-08-25 11:17:35.210 | INFO     | src.policies:train:109 - Episode 496
2021-08-25 11:17:35.216 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:35.218 | INFO     | src.policies:train:121 - Mean episode return: 9.0
2021-08-25 11:17:35.218 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 27.63
2021-08-25 11:17:35.219 | INFO     | src.policies:train:109 - Episode 497
2021-08-25 11:17:35.229 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done

2021-08-25 11:17:35.543 | INFO     | src.policies:train:109 - Episode 515
2021-08-25 11:17:35.552 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:35.554 | INFO     | src.policies:train:121 - Mean episode return: 15.0
2021-08-25 11:17:35.555 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 28.3
2021-08-25 11:17:35.555 | INFO     | src.policies:train:109 - Episode 516
2021-08-25 11:17:35.566 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:35.567 | INFO     | src.policies:train:121 - Mean episode return: 18.0
2021-08-25 11:17:35.568 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 28.29
2021-08-25 11:17:35.569 | INFO     | src.policies:train:109 - Episode 517
2021-08-25 11:17:35.590 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:35.592 | INFO     | src.policies:train:121 - Mean episode return: 52.0

2021-08-25 11:17:35.925 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:35.926 | INFO     | src.policies:train:121 - Mean episode return: 41.0
2021-08-25 11:17:35.927 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 29.16
2021-08-25 11:17:35.928 | INFO     | src.policies:train:109 - Episode 536
2021-08-25 11:17:35.944 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:35.945 | INFO     | src.policies:train:121 - Mean episode return: 35.0
2021-08-25 11:17:35.945 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 29.15
2021-08-25 11:17:35.946 | INFO     | src.policies:train:109 - Episode 537
2021-08-25 11:17:35.956 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:35.957 | INFO     | src.policies:train:121 - Mean episode return: 16.0
2021-08-25 11:17:35.958 | INFO     | src.policies:train:122 - Last 100 episode

2021-08-25 11:17:36.257 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:36.258 | INFO     | src.policies:train:121 - Mean episode return: 14.0
2021-08-25 11:17:36.259 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 28.45
2021-08-25 11:17:36.260 | INFO     | src.policies:train:109 - Episode 557
2021-08-25 11:17:36.271 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:36.272 | INFO     | src.policies:train:121 - Mean episode return: 22.0
2021-08-25 11:17:36.273 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 28.42
2021-08-25 11:17:36.274 | INFO     | src.policies:train:109 - Episode 558
2021-08-25 11:17:36.292 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:36.293 | INFO     | src.policies:train:121 - Mean episode return: 44.0
2021-08-25 11:17:36.293 | INFO     | src.policies:train:122 - Last 100 episode

2021-08-25 11:17:36.617 | INFO     | src.policies:train:121 - Mean episode return: 9.0
2021-08-25 11:17:36.618 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 29.82
2021-08-25 11:17:36.619 | INFO     | src.policies:train:109 - Episode 577
2021-08-25 11:17:36.630 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:36.631 | INFO     | src.policies:train:121 - Mean episode return: 22.0
2021-08-25 11:17:36.632 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 29.75
2021-08-25 11:17:36.633 | INFO     | src.policies:train:109 - Episode 578
2021-08-25 11:17:36.642 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:36.643 | INFO     | src.policies:train:121 - Mean episode return: 15.0
2021-08-25 11:17:36.644 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 29.69
2021-08-25 11:17:36.644 | INFO     | src.policies:train:109 - Episode 579

2021-08-25 11:17:36.959 | INFO     | src.policies:train:109 - Episode 597
2021-08-25 11:17:36.968 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:36.969 | INFO     | src.policies:train:121 - Mean episode return: 17.0
2021-08-25 11:17:36.970 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 29.53
2021-08-25 11:17:36.971 | INFO     | src.policies:train:109 - Episode 598
2021-08-25 11:17:36.982 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:36.983 | INFO     | src.policies:train:121 - Mean episode return: 25.0
2021-08-25 11:17:36.984 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 29.28
2021-08-25 11:17:36.985 | INFO     | src.policies:train:109 - Episode 599
2021-08-25 11:17:37.000 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:37.001 | INFO     | src.policies:train:121 - Mean episode return: 34.0

2021-08-25 11:17:37.295 | INFO     | src.policies:train:109 - Episode 618
2021-08-25 11:17:37.309 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:37.310 | INFO     | src.policies:train:121 - Mean episode return: 32.0
2021-08-25 11:17:37.311 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 28.21
2021-08-25 11:17:37.312 | INFO     | src.policies:train:109 - Episode 619
2021-08-25 11:17:37.326 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:37.327 | INFO     | src.policies:train:121 - Mean episode return: 36.0
2021-08-25 11:17:37.328 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 28.36
2021-08-25 11:17:37.336 | INFO     | src.policies:train:157 - Total loss: 0.9951690435409546
2021-08-25 11:17:37.339 | INFO     | src.policies:train:103 - Epoch 76 / 800
2021-08-25 11:17:37.340 | INFO     | src.policies:train:109 - Episode 620

2021-08-25 11:17:37.645 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:37.646 | INFO     | src.policies:train:121 - Mean episode return: 58.0
2021-08-25 11:17:37.647 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 27.85
2021-08-25 11:17:37.648 | INFO     | src.policies:train:109 - Episode 639
2021-08-25 11:17:37.656 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:37.657 | INFO     | src.policies:train:121 - Mean episode return: 11.0
2021-08-25 11:17:37.657 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 27.79
2021-08-25 11:17:37.658 | INFO     | src.policies:train:109 - Episode 640
2021-08-25 11:17:37.673 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:37.674 | INFO     | src.policies:train:121 - Mean episode return: 33.0
2021-08-25 11:17:37.675 | INFO     | src.policies:train:122 - Last 100 episode

2021-08-25 11:17:37.977 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:37.978 | INFO     | src.policies:train:121 - Mean episode return: 24.0
2021-08-25 11:17:37.979 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 27.5
2021-08-25 11:17:37.980 | INFO     | src.policies:train:109 - Episode 660
2021-08-25 11:17:37.988 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:37.989 | INFO     | src.policies:train:121 - Mean episode return: 17.0
2021-08-25 11:17:37.990 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 27.52
2021-08-25 11:17:37.991 | INFO     | src.policies:train:109 - Episode 661
2021-08-25 11:17:38.000 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:38.001 | INFO     | src.policies:train:121 - Mean episode return: 15.0
2021-08-25 11:17:38.002 | INFO     | src.policies:train:122 - Last 100 episodes

2021-08-25 11:17:38.284 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 26.27
2021-08-25 11:17:38.284 | INFO     | src.policies:train:109 - Episode 680
2021-08-25 11:17:38.293 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:38.294 | INFO     | src.policies:train:121 - Mean episode return: 17.0
2021-08-25 11:17:38.295 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 26.29
2021-08-25 11:17:38.296 | INFO     | src.policies:train:109 - Episode 681
2021-08-25 11:17:38.304 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:38.305 | INFO     | src.policies:train:121 - Mean episode return: 15.0
2021-08-25 11:17:38.306 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 26.31
2021-08-25 11:17:38.307 | INFO     | src.policies:train:109 - Episode 682
2021-08-25 11:17:38.315 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agent

2021-08-25 11:17:38.594 | INFO     | src.policies:train:109 - Episode 701
2021-08-25 11:17:38.601 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:38.602 | INFO     | src.policies:train:121 - Mean episode return: 16.0
2021-08-25 11:17:38.603 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 25.74
2021-08-25 11:17:38.609 | INFO     | src.policies:train:157 - Total loss: 0.9950737953186035
2021-08-25 11:17:38.611 | INFO     | src.policies:train:103 - Epoch 86 / 800
2021-08-25 11:17:38.612 | INFO     | src.policies:train:109 - Episode 702
2021-08-25 11:17:38.619 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:38.620 | INFO     | src.policies:train:121 - Mean episode return: 14.0
2021-08-25 11:17:38.621 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 25.73
2021-08-25 11:17:38.622 | INFO     | src.policies:train:109 - Episode 703

2021-08-25 11:17:38.876 | INFO     | src.policies:train:157 - Total loss: 0.9953268766403198
2021-08-25 11:17:38.878 | INFO     | src.policies:train:103 - Epoch 88 / 800
2021-08-25 11:17:38.879 | INFO     | src.policies:train:109 - Episode 722
2021-08-25 11:17:38.887 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:38.888 | INFO     | src.policies:train:121 - Mean episode return: 18.0
2021-08-25 11:17:38.889 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 24.48
2021-08-25 11:17:38.890 | INFO     | src.policies:train:109 - Episode 723
2021-08-25 11:17:38.910 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:38.911 | INFO     | src.policies:train:121 - Mean episode return: 52.0
2021-08-25 11:17:38.912 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 24.73
2021-08-25 11:17:38.913 | INFO     | src.policies:train:109 - Episode 724

2021-08-25 11:17:39.187 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:39.188 | INFO     | src.policies:train:121 - Mean episode return: 22.0
2021-08-25 11:17:39.189 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 25.06
2021-08-25 11:17:39.190 | INFO     | src.policies:train:109 - Episode 743
2021-08-25 11:17:39.206 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:39.207 | INFO     | src.policies:train:121 - Mean episode return: 42.0
2021-08-25 11:17:39.208 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 25.17
2021-08-25 11:17:39.209 | INFO     | src.policies:train:109 - Episode 744
2021-08-25 11:17:39.217 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:39.218 | INFO     | src.policies:train:121 - Mean episode return: 16.0
2021-08-25 11:17:39.219 | INFO     | src.policies:train:122 - Last 100 episode

2021-08-25 11:17:39.539 | INFO     | src.policies:train:121 - Mean episode return: 35.0
2021-08-25 11:17:39.540 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 26.99
2021-08-25 11:17:39.541 | INFO     | src.policies:train:109 - Episode 763
2021-08-25 11:17:39.547 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:39.548 | INFO     | src.policies:train:121 - Mean episode return: 12.0
2021-08-25 11:17:39.549 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 26.95
2021-08-25 11:17:39.550 | INFO     | src.policies:train:109 - Episode 764
2021-08-25 11:17:39.558 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:39.559 | INFO     | src.policies:train:121 - Mean episode return: 16.0
2021-08-25 11:17:39.560 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 26.89
2021-08-25 11:17:39.560 | INFO     | src.policies:train:109 - Episode 765

2021-08-25 11:17:39.863 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 27.63
2021-08-25 11:17:39.864 | INFO     | src.policies:train:109 - Episode 783
2021-08-25 11:17:39.872 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:39.873 | INFO     | src.policies:train:121 - Mean episode return: 18.0
2021-08-25 11:17:39.874 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 27.66
2021-08-25 11:17:39.875 | INFO     | src.policies:train:109 - Episode 784
2021-08-25 11:17:39.885 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:39.886 | INFO     | src.policies:train:121 - Mean episode return: 22.0
2021-08-25 11:17:39.887 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 27.68
2021-08-25 11:17:39.887 | INFO     | src.policies:train:109 - Episode 785
2021-08-25 11:17:39.904 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agent

2021-08-25 11:17:40.176 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 27.47
2021-08-25 11:17:40.177 | INFO     | src.policies:train:109 - Episode 804
2021-08-25 11:17:40.193 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:40.194 | INFO     | src.policies:train:121 - Mean episode return: 41.0
2021-08-25 11:17:40.195 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 27.71
2021-08-25 11:17:40.202 | INFO     | src.policies:train:157 - Total loss: 0.9951218366622925
2021-08-25 11:17:40.205 | INFO     | src.policies:train:103 - Epoch 99 / 800
2021-08-25 11:17:40.205 | INFO     | src.policies:train:109 - Episode 805
2021-08-25 11:17:40.212 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:40.213 | INFO     | src.policies:train:121 - Mean episode return: 14.0
2021-08-25 11:17:40.214 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 27.69

2021-08-25 11:17:40.508 | INFO     | src.policies:train:109 - Episode 824
2021-08-25 11:17:40.516 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:40.517 | INFO     | src.policies:train:121 - Mean episode return: 15.0
2021-08-25 11:17:40.518 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 28.78
2021-08-25 11:17:40.519 | INFO     | src.policies:train:109 - Episode 825
2021-08-25 11:17:40.532 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:40.533 | INFO     | src.policies:train:121 - Mean episode return: 33.0
2021-08-25 11:17:40.534 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 28.72
2021-08-25 11:17:40.541 | INFO     | src.policies:train:157 - Total loss: 0.995073676109314
2021-08-25 11:17:40.543 | INFO     | src.policies:train:103 - Epoch 102 / 800
2021-08-25 11:17:40.544 | INFO     | src.policies:train:109 - Episode 826

2021-08-25 11:17:40.847 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:40.848 | INFO     | src.policies:train:121 - Mean episode return: 14.0
2021-08-25 11:17:40.849 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 28.97
2021-08-25 11:17:40.850 | INFO     | src.policies:train:109 - Episode 845
2021-08-25 11:17:40.867 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:40.868 | INFO     | src.policies:train:121 - Mean episode return: 32.0
2021-08-25 11:17:40.869 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 28.91
2021-08-25 11:17:40.870 | INFO     | src.policies:train:109 - Episode 846
2021-08-25 11:17:40.880 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:40.882 | INFO     | src.policies:train:121 - Mean episode return: 13.0
2021-08-25 11:17:40.883 | INFO     | src.policies:train:122 - Last 100 episode

2021-08-25 11:17:41.197 | INFO     | src.policies:train:121 - Mean episode return: 32.0
2021-08-25 11:17:41.198 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 28.64
2021-08-25 11:17:41.199 | INFO     | src.policies:train:109 - Episode 865
2021-08-25 11:17:41.209 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:41.210 | INFO     | src.policies:train:121 - Mean episode return: 23.0
2021-08-25 11:17:41.210 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 28.71
2021-08-25 11:17:41.211 | INFO     | src.policies:train:109 - Episode 866
2021-08-25 11:17:41.219 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:41.220 | INFO     | src.policies:train:121 - Mean episode return: 13.0
2021-08-25 11:17:41.221 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 28.53
2021-08-25 11:17:41.222 | INFO     | src.policies:train:109 - Episode 867

2021-08-25 11:17:41.522 | INFO     | src.policies:train:121 - Mean episode return: 35.0
2021-08-25 11:17:41.523 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 28.32
2021-08-25 11:17:41.524 | INFO     | src.policies:train:109 - Episode 886
2021-08-25 11:17:41.537 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:41.538 | INFO     | src.policies:train:121 - Mean episode return: 33.0
2021-08-25 11:17:41.539 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 28.34
2021-08-25 11:17:41.545 | INFO     | src.policies:train:157 - Total loss: 0.9952152967453003
2021-08-25 11:17:41.547 | INFO     | src.policies:train:103 - Epoch 110 / 800
2021-08-25 11:17:41.548 | INFO     | src.policies:train:109 - Episode 887
2021-08-25 11:17:41.554 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:41.555 | INFO     | src.policies:train:121 - Mean episode return: 11.0

2021-08-25 11:17:41.821 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 28.09
2021-08-25 11:17:41.822 | INFO     | src.policies:train:109 - Episode 906
2021-08-25 11:17:41.830 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:41.831 | INFO     | src.policies:train:121 - Mean episode return: 15.0
2021-08-25 11:17:41.832 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 28.08
2021-08-25 11:17:41.833 | INFO     | src.policies:train:109 - Episode 907
2021-08-25 11:17:41.848 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:41.849 | INFO     | src.policies:train:121 - Mean episode return: 40.0
2021-08-25 11:17:41.850 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 27.98
2021-08-25 11:17:41.851 | INFO     | src.policies:train:109 - Episode 908
2021-08-25 11:17:41.870 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agent

2021-08-25 11:17:42.146 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 27.81
2021-08-25 11:17:42.153 | INFO     | src.policies:train:157 - Total loss: 0.9955354332923889
2021-08-25 11:17:42.156 | INFO     | src.policies:train:103 - Epoch 115 / 800
2021-08-25 11:17:42.157 | INFO     | src.policies:train:109 - Episode 927
2021-08-25 11:17:42.166 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:42.167 | INFO     | src.policies:train:121 - Mean episode return: 21.0
2021-08-25 11:17:42.168 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 27.88
2021-08-25 11:17:42.169 | INFO     | src.policies:train:109 - Episode 928
2021-08-25 11:17:42.180 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:42.181 | INFO     | src.policies:train:121 - Mean episode return: 23.0
2021-08-25 11:17:42.182 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 27.93


2021-08-25 11:17:42.507 | INFO     | src.policies:train:157 - Total loss: 0.9956139326095581
2021-08-25 11:17:42.509 | INFO     | src.policies:train:103 - Epoch 118 / 800
2021-08-25 11:17:42.511 | INFO     | src.policies:train:109 - Episode 947
2021-08-25 11:17:42.526 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:42.527 | INFO     | src.policies:train:121 - Mean episode return: 41.0
2021-08-25 11:17:42.527 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 29.43
2021-08-25 11:17:42.528 | INFO     | src.policies:train:109 - Episode 948
2021-08-25 11:17:42.545 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:42.546 | INFO     | src.policies:train:121 - Mean episode return: 40.0
2021-08-25 11:17:42.547 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 29.62
2021-08-25 11:17:42.548 | INFO     | src.policies:train:109 - Episode 949

2021-08-25 11:17:42.881 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:42.882 | INFO     | src.policies:train:121 - Mean episode return: 57.0
2021-08-25 11:17:42.883 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 29.56
2021-08-25 11:17:42.890 | INFO     | src.policies:train:157 - Total loss: 0.9954127073287964
2021-08-25 11:17:42.893 | INFO     | src.policies:train:103 - Epoch 121 / 800
2021-08-25 11:17:42.894 | INFO     | src.policies:train:109 - Episode 968
2021-08-25 11:17:42.905 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:42.906 | INFO     | src.policies:train:121 - Mean episode return: 25.0
2021-08-25 11:17:42.907 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 29.25
2021-08-25 11:17:42.908 | INFO     | src.policies:train:109 - Episode 969
2021-08-25 11:17:42.929 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents 

2021-08-25 11:17:43.284 | INFO     | src.policies:train:121 - Mean episode return: 21.0
2021-08-25 11:17:43.285 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 30.63
2021-08-25 11:17:43.286 | INFO     | src.policies:train:109 - Episode 988
2021-08-25 11:17:43.311 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:43.312 | INFO     | src.policies:train:121 - Mean episode return: 59.0
2021-08-25 11:17:43.314 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 31.08
2021-08-25 11:17:43.324 | INFO     | src.policies:train:157 - Total loss: 0.9953915476799011
2021-08-25 11:17:43.327 | INFO     | src.policies:train:103 - Epoch 124 / 800
2021-08-25 11:17:43.328 | INFO     | src.policies:train:109 - Episode 989
2021-08-25 11:17:43.335 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:43.336 | INFO     | src.policies:train:121 - Mean episode return: 12.0

2021-08-25 11:17:43.682 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 31.62
2021-08-25 11:17:43.683 | INFO     | src.policies:train:109 - Episode 1008
2021-08-25 11:17:43.693 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:43.694 | INFO     | src.policies:train:121 - Mean episode return: 19.0
2021-08-25 11:17:43.695 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 31.36
2021-08-25 11:17:43.696 | INFO     | src.policies:train:109 - Episode 1009
2021-08-25 11:17:43.716 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:43.718 | INFO     | src.policies:train:121 - Mean episode return: 49.0
2021-08-25 11:17:43.719 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 31.39
2021-08-25 11:17:43.720 | INFO     | src.policies:train:109 - Episode 1010
2021-08-25 11:17:43.736 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all ag

2021-08-25 11:17:44.078 | INFO     | src.policies:train:109 - Episode 1028
2021-08-25 11:17:44.116 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:44.118 | INFO     | src.policies:train:121 - Mean episode return: 98.0
2021-08-25 11:17:44.119 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 32.94
2021-08-25 11:17:44.120 | INFO     | src.policies:train:109 - Episode 1029
2021-08-25 11:17:44.132 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:44.133 | INFO     | src.policies:train:121 - Mean episode return: 26.0
2021-08-25 11:17:44.134 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 32.38
2021-08-25 11:17:44.143 | INFO     | src.policies:train:157 - Total loss: 0.995326817035675
2021-08-25 11:17:44.146 | INFO     | src.policies:train:103 - Epoch 130 / 800
2021-08-25 11:17:44.147 | INFO     | src.policies:train:109 - Episode 1030

2021-08-25 11:17:44.635 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 36.37
2021-08-25 11:17:44.636 | INFO     | src.policies:train:109 - Episode 1047
2021-08-25 11:17:44.646 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:44.647 | INFO     | src.policies:train:121 - Mean episode return: 16.0
2021-08-25 11:17:44.648 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 36.12
2021-08-25 11:17:44.649 | INFO     | src.policies:train:109 - Episode 1048
2021-08-25 11:17:44.659 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:44.660 | INFO     | src.policies:train:121 - Mean episode return: 21.0
2021-08-25 11:17:44.661 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 35.93
2021-08-25 11:17:44.662 | INFO     | src.policies:train:109 - Episode 1049
2021-08-25 11:17:44.693 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all ag

2021-08-25 11:17:45.062 | INFO     | src.policies:train:109 - Episode 1067
2021-08-25 11:17:45.106 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:45.107 | INFO     | src.policies:train:121 - Mean episode return: 130.0
2021-08-25 11:17:45.108 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 39.39
2021-08-25 11:17:45.115 | INFO     | src.policies:train:157 - Total loss: 0.9968251585960388
2021-08-25 11:17:45.118 | INFO     | src.policies:train:103 - Epoch 138 / 800
2021-08-25 11:17:45.119 | INFO     | src.policies:train:109 - Episode 1068
2021-08-25 11:17:45.131 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:45.132 | INFO     | src.policies:train:121 - Mean episode return: 33.0
2021-08-25 11:17:45.133 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 39.47
2021-08-25 11:17:45.134 | INFO     | src.policies:train:109 - Episode 1069

2021-08-25 11:17:45.529 | INFO     | src.policies:train:121 - Mean episode return: 65.0
2021-08-25 11:17:45.529 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 41.06
2021-08-25 11:17:45.530 | INFO     | src.policies:train:109 - Episode 1087
2021-08-25 11:17:45.539 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:45.540 | INFO     | src.policies:train:121 - Mean episode return: 13.0
2021-08-25 11:17:45.541 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 40.98
2021-08-25 11:17:45.542 | INFO     | src.policies:train:109 - Episode 1088
2021-08-25 11:17:45.551 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:45.552 | INFO     | src.policies:train:121 - Mean episode return: 21.0
2021-08-25 11:17:45.553 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 40.6
2021-08-25 11:17:45.554 | INFO     | src.policies:train:109 - Episode 1089

2021-08-25 11:17:45.955 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:45.956 | INFO     | src.policies:train:121 - Mean episode return: 20.0
2021-08-25 11:17:45.957 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 43.29
2021-08-25 11:17:45.958 | INFO     | src.policies:train:109 - Episode 1107
2021-08-25 11:17:45.975 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:45.976 | INFO     | src.policies:train:121 - Mean episode return: 42.0
2021-08-25 11:17:45.977 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 43.46
2021-08-25 11:17:45.978 | INFO     | src.policies:train:109 - Episode 1108
2021-08-25 11:17:46.029 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:46.030 | INFO     | src.policies:train:121 - Mean episode return: 145.0
2021-08-25 11:17:46.031 | INFO     | src.policies:train:122 - Last 100 epis

2021-08-25 11:17:46.442 | INFO     | src.policies:train:103 - Epoch 150 / 800
2021-08-25 11:17:46.443 | INFO     | src.policies:train:109 - Episode 1125
2021-08-25 11:17:46.450 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:46.452 | INFO     | src.policies:train:121 - Mean episode return: 15.0
2021-08-25 11:17:46.453 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 48.23
2021-08-25 11:17:46.453 | INFO     | src.policies:train:109 - Episode 1126
2021-08-25 11:17:46.464 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:46.466 | INFO     | src.policies:train:121 - Mean episode return: 25.0
2021-08-25 11:17:46.466 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 47.88
2021-08-25 11:17:46.467 | INFO     | src.policies:train:109 - Episode 1127
2021-08-25 11:17:46.479 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done

2021-08-25 11:17:46.947 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:46.948 | INFO     | src.policies:train:121 - Mean episode return: 77.0
2021-08-25 11:17:46.949 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 48.82
2021-08-25 11:17:46.950 | INFO     | src.policies:train:109 - Episode 1145
2021-08-25 11:17:46.971 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:46.972 | INFO     | src.policies:train:121 - Mean episode return: 51.0
2021-08-25 11:17:46.973 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 48.49
2021-08-25 11:17:46.979 | INFO     | src.policies:train:157 - Total loss: 0.9952149987220764
2021-08-25 11:17:46.982 | INFO     | src.policies:train:103 - Epoch 155 / 800
2021-08-25 11:17:46.983 | INFO     | src.policies:train:109 - Episode 1146
2021-08-25 11:17:47.016 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agent

2021-08-25 11:17:47.439 | INFO     | src.policies:train:109 - Episode 1163
2021-08-25 11:17:47.450 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:47.452 | INFO     | src.policies:train:121 - Mean episode return: 29.0
2021-08-25 11:17:47.452 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 51.0
2021-08-25 11:17:47.454 | INFO     | src.policies:train:109 - Episode 1164
2021-08-25 11:17:47.469 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:47.470 | INFO     | src.policies:train:121 - Mean episode return: 36.0
2021-08-25 11:17:47.471 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 51.13
2021-08-25 11:17:47.472 | INFO     | src.policies:train:109 - Episode 1165
2021-08-25 11:17:47.535 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:47.536 | INFO     | src.policies:train:121 - Mean episode return: 184.0

2021-08-25 11:17:48.001 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 55.52
2021-08-25 11:17:48.002 | INFO     | src.policies:train:109 - Episode 1182
2021-08-25 11:17:48.017 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:48.018 | INFO     | src.policies:train:121 - Mean episode return: 37.0
2021-08-25 11:17:48.019 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 55.73
2021-08-25 11:17:48.020 | INFO     | src.policies:train:109 - Episode 1183
2021-08-25 11:17:48.034 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:48.035 | INFO     | src.policies:train:121 - Mean episode return: 32.0
2021-08-25 11:17:48.036 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 55.64
2021-08-25 11:17:48.037 | INFO     | src.policies:train:109 - Episode 1184
2021-08-25 11:17:48.064 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all ag

2021-08-25 11:17:48.652 | INFO     | src.policies:train:103 - Epoch 170 / 800
2021-08-25 11:17:48.653 | INFO     | src.policies:train:109 - Episode 1200
2021-08-25 11:17:48.660 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:48.662 | INFO     | src.policies:train:121 - Mean episode return: 16.0
2021-08-25 11:17:48.663 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 63.08
2021-08-25 11:17:48.664 | INFO     | src.policies:train:109 - Episode 1201
2021-08-25 11:17:48.705 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:48.706 | INFO     | src.policies:train:121 - Mean episode return: 117.0
2021-08-25 11:17:48.707 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 63.75
2021-08-25 11:17:48.708 | INFO     | src.policies:train:109 - Episode 1202
2021-08-25 11:17:48.738 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done

2021-08-25 11:17:49.327 | INFO     | src.policies:train:103 - Epoch 176 / 800
2021-08-25 11:17:49.328 | INFO     | src.policies:train:109 - Episode 1218
2021-08-25 11:17:49.359 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:49.360 | INFO     | src.policies:train:121 - Mean episode return: 87.0
2021-08-25 11:17:49.361 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 69.13
2021-08-25 11:17:49.362 | INFO     | src.policies:train:109 - Episode 1219
2021-08-25 11:17:49.394 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:49.395 | INFO     | src.policies:train:121 - Mean episode return: 90.0
2021-08-25 11:17:49.396 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 69.54
2021-08-25 11:17:49.397 | INFO     | src.policies:train:109 - Episode 1220
2021-08-25 11:17:49.430 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done

2021-08-25 11:17:50.047 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:17:50.048 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 76.02
2021-08-25 11:17:50.055 | INFO     | src.policies:train:157 - Total loss: 0.9969878792762756
2021-08-25 11:17:50.057 | INFO     | src.policies:train:103 - Epoch 182 / 800
2021-08-25 11:17:50.058 | INFO     | src.policies:train:109 - Episode 1237
2021-08-25 11:17:50.080 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:50.082 | INFO     | src.policies:train:121 - Mean episode return: 58.0
2021-08-25 11:17:50.083 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 75.51
2021-08-25 11:17:50.083 | INFO     | src.policies:train:109 - Episode 1238
2021-08-25 11:17:50.101 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:50.103 | INFO     | src.policies:train:121 - Mean episode return: 46.0

2021-08-25 11:17:50.869 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 86.75
2021-08-25 11:17:50.874 | INFO     | src.policies:train:157 - Total loss: 0.9949999451637268
2021-08-25 11:17:50.877 | INFO     | src.policies:train:103 - Epoch 189 / 800
2021-08-25 11:17:50.877 | INFO     | src.policies:train:109 - Episode 1254
2021-08-25 11:17:50.933 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:50.935 | INFO     | src.policies:train:121 - Mean episode return: 176.0
2021-08-25 11:17:50.936 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 87.57
2021-08-25 11:17:50.936 | INFO     | src.policies:train:109 - Episode 1255
2021-08-25 11:17:50.988 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:50.989 | INFO     | src.policies:train:121 - Mean episode return: 162.0
2021-08-25 11:17:50.990 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 88

2021-08-25 11:17:51.937 | INFO     | src.policies:train:121 - Mean episode return: 189.0
2021-08-25 11:17:51.938 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 104.26
2021-08-25 11:17:51.939 | INFO     | src.policies:train:109 - Episode 1270
2021-08-25 11:17:51.957 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:51.958 | INFO     | src.policies:train:121 - Mean episode return: 47.0
2021-08-25 11:17:51.958 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 104.25
2021-08-25 11:17:51.965 | INFO     | src.policies:train:157 - Total loss: 0.9957625269889832
2021-08-25 11:17:51.968 | INFO     | src.policies:train:103 - Epoch 200 / 800
2021-08-25 11:17:51.969 | INFO     | src.policies:train:109 - Episode 1271
2021-08-25 11:17:52.023 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:52.024 | INFO     | src.policies:train:121 - Mean episode return: 159.0
2021-

2021-08-25 11:17:52.809 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:17:52.810 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 114.37
2021-08-25 11:17:52.816 | INFO     | src.policies:train:157 - Total loss: 0.9961684346199036
2021-08-25 11:17:52.818 | INFO     | src.policies:train:103 - Epoch 208 / 800
2021-08-25 11:17:52.819 | INFO     | src.policies:train:109 - Episode 1287
2021-08-25 11:17:52.882 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:52.884 | INFO     | src.policies:train:121 - Mean episode return: 189.0
2021-08-25 11:17:52.885 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 115.3
2021-08-25 11:17:52.886 | INFO     | src.policies:train:109 - Episode 1288
2021-08-25 11:17:52.946 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:52.947 | INFO     | src.policies:train:121 - Mean episode return: 182.0
2021-

2021-08-25 11:17:53.879 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:53.881 | INFO     | src.policies:train:121 - Mean episode return: 126.0
2021-08-25 11:17:53.882 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 128.48
2021-08-25 11:17:53.882 | INFO     | src.policies:train:109 - Episode 1303
2021-08-25 11:17:53.944 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:53.945 | INFO     | src.policies:train:121 - Mean episode return: 186.0
2021-08-25 11:17:53.946 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 129.77
2021-08-25 11:17:53.954 | INFO     | src.policies:train:157 - Total loss: 0.9967948198318481
2021-08-25 11:17:53.958 | INFO     | src.policies:train:103 - Epoch 218 / 800
2021-08-25 11:17:53.959 | INFO     | src.policies:train:109 - Episode 1304
2021-08-25 11:17:54.025 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all a

2021-08-25 11:17:55.000 | INFO     | src.policies:train:121 - Mean episode return: 148.0
2021-08-25 11:17:55.001 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 141.37
2021-08-25 11:17:55.008 | INFO     | src.policies:train:157 - Total loss: 0.9962685108184814
2021-08-25 11:17:55.010 | INFO     | src.policies:train:103 - Epoch 228 / 800
2021-08-25 11:17:55.011 | INFO     | src.policies:train:109 - Episode 1319
2021-08-25 11:17:55.042 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:55.044 | INFO     | src.policies:train:121 - Mean episode return: 78.0
2021-08-25 11:17:55.045 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 141.25
2021-08-25 11:17:55.045 | INFO     | src.policies:train:109 - Episode 1320
2021-08-25 11:17:55.085 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:55.086 | INFO     | src.policies:train:121 - Mean episode return: 110.0
2021-

2021-08-25 11:17:55.878 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 149.98
2021-08-25 11:17:55.879 | INFO     | src.policies:train:109 - Episode 1335
2021-08-25 11:17:55.914 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:55.915 | INFO     | src.policies:train:121 - Mean episode return: 94.0
2021-08-25 11:17:55.916 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 149.6
2021-08-25 11:17:55.923 | INFO     | src.policies:train:157 - Total loss: 0.9959838390350342
2021-08-25 11:17:55.925 | INFO     | src.policies:train:103 - Epoch 236 / 800
2021-08-25 11:17:55.926 | INFO     | src.policies:train:109 - Episode 1336
2021-08-25 11:17:55.951 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:55.953 | INFO     | src.policies:train:121 - Mean episode return: 70.0
2021-08-25 11:17:55.954 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 148

2021-08-25 11:17:56.678 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:56.679 | INFO     | src.policies:train:121 - Mean episode return: 131.0
2021-08-25 11:17:56.680 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 147.42
2021-08-25 11:17:56.687 | INFO     | src.policies:train:157 - Total loss: 0.9969415068626404
2021-08-25 11:17:56.690 | INFO     | src.policies:train:103 - Epoch 243 / 800
2021-08-25 11:17:56.691 | INFO     | src.policies:train:109 - Episode 1353
2021-08-25 11:17:56.723 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:56.724 | INFO     | src.policies:train:121 - Mean episode return: 92.0
2021-08-25 11:17:56.725 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 146.34
2021-08-25 11:17:56.726 | INFO     | src.policies:train:109 - Episode 1354
2021-08-25 11:17:56.767 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all ag

2021-08-25 11:17:57.423 | INFO     | src.policies:train:103 - Epoch 250 / 800
2021-08-25 11:17:57.423 | INFO     | src.policies:train:109 - Episode 1369
2021-08-25 11:17:57.489 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:57.491 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:17:57.491 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 136.64
2021-08-25 11:17:57.496 | INFO     | src.policies:train:157 - Total loss: 0.9949999451637268
2021-08-25 11:17:57.499 | INFO     | src.policies:train:103 - Epoch 251 / 800
2021-08-25 11:17:57.500 | INFO     | src.policies:train:109 - Episode 1370
2021-08-25 11:17:57.554 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:57.556 | INFO     | src.policies:train:121 - Mean episode return: 162.0
2021-08-25 11:17:57.557 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 137.79
2021-08-25 11:1

2021-08-25 11:17:58.401 | INFO     | src.policies:train:109 - Episode 1385
2021-08-25 11:17:58.467 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:58.468 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:17:58.469 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 141.13
2021-08-25 11:17:58.474 | INFO     | src.policies:train:157 - Total loss: 0.9949999451637268
2021-08-25 11:17:58.477 | INFO     | src.policies:train:103 - Epoch 260 / 800
2021-08-25 11:17:58.478 | INFO     | src.policies:train:109 - Episode 1386
2021-08-25 11:17:58.544 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:58.546 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:17:58.547 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 141.13
2021-08-25 11:17:58.552 | INFO     | src.policies:train:157 - Total loss: 0.9949999451637268


2021-08-25 11:17:59.429 | INFO     | src.policies:train:109 - Episode 1401
2021-08-25 11:17:59.452 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:59.453 | INFO     | src.policies:train:121 - Mean episode return: 63.0
2021-08-25 11:17:59.455 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 137.1
2021-08-25 11:17:59.455 | INFO     | src.policies:train:109 - Episode 1402
2021-08-25 11:17:59.507 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:17:59.508 | INFO     | src.policies:train:121 - Mean episode return: 145.0
2021-08-25 11:17:59.510 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 137.29
2021-08-25 11:17:59.516 | INFO     | src.policies:train:157 - Total loss: 0.9951922297477722
2021-08-25 11:17:59.518 | INFO     | src.policies:train:103 - Epoch 270 / 800
2021-08-25 11:17:59.519 | INFO     | src.policies:train:109 - Episode 1403
2021-08-25 11:17:59.

2021-08-25 11:18:00.407 | INFO     | src.policies:train:109 - Episode 1417
2021-08-25 11:18:00.449 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:00.450 | INFO     | src.policies:train:121 - Mean episode return: 126.0
2021-08-25 11:18:00.451 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 135.04
2021-08-25 11:18:00.452 | INFO     | src.policies:train:109 - Episode 1418
2021-08-25 11:18:00.522 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:00.523 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:18:00.524 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 135.56
2021-08-25 11:18:00.531 | INFO     | src.policies:train:157 - Total loss: 0.9969323873519897
2021-08-25 11:18:00.534 | INFO     | src.policies:train:103 - Epoch 280 / 800
2021-08-25 11:18:00.535 | INFO     | src.policies:train:109 - Episode 1419
2021-08-25 11:18:0

2021-08-25 11:18:01.534 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 141.57
2021-08-25 11:18:01.540 | INFO     | src.policies:train:157 - Total loss: 0.9961829781532288
2021-08-25 11:18:01.542 | INFO     | src.policies:train:103 - Epoch 288 / 800
2021-08-25 11:18:01.543 | INFO     | src.policies:train:109 - Episode 1434
2021-08-25 11:18:01.610 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:01.611 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:18:01.612 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 142.02
2021-08-25 11:18:01.617 | INFO     | src.policies:train:157 - Total loss: 0.9949999451637268
2021-08-25 11:18:01.621 | INFO     | src.policies:train:103 - Epoch 289 / 800
2021-08-25 11:18:01.622 | INFO     | src.policies:train:109 - Episode 1435
2021-08-25 11:18:01.686 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 

2021-08-25 11:18:02.624 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:18:02.625 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 151.04
2021-08-25 11:18:02.630 | INFO     | src.policies:train:157 - Total loss: 0.9949999451637268
2021-08-25 11:18:02.632 | INFO     | src.policies:train:103 - Epoch 298 / 800
2021-08-25 11:18:02.633 | INFO     | src.policies:train:109 - Episode 1450
2021-08-25 11:18:02.675 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:02.677 | INFO     | src.policies:train:121 - Mean episode return: 123.0
2021-08-25 11:18:02.678 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 151.53
2021-08-25 11:18:02.678 | INFO     | src.policies:train:109 - Episode 1451
2021-08-25 11:18:02.721 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:02.722 | INFO     | src.policies:train:121 - Mean episode return: 123.0
2021

2021-08-25 11:18:03.685 | INFO     | src.policies:train:121 - Mean episode return: 104.0
2021-08-25 11:18:03.686 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 162.47
2021-08-25 11:18:03.687 | INFO     | src.policies:train:109 - Episode 1466
2021-08-25 11:18:03.742 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:03.744 | INFO     | src.policies:train:121 - Mean episode return: 156.0
2021-08-25 11:18:03.745 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 162.03
2021-08-25 11:18:03.751 | INFO     | src.policies:train:157 - Total loss: 0.996153712272644
2021-08-25 11:18:03.754 | INFO     | src.policies:train:103 - Epoch 308 / 800
2021-08-25 11:18:03.755 | INFO     | src.policies:train:109 - Episode 1467
2021-08-25 11:18:03.815 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:03.817 | INFO     | src.policies:train:121 - Mean episode return: 175.0
2021-

2021-08-25 11:18:04.708 | INFO     | src.policies:train:109 - Episode 1482
2021-08-25 11:18:04.777 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:04.778 | INFO     | src.policies:train:121 - Mean episode return: 190.0
2021-08-25 11:18:04.779 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 162.75
2021-08-25 11:18:04.787 | INFO     | src.policies:train:157 - Total loss: 0.9973956346511841
2021-08-25 11:18:04.789 | INFO     | src.policies:train:103 - Epoch 317 / 800
2021-08-25 11:18:04.791 | INFO     | src.policies:train:109 - Episode 1483
2021-08-25 11:18:04.842 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:04.844 | INFO     | src.policies:train:121 - Mean episode return: 144.0
2021-08-25 11:18:04.845 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 162.59
2021-08-25 11:18:04.845 | INFO     | src.policies:train:109 - Episode 1484
2021-08-25 11:18:0

2021-08-25 11:18:05.830 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:05.831 | INFO     | src.policies:train:121 - Mean episode return: 134.0
2021-08-25 11:18:05.832 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 163.12
2021-08-25 11:18:05.839 | INFO     | src.policies:train:157 - Total loss: 0.9966213703155518
2021-08-25 11:18:05.842 | INFO     | src.policies:train:103 - Epoch 326 / 800
2021-08-25 11:18:05.843 | INFO     | src.policies:train:109 - Episode 1499
2021-08-25 11:18:05.891 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:05.893 | INFO     | src.policies:train:121 - Mean episode return: 143.0
2021-08-25 11:18:05.894 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 162.99
2021-08-25 11:18:05.894 | INFO     | src.policies:train:109 - Episode 1500
2021-08-25 11:18:05.951 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all a

2021-08-25 11:18:06.730 | INFO     | src.policies:train:109 - Episode 1514
2021-08-25 11:18:06.797 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:06.799 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:18:06.800 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 163.28
2021-08-25 11:18:06.804 | INFO     | src.policies:train:157 - Total loss: 0.9949999451637268
2021-08-25 11:18:06.807 | INFO     | src.policies:train:103 - Epoch 335 / 800
2021-08-25 11:18:06.807 | INFO     | src.policies:train:109 - Episode 1515
2021-08-25 11:18:06.857 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:06.859 | INFO     | src.policies:train:121 - Mean episode return: 145.0
2021-08-25 11:18:06.860 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 163.0
2021-08-25 11:18:06.860 | INFO     | src.policies:train:109 - Episode 1516
2021-08-25 11:18:06

2021-08-25 11:18:07.832 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:07.834 | INFO     | src.policies:train:121 - Mean episode return: 139.0
2021-08-25 11:18:07.834 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 161.18
2021-08-25 11:18:07.841 | INFO     | src.policies:train:157 - Total loss: 0.996491014957428
2021-08-25 11:18:07.843 | INFO     | src.policies:train:103 - Epoch 345 / 800
2021-08-25 11:18:07.844 | INFO     | src.policies:train:109 - Episode 1531
2021-08-25 11:18:07.874 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:07.876 | INFO     | src.policies:train:121 - Mean episode return: 83.0
2021-08-25 11:18:07.877 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 160.37
2021-08-25 11:18:07.878 | INFO     | src.policies:train:109 - Episode 1532
2021-08-25 11:18:07.945 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all age

2021-08-25 11:18:08.827 | INFO     | src.policies:train:103 - Epoch 355 / 800
2021-08-25 11:18:08.828 | INFO     | src.policies:train:109 - Episode 1546
2021-08-25 11:18:08.885 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:08.886 | INFO     | src.policies:train:121 - Mean episode return: 160.0
2021-08-25 11:18:08.887 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 162.04
2021-08-25 11:18:08.888 | INFO     | src.policies:train:109 - Episode 1547
2021-08-25 11:18:08.938 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:08.940 | INFO     | src.policies:train:121 - Mean episode return: 138.0
2021-08-25 11:18:08.940 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 161.42
2021-08-25 11:18:08.947 | INFO     | src.policies:train:157 - Total loss: 0.9966440200805664
2021-08-25 11:18:08.950 | INFO     | src.policies:train:103 - Epoch 356 / 800
2021-08-25 11:1

2021-08-25 11:18:09.818 | INFO     | src.policies:train:109 - Episode 1562
2021-08-25 11:18:09.861 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:09.863 | INFO     | src.policies:train:121 - Mean episode return: 120.0
2021-08-25 11:18:09.863 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 157.9
2021-08-25 11:18:09.864 | INFO     | src.policies:train:109 - Episode 1563
2021-08-25 11:18:09.907 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:09.908 | INFO     | src.policies:train:121 - Mean episode return: 120.0
2021-08-25 11:18:09.909 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 157.5
2021-08-25 11:18:09.916 | INFO     | src.policies:train:157 - Total loss: 0.9958332777023315
2021-08-25 11:18:09.919 | INFO     | src.policies:train:103 - Epoch 365 / 800
2021-08-25 11:18:09.920 | INFO     | src.policies:train:109 - Episode 1564
2021-08-25 11:18:09.

2021-08-25 11:18:10.819 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 155.29
2021-08-25 11:18:10.826 | INFO     | src.policies:train:157 - Total loss: 0.9964156746864319
2021-08-25 11:18:10.828 | INFO     | src.policies:train:103 - Epoch 373 / 800
2021-08-25 11:18:10.829 | INFO     | src.policies:train:109 - Episode 1579
2021-08-25 11:18:10.895 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:10.897 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:18:10.898 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 156.72
2021-08-25 11:18:10.902 | INFO     | src.policies:train:157 - Total loss: 0.9949999451637268
2021-08-25 11:18:10.905 | INFO     | src.policies:train:103 - Epoch 374 / 800
2021-08-25 11:18:10.906 | INFO     | src.policies:train:109 - Episode 1580
2021-08-25 11:18:10.955 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 

2021-08-25 11:18:11.903 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 159.9
2021-08-25 11:18:11.910 | INFO     | src.policies:train:157 - Total loss: 0.9965635538101196
2021-08-25 11:18:11.913 | INFO     | src.policies:train:103 - Epoch 383 / 800
2021-08-25 11:18:11.914 | INFO     | src.policies:train:109 - Episode 1595
2021-08-25 11:18:11.981 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:11.982 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:18:11.983 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 160.24
2021-08-25 11:18:11.988 | INFO     | src.policies:train:157 - Total loss: 0.9949999451637268
2021-08-25 11:18:11.991 | INFO     | src.policies:train:103 - Epoch 384 / 800
2021-08-25 11:18:11.991 | INFO     | src.policies:train:109 - Episode 1596
2021-08-25 11:18:12.054 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 1

2021-08-25 11:18:12.994 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:12.996 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:18:12.997 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 162.94
2021-08-25 11:18:13.002 | INFO     | src.policies:train:157 - Total loss: 0.9949999451637268
2021-08-25 11:18:13.004 | INFO     | src.policies:train:103 - Epoch 394 / 800
2021-08-25 11:18:13.005 | INFO     | src.policies:train:109 - Episode 1611
2021-08-25 11:18:13.076 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:13.077 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:18:13.078 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 163.59
2021-08-25 11:18:13.083 | INFO     | src.policies:train:157 - Total loss: 0.9949999451637268
2021-08-25 11:18:13.085 | INFO     | src.policies:train:103 - Epoch 395 / 8

2021-08-25 11:18:14.164 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:18:14.165 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 166.79
2021-08-25 11:18:14.172 | INFO     | src.policies:train:157 - Total loss: 0.9972824454307556
2021-08-25 11:18:14.174 | INFO     | src.policies:train:103 - Epoch 405 / 800
2021-08-25 11:18:14.175 | INFO     | src.policies:train:109 - Episode 1627
2021-08-25 11:18:14.244 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:14.245 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:18:14.246 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 166.79
2021-08-25 11:18:14.252 | INFO     | src.policies:train:157 - Total loss: 0.9949999451637268
2021-08-25 11:18:14.254 | INFO     | src.policies:train:103 - Epoch 406 / 800
2021-08-25 11:18:14.255 | INFO     | src.policies:train:109 - Episode 1628
2021-08-25 11:18:14.308 | 

2021-08-25 11:18:15.326 | INFO     | src.policies:train:103 - Epoch 417 / 800
2021-08-25 11:18:15.327 | INFO     | src.policies:train:109 - Episode 1642
2021-08-25 11:18:15.386 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:15.388 | INFO     | src.policies:train:121 - Mean episode return: 164.0
2021-08-25 11:18:15.389 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 169.53
2021-08-25 11:18:15.390 | INFO     | src.policies:train:109 - Episode 1643
2021-08-25 11:18:15.464 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:15.465 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:18:15.466 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 169.53
2021-08-25 11:18:15.475 | INFO     | src.policies:train:157 - Total loss: 0.9972527623176575
2021-08-25 11:18:15.478 | INFO     | src.policies:train:103 - Epoch 418 / 800
2021-08-25 11:1

2021-08-25 11:18:16.541 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:18:16.542 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 173.88
2021-08-25 11:18:16.547 | INFO     | src.policies:train:157 - Total loss: 0.9949999451637268
2021-08-25 11:18:16.550 | INFO     | src.policies:train:103 - Epoch 431 / 800
2021-08-25 11:18:16.550 | INFO     | src.policies:train:109 - Episode 1658
2021-08-25 11:18:16.617 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:16.618 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:18:16.619 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 174.88
2021-08-25 11:18:16.624 | INFO     | src.policies:train:157 - Total loss: 0.9949999451637268
2021-08-25 11:18:16.627 | INFO     | src.policies:train:103 - Epoch 432 / 800
2021-08-25 11:18:16.627 | INFO     | src.policies:train:109 - Episode 1659
2021-08-25 11:18:16.693 | 

2021-08-25 11:18:17.668 | INFO     | src.policies:train:109 - Episode 1673
2021-08-25 11:18:17.738 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:17.740 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:18:17.741 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 181.99
2021-08-25 11:18:17.745 | INFO     | src.policies:train:157 - Total loss: 0.9949999451637268
2021-08-25 11:18:17.748 | INFO     | src.policies:train:103 - Epoch 444 / 800
2021-08-25 11:18:17.749 | INFO     | src.policies:train:109 - Episode 1674
2021-08-25 11:18:17.815 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:17.817 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:18:17.818 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 182.69
2021-08-25 11:18:17.822 | INFO     | src.policies:train:157 - Total loss: 0.9949999451637268


2021-08-25 11:18:18.854 | INFO     | src.policies:train:109 - Episode 1689
2021-08-25 11:18:18.922 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:18.924 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:18:18.925 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 185.51
2021-08-25 11:18:18.932 | INFO     | src.policies:train:157 - Total loss: 0.9973956942558289
2021-08-25 11:18:18.935 | INFO     | src.policies:train:103 - Epoch 455 / 800
2021-08-25 11:18:18.936 | INFO     | src.policies:train:109 - Episode 1690
2021-08-25 11:18:19.004 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:19.005 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:18:19.006 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 186.26
2021-08-25 11:18:19.011 | INFO     | src.policies:train:157 - Total loss: 0.9949999451637268


2021-08-25 11:18:20.081 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 188.49
2021-08-25 11:18:20.081 | INFO     | src.policies:train:109 - Episode 1705
2021-08-25 11:18:20.151 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:20.153 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:18:20.154 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 189.13
2021-08-25 11:18:20.161 | INFO     | src.policies:train:157 - Total loss: 0.9974811673164368
2021-08-25 11:18:20.163 | INFO     | src.policies:train:103 - Epoch 467 / 800
2021-08-25 11:18:20.164 | INFO     | src.policies:train:109 - Episode 1706
2021-08-25 11:18:20.231 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:20.233 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:18:20.234 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 

2021-08-25 11:18:21.196 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 189.6
2021-08-25 11:18:21.201 | INFO     | src.policies:train:157 - Total loss: 0.9949999451637268
2021-08-25 11:18:21.203 | INFO     | src.policies:train:103 - Epoch 477 / 800
2021-08-25 11:18:21.204 | INFO     | src.policies:train:109 - Episode 1721
2021-08-25 11:18:21.274 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:21.276 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:18:21.277 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 189.78
2021-08-25 11:18:21.281 | INFO     | src.policies:train:157 - Total loss: 0.9949999451637268
2021-08-25 11:18:21.283 | INFO     | src.policies:train:103 - Epoch 478 / 800
2021-08-25 11:18:21.284 | INFO     | src.policies:train:109 - Episode 1722
2021-08-25 11:18:21.335 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 1

2021-08-25 11:18:22.342 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:18:22.343 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 188.53
2021-08-25 11:18:22.348 | INFO     | src.policies:train:157 - Total loss: 0.9949999451637268
2021-08-25 11:18:22.350 | INFO     | src.policies:train:103 - Epoch 489 / 800
2021-08-25 11:18:22.351 | INFO     | src.policies:train:109 - Episode 1737
2021-08-25 11:18:22.418 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:22.420 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:18:22.421 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 188.53
2021-08-25 11:18:22.426 | INFO     | src.policies:train:157 - Total loss: 0.9949999451637268
2021-08-25 11:18:22.428 | INFO     | src.policies:train:103 - Epoch 490 / 800
2021-08-25 11:18:22.429 | INFO     | src.policies:train:109 - Episode 1738
2021-08-25 11:18:22.476 | 

2021-08-25 11:18:23.462 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 187.59
2021-08-25 11:18:23.469 | INFO     | src.policies:train:157 - Total loss: 0.9970325827598572
2021-08-25 11:18:23.471 | INFO     | src.policies:train:103 - Epoch 499 / 800
2021-08-25 11:18:23.472 | INFO     | src.policies:train:109 - Episode 1753
2021-08-25 11:18:23.541 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:23.542 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:18:23.543 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 187.59
2021-08-25 11:18:23.548 | INFO     | src.policies:train:157 - Total loss: 0.9949999451637268
2021-08-25 11:18:23.550 | INFO     | src.policies:train:103 - Epoch 500 / 800
2021-08-25 11:18:23.551 | INFO     | src.policies:train:109 - Episode 1754
2021-08-25 11:18:23.620 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 

2021-08-25 11:18:24.529 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:18:24.530 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 183.63
2021-08-25 11:18:24.535 | INFO     | src.policies:train:157 - Total loss: 0.9949999451637268
2021-08-25 11:18:24.538 | INFO     | src.policies:train:103 - Epoch 509 / 800
2021-08-25 11:18:24.539 | INFO     | src.policies:train:109 - Episode 1769
2021-08-25 11:18:24.608 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:24.610 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:18:24.611 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 183.63
2021-08-25 11:18:24.616 | INFO     | src.policies:train:157 - Total loss: 0.9949999451637268
2021-08-25 11:18:24.619 | INFO     | src.policies:train:103 - Epoch 510 / 800
2021-08-25 11:18:24.619 | INFO     | src.policies:train:109 - Episode 1770
2021-08-25 11:18:24.687 | 

2021-08-25 11:18:25.728 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:18:25.729 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 182.85
2021-08-25 11:18:25.736 | INFO     | src.policies:train:157 - Total loss: 0.9949999451637268
2021-08-25 11:18:25.739 | INFO     | src.policies:train:103 - Epoch 519 / 800
2021-08-25 11:18:25.740 | INFO     | src.policies:train:109 - Episode 1785
2021-08-25 11:18:25.811 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:25.812 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:18:25.813 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 182.85
2021-08-25 11:18:25.819 | INFO     | src.policies:train:157 - Total loss: 0.9949999451637268
2021-08-25 11:18:25.822 | INFO     | src.policies:train:103 - Epoch 520 / 800
2021-08-25 11:18:25.822 | INFO     | src.policies:train:109 - Episode 1786
2021-08-25 11:18:25.883 | 

2021-08-25 11:18:26.978 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:26.979 | INFO     | src.policies:train:121 - Mean episode return: 190.0
2021-08-25 11:18:26.980 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 182.42
2021-08-25 11:18:26.987 | INFO     | src.policies:train:157 - Total loss: 0.9970844388008118
2021-08-25 11:18:26.990 | INFO     | src.policies:train:103 - Epoch 532 / 800
2021-08-25 11:18:26.992 | INFO     | src.policies:train:109 - Episode 1801
2021-08-25 11:18:27.033 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:27.035 | INFO     | src.policies:train:121 - Mean episode return: 123.0
2021-08-25 11:18:27.036 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 181.65
2021-08-25 11:18:27.036 | INFO     | src.policies:train:109 - Episode 1802
2021-08-25 11:18:27.105 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all a

2021-08-25 11:18:28.133 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:28.134 | INFO     | src.policies:train:121 - Mean episode return: 180.0
2021-08-25 11:18:28.134 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 182.67
2021-08-25 11:18:28.141 | INFO     | src.policies:train:157 - Total loss: 0.9971012473106384
2021-08-25 11:18:28.143 | INFO     | src.policies:train:103 - Epoch 542 / 800
2021-08-25 11:18:28.145 | INFO     | src.policies:train:109 - Episode 1817
2021-08-25 11:18:28.203 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:28.205 | INFO     | src.policies:train:121 - Mean episode return: 178.0
2021-08-25 11:18:28.206 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 182.64
2021-08-25 11:18:28.207 | INFO     | src.policies:train:109 - Episode 1818
2021-08-25 11:18:28.247 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all a

2021-08-25 11:18:29.164 | INFO     | src.policies:train:103 - Epoch 551 / 800
2021-08-25 11:18:29.165 | INFO     | src.policies:train:109 - Episode 1832
2021-08-25 11:18:29.218 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:29.219 | INFO     | src.policies:train:121 - Mean episode return: 154.0
2021-08-25 11:18:29.220 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 181.1
2021-08-25 11:18:29.221 | INFO     | src.policies:train:109 - Episode 1833
2021-08-25 11:18:29.288 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:29.289 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:18:29.290 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 181.1
2021-08-25 11:18:29.297 | INFO     | src.policies:train:157 - Total loss: 0.997174859046936
2021-08-25 11:18:29.299 | INFO     | src.policies:train:103 - Epoch 552 / 800
2021-08-25 11:18:2

2021-08-25 11:18:30.350 | INFO     | src.policies:train:157 - Total loss: 0.9949999451637268
2021-08-25 11:18:30.352 | INFO     | src.policies:train:103 - Epoch 562 / 800
2021-08-25 11:18:30.353 | INFO     | src.policies:train:109 - Episode 1848
2021-08-25 11:18:30.420 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:30.421 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:18:30.422 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 183.02
2021-08-25 11:18:30.427 | INFO     | src.policies:train:157 - Total loss: 0.9949999451637268
2021-08-25 11:18:30.429 | INFO     | src.policies:train:103 - Epoch 563 / 800
2021-08-25 11:18:30.430 | INFO     | src.policies:train:109 - Episode 1849
2021-08-25 11:18:30.479 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:30.480 | INFO     | src.policies:train:121 - Mean episode return: 141.0
2021-08-25 11:18:30.48

2021-08-25 11:18:31.471 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 184.39
2021-08-25 11:18:31.476 | INFO     | src.policies:train:157 - Total loss: 0.9949999451637268
2021-08-25 11:18:31.479 | INFO     | src.policies:train:103 - Epoch 573 / 800
2021-08-25 11:18:31.479 | INFO     | src.policies:train:109 - Episode 1864
2021-08-25 11:18:31.546 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:31.547 | INFO     | src.policies:train:121 - Mean episode return: 197.0
2021-08-25 11:18:31.548 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 184.84
2021-08-25 11:18:31.549 | INFO     | src.policies:train:109 - Episode 1865
2021-08-25 11:18:31.598 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:31.599 | INFO     | src.policies:train:121 - Mean episode return: 143.0
2021-08-25 11:18:31.600 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 

2021-08-25 11:18:32.605 | INFO     | src.policies:train:109 - Episode 1879
2021-08-25 11:18:32.662 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:32.664 | INFO     | src.policies:train:121 - Mean episode return: 166.0
2021-08-25 11:18:32.665 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 185.85
2021-08-25 11:18:32.666 | INFO     | src.policies:train:109 - Episode 1880
2021-08-25 11:18:32.736 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:32.737 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:18:32.738 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 185.85
2021-08-25 11:18:32.745 | INFO     | src.policies:train:157 - Total loss: 0.9972677230834961
2021-08-25 11:18:32.747 | INFO     | src.policies:train:103 - Epoch 587 / 800
2021-08-25 11:18:32.748 | INFO     | src.policies:train:109 - Episode 1881
2021-08-25 11:18:3

2021-08-25 11:18:33.811 | INFO     | src.policies:train:157 - Total loss: 0.9972143769264221
2021-08-25 11:18:33.814 | INFO     | src.policies:train:103 - Epoch 598 / 800
2021-08-25 11:18:33.815 | INFO     | src.policies:train:109 - Episode 1895
2021-08-25 11:18:33.882 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:33.884 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:18:33.885 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 185.63
2021-08-25 11:18:33.890 | INFO     | src.policies:train:157 - Total loss: 0.9949999451637268
2021-08-25 11:18:33.892 | INFO     | src.policies:train:103 - Epoch 599 / 800
2021-08-25 11:18:33.893 | INFO     | src.policies:train:109 - Episode 1896
2021-08-25 11:18:33.952 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:33.954 | INFO     | src.policies:train:121 - Mean episode return: 172.0
2021-08-25 11:18:33.95

2021-08-25 11:18:34.907 | INFO     | src.policies:train:109 - Episode 1911
2021-08-25 11:18:34.958 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:34.959 | INFO     | src.policies:train:121 - Mean episode return: 142.0
2021-08-25 11:18:34.960 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 183.39
2021-08-25 11:18:34.967 | INFO     | src.policies:train:157 - Total loss: 0.9968552589416504
2021-08-25 11:18:34.969 | INFO     | src.policies:train:103 - Epoch 608 / 800
2021-08-25 11:18:34.970 | INFO     | src.policies:train:109 - Episode 1912
2021-08-25 11:18:35.022 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:35.023 | INFO     | src.policies:train:121 - Mean episode return: 152.0
2021-08-25 11:18:35.024 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 182.91
2021-08-25 11:18:35.025 | INFO     | src.policies:train:109 - Episode 1913
2021-08-25 11:18:3

2021-08-25 11:18:36.000 | INFO     | src.policies:train:121 - Mean episode return: 138.0
2021-08-25 11:18:36.001 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 180.95
2021-08-25 11:18:36.007 | INFO     | src.policies:train:157 - Total loss: 0.9965985417366028
2021-08-25 11:18:36.010 | INFO     | src.policies:train:103 - Epoch 616 / 800
2021-08-25 11:18:36.011 | INFO     | src.policies:train:109 - Episode 1928
2021-08-25 11:18:36.068 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:36.069 | INFO     | src.policies:train:121 - Mean episode return: 158.0
2021-08-25 11:18:36.071 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 180.55
2021-08-25 11:18:36.072 | INFO     | src.policies:train:109 - Episode 1929
2021-08-25 11:18:36.121 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:36.123 | INFO     | src.policies:train:121 - Mean episode return: 130.0
2021

2021-08-25 11:18:37.164 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:18:37.165 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 179.55
2021-08-25 11:18:37.172 | INFO     | src.policies:train:157 - Total loss: 0.9974552392959595
2021-08-25 11:18:37.175 | INFO     | src.policies:train:103 - Epoch 626 / 800
2021-08-25 11:18:37.176 | INFO     | src.policies:train:109 - Episode 1944
2021-08-25 11:18:37.235 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:37.236 | INFO     | src.policies:train:121 - Mean episode return: 166.0
2021-08-25 11:18:37.237 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 179.21
2021-08-25 11:18:37.238 | INFO     | src.policies:train:109 - Episode 1945
2021-08-25 11:18:37.284 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:37.285 | INFO     | src.policies:train:121 - Mean episode return: 136.0
2021

2021-08-25 11:18:38.202 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 176.88
2021-08-25 11:18:38.208 | INFO     | src.policies:train:157 - Total loss: 0.9968650341033936
2021-08-25 11:18:38.211 | INFO     | src.policies:train:103 - Epoch 635 / 800
2021-08-25 11:18:38.212 | INFO     | src.policies:train:109 - Episode 1960
2021-08-25 11:18:38.256 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:38.257 | INFO     | src.policies:train:121 - Mean episode return: 127.0
2021-08-25 11:18:38.258 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 176.26
2021-08-25 11:18:38.259 | INFO     | src.policies:train:109 - Episode 1961
2021-08-25 11:18:38.316 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:38.317 | INFO     | src.policies:train:121 - Mean episode return: 165.0
2021-08-25 11:18:38.318 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 

2021-08-25 11:18:39.228 | INFO     | src.policies:train:157 - Total loss: 0.9961086511611938
2021-08-25 11:18:39.231 | INFO     | src.policies:train:103 - Epoch 644 / 800
2021-08-25 11:18:39.232 | INFO     | src.policies:train:109 - Episode 1976
2021-08-25 11:18:39.296 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:39.298 | INFO     | src.policies:train:121 - Mean episode return: 189.0
2021-08-25 11:18:39.298 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 172.09
2021-08-25 11:18:39.299 | INFO     | src.policies:train:109 - Episode 1977
2021-08-25 11:18:39.351 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:39.353 | INFO     | src.policies:train:121 - Mean episode return: 148.0
2021-08-25 11:18:39.354 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 171.57
2021-08-25 11:18:39.360 | INFO     | src.policies:train:157 - Total loss: 0.9970322847366333


2021-08-25 11:18:40.300 | INFO     | src.policies:train:157 - Total loss: 0.9949999451637268
2021-08-25 11:18:40.303 | INFO     | src.policies:train:103 - Epoch 653 / 800
2021-08-25 11:18:40.304 | INFO     | src.policies:train:109 - Episode 1992
2021-08-25 11:18:40.357 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:40.359 | INFO     | src.policies:train:121 - Mean episode return: 157.0
2021-08-25 11:18:40.360 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 167.89
2021-08-25 11:18:40.361 | INFO     | src.policies:train:109 - Episode 1993
2021-08-25 11:18:40.427 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:40.428 | INFO     | src.policies:train:121 - Mean episode return: 193.0
2021-08-25 11:18:40.429 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 168.23
2021-08-25 11:18:40.436 | INFO     | src.policies:train:157 - Total loss: 0.9971426129341125


2021-08-25 11:18:41.433 | INFO     | src.policies:train:109 - Episode 2008
2021-08-25 11:18:41.500 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:41.501 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:18:41.502 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 169.31
2021-08-25 11:18:41.509 | INFO     | src.policies:train:157 - Total loss: 0.9973116517066956
2021-08-25 11:18:41.512 | INFO     | src.policies:train:103 - Epoch 663 / 800
2021-08-25 11:18:41.513 | INFO     | src.policies:train:109 - Episode 2009
2021-08-25 11:18:41.574 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:41.576 | INFO     | src.policies:train:121 - Mean episode return: 173.0
2021-08-25 11:18:41.577 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 169.04
2021-08-25 11:18:41.577 | INFO     | src.policies:train:109 - Episode 2010
2021-08-25 11:18:4

2021-08-25 11:18:42.708 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:18:42.709 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 174.23
2021-08-25 11:18:42.714 | INFO     | src.policies:train:157 - Total loss: 0.9949999451637268
2021-08-25 11:18:42.716 | INFO     | src.policies:train:103 - Epoch 676 / 800
2021-08-25 11:18:42.717 | INFO     | src.policies:train:109 - Episode 2024
2021-08-25 11:18:42.784 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:42.785 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:18:42.786 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 174.42
2021-08-25 11:18:42.791 | INFO     | src.policies:train:157 - Total loss: 0.9949999451637268
2021-08-25 11:18:42.794 | INFO     | src.policies:train:103 - Epoch 677 / 800
2021-08-25 11:18:42.795 | INFO     | src.policies:train:109 - Episode 2025
2021-08-25 11:18:42.863 | 

2021-08-25 11:18:43.895 | INFO     | src.policies:train:157 - Total loss: 0.9949999451637268
2021-08-25 11:18:43.897 | INFO     | src.policies:train:103 - Epoch 691 / 800
2021-08-25 11:18:43.898 | INFO     | src.policies:train:109 - Episode 2039
2021-08-25 11:18:43.967 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:43.968 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:18:43.970 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 179.68
2021-08-25 11:18:43.974 | INFO     | src.policies:train:157 - Total loss: 0.9949999451637268
2021-08-25 11:18:43.977 | INFO     | src.policies:train:103 - Epoch 692 / 800
2021-08-25 11:18:43.978 | INFO     | src.policies:train:109 - Episode 2040
2021-08-25 11:18:44.049 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:44.051 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:18:44.05

2021-08-25 11:18:45.135 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:45.137 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:18:45.138 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 184.02
2021-08-25 11:18:45.143 | INFO     | src.policies:train:157 - Total loss: 0.9949999451637268
2021-08-25 11:18:45.145 | INFO     | src.policies:train:103 - Epoch 705 / 800
2021-08-25 11:18:45.146 | INFO     | src.policies:train:109 - Episode 2055
2021-08-25 11:18:45.213 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:45.215 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:18:45.216 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 184.02
2021-08-25 11:18:45.220 | INFO     | src.policies:train:157 - Total loss: 0.9949999451637268
2021-08-25 11:18:45.223 | INFO     | src.policies:train:103 - Epoch 706 / 8

2021-08-25 11:18:46.324 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 189.48
2021-08-25 11:18:46.329 | INFO     | src.policies:train:157 - Total loss: 0.9949999451637268
2021-08-25 11:18:46.331 | INFO     | src.policies:train:103 - Epoch 720 / 800
2021-08-25 11:18:46.332 | INFO     | src.policies:train:109 - Episode 2070
2021-08-25 11:18:46.401 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:46.403 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:18:46.403 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 189.48
2021-08-25 11:18:46.408 | INFO     | src.policies:train:157 - Total loss: 0.9949999451637268
2021-08-25 11:18:46.411 | INFO     | src.policies:train:103 - Epoch 721 / 800
2021-08-25 11:18:46.412 | INFO     | src.policies:train:109 - Episode 2071
2021-08-25 11:18:46.480 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 

2021-08-25 11:18:47.559 | INFO     | src.policies:train:103 - Epoch 735 / 800
2021-08-25 11:18:47.560 | INFO     | src.policies:train:109 - Episode 2085
2021-08-25 11:18:47.620 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:47.621 | INFO     | src.policies:train:121 - Mean episode return: 165.0
2021-08-25 11:18:47.622 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 193.5
2021-08-25 11:18:47.623 | INFO     | src.policies:train:109 - Episode 2086
2021-08-25 11:18:47.695 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:47.696 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:18:47.697 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 194.92
2021-08-25 11:18:47.706 | INFO     | src.policies:train:157 - Total loss: 0.9972601532936096
2021-08-25 11:18:47.709 | INFO     | src.policies:train:103 - Epoch 736 / 800
2021-08-25 11:18

2021-08-25 11:18:48.793 | INFO     | src.policies:train:157 - Total loss: 0.9949999451637268
2021-08-25 11:18:48.796 | INFO     | src.policies:train:103 - Epoch 747 / 800
2021-08-25 11:18:48.797 | INFO     | src.policies:train:109 - Episode 2101
2021-08-25 11:18:48.864 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:48.866 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:18:48.867 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 198.06
2021-08-25 11:18:48.872 | INFO     | src.policies:train:157 - Total loss: 0.9949999451637268
2021-08-25 11:18:48.875 | INFO     | src.policies:train:103 - Epoch 748 / 800
2021-08-25 11:18:48.876 | INFO     | src.policies:train:109 - Episode 2102
2021-08-25 11:18:48.928 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:48.929 | INFO     | src.policies:train:121 - Mean episode return: 150.0
2021-08-25 11:18:48.93

2021-08-25 11:18:49.967 | INFO     | src.policies:train:157 - Total loss: 0.9974357485771179
2021-08-25 11:18:49.970 | INFO     | src.policies:train:103 - Epoch 760 / 800
2021-08-25 11:18:49.971 | INFO     | src.policies:train:109 - Episode 2117
2021-08-25 11:18:50.041 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:50.042 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:18:50.043 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 196.67
2021-08-25 11:18:50.048 | INFO     | src.policies:train:157 - Total loss: 0.9949999451637268
2021-08-25 11:18:50.050 | INFO     | src.policies:train:103 - Epoch 761 / 800
2021-08-25 11:18:50.051 | INFO     | src.policies:train:109 - Episode 2118
2021-08-25 11:18:50.121 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:50.123 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:18:50.12

2021-08-25 11:18:51.148 | INFO     | src.policies:train:103 - Epoch 774 / 800
2021-08-25 11:18:51.149 | INFO     | src.policies:train:109 - Episode 2132
2021-08-25 11:18:51.218 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:51.219 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:18:51.221 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 196.6
2021-08-25 11:18:51.226 | INFO     | src.policies:train:157 - Total loss: 0.9949999451637268
2021-08-25 11:18:51.229 | INFO     | src.policies:train:103 - Epoch 775 / 800
2021-08-25 11:18:51.229 | INFO     | src.policies:train:109 - Episode 2133
2021-08-25 11:18:51.297 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:51.299 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:18:51.300 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 196.6
2021-08-25 11:18:

2021-08-25 11:18:52.385 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:18:52.386 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 196.59
2021-08-25 11:18:52.392 | INFO     | src.policies:train:157 - Total loss: 0.9949999451637268
2021-08-25 11:18:52.394 | INFO     | src.policies:train:103 - Epoch 789 / 800
2021-08-25 11:18:52.395 | INFO     | src.policies:train:109 - Episode 2148
2021-08-25 11:18:52.463 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:18:52.465 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:18:52.466 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 196.59
2021-08-25 11:18:52.470 | INFO     | src.policies:train:157 - Total loss: 0.9949999451637268
2021-08-25 11:18:52.473 | INFO     | src.policies:train:103 - Epoch 790 / 800
2021-08-25 11:18:52.474 | INFO     | src.policies:train:109 - Episode 2149
2021-08-25 11:18:52.542 | 
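
The "Last 100 episodes mean return" figure in the log above can be read as a running mean over a sliding window of episode returns. The snippet below is a minimal sketch of that idea, not the code in `src.policies`: the helper name `rolling_mean_returns` and its `window` parameter are hypothetical, and it assumes the mean is taken over however many episodes are available until the window fills.

```python
from collections import deque


def rolling_mean_returns(episode_returns, window=100):
    """Yield the mean of the most recent `window` returns after each episode.

    Hypothetical helper for illustration; not part of `src.policies`.
    """
    recent = deque(maxlen=window)  # oldest returns drop out once the window is full
    for episode_return in episode_returns:
        recent.append(episode_return)
        yield sum(recent) / len(recent)


# Two episodes with returns 45 and 14 give running means of 45.0 and 29.5.
means = list(rolling_mean_returns([45.0, 14.0]))
print(means)
```

With a full window, the Cartpole "solved" criterion stated at the top of the notebook amounts to checking whether the latest value of this running mean has reached the target.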

## PPO

This section deals with training a Cartpole agent using our custom Proximal Policy Optimization implementation.

In [None]:
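c1 = 1.0   # presumably the value-loss coefficient (c1 in the PPO paper)
c2 = 0.01  # presumably the entropy-bonus coefficient (c2 in the PPO paper)
eps = 0.2  # presumably the clipping range (epsilon in the PPO paper)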
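
To make the role of these three hyperparameters concrete, here is a minimal NumPy sketch of the clipped surrogate objective as formulated in the PPO paper. It assumes the custom `policies.PPOPolicy` follows the paper's naming (`c1` for the value-loss weight, `c2` for the entropy bonus, `eps` for the clipping range), which this notebook does not confirm. The function `ppo_loss` and its arguments are hypothetical and only illustrate the formula.

```python
import numpy as np


def ppo_loss(new_log_probs, old_log_probs, advantages, values, returns,
             entropy, c1=1.0, c2=0.01, eps=0.2):
    """Clipped PPO loss (to be minimized), following the PPO paper's notation.

    Hypothetical illustration; not the implementation in `src.policies`.
    """
    # Probability ratio between the updated policy and the data-collecting policy
    ratio = np.exp(new_log_probs - old_log_probs)
    # Keep the ratio within [1 - eps, 1 + eps] to limit the size of each update
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    # Pessimistic (element-wise minimum) surrogate objective
    policy_term = np.minimum(ratio * advantages, clipped * advantages).mean()
    # Squared-error loss of the baseline (value) network
    value_term = ((values - returns) ** 2).mean()
    # The paper maximizes L_clip - c1 * L_vf + c2 * S; negate to obtain a loss
    return -(policy_term - c1 * value_term + c2 * entropy.mean())


# Ratio of 2 with a positive advantage is clipped to 1 + eps = 1.2
loss = ppo_loss(
    new_log_probs=np.log(np.array([2.0])),
    old_log_probs=np.zeros(1),
    advantages=np.ones(1),
    values=np.zeros(1),
    returns=np.zeros(1),
    entropy=np.zeros(1),
)
print(loss)
```

Under this reading, a larger `eps` allows bigger policy updates per epoch, while `c2` trades off exploration (higher entropy) against exploiting the current policy.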

In [20]:
ppo_policy_nn = models.MLP(observation_space_size, hidden_sizes, action_space_size)
ppo_baseline_nn = models.MLP(observation_space_size, hidden_sizes, 1, log_softmax=False)
ppo_policy = policies.PPOPolicy(env, ppo_policy_nn, ppo_baseline_nn, c1=c1, c2=c2, eps=eps)
ppo_policy.train(
    epochs,
    steps_per_epoch,
    enable_wandb=False,
    episodes_mean_reward=episodes_mean_reward
)

2021-08-25 11:21:14.804 | INFO     | src.policies:train:103 - Epoch 1 / 800
2021-08-25 11:21:14.805 | INFO     | src.policies:train:109 - Episode 1
2021-08-25 11:21:14.826 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:14.828 | INFO     | src.policies:train:121 - Mean episode return: 45.0
2021-08-25 11:21:14.829 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 45.0
2021-08-25 11:21:14.831 | INFO     | src.policies:train:109 - Episode 2
2021-08-25 11:21:14.841 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:14.843 | INFO     | src.policies:train:121 - Mean episode return: 14.0
2021-08-25 11:21:14.844 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 29.5
2021-08-25 11:21:14.846 | INFO     | src.policies:train:109 - Episode 3
2021-08-25 11:21:14.856 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:14.857 

2021-08-25 11:21:15.237 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 26.38095238095238
2021-08-25 11:21:15.238 | INFO     | src.policies:train:109 - Episode 22
2021-08-25 11:21:15.250 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:15.252 | INFO     | src.policies:train:121 - Mean episode return: 21.0
2021-08-25 11:21:15.253 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 26.136363636363637
2021-08-25 11:21:15.254 | INFO     | src.policies:train:109 - Episode 23
2021-08-25 11:21:15.264 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:15.265 | INFO     | src.policies:train:121 - Mean episode return: 17.0
2021-08-25 11:21:15.266 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 25.73913043478261
2021-08-25 11:21:15.267 | INFO     | src.policies:train:109 - Episode 24
2021-08-25 11:21:15.275 | DEBUG    | src.policies:execute_episo

2021-08-25 11:21:15.580 | INFO     | src.policies:train:121 - Mean episode return: 27.0
2021-08-25 11:21:15.581 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 23.904761904761905
2021-08-25 11:21:15.582 | INFO     | src.policies:train:109 - Episode 43
2021-08-25 11:21:15.591 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:15.593 | INFO     | src.policies:train:121 - Mean episode return: 15.0
2021-08-25 11:21:15.594 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 23.697674418604652
2021-08-25 11:21:15.595 | INFO     | src.policies:train:109 - Episode 44
2021-08-25 11:21:15.604 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:15.605 | INFO     | src.policies:train:121 - Mean episode return: 17.0
2021-08-25 11:21:15.606 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 23.545454545454547
2021-08-25 11:21:15.607 | INFO     | src.polic

2021-08-25 11:21:15.904 | INFO     | src.policies:train:109 - Episode 63
2021-08-25 11:21:15.918 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:15.919 | INFO     | src.policies:train:121 - Mean episode return: 26.0
2021-08-25 11:21:15.920 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 23.07936507936508
2021-08-25 11:21:15.921 | INFO     | src.policies:train:109 - Episode 64
2021-08-25 11:21:15.932 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:15.933 | INFO     | src.policies:train:121 - Mean episode return: 18.0
2021-08-25 11:21:15.934 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 23.0
2021-08-25 11:21:15.935 | INFO     | src.policies:train:109 - Episode 65
2021-08-25 11:21:15.945 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:15.946 | INFO     | src.policies:train:121 - Mean episode return: 1

2021-08-25 11:21:16.218 | INFO     | src.policies:train:121 - Mean episode return: 12.0
2021-08-25 11:21:16.219 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 21.759036144578314
2021-08-25 11:21:16.220 | INFO     | src.policies:train:109 - Episode 84
2021-08-25 11:21:16.246 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:16.247 | INFO     | src.policies:train:121 - Mean episode return: 60.0
2021-08-25 11:21:16.248 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 22.214285714285715
2021-08-25 11:21:16.249 | INFO     | src.policies:train:109 - Episode 85
2021-08-25 11:21:16.264 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:16.265 | INFO     | src.policies:train:121 - Mean episode return: 25.0
2021-08-25 11:21:16.266 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 22.24705882352941
2021-08-25 11:21:16.267 | INFO     | src.polici

2021-08-25 11:21:16.575 | INFO     | src.policies:train:109 - Episode 104
2021-08-25 11:21:16.586 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:16.587 | INFO     | src.policies:train:121 - Mean episode return: 20.0
2021-08-25 11:21:16.588 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 22.24
2021-08-25 11:21:16.589 | INFO     | src.policies:train:109 - Episode 105
2021-08-25 11:21:16.598 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:16.600 | INFO     | src.policies:train:121 - Mean episode return: 14.0
2021-08-25 11:21:16.601 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 22.04
2021-08-25 11:21:16.602 | INFO     | src.policies:train:109 - Episode 106
2021-08-25 11:21:16.610 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:16.611 | INFO     | src.policies:train:121 - Mean episode return: 11.0
2021

2021-08-25 11:21:16.912 | INFO     | src.policies:train:109 - Episode 125
2021-08-25 11:21:16.923 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:16.925 | INFO     | src.policies:train:121 - Mean episode return: 15.0
2021-08-25 11:21:16.925 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 21.46
2021-08-25 11:21:16.926 | INFO     | src.policies:train:109 - Episode 126
2021-08-25 11:21:16.937 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:16.938 | INFO     | src.policies:train:121 - Mean episode return: 18.0
2021-08-25 11:21:16.939 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 21.37
2021-08-25 11:21:16.940 | INFO     | src.policies:train:109 - Episode 127
2021-08-25 11:21:16.952 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:16.953 | INFO     | src.policies:train:121 - Mean episode return: 20.0
2021

2021-08-25 11:21:17.288 | INFO     | src.policies:train:109 - Episode 146
2021-08-25 11:21:17.305 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:17.306 | INFO     | src.policies:train:121 - Mean episode return: 35.0
2021-08-25 11:21:17.307 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 22.17
2021-08-25 11:21:17.315 | INFO     | src.policies:train:157 - Total loss: 1.003970742225647
2021-08-25 11:21:17.318 | INFO     | src.policies:train:103 - Epoch 16 / 800
2021-08-25 11:21:17.320 | INFO     | src.policies:train:109 - Episode 147
2021-08-25 11:21:17.329 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:17.330 | INFO     | src.policies:train:121 - Mean episode return: 16.0
2021-08-25 11:21:17.332 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 22.03
2021-08-25 11:21:17.333 | INFO     | src.policies:train:109 - Episode 148
2021-08-25 11:21:17.345 | D

2021-08-25 11:21:17.624 | INFO     | src.policies:train:109 - Episode 167
2021-08-25 11:21:17.633 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:17.634 | INFO     | src.policies:train:121 - Mean episode return: 14.0
2021-08-25 11:21:17.635 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 21.85
2021-08-25 11:21:17.636 | INFO     | src.policies:train:109 - Episode 168
2021-08-25 11:21:17.647 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:17.648 | INFO     | src.policies:train:121 - Mean episode return: 21.0
2021-08-25 11:21:17.649 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 21.89
2021-08-25 11:21:17.659 | INFO     | src.policies:train:157 - Total loss: 1.0021296739578247
2021-08-25 11:21:17.662 | INFO     | src.policies:train:103 - Epoch 18 / 800
2021-08-25 11:21:17.663 | INFO     | src.policies:train:109 - Episode 169
2021-08-25 11:21:17.671 | 

2021-08-25 11:21:17.961 | INFO     | src.policies:train:109 - Episode 188
2021-08-25 11:21:17.979 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:17.980 | INFO     | src.policies:train:121 - Mean episode return: 36.0
2021-08-25 11:21:17.982 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 22.3
2021-08-25 11:21:17.990 | INFO     | src.policies:train:157 - Total loss: 1.0024693012237549
2021-08-25 11:21:17.993 | INFO     | src.policies:train:103 - Epoch 20 / 800
2021-08-25 11:21:17.995 | INFO     | src.policies:train:109 - Episode 189
2021-08-25 11:21:18.004 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:18.005 | INFO     | src.policies:train:121 - Mean episode return: 18.0
2021-08-25 11:21:18.006 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 22.0
2021-08-25 11:21:18.007 | INFO     | src.policies:train:109 - Episode 190
2021-08-25 11:21:18.016 | DE

2021-08-25 11:21:18.349 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:18.351 | INFO     | src.policies:train:121 - Mean episode return: 28.0
2021-08-25 11:21:18.352 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 22.9
2021-08-25 11:21:18.353 | INFO     | src.policies:train:109 - Episode 209
2021-08-25 11:21:18.374 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:18.376 | INFO     | src.policies:train:121 - Mean episode return: 49.0
2021-08-25 11:21:18.377 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 23.22
2021-08-25 11:21:18.378 | INFO     | src.policies:train:109 - Episode 210
2021-08-25 11:21:18.390 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:18.391 | INFO     | src.policies:train:121 - Mean episode return: 24.0
2021-08-25 11:21:18.392 | INFO     | src.policies:train:122 - Last 100 episodes

2021-08-25 11:21:18.751 | INFO     | src.policies:train:121 - Mean episode return: 33.0
2021-08-25 11:21:18.752 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 24.73
2021-08-25 11:21:18.753 | INFO     | src.policies:train:109 - Episode 229
2021-08-25 11:21:18.765 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:18.766 | INFO     | src.policies:train:121 - Mean episode return: 21.0
2021-08-25 11:21:18.767 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 24.59
2021-08-25 11:21:18.768 | INFO     | src.policies:train:109 - Episode 230
2021-08-25 11:21:18.788 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:18.789 | INFO     | src.policies:train:121 - Mean episode return: 44.0
2021-08-25 11:21:18.790 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 24.79
2021-08-25 11:21:18.791 | INFO     | src.policies:train:109 - Episode 231
2021-08-2

2021-08-25 11:21:19.107 | INFO     | src.policies:train:121 - Mean episode return: 22.0
2021-08-25 11:21:19.108 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 24.36
2021-08-25 11:21:19.109 | INFO     | src.policies:train:109 - Episode 250
2021-08-25 11:21:19.126 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:19.127 | INFO     | src.policies:train:121 - Mean episode return: 34.0
2021-08-25 11:21:19.128 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 24.56
2021-08-25 11:21:19.129 | INFO     | src.policies:train:109 - Episode 251
2021-08-25 11:21:19.161 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:19.162 | INFO     | src.policies:train:121 - Mean episode return: 69.0
2021-08-25 11:21:19.163 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 25.06
2021-08-25 11:21:19.171 | INFO     | src.policies:train:157 - Total loss: 1.0026313

2021-08-25 11:21:19.497 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 25.95
2021-08-25 11:21:19.499 | INFO     | src.policies:train:109 - Episode 270
2021-08-25 11:21:19.510 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:19.512 | INFO     | src.policies:train:121 - Mean episode return: 18.0
2021-08-25 11:21:19.513 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 25.9
2021-08-25 11:21:19.514 | INFO     | src.policies:train:109 - Episode 271
2021-08-25 11:21:19.531 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:19.533 | INFO     | src.policies:train:121 - Mean episode return: 36.0
2021-08-25 11:21:19.534 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 26.14
2021-08-25 11:21:19.534 | INFO     | src.policies:train:109 - Episode 272
2021-08-25 11:21:19.552 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents

2021-08-25 11:21:19.927 | INFO     | src.policies:train:121 - Mean episode return: 41.0
2021-08-25 11:21:19.928 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 28.11
2021-08-25 11:21:19.929 | INFO     | src.policies:train:109 - Episode 291
2021-08-25 11:21:19.944 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:19.945 | INFO     | src.policies:train:121 - Mean episode return: 29.0
2021-08-25 11:21:19.946 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 28.17
2021-08-25 11:21:19.947 | INFO     | src.policies:train:109 - Episode 292
2021-08-25 11:21:19.959 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:19.960 | INFO     | src.policies:train:121 - Mean episode return: 19.0
2021-08-25 11:21:19.961 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 28.0
2021-08-25 11:21:19.962 | INFO     | src.policies:train:109 - Episode 293
2021-08-25

2021-08-25 11:21:20.317 | INFO     | src.policies:train:103 - Epoch 36 / 800
2021-08-25 11:21:20.318 | INFO     | src.policies:train:109 - Episode 311
2021-08-25 11:21:20.332 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:20.333 | INFO     | src.policies:train:121 - Mean episode return: 30.0
2021-08-25 11:21:20.334 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 28.47
2021-08-25 11:21:20.335 | INFO     | src.policies:train:109 - Episode 312
2021-08-25 11:21:20.355 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:20.356 | INFO     | src.policies:train:121 - Mean episode return: 40.0
2021-08-25 11:21:20.358 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 28.67
2021-08-25 11:21:20.359 | INFO     | src.policies:train:109 - Episode 313
2021-08-25 11:21:20.368 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:2

2021-08-25 11:21:20.701 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 27.66
2021-08-25 11:21:20.702 | INFO     | src.policies:train:109 - Episode 332
2021-08-25 11:21:20.718 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:20.719 | INFO     | src.policies:train:121 - Mean episode return: 31.0
2021-08-25 11:21:20.720 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 27.41
2021-08-25 11:21:20.721 | INFO     | src.policies:train:109 - Episode 333
2021-08-25 11:21:20.734 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:20.735 | INFO     | src.policies:train:121 - Mean episode return: 22.0
2021-08-25 11:21:20.737 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 27.48
2021-08-25 11:21:20.738 | INFO     | src.policies:train:109 - Episode 334
2021-08-25 11:21:20.745 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agent

2021-08-25 11:21:21.074 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 27.48
2021-08-25 11:21:21.084 | INFO     | src.policies:train:157 - Total loss: 1.0024771690368652
2021-08-25 11:21:21.087 | INFO     | src.policies:train:103 - Epoch 41 / 800
2021-08-25 11:21:21.088 | INFO     | src.policies:train:109 - Episode 353
2021-08-25 11:21:21.100 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:21.102 | INFO     | src.policies:train:121 - Mean episode return: 26.0
2021-08-25 11:21:21.103 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 27.53
2021-08-25 11:21:21.104 | INFO     | src.policies:train:109 - Episode 354
2021-08-25 11:21:21.117 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:21.119 | INFO     | src.policies:train:121 - Mean episode return: 27.0
2021-08-25 11:21:21.119 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 27.65
2

2021-08-25 11:21:21.477 | INFO     | src.policies:train:109 - Episode 373
2021-08-25 11:21:21.491 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:21.493 | INFO     | src.policies:train:121 - Mean episode return: 26.0
2021-08-25 11:21:21.494 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 28.48
2021-08-25 11:21:21.495 | INFO     | src.policies:train:109 - Episode 374
2021-08-25 11:21:21.503 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:21.504 | INFO     | src.policies:train:121 - Mean episode return: 12.0
2021-08-25 11:21:21.505 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 28.21
2021-08-25 11:21:21.506 | INFO     | src.policies:train:109 - Episode 375
2021-08-25 11:21:21.525 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:21.526 | INFO     | src.policies:train:121 - Mean episode return: 40.0
2021

2021-08-25 11:21:21.860 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:21.861 | INFO     | src.policies:train:121 - Mean episode return: 9.0
2021-08-25 11:21:21.862 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 27.13
2021-08-25 11:21:21.863 | INFO     | src.policies:train:109 - Episode 394
2021-08-25 11:21:21.873 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:21.874 | INFO     | src.policies:train:121 - Mean episode return: 17.0
2021-08-25 11:21:21.875 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 26.84
2021-08-25 11:21:21.876 | INFO     | src.policies:train:109 - Episode 395
2021-08-25 11:21:21.886 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:21.887 | INFO     | src.policies:train:121 - Mean episode return: 19.0
2021-08-25 11:21:21.888 | INFO     | src.policies:train:122 - Last 100 episodes

2021-08-25 11:21:22.243 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:22.244 | INFO     | src.policies:train:121 - Mean episode return: 21.0
2021-08-25 11:21:22.245 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 26.98
2021-08-25 11:21:22.246 | INFO     | src.policies:train:109 - Episode 415
2021-08-25 11:21:22.253 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:22.254 | INFO     | src.policies:train:121 - Mean episode return: 9.0
2021-08-25 11:21:22.255 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 26.85
2021-08-25 11:21:22.256 | INFO     | src.policies:train:109 - Episode 416
2021-08-25 11:21:22.265 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:22.266 | INFO     | src.policies:train:121 - Mean episode return: 13.0
2021-08-25 11:21:22.267 | INFO     | src.policies:train:122 - Last 100 episodes

2021-08-25 11:21:22.639 | INFO     | src.policies:train:121 - Mean episode return: 31.0
2021-08-25 11:21:22.641 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 26.9
2021-08-25 11:21:22.642 | INFO     | src.policies:train:109 - Episode 435
2021-08-25 11:21:22.657 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:22.658 | INFO     | src.policies:train:121 - Mean episode return: 24.0
2021-08-25 11:21:22.660 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 27.02
2021-08-25 11:21:22.661 | INFO     | src.policies:train:109 - Episode 436
2021-08-25 11:21:22.690 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:22.691 | INFO     | src.policies:train:121 - Mean episode return: 62.0
2021-08-25 11:21:22.692 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 27.43
2021-08-25 11:21:22.693 | INFO     | src.policies:train:109 - Episode 437
2021-08-25

2021-08-25 11:21:23.054 | INFO     | src.policies:train:121 - Mean episode return: 51.0
2021-08-25 11:21:23.056 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 27.68
2021-08-25 11:21:23.064 | INFO     | src.policies:train:157 - Total loss: 1.00259530544281
2021-08-25 11:21:23.067 | INFO     | src.policies:train:103 - Epoch 54 / 800
2021-08-25 11:21:23.068 | INFO     | src.policies:train:109 - Episode 456
2021-08-25 11:21:23.075 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:23.076 | INFO     | src.policies:train:121 - Mean episode return: 11.0
2021-08-25 11:21:23.077 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 27.59
2021-08-25 11:21:23.078 | INFO     | src.policies:train:109 - Episode 457
2021-08-25 11:21:23.086 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:23.087 | INFO     | src.policies:train:121 - Mean episode return: 9.0
2021-08-25 11:2

2021-08-25 11:21:23.413 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 26.66
2021-08-25 11:21:23.414 | INFO     | src.policies:train:109 - Episode 476
2021-08-25 11:21:23.423 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:23.425 | INFO     | src.policies:train:121 - Mean episode return: 16.0
2021-08-25 11:21:23.425 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 26.69
2021-08-25 11:21:23.426 | INFO     | src.policies:train:109 - Episode 477
2021-08-25 11:21:23.440 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:23.441 | INFO     | src.policies:train:121 - Mean episode return: 24.0
2021-08-25 11:21:23.443 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 26.69
2021-08-25 11:21:23.444 | INFO     | src.policies:train:109 - Episode 478
2021-08-25 11:21:23.458 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agent

2021-08-25 11:21:23.810 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 27.22
2021-08-25 11:21:23.818 | INFO     | src.policies:train:157 - Total loss: 1.0018731355667114
2021-08-25 11:21:23.822 | INFO     | src.policies:train:103 - Epoch 59 / 800
2021-08-25 11:21:23.823 | INFO     | src.policies:train:109 - Episode 497
2021-08-25 11:21:23.841 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:23.842 | INFO     | src.policies:train:121 - Mean episode return: 40.0
2021-08-25 11:21:23.843 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 27.04
2021-08-25 11:21:23.844 | INFO     | src.policies:train:109 - Episode 498
2021-08-25 11:21:23.864 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:23.865 | INFO     | src.policies:train:121 - Mean episode return: 41.0
2021-08-25 11:21:23.866 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 27.34
2

2021-08-25 11:21:24.227 | INFO     | src.policies:train:109 - Episode 517
2021-08-25 11:21:24.238 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:24.239 | INFO     | src.policies:train:121 - Mean episode return: 20.0
2021-08-25 11:21:24.240 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 28.04
2021-08-25 11:21:24.241 | INFO     | src.policies:train:109 - Episode 518
2021-08-25 11:21:24.258 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:24.260 | INFO     | src.policies:train:121 - Mean episode return: 35.0
2021-08-25 11:21:24.260 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 28.03
2021-08-25 11:21:24.261 | INFO     | src.policies:train:109 - Episode 519
2021-08-25 11:21:24.269 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:24.270 | INFO     | src.policies:train:121 - Mean episode return: 11.0
2021

2021-08-25 11:21:24.626 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:24.627 | INFO     | src.policies:train:121 - Mean episode return: 13.0
2021-08-25 11:21:24.628 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 27.71
2021-08-25 11:21:24.629 | INFO     | src.policies:train:109 - Episode 538
2021-08-25 11:21:24.639 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:24.640 | INFO     | src.policies:train:121 - Mean episode return: 14.0
2021-08-25 11:21:24.641 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 27.67
2021-08-25 11:21:24.643 | INFO     | src.policies:train:109 - Episode 539
2021-08-25 11:21:24.663 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:24.664 | INFO     | src.policies:train:121 - Mean episode return: 41.0
2021-08-25 11:21:24.665 | INFO     | src.policies:train:122 - Last 100 episode

2021-08-25 11:21:25.012 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:25.013 | INFO     | src.policies:train:121 - Mean episode return: 44.0
2021-08-25 11:21:25.014 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 26.94
2021-08-25 11:21:25.015 | INFO     | src.policies:train:109 - Episode 559
2021-08-25 11:21:25.034 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:25.036 | INFO     | src.policies:train:121 - Mean episode return: 37.0
2021-08-25 11:21:25.037 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 26.74
2021-08-25 11:21:25.038 | INFO     | src.policies:train:109 - Episode 560
2021-08-25 11:21:25.058 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:25.060 | INFO     | src.policies:train:121 - Mean episode return: 41.0
2021-08-25 11:21:25.061 | INFO     | src.policies:train:122 - Last 100 episode

2021-08-25 11:21:25.429 | INFO     | src.policies:train:121 - Mean episode return: 13.0
2021-08-25 11:21:25.430 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 27.58
2021-08-25 11:21:25.432 | INFO     | src.policies:train:109 - Episode 579
2021-08-25 11:21:25.454 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:25.455 | INFO     | src.policies:train:121 - Mean episode return: 40.0
2021-08-25 11:21:25.457 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 27.33
2021-08-25 11:21:25.458 | INFO     | src.policies:train:109 - Episode 580
2021-08-25 11:21:25.475 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:25.476 | INFO     | src.policies:train:121 - Mean episode return: 24.0
2021-08-25 11:21:25.478 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 27.45
2021-08-25 11:21:25.479 | INFO     | src.policies:train:109 - Episode 581
2021-08-2

2021-08-25 11:21:25.908 | INFO     | src.policies:train:121 - Mean episode return: 16.0
2021-08-25 11:21:25.910 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 27.82
2021-08-25 11:21:25.921 | INFO     | src.policies:train:157 - Total loss: 1.0022023916244507
2021-08-25 11:21:25.925 | INFO     | src.policies:train:103 - Epoch 72 / 800
2021-08-25 11:21:25.926 | INFO     | src.policies:train:109 - Episode 600
2021-08-25 11:21:25.951 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:25.952 | INFO     | src.policies:train:121 - Mean episode return: 58.0
2021-08-25 11:21:25.954 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 28.24
2021-08-25 11:21:25.954 | INFO     | src.policies:train:109 - Episode 601
2021-08-25 11:21:25.977 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:25.979 | INFO     | src.policies:train:121 - Mean episode return: 46.0
2021-08-25 1

2021-08-25 11:21:26.397 | INFO     | src.policies:train:109 - Episode 619
2021-08-25 11:21:26.411 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:26.413 | INFO     | src.policies:train:121 - Mean episode return: 26.0
2021-08-25 11:21:26.415 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 29.19
2021-08-25 11:21:26.416 | INFO     | src.policies:train:109 - Episode 620
2021-08-25 11:21:26.428 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:26.429 | INFO     | src.policies:train:121 - Mean episode return: 15.0
2021-08-25 11:21:26.430 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 29.18
2021-08-25 11:21:26.431 | INFO     | src.policies:train:109 - Episode 621
2021-08-25 11:21:26.441 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:26.443 | INFO     | src.policies:train:121 - Mean episode return: 11.0
2021

2021-08-25 11:21:26.829 | INFO     | src.policies:train:109 - Episode 640
2021-08-25 11:21:26.852 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:26.854 | INFO     | src.policies:train:121 - Mean episode return: 45.0
2021-08-25 11:21:26.855 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 28.95
2021-08-25 11:21:26.856 | INFO     | src.policies:train:109 - Episode 641
2021-08-25 11:21:26.870 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:26.871 | INFO     | src.policies:train:121 - Mean episode return: 22.0
2021-08-25 11:21:26.873 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 29.01
2021-08-25 11:21:26.874 | INFO     | src.policies:train:109 - Episode 642
2021-08-25 11:21:26.884 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:26.885 | INFO     | src.policies:train:121 - Mean episode return: 10.0
2021

2021-08-25 11:21:27.218 | INFO     | src.policies:train:109 - Episode 661
2021-08-25 11:21:27.245 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:27.246 | INFO     | src.policies:train:121 - Mean episode return: 62.0
2021-08-25 11:21:27.247 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 28.82
2021-08-25 11:21:27.248 | INFO     | src.policies:train:109 - Episode 662
2021-08-25 11:21:27.262 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:27.263 | INFO     | src.policies:train:121 - Mean episode return: 23.0
2021-08-25 11:21:27.264 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 28.91
2021-08-25 11:21:27.272 | INFO     | src.policies:train:157 - Total loss: 1.002049207687378
2021-08-25 11:21:27.276 | INFO     | src.policies:train:103 - Epoch 80 / 800
2021-08-25 11:21:27.277 | INFO     | src.policies:train:109 - Episode 663
2021-08-25 11:21:27.295 | D

2021-08-25 11:21:27.633 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:27.634 | INFO     | src.policies:train:121 - Mean episode return: 35.0
2021-08-25 11:21:27.635 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 28.83
2021-08-25 11:21:27.636 | INFO     | src.policies:train:109 - Episode 682
2021-08-25 11:21:27.658 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:27.660 | INFO     | src.policies:train:121 - Mean episode return: 44.0
2021-08-25 11:21:27.661 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 28.98
2021-08-25 11:21:27.662 | INFO     | src.policies:train:109 - Episode 683
2021-08-25 11:21:27.691 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:27.692 | INFO     | src.policies:train:121 - Mean episode return: 64.0
2021-08-25 11:21:27.693 | INFO     | src.policies:train:122 - Last 100 episode

2021-08-25 11:21:27.999 | INFO     | src.policies:train:121 - Mean episode return: 17.0
2021-08-25 11:21:28.000 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 26.73
2021-08-25 11:21:28.002 | INFO     | src.policies:train:109 - Episode 703
2021-08-25 11:21:28.016 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:28.017 | INFO     | src.policies:train:121 - Mean episode return: 26.0
2021-08-25 11:21:28.018 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 26.74
2021-08-25 11:21:28.019 | INFO     | src.policies:train:109 - Episode 704
2021-08-25 11:21:28.039 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:28.041 | INFO     | src.policies:train:121 - Mean episode return: 41.0
2021-08-25 11:21:28.042 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 26.59
2021-08-25 11:21:28.050 | INFO     | src.policies:train:157 - Total loss: 1.0023666

2021-08-25 11:21:28.365 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 25.46
2021-08-25 11:21:28.366 | INFO     | src.policies:train:109 - Episode 723
2021-08-25 11:21:28.380 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:28.381 | INFO     | src.policies:train:121 - Mean episode return: 25.0
2021-08-25 11:21:28.382 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 25.58
2021-08-25 11:21:28.383 | INFO     | src.policies:train:109 - Episode 724
2021-08-25 11:21:28.401 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:28.402 | INFO     | src.policies:train:121 - Mean episode return: 38.0
2021-08-25 11:21:28.403 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 25.49
2021-08-25 11:21:28.404 | INFO     | src.policies:train:109 - Episode 725
2021-08-25 11:21:28.411 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agent

2021-08-25 11:21:28.775 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 26.04
2021-08-25 11:21:28.784 | INFO     | src.policies:train:157 - Total loss: 1.0020167827606201
2021-08-25 11:21:28.788 | INFO     | src.policies:train:103 - Epoch 90 / 800
2021-08-25 11:21:28.789 | INFO     | src.policies:train:109 - Episode 744
2021-08-25 11:21:28.813 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:28.815 | INFO     | src.policies:train:121 - Mean episode return: 56.0
2021-08-25 11:21:28.816 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 26.49
2021-08-25 11:21:28.817 | INFO     | src.policies:train:109 - Episode 745
2021-08-25 11:21:28.834 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:28.835 | INFO     | src.policies:train:121 - Mean episode return: 33.0
2021-08-25 11:21:28.836 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 26.62
2

2021-08-25 11:21:29.237 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:29.239 | INFO     | src.policies:train:121 - Mean episode return: 24.0
2021-08-25 11:21:29.239 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 28.25
2021-08-25 11:21:29.241 | INFO     | src.policies:train:109 - Episode 764
2021-08-25 11:21:29.252 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:29.253 | INFO     | src.policies:train:121 - Mean episode return: 18.0
2021-08-25 11:21:29.254 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 28.31
2021-08-25 11:21:29.255 | INFO     | src.policies:train:109 - Episode 765
2021-08-25 11:21:29.271 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:29.273 | INFO     | src.policies:train:121 - Mean episode return: 34.0
2021-08-25 11:21:29.274 | INFO     | src.policies:train:122 - Last 100 episode

2021-08-25 11:21:29.620 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:29.621 | INFO     | src.policies:train:121 - Mean episode return: 23.0
2021-08-25 11:21:29.622 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 27.61
2021-08-25 11:21:29.623 | INFO     | src.policies:train:109 - Episode 785
2021-08-25 11:21:29.632 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:29.634 | INFO     | src.policies:train:121 - Mean episode return: 17.0
2021-08-25 11:21:29.635 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 27.67
2021-08-25 11:21:29.636 | INFO     | src.policies:train:109 - Episode 786
2021-08-25 11:21:29.650 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:29.651 | INFO     | src.policies:train:121 - Mean episode return: 31.0
2021-08-25 11:21:29.652 | INFO     | src.policies:train:122 - Last 100 episode

2021-08-25 11:21:30.038 | INFO     | src.policies:train:121 - Mean episode return: 20.0
2021-08-25 11:21:30.039 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 29.81
2021-08-25 11:21:30.040 | INFO     | src.policies:train:109 - Episode 805
2021-08-25 11:21:30.050 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:30.051 | INFO     | src.policies:train:121 - Mean episode return: 17.0
2021-08-25 11:21:30.052 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 29.54
2021-08-25 11:21:30.053 | INFO     | src.policies:train:109 - Episode 806
2021-08-25 11:21:30.070 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:30.072 | INFO     | src.policies:train:121 - Mean episode return: 35.0
2021-08-25 11:21:30.073 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 29.74
2021-08-25 11:21:30.081 | INFO     | src.policies:train:157 - Total loss: 1.0023719

2021-08-25 11:21:30.411 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 30.18
2021-08-25 11:21:30.411 | INFO     | src.policies:train:109 - Episode 825
2021-08-25 11:21:30.420 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:30.421 | INFO     | src.policies:train:121 - Mean episode return: 11.0
2021-08-25 11:21:30.422 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 30.2
2021-08-25 11:21:30.423 | INFO     | src.policies:train:109 - Episode 826
2021-08-25 11:21:30.435 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:30.436 | INFO     | src.policies:train:121 - Mean episode return: 22.0
2021-08-25 11:21:30.437 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 30.21
2021-08-25 11:21:30.438 | INFO     | src.policies:train:109 - Episode 827
2021-08-25 11:21:30.453 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done

2021-08-25 11:21:30.786 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 29.11
2021-08-25 11:21:30.787 | INFO     | src.policies:train:109 - Episode 846
2021-08-25 11:21:30.798 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:30.799 | INFO     | src.policies:train:121 - Mean episode return: 13.0
2021-08-25 11:21:30.800 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 28.93
2021-08-25 11:21:30.801 | INFO     | src.policies:train:109 - Episode 847
2021-08-25 11:21:30.817 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:30.818 | INFO     | src.policies:train:121 - Mean episode return: 31.0
2021-08-25 11:21:30.819 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 28.94
2021-08-25 11:21:30.828 | INFO     | src.policies:train:157 - Total loss: 1.0019522905349731
2021-08-25 11:21:30.831 | INFO     | src.policies:train:103 - Epoch 104 / 800


2021-08-25 11:21:31.143 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 26.85
2021-08-25 11:21:31.151 | INFO     | src.policies:train:157 - Total loss: 1.001747965812683
2021-08-25 11:21:31.155 | INFO     | src.policies:train:103 - Epoch 106 / 800
2021-08-25 11:21:31.156 | INFO     | src.policies:train:109 - Episode 867
2021-08-25 11:21:31.171 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:31.173 | INFO     | src.policies:train:121 - Mean episode return: 31.0
2021-08-25 11:21:31.174 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 26.89
2021-08-25 11:21:31.175 | INFO     | src.policies:train:109 - Episode 868
2021-08-25 11:21:31.186 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:31.187 | INFO     | src.policies:train:121 - Mean episode return: 17.0
2021-08-25 11:21:31.188 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 26.91

2021-08-25 11:21:31.531 | INFO     | src.policies:train:109 - Episode 887
2021-08-25 11:21:31.545 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:31.546 | INFO     | src.policies:train:121 - Mean episode return: 24.0
2021-08-25 11:21:31.547 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 27.15
2021-08-25 11:21:31.548 | INFO     | src.policies:train:109 - Episode 888
2021-08-25 11:21:31.561 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:31.562 | INFO     | src.policies:train:121 - Mean episode return: 23.0
2021-08-25 11:21:31.564 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 27.02
2021-08-25 11:21:31.564 | INFO     | src.policies:train:109 - Episode 889
2021-08-25 11:21:31.572 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:31.574 | INFO     | src.policies:train:121 - Mean episode return: 11.0

2021-08-25 11:21:31.986 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:31.987 | INFO     | src.policies:train:121 - Mean episode return: 32.0
2021-08-25 11:21:31.988 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 27.74
2021-08-25 11:21:31.998 | INFO     | src.policies:train:157 - Total loss: 1.0018831491470337
2021-08-25 11:21:32.001 | INFO     | src.policies:train:103 - Epoch 112 / 800
2021-08-25 11:21:32.002 | INFO     | src.policies:train:109 - Episode 908
2021-08-25 11:21:32.012 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:32.014 | INFO     | src.policies:train:121 - Mean episode return: 15.0
2021-08-25 11:21:32.016 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 27.75
2021-08-25 11:21:32.017 | INFO     | src.policies:train:109 - Episode 909
2021-08-25 11:21:32.028 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done

2021-08-25 11:21:32.364 | INFO     | src.policies:train:121 - Mean episode return: 20.0
2021-08-25 11:21:32.365 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 27.76
2021-08-25 11:21:32.366 | INFO     | src.policies:train:109 - Episode 928
2021-08-25 11:21:32.374 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:32.375 | INFO     | src.policies:train:121 - Mean episode return: 12.0
2021-08-25 11:21:32.376 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 27.51
2021-08-25 11:21:32.377 | INFO     | src.policies:train:109 - Episode 929
2021-08-25 11:21:32.397 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:32.398 | INFO     | src.policies:train:121 - Mean episode return: 41.0
2021-08-25 11:21:32.399 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 27.78
2021-08-25 11:21:32.400 | INFO     | src.policies:train:109 - Episode 930

2021-08-25 11:21:32.769 | INFO     | src.policies:train:103 - Epoch 117 / 800
2021-08-25 11:21:32.770 | INFO     | src.policies:train:109 - Episode 948
2021-08-25 11:21:32.779 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:32.781 | INFO     | src.policies:train:121 - Mean episode return: 17.0
2021-08-25 11:21:32.782 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 28.51
2021-08-25 11:21:32.783 | INFO     | src.policies:train:109 - Episode 949
2021-08-25 11:21:32.791 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:32.793 | INFO     | src.policies:train:121 - Mean episode return: 14.0
2021-08-25 11:21:32.794 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 28.52
2021-08-25 11:21:32.795 | INFO     | src.policies:train:109 - Episode 950
2021-08-25 11:21:32.806 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done

2021-08-25 11:21:33.158 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 29.06
2021-08-25 11:21:33.159 | INFO     | src.policies:train:109 - Episode 969
2021-08-25 11:21:33.204 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:33.205 | INFO     | src.policies:train:121 - Mean episode return: 104.0
2021-08-25 11:21:33.206 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 29.66
2021-08-25 11:21:33.214 | INFO     | src.policies:train:157 - Total loss: 1.0024603605270386
2021-08-25 11:21:33.217 | INFO     | src.policies:train:103 - Epoch 120 / 800
2021-08-25 11:21:33.218 | INFO     | src.policies:train:109 - Episode 970
2021-08-25 11:21:33.239 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:33.240 | INFO     | src.policies:train:121 - Mean episode return: 49.0
2021-08-25 11:21:33.242 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 29.57

2021-08-25 11:21:33.601 | INFO     | src.policies:train:109 - Episode 989
2021-08-25 11:21:33.611 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:33.612 | INFO     | src.policies:train:121 - Mean episode return: 15.0
2021-08-25 11:21:33.613 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 30.64
2021-08-25 11:21:33.614 | INFO     | src.policies:train:109 - Episode 990
2021-08-25 11:21:33.628 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:33.629 | INFO     | src.policies:train:121 - Mean episode return: 27.0
2021-08-25 11:21:33.630 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 30.43
2021-08-25 11:21:33.639 | INFO     | src.policies:train:157 - Total loss: 1.002073884010315
2021-08-25 11:21:33.642 | INFO     | src.policies:train:103 - Epoch 123 / 800
2021-08-25 11:21:33.644 | INFO     | src.policies:train:109 - Episode 991

2021-08-25 11:21:33.999 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:34.000 | INFO     | src.policies:train:121 - Mean episode return: 16.0
2021-08-25 11:21:34.001 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 29.2
2021-08-25 11:21:34.002 | INFO     | src.policies:train:109 - Episode 1010
2021-08-25 11:21:34.011 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:34.012 | INFO     | src.policies:train:121 - Mean episode return: 12.0
2021-08-25 11:21:34.014 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 29.2
2021-08-25 11:21:34.014 | INFO     | src.policies:train:109 - Episode 1011
2021-08-25 11:21:34.022 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:34.024 | INFO     | src.policies:train:121 - Mean episode return: 11.0
2021-08-25 11:21:34.025 | INFO     | src.policies:train:122 - Last 100 episode

2021-08-25 11:21:34.403 | INFO     | src.policies:train:121 - Mean episode return: 62.0
2021-08-25 11:21:34.404 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 29.84
2021-08-25 11:21:34.405 | INFO     | src.policies:train:109 - Episode 1030
2021-08-25 11:21:34.430 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:34.431 | INFO     | src.policies:train:121 - Mean episode return: 54.0
2021-08-25 11:21:34.432 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 29.81
2021-08-25 11:21:34.433 | INFO     | src.policies:train:109 - Episode 1031
2021-08-25 11:21:34.442 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:34.443 | INFO     | src.policies:train:121 - Mean episode return: 13.0
2021-08-25 11:21:34.444 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 29.46
2021-08-25 11:21:34.445 | INFO     | src.policies:train:109 - Episode 1032

2021-08-25 11:21:34.930 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 32.66
2021-08-25 11:21:34.938 | INFO     | src.policies:train:157 - Total loss: 1.0031001567840576
2021-08-25 11:21:34.941 | INFO     | src.policies:train:103 - Epoch 132 / 800
2021-08-25 11:21:34.942 | INFO     | src.policies:train:109 - Episode 1050
2021-08-25 11:21:34.954 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:34.955 | INFO     | src.policies:train:121 - Mean episode return: 25.0
2021-08-25 11:21:34.956 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 32.7
2021-08-25 11:21:34.957 | INFO     | src.policies:train:109 - Episode 1051
2021-08-25 11:21:34.967 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:34.968 | INFO     | src.policies:train:121 - Mean episode return: 16.0
2021-08-25 11:21:34.968 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 32.71

2021-08-25 11:21:35.318 | INFO     | src.policies:train:109 - Episode 1070
2021-08-25 11:21:35.353 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:35.354 | INFO     | src.policies:train:121 - Mean episode return: 74.0
2021-08-25 11:21:35.356 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 32.23
2021-08-25 11:21:35.357 | INFO     | src.policies:train:109 - Episode 1071
2021-08-25 11:21:35.372 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:35.373 | INFO     | src.policies:train:121 - Mean episode return: 27.0
2021-08-25 11:21:35.375 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 32.07
2021-08-25 11:21:35.384 | INFO     | src.policies:train:157 - Total loss: 1.0016850233078003
2021-08-25 11:21:35.387 | INFO     | src.policies:train:103 - Epoch 135 / 800
2021-08-25 11:21:35.389 | INFO     | src.policies:train:109 - Episode 1072

2021-08-25 11:21:35.787 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:35.788 | INFO     | src.policies:train:121 - Mean episode return: 49.0
2021-08-25 11:21:35.789 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 32.52
2021-08-25 11:21:35.790 | INFO     | src.policies:train:109 - Episode 1091
2021-08-25 11:21:35.803 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:35.804 | INFO     | src.policies:train:121 - Mean episode return: 20.0
2021-08-25 11:21:35.805 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 32.57
2021-08-25 11:21:35.807 | INFO     | src.policies:train:109 - Episode 1092
2021-08-25 11:21:35.816 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:35.818 | INFO     | src.policies:train:121 - Mean episode return: 14.0
2021-08-25 11:21:35.819 | INFO     | src.policies:train:122 - Last 100 episo

2021-08-25 11:21:36.202 | INFO     | src.policies:train:121 - Mean episode return: 20.0
2021-08-25 11:21:36.204 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 32.62
2021-08-25 11:21:36.205 | INFO     | src.policies:train:109 - Episode 1111
2021-08-25 11:21:36.223 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:36.224 | INFO     | src.policies:train:121 - Mean episode return: 29.0
2021-08-25 11:21:36.225 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 32.8
2021-08-25 11:21:36.226 | INFO     | src.policies:train:109 - Episode 1112
2021-08-25 11:21:36.250 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:36.252 | INFO     | src.policies:train:121 - Mean episode return: 45.0
2021-08-25 11:21:36.253 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 33.08
2021-08-25 11:21:36.254 | INFO     | src.policies:train:109 - Episode 1113

2021-08-25 11:21:36.687 | INFO     | src.policies:train:109 - Episode 1131
2021-08-25 11:21:36.704 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:36.706 | INFO     | src.policies:train:121 - Mean episode return: 29.0
2021-08-25 11:21:36.706 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 33.15
2021-08-25 11:21:36.707 | INFO     | src.policies:train:109 - Episode 1132
2021-08-25 11:21:36.719 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:36.720 | INFO     | src.policies:train:121 - Mean episode return: 17.0
2021-08-25 11:21:36.721 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 32.86
2021-08-25 11:21:36.723 | INFO     | src.policies:train:109 - Episode 1133
2021-08-25 11:21:36.734 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:36.735 | INFO     | src.policies:train:121 - Mean episode return: 16.0

2021-08-25 11:21:37.138 | INFO     | src.policies:train:157 - Total loss: 1.0028561353683472
2021-08-25 11:21:37.142 | INFO     | src.policies:train:103 - Epoch 146 / 800
2021-08-25 11:21:37.144 | INFO     | src.policies:train:109 - Episode 1152
2021-08-25 11:21:37.157 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:37.158 | INFO     | src.policies:train:121 - Mean episode return: 22.0
2021-08-25 11:21:37.160 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 30.59
2021-08-25 11:21:37.161 | INFO     | src.policies:train:109 - Episode 1153
2021-08-25 11:21:37.178 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:37.179 | INFO     | src.policies:train:121 - Mean episode return: 29.0
2021-08-25 11:21:37.181 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 30.69
2021-08-25 11:21:37.181 | INFO     | src.policies:train:109 - Episode 1154

2021-08-25 11:21:37.619 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:37.620 | INFO     | src.policies:train:121 - Mean episode return: 38.0
2021-08-25 11:21:37.622 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 30.79
2021-08-25 11:21:37.634 | INFO     | src.policies:train:157 - Total loss: 1.0025588274002075
2021-08-25 11:21:37.639 | INFO     | src.policies:train:103 - Epoch 149 / 800
2021-08-25 11:21:37.641 | INFO     | src.policies:train:109 - Episode 1173
2021-08-25 11:21:37.661 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:37.662 | INFO     | src.policies:train:121 - Mean episode return: 35.0
2021-08-25 11:21:37.664 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 30.98
2021-08-25 11:21:37.665 | INFO     | src.policies:train:109 - Episode 1174
2021-08-25 11:21:37.681 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done

2021-08-25 11:21:38.108 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 31.05
2021-08-25 11:21:38.120 | INFO     | src.policies:train:157 - Total loss: 1.0019185543060303
2021-08-25 11:21:38.124 | INFO     | src.policies:train:103 - Epoch 152 / 800
2021-08-25 11:21:38.126 | INFO     | src.policies:train:109 - Episode 1193
2021-08-25 11:21:38.164 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:38.165 | INFO     | src.policies:train:121 - Mean episode return: 84.0
2021-08-25 11:21:38.166 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 31.41
2021-08-25 11:21:38.167 | INFO     | src.policies:train:109 - Episode 1194
2021-08-25 11:21:38.219 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:38.221 | INFO     | src.policies:train:121 - Mean episode return: 116.0
2021-08-25 11:21:38.222 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 32.

2021-08-25 11:21:38.676 | INFO     | src.policies:train:121 - Mean episode return: 27.0
2021-08-25 11:21:38.677 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 32.01
2021-08-25 11:21:38.678 | INFO     | src.policies:train:109 - Episode 1213
2021-08-25 11:21:38.692 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:38.694 | INFO     | src.policies:train:121 - Mean episode return: 19.0
2021-08-25 11:21:38.695 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 32.03
2021-08-25 11:21:38.697 | INFO     | src.policies:train:109 - Episode 1214
2021-08-25 11:21:38.709 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:38.710 | INFO     | src.policies:train:121 - Mean episode return: 18.0
2021-08-25 11:21:38.712 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 31.36
2021-08-25 11:21:38.713 | INFO     | src.policies:train:109 - Episode 1215

2021-08-25 11:21:39.187 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 32.21
2021-08-25 11:21:39.188 | INFO     | src.policies:train:109 - Episode 1233
2021-08-25 11:21:39.204 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:39.205 | INFO     | src.policies:train:121 - Mean episode return: 26.0
2021-08-25 11:21:39.207 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 32.31
2021-08-25 11:21:39.207 | INFO     | src.policies:train:109 - Episode 1234
2021-08-25 11:21:39.234 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:39.235 | INFO     | src.policies:train:121 - Mean episode return: 52.0
2021-08-25 11:21:39.237 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 32.7
2021-08-25 11:21:39.238 | INFO     | src.policies:train:109 - Episode 1235
2021-08-25 11:21:39.250 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done

2021-08-25 11:21:39.693 | INFO     | src.policies:train:109 - Episode 1253
2021-08-25 11:21:39.708 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:39.709 | INFO     | src.policies:train:121 - Mean episode return: 31.0
2021-08-25 11:21:39.711 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 32.26
2021-08-25 11:21:39.712 | INFO     | src.policies:train:109 - Episode 1254
2021-08-25 11:21:39.733 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:39.734 | INFO     | src.policies:train:121 - Mean episode return: 39.0
2021-08-25 11:21:39.735 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 32.32
2021-08-25 11:21:39.736 | INFO     | src.policies:train:109 - Episode 1255
2021-08-25 11:21:39.780 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:39.781 | INFO     | src.policies:train:121 - Mean episode return: 101.0


2021-08-25 11:21:40.177 | INFO     | src.policies:train:157 - Total loss: 1.0024616718292236
2021-08-25 11:21:40.180 | INFO     | src.policies:train:103 - Epoch 164 / 800
2021-08-25 11:21:40.182 | INFO     | src.policies:train:109 - Episode 1274
2021-08-25 11:21:40.195 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:40.197 | INFO     | src.policies:train:121 - Mean episode return: 28.0
2021-08-25 11:21:40.198 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 32.22
2021-08-25 11:21:40.199 | INFO     | src.policies:train:109 - Episode 1275
2021-08-25 11:21:40.227 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:40.228 | INFO     | src.policies:train:121 - Mean episode return: 59.0
2021-08-25 11:21:40.229 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 32.21
2021-08-25 11:21:40.230 | INFO     | src.policies:train:109 - Episode 1276

2021-08-25 11:21:40.606 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:40.608 | INFO     | src.policies:train:121 - Mean episode return: 39.0
2021-08-25 11:21:40.609 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 30.37
2021-08-25 11:21:40.610 | INFO     | src.policies:train:109 - Episode 1295
2021-08-25 11:21:40.623 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:40.625 | INFO     | src.policies:train:121 - Mean episode return: 19.0
2021-08-25 11:21:40.626 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 30.3
2021-08-25 11:21:40.627 | INFO     | src.policies:train:109 - Episode 1296
2021-08-25 11:21:40.639 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:40.640 | INFO     | src.policies:train:121 - Mean episode return: 15.0
2021-08-25 11:21:40.641 | INFO     | src.policies:train:122 - Last 100 episod

2021-08-25 11:21:41.087 | INFO     | src.policies:train:121 - Mean episode return: 28.0
2021-08-25 11:21:41.088 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 32.23
2021-08-25 11:21:41.090 | INFO     | src.policies:train:109 - Episode 1315
2021-08-25 11:21:41.105 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:41.107 | INFO     | src.policies:train:121 - Mean episode return: 25.0
2021-08-25 11:21:41.108 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 31.48
2021-08-25 11:21:41.119 | INFO     | src.policies:train:157 - Total loss: 1.0018495321273804
2021-08-25 11:21:41.124 | INFO     | src.policies:train:103 - Epoch 170 / 800
2021-08-25 11:21:41.125 | INFO     | src.policies:train:109 - Episode 1316
2021-08-25 11:21:41.133 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:41.135 | INFO     | src.policies:train:121 - Mean episode return: 11.0

2021-08-25 11:21:41.537 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 31.44
2021-08-25 11:21:41.538 | INFO     | src.policies:train:109 - Episode 1335
2021-08-25 11:21:41.550 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:41.552 | INFO     | src.policies:train:121 - Mean episode return: 21.0
2021-08-25 11:21:41.553 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 31.49
2021-08-25 11:21:41.554 | INFO     | src.policies:train:109 - Episode 1336
2021-08-25 11:21:41.563 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:41.564 | INFO     | src.policies:train:121 - Mean episode return: 12.0
2021-08-25 11:21:41.565 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 31.3
2021-08-25 11:21:41.566 | INFO     | src.policies:train:109 - Episode 1337
2021-08-25 11:21:41.579 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done

2021-08-25 11:21:41.938 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 30.0
2021-08-25 11:21:41.947 | INFO     | src.policies:train:157 - Total loss: 1.0018216371536255
2021-08-25 11:21:41.950 | INFO     | src.policies:train:103 - Epoch 175 / 800
2021-08-25 11:21:41.951 | INFO     | src.policies:train:109 - Episode 1356
2021-08-25 11:21:41.971 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:41.973 | INFO     | src.policies:train:121 - Mean episode return: 43.0
2021-08-25 11:21:41.974 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 29.98
2021-08-25 11:21:41.975 | INFO     | src.policies:train:109 - Episode 1357
2021-08-25 11:21:41.990 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:41.992 | INFO     | src.policies:train:121 - Mean episode return: 28.0
2021-08-25 11:21:41.994 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 30.14

2021-08-25 11:21:42.346 | INFO     | src.policies:train:109 - Episode 1376
2021-08-25 11:21:42.369 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:42.370 | INFO     | src.policies:train:121 - Mean episode return: 49.0
2021-08-25 11:21:42.371 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 29.87
2021-08-25 11:21:42.379 | INFO     | src.policies:train:157 - Total loss: 1.001662015914917
2021-08-25 11:21:42.382 | INFO     | src.policies:train:103 - Epoch 178 / 800
2021-08-25 11:21:42.383 | INFO     | src.policies:train:109 - Episode 1377
2021-08-25 11:21:42.391 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:42.393 | INFO     | src.policies:train:121 - Mean episode return: 15.0
2021-08-25 11:21:42.394 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 29.67
2021-08-25 11:21:42.395 | INFO     | src.policies:train:109 - Episode 1378

2021-08-25 11:21:42.826 | INFO     | src.policies:train:121 - Mean episode return: 39.0
2021-08-25 11:21:42.827 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 32.48
2021-08-25 11:21:42.828 | INFO     | src.policies:train:109 - Episode 1396
2021-08-25 11:21:42.842 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:42.843 | INFO     | src.policies:train:121 - Mean episode return: 24.0
2021-08-25 11:21:42.844 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 32.57
2021-08-25 11:21:42.845 | INFO     | src.policies:train:109 - Episode 1397
2021-08-25 11:21:42.853 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:42.855 | INFO     | src.policies:train:121 - Mean episode return: 11.0
2021-08-25 11:21:42.856 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 32.49
2021-08-25 11:21:42.857 | INFO     | src.policies:train:109 - Episode 1398

2021-08-25 11:21:43.307 | INFO     | src.policies:train:109 - Episode 1415
2021-08-25 11:21:43.324 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:43.326 | INFO     | src.policies:train:121 - Mean episode return: 40.0
2021-08-25 11:21:43.326 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 33.57
2021-08-25 11:21:43.327 | INFO     | src.policies:train:109 - Episode 1416
2021-08-25 11:21:43.337 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:43.338 | INFO     | src.policies:train:121 - Mean episode return: 12.0
2021-08-25 11:21:43.339 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 33.58
2021-08-25 11:21:43.340 | INFO     | src.policies:train:109 - Episode 1417
2021-08-25 11:21:43.358 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:43.359 | INFO     | src.policies:train:121 - Mean episode return: 41.0

2021-08-25 11:21:43.815 | INFO     | src.policies:train:157 - Total loss: 1.0021885633468628
2021-08-25 11:21:43.818 | INFO     | src.policies:train:103 - Epoch 189 / 800
2021-08-25 11:21:43.819 | INFO     | src.policies:train:109 - Episode 1435
2021-08-25 11:21:43.848 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:43.850 | INFO     | src.policies:train:121 - Mean episode return: 76.0
2021-08-25 11:21:43.851 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 36.37
2021-08-25 11:21:43.852 | INFO     | src.policies:train:109 - Episode 1436
2021-08-25 11:21:43.866 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:43.867 | INFO     | src.policies:train:121 - Mean episode return: 27.0
2021-08-25 11:21:43.868 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 36.52
2021-08-25 11:21:43.869 | INFO     | src.policies:train:109 - Episode 1437

2021-08-25 11:21:44.317 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 38.78
2021-08-25 11:21:44.318 | INFO     | src.policies:train:109 - Episode 1455
2021-08-25 11:21:44.341 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:44.343 | INFO     | src.policies:train:121 - Mean episode return: 31.0
2021-08-25 11:21:44.344 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 38.96
2021-08-25 11:21:44.345 | INFO     | src.policies:train:109 - Episode 1456
2021-08-25 11:21:44.367 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:44.369 | INFO     | src.policies:train:121 - Mean episode return: 42.0
2021-08-25 11:21:44.369 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 38.95
2021-08-25 11:21:44.371 | INFO     | src.policies:train:109 - Episode 1457
2021-08-25 11:21:44.390 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done

2021-08-25 11:21:44.742 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:44.743 | INFO     | src.policies:train:121 - Mean episode return: 10.0
2021-08-25 11:21:44.744 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 38.94
2021-08-25 11:21:44.745 | INFO     | src.policies:train:109 - Episode 1476
2021-08-25 11:21:44.753 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:44.755 | INFO     | src.policies:train:121 - Mean episode return: 14.0
2021-08-25 11:21:44.756 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 38.59
2021-08-25 11:21:44.757 | INFO     | src.policies:train:109 - Episode 1477
2021-08-25 11:21:44.770 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:44.771 | INFO     | src.policies:train:121 - Mean episode return: 23.0
2021-08-25 11:21:44.772 | INFO     | src.policies:train:122 - Last 100 episo

2021-08-25 11:21:45.161 | INFO     | src.policies:train:121 - Mean episode return: 14.0
2021-08-25 11:21:45.162 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 37.19
2021-08-25 11:21:45.163 | INFO     | src.policies:train:109 - Episode 1496
2021-08-25 11:21:45.197 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:45.199 | INFO     | src.policies:train:121 - Mean episode return: 85.0
2021-08-25 11:21:45.200 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 37.8
2021-08-25 11:21:45.200 | INFO     | src.policies:train:109 - Episode 1497
2021-08-25 11:21:45.224 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:45.225 | INFO     | src.policies:train:121 - Mean episode return: 55.0
2021-08-25 11:21:45.226 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 38.24
2021-08-25 11:21:45.227 | INFO     | src.policies:train:109 - Episode 1498

2021-08-25 11:21:45.631 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 36.92
2021-08-25 11:21:45.631 | INFO     | src.policies:train:109 - Episode 1516
2021-08-25 11:21:45.669 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:45.670 | INFO     | src.policies:train:121 - Mean episode return: 92.0
2021-08-25 11:21:45.671 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 37.72
2021-08-25 11:21:45.680 | INFO     | src.policies:train:157 - Total loss: 1.0029674768447876
2021-08-25 11:21:45.684 | INFO     | src.policies:train:103 - Epoch 202 / 800
2021-08-25 11:21:45.685 | INFO     | src.policies:train:109 - Episode 1517
2021-08-25 11:21:45.698 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:45.699 | INFO     | src.policies:train:121 - Mean episode return: 30.0
2021-08-25 11:21:45.700 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 37.6

2021-08-25 11:21:46.149 | INFO     | src.policies:train:157 - Total loss: 1.0030380487442017
2021-08-25 11:21:46.153 | INFO     | src.policies:train:103 - Epoch 205 / 800
2021-08-25 11:21:46.154 | INFO     | src.policies:train:109 - Episode 1536
2021-08-25 11:21:46.170 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:46.173 | INFO     | src.policies:train:121 - Mean episode return: 38.0
2021-08-25 11:21:46.174 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 36.89
2021-08-25 11:21:46.175 | INFO     | src.policies:train:109 - Episode 1537
2021-08-25 11:21:46.206 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:46.208 | INFO     | src.policies:train:121 - Mean episode return: 73.0
2021-08-25 11:21:46.209 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 37.33
2021-08-25 11:21:46.209 | INFO     | src.policies:train:109 - Episode 1538

2021-08-25 11:21:46.580 | INFO     | src.policies:train:157 - Total loss: 1.0018982887268066
2021-08-25 11:21:46.584 | INFO     | src.policies:train:103 - Epoch 208 / 800
2021-08-25 11:21:46.585 | INFO     | src.policies:train:109 - Episode 1556
2021-08-25 11:21:46.592 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:46.593 | INFO     | src.policies:train:121 - Mean episode return: 14.0
2021-08-25 11:21:46.594 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 35.64
2021-08-25 11:21:46.595 | INFO     | src.policies:train:109 - Episode 1557
2021-08-25 11:21:46.602 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:46.603 | INFO     | src.policies:train:121 - Mean episode return: 9.0
2021-08-25 11:21:46.604 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 35.36
2021-08-25 11:21:46.605 | INFO     | src.policies:train:109 - Episode 1558

2021-08-25 11:21:46.979 | INFO     | src.policies:train:121 - Mean episode return: 18.0
2021-08-25 11:21:46.979 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 36.07
2021-08-25 11:21:46.981 | INFO     | src.policies:train:109 - Episode 1577
2021-08-25 11:21:47.001 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:47.002 | INFO     | src.policies:train:121 - Mean episode return: 42.0
2021-08-25 11:21:47.003 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 36.26
2021-08-25 11:21:47.011 | INFO     | src.policies:train:157 - Total loss: 1.0023647546768188
2021-08-25 11:21:47.015 | INFO     | src.policies:train:103 - Epoch 211 / 800
2021-08-25 11:21:47.016 | INFO     | src.policies:train:109 - Episode 1578
2021-08-25 11:21:47.030 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:47.031 | INFO     | src.policies:train:121 - Mean episode return: 29.0

2021-08-25 11:21:47.463 | INFO     | src.policies:train:109 - Episode 1596
2021-08-25 11:21:47.470 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:47.472 | INFO     | src.policies:train:121 - Mean episode return: 11.0
2021-08-25 11:21:47.473 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 36.91
2021-08-25 11:21:47.474 | INFO     | src.policies:train:109 - Episode 1597
2021-08-25 11:21:47.488 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:47.490 | INFO     | src.policies:train:121 - Mean episode return: 30.0
2021-08-25 11:21:47.491 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 36.66
2021-08-25 11:21:47.492 | INFO     | src.policies:train:109 - Episode 1598
2021-08-25 11:21:47.503 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:47.504 | INFO     | src.policies:train:121 - Mean episode return: 19.0

2021-08-25 11:21:47.917 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:47.918 | INFO     | src.policies:train:121 - Mean episode return: 19.0
2021-08-25 11:21:47.919 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 36.26
2021-08-25 11:21:47.921 | INFO     | src.policies:train:109 - Episode 1617
2021-08-25 11:21:47.938 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:47.939 | INFO     | src.policies:train:121 - Mean episode return: 36.0
2021-08-25 11:21:47.940 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 36.32
2021-08-25 11:21:47.941 | INFO     | src.policies:train:109 - Episode 1618
2021-08-25 11:21:47.951 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:47.953 | INFO     | src.policies:train:121 - Mean episode return: 16.0
2021-08-25 11:21:47.954 | INFO     | src.policies:train:122 - Last 100 episo

2021-08-25 11:21:48.468 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 37.69
2021-08-25 11:21:48.469 | INFO     | src.policies:train:109 - Episode 1636
2021-08-25 11:21:48.484 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:48.485 | INFO     | src.policies:train:121 - Mean episode return: 25.0
2021-08-25 11:21:48.486 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 37.56
2021-08-25 11:21:48.487 | INFO     | src.policies:train:109 - Episode 1637
2021-08-25 11:21:48.498 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:48.499 | INFO     | src.policies:train:121 - Mean episode return: 15.0
2021-08-25 11:21:48.500 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 36.98
2021-08-25 11:21:48.501 | INFO     | src.policies:train:109 - Episode 1638
2021-08-25 11:21:48.512 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done

2021-08-25 11:21:48.950 | INFO     | src.policies:train:109 - Episode 1656
2021-08-25 11:21:48.963 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:48.965 | INFO     | src.policies:train:121 - Mean episode return: 21.0
2021-08-25 11:21:48.966 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 38.65
2021-08-25 11:21:48.967 | INFO     | src.policies:train:109 - Episode 1657
2021-08-25 11:21:48.980 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:48.982 | INFO     | src.policies:train:121 - Mean episode return: 19.0
2021-08-25 11:21:48.983 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 38.75
2021-08-25 11:21:48.984 | INFO     | src.policies:train:109 - Episode 1658
2021-08-25 11:21:48.995 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:48.997 | INFO     | src.policies:train:121 - Mean episode return: 14.0

2021-08-25 11:21:49.483 | INFO     | src.policies:train:157 - Total loss: 1.0025438070297241
2021-08-25 11:21:49.487 | INFO     | src.policies:train:103 - Epoch 228 / 800
2021-08-25 11:21:49.489 | INFO     | src.policies:train:109 - Episode 1676
2021-08-25 11:21:49.503 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:49.505 | INFO     | src.policies:train:121 - Mean episode return: 29.0
2021-08-25 11:21:49.506 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 40.78
2021-08-25 11:21:49.508 | INFO     | src.policies:train:109 - Episode 1677
2021-08-25 11:21:49.517 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:49.519 | INFO     | src.policies:train:121 - Mean episode return: 12.0
2021-08-25 11:21:49.521 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 40.48
2021-08-25 11:21:49.522 | INFO     | src.policies:train:109 - Episode 1678

2021-08-25 11:21:49.948 | INFO     | src.policies:train:121 - Mean episode return: 44.0
2021-08-25 11:21:49.949 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 39.06
2021-08-25 11:21:49.961 | INFO     | src.policies:train:157 - Total loss: 1.001866102218628
2021-08-25 11:21:49.964 | INFO     | src.policies:train:103 - Epoch 231 / 800
2021-08-25 11:21:49.966 | INFO     | src.policies:train:109 - Episode 1697
2021-08-25 11:21:49.976 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:49.978 | INFO     | src.policies:train:121 - Mean episode return: 14.0
2021-08-25 11:21:49.979 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 38.9
2021-08-25 11:21:49.981 | INFO     | src.policies:train:109 - Episode 1698
2021-08-25 11:21:49.994 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:49.996 | INFO     | src.policies:train:121 - Mean episode return: 18.0

2021-08-25 11:21:50.445 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 38.05
2021-08-25 11:21:50.456 | INFO     | src.policies:train:157 - Total loss: 1.001941204071045
2021-08-25 11:21:50.460 | INFO     | src.policies:train:103 - Epoch 234 / 800
2021-08-25 11:21:50.461 | INFO     | src.policies:train:109 - Episode 1717
2021-08-25 11:21:50.477 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:50.480 | INFO     | src.policies:train:121 - Mean episode return: 29.0
2021-08-25 11:21:50.481 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 37.98
2021-08-25 11:21:50.483 | INFO     | src.policies:train:109 - Episode 1718
2021-08-25 11:21:50.498 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:50.499 | INFO     | src.policies:train:121 - Mean episode return: 24.0
2021-08-25 11:21:50.500 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 38.06

2021-08-25 11:21:50.997 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:50.998 | INFO     | src.policies:train:121 - Mean episode return: 33.0
2021-08-25 11:21:51.000 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 36.39
2021-08-25 11:21:51.001 | INFO     | src.policies:train:109 - Episode 1737
2021-08-25 11:21:51.015 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:51.016 | INFO     | src.policies:train:121 - Mean episode return: 16.0
2021-08-25 11:21:51.018 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 36.4
2021-08-25 11:21:51.019 | INFO     | src.policies:train:109 - Episode 1738
2021-08-25 11:21:51.037 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:51.038 | INFO     | src.policies:train:121 - Mean episode return: 29.0
2021-08-25 11:21:51.040 | INFO     | src.policies:train:122 - Last 100 episod

2021-08-25 11:21:51.539 | INFO     | src.policies:train:121 - Mean episode return: 56.0
2021-08-25 11:21:51.540 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 36.57
2021-08-25 11:21:51.549 | INFO     | src.policies:train:157 - Total loss: 1.002274513244629
2021-08-25 11:21:51.553 | INFO     | src.policies:train:103 - Epoch 241 / 800
2021-08-25 11:21:51.555 | INFO     | src.policies:train:109 - Episode 1757
2021-08-25 11:21:51.575 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:51.576 | INFO     | src.policies:train:121 - Mean episode return: 45.0
2021-08-25 11:21:51.578 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 36.83
2021-08-25 11:21:51.579 | INFO     | src.policies:train:109 - Episode 1758
2021-08-25 11:21:51.590 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:51.591 | INFO     | src.policies:train:121 - Mean episode return: 13.0

2021-08-25 11:21:52.064 | INFO     | src.policies:train:109 - Episode 1776
2021-08-25 11:21:52.085 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:52.086 | INFO     | src.policies:train:121 - Mean episode return: 47.0
2021-08-25 11:21:52.087 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 37.82
2021-08-25 11:21:52.088 | INFO     | src.policies:train:109 - Episode 1777
2021-08-25 11:21:52.102 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:52.103 | INFO     | src.policies:train:121 - Mean episode return: 26.0
2021-08-25 11:21:52.104 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 37.96
2021-08-25 11:21:52.105 | INFO     | src.policies:train:109 - Episode 1778
2021-08-25 11:21:52.115 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:52.117 | INFO     | src.policies:train:121 - Mean episode return: 16.0

2021-08-25 11:21:52.562 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:52.563 | INFO     | src.policies:train:121 - Mean episode return: 15.0
2021-08-25 11:21:52.564 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 39.95
2021-08-25 11:21:52.565 | INFO     | src.policies:train:109 - Episode 1797
2021-08-25 11:21:52.596 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:52.597 | INFO     | src.policies:train:121 - Mean episode return: 72.0
2021-08-25 11:21:52.598 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 40.53
2021-08-25 11:21:52.606 | INFO     | src.policies:train:157 - Total loss: 1.002049207687378
2021-08-25 11:21:52.609 | INFO     | src.policies:train:103 - Epoch 248 / 800
2021-08-25 11:21:52.610 | INFO     | src.policies:train:109 - Episode 1798
2021-08-25 11:21:52.624 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done

2021-08-25 11:21:53.103 | INFO     | src.policies:train:109 - Episode 1815
2021-08-25 11:21:53.124 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:53.126 | INFO     | src.policies:train:121 - Mean episode return: 51.0
2021-08-25 11:21:53.127 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 43.84
2021-08-25 11:21:53.128 | INFO     | src.policies:train:109 - Episode 1816
2021-08-25 11:21:53.167 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:53.168 | INFO     | src.policies:train:121 - Mean episode return: 98.0
2021-08-25 11:21:53.169 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 44.68
2021-08-25 11:21:53.170 | INFO     | src.policies:train:109 - Episode 1817
2021-08-25 11:21:53.184 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:53.185 | INFO     | src.policies:train:121 - Mean episode return: 26.0

2021-08-25 11:21:53.771 | INFO     | src.policies:train:121 - Mean episode return: 126.0
2021-08-25 11:21:53.772 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 49.02
2021-08-25 11:21:53.779 | INFO     | src.policies:train:157 - Total loss: 1.0020813941955566
2021-08-25 11:21:53.782 | INFO     | src.policies:train:103 - Epoch 257 / 800
2021-08-25 11:21:53.783 | INFO     | src.policies:train:109 - Episode 1835
2021-08-25 11:21:53.797 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:53.798 | INFO     | src.policies:train:121 - Mean episode return: 29.0
2021-08-25 11:21:53.799 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 48.98
2021-08-25 11:21:53.800 | INFO     | src.policies:train:109 - Episode 1836
2021-08-25 11:21:53.814 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:53.815 | INFO     | src.policies:train:121 - Mean episode return: 28.0

2021-08-25 11:21:54.458 | INFO     | src.policies:train:121 - Mean episode return: 31.0
2021-08-25 11:21:54.459 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 56.33
2021-08-25 11:21:54.460 | INFO     | src.policies:train:109 - Episode 1853
2021-08-25 11:21:54.505 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:54.506 | INFO     | src.policies:train:121 - Mean episode return: 105.0
2021-08-25 11:21:54.508 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 57.03
2021-08-25 11:21:54.509 | INFO     | src.policies:train:109 - Episode 1854
2021-08-25 11:21:54.568 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:54.570 | INFO     | src.policies:train:121 - Mean episode return: 146.0
2021-08-25 11:21:54.571 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 57.61
2021-08-25 11:21:54.579 | INFO     | src.policies:train:157 - Total loss: 1.002

2021-08-25 11:21:55.295 | INFO     | src.policies:train:121 - Mean episode return: 134.0
2021-08-25 11:21:55.296 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 64.98
2021-08-25 11:21:55.296 | INFO     | src.policies:train:109 - Episode 1871
2021-08-25 11:21:55.337 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:55.338 | INFO     | src.policies:train:121 - Mean episode return: 95.0
2021-08-25 11:21:55.339 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 65.74
2021-08-25 11:21:55.347 | INFO     | src.policies:train:157 - Total loss: 1.0019248723983765
2021-08-25 11:21:55.350 | INFO     | src.policies:train:103 - Epoch 269 / 800
2021-08-25 11:21:55.351 | INFO     | src.policies:train:109 - Episode 1872
2021-08-25 11:21:55.384 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:55.386 | INFO     | src.policies:train:121 - Mean episode return: 83.0

2021-08-25 11:21:56.129 | INFO     | src.policies:train:121 - Mean episode return: 183.0
2021-08-25 11:21:56.129 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 74.94
2021-08-25 11:21:56.138 | INFO     | src.policies:train:157 - Total loss: 1.003029465675354
2021-08-25 11:21:56.141 | INFO     | src.policies:train:103 - Epoch 275 / 800
2021-08-25 11:21:56.142 | INFO     | src.policies:train:109 - Episode 1889
2021-08-25 11:21:56.201 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:56.203 | INFO     | src.policies:train:121 - Mean episode return: 153.0
2021-08-25 11:21:56.204 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 76.04
2021-08-25 11:21:56.205 | INFO     | src.policies:train:109 - Episode 1890
2021-08-25 11:21:56.262 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:56.263 | INFO     | src.policies:train:121 - Mean episode return: 143.0

2021-08-25 11:21:56.980 | INFO     | src.policies:train:121 - Mean episode return: 129.0
2021-08-25 11:21:56.981 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 85.28
2021-08-25 11:21:56.982 | INFO     | src.policies:train:109 - Episode 1906
2021-08-25 11:21:57.035 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:57.036 | INFO     | src.policies:train:121 - Mean episode return: 136.0
2021-08-25 11:21:57.037 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 86.52
2021-08-25 11:21:57.045 | INFO     | src.policies:train:157 - Total loss: 1.0023598670959473
2021-08-25 11:21:57.048 | INFO     | src.policies:train:103 - Epoch 283 / 800
2021-08-25 11:21:57.049 | INFO     | src.policies:train:109 - Episode 1907
2021-08-25 11:21:57.126 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:57.127 | INFO     | src.policies:train:121 - Mean episode return: 200.0

2021-08-25 11:21:57.961 | INFO     | src.policies:train:121 - Mean episode return: 138.0
2021-08-25 11:21:57.962 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 98.13
2021-08-25 11:21:57.969 | INFO     | src.policies:train:157 - Total loss: 1.0023741722106934
2021-08-25 11:21:57.973 | INFO     | src.policies:train:103 - Epoch 291 / 800
2021-08-25 11:21:57.974 | INFO     | src.policies:train:109 - Episode 1923
2021-08-25 11:21:58.023 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:58.024 | INFO     | src.policies:train:121 - Mean episode return: 130.0
2021-08-25 11:21:58.026 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 99.08
2021-08-25 11:21:58.026 | INFO     | src.policies:train:109 - Episode 1924
2021-08-25 11:21:58.084 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:58.086 | INFO     | src.policies:train:121 - Mean episode return: 150.0

2021-08-25 11:21:59.071 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:59.073 | INFO     | src.policies:train:121 - Mean episode return: 152.0
2021-08-25 11:21:59.074 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 110.25
2021-08-25 11:21:59.075 | INFO     | src.policies:train:109 - Episode 1939
2021-08-25 11:21:59.096 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:59.098 | INFO     | src.policies:train:121 - Mean episode return: 40.0
2021-08-25 11:21:59.099 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 110.34
2021-08-25 11:21:59.100 | INFO     | src.policies:train:109 - Episode 1940
2021-08-25 11:21:59.181 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:21:59.182 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:21:59.184 | INFO     | src.policies:train:122 - Last 100 e

2021-08-25 11:21:59.983 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 116.52
2021-08-25 11:21:59.984 | INFO     | src.policies:train:109 - Episode 1956
2021-08-25 11:22:00.060 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:00.061 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:22:00.062 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 118.18
2021-08-25 11:22:00.071 | INFO     | src.policies:train:157 - Total loss: 1.0033644437789917
2021-08-25 11:22:00.074 | INFO     | src.policies:train:103 - Epoch 308 / 800
2021-08-25 11:22:00.075 | INFO     | src.policies:train:109 - Episode 1957
2021-08-25 11:22:00.085 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:00.087 | INFO     | src.policies:train:121 - Mean episode return: 20.0
2021-08-25 11:22:00.088 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 1

2021-08-25 11:22:01.066 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:22:01.067 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 124.03
2021-08-25 11:22:01.073 | INFO     | src.policies:train:157 - Total loss: 1.0008388757705688
2021-08-25 11:22:01.076 | INFO     | src.policies:train:103 - Epoch 317 / 800
2021-08-25 11:22:01.077 | INFO     | src.policies:train:109 - Episode 1973
2021-08-25 11:22:01.153 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:01.155 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:22:01.156 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 125.12
2021-08-25 11:22:01.161 | INFO     | src.policies:train:157 - Total loss: 1.0007582902908325
2021-08-25 11:22:01.164 | INFO     | src.policies:train:103 - Epoch 318 / 800
2021-08-25 11:22:01.165 | INFO     | src.policies:train:109 - Episode 1974

2021-08-25 11:22:02.192 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 132.47
2021-08-25 11:22:02.199 | INFO     | src.policies:train:157 - Total loss: 1.0006763935089111
2021-08-25 11:22:02.202 | INFO     | src.policies:train:103 - Epoch 327 / 800
2021-08-25 11:22:02.203 | INFO     | src.policies:train:109 - Episode 1989
2021-08-25 11:22:02.258 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:02.260 | INFO     | src.policies:train:121 - Mean episode return: 138.0
2021-08-25 11:22:02.261 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 132.32
2021-08-25 11:22:02.262 | INFO     | src.policies:train:109 - Episode 1990
2021-08-25 11:22:02.328 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:02.329 | INFO     | src.policies:train:121 - Mean episode return: 166.0
2021-08-25 11:22:02.330 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 

2021-08-25 11:22:03.392 | INFO     | src.policies:train:103 - Epoch 336 / 800
2021-08-25 11:22:03.394 | INFO     | src.policies:train:109 - Episode 2005
2021-08-25 11:22:03.471 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:03.473 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:22:03.474 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 142.24
2021-08-25 11:22:03.481 | INFO     | src.policies:train:157 - Total loss: 1.000608205795288
2021-08-25 11:22:03.485 | INFO     | src.policies:train:103 - Epoch 337 / 800
2021-08-25 11:22:03.486 | INFO     | src.policies:train:109 - Episode 2006
2021-08-25 11:22:03.550 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:03.551 | INFO     | src.policies:train:121 - Mean episode return: 160.0
2021-08-25 11:22:03.552 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 142.48

2021-08-25 11:22:04.603 | INFO     | src.policies:train:103 - Epoch 346 / 800
2021-08-25 11:22:04.604 | INFO     | src.policies:train:109 - Episode 2021
2021-08-25 11:22:04.680 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:04.682 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:22:04.683 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 149.41
2021-08-25 11:22:04.689 | INFO     | src.policies:train:157 - Total loss: 1.0006147623062134
2021-08-25 11:22:04.692 | INFO     | src.policies:train:103 - Epoch 347 / 800
2021-08-25 11:22:04.692 | INFO     | src.policies:train:109 - Episode 2022
2021-08-25 11:22:04.769 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:04.771 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:22:04.772 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 150.03

2021-08-25 11:22:05.894 | INFO     | src.policies:train:157 - Total loss: 1.0006686449050903
2021-08-25 11:22:05.897 | INFO     | src.policies:train:103 - Epoch 358 / 800
2021-08-25 11:22:05.898 | INFO     | src.policies:train:109 - Episode 2037
2021-08-25 11:22:05.947 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:05.948 | INFO     | src.policies:train:121 - Mean episode return: 121.0
2021-08-25 11:22:05.949 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 153.52
2021-08-25 11:22:05.950 | INFO     | src.policies:train:109 - Episode 2038
2021-08-25 11:22:05.980 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:05.981 | INFO     | src.policies:train:121 - Mean episode return: 70.0
2021-08-25 11:22:05.982 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 152.7
2021-08-25 11:22:05.983 | INFO     | src.policies:train:109 - Episode 2039

2021-08-25 11:22:07.053 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:07.054 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:22:07.055 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 159.31
2021-08-25 11:22:07.061 | INFO     | src.policies:train:157 - Total loss: 1.0006074905395508
2021-08-25 11:22:07.064 | INFO     | src.policies:train:103 - Epoch 368 / 800
2021-08-25 11:22:07.065 | INFO     | src.policies:train:109 - Episode 2054
2021-08-25 11:22:07.120 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:07.122 | INFO     | src.policies:train:121 - Mean episode return: 143.0
2021-08-25 11:22:07.123 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 159.0
2021-08-25 11:22:07.124 | INFO     | src.policies:train:109 - Episode 2055
2021-08-25 11:22:07.200 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done

2021-08-25 11:22:08.301 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:08.302 | INFO     | src.policies:train:121 - Mean episode return: 104.0
2021-08-25 11:22:08.303 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 165.82
2021-08-25 11:22:08.304 | INFO     | src.policies:train:109 - Episode 2070
2021-08-25 11:22:08.382 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:08.383 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:22:08.385 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 167.55
2021-08-25 11:22:08.392 | INFO     | src.policies:train:157 - Total loss: 1.0021560192108154
2021-08-25 11:22:08.396 | INFO     | src.policies:train:103 - Epoch 378 / 800
2021-08-25 11:22:08.397 | INFO     | src.policies:train:109 - Episode 2071
2021-08-25 11:22:08.471 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done

2021-08-25 11:22:09.482 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 170.21
2021-08-25 11:22:09.490 | INFO     | src.policies:train:157 - Total loss: 1.0025975704193115
2021-08-25 11:22:09.494 | INFO     | src.policies:train:103 - Epoch 387 / 800
2021-08-25 11:22:09.495 | INFO     | src.policies:train:109 - Episode 2086
2021-08-25 11:22:09.566 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:09.567 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:22:09.569 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 170.88
2021-08-25 11:22:09.574 | INFO     | src.policies:train:157 - Total loss: 1.0005714893341064
2021-08-25 11:22:09.577 | INFO     | src.policies:train:103 - Epoch 388 / 800
2021-08-25 11:22:09.578 | INFO     | src.policies:train:109 - Episode 2087
2021-08-25 11:22:09.649 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 

2021-08-25 11:22:10.676 | INFO     | src.policies:train:109 - Episode 2101
2021-08-25 11:22:10.753 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:10.755 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:22:10.756 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 174.64
2021-08-25 11:22:10.763 | INFO     | src.policies:train:157 - Total loss: 1.000580072402954
2021-08-25 11:22:10.766 | INFO     | src.policies:train:103 - Epoch 400 / 800
2021-08-25 11:22:10.767 | INFO     | src.policies:train:109 - Episode 2102
2021-08-25 11:22:10.823 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:10.825 | INFO     | src.policies:train:121 - Mean episode return: 135.0
2021-08-25 11:22:10.826 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 174.46
2021-08-25 11:22:10.827 | INFO     | src.policies:train:109 - Episode 2103
2021-08-25 11:22:10

2021-08-25 11:22:11.912 | INFO     | src.policies:train:103 - Epoch 410 / 800
2021-08-25 11:22:11.913 | INFO     | src.policies:train:109 - Episode 2117
2021-08-25 11:22:11.970 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:11.972 | INFO     | src.policies:train:121 - Mean episode return: 156.0
2021-08-25 11:22:11.973 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 176.38
2021-08-25 11:22:11.973 | INFO     | src.policies:train:109 - Episode 2118
2021-08-25 11:22:12.043 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:12.045 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:22:12.045 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 176.79
2021-08-25 11:22:12.053 | INFO     | src.policies:train:157 - Total loss: 1.0026451349258423
2021-08-25 11:22:12.056 | INFO     | src.policies:train:103 - Epoch 411 / 800
2021-08-25 11:2

2021-08-25 11:22:13.167 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:13.168 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:22:13.169 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 176.44
2021-08-25 11:22:13.177 | INFO     | src.policies:train:157 - Total loss: 1.0028690099716187
2021-08-25 11:22:13.181 | INFO     | src.policies:train:103 - Epoch 420 / 800
2021-08-25 11:22:13.182 | INFO     | src.policies:train:109 - Episode 2134
2021-08-25 11:22:13.243 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:13.244 | INFO     | src.policies:train:121 - Mean episode return: 169.0
2021-08-25 11:22:13.245 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 176.47
2021-08-25 11:22:13.246 | INFO     | src.policies:train:109 - Episode 2135
2021-08-25 11:22:13.319 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done

2021-08-25 11:22:14.333 | INFO     | src.policies:train:109 - Episode 2149
2021-08-25 11:22:14.405 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:14.407 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:22:14.408 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 178.6
2021-08-25 11:22:14.414 | INFO     | src.policies:train:157 - Total loss: 1.0005940198898315
2021-08-25 11:22:14.418 | INFO     | src.policies:train:103 - Epoch 429 / 800
2021-08-25 11:22:14.419 | INFO     | src.policies:train:109 - Episode 2150
2021-08-25 11:22:14.455 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:14.457 | INFO     | src.policies:train:121 - Mean episode return: 94.0
2021-08-25 11:22:14.458 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 178.03
2021-08-25 11:22:14.459 | INFO     | src.policies:train:109 - Episode 2151
2021-08-25 11:22:14.

2021-08-25 11:22:15.461 | INFO     | src.policies:train:103 - Epoch 439 / 800
2021-08-25 11:22:15.462 | INFO     | src.policies:train:109 - Episode 2165
2021-08-25 11:22:15.515 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:15.516 | INFO     | src.policies:train:121 - Mean episode return: 145.0
2021-08-25 11:22:15.517 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 177.8
2021-08-25 11:22:15.518 | INFO     | src.policies:train:109 - Episode 2166
2021-08-25 11:22:15.574 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:15.576 | INFO     | src.policies:train:121 - Mean episode return: 156.0
2021-08-25 11:22:15.576 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 177.37
2021-08-25 11:22:15.583 | INFO     | src.policies:train:157 - Total loss: 1.0023548603057861
2021-08-25 11:22:15.587 | INFO     | src.policies:train:103 - Epoch 440 / 800
2021-08-25 11:22

2021-08-25 11:22:16.596 | INFO     | src.policies:train:103 - Epoch 449 / 800
2021-08-25 11:22:16.597 | INFO     | src.policies:train:109 - Episode 2181
2021-08-25 11:22:16.656 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:16.658 | INFO     | src.policies:train:121 - Mean episode return: 163.0
2021-08-25 11:22:16.659 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 178.11
2021-08-25 11:22:16.660 | INFO     | src.policies:train:109 - Episode 2182
2021-08-25 11:22:16.719 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:16.720 | INFO     | src.policies:train:121 - Mean episode return: 157.0
2021-08-25 11:22:16.721 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 178.16
2021-08-25 11:22:16.728 | INFO     | src.policies:train:157 - Total loss: 1.0024750232696533
2021-08-25 11:22:16.731 | INFO     | src.policies:train:103 - Epoch 450 / 800
2021-08-25 11:2

2021-08-25 11:22:17.755 | INFO     | src.policies:train:103 - Epoch 459 / 800
2021-08-25 11:22:17.756 | INFO     | src.policies:train:109 - Episode 2197
2021-08-25 11:22:17.808 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:17.810 | INFO     | src.policies:train:121 - Mean episode return: 145.0
2021-08-25 11:22:17.811 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 176.48
2021-08-25 11:22:17.812 | INFO     | src.policies:train:109 - Episode 2198
2021-08-25 11:22:17.869 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:17.871 | INFO     | src.policies:train:121 - Mean episode return: 158.0
2021-08-25 11:22:17.872 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 176.22
2021-08-25 11:22:17.879 | INFO     | src.policies:train:157 - Total loss: 1.0021708011627197
2021-08-25 11:22:17.882 | INFO     | src.policies:train:103 - Epoch 460 / 800
2021-08-25 11:2

2021-08-25 11:22:18.979 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:18.980 | INFO     | src.policies:train:121 - Mean episode return: 189.0
2021-08-25 11:22:18.981 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 175.01
2021-08-25 11:22:18.989 | INFO     | src.policies:train:157 - Total loss: 1.0027161836624146
2021-08-25 11:22:18.991 | INFO     | src.policies:train:103 - Epoch 469 / 800
2021-08-25 11:22:18.993 | INFO     | src.policies:train:109 - Episode 2214
2021-08-25 11:22:19.047 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:19.048 | INFO     | src.policies:train:121 - Mean episode return: 149.0
2021-08-25 11:22:19.050 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 175.1
2021-08-25 11:22:19.050 | INFO     | src.policies:train:109 - Episode 2215
2021-08-25 11:22:19.122 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done

2021-08-25 11:22:20.163 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:20.165 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:22:20.166 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 175.16
2021-08-25 11:22:20.173 | INFO     | src.policies:train:157 - Total loss: 1.0023193359375
2021-08-25 11:22:20.176 | INFO     | src.policies:train:103 - Epoch 479 / 800
2021-08-25 11:22:20.177 | INFO     | src.policies:train:109 - Episode 2230
2021-08-25 11:22:20.250 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:20.252 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:22:20.253 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 175.61
2021-08-25 11:22:20.258 | INFO     | src.policies:train:157 - Total loss: 1.0003548860549927
2021-08-25 11:22:20.261 | INFO     | src.policies:train:103 - Epoch 480 / 800


2021-08-25 11:22:21.365 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 177.42
2021-08-25 11:22:21.370 | INFO     | src.policies:train:157 - Total loss: 1.0001649856567383
2021-08-25 11:22:21.373 | INFO     | src.policies:train:103 - Epoch 492 / 800
2021-08-25 11:22:21.374 | INFO     | src.policies:train:109 - Episode 2245
2021-08-25 11:22:21.445 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:21.447 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:22:21.448 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 177.52
2021-08-25 11:22:21.453 | INFO     | src.policies:train:157 - Total loss: 1.0002065896987915
2021-08-25 11:22:21.457 | INFO     | src.policies:train:103 - Epoch 493 / 800
2021-08-25 11:22:21.458 | INFO     | src.policies:train:109 - Episode 2246
2021-08-25 11:22:21.530 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 

2021-08-25 11:22:22.588 | INFO     | src.policies:train:109 - Episode 2260
2021-08-25 11:22:22.648 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:22.649 | INFO     | src.policies:train:121 - Mean episode return: 160.0
2021-08-25 11:22:22.650 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 181.54
2021-08-25 11:22:22.651 | INFO     | src.policies:train:109 - Episode 2261
2021-08-25 11:22:22.723 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:22.724 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:22:22.725 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 182.17
2021-08-25 11:22:22.733 | INFO     | src.policies:train:157 - Total loss: 1.0025252103805542
2021-08-25 11:22:22.736 | INFO     | src.policies:train:103 - Epoch 506 / 800
2021-08-25 11:22:22.737 | INFO     | src.policies:train:109 - Episode 2262
2021-08-25 11:22:2

2021-08-25 11:22:23.777 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:23.778 | INFO     | src.policies:train:121 - Mean episode return: 145.0
2021-08-25 11:22:23.779 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 179.73
2021-08-25 11:22:23.780 | INFO     | src.policies:train:109 - Episode 2277
2021-08-25 11:22:23.852 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:23.853 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:22:23.854 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 180.1
2021-08-25 11:22:23.862 | INFO     | src.policies:train:157 - Total loss: 1.0023442506790161
2021-08-25 11:22:23.866 | INFO     | src.policies:train:103 - Epoch 516 / 800
2021-08-25 11:22:23.867 | INFO     | src.policies:train:109 - Episode 2278
2021-08-25 11:22:23.932 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done

2021-08-25 11:22:24.973 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 181.69
2021-08-25 11:22:24.983 | INFO     | src.policies:train:157 - Total loss: 1.0023596286773682
2021-08-25 11:22:24.986 | INFO     | src.policies:train:103 - Epoch 525 / 800
2021-08-25 11:22:24.987 | INFO     | src.policies:train:109 - Episode 2293
2021-08-25 11:22:25.055 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:25.056 | INFO     | src.policies:train:121 - Mean episode return: 189.0
2021-08-25 11:22:25.057 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 181.58
2021-08-25 11:22:25.058 | INFO     | src.policies:train:109 - Episode 2294
2021-08-25 11:22:25.132 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:25.133 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:22:25.134 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 

2021-08-25 11:22:26.140 | INFO     | src.policies:train:157 - Total loss: 1.001160979270935
2021-08-25 11:22:26.143 | INFO     | src.policies:train:103 - Epoch 534 / 800
2021-08-25 11:22:26.144 | INFO     | src.policies:train:109 - Episode 2309
2021-08-25 11:22:26.214 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:26.216 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:22:26.217 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 180.08
2021-08-25 11:22:26.222 | INFO     | src.policies:train:157 - Total loss: 1.000244140625
2021-08-25 11:22:26.225 | INFO     | src.policies:train:103 - Epoch 535 / 800
2021-08-25 11:22:26.226 | INFO     | src.policies:train:109 - Episode 2310
2021-08-25 11:22:26.294 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:26.296 | INFO     | src.policies:train:121 - Mean episode return: 194.0
2021-08-25 11:22:26.297 | I

2021-08-25 11:22:27.288 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 178.61
2021-08-25 11:22:27.289 | INFO     | src.policies:train:109 - Episode 2325
2021-08-25 11:22:27.352 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:27.353 | INFO     | src.policies:train:121 - Mean episode return: 169.0
2021-08-25 11:22:27.354 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 178.3
2021-08-25 11:22:27.361 | INFO     | src.policies:train:157 - Total loss: 1.0020465850830078
2021-08-25 11:22:27.365 | INFO     | src.policies:train:103 - Epoch 544 / 800
2021-08-25 11:22:27.366 | INFO     | src.policies:train:109 - Episode 2326
2021-08-25 11:22:27.427 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:27.429 | INFO     | src.policies:train:121 - Mean episode return: 170.0
2021-08-25 11:22:27.430 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 1

2021-08-25 11:22:28.416 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 175.59
2021-08-25 11:22:28.417 | INFO     | src.policies:train:109 - Episode 2341
2021-08-25 11:22:28.478 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:28.479 | INFO     | src.policies:train:121 - Mean episode return: 172.0
2021-08-25 11:22:28.480 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 175.31
2021-08-25 11:22:28.487 | INFO     | src.policies:train:157 - Total loss: 1.0026148557662964
2021-08-25 11:22:28.490 | INFO     | src.policies:train:103 - Epoch 554 / 800
2021-08-25 11:22:28.492 | INFO     | src.policies:train:109 - Episode 2342
2021-08-25 11:22:28.547 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:28.548 | INFO     | src.policies:train:121 - Mean episode return: 153.0
2021-08-25 11:22:28.549 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 

2021-08-25 11:22:29.605 | INFO     | src.policies:train:121 - Mean episode return: 184.0
2021-08-25 11:22:29.606 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 173.42
2021-08-25 11:22:29.607 | INFO     | src.policies:train:109 - Episode 2357
2021-08-25 11:22:29.675 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:29.676 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:22:29.677 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 173.42
2021-08-25 11:22:29.684 | INFO     | src.policies:train:157 - Total loss: 1.0026582479476929
2021-08-25 11:22:29.687 | INFO     | src.policies:train:103 - Epoch 565 / 800
2021-08-25 11:22:29.688 | INFO     | src.policies:train:109 - Episode 2358
2021-08-25 11:22:29.756 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:29.757 | INFO     | src.policies:train:121 - Mean episode return: 196.0
2021

2021-08-25 11:22:30.791 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:30.792 | INFO     | src.policies:train:121 - Mean episode return: 146.0
2021-08-25 11:22:30.793 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 175.35
2021-08-25 11:22:30.794 | INFO     | src.policies:train:109 - Episode 2373
2021-08-25 11:22:30.845 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:30.846 | INFO     | src.policies:train:121 - Mean episode return: 148.0
2021-08-25 11:22:30.847 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 175.47
2021-08-25 11:22:30.854 | INFO     | src.policies:train:157 - Total loss: 1.0021276473999023
2021-08-25 11:22:30.857 | INFO     | src.policies:train:103 - Epoch 576 / 800
2021-08-25 11:22:30.858 | INFO     | src.policies:train:109 - Episode 2374
2021-08-25 11:22:30.925 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done

2021-08-25 11:22:31.868 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 174.6
2021-08-25 11:22:31.875 | INFO     | src.policies:train:157 - Total loss: 1.0021268129348755
2021-08-25 11:22:31.878 | INFO     | src.policies:train:103 - Epoch 585 / 800
2021-08-25 11:22:31.879 | INFO     | src.policies:train:109 - Episode 2389
2021-08-25 11:22:31.946 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:31.948 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:22:31.949 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 175.13
2021-08-25 11:22:31.953 | INFO     | src.policies:train:157 - Total loss: 1.0001559257507324
2021-08-25 11:22:31.956 | INFO     | src.policies:train:103 - Epoch 586 / 800
2021-08-25 11:22:31.957 | INFO     | src.policies:train:109 - Episode 2390
2021-08-25 11:22:32.022 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 1

2021-08-25 11:22:32.981 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 177.1
2021-08-25 11:22:32.988 | INFO     | src.policies:train:157 - Total loss: 1.002247929573059
2021-08-25 11:22:32.991 | INFO     | src.policies:train:103 - Epoch 595 / 800
2021-08-25 11:22:32.992 | INFO     | src.policies:train:109 - Episode 2405
2021-08-25 11:22:33.060 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:33.062 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:22:33.063 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 177.17
2021-08-25 11:22:33.069 | INFO     | src.policies:train:157 - Total loss: 1.000458002090454
2021-08-25 11:22:33.074 | INFO     | src.policies:train:103 - Epoch 596 / 800
2021-08-25 11:22:33.075 | INFO     | src.policies:train:109 - Episode 2406
2021-08-25 11:22:33.146 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:

2021-08-25 11:22:34.153 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:22:34.154 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 178.65
2021-08-25 11:22:34.159 | INFO     | src.policies:train:157 - Total loss: 1.000416874885559
2021-08-25 11:22:34.162 | INFO     | src.policies:train:103 - Epoch 605 / 800
2021-08-25 11:22:34.163 | INFO     | src.policies:train:109 - Episode 2421
2021-08-25 11:22:34.220 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:34.222 | INFO     | src.policies:train:121 - Mean episode return: 170.0
2021-08-25 11:22:34.223 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 178.75
2021-08-25 11:22:34.224 | INFO     | src.policies:train:109 - Episode 2422
2021-08-25 11:22:34.287 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:34.288 | INFO     | src.policies:train:121 - Mean episode return: 183.0
2021-

2021-08-25 11:22:35.222 | INFO     | src.policies:train:121 - Mean episode return: 132.0
2021-08-25 11:22:35.223 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 177.91
2021-08-25 11:22:35.223 | INFO     | src.policies:train:109 - Episode 2437
2021-08-25 11:22:35.287 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:35.289 | INFO     | src.policies:train:121 - Mean episode return: 188.0
2021-08-25 11:22:35.289 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 178.3
2021-08-25 11:22:35.297 | INFO     | src.policies:train:157 - Total loss: 1.0022387504577637
2021-08-25 11:22:35.300 | INFO     | src.policies:train:103 - Epoch 615 / 800
2021-08-25 11:22:35.301 | INFO     | src.policies:train:109 - Episode 2438
2021-08-25 11:22:35.365 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:35.366 | INFO     | src.policies:train:121 - Mean episode return: 189.0
2021-

2021-08-25 11:22:36.300 | INFO     | src.policies:train:157 - Total loss: 1.0023576021194458
2021-08-25 11:22:36.303 | INFO     | src.policies:train:103 - Epoch 623 / 800
2021-08-25 11:22:36.304 | INFO     | src.policies:train:109 - Episode 2453
2021-08-25 11:22:36.348 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:36.350 | INFO     | src.policies:train:121 - Mean episode return: 121.0
2021-08-25 11:22:36.351 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 176.19
2021-08-25 11:22:36.352 | INFO     | src.policies:train:109 - Episode 2454
2021-08-25 11:22:36.423 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:36.424 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:22:36.425 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 176.38
2021-08-25 11:22:36.434 | INFO     | src.policies:train:157 - Total loss: 1.0019841194152832


2021-08-25 11:22:37.394 | INFO     | src.policies:train:103 - Epoch 632 / 800
2021-08-25 11:22:37.395 | INFO     | src.policies:train:109 - Episode 2469
2021-08-25 11:22:37.433 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:37.434 | INFO     | src.policies:train:121 - Mean episode return: 110.0
2021-08-25 11:22:37.435 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 172.07
2021-08-25 11:22:37.436 | INFO     | src.policies:train:109 - Episode 2470
2021-08-25 11:22:37.486 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:37.487 | INFO     | src.policies:train:121 - Mean episode return: 141.0
2021-08-25 11:22:37.488 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 171.59
2021-08-25 11:22:37.494 | INFO     | src.policies:train:157 - Total loss: 1.0012056827545166
2021-08-25 11:22:37.497 | INFO     | src.policies:train:103 - Epoch 633 / 800
2021-08-25 11:2

2021-08-25 11:22:38.434 | INFO     | src.policies:train:103 - Epoch 642 / 800
2021-08-25 11:22:38.435 | INFO     | src.policies:train:109 - Episode 2485
2021-08-25 11:22:38.502 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:38.503 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:22:38.504 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 171.28
2021-08-25 11:22:38.510 | INFO     | src.policies:train:157 - Total loss: 1.000100016593933
2021-08-25 11:22:38.513 | INFO     | src.policies:train:103 - Epoch 643 / 800
2021-08-25 11:22:38.513 | INFO     | src.policies:train:109 - Episode 2486
2021-08-25 11:22:38.580 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:38.582 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:22:38.583 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 171.68
2021-08-25 11:22

2021-08-25 11:22:39.577 | INFO     | src.policies:train:109 - Episode 2501
2021-08-25 11:22:39.646 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:39.647 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:22:39.648 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 172.61
2021-08-25 11:22:39.655 | INFO     | src.policies:train:157 - Total loss: 1.0025243759155273
2021-08-25 11:22:39.658 | INFO     | src.policies:train:103 - Epoch 653 / 800
2021-08-25 11:22:39.659 | INFO     | src.policies:train:109 - Episode 2502
2021-08-25 11:22:39.721 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:39.722 | INFO     | src.policies:train:121 - Mean episode return: 183.0
2021-08-25 11:22:39.723 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 172.44
2021-08-25 11:22:39.724 | INFO     | src.policies:train:109 - Episode 2503
2021-08-25 11:22:3

2021-08-25 11:22:40.691 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:22:40.692 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 170.04
2021-08-25 11:22:40.700 | INFO     | src.policies:train:157 - Total loss: 1.0026201009750366
2021-08-25 11:22:40.704 | INFO     | src.policies:train:103 - Epoch 661 / 800
2021-08-25 11:22:40.706 | INFO     | src.policies:train:109 - Episode 2518
2021-08-25 11:22:40.749 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:40.751 | INFO     | src.policies:train:121 - Mean episode return: 125.0
2021-08-25 11:22:40.752 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 169.6
2021-08-25 11:22:40.752 | INFO     | src.policies:train:109 - Episode 2519
2021-08-25 11:22:40.819 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:40.821 | INFO     | src.policies:train:121 - Mean episode return: 188.0
2021-

2021-08-25 11:22:41.761 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:41.762 | INFO     | src.policies:train:121 - Mean episode return: 138.0
2021-08-25 11:22:41.763 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 169.07
2021-08-25 11:22:41.764 | INFO     | src.policies:train:109 - Episode 2534
2021-08-25 11:22:41.817 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:41.818 | INFO     | src.policies:train:121 - Mean episode return: 152.0
2021-08-25 11:22:41.819 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 169.16
2021-08-25 11:22:41.826 | INFO     | src.policies:train:157 - Total loss: 1.001956582069397
2021-08-25 11:22:41.829 | INFO     | src.policies:train:103 - Epoch 671 / 800
2021-08-25 11:22:41.830 | INFO     | src.policies:train:109 - Episode 2535
2021-08-25 11:22:41.891 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done

2021-08-25 11:22:42.769 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 169.37
2021-08-25 11:22:42.776 | INFO     | src.policies:train:157 - Total loss: 1.0012487173080444
2021-08-25 11:22:42.779 | INFO     | src.policies:train:103 - Epoch 679 / 800
2021-08-25 11:22:42.780 | INFO     | src.policies:train:109 - Episode 2550
2021-08-25 11:22:42.837 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:42.838 | INFO     | src.policies:train:121 - Mean episode return: 166.0
2021-08-25 11:22:42.839 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 169.34
2021-08-25 11:22:42.840 | INFO     | src.policies:train:109 - Episode 2551
2021-08-25 11:22:42.909 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:42.910 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:22:42.911 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 

2021-08-25 11:22:43.928 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:43.929 | INFO     | src.policies:train:121 - Mean episode return: 165.0
2021-08-25 11:22:43.930 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 171.65
2021-08-25 11:22:43.931 | INFO     | src.policies:train:109 - Episode 2566
2021-08-25 11:22:43.996 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:43.997 | INFO     | src.policies:train:121 - Mean episode return: 187.0
2021-08-25 11:22:43.997 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 171.52
2021-08-25 11:22:44.005 | INFO     | src.policies:train:157 - Total loss: 1.0026627779006958
2021-08-25 11:22:44.008 | INFO     | src.policies:train:103 - Epoch 690 / 800
2021-08-25 11:22:44.009 | INFO     | src.policies:train:109 - Episode 2567
2021-08-25 11:22:44.076 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done

2021-08-25 11:22:45.012 | INFO     | src.policies:train:109 - Episode 2581
2021-08-25 11:22:45.080 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:45.082 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:22:45.082 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 175.29
2021-08-25 11:22:45.087 | INFO     | src.policies:train:157 - Total loss: 1.0000466108322144
2021-08-25 11:22:45.090 | INFO     | src.policies:train:103 - Epoch 701 / 800
2021-08-25 11:22:45.091 | INFO     | src.policies:train:109 - Episode 2582
2021-08-25 11:22:45.156 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:45.158 | INFO     | src.policies:train:121 - Mean episode return: 195.0
2021-08-25 11:22:45.159 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 175.24
2021-08-25 11:22:45.159 | INFO     | src.policies:train:109 - Episode 2583

2021-08-25 11:22:46.131 | INFO     | src.policies:train:109 - Episode 2597
2021-08-25 11:22:46.185 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:46.187 | INFO     | src.policies:train:121 - Mean episode return: 144.0
2021-08-25 11:22:46.188 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 173.35
2021-08-25 11:22:46.189 | INFO     | src.policies:train:109 - Episode 2598
2021-08-25 11:22:46.249 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:46.251 | INFO     | src.policies:train:121 - Mean episode return: 163.0
2021-08-25 11:22:46.252 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 173.68
2021-08-25 11:22:46.262 | INFO     | src.policies:train:157 - Total loss: 1.0021368265151978
2021-08-25 11:22:46.265 | INFO     | src.policies:train:103 - Epoch 711 / 800
2021-08-25 11:22:46.266 | INFO     | src.policies:train:109 - Episode 2599

2021-08-25 11:22:47.294 | INFO     | src.policies:train:109 - Episode 2613
2021-08-25 11:22:47.363 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:47.364 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:22:47.365 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 175.28
2021-08-25 11:22:47.370 | INFO     | src.policies:train:157 - Total loss: 1.0002535581588745
2021-08-25 11:22:47.373 | INFO     | src.policies:train:103 - Epoch 721 / 800
2021-08-25 11:22:47.373 | INFO     | src.policies:train:109 - Episode 2614
2021-08-25 11:22:47.420 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:47.421 | INFO     | src.policies:train:121 - Mean episode return: 131.0
2021-08-25 11:22:47.422 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 174.88
2021-08-25 11:22:47.423 | INFO     | src.policies:train:109 - Episode 2615

2021-08-25 11:22:48.411 | INFO     | src.policies:train:109 - Episode 2629
2021-08-25 11:22:48.480 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:48.482 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:22:48.483 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 175.63
2021-08-25 11:22:48.487 | INFO     | src.policies:train:157 - Total loss: 1.000393033027649
2021-08-25 11:22:48.491 | INFO     | src.policies:train:103 - Epoch 731 / 800
2021-08-25 11:22:48.492 | INFO     | src.policies:train:109 - Episode 2630
2021-08-25 11:22:48.554 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:48.555 | INFO     | src.policies:train:121 - Mean episode return: 185.0
2021-08-25 11:22:48.556 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 175.48
2021-08-25 11:22:48.557 | INFO     | src.policies:train:109 - Episode 2631

2021-08-25 11:22:49.534 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:49.535 | INFO     | src.policies:train:121 - Mean episode return: 149.0
2021-08-25 11:22:49.536 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 175.9
2021-08-25 11:22:49.543 | INFO     | src.policies:train:157 - Total loss: 1.0019237995147705
2021-08-25 11:22:49.546 | INFO     | src.policies:train:103 - Epoch 741 / 800
2021-08-25 11:22:49.547 | INFO     | src.policies:train:109 - Episode 2646
2021-08-25 11:22:49.611 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:49.612 | INFO     | src.policies:train:121 - Mean episode return: 184.0
2021-08-25 11:22:49.613 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 175.93
2021-08-25 11:22:49.614 | INFO     | src.policies:train:109 - Episode 2647
2021-08-25 11:22:49.658 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done

2021-08-25 11:22:50.547 | INFO     | src.policies:train:103 - Epoch 751 / 800
2021-08-25 11:22:50.548 | INFO     | src.policies:train:109 - Episode 2661
2021-08-25 11:22:50.618 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:50.620 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:22:50.621 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 175.3
2021-08-25 11:22:50.626 | INFO     | src.policies:train:157 - Total loss: 1.000360131263733
2021-08-25 11:22:50.628 | INFO     | src.policies:train:103 - Epoch 752 / 800
2021-08-25 11:22:50.629 | INFO     | src.policies:train:109 - Episode 2662
2021-08-25 11:22:50.697 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:50.699 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:22:50.700 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 175.3

2021-08-25 11:22:51.630 | INFO     | src.policies:train:157 - Total loss: 1.0017198324203491
2021-08-25 11:22:51.633 | INFO     | src.policies:train:103 - Epoch 762 / 800
2021-08-25 11:22:51.634 | INFO     | src.policies:train:109 - Episode 2677
2021-08-25 11:22:51.699 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:51.700 | INFO     | src.policies:train:121 - Mean episode return: 189.0
2021-08-25 11:22:51.701 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 173.18
2021-08-25 11:22:51.702 | INFO     | src.policies:train:109 - Episode 2678
2021-08-25 11:22:51.770 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:51.771 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:22:51.772 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 173.2
2021-08-25 11:22:51.780 | INFO     | src.policies:train:157 - Total loss: 1.002837061882019

2021-08-25 11:22:52.717 | INFO     | src.policies:train:103 - Epoch 771 / 800
2021-08-25 11:22:52.718 | INFO     | src.policies:train:109 - Episode 2693
2021-08-25 11:22:52.776 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:52.777 | INFO     | src.policies:train:121 - Mean episode return: 164.0
2021-08-25 11:22:52.779 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 172.11
2021-08-25 11:22:52.779 | INFO     | src.policies:train:109 - Episode 2694
2021-08-25 11:22:52.846 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:52.847 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:22:52.848 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 172.11
2021-08-25 11:22:52.855 | INFO     | src.policies:train:157 - Total loss: 1.0027358531951904
2021-08-25 11:22:52.858 | INFO     | src.policies:train:103 - Epoch 772 / 800

2021-08-25 11:22:53.891 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:53.892 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:22:53.893 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 172.05
2021-08-25 11:22:53.900 | INFO     | src.policies:train:157 - Total loss: 1.0025460720062256
2021-08-25 11:22:53.903 | INFO     | src.policies:train:103 - Epoch 781 / 800
2021-08-25 11:22:53.904 | INFO     | src.policies:train:109 - Episode 2710
2021-08-25 11:22:53.963 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:53.964 | INFO     | src.policies:train:121 - Mean episode return: 181.0
2021-08-25 11:22:53.965 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 172.42
2021-08-25 11:22:53.966 | INFO     | src.policies:train:109 - Episode 2711
2021-08-25 11:22:54.030 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done

2021-08-25 11:22:55.052 | INFO     | src.policies:train:103 - Epoch 791 / 800
2021-08-25 11:22:55.053 | INFO     | src.policies:train:109 - Episode 2725
2021-08-25 11:22:55.121 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:55.122 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:22:55.123 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 175.66
2021-08-25 11:22:55.128 | INFO     | src.policies:train:157 - Total loss: 1.0002570152282715
2021-08-25 11:22:55.131 | INFO     | src.policies:train:103 - Epoch 792 / 800
2021-08-25 11:22:55.131 | INFO     | src.policies:train:109 - Episode 2726
2021-08-25 11:22:55.199 | DEBUG    | src.policies:execute_episode:267 - Early stopping, all agents done
2021-08-25 11:22:55.201 | INFO     | src.policies:train:121 - Mean episode return: 200.0
2021-08-25 11:22:55.202 | INFO     | src.policies:train:122 - Last 100 episodes mean return: 175.66