# Continuous Control

---

In this notebook, you will learn how to use the Unity ML-Agents environment for the second project of the [Deep Reinforcement Learning Nanodegree](https://www.udacity.com/course/deep-reinforcement-learning-nanodegree--nd893) program.

### 1. Start the Environment

We begin by importing the necessary packages.  If the code cell below returns an error, please revisit the project instructions to double-check that you have installed [Unity ML-Agents](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Installation.md) and [NumPy](http://www.numpy.org/).

In [1]:
from unityagents import UnityEnvironment
import numpy as np

Next, we will start the environment!  **_Before running the code cell below_**, change the `file_name` parameter to match the location of the Unity environment that you downloaded.

- **Mac**: `"path/to/Reacher.app"`
- **Windows** (x86): `"path/to/Reacher_Windows_x86/Reacher.exe"`
- **Windows** (x86_64): `"path/to/Reacher_Windows_x86_64/Reacher.exe"`
- **Linux** (x86): `"path/to/Reacher_Linux/Reacher.x86"`
- **Linux** (x86_64): `"path/to/Reacher_Linux/Reacher.x86_64"`
- **Linux** (x86, headless): `"path/to/Reacher_Linux_NoVis/Reacher.x86"`
- **Linux** (x86_64, headless): `"path/to/Reacher_Linux_NoVis/Reacher.x86_64"`

For instance, if you are using a Mac, then you downloaded `Reacher.app`.  If this file is in the same folder as the notebook, then the line below should appear as follows:
```
env = UnityEnvironment(file_name="Reacher.app")
```

In [2]:
enable_visualization = False
if enable_visualization:
    env = UnityEnvironment(file_name='Reacher_Linux/Reacher.x86_64')
else:
    env = UnityEnvironment(file_name='Reacher_Linux_NoVis/Reacher.x86_64')

INFO:unityagents:
'Academy' started successfully!
Unity Academy name: Academy
        Number of Brains: 1
        Number of External Brains : 1
        Lesson number : 0
        Reset Parameters :
		goal_speed -> 1.0
		goal_size -> 5.0
Unity brain name: ReacherBrain
        Number of Visual Observations (per agent): 0
        Vector Observation space type: continuous
        Vector Observation space size (per agent): 33
        Number of stacked Vector Observation: 1
        Vector Action space type: continuous
        Vector Action space size (per agent): 4
        Vector Action descriptions: , , , 


Environments contain **_brains_** which are responsible for deciding the actions of their associated agents. Here we check for the first brain available, and set it as the default brain we will be controlling from Python.

In [3]:
# get the default brain
brain_name = env.brain_names[0]
brain = env.brains[brain_name]

### 2. Examine the State and Action Spaces

In this environment, a double-jointed arm can move to target locations. A reward of `+0.1` is provided for each step that the agent's hand is in the goal location. Thus, the goal of your agent is to maintain its position at the target location for as many time steps as possible.

The observation space consists of `33` variables corresponding to position, rotation, velocity, and angular velocities of the arm.  Each action is a vector with four numbers, corresponding to torque applicable to two joints.  Every entry in the action vector must be a number between `-1` and `1`.
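
One common way to respect the `-1` to `1` bound on every action entry, offered here as an assumption rather than something the environment requires, is to squash unbounded values with `tanh` instead of clipping them. The helper name `squash_actions` is hypothetical, and the sketch uses plain Python lists so that it runs without the Unity environment.

```python
import math
import random

NUM_AGENTS = 20   # matches the 20-agent Reacher build used in this notebook
ACTION_SIZE = 4   # four torque values per agent


def squash_actions(raw_actions):
    """Map unbounded values into [-1, 1] with tanh, row by row."""
    return [[math.tanh(value) for value in row] for row in raw_actions]


rng = random.Random(0)  # seeded so the sketch is reproducible
raw = [[rng.gauss(0.0, 2.0) for _ in range(ACTION_SIZE)] for _ in range(NUM_AGENTS)]
actions = squash_actions(raw)

print(len(actions), len(actions[0]))
print(all(-1.0 <= a <= 1.0 for row in actions for a in row))
```

Unlike clipping, `tanh` is smooth, which is one reason bounded-action policy networks often end with it. Clipping is still perfectly adequate for a random-action demo.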

Run the code cell below to print some information about the environment.

In [4]:
# reset the environment
env_info = env.reset(train_mode=True)[brain_name]

# number of agents
num_agents = len(env_info.agents)
print('Number of agents:', num_agents)

# size of each action
action_size = brain.vector_action_space_size
print('Size of each action:', action_size)

# examine the state space 
states = env_info.vector_observations
state_size = states.shape[1]
print('There are {} agents. Each observes a state with length: {}'.format(states.shape[0], state_size))
print('The state for the first agent looks like:', states[0])

Number of agents: 20
Size of each action: 4
There are 20 agents. Each observes a state with length: 33
The state for the first agent looks like: [ 0.00000000e+00 -4.00000000e+00  0.00000000e+00  1.00000000e+00
 -0.00000000e+00 -0.00000000e+00 -4.37113883e-08  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00 -1.00000000e+01  0.00000000e+00
  1.00000000e+00 -0.00000000e+00 -0.00000000e+00 -4.37113883e-08
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  5.75471878e+00 -1.00000000e+00
  5.55726624e+00  0.00000000e+00  1.00000000e+00  0.00000000e+00
 -1.68164849e-01]


### 3. Take Random Actions in the Environment

In the next code cell, you will learn how to use the Python API to control the agent and receive feedback from the environment.

Once this cell is executed, the agent selects an action at random at each time step.  If you loaded a build with visualization, a window should pop up that lets you watch the agent as it moves through the environment; the headless (`NoVis`) build loaded above does not open a window.

Of course, as part of the project, you'll have to change the code so that the agent is able to use its experience to gradually choose better actions when interacting with the environment!

In [5]:
env_info = env.reset(train_mode=True)[brain_name]     # reset the environment    
states = env_info.vector_observations                  # get the current state (for each agent)
scores = np.zeros(num_agents)                          # initialize the score (for each agent)
t_max = 10000
for t in range(t_max):
# while True:
    actions = np.random.randn(num_agents, action_size) # select an action (for each agent)
    actions = np.clip(actions, -1, 1)                  # all actions between -1 and 1
    env_info = env.step(actions)[brain_name]           # send all actions to the environment
    next_states = env_info.vector_observations         # get next state (for each agent)
    rewards = env_info.rewards                         # get reward (for each agent)
    dones = env_info.local_done                        # see if episode finished
    scores += env_info.rewards                         # update the score (for each agent)
    states = next_states                               # roll over states to next time step
    # print('current time: ', t)
    if np.any(dones):                                  # exit loop if episode finished
        print('done in ', t, ' steps')
        break
print('Total score (averaged over agents) this episode: {}'.format(np.mean(scores)))

done in  1000  steps
Total score (averaged over agents) this episode: 0.12399999722838402


When finished, you can close the environment.

In [6]:
# env.close()

### 4. It's Your Turn!

Now it's your turn to train your own agent to solve the environment!  When training the agent, set `train_mode=True`, so that the line for resetting the environment looks like the following:
```python
env_info = env.reset(train_mode=True)[brain_name]
```

In [7]:
from collections import deque
from ddpg_agent import Agent
from matplotlib import pyplot as plt
import os
import datetime
import torch

# widget bar to display progress
# !pip install progressbar
import progressbar as pb


# Methods on how to improve performance
# 1. Deeper and more complex networks
#   1.1 make critic deeper(ddpg_result_2022_09_30_15_35_43), batch(40): not working, reward stay below 1
# 2. Delayed updates
# 3. Tuned hyper parameters
#   3.1 different batch size
#   20(ddpg_result_2022_09_30_13_04_29)/80(ddpg_result_2022_09_30_14_24_26) not good, 40 good(ddpg_result_2022_09_30_13_27_48)
# 4. Noise function (uniform to normal distribution)
#   4.1 same batch size(40), smaller variation, better performance(ddpg_result_2022_09_30_13_27_48)
# 5. BatchNormalization
# 6. Dueling Q network(Not applicable for ddpg continuous control)
# 7. N-step bootstrapping
# 8. Importance Sampling Priority Queue

# Try 1
# UPDATE_STEP: 1->20, UPDATE_TIMES: 1->10, BATCH_SIZE: 40(ddpg_result_2022_09_30_16_08_56): not working
# Try 2: Change multi_step to step
# UPDATE_STEP: 20, UPDATE_TIMES: 5, BATCH_SIZE: 40(ddpg_result_2022_09_30_16_49_02): not working
# Try 3:
# UPDATE_STEP: 20, UPDATE_TIMES: 5, BATCH_SIZE: 40, disable grad clip for critic, change update time logic: better(ddpg_result_2022_09_30_17_28_04)
# Try 4:
# UPDATE_STEP: 20, UPDATE_TIMES: 5, BATCH_SIZE: 40, above and add BN in both net:(ddpg_result_2022_10_01_15_16_22) better but still fails
# Try 5:
# UPDATE_STEP: 20, UPDATE_TIMES: 5, BATCH_SIZE: 128, above:(ddpg_result_2022_10_01_19_12_28) much better but still fails
# Try 6:
# above and change net param(400,300)->(128,256):(ddpg_result_2022_10_01_22_51_33) worse
# Try 7:
# same as 5, but increase buffer 1e5->1e6(ddpg_result_2022_10_02_01_44_19): much better but not enough
# Try 8:
# same as above but change step logic(ddpg_result_2022_10_02_10_04_21): not working
# Try 9:
# same as above but change noise from normal to ou, noise std 0.2->0.05(ddpg_result_2022_10_02_12_18_34): much better but not enough
# Try 10:
# same as above but increase update times 5->7:(ddpg_result_2022_10_02_18_35_44) worse
# Try 11:
# same as above and change update time logic back:(ddpg_result_2022_10_02_22_45_19) huge bump at start, but then not growing
# Try 12:
# same as above and change GAMMA 0.99->0.95:(ddpg_result_2022_10_02_23_57_47) not working
# Try 13:
# fix ou noise error by adding -0.5 bias and add reset:(ddpg_result_2022_10_03_02_05_59) much better but not enough
# Try 14:
# change detach to no_grad: (ddpg_result_2022_10_04_16_14_37) success
# Try 15:
# add prioritized exp replay, change update time to 1: (ddpg_result_2022_10_08_21_09_12) success and much faster (however, the Unity env got stuck at communication, so only 44 episodes finished)



# Conclusion
# start with a small batch size to check whether the agent can learn at all,
# then increase the batch size so that each update sees more data


def GetFolderPath(name='result'):
    cur_folder = os.getcwd()
    str_time = datetime.datetime.strftime(datetime.datetime.now(),
                                          '%Y_%m_%d_%H_%M_%S')
    folder_name = '/' + name + '_' + str_time + '/'
    save_path = cur_folder + folder_name
    if not os.path.exists(save_path):
        os.makedirs(save_path)

    return save_path


def plot_result(scores, scores_avg, actual_target_score, save_path):
    target_score_curve = np.ones(len(scores)) * actual_target_score
    fig = plt.figure(figsize=[15,10])
    ax = fig.add_axes([0.1, 0.1, 0.8, 0.8])
    ax.set_xlabel("episode")
    ax.set_ylabel("score")
    ax.plot(scores)
    ax.plot(scores_avg)
    ax.plot(target_score_curve)
    ax.legend(['score','avg_score', 'target_score'])
    plt.savefig(save_path)
    plt.close(fig)


def Learn(env: UnityEnvironment, n_episodes=1000, max_t=1000, target_score=0.0, actual_target_score=0.0, prioritized_learn=False):

    cur_folder = os.getcwd()
    str_time = datetime.datetime.strftime(datetime.datetime.now(), '%Y_%m_%d_%H_%M_%S')
    folder_name = '/ddpg_result_' + str_time + '/'
    save_path = cur_folder + folder_name
    if not os.path.isdir(save_path):
        os.mkdir(save_path)

    widget = ['training loop: ', pb.Percentage(), ' ', 
          pb.Bar(), ' ', pb.ETA() ]
    timer = pb.ProgressBar(widgets=widget, maxval=n_episodes).start()

    scores_window = deque(maxlen=20)
    avg_agent_scores = []
    avg_scores = []
    cur_target_score = target_score  

    agent = Agent(env, save_path+'log.txt')
    
    for i_episode in range(1, n_episodes+1):
        agent.reset()
        env_info = env.reset(train_mode=True)[brain_name]     # reset the environment    
        num_agents = len(env_info.agents)
        states = env_info.vector_observations                  # get the current state (for each agent)
        scores = np.zeros(num_agents)                          # initialize the score (for each agent)
        for t in range(max_t):
            actions = agent.multi_agent_act(states)
            env_info = env.step(actions)[brain_name]           # send all actions to the environment
            next_states = env_info.vector_observations         # get next state (for each agent)
            rewards = env_info.rewards                         # get reward (for each agent)
            dones = env_info.local_done                        # see if episode finished
            # agent.multi_agent_step(states, actions, rewards, next_states, dones)
            for i in range(num_agents):
                if prioritized_learn:
                    agent.priority_step(states[i], actions[i], rewards[i], next_states[i], dones[i], t)
                else:
                    agent.step(states[i], actions[i], rewards[i], next_states[i], dones[i], t)
            states = next_states                               # roll over states to next time step
            scores += env_info.rewards                         # update the score (for each agent)
            # print('current time: ', t)
            if np.any(dones):                                  # exit loop if episode finished
                break
        scores_window.append(np.mean(scores))
        print('\nEpisode {}\tTotal score (averaged over agents) this episode: {}'.format(i_episode, scores_window[-1]))
        avg_agent_scores.append(scores_window[-1])
        avg_scores.append(np.mean(scores_window))
        
        # update progress widget bar
        timer.update(i_episode)
        
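        # A new best single-episode score triggers a checkpoint here. The message says
        # "solved", but the actual stop condition is the windowed average checked below.
        if avg_agent_scores[-1] > cur_target_score: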
            print('\nEnvironment solved in {:d} episodes!\tAverage Score: {:.2f}'.format(i_episode, avg_scores[-1]))
            result_str = 'epi_' + str(i_episode) + '_score_' + str(avg_scores[-1])
            torch.save(agent.critic_local.state_dict(), save_path+result_str+'_critic_checkpoint.pth')
            torch.save(agent.actor_local.state_dict(), save_path+result_str+'_actor_checkpoint.pth')
            cur_target_score = avg_agent_scores[-1]
        
        pic_save_path = save_path+'results/'
        if not os.path.isdir(pic_save_path):
            os.mkdir(pic_save_path)
        pic_name = 'ddpg_n_epi_'+str(i_episode)+'.png'    
        plot_result(avg_agent_scores, avg_scores, actual_target_score, pic_save_path+pic_name) 
        
        if i_episode > 100 and avg_scores[-1] > actual_target_score:
            print('training done')
            break
    
    timer.finish()

    return avg_scores
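
The score bookkeeping in `Learn` can be tried without Unity. The sketch below isolates the two ideas it relies on: a fixed-length `deque` that keeps only the most recent episode scores (20 in the cell above), and a stop rule that fires once the windowed average exceeds the target after a minimum number of episodes. The function name `first_stop_episode` and the synthetic score list are made up for illustration.

```python
from collections import deque


def first_stop_episode(episode_scores, target, window=20, min_episodes=100):
    """Return the 1-based episode at which training would stop, or None.

    Mirrors the rule in Learn: stop when i_episode > min_episodes and the
    mean of the last `window` episode scores is above `target`.
    """
    recent = deque(maxlen=window)  # old scores fall off automatically
    for i_episode, score in enumerate(episode_scores, start=1):
        recent.append(score)
        windowed_avg = sum(recent) / len(recent)
        if i_episode > min_episodes and windowed_avg > target:
            return i_episode
    return None


# Synthetic scores (hypothetical data): grow by 0.5 per episode, capped at 36.
fake_scores = [min(36.0, 0.5 * i) for i in range(1, 151)]
print(first_stop_episode(fake_scores, target=30.0))  # prints 101
```

With the default arguments the earliest possible stop is episode 101, however early the scores cross the target, because of the `i_episode > 100` guard.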

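The notes in the training cell mention switching the exploration noise to an Ornstein-Uhlenbeck (OU) process, fixing a bias in it, and adding a reset. The class in `ddpg_agent.py` is not shown in this notebook, so the following is only a generic sketch of such a process with zero-mean Gaussian increments. The class name `OUNoise` and the values `theta=0.15` and `mu=0.0` are assumptions; `sigma=0.05` is taken from the Try 9 note.

```python
import random


class OUNoise:
    """Ornstein-Uhlenbeck process: x += theta * (mu - x) + sigma * N(0, 1)."""

    def __init__(self, size, mu=0.0, theta=0.15, sigma=0.05, seed=0):
        self.size = size
        self.mu = mu
        self.theta = theta      # assumed pull-back strength toward mu
        self.sigma = sigma      # scale of the random increment
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        """Return the internal state to the mean, e.g. at the start of an episode."""
        self.state = [self.mu] * self.size

    def sample(self):
        """Advance one step and return a copy of the new state."""
        self.state = [
            x + self.theta * (self.mu - x) + self.sigma * self.rng.gauss(0.0, 1.0)
            for x in self.state
        ]
        return list(self.state)


noise = OUNoise(size=4)
first = noise.sample()
noise.reset()
print(first, noise.state)
```

Drawing the increment from a zero-mean distribution avoids the drift that a uniform sample on [0, 1) would introduce, which may be what the `-0.5` fix in the notes refers to.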

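The `prioritized_learn` flag routes transitions through `agent.priority_step`, whose implementation lives in `ddpg_agent.py` and is not shown here. As a rough illustration of proportional prioritized experience replay in general, the sketch below turns priorities into sampling probabilities `P(i) = p_i**alpha / sum_k p_k**alpha` and computes importance-sampling weights `w_i = (N * P(i))**(-beta)`, normalized by the largest weight. The function names and the values `alpha=0.6` and `beta=0.4` are assumptions for illustration, not settings read from this project.

```python
import random


def sampling_probabilities(priorities, alpha=0.6):
    """P(i) = p_i**alpha / sum_k p_k**alpha (priorities must be positive)."""
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]


def importance_weights(probabilities, beta=0.4):
    """w_i = (N * P(i))**(-beta), divided by the largest weight."""
    n = len(probabilities)
    weights = [(n * p) ** (-beta) for p in probabilities]
    largest = max(weights)
    return [w / largest for w in weights]


def sample_indices(probabilities, batch_size, rng):
    """Draw transition indices with replacement, proportional to P(i)."""
    return rng.choices(range(len(probabilities)), weights=probabilities, k=batch_size)


# Hypothetical priorities, e.g. absolute TD errors plus a small constant.
priorities = [0.1, 0.5, 2.0, 1.0]
probs = sampling_probabilities(priorities)
weights = importance_weights(probs)
batch = sample_indices(probs, batch_size=3, rng=random.Random(0))
print(probs, weights, batch)
```

Transitions with large priorities are drawn more often, and the weights scale down their contribution to the loss to compensate. Larger buffers usually keep priorities in a sum tree so that sampling does not need a pass over every stored transition.
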
In [8]:
# prioritized training
prioritized_learn = True
scores = Learn(env, 300, 10000, 0.0, 30.0, prioritized_learn)

training loop:   0% |                                          | ETA:  --:--:--

state size: 33 action size: 4
t_step: 1000 	after first sample, sample_time: 0.005 	update_time: 0.010 	learn_time: 0.026 	sum: 97.525 	size: 20016                                               

training loop:   0% |                                          | ETA:  10:35:12

t_step: 1000 	after first sample, sample_time: 0.004 	update_time: 0.008 	learn_time: 0.022 	sum: 98.064 	size: 20020                        
Episode 1	Total score (averaged over agents) this episode: 0.42049999060109255

Environment solved in 1 episodes!	Average Score: 0.42
t_step: 1000 	after first sample, sample_time: 0.004 	update_time: 0.011 	learn_time: 0.022 	sum: 237.152 	size: 40032                                                                       

training loop:   0% |                                          | ETA:  10:33:56

t_step: 1000 	after first sample, sample_time: 0.004 	update_time: 0.009 	learn_time: 0.019 	sum: 238.356 	size: 40040                        
Episode 2	Total score (averaged over agents) this episode: 0.9674999783746898

Environment solved in 2 episodes!	Average Score: 0.69
t_step: 1000 	after first sample, sample_time: 0.005 	update_time: 0.009 	learn_time: 0.017 	sum: 463.530 	size: 60054                                               

training loop:   1% |                                          | ETA:  10:31:06

t_step: 1000 	after first sample, sample_time: 0.008 	update_time: 0.010 	learn_time: 0.025 	sum: 466.834 	size: 60060                        
Episode 3	Total score (averaged over agents) this episode: 0.9169999795034528
t_step: 1000 	after first sample, sample_time: 0.007 	update_time: 0.010 	learn_time: 0.040 	sum: 729.555 	size: 80073                                                                       

training loop:   1% |                                          | ETA:  10:28:22

t_step: 1000 	after first sample, sample_time: 0.005 	update_time: 0.012 	learn_time: 0.036 	sum: 727.571 	size: 80080                        
Episode 4	Total score (averaged over agents) this episode: 0.9619999784976244
t_step: 1000 	after first sample, sample_time: 0.005 	update_time: 0.009 	learn_time: 0.028 	sum: 1168.539 	size: 100093                        

training loop:   1% |                                          | ETA:  10:26:57

t_step: 1000 	after first sample, sample_time: 0.005 	update_time: 0.010 	learn_time: 0.022 	sum: 1153.563 	size: 100100                        
Episode 5	Total score (averaged over agents) this episode: 2.1319999523460864

Environment solved in 5 episodes!	Average Score: 1.08
t_step: 1000 	after first sample, sample_time: 0.005 	update_time: 0.011 	learn_time: 0.020 	sum: 1723.256 	size: 120115                        

training loop:   2% |                                          | ETA:  10:40:48

t_step: 1000 	after first sample, sample_time: 0.006 	update_time: 0.010 	learn_time: 0.017 	sum: 1713.176 	size: 120120                        
Episode 6	Total score (averaged over agents) this episode: 3.0929999308660627

Environment solved in 6 episodes!	Average Score: 1.42
t_step: 1000 	after first sample, sample_time: 0.005 	update_time: 0.010 	learn_time: 0.032 	sum: 2274.729 	size: 140135                        

training loop:   2% |                                          | ETA:  10:45:02

t_step: 1000 	after first sample, sample_time: 0.005 	update_time: 0.012 	learn_time: 0.021 	sum: 2266.586 	size: 140140                        
Episode 7	Total score (averaged over agents) this episode: 3.8044999149627983

Environment solved in 7 episodes!	Average Score: 1.76
t_step: 1000 	after first sample, sample_time: 0.006 	update_time: 0.011 	learn_time: 0.037 	sum: 2896.600 	size: 160157                        

training loop:   2% |#                                         | ETA:  10:48:02

t_step: 1000 	after first sample, sample_time: 0.010 	update_time: 0.011 	learn_time: 0.018 	sum: 2888.715 	size: 160160                        
Episode 8	Total score (averaged over agents) this episode: 4.022999910078942

Environment solved in 8 episodes!	Average Score: 2.04
t_step: 1000 	after first sample, sample_time: 0.007 	update_time: 0.012 	learn_time: 0.027 	sum: 3600.126 	size: 180177                        

training loop:   3% |#                                         | ETA:  10:54:12

t_step: 1000 	after first sample, sample_time: 0.006 	update_time: 0.012 	learn_time: 0.024 	sum: 3589.022 	size: 180180                        
Episode 9	Total score (averaged over agents) this episode: 4.521999898925424

Environment solved in 9 episodes!	Average Score: 2.32
t_step: 1000 	after first sample, sample_time: 0.007 	update_time: 0.012 	learn_time: 0.024 	sum: 4294.976 	size: 200195                                               

training loop:   3% |#                                         | ETA:  10:56:50

t_step: 1000 	after first sample, sample_time: 0.006 	update_time: 0.014 	learn_time: 0.025 	sum: 4279.873 	size: 200200                        
Episode 10	Total score (averaged over agents) this episode: 4.973499888833612

Environment solved in 10 episodes!	Average Score: 2.58
t_step: 1000 	after first sample, sample_time: 0.006 	update_time: 0.013 	learn_time: 0.035 	sum: 4982.478 	size: 220217                        

training loop:   3% |#                                         | ETA:  10:59:34

t_step: 1000 	after first sample, sample_time: 0.011 	update_time: 0.010 	learn_time: 0.036 	sum: 4974.294 	size: 220220                        
Episode 11	Total score (averaged over agents) this episode: 6.217499861028045

Environment solved in 11 episodes!	Average Score: 2.91
t_step: 1000 	after first sample, sample_time: 0.007 	update_time: 0.013 	learn_time: 0.040 	sum: 5754.518 	size: 240237                                               

training loop:   4% |#                                         | ETA:  11:02:48

t_step: 1000 	after first sample, sample_time: 0.006 	update_time: 0.011 	learn_time: 0.027 	sum: 5742.463 	size: 240240                        
Episode 12	Total score (averaged over agents) this episode: 9.156499795336277

Environment solved in 12 episodes!	Average Score: 3.43
t_step: 1000 	after first sample, sample_time: 0.009 	update_time: 0.012 	learn_time: 0.025 	sum: 6533.641 	size: 260253                                               

training loop:   4% |#                                         | ETA:  11:05:56

t_step: 1000 	after first sample, sample_time: 0.007 	update_time: 0.017 	learn_time: 0.027 	sum: 6505.528 	size: 260260                        
Episode 13	Total score (averaged over agents) this episode: 10.197999772056937

Environment solved in 13 episodes!	Average Score: 3.95
t_step: 1000 	after first sample, sample_time: 0.007 	update_time: 0.014 	learn_time: 0.026 	sum: 7397.847 	size: 280278                        

training loop:   4% |#                                         | ETA:  11:07:30

t_step: 1000 	after first sample, sample_time: 0.008 	update_time: 0.015 	learn_time: 0.040 	sum: 7388.576 	size: 280280                        
Episode 14	Total score (averaged over agents) this episode: 11.887999734282493

Environment solved in 14 episodes!	Average Score: 4.52
t_step: 1000 	after first sample, sample_time: 0.005 	update_time: 0.013 	learn_time: 0.030 	sum: 8458.384 	size: 300299                        

training loop:   5% |##                                        | ETA:  11:09:12

t_step: 1000 	after first sample, sample_time: 0.011 	update_time: 0.012 	learn_time: 0.026 	sum: 8455.548 	size: 300300                        
Episode 15	Total score (averaged over agents) this episode: 15.688999649323524

Environment solved in 15 episodes!	Average Score: 5.26
t_step: 1000 	after first sample, sample_time: 0.008 	update_time: 0.016 	learn_time: 0.023 	sum: 9621.733 	size: 320314                                                                       

training loop:   5% |##                                        | ETA:  11:11:57

t_step: 1000 	after first sample, sample_time: 0.014 	update_time: 0.015 	learn_time: 0.025 	sum: 9598.545 	size: 320320                        
Episode 16	Total score (averaged over agents) this episode: 18.33549959016964

Environment solved in 16 episodes!	Average Score: 6.08
t_step: 1000 	after first sample, sample_time: 0.017 	update_time: 0.016 	learn_time: 0.024 	sum: 10769.190 	size: 340338                        

training loop:   5% |##                                        | ETA:  11:14:55

t_step: 1000 	after first sample, sample_time: 0.007 	update_time: 0.013 	learn_time: 0.024 	sum: 10766.052 	size: 340340                        
Episode 17	Total score (averaged over agents) this episode: 21.398499521706253

Environment solved in 17 episodes!	Average Score: 6.98
t_step: 1000 	after first sample, sample_time: 0.009 	update_time: 0.012 	learn_time: 0.021 	sum: 12154.316 	size: 360354                        

training loop:   6% |##                                        | ETA:  11:16:15

t_step: 1000 	after first sample, sample_time: 0.007 	update_time: 0.010 	learn_time: 0.032 	sum: 12131.532 	size: 360360                        
Episode 18	Total score (averaged over agents) this episode: 26.123499416094273

Environment solved in 18 episodes!	Average Score: 8.05
t_step: 1000 	after first sample, sample_time: 0.010 	update_time: 0.016 	learn_time: 0.024 	sum: 13528.547 	size: 380374                        

training loop:   6% |##                                        | ETA:  11:18:30

t_step: 1000 	after first sample, sample_time: 0.007 	update_time: 0.018 	learn_time: 0.034 	sum: 13502.792 	size: 380380                        
Episode 19	Total score (averaged over agents) this episode: 30.215499324630947

Environment solved in 19 episodes!	Average Score: 9.21
t_step: 1000 	after first sample, sample_time: 0.006 	update_time: 0.015 	learn_time: 0.031 	sum: 15069.215 	size: 400396                                              

training loop:   6% |##                                        | ETA:  11:19:59

t_step: 1000 	after first sample, sample_time: 0.010 	update_time: 0.016 	learn_time: 0.026 	sum: 15056.241 	size: 400400                        
Episode 20	Total score (averaged over agents) this episode: 31.21999930217862

Environment solved in 20 episodes!	Average Score: 10.31
t_step: 1000 	after first sample, sample_time: 0.012 	update_time: 0.015 	learn_time: 0.025 	sum: 16674.853 	size: 420418                                               

training loop:   7% |##                                        | ETA:  11:21:52

t_step: 1000 	after first sample, sample_time: 0.007 	update_time: 0.017 	learn_time: 0.032 	sum: 16669.984 	size: 420420                        
Episode 21	Total score (averaged over agents) this episode: 37.334499165508895

Environment solved in 21 episodes!	Average Score: 12.16
t_step: 1000 	after first sample, sample_time: 0.013 	update_time: 0.014 	learn_time: 0.022 	sum: 18208.708 	size: 440435                                                                                               

training loop:   7% |###                                       | ETA:  11:23:42

t_step: 1000 	after first sample, sample_time: 0.013 	update_time: 0.013 	learn_time: 0.029 	sum: 18184.763 	size: 440440                        
Episode 22	Total score (averaged over agents) this episode: 37.40049916403368

Environment solved in 22 episodes!	Average Score: 13.98
t_step: 1000 	after first sample, sample_time: 0.009 	update_time: 0.016 	learn_time: 0.028 	sum: 19806.279 	size: 460456                                                                       

training loop:   7% |###                                       | ETA:  11:24:16

t_step: 1000 	after first sample, sample_time: 0.009 	update_time: 0.018 	learn_time: 0.027 	sum: 19788.406 	size: 460460                        
Episode 23	Total score (averaged over agents) this episode: 37.74999915622175

Environment solved in 23 episodes!	Average Score: 15.82
t_step: 1000 	after first sample, sample_time: 0.009 	update_time: 0.016 	learn_time: 0.025 	sum: 21334.734 	size: 480477                        

training loop:   8% |###                                       | ETA:  11:24:55

t_step: 1000 	after first sample, sample_time: 0.011 	update_time: 0.012 	learn_time: 0.019 	sum: 21322.578 	size: 480480                        
Episode 24	Total score (averaged over agents) this episode: 36.94149917429313
t_step: 1000 	after first sample, sample_time: 0.014 	update_time: 0.019 	learn_time: 0.027 	sum: 22793.753 	size: 500497                                               

training loop:   8% |###                                       | ETA:  11:25:37

t_step: 1000 	after first sample, sample_time: 0.008 	update_time: 0.015 	learn_time: 0.026 	sum: 22784.656 	size: 500500                        
Episode 25	Total score (averaged over agents) this episode: 37.254499167297034
t_step: 1000 	after first sample, sample_time: 0.009 	update_time: 0.014 	learn_time: 0.023 	sum: 24089.351 	size: 520518                        

training loop:   8% |###                                       | ETA:  11:26:11

t_step: 1000 	after first sample, sample_time: 0.009 	update_time: 0.016 	learn_time: 0.034 	sum: 24079.682 	size: 520520                        
Episode 26	Total score (averaged over agents) this episode: 37.20549916839227
t_step: 1000 	after first sample, sample_time: 0.011 	update_time: 0.018 	learn_time: 0.028 	sum: 25416.131 	size: 540539                        

training loop:   9% |###                                       | ETA:  11:26:57

t_step: 1000 	after first sample, sample_time: 0.007 	update_time: 0.017 	learn_time: 0.033 	sum: 25413.190 	size: 540540                        
Episode 27	Total score (averaged over agents) this episode: 36.695999179780486
t_step: 1000 	after first sample, sample_time: 0.013 	update_time: 0.016 	learn_time: 0.024 	sum: 26637.191 	size: 560558                                               

training loop:   9% |###                                       | ETA:  11:28:20

t_step: 1000 	after first sample, sample_time: 0.015 	update_time: 0.015 	learn_time: 0.029 	sum: 26632.185 	size: 560560                        
Episode 28	Total score (averaged over agents) this episode: 36.10299919303507
t_step: 1000 	after first sample, sample_time: 0.010 	update_time: 0.016 	learn_time: 0.029 	sum: 27742.341 	size: 580579                                                                       

training loop:   9% |####                                      | ETA:  11:28:57

t_step: 1000 	after first sample, sample_time: 0.007 	update_time: 0.018 	learn_time: 0.026 	sum: 27739.886 	size: 580580                        
Episode 29	Total score (averaged over agents) this episode: 34.931999219208954
t_step: 1000 	after first sample, sample_time: 0.009 	update_time: 0.014 	learn_time: 0.023 	sum: 29090.109 	size: 600596                                               

training loop:  10% |####                                      | ETA:  11:29:56

t_step: 1000 	after first sample, sample_time: 0.011 	update_time: 0.020 	learn_time: 0.028 	sum: 29073.531 	size: 600600                        
Episode 30	Total score (averaged over agents) this episode: 33.089999260380864
t_step: 1000 	after first sample, sample_time: 0.010 	update_time: 0.015 	learn_time: 0.023 	sum: 30274.888 	size: 620618                        

training loop:  10% |####                                      | ETA:  11:30:59

t_step: 1000 	after first sample, sample_time: 0.010 	update_time: 0.024 	learn_time: 0.037 	sum: 30268.503 	size: 620620                        
Episode 31	Total score (averaged over agents) this episode: 36.37799918688834
t_step: 1000 	after first sample, sample_time: 0.007 	update_time: 0.012 	learn_time: 0.026 	sum: 31404.031 	size: 640637                        

training loop:  10% |####                                      | ETA:  11:32:36

t_step: 1000 	after first sample, sample_time: 0.017 	update_time: 0.018 	learn_time: 0.036 	sum: 31388.408 	size: 640640                        
Episode 32	Total score (averaged over agents) this episode: 36.45949918506667
t_step: 1000 	after first sample, sample_time: 0.009 	update_time: 0.013 	learn_time: 0.024 	sum: 32354.956 	size: 660658                        

training loop:  11% |####                                      | ETA:  11:34:29

t_step: 1000 	after first sample, sample_time: 0.011 	update_time: 0.022 	learn_time: 0.030 	sum: 32350.154 	size: 660660                        
Episode 33	Total score (averaged over agents) this episode: 36.63649918111041
t_step: 1000 	after first sample, sample_time: 0.008 	update_time: 0.016 	learn_time: 0.029 	sum: 33515.733 	size: 680677                                                                      

training loop:  11% |####                                      | ETA:  11:36:56

t_step: 1000 	after first sample, sample_time: 0.010 	update_time: 0.014 	learn_time: 0.024 	sum: 33507.159 	size: 680680                        
Episode 34	Total score (averaged over agents) this episode: 35.320499210525306
t_step: 1000 	after first sample, sample_time: 0.011 	update_time: 0.019 	learn_time: 0.027 	sum: 34580.259 	size: 700695                        

training loop:  11% |####                                      | ETA:  11:38:34

t_step: 1000 	after first sample, sample_time: 0.011 	update_time: 0.019 	learn_time: 0.029 	sum: 34558.112 	size: 700700                        
Episode 35	Total score (averaged over agents) this episode: 35.56449920507148
t_step: 1000 	after first sample, sample_time: 0.012 	update_time: 0.021 	learn_time: 0.029 	sum: 35757.536 	size: 720719                                                                                               

training loop:  12% |#####                                     | ETA:  11:39:49

t_step: 1000 	after first sample, sample_time: 0.011 	update_time: 0.017 	learn_time: 0.026 	sum: 35755.433 	size: 720720                        
Episode 36	Total score (averaged over agents) this episode: 35.14549921443686
t_step: 1000 	after first sample, sample_time: 0.013 	update_time: 0.022 	learn_time: 0.031 	sum: 37008.592 	size: 740736                                                                                                                       

training loop:  12% |#####                                     | ETA:  11:40:45

t_step: 1000 	after first sample, sample_time: 0.015 	update_time: 0.017 	learn_time: 0.023 	sum: 36993.491 	size: 740740                        
Episode 37	Total score (averaged over agents) this episode: 36.849999176338315
t_step: 1000 	after first sample, sample_time: 0.006 	update_time: 0.012 	learn_time: 0.020 	sum: 38126.814 	size: 760759                        

training loop:  12% |#####                                     | ETA:  11:41:51

t_step: 1000 	after first sample, sample_time: 0.013 	update_time: 0.026 	learn_time: 0.033 	sum: 38118.597 	size: 760760                        
Episode 38	Total score (averaged over agents) this episode: 35.42849920811132
t_step: 1000 	after first sample, sample_time: 0.013 	update_time: 0.018 	learn_time: 0.025 	sum: 39308.698 	size: 780777                        

training loop:  13% |#####                                     | ETA:  11:42:42

t_step: 1000 	after first sample, sample_time: 0.012 	update_time: 0.024 	learn_time: 0.033 	sum: 39293.229 	size: 780780                        
Episode 39	Total score (averaged over agents) this episode: 35.32399921044707
t_step: 1000 	after first sample, sample_time: 0.010 	update_time: 0.015 	learn_time: 0.030 	sum: 40368.133 	size: 800797                                                                                               

training loop:  13% |#####                                     | ETA:  11:43:52

t_step: 1000 	after first sample, sample_time: 0.011 	update_time: 0.019 	learn_time: 0.026 	sum: 40353.786 	size: 800800                        
Episode 40	Total score (averaged over agents) this episode: 35.454999207518995
t_step: 1000 	after first sample, sample_time: 0.009 	update_time: 0.016 	learn_time: 0.026 	sum: 41552.900 	size: 820816                                               

training loop:  13% |#####                                     | ETA:  11:44:56

t_step: 1000 	after first sample, sample_time: 0.010 	update_time: 0.018 	learn_time: 0.026 	sum: 41532.573 	size: 820820                        
Episode 41	Total score (averaged over agents) this episode: 35.05949921635911
t_step: 1000 	after first sample, sample_time: 0.009 	update_time: 0.016 	learn_time: 0.026 	sum: 42587.731 	size: 840836                                               

training loop:  14% |#####                                     | ETA:  11:45:27

t_step: 1000 	after first sample, sample_time: 0.012 	update_time: 0.017 	learn_time: 0.025 	sum: 42570.610 	size: 840840                        
Episode 42	Total score (averaged over agents) this episode: 34.851999220997094
t_step: 1000 	after first sample, sample_time: 0.015 	update_time: 0.025 	learn_time: 0.031 	sum: 43754.725 	size: 860857                        

training loop:  14% |######                                    | ETA:  11:45:57

t_step: 1000 	after first sample, sample_time: 0.007 	update_time: 0.015 	learn_time: 0.025 	sum: 43746.053 	size: 860860                        
Episode 43	Total score (averaged over agents) this episode: 34.259999234229326
t_step: 1000 	after first sample, sample_time: 0.013 	update_time: 0.018 	learn_time: 0.036 	sum: 44918.696 	size: 880876                        

training loop:  14% |######                                    | ETA:  11:46:15

t_step: 1000 	after first sample, sample_time: 0.009 	update_time: 0.018 	learn_time: 0.026 	sum: 44899.628 	size: 880880                        
Episode 44	Total score (averaged over agents) this episode: 34.706999224238096
t_step: 1000 	after first sample, sample_time: 0.011 	update_time: 0.017 	learn_time: 0.030 	sum: 45942.408 	size: 900896                                              

training loop:  15% |######                                    | ETA:  11:47:02

t_step: 1000 	after first sample, sample_time: 0.014 	update_time: 0.018 	learn_time: 0.025 	sum: 45928.198 	size: 900900                        
Episode 45	Total score (averaged over agents) this episode: 36.91699917484075
t_step: 1000 	after first sample, sample_time: 0.013 	update_time: 0.018 	learn_time: 0.029 	sum: 47204.118 	size: 920919                                               

training loop:  15% |######                                    | ETA:  11:47:23

t_step: 1000 	after first sample, sample_time: 0.010 	update_time: 0.018 	learn_time: 0.153 	sum: 47199.461 	size: 920920                        
Episode 46	Total score (averaged over agents) this episode: 36.51749918377027
t_step: 1000 	after first sample, sample_time: 0.014 	update_time: 0.017 	learn_time: 0.031 	sum: 48342.452 	size: 940935                        

training loop:  15% |######                                    | ETA:  11:47:18

t_step: 1000 	after first sample, sample_time: 0.006 	update_time: 0.019 	learn_time: 0.027 	sum: 48318.884 	size: 940940                        
Episode 47	Total score (averaged over agents) this episode: 37.99599915072322

Environment solved in 47 episodes!	Average Score: 35.65
t_step: 1000 	after first sample, sample_time: 0.013 	update_time: 0.018 	learn_time: 0.027 	sum: 49169.103 	size: 960955                                                                                              

training loop:  16% |######                                    | ETA:  11:47:32

t_step: 1000 	after first sample, sample_time: 0.013 	update_time: 0.022 	learn_time: 0.034 	sum: 49150.122 	size: 960960                        
Episode 48	Total score (averaged over agents) this episode: 37.38049916448072
t_step: 1000 	after first sample, sample_time: 0.015 	update_time: 0.021 	learn_time: 0.031 	sum: 50544.912 	size: 980975                        

training loop:  16% |######                                    | ETA:  11:47:27

t_step: 1000 	after first sample, sample_time: 0.011 	update_time: 0.018 	learn_time: 0.028 	sum: 50528.528 	size: 980980                        
Episode 49	Total score (averaged over agents) this episode: 36.82499917689711
t_step: 1000 	after first sample, sample_time: 0.014 	update_time: 0.020 	learn_time: 0.030 	sum: 51587.780 	size: 1000000                        

training loop:  16% |#######                                   | ETA:  11:47:24

t_step: 1000 	after first sample, sample_time: 0.016 	update_time: 0.025 	learn_time: 0.035 	sum: 51582.819 	size: 1000000                        
Episode 50	Total score (averaged over agents) this episode: 37.534499161038546
t_step: 1000 	after first sample, sample_time: 0.013 	update_time: 0.019 	learn_time: 0.029 	sum: 52358.846 	size: 1000000                        

training loop:  17% |#######                                   | ETA:  11:48:32

t_step: 1000 	after first sample, sample_time: 0.010 	update_time: 0.021 	learn_time: 0.036 	sum: 52337.442 	size: 1000000                        
Episode 51	Total score (averaged over agents) this episode: 37.74949915623292
t_step: 1000 	after first sample, sample_time: 0.015 	update_time: 0.021 	learn_time: 0.029 	sum: 52859.793 	size: 1000000                                               

training loop:  17% |#######                                   | ETA:  11:49:29

t_step: 1000 	after first sample, sample_time: 0.009 	update_time: 0.021 	learn_time: 0.031 	sum: 52844.039 	size: 1000000                        
Episode 52	Total score (averaged over agents) this episode: 36.716999179311095
t_step: 1000 	after first sample, sample_time: 0.016 	update_time: 0.026 	learn_time: 0.034 	sum: 53418.232 	size: 1000000                        

training loop:  17% |#######                                   | ETA:  11:50:03

t_step: 1000 	after first sample, sample_time: 0.010 	update_time: 0.017 	learn_time: 0.024 	sum: 53398.858 	size: 1000000                        
Episode 53	Total score (averaged over agents) this episode: 37.062999171577395
t_step: 1000 	after first sample, sample_time: 0.010 	update_time: 0.018 	learn_time: 0.026 	sum: 53823.384 	size: 1000000                                                                                               

training loop:  18% |#######                                   | ETA:  11:50:21

t_step: 1000 	after first sample, sample_time: 0.014 	update_time: 0.017 	learn_time: 0.025 	sum: 53803.038 	size: 1000000                        
Episode 54	Total score (averaged over agents) this episode: 36.656499180663374
t_step: 1000 	after first sample, sample_time: 0.020 	update_time: 0.019 	learn_time: 0.027 	sum: 54061.223 	size: 1000000                        

training loop:  18% |#######                                   | ETA:  11:50:56

t_step: 1000 	after first sample, sample_time: 0.012 	update_time: 0.021 	learn_time: 0.030 	sum: 54058.064 	size: 1000000                        
Episode 55	Total score (averaged over agents) this episode: 35.9274991969578
t_step: 1000 	after first sample, sample_time: 0.014 	update_time: 0.032 	learn_time: 0.157 	sum: 54266.737 	size: 1000000                        

training loop:  18% |#######                                   | ETA:  11:51:32

t_step: 1000 	after first sample, sample_time: 0.014 	update_time: 0.023 	learn_time: 0.037 	sum: 54249.893 	size: 1000000                        
Episode 56	Total score (averaged over agents) this episode: 38.053499149438

Environment solved in 56 episodes!	Average Score: 36.36
t_step: 1000 	after first sample, sample_time: 0.011 	update_time: 0.020 	learn_time: 0.028 	sum: 54193.044 	size: 1000000                        

training loop:  19% |#######                                   | ETA:  11:52:45

t_step: 1000 	after first sample, sample_time: 0.015 	update_time: 0.019 	learn_time: 0.027 	sum: 54188.663 	size: 1000000                        
Episode 57	Total score (averaged over agents) this episode: 36.759499178361146
t_step: 1000 	after first sample, sample_time: 0.008 	update_time: 0.017 	learn_time: 0.024 	sum: 54099.117 	size: 1000000                                                                       

training loop:  19% |########                                  | ETA:  11:53:54

t_step: 1000 	after first sample, sample_time: 0.015 	update_time: 0.026 	learn_time: 0.035 	sum: 54088.607 	size: 1000000                        
Episode 58	Total score (averaged over agents) this episode: 35.83049919912592
t_step: 1000 	after first sample, sample_time: 0.012 	update_time: 0.030 	learn_time: 0.114 	sum: 54172.357 	size: 1000000                                                                       

training loop:  19% |########                                  | ETA:  11:55:34

t_step: 1000 	after first sample, sample_time: 0.008 	update_time: 0.020 	learn_time: 0.028 	sum: 54170.626 	size: 1000000                        
Episode 59	Total score (averaged over agents) this episode: 36.914499174896626
t_step: 1000 	after first sample, sample_time: 0.013 	update_time: 0.024 	learn_time: 0.033 	sum: 54292.597 	size: 1000000                        

training loop:  20% |########                                  | ETA:  11:56:50

t_step: 1000 	after first sample, sample_time: 0.019 	update_time: 0.021 	learn_time: 0.030 	sum: 54286.474 	size: 1000000                        
Episode 60	Total score (averaged over agents) this episode: 36.499999184161425
t_step: 1000 	after first sample, sample_time: 0.016 	update_time: 0.031 	learn_time: 0.040 	sum: 54133.514 	size: 1000000                                               

training loop:  20% |########                                  | ETA:  11:58:25

t_step: 1000 	after first sample, sample_time: 0.011 	update_time: 0.024 	learn_time: 0.033 	sum: 54120.865 	size: 1000000                        
Episode 61	Total score (averaged over agents) this episode: 37.96899915132671
t_step: 1000 	after first sample, sample_time: 0.015 	update_time: 0.032 	learn_time: 0.039 	sum: 53834.281 	size: 1000000                        

training loop:  20% |########                                  | ETA:  11:59:36

t_step: 1000 	after first sample, sample_time: 0.020 	update_time: 0.037 	learn_time: 0.046 	sum: 53826.825 	size: 1000000                        
Episode 62	Total score (averaged over agents) this episode: 33.25999925658107
t_step: 1000 	after first sample, sample_time: 0.012 	update_time: 0.025 	learn_time: 0.034 	sum: 53512.286 	size: 1000000                                                                      

training loop:  21% |########                                  | ETA:  12:01:38

t_step: 1000 	after first sample, sample_time: 0.018 	update_time: 0.033 	learn_time: 0.043 	sum: 53500.116 	size: 1000000                        
Episode 63	Total score (averaged over agents) this episode: 34.18499923590571
t_step: 1000 	after first sample, sample_time: 0.017 	update_time: 0.032 	learn_time: 0.046 	sum: 53304.344 	size: 1000000                        

training loop:  21% |########                                  | ETA:  12:04:04

t_step: 1000 	after first sample, sample_time: 0.012 	update_time: 0.032 	learn_time: 0.038 	sum: 53287.783 	size: 1000000                        
Episode 64	Total score (averaged over agents) this episode: 35.14299921449274
t_step: 1000 	after first sample, sample_time: 0.014 	update_time: 0.024 	learn_time: 0.035 	sum: 53057.010 	size: 1000000                                               

training loop:  21% |#########                                 | ETA:  12:05:47

t_step: 1000 	after first sample, sample_time: 0.013 	update_time: 0.026 	learn_time: 0.034 	sum: 53028.979 	size: 1000000                        
Episode 65	Total score (averaged over agents) this episode: 32.780999267287555
t_step: 1000 	after first sample, sample_time: 0.015 	update_time: 0.032 	learn_time: 0.043 	sum: 52738.381 	size: 1000000                        

training loop:  22% |#########                                 | ETA:  12:07:24

t_step: 1000 	after first sample, sample_time: 0.016 	update_time: 0.037 	learn_time: 0.046 	sum: 52708.827 	size: 1000000                        
Episode 66	Total score (averaged over agents) this episode: 33.72349924622104
t_step: 1000 	after first sample, sample_time: 0.020 	update_time: 0.033 	learn_time: 0.040 	sum: 52507.115 	size: 1000000                                               

training loop:  22% |#########                                 | ETA:  12:10:04

t_step: 1000 	after first sample, sample_time: 0.016 	update_time: 0.041 	learn_time: 0.048 	sum: 52491.938 	size: 1000000                        
Episode 67	Total score (averaged over agents) this episode: 35.06799921616912
t_step: 1000 	after first sample, sample_time: 0.012 	update_time: 0.038 	learn_time: 0.045 	sum: 52471.294 	size: 1000000                                                                                              

training loop:  22% |#########                                 | ETA:  12:13:45

t_step: 1000 	after first sample, sample_time: 0.019 	update_time: 0.046 	learn_time: 0.053 	sum: 52451.024 	size: 1000000                        
Episode 68	Total score (averaged over agents) this episode: 33.53649925040081
t_step: 1000 	after first sample, sample_time: 0.030 	update_time: 0.055 	learn_time: 0.062 	sum: 52233.430 	size: 1000000                                                                       

training loop:  23% |#########                                 | ETA:  12:18:42

t_step: 1000 	after first sample, sample_time: 0.022 	update_time: 0.049 	learn_time: 0.056 	sum: 52229.606 	size: 1000000                        
Episode 69	Total score (averaged over agents) this episode: 36.287999188899995
t_step: 1000 	after first sample, sample_time: 0.024 	update_time: 0.052 	learn_time: 0.062 	sum: 51776.495 	size: 1000000                        

training loop:  23% |#########                                 | ETA:  12:25:05

t_step: 1000 	after first sample, sample_time: 0.034 	update_time: 0.072 	learn_time: 0.081 	sum: 51769.146 	size: 1000000                        
Episode 70	Total score (averaged over agents) this episode: 36.93799917437136
t_step: 1000 	after first sample, sample_time: 0.025 	update_time: 0.055 	learn_time: 0.062 	sum: 51646.025 	size: 1000000                                               

training loop:  23% |#########                                 | ETA:  12:33:08

t_step: 1000 	after first sample, sample_time: 0.021 	update_time: 0.085 	learn_time: 0.092 	sum: 51636.578 	size: 1000000                        
Episode 71	Total score (averaged over agents) this episode: 33.743499245774004
t_step: 1000 	after first sample, sample_time: 0.035 	update_time: 0.057 	learn_time: 0.071 	sum: 51717.149 	size: 1000000                        

training loop:  24% |##########                                | ETA:  12:42:45

t_step: 1000 	after first sample, sample_time: 0.033 	update_time: 0.077 	learn_time: 0.092 	sum: 51705.731 	size: 1000000                        
Episode 72	Total score (averaged over agents) this episode: 33.18099925834686
t_step: 1000 	after first sample, sample_time: 0.044 	update_time: 0.089 	learn_time: 0.096 	sum: 51715.745 	size: 1000000                        

training loop:  24% |##########                                | ETA:  12:55:04

t_step: 1000 	after first sample, sample_time: 0.036 	update_time: 0.080 	learn_time: 0.090 	sum: 51706.000 	size: 1000000                        
Episode 73	Total score (averaged over agents) this episode: 33.190499258134516
t_step: 1000 	after first sample, sample_time: 0.036 	update_time: 0.072 	learn_time: 0.080 	sum: 51594.055 	size: 1000000                        

training loop:  24% |##########                                | ETA:  13:10:30

t_step: 1000 	after first sample, sample_time: 0.040 	update_time: 0.093 	learn_time: 0.100 	sum: 51572.773 	size: 1000000                        
Episode 74	Total score (averaged over agents) this episode: 33.56199924983084
t_step: 1000 	after first sample, sample_time: 0.033 	update_time: 0.095 	learn_time: 0.104 	sum: 51378.923 	size: 1000000                        

training loop:  25% |##########                                | ETA:  13:30:08

t_step: 1000 	after first sample, sample_time: 0.050 	update_time: 0.113 	learn_time: 0.120 	sum: 51373.431 	size: 1000000                        
Episode 75	Total score (averaged over agents) this episode: 29.401499342825264
t_step: 1000 	after first sample, sample_time: 0.070 	update_time: 0.119 	learn_time: 0.125 	sum: 51278.807 	size: 1000000                                                                                                                       

training loop:  25% |##########                                | ETA:  13:52:00

t_step: 1000 	after first sample, sample_time: 0.058 	update_time: 0.141 	learn_time: 0.148 	sum: 51278.099 	size: 1000000                        
Episode 76	Total score (averaged over agents) this episode: 31.475499296467753
t_step: 1000 	after first sample, sample_time: 0.062 	update_time: 0.144 	learn_time: 0.155 	sum: 51341.628 	size: 1000000                                               

training loop:  25% |##########                                | ETA:  14:19:49

t_step: 1000 	after first sample, sample_time: 0.063 	update_time: 0.193 	learn_time: 0.201 	sum: 51336.509 	size: 1000000                        
Episode 77	Total score (averaged over agents) this episode: 32.642999270372094
t_step: 1000 	after first sample, sample_time: 0.083 	update_time: 0.162 	learn_time: 0.169 	sum: 51501.968 	size: 1000000                                                                                             

training loop:  26% |##########                                | ETA:  14:51:20

t_step: 1000 	after first sample, sample_time: 0.092 	update_time: 0.174 	learn_time: 0.181 	sum: 51495.846 	size: 1000000                        
Episode 78	Total score (averaged over agents) this episode: 31.900499286968262
t_step: 1000 	after first sample, sample_time: 0.084 	update_time: 0.204 	learn_time: 0.211 	sum: 51811.725 	size: 1000000                                                                       

training loop:  26% |###########                               | ETA:  15:25:17

t_step: 1000 	after first sample, sample_time: 0.070 	update_time: 0.202 	learn_time: 0.210 	sum: 51801.639 	size: 1000000                        
Episode 79	Total score (averaged over agents) this episode: 28.37049936586991
t_step: 1000 	after first sample, sample_time: 0.092 	update_time: 0.233 	learn_time: 0.241 	sum: 51811.012 	size: 1000000                        

training loop:  26% |###########                               | ETA:  16:06:19

t_step: 1000 	after first sample, sample_time: 0.107 	update_time: 0.239 	learn_time: 0.249 	sum: 51796.194 	size: 1000000                        
Episode 80	Total score (averaged over agents) this episode: 32.1374992816709
t_step: 1000 	after first sample, sample_time: 0.106 	update_time: 0.272 	learn_time: 0.280 	sum: 51980.491 	size: 1000000                        

training loop:  27% |###########                               | ETA:  16:54:22

t_step: 1000 	after first sample, sample_time: 0.103 	update_time: 0.297 	learn_time: 0.305 	sum: 51970.796 	size: 1000000                        
Episode 81	Total score (averaged over agents) this episode: 30.002499329391867
t_step: 1000 	after first sample, sample_time: 0.132 	update_time: 0.277 	learn_time: 0.284 	sum: 52472.310 	size: 1000000                        

training loop:  27% |###########                               | ETA:  17:49:39

t_step: 1000 	after first sample, sample_time: 0.155 	update_time: 0.312 	learn_time: 0.327 	sum: 52462.585 	size: 1000000                        
Episode 82	Total score (averaged over agents) this episode: 33.243499256949875
t_step: 1000 	after first sample, sample_time: 0.114 	update_time: 0.341 	learn_time: 0.361 	sum: 52630.427 	size: 1000000                        

training loop:  27% |###########                               | ETA:  18:54:13

t_step: 1000 	after first sample, sample_time: 0.220 	update_time: 0.352 	learn_time: 0.366 	sum: 52619.840 	size: 1000000                        
Episode 83	Total score (averaged over agents) this episode: 31.74199929051101
t_step: 1000 	after first sample, sample_time: 0.151 	update_time: 0.386 	learn_time: 0.392 	sum: 52702.845 	size: 1000000                        

training loop:  28% |###########                               | ETA:  20:05:22

t_step: 1000 	after first sample, sample_time: 0.184 	update_time: 0.419 	learn_time: 0.494 	sum: 52700.056 	size: 1000000                        
Episode 84	Total score (averaged over agents) this episode: 32.518999273143706
t_step: 1000 	after first sample, sample_time: 0.255 	update_time: 0.527 	learn_time: 0.544 	sum: 52564.045 	size: 1000000                        

training loop:  28% |###########                               | ETA:  21:25:06

t_step: 1000 	after first sample, sample_time: 0.168 	update_time: 0.414 	learn_time: 0.426 	sum: 52559.272 	size: 1000000                        
Episode 85	Total score (averaged over agents) this episode: 33.34749925462529
t_step: 1000 	after first sample, sample_time: 0.182 	update_time: 0.460 	learn_time: 0.467 	sum: 52666.153 	size: 1000000                        

training loop:  28% |############                              | ETA:  23:00:58

t_step: 1000 	after first sample, sample_time: 0.222 	update_time: 0.560 	learn_time: 0.568 	sum: 52660.520 	size: 1000000                        
Episode 86	Total score (averaged over agents) this episode: 31.503999295830727
t_step: 1000 	after first sample, sample_time: 0.313 	update_time: 0.655 	learn_time: 0.663 	sum: 53011.933 	size: 1000000                        

training loop:  29% |##########                          | ETA:  1 day, 0:47:18

t_step: 1000 	after first sample, sample_time: 0.252 	update_time: 0.611 	learn_time: 0.620 	sum: 53000.468 	size: 1000000                        
Episode 87	Total score (averaged over agents) this episode: 29.65449933717027
t_step: 1000 	after first sample, sample_time: 0.264 	update_time: 0.654 	learn_time: 0.663 	sum: 53164.322 	size: 1000000                        

training loop:  29% |##########                          | ETA:  1 day, 2:48:06

t_step: 1000 	after first sample, sample_time: 0.293 	update_time: 0.722 	learn_time: 0.732 	sum: 53162.420 	size: 1000000                        
Episode 88	Total score (averaged over agents) this episode: 29.663999336957932
t_step: 1000 	after first sample, sample_time: 0.243 	update_time: 0.540 	learn_time: 0.547 	sum: 53165.089 	size: 1000000                        

training loop:  29% |##########                          | ETA:  1 day, 5:02:57

t_step: 1000 	after first sample, sample_time: 0.267 	update_time: 0.592 	learn_time: 0.599 	sum: 53161.478 	size: 1000000                        
Episode 89	Total score (averaged over agents) this episode: 28.737999357655646
t_step: 1000 	after first sample, sample_time: 0.224 	update_time: 0.698 	learn_time: 0.704 	sum: 53188.689 	size: 1000000                        

training loop:  30% |##########                          | ETA:  1 day, 7:00:49

t_step: 1000 	after first sample, sample_time: 0.251 	update_time: 0.527 	learn_time: 0.533 	sum: 53176.661 	size: 1000000                        
Episode 90	Total score (averaged over agents) this episode: 32.513999273255465
t_step: 1000 	after first sample, sample_time: 0.267 	update_time: 0.713 	learn_time: 0.725 	sum: 52992.000 	size: 1000000                        

training loop:  30% |##########                          | ETA:  1 day, 9:04:19

t_step: 1000 	after first sample, sample_time: 0.322 	update_time: 0.737 	learn_time: 0.744 	sum: 52983.816 	size: 1000000                        
Episode 91	Total score (averaged over agents) this episode: 30.446499319467694
t_step: 1000 	after first sample, sample_time: 0.428 	update_time: 0.902 	learn_time: 0.909 	sum: 53070.542 	size: 1000000                        

training loop:  30% |##########                         | ETA:  1 day, 11:24:09

t_step: 1000 	after first sample, sample_time: 0.402 	update_time: 0.989 	learn_time: 0.998 	sum: 53065.030 	size: 1000000                        
Episode 92	Total score (averaged over agents) this episode: 26.43949940903112
t_step: 1000 	after first sample, sample_time: 0.255 	update_time: 0.611 	learn_time: 0.617 	sum: 53092.987 	size: 1000000                        

training loop:  31% |##########                         | ETA:  1 day, 13:18:06

t_step: 1000 	after first sample, sample_time: 0.250 	update_time: 0.685 	learn_time: 0.691 	sum: 53090.234 	size: 1000000                        
Episode 93	Total score (averaged over agents) this episode: 26.766499401722104
t_step: 1000 	after first sample, sample_time: 0.402 	update_time: 1.054 	learn_time: 1.084 	sum: 52056.203 	size: 1000000                        

training loop:  31% |##########                         | ETA:  1 day, 15:52:47

t_step: 1000 	after first sample, sample_time: 0.415 	update_time: 1.048 	learn_time: 1.054 	sum: 52037.366 	size: 1000000                        
Episode 94	Total score (averaged over agents) this episode: 28.719999358057976
t_step: 1000 	after first sample, sample_time: 0.448 	update_time: 1.065 	learn_time: 1.071 	sum: 47424.444 	size: 1000000                        

training loop:  31% |###########                        | ETA:  1 day, 19:34:58

t_step: 1000 	after first sample, sample_time: 0.521 	update_time: 1.055 	learn_time: 1.066 	sum: 47413.079 	size: 1000000                        
Episode 95	Total score (averaged over agents) this episode: 31.698999291472138
t_step: 1000 	after first sample, sample_time: 0.423 	update_time: 0.999 	learn_time: 1.010 	sum: 45272.550 	size: 1000000                        

training loop:  32% |###########                        | ETA:  1 day, 23:21:32

t_step: 1000 	after first sample, sample_time: 0.472 	update_time: 0.995 	learn_time: 1.003 	sum: 45260.740 	size: 1000000                        
Episode 96	Total score (averaged over agents) this episode: 37.7904991553165
t_step: 1000 	after first sample, sample_time: 0.307 	update_time: 0.746 	learn_time: 0.753 	sum: 44618.473 	size: 1000000                        

training loop:  32% |###########                        | ETA:  2 days, 1:45:44

t_step: 1000 	after first sample, sample_time: 0.325 	update_time: 0.771 	learn_time: 0.777 	sum: 44609.210 	size: 1000000                        
Episode 97	Total score (averaged over agents) this episode: 36.38449918674305
t_step: 1000 	after first sample, sample_time: 0.352 	update_time: 0.762 	learn_time: 0.769 	sum: 44498.739 	size: 1000000                        

training loop:  32% |###########                        | ETA:  2 days, 3:42:43

t_step: 1000 	after first sample, sample_time: 0.306 	update_time: 0.725 	learn_time: 0.731 	sum: 44501.677 	size: 1000000                        
Episode 98	Total score (averaged over agents) this episode: 39.07149912668392

Environment solved in 98 episodes!	Average Score: 31.54
t_step: 1000 	after first sample, sample_time: 0.387 	update_time: 0.934 	learn_time: 0.941 	sum: 44600.351 	size: 1000000                        

training loop:  33% |###########                        | ETA:  2 days, 6:11:49

t_step: 1000 	after first sample, sample_time: 0.410 	update_time: 0.995 	learn_time: 1.004 	sum: 44593.833 	size: 1000000                        
Episode 99	Total score (averaged over agents) this episode: 39.39499911945313

Environment solved in 99 episodes!	Average Score: 32.09
t_step: 1000 	after first sample, sample_time: 0.427 	update_time: 0.899 	learn_time: 0.926 	sum: 44622.562 	size: 1000000                        

training loop:  33% |###########                        | ETA:  2 days, 8:33:42

t_step: 1000 	after first sample, sample_time: 0.436 	update_time: 0.848 	learn_time: 0.870 	sum: 44613.195 	size: 1000000                        
Episode 100	Total score (averaged over agents) this episode: 39.23599912300706
t_step: 1000 	after first sample, sample_time: 0.348 	update_time: 0.833 	learn_time: 0.840 	sum: 44497.822 	size: 1000000                        

training loop: 100% |####################################| Time: 1 day, 5:47:59

t_step: 1000 	after first sample, sample_time: 0.393 	update_time: 0.857 	learn_time: 0.864 	sum: 44487.875 	size: 1000000                        
Episode 101	Total score (averaged over agents) this episode: 39.44799911826849

Environment solved in 101 episodes!	Average Score: 32.92
training done
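
The log above prints "Environment solved" at several different episodes, so it can be useful to check the result independently. The usual benchmark for this project is an average score of at least +30 over 100 consecutive episodes. The sketch below is a minimal, hypothetical helper: `first_solved_episode` is not part of the notebook's `Learn` function, and it assumes `scores` is a plain sequence of per-episode scores already averaged over agents.

```python
from collections import deque


def first_solved_episode(scores, target=30.0, window=100):
    """Return the 1-based episode at which the mean of the last `window`
    scores first reaches `target`, or None if that never happens.

    Hypothetical helper for illustration; `target` and `window` default to
    the assumed benchmark of +30 averaged over 100 consecutive episodes.
    """
    recent = deque(maxlen=window)  # keeps only the most recent `window` scores
    for episode, score in enumerate(scores, start=1):
        recent.append(score)
        # Only test once a full window of episodes is available.
        if len(recent) == window and sum(recent) / window >= target:
            return episode
    return None
```

If the value returned by `Learn` is such a list, `first_solved_episode(scores)` would give the first episode at which the rolling 100-episode average meets the target.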

In [8]:
# Training run with prioritized learning disabled
prioritized_learn = False
scores = Learn(env, 300, 10000, 0.0, 30.0, prioritized_learn)

training loop:   0% |                                          | ETA:  --:--:--

state size: 33 action size: 4


training loop:   0% |                                          | ETA:  17:07:11

Episode 1	Total score (averaged over agents) this episode: 0.5969999866560102

Environment solved in 1 episodes!	Average Score: 0.60


training loop:   0% |                                          | ETA:  16:16:28

Episode 2	Total score (averaged over agents) this episode: 0.6334999858401715

Environment solved in 2 episodes!	Average Score: 0.62


training loop:   1% |                                          | ETA:  16:14:53

Episode 3	Total score (averaged over agents) this episode: 1.0124999773688614

Environment solved in 3 episodes!	Average Score: 0.75


training loop:   1% |                                          | ETA:  16:36:00

Episode 4	Total score (averaged over agents) this episode: 1.4824999668635428

Environment solved in 4 episodes!	Average Score: 0.93


training loop:   1% |                                          | ETA:  16:31:28

Episode 5	Total score (averaged over agents) this episode: 1.765499960538

Environment solved in 5 episodes!	Average Score: 1.10


training loop:   2% |                                          | ETA:  16:27:24

Episode 6	Total score (averaged over agents) this episode: 2.2264999502338467

Environment solved in 6 episodes!	Average Score: 1.29


training loop:   2% |                                          | ETA:  16:34:56

Episode 7	Total score (averaged over agents) this episode: 2.4389999454841016

Environment solved in 7 episodes!	Average Score: 1.45


training loop:   2% |#                                         | ETA:  16:32:21

Episode 8	Total score (averaged over agents) this episode: 3.768999915756285

Environment solved in 8 episodes!	Average Score: 1.74


training loop:   3% |#                                         | ETA:  16:27:37

Episode 9	Total score (averaged over agents) this episode: 3.1969999285414814


training loop:   3% |#                                         | ETA:  16:17:28

Episode 10	Total score (averaged over agents) this episode: 2.5074999439530075


training loop:   3% |#                                         | ETA:  16:13:42

Episode 11	Total score (averaged over agents) this episode: 2.9969999330118298


training loop:   4% |#                                         | ETA:  16:13:50

Episode 12	Total score (averaged over agents) this episode: 3.101999930664897


training loop:   4% |#                                         | ETA:  16:08:24

Episode 13	Total score (averaged over agents) this episode: 3.263499927055091


training loop:   4% |#                                         | ETA:  16:05:23

Episode 14	Total score (averaged over agents) this episode: 3.8519999139010905

Environment solved in 14 episodes!	Average Score: 2.35


training loop:   5% |##                                        | ETA:  16:10:12

Episode 15	Total score (averaged over agents) this episode: 4.766999893449247

Environment solved in 15 episodes!	Average Score: 2.51


training loop:   5% |##                                        | ETA:  16:18:19

Episode 16	Total score (averaged over agents) this episode: 5.023999887704849

Environment solved in 16 episodes!	Average Score: 2.66


training loop:   5% |##                                        | ETA:  16:20:09

Episode 17	Total score (averaged over agents) this episode: 5.53749987622723

Environment solved in 17 episodes!	Average Score: 2.83


training loop:   6% |##                                        | ETA:  16:17:03

Episode 18	Total score (averaged over agents) this episode: 7.0034998434595765

Environment solved in 18 episodes!	Average Score: 3.07


training loop:   6% |##                                        | ETA:  16:14:43

Episode 19	Total score (averaged over agents) this episode: 6.449499855842442


training loop:   6% |##                                        | ETA:  16:12:49

Episode 20	Total score (averaged over agents) this episode: 7.763999826461077

Environment solved in 20 episodes!	Average Score: 3.47


training loop:   7% |##                                        | ETA:  16:13:50

Episode 21	Total score (averaged over agents) this episode: 7.745499826874584


training loop:   7% |###                                       | ETA:  16:13:27

Episode 22	Total score (averaged over agents) this episode: 7.492499832529575


training loop:   7% |###                                       | ETA:  16:10:13

Episode 23	Total score (averaged over agents) this episode: 6.854499846789986


training loop:   8% |###                                       | ETA:  16:07:35

Episode 24	Total score (averaged over agents) this episode: 7.843499824684113

Environment solved in 24 episodes!	Average Score: 4.78


training loop:   8% |###                                       | ETA:  16:06:35

Episode 25	Total score (averaged over agents) this episode: 8.717999805137515

Environment solved in 25 episodes!	Average Score: 5.13


training loop:   8% |###                                       | ETA:  16:11:25

Episode 26	Total score (averaged over agents) this episode: 9.204999794252217

Environment solved in 26 episodes!	Average Score: 5.48


training loop:   9% |###                                       | ETA:  16:12:12

Episode 27	Total score (averaged over agents) this episode: 9.500499787647277

Environment solved in 27 episodes!	Average Score: 5.83


training loop:   9% |###                                       | ETA:  16:11:04

Episode 28	Total score (averaged over agents) this episode: 9.043499797862022


training loop:   9% |####                                      | ETA:  16:12:41

Episode 29	Total score (averaged over agents) this episode: 8.712999805249273


training loop:  10% |####                                      | ETA:  16:11:24

Episode 30	Total score (averaged over agents) this episode: 8.99599979892373


training loop:  10% |####                                      | ETA:  16:13:51

Episode 31	Total score (averaged over agents) this episode: 9.817499780561775

Environment solved in 31 episodes!	Average Score: 7.03


training loop:  10% |####                                      | ETA:  16:17:15

Episode 32	Total score (averaged over agents) this episode: 8.47149981064722


training loop:  11% |####                                      | ETA:  16:15:33

Episode 33	Total score (averaged over agents) this episode: 10.238499771151691

Environment solved in 33 episodes!	Average Score: 7.65


training loop:  11% |####                                      | ETA:  16:14:17

Episode 34	Total score (averaged over agents) this episode: 11.491999743133784

Environment solved in 34 episodes!	Average Score: 8.03


training loop:  11% |####                                      | ETA:  16:12:14

Episode 35	Total score (averaged over agents) this episode: 11.25549974841997


training loop:  12% |#####                                     | ETA:  16:13:56

Episode 36	Total score (averaged over agents) this episode: 10.93449975559488


training loop:  12% |#####                                     | ETA:  16:16:05

Episode 37	Total score (averaged over agents) this episode: 13.38549970081076

Environment solved in 37 episodes!	Average Score: 9.05


training loop:  12% |#####                                     | ETA:  16:14:25

Episode 38	Total score (averaged over agents) this episode: 12.335999724268913


training loop:  13% |#####                                     | ETA:  16:12:43

Episode 39	Total score (averaged over agents) this episode: 12.091999729722739


training loop:  13% |#####                                     | ETA:  16:13:00

Episode 40	Total score (averaged over agents) this episode: 12.813999713584781


training loop:  13% |#####                                     | ETA:  16:13:36

Episode 41	Total score (averaged over agents) this episode: 12.144499728549272


training loop:  14% |#####                                     | ETA:  16:18:15

Episode 42	Total score (averaged over agents) this episode: 12.192499727476388


training loop:  14% |######                                    | ETA:  16:18:18

Episode 43	Total score (averaged over agents) this episode: 12.321499724593014


training loop:  14% |######                                    | ETA:  16:18:48

Episode 44	Total score (averaged over agents) this episode: 12.665999716892838


training loop:  15% |######                                    | ETA:  16:19:46

Episode 45	Total score (averaged over agents) this episode: 13.589499696251005

Environment solved in 45 episodes!	Average Score: 11.06


training loop:  15% |######                                    | ETA:  16:19:04

Episode 46	Total score (averaged over agents) this episode: 13.329499702062458


training loop:  15% |######                                    | ETA:  16:22:06

Episode 47	Total score (averaged over agents) this episode: 14.674499671999365

Environment solved in 47 episodes!	Average Score: 11.53


training loop:  16% |######                                    | ETA:  16:23:11

Episode 48	Total score (averaged over agents) this episode: 14.634999672882259


training loop:  16% |######                                    | ETA:  16:21:21

Episode 49	Total score (averaged over agents) this episode: 14.210999682359397


training loop:  16% |#######                                   | ETA:  16:22:18

Episode 50	Total score (averaged over agents) this episode: 15.300999657995998

Environment solved in 50 episodes!	Average Score: 12.40


training loop:  17% |#######                                   | ETA:  16:22:29

Episode 51	Total score (averaged over agents) this episode: 12.985499709751457


training loop:  17% |#######                                   | ETA:  16:23:49

Episode 52	Total score (averaged over agents) this episode: 15.603499651234596

Environment solved in 52 episodes!	Average Score: 12.91


training loop:  17% |#######                                   | ETA:  16:23:11

Episode 53	Total score (averaged over agents) this episode: 15.886499644909055

Environment solved in 53 episodes!	Average Score: 13.19


training loop:  18% |#######                                   | ETA:  16:23:53

Episode 54	Total score (averaged over agents) this episode: 15.849999645724893


training loop:  18% |#######                                   | ETA:  16:23:04

Episode 55	Total score (averaged over agents) this episode: 16.40649963328615

Environment solved in 55 episodes!	Average Score: 13.67


training loop:  18% |#######                                   | ETA:  16:24:42

Episode 56	Total score (averaged over agents) this episode: 17.31849961290136

Environment solved in 56 episodes!	Average Score: 13.99


training loop:  19% |#######                                   | ETA:  16:22:33

Episode 57	Total score (averaged over agents) this episode: 17.275999613851308


training loop:  19% |########                                  | ETA:  16:19:32

Episode 58	Total score (averaged over agents) this episode: 17.622999606095256

Environment solved in 58 episodes!	Average Score: 14.45


training loop:  19% |########                                  | ETA:  16:19:46

Episode 59	Total score (averaged over agents) this episode: 16.59449962908402


training loop:  20% |########                                  | ETA:  16:19:13

Episode 60	Total score (averaged over agents) this episode: 14.647499672602862


training loop:  20% |########                                  | ETA:  16:19:31

Episode 61	Total score (averaged over agents) this episode: 15.583999651670457


training loop:  20% |########                                  | ETA:  16:16:50

Episode 62	Total score (averaged over agents) this episode: 14.270499681029468


training loop:  21% |########                                  | ETA:  16:14:30

Episode 63	Total score (averaged over agents) this episode: 16.675999627262353


training loop:  21% |########                                  | ETA:  16:11:46

Episode 64	Total score (averaged over agents) this episode: 18.15899959411472

Environment solved in 64 episodes!	Average Score: 15.53


training loop:  21% |#########                                 | ETA:  16:09:37

Episode 65	Total score (averaged over agents) this episode: 19.722499559167773

Environment solved in 65 episodes!	Average Score: 15.84


training loop:  22% |#########                                 | ETA:  16:05:47

Episode 66	Total score (averaged over agents) this episode: 18.629499583598225


training loop:  22% |#########                                 | ETA:  16:01:55

Episode 67	Total score (averaged over agents) this episode: 20.863499533664434

Environment solved in 67 episodes!	Average Score: 16.41


training loop:  22% |#########                                 | ETA:  16:00:22

Episode 68	Total score (averaged over agents) this episode: 20.992999530769886

Environment solved in 68 episodes!	Average Score: 16.73


training loop:  23% |#########                                 | ETA:  15:56:56

Episode 69	Total score (averaged over agents) this episode: 22.997499485965818

Environment solved in 69 episodes!	Average Score: 17.17


training loop:  23% |#########                                 | ETA:  15:53:46

Episode 70	Total score (averaged over agents) this episode: 24.406499454472215

Environment solved in 70 episodes!	Average Score: 17.62


training loop:  23% |#########                                 | ETA:  15:49:51

Episode 71	Total score (averaged over agents) this episode: 23.108499483484774


training loop:  24% |##########                                | ETA:  15:46:13

Episode 72	Total score (averaged over agents) this episode: 23.248999480344356


training loop:  24% |##########                                | ETA:  15:43:26

Episode 73	Total score (averaged over agents) this episode: 23.187499481718987


training loop:  24% |##########                                | ETA:  15:39:25

Episode 74	Total score (averaged over agents) this episode: 24.137999460473658


training loop:  25% |##########                                | ETA:  15:35:18

Episode 75	Total score (averaged over agents) this episode: 25.104999438859522

Environment solved in 75 episodes!	Average Score: 19.73


training loop:  25% |##########                                | ETA:  15:31:03

Episode 76	Total score (averaged over agents) this episode: 27.55049938419834

Environment solved in 76 episodes!	Average Score: 20.24


training loop:  25% |##########                                | ETA:  15:27:01

Episode 77	Total score (averaged over agents) this episode: 29.332499344367534

Environment solved in 77 episodes!	Average Score: 20.84


training loop:  26% |##########                                | ETA:  15:23:04

Episode 78	Total score (averaged over agents) this episode: 31.74199929051101

Environment solved in 78 episodes!	Average Score: 21.55


training loop:  26% |###########                               | ETA:  15:19:01

Episode 79	Total score (averaged over agents) this episode: 32.509999273344874

Environment solved in 79 episodes!	Average Score: 22.34


training loop:  26% |###########                               | ETA:  15:14:53

Episode 80	Total score (averaged over agents) this episode: 31.997499284800142


training loop:  27% |###########                               | ETA:  15:10:44

Episode 81	Total score (averaged over agents) this episode: 31.128499304223805


training loop:  27% |###########                               | ETA:  15:06:23

Episode 82	Total score (averaged over agents) this episode: 30.159499325882642


training loop:  27% |###########                               | ETA:  15:02:13

Episode 83	Total score (averaged over agents) this episode: 31.877999287471177


training loop:  28% |###########                               | ETA:  14:58:13

Episode 84	Total score (averaged over agents) this episode: 33.78249924490228

Environment solved in 84 episodes!	Average Score: 26.32


training loop:  28% |###########                               | ETA:  14:54:11

Episode 85	Total score (averaged over agents) this episode: 38.14149914747104

Environment solved in 85 episodes!	Average Score: 27.25


training loop:  28% |############                              | ETA:  14:49:53

Episode 86	Total score (averaged over agents) this episode: 37.09899917077273


training loop:  29% |############                              | ETA:  14:45:48

Episode 87	Total score (averaged over agents) this episode: 37.968499151337895


training loop:  29% |############                              | ETA:  14:41:30

Episode 88	Total score (averaged over agents) this episode: 39.139499125164

Environment solved in 88 episodes!	Average Score: 29.93


training loop:  29% |############                              | ETA:  14:37:16

Episode 89	Total score (averaged over agents) this episode: 38.44449914069846


training loop:  30% |############                              | ETA:  14:33:07

Episode 90	Total score (averaged over agents) this episode: 37.57949916003272


training loop:  30% |############                              | ETA:  14:29:09

Episode 91	Total score (averaged over agents) this episode: 38.48849913971499


training loop:  30% |############                              | ETA:  14:25:03

Episode 92	Total score (averaged over agents) this episode: 38.78049913318828


training loop:  31% |#############                             | ETA:  14:21:01

Episode 93	Total score (averaged over agents) this episode: 37.67249915795401


training loop:  31% |#############                             | ETA:  14:17:01

Episode 94	Total score (averaged over agents) this episode: 37.230999167822304


training loop:  31% |#############                             | ETA:  14:12:54

Episode 95	Total score (averaged over agents) this episode: 37.305999166145924


training loop:  32% |#############                             | ETA:  14:09:11

Episode 96	Total score (averaged over agents) this episode: 37.370499164704235


training loop:  32% |#############                             | ETA:  14:05:10

Episode 97	Total score (averaged over agents) this episode: 37.372999164648355


training loop:  32% |#############                             | ETA:  14:01:18

Episode 98	Total score (averaged over agents) this episode: 38.32449914338067


training loop:  33% |#############                             | ETA:  13:57:40

Episode 99	Total score (averaged over agents) this episode: 38.09999914839864


training loop:  33% |##############                            | ETA:  13:53:41

Episode 100	Total score (averaged over agents) this episode: 37.88249915326014


training loop: 100% |###########################################| Time: 7:01:03

Episode 101	Total score (averaged over agents) this episode: 38.50299913939089
training done
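
The `Average Score` values printed in the log above are consistent with a running mean of the per-episode scores (0.60, 0.62, 0.75 after the first three episodes). The sketch below shows one way to compute such a running mean. It assumes a window of 100 episodes. `running_average` and `window` are hypothetical names for illustration and are not taken from the notebook's `Learn` implementation.

```python
from collections import deque


def running_average(scores, window=100):
    """Return the mean of the most recent `window` scores after each episode.

    Hypothetical helper: the window size of 100 is an assumption.
    """
    recent = deque(maxlen=window)  # drops the oldest score once full
    averages = []
    for score in scores:
        recent.append(score)
        averages.append(sum(recent) / len(recent))
    return averages


# Per-episode scores copied from episodes 1-3 of the log above.
first_three = [0.5969999866560102, 0.6334999858401715, 1.0124999773688614]
print([round(a, 2) for a in running_average(first_three)])
```

If `scores` returned by `Learn` is a list of per-episode scores (an assumption, since the function body is not shown here), `running_average(scores)[-1]` would give the average over the last 100 episodes of the run.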



