# Continuous Control

---

In this notebook, you will learn how to use the Unity ML-Agents environment for the second project of the [Deep Reinforcement Learning Nanodegree](https://www.udacity.com/course/deep-reinforcement-learning-nanodegree--nd893) program.

### 1. Start the Environment

We begin by importing the necessary packages.  If the code cell below returns an error, please revisit the project instructions to double-check that you have installed [Unity ML-Agents](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Installation.md) and [NumPy](http://www.numpy.org/).

In [1]:
from unityagents import UnityEnvironment
import numpy as np

Next, we will start the environment!  **_Before running the code cell below_**, change the `file_name` parameter to match the location of the Unity environment that you downloaded.

- **Mac**: `"path/to/Reacher.app"`
- **Windows** (x86): `"path/to/Reacher_Windows_x86/Reacher.exe"`
- **Windows** (x86_64): `"path/to/Reacher_Windows_x86_64/Reacher.exe"`
- **Linux** (x86): `"path/to/Reacher_Linux/Reacher.x86"`
- **Linux** (x86_64): `"path/to/Reacher_Linux/Reacher.x86_64"`
- **Linux** (x86, headless): `"path/to/Reacher_Linux_NoVis/Reacher.x86"`
- **Linux** (x86_64, headless): `"path/to/Reacher_Linux_NoVis/Reacher.x86_64"`

For instance, if you are using a Mac, then you downloaded `Reacher.app`.  If this file is in the same folder as the notebook, then the line below should appear as follows:
```
env = UnityEnvironment(file_name="Reacher.app")
```
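If you move the notebook between machines, a small helper can pick the right entry from the list above. `reacher_path` is a hypothetical helper, not part of `unityagents`. It assumes the downloaded folders keep the default names shown in the list. Headless builds are listed only for Linux, so `headless` is ignored on other systems.

```python
import platform

def reacher_path(system=None, machine=None, headless=False, base_dir="."):
    """Return the Reacher build path for an OS/CPU pair (hypothetical helper).

    Defaults to the current machine; folder names follow the list above.
    """
    system = system or platform.system()              # 'Darwin', 'Windows' or 'Linux'
    machine = (machine or platform.machine()).lower()
    is_64bit = machine in ("x86_64", "amd64")
    if system == "Darwin":
        rel = "Reacher.app"
    elif system == "Windows":
        rel = ("Reacher_Windows_x86_64" if is_64bit else "Reacher_Windows_x86") + "/Reacher.exe"
    elif system == "Linux":
        folder = "Reacher_Linux_NoVis" if headless else "Reacher_Linux"
        rel = folder + ("/Reacher.x86_64" if is_64bit else "/Reacher.x86")
    else:
        raise ValueError("no Reacher build listed for " + system)
    return base_dir + "/" + rel
```

For example, `reacher_path("Linux", "x86_64", headless=True)` returns `'./Reacher_Linux_NoVis/Reacher.x86_64'`, which is the headless path used in the next cell.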

In [2]:
enable_visualization = True
if enable_visualization:
    env = UnityEnvironment(file_name='Reacher_Linux/Reacher.x86_64')
else:
    env = UnityEnvironment(file_name='Reacher_Linux_NoVis/Reacher.x86_64')

INFO:unityagents:
'Academy' started successfully!
Unity Academy name: Academy
        Number of Brains: 1
        Number of External Brains : 1
        Lesson number : 0
        Reset Parameters :
		goal_speed -> 1.0
		goal_size -> 5.0
Unity brain name: ReacherBrain
        Number of Visual Observations (per agent): 0
        Vector Observation space type: continuous
        Vector Observation space size (per agent): 33
        Number of stacked Vector Observation: 1
        Vector Action space type: continuous
        Vector Action space size (per agent): 4
        Vector Action descriptions: , , , 


Environments contain **_brains_** which are responsible for deciding the actions of their associated agents. Here we check for the first brain available, and set it as the default brain we will be controlling from Python.

In [3]:
# get the default brain
brain_name = env.brain_names[0]
brain = env.brains[brain_name]

### 2. Examine the State and Action Spaces

In this environment, a double-jointed arm can move to target locations. A reward of `+0.1` is provided for each step that the agent's hand is in the goal location. Thus, the goal of your agent is to maintain its position at the target location for as many time steps as possible.

The observation space consists of `33` variables corresponding to position, rotation, velocity, and angular velocities of the arm.  Each action is a vector with four numbers, corresponding to torque applicable to two joints.  Every entry in the action vector must be a number between `-1` and `1`.
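The action constraint can be made concrete with a short NumPy sketch. `random_actions` is a hypothetical helper that builds one valid action per agent and guarantees that every torque lies in `[-1, 1]`. It compares uniform sampling with the approach used in the random-action cell further down, which draws Gaussian values with `np.random.randn` and then clips them. Clipping a standard normal pins roughly 32% of the entries exactly at `-1` or `1`, because P(|z| > 1) ≈ 0.317.

```python
import numpy as np

def random_actions(num_agents, action_size, rng, method="uniform"):
    """Sample one action per agent with every torque in [-1, 1] (hypothetical helper)."""
    if method == "uniform":
        return rng.uniform(-1.0, 1.0, size=(num_agents, action_size))
    if method == "clipped_normal":
        # Gaussian draws clipped to the valid range, as in the random-action cell
        return np.clip(rng.standard_normal((num_agents, action_size)), -1.0, 1.0)
    raise ValueError("unknown method: " + method)

rng = np.random.default_rng(0)
for method in ("uniform", "clipped_normal"):
    a = random_actions(20, 4, rng, method)      # 20 agents x 4 torques
    saturated = np.mean(np.abs(a) == 1.0)       # share of torques pinned at a limit
    print(method, a.shape, "saturated share:", round(float(saturated), 3))
```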

Run the code cell below to print some information about the environment.

In [4]:
# reset the environment
env_info = env.reset(train_mode=True)[brain_name]

# number of agents
num_agents = len(env_info.agents)
print('Number of agents:', num_agents)

# size of each action
action_size = brain.vector_action_space_size
print('Size of each action:', action_size)

# examine the state space 
states = env_info.vector_observations
state_size = states.shape[1]
print('There are {} agents. Each observes a state with length: {}'.format(states.shape[0], state_size))
print('The state for the first agent looks like:', states[0])

Number of agents: 20
Size of each action: 4
There are 20 agents. Each observes a state with length: 33
The state for the first agent looks like: [ 0.00000000e+00 -4.00000000e+00  0.00000000e+00  1.00000000e+00
 -0.00000000e+00 -0.00000000e+00 -4.37113883e-08  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00 -1.00000000e+01  0.00000000e+00
  1.00000000e+00 -0.00000000e+00 -0.00000000e+00 -4.37113883e-08
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  5.75471878e+00 -1.00000000e+00
  5.55726624e+00  0.00000000e+00  1.00000000e+00  0.00000000e+00
 -1.68164849e-01]


### 3. Take Random Actions in the Environment

In the next code cell, you will learn how to use the Python API to control the agent and receive feedback from the environment.

Once this cell is executed, you will watch the agent's performance when it selects an action at random at each time step. A window should pop up that allows you to observe the agent as it moves through the environment.

Of course, as part of the project, you'll have to change the code so that the agent is able to use its experience to gradually choose better actions when interacting with the environment!

In [5]:
env_info = env.reset(train_mode=True)[brain_name]     # reset the environment    
states = env_info.vector_observations                  # get the current state (for each agent)
scores = np.zeros(num_agents)                          # initialize the score (for each agent)
t_max = 10000                                          # safety cap; the loop normally exits when an episode finishes
for t in range(t_max):
    actions = np.random.randn(num_agents, action_size) # select an action (for each agent)
    actions = np.clip(actions, -1, 1)                  # all actions between -1 and 1
    env_info = env.step(actions)[brain_name]           # send all actions to the environment
    next_states = env_info.vector_observations         # get next state (for each agent)
    rewards = env_info.rewards                         # get reward (for each agent)
    dones = env_info.local_done                        # see if episode finished
    scores += rewards                                  # update the score (for each agent)
    states = next_states                               # roll over states to next time step
    if np.any(dones):                                  # exit loop if episode finished
        print('done in ', t, ' steps')                 # t is zero-based, so t + 1 steps were taken
        break
print('Total score (averaged over agents) this episode: {}'.format(np.mean(scores)))

done in  1000  steps
Total score (averaged over agents) this episode: 0.1904999957419932
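To see the loop's bookkeeping without launching Unity, the sketch below runs the same logic against a stub. `StubReacher` and `run_episode` are hypothetical. The stub imitates only the parts of the `unityagents` API used above: `reset`/`step` return a dict keyed by brain name, and each entry carries `agents`, `vector_observations`, `rewards`, and `local_done`. Its constant `0.1` reward per step is purely illustrative.

```python
from types import SimpleNamespace
import numpy as np

class StubReacher:
    """Hypothetical stand-in for the Unity env: fixed 0.1 reward per agent per step,
    all agents report done after `episode_len` steps."""
    brain_name = "ReacherBrain"

    def __init__(self, num_agents=20, state_size=33, action_size=4, episode_len=1000):
        self.num_agents, self.state_size = num_agents, state_size
        self.action_size, self.episode_len = action_size, episode_len
        self.t = 0

    def _info(self, reward, done):
        # mimic a dict keyed by brain name holding per-agent fields
        return {self.brain_name: SimpleNamespace(
            agents=list(range(self.num_agents)),
            vector_observations=np.zeros((self.num_agents, self.state_size)),
            rewards=[reward] * self.num_agents,
            local_done=[done] * self.num_agents)}

    def reset(self, train_mode=True):
        self.t = 0
        return self._info(0.0, False)

    def step(self, actions):
        assert actions.shape == (self.num_agents, self.action_size)
        self.t += 1
        return self._info(0.1, self.t >= self.episode_len)


def run_episode(env, brain_name, action_size=4, t_max=10000, seed=0):
    """Random-action episode; returns (steps taken, score averaged over agents)."""
    rng = np.random.default_rng(seed)
    env_info = env.reset(train_mode=True)[brain_name]
    num_agents = len(env_info.agents)
    scores = np.zeros(num_agents)                        # running return per agent
    for t in range(t_max):
        actions = np.clip(rng.standard_normal((num_agents, action_size)), -1, 1)
        env_info = env.step(actions)[brain_name]
        scores += env_info.rewards                       # add this step's reward per agent
        if np.any(env_info.local_done):                  # stop as soon as any agent is done
            return t + 1, float(np.mean(scores))         # t is zero-based
    return t_max, float(np.mean(scores))


steps, score = run_episode(StubReacher(), StubReacher.brain_name)
print(steps, round(score, 6))
```

Because the stub gives every agent the same reward, the mean over agents equals each agent's score. In the real environment the 20 agents' scores differ, which is why the cell above averages them with `np.mean`.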


When finished, you can close the environment.

In [6]:
# env.close()  # left commented out so the environment stays open for training below

### 4. It's Your Turn!

Now it's your turn to train your own agent to solve the environment!  When training the environment, set `train_mode=True`, so that the line for resetting the environment looks like the following:
```python
env_info = env.reset(train_mode=True)[brain_name]
```
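The experiment log in the next cell switches the exploration noise from a normal distribution to Ornstein-Uhlenbeck (OU) noise. It later fixes an OU error by adding a `-0.5` bias. The agent code (`ddpg_agent.py`) is not shown here, so the following is only a sketch of a standard OU process used for DDPG exploration. The parameter values `theta=0.15` and `sigma=0.2` are assumed defaults. One common way such a bias arises is drawing the random kicks from a uniform distribution on `[0, 1)` instead of a standard normal. That shifts the long-run mean up by about `sigma * 0.5 / theta`, which subtracting 0.5 would cancel. This is offered as a guess, not a diagnosis of the author's code.

```python
import numpy as np

class OUNoise:
    """Ornstein-Uhlenbeck process with unit time step:
    x <- x + theta * (mu - x) + sigma * N(0, 1)."""

    def __init__(self, size, mu=0.0, theta=0.15, sigma=0.2, seed=0):
        self.mu = mu * np.ones(size)        # long-run mean, one entry per action dimension
        self.theta = theta                  # strength of the pull back towards mu
        self.sigma = sigma                  # scale of the random kicks
        self.rng = np.random.default_rng(seed)
        self.reset()

    def reset(self):
        """Restart the process at its mean; call at the start of every episode."""
        self.state = self.mu.copy()

    def sample(self):
        # standard normal kicks keep the process centred on mu
        kick = self.rng.standard_normal(self.mu.shape)
        self.state = self.state + self.theta * (self.mu - self.state) + self.sigma * kick
        return self.state


noise = OUNoise((20, 4))                    # one noise stream per agent and action dimension
policy_actions = np.zeros((20, 4))          # placeholder for the actor's output
actions = np.clip(policy_actions + noise.sample(), -1.0, 1.0)
```

Unlike independent Gaussian noise, consecutive OU samples are correlated. This tends to produce smoother exploration for torque control.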

In [7]:
from collections import deque
from ddpg_agent import Agent
from matplotlib import pyplot as plt
import os
import datetime
import torch

# widget bar to display progress
# !pip install progressbar
import progressbar as pb


# Ideas for improving performance
# 1. Deeper and more complex networks
#   1.1 Deeper critic, batch size 40 (ddpg_result_2022_09_30_15_35_43): not working, reward stays below 1
# 2. Delayed updates
# 3. Tuned hyperparameters
#   3.1 Different batch sizes:
#   20 (ddpg_result_2022_09_30_13_04_29) and 80 (ddpg_result_2022_09_30_14_24_26) not good, 40 good (ddpg_result_2022_09_30_13_27_48)
# 4. Noise function (uniform to normal distribution)
#   4.1 Same batch size (40), smaller variation: better performance (ddpg_result_2022_09_30_13_27_48)
# 5. Batch normalization
# 6. Dueling Q-network (not applicable to DDPG continuous control)
# 7. N-step bootstrapping
# 8. Importance-sampling priority queue (prioritized experience replay)

# Try 1:
# UPDATE_STEP: 1->20, UPDATE_TIMES: 1->10, BATCH_SIZE: 40 (ddpg_result_2022_09_30_16_08_56): not working
# Try 2: change multi_step to step
# UPDATE_STEP: 20, UPDATE_TIMES: 5, BATCH_SIZE: 40 (ddpg_result_2022_09_30_16_49_02): not working
# Try 3:
# UPDATE_STEP: 20, UPDATE_TIMES: 5, BATCH_SIZE: 40, disable grad clipping for the critic, change update-time logic (ddpg_result_2022_09_30_17_28_04): better
# Try 4:
# same as Try 3, plus BN in both networks (ddpg_result_2022_10_01_15_16_22): better but still fails
# Try 5:
# same as Try 4, but BATCH_SIZE: 128 (ddpg_result_2022_10_01_19_12_28): much better but still fails
# Try 6:
# same as Try 5, network sizes (400, 300) -> (128, 256) (ddpg_result_2022_10_01_22_51_33): worse
# Try 7:
# same as Try 5, replay buffer 1e5 -> 1e6 (ddpg_result_2022_10_02_01_44_19): much better but not enough
# Try 8:
# same as Try 7, change step logic (ddpg_result_2022_10_02_10_04_21): not working
# Try 9:
# same as Try 8, noise normal -> OU, noise std 0.2 -> 0.05 (ddpg_result_2022_10_02_12_18_34): much better but not enough
# Try 10:
# same as Try 9, update times 5 -> 7 (ddpg_result_2022_10_02_18_35_44): worse
# Try 11:
# same as Try 10, revert the update-time logic (ddpg_result_2022_10_02_22_45_19): huge jump at the start, but then no further growth
# Try 12:
# same as Try 11, GAMMA 0.99 -> 0.95 (ddpg_result_2022_10_02_23_57_47): not working
# Try 13:
# fix OU noise error by adding a -0.5 bias, and add reset (ddpg_result_2022_10_03_02_05_59): much better but not enough
# Try 14:
# change detach to no_grad (ddpg_result_2022_10_04_16_14_37): success
# Try 15:
# add prioritized experience replay, change update times to 1 (ddpg_result_2022_10_08_21_09_12): success and much faster
# (however, the Unity environment got stuck during communication, so only 44 episodes finished)

# Conclusion:
# start with a small batch size to check that the agent can learn at all,
# then increase the batch size to give it more data to learn from

def plot_result(scores, scores_avg, actual_target_score, save_path):
    target_score_curve = np.ones(len(scores)) * actual_target_score
    fig = plt.figure(figsize=[15,10])
    ax = fig.add_axes([0.1, 0.1, 0.8, 0.8])
    ax.set_xlabel("episode")
    ax.set_ylabel("score")
    ax.plot(scores)
    ax.plot(scores_avg)
    ax.plot(target_score_curve)
    ax.legend(['score','avg_score', 'target_score'])
    plt.savefig(save_path)
    plt.close(fig)

def Learn(env: UnityEnvironment, n_episodes=1000, max_t=1000, target_score=0.0, actual_target_score=0.0, prioritized_learn=False):

    # create a timestamped folder for the log, checkpoints and plots
    str_time = datetime.datetime.now().strftime('%Y_%m_%d_%H_%M_%S')
    save_path = os.path.join(os.getcwd(), 'ddpg_result_' + str_time, '')   # keeps the trailing separator
    os.makedirs(save_path, exist_ok=True)

    widget = ['training loop: ', pb.Percentage(), ' ', 
          pb.Bar(), ' ', pb.ETA() ]
    timer = pb.ProgressBar(widgets=widget, maxval=n_episodes).start()

    scores_window = deque(maxlen=20)
    avg_agent_scores = []
    avg_scores = []
    cur_target_score = target_score  

    agent = Agent(env, save_path+'log.txt')
    
    for i_episode in range(1, n_episodes+1):
        agent.reset()
        env_info = env.reset(train_mode=True)[brain_name]     # reset the environment    
        num_agents = len(env_info.agents)
        states = env_info.vector_observations                  # get the current state (for each agent)
        scores = np.zeros(num_agents)                          # initialize the score (for each agent)
        for t in range(max_t):
            actions = agent.multi_agent_act(states)            # select an action for every agent
            env_info = env.step(actions)[brain_name]           # send all actions to the environment
            next_states = env_info.vector_observations         # get next state (for each agent)
            rewards = env_info.rewards                         # get reward (for each agent)
            dones = env_info.local_done                        # see if episode finished
            # agent.multi_agent_step(states, actions, rewards, next_states, dones)
            # note: prioritized_learn is not read; priority_step is always used
            for i in range(num_agents):                        # pass each agent's transition to the agent
                agent.priority_step(states[i], actions[i], rewards[i], next_states[i], dones[i], t)
            states = next_states                               # roll over states to next time step
            scores += rewards                                  # update the score (for each agent)
            if np.any(dones):                                  # exit loop if episode finished
                break
        scores_window.append(np.mean(scores))
        print('\nEpisode {}\tTotal score (averaged over agents) this episode: {}'.format(i_episode, scores_window[-1]))
        avg_agent_scores.append(scores_window[-1])
        avg_scores.append(np.mean(scores_window))
        
        # update progress widget bar
        timer.update(i_episode)
        
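        # save a checkpoint whenever this episode's score beats the best so far; the
        # "Environment solved" message is printed on every such improvement, not only
        # once the task is actually solved
        if avg_agent_scores[-1] > cur_target_score: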
            print('\nEnvironment solved in {:d} episodes!\tAverage Score: {:.2f}'.format(i_episode, avg_scores[-1]))
            result_str = 'epi_' + str(i_episode) + '_score_' + str(avg_scores[-1])
            torch.save(agent.critic_local.state_dict(), save_path+result_str+'_critic_checkpoint.pth')
            torch.save(agent.actor_local.state_dict(), save_path+result_str+'_actor_checkpoint.pth')
            cur_target_score = avg_agent_scores[-1]
        
        pic_save_path = save_path+'results/'
        if not os.path.isdir(pic_save_path):
            os.mkdir(pic_save_path)
        pic_name = 'ddpg_n_epi_'+str(i_episode)+'.png'    
        plot_result(avg_agent_scores, avg_scores, actual_target_score, pic_save_path+pic_name) 
        
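        # stop once the moving average over the last 20 episodes exceeds the target,
        # but only after more than 100 episodes
        if i_episode > 100 and avg_scores[-1] > actual_target_score: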
            print('training done')
            break
    
    timer.finish()

    return avg_scores
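The last entry in the experiment log above adds prioritized experience replay. The agent's replay buffer lives in `ddpg_agent.py`, which is not shown here. The following is therefore only a sketch of the general proportional variant, not the author's implementation. Transition `i` is sampled with probability $P(i) = p_i^\alpha / \sum_k p_k^\alpha$. It is corrected with the importance-sampling weight $w_i = (N \cdot P(i))^{-\beta}$. `SimplePrioritizedReplay`, `alpha=0.6`, `beta=0.4`, and `eps` are assumptions. Sampling here is O(N) for clarity instead of using a sum tree, and the weights are normalized by the batch maximum, which is a common simplification.

```python
import numpy as np

class SimplePrioritizedReplay:
    """Proportional prioritized replay with O(N) sampling (hypothetical sketch)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-5, seed=0):
        self.capacity, self.alpha, self.eps = capacity, alpha, eps
        self.data = []                          # stored transitions (ring buffer)
        self.priorities = np.zeros(capacity)    # one priority per slot
        self.pos = 0                            # next slot to overwrite
        self.rng = np.random.default_rng(seed)

    def add(self, transition):
        # new transitions get the current max priority so they are replayed at least once
        max_p = self.priorities.max() if self.data else 1.0
        if len(self.data) < self.capacity:
            self.data.append(transition)
        else:
            self.data[self.pos] = transition
        self.priorities[self.pos] = max_p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        n = len(self.data)
        scaled = self.priorities[:n] ** self.alpha
        probs = scaled / scaled.sum()                       # P(i)
        idx = self.rng.choice(n, size=batch_size, p=probs)
        weights = (n * probs[idx]) ** (-beta)               # importance-sampling correction
        weights /= weights.max()                            # largest weight in the batch is 1
        return idx, [self.data[i] for i in idx], weights

    def update_priorities(self, idx, td_errors):
        # priority = |TD error| + eps keeps every transition sampleable when eps > 0
        self.priorities[idx] = np.abs(td_errors) + self.eps
```

In a DDPG update, you would multiply each sample's squared TD error by its weight before averaging the critic loss. You would then call `update_priorities` with the new absolute TD errors. `beta` is usually annealed towards 1 over the course of training.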


In [8]:
# prioritized training
prioritized_learn = True
scores = Learn(env, 300, 10000, 0.0, 30.0, prioritized_learn)

training loop:   0% |                                          | ETA:  --:--:--

state size: 33 action size: 4
t_step: 1000 	after first sample, sample_time: 0.006 	update_time: 0.010 	learn_time: 0.028 	sum: 117.927 	size: 20015                                             

training loop:   0% |                                          | ETA:  13:52:24

t_step: 1000 	after first sample, sample_time: 0.005 	update_time: 0.008 	learn_time: 0.035 	sum: 118.894 	size: 20020                        
Episode 1	Total score (averaged over agents) this episode: 0.8384999812580645

Environment solved in 1 episodes!	Average Score: 0.84
t_step: 1000 	after first sample, sample_time: 0.006 	update_time: 0.011 	learn_time: 0.072 	sum: 279.658 	size: 40039                                               

training loop:   0% |                                          | ETA:  14:06:37

t_step: 1000 	after first sample, sample_time: 0.005 	update_time: 0.011 	learn_time: 0.073 	sum: 278.829 	size: 40040                        
Episode 2	Total score (averaged over agents) this episode: 0.7954999822191894
t_step: 1000 	after first sample, sample_time: 0.004 	update_time: 0.010 	learn_time: 0.035 	sum: 554.480 	size: 60053                                              

training loop:   1% |                                          | ETA:  14:24:16

t_step: 1000 	after first sample, sample_time: 0.005 	update_time: 0.008 	learn_time: 0.015 	sum: 558.999 	size: 60060                        
Episode 3	Total score (averaged over agents) this episode: 1.436999967880547

Environment solved in 3 episodes!	Average Score: 1.02
t_step: 1000 	after first sample, sample_time: 0.006 	update_time: 0.011 	learn_time: 0.051 	sum: 961.995 	size: 80079                                               

training loop:   1% |                                          | ETA:  14:19:36

t_step: 1000 	after first sample, sample_time: 0.009 	update_time: 0.013 	learn_time: 0.057 	sum: 959.036 	size: 80080                        
Episode 4	Total score (averaged over agents) this episode: 1.7029999619349838

Environment solved in 4 episodes!	Average Score: 1.19
t_step: 1000 	after first sample, sample_time: 0.006 	update_time: 0.008 	learn_time: 0.018 	sum: 1394.153 	size: 100097                                                                                              

training loop:   1% |                                          | ETA:  14:49:37

t_step: 1000 	after first sample, sample_time: 0.011 	update_time: 0.009 	learn_time: 0.024 	sum: 1383.382 	size: 100100                        
Episode 5	Total score (averaged over agents) this episode: 1.9174999571405351

Environment solved in 5 episodes!	Average Score: 1.34
t_step: 1000 	after first sample, sample_time: 0.004 	update_time: 0.008 	learn_time: 0.031 	sum: 1933.352 	size: 120114                        

training loop:   2% |                                          | ETA:  13:59:16

t_step: 1000 	after first sample, sample_time: 0.005 	update_time: 0.008 	learn_time: 0.015 	sum: 1921.081 	size: 120120                        
Episode 6	Total score (averaged over agents) this episode: 2.7384999387897553

Environment solved in 6 episodes!	Average Score: 1.57
t_step: 1000 	after first sample, sample_time: 0.004 	update_time: 0.008 	learn_time: 0.017 	sum: 2463.224 	size: 140133                                               

training loop:   2% |                                          | ETA:  13:22:42

t_step: 1000 	after first sample, sample_time: 0.005 	update_time: 0.009 	learn_time: 0.015 	sum: 2449.798 	size: 140140                        
Episode 7	Total score (averaged over agents) this episode: 2.966499933693558

Environment solved in 7 episodes!	Average Score: 1.77
t_step: 1000 	after first sample, sample_time: 0.010 	update_time: 0.009 	learn_time: 0.024 	sum: 3080.704 	size: 160159                        

training loop:   2% |#                                         | ETA:  12:51:06

t_step: 1000 	after first sample, sample_time: 0.006 	update_time: 0.009 	learn_time: 0.041 	sum: 3079.261 	size: 160160                        
Episode 8	Total score (averaged over agents) this episode: 3.896499912906438

Environment solved in 8 episodes!	Average Score: 2.04
t_step: 1000 	after first sample, sample_time: 0.005 	update_time: 0.009 	learn_time: 0.016 	sum: 3738.242 	size: 180179                        

training loop:   3% |#                                         | ETA:  12:30:02

t_step: 1000 	after first sample, sample_time: 0.005 	update_time: 0.009 	learn_time: 0.055 	sum: 3738.646 	size: 180180                        
Episode 9	Total score (averaged over agents) this episode: 4.479999899864197

Environment solved in 9 episodes!	Average Score: 2.31
t_step: 1000 	after first sample, sample_time: 0.006 	update_time: 0.010 	learn_time: 0.021 	sum: 4416.432 	size: 200197                                                                       

training loop:   3% |#                                         | ETA:  12:10:43

t_step: 1000 	after first sample, sample_time: 0.005 	update_time: 0.009 	learn_time: 0.016 	sum: 4409.892 	size: 200200                        
Episode 10	Total score (averaged over agents) this episode: 5.843499869387597

Environment solved in 10 episodes!	Average Score: 2.66
t_step: 1000 	after first sample, sample_time: 0.005 	update_time: 0.009 	learn_time: 0.020 	sum: 5152.131 	size: 220216                        

training loop:   3% |#                                         | ETA:  11:57:44

t_step: 1000 	after first sample, sample_time: 0.005 	update_time: 0.010 	learn_time: 0.018 	sum: 5137.149 	size: 220220                        
Episode 11	Total score (averaged over agents) this episode: 7.239999838173389

Environment solved in 11 episodes!	Average Score: 3.08
t_step: 1000 	after first sample, sample_time: 0.005 	update_time: 0.009 	learn_time: 0.018 	sum: 6000.621 	size: 240233                        

training loop:   4% |#                                         | ETA:  11:44:41

t_step: 1000 	after first sample, sample_time: 0.005 	update_time: 0.010 	learn_time: 0.066 	sum: 5981.068 	size: 240240                        
Episode 12	Total score (averaged over agents) this episode: 8.04299982022494

Environment solved in 12 episodes!	Average Score: 3.49
t_step: 1000 	after first sample, sample_time: 0.006 	update_time: 0.012 	learn_time: 0.080 	sum: 6889.959 	size: 260257                                               

training loop:   4% |#                                         | ETA:  11:34:07

t_step: 1000 	after first sample, sample_time: 0.005 	update_time: 0.009 	learn_time: 0.068 	sum: 6879.609 	size: 260260                        
Episode 13	Total score (averaged over agents) this episode: 9.12549979602918

Environment solved in 13 episodes!	Average Score: 3.92
t_step: 1000 	after first sample, sample_time: 0.005 	update_time: 0.011 	learn_time: 0.026 	sum: 7892.099 	size: 280279                        

training loop:   4% |#                                         | ETA:  11:26:00

t_step: 1000 	after first sample, sample_time: 0.006 	update_time: 0.012 	learn_time: 0.064 	sum: 7888.904 	size: 280280                        
Episode 14	Total score (averaged over agents) this episode: 11.870999734662472

Environment solved in 14 episodes!	Average Score: 4.49
t_step: 1000 	after first sample, sample_time: 0.007 	update_time: 0.011 	learn_time: 0.019 	sum: 8931.242 	size: 300299                        

training loop:   5% |##                                        | ETA:  11:19:47

t_step: 1000 	after first sample, sample_time: 0.005 	update_time: 0.011 	learn_time: 0.024 	sum: 8927.720 	size: 300300                        
Episode 15	Total score (averaged over agents) this episode: 12.453999721631408

Environment solved in 15 episodes!	Average Score: 5.02
t_step: 1000 	after first sample, sample_time: 0.007 	update_time: 0.013 	learn_time: 0.036 	sum: 10054.294 	size: 320315                                             

training loop:   5% |##                                        | ETA:  11:14:19

t_step: 1000 	after first sample, sample_time: 0.010 	update_time: 0.012 	learn_time: 0.049 	sum: 10037.126 	size: 320320                        
Episode 16	Total score (averaged over agents) this episode: 14.608999673463405

Environment solved in 16 episodes!	Average Score: 5.62
t_step: 1000 	after first sample, sample_time: 0.005 	update_time: 0.012 	learn_time: 0.018 	sum: 11130.119 	size: 340333                        

training loop:   5% |##                                        | ETA:  11:08:33

t_step: 1000 	after first sample, sample_time: 0.005 	update_time: 0.010 	learn_time: 0.031 	sum: 11107.727 	size: 340340                        
Episode 17	Total score (averaged over agents) this episode: 16.689499626960604

Environment solved in 17 episodes!	Average Score: 6.27
t_step: 1000 	after first sample, sample_time: 0.006 	update_time: 0.012 	learn_time: 0.021 	sum: 12290.192 	size: 360355                                                                       

training loop:   6% |##                                        | ETA:  11:04:48

t_step: 1000 	after first sample, sample_time: 0.006 	update_time: 0.010 	learn_time: 0.031 	sum: 12270.920 	size: 360360                        
Episode 18	Total score (averaged over agents) this episode: 16.8509996233508

Environment solved in 18 episodes!	Average Score: 6.86
t_step: 1000 	after first sample, sample_time: 0.006 	update_time: 0.012 	learn_time: 0.020 	sum: 13496.438 	size: 380377                        

training loop:   6% |##                                        | ETA:  11:00:14

t_step: 1000 	after first sample, sample_time: 0.006 	update_time: 0.011 	learn_time: 0.018 	sum: 13483.775 	size: 380380                        
Episode 19	Total score (averaged over agents) this episode: 23.125499483104797

Environment solved in 19 episodes!	Average Score: 7.72
t_step: 1000 	after first sample, sample_time: 0.006 	update_time: 0.010 	learn_time: 0.031 	sum: 14870.324 	size: 400393                        

training loop:   6% |##                                        | ETA:  10:57:00

t_step: 1000 	after first sample, sample_time: 0.006 	update_time: 0.011 	learn_time: 0.046 	sum: 14841.530 	size: 400400                        
Episode 20	Total score (averaged over agents) this episode: 23.743499469291418

Environment solved in 20 episodes!	Average Score: 8.52
t_step: 1000 	after first sample, sample_time: 0.005 	update_time: 0.011 	learn_time: 0.034 	sum: 16337.944 	size: 420417                        

training loop:   7% |##                                        | ETA:  10:54:06

t_step: 1000 	after first sample, sample_time: 0.005 	update_time: 0.011 	learn_time: 0.019 	sum: 16332.023 	size: 420420                        
Episode 21	Total score (averaged over agents) this episode: 27.731499380152673

Environment solved in 21 episodes!	Average Score: 9.86
t_step: 1000 	after first sample, sample_time: 0.012 	update_time: 0.012 	learn_time: 0.033 	sum: 17786.022 	size: 440438                                               

training loop:   7% |###                                       | ETA:  10:52:42

t_step: 1000 	after first sample, sample_time: 0.008 	update_time: 0.011 	learn_time: 0.037 	sum: 17777.888 	size: 440440                        
Episode 22	Total score (averaged over agents) this episode: 31.209499302413313

Environment solved in 22 episodes!	Average Score: 11.38
t_step: 1000 	after first sample, sample_time: 0.008 	update_time: 0.013 	learn_time: 0.020 	sum: 19243.507 	size: 460454                        

training loop:   7% |###                                       | ETA:  10:50:18

t_step: 1000 	after first sample, sample_time: 0.007 	update_time: 0.012 	learn_time: 0.041 	sum: 19216.008 	size: 460460                        
Episode 23	Total score (averaged over agents) this episode: 32.124999281950295

Environment solved in 23 episodes!	Average Score: 12.92
t_step: 1000 	after first sample, sample_time: 0.007 	update_time: 0.012 	learn_time: 0.018 	sum: 20613.672 	size: 480478                        

training loop:   8% |###                                       | ETA:  10:47:59

t_step: 1000 	after first sample, sample_time: 0.006 	update_time: 0.014 	learn_time: 0.065 	sum: 20609.262 	size: 480480                        
Episode 24	Total score (averaged over agents) this episode: 34.851999220997094

Environment solved in 24 episodes!	Average Score: 14.58
t_step: 1000 	after first sample, sample_time: 0.007 	update_time: 0.013 	learn_time: 0.022 	sum: 22048.061 	size: 500495                        

training loop:   8% |###                                       | ETA:  10:47:07

t_step: 1000 	after first sample, sample_time: 0.007 	update_time: 0.011 	learn_time: 0.038 	sum: 22037.078 	size: 500500                        
Episode 25	Total score (averaged over agents) this episode: 32.473999274149534
t_step: 1000 	after first sample, sample_time: 0.007 	update_time: 0.012 	learn_time: 0.018 	sum: 23359.772 	size: 520517                                               

training loop:   8% |###                                       | ETA:  10:45:09

t_step: 1000 	after first sample, sample_time: 0.008 	update_time: 0.012 	learn_time: 0.018 	sum: 23351.972 	size: 520520                        
Episode 26	Total score (averaged over agents) this episode: 34.11399923749268
t_step: 1000 	after first sample, sample_time: 0.005 	update_time: 0.012 	learn_time: 0.033 	sum: 24643.474 	size: 540535                        

training loop:   9% |###                                       | ETA:  10:43:55

t_step: 1000 	after first sample, sample_time: 0.007 	update_time: 0.011 	learn_time: 0.019 	sum: 24621.926 	size: 540540                        
Episode 27	Total score (averaged over agents) this episode: 33.6564992477186
t_step: 1000 	after first sample, sample_time: 0.009 	update_time: 0.014 	learn_time: 0.020 	sum: 25862.705 	size: 560555                                               

training loop:   9% |###                                       | ETA:  10:42:20

t_step: 1000 	after first sample, sample_time: 0.007 	update_time: 0.013 	learn_time: 0.021 	sum: 25841.994 	size: 560560                        
Episode 28	Total score (averaged over agents) this episode: 32.03249928401783
t_step: 1000 	after first sample, sample_time: 0.006 	update_time: 0.011 	learn_time: 0.026 	sum: 26999.270 	size: 580575                        

training loop:   9% |####                                      | ETA:  10:40:45

t_step: 1000 	after first sample, sample_time: 0.006 	update_time: 0.011 	learn_time: 0.018 	sum: 26972.791 	size: 580580                        
Episode 29	Total score (averaged over agents) this episode: 33.24699925687164
t_step: 1000 	after first sample, sample_time: 0.008 	update_time: 0.013 	learn_time: 0.020 	sum: 28274.044 	size: 600594                                               

training loop:  10% |####                                      | ETA:  10:38:41

t_step: 1000 	after first sample, sample_time: 0.009 	update_time: 0.015 	learn_time: 0.053 	sum: 28252.242 	size: 600600                        
Episode 30	Total score (averaged over agents) this episode: 33.20149925788864
t_step: 1000 	after first sample, sample_time: 0.006 	update_time: 0.011 	learn_time: 0.017 	sum: 29391.186 	size: 620615                        

training loop:  10% |####                                      | ETA:  10:37:03

t_step: 1000 	after first sample, sample_time: 0.007 	update_time: 0.014 	learn_time: 0.052 	sum: 29375.611 	size: 620620                        
Episode 31	Total score (averaged over agents) this episode: 34.48199922926724
t_step: 1000 	after first sample, sample_time: 0.008 	update_time: 0.013 	learn_time: 0.020 	sum: 30564.272 	size: 640639                        

training loop:  10% |####                                      | ETA:  10:35:24

t_step: 1000 	after first sample, sample_time: 0.010 	update_time: 0.012 	learn_time: 0.022 	sum: 30554.189 	size: 640640                        
Episode 32	Total score (averaged over agents) this episode: 35.05699921641499

Environment solved in 32 episodes!	Average Score: 26.13
t_step: 1000 	after first sample, sample_time: 0.008 	update_time: 0.012 	learn_time: 0.018 	sum: 31679.900 	size: 660653                        

training loop:  11% |####                                      | ETA:  10:33:55

t_step: 1000 	after first sample, sample_time: 0.006 	update_time: 0.012 	learn_time: 0.020 	sum: 31658.959 	size: 660660                        
Episode 33	Total score (averaged over agents) this episode: 34.967999218404294
t_step: 1000 	after first sample, sample_time: 0.007 	update_time: 0.012 	learn_time: 0.033 	sum: 33026.741 	size: 680673                        

training loop:  11% |####                                      | ETA:  10:32:01

t_step: 1000 	after first sample, sample_time: 0.008 	update_time: 0.014 	learn_time: 0.020 	sum: 33011.460 	size: 680680                        
Episode 34	Total score (averaged over agents) this episode: 36.76199917830527

Environment solved in 34 episodes!	Average Score: 28.67
t_step: 1000 	after first sample, sample_time: 0.008 	update_time: 0.014 	learn_time: 0.020 	sum: 34142.003 	size: 700694                        

training loop:  11% |####                                      | ETA:  10:31:14

t_step: 1000 	after first sample, sample_time: 0.007 	update_time: 0.014 	learn_time: 0.020 	sum: 34117.575 	size: 700700                        
Episode 35	Total score (averaged over agents) this episode: 37.316499165911225

Environment solved in 35 episodes!	Average Score: 29.91
t_step: 1000 	after first sample, sample_time: 0.008 	update_time: 0.015 	learn_time: 0.052 	sum: 35254.688 	size: 720716                        

training loop:  12% |#####                                     | ETA:  10:29:21

t_step: 1000 	after first sample, sample_time: 0.007 	update_time: 0.014 	learn_time: 0.020 	sum: 35245.862 	size: 720720                        
Episode 36	Total score (averaged over agents) this episode: 37.69549915743992

Environment solved in 36 episodes!	Average Score: 31.07
t_step: 1000 	after first sample, sample_time: 0.008 	update_time: 0.013 	learn_time: 0.023 	sum: 36326.096 	size: 740737                        

training loop:  12% |#####                                     | ETA:  10:28:33

t_step: 1000 	after first sample, sample_time: 0.008 	update_time: 0.012 	learn_time: 0.019 	sum: 36317.717 	size: 740740                        
Episode 37	Total score (averaged over agents) this episode: 36.019499194901435
t_step: 1000 	after first sample, sample_time: 0.010 	update_time: 0.012 	learn_time: 0.019 	sum: 37454.017 	size: 760754                                                                       

training loop:  12% |#####                                     | ETA:  10:27:43

t_step: 1000 	after first sample, sample_time: 0.011 	update_time: 0.017 	learn_time: 0.023 	sum: 37432.075 	size: 760760                        
Episode 38	Total score (averaged over agents) this episode: 35.51849920609966
t_step: 1000 	after first sample, sample_time: 0.008 	update_time: 0.014 	learn_time: 0.021 	sum: 38485.371 	size: 780776                                               

training loop:  13% |#####                                     | ETA:  10:26:58

t_step: 1000 	after first sample, sample_time: 0.007 	update_time: 0.013 	learn_time: 0.019 	sum: 38469.258 	size: 780780                        
Episode 39	Total score (averaged over agents) this episode: 35.9669991960749
t_step: 1000 	after first sample, sample_time: 0.010 	update_time: 0.014 	learn_time: 0.020 	sum: 39645.053 	size: 800798                        

training loop:  13% |#####                                     | ETA:  10:26:04

t_step: 1000 	after first sample, sample_time: 0.009 	update_time: 0.016 	learn_time: 0.025 	sum: 39635.723 	size: 800800                        
Episode 40	Total score (averaged over agents) this episode: 36.28749918891117
t_step: 1000 	after first sample, sample_time: 0.012 	update_time: 0.018 	learn_time: 0.033 	sum: 40760.219 	size: 820819                        

training loop:  13% |#####                                     | ETA:  10:27:01

t_step: 1000 	after first sample, sample_time: 0.010 	update_time: 0.016 	learn_time: 0.042 	sum: 40757.202 	size: 820820                        
Episode 41	Total score (averaged over agents) this episode: 36.68349918005988
t_step: 1000 	after first sample, sample_time: 0.010 	update_time: 0.016 	learn_time: 0.023 	sum: 41650.607 	size: 840839                        

training loop:  14% |#####                                     | ETA:  10:27:23

t_step: 1000 	after first sample, sample_time: 0.009 	update_time: 0.013 	learn_time: 0.021 	sum: 41648.189 	size: 840840                        
Episode 42	Total score (averaged over agents) this episode: 37.61399915926158
t_step: 1000 	after first sample, sample_time: 0.009 	update_time: 0.013 	learn_time: 0.026 	sum: 42852.099 	size: 860859                        

training loop:  14% |######                                    | ETA:  10:27:13

t_step: 1000 	after first sample, sample_time: 0.007 	update_time: 0.014 	learn_time: 0.023 	sum: 42850.112 	size: 860860                        
Episode 43	Total score (averaged over agents) this episode: 37.89149915305897

Environment solved in 43 episodes!	Average Score: 35.29
t_step: 1000 	after first sample, sample_time: 0.011 	update_time: 0.017 	learn_time: 0.023 	sum: 43828.108 	size: 880879                        

training loop:  14% |######                                    | ETA:  10:27:54

t_step: 1000 	after first sample, sample_time: 0.008 	update_time: 0.014 	learn_time: 0.022 	sum: 43826.504 	size: 880880                        
Episode 44	Total score (averaged over agents) this episode: 35.95649919630959
t_step: 670 	after first sample, sample_time: 0.011 	update_time: 0.015 	learn_time: 0.039 	sum: 44572.631 	size: 894300                        

KeyboardInterrupt: 

In [8]:
# training: up to 300 episodes, with prioritized learning disabled
prioritized_learn = False
scores = Learn(env, 300, 10000, 0.0, 30.0, prioritized_learn)

training loop:   0% |                                          | ETA:  --:--:--

state size: 33 action size: 4


training loop:   0% |                                          | ETA:  17:07:11

Episode 1	Total score (averaged over agents) this episode: 0.5969999866560102

Environment solved in 1 episodes!	Average Score: 0.60


training loop:   0% |                                          | ETA:  16:16:28

Episode 2	Total score (averaged over agents) this episode: 0.6334999858401715

Environment solved in 2 episodes!	Average Score: 0.62


training loop:   1% |                                          | ETA:  16:14:53

Episode 3	Total score (averaged over agents) this episode: 1.0124999773688614

Environment solved in 3 episodes!	Average Score: 0.75


training loop:   1% |                                          | ETA:  16:36:00

Episode 4	Total score (averaged over agents) this episode: 1.4824999668635428

Environment solved in 4 episodes!	Average Score: 0.93


training loop:   1% |                                          | ETA:  16:31:28

Episode 5	Total score (averaged over agents) this episode: 1.765499960538

Environment solved in 5 episodes!	Average Score: 1.10


training loop:   2% |                                          | ETA:  16:27:24

Episode 6	Total score (averaged over agents) this episode: 2.2264999502338467

Environment solved in 6 episodes!	Average Score: 1.29


training loop:   2% |                                          | ETA:  16:34:56

Episode 7	Total score (averaged over agents) this episode: 2.4389999454841016

Environment solved in 7 episodes!	Average Score: 1.45


training loop:   2% |#                                         | ETA:  16:32:21

Episode 8	Total score (averaged over agents) this episode: 3.768999915756285

Environment solved in 8 episodes!	Average Score: 1.74


training loop:   3% |#                                         | ETA:  16:27:37

Episode 9	Total score (averaged over agents) this episode: 3.1969999285414814


training loop:   3% |#                                         | ETA:  16:17:28

Episode 10	Total score (averaged over agents) this episode: 2.5074999439530075


training loop:   3% |#                                         | ETA:  16:13:42

Episode 11	Total score (averaged over agents) this episode: 2.9969999330118298


training loop:   4% |#                                         | ETA:  16:13:50

Episode 12	Total score (averaged over agents) this episode: 3.101999930664897


training loop:   4% |#                                         | ETA:  16:08:24

Episode 13	Total score (averaged over agents) this episode: 3.263499927055091


training loop:   4% |#                                         | ETA:  16:05:23

Episode 14	Total score (averaged over agents) this episode: 3.8519999139010905

Environment solved in 14 episodes!	Average Score: 2.35


training loop:   5% |##                                        | ETA:  16:10:12

Episode 15	Total score (averaged over agents) this episode: 4.766999893449247

Environment solved in 15 episodes!	Average Score: 2.51


training loop:   5% |##                                        | ETA:  16:18:19

Episode 16	Total score (averaged over agents) this episode: 5.023999887704849

Environment solved in 16 episodes!	Average Score: 2.66


training loop:   5% |##                                        | ETA:  16:20:09

Episode 17	Total score (averaged over agents) this episode: 5.53749987622723

Environment solved in 17 episodes!	Average Score: 2.83


training loop:   6% |##                                        | ETA:  16:17:03

Episode 18	Total score (averaged over agents) this episode: 7.0034998434595765

Environment solved in 18 episodes!	Average Score: 3.07


training loop:   6% |##                                        | ETA:  16:14:43

Episode 19	Total score (averaged over agents) this episode: 6.449499855842442


training loop:   6% |##                                        | ETA:  16:12:49

Episode 20	Total score (averaged over agents) this episode: 7.763999826461077

Environment solved in 20 episodes!	Average Score: 3.47


training loop:   7% |##                                        | ETA:  16:13:50

Episode 21	Total score (averaged over agents) this episode: 7.745499826874584


training loop:   7% |###                                       | ETA:  16:13:27

Episode 22	Total score (averaged over agents) this episode: 7.492499832529575


training loop:   7% |###                                       | ETA:  16:10:13

Episode 23	Total score (averaged over agents) this episode: 6.854499846789986


training loop:   8% |###                                       | ETA:  16:07:35

Episode 24	Total score (averaged over agents) this episode: 7.843499824684113

Environment solved in 24 episodes!	Average Score: 4.78


training loop:   8% |###                                       | ETA:  16:06:35

Episode 25	Total score (averaged over agents) this episode: 8.717999805137515

Environment solved in 25 episodes!	Average Score: 5.13


training loop:   8% |###                                       | ETA:  16:11:25

Episode 26	Total score (averaged over agents) this episode: 9.204999794252217

Environment solved in 26 episodes!	Average Score: 5.48


training loop:   9% |###                                       | ETA:  16:12:12

Episode 27	Total score (averaged over agents) this episode: 9.500499787647277

Environment solved in 27 episodes!	Average Score: 5.83


training loop:   9% |###                                       | ETA:  16:11:04

Episode 28	Total score (averaged over agents) this episode: 9.043499797862022


training loop:   9% |####                                      | ETA:  16:12:41

Episode 29	Total score (averaged over agents) this episode: 8.712999805249273


training loop:  10% |####                                      | ETA:  16:11:24

Episode 30	Total score (averaged over agents) this episode: 8.99599979892373


training loop:  10% |####                                      | ETA:  16:13:51

Episode 31	Total score (averaged over agents) this episode: 9.817499780561775

Environment solved in 31 episodes!	Average Score: 7.03


training loop:  10% |####                                      | ETA:  16:17:15

Episode 32	Total score (averaged over agents) this episode: 8.47149981064722


training loop:  11% |####                                      | ETA:  16:15:33

Episode 33	Total score (averaged over agents) this episode: 10.238499771151691

Environment solved in 33 episodes!	Average Score: 7.65


training loop:  11% |####                                      | ETA:  16:14:17

Episode 34	Total score (averaged over agents) this episode: 11.491999743133784

Environment solved in 34 episodes!	Average Score: 8.03


training loop:  11% |####                                      | ETA:  16:12:14

Episode 35	Total score (averaged over agents) this episode: 11.25549974841997


training loop:  12% |#####                                     | ETA:  16:13:56

Episode 36	Total score (averaged over agents) this episode: 10.93449975559488


training loop:  12% |#####                                     | ETA:  16:16:05

Episode 37	Total score (averaged over agents) this episode: 13.38549970081076

Environment solved in 37 episodes!	Average Score: 9.05


training loop:  12% |#####                                     | ETA:  16:14:25

Episode 38	Total score (averaged over agents) this episode: 12.335999724268913


training loop:  13% |#####                                     | ETA:  16:12:43

Episode 39	Total score (averaged over agents) this episode: 12.091999729722739


training loop:  13% |#####                                     | ETA:  16:13:00

Episode 40	Total score (averaged over agents) this episode: 12.813999713584781


training loop:  13% |#####                                     | ETA:  16:13:36

Episode 41	Total score (averaged over agents) this episode: 12.144499728549272


training loop:  14% |#####                                     | ETA:  16:18:15

Episode 42	Total score (averaged over agents) this episode: 12.192499727476388


training loop:  14% |######                                    | ETA:  16:18:18

Episode 43	Total score (averaged over agents) this episode: 12.321499724593014


training loop:  14% |######                                    | ETA:  16:18:48

Episode 44	Total score (averaged over agents) this episode: 12.665999716892838


training loop:  15% |######                                    | ETA:  16:19:46

Episode 45	Total score (averaged over agents) this episode: 13.589499696251005

Environment solved in 45 episodes!	Average Score: 11.06


training loop:  15% |######                                    | ETA:  16:19:04

Episode 46	Total score (averaged over agents) this episode: 13.329499702062458


training loop:  15% |######                                    | ETA:  16:22:06

Episode 47	Total score (averaged over agents) this episode: 14.674499671999365

Environment solved in 47 episodes!	Average Score: 11.53


training loop:  16% |######                                    | ETA:  16:23:11

Episode 48	Total score (averaged over agents) this episode: 14.634999672882259


training loop:  16% |######                                    | ETA:  16:21:21

Episode 49	Total score (averaged over agents) this episode: 14.210999682359397


training loop:  16% |#######                                   | ETA:  16:22:18

Episode 50	Total score (averaged over agents) this episode: 15.300999657995998

Environment solved in 50 episodes!	Average Score: 12.40


training loop:  17% |#######                                   | ETA:  16:22:29

Episode 51	Total score (averaged over agents) this episode: 12.985499709751457


training loop:  17% |#######                                   | ETA:  16:23:49

Episode 52	Total score (averaged over agents) this episode: 15.603499651234596

Environment solved in 52 episodes!	Average Score: 12.91


training loop:  17% |#######                                   | ETA:  16:23:11

Episode 53	Total score (averaged over agents) this episode: 15.886499644909055

Environment solved in 53 episodes!	Average Score: 13.19


training loop:  18% |#######                                   | ETA:  16:23:53

Episode 54	Total score (averaged over agents) this episode: 15.849999645724893


training loop:  18% |#######                                   | ETA:  16:23:04

Episode 55	Total score (averaged over agents) this episode: 16.40649963328615

Environment solved in 55 episodes!	Average Score: 13.67


training loop:  18% |#######                                   | ETA:  16:24:42

Episode 56	Total score (averaged over agents) this episode: 17.31849961290136

Environment solved in 56 episodes!	Average Score: 13.99


training loop:  19% |#######                                   | ETA:  16:22:33

Episode 57	Total score (averaged over agents) this episode: 17.275999613851308


training loop:  19% |########                                  | ETA:  16:19:32

Episode 58	Total score (averaged over agents) this episode: 17.622999606095256

Environment solved in 58 episodes!	Average Score: 14.45


training loop:  19% |########                                  | ETA:  16:19:46

Episode 59	Total score (averaged over agents) this episode: 16.59449962908402


training loop:  20% |########                                  | ETA:  16:19:13

Episode 60	Total score (averaged over agents) this episode: 14.647499672602862


training loop:  20% |########                                  | ETA:  16:19:31

Episode 61	Total score (averaged over agents) this episode: 15.583999651670457


training loop:  20% |########                                  | ETA:  16:16:50

Episode 62	Total score (averaged over agents) this episode: 14.270499681029468


training loop:  21% |########                                  | ETA:  16:14:30

Episode 63	Total score (averaged over agents) this episode: 16.675999627262353


training loop:  21% |########                                  | ETA:  16:11:46

Episode 64	Total score (averaged over agents) this episode: 18.15899959411472

Environment solved in 64 episodes!	Average Score: 15.53


training loop:  21% |#########                                 | ETA:  16:09:37

Episode 65	Total score (averaged over agents) this episode: 19.722499559167773

Environment solved in 65 episodes!	Average Score: 15.84


training loop:  22% |#########                                 | ETA:  16:05:47

Episode 66	Total score (averaged over agents) this episode: 18.629499583598225


training loop:  22% |#########                                 | ETA:  16:01:55

Episode 67	Total score (averaged over agents) this episode: 20.863499533664434

Environment solved in 67 episodes!	Average Score: 16.41


training loop:  22% |#########                                 | ETA:  16:00:22

Episode 68	Total score (averaged over agents) this episode: 20.992999530769886

Environment solved in 68 episodes!	Average Score: 16.73


training loop:  23% |#########                                 | ETA:  15:56:56

Episode 69	Total score (averaged over agents) this episode: 22.997499485965818

Environment solved in 69 episodes!	Average Score: 17.17


training loop:  23% |#########                                 | ETA:  15:53:46

Episode 70	Total score (averaged over agents) this episode: 24.406499454472215

Environment solved in 70 episodes!	Average Score: 17.62


training loop:  23% |#########                                 | ETA:  15:49:51

Episode 71	Total score (averaged over agents) this episode: 23.108499483484774


training loop:  24% |##########                                | ETA:  15:46:13

Episode 72	Total score (averaged over agents) this episode: 23.248999480344356


training loop:  24% |##########                                | ETA:  15:43:26

Episode 73	Total score (averaged over agents) this episode: 23.187499481718987


training loop:  24% |##########                                | ETA:  15:39:25

Episode 74	Total score (averaged over agents) this episode: 24.137999460473658


training loop:  25% |##########                                | ETA:  15:35:18

Episode 75	Total score (averaged over agents) this episode: 25.104999438859522

Environment solved in 75 episodes!	Average Score: 19.73


training loop:  25% |##########                                | ETA:  15:31:03

Episode 76	Total score (averaged over agents) this episode: 27.55049938419834

Environment solved in 76 episodes!	Average Score: 20.24


training loop:  25% |##########                                | ETA:  15:27:01

Episode 77	Total score (averaged over agents) this episode: 29.332499344367534

Environment solved in 77 episodes!	Average Score: 20.84


training loop:  26% |##########                                | ETA:  15:23:04

Episode 78	Total score (averaged over agents) this episode: 31.74199929051101

Environment solved in 78 episodes!	Average Score: 21.55


training loop:  26% |###########                               | ETA:  15:19:01

Episode 79	Total score (averaged over agents) this episode: 32.509999273344874

Environment solved in 79 episodes!	Average Score: 22.34


training loop:  26% |###########                               | ETA:  15:14:53

Episode 80	Total score (averaged over agents) this episode: 31.997499284800142


training loop:  27% |###########                               | ETA:  15:10:44

Episode 81	Total score (averaged over agents) this episode: 31.128499304223805


training loop:  27% |###########                               | ETA:  15:06:23

Episode 82	Total score (averaged over agents) this episode: 30.159499325882642


training loop:  27% |###########                               | ETA:  15:02:13

Episode 83	Total score (averaged over agents) this episode: 31.877999287471177


training loop:  28% |###########                               | ETA:  14:58:13

Episode 84	Total score (averaged over agents) this episode: 33.78249924490228

Environment solved in 84 episodes!	Average Score: 26.32


training loop:  28% |###########                               | ETA:  14:54:11

Episode 85	Total score (averaged over agents) this episode: 38.14149914747104

Environment solved in 85 episodes!	Average Score: 27.25


training loop:  28% |############                              | ETA:  14:49:53

Episode 86	Total score (averaged over agents) this episode: 37.09899917077273


training loop:  29% |############                              | ETA:  14:45:48

Episode 87	Total score (averaged over agents) this episode: 37.968499151337895


training loop:  29% |############                              | ETA:  14:41:30

Episode 88	Total score (averaged over agents) this episode: 39.139499125164

Environment solved in 88 episodes!	Average Score: 29.93


training loop:  29% |############                              | ETA:  14:37:16

Episode 89	Total score (averaged over agents) this episode: 38.44449914069846


training loop:  30% |############                              | ETA:  14:33:07

Episode 90	Total score (averaged over agents) this episode: 37.57949916003272


training loop:  30% |############                              | ETA:  14:29:09

Episode 91	Total score (averaged over agents) this episode: 38.48849913971499


training loop:  30% |############                              | ETA:  14:25:03

Episode 92	Total score (averaged over agents) this episode: 38.78049913318828


training loop:  31% |#############                             | ETA:  14:21:01

Episode 93	Total score (averaged over agents) this episode: 37.67249915795401


training loop:  31% |#############                             | ETA:  14:17:01

Episode 94	Total score (averaged over agents) this episode: 37.230999167822304


training loop:  31% |#############                             | ETA:  14:12:54

Episode 95	Total score (averaged over agents) this episode: 37.305999166145924


training loop:  32% |#############                             | ETA:  14:09:11

Episode 96	Total score (averaged over agents) this episode: 37.370499164704235


training loop:  32% |#############                             | ETA:  14:05:10

Episode 97	Total score (averaged over agents) this episode: 37.372999164648355


training loop:  32% |#############                             | ETA:  14:01:18

Episode 98	Total score (averaged over agents) this episode: 38.32449914338067


training loop:  33% |#############                             | ETA:  13:57:40

Episode 99	Total score (averaged over agents) this episode: 38.09999914839864


training loop:  33% |##############                            | ETA:  13:53:41

Episode 100	Total score (averaged over agents) this episode: 37.88249915326014


training loop: 100% |###########################################| Time: 7:01:03

Episode 101	Total score (averaged over agents) this episode: 38.50299913939089
training done
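The "Environment solved in N episodes!" lines in the logs above do not mark the solve criterion. They appear to be printed whenever an episode sets a new best score: in the second run the message follows episodes 8 and 14 but not episodes 9 to 13. The "Average Score" printed alongside matches a 20-episode rolling mean; for example, episodes 5 to 24 of the second run average 4.78. The Udacity Reacher project description for the multi-agent version defines a different criterion. Each episode's score is the agents' returns averaged over all agents, and the environment counts as solved when the mean of that score over 100 consecutive episodes is at least +30. The sketch below tracks that criterion. `SolveTracker` and its methods are hypothetical names and are not part of this notebook's `Learn` function. The sketch also assumes you can collect each agent's total reward at the end of an episode.

```python
from collections import deque
from statistics import mean


class SolveTracker:
    """Hypothetical helper (not part of this notebook's Learn function).

    Keeps the last `window` episode scores and checks whether their mean
    reaches `target`, i.e. the 100-episode / +30 Reacher criterion by default.
    """

    def __init__(self, window=100, target=30.0):
        self.window = window
        self.target = target
        self.recent = deque(maxlen=window)  # oldest score drops out automatically
        self.best = float("-inf")

    def add_episode(self, agent_returns):
        """agent_returns: total reward collected by each agent this episode."""
        score = mean(agent_returns)  # episode score = mean over agents
        self.recent.append(score)
        new_best = score > self.best  # what the log's "solved" message seems to track
        self.best = max(self.best, score)
        return score, new_best

    def rolling_mean(self):
        return mean(self.recent)

    def solved(self):
        # Only meaningful once a full window of episodes has been recorded.
        return len(self.recent) == self.window and self.rolling_mean() >= self.target


if __name__ == "__main__":
    # Synthetic illustration: 20 agents that each score 31.0 every episode.
    tracker = SolveTracker()
    for episode in range(1, 151):
        tracker.add_episode([31.0] * 20)
        if tracker.solved():
            print(f"Solved after {episode} episodes, "
                  f"100-episode mean {tracker.rolling_mean():.2f}")
            break
```

A fixed-length `deque` holds only the most recent `window` scores. Separating `new_best` from `solved()` makes the difference between "best episode so far" and "criterion met" explicit.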



