# Collaboration and Competition

---

This notebook runs John's solution for the third project of the [Deep Reinforcement Learning Nanodegree](https://www.udacity.com/course/deep-reinforcement-learning-nanodegree--nd893) program.  It uses the Unity ML-Agents environment to train two cooperative agents to play a tennis-like game.

**Need more description here - refer to readme?**

...

This code depends on a custom Unity environment, provided by the Udacity staff, that implements this variant of tennis. When graphics are enabled, it opens a separate Unity window for visualizing the environment as the agents train or play.


## Define how we will use this notebook

In the next cell, set the appropriate values of a few control variables:
- **EXPLORE** determines whether the notebook runs exploratory training or an inference demonstration.
    - **True** runs a hyperparameter exploration loop that generates many training runs using random search. To use it well, study that cell and specify the ranges of hyperparameters to be explored.
    - **False** runs a few inference episodes of a pretrained model and opens a visualization window to watch it play.
- **config_name:** the name of a model configuration to be loaded from a checkpoint at the start of the exercise (e.g. "M37").
    - If EXPLORE = True, this is optional: it tells the training loop to start from this pretrained model and continue refining it. If the value is _None_, training starts from a randomly initialized model.
    - If EXPLORE = False, this must name a legitimate configuration.
- **run_number:** the run within that configuration. Together with config_name it forms the run tag (e.g. config_name "M37" and run_number 1 give "M37.01").
- **checkpoint_episode:** if a checkpoint is used to start the exercise, the episode at which that checkpoint was captured. The config_name, run_number and checkpoint_episode together are required to identify the checkpoint file.
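The setup cell builds the checkpoint directory and run tag from `config_name` and `run_number` with two format strings. The sketch below wraps those same two lines in a hypothetical helper, `checkpoint_id`, purely to show the naming scheme; the notebook itself computes them inline.

```python
def checkpoint_id(config_name, run_number):
    """Return (directory, tag) the way the setup cell builds them.

    Hypothetical helper for illustration; the notebook does this inline.
    """
    checkpoint_path = "checkpoint/{}/".format(config_name)
    tag = "{}.{:02d}".format(config_name, run_number)  # run number is zero-padded to 2 digits
    return checkpoint_path, tag


path, tag = checkpoint_id("M37", 1)
print(path)  # checkpoint/M37/
print(tag)   # M37.01
```

The episode number is passed separately to `restore_checkpoint`, so the exact file name on disk depends on how `maddpg.py` (not shown here) combines the directory, tag and episode.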


In [1]:
EXPLORE            = True
config_name        = None # None unless starting from a checkpoint (required when EXPLORE = False)
run_number         = 0
checkpoint_episode = 2

In [2]:
import numpy as np
import time
import matplotlib.pyplot as plt
from unityagents import UnityEnvironment
from train import train
from maddpg import Maddpg

%matplotlib inline

initial_episode = checkpoint_episode
checkpoint_path = "checkpoint/{}/".format(config_name)
tag = "{}.{:02d}".format(config_name, run_number)

if EXPLORE:
    # training: run headless in fast train mode, from episode 0 unless resuming a checkpoint
    turn_off_graphics = True
    unity_train_mode = True
    if config_name is None:
        initial_episode = 0
else:
    # inference: show the game window and run at normal speed
    turn_off_graphics = False
    unity_train_mode = False

# create a new Unity environment
# it needs to be done once, outside any loop, as closing an environment then restarting causes
# a Unity exception about the handle no longer being active.
env = UnityEnvironment(file_name="Tennis_Linux/Tennis.x86_64", seed=0, 
                       no_graphics=turn_off_graphics)
brain_name = env.brain_names[0]
brain = env.brains[brain_name]                       
env_info = env.reset(train_mode=unity_train_mode)[brain_name]
num_agents = len(env_info.agents)
action_size = brain.vector_action_space_size
states = env_info.vector_observations
state_size = states.shape[1]


INFO:unityagents:
'Academy' started successfully!
Unity Academy name: Academy
        Number of Brains: 1
        Number of External Brains : 1
        Lesson number : 0
        Reset Parameters :
		
Unity brain name: TennisBrain
        Number of Visual Observations (per agent): 0
        Vector Observation space type: continuous
        Vector Observation space size (per agent): 8
        Number of stacked Vector Observation: 3
        Vector Action space type: continuous
        Vector Action space size (per agent): 2
        Vector Action descriptions: , 
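The log reports 8 observation values per agent with 3 stacked observations, so each agent's row of `vector_observations` should hold 8 × 3 = 24 values, which is what `state_size` picks up. A sketch of splitting one such row back into its frames, assuming the stacked frames are laid out contiguously one after another (that ordering is an assumption; the log does not state it):

```python
import numpy as np

OBS_SIZE = 8     # "Vector Observation space size (per agent)" in the log
NUM_STACKED = 3  # "Number of stacked Vector Observation" in the log


def split_stacked(obs_row, obs_size=OBS_SIZE, num_stacked=NUM_STACKED):
    """Reshape a flat stacked observation into (num_stacked, obs_size).

    Assumes the stacked frames are concatenated end to end.
    """
    obs_row = np.asarray(obs_row)
    expected = obs_size * num_stacked
    if obs_row.shape[-1] != expected:
        raise ValueError("expected {} values, got {}".format(expected, obs_row.shape[-1]))
    return obs_row.reshape(num_stacked, obs_size)


# stand-in for one agent's row of env_info.vector_observations
row = np.arange(24, dtype=float)
frames = split_stacked(row)
print(frames.shape)  # (3, 8)
```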


## Train the agents

The next cells invoke the training code to create and train the agents. The core implementation lives in the Python modules in this project directory, such as `train.py` and `maddpg.py`, which are imported in the setup cell.
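The project files are not reproduced in this notebook. For orientation, the `tau` argument passed to `Maddpg` in the exploration cell is, in DDPG-style methods, the rate of the soft target-network update θ_target ← τ·θ_local + (1 − τ)·θ_target. A sketch of that rule on plain NumPy float arrays, assuming `maddpg.py` follows the standard form (its actual code is not shown here):

```python
import numpy as np


def soft_update(local_params, target_params, tau):
    """Blend each target array toward its local counterpart, in place.

    target = tau * local + (1 - tau) * target, the standard DDPG soft update.
    Arrays must be float so the in-place multiply is valid.
    """
    for local, target in zip(local_params, target_params):
        target *= (1.0 - tau)
        target += tau * local


local_weights = [np.ones(3), np.full(2, 4.0)]
target_weights = [np.zeros(3), np.zeros(2)]
soft_update(local_weights, target_weights, tau=0.01)
print(target_weights[0])  # [0.01 0.01 0.01]
print(target_weights[1])  # [0.04 0.04]
```

With TAU sampled between 0.001 and 0.01, the target weights track the local weights with a time constant of roughly 1/τ, i.e. about 100 to 1,000 updates.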

In [3]:
import numpy as np

class RandomSampler:
    
    def __init__(self, vars):
        """Accepts the definition of the set of variables to be sampled.
            
            Params:
                vars (list of lists): each item is a list containing:
                                        item 0 - either 'discrete', 'continuous-int' or 'continuous-float'
                                        items 1-N depend on the value of item 0:
                                        if discrete, these are the set of values to choose from
                                        if continuous, these are the min & max bounds of the range
                                        (for 'continuous-int' the max is exclusive)
        """
        
        self.vars = vars
        
        self.rng = np.random.default_rng()
    
    
    def sample(self):
        """Draws a random sample of all variables at its disposal.
        
            Returns a list of values in the order of definition.
        """

        rtn = []
        for v in self.vars:
            if v[0] == "discrete":
                # pick one of the listed values (indices 1..len(v)-1)
                choice = self.rng.integers(low=1, high=len(v), size=1)[0]
                rtn.append(v[choice])
                
            elif v[0] == "continuous-int":
                # integer in [min, max)
                choice = self.rng.integers(low=v[1], high=v[2], size=1)[0]
                rtn.append(choice)
                
            elif v[0] == "continuous-float":
                # float in [min, max)
                choice = self.rng.random() * (v[2]-v[1]) + v[1]
                rtn.append(choice)
            
            else:
                # fail loudly rather than silently shifting the positions of later values
                raise ValueError("RandomSampler: unknown type {}".format(v[0]))
            
        return rtn
                
vars = [["discrete", 88, 66, 11, 22, 33, 44, 99, 101, 77],
        ["discrete", 500], #1-item list
        ["continuous-int", 43, 44], #1-item range
        ["continuous-int", 0, 10],
        ["continuous-float", 0.0, 1.0],
        ["continuous-float", -3.3, 0.0],
        ["continuous-float", -1.0, 6.4],
       ]
rs = RandomSampler(vars)

for i in range(3):
    out = rs.sample()
    print("\n", i, "\n", out)



 0 
 [88, 500, 43, 7, 0.052603964002829295, -1.571946362422261, 3.6376210279320276]

 1 
 [22, 500, 43, 8, 0.5041338170520017, -1.7431735660157148, 5.454940610488104]

 2 
 [22, 500, 43, 8, 0.22113401413347833, -2.4283701752803775, 3.382426406103554]
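NumPy's `Generator.integers` excludes `high` by default, so a `continuous-int` range `[lo, hi]` draws from `lo` through `hi - 1`. That is why `["continuous-int", 43, 44]` always yields 43 above, and why the `LEARN_ITER` range `[1, 2]` in the next cell always yields 1. A quick check, with `endpoint=True` shown as the way to include the upper bound if that is what's wanted:

```python
import numpy as np

rng = np.random.default_rng(0)

# default: high is exclusive, so only 43 can come out
draws = rng.integers(low=43, high=44, size=1000)
print(set(draws.tolist()))  # {43}

# endpoint=True makes high inclusive: both 1 and 2 are possible
draws_inclusive = rng.integers(low=1, high=2, size=1000, endpoint=True)
print(sorted(set(draws_inclusive.tolist())))  # [1, 2]
```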


In [4]:
# This cell will explore several combinations of hyperparams by training all of them
# Use a random search for the hyperparams

TIME_STEPS         = 600
SAVE_ANALYSIS      = False
MODEL_DISPLAY_STEP = 0 #200k is approx 10k episodes at bad_step_prob = 0.01

if EXPLORE:
    
    # fixed for the session:
    RUN_PREFIX        = "M45"
    EPISODES          = 10001
    NUM_RUNS          = 30
    BUFFER_PRIME_SIZE = 5000
    WEIGHT_DECAY      = 1.0e-5 #was 1.0e-5
    GAMMA             = 0.99
    LR_ANNEAL_FREQ    = 10000 #episodes
    LR_ANNEAL_MULT    = 1.0
    SEED              = 44939 #(0, 111, 468, 5555, 23100, 44939)
    
    # session variables:
    vars = [
            ["discrete",         0.15,      1.00],      #BAD_STEP_PROB
            ["continuous-float", 0.999000,  0.999900],  #NOISE_DECAY
            ["discrete",         0.020, 0.100, 1.000],  #NOISE_SCALE (was 0.040, 1.0)
            ["continuous-float", 0.0000200, 0.0010000], #LR_ACTOR  (was 0.000010, 0.000080)
            ["continuous-float", 0.08,      1.0],       #LR_RATIO (determines LR_CRITIC)
            ["discrete",         2, 8, 20, 80],         #LEARN_EVERY
            ["continuous-int",   1,         2],         #LEARN_ITER (max is exclusive, so always 1)
            ["continuous-float", 0.00100,   0.01000],   #TAU
            ["discrete",         256]                   #BATCH
           ]
    rs = RandomSampler(vars)
    
    print("Ready to train {} over {} training sets for {} episodes each, with fixed params:"
          .format(RUN_PREFIX, NUM_RUNS, EPISODES))
    print("    Max episodes   = ", EPISODES)
    print("    Weight decay   = ", WEIGHT_DECAY)
    print("    Gamma          = ", GAMMA)
    print("    LR anneal freq = ", LR_ANNEAL_FREQ)
    print("    LR anneal mult = ", LR_ANNEAL_MULT)
    print("    Buf prime size = ", BUFFER_PRIME_SIZE)
            
    for set_id in range(NUM_RUNS):
        
        # sample the variables
        v = rs.sample()
        BAD_STEP_PROB = v[0]
        NOISE_DECAY   = v[1]
        NOISE_SCALE   = v[2]
        LR_ACTOR      = v[3]
        LR_CRITIC     = v[4] * LR_ACTOR
        LEARN_EVERY   = v[5]
        LEARN_ITER    = v[6]
        TAU           = v[7]
        BATCH         = v[8]

        # replay buffer size; the commented-out formula sized it so that it fills after
        # ~5000 bad episodes (at ~14 experiences/episode), based on the bad step retention rate
        #buffer_size = int(60000 - 50000*(1.0 - BAD_STEP_PROB))
        buffer_size = 100000

        RUN_NAME = "{}.{:02d}".format(RUN_PREFIX, set_id)
        print("\n///// Beginning training set ", RUN_NAME, " with:")
        print("      Batch size       = {:d}".format(BATCH))
        print("      Buffer size      = {:d}".format(buffer_size))
        print("      Bad step prob    = {:.4f}".format(BAD_STEP_PROB))
        print("      Noise decay      = {:.6f}".format(NOISE_DECAY))
        print("      Noise scale      = {:.3f}".format(NOISE_SCALE))
        print("      LR actor         = {:.7f}".format(LR_ACTOR))
        print("      LR critic        = {:.7f}".format(LR_CRITIC))
        print("      Learning every     ", LEARN_EVERY, " time steps")
        print("      Learn iterations = ", LEARN_ITER)
        print("      Tau              = {:.5f}".format(TAU))
        print("      Seed             = ", SEED)

        ##### instantiate the agents and perform the training

        maddpg = Maddpg(state_size, action_size, 2, bad_step_prob=BAD_STEP_PROB,
                        random_seed=SEED, batch_size=BATCH, buffer_size=buffer_size,
                        noise_decay=NOISE_DECAY, buffer_prime_size=BUFFER_PRIME_SIZE,
                        learn_every=LEARN_EVERY, 
                        learn_iter=LEARN_ITER, lr_actor=LR_ACTOR, lr_critic=LR_CRITIC,
                        lr_anneal_freq=LR_ANNEAL_FREQ, lr_anneal_mult=LR_ANNEAL_MULT,
                        weight_decay=WEIGHT_DECAY, gamma=GAMMA, noise_scale=NOISE_SCALE,
                        tau=TAU, model_display_step=MODEL_DISPLAY_STEP)
        
        if config_name is not None:
            print("///// Beginning training from checkpoint for {}, episode {}"
                  .format(tag, initial_episode))
            maddpg.restore_checkpoint(checkpoint_path, tag, initial_episode)

        scores, avgs = train(maddpg, env, run_name=RUN_NAME, starting_episode=initial_episode,
                             max_episodes=EPISODES, winning_score=0.5, max_time_steps=TIME_STEPS,
                             checkpoint_interval=1000)

        ##### plot the training reward history

        fig = plt.figure()
        ax = fig.add_subplot(111)
        plt.plot(np.arange(len(scores)), scores)
        plt.ylabel('Score')
        plt.xlabel('Episode #')
        plt.show()
        
        fig = plt.figure()
        ax = fig.add_subplot(111)
        plt.plot(np.arange(len(avgs)), avgs)
        plt.ylabel('Avg Score')
        plt.xlabel('Episode #')
        plt.show()

        ##### store the action/noise data, if being used

        if SAVE_ANALYSIS:
            maddpg.save_anal_data(RUN_PREFIX)

                            
    print("\n\nDONE!")

Ready to train M45 over 30 training sets for 12001 episodes each, with fixed params:
    Max episodes   =  12001
    Weight decay   =  1e-05
    Gamma          =  0.99
    LR anneal freq =  10000
    LR anneal mult =  1.0
    Buf prime size =  1000

///// Beginning training set  M45.00  with:
      Batch size       = 256
      Buffer size      = 100000
      Bad step prob    = 1.0000
      Noise decay      = 0.999547
      Noise scale      = 0.100
      LR actor         = 0.0003907
      LR critic        = 0.0000486
      Learning every      20  time steps
      Learn iterations =  1
      Tau              = 0.00271
      Seed             =  44939
Priming the replay buffer.....!

164	Running avg/max: 0.003/0.100,  mem:   3529/     7 ( 0.2%), avg 570.1 eps/min   
* noise mult = 0.1
401	Running avg/max: 0.006/0.100,  mem:   7218/    15 ( 0.2%), avg 561.6 eps/min   hit_rate =  0.03740648379052369
402	Running avg/max: 0.006/0.100,  mem:   7232/    15 ( 0.2%), avg 562.5 eps/min   hit_rate =

467	Running avg/max: 0.004/0.100,  mem:   8211/    17 ( 0.2%), avg 559.3 eps/min   hit_rate =  0.03640256959314775
468	Running avg/max: 0.004/0.100,  mem:   8226/    17 ( 0.2%), avg 559.0 eps/min   hit_rate =  0.03632478632478633
469	Running avg/max: 0.004/0.100,  mem:   8240/    17 ( 0.2%), avg 559.8 eps/min   hit_rate =  0.03624733475479744
470	Running avg/max: 0.005/0.100,  mem:   8266/    18 ( 0.2%), avg 559.3 eps/min   hit_rate =  0.03829787234042553
471	Running avg/max: 0.006/0.100,  mem:   8297/    19 ( 0.2%), avg 557.7 eps/min   hit_rate =  0.040339702760084924
472	Running avg/max: 0.006/0.100,  mem:   8311/    19 ( 0.2%), avg 557.6 eps/min   hit_rate =  0.04025423728813559
473	Running avg/max: 0.006/0.100,  mem:   8326/    19 ( 0.2%), avg 558.4 eps/min   hit_rate =  0.040169133192389
474	Running avg/max: 0.006/0.100,  mem:   8340/    19 ( 0.2%), avg 558.2 eps/min   hit_rate =  0.04008438818565401
475	Running avg/max: 0.006/0.100,  mem:   8355/    19 ( 0.2%), avg 558.1 eps/min 

539	Running avg/max: 0.007/0.100,  mem:   9369/    23 ( 0.2%), avg 551.6 eps/min   hit_rate =  0.04267161410018553

* noise mult = 0.0005
540	Running avg/max: 0.007/0.100,  mem:   9384/    23 ( 0.2%), avg 551.0 eps/min   hit_rate =  0.04259259259259259
541	Running avg/max: 0.007/0.100,  mem:   9398/    23 ( 0.2%), avg 551.5 eps/min   hit_rate =  0.04251386321626617
542	Running avg/max: 0.008/0.100,  mem:   9428/    24 ( 0.3%), avg 549.4 eps/min   hit_rate =  0.04428044280442804
543	Running avg/max: 0.007/0.100,  mem:   9443/    24 ( 0.3%), avg 550.0 eps/min   hit_rate =  0.04419889502762431
544	Running avg/max: 0.007/0.100,  mem:   9457/    24 ( 0.3%), avg 549.8 eps/min   hit_rate =  0.04411764705882353
545	Running avg/max: 0.008/0.100,  mem:   9476/    25 ( 0.3%), avg 549.5 eps/min   hit_rate =  0.045871559633027525
546	Running avg/max: 0.008/0.100,  mem:   9492/    25 ( 0.3%), avg 549.2 eps/min   hit_rate =  0.045787545787545784
547	Running avg/max: 0.008/0.100,  mem:   9524/    25 (

610	Running avg/max: 0.005/0.090,  mem:  10488/    28 ( 0.3%), avg 544.6 eps/min   hit_rate =  0.04590163934426229
611	Running avg/max: 0.005/0.090,  mem:  10502/    28 ( 0.3%), avg 544.3 eps/min   hit_rate =  0.04582651391162029
612	Running avg/max: 0.005/0.090,  mem:  10516/    28 ( 0.3%), avg 544.0 eps/min   hit_rate =  0.0457516339869281
613	Running avg/max: 0.005/0.090,  mem:  10530/    28 ( 0.3%), avg 544.4 eps/min   hit_rate =  0.04567699836867863
614	Running avg/max: 0.005/0.090,  mem:  10545/    28 ( 0.3%), avg 543.8 eps/min   hit_rate =  0.04560260586319218
615	Running avg/max: 0.005/0.090,  mem:  10559/    28 ( 0.3%), avg 543.3 eps/min   hit_rate =  0.04552845528455285
616	Running avg/max: 0.005/0.090,  mem:  10579/    28 ( 0.3%), avg 542.6 eps/min   hit_rate =  0.045454545454545456
617	Running avg/max: 0.005/0.090,  mem:  10593/    28 ( 0.3%), avg 543.2 eps/min   hit_rate =  0.04538087520259319
618	Running avg/max: 0.005/0.090,  mem:  10607/    28 ( 0.3%), avg 543.2 eps/min

682	Running avg/max: 0.004/0.200,  mem:  11572/    30 ( 0.3%), avg 540.6 eps/min   hit_rate =  0.04398826979472141
683	Running avg/max: 0.004/0.200,  mem:  11591/    30 ( 0.3%), avg 540.1 eps/min   hit_rate =  0.043923865300146414
684	Running avg/max: 0.004/0.200,  mem:  11606/    30 ( 0.3%), avg 540.6 eps/min   hit_rate =  0.043859649122807015
685	Running avg/max: 0.004/0.200,  mem:  11620/    30 ( 0.3%), avg 540.4 eps/min   hit_rate =  0.043795620437956206
686	Running avg/max: 0.004/0.200,  mem:  11634/    30 ( 0.3%), avg 540.2 eps/min   hit_rate =  0.043731778425655975
687	Running avg/max: 0.004/0.200,  mem:  11648/    30 ( 0.3%), avg 540.8 eps/min   hit_rate =  0.043668122270742356
688	Running avg/max: 0.004/0.200,  mem:  11662/    30 ( 0.3%), avg 540.7 eps/min   hit_rate =  0.0436046511627907
689	Running avg/max: 0.004/0.200,  mem:  11677/    30 ( 0.3%), avg 540.4 eps/min   hit_rate =  0.04354136429608128
690	Running avg/max: 0.004/0.200,  mem:  11691/    30 ( 0.3%), avg 540.3 eps

755	Running avg/max: 0.003/0.200,  mem:  12653/    31 ( 0.2%), avg 539.0 eps/min   hit_rate =  0.04105960264900662
756	Running avg/max: 0.003/0.200,  mem:  12667/    31 ( 0.2%), avg 538.7 eps/min   hit_rate =  0.041005291005291
757	Running avg/max: 0.003/0.200,  mem:  12682/    31 ( 0.2%), avg 538.6 eps/min   hit_rate =  0.04095112285336856
758	Running avg/max: 0.003/0.200,  mem:  12696/    31 ( 0.2%), avg 539.1 eps/min   hit_rate =  0.040897097625329816
759	Running avg/max: 0.001/0.090,  mem:  12710/    31 ( 0.2%), avg 539.1 eps/min   hit_rate =  0.04084321475625823
760	Running avg/max: 0.001/0.090,  mem:  12724/    31 ( 0.2%), avg 539.0 eps/min   hit_rate =  0.04078947368421053
761	Running avg/max: 0.001/0.090,  mem:  12739/    31 ( 0.2%), avg 539.4 eps/min   hit_rate =  0.040735873850197106
762	Running avg/max: 0.001/0.090,  mem:  12753/    31 ( 0.2%), avg 539.4 eps/min   hit_rate =  0.04068241469816273
763	Running avg/max: 0.001/0.090,  mem:  12768/    31 ( 0.2%), avg 539.2 eps/min

828	Running avg/max: 0.002/0.090,  mem:  13721/    32 ( 0.2%), avg 538.1 eps/min   hit_rate =  0.03864734299516908
829	Running avg/max: 0.002/0.090,  mem:  13735/    32 ( 0.2%), avg 537.8 eps/min   hit_rate =  0.038600723763570564
830	Running avg/max: 0.001/0.090,  mem:  13749/    32 ( 0.2%), avg 537.6 eps/min   hit_rate =  0.03855421686746988
831	Running avg/max: 0.001/0.090,  mem:  13763/    32 ( 0.2%), avg 538.0 eps/min   hit_rate =  0.03850782190132371
832	Running avg/max: 0.001/0.090,  mem:  13777/    32 ( 0.2%), avg 537.8 eps/min   hit_rate =  0.038461538461538464
833	Running avg/max: 0.001/0.090,  mem:  13792/    32 ( 0.2%), avg 537.6 eps/min   hit_rate =  0.03841536614645858
834	Running avg/max: 0.001/0.090,  mem:  13806/    32 ( 0.2%), avg 538.1 eps/min   hit_rate =  0.03836930455635491
835	Running avg/max: 0.001/0.090,  mem:  13820/    32 ( 0.2%), avg 538.0 eps/min   hit_rate =  0.03832335329341317
836	Running avg/max: 0.001/0.090,  mem:  13834/    32 ( 0.2%), avg 537.8 eps/m

900	Running avg/max: 0.005/0.200,  mem:  14833/    36 ( 0.2%), avg 536.7 eps/min   hit_rate =  0.04
901	Running avg/max: 0.005/0.200,  mem:  14847/    36 ( 0.2%), avg 536.7 eps/min   hit_rate =  0.03995560488346282
902	Running avg/max: 0.005/0.200,  mem:  14861/    36 ( 0.2%), avg 537.1 eps/min   hit_rate =  0.03991130820399113
903	Running avg/max: 0.005/0.200,  mem:  14875/    36 ( 0.2%), avg 537.1 eps/min   hit_rate =  0.03986710963455149
904	Running avg/max: 0.005/0.200,  mem:  14890/    36 ( 0.2%), avg 537.1 eps/min   hit_rate =  0.03982300884955752
905	Running avg/max: 0.005/0.200,  mem:  14904/    36 ( 0.2%), avg 536.9 eps/min   hit_rate =  0.039779005524861875
906	Running avg/max: 0.005/0.200,  mem:  14936/    36 ( 0.2%), avg 536.7 eps/min   hit_rate =  0.039735099337748346
907	Running avg/max: 0.005/0.200,  mem:  14951/    36 ( 0.2%), avg 536.7 eps/min   hit_rate =  0.03969128996692393
908	Running avg/max: 0.005/0.200,  mem:  14965/    36 ( 0.2%), avg 537.1 eps/min   hit_rate =

974	Running avg/max: 0.005/0.200,  mem:  16006/    39 ( 0.2%), avg 538.0 eps/min   hit_rate =  0.04004106776180698
975	Running avg/max: 0.005/0.200,  mem:  16020/    39 ( 0.2%), avg 538.0 eps/min   hit_rate =  0.04
976	Running avg/max: 0.004/0.200,  mem:  16035/    39 ( 0.2%), avg 538.4 eps/min   hit_rate =  0.039959016393442626
977	Running avg/max: 0.004/0.200,  mem:  16049/    39 ( 0.2%), avg 538.3 eps/min   hit_rate =  0.03991811668372569
978	Running avg/max: 0.004/0.200,  mem:  16063/    39 ( 0.2%), avg 537.8 eps/min   hit_rate =  0.03987730061349693
979	Running avg/max: 0.004/0.200,  mem:  16077/    39 ( 0.2%), avg 538.0 eps/min   hit_rate =  0.03983656792645557
980	Running avg/max: 0.003/0.200,  mem:  16091/    39 ( 0.2%), avg 537.7 eps/min   hit_rate =  0.03979591836734694
981	Running avg/max: 0.003/0.200,  mem:  16106/    39 ( 0.2%), avg 537.6 eps/min   hit_rate =  0.039755351681957186
982	Running avg/max: 0.003/0.200,  mem:  16120/    39 ( 0.2%), avg 538.0 eps/min   hit_rate =

1046	Running avg/max: 0.003/0.100,  mem:  17141/    42 ( 0.2%), avg 536.0 eps/min   hit_rate =  0.040152963671128104
1047	Running avg/max: 0.003/0.100,  mem:  17155/    42 ( 0.2%), avg 535.5 eps/min   hit_rate =  0.04011461318051576
1048	Running avg/max: 0.003/0.100,  mem:  17169/    42 ( 0.2%), avg 535.8 eps/min   hit_rate =  0.04007633587786259
1049	Running avg/max: 0.003/0.100,  mem:  17184/    42 ( 0.2%), avg 535.3 eps/min   hit_rate =  0.04003813155386082
1050	Running avg/max: 0.003/0.100,  mem:  17198/    42 ( 0.2%), avg 535.0 eps/min   hit_rate =  0.04
1051	Running avg/max: 0.003/0.100,  mem:  17213/    42 ( 0.2%), avg 535.3 eps/min   hit_rate =  0.039961941008563276
1052	Running avg/max: 0.003/0.100,  mem:  17227/    42 ( 0.2%), avg 535.2 eps/min   hit_rate =  0.039923954372623575
1053	Running avg/max: 0.003/0.100,  mem:  17241/    42 ( 0.2%), avg 535.1 eps/min   hit_rate =  0.039886039886039885
1054	Running avg/max: 0.003/0.100,  mem:  17256/    42 ( 0.2%), avg 535.0 eps/min  

1117	Running avg/max: 0.005/0.100,  mem:  18274/    44 ( 0.2%), avg 531.0 eps/min   hit_rate =  0.03939122649955237
1118	Running avg/max: 0.004/0.100,  mem:  18288/    44 ( 0.2%), avg 530.9 eps/min   hit_rate =  0.03935599284436494
1119	Running avg/max: 0.004/0.100,  mem:  18302/    44 ( 0.2%), avg 531.2 eps/min   hit_rate =  0.03932082216264522
1120	Running avg/max: 0.004/0.100,  mem:  18317/    44 ( 0.2%), avg 531.0 eps/min   hit_rate =  0.039285714285714285
1121	Running avg/max: 0.004/0.100,  mem:  18331/    44 ( 0.2%), avg 530.9 eps/min   hit_rate =  0.039250669045495096
1122	Running avg/max: 0.004/0.100,  mem:  18345/    44 ( 0.2%), avg 531.2 eps/min   hit_rate =  0.0392156862745098
1123	Running avg/max: 0.004/0.100,  mem:  18359/    44 ( 0.2%), avg 531.1 eps/min   hit_rate =  0.039180765805877114
1124	Running avg/max: 0.004/0.100,  mem:  18373/    44 ( 0.2%), avg 531.0 eps/min   hit_rate =  0.03914590747330961
1125	Running avg/max: 0.004/0.100,  mem:  18388/    44 ( 0.2%), avg 53

1188	Running avg/max: 0.004/0.200,  mem:  19377/    46 ( 0.2%), avg 529.7 eps/min   hit_rate =  0.03872053872053872
1189	Running avg/max: 0.004/0.200,  mem:  19392/    46 ( 0.2%), avg 529.8 eps/min   hit_rate =  0.03868797308662742
1190	Running avg/max: 0.004/0.200,  mem:  19406/    46 ( 0.2%), avg 529.6 eps/min   hit_rate =  0.03865546218487395
1191	Running avg/max: 0.004/0.200,  mem:  19430/    46 ( 0.2%), avg 529.3 eps/min   hit_rate =  0.038623005877413935
1192	Running avg/max: 0.004/0.200,  mem:  19444/    46 ( 0.2%), avg 529.2 eps/min   hit_rate =  0.03859060402684564
1193	Running avg/max: 0.004/0.200,  mem:  19459/    46 ( 0.2%), avg 529.6 eps/min   hit_rate =  0.038558256496227995
1194	Running avg/max: 0.004/0.200,  mem:  19473/    46 ( 0.2%), avg 529.6 eps/min   hit_rate =  0.038525963149078725
1195	Running avg/max: 0.004/0.200,  mem:  19487/    46 ( 0.2%), avg 529.5 eps/min   hit_rate =  0.038493723849372385
1196	Running avg/max: 0.004/0.200,  mem:  19502/    46 ( 0.2%), avg 

1261	Running avg/max: 0.009/0.200,  mem:  20538/    53 ( 0.3%), avg 528.1 eps/min   hit_rate =  0.04203013481363997
1262	Running avg/max: 0.009/0.200,  mem:  20552/    53 ( 0.3%), avg 528.4 eps/min   hit_rate =  0.041996830427892234
1263	Running avg/max: 0.009/0.200,  mem:  20567/    53 ( 0.3%), avg 528.4 eps/min   hit_rate =  0.04196357878068092
1264	Running avg/max: 0.009/0.200,  mem:  20581/    53 ( 0.3%), avg 528.3 eps/min   hit_rate =  0.041930379746835444
1265	Running avg/max: 0.009/0.200,  mem:  20595/    53 ( 0.3%), avg 528.1 eps/min   hit_rate =  0.04189723320158103
1266	Running avg/max: 0.009/0.200,  mem:  20619/    53 ( 0.3%), avg 527.7 eps/min   hit_rate =  0.04186413902053712
1267	Running avg/max: 0.009/0.200,  mem:  20633/    53 ( 0.3%), avg 528.0 eps/min   hit_rate =  0.041831097079715863
1268	Running avg/max: 0.009/0.200,  mem:  20648/    53 ( 0.3%), avg 527.9 eps/min   hit_rate =  0.0417981072555205
1269	Running avg/max: 0.009/0.200,  mem:  20662/    53 ( 0.3%), avg 52

1332	Running avg/max: 0.004/0.200,  mem:  21561/    53 ( 0.2%), avg 528.9 eps/min   hit_rate =  0.03978978978978979
1333	Running avg/max: 0.004/0.200,  mem:  21576/    53 ( 0.2%), avg 529.1 eps/min   hit_rate =  0.03975993998499625
1334	Running avg/max: 0.004/0.200,  mem:  21590/    53 ( 0.2%), avg 529.1 eps/min   hit_rate =  0.03973013493253373
1335	Running avg/max: 0.004/0.200,  mem:  21604/    53 ( 0.2%), avg 529.0 eps/min   hit_rate =  0.039700374531835204
1336	Running avg/max: 0.004/0.200,  mem:  21618/    53 ( 0.2%), avg 529.3 eps/min   hit_rate =  0.03967065868263473
1337	Running avg/max: 0.002/0.100,  mem:  21633/    53 ( 0.2%), avg 529.3 eps/min   hit_rate =  0.039640987284966345
1338	Running avg/max: 0.002/0.100,  mem:  21647/    53 ( 0.2%), avg 529.3 eps/min   hit_rate =  0.03961136023916293
1339	Running avg/max: 0.002/0.100,  mem:  21662/    53 ( 0.2%), avg 529.5 eps/min   hit_rate =  0.039581777445855115
1340	Running avg/max: 0.002/0.100,  mem:  21676/    53 ( 0.2%), avg 5

1404	Running avg/max: 0.003/0.100,  mem:  22666/    56 ( 0.2%), avg 528.5 eps/min   hit_rate =  0.039886039886039885
1405	Running avg/max: 0.003/0.100,  mem:  22680/    56 ( 0.2%), avg 528.5 eps/min   hit_rate =  0.0398576512455516
1406	Running avg/max: 0.003/0.100,  mem:  22694/    56 ( 0.2%), avg 528.6 eps/min   hit_rate =  0.03982930298719772
1407	Running avg/max: 0.003/0.100,  mem:  22708/    56 ( 0.2%), avg 528.3 eps/min   hit_rate =  0.03980099502487562
1408	Running avg/max: 0.003/0.100,  mem:  22723/    56 ( 0.2%), avg 528.2 eps/min   hit_rate =  0.03977272727272727
1409	Running avg/max: 0.003/0.100,  mem:  22737/    56 ( 0.2%), avg 528.2 eps/min   hit_rate =  0.0397444996451384
1410	Running avg/max: 0.003/0.100,  mem:  22751/    56 ( 0.2%), avg 528.5 eps/min   hit_rate =  0.03971631205673759
1411	Running avg/max: 0.003/0.100,  mem:  22765/    56 ( 0.2%), avg 528.4 eps/min   hit_rate =  0.039688164422395464
1412	Running avg/max: 0.003/0.100,  mem:  22780/    56 ( 0.2%), avg 528.

1475	Running avg/max: 0.002/0.090,  mem:  23704/    57 ( 0.2%), avg 528.3 eps/min   hit_rate =  0.03864406779661017
1476	Running avg/max: 0.002/0.090,  mem:  23718/    57 ( 0.2%), avg 528.4 eps/min   hit_rate =  0.03861788617886179
1477	Running avg/max: 0.002/0.090,  mem:  23732/    57 ( 0.2%), avg 528.4 eps/min   hit_rate =  0.03859174001354096
1478	Running avg/max: 0.002/0.090,  mem:  23746/    57 ( 0.2%), avg 528.3 eps/min   hit_rate =  0.03856562922868741
1479	Running avg/max: 0.002/0.090,  mem:  23761/    57 ( 0.2%), avg 528.5 eps/min   hit_rate =  0.038539553752535496
1480	Running avg/max: 0.002/0.090,  mem:  23775/    57 ( 0.2%), avg 528.5 eps/min   hit_rate =  0.038513513513513516
1481	Running avg/max: 0.002/0.090,  mem:  23789/    57 ( 0.2%), avg 528.4 eps/min   hit_rate =  0.03848750844024308
1482	Running avg/max: 0.004/0.200,  mem:  23828/    59 ( 0.2%), avg 527.9 eps/min   hit_rate =  0.0398110661268556
1483	Running avg/max: 0.004/0.200,  mem:  23842/    59 ( 0.2%), avg 527

1547	Running avg/max: 0.004/0.200,  mem:  24782/    60 ( 0.2%), avg 528.0 eps/min   hit_rate =  0.03878474466709761
1548	Running avg/max: 0.004/0.200,  mem:  24796/    60 ( 0.2%), avg 528.0 eps/min   hit_rate =  0.03875968992248062
1549	Running avg/max: 0.003/0.200,  mem:  24811/    60 ( 0.2%), avg 528.2 eps/min   hit_rate =  0.03873466752743705
1550	Running avg/max: 0.003/0.200,  mem:  24826/    60 ( 0.2%), avg 528.1 eps/min   hit_rate =  0.03870967741935484
1551	Running avg/max: 0.003/0.200,  mem:  24840/    60 ( 0.2%), avg 528.0 eps/min   hit_rate =  0.03868471953578337
1552	Running avg/max: 0.003/0.200,  mem:  24855/    60 ( 0.2%), avg 528.2 eps/min   hit_rate =  0.03865979381443299
1553	Running avg/max: 0.003/0.200,  mem:  24869/    60 ( 0.2%), avg 528.0 eps/min   hit_rate =  0.038634900193174504
1554	Running avg/max: 0.003/0.200,  mem:  24883/    60 ( 0.2%), avg 528.0 eps/min   hit_rate =  0.03861003861003861
1555	Running avg/max: 0.003/0.200,  mem:  24897/    60 ( 0.2%), avg 528

1619	Running avg/max: 0.001/0.090,  mem:  25855/    60 ( 0.2%), avg 527.4 eps/min   hit_rate =  0.03705991352686844
1620	Running avg/max: 0.001/0.090,  mem:  25869/    60 ( 0.2%), avg 527.4 eps/min   hit_rate =  0.037037037037037035
1621	Running avg/max: 0.001/0.090,  mem:  25884/    60 ( 0.2%), avg 527.6 eps/min   hit_rate =  0.03701418877236274
1622	Running avg/max: 0.001/0.090,  mem:  25899/    60 ( 0.2%), avg 527.5 eps/min   hit_rate =  0.036991368680641186
1623	Running avg/max: 0.001/0.090,  mem:  25931/    60 ( 0.2%), avg 527.0 eps/min   hit_rate =  0.036968576709796676
1624	Running avg/max: 0.001/0.090,  mem:  25950/    60 ( 0.2%), avg 527.0 eps/min   hit_rate =  0.03694581280788178
1625	Running avg/max: 0.002/0.090,  mem:  25969/    61 ( 0.2%), avg 527.2 eps/min   hit_rate =  0.03753846153846154
1626	Running avg/max: 0.002/0.090,  mem:  25983/    61 ( 0.2%), avg 527.1 eps/min   hit_rate =  0.037515375153751536
1627	Running avg/max: 0.003/0.100,  mem:  26015/    62 ( 0.2%), avg 

1692	Running avg/max: 0.003/0.100,  mem:  27011/    63 ( 0.2%), avg 525.3 eps/min   hit_rate =  0.03723404255319149
1693	Running avg/max: 0.003/0.100,  mem:  27026/    63 ( 0.2%), avg 525.3 eps/min   hit_rate =  0.03721204961606615
1694	Running avg/max: 0.003/0.100,  mem:  27058/    63 ( 0.2%), avg 525.1 eps/min   hit_rate =  0.0371900826446281
1695	Running avg/max: 0.003/0.100,  mem:  27072/    63 ( 0.2%), avg 525.0 eps/min   hit_rate =  0.03716814159292035
1696	Running avg/max: 0.003/0.100,  mem:  27086/    63 ( 0.2%), avg 524.9 eps/min   hit_rate =  0.03714622641509434
1697	Running avg/max: 0.003/0.100,  mem:  27100/    63 ( 0.2%), avg 525.1 eps/min   hit_rate =  0.037124337065409546


KeyboardInterrupt: 
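`NOISE_DECAY` is sampled very close to 1, so its effect is easiest to read as a half-life. Assuming the noise multiplier is multiplied by `NOISE_DECAY` once per decay step (whether that step is a time step or an episode is decided in `maddpg.py`, which is not shown), the number of steps for it to fall to a fraction f of its starting value is ln(f) / ln(NOISE_DECAY). A sketch of that arithmetic across the sampled range:

```python
import math


def steps_to_fraction(decay, fraction):
    """Smallest number of multiplicative decay steps n with decay**n <= fraction."""
    return math.ceil(math.log(fraction) / math.log(decay))


# ends of the sampled NOISE_DECAY range, plus the value drawn for run M45.00
for decay in (0.999, 0.999547, 0.9999):
    half = steps_to_fraction(decay, 0.5)
    tenth = steps_to_fraction(decay, 0.1)
    print("decay {:.6f}: half after {} steps, a tenth after {} steps".format(decay, half, tenth))
```

Under that assumption, values near the bottom of the range halve the noise within about 700 steps, while values near the top take about 6,900.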

# HEY JOHN - TODO!

- update main.py to match the above code {ALL CELLS}
- Test running from cmd line (may need a script?)
- Clean up the bottom part of this notebook

### Run two trained agents against each other (inference mode)

Note: this cell requires the Unity environment (created at the top of the notebook) to have been opened with `no_graphics=False` so that the graphical game display will appear. The setup cell does this automatically when EXPLORE = False.

In [None]:
if not EXPLORE:
    
    # load the pre-trained model
    model = Maddpg(state_size, action_size, 2)
    model.restore_checkpoint(checkpoint_path, tag, initial_episode)

    for i in range(10):                                        # play game for several episodes
        env_info = env.reset(train_mode=False)[brain_name]     # reset the environment    
        states = env_info.vector_observations                  # get the current state (for each agent)
        scores = np.zeros(num_agents)                          # initialize the score (for each agent)
        num_steps = 0
        while True:
            actions = model.act(states, add_noise=False)
            env_info = env.step(actions)[brain_name]           # send all actions to the environment
            next_states = env_info.vector_observations         # get next state (for each agent)
            rewards = env_info.rewards                         # get reward (for each agent)
            dones = env_info.local_done                        # see if episode finished
            scores += rewards                                  # update the score (for each agent)
            states = next_states                               # roll over states to next time step
            num_steps += 1
            if np.any(dones):                                  # exit loop if episode finished
                break
        print('Episode {}: {:5.3f}, took {} steps'.format(i, np.max(scores), num_steps))
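The loop above reports `np.max(scores)`, so each episode's score is the better of the two agents' summed rewards. Training is called with `winning_score=0.5`; for this Udacity project the task is generally considered solved when that per-episode score averages at least 0.5 over 100 consecutive episodes. A sketch of that check, assuming the 100-episode window (the exact window `train.py` uses is not shown here):

```python
from collections import deque

import numpy as np


def is_solved(episode_scores, window=100, target=0.5):
    """Return True once the mean of the last `window` episode scores reaches `target`.

    episode_scores: iterable of per-agent score arrays, one per episode.
    Each episode's score is the max over agents, as in the loop above.
    """
    recent = deque(maxlen=window)  # keeps only the most recent `window` scores
    for agent_scores in episode_scores:
        recent.append(float(np.max(agent_scores)))
        if len(recent) == window and sum(recent) / len(recent) >= target:
            return True
    return False


weak = [np.array([0.0, 0.1])] * 150    # best agent scores 0.1 every episode
strong = [np.array([0.6, 0.5])] * 100  # best agent scores 0.6 every episode
print(is_solved(weak))    # False
print(is_solved(strong))  # True
```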


When finished, you can close the environment.

In [None]:
env.close()