# Evaluate_Training

### **1T-10L: 1 Team composed of 10 agents**

We run single and multiple game-play sessions to evaluate whether organizing agents into a leader-followers team induces more agents to cross over and gather from the 2nd food pile.

10 follower agents are organized into one team with a leader. The leader agent is envisioned to be a drone hovering atop a “target zone” (green frame) and is not physically represented in the game space. In the full CTMA implementation, the leader will learn a trajectory for guiding its followers to the area of the game space where they can achieve the global optimum (the 2nd food pile). For now, we have hard-coded eight trajectories for the leader.

<img src="images/leader-followers.png" width="800">

During training, the follower agents are trained in 2 settings:

(1) Static target zone   
(2) Moving target zone  

During multiple game-play, we conduct repeated game-play (30 episodes) in which a team of 10 agents, loaded with their saved models, follows the 8 pre-defined target zone trajectories to the 2nd food pile. We then average the results of these games to calculate game metrics for each trajectory.


In [10]:
import os
import random
import time
import platform
import torch
import torch.optim as optim
import gym
import numpy as np
import pickle

# This is the Crossing game environment
from teams_env import CrossingEnv
from teams_model import *
from interface import *

import matplotlib.pyplot as plt

%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# for auto-reloading external modules
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2

print("Python version: ", platform.python_version())
print("Pytorch version: {}".format(torch.__version__))
print("OpenAI Gym version: {}".format(gym.__version__))

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload
Python version:  3.6.4
Pytorch version: 0.4.1.post2
OpenAI Gym version: 0.9.2


## Trajectories

These scenarios are essentially trajectories, each defined as [(pos1, time1), (pos2, time2), (pos3, time3), ...] where:
* pos - ((zone position), (zone dimension))
* time - how long the zone stays stationary at that position
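
As a minimal sketch of how this format can be read, the helper below maps a frame number to the active target zone by accumulating the `duration` of each waypoint. The name `zone_at`, the two-waypoint `demo` trajectory, and the cumulative-duration interpretation are assumptions made for illustration; the game loops later in this notebook advance the zone with their own logic.

```python
def zone_at(trajectory, frame):
    """Return the 'loc' entry that is active at `frame` (0-based).

    Assumes each waypoint is held for exactly `duration` frames before
    the zone moves on to the next waypoint.
    """
    elapsed = 0
    for waypoint in trajectory:
        elapsed += waypoint['duration']
        if frame < elapsed:
            return waypoint['loc']
    # Past the last waypoint: the zone stays at its final location.
    return trajectory[-1]['loc']


# Hypothetical two-waypoint trajectory in the same format as the notebook's.
demo = [
    {'loc': ((10, 3), (5, 5)), 'duration': 100},
    {'loc': ((55, 5), (5, 5)), 'duration': 400},
]

print(zone_at(demo, 0))
print(zone_at(demo, 100))
print(zone_at(demo, 10000))
```

Tracking elapsed time per waypoint keeps the hold time of each waypoint independent of the durations that came before it.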

Trajectories used:

<img src="images/zone_trajectories.png" width="800">

In [6]:
# pace = 50
pace = 100

trajectories = [
    [ # T1 - A south-east trajectory towards the 2nd food pile
            {'loc':((10,3),(5,5)), 'duration': pace},
            {'loc':((20,5),(5,5)), 'duration': pace},
            {'loc':((30,5),(5,5)), 'duration': pace},
            {'loc':((40,5),(5,5)), 'duration': pace},
            {'loc':((50,5),(5,5)), 'duration': pace},
            {'loc':((55,5),(5,5)), 'duration': 400}
    ],
    [# T2 - An east-south trajectory towards the 2nd food pile
            {'loc':((10,0),(5,5)), 'duration': pace},
            {'loc':((20,0),(5,5)), 'duration': pace},
            {'loc':((30,0),(5,5)), 'duration': pace},
            {'loc':((40,3),(5,5)), 'duration': pace},
            {'loc':((50,5),(5,5)), 'duration': pace},
            {'loc':((55,5),(5,5)), 'duration': 400}
    ],
    [# T3 - A slanted trajectory towards the 2nd food pile
            {'loc':((10,0),(5,5)), 'duration': pace},
            {'loc':((20,1),(5,5)), 'duration': pace},
            {'loc':((30,2),(5,5)), 'duration': pace},
            {'loc':((40,3),(5,5)), 'duration': pace},
            {'loc':((50,5),(5,5)), 'duration': pace},
            {'loc':((55,5),(5,5)), 'duration': 400}
    ],
    [# T4 - A zig-zag trajectory towards the 2nd food pile
            {'loc':((10,5),(5,5)), 'duration': pace},
            {'loc':((20,10),(5,5)), 'duration': pace},
            {'loc':((30,2),(5,5)), 'duration': pace},
            {'loc':((40,10),(5,5)), 'duration': pace},
            {'loc':((50,2),(5,5)), 'duration': pace},
            {'loc':((55,5),(5,5)), 'duration': 400} 
    ],
    [# T5 - A shortened south-east trajectory towards the 2nd food pile
            {'loc':((15,3),(5,5)), 'duration': pace},
            {'loc':((30,5),(5,5)), 'duration': pace},
            {'loc':((45,5),(5,5)), 'duration': pace},
            {'loc':((55,5),(5,5)), 'duration': 400}
    ],
    [# T6 - A shortened east-south trajectory towards the 2nd food pile
            {'loc':((15,0),(5,5)), 'duration': pace},
            {'loc':((30,0),(5,5)), 'duration': pace},
            {'loc':((45,5),(5,5)), 'duration': pace},
            {'loc':((55,5),(5,5)), 'duration': 400}
    ],
    [# T7 -  A shortened slanted trajectory towards the 2nd food pile
            {'loc':((15,0),(5,5)), 'duration': pace},
            {'loc':((30,2),(5,5)), 'duration': pace},
            {'loc':((45,5),(5,5)), 'duration': pace},
            {'loc':((55,5),(5,5)), 'duration': 400}
    ],
    [# T8 - A super-short trajectory towards the 2nd food pile
            {'loc':((15,0),(5,5)), 'duration': pace},
            {'loc':((35,2),(5,5)), 'duration': pace},
            {'loc':((55,5),(5,5)), 'duration': 400}
    ]
]

## Trained Models

The code block contains the folder locations of the trained models of follower agents as well as the parameters used in their training.
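
The folder names appear to encode the training parameters. The sketch below parses one such name into a dictionary. The token meanings (`tr` = target reward, `t` = starting temperature, `rp` = river penalty, `gs` = game steps per episode, `s` = parameter set) are inferred by comparing the folder names with the parameter list in this notebook and are not confirmed by the training code; `parse_model_folder` is a hypothetical helper.

```python
import re

# Assumed token layout: tr<target_reward>_t<temp_start>_rp<river_penalty>_<game_steps>gs_s<set>
NUMBER = r"-?\d+(?:\.\d+)?"
PATTERN = re.compile(
    r"tr(?P<target_reward>" + NUMBER + r")"
    r"_t(?P<temp_start>" + NUMBER + r")"
    r"_rp(?P<river_penalty>" + NUMBER + r")"
    r"_(?P<game_steps>\d+)gs"
    r"_s(?P<set>\d+)"
)


def parse_model_folder(path):
    """Extract the (assumed) training parameters from a model folder path."""
    match = PATTERN.search(path)
    if match is None:
        raise ValueError("Unrecognized folder name: {}".format(path))
    fields = match.groupdict()
    return {
        'target_reward': float(fields['target_reward']),
        'temp_start': float(fields['temp_start']),
        'river_penalty': float(fields['river_penalty']),
        'game_steps': int(fields['game_steps']),
        'set': int(fields['set']),
    }


example = ('models/1T-10L/followers_static/food_d37/pacifist_follower/'
           'tr5.0_t2.0_rp-1.0_300gs_s1/')
print(parse_model_folder(example))
```

A parser like this can be used to cross-check a hand-written parameter list against the folder it is meant to describe.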

In [8]:
folders = [
    # Agents trained with a static target zone
    'models/1T-10L/followers_static/food_d37/pacifist_follower/tr5.0_t2.0_rp-1.0_300gs_s1/',   # scenario=1
    'models/1T-10L/followers_static/food_d37/pacifist_follower/tr5.0_t2.0_rp-1.0_300gs_s2/',   # scenario=2
    'models/1T-10L/followers_static/food_d37/pacifist_follower/tr5.0_t2.0_rp-1.0_300gs_s3/',   # scenario=3    
    'models/1T-10L/followers_static/food_d37/pacifist_follower/tr5.0_t1.5_rp-1.0_300gs_s1/',   # scenario=4
    'models/1T-10L/followers_static/food_d37/pacifist_follower/tr5.0_t1.5_rp-1.0_300gs_s2/',   # scenario=5
    'models/1T-10L/followers_static/food_d37/pacifist_follower/tr5.0_t1.5_rp-1.0_300gs_s3/',   # scenario=6
    # Agents trained with a moving target zone, map = food_d37
    "models/1T-10L/followers_trajectory/food_d37/pacifist_follower/tr5.0_t2.0_rp-1.0_600gs_s1/",   # scenario=7
    "models/1T-10L/followers_trajectory/food_d37/pacifist_follower/tr5.0_t2.0_rp-1.0_500gs_s2/",   # scenario=8 
    "models/1T-10L/followers_trajectory/food_d37/pacifist_follower/tr5.0_t1.5_rp-1.0_600gs_s1/",   # scenario=9
    "models/1T-10L/followers_trajectory/food_d37/pacifist_follower/tr5.0_t1.5_rp-1.0_500gs_s2/",   # scenario=10 
    # Agents trained using moving target zone and map = food_d37_river_w1_d25
    "models/1T-10L/followers_trajectory/food_d37_river_w1_d25/pacifist_follower/tr5.0_t1.5_rp-1.0_600gs_s1/",   # scenario=11
    "models/1T-10L/followers_trajectory/food_d37_river_w1_d25/pacifist_follower/tr5.0_t1.5_rp-1.0_500gs_s2/",   # scenario=12
    "models/1T-10L/followers_trajectory/food_d37_river_w1_d25/pacifist_follower/tr5.0_t2.0_rp-1.0_600gs_s1/",   # scenario=13
    "models/1T-10L/followers_trajectory/food_d37_river_w1_d25/pacifist_follower/tr5.0_t2.0_rp-1.0_500gs_s2/"   # scenario=14
]

# Parameter sets pertaining to the trained models in the folders above (not used in the code)
parameters =[ 
            # Temperature for explore/exploit; penalty per step in river; game steps per episode
            {'temp_start':2.0, 'river_penalty':-1.0, 'target_reward':5.0, 'target_zone':((20,0),(5,5)),\
             'game_steps':300, 'set': 1},
            {'temp_start':2.0, 'river_penalty':-1.0, 'target_reward':5.0, 'target_zone':((20,1),(5,5)),\
             'game_steps':300, 'set': 2},
            {'temp_start':2.0, 'river_penalty':-1.0, 'target_reward':5.0, 'target_zone':((20,2),(5,5)),\
             'game_steps':300, 'set': 3},
            {'temp_start':1.5, 'river_penalty':-1.0, 'target_reward':5.0, 'target_zone':((20,0),(5,5)),\
             'game_steps':300, 'set': 1},
            {'temp_start':1.5, 'river_penalty':-1.0, 'target_reward':5.0, 'target_zone':((20,1),(5,5)),\
             'game_steps':300, 'set': 2},
            {'temp_start':1.5, 'river_penalty':-1.0, 'target_reward':5.0, 'target_zone':((20,2),(5,5)),\
             'game_steps':300, 'set': 3},
            {'temp_start':2.0, 'river_penalty':-1.0, 'target_reward':5.0, 'trajectory':trajectories[6],\
             'game_steps':600, 'set': 1},
            {'temp_start':2.0, 'river_penalty':-1.0, 'target_reward':5.0, 'trajectory':trajectories[7],\
             'game_steps':500, 'set': 2},
            {'temp_start':1.5, 'river_penalty':-1.0, 'target_reward':5.0, 'trajectory':trajectories[6],\
             'game_steps':600, 'set': 1},
            {'temp_start':1.5, 'river_penalty':-1.0, 'target_reward':5.0, 'trajectory':trajectories[7],\
             'game_steps':500, 'set': 2},
            {'temp_start':1.5, 'river_penalty':-1.0, 'target_reward':5.0, 'trajectory':trajectories[6],\
             'game_steps':600, 'set': 1},
            {'temp_start':1.5, 'river_penalty':-1.0, 'target_reward':5.0, 'trajectory':trajectories[7],\
             'game_steps':500, 'set': 2},
            {'temp_start':2.0, 'river_penalty':-1.0, 'target_reward':5.0, 'trajectory':trajectories[6],\
             'game_steps':600, 'set': 1},
            {'temp_start':2.0, 'river_penalty':-1.0, 'target_reward':5.0, 'trajectory':trajectories[7],\
             'game_steps':500, 'set': 2},
            ]

# Single Game-Play - Static Target

Play a single game with rendering to observe agents' learning and resulting behaviors. A target zone is positioned somewhere in the game space (zone_num), which allows us to evaluate how well follower agents look for and assemble within a target zone in different parts of the game space.

Change the scenario to load agent models. Change the zone_num to locate the static target zone for the game.
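
One way to quantify how well agents assemble within a target zone is to count the agents standing inside it. The sketch below assumes a zone is `((x, y), (w, h))` with `(x, y)` as its top-left corner and half-open bounds, and that agent positions are available as `(x, y)` tuples. Both are assumptions; `in_zone` and `count_in_zone` are hypothetical helpers, not part of `teams_env`.

```python
def in_zone(position, zone):
    """Return True if the (x, y) position lies inside zone = ((x, y), (w, h))."""
    (zone_x, zone_y), (width, height) = zone
    x, y = position
    return zone_x <= x < zone_x + width and zone_y <= y < zone_y + height


def count_in_zone(positions, zone):
    """Count how many agent positions fall inside the zone."""
    return sum(1 for position in positions if in_zone(position, zone))


# Hypothetical agent positions around a 5x5 zone anchored at (20, 10).
zone = ((20, 10), (5, 5))
positions = [(20, 10), (24, 14), (25, 10), (19, 12)]
print(count_in_zone(positions, zone))
```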


In [12]:
import pickle
import numpy as np

import torch
from torch.autograd import Variable
from teams_env import CrossingEnv

map_name = "food_d37"
culture = "pacifist_follower"

zones =[
    ((10,0),(5,5)),
    ((10,5),(5,5)),
    ((10,10),(5,5)),
    ((20,0),(5,5)),
    ((20,5),(5,5)),
    ((20,10),(5,5)),
    ((30,0),(5,5)),
    ((30,5),(5,5)),    
    ((30,10),(5,5)), 
]


################ User Inputs ######################

scenario = 10   # This picks the folder from which the followers' trained models are loaded from
zone_num = 6   # This picks the location of the target zone for the game
episodes = 3000  # This is used to recall a model file trained to a # of episodes

####################################################

dir_name = folders[scenario-1]
target_zone = zones[zone_num-1]


# There will be 10 agents - 1 team of 10 AI agents and 0 random agents
num_ai_agents = 10
num_rdn_agents = 0
num_agents = num_ai_agents+num_rdn_agents  # just the sum of the two

# Data structure for AI agents (agents will form their own Class later on)
agents = []
actions = []
tags = []

# Initialize environment
render = True
SPEED = 1/30
num_actions = 8                       # There are 8 actions defined in Gathering

# Initialize constants
num_frames = 7
max_episodes = 1
max_frames = 1000

# Initialize parameters for Crossing and Explore
river_penalty = -1
crossed = [0 for i in range(num_ai_agents)]  # Keep track of agents gathering from 2nd food pile
second_pile_x = 50   # x-coordinate of the 2nd food pile
jumping_zone = False


# Load models for AI agents
if episodes > 0:
    agents= [[] for i in range(num_ai_agents)]
    # If episodes is provided (not 0), load the model for each AI agent
    for i in range(num_ai_agents):
        model_file = dir_name+'MA{}_Crossing_ep{}.p'.format(i,episodes)
        try:
            with open(model_file, 'rb') as f:
                print("Load saved model for agent {}".format(i))
                agent = Policy(num_frames, num_actions, 0)
                optimizer = optim.Adam(agent.parameters(), lr=0.1)

                # New way to save and load models - based on: 
                # https://pytorch.org/tutorials/beginner/saving_loading_models.html
                _ = load_model(agent, optimizer, f)
                agent.eval()
                agents[i] = agent
        except OSError:
            print('Model file not found.')
            raise
else:
    # If episodes=0, start with a freshly initialized model for each AI agent
    for i in range(num_ai_agents):
        print("Load AI agent {}".format(i))
        agents.append(Policy(num_frames, num_actions, i))

# Load random agents    
for i in range(num_ai_agents,num_agents):
    print("Load random agent {}".format(i))
    agents.append(Rdn_Policy())

# Initialize AI and random agent data
actions = [0 for i in range(num_agents)]
tags = [0 for i in range(num_agents)]

# Establish tribal association
tribes = []
tribes.append(Tribe(name='Vikings',color='blue', culture=culture, \
                    agents=[agents[0], agents[1], agents[2], agents[3], agents[4], \
                           agents[5], agents[6], agents[7], agents[8], agents[9]]))
tribes[0].set_target_zone(target_zone)

#tribes.append(Tribe(name='Saxons', color='red', culture=culture, \
#                    agents=[agents[4], agents[5], agents[6], agents[7]]))
#tribes.append(Tribe(name='Franks', color='purple', culture=culture, \
#                    agents=[agents[8], agents[9], agents[10], agents[11]]))
# tribes.append(Tribe(name='Crazies', color='yellow', agents=[agents[3], \
#                    agents[4], agents[5]]))   # random agents are crazy!!!

# 1 tribe of 10 agents, using the map defined in food_d37.txt
agent_colors = [agent.color for agent in agents]
agent_tribes = [agent.tribe for agent in agents]

# Added to implement exile and colonize cultures
tribe_names = [tribe.name for tribe in tribes]
tribe_target_zones = [tribe.target_zone for tribe in tribes]
    
    
env = CrossingEnv(n_agents=num_agents,agent_colors=agent_colors, agent_tribes=agent_tribes, \
                  map_name=map_name, river_penalty=river_penalty, tribes=tribe_names, \
                  target_zones=tribe_target_zones, debug_agent=0)    
    
for ep in range(max_episodes):
    
    US_hits = [0 for i in range(num_agents)]
    THEM_hits = [0 for i in range(num_agents)]

    env_obs = env.reset()  # Environment return observations
    """
    # For Debug only
    print (len(agents_obs))
    print (agents_obs[0].shape)
    """
    
    # Unpack observations into data structure compatible with agent Policy
    agents_obs = unpack_env_obs(env_obs)
    
    for i in range(num_ai_agents):    # Reset agent info - laser tag statistics
        agents[i].reset_info()    
    
    env.render()
    time.sleep(SPEED)  # Change speed of video rendering
    
    """
    # For Debug only
    print (len(agents_obs))
    print (agents_obs[0].shape)
    """
    
    """
    For now, we do not stack observations, and we do not implement LSTM
    
    state = np.stack([state]*num_frames)

    # Reset LSTM hidden units when episode begins
    cx = Variable(torch.zeros(1, 256))
    hx = Variable(torch.zeros(1, 256))
    """

    for frame in range(max_frames):

        for i in range(num_ai_agents):    # For AI agents
            actions[i], _ = select_action(agents[i], agents_obs[i], cuda=False)
            if actions[i] == 6:  # compare by value (works for ints and one-element tensors)
                tags[i] += 1   # record a tag for assessing aggressiveness
                
        for i in range(num_ai_agents, num_agents):   # For random agents
            actions[i] = agents[i].select_action(agents_obs[i])
            if actions[i] == 6:
                tags[i] += 1   # record a tag for assessing aggressiveness
        
        """
        For now, we do not implement LSTM
        # Select action
        action, log_prob, state_value, (hx,cx)  = select_action(model, state, (hx,cx))        
        """

        # if frame % 10 == 0:
        #     print (actions)    
            
        # Perform step        
        env_obs, reward, done, info = env.step(actions)
        
        """
        For Debug only
        print (env_obs)
        print (reward)
        print (done) 
        """

        for i in range(num_ai_agents):
            agents[i].rewards.append(reward[i])  # Stack rewards

        
        # Unpack observations into data structure compatible with agent Policy
        agents_obs = unpack_env_obs(env_obs)
        load_info(agents, info, narrate=False)   # Load agent info for AI agents
        
        for i in range(num_agents):
            US_hits[i] += agents[i].US_hit
            THEM_hits[i] += agents[i].THEM_hit
            
        """
        For now, we do not stack observation, may come in handy later on
        
        # Evict oldest diff add new diff to state
        next_state = np.stack([next_state]*num_frames)
        next_state[1:, :, :] = state[:-1, :, :]
        state = next_state
        """
        total = 0
        for i in range(num_ai_agents):
            agent_reward = sum(agents[i].rewards)
            total += agent_reward
        
        env.render()
        time.sleep(SPEED)  # Change speed of video rendering

        if any(done):
            print("Done after {} frames".format(frame))
            break
        
        
        if jumping_zone:
            if frame % 100 == 0:          # every 100 frames, shift the target zone by a random offset
                (x,y), (w,h) = env.target_zones[0]
                x_random = random.randint(1,20)
                y_random = random.randint(-1,3)
                env.target_zones[0]= ((min(x+x_random,55),max(min(y+y_random,6), 0)), (w,h))    

env.close()  # Close the rendering window

# Print out statistics of AI agents

total_rewards = 0
total_tags = 0
total_US_hits = 0
total_THEM_hits = 0

print ('\nStatistics by Agent')
print ('===================')
for i in range(num_ai_agents):
    agent_tags = sum(agents[i].tag_hist)
    total_tags += agent_tags
#    print ("Agent{} aggressiveness is {:.2f}".format(i, sum(agents[i].tag_hist)/frame))

    agent_reward = sum(agents[i].rewards)
    total_rewards += agent_reward
    print ("Agent{} reward is {:d}".format(i, agent_reward))

    agent_US_hits = sum(agents[i].US_hits)
    agent_THEM_hits = sum(agents[i].THEM_hits)
    total_US_hits += agent_US_hits
    total_THEM_hits += agent_THEM_hits

#    print('US agents hit = {}'.format(agent_US_hits))
#    print('THEM agents hit = {}'.format(agent_THEM_hits ))

print ('\nStatistics in Aggregate')
print ('=======================')
print ('Total rewards gathered = {}'.format(total_rewards))
print ('Av. rewards per agent = {0:.2f}'.format(total_rewards/num_ai_agents))
# print ('Num laser fired = {}'.format(total_tags))
# print ('Total US Hit (friendly fire) = {}'.format(total_US_hits))
# print ('Total THEM Hit = {}'.format(total_THEM_hits))
# print ('friendly fire (%) = {0:.3f}'.format(total_US_hits/(total_US_hits+total_THEM_hits+1e-7)))

for (i, loc) in env.consumption:
    if loc[0] > second_pile_x:
        # print ('agent {} gathered an apple in 2nd pile'.format(i))
        crossed[i] = 1
        
print ("Num agents gathering from 2nd food pile: {}".format(sum(crossed)))

print ('\nStatistics by Team')
print ('===================')
top_tribe = None
top_tribe_reward = 0

for i, tribe in enumerate(tribes):
    if tribe.name != 'Crazies':
        tribe_reward = sum(tribe.sum_rewards())
        print ('Tribe {} has total reward of {}'.format(tribe.name, tribe_reward))
                           
        if tribe_reward > top_tribe_reward:   # Keep track of dominating team
            top_tribe_reward = tribe_reward
            top_tribe = tribe.name

# Team dominance calculation
if len(tribes) > 1:
    print ('Dominating Team: {}'.format(top_tribe))
    dominance = top_tribe_reward/((total_rewards-top_tribe_reward+1.1e-7)/(len(tribes)-1))    
    print ('Team dominance: {0:.2f}x'.format(dominance))


Load saved model for agent 0
Load saved model for agent 1
Load saved model for agent 2
Load saved model for agent 3
Load saved model for agent 4
Load saved model for agent 5
Load saved model for agent 6
Load saved model for agent 7
Load saved model for agent 8
Load saved model for agent 9

Statistics by Agent
Agent0 reward is 3
Agent1 reward is 0
Agent2 reward is 1
Agent3 reward is 2
Agent4 reward is 0
Agent5 reward is 3
Agent6 reward is 24
Agent7 reward is 1
Agent8 reward is 0
Agent9 reward is 1

Statistics in Aggregate
Total rewards gathered = 35
Av. rewards per agent = 3.50
Num agents gathering from 2nd food pile: 2

Statistics by Team
Tribe Vikings has total reward of 35


# Single Game-Play - Moving Target

Play a single game with rendering to observe agents' learning and resulting behaviors. A target zone moves within the game space along a trajectory (traj_num), which allows us to evaluate how well follower agents follow a moving target zone.

Change the scenario to load agent models. Change the traj_num to select one of the eight pre-defined trajectories above.


In [13]:
import pickle
import numpy as np

import torch
from torch.autograd import Variable
from teams_env import CrossingEnv


# map_name = "food_d37"
map_name = "food_d37_river_w1_d25"
culture = "pacifist_follower"

################ User Inputs ######################

scenario = 12   # This picks the folder from which the followers' trained models are loaded from
traj_num = 3   # This picks the trajectory of the moving target zone for the game
episodes = 2000  # This is used to recall a model file trained to a # of episodes

####################################################


trajectory = trajectories[traj_num-1]
position = trajectory [0]  # shift the target zone to first position in trajectory
target_zone = position['loc']  
duration = position['duration']
dir_name = folders[scenario-1]


# There will be 10 agents - 1 team of 10 AI agents and 0 random agents
num_ai_agents = 10
num_rdn_agents = 0
num_agents = num_ai_agents+num_rdn_agents  # just the sum of the two

# Data structure for AI agents (agents will form their own Class later on)
agents = []
actions = []
tags = []

# Initialize environment
render = True
SPEED = 1/30
num_actions = 8                       # There are 8 actions defined in Gathering

# Initialize constants
num_frames = 7
max_episodes = 1
max_frames = 800

# Initialize parameters for Crossing and Explore
river_penalty = -1
crossed = [0 for i in range(num_ai_agents)]  # Keep track of agents gathering from 2nd food pile
second_pile_x = 50   # x-coordinate of the 2nd food pile
jumping_zone = False


# Load models for AI agents
if episodes > 0:
    agents= [[] for i in range(num_ai_agents)]
    # If episodes is provided (not 0), load the model for each AI agent
    for i in range(num_ai_agents):
        model_file = dir_name+'MA{}_Crossing_ep{}.p'.format(i,episodes)
        try:
            with open(model_file, 'rb') as f:
                print("Load saved model for agent {}".format(i))
                agent = Policy(num_frames, num_actions, 0)
                optimizer = optim.Adam(agent.parameters(), lr=0.1)

                # New way to save and load models - based on: 
                # https://pytorch.org/tutorials/beginner/saving_loading_models.html
                _ = load_model(agent, optimizer, f)
                agent.eval()
                agents[i] = agent
        except OSError:
            print('Model file not found.')
            raise
else:
    # If episodes=0, start with a freshly initialized model for each AI agent
    for i in range(num_ai_agents):
        print("Load AI agent {}".format(i))
        agents.append(Policy(num_frames, num_actions, i))

# Load random agents    
for i in range(num_ai_agents,num_agents):
    print("Load random agent {}".format(i))
    agents.append(Rdn_Policy())

# Initialize AI and random agent data
actions = [0 for i in range(num_agents)]
tags = [0 for i in range(num_agents)]

# Establish tribal association
tribes = []
tribes.append(Tribe(name='Vikings',color='blue', culture=culture, \
                    agents=[agents[0], agents[1], agents[2], agents[3], agents[4], \
                           agents[5], agents[6], agents[7], agents[8], agents[9]]))
tribes[0].set_target_zone(target_zone)

#tribes.append(Tribe(name='Saxons', color='red', culture=culture, \
#                    agents=[agents[4], agents[5], agents[6], agents[7]]))
#tribes.append(Tribe(name='Franks', color='purple', culture=culture, \
#                    agents=[agents[8], agents[9], agents[10], agents[11]]))
# tribes.append(Tribe(name='Crazies', color='yellow', agents=[agents[3], \
#                    agents[4], agents[5]]))   # random agents are crazy!!!

# 1 tribe of 10 agents, using the map selected by map_name
agent_colors = [agent.color for agent in agents]
agent_tribes = [agent.tribe for agent in agents]

# Added to implement exile and colonize cultures
tribe_names = [tribe.name for tribe in tribes]
tribe_target_zones = [tribe.target_zone for tribe in tribes]
    
    
env = CrossingEnv(n_agents=num_agents,agent_colors=agent_colors, agent_tribes=agent_tribes, \
                  map_name=map_name, river_penalty=river_penalty, tribes=tribe_names, \
                  target_zones=tribe_target_zones, debug_agent=0)    
    
for ep in range(max_episodes):
    
    US_hits = [0 for i in range(num_agents)]
    THEM_hits = [0 for i in range(num_agents)]

    env_obs = env.reset()  # Environment return observations
    """
    # For Debug only
    print (len(agents_obs))
    print (agents_obs[0].shape)
    """
    
    # Unpack observations into data structure compatible with agent Policy
    agents_obs = unpack_env_obs(env_obs)
    
    for i in range(num_ai_agents):    # Reset agent info - laser tag statistics
        agents[i].reset_info()    
    
    env.render()
    time.sleep(SPEED)  # Change speed of video rendering
    
    """
    # For Debug only
    print (len(agents_obs))
    print (agents_obs[0].shape)
    """
    
    """
    For now, we do not stack observations, and we do not implement LSTM
    
    state = np.stack([state]*num_frames)

    # Reset LSTM hidden units when episode begins
    cx = Variable(torch.zeros(1, 256))
    hx = Variable(torch.zeros(1, 256))
    """

    index = 0
    position = trajectory [index]  # shift the target zone to first position in trajectory
    env.target_zones[0] = position['loc']  
    duration = position['duration']
    
    for frame in range(max_frames):
        
        if (frame+1) % duration == 0:     # time to shift the target zone
            index += 1                    # shift the target zone to new point in trajectory 
            if index < len(trajectory):   
                position = trajectory[index]  
                duration = position['duration']
                env.target_zones[0] = position['loc'] 

        for i in range(num_ai_agents):    # For AI agents
            actions[i], _ = select_action(agents[i], agents_obs[i], cuda=False)
            if actions[i] == 6:  # compare by value (works for ints and one-element tensors)
                tags[i] += 1   # record a tag for assessing aggressiveness
                
        for i in range(num_ai_agents, num_agents):   # For random agents
            actions[i] = agents[i].select_action(agents_obs[i])
            if actions[i] == 6:
                tags[i] += 1   # record a tag for assessing aggressiveness
        
        """
        For now, we do not implement LSTM
        # Select action
        action, log_prob, state_value, (hx,cx)  = select_action(model, state, (hx,cx))        
        """

        # if frame % 10 == 0:
        #     print (actions)    
            
        # Perform step        
        env_obs, reward, done, info = env.step(actions)
        
        """
        For Debug only
        print (env_obs)
        print (reward)
        print (done) 
        """

        for i in range(num_ai_agents):
            agents[i].rewards.append(reward[i])  # Stack rewards

        
        # Unpack observations into data structure compatible with agent Policy
        agents_obs = unpack_env_obs(env_obs)
        load_info(agents, info, narrate=False)   # Load agent info for AI agents
        
        for i in range(num_agents):
            US_hits[i] += agents[i].US_hit
            THEM_hits[i] += agents[i].THEM_hit
            
        """
        For now, we do not stack observation, may come in handy later on
        
        # Evict oldest diff add new diff to state
        next_state = np.stack([next_state]*num_frames)
        next_state[1:, :, :] = state[:-1, :, :]
        state = next_state
        """
        total = 0
        for i in range(num_ai_agents):
            agent_reward = sum(agents[i].rewards)
            total += agent_reward
        
        env.render()
        time.sleep(SPEED)  # Change speed of video rendering

        if any(done):
            print("Done after {} frames".format(frame))
            break

env.close()  # Close the rendering window

# Print out statistics of AI agents

total_rewards = 0
total_tags = 0
total_US_hits = 0
total_THEM_hits = 0

print ('\nStatistics by Agent')
print ('===================')
for i in range(num_ai_agents):
    agent_tags = sum(agents[i].tag_hist)
    total_tags += agent_tags
    # print ("Agent{} aggressiveness is {:.2f}".format(i, sum(agents[i].tag_hist)/frame))

    agent_reward = sum(agents[i].rewards)
    total_rewards += agent_reward
    print ("Agent{} reward is {:d}".format(i, agent_reward))

    agent_US_hits = sum(agents[i].US_hits)
    agent_THEM_hits = sum(agents[i].THEM_hits)
    total_US_hits += agent_US_hits
    total_THEM_hits += agent_THEM_hits

    # print('US agents hit = {}'.format(agent_US_hits))
    # print('THEM agents hit = {}'.format(agent_THEM_hits ))

print ('\nStatistics in Aggregate')
print ('=======================')
print ('Total rewards gathered = {}'.format(total_rewards))
print ('Av. rewards per agent = {0:.2f}'.format(total_rewards/num_ai_agents))
# print ('Num laser fired = {}'.format(total_tags))
# print ('Total US Hit (friendly fire) = {}'.format(total_US_hits))
# print ('Total THEM Hit = {}'.format(total_THEM_hits))
# print ('friendly fire (%) = {0:.3f}'.format(total_US_hits/(total_US_hits+total_THEM_hits+1e-7)))
        
for (i, loc) in env.consumption:
    if loc[0] > second_pile_x:
        # print ('agent {} gathered an apple in 2nd pile'.format(i))
        crossed[i] = 1
        
print ("Num agents gathering from 2nd food pile: {}".format(sum(crossed)))

print ('\nStatistics by Team')
print ('===================')
top_tribe = None
top_tribe_reward = 0

for i, tribe in enumerate(tribes):
    if tribe.name != 'Crazies':
        tribe_reward = sum(tribe.sum_rewards())
        print ('Tribe {} has total reward of {}'.format(tribe.name, tribe_reward))
                           
        if tribe_reward > top_tribe_reward:   # Keep track of dominating team
            top_tribe_reward = tribe_reward
            top_tribe = tribe.name

# Team dominance calculation
if len(tribes) > 1:
    print ('Dominating Team: {}'.format(top_tribe))
    dominance = top_tribe_reward/((total_rewards-top_tribe_reward+1.1e-7)/(len(tribes)-1))    
    print ('Team dominance: {0:.2f}x'.format(dominance))

Load saved model for agent 0
Load saved model for agent 1
Load saved model for agent 2
Load saved model for agent 3
Load saved model for agent 4
Load saved model for agent 5
Load saved model for agent 6
Load saved model for agent 7
Load saved model for agent 8
Load saved model for agent 9

Statistics by Agent
Agent0 reward is -8
Agent1 reward is 6
Agent2 reward is 0
Agent3 reward is -22
Agent4 reward is 20
Agent5 reward is 0
Agent6 reward is 26
Agent7 reward is 14
Agent8 reward is 1
Agent9 reward is -1

Statistics in Aggregate
Total rewards gathered = 36
Av. rewards per agent = 3.60
Num agents gathering from 2nd food pile: 5

Statistics by Team
Tribe Vikings has total reward of 36


## Multiple Game-Play - Trajectories (Map = food_d37)

We gather agent and team metrics averaged over 30 episodes of game play for each configuration. The two metrics are:

* Average agent reward: the average number of apples gathered per agent per episode  
* Agents crossed: the average number of agents per episode that gather apples at the 2nd food pile 
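
As a rough illustration of how these two metrics can be computed, the sketch below averages per-episode data. It assumes that each episode provides a list of per-agent rewards and a list of `(agent_index, location)` consumption events, with `location[0]` as the x-coordinate, mirroring how `env.consumption` is read in this notebook. The function name `average_metrics` and the sample numbers are hypothetical.

```python
def average_metrics(episode_rewards, episode_consumption, second_pile_x=50):
    """Average the two report metrics over a set of episodes.

    episode_rewards: one list of per-agent total rewards per episode.
    episode_consumption: one list of (agent_index, (x, y)) events per episode.
    """
    num_episodes = len(episode_rewards)
    num_agents = len(episode_rewards[0])

    total_reward = sum(sum(rewards) for rewards in episode_rewards)

    total_crossed = 0
    for events in episode_consumption:
        # An agent counts once per episode, however many apples it gathers there
        crossed = {i for i, loc in events if loc[0] > second_pile_x}
        total_crossed += len(crossed)

    av_agent_reward = total_reward / num_episodes / num_agents
    av_agents_crossed = total_crossed / num_episodes
    return av_agent_reward, av_agents_crossed


# Made-up data: 2 episodes, 3 agents
rewards = [[2, 0, 4], [1, 3, 2]]
consumption = [
    [(0, (10, 3)), (2, (55, 4)), (2, (56, 4))],   # only agent 2 crossed
    [(1, (51, 2)), (2, (60, 1)), (0, (20, 5))],   # agents 1 and 2 crossed
]
print(average_metrics(rewards, consumption))  # (2.0, 1.5)
```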

<img src="images/leader-followers.png" width="600">

This is a batch run of 30-episode game-plays over:  
(1) Trajectories  
(2) Training parameters (starting temperature, steps per episode, and static target location or moving target trajectory)  
(3) Episodes trained (500, 1000, 1500, 2000, 2500, 3000)  
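
The batch is the Cartesian product of these three dimensions, and every combination needs one saved model file per agent. The sketch below enumerates the runs and builds the file names with the `MA{agent}_{game}_ep{episodes}.p` pattern used by the loading code in this notebook. The helper names `checkpoint_files` and `sweep`, and the placeholder trajectories and directories, are assumptions for illustration.

```python
from itertools import product


def checkpoint_files(dir_name, eps, num_agents=10, game="Crossing"):
    """Model files for one (directory, checkpoint) pair, one file per agent."""
    return ["{}MA{}_{}_ep{}.p".format(dir_name, i, game, eps) for i in range(num_agents)]


def sweep(trajectories, dir_names, episodes):
    """Yield the index triple and model files for every 30-episode run in the batch."""
    for (t, _), (d, dir_name), (e, eps) in product(
            enumerate(trajectories), enumerate(dir_names), enumerate(episodes)):
        yield (t, d, e), checkpoint_files(dir_name, eps)


# Placeholder inputs; the real lists are defined elsewhere in the notebook
trajectories = ["T1", "T2"]
dir_names = ["models/run_a/", "models/run_b/"]
episodes = [500, 1000, 1500]

runs = list(sweep(trajectories, dir_names, episodes))
print(len(runs))        # 2 * 2 * 3 = 12 runs
print(runs[0][1][0])    # models/run_a/MA0_Crossing_ep500.p
```

The index triple matches the `[trajectory][dir][checkpoint]` layout of the result arrays such as `av_agent_reward`.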

In [23]:
import pickle
import numpy as np

import torch
from torch.autograd import Variable

dir_names = [
             # Models of follower agents trained using a static target zone
             "models/1T-10L/followers_static/food_d37/pacifist_follower/tr5.0_t2.0_rp-1.0_300gs_s1/",
             "models/1T-10L/followers_static/food_d37/pacifist_follower/tr5.0_t2.0_rp-1.0_300gs_s2/", 
             "models/1T-10L/followers_static/food_d37/pacifist_follower/tr5.0_t2.0_rp-1.0_300gs_s3/", 
             "models/1T-10L/followers_static/food_d37/pacifist_follower/tr5.0_t1.5_rp-1.0_300gs_s1/",
             "models/1T-10L/followers_static/food_d37/pacifist_follower/tr5.0_t1.5_rp-1.0_300gs_s2/", 
             "models/1T-10L/followers_static/food_d37/pacifist_follower/tr5.0_t1.5_rp-1.0_300gs_s3/",
             # Models of follower agents trained using a moving target zone
             "models/1T-10L/followers_trajectory/food_d37/pacifist_follower/tr5.0_t2.0_rp-1.0_600gs_s1/",
             "models/1T-10L/followers_trajectory/food_d37/pacifist_follower/tr5.0_t2.0_rp-1.0_500gs_s2/", 
             "models/1T-10L/followers_trajectory/food_d37/pacifist_follower/tr5.0_t1.5_rp-1.0_600gs_s1/",
             "models/1T-10L/followers_trajectory/food_d37/pacifist_follower/tr5.0_t1.5_rp-1.0_500gs_s2/", 
             ]

episodes = [500, 1000, 1500, 2000, 2500, 3000] 

game = 'Crossing'
map_name = "food_d37"
culture = "pacifist_follower"

# Performance Statistics - for Research Report
av_agent_reward = [[[0 for i in episodes] for j in dir_names] for k in trajectories]
av_agent_crossed = [[[0 for i in episodes] for j in dir_names] for k in trajectories]  
dominating_tribe = [[[None for i in episodes] for j in dir_names] for k in trajectories]
dom_tribe_reward = [[[0 for i in episodes] for j in dir_names] for k in trajectories]
dominance = [[[0 for i in episodes] for j in dir_names] for k in trajectories]

# There will be 10 agents - 1 team of 10 AI agents and 0 random agents
num_ai_agents = 10
num_rdn_agents = 0
num_agents = num_ai_agents+num_rdn_agents  # just the sum of the two

# Data structure for AI agents (agents will form their own Class later on)
agents = []
actions = []
tags = []

# Initialize environment
render = True
SPEED = 1/30
num_actions = 8                       # There are 8 actions defined in Gathering
second_pile_x = 50   # x-coordinate of the 2nd food pile

# Initialize constants
num_frames = 7
max_episodes = 30
# max_frames = 800
verbose = False

# Initialize parameters for Crossing and Explore
river_penalty = -1
crossed = [0 for i in range(num_ai_agents)]  # Keep track of agents gathering from 2nd food pile
second_pile_x = 50   # x-coordinate of the 2nd food pile
jumping_zone = False

for traj_num, trajectory in enumerate(trajectories):
    print ("###### Trajectory = T{} #######".format(traj_num+1))
    
    # Adjust game steps per episode based on the trajectory pacing
    max_frames=0
    for point in trajectory:
        max_frames += point['duration']

    position = trajectory [0]  # shift the target zone to first position in trajectory
    target_zone = position['loc']  
    duration = position['duration']
    
    for dir_num, dir_name in enumerate(dir_names):
        print ("###### Dir = {} #######".format(dir_name))
    
        for eps_num, eps in enumerate(episodes):
            print ("###### Trained episodes = {} #######".format(eps))
    
            # Load models for AI agents
            agents= [[] for i in range(num_ai_agents)]
            # Load the saved model (checkpoint after `eps` training episodes) for each AI agent
            for i in range(num_ai_agents):
                model_file = dir_name+'MA{}_{}_ep{}.p'.format(i,game,eps)
                try:
                    with open(model_file, 'rb') as f:
                        print("Load saved model for agent {}".format(i))
                        agent = Policy(num_frames, num_actions, 0)
                        optimizer = optim.Adam(agent.parameters(), lr=0.1)

                        # New way to save and load models - based on: 
                        # https://pytorch.org/tutorials/beginner/saving_loading_models.html
                        _ = load_model(agent, optimizer, f)
                        agent.eval()
                        agents[i] = agent
                except OSError:
                    print('Model file not found.')
                    raise

            # Load random agents    
            for i in range(num_ai_agents,num_agents):
                # print("Load random agent {}".format(i))
                agents.append(Rdn_Policy())
        
            # Establish tribal association
            tribes = []
            tribes.append(Tribe(name='Vikings', color='blue', culture=culture,
                                agents=agents[:num_ai_agents]))  # all 10 followers form one team
            tribes[0].set_target_zone(target_zone)

            # Set up agent and tribe info to pass into env
            agent_colors = [agent.color for agent in agents]
            agent_tribes = [agent.tribe for agent in agents]
            tribe_names = [tribe.name for tribe in tribes]
            tribe_target_zones = [tribe.target_zone for tribe in tribes]
        
            env = CrossingEnv(n_agents=num_agents,agent_colors=agent_colors, agent_tribes=agent_tribes, \
                  map_name=map_name, river_penalty=river_penalty, tribes=tribe_names, \
                  target_zones=tribe_target_zones, debug_agent=0) 

            # Used to accumulate episode stats for averaging
            cum_rewards = 0
            cum_crossed = 0
            cum_tags = 0
            cum_US_hits = 0
            cum_THEM_hits = 0
            cum_agent_rewards = [0 for agent in agents]
            cum_agent_tags = [0 for agent in agents]
            cum_agent_US_hits = [0 for agent in agents]
            cum_agent_THEM_hits = [0 for agent in agents]
            cum_tribe_rewards = [0 for t in tribes if t.name != 'Crazies']

            cuda = False
            start = time.time()

            for ep in range(max_episodes):
    
                print('.', end='')  # To show progress
    
                # Initialize AI and random agent data
                actions = [0 for i in range(num_agents)]
                tags = [0 for i in range(num_agents)]
                US_hits = [0 for i in range(num_agents)]
                THEM_hits = [0 for i in range(num_agents)]
            
                # Keep track of agents gathering from 2nd food pile
                crossed = [0 for i in range(num_ai_agents)]

                env_obs = env.reset()  # Environment returns observations
    
                # Unpack observations into data structure compatible with agent Policy
                agents_obs = unpack_env_obs(env_obs)
    
                for i in range(num_ai_agents):    # Reset agent info - laser tag statistics
                    agents[i].reset_info()    
    
                if render:
                    env.render()
                    time.sleep(SPEED)  # Change speed of video rendering
    
                """
                # For Debug only
                print (len(agents_obs))
                print (agents_obs[0].shape)
                """
    
                """
                For now, we do not stack observations, and we do not implement LSTM
    
                state = np.stack([state]*num_frames)

                # Reset LSTM hidden units when episode begins
                cx = Variable(torch.zeros(1, 256))
                hx = Variable(torch.zeros(1, 256))
                """
    
                index = 0
                position = trajectory [index]  # shift the target zone to first position in trajectory
                env.target_zones[0] = position['loc']  
                duration = position['duration']
    
                for frame in range(max_frames):
            
                    if (frame+1) % duration == 0:     # time to shift the target zone
                        index += 1                    # shift the target zone to new point in trajectory 
                        if index < len(trajectory):   
                            position = trajectory[index]  
                            duration = position['duration']
                            env.target_zones[0] = position['loc'] 
             
                    for i in range(num_ai_agents):    # For AI agents
                        actions[i], _ = select_action(agents[i], agents_obs[i], cuda=cuda)
                        if actions[i] == 6:  # compare by value; `is` tests identity and is unreliable here
                            tags[i] += 1   # record a tag for assessing aggressiveness
                
                    for i in range(num_ai_agents, num_agents):   # For random agents
                        actions[i] = agents[i].select_action(agents_obs[i])
                        if actions[i] == 6:
                            tags[i] += 1   # record a tag for assessing aggressiveness
        
                    """
                    For now, we do not implement LSTM
                    # Select action
                    action, log_prob, state_value, (hx,cx)  = select_action(model, state, (hx,cx))        
                    """

                    # if frame % 10 == 0:
                    #     print (actions)    
            
                    # Perform step        
                    env_obs, reward, done, info = env.step(actions)
        
                    """
                    For Debug only
                    print (env_obs)
                    print (reward)
                    print (done) 
                    """

                    for i in range(num_ai_agents):
                        agents[i].rewards.append(reward[i])  # Stack rewards

        
                    # Unpack observations into data structure compatible with agent Policy
                    agents_obs = unpack_env_obs(env_obs)
                    load_info(agents, info, narrate=False)   # Load agent info for AI agents
        
                    for i in range(num_agents):
                        US_hits[i] += agents[i].US_hit
                        THEM_hits[i] += agents[i].THEM_hit
            
                    """
                    For now, we do not stack observation, may come in handy later on
        
                    # Evict oldest diff add new diff to state
                    next_state = np.stack([next_state]*num_frames)
                    next_state[1:, :, :] = state[:-1, :, :]
                    state = next_state
                    """
        
                    if render and ep == 0:   # render only the 1st episode per batch of 30
                        env.render()
                        time.sleep(SPEED)  # Change speed of video rendering

                    if any(done):
                        print("Done after {} frames".format(frame))
                        break
                    
                    for (i, loc) in env.consumption:
                        if loc[0] > second_pile_x:
                            # print ('agent {} gathered an apple in 2nd pile'.format(i))
                            crossed[i] = 1
            
                # Print out statistics of AI agents
                ep_rewards = 0
                ep_tags = 0
                ep_US_hits = 0
                ep_THEM_hits = 0
                ep_crossed = sum(crossed)     # calculated num agents gathering in 2nd pile for episode

                if verbose:
                    print ('\nStatistics by Agent')
                    print ('===================')
                for i in range(num_ai_agents):
                    agent_tags = sum(agents[i].tag_hist)
                    ep_tags += agent_tags
                    cum_agent_tags[i] += agent_tags

                    agent_reward = sum(agents[i].rewards)
                    ep_rewards += agent_reward
                    cum_agent_rewards[i] += agent_reward

                    agent_US_hits = sum(agents[i].US_hits)
                    agent_THEM_hits = sum(agents[i].THEM_hits)
                    ep_US_hits += agent_US_hits
                    ep_THEM_hits += agent_THEM_hits
                    cum_agent_US_hits[i] += agent_US_hits
                    cum_agent_THEM_hits[i] += agent_THEM_hits
        
                    if verbose:
                        # print ("Agent{} aggressiveness is {:.2f}".format(i, agent_tags/frame))
                        print ("Agent{} reward is {:d}".format(i, agent_reward))
                        # print('US agents hit = {}'.format(agent_US_hits))
                        # print('THEM agents hit = {}'.format(agent_THEM_hits ))
        
                cum_rewards += ep_rewards
                cum_crossed += ep_crossed
                cum_tags += ep_tags
                cum_US_hits += ep_US_hits
                cum_THEM_hits += ep_THEM_hits
    
                if verbose:
                    print ('\nStatistics in Aggregate')
                    print ('=======================')
                    print ('Total rewards gathered = {}'.format(ep_rewards))
                    print ('Num agents crossed = {}'.format(ep_crossed))
                    # print ('Num laser fired = {}'.format(ep_tags))
                    # print ('Total US Hit (friendly fire) = {}'.format(ep_US_hits))
                    # print ('Total THEM Hit = {}'.format(ep_THEM_hits))
                    # print ('friendly fire (%) = {0:.3f}'.format(ep_US_hits/(ep_US_hits+ep_THEM_hits+1e-7)))

                if verbose:
                    print ('\nStatistics by Tribe')
                    print ('===================')
                for i, t in enumerate(tribes):
                    if t.name != 'Crazies':
                        ep_tribe_reward = sum(t.sum_rewards())
                        cum_tribe_rewards[i] += ep_tribe_reward
                        if verbose:
                            print ('Tribe {} has total reward of {}'.format(t.name, ep_tribe_reward))

                for i in range(num_ai_agents):
                    agents[i].clear_history()

            env.close()  # Close the rendering window
            end = time.time()

            print ('\nAverage Statistics in Aggregate')
            print ('=================================')
            total_rewards = cum_rewards/max_episodes
            print ('Total rewards gathered = {:.1f}'.format(total_rewards))
            av_agent_reward[traj_num][dir_num][eps_num] = cum_rewards/max_episodes/num_ai_agents
            print ('Av. agent reward = {:.2f}'.format(av_agent_reward[traj_num][dir_num][eps_num]))
            av_agent_crossed[traj_num][dir_num][eps_num] = cum_crossed/max_episodes
            print ('Agents crossed (2nd food pile) = {:.1f}'.format(av_agent_crossed[traj_num][dir_num][eps_num]))
            # print ('Num laser fired = {:.1f}'.format(cum_tags/max_episodes))
            # print ('Total US Hit (friendly fire) = {:.1f}'.format(cum_US_hits/max_episodes))
            # print ('Total THEM Hit = {:.1f}'.format(cum_THEM_hits/max_episodes))
            # print ('friendly fire (%) = {:.3f}'.format(cum_US_hits/(cum_US_hits+cum_THEM_hits+1e-7)))

            print ('\nAverage Statistics by Tribe')
            print ('=============================')
       
            for i, tribe in enumerate(tribes):
                if tribe.name != 'Crazies':
                    tribe_reward = cum_tribe_rewards[i]/max_episodes
                    print ('Tribe {} has total reward of {:.1f}'.format(tribe.name, tribe_reward))    
                
                    # Keep track of dominating team and the rewards gathered (only if more than 1 tribe)
                    if len(tribes) > 1:
                        if tribe_reward > dom_tribe_reward[traj_num][dir_num][eps_num]:   
                            dom_tribe_reward[traj_num][dir_num][eps_num] = tribe_reward
                            dominating_tribe[traj_num][dir_num][eps_num]  = tribe.name

            # Team dominance calculation (only if more than 1 tribe)
            if len(tribes) > 1:
                print ('Dominating Tribe: {}'.format(dominating_tribe[traj_num][dir_num][eps_num]))
                dominance[traj_num][dir_num][eps_num] = dom_tribe_reward[traj_num][dir_num][eps_num]/((total_rewards - \
                                                dom_tribe_reward[traj_num][dir_num][eps_num]+1.1e-7)/(len(tribes)-1))    
                print ('Team dominance: {0:.2f}x'.format(dominance[traj_num][dir_num][eps_num]))

            print ('\nAverage Statistics by Agent')
            print ('=============================')
            for i in range(num_ai_agents):
                # print ("Agent{} of {} aggressiveness is {:.2f}".format(i, agents[i].tribe, \
                #                                               cum_agent_tags[i]/(max_episodes*max_frames)))
                print ("Agent{} reward is {:.1f}".format(i, cum_agent_rewards[i]/max_episodes))
                # print('US agents hit = {:.1f}'.format(cum_agent_US_hits[i]/max_episodes))
                # print('THEM agents hit = {:.1f}'.format(cum_agent_THEM_hits[i]/max_episodes))

            print('Training time per epochs: {:.2f} sec'.format((end-start)/max_episodes))

            # print dominating team and dominance factor (only if more than 1 tribe)
            if len(tribes) > 1:
                for tribe in dominating_tribe:   # Dominating team
                    print(tribe)
                for value in dominance:      # Team dominance
                    print(value)

# Note: Statistics for Research Report        
for reward in av_agent_reward:   # Average agent reward
    print(reward)
for agents_crossed in av_agent_crossed:   # Average num agents gathering in 2nd food pile
    print(agents_crossed)



###### Trajectory = T1 #######
###### Dir = models/1T-10L/followers/food_d37/pacifist_follower/tr5.0_t2.0_rp-1.0_300gs_s1/ #######
###### Trained episodes = 500 #######
..............................
Average Statistics in Aggregate
Total rewards gathered = 162.3
Av. agent reward = 16.23
Agents crossed (2nd food pile) = 6.1

Average Statistics by Tribe
Tribe Vikings has total reward of 162.3

Average Statistics by Agent
Agent0 reward is 8.8
Agent1 reward is 1.1
Agent2 reward is 21.0
Agent3 reward is 15.7
Agent4 reward is 34.4
Agent5 reward is 0.6
Agent6 reward is 11.6
Agent7 reward is 12.9
Agent8 reward is 35.9
Agent9 reward is 20.4
Training time per epochs: 10.49 sec
###### Trained episodes = 1000 #######
..............................
Average Statistics in Aggregate
Total rewards gathered = 155.2
Av. agent reward = 15.52
Agents crossed (2nd food pile) = 7.4

Average Statistics by Tribe
Tribe Vikings has total reward of 155.2

Average Statistics by Agent
Agent0 reward is 14.6
Agent1 re

..............................
Average Statistics in Aggregate
Total rewards gathered = 211.8
Av. agent reward = 21.18
Agents crossed (2nd food pile) = 7.9

Average Statistics by Tribe
Tribe Vikings has total reward of 211.8

Average Statistics by Agent
Agent0 reward is 24.3
Agent1 reward is 20.0
Agent2 reward is 10.3
Agent3 reward is 19.8
Agent4 reward is 31.3
Agent5 reward is 20.0
Agent6 reward is 47.5
Agent7 reward is 12.5
Agent8 reward is 9.8
Agent9 reward is 16.3
Training time per epochs: 9.49 sec
###### Trained episodes = 1500 #######
..............................
Average Statistics in Aggregate
Total rewards gathered = 213.4
Av. agent reward = 21.34
Agents crossed (2nd food pile) = 8.0

Average Statistics by Tribe
Tribe Vikings has total reward of 213.4

Average Statistics by Agent
Agent0 reward is 44.6
Agent1 reward is 20.4
Agent2 reward is 28.4
Agent3 reward is 20.2
Agent4 reward is 28.8
Agent5 reward is 2.7
Agent6 reward is 21.9
Agent7 reward is 16.9
Agent8 reward is 7.3
Age

..............................
Average Statistics in Aggregate
Total rewards gathered = 179.2
Av. agent reward = 17.92
Agents crossed (2nd food pile) = 7.4

Average Statistics by Tribe
Tribe Vikings has total reward of 179.2

Average Statistics by Agent
Agent0 reward is 0.0
Agent1 reward is 21.6
Agent2 reward is 26.6
Agent3 reward is 30.7
Agent4 reward is 5.8
Agent5 reward is 16.9
Agent6 reward is 16.3
Agent7 reward is 19.7
Agent8 reward is 15.7
Agent9 reward is 25.9
Training time per epochs: 11.37 sec
###### Trained episodes = 2000 #######
..............................
Average Statistics in Aggregate
Total rewards gathered = 163.7
Av. agent reward = 16.37
Agents crossed (2nd food pile) = 7.0

Average Statistics by Tribe
Tribe Vikings has total reward of 163.7

Average Statistics by Agent
Agent0 reward is 0.0
Agent1 reward is 22.0
Agent2 reward is 21.4
Agent3 reward is 23.6
Agent4 reward is 5.8
Agent5 reward is 18.4
Agent6 reward is 3.1
Agent7 reward is 20.7
Agent8 reward is 18.8
Agen

..............................
Average Statistics in Aggregate
Total rewards gathered = 135.4
Av. agent reward = 13.54
Agents crossed (2nd food pile) = 6.5

Average Statistics by Tribe
Tribe Vikings has total reward of 135.4

Average Statistics by Agent
Agent0 reward is 15.0
Agent1 reward is 23.7
Agent2 reward is 15.0
Agent3 reward is 13.1
Agent4 reward is 44.7
Agent5 reward is 0.0
Agent6 reward is 4.5
Agent7 reward is 0.4
Agent8 reward is 16.1
Agent9 reward is 3.0
Training time per epochs: 9.58 sec
###### Trained episodes = 2500 #######
..............................
Average Statistics in Aggregate
Total rewards gathered = 163.6
Av. agent reward = 16.36
Agents crossed (2nd food pile) = 5.2

Average Statistics by Tribe
Tribe Vikings has total reward of 163.6

Average Statistics by Agent
Agent0 reward is 29.0
Agent1 reward is 23.2
Agent2 reward is 9.4
Agent3 reward is 46.0
Agent4 reward is 29.1
Agent5 reward is 0.0
Agent6 reward is 22.9
Agent7 reward is 2.6
Agent8 reward is 1.3
Agent9 r

..............................
Average Statistics in Aggregate
Total rewards gathered = 186.3
Av. agent reward = 18.63
Agents crossed (2nd food pile) = 8.1

Average Statistics by Tribe
Tribe Vikings has total reward of 186.3

Average Statistics by Agent
Agent0 reward is 30.4
Agent1 reward is 22.5
Agent2 reward is 26.6
Agent3 reward is 17.1
Agent4 reward is 13.7
Agent5 reward is 16.2
Agent6 reward is 18.4
Agent7 reward is 12.0
Agent8 reward is 10.3
Agent9 reward is 19.2
Training time per epochs: 7.01 sec
###### Trained episodes = 3000 #######
..............................
Average Statistics in Aggregate
Total rewards gathered = 202.9
Av. agent reward = 20.29
Agents crossed (2nd food pile) = 6.7

Average Statistics by Tribe
Tribe Vikings has total reward of 202.9

Average Statistics by Agent
Agent0 reward is 31.2
Agent1 reward is 21.1
Agent2 reward is 29.2
Agent3 reward is 31.8
Agent4 reward is 23.3
Agent5 reward is 10.6
Agent6 reward is 12.6
Agent7 reward is 12.8
Agent8 reward is 5.5
A

..............................
Average Statistics in Aggregate
Total rewards gathered = 165.4
Av. agent reward = 16.54
Agents crossed (2nd food pile) = 5.4

Average Statistics by Tribe
Tribe Vikings has total reward of 165.4

Average Statistics by Agent
Agent0 reward is 0.0
Agent1 reward is 19.1
Agent2 reward is 51.0
Agent3 reward is 29.7
Agent4 reward is 5.4
Agent5 reward is 5.7
Agent6 reward is 0.3
Agent7 reward is 21.5
Agent8 reward is 14.0
Agent9 reward is 18.8
Training time per epochs: 7.94 sec
###### Dir = models/1T-10L/followers/food_d37/pacifist_follower/tr5.0_t2.0_rp-1.0_300gs_s3/ #######
###### Trained episodes = 500 #######
..............................
Average Statistics in Aggregate
Total rewards gathered = 210.6
Av. agent reward = 21.06
Agents crossed (2nd food pile) = 5.4

Average Statistics by Tribe
Tribe Vikings has total reward of 210.6

Average Statistics by Agent
Agent0 reward is 25.9
Agent1 reward is 7.4
Agent2 reward is 1.5
Agent3 reward is 15.4
Agent4 reward is 

..............................
Average Statistics in Aggregate
Total rewards gathered = 144.6
Av. agent reward = 14.46
Agents crossed (2nd food pile) = 6.0

Average Statistics by Tribe
Tribe Vikings has total reward of 144.6

Average Statistics by Agent
Agent0 reward is 0.2
Agent1 reward is 23.1
Agent2 reward is 19.2
Agent3 reward is 6.8
Agent4 reward is 18.6
Agent5 reward is 16.3
Agent6 reward is 29.4
Agent7 reward is 17.5
Agent8 reward is 8.8
Agent9 reward is 4.5
Training time per epochs: 9.75 sec
###### Trained episodes = 1000 #######
..............................
Average Statistics in Aggregate
Total rewards gathered = 184.5
Av. agent reward = 18.45
Agents crossed (2nd food pile) = 7.3

Average Statistics by Tribe
Tribe Vikings has total reward of 184.5

Average Statistics by Agent
Agent0 reward is 0.4
Agent1 reward is 18.9
Agent2 reward is 29.0
Agent3 reward is 31.1
Agent4 reward is 16.1
Agent5 reward is 20.7
Agent6 reward is 15.2
Agent7 reward is 23.5
Agent8 reward is 17.2
Agent

..............................
Average Statistics in Aggregate
Total rewards gathered = 145.1
Av. agent reward = 14.51
Agents crossed (2nd food pile) = 7.2

Average Statistics by Tribe
Tribe Vikings has total reward of 145.1

Average Statistics by Agent
Agent0 reward is 11.1
Agent1 reward is 29.5
Agent2 reward is 21.9
Agent3 reward is 8.0
Agent4 reward is 15.1
Agent5 reward is 0.0
Agent6 reward is 21.2
Agent7 reward is 19.6
Agent8 reward is 5.1
Agent9 reward is 13.5
Training time per epochs: 10.16 sec
###### Trained episodes = 1500 #######
..............................
Average Statistics in Aggregate
Total rewards gathered = 140.3
Av. agent reward = 14.03
Agents crossed (2nd food pile) = 6.8

Average Statistics by Tribe
Tribe Vikings has total reward of 140.3

Average Statistics by Agent
Agent0 reward is 22.1
Agent1 reward is 19.7
Agent2 reward is 22.5
Agent3 reward is 31.6
Agent4 reward is 11.4
Agent5 reward is 0.0
Agent6 reward is 13.0
Agent7 reward is 2.6
Agent8 reward is 5.1
Agent

..............................
Average Statistics in Aggregate
Total rewards gathered = 194.0
Av. agent reward = 19.40
Agents crossed (2nd food pile) = 8.0

Average Statistics by Tribe
Tribe Vikings has total reward of 194.0

Average Statistics by Agent
Agent0 reward is 28.4
Agent1 reward is 15.4
Agent2 reward is 25.1
Agent3 reward is 25.7
Agent4 reward is 24.9
Agent5 reward is 8.6
Agent6 reward is 18.0
Agent7 reward is 22.1
Agent8 reward is 7.0
Agent9 reward is 18.6
Training time per epochs: 11.53 sec
###### Trained episodes = 2000 #######
..............................
Average Statistics in Aggregate
Total rewards gathered = 184.2
Av. agent reward = 18.42
Agents crossed (2nd food pile) = 7.8

Average Statistics by Tribe
Tribe Vikings has total reward of 184.2

Average Statistics by Agent
Agent0 reward is 32.9
Agent1 reward is 10.8
Agent2 reward is 30.8
Agent3 reward is 20.0
Agent4 reward is 13.0
Agent5 reward is 3.7
Agent6 reward is 9.6
Agent7 reward is 31.1
Agent8 reward is 7.9
Agen

..............................
Average Statistics in Aggregate
Total rewards gathered = 147.7
Av. agent reward = 14.77
Agents crossed (2nd food pile) = 6.4

Average Statistics by Tribe
Tribe Vikings has total reward of 147.7

Average Statistics by Agent
Agent0 reward is 0.0
Agent1 reward is 22.4
Agent2 reward is 20.0
Agent3 reward is 29.0
Agent4 reward is 5.9
Agent5 reward is 21.0
Agent6 reward is 2.9
Agent7 reward is 13.3
Agent8 reward is 9.9
Agent9 reward is 23.3
Training time per epochs: 10.16 sec
###### Trained episodes = 2500 #######
..............................
Average Statistics in Aggregate
Total rewards gathered = 146.6
Av. agent reward = 14.66
Agents crossed (2nd food pile) = 6.7

Average Statistics by Tribe
Tribe Vikings has total reward of 146.6

Average Statistics by Agent
Agent0 reward is 0.0
Agent1 reward is 18.3
Agent2 reward is 13.2
Agent3 reward is 37.9
Agent4 reward is 15.5
Agent5 reward is 12.2
Agent6 reward is 3.0
Agent7 reward is 19.7
Agent8 reward is 7.7
Agent9

..............................
Average Statistics in Aggregate
Total rewards gathered = 125.2
Av. agent reward = 12.52
Agents crossed (2nd food pile) = 5.2

Average Statistics by Tribe
Tribe Vikings has total reward of 125.2

Average Statistics by Agent
Agent0 reward is 8.5
Agent1 reward is 22.3
Agent2 reward is 6.8
Agent3 reward is 39.6
Agent4 reward is 28.7
Agent5 reward is 0.0
Agent6 reward is 15.3
Agent7 reward is 3.9
Agent8 reward is 0.0
Agent9 reward is 0.0
Training time per epochs: 8.34 sec
###### Trained episodes = 3000 #######
..............................
Average Statistics in Aggregate
Total rewards gathered = 89.3
Av. agent reward = 8.93
Agents crossed (2nd food pile) = 4.1

Average Statistics by Tribe
Tribe Vikings has total reward of 89.3

Average Statistics by Agent
Agent0 reward is 1.9
Agent1 reward is 16.9
Agent2 reward is 2.0
Agent3 reward is 36.1
Agent4 reward is 20.5
Agent5 reward is 0.0
Agent6 reward is 8.7
Agent7 reward is 0.6
Agent8 reward is 2.6
Agent9 reward i

..............................
Average Statistics in Aggregate
Total rewards gathered = 210.8
Av. agent reward = 21.08
Agents crossed (2nd food pile) = 6.5

Average Statistics by Tribe
Tribe Vikings has total reward of 210.8

Average Statistics by Agent
Agent0 reward is 18.8
Agent1 reward is 10.8
Agent2 reward is 18.3
Agent3 reward is 61.1
Agent4 reward is 11.3
Agent5 reward is 18.9
Agent6 reward is 10.5
Agent7 reward is 8.6
Agent8 reward is 20.5
Agent9 reward is 31.9
Training time per epochs: 5.33 sec
[[16.233333333333334, 15.52, 14.653333333333332, 12.559999999999999, 16.466666666666665, 12.496666666666666], [19.18, 19.826666666666668, 19.736666666666668, 16.106666666666666, 15.943333333333333, 14.343333333333334], [20.009999999999998, 21.17666666666667, 21.34, 19.96333333333333, 20.593333333333334, 18.43]]
[[16.6, 16.46, 15.883333333333335, 13.303333333333333, 16.553333333333335, 12.493333333333334], [18.766666666666666, 18.94, 17.923333333333332, 16.366666666666667, 16.336666666666

## Statistics for Research Report 

In [24]:
# Note: Statistics for Research Report   
print ('Average Agent Rewards')
for k, reward_traj in enumerate(av_agent_reward):   # Average agent reward
    print ("For Trajectory {}".format(k))
    for j, reward in enumerate(reward_traj):
        print (reward)
    
print ('Agents Crossed (2nd food pile)')    
for k, crossed_traj in enumerate(av_agent_crossed):   # Average num agents gathering in 2nd food pile
    print ("For Trajectory {}".format(k))
    for j, agents_crossed in enumerate(crossed_traj):
        print(agents_crossed)

Average Agent Rewards
For Trajectory 0
[16.233333333333334, 15.52, 14.653333333333332, 12.559999999999999, 16.466666666666665, 12.496666666666666]
[19.18, 19.826666666666668, 19.736666666666668, 16.106666666666666, 15.943333333333333, 14.343333333333334]
[20.009999999999998, 21.17666666666667, 21.34, 19.96333333333333, 20.593333333333334, 18.43]
For Trajectory 1
[16.6, 16.46, 15.883333333333335, 13.303333333333333, 16.553333333333335, 12.493333333333334]
[18.766666666666666, 18.94, 17.923333333333332, 16.366666666666667, 16.336666666666666, 14.966666666666665]
[21.166666666666664, 20.116666666666667, 21.903333333333332, 19.533333333333335, 18.903333333333332, 22.943333333333335]
For Trajectory 2
[15.363333333333333, 15.01, 16.21333333333333, 13.543333333333333, 16.363333333333333, 11.503333333333334]
[18.653333333333332, 18.856666666666666, 18.94, 16.10333333333333, 16.303333333333335, 15.026666666666667]
[20.343333333333334, 19.666666666666664, 20.533333333333335, 19.473333333333333, 

In [9]:
env.close()

## Multiple Game-Play - Trajectories (Map = food_d37_river_w1_d25)

Our research requires agent and team metrics averaged over 30 episodes of game-play. The two metrics are:

* Average agent reward - average number of apples gathered per agent per episode  
* Agents crossed - average number of agents gathering apples at the 2nd food pile per episode
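
As a minimal sketch of how these two metrics are averaged, assume each episode yields a list of per-agent total rewards and a 0/1 flag per agent recording whether it gathered at the 2nd food pile. The function name `average_metrics` and the toy numbers are hypothetical and only illustrate the arithmetic; they are not part of the notebook's code.

```python
def average_metrics(episode_rewards, episode_crossed):
    """Average the two report metrics over a batch of episodes.

    episode_rewards: list of episodes, each a list of per-agent total rewards
    episode_crossed: list of episodes, each a list of 0/1 flags per agent
    """
    num_episodes = len(episode_rewards)
    num_agents = len(episode_rewards[0])

    total_reward = sum(sum(ep) for ep in episode_rewards)
    total_crossed = sum(sum(ep) for ep in episode_crossed)

    # Average reward per agent per episode
    av_agent_reward = total_reward / num_episodes / num_agents
    # Average number of agents at the 2nd food pile per episode
    av_agents_crossed = total_crossed / num_episodes
    return av_agent_reward, av_agents_crossed


# Toy data: 2 episodes, 3 agents (illustrative values only)
rewards = [[10, 0, 20], [30, 0, 0]]
crossed = [[1, 0, 1], [1, 0, 0]]
av_reward, av_crossed = average_metrics(rewards, crossed)
print(av_reward, av_crossed)
```

The batch run divides the cumulative reward by the number of episodes and then by the number of AI agents, and divides the cumulative crossed count by the number of episodes only.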

<img src="images/Crossing-river-leadfollow.png" width="600">

This is a batch run of 30-episode game-plays over:  
(1) Trajectories  
(2) Training parameters (starting temp, steps/episode, static target loc, or moving target trajectory)  
(3) Episodes trained (500, 1000, 1500, 2000, 2500, 3000)  
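
Each trajectory is a list of waypoints, each with a target zone location (`loc`) and a `duration` in game steps, and the episode length is the sum of the durations. One way to make the target zone schedule explicit is to map a frame index to the active waypoint through cumulative durations, which does not depend on modulo arithmetic when waypoint durations differ. This is only a sketch: `active_waypoint` and the sample trajectory (including the tuple form of `loc`) are assumptions for illustration, not the notebook's own helper or data.

```python
from bisect import bisect_right
from itertools import accumulate


def active_waypoint(trajectory, frame):
    """Return the waypoint whose time window contains `frame`.

    Frames past the end of the trajectory stay at the final waypoint.
    """
    # Cumulative end frame (exclusive) of each waypoint's window
    ends = list(accumulate(point['duration'] for point in trajectory))
    index = bisect_right(ends, frame)
    return trajectory[min(index, len(trajectory) - 1)]


# Hypothetical trajectory: three waypoints with different durations
sample = [
    {'loc': (10, 5), 'duration': 100},
    {'loc': (30, 5), 'duration': 50},
    {'loc': (50, 5), 'duration': 200},
]

# Game steps per episode follow the trajectory pacing
max_frames = sum(point['duration'] for point in sample)

for frame in (0, 99, 100, 150, max_frames - 1):
    print(frame, active_waypoint(sample, frame)['loc'])
```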

In [10]:
import pickle
import numpy as np

import torch
from torch.autograd import Variable

dir_names = [
    # Agents trained using moving target zone and map = food_d37_river_w1_d25
    "models/1T-10L/followers_trajectory/food_d37_river_w1_d25/pacifist_follower/tr5.0_t1.5_rp-1.0_600gs_s1/",   # scenario=11
    "models/1T-10L/followers_trajectory/food_d37_river_w1_d25/pacifist_follower/tr5.0_t1.5_rp-1.0_500gs_s2/",   # scenario=12
    "models/1T-10L/followers_trajectory/food_d37_river_w1_d25/pacifist_follower/tr5.0_t2.0_rp-1.0_600gs_s1/",   # scenario=13
    "models/1T-10L/followers_trajectory/food_d37_river_w1_d25/pacifist_follower/tr5.0_t2.0_rp-1.0_500gs_s2/"   # scenario=14
]

episodes = [500, 1000, 1500, 2000, 2500, 3000] 

game = 'Crossing'
map_name = "food_d37_river_w1_d25"
culture = "pacifist_follower"

# Performance Statistics - for Research Report
av_agent_reward = [[[0 for i in episodes] for j in dir_names] for k in trajectories]
av_agent_crossed = [[[0 for i in episodes] for j in dir_names] for k in trajectories]  
dominating_tribe = [[[None for i in episodes] for j in dir_names] for k in trajectories]
dom_tribe_reward = [[[0 for i in episodes] for j in dir_names] for k in trajectories]
dominance = [[[0 for i in episodes] for j in dir_names] for k in trajectories]

# There will be 10 agents - 1 team of 10 AI agents and 0 random agents
num_ai_agents = 10
num_rdn_agents = 0
num_agents = num_ai_agents+num_rdn_agents  # just the sum of the two

# Data structure for AI agents (agents will form their own Class later on)
agents = []
actions = []
tags = []

# Initialize environment
render = True
SPEED = 1/30
num_actions = 8                       # There are 8 actions defined in Gathering

# Initialize constants
num_frames = 7
max_episodes = 30
# max_frames = 800
verbose = False

# Initialize parameters for Crossing and Explore
river_penalty = -1
crossed = [0 for i in range(num_ai_agents)]  # Keep track of agents gathering from 2nd food pile
second_pile_x = 50   # x-coordinate of the 2nd food pile
jumping_zone = False

for traj_num, trajectory in enumerate(trajectories):
    print ("###### Trajectory = T{} #######".format(traj_num+1))
    
    # Adjust game steps per episode based on the trajectory pacing
    max_frames=0
    for point in trajectory:
        max_frames += point['duration']

    position = trajectory [0]  # shift the target zone to first position in trajectory
    target_zone = position['loc']  
    duration = position['duration']
    
    for dir_num, dir_name in enumerate(dir_names):
        print ("###### Dir = {} #######".format(dir_name))
    
        for eps_num, eps in enumerate(episodes):
            print ("###### Trained episodes = {} #######".format(eps))
    
            # Load models for AI agents
            agents= [[] for i in range(num_ai_agents)]
            # If episodes is provided (not 0), load the model for each AI agent
            for i in range(num_ai_agents):
                model_file = dir_name+'MA{}_{}_ep{}.p'.format(i,game,eps)
                try:
                    with open(model_file, 'rb') as f:
                        print("Load saved model for agent {}".format(i))
                        agent = Policy(num_frames, num_actions, 0)
                        optimizer = optim.Adam(agent.parameters(), lr=0.1)

                        # New way to save and load models - based on: 
                        # https://pytorch.org/tutorials/beginner/saving_loading_models.html
                        _ = load_model(agent, optimizer, f)
                        agent.eval()
                        agents[i] = agent
                except OSError:
                    print('Model file not found.')
                    raise

            # Load random agents    
            for i in range(num_ai_agents,num_agents):
                # print("Load random agent {}".format(i))
                agents.append(Rdn_Policy())
        
            # Establish tribal association
            tribes = []
            tribes.append(Tribe(name='Vikings',color='blue', culture=culture, \
                    agents=[agents[0], agents[1], agents[2], agents[3], agents[4], \
                           agents[5], agents[6], agents[7], agents[8], agents[9]]))
            tribes[0].set_target_zone(target_zone)

            # Set up agent and tribe info to pass into env
            agent_colors = [agent.color for agent in agents]
            agent_tribes = [agent.tribe for agent in agents]
            tribe_names = [tribe.name for tribe in tribes]
            tribe_target_zones = [tribe.target_zone for tribe in tribes]
        
            env = CrossingEnv(n_agents=num_agents,agent_colors=agent_colors, agent_tribes=agent_tribes, \
                  map_name=map_name, river_penalty=river_penalty, tribes=tribe_names, \
                  target_zones=tribe_target_zones, debug_agent=0) 

            # Used to accumulate episode stats for averaging
            cum_rewards = 0
            cum_crossed = 0
            cum_tags = 0
            cum_US_hits = 0
            cum_THEM_hits = 0
            cum_agent_rewards = [0 for agent in agents]
            cum_agent_tags = [0 for agent in agents]
            cum_agent_US_hits = [0 for agent in agents]
            cum_agent_THEM_hits = [0 for agent in agents]
            cum_tribe_rewards = [0 for t in tribes if t.name != 'Crazies']

            cuda = False
            start = time.time()

            for ep in range(max_episodes):
    
                print('.', end='')  # To show progress
    
                # Initialize AI and random agent data
                actions = [0 for i in range(num_agents)]
                tags = [0 for i in range(num_agents)]
                US_hits = [0 for i in range(num_agents)]
                THEM_hits = [0 for i in range(num_agents)]
            
                # Keep track of agents gathering from 2nd food pile
                crossed = [0 for i in range(num_ai_agents)]

                env_obs = env.reset()  # Environment return observations
                """
                # For Debug only
                print (len(agents_obs))
                print (agents_obs[0].shape)
                """
    
                # Unpack observations into data structure compatible with agent Policy
                agents_obs = unpack_env_obs(env_obs)
    
                for i in range(num_ai_agents):    # Reset agent info - laser tag statistics
                    agents[i].reset_info()    
    
                if render:
                    env.render()
                    time.sleep(SPEED)  # Change speed of video rendering
    
                """
                # For Debug only
                print (len(agents_obs))
                print (agents_obs[0].shape)
                """
    
                """
                For now, we do not stack observations, and we do not implement LSTM
    
                state = np.stack([state]*num_frames)

                # Reset LSTM hidden units when episode begins
                cx = Variable(torch.zeros(1, 256))
                hx = Variable(torch.zeros(1, 256))
                """
    
                index = 0
                position = trajectory [index]  # shift the target zone to first position in trajectory
                env.target_zones[0] = position['loc']  
                duration = position['duration']
    
                for frame in range(max_frames):
            
                    if (frame+1) % duration == 0:     # time to shift the target zone
                        index += 1                    # shift the target zone to new point in trajectory 
                        if index < len(trajectory):   
                            position = trajectory[index]  
                            duration = position['duration']
                            env.target_zones[0] = position['loc'] 
             
                    for i in range(num_ai_agents):    # For AI agents
                        actions[i], _ = select_action(agents[i], agents_obs[i], cuda=cuda)
                        if actions[i] == 6:  # compare by value; `is` tests identity, not equality
                            tags[i] += 1   # record a tag for assessing aggressiveness
                
                    for i in range(num_ai_agents, num_agents):   # For random agents
                        actions[i] = agents[i].select_action(agents_obs[i])
                        if actions[i] == 6:
                            tags[i] += 1   # record a tag for assessing aggressiveness
        
                    """
                    For now, we do not implement LSTM
                    # Select action
                    action, log_prob, state_value, (hx,cx)  = select_action(model, state, (hx,cx))        
                    """

                    # if frame % 10 == 0:
                    #     print (actions)    
            
                    # Perform step        
                    env_obs, reward, done, info = env.step(actions)
        
                    """
                    For Debug only
                    print (env_obs)
                    print (reward)
                    print (done) 
                    """

                    for i in range(num_ai_agents):
                        agents[i].rewards.append(reward[i])  # Stack rewards

        
                    # Unpack observations into data structure compatible with agent Policy
                    agents_obs = unpack_env_obs(env_obs)
                    load_info(agents, info, narrate=False)   # Load agent info for AI agents
        
                    for i in range(num_agents):
                        US_hits[i] += agents[i].US_hit
                        THEM_hits[i] += agents[i].THEM_hit
            
                    """
                    For now, we do not stack observation, may come in handy later on
        
                    # Evict oldest diff add new diff to state
                    next_state = np.stack([next_state]*num_frames)
                    next_state[1:, :, :] = state[:-1, :, :]
                    state = next_state
                    """
        
                    if render and ep == 0:   # render only the 1st episode per batch of 30
                        env.render()
                        time.sleep(SPEED)  # Change speed of video rendering

                    if any(done):
                        print("Done after {} frames".format(frame))
                        break
                    
                    for (i, loc) in env.consumption:
                        if loc[0] > second_pile_x:
                            # print ('agent {} gathered an apple in 2nd pile'.format(i))
                            crossed[i] = 1
            
                # Print out statistics of AI agents
                ep_rewards = 0
                ep_tags = 0
                ep_US_hits = 0
                ep_THEM_hits = 0
                ep_crossed = sum(crossed)     # calculated num agents gathering in 2nd pile for episode

                if verbose:
                    print ('\nStatistics by Agent')
                    print ('===================')
                for i in range(num_ai_agents):
                    agent_tags = sum(agents[i].tag_hist)
                    ep_tags += agent_tags
                    cum_agent_tags[i] += agent_tags

                    agent_reward = sum(agents[i].rewards)
                    ep_rewards += agent_reward
                    cum_agent_rewards[i] += agent_reward

                    agent_US_hits = sum(agents[i].US_hits)
                    agent_THEM_hits = sum(agents[i].THEM_hits)
                    ep_US_hits += agent_US_hits
                    ep_THEM_hits += agent_THEM_hits
                    cum_agent_US_hits[i] += agent_US_hits
                    cum_agent_THEM_hits[i] += agent_THEM_hits
        
                    if verbose:
                        # print ("Agent{} aggressiveness is {:.2f}".format(i, agent_tags/frame))
                        print ("Agent{} reward is {:.1f}".format(i, agent_reward))
                        # print('US agents hit = {}'.format(agent_US_hits))
                        # print('THEM agents hit = {}'.format(agent_THEM_hits ))
        
                cum_rewards += ep_rewards
                cum_crossed += ep_crossed
                cum_tags += ep_tags
                cum_US_hits += ep_US_hits
                cum_THEM_hits += ep_THEM_hits
    
                if verbose:
                    print ('\nStatistics in Aggregate')
                    print ('=======================')
                    print ('Total rewards gathered = {}'.format(ep_rewards))
                    print ('Num agents crossed = {}'.format(ep_crossed))
                    # print ('Num laser fired = {}'.format(ep_tags))
                    # print ('Total US Hit (friendly fire) = {}'.format(ep_US_hits))
                    # print ('Total THEM Hit = {}'.format(ep_THEM_hits))
                    # print ('friendly fire (%) = {0:.3f}'.format(ep_US_hits/(ep_US_hits+ep_THEM_hits+1e-7)))

                if verbose:
                    print ('\nStatistics by Tribe')
                    print ('===================')
                for i, t in enumerate(tribes):
                    if t.name != 'Crazies':
                        ep_tribe_reward = sum(t.sum_rewards())
                        cum_tribe_rewards[i] += ep_tribe_reward
                        if verbose:
                            print ('Tribe {} has total reward of {}'.format(t.name, ep_tribe_reward))

                for i in range(num_ai_agents):
                    agents[i].clear_history()

            env.close()  # Close the rendering window
            end = time.time()

            print ('\nAverage Statistics in Aggregate')
            print ('=================================')
            total_rewards = cum_rewards/max_episodes
            print ('Total rewards gathered = {:.1f}'.format(total_rewards))
            av_agent_reward[traj_num][dir_num][eps_num] = cum_rewards/max_episodes/num_ai_agents
            print ('Av. agent reward = {:.2f}'.format(av_agent_reward[traj_num][dir_num][eps_num]))
            av_agent_crossed[traj_num][dir_num][eps_num] = cum_crossed/max_episodes
            print ('Agents crossed (2nd food pile) = {:.1f}'.format(av_agent_crossed[traj_num][dir_num][eps_num]))
            # print ('Num laser fired = {:.1f}'.format(cum_tags/max_episodes))
            # print ('Total US Hit (friendly fire) = {:.1f}'.format(cum_US_hits/max_episodes))
            # print ('Total THEM Hit = {:.1f}'.format(cum_THEM_hits/max_episodes))
            # print ('friendly fire (%) = {:.3f}'.format(cum_US_hits/(cum_US_hits+cum_THEM_hits+1e-7)))

            print ('\nAverage Statistics by Tribe')
            print ('=============================')
       
            for i, tribe in enumerate(tribes):
                if tribe.name != 'Crazies':
                    tribe_reward = cum_tribe_rewards[i]/max_episodes
                    print ('Tribe {} has total reward of {:.1f}'.format(tribe.name, tribe_reward))    
                
                    # Keep track of dominating team and the rewards gathered (only if more than 1 tribe)
                    if len(tribes) > 1:
                        if tribe_reward > dom_tribe_reward[traj_num][dir_num][eps_num]:   
                            dom_tribe_reward[traj_num][dir_num][eps_num] = tribe_reward
                            dominating_tribe[traj_num][dir_num][eps_num]  = tribe.name

            # Team dominance calculation (only if more than 1 tribe)
            if len(tribes) > 1:
                print ('Dominating Tribe: {}'.format(dominating_tribe[traj_num][dir_num][eps_num]))
                dominance[traj_num][dir_num][eps_num] = dom_tribe_reward[traj_num][dir_num][eps_num]/((total_rewards - \
                                                dom_tribe_reward[traj_num][dir_num][eps_num]+1.1e-7)/(len(tribes)-1))    
                print ('Team dominance: {0:.2f}x'.format(dominance[traj_num][dir_num][eps_num]))

            print ('\nAverage Statistics by Agent')
            print ('=============================')
            for i in range(num_ai_agents):
                # print ("Agent{} of {} aggressiveness is {:.2f}".format(i, agents[i].tribe, \
                #                                               cum_agent_tags[i]/(max_episodes*max_frames)))
                print ("Agent{} reward is {:.1f}".format(i, cum_agent_rewards[i]/max_episodes))
                # print('US agents hit = {:.1f}'.format(cum_agent_US_hits[i]/max_episodes))
                # print('THEM agents hit = {:.1f}'.format(cum_agent_THEM_hits[i]/max_episodes))

            print('Training time per epochs: {:.2f} sec'.format((end-start)/max_episodes))

            # print dominating team and dominance factor (only if more than 1 tribe)
            if len(tribes) > 1:
                for tribe in dominating_tribe:   # Dominating team
                    print(tribe)
                for value in dominance:      # Team dominance
                    print(value)

# Note: Statistics for Research Report        
for reward in av_agent_reward:   # Average agent reward
    print(reward)
for agents_crossed in av_agent_crossed:   # Average num agents gathering in 2nd food pile
    print(agents_crossed)



###### Trajectory = T1 #######
###### Dir = models/1T-10L/followers_trajectory/food_d37_river_w1_d25/pacifist_follower/tr5.0_t1.5_rp-1.0_600gs_s1/ #######
###### Trained episodes = 500 #######
..............................
Average Statistics in Aggregate
Total rewards gathered = 223.1
Av. agent reward = 22.31
Agents crossed (2nd food pile) = 6.1

Average Statistics by Tribe
Tribe Vikings has total reward of 223.1

Average Statistics by Agent
Agent0 reward is 4.7
Agent1 reward is 52.4
Agent2 reward is 13.8
Agent3 reward is 8.8
Agent4 reward is 22.6
Agent5 reward is 6.8
Agent6 reward is 47.8
Agent7 reward is 38.0
Agent8 reward is 11.8
Agent9 reward is 16.5
Training time per epochs: 5.53 sec
###### Trained episodes = 1000 #######
..............................
Average Statistics in Aggregate
Total rewards gathered = 234.4
Av. agent reward = 23.44
Agents crossed (2nd food pile) = 8.3

Average Statistics by Tribe
Tribe Vikings has total reward of 234.4

Average Statistics by Agent
Agent0 r

..............................
Average Statistics in Aggregate
Total rewards gathered = 161.9
Av. agent reward = 16.19
Agents crossed (2nd food pile) = 7.3

Average Statistics by Tribe
Tribe Vikings has total reward of 161.9

Average Statistics by Agent
Agent0 reward is 22.1
Agent1 reward is 9.1
Agent2 reward is 21.1
Agent3 reward is 8.5
Agent4 reward is 15.9
Agent5 reward is 29.0
Agent6 reward is 18.6
Agent7 reward is 21.3
Agent8 reward is 5.8
Agent9 reward is 10.5
Training time per epochs: 5.48 sec
###### Trained episodes = 1500 #######
..............................
Average Statistics in Aggregate
Total rewards gathered = 211.3
Av. agent reward = 21.13
Agents crossed (2nd food pile) = 7.2

Average Statistics by Tribe
Tribe Vikings has total reward of 211.3

Average Statistics by Agent
Agent0 reward is 30.1
Agent1 reward is -20.9
Agent2 reward is 22.9
Agent3 reward is 24.5
Agent4 reward is 14.7
Agent5 reward is 20.2
Agent6 reward is 23.1
Agent7 reward is 88.1
Agent8 reward is 1.3
Age

..............................
Average Statistics in Aggregate
Total rewards gathered = 226.9
Av. agent reward = 22.69
Agents crossed (2nd food pile) = 7.9

Average Statistics by Tribe
Tribe Vikings has total reward of 226.9

Average Statistics by Agent
Agent0 reward is -2.6
Agent1 reward is 99.8
Agent2 reward is 11.4
Agent3 reward is 8.3
Agent4 reward is 18.6
Agent5 reward is 10.2
Agent6 reward is 23.7
Agent7 reward is 23.9
Agent8 reward is 21.3
Agent9 reward is 12.3
Training time per epochs: 5.48 sec
###### Trained episodes = 2000 #######
..............................
Average Statistics in Aggregate
Total rewards gathered = 251.5
Av. agent reward = 25.15
Agents crossed (2nd food pile) = 7.5

Average Statistics by Tribe
Tribe Vikings has total reward of 251.5

Average Statistics by Agent
Agent0 reward is -2.7
Agent1 reward is 96.8
Agent2 reward is 24.2
Agent3 reward is 18.5
Agent4 reward is 21.7
Agent5 reward is 22.5
Agent6 reward is 19.0
Agent7 reward is 19.0
Agent8 reward is 20.3
A

..............................
Average Statistics in Aggregate
Total rewards gathered = 230.8
Av. agent reward = 23.08
Agents crossed (2nd food pile) = 7.3

Average Statistics by Tribe
Tribe Vikings has total reward of 230.8

Average Statistics by Agent
Agent0 reward is 31.9
Agent1 reward is 14.7
Agent2 reward is 16.7
Agent3 reward is 8.6
Agent4 reward is 1.0
Agent5 reward is 19.3
Agent6 reward is 22.1
Agent7 reward is 103.9
Agent8 reward is 0.0
Agent9 reward is 12.5
Training time per epochs: 7.36 sec
###### Trained episodes = 2500 #######
..............................
Average Statistics in Aggregate
Total rewards gathered = 165.3
Av. agent reward = 16.53
Agents crossed (2nd food pile) = 6.9

Average Statistics by Tribe
Tribe Vikings has total reward of 165.3

Average Statistics by Agent
Agent0 reward is 37.0
Agent1 reward is 11.8
Agent2 reward is 11.0
Agent3 reward is 23.2
Agent4 reward is 8.7
Agent5 reward is 3.4
Agent6 reward is 24.8
Agent7 reward is 28.3
Agent8 reward is 0.0
Agent

..............................
Average Statistics in Aggregate
Total rewards gathered = 228.5
Av. agent reward = 22.85
Agents crossed (2nd food pile) = 6.8

Average Statistics by Tribe
Tribe Vikings has total reward of 228.5

Average Statistics by Agent
Agent0 reward is 0.0
Agent1 reward is 80.4
Agent2 reward is 24.1
Agent3 reward is 13.5
Agent4 reward is 27.5
Agent5 reward is 19.1
Agent6 reward is 21.1
Agent7 reward is 19.3
Agent8 reward is 14.7
Agent9 reward is 8.8
Training time per epochs: 6.59 sec
###### Trained episodes = 3000 #######
..............................
Average Statistics in Aggregate
Total rewards gathered = 250.8
Av. agent reward = 25.08
Agents crossed (2nd food pile) = 7.3

Average Statistics by Tribe
Tribe Vikings has total reward of 250.8

Average Statistics by Agent
Agent0 reward is 0.0
Agent1 reward is 83.7
Agent2 reward is 22.0
Agent3 reward is 21.7
Agent4 reward is 20.4
Agent5 reward is 23.1
Agent6 reward is 22.1
Agent7 reward is 21.6
Agent8 reward is 18.2
Age

..............................
Average Statistics in Aggregate
Total rewards gathered = 126.8
Av. agent reward = 12.68
Agents crossed (2nd food pile) = 7.0

Average Statistics by Tribe
Tribe Vikings has total reward of 126.8

Average Statistics by Agent
Agent0 reward is 14.4
Agent1 reward is 0.0
Agent2 reward is 4.0
Agent3 reward is 21.3
Agent4 reward is 0.7
Agent5 reward is 21.8
Agent6 reward is 22.4
Agent7 reward is 19.6
Agent8 reward is -0.0
Agent9 reward is 22.7
Training time per epochs: 5.76 sec
###### Dir = models/1T-10L/followers_trajectory/food_d37_river_w1_d25/pacifist_follower/tr5.0_t2.0_rp-1.0_500gs_s2/ #######
###### Trained episodes = 500 #######
..............................
Average Statistics in Aggregate
Total rewards gathered = 192.9
Av. agent reward = 19.29
Agents crossed (2nd food pile) = 5.7

Average Statistics by Tribe
Tribe Vikings has total reward of 192.9

Average Statistics by Agent
Agent0 reward is 21.4
Agent1 reward is -2.6
Agent2 reward is 21.7
Agent3 rewar

..............................
Average Statistics in Aggregate
Total rewards gathered = 224.3
Av. agent reward = 22.43
Agents crossed (2nd food pile) = 6.5

Average Statistics by Tribe
Tribe Vikings has total reward of 224.3

Average Statistics by Agent
Agent0 reward is 12.6
Agent1 reward is 39.4
Agent2 reward is 2.3
Agent3 reward is 27.4
Agent4 reward is 42.5
Agent5 reward is 29.1
Agent6 reward is 31.9
Agent7 reward is 24.0
Agent8 reward is -26.4
Agent9 reward is 41.5
Training time per epochs: 5.58 sec
###### Trained episodes = 1000 #######
..............................
Average Statistics in Aggregate
Total rewards gathered = 211.4
Av. agent reward = 21.14
Agents crossed (2nd food pile) = 7.3

Average Statistics by Tribe
Tribe Vikings has total reward of 211.4

Average Statistics by Agent
Agent0 reward is 6.2
Agent1 reward is 22.7
Agent2 reward is 12.2
Agent3 reward is 35.0
Agent4 reward is 19.1
Agent5 reward is 11.5
Agent6 reward is 6.4
Agent7 reward is 23.4
Agent8 reward is 6.8
Age

..............................
Average Statistics in Aggregate
Total rewards gathered = 208.5
Av. agent reward = 20.85
Agents crossed (2nd food pile) = 6.8

Average Statistics by Tribe
Tribe Vikings has total reward of 208.5

Average Statistics by Agent
Agent0 reward is 21.0
Agent1 reward is 30.3
Agent2 reward is 18.6
Agent3 reward is 26.4
Agent4 reward is 21.1
Agent5 reward is 27.7
Agent6 reward is 19.8
Agent7 reward is 16.3
Agent8 reward is 13.9
Agent9 reward is 13.3
Training time per epochs: 5.53 sec
###### Trained episodes = 1500 #######
..............................
Average Statistics in Aggregate
Total rewards gathered = 230.4
Av. agent reward = 23.04
Agents crossed (2nd food pile) = 6.9

Average Statistics by Tribe
Tribe Vikings has total reward of 230.4

Average Statistics by Agent
Agent0 reward is 33.8
Agent1 reward is 21.9
Agent2 reward is 15.7
Agent3 reward is 26.6
Agent4 reward is 20.8
Agent5 reward is 32.1
Agent6 reward is 19.5
Agent7 reward is 26.2
Agent8 reward is 23.0


..............................
Average Statistics in Aggregate
Total rewards gathered = 200.8
Av. agent reward = 20.08
Agents crossed (2nd food pile) = 7.0

Average Statistics by Tribe
Tribe Vikings has total reward of 200.8

Average Statistics by Agent
Agent0 reward is 16.2
Agent1 reward is 41.7
Agent2 reward is 27.9
Agent3 reward is 26.3
Agent4 reward is 23.6
Agent5 reward is 0.3
Agent6 reward is 8.8
Agent7 reward is 10.3
Agent8 reward is 12.6
Agent9 reward is 33.2
Training time per epochs: 4.30 sec
###### Trained episodes = 2000 #######
..............................
Average Statistics in Aggregate
Total rewards gathered = 197.2
Av. agent reward = 19.72
Agents crossed (2nd food pile) = 7.0

Average Statistics by Tribe
Tribe Vikings has total reward of 197.2

Average Statistics by Agent
Agent0 reward is 18.7
Agent1 reward is 35.9
Agent2 reward is 16.2
Agent3 reward is 15.3
Agent4 reward is 14.5
Agent5 reward is 1.8
Agent6 reward is 2.3
Agent7 reward is 3.9
Agent8 reward is 52.5
Agent

..............................
Average Statistics in Aggregate
Total rewards gathered = 186.9
Av. agent reward = 18.69
Agents crossed (2nd food pile) = 7.0

Average Statistics by Tribe
Tribe Vikings has total reward of 186.9

Average Statistics by Agent
Agent0 reward is 30.1
Agent1 reward is 21.2
Agent2 reward is 4.7
Agent3 reward is 18.4
Agent4 reward is 12.1
Agent5 reward is 14.7
Agent6 reward is 16.7
Agent7 reward is 27.0
Agent8 reward is 26.0
Agent9 reward is 16.0
Training time per epochs: 4.32 sec
###### Trained episodes = 2500 #######
..............................
Average Statistics in Aggregate
Total rewards gathered = 206.3
Av. agent reward = 20.63
Agents crossed (2nd food pile) = 7.4

Average Statistics by Tribe
Tribe Vikings has total reward of 206.3

Average Statistics by Agent
Agent0 reward is 41.6
Agent1 reward is 18.4
Agent2 reward is 12.1
Agent3 reward is 14.0
Agent4 reward is 27.8
Agent5 reward is 11.8
Agent6 reward is 15.7
Agent7 reward is 24.0
Agent8 reward is 19.1
A

..............................
Average Statistics in Aggregate
Total rewards gathered = 168.0
Av. agent reward = 16.80
Agents crossed (2nd food pile) = 7.0

Average Statistics by Tribe
Tribe Vikings has total reward of 168.0

Average Statistics by Agent
Agent0 reward is 20.9
Agent1 reward is 28.5
Agent2 reward is 6.4
Agent3 reward is 13.1
Agent4 reward is 19.9
Agent5 reward is 1.3
Agent6 reward is 8.0
Agent7 reward is 6.9
Agent8 reward is 51.6
Agent9 reward is 11.4
Training time per epochs: 4.29 sec
###### Trained episodes = 3000 #######
..............................
Average Statistics in Aggregate
Total rewards gathered = 94.4
Av. agent reward = 9.44
Agents crossed (2nd food pile) = 6.3

Average Statistics by Tribe
Tribe Vikings has total reward of 94.4

Average Statistics by Agent
Agent0 reward is -0.6
Agent1 reward is -1.0
Agent2 reward is 3.9
Agent3 reward is 26.3
Agent4 reward is 11.7
Agent5 reward is 14.7
Agent6 reward is 18.4
Agent7 reward is 5.1
Agent8 reward is 3.3
Agent9 rew

..............................
Average Statistics in Aggregate
Total rewards gathered = 206.2
Av. agent reward = 20.62
Agents crossed (2nd food pile) = 7.9

Average Statistics by Tribe
Tribe Vikings has total reward of 206.2

Average Statistics by Agent
Agent0 reward is 55.4
Agent1 reward is 36.2
Agent2 reward is 13.3
Agent3 reward is 9.2
Agent4 reward is 17.5
Agent5 reward is 11.9
Agent6 reward is 31.6
Agent7 reward is 15.2
Agent8 reward is 18.2
Agent9 reward is -2.3
Training time per epochs: 4.30 sec
###### Trajectory = T7 #######
###### Dir = models/1T-10L/followers_trajectory/food_d37_river_w1_d25/pacifist_follower/tr5.0_t1.5_rp-1.0_600gs_s1/ #######
###### Trained episodes = 500 #######
..............................
Average Statistics in Aggregate
Total rewards gathered = 193.7
Av. agent reward = 19.37
Agents crossed (2nd food pile) = 5.6

Average Statistics by Tribe
Tribe Vikings has total reward of 193.7

Average Statistics by Agent
Agent0 reward is 4.9
Agent1 reward is 45.4
Ag

..............................
Average Statistics in Aggregate
Total rewards gathered = 169.7
Av. agent reward = 16.97
Agents crossed (2nd food pile) = 4.7

Average Statistics by Tribe
Tribe Vikings has total reward of 169.7

Average Statistics by Agent
Agent0 reward is 1.0
Agent1 reward is 14.3
Agent2 reward is 46.2
Agent3 reward is 8.6
Agent4 reward is 31.4
Agent5 reward is 34.6
Agent6 reward is 5.6
Agent7 reward is 31.0
Agent8 reward is -1.4
Agent9 reward is -1.5
Training time per epochs: 4.29 sec
###### Trained episodes = 1000 #######
..............................
Average Statistics in Aggregate
Total rewards gathered = 162.3
Av. agent reward = 16.23
Agents crossed (2nd food pile) = 6.9

Average Statistics by Tribe
Tribe Vikings has total reward of 162.3

Average Statistics by Agent
Agent0 reward is 13.2
Agent1 reward is 16.8
Agent2 reward is 19.0
Agent3 reward is 13.3
Agent4 reward is 19.7
Agent5 reward is 28.6
Agent6 reward is 17.5
Agent7 reward is 34.7
Agent8 reward is 2.2
Agen

..............................
Average Statistics in Aggregate
Total rewards gathered = 190.0
Av. agent reward = 19.00
Agents crossed (2nd food pile) = 7.2

Average Statistics by Tribe
Tribe Vikings has total reward of 190.0

Average Statistics by Agent
Agent0 reward is 2.5
Agent1 reward is 50.3
Agent2 reward is 21.2
Agent3 reward is 21.3
Agent4 reward is 10.0
Agent5 reward is 21.9
Agent6 reward is 9.3
Agent7 reward is 15.3
Agent8 reward is 17.6
Agent9 reward is 20.6
Training time per epochs: 3.68 sec
###### Trained episodes = 1500 #######
..............................
Average Statistics in Aggregate
Total rewards gathered = 195.6
Av. agent reward = 19.56
Agents crossed (2nd food pile) = 7.0

Average Statistics by Tribe
Tribe Vikings has total reward of 195.6

Average Statistics by Agent
Agent0 reward is -0.1
Agent1 reward is 57.2
Agent2 reward is 14.4
Agent3 reward is 18.9
Agent4 reward is 12.8
Agent5 reward is 13.2
Agent6 reward is 14.9
Agent7 reward is 24.0
Agent8 reward is 21.0
Ag

..............................
Average Statistics in Aggregate
Total rewards gathered = 169.1
Av. agent reward = 16.91
Agents crossed (2nd food pile) = 6.8

Average Statistics by Tribe
Tribe Vikings has total reward of 169.1

Average Statistics by Agent
Agent0 reward is 15.7
Agent1 reward is -0.9
Agent2 reward is 15.1
Agent3 reward is 20.9
Agent4 reward is 19.0
Agent5 reward is 21.2
Agent6 reward is 25.2
Agent7 reward is 38.0
Agent8 reward is 0.0
Agent9 reward is 15.0
Training time per epochs: 3.70 sec
###### Trained episodes = 2000 #######
..............................
Average Statistics in Aggregate
Total rewards gathered = 192.0
Av. agent reward = 19.20
Agents crossed (2nd food pile) = 6.5

Average Statistics by Tribe
Tribe Vikings has total reward of 192.0

Average Statistics by Agent
Agent0 reward is 34.7
Agent1 reward is 14.1
Agent2 reward is 12.6
Agent3 reward is 23.4
Agent4 reward is 1.7
Agent5 reward is 21.6
Agent6 reward is 14.8
Agent7 reward is 51.7
Agent8 reward is 0.0
Age

## Statistics for Research Report 
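
The cell under this heading prints nested lists of averages, one group per trajectory. As a minimal sketch of how one such group could be condensed for a report, the snippet below takes the Trajectory 0 reward values from the notebook output (rounded to two decimals) and computes a mean and a best column for each row. It assumes that each row is one saved-model configuration and each column is one evaluated checkpoint. The output does not label these axes, so that reading, along with the names `traj0_rewards`, `row_means`, and `best_cols`, is illustrative only.

```python
import numpy as np

# Trajectory 0 average agent rewards, copied from the notebook output and
# rounded to two decimals.
# ASSUMPTION: rows = saved-model configurations, columns = checkpoints.
# The notebook output does not label either axis.
traj0_rewards = np.array([
    [22.31, 23.44, 25.77, 22.30, 25.93, 22.99],
    [22.72, 21.12, 19.65, 22.85, 23.46, 12.66],
    [19.31, 16.19, 21.13, 22.07, 18.75, 12.89],
    [19.26, 21.33, 21.49, 20.82, 22.33, 29.21],
])

# Mean across columns, and the index of the highest value, for each row.
row_means = traj0_rewards.mean(axis=1)
best_cols = traj0_rewards.argmax(axis=1)

for row, (mean, col) in enumerate(zip(row_means, best_cols)):
    print("row {}: mean = {:.2f}, best column = {} ({:.2f})".format(
        row, mean, col, traj0_rewards[row, col]))
```

The same two reductions would apply to the `av_agent_crossed` groups, under the same assumption about what the axes mean.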

In [11]:
# Note: Statistics for Research Report
print('Average Agent Rewards')
for k, reward_traj in enumerate(av_agent_reward):   # Average agent reward
    print("For Trajectory {}".format(k))
    for reward in reward_traj:
        print(reward)

print('Agents Crossed (2nd food pile)')
for k, crossed_traj in enumerate(av_agent_crossed):   # Average num agents gathering in 2nd food pile
    print("For Trajectory {}".format(k))
    for agents_crossed in crossed_traj:
        print(agents_crossed)

Average Agent Rewards
For Trajectory 0
[22.313333333333333, 23.436666666666667, 25.76666666666667, 22.303333333333335, 25.926666666666666, 22.986666666666668]
[22.723333333333333, 21.12333333333333, 19.64666666666667, 22.846666666666668, 23.456666666666667, 12.66]
[19.31, 16.19, 21.130000000000003, 22.07333333333333, 18.753333333333334, 12.89]
[19.259999999999998, 21.326666666666668, 21.493333333333332, 20.82333333333333, 22.326666666666668, 29.21333333333333]
For Trajectory 1
[21.936666666666667, 23.496666666666666, 22.693333333333335, 25.15, 21.506666666666668, 25.826666666666664]
[23.393333333333334, 21.39666666666667, 13.793333333333333, 19.663333333333334, 18.883333333333333, 10.743333333333334]
[20.883333333333333, 14.6, 20.083333333333336, 23.080000000000002, 16.533333333333335, 14.943333333333333]
[19.946666666666665, 22.12666666666667, 22.71333333333333, 21.57, 23.11, 26.356666666666666]
For Trajectory 2
[23.19, 23.75, 23.716666666666665, 25.45, 22.85333333333333, 25.080000000