# Evaluate Baseline

### **1T-10L: 1 Team composed of 10 agents**

We run single- and multi-episode play to evaluate whether adjusting the temperature and the number of game steps per episode (Baseline) can induce more agents to cross over and gather from the 2nd food pile.

<img src="images/Crossing01.png" width="600">

<img src="images/Crossing-river.png" width="600">

The Crossing game presents a harder problem than the Gathering game. Instead of a single food pile, there are 2 food piles separated by a fixed distance or by a barrier:

* The smaller food pile is located close to the agents but has fewer food units than there are agents.
* The larger food pile has more food units than there are agents but is located farther away. The agents cannot see it unless they move away from the 1st food pile.
* If there is a river, an agent receives a -1.0 penalty for each game step it spends in the river.
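The toy tally below illustrates the trade-off. It assumes that each apple gathered is worth +1 reward, which is consistent with the apple-count metric used later. It also assumes that the river penalty is charged once per game step spent in the river. `episode_return` is a hypothetical helper, not part of `CrossingEnv`, and the apple counts are made up.

```python
def episode_return(apples_gathered, steps_in_river, river_penalty=-1.0, apple_reward=1.0):
    """Toy tally of one agent's episode reward in the Crossing game.

    Assumption (not taken from CrossingEnv): each apple is worth `apple_reward`,
    and `river_penalty` is charged once for every step spent in the river.
    """
    return apples_gathered * apple_reward + steps_in_river * river_penalty


# Hypothetical agent that wades through the river for 2 steps, then gathers 12 apples
print(episode_return(apples_gathered=12, steps_in_river=2))  # 10.0
# Hypothetical agent that stays at the contested 1st pile and wins 3 apples
print(episode_return(apples_gathered=3, steps_in_river=0))   # 3.0
```

The penalty arrives immediately, while the apples beyond the river are collected only many steps later. That delay is one way to see why the crossing is hard to learn.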

The game thus deals with two challenging issues that are difficult for reinforcement learning algorithms:

1. Sparse reward - an agent must travel a long distance with no reward to reach the 2nd food pile
2. Local optimum - the smaller 1st food pile is nearby and visible, so agents can settle on gathering there instead of exploring
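The baseline varies the exploration temperature (`temp_start`) to push agents past both issues. The sketch below assumes the policy samples actions from a Boltzmann distribution, softmax(logits / T). Raising T flattens that distribution, so low-preference moves (such as walking away from the 1st pile) get sampled more often. `action_probs`, `entropy` and the logits are hypothetical. The actual `select_action` in `teams_model` may apply the temperature differently or anneal it from `temp_start` during training.

```python
import math


def action_probs(logits, temperature=1.0):
    """Boltzmann (softmax) action distribution at a given temperature.

    Assumption: actions are sampled from softmax(logits / temperature);
    the real select_action in teams_model may differ.
    """
    scaled = [x / temperature for x in logits]
    top = max(scaled)                          # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; the maximum for 8 actions is ln(8), about 2.08."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Hypothetical logits for the 8 actions: the policy strongly prefers action 0
logits = [4.0, 1.0, 0.5, 0.5, 0.0, 0.0, -1.0, -1.0]
for t in (1.0, 2.0, 8.0):
    p = action_probs(logits, t)
    print("T={:<4} p(action 0)={:.2f}  entropy={:.2f}".format(t, p[0], entropy(p)))
```

As T grows, the probability of the preferred action drops and the entropy climbs toward its maximum. In principle, this is what gives agents more chances to stumble onto the distant pile.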

In [1]:
import os
import random
import time
import platform
import torch
import gym
import numpy as np
import pickle

# This is the Crossing game environment
from teams_env import CrossingEnv
from teams_model import *
from interface import *

import matplotlib.pyplot as plt

%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# for auto-reloading external modules
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2

print("Python version: ", platform.python_version())
print("Pytorch version: {}".format(torch.__version__))
print("OpenAI Gym version: {}".format(gym.__version__))

Python version:  3.6.8
Pytorch version: 1.0.1.post2
OpenAI Gym version: 0.9.2


## Trained Models

The code block below lists the folders holding the trained models of the follower agents, along with the parameters used to train them.

In [2]:
folders = [
    # Agents trained in map = food_d37
    'models/1T-10L/baseline/food_d37/pacifist/t1.0_rp-1.0_300gs/',   # scenario=1
    'models/1T-10L/baseline/food_d37/pacifist/t1.0_rp-1.0_300gs/',   # scenario=2 (duplicate of scenario 1; kept so later scenario numbers stay unchanged)
    'models/1T-10L/baseline/food_d37/pacifist/t1.0_rp-1.0_600gs/',   # scenario=3
    'models/1T-10L/baseline/food_d37/pacifist/t1.25_rp-1.0_300gs/',   # scenario=4
    'models/1T-10L/baseline/food_d37/pacifist/t1.25_rp-1.0_600gs/',   # scenario=5
    'models/1T-10L/baseline/food_d37/pacifist/t1.5_rp-1.0_300gs/',   # scenario=6
    'models/1T-10L/baseline/food_d37/pacifist/t1.5_rp-1.0_600gs/',   # scenario=7
    'models/1T-10L/baseline/food_d37/pacifist/t1.5_rp-1.0_1200gs/',   # scenario=8
    'models/1T-10L/baseline/food_d37/pacifist/t2.0_rp-1.0_300gs/',   # scenario=9
    'models/1T-10L/baseline/food_d37/pacifist/t2.0_rp-1.0_600gs/',   # scenario=10
    'models/1T-10L/baseline/food_d37/pacifist/t2.0_rp-1.0_1200gs/',   # scenario=11
    'models/1T-10L/baseline/food_d37/pacifist/t4.0_rp-1.0_300gs/',   # scenario=12
    'models/1T-10L/baseline/food_d37/pacifist/t4.0_rp-1.0_600gs/',   # scenario=13
    'models/1T-10L/baseline/food_d37/pacifist/t4.0_rp-1.0_1200gs/',   # scenario=14
    'models/1T-10L/baseline/food_d37/pacifist/t8.0_rp-1.0_300gs/',   # scenario=15
    'models/1T-10L/baseline/food_d37/pacifist/t8.0_rp-1.0_600gs/',   # scenario=16
    'models/1T-10L/baseline/food_d37/pacifist/t8.0_rp-1.0_1200gs/',   # scenario=17

    # Agents trained in map = food_d37_river_w1_d25
    "models/1T-10L/baseline/food_d37_river_w1_d25/pacifist/t1.0_rp-1.0_300gs/",   # scenario=18
    "models/1T-10L/baseline/food_d37_river_w1_d25/pacifist/t1.25_rp-1.0_300gs/",   # scenario=19 
    "models/1T-10L/baseline/food_d37_river_w1_d25/pacifist/t2.0_rp-1.0_300gs/",   # scenario=20
    "models/1T-10L/baseline/food_d37_river_w1_d25/pacifist/t4.0_rp-1.0_300gs/"   # scenario=21   
]

# Parameter sets for the models in `folders`, in the same order but without the duplicate scenario 2 entry (for reference only; not used in the code)
parameters =[ 
            # Temperature for explore/exploit; penalty per step in river; game steps per episode
            {'temp_start':1.0, 'river_penalty':-1.0, 'game_steps':300},
            {'temp_start':1.0, 'river_penalty':-1.0, 'game_steps':600},    
            {'temp_start':1.25, 'river_penalty':-1.0, 'game_steps':300},
            {'temp_start':1.25, 'river_penalty':-1.0, 'game_steps':600},    
            {'temp_start':1.5, 'river_penalty':-1.0, 'game_steps':300},
            {'temp_start':1.5, 'river_penalty':-1.0, 'game_steps':600},
            {'temp_start':1.5, 'river_penalty':-1.0, 'game_steps':1200},
            {'temp_start':2.0, 'river_penalty':-1.0, 'game_steps':300},
            {'temp_start':2.0, 'river_penalty':-1.0, 'game_steps':600},
            {'temp_start':2.0, 'river_penalty':-1.0, 'game_steps':1200},
            {'temp_start':4.0, 'river_penalty':-1.0, 'game_steps':300},
            {'temp_start':4.0, 'river_penalty':-1.0, 'game_steps':600},
            {'temp_start':4.0, 'river_penalty':-1.0, 'game_steps':1200},
            {'temp_start':8.0, 'river_penalty':-1.0, 'game_steps':300},
            {'temp_start':8.0, 'river_penalty':-1.0, 'game_steps':600},
            {'temp_start':8.0, 'river_penalty':-1.0, 'game_steps':1200},
            {'temp_start':1.0, 'river_penalty':-1.0, 'game_steps':300},
            {'temp_start':1.25, 'river_penalty':-1.0, 'game_steps':300},
            {'temp_start':2.0, 'river_penalty':-1.0, 'game_steps':300},
            {'temp_start':4.0, 'river_penalty':-1.0, 'game_steps':300}
            ]
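Judging from the names above, each folder name encodes its training run as `t<temp_start>_rp<river_penalty>_<game_steps>gs`. The sketch below recovers the parameter dictionary from a path. `parse_run_folder` is a hypothetical helper, and the naming pattern is inferred rather than documented. A helper like this could replace the hand-maintained `parameters` list so the two cannot drift apart.

```python
import re

# Assumed naming convention, inferred from the folder names above:
#   t<temp_start>_rp<river_penalty>_<game_steps>gs
RUN_NAME = re.compile(r"t(?P<temp>[\d.]+)_rp(?P<rp>-?[\d.]+)_(?P<steps>\d+)gs")


def parse_run_folder(path):
    """Return the training parameters encoded in a model folder path."""
    match = RUN_NAME.search(path)
    if match is None:
        raise ValueError("no run parameters found in {!r}".format(path))
    return {
        "temp_start": float(match.group("temp")),
        "river_penalty": float(match.group("rp")),
        "game_steps": int(match.group("steps")),
    }


print(parse_run_folder("models/1T-10L/baseline/food_d37/pacifist/t1.25_rp-1.0_600gs/"))
# {'temp_start': 1.25, 'river_penalty': -1.0, 'game_steps': 600}
```

For example, `[parse_run_folder(d) for d in folders]` would yield one entry per scenario, including the duplicate scenario 2.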

## Play a Single Game - Baseline

Play a single game with rendering enabled to observe the behaviors the agents have learned.

Change `scenario` (a 1-based index into `folders`) to load agent models from a different folder.
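The loading loop in the cell below builds one file name per agent, `MA<i>_<game>_ep<episodes>.p`, inside `folders[scenario-1]`. The hypothetical helpers below reproduce that naming. You could use them to check that a chosen `scenario` and `episodes` pair points at existing files before loading. They are a sketch and not part of `interface` or `teams_model`.

```python
import os


def model_files(folders, scenario, game="Crossing", episodes=3000, num_agents=10):
    """List the per-agent model files the loading loop would open.

    `scenario` is 1-based, matching dir_name = folders[scenario-1];
    file names follow the pattern MA<agent>_<game>_ep<episodes>.p.
    """
    if not 1 <= scenario <= len(folders):
        raise IndexError("scenario must be between 1 and {}".format(len(folders)))
    dir_name = folders[scenario - 1]
    return [dir_name + "MA{}_{}_ep{}.p".format(i, game, episodes) for i in range(num_agents)]


def missing_files(paths):
    """Return the paths that do not exist, so a bad scenario/episodes pair is caught early."""
    return [p for p in paths if not os.path.isfile(p)]


demo_folders = ["models/demo/a/", "models/demo/b/"]  # hypothetical folders
print(model_files(demo_folders, scenario=2, num_agents=2))
# ['models/demo/b/MA0_Crossing_ep3000.p', 'models/demo/b/MA1_Crossing_ep3000.p']
```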

In [3]:
import pickle
import numpy as np

import torch
from torch.autograd import Variable
from teams_env import CrossingEnv

game = 'Crossing'
map_name = "food_d37_river_w1_d25"
# map_name = "food_d37"

culture = "pacifist"
scenario = 19
dir_name = folders[scenario-1]
episodes = 3000  # Number of training episodes after which the model files were saved (0 = start from fresh models)

# There will be 10 agents - 1 team of 10 AI agents and 0 random agents
num_ai_agents = 10
num_rdn_agents = 0
num_agents = num_ai_agents+num_rdn_agents  # just the sum of the two

# Data structure for AI agents (agents will form their own Class later on)
agents = []
actions = []
tags = []

# Initialize environment
render = True
SPEED = 1/30
num_actions = 8                       # There are 8 actions defined in Gathering

# Initialize constants
num_frames = 7
max_episodes = 1
max_frames = 1000

# Initialize parameters for Crossing and Explore
river_penalty = -1
crossed = [0 for i in range(num_ai_agents)]  # Keep track of agents gathering from 2nd food pile
second_pile_x = 50   # x-coordinate of the 2nd food pile
jumping_zone = True

# Load models for AI agents
if episodes > 0:
    agents= [[] for i in range(num_ai_agents)]
    # If episodes is provided (not 0), load the model for each AI agent
    for i in range(num_ai_agents):
        model_file = dir_name+'MA{}_{}_ep{}.p'.format(i,game,episodes)
        try:
            with open(model_file, 'rb') as f:
                # Model File include both model and optim parameters
                saved_model = pickle.load(f)
                agents[i], _ = saved_model
                print("Load saved model for agent {}".format(i))
        except OSError:
            print('Model file not found.')
            raise
else:
    # If episodes=0, start with a freshly initialized model for each AI agent
    for i in range(num_ai_agents):
        print("Load AI agent {}".format(i))
        agents.append(Policy(num_frames, num_actions, i))

# Load random agents    
for i in range(num_ai_agents,num_agents):
    print("Load random agent {}".format(i))
    agents.append(Rdn_Policy())

# Initialize AI and random agent data
actions = [0 for i in range(num_agents)]
tags = [0 for i in range(num_agents)]

# Establish tribal association
tribes = []
tribes.append(Tribe(name='Vikings',color='blue', culture=culture, \
                    agents=[agents[0], agents[1], agents[2], agents[3], agents[4], \
                           agents[5], agents[6], agents[7], agents[8], agents[9]]))

#tribes.append(Tribe(name='Saxons', color='red', culture=culture, \
#                    agents=[agents[4], agents[5], agents[6], agents[7]]))
#tribes.append(Tribe(name='Franks', color='purple', culture=culture, \
#                    agents=[agents[8], agents[9], agents[10], agents[11]]))
# tribes.append(Tribe(name='Crazies', color='yellow', agents=[agents[3], \
#                    agents[4], agents[5]]))   # random agents are crazy!!!

# Set up agent and tribe info to pass into env
agent_colors = [agent.color for agent in agents]
agent_tribes = [agent.tribe for agent in agents]
tribe_names = [tribe.name for tribe in tribes]
    
env = CrossingEnv(n_agents=num_agents,agent_colors=agent_colors, agent_tribes=agent_tribes, \
                  map_name=map_name, river_penalty=river_penalty, tribes=tribe_names, \
                  debug_agent=0)    
    
for ep in range(max_episodes):
    
    US_hits = [0 for i in range(num_agents)]
    THEM_hits = [0 for i in range(num_agents)]

    env_obs = env.reset()  # Environment return observations
    """
    # For Debug only
    print (len(agents_obs))
    print (agents_obs[0].shape)
    """
    
    # Unpack observations into data structure compatible with agent Policy
    agents_obs = unpack_env_obs(env_obs)
    
    for i in range(num_ai_agents):    # Reset agent info - laser tag statistics
        agents[i].reset_info()    
    
    env.render()  
    time.sleep(SPEED)  # Change speed of video rendering
    
    """
    # For Debug only
    print (len(agents_obs))
    print (agents_obs[0].shape)
    """
    
    """
    For now, we do not stack observations, and we do not implement LSTM
    
    state = np.stack([state]*num_frames)

    # Reset LSTM hidden units when episode begins
    cx = Variable(torch.zeros(1, 256))
    hx = Variable(torch.zeros(1, 256))
    """

    for frame in range(max_frames):

        for i in range(num_ai_agents):    # For AI agents
            actions[i], _ = select_action(agents[i], agents_obs[i], cuda=False)
            if actions[i] == 6:  # compare by value, not identity; action 6 fires the laser
                tags[i] += 1   # record a tag for assessing aggressiveness
                
        for i in range(num_ai_agents, num_agents):   # For random agents
            actions[i] = agents[i].select_action(agents_obs[i])
            if actions[i] == 6:
                tags[i] += 1   # record a tag for assessing aggressiveness
        
        """
        For now, we do not implement LSTM
        # Select action
        action, log_prob, state_value, (hx,cx)  = select_action(model, state, (hx,cx))        
        """

        # if frame % 10 == 0:
        #     print (actions)    
            
        # Perform step        
        env_obs, reward, done, info = env.step(actions)
        
        """
        For Debug only
        print (env_obs)
        print (reward)
        print (done) 
        """

        for i in range(num_ai_agents):
            agents[i].rewards.append(reward[i])  # Stack rewards

        
        # Unpack observations into data structure compatible with agent Policy
        agents_obs = unpack_env_obs(env_obs)
        load_info(agents, info, narrate=False)   # Load agent info for AI agents
        
        for i in range(num_agents):
            US_hits[i] += agents[i].US_hit
            THEM_hits[i] += agents[i].THEM_hit
            
        """
        For now, we do not stack observation, may come in handy later on
        
        # Evict oldest diff add new diff to state
        next_state = np.stack([next_state]*num_frames)
        next_state[1:, :, :] = state[:-1, :, :]
        state = next_state
        """
        total = 0
        for i in range(num_ai_agents):
            agent_reward = sum(agents[i].rewards)
            total += agent_reward
        
        env.render()
        time.sleep(SPEED)  # Change speed of video rendering

        if any(done):
            print("Done after {} frames".format(frame))
            break

env.close()  # Close the rendering window

# Print out statistics of AI agents

total_rewards = 0
total_tags = 0
total_US_hits = 0
total_THEM_hits = 0

print ('\nStatistics by Agent')
print ('===================')
for i in range(num_ai_agents):
    agent_tags = sum(agents[i].tag_hist)
    total_tags += agent_tags
    print ("Agent{} aggressiveness is {:.2f}".format(i, agent_tags/(frame+1)))   # frame is 0-based

    agent_reward = sum(agents[i].rewards)
    total_rewards += agent_reward
    print ("Agent{} reward is {:.1f}".format(i, agent_reward))   # rewards may be floats (river penalty)

    agent_US_hits = sum(agents[i].US_hits)
    agent_THEM_hits = sum(agents[i].THEM_hits)
    total_US_hits += agent_US_hits
    total_THEM_hits += agent_THEM_hits

    print('US agents hit = {}'.format(agent_US_hits))
    print('THEM agents hit = {}'.format(agent_THEM_hits ))

print ('\nStatistics in Aggregate')
print ('=======================')
print ('Total rewards gathered = {}'.format(total_rewards))
print ('Av. rewards per agent = {0:.2f}'.format(total_rewards/num_ai_agents))
print ('Num laser fired = {}'.format(total_tags))
print ('Total US Hit (friendly fire) = {}'.format(total_US_hits))
print ('Total THEM Hit = {}'.format(total_THEM_hits))
print ('friendly fire (%) = {0:.3f}'.format(total_US_hits/(total_US_hits+total_THEM_hits+1e-7)))

for (i, loc) in env.consumption:
    if loc[0] > second_pile_x:
        # print ('agent {} gathered an apple in 2nd pile'.format(i))
        crossed[i] = 1
        
print ("Num agents gathering from 2nd food pile: {}".format(sum(crossed)))

print ('\nStatistics by Team')
print ('===================')
top_tribe = None
top_tribe_reward = 0

for i, tribe in enumerate(tribes):
    if tribe.name != 'Crazies':
        tribe_reward = sum(tribe.sum_rewards())
        print ('Tribe {} has total reward of {}'.format(tribe.name, tribe_reward))
                           
        if tribe_reward > top_tribe_reward:   # Keep track of dominating team
            top_tribe_reward = tribe_reward
            top_tribe = tribe.name

# Team dominance calculation
if len(tribes) > 1:
    print ('Dominating Team: {}'.format(top_tribe))
    dominance = top_tribe_reward/((total_rewards-top_tribe_reward+1.1e-7)/(len(tribes)-1))    
    print ('Team dominance: {0:.2f}x'.format(dominance))


RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location='cpu' to map your storages to the CPU.
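The cell above fails on a CPU-only machine because the model files contain tensors saved on a GPU, and `pickle.load` accepts no `map_location` argument. One common workaround is an unpickler that redirects `torch.storage._load_from_bytes` to `torch.load(..., map_location='cpu')`. This sketch assumes the files were written with `pickle.dump`, so that each tensor storage is rebuilt through that function, as in PyTorch 1.0. `CPUUnpickler` and `load_model_file` are hypothetical names, so verify the approach against your PyTorch version.

```python
import io
import pickle


class CPUUnpickler(pickle.Unpickler):
    """Unpickler that maps CUDA tensor storages onto the CPU.

    Assumes the files were written with pickle.dump(): pickle then rebuilds each
    torch storage via torch.storage._load_from_bytes, which calls torch.load()
    without map_location. Redirecting that single global adds map_location="cpu".
    """

    def find_class(self, module, name):
        if module == "torch.storage" and name == "_load_from_bytes":
            import torch  # imported lazily: only needed when the file holds tensors
            return lambda b: torch.load(io.BytesIO(b), map_location="cpu")
        # Everything else (e.g. the Policy class from teams_model) resolves normally
        return super().find_class(module, name)


def load_model_file(model_file):
    """Hypothetical drop-in for pickle.load(f); returns whatever was pickled,
    here the (model, optimizer) pair unpacked by the loading loop."""
    with open(model_file, "rb") as f:
        return CPUUnpickler(f).load()
```

With this in place, `saved_model = pickle.load(f)` in the loading loop would become `saved_model = CPUUnpickler(f).load()`. The rest of the cell is unchanged.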

In [22]:
env.close()

## Performance Stats - Baseline (Map = food_d37)

For the research report, we gather game statistics for agents and teams over 30 episodes of play:

* Average agent reward - average number of apples gathered per agent per episode  
* Agents crossed - average number of agents per episode that gather at least one apple from the 2nd food pile

Rendering is disabled to speed things up.

<img src="images/Crossing01.png" width="600">
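Both statistics reduce to simple bookkeeping. This sketch assumes that `env.consumption` yields `(agent_id, location)` pairs with the x-coordinate first, which is how the cells in this notebook iterate it. An agent counts as having crossed if any apple it gathered lies beyond `second_pile_x`. Per-episode totals are averaged over episodes, and the reward is also averaged over agents. The helper names and sample events are hypothetical.

```python
def agents_crossed(consumption, num_agents, second_pile_x=50):
    """Count agents that gathered at least one apple beyond x = second_pile_x.

    `consumption` is assumed to be a list of (agent_id, (x, y)) events;
    each agent counts at most once per episode.
    """
    crossed = [0] * num_agents
    for agent_id, loc in consumption:
        if loc[0] > second_pile_x:
            crossed[agent_id] = 1
    return sum(crossed)


def average_stats(episode_rewards, episode_crossed, num_agents):
    """Average per-episode totals: reward per agent per episode, crossings per episode."""
    n = len(episode_rewards)
    av_agent_reward = sum(episode_rewards) / n / num_agents
    av_agent_crossed = sum(episode_crossed) / n
    return av_agent_reward, av_agent_crossed


# Hypothetical events: agent 1 gathers twice past x=50, agent 3 once, agent 0 stays at pile 1
events = [(0, (10, 5)), (1, (55, 3)), (1, (60, 2)), (3, (51, 0))]
print(agents_crossed(events, num_agents=10))               # 2
print(average_stats([300, 320, 310], [5, 6, 4], num_agents=10))  # (31.0, 5.0)
```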


In [181]:
import pickle
import numpy as np

import torch
from torch.autograd import Variable

dir_names = [
             "models/1T-10L/baseline/food_d37/pacifist/t1.0_rp-1.0_300gs/",
             "models/1T-10L/baseline/food_d37/pacifist/t1.0_rp-1.0_600gs/", 
             "models/1T-10L/baseline/food_d37/pacifist/t1.25_rp-1.0_300gs/",
             "models/1T-10L/baseline/food_d37/pacifist/t1.25_rp-1.0_600gs/",
             "models/1T-10L/baseline/food_d37/pacifist/t1.5_rp-1.0_300gs/",
             "models/1T-10L/baseline/food_d37/pacifist/t1.5_rp-1.0_600gs/", 
             "models/1T-10L/baseline/food_d37/pacifist/t1.5_rp-1.0_1200gs/",
             "models/1T-10L/baseline/food_d37/pacifist/t2.0_rp-1.0_300gs/",
             "models/1T-10L/baseline/food_d37/pacifist/t2.0_rp-1.0_600gs/", 
             "models/1T-10L/baseline/food_d37/pacifist/t2.0_rp-1.0_1200gs/",
             "models/1T-10L/baseline/food_d37/pacifist/t4.0_rp-1.0_300gs/",
             "models/1T-10L/baseline/food_d37/pacifist/t4.0_rp-1.0_600gs/", 
             "models/1T-10L/baseline/food_d37/pacifist/t4.0_rp-1.0_1200gs/",
             "models/1T-10L/baseline/food_d37/pacifist/t8.0_rp-1.0_300gs/",
             "models/1T-10L/baseline/food_d37/pacifist/t8.0_rp-1.0_600gs/", 
             "models/1T-10L/baseline/food_d37/pacifist/t8.0_rp-1.0_1200gs/"    
             ]
episodes = [500, 1000, 1500, 2000, 2500, 3000] 
game = 'Crossing'
culture = "pacifist"
map_name = "food_d37"

# Performance Statistics - for Research Report
av_agent_reward = [[0 for i in episodes] for j in dir_names]
av_agent_crossed = [[0 for i in episodes] for j in dir_names]  
dominating_tribe = [[None for i in episodes] for j in dir_names]
dom_tribe_reward = [[0 for i in episodes] for j in dir_names]
dominance = [[0 for i in episodes] for j in dir_names]

# There will be 10 agents - 1 team of 10 AI agents and 0 random agents
num_ai_agents = 10
num_rdn_agents = 0
num_agents = num_ai_agents+num_rdn_agents  # just the sum of the two

# Data structure for AI agents (agents will form their own Class later on)
agents = []
actions = []
tags = []

# Initialize environment
render = False
SPEED = 1/30
river_penalty = -1
num_actions = 8                       # There are 8 actions defined in Gathering
second_pile_x = 50   # x-coordinate of the 2nd food pile

# Initialize constants
num_frames = 7
max_episodes = 30
max_frames = 500
verbose = False


for dir_num, dir_name in enumerate(dir_names):
    print ("###### Dir = {} #######".format(dir_name))
    
    for eps_num, eps in enumerate(episodes):
        print ("###### Trained episodes = {} #######".format(eps))
    
        # Load models for AI agents
        agents= [[] for i in range(num_ai_agents)]
        # If episodes is provided (not 0), load the model for each AI agent
        for i in range(num_ai_agents):
            model_file = dir_name+'MA{}_{}_ep{}.p'.format(i,game,eps)
            try:
                with open(model_file, 'rb') as f:
                    # Model File include both model and optim parameters
                    saved_model = pickle.load(f)
                    agents[i], _ = saved_model
                    # print("Load saved model for agent {}".format(i))
            except OSError:
                print('Model file not found.')
                raise

        # Load random agents    
        for i in range(num_ai_agents,num_agents):
            # print("Load random agent {}".format(i))
            agents.append(Rdn_Policy())
        
        # Establish tribal association
        tribes = []
        tribes.append(Tribe(name='Vikings',color='blue', culture=culture, \
                    agents=[agents[0], agents[1], agents[2], agents[3], agents[4], \
                           agents[5], agents[6], agents[7], agents[8], agents[9]]))

        # Set up agent and tribe info to pass into env
        agent_colors = [agent.color for agent in agents]
        agent_tribes = [agent.tribe for agent in agents]
        tribe_names = [tribe.name for tribe in tribes]
        
        env = CrossingEnv(n_agents=num_agents,agent_colors=agent_colors, agent_tribes=agent_tribes, \
                  map_name=map_name, river_penalty=river_penalty, tribes=tribe_names, \
                  debug_agent=0)

        # Used to accumulate episode stats for averaging
        cum_rewards = 0
        cum_crossed = 0
        cum_tags = 0
        cum_US_hits = 0
        cum_THEM_hits = 0
        cum_agent_rewards = [0 for agent in agents]
        cum_agent_tags = [0 for agent in agents]
        cum_agent_US_hits = [0 for agent in agents]
        cum_agent_THEM_hits = [0 for agent in agents]
        cum_tribe_rewards = [0 for t in tribes if t.name != 'Crazies']

        cuda = False
        start = time.time()

        for ep in range(max_episodes):
    
            print('.', end='')  # To show progress
    
            # Initialize AI and random agent data
            actions = [0 for i in range(num_agents)]
            tags = [0 for i in range(num_agents)]
            US_hits = [0 for i in range(num_agents)]
            THEM_hits = [0 for i in range(num_agents)]
            
            # Keep track of agents gathering from 2nd food pile
            crossed = [0 for i in range(num_ai_agents)]

            env_obs = env.reset()  # Environment return observations
            """
            # For Debug only
            print (len(agents_obs))
            print (agents_obs[0].shape)
            """
    
            # Unpack observations into data structure compatible with agent Policy
            agents_obs = unpack_env_obs(env_obs)
    
            for i in range(num_ai_agents):    # Reset agent info - laser tag statistics
                agents[i].reset_info()    
    
            if render:
                env.render()
                time.sleep(SPEED)  # Change speed of video rendering
    
            """
            # For Debug only
            print (len(agents_obs))
            print (agents_obs[0].shape)
            """
    
            """
            For now, we do not stack observations, and we do not implement LSTM
    
            state = np.stack([state]*num_frames)

            # Reset LSTM hidden units when episode begins
            cx = Variable(torch.zeros(1, 256))
            hx = Variable(torch.zeros(1, 256))
            """

            for frame in range(max_frames):

                for i in range(num_ai_agents):    # For AI agents
                    actions[i], _ = select_action(agents[i], agents_obs[i], cuda=cuda)
                    if actions[i] == 6:  # compare by value, not identity; action 6 fires the laser
                        tags[i] += 1   # record a tag for assessing aggressiveness
                
                for i in range(num_ai_agents, num_agents):   # For random agents
                    actions[i] = agents[i].select_action(agents_obs[i])
                    if actions[i] == 6:
                        tags[i] += 1   # record a tag for assessing aggressiveness
        
                """
                For now, we do not implement LSTM
                # Select action
                action, log_prob, state_value, (hx,cx)  = select_action(model, state, (hx,cx))        
                """

                # if frame % 10 == 0:
                #     print (actions)    
            
                # Perform step        
                env_obs, reward, done, info = env.step(actions)
        
                """
                For Debug only
                print (env_obs)
                print (reward)
                print (done) 
                """

                for i in range(num_ai_agents):
                    agents[i].rewards.append(reward[i])  # Stack rewards

        
                # Unpack observations into data structure compatible with agent Policy
                agents_obs = unpack_env_obs(env_obs)
                load_info(agents, info, narrate=False)   # Load agent info for AI agents
        
                for i in range(num_agents):
                    US_hits[i] += agents[i].US_hit
                    THEM_hits[i] += agents[i].THEM_hit
            
                """
                For now, we do not stack observation, may come in handy later on
        
                # Evict oldest diff add new diff to state
                next_state = np.stack([next_state]*num_frames)
                next_state[1:, :, :] = state[:-1, :, :]
                state = next_state
                """
        
                if render and ep == 0:
                    env.render()
                    time.sleep(SPEED)  # Change speed of video rendering

                # Check 2nd-pile gathering before a possible break so the final step is counted too
                for (i, loc) in env.consumption:
                    if loc[0] > second_pile_x:
                        # print ('agent {} gathered an apple in 2nd pile'.format(i))
                        crossed[i] = 1

                if any(done):
                    print("Done after {} frames".format(frame))
                    break
            
            # Print out statistics of AI agents
            ep_rewards = 0
            ep_tags = 0
            ep_US_hits = 0
            ep_THEM_hits = 0
            ep_crossed = sum(crossed)     # calculated num agents gathering in 2nd pile for episode

            if verbose:
                print ('\nStatistics by Agent')
                print ('===================')
            for i in range(num_ai_agents):
                agent_tags = sum(agents[i].tag_hist)
                ep_tags += agent_tags
                cum_agent_tags[i] += agent_tags

                agent_reward = sum(agents[i].rewards)
                ep_rewards += agent_reward
                cum_agent_rewards[i] += agent_reward

                agent_US_hits = sum(agents[i].US_hits)
                agent_THEM_hits = sum(agents[i].THEM_hits)
                ep_US_hits += agent_US_hits
                ep_THEM_hits += agent_THEM_hits
                cum_agent_US_hits[i] += agent_US_hits
                cum_agent_THEM_hits[i] += agent_THEM_hits
        
                if verbose:
                    print ("Agent{} aggressiveness is {:.2f}".format(i, agent_tags/(frame+1)))   # frame is 0-based
                    print ("Agent{} reward is {:.1f}".format(i, agent_reward))
                    # print('US agents hit = {}'.format(agent_US_hits))
                    # print('THEM agents hit = {}'.format(agent_THEM_hits ))
        
            cum_rewards += ep_rewards
            cum_crossed += ep_crossed
            cum_tags += ep_tags
            cum_US_hits += ep_US_hits
            cum_THEM_hits += ep_THEM_hits
    
            if verbose:
                print ('\nStatistics in Aggregate')
                print ('=======================')
                print ('Total rewards gathered = {}'.format(ep_rewards))
                print ('Num agents crossed = {}'.format(ep_crossed))
                # print ('Num laser fired = {}'.format(ep_tags))
                # print ('Total US Hit (friendly fire) = {}'.format(ep_US_hits))
                # print ('Total THEM Hit = {}'.format(ep_THEM_hits))
                # print ('friendly fire (%) = {0:.3f}'.format(ep_US_hits/(ep_US_hits+ep_THEM_hits+1e-7)))

            if verbose:
                print ('\nStatistics by Tribe')
                print ('===================')
            for i, t in enumerate(tribes):
                if t.name != 'Crazies':
                    ep_tribe_reward = sum(t.sum_rewards())
                    cum_tribe_rewards[i] += ep_tribe_reward
                    if verbose:
                        print ('Tribe {} has total reward of {}'.format(t.name, ep_tribe_reward))

            for i in range(num_ai_agents):
                agents[i].clear_history()

        env.close()  # Close the rendering window
        end = time.time()

        print ('\nAverage Statistics in Aggregate')
        print ('=================================')
        total_rewards = cum_rewards/max_episodes
        print ('Total rewards gathered = {:.1f}'.format(total_rewards))
        av_agent_reward[dir_num][eps_num] = cum_rewards/max_episodes/num_ai_agents
        print ('Av. agent reward = {:.2f}'.format(av_agent_reward[dir_num][eps_num]))
        av_agent_crossed[dir_num][eps_num] = cum_crossed/max_episodes
        print ('Agents crossed (2nd food pile) = {:.1f}'.format(av_agent_crossed[dir_num][eps_num]))
        # print ('Num laser fired = {:.1f}'.format(cum_tags/max_episodes))
        # print ('Total US Hit (friendly fire) = {:.1f}'.format(cum_US_hits/max_episodes))
        # print ('Total THEM Hit = {:.1f}'.format(cum_THEM_hits/max_episodes))
        # print ('friendly fire (%) = {:.3f}'.format(cum_US_hits/(cum_US_hits+cum_THEM_hits+1e-7)))

        print ('\nAverage Statistics by Tribe')
        print ('=============================')
       
        for i, tribe in enumerate(tribes):
            if tribe.name != 'Crazies':
                tribe_reward = cum_tribe_rewards[i]/max_episodes
                print ('Tribe {} has total reward of {:.1f}'.format(tribe.name, tribe_reward))    
                
                # Keep track of dominating team and the rewards gathered (only if more than 1 tribe)
                if len(tribes) > 1:
                    if tribe_reward > dom_tribe_reward[dir_num][eps_num]:   
                        dom_tribe_reward[dir_num][eps_num] = tribe_reward
                        dominating_tribe[dir_num][eps_num]  = tribe.name

        # Team dominance calculation (only if more than 1 tribe)
        if len(tribes) > 1:
            print ('Dominating Tribe: {}'.format(dominating_tribe[dir_num][eps_num]))
            dominance[dir_num][eps_num] = dom_tribe_reward[dir_num][eps_num]/((total_rewards - \
                                                dom_tribe_reward[dir_num][eps_num]+1.1e-7)/(len(tribes)-1))    
            print ('Team dominance: {0:.2f}x'.format(dominance[dir_num][eps_num]))

        print ('\nAverage Statistics by Agent')
        print ('=============================')
        for i in range(num_ai_agents):
            # print ("Agent{} of {} aggressiveness is {:.2f}".format(i, agents[i].tribe, \
            #                                               cum_agent_tags[i]/(max_episodes*max_frames)))
            print ("Agent{} reward is {:.1f}".format(i, cum_agent_rewards[i]/max_episodes))
            # print('US agents hit = {:.1f}'.format(cum_agent_US_hits[i]/max_episodes))
            # print('THEM agents hit = {:.1f}'.format(cum_agent_THEM_hits[i]/max_episodes))

        print('Training time per epochs: {:.2f} sec'.format((end-start)/max_episodes))

# Note: Statistics for Research Report        
for reward in av_agent_reward:   # Average agent reward
    print(reward)
for agents_crossed in av_agent_crossed:   # Average num agents gathering in 2nd food pile
    print(agents_crossed)

# print dominating team and dominance factor (only if more than 1 tribe)
if len(tribes) > 1:
    for tribe in dominating_tribe:   # Dominating team
        print(tribe)
    for value in dominance:      # Team dominance
        print(value)

###### Dir = models/1T-10L/baseline/food_d37/pacifist/t1.0_rp-1.0_300gs/ #######
###### Trained episodes = 500 #######
..............................
Average Statistics in Aggregate
Total rewards gathered = 274.7
Av. agent reward = 27.47
Agents crossed (2nd food pile) = 6.3

Average Statistics by Tribe
Tribe Vikings has total reward of 274.7

Average Statistics by Agent
Agent0 reward is 6.6
Agent1 reward is 5.5
Agent2 reward is 37.9
Agent3 reward is 26.5
Agent4 reward is 2.7
Agent5 reward is 23.0
Agent6 reward is 32.5
Agent7 reward is 85.0
Agent8 reward is 32.8
Agent9 reward is 22.1
Training time per epochs: 2.03 sec
###### Trained episodes = 1000 #######
..............................
Average Statistics in Aggregate
Total rewards gathered = 353.6
Av. agent reward = 35.36
Agents crossed (2nd food pile) = 6.0

Average Statistics by Tribe
Tribe Vikings has total reward of 353.6

Average Statistics by Agent
Agent0 reward is 10.8
Agent1 reward is 18.5
Agent2 reward is 52.0

..............................
Average Statistics in Aggregate
Total rewards gathered = 341.2
Av. agent reward = 34.12
Agents crossed (2nd food pile) = 5.1

Average Statistics by Tribe
Tribe Vikings has total reward of 341.2

Average Statistics by Agent
Agent0 reward is 8.0
Agent1 reward is 18.3
Agent2 reward is 20.8
Agent3 reward is 58.0
Agent4 reward is 29.1
Agent5 reward is 19.1
Agent6 reward is 7.0
Agent7 reward is 58.3
Agent8 reward is 42.4
Agent9 reward is 80.2
Training time per epochs: 2.03 sec
###### Trained episodes = 1500 #######
..............................
Average Statistics in Aggregate
Total rewards gathered = 355.9
Av. agent reward = 35.59
Agents crossed (2nd food pile) = 5.3

Average Statistics by Tribe
Tribe Vikings has total reward of 355.9

Average Statistics by Agent
Agent0 reward is 37.9
Agent1 reward is 30.9
Agent2 reward is 27.5
Agent3 reward is 64.6
Agent4 reward is 11.8
Agent5 reward is 33.1
Agent6 reward is 29.2
Agent7 reward is 60.8
Agent8 reward is 6.1

..............................
Average Statistics in Aggregate
Total rewards gathered = 365.3
Av. agent reward = 36.53
Agents crossed (2nd food pile) = 5.1

Average Statistics by Tribe
Tribe Vikings has total reward of 365.3

Average Statistics by Agent
Agent0 reward is 39.8
Agent1 reward is 56.0
Agent2 reward is 1.4
Agent3 reward is 21.0
Agent4 reward is 0.2
Agent5 reward is 70.3
Agent6 reward is 46.0
Agent7 reward is 24.3
Agent8 reward is 71.7
Agent9 reward is 34.7
Training time per epochs: 2.04 sec
###### Trained episodes = 2000 #######
..............................
Average Statistics in Aggregate
Total rewards gathered = 364.7
Av. agent reward = 36.47
Agents crossed (2nd food pile) = 5.6

Average Statistics by Tribe
Tribe Vikings has total reward of 364.7

Average Statistics by Agent
Agent0 reward is 52.0
Agent1 reward is 42.2
Agent2 reward is 29.5
Agent3 reward is 7.5
Agent4 reward is 0.0
Agent5 reward is 65.6
Agent6 reward is 55.4
Agent7 reward is 29.7
Agent8 reward is 44.1

..............................
Average Statistics in Aggregate
Total rewards gathered = 346.6
Av. agent reward = 34.66
Agents crossed (2nd food pile) = 2.9

Average Statistics by Tribe
Tribe Vikings has total reward of 346.6

Average Statistics by Agent
Agent0 reward is 0.0
Agent1 reward is 61.0
Agent2 reward is 14.3
Agent3 reward is 0.0
Agent4 reward is 113.3
Agent5 reward is 45.9
Agent6 reward is 0.0
Agent7 reward is 53.2
Agent8 reward is 26.0
Agent9 reward is 32.8
Training time per epochs: 2.02 sec
###### Trained episodes = 2500 #######
..............................
Average Statistics in Aggregate
Total rewards gathered = 355.4
Av. agent reward = 35.54
Agents crossed (2nd food pile) = 2.7

Average Statistics by Tribe
Tribe Vikings has total reward of 355.4

Average Statistics by Agent
Agent0 reward is 0.0
Agent1 reward is 89.9
Agent2 reward is 74.5
Agent3 reward is 0.0
Agent4 reward is 103.0
Agent5 reward is 31.7
Agent6 reward is 0.0
Agent7 reward is 10.1
Agent8 reward is 13.4

..............................
Average Statistics in Aggregate
Total rewards gathered = 363.9
Av. agent reward = 36.39
Agents crossed (2nd food pile) = 2.0

Average Statistics by Tribe
Tribe Vikings has total reward of 363.9

Average Statistics by Agent
Agent0 reward is 0.0
Agent1 reward is 142.7
Agent2 reward is 0.0
Agent3 reward is 0.0
Agent4 reward is 128.6
Agent5 reward is 0.0
Agent6 reward is 0.1
Agent7 reward is 29.6
Agent8 reward is 30.0
Agent9 reward is 33.0
Training time per epochs: 2.01 sec
###### Trained episodes = 3000 #######
..............................
Average Statistics in Aggregate
Total rewards gathered = 355.8
Av. agent reward = 35.58
Agents crossed (2nd food pile) = 2.0

Average Statistics by Tribe
Tribe Vikings has total reward of 355.8

Average Statistics by Agent
Agent0 reward is 0.0
Agent1 reward is 137.6
Agent2 reward is 0.0
Agent3 reward is 0.0
Agent4 reward is 125.2
Agent5 reward is 0.0
Agent6 reward is 0.0
Agent7 reward is 30.0
Agent8 reward is 30.0

..............................
Average Statistics in Aggregate
Total rewards gathered = 308.0
Av. agent reward = 30.80
Agents crossed (2nd food pile) = 1.8

Average Statistics by Tribe
Tribe Vikings has total reward of 308.0

Average Statistics by Agent
Agent0 reward is 0.0
Agent1 reward is 0.9
Agent2 reward is 120.1
Agent3 reward is 0.0
Agent4 reward is 0.0
Agent5 reward is 0.0
Agent6 reward is 99.9
Agent7 reward is 0.0
Agent8 reward is 56.0
Agent9 reward is 31.0
Training time per epochs: 2.01 sec
###### Dir = models/1T-10L/baseline/food_d37/pacifist/t4.0_rp-1.0_600gs/ #######
###### Trained episodes = 500 #######
..............................
Average Statistics in Aggregate
Total rewards gathered = 101.2
Av. agent reward = 10.12
Agents crossed (2nd food pile) = 1.2

Average Statistics by Tribe
Tribe Vikings has total reward of 101.2

Average Statistics by Agent
Agent0 reward is 4.0
Agent1 reward is 10.7
Agent2 reward is 2.9
Agent3 reward is 19.0
Agent4 reward is 8.2

..............................
Average Statistics in Aggregate
Total rewards gathered = 13.1
Av. agent reward = 1.31
Agents crossed (2nd food pile) = 0.0

Average Statistics by Tribe
Tribe Vikings has total reward of 13.1

Average Statistics by Agent
Agent0 reward is 0.9
Agent1 reward is 1.4
Agent2 reward is 0.6
Agent3 reward is 1.5
Agent4 reward is 0.5
Agent5 reward is 1.1
Agent6 reward is 1.3
Agent7 reward is 2.2
Agent8 reward is 2.1
Agent9 reward is 1.4
Training time per epochs: 2.00 sec
###### Trained episodes = 1000 #######
..............................
Average Statistics in Aggregate
Total rewards gathered = 152.3
Av. agent reward = 15.23
Agents crossed (2nd food pile) = 1.3

Average Statistics by Tribe
Tribe Vikings has total reward of 152.3

Average Statistics by Agent
Agent0 reward is 1.5
Agent1 reward is 2.8
Agent2 reward is 0.9
Agent3 reward is 11.0
Agent4 reward is 2.3
Agent5 reward is 10.3
Agent6 reward is 8.4
Agent7 reward is 15.4
Agent8 reward is 75.3

..............................
Average Statistics in Aggregate
Total rewards gathered = 252.3
Av. agent reward = 25.23
Agents crossed (2nd food pile) = 1.2

Average Statistics by Tribe
Tribe Vikings has total reward of 252.3

Average Statistics by Agent
Agent0 reward is 2.0
Agent1 reward is 11.3
Agent2 reward is 156.2
Agent3 reward is 18.3
Agent4 reward is 2.3
Agent5 reward is 13.8
Agent6 reward is 2.1
Agent7 reward is 9.6
Agent8 reward is 24.6
Agent9 reward is 12.2
Training time per epochs: 2.14 sec
###### Trained episodes = 1500 #######
..............................
Average Statistics in Aggregate
Total rewards gathered = 309.0
Av. agent reward = 30.90
Agents crossed (2nd food pile) = 1.1

Average Statistics by Tribe
Tribe Vikings has total reward of 309.0

Average Statistics by Agent
Agent0 reward is 7.0
Agent1 reward is 2.2
Agent2 reward is 212.0
Agent3 reward is 0.7
Agent4 reward is 1.0
Agent5 reward is 19.9
Agent6 reward is 3.2
Agent7 reward is 22.1
Agent8 reward is 26.0

## Statistics for Research Report 


In [183]:
# Note: Statistics for Research Report   
print ('Average Agent Rewards')
for reward in av_agent_reward:   # Average agent reward
    print(reward)
    
print ('Agents Crossed (2nd food pile)')    
for agents_crossed in av_agent_crossed:   # Average num agents gathering in 2nd food pile
    print(agents_crossed)

Average Agent Rewards
[27.47, 35.36, 36.11, 35.96666666666667, 31.376666666666665, 35.46333333333333]
[29.676666666666666, 34.10333333333334, 33.25666666666667, 35.25666666666667, 35.3, 37.160000000000004]
[24.61, 34.11666666666667, 35.593333333333334, 35.21, 34.31666666666667, 35.50333333333334]
[31.22, 34.946666666666665, 35.626666666666665, 37.08, 36.836666666666666, 27.253333333333337]
[17.97, 34.2, 36.53333333333333, 36.473333333333336, 33.71666666666667, 37.50333333333334]
[29.526666666666664, 32.986666666666665, 33.056666666666665, 34.93333333333333, 35.626666666666665, 9.3]
[27.706666666666667, 29.956666666666667, 31.05, 34.663333333333334, 35.53666666666667, 32.89666666666666]
[19.616666666666667, 27.743333333333332, 34.38666666666667, 35.93333333333333, 35.223333333333336, 36.79333333333334]
[26.189999999999998, 32.69, 36.086666666666666, 37.28, 36.39333333333333, 35.58]
[32.126666666666665, 37.089999999999996, 38.83, 35.35, 14.136666666666667, 35.92]
[7.276666666666666, 31.1

## Performance Stats - Baseline (Map = food_d37_river_w1_d25)

For each saved model, we gather game statistics for agents and teams over 30 episodes of game play:

* Average agent reward - average reward per agent per episode (apples gathered, minus 1.0 for each step spent in the river)
* Agents crossed - average number of agents per episode that gather at least one apple from the 2nd food pile

Rendering is disabled to speed things up.
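The two statistics above can be sketched in plain Python. This is an illustrative helper, not code from the notebook: the names `episode_crossed` and `average_stats` are hypothetical, and the sketch assumes each consumption event has the shape `(agent_index, (x, y))`, which is how `env.consumption` is unpacked in the evaluation loop. The threshold `second_pile_x = 50` is the value used in that loop.

```python
from typing import Iterable, List, Sequence, Tuple

SECOND_PILE_X = 50  # same threshold as second_pile_x in the evaluation cell


def episode_crossed(consumption: Iterable[Tuple[int, Tuple[int, int]]],
                    num_agents: int, second_pile_x: int = SECOND_PILE_X) -> int:
    """Number of agents that ate at least one apple to the right of second_pile_x.

    Assumes each event is (agent_index, (x, y)); an agent is counted once,
    however many apples it eats at the 2nd pile.
    """
    crossed: List[int] = [0] * num_agents
    for agent, loc in consumption:
        if loc[0] > second_pile_x:
            crossed[agent] = 1
    return sum(crossed)


def average_stats(episode_rewards: Sequence[Sequence[float]],
                  crossed_per_episode: Sequence[int]) -> Tuple[float, float]:
    """Return (average reward per agent per episode, average agents crossed per episode).

    episode_rewards[e][a] is agent a's summed reward in episode e (apples minus
    river penalties); crossed_per_episode[e] is episode_crossed(...) for episode e.
    """
    n_episodes = len(episode_rewards)
    n_agents = len(episode_rewards[0])
    total = sum(sum(rewards) for rewards in episode_rewards)
    return total / n_episodes / n_agents, sum(crossed_per_episode) / n_episodes


# Toy example: 3 agents, 2 episodes
events = [(0, (55, 10)), (0, (56, 11)), (1, (20, 10)), (2, (60, 3))]
print(episode_crossed(events, num_agents=3))             # → 2 (agents 0 and 2)
print(average_stats([[3, 0, -1], [2, 1, 1]], [2, 1]))    # → (1.0, 1.5)
```

Because of the river penalty, an agent's reward can be negative, so the average agent reward is not a pure apple count on this map.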

<img src="images/Crossing-river.png" width="600">

In [4]:
import pickle
import numpy as np

import torch
from torch.autograd import Variable

dir_names = [
    "models/1T-10L/baseline/food_d37_river_w1_d25/pacifist/t1.0_rp-1.0_300gs/",   # scenario=18
    "models/1T-10L/baseline/food_d37_river_w1_d25/pacifist/t1.25_rp-1.0_300gs/",   # scenario=19 
    "models/1T-10L/baseline/food_d37_river_w1_d25/pacifist/t2.0_rp-1.0_300gs/",   # scenario=20
    "models/1T-10L/baseline/food_d37_river_w1_d25/pacifist/t4.0_rp-1.0_300gs/"   # scenario=21   
]  
             
episodes = [500, 1000, 1500, 2000, 2500, 3000] 
game = 'Crossing'
culture = "pacifist"
map_name = "food_d37_river_w1_d25"

# Performance Statistics - for Research Report
av_agent_reward = [[0 for i in episodes] for j in dir_names]
av_agent_crossed = [[0 for i in episodes] for j in dir_names]  
dominating_tribe = [[None for i in episodes] for j in dir_names]
dom_tribe_reward = [[0 for i in episodes] for j in dir_names]
dominance = [[0 for i in episodes] for j in dir_names]

# There will be 10 agents - 1 team of 10 AI agents and 0 random agents
num_ai_agents = 10
num_rdn_agents = 0
num_agents = num_ai_agents+num_rdn_agents  # just the sum of the two

# Data structure for AI agents (agents will form their own Class later on)
agents = []
actions = []
tags = []

# Initialize environment
render = False   # rendering disabled to speed up evaluation
SPEED = 1/30
river_penalty = -1
num_actions = 8                       # There are 8 actions defined in Gathering
second_pile_x = 50   # x-coordinate of the 2nd food pile

# Initialize constants
num_frames = 7
max_episodes = 30
max_frames = 500
verbose = False


for dir_num, dir_name in enumerate(dir_names):
    print ("###### Dir = {} #######".format(dir_name))
    
    for eps_num, eps in enumerate(episodes):
        print ("###### Trained episodes = {} #######".format(eps))
    
        # Load models for AI agents
        agents= [[] for i in range(num_ai_agents)]
        # If episodes is provided (not 0), load the model for each AI agent
        for i in range(num_ai_agents):
            model_file = dir_name+'MA{}_{}_ep{}.p'.format(i,game,eps)
            try:
                with open(model_file, 'rb') as f:
                    # Model file includes both the model and the optimizer parameters
                    saved_model = pickle.load(f)
                    agents[i], _ = saved_model
                    # print("Load saved model for agent {}".format(i))
            except OSError:
                print('Model file not found.')
                raise

        # Load random agents    
        for i in range(num_ai_agents,num_agents):
            # print("Load random agent {}".format(i))
            agents.append(Rdn_Policy())
        
        # Establish tribal association
        tribes = []
        tribes.append(Tribe(name='Vikings', color='blue', culture=culture,
                            agents=agents[:num_ai_agents]))   # all 10 AI agents form one tribe

        # Set up agent and tribe info to pass into env
        agent_colors = [agent.color for agent in agents]
        agent_tribes = [agent.tribe for agent in agents]
        tribe_names = [tribe.name for tribe in tribes]
        
        env = CrossingEnv(n_agents=num_agents,agent_colors=agent_colors, agent_tribes=agent_tribes, \
                  map_name=map_name, river_penalty=river_penalty, tribes=tribe_names, \
                  debug_agent=0)

        # Used to accumulate episode stats for averaging
        cum_rewards = 0
        cum_crossed = 0
        cum_tags = 0
        cum_US_hits = 0
        cum_THEM_hits = 0
        cum_agent_rewards = [0 for agent in agents]
        cum_agent_tags = [0 for agent in agents]
        cum_agent_US_hits = [0 for agent in agents]
        cum_agent_THEM_hits = [0 for agent in agents]
        cum_tribe_rewards = [0 for t in tribes if t.name != 'Crazies']

        cuda = False
        start = time.time()

        for ep in range(max_episodes):
    
            print('.', end='')  # To show progress
    
            # Initialize AI and random agent data
            actions = [0 for i in range(num_agents)]
            tags = [0 for i in range(num_agents)]
            US_hits = [0 for i in range(num_agents)]
            THEM_hits = [0 for i in range(num_agents)]
            
            # Keep track of agents gathering from 2nd food pile
            crossed = [0 for i in range(num_ai_agents)]

            env_obs = env.reset()  # Environment return observations
            """
            # For Debug only
            print (len(agents_obs))
            print (agents_obs[0].shape)
            """
    
            # Unpack observations into data structure compatible with agent Policy
            agents_obs = unpack_env_obs(env_obs)
    
            for i in range(num_ai_agents):    # Reset agent info - laser tag statistics
                agents[i].reset_info()    
    
            if render:
                env.render()
                time.sleep(SPEED)  # Change speed of video rendering
    
            """
            # For Debug only
            print (len(agents_obs))
            print (agents_obs[0].shape)
            """
    
            """
            For now, we do not stack observations, and we do not implement LSTM
    
            state = np.stack([state]*num_frames)

            # Reset LSTM hidden units when episode begins
            cx = Variable(torch.zeros(1, 256))
            hx = Variable(torch.zeros(1, 256))
            """

            for frame in range(max_frames):

                for i in range(num_ai_agents):    # For AI agents
                    actions[i], _ = select_action(agents[i], agents_obs[i], cuda=cuda)
                    # Compare action values with ==, not `is` (identity)
                    if actions[i] == 6:
                        tags[i] += 1   # record a tag for assessing aggressiveness
                
                for i in range(num_ai_agents, num_agents):   # For random agents
                    actions[i] = agents[i].select_action(agents_obs[i])
                    if actions[i] == 6:
                        tags[i] += 1   # record a tag for assessing aggressiveness
        
                """
                For now, we do not implement LSTM
                # Select action
                action, log_prob, state_value, (hx,cx)  = select_action(model, state, (hx,cx))        
                """

                # if frame % 10 == 0:
                #     print (actions)    
            
                # Perform step        
                env_obs, reward, done, info = env.step(actions)
        
                """
                For Debug only
                print (env_obs)
                print (reward)
                print (done) 
                """

                for i in range(num_ai_agents):
                    agents[i].rewards.append(reward[i])  # Stack rewards

        
                # Unpack observations into data structure compatible with agent Policy
                agents_obs = unpack_env_obs(env_obs)
                load_info(agents, info, narrate=False)   # Load agent info for AI agents
        
                for i in range(num_agents):
                    US_hits[i] += agents[i].US_hit
                    THEM_hits[i] += agents[i].THEM_hit
            
                """
                For now, we do not stack observation, may come in handy later on
        
                # Evict oldest diff add new diff to state
                next_state = np.stack([next_state]*num_frames)
                next_state[1:, :, :] = state[:-1, :, :]
                state = next_state
                """
                        
                if render and ep == 0:   # render only the 1st episode per batch of 30
                    env.render()
                    time.sleep(SPEED)  # Change speed of video rendering

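                # Record 2nd-pile gathering before checking for episode end,
                # so apples eaten on the final step are not missed
                for (i, loc) in env.consumption:
                    if loc[0] > second_pile_x:
                        # print ('agent {} gathered an apple in 2nd pile'.format(i))
                        crossed[i] = 1

                if any(done):
                    print("Done after {} frames".format(frame + 1))
                    break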
            
            # Print out statistics of AI agents
            ep_rewards = 0
            ep_tags = 0
            ep_US_hits = 0
            ep_THEM_hits = 0
            ep_crossed = sum(crossed)     # calculated num agents gathering in 2nd pile for episode

            if verbose:
                print ('\nStatistics by Agent')
                print ('===================')
            for i in range(num_ai_agents):
                agent_tags = sum(agents[i].tag_hist)
                ep_tags += agent_tags
                cum_agent_tags[i] += agent_tags

                agent_reward = sum(agents[i].rewards)
                ep_rewards += agent_reward
                cum_agent_rewards[i] += agent_reward

                agent_US_hits = sum(agents[i].US_hits)
                agent_THEM_hits = sum(agents[i].THEM_hits)
                ep_US_hits += agent_US_hits
                ep_THEM_hits += agent_THEM_hits
                cum_agent_US_hits[i] += agent_US_hits
                cum_agent_THEM_hits[i] += agent_THEM_hits
        
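                if verbose:
                    # frame is a 0-based index, so the episode lasted frame+1 steps
                    print ("Agent{} aggressiveness is {:.2f}".format(i, agent_tags/(frame+1)))
                    # {:d} would fail if the env returns float rewards
                    print ("Agent{} reward is {:.1f}".format(i, agent_reward))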
                    # print('US agents hit = {}'.format(agent_US_hits))
                    # print('THEM agents hit = {}'.format(agent_THEM_hits ))
        
            cum_rewards += ep_rewards
            cum_crossed += ep_crossed
            cum_tags += ep_tags
            cum_US_hits += ep_US_hits
            cum_THEM_hits += ep_THEM_hits
    
            if verbose:
                print ('\nStatistics in Aggregate')
                print ('=======================')
                print ('Total rewards gathered = {}'.format(ep_rewards))
                print ('Num agents crossed = {}'.format(ep_crossed))
                # print ('Num laser fired = {}'.format(ep_tags))
                # print ('Total US Hit (friendly fire) = {}'.format(ep_US_hits))
                # print ('Total THEM Hit = {}'.format(ep_THEM_hits))
                # print ('friendly fire (%) = {0:.3f}'.format(ep_US_hits/(ep_US_hits+ep_THEM_hits+1e-7)))

            if verbose:
                print ('\nStatistics by Tribe')
                print ('===================')
            for i, t in enumerate(tribes):
                if t.name != 'Crazies':
                    ep_tribe_reward = sum(t.sum_rewards())
                    cum_tribe_rewards[i] += ep_tribe_reward
                    if verbose:
                        print ('Tribe {} has total reward of {}'.format(t.name, ep_tribe_reward))

            for i in range(num_ai_agents):
                agents[i].clear_history()

        env.close()  # Close the rendering window
        end = time.time()

        print ('\nAverage Statistics in Aggregate')
        print ('=================================')
        total_rewards = cum_rewards/max_episodes
        print ('Total rewards gathered = {:.1f}'.format(total_rewards))
        av_agent_reward[dir_num][eps_num] = cum_rewards/max_episodes/num_ai_agents
        print ('Av. agent reward = {:.2f}'.format(av_agent_reward[dir_num][eps_num]))
        av_agent_crossed[dir_num][eps_num] = cum_crossed/max_episodes
        print ('Agents crossed (2nd food pile) = {:.1f}'.format(av_agent_crossed[dir_num][eps_num]))
        # print ('Num laser fired = {:.1f}'.format(cum_tags/max_episodes))
        # print ('Total US Hit (friendly fire) = {:.1f}'.format(cum_US_hits/max_episodes))
        # print ('Total THEM Hit = {:.1f}'.format(cum_THEM_hits/max_episodes))
        # print ('friendly fire (%) = {:.3f}'.format(cum_US_hits/(cum_US_hits+cum_THEM_hits+1e-7)))

        print ('\nAverage Statistics by Tribe')
        print ('=============================')
       
        for i, tribe in enumerate(tribes):
            if tribe.name != 'Crazies':
                tribe_reward = cum_tribe_rewards[i]/max_episodes
                print ('Tribe {} has total reward of {:.1f}'.format(tribe.name, tribe_reward))    
                
                # Keep track of dominating team and the rewards gathered (only if more than 1 tribe)
                if len(tribes) > 1:
                    if tribe_reward > dom_tribe_reward[dir_num][eps_num]:   
                        dom_tribe_reward[dir_num][eps_num] = tribe_reward
                        dominating_tribe[dir_num][eps_num]  = tribe.name

        # Team dominance calculation (only if more than 1 tribe)
        if len(tribes) > 1:
            print ('Dominating Tribe: {}'.format(dominating_tribe[dir_num][eps_num]))
            dominance[dir_num][eps_num] = dom_tribe_reward[dir_num][eps_num]/((total_rewards - \
                                                dom_tribe_reward[dir_num][eps_num]+1.1e-7)/(len(tribes)-1))    
            print ('Team dominance: {0:.2f}x'.format(dominance[dir_num][eps_num]))

        print ('\nAverage Statistics by Agent')
        print ('=============================')
        for i in range(num_ai_agents):
            # print ("Agent{} of {} aggressiveness is {:.2f}".format(i, agents[i].tribe, \
            #                                               cum_agent_tags[i]/(max_episodes*max_frames)))
            print ("Agent{} reward is {:.1f}".format(i, cum_agent_rewards[i]/max_episodes))
            # print('US agents hit = {:.1f}'.format(cum_agent_US_hits[i]/max_episodes))
            # print('THEM agents hit = {:.1f}'.format(cum_agent_THEM_hits[i]/max_episodes))

        print('Training time per epochs: {:.2f} sec'.format((end-start)/max_episodes))

# Note: Statistics for Research Report        
for reward in av_agent_reward:   # Average agent reward
    print(reward)
for agents_crossed in av_agent_crossed:   # Average num agents gathering in 2nd food pile
    print(agents_crossed)

# print dominating team and dominance factor (only if more than 1 tribe)
if len(tribes) > 1:
    for tribe in dominating_tribe:   # Dominating team
        print(tribe)
    for value in dominance:      # Team dominance
        print(value)

###### Dir = models/1T-10L/baseline/food_d37_river_w1_d25/pacifist/t1.0_rp-1.0_300gs/ #######
###### Trained episodes = 500 #######
..............................
Average Statistics in Aggregate
Total rewards gathered = 80.8
Av. agent reward = 8.08
Agents crossed (2nd food pile) = 0.1

Average Statistics by Tribe
Tribe Vikings has total reward of 80.8

Average Statistics by Agent
Agent0 reward is 0.8
Agent1 reward is 0.5
Agent2 reward is 0.1
Agent3 reward is 24.3
Agent4 reward is 2.8
Agent5 reward is 0.2
Agent6 reward is 0.8
Agent7 reward is 21.1
Agent8 reward is 26.6
Agent9 reward is 3.6
Training time per epochs: 4.26 sec
###### Trained episodes = 1000 #######
..............................
Average Statistics in Aggregate
Total rewards gathered = 98.1
Av. agent reward = 9.81
Agents crossed (2nd food pile) = 0.2

Average Statistics by Tribe
Tribe Vikings has total reward of 98.1

Average Statistics by Agent
Agent0 reward is 0.0
Agent1 reward is 0.0
Agent2 reward is -0.7

..............................
Average Statistics in Aggregate
Total rewards gathered = 121.2
Av. agent reward = 12.12
Agents crossed (2nd food pile) = 0.6

Average Statistics by Tribe
Tribe Vikings has total reward of 121.2

Average Statistics by Agent
Agent0 reward is 2.0
Agent1 reward is 2.8
Agent2 reward is 6.9
Agent3 reward is 13.6
Agent4 reward is 33.1
Agent5 reward is 2.5
Agent6 reward is 20.6
Agent7 reward is -0.2
Agent8 reward is 15.4
Agent9 reward is 24.5
Training time per epochs: 4.22 sec
###### Trained episodes = 1500 #######
..............................
Average Statistics in Aggregate
Total rewards gathered = 132.5
Av. agent reward = 13.25
Agents crossed (2nd food pile) = 0.4

Average Statistics by Tribe
Tribe Vikings has total reward of 132.5

Average Statistics by Agent
Agent0 reward is 4.8
Agent1 reward is 0.1
Agent2 reward is 0.1
Agent3 reward is 5.0
Agent4 reward is 8.5
Agent5 reward is 1.6
Agent6 reward is 62.2
Agent7 reward is -1.0
Agent8 reward is 18.7

In [6]:
# Note: Statistics for Research Report   
print ('Average Agent Rewards')
for reward in av_agent_reward:   # Average agent reward
    print(reward)
    
print ('Agents Crossed (2nd food pile)')    
for agents_crossed in av_agent_crossed:   # Average num agents gathering in 2nd food pile
    print(agents_crossed)

Average Agent Rewards
[8.083333333333332, 9.809999999999999, 9.106666666666666, 9.15, 9.256666666666666, 9.203333333333333]
[8.973333333333333, 15.1, 12.033333333333333, 13.969999999999999, 21.27, 25.18]
[6.6, 12.123333333333333, 13.246666666666666, 10.723333333333333, 10.77, 9.3]
[4.82, 8.85, 9.273333333333333, 9.290000000000001, 9.113333333333333, 9.196666666666667]
Agents Crossed (2nd food pile)
[0.06666666666666667, 0.2, 0.0, 0.0, 0.0, 0.0]
[1.6, 0.9333333333333333, 0.4, 1.1, 1.0, 1.2]
[0.2, 0.5666666666666667, 0.4, 0.23333333333333334, 0.13333333333333333, 0.0]
[0.43333333333333335, 0.03333333333333333, 0.0, 0.0, 0.0, 0.0]
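To read the two result lists more easily, a small summary can report, for each temperature, the checkpoint with the highest average agent reward and the number of agents crossed at that checkpoint. The helper `summarize` and the `labels` list are hypothetical additions (the labels are the temperature tags from `dir_names`); the numbers are copied from the In [6] output above.

```python
# Values copied from the In [6] output above; rows follow the order of dir_names.
episodes = [500, 1000, 1500, 2000, 2500, 3000]
labels = ["t1.0", "t1.25", "t2.0", "t4.0"]  # temperature tag of each model directory

reported_reward = [
    [8.083333333333332, 9.809999999999999, 9.106666666666666, 9.15, 9.256666666666666, 9.203333333333333],
    [8.973333333333333, 15.1, 12.033333333333333, 13.969999999999999, 21.27, 25.18],
    [6.6, 12.123333333333333, 13.246666666666666, 10.723333333333333, 10.77, 9.3],
    [4.82, 8.85, 9.273333333333333, 9.290000000000001, 9.113333333333333, 9.196666666666667],
]
reported_crossed = [
    [0.06666666666666667, 0.2, 0.0, 0.0, 0.0, 0.0],
    [1.6, 0.9333333333333333, 0.4, 1.1, 1.0, 1.2],
    [0.2, 0.5666666666666667, 0.4, 0.23333333333333334, 0.13333333333333333, 0.0],
    [0.43333333333333335, 0.03333333333333333, 0.0, 0.0, 0.0, 0.0],
]


def summarize(labels, episodes, rewards, crossed):
    """For each model directory, return (label, best checkpoint, its reward, its crossed count).

    "Best" is the checkpoint with the highest average agent reward; max() keeps
    the earliest checkpoint on ties.
    """
    rows = []
    for label, r_row, c_row in zip(labels, rewards, crossed):
        best = max(range(len(r_row)), key=r_row.__getitem__)
        rows.append((label, episodes[best], r_row[best], c_row[best]))
    return rows


for label, ep, reward, n_crossed in summarize(labels, episodes, reported_reward, reported_crossed):
    print("{:>5}: best at ep{} | av. agent reward {:.2f} | agents crossed {:.2f}".format(
        label, ep, reward, n_crossed))
```

Selecting the best checkpoint by reward is only one convention; selecting by agents crossed would favour different checkpoints (for example ep500 for t1.25).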
