# Multi-Team Gathering Environment

The Gathering Game is a miniworld composed of:
* Agents (Gatherers)
* Resource (Food units) - a source of positive rewards provided by the environment
* Weaponry (Laser) - the ability to inflict negative reward built into the environment

The way we organize the rewards of the agents can transform the game and the problem:
* **A world of selfish lone beings** - Comparable to a world containing a horse, a sheep, a hog, and so on. The agents have neither context nor reason to cooperate for the common good, so any cooperation has to emerge from self-interest and reciprocity.
* **A world of communal beings** - Comparable to a world of pseudo-communists. All gathered rewards are shared by default, so cooperation emerges from the need to maximize total reward.
* **A world of competing tribes** - Comparable to a world with 2 tribes of pseudo-communists. Rewards gathered by a tribe are shared among its members, so cooperation within a tribe emerges from the need to maximize the tribe's total reward. Cooperation between tribes still has to emerge from self-interest.
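
The three reward organizations above differ only in how per-agent rewards are pooled. The following is a minimal sketch of that idea, not code from `env.py` or `model.py`: the `share_rewards` helper and the `teams` list are hypothetical names introduced for illustration, and it assumes "shared" means every member receives the full team total (an even split would be an equally plausible reading).

```python
from collections import defaultdict


def share_rewards(raw_rewards, teams):
    """Return the reward each agent receives after pooling within its team.

    raw_rewards: per-agent rewards from one environment step.
    teams: team id of each agent (hypothetical; not part of GatheringEnv).
    Assumption: every team member receives the full team total.
    """
    team_totals = defaultdict(float)
    for reward, team in zip(raw_rewards, teams):
        team_totals[team] += reward
    return [team_totals[team] for team in teams]


raw = [1, 0, 0, -1]  # illustrative step rewards for 4 agents

# Selfish lone beings: every agent is its own team
selfish = share_rewards(raw, teams=[0, 1, 2, 3])
# Communal beings: all agents belong to a single team
communal = share_rewards(raw, teams=[0, 0, 0, 0])
# Competing tribes: two teams of two agents each
tribes = share_rewards(raw, teams=[0, 0, 1, 1])
```

Under this sketch, switching between the three worlds only requires changing the `teams` assignment; the environment and the agents' policies stay the same.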


In [1]:
import os
import random
import time
import platform
import torch
import gym
import numpy as np

from env import GatheringEnv   # This is the Game Environment
from model import *   # Policy and Rdn_Policy are defined in model.py

# for auto-reloading external modules
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2

print("Python version: ", platform.python_version())
print("Pytorch version: {}".format(torch.__version__))
print("OpenAI Gym version: {}".format(gym.__version__))

Python version:  3.6.4
Pytorch version: 0.4.1.post2
OpenAI Gym version: 0.9.2


In [3]:
from model import *    # Use the Policy and Rdn_Policy defined in model.py

# There will be 4 agents - 3 AI agents, 1 random agent
num_ai_agents = 3
num_rdn_agents = 1
num_agents = num_ai_agents+num_rdn_agents  # just the sum of the two

# Data structure for AI agents (agents will form their own Class later on)
ai_agents = []
actions = []
tags = []
rewards = []

env = GatheringEnv(n_agents=num_agents, map_name='default')

# Env API is similar to that of OpenAI Gym
state_n = env.reset()
env.render()

# Load AI agents with trained weights
for i in range(num_ai_agents):
    print("Load agent {}".format(i))
    ai_agents.append(Policy(env.state_size, i+1))
    ai_agents[i].load_weights()
# Load random agents    
for i in range(num_ai_agents,num_agents):
    print("Load agent {}".format(i))
    ai_agents.append(Rdn_Policy())

# Initialize per-agent actions, tag counts, and cumulative rewards
actions = [0 for _ in range(num_agents)]
tags = [0 for _ in range(num_agents)]
rewards = [0 for _ in range(num_agents)]

n_steps = 1000

# Render for n_steps steps
for step in range(n_steps):
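    # Each agent (AI or random) selects an action from its own observation
    for i in range(num_agents):
        actions[i] = ai_agents[i].select_action(state_n[i])
        # Compare by value: `is 6` never matches a tensor action; int() handles 0-dim tensors
        if int(actions[i]) == 6:
            tags[i] += 1   # record a tag for assessing aggressiveness

    if step % 10 == 0:
        print(actions)
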
    state_n, reward_n, done_n, info_n = env.step(actions)

    for i in range(num_agents):
        rewards[i] += reward_n[i]    # Accumulate rewards for each agent
        
    if any(done_n):
        break
    env.render()
    time.sleep(1/30)  # Change speed of video rendering

env.close()  # Close the rendering window

# Print out statistics of all agents
steps_run = step + 1   # steps actually executed, in case the episode ended early
for i in range(num_agents):
    print("Agent{} aggressiveness is {:.2f}".format(i+1, tags[i]/steps_run))
    print("Agent{} reward is {:d}".format(i+1, rewards[i]))

Load agent 0
Load agent 1
Load agent 2
Load agent 3
[tensor(4), tensor(4), tensor(2), 4]




[tensor(1), tensor(4), tensor(6), 4]
[tensor(0), tensor(4), tensor(1), 1]
[tensor(6), tensor(1), tensor(6), 2]
[tensor(0), tensor(1), tensor(6), 2]
[tensor(0), tensor(5), tensor(6), 6]
[tensor(1), tensor(4), tensor(5), 4]
[tensor(4), tensor(1), tensor(6), 6]
[tensor(4), tensor(1), tensor(6), 4]
[tensor(6), tensor(1), tensor(0), 2]
[tensor(0), tensor(0), tensor(2), 6]
[tensor(6), tensor(4), tensor(6), 7]
[tensor(4), tensor(4), tensor(2), 7]
[tensor(1), tensor(7), tensor(0), 7]
[tensor(6), tensor(4), tensor(0), 7]
[tensor(0), tensor(0), tensor(0), 7]
[tensor(4), tensor(3), tensor(0), 0]
[tensor(6), tensor(6), tensor(1), 4]
[tensor(1), tensor(2), tensor(0), 4]
[tensor(5), tensor(2), tensor(6), 5]
[tensor(4), tensor(4), tensor(0), 0]
[tensor(0), tensor(1), tensor(0), 1]
[tensor(5), tensor(5), tensor(1), 5]
[tensor(4), tensor(1), tensor(6), 2]
[tensor(6), tensor(0), tensor(0), 0]
[tensor(4), tensor(0), tensor(0), 5]
[tensor(4), tensor(6), tensor(0), 6]
[tensor(3), tensor(0), tensor(6), 4]
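
Aggressiveness in the loop above is the fraction of steps on which an agent chose action 6 (the tag action). The sketch below shows one way to compute the per-agent summary while normalizing by the number of steps actually executed, which matters when the episode ends before `n_steps`. The `summarize` helper is a hypothetical name, and the numbers passed to it are made-up illustrative values, not results from this run.

```python
def summarize(tags, rewards, steps_run):
    """Return a list of (aggressiveness, total_reward) pairs, one per agent.

    Aggressiveness is the fraction of executed steps on which the agent tagged.
    """
    if steps_run <= 0:
        raise ValueError("steps_run must be positive")
    return [(t / steps_run, r) for t, r in zip(tags, rewards)]


# Illustrative inputs only (not taken from the run above)
stats = summarize(tags=[50, 0, 125, 25], rewards=[10, 3, -4, 0], steps_run=250)

for i, (aggressiveness, reward) in enumerate(stats, start=1):
    print("Agent{} aggressiveness is {:.2f}".format(i, aggressiveness))
    print("Agent{} reward is {}".format(i, reward))
```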

In [27]:
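# Build a 6 x 10 grid of [0, 1] pairs; comprehensions avoid the shared
# inner lists that `[[[0, 1]] * 10] * 6` would create
data = [[[0, 1] for _ in range(10)] for _ in range(6)]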



In [23]:
space = gym.spaces.MultiDiscrete([4, 1, 1])

In [24]:
print(np.asarray([4,1,1], dtype=np.int32))

[4 1 1]
