- `observation_space` and `action_space` walkthrough: https://www.aicrowd.com/showcase/neurips-2022-neural-mmo-challenge-tutorial
- Use baselines and perform a rollout in env walkthrough: https://gitlab.aicrowd.com/neural-mmo/neurips2022-nmmo-starter-kit
- NeurIPS team-based competition repo: https://gitlab.aicrowd.com/neural-mmo/neurips2022-nmmo/-/tree/master/

In [102]:
import nmmo
from nmmo import config
from nmmo.io import action

from neurips2022nmmo.scripted import baselines
from neurips2022nmmo import Team
from neurips2022nmmo import CompetitionConfig, scripted, RollOut, TeamBasedEnv

from IPython.display import display
import gym

# Initialise the environment

In [105]:
class TestConfig(config.Medium,
                 config.AllGameSystems):
    pass

# init config
# conf = TestConfig()
conf = CompetitionConfig()
print('-'*100)
print('Displaying Config Parameters')
print('-'*100)
for attr in dir(conf):
    if not attr.startswith('__'):
        try:
            print(f'{attr}: {getattr(conf, attr)}')
        except:
            print(f'Unable to display attr {attr}')
print('-'*100)

# init env
# env = nmmo.Env(conf) # use if conf == TestConfig
env = TeamBasedEnv # use if conf == CompetitionConfig

----------------------------------------------------------------------------------------------------
Displaying Config Parameters
----------------------------------------------------------------------------------------------------
COMBAT_DAMAGE_FORMULA: <bound method Combat.COMBAT_DAMAGE_FORMULA of <neurips2022nmmo.config.CompetitionConfig object at 0x7fbe71ebbfd0>>
COMBAT_FRIENDLY_FIRE: False
COMBAT_MAGE_DAMAGE: 30
COMBAT_MAGE_REACH: 3
COMBAT_MELEE_DAMAGE: 30
COMBAT_MELEE_REACH: 3
COMBAT_RANGE_DAMAGE: 30
COMBAT_RANGE_REACH: 3
COMBAT_SPAWN_IMMUNITY: 20
COMBAT_SYSTEM_ENABLED: True
COMBAT_WEAKNESS_MULTIPLIER: 1.5
COMMUNICATION_NUM_TOKENS: 170
COMMUNICATION_SYSTEM_ENABLED: True
EMULATE_CONST_HORIZON: False
EMULATE_CONST_PLAYER_N: False
EMULATE_FLAT_ATN: False
EMULATE_FLAT_OBS: False
EQUIPMENT_AMMUNITION_BASE_DAMAGE: 0
EQUIPMENT_AMMUNITION_LEVEL_DAMAGE: 10
EQUIPMENT_ARMOR_BASE_DEFENSE: 0
EQUIPMENT_ARMOR_LEVEL_DEFENSE: 4
EQUIPMENT_SYSTEM_ENABLED: True
EQUIPMENT_TOOL_BASE_DEFENSE: 0
EQUIPMEN

# Examine the observation space

First lets write a function for displaying an agent's observation in a human-readable manner:

In [106]:
def display_agent_obs(obs):
    for key, val in obs.items():
        print('-'*100)
        print(f'{key}:')
        for k, v in val.items():
            print(f'\t{k}:\n{v}')
    print('-'*100)

An observation in `nmmo` consists of the following components:
- **Entity**: the information of yourself, other players and npcs.
    - Continuous: the continuous features, a 2d ndarray with shape 100*24.
        - The first dimention 100 is the max number of agents that can be observed, and is controlled by config.N_AGENT_OBS.
        - The second dimention 24 is the number of feature columns and the meaning of each column will be explained in detail.
    - Discrete: the discrete features, a 2d ndarray with shape 100*5.
        - The first dimention 100 is the max number of agents that can be observed, and is controlled by config.N_AGENT_OBS.
        - The second dimention 5 is the number of feature columns.
        - Notice that the discrete information is duplicate of (a part of) the continuous information, which means you can simply drop the discrete information.
    - N: the number of agents observed (including yourself) in current vision.
- **Tile**: the information of local map with 15x15 size.
    - Continuous: the continuous features, a 2d ndarray with shape 225*4.
        - The first dimention 225 is the number of tiles within agent's vision, which is controlled by config.NSTIM. When config.NSTIM=7 by default, the number of tiles is (2*7+1)^2 = 225.
        - The second dimention 4 is the number of feature columns and the meaning of each column will be explained in detail.
    - Discrete: the discrete features, a 2d ndarray with shape 225*3.
        - The first dimention 225 is the number of tiles within agent's vision, which is controlled by config.NSTIM.
       -  The second dimention 3 is the number of feature columns.
        - Notice that the discrete information is also duplicate of (a part of) the continuous information, which means you can simply drop the discrete information.
    - N: the number of distinct tile observations (fixed) ## Item Information
- **Item**: the information of weapon, tool, comsummer that the Entity equiped
    - Continuous: the continuous features, a 2d ndarray with shape 170*16.
        - The first dimention 170 is the number of items, which is controlled by config.NPC_LEVEL_MAX. When config.NPC_LEVEL_MAX=10 by default, the number of items is 17*10 = 170.
        - The second dimention 16 is the number of feature columns and the meaning of each column will be explained in detail.
    - Discrete: the discrete features, a 2d ndarray with shape 170*3.
        - The first dimention 170 is the number of items, which is controlled by config.NPC_LEVEL_MAX. When config.NPC_LEVEL_MAX=10 by default, the number of items is 17*10 = 170.
        - The second dimention 3 is the number of feature columns.
        - Notice that the discrete information is also duplicate of (a part of) the continuous information, which means you can simply drop the discrete information.
    - N: the number of distinct item observations (fixed)
- **Market**: the information (in selling goods) of global market]
    - Continuous: the continuous features, a 2d ndarray with shape 170*16.
        - The first dimention 170 is the max number of items in market, which is controlled by config.NPC_LEVEL_MAX. When config.NPC_LEVEL_MAX=10 by default, the number of items is 17*10 = 170.
        - The second dimention 16 is the number of feature columns and the meaning of each column will be explained in detail.
    - Discrete: the discrete features, a 2d ndarray with shape 170*3.
        - The first dimention 170 is the number of items, which is controlled by config.NPC_LEVEL_MAX. When config.NPC_LEVEL_MAX=10 by default, the number of items is 17*10 = 170.
        - The second dimention 3 is the number of feature columns.
        - Notice that the discrete information is also duplicate of (a part of) the continuous information, which means you can simply drop the discrete information.
    - N: the number of distinct item observations (fixed)

Lets look at the observation space of each agent:

**N.B. Below code only works for `nmmo.Env` environment, not for `neurips2022nmmo.TeamBasedEnv`**

In [107]:
display_agent_obs(env.observation_space(agent=1))

AttributeError: type object 'TeamBasedEnv' has no attribute 'observation_space'

Each agent (numbered $1$ to $N$ for $N$ agents) has its own observation indexed by its key:

In [108]:
obs = env.reset()
print(obs.keys())

TypeError: reset() missing 1 required positional argument: 'self'

Lets look at an agent's observation:

In [109]:
agent = 1

print(f'Agent {agent} obs:')
display_agent_obs(obs[agent])

Agent 1 obs:
----------------------------------------------------------------------------------------------------
Entity:
	Continuous:
[[1. 1. 0. ... 1. 1. 1.]
 [1. 2. 0. ... 1. 1. 1.]
 [1. 3. 0. ... 1. 1. 1.]
 ...
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]
	Discrete:
[[  1   2 176 208 376]
 [  1   2 176 208 378]
 [  1   2 176 208 380]
 [  1   2 176 208 382]
 [  0   0   0   0   0]
 [  0   0   0   0   0]
 [  0   0   0   0   0]
 [  0   0   0   0   0]
 [  0   0   0   0   0]
 [  0   0   0   0   0]
 [  0   0   0   0   0]
 [  0   0   0   0   0]
 [  0   0   0   0   0]
 [  0   0   0   0   0]
 [  0   0   0   0   0]
 [  0   0   0   0   0]
 [  0   0   0   0   0]
 [  0   0   0   0   0]
 [  0   0   0   0   0]
 [  0   0   0   0   0]
 [  0   0   0   0   0]
 [  0   0   0   0   0]
 [  0   0   0   0   0]
 [  0   0   0   0   0]
 [  0   0   0   0   0]
 [  0   0   0   0   0]
 [  0   0   0   0   0]
 [  0   0   0   0   0]
 [  0   0   0   0   0]
 [  0   0   0   0   0]
 [  0   

# Examine the action space

In [110]:
act_space = env.action_space(agent=1)
print("Action space:")
print("*"*2,action.Attack,": ",act_space[action.Attack])
print("-"*8,action.Style,": ",act_space[action.Attack][action.Style])
print("-"*8,action.Target,": ",act_space[action.Attack][action.Target])
print("*"*2,action.Move,": ",act_space[action.Move])
print("-"*8,action.Direction,act_space[action.Move][action.Direction])
print("*"*2,action.Buy,": ",act_space[action.Buy])
print("-"*8,action.Item, act_space[action.Buy][action.Item])
print("*"*2,action.Sell,": ",act_space[action.Sell])
print("-"*8,action.Item, act_space[action.Sell][action.Item])
print("-"*8,action.Price, act_space[action.Sell][action.Price])
print("*"*2,action.Use,": ",act_space[action.Use])
print("-"*8,action.Item, act_space[action.Use][action.Item])
print("*"*2,action.Comm,": ",act_space[action.Comm])
print("-"*8,action.Token, act_space[action.Comm][action.Token])

AttributeError: type object 'TeamBasedEnv' has no attribute 'action_space'

# Examine the player baselines shipped with `nmmo`

In [111]:
from neurips2022nmmo.scripted import baselines

print(dir(baselines))

['Action', 'Alchemist', 'Carver', 'Combat', 'Explore', 'Fisher', 'Forage', 'Gather', 'Herbalist', 'Item', 'Mage', 'Meander', 'Melee', 'Prospector', 'Random', 'Range', 'Scripted', 'Serialized', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', 'attack', 'colors', 'defaultdict', 'item', 'material', 'move', 'nmmo', 'random', 'scripting', 'skill']


# Create a team using the provided baseline players

Lets assume with have `config.PLAYER_TEAM_SIZE=8` (adjust as appropriate depending on how `config` is defined above):

In [220]:
agents = [
    baselines.Fisher,
    baselines.Herbalist,
    baselines.Prospector,
    baselines.Carver,
    baselines.Alchemist,
    baselines.Melee,
    baselines.Range,
    baselines.Mage,
]

N.B. From the docs https://neuralmmo.github.io/beta/build/html/rst/tutorial.html, we know that agent actions are called by calling `agent(obs)` since behaviour is defined in the `__call__` method (see the `LavaAgent` example in the tutorial link). 

In [221]:
from typing import Any, Dict, Type, List
import numpy as np

class MyTeam(Team):
    def __init__(self, 
                 team_id: str,
                 agents: list,
                 conf=None, 
                 **kwargs):
        super().__init__(team_id, conf)
        self.team_id = team_id
        self.agents = [agent(config=conf, idx=idx) for idx, agent in enumerate(agents)]
            
    def reset(self):
        pass
    
    def act(self, observations: Dict[Any, dict]) -> Dict[int, dict]:
        if "stat" in observations:
            stat = observations.pop("stat")
        actions = {i: self.agents[i](obs) for i, obs in observations.items()}
        for i in actions:
            for atn, args in actions[i].items():
                for arg, val in args.items():
                    if arg.argType == nmmo.action.Fixed:
                        actions[i][atn][arg] = arg.edges.index(val)
                    elif arg == nmmo.action.Target:
                        actions[i][atn][arg] = self.get_target_index(
                            val, self.agents[i].ob.agents)
                    elif atn in (nmmo.action.Sell,
                                 nmmo.action.Use) and arg == nmmo.action.Item:
                        actions[i][atn][arg] = self.get_item_index(
                            val, self.agents[i].ob.items)
                    elif atn == nmmo.action.Buy and arg == nmmo.action.Item:
                        actions[i][atn][arg] = self.get_item_index(
                            val, self.agents[i].ob.market)
        return actions

    @staticmethod
    def get_item_index(instance: int, items: np.ndarray) -> int:
        for i, itm in enumerate(items):
            id_ = nmmo.scripting.Observation.attribute(itm,
                                                       nmmo.Serialized.Item.ID)
            if id_ == instance:
                return i
        raise ValueError(f"Instance {instance} not found")

    @staticmethod
    def get_target_index(target: int, agents: np.ndarray) -> int:
        targets = [
            x for x in [
                nmmo.scripting.Observation.attribute(
                    agent, nmmo.Serialized.Entity.ID) for agent in agents
            ] if x
        ]
        return targets.index(target)

    
    
    
    
    
    
my_team = MyTeam(team_id='my_team',
                 agents=agents,
                 conf=conf)

# Create all teams for game

Lets assume there are `config.PLAYER_N / config.PLAYER_TEAM_SIZE = 128 / 8 = 16` teams in the environment

Lets have have 5 of the `scripted.CombatTeam` teams, 10 of the `scripted.MixtureTeam` teams, and 1 custom team:

In [222]:
teams = [scripted.CombatTeam(f"Combat-{i}", conf) for i in range(5)]
teams.extend([scripted.MixtureTeam(f"Mixture-{i}", conf) for i in range(10)])
teams.append(my_team)

In [230]:
test = {'a': 1, 'b': 2}
print(len(test))
for a in range(1):
    print(a)

2
0


# Run simulation using RollOut

In [231]:
ro = RollOut(conf, teams, parallel=True, show_progress=True)
ro.run(n_episode=1)

  5%|████████▎                                                                                                                                                              | 2/40 [00:00<00:03, 12.06it/s]

Generating 40 maps


100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 40/40 [00:02<00:00, 15.17it/s]
  0%|                                                                                                                                                                             | 0/1024 [00:00<?, ?it/s]

Start Episode 1!


100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1024/1024 [02:07<00:00,  8.04it/s]

Result of Episode 1:
+------------------------------------------------------------------------------------------+
| Team            TotalScore   AliveScore   DefeatScore   TimeAlive   Gold     DamageTaken |
+------------------------------------------------------------------------------------------+
| Mixture-7       11.00        10.00        1.00          1024.00     63.00    4071.19     |
| (MixtureTeam)                                                                            |
+------------------------------------------------------------------------------------------+
| Combat-2        6.50         6.00         0.50          1022.00     5.00     2903.62     |
| (CombatTeam)                                                                             |
+------------------------------------------------------------------------------------------+
| Mixture-8       5.00         5.00         0.00          997.00      229.00   5297.19     |
| (MixtureTeam)                                  




[{12: TeamResult(policy_id='MixtureTeam', alive_score=10.0, defeat_score=1.0, total_score=11.0, time_alive=1024.0, gold=63.0, damage_taken=4071.1875, n_timeout=0),
  2: TeamResult(policy_id='CombatTeam', alive_score=6.0, defeat_score=0.5, total_score=6.5, time_alive=1022.0, gold=5.0, damage_taken=2903.625, n_timeout=0),
  13: TeamResult(policy_id='MixtureTeam', alive_score=5.0, defeat_score=0.0, total_score=5.0, time_alive=997.0, gold=229.0, damage_taken=5297.1875, n_timeout=0),
  8: TeamResult(policy_id='MixtureTeam', alive_score=4.0, defeat_score=1.0, total_score=5.0, time_alive=997.0, gold=353.0, damage_taken=8829.6875, n_timeout=0),
  14: TeamResult(policy_id='MixtureTeam', alive_score=3.0, defeat_score=2.0, total_score=5.0, time_alive=989.0, gold=159.0, damage_taken=4622.4375, n_timeout=0),
  4: TeamResult(policy_id='CombatTeam', alive_score=2.0, defeat_score=1.5, total_score=3.5, time_alive=918.0, gold=3.0, damage_taken=3517.25, n_timeout=0),
  5: TeamResult(policy_id='MixtureTea

# Run simulation using custom loop

In [234]:
print(dir(my_team))

['__annotations__', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'act', 'agents', 'env_config', 'get_item_index', 'get_target_index', 'id', 'n_player', 'n_timeout', 'policy_id', 'reset', 'team_id']


In [238]:
class EnvLoop:
    def __init__(self,
                 env_cls,
                 env_conf,
                 teams,
                 **kwargs):
        self.env_conf = env_conf
        self.env = env_cls(conf)
        
        self.teams = teams
    
    def run(self,
            verbose: bool = False,
            **kwargs):
        '''Runs one episode.'''
        if verbose:
            print(f'Starting environment episode...')
        
        observations = self.env.reset()
        step_counter = 1
        while observations:
            if verbose:
                print(f'\nStep {step_counter}')
            team_to_player_to_actions = self._get_team_to_player_to_actions(observations)
            
            # # display(team_to_player_to_actions)
            # for team_idx, team_actions in team_to_player_to_actions.items():
            #     print(team_idx)
            #     for player_idx, player_actions in team_actions.items():
            #         print(f'\t{player_idx}')
            #         for action_type, chosen_action in player_actions.items():
            #             print(f'\t\t{action_type}:')
            #             for dim_key, dim_val in chosen_action.items():
            #                 print(f'\t\t\t{dim_key}: {dim_val}')
                            
            observations, rewards, dones, infos = self.env.step(team_to_player_to_actions)
            print(f'observations: {observations[0]}')
            print(f'rewards: {rewards}')
            print(f'dones: {dones}')
            # print(f'infos: {infos.keys()}')
            print(f'infos: {infos}')
            step_counter += 1
            raise Exception()
    
    def _get_team_to_player_to_actions(self, observations):
        team_to_player_to_actions = {}
        for team_idx, team_observations in observations.items():
            team_to_player_to_actions[team_idx] = self.teams[team_idx].act(team_observations)
        return team_to_player_to_actions
        

env_loop = EnvLoop(env_cls=TeamBasedEnv, env_conf=conf, teams=teams)
env_loop.run(verbose=True)

  5%|████████▎                                                                                                                                                              | 2/40 [00:00<00:02, 16.12it/s]

Generating 40 maps


100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 40/40 [00:02<00:00, 16.82it/s]


Starting environment episode...

Step 1
observations: {0: {'Entity': {'Continuous': array([[1., 1., 0., ..., 1., 1., 1.],
       [1., 2., 0., ..., 1., 1., 1.],
       [1., 3., 0., ..., 1., 1., 1.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]], dtype=float32), 'Discrete': array([[  1,   2, 176, 209, 376],
       [  1,   2, 176, 209, 378],
       [  1,   2, 176, 209, 380],
       [  1,   2, 176, 209, 382],
       [  0,   0,   0,   0,   0],
       [  0,   0,   0,   0,   0],
       [  0,   0,   0,   0,   0],
       [  0,   0,   0,   0,   0],
       [  0,   0,   0,   0,   0],
       [  0,   0,   0,   0,   0],
       [  0,   0,   0,   0,   0],
       [  0,   0,   0,   0,   0],
       [  0,   0,   0,   0,   0],
       [  0,   0,   0,   0,   0],
       [  0,   0,   0,   0,   0],
       [  0,   0,   0,   0,   0],
       [  0,   0,   0,   0,   0],
       [  0,   0,   0,   0,   0],
       [  0,   0,   0,   0,   0],
 

Exception: 

In [233]:
print(info)

NameError: name 'info' is not defined