Pommerman baseline using action filter
Chao Gao, Pablo Hernandez-Leal, Bilal Kartal, and Matthew E. Taylor, "Skynet: A Top Deep RL Agent in the Inaugural Pommerman Team Competition", 4th Multidisciplinary Conference on Reinforcement Learning and Decision Making, 2019.
Here is a BibTeX entry that you can use to cite it in a publication:
@inproceedings{gao2019skynet,
  author    = {Gao, Chao and Hernandez-Leal, Pablo and Kartal, Bilal and Taylor, Matthew E.},
  title     = {Skynet: A Top Deep RL Agent in the Inaugural Pommerman Team Competition},
  booktitle = {4th Multidisciplinary Conference on Reinforcement Learning and Decision Making},
  year      = {2019},
}
This is an action-filtering module for Pommerman. As the name indicates, it prunes actions that would surely lead the agent to death, given the bombs nearby. The safety of each cell is computed by comparing the minimum number of steps needed to evade from that cell against the minimum remaining life among the bombs covering it.
This implementation is an upgraded (and arguably stronger) version of the ActionFilter used in skynet955.
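As a rough illustration of that comparison, here is a minimal sketch of the per-cell safety test. It is not the module's actual code: it ignores obstacles that block blasts, flames, kicked bombs, and chained explosions, and it takes the evasion distance as an input rather than computing it.
import numpy as np

def min_covering_bomb_life(obs, cell):
    """Smallest remaining life among bombs whose blast covers `cell`,
    or None if no bomb covers it (uses the standard observation keys
    'bomb_life' and 'bomb_blast_strength')."""
    r, c = cell
    lives = []
    for (br, bc), life in np.ndenumerate(obs['bomb_life']):
        if life <= 0:
            continue
        strength = int(obs['bomb_blast_strength'][br, bc])
        covers_row = br == r and abs(bc - c) < strength
        covers_col = bc == c and abs(br - r) < strength
        if covers_row or covers_col:
            lives.append(life)
    return min(lives) if lives else None

def cell_is_safe(obs, cell, steps_to_evade):
    """A cell is safe when the agent can step out of the blast range
    strictly before the earliest covering bomb explodes."""
    life = min_covering_bomb_life(obs, cell)
    return life is None or steps_to_evade < life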
In your agent, call
import action_prune
actions = action_prune.get_filtered_actions(observation)
where observation is the observation passed to the agent when the Pommerman environment's step function is called.
It is natural to build a few players on top of this action filter module; more specifically, random players with three different options for pruning the bomb action:
1. simple: do not place a bomb when the agent's own position is covered by some nearby bomb.
2. simple_adjacent: same as option 1, but also apply the test to the agent's adjacent passable cells.
3. lookahead: tentatively place a bomb and then call the action-pruning algorithm; if all actions are pruned, the agent is probably doomed to die, so the bomb-placing action should be avoided (see the sketch after this list).
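As a rough illustration of the lookahead option, the following sketch injects a hypothetical bomb into a copy of the observation and re-runs the filter. The way the bomb is injected, and the assumed bomb life, are illustrative assumptions rather than the released code.
import copy
import action_prune

def allow_bomb_lookahead(obs, assumed_bomb_life=9):
    """Return True if placing a bomb would still leave at least one unpruned action."""
    r, c = obs['position']
    simulated = copy.deepcopy(obs)
    # Pretend a bomb with the agent's blast strength was just laid on its own cell.
    simulated['bomb_life'][r, c] = assumed_bomb_life
    simulated['bomb_blast_strength'][r, c] = obs['blast_strength']
    remaining = action_prune.get_filtered_actions(simulated)
    # If every action is pruned after the hypothetical bomb, avoid bombing.
    return len(remaining) > 0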
For example, the following defines a player that takes actions randomly (including placing bombs) using the filter.
import random
import action_prune
from pommerman.agents import BaseAgent
from pommerman.constants import Action

class SmartRandomAgent(BaseAgent):
    """Random agent with filtered actions."""
    def act(self, obs, action_space):
        valid_actions = action_prune.get_filtered_actions(obs)
        if len(valid_actions) == 0:
            valid_actions.append(Action.Stop.value)
        return random.choice(valid_actions)

    def episode_end(self, reward):
        pass
We also release a similar agent which uses the filter but only moves randomly, that is, it does not place bombs.
class SmartRandomAgentNoBomb(BaseAgent):
    """Random agent with filtered actions, but never places a bomb."""
    def act(self, obs, action_space):
        valid_actions = action_prune.get_filtered_actions(obs)
        if Action.Bomb.value in valid_actions:
            valid_actions.remove(Action.Bomb.value)
        if len(valid_actions) == 0:
            valid_actions.append(Action.Stop.value)
        return random.choice(valid_actions)
Lastly, we provide an agent based on a modification of SimpleAgent. The main idea is to let the agent place a bomb only when it is certain to kill an opponent; we name this agent CautiousAgent.
class CautiousAgent(BaseAgent):
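    # The body below is only an illustrative sketch of the idea, not the
    # released implementation. It wraps a SimpleAgent and keeps its Bomb
    # choices only when an adjacent enemy looks boxed in. It assumes the
    # imports from the examples above plus
    # `from pommerman import constants` and
    # `from pommerman.agents import SimpleAgent`.
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._simple = SimpleAgent()

    def act(self, obs, action_space):
        action = self._simple.act(obs, action_space)
        if action == Action.Bomb.value and not self._enemy_is_trapped(obs):
            # Not certain the bomb kills anyone: fall back to a filtered move.
            safe = [a for a in action_prune.get_filtered_actions(obs)
                    if a != Action.Bomb.value]
            action = random.choice(safe) if safe else Action.Stop.value
        return action

    @staticmethod
    def _enemy_is_trapped(obs):
        """True if an enemy sits on a neighbouring cell and all of its other
        neighbours are blocked, so a bomb placed here should catch it."""
        board = obs['board']
        rows, cols = board.shape
        r0, c0 = obs['position']
        enemy_ids = {e.value for e in obs['enemies']}
        free = {constants.Item.Passage.value,
                constants.Item.ExtraBomb.value,
                constants.Item.IncrRange.value,
                constants.Item.Kick.value}
        moves = ((1, 0), (-1, 0), (0, 1), (0, -1))
        for dr, dc in moves:
            r, c = r0 + dr, c0 + dc
            if not (0 <= r < rows and 0 <= c < cols):
                continue
            if board[r, c] not in enemy_ids:
                continue
            escapes = [(r + er, c + ec) for er, ec in moves
                       if (r + er, c + ec) != (r0, c0)]
            if all(not (0 <= x < rows and 0 <= y < cols)
                   or board[x, y] not in free
                   for x, y in escapes):
                return True
        return False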
We use the following script to test the effectiveness of the action filter, where SmartRandomAgent is a player that plays randomly after pruning actions with the filter.
'''An example showing how to set up a Pommerman game programmatically.'''
import pommerman
from pommerman import agents
from pommerman.agents import random_agent

ENV_ID = 'PommeFFACompetition-v0'
# ENV_ID = 'PommeTeamCompetition-v0'
RENDER = True
N_GAME = 20

def main():
    # Print all possible environments in the Pommerman registry
    print(pommerman.REGISTRY)
    print(ENV_ID)

    # Create a set of agents (exactly four)
    agent_list = [
        random_agent.SmartRandomAgent(),
        agents.simple_agent.SimpleAgent(),
        agents.simple_agent.SimpleAgent(),
        agents.simple_agent.SimpleAgent(),
    ]

    # Make the "Free-For-All" environment using the agent list
    env = pommerman.make(ENV_ID, agent_list)

    # Run the episodes just like OpenAI Gym
    win_cnt, draw_cnt, lose_cnt = 0, 0, 0
    for i_episode in range(N_GAME):
        state = env.reset()
        done = False
        step_cnt = 0
        while not done:
            if RENDER:
                env.render()
            actions = env.act(state)
            state, rewards, done, info = env.step(actions)
            step_cnt += 1
        # Count the result for the agent in position 0
        if rewards[0] > 0:
            win_cnt += 1
        elif step_cnt >= 800:
            draw_cnt += 1
        else:
            lose_cnt += 1
        print('Episode {} finished'.format(i_episode))
        print('win:', win_cnt, 'draw:', draw_cnt, 'lose:', lose_cnt)
        print('\n')
    env.close()

if __name__ == '__main__':
    main()
Using the above script with option 2 (simple_adjacent), the results of four trials of 100 games each, starting from different corners, are:
- win 45, draw 7, lose 48
- win 34, draw 14, lose 52
- win 45, draw 10, lose 45
- win 34, draw 13, lose 53
Given that a player with strength similar to SimpleAgent should yield a win probability of around 0.25, the above results show that the action filter is quite effective: even a random player equipped with such a filter can perform better than a well-designed search-based baseline.
Noting that the strategic flaw of SimpleAgent stems from its deterministic strategy, we believe that the random player with the action filter might be a good baseline for Pommerman training and testing.
Another use is to plug this filter into a learning algorithm, as we did for skynet955.
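As a rough illustration of that idea (not skynet955's actual training code), the filter can be used to mask a policy's action probabilities before sampling; policy_probs is assumed to be a length-6 distribution over Pommerman's six actions.
import numpy as np
import action_prune

def sample_filtered_action(obs, policy_probs, rng=np.random):
    """Zero out the probabilities of pruned actions, renormalize, and sample."""
    valid = action_prune.get_filtered_actions(obs)
    mask = np.zeros_like(policy_probs)
    mask[list(valid)] = 1.0
    masked = policy_probs * mask
    if masked.sum() == 0:          # everything pruned: just stand still
        return 0                   # Action.Stop.value
    masked /= masked.sum()
    return int(rng.choice(len(masked), p=masked))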
Further development is possible, perhaps in the following directions:
- Consider other agents' moves; the current code treats all other agents as static.
See the COPYING file for details