In [1]:
import numpy as np

from colosseumrl.envs.tron import TronGridEnvironment, TronRender
from random import choice
from time import sleep

# ColosseumRL

ColosseumRL is a Python library that assists research on multi-agent games and on training agents to play them. The library provides a variety of multi-agent environments to experiment with. In this example, we focus on the Tron environment. Inspired by the 1982 movie, this game features snake-like mechanics in which players try to force their opponents to crash. First, we will examine the environment API for executing actions and playing a full game of Tron.

## The Basic Agent (not very intelligent)

A simple test agent that supplies random actions for playing a full game. It ignores both the environment and the observation and picks one of the three moves at random.

In [2]:
class RandomAgent:
    def __call__(self, env, observation):
        return choice(['forward', 'left', 'right'])

## Play a Game

We will now create the standard reinforcement learning training loop.

1. Observe the current state of the environment
2. Decide on an action
3. Execute the action
4. Receive a reward and observe a new state
5. Repeat
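
The five steps above can be sketched without ColosseumRL installed. This is only a minimal illustration: `CountingEnv` is a hypothetical stand-in, not part of the library, that imitates the `new_state` / `state_to_observation` / `next_state` call shape used by the Tron environment later in this notebook. Its rules (a reward of 1 for moving `'forward'`, a fixed number of turns) are invented for the example.

```python
from random import Random


class CountingEnv:
    """Hypothetical toy environment; NOT part of ColosseumRL."""

    def __init__(self, num_players=2, max_turns=5):
        self.num_players = num_players
        self.max_turns = max_turns

    def new_state(self):
        # State is (turn counter, per-player score totals); every player acts each turn.
        return (0, (0,) * self.num_players), list(range(self.num_players))

    def state_to_observation(self, state, player):
        # Each player simply sees the full state plus its own index.
        return state, player

    def next_state(self, state, players, actions):
        turn, totals = state
        # Invented rule: moving 'forward' earns 1 point, anything else earns 0.
        rewards = [1 if action == "forward" else 0 for action in actions]
        totals = tuple(t + r for t, r in zip(totals, rewards))
        turn += 1
        terminal = turn >= self.max_turns
        winners = None
        if terminal:
            best = max(totals)
            winners = [p for p in players if totals[p] == best]
        return (turn, totals), players, rewards, terminal, winners


def play(env, agent):
    state, players = env.new_state()  # 1. observe the initial state
    terminal, winners, steps = False, None, 0
    while not terminal:  # 5. repeat until the game ends
        # 2. every acting player decides on an action
        actions = [agent(env, env.state_to_observation(state, p)) for p in players]
        # 3. execute the actions; 4. receive rewards and the new state
        state, players, rewards, terminal, winners = env.next_state(state, players, actions)
        steps += 1
    return steps, winners


rng = Random(0)  # seeded so repeated runs make the same choices


def random_agent(env, observation):
    return rng.choice(["forward", "left", "right"])


steps, winners = play(CountingEnv(num_players=2, max_turns=5), random_agent)
print(f"Game ended after {steps} steps; winners: {winners}")
```

The agent is called with `(env, observation)` so that the same `play` function would accept any callable with that signature, including a class instance that defines `__call__`.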

In [3]:
# Create a Tron environment on a 19x19 Grid
env = TronGridEnvironment.create(board_size=19, num_players=4)
renderer = TronRender(board_size=19, num_players=4)

# Create our simple random agent
agent = RandomAgent()

In [10]:
# Start the game with the first important API call:
# `env.new_state` creates a new game

# The current state of the environment has two components
# - the state object contains all of the information about the game configuration
# - the players object contains the currently acting players
# We also create a terminal boolean, which starts as False
state, players = env.new_state()
terminal = False

# Create a Rendering box
renderer.close()
renderer.render(state)

# Play until the game is over
while not terminal:
    # Let each player select an action for their respective observations
    actions = [agent(env, env.state_to_observation(state, player)) for player in players]
    
    # Perform actions simultaneously with the other important API call:
    # `env.next_state` executes actions and produces the next configuration of the game

    # Notice that we receive new state and players objects
    # We also receive
    # - rewards: A list containing every player's reward after the previous action
    # - terminal: An updated boolean informing us whether the game is over
    # - winners: A list of winners, which is None while the game is not over yet
    state, players, rewards, terminal, winners = env.next_state(state, players, actions)
    renderer.render(state)
    
    # Wait so we can see what's happening
    sleep(0.05)

# Finish up
# Here we have the final useful API call for all environments:
# `env.compute_ranking` produces the full mapping of rankings after the conclusion of the game
# The ranking is a dictionary mapping player_number -> rank
if winners.size == 0:
    print(f"No single player won. Tie with rankings: {env.compute_ranking(state, players, winners)}")
else:
    print(f"Player {winners[0]} wins.")

Player 0 wins.
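
A ranking dictionary of the `player_number -> rank` shape described above can be turned into a readable summary. The sketch below uses made-up values rather than output from a real game. `format_ranking` is a hypothetical helper, not part of ColosseumRL, and it assumes that a smaller rank number means a better placement.

```python
def format_ranking(ranking):
    """Return one line per player, ordered by rank and then by player number."""
    ordered = sorted(ranking.items(), key=lambda item: (item[1], item[0]))
    return [f"rank {rank}: player {player}" for player, rank in ordered]


# Made-up example values; a real dictionary would come from `env.compute_ranking`.
example_ranking = {0: 2, 1: 1, 2: 3, 3: 1}
lines = format_ranking(example_ranking)
print("\n".join(lines))
```

Sorting on the `(rank, player)` pair keeps tied players together and lists them in a stable order.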
