
Creating Environments

Christian Kauten edited this page Jan 6, 2019 · 15 revisions

nes-py provides an interface for building custom OpenAI Gym environments for individual NES games in pure Python. This page is a reference for that interface, with examples based on Super Mario Bros.

Boilerplate

Use this stub code when defining your own interfaces. It uses constructs that are backward compatible with Python 2, which is highly recommended because nes-py itself is Python 2 compatible.

"""An OpenAI Gym interface to the NES game <TODO: Game Name>"""
from nes_py import NESEnv


class FooGame(NESEnv):
    """An OpenAI Gym interface to the NES game <TODO: Game Name>"""

    def __init__(self):
        """Initialize a new <TODO: Game Name> environment."""
        super(FooGame, self).__init__('TODO: path to ROM for the game')
        # setup any variables to use in the below callbacks here

    def _will_reset(self):
        """Handle any RAM hacking before a reset occurs."""
        # use this method to perform setup before an episode resets.
        # the method returns None
        pass

    def _did_reset(self):
        """Handle any RAM hacking after a reset occurs."""
        # use this method to access the RAM of the emulator
        # and perform setup for each episode.
        # the method returns None
        pass

    def _did_step(self, done):
        """
        Handle any RAM hacking after a step occurs.

        Args:
            done: whether the done flag is set to true

        Returns:
            None

        """
        pass

    def _get_reward(self):
        """Return the reward after a step occurs."""
        return 0

    def _get_done(self):
        """Return True if the episode is over, False otherwise."""
        return False

    def _get_info(self):
        """Return the info after a step occurs."""
        return {}


# explicitly define the outward facing API for the module
__all__ = [FooGame.__name__]
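
As one way to fill in the callbacks above, the sketch below computes the reward as the change in a value tracked in RAM. It is a minimal illustration, not part of nes-py: `RewardSketch` is a stand-in class that does not inherit from `NESEnv`, the `bytearray` only mimics `self.ram`, and `SCORE_ADDRESS` is a hypothetical address that does not belong to any particular game.

```python
class RewardSketch(object):
    """Stand-in for an NESEnv subclass (no emulator or ROM involved)."""

    # hypothetical RAM address holding a one-byte score
    SCORE_ADDRESS = 0x0010

    def __init__(self):
        # stand-in for self.ram; the NES has 2 KB of internal RAM
        self.ram = bytearray(0x800)
        # score seen on the previous step, used to compute the delta
        self._last_score = 0

    def _did_reset(self):
        """Re-synchronize the cached score at the start of each episode."""
        self._last_score = int(self.ram[self.SCORE_ADDRESS])

    def _get_reward(self):
        """Return how much the score changed since the last call."""
        score = int(self.ram[self.SCORE_ADDRESS])
        reward = score - self._last_score
        self._last_score = score
        return reward
```

Caching the previous value in `_did_reset` keeps the first reward of an episode from including whatever was left in RAM by the previous one.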

Life Cycle

Reset

The reset lifecycle runs in the order shown in this pseudocode:

_will_reset()
reset()
_did_reset()
obs = screen
return obs

Step

The step lifecycle runs in the order shown in this pseudocode:

reward = 0
done = False
info = {}
for _ in range(frameskip):
    step()
    reward += _get_reward()
    done = done or _get_done()
    info = _get_info()
_did_step(done)
obs = screen
return obs, reward, done, info
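
To make the ordering concrete, here is a small runnable sketch of the reset and step pseudocode above. It assumes nothing about the internals of nes-py: `LifecycleSketch` is a hypothetical stand-in that records which hook ran when, and the strings `'emulator reset'`, `'emulator step'`, and `'screen'` are placeholders for the real emulator calls and the real observation.

```python
class LifecycleSketch(object):
    """Stand-in for NESEnv that only logs the order of the callbacks."""

    def __init__(self, frameskip=1):
        self.frameskip = frameskip
        self.calls = []  # ordered log of everything that ran

    # hooks that a real subclass would override
    def _will_reset(self):
        self.calls.append('_will_reset')

    def _did_reset(self):
        self.calls.append('_did_reset')

    def _did_step(self, done):
        self.calls.append('_did_step')

    def _get_reward(self):
        self.calls.append('_get_reward')
        return 1

    def _get_done(self):
        self.calls.append('_get_done')
        return False

    def _get_info(self):
        self.calls.append('_get_info')
        return {}

    # lifecycle drivers that mirror the pseudocode
    def reset(self):
        self._will_reset()
        self.calls.append('emulator reset')  # placeholder for the real reset
        self._did_reset()
        return 'screen'  # placeholder observation

    def step(self, action):
        reward, done, info = 0, False, {}
        for _ in range(self.frameskip):
            self.calls.append('emulator step')  # placeholder for the real step
            reward += self._get_reward()
            done = done or self._get_done()
            info = self._get_info()
        self._did_step(done)
        return 'screen', reward, done, info
```

Note that in this sketch the reward accumulates over skipped frames while `info` only reflects the last frame, and `_did_step` runs once per call to `step`, not once per frame.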

Reference

NESEnv features methods to directly interact with the underlying NES emulator.

RAM

The RAM is exposed as self.ram and behaves like any other NumPy vector.

Read Byte

self.ram[address]

Write Byte

self.ram[address] = value
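
Games often spread one value over several bytes, so a small helper for reading a range can be convenient. The sketch below assumes a layout with one decimal digit per byte; `read_digits` is a hypothetical helper, the address `0x0100` is made up, and a `bytearray` stands in for `self.ram` so the example runs without an emulator.

```python
def read_digits(ram, address, length):
    """Combine `length` bytes starting at `address`, one decimal digit each."""
    value = 0
    for byte in ram[address:address + length]:
        # int() keeps the arithmetic in Python integers; with a NumPy
        # uint8 array the multiplication could otherwise wrap around
        value = value * 10 + int(byte)
    return value


# stand-in for self.ram, with the digits 1, 2, 3 at a made-up address
ram = bytearray(0x800)
ram[0x0100:0x0103] = bytes(bytearray([1, 2, 3]))
score = read_digits(ram, 0x0100, 3)  # 123
```

The same slicing syntax works on a NumPy vector, so the helper should carry over to `self.ram` unchanged under that assumption.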

Frame Advance

self._frame_advance(action)

Like step, this method takes an action, but it only advances the emulator by one frame without computing a reward, done flag, or info. Use it in the lifecycle callbacks to skip frames that aren't meant for the agent (e.g. loading screens, cutscenes, animations, etc.)
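
A typical use is a loop in `_did_reset` that keeps advancing frames until the game signals that it is ready. The sketch below is a stand-in, not the real class: `SkipFramesSketch` fakes `_frame_advance` with a countdown, `LOADING_FLAG` is a hypothetical RAM address, and treating action `0` as "no buttons pressed" is an assumption.

```python
class SkipFramesSketch(object):
    """Stand-in env whose fake emulator 'loads' for a fixed number of frames."""

    LOADING_FLAG = 0x0020  # hypothetical address: non-zero while loading
    NOOP = 0               # assumed action value for "no buttons pressed"

    def __init__(self, loading_frames):
        self.ram = bytearray(0x800)  # stand-in for self.ram
        self.frames = 0              # frames advanced so far
        self._remaining = loading_frames
        self.ram[self.LOADING_FLAG] = 1 if loading_frames > 0 else 0

    def _frame_advance(self, action):
        """Fake emulator call: count a frame and clear the flag when done."""
        self.frames += 1
        self._remaining = max(0, self._remaining - 1)
        if self._remaining == 0:
            self.ram[self.LOADING_FLAG] = 0

    def _did_reset(self):
        """Skip frames until the loading flag clears."""
        while self.ram[self.LOADING_FLAG]:
            self._frame_advance(self.NOOP)
```

With a real game the exit condition would come from whatever RAM value distinguishes the loading screen from gameplay; a frame limit on the loop may be worth adding so a wrong address cannot hang the reset.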

Create Backup State

self._backup()

This method creates a backup state that can be restored later. It can be used to capture an initial state after some game-specific setup steps. When a backup exists, calls to reset restore the backup state as the starting point instead of the power-on state.
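
The sketch below illustrates that behavior with a stand-in class, under the assumption that the backup is taken once in the constructor after some setup. `BackupSketch`, `_skip_intro`, and `STAGE_ADDRESS` are hypothetical, and copying a `bytearray` only imitates what the emulator does with its full state.

```python
class BackupSketch(object):
    """Stand-in env showing how a backup changes what reset returns to."""

    STAGE_ADDRESS = 0x0030  # hypothetical address set during setup

    def __init__(self):
        self.ram = bytearray(0x800)  # stand-in for self.ram
        self._saved_ram = None       # no backup exists yet
        self.reset()                 # start from the power-on state
        self._skip_intro()           # game-specific setup
        self._backup()               # later resets return to this point

    def _skip_intro(self):
        """Hypothetical setup step, e.g. getting past a start screen."""
        self.ram[self.STAGE_ADDRESS] = 1

    def _backup(self):
        """Imitate the emulator backup by copying the RAM contents."""
        self._saved_ram = bytes(self.ram)

    def reset(self):
        """Restore the backup if one exists, else the power-on state."""
        if self._saved_ram is None:
            self.ram = bytearray(0x800)
        else:
            self.ram = bytearray(self._saved_ram)


env = BackupSketch()
env.ram[BackupSketch.STAGE_ADDRESS] = 7  # the episode changes the state
env.reset()                              # back to the backed-up value
```

Taking the backup after the setup means the setup cost is paid once rather than on every reset.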

Super Mario Bros Example

See gym-super-mario-bros for an example project for Super Mario Bros.