# Exploration of vectorization and discretization of the action space

The SB3 documentation recommends vectorizing __"Env"__ objects in order to reduce the processing time needed in the different stages of a project.

As a starting point, we take the "RetroMtpoNesReduced()" class and extend it.
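To make the idea of vectorization concrete, here is a minimal pure-Python sketch of what a sequential vectorized environment does: it holds several environment copies, steps each one with its own action, and resets any copy whose episode has ended. `CountingEnv` and `TinyVecEnv` are hypothetical names introduced only for illustration; they are not part of SB3, whose `DummyVecEnv` also handles spaces, info dictionaries, and array stacking.

```python
class CountingEnv:
    """Hypothetical toy environment: the episode ends after `horizon` steps."""

    def __init__(self, horizon):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t  # the observation is just the step counter

    def step(self, action):
        self.t += 1
        reward = float(action)  # toy reward: echo the action
        done = self.t >= self.horizon
        return self.t, reward, done, {}


class TinyVecEnv:
    """Steps several environments one after another and batches the results."""

    def __init__(self, env_fns):
        self.envs = [fn() for fn in env_fns]

    def reset(self):
        return [env.reset() for env in self.envs]

    def step(self, actions):
        obs, rewards, dones = [], [], []
        for env, action in zip(self.envs, actions):
            o, r, d, _ = env.step(action)
            if d:
                o = env.reset()  # auto-reset so the batch never stalls
            obs.append(o)
            rewards.append(r)
            dones.append(d)
        return obs, rewards, dones


vec_env = TinyVecEnv([lambda: CountingEnv(horizon=2),
                      lambda: CountingEnv(horizon=3)])
first_obs = vec_env.reset()
step_1 = vec_env.step([1, 0])
step_2 = vec_env.step([1, 0])
```

The training loop sees one batched call per step instead of one call per environment, which is the interface SB3 algorithms expect.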

In [1]:
# Imports
# For gym functionality
from gym import Env
import gym
from retro import RetroEnv 
from gym.spaces import MultiDiscrete, Box, Discrete, MultiBinary
import retro
import retro.data
import numpy as np
# Import opencv for grayscaling
import cv2
# Import matplotlib for plotting the image
from matplotlib import pyplot as plt

# To use the stable baselines 3 objects and methods
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import VecFrameStack, DummyVecEnv, VecEnv
from stable_baselines3.common.monitor import Monitor
from stable_baselines3.common.utils import set_random_seed
from stable_baselines3.common import env_checker

import os

## Using Wrappers

The SB3 and Retro libraries offer programming objects called "wrappers", which modify the behavior of "env" objects. They make our work easier and give us freedom when building our projects. For that reason we are going to use some of the "wrappers" these libraries provide. In addition, we are going to replace the "MtpoNes()" class with a __"RetroMtpoNes()"__ class, which lets us use a "wrapper" that helps reduce our agent's action space.

The RetroMtpoNes() class is the one we will ultimately use throughout the project.

In [7]:
class RetroMtpoNesReducedRL(Env):
    """
    Class that creates a retro "Gym" object and allows me to manipulate its observation space.
    The goal is to reduce the observation space in order to speed up the training stage.

    This class creates a "focus area" by removing the outermost two thirds of the screen (vertically),
    leaving in "focus" the area where the action of the game takes place. Additionally, I reduce the number
    of color channels from three to one, which makes the game look black and white (also
    called "grayscaling").

    In this class, additionally, the "viewing" area is reduced, going from an observation space of 196x80x1
    to one of 84x84x1.

    The main inspiration for this class comes from a YouTube tutorial by Nicholas Renotte.
     
     https://www.youtube.com/watch?v=rzbFhu6So5U&t=6248s
     
    """
    def __init__(self, state='GlassJoe.state',
                 scenario='scenario_king_hippo',
                 inttype=retro.data.Integrations.STABLE,
                 points_as_rewards=True):
        super().__init__()
        # Most of these lines come from the Gym Retro library.
        self.img = None
        rom_path = retro.data.get_romfile_path('Mtpo-Nes', inttype)
        self.system = retro.get_romfile_system(rom_path)
        core = retro.get_system_info(self.system)
        self.buttons = core['buttons']
        self.observation_space = Box(low=0, high=255, shape=(84,84,1), dtype=np.uint8)
        self.action_space = MultiBinary(9)
        self.state = state
        self.scenario = scenario
        self.game = retro.make(game='Mtpo-Nes',
                               state=self.state,
                               scenario=self.scenario,
                               inttype=inttype,
                              )
        self.points_as_rewards = points_as_rewards
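        self.picture = None
        # Initialized here so the getters work before the first reset()/step()
        self.score = 0
        self.in_game_reward = None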
        

    def preprocess(self, observation):
        """ 
        Method to preprocess the images that the "RetroEnv" object produces during training.
        The idea is to deliver a reduced observation, which helps streamline the agent's
        training process. The derivation of the reduced observation can be seen in the notebook:

        - '1_CV_Preprocessing.ipynb'

        which is part of this 'Notebooks' section.
        """
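        # Cropping. Note: shape[0] is the image height (rows) and
        # shape[1] is the image width (columns).
        xlen = observation.shape[0]
        ylen = observation.shape[1]
        # The upper row bound int(xlen*(3/2)) exceeds the frame height, so the
        # slice is clipped and runs to the bottom of the frame. The column
        # slice keeps the middle third of the width.
        focus_zone = observation[int(xlen*(1/8)):int(xlen*(3/2)),int(ylen/3):-int(ylen/3)]
        # Grayscale
        gray = cv2.cvtColor(focus_zone, cv2.COLOR_BGR2GRAY)
        resize = cv2.resize(gray, (84,84), interpolation=cv2.INTER_CUBIC)

        # We must reshape the output into a tensor with three dimensions, since
        # that is the data structure the gym object expects: shape (84, 84, 1),
        # with uint8 values between 0 and 255.
        channels = np.reshape(resize, (84,84,1))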

        return channels

    def reset(self):
        # Returns the first "frame"
        obs = self.game.reset()
        processed_obs = self.preprocess(obs)
        self.score = 0
        self.picture = processed_obs
        return processed_obs
    
    def step(self, action):
        # Advance the game emulation by one step and apply the
        # "preprocess()" method to the resulting observation
        obs, reward, done, info = self.game.step(action)
        processed_obs = self.preprocess(obs)
        # Keep the reward defined by the scenario so that
        # get_in_game_reward() can return it
        self.in_game_reward = reward

        # Optionally return the points scored in the game as the reward.
        if self.points_as_rewards:
            reward_as_points = info['POINTS'] - self.score
            self.score = info['POINTS']
            return processed_obs, reward_as_points, done, info
        else:
            return processed_obs, reward, done, info
    
    # The remaining methods are rarely used, but they can come in
    # handy in some cases
    def render(self, *args, **kwargs):
        return self.game.render(*args, **kwargs)
        
    def close(self):
        self.game.close()

    def get_image(self):
        return self.picture
    
    def get_buttons(self):
        return self.buttons
    
    def get_action_meaning(self, act):
        return self.game.get_action_meaning(act)
    
    def get_in_game_score(self):
        return self.score

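    def get_in_game_reward(self):
        # Falls back to None if the attribute has not been set,
        # instead of raising AttributeError
        return getattr(self, 'in_game_reward', None)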
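
The `points_as_rewards` branch of `step()` turns the cumulative in-game score into a per-step reward by differencing consecutive readings. Below is a small sketch of that bookkeeping, using a made-up sequence of `info` dictionaries. The score values and the `rewards_from_points` helper are invented for illustration; only the `'POINTS'` key comes from the class above.

```python
def rewards_from_points(infos):
    """Convert cumulative 'POINTS' readings into per-step rewards."""
    score = 0  # mirrors self.score = 0 in reset()
    rewards = []
    for info in infos:
        rewards.append(info['POINTS'] - score)  # points gained on this step
        score = info['POINTS']
    return rewards


# Hypothetical cumulative scores reported over five emulator steps.
infos = [{'POINTS': 0}, {'POINTS': 10}, {'POINTS': 10},
         {'POINTS': 60}, {'POINTS': 60}]
per_step = rewards_from_points(infos)
print(per_step)
```

Because the per-step rewards sum to the final score, maximizing the episode return is equivalent to maximizing the points scored in the fight.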

## Action space discretization wrapper

As mentioned, we want to reduce the action space of the RL agent, which in turn reduces the data processing time during training. To do so, we are going to apply a "wrapper" to our "RetroMtpoNesReducedRL()" object.

This "wrapper" is based on an example found in the retro-baselines repository:

https://github.com/openai/retro-baselines/blob/master/agents/sonic_util.py

I take it and adapt it to reduce the action space of __"Punch-Out"__.

In [8]:
class Discretizer(gym.ActionWrapper):
    """
    Wraps an "Env" object and turns it into an environment with discrete actions.
     args:
         combos: ordered list of lists of valid button combinations.
    """

    def __init__(self, env, combos):
        super().__init__(env)
        assert isinstance(env.action_space, gym.spaces.MultiBinary)
        buttons = env.unwrapped.buttons
        self._decode_discrete_action = []
        for combo in combos:
            arr = np.array([False] * env.action_space.n)
            for button in combo:
                arr[buttons.index(button)] = True
            self._decode_discrete_action.append(arr)

        self.action_space = gym.spaces.Discrete(len(self._decode_discrete_action))

    def action(self, act):
        return self._decode_discrete_action[act].copy()


class MtpoDiscretizer(Discretizer):
    """
    Uses discrete actions specific to the Punch-Out game
    """

    def __init__(self, env):
        # Actions that include using the star (super power) during the fight
        USE_STAR = [
            [], # Motionless
            ['RIGHT'], # Dodge right
            ['LEFT'], # Dodge left
            ['DOWN'], # Cover
            ['UP', 'A'], # Hit the face with a right hand
            ['UP', 'B'], # Hit the face with a left hand
            ['A'], # Punch to the body with a right hand
            ['B'], # Punch to the body with a left hand
            ['START'], # Use super power
        ]

        # Actions that do not use the star (super power) during the fight
        NO_STAR = [
            [], # Motionless
            ['RIGHT'], # Dodge right
            ['LEFT'], # Dodge left
            ['DOWN'], # Cover
            ['UP', 'A'], # Hit the face with a right hand
            ['UP', 'B'], # Hit the face with a left hand
            ['A'], # Punch to the body with a right hand
            ['B'], # Punch to the body with a left hand
        ]

        # Set intended to only dodge blows, without covering or using the star.
        # Note: as written it still contains 'DOWN' (cover) and 'START' (star),
        # so it is identical to USE_STAR and yields nine actions.
        DODGE = [
            [], # Motionless
            ['RIGHT'], # Dodge right
            ['LEFT'], # Dodge left
            ['DOWN'], # Cover
            ['UP', 'A'], # Hit the face with a right hand
            ['UP', 'B'], # Hit the face with a left hand
            ['A'], # Punch to the body with a right hand
            ['B'], # Punch to the body with a left hand
            ['START'], # Use super power
        ]
        super().__init__(env=env, combos=DODGE)
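
To see what the `Discretizer` builds internally, here is a dependency-free sketch of the same lookup table using plain lists. The `BUTTONS` order below is an assumption about the NES core's button layout; in the notebook the real order is read from `env.unwrapped.buttons`. The `build_table` helper is a hypothetical name used only for this illustration.

```python
# Assumed button order for the NES core (None marks an unused slot).
BUTTONS = ['B', None, 'SELECT', 'START', 'UP', 'DOWN', 'LEFT', 'RIGHT', 'A']

COMBOS = [
    [],           # 0: motionless
    ['RIGHT'],    # 1: dodge right
    ['LEFT'],     # 2: dodge left
    ['DOWN'],     # 3: cover
    ['UP', 'A'],  # 4: right hand to the face
    ['UP', 'B'],  # 5: left hand to the face
    ['A'],        # 6: right hand to the body
    ['B'],        # 7: left hand to the body
    ['START'],    # 8: use the star
]


def build_table(buttons, combos):
    """Map each discrete action index to a boolean mask over the buttons."""
    table = []
    for combo in combos:
        mask = [False] * len(buttons)
        for button in combo:
            mask[buttons.index(button)] = True
        table.append(mask)
    return table


table = build_table(BUTTONS, COMBOS)
# Decode discrete action 4 back into the buttons it presses.
pressed = [b for b, on in zip(BUTTONS, table[4]) if on]
print(pressed)
```

The agent only ever chooses an index from 0 to 8; the wrapper translates that index into the nine-element button mask the emulator expects.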

Now we initialize the "env" object:

In [11]:
env = RetroMtpoNesReducedRL()
env = MtpoDiscretizer(env)

To test that the "wrapper" that discretizes the action space works, we are going to use a function provided by the SB3 library, called __"env_checker.check_env"__.

In [12]:
env_checker.check_env(env)

If no errors are raised, the "RetroMtpoNesReducedRL()" class and the "wrapper" conform to the Gym interface that SB3 expects.
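
As a rough illustration of the kind of contract such a checker verifies, the sketch below defines a much smaller, hypothetical `check_step_contract` helper and runs it on a toy `StubEnv`. This is not SB3's implementation, which also validates observation and action spaces, dtypes, and more; it only shows the basic idea for the old-style Gym API, where `step()` returns four values.

```python
class StubEnv:
    """Hypothetical stand-in environment with nine discrete actions."""

    n_actions = 9

    def reset(self):
        return [0, 0]

    def step(self, action):
        done = action == self.n_actions - 1
        return [action, action], 1.0, done, {}


def check_step_contract(env):
    """Raise AssertionError if the env breaks the basic Gym step contract."""
    obs = env.reset()
    assert obs is not None, "reset() must return an observation"
    for action in range(env.n_actions):
        result = env.step(action)
        assert isinstance(result, tuple) and len(result) == 4, \
            "step() must return (obs, reward, done, info)"
        obs, reward, done, info = result
        assert isinstance(reward, (int, float)), "reward must be a number"
        assert isinstance(done, bool), "done must be a bool"
        assert isinstance(info, dict), "info must be a dict"
    return True


ok = check_step_contract(StubEnv())
```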

Now, we verify the new format of the agent's action space, and a sample:

In [13]:
obs = env.observation_space
acciones = env.action_space
print(acciones)
print(acciones.sample())

Discrete(9)
3


We see that the agent's __action space__ is now discrete and has only nine possible actions.