
# Creating a Custom Gym Environment for Jupyter Notebooks
## Part 1: Creating the framework

<center><img src="images/part1_cover_opt.gif"/></center>

___

This notebook accompanies the Towards Data Science [article](https://towardsdatascience.com/creating-a-custom-gym-environment-for-jupyter-notebooks-e17024474617) and is part of _[A Baby Robot's Guide To Reinforcement Learning](https://towardsdatascience.com/tagged/baby-robot-guide)_

___

***

> <b>Updated 3rd January 2023:</b>
> 
> Development of the OpenAI Gym library for Reinforcement Learning, which is the base framework originally described in this article, has stopped. It has now been replaced by _[Gymnasium](https://github.com/Farama-Foundation/Gymnasium)_, a new package managed by the _[Farama Foundation](https://farama.org/Announcing-The-Farama-Foundation)_. 
>
> In most cases this new framework remains the same as the original, but there have been a few subtle changes to the API. Consequently this article and its accompanying code samples have been updated to take account of these changes and to make use of this latest framework.
>
> Therefore, although the framework is still referred to as 'Gym', this actually means the new 'Gymnasium' version of the library.

***

# Overview

This article (split over two parts) describes the creation of a custom _[OpenAI Gym](https://gymnasium.farama.org/)_ environment for *Reinforcement Learning* (_RL_) problems. 

Quite a few tutorials already exist that show how to create a custom Gym environment (see the _References_ section for a few good links). All of these examples, and indeed the most common Gym environments, produce either a text-based output (e.g. _[Frozenlake](https://gymnasium.farama.org/environments/toy_text/frozen_lake/)_) or an image-based output (e.g. _[Lunar Lander](https://gymnasium.farama.org/environments/box2d/lunar_lander/)_) that appears in a separate graphical window.

Instead we'll create a custom environment that is specifically tailored to generate its output in a _Jupyter_ notebook. The graphical representation of the environment will be written directly into the notebook cell and updated in real time. Additionally, it can be used in any test framework, and with any RL algorithm, that also implements the _Gym_ interface.

By the end of the article we will have created a custom _Gym_ environment, that can be tailored to produce a range of different Grid Worlds for _Baby Robot_ to explore, and that renders an output similar to the cover image shown above.

<center><img src="./images/green_babyrobot_small.gif"/></center>

# Introduction

Up until now, in our _[series on Reinforcement Learning](https://towardsdatascience.com/tagged/baby-robot-guide)_ (RL), we've used bespoke environments to represent the locations where Baby Robot finds himself. Starting from a simple grid world we added components, such as walls and puddles, to increase the complexities of the challenges that _Baby Robot_ faced. 

Now that we know the basics of _RL_, and before we move onto more complex problems and algorithms, it seems like a good time to formalise _Baby Robot's_ environment. If we give this environment a fixed, defined, interface then we can re-use the same environment in all of our problems and with multiple _RL_ algorithms. This will make things a lot simpler as we move forwards to look at different _RL_ methods.

By adopting a common interface we can then drop this environment into any existing systems that also implement the same interface. All we need to do is decide what interface we should use. Luckily for us this has already been done, and it's called the <b>OpenAI Gym</b> interface.

# Introduction to OpenAI Gym

_[OpenAI Gym](https://gymnasium.farama.org/content/basic_usage/)_ is a set of _Reinforcement Learning (RL)_ environments, with problems ranging from simple grid worlds up to complex physics engines. 
Each of these environments implements the same interface, making it easy to test a single environment using a range of different _RL_ algorithms. Similarly, it makes it straightforward to evaluate a single _RL_ algorithm on a range of different environments. 

As a result, _OpenAI Gym_ has become the _de-facto_ standard for learning about and bench-marking _RL_ algorithms.

### The OpenAI Gym Interface

The interface for all _OpenAI Gym_ environments can be divided into 3 parts:

1. <b><i>Initialisation</i></b>: Create and initialise the environment.

2. <b><i>Execution</i></b>: Take repeated actions in the environment. At each step the environment provides information to describe its new state and the reward received as a consequence of taking the specified action. This continues until the environment signals that the episode is complete.

3. <b><i>Termination</i></b>: Cleanup and destroy the environment.

### Example: The CartPole Environment

One of the simpler problems in Gym is the _[CartPole](https://gymnasium.farama.org/environments/classic_control/cart_pole/)_ environment. In this problem the goal is to move a cart left or right so that the pole, that's balanced on the cart, remains upright. 

<br/>
<center><img src="images/small_cartpole.gif"/></center>
<center><i>Figure 1: Output of the CartPole Environment - the aim is to balance the pole by moving the cart left or right.</i></center><br/>


The code to set up and run this Gym environment is shown below. Here we're just choosing left or right actions randomly, so the pole isn't going to stay up for very long!

In [1]:
import gymnasium as gym

In [2]:
###########################################
#         Stage 1 - Initialization
###########################################

# create the cartpole environment
env = gym.make('CartPole-v1', render_mode="human")

# run for 10 episodes
for episode in range(10):

  # put the environment into its start state
  env.reset()

###########################################
#            Stage 2 - Execution
###########################################

  # run until the episode terminates or is truncated
  terminated = truncated = False
  while not (terminated or truncated):

    # show the environment
    env.render()

    # choose a random action
    action = env.action_space.sample()

    # take the action and get the information from the environment
    observation, reward, terminated, truncated, info = env.step(action)


###########################################
#           Stage 3 - Termination
###########################################

# terminate the environment
env.close()

In the code given above, we've labelled the 3 stages of a Gym environment. In more detail, each of these stages does the following:

### 1. Initialisation

```Python
env = gym.make('CartPole-v1', render_mode="human")
```

* Create the required environment, in this case version 1 of CartPole. The returned environment object 'env' can then be used to call the functions in the common Gym environment interface.
* The 'render_mode' parameter defines how the environment should appear when the 'render' function is called. In this case 'human' has been used to continuously render the environment into the display window.
<br/>

```Python
obs, info = env.reset()
```

* Called at the start of each episode, this puts the environment into its starting state and returns the initial observation of the environment, along with a dictionary of additional information.
<br/><br/><br/>


### 2. Execution

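Here we keep stepping until the environment signals that the episode is complete. The 'terminated' flag is set when the agent reaches a terminal state (for CartPole, when the pole tips too far or the cart leaves the track); an episode that is cut short after a fixed number of steps is instead reported through the separate 'truncated' flag.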

```Python
env.render()
```

* Draw the current state of the environment. In the case of CartPole this will result in a new window being opened to display a graphical view of the cart and its pole. In simpler environments, such as the _[FrozenLake](https://gymnasium.farama.org/environments/toy_text/frozen_lake/)_ simple grid world, a textual representation is shown.
<br/>


```Python
action = env.action_space.sample()
```

* Choose a random action from the environment's set of possible actions.
<br/>

```Python
observation, reward, terminated, truncated, info = env.step(action)
```

* Take the action and get back information from the environment about the outcome of this action. This includes 5 pieces of information:
<br/><br/><br/>
**'observation'**: Defines the new state of the environment. In the case of CartPole this is the position and velocity of the cart, together with the angle and angular velocity of the pole. In a grid-world environment it would be information about the next state, where we end up after taking the action.
<br/><br/>
**'reward'**: The amount of reward, if any, received as a result of taking the action.
<br/><br/>
**'terminated'**: A flag to indicate that the episode has ended because a terminal state has been reached.
<br/><br/>
**'truncated'**: A flag to indicate that the episode has been stopped before completion, for example because a step limit has been reached.
<br/><br/>
**'info'**: Any additional information. In general this isn't set.
<br/><br/><br/>

> <b>Note:</b><br/>
> In earlier versions of the Gym environment the '_terminated_' and '_truncated_' flags were represented by a single '_done_' flag.<br/>
> This has now been split into two, making it clearer as to why the episode finished (for more information see the API documentation for the '[step](https://gymnasium.farama.org/api/env/#gymnasium.Env.step)' function).

<br/>
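
The difference between the two flags can be seen in a small, library-free sketch. The `CountdownEnv` class below is a hypothetical stand-in (it is not part of Gymnasium) that only mimics the five-value return of '_step_'; its step limit plays the role that a time limit plays in a real environment.

```python
class CountdownEnv:
    """Hypothetical stand-in that mimics the 5-value return of a Gym-style step()."""

    def __init__(self, goal=3, max_steps=5):
        self.goal = goal            # position that ends the episode naturally
        self.max_steps = max_steps  # step limit that cuts the episode short

    def reset(self):
        self.position = 0
        self.steps = 0
        return self.position, {}

    def step(self, action):
        self.position += action
        self.steps += 1
        terminated = self.position >= self.goal                      # reached a terminal state
        truncated = not terminated and self.steps >= self.max_steps  # ran out of steps
        reward = 0 if terminated else -1
        return self.position, reward, terminated, truncated, {}


def run_episode(env, policy):
    """Run one episode, stopping on either flag, and report how it ended."""
    obs, info = env.reset()
    terminated = truncated = False
    while not (terminated or truncated):
        obs, reward, terminated, truncated, info = env.step(policy(obs))
    return "terminated" if terminated else "truncated"


print(run_episode(CountdownEnv(), lambda obs: 1))  # always move forward: terminated
print(run_episode(CountdownEnv(), lambda obs: 0))  # never move: truncated
```

Checking both flags in the loop condition means the loop also ends when an episode is cut short, rather than only when a terminal state is reached.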


### 3. Termination

```Python
env.close()
```

* Terminate the environment. This will also close any graphical window that may have been created by the render function.

***

# Creating a Custom Gym Environment

As described previously, the major advantage of using <i>OpenAI Gym</i> is that every environment uses exactly the same interface. We can just replace the environment name string '<i>CartPole-v1</i>' in the '<i>gym.make</i>' line above with the name of any other environment and the rest of the code can stay exactly the same. 

This is also true for any custom environment that implements the Gym interface. All that's required is a class that inherits from the Gym environment base class and implements the set of functions described above.

This is shown below for the initial framework of the custom 'BabyRobotEnv' that we're going to create (the <i>'_v0'</i> appended to the class name indicates that this is version zero of our environment. We'll update this as we add functionality):

In [3]:
class BabyRobotEnv_v0(gym.Env):
    
    def __init__(self):
        super().__init__()
        pass

    def step(self, action):        
        state = 1    
        reward = -1            
        terminated = True
        truncated = False
        info = {}
        return state, reward, terminated, truncated, info

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        state = 0
        info = {}
        return state,info
  
    def render(self):
        pass

In this basic framework for our custom environment we've inherited our class from the base 'gym.Env' class, which gives us all of the main functionality required to create the environment. To this we've then added the 4 functions that are required to turn the class into our own, custom, environment:
<br/><br/>

* '<i>\_\_init\_\_</i>': the class initialisation, where we can set up anything required by the class.<br/>
<br/>
* '_step_': implements what happens when Baby Robot takes a step in the environment and returns information describing the results of taking that step.<br/>
<br/>
* '_reset_': called at the start of every episode to put the environment back into its initial state.<br/>
<br/>
* '_render_': provides a graphical or text based representation of the environment, to allow the user to see how things are progressing.<br/>
<br/>

We haven't implemented a '_close_' function, since there's currently nothing to close, so we can just rely on the base class to do any required clean up. Additionally, we haven't yet added any functionality. Our class satisfies the requirements of the Gym interface, and could be used within a Gym test harness, but it currently won't do much!

***

# Action and Observation Spaces

The code above defines the framework for a custom environment, however it can't yet be run since it currently has no <i><b>'action_space'</b></i> from which to sample random actions. The <i><b>'action_space'</b></i> defines the set of actions that an agent may take in the environment. These can be discrete, continuous or a combination of both.

* <i><b>Discrete actions</b></i> represent a mutually-exclusive set of possible actions, such as the left and right actions in the _CartPole_ environment. At any time-step you can either choose left or right but not both.

* <i><b>Continuous actions</b></i> are actions that have an associated value, which represents the amount of that action to take. For example, when turning a steering wheel an angle could be specified to represent by how much the wheel should be turned.

The _Baby Robot_ environment that we're creating is what's referred to as a _Grid World_. In other words, it's a grid of squares where _Baby Robot_ may move around, from square to square, to explore and navigate the environment. The default level in this environment will be a 3 x 3 grid, with a starting point at the top left-hand corner, and an exit at the bottom right-hand corner, as shown in Figure 2:


<br/>
<center><img src="./images/default_grid.png" style="background-color: white;"/></center>
<center><i>Figure 2: The default level in the Baby Robot environment.</i></center><br/>


Therefore, for the custom _BabyRobotEnv_ that we're creating, there are only 4 possible movement actions: North, South, East or West. Additionally, we'll add a _'Stay'_ action, where Baby Robot remains in the current position. So, in total we have 5 mutually-exclusive actions and we therefore set the action space to define 5 discrete values:
<br><br>

``` Python
self.action_space = gym.spaces.Discrete(5)
```

<br>
In addition to an <i>action_space</i>,  all environments need to specify an <i><b>observation_space</b></i>. This defines the information supplied to the agent when it receives an observation about the environment.

When Baby Robot takes a step in the environment we want to return his new position. Therefore we'll define an observation space that specifies a grid position as an '_x_' and '_y_' coordinate.

The Gym interface defines a couple of different _['spaces'](https://gymnasium.farama.org/api/spaces/#spaces)_ that could be used to specify our coordinates. For example, if our coordinates were continuous floating point values we could use the _[Box space](https://gymnasium.farama.org/api/spaces/fundamental/#box)_. This would also let us set a limit on the possible range of values that can be used for the 'x' and 'y' coordinates. Additionally, we could then combine these to form a single expression of the environment's observation space using _[Gym's Dict space](https://gymnasium.farama.org/api/spaces/composite/#dict)_.


However, since we're only going to allow whole moves from one square to the next (as opposed to being half-way between squares), we will specify the grid-coordinate in integers. Therefore, as with the action space, we'll be using a discrete set of values. But now, instead of there only being a single discrete value, we have two: one for each of the '_x_' and '_y_' coordinates. Luckily for us, the Gym interface has just the thing, the _[MultiDiscrete space](https://gymnasium.farama.org/api/spaces/fundamental/#multidiscrete)_.

In the horizontal direction the maximum '_x_' position is bounded by the width of the grid and in the vertical '_y_' direction by the height of the grid. Therefore, the observation space can be defined as follows:
<br><br>

``` Python

self.observation_space = MultiDiscrete([ self.width, self.height ])

```

Discrete spaces are zero based, so each coordinate takes values from zero up to one less than the number of values specified for it (for a width of 3 the valid '_x_' values are 0, 1 and 2).
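
To make the zero-based numbering concrete, the following library-free sketch lists the coordinates allowed when each dimension has 3 possible values. The helper `valid_observations` is hypothetical and written only for illustration; it mirrors the idea behind a _MultiDiscrete_ space rather than its implementation.

```python
from itertools import product

def valid_observations(nvec):
    """List every coordinate where dimension i takes a value in 0 .. nvec[i] - 1."""
    return list(product(*(range(n) for n in nvec)))

width, height = 3, 3
coords = valid_observations([width, height])

print(len(coords))             # 9 cells in a 3 x 3 grid
print(coords[0], coords[-1])   # (0, 0) (2, 2)
print((3, 0) in coords)        # False: x = 3 is outside a grid of width 3
```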

<br>
With these changes the new version of the _BabyRobotEnv_ class is as shown below:



In [4]:
import numpy as np
from gymnasium.spaces import Discrete,MultiDiscrete


class BabyRobotEnv_v1(gym.Env):

    def __init__(self, **kwargs):
        super().__init__()

        # dimensions of the grid
        self.width = kwargs.get('width',3)
        self.height = kwargs.get('height',3)

        # define the maximum x and y values
        self.max_x = self.width - 1
        self.max_y = self.height - 1

        # there are 5 possible actions: move N,E,S,W or stay in same state
        self.action_space = Discrete(5)

        # the observation will be the coordinates of Baby Robot
        self.observation_space = MultiDiscrete([self.width, self.height])

        # Baby Robot's position in the grid
        self.x = 0
        self.y = 0

    def step(self, action):
        obs = np.array([self.x,self.y])
        reward = -1
        terminated = True
        truncated = False
        info = {}
        return obs, reward, terminated, truncated, info

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        # reset Baby Robot's position in the grid
        self.x = 0
        self.y = 0
        info = {}
        return np.array([self.x,self.y]),info

    def render(self):
        pass

There are a couple of points to note about the new version of the _BabyRobotEnv_ class:

* We're supplying a <b>_kwargs_</b> argument to the <b>\_\_init\_\_</b> function, letting us create our instance with a dictionary of parameters. Here we're just going to supply the width and height of the grid we want to make, but going forward we can use this to pass other parameters and by using <b>_kwargs_</b> we can avoid changing the interface of the class.

* When we take the width and height from the <b>_kwargs_</b>, in both cases we default to values of 3 if the parameter hasn't been supplied. So we'd end up with a grid of size 3x3 if no arguments are supplied during the creation of the environment.

* We've now defined Baby Robot's position in the grid using '_self.x_' and '_self.y_', which we now return as the observation from the '_reset_' and '_step_' functions. In both cases we've converted these values into numpy arrays, which although not required to match the Gym interface, is required for the <i>Gymnasium</i> environment checker, which will be introduced in the next section.
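
The keyword-argument pattern described above can be tried in isolation. `GridSettings` is a hypothetical class used only for this sketch; it reads its parameters in the same way as '_BabyRobotEnv_v1_' but has no dependency on Gym.

```python
class GridSettings:
    """Hypothetical helper that reads optional parameters from keyword arguments."""

    def __init__(self, **kwargs):
        # fall back to a 3 x 3 grid when a dimension isn't supplied
        self.width = kwargs.get('width', 3)
        self.height = kwargs.get('height', 3)

        # the largest valid coordinate in each direction
        self.max_x = self.width - 1
        self.max_y = self.height - 1


default_grid = GridSettings()
wide_grid = GridSettings(width=5)

print(default_grid.width, default_grid.height)  # 3 3
print(wide_grid.width, wide_grid.height)        # 5 3
print(wide_grid.max_x, wide_grid.max_y)         # 4 2
```

New options can later be read with further `kwargs.get` calls without changing the signature of the initialiser. One trade-off of this approach is that a misspelled keyword is silently ignored and the default is used instead.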

***

# Testing a Custom Environment

Before we start adding any real functionality to our custom environment it's worth confirming that our new environment conforms to the Gym interface. To test this we can validate our class using the _[Gymnasium Environment Checker](https://gymnasium.farama.org/api/utils/#environment-checking)_.

This will test that we have implemented the functions required to conform to the _Gym_ interface. It also checks that the action and observation spaces are set up correctly and that the function responses match the associated observation space.

To run the check it's simply a case of creating an instance of the environment and supplying this to the '_check_env_' function. If there's anything wrong then warning messages will be shown. If there's no output then it's all good.


In [5]:
# create an instance of our custom environment
env = BabyRobotEnv_v1()

# use the Gymnasium 'check_env' function to check the environment
# - returns nothing if the environment is verified as ok
from gymnasium.utils.env_checker import check_env
check_env(env)

We can also take a look at the environment's action and observation spaces, to make sure they're returning the expected values:

In [6]:
print(f"Action Space: {env.action_space}")
print(f"Action Space Sample: {env.action_space.sample()}")

Action Space: Discrete(5)
Action Space Sample: 4


* the action space, as expected, is a Discrete space with 5 possible values.

* the value sampled from the action space will be a random value between 0 and 4.

Similarly, for the observation space:

In [7]:
print(f"Observation Space: {env.observation_space}")
print(f"Observation Space Sample: {env.observation_space.sample()}")

Observation Space: MultiDiscrete([3 3])
Observation Space Sample: [2 1]


* the observation space has a MultiDiscrete type and its two components each have 3 possible values (since we created a default 3x3 grid).

* when sampling from the observation space for this grid, both 'x' and 'y' can take the values 0, 1 or 2.

***

# Creating the Environment

You may have noticed that in the test above, rather than creating the environment using '_gym.make_', as we did for _CartPole_, we instead simply created an instance of it by doing:

```Python
env = BabyRobotEnv_v1()
```

This is absolutely fine when working with the environment ourselves, but if we want to have our custom environment registered as a proper _Gym_ environment, that can be created using '_gym.make_', then there are a couple of further steps we need to take.

Firstly, from the _[Gym Documentation](https://gymnasium.farama.org/tutorials/environment_creation/)_, we need to set up our files and directories with a structure similar to that shown below:


<br/>
<center><img src="./images/gym_directory_structure.png"/></center>
<center><i>Figure 3: Directory structure for a custom Gym environment.</i></center><br/>


So we need 3 directories:
<br><br>

__1__\.&nbsp; The main directory (in this case '_BabyRobotGym_') to hold our '_setup.py_' file. This file defines the name of the project directory and references the required resources, which in this case is just the '_Gym_' library. The contents of this file are as shown below:
<br><br>


```Python
from setuptools import setup

setup(name='baby_robot_gym',
      version='0.0.1',
      install_requires=['gymnasium']  
)
```
<br><br>

__2__\.&nbsp; The project directory, which has the same name as the setup file's '_name_' parameter. So in this case the directory is called '<i>baby_robot_gym</i>'. This contains a single file <i>'\_\_init\_\_.py'</i> which defines the available versions of the environment:
<br><br>


```Python
from gymnasium.envs.registration import register

register(
    id='BabyRobotEnv-v0',
    entry_point='baby_robot_gym.envs:BabyRobotEnv_v0',
)

register(
    id='BabyRobotEnv-v1',
    entry_point='baby_robot_gym.envs:BabyRobotEnv_v1',
)
```
<br><br>

__3__\.&nbsp; The '_envs_' directory where the main functionality lives. In our case this contains the two versions of the Baby Robot environment that we've defined above ('<i>baby_robot_env_v0.py</i>' and '<i>baby_robot_env_v1.py</i>'). These define the two classes that are referenced in the '<i>babyrobot/\_\_init\_\_.py</i>' file.

Additionally this directory contains its own '<i>\_\_init\_\_.py</i>' file that references both of the files contained in the directory:
<br><br>

```Python
from .baby_robot_env_v0 import BabyRobotEnv_v0
from .baby_robot_env_v1 import BabyRobotEnv_v1
```

We've now defined a Python package that can be uploaded to a repository, such as _[PyPI](https://pypi.org/)_, to allow easy sharing of your new creation. Additionally, with this structure in place, we're now able to import our new environment and create it using the _'gym.make'_ method, as we did previously for _CartPole_:

In [8]:
import babyrobot

# create an instance of our custom environment
env = gym.make('BabyRobotEnv-v1')

Note that the name used to specify the environment is the one that was used to register it, not the class name. So, in this case, although the class is called <i>'BabyRobotEnv_v1'</i>, the registered name is actually _'BabyRobotEnv-v1'_.
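
The link between a registered name and a class is the '_entry_point_' string, which has the form `'module.path:ClassName'`. As a rough illustration of how such a string can be turned into a class, the sketch below resolves one with `importlib`. The `resolve_entry_point` helper and the toy `registry` dictionary are hypothetical and are not Gymnasium's own code; a standard-library class is used as the target so that the example runs without the Baby Robot package installed.

```python
import importlib

def resolve_entry_point(entry_point):
    """Split 'module.path:ClassName' and return the named attribute of the module."""
    module_name, attribute_name = entry_point.split(':')
    module = importlib.import_module(module_name)
    return getattr(module, attribute_name)

# a toy registry mapping an id to an entry point (illustrative only)
registry = {'OrderedDict-v0': 'collections:OrderedDict'}

cls = resolve_entry_point(registry['OrderedDict-v0'])
print(cls.__name__)   # OrderedDict
```

The id is only a key used for the lookup, which is why it can differ from the class name that the entry point eventually resolves to.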


***

# Cloning the GitHub repository

To make it easy to examine the directory structure described above, it can be recreated by cloning the _GitHub_ repository. The steps to do this are as follows:
<br><br>

1\.&nbsp; <b><i>Get the code and move to the newly created directory</i></b>:

`git clone https://github.com/WhatIThinkAbout/BabyRobotGym.git` <br>
`cd BabyRobotGym`

* this directory contains the files and folder structure that we've defined above (plus a few extra ones that we'll look at in part 2).

<br><br>
2\.&nbsp; <b><i>Create a Conda environment and install the required packages</i></b>:<br>

To be able to run our environment we need to have a few other packages installed, most notably 'Gym' itself. To make it easy to set up the environment the GitHub repo contains a couple of '.yml' files that list the required packages. 
To use these to create a Conda environment and install the packages, do the following (choose the one appropriate for your operating system):

On Unix:

`conda env create -f environment_unix.yml`<br>


On Windows: 

`conda env create -f environment_windows.yml`<br>


<br><br>
3\.&nbsp; <b><i>Activate the environment</i></b>:

We've created the environment with all our required packages, so now it's just a case of activating it, as follows:

`conda activate BabyRobotGym`<br>

(when you're finished playing with this environment run "conda deactivate" to get back out)


<br><br>
4\.&nbsp; <b><i>Run the notebook</i></b>

Everything should now be in place to run our custom Gym environment. To test this we can run the sample Jupyter Notebook <i>'baby_robot_gym_test.ipynb'</i> that's included in the repository. This will load the _'BabyRobotEnv-v1'_ environment and test it using the Stable Baselines environment checker. 

To start this in a browser, just type:

`jupyter notebook baby_robot_gym_test.ipynb`<br>

Or else just open this file in VS Code and make sure _'BabyRobotGym'_ is selected as the kernel. This should make the _'BabyRobotEnv-v1'_ environment, test it in Stable Baselines and then run the environment until it completes, which happens to occur in a single step, since we haven't yet written the 'step' function!

***

# Adding Actions

Although the current version of the custom environment satisfies the requirements of the Gym interface and has the required functions to pass the environment checker tests, it doesn't yet do anything. We want Baby Robot to be able to move around in his environment and for this we're going to need him to be able to take some actions. 

Since Baby Robot will be operating in a simple Grid World environment (see figure 2, above) the actions he can take will be limited to moving North, South, East or West. Additionally we want him to be able to stay in the same place, if this would be the optimal action. So in total we have 5 possible actions (as we've already seen in the action space).

This can be described using a Python integer enumeration:

In [9]:
from enum import IntEnum

class Actions(IntEnum):
    ''' simple helper class to enumerate actions in the grid levels '''
    Stay  = 0
    North = 1
    East  = 2
    South = 3
    West  = 4

    # get the enum name without the class
    def __str__(self): return self.name  

To simplify the code we can inherit from our previous '<i>BabyRobotEnv_v1</i>' class. This gives us all of the previous functionality and behaviour, which we can then extend to add the new parts that relate to actions. This is shown below:

In [16]:
class BabyRobotEnv_v2( BabyRobotEnv_v1 ):

  metadata = {'render_modes': ['human']}

  def __init__(self, **kwargs):
      super().__init__(**kwargs)

      # the start and end positions in the grid
      # - by default these are the top-left and bottom-right respectively
      self.start = kwargs.get('start',[0,0])
      self.end = kwargs.get('end',[self.max_x,self.max_y])

      # Baby Robot's initial position
      # - by default this is the grid start
      self.initial_pos = kwargs.get('initial_pos',self.start)

      # Baby Robot's position in the grid
      self.x = self.initial_pos[0]
      self.y = self.initial_pos[1]


  def take_action(self, action):
      ''' apply the supplied action '''

      # move in the direction of the specified action
      if   action == Actions.North: self.y -= 1
      elif action == Actions.South: self.y += 1
      elif action == Actions.West:  self.x -= 1
      elif action == Actions.East:  self.x += 1

      # make sure the move stays on the grid
      if self.x < 0: self.x = 0
      if self.y < 0: self.y = 0
      if self.x > self.max_x: self.x = self.max_x
      if self.y > self.max_y: self.y = self.max_y


  def step(self, action):

      # take the action and update the position
      self.take_action(action)
      obs = np.array([self.x,self.y])

      # set the 'terminated' flag if we've reached the exit
      terminated = (self.x == self.end[0]) and (self.y == self.end[1])
      truncated = False

      # get -1 reward for each step
      # - except at the terminal state which has zero reward
      reward = 0 if terminated else -1

      info = {}
      return obs, reward, terminated, truncated, info


  def render(self, action=0, reward=0):
      # show the action just taken, Baby Robot's position and the reward received
      print(f"{Actions(action): <5}: ({self.x},{self.y}) reward = {reward}")

The new functionality, that's been added to the class, does the following:<br>

* in the <i>'\_\_init\_\_'</i> function keyword arguments can be supplied that specify the start and end positions in the environment and Baby Robot's starting position (which by default is set to the grid's start position).

* the '<i>take_action</i>' function simply updates Baby Robot's current position by applying the supplied action and then checks that the new position is valid (to stop him going off the grid).

* the '_step_' function applies the current action and then gets the new observation and reward, which are then returned to the caller. By default a reward of -1 is returned for each move, unless Baby Robot has reached the end position, in which case the reward is set to zero and the '_terminated_' flag is set to true.

* the '_render_' function prints out the action taken, the current position and the reward.
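
The movement and bounds check can also be written as a lookup table followed by a clamp. This is an alternative sketch rather than the code used by the environment; the `MOVES` table and the `next_position` function are hypothetical names, and the `Actions` enumeration is repeated so that the block runs on its own.

```python
from enum import IntEnum

class Actions(IntEnum):
    Stay  = 0
    North = 1
    East  = 2
    South = 3
    West  = 4

# change in (x, y) for each action; y increases towards the bottom of the grid
MOVES = {
    Actions.Stay:  (0, 0),
    Actions.North: (0, -1),
    Actions.East:  (1, 0),
    Actions.South: (0, 1),
    Actions.West:  (-1, 0),
}

def next_position(x, y, action, max_x, max_y):
    """Apply the action, then clamp the result so it stays on the grid."""
    dx, dy = MOVES[Actions(action)]
    new_x = min(max(x + dx, 0), max_x)
    new_y = min(max(y + dy, 0), max_y)
    return new_x, new_y

print(next_position(0, 0, Actions.North, 2, 2))  # (0, 0): blocked by the top edge
print(next_position(0, 0, Actions.South, 2, 2))  # (0, 1)
print(next_position(2, 2, Actions.East, 2, 2))   # (2, 2): blocked by the right edge
```

Because the function returns a new position instead of modifying the environment, it can be tested without creating an environment at all.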


So, finally, we can now take actions and move around from one cell to the next. We can then use a modified version of the _CartPole_ code shown earlier (changing it to use our latest <i>BabyRobotEnv_v2</i> environment instead) to select random actions and move around the grid until Baby Robot reaches the cell that has been specified as the exit of the grid (which by default is cell (2,2)).

The test framework for our new environment is shown below:

In [18]:
# create the environment
env = gym.make('BabyRobotEnv-v2')

# initialize the environment
env.reset()
env.render()

terminated = False
while not terminated:  

  # choose a random action
  action = env.action_space.sample()   

  # take the action and get the information from the environment
  new_state, reward, terminated, truncated, info = env.step(action)
  
  # show the current position and reward
  env.render(action=action, reward=reward)  

Stay : (0,0) reward = 0
North: (0,0) reward = -1
Stay : (0,0) reward = -1
West : (0,0) reward = -1
West : (0,0) reward = -1
South: (0,1) reward = -1
West : (0,1) reward = -1
West : (0,1) reward = -1
Stay : (0,1) reward = -1
Stay : (0,1) reward = -1
South: (0,2) reward = -1
West : (0,2) reward = -1
South: (0,2) reward = -1
West : (0,2) reward = -1
North: (0,1) reward = -1
Stay : (0,1) reward = -1
Stay : (0,1) reward = -1
West : (0,1) reward = -1
North: (0,0) reward = -1
South: (0,1) reward = -1
South: (0,2) reward = -1
North: (0,1) reward = -1
North: (0,0) reward = -1
Stay : (0,0) reward = -1
South: (0,1) reward = -1
South: (0,2) reward = -1
East : (1,2) reward = -1
East : (2,2) reward = 0


The path through the grid moves from the start square (0,0) to the exit (2,2). Since actions are chosen at random the path can be of any length. Note also that each step receives a reward of -1, until the exit is reached. So the longer it takes Baby Robot to reach the exit, the more negative the return value.
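
The relationship between path length and return can be checked with a few lines of arithmetic. This sketch assumes the reward scheme used above (-1 per step and 0 on the step that reaches the exit) and an undiscounted return; `episode_return` is a hypothetical helper.

```python
def episode_return(num_steps):
    """Undiscounted return for an episode that reaches the exit on its final step."""
    rewards = [-1] * (num_steps - 1) + [0]   # -1 for every step except the last
    return sum(rewards)

# the random walk shown above took 27 steps to reach the exit
print(episode_return(27))   # -26

# the shortest route from (0,0) to (2,2) needs 4 moves (2 east and 2 south)
print(episode_return(4))    # -3
```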

***

# Creating the Render Function

Technically we've already created the render function, it's just that it's not very exciting! As can be seen from the output above, all we're getting are simple text messages that describe the action, position and reward. What we really want is a graphical representation of the environment, showing Baby Robot moving around the grid world.

As described above, the collection of environments in the Gym library perform their rendering, to show the current state of the environment, either by generating a text based representation or by creating an array containing an image.

Text based representations provide a quick way to render the environment in terminal based applications. They're ideal when you only need a simple overview of the current state.

Images on the other hand give a very detailed picture of the current state and are perfect for creating videos of an episode, to display after the episode has completed.

While both of these representations are useful, neither is particularly suited to creating real-time, detailed, views of the environment's state when working in _Jupyter Notebooks_. When Baby Robot moves around a grid level we want to actually see him moving, rather than just getting a text message describing his position, or watching a simple text drawing, with an '_X_' moving over a grid of dots. 

Additionally we want to watch this happening as the episode unfolds, rather than only being able to watch it back afterwards, or see it in a flickering display in real-time. In short, we want to render using a different method to text characters or image arrays. We can achieve this by drawing to an HTML5 Canvas, using the excellent _[ipycanvas](https://ipycanvas.readthedocs.io/en/latest/)_ library, and we'll cover this fully in Part 2.

***

# Summary

<b><i>OpenAI Gym</i></b> environments are the standard method for testing Reinforcement Learning algorithms. The base collection comes with a large set of varied and challenging problems. However, in many cases you may want to define your own, custom, environment. By implementing the structure and interface of the _Gym_ environment it's easy to create such an environment, that will slot seamlessly into any application that also uses the _Gym_ interface.

In summary, the main steps to create a custom Gym environment are as follows:<br>

* Create a class that inherits from the gym.Env base class.

* Implement the '_reset_', '_step_' and '_render_' functions (and possibly the '_close_' function if resources need to be tidied up).

* Define the <b><i>action space</i></b>, to specify the number and type of actions that the environment allows.

* Define the <b><i>observation space</i></b>, to describe the information that is supplied to the agent on each step and that sets the boundaries for movement within the environment.

* Organise the directory structure and add <i>'\_\_init\_\_.py'</i> and '_setup.py_' files to match the _Gym_ specification and to make the environment compatible with the _Gym_ framework.

Following these steps will give you a bare-bones framework, from which you can start adding your own custom features, to tailor the environment to your own specific problem.

In our case, we want to create a Grid World environment that Baby Robot can explore. Additionally, we want to be able to graphically view this environment and watch Baby Robot as he moves around it. In the next part we'll see how this can be achieved.

<br>

<center><img src="images/green_babyrobot_small.gif"/></center>

<br>

# References:


1. _[The Gymnasium Library](https://gymnasium.farama.org/)_

2. The complete series of Baby Robot's guide to Reinforcement Learning: _[Baby Robot Guide](https://towardsdatascience.com/tagged/baby-robot-guide)_