__[A Baby Robot's Guide To Reinforcement Learning](https://towardsdatascience.com/tagged/baby-robot-guide)__

# Creating a Custom Gym Environment for Jupyter Notebooks
## Part 2: Rendering to Jupyter Notebook Cells

<center><img src="images/part2_cover_opt.gif"/></center>

Run this notebook on Binder:

[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/WhatIThinkAbout/BabyRobotGym/HEAD?labpath=notebooks%2FBabyRobot_API.ipynb)

***

> <b>Updated 7th January 2023:</b>
> 
> Development of the OpenAI Gym library for Reinforcement Learning, which is the base framework originally described in this article, has stopped. It has now been replaced by _[Gymnasium](https://github.com/Farama-Foundation/Gymnasium)_, a new package managed by the _[Farama Foundation](https://farama.org/Announcing-The-Farama-Foundation)_. 
>
> In most cases this new framework remains the same as the original, but there have been a few subtle changes to the API. Consequently this article and its accompanying code samples have been updated to take account of these changes and to make use of this latest framework.
>
> Therefore, although the framework is still referred to as 'Gym', this actually means the new 'Gymnasium' version of the library.

***

# Introduction

In _[Part One](https://towardsdatascience.com/creating-a-custom-gym-environment-for-jupyter-notebooks-e17024474617)_, we saw how a custom Gym environment for **Reinforcement Learning** (_RL_) problems could be created, simply by extending the Gym base class and implementing a few functions. However, the custom environment we ended up with was a bit basic, with only a simple text output. 

So, in this part, we'll extend this simple environment by adding graphical rendering. Additionally, this rendered output will be explicitly targeted at _Jupyter Notebooks_, producing a graphical representation of the environment directly into the notebook cells.


<center><img src="images/green_babyrobot_small.gif"/></center>

# Introduction to the ipycanvas Library

When running a Reinforcement Learning problem in a _Jupyter Notebook_, it's very easy to write text into the notebook cell to show how things are progressing. However, given the large amount of information that can be generated over time, a much clearer representation can be obtained by creating a graphical view of the environment.

Quite often this graphical view is generated by taking snapshot images of the environment at each time-step and then joining these together, at the end of the episode, to create a short movie. This can then be played back within the notebook to see how things progressed.

The downside with this approach is that you need to wait for the movie to be created. Ideally we want to see the changes that occur in our environment happening in real time. We need something that can be added to a notebook cell, then drawn to and updated as actions take place. 
This exact functionality can be achieved using the _[HTML canvas element](https://developer.mozilla.org/en-US/docs/Web/API/Canvas_API)_, which can be accessed within a _Jupyter Notebook_ using the excellent **[ipycanvas](https://ipycanvas.readthedocs.io/en/latest/)** library.

### Load the libraries required to run the notebook code

In [1]:
# uncomment this if the code has been cloned from github
# set the path so we can import from the root directory
# import sys
# sys.path.insert(0, '../')

# install the babyrobot gym environment if code not cloned
%pip install babyrobot --upgrade -q

Note: you may need to restart the kernel to use updated packages.


In [2]:
import os
import shutil
from time import sleep
from ipywidgets import Layout, Play, Image, IntProgress, HBox, VBox, link
from IPython.display import Image as PyImage

# import the babyrobot library to get access to the previous environments and utilities
import babyrobot

# alias gymnasium to make it compatible with existing code that refers to 'gym'
import gymnasium as gym

In [3]:
# use the Gymnasium 'check_env' function to check the environment
# - returns nothing if the environment is verified as ok
from gymnasium.utils.env_checker import check_env

## Example:

The first thing we're going to need to create our _**Baby Robot Grid World**_, is the actual "world", where all the action takes place. At its most basic, this is just a coloured rectangle. This can be created really easily in _ipycanvas_ by simply defining a canvas and then specifying the size and colour of rectangle to draw:

In [4]:
from ipycanvas import Canvas,hold_canvas

cell_pixels = 64           # pixel dimensions of a grid square   
grid_width  = 3            # number of horizontal cells
grid_height = 3            # number of vertical cells 

width_pixels  = grid_width  * cell_pixels  # total horizontal pixels
height_pixels = grid_height * cell_pixels  # total vertical pixels

canvas = Canvas(width=width_pixels, height=height_pixels, sync_image_data=True)

In the code above, we've imported the _ipycanvas_ library, then defined the dimensions of the grid world that we're going to create. This will be a 3x3 grid, where each cell is a square of 64-pixels. Using these dimensions we can then create our canvas.
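
The relationship between grid cells and canvas pixels can be made explicit with a few small helpers. This is only a minimal sketch: `cell_to_pixels`, `pixels_to_cell` and `canvas_size` are hypothetical names (they are not part of _ipycanvas_ or the babyrobot package), and they assume the 64-pixel cells and top-left origin used above.

```python
CELL_PIXELS = 64  # pixel dimensions of a grid square (same value as above)

def cell_to_pixels(x, y, cell_pixels=CELL_PIXELS):
    """Return the top-left pixel coordinate of grid cell (x, y)."""
    return x * cell_pixels, y * cell_pixels

def pixels_to_cell(px, py, cell_pixels=CELL_PIXELS):
    """Return the grid cell that contains pixel (px, py)."""
    return px // cell_pixels, py // cell_pixels

def canvas_size(grid_width, grid_height, cell_pixels=CELL_PIXELS):
    """Return the (width, height) in pixels of a canvas covering the grid."""
    return grid_width * cell_pixels, grid_height * cell_pixels

print(canvas_size(3, 3))        # (192, 192)
print(cell_to_pixels(2, 1))     # (128, 64)
print(pixels_to_cell(130, 70))  # (2, 1)
```

The 3x3 grid therefore needs a 192x192 pixel canvas, which matches the canvas dimensions reported in the cell outputs.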

In [5]:
def draw_base(canvas):
  ''' fill the supplied canvas with orange '''
  canvas.fill_style = 'orange' 
  canvas.fill_rect(0, 0, canvas.width, canvas.height)

Initially the canvas will be blank, so to actually see the canvas we need to draw something. In the '_draw_base_' function, shown above, the fill colour is set to be orange and then this is used to draw a rectangle covering the complete canvas area.

After calling this function, the final line, '_canvas_', just draws the completed canvas into the notebook cell, as shown in _Figure 1_ below. This square will act as the base of our grid-world. Pretty exciting!

In [6]:
draw_base(canvas)  
canvas

Canvas(height=192, sync_image_data=True, width=192)

<center>Figure 1: The basic canvas world.</center>

### Adding a Grid

The next thing that any self-respecting Grid World is going to need is an actual grid. Again this can be easily achieved in _ipycanvas_ by drawing a few dashed lines:

In [7]:
def draw_grid( canvas ):
  # with hold_canvas(canvas):
    canvas.stroke_style = '#777' # grid line color - medium gray
    canvas.line_width = 1
    canvas.set_line_dash([4,8])    

    # draw the grid onto the canvas
    for y in range(grid_height):   
      for x in range(grid_width):   
        canvas.stroke_rect(cell_pixels * x, cell_pixels * y, cell_pixels, cell_pixels)

Here we've defined a function that sets up the canvas properties to draw a 1 pixel wide, dashed, grey line. Then we simply draw a rectangle for each cell in the grid, which gives us the output shown in _Figure 2_:

In [8]:
canvas = Canvas(width=width_pixels, height=height_pixels, sync_image_data=True)
draw_base(canvas) 
draw_grid( canvas )
canvas

Canvas(height=192, sync_image_data=True, width=192)

<center>Figure 2: The basic grid world.</center>

### Adding a Border

In [9]:
def draw_border(canvas):
  canvas.stroke_style = 'black'
  canvas.line_width = 5
  canvas.set_line_dash([0,0])
  canvas.stroke_rect(0,0,width_pixels,height_pixels) 

We can improve the look of our grid world by adding a border around the outside. This is simply a black rectangle, with slightly thicker lines than the grid, and is defined in the 'draw_border' function. This produces the output shown below:

In [10]:
canvas = Canvas(width=width_pixels, height=height_pixels, sync_image_data=True)
draw_base(canvas) 
draw_grid( canvas )
draw_border(canvas)
canvas

Canvas(height=192, sync_image_data=True, width=192)

<center>Figure 3: The grid world with an added border.</center>

### Adding an Animated Image

The final thing that our _Baby Robot Grid World_ is going to need is a _Baby Robot_, and preferably one that moves! Since we want our robot to move over the top of the grid level, without damaging anything we've already drawn, we'll use a separate canvas for our robot animation. 

This is easily achieved using the _**MultiCanvas**_ element. With this we can stack as many canvases as we want, and draw to each one separately, to build up our complete environment. This is shown below, where we've defined the _MultiCanvas_ to have 2 layers and then used the functions from above to recreate the grid world on the first of these layers (layer index zero).

In [11]:
from ipycanvas import MultiCanvas

layers = 2
multi_canvas = MultiCanvas(layers,width=width_pixels, height=height_pixels, sync_image_data=True)
draw_base(multi_canvas[0])  
draw_grid(multi_canvas[0])
draw_border(multi_canvas[0])
multi_canvas

MultiCanvas(height=192, sync_image_data=True, width=192)

Finally, we can load in our Baby Robot image and create a very simple animation, drawing our animation onto the upper canvas (index = 1).

In [12]:
robot_size = 64
baby_robot = Image.from_file('images/baby_robot.png')  

# animate an image on the canvas
def animate_robot( canvas ):  
  canvas.clear()
  y = robot_size + 2
  for x in range(-robot_size,200,2):   
    with hold_canvas(canvas):
      canvas.clear_rect(x, y, robot_size)                        
      canvas.draw_image(baby_robot, x, y )       
    sleep(0.04)

To make Baby Robot move across the screen we use a simple loop that clears the previous image before drawing the next one. Since there's some padding on the image we can simply clear the area where we want to draw the new image. Both of these operations are tied together using '_hold_canvas_' which makes things slightly smoother (for more advanced animations check out the _[ipycanvas documentation](https://ipycanvas.readthedocs.io/en/latest/animations.html)_).
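
As a rough check on the timing of this loop, the frame positions and the total running time can be worked out without a canvas. The sketch below only reproduces the arithmetic of `animate_robot`; `frame_positions` is a hypothetical helper name, and the duration ignores the time spent drawing each frame.

```python
ROBOT_SIZE = 64      # width of the robot image in pixels
END_X = 200          # the loop stops before this x position
STEP = 2             # pixels moved per frame
FRAME_DELAY = 0.04   # seconds slept after each frame

def frame_positions(robot_size=ROBOT_SIZE, end_x=END_X, step=STEP):
    """Return the x positions at which the robot is drawn."""
    # start fully off-screen to the left, as in animate_robot
    return list(range(-robot_size, end_x, step))

frames = frame_positions()
print(len(frames))                 # 132 frames
print(frames[0], frames[-1])       # -64 198
print(len(frames) * FRAME_DELAY)   # minimum duration in seconds (about 5.28)
```

So the robot enters from just off the left edge, leaves past the right edge of the 192-pixel canvas, and the whole pass takes a little over five seconds.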

The final Baby Robot Grid World is shown in Figure 4, below:

In [13]:
multi_canvas

MultiCanvas(height=192, image_data=b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x00\xc0\x00\x00\x00\xc0\x08\x…

<center>Figure 4. Baby Robot in the Grid World.</center>

In [14]:
# run this cell to make Baby Robot move
animate_robot( multi_canvas[1] )

# Creating a Graphical Grid Level

Using the _ipycanvas_ library, and the basic drawing routines described above, we can create classes that encapsulate all of the functionality required to draw a graphical grid level for our custom Gym environment. 

As part of this, we have two main classes:

* _**GridLevel**_: to manage the drawing and querying of the grid level.
* _**RobotDraw**_: to draw Baby Robot onto the grid at a particular location and to do the animation as he moves between cells.

The full code for both of these classes can be found on _[Github](https://github.com/WhatIThinkAbout/BabyRobotGym/tree/main/babyrobot/envs/lib)_.

In the code below we import these two classes and then use them to draw a default 3x3 grid level, onto which we add Baby Robot, positioned at cell [1,1].

In [15]:
from babyrobot.envs.lib import GridLevel
from babyrobot.envs.lib import RobotDraw

# draw the default grid level
level = GridLevel()

# add Baby Robot
robot = RobotDraw(level)
robot.set_cell_position([1,1])
robot.draw()

# show all environment canvases
level.draw()

MultiCanvas(height=196, sync_image_data=True, width=196)

<center>Figure 5: A default Baby Robot grid world level.</center>

This gives us a default _Baby Robot_ grid world level that we can use to create a graphical rendering function for our Gym environment.

# Create a graphical Gym render function

At the end of the first part of this series on creating a custom Gym environment we'd ended up with a render function that produced this:

<center><img src="images/V2_output.png"/></center>
<center><i>Figure 6: The output from version 2 of BabyRobotEnv's 'render' function.</i></center>

While providing all the important information about the current state of the environment, it's not very exciting. Additionally, it's a lot harder to visualise how the episode progressed. By looking at the coordinates at each time step you can sort of imagine how Baby Robot moved through the grid, but things would be much clearer if we could actually see this happening.

As we've seen, real-time graphics can be created in a _Jupyter Notebook_ cell using _ipycanvas_, so we can replace the current text-based render function with one that shows a graphical view of the environment and updates this as changes occur.

In [16]:
from babyrobot.envs import BabyRobotEnv_v2
from babyrobot.envs.lib import Actions
import numpy as np

In [17]:
''' the first graphical environment '''
class BabyRobotEnv_v3( BabyRobotEnv_v2 ):

  def __init__(self, **kwargs):
      super().__init__(**kwargs)

      # graphical creation of the level
      self.level = GridLevel( **kwargs )

      # add baby robot
      self.robot = RobotDraw(self.level,**kwargs)
      self.robot.draw()

  def reset(self, seed=None, return_info=False, options=None):
      super().reset(seed=seed)
      # reset Baby Robot's position in the grid
      self.robot.set_cell_position(self.initial_pos)
      self.robot.reset()
      self.x = self.initial_pos[0]
      self.y = self.initial_pos[1]
      info = {}
      return np.array([self.x,self.y]),info

  def render(self, action=0, reward=0 ):
      ''' render as an HTML5 canvas '''
      print(f"{Actions(action): <5}: ({self.x},{self.y}) reward = {reward}")

      # move baby robot to the current position
      self.robot.move(self.x,self.y)
      return self.level.draw()

As we've done previously, the new class inherits from the previous version of the environment (in this case from <i>BabyRobotEnv_v2</i>), which gives us all the functionality of the Gym base class, plus the extra stuff we added in the previous iterations. We then just need to provide new versions of the functions we want to replace, which in this case are as follows:

* <b><i>\_\_init\_\_</i></b> : contains the instances of our 'GridLevel' and 'RobotDraw' classes that we need for drawing the grid and Baby Robot respectively.

* <b><i>reset</i></b> : puts both Baby Robot and the environment back to the initial position.

* <b><i>render</i></b> : moves Baby Robot to the new position (where the position has been calculated in the Gym interface's 'step' function, defined in BabyRobotEnv_v2) and draws the level. This will animate the movement as Baby Robot moves from one cell to the next.

Now when we create an instance of this environment and call its render function, we see this:

In [18]:
env = BabyRobotEnv_v3()
env.render()

Stay : (0,0) reward = 0


MultiCanvas(height=196, sync_image_data=True, width=196)

In [19]:
# initialize the environment
env.reset()

terminated = False
while not terminated: 

  # choose a random action
  action = env.action_space.sample()   

  # take the action and get the information from the environment
  new_state, reward, terminated, truncated, info = env.step(action)
  
  # show the current position and reward
  env.render(action=action, reward=reward) 

Stay : (0,0) reward = -1
West : (0,0) reward = -1
East : (1,0) reward = -1
East : (2,0) reward = -1
Stay : (2,0) reward = -1
South: (2,1) reward = -1
West : (1,1) reward = -1
East : (2,1) reward = -1
South: (2,2) reward = 0


Even better, when we run our standard reinforcement learning loop, shown above, we now get to see Baby Robot moving around the environment. Baby Robot is currently taking randomly sampled actions in his quest to find the exit, so each episode will follow a different path.
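
The loop above stops only when `terminated` is set. Gymnasium's `step` also returns a `truncated` flag, which an environment or wrapper can use to signal a time limit, so a more defensive loop checks both. The sketch below uses a hypothetical stand-in environment, `CorridorEnv`, rather than Baby Robot's environment, so that it runs without any extra packages; its rewards and step limit are illustrative assumptions.

```python
import random

class CorridorEnv:
    """Hypothetical stand-in environment: walk east along a corridor to the exit."""

    def __init__(self, length=4, max_steps=50, seed=0):
        self.length = length
        self.max_steps = max_steps
        self.rng = random.Random(seed)

    def reset(self):
        self.x = 0
        self.steps = 0
        return self.x, {}

    def sample_action(self):
        # -1 = West, 0 = Stay, +1 = East
        return self.rng.choice([-1, 0, 1])

    def step(self, action):
        # clamp the position so the agent cannot leave the corridor
        self.x = min(max(self.x + action, 0), self.length - 1)
        self.steps += 1
        terminated = self.x == self.length - 1    # reached the exit
        truncated = self.steps >= self.max_steps  # ran out of time
        reward = 0 if terminated else -1
        return self.x, reward, terminated, truncated, {}

def run_episode(env):
    """Run one episode, stopping on either termination or truncation."""
    env.reset()
    total_reward = 0
    terminated = truncated = False
    while not (terminated or truncated):
        action = env.sample_action()
        _, reward, terminated, truncated, _ = env.step(action)
        total_reward += reward
    return total_reward, env.steps, terminated, truncated

print(run_episode(CorridorEnv()))
```

With random actions the episode length varies, but the loop is guaranteed to finish after at most `max_steps` steps.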

# State specific action spaces

If you take a look again at the _BabyRobotEnv_v3 'render'_ function, you'll see that we're still printing the action, position and reward for each time step. So, in addition to the new graphical output, we're still getting the text output from version 2 of our environment. Additionally, if you examine the text output from the random run above, you'll see entries such as its second line:

_"West : (0,0) reward = -1"_

In other words, Baby Robot was in the initial start square (0,0) and then chose to move West, which would take him straight into a wall!

Although he's only a baby, he's not stupid, so should only choose actions that are valid. We can achieve this by introducing a state specific action space where, rather than simply choosing from all of the actions, the action that is returned depends on the current state.
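
Before looking at the Gym version, the idea can be sketched in plain Python. `DynamicActions` below is a hypothetical, dependency-free illustration (it is not the class used by the environment): it holds the actions that are valid in the current state and samples only from those.

```python
import random

class DynamicActions:
    """Hypothetical, dependency-free sketch of a state specific action space."""

    def __init__(self, actions=None, seed=None):
        self.rng = random.Random(seed)
        self.set_actions(actions if actions is not None else [])

    def set_actions(self, actions):
        # copy the list so later changes by the caller don't leak in
        self.available_actions = list(actions)
        self.n = len(self.available_actions)

    def sample(self):
        if not self.available_actions:
            raise ValueError("no actions are available in the current state")
        return self.rng.choice(self.available_actions)

space = DynamicActions(seed=42)
space.set_actions(["South", "East"])            # e.g. a top-left corner cell
print({space.sample() for _ in range(20)} <= {"South", "East"})  # True
space.set_actions(["North", "South", "West"])   # e.g. a cell on the east edge
print(space.n)                                  # 3
```

Each time the state changes, the environment only needs to call `set_actions` with the moves that are valid there.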

In [20]:
class Dynamic(gym.Space):

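  def __init__(self, action_list=None):
      ' set the list of initially available actions '
      # avoid a mutable default argument, which would be shared between instances
      self.set_actions(action_list if action_list is not None else [])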
      
  def sample(self):
      ' select a random action from the set of available actions '
      return np.random.choice(self.available_actions)    
    
  def set_actions(self,actions):
      self.available_actions = actions
      self.n = len(actions)    
      
  def get_available_actions(self):
      return [str(action) for action in self.available_actions] 

In the code above we've created a custom _[Gym Space](https://www.gymlibrary.ml/content/spaces/)_. We'll use this to store the actions available in the current state and then, when '_sample_' is called, we'll randomly select one of these actions.

Using this class we can enhance our previous environment so that, when a new state is entered, it sets up the possible actions for that state. This is shown below:

In [21]:
from babyrobot.envs.lib import Direction

class BabyRobotEnv_v4( BabyRobotEnv_v3 ):

  def __init__(self, **kwargs):
      super().__init__(**kwargs)

      # initially no actions are available
      self.dynamic_action_space = Dynamic()

      # set the initial position and available actions
      self.reset()


  def get_available_actions( self ):
      ''' test which actions are allowed at the specified grid state '''

      # get the available actions from the grid level
      direction_value = self.level.get_directions(self.x,self.y)

      # convert the grid directions into environment actions
      action_list = []
      if direction_value & Direction.North: action_list.append( Actions.North )
      if direction_value & Direction.South: action_list.append( Actions.South )
      if direction_value & Direction.East:  action_list.append( Actions.East )
      if direction_value & Direction.West:  action_list.append( Actions.West )
      return action_list


  def set_available_actions( self ):
      ' set the list of available actions into the action space '
      action_list = self.get_available_actions()
      self.dynamic_action_space.set_actions( action_list )


  def show_available_actions( self ):
      ''' print the set of available actions for current state '''
      available_actions = str(self.dynamic_action_space.get_available_actions()).replace("'","")
      print(f"({self.x},{self.y}) {available_actions:29}",end="")


  def take_action(self, action):
      ''' apply the supplied action '''

      # call the parent class to take the action and update the position
      super().take_action( action )

      # set the available actions for the new state
      self.set_available_actions()


  def reset(self, seed=None, return_info=False, options=None):
      # reset Baby Robot's position in the grid
      observation,info = super().reset(seed=seed)
      self.set_available_actions()
      return observation,info

As before, we inherit from the previous environment (in this case <i>BabyRobotEnv_v3</i>), so that we can build on its functionality. We then add an instance of the 'Dynamic' class and, each time the '<i>take_action</i>' function is called, we populate this with the actions available for the current state.

As a result, when an action is sampled for a particular state, it will be drawn from the set of valid actions, so Baby Robot no longer walks into walls. 

For example, for the start state, calling BabyRobotEnv_v4's '<i>show_available_actions</i>' function prints the actions South and East. Similarly, for grid position (2,1), shown in _Figure 8_, the available actions are North, South or West.
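
The bitfield test used in '<i>get_available_actions</i>' can be illustrated on its own. In this sketch the `Direction` bit values are assumptions chosen for the example (the values in the babyrobot package may differ), and `decode_directions` is a hypothetical helper.

```python
from enum import IntFlag

class Direction(IntFlag):
    """Assumed bit values; the real babyrobot Direction values may differ."""
    North = 1
    East = 2
    South = 4
    West = 8

def decode_directions(direction_value):
    """Return the names of the directions whose bits are set."""
    order = (Direction.North, Direction.South, Direction.East, Direction.West)
    return [d.name for d in order if direction_value & d]

start_cell = Direction.South | Direction.East
print(decode_directions(start_cell))   # ['South', 'East']

cell_2_1 = Direction.North | Direction.South | Direction.West
print(decode_directions(cell_2_1))     # ['North', 'South', 'West']
```

Packing the open directions into a single integer lets the grid level answer "which way can I go?" with one value, which the environment then expands into a list of actions.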

In [32]:
# create the default environment
env = BabyRobotEnv_v4()
env.show_available_actions()
env.render()

(0,0) [South, East]                Stay : (0,0) reward = 0


MultiCanvas(height=196, sync_image_data=True, width=196)

In [33]:
env = BabyRobotEnv_v4(**{'initial_pos':[2,1],'add_compass':True})
env.show_available_actions()
env.render()

(2,1) [North, South, West]         Stay : (2,1) reward = 0


MultiCanvas(height=196, sync_image_data=True, width=296)

<center><i>Figure 8: Grid position (2,1) where the available actions are North, South or West.</i></center>

In [29]:
env.level.save("images/position_2_1.png")

### Registering and checking a local environment class

To check that our new environment conforms to the Gym API standard we can use the Gymnasium '<i>check_env</i>' function. If this returns no warnings then we're all good.

However, to supply our environment to this function, we first need to call '_gym.make_' to make the environment, but before we can do this we need to have registered the environment for Gymnasium to know about it.

In the first part of this article we saw how to do this when the custom environment was contained in its own python file. In this case the '<i>entry_point</i>' supplied to the '_register_' function defines the file and class name.

Registering a local class is slightly different. Here the '<i>entry_point</i>' is the class itself, rather than a string naming the file and class. So we can register and check the <b><i>BabyRobotEnv_v4</i></b> class as follows:

In [35]:
# register the local custom environment class
from gymnasium.envs.registration import register
register( id='BabyRobotEnv-v4', entry_point=BabyRobotEnv_v4 )

# make the environment
env = gym.make("BabyRobotEnv-v4")

# check the environment conforms to the API standard
check_env(env)

Stay : (0,0) reward = 0
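
To see why both forms of '<i>entry_point</i>' can work, here is a deliberately simplified stand-in for a registry. It is not Gymnasium's implementation, only a sketch of the idea: a string entry point is resolved by importing the module and looking up the class, whereas a class (or any callable) is used directly. The names `_registry` and `LocalEnv` are hypothetical.

```python
import importlib

_registry = {}

def register(id, entry_point):
    """Store an entry point: either a 'module:ClassName' string or a callable."""
    _registry[id] = entry_point

def make(id, **kwargs):
    """Create an instance from a registered entry point."""
    entry_point = _registry[id]
    if isinstance(entry_point, str):
        # string form: import the module, then look up the class by name
        module_name, class_name = entry_point.split(":")
        entry_point = getattr(importlib.import_module(module_name), class_name)
    return entry_point(**kwargs)

class LocalEnv:
    """Hypothetical class defined locally, e.g. in a notebook cell."""
    def __init__(self, width=3):
        self.width = width

register("LocalEnv-v0", entry_point=LocalEnv)               # class object
register("Counter-v0", entry_point="collections:Counter")   # string form

print(type(make("LocalEnv-v0", width=5)).__name__)   # LocalEnv
print(type(make("Counter-v0")).__name__)             # Counter
```

A class defined in a notebook cell has no importable module path, which is why passing the class object is the convenient option there.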


# Enhancing the graphical environment

While it's useful to be able to see the text output, giving the details for each action, it's not very nice that it generates an ever-increasing list of text, which eventually swamps the notebook cell. 

In [36]:
class BabyRobotEnv_v5( BabyRobotEnv_v4 ):
  
  def __init__(self, **kwargs):
      super().__init__(**kwargs)  
  
  def render(self, info=None):                 
      ''' render as an HTML5 canvas '''           
      # move baby robot to the current position
      self.robot.move(self.x,self.y) 
      # write the info to the grid side-panel      
      self.show_info(info) 
      return self.level.draw()          

  def show_info(self,info):
      ''' display the supplied information on the grid level '''
      self.level.show_info( info )      

In [37]:
# register and make the environment
register( id='BabyRobotEnv-v5', entry_point=BabyRobotEnv_v5 )
env = gym.make("BabyRobotEnv-v5")

# check the environment conforms to the API standard
# - returns nothing if the environment is verified as ok
check_env(env)

Rather than using a print statement in the '_render_' function we can instead write text directly to the canvas. To do this, we first need to expand the canvas to create a region where the text can be shown. By making use of the '<i>\_\_init\_\_</i>' function's '_kwargs_' argument, we can supply an object that defines this text region.

In the example below we've specified that we'd like a grey side panel with a width approximately equal to the width of the grid level. This then gives the following output:

In [43]:
env = babyrobot.make("BabyRobotEnv-v5",**{'side_panel':{'width':200,'color':'#ddd'}})
env.render()

MultiCanvas(height=196, sync_image_data=True, width=396)

(note that here we're using '_babyrobot.make_' as opposed to '_gym.make_' - this is to avoid being forced to call '_env.reset()_' before '_env.render()_' which is a new restriction implemented in the later versions of Gym.)

All we need now is a way to write to this panel, and display the required information, each time '_render_' is called. The next iteration of our environment contains the '<i>show_info</i>' function to do just that.

The new '<i>show_info</i>' method calls a function in the underlying '_GridLevel_' class. This takes an information object giving the text to display and the details of where it should go.

Previously, in the '<i>render</i>' function, we supplied the action and the reward and then displayed these using a print command:

```
print(f"{Actions(action): <5}: ({self.x},{self.y}) reward = {reward}")
```

In the new graphical version, we instead create an information object in the main loop and give it to the render function:

In [44]:
env = babyrobot.make("BabyRobotEnv-v5", **{'side_panel':{'width':200,'color':'#ddd'}})
env.render()

MultiCanvas(height=196, sync_image_data=True, width=396)

In [46]:
# run the environment, taking random actions
env.reset()

info = {}
terminated = False
while not terminated:

  # choose a random action
  action = env.action_space.sample()

  # take the action and get the information from the environment
  new_state, reward, terminated, truncated, info = env.step(action)

  # form an information string
  info_str = f"{Actions(action): <5}: {new_state} reward = {reward}"

  # show the current position and reward
  env.render(info = {'side_info': [((10,10),info_str)]})
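
When several lines of text are written to the side panel, building the information object by hand becomes repetitive. The helper below is a small sketch of one way to do this; `make_side_info` is a hypothetical name, and the 20-pixel line spacing is an assumption rather than a value required by the library.

```python
def make_side_info(lines, x=10, y=10, line_height=20):
    """Build an info dict that places each line of text below the previous one."""
    entries = [((x, y + i * line_height), text) for i, text in enumerate(lines)]
    return {'side_info': entries}

info = make_side_info(["East : [1 0] reward = -1", "Step = 3"])
print(info)
# {'side_info': [((10, 10), 'East : [1 0] reward = -1'), ((10, 30), 'Step = 3')]}
```

The resulting dictionary has the same shape as the one passed to '_render_' in the loop above, so it could be supplied as `env.render(info=info)`.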

# Increasing the Challenge

While our new graphical output from the custom Gym environment may look nice, it's not exactly a very hard Reinforcement Learning challenge. To make things more difficult we need to add a few obstacles for Baby Robot to negotiate.

### Adding Walls

We can supply an array of wall definitions when creating the environment. Each item in this array defines the grid coordinate and side of the cell where the wall should be placed:

In [47]:
setup = {'add_compass':True}
walls = [((0, 0),'E'),
         ((2, 2),'W')]
setup['walls'] = walls

env = BabyRobotEnv_v5(**setup)
env.show_available_actions()
env.render()

(0,0) [South]                      

MultiCanvas(height=196, sync_image_data=True, width=296)

<center><i>Figure 11: Adding walls to the environment.</i></center>
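
The effect of a wall list on the available actions can be sketched in plain Python. This is an illustration only, not the code used by 'GridLevel': it assumes that x increases to the east and y increases to the south (as in the outputs above), that a wall blocks movement from both of the cells it separates, and that the grid edges cannot be crossed. The function names are hypothetical.

```python
OFFSETS = {'N': (0, -1), 'S': (0, 1), 'E': (1, 0), 'W': (-1, 0)}
OPPOSITE = {'N': 'S', 'S': 'N', 'E': 'W', 'W': 'E'}

def blocked_moves(walls):
    """Return the set of (cell, side) pairs that a list of walls blocks."""
    blocked = set()
    for (x, y), side in walls:
        dx, dy = OFFSETS[side]
        blocked.add(((x, y), side))                      # leaving the cell
        blocked.add(((x + dx, y + dy), OPPOSITE[side]))  # entering from the neighbour
    return blocked

def available(cell, walls, width=3, height=3):
    """List the sides of 'cell' that can be crossed without hitting a wall or edge."""
    x, y = cell
    blocked = blocked_moves(walls)
    moves = []
    for side in ('N', 'S', 'E', 'W'):
        dx, dy = OFFSETS[side]
        inside = 0 <= x + dx < width and 0 <= y + dy < height
        if inside and (cell, side) not in blocked:
            moves.append(side)
    return moves

walls = [((0, 0), 'E'), ((2, 2), 'W')]
print(available((0, 0), walls))   # ['S']
print(available((1, 0), walls))   # ['S', 'E']
```

The first result agrees with the output above, where only South is available from the start cell once the east wall has been added.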

### Adding Puddles

Currently, when moving around the grid, all of Baby Robot's actions are deterministic. For example, in Figure 11 above, Baby Robot currently only has one possible action from the Start state, and that's to head South. When he takes this action he'll definitely end up in the cell below and will receive a reward of -1 for taking this action.

Many RL problems instead consider probabilistic environments where, when an action is taken, it's not guaranteed that you end up in the target state nor that you get the expected reward (see the article on "_[Markov Decision Processes and Bellman Equations](https://towardsdatascience.com/markov-decision-processes-and-bellman-equations-45234cce9d25)_" for more information on this). We can introduce this randomness to the grid world by adding puddles. When Baby Robot encounters one of these there's a chance he can skid, in which case he'll end up in a different cell than the one he was trying to reach. Additionally, it takes Baby Robot longer to move through puddles, and so the reward for moving into a puddle is more negative (i.e. a larger penalty).

Before we add any puddles we'll make one final change to the environment. In the '<i>take_action</i>' function we'll check if the action resulted in the desired target being reached. Then, in the '_step_' function, we'll make use of the Gym interface's 'info' object to return this information. This will allow us to monitor the effect of Baby Robot moving into a puddle:

In [48]:
class BabyRobotEnv_v6( BabyRobotEnv_v5 ):

  def __init__(self, **kwargs):
      super().__init__(**kwargs)

  def take_action(self, action):
      ''' apply the supplied action
          returns: - the reward obtained for taking the action
                   - flag indicating if the target state was reached
      '''
      # convert the action into a direction bitfield
      direction = Direction.from_action(action)

      # calculate the position of the next state and the reward for moving there
      next_pos,reward,target_reached = self.level.get_next_state( self.x, self.y, direction )

      # store the new position
      self.x = next_pos[0]
      self.y = next_pos[1]

      # update the available actions for the new position
      self.set_available_actions()
      return reward, target_reached

  def step(self, action):

      # take the action and update the position
      reward, target_reached = self.take_action(action)
      obs = np.array([self.x,self.y])

      # set the 'terminated' flag if we've reached the exit
      terminated = (self.x == self.end[0]) and (self.y == self.end[1])
      truncated = False

      info = {'target_reached':target_reached}
      return obs, reward, terminated, truncated, info

In [49]:
env = BabyRobotEnv_v6()
env.step(Actions.Stay)

(array([0, 0]), -1, False, False, {'target_reached': False})

In [50]:
# register and make the environment
register( id='BabyRobotEnv-v6', entry_point=BabyRobotEnv_v6 )
env = gym.make("BabyRobotEnv-v6")

# check the environment conforms to the API standard
# - returns nothing if the environment is verified as ok
check_env(env)

As with walls, puddles are specified by giving the coordinates of their grid location. However, puddles exist in the middle of a cell, so a side doesn't need to be specified. Instead the size of the puddle is defined, with 2 possible options which, by default, have the following properties:

* 1 = small puddle. Reward = -2, Probability of skidding = 0.4
* 2 = large puddle. Reward = -4, Probability of skidding = 0.6

If we now run the simple test code, shown below, Baby Robot will try to take 2 steps to the East. The first of these will succeed, since he's moving from the Start square which is dry. However, he's moving into a large puddle so will automatically receive a reward of -4. On his next move he'd like to reach the Exit, so again tries to move East. However, he's now moving out of a large puddle, so there's a 0.6 probability that he'll skid and instead end up in one of the other possible states.

In [51]:
setup = { 'start':[0,1], 'end': [2,1] , 'add_compass':True }        
setup['side_panel'] = {'width':200,'color':'#ddd'}
setup['walls'] = [((0, 1),'N'),((0, 1),'S')]
setup['puddles'] = [((1,1),2)]

env = BabyRobotEnv_v6(**setup)
env.show_available_actions()
env.render() 

(0,1) [East]                       

MultiCanvas(height=196, sync_image_data=True, width=396)

In [58]:
def puddle_test():
  env.reset()
  for step in range(2):
    action = Actions.East
    new_state, reward, terminated, truncated, info = env.step(action)
    info_str = f"{Actions(action): <5}: {new_state} reward = {reward}"
    target_str = f"Target Reached = {info['target_reached']}"
    env.render(info = {'side_info': [((10,100),info_str),((10,130),target_str)]})

puddle_test()
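
The skid probabilities can also be explored without the graphical environment. The simulation below is a sketch under simplifying assumptions: it only models whether the intended cell is reached when leaving a cell of a given puddle size, using the default probabilities quoted above, and says nothing about where Baby Robot ends up after a skid. The function names are hypothetical.

```python
import random

# default skid probabilities quoted above, by puddle size (0 = dry cell)
SKID_PROBABILITY = {0: 0.0, 1: 0.4, 2: 0.6}

def target_reached(leaving_size, rng):
    """Return True if a move out of a cell with this puddle size does not skid."""
    return rng.random() >= SKID_PROBABILITY[leaving_size]

def estimate_success_rate(leaving_size, trials=10_000, seed=0):
    """Estimate how often the intended cell is reached."""
    rng = random.Random(seed)
    hits = sum(target_reached(leaving_size, rng) for _ in range(trials))
    return hits / trials

for size in SKID_PROBABILITY:
    print(size, estimate_success_rate(size))
```

Moves from a dry cell always succeed, while moves out of a small or large puddle reach the target roughly 60% and 40% of the time respectively.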

### Adding a Maze

Many Grid World problems define mazes that need to be navigated in search of the exit. While we could achieve this by specifying a large array of walls, this would quickly get to be annoying. Therefore we can instead just specify that we'd like to add a maze and supply it with a random seed, which will determine the walls that are created.

By default the maze will only have a single path that can be followed to reach the exit. For many RL problems a better challenge is created when several possible options are available and the learning algorithm will need to find the best of these. By removing some of the walls from the maze we can create several routes to the exit. The RL algorithm will then need to find which one of these gets Baby Robot to the exit with the greatest reward.
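
To make "greatest reward" concrete, the return of a candidate route can be added up by hand. The sketch below uses the default step rewards described earlier (-1 for a dry cell, -2 and -4 for small and large puddles) and deliberately ignores skidding and any special reward at the exit; the routes, puddle position and function name are hypothetical.

```python
# reward for entering a cell, by puddle size (0 = dry)
STEP_REWARD = {0: -1, 1: -2, 2: -4}

def route_return(route, puddles):
    """Sum the rewards for entering each cell of 'route' after the first."""
    sizes = dict(puddles)  # {(x, y): puddle size}
    return sum(STEP_REWARD[sizes.get(cell, 0)] for cell in route[1:])

puddles = [((1, 0), 2)]                                # one large puddle
short_wet = [(0, 0), (1, 0), (2, 0)]                   # 2 steps, through the puddle
long_dry = [(0, 0), (0, 1), (1, 1), (2, 1), (2, 0)]    # 4 steps, all dry

print(route_return(short_wet, puddles))   # -5
print(route_return(long_dry, puddles))    # -4
```

In this toy case the longer, dry route has the better return, which is exactly the kind of trade-off a learning algorithm has to discover.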

Here, in our final level, we've added pretty much everything! We've specified a larger level of size 8x5 featuring a maze. We've then removed a few walls from this to create several routes to the exit. Then we've added some puddles, just to create more of a challenge. Finally, to make things look nice, we've specified that we'd like to use the '<i>black_orange</i>' theme (all of the colours are fully customizable).

In [59]:
setup = { 'width': 8,
          'height': 5,
          'add_maze': True,
          'maze_seed': 42,
          'end': [5,4]
        }       
walls = [((2, 0),'E'), # remove the east wall at (2,0)
         ((2, 2),'E'), # remove the east wall at (2,2)
         ((3, 2),'E'), # remove the east wall at (3,2)
         ((5, 2),'E')] # add an east wall at (5,2)        
setup['walls'] = walls

puddles = [((2,2),1),           
           ((2,0),1),
           ((7,3),1),
           ((3,2),2),
           ((5,1),2)]
setup['puddles'] = puddles

setup['grid'] = {'theme': 'black_orange'}
setup['side_panel'] = {'width':200}
env = BabyRobotEnv_v6(**setup)
env.render()

MultiCanvas(height=326, sync_image_data=True, width=718)

In [62]:
# run the environment, taking random actions
env.reset()

info = {}
terminated = False

# run for a maximum of 20 steps
for step in range(20):

  # choose a random action
  action = env.action_space.sample()   

  # take the action and get the information from the environment
  new_state, reward, terminated, truncated, info = env.step(action)

  # form an information string
  info_str = f"{Actions(action): <5}: {new_state} reward = {reward}"  
    
  # show the current position and reward  
  env.render(info = {'side_info': [((10,10),info_str),((10,30),f"Step = {step}")]})  

  if terminated:
    break

As seen above, Baby Robot now has a challenging problem, where he must search the maze looking for the exit. When the standard Gym Environment Reinforcement Learning loop is run, Baby Robot will begin to randomly explore the maze, gathering information that he can use to learn how to escape.

Obviously, given that random actions are being taken, and with the added complication of puddles that can potentially cause skids, it may take Baby Robot some time to locate the exit. To see how a Reinforcement Learning algorithm can be used to find the best route through the maze, check out the _[training notebook](https://colab.research.google.com/github/WhatIThinkAbout/BabyRobotGym/blob/main/notebooks/PPO_Training.ipynb)_.

# Summary
Over the course of these two articles we've seen how a custom _Gym Environment_ can be created, with real-time graphical output rendered directly into _Jupyter Notebook_ cells. 

The _ipycanvas_ library provides direct access to the HTML canvas, where simple graphical components can be combined to produce informative views of the _Reinforcement Learning_ environment.

Additionally, by basing this environment on the _Gym API_ we can create _Reinforcement Learning_ problems that are compatible with a host of different out-of-the-box learning algorithms. Hopefully these articles have given you all the information you need to start building your own, bespoke, _RL_ environments.

---

If you'd just like to have a play with the Baby Robot environment, check out this _[notebook](https://colab.research.google.com/github/WhatIThinkAbout/BabyRobotGym/blob/main/notebooks/BabyRobot_API.ipynb)_ showing the different ways in which Baby Robot Grid Worlds can be created and the components that can be added.

---

Now that we can create a range of challenging worlds for Baby Robot to explore, all that's left to do is learn how to tackle these problems. The first part of the series on how to do this can be found _[here](https://towardsdatascience.com/state-values-and-policy-evaluation-ceefdd8c2369)_.


<center><img src="images/green_babyrobot_small.gif"/></center>