# OpenAI Gym

This repository demonstrates a working OpenAI Gym environment in a Jupyter Notebook. It can be used as a starting point for training an agent with reinforcement learning algorithms.

OpenAI Gym's repository can be found <a href='https://github.com/openai/gym'>here</a> with its extensive documentation, including installation instructions, found <a href='https://gym.openai.com/docs/'>here</a>.

First, the `gym` library needs to be imported, along with `Box2D`, the physics engine that the walker environment depends on.

In [1]:
import gym
import Box2D

A gym environment called `env` can be created that simulates a two-legged (bipedal) walker.

In [2]:
env = gym.make('BipedalWalker-v2')

The environment first needs to be reset to become active. 

In [3]:
observation = env.reset()

The walker has an initial state of:

In [4]:
print(observation)

[ 2.74734967e-03  1.40638338e-06 -1.83399785e-04 -1.60000086e-02
  9.21963006e-02  4.25762410e-04  8.60109627e-01  1.06479476e-03
  1.00000000e+00  3.25901024e-02  4.25730599e-04  8.53684366e-01
 -2.89682124e-04  1.00000000e+00  4.40813839e-01  4.45819944e-01
  4.61422592e-01  4.89549994e-01  5.34102559e-01  6.02460802e-01
  7.09148586e-01  8.85931492e-01  1.00000000e+00  1.00000000e+00]


The environment can now be displayed.

In [5]:
env.render()

True

The state of the walker is described by 24 numbers, as the printed observation shows. The same information is available from the environment's observation space:

In [6]:
env.observation_space

Box(24,)

The actions available to the walker are described by the action space:

In [7]:
env.action_space

Box(4,)

This means that an action is an array of four numbers, which the walker performs when the array is passed to it. For example, a random action that the walker could make is:

In [8]:
env.action_space.sample()

array([ 0.680614  ,  0.02300155, -0.62014097,  0.6836832 ], dtype=float32)
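Every component of the sampled action above lies between -1 and 1. Assuming, as the sample suggests, that each of the four components is bounded by -1 and 1, hand-written actions can be clamped into that range before being passed to the environment. The following is a minimal sketch; `clip_action` is a hypothetical helper and is not part of `gym`:

```python
def clip_action(action, low=-1.0, high=1.0):
    """Clamp every component of an action into the interval [low, high].

    The default bounds of -1 and 1 are an assumption based on the sampled
    action; they are not read from the environment here.
    """
    return [max(low, min(high, float(a))) for a in action]


clipped = clip_action([10, 0.5, -3, 0])
print(clipped)  # [1.0, 0.5, -1.0, 0.0]
```

The actual bounds of a `Box` space can be read from `env.action_space.low` and `env.action_space.high` and passed in place of the defaults.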

Let's make the walker move by taking random actions until the episode ends:

In [10]:
env.reset()

done = False

# Keep taking random actions until the environment reports that the episode is over.
while not done:
    observation, reward, done, info = env.step(env.action_space.sample())
    env.render()
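The loop above discards the reward returned by each step and has no upper limit on its length. As a sketch of one way to package it, the helper below accumulates the reward and stops after a maximum number of steps. Both `run_episode` and `StubEnv` are hypothetical names introduced for illustration; `StubEnv` only mimics the `reset` and `step` calls used above, with a made-up reward, so that the example runs without `gym` installed:

```python
class StubEnv:
    """Hypothetical stand-in that mimics the reset/step interface used above."""

    def __init__(self, episode_length=5):
        self.episode_length = episode_length
        self.t = 0

    def reset(self):
        self.t = 0
        return [0.0] * 24  # same length as the walker's observation

    def step(self, action):
        self.t += 1
        observation = [0.0] * 24
        reward = 1.0  # made-up reward: one point per step
        done = self.t >= self.episode_length
        return observation, reward, done, {}


def run_episode(env, policy, max_steps=1000):
    """Run one episode and return (total_reward, steps_taken)."""
    observation = env.reset()
    total_reward = 0.0
    steps = 0
    done = False
    # Stop when the episode ends or when the step budget is used up.
    while not done and steps < max_steps:
        observation, reward, done, info = env.step(policy(observation))
        total_reward += reward
        steps += 1
    return total_reward, steps


total_reward, steps = run_episode(StubEnv(), lambda obs: [0.0, 0.0, 0.0, 0.0])
print(total_reward, steps)  # 5.0 5
```

With the real environment, the equivalent call would be `run_episode(env, lambda obs: env.action_space.sample())`.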

In [None]:
env.reset()
env.render()

In [None]:
env.step([0,0,1,0])
#env.render()


The walker performs an action through the `step` method, which takes an array of four continuous values rather than a discrete action index. For example, `env.step([0, 0, 1, 0])` passes an action whose third component is 1 and whose other components are 0. The `step` method returns four things: `observation, reward, done, info`. To see the action taken, the `render` method needs to be called.
A single step with a random action, followed by a loop in which the walker always chooses the same action, is shown below:

In [None]:
env.step(env.action_space.sample())
env.render()

In [None]:
env.reset()

done = False

# Apply the same constant action on every step until the episode is over.
while not done:
    observation, reward, done, info = env.step([10,10,10,10])
    env.render()

As can be seen, the walker repeats the same constant action on every step. The walker can also execute an action at random. The code below shows the walker doing exactly this.

In [None]:
env.reset()

done = False

# Keep taking random actions until the environment reports that the episode is over.
while not done:
    observation, reward, done, info = env.step(env.action_space.sample())
    env.render()

To close the environment that was created, run the following command:

In [None]:
env.close()
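If an exception interrupts an episode, the `close` call above is never reached. One possible way to guarantee the cleanup is Python's `contextlib.closing`, which calls `close` on any object when the `with` block exits. The sketch below uses a hypothetical `StubEnv` stand-in so that it runs without `gym`; a real environment could be wrapped in the same way:

```python
from contextlib import closing


class StubEnv:
    """Hypothetical stand-in exposing only the close method."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


env = StubEnv()
try:
    with closing(env):
        # Simulate something going wrong in the middle of an episode.
        raise RuntimeError("simulated failure during an episode")
except RuntimeError:
    pass

print(env.closed)  # True: close() ran even though an exception was raised
```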