# Part 12.1: Introduction to the OpenAI Gym

[OpenAI Gym](https://gym.openai.com/) aims to provide an easy-to-set-up general-intelligence benchmark with a wide variety of environments. The goal is to standardize how environments are defined in AI research publications so that published research becomes more easily reproducible. The project claims to provide the user with a simple interface. As of June 2017, developers can only use Gym with Python.

OpenAI Gym is installed onto your local machine with pip.  There are a few significant limitations to be aware of:

* OpenAI Gym Atari only **directly** supports Linux and Macintosh
* OpenAI Gym Atari can be used with Windows; however, it requires a particular [installation procedure](https://towardsdatascience.com/how-to-install-openai-gym-in-a-windows-environment-338969e24d30)
* OpenAI Gym cannot directly render animated games in Google CoLab

Because OpenAI Gym requires a graphics display, the only way to display Gym in Google CoLab is an embedded video.  The presentation of OpenAI Gym game animations in Google CoLab is discussed later in this module.

### OpenAI Gym Leaderboard

The OpenAI Gym does have a leaderboard, similar to Kaggle; however, the OpenAI Gym's leaderboard is much more informal than Kaggle's.  The user's local machine performs all scoring.  As a result, the OpenAI Gym's leaderboard is strictly an "honor system."  The leaderboard is maintained in the following GitHub repository:

* [OpenAI Gym Leaderboard](https://github.com/openai/gym/wiki/Leaderboard)

If you submit a score, you are required to provide a writeup with sufficient instructions to reproduce your result. A video of your results is suggested, but not required.

### Looking at Gym Environments

The centerpiece of Gym is the environment, which defines the "game" in which your reinforcement algorithm will compete.  An environment does not need to be a game; however, it describes the following game-like features:
* **action space**: What actions can we take on the environment, at each step, to alter the environment?
* **observation space**: What is the current state of the portion of the environment that we can observe? Usually, we can see the entire environment.
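
To make the two kinds of space concrete, here is a minimal sketch in plain Python. `ToyDiscrete` and `ToyBox` are hypothetical stand-ins written for illustration only; they are not Gym's own classes, and the bounds are made up. They only show the idea that a discrete space is a set of integer choices, while a box is a vector of bounded floating-point values.

```python
import random
from dataclasses import dataclass


@dataclass
class ToyDiscrete:
    """Hypothetical stand-in for a discrete space: valid values are 0 .. n-1."""
    n: int

    def sample(self, rng):
        return rng.randrange(self.n)

    def contains(self, x):
        return isinstance(x, int) and 0 <= x < self.n


@dataclass
class ToyBox:
    """Hypothetical stand-in for a bounded continuous space.

    Each dimension has its own (low, high) pair.
    """
    low: tuple
    high: tuple

    def sample(self, rng):
        return tuple(rng.uniform(lo, hi) for lo, hi in zip(self.low, self.high))

    def contains(self, x):
        return len(x) == len(self.low) and all(
            lo <= v <= hi for lo, v, hi in zip(self.low, x, self.high)
        )


rng = random.Random(0)
action_space = ToyDiscrete(3)                                # three possible actions
observation_space = ToyBox(low=(-1.0, -0.5), high=(1.0, 0.5))  # two bounded floats

action = action_space.sample(rng)
observation = observation_space.sample(rng)
print("sampled action:", action)
print("sampled observation:", observation)
```

An agent picks one value from the action space at each step and receives one value from the observation space in return.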

Before we begin to look at Gym, it is essential to understand some of the terminology used by this library.

* **Agent** - The machine learning program or model that controls the actions.
* **Step** - One round of issuing actions that affect the observation space.
* **Episode** - A collection of steps that terminates when the agent achieves or fails to meet the environment's objective, or the episode reaches the maximum number of allowed steps.
* **Render** - Gym can render one frame for display after each step.
* **Reward** - A positive reinforcement that can occur at the end of each step, after the agent acts.
* **Nondeterministic** - For some environments, randomness is a factor in deciding what effects actions have on reward and changes to the observation space.
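
The terms above fit together in a simple loop. The sketch below uses a hypothetical toy environment, `CorridorEnv`, written in plain Python rather than a real Gym environment. It assumes the classic Gym-style convention in which `reset` returns the first observation and `step` returns an `(observation, reward, done, info)` tuple; newer library versions may differ.

```python
class CorridorEnv:
    """Hypothetical toy environment with a Gym-like reset/step interface.

    The agent starts at position 0 and tries to reach `goal` within
    `max_steps` steps.
    """

    def __init__(self, goal=3, max_steps=10):
        self.goal = goal
        self.max_steps = max_steps

    def reset(self):
        self.position = 0
        self.steps = 0
        return self.position

    def step(self, action):
        # Action 0 moves left; action 1 moves right.
        self.position += 1 if action == 1 else -1
        self.steps += 1
        reached_goal = self.position == self.goal
        reward = 1.0 if reached_goal else 0.0
        done = reached_goal or self.steps >= self.max_steps
        return self.position, reward, done, {}


def run_episode(env, policy):
    """Run one episode and return (number of steps, total reward)."""
    observation = env.reset()
    total_reward, done, steps = 0.0, False, 0
    while not done:
        action = policy(observation)                      # the agent chooses
        observation, reward, done, _ = env.step(action)   # one step
        total_reward += reward
        steps += 1
    return steps, total_reward


def always_right(observation):
    return 1


def always_left(observation):
    return 0


print("always right:", run_episode(CorridorEnv(), always_right))
print("always left: ", run_episode(CorridorEnv(), always_left))
```

The first policy ends the episode by reaching the goal; the second never reaches it, so the episode ends only when the step limit is hit.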

It is important to note that many Gym environments specify that they are not nondeterministic even though they use random numbers to process actions. Based on discussion in the Gym GitHub issue tracker, the general consensus is that the nondeterministic property means an environment will still behave randomly even when given a consistent seed value. A program can call an environment's seed method to seed the environment's random number generator.
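
The distinction can be illustrated without Gym. In the sketch below, `NoisyWalkEnv` is a hypothetical toy environment whose transitions depend on random numbers. Because all of its randomness comes from one seedable generator, it would count as deterministic in the sense described above: the same seed reproduces the same trajectory.

```python
import random


class NoisyWalkEnv:
    """Hypothetical toy environment whose transitions use random numbers."""

    def __init__(self):
        self.rng = random.Random()

    def seed(self, seed=None):
        # Mirrors the idea of an environment's seed method: fix the RNG state.
        self.rng = random.Random(seed)

    def reset(self):
        self.position = 0.0
        return self.position

    def step(self, action):
        # The action sets the direction; the RNG adds noise to the step size.
        self.position += action + self.rng.uniform(-0.5, 0.5)
        return self.position


def rollout(seed, n_steps=5):
    """Seed a fresh environment and record the positions it visits."""
    env = NoisyWalkEnv()
    env.seed(seed)
    env.reset()
    return [env.step(1) for _ in range(n_steps)]


print("same seed, same trajectory:", rollout(42) == rollout(42))
print("different seeds differ:    ", rollout(1) != rollout(2))
```

An environment flagged as nondeterministic would fail the first check, because some source of randomness lies outside the seeded generator.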

The Gym library allows us to query some of these attributes from environments.  I created the following function to query gym environments.


In [1]:
import gym

def query_environment(name):
  """Print the key properties of the Gym environment registered as `name`."""
  env = gym.make(name)   # instantiate the environment
  spec = gym.spec(name)  # look up the environment's registration metadata
  print(f"Action Space: {env.action_space}")
  print(f"Observation Space: {env.observation_space}")
  print(f"Max Episode Steps: {spec.max_episode_steps}")
  print(f"Nondeterministic: {spec.nondeterministic}")
  print(f"Reward Range: {env.reward_range}")
  print(f"Reward Threshold: {spec.reward_threshold}")



In [7]:
!pip install git+https://github.com/Kojoley/atari-py.git

Collecting git+https://github.com/Kojoley/atari-py.git
  Cloning https://github.com/Kojoley/atari-py.git to c:\users\yoda\appdata\local\temp\pip-req-build-87ty7yug
Building wheels for collected packages: atari-py
  Building wheel for atari-py (setup.py): started
  Building wheel for atari-py (setup.py): still running...
  Building wheel for atari-py (setup.py): still running...
  Building wheel for atari-py (setup.py): finished with status 'done'
  Created wheel for atari-py: filename=atari_py-1.2.2-cp38-cp38-win_amd64.whl size=551041 sha256=f5ddc0cf424258b3f0fea3cfff92f735c02ef2ab42f99c3479f82a762bb69411
  Stored in directory: C:\Users\yoda\AppData\Local\Temp\pip-ephem-wheel-cache-t9du4tfq\wheels\25\11\0c\d37ea19ecec588ab95c4199a485d3a4de5284e9a08b89c8f3f
Successfully built atari-py
Installing collected packages: atari-py
Successfully installed atari-py-1.2.2


  Running command git clone -q https://github.com/Kojoley/atari-py.git 'C:\Users\yoda\AppData\Local\Temp\pip-req-build-87ty7yug'



In [9]:
!pip install Box2D


Collecting Box2D
  Downloading Box2D-2.3.10-cp38-cp38-win_amd64.whl (1.3 MB)
Installing collected packages: Box2D
Successfully installed Box2D-2.3.10


We will begin by looking at the MountainCar-v0 environment, which challenges an underpowered car to escape the valley between two mountains.  The following code describes the Mountain Car environment.

In [10]:
query_environment("MountainCar-v0")

Action Space: Discrete(3)
Observation Space: Box(-1.2000000476837158, 0.6000000238418579, (2,), float32)
Max Episode Steps: 200
Nondeterministic: False
Reward Range: (-inf, inf)
Reward Threshold: -110.0


There are three distinct actions that can be taken: accelerate to the left, do not accelerate, or accelerate to the right.  The observation space contains two continuous (floating-point) values, as indicated by the Box object: the position and velocity of the car.  The car has 200 steps to escape in each episode.  You would have to look at the code to know, but the mountain car receives no incremental reward for partial progress up the hill.  Every step is penalized equally (a reward of -1) until the car escapes the valley or the episode ends.
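
To see why the car is called underpowered, the dynamics can be sketched in plain Python. The update rule and constants below are reproduced from memory of the classic-control implementation and should be treated as assumptions, not as the library's code. The start position is also fixed at -0.5 for repeatability, whereas the real environment samples it at random. Both policy functions are hypothetical helpers.

```python
import math

# Constants assumed to match the classic-control MountainCar implementation.
FORCE, GRAVITY = 0.001, 0.0025
MIN_POS, MAX_POS, MAX_SPEED, GOAL_POS = -1.2, 0.6, 0.07, 0.5


def step(position, velocity, action):
    """Advance one step; action is 0 (push left), 1 (no push), or 2 (push right)."""
    velocity += (action - 1) * FORCE - GRAVITY * math.cos(3 * position)
    velocity = max(-MAX_SPEED, min(MAX_SPEED, velocity))
    position = max(MIN_POS, min(MAX_POS, position + velocity))
    if position == MIN_POS and velocity < 0:
        velocity = 0.0  # the left wall stops the car
    return position, velocity, position >= GOAL_POS


def run(policy, max_steps=200, start=-0.5):
    """Return (steps taken, whether the car reached the goal)."""
    position, velocity = start, 0.0
    for t in range(1, max_steps + 1):
        action = policy(position, velocity)
        position, velocity, done = step(position, velocity, action)
        if done:
            return t, True
    return max_steps, False


def always_right(position, velocity):
    return 2


def follow_momentum(position, velocity):
    # Push in the direction the car is already moving to build up speed.
    return 2 if velocity >= 0 else 0


print("always right:   ", run(always_right))
print("follow momentum:", run(follow_momentum))
```

Under these assumptions, pushing right on every step never gets the car out, because the engine is weaker than gravity on the steep part of the slope. Rocking back and forth to build momentum does reach the goal within the 200-step limit.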

In [11]:
query_environment("CartPole-v1")

Action Space: Discrete(2)
Observation Space: Box(-3.4028234663852886e+38, 3.4028234663852886e+38, (4,), float32)
Max Episode Steps: 500
Nondeterministic: False
Reward Range: (-inf, inf)
Reward Threshold: 475.0


The CartPole-v1 environment challenges the agent to move a cart while keeping a pole balanced. The environment has an observation space of 4 continuous numbers:

* Cart Position
* Cart Velocity
* Pole Angle
* Pole Velocity At Tip

To achieve this goal, the agent can take the following actions:

* Push cart to the left
* Push cart to the right

There is also a continuous variant of the mountain car.  This version does not simply have the motor on or off.  For the continuous car, the action space is a single floating-point number that specifies how much forward or backward force is being applied.

In [12]:
query_environment("MountainCarContinuous-v0")

Action Space: Box(-1.0, 1.0, (1,), float32)
Observation Space: Box(-1.2000000476837158, 0.6000000238418579, (2,), float32)
Max Episode Steps: 999
Nondeterministic: False
Reward Range: (-inf, inf)
Reward Threshold: 90.0


Note: ignore the warning above; it is a relatively inconsequential bug in OpenAI Gym.

Atari games, like Breakout, can use an observation space that is either equal to the size of the Atari screen (210x160) or even use the RAM of the Atari (128 bytes) to determine the state of the game.  Yes, that's bytes, not kilobytes!

In [13]:
query_environment("Breakout-v0")

Action Space: Discrete(4)
Observation Space: Box(0, 255, (210, 160, 3), uint8)
Max Episode Steps: 10000
Nondeterministic: False
Reward Range: (-inf, inf)
Reward Threshold: None


In [14]:
query_environment("Breakout-ram-v0")

Action Space: Discrete(4)
Observation Space: Box(0, 255, (128,), uint8)
Max Episode Steps: 10000
Nondeterministic: False
Reward Range: (-inf, inf)
Reward Threshold: None
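
The difference between the two Breakout observation spaces reported above is easy to quantify. The short calculation below uses only the shapes printed by the queries and assumes one byte per `uint8` value.

```python
import math

screen_shape = (210, 160, 3)  # height, width, RGB channels; one uint8 per value
ram_shape = (128,)            # one uint8 per byte of Atari RAM

screen_bytes = math.prod(screen_shape)
ram_bytes = math.prod(ram_shape)

print(f"Screen observation: {screen_bytes:,} bytes")
print(f"RAM observation:    {ram_bytes} bytes")
print(f"Ratio:              {screen_bytes / ram_bytes:.1f}x")
```

The screen observation holds several hundred times more values per step than the RAM observation, which matters when choosing the input size of a model.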
