# UPDATED GYM REVIEW

<a href="https://colab.research.google.com/github/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_12_01_ai_gym.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# T81-558: Applications of Deep Neural Networks
**Module 12: Reinforcement Learning**
* Instructor: [Jeff Heaton](https://sites.wustl.edu/jeffheaton/), McKelvey School of Engineering, [Washington University in St. Louis](https://engineering.wustl.edu/Programs/Pages/default.aspx)
* For more information visit the [class website](https://sites.wustl.edu/jeffheaton/t81-558/).

# Module 12 Video Material

* **Part 12.1: Introduction to the OpenAI Gym** [[Video]](https://www.youtube.com/watch?v=_KbUxgyisjM&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_12_01_ai_gym.ipynb)
* Part 12.2: Introduction to Q-Learning [[Video]](https://www.youtube.com/watch?v=A3sYFcJY3lA&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_12_02_qlearningreinforcement.ipynb)
* Part 12.3: Keras Q-Learning in the OpenAI Gym [[Video]](https://www.youtube.com/watch?v=qy1SJmsRhvM&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_12_03_keras_reinforce.ipynb)
* Part 12.4: Atari Games with Keras Neural Networks [[Video]](https://www.youtube.com/watch?v=co0SwPWoZh0&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_12_04_atari.ipynb)
* Part 12.5: Application of Reinforcement Learning [[Video]](https://www.youtube.com/watch?v=1jQPP3RfwMI&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_12_05_apply_rl.ipynb)


# Part 12.1: Introduction to the OpenAI Gym

[OpenAI Gym](https://gym.openai.com/) aims to provide an easy-to-set-up general-intelligence benchmark with various environments. The goal is to standardize how environments are defined in AI research publications to make published research more easily reproducible. The project claims to provide the user with a simple interface. As of June 2017, developers can only use Gym with Python.

OpenAI Gym is installed on your local machine with pip. There are a few significant limitations to be aware of:

* OpenAI Gym Atari only **directly** supports Linux and Macintosh
* OpenAI Gym Atari can be used with Windows; however, it requires a particular [installation procedure](https://towardsdatascience.com/how-to-install-openai-gym-in-a-windows-environment-338969e24d30)
* OpenAI Gym cannot directly render animated games in Google CoLab.

Because OpenAI Gym requires a graphics display, an embedded video is the only way to display Gym in Google CoLab. The presentation of OpenAI Gym game animations in Google CoLab is discussed later in this module.

## OpenAI Gym Leaderboard

OpenAI Gym does have a leaderboard, similar to Kaggle; however, it is much more informal than Kaggle's. The user's local machine performs all scoring. As a result, the OpenAI Gym leaderboard is strictly an "honor system." The leaderboard is maintained in the following GitHub repository:

* [OpenAI Gym Leaderboard](https://github.com/openai/gym/wiki/Leaderboard)

You must provide a write-up with sufficient instructions to reproduce your result if you submit a score. A video of your results is suggested but not required.

## Looking at Gym Environments

The centerpiece of Gym is the environment, which defines the "game" in which your reinforcement learning algorithm will compete. An environment does not need to be a game; however, it describes the following game-like features:
* **action space**: The actions that we can take at each step to alter the environment.
* **observation space**: The current state of the portion of the environment that we can observe. Usually, we can see the entire environment.

Before we begin to look at Gym, it is essential to understand some of the terminology used by this library.

* **Agent** - The machine learning program or model that controls the actions.
* **Step** - One round of issuing actions that affect the observation space.
* **Episode** - A collection of steps that terminates when the agent fails to meet the environment's objective or the episode reaches the maximum number of allowed steps.
* **Render** - Gym can render one frame for display after each step.
* **Reward** - A positive reinforcement that can occur at the end of each step, after the agent acts.
* **Non-deterministic** - For some environments, randomness is a factor in deciding what effects actions have on reward and changes to the observation space.
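To make these terms concrete, here is a minimal sketch of an agent playing one episode. `ToyCorridorEnv`, `run_episode`, and `always_right` are hypothetical names invented for this illustration; they are not part of Gym, although `reset` and `step` are shaped like the calls used later in this notebook.

```python
class ToyCorridorEnv:
    """Hypothetical stand-in for a Gym environment: walk right along a corridor."""

    def __init__(self, length=5, max_steps=20):
        self.length = length        # the goal sits at position length - 1
        self.max_steps = max_steps  # the episode is cut off after this many steps
        self.position = 0
        self.steps = 0

    def reset(self):
        """Start a new episode and return the first observation plus an info dict."""
        self.position = 0
        self.steps = 0
        return self.position, {}

    def step(self, action):
        """Apply one action (0 = left, 1 = right) and advance one step."""
        move = 1 if action == 1 else -1
        self.position = max(0, min(self.length - 1, self.position + move))
        self.steps += 1
        terminated = self.position == self.length - 1                # objective reached
        truncated = not terminated and self.steps >= self.max_steps  # out of steps
        reward = 1.0 if terminated else 0.0
        return self.position, reward, terminated, truncated, {}


def run_episode(env, agent):
    """Play one episode and return (total_reward, number_of_steps)."""
    observation, info = env.reset()
    total_reward, steps = 0.0, 0
    while True:
        action = agent(observation)  # the agent chooses an action
        observation, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        steps += 1
        if terminated or truncated:  # the episode is over
            return total_reward, steps


def always_right(observation):
    """A trivial agent that ignores the observation and always moves right."""
    return 1


total, steps = run_episode(ToyCorridorEnv(), always_right)
print(f"reward={total}, steps={steps}")
```

An agent that always moves left never reaches the goal, so its episode ends only when the step limit is reached.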

It is important to note that many Gym environments report that they are not non-deterministic even though they use random numbers to process actions. Based on the Gym GitHub issue tracker, the non-deterministic property means that an environment still behaves randomly even when you give it a consistent seed value. A program can seed the environment's random number generator with the environment's seed method (in recent versions, by passing `seed` to `reset`).
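The following sketch shows why a seeded environment can use random numbers and still be reproducible. It is only an illustration of that idea under my reading of the paragraph above: `noisy_rollout` is a hypothetical function, and the private `random.Random` generator stands in for an environment's own random number generator.

```python
import random


def noisy_rollout(seed, steps=5):
    """Hypothetical dynamics: a random walk driven by a private, seeded generator."""
    rng = random.Random(seed)  # plays the role of the environment's own RNG
    position = 0.0
    trajectory = []
    for _ in range(steps):
        position += rng.uniform(-1.0, 1.0)  # random effect of each action
        trajectory.append(position)
    return trajectory


same_a = noisy_rollout(seed=42)
same_b = noisy_rollout(seed=42)
other = noisy_rollout(seed=7)

print(same_a == same_b)  # identical seed, identical trajectory
print(same_a == other)   # different seed, different trajectory
```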

The Gym library allows us to query some of these attributes from environments. I created the following function to query gym environments.


# INSTALL DEPENDENCIES

In [1]:
# # HIDE OUTPUT
# # !apt-get update > /dev/null 2>&1
# # !apt-get install cmake > /dev/null 2>&1
# !apt-get install -y xvfb python-opengl ffmpeg > /dev/null 2>&1
# # !pip install --upgrade setuptools 2>&1
# !pip install ffmpeg
# !pip install imageio-ffmpeg
# !pip install wheel
# !pip install ez_setup > /dev/null 2>&1
# !pip install pyglet==1.5.27
# !pip install -U 'mujoco-py<2.2,>=2.1'
# !pip install pyvirtualdisplay > /dev/null 2>&1
# # !pip install gym[atari] > /dev/null 2>&1
# # !pip install gym[atari,accept-rom-license]==0.21.0 > /dev/null 2>&1
# !pip install gym[atari,accept-rom-license,classic_control] > /dev/null 2>&1
# !pip install atari-py==0.2.5
# # pip install gym[classic_control]

# !pip install gymnasium[classic_control,box2d]

In [2]:
# # # HIDE OUTPUT
# !wget http://www.atarimania.com/roms/Roms.rar 
# !unrar x -o+ /content/Roms.rar >/dev/null
# !python -m atari_py.import_roms /content/ROMS >/dev/null

In [3]:
# !pip install autorom
# !pip install --upgrade gym[atari]

In [4]:
import gym
import ale_py

print('gym:', gym.__version__)
print('ale_py:', ale_py.__version__)

env = gym.make('Breakout-v4')

gym: 0.26.2
ale_py: 0.8.1


A.L.E: Arcade Learning Environment (version 0.8.1+53f58b7)
[Powered by Stella]


In [5]:
import gymnasium as gym2

gym2.__version__

'0.29.1'

In [6]:
import gym


def query_environment(name):
    env = gym.make(name)
    spec = gym.spec(name)
    print(f"Action Space: {env.action_space}")
    print(f"Observation Space: {env.observation_space}")
    print(f"Max Episode Steps: {spec.max_episode_steps}")
    print(f"Nondeterministic: {spec.nondeterministic}")
    print(f"Reward Range: {env.reward_range}")
    print(f"Reward Threshold: {spec.reward_threshold}")


In [7]:
all_envs = gym.envs.registry.values() # newer versions (registry is a dict)
# all_envs = gym.envs.registry.all() # older versions

env_ids = [env_spec.id for env_spec in all_envs]
print("\n".join(sorted(env_ids)))

ALE/Adventure-ram-v5
ALE/Adventure-v5
ALE/AirRaid-ram-v5
ALE/AirRaid-v5
ALE/Alien-ram-v5
ALE/Alien-v5
ALE/Amidar-ram-v5
ALE/Amidar-v5
ALE/Assault-ram-v5
ALE/Assault-v5
ALE/Asterix-ram-v5
ALE/Asterix-v5
ALE/Asteroids-ram-v5
ALE/Asteroids-v5
ALE/Atlantis-ram-v5
ALE/Atlantis-v5
ALE/Atlantis2-ram-v5
ALE/Atlantis2-v5
ALE/Backgammon-ram-v5
ALE/Backgammon-v5
ALE/BankHeist-ram-v5
ALE/BankHeist-v5
ALE/BasicMath-ram-v5
ALE/BasicMath-v5
ALE/BattleZone-ram-v5
ALE/BattleZone-v5
ALE/BeamRider-ram-v5
ALE/BeamRider-v5
ALE/Berzerk-ram-v5
ALE/Berzerk-v5
ALE/Blackjack-ram-v5
ALE/Blackjack-v5
ALE/Bowling-ram-v5
ALE/Bowling-v5
ALE/Boxing-ram-v5
ALE/Boxing-v5
ALE/Breakout-ram-v5
ALE/Breakout-v5
ALE/Carnival-ram-v5
ALE/Carnival-v5
ALE/Casino-ram-v5
ALE/Casino-v5
ALE/Centipede-ram-v5
ALE/Centipede-v5
ALE/ChopperCommand-ram-v5
ALE/ChopperCommand-v5
ALE/CrazyClimber-ram-v5
ALE/CrazyClimber-v5
ALE/Crossbow-ram-v5
ALE/Crossbow-v5
ALE/Darkchambers-ram-v5
ALE/Darkchambers-v5
ALE/Defender-ram-v5
ALE/Defender-v5
ALE/
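If the same notebook has to run against both registry styles, a small helper can choose the right call. This is a sketch rather than anything Gym provides: `list_env_ids` is a hypothetical helper, and the two registries below are stand-ins built only to exercise it.

```python
from types import SimpleNamespace


def list_env_ids(registry):
    """Return sorted environment ids from either registry style."""
    if hasattr(registry, "all"):
        specs = registry.all()     # older style: registry object with .all()
    else:
        specs = registry.values()  # newer style: dict mapping id -> spec
    return sorted(spec.id for spec in specs)


# Hypothetical stand-ins for the two registry shapes (not real Gym objects)
spec_a = SimpleNamespace(id="MountainCar-v0")
spec_b = SimpleNamespace(id="CartPole-v1")
new_style = {spec_a.id: spec_a, spec_b.id: spec_b}
old_style = SimpleNamespace(all=lambda: [spec_a, spec_b])

print(list_env_ids(new_style))
print(list_env_ids(old_style))
```

With a real installation, the call would be `list_env_ids(gym.envs.registry)`.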

We will look at the **MountainCar-v0** environment, which challenges an underpowered car to escape the valley between two mountains. The following code describes the Mountain Car environment.

In [8]:
query_environment("MountainCar-v0")

Action Space: Discrete(3)
Observation Space: Box([-1.2  -0.07], [0.6  0.07], (2,), float32)
Max Episode Steps: 200
Nondeterministic: False
Reward Range: (-inf, inf)
Reward Threshold: -110.0


This environment allows three distinct actions: accelerate to the left, do not accelerate, or accelerate to the right. The observation space contains two continuous (floating-point) values, as evidenced by the Box object: the position and velocity of the car. The car has 200 steps to escape in each episode. You would have to look at the code to confirm it, but the mountain car receives no positive incremental reward: every step costs -1 until the car escapes the valley, so the only way to score better is to escape sooner.
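To see how the actions, the observation, and the reward fit together, here is a sketch of a single mountain car update. The constants and the update rule are my recollection of Gym's classic-control source, so treat them as assumptions; as far as I recall, every step returns a reward of -1. `mountain_car_step` is a hypothetical helper, not a Gym function.

```python
import math

# Constants recalled from Gym's MountainCar source; treat them as assumptions.
MIN_POSITION, MAX_POSITION = -1.2, 0.6
MAX_SPEED = 0.07
GOAL_POSITION = 0.5
FORCE = 0.001
GRAVITY = 0.0025


def mountain_car_step(position, velocity, action):
    """Advance one step; action is 0 (push left), 1 (no push) or 2 (push right)."""
    velocity += (action - 1) * FORCE - GRAVITY * math.cos(3 * position)
    velocity = max(-MAX_SPEED, min(MAX_SPEED, velocity))
    position += velocity
    position = max(MIN_POSITION, min(MAX_POSITION, position))
    if position == MIN_POSITION and velocity < 0:
        velocity = 0.0  # the car stops dead at the left wall
    terminated = position >= GOAL_POSITION
    reward = -1.0       # every step costs one point
    return position, velocity, reward, terminated


# Push right on every step for one 200-step episode
position, velocity = -0.5, 0.0
total_reward = 0.0
for _ in range(200):
    position, velocity, reward, terminated = mountain_car_step(position, velocity, 2)
    total_reward += reward
    if terminated:
        break
print(f"total reward: {total_reward}, final position: {position:.3f}")
```

Because the engine is weaker than gravity on the steep part of the slope, an agent usually has to rock back and forth to build momentum instead of pushing in one direction.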

In [9]:
query_environment("CartPole-v1")

Action Space: Discrete(2)
Observation Space: Box([-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38], [4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38], (4,), float32)
Max Episode Steps: 500
Nondeterministic: False
Reward Range: (-inf, inf)
Reward Threshold: 475.0


The **CartPole-v1** environment challenges the agent to balance a pole on a cart while the agent moves the cart. The environment has an observation space of 4 continuous numbers:

* Cart Position
* Cart Velocity
* Pole Angle
* Pole Velocity At Tip

To achieve this goal, the agent can take the following actions:

* Push cart to the left
* Push cart to the right
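The position and angle bounds printed above (about 4.8 and 0.419) make more sense next to the failure condition. The sketch below assumes the thresholds I recall from Gym's CartPole source: the episode ends when the cart moves more than 2.4 units from the center or the pole tilts more than 12 degrees. `cartpole_failed` is a hypothetical helper.

```python
import math

# Thresholds recalled from Gym's CartPole source; treat them as assumptions.
X_THRESHOLD = 2.4                         # cart position limit
THETA_THRESHOLD = 12 * 2 * math.pi / 360  # pole angle limit: 12 degrees in radians


def cartpole_failed(cart_position, pole_angle):
    """Return True when the cart leaves the track or the pole tips too far."""
    return abs(cart_position) > X_THRESHOLD or abs(pole_angle) > THETA_THRESHOLD


# Under these assumptions the observation bounds are twice the failure thresholds.
print(2 * X_THRESHOLD, round(2 * THETA_THRESHOLD, 7))
print(cartpole_failed(0.0, 0.05), cartpole_failed(2.5, 0.0))
```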

There is also a continuous variant of the mountain car. This version does not simply have the motor on or off. Instead, the action space is a single floating-point number that specifies how much forward or backward force the car applies.

In [10]:
query_environment("MountainCarContinuous-v0")

Action Space: Box(-1.0, 1.0, (1,), float32)
Observation Space: Box([-1.2  -0.07], [0.6  0.07], (2,), float32)
Max Episode Steps: 999
Nondeterministic: False
Reward Range: (-inf, inf)
Reward Threshold: 90.0


Note: If you see a warning above, you can safely ignore it; it is a relatively minor bug in OpenAI Gym.

Atari games, like Breakout, can use an observation space that is either equal to the size of the Atari screen (210x160 pixels) or even use the RAM of the Atari (128 bytes) to determine the state of the game. Yes, that's bytes, not kilobytes!
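A quick calculation shows how different the two observation sizes are. The sketch assumes three color channels per pixel with one byte each.

```python
# Sizes taken from the text: a 210x160 screen versus 128 bytes of RAM.
# Three color channels (RGB) per pixel is an assumption of this sketch.
SCREEN_HEIGHT, SCREEN_WIDTH, COLOR_CHANNELS = 210, 160, 3
RAM_BYTES = 128

screen_values = SCREEN_HEIGHT * SCREEN_WIDTH * COLOR_CHANNELS  # one byte per value
print(f"screen observation: {screen_values} values")
print(f"RAM observation:    {RAM_BYTES} values")
print(f"ratio:              {screen_values / RAM_BYTES}")
```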

In [11]:
query_environment("Breakout-v4")

Action Space: Discrete(4)
Observation Space: Box(0, 255, (210, 160, 3), uint8)
Max Episode Steps: None
Nondeterministic: False
Reward Range: (-inf, inf)
Reward Threshold: None


In [12]:
query_environment("Breakout-ram-v4")

Action Space: Discrete(4)
Observation Space: Box(0, 255, (128,), uint8)
Max Episode Steps: None
Nondeterministic: False
Reward Range: (-inf, inf)
Reward Threshold: None


## Render OpenAI Gym Environments from CoLab

It is possible to visualize the game your agent is playing, even on CoLab. This section provides information on generating a video in CoLab that shows you an episode of the game your agent is playing. I based this video process on suggestions found [here](https://colab.research.google.com/drive/1flu31ulJlgiRL1dnN2ir8wGh9p7Zij2t).

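Begin by importing the packages needed for recording and display. **pyvirtualdisplay** and **python-opengl** must already be installed; the install commands appear in the first cell.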

In [13]:
import gym
from gym.wrappers.monitoring.video_recorder import VideoRecorder
import glob
import io
import base64
from IPython.display import HTML
from pyvirtualdisplay import Display
from IPython import display as ipythondisplay
from typing import Optional

import os
os.environ['PYVIRTUALDISPLAY_DISPLAYFD'] = '0'


In [14]:
display = Display(visible=0, size=(1400, 900))

In [15]:
display.start()

"""
Utility functions to enable video recording of gym environment 
and displaying it.
To enable video, just do "env = wrap_env(env)"
"""

'\nUtility functions to enable video recording of gym environment \nand displaying it.\nTo enable video, just do "env = wrap_env(env)"\n'

In [16]:

def show_video(filename: Optional[str] = None):
    # Fall back to the first MP4 in the working directory when no file is given
    if filename is None:
        mp4list = glob.glob('./*.mp4')
        if len(mp4list) == 0:
            print("Could not find video")
            return
        filename = mp4list[0]

    # Embed the video in the notebook as a base64 data URI
    with open(filename, 'rb') as video_file:
        encoded = base64.b64encode(video_file.read())
    ipythondisplay.display(HTML(data='''<video alt="test" autoplay 
              loop controls style="height: 400px;">
              <source src="data:video/mp4;base64,{0}" type="video/mp4" />
            </video>'''.format(encoded.decode('ascii'))))
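The embedding step can be tried without a notebook or a real recording. The sketch below builds the same kind of HTML string from raw bytes; `video_tag` is a hypothetical helper, and the placeholder bytes stand in for the contents of an MP4 file.

```python
import base64


def video_tag(video_bytes, height_px=400):
    """Build an HTML <video> element with the MP4 data embedded as base64."""
    encoded = base64.b64encode(video_bytes).decode("ascii")
    return (
        f'<video autoplay loop controls style="height: {height_px}px;">'
        f'<source src="data:video/mp4;base64,{encoded}" type="video/mp4" />'
        "</video>"
    )


# Placeholder bytes stand in for a real MP4 file read with open(path, "rb").
html = video_tag(b"not a real video")
print(html[:60])
```

Because the whole file travels inside the page as text, this approach suits short clips better than long recordings.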


Now we are ready to play the game.  We use a simple random agent.

In [17]:
import time
timestr = f"./video_{time.strftime('%Y%m%d_%H%M%S')}.mp4"

# env = gym.make("MountainCar-v0")
# env = gym.make("Pendulum-v1")
# env = gym.make("CartPole-v0")
# env = gym.make("Atlantis-v4", render_mode="rgb_array")
env = gym.make("Blackjack-v1",render_mode='rgb_array')

video_recorder = None
video_recorder = VideoRecorder(env=env,
                               path=timestr,
                               enabled=True)
observation, info = env.reset()  # gym 0.26+ returns (observation, info)
i = 0
while True:

    # env.unwrapped.render(mode="rgb_array")
    env.render()
    video_recorder.capture_frame()

    # your agent goes here
    action = env.action_space.sample()

    # observation, reward, done, truncated, info = env.step(action)
    observation, reward, terminated, truncated, info = env.step(action)

    if terminated or truncated:
        print("Exit by episode done!")
        break
    i += 1
    if i > 1000:
        print("Exit by max steps!")
        break

print("Saved video.")
video_recorder.close()
video_recorder.enabled = False

# env.close()
show_video(timestr)

Exit by episode done!
Saved video.
Moviepy - Building video ./video_20240614_203539.mp4.
Moviepy - Writing video ./video_20240614_203539.mp4



                                                  

Moviepy - Done !
Moviepy - video ready ./video_20240614_203539.mp4
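The commented-out line in the loop above hints at the two shapes a step result can take: older Gym releases return four values with a single `done` flag, while Gym 0.26 and later return five values with separate `terminated` and `truncated` flags. Here is a sketch of a loop that tolerates both; `unpack_step` and `CountdownEnv` are hypothetical names made up for this example.

```python
def unpack_step(result):
    """Normalize a step result to (observation, reward, terminated, truncated, info)."""
    if len(result) == 5:                      # newer API: already five values
        return tuple(result)
    observation, reward, done, info = result  # older API: single done flag
    return observation, reward, done, False, info


class CountdownEnv:
    """Hypothetical old-style environment that ends after a fixed number of steps."""

    def __init__(self, steps):
        self.remaining = steps

    def step(self, action):
        self.remaining -= 1
        return self.remaining, 1.0, self.remaining == 0, {}


env = CountdownEnv(steps=3)
total_reward = 0.0
while True:
    observation, reward, terminated, truncated, info = unpack_step(env.step(0))
    total_reward += reward
    if terminated or truncated:  # stop on either signal
        break
print(total_reward)
```

Checking both flags matters because an episode that hits its step limit sets `truncated` without ever setting `terminated`.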




In [18]:
glob.glob('*.mp4')

['video_20240614_202145.mp4',
 'video_20240614_203539.mp4',
 'video_20240614_202458.mp4',
 'video_20240614_202358.mp4',
 'video_20240614_202843.mp4',
 'video_20240614_202406.mp4']

# GYMNASIUM as new GYM project

In [19]:
import gymnasium as gym2
# from gymnasium.wrappers.monitoring.video_recorder import VideoRecorder as VideoRecorder2

def query_environment2(name):
    env = gym2.make(name)
    spec = gym2.spec(name)
    print(f"Action Space: {env.action_space}")
    print(f"Observation Space: {env.observation_space}")
    print(f"Max Episode Steps: {spec.max_episode_steps}")
    print(f"Nondeterministic: {spec.nondeterministic}")
    print(f"Reward Range: {env.reward_range}")
    print(f"Reward Threshold: {spec.reward_threshold}")

In [20]:
all_envs = gym2.envs.registry.values()

env_ids = [env_spec.id for env_spec in all_envs]
print("\n".join(sorted(env_ids)))

ALE/Adventure-ram-v5
ALE/Adventure-v5
ALE/AirRaid-ram-v5
ALE/AirRaid-v5
ALE/Alien-ram-v5
ALE/Alien-v5
ALE/Amidar-ram-v5
ALE/Amidar-v5
ALE/Assault-ram-v5
ALE/Assault-v5
ALE/Asterix-ram-v5
ALE/Asterix-v5
ALE/Asteroids-ram-v5
ALE/Asteroids-v5
ALE/Atlantis-ram-v5
ALE/Atlantis-v5
ALE/Atlantis2-ram-v5
ALE/Atlantis2-v5
ALE/Backgammon-ram-v5
ALE/Backgammon-v5
ALE/BankHeist-ram-v5
ALE/BankHeist-v5
ALE/BasicMath-ram-v5
ALE/BasicMath-v5
ALE/BattleZone-ram-v5
ALE/BattleZone-v5
ALE/BeamRider-ram-v5
ALE/BeamRider-v5
ALE/Berzerk-ram-v5
ALE/Berzerk-v5
ALE/Blackjack-ram-v5
ALE/Blackjack-v5
ALE/Bowling-ram-v5
ALE/Bowling-v5
ALE/Boxing-ram-v5
ALE/Boxing-v5
ALE/Breakout-ram-v5
ALE/Breakout-v5
ALE/Carnival-ram-v5
ALE/Carnival-v5
ALE/Casino-ram-v5
ALE/Casino-v5
ALE/Centipede-ram-v5
ALE/Centipede-v5
ALE/ChopperCommand-ram-v5
ALE/ChopperCommand-v5
ALE/CrazyClimber-ram-v5
ALE/CrazyClimber-v5
ALE/Crossbow-ram-v5
ALE/Crossbow-v5
ALE/Darkchambers-ram-v5
ALE/Darkchambers-v5
ALE/Defender-ram-v5
ALE/Defender-v5
ALE/

In [21]:
import time
timestr = f"./video_{time.strftime('%Y%m%d_%H%M%S')}.mp4"

env = gym2.make("MountainCar-v0", render_mode="rgb_array")

video_recorder2 = None
video_recorder2 = VideoRecorder(env=env,
                                 path=timestr,
                                 enabled=True)

observation, info = env.reset(seed=42)
total_rewards = 0
for _ in range(1000):

  env.unwrapped.render()
  video_recorder2.capture_frame()
  
  action = env.action_space.sample()  # this is where you would insert your policy
  observation, reward, terminated, truncated, info = env.step(action)
  total_rewards += reward
  # print(f".....STEP {_}.....")
  # print(observation)
  # print(reward)
  # print(terminated)
  # print(info)
  if terminated or truncated:
    break

print(f"Total rewards: {total_rewards}")

print("Saved video.")
video_recorder2.close()
video_recorder2.enabled = False

show_video(timestr)

Total rewards: -200.0
Saved video.
Moviepy - Building video ./video_20240614_203539.mp4.
Moviepy - Writing video ./video_20240614_203539.mp4



                                                               

Moviepy - Done !
Moviepy - video ready ./video_20240614_203539.mp4


In [22]:
observation, info = env.reset()

In [23]:
observation

array([-0.5122243,  0.       ], dtype=float32)

In [25]:
import mujoco_py



Exception: 
You appear to be missing MuJoCo.  We expected to find the file here: /home/vic_263/.mujoco/mujoco210

This package only provides python bindings, the library must be installed separately.

Please follow the instructions on the README to install MuJoCo

    https://github.com/openai/mujoco-py#install-mujoco

Which can be downloaded from the website

    https://www.roboti.us/index.html


In [26]:
env = gym2.make("Humanoid-v4")

Exception: 
Missing path to your environment variable. 
Current values LD_LIBRARY_PATH=/home/vic_263/anaconda3/envs/ia/lib/python3.10/site-packages/cv2/../../lib64:/usr/local/cuda/lib64
Please add following line to .bashrc:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/vic_263/.mujoco/mujoco210/bin

In [None]:
query_environment2("Taxi-v3")

In [None]:
env = gym2.make("Taxi-v3")

In [None]:
env.reset()

In [None]:
env.step(1)

In [None]:
action = env.action_space.sample()
action