![what is RL](figs/what_is_rl.jpeg)

# Introduction to Reinforcement Learning with OpenAI Gym
## About Gym
* `Gym` is an open-source Python library for developing and comparing `reinforcement learning` algorithms.
* It provides a standard API to communicate between the learning algorithms and environments.
* It also provides a standard set of environments compliant with that API.
* The `Gym` documentation is available at [https://www.gymlibrary.dev/](https://www.gymlibrary.dev/).
* `Gym` also has a Discord server for development discussion, which you can join at [https://discord.gg/nHg2JRN489](https://discord.gg/nHg2JRN489).
* `Gym`'s official development repository is at [https://github.com/openai/gym](https://github.com/openai/gym).


![gymlibrary.dev](figs/gymlibrary.dev.jpeg)

## Gym environment categories
1. Atari
2. MuJoCo
3. Toy Text
4. Classic Control
5. Box2D
6. Third party environments
7. Create your own!

## 1. Atari Gym environments
* A set of Atari 2600 environments simulated through Stella and the Arcade Learning Environment (ALE).
* List of Atari 2600 games: [https://en.wikipedia.org/wiki/List_of_Atari_2600_games](https://en.wikipedia.org/wiki/List_of_Atari_2600_games)
* If you are looking for ROMs to revisit the past, see [http://www.atarimania.com/rom_collection_archive_atari_2600_roms.html](http://www.atarimania.com/rom_collection_archive_atari_2600_roms.html)

![Atari-2600](figs/Atari.jpeg)

## 2. MuJoCo Gym environments
* MuJoCo stands for **Mu**lti-**Jo**int dynamics with **Co**ntact. 
* It is a physics engine for facilitating research and development in robotics, biomechanics, graphics and animation, and other areas where fast and accurate simulation is needed.
* The unique dependencies for this set of environments can be installed via: `pip install gym[mujoco]`

![Mujoco](figs/mujoco.jpeg)

## 3. Toy Text Gym environments
* All toy text environments were created by the Gym developers using native Python libraries such as StringIO.
* These environments are designed to be extremely simple, with small discrete state and action spaces, and hence easy to learn. 
* As a result, they are suitable for debugging implementations of reinforcement learning algorithms.

![toytext](figs/toytext.jpeg)

## 4. Classic Control Gym environments
* The unique dependencies for this set of environments can be installed via: `pip install gym[classic_control]`
* There are five classic control environments: `Acrobot`, `CartPole`, `Mountain Car`, `Continuous Mountain Car`, and `Pendulum`. 
* All of these environments are stochastic in terms of their initial state, within a given range. 
* In addition, `Acrobot` has noise applied to the taken action. 
* In both `mountain car` environments, the car is underpowered and cannot climb the mountain directly, so it takes some effort to reach the top.
* Among Gym environments, this set is considered one of the easier ones for a `policy` to solve.

![classic-control](figs/classic-control2.jpeg)

## 5. Box2D Gym environments
* These environments all involve toy games based around physics control, using Box2D-based physics and PyGame-based rendering. 
* These environments were contributed back in the early days of Gym by Oleg Klimov, and have become popular toy benchmarks ever since. 
* All environments are highly configurable via arguments specified in each environment’s documentation.
* The unique dependencies for this set of environments can be installed via: `pip install gym[box2d]`

![box2d](figs/box2d.jpeg)

## 6. Third party Gym environments
* The Gym documentation also lists a good number of third-party environments. For example:
* `ViZDoom`: An environment centered around the original Doom game.
    - It focuses on visual control (from image to actions) at thousands of frames per second.
* `MineRL`: A Gym interface to the game Minecraft, focused on a specific sparse-reward challenge.
* `GymGo`: The board game Go, also known as Weiqi, the game that was famously conquered by AlphaGo.
* `gym-gazebo`: An extension of the initial OpenAI Gym for robotics using ROS and Gazebo, an advanced 3D modeling and rendering tool.
* `PettingZoo`: A Python library for conducting research in multi-agent reinforcement learning, akin to a multi-agent version of Gym.
* `gym-recsys`: An OpenAI Gym interface for creating simulation environments for reinforcement-learning-based recommender systems (RL-RecSys). The design strives for simple and flexible APIs to support novel research.
* `math-prog-synth-env`: From the paper “A Reinforcement Learning Environment for Mathematical Reasoning via Program Synthesis”, which converts the DeepMind Mathematics Dataset into an RL environment based around program synthesis.
* `NLPGym`: A toolkit for developing RL agents that solve NLP tasks. NLPGym provides interactive environments for standard NLP tasks such as sequence tagging, question answering, and sequence classification. Users can easily customize the tasks with their own datasets, observations, featurizers and reward functions.
* ... and many more.
* The detailed list is available at [https://www.gymlibrary.dev/environments/third_party_environments/](https://www.gymlibrary.dev/environments/third_party_environments/)

## 7. Create your own Gym environments
* If you want to build an environment of your own that behaves the same way as all other Gym environments, Gym makes that straightforward.
    - A short tutorial is available at [https://www.gymlibrary.dev/content/environment_creation/](https://www.gymlibrary.dev/content/environment_creation/)
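
To give a feel for what such an environment looks like, here is a minimal sketch. `CorridorEnv` and all of its parameters are hypothetical names made up for this illustration. It deliberately does not subclass `gym.Env`, so it runs without Gym installed; it only mirrors the `reset()` and `step()` return shapes used elsewhere in this notebook. A real custom environment would follow the linked tutorial, subclass `gym.Env`, and declare its `action_space` and `observation_space`.

```python
class CorridorEnv:
    """Toy 1-D corridor: start in cell 0 and walk right to the last cell.

    Hypothetical illustration only. It imitates the reset()/step()
    signatures of a Gym environment but is NOT a gym.Env subclass.
    """

    def __init__(self, length=5, max_steps=20):
        self.length = length        # number of cells in the corridor
        self.max_steps = max_steps  # time limit before the episode is truncated
        self._pos = 0
        self._steps = 0

    def reset(self, seed=None):
        # seed is accepted to match the usual call shape; this toy is
        # deterministic, so the value is not used.
        self._pos = 0
        self._steps = 0
        return self._pos, {}        # (observation, info)

    def step(self, action):
        # action 1 moves right, any other action moves left
        move = 1 if action == 1 else -1
        self._pos = min(max(self._pos + move, 0), self.length - 1)
        self._steps += 1
        terminated = self._pos == self.length - 1            # goal reached
        truncated = (not terminated) and self._steps >= self.max_steps
        reward = 1.0 if terminated else 0.0
        return self._pos, reward, terminated, truncated, {}


env = CorridorEnv(length=5)
observation, info = env.reset(seed=0)
total_reward = 0.0
for _ in range(4):                  # always move right
    observation, reward, terminated, truncated, info = env.step(1)
    total_reward += reward
print(observation, total_reward, terminated)  # 4 1.0 True
```

The split between `terminated` (the task ended) and `truncated` (the time limit was hit) follows the five-value `step()` return used in the LunarLander example later in this notebook.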

![custom-gym-env](figs/custom-gym-env.jpeg)

## Installation
* `pip install gym`

This does not include the dependencies for all families of environments (there is a large number of them, and some can be problematic to install on certain systems). You can install the dependencies for one family with, for example, `pip install gym[atari]`, or use the following to install all dependencies:
* `pip install gym[all]`

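If the above `pip install` fails in zsh with a `no matches found` error, the cause is that zsh interprets the square brackets as a glob pattern. Either quote the package name, as in `pip install "gym[all]"`, or disable the no-match error with:
* `setopt no_nomatch`

Then run the install command again.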

## The Imports

In [13]:
#!setopt no_nomatch
#!pip install gym[pong]

zsh:1: no matches found: gym[pong]


In [2]:
import gym
import math
import imageio.v2 as imageio
import os
import numpy as np
from tqdm import tqdm
import matplotlib.pyplot as plt
from joblib import load, dump
import pandas as pd

## Listing available `gym` environments

In [None]:
#!/Users/ashis/venv-directory/venv-ml-p3.10/bin/python3.10
# Make this Python file executable and run it directly, without passing it
# to the Python interpreter: the interpreter listed on the first line will
# be invoked.
#$ chmod +x list_all_envs_registry.py
#$ ./list_all_envs_registry.py

from gym import envs
# Older Gym releases exposed the registry through registry.all():
#from pprint import pprint
#all_envs = envs.registry.all()
#env_ids = [env_spec.id for env_spec in all_envs]
#pprint(sorted(env_ids))
for key in envs.registry.keys():
    print(key)


In [4]:
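# Note: the file has no header row, so pandas uses its first line
# ('ALE/Adventure-v5') as the column name in the output below.
list_keys = pd.read_csv('../list_all_envs_registry.txt')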
print('First 10 environment keys:')
print(list_keys.head(n=10))
print('...')
print('Last 10 environment keys:')
print(list_keys.tail(n=10))

First 10 environment keys:
       ALE/Adventure-v5
0  ALE/Adventure-ram-v5
1        ALE/AirRaid-v5
2    ALE/AirRaid-ram-v5
3          ALE/Alien-v5
4      ALE/Alien-ram-v5
5         ALE/Amidar-v5
6     ALE/Amidar-ram-v5
7        ALE/Assault-v5
8    ALE/Assault-ram-v5
9        ALE/Asterix-v5
...
Last 10 environment keys:
       ALE/Adventure-v5
985         Walker2d-v3
986         Walker2d-v4
987              Ant-v2
988              Ant-v3
989              Ant-v4
990         Humanoid-v2
991         Humanoid-v3
992         Humanoid-v4
993  HumanoidStandup-v2
994  HumanoidStandup-v4
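
The IDs listed above appear to share one layout: an optional namespace followed by `/`, a name, and a `-v<number>` version suffix. As a small sketch under that assumption, the helper below (the names `parse_env_id` and `ENV_ID_PATTERN` are hypothetical, not part of Gym) splits an ID into its parts and picks the highest version seen for each environment. The sample IDs are copied from the output above.

```python
import re

# Assumed layout: [namespace/]name-v<version>
ENV_ID_PATTERN = re.compile(
    r"^(?:(?P<namespace>[^/]+)/)?(?P<name>.+)-v(?P<version>\d+)$"
)


def parse_env_id(env_id):
    """Split an environment ID into (namespace, name, version)."""
    match = ENV_ID_PATTERN.match(env_id)
    if match is None:
        raise ValueError(f"unrecognised environment ID: {env_id!r}")
    return match.group("namespace"), match.group("name"), int(match.group("version"))


sample_ids = [
    "ALE/Adventure-v5",
    "ALE/Adventure-ram-v5",
    "Walker2d-v3",
    "Walker2d-v4",
    "Ant-v2",
    "HumanoidStandup-v4",
]

# Keep the highest version number seen for each (namespace, name) pair.
latest = {}
for env_id in sample_ids:
    namespace, name, version = parse_env_id(env_id)
    key = (namespace, name)
    latest[key] = max(version, latest.get(key, version))

for (namespace, name), version in sorted(latest.items(), key=lambda kv: kv[0][1]):
    prefix = f"{namespace}/" if namespace else ""
    print(f"{prefix}{name}-v{version}")
```

Grouping by name this way makes it easier to see which environments exist in several versions, such as `Walker2d-v3` and `Walker2d-v4`.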


## Interacting with the Environment
Gym implements the classic “agent-environment loop”:

<img src="https://www.gymlibrary.dev/_images/AE_loop_dark.png" style="background-color:black;" width=300> 

Image courtesy: www.gymlibrary.dev

The agent performs some `actions` in the environment (usually by passing some control inputs to the environment, e.g. torque inputs of motors) and observes how the `environment’s state` changes. One such action-observation exchange is referred to as a `timestep`.

The goal in Reinforcement Learning (RL) is to manipulate the `environment` in some specific way. 

For instance, we want the agent to navigate a robot to a specific point in space. 
* If it succeeds in doing this (or makes some progress towards that goal), it will receive a `positive reward` alongside the observation for this `timestep`. 
* The reward may also be negative or 0, if the agent did not yet succeed (or did not make any progress). 
* The agent will then be trained to maximize the reward it accumulates over many timesteps.
* After some timesteps, the environment may enter a terminal state. 
    * For instance, the robot may have crashed! In that case, we want to `reset the environment` to a new initial state. The environment issues a done signal to the agent if it enters such a terminal state. In recent Gym versions this signal is split into two flags, `terminated` and `truncated`, which `env.step` returns in the LunarLander example below.
    * Not all done signals must be triggered by a “catastrophic failure”: sometimes we also want to issue a done signal after a fixed number of timesteps, or if the agent has succeeded in completing some task in the environment.
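
The idea of accumulating reward over timesteps, with done signals marking where one episode ends and the next begins, can be sketched in a few lines. The function name `episode_returns` and the reward values are made up for illustration; in a real run the rewards and done flags would come from the environment at every timestep.

```python
def episode_returns(rewards, dones):
    """Sum rewards per episode; an episode ends where its done flag is True.

    Returns the list of finished-episode totals and the running total of
    the episode that is still in progress.
    """
    returns = []
    running = 0.0
    for reward, done in zip(rewards, dones):
        running += reward
        if done:
            returns.append(running)
            running = 0.0  # the environment would be reset at this point
    return returns, running


# Made-up per-timestep rewards and done flags for seven timesteps.
rewards = [0.0, 1.0, -1.0, 0.0, 0.0, 2.0, 0.5]
dones = [False, False, True, False, False, True, False]

finished, in_progress = episode_returns(rewards, dones)
print(finished)     # [0.0, 2.0]
print(in_progress)  # 0.5
```

Here the first episode ends at the third timestep with a total of 0.0, the second ends at the sixth with a total of 2.0, and the last timestep belongs to an episode that has not finished yet.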




## Agent-Environment loop in `Gym`
* Below is an example of the agent-environment loop in `gym`, using `LunarLander-v2`.

### LunarLander-v2
* Reference: [https://www.gymlibrary.dev/environments/box2d/lunar_lander/](https://www.gymlibrary.dev/environments/box2d/lunar_lander/)

![lunarlander-v2](figs/lunar_landerv2.png)




* This example runs an instance of the `LunarLander-v2` environment for `n` timesteps, choosing a random action at each step.
* Since we pass `render_mode="human"`, you should see a window pop up rendering the environment.
* Save the following in a file named `lunarlanderv2.py`.

![lunarlander-v2](../video/lunarlander-v2-random.gif)

In [None]:
#!/Users/ashis/venv-directory/venv-ml-p3.10/bin/python3.10
# Make this Python file executable and run it directly, without passing it to
# the Python interpreter: the interpreter listed on the first line will be invoked.
#$ chmod +x lunarlanderv2.py
#$ ./lunarlanderv2.py
import gym
from tqdm import tqdm

#number of timesteps
n = 500

#Since we pass render_mode="human", you should see a window pop up rendering the environment.
env = gym.make("LunarLander-v2", render_mode="human")

env.action_space.seed(42)

observation, info = env.reset(seed=42)

for _ in tqdm(range(n)):
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)
    
    if terminated or truncated:
        observation, info = env.reset()
        #break

env.close()

## Action space and Observation (i.e., state) space
* Every environment specifies the format of valid actions by providing an `env.action_space` attribute. 
* Similarly, the format of valid observations is specified by `env.observation_space`. 
* In the example above we sampled random actions via `env.action_space.sample()`. 
* Note that we need to seed the action space separately from the environment to ensure reproducible samples.
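
The last point is easier to see with a toy stand-in. The sketch below assumes that an action space keeps its own random number generator, separate from the environment's, which is why seeding the environment alone does not fix the sampled actions. `ToyDiscrete` is a hypothetical class written for this illustration with only the standard library; it is not Gym's own space class. It uses four actions, matching the discrete action space of `LunarLander-v2`.

```python
import random


class ToyDiscrete:
    """Hypothetical stand-in for a discrete action space with n actions."""

    def __init__(self, n):
        self.n = n
        self._rng = random.Random()  # the space owns its own generator

    def seed(self, seed):
        self._rng.seed(seed)

    def sample(self):
        # draw a random valid action: an integer in [0, n)
        return self._rng.randrange(self.n)

    def contains(self, action):
        return isinstance(action, int) and 0 <= action < self.n


def sample_actions(space_seed, count=5):
    """Create a fresh space, seed it, and draw `count` actions."""
    space = ToyDiscrete(4)
    space.seed(space_seed)
    return [space.sample() for _ in range(count)]


first = sample_actions(42)
second = sample_actions(42)
print(first == second)  # True: same seed, same sequence of actions

space = ToyDiscrete(4)
print(space.contains(3), space.contains(4))  # True False
```

In the LunarLander example, `env.action_space.seed(42)` plays the role of `space.seed(...)` here, while `env.reset(seed=42)` seeds the environment itself.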

# Thanks for your attention