# Introduction to Reinforcement Learning with OpenAI Gym
## About Gym
`Gym` is an open source Python library for developing and comparing `reinforcement learning` algorithms by providing a standard API to communicate between learning algorithms and environments, as well as a standard set of environments compliant with that API. 
* The `Gym` documentation website is [here](https://www.gymlibrary.dev/).
* `Gym` also has a Discord server for development discussions, which you can join [here](https://discord.gg/nHg2JRN489).
* `Gym`'s source code is hosted on GitHub [here](https://github.com/openai/gym).


## Installation
* `pip install gym`

This does not include the dependencies for every family of environments (there are many families, and some can be problematic to install on certain systems). You can install the dependencies for a single family, e.g. `pip install gym[atari]`, or install all of them with:
* `pip install gym[all]`

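If this command fails in zsh with an error such as `no matches found: gym[all]`, it is because zsh treats the square brackets as a glob pattern. Either quote the argument (`pip install 'gym[all]'`) or tell zsh to pass unmatched patterns through unchanged:
* `setopt no_nomatch`

Then run the install command again.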
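To check which optional families are already usable, one option is to probe for the packages they rely on. This is a minimal sketch under stated assumptions: the mapping `EXTRA_MODULES` and the function `available_extras` are hypothetical helpers (not part of `gym`), and the module names (`Box2D` for Box2D environments, `ale_py` for Atari, `mujoco` for MuJoCo) are assumptions that may differ between `gym` releases.

```python
import importlib.util

# Hypothetical mapping from a gym extra to a module it is assumed to install.
# The exact packages behind each extra can differ between gym releases.
EXTRA_MODULES = {
    "box2d": "Box2D",
    "atari": "ale_py",
    "mujoco": "mujoco",
}


def available_extras(mapping=EXTRA_MODULES):
    """Return {extra: bool}, True when the extra's module can be found."""
    return {
        extra: importlib.util.find_spec(module) is not None
        for extra, module in mapping.items()
    }


if __name__ == "__main__":
    for extra, ok in available_extras().items():
        hint = "installed" if ok else f"missing, try: pip install 'gym[{extra}]'"
        print(f"{extra:>7}: {hint}")
```

`importlib.util.find_spec` only locates a top-level module without importing it, so the check is cheap and does not fail when a package is absent.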

## The Imports

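```python
# In a notebook, each "!" command runs in its own subshell, so `!setopt no_nomatch`
# does not carry over to the next "!" line. Quoting the requirement avoids the zsh
# glob error instead. Pong belongs to the Atari family, so the extra is `atari`.
# An earlier, unquoted `!pip install gym[pong]` failed with:
#   zsh:1: no matches found: gym[pong]
#!pip install 'gym[atari]'
```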


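```python
import gym                             # environments and the agent-environment API
import math
import imageio.v2 as imageio           # image reading/writing
import os
import numpy as np
from tqdm import tqdm                  # progress bars for loops
import matplotlib.pyplot as plt
from joblib import load, dump          # persist Python objects to disk
```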

## Listing available `gym` environments

```python
#!/usr/bin/env python3
# Save this cell as list_all_envs_registry.py, make the file executable, and run it
# directly; the interpreter named on the first line will be invoked (point it at
# your virtual environment's python if needed):
#   $ chmod +x list_all_envs_registry.py
#   $ ./list_all_envs_registry.py

from gym import envs

# `envs.registry` maps environment IDs (e.g. "CartPole-v1") to their specs.
# Older gym releases exposed `envs.registry.all()` instead of a dict.
for env_id in sorted(envs.registry.keys()):
    print(env_id)
```
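Environment IDs follow a `Name-vN` pattern, optionally prefixed by a namespace such as `ALE/`. Since the registry listed above is long, a small helper can keep only the newest version of each environment. The functions `parse_env_id` and `latest_versions` below are hypothetical helpers written for illustration, and the regular expression is an assumption about the ID format, not `gym`'s own parser.

```python
import re

# Assumed ID format: optional "namespace/", a name, and an optional "-v<N>" suffix,
# e.g. "CartPole-v1" or "ALE/Pong-v5".
ENV_ID_RE = re.compile(
    r"^(?:(?P<ns>[\w:.-]+)/)?(?P<name>[\w:.-]+?)(?:-v(?P<version>\d+))?$"
)


def parse_env_id(env_id):
    """Split an ID into (namespace, name, version); missing parts become None."""
    match = ENV_ID_RE.match(env_id)
    if match is None:
        raise ValueError(f"Malformed environment ID: {env_id!r}")
    version = match.group("version")
    return match.group("ns"), match.group("name"), int(version) if version else None


def latest_versions(env_ids):
    """Map (namespace, name) -> highest version seen; unversioned IDs are skipped."""
    latest = {}
    for env_id in env_ids:
        ns, name, version = parse_env_id(env_id)
        key = (ns, name)
        if version is not None and version > latest.get(key, -1):
            latest[key] = version
    return latest
```

For example, `latest_versions(["CartPole-v0", "CartPole-v1", "ALE/Pong-v5"])` keeps only version 1 of `CartPole` and version 5 of `ALE/Pong`; the same call could be applied to `envs.registry.keys()`.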

## Interacting with the Environment
Gym implements the classic “agent-environment loop”:

<img src="https://www.gymlibrary.dev/_images/AE_loop_dark.png" style="background-color:black;" width=300> 

Image courtesy: www.gymlibrary.dev

The agent performs some `actions` in the environment (usually by passing some control inputs to the environment, e.g. torque inputs of motors) and observes how the `environment’s state` changes. One such action-observation exchange is referred to as a `timestep`.

The goal in Reinforcement Learning (RL) is to manipulate the `environment` in some specific way. 

For instance, suppose we want the agent to navigate a robot to a specific point in space.
* If it succeeds in doing this (or makes some progress towards that goal), it will receive a `positive reward` alongside the observation for this `timestep`. 
* The reward may also be negative or 0, if the agent did not yet succeed (or did not make any progress). 
* The agent will then be trained to maximize the reward it accumulates over many timesteps.
* After some timesteps, the environment may enter a terminal state. 
    * For instance, the robot may have crashed! In that case, we want to `reset the environment` to a new initial state. The environment issues a done signal to the agent when it enters such a terminal state.
    * Not every done signal has to be triggered by a “catastrophic failure”: sometimes we also want to issue one after a fixed number of timesteps, or when the agent has succeeded in completing some task in the environment.
    * In recent `gym` releases (0.26 and later), `env.step()` reports these cases as two separate flags: `terminated` (a terminal state was reached) and `truncated` (the episode was cut off, e.g. by a time limit).
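The loop described above can be sketched without installing any environment. In this minimal sketch, `LineWalkEnv` is a hypothetical toy environment (not part of `gym`) whose `reset` and `step` return values in the same shape as `gym` 0.26's API: `(observation, info)` and `(observation, reward, terminated, truncated, info)`. The agent starts at position 0 on a number line, earns +1 for moving closer to the goal and -1 otherwise, and the episode ends with `terminated` at the goal or `truncated` after `max_steps` timesteps.

```python
import random


class LineWalkEnv:
    """Hypothetical toy environment mirroring gym's reset/step return shapes.

    Observation: the agent's integer position. Actions: 0 = left, 1 = right.
    """

    def __init__(self, goal=3, max_steps=20):
        self.goal = goal
        self.max_steps = max_steps
        self.position = 0
        self.steps = 0

    def reset(self):
        self.position = 0
        self.steps = 0
        return self.position, {}  # (observation, info)

    def step(self, action):
        old_distance = abs(self.goal - self.position)
        self.position += 1 if action == 1 else -1
        self.steps += 1
        # Positive reward for progress toward the goal, negative otherwise
        reward = 1.0 if abs(self.goal - self.position) < old_distance else -1.0
        terminated = self.position == self.goal   # terminal state reached
        truncated = self.steps >= self.max_steps  # time limit reached
        return self.position, reward, terminated, truncated, {}


def run_episode(env, rng):
    """Random agent: act until the episode terminates or is truncated."""
    observation, info = env.reset()
    total_reward = 0.0  # the reward accumulated over the episode
    while True:
        action = rng.choice([0, 1])  # random policy
        observation, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        if terminated or truncated:
            return total_reward, terminated, truncated
```

A learning agent would replace the random `rng.choice` with a policy trained to make `total_reward` as large as possible.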




## Agent-Environment loop in `Gym`
* The following example shows the agent-environment loop in `gym`:

### LunarLander-v2
* Reference [https://www.gymlibrary.dev/environments/box2d/lunar_lander/](https://www.gymlibrary.dev/environments/box2d/lunar_lander/)
![lunarlander-v2](figs/lunar_landerv2.png)




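* This example runs an instance of the `LunarLander-v2` environment for `n` timesteps, choosing a random action at each step.
* `LunarLander-v2` is a Box2D environment, so it needs the Box2D dependencies: `pip install 'gym[box2d]'`.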
* Since we pass `render_mode="human"`, you should see a window pop up rendering the environment.
* Save the following in a file named `lunarlanderv2.py`

```python
#!/usr/bin/env python3
# Make this file executable and run it directly; the interpreter named on the
# first line will be invoked (point it at your virtual environment's python if needed):
#   $ chmod +x lunarlanderv2.py
#   $ ./lunarlanderv2.py
import gym
from tqdm import tqdm

# Number of timesteps to simulate
n = 500

# render_mode="human" opens a window that renders the environment
env = gym.make("LunarLander-v2", render_mode="human")
# Seed the action space separately so that sampled actions are reproducible
env.action_space.seed(42)

# Seeding reset() makes the initial state reproducible
observation, info = env.reset(seed=42)

for _ in tqdm(range(n)):
    # Random policy: sample any valid action
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)

    # Episode over (terminal state or time limit): start a new one.
    # Use `break` here instead to stop after the first episode.
    if terminated or truncated:
        observation, info = env.reset()

env.close()
```

## Action space and Observation (i.e., state) space
* Every environment specifies the format of valid actions by providing an `env.action_space` attribute. 
* Similarly, the format of valid observations is specified by `env.observation_space`. 
* In the example above we sampled random actions via `env.action_space.sample()`. 
* Note that we need to seed the action space separately from the environment to ensure reproducible samples.
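Why the action space needs its own seed can be seen in a small stand-in. This is a sketch under stated assumptions: `DiscreteSpace` is a hypothetical class, not `gym.spaces.Discrete` (which uses a NumPy generator); it only illustrates that a space owns a private random generator, so seeding the environment through `reset(seed=...)` does not make its `sample()` calls reproducible, while seeding the space itself does.

```python
import random


class DiscreteSpace:
    """Hypothetical stand-in for a discrete action space {0, ..., n-1}.

    It owns its own random generator, independent of the environment's.
    """

    def __init__(self, n):
        self.n = n
        self._rng = random.Random()  # unseeded: draws differ between runs

    def seed(self, seed):
        self._rng = random.Random(seed)

    def sample(self):
        """Draw a uniformly random valid action."""
        return self._rng.randrange(self.n)

    def contains(self, action):
        """Check whether `action` is a valid element of the space."""
        return isinstance(action, int) and 0 <= action < self.n
```

`LunarLander-v2` has a discrete space of 4 actions, so two `DiscreteSpace(4)` objects both seeded with 42 produce the same sequence of samples, mirroring `env.action_space.seed(42)` in the example above.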

# Thanks for your attention