# Week 1: 43008 Reinforcement Learning

## PART-A: Introduction to OpenAI Gym and Environments

This notebook introduces the OpenAI Gym library and its environments.

## OpenAI Gym:

OpenAI Gym is a widely used open-source toolkit for developing and evaluating reinforcement learning algorithms. It provides a collection of pre-defined environments with a standardized interface, making it easier to test and compare different reinforcement learning methods.

Some key aspects of OpenAI Gym:

* **1. Reinforcement Learning (RL) Framework:** OpenAI Gym focuses specifically on reinforcement learning, a subfield of machine learning where an agent learns to interact with an environment and maximize its cumulative reward through trial and error. RL is well-suited for problems that involve sequential decision-making, such as game playing, robotics control, and optimization.

* **2. Environment Abstraction:** OpenAI Gym abstracts the RL problem into environments, which represent the tasks or scenarios that the agent interacts with. Each environment provides an interface for taking actions, observing states, and receiving rewards.

* **3. Standardized Interface:** All Gym environments share a common API, allowing users to seamlessly switch between different environments. This consistency simplifies the process of developing and evaluating reinforcement learning algorithms across various domains.

* **4. Variety of Environments:** OpenAI Gym offers a diverse collection of environments, ranging from classic control tasks like CartPole and MountainCar to complex game environments like Atari 2600 games. These environments serve as benchmark problems for testing RL algorithms. Other common examples include Lunar Lander and Frozen Lake.

* **5. Easy to Use:** OpenAI Gym is designed to be accessible to both beginners and advanced researchers. It provides a simple and intuitive interface for getting started with RL, while also allowing customization and extension of environments to suit specific needs.

* **6. Integration with RL Libraries:** Gym environments combine easily with deep learning frameworks such as TensorFlow and PyTorch, and with the reinforcement learning libraries built on top of them. This makes it convenient to pair Gym's environments with deep learning tools when building advanced RL agents.

* **7. Documentation and Resources:** OpenAI Gym offers comprehensive documentation, including tutorials, examples, and API references. It also provides an active community forum where users can seek help, share ideas, and discuss RL-related topics.

OpenAI Gym has become a popular choice for experimenting with reinforcement learning algorithms due to its user-friendly interface, standardized environments, and compatibility with other machine learning frameworks. It serves as a valuable tool for both learning RL concepts and pushing the boundaries of RL research and development.
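To make the environment abstraction and the shared interface more concrete, here is a minimal sketch of a toy environment that follows the same `reset()` / `step(action)` pattern. It is plain Python and does not use Gym. The class name `CorridorEnv`, the `sample_action` helper, the reward values, and the four-value return are illustrative assumptions, not Gym's actual implementation.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment (not part of Gym).

    The agent starts in cell 0 of a short corridor and must reach the last cell.
    Actions: 0 = move left, 1 = move right.
    """

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        # Put the agent back at the start and return the first observation.
        self.position = 0
        return self.position

    def sample_action(self):
        # Stand-in for a random action choice such as env.action_space.sample().
        return random.choice([0, 1])

    def step(self, action):
        # Move one cell, staying inside the corridor.
        move = 1 if action == 1 else -1
        self.position = min(max(self.position + move, 0), self.length - 1)
        terminated = self.position == self.length - 1
        reward = 1.0 if terminated else 0.0
        info = {}  # auxiliary diagnostics; empty in this sketch
        return self.position, reward, terminated, info


env = CorridorEnv(length=3)
first_observation = env.reset()
step_result = env.step(1)  # move right once
print(first_observation, step_result)
```

Because an agent only ever calls `reset()` and `step(action)`, the same agent code could be pointed at a different environment that exposes the same two methods.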

**For more information, you can visit the official OpenAI Gym website at https://gym.openai.com/ or refer to the Gym documentation available at https://gym.openai.com/docs/.**

# Using OpenAI Gym

Using OpenAI Gym involves the following steps:

1.   Installation
2.   Importing OpenAI Gym
3.   Exploring the available Environments
4.   Creating an Instance of an Environment
5.   Interacting with the Environment Instance
6.   Understanding the Environment

Let's explore each step and get started with OpenAI Gym!

## Step 1: Installation

OpenAI Gym supports Python 3.5+. To install it, we can use `pip`, the package installer for Python:

In [1]:
# Install OpenAI Gym
!pip install gym
!pip install numpy==1.23.5
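Because library behavior can differ between releases, it can help to record which versions were actually installed. The sketch below uses the standard-library `importlib.metadata` module (Python 3.8+); the helper name `installed_version` is a hypothetical name chosen for this example.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Report the versions of the two packages installed above.
for name in ("gym", "numpy"):
    print(name, installed_version(name))
```

A result of `None` means the package is not visible to the current Python interpreter, which usually points to an installation into a different environment.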



In [2]:
# Rendering dependencies <-- Required for Google Colab
!pip install gym pyvirtualdisplay > /dev/null 2>&1
!apt-get install -y xvfb python-opengl ffmpeg > /dev/null 2>&1
!apt-get install xvfb

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
xvfb is already the newest version (2:21.1.4-2ubuntu1.7~22.04.15).
0 upgraded, 0 newly installed, 0 to remove and 35 not upgraded.


### Helper Functions for rendering on Google Colab

In [3]:
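# Imports needed by the helper functions below
import base64
import glob
import io

from gym.wrappers.record_video import RecordVideo
from IPython import display as ipythondisplay
from IPython.display import HTML
from pyvirtualdisplay import Display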

display = Display(visible=0, size=(1400, 900))
display.start()


# Utility functions to enable video recording of gym environment and displaying it
# To enable video, just do "env = wrap_env(env)"

# Function to Show video after rendering
def show_video():
  mp4list = glob.glob('video/*.mp4')
  if len(mp4list) > 0:
    mp4 = mp4list[0]
    video = io.open(mp4, 'r+b').read()
    encoded = base64.b64encode(video)
    ipythondisplay.display(HTML(data='''<video alt="test" autoplay
                loop controls style="height: 400px;">
                <source src="data:video/mp4;base64,{0}" type="video/mp4" />
             </video>'''.format(encoded.decode('ascii'))))
  else:
    print("Could not find video")

# Wrapper for the environment that records a video of every episode
def wrap_env(env):
  env = RecordVideo(env, './video', episode_trigger=lambda episode_number: True)
  return env

## Step 2: Importing OpenAI Gym

In [4]:
# Import OpenAI Gym
import gym

# import other required libraries/packages
from gym.wrappers.record_video import RecordVideo
import glob
import io
import base64
from IPython.display import HTML
from IPython import display as ipythondisplay

## Step 3: Exploring the available Environments:
OpenAI Gym provides a wide range of environments for reinforcement learning. You can explore them through the registry at `gym.envs.registry`. The following example lists the IDs of all registered environments:

In [5]:
# Import the envs module, which holds the environment registry
from gym import envs

all_envs = envs.registry.values()
env_ids = [env_spec.id for env_spec in all_envs]

# Print the Environment IDs
print(env_ids)

['CartPole-v0', 'CartPole-v1', 'MountainCar-v0', 'MountainCarContinuous-v0', 'Pendulum-v1', 'Acrobot-v1', 'LunarLander-v2', 'LunarLanderContinuous-v2', 'BipedalWalker-v3', 'BipedalWalkerHardcore-v3', 'CarRacing-v2', 'Blackjack-v1', 'FrozenLake-v1', 'FrozenLake8x8-v1', 'CliffWalking-v0', 'Taxi-v3', 'Reacher-v2', 'Reacher-v4', 'Pusher-v2', 'Pusher-v4', 'InvertedPendulum-v2', 'InvertedPendulum-v4', 'InvertedDoublePendulum-v2', 'InvertedDoublePendulum-v4', 'HalfCheetah-v2', 'HalfCheetah-v3', 'HalfCheetah-v4', 'Hopper-v2', 'Hopper-v3', 'Hopper-v4', 'Swimmer-v2', 'Swimmer-v3', 'Swimmer-v4', 'Walker2d-v2', 'Walker2d-v3', 'Walker2d-v4', 'Ant-v2', 'Ant-v3', 'Ant-v4', 'Humanoid-v2', 'Humanoid-v3', 'Humanoid-v4', 'HumanoidStandup-v2', 'HumanoidStandup-v4']
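Many environments appear in the list several times with different version suffixes. As a small sketch of how to work with the printed IDs, the snippet below groups a few of them by name and keeps the highest version of each. It uses only the standard library; the helper name `latest_versions` is hypothetical, and the sample list is a subset copied from the output above.

```python
import re
from collections import defaultdict

# A few IDs copied from the printed list above.
env_ids = ['CartPole-v0', 'CartPole-v1', 'MountainCar-v0', 'FrozenLake-v1',
           'FrozenLake8x8-v1', 'HalfCheetah-v2', 'HalfCheetah-v3', 'HalfCheetah-v4']


def latest_versions(ids):
    """Map each environment name to its highest listed version number."""
    versions = defaultdict(list)
    for env_id in ids:
        match = re.fullmatch(r"(.+)-v(\d+)", env_id)
        if match:  # skip IDs that do not follow the name-vN pattern
            versions[match.group(1)].append(int(match.group(2)))
    return {name: max(nums) for name, nums in versions.items()}


latest = latest_versions(env_ids)
for name, version in sorted(latest.items()):
    print(f"{name}-v{version}")
```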


### Sample environments available in OpenAI Gym

[Lunar Lander](https://www.gymlibrary.dev/environments/box2d/lunar_lander/)


<img src='https://user-images.githubusercontent.com/15806078/153222406-af5ce6f0-4696-4a24-a683-46ad4939170c.gif' width=400>


[Frozen lake](https://www.gymlibrary.dev/_images/frozen_lake.gif)

<img src= 'https://www.gymlibrary.dev/_images/frozen_lake.gif' width=400>

## Step 4: Creating an instance of an Environment

In [6]:
# Create Environment instance
# To try a different environment, uncomment one of these lines or use any other ID from the list above
#env = wrap_env(gym.make('CartPole-v1', new_step_api=True))
#env = wrap_env(gym.make('MountainCar-v0', new_step_api=True))
#env = wrap_env(gym.make('Pendulum-v1', new_step_api=True))
env = wrap_env(gym.make('FrozenLake-v1', new_step_api=True))



## Step 5: Interacting with the Environment instance:

To interact with the environment, run a loop that continues until the episode is finished. Each iteration of the loop represents one step in the environment.

In [7]:
env.reset()  # Reset the environment to the initial state
terminated = False  # Flag to indicate if the episode is finished

while not terminated:
    action = env.action_space.sample()  # Choose a random action
    observation, reward, terminated, info = env.step(action)  # Take a step

    # Perform your own computations or actions here

env.close()  # Close the environment



## Step 6: Understanding the Environment

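Each call to `env.step(action)` reports on the outcome of that step. Under Gym's newer step API it returns the five values below. The loops in this notebook unpack only four, so a single flag signals the end of the episode.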

1. **observation (object) –** this will be an element of the environment’s observation_space. This may, for instance, be a numpy array containing the positions and velocities of certain objects.

2. **reward (float) –** The amount of reward returned as a result of taking the action.

3. **terminated (bool) –** whether a terminal state (as defined under the MDP of the task) is reached. In this case further step() calls could return undefined results.

4. **truncated (bool) –** whether a truncation condition outside the scope of the MDP is satisfied. Typically a timelimit, but could also be used to indicate agent physically going out of bounds. Can be used to end the episode prematurely before a terminal state is reached.

5. **info (dictionary) –** info contains auxiliary diagnostic information (helpful for debugging, learning, and logging). This might, for instance, contain: metrics that describe the agent’s performance state, variables that are hidden from observations, or individual reward terms that are combined to produce the total reward. It also can contain information that distinguishes truncation and termination, however this is deprecated in favour of returning two booleans, and will be removed in a future version.

You can inspect these values after each step to analyze the agent's performance.
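The difference between `terminated` and `truncated` can be illustrated with a small sketch in plain Python. The task here is a hypothetical toy (not a Gym environment): the agent moves one cell to the right per step, the episode terminates when it reaches the goal cell, and it is truncated when a step budget runs out first. The function name `run_episode` and the reward of 1.0 at the goal are assumptions made for this example.

```python
def run_episode(goal, max_steps):
    """Walk right from cell 0 until the goal is reached or the step budget runs out."""
    position = 0
    total_reward = 0.0
    terminated = False  # the task itself ended (goal reached)
    truncated = False   # an external limit ended the episode (time limit)
    steps = 0

    while not (terminated or truncated):
        position += 1
        steps += 1
        terminated = position == goal
        # Truncation only applies if the task has not already ended.
        truncated = (not terminated) and steps >= max_steps
        total_reward += 1.0 if terminated else 0.0

    return {"steps": steps, "terminated": terminated,
            "truncated": truncated, "total_reward": total_reward}


print(run_episode(goal=3, max_steps=10))   # goal reached before the limit
print(run_episode(goal=50, max_steps=10))  # limit reached before the goal
```

Keeping the two flags separate lets a learning algorithm tell "the task is over" apart from "we simply stopped watching", which matters when estimating future rewards.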

**Essential functions:**

The `env.reset()` function resets the environment to its initial state and returns the initial observation.

The `env.action_space.sample()` function randomly selects an action from the action space.

The `env.step(action)` function takes an action as input and performs one step in the environment. It returns the next observation, the reward obtained, a boolean value indicating whether the episode is done, and additional information.

Observe the printed output of the next cell, which shows the step number, observation, reward, termination flag, and info dictionary for each step taken by the agent. You will notice that the agent moves randomly through the frozen lake, occasionally falling into holes and sometimes reaching the goal.

In [8]:
env.reset()  # Reset the environment to the initial state
terminated = False  # Flag to indicate if the episode is finished
step = 0
env.render()

while not terminated:
    action = env.action_space.sample()  # Choose a random action
    observation, reward, terminated, info = env.step(action)  # Take a step

    # Perform your own computations or actions here
    # Print each step results
    print("---- Step: ", step, " ----")
    print("Observation :", observation)
    print("Reward : ", reward)
    print("Terminated? :", terminated)
    #print("Truncated? :", truncated)
    print("Info :", info, '\n')
    step += 1  # increase the step count

env.close()  # Close the environment



---- Step:  0  ----
Observation : 4
Reward :  0.0
Terminated? : False
Info : {'prob': 0.3333333333333333} 

---- Step:  1  ----
Observation : 8
Reward :  0.0
Terminated? : False
Info : {'prob': 0.3333333333333333} 

---- Step:  2  ----
Observation : 4
Reward :  0.0
Terminated? : False
Info : {'prob': 0.3333333333333333} 

---- Step:  3  ----
Observation : 8
Reward :  0.0
Terminated? : False
Info : {'prob': 0.3333333333333333} 

---- Step:  4  ----
Observation : 8
Reward :  0.0
Terminated? : False
Info : {'prob': 0.3333333333333333} 

---- Step:  5  ----
Observation : 9
Reward :  0.0
Terminated? : False
Info : {'prob': 0.3333333333333333} 

---- Step:  6  ----
Observation : 10
Reward :  0.0
Terminated? : False
Info : {'prob': 0.3333333333333333} 

---- Step:  7  ----
Observation : 9
Reward :  0.0
Terminated? : False
Info : {'prob': 0.3333333333333333} 

---- Step:  8  ----
Observation : 13
Reward :  0.0
Terminated? : False
Info : {'prob': 0.3333333333333333} 

---- Step:  9  ----
Observ

In [9]:
# Show the recorded video of the Environment
show_video()
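The integer observations printed above are easier to read once they are mapped back onto the grid. The sketch below assumes the default 4x4 FrozenLake layout and the usual encoding of a state as `row * number_of_columns + column`. The helper name `decode_observation` is hypothetical, and a custom map would need its own layout passed in.

```python
# Default 4x4 FrozenLake layout: S = start, F = frozen, H = hole, G = goal.
DEFAULT_MAP = ["SFFF", "FHFH", "FFFH", "HFFG"]


def decode_observation(observation, grid=DEFAULT_MAP):
    """Convert a FrozenLake state index into (row, column, tile letter)."""
    n_cols = len(grid[0])
    row, col = divmod(observation, n_cols)
    return row, col, grid[row][col]


# Observations taken from the printed steps above.
for obs in [4, 8, 9, 10, 13]:
    print(obs, decode_observation(obs))
```

Read this way, the run shown above starts by moving down the left-hand column (states 4 and 8) before drifting right along the third row (states 9 and 10).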


## Additional Exploration

Feel free to modify the code and experiment with different actions and strategies to navigate the FrozenLake environment. You can also try to implement a reinforcement learning algorithm to train an agent to find an optimal policy for this environment.

## Conclusion

Congratulations! You have successfully explored the FrozenLake-v1 environment from OpenAI Gym. You learned about the environment's dynamics, interacted with it using Python code, and observed the agent's interactions. OpenAI Gym provides a wide range of environments to explore and experiment with reinforcement learning algorithms. Continue your RL journey and explore more exciting environments and algorithms!