### gymnasium ###

https://gymnasium.farama.org/index.html

Gymnasium is a project that provides an API for all single-agent reinforcement learning environments and includes implementations of common environments.

The API contains four key functions: make, reset, step and render. This basic usage guide introduces each of them.

At the core of Gymnasium is Env, a high-level Python class representing a Markov decision process (MDP) from reinforcement learning theory.

Within Gymnasium, environments (MDPs) are implemented as Env classes, along with Wrappers, which provide helpful utilities and can change the results passed to the user.

#### <u> Install packages: https://anaconda.org/search?q=gymnasium </u>

__Before installing Gymnasium, Windows users must install:__

* SWIG: conda install swig
* Microsoft C++ build tools: https://visualstudio.microsoft.com/downloads/
* Video: https://www.youtube.com/watch?v=gMgj4pSHLww

After that:

 conda install conda-forge::gymnasium-all

OR individually

conda install -c conda-forge gymnasium 
conda install -c conda-forge gymnasium-box2d 
conda install -c conda-forge gymnasium-classic_control 

OR

pip install .....

#### Initializing Environments ####

Initializing environments is very easy in Gymnasium and can be done via the make function:

In [1]:
from IPython.display import Image

import gymnasium as gym
import warnings
warnings.filterwarnings('ignore')

# Create an environment by its registered ID
env = gym.make('CartPole-v1')


This will return an Env for users to interact with. 

To see all environments you can create, use __gymnasium.envs.registry.keys()__. The make function also accepts a number of additional parameters for adding wrappers, passing keyword arguments to the environment, and more.

In [2]:
envs = gym.envs.registry.keys()

# for i,e in enumerate(envs):
#     print(f"Environment {i}: {e}")

#### Interacting with the Environment ####

The classic “agent-environment loop” pictured below is a simplified representation of reinforcement learning that Gymnasium implements.

<img src ="blackjack_AE_loop.jpg" style="height:200px"/>

This loop is implemented using the following Gymnasium code. The examples below use CartPole and Lunar Lander.

In [3]:
env = gym.make('CartPole-v1', render_mode="human")
observation, info = env.reset()

for _ in range(1000):
    action = env.action_space.sample()  # random action; an agent policy would use the observation and info here
    observation, reward, terminated, truncated, info = env.step(action)
    
    if terminated or truncated:
        observation, info = env.reset()
    env.render()
env.close()

In [4]:
env = gym.make("LunarLander-v3", render_mode="human")
observation, info = env.reset()

for _ in range(200):
    action = env.action_space.sample()  # random action; an agent policy would use the observation and info here
    observation, reward, terminated, truncated, info = env.step(action)
    
    if terminated or truncated:
        observation, info = env.reset()
    env.render()
env.close()

First, an environment is created using make with an additional keyword "render_mode" that specifies how the environment should be visualised.

* After initializing the environment, we reset it to get the first observation.
* Next, the agent performs an action in the environment via step. This can be imagined as moving a robot or pressing a button on a game controller that causes a change within the environment.
* As a result, the agent receives a new observation from the updated environment along with a reward for taking the action.
* This reward could be, for instance, positive for destroying an enemy or negative for moving into lava. One such action-observation exchange is referred to as a timestep.
* After some timesteps, the environment may end; this is called the terminal state. For instance, the robot may have crashed or the agent may have completed its task, so the environment must stop because the agent cannot continue.
* In Gymnasium, step reports this through the terminated value. Similarly, we may want the environment to end after a fixed number of timesteps; in that case, the environment issues a truncated signal.
* If either terminated or truncated is true, reset should be called next to restart the environment.

#### Action and observation spaces ####

Every environment specifies the format of valid actions and observations with the env.action_space and env.observation_space attributes. This is helpful for knowing the expected input and output of the environment, as all valid actions and observations should be contained within the respective space.

In the example, we sampled random actions via env.action_space.sample() instead of using an agent policy that maps observations to actions, which is what users will ultimately want to build.

Every environment should have the attributes action_space and observation_space, both of which should be instances of classes that inherit from Space. Gymnasium supports the majority of spaces users might need:

* Box: describes an n-dimensional continuous space. It is a bounded space where we can define upper and lower limits that describe the valid values our observations can take.
* Discrete: describes a discrete space where {0, 1, …, n-1} are the possible values our observation or action can take. Values can be shifted to {a, a+1, …, a+n-1} using an optional argument.
* Dict: represents a dictionary of simple spaces.
* Tuple: represents a tuple of simple spaces.
* MultiBinary: creates an n-shape binary space. The argument n can be a number or a list of numbers.
* MultiDiscrete: consists of a series of Discrete action spaces with a different number of actions in each element.



In [6]:
# Observation space
print(f"For the environment: LunarLander, the shape of the observation space: {env.observation_space.shape}\n")
# Let us sample this
print(f"Observations space sample:\n{env.observation_space.sample()}\n")

# Action space
print(f"The action space: {env.action_space}\n")
# Sample action space
print(f"Actions: {env.action_space.sample()}\n")


For the environment: LunarLander, the shape of the observation space: (8,)

Observations space sample:
[-0.7449603  -0.97068137 -1.3347585  -1.3255752  -1.2493231  -4.7408085
  0.72689414  0.5405498 ]

The action space: Discrete(4)

Actions: 1



In [7]:
# Let us see what we get when the agent takes an action in an environment

env = gym.make('CartPole-v1')

print(f"Reset to a start position: {env.reset()}")

Reset to a start position: (array([ 0.01318032, -0.02931096, -0.03096798,  0.03221523], dtype=float32), {})


In [8]:
# let us take an action in this environment
print(f"Result of action: {env.step(1)}")

# Take another step and store the returned values

obs, reward, terminated, truncated, info = env.step(1)

print(f"\n\nOutput of action by an agent is:\n\nobservation[next state]: {obs}\nReward: {reward}\nTerminated or end of episode (True or False): {terminated}\nTruncated (time limit reached): {truncated}")



Result of action: (array([ 0.0125941 ,  0.16624108, -0.03032367, -0.27007532], dtype=float32), 1.0, False, False, {})


Output of action by an agent is:

observation[next state]: [ 0.01591892  0.36178234 -0.03572518 -0.57216614]
Reward: 1.0
Terminated or end of episode (True or False): False
Truncated (time limit reached): False
