# Introduction
A short tutorial on the main features of `gymnasium`. For more details, see the `gymnasium` [documentation](https://gymnasium.farama.org/).

In [None]:
import gymnasium as gym 

# Environments
Gymnasium MDPs are implemented through the class `Env`.

**Environment list**

In [None]:
print(list(gym.envs.registry.keys())[:6])

#### Environment interaction
Interaction with an environment follows the agent-environment loop shown below:
![](./imgs/env.png)

In [None]:
env = gym.make("LunarLander-v2")  # pass render_mode="human" to open a window
observation, info = env.reset()
for _ in range(200):
    action = env.action_space.sample()  # random action; an agent policy would use the observation and info
    observation, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        observation, info = env.reset()
env.close()

**Render options**
- ``None`` (default): no render is computed.
- ``human``: The environment is continuously rendered in the current display or terminal, usually for human consumption. This rendering should occur during step() and render() doesn’t need to be called. Returns None.
- ``rgb_array``: Return a single frame representing the current state of the environment. A frame is a np.ndarray with shape (x, y, 3) representing RGB values for an x-by-y pixel image.
- ``ansi``: Return a string (str) or StringIO.StringIO containing a terminal-style text representation for each time step. The text can include newlines and ANSI escape sequences (e.g. for colors).
- ``rgb_array_list`` and ``ansi_list``: List-based versions of the render modes (except ``human``) are available through the wrapper ``gymnasium.wrappers.RenderCollection``, which is automatically applied during `gymnasium.make(..., render_mode="rgb_array_list")`. The collected frames are popped after `render()` or `reset()` is called.

In [None]:
env = gym.make("FrozenLake-v1", render_mode="ansi")
observation, info = env.reset()
for i in range(200):
    if i < 3:
        print(env.render())
    action = env.action_space.sample()  # random action; an agent policy would use the observation and info
    observation, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        observation, info = env.reset()
env.close()

## Observation, reward and action space

In [None]:
ENVIRONMENTS = {
    'lake': lambda mode: gym.make("FrozenLake-v1", render_mode=mode),
    'lunar': lambda mode: gym.make("LunarLander-v2", render_mode=mode),
    'car': lambda mode: gym.make("CarRacing-v2", render_mode=mode)
}

### Observation and action space options
- `Box`: describes an n-dimensional continuous space with lower and upper limits that define the valid values an observation or action can take.
- `Discrete`: describes a discrete space where $\{0, 1, \dots, n-1\}$ are the possible values an observation or action can take. Values can be shifted to $\{a, a+1, \dots, a+n-1\}$ using the optional argument `start`.
- `Dict`: represents a dictionary of simple spaces.
- `Tuple`: represents a tuple of simple spaces.
- `MultiBinary`: describes a binary space of shape `n`. The argument `n` can be a number or a list of numbers.
- `MultiDiscrete`: consists of a series of Discrete action spaces with a different number of actions in each element.

In [None]:
env = ENVIRONMENTS['car'](None)
print(env.action_space)

In [None]:
example = 'car'
env = ENVIRONMENTS[example](None)
observation, info = env.reset()
for _ in range(10):
    action = env.action_space.sample()  # random action; an agent policy would use the observation and info
    observation, reward, terminated, truncated, info = env.step(action)
    print(observation, reward, terminated, truncated, info)
    if terminated or truncated:
        observation, info = env.reset()
env.close()

## Use wrappers to modify existing environments
- `TimeLimit`: Issue a truncated signal if a maximum number of timesteps has been exceeded (or the base environment has issued a truncated signal).
- `ClipAction`: Clip the action such that it lies in the action space (of type Box).
- `RescaleAction`: Rescale actions to lie in a specified interval.
- `TimeAwareObservation`: Add information about the index of the timestep to the observation. This can help ensure that transitions are Markov.
- `FlattenObservation`: Flatten the observation into a one-dimensional array.

In [None]:
from gymnasium.wrappers import FlattenObservation

In [None]:
env = ENVIRONMENTS['car'](None)
print(env.observation_space.shape)
wrap_env = FlattenObservation(env)
print(wrap_env.observation_space.shape)