# <center> GYM Tutorial </center>

`OpenAI Gym` is a Python package comprising a selection of RL environments, ranging from simple “toy” environments to more challenging environments, including simulated robotics environments and Atari video game environments.
It was developed with the aim of becoming a standardized environment and benchmark for RL research. These environments have a shared interface, allowing you to write general algorithms.

## Environments

An environment is a class which defines the `observation` (a.k.a. `state`) and `action` spaces and a set of methods as follows:
- Attributes: `action_space` and `observation_space` define the allowed actions and observations (states) within this environment
- Methods
    - `reset()`: reset the environment to an initial state and return the initial observation
    - `render()`: render the environment and show a visualization, if implemented
    - `step(action)`: take an `action`; the environment changes state and returns the new observation and a reward
    - `close()`: clean up the environment object and release its resources (e.g., the rendering window)
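
To make the interface concrete, here is a minimal sketch of a class that follows the same shape. `CoinFlipEnv` is a hypothetical toy written for illustration only; it is not part of gym, and it uses plain lists instead of gym's space objects.

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment (not part of gym) mimicking the interface above."""

    def __init__(self, max_steps=5, seed=0):
        self.action_space = [0, 1]       # allowed actions: guess tails (0) or heads (1)
        self.observation_space = [0, 1]  # allowed observations: the last coin face
        self.max_steps = max_steps
        self._rng = random.Random(seed)
        self._steps = 0
        self._coin = 0

    def reset(self):
        """Start a new episode and return the initial observation."""
        self._steps = 0
        self._coin = self._rng.choice(self.observation_space)
        return self._coin

    def step(self, action):
        """Apply an action and return (observation, reward, done, info)."""
        if action not in self.action_space:
            raise ValueError(f"invalid action: {action!r}")
        self._coin = self._rng.choice(self.observation_space)
        reward = 1.0 if action == self._coin else 0.0  # reward a correct guess
        self._steps += 1
        done = self._steps >= self.max_steps           # episode ends after max_steps
        return self._coin, reward, done, {}

    def render(self):
        """Return a one-character text rendering of the current state."""
        return "H" if self._coin == 1 else "T"

    def close(self):
        """Nothing to release in this toy example."""
        pass


env_demo = CoinFlipEnv(max_steps=3)
first_obs = env_demo.reset()
obs, reward, done, info = env_demo.step(1)
env_demo.close()
```

Any code that only relies on `reset`, `step`, `render`, and `close` can drive this toy class in the same way it would drive a gym environment, which is the point of the shared interface.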



## Demo: CartPole game ##

For details of the CartPole game, see https://github.com/openai/gym/wiki/CartPole-v0

**Observation (State)**:


|Num	|Observation	|Min	|Max|
|:------:|:------------|---------:|-----:|
|0	|Cart Position	|-4.8|	4.8|
|1	|Cart Velocity|	-Inf	|Inf|
|2	|Pole Angle	|~ -0.418 rad (-24°)|	~ 0.418 rad (24°)|
|3	|Pole Velocity At Tip	|-Inf	|Inf|

**Actions**

|Num	|Action   |
|:------:|:------------|
|0	|Push cart to the left|
|1	|Push cart to the right|


Note: The amount by which the velocity is reduced or increased is not fixed; it depends on the angle at which the pole is pointing. This is because the center of gravity of the pole increases the amount of energy needed to move the cart underneath it.

**Reward**

Reward is 1 for every step taken, including the termination step.

In [None]:
import gym

In [None]:
env = gym.make('CartPole-v0')
env.reset()  # sample a random state as an initial state
env.render()
#env.close()

True

## Understanding `Space` objects

Every environment in gym comes with an `action_space` and an `observation_space`. These attributes are of type `Space`, and they describe the format of valid actions and observations.

Typically, there are two types of spaces:

- `Discrete` space allows a fixed range of non-negative integers. `Discrete(3)` means that the action can take values `{0, 1, 2}`.
- `Box` space represents an n-dimensional box, so valid observations will be an array of numbers. You can also check the Box’s bounds.
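
The semantics of the two space types can be sketched in a few lines of plain Python. `ToyDiscrete` and `ToyBox` are hypothetical, simplified stand-ins written for this tutorial; they are not gym's implementations, although gym's spaces also expose `sample()` and `contains()`.

```python
import random


class ToyDiscrete:
    """Simplified stand-in for a Discrete space: the integers 0 .. n-1."""

    def __init__(self, n):
        self.n = n

    def sample(self, rng=random):
        # Draw one valid value uniformly at random.
        return rng.randrange(self.n)

    def contains(self, x):
        return isinstance(x, int) and 0 <= x < self.n


class ToyBox:
    """Simplified stand-in for a Box space: one [low, high] interval per dimension."""

    def __init__(self, low, high):
        assert len(low) == len(high), "low and high must have the same length"
        self.low, self.high = list(low), list(high)
        self.shape = (len(self.low),)

    def sample(self, rng=random):
        # Draw each coordinate independently within its own bounds.
        return [rng.uniform(lo, hi) for lo, hi in zip(self.low, self.high)]

    def contains(self, x):
        return len(x) == len(self.low) and all(
            lo <= v <= hi for lo, v, hi in zip(self.low, x, self.high)
        )


actions = ToyDiscrete(3)                  # valid values: 0, 1, 2
box = ToyBox([-1.0, -2.0], [1.0, 2.0])    # a 2-dimensional box
print(actions.contains(2), actions.contains(3))  # True False
print(box.contains([0.0, 3.0]))                  # False (second coordinate out of bounds)
```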

In [None]:
print(env.action_space)
print(env.observation_space)
print(env.observation_space.high)
print(env.observation_space.low)

Discrete(2)
Box(-3.4028234663852886e+38, 3.4028234663852886e+38, (4,), float32)
[4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38]
[-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38]
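
A quick arithmetic check helps when reading these bounds. The sketch below assumes that the third entry (the pole angle) is expressed in radians and that the very large values are the largest finite `float32`, standing in for unbounded velocities. The numbers are copied from the output above.

```python
import math

# Upper bounds copied from the printed `observation_space.high` above.
high = [4.8000002e+00, 3.4028235e+38, 4.1887903e-01, 3.4028235e+38]

# Assumption: the pole angle bound is in radians, so convert it to degrees.
pole_angle_limit_deg = math.degrees(high[2])
print(round(pole_angle_limit_deg, 1))  # 24.0

# Largest finite float32: (2 - 2^-23) * 2^127.
float32_max = (2 - 2**-23) * 2.0**127
print(math.isclose(high[1], float32_max, rel_tol=1e-6))  # True
```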


In [None]:
# Sample action or observation

print(env.observation_space.sample())
print(env.action_space.sample())

[ 3.0832727e+00 -8.5805008e+37  9.8104283e-02  3.1468467e+38]
1


## Understanding env.step

As mentioned in the documentation, interaction with an environment is divided into episodes, with `done=True` indicating that the current episode has ended. At that point we need to call `env.reset()` before continuing. To do this correctly, we need to understand what `env.step(action)` does and returns. `env.step(action)` advances the environment by one timestep by performing the action specified by `action` and returns a tuple:
- **observation**: This is environment specific and represents our observation of the environment after taking the action specified in `env.step(action)`.
- **reward**: The reward we received upon performing the action.
- **done**: This is the flag discussed above. We need to monitor it and call `env.reset()` when `done=True`.
- **info**: Additional information for debugging.
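
The way this tuple is typically consumed can be sketched without gym at all. `CountdownEnv` and `run_episode` below are hypothetical names introduced for illustration: the stub ends an episode after a fixed number of steps and, like CartPole, pays a reward of 1 per step.

```python
class CountdownEnv:
    """Hypothetical stub (not a gym environment): episode ends after `length` steps."""

    def __init__(self, length):
        self.length = length
        self._t = 0

    def reset(self):
        self._t = 0
        return self._t  # initial observation: the step counter

    def step(self, action):
        self._t += 1
        done = self._t >= self.length
        return self._t, 1.0, done, {}  # (observation, reward, done, info)


def run_episode(env, policy, max_steps=100):
    """Run one episode and return (number of steps taken, total reward)."""
    observation = env.reset()
    total_reward = 0.0
    for t in range(max_steps):
        action = policy(observation)
        observation, reward, done, info = env.step(action)
        total_reward += reward
        if done:
            return t + 1, total_reward  # t is zero-based
    return max_steps, total_reward      # step budget exhausted before `done`


steps, total = run_episode(CountdownEnv(length=7), policy=lambda obs: 0)
print(steps, total)  # 7 7.0
```

Because the reward is 1 for every step, the total reward of an episode equals its length, which is why longer CartPole episodes correspond to better control.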

Here’s a bare-minimum example of getting something running. It runs the CartPole-v0 environment for 5 episodes of at most 100 timesteps each, taking random actions and rendering the environment at each step. You should see a window pop up rendering the classic cart-pole problem:

In [None]:
from time import sleep
env = gym.make('CartPole-v0')
for i_episode in range(5):
    print("\n episode: ", i_episode)
    observation = env.reset()  # start a new episode
    for t in range(100):
        env.render()
        sleep(0.03)  # slow down so the animation is visible
        action = env.action_space.sample()  # pick a random action
        #print(t, observation, action)
        observation, reward, done, info = env.step(action)

        if done:
            # t is zero-based, so t + 1 steps have been taken
            print('Episode #%d finished after %d timesteps' % (i_episode, t + 1))
            break
env.close()


 episode:  0
Episode #0 finished after 16 timesteps

 episode:  1
Episode #1 finished after 21 timesteps

 episode:  2
Episode #2 finished after 20 timesteps

 episode:  3
Episode #3 finished after 12 timesteps

 episode:  4
Episode #4 finished after 17 timesteps
