# Introduction to RL:
## Familiarizing myself with OpenAI's Gym

This notebook is a lookup guide for myself, for when I inevitably forget how gym works or how to access the parts of it that I need.
<br>
<br>
I am following along [here](https://gym.openai.com/docs/)

<br>

In [3]:
# First import gym
import gym

### Environments:
Environments are the cornerstone of gym; they are, in essence, the entire reason OpenAI decided to make it public. All of the available environments can be found [here](https://gym.openai.com/envs/#classic_control).
<br>
<br>
All environments live in gym's `envs` module.

In [4]:
# even though I already imported gym I'll import the envs module to make it easier to understand
from gym import envs

In [7]:
# we can list out all of the available envs
#print(envs.registry.all())

In [21]:
# you can then use any of these environments by running gym.make
env = gym.make('CartPole-v1')

There are three main methods of the env to know:
1. reset(self): Reset the environment's state. Returns observation.
2. step(self, action): Step the environment by one timestep. Returns observation, reward, done, info.
3. render(self, mode='human'): Render one frame of the environment. The default mode will do something human friendly, such as pop up a window.
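
To make the three methods concrete, here is a minimal sketch of a toy environment that follows the same interface. `CountdownEnv` is a hypothetical stand-in written for this note, not something that ships with gym, and it assumes the classic API in which `reset` returns only the observation and `step` returns a 4-tuple.

```python
class CountdownEnv:
    """Hypothetical toy environment (not part of gym) mimicking the classic interface."""

    def __init__(self, start=3):
        self.start = start
        self.state = start

    def reset(self):
        # Put the environment back in its initial state and return the first observation.
        self.state = self.start
        return self.state

    def step(self, action):
        # Action 1 decrements the counter; any other action leaves it unchanged.
        if action == 1:
            self.state -= 1
        done = self.state == 0
        reward = 1.0 if done else 0.0
        info = {}  # diagnostic information; empty in this toy example
        return self.state, reward, done, info

    def render(self, mode='human'):
        # A real environment would pop up a window; this sketch builds a text frame instead.
        frame = f"counter = {self.state}"
        if mode == 'human':
            print(frame)
        return frame


toy_env = CountdownEnv(start=2)
first_observation = toy_env.reset()
step_result = toy_env.step(1)
frame = toy_env.render()
```

A real gym environment such as CartPole is driven through the same three calls; only the observations, rewards and rendering differ.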

<br>

### Spaces:
Before I continue, it is useful to understand the two different spaces:
1. Action Space: the set of valid actions the agent can take.
2. Observation Space: the set of valid observations the environment can return.

Every environment has both of these spaces.

The most common space types are **Discrete**, which is a fixed range of non-negative integers (0 through n-1), and **Box**, which is an n-dimensional box used for multidimensional continuous spaces with bounds.

So, in layman's terms, the Box defines n dimensions, each with its own lower and upper bound. For example, in CartPole the first dimension is the cart's position, which is bounded by [-4.8, 4.8] in the observation space (the episode itself ends once the cart moves past ±2.4), while the second dimension is the cart's velocity, which is effectively unbounded, [-inf, inf].
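
As a rough sketch of the idea, the two space types can be imitated in plain Python. `ToyDiscrete` and `ToyBox` are hypothetical stand-ins written for illustration only; gym's own `Discrete` and `Box` are NumPy-based and do more, and the bounds used below are made-up example values.

```python
import random


class ToyDiscrete:
    """Hypothetical stand-in for a discrete space: the integers 0, 1, ..., n - 1."""

    def __init__(self, n):
        self.n = n

    def sample(self):
        # Pick one of the n valid values uniformly at random.
        return random.randrange(self.n)

    def contains(self, x):
        return isinstance(x, int) and 0 <= x < self.n


class ToyBox:
    """Hypothetical stand-in for a box space: one (low, high) bound per dimension."""

    def __init__(self, low, high):
        assert len(low) == len(high)
        self.low = list(low)
        self.high = list(high)

    def sample(self):
        # Uniform sampling only makes sense when every bound is finite.
        return [random.uniform(lo, hi) for lo, hi in zip(self.low, self.high)]

    def contains(self, x):
        return len(x) == len(self.low) and all(
            lo <= value <= hi for lo, hi, value in zip(self.low, self.high, x)
        )


toy_actions = ToyDiscrete(2)
toy_observations = ToyBox(low=[-1.0, -10.0], high=[1.0, 10.0])
sampled_action = toy_actions.sample()
sampled_observation = toy_observations.sample()
```

Here `sampled_action` is always 0 or 1, and `sampled_observation` is a two-element list with each entry inside its own bounds.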

In [22]:
# Let's look at the two spaces in our sample env
action_space = env.action_space
observation_space = env.observation_space

In [23]:
print(action_space)
print(observation_space)

Discrete(2)
Box(4,)


In [24]:
# We can get an idea of our observation space's bounds by printing out the highs and lows
print(observation_space.high)
print(observation_space.low)

[4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38]
[-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38]
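
The odd-looking numbers above have simple origins, as far as I can tell: 3.4028235e+38 is the largest finite 32-bit float, which effectively stands in for infinity, and 4.1887903e-01 is 24 degrees expressed in radians. A quick check using only the standard library (the variable names are my own):

```python
import math
import struct

# Largest finite 32-bit float, decoded from its bit pattern 0x7F7FFFFF.
float32_max = struct.unpack('<f', struct.pack('<I', 0x7F7FFFFF))[0]

# 24 degrees in radians, for comparison with the pole-angle bound.
angle_bound = math.radians(24)

print(float32_max)
print(angle_bound)
```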


In [31]:
# We can sample the action_space for a random action; let's see what happens (should be either 0 or 1)
random_action = action_space.sample()
print(random_action)

1


<br>

### Step function for environment:
The step method is arguably the most important. It returns all of the information our agent needs in order to learn from its environment and make educated decisions.
<br>
<br>
The step method takes as input an action and returns a tuple (observation, reward, done, info).
- Observation (object): An environment-specific object representing your observation of the environment.
- Reward (float): Amount of reward achieved by the previous action.
- Done (boolean): Whether it’s time to reset the environment again.
- Info (dict): Diagnostic information useful for debugging.
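
Putting the four return values to work, an episode is typically just a loop that keeps calling `step` until `done` is true. Below is a minimal sketch, assuming the 4-tuple return described above. `run_episode` and `StubEnv` are hypothetical names made up for this example; the stub stands in for a real gym environment so that the snippet runs on its own.

```python
import random


class StubEnv:
    """Hypothetical stub with the classic reset/step interface; ends after 5 steps."""

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = self.t >= 5
        return self.t, 1.0, done, {}


def run_episode(env, choose_action, max_steps=1000):
    """Run one episode and return (total_reward, steps_taken)."""
    observation = env.reset()
    total_reward = 0.0
    steps = 0
    for _ in range(max_steps):
        action = choose_action(observation)
        observation, reward, done, info = env.step(action)
        total_reward += reward
        steps += 1
        if done:
            break  # the episode is over; the env needs a reset before reuse
    return total_reward, steps


# Random policy over two actions, similar to sampling from Discrete(2).
total_reward, steps = run_episode(StubEnv(), lambda obs: random.randrange(2))
print(total_reward, steps)
```

With a real environment, `choose_action` could be something like `lambda obs: env.action_space.sample()` for a purely random agent.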

In [34]:
# let's use the random action from the previous cell to take a step so we can look at each of these individually
observation = env.reset()
new_observation, reward, done, info = env.step(random_action)