# OpenAI Gym

This notebook contains an introduction to [OpenAI Gym](https://gym.openai.com/), an accessible and flexible framework for reinforcement learning problems.

Much of the material in this notebook is adapted from the excellent [official tutorial](http://gym.openai.com/docs/).

As a guiding example, we rely on the [**cartpole game**](https://github.com/openai/gym/wiki/CartPole-v0). The goal of this game is to keep a rod, attached at one end to a movable cart, upright by carefully moving the cart.

<img src="https://pytorch.org/tutorials/_images/cartpole1.gif" alt="Drawing" style="width: 700px;"></img>
Source: https://pytorch.org/tutorials/

## Environment

The core concept of OpenAI Gym is the *environment*, which an agent can interact with.

In [1]:
import gym 
import numpy as np

env = gym.make('CartPole-v0')

The two most central components of an environment are the **observation space** and the **action space**. 

The **observation space** formalizes what information an agent can draw from the environment. In the cartpole example, observations are four-dimensional vectors describing the position and velocity of the cart, and the angle and angular velocity of the pole.

The **action space** formalizes how the agent may interact with the environment. In the cartpole example, there are only two possible actions: pushing the cart to the left or to the right.

In [None]:
print(env.observation_space) 
print(env.observation_space.sample())
print(env.action_space) 
print(env.action_space.sample())
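
To make the two notions concrete without relying on the library, here is a minimal sketch of what a discrete space and a box-shaped space do when they are sampled. The classes `ToyDiscrete` and `ToyBox` are hypothetical stand-ins written for this illustration, not Gym's own space classes, and the bounds used for the box are made-up values rather than the actual cartpole limits.

```python
import random


class ToyDiscrete:
    """Hypothetical stand-in for a discrete space with actions 0, ..., n - 1."""

    def __init__(self, n):
        self.n = n

    def sample(self, rng):
        # Draw one of the n actions uniformly at random.
        return rng.randrange(self.n)

    def contains(self, x):
        return isinstance(x, int) and 0 <= x < self.n


class ToyBox:
    """Hypothetical stand-in for a box: one [low, high] interval per dimension."""

    def __init__(self, low, high):
        assert len(low) == len(high)
        self.low = list(low)
        self.high = list(high)

    def sample(self, rng):
        # Draw each coordinate uniformly from its own interval.
        return [rng.uniform(lo, hi) for lo, hi in zip(self.low, self.high)]

    def contains(self, x):
        return len(x) == len(self.low) and all(
            lo <= v <= hi for lo, v, hi in zip(self.low, x, self.high)
        )


rng = random.Random(42)

# Two actions, as in the cartpole example: push left or push right.
action_space = ToyDiscrete(2)

# Four dimensions: cart position, cart velocity, pole angle, pole angular velocity.
# The bounds below are illustrative assumptions only.
observation_space = ToyBox(low=[-1.0] * 4, high=[1.0] * 4)

action = action_space.sample(rng)
observation = observation_space.sample(rng)
print(action, observation)
```

The point of the sketch is only that an action is a single integer drawn from a finite set, whereas an observation is a vector of real numbers, each constrained to its own interval.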

For instance, the agent can interact with the environment by repeatedly performing random actions until the environment terminates. A sequence of actions until termination is called an **episode**.

Each time an action is performed, the agent receives feedback from the environment, namely the current **observed state**, the **reward** for the action, and a flag indicating whether the environment has terminated. In the cartpole example, the environment terminates if the angle of the rod deviates too much from the vertical position, if the cart moves too far from the center, or, for `CartPole-v0`, after 200 steps. The agent receives a reward of 1 per step as long as the environment has not terminated.

In [None]:
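steps = 100
seed = 42

# Seed the environment for reproducibility and start a new episode.
env.seed(seed)
env.reset()

for _ in range(steps):
    env.render()
    a = env.action_space.sample()    # pick a random action
    s, r, done, info = env.step(a)   # new state, reward, termination flag, extra info
    print(s, r, done)
    if done:                         # stop once the episode has terminated
        break
env.close()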
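
The same interaction loop can be written against any object that follows the `reset`/`step` convention used above. The following is a minimal sketch under that assumption: `ToyBalanceEnv` is a hypothetical, heavily simplified environment invented for this illustration (its dynamics are a noisy one-dimensional walk, not the cartpole physics), and `run_episode`, `random_policy` and `corrective_policy` are helper names of our own. It shows how a reward of 1 per step adds up to an episode return equal to the number of steps taken.

```python
import random


class ToyBalanceEnv:
    """Hypothetical toy environment using the (state, reward, done, info) convention."""

    def __init__(self, max_angle=0.2, seed=None):
        self.max_angle = max_angle  # termination threshold (assumed value)
        self.rng = random.Random(seed)
        self.angle = 0.0

    def reset(self):
        # Start each episode close to the vertical position.
        self.angle = self.rng.uniform(-0.05, 0.05)
        return self.angle

    def step(self, action):
        # Action 1 nudges the angle up, action 0 nudges it down; noise is added on top.
        push = 0.03 if action == 1 else -0.03
        self.angle += push + self.rng.uniform(-0.02, 0.02)
        done = abs(self.angle) > self.max_angle
        reward = 1.0  # one unit of reward for every step taken
        return self.angle, reward, done, {}


def run_episode(env, policy, max_steps=1000):
    """Run one episode and return (total reward, number of steps)."""
    state = env.reset()
    total_reward, steps = 0.0, 0
    for _ in range(max_steps):
        action = policy(state)
        state, reward, done, info = env.step(action)
        total_reward += reward
        steps += 1
        if done:
            break
    return total_reward, steps


policy_rng = random.Random(0)


def random_policy(state):
    # Ignore the state and pick one of the two actions at random.
    return policy_rng.randrange(2)


def corrective_policy(state):
    # Always push the angle back towards zero.
    return 0 if state > 0 else 1


total_reward, steps = run_episode(ToyBalanceEnv(seed=42), random_policy)
print("random policy:", total_reward, steps)

good_reward, good_steps = run_episode(ToyBalanceEnv(seed=42), corrective_policy, max_steps=200)
print("corrective policy:", good_reward, good_steps)
```

In this toy setting the corrective policy keeps the angle within the threshold for the whole step budget, whereas the random policy eventually drifts out of bounds. The episode return is therefore a direct measure of how long the agent managed to keep the system balanced.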

## Second Example: Space Invaders

As a second example, let's move to `Space Invaders`.

In [16]:
env = gym.make('SpaceInvaders-v0')

The observation space is much larger than in the cartpole example: each observation is now a $210 \times 160$ pixel RGB image. Since ``Space Invaders`` is part of the Atari family, the action space is again discrete. There are 6 possible actions: doing nothing, shooting, moving right, moving left, and moving right or left while shooting. In effect, the agent controls only three things, namely ``left``, ``right`` and ``shoot``.

In [None]:
print(env.observation_space)
print(env.action_space)
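
To put the difference in size into numbers, the short sketch below compares how many values a single observation contains in the two environments. It assumes that the image is stored as height × width × colour channels, i.e. with shape `(210, 160, 3)`; the variable names are our own.

```python
from math import prod

# Shape of one observation in each environment.
cartpole_shape = (4,)                  # position, velocity, angle, angular velocity
space_invaders_shape = (210, 160, 3)   # assumed layout: height, width, RGB channels

cartpole_size = prod(cartpole_shape)
space_invaders_size = prod(space_invaders_shape)

print(cartpole_size)                         # 4
print(space_invaders_size)                   # 100800
print(space_invaders_size // cartpole_size)  # 25200
```

A single frame thus carries roughly 25,000 times as many numbers as a cartpole observation, which is why methods that work directly on the cartpole state vector do not carry over unchanged to image-based environments.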

Again, we can perform random actions.

In [None]:
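steps = int(8e2)
seed = 42

# Seed the environment for reproducibility and start a new episode.
env.seed(seed)
env.reset()
for _ in range(steps):
    env.render()
    a = env.action_space.sample()    # pick a random action
    s, r, done, info = env.step(a)   # new frame, reward, termination flag, extra info
    if done:                         # stop once the episode has terminated
        break
env.close()

# Reward and termination flag of the last step performed.
print(r, done)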