# Getting Started with OpenAI Gym
## Installation

Install Gym either with pip or directly from the GitHub source repository.

In [3]:
# install from pip
# [all] installs more dependencies environments may need
!pip install gym[all]

Collecting atari-py==0.2.6
  Using cached atari_py-0.2.6-cp38-cp38-linux_x86_64.whl
Collecting box2d-py~=2.3.5
  Using cached box2d_py-2.3.8-cp38-cp38-linux_x86_64.whl
Collecting mujoco-py<2.0,>=1.50
  Using cached mujoco-py-1.50.1.68.tar.gz (120 kB)
  Preparing metadata (setup.py) ... done
Reason for being yanked: re-release with new wheels
Building wheels for collected packages: mujoco-py
  Building wheel for mujoco-py (setup.py) ... error
  ERROR: Command errored out with exit status 1:
   command: /home/snakeeye10/anaconda3/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-b9y7erg_/mujoco-py_48170f54c4284fe8b25ca8dbd068977e/setup.py'"'"'; __file__='"'"'/tmp/pip-install-b9y7erg_/mujoco-py_48170f54c4284fe8b25ca8dbd068977e/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code =

In [None]:
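# Or build from source
# Clone the Git repository and install it in editable mode
!git clone https://github.com/openai/gym
# %cd changes the notebook's working directory; !cd would only affect a throwaway subshell
%cd gym
!pip install -e .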

## About

OpenAI Gym provides a collection of environments for reinforcement learning.
It makes no assumptions about the agent you use.
It exposes a common interface for accessing different environments.

That shared interface makes benchmarks easier to compare.

It also helps standardize how environments are defined.

In [23]:
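# Simple example
# CartPole control problem driven by random actions
import gym
env = gym.make("CartPole-v0")
env.reset()
max_steps = 100
for _ in range(max_steps):
    env.render()
    # step returns (observation, reward, done, info)
    obs, reward, done, info = env.step(env.action_space.sample())
    if done:
        # start a new episode once the current one ends
        env.reset()
env.close()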

## Environment Basics

### Observations
The `step` function returns four values, just like in our Assignment 1:

1: `observation` is an object that represents the current state of the environment.
This could be pixel data, the joint angles of a robot, or the state of the board in a board game.

2: `reward` is a float: the amount of reward earned by the previous action.

3: `done` is a boolean that is true when the episode has ended and it is time to reset the environment.

4: `info` is a dict of diagnostic information that can be useful for debugging. The agent shouldn't use it for learning.
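
To make the four return values concrete, here is a minimal sketch of the agent-environment loop. `CountdownEnv` and `run_episode` are hypothetical names written for this example and are not part of Gym; the class only imitates the `reset()` / `step()` interface described above, so the snippet runs without Gym installed.

```python
import random


class CountdownEnv:
    """Hypothetical toy environment (not part of Gym) that mimics reset()/step()."""

    def __init__(self, start=5):
        self.start = start
        self.remaining = start

    def reset(self):
        # Begin a new episode and return the initial observation.
        self.remaining = self.start
        return self.remaining

    def step(self, action):
        # Action 1 decrements the counter; action 0 does nothing.
        if action == 1:
            self.remaining -= 1
        observation = self.remaining
        reward = 1.0 if action == 1 else 0.0
        done = self.remaining == 0
        info = {"last_action": action}  # debugging only, not an input for the agent
        return observation, reward, done, info


def run_episode(env, policy, max_steps=100):
    """Run one episode and return (total_reward, steps_taken)."""
    observation = env.reset()
    total_reward = 0.0
    for step in range(1, max_steps + 1):
        action = policy(observation)
        observation, reward, done, info = env.step(action)
        total_reward += reward
        if done:
            return total_reward, step
    return total_reward, max_steps


env = CountdownEnv(start=5)
total, steps = run_episode(env, policy=lambda obs: random.choice([0, 1]))
print(f"total reward: {total}, steps: {steps}")
```

The loop only touches `reset()` and `step()`, which is why the same `run_episode` function would work with any environment that follows this four-value convention.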


### Spaces

Every environment has an action space and an observation space.

In [9]:
env = gym.make("CartPole-v0")
print(env.action_space)
print(env.observation_space)

Discrete(2)
Box([-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38], [4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38], (4,), float32)


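The `Discrete(n)` space contains the non-negative integers 0 through n-1, so CartPole has two actions, 0 and 1.

The `Box` space is an n-dimensional box with a lower and an upper bound for each dimension.
Here, observations are arrays of 4 floats.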

In [10]:
# Check the box's bounds

print(env.observation_space.high)

print(env.observation_space.low)

[4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38]
[-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38]


You can also sample a random value from a space.

In [13]:
from gym import spaces
# Discrete space with 10 elements, 0 through 9
space = spaces.Discrete(10)
# n is size of space
print(space.n)
# sample a possible value
x = space.sample()
print(x)

10
8
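
As a rough illustration of what a space does, the sketch below re-creates the two ideas, sampling and membership checking, in plain Python. `SimpleDiscrete` and `SimpleBox` are hypothetical stand-ins written for this example; Gym's own classes are richer (NumPy arrays, dtypes, unbounded dimensions), and the finite bounds used here are chosen only for illustration.

```python
import random


class SimpleDiscrete:
    """Hypothetical stand-in for a discrete space: the integers 0 .. n-1."""

    def __init__(self, n):
        self.n = n

    def sample(self):
        # Pick one of the n values uniformly at random.
        return random.randrange(self.n)

    def contains(self, x):
        return isinstance(x, int) and 0 <= x < self.n


class SimpleBox:
    """Hypothetical stand-in for a box space: per-dimension [low, high] bounds."""

    def __init__(self, low, high):
        assert len(low) == len(high)
        self.low = list(low)
        self.high = list(high)
        self.shape = (len(self.low),)

    def sample(self):
        # Draw each coordinate independently within its own bounds.
        return [random.uniform(lo, hi) for lo, hi in zip(self.low, self.high)]

    def contains(self, x):
        return len(x) == len(self.low) and all(
            lo <= v <= hi for lo, v, hi in zip(self.low, x, self.high)
        )


actions = SimpleDiscrete(2)
print(actions.contains(1), actions.contains(2))  # True False

# Finite bounds chosen for illustration only.
box = SimpleBox(low=[-4.8, -0.42], high=[4.8, 0.42])
print(box.contains(box.sample()))  # True
print(box.contains([5.0, 0.0]))    # False
```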


## Types of Environments

### Classic control (e.g. CartPole)
### Toy text

Both of these are easy places to get started.
They cover problems that appear throughout the RL literature.
A number of examples from our textbook are available here.
In [18]:
# Toy-text example
# Blackjack (goal: get as close to 21 as possible without going over)
# Observations are a 3-tuple: the player's current sum, the dealer's showing card,
# and whether the player holds a usable ace
# Reward is the outcome: win, draw, or lose (+1, 0, -1)
rewards = {
    1: "Win",
    0: "Draw",
    -1: "Lose"
}
# Actions are hit=1 or stick=0
actions = {
    0: "Stick",
    1: "Hit"
}
text_env = gym.make("Blackjack-v0")
max_games = 10
for i in range(max_games):
    # Deal a fresh hand at the start of every game
    text_env.reset()
    done = False
    print(f"Game {i} =========\n")
    while not done:
        action = text_env.action_space.sample()
        print(actions[action])
        obs, reward, done, info = text_env.step(action)
        print(obs)
        # A reward of 0 before the game ends is also printed as "Draw"
        print(rewards[reward])

text_env.close()

Game 0 =========

Stick
(21, 1, True)
Win
Game 1 =========

Hit
(13, 2, False)
Draw
Hit
(17, 2, False)
Draw
Hit
(27, 2, False)
Lose
Game 2 =========

Stick
(14, 9, True)
Lose
Game 3 =========

Stick
(13, 4, False)
Lose
Game 4 =========

Hit
(17, 10, False)
Draw
Hit
(27, 10, False)
Lose
Game 5 =========

Stick
(17, 10, False)
Draw
Game 6 =========

Hit
(15, 7, False)
Draw
Hit
(21, 7, False)
Draw
Stick
(21, 7, False)
Win
Game 7 =========

Hit
(17, 8, False)
Draw
Stick
(17, 8, False)
Lose
Game 8 =========

Hit
(15, 4, False)
Draw
Hit
(20, 4, False)
Draw
Hit
(30, 4, False)
Lose
Game 9 =========

Stick
(8, 9, False)
Lose
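
The third element of the Blackjack observation, the usable-ace flag, is easy to misread. The sketch below shows one way to compute the player's sum and that flag from a hand. The helper names (`usable_ace`, `sum_hand`, `observe`) are chosen for this example, and the card encoding (1 for an ace, 2 to 9 at face value, 10 for tens and face cards) is an assumption made here, not necessarily how the environment stores cards internally.

```python
def usable_ace(hand):
    """True if the hand holds an ace that can count as 11 without busting."""
    return 1 in hand and sum(hand) + 10 <= 21


def sum_hand(hand):
    """Best total for the hand, counting one ace as 11 when that is safe."""
    if usable_ace(hand):
        return sum(hand) + 10
    return sum(hand)


def observe(player_hand, dealer_showing):
    """Build a (player sum, dealer card, usable ace) observation tuple."""
    return (sum_hand(player_hand), dealer_showing, usable_ace(player_hand))


print(observe([1, 10], 1))     # (21, 1, True)
print(observe([10, 3], 2))     # (13, 2, False)
print(observe([1, 6, 10], 9))  # (17, 9, False)
```

In the last hand the ace is present but not usable: counting it as 11 would give 27, so it counts as 1 and the flag is `False`.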


### Algorithmic
Learn computational tasks (such as reversing a sequence) purely from examples.

### Atari
Play classic Atari games.
These environments use the Arcade Learning Environment, which is built on the Stella Atari emulator.
They often come in two versions: one with the console RAM as input and the other with screen pixels as input.

In [5]:
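import gym
# Example: Pitfall
# Input: RGB pixels of the screen image (shape: 210, 160, 3)

atari_env = gym.make("Pitfall-v0")
atari_env.reset()
max_steps = 5000
for _ in range(max_steps):
    atari_env.render()
    # step returns (observation, reward, done, info)
    obs, reward, done, info = atari_env.step(atari_env.action_space.sample())
    if done:
        # start a new game once the current one ends
        atari_env.reset()
atari_env.close()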

### 2D and 3D Robots
These environments use the MuJoCo physics engine.

## Registry

To list the available environments, use the registry.

In [14]:
from gym import envs
print(envs.registry.all())

dict_values([EnvSpec(Copy-v0), EnvSpec(RepeatCopy-v0), EnvSpec(ReversedAddition-v0), EnvSpec(ReversedAddition3-v0), EnvSpec(DuplicatedInput-v0), EnvSpec(Reverse-v0), EnvSpec(CartPole-v0), EnvSpec(CartPole-v1), EnvSpec(MountainCar-v0), EnvSpec(MountainCarContinuous-v0), EnvSpec(Pendulum-v0), EnvSpec(Acrobot-v1), EnvSpec(LunarLander-v2), EnvSpec(LunarLanderContinuous-v2), EnvSpec(BipedalWalker-v3), EnvSpec(BipedalWalkerHardcore-v3), EnvSpec(CarRacing-v0), EnvSpec(Blackjack-v0), EnvSpec(KellyCoinflip-v0), EnvSpec(KellyCoinflipGeneralized-v0), EnvSpec(FrozenLake-v1), EnvSpec(FrozenLake8x8-v1), EnvSpec(CliffWalking-v0), EnvSpec(NChain-v0), EnvSpec(Roulette-v0), EnvSpec(Taxi-v3), EnvSpec(GuessingGame-v0), EnvSpec(HotterColder-v0), EnvSpec(Reacher-v2), EnvSpec(Pusher-v2), EnvSpec(Thrower-v2), EnvSpec(Striker-v2), EnvSpec(InvertedPendulum-v2), EnvSpec(InvertedDoublePendulum-v2), EnvSpec(HalfCheetah-v2), EnvSpec(HalfCheetah-v3), EnvSpec(Hopper-v2), EnvSpec(Hopper-v3), EnvSpec(Swimmer-v2), EnvSp

This gives a collection of `EnvSpec` objects.
Each one defines the parameters for a task.

You can add your own environments to the registry to make them available through `gym.make`.

## References

OpenAI Gym Documentation
https://gym.openai.com/docs/