# Table of Contents
* [1. Introduction](#1.-Introduction)
	* [1.1 What is an OpenAI gym environment object?](#1.1-What-is-an-OpenAI-gym-environment-object?)
		* [1.1.1 Methods:](#1.1.1-Methods:)
		* [1.1.2 Attributes:](#1.1.2-Attributes:)
	* [1.2 Useful Links](#1.2-Useful-Links)
* [2. Environment Examples](#2.-Environment-Examples)
	* [2.1 Algorithms: Copy-v0](#2.1-Algorithms:-Copy-v0)
		* [2.1.1 A heuristic solve!](#2.1.1-A-heuristic-solve!)
	* [2.2 Atari: Breakout-v0](#2.2-Atari:-Breakout-v0)


In [None]:
import gym
import matplotlib.pyplot as plt
import time
from IPython import display
import numpy as np

# 1. Introduction

Here's a practical introduction to OpenAI Gym Environments. You can find a list of the environments here:
* https://gym.openai.com/envs/

The source code for these environments is located in `../gym/envs`.

## 1.1 What is an OpenAI gym environment object?

Essentially, it is an object that exposes a standard set of methods and attributes. You create one by its registered name:

`env = gym.make('Name_of_registered_environment')`

### 1.1.1 Methods:

1. `env.reset()` - resets the environment to a starting state; may return the initial observation
2. `env.render()` - produces a visual representation of the current state
    * Often takes a `mode` argument, for example:
        * `env.render(mode='human')`
        * `env.render(mode='rgb_array')`
3. `env.step(some_action)` - submits an action to the environment and returns a tuple:
    * `(observation, reward, done, info)`
        * observation: the new state
        * reward: ...
        * done: for episodic tasks, whether the episode has finished
        * info: diagnostic information
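
To make the interface above concrete, here is a minimal sketch of an object that follows the same `reset` / `step` / `render` pattern. `CountdownEnv` is a hypothetical toy class written for illustration only; it is not part of gym, and its reward scheme is an arbitrary assumption.

```python
class CountdownEnv:
    """Hypothetical toy environment, not part of gym.

    The state is an integer counter. Each step subtracts the action (0 or 1)
    from the counter, and the episode ends when the counter reaches zero.
    """

    def __init__(self, start=3):
        self.start = start
        self.counter = start

    def reset(self):
        # Put the environment back in its initial state and return it.
        self.counter = self.start
        return self.counter

    def step(self, action):
        # Apply the action, then report (observation, reward, done, info).
        self.counter -= action
        done = self.counter <= 0
        reward = 1.0 if done else 0.0  # assumed reward: 1 on finishing
        return self.counter, reward, done, {}

    def render(self, mode='human'):
        # A text rendering; real environments may draw to a window instead.
        text = 'counter = {}'.format(self.counter)
        if mode == 'human':
            print(text)
        return text


toy_env = CountdownEnv(start=3)
obs = toy_env.reset()
done = False
steps = 0
while not done:
    obs, reward, done, info = toy_env.step(1)  # always decrement
    toy_env.render()
    steps += 1
```

The loop at the end has the same shape as the interaction loops used with real environments later in this notebook: reset once, then call `step` until `done` is true.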

### 1.1.2 Attributes:

1. `env.action_space` - a space object describing the valid actions.
2. `env.observation_space` - a space object describing the valid states.
    * These space objects have useful methods and attributes of their own:
        * `space.sample()` - returns a random sample from the space
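
As a rough picture of what a space object does, the sketch below implements a tiny discrete space with a `sample()` method and a membership check. `ToyDiscrete` is a hypothetical stand-in written for this example; gym's own space classes are richer, and nothing here should be read as their actual implementation.

```python
import random


class ToyDiscrete:
    """Hypothetical stand-in for a discrete space: the integers 0..n-1."""

    def __init__(self, n):
        self.n = n

    def sample(self):
        # Draw one valid element uniformly at random.
        return random.randrange(self.n)

    def contains(self, x):
        # Check whether x is a valid element of the space.
        return isinstance(x, int) and 0 <= x < self.n


space = ToyDiscrete(6)
samples = [space.sample() for _ in range(10)]
all_valid = all(space.contains(s) for s in samples)
```

Every value returned by `sample()` is, by construction, something the space itself accepts, which is what makes sampling a convenient way to generate valid random actions.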

## 1.2 Useful Links

* https://gym.openai.com/docs/
* https://gym.openai.com/envs/

# 2. Environment Examples

## 2.1 Algorithms: Copy-v0

In [None]:
env = gym.make('Copy-v0')
env.reset()
env.render()

In [None]:
env.action_space

In [None]:
env.action_space.sample()

In [None]:
env.observation_space

In [None]:
env.observation_space.sample()

In [None]:
env.render()

### 2.1.1 A heuristic solve!

In [None]:
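obs = env.reset()  # initial observation: the first character on the input tape
env.render()
# Heuristic: write whatever character was just read, then move right.
# The action tuple is read here as (move right, write, character to write).
while obs < 5:  # observation 5 is the blank seen after the end of the input
    obs, reward, done, _ = env.step((1, 1, obs))
    env.render()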
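
To see why this heuristic works without needing gym installed, here is a simplified sketch of a copy task. `ToyCopyEnv`, its reward values, and its termination rule are assumptions made for illustration; this is not gym's implementation of `Copy-v0`.

```python
import random


class ToyCopyEnv:
    """Hypothetical, simplified copy task; not gym's implementation.

    A read head moves along an input tape of characters 0..4. Reading past
    either end of the tape returns BLANK. The goal is to write the tape's
    characters out in order.
    """

    BLANK = 5

    def __init__(self, length=4, seed=0):
        self.length = length
        self.rng = random.Random(seed)

    def _read(self):
        # Observation: the character under the head, or BLANK off the tape.
        if 0 <= self.pos < len(self.tape):
            return self.tape[self.pos]
        return self.BLANK

    def reset(self):
        self.tape = [self.rng.randrange(5) for _ in range(self.length)]
        self.pos = 0
        self.written = []
        return self._read()

    def step(self, action):
        move, write, char = action
        reward, done = 0.0, False
        if write == 1:
            target = self.tape[len(self.written)]
            self.written.append(char)
            if char == target:
                reward = 1.0  # assumed reward for a correct character
                done = len(self.written) == len(self.tape)
            else:
                reward, done = -0.5, True  # assumed penalty; episode ends
        self.pos += 1 if move == 1 else -1
        return self._read(), reward, done, {}


copy_env = ToyCopyEnv(length=4, seed=0)
obs = copy_env.reset()
total_reward = 0.0
while obs < ToyCopyEnv.BLANK:
    # Write the character just read, then move right.
    obs, reward, done, _ = copy_env.step((1, 1, obs))
    total_reward += reward
```

Because the observation is exactly the character that needs to be written next, echoing it back while stepping right copies the whole tape, and the loop stops as soon as the head reads the blank past the end.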

## 2.2 Atari: Breakout-v0

In [None]:
env = gym.make('Breakout-v0')
state = env.reset()
# env.render()  # not yet working inside the notebook;
                # it works outside the notebook, where the call has access to a screen
plt.imshow(env.render(mode='rgb_array'))

In [None]:
plt.imshow(state)

In [None]:
env.action_space, env.observation_space

### 2.2.1 Random Policy

In [None]:
fig, ax = plt.subplots(1, 1)
state = env.reset()
for t in range(100):
    # Redraw the current frame in place so the cell output animates
    ax.imshow(state)
    display.display(fig)
    display.clear_output(wait=True)
    # Random policy: sample an action from the action space
    action = env.action_space.sample()
    state, reward, done, info = env.step(action)
    if done:
        print("Episode finished after {} timesteps".format(t+1))
        break
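
The same random policy can be wrapped in a small helper that skips the plotting and reports how long the episode lasted and how much reward it collected. `run_random_episode` is a hypothetical helper name, and the sketch assumes the API used throughout this notebook, where `step()` returns `(observation, reward, done, info)`.

```python
def run_random_episode(env, max_steps=100):
    """Run one episode with random actions; return (steps, total_reward).

    Assumes env.reset() starts an episode, env.action_space.sample() yields
    a valid action, and env.step() returns (observation, reward, done, info).
    """
    env.reset()
    total_reward = 0.0
    for t in range(max_steps):
        action = env.action_space.sample()
        _, reward, done, _ = env.step(action)
        total_reward += reward
        if done:
            # Episode ended early; report the number of steps actually taken.
            return t + 1, total_reward
    return max_steps, total_reward
```

With the Breakout environment created above, a call such as `steps, total = run_random_episode(env)` would give a quick baseline to compare any learned policy against.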

## 2.3 Box2D: CarRacing-v0

This one does not yet work inside the notebook.

In [None]:
env = gym.make('CarRacing-v0')
# state = env.reset()
# env.render()  # not yet working inside the notebook;
                # it works outside the notebook, where the call has access to a screen
# env.render(mode='rgb_array')