# [OpenAI](http://www.openai.com) [Gym API](http://gym.openai.com/docs/)

#### Topics Covered

* Components of RL 
    * Agent
    * Environment
    * Actions
    * Observations
    * Policy

```python
import random
```

## The Agent

The **agent** is the entity (a person, program, or robot) that takes an active role by acting on the environment. The agent implements the **policy**, which decides what action to take at each time step. It makes this decision based on the observation it receives from the environment.

```python
# a naive agent that makes a random choice regardless of observations
class Agent:
    def __init__(self):
        self.total_reward = 0.0

    def step(self, env):
        current_obs = env.get_observation()  # observed, but ignored by this naive policy
        actions = env.get_actions()
        # the "policy": pick any available action uniformly at random
        reward = env.action(random.choice(actions))
        self.total_reward += reward
```
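
The `Agent` above reads `current_obs` but never uses it. A policy is essentially a mapping from observation to action. The sketch below shows an agent that delegates its choice to a policy function. It assumes the toy interface used in this notebook (`get_observation`, `get_actions`, `action`), not the Gym API. `threshold_policy` and `ToyEnv` are hypothetical examples.

```python
import random

def threshold_policy(observation, actions):
    """Hypothetical deterministic policy: choose actions[1] when the first
    observation value is positive, otherwise actions[0]."""
    return actions[1] if observation[0] > 0 else actions[0]

class PolicyAgent:
    """Agent that delegates the choice of action to a policy function."""

    def __init__(self, policy):
        self.policy = policy
        self.total_reward = 0.0

    def step(self, env):
        obs = env.get_observation()
        actions = env.get_actions()
        # unlike the naive agent, the chosen action depends on the observation
        reward = env.action(self.policy(obs, actions))
        self.total_reward += reward

class ToyEnv:
    """Hypothetical stand-in environment: rewards action 1 when the hidden
    signal is positive and action 0 otherwise."""

    def __init__(self, steps=10, seed=0):
        self.rng = random.Random(seed)
        self.steps_left = steps
        self.signal = self.rng.uniform(-1, 1)

    def get_observation(self):
        return [self.signal]

    def get_actions(self):
        return [0, 1]

    def is_done(self):
        return self.steps_left == 0

    def action(self, action):
        correct = 1 if self.signal > 0 else 0
        reward = 1.0 if action == correct else 0.0
        self.steps_left -= 1
        self.signal = self.rng.uniform(-1, 1)  # new observation for the next step
        return reward

env = ToyEnv()
agent = PolicyAgent(threshold_policy)
while not env.is_done():
    agent.step(env)
print("Total reward: %.1f" % agent.total_reward)  # prints "Total reward: 10.0"
```

Because this policy matches the toy environment's reward rule, it collects the maximum reward on every step. The random agent only does so by chance.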

## The Environment

The **environment** is a model of the world external to the **agent**. It provides the agent with **observations** and **rewards**, and its state changes in response to the agent's actions.

In OpenAI Gym, the environment class (`gym.Env`) has two main attributes, `action_space` and `observation_space`, and two main methods, `reset()` and `step()`. The attributes describe the set of valid actions and the set of possible observations, respectively. The `reset()` method returns the environment to its initial state and returns the initial observation. The `step()` method is the central method of the class. It does the following:

* Takes the action to perform as input and executes it
* Returns the new observation after this action
* Returns the reward gained by this step
* Returns a `done` flag indicating whether the episode has ended
* Returns an `info` dictionary with optional diagnostic information
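
To make the `reset()`/`step()` contract concrete, here is a minimal sketch of an environment following the classic Gym convention, where `step(action)` returns an `(observation, reward, done, info)` tuple. It does not import `gym`. The class `CountdownEnv` and its reward rule are hypothetical. Newer Gym and Gymnasium releases split `done` into `terminated` and `truncated`, and `reset()` there also returns an `info` dictionary.

```python
import random

class CountdownEnv:
    """Hypothetical environment mimicking the classic Gym interface:
    reset() -> observation, step(action) -> (observation, reward, done, info)."""

    def __init__(self, max_steps=10):
        self.max_steps = max_steps
        self.steps_left = max_steps

    def reset(self):
        # return to the initial state and hand back the first observation
        self.steps_left = self.max_steps
        return self.steps_left

    def step(self, action):
        if self.steps_left == 0:
            raise RuntimeError("Episode is over; call reset()")
        # execute the action: action 1 earns 1.0, action 0 earns nothing
        reward = 1.0 if action == 1 else 0.0
        self.steps_left -= 1
        observation = self.steps_left           # new observation after the action
        done = self.steps_left == 0             # has the episode ended?
        info = {"steps_left": self.steps_left}  # optional diagnostic information
        return observation, reward, done, info

# the standard episode loop: reset once, then step until done
env = CountdownEnv()
obs = env.reset()
done = False
total_reward = 0.0
while not done:
    action = random.choice([0, 1])  # a random policy, as in the naive agent
    obs, reward, done, info = env.step(action)
    total_reward += reward
print("Episode finished, total reward:", total_reward)
```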

```python
class Environment:
    def __init__(self):
        self.steps_left = 10  # the episode lasts a fixed 10 steps

    # observations will change based on agent behavior;
    # they inform the agent's decisions
    def get_observation(self):
        return [0.0, 0.0, 0.0]

    # the action set could change based on the agent's actions
    def get_actions(self):
        return [0, 1]

    # the episode ends when no steps are left (a real env might use a 'win condition')
    def is_done(self):
        return self.steps_left == 0

    # apply the action and return a (random) reward for it
    def action(self, action):
        if self.is_done():
            raise Exception("Game is over")
        self.steps_left -= 1
        return random.random()
```

```python
# object instantiation
env = Environment()
agent = Agent()

# the naive agent makes random choices until the environment's 10 steps run out
while not env.is_done():
    agent.step(env)

print("Total reward: %.4f" % agent.total_reward)
```

```
Total reward: 3.4977
```

OpenAI Gym ships with many pre-built environments (for example, classic control tasks such as CartPole, and Atari games) to test agents on. A list can be found [here](https://gym.openai.com/envs).

## Agent Actions

The action space can be discrete, continuous, or a combination of both. In a **discrete action space** (pushing a button, moving one cell in a grid), the agent chooses exactly one of a finite set of options at each step. In a **continuous action space** (turning 9 degrees left, setting a knob anywhere between 0 and 1), the action is a real value within a range. An environment can also accept multiple actions performed simultaneously.
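
The three kinds of action described above can be illustrated with plain Python. The sketch below only samples random actions. The button numbering and the value ranges are hypothetical examples, not taken from any real environment.

```python
import random

# Discrete action space: exactly one of a finite set of options per step,
# e.g. which button to press (hypothetical: 0 = left, 1 = right, 2 = jump).
DISCRETE_ACTIONS = [0, 1, 2]
discrete_action = random.choice(DISCRETE_ACTIONS)

# Continuous action space: any real value inside a range,
# e.g. how far to turn a knob between 0.0 and 1.0.
KNOB_LOW, KNOB_HIGH = 0.0, 1.0
continuous_action = random.uniform(KNOB_LOW, KNOB_HIGH)

# Simultaneous actions: several sub-actions chosen in the same step,
# e.g. press a button AND set a steering angle in degrees (hypothetical range).
STEER_LOW, STEER_HIGH = -30.0, 30.0
combined_action = (random.choice(DISCRETE_ACTIONS),
                   random.uniform(STEER_LOW, STEER_HIGH))

print(discrete_action, continuous_action, combined_action)
```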

## Observations

Observations are the information that the `Environment` provides to the `Agent`. They can be as simple as a couple of numbers or as complex as multiple video streams or images.

Gym describes observations (and actions) with space classes, the most common being `Discrete`, `Box`, and `Tuple`. `Discrete` represents a fixed set of mutually exclusive options, numbered 0 to n-1. `Box` represents an n-dimensional array (tensor) of real values, each bounded by a lower and upper limit. `Tuple` groups several spaces together into one.
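
To show what each space type represents, here is a simplified sketch of `Discrete`, `Box`, and `Tuple` with the `sample()` and `contains()` methods that Gym spaces provide. These are hypothetical stand-ins, not the real `gym.spaces` classes. In particular, this `Box` is limited to a 1-D list of floats, whereas Gym's `Box` holds NumPy arrays of any shape.

```python
import random

class Discrete:
    """Simplified stand-in for a discrete space: integers 0..n-1."""
    def __init__(self, n):
        self.n = n
    def sample(self):
        return random.randrange(self.n)
    def contains(self, x):
        return isinstance(x, int) and 0 <= x < self.n

class Box:
    """Simplified 1-D stand-in for a box space: `size` floats in [low, high]."""
    def __init__(self, low, high, size):
        self.low, self.high, self.size = low, high, size
    def sample(self):
        return [random.uniform(self.low, self.high) for _ in range(self.size)]
    def contains(self, x):
        return len(x) == self.size and all(self.low <= v <= self.high for v in x)

class Tuple:
    """Groups several spaces; a sample is a tuple with one element per space."""
    def __init__(self, spaces):
        self.spaces = tuple(spaces)
    def sample(self):
        return tuple(space.sample() for space in self.spaces)
    def contains(self, x):
        return (len(x) == len(self.spaces)
                and all(s.contains(v) for s, v in zip(self.spaces, x)))

# hypothetical observation: a 3-value sensor reading plus one of 4 modes
obs_space = Tuple([Box(-1.0, 1.0, 3), Discrete(4)])
obs = obs_space.sample()
print(obs, obs_space.contains(obs))  # a sampled value is always contained
```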

## Gym Wrappers & Monitors

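Wrappers extend an existing environment in a generic way, for example by transforming its observations, actions, or rewards, without modifying the environment's own code. Monitors record an agent's performance, such as episode statistics and videos. See the [Gym documentation](http://gym.openai.com/docs/) for details.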
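
The wrapper idea can be sketched without Gym: a wrapper holds an inner environment, forwards calls to it, and overrides only what it changes. The classes below (`ConstantRewardEnv`, `Wrapper`, `ClipReward`) are hypothetical and only mirror the pattern behind Gym's `gym.Wrapper` and `gym.RewardWrapper`. They use the classic four-value `step()` return.

```python
class ConstantRewardEnv:
    """Hypothetical toy env (classic Gym-style step signature)
    that pays a reward of 5.0 per step and ends after 3 steps."""
    def reset(self):
        self.t = 0
        return self.t
    def step(self, action):
        self.t += 1
        return self.t, 5.0, self.t >= 3, {}

class Wrapper:
    """Generic wrapper: forwards every call to the wrapped env unchanged."""
    def __init__(self, env):
        self.env = env
    def reset(self):
        return self.env.reset()
    def step(self, action):
        return self.env.step(action)

class ClipReward(Wrapper):
    """Modifies only the reward, leaving observations and actions alone."""
    def __init__(self, env, low=-1.0, high=1.0):
        super().__init__(env)
        self.low, self.high = low, high
    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        return obs, max(self.low, min(self.high, reward)), done, info

env = ClipReward(ConstantRewardEnv())
env.reset()
done, total = False, 0.0
while not done:
    _, reward, done, _ = env.step(0)
    total += reward
print(total)  # 3 steps, each reward of 5.0 clipped to 1.0 -> 3.0
```

Because a wrapper exposes the same `reset()`/`step()` interface as the environment it wraps, several wrappers can be stacked on top of each other.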