# Reinforcement Learning!

Reinforcement Learning is the subfield of machine learning concerned with decision making and motor control. It studies how an agent can learn how to achieve goals in a complex, uncertain environment. It's exciting for many reasons, but here are two: 
- Reinforcement Learning is very general and encompasses any problem that involves making a sequence of decisions: for example, controlling a robot's motors so that it can run and jump, making business decisions such as pricing and inventory management, or playing video games and board games. It can also be applied to supervised machine learning problems with sequential or structured outputs. 
- Reinforcement Learning algorithms have started to achieve good results in many difficult environments. Reinforcement Learning has a long history, but until recent advances in deep learning, it required lots of problem-specific engineering. DeepMind's Atari results, BRETT from Pieter Abbeel's group, and AlphaGo all used deep Reinforcement Learning algorithms that did not make too many assumptions about their environment, so they can be applied in other settings and adapted to new situations. 

However, two factors are slowing Reinforcement Learning research down, and OpenAI's Gym library is designed to address both:
- The need for better benchmarks. In supervised learning, progress has been driven by large labeled datasets like ImageNet. In Reinforcement Learning, the closest equivalent would be a large and diverse collection of environments. However, the existing open-source collections of Reinforcement Learning environments don't have enough variety, and they are often difficult to set up and use. 
- Lack of standardization of environments used in publications. Subtle differences in the problem definition, such as the reward function or the set of actions, can drastically alter a task’s difficulty. This issue makes it difficult to reproduce published research and compare results from different papers.

In [None]:
#!pip install pygame
#!pip install gym

In [26]:
import gym 
from gym import spaces
from gym import envs

Here is the documentation; use it as a reference. I am choosing to use OpenAI's Gym environments for the agent: https://gym.openai.com/docs/

If we ever want to do better than take random actions at each step, it’d probably be good to actually know what our actions are doing to the environment.

The environment’s step function returns exactly what we need. In fact, step returns four values. These are:

- observation (object): an environment-specific object representing your observation of the environment. For example, pixel data from a camera, joint angles and joint velocities of a robot, or the board state in a board game.

- reward (float): amount of reward achieved by the previous action. The scale varies between environments, but the goal is always to increase your total reward.

- done (boolean): whether it’s time to reset the environment again. Most (but not all) tasks are divided up into well-defined episodes, and done being True indicates the episode has terminated. (For example, perhaps the pole tipped too far, or you lost your last life.)

- info (dict): diagnostic information useful for debugging. It can sometimes be useful for learning (for example, it might contain the raw probabilities behind the environment’s last state change). However, official evaluations of your agent are not allowed to use this for learning.


This is just an implementation of the classic “agent-environment loop”. Each timestep, the agent chooses an action, and the environment returns an observation and a reward.
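
To make the environment's side of this loop concrete, here is a minimal sketch of a toy environment with a Gym-like interface. `LineWalkEnv` and `run_random_episode` are hypothetical names invented for this illustration and are not part of Gym. The sketch assumes the four-value `step` API described above (Gym releases from 0.26 onward return five values instead).

```python
import random


class LineWalkEnv:
    """Hypothetical toy environment (not part of Gym) with a Gym-like interface.

    The agent starts at position 0 and must reach `goal` by choosing
    action 1 (move right) rather than action 0 (stay put).
    """

    def __init__(self, goal=3, max_steps=10):
        self.goal = goal
        self.max_steps = max_steps
        self.position = 0
        self.steps = 0

    def reset(self):
        """Start a new episode and return the initial observation."""
        self.position = 0
        self.steps = 0
        return self.position

    def step(self, action):
        """Apply an action and return (observation, reward, done, info)."""
        self.position += action  # action is 0 (stay) or 1 (move right)
        self.steps += 1
        reached = self.position >= self.goal
        reward = 1.0 if reached else 0.0
        done = reached or self.steps >= self.max_steps
        info = {"steps": self.steps}  # diagnostics only, not for learning
        return self.position, reward, done, info


def run_random_episode(env, seed=0):
    """Run one episode with random actions; return (total_reward, steps)."""
    rng = random.Random(seed)
    observation = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = rng.choice([0, 1])  # a real agent would use `observation`
        observation, reward, done, info = env.step(action)
        total_reward += reward
    return total_reward, info["steps"]
```

The loop in `run_random_episode` has the same shape as the CartPole loop: reset once, then repeatedly choose an action and unpack the four values until `done` is True.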


In [4]:
env = gym.make("CartPole-v1")
observation = env.reset()
for _ in range(1000):
  env.render()
  action = env.action_space.sample() # your agent here (this takes random actions)
  observation, reward, done, info = env.step(action)

  if done:
    observation = env.reset()
env.close()
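
As a small step beyond random actions, the sketch below replaces `env.action_space.sample()` with a hand-written rule. `lean_policy` and `run_episode` are hypothetical helpers, not Gym functions. The rule assumes the CartPole observation layout [cart position, cart velocity, pole angle, pole angular velocity] and the action encoding 0 = push left, 1 = push right. The block itself does not import gym, so it works with any object that offers the four-value `reset`/`step` interface.

```python
def lean_policy(observation):
    """Push the cart toward the side the pole is leaning.

    Assumes observation[2] is the pole angle (positive = leaning right)
    and that action 1 pushes right while action 0 pushes left.
    """
    pole_angle = observation[2]
    return 1 if pole_angle > 0 else 0


def run_episode(env, policy, max_steps=200):
    """Run one episode with the four-value step API; return the total reward."""
    observation = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(observation)
        observation, reward, done, info = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward
```

With Gym installed, `run_episode(gym.make("CartPole-v1"), lean_policy)` would typically last longer than a random episode, although this heuristic is far from solving the task.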


New environment, but this time I'll use the done flag to end each episode and report how long it lasted. 
I'll also print action_space and observation_space. These are attributes of type Space, and they describe the format of valid actions and observations.
The Discrete space allows a fixed range of non-negative integers; here it is Discrete(2), so valid actions are either 0 or 1. 
The Box space represents an n-dimensional box; here it has shape (4,), so valid observations are arrays of 4 numbers. Inspecting these spaces helps you write generic code that works with many environments. 
Box and Discrete are the most common Spaces. You can sample from a Space or check that something belongs to it.
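
As one possible illustration of "generic code", the sketch below inspects a space and reports how many values it covers. `describe_space` is a hypothetical helper, not a Gym function. It is duck-typed on the `n` attribute of Discrete and the `shape` attribute of Box, and it uses `SimpleNamespace` stand-ins so that it runs without Gym installed.

```python
from types import SimpleNamespace


def describe_space(space):
    """Summarize a Discrete-like or Box-like space.

    Anything with an `n` attribute is treated like Discrete (n possible
    values); otherwise anything with a `shape` tuple is treated like Box
    (one number per array entry).
    """
    if hasattr(space, "n"):
        return {"kind": "discrete", "size": int(space.n)}
    if hasattr(space, "shape"):
        size = 1
        for dim in space.shape:
            size *= int(dim)
        return {"kind": "box", "size": size}
    raise TypeError("unsupported space: {!r}".format(space))


# Stand-ins that mimic the CartPole spaces printed by this notebook.
action_space = SimpleNamespace(n=2)             # like Discrete(2)
observation_space = SimpleNamespace(shape=(4,))  # like a Box of shape (4,)

print(describe_space(action_space))
print(describe_space(observation_space))
```

With a real environment you would pass `env.action_space` and `env.observation_space` instead of the stand-ins, for example to size a lookup table or the layers of a policy network.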


In [24]:
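def checker(n):
    space = spaces.Discrete(n)  # set with n elements {0, 1, ..., n-1}
    x = space.sample()          # draw a random valid element
    assert space.contains(x)    # the sample must belong to the space
    assert space.n == n         # the space reports its own size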
    

In [25]:
checker(8)

In [12]:
def cartPolev0():
    env = gym.make("CartPole-v0")
    print(env.action_space)
    print(env.observation_space)
    print(env.observation_space.high)
    print(env.observation_space.low)
    for i_episode in range(20): 
        observation = env.reset()
        for t in range(100):
            env.render()
            print(observation)
            action = env.action_space.sample()
            observation, reward, done, info = env.step(action)
            if done:
                print("Episode finished after {} timesteps".format(t+1))
                break 
    env.close()

In [13]:
cartPolev0()



Discrete(2)
Box([-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38], [4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38], (4,), float32)
[4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38]
[-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38]
[-0.03499534  0.00391542 -0.00860296  0.04378294]
[-0.03491703 -0.19108212 -0.0077273   0.3337392 ]
[-0.03873867 -0.38609326 -0.00105251  0.62397534]
[-0.04646054 -0.5812005   0.01142699  0.9163266 ]
[-0.05808454 -0.3862349   0.02975352  0.62725675]
[-0.06580924 -0.1915406   0.04229866  0.34409082]
[-0.06964006 -0.38723797  0.04918048  0.6498064 ]
[-0.07738481 -0.19283432  0.0621766   0.37300676]
[-0.0812415   0.0013518   0.06963674  0.10055857]
[-0.08121447 -0.19469547  0.07164791  0.41437343]
[-0.08510837 -0.00065832  0.07993538  0.14511089]
[-0.08512154 -0.19682862  0.0828376   0.46190274]
[-0.08905811 -0.39301765  0.09207565  0.7795036 ]
[-0.09691846 -0.1992741   0.10766573  0.5171513 ]
[-0.10090394 -0.3957342   0

#### Environments

Gym comes out of the box with a diverse suite of environments that range from easy to difficult and involve many different kinds of data. You'll find a complete list here: https://gym.openai.com/envs/#classic_control

Here are some of those environments: 
- Classic control and toy text: complete small-scale tasks, mostly from the RL literature. They’re here to get you started.

- Algorithmic: perform computations such as adding multi-digit numbers and reversing sequences. One might object that these tasks are easy for a computer. The challenge is to learn these algorithms purely from examples. These tasks have the nice property that it’s easy to vary the difficulty by varying the sequence length.

- Atari: play classic Atari games. Gym integrates the Arcade Learning Environment (which has had a big impact on reinforcement learning research) in an easy-to-install form.

- 2D and 3D robots: control a robot in simulation. These tasks use the MuJoCo physics engine, which was designed for fast and accurate robot simulation. 

#### The Registry

Gym's main purpose is to provide a large collection of environments that expose a common interface and are versioned to allow for comparisons. You can view the full list with the envs_list function below. It lists all EnvSpec objects, which define the parameters for a particular task, including the number of trials to run and the maximum number of steps. To ensure valid comparisons in the future, environments will never be changed in a fashion that affects performance, only replaced by newer versions. It's also easy to add your own environments to the registry and make them available to gym.make(): just register() them at load time. This helps with portability!

In [27]:
def envs_list():
    print(envs.registry.all())

envs_list()

ValuesView(├──CartPole: [ v0, v1 ]
├──MountainCar: [ v0 ]
├──MountainCarContinuous: [ v0 ]
├──Pendulum: [ v1 ]
├──Acrobot: [ v1 ]
├──LunarLander: [ v2 ]
├──LunarLanderContinuous: [ v2 ]
├──BipedalWalker: [ v3 ]
├──BipedalWalkerHardcore: [ v3 ]
├──CarRacing: [ v1 ]
├──Blackjack: [ v1 ]
├──FrozenLake: [ v1 ]
├──FrozenLake8x8: [ v1 ]
├──CliffWalking: [ v0 ]
├──Taxi: [ v3 ]
├──Reacher: [ v2 ]
├──Pusher: [ v2 ]
├──InvertedPendulum: [ v2 ]
├──InvertedDoublePendulum: [ v2 ]
├──HalfCheetah: [ v2, v3 ]
├──Hopper: [ v2, v3 ]
├──Swimmer: [ v2, v3 ]
├──Walker2d: [ v2, v3 ]
├──Ant: [ v2, v3 ]
├──Humanoid: [ v2, v3 ]
└──HumanoidStandup: [ v2 ]
)
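
The versioning rule described in this section (an ID is never changed, only superseded by a newer version) can be illustrated with a toy registry. This is a simplified sketch of the idea, not Gym's implementation: the `register` and `make` functions below are hypothetical stand-ins written for this notebook, and the ID format `Name-vN` is modeled on the list printed above.

```python
import re

_REGISTRY = {}  # maps IDs such as "Toy-v0" to environment constructors
_ID_PATTERN = re.compile(r"^[A-Za-z0-9_]+-v\d+$")


def register(env_id, constructor):
    """Add a constructor under a versioned ID; existing IDs are frozen."""
    if not _ID_PATTERN.match(env_id):
        raise ValueError("ID must look like 'Name-v0', got {!r}".format(env_id))
    if env_id in _REGISTRY:
        # Mirrors the policy above: publish a new version instead of editing.
        raise ValueError("{!r} is already registered".format(env_id))
    _REGISTRY[env_id] = constructor


def make(env_id, **kwargs):
    """Build a fresh environment instance from its registered constructor."""
    if env_id not in _REGISTRY:
        raise KeyError("unknown environment ID {!r}".format(env_id))
    return _REGISTRY[env_id](**kwargs)
```

In Gym itself, the corresponding step is a call such as `register(id="MyEnv-v0", entry_point="my_module:MyEnv")` from `gym.envs.registration`, where `MyEnv` and `my_module` are placeholder names for your own class and module.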
