## Cart Pole Reinforcement Learning

This tutorial walks through the cart pole example in OpenAI Gym. Gym is a library of environments in which an AI agent can be trained to optimize a task. The goal of the [cart pole](https://github.com/openai/gym/wiki/CartPole-v0) task is to keep a pole balanced upright on a moving cart, as in the classic inverted pendulum problem.

In [1]:
import gym

First we make the Gym environment; `gym.make` accepts the name of whichever environment you would like to use. In this environment we pick an action at every frame and observe the result of that action. The environment's action space shows that there are 2 discrete actions in this example: going left (0) or going right (1).

In [2]:
env = gym.make('CartPole-v0')
print(f'The Number of actions are {env.action_space}')
print(f'The Data Types of actions are {env.action_space.dtype}')

WARN: gym.spaces.Box autodetected dtype as <class 'numpy.float32'>. Please provide explicit dtype.
The Number of actions are Discrete(2)
The Data Types of actions are int64
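
To make the `Discrete(2)` space more concrete, here is a minimal pure-Python sketch of what sampling from it amounts to: a uniform draw from the integers 0 and 1. The names `ACTION_NAMES` and `sample_action` are made up for this illustration and are not part of Gym.

```python
import random

# Hypothetical labels for the two CartPole actions (not a Gym API).
ACTION_NAMES = {0: "push cart left", 1: "push cart right"}


def sample_action(rng: random.Random) -> int:
    """Pick 0 or 1 uniformly at random, mimicking env.action_space.sample()."""
    return rng.randrange(len(ACTION_NAMES))


rng = random.Random(0)  # seeded so the draws are repeatable
actions = [sample_action(rng) for _ in range(10)]
for a in actions[:3]:
    print(a, ACTION_NAMES[a])
```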


Set up a monitor, which will record the results of the actions taken.

In [3]:
# force=True overwrites any earlier recordings in the output directory
env = gym.wrappers.monitor.Monitor(env, '/home/anthony/Documents/cart_pole/', force=True)

`reset` puts the environment into an initial state from which you can perform actions, and it returns the initial observation. An observation has four components: 0) Cart Position 1) Cart Velocity 2) Pole Angle 3) Pole Velocity At Tip. Each action earns a reward of 1 as long as the cart has not entered a terminal state. A terminal state is triggered when the pole angle exceeds $\pm 12^\circ$, the cart position moves out of the display, or the episode length exceeds 200 steps. The end goal is an average reward of at least 195 over 100 consecutive trials.

In [4]:
env.reset()

array([-0.0157479 , -0.00429214, -0.03903011, -0.04250227])
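
The termination rules and the solved criterion described above can be sketched as plain Python functions. This is an illustration only, not Gym's internal code: the function and constant names are hypothetical, the ±2.4 cart-position limit is taken from the CartPole-v0 wiki page linked earlier, the pole angle is assumed to be reported in radians, and the solved check uses "at least 195" following the wiki's wording.

```python
import math

POLE_ANGLE_LIMIT = math.radians(12)  # 12 degrees, assuming the angle is in radians
CART_POSITION_LIMIT = 2.4            # assumed from the CartPole-v0 wiki
MAX_EPISODE_STEPS = 200


def is_terminal(observation, steps_taken):
    """Return True if the episode should end under the rules described above."""
    cart_position, cart_velocity, pole_angle, pole_tip_velocity = observation
    return (
        abs(pole_angle) > POLE_ANGLE_LIMIT
        or abs(cart_position) > CART_POSITION_LIMIT
        or steps_taken >= MAX_EPISODE_STEPS
    )


def is_solved(episode_rewards):
    """Return True if the last 100 episodes average a reward of at least 195."""
    if len(episode_rewards) < 100:
        return False
    recent = episode_rewards[-100:]
    return sum(recent) / len(recent) >= 195.0


# The reset() output shown above: well inside every limit.
start = [-0.0157479, -0.00429214, -0.03903011, -0.04250227]
print(is_terminal(start, steps_taken=0))
print(is_solved([200.0] * 100))
```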

Below is an example of a loop that performs actions and renders the results. Each call to the step function returns the observation, the reward, a termination boolean, and meta information. The input to the step function is the action to take; in this example the action is sampled at random from the action space. After each action, rendering the environment lets you watch the simulation evolve over time. If the termination criterion is met, the simulation stops; otherwise it continues with another random action.

In [5]:
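for i in range(1, 100):
    print(f'Step {i}')
    # Take a random action and collect the observation, reward, done flag and meta info
    ob, reward, done, meta = env.step(env.action_space.sample())
    # Draw the current state of the cart and pole
    env.render()
    # Stop as soon as the environment reports a terminal state
    if done:
        break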

Step 1
Step 2
Step 3
Step 4
Step 5
Step 6
Step 7
Step 8
Step 9
Step 10
Step 11
Step 12
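
The loop above can also be wrapped in a small function that accumulates the reward for one episode. The sketch below is a hedged illustration, not part of Gym: `run_episode` is a hypothetical helper, and `StubEnv` is a made-up stand-in that only imitates the `reset`/`step` interface so the example runs without a display or an installed environment. Passing the real `env` should work the same way, assuming `step` returns the 4-tuple shown in this tutorial.

```python
import random


class StubEnv:
    """Hypothetical stand-in for a Gym environment; ends after a fixed number of steps."""

    def __init__(self, episode_length=12):
        self.episode_length = episode_length
        self.steps = 0

    def reset(self):
        self.steps = 0
        return [0.0, 0.0, 0.0, 0.0]

    def step(self, action):
        self.steps += 1
        done = self.steps >= self.episode_length
        # Same 4-tuple shape as env.step: observation, reward, done, meta
        return [0.0, 0.0, 0.0, 0.0], 1.0, done, {}


def run_episode(env, choose_action, max_steps=100):
    """Run one episode and return (steps_taken, total_reward)."""
    env.reset()
    total_reward = 0.0
    for step in range(1, max_steps + 1):
        ob, reward, done, meta = env.step(choose_action())
        total_reward += reward
        if done:
            return step, total_reward
    return max_steps, total_reward


steps, total = run_episode(StubEnv(), lambda: random.randrange(2))
print(f'Episode lasted {steps} steps with total reward {total}')
```

Because each step earns a reward of 1, the total reward of a random policy is simply the number of steps the pole stayed up, which is the quantity compared against the 195 target.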


This will clean up the environment.

In [6]:
env.close()