# 9. On-policy Prediction with Approximation
## CartPole-v0

### Overview
[CartPole-v0](https://github.com/openai/gym/wiki/CartPole-v0)

A pole is attached by an un-actuated joint to a cart, which moves along a frictionless track. The pendulum starts upright, and the goal is to prevent it from falling over by increasing and reducing the cart's velocity.

### State (Observation)
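| num | observation          | min                 | max               |
|----:|:---------------------|--------------------:|------------------:|
| 0   | Cart Position        | -4.8                | 4.8               |
| 1   | Cart Velocity        | -Inf                | Inf               |
| 2   | Pole Angle           | ~ -0.418 rad (-24°) | ~ 0.418 rad (24°) |
| 3   | Pole Velocity at Tip | -Inf                | Inf               |

The bounds on cart position and pole angle are twice the termination thresholds, so the observation that ends an episode still lies inside the observation space. The pole angle is reported in radians.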

### Action
| num | action          | 
|----:|:----------------|
| 0   | Push cart left  |
| 1   | Push cart right |

### Reward
A reward of +1 is given for every step taken, including the termination step.

### Termination
* Pole angle more than ±12°
* Cart Position more than ±2.4
* Episode length > 200
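
The three conditions above can be sketched as a small predicate. This is only an illustration, not the environment's own code: the helper `is_terminal` and the constant names are hypothetical, the pole angle is assumed to be given in radians (as in the observation vector), and the time limit is assumed to trigger once 200 steps have been taken.

```python
import math

# Thresholds as listed under "Termination" (names are illustrative).
ANGLE_LIMIT_RAD = math.radians(12)  # pole angle limit, about 0.2095 rad
POSITION_LIMIT = 2.4                # cart position limit
MAX_EPISODE_STEPS = 200             # assumed CartPole-v0 step limit


def is_terminal(cart_position, pole_angle_rad, steps_taken):
    """Return True if any of the three termination conditions holds."""
    pole_fell = abs(pole_angle_rad) > ANGLE_LIMIT_RAD
    cart_left_track = abs(cart_position) > POSITION_LIMIT
    # Assumption: the episode is cut off once the step limit is reached.
    time_up = steps_taken >= MAX_EPISODE_STEPS
    return pole_fell or cart_left_track or time_up
```

In practice the `done` flag returned by `env.step` already encodes these checks; the predicate is useful mainly for reasoning about why an episode ended.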

### Solved
Average reward ≥ 195.0 over 100 consecutive episodes
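
One way to track this criterion during training is a rolling window over the most recent episode rewards. The sketch below is an assumption about how one might implement it: `make_solved_tracker` is a hypothetical helper, and the window is taken to mean the latest 100 episodes.

```python
from collections import deque

SOLVED_THRESHOLD = 195.0  # required average reward
WINDOW = 100              # number of episodes averaged over


def make_solved_tracker(threshold=SOLVED_THRESHOLD, window=WINDOW):
    """Return a function that records one episode reward per call and
    reports whether the average over the last `window` episodes
    has reached `threshold`."""
    recent = deque(maxlen=window)  # old rewards drop out automatically

    def record(episode_reward):
        recent.append(episode_reward)
        # Not solved until a full window of episodes has been seen.
        return len(recent) == window and sum(recent) / window >= threshold

    return record


# Usage: feed in the raw (undiscounted) reward of each finished episode.
record = make_solved_tracker()
solved = False
for episode_reward in [200.0] * 100:
    solved = record(episode_reward)
```

The raw reward is the right quantity to pass in, since the criterion is stated in terms of undiscounted episode reward.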

### ...

In [None]:
import gym
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

In [None]:
env = gym.make("CartPole-v0")
env.seed(0)

In [None]:
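def run_episode(render=False, gamma=0.99):
    """Run one episode and return
    (raw_reward, discounted_reward, cumulative_rewards, states, actions)."""
    states             = []
    actions            = []
    raw_reward         = 0
    discounted_reward  = 0
    cumulative_rewards = []   # discounted reward accumulated before each step
    discount_factor    = 1.0
    episode_length     = 200  # step limit of CartPole-v0

    observation = env.reset()
    for t in range(episode_length):
        cumulative_rewards.append(discounted_reward)

        if render:
            env.render()
        # NOTE: `policy` is not defined in this cell; it must be defined
        # elsewhere in the notebook before run_episode is called.
        action = policy()

        # Record the state the action was taken in, and the action itself
        states.append(observation)
        actions.append(action)

        # Step
        observation, reward, done, _ = env.step(action)

        # Reward handling: the reward at step t is weighted by gamma**t
        raw_reward        += reward
        discounted_reward += reward * discount_factor
        discount_factor   *= gamma

        if done:
            break

    # Return the results even if the loop ends without `done` being set
    env.close()
    return raw_reward, discounted_reward, cumulative_rewards, states, actions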
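
The reward bookkeeping in `run_episode` can be checked in isolation, without the environment. The sketch below replays the same three update lines over a plain list of rewards and compares the result with the closed-form geometric sum, which applies here because every step yields a reward of 1. The function names `discounted_return_trace` and `geometric_return` are hypothetical helpers introduced only for this check.

```python
def discounted_return_trace(rewards, gamma=0.99):
    """Replay the reward handling of run_episode over a list of rewards.

    Returns the total discounted reward and, for each step, the
    discounted reward accumulated before that step.
    """
    discounted_reward = 0.0
    discount_factor = 1.0
    cumulative_rewards = []
    for reward in rewards:
        cumulative_rewards.append(discounted_reward)
        discounted_reward += reward * discount_factor  # reward * gamma**t
        discount_factor *= gamma
    return discounted_reward, cumulative_rewards


def geometric_return(num_steps, gamma=0.99):
    """Closed form of sum(gamma**t for t in range(num_steps)), gamma != 1."""
    return (1.0 - gamma ** num_steps) / (1.0 - gamma)


# A full 200-step episode with a reward of 1 per step.
total, trace = discounted_return_trace([1.0] * 200, gamma=0.99)
```

With `gamma=0.99`, a full 200-step episode has a discounted reward of about 86.6 even though its raw reward is 200, which is worth keeping in mind when comparing the two return values of `run_episode`.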