## Classic Control

### CartPole (v1)

_A pole is attached by an un-actuated joint to a cart, which moves along a frictionless track.  
The system is controlled by applying a force of +1 or -1 to the cart.  
The pendulum starts upright, and the goal is to prevent it from falling over.  
A reward of +1 is provided for every step that the pole remains upright.  
The episode ends when the pole is more than 12 degrees from vertical, or the cart moves more than 2.4 units from the center._

[Environment Source code (GitHub)](https://github.com/openai/gym/blob/master/gym/envs/classic_control/cartpole.py)

In [5]:
import gym

In [6]:
# Create an instance of the environment
env = gym.make("CartPole-v1")

print("Observation space type:", type(env.observation_space))
print("Observation space size:", env.observation_space.shape, "*", env.observation_space.dtype)
print("Observation space max values:", env.observation_space.high)
print("Observation space min values:", env.observation_space.low)
print("Reward range:", env.reward_range)

Observation space type: <class 'gym.spaces.box.Box'>
Observation space size: (4,) * float32
Observation space max values: [4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38]
Observation space min values: [-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38]
Reward range: (-inf, inf)


### Observation Space - _Box(4)_

| Index | Observation Type     | Min Value           | Max Value         |
| :---: | -------------------- | :-----------------: | :---------------: |
| 0     | Cart Position        | -4.8                | 4.8               |
| 1     | Cart Velocity        | _-Inf_              | _Inf_             |
| 2     | Pole Angle           | -0.41887 rad (-24°) | 0.41887 rad (24°) |
| 3     | Pole Velocity At Tip | _-Inf_              | _Inf_             |
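The index layout in the table can be easier to work with when each entry has a name. The following is a minimal sketch of that idea; the `CartPoleObservation` type, its field names, and the `unpack_observation` helper are illustrative assumptions and are not part of Gym.

```python
from typing import NamedTuple, Sequence


class CartPoleObservation(NamedTuple):
    """Named view of the 4-element observation (names are illustrative)."""
    cart_position: float      # index 0
    cart_velocity: float      # index 1
    pole_angle: float         # index 2, in radians
    pole_tip_velocity: float  # index 3


def unpack_observation(observation: Sequence[float]) -> CartPoleObservation:
    # Convert to plain floats so NumPy arrays and Python lists behave the same way.
    return CartPoleObservation(*(float(value) for value in observation))


example = unpack_observation([0.01, -0.02, 0.03, 0.04])
print(example.pole_angle)
```

Because a `NamedTuple` is still a tuple, `example[2]` and `example.pole_angle` refer to the same value, so code that indexes by position keeps working.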

### Action Space - _Discrete(2)_

| Index | Action Type            |
| :---: | ---------------------- |
| 0     | Push Cart to the Left  |
| 1     | Push Cart to the Right |
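As a small illustration of how the two action indices might be used, the sketch below names them with an enum and picks an action from the pole angle alone. The `Action` enum and the `lean_following_action` function are hypothetical helpers, and the rule assumes that a positive pole angle means the pole is leaning to the right; it is a simple heuristic for illustration, not a solution to the environment.

```python
from enum import IntEnum
from typing import Sequence


class Action(IntEnum):
    """Names for the two discrete actions (the enum itself is illustrative)."""
    PUSH_LEFT = 0
    PUSH_RIGHT = 1


def lean_following_action(observation: Sequence[float]) -> Action:
    # Index 2 is the pole angle; it is assumed to be positive when the pole leans right.
    pole_angle = observation[2]
    # Push the cart toward the side the pole is leaning to, to get back under it.
    return Action.PUSH_RIGHT if pole_angle > 0 else Action.PUSH_LEFT


print(int(lean_following_action([0.0, 0.0, 0.05, 0.0])))
```

Since `Action` is an `IntEnum`, its members compare equal to the plain integers `0` and `1` that the action space uses.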

### Reward
The reward is +1 for every step taken, including the termination step. The reward threshold is 195 for v0 and 475 for v1.

### Starting State
![image](https://user-images.githubusercontent.com/53531617/148705742-ec6e0832-1f2d-422b-adce-9f1af5a264dc.png)
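The comment in this notebook's `play` function describes the initial state as a uniform random value between ±0.05. A minimal sketch of that sampling, assuming each of the four state variables is drawn independently, could look as follows; `sample_initial_state` is a hypothetical helper and not the environment's own reset code.

```python
import random
from typing import List, Optional


def sample_initial_state(seed: Optional[int] = None) -> List[float]:
    # A dedicated generator keeps the sketch reproducible when a seed is given.
    rng = random.Random(seed)
    # One draw per state variable: cart position, cart velocity,
    # pole angle, pole velocity at tip.
    return [rng.uniform(-0.05, 0.05) for _ in range(4)]


state = sample_initial_state(seed=0)
print(state)
```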

### Termination Conditions
1) Pole angle exceeds ±12°  
2) Cart position exceeds ±2.4 (center of the cart reaches the edge of the display)  
3) Episode length exceeds 200 steps (500 for v1)  
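The three conditions above can be sketched as a single check on an observation and a step counter. This is an illustrative reimplementation, not the environment's own code: the constant names and `is_episode_over` are hypothetical, and the step limit is interpreted here as "the episode ends once the number of steps taken reaches the limit". Note that the termination thresholds are half the observation-space bounds listed earlier (2.4 versus 4.8, and 12° versus 24°).

```python
import math
from typing import Sequence

POLE_ANGLE_LIMIT_RAD = math.radians(12)     # 12 degrees, roughly 0.2095 rad
CART_POSITION_LIMIT = 2.4                   # units from the center of the track
MAX_EPISODE_STEPS = {"v0": 200, "v1": 500}  # step limits per environment version


def is_episode_over(
    observation: Sequence[float], steps_taken: int, version: str = "v1"
) -> bool:
    cart_position, _, pole_angle, _ = observation
    pole_fell = abs(pole_angle) > POLE_ANGLE_LIMIT_RAD
    cart_left_track = abs(cart_position) > CART_POSITION_LIMIT
    # Assumption: the episode is cut off once the step limit has been reached.
    out_of_time = steps_taken >= MAX_EPISODE_STEPS[version]
    return pole_fell or cart_left_track or out_of_time


print(is_episode_over([0.0, 0.0, math.radians(13), 0.0], steps_taken=10))
```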

### Solved Requirements
The environment is considered solved when the average reward is greater than or equal to 195.0 (475.0 for v1) over 100 consecutive trials.
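One way to check this criterion is to keep a sliding window over per-episode total rewards. The sketch below assumes the rewards arrive as a plain sequence of episode totals; `is_solved` is a hypothetical helper, with the threshold defaulting to 195.0 and overridable (for example to the 475 threshold mentioned in the Reward section).

```python
from collections import deque
from typing import Deque, Iterable


def is_solved(
    episode_rewards: Iterable[float], threshold: float = 195.0, window: int = 100
) -> bool:
    # Keep only the most recent `window` episode totals.
    recent: Deque[float] = deque(maxlen=window)
    for total_reward in episode_rewards:
        recent.append(total_reward)
        # Only a full window of consecutive episodes counts toward the criterion.
        if len(recent) == window and sum(recent) / window >= threshold:
            return True
    return False


print(is_solved([200.0] * 100))
print(is_solved([200.0] * 99))
```

The first call covers a full window of 100 episodes at an average of 200, while the second has only 99 episodes and therefore cannot satisfy the criterion yet.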

In [7]:
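def play(agent):
    # Initialize the environment state (uniform random value between ±0.05)
    current_state = env.reset()

    for _ in range(500):
        action = agent.get_action()
        # Older Gym API (before 0.26): step() returns (observation, reward, done, info)
        current_state, reward, done, info = env.step(action)
        env.render()
        # Stop once the episode has ended; stepping further triggers a Gym warning
        if done:
            break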
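The `play` function only requires that the agent expose a `get_action()` method. The agent class used in this notebook is not shown here, so the following is a hypothetical stand-in (named `UniformRandomAgent` to avoid confusion with the notebook's own class) that assumes the environment's action space provides `sample()`, as Gym spaces do.

```python
class UniformRandomAgent:
    """Hypothetical agent that ignores the state and acts at random."""

    def __init__(self, env):
        # Gym action spaces provide sample(), which draws a random valid action.
        self.action_space = env.action_space

    def get_action(self):
        # For CartPole this returns 0 (push left) or 1 (push right).
        return self.action_space.sample()
```

With an environment created as above, such an agent could be passed to the loop as `play(UniformRandomAgent(env))`.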

In [8]:
from RandomAgent import RandomAgent

random_agent = RandomAgent(env)
play(random_agent)


