

## CartPole-v1 Environment

This notebook uses the widely used CartPole environment, version 1 (```CartPole-v1```).


![CartPole](../assets/cart_pole.gif)

**Action Space** - Type : ```Discrete(2)```

- ```0``` : Push cart to the left
- ```1``` : Push cart to the right

**Observation Space** - Type : ```Box(-num, num, (4,), float32)```

- Cart Position ```(-4.8, 4.8)```
- Cart Velocity ```(-inf, inf)```
- Pole Angle ```(-0.41, 0.41)``` (in radians)
- Pole Angular Velocity ```(-inf, inf)```
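
The observation bounds listed above are wider than the range in which an episode keeps running. As a minimal sketch, assuming the failure thresholds given in the Gymnasium documentation for CartPole (cart position beyond ±2.4 or pole angle beyond ±12°), the check below flags an observation that would end the episode. The helper name ```is_out_of_bounds``` and the constant names are illustrative and not part of the Gymnasium API.

```python
import math

# Assumed failure thresholds, taken from the Gymnasium CartPole documentation
X_THRESHOLD = 2.4                   # cart position limit
THETA_THRESHOLD = math.radians(12)  # pole angle limit, about 0.2095 rad

def is_out_of_bounds(obs):
  """Return True if the observation lies outside the assumed failure thresholds."""
  x, _x_dot, theta, _theta_dot = obs
  return abs(x) > X_THRESHOLD or abs(theta) > THETA_THRESHOLD

print(is_out_of_bounds([0.0, 0.0, 0.0, 0.0]))   # False
print(is_out_of_bounds([2.5, 0.0, 0.0, 0.0]))   # True (cart too far right)
print(is_out_of_bounds([0.0, 0.0, 0.21, 0.0]))  # True (pole tilted too far)
```

Only the cart position and the pole angle enter this check; the two velocity components are unbounded and are ignored here.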

**Reward** - Type : ```float32```

The environment returns a reward of ```1.0``` for every step taken.<br>
A fully successful episode therefore collects a maximum total reward of ```500.0```, because each episode is truncated after at most ```500``` actions.

**Done Flag (Termination and Truncation)** - Type : ```bool```

Each step returns the following two flags, which together indicate whether the episode is over.

- Termination flag : ```True``` when the agent fails and the episode cannot continue, otherwise ```False```.
- Truncation flag : ```True``` when the agent reaches the maximum of 500 actions (that is, it is still successful at the final action), otherwise ```False```. The episode cannot continue in this case either.
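
The two flags can be combined with the per-step reward to compute an episode's total reward and to tell a failure apart from a reached step limit. The sketch below assumes an environment with the same ```reset()``` / ```step()``` interface as Gymnasium. ```StubEnv``` is a hypothetical stand-in (it does not simulate the real CartPole dynamics) used only so that the snippet runs without Gymnasium or a display, and ```run_episode``` is an illustrative helper name.

```python
class StubEnv:
  """Hypothetical stand-in with a Gymnasium-style API; not the real CartPole."""
  def __init__(self, fail_at=None, max_steps=500):
    self.fail_at = fail_at      # step at which the agent "fails" (None = never)
    self.max_steps = max_steps  # step limit that triggers truncation
    self.steps = 0

  def reset(self):
    self.steps = 0
    return [0.0, 0.0, 0.0, 0.0], {}

  def step(self, action):
    self.steps += 1
    term = self.fail_at is not None and self.steps >= self.fail_at
    trunc = (not term) and self.steps >= self.max_steps
    # Reward is 1.0 for every step, as described above
    return [0.0, 0.0, 0.0, 0.0], 1.0, term, trunc, {}

def run_episode(env, policy):
  """Run one episode and return (total_reward, terminated, truncated)."""
  s, _ = env.reset()
  total, term, trunc = 0.0, False, False
  while not (term or trunc):
    s, r, term, trunc, _ = env.step(policy(s))
    total += r
  return total, term, trunc

print(run_episode(StubEnv(fail_at=10), lambda s: 0))  # (10.0, True, False)
print(run_episode(StubEnv(), lambda s: 0))            # (500.0, False, True)
```

The loop relies only on the two-value return of ```reset()``` and the five-value return of ```step()```, so an environment created with ```gym.make("CartPole-v1")``` should be usable in place of ```StubEnv()```.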

**Sample Code to run CartPole**

The following sample code runs a CartPole agent that picks its actions at random.

In [4]:
import gymnasium as gym
import random

def pick_sample():
  # Choose action 0 (left) or 1 (right) uniformly at random
  return random.randint(0, 1)

# render_mode='human' opens a window that displays the cart
env = gym.make("CartPole-v1", render_mode='human')
for i in range(1):
  print("start episode {}".format(i))
  done = False
  s, _ = env.reset()
  while not done:
    a = pick_sample()
    s, r, term, trunc, _ = env.step(a)
    # The episode ends on termination (failure) or truncation (step limit)
    done = term or trunc
    print("action: {},  reward: {}".format(a, r))
    print("state: {}, {}, {}, {}".format(s[0], s[1], s[2], s[3]))

#env.close()



start episode 0
action: 1,  reward: 1.0
state: 0.04298793524503708, 0.17315316200256348, -0.013376884162425995, -0.29574650526046753
action: 1,  reward: 1.0
state: 0.04645099863409996, 0.3684632182121277, -0.019291814416646957, -0.5926181077957153
action: 0,  reward: 1.0
state: 0.05382026359438896, 0.17361657321453094, -0.031144175678491592, -0.306073933839798
action: 1,  reward: 1.0
state: 0.057292595505714417, 0.36916816234588623, -0.037265654653310776, -0.6084139943122864
action: 1,  reward: 1.0
state: 0.0646759569644928, 0.5647907257080078, -0.049433935433626175, -0.9125977158546448
action: 1,  reward: 1.0
state: 0.07597177475690842, 0.7605453729629517, -0.06768588721752167, -1.2203985452651978
action: 1,  reward: 1.0
state: 0.09118267893791199, 0.9564712047576904, -0.09209386259317398, -1.5334988832473755
action: 1,  reward: 1.0
state: 0.11031210422515869, 1.1525741815567017, -0.12276383489370346, -1.8534440994262695
action: 1,  reward: 1.0
state: 0.13336358964443207, 1.3488132953

In [5]:
env.close()