# Project: Deep Reinforcement Learning - CartPole Environment
**Author:** Sayed Pedram Haeri Boroujeni  
**Position:** PhD Student, Clemson University  
**Affiliation:** Department of Computer Science  
**Email:** shaerib@g.clemson.edu  
**Date Created:** June 10, 2025  
**Last Updated:** June 15, 2025 

## CartPole Environment:
The CartPole-v1 environment models a pole attached by an un-actuated joint to a cart that moves along a frictionless track. At each time step, the agent applies a force to the cart to keep the pole balanced upright. The action space is discrete and the observation space is continuous.
%%html
<img src="attachment:image.png" width="300" height="400" />

- **State Space**: 
  
  The observation is a 4-dimensional continuous vector consisting of:
  - Cart Position (x)
  - Cart Velocity (ẋ) 
  - Pole Angle (θ)
  - Pole Angular Velocity (θ̇)
  
  Each of these values is returned as a floating-point number on every step.
  
  <br>
  
- **Action Space**:
  - 0 — Push cart to the left
  - 1 — Push cart to the right
  
  <br>
  
- **Objective**:
  - Keep the pole balanced upright by selecting left or right forces at each time step.
  
  The episode ends when any of the following occurs:
  - Pole angle exceeds ±12° (≈0.209 rad) from vertical (reported as `terminated`).
  - Cart position exceeds ±2.4 units from the center (reported as `terminated`).
  - 500 time steps have elapsed, the default horizon in CartPole-v1 (reported as `truncated`).
  
  <br>
  
- **Reward Signal**:
  - +1 for every step taken, including the step on which the episode ends.
  - The maximum achievable return per episode is 500.
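
The stopping conditions and reward scheme listed above can be sketched in plain Python, without Gym. The helper names (`is_out_of_bounds`, `episode_return`) and the constants are illustrative assumptions, not part of the Gym API; the thresholds are the ones given in the list.

```python
import math

# Thresholds copied from the list above; the names are hypothetical.
ANGLE_LIMIT_RAD = math.radians(12)  # about 0.209 rad
POSITION_LIMIT = 2.4                # units from the center of the track
MAX_STEPS = 500                     # default horizon in CartPole-v1


def is_out_of_bounds(observation):
    """Return True if the pole angle or the cart position exceeds its limit."""
    x, x_dot, theta, theta_dot = observation
    return abs(x) > POSITION_LIMIT or abs(theta) > ANGLE_LIMIT_RAD


def episode_return(num_steps):
    """Undiscounted return for an episode of num_steps steps at +1 per step."""
    return float(min(num_steps, MAX_STEPS))


print(is_out_of_bounds([0.0, 0.0, 0.05, 0.0]))  # upright and centered
print(is_out_of_bounds([0.0, 0.0, 0.25, 0.0]))  # pole past 12 degrees
print(is_out_of_bounds([2.5, 0.0, 0.0, 0.0]))   # cart past 2.4 units
print(episode_return(37))
```

Only the cart position and the pole angle enter the check; the two velocities are observed but never end an episode on their own.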
  
You can create the CartPole environment as follows (see the FrozenLake file for the basic definitions of OpenAI Gym environments):

In [None]:
import gym

In [None]:
env = gym.make("CartPole-v1", render_mode="human")

Because the state space is continuous, `env.observation_space` reports the lower and upper bounds of each state variable; the action space, as in FrozenLake, is discrete.

In [None]:
print("State Space:", env.observation_space)
print("Action Space:", env.action_space)
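
As a rough guide to reading the printed bounds: to my knowledge, Gym sets the position and angle bounds of the observation space to twice the termination thresholds and leaves both velocities effectively unbounded. The sketch below reproduces those numbers in plain Python. The variable names are illustrative, and the values should be checked against the actual output of the cell above, since details can differ between Gym versions.

```python
import math

# Termination thresholds from the environment description.
x_threshold = 2.4
theta_threshold = math.radians(12)

# Assumption: observation bounds are twice the termination thresholds,
# and the velocities are unbounded (Gym may print the largest float32 instead of inf).
names = ["cart position", "cart velocity", "pole angle", "pole angular velocity"]
high = [2 * x_threshold, math.inf, 2 * theta_threshold, math.inf]
low = [-value for value in high]

for name, lower, upper in zip(names, low, high):
    print(f"{name:>22}: [{lower:.4f}, {upper:.4f}]")
```

Under this assumption, an observation can lie inside the printed bounds and still end the episode, because the termination thresholds are tighter than the space itself.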

In [None]:
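env.reset()          # start a new episode and get the initial observation
env.render()         # draw the current frame
print(env.step(1))   # push right; prints (state, reward, terminated, truncated, info)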

In [None]:
random_action = env.action_space.sample()
next_state, reward, terminated, truncated, info = env.step(random_action)
print(f"  → state={next_state}, reward={reward}, terminated={terminated}, truncated={truncated}")

In [None]:
env.close()

The following code runs 5 episodes with randomly sampled actions so that you can watch the agent move in the environment.

In [None]:
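import gym
import time

env = gym.make("CartPole-v1", render_mode="human")

episode_number = 5

for i in range(episode_number):

    print(f"..... Episode {i+1} started ........")

    # reset() returns the initial observation and an info dictionary
    state, info = env.reset()

    print(f"  → state={state}")

    while True:

        # Sample a random action (0 = left, 1 = right) and apply it;
        # with render_mode="human" the window is redrawn on every step,
        # so no explicit env.render() call is needed here
        action = env.action_space.sample()
        next_state, reward, terminated, truncated, info = env.step(action)

        print(f"  → action={action}, state={next_state}, reward={reward}, terminated={terminated}, truncated={truncated}")

        # Slow the loop down so the animation is easy to follow
        time.sleep(0.1)

        # Stop when the pole falls or the cart leaves the track (terminated)
        # or when the 500-step limit is reached (truncated)
        if terminated or truncated:
            break

env.close()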
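
The loop above prints every step but does not record how long each episode lasted. A minimal sketch of accumulating the episode return is shown below. So that it runs without a display or a Gym installation, it uses a hypothetical `StubEnv` class that only mimics the `reset`/`step` signatures used above. `StubEnv`, `run_episode`, and `random_policy` are illustrative names, not part of Gym.

```python
import random


class StubEnv:
    """Hypothetical stand-in for CartPole: fails at a fixed step or hits a horizon."""

    def __init__(self, fail_at, horizon):
        self.fail_at = fail_at
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return [0.0, 0.0, 0.0, 0.0], {}

    def step(self, action):
        self.t += 1
        terminated = self.t >= self.fail_at
        truncated = (not terminated) and self.t >= self.horizon
        return [0.0, 0.0, 0.0, 0.0], 1.0, terminated, truncated, {}


def random_policy(state):
    """Ignore the state and pick left (0) or right (1) uniformly at random."""
    return random.choice([0, 1])


def run_episode(env, policy):
    """Run one episode and return (total reward, steps, terminated, truncated)."""
    state, info = env.reset()
    total_reward, steps = 0.0, 0
    while True:
        state, reward, terminated, truncated, info = env.step(policy(state))
        total_reward += reward
        steps += 1
        if terminated or truncated:
            return total_reward, steps, terminated, truncated


print(run_episode(StubEnv(fail_at=3, horizon=10), random_policy))
print(run_episode(StubEnv(fail_at=99, horizon=10), random_policy))
```

Because `run_episode` relies only on `reset` and `step`, it should also accept the environment returned by `gym.make("CartPole-v1")`, assuming a Gym version in which `step` returns five values as in the cells above.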