# Random Baseline CartPole

In [1]:
import gymnasium as gym
from pathlib import Path
from gymnasium.wrappers import RecordVideo
from datetime import datetime
from stable_baselines3.common.monitor import Monitor
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [2]:
docs_path = Path("../../documentation/cartpole/random-baseline")  # each "../" steps up one directory, so this resolves two levels above the current working directory

run_id = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
run_path = docs_path / f"run_{run_id}"

video_dir = run_path / "videos"
graphs_dir  = run_path / "graphs"
report_file = run_path / "random_baseline.md"
monitor_dir = run_path / "monitor"

docs_path.mkdir(parents=True, exist_ok=True)
graphs_dir.mkdir(parents=True, exist_ok=True)
video_dir.mkdir(parents=True, exist_ok=True)
monitor_dir.mkdir(parents=True, exist_ok=True)

monitor_file = str(monitor_dir)  # Monitor is given the directory and writes "monitor.csv" inside it


## Demo Run: 1 episode

In [3]:
env = gym.make("CartPole-v1", render_mode="rgb_array") 
env = RecordVideo(
    env,
    video_folder=str(video_dir),
    episode_trigger=lambda e: True,
    name_prefix="cartpole_random_baseline" 
)

In [4]:
# Reset environment
observation, info = env.reset(seed=42)

print(f"Action space: {env.action_space}")  # Discrete(2): push the cart left (0) or right (1)
print(f"Observation space: {env.observation_space}")  # Box with 4 values: everything the agent can observe
print(f"Starting observation: {observation}")
print(f"Maximum number of steps per episode: {env.spec.max_episode_steps}")

In [5]:
step = 0
total_reward = 0
episode_over = False

In [6]:
labels = ["cart position", "cart velocity", "pole angle", "pole angular velocity"]

while not episode_over:
    step += 1
    action = env.action_space.sample()  
    observation, reward, terminated, truncated, info = env.step(action)

    total_reward += reward
    episode_over = terminated or truncated

    print(f"Step {step}:")
    print(f"Action taken: {action}")

    for label, value in zip(labels, observation):
        print(f"{label}: {value}")

    print(f"Reward: {reward}")
    print(f"Terminated: {terminated}, Truncated: {truncated}")
    print("-" * 50)

print(f"Episode finished! Total reward: {total_reward}")
env.close()

In [7]:
obs_explanation = """\
**Observation vector (4 values):**
1. **Cart Position (m)**: horizontal position on the track (≈ -4.8 to +4.8).
2. **Cart Velocity (m/s)**: how fast the cart moves (unbounded float in practice).
3. **Pole Angle (rad)**: tilt of the pole relative to vertical (≈ -0.4189 to +0.4189 rad ≈ ±24°).
4. **Pole Angular Velocity (rad/s)**: how fast the pole is rotating (unbounded float in practice).
"""

failure_conditions = """\
**Episode ends when (termination/truncation):**
- **Pole tilt exceeds ±0.2095 rad (~±12°)** → `terminated = True`
- **Cart position leaves ±2.4 m** → `terminated = True`
- **Time limit of 500 steps is reached** → `truncated = True`

Note: the observation-space bounds (±4.8 m, ±0.4189 rad) are twice these termination thresholds.
"""

with open(report_file, "w", encoding="utf-8") as f:  # explicit UTF-8 so ±, ≈ and → are written correctly
    f.write("# SCRUM-15: Researching Cartpole test write\n\n")
    f.write("## Environment Details\n")
    f.write(f"- Action space: {env.action_space}\n")
    f.write(f"- Observation space: {env.observation_space}\n")
    f.write(f"- Maximum steps per episode: {env.spec.max_episode_steps}\n\n")

    f.write("## Observation Meaning\n")
    f.write(obs_explanation + "\n")

    f.write("## Failure Conditions\n")
    f.write(failure_conditions + "\n")


    f.write("## Example Run\n")
    f.write(f"- Final observation: {observation.tolist()}\n")  # `observation` was overwritten on every step of the demo loop
    f.write(f"- Total reward: {total_reward}\n")
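
The end-of-episode rules described in the report can be checked by hand against a single observation. The sketch below is a simplified, standalone illustration, not the environment's own code: `classify_step` is a hypothetical helper, and the constants are assumed from the Gymnasium documentation for CartPole-v1, where the termination thresholds (±2.4 m, ±12°) are half the observation-space bounds (±4.8 m, ±24°).

```python
import math

# Assumed CartPole-v1 limits (from the Gymnasium documentation, not read from the env here)
X_THRESHOLD = 2.4                          # metres from the track centre
THETA_THRESHOLD = 12 * 2 * math.pi / 360   # 12 degrees in radians (~0.2095)
MAX_EPISODE_STEPS = 500                    # time limit applied to CartPole-v1


def classify_step(observation, step):
    """Hypothetical helper: return (terminated, truncated) for one observation."""
    cart_position, _cart_velocity, pole_angle, _pole_velocity = observation
    # Termination: the cart left the allowed range or the pole tilted too far
    terminated = abs(cart_position) > X_THRESHOLD or abs(pole_angle) > THETA_THRESHOLD
    # Truncation: the step budget is used up, regardless of the state
    truncated = step >= MAX_EPISODE_STEPS
    return terminated, truncated


print(classify_step([0.0, 0.1, 0.05, -0.2], step=10))   # healthy state
print(classify_step([0.0, 0.1, 0.25, -0.2], step=10))   # pole beyond 12 degrees
print(classify_step([0.0, 0.0, 0.0, 0.0], step=500))    # time limit reached
```

A pole angle of 0.25 rad is still inside the observation space but already past the termination threshold, which is why the two ranges should not be confused.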



## Experimental Run for Analysis (100 episodes)

In [None]:
def run_random_baseline_with_monitor(video_dir, monitor_file, num_episodes=100, seed=42):
    print("\nRunning random baseline and logging rewards per episode...")

    monitor_path = Path(monitor_file)
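    if monitor_path.is_dir():
        # Point at the CSV inside the directory so the path logged below is the actual file
        monitor_path = monitor_path / "monitor.csv"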

    env = gym.make("CartPole-v1", render_mode="rgb_array")
    env = Monitor(env, str(monitor_path))
    env = RecordVideo(
        env,
        video_folder=str(video_dir),
        episode_trigger=lambda e: e < 5,
        name_prefix="cartpole_random_baseline"
    )

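    # reset(seed=...) seeds the environment only; seed the action space so sampled actions are reproducible too
    env.action_space.seed(seed)

    rewards = []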
    for ep in range(num_episodes):
        obs, info = env.reset(seed=seed + ep)
        done = False
        total_reward = 0
        while not done:
            action = env.action_space.sample()
            obs, reward, terminated, truncated, info = env.step(action)
            done = terminated or truncated
            total_reward += reward
        rewards.append(total_reward)
        print(f"Episode {ep + 1}: reward = {total_reward}")

    env.close()
    print(f"Logged rewards to {monitor_path}")
    return np.mean(rewards)
    
run_random_baseline_with_monitor(video_dir, monitor_file, num_episodes=100, seed=42)
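
The function above returns only the mean reward. For a baseline that later agents will be compared against, the spread is often as informative as the mean. A minimal sketch using only the standard library follows; `summarize_rewards` is a hypothetical helper and the reward values are illustrative, not results from this run.

```python
import statistics


def summarize_rewards(rewards):
    """Hypothetical helper: basic statistics over per-episode total rewards."""
    return {
        "episodes": len(rewards),
        "mean": statistics.fmean(rewards),
        "std": statistics.pstdev(rewards),  # population standard deviation
        "min": min(rewards),
        "max": max(rewards),
    }


# Illustrative values only, not measured results
example_rewards = [12.0, 25.0, 18.0, 31.0, 14.0]
summary = summarize_rewards(example_rewards)
for name, value in summary.items():
    print(f"{name}: {value}")
```

To use it with the notebook's function, the `rewards` list would have to be returned instead of (or alongside) `np.mean(rewards)`.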

In [None]:
def plot_random_baseline_curve(monitor_file, graphs_dir):
    monitor_path = Path(monitor_file)

    if monitor_path.is_dir():
        monitor_path = monitor_path / "monitor.csv"

    if not monitor_path.exists():
        raise FileNotFoundError(f"No monitor file found at {monitor_path}")

    df = pd.read_csv(monitor_path, skiprows=1)  # skip the Monitor metadata header on the first line
    plt.figure(figsize=(10, 6))
    plt.plot(df["r"], label="Reward per Episode", color="tab:red", alpha=0.7)
    plt.xlabel("Episodes")
    plt.ylabel("Total Reward")
    plt.title("Random Baseline Learning Curve (CartPole-v1)")
    plt.grid(True, linestyle="--", alpha=0.6)
    plt.legend()
    out = Path(graphs_dir) / "random_baseline_curve.png"
    plt.savefig(out, dpi=200, bbox_inches="tight")
    plt.close()

    print(f"📈 Random baseline curve saved → {out}")

plot_random_baseline_curve(monitor_file, graphs_dir)
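
Per-episode rewards from a random policy are noisy, so a smoothed series can make the plot easier to read. The sketch below assumes the Monitor log layout used above (one `#`-prefixed metadata line, then a CSV with columns `r`, `l`, `t`) and parses it with the standard library only. The sample text and both helper names are made up for illustration.

```python
import csv

# Illustrative monitor.csv content with made-up numbers
SAMPLE_MONITOR_CSV = """\
#{"t_start": 0.0, "env_id": "CartPole-v1"}
r,l,t
12.0,12,0.05
25.0,25,0.11
18.0,18,0.16
31.0,31,0.24
"""


def read_episode_rewards(text):
    """Hypothetical helper: parse the 'r' column, skipping the metadata line."""
    lines = text.splitlines()[1:]  # drop the '#{...}' header
    reader = csv.DictReader(lines)
    return [float(row["r"]) for row in reader]


def moving_average(values, window):
    """Trailing moving average; early points average whatever is available."""
    averaged = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        averaged.append(sum(chunk) / len(chunk))
    return averaged


rewards = read_episode_rewards(SAMPLE_MONITOR_CSV)
smoothed = moving_average(rewards, window=2)
print(smoothed)
```

With pandas already loaded in this notebook, `df["r"].rolling(window, min_periods=1).mean()` would give the same kind of series and could be drawn as a second line on the existing figure.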