# RemoteRL Quick-Start: Train a CartPole Agent Remotely with Stable-Baselines3

## Introduction

> In this notebook, we train a classic **CartPole-v1** reinforcement learning agent remotely using Stable-Baselines3. The environment (CartPole) runs in a separate process or on a remote machine (the *simulator*), while the training algorithm runs in your main process or on your local machine (the *trainer*).
> **RemoteRL** makes this easy: call `remoterl.init()` in your code to link the trainer and simulator through the RemoteRL cloud service. Once connected, every Gym environment call from the trainer (e.g. `reset`, `step`) executes on the remote simulator, with no manual networking setup.

1. **Setup: Install Dependencies**
2. **Get an API Key**
3. **Launch a Remote Simulator**
4. **Start the Trainer and Train the Agent**

## 1. Setup: Install Dependencies

In [1]:
!pip install --quiet remoterl stable-baselines3
import remoterl, stable_baselines3 as sb3
print(remoterl.__all__)

['init', 'shutdown']


*The cell above installs the RemoteRL Python library and Stable-Baselines3, then prints RemoteRL's public API (`init` and `shutdown`).*


## 2. Get an API Key

RemoteRL requires an API key to connect to its cloud service. If you don't have one, you can create a free account on the RemoteRL website to obtain your API key.

*Each free account includes **1 GB** of data per month (approximately 1 million CartPole steps).*
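As a rough sanity check on that quota, the two figures quoted above imply a per-step cost of about one kilobyte. The sketch below only restates that arithmetic; it assumes 1 GB means 10^9 bytes, and the helper name `estimated_usage_bytes` is made up for illustration. Actual traffic per step may differ from this average.

```python
# Back-of-envelope data budget, using only the figures quoted above.
# Assumption: 1 GB is treated as 10**9 bytes.
FREE_QUOTA_BYTES = 10**9              # 1 GB per month on the free tier
APPROX_STEPS_PER_QUOTA = 1_000_000    # "approximately 1 million CartPole steps"

bytes_per_step = FREE_QUOTA_BYTES / APPROX_STEPS_PER_QUOTA


def estimated_usage_bytes(num_steps: int) -> float:
    """Estimate traffic for a run of `num_steps` environment steps."""
    return num_steps * bytes_per_step


run_steps = 20_000   # approximate size of the training run in this notebook
usage = estimated_usage_bytes(run_steps)
print(f"~{bytes_per_step:.0f} bytes per step")
print(f"~{usage / 1e6:.0f} MB for {run_steps:,} steps "
      f"({usage / FREE_QUOTA_BYTES:.0%} of the monthly quota)")
```

By this estimate, the 20,000-step demo in this notebook uses about 2% of the free monthly allowance.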

After signing up, copy your API key. *The next two code cells define and run a helper that opens your RemoteRL dashboard in a new browser tab and prompts you to paste the API key into this notebook.*

In [2]:
import os, webbrowser
DASHBOARD_URL = "https://remoterl.com/user/dashboard"

def register_api_key(open_browser=True):
    """Return the RemoteRL API key, prompting for it when open_browser is True."""
    if open_browser:
        webbrowser.open_new_tab(DASHBOARD_URL)
        key = input("Paste your REMOTERL API key: ").strip()
    else:
        key = os.getenv("REMOTERL_API_KEY", "").strip()
    if not key:
        print(f"Please visit {DASHBOARD_URL} to get your API key.")
        raise RuntimeError("API key required.")

    # Export the key only after it has been validated as non-empty.
    os.environ["REMOTERL_API_KEY"] = key
    print("✅ RemoteRL registered. Happy training!")
    return key

In [3]:
# If the browser window doesn't open automatically, visit the dashboard at
# https://remoterl.com/user/dashboard (already stored in DASHBOARD_URL above).

key = register_api_key(open_browser=True)

✅ RemoteRL registered. Happy training!
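One drawback of `input()` is that the pasted key is echoed into the notebook output. A possible alternative, sketched below, reads the key from the environment first and otherwise prompts with the standard-library `getpass`, which does not echo. The function name `read_api_key` and its `prompt_fn` parameter are illustrative and not part of RemoteRL; only the `REMOTERL_API_KEY` variable name comes from the helper above.

```python
import getpass
import os

ENV_VAR = "REMOTERL_API_KEY"   # same variable name the helper above uses


def read_api_key(prompt_fn=getpass.getpass) -> str:
    """Return the API key from the environment, or prompt for it without echo.

    `prompt_fn` is injectable so the function can be exercised without a terminal.
    """
    key = os.environ.get(ENV_VAR, "").strip()
    if not key:
        key = prompt_fn("Paste your RemoteRL API key: ").strip()
    if not key:
        raise RuntimeError("API key required.")
    os.environ[ENV_VAR] = key   # visible to later cells and to child processes
    return key
```

Calling `key = read_api_key()` would then stand in for `register_api_key(open_browser=True)` when the key should stay out of the saved notebook output.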



## 3. Launch a Remote Simulator

Now we'll launch a **simulator** process to host the CartPole environment remotely. In this demo, the simulator will run as a separate background process on this machine. (In a real-world scenario, you could run the simulator on another machine or in the cloud just as easily.)

Once the simulator starts, it will connect to the RemoteRL service and wait for the trainer to join. You should see log output in the cell indicating that the simulator is connected and ready to handle environment steps.

In [None]:
# ── Cell A · starts an isolated simulator (works on Windows, Linux, macOS) ──
import os, sys, textwrap, subprocess, time
import threading

# ------------------------------------------------------------------
# 1️⃣  Grab API key from the environment (set earlier by register_api_key)
# ------------------------------------------------------------------
API_KEY = os.getenv("REMOTERL_API_KEY", key)

# ------------------------------------------------------------------
# 2️⃣  Build the short Python script that will run inside the child process
#     (`remoterl.init(..., role="simulator")` is intentionally blocking)
# ------------------------------------------------------------------
sim_code = textwrap.dedent(f"""
    import remoterl

    # {API_KEY!r} in the parent embeds the key as a safely quoted string literal.
    remoterl.init(api_key={API_KEY!r}, role='simulator')   # blocks by design (simulator role)
""")

# ------------------------------------------------------------------
# 3️⃣  Spawn the simulator subprocess (same Python executable, unbuffered)
#     - stdout/stderr are piped so we can echo logs back in real time
# ------------------------------------------------------------------
sim_proc = subprocess.Popen(
    [sys.executable, "-u", "-c", sim_code],
    stdout=subprocess.PIPE,            # stream simulator logs to parent
    stderr=subprocess.STDOUT,
    text=True,
)

# ------------------------------------------------------------------
# 4️⃣  Background thread: continuously forward simulator output
# ------------------------------------------------------------------
def stream_logs(proc):
    for line in iter(proc.stdout.readline, ''):   # keep reading until EOF
        print(f"[sim] {line.rstrip()}")

log_thread = threading.Thread(target=stream_logs, args=(sim_proc,), daemon=True)
log_thread.start()

# ------------------------------------------------------------------
# 5️⃣  Confirmation + optional head‑start delay
# ------------------------------------------------------------------
print(f"🚀  Simulator subprocess started (pid={sim_proc.pid})")

time.sleep(10)                          # give the simulator time to connect

🚀  Simulator subprocess started (pid=38004)
[sim] [RemoteRL] Simulator started, waiting for connection to Trainers...

[sim] [RemoteRL] Connected | trainer=cbb65cd2
[sim] [RemoteRL] Session started | trainer=cbb65cd2 | num_env_runners=2
[sim] [RemoteRL] Remote Environment | seq=1 make:{'0': 'CartPole-v1'} | runner=0
[sim] [RemoteRL] Remote Environment | seq=1 make:{'0': 'CartPole-v1'} | runner=1
[sim] [RemoteRL] Remote Environment | seq=8 make:{'7': 'CartPole-v1'} | runner=0
[sim] [RemoteRL] Remote Environment | seq=8 make:{'7': 'CartPole-v1'} | runner=1
[sim] [RemoteRL] Remote Environment | seq=64 step | runner=0
[sim] [RemoteRL] Remote Environment | seq=64 step | runner=1
[sim] [RemoteRL] Remote Environment | seq=512 reset | runner=0
[sim] [RemoteRL] Remote Environment | seq=512 reset | runner=1
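Because the simulator call blocks, the child process keeps running in the background until it is stopped explicitly. The sketch below shows one generic way to stop such a subprocess with the standard library; the helper name `stop_process` is illustrative, and whether the RemoteRL simulator needs a gentler shutdown than a terminate signal is not covered here. The demonstration uses a stand-in child that simply sleeps.

```python
import subprocess
import sys


def stop_process(proc: subprocess.Popen, timeout: float = 5.0) -> int:
    """Terminate `proc` politely, escalating to kill() if it does not exit in time."""
    if proc.poll() is None:          # still running
        proc.terminate()             # SIGTERM on POSIX, TerminateProcess on Windows
        try:
            proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            proc.kill()              # force-stop as a last resort
            proc.wait()
    return proc.returncode


# Demonstration with a stand-in child that just sleeps, like a blocking simulator.
demo_proc = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
exit_code = stop_process(demo_proc)
print(f"child exited with code {exit_code}")
```

In this notebook, the equivalent call once training has finished would be `stop_process(sim_proc)`.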


## 4. Start the Trainer and Train the Agent

Finally, we start the **trainer** process to train our agent using the remote environment. We’ll use the **PPO** algorithm (Proximal Policy Optimization) from Stable-Baselines3 to train the CartPole agent. The code below will initialize the trainer session (using your API key and connecting to RemoteRL), create a parallelized CartPole environment (32 simultaneous instances running on the remote simulator to speed up training), and then begin training the agent.

When you run this cell, training progress logs from Stable-Baselines3 appear in the output. Training stops after roughly 20,000 timesteps, at which point the cell prints "✅ Training finished." to confirm that the remote training demo ran to completion.
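The "roughly" matters because PPO in Stable-Baselines3 collects whole rollouts: each iteration gathers `n_steps` transitions from every parallel environment, and training continues until the running total reaches `total_timesteps`. With the settings used in the trainer cell of this section (32 environments, `n_steps=64`, `total_timesteps=20_000`), the arithmetic can be sketched as follows; the variable names are only for illustration.

```python
import math

n_envs = 32              # parallel environments in the trainer cell
n_steps = 64             # PPO rollout length per environment
requested = 20_000       # total_timesteps passed to model.learn()

steps_per_iteration = n_envs * n_steps
iterations = math.ceil(requested / steps_per_iteration)
actual_timesteps = iterations * steps_per_iteration

print(f"{steps_per_iteration} steps per iteration")
print(f"{iterations} iterations, {actual_timesteps} timesteps in total")
```

This matches the logged output for this run, where `total_timesteps` reads 2048 after the first iteration and 4096 after the second.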

In [5]:
# ── Cell B ──────────────────────────────────────────────────────────
import remoterl, os
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env
# ------------------------------------------------------------------
# 1️⃣ Retrieve API key injected by the helper or set in the shell
# ------------------------------------------------------------------
API_KEY = os.getenv("REMOTERL_API_KEY", key)

# ------------------------------------------------------------------
# 2️⃣ Connect to the RemoteRL backend in trainer mode
# ------------------------------------------------------------------
if not remoterl.init(api_key=API_KEY, role="trainer"):
    raise RuntimeError("Failed to connect to RemoteRL.")

# ------------------------------------------------------------------
# 3️⃣ Build a vectorised CartPole environment (32 parallel instances)
# ------------------------------------------------------------------
ENV_ID = "CartPole-v1"
env     = make_vec_env(ENV_ID, n_envs=32)

# ------------------------------------------------------------------
# 4️⃣ Instantiate PPO with a modest network architecture
# ------------------------------------------------------------------
model = PPO(
    policy="MlpPolicy",
    env=env,
    policy_kwargs=dict(net_arch=dict(pi=[128, 64], vf=[128, 64])),
    n_steps=64, n_epochs=4, batch_size=64, verbose=1, device="auto",
)

# ------------------------------------------------------------------
# 5️⃣ Train for roughly 20k environment steps
# ------------------------------------------------------------------
model.learn(total_timesteps=20_000)

# ------------------------------------------------------------------
# 6️⃣ Graceful shutdown of envs and confirmation message
# ------------------------------------------------------------------
env.close()
print("✅ Training finished.")

[RemoteRL] Session opened | trainer=cbb65cd2 | num_env_runners=2 | num_workers=1
[RemoteRL] Remote Gym enabled with 1 workers and 2 runners.
[RemoteRL] Remote Stable-Baselines3 applied.
Using cpu device
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 20.2     |
|    ep_rew_mean     | 20.2     |
| time/              |          |
|    fps             | 139      |
|    iterations      | 1        |
|    time_elapsed    | 14       |
|    total_timesteps | 2048     |
---------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 24.1        |
|    ep_rew_mean          | 24.1        |
| time/                   |             |
|    fps                  | 133         |
|    iterations           | 2           |
|    time_elapsed         | 30          |
|    total_timesteps      | 4096        |
| train/                  |             |
|    