# RemoteRL Quick-Start Tutorial

RemoteRL connects environment simulators and RL trainers over the internet through a cloud server, using secure WebSockets.
With a one-line `remoterl.init()` call on each side, you can keep environments and trainers on different machines.

**In this notebook you will:**
- Install RemoteRL and dependencies
- Launch a CartPole simulator process
- Train a PPO agent remotely

In [1]:
!pip install --quiet remoterl stable-baselines3
import remoterl, stable_baselines3 as sb3
print(remoterl.__all__)

['init', 'shutdown']


## Get an API Key

Create a free RemoteRL account and obtain your key by running the CLI locally:
```bash
remoterl register
```
Every account includes **1 GB of free credit**.

In [2]:
# Single-call helper that optionally opens the RemoteRL dashboard, prompts for your API key
# (or reuses the one already in REMOTERL_API_KEY), stores it for the current session,
# and prints a success message.

import os, webbrowser
DASHBOARD_URL = "https://remoterl.com/user/dashboard"

def register_api_key(open_browser=True):
    if open_browser:
        webbrowser.open_new_tab(DASHBOARD_URL)
        key = input("Paste your REMOTERL API key: ").strip()
        os.environ["REMOTERL_API_KEY"] = key
    else:
        key = os.getenv("REMOTERL_API_KEY")
    if not key:
        print(f"Please visit {DASHBOARD_URL} to get your API key.")
        raise RuntimeError("API key required.")

    print("✅ RemoteRL registered. Happy training!")
    return key

In [3]:
# If the browser window doesn’t open automatically, visit the dashboard:
DASHBOARD_URL = "https://remoterl.com/user/dashboard"

key = register_api_key(open_browser=True)

✅ RemoteRL registered. Happy training!
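
The helper above echoes the key as you paste it and prompts every time `open_browser=True`. As a rough alternative, the sketch below checks `REMOTERL_API_KEY` first, prompts only when it is missing (using the standard-library `getpass` so the key is not echoed), and masks the key before printing. `resolve_api_key` and `mask_key` are hypothetical names introduced for this illustration; they are not part of RemoteRL.

```python
import os
from getpass import getpass

ENV_VAR = "REMOTERL_API_KEY"  # same variable the notebook helper sets


def resolve_api_key(prompt=getpass):
    """Return the key from the environment, prompting only if it is missing.

    `prompt` is injectable so the function can be exercised without a terminal.
    """
    key = os.environ.get(ENV_VAR, "").strip()
    if not key:
        key = prompt("Paste your RemoteRL API key: ").strip()
    if not key:
        raise RuntimeError("API key required.")
    os.environ[ENV_VAR] = key  # keep it available for later cells
    return key


def mask_key(key, visible=4):
    """Hide all but the last few characters so the key is safe to print."""
    if len(key) <= visible:
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]
```

In a notebook cell you might then write `key = resolve_api_key()` followed by `print(mask_key(key))`, so the full key never appears in saved output.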


## Launch a Remote Simulator

This cell launches a minimal RemoteRL simulator cluster in a child process and waits for trainers to connect.

In [4]:
# ── Cell A · starts an isolated simulator (works on Windows, Linux, macOS) ──
import os, sys, textwrap, subprocess, time
import threading

# ------------------------------------------------------------------
# 1️⃣  Grab API key from the environment (set earlier by register_api_key)
# ------------------------------------------------------------------
API_KEY = os.getenv("REMOTERL_API_KEY", key)

# ------------------------------------------------------------------
# 2️⃣  Build the short Python snippet that will run inside the child process
#     (`remoterl.init(..., role="simulator")` is intentionally blocking)
# ------------------------------------------------------------------
sim_code = textwrap.dedent(f"""
    import remoterl

    # Blocks by design: the simulator waits here for trainers to connect.
    # {{API_KEY!r}} formatting keeps the key a valid string literal even if it contains quotes.
    remoterl.init(api_key={API_KEY!r}, role='simulator')
""")

# ------------------------------------------------------------------
# 3️⃣  Spawn the simulator subprocess (same Python executable, unbuffered)
#     - stdout/stderr are piped so we can echo logs back in real time
# ------------------------------------------------------------------
sim_proc = subprocess.Popen(
    [sys.executable, "-u", "-c", sim_code],
    stdout=subprocess.PIPE,            # stream simulator logs to parent
    stderr=subprocess.STDOUT,
    text=True,
)

# ------------------------------------------------------------------
# 4️⃣  Background thread: continuously forward simulator output
# ------------------------------------------------------------------
def stream_logs(proc):
    for line in iter(proc.stdout.readline, ''):   # keep reading until EOF
        if line:
            print(f"[sim] {line.rstrip()}")

log_thread = threading.Thread(target=stream_logs, args=(sim_proc,), daemon=True)
log_thread.start()

# ------------------------------------------------------------------
# 5️⃣  Confirmation + optional head‑start delay
# ------------------------------------------------------------------

print(f"🚀  Simulator subprocess started (pid={sim_proc.pid})")

time.sleep(10)                          # give it a head-start

🚀  Simulator subprocess started (pid=40216)
[sim] [RemoteRL] Simulator started, waiting for connection to Trainers...

[sim] [RemoteRL] Connected | trainer=99db70db
[sim] [RemoteRL] Session started | trainer=99db70db | num_env_runners=2
[sim] [RemoteRL] Remote Environment | seq=1 make:{'0': 'CartPole-v1'} | runner=0
[sim] [RemoteRL] Remote Environment | seq=1 make:{'0': 'CartPole-v1'} | runner=1
[sim] [RemoteRL] Remote Environment | seq=8 make:{'7': 'CartPole-v1'} | runner=0
[sim] [RemoteRL] Remote Environment | seq=8 make:{'7': 'CartPole-v1'} | runner=1
[sim] [RemoteRL] | simulator |    520 MB left | https://remoterl.com/user/dashboard | elapsed 0:00:00 | ~ calculating...
[sim] [RemoteRL] Remote Environment | seq=64 step | runner=0
[sim] [RemoteRL] Remote Environment | seq=64 step | runner=1
[sim] [RemoteRL] Remote Environment | seq=512 step | runner=0
[sim] [RemoteRL] Remote Environment | seq=512 step | runner=1
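
Because the simulator's `remoterl.init(..., role="simulator")` call blocks, the child process keeps running until something stops it. Once training has finished you will probably want to shut it down. The sketch below is one generic way to do that with the standard library: ask the process to terminate, wait briefly, and force-kill it only if it does not exit. `stop_process` is a hypothetical helper name, and how RemoteRL itself reacts to a termination signal is an assumption here, not documented behavior. The demo uses a stand-in child that just sleeps.

```python
import subprocess
import sys


def stop_process(proc, timeout=5.0):
    """Terminate a child process, escalating to kill() if it does not exit in time."""
    if proc.poll() is not None:          # already exited, nothing to do
        return proc.returncode
    proc.terminate()                     # polite request (SIGTERM on POSIX)
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()                      # forceful stop as a last resort
        return proc.wait()


# Demo with a stand-in child process that would otherwise sleep for a minute.
demo = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
stop_process(demo)
print("child exited:", demo.poll() is not None)
```

In this notebook the equivalent call would be `stop_process(sim_proc)`, run after the trainer cell completes.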


## Start the Online Trainer

This cell connects to the RemoteRL simulator cluster, instantiates a PPO agent, and trains it online.
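
The trainer cell in this section uses `n_steps=64` with 32 parallel environments, `batch_size=64`, and `total_timesteps=20_000`. The short calculation below shows what those settings imply: Stable-Baselines3 PPO collects `n_steps * n_envs` transitions per iteration and keeps iterating until the step count reaches `total_timesteps`, so training runs slightly past 20k steps. The variable names are chosen for this illustration only.

```python
import math

# Settings copied from the trainer cell in this section.
n_envs = 32
n_steps = 64
batch_size = 64
total_timesteps = 20_000

# One PPO iteration collects n_steps transitions from every environment.
rollout_size = n_envs * n_steps

# learn() loops while the collected step count is below total_timesteps,
# so the last iteration overshoots the target.
iterations = math.ceil(total_timesteps / rollout_size)
actual_timesteps = iterations * rollout_size

# Each epoch splits the rollout buffer into minibatches of batch_size.
minibatches_per_epoch = rollout_size // batch_size

print(f"rollout size per iteration: {rollout_size}")
print(f"iterations: {iterations}")
print(f"timesteps actually collected: {actual_timesteps}")
print(f"minibatches per epoch: {minibatches_per_epoch}")
```

The rollout size of 2048 matches the `total_timesteps` value that the training log reports after its first iteration.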

In [5]:
# ── Cell B ──────────────────────────────────────────────────────────
import remoterl, os
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env
# ------------------------------------------------------------------
# 1️⃣ Retrieve API key injected by the helper or set in the shell
# ------------------------------------------------------------------
API_KEY = os.getenv("REMOTERL_API_KEY", key)

# ------------------------------------------------------------------
# 2️⃣ Connect to the RemoteRL backend in trainer mode
# ------------------------------------------------------------------
if not remoterl.init(api_key=API_KEY, role="trainer"):
    raise RuntimeError("Failed to connect to RemoteRL.")

# ------------------------------------------------------------------
# 3️⃣ Build a vectorised CartPole environment (32 parallel instances)
# ------------------------------------------------------------------
ENV_ID = "CartPole-v1"
env     = make_vec_env(ENV_ID, n_envs=32)

# ------------------------------------------------------------------
# 4️⃣ Instantiate PPO with a modest network architecture
# ------------------------------------------------------------------
model = PPO(
    policy="MlpPolicy",
    env=env,
    policy_kwargs=dict(net_arch=dict(pi=[128, 64], vf=[128, 64])),
    n_steps=64, n_epochs=4, batch_size=64, verbose=1, device="auto",
)

# ------------------------------------------------------------------
# 5️⃣ Train for roughly 20k environment steps
# ------------------------------------------------------------------
model.learn(total_timesteps=20_000)

# ------------------------------------------------------------------
# 6️⃣ Graceful shutdown of envs and confirmation message
# ------------------------------------------------------------------
env.close()
print("✅ Training finished.")

[RemoteRL] Session opened | trainer=99db70db | num_env_runners=2 | num_workers=1
[RemoteRL] Remote Gym enabled with 1 workers and 2 runners.
[RemoteRL] Remote Stable-Baselines3 applied.
Using cpu device
[RemoteRL] | trainer |    524 MB left | https://remoterl.com/user/dashboard | elapsed 0:00:00 | ~ calculating...
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 24.6     |
|    ep_rew_mean     | 24.6     |
| time/              |          |
|    fps             | 147      |
|    iterations      | 1        |
|    time_elapsed    | 13       |
|    total_timesteps | 2048     |
---------------------------------
------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 24.5         |
|    ep_rew_mean          | 24.5         |
| time/                   |              |
|    fps                  | 146          |
|    iterations           | 2          