# Starter: Sequence RL — debug-friendly notebook (Anaconda base)

Run the cells **in order** (1 → 10).
This notebook:
- uses your Anaconda base env (no venv),
- verifies CUDA,
- wires `sys.path` so `training/` imports,
- runs the trainer **in-process** (easy to debug) or via subprocess,
- starts TensorBoard,
- evaluates a saved policy,
- and gives quick unit-test & step-debug helpers.


In [54]:
import os, sys, subprocess, textwrap

print("Python:", sys.executable)
try:
    import torch
    print("PyTorch:", torch.__version__)
    print("CUDA available:", torch.cuda.is_available())
    print("CUDA version:", getattr(torch.version, "cuda", None))
    if torch.cuda.is_available():
        print("CUDA device count:", torch.cuda.device_count())
        print("Device 0:", torch.cuda.get_device_name(0))
except Exception as e:
    print("Torch import error:", e)

# Optional: show nvidia-smi (won't crash if missing)
try:
    print("\n--- nvidia-smi ---")
    out = subprocess.run(["nvidia-smi"], capture_output=True, text=True)
    print(out.stdout or out.stderr)
except Exception as e:
    print("nvidia-smi not available:", e)


Python: C:\Users\carlo\anaconda3\python.exe
PyTorch: 2.2.1
CUDA available: True
CUDA version: 12.1
CUDA device count: 1
Device 0: NVIDIA GeForce RTX 3050 Laptop GPU

--- nvidia-smi ---
Sun Aug 24 14:36:42 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 551.61                 Driver Version: 551.61         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                     TCC/WDDM  | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  NVIDIA GeForce RTX 3050 ...  WDDM  |   00000000:01:00.0 Off |                  N/A |
| N/A   56C    P8              4W /   25W |     376MiB /   4096MiB |      0%      Default |
|                                         |    

In [81]:
import os, sys, runpy, importlib, contextlib, io, json

# --- configure your paths/overrides ---
CFG_PATH = r"E:\sequence_game_board\sequence_board_game\training\configs\tiny-smoke.json"
OVERRIDE = {
    "training.num_envs": 8, "training.rollout_length": 32, "training.total_updates": 200, "training.minibatch_size": 512,
    "logging.tensorboard": True
}

# Project root = repo folder that contains the 'training' package
project_root = os.path.abspath(os.path.join(os.path.dirname(CFG_PATH), os.pardir))
if project_root not in sys.path:
    sys.path.insert(0, project_root)

print("Project root:", project_root)

# --- nuke any cached 'training' modules so we load the edited code ---
to_purge = [m for m in list(sys.modules) if m == "training" or m.startswith("training.")]
for m in to_purge:
    del sys.modules[m]
importlib.invalidate_caches()

# Sanity: import the state module we will actually use
import training
import training.engine.state as st
print("training package file:", training.__file__)
print("state.py path        :", st.__file__)
print("has sequences_meta_cells?:", hasattr(st.GameState, "sequences_meta_cells"))
assert hasattr(st.GameState, "sequences_meta_cells"), "Your edited GameState isn't loading!"

# --- run the script in-process with a clean argv ---
argv = [
    "training.scripts.train",
    "--config", CFG_PATH,
    "--override", json.dumps(OVERRIDE),
]

stdout_buf, stderr_buf = io.StringIO(), io.StringIO()
print("Running in-process with argv:", argv)

with contextlib.redirect_stdout(stdout_buf), contextlib.redirect_stderr(stderr_buf):
    old_argv = sys.argv
    try:
        sys.argv = argv
        # alter_sys=True helps mimic "python -m"
        runpy.run_module("training.scripts.train", run_name="__main__", alter_sys=True)
        exit_code = 0
    except SystemExit as se:
        exit_code = int(getattr(se, "code", 0) or 0)
    except Exception as e:
        exit_code = 1
        print("EXCEPTION:", repr(e))
    finally:
        sys.argv = old_argv

out, err = stdout_buf.getvalue(), stderr_buf.getvalue()
print("--- STDOUT ---\n", out[-10000:])
print("--- STDERR ---\n", err[-10000:])
print("Exit code:", exit_code)


Project root: E:\sequence_game_board\sequence_board_game\training
training package file: E:\sequence_game_board\sequence_board_game\training\__init__.py
state.py path        : E:\sequence_game_board\sequence_board_game\training\engine\state.py
has sequences_meta_cells?: True
Running in-process with argv: ['training.scripts.train', '--config', 'E:\\sequence_game_board\\sequence_board_game\\training\\configs\\tiny-smoke.json', '--override', '{"training.num_envs": 8, "training.rollout_length": 32, "training.total_updates": 200, "training.minibatch_size": 512, "logging.tensorboard": true}']
--- STDOUT ---
 .0000 | loss/entropy:4.6456
update 112/200 | loss/total:56191944.0000 | loss/policy:0.2072 | loss/value:112383888.0000 | loss/entropy:4.6457
update 113/200 | loss/total:13797958.0000 | loss/policy:0.2074 | loss/value:27595916.0000 | loss/entropy:4.6458
update 114/200 | loss/total:31464174.0000 | loss/policy:0.2075 | loss/value:62928348.0000 | loss/entropy:4.6458
update 115/200 | loss/tot

In [80]:
%load_ext tensorboard
%tensorboard --logdir "E:/sequence_game_board/sequence_board_game/runs" --port 6007 --reload_interval 3

The tensorboard extension is already loaded. To reload it, use:
  %reload_ext tensorboard


Reusing TensorBoard on port 6007 (pid 67004), started 0:06:34 ago. (Use '!kill 67004' to kill it.)

In [82]:
import glob
paths = glob.glob(r"E:\sequence_game_board\sequence_board_game\runs\**\events.*", recursive=True)
print("Found:", len(paths))
for p in paths[:10]:
    print(" ", p)


Found: 0
