# PaST — CNN+DeepSets PPO (Colab/T4)

This notebook trains one of the new *CNN+DeepSets* PPO variants (same job×family action space, with best-start decoding), then evaluates the resulting checkpoint with Greedy, SGBS, EAS, or SGBS+EAS.

**Typical flow**
1) Install deps (Colab)
2) Train: `python -m PaST.train_ppo ...`
3) Auto-pick latest checkpoint
4) Evaluate: `python -m PaST.run_eval_eas_* ...`

In [1]:
import os, sys, subprocess, textwrap
from pathlib import Path

ROOT = Path.cwd()
print("CWD:", ROOT)
print("Python:", sys.version)

# Make sure imports like `import PaST` work
if str(ROOT) not in sys.path:
    sys.path.insert(0, str(ROOT))

# Basic sanity: torch + GPU
import torch

print("torch:", torch.__version__)
print("cuda available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("gpu:", torch.cuda.get_device_name(0))

CWD: /content
Python: 3.12.12 (main, Oct 10 2025, 08:52:57) [GCC 11.4.0]
torch: 2.9.0+cu126
cuda available: True
gpu: Tesla T4


In [2]:
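# Colab installs. Re-running is fine, but note that it deletes and re-clones PaST,
# so any local edits inside that folder are discarded.
# If you're not on Colab and already have the deps, you can skip this cell.
!rm -rf PaST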
!git clone https://github.com/Abdellahbado/PaST

!pip -q install -r PaST/requirements.txt
!pip -q install pandas matplotlib

Cloning into 'PaST'...
remote: Enumerating objects: 356, done.
remote: Counting objects: 100% (27/27), done.
remote: Compressing objects: 100% (22/22), done.
remote: Total 356 (delta 10), reused 16 (delta 5), pack-reused 329 (from 1)
Receiving objects: 100% (356/356), 125.83 MiB | 41.38 MiB/s, done.
Resolving deltas: 100% (178/178), done.


## Train (long-run-friendly hyperparams)

Defaults here are chosen to **keep learning for a long time**:
- Cosine LR decay with a non-trivial end LR (`lr_end_factor=0.1`)
- Cosine entropy decay (explore early, still some exploration late)
- Target KL (`target_kl=0.03`) to keep any single update from moving the policy too far
- Curriculum enabled (helps early stability): starts on *small* horizons then gradually introduces larger ones, while also annealing the epsilon-constraint slack range
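
The target-KL item in the list above can be illustrated with a minimal sketch. It assumes the common PPO convention: estimate the KL divergence from the old and new log-probabilities of the sampled actions, and skip the remaining epochs of an update once that estimate exceeds the target. Whether `PaST.train_ppo` checks per epoch or per minibatch, and which estimator it uses, is not shown in this notebook. `approx_kl` and `run_ppo_epochs` are hypothetical names.

```python
from typing import Callable, Sequence


def approx_kl(old_logp: Sequence[float], new_logp: Sequence[float]) -> float:
    """Sample estimate of KL(old || new): mean of (old_logp - new_logp)."""
    return sum(o - n for o, n in zip(old_logp, new_logp)) / len(old_logp)


def run_ppo_epochs(
    old_logp: Sequence[float],
    eval_new_logp: Callable[[int], Sequence[float]],
    ppo_epochs: int = 4,
    target_kl: float = 0.03,
) -> int:
    """Run up to `ppo_epochs` passes and return how many were completed.

    `eval_new_logp(epoch)` stands in for "take the gradient steps of this
    epoch, then re-evaluate the log-probs of the stored actions".
    (Assumption: the real trainer may check at a different granularity.)
    """
    completed = 0
    for epoch in range(ppo_epochs):
        new_logp = eval_new_logp(epoch)
        completed += 1
        if approx_kl(old_logp, new_logp) > target_kl:
            break  # policy drifted too far from the data-collecting policy
    return completed


# Toy data: every epoch lowers each log-prob by a further 0.02.
old = [-1.0, -2.0, -0.5]
epochs_run = run_ppo_epochs(old, lambda e: [lp - 0.02 * (e + 1) for lp in old])
print("epochs completed before early stop:", epochs_run)
```

In this toy run the estimate is about 0.02 after the first epoch and about 0.04 after the second, so the loop stops after two of the four allowed epochs.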

Adjust `TOTAL_ENV_STEPS` depending on runtime budget.
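
Both cosine schedules are functions of training progress (steps done divided by `TOTAL_ENV_STEPS`), so changing the step budget stretches or compresses them. The sketch below shows one plausible reading of the settings. It assumes the learning rate decays from `LR` to `LR * lr_end_factor` over the whole run, and the entropy coefficient decays from its start to its end value over the first `entropy_decay_fraction` of the run and then stays flat. The exact formulas inside `PaST.train_ppo` may differ, and the function names here are hypothetical.

```python
import math


def cosine_anneal(start: float, end: float, progress: float) -> float:
    """Half-cosine interpolation from `start` (progress=0) to `end` (progress=1)."""
    p = min(max(progress, 0.0), 1.0)  # clamp, so the value holds at `end` afterwards
    return end + 0.5 * (start - end) * (1.0 + math.cos(math.pi * p))


def lr_at(step: int, total_steps: int, lr: float = 3e-4, lr_end_factor: float = 0.1) -> float:
    """Learning rate at `step`, assuming decay over the full run."""
    return cosine_anneal(lr, lr * lr_end_factor, step / total_steps)


def entropy_coef_at(
    step: int,
    total_steps: int,
    start: float = 0.02,
    end: float = 0.002,
    decay_fraction: float = 0.8,
) -> float:
    """Entropy coefficient at `step`, assuming decay over the first `decay_fraction` of the run."""
    return cosine_anneal(start, end, step / (decay_fraction * total_steps))


total = 20_000_000
for frac in (0.0, 0.25, 0.5, 0.8, 1.0):
    step = int(frac * total)
    print(f"{frac:4.0%}  lr={lr_at(step, total):.2e}  ent={entropy_coef_at(step, total):.4f}")
```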

In [3]:
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

# Create checkpoint directory in Drive
import os
drive_checkpoint_dir = '/content/drive/MyDrive/PaST_checkpoints'
os.makedirs(drive_checkpoint_dir, exist_ok=True)
print(f"Checkpoint directory: {drive_checkpoint_dir}")


In [5]:
# Pick ONE variant_id:
# - ppo_family_q4_ctx13_beststart_cwe (recommended default)
# - ppo_duration_aware_family_ctx13_cnn

VARIANT_ID = "ppo_family_q4_ctx13_beststart_cwe"
SEED = 0
OUT_DIR = "/content/drive/MyDrive/PaST_checkpoints/runs_family_beststart_cwe"

# Main training knobs (T4-friendly defaults)
NUM_ENVS = 128
ROLLOUT_LEN = 512
TOTAL_ENV_STEPS = 20_000_000

LR = 3e-4
LR_SCHEDULE = "cosine"
LR_END_FACTOR = 0.1

ENT_SCHEDULE = "cosine"
ENT_START = 0.02
ENT_END = 0.002
ENT_DECAY_FRAC = 0.8

PPO_EPOCHS = 4
NUM_MINIBATCHES = 64
CLIP_EPS = 0.2
VALUE_COEF = 0.5
MAX_GRAD_NORM = 0.5
TARGET_KL = 0.03

CURRICULUM = True
CURRICULUM_FRAC = 0.3

cmd = [
    sys.executable,
    "-m",
    "PaST.train_ppo",
    "--variant_id",
    VARIANT_ID,
    "--seed",
    str(SEED),
    "--device",
    "cuda" if torch.cuda.is_available() else "cpu",
    "--output_dir",
    OUT_DIR,
    "--num_envs",
    str(NUM_ENVS),
    "--rollout_length",
    str(ROLLOUT_LEN),
    "--total_env_steps",
    str(TOTAL_ENV_STEPS),
    "--learning_rate",
    str(LR),
    "--lr_schedule",
    LR_SCHEDULE,
    "--lr_end_factor",
    str(LR_END_FACTOR),
    "--ppo_epochs",
    str(PPO_EPOCHS),
    "--num_minibatches",
    str(NUM_MINIBATCHES),
    "--clip_eps",
    str(CLIP_EPS),
    "--value_coef",
    str(VALUE_COEF),
    "--max_grad_norm",
    str(MAX_GRAD_NORM),
    "--target_kl",
    str(TARGET_KL),
    "--entropy_schedule",
    ENT_SCHEDULE,
    "--entropy_coef_start",
    str(ENT_START),
    "--entropy_coef_end",
    str(ENT_END),
    "--entropy_decay_fraction",
    str(ENT_DECAY_FRAC),
]
if CURRICULUM:
    cmd += ["--curriculum", "--curriculum_fraction", str(CURRICULUM_FRAC)]

print(" ".join(cmd))
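
A quick back-of-the-envelope check on the knobs above. It assumes, as is conventional for PPO but not confirmed here for `PaST.train_ppo`, that each update collects `NUM_ENVS × ROLLOUT_LEN` transitions and splits them evenly into `NUM_MINIBATCHES` minibatches per epoch. `update_budget` is a hypothetical helper.

```python
def update_budget(
    num_envs: int,
    rollout_len: int,
    total_env_steps: int,
    ppo_epochs: int,
    num_minibatches: int,
) -> dict:
    """Derive per-update sizes from the training knobs (assumed PPO conventions)."""
    steps_per_update = num_envs * rollout_len
    return {
        "steps_per_update": steps_per_update,
        "num_updates": total_env_steps // steps_per_update,
        "minibatch_size": steps_per_update // num_minibatches,
        "grad_steps_per_update": ppo_epochs * num_minibatches,
    }


# Values copied from the config cell above.
budget = update_budget(
    num_envs=128,
    rollout_len=512,
    total_env_steps=20_000_000,
    ppo_epochs=4,
    num_minibatches=64,
)
print(budget)
# steps_per_update=65536, num_updates=305, minibatch_size=1024, grad_steps_per_update=256
```

Under these assumptions, 20M environment steps amount to only about 305 policy updates, which is worth keeping in mind when shrinking `TOTAL_ENV_STEPS` for a short session.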

In [None]:
# Start training (shlex.join quotes each argument, so paths containing spaces survive the shell)
import shlex
!{shlex.join(cmd)}
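
For long runs it can help to keep a copy of the training output on disk. The sketch below is one way to do that without going through the shell: it runs an argument list directly, echoes each line, and writes the same lines to a log file. `run_and_tee` and the log location are hypothetical choices, not part of PaST, and the demo command is a stand-in for the real `cmd`.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_and_tee(cmd, log_path):
    """Run `cmd` (a list of arguments, no shell), echo its output, and save it to `log_path`.

    Returns the process exit code.
    """
    log_path = Path(log_path)
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("w", encoding="utf-8") as log, subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into the same stream
        text=True,
        bufsize=1,  # line-buffered, so progress shows up as it is printed
    ) as proc:
        for line in proc.stdout:
            sys.stdout.write(line)
            log.write(line)
    # Leaving the `with` block waits for the process, so returncode is set here.
    return proc.returncode


# Demo with a trivial child process instead of the real training command.
demo_log = Path(tempfile.gettempdir()) / "run_and_tee_demo.log"
exit_code = run_and_tee([sys.executable, "-c", "print('hello from child')"], demo_log)
print("exit code:", exit_code)
```

For the actual run, the call would look like `run_and_tee(cmd, f"{OUT_DIR}/train.log")`, where the log filename is an arbitrary choice.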

## Find the latest checkpoint
This picks the most recently modified `latest.pt` found anywhere under the output directory, so if several runs share `OUT_DIR`, the newest one wins.

In [None]:
import glob

ckpts = glob.glob(f"{OUT_DIR}/**/checkpoints/latest.pt", recursive=True)
if not ckpts:
    raise FileNotFoundError(f"No latest.pt found under {OUT_DIR}/")

ckpts_sorted = sorted(ckpts, key=lambda p: Path(p).stat().st_mtime)
CKPT = ckpts_sorted[-1]
print("Using checkpoint:", CKPT)

## Evaluate (Greedy / SGBS / EAS / SGBS+EAS)
Choose the script based on the variant family:
- `run_eval_eas_family_q4_beststart` for `ppo_family_*_beststart_*`
- `run_eval_eas_duration_aware` for `ppo_duration_aware_*`
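
The mapping above can also be written as an explicit pattern table that fails loudly on an unrecognised variant instead of silently falling back to a default. This is only a sketch: the patterns and module names are copied from the list above, and `pick_eval_module` is a hypothetical helper, not part of PaST.

```python
from fnmatch import fnmatchcase

# (variant_id pattern, evaluation module), taken from the list above.
EVAL_MODULES = [
    ("ppo_family_*_beststart_*", "PaST.run_eval_eas_family_q4_beststart"),
    ("ppo_duration_aware_*", "PaST.run_eval_eas_duration_aware"),
]


def pick_eval_module(variant_id: str) -> str:
    """Return the evaluation module whose pattern matches `variant_id`."""
    for pattern, module in EVAL_MODULES:
        if fnmatchcase(variant_id, pattern):
            return module
    known = ", ".join(pattern for pattern, _ in EVAL_MODULES)
    raise ValueError(f"No eval script known for {variant_id!r}; expected one of: {known}")


print(pick_eval_module("ppo_family_q4_ctx13_beststart_cwe"))
print(pick_eval_module("ppo_duration_aware_family_ctx13_cnn"))
```

Raising on an unknown `variant_id` catches typos before a long evaluation starts with the wrong script.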

In [None]:
EVAL_SEED = 42
NUM_INSTANCES = 16
SCALE = "small"  # small|medium|large
EPS_STEPS = 5

METHOD = "sgbs_eas"  # eas|sgbs_eas
BETA = "4"
GAMMA = "4"
MAX_ITERS = 50
EAS_LR = 0.003
EAS_IL = 0.01
SAMPLES_PER_ITER = 32

if "duration_aware" in VARIANT_ID:
    eval_mod = "PaST.run_eval_eas_duration_aware"
else:
    eval_mod = "PaST.run_eval_eas_family_q4_beststart"

eval_cmd = [
    sys.executable,
    "-m",
    eval_mod,
    "--checkpoint",
    CKPT,
    "--variant_id",
    VARIANT_ID,
    "--eval_seed",
    str(EVAL_SEED),
    "--num_instances",
    str(NUM_INSTANCES),
    "--scale",
    SCALE,
    "--epsilon_steps",
    str(EPS_STEPS),
    "--method",
    METHOD,
    "--beta",
    BETA,
    "--gamma",
    GAMMA,
    "--max_iterations",
    str(MAX_ITERS),
    "--eas_lr",
    str(EAS_LR),
    "--eas_il_weight",
    str(EAS_IL),
    "--samples_per_iter",
    str(SAMPLES_PER_ITER),
]
print(" ".join(eval_cmd))

In [None]:
import shlex  # quote each argument so a checkpoint path with spaces survives the shell
!{shlex.join(eval_cmd)}