# PaST — PPO (Colab/T4): CNN+DeepSets or CWE

> To run the **CWE** backbone (`cwe_sparse`), clone a repo/branch that includes the CWE changes by setting `REPO_URL` / `REPO_BRANCH` in the install cell.

This notebook trains one PPO variant, then evaluates it with Greedy, SGBS, EAS, or SGBS+EAS.

**Typical flow**
1) Install deps (Colab)
2) Train: `python -m PaST.train_ppo ...`
3) Auto-pick latest checkpoint
4) Evaluate: `python -m PaST.run_eval_eas_* ...`

In [1]:
import sys
from pathlib import Path

ROOT = Path.cwd()
print("CWD:", ROOT)
print("Python:", sys.version)

# Make sure imports like `import PaST` work
if str(ROOT) not in sys.path:
    sys.path.insert(0, str(ROOT))

# Basic sanity: torch + GPU
import torch

print("torch:", torch.__version__)
print("cuda available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("gpu:", torch.cuda.get_device_name(0))

CWD: /content
Python: 3.12.12 (main, Oct 10 2025, 08:52:57) [GCC 11.4.0]
torch: 2.9.0+cu126
cuda available: True
gpu: Tesla T4


In [2]:
# Colab installs (safe to re-run).
# If you're not on Colab and already have deps, you can skip this cell.
#
# IMPORTANT: The default upstream repo may not include the CWE variant yet.
# If you want to run CWE, point these to the branch where you added it.
REPO_URL = "https://github.com/Abdellahbado/PaST"  # TODO: change to your fork if needed
REPO_BRANCH = "main"  # TODO: change to your CWE branch if needed

!rm -rf PaST
!git clone -b "$REPO_BRANCH" "$REPO_URL"

!pip -q install -r PaST/requirements.txt
!pip -q install pandas matplotlib

Cloning into 'PaST'...
remote: Enumerating objects: 349, done.
remote: Counting objects: 100% (20/20), done.
remote: Compressing objects: 100% (17/17), done.
remote: Total 349 (delta 5), reused 11 (delta 3), pack-reused 329 (from 1)
Receiving objects: 100% (349/349), 125.82 MiB | 15.10 MiB/s, done.
Resolving deltas: 100% (173/173), done.
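Because the upstream `main` branch may not include the CWE variant, a quick sanity check after cloning is to search the repo's Python sources for the variant id you plan to train. The helper below is a hypothetical sketch (`find_mentions` is not part of PaST). It only does a plain substring search, so a hit means the string appears somewhere in the code, not that the variant is actually registered.

```python
from pathlib import Path


def find_mentions(repo_dir, needle):
    """Return sorted paths of .py files under repo_dir whose text contains needle."""
    hits = []
    for path in Path(repo_dir).rglob("*.py"):
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file: skip it
        if needle in text:
            hits.append(str(path))
    return sorted(hits)


# Example (run after the clone); an empty list suggests the branch lacks this variant:
# print(find_mentions("PaST", "ppo_family_q4_ctx13_beststart_cwe"))
```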


## Train (long-run-friendly hyperparams)

Defaults here are chosen to **keep learning for a long time**:
- Cosine LR decay with a non-trivial end LR (`lr_end_factor=0.1`)
- Cosine entropy decay (explore early, still some exploration late)
- Target KL (`target_kl=0.03`) to keep each update from moving the policy too far
- Curriculum enabled (helps early stability): starts on *small* horizons then gradually introduces larger ones, while also annealing the epsilon-constraint slack range
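The schedules in the list above can be sketched as follows. This is an assumption about the shape of the curves, not code taken from `PaST.train_ppo`: it uses a standard half-cosine from the start LR down to `lr * lr_end_factor`, the same cosine for the entropy coefficient over the first `entropy_decay_fraction` of training (held at the end value afterwards), and a linear curriculum progress that reaches 1 after `curriculum_fraction` of training. The trainer's exact formulas may differ.

```python
import math


def cosine_decay(progress, start, end):
    """Half-cosine interpolation from start (progress=0) to end (progress>=1)."""
    p = min(max(progress, 0.0), 1.0)
    return end + 0.5 * (start - end) * (1.0 + math.cos(math.pi * p))


def learning_rate(step, total_steps, lr=3e-4, lr_end_factor=0.1):
    # Decays over the whole run and ends at lr * lr_end_factor instead of 0.
    return cosine_decay(step / total_steps, lr, lr * lr_end_factor)


def entropy_coef(step, total_steps, start=0.02, end=0.002, decay_fraction=0.8):
    # Decays during the first `decay_fraction` of training, then stays at `end`.
    return cosine_decay(step / (decay_fraction * total_steps), start, end)


def curriculum_progress(step, total_steps, curriculum_fraction=0.3):
    # Goes 0 -> 1 linearly over the first `curriculum_fraction` of training; a
    # trainer could use it to widen the horizon range and anneal the slack range.
    return min(step / (curriculum_fraction * total_steps), 1.0)


TOTAL = 20_000_000
for step in (0, 5_000_000, 10_000_000, 16_000_000, 20_000_000):
    print(
        f"step={step:>10,}  lr={learning_rate(step, TOTAL):.2e}  "
        f"entc={entropy_coef(step, TOTAL):.4f}  "
        f"curriculum={curriculum_progress(step, TOTAL):.2f}"
    )
```

At step 0 this gives `lr=3.00e-04` and `entc=2.00e-02`, the same starting values shown in the training log.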

Adjust `TOTAL_ENV_STEPS` depending on runtime budget.
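To size `TOTAL_ENV_STEPS`, note that each PPO update consumes `NUM_ENVS * ROLLOUT_LEN` environment steps: with the defaults that is 64 × 256 = 16,384 steps, so 20M steps give 1,220 whole updates (both numbers match the training log). The helper below is a small, hypothetical budgeting aid. The minibatch size assumes the rollout batch is split evenly into `NUM_MINIBATCHES`, and the wall-clock estimate simply divides by a steps-per-second figure you read off the `sps=` field of the log once it has stabilized (`sps=1000` below is a placeholder).

```python
def ppo_budget(num_envs, rollout_len, total_env_steps, num_minibatches, sps=None):
    steps_per_update = num_envs * rollout_len
    budget = {
        "steps_per_update": steps_per_update,
        # Whole updates only; a trailing partial update is dropped.
        "total_updates": total_env_steps // steps_per_update,
        "minibatch_size": steps_per_update // num_minibatches,
    }
    if sps:
        budget["est_hours"] = total_env_steps / sps / 3600
    return budget


print(ppo_budget(64, 256, 20_000_000, 16, sps=1000))
```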

In [3]:
# Pick ONE variant_id:
# - ppo_family_q4_ctx13_beststart_cnn (recommended baseline)
# - ppo_family_q4_ctx13_beststart_cwe (CWE backbone, job-conditioned period readout)
# - ppo_duration_aware_family_ctx13_cnn
#
VARIANT_ID = "ppo_family_q4_ctx13_beststart_cnn"  # switch to ..._cwe only if REPO_BRANCH includes CWE
SEED = 0
OUT_DIR = "runs_colab"

# Main training knobs (T4-friendly defaults)
NUM_ENVS = 64
ROLLOUT_LEN = 256
TOTAL_ENV_STEPS = 20_000_000

LR = 3e-4
LR_SCHEDULE = "cosine"
LR_END_FACTOR = 0.1

ENT_SCHEDULE = "cosine"
ENT_START = 0.02
ENT_END = 0.002
ENT_DECAY_FRAC = 0.8

PPO_EPOCHS = 2
NUM_MINIBATCHES = 16
CLIP_EPS = 0.2
VALUE_COEF = 0.5
MAX_GRAD_NORM = 0.5
TARGET_KL = 0.03

CURRICULUM = True
CURRICULUM_FRAC = 0.3

cmd = [
    sys.executable,
    "-m",
    "PaST.train_ppo",
    "--variant_id",
    VARIANT_ID,
    "--seed",
    str(SEED),
    "--device",
    "cuda" if torch.cuda.is_available() else "cpu",
    "--output_dir",
    OUT_DIR,
    "--num_envs",
    str(NUM_ENVS),
    "--rollout_length",
    str(ROLLOUT_LEN),
    "--total_env_steps",
    str(TOTAL_ENV_STEPS),
    "--learning_rate",
    str(LR),
    "--lr_schedule",
    LR_SCHEDULE,
    "--lr_end_factor",
    str(LR_END_FACTOR),
    "--ppo_epochs",
    str(PPO_EPOCHS),
    "--num_minibatches",
    str(NUM_MINIBATCHES),
    "--clip_eps",
    str(CLIP_EPS),
    "--value_coef",
    str(VALUE_COEF),
    "--max_grad_norm",
    str(MAX_GRAD_NORM),
    "--target_kl",
    str(TARGET_KL),
    "--entropy_schedule",
    ENT_SCHEDULE,
    "--entropy_coef_start",
    str(ENT_START),
    "--entropy_coef_end",
    str(ENT_END),
    "--entropy_decay_fraction",
    str(ENT_DECAY_FRAC),
]
if CURRICULUM:
    cmd += ["--curriculum", "--curriculum_fraction", str(CURRICULUM_FRAC)]

print(" ".join(cmd))

/usr/bin/python3 -m PaST.train_ppo --variant_id ppo_family_q4_ctx13_beststart_cnn --seed 0 --device cuda --output_dir runs_colab --num_envs 64 --rollout_length 256 --total_env_steps 20000000 --learning_rate 0.0003 --lr_schedule cosine --lr_end_factor 0.1 --ppo_epochs 2 --num_minibatches 16 --clip_eps 0.2 --value_coef 0.5 --max_grad_norm 0.5 --target_kl 0.03 --entropy_schedule cosine --entropy_coef_start 0.02 --entropy_coef_end 0.002 --entropy_decay_fraction 0.8 --curriculum --curriculum_fraction 0.3
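The long `cmd` list above repeats each flag name next to its value. A more compact alternative, sketched here with a hypothetical `to_cli_args` helper (not part of PaST), is to keep the options in a dict and expand it, passing `True` as a bare switch such as `--curriculum`. The example dict covers only a subset of the flags; anything left out falls back to whatever defaults `train_ppo` defines. The result is stored in `example_cmd` so it does not overwrite the real `cmd`.

```python
import sys


def to_cli_args(options):
    """Expand {"flag": value} into ["--flag", "value", ...].

    True becomes a bare switch; None and False entries are skipped.
    """
    args = []
    for name, value in options.items():
        if value is None or value is False:
            continue
        args.append(f"--{name}")
        if value is not True:
            args.append(str(value))
    return args


# Literal values for self-containment; in the notebook you would use
# VARIANT_ID, SEED, NUM_ENVS, ... instead.
train_options = {
    "variant_id": "ppo_family_q4_ctx13_beststart_cnn",
    "seed": 0,
    "num_envs": 64,
    "rollout_length": 256,
    "learning_rate": 3e-4,
    "curriculum": True,
    "curriculum_fraction": 0.3,
}
example_cmd = [sys.executable, "-m", "PaST.train_ppo"] + to_cli_args(train_options)
print(" ".join(example_cmd))
```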


In [None]:
# Start training
!{' '.join(cmd)}


################################################################################
# Running seed 0
################################################################################

PaST-SM PPO Training v2.0-PPO
Variant: ppo_family_q4_ctx13_beststart_cnn
Seed: 0
Device: cuda
Num envs: 64
Rollout length: 256
Steps per update: 16,384
Total updates: 1,220
Total env steps: 20,000,000
Output: runs_colab/ppo_family_q4_ctx13_beststart_cnn/seed_0

Creating environment...
Observation shapes: {'jobs': (50, 2), 'periods': (48, 4), 'period_mask': (48,), 'ctx': (13,), 'job_mask': (50,), 'action_mask': (200,)}

Creating model...
Model parameters: 202,182
Action dim: 200

Creating PPO runner...

Starting training...
[    0] steps=    16,384 | ret=  -58.78±62.12  | r/step=-13.58±8.00   | tmax<=  80 | ep=3674 am=  0 | π= 0.0063 V=2107.5239 H= 1.418 | kl=0.0358 clip=0.113 | ∇=87.954 gclip=1.00 e= 2 | lr=3.00e-04 entc=2.00e-02 | mem=    95MB | sps=   983 t=  0.3m
  [Checkpoint] Saved latest.pt
[    1] ste
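Each progress line in the log above is a sequence of `key=value` fields, so it can be parsed into numbers for plotting with the pandas/matplotlib installed earlier (each returned dict can become one row of a `pandas.DataFrame`). The sketch below is based only on the line format visible in the log: it extracts every numeric `key=value` pair, keeping the first number (so `ret=-58.78±62.12` gives the mean), plus the update index in brackets, and returns `None` for other lines such as `[Checkpoint] ...`. It does not interpret what each field means; `parse_log_line` is a hypothetical helper.

```python
import re

# "[    0] ..." at the start of a progress line
UPDATE_RE = re.compile(r"^\[\s*(\d+)\]")
# key: any run of characters other than whitespace, "|" or "=";
# value: the first number after "=", allowing thousands separators and exponents
FIELD_RE = re.compile(r"([^\s|=]+)=\s*([-+]?(?:\d[\d,]*)?\.?\d+(?:[eE][-+]?\d+)?)")


def parse_log_line(line):
    """Return {"update": n, key: float, ...} for a progress line, else None."""
    m = UPDATE_RE.match(line)
    if m is None:
        return None  # e.g. "[Checkpoint] Saved latest.pt" or banner lines
    record = {"update": int(m.group(1))}
    for key, value in FIELD_RE.findall(line):
        key = key.rstrip("<")  # "tmax<=  80" -> "tmax"
        record[key] = float(value.replace(",", ""))  # "16,384" -> 16384.0
    return record


sample = (
    "[    0] steps=    16,384 | ret=  -58.78±62.12  | r/step=-13.58±8.00   "
    "| tmax<=  80 | ep=3674 am=  0 | π= 0.0063 V=2107.5239 H= 1.418 "
    "| kl=0.0358 clip=0.113 | ∇=87.954 gclip=1.00 e= 2 "
    "| lr=3.00e-04 entc=2.00e-02 | mem=    95MB | sps=   983 t=  0.3m"
)
print(parse_log_line(sample))
```

Since the training cell runs through `!`, its output is not saved to a file by default; the parser is meant for log text you have copied or redirected to a file.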

## Find the latest checkpoint
This picks the most recently modified `latest.pt` under `OUT_DIR`; if you have trained several variants there, check that it belongs to `VARIANT_ID`.

In [None]:
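import glob

# `**` also matches zero directories, so this finds latest.pt whether or not
# the trainer writes it inside a `checkpoints/` subfolder.
ckpts = glob.glob(f"{OUT_DIR}/**/latest.pt", recursive=True)
if not ckpts:
    raise FileNotFoundError(
        f"No latest.pt found under {OUT_DIR}/ (has the training cell saved a checkpoint yet?)"
    )

# Newest file by modification time
CKPT = max(ckpts, key=lambda p: Path(p).stat().st_mtime)
print("Using checkpoint:", CKPT)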

FileNotFoundError: No latest.pt found under runs_colab/
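The checkpoint cell takes the newest `latest.pt` anywhere under `OUT_DIR`, which can belong to a different variant than `VARIANT_ID` if several have been trained. The training log shows outputs going to `OUT_DIR/<variant_id>/seed_<seed>`, so a stricter lookup can restrict the search to that folder. `latest_checkpoint_for` is a hypothetical helper sketched under that layout assumption.

```python
from pathlib import Path


def latest_checkpoint_for(out_dir, variant_id, seed=None):
    """Newest latest.pt under out_dir/variant_id, optionally only under seed_<seed>."""
    root = Path(out_dir) / variant_id
    if seed is not None:
        root = root / f"seed_{seed}"
    # rglob also finds files nested in subfolders such as checkpoints/
    ckpts = list(root.rglob("latest.pt")) if root.is_dir() else []
    if not ckpts:
        raise FileNotFoundError(f"No latest.pt found under {root}/")
    return max(ckpts, key=lambda p: p.stat().st_mtime)


# Usage in this notebook (after training):
# CKPT = str(latest_checkpoint_for(OUT_DIR, VARIANT_ID, SEED))
```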

## Evaluate (Greedy / SGBS / EAS / SGBS+EAS)
The evaluation script depends on the variant family (the next cell selects it from `VARIANT_ID`):
- `run_eval_eas_family_q4_beststart` for `ppo_family_*_beststart_*`
- `run_eval_eas_duration_aware` for `ppo_duration_aware_*`

In [None]:
EVAL_SEED = 42
NUM_INSTANCES = 16
SCALE = "small"  # small|medium|large
EPS_STEPS = 5

METHOD = "sgbs_eas"  # eas|sgbs_eas
BETA = "4"
GAMMA = "4"
MAX_ITERS = 50
EAS_LR = 0.003
EAS_IL = 0.01
SAMPLES_PER_ITER = 32

if "duration_aware" in VARIANT_ID:
    eval_mod = "PaST.run_eval_eas_duration_aware"
else:
    eval_mod = "PaST.run_eval_eas_family_q4_beststart"

eval_cmd = [
    sys.executable,
    "-m",
    eval_mod,
    "--checkpoint",
    CKPT,
    "--variant_id",
    VARIANT_ID,
    "--eval_seed",
    str(EVAL_SEED),
    "--num_instances",
    str(NUM_INSTANCES),
    "--scale",
    SCALE,
    "--epsilon_steps",
    str(EPS_STEPS),
    "--method",
    METHOD,
    "--beta",
    BETA,
    "--gamma",
    GAMMA,
    "--max_iterations",
    str(MAX_ITERS),
    "--eas_lr",
    str(EAS_LR),
    "--eas_il_weight",
    str(EAS_IL),
    "--samples_per_iter",
    str(SAMPLES_PER_ITER),
]
print(" ".join(eval_cmd))

In [None]:
!{' '.join(eval_cmd)}
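To compare several settings in one go, the evaluation command can be generated per combination and run in sequence. This is a minimal sketch using only flags that already appear in `eval_cmd`; `build_eval_cmd` and `run_all` are hypothetical helpers, and the checkpoint path is a placeholder for `CKPT`. Flags not passed (for example `--beta` or `--max_iterations`) fall back to whatever defaults the script defines unless you add them through the keyword arguments. `check=False` lets the remaining runs continue if one of them exits with an error.

```python
import itertools
import subprocess
import sys


def build_eval_cmd(checkpoint, variant_id, scale, method, eval_module, **extra):
    """Assemble one evaluation command; extra kwargs become --key value pairs."""
    cmd = [
        sys.executable, "-m", eval_module,
        "--checkpoint", str(checkpoint),
        "--variant_id", variant_id,
        "--scale", scale,
        "--method", method,
    ]
    for key, value in extra.items():
        cmd += [f"--{key}", str(value)]
    return cmd


def run_all(cmds):
    """Run commands one after another and return their exit codes."""
    codes = []
    for cmd in cmds:
        print(" ".join(cmd))
        codes.append(subprocess.run(cmd, check=False).returncode)
    return codes


sweep = [
    build_eval_cmd(
        "path/to/latest.pt",  # placeholder: pass CKPT from the checkpoint cell
        "ppo_family_q4_ctx13_beststart_cnn",
        scale,
        method,
        "PaST.run_eval_eas_family_q4_beststart",
        eval_seed=42,
        num_instances=16,
    )
    for scale, method in itertools.product(["small", "medium"], ["eas", "sgbs_eas"])
]
# run_all(sweep)  # uncomment to launch the runs
```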