# Colab-ready notebook: Train Active Explorer MNIST PPO (flat observations)
# Cell 1: Title/description (Markdown)


# Colab notebook

This notebook trains the Active Explorer MNIST agent (PPO) using the same flat-observation style as your PC run. It clones the repo, installs dependencies, mounts Google Drive (optional), runs training (optionally resuming from a checkpoint in Drive), evaluates, and copies artifacts back to Drive for download.

Checklist:
- [ ] Set Colab runtime to GPU (Runtime > Change runtime type > GPU)
- [ ] (Optional) Upload any resume checkpoint (`ppo_pretrained.zip` or `ppo_final.zip`) into Google Drive
- [ ] Run each cell in order
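
The first two checklist items can also be checked programmatically before training starts. The following is a minimal sketch, not part of the repository: `preflight` is a hypothetical helper, and it assumes the resume checkpoint is a plain file such as the `ppo_pretrained.zip` mentioned above. If PyTorch cannot be imported, it reports `cpu` and adds a warning.

```python
from pathlib import Path
from typing import List, Optional, Tuple


def preflight(resume_path: Optional[str] = None) -> Tuple[str, List[str]]:
    """Return (device, warnings) for the checklist items above.

    Hypothetical helper for illustration; it is not part of the repo.
    """
    warnings: List[str] = []

    # Checklist item 1: is a GPU runtime active?
    try:
        import torch
        device = "cuda" if torch.cuda.is_available() else "cpu"
    except ImportError:
        device = "cpu"
        warnings.append("PyTorch is not installed; run the install cell first.")
    if device == "cpu":
        warnings.append("No GPU detected; training will run on the CPU.")

    # Checklist item 2: does the optional resume checkpoint exist?
    if resume_path is not None and not Path(resume_path).is_file():
        warnings.append(f"Resume checkpoint not found: {resume_path}")

    return device, warnings


device, problems = preflight(resume_path=None)
print("Device:", device)
for problem in problems:
    print("WARNING:", problem)
```

If the checkpoint lives in Drive, call `preflight` only after the Drive mount cell has run, since the path does not exist before mounting.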


In [None]:
# Cell: Install dependencies

# Install core packages quietly. Adjust the torch install if you need a specific CUDA version.
# The extras spec is quoted so the shell does not treat the brackets as a glob pattern.
!pip install -q "stable-baselines3[extra]" gymnasium torch torchvision opencv-python pandas seaborn scikit-learn tqdm

# Verify versions and GPU
import sys, torch
print('Python', sys.version)
print('Torch', torch.__version__)
print('CUDA available:', torch.cuda.is_available())


In [None]:
# Clone repository and list files

# Clone the repo (if you already uploaded the project to Drive you may skip cloning)
!rm -rf decentralized-multiarm || true
!git clone https://github.com/real-stanford/decentralized-multiarm.git

# Show the active_explorer_mnist folder
!ls -la decentralized-multiarm/active_explorer_mnist


In [None]:
# Mount Google Drive (optional) and ensure repo exists
import os
from pathlib import Path
from google.colab import drive
print('If you want to persist models/logs, mount your Google Drive when prompted')
drive.mount('/content/drive')

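# Clone the repository if it doesn't exist (useful when you open the notebook fresh).
# Clone into /content explicitly so the result does not depend on the current directory.
if not Path('/content/decentralized-multiarm').exists():
    import subprocess
    print('Cloning repository...')
    subprocess.run(['git', 'clone', 'https://github.com/real-stanford/decentralized-multiarm.git'], cwd='/content', check=True)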

# Set workspace and default vars
WORKDIR = '/content/decentralized-multiarm/active_explorer_mnist'
if not Path(WORKDIR).exists():
    raise FileNotFoundError(f"Expected WORKDIR at {WORKDIR} — run the clone cell or upload your project to Drive and update WORKDIR.")

# change into workspace
os.chdir(WORKDIR)
print('Changed working directory to', os.getcwd())

# Default parameters: edit these values before running the training cell
RESUME_PATH = None  # e.g. '/content/drive/MyDrive/active_explorer_mnist/ppo_pretrained.zip'
TIMESTEPS = 5_000_000  # number of timesteps to run on Colab (5e6 by default)
SAVE_DIR = './ppo_colab'
SEED = 0
print('WORKDIR', WORKDIR)
print('RESUME_PATH', RESUME_PATH)
print('TIMESTEPS', TIMESTEPS)
print('SAVE_DIR', SAVE_DIR)


In [None]:
# Flat-observation wrapper note (no-op if env already returns flat obs)

# The env in this repo already returns flat observations for SB3 MlpPolicy.
# If you ever switch to a dict obs env, you can apply FlattenObservation wrapper:
from gymnasium.wrappers import FlattenObservation
print('FlattenObservation available')


In [None]:
# Training cell

# Builds the command conditionally: include --resume-path only if RESUME_PATH is set
import os
import subprocess
from pathlib import Path

resume_flag = ["--resume-path", str(RESUME_PATH)] if RESUME_PATH else []
cmd_list = [
    'python3', 'train_explorer_ppo.py',
    '--classifier', './mnist_cnn.pth',
    '--timesteps', str(int(TIMESTEPS)),
    '--save-path', SAVE_DIR,
    '--seed', str(SEED),
]
# Insert the resume flag just before --save-path if it is set
if resume_flag:
    idx = cmd_list.index('--save-path')
    cmd_list[idx:idx] = resume_flag

print('Running training command:')
print(' '.join(cmd_list))

# Run the training (this will display the tqdm output)
# Use subprocess.run to stream output directly
subprocess.run(cmd_list, check=True)


In [None]:
# Upload helper cell — ways to get your classifier / resume model into Colab

# Option A — place files in your Google Drive (recommended):
# 1) Open https://drive.google.com in your browser
# 2) Upload the files into a folder, e.g. /MyDrive/active_explorer_mnist/
# 3) In the WORKDIR cell set RESUME_PATH='/content/drive/MyDrive/active_explorer_mnist/ppo_pretrained.zip'

# Option B — upload directly to this Colab session (small files, ephemeral):
from google.colab import files
print('Use the Colab file browser, or uncomment the lines below, to upload files directly to /content.')
# Uncomment to run interactive upload (this will open a file picker):
# uploaded = files.upload()
# After upload, move files to the repo folder, e.g.:
# import shutil
# for fn in uploaded.keys():
#     shutil.move(fn, WORKDIR)

# Option C — download from a URL (if you host the files elsewhere):
# !wget -O ./mnist_cnn.pth "https://example.com/path/to/mnist_cnn.pth"

print('If you uploaded files via the Drive UI, set RESUME_PATH to the Drive path and run the training cell.')


In [None]:
# Evaluation + analysis cell
import os
import shutil
import subprocess
from pathlib import Path

# Evaluate saved model (100 episodes) and generate plots
MODEL_ZIP = Path(SAVE_DIR) / 'ppo_explorer.zip'
if MODEL_ZIP.exists():
    eval_cmd = [
        'python3', 'test_policy_runner.py',
        '--policy', 'saved',
        '--saved-path', str(MODEL_ZIP),
        '--num-episodes', '100',
        '--output', './colab_results.csv',
        '--classifier', './mnist_cnn.pth',
        '--seed', str(SEED),
    ]
    print('Running evaluation:')
    print(' '.join(eval_cmd))
    subprocess.run(eval_cmd, check=True)
    print('Analyzing results...')
    subprocess.run(['python3', 'analyze_policy_results.py', './colab_results.csv', '--threshold', '0.90', '--out-dir', '.'], check=True)
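    # Copy the model and analysis artifacts to Drive (only if Drive is mounted)
    if Path('/content/drive/MyDrive').exists():
        dst_dir = Path('/content/drive/MyDrive/active_explorer_mnist')
        dst_dir.mkdir(parents=True, exist_ok=True)
        artifacts = [MODEL_ZIP, 'colab_results.csv', 'colab_results_confusion.png', 'colab_results_confidence_hist.png', 'colab_results_moves_vs_pixels.png']
        for f in map(Path, artifacts):
            if f.exists():
                shutil.copy2(f, dst_dir / f.name)
                print('Copied', f.name, 'to Drive')
    else:
        print('Drive is not mounted, skipping copy to Drive')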
else:
    print('Model zip not found, skipping evaluation')


# Final instructions

1) In Colab: Runtime -> Change runtime type -> GPU, then run this notebook.

2) If you have a BC/previous checkpoint from your PC, upload it to Google Drive and set `RESUME_PATH` to its Drive path (e.g., `/content/drive/MyDrive/active_explorer_mnist/ppo_pretrained.zip`) before running the training cell.

3) The notebook copies the final `ppo_explorer.zip` and analysis artifacts to `/content/drive/MyDrive/active_explorer_mnist`. Use the Drive UI to download them to your PC.

4) For direct downloads, small files may be downloaded with `from google.colab import files; files.download('path')`, but for large models Drive is recommended.

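
Steps 3 and 4 can be combined by bundling the artifacts into a single archive before downloading. The following is a minimal sketch under stated assumptions: `bundle_artifacts` is a hypothetical helper, the filenames are the ones used by the evaluation cell with the default `SAVE_DIR`, and the `files.download` call is guarded because it is only available inside a Colab session.

```python
import zipfile
from pathlib import Path
from typing import Iterable, List


def bundle_artifacts(paths: Iterable[str], archive: str = "artifacts.zip") -> List[str]:
    """Zip the files in `paths` that exist and return the names that were added.

    Hypothetical helper for illustration; it is not part of the repo.
    """
    added: List[str] = []
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in map(Path, paths):
            if path.is_file():
                # Store each file flat, without its directory prefix.
                zf.write(path, arcname=path.name)
                added.append(path.name)
    return added


# Filenames assumed from the evaluation cell; missing files are skipped.
candidates = [
    "ppo_colab/ppo_explorer.zip",
    "colab_results.csv",
    "colab_results_confusion.png",
    "colab_results_confidence_hist.png",
    "colab_results_moves_vs_pixels.png",
]
added = bundle_artifacts(candidates)
print("Bundled:", added)

if added:
    try:
        from google.colab import files  # only importable inside Colab
        files.download("artifacts.zip")
    except ImportError:
        print("Not running in Colab; artifacts.zip was left in the working directory.")
```

For large models, copying to Drive as described in step 3 is likely to be more reliable than a browser download.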
