# Block B Starter / Executor (Colab)
Use this notebook to:
1) Set up the repo for Block B
2) Configure dataset paths (Cityscapes, nuScenes-mini, CMP)
3) Run a **dry test** of the new data pipeline on CMP Facade

**Note:** End-to-end model inference is wired via the existing CLI (`stg-stsg-model/src/infer_v1.py`).
At this stage, features are dummy-cached; we patch the runner to call the CLI per scene.

In [None]:
#@title Verify Colab path
import os
# On a fresh runtime the repo does not exist yet; the clone cell in section 1 creates it.
if os.path.isdir('/content/stg-system-main'):
    print('✅ Repo path verified. You can now run all cells top-to-bottom.')
else:
    print('❌ Repo not found at /content/stg-system-main. Run the clone cell in section 1 first.')


In [1]:
#@title 0) Environment check (GPU & Python)
import sys, platform, subprocess, os
print("Python:", sys.version)
print("Platform:", platform.platform())
!nvidia-smi || echo "No GPU visible (CPU-only run)."


Python: 3.12.12 (main, Oct 10 2025, 08:52:57) [GCC 11.4.0]
Platform: Linux-6.6.105+-x86_64-with-glibc2.35
Sun Nov  9 09:15:55 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  NVIDIA A100-SXM4-80GB          Off |   00000000:00:05.0 Off |                    0 |
| N/A   32C    P0             55W /  400W |       0MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+---------------------------------
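
The same check can be made from Python, which may help when a later cell needs to branch on GPU availability. This is a minimal sketch using only the standard library; `gpu_visible` is a hypothetical helper name, and it only tests whether `nvidia-smi` is on `PATH` and exits cleanly.

```python
import shutil
import subprocess

def gpu_visible() -> bool:
    """Return True if nvidia-smi is on PATH and exits with status 0."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False
    try:
        result = subprocess.run([exe], capture_output=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0

print("GPU visible:", gpu_visible())
```

On a CPU-only runtime this prints `False` rather than raising, so it is safe to leave in a "Run all" workflow.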

## 1) Bring the code into Colab
We now clone the public repository directly from GitHub instead of uploading a ZIP.

In [2]:
#@title Clone the repo
REPO_URL = "https://github.com/azsanche-asg/stg-system.git"  # public repo URL
EXTRACT_DIR = "/content/stg-system-main"

import os, shutil

# Remove any previous clone to ensure a clean copy
shutil.rmtree(EXTRACT_DIR, ignore_errors=True)

print(f"Cloning from {REPO_URL} ...")
!git clone "$REPO_URL" "$EXTRACT_DIR"

print("✅ Cloned to:", EXTRACT_DIR)
!find "$EXTRACT_DIR" -maxdepth 2 -type d -print


Cloning from https://github.com/azsanche-asg/stg-system.git ...
Cloning into '/content/stg-system-main'...
remote: Enumerating objects: 235, done.
remote: Counting objects: 100% (235/235), done.
remote: Compressing objects: 100% (171/171), done.
remote: Total 235 (delta 103), reused 187 (delta 55), pack-reused 0 (from 0)
Receiving objects: 100% (235/235), 32.96 MiB | 16.26 MiB/s, done.
Resolving deltas: 100% (103/103), done.
✅ Cloned to: /content/stg-system-main
/content/stg-system-main
/content/stg-system-main/stg-real-eval
/content/stg-system-main/stg-real-eval/src
/content/stg-system-main/stg-real-eval/datasets
/content/stg-system-main/stg-real-eval/configs
/content/stg-system-main/stg-baselines
/content/stg-system-main/stg-baselines/src
/content/stg-system-main/stg-stsg-model
/content/stg-system-main/stg-stsg-model/src
/content/stg-system-main/stg-stsg-model/configs
/content/stg-system-main/stg-synthetic-eval
/content/stg-system-main/stg-synthetic-eval/src
/content/stg-

## 2) Install minimal requirements
This keeps installs light; heavy backbones will be added later.

In [3]:
#@title Install project requirements (lightweight)
%pip -q install -U pip
%pip -q install numpy pyyaml tqdm opencv-python matplotlib pillow torch torchvision



In [4]:
#@title Install feature extractors (CLIP / MiDaS / DINO / SAM)
%pip -q install ftfy regex tqdm
%pip -q install git+https://github.com/openai/CLIP.git

import torch, os
print("Torch:", torch.__version__, "CUDA available:", torch.cuda.is_available())

# Keep torch.hub downloads (repos and checkpoints) in a known cache directory
os.environ["TORCH_HOME"] = "/content/torch_hub"
os.makedirs(os.environ["TORCH_HOME"], exist_ok=True)


  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
  Building wheel for clip (pyproject.toml) ... done
Torch: 2.8.0+cu126 CUDA available: True
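
Because the installs above run with `-q`, a failed install can go unnoticed until a later cell breaks. The sketch below checks whether the expected modules are importable. The list of import names is an assumption derived from the `pip install` lines above (for example `yaml` for `pyyaml`, `cv2` for `opencv-python`), and `missing_modules` is a hypothetical helper.

```python
import importlib.util

# Import names (not pip names) for the packages installed above; assumed mapping.
REQUIRED = ["numpy", "yaml", "tqdm", "cv2", "matplotlib", "PIL",
            "torch", "torchvision", "clip"]

def missing_modules(names):
    """Return the import names for which no module spec can be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_modules(REQUIRED)
print("Missing modules:", missing if missing else "none")
```

`find_spec` looks a top-level module up without importing it, so the check stays cheap even for heavy packages such as `torch`.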


## 3) Quick repo sanity & path setup

In [5]:
#@title Add repo to sys.path
import sys, os
ROOT = "/content/stg-system-main"
assert os.path.isdir(ROOT), "Repo root not found."
sys.path.append(ROOT)
print("Repo root:", ROOT)

# Show key modules
!ls -la "$ROOT"
!ls -la "$ROOT/stg-real-eval" || echo "stg-real-eval not found (ensure you applied the Block B patch)."


Repo root: /content/stg-system-main
total 56
drwxr-xr-x 8 root root  4096 Nov  9 09:16 .
drwxr-xr-x 1 root root  4096 Nov  9 09:16 ..
-rw-r--r-- 1 root root 11722 Nov  9 09:16 BlockB_Starter_Colab.ipynb
-rw-r--r-- 1 root root  6148 Nov  9 09:16 .DS_Store
drwxr-xr-x 8 root root  4096 Nov  9 09:16 .git
-rw-r--r-- 1 root root   816 Nov  9 09:16 README.md
drwxr-xr-x 3 root root  4096 Nov  9 09:16 stg-baselines
drwxr-xr-x 4 root root  4096 Nov  9 09:16 stg-procedural-data
drwxr-xr-x 5 root root  4096 Nov  9 09:16 stg-real-eval
drwxr-xr-x 4 root root  4096 Nov  9 09:16 stg-stsg-model
drwxr-xr-x 5 root root  4096 Nov  9 09:16 stg-synthetic-eval
total 44
drwxr-xr-x 5 root root 4096 Nov  9 09:16  .
drwxr-xr-x 8 root root 4096 Nov  9 09:16  ..
drwxr-xr-x 2 root root 4096 Nov  9 09:16  configs
drwxr-xr-x 3 root root 4096 Nov  9 09:16  datasets
-rw-r--r-- 1 root root 6148 Nov  9 09:16  .DS_Store
-rw-r--r-- 1 root root  559 Nov  9 09:16 'README 2.md'
-rw-r--r-- 1 root root  559 Nov  9 09:16  README

## 4) Configure dataset paths
Fill in your local paths. For a **dry test**, you can point CMP to a folder with 5–10 JPGs.

In [7]:
#@title Set dataset paths (edit these)
CITYSCAPES_SEQ_ROOT = "/content/cityscapes/leftImg8bit_sequence/demoVideo"  #@param {type:"string"}
CITYSCAPES_STILL_ROOT = "/content/cityscapes/leftImg8bit/val"               #@param {type:"string"}
CMP_ROOT = "/content/cmp/facade_db"                                         #@param {type:"string"}
NUSCENES_ROOT = "/content/nuscenes-mini"                                     #@param {type:"string"}

# Patch CMP config inline
import yaml
from pathlib import Path
cfg_path = Path(ROOT) / "stg-real-eval/configs/block_b_cmp.yaml"
cfg = yaml.safe_load(cfg_path.read_text())
cfg["paths"]["root"] = CMP_ROOT  # point the CMP loader at the chosen folder
cfg_path.write_text(yaml.safe_dump(cfg))
print("Wrote CMP path into:", cfg_path)
print(cfg_path.read_text())


Wrote CMP path into: /content/stg-system-main/stg-real-eval/configs/block_b_cmp.yaml
dataset: cmp-facade
eval:
  image_size:
  - 512
  - 512
  max_images: 10
outputs:
  results_dir: results/block_b/cmp
  save_json: true
paths:
  root: /content/cmp/facade_db



In [8]:
#@title Verify dataset paths
import os
print("CMP_ROOT:", CMP_ROOT, "exists:", os.path.isdir(CMP_ROOT))
print("CITYSCAPES_SEQ_ROOT:", CITYSCAPES_SEQ_ROOT, "exists:", os.path.isdir(CITYSCAPES_SEQ_ROOT))
print("CITYSCAPES_STILL_ROOT:", CITYSCAPES_STILL_ROOT, "exists:", os.path.isdir(CITYSCAPES_STILL_ROOT))
print("NUSCENES_ROOT:", NUSCENES_ROOT, "exists:", os.path.isdir(NUSCENES_ROOT))


CMP_ROOT: /content/cmp/facade_db exists: False
CITYSCAPES_SEQ_ROOT: /content/cityscapes/leftImg8bit_sequence/demoVideo exists: False
CITYSCAPES_STILL_ROOT: /content/cityscapes/leftImg8bit/val exists: False
NUSCENES_ROOT: /content/nuscenes-mini exists: False
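
An existence check does not say whether a folder actually holds images, which matters for the dry test. The sketch below counts image files under each root. `count_images` and the suffix set are assumptions made for illustration, and the paths simply repeat the defaults from the cell above.

```python
from pathlib import Path

# Suffixes treated as images; an assumption, extend as needed.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}

def count_images(root):
    """Count image files anywhere under root; return 0 if root is not a directory."""
    root = Path(root)
    if not root.is_dir():
        return 0
    return sum(
        1 for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )

# Defaults copied from the path cell; in the notebook, pass the variables instead.
DATASET_ROOTS = {
    "CMP_ROOT": "/content/cmp/facade_db",
    "CITYSCAPES_SEQ_ROOT": "/content/cityscapes/leftImg8bit_sequence/demoVideo",
    "CITYSCAPES_STILL_ROOT": "/content/cityscapes/leftImg8bit/val",
    "NUSCENES_ROOT": "/content/nuscenes-mini",
}
for name, path in DATASET_ROOTS.items():
    print(f"{name}: {count_images(path)} images")
```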


## 5) (Optional) Create a tiny debug CMP folder
If you don't have CMP yet, generate 6 dummy images to exercise the pipeline.

In [9]:
#@title Create dummy CMP images (optional)
import os
from PIL import Image, ImageDraw
from pathlib import Path
import numpy as np

if not os.path.exists(CMP_ROOT):
    os.makedirs(CMP_ROOT, exist_ok=True)
    for i in range(6):
        img = Image.new("RGB", (640, 384), (220, 220, 230))
        d = ImageDraw.Draw(img)
        for x in range(40, 600, 60):
            for y in range(40, 320, 70):
                d.rectangle([x, y, x+30, y+20], outline=(0,0,0))
        img.save(f"{CMP_ROOT}/cmp_{i:03d}.jpg")
print("CMP_ROOT contains:", len(list(Path(CMP_ROOT).glob('*.jpg'))) + len(list(Path(CMP_ROOT).glob('*.png'))), "images")
!ls -1 "$CMP_ROOT" | head


CMP_ROOT contains: 6 images
cmp_000.jpg
cmp_001.jpg
cmp_002.jpg
cmp_003.jpg
cmp_004.jpg
cmp_005.jpg


## 7) **Dry test** on CMP
This invokes the end-to-end Block B runner with the CMP config.

In [10]:
#@title Run dry test (CMP)
%cd $ROOT
!python stg-real-eval/src/run_eval_real.py --config stg-real-eval/configs/block_b_cmp.yaml
!echo "Results:"
!find results/block_b/cmp -maxdepth 1 -type f -name "*.json" -print


/content/stg-system-main
Downloading: "https://github.com/intel-isl/MiDaS/zipball/master" to /content/torch_hub/hub/master.zip
Loading weights:  None
Downloading: "https://github.com/rwightman/gen-efficientnet-pytorch/zipball/master" to /content/torch_hub/hub/master.zip
Downloading: "https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/tf_efficientnet_lite3-b733e338.pth" to /content/torch_hub/hub/checkpoints/tf_efficientnet_lite3-b733e338.pth
Downloading: "https://github.com/isl-org/MiDaS/releases/download/v2_1/midas_v21_small_256.pt" to /content/torch_hub/hub/checkpoints/midas_v21_small_256.pt
100% 81.8M/81.8M [00:02<00:00, 33.5MB/s]
Downloading: "https://download.pytorch.org/models/vit_b_16-c867db91.pth" to /content/torch_hub/hub/checkpoints/vit_b_16-c867db91.pth
100% 330M/330M [00:01<00:00, 205MB/s]
100%|████████████████████████████████████████| 335M/335M [00:02<00:00, 154MiB/s]
✅ Cached 6 frames for cmp-facade/cmp_10 using features: ('clip', 'dino', 'mid

In [11]:
#@title Run dry test (Cityscapes)
%cd $ROOT
!python stg-real-eval/src/run_eval_real.py --config stg-real-eval/configs/block_b_cityscapes.yaml
!echo "Results:"
!find results/block_b/cityscapes -maxdepth 1 -type f -name "*.json" -print


/content/stg-system-main
{
  "summary": []
}
Results:
find: ‘results/block_b/cityscapes’: No such file or directory
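
An empty `summary` like the one above is easy to miss when several runs are chained. The sketch below lists result files whose summary is missing or empty. It assumes each saved JSON file has the same top-level `"summary"` list that the runner prints to stdout, which the output above suggests but does not confirm; `empty_summaries` is a hypothetical helper.

```python
import json
from pathlib import Path

def empty_summaries(results_dir):
    """Return names of *.json files whose top-level "summary" is missing or empty."""
    results_dir = Path(results_dir)
    if not results_dir.is_dir():
        return []
    empty = []
    for path in sorted(results_dir.glob("*.json")):
        data = json.loads(path.read_text())
        # Treat non-dict payloads and falsy summaries ([] or None) as empty.
        if not isinstance(data, dict) or not data.get("summary"):
            empty.append(path.name)
    return empty

print("Empty result files:", empty_summaries("results/block_b/cityscapes"))
```

A missing results directory returns an empty list here, so it is worth checking that the dataset root exists before reading much into the result.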


By default all four extractors run (CLIP, DINO, MiDaS, SAM).
To limit which ones are extracted during development, set the environment variable `STG_FEATURES` before the run, e.g.:
```python
import os
os.environ['STG_FEATURES'] = 'clip,dino'
```
If present, the scripts will only cache those features.
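
For orientation, a script could read such a variable roughly as follows. This is a sketch rather than the repo's actual parsing: the lowercase feature keys (`midas` and `sam` in particular) and the `selected_features` helper are assumptions.

```python
import os

# Assumed feature keys, in a fixed canonical order.
ALL_FEATURES = ("clip", "dino", "midas", "sam")

def selected_features(env=None):
    """Return the features named in STG_FEATURES, or all of them if unset or blank."""
    env = os.environ if env is None else env
    raw = env.get("STG_FEATURES", "").strip()
    if not raw:
        return ALL_FEATURES
    wanted = [token.strip().lower() for token in raw.split(",") if token.strip()]
    unknown = [token for token in wanted if token not in ALL_FEATURES]
    if unknown:
        raise ValueError(f"Unknown feature(s) in STG_FEATURES: {unknown}")
    # Keep canonical order and drop duplicates.
    return tuple(f for f in ALL_FEATURES if f in wanted)

print("Features to cache:", selected_features())
```

Rejecting unknown names early is a design choice in this sketch; it turns a typo such as `dinov` into an immediate error instead of a silently smaller cache.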

In [None]:
#@title Run CMP with real feature cache (Phase 2A)
%cd $ROOT

import yaml
from pathlib import Path
cfg_path = Path("stg-real-eval/configs/block_b_cmp.yaml")
cfg = yaml.safe_load(cfg_path.read_text())
cfg["paths"]["root"] = CMP_ROOT
cfg_path.write_text(yaml.safe_dump(cfg))

!python stg-real-eval/src/run_eval_real.py --config stg-real-eval/configs/block_b_cmp.yaml

print("\nCache preview:")
!find cache/block_b -maxdepth 3 -type f | head -n 20


## 8) Optional sanity test on Cityscapes demo subset
This cell runs the subset selector to copy a few frames from the `demoVideo` split and tests the pipeline.

In [None]:
#@title Create and test a Cityscapes demo subset
%cd $ROOT

CITYSCAPES_DEMO = CITYSCAPES_SEQ_ROOT
DEST_SUBSET = f"{ROOT}/stg-real-eval/datasets/cityscapes_subset"  # keep the subset inside the cloned repo

!python stg-real-eval/src/scripts/select_cityscapes_subset.py --root "$CITYSCAPES_DEMO" --num_towns 1 --num_frames 10 --dest "$DEST_SUBSET"

# Patch config to use the subset as seq_root
import yaml
from pathlib import Path
cfg_path = Path(ROOT) / "stg-real-eval/configs/block_b_cityscapes.yaml"
cfg = yaml.safe_load(cfg_path.read_text())
cfg["paths"]["seq_root"] = DEST_SUBSET
cfg_path.write_text(yaml.safe_dump(cfg))

print("Running Block B pipeline on the subset...")
!python stg-real-eval/src/run_eval_real.py --config stg-real-eval/configs/block_b_cityscapes.yaml

# Visualize subset contents (first 3 frames)
from IPython.display import Image, display
subset_imgs = sorted(list(Path(DEST_SUBSET).glob("**/*.png")))[:3]
for im in subset_imgs:
    display(Image(filename=str(im)))
print(f"Displayed {len(subset_imgs)} images from subset.")
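
To make the subset step concrete, here is a rough sketch of what a selector of this kind might do. It is not the code in `select_cityscapes_subset.py`: the parameter names mirror the CLI flags used above, the layout (one sub-folder of PNG frames per town) is an assumption, and `select_subset` is a hypothetical function.

```python
import shutil
from pathlib import Path

def select_subset(root, dest, num_towns=1, num_frames=10):
    """Copy the first num_frames PNGs from the first num_towns sub-folders of root."""
    root, dest = Path(root), Path(dest)
    if not root.is_dir():
        return []
    copied = []
    # Sort so the selection is deterministic across runs.
    towns = sorted(p for p in root.iterdir() if p.is_dir())[:num_towns]
    for town in towns:
        out_dir = dest / town.name
        out_dir.mkdir(parents=True, exist_ok=True)
        for frame in sorted(town.glob("*.png"))[:num_frames]:
            target = out_dir / frame.name
            shutil.copy2(frame, target)
            copied.append(target)
    return copied
```

Calling `select_subset(CITYSCAPES_SEQ_ROOT, DEST_SUBSET, num_towns=1, num_frames=10)` returns the list of copied paths, or an empty list when the source root is missing.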


In [None]:
#@title Cache sanity check
import os
print("Torch Hub home:", os.environ.get("TORCH_HOME", "(default)"))
!du -sh cache/block_b || echo "No cache yet"
!find cache/block_b -maxdepth 2 -type d -print
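
If the shell tools are unavailable, the cache size can also be estimated from Python. A minimal sketch with a hypothetical `dir_size_bytes` helper; it sums apparent file sizes, so the figure can differ from `du`, which reports disk usage.

```python
from pathlib import Path

def dir_size_bytes(root):
    """Sum the sizes of all regular files under root; return 0 if root is missing."""
    root = Path(root)
    if not root.is_dir():
        return 0
    return sum(p.stat().st_size for p in root.rglob("*") if p.is_file())

size_mib = dir_size_bytes("cache/block_b") / (1024 * 1024)
print(f"cache/block_b: {size_mib:.1f} MiB")
```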
