# 🧪 VLM Health Lab (Colab) – Image Tagger 3.4.74_vlm_lab_notebook_TL_runbook

This notebook gives you an **anti-gravity, ephemeral lab** for the Image Tagger project.

It is designed to:
- unpack a copy of the Image Tagger repo,
- set up a small database and a tiny synthetic image set,
- run the **science pipeline** in stub mode (no real API keys needed),
- and then run the **VLM Health** variance audit on the results.

## Benefits vs Costs

**Benefits (TRACK B – Science Notebook):**
- Works on most machines with a browser and a Google account.
- No Docker, no local Postgres install.
- Good for *short, self-contained experiments* and demos.
- Lets you inspect how the VLM health scripts behave on a toy dataset.

**Costs / tradeoffs:**
- The environment is **ephemeral**:
  - when the Colab runtime disconnects or is recycled (for example, after you
    close the tab and the session idles out), everything under `/content` is **lost**.
  - only files you explicitly **save to Google Drive** or **download**
    will persist.
- Performance is limited; this is not for large-scale runs.
- If the repo layout changes, you may need to adjust a path or two.
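
As one way to keep results past the end of a session, the sketch below zips an output folder so it can be copied to Drive or downloaded. The helper name `archive_outputs`, the archive name, and the example folders are assumptions for illustration, not part of the repo. In Colab you would typically mount Drive first with `google.colab.drive.mount('/content/drive')`, or pass the resulting path to `google.colab.files.download(...)`.

```python
import shutil
from pathlib import Path


def archive_outputs(src_dir, dest_dir, name="vlm_lab_outputs"):
    """Zip src_dir into dest_dir/<name>.zip and return the archive path.

    Hypothetical helper: src_dir could be a reports folder under /content/repo,
    dest_dir a mounted Drive folder or any directory you later download from.
    """
    src = Path(src_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"Nothing to archive: {src} is not a directory")
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # make_archive appends ".zip" itself and returns the final path as a string.
    return Path(shutil.make_archive(str(dest / name), "zip", root_dir=str(src)))
```

Keep `dest_dir` outside `src_dir` so the archive does not try to include itself.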

## Relationship to the Full App (TRACK A)

- The **Full App (TRACK A)** runs via Docker (locally, Codespaces, or a VM).
  - It is **persistent**: data and settings survive restarts.
  - It is ideal for multi-week projects and full tagging workflows
    with Workbench and Explorer.
- This notebook is **ephemeral** and **minimal**:
  - it focuses on the **science + VLM health** parts only.
  - use it when students cannot run Docker or when you want
    a lightweight lab exercise.

Before using this notebook in class, instructors and TAs should read:

- `docs/ops/Cloud_AntiGravity_Quickstart.md`
- `docs/ops/Student_Quickstart_v3.4.73.md`

---


In [None]:
# @title 📦 Step 1: Setup "Anti-Gravity" Environment
# @markdown This cell installs Python dependencies and a lightweight PostgreSQL database
# @markdown directly in this notebook. **Run it once at the start.**
import os

print("⬇️ Installing Python libraries...")
!pip install -q fastapi uvicorn sqlalchemy psycopg2-binary pydantic pydantic-settings
!pip install -q openai anthropic pandas numpy scipy scikit-image requests opencv-python-headless

print("🐘 Installing PostgreSQL (this may take a minute)...")
!sudo apt-get -y -qq update
!sudo apt-get -y -qq install postgresql

print("🔧 Starting PostgreSQL service...")
!sudo service postgresql start

print("🛠️ Configuring database user and schema...")
!sudo -u postgres psql -c "CREATE USER tagger WITH PASSWORD 'tagger_pass';" || echo "User may already exist."
!sudo -u postgres psql -c "CREATE DATABASE image_tagger_v3 OWNER tagger;" || echo "DB may already exist."

os.environ['DATABASE_URL'] = "postgresql://tagger:tagger_pass@localhost:5432/image_tagger_v3"
os.environ['IMAGE_STORAGE_ROOT'] = "/content/data_store"

print("✅ Environment Ready. If you see errors above, read them carefully before continuing.")
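
PostgreSQL can take a moment to accept connections after `service postgresql start`. The sketch below polls the TCP port used in `DATABASE_URL` before you move on. `wait_for_port` is a hypothetical helper, not something the repo provides, and an open port only shows that a server is listening; it does not confirm that the `tagger` credentials work.

```python
import socket
import time


def wait_for_port(host="localhost", port=5432, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError if nothing is listening yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port()` could be called at the end of Step 1, printing a warning if it returns `False`.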


In [None]:
# @title 📂 Step 2: Upload the Image Tagger ZIP
# @markdown 1. Run this cell.
# @markdown 2. Click **"Choose Files"** when prompted.
# @markdown 3. Upload the Image Tagger repo zip your instructor gave you
# @markdown    (for example: `Image_Tagger_3.4.74_vlm_lab_TL_runbook_full.zip` or later).
from google.colab import files
import zipfile
import os

uploaded = files.upload()
if not uploaded:
    raise RuntimeError("No file uploaded. Please upload the Image Tagger zip.")
filename = next(iter(uploaded))
print(f"📦 Unpacking {filename}...")

os.makedirs("/content/repo", exist_ok=True)
with zipfile.ZipFile(filename, 'r') as zip_ref:
    zip_ref.extractall("/content/repo")

os.chdir("/content/repo")
print("✅ Repo unpacked in /content/repo and set as current directory.")
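
Some zips wrap the repo in a single top-level folder, in which case `/content/repo` is one level above the real repo root and the `backend` imports in Step 3 will fail. The sketch below looks for the `backend` folder (the package imported later in this notebook) in the extraction directory and its immediate subfolders. `find_repo_root` is a hypothetical helper introduced here for illustration.

```python
from pathlib import Path


def find_repo_root(base, marker="backend"):
    """Return base or one of its direct subfolders that contains `marker`.

    Checks base itself, then its immediate subdirectories (one level only),
    which covers zips that wrap the repo in a single top-level folder.
    """
    base = Path(base)
    if (base / marker).is_dir():
        return base
    for child in sorted(p for p in base.iterdir() if p.is_dir()):
        if (child / marker).is_dir():
            return child
    raise FileNotFoundError(f"No '{marker}' folder found in {base} or its subfolders")
```

If it returns a nested folder, pass that path to `os.chdir(...)` and use the same path in place of `/content/repo` in `sys.path.append(...)`.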


In [None]:
# @title 🌱 Step 3: Seed Database & Generate Toy Images
# @markdown This step:
# @markdown - creates database tables,
# @markdown - seeds basic configuration (if seed scripts are present),
# @markdown - generates a few synthetic "architectural" images.
import os
import sys
import numpy as np
import cv2
from pathlib import Path

# Ensure Python can find backend modules
sys.path.append("/content/repo")

print("🔄 Creating database tables...")
from backend.database.core import engine, Base, SessionLocal
from backend.models import *  # noqa: F401,F403
Base.metadata.create_all(bind=engine)

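print("🌱 Seeding configs & attributes (if seed scripts are present)...")
# A failing `!command` does not raise a Python exception, so a try/except
# around it never fires. Run each seed script with subprocess instead and
# check for the file and the exit code explicitly.
import subprocess
for script in ("backend/scripts/seed_tool_configs.py", "backend/scripts/seed_attributes.py"):
    if not Path(script).exists():
        print(f"   -> {script} not found, skipping.")
        continue
    result = subprocess.run([sys.executable, script], capture_output=True, text=True)
    print(result.stdout, end="")
    if result.returncode != 0:
        print(f"   -> {script} failed (exit code {result.returncode}):")
        print(result.stderr, end="")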

print("🖼️ Generating synthetic images...")
data_store = Path(os.environ['IMAGE_STORAGE_ROOT'])
data_store.mkdir(parents=True, exist_ok=True)

from backend.models.assets import Image

with SessionLocal() as db:
    for i in range(1, 6):
        # randint's upper bound is exclusive, so use 256 to cover the full 0-255 range
        img = np.random.randint(0, 256, (512, 512, 3), dtype=np.uint8)
        # Draw a central "building" rectangle
        cv2.rectangle(img, (100, 100), (400, 400), (200, 200, 200), -1)
        p = data_store / f"toy_arch_{i}.jpg"
        cv2.imwrite(str(p), img)

        db_img = Image(filename=p.name, storage_path=str(p))
        db.add(db_img)
        db.commit()
        print(f"   -> Created {p.name} (ID: {db_img.id})")

print("✅ Toy data ready.")


In [None]:
# @title 🔬 Step 4: Run Science Pipeline (Stub VLM)
# @markdown This runs the Image Tagger science pipeline on the toy images.
# @markdown If no real VLM keys are set, it should fall back to a stub/neutral engine,
# @markdown which is fine for testing the *plumbing*.
import asyncio
from backend.database.core import SessionLocal
from backend.models.assets import Image
from backend.science.pipeline import SciencePipeline

print("🧪 Running science pipeline on toy images...")

async def run_pipeline():
    db = SessionLocal()
    try:
        pipeline = SciencePipeline(db)
        images = db.query(Image).all()
        for img in images:
            print(f"   Analyzing Image {img.id} ({img.filename})...")
            await pipeline.process_image(img.id)
    finally:
        db.close()

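# Top-level `await` works in Colab/IPython cells; in a plain Python script,
# use asyncio.run(run_pipeline()) instead.
await run_pipeline()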
print("✅ Science pipeline complete.")
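
In the loop above, an exception on one image stops the whole run. A minimal sketch of a more tolerant loop follows; `process_all` is a hypothetical helper, and `process_one` stands in for a coroutine such as `pipeline.process_image`. Whether the database session needs a rollback after a failure depends on how `SciencePipeline` manages its transactions, which this sketch does not assume.

```python
import asyncio


async def process_all(image_ids, process_one):
    """Await process_one(image_id) for each id in order and collect failures.

    Returns (succeeded_ids, {failed_id: error_message}).
    """
    succeeded, failed = [], {}
    for image_id in image_ids:
        try:
            await process_one(image_id)
            succeeded.append(image_id)
        except Exception as exc:
            # Record the error and continue so one bad image does not stop the run.
            failed[image_id] = f"{type(exc).__name__}: {exc}"
    return succeeded, failed
```

In a notebook cell this could be called as `ok, bad = await process_all(ids, pipeline.process_image)`, then `bad` printed for inspection.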


In [None]:
# @title 📊 Step 5: Run VLM Health Variance Audit
# @markdown This uses the `scripts/audit_vlm_variance.py` helper script from the repo
# @markdown to compute simple variance statistics over VLM outputs.
# @markdown
# @markdown **Note:** The exact output paths may differ slightly by version.
# @markdown If no CSV appears, inspect the script output and adjust the glob pattern.
import glob
import pandas as pd

print("📉 Running VLM variance audit...")
!python3 scripts/audit_vlm_variance.py

candidates = []
candidates.extend(glob.glob("reports/vlm_health/*/vlm_variance_audit.csv"))
candidates.extend(glob.glob("reports/vlm_health/*/variance_audit.csv"))

if not candidates:
    print("❌ No variance audit CSV found.")
    print("   Please check the output of audit_vlm_variance.py above and adjust the path if needed.")
else:
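    # sorted() picks the lexicographically last path; that is the newest run
    # only if the run folder names sort chronologically (e.g. timestamps).
    path = sorted(candidates)[-1]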
    print(f"📄 Found variance audit CSV: {path}")
    df = pd.read_csv(path)
    display(df.head())
    print(f"Rows: {len(df)}")
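
If the run folders under `reports/vlm_health/` are not named so that they sort chronologically, choosing by modification time is a safer way to find the latest CSV. The sketch below assumes nothing about the folder names; `newest_match` is a hypothetical helper, and the glob patterns you pass it are the same ones used in the cell above.

```python
import glob
import os


def newest_match(patterns):
    """Return the most recently modified file matching any glob pattern, or None."""
    matches = [path for pattern in patterns for path in glob.glob(pattern)]
    if not matches:
        return None
    return max(matches, key=os.path.getmtime)
```

For example, `newest_match(["reports/vlm_health/*/vlm_variance_audit.csv", "reports/vlm_health/*/variance_audit.csv"])` returns one path, or `None` if the audit wrote nothing to those locations.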
