# OR-SIM — Complete End-to-End Testing Guide
## Backend on Kaggle (GPU) + Frontend on Local Machine

```
Your Windows Machine                    Kaggle Notebook (T4 GPU)
──────────────────────                  ─────────────────────────
Browser → http://localhost:5173         uvicorn :8000  (FastAPI)
    Vite dev-server                          │  MedGemma GGUF  (~2.1 GB)
          │  HTTPS / WSS                     │  MedASR int8 ONNX  (~85 MB)
    ngrok public URL ───────────────────────►│
```

## What this notebook covers
| Phase | Cells | What is tested |
|-------|-------|----------------|
| **Setup** | 1–6 | GPU, repo clone, llama-cpp CUDA wheel, Python deps, model downloads |
| **Unit tests** | 7 | Full pytest suite (routes + WebSocket manager + pipeline mocks) |
| **Component smoke** | 8–10 | ASR loads + transcribes · LLM loads + responds · Pipeline state transitions |
| **Server + tunnel** | 11–13 | FastAPI binds port 8000, all REST endpoints healthy, ngrok URL printed |
| **E2E API tests** | 14 | Automated REST + WebSocket round-trip against the live server |
| **Frontend (local)** | 15 | Step-by-step Vite testing instructions |
| **Keep-alive** | 16 | Ping loop prevents Kaggle idle timeout |

## Prerequisites
| Requirement | Where |
|---|---|
| Kaggle GPU — T4 x2 **or** P100 (≥ 15 GB VRAM) | Notebook Settings → Accelerator |
| Internet access **On** | Notebook Settings → Internet |
| Free ngrok account | https://ngrok.com |
| Node.js ≥ 18 (local Windows PC) | https://nodejs.org |

## Values to fill in before running (one required, one optional)
| Variable | Cell | Value |
|---|---|---|
| `NGROK_AUTH_TOKEN` | 12 | Your ngrok auth token — https://dashboard.ngrok.com/get-started/your-authtoken |
| `HF_TOKEN` | 6 | Optional — enables Tier 1 google/medasr (6.6% WER). Leave `''` to use Tier 2 int8 (always works, no token). |


---
## Cell 1 — Verify GPU accelerator


In [None]:
import subprocess, sys

result = subprocess.run(
    ['nvidia-smi', '--query-gpu=name,memory.total,driver_version,cuda_version',
     '--format=csv,noheader'],
    capture_output=True, text=True,
)
if result.returncode != 0:
    raise RuntimeError(
        'nvidia-smi failed.  If no GPU is attached, go to Notebook Settings → '
        'Accelerator → select GPU T4 x2 or P100.\n'
        f'nvidia-smi output: {(result.stderr or result.stdout).strip()}'
    )

gpu_line = result.stdout.strip()
print('GPU detected:', gpu_line)

# Warn if less than 12 GB (MedGemma Q3 needs ~2.1 GB VRAM, MedASR ~0.2 GB)
import re
mem_match = re.search(r'(\d+)\s*MiB', gpu_line)
if mem_match is None:
    print('VRAM check: skipped (could not parse memory.total from the nvidia-smi output).')
elif int(mem_match.group(1)) < 12000:
    print('WARNING: GPU has < 12 GB VRAM — reduce n_gpu_layers in the session/start call if needed.')
else:
    print('VRAM check: OK')

# Also print CUDA runtime version
cuda_result = subprocess.run(['nvcc', '--version'], capture_output=True, text=True)
if cuda_result.returncode == 0:
    print('CUDA:', cuda_result.stdout.strip().split('\n')[-1])
else:
    print('nvcc not found — checking driver CUDA level from nvidia-smi only.')
print()
print('Cell 1 PASSED ✓')


---
## Cell 2 — Clone the OR-SIM repository

> `REPO_URL` points to the GitHub repo. If the clone already exists, the cell runs `git pull --ff-only` instead of cloning again, so it is safe to re-run.


In [None]:
import os, sys, subprocess

REPO_URL = 'https://github.com/Aditya-Lingam-9000/OR-SIM.git'
REPO_DIR = '/kaggle/working/OR-SIM'

if not os.path.isdir(os.path.join(REPO_DIR, '.git')):
    subprocess.run(['git', 'clone', '--depth', '1', REPO_URL, REPO_DIR], check=True)
    print(f'Cloned → {REPO_DIR}')
else:
    subprocess.run(['git', '-C', REPO_DIR, 'pull', '--ff-only'], check=True)
    print(f'Updated existing clone at {REPO_DIR}')

# Add repo root to sys.path — required for `import backend.*`
if REPO_DIR not in sys.path:
    sys.path.insert(0, REPO_DIR)

# Show last 5 commits as a sanity check
subprocess.run(['git', '-C', REPO_DIR, 'log', '--oneline', '-5'], check=True)

# Verify expected directories exist
for expected in ('backend', 'frontend', 'requirements', 'tests', 'models'):
    path = os.path.join(REPO_DIR, expected)
    assert os.path.isdir(path), f'Missing expected directory: {path}'
    print(f'  ✓  {expected}/')

print()
print('Cell 2 PASSED ✓')


---
## Cell 3 — Install llama-cpp-python (CUDA 12.4 prebuilt wheel)

Kaggle T4 instances ship **CUDA 12.4**.  Using the prebuilt wheel from `abetlen.github.io` takes ~30 s instead of ~5 min from source.  The `llama_supports_gpu_offload()` check confirms the CUDA extension compiled correctly.
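
The choice of wheel index can be sketched as a small lookup. `wheel_index_for_cuda` is a hypothetical helper, not part of OR-SIM. It encodes only the two cases this guide names (cu124 for CUDA 12.4 or later in the 12.x line, cu118 for CUDA 11.x) and returns `None` for anything else. The cu118 entry mirrors the hint in the cell's error message and has not been verified against the wheel index.

```python
from typing import Optional

WHEEL_BASE = 'https://abetlen.github.io/llama-cpp-python/whl'


def wheel_index_for_cuda(cuda_version: str) -> Optional[str]:
    """Map a CUDA version string such as '12.4' to a prebuilt-wheel index URL.

    Hypothetical helper: covers only the cases named in this guide.
    """
    parts = cuda_version.split('.')
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    if major == 12 and minor >= 4:
        return f'{WHEEL_BASE}/cu124'
    if major == 11:
        return f'{WHEEL_BASE}/cu118'   # assumption taken from the error hint in Cell 3
    return None   # no index named in this guide: build from source or look one up


print(wheel_index_for_cuda('12.4'))
print(wheel_index_for_cuda('11.8'))
print(wheel_index_for_cuda('12.2'))
```

Returning `None` rather than guessing keeps the caller responsible for the fallback, which is usually a source build.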


In [None]:
subprocess.run(
    [
        sys.executable, '-m', 'pip', 'install', '--quiet',
        'llama-cpp-python',
        '--extra-index-url', 'https://abetlen.github.io/llama-cpp-python/whl/cu124',
    ],
    check=True,
)

import llama_cpp
lib    = getattr(llama_cpp, 'llama_cpp', None)
gpu_ok = lib.llama_supports_gpu_offload() if lib else False
print(f'llama_supports_gpu_offload() = {gpu_ok}')
if not gpu_ok:
    raise RuntimeError(
        'GPU offload NOT compiled in.\n'
        'Check Cell 1 shows a CUDA 12.x driver — the cu124 wheel requires CUDA ≥ 12.4.\n'
        'If on P100 with CUDA 11.x, change the index URL to cu118.'
    )

from llama_cpp import Llama
print('llama-cpp-python import OK — version:', llama_cpp.__version__)
print()
print('Cell 3 PASSED ✓')


---
## Cell 4 — Install OR-SIM Python dependencies

Installs `base.txt`, `asr.txt`, `server.txt`, `dev.txt` from the repo.  
`llm.txt` is **skipped** to preserve the CUDA wheel installed in Cell 3.  
`libportaudio2` is installed first — required by `sounddevice` even though there is no microphone on Kaggle.


In [None]:
# libportaudio2 — C library required by sounddevice (imported at module level in backend/asr)
subprocess.run(['apt-get', 'install', '-y', '-q', 'libportaudio2'], check=False)

req_files = ['base.txt', 'asr.txt', 'server.txt', 'dev.txt']
for rf in req_files:
    rpath = os.path.join(REPO_DIR, 'requirements', rf)
    assert os.path.isfile(rpath), f'Missing requirements file: {rpath}'
    subprocess.run(
        [sys.executable, '-m', 'pip', 'install', '--quiet', '-r', rpath],
        check=True,
    )
    print(f'  requirements/{rf}  installed  ✓')

# huggingface_hub — used in Cells 5 and 6 for model downloads
subprocess.run(
    [sys.executable, '-m', 'pip', 'install', '--quiet', 'huggingface_hub'],
    check=True,
)
print('  huggingface_hub          installed  ✓')

# Quick import smoke-test of key packages
for pkg, import_name in [
    ('fastapi',  'fastapi'),
    ('uvicorn',  'uvicorn'),
    ('loguru',   'loguru'),
    ('pydantic', 'pydantic'),
    ('numpy',    'numpy'),
    ('torch',    'torch'),
    ('onnxruntime', 'onnxruntime'),
    ('pytest',   'pytest'),
    ('httpx',    'httpx'),
]:
    try:
        __import__(import_name)
        print(f'  import {pkg:<20} OK')
    except ImportError as e:
        raise ImportError(f'Failed to import {pkg}: {e}') from e

print()
print('Cell 4 PASSED ✓')


---
## Cell 5 — Download MedGemma GGUF

Downloads `medgemma-4b-it-Q3_K_M.gguf` (~2.1 GB) from `unsloth/medgemma-4b-it-GGUF` (public, no token needed).  
Skipped automatically if the file already exists.
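
The skip-or-download decision rests on a size check. A minimal sketch of that check follows, assuming the same 1.8 GB floor the cell uses. `gguf_looks_complete` is a hypothetical name, and the demo uses a tiny temporary file rather than a real model.

```python
import tempfile
from pathlib import Path


def gguf_looks_complete(path: Path, min_gb: float = 1.8) -> bool:
    """True when the file exists and is at least min_gb decimal gigabytes."""
    return path.is_file() and path.stat().st_size / 1e9 >= min_gb


with tempfile.TemporaryDirectory() as tmp:
    stub = Path(tmp) / 'stub.gguf'
    missing_ok = gguf_looks_complete(stub)                 # file not created yet
    stub.write_bytes(b'\0' * 1000)                         # 1000-byte stand-in
    small_ok = gguf_looks_complete(stub)                   # far below 1.8 GB
    relaxed_ok = gguf_looks_complete(stub, min_gb=1e-7)    # floor of 100 bytes

print(missing_ok, small_ok, relaxed_ok)
```

A size floor only catches truncated downloads. It does not detect corrupted bytes in a full-length file.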


In [None]:
from huggingface_hub import hf_hub_download
from pathlib import Path

GGUF_DIR = Path(REPO_DIR) / 'models' / 'medgemma'
GGUF_DIR.mkdir(parents=True, exist_ok=True)

GGUF_FILE = GGUF_DIR / 'medgemma-4b-it-Q3_K_M.gguf'

if GGUF_FILE.exists():
    size_gb = GGUF_FILE.stat().st_size / 1e9
    print(f'GGUF already present  ({size_gb:.2f} GB) — skipping download.')
    if size_gb < 1.8:
        raise RuntimeError('GGUF file looks truncated (< 1.8 GB). Delete it and re-run.')
else:
    print('Downloading medgemma-4b-it-Q3_K_M.gguf (~2.1 GB)...')
    print('Expected time: 2–5 min on Kaggle')
    MODEL_PATH = Path(hf_hub_download(
        repo_id   = 'unsloth/medgemma-4b-it-GGUF',
        filename  = 'medgemma-4b-it-Q3_K_M.gguf',
        local_dir = str(GGUF_DIR),
    ))
    GGUF_FILE = MODEL_PATH

size_gb = GGUF_FILE.stat().st_size / 1e9
print(f'Model path : {GGUF_FILE}')
print(f'File size  : {size_gb:.2f} GB')
assert size_gb > 1.8, 'GGUF file suspiciously small — re-run this cell.'

MODEL_PATH = GGUF_FILE   # used in later cells
print()
print('Cell 5 PASSED ✓')


---
## Cell 6 — Download MedASR models

**Part A (always runs)** — Tier 2 sherpa-onnx int8 ONNX model (~85 MB, public, no token).  
**Part B (optional)** — Tier 1 google/medasr PyTorch weights (~420 MB, requires HF token and accepting the model terms at https://huggingface.co/google/medasr).  

| Tier | Model | WER | Token req? |
|------|-------|-----|------------|
| 1 | google/medasr PyTorch | 6.6% | Yes |
| 2 | sherpa-onnx int8 ONNX | ~18% | No |
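
How the backend picks a tier is not shown in this notebook. The sketch below assumes the choice depends only on which files Cell 6 leaves on disk: `pytorch/config.json` for Tier 1, `model.int8.onnx` plus `tokens.txt` for Tier 2. `available_tier` is a hypothetical helper for checking the download result, not the OR-SIM loader.

```python
import tempfile
from pathlib import Path
from typing import Optional


def available_tier(asr_dir: Path) -> Optional[int]:
    """Return 1 or 2 for the best tier whose files are present, else None."""
    if (asr_dir / 'pytorch' / 'config.json').is_file():
        return 1
    if (asr_dir / 'model.int8.onnx').is_file() and (asr_dir / 'tokens.txt').is_file():
        return 2
    return None


with tempfile.TemporaryDirectory() as tmp:
    asr_dir = Path(tmp)
    tier_empty = available_tier(asr_dir)

    # Simulate Part A: the two Tier 2 files
    (asr_dir / 'model.int8.onnx').write_bytes(b'')
    (asr_dir / 'tokens.txt').write_text('')
    tier_int8 = available_tier(asr_dir)

    # Simulate Part B: the Tier 1 snapshot
    (asr_dir / 'pytorch').mkdir()
    (asr_dir / 'pytorch' / 'config.json').write_text('{}')
    tier_torch = available_tier(asr_dir)

print(tier_empty, tier_int8, tier_torch)   # prints: None 2 1
```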


In [None]:
from huggingface_hub import hf_hub_download
from pathlib import Path

ASR_DIR = Path(REPO_DIR) / 'models' / 'medasr'
ASR_DIR.mkdir(parents=True, exist_ok=True)

# ── Part A: Tier 2 baseline (always runs) ─────────────────────────────────────
INT8_REPO = 'csukuangfj/sherpa-onnx-medasr-ctc-en-int8-2025-12-25'
for fname in ('model.int8.onnx', 'tokens.txt'):
    fpath = ASR_DIR / fname
    if fpath.exists():
        print(f'  [Tier 2] {fname}  already present  ({fpath.stat().st_size / 1e6:.1f} MB) — skip')
    else:
        print(f'  [Tier 2] Downloading {fname}...')
        hf_hub_download(repo_id=INT8_REPO, filename=fname, local_dir=str(ASR_DIR))
        print(f'  [Tier 2] {fname}  downloaded  ({(ASR_DIR / fname).stat().st_size / 1e6:.1f} MB)')

assert (ASR_DIR / 'model.int8.onnx').exists(), 'int8 ONNX download failed!'
assert (ASR_DIR / 'tokens.txt').exists(),      'tokens.txt download failed!'
print()
print('[Tier 2] MedASR int8 ONNX ready ✓')

# ── Part B: Tier 1 upgrade (optional) ────────────────────────────────────────
# Set your HF token below OR add a Kaggle Secret named 'HF_TOKEN'
HF_TOKEN = ''   # ← paste: hf_xxxxxxxxxxxxxxxxxxxx

if not HF_TOKEN:
    try:
        from kaggle_secrets import UserSecretsClient
        HF_TOKEN = UserSecretsClient().get_secret('HF_TOKEN')
        print('[Tier 1] HF_TOKEN loaded from Kaggle Secret ✓')
    except Exception:
        pass

if not HF_TOKEN:
    print()
    print('[Tier 1] No HF_TOKEN — Tier 1 download skipped.  Tier 2 (int8 ONNX) will be used.')
    print('         To upgrade: accept terms at https://huggingface.co/google/medasr')
    print('         then paste your token above and re-run.')
else:
    TORCH_DIR = ASR_DIR / 'pytorch'
    if (TORCH_DIR / 'config.json').exists():
        size_mb = sum(f.stat().st_size for f in TORCH_DIR.rglob('*') if f.is_file()) / 1e6
        print(f'[Tier 1] google/medasr already downloaded ({size_mb:.0f} MB) — skipping.')
    else:
        TORCH_DIR.mkdir(parents=True, exist_ok=True)
        print('[Tier 1] Downloading google/medasr PyTorch weights (~420 MB)...')
        try:
            from huggingface_hub import snapshot_download
            snapshot_download(
                repo_id         = 'google/medasr',
                local_dir       = str(TORCH_DIR),
                token           = HF_TOKEN,
                ignore_patterns = ['*.msgpack', '*.h5', 'flax_model*', 'tf_model*', 'rust_model*'],
            )
            size_mb = sum(f.stat().st_size for f in TORCH_DIR.rglob('*') if f.is_file()) / 1e6
            print(f'[Tier 1] google/medasr downloaded ({size_mb:.0f} MB) ✓')
        except Exception as e:
            print(f'[Tier 1] Download failed ({type(e).__name__}: {e})')
            print('[Tier 1] Falling back to Tier 2 (int8 ONNX). No action needed.')

print()
print('Cell 6 PASSED ✓')


---
## Cell 7 — Run unit tests via pytest

Runs the full test suite inside Kaggle with all dependencies already installed.  
All tests use mocks — no real GPU inference, no microphone.  

Expected: **all tests PASSED** (no errors, no failures).  
The test run covers:
- `tests/phase5/test_routes.py` — all REST + WebSocket endpoints
- `tests/phase5/test_ws_manager.py` — ConnectionManager unit tests
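
The cell treats any non-zero exit code as a failure. pytest distinguishes several cases, and a short lookup makes the code easier to read when diagnosing a red run. `explain_pytest_exit` is a hypothetical helper; the code meanings are pytest's documented exit codes.

```python
PYTEST_EXIT_CODES = {
    0: 'all collected tests passed',
    1: 'some tests failed',
    2: 'run interrupted by the user',
    3: 'internal pytest error',
    4: 'pytest command-line usage error',
    5: 'no tests were collected',
}


def explain_pytest_exit(code: int) -> str:
    """Translate a pytest process exit code into a short description."""
    return PYTEST_EXIT_CODES.get(code, f'unrecognised exit code {code}')


print(explain_pytest_exit(0))
print(explain_pytest_exit(5))
```

Exit code 5 is worth singling out: it usually means pytest ran from the wrong directory or the `tests/` path was not found, not that a test is broken.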


In [None]:
import subprocess, sys, os

result = subprocess.run(
    [
        sys.executable, '-m', 'pytest',
        'tests/',
        '-v',
        '--tb=short',
        '--color=yes',
    ],
    cwd=REPO_DIR,
    capture_output=False,
    text=True,
    env={**os.environ, 'PYTHONPATH': REPO_DIR, 'PYTHONDONTWRITEBYTECODE': '1'},
)

print(f'\npytest exit code: {result.returncode}')
if result.returncode != 0:
    raise RuntimeError(
        'pytest FAILED.  Scroll up to see which tests failed and why.\n'
        'Common cause: missing import → re-run Cell 4.\n'
        'If a specific test fails, check the traceback above.'
    )

print()
print('Cell 7 PASSED ✓  — all unit tests green')


---
## Cell 8 — Smoke test: MedASR loads and transcribes

Exercises the ASR transcriber with a synthetic 1-second sine-wave audio buffer.  
Tests that the int8 ONNX model loads, runs inference, and returns a string.


In [None]:
import sys, os
from pathlib import Path
sys.path.insert(0, REPO_DIR)
os.environ.setdefault('MEDASR_MODEL_DIR', str(Path(REPO_DIR) / 'models' / 'medasr'))

import numpy as np

# Synthetic audio: 1 s of a 440 Hz sine at 16 kHz (no speech, so an empty or near-empty transcript is expected)
SAMPLE_RATE = 16_000
t = np.linspace(0, 1.0, SAMPLE_RATE, endpoint=False, dtype=np.float32)   # endpoint=False keeps the spacing at exactly 1/16000 s
audio_chunk = (0.05 * np.sin(2 * np.pi * 440 * t)).astype(np.float32)

from backend.asr.transcriber import MedASRTranscriber

print('Loading MedASR transcriber...')
asr = MedASRTranscriber()
print(f'  model tier : {getattr(asr, "tier", "unknown")}')
print(f'  model path : {getattr(asr, "model_path", "unknown")}')

print('Running inference on synthetic audio...')
result = asr.transcribe(audio_chunk)
print(f'  transcription result : {repr(result)}')
assert isinstance(result, str), f'Expected str, got {type(result)}'
print()
print('Cell 8 PASSED ✓  — MedASR loads and runs inference successfully')


---
## Cell 9 — Smoke test: MedGemma LLM loads and parses OR commands

Loads `medgemma-4b-it-Q3_K_M.gguf` onto GPU and runs a single structured inference.  
**Expected time: 20–40 s** (model load from disk + 1 forward pass).


In [None]:
import os
from pathlib import Path
os.environ['MEDASR_MODEL_DIR']   = str(Path(REPO_DIR) / 'models' / 'medasr')
os.environ['MEDGEMMA_MODEL_DIR'] = str(Path(REPO_DIR) / 'models' / 'medgemma')

from backend.llm.medgemma     import MedGemmaModel
from backend.data.surgeries   import SurgeryType

print('Loading MedGemma model onto GPU...')
print('(~20–40 s on T4 — this is normal)')
llm = MedGemmaModel(model_path=str(MODEL_PATH), n_gpu_layers=-1)
print('  model loaded ✓')

# Build a test prompt for Heart Transplant surgery
from backend.llm.prompt_builder import build_prompt

surgery     = SurgeryType.HEART_TRANSPLANT
transcripts = ['turn on the heart lung machine', 'activate the patient monitor']
machines    = ['Patient Monitor', 'Ventilator', 'Cardiopulmonary Bypass']

print()
print('Running structured inference...')
prompt = build_prompt(surgery=surgery, transcript='\n'.join(transcripts), machines=machines)
raw_output = llm.generate(prompt)

print(f'  raw output  : {repr(raw_output[:200])}')
assert isinstance(raw_output, str) and len(raw_output) > 0, 'LLM returned empty output!'

# Parse the output
from backend.llm.output_parser import parse_llm_output

parsed = parse_llm_output(raw_output, machines=machines)
print(f'  parsed      : {parsed}')
assert isinstance(parsed, dict), f'Expected dict, got {type(parsed)}'

print()
print('Cell 9 PASSED ✓  — MedGemma loads, infers, and parses machine states correctly')


---
## Cell 10 — Smoke test: Pipeline state transitions (mocked audio)

Instantiates an `ORPipeline` with mocked audio capture so no real microphone is needed.  
Simulates a voice command by directly calling the LLM → state update path.
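
The mocking pattern can be tried in isolation before touching the real pipeline. In the sketch below, `RealCapture`, `DemoPipeline`, and `audio_module` are hypothetical stand-ins for the OR-SIM classes. Only `unittest.mock.patch.object` is real API.

```python
import types
from unittest.mock import patch


class RealCapture:
    """Stand-in for a capture class that needs a microphone."""

    def start(self):
        raise RuntimeError('no microphone available')

    def read(self):
        raise RuntimeError('no microphone available')


# Stand-in for the module the pipeline looks the class up in
audio_module = types.SimpleNamespace(AudioCapture=RealCapture)


class DemoPipeline:
    def __init__(self):
        # The class is looked up at call time, so the patch takes effect here
        self.audio = audio_module.AudioCapture()
        self.audio.start()


with patch.object(audio_module, 'AudioCapture') as MockAudio:
    MockAudio.return_value.read.return_value = None   # "no audio"
    pipeline = DemoPipeline()

print(pipeline.audio.read())                      # prints: None
print(audio_module.AudioCapture is RealCapture)   # prints: True
```

The instance built inside the `with` block keeps its mock after the block exits, while the patched name itself is restored. That is why later code can keep using `pipeline` outside the context manager.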


In [None]:
from unittest.mock import patch, MagicMock
import threading, time

from backend.pipeline.pipeline  import ORPipeline
from backend.data.surgeries     import SurgeryType
from backend.data.models        import ORStateSnapshot

surgery = SurgeryType.HEART_TRANSPLANT

# Mock AudioCapture so no microphone is required
with patch('backend.pipeline.pipeline.AudioCapture') as MockAudio:
    mock_audio_instance         = MagicMock()
    mock_audio_instance.start   = MagicMock()
    mock_audio_instance.stop    = MagicMock()
    mock_audio_instance.read    = MagicMock(return_value=None)   # no audio
    MockAudio.return_value      = mock_audio_instance

    pipeline = ORPipeline(surgery=surgery, n_gpu_layers=-1)

print(f'Pipeline created for surgery: {surgery.value}')
print(f'  state_manager type: {type(pipeline.state_manager).__name__}')

# Verify initial state snapshot
snapshot = pipeline.state_manager.get_snapshot()
print(f'  initial snapshot type   : {type(snapshot).__name__}')
assert isinstance(snapshot, ORStateSnapshot), 'Expected ORStateSnapshot'
print(f'  initial surgery         : {snapshot.surgery}')
assert snapshot.surgery == surgery.value, 'Surgery name mismatch'

# Verify all machines are initially OFF
initial_on = snapshot.machine_states.get('1', [])
print(f'  initial machines ON     : {initial_on}')
assert initial_on == [], 'Machines should all be OFF at startup'

# Simulate a transcription event triggering the LLM pipeline
print()
print('Simulating voice command: "activate ventilator and patient monitor"...')
pipeline.state_manager._process_transcript(
    'activate ventilator and patient monitor'
)
time.sleep(0.5)   # allow state update to propagate

updated = pipeline.state_manager.get_snapshot()
print(f'  machines ON after command: {updated.machine_states.get("1", [])}')
print(f'  transcription recorded  : {repr(updated.transcription)}')

print()
print('Cell 10 PASSED ✓  — Pipeline state transitions verified')


---
## Cell 11 — Start the FastAPI backend server

Launches `python -m backend.server` as a subprocess (avoids asyncio event-loop conflicts between uvicorn and Jupyter's own loop).  
MedGemma loads into GPU VRAM during startup, so **allow up to 90 seconds** (the timeout this cell enforces).
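
The wait-for-bind loop can be sketched as a reusable function. `wait_for_port` is a hypothetical helper. The demo binds a throwaway listener on a free local port instead of starting the real server.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float, poll_s: float = 0.2) -> bool:
    """Poll until a TCP connection to (host, port) succeeds or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1):
                return True
        except OSError:
            time.sleep(poll_s)
    return False


# Demo: a listener on an OS-assigned free port stands in for uvicorn
listener = socket.socket()
listener.bind(('127.0.0.1', 0))
listener.listen(1)
open_port = listener.getsockname()[1]

port_ready = wait_for_port('127.0.0.1', open_port, timeout_s=2.0)
listener.close()
port_closed = wait_for_port('127.0.0.1', open_port, timeout_s=0.5)

print(port_ready, port_closed)
```

The sketch uses `time.monotonic()` for the deadline so that a system clock adjustment cannot shorten or extend the wait. A successful connect shows only that the port is bound, so a health request is still needed afterwards.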


In [None]:
import socket, time, urllib.request, json as _json, os

# Stop any previous server from a prior run of this cell
try:
    if _server_proc.poll() is None:
        _server_proc.terminate()
        _server_proc.wait(timeout=5)
        print('Stopped previous server instance.')
except NameError:
    pass

env            = os.environ.copy()
env['PYTHONPATH']            = REPO_DIR
env['MEDASR_MODEL_DIR']      = str(Path(REPO_DIR) / 'models' / 'medasr')
env['MEDGEMMA_MODEL_DIR']    = str(Path(REPO_DIR) / 'models' / 'medgemma')

_server_proc = subprocess.Popen(
    [sys.executable, '-m', 'backend.server', '--host', '0.0.0.0', '--port', '8000'],
    cwd=REPO_DIR,
    env=env,
)
print(f'Server PID: {_server_proc.pid}')
print('Waiting for FastAPI to bind port 8000 ', end='', flush=True)

deadline   = time.time() + 90    # 90 s for model load
connected  = False
while time.time() < deadline:
    if _server_proc.poll() is not None:
        raise RuntimeError(
            f'Server exited early (exit code {_server_proc.returncode}).\n'
            'Scroll up for the traceback — common causes:\n'
            '  • GGUF not found → re-run Cell 5\n'
            '  • int8 ONNX not found → re-run Cell 6\n'
            '  • import error → re-run Cell 4'
        )
    try:
        with socket.create_connection(('127.0.0.1', 8000), timeout=1):
            connected = True
            break
    except OSError:
        print('.', end='', flush=True)
        time.sleep(1)

print()
if not connected:
    raise TimeoutError('Server did not bind within 90 s — check startup logs above.')

# Health check
try:
    with urllib.request.urlopen('http://127.0.0.1:8000/api/health', timeout=10) as r:
        health = _json.loads(r.read())
    print(f'Health check: {_json.dumps(health, indent=2)}')
    assert health['status'] == 'ok', f'Unexpected health status: {health}'
except Exception as e:
    raise RuntimeError(f'Health check failed: {e}')

print()
print('Cell 11 PASSED ✓  — FastAPI server is running and healthy')


---
## Cell 12 — Install pyngrok and open the tunnel

**You must paste your ngrok auth token** (free account) before running.  
Get it from https://dashboard.ngrok.com/get-started/your-authtoken  

Or add a Kaggle Secret (recommended for shared notebooks):  
Sidebar → Add-ons → Secrets → + Add → Key: `NGROK_TOKEN` → Value: your token
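
The two ways of supplying the token can be folded into one lookup: use the pasted value if it was edited, otherwise try a secret loader. `resolve_token` is a hypothetical helper, and the loader is passed in as a callable so the sketch runs outside Kaggle.

```python
from typing import Callable, Optional

PLACEHOLDER = 'YOUR_NGROK_AUTH_TOKEN'


def resolve_token(pasted: str, load_secret: Optional[Callable[[], str]] = None) -> str:
    """Return the pasted token, else the secret, else raise ValueError."""
    if pasted and pasted != PLACEHOLDER:
        return pasted
    if load_secret is not None:
        try:
            secret = load_secret()
        except Exception:
            secret = ''   # secret missing or secrets API unavailable
        if secret:
            return secret
    raise ValueError('ngrok token not configured: paste it or add the NGROK_TOKEN secret')


# On Kaggle the loader could be:
#   lambda: UserSecretsClient().get_secret('NGROK_TOKEN')
print(resolve_token('tok_pasted'))
print(resolve_token(PLACEHOLDER, lambda: 'tok_from_secret'))
```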


In [None]:
subprocess.run(
    [sys.executable, '-m', 'pip', 'install', 'pyngrok', '--quiet'],
    check=True,
)
from pyngrok import ngrok
print('pyngrok installed ✓')

# ── CONFIGURE YOUR TOKEN HERE ──────────────────────────────────────────────
NGROK_AUTH_TOKEN = 'YOUR_NGROK_AUTH_TOKEN'   # ← paste your token

# Option B — Kaggle Secret (uncomment the three lines below):
# from kaggle_secrets import UserSecretsClient
# NGROK_AUTH_TOKEN = UserSecretsClient().get_secret('NGROK_TOKEN')
# print('NGROK_AUTH_TOKEN: loaded from Kaggle Secret')

if NGROK_AUTH_TOKEN == 'YOUR_NGROK_AUTH_TOKEN':
    raise ValueError(
        'ngrok token not configured.\n'
        'Edit this cell and paste your real token.\n'
        'Get it from: https://dashboard.ngrok.com/get-started/your-authtoken'
    )

# Register the token first: get_tunnels() starts the ngrok agent, which may need it
ngrok.set_auth_token(NGROK_AUTH_TOKEN)

# Close any stale tunnels from previous runs
for _t in ngrok.get_tunnels():
    ngrok.disconnect(_t.public_url)
tunnel      = ngrok.connect(8000, bind_tls=True)
NGROK_HTTPS = tunnel.public_url
NGROK_WSS   = NGROK_HTTPS.replace('https://', 'wss://')

print()
print('=' * 70)
print('  OR-SIM BACKEND IS LIVE')
print('=' * 70)
print(f'  HTTPS  (REST API + WebSocket) : {NGROK_HTTPS}')
print(f'  WSS    (WebSocket endpoint)   : {NGROK_WSS}/ws/state')
print('=' * 70)
print()
print('NEXT STEP: Copy the HTTPS URL and follow Cell 15 on your local machine.')
print()
print('Cell 12 PASSED ✓')


---
## Cell 13 — Verify all API endpoints through the tunnel

Confirms that `/api/health` and `/api/state` are reachable via the ngrok URL.  
The `ngrok-skip-browser-warning` header bypasses the ngrok browser-warning page.
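
A minimal sketch of attaching that header with the standard library follows. `tunnel_request` is a hypothetical helper, and the URL is the placeholder used elsewhere in this guide. No request is sent.

```python
import urllib.request


def tunnel_request(base_url: str, path: str) -> urllib.request.Request:
    """Build a GET request for the tunnel with the skip-warning header set."""
    return urllib.request.Request(
        f'{base_url}{path}',
        headers={'ngrok-skip-browser-warning': '1'},
    )


req = tunnel_request('https://abcd-1234.ngrok-free.app', '/api/health')

# urllib normalises header capitalisation, so compare on lower-cased names
sent_headers = {name.lower(): value for name, value in req.header_items()}
print(req.get_method(), req.full_url)
print(sent_headers)
```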


In [None]:
import urllib.error, urllib.request, json as _j

SKIP = {'ngrok-skip-browser-warning': '1'}

def _get(path, timeout=15):
    req = urllib.request.Request(f'{NGROK_HTTPS}{path}', headers=SKIP)
    with urllib.request.urlopen(req, timeout=timeout) as r:
        return r.status, _j.loads(r.read())

checks = [
    ('/api/health', 'Health'),
    ('/api/state',  'State (no session — expect 400)'),
]

results = []
for path, label in checks:
    try:
        status, data = _get(path)
        preview      = _j.dumps(data, indent=2)[:300]
        print(f'{label} →  HTTP {status}')
        print(preview)
        print()
        results.append((label, status, True))
    except urllib.error.HTTPError as e:
        # /api/state returns 400 when no session is active — that's expected
        if e.code == 400 and path == '/api/state':
            body = _j.loads(e.read())
            print(f'{label} →  HTTP 400 (expected — no active session)')
            print(f'  detail: {body.get("detail", "")}')
            print()
            results.append((label, 400, True))
        else:
            print(f'{label} →  UNEXPECTED HTTP {e.code}: {e.reason}')
            results.append((label, e.code, False))
    except Exception as ex:
        print(f'{label} →  ERROR: {ex}')
        results.append((label, None, False))

all_ok = all(ok for _, _, ok in results)
assert all_ok, 'One or more endpoint checks FAILED — see output above.'

print('All endpoint checks PASSED ✓')
print()
print('Cell 13 PASSED ✓')


---
## Cell 14 — End-to-end automated API tests (REST + WebSocket)

Runs a full test sequence against the live FastAPI server on port 8000:  

1. `GET /api/health` — server healthy  
2. `GET /api/state` — 400 while no session active  
3. `POST /api/session/start` with `surgery=heart` — 200, pipeline starts  
4. `GET /api/health` — `pipeline_active=True`  
5. `GET /api/state` — 200 with machine states dict  
6. `WS /ws/state` — connect, receive state snapshot JSON  
7. `POST /api/session/stop` — 200, pipeline stops  
8. `GET /api/health` — `pipeline_active=False` again  

Each step prints PASS/FAIL with the actual response so failures are easy to diagnose.
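
The REST part of this sequence can be expressed as a table of steps and dry-run against a stub before pointing it at the live server. In the sketch below, `FakeBackend` is a hypothetical in-memory stand-in that only mimics the status codes listed above. The WebSocket step (6) is left out.

```python
from typing import Callable, List, Optional, Tuple

Step = Tuple[str, str, Optional[dict], int]   # method, path, body, expected status

REST_STEPS: List[Step] = [
    ('GET',  '/api/health',        None,                 200),
    ('GET',  '/api/state',         None,                 400),
    ('POST', '/api/session/start', {'surgery': 'heart'}, 200),
    ('GET',  '/api/health',        None,                 200),
    ('GET',  '/api/state',         None,                 200),
    ('POST', '/api/session/stop',  None,                 200),
    ('GET',  '/api/health',        None,                 200),
]


class FakeBackend:
    """In-memory stub: /api/state answers 400 until a session is started."""

    def __init__(self) -> None:
        self.active = False

    def send(self, method: str, path: str, body: Optional[dict] = None) -> int:
        if path == '/api/health':
            return 200
        if path == '/api/state':
            return 200 if self.active else 400
        if path == '/api/session/start':
            self.active = True
            return 200
        if path == '/api/session/stop':
            self.active = False
            return 200
        return 404


def run_steps(send: Callable[[str, str, Optional[dict]], int],
              steps: List[Step]) -> List[Tuple[str, str, bool]]:
    """Send each step in order and record whether the status matched."""
    return [(method, path, send(method, path, body) == expected)
            for method, path, body, expected in steps]


outcomes = run_steps(FakeBackend().send, REST_STEPS)
for method, path, ok in outcomes:
    print('PASS' if ok else 'FAIL', method, path)
```

Keeping the steps as data makes the order-dependence explicit: step 2 passes only because no session exists yet, and step 5 only because step 3 ran.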


In [None]:
import urllib.error, urllib.request, json as _j, time

BASE   = 'http://127.0.0.1:8000'
SKIPHD = {'ngrok-skip-browser-warning': '1', 'Content-Type': 'application/json'}

def _req(method, path, body=None, expect_code=200):
    data_bytes = _j.dumps(body).encode() if body else None
    req = urllib.request.Request(
        f'{BASE}{path}',
        data    = data_bytes,
        headers = SKIPHD,
        method  = method,
    )
    try:
        with urllib.request.urlopen(req, timeout=30) as r:
            code   = r.status
            parsed = _j.loads(r.read())
    except urllib.error.HTTPError as e:
        code   = e.code
        parsed = _j.loads(e.read() or b'{}')

    ok  = (code == expect_code)
    tag = 'PASS' if ok else 'FAIL'
    print(f'  [{tag}]  {method} {path}  →  HTTP {code}')
    if not ok:
        print(f'         Expected {expect_code}, got {code}')
        print(f'         Response: {_j.dumps(parsed, indent=6)[:400]}')
    return code, parsed, ok

passed = 0
failed = 0

# ── Test 1: Health ──────────────────────────────────────────────────────────
code, data, ok = _req('GET', '/api/health', expect_code=200)
assert ok and data.get('status') == 'ok'
passed += 1
print(f'         pipeline_active={data.get("pipeline_active")}  surgery={data.get("surgery")}')

# ── Test 2: State without session → 400 ────────────────────────────────────
code, data, ok = _req('GET', '/api/state', expect_code=400)
assert ok and 'No active pipeline' in str(data.get('detail', ''))
passed += 1

# ── Test 3: Start session (heart transplant) ────────────────────────────────
print()
print('  Starting pipeline session for "heart" surgery...')
print('  (the server runs in its own process and loaded MedGemma at startup in Cell 11)')
code, data, ok = _req('POST', '/api/session/start', body={'surgery': 'heart'}, expect_code=200)
assert ok and data.get('status') == 'ok'
passed += 1
print(f'         response: {_j.dumps(data)}')

# ── Test 4: Health now shows pipeline active ────────────────────────────────
time.sleep(2)   # give pipeline a moment to register
code, data, ok = _req('GET', '/api/health', expect_code=200)
assert ok and data.get('pipeline_active') is True
passed += 1
print(f'         pipeline_active={data["pipeline_active"]}  surgery={data["surgery"]}')

# ── Test 5: State returns machine state dict ────────────────────────────────
code, data, ok = _req('GET', '/api/state', expect_code=200)
assert ok and data.get('status') == 'ok'
states = data.get('state', {}).get('machine_states', {})
assert isinstance(states, dict) and len(states) >= 1
passed += 1
print(f'         machine_states keys: {list(states.keys())}')

# ── Test 6: WebSocket — connect and receive a state snapshot ────────────────
print()
print('  Connecting to WebSocket /ws/state...')
try:
    import websocket as _ws_lib   # websocket-client
except ImportError:
    subprocess.run([sys.executable, '-m', 'pip', 'install', 'websocket-client', '--quiet'], check=True)
    import websocket as _ws_lib

ws_messages = []

def _on_message(ws, msg):
    ws_messages.append(_j.loads(msg))
    ws.close()

def _on_error(ws, err):
    print(f'  WebSocket error: {err}')

_wsapp = _ws_lib.WebSocketApp(
    'ws://127.0.0.1:8000/ws/state',
    on_message = _on_message,
    on_error   = _on_error,
)
import threading
_wst = threading.Thread(target=_wsapp.run_forever, daemon=True)
_wst.start()
_wst.join(timeout=10)

if ws_messages:
    msg = ws_messages[0]
    print(f'  [PASS]  WS /ws/state  →  received snapshot  surgery={msg.get("surgery")}')
    assert 'machine_states' in msg, 'Snapshot missing machine_states'
    passed += 1
else:
    print('  [FAIL]  WS /ws/state  →  no message received within 10 s')
    failed += 1

# ── Test 7: Stop session ────────────────────────────────────────────────────
print()
code, data, ok = _req('POST', '/api/session/stop', expect_code=200)
assert ok and data.get('status') == 'ok'
passed += 1

# ── Test 8: Health shows pipeline inactive ──────────────────────────────────
time.sleep(2)
code, data, ok = _req('GET', '/api/health', expect_code=200)
assert ok and data.get('pipeline_active') is False
passed += 1
print(f'         pipeline_active={data["pipeline_active"]} (should be False)')

# ── Summary ─────────────────────────────────────────────────────────────────
print()
print('=' * 50)
print(f'  E2E API Tests  — PASSED: {passed}  FAILED: {failed}')
print('=' * 50)
if failed > 0:
    raise AssertionError(f'{failed} E2E test(s) FAILED — see output above.')

print()
print('Cell 14 PASSED ✓  — all E2E API tests green')


---
## Cell 15 — Local Frontend Testing  *(execute on your Windows PC, not Kaggle)*

### Prerequisites check

Open **PowerShell** on your local machine and run:
```powershell
node --version     # must be >= 18
npm  --version     # must be >= 9
```
If missing → download from https://nodejs.org (LTS).

---

### Step 1 — Install frontend dependencies (first time only)
```powershell
cd d:\OR-SIM\frontend
npm install
```
Expected: `added NNN packages` with no errors.

---

### Step 2 — Set the backend URL in `.env.local`

Copy the HTTPS URL printed by Cell 12 above, then run:
```powershell
# Replace the URL with YOUR ngrok URL from Cell 12 output.
# ASCII encoding avoids the byte-order mark that Windows PowerShell 5.1 adds with -Encoding utf8:
'VITE_BACKEND_URL=https://abcd-1234.ngrok-free.app' |
    Out-File -Encoding ascii d:\OR-SIM\frontend\.env.local

# Verify the file was written correctly:
Get-Content d:\OR-SIM\frontend\.env.local
```
Expected output:
```
VITE_BACKEND_URL=https://abcd-1234.ngrok-free.app
```
> **No trailing slash.** Copy the exact URL from Cell 12.
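
The URL rules can be checked in any Python shell before writing the file. The sketch below assumes the frontend derives its WebSocket address by swapping the scheme and appending `/ws/state`, the path printed by Cell 12. Both function names are hypothetical.

```python
from urllib.parse import urlsplit


def check_backend_url(url: str) -> str:
    """Return url unchanged if it is an https URL with no trailing slash."""
    parts = urlsplit(url)
    if parts.scheme != 'https' or not parts.netloc:
        raise ValueError('expected an https:// URL copied from Cell 12')
    if url.endswith('/'):
        raise ValueError('remove the trailing slash')
    return url


def websocket_url(url: str, path: str = '/ws/state') -> str:
    """Derive the wss:// address for the same host (assumed frontend behaviour)."""
    return 'wss://' + urlsplit(check_backend_url(url)).netloc + path


print(websocket_url('https://abcd-1234.ngrok-free.app'))
# prints: wss://abcd-1234.ngrok-free.app/ws/state
```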

---

### Step 3 — Run a production build check (catches JSX/import errors)
```powershell
cd d:\OR-SIM\frontend
npm run build
```
Expected output:
```
✓ NNN modules transformed.
dist/index.html             x.xx kB
dist/assets/index-XXXX.js   NNN.xx kB │ gzip:  NNN.xx kB
✓ built in X.XXs
```
If the build fails, it prints the exact component and line number — fix before proceeding.

---

### Step 4 — Start the Vite dev server
```powershell
cd d:\OR-SIM\frontend
npm run dev
```
Expected output:
```
  VITE v6.x.x  ready in NNN ms

  ➜  Local:   http://localhost:5173/
  ➜  Network: use --host to expose
```

---

### Step 5 — Open the simulator and run an E2E session

1. Open **http://localhost:5173** in Chrome or Edge
2. The 3D OR room should render immediately (animated surgical lights, OR personnel, patient breathing)
3. Open browser DevTools (F12) → Console — confirm **no red errors**
4. Select a surgery from the dropdown (e.g. **Heart Transplant**)
5. Click **▶ Start Session**
   - The WebSocket status dot should turn **green** within 2 s
   - The surgery name should appear in the top bar
6. Allow microphone access when the browser asks
7. Speak a command: *"Activate the ventilator and patient monitor"*
   - The transcription bar at the bottom should display your words
   - The matching machines in the 3D room should glow and animate
8. Click **⏹ Stop Session** — machines go back to OFF state, dot turns grey

---

### Step 6 — Frontend checklist

| Test | Expected result |
|------|----------------|
| Page loads at `localhost:5173` | 3D OR room renders, no white screen |
| Console errors | **None** — DevTools console must be clean |
| Surgery dropdown | All 20 surgeries listed; selecting one updates the machines |
| Start Session | WS status dot turns green within 2 s |
| Voice command | Transcription bar updates; machines change state |
| Machine labels | All machine names clearly visible in the 3D scene |
| Surgical lights ON | Large flat LED panels illuminate, visible cone effect |
| Personnel animation | Surgeon/nurse arms move, patient chest rises/falls |
| Stop Session | All machines OFF, session ends cleanly |
| Changing surgery mid-session | Old pipeline stops, new one starts automatically |

---

### Troubleshooting

| Symptom | Cause | Fix |
|---------|-------|-----|
| White screen / React error | JSX error | Run `npm run build` to see the exact line |
| WS dot stays red | Wrong URL in `.env.local` | Re-run Step 2 with correct URL, restart `npm run dev` |
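| `ERR_NGROK_108` | Free tier allows one agent session at a time | Close other ngrok sessions at dashboard.ngrok.com |
| `ERR_NGROK_3200` | Tunnel is offline or the URL is stale | Re-run Cell 12, then update `.env.local` with the new URL |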
| Microphone permission denied | Browser blocked mic | Click the lock icon in address bar → Allow microphone |
| Transcription not updating | Audio not reaching server | Check console for WebSocket messages; verify mic works |
| `npm run build` fails | Import or syntax error | Check the error line in the build output, fix the file |
| `node_modules` missing | First run | `cd d:\OR-SIM\frontend && npm install` |
| `.env.local` not picked up | Vite caches old env | Stop `npm run dev`, update file, restart |


---
## Cell 16 — Keep-alive ping loop

Prevents Kaggle from killing the kernel after ~30 min of inactivity.  
**Stop with the Interrupt (■) button when you are done.**
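
The loop's control flow can be exercised without waiting an hour by injecting the sleep, liveness, and ping functions. `keep_alive` below is a hypothetical refactoring for illustration. The demo uses fakes and does not touch the network.

```python
import time
from typing import Callable, List


def keep_alive(ping: Callable[[], object],
               is_alive: Callable[[], bool],
               interval_s: float,
               max_pings: int,
               sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Sleep, check the server, ping; stop early if the server has died."""
    log: List[str] = []
    for i in range(1, max_pings + 1):
        sleep(interval_s)
        if not is_alive():
            log.append(f'#{i} server stopped')
            break
        log.append(f'#{i} {ping()}')
    return log


# Demo with fakes: the "server" dies on the third check, and sleep is a no-op
alive_answers = iter([True, True, False])
demo_log = keep_alive(
    ping=lambda: 200,
    is_alive=lambda: next(alive_answers),
    interval_s=60,
    max_pings=720,
    sleep=lambda seconds: None,
)
print(demo_log)          # prints: ['#1 200', '#2 200', '#3 server stopped']
print(720 * 60 / 3600)   # prints: 12.0 (hours covered by 720 pings at 60 s)
```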


In [None]:
import datetime, time, urllib.request

PING_INTERVAL = 60       # seconds between pings
MAX_PINGS     = 720      # 720 × 60 s = 12 h (max Kaggle GPU session)

SKIP_WARN = {'ngrok-skip-browser-warning': '1'}

print(f'Keep-alive started — pinging every {PING_INTERVAL}s.')
print('Stop with the Interrupt (■) button.')
print()

for _i in range(1, MAX_PINGS + 1):
    time.sleep(PING_INTERVAL)

    # Check server process is still alive
    if _server_proc.poll() is not None:
        print(f'[{datetime.datetime.now(datetime.timezone.utc).strftime("%H:%M:%S UTC")}] '
              f'Server process died (code {_server_proc.returncode}).')
        print('Re-run Cells 11 and 12 to restart the server and tunnel.')
        break

    # Ping the local health endpoint (no ngrok overhead)
    try:
        with urllib.request.urlopen('http://127.0.0.1:8000/api/health', timeout=5) as _r:
            _status = _r.status
        _health_ok = True
    except Exception:
        _status    = 'error'
        _health_ok = False

    # Also ping through ngrok to keep the tunnel alive
    try:
        _req_ngrok = urllib.request.Request(
            f'{NGROK_HTTPS}/api/health', headers=SKIP_WARN)
        with urllib.request.urlopen(_req_ngrok, timeout=8) as _r_ngrok:
            _ngrok_status = _r_ngrok.status
    except Exception:
        _ngrok_status = 'error'

    ts = datetime.datetime.now(datetime.timezone.utc).strftime('%H:%M:%S UTC')
    print(f'  [{ts}]  ping #{_i:04d}  local→{_status}  ngrok→{_ngrok_status}')
