# OR-SIM — End-to-End: Kaggle Backend + Local Frontend via ngrok

**Environment**: Kaggle GPU T4x2 or P100 (>=15 GB VRAM)

```
Your Machine                          Kaggle Notebook (GPU)

Browser  http://localhost:5173       |  uvicorn :8000  (FastAPI + MedGemma + MedASR)
   Vite dev-server                   |       ↑
          ↕ HTTPS / WSS              |  ngrok tunnel
   ngrok public URL ─────────────────┘
```

## Steps at a glance
| # | Cell | Action |
|---|------|--------|
| 1 | GPU check | Verify T4/P100 is present |
| 2 | Clone repo | Pull OR-SIM from GitHub |
| 3 | llama-cpp | CUDA 12.4 prebuilt wheel (fast, no compile) |
| 4 | Python deps | Install requirements/base + asr + server |
| 5 | MedGemma | Download GGUF from `unsloth/medgemma-4b-it-GGUF` (~2.1 GB) |
| 6 | MedASR | Download int8 ONNX baseline (Part A, always runs); optionally download google/medasr PyTorch weights (Part B) |
| 7 | ngrok | Install pyngrok + set auth token |
| 8 | Server | Launch FastAPI backend (subprocess) |
| 9 | Tunnel | Open ngrok — **copy the HTTPS URL** |
| 10 | Frontend | Step-by-step: set .env.local and start Vite |
| 11 | Sanity | Confirm tunnel reaches /api/health and /api/state |
| 12 | Keep-alive | Ping loop — prevents Kaggle idle timeout |

## Prerequisites
| Requirement | Where |
|---|---|
| Kaggle GPU notebook (T4 x2 or P100) | Notebook Settings → Accelerator |
| Internet access ON | Notebook Settings → Internet |
| ngrok auth token | https://dashboard.ngrok.com/get-started/your-authtoken |
| OR-SIM pushed to GitHub | Already done — all phases committed |

## Only one thing required before running
| Variable | Cell | What to set |
|---|---|---|
| `NGROK_AUTH_TOKEN` | 7 | Your ngrok auth token (or use a Kaggle Secret) |

> **HF token is optional** — Cell 6 Part A downloads a public int8 ASR model (no token needed).  
> To upgrade to the higher-quality `google/medasr` (6.6% WER), also complete Cell 6 Part B.



---
## Cell 1 — Verify GPU


In [None]:
import subprocess, sys

result = subprocess.run(
    ['nvidia-smi', '--query-gpu=name,memory.total,driver_version', '--format=csv,noheader'],
    capture_output=True, text=True,
)
if result.returncode == 0:
    print('GPU:', result.stdout.strip())
else:
    print('No GPU detected — go to Notebook Settings > Accelerator and enable GPU.')
    sys.exit('GPU required')
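
Cell 1 only checks that `nvidia-smi` runs. The CSV row it prints could also be checked against the >=15 GB VRAM requirement stated at the top. The sketch below is one way to do that; the helper names are illustrative, the sample row is made up for the example, and reading "15 GB" as 15 GiB (15360 MiB) is an assumption.

```python
def parse_gpu_line(line):
    """Split one 'name, memory.total, driver_version' CSV row from nvidia-smi."""
    name, memory, driver = [field.strip() for field in line.split(',')]
    mem_mib = int(memory.split()[0])   # '15360 MiB' -> 15360
    return name, mem_mib, driver


def has_enough_vram(line, min_gib=15):
    """True when the reported memory is at least min_gib GiB (assumed threshold)."""
    _, mem_mib, _ = parse_gpu_line(line)
    return mem_mib >= min_gib * 1024


sample = 'Tesla T4, 15360 MiB, 525.xx'   # illustrative row, not captured output
print(parse_gpu_line(sample))
print(has_enough_vram(sample))
```

On a T4 x2 session `result.stdout` holds one row per GPU, so the check would be applied to each entry of `result.stdout.strip().splitlines()`.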


---
## Cell 2 — Clone OR-SIM repository

> **`REPO_URL` is pre-filled with your GitHub repo.  No changes needed.**

In [None]:
import os

REPO_URL = 'https://github.com/Aditya-Lingam-9000/OR-SIM.git'
REPO_DIR = '/kaggle/working/OR-SIM'

if not os.path.isdir(os.path.join(REPO_DIR, '.git')):
    subprocess.run(['git', 'clone', '--depth', '1', REPO_URL, REPO_DIR], check=True)
    print(f'Cloned -> {REPO_DIR}')
else:
    subprocess.run(['git', '-C', REPO_DIR, 'pull', '--ff-only'], check=True)
    print(f'Updated existing clone at {REPO_DIR}')

# Add repo root to sys.path so backend.* packages are importable in all later cells
if REPO_DIR not in sys.path:
    sys.path.insert(0, REPO_DIR)

# Show the 5 most recent commits as a sanity check
subprocess.run(['git', '-C', REPO_DIR, 'log', '--oneline', '-5'], check=True)


---
## Cell 3 — Install llama-cpp-python (CUDA 12.4 prebuilt wheel — no compilation)

Kaggle T4 instances ship CUDA 12.4.  
We use the prebuilt wheel from `abetlen.github.io` — identical to the Phase 3 approach.
Installation takes ~30 s, versus ~5 minutes when building from source.

In [None]:
subprocess.run(
    [
        sys.executable, '-m', 'pip', 'install',
        'llama-cpp-python',
        '--extra-index-url', 'https://abetlen.github.io/llama-cpp-python/whl/cu124',
        '--quiet',
    ],
    check=True,
)

# Verify GPU offload was compiled in — MUST be True for decent latency
import llama_cpp
lib    = getattr(llama_cpp, 'llama_cpp', None)
gpu_ok = lib.llama_supports_gpu_offload() if lib else False
print(f'llama_supports_gpu_offload() = {gpu_ok}')
if not gpu_ok:
    raise RuntimeError(
        'GPU offload not compiled in.  '
        'Check nvidia-smi shows CUDA 12.x — the cu124 wheel requires CUDA >= 12.4.'
    )

from llama_cpp import Llama
print('llama-cpp-python imported successfully')


---
## Cell 4 — Install OR-SIM Python dependencies

Installs from the repo's own requirements files (`base.txt`, `asr.txt`, `server.txt`).  
`requirements/llm.txt` is **intentionally skipped** — `llama-cpp-python` was already installed
from the CUDA 12.4 prebuilt wheel in Cell 3; running pip against `llm.txt` would replace it
with the slower CPU-only build from PyPI.
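
If more requirements files are added later, a small guard could confirm that a file does not name `llama-cpp-python` before it is passed to pip. This is a minimal sketch, not part of OR-SIM: the helper names are hypothetical, and it only scans the file itself (it does not follow nested `-r` includes).

```python
import re
from pathlib import Path


def normalize(name):
    """PEP 503 name normalization: lowercase, runs of '-', '_' and '.' become '-'."""
    return re.sub(r'[-_.]+', '-', name).lower()


def lists_package(req_path, package):
    """True if the requirements file has a line that names `package`."""
    target = normalize(package)
    for raw in Path(req_path).read_text().splitlines():
        line = raw.split('#', 1)[0].strip()      # drop trailing comments
        if not line or line.startswith('-'):      # skip blanks and pip options
            continue
        # The project name ends at the first version, extras or marker character.
        name = re.split(r'[\s<>=!~;\[@]', line, maxsplit=1)[0]
        if normalize(name) == target:
            return True
    return False
```

In the install loop below, a call such as `lists_package(path, 'llama-cpp-python')` could be used to skip or warn about a file instead of relying on the hard-coded exclusion of `llm.txt`.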


In [None]:
# PortAudio C library — required by sounddevice (used in requirements/asr.txt).
# sounddevice is imported at module level inside backend/asr, so it must be
# importable even though there is no real microphone on Kaggle.
subprocess.run(
    ['apt-get', 'install', '-y', '-q', 'libportaudio2'],
    check=False,   # non-fatal — package may already be present on some Kaggle images
)

# Install from the repo's requirements files.
# llm.txt is intentionally excluded — llama-cpp-python was installed with CUDA
# support in Cell 3; installing it again from PyPI would overwrite the CUDA wheel.
for req_file in ('base.txt', 'asr.txt', 'server.txt'):
    subprocess.run(
        [
            sys.executable, '-m', 'pip', 'install', '--quiet',
            '-r', os.path.join(REPO_DIR, 'requirements', req_file),
        ],
        check=True,
    )
    print(f'  requirements/{req_file} installed')

# huggingface_hub is used in Cells 5 and 6 for model downloads
subprocess.run(
    [sys.executable, '-m', 'pip', 'install', '--quiet', 'huggingface_hub'],
    check=True,
)
print('huggingface_hub installed')
print('All OR-SIM Python dependencies installed')


---
## Cell 5 — Download MedGemma GGUF from HuggingFace

Model repo: `unsloth/medgemma-4b-it-GGUF`, file: `medgemma-4b-it-Q3_K_M.gguf` (~2.1 GB)  
Downloaded directly into the cloned repo at the path the backend expects.  
**No dataset upload required** — identical to the Phase 3 approach.
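
The cell below skips the download whenever a file with the expected name exists. Should a partial or corrupted file be suspected, a cheap first check is the four-byte `GGUF` magic at the start of the file. The sketch below assumes nothing about OR-SIM itself; `looks_like_gguf` is a hypothetical helper, and it does not verify size or integrity.

```python
from pathlib import Path

GGUF_MAGIC = b'GGUF'   # GGUF files begin with these four bytes


def looks_like_gguf(path):
    """True if the file exists and starts with the GGUF magic bytes."""
    p = Path(path)
    if not p.is_file():
        return False
    with p.open('rb') as fh:
        return fh.read(4) == GGUF_MAGIC
```

For example, `looks_like_gguf(EXPECTED_GGUF)` returning `False` for an existing file would suggest deleting it and re-running the download.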

In [None]:
from huggingface_hub import hf_hub_download
from pathlib import Path

GGUF_DIR = Path(REPO_DIR) / 'models' / 'medgemma'
GGUF_DIR.mkdir(parents=True, exist_ok=True)

EXPECTED_GGUF = GGUF_DIR / 'medgemma-4b-it-Q3_K_M.gguf'

# Skip download if already present (e.g. re-running the notebook)
if EXPECTED_GGUF.exists():
    MODEL_PATH = EXPECTED_GGUF
    print(f'GGUF already present ({MODEL_PATH.stat().st_size / 1e9:.2f} GB) — skipping download.')
else:
    print('Downloading medgemma-4b-it-Q3_K_M.gguf from HuggingFace...')
    print('(~2.1 GB — takes 2-5 minutes on Kaggle)')
    MODEL_PATH = Path(hf_hub_download(
        repo_id   = 'unsloth/medgemma-4b-it-GGUF',
        filename  = 'medgemma-4b-it-Q3_K_M.gguf',
        local_dir = str(GGUF_DIR),
    ))

print(f'Model path : {MODEL_PATH}')
print(f'File size  : {MODEL_PATH.stat().st_size / 1e9:.2f} GB')


---
## Cell 6 — Download MedASR models

**Part A — Tier 2 baseline (always runs, ~3 min, no token needed)**  
Downloads `csukuangfj/sherpa-onnx-medasr-ctc-en-int8-2025-12-25` (public repo).  
This lets the server start with a working ASR model whether or not an HF token is available.

**Part B — Tier 1 upgrade (optional, ~5 min, requires HF token)**  
Downloads `google/medasr` PyTorch weights directly — **no ONNX conversion**, no export errors.  
Before running: accept terms at [huggingface.co/google/medasr](https://huggingface.co/google/medasr)  
and set `HF_TOKEN` below or add a Kaggle Secret named `HF_TOKEN`.

Active model priority at runtime:
| Tier | Model | WER | Requires |
|------|-------|-----|---------|
| 1 | google/medasr PyTorch (local) | 6.6% | Cell 6 Part B + HF token |
| 2 | sherpa-onnx int8 | ~18% | Cell 6 Part A only |
| 3 | google/medasr PyTorch (online) | 6.6% | HUGGING_FACE_HUB_TOKEN env |
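
The priority order in the table can be sketched as a simple lookup. The real selection happens inside the OR-SIM backend and is not shown in this notebook, so the function below is only a guess based on the table and on the file layout this cell creates (`models/medasr/pytorch/config.json`, `model.int8.onnx`, `tokens.txt`); `pick_asr_tier` is a hypothetical name.

```python
import os
from pathlib import Path


def pick_asr_tier(asr_dir, env=None):
    """Return 1, 2 or 3 following the table above, or None if nothing is usable."""
    env = os.environ if env is None else env
    asr_dir = Path(asr_dir)
    # Tier 1: local google/medasr PyTorch snapshot (Cell 6 Part B)
    if (asr_dir / 'pytorch' / 'config.json').exists():
        return 1
    # Tier 2: sherpa-onnx int8 model plus its token table (Cell 6 Part A)
    if (asr_dir / 'model.int8.onnx').exists() and (asr_dir / 'tokens.txt').exists():
        return 2
    # Tier 3: online google/medasr, which needs a token in the environment
    if env.get('HUGGING_FACE_HUB_TOKEN'):
        return 3
    return None
```

Calling `pick_asr_tier(ASR_DIR)` after the cell below has run gives a quick indication of which tier the files on disk would support under these assumptions.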


In [None]:
import os
from pathlib import Path
from huggingface_hub import hf_hub_download

ASR_DIR = Path(REPO_DIR) / 'models' / 'medasr'
ASR_DIR.mkdir(parents=True, exist_ok=True)

# ═══════════════════════════════════════════════════════════════════════════════
# PART A — Tier 2 baseline: sherpa-onnx int8 ONNX (public, no token needed)
# Always runs. This guarantees the backend can start even without a HF token.
# ═══════════════════════════════════════════════════════════════════════════════
INT8_REPO = 'csukuangfj/sherpa-onnx-medasr-ctc-en-int8-2025-12-25'
for fname in ('model.int8.onnx', 'tokens.txt'):
    fpath = ASR_DIR / fname
    if fpath.exists():
        print(f'  [Tier 2] {fname} already present ({fpath.stat().st_size / 1e6:.1f} MB) — skipping.')
    else:
        print(f'  [Tier 2] Downloading {fname} from {INT8_REPO} ...')
        hf_hub_download(repo_id=INT8_REPO, filename=fname, local_dir=str(ASR_DIR))
        print(f'  [Tier 2] Downloaded {fname} ({(ASR_DIR / fname).stat().st_size / 1e6:.1f} MB)')

assert (ASR_DIR / 'model.int8.onnx').exists(), 'int8 ONNX download failed!'
print()
print('Tier 2 (int8 ONNX) ready — server will start using this model.')

# ═══════════════════════════════════════════════════════════════════════════════
# PART B — Tier 1 upgrade: download google/medasr PyTorch weights (no conversion)
# Skip this block if you don't have a token — Tier 2 above is enough.
# Note: no ONNX export — the backend loads the PyTorch model directly.
#       This avoids all LasrFeatureExtractor export errors.
# ═══════════════════════════════════════════════════════════════════════════════
# Set your HuggingFace token here (or add a Kaggle Secret named HF_TOKEN):
HF_TOKEN = ''   # ← paste: hf_xxxxxxxxxxxxxxxxxxxx

# Try Kaggle Secret if empty
if not HF_TOKEN:
    try:
        from kaggle_secrets import UserSecretsClient
        HF_TOKEN = UserSecretsClient().get_secret('HF_TOKEN')
        print('HF_TOKEN loaded from Kaggle Secret')
    except Exception:
        pass

if not HF_TOKEN:
    print()
    print('No HF_TOKEN set — skipping Tier 1 (google/medasr PyTorch) download.')
    print('The server will use Tier 2 (int8 ONNX). This is fine for a demo.')
    print('To upgrade: accept terms at https://huggingface.co/google/medasr')
    print('            then set HF_TOKEN above and re-run this cell.')
else:
    TORCH_DIR = ASR_DIR / 'pytorch'

    # Check if already downloaded (config.json is always present in a valid snapshot)
    if (TORCH_DIR / 'config.json').exists():
        size_mb = sum(f.stat().st_size for f in TORCH_DIR.rglob('*') if f.is_file()) / 1e6
        print(f'[Tier 1] google/medasr already downloaded ({size_mb:.0f} MB) — skipping.')
    else:
        TORCH_DIR.mkdir(parents=True, exist_ok=True)
        print('[Tier 1] Downloading google/medasr PyTorch weights (~420 MB)...')
        print('         (no ONNX conversion — loading model directly at runtime)')
        try:
            from huggingface_hub import snapshot_download
            snapshot_download(
                repo_id   = 'google/medasr',
                local_dir = str(TORCH_DIR),
                token     = HF_TOKEN,
                ignore_patterns=['*.msgpack', '*.h5', 'flax_model*', 'tf_model*', 'rust_model*'],
            )
            size_mb = sum(f.stat().st_size for f in TORCH_DIR.rglob('*') if f.is_file()) / 1e6
            print(f'[Tier 1] google/medasr downloaded ({size_mb:.0f} MB) ✓')
            print('[Tier 1] Server will use Tier 1 (google/medasr PyTorch, 6.6% WER).')
        except Exception as e:
            print(f'[Tier 1] Download failed ({type(e).__name__}: {e})')
            print('[Tier 1] Server will fall back to Tier 2 (int8 ONNX). No action needed.')

print()
print('Cell 6 complete.')


---
## Cell 7 — Install pyngrok and configure your auth token

1. Sign up (free) at https://ngrok.com  
2. Copy your token from https://dashboard.ngrok.com/get-started/your-authtoken  
3. Either paste it directly OR store it as a Kaggle Secret (recommended):
   - *Notebook Settings → Add-ons → Secrets → Add*, key: `NGROK_TOKEN`
   - Uncomment the `UserSecretsClient` lines and remove the placeholder string
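
Cell 6 already falls back from a pasted value to a Kaggle Secret for `HF_TOKEN`. The same pattern could be applied to the ngrok token so that no lines need to be uncommented. This is a minimal sketch: `resolve_token` is a hypothetical helper, and the final fallback to an environment variable of the same name is an assumption, not something the notebook does.

```python
import os

PLACEHOLDER = 'YOUR_NGROK_AUTH_TOKEN'


def resolve_token(pasted, secret_name='NGROK_TOKEN'):
    """Return the first token found: pasted value, Kaggle Secret, then env var."""
    if pasted and pasted != PLACEHOLDER:
        return pasted
    try:
        from kaggle_secrets import UserSecretsClient   # only importable on Kaggle
        secret = UserSecretsClient().get_secret(secret_name)
        if secret:
            return secret
    except Exception:
        pass   # not on Kaggle, or the secret is not attached to this notebook
    return os.environ.get(secret_name, '')   # assumed fallback; '' when unset
```

With this helper, the cell could set `NGROK_AUTH_TOKEN = resolve_token(NGROK_AUTH_TOKEN)` and raise only when the result is empty.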

In [None]:
subprocess.run(
    [sys.executable, '-m', 'pip', 'install', 'pyngrok', '--quiet'],
    check=True,
)
from pyngrok import ngrok
print('pyngrok installed')

# ---- CONFIGURE ------------------------------------------------------
# Option A — paste token directly (do NOT commit a notebook with a real token)
NGROK_AUTH_TOKEN = 'YOUR_NGROK_AUTH_TOKEN'

# Option B — Kaggle Secret (recommended for shared notebooks)
# from kaggle_secrets import UserSecretsClient
# NGROK_AUTH_TOKEN = UserSecretsClient().get_secret('NGROK_TOKEN')
# ---------------------------------------------------------------------

if NGROK_AUTH_TOKEN == 'YOUR_NGROK_AUTH_TOKEN':
    raise ValueError(
        'Paste your real ngrok token above or use a Kaggle Secret.'
        '  Get it from https://dashboard.ngrok.com/get-started/your-authtoken'
    )

ngrok.set_auth_token(NGROK_AUTH_TOKEN)
print('ngrok auth token configured')


---
## Cell 8 — Start the FastAPI backend server

The server is launched as a **subprocess** (`python -m backend.server`).
Using a subprocess (not a thread) avoids asyncio event-loop conflicts between uvicorn
and Jupyter's own event loop — the most common cause of silent startup failures on Kaggle.

MedGemma loads into GPU VRAM during startup — allow up to **60 seconds**.
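
The cell below waits for the port to accept connections and then makes a single request to `/api/health`. If a retrying health check is preferred, it could look roughly like this. The helper name `wait_for_health` and its default timings are illustrative and not part of OR-SIM.

```python
import json
import time
import urllib.error
import urllib.request


def wait_for_health(url, timeout=60.0, interval=1.0):
    """Poll `url` until it answers HTTP 200 with a JSON body, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    last_error = None
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return json.loads(resp.read())
        except (urllib.error.URLError, OSError, ValueError) as exc:
            last_error = exc   # not ready yet (refused, HTTP error, bad JSON)
        time.sleep(interval)
    raise TimeoutError(f'{url} not healthy after {timeout:.0f} s: {last_error}')
```

A call such as `wait_for_health('http://127.0.0.1:8000/api/health')` returns the parsed JSON once the server responds, which avoids re-running the whole cell (and restarting the server) when the first request arrives too early.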

In [None]:
import socket, time, urllib.request, json as _json

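# Stop any server left from a previous run of this cell
try:
    if _server_proc.poll() is None:
        _server_proc.terminate()
        try:
            _server_proc.wait(timeout=5)
        except subprocess.TimeoutExpired:
            _server_proc.kill()   # force-stop if it ignores the terminate signal
            _server_proc.wait()
        print('Stopped previous server instance.')
except NameError:
    pass  # first run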

env = os.environ.copy()
env['PYTHONPATH'] = REPO_DIR

# python -m backend.server  is the canonical entry-point (same as local dev)
_server_proc = subprocess.Popen(
    [sys.executable, '-m', 'backend.server', '--host', '0.0.0.0', '--port', '8000'],
    cwd=REPO_DIR,
    env=env,
    # stdout/stderr flow to the Kaggle cell output so startup logs are visible
)
print(f'Server PID: {_server_proc.pid}')
print('Waiting for FastAPI to bind port 8000', end='', flush=True)

deadline   = time.time() + 60
connected  = False
while time.time() < deadline:
    if _server_proc.poll() is not None:
        raise RuntimeError(
            f'Server process exited early (exit code {_server_proc.returncode}).\n'
            'Scroll up for the traceback.'
        )
    try:
        with socket.create_connection(('127.0.0.1', 8000), timeout=1):
            connected = True
            break
    except OSError:
        print('.', end='', flush=True)
        time.sleep(1)

print()
if not connected:
    raise TimeoutError('Server did not bind within 60 s — check the logs above.')

# Quick health-check via plain urllib (no extra libraries needed)
try:
    with urllib.request.urlopen('http://127.0.0.1:8000/api/health', timeout=5) as r:
        health = _json.loads(r.read())
    print(f'Server healthy: {health}')
except Exception as e:
    print(f'Health check failed: {e}  — server may still be loading MedGemma, re-run this cell.')


---
## Cell 9 — Open ngrok tunnel and print the public URLs

**Copy the HTTPS URL printed below — you will paste it into `frontend/.env.local`.**

In [None]:
# Close any stale tunnels from previous runs
for _t in ngrok.get_tunnels():
    ngrok.disconnect(_t.public_url)

tunnel      = ngrok.connect(8000, bind_tls=True)
NGROK_HTTPS = tunnel.public_url
NGROK_WSS   = NGROK_HTTPS.replace('https://', 'wss://')

print()
print('=' * 65)
print('  OR-SIM BACKEND IS LIVE')
print('=' * 65)
print(f'  HTTPS (REST API) : {NGROK_HTTPS}')
print(f'  WSS  (WebSocket) : {NGROK_WSS}/ws/state')
print('=' * 65)
print()
print('Copy the HTTPS URL, then follow Cell 10 on your local machine.')


---
## Cell 10 — Connect the local frontend  *(run on YOUR machine, not Kaggle)*

### Step 1 — Create `frontend/.env.local`

```
# d:\OR-SIM\frontend\.env.local
VITE_BACKEND_URL=https://abcd-1234.ngrok-free.app
```

Replace `https://abcd-1234.ngrok-free.app` with the **HTTPS URL from Cell 9**. No trailing slash.
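
A stray trailing slash or a pasted WebSocket URL is easy to miss. The sketch below normalizes the URL copied from Cell 9 and produces the exact `.env.local` line; it can be run in any local Python interpreter. The helper names are hypothetical, and rejecting URLs that carry a path is an assumption about what the frontend expects.

```python
from urllib.parse import urlsplit


def backend_env_line(url):
    """Validate an ngrok HTTPS URL and return the matching .env.local line."""
    cleaned = url.strip().rstrip('/')
    parts = urlsplit(cleaned)
    if parts.scheme != 'https' or not parts.netloc:
        raise ValueError(f'expected an https:// URL, got {url!r}')
    if parts.path:
        raise ValueError(f'expected a bare origin without a path, got {url!r}')
    return f'VITE_BACKEND_URL={cleaned}'


def websocket_url(url, path='/ws/state'):
    """Derive the wss:// endpoint that Cell 9 prints next to the HTTPS URL."""
    return 'wss://' + urlsplit(url.strip().rstrip('/')).netloc + path


print(backend_env_line('https://abcd-1234.ngrok-free.app/'))
print(websocket_url('https://abcd-1234.ngrok-free.app'))
```

The first printed line can be pasted into `frontend/.env.local` unchanged.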

PowerShell one-liner (edit the URL first):
```powershell
'VITE_BACKEND_URL=https://abcd-1234.ngrok-free.app' | Out-File -Encoding utf8 d:\OR-SIM\frontend\.env.local
```

### Step 2 — Start the Vite dev server

```powershell
cd d:\OR-SIM\frontend
npm run dev
```

Expected output:
```
  VITE v6.x.x  ready in ~641 ms
  -> Local:   http://localhost:5173/
```

### Step 3 — Open the simulator

1. Open **http://localhost:5173** in Chrome or Edge.
2. Pick a surgery (Heart Transplant / Liver Resection / Kidney PCNL).
3. Click **Start Session**.
4. Allow microphone access when the browser asks.
5. Speak surgical commands and watch the machines glow ON/OFF in the 3D room in real time!

### Stopping

- Click **Stop Session** in the browser.
- Stop the Kaggle kernel (square button) to free GPU VRAM.
- The ngrok tunnel closes automatically with the kernel.

### Troubleshooting

| Symptom | Fix |
|---|---|
| WS status dot stays red | Check `.env.local` — no trailing slash, correct URL |
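| `ERR_NGROK_3200` | The tunnel is offline or the URL is stale; re-run Cell 9 and copy the new URL into `.env.local` |
| `ERR_NGROK_108` | Free tier allows 1 agent session; close other open ngrok sessions |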
| `ModuleNotFoundError: backend` | Re-run Cell 2 (sys.path), then Cell 8 |
| `sounddevice` ImportError | Re-run Cell 4 (installs libportaudio2) |
| Server exits immediately | Scroll up in Cell 8 for the Python traceback |
| GGUF not found | Check Cell 5 printed a valid file size |
| Kaggle kernel idle timeout | Re-run Cells 8 and 9 to restart server and tunnel |


---
## Cell 11 — Sanity check: confirm tunnel reaches the API


In [None]:
# ngrok's free tier can serve a browser-warning interstitial page in front of the tunnel.
# Sending this header (any value) skips it so we get the raw JSON response.
SKIP_WARN = {'ngrok-skip-browser-warning': '1'}

for path, label in [('/api/health', 'Health'), ('/api/state', 'State')]:
    url = f'{NGROK_HTTPS}{path}'
    try:
        req = urllib.request.Request(url, headers=SKIP_WARN)
        with urllib.request.urlopen(req, timeout=10) as r:
            data = _json.loads(r.read())
        preview = _json.dumps(data, indent=2)[:400]
        print(f'{label}:\n{preview}\n')
    except Exception as e:
        print(f'{label} check failed: {e}')


---
## Cell 12 — Keep-alive ping *(optional — run while using the simulator)*

Kaggle kills idle kernels after ~30 minutes.  Interrupt the cell (square button) to stop.
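
The ping count in the cell below follows from the session length and the ping interval. A small helper makes the arithmetic explicit if either value is changed; `pings_for` is an illustrative name, not part of the notebook.

```python
def pings_for(hours, interval_s):
    """Number of whole pings that fit into `hours` at one ping per interval."""
    if interval_s <= 0:
        raise ValueError('interval_s must be positive')
    return int(hours * 3600 // interval_s)


print(pings_for(12, 60))   # 720, the MAX_PINGS value used in the cell below
```

Each iteration also spends a moment on the HTTP request, so the loop runs slightly longer than the nominal duration.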

In [None]:
import datetime

PING_INTERVAL = 60     # seconds between pings
MAX_PINGS     = 720    # 720 x 60 s = 12 hours (full Kaggle session limit)

print(f'Keep-alive running — ping every {PING_INTERVAL}s.  Stop with the interrupt button.')
for _i in range(1, MAX_PINGS + 1):
    time.sleep(PING_INTERVAL)

    if _server_proc.poll() is not None:
        print(f'Server process died (code {_server_proc.returncode}).  Re-run Cell 8 + 9.')
        break

    try:
        _req = urllib.request.Request(f'{NGROK_HTTPS}/api/health', headers=SKIP_WARN)
        with urllib.request.urlopen(_req, timeout=5) as _r:
            _status = _r.status
    except Exception:
        _status = 'error'

    _ts = datetime.datetime.now(datetime.timezone.utc).strftime('%H:%M:%S UTC')
    print(f'  [{_ts}] ping #{_i:03d} -> HTTP {_status}')


---
# Complete Step-by-Step Run Instructions

## Part A — Kaggle (backend)

### Step 1 — Create a new Kaggle notebook
1. Go to https://www.kaggle.com → **New Notebook**
2. Click **File → Import Notebook** → upload `kaggle/phase6_e2e_ngrok.ipynb`  
   **OR** open the notebook directly from the cloned repo in your workspace
3. **Notebook Settings** (gear icon) → set:
   - Accelerator: **GPU T4 x2** (or P100)
   - Internet: **On**

### Step 2 — Configure ngrok token (Cell 7)
1. Sign up free at https://ngrok.com
2. Get your token from https://dashboard.ngrok.com/get-started/your-authtoken
3. In Cell 7, either:
   - Paste directly: `NGROK_AUTH_TOKEN = 'your_real_token_here'`
   - **Recommended** — Kaggle Secret:
     - Notebook sidebar → **Add-ons → Secrets → + Add**
     - Key: `NGROK_TOKEN`, Value: your token
     - Uncomment the two `UserSecretsClient` lines in Cell 7

### Step 3 — Run all cells in order (1 → 12)
| Cell | Expected output | Approx time |
|------|----------------|-------------|
| 1 | `GPU: Tesla T4, 15360 MiB, 525.xx` | instant |
| 2 | `Cloned -> /kaggle/working/OR-SIM` | ~5 s |
| 3 | `llama_supports_gpu_offload() = True` | ~30 s |
| 4 | `All OR-SIM Python dependencies installed` | ~60 s |
| 5 | `File size  : 2.1x GB` | ~3-5 min |
| 6 | `Cell 6 complete.` | ~3 min (Part A only); ~8 min with Part B |
| 7 | `ngrok auth token configured` | instant |
| 8 | `Server healthy: {'status': 'ok', ...}` | ~30-60 s |
| **9** | **`HTTPS (REST API) : https://xxxx.ngrok-free.app`** | instant |
| 11 | `Health: {...}` and `State: {...}` | ~5 s |
| 12 | `ping #001 -> HTTP 200` (keep running) | ongoing |

### Step 4 — Copy the ngrok URL
When Cell 9 prints:
```
=================================================================
  OR-SIM BACKEND IS LIVE
=================================================================
  HTTPS (REST API) : https://abcd-1234.ngrok-free.app
  WSS  (WebSocket) : wss://abcd-1234.ngrok-free.app/ws/state
=================================================================
```
**Copy the HTTPS URL** (e.g. `https://abcd-1234.ngrok-free.app`).

---

## Part B — Local Machine (frontend)

### Step 5 — Create `frontend/.env.local`

Open a new PowerShell window and run (replace the URL with yours):

```powershell
'VITE_BACKEND_URL=https://abcd-1234.ngrok-free.app' |
    Out-File -Encoding utf8 d:\OR-SIM\frontend\.env.local
```

Or open `d:\OR-SIM\frontend\.env.local` in any text editor and type:
```
VITE_BACKEND_URL=https://abcd-1234.ngrok-free.app
```
> **No trailing slash.** Copy the exact URL from Cell 9 output.

### Step 6 — Start the Vite dev server

```powershell
cd d:\OR-SIM\frontend
npm run dev
```

Expected output:
```
  VITE v6.x.x  ready in 641 ms

  ➜  Local:   http://localhost:5173/
  ➜  Network: use --host to expose
```

### Step 7 — Open the simulator

1. Open **http://localhost:5173** in Chrome or Edge
2. Select a surgery:  `Heart Transplant` / `Liver Resection` / `Kidney PCNL`
3. Click **▶ Start Session** — the WS status dot should turn **green**
4. Allow microphone access when the browser asks
5. Speak surgical commands — machines glow ON in the 3D room in real time!

---

## Part C — Updating the ngrok URL (each new Kaggle session)

Every time you restart the Kaggle kernel, ngrok generates a **new URL**.

```powershell
# On your local machine — update the URL and restart Vite:
'VITE_BACKEND_URL=https://NEW-URL.ngrok-free.app' |
    Out-File -Encoding utf8 d:\OR-SIM\frontend\.env.local

# Ctrl+C to stop the current Vite session, then:
cd d:\OR-SIM\frontend
npm run dev
```

You do **not** need to rebuild (`npm run build`): Vite reads `.env.local` each time the dev server starts.

---

## Part D — Stopping cleanly

1. Click **⏹ Stop Session** in the browser UI
2. Interrupt Cell 12 (the ping loop) with the **square stop button** in Kaggle
3. Kill the Kaggle kernel to free GPU VRAM (≈15 GB)
4. The ngrok tunnel closes automatically when the kernel dies

---

## Troubleshooting

| Symptom | Cause | Fix |
|---------|-------|-----|
| WS dot stays red/yellow | Wrong URL in `.env.local` | Check no trailing slash; re-copy from Cell 9 |
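| `ERR_NGROK_3200` | Tunnel is offline or the URL is stale | Re-run Cell 9 and copy the new URL into `.env.local` |
| `ERR_NGROK_108` | Free tier: only 1 agent session allowed | Close other ngrok sessions; check dashboard |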
| `GPU offload not compiled in` | Wrong CUDA version wheel | Re-run Cell 3; check that the driver printed by Cell 1 supports CUDA 12.x (`525.xx` or newer) |
| `ModuleNotFoundError: backend` | sys.path not set | Re-run Cell 2 (sets sys.path), then Cell 8 |
| Cell 8 exits early | MedGemma not loaded | Check Cell 5 shows 2.1 GB; re-run Cell 8 |
| `sounddevice` error in Cell 4 | libportaudio2 missing | Cell 4 installs it; if it fails run: `!apt-get install -y libportaudio2` manually |
| Session start fails in browser | Server can't find model | Check Cell 5 and 6 both printed success |
| Kaggle timeout after ~30 min | Idle kernel | Cell 12 prevents this; if it stopped, re-run Cells 8 + 9 + 12 |
