# OR-SIM — End-to-End: Kaggle Backend + Local Frontend via ngrok

**Goal:** Run the FastAPI/MedGemma/MedASR backend on a Kaggle GPU, expose it over the internet
with ngrok, and connect the local React/Three.js frontend on your own machine.

```
Your Machine                       Kaggle Notebook (GPU)
──────────────────────            ──────────────────────────────────────
Browser → localhost:5173          │  uvicorn :8000  (FastAPI + MedGemma)
   Vite dev-server                │      ↑
        ↕  HTTPS / WSS            │  ngrok tunnel
   ngrok public URL ──────────────┘
```

## Prerequisites
| Requirement | Where to get it |
|---|---|
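| Kaggle account with GPU quota and Internet access | kaggle.com: in notebook settings, enable the T4 x2 accelerator and switch **Internet** on (both need a phone-verified account) |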
| ngrok account + auth token | dashboard.ngrok.com → *Your Authtoken* |
| OR-SIM repo cloned locally | `d:\OR-SIM` (already done) |
| Kaggle dataset with the GGUF | Upload `medgemma-4b-it-Q3_K_M.gguf` as a private dataset |

## Steps at a glance
1. Run cells 1-8 in order on Kaggle
2. Copy the **ngrok HTTPS URL** printed by cell 8
3. On your local machine: create `frontend/.env.local`, paste the URL, `npm run dev`
4. Open http://localhost:5173, pick a surgery, click **Start Session**, and speak!

---
## Cell 1 — Verify GPU

In [None]:
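import shutil, subprocess, sys

# nvidia-smi is usually absent on CPU-only sessions, so check for it before calling it
if shutil.which('nvidia-smi') is None:
    raise RuntimeError('No GPU detected: open Notebook Settings and enable a GPU accelerator.')

result = subprocess.run(['nvidia-smi'], capture_output=True, text=True)
if result.returncode != 0:
    raise RuntimeError(f'nvidia-smi failed; check the GPU accelerator setting.\n{result.stderr}')

# Print only the first 20 lines to keep output tidy
print('\n'.join(result.stdout.splitlines()[:20]))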
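
The header printed above is meant for humans. To check the GPU count and total memory programmatically (for example, to confirm that the *T4 x2* accelerator from the prerequisites is attached), `nvidia-smi` also offers a CSV query mode through its `--query-gpu` and `--format` flags. The helpers below (`parse_gpu_query`, `query_gpus`) are a hypothetical sketch, not part of OR-SIM; they return an empty list when `nvidia-smi` is missing or fails.

```python
import shutil
import subprocess

def parse_gpu_query(csv_text):
    """Turn `--format=csv,noheader,nounits` output into [(gpu_name, total_mib), ...]."""
    gpus = []
    for line in csv_text.strip().splitlines():
        name, total_mib = line.rsplit(',', 1)  # split on the last comma only
        gpus.append((name.strip(), int(total_mib)))
    return gpus

def query_gpus():
    """Query name and total memory of every visible GPU; [] if nvidia-smi is unusable."""
    if shutil.which('nvidia-smi') is None:
        return []
    proc = subprocess.run(
        ['nvidia-smi', '--query-gpu=name,memory.total', '--format=csv,noheader,nounits'],
        capture_output=True, text=True,
    )
    return parse_gpu_query(proc.stdout) if proc.returncode == 0 else []

for idx, (name, mib) in enumerate(query_gpus()):
    print(f'GPU {idx}: {name}, {mib / 1024:.1f} GiB')
```

On a T4 x2 session this should list two entries; an empty result means no GPU is visible to the kernel.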

---
## Cell 2 — Install llama-cpp-python (CUDA 12.4 prebuilt wheel)

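The cell assumes the Kaggle T4 image provides CUDA 12.4, which is what the `cu124` wheel index
targets; the `CUDA Version` field in the `nvidia-smi` header from Cell 1 shows what the driver
supports. Installing a prebuilt wheel instead of compiling from source saves roughly 20 minutes.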

In [None]:
print('Installing llama-cpp-python (CUDA 12.4 wheel)...')
subprocess.run(
    [
        sys.executable, '-m', 'pip', 'install',
        'llama-cpp-python==0.3.4',
        '--extra-index-url', 'https://abetlen.github.io/llama-cpp-python/whl/cu124',
        '--quiet',
    ],
    check=True,
)
print('Done.')
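
The index URL above hard-codes `cu124`. If the Kaggle image changes, one option is to derive the suffix from the `CUDA Version: X.Y` field of the `nvidia-smi` header. This is a hypothetical helper with two caveats: that field reports the newest CUDA version the *driver* supports, not necessarily the installed toolkit, and the index only hosts wheels for some CUDA versions. Treat the result as a starting point, not a guarantee. The header string below is an illustrative sample.

```python
import re

WHEEL_INDEX = 'https://abetlen.github.io/llama-cpp-python/whl/{tag}'

def cuda_tag_from_smi(smi_text):
    """Extract e.g. 'cu124' from the 'CUDA Version: 12.4' field; None if absent."""
    match = re.search(r'CUDA Version:\s*(\d+)\.(\d+)', smi_text)
    if match is None:
        return None
    major, minor = match.groups()
    return f'cu{major}{minor}'

def wheel_index_url(smi_text):
    """Build the extra-index URL for the detected CUDA version, or None."""
    tag = cuda_tag_from_smi(smi_text)
    return None if tag is None else WHEEL_INDEX.format(tag=tag)

# Sample header line in the format nvidia-smi prints
header = '| NVIDIA-SMI 550.90.07   Driver Version: 550.90.07   CUDA Version: 12.4 |'
print(wheel_index_url(header))  # https://abetlen.github.io/llama-cpp-python/whl/cu124
```

In the notebook, `wheel_index_url(result.stdout)` would reuse the output captured in Cell 1.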

---
## Cell 3 — Clone OR-SIM repo and install Python dependencies

> **Update `REPO_URL`** to your own GitHub repository URL before running.

In [None]:
import os

# ── ⚙  CONFIGURE: replace with your GitHub repo URL ──────────────────────────
REPO_URL   = 'https://github.com/YOUR_USERNAME/OR-SIM.git'
REPO_DIR   = '/kaggle/working/OR-SIM'
# ─────────────────────────────────────────────────────────────────────────────

if not os.path.isdir(os.path.join(REPO_DIR, '.git')):
    subprocess.run(['git', 'clone', '--depth', '1', REPO_URL, REPO_DIR], check=True)
    print(f'Cloned → {REPO_DIR}')
else:
    subprocess.run(['git', '-C', REPO_DIR, 'pull', '--ff-only'], check=True)
    print(f'Updated existing clone at {REPO_DIR}')

# Install Python deps from the repo's requirements file
req = os.path.join(REPO_DIR, 'requirements.txt')
if os.path.exists(req):
    subprocess.run([sys.executable, '-m', 'pip', 'install', '-r', req, '--quiet'], check=True)
    print('Python dependencies installed.')
else:
    print('⚠  requirements.txt not found — install deps manually if needed.')

---
## Cell 4 — Locate the MedGemma GGUF model

The backend expects the GGUF at `<repo_root>/models/medgemma/medgemma-4b-it-Q3_K_M.gguf`.
We create a symlink from your Kaggle dataset path to that location so nothing else changes.

### Option A — Kaggle Dataset (recommended)
1. Go to **kaggle.com → Datasets → New Dataset**
2. Upload `medgemma-4b-it-Q3_K_M.gguf` (2.1 GB) as a private dataset
   (name it e.g. `medgemma-4b-it-q3-k-m`)
3. Attach it to this notebook via *+ Add data*
4. The file will appear at `/kaggle/input/<dataset-slug>/medgemma-4b-it-Q3_K_M.gguf`

### Option B — Already inside the repo
Change `GGUF_DATASET_PATH` to point inside `REPO_DIR/models/medgemma/...`.

In [None]:
# ── ⚙  CONFIGURE: path where Kaggle mounted your dataset ─────────────────────
GGUF_DATASET_PATH = '/kaggle/input/medgemma-4b-it-q3-k-m/medgemma-4b-it-Q3_K_M.gguf'
# If the file is already inside the cloned repo (e.g. git-lfs), use:
#   GGUF_DATASET_PATH = os.path.join(REPO_DIR, 'models/medgemma/medgemma-4b-it-Q3_K_M.gguf')
# ─────────────────────────────────────────────────────────────────────────────

assert os.path.exists(GGUF_DATASET_PATH), (
    f'GGUF not found at {GGUF_DATASET_PATH}\n'
    'Check the dataset path or attach/upload the model file (see cell instructions).'
)
print(f'✓  GGUF found: {GGUF_DATASET_PATH} ({os.path.getsize(GGUF_DATASET_PATH)/1e9:.2f} GB)')

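# Create a symlink at the location the backend code expects
GGUF_LINK = os.path.join(REPO_DIR, 'models', 'medgemma', 'medgemma-4b-it-Q3_K_M.gguf')
os.makedirs(os.path.dirname(GGUF_LINK), exist_ok=True)

if os.path.realpath(GGUF_LINK) == os.path.realpath(GGUF_DATASET_PATH):
    # Option B, or a symlink from an earlier run that already points at the model
    print(f'✓  Model already available at: {GGUF_LINK}')
else:
    if os.path.islink(GGUF_LINK):
        os.remove(GGUF_LINK)  # drop a stale or dangling symlink left by a previous run
    os.symlink(GGUF_DATASET_PATH, GGUF_LINK)
    print(f'✓  Symlinked: {GGUF_LINK}')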
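
If the assertion fails because your dataset slug differs from `medgemma-4b-it-q3-k-m`, searching the mounted inputs is quicker than guessing. A minimal sketch, where `find_gguf` is a hypothetical helper and not part of OR-SIM, that lists every matching file under `/kaggle/input`:

```python
from pathlib import Path

def find_gguf(root='/kaggle/input', filename='medgemma-4b-it-Q3_K_M.gguf'):
    """Return every file under `root` named `filename`, sorted; [] if `root` is missing."""
    root_path = Path(root)
    if not root_path.is_dir():
        return []
    return sorted(str(p) for p in root_path.rglob(filename) if p.is_file())

matches = find_gguf()
if matches:
    print('Candidates for GGUF_DATASET_PATH:', *matches, sep='\n  ')
else:
    print('No matching GGUF under /kaggle/input; is the dataset attached?')
```

Copy one of the printed paths into `GGUF_DATASET_PATH` and re-run Cell 4.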

---
## Cell 5 — Verify MedASR ONNX model

The files `model.int8.onnx` and `tokens.txt` are committed to the repo under `models/medasr/`.
This cell confirms they are present.

In [None]:
asr_dir = os.path.join(REPO_DIR, 'models', 'medasr')
for fname in ('model.int8.onnx', 'tokens.txt'):
    fpath = os.path.join(asr_dir, fname)
    if os.path.exists(fpath):
        print(f'✓  {fname}  ({os.path.getsize(fpath)/1e6:.1f} MB)')
    else:
        print(f'✗  MISSING: {fpath}')
        print('   → make sure models/medasr/ is committed or add it to a Kaggle dataset.')
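
Cell 5 only checks that the files exist. If `models/medasr/` is tracked with Git LFS (an assumption: the notebook only says the files are committed, though Cell 4 mentions git-lfs for the GGUF) and Git LFS is not active when Cell 3 clones, the checkout holds small text pointer files instead of the models, and the sizes printed above will read about 0.0 MB. Git LFS pointer files start with `version https://git-lfs.github.com/spec/v1`, so a hypothetical `is_lfs_pointer` check can tell them apart from real binaries:

```python
LFS_SIGNATURE = b'version https://git-lfs.github.com/spec/v1'

def is_lfs_pointer(path):
    """True if `path` is a Git LFS pointer stub rather than the real file contents."""
    with open(path, 'rb') as fh:
        # Pointer files begin with the spec line; real ONNX/text files do not
        return fh.read(len(LFS_SIGNATURE)) == LFS_SIGNATURE
```

For example, `is_lfs_pointer(os.path.join(asr_dir, 'model.int8.onnx'))` after Cell 5. If it returns `True`, run `git lfs pull` inside `REPO_DIR` (this needs git-lfs on the Kaggle image) or supply the files through a Kaggle dataset.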

---
## Cell 6 — Install pyngrok and configure your auth token

1. Sign up (free) at https://ngrok.com
2. Go to https://dashboard.ngrok.com/get-started/your-authtoken
3. Copy your auth token and paste it below.

> **Security:** treat your auth token like a password and do not commit this notebook with
> the real token in it. Store it as a Kaggle Secret (*Add-ons → Secrets* in the notebook editor)
> and call `kaggle_secrets.UserSecretsClient().get_secret('NGROK_TOKEN')` instead.

In [None]:
subprocess.run([sys.executable, '-m', 'pip', 'install', 'pyngrok', '--quiet'], check=True)
print('pyngrok installed.')

from pyngrok import ngrok

# ── ⚙  CONFIGURE: paste your ngrok auth token ────────────────────────────────
# Recommended: use a Kaggle Secret instead of hard-coding
#   from kaggle_secrets import UserSecretsClient
#   NGROK_AUTH_TOKEN = UserSecretsClient().get_secret('NGROK_TOKEN')
NGROK_AUTH_TOKEN = 'YOUR_NGROK_AUTH_TOKEN'
# ─────────────────────────────────────────────────────────────────────────────

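if NGROK_AUTH_TOKEN == 'YOUR_NGROK_AUTH_TOKEN':
    raise ValueError('Set NGROK_AUTH_TOKEN (or load it from a Kaggle Secret) before running this cell.')

ngrok.set_auth_token(NGROK_AUTH_TOKEN)
print('✓  ngrok auth token configured.')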
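
The commented-out Kaggle Secret lines above can be wrapped in a small helper that also works when the notebook runs outside Kaggle. In this sketch `load_ngrok_token` and the `NGROK_AUTH_TOKEN` environment-variable fallback are assumptions, not part of OR-SIM; `UserSecretsClient().get_secret()` is the Kaggle call this cell already references, and `'NGROK_TOKEN'` must match the label you gave the secret.

```python
import os

def load_ngrok_token(secret_label='NGROK_TOKEN', env_var='NGROK_AUTH_TOKEN'):
    """Return the ngrok token from a Kaggle Secret, else from an environment variable.

    Returns None when neither source provides a token.
    """
    try:
        from kaggle_secrets import UserSecretsClient  # only importable on Kaggle
        token = UserSecretsClient().get_secret(secret_label)
        if token:
            return token
    except Exception:
        pass  # not on Kaggle, or the secret is missing / not enabled for this notebook
    return os.environ.get(env_var) or None
```

`token = load_ngrok_token()` followed by `ngrok.set_auth_token(token)` would then replace the hard-coded string.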

---
## Cell 7 — Start the FastAPI backend server

The server is launched in a daemon thread so the notebook remains interactive.
Startup takes ~10-20 s while MedGemma loads into GPU VRAM.

In [None]:
import json as _json, socket, sys, threading, time, urllib.request
import uvicorn

# Make sure the repo's Python packages are importable
if REPO_DIR not in sys.path:
    sys.path.insert(0, REPO_DIR)

def _run_server():
    from backend.server.app import create_app
    app = create_app()
    uvicorn.run(
        app,
        host='0.0.0.0',
        port=8000,
        log_level='info',
        # Disable reload: uvicorn only supports it when the app is given as an import string
        reload=False,
    )

# Re-running this cell must not start a second server that then fails to bind :8000
server_thread = next((t for t in threading.enumerate() if t.name == 'or-sim-server'), None)
if server_thread is None:
    server_thread = threading.Thread(target=_run_server, name='or-sim-server', daemon=True)
    server_thread.start()
else:
    print('Server thread already running; not starting another one.')

# Wait up to 30 s for uvicorn to bind the port; stop early if the thread has died
print('Waiting for FastAPI server to start', end='', flush=True)
deadline = time.time() + 30
while time.time() < deadline and server_thread.is_alive():
    try:
        with socket.create_connection(('127.0.0.1', 8000), timeout=1):
            break
    except OSError:
        print('.', end='', flush=True)
        time.sleep(1)
print()

# Quick health check
try:
    with urllib.request.urlopen('http://127.0.0.1:8000/api/health', timeout=5) as r:
        health = _json.loads(r.read())
    print(f'✓  Server healthy: {health}')
except Exception as e:
    print(f'✗  Health check failed: {e}')
    print('   → check the server thread output above for errors.')
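
Cell 7 gives the server a fixed 30 s to bind the port and then checks `/api/health` once. If startup sometimes exceeds that window, a polling helper with a configurable deadline saves re-running the cell. A sketch using only the standard library; `wait_for_health` is hypothetical, and `/api/health` is the endpoint Cell 7 already uses.

```python
import http.client
import json
import time
import urllib.request

def wait_for_health(url, timeout=120.0, interval=2.0, headers=None):
    """Poll `url` until it answers HTTP 2xx with a JSON body and return the decoded JSON.

    Raises TimeoutError if that has not happened within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    last_error = None
    while time.monotonic() < deadline:
        request = urllib.request.Request(url, headers=headers or {})
        try:
            with urllib.request.urlopen(request, timeout=5) as resp:
                return json.loads(resp.read())
        except (OSError, http.client.HTTPException, ValueError) as exc:
            # OSError: refused connection, socket timeout, or HTTPError (non-2xx status)
            # HTTPException: malformed response; ValueError: body is not valid JSON
            last_error = exc
            time.sleep(interval)
    raise TimeoutError(f'{url} not healthy after {timeout:.0f} s (last error: {last_error})')
```

Locally this would be `wait_for_health('http://127.0.0.1:8000/api/health', timeout=120)`; through the tunnel, pass the Cell 8 URL together with `headers={'ngrok-skip-browser-warning': '1'}`, as Cell 10 does.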

---
## Cell 8 — Open ngrok tunnel and print the public URLs

**Copy the printed HTTPS URL — you will need it for the local frontend.**

In [None]:
# Close any stale tunnels from previous runs
for t in ngrok.get_tunnels():
    ngrok.disconnect(t.public_url)

# Open a new HTTPS tunnel to port 8000
tunnel = ngrok.connect(8000, bind_tls=True)
NGROK_HTTPS = tunnel.public_url          # e.g. https://abcd-1234.ngrok-free.app
NGROK_WSS   = NGROK_HTTPS.replace('https://', 'wss://')  # WebSocket URL

print()
print('=' * 65)
print('  OR-SIM BACKEND IS LIVE')
print('=' * 65)
print(f'  HTTPS (REST API) : {NGROK_HTTPS}')
print(f'  WSS  (WebSocket) : {NGROK_WSS}/ws/state')
print('=' * 65)
print()
print('Next step → configure your LOCAL FRONTEND (see Cell 9).')
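
Cell 8 builds the WebSocket URL with a plain string replace, and the troubleshooting table warns against trailing slashes. A slightly more defensive sketch normalizes the URL with `urllib.parse` and also produces the exact line to paste into `.env.local`. `backend_urls` is a hypothetical helper; `/ws/state` is the path Cell 8 prints.

```python
from urllib.parse import urlsplit, urlunsplit

def backend_urls(public_url, ws_path='/ws/state'):
    """Derive the REST base URL, WebSocket URL and .env.local line from an ngrok URL."""
    parts = urlsplit(public_url.strip())
    if parts.scheme != 'https' or not parts.netloc:
        raise ValueError(f'expected an https:// URL, got {public_url!r}')
    path = parts.path.rstrip('/')  # drop any trailing slash
    rest = urlunsplit(('https', parts.netloc, path, '', ''))
    ws = urlunsplit(('wss', parts.netloc, path + ws_path, '', ''))
    return {'rest': rest, 'ws': ws, 'env_line': f'VITE_BACKEND_URL={rest}'}

urls = backend_urls('https://abcd-1234.ngrok-free.app/')
print(urls['env_line'])  # VITE_BACKEND_URL=https://abcd-1234.ngrok-free.app
print(urls['ws'])        # wss://abcd-1234.ngrok-free.app/ws/state
```

In the notebook, `backend_urls(NGROK_HTTPS)` yields the same values Cell 8 prints, minus any trailing slash.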

---
## Cell 9 — Connect the local frontend

Run the following steps **on your local machine** (`d:\OR-SIM\frontend`):

### Step 1 — Create `frontend/.env.local`

```
# d:\OR-SIM\frontend\.env.local
VITE_BACKEND_URL=https://abcd-1234.ngrok-free.app
```

Replace `https://abcd-1234.ngrok-free.app` with the **HTTPS URL printed in Cell 8**, without a trailing slash.

### Step 2 — Start the Vite dev server

Open a terminal and run:

```powershell
cd d:\OR-SIM\frontend
npm run dev
```

Vite prints something like:

```
VITE v6.x.x  ready in 641 ms

  ➜  Local:   http://localhost:5173/
```

### Step 3 — Open the simulator

1. Open **http://localhost:5173** in Chrome / Edge.
2. Pick a surgery (Heart Transplant / Liver Resection / Kidney PCNL).
3. Click **▶ Start Session**.
4. Allow microphone access when the browser asks.
5. Speak surgical commands; machines glow ON/OFF in the 3D room in real time.

### Stopping the session

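- Click **⏹ Stop Session** in the UI, then restart the Kaggle kernel (or stop the session) to free
  GPU VRAM. Interrupting the kernel is not enough: the server runs in a daemon thread that keeps
  running after a `KeyboardInterrupt` in the main thread.
- Close the ngrok tunnel with `ngrok.disconnect(NGROK_HTTPS)` in a new cell if needed.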

### Troubleshooting

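| Symptom | Fix |
|---|---|
| WS status dot stays red | Check `.env.local`: it must hold the HTTPS URL from Cell 8 with no trailing slash |
| `ERR_NGROK_108` | The free ngrok tier allows one agent session at a time; stop ngrok in any other notebook or terminal that uses the same token |
| `ERR_NGROK_3200` | The tunnel is offline (the Kaggle session ended or the tunnel was closed); re-run Cell 8 and update `.env.local` if the URL changed |
| `ModuleNotFoundError: No module named 'backend'` | Confirm Cell 3 cloned the repo into `REPO_DIR` (it should contain `backend/`), then re-run Cell 7 |
| GGUF not found | Verify the dataset is attached to the notebook (*+ Add data*) and that `GGUF_DATASET_PATH` matches its slug |
| Kaggle session ended | Sessions last at most ~12 h and lose installed packages and variables when they end; start a new session, re-run Cells 1-8, and update `.env.local` if the URL changed |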
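
The first row of the table can be checked mechanically. If Python is available on your local machine, a sketch like the following reports the problems it finds in `frontend/.env.local`. `check_env_local` is a hypothetical helper and only understands simple `KEY=value` lines.

```python
def check_env_local(text):
    """Return a list of human-readable problems in the text of frontend/.env.local."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith('#') or '=' not in line:
            continue  # skip blank lines, comments and anything that is not KEY=value
        key, value = line.split('=', 1)
        values[key.strip()] = value.strip().strip('"\'')  # tolerate quoted values
    url = values.get('VITE_BACKEND_URL')
    if not url:
        return ['VITE_BACKEND_URL is not set']
    problems = []
    if not url.startswith('https://'):
        problems.append('VITE_BACKEND_URL should start with https://')
    if url.endswith('/'):
        problems.append('VITE_BACKEND_URL has a trailing slash')
    return problems
```

For example, `check_env_local(open(r'd:\OR-SIM\frontend\.env.local').read())` returns an empty list when the file looks correct.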

---
## Cell 10 — Live sanity check (optional)

Confirms the tunnel is reachable from inside Kaggle and prints the health and state responses
(each truncated to 400 characters).

In [None]:
import urllib.request, json as _json

for path, label in [('/api/health', 'Health'), ('/api/state', 'State')]:
    url = f'{NGROK_HTTPS}{path}'
    try:
        req = urllib.request.Request(url, headers={'ngrok-skip-browser-warning': '1'})
        with urllib.request.urlopen(req, timeout=10) as r:
            data = _json.loads(r.read())
        print(f'✓  {label}: {_json.dumps(data, indent=2)[:400]}')
    except Exception as e:
        print(f'✗  {label} check failed: {e}')

---
## Cell 11 — Keep-alive ping (optional — run while you use the UI)

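Kaggle kills idle kernels after ~30 min. Run this cell to send a health-check ping every
60 seconds and keep the kernel alive. The cell occupies the kernel while it runs; interrupting
it stops the pings but leaves the server thread running.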

In [None]:
import datetime, time, urllib.error, urllib.request

PING_INTERVAL = 60   # seconds
MAX_PINGS     = 60   # 60 × 60 s = 1 hour; increase if needed

print(f'Keep-alive started: pinging every {PING_INTERVAL}s (interrupt the kernel to stop).')
for i in range(1, MAX_PINGS + 1):
    time.sleep(PING_INTERVAL)
    try:
        req = urllib.request.Request(
            f'{NGROK_HTTPS}/api/health',
            headers={'ngrok-skip-browser-warning': '1'},
        )
        with urllib.request.urlopen(req, timeout=5) as r:
            status = f'HTTP {r.status}'
    except urllib.error.HTTPError as e:
        status = f'HTTP {e.code}'      # the server answered with an error status
    except Exception as e:
        status = f'error ({e})'        # tunnel or network failure
    ts = datetime.datetime.now().strftime('%H:%M:%S')
    print(f'  [{ts}] ping #{i:02d} → {status}')