# MedGemma Clinical Note Assistant - Google Colab Setup

This notebook sets up and runs the FastAPI backend on Google Colab with GPU support.

## Prerequisites
- Google Colab Pro (recommended) for GPU access
- HuggingFace token (if model requires authentication)

## Step 0: Enable GPU Runtime

**Go to: Runtime → Change runtime type → GPU (T4 or better)**

Make sure to select GPU before running the cells below!

## Step 1: Install Dependencies

Install all required packages including FastAPI, PyTorch with CUDA, and transformers.

In [None]:
# Install dependencies
%pip install -q fastapi uvicorn[standard] pydantic pydantic-settings transformers torch accelerate pyngrok requests

# Verify CUDA is available
import torch
print(f"✅ CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"✅ GPU: {torch.cuda.get_device_name(0)}")
    print(f"✅ CUDA Version: {torch.version.cuda}")
else:
    print("⚠️  CUDA not available. Make sure you selected GPU runtime!")

## Step 2: Get the Project Code

**Option A: Clone from GitHub (Recommended)**
- If you've pushed your code to GitHub, clone it here
- This is the easiest and most reliable method

**Option B: Upload Files Manually**
- Use Colab's file uploader (left sidebar → 📁) to upload your `app/` directory

After cloning/uploading, your directory structure should include:
```
app/
  ├── main.py
  ├── api/
  ├── core/
  ├── services/
  ├── schemas/
  ├── templates/
  └── utils/
```
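
The layout above can be checked programmatically before starting the server. The following is a minimal sketch using only the standard library; the helper name `missing_entries` is hypothetical and not part of the project, and the expected names are copied from the tree above.

```python
from pathlib import Path

# Entries copied from the directory tree above.
EXPECTED_ENTRIES = ["main.py", "api", "core", "services", "schemas", "templates", "utils"]


def missing_entries(app_dir, expected=EXPECTED_ENTRIES):
    """Return the expected names that are absent from app_dir (hypothetical helper)."""
    root = Path(app_dir)
    return [name for name in expected if not (root / name).exists()]
```

Calling `missing_entries("app")` from the repository root returns an empty list when the structure is complete, and otherwise lists what still needs to be cloned or uploaded.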

In [None]:
# Option A: Clone from GitHub (Recommended)
# Replace with your actual GitHub repository URL and uncomment below:
# REPO_URL = "https://github.com/YOUR_USERNAME/YOUR_REPO_NAME.git"
# REPO_DIR = "YOUR_REPO_NAME"  # Directory name after cloning

# Example (uncomment and customize):
REPO_URL = "https://github.com/Aregawi-Teame/offline-clinical-note-assistant-backend.git"
REPO_DIR = "offline-clinical-note-assistant-backend"

import os
import shutil
import subprocess

# Remember where the notebook started so that re-running this cell
# (after a previous run changed into the repo) does not clone a nested copy.
BASE_DIR = globals().get('BASE_DIR', os.getcwd())
os.chdir(BASE_DIR)
repo_path = os.path.join(BASE_DIR, REPO_DIR)


def run_git(*args, cwd=None):
    """Run a git command and return the completed process."""
    return subprocess.run(['git', *args], cwd=cwd, capture_output=True, text=True)


def report_failure(message, result):
    """Print a failure message followed by any git output."""
    print(message)
    if result.stdout.strip():
        print(f"   Output: {result.stdout}")
    if result.stderr.strip():
        print(f"   Error: {result.stderr}")


def clone_repo():
    """Clone REPO_URL into repo_path and report the outcome."""
    print(f"📥 Cloning repository from {REPO_URL}...")
    result = run_git('clone', REPO_URL, repo_path)
    if result.returncode == 0:
        print("✅ Successfully cloned repository")
    else:
        report_failure("❌ Failed to clone repository", result)
        print("   Make sure the repository URL is correct and publicly accessible")


if os.path.isdir(os.path.join(repo_path, '.git')):
    # Existing clone: pull the latest changes
    print(f"📁 Repository '{REPO_DIR}' already exists. Pulling latest changes...")
    result = run_git('pull', cwd=repo_path)
    if result.returncode == 0:
        print("✅ Successfully pulled latest changes")
        print(result.stdout.strip() or "   (Already up to date)")
    else:
        report_failure("⚠️  Could not pull changes", result)
else:
    if os.path.exists(repo_path):
        # Something else occupies the path: replace it with a fresh clone
        print(f"⚠️  '{REPO_DIR}' exists but is not a git repository")
        print("   Removing it and cloning fresh...")
        shutil.rmtree(repo_path)
    clone_repo()

# Work from inside the repository so that `app/` is importable
if os.path.isdir(repo_path):
    os.chdir(repo_path)
    print(f"📂 Current directory: {os.getcwd()}")

# Option B: If not using GitHub, upload files manually via Colab's file uploader
# Use the files tab (📁) in the left sidebar to upload your app/ directory

# Verify app directory exists
print("\n🔍 Verifying project structure...")
if os.path.exists('app'):
    print("✅ app/ directory found")
    app_contents = os.listdir('app')
    print(f"📁 Contents: {app_contents}")
    if 'main.py' in app_contents:
        print("✅ main.py found - project structure looks good!")
    else:
        print("⚠️  main.py not found - check your directory structure")
else:
    print("⚠️  app/ directory not found.")
    print("   Please either:")
    print("   1. Uncomment and customize REPO_URL above, then re-run this cell")
    print("   2. Upload files manually via Colab's file uploader")

## Step 3: Configure Environment

Set environment variables for the application. Colab will automatically use CUDA when `DEVICE=auto`.
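
To illustrate what `DEVICE=auto` is expected to mean, here is a minimal sketch of device resolution. The function `resolve_device` is a hypothetical helper written for this example; the application's own Settings logic is not shown in this notebook and may differ.

```python
def resolve_device(setting, cuda_available):
    """Map a DEVICE setting to a concrete device name.

    Hypothetical helper for illustration; the real application may differ.
    """
    setting = setting.strip().lower()
    if setting == "auto":
        # Prefer the GPU when one is present, otherwise fall back to CPU
        return "cuda" if cuda_available else "cpu"
    if setting == "cuda" and not cuda_available:
        raise RuntimeError("DEVICE=cuda requested but no GPU is available")
    return setting


# In Colab you would pass torch.cuda.is_available() as the second argument.
print(resolve_device("auto", cuda_available=False))  # prints "cpu"
```

Under this assumption, an explicit `DEVICE=cuda` fails fast on a CPU-only runtime, while `auto` degrades to CPU.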

### 🔑 HuggingFace Token (May Be Required)

Some models require authentication. If you see errors about "not a valid model identifier", you may need a HuggingFace token:

1. **Get your token**: Go to https://huggingface.co/settings/tokens
2. **Create a token** (with "read" permissions)
3. **Set it in the next cell** by uncommenting the `HUGGINGFACE_HUB_TOKEN` line

**Note**: Not all models require authentication. Try without a token first, and add it if you get authentication errors.
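
When checking whether a token is configured, it helps to avoid echoing the full value into the notebook output. The sketch below looks up the two environment variable names this notebook uses (`HUGGINGFACE_HUB_TOKEN`, then `HF_TOKEN`) and masks the result; both function names are hypothetical helpers, not part of the project or of any HuggingFace library.

```python
import os


def find_hf_token(env=os.environ):
    """Return the first non-empty HuggingFace token found in env, or None."""
    for name in ("HUGGINGFACE_HUB_TOKEN", "HF_TOKEN"):
        value = env.get(name, "").strip()
        if value:
            return value
    return None


def mask_token(token, visible=4):
    """Hide all but the last few characters so the token is safe to print."""
    if len(token) <= visible:
        return "*" * len(token)
    return "*" * (len(token) - visible) + token[-visible:]
```

For example, `mask_token(find_hf_token() or "")` can be printed to confirm which token is active without exposing it in a shared notebook.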

In [None]:
import os

# Configuration - these will override .env file if present
# IMPORTANT: Set these BEFORE importing app modules (Settings reads env vars at import time)
os.environ['ENV'] = 'dev'  # Must be 'dev' or 'prod' - prevents validation errors
os.environ['DEVICE'] = 'auto'  # Will auto-detect CUDA in Colab
os.environ['MODEL_ID'] = 'google/medgemma-1.5-4b-it'  # or 'google/medgemma-4b-it'
os.environ['DEMO_MODE'] = 'false'  # Set to 'true' for demo mode (no model needed)
os.environ['MAX_NEW_TOKENS'] = '800'
os.environ['TEMPERATURE'] = '0.2'
os.environ['TOP_P'] = '0.9'

# IMPORTANT: HuggingFace Token (Required for some models)
# If you get "not a valid model identifier" errors, the model may require authentication
# 
# Get your token:
# 1. Go to: https://huggingface.co/settings/tokens
# 2. Create a token with "read" permissions
# 3. Uncomment the line below and paste your token
#
# os.environ['HUGGINGFACE_HUB_TOKEN'] = 'hf_your_token_here'
# 
# Alternative: Login using HuggingFace CLI
# !huggingface-cli login --token YOUR_TOKEN

# Check if token is set
hf_token = os.environ.get('HUGGINGFACE_HUB_TOKEN') or os.environ.get('HF_TOKEN')
if hf_token:
    print("✅ HuggingFace token is set (will be used for model authentication)")
else:
    print("ℹ️  HuggingFace token not set (may be required for some models)")

print("✅ Environment configured")
print(f"   ENV: {os.environ.get('ENV')} (must be 'dev' or 'prod')")
print(f"   DEVICE: {os.environ.get('DEVICE')}")
print(f"   MODEL_ID: {os.environ.get('MODEL_ID')} ⚠️  Make sure this matches what you want!")
print(f"   DEMO_MODE: {os.environ.get('DEMO_MODE')}")

# Verify the environment variable is set
model_id_set = os.environ.get('MODEL_ID', 'NOT SET')
print(f"\n📋 MODEL_ID is set to: {model_id_set}")
if model_id_set == 'NOT SET':
    print("⚠️  WARNING: MODEL_ID not set!")
else:
    print(f"✅ MODEL_ID set to: {model_id_set}")
    print("   If the model fails to load, confirm the ID at: https://huggingface.co/models?search=medgemma")

print("\n📌 IMPORTANT: Run this cell BEFORE starting the server (Step 5)")
print("   The server must be restarted if you change MODEL_ID after it's running.")

## Step 4: Authenticate ngrok (Recommended)

ngrok tunnels are most reliable with an authenticated account, and recent ngrok agent versions may refuse to start a tunnel without an authtoken. Authenticate before starting the server.

### Get ngrok Auth Token

1. **Sign up for free**: Go to https://dashboard.ngrok.com/signup
2. **Get your token**: After signing up, go to https://dashboard.ngrok.com/get-started/your-authtoken
3. **Copy the token** (looks like: `2abc123def456ghi789jkl_1a2B3c4D5e6F7g8H9i0J`)

### Authenticate in Colab

Run the cell below with your auth token. If you skip this, the tunnel may be limited or may fail to start, depending on your ngrok version.

In [None]:
# ========================================
# INSTRUCTIONS: Paste Your ngrok Token Here
# ========================================
#
# After getting your token from https://dashboard.ngrok.com/get-started/your-authtoken:
# 1. Find the line below that says: # !ngrok config add-authtoken YOUR_NGROK_AUTH_TOKEN
# 2. Remove the # at the start (uncomment it)
# 3. Replace YOUR_NGROK_AUTH_TOKEN with your actual token (no quotes needed)
# 4. It should look like: !ngrok config add-authtoken 2abc123def456ghi789jkl_1a2B3c4D5e6F
# 5. Run this cell
#
# Example (after you paste your token):
# !ngrok config add-authtoken 2abc123def456ghi789jkl_1a2B3c4D5e6F7g8H9i0J
#
# ========================================

# 👇 PASTE YOUR TOKEN HERE 👇
# Uncomment the line below and replace YOUR_NGROK_AUTH_TOKEN with your actual token:
# !ngrok config add-authtoken YOUR_NGROK_AUTH_TOKEN

# Option: Skip authentication
# Depending on your ngrok version, an unauthenticated tunnel may time out sooner
# or fail to start. To skip, leave the line above commented and run this cell.

print("ℹ️  ngrok authentication status:")
try:
    from pyngrok import ngrok  # Import check only; this does not verify the token
    print("   pyngrok is installed")
    print("   ⚠️  To authenticate: Uncomment the line above, paste your token, and re-run this cell")
except ImportError:
    print("   pyngrok is not installed. Re-run the 'Install Dependencies' cell first.")

## Step 5: Start FastAPI Server with ngrok

This will start the FastAPI server and create a public URL using ngrok. The server will run in the background.
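
Because the server starts in a background thread, the notebook has to poll until it responds. The sketch below shows one way to structure that wait using only the standard library. The names `wait_until_ready` and `local_health_ok` are hypothetical, and polling `localhost:8000` directly (rather than the public ngrok URL) is an assumption made here to keep the check independent of the tunnel.

```python
import time
import urllib.request


def wait_until_ready(check, retries=12, delay=5.0, sleep=time.sleep):
    """Call check() until it returns True or retries run out.

    Any exception raised by check() is treated as "not ready yet".
    Returns the 1-based attempt number that succeeded, or None.
    """
    for attempt in range(1, retries + 1):
        try:
            if check():
                return attempt
        except Exception:
            pass  # Server not reachable yet
        if attempt < retries:
            sleep(delay)
    return None


def local_health_ok(url="http://localhost:8000/api/v1/health"):
    """Return True when the local health endpoint answers with HTTP 200."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.status == 200
```

A call such as `wait_until_ready(local_health_ok)` sleeps between every failed attempt, including attempts that return a non-200 status, so the loop cannot spin through all retries instantly.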

In [None]:
from pyngrok import ngrok
import uvicorn
import threading
import time
import requests
import os

# IMPORTANT: Verify MODEL_ID is set correctly BEFORE starting server
# The Settings class reads environment variables at import time (when server starts)
# So make sure Step 3 (Configure Environment) was run FIRST
print("🔍 Verifying configuration before starting server...")
model_id_env = os.environ.get('MODEL_ID', 'NOT SET')
print(f"   Environment MODEL_ID: {model_id_env}")

if model_id_env == 'NOT SET':
    print("⚠️  WARNING: MODEL_ID not set! Run Step 3 (Configure Environment) first!")
else:
    print(f"✅ MODEL_ID is set to: {model_id_env}")
    print("   (If you get model loading errors, verify this model exists on HuggingFace:")
    print("    https://huggingface.co/models?search=medgemma)")

print()

# Clear any existing ngrok tunnels (free tier has 5 tunnel limit)
print("🧹 Checking for existing ngrok tunnels...")
try:
    # Get all active tunnels using ngrok API
    tunnels = ngrok.get_tunnels()
    if tunnels:
        print(f"   Found {len(tunnels)} existing tunnel(s), closing them...")
        for tunnel_info in tunnels:
            try:
                ngrok.disconnect(tunnel_info.public_url)
                print(f"   ✅ Closed: {tunnel_info.public_url}")
            except Exception as e:
                print(f"   ⚠️  Could not close {tunnel_info.public_url}: {e}")
    else:
        print("   ✅ No existing tunnels found")
except Exception as e:
    # If we can't check, continue anyway - might fail if ngrok isn't running yet
    print(f"   ℹ️  Could not check existing tunnels: {e}")
    print("   (This is OK if this is the first tunnel)")

# Start ngrok tunnel
print("\n🚀 Starting new ngrok tunnel...")
try:
    tunnel = ngrok.connect(8000)
except Exception as e:
    if "ERR_NGROK_324" in str(e) or "endpoints" in str(e).lower():
        print("❌ Error: Too many ngrok tunnels running (free tier limit: 5)")
        print("   Solution: Go to https://dashboard.ngrok.com/status/tunnels")
        print("   and manually close unnecessary tunnels, then re-run this cell")
    raise
# Extract the public URL string from the tunnel object
# The tunnel object string looks like: NgrokTunnel: "https://xxx.ngrok-free.dev" -> "http://localhost:8000"
if hasattr(tunnel, 'public_url'):
    public_url = tunnel.public_url
elif hasattr(tunnel, 'data') and 'public_url' in tunnel.data:
    public_url = tunnel.data['public_url']
else:
    # Extract URL from string representation
    tunnel_str = str(tunnel)
    import re
    url_match = re.search(r'"(https://[^"]+)"', tunnel_str)
    if url_match:
        public_url = url_match.group(1)
    else:
        public_url = tunnel_str  # Fallback

print(f"🌐 Public API URL: {public_url}")
print(f"📚 API Docs: {public_url}/api/v1/docs")
print(f"🔍 Health Check: {public_url}/api/v1/health")
print()
print("⏳ Starting server... (this may take 30-60 seconds on first run)")

# Start FastAPI server in background thread
def run_server():
    try:
        uvicorn.run(
            "app.main:app",
            host="0.0.0.0",
            port=8000,
            log_level="info"
        )
    except Exception as e:
        print(f"❌ Server error: {e}")

# Start server thread
server_thread = threading.Thread(target=run_server, daemon=True)
server_thread.start()

# Wait for server to start and model to load
print("Waiting for server to initialize...")
time.sleep(10)  # Initial wait

# Poll the health endpoint until the server responds or retries run out
max_retries = 12
health_url = f"{public_url}/api/v1/health"
for i in range(max_retries):
    try:
        response = requests.get(health_url, timeout=10)
        if response.status_code == 200:
            print("\n✅ Server is ready!")
            print(f"Response: {response.json()}")
            break
    except requests.exceptions.RequestException:
        pass  # Server not reachable yet; retry below
    # Also reached on non-200 responses, so the loop never spins without waiting
    print(f"Waiting... ({i+1}/{max_retries}) - Server may still be loading model")
    time.sleep(5)
else:
    print(f"\n⚠️  Health check did not succeed after {max_retries} attempts")
    print(f"   Try accessing manually: {public_url}/api/v1/health")
    print("   The server may still be loading the model")

## Step 6: Test the API

Test the health endpoint and generate a clinical note.

In [None]:
import requests
import json

# Health check
# Note: public_url should be set from the previous cell
# If you get an error, make sure you ran the "Start FastAPI Server" cell first
try:
    health_url = f"{public_url}/api/v1/health"
    response = requests.get(health_url, timeout=10)
    print("📊 Health Check:")
    print(json.dumps(response.json(), indent=2))
except NameError:
    print("❌ Error: public_url not found. Please run the 'Start FastAPI Server' cell first.")
except Exception as e:
    print(f"❌ Error: {e}")
    print(f"   Try accessing: {public_url}/api/v1/health manually")

In [None]:
# Generate a SOAP note
url = f"{public_url}/api/v1/generate/"
payload = {
    "task": "SOAP",
    "notes": "Patient presents with chest pain. 45-year-old male with history of hypertension. Blood pressure 140/90, heart rate regular at 72 bpm.",
    "options": {
        "maxTokens": 800,
        "temperature": 0.2,
        "topP": 0.9
    }
}

print("🚀 Generating SOAP note...")
print(f"   Notes: {payload['notes'][:60]}...")
print()

try:
    response = requests.post(url, json=payload, timeout=120)
    if response.status_code == 200:
        result = response.json()
        print("✅ Generation successful!")
        print(f"\n📝 Generated Note ({result['task']}):")
        print(f"\n{result['output']}")
        print("\n📊 Metadata:")
        print(f"   Model: {result['model']}")
        print(f"   Latency: {result['latencyMs']:.2f} ms")
        print(f"   Request ID: {result.get('requestId', 'N/A')}")
    else:
        print(f"❌ Error {response.status_code}:")
        print(json.dumps(response.json(), indent=2))
except requests.exceptions.Timeout:
    print("⏱️  Request timed out. Model may still be loading or generation is slow.")
except Exception as e:
    print(f"❌ Error: {e}")

## Troubleshooting: ngrok Tunnel Limits

If you see an error about too many tunnels (ERR_NGROK_324):

**Option 1: Kill existing tunnels (Run this cell)**
Run the cell below to automatically close all existing tunnels.

**Option 2: Manual cleanup**
Go to https://dashboard.ngrok.com/status/tunnels and manually close unnecessary tunnels.

**Option 3: Restart Colab runtime**
Runtime → Restart runtime (this clears all tunnels)

In [None]:
# Kill all existing ngrok tunnels
# Run this cell if you get "too many tunnels" error (ERR_NGROK_324)

from pyngrok import ngrok

print("🔍 Checking for existing ngrok tunnels...")
try:
    # Get all active tunnels
    tunnels = ngrok.get_tunnels()
    
    if tunnels:
        print(f"Found {len(tunnels)} active tunnel(s):")
        for i, tunnel_info in enumerate(tunnels, 1):
            addr = tunnel_info.config.get('addr', 'unknown') if hasattr(tunnel_info, 'config') else 'unknown'
            print(f"  {i}. {tunnel_info.public_url} -> {addr}")
        
        print("\n🔪 Closing all tunnels...")
        closed_count = 0
        for tunnel_info in tunnels:
            try:
                ngrok.disconnect(tunnel_info.public_url)
                print(f"   ✅ Closed: {tunnel_info.public_url}")
                closed_count += 1
            except Exception as e:
                print(f"   ⚠️  Could not close {tunnel_info.public_url}: {e}")

        if closed_count > 0:
            print(f"\n✅ Closed {closed_count} tunnel(s). You can now re-run the 'Start FastAPI Server' cell.")
        else:
            print("\n⚠️  Could not close any tunnels. Try manual cleanup or restart runtime.")
    else:
        print("✅ No tunnels to close.")

except Exception as e:
    print(f"⚠️  Error checking tunnels: {e}")
    print("\n💡 Alternative solutions:")
    print("   1. Go to: https://dashboard.ngrok.com/status/tunnels")
    print("      Manually close unnecessary tunnels")
    print("   2. Runtime → Restart runtime (this clears all tunnels)")
    print("      Then re-run all cells from the beginning")

## Important Notes

### Session Management
- **Free Colab**: Sessions time out after ~12 hours
- **Colab Pro**: Up to 24 hours (with idle timeout)
- Keep cells running or reconnect ngrok if session restarts

### GPU Access
- Colab Pro provides T4 GPU (sometimes A100)
- `DEVICE=auto` will automatically use CUDA when GPU is available
- First request is slower (model loading ~30-60 seconds)

### ngrok Authentication
- **Get free token**: Sign up at https://dashboard.ngrok.com/signup
- **Authenticate**: Use your token in Step 4 before starting the server
- **Why authenticate**: Longer-lived tunnels and better stability
- **Without auth**: Tunnels may time out sooner or, depending on your ngrok version, fail to start

### ngrok URL
- Free tier: URL changes on restart
- Paid tier: Can use fixed domain
- Save your public URL if you need to use it elsewhere

### Using the API from Outside Colab

Your API is now accessible via the public URL. Example curl command:
```bash
curl -X POST "YOUR_NGROK_URL/api/v1/generate/" \
  -H "Content-Type: application/json" \
  -d '{"task": "SOAP", "notes": "Your clinical notes here"}'
```
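
The same request can be made from Python without third-party packages. This is a minimal sketch using `urllib` from the standard library; the helper names `build_generate_request` and `send` are hypothetical, and the payload mirrors the curl example above.

```python
import json
import urllib.request


def build_generate_request(base_url, task, notes):
    """Build (but do not send) a POST request for the generate endpoint."""
    url = base_url.rstrip("/") + "/api/v1/generate/"
    body = json.dumps({"task": task, "notes": notes}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=120):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

To use it, replace `YOUR_NGROK_URL` with the public URL printed in Step 5: `send(build_generate_request("YOUR_NGROK_URL", "SOAP", "Your clinical notes here"))`. The 120-second timeout matches the test cell in Step 6.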