# MedGemma Clinical Note Assistant - Google Colab Setup

This notebook sets up and runs the FastAPI backend on Google Colab with GPU support.

## Prerequisites
- Google Colab Pro (recommended) for GPU access
- Hugging Face token (if the model requires authentication; gated models such as MedGemma typically do)

## Step 0: Enable GPU Runtime

**Go to: Runtime → Change runtime type → GPU (T4 or better)**

Make sure to select GPU before running the cells below!

## Step 1: Install Dependencies

Install all required packages including FastAPI, PyTorch with CUDA, and transformers.

In [None]:
# Install dependencies
%pip install -q fastapi uvicorn[standard] pydantic pydantic-settings transformers torch accelerate pyngrok requests

# Verify CUDA is available
import torch
print(f"✅ CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"✅ GPU: {torch.cuda.get_device_name(0)}")
    print(f"✅ CUDA Version: {torch.version.cuda}")
else:
    print("⚠️  CUDA not available. Make sure you selected GPU runtime!")

## Step 2: Get the Project Code

**Option A: Clone from GitHub (Recommended)**
- If you've pushed your code to GitHub, clone it here
- This is the easiest and most reliable method

**Option B: Upload Files Manually**
- Use Colab's file uploader (left sidebar → 📁) to upload your `app/` directory

After cloning/uploading, your directory structure should include:
```
app/
  ├── main.py
  ├── api/
  ├── core/
  ├── services/
  ├── schemas/
  ├── templates/
  └── utils/
```
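
The layout above can also be checked programmatically. The following is a minimal sketch, not part of the project: the helper name `missing_entries` and the `EXPECTED_ENTRIES` list are illustrative, with the entries copied from the tree above.

```python
from pathlib import Path

# Entries copied from the directory tree above
EXPECTED_ENTRIES = ["main.py", "api", "core", "services", "schemas", "templates", "utils"]


def missing_entries(app_dir, expected=EXPECTED_ENTRIES):
    """Return the expected files/directories that are absent from app_dir."""
    root = Path(app_dir)
    return [name for name in expected if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_entries("app")
    if missing:
        print(f"Missing from app/: {missing}")
    else:
        print("app/ layout matches the expected structure")
```

An empty result means every listed entry exists; it does not check the contents of the subdirectories.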

In [None]:
# Option A: Clone from GitHub (Recommended)
# Replace with your actual GitHub repository URL
# !git clone https://github.com/yourusername/offline-clinical-note-assistant-backend.git
# %cd offline-clinical-note-assistant-backend

# Uncomment and run the lines above with your GitHub URL, for example:
# !git clone https://github.com/YOUR_USERNAME/YOUR_REPO_NAME.git
# %cd YOUR_REPO_NAME

# Option B: If not using GitHub, upload files manually via Colab's file uploader
# Use the files tab (📁) in the left sidebar to upload your app/ directory

# Verify app directory exists
import os
if os.path.exists('app'):
    print("✅ app/ directory found")
    print(f"📁 Contents: {os.listdir('app')}")
    if 'main.py' in os.listdir('app'):
        print("✅ main.py found - project structure looks good!")
    else:
        print("⚠️  main.py not found - check your directory structure")
else:
    print("⚠️  app/ directory not found.")
    print("   Please either:")
    print("   1. Clone from GitHub (uncomment the git clone lines above)")
    print("   2. Upload files manually via Colab's file uploader")

## Step 3: Configure Environment

Set environment variables for the application. With `DEVICE=auto`, the application uses CUDA automatically when a GPU is available in the Colab runtime.

In [None]:
import os

# Configuration - these will override .env file if present
# IMPORTANT: Set ENV explicitly to avoid conflicts with system ENV variable
os.environ['ENV'] = 'dev'  # Must be 'dev' or 'prod' - prevents validation errors
os.environ['DEVICE'] = 'auto'  # Will auto-detect CUDA in Colab
os.environ['MODEL_ID'] = 'google/medgemma-4b-it'  # or 'google/medgemma-1.5-4b-it'
os.environ['DEMO_MODE'] = 'false'  # Set to 'true' for demo mode (no model needed)
os.environ['MAX_NEW_TOKENS'] = '800'
os.environ['TEMPERATURE'] = '0.2'
os.environ['TOP_P'] = '0.9'

# Optional: Set HuggingFace token if model requires authentication
# os.environ['HUGGINGFACE_HUB_TOKEN'] = 'your_token_here'

print("✅ Environment configured")
print(f"   ENV: {os.environ.get('ENV')} (must be 'dev' or 'prod')")
print(f"   DEVICE: {os.environ.get('DEVICE')}")
print(f"   MODEL_ID: {os.environ.get('MODEL_ID')}")
print(f"   DEMO_MODE: {os.environ.get('DEMO_MODE')}")
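
Because these values are stored as strings, a typo only surfaces once the application parses them. The sketch below shows one way to sanity-check them up front. It is an assumption-laden illustration: the `GenerationSettings` class, the `load_generation_settings` helper, and the numeric bounds are hypothetical, and the application's own validation may differ. Only the `ENV` rule ('dev' or 'prod') comes from the cell above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class GenerationSettings:
    env: str
    max_new_tokens: int
    temperature: float
    top_p: float


def load_generation_settings(environ=os.environ):
    """Parse and sanity-check the variables set in the configuration cell."""
    env = environ.get("ENV", "dev")
    if env not in ("dev", "prod"):
        raise ValueError(f"ENV must be 'dev' or 'prod', got {env!r}")

    # int()/float() raise ValueError on malformed strings such as '0,2'
    max_new_tokens = int(environ.get("MAX_NEW_TOKENS", "800"))
    temperature = float(environ.get("TEMPERATURE", "0.2"))
    top_p = float(environ.get("TOP_P", "0.9"))

    # Assumed bounds for this sketch, not taken from the application
    if max_new_tokens <= 0:
        raise ValueError("MAX_NEW_TOKENS must be a positive integer")
    if temperature < 0:
        raise ValueError("TEMPERATURE must be non-negative")
    if not 0 < top_p <= 1:
        raise ValueError("TOP_P must be in the interval (0, 1]")

    return GenerationSettings(env, max_new_tokens, temperature, top_p)
```

Calling `load_generation_settings()` right after the configuration cell raises a `ValueError` naming the offending variable, before the server is started.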

## Step 4: Authenticate ngrok (Recommended)

An ngrok auth token gives you more reliable tunnels. Older ngrok versions allowed unauthenticated tunnels with limitations, but recent versions may refuse to start a tunnel without a token, so authenticating is strongly recommended.

### Get ngrok Auth Token

1. **Sign up for free**: Go to https://dashboard.ngrok.com/signup
2. **Get your token**: After signing up, go to https://dashboard.ngrok.com/get-started/your-authtoken
3. **Copy the token** (looks like: `2abc123def456ghi789jkl_1a2B3c4D5e6F7g8H9i0J`)

### Authenticate in Colab

Run the cell below with your auth token. If you skip this, the tunnel in Step 5 may be limited or may fail to start.

In [None]:
# ========================================
# INSTRUCTIONS: Paste Your ngrok Token Here
# ========================================
#
# After getting your token from https://dashboard.ngrok.com/get-started/your-authtoken:
# 1. Find the line below that says: # !ngrok config add-authtoken YOUR_NGROK_AUTH_TOKEN
# 2. Remove the # at the start (uncomment it)
# 3. Replace YOUR_NGROK_AUTH_TOKEN with your actual token (no quotes needed)
# 4. It should look like: !ngrok config add-authtoken 2abc123def456ghi789jkl_1a2B3c4D5e6F7g8H9i0J
# 5. Run this cell
#
# Example (after you paste your token):
# !ngrok config add-authtoken 2abc123def456ghi789jkl_1a2B3c4D5e6F7g8H9i0J
#
# ========================================

# 👇 PASTE YOUR TOKEN HERE 👇
# Uncomment the line below and replace YOUR_NGROK_AUTH_TOKEN with your actual token:
# !ngrok config add-authtoken YOUR_NGROK_AUTH_TOKEN

# Option: Skip authentication (may be limited, or rejected by recent ngrok versions)
# If you skip authentication, sessions may time out sooner or the tunnel may fail to start
# Just leave the line above commented and run this cell

print("ℹ️  ngrok authentication status:")
try:
    from pyngrok import ngrok
    # This only confirms that pyngrok is installed; it does not verify the token
    print("   pyngrok is installed")
    print("   ⚠️  To authenticate: Uncomment the line above, paste your token, and re-run this cell")
except ImportError:
    print("   pyngrok is not installed - re-run the install cell in Step 1")
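
As an alternative to pasting the token into the cell, it can be read from an environment variable and registered through pyngrok's `ngrok.set_auth_token`. This is a sketch only: the helper names are hypothetical, and using `NGROK_AUTHTOKEN` as the variable name is an assumption of this example rather than something the notebook sets up.

```python
import os


def read_ngrok_token(environ=os.environ, var_name="NGROK_AUTHTOKEN"):
    """Return the token from the environment, or None if it is unset or blank."""
    token = environ.get(var_name, "").strip()
    return token or None


def authenticate_ngrok(token):
    """Register the token with pyngrok (imported lazily, so it is only needed here)."""
    from pyngrok import ngrok

    ngrok.set_auth_token(token)


if __name__ == "__main__":
    token = read_ngrok_token()
    if token:
        authenticate_ngrok(token)
        print("ngrok token registered")
    else:
        print("NGROK_AUTHTOKEN is not set; skipping ngrok authentication")
```

Keeping the token out of the cell source means it is not saved with the notebook if you share it.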

## Step 5: Start FastAPI Server with ngrok

This will start the FastAPI server and create a public URL using ngrok. The server will run in the background.

In [None]:
from pyngrok import ngrok
import uvicorn
import threading
import time
import requests

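# Start ngrok tunnel
# In current pyngrok versions, ngrok.connect() returns an NgrokTunnel object;
# use its .public_url attribute to get the URL as a plain string
public_url = ngrok.connect(8000).public_url
print(f"🌐 Public API URL: {public_url}")
print(f"📚 API Docs: {public_url}/api/v1/docs")
print(f"🔍 Health Check: {public_url}/api/v1/health")
print()
print("⏳ Starting server... (this may take 30-60 seconds on first run)")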

# Start FastAPI server in background thread
def run_server():
    try:
        uvicorn.run(
            "app.main:app",
            host="0.0.0.0",
            port=8000,
            log_level="info"
        )
    except Exception as e:
        print(f"❌ Server error: {e}")

# Start server thread
server_thread = threading.Thread(target=run_server, daemon=True)
server_thread.start()

# Wait for server to start and model to load
print("Waiting for server to initialize...")
time.sleep(10)  # Initial wait

# Poll the health endpoint until it responds with 200 or retries run out
max_retries = 12
for i in range(max_retries):
    try:
        response = requests.get(f"{public_url}/api/v1/health", timeout=5)
        if response.status_code == 200:
            print("\n✅ Server is ready!")
            print(f"Response: {response.json()}")
            break
    except requests.exceptions.RequestException:
        pass  # Server not reachable yet
    # Also wait after a non-200 response, not only after a connection error
    print(f"Waiting... ({i+1}/{max_retries})")
    time.sleep(5)
else:
    # Runs only if the loop finished without a successful health check
    print("\n⚠️  Server may still be starting. Check manually with health endpoint.")
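
The polling pattern used for the health check can be factored into a small reusable helper. This is a minimal sketch with a hypothetical name, `wait_until_ready`: the probe is passed in as a callable, so the same loop works for any readiness condition.

```python
import time


def wait_until_ready(probe, retries=12, delay=5.0, sleep=time.sleep):
    """Call probe() until it returns True; return the attempt number, or None on timeout.

    probe should return True when the server is healthy. Any exception it
    raises (for example a connection error) is treated as "not ready yet".
    """
    for attempt in range(1, retries + 1):
        try:
            if probe():
                return attempt
        except Exception:
            pass  # Not reachable yet; fall through to the wait
        if attempt < retries:
            sleep(delay)
    return None


# Hypothetical usage with the names from the server cell:
# ready = wait_until_ready(
#     lambda: requests.get(f"{public_url}/api/v1/health", timeout=5).status_code == 200
# )
```

Injecting `sleep` keeps the helper easy to exercise without real delays.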

## Step 6: Test the API

Test the health endpoint and generate a clinical note.

In [None]:
import requests
import json

# Health check (with a timeout so the cell cannot hang indefinitely)
try:
    response = requests.get(f"{public_url}/api/v1/health", timeout=10)
    print("📊 Health Check:")
    print(json.dumps(response.json(), indent=2))
except Exception as e:
    print(f"❌ Error: {e}")

In [None]:
# Generate a SOAP note
url = f"{public_url}/api/v1/generate/"
payload = {
    "task": "SOAP",
    "notes": "Patient presents with chest pain. 45-year-old male with history of hypertension. Blood pressure 140/90, heart rate regular at 72 bpm.",
    "options": {
        "maxTokens": 800,
        "temperature": 0.2,
        "topP": 0.9
    }
}

print("🚀 Generating SOAP note...")
print(f"   Notes: {payload['notes'][:60]}...")
print()

try:
    response = requests.post(url, json=payload, timeout=120)
    if response.status_code == 200:
        result = response.json()
        print("✅ Generation successful!")
        print(f"\n📝 Generated Note ({result['task']}):")
        print(f"\n{result['output']}")
        print(f"\n📊 Metadata:")
        print(f"   Model: {result['model']}")
        print(f"   Latency: {result['latencyMs']:.2f} ms")
        print(f"   Request ID: {result.get('requestId', 'N/A')}")
    else:
        print(f"❌ Error {response.status_code}:")
        print(json.dumps(response.json(), indent=2))
except requests.exceptions.Timeout:
    print("⏱️  Request timed out. Model may still be loading or generation is slow.")
except Exception as e:
    print(f"❌ Error: {e}")

## Important Notes

### Session Management
- **Free Colab**: Sessions time out after ~12 hours
- **Colab Pro**: Up to 24 hours (with idle timeout)
- Keep cells running; if the session restarts, re-run the setup cells and start a new ngrok tunnel

### GPU Access
- Colab Pro provides T4 GPU (sometimes A100)
- `DEVICE=auto` will automatically use CUDA when GPU is available
- First request is slower (model loading ~30-60 seconds)
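
One way `DEVICE=auto` could be resolved is sketched below. This is an assumption about the behavior, not the application's actual code: the function name `resolve_device` and the set of accepted values are hypothetical.

```python
def resolve_device(setting, cuda_available):
    """Map a DEVICE setting to a concrete device name ('cuda' or 'cpu')."""
    choice = setting.strip().lower()
    if choice == "auto":
        # Prefer the GPU when one is visible, otherwise fall back to CPU
        return "cuda" if cuda_available else "cpu"
    if choice == "cuda" and not cuda_available:
        raise RuntimeError("DEVICE=cuda was requested but no GPU is visible")
    if choice in ("cuda", "cpu"):
        return choice
    raise ValueError(f"Unsupported DEVICE value: {setting!r}")


# Hypothetical usage in Colab:
# import os, torch
# device = resolve_device(os.environ["DEVICE"], torch.cuda.is_available())
```

Passing the CUDA check in as an argument keeps the function free of a `torch` dependency.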

### ngrok Authentication
- **Get free token**: Sign up at https://dashboard.ngrok.com/signup
- **Authenticate**: Use your token in Step 4 (cell above) before starting server
- **Why authenticate**: Longer-lived tunnels, better stability, fewer unexpected disconnects
- **Without auth**: May have shorter timeouts, or may not work at all with recent ngrok versions

### ngrok URL
- Free tier: URL changes on restart
- Paid tier: Can use fixed domain
- Save your public URL if you need to use it elsewhere

### Using the API from Outside Colab

Your API is now accessible via the public URL. Example curl command:
```bash
curl -X POST "YOUR_NGROK_URL/api/v1/generate/" \
  -H "Content-Type: application/json" \
  -d '{"task": "SOAP", "notes": "Your clinical notes here"}'
```
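
The same request can be made from Python with only the standard library. This is a sketch under the same assumptions as the curl command (endpoint path and payload fields as shown above); the helper names `build_generate_request` and `generate_note` are illustrative.

```python
import json
import urllib.request


def build_generate_request(base_url, notes, task="SOAP"):
    """Build (but do not send) a POST request for the /api/v1/generate/ endpoint."""
    body = json.dumps({"task": task, "notes": notes}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/generate/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate_note(base_url, notes, task="SOAP", timeout=120):
    """Send the request and return the decoded JSON response."""
    request = build_generate_request(base_url, notes, task)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical usage; replace YOUR_NGROK_URL with the public URL from Step 5:
# result = generate_note("YOUR_NGROK_URL", "Your clinical notes here")
# print(result["output"])
```

Separating request construction from sending makes it possible to inspect the URL and payload before any network call is made.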