# Real-Time Semantic Segmentation - Google Colab Backend Deployment

This notebook deploys the **backend server** on Google Colab with GPU acceleration.

## Split Architecture

```
┌─────────────────────┐         ┌──────────────────────┐
│  YOUR LOCAL         │         │  GOOGLE COLAB        │
│  MACHINE            │         │  (Free GPU)          │
├─────────────────────┤         ├──────────────────────┤
│                     │         │                      │
│  Frontend Server    │◄────────┤  Backend Server      │
│  (Port 8080)        │  ngrok  │  (Port 8000)         │
│                     │WebSocket│                      │
│  - HTML/CSS/JS      │         │  - FastAPI           │
│  - Webcam capture   │         │  - PyTorch Models    │
│  - UI controls      │         │  - GPU Inference     │
└─────────────────────┘         └──────────────────────┘
```

## Setup Instructions

1. **Enable GPU**: Go to `Runtime` → `Change runtime type` → Select `GPU`
2. **Get ngrok token**: Sign up at https://ngrok.com and get your auth token
3. **Run cells 1-6**: Execute each cell sequentially
4. **Copy ngrok URL**: From Cell 5 output (you'll paste this in your local frontend)
5. **Start local frontend**: On your machine, run `./start_frontend.sh`
6. **Connect**: Open http://localhost:8080 and paste the ngrok URL

## Note
- Colab sessions time out after about 12 hours of total runtime or about 90 minutes of inactivity
- GPU allocation is not guaranteed (T4/P100/V100 depending on availability)
- Frontend runs locally for better performance and easier development

## 1. Check GPU Availability

In [None]:
!nvidia-smi
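
If `nvidia-smi` is missing or exits with an error, the runtime most likely has no GPU attached. The same check can be sketched from Python using only the standard library; the helper name `gpu_summary` is illustrative and not part of the repository.

```python
import shutil
import subprocess


def gpu_summary():
    """Return nvidia-smi's GPU listing, or None if no GPU tooling is usable."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return None  # no NVIDIA tools on PATH: probably a CPU-only runtime
    try:
        # `nvidia-smi -L` prints one line per visible GPU
        result = subprocess.run(
            [exe, "-L"], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None
    return result.stdout.strip() or None


summary = gpu_summary()
print(summary if summary else "No GPU detected. Check Runtime > Change runtime type.")
```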

## 2. Clone Repository

In [None]:
# Clone your repository (replace with your repo URL)
!git clone https://github.com/yourusername/RealTimeSeg.git
%cd RealTimeSeg

## 3. Install Dependencies

In [None]:
# Install PyTorch with CUDA support
!pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118

# Install other requirements
!pip install -r backend/requirements.txt

# Install ngrok for tunneling
!pip install pyngrok

# Install nest_asyncio for Colab event loop compatibility
!pip install nest_asyncio

## 4. Download and Cache Models (Optional)

Pre-download models to speed up startup. Models will be cached for the session.

In [None]:
import torch
import torchvision.models.segmentation as models
from transformers import SegformerForSemanticSegmentation

print("Downloading models...")

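# Download DeepLabV3 models
# `pretrained=True` is deprecated in torchvision >= 0.13; `weights="DEFAULT"`
# selects the same default pretrained weights.
print("1/3 Downloading DeepLabV3-MobileNetV3...")
_ = models.deeplabv3_mobilenet_v3_large(weights="DEFAULT")

print("2/3 Downloading DeepLabV3-ResNet50...")
_ = models.deeplabv3_resnet50(weights="DEFAULT")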

print("3/3 Downloading SegFormer-B3...")
_ = SegformerForSemanticSegmentation.from_pretrained(
    "nvidia/segformer-b3-finetuned-ade-512-512"
)

print("✓ All models downloaded and cached")
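
To confirm that the downloads actually landed in the session cache, a small sketch like the following can report how much disk space each cache uses. The two paths are the usual default locations for torchvision and Hugging Face downloads and are an assumption here; both can be overridden through environment variables, so adjust them if your setup differs.

```python
import os
from pathlib import Path


def dir_size_mb(path):
    """Total size of regular files under `path` in MiB (0.0 if it does not exist)."""
    root = Path(path).expanduser()
    if not root.is_dir():
        return 0.0
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            file_path = Path(dirpath) / name
            # Skip symlinks so files linked from several places are counted once
            if file_path.is_file() and not file_path.is_symlink():
                total += file_path.stat().st_size
    return total / (1024 * 1024)


# Assumed default cache locations; change these if you relocated the caches
CACHE_DIRS = {
    "torchvision weights": "~/.cache/torch/hub/checkpoints",
    "Hugging Face models": "~/.cache/huggingface/hub",
}

for label, cache_dir in CACHE_DIRS.items():
    print(f"{label}: {dir_size_mb(cache_dir):.1f} MiB in {cache_dir}")
```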

## 5. Setup ngrok Tunnel

**Important**: 
1. Replace `YOUR_NGROK_TOKEN` with your actual token from https://ngrok.com
2. **COPY THE URL** shown below - you'll paste it in your local frontend!

In [None]:
from pyngrok import ngrok, conf
import os

# Set your ngrok auth token here
NGROK_TOKEN = "YOUR_NGROK_TOKEN"  # Replace with your token from https://ngrok.com

# Configure ngrok
conf.get_default().auth_token = NGROK_TOKEN

# Create tunnel (ngrok.connect returns a tunnel object; the URL is in .public_url)
tunnel = ngrok.connect(8000)
public_url = tunnel.public_url
print(f"\n{'='*70}")
print(f"🌐 PUBLIC URL: {public_url}")
print(f"{'='*70}\n")
print(f"📋 COPY THIS URL - You'll paste it in your local frontend!")
print(f"\nNext steps:")
print(f"1. Copy the URL above")
print(f"2. On your local machine, run: ./start_frontend.sh")
print(f"3. Open: http://localhost:8080")
print(f"4. Paste this URL in the 'Backend Server URL' field")
print(f"5. Click 'Connect'")
print(f"\n⚠️ Keep this URL handy - Cell 6 will keep running!")
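
Hardcoding the token in a notebook makes it easy to leak when the notebook is shared. One possible alternative, sketched below, reads the token from an environment variable and refuses to continue while the placeholder is still in place. The helper `resolve_ngrok_token` and the variable name `NGROK_AUTHTOKEN` are assumptions for illustration, not part of the repository.

```python
import os

PLACEHOLDER = "YOUR_NGROK_TOKEN"


def resolve_ngrok_token(env=None, var_name="NGROK_AUTHTOKEN"):
    """Return the token stored in `env[var_name]`, or raise if it is unset."""
    env = os.environ if env is None else env
    token = (env.get(var_name) or "").strip()
    if not token or token == PLACEHOLDER:
        raise ValueError(
            f"Set {var_name} to your ngrok auth token before creating the tunnel."
        )
    return token


# Demonstration with an explicit mapping instead of the real environment
print(resolve_ngrok_token({"NGROK_AUTHTOKEN": "example-token"}))
```

In Cell 5 this could replace the literal assignment, for example `NGROK_TOKEN = resolve_ngrok_token()` after setting the variable with `getpass.getpass()`.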

## 6. Start Backend Server

This will start the FastAPI backend server. The cell will keep running - don't stop it!

**What you should see:**
- ✓ Backend directory: ...
- ✓ Model loader created
- ✓ Default model loaded
- ✓ Frame processor created
- ✅ Server initialized successfully
- INFO: Uvicorn running on http://0.0.0.0:8000

**If you see errors:** Check the troubleshooting section below.

In [None]:
import sys
import os
import asyncio
import nest_asyncio

# Allow nested event loops (required for Colab/Jupyter)
nest_asyncio.apply()

# Configure paths
backend_dir = '/content/RealTimeSeg/backend'
sys.path.insert(0, backend_dir)
os.chdir(backend_dir)

# Set CUDA device
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

print(f"✓ Backend directory: {backend_dir}")
print(f"✓ Python path: {sys.path[0]}")
print(f"✓ Current directory: {os.getcwd()}")
print("")

# Import and initialize
try:
    print("Starting server...")
    from app import app, initialize_server
    import uvicorn
    
    # Initialize server before starting
    print("Initializing server components...")
    initialize_server()
    
    # Start uvicorn with Config (compatible with existing event loop)
    print("Starting uvicorn...")
    print("=" * 70)
    print("🚀 Server is starting...")
    print("=" * 70)
    print("")
    
    config = uvicorn.Config(
        app,
        host="0.0.0.0",
        port=8000,
        log_level="info"
    )
    server = uvicorn.Server(config)
    
    # Run in the existing event loop
    await server.serve()
    
except Exception as e:
    print(f"❌ Failed to start server: {e}")
    import traceback
    traceback.print_exc()

## 7. Keep Session Alive (Optional)

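Run this in a separate cell to reduce the chance of Colab disconnecting due to inactivity. Colab executes one cell at a time, so while Cell 6 is serving requests this cell stays queued and only runs once Cell 6 has stopped.

**Note**: This is optional and should be used responsibly.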

In [None]:
import time
from datetime import datetime

print("Keep-alive started. Press 'Stop' button to end.")
print("This will print a message every 60 seconds.\n")

try:
    while True:
        current_time = datetime.now().strftime("%H:%M:%S")
        print(f"[{current_time}] Session active...")
        time.sleep(60)
except KeyboardInterrupt:
    print("\nKeep-alive stopped.")

## Troubleshooting

### Backend Issues (Colab)

**GPU Not Available**
- Go to `Runtime` → `Change runtime type` → Select `GPU`
- Restart runtime and run cells again

**ngrok Connection Failed**
- Check that your ngrok token is correct
- Free tier has connection limits (40 connections/min)
- Try regenerating the tunnel (run Cell 5 again)

**Server Won't Start**
- Check that all dependencies installed successfully (Cell 3)
- Make sure Cell 6 shows all ✓ checkmarks
- Review error messages in Cell 6 output
- Try restarting runtime and running all cells again

**"asyncio.run() cannot be called from a running event loop"**
- This is fixed in the updated Cell 6 code
- Make sure you're using the latest notebook version
- Cell 6 uses `nest_asyncio` and `await server.serve()` for Colab compatibility
- If you still see this, re-run Cell 3 to install nest_asyncio

**ImportError: attempted relative import**
- This means paths aren't configured correctly
- Make sure Cell 6 runs the proper initialization code
- The repository should be cloned at `/content/RealTimeSeg`
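
A quick way to rule out path problems before running Cell 6 is to check the directory layout. The sketch below assumes that the `app` imported by Cell 6 is either a file `app.py` or a package `app/` inside `backend`; the helper name `check_backend_layout` is hypothetical.

```python
from pathlib import Path


def check_backend_layout(repo_root="/content/RealTimeSeg"):
    """Return a list of problems; an empty list means the layout looks fine."""
    problems = []
    backend = Path(repo_root) / "backend"
    if not backend.is_dir():
        problems.append(f"Missing directory: {backend}")
        return problems
    # Cell 6 runs `from app import ...`, so an importable `app` must live here
    has_module = (backend / "app.py").is_file()
    has_package = (backend / "app" / "__init__.py").is_file()
    if not (has_module or has_package):
        problems.append(f"No app.py or app/ package found in {backend}")
    return problems


issues = check_backend_layout()
for issue in issues:
    print("✗", issue)
if not issues:
    print("✓ Backend layout looks fine")
```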

**TypeError: 'NoneType' object is not callable**
- Server initialization failed
- Check Cell 6 output for initialization errors
- Verify models downloaded successfully in Cell 4

### Frontend Connection Issues (Local)

**"Connection Failed" in Browser**
- Verify Cell 6 is running (should show "Uvicorn running")
- Copy exact ngrok URL from Cell 5 (including https://)
- Try the test tool: http://localhost:8080/test_connection.html
- Check if ngrok URL works in a regular browser tab first

**"JSON parse error"**
- Backend is returning HTML instead of JSON
- This usually means old backend code is running
- Solution: Re-clone repository in Cell 2 and restart from Cell 3

**WebSocket Connection Failed**
- If the HTTP test passes but the WebSocket test fails, the problem is specific to the WebSocket connection
- Check CORS settings (should already be configured)
- Verify you're using `wss://` (not `ws://`) for `https://` URLs
- The frontend should convert the URL automatically
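
The scheme conversion mentioned above can be sketched as follows. This is an illustration of the mapping (`https` to `wss`, `http` to `ws`), not the frontend's actual code, and the `/ws` endpoint path is an assumption.

```python
from urllib.parse import urlsplit, urlunsplit


def to_websocket_url(http_url, path="/ws"):
    """Map an http(s) URL to the matching ws(s) URL with the given path."""
    parts = urlsplit(http_url.strip())
    scheme_map = {"https": "wss", "http": "ws", "wss": "wss", "ws": "ws"}
    if parts.scheme not in scheme_map:
        raise ValueError(f"Unsupported URL scheme: {parts.scheme!r}")
    if not parts.netloc:
        raise ValueError("URL has no host")
    # Drop any query string or fragment; keep only host (and port) plus the path
    return urlunsplit((scheme_map[parts.scheme], parts.netloc, path, "", ""))


print(to_websocket_url("https://example.ngrok-free.app"))
# prints: wss://example.ngrok-free.app/ws
```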

**Frontend Not Starting Locally**
- Make sure you're in the RealTimeSeg directory
- Run: `./start_frontend.sh` (Linux/Mac) or `start_frontend.bat` (Windows)
- Or manually: `cd frontend && python3 -m http.server 8080`
- Check that port 8080 isn't already in use
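
To check whether port 8080 is already taken, a short probe like this can be run on the local machine. It is a minimal sketch that tries to open a TCP connection to the port; the helper name `port_in_use` is illustrative.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if something is accepting TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(0.5)
        # connect_ex returns 0 on success instead of raising an exception
        return probe.connect_ex((host, port)) == 0


if port_in_use(8080):
    print("Port 8080 is busy; stop the other process or pick another port.")
else:
    print("Port 8080 is free.")
```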

## Performance Tips

1. **Model Selection**:
   - Fast Mode: 30-40 FPS (MobileNetV3)
   - Balanced Mode: 20-25 FPS (ResNet50) - **Default**
   - Accurate Mode: 10-12 FPS (SegFormer-B3)

2. **Monitor GPU**: Run `!nvidia-smi` in a new cell to check GPU usage

3. **First Run is Slower**: Models download (~2-3GB), subsequent runs use cache

4. **Session Limits**: 
   - 12-hour maximum session
   - 90-minute idle disconnect
   - Run Cell 7 (keep-alive) to prevent idle disconnect
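
The FPS ranges in the Model Selection tip translate directly into a per-frame time budget that has to cover capture, network transfer through the tunnel, inference, and rendering. A small sketch of the arithmetic, using the figures listed above:

```python
def frame_budget_ms(fps):
    """Milliseconds available per frame at the given frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


# (low FPS, high FPS) per mode, taken from the list above
MODES = {
    "Fast (MobileNetV3)": (30, 40),
    "Balanced (ResNet50)": (20, 25),
    "Accurate (SegFormer-B3)": (10, 12),
}

for mode, (low_fps, high_fps) in MODES.items():
    fastest = frame_budget_ms(high_fps)
    slowest = frame_budget_ms(low_fps)
    print(f"{mode}: {fastest:.0f}-{slowest:.0f} ms per frame")
```

For example, sustaining 25 FPS leaves 40 ms for the whole round trip of each frame.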

## Testing Connection

Use the diagnostic tool to test the connection:
1. On your local machine, open: http://localhost:8080/test_connection.html
2. Paste your ngrok URL from Cell 5
3. Click "1. Test HTTP" - should return JSON with server info
4. Click "2. Test WebSocket" - should receive "connected" message
5. If both pass, you can use the main app!
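
The HTTP step can also be reproduced from a Python shell, which helps distinguish a network failure from a backend that answers with HTML instead of JSON. This is a rough sketch: it assumes the backend returns its server info as JSON at the root path `/`, and the function names are illustrative.

```python
import json
import urllib.request


def fetch_text(url, timeout=10):
    """GET `url` and return the response body as text."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def probe_backend(base_url, fetch=fetch_text):
    """Return (ok, detail) for an HTTP GET of the backend root."""
    try:
        body = fetch(base_url.rstrip("/") + "/")
    except Exception as exc:  # network errors, HTTP errors, timeouts
        return False, f"request failed: {exc}"
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        is_html = body.lstrip().lower().startswith(("<!doctype", "<html"))
        kind = "HTML" if is_html else "non-JSON text"
        return False, f"server returned {kind} instead of JSON"
    return True, payload


# Offline demonstration with a stand-in fetch function
print(probe_backend("https://example.invalid", fetch=lambda url: '{"status": "ok"}'))
```

Against a live tunnel, call `probe_backend()` with the URL copied from Cell 5 and leave `fetch` at its default.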

## Architecture Notes

This setup uses a **split architecture**:
- **Backend (Colab)**: Runs inference with GPU, serves WebSocket API
- **Frontend (Local)**: Serves UI, captures webcam, renders results
- **Connection**: WebSocket via ngrok tunnel

Benefits:
- ✅ Fast local UI (no tunneling latency for static files)
- ✅ Free Colab GPU for inference
- ✅ Easy frontend development (just refresh browser)
- ✅ Better overall performance

## Technical Notes

**Asyncio Compatibility**: Colab notebooks run in an existing asyncio event loop. The server uses `nest_asyncio` and `await server.serve()` instead of `uvicorn.run()` to work within this environment.
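
The difference can be reproduced without uvicorn. In the sketch below, `serve()` is a stand-in coroutine (an assumption for illustration, not the real server), and `notebook_cell()` plays the role of code running inside an already-running loop: calling `asyncio.run()` there raises a `RuntimeError`, while awaiting the coroutine directly works.

```python
import asyncio


async def serve():
    """Stand-in for `server.serve()`; any coroutine behaves the same way here."""
    await asyncio.sleep(0)
    return "served"


async def notebook_cell():
    """Simulate code running inside an already-running event loop."""
    coro = serve()
    try:
        asyncio.run(coro)  # starting a second loop from inside a running one
        error = None
    except RuntimeError as exc:
        coro.close()  # the coroutine never started; close it to avoid a warning
        error = str(exc)
    result = await serve()  # awaiting on the existing loop works
    return error, result


def demo():
    # In a plain Python process, this outer loop plays the notebook's loop
    return asyncio.run(notebook_cell())


if __name__ == "__main__":
    error, result = demo()
    print("asyncio.run() inside a loop:", error)
    print("await inside a loop:", result)
```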