# 🎨 Stable Diffusion Backend for Frontend
### Google Colab + Cloudflared Tunnel
**Optimized for low-VRAM GPUs**

This notebook sets up a complete API backend with memory optimization.

## Step 1: Upload Server Files or Clone from GitHub
Option A: Copy the `server` folder from Google Drive (the cell below), or upload it manually in the Files tab  
Option B: Git clone from your repository into `/content/server`

In [None]:
# Option A: Mount Google Drive and use files from there
from google.colab import drive
drive.mount('/content/drive')

# Copy server files from Drive
import shutil
import os

# Adjust this path to your Drive location
# For example: /content/drive/My Drive/Stable_Diffusion/server
source_dir = '/content/drive/My Drive/Stable_Diffusion/server'  # CHANGE THIS
dest_dir = '/content/server'

if os.path.exists(source_dir):
    shutil.copytree(source_dir, dest_dir, dirs_exist_ok=True)
    print('✅ Copied server files from Drive')
else:
    print(f'⚠️ Source directory not found: {source_dir}')
    print('You can upload the server folder manually in Files tab')
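
Option B has no cell of its own, so here is a minimal sketch of what it could look like. `REPO_URL` is a hypothetical placeholder, and the sketch assumes the server files sit at the root of the repository; adjust both to your setup.

```python
import subprocess
from pathlib import Path

# Hypothetical placeholder: replace with your own repository URL.
REPO_URL = 'https://github.com/<your-user>/<your-repo>.git'
DEST_DIR = Path('/content/server')

def build_clone_command(repo_url, dest_dir):
    """Return the argument list for a shallow `git clone` into dest_dir."""
    return ['git', 'clone', '--depth', '1', repo_url, str(dest_dir)]

def clone_server(repo_url=REPO_URL, dest_dir=DEST_DIR):
    """Clone the repository unless dest_dir already has content."""
    dest_dir = Path(dest_dir)
    if dest_dir.exists() and any(dest_dir.iterdir()):
        print(f'Skipping clone: {dest_dir} is not empty')
        return False
    # check=True raises CalledProcessError if git exits with an error.
    subprocess.run(build_clone_command(repo_url, dest_dir), check=True)
    return True
```

Call `clone_server()` once `REPO_URL` points at your repository. The non-empty check keeps the clone from failing on top of files already copied by Option A.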

## Step 2: Install Dependencies

In [None]:
!pip install -q torch==2.0.1 torchvision==0.15.2 diffusers==0.21.4 transformers==4.30.2 accelerate==0.20.3 safetensors==0.3.1 flask==2.3.2 flask-cors==4.0.0 pillow==9.5.0 numpy==1.24.3 xformers==0.0.20 peft==0.4.0 requests==2.31.0
print('✅ Dependencies installed!')
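
Because `pip install -q` hides most output, it can help to confirm that the pinned versions actually ended up in the environment. The following is a small sketch using only the standard library; `EXPECTED` repeats a subset of the pins from the install cell, and `check_versions` is a helper name chosen for this example.

```python
from importlib import metadata

# A subset of the pins from the install cell.
EXPECTED = {
    'torch': '2.0.1',
    'diffusers': '0.21.4',
    'transformers': '4.30.2',
    'xformers': '0.0.20',
}

def check_versions(expected):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Wheels may carry a local suffix such as '+cu118'; compare the base version.
        if installed is None or installed.split('+')[0] != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches
```

An empty result from `check_versions(EXPECTED)` means every listed package matches its pin; any entry in the result names a package that is missing or at a different version.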

## Step 3: Install and Setup Cloudflared

In [None]:
import subprocess
import os

# Download and install cloudflared from the official cloudflare/cloudflared releases
!wget -q https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64.deb -O /tmp/cloudflared.deb
!dpkg -i /tmp/cloudflared.deb > /dev/null 2>&1

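# Setup CUDA memory optimization (environment variables are inherited by the server started in Step 5)
# Note: older PyTorch builds may not recognize `expandable_segments`; if the server fails at
# startup with an allocator option error, remove this line
os.environ['PYTORCH_CUDA_ALLOC_CONF'] = 'expandable_segments:True'
# Synchronous kernel launches are a debugging aid, not a memory optimization, and slow generation;
# remove this line once the setup works
os.environ['CUDA_LAUNCH_BLOCKING'] = '1'

print('✅ Cloudflared installed!')
print('✅ CUDA memory optimization enabled!')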
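
The install cell silences both `wget` and `dpkg`, so a failed download goes unnoticed until Step 5. A quick check that the binary is on `PATH` might look like the sketch below; `tool_version` is a helper name introduced here for illustration.

```python
import shutil
import subprocess

def tool_version(name, version_flag='--version'):
    """Return the first line printed by `<name> --version`, or None if not on PATH."""
    path = shutil.which(name)
    if path is None:
        return None
    result = subprocess.run([path, version_flag], capture_output=True, text=True)
    # Some tools print their version to stderr instead of stdout.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else ''
```

If `tool_version('cloudflared')` returns `None`, the package was not installed; rerun the install cell without the output redirection to see the error.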

## Step 4: Verify Setup

In [None]:
import sys
import os
import threading
import time
import subprocess

# Change to server directory
os.chdir('/content/server')
sys.path.insert(0, '/content/server')

print('üìÅ Working directory:', os.getcwd())
print('üìÑ Files in directory:')
for f in os.listdir('.'):
    if not f.startswith('.'):
        print(f'  - {f}')

# Import config to verify settings
try:
    import config
    print('\n✅ Config loaded')
    print(f'  Model: {config.MODEL_ID}')
    print(f'  FP16: {config.USE_FP16}')
    print(f'  xFormers: {config.ENABLE_XFORMERS}')
    print(f'  Attention Slicing: {config.ENABLE_ATTENTION_SLICING}')
    print(f'  VAE Tiling: {config.ENABLE_VAE_TILING}')
    print(f'  Model CPU Offload: {config.ENABLE_MODEL_CPU_OFFLOAD}')
except Exception as e:
    print(f'⚠️ Config issue: {e}')
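
The flags printed by this cell correspond to memory-saving switches that diffusers pipelines expose. How `app.py` actually consumes them is not shown in this notebook, so the following is only a sketch of one plausible mapping; `apply_memory_options` is a hypothetical helper, while the four `enable_*` methods are the diffusers pipeline methods of those names.

```python
def apply_memory_options(pipe, cfg):
    """Enable memory savers on a diffusers pipeline according to cfg flags.

    Returns the names of the options that were switched on.
    """
    enabled = []
    if getattr(cfg, 'ENABLE_XFORMERS', False):
        try:
            pipe.enable_xformers_memory_efficient_attention()
            enabled.append('xformers')
        except Exception as exc:  # xformers missing or built for another torch/CUDA
            print(f'xFormers unavailable, continuing without it: {exc}')
    if getattr(cfg, 'ENABLE_ATTENTION_SLICING', False):
        pipe.enable_attention_slicing()
        enabled.append('attention_slicing')
    if getattr(cfg, 'ENABLE_VAE_TILING', False):
        pipe.enable_vae_tiling()
        enabled.append('vae_tiling')
    if getattr(cfg, 'ENABLE_MODEL_CPU_OFFLOAD', False):
        # Offloading manages device placement itself, so do not also call pipe.to('cuda').
        pipe.enable_model_cpu_offload()
        enabled.append('model_cpu_offload')
    return enabled
```

Reading the flags with `getattr` and a `False` default means a config module that lacks one of the names simply leaves that option off instead of raising.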

## Step 5: Start Flask + Cloudflared

In [None]:
import subprocess
import threading
import time
import re

# Global variables
tunnel_url = None
cloudflared_process = None
flask_process = None

def start_cloudflared():
    global tunnel_url, cloudflared_process
    
    print('🌐 Starting cloudflared tunnel...')
    try:
        # cloudflared writes its log, including the tunnel URL, to stderr,
        # so merge stderr into stdout and read a single stream
        cloudflared_process = subprocess.Popen(
            ['cloudflared', 'tunnel', '--url', 'http://localhost:5000'],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
            bufsize=1
        )
        
        # Read output until we find the tunnel URL
        timeout = 30
        start = time.time()
        
        while time.time() - start < timeout:
            line = cloudflared_process.stdout.readline()
            if not line:
                time.sleep(0.1)
                continue
            
            print(line.strip())
            
            # Look for HTTPS URL
            match = re.search(r'https://[\w.-]+\.trycloudflare\.com', line)
            if match:
                tunnel_url = match.group(0)
                print(f'\n✅ Tunnel URL: {tunnel_url}')
                print('\n📋 USE THIS URL IN YOUR FRONTEND SETTINGS!')
                break
        else:
            print(f'⚠️ No tunnel URL found within {timeout} seconds')
        
        # Keep draining the log so cloudflared never blocks on a full pipe
        for _ in cloudflared_process.stdout:
            pass
    
    except Exception as e:
        print(f'❌ Cloudflared error: {e}')

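def start_flask():
    global flask_process
    
    print('\n🚀 Starting Flask server...')
    time.sleep(2)  # Give cloudflared time to start
    
    try:
        # -u disables output buffering so log lines appear as they are written;
        # stderr is merged into stdout because Flask logs to stderr and an
        # unread stderr pipe can fill up and stall the server
        flask_process = subprocess.Popen(
            ['python', '-u', 'app.py'],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
            bufsize=1
        )
        
        # Stream output
        for line in flask_process.stdout:
            print(line.strip())
    
    except Exception as e:
        print(f'❌ Flask error: {e}')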

# Start both in threads
cloudflared_thread = threading.Thread(target=start_cloudflared, daemon=True)
flask_thread = threading.Thread(target=start_flask, daemon=True)

cloudflared_thread.start()
flask_thread.start()

print('⏳ Starting servers (this may take 1-2 minutes to load the model)...')
print('\nKeep this notebook session running - the servers run in the background!')
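
The cell starts two background processes but offers no way to stop them short of restarting the runtime. A minimal sketch of a shutdown helper follows; `stop_process` is a name introduced here, and it relies only on the standard `subprocess.Popen` methods.

```python
import subprocess

def stop_process(proc, timeout=10):
    """Terminate a subprocess.Popen, escalating to kill if it ignores SIGTERM.

    Returns the exit code, or None if proc was never started.
    """
    if proc is None:
        return None
    if proc.poll() is None:  # still running
        proc.terminate()
        try:
            proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()
    return proc.returncode
```

In a later cell, `stop_process(flask_process)` followed by `stop_process(cloudflared_process)` shuts both down, after which the Step 5 cell can be rerun. Note that a new quick tunnel gets a new `trycloudflare.com` URL.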

## Step 6: Test API Connection

In [None]:
import requests
import json
import time

# Wait for server to be ready
print('⏳ Waiting for server to be ready...')
time.sleep(5)

# Test local endpoint
try:
    response = requests.get('http://localhost:5000/api/health', timeout=10)
    if response.status_code == 200:
        print('✅ Local API is responding')
        print(json.dumps(response.json(), indent=2))
    else:
        print(f'❌ Local API error: {response.status_code}')
except Exception as e:
    print(f'⏳ API not ready yet: {e}')
    print('Check Step 5 output - model might still be loading')
    print('This can take 2-5 minutes on first run')
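
A fixed five-second sleep is usually too short while the model is still loading. One alternative is to poll the health endpoint until it answers, as in this sketch. It uses `urllib` from the standard library so that it runs on its own; `http_ok` and `wait_until_ready` are names introduced for this example.

```python
import time
import urllib.request

def http_ok(url, timeout=5):
    """Return True if a GET to url answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError, timeouts, and refused connections are OSError subclasses
        return False

def wait_until_ready(url, max_wait=300, interval=5, probe=http_ok, sleep=time.sleep):
    """Poll url until probe(url) succeeds; return True on success, False on timeout."""
    deadline = time.monotonic() + max_wait
    while True:
        if probe(url):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

For example, `wait_until_ready('http://localhost:5000/api/health')` blocks for up to five minutes and returns as soon as the server responds. The `probe` and `sleep` parameters exist only to make the loop easy to test.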

## Step 7: Monitor Status
Run this cell periodically to check memory and generation status.

In [None]:
import requests
import json

try:
    # System info
    resp = requests.get('http://localhost:5000/api/system', timeout=5)
    print('System Status:')
    for key, value in resp.json().items():
        if isinstance(value, float):
            print(f'  {key}: {value:.2f} GB')
        else:
            print(f'  {key}: {value}')
    
    # Memory
    print('\nMemory Usage:')
    resp = requests.get('http://localhost:5000/api/memory', timeout=5)
    memory = resp.json()
    if 'gpu' in memory and memory['gpu']:
        gpu = memory['gpu']
        print(f"  GPU Allocated: {gpu['allocated_gb']:.2f} / {gpu['total_gb']:.2f} GB")
        print(f"  GPU Reserved:  {gpu['reserved_gb']:.2f} GB")
    
    # Progress
    print('\nGeneration Status:')
    resp = requests.get('http://localhost:5000/api/progress', timeout=5)
    progress = resp.json()
    print(f"  Generating: {progress['is_generating']}")
    current = progress['current_prompt']
    print(f"  Current: {current[:50] + '...' if current else 'None'}")
    
except Exception as e:
    print(f'❌ Error: {e}')

## ℹ️ Usage Instructions

### Getting the Public URL:
1. Look at Step 5 output - find the line with `https://xxx.trycloudflare.com`
2. Copy that URL

### In Your Frontend:
1. Open your HTML app
2. Go to Settings (the `#settings` tab, if your app has one)
3. Paste the Cloudflared URL
4. Click "Test Connection"
5. Start generating!

### API Endpoints (POST):
- `/api/txt2img` - Text to Image
- `/api/img2img` - Image to Image  
- `/api/inpaint` - Inpainting

### Parameters (JSON):
```json
{
  "prompt": "your description",
  "negative_prompt": "what to avoid",
  "steps": 20,
  "cfg_scale": 7.5,
  "width": 512,
  "height": 512,
  "seed": -1,
  "batch_size": 1
}
```
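
To call these endpoints from Python rather than from the frontend, something like the sketch below could be used. It assumes the server replies with JSON, which this notebook does not confirm, and that `img2img` and `inpaint` need extra image fields not listed above. `build_payload` and `post_json` are helper names introduced here, and the empty default for `negative_prompt` is an assumption.

```python
import json
import urllib.request

# Defaults mirror the example parameters; the empty negative_prompt is an assumption.
DEFAULTS = {
    'negative_prompt': '',
    'steps': 20,
    'cfg_scale': 7.5,
    'width': 512,
    'height': 512,
    'seed': -1,
    'batch_size': 1,
}

def build_payload(prompt, **overrides):
    """Merge overrides into the defaults; reject keys not in the parameter list."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f'Unknown parameters: {sorted(unknown)}')
    return {'prompt': prompt, **DEFAULTS, **overrides}

def post_json(base_url, endpoint, payload, timeout=300):
    """POST payload as JSON and return the decoded JSON response."""
    request = urllib.request.Request(
        base_url.rstrip('/') + endpoint,
        data=json.dumps(payload).encode('utf-8'),
        headers={'Content-Type': 'application/json'},
        method='POST',
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode('utf-8'))
```

A call such as `post_json('http://localhost:5000', '/api/txt2img', build_payload('a lighthouse at dusk', steps=25))` sends one request; substitute the tunnel URL for the base URL to go through Cloudflared. The long timeout leaves room for generation time.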

### Troubleshooting:
- **CUDA out of memory**: Reduce `width`/`height` or `batch_size`; `steps` affects generation time, not peak memory
- **Slow generation**: Check GPU memory usage in Step 7
- **Connection failed**: Check that Cloudflared shows a URL in Step 5
- **Server won't start**: Check Step 5 output for errors; the model might still be downloading

### Tips for Low VRAM:
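- Use smaller batch sizes (1-2)
- Use 20-30 steps instead of 50 (this shortens generation time; it does not lower peak memory)
- Memory optimizations are controlled by the server's `config` module; Step 4 prints which ones are enabled
- With `ENABLE_MODEL_CPU_OFFLOAD` on, the model runs on the GPU with automatic CPU offloading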