RunPod ComfyUI Serverless Handler - Optimized v3.0

A high-performance, production-ready ComfyUI serverless handler for RunPod with advanced optimizations, comprehensive model management, and enhanced custom nodes support.

⚠️ CUDA 12.8 Required: This image only works on Ada Lovelace (RTX 40xx, L40), Hopper (H100/H200), or Blackwell GPUs. Older Ampere GPUs (RTX 30xx, A100) are NOT supported!

🚀 Key Features

Performance Optimizations

20-30% faster inference with PyTorch 2.0+ optimizations
torch.compile support for automatic graph optimization
Multi-stage Docker build for 50% smaller images
Cold start optimizations for faster container startup

Advanced Model Management

Dynamic ComfyUI versioning with automatic latest release detection
Comprehensive model download system with 8 pre-configured models across 8 categories
Parallel downloads with progress tracking and checksum verification
Automated model verification and integrity checking

Enhanced Custom Nodes

6 essential custom nodes (expanded from 1)
ComfyUI-Manager for node management
ComfyUI-Impact-Pack for advanced ControlNet
rgthree-comfy for workflow utilities
ComfyUI-Advanced-ControlNet for advanced implementations
ComfyUI-VideoHelperSuite for video processing

Production Features

Serverless GPU Computing: Uses RunPod's Serverless Platform for scalable GPU computations
ComfyUI Integration: Seamless integration with ComfyUI for AI image & video generation
Heavy Video Rendering: Optimized for long-running video workflows (AnimateDiff, SVD, etc.)
Automatic Seed Randomization: Seeds are automatically randomized for each execution (configurable)
S3 Storage: Direct upload to Cloudflare R2, AWS S3, or Backblaze B2 with presigned URLs
RunPod Network Volume Support: Automatic backup of generated files to RunPod Network Volume
Workflow Flexibility: Supports both predefined and dynamic workflows
Extended Timeouts: 20 min startup timeout, 60 min workflow execution timeout
Error Handling: Robust error handling and detailed logging with automatic stderr output
Dynamic ComfyUI Versioning: Build with latest or a specific tag via Docker ARG
Performance Tuning: TF32, cuDNN autotune, and optional torch.compile
Custom Nodes Pack: 5+ essential custom nodes pre-installed (configurable)
Model Downloader: Parallel downloads with checksum verification
Multi-stage Docker Build: Smaller images and faster rebuilds using BuildKit caches

📋 Requirements

RunPod Account with API Key
RunPod Network Volume (for persistent storage)
Docker (for image build)
Python 3.11+

🔥 GPU Requirements (CUDA 12.8)

⚠️ IMPORTANT: This image requires CUDA 12.8 or higher!

Only GPUs with Ada Lovelace, Hopper, or Blackwell architecture are supported:

✅ Compatible GPUs (CUDA 12.8+):

Consumer/Prosumer:
- RTX 4090, RTX 4080, RTX 4070 Ti (Ada Lovelace)
- RTX 5090, RTX 5080 (Blackwell - when available)
Datacenter:
- L40, L40S (Ada Lovelace)
- H100, H200 (Hopper)
- B100, B200 (Blackwell)

❌ NOT Compatible (Older Architectures):

RTX 3090, RTX 3080, A100, A40 (Ampere) - Will NOT work!
All older GPUs (Turing, Pascal, etc.)

How to filter in RunPod:

Go to RunPod → Serverless → Deploy Endpoint
Filter by "CUDA 12.8" or "CUDA 12.9"
Only select GPUs from the compatible list above

💾 VRAM Recommendations by Workload:

Images (SD 1.5/SDXL): 16GB+ (RTX 4080, L40)
Videos (AnimateDiff, SVD): 24GB+ (RTX 4090, L40S)
Heavy Video (4K, long sequences): 48GB+ (H100, H200)

🛠️ Installation

Quick Setup

# Clone repository
git clone https://github.com/EcomTree/runpod-comfyui-serverless.git
cd runpod-comfyui-serverless

# Build optimized Docker image
docker build -t comfyui-serverless:latest -f Dockerfile .

# Or build with specific ComfyUI version
docker build --build-arg COMFYUI_VERSION=v0.3.58 -t comfyui-serverless:latest -f Dockerfile .

Advanced Setup

Clone Repository

git clone https://github.com/EcomTree/runpod-comfyui-serverless.git
cd runpod-comfyui-serverless

Configure Custom Nodes (Optional)

# Edit custom nodes configuration
nano configs/custom_nodes.json

# Install custom nodes manually
./scripts/install_custom_nodes.sh

Build Docker Image (with optional ComfyUI version)

# Build with latest ComfyUI release (default)
docker build -t ecomtree/comfyui-serverless:latest -f Dockerfile .

# Or pin a specific ComfyUI version
docker build --build-arg COMFYUI_VERSION=v0.3.57 \
  -t ecomtree/comfyui-serverless:0.3.57 -f Dockerfile .

Download Models (Optional)

# Download all models
python3 scripts/download_models.py --config models_download.json

# Download specific categories
python3 scripts/download_models.py --config models_download.json --categories checkpoints,loras

# Download with custom concurrency
python3 scripts/download_models.py --config models_download.json --categories checkpoints,vae --concurrency 8

Push to Registry
```
docker push comfyui-serverless:latest
```

🔧 Configuration

Environment Variables

The handler supports the following environment variables:

ComfyUI Configuration

COMFY_PORT: ComfyUI Port (default: 8188)
COMFY_HOST: ComfyUI Host (default: 127.0.0.1)
RANDOMIZE_SEEDS: Automatically randomize all seeds in workflows (default: true)
- Set to false if you want to preserve exact seeds from your workflow
- When enabled, all seed values are replaced with random values before execution

Performance Tuning

ENABLE_TORCH_COMPILE: Enable torch.compile optimization hooks (default: false)
TORCH_COMPILE_BACKEND: Compile backend (default: inductor)
TORCH_COMPILE_MODE: default | reduce-overhead (default) | max-autotune
TORCH_COMPILE_FULLGRAPH: Require full graph capture (default: 0)
TORCH_COMPILE_DYNAMIC: Allow dynamic shapes (default: 0)
ENABLE_TF32: Allow TF32 on Ampere+ (default: true)
ENABLE_CUDNN_BENCHMARK: Enable cuDNN autotune (default: true)
MATMUL_PRECISION: highest | high (default) | medium
COMFY_EXTRA_ARGS: Extra CLI flags passed to ComfyUI at startup
Deprecated (kept for backward compatibility): DISABLE_SMART_MEMORY, FORCE_FP16, COLD_START_OPTIMIZATION, PRELOAD_MODELS, GPU_MEMORY_FRACTION
- Deprecation Notice: These variables are now deprecated as of version 4.0 (released Q4 2024) and may be removed in future releases. Please migrate to the modern configuration detailed below:
  - DISABLE_SMART_MEMORY: Use the default smart memory behaviour or refer to docs/performance-tuning.md for advanced memory flags.
  - FORCE_FP16: Adjust precision using MATMUL_PRECISION or relevant PyTorch environment flags instead.
  - COLD_START_OPTIMIZATION: Cold start improvements are now automatic; no manual flag is required.
  - PRELOAD_MODELS: Model lifecycle is handled by the manifest-driven downloader; see docs/model-management.md.
  - GPU_MEMORY_FRACTION: GPU memory is tuned automatically through allocator settings; manual fractions are deprecated.

See docs/performance-tuning.md for details.

Storage Configuration (S3 or Network Volume)

S3 Storage (Recommended for HTTP Access):

S3_BUCKET: Name of your S3 Bucket (required)
S3_ACCESS_KEY: S3 Access Key ID (required)
S3_SECRET_KEY: S3 Secret Access Key (required)
S3_ENDPOINT_URL: Custom Endpoint for S3-compatible services (e.g. Cloudflare R2, Backblaze B2)
S3_REGION: S3 Region (default: "auto")
S3_PUBLIC_URL: Optional: Custom Public URL Prefix (e.g. CDN URL)
S3_SIGNED_URL_EXPIRY: Validity duration of signed URLs in seconds (default: 3600)

Network Volume (Fallback):

RUNPOD_VOLUME_PATH: Path to Network Volume (default: /runpod-volume)
RUNPOD_OUTPUT_DIR: Alternative output directory (optional)
VOLUME_MODELS_DIR: Optional override path to models directory (if nonstandard)

Note: When S3 is configured, it will be used automatically. The Network Volume serves as fallback.

Debugging & Logging

DEBUG_S3_URLS: Log full presigned URLs including authentication tokens (default: false)
- ⚠️ Security Warning: Only enable in development! Presigned URLs contain sensitive tokens
- When disabled, URLs in logs show path only with note: [presigned - query params redacted for security]
- See URL_LOGGING.md for detailed information

Workflow Configuration

Workflows are passed as JSON directly in the request. The handler expects the ComfyUI workflow format.

Model Downloads (Optional)

This project includes a model downloader with link verification and checksum validation.

# Verify links (skips auth-only links unless HUGGINGFACE_TOKEN is set)
python scripts/verify_links.py --config models_download.json

# Download a subset of models (e.g., checkpoints and vae)
python scripts/download_models.py --config models_download.json \
  --categories checkpoints,vae --concurrency 4

# Optionally set a Hugging Face token for gated models
export HUGGINGFACE_TOKEN=hf_xxx

Manifest format: see models_download.json.

📝 Usage

Request Format

{
  "input": {
    "workflow": {
      // ComfyUI Workflow JSON
      // Example: SD 1.5 Text-to-Image
      "3": {
        "inputs": {
          "seed": 42,
          "steps": 20,
          "cfg": 7.0,
          "sampler_name": "euler",
          "scheduler": "normal",
          "denoise": 1.0,
          "model": ["4", 0],
          "positive": ["6", 0],
          "negative": ["7", 0],
          "latent_image": ["5", 0]
        },
        "class_type": "KSampler"
      }
      // ... more nodes
    }
  }
}

Response Format

With S3 Storage (Cloudflare R2, AWS S3, Backblaze B2):

{
  "links": [
    "https://account-id.r2.cloudflarestorage.com/comfyui-outputs/job-id/20251003_120530_output_image.png?X-Amz-..."
  ],
  "total_images": 1,
  "job_id": "abc123",
  "storage_type": "s3",
  "s3_bucket": "comfyui-outputs",
  "local_paths": [
    "/workspace/ComfyUI/output/output_image.png"
  ],
  "volume_paths": [
    "/runpod-volume/comfyui/output/comfyui-20251003_120530_000000-abc12345-output_image.png"
  ]
}

With Network Volume Only (S3 not configured):

{
  "links": [
    "/runpod-volume/comfyui/output/comfyui-20251003_120530_000000-abc12345-output_image.png"
  ],
  "total_images": 1,
  "job_id": "abc123",
  "storage_type": "volume",
  "volume_paths": [
    "/runpod-volume/comfyui/output/comfyui-20251003_120530_000000-abc12345-output_image.png"
  ]
}

Note: When S3 is configured, images are uploaded to S3 and backed up to the Network Volume. The links array contains publicly accessible S3 URLs (presigned URLs by default, or custom CDN URLs if S3_PUBLIC_URL is set).

☁️ S3 Setup Guide

Cloudflare R2 (Recommended - Free up to 10GB)

Create R2 Bucket:
- Go to Cloudflare Dashboard → R2
- Create new bucket (e.g. comfyui-outputs)
Create API Token:
- R2 → Manage R2 API Tokens → Create API Token
- Note down: Access Key ID, Secret Access Key
- Endpoint URL: https://<account-id>.r2.cloudflarestorage.com

Set Environment Variables in RunPod:

S3_BUCKET=comfyui-outputs
S3_ACCESS_KEY=<your-access-key>
S3_SECRET_KEY=<your-secret-key>
S3_ENDPOINT_URL=https://<account-id>.r2.cloudflarestorage.com
S3_REGION=auto

AWS S3

Create S3 Bucket:
- AWS Console → Create Bucket
- Select region (e.g. us-east-1)
IAM User & Credentials:
- IAM → Users → Add User
- Permissions: s3:PutObject, s3:GetObject, s3:DeleteObject

Environment Variables:

S3_BUCKET=your-bucket-name
S3_ACCESS_KEY=<aws-access-key>
S3_SECRET_KEY=<aws-secret-key>
S3_REGION=us-east-1

Backblaze B2

Create Bucket: Backblaze Console
Create Application Key: Note down Key ID & Key

Environment Variables:

S3_BUCKET=your-bucket-name
S3_ACCESS_KEY=<key-id>
S3_SECRET_KEY=<application-key>
S3_ENDPOINT_URL=https://s3.us-west-002.backblazeb2.com
S3_REGION=us-west-002

🧪 Testing

Use the included test script to validate your endpoint:

# Configure the test script
cp scripts/test_endpoint.sh scripts/test_endpoint_local.sh
# Edit scripts/test_endpoint_local.sh with your ENDPOINT_ID and API_KEY

# Run tests
bash scripts/test_endpoint_local.sh

Note: Never commit API keys or endpoint IDs to version control!

#!/bin/bash
# WARNING: Do not commit real API keys or endpoint IDs to version control!
ENDPOINT_ID="your-endpoint-id"
API_KEY="your-runpod-api-key"
API_URL="https://api.runpod.ai/v2/${ENDPOINT_ID}/runsync"

curl -X POST "$API_URL" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "workflow": "workflow_data_here"
    }
  }'

🏗️ Architecture

runpod-comfyui-serverless/
├── src/                           # Source code modules
│   ├── config.py                  # Configuration management
│   ├── comfyui_manager.py         # ComfyUI server lifecycle
│   ├── s3_handler.py              # S3 storage operations
│   └── workflow_processor.py      # Workflow processing utilities
├── scripts/                       # Setup and maintenance scripts
│   ├── setup.sh                   # Unified setup script
│   ├── common-codex.sh             # Shared helper functions
│   └── test_endpoint.sh           # Testing utilities
├── config/                        # Configuration files
├── tests/                         # Test files
├── rp_handler.py                  # Main RunPod handler
├── Dockerfile                     # Serverless Docker image
├── requirements.txt               # Python dependencies
├── .env.example                   # Configuration template
├── .gitignore                     # Git ignore rules
└── README.md                      # This file

Handler Architecture

The handler is now organized into focused modules:

src/config.py: Centralized configuration management with environment variable parsing
src/comfyui_manager.py: ComfyUI server lifecycle, workflow execution, and model management
src/s3_handler.py: S3 storage operations with proper error handling and URL sanitization
src/workflow_processor.py: Workflow processing utilities including seed randomization
rp_handler.py: Main entry point that orchestrates all components
scripts/: Installers, model management, and performance hooks
- scripts/get_latest_version.sh: Resolve latest ComfyUI release
- scripts/install_custom_nodes.sh: Install core custom nodes from configs/custom_nodes.json
- scripts/download_models.py: Parallel model downloader with checksums
- scripts/verify_links.py: Link validation tool
docs/: Guides for performance tuning and custom nodes

🚀 Deployment

Setup and Build

# Clone and setup the project
git clone https://github.com/EcomTree/runpod-comfyui-serverless.git
cd runpod-comfyui-serverless
bash scripts/setup.sh

# Build Docker image
docker build -t ecomtree/comfyui-serverless:latest -f Dockerfile .
docker push ecomtree/comfyui-serverless:latest

Create RunPod Serverless Endpoint
- Go to RunPod Dashboard
- Create new Serverless Endpoint
- Docker Image: ecomtree/comfyui-serverless:latest
- Container Disk: at least 15GB (20GB recommended for large models)
- GPU Filter: CUDA 12.8 or 12.9 only!
- GPU: RTX 4090, L40/L40S, H100/H200 or better (see GPU Requirements above)
- Important: Connect Network Volume with sufficient storage for models and outputs

Configure Environment Variables Set the following environment variables in your RunPod endpoint:

# S3 Storage (recommended)
S3_BUCKET=your-bucket-name
S3_ACCESS_KEY=your-access-key
S3_SECRET_KEY=your-secret-key
S3_ENDPOINT_URL=https://your-s3-endpoint.com

# Performance tuning
PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:1024,expandable_segments:True
TORCH_ALLOW_TF32_CUBLAS_OVERRIDE=1

Note down credentials
- Endpoint ID: Available in the RunPod dashboard
- API Key: Generated when creating the endpoint

📊 Performance

Optimized Performance (v3.0)

Cold Start: ~10-20 seconds (20-30% improvement with optimizations)
Heavy Model Loading: Up to 15 minutes for large model collections (25% improvement)
Warm Start: ~1-3 seconds (40% improvement)
Image Workflow: 3-90 seconds (20-30% faster with torch.compile)
Video Workflow: 1.5-45 minutes (25% improvement)
S3 Upload: ~1-5 seconds per file
Volume Save: <1 second per file

Performance Features

torch.compile: 20-30% faster inference
Multi-stage builds: 50% smaller Docker images
Cold start optimization: 15-25% faster startup
Memory optimization: 10-15% more efficient memory usage

💡 Technical Details

Base Image: runpod/pytorch:2.8.0-py3.11-cuda12.8.1-cudnn-devel-ubuntu22.04
CUDA Version: 12.8.1 (requires Ada Lovelace, Hopper, or Blackwell GPUs)
ComfyUI Version: Dynamic (latest by default, configurable via COMFYUI_VERSION)
PyTorch: 2.8.0 with CUDA 12.8 + torch.compile optimizations
Pre-installed Models: 160+ models available via download system
GPU Memory: Optimized with --normalvram flag + memory optimizations
Tensor Cores: Fully optimized for modern Tensor Cores (4th gen+)
Custom Nodes: 6 essential nodes (ComfyUI-Manager, Impact-Pack, rgthree-comfy, Advanced-ControlNet, VideoHelperSuite, LoadImageFromHttpURL)
Docker: Multi-stage build with BuildKit cache mounts for faster rebuilds
Performance: torch.compile, CUDNN optimizations, memory management

📚 Documentation

Performance Tuning Guide - Detailed performance optimization guide
Custom Nodes Guide - Complete custom nodes documentation
Model Download System - Comprehensive model library
Custom Nodes Config - Custom nodes configuration

🛠️ Scripts

scripts/get_latest_version.sh - Get latest ComfyUI version
scripts/optimize_performance.py - Apply performance optimizations
scripts/download_models.py - Download models with parallel processing
scripts/verify_links.py - Verify model download links
scripts/install_custom_nodes.sh - Install custom nodes

🤝 Contributing

Contributions are welcome! Please create a pull request with your changes.

📄 License

This project is licensed under the MIT License.

🙏 Acknowledgments

RunPod for the Serverless GPU Infrastructure
ComfyUI for the awesome AI Workflow System
The Open Source Community for continuous support

Created with ❤️ for the AI Art Community

Name		Name	Last commit message	Last commit date
Latest commit History 292 Commits
.workspace		.workspace
configs		configs
docs		docs
scripts		scripts
src		src
.env.example		.env.example
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
CODEX_SETUP.md		CODEX_SETUP.md
Dockerfile		Dockerfile
README.md		README.md
SETUP_SCRIPTS_INFO.md		SETUP_SCRIPTS_INFO.md
URL_LOGGING.md		URL_LOGGING.md
models_download.json		models_download.json
requirements.txt		requirements.txt
rp_handler.py		rp_handler.py
sitecustomize.py		sitecustomize.py

Folders and files

Latest commit

History

Repository files navigation

RunPod ComfyUI Serverless Handler - Optimized v3.0

🚀 Key Features

Performance Optimizations

Advanced Model Management

Enhanced Custom Nodes

Production Features

📋 Requirements

🔥 GPU Requirements (CUDA 12.8)

✅ Compatible GPUs (CUDA 12.8+):

❌ NOT Compatible (Older Architectures):

💾 VRAM Recommendations by Workload:

🛠️ Installation

Quick Setup

Advanced Setup

🔧 Configuration

Environment Variables

ComfyUI Configuration

Performance Tuning

Storage Configuration (S3 or Network Volume)

Debugging & Logging

Workflow Configuration

Model Downloads (Optional)

📝 Usage

Request Format

Response Format

☁️ S3 Setup Guide

Cloudflare R2 (Recommended - Free up to 10GB)

AWS S3

Backblaze B2

🧪 Testing

🏗️ Architecture

Handler Architecture

🚀 Deployment

📊 Performance

Optimized Performance (v3.0)

Performance Features

💡 Technical Details

📚 Documentation

🛠️ Scripts

🤝 Contributing

📄 License

🙏 Acknowledgments

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages