v0.3.0 — Dispatch and I/O hot path optimizations

anyin233 released this on 10 Apr at 10:38

Highlights

This release is a performance pass targeting the hotspots outside model computation that were observed across all 7 workflows. Public API signatures are unchanged, and the GPL-licensed _vendor/ region is untouched.

Measured wins (A/B comparison via benchmarks/compare_optimizations.py on a CUDA host)

Hot path | Before (µs) | After (µs) | Speedup
execute_node dispatch (EmptyLatentImage, 512×512) | 12.09 | 3.62 | 3.34×
configure(), repeated identical call | 15.98 | 0.38 | 42×
load_nodes_from_path, repeated identical call | 55.48 | 5.26 | 10.5×
SaveImage tensor conversion (batch=8, 512×512) | 4247 | 2574 | 1.65×
LoadImage, RGB input (256×256) | 1042 | 895 | 1.16×
Lanczos upscale, image 256→512 | 4646 | 306 | 15.2×
Lanczos upscale, latent 64→128 | 591 | 189 | 3.12×

What changed

  • Node dispatch caching (executor.py): the results of _is_v3_node and inspect.iscoroutinefunction are memoized per class, and V1 node instances are pooled as per-class singletons. A node class can opt out of pooling by setting cls._COMFY_RUNTIME_NO_POOL = True.
  • list_nodes() memoization: the result tuple is cached and invalidated by the registry mutators.
  • Idempotent configure() (config.py): a snapshot short-circuit skips repeated identical calls, and a `cat_dir in existing_paths` dedup guard prevents the same model path from being injected twice.
  • load_nodes_from_path mtime cache (registry.py): modules are not re-executed when the source file/directory hasn't changed.
  • SaveImage batch transfer (compat/nodes.py): N per-image .cpu() syncs are collapsed into one fused clamp/mul/cast over the whole batch. Because these ops are not in place, the caller's tensor is left unmodified.
  • Conditional RGBA in LoadImage (compat/nodes.py): alpha presence is detected up front, so RGB inputs skip the forced RGBA conversion and channel slicing.
  • Lanczos torch fast path (compat/comfy/utils.py): torch.nn.functional.interpolate(mode='bicubic', antialias=True) replaces the per-channel PIL round-trip through the CPU, with an automatic PIL fallback on older torch versions.
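The dispatch-caching bullet can be sketched as follows. This is a minimal illustration, not the executor.py code. `dispatch_info` and `get_node_instance` are hypothetical names. The `_is_v3_node` body is a stand-in, since the notes do not describe its logic. Reading the entry method from a `FUNCTION` class attribute is an assumption borrowed from the ComfyUI V1 node convention. Only the `_COMFY_RUNTIME_NO_POOL` opt-out flag comes from the release notes.

```python
import functools
import inspect


# Hypothetical stand-in: the release notes do not describe how the real
# _is_v3_node decides, so this just reads an assumed class attribute.
def _is_v3_node(cls: type) -> bool:
    return bool(getattr(cls, "IS_V3", False))


@functools.lru_cache(maxsize=None)
def dispatch_info(cls: type) -> tuple:
    """Compute (is_v3, is_async) once per class; later calls hit the cache."""
    # Assumption: V1 nodes name their entry method in a FUNCTION attribute
    # (ComfyUI convention); fall back to "execute" otherwise.
    method = getattr(cls, getattr(cls, "FUNCTION", "execute"))
    return _is_v3_node(cls), inspect.iscoroutinefunction(method)


# Per-class singleton pool for V1 node instances.
_instance_pool: dict = {}


def get_node_instance(cls: type) -> object:
    """Return a pooled per-class singleton unless the class opts out."""
    if getattr(cls, "_COMFY_RUNTIME_NO_POOL", False):
        return cls()  # opted out: a fresh instance on every call
    instance = _instance_pool.get(cls)
    if instance is None:
        instance = _instance_pool[cls] = cls()
    return instance
```

Pooling is only safe for nodes that keep no per-call state on `self`, which is why an opt-out flag is needed.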
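The list_nodes() memoization amounts to a cache that every mutator clears. Below is a minimal sketch. It assumes a hypothetical `NodeRegistry` class whose only mutators are `register` and `unregister`; the real registry API is not shown in these notes.

```python
from typing import Optional


class NodeRegistry:
    """Hypothetical registry: list_nodes() is cached until a mutator runs."""

    def __init__(self) -> None:
        self._nodes: dict = {}
        self._list_cache: Optional[tuple] = None

    def _invalidate(self) -> None:
        self._list_cache = None

    def register(self, name: str, cls: type) -> None:
        self._nodes[name] = cls
        self._invalidate()  # every mutator must drop the cached listing

    def unregister(self, name: str) -> None:
        self._nodes.pop(name, None)
        self._invalidate()

    def list_nodes(self) -> tuple:
        # Rebuild only after an invalidation; otherwise return the cached tuple.
        if self._list_cache is None:
            self._list_cache = tuple(sorted(self._nodes))
        return self._list_cache
```

Caching a tuple rather than a list means callers cannot mutate the shared cached result.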
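The idempotent configure() can be illustrated with the sketch below. The signature, the module-level `model_paths` mapping, and the category names are hypothetical. Only the idea of a snapshot short-circuit plus a `cat_dir in existing_paths` guard comes from the notes.

```python
import os
from typing import Optional

# Hypothetical module state standing in for config.py's globals.
model_paths: dict = {}
_last_snapshot: Optional[tuple] = None


def configure(models_dir: str, categories: tuple = ("checkpoints", "vae", "loras")) -> bool:
    """Register model directories; return False if the call was a no-op."""
    global _last_snapshot
    snapshot = (models_dir, tuple(categories))
    if snapshot == _last_snapshot:
        return False  # short-circuit: identical to the previous call
    for category in categories:
        cat_dir = os.path.join(models_dir, category)
        existing_paths = model_paths.setdefault(category, [])
        if cat_dir not in existing_paths:  # dedup guard
            existing_paths.append(cat_dir)
    _last_snapshot = snapshot
    return True
```

The two mechanisms cover different cases. The snapshot makes an exact repeat nearly free. The dedup guard keeps paths unique when a call differs from the previous one but overlaps earlier calls.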
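One way to build an mtime cache like the one in load_nodes_from_path is sketched below. It assumes a hypothetical `load_cached(path, loader)` wrapper that receives the real module-executing loader as a callback. For a directory, the fingerprint covers every .py file beneath it, so adding, removing, or editing a file invalidates the entry.

```python
import os
from typing import Any, Callable


def _fingerprint(path: str) -> tuple:
    """mtime fingerprint of a file, or of every .py file under a directory."""
    if os.path.isdir(path):
        entries = []
        for root, _dirs, files in os.walk(path):
            for name in files:
                if name.endswith(".py"):
                    full = os.path.join(root, name)
                    entries.append((full, os.stat(full).st_mtime_ns))
        return tuple(sorted(entries))  # sorted so walk order doesn't matter
    return ((path, os.stat(path).st_mtime_ns),)


# Hypothetical cache: absolute path -> (fingerprint, loader result)
_load_cache: dict = {}


def load_cached(path: str, loader: Callable[[str], Any]) -> Any:
    """Run `loader` only when the file or directory changed since the last run."""
    key = os.path.abspath(path)
    fingerprint = _fingerprint(key)
    hit = _load_cache.get(key)
    if hit is not None and hit[0] == fingerprint:
        return hit[1]  # unchanged: skip re-executing the module code
    result = loader(key)
    _load_cache[key] = (fingerprint, result)
    return result
```

Comparing `st_mtime_ns` fingerprints is cheap next to re-executing module code. The trade-off is that the cache trusts the filesystem clock: an edit that leaves the timestamp unchanged (for example on a filesystem with coarse mtime resolution) goes undetected.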
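The SaveImage change can be approximated as follows. The sketch assumes (B, H, W, C) float images in [0, 1], following ComfyUI's IMAGE convention. `to_uint8_batch` and the per-image reference `to_uint8_per_image` are hypothetical helpers, not the compat/nodes.py code.

```python
import numpy as np
import torch


def to_uint8_batch(images: torch.Tensor) -> np.ndarray:
    """(B, H, W, C) float in [0, 1] -> uint8 array with a single host sync."""
    # Non-in-place ops (mul/clamp/to) allocate new tensors, so `images`
    # itself is never modified. The math runs on the source device.
    out = images.mul(255.0).clamp(0.0, 255.0).to(torch.uint8)
    return out.cpu().numpy()  # one device-to-host copy for the whole batch


def to_uint8_per_image(images: torch.Tensor) -> list:
    """Per-image reference: one .cpu() sync per image in the batch."""
    return [
        np.clip(255.0 * img.cpu().numpy(), 0, 255).astype(np.uint8)
        for img in images
    ]
```

Both paths truncate toward zero when casting to uint8, so they should agree element for element. The batched one also moves 1 byte per value across the bus instead of 4.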
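Here is a sketch of the conditional RGBA path in LoadImage, using Pillow and NumPy. Several details are assumptions modeled on ComfyUI's LoadImage, not taken from these notes: the helper name, the mode list used for alpha detection, and the mask convention (1 - alpha for images with alpha, a zero mask the size of the image otherwise).

```python
import numpy as np
from PIL import Image

# Modes that can carry an alpha channel (assumed list, not exhaustive).
_ALPHA_MODES = ("RGBA", "LA", "PA")


def load_rgb_and_mask(img: Image.Image) -> tuple:
    """Return (H, W, 3) float32 RGB in [0, 1] and an (H, W) float32 mask."""
    # Detect alpha up front instead of always converting to RGBA.
    has_alpha = img.mode in _ALPHA_MODES or (
        img.mode == "P" and "transparency" in img.info
    )
    if has_alpha:
        rgba = np.asarray(img.convert("RGBA"), dtype=np.float32) / 255.0
        rgb = rgba[..., :3]
        mask = 1.0 - rgba[..., 3]  # opaque pixels -> 0, transparent -> 1
    else:
        # RGB fast path: no RGBA conversion and no channel slicing.
        rgb_img = img if img.mode == "RGB" else img.convert("RGB")
        rgb = np.asarray(rgb_img, dtype=np.float32) / 255.0
        mask = np.zeros(rgb.shape[:2], dtype=np.float32)
    return rgb, mask
```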

Infrastructure

  • A new benchmarks/ package: shared harness primitives in _harness.py, an auto-discovering runner (run_all.py), per-optimization bench_*.py scripts, and compare_optimizations.py for a one-shot A/B comparison.
  • 32 new unit tests across 6 files; all 38 unit tests pass on CPU and CUDA.
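A/B measurements like those in the table can be approximated with the standard library alone. This is a hypothetical sketch of what a harness such as _harness.py or compare_optimizations.py might do, not their actual code; `median_us` and `ab_compare` are invented names.

```python
import statistics
import timeit
from typing import Callable


def median_us(fn: Callable[[], object], repeat: int = 7, number: int = 1000) -> float:
    """Median time per call in microseconds over `repeat` rounds of `number` calls."""
    totals = timeit.repeat(fn, repeat=repeat, number=number)
    return statistics.median(totals) / number * 1e6


def ab_compare(name: str, before: Callable[[], object], after: Callable[[], object],
               repeat: int = 7, number: int = 1000) -> dict:
    """Time the old and new code paths under identical settings."""
    before_us = median_us(before, repeat, number)
    after_us = median_us(after, repeat, number)
    return {
        "hot_path": name,
        "before_us": before_us,
        "after_us": after_us,
        "speedup": before_us / after_us,
    }
```

The median is less sensitive than the mean to one-off stalls such as GC pauses. For CUDA work, the timed callable should call torch.cuda.synchronize() before returning; otherwise only the asynchronous kernel launch is measured.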

Development discipline

Every production change was driven by a failing test (strict TDD: RED → GREEN → VERIFY). Commits are atomic and independently revertible.