v0.3.0 — Dispatch and I/O hot path optimizations
Highlights
Performance pass targeting the non-model-computation hotspots observed across all 7 workflows. Public API signatures unchanged; GPL _vendor/ region untouched.
Measured wins (A/B via benchmarks/compare_optimizations.py, CUDA host)
| Hot path | Before | After | Speedup |
|---|---|---|---|
| execute_node dispatch (EmptyLatentImage 512×512) | 12.09 µs | 3.62 µs | 3.34× |
| configure() repeated identical call | 15.98 µs | 0.38 µs | 42× |
| load_nodes_from_path repeated identical call | 55.48 µs | 5.26 µs | 10.5× |
| SaveImage tensor conversion (batch=8, 512×512) | 4247 µs | 2574 µs | 1.65× |
| LoadImage RGB (256×256) | 1042 µs | 895 µs | 1.16× |
| Lanczos upscale image 256→512 | 4646 µs | 306 µs | 15.2× |
| Lanczos upscale latent 64→128 | 591 µs | 189 µs | 3.12× |
What changed
- Node dispatch caching (`executor.py`): `_is_v3_node`, `inspect.iscoroutinefunction` memoized per class; V1 node instances pooled as per-class singletons. Opt-out via `cls._COMFY_RUNTIME_NO_POOL = True`.
- `list_nodes()` memoize: result tuple cached; invalidated from registry mutators.
- `configure()` idempotent (`config.py`): snapshot short-circuit + `cat_dir in existing_paths` dedup guard eliminates redundant model-path injection.
- `load_nodes_from_path` mtime cache (`registry.py`): modules are not re-executed when the source file/directory hasn't changed.
- SaveImage batch transfer (`compat/nodes.py`): collapses N per-image `.cpu()` syncs into one fused clamp/mul/cast. Non-in-place ops preserve caller tensor.
- LoadImage conditional RGBA (`compat/nodes.py`): RGB inputs skip the forced RGBA conversion + slicing; has-alpha detection upfront.
- Lanczos torch fast path (`compat/comfy/utils.py`): `torch.nn.functional.interpolate(mode='bicubic', antialias=True)` replaces the per-channel PIL CPU round-trip, with automatic PIL fallback on older torch.
Infrastructure
- New `benchmarks/` package with `_harness.py` primitives, `run_all.py` auto-discovery runner, per-optimization `bench_*.py` scripts, and `compare_optimizations.py` one-shot A/B comparison.
- 32 new unit tests (6 files). All 38 unit tests green on CPU and CUDA.
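
For readers unfamiliar with A/B micro-benchmarks, the shape of such a comparison can be sketched with `timeit`. This is an illustrative outline only, not the contents of `_harness.py` or `compare_optimizations.py`; the function names `time_per_call_us` and `compare`, and the default iteration counts, are assumptions.

```python
import timeit
from typing import Callable, Dict


def time_per_call_us(fn: Callable[[], object],
                     number: int = 10_000,
                     repeat: int = 5) -> float:
    """Best-of-`repeat` mean time per call, in microseconds."""
    # Taking the minimum over repeats reduces noise from other processes.
    best = min(timeit.repeat(fn, number=number, repeat=repeat))
    return best / number * 1e6


def compare(before: Callable[[], object],
            after: Callable[[], object],
            **kwargs: int) -> Dict[str, float]:
    """Time both variants and report the speedup as before / after."""
    b = time_per_call_us(before, **kwargs)
    a = time_per_call_us(after, **kwargs)
    return {"before_us": b, "after_us": a, "speedup": b / a}
```

This sketch only covers CPU-side callables. CUDA kernels launch asynchronously, so a harness timing GPU paths would also need to synchronize the device (for example with `torch.cuda.synchronize()`) before reading the clock.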
Development discipline
Every production change was driven by a failing test (strict TDD: RED → GREEN → VERIFY). Commits are atomic and independently revertible.