Release v0.3.0 — Dispatch and I/O hot path optimizations · anyin233/comfy-runtime

Highlights

Performance pass targeting the non-model-computation hotspots observed across all 7 workflows. Public API signatures unchanged; GPL _vendor/ region untouched.

Measured wins (A/B via `benchmarks/compare_optimizations.py`, CUDA host)

| Hot path | Before | After | Speedup |
| --- | --- | --- | --- |
| `execute_node` dispatch (EmptyLatentImage 512×512) | 12.09 µs | 3.62 µs | 3.34× |
| `configure()` repeated identical call | 15.98 µs | 0.38 µs | 42× |
| `load_nodes_from_path` repeated identical call | 55.48 µs | 5.26 µs | 10.5× |
| SaveImage tensor conversion (batch=8, 512×512) | 4247 µs | 2574 µs | 1.65× |
| LoadImage RGB (256×256) | 1042 µs | 895 µs | 1.16× |
| Lanczos upscale image 256→512 | 4646 µs | 306 µs | 15.2× |
| Lanczos upscale latent 64→128 | 591 µs | 189 µs | 3.12× |

What changed

Node dispatch caching (`executor.py`): `_is_v3_node` and `inspect.iscoroutinefunction` results are memoized per class; V1 node instances are pooled as per-class singletons. Opt out by setting `cls._COMFY_RUNTIME_NO_POOL = True`.
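As a rough illustration of the two ideas above (per-class memoization of the coroutine check, and pooled instances with an opt-out flag), here is a minimal sketch. It is not the code in `executor.py`: `is_async_entry`, `get_instance`, `_instance_pool` and the two example node classes are hypothetical names; only `inspect.iscoroutinefunction` and the `_COMFY_RUNTIME_NO_POOL` attribute come from the notes.

```python
import inspect
from functools import lru_cache


@lru_cache(maxsize=None)
def is_async_entry(cls, func_name):
    """Memoized per (class, method name): is the entry point a coroutine function?"""
    return inspect.iscoroutinefunction(getattr(cls, func_name))


_instance_pool = {}  # hypothetical pool: node class -> shared instance


def get_instance(cls):
    """Return a pooled per-class singleton unless the class opts out."""
    if getattr(cls, "_COMFY_RUNTIME_NO_POOL", False):
        return cls()  # opt-out: fresh instance on every call
    try:
        return _instance_pool[cls]
    except KeyError:
        instance = _instance_pool[cls] = cls()
        return instance


class StatelessNode:  # hypothetical node that is safe to share
    def run(self):
        return "ok"


class StatefulNode:  # hypothetical node that keeps per-call state
    _COMFY_RUNTIME_NO_POOL = True

    async def run(self):
        return "ok"
```

Pooling only pays off for nodes that hold no per-call state, which is presumably why an opt-out flag exists for the rest.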
`list_nodes()` memoization: the result tuple is cached and invalidated by the registry mutators.
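The cache-and-invalidate pattern can be sketched as follows. `NodeRegistry`, `register` and `unregister` are hypothetical stand-ins for the real registry and its mutators; only the idea of caching an immutable tuple and clearing it on mutation is taken from the note above.

```python
class NodeRegistry:
    """Hypothetical registry; only the caching pattern is the point."""

    def __init__(self):
        self._nodes = {}
        self._names_cache = None  # cached tuple; None means "stale"

    def register(self, name, cls):
        self._nodes[name] = cls
        self._names_cache = None  # every mutator invalidates the cache

    def unregister(self, name):
        self._nodes.pop(name, None)
        self._names_cache = None

    def list_nodes(self):
        if self._names_cache is None:
            # Rebuild once; a tuple is safe to hand out repeatedly
            # because callers cannot mutate it.
            self._names_cache = tuple(sorted(self._nodes))
        return self._names_cache
```

Repeated calls between mutations return the same tuple object, so the cost of building the listing is paid once per registry change rather than once per call.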
`configure()` idempotent (`config.py`): a snapshot short-circuit plus a `cat_dir in existing_paths` dedup guard eliminates redundant model-path injection.
`load_nodes_from_path` mtime cache (`registry.py`): modules are not re-executed when the source file/directory hasn't changed.
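An mtime-keyed module cache can be sketched with the standard library alone. This is an illustration of the technique rather than the code in `registry.py`: `load_module_cached` and `_loaded` are hypothetical names, and the sketch covers a single source file only, not a directory.

```python
import importlib.util
import os

_loaded = {}  # hypothetical cache: absolute path -> (mtime_ns, module)


def load_module_cached(path):
    """Execute the module at `path` only if its mtime changed since last load."""
    path = os.path.abspath(path)
    mtime = os.stat(path).st_mtime_ns
    cached = _loaded.get(path)
    if cached is not None and cached[0] == mtime:
        return cached[1]  # source unchanged: reuse the executed module

    name = "cached_" + os.path.splitext(os.path.basename(path))[0]
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # the expensive step being avoided
    _loaded[path] = (mtime, module)
    return module
```

A cache hit costs one `os.stat` call instead of re-running the module body. The trade-off is that an edit which leaves the mtime untouched would go unnoticed.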
SaveImage batch transfer (`compat/nodes.py`): collapses N per-image `.cpu()` syncs into one fused clamp/mul/cast. The ops are out-of-place, so the caller's tensor is left unmodified.
LoadImage conditional RGBA (compat/nodes.py): RGB inputs skip the forced RGBA conversion + slicing; has-alpha detection upfront.
Lanczos torch fast path (`compat/comfy/utils.py`): `torch.nn.functional.interpolate(mode='bicubic', antialias=True)` replaces the per-channel PIL CPU round-trip, with automatic PIL fallback on older torch.

Infrastructure

New benchmarks/ package with _harness.py primitives, run_all.py auto-discovery runner, per-optimization bench_*.py scripts, and compare_optimizations.py one-shot A/B comparison.
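For readers who want to reproduce an A/B comparison by hand, the sketch below shows one way to time two callables and derive a speedup (before divided by after). It uses only the standard-library `timeit` module; `median_us` and `compare` are hypothetical helpers and are not the primitives in `_harness.py`.

```python
import timeit


def median_us(fn, repeat=5, number=1000):
    """Median per-call time in microseconds over `repeat` runs of `number` calls."""
    runs = sorted(timeit.repeat(fn, repeat=repeat, number=number))
    return runs[len(runs) // 2] / number * 1e6


def compare(before, after, **kwargs):
    """Return (before_us, after_us, speedup) for two zero-argument callables."""
    t_before = median_us(before, **kwargs)
    t_after = median_us(after, **kwargs)
    return t_before, t_after, t_before / t_after
```

Taking the median over several runs rather than a single measurement reduces the influence of scheduler noise. Timing GPU work would additionally require synchronizing the device before reading the clock, which this sketch does not attempt.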
32 new unit tests (6 files). All 38 unit tests green on CPU and CUDA.

Development discipline

Every production change was driven by a failing test (strict TDD: RED → GREEN → VERIFY). Commits are atomic and independently revertible.
