Summary
The longlive pipeline crashes with a ValueError in VaceEncodingBlock._encode_with_conditioning when vace_input_masks has 3 channels instead of the expected 1. This happens when the video-depth-anything pipeline output is used as vace_input_frames in a graph — the depth map appears to arrive as a 3-channel (RGB) tensor rather than the 1-channel (grayscale) format that VACE requires.
The error fires continuously (160+ times in a single session), so the longlive pipeline fails on every chunk for the duration of the session.
Error Logs (Grafana/Loki — 2026-04-10 16:36–16:38 UTC)
fal_app: github_f1lhgmk5v76a0ev1w0u378by-scope-app--prod
fal_job_id: 42337cb5-17d4-440c-82d5-1e2aa30a485d
session: 3c6a7a72
Error in block: (vace_encoding, VaceEncodingBlock)
Error details: VaceEncodingBlock._encode_with_conditioning: vace_input_masks must have 1 channel, got 3
File "/app/src/scope/core/pipelines/wan2_1/vace/blocks/vace_encoding.py", line 207, in __call__
File "/app/src/scope/core/pipelines/wan2_1/vace/blocks/vace_encoding.py", line 767, in _encode_with_conditioning
ValueError: VaceEncodingBlock._encode_with_conditioning: vace_input_masks must have 1 channel, got 3
Repeated every ~400ms for the full session (160+ occurrences).
Triggering Graph Config
The user had configured the following pipeline graph:
input → video-depth-anything → longlive (vace_input_frames) → rife → output
With params:
vace_enabled: True
vace_use_input_video: True
vace_context_scale: 0.6
The video-depth-anything pipeline output is fed directly to longlive on the vace_input_frames port. The depth pipeline likely produces a 3-channel output (depth map replicated across RGB channels) but VaceEncodingBlock expects a single-channel grayscale mask.
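The replicated-channel hypothesis above can be checked before changing any code. The following is a minimal diagnostic sketch, not part of the scope codebase: describe_channels is a hypothetical helper, and the (B, C, F, H, W) layout is an assumption based on channel index 1 being used elsewhere in this report.

```python
import torch


def describe_channels(frames: torch.Tensor, channel_dim: int = 1) -> str:
    """Classify a tensor as single-channel, replicated grayscale, or distinct channels."""
    n = frames.shape[channel_dim]
    if n == 1:
        return "single-channel"
    first = frames.select(channel_dim, 0)
    # Replicated grayscale means every channel equals the first one.
    replicated = all(
        torch.allclose(first, frames.select(channel_dim, i)) for i in range(1, n)
    )
    if replicated:
        return f"{n}-channel, replicated grayscale"
    return f"{n}-channel, distinct channels"


# Hypothetical depth output: one depth map copied into 3 channels,
# assumed layout (B, C, F, H, W).
depth = torch.rand(1, 1, 12, 8, 8).expand(-1, 3, -1, -1, -1)
print(describe_channels(depth))
```

If the real depth output reports distinct channels (for example a colorized depth map), averaging the channels would not recover the original depth values, and a different conversion would be needed.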
Affected File
src/scope/core/pipelines/wan2_1/vace/blocks/vace_encoding.py
- Line 207: __call__
- Line 767: _encode_with_conditioning
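Note: Distinct from Related Issues
- vace_input_masks shape mismatch: temporal frame count (B,1,13 vs B,1,12). Different dimension.
- Conv2d spatial kernel underflow: different error entirely.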
Suggested Fix
In _encode_with_conditioning (line ~767), add a channel check and auto-convert before the validation raises:
if vace_input_masks.shape[1] != 1:
    # Convert RGB depth map to single-channel by averaging
    vace_input_masks = vace_input_masks.mean(dim=1, keepdim=True)
Alternatively, keep the validation but raise a more helpful error that directs the user to ensure their mask source produces single-channel output.
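A more helpful error could look roughly like the sketch below. The function name check_mask_channels and the message wording are hypothetical; only the "must have 1 channel, got N" phrasing comes from the logged error.

```python
def check_mask_channels(mask_shape, source="vace_input_masks", channel_dim=1):
    """Raise a descriptive ValueError unless the mask has exactly one channel."""
    channels = mask_shape[channel_dim]
    if channels != 1:
        raise ValueError(
            f"{source} must have 1 channel, got {channels} "
            f"(shape {tuple(mask_shape)}). If the mask comes from a depth or RGB "
            "pipeline, convert it to a single-channel tensor before wiring it "
            "into the VACE input."
        )


# Passes silently for a single-channel mask shape.
check_mask_channels((1, 1, 12, 8, 8))
```

The check takes a shape rather than a tensor so it can be called with tensor.shape without importing torch.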
The input should ideally be converted/normalized at the graph edge level (when the depth map is wired to the vace_input_frames port) rather than inside the block itself.
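One way to picture an edge-level conversion is a small per-port adapter table applied when a tensor is delivered to a port. This is only a sketch under assumptions: PORT_ADAPTERS, deliver, and to_single_channel are hypothetical names, the (B, C, F, H, W) layout is assumed, and the logs do not establish which port should actually receive the conversion.

```python
from typing import Callable, Dict

import torch


def to_single_channel(frames: torch.Tensor, channel_dim: int = 1) -> torch.Tensor:
    """Average channels into one; tensors that are already single-channel pass through."""
    if frames.shape[channel_dim] == 1:
        return frames
    return frames.mean(dim=channel_dim, keepdim=True)


# Hypothetical registry: port name -> conversion applied on delivery.
PORT_ADAPTERS: Dict[str, Callable[[torch.Tensor], torch.Tensor]] = {
    "vace_input_frames": to_single_channel,
}


def deliver(port: str, tensor: torch.Tensor) -> torch.Tensor:
    """Apply the port's adapter, if any, before handing the tensor to the consumer."""
    adapter = PORT_ADAPTERS.get(port)
    return adapter(tensor) if adapter is not None else tensor


rgb_depth = torch.ones(1, 3, 12, 8, 8)
print(deliver("vace_input_frames", rgb_depth).shape)
```

Keeping the conversion in a table like this leaves the encoding block's strict validation in place while making the normalization visible in the graph wiring.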
Impact
- Severity: High. The pipeline keeps running, but every chunk fails, so it produces no useful output
- Frequency: 160+ errors within the ~2-minute logged window (2026-04-10 16:36–16:38 UTC), roughly one every 400ms
- Affects: Any user wiring video-depth-anything → longlive with VACE enabled