[fal.ai] longlive/VACE: vace_input_masks must have 1 channel, got 3 — depth map fed as RGB instead of grayscale #908

@livepeer-tessa

Description

Summary

The longlive pipeline crashes with a ValueError in VaceEncodingBlock._encode_with_conditioning when vace_input_masks has 3 channels instead of the expected 1. This happens when the video-depth-anything pipeline output is used as vace_input_frames in a graph — the depth map appears to arrive as a 3-channel (RGB) tensor rather than the 1-channel (grayscale) format that VACE requires.

The error fires continuously (160+ times in a single session), so the longlive pipeline fails on every chunk for the rest of the session.


Error Logs (Grafana/Loki — 2026-04-10 16:36–16:38 UTC)

fal_app: github_f1lhgmk5v76a0ev1w0u378by-scope-app--prod
fal_job_id: 42337cb5-17d4-440c-82d5-1e2aa30a485d
session: 3c6a7a72

Error in block: (vace_encoding, VaceEncodingBlock)
Error details: VaceEncodingBlock._encode_with_conditioning: vace_input_masks must have 1 channel, got 3
  File "/app/src/scope/core/pipelines/wan2_1/vace/blocks/vace_encoding.py", line 207, in __call__
  File "/app/src/scope/core/pipelines/wan2_1/vace/blocks/vace_encoding.py", line 767, in _encode_with_conditioning
ValueError: VaceEncodingBlock._encode_with_conditioning: vace_input_masks must have 1 channel, got 3

Repeated every ~400ms for the full session (160+ occurrences).
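As a rough consistency check on these figures, assuming the ~400 ms interval held steady, 160 occurrences span about a minute. That fits inside the 16:36–16:38 UTC window in the log header. Because the count is "160+", the result is a lower bound.

```python
# Back-of-the-envelope check: how long do 160 errors at ~400 ms intervals take?
interval_s = 0.4   # "Repeated every ~400ms"
occurrences = 160  # "160+ occurrences" (lower bound)

duration_s = interval_s * occurrences
print(f"{duration_s:.0f} s (~{duration_s / 60:.1f} min)")  # 64 s (~1.1 min)
```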


Triggering Graph Config

The user had configured the following pipeline graph:

input → video-depth-anything → longlive (vace_input_frames) → rife → output

With params:

  • vace_enabled: True
  • vace_use_input_video: True
  • vace_context_scale: 0.6

The video-depth-anything output is fed directly into longlive's vace_input_frames port. The depth pipeline most likely emits a 3-channel tensor (the depth map replicated across the RGB channels), whereas VaceEncodingBlock expects a single-channel grayscale mask.
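The following minimal sketch makes the mismatch concrete. It uses NumPy as a stand-in for the torch tensors the pipeline actually uses. It builds a grayscale depth map and the same map replicated across three channels, then runs both through a hypothetical re-implementation of the single-channel check named in the error message. The sketch assumes a channel-first layout with channels on axis 1, as the `shape[1]` in the suggested fix implies. `check_mask_channels` is illustrative only and is not the actual code at line 767 of vace_encoding.py.

```python
import numpy as np


def check_mask_channels(masks: np.ndarray) -> None:
    """Illustrative stand-in for the channel validation named in the error log."""
    # Assumption: channel-first layout with channels on axis 1 (matching shape[1] in the suggested fix).
    if masks.shape[1] != 1:
        raise ValueError(f"vace_input_masks must have 1 channel, got {masks.shape[1]}")


rng = np.random.default_rng(0)
# Grayscale depth map: batch=1, channels=1, frames=4, 8x8 pixels, values in [0, 1).
depth_gray = rng.random((1, 1, 4, 8, 8), dtype=np.float32)
# The same depth map replicated across 3 channels, as an RGB-style output would carry it.
depth_rgb = np.repeat(depth_gray, 3, axis=1)

check_mask_channels(depth_gray)  # passes silently
try:
    check_mask_channels(depth_rgb)
except ValueError as exc:
    print(exc)  # vace_input_masks must have 1 channel, got 3
```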


Affected File

src/scope/core/pipelines/wan2_1/vace/blocks/vace_encoding.py

  • Line 207: __call__
  • Line 767: _encode_with_conditioning

Note: Distinct from Related Issues


Suggested Fix

In _encode_with_conditioning (line ~767), add a channel check and auto-convert before the validation raises:

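if vace_input_masks.shape[1] == 3:
    # Collapse an RGB-replicated depth map (channels on dim 1) to a single channel by averaging;
    # any other channel count still falls through to the existing validation error
    vace_input_masks = vace_input_masks.mean(dim=1, keepdim=True)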

Alternatively, keep the validation but raise a more helpful error that tells the user to make sure their mask source produces single-channel output.
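The two options can also be combined. The sketch below uses NumPy as a stand-in for torch. `normalize_vace_mask` is a hypothetical helper, and the channel-first layout with channels on axis 1 is an assumption. The helper collapses a 3-channel input only when its channels are identical, which is what a replicated grayscale map looks like. Otherwise it raises an error that points the user at the mask source.

```python
import numpy as np


def normalize_vace_mask(masks: np.ndarray, atol: float = 1e-6) -> np.ndarray:
    """Return a 1-channel mask, collapsing a replicated 3-channel map or raising a clear error.

    Hypothetical helper; assumes channels live on axis 1.
    """
    channels = masks.shape[1]
    if channels == 1:
        return masks
    # Collapse only when channels 1 and 2 match channel 0 (a grayscale map copied into RGB);
    # a genuinely colored frame is not a meaningful mask and falls through to the error.
    if channels == 3 and np.allclose(masks[:, 0:1], masks[:, 1:3], atol=atol):
        return masks.mean(axis=1, keepdims=True)
    raise ValueError(
        f"vace_input_masks must have 1 channel, got {channels}. "
        "If this input comes from a depth or RGB pipeline, convert it to a "
        "single-channel grayscale mask before wiring it into VACE."
    )


rng = np.random.default_rng(0)
gray = rng.random((1, 1, 4, 8, 8))
replicated = np.repeat(gray, 3, axis=1)
print(normalize_vace_mask(replicated).shape)  # (1, 1, 4, 8, 8)
```

Checking that the channels match before averaging avoids silently turning an arbitrary color frame into a mask, which an unconditional mean would do. The averaging step itself corresponds to `vace_input_masks.mean(dim=1, keepdim=True)` in the suggested fix.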

Ideally, though, the input would be converted (normalized) at the graph-edge level, when the depth map is wired to the vace_input_frames port, rather than inside the block itself.
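A graph-level conversion could be sketched as a small per-edge adapter registry. Everything here is hypothetical. `EDGE_ADAPTERS`, `apply_edge`, and the `(pipeline, port)` keys are illustrative names invented for this sketch, not existing Scope APIs, and NumPy again stands in for torch. The example keys the adapter on `vace_input_masks`, because that is the tensor the failing check inspects. The graph, however, wires the depth map to `vace_input_frames`. The right port for the conversion therefore depends on how longlive derives its masks from that input.

```python
from typing import Callable, Dict, Tuple

import numpy as np

Adapter = Callable[[np.ndarray], np.ndarray]


def to_single_channel(frames: np.ndarray) -> np.ndarray:
    """Average over the channel axis (assumed to be axis 1); 1-channel input passes through."""
    if frames.shape[1] == 1:
        return frames
    return frames.mean(axis=1, keepdims=True)


# Hypothetical registry: (target pipeline, target port) -> conversion applied on that edge.
EDGE_ADAPTERS: Dict[Tuple[str, str], Adapter] = {
    ("longlive", "vace_input_masks"): to_single_channel,
}


def apply_edge(target: str, port: str, tensor: np.ndarray) -> np.ndarray:
    """Run the adapter registered for this edge, or return the tensor unchanged."""
    adapter = EDGE_ADAPTERS.get((target, port))
    return adapter(tensor) if adapter is not None else tensor


depth_rgb = np.ones((1, 3, 4, 8, 8), dtype=np.float32)
print(apply_edge("longlive", "vace_input_masks", depth_rgb).shape)  # (1, 1, 4, 8, 8)
# Hypothetical port with no registered adapter: the tensor passes through untouched.
print(apply_edge("rife", "video", depth_rgb).shape)  # (1, 3, 4, 8, 8)
```

With the conversion on the edge, VaceEncodingBlock can keep its strict 1-channel validation.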


Impact

  • Severity: High — the pipeline silently keeps running but produces no useful output; every chunk fails
  • Frequency: 160+ errors in a single ~2-minute session (2026-04-10, 16:36–16:38 UTC)
  • Affects: Any user wiring video-depth-anything → longlive with VACE enabled

Metadata

Assignees: No one assigned
Labels: bug (Something isn't working)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
