
Support loading the alpha channel of videos. #13564

Merged
comfyanonymous merged 2 commits into master from temp_pr
Apr 26, 2026

Conversation

@comfyanonymous
Member

Not exposed in nodes yet.

@coderabbitai

coderabbitai Bot commented Apr 26, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 4d1f3da0-e265-407d-a16c-854d765976fb

📥 Commits

Reviewing files that changed from the base of the PR and between ae7434f and 14a5d14.

📒 Files selected for processing (1)
  • comfy_api/latest/_input_impl/video_types.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • comfy_api/latest/_input_impl/video_types.py

📝 Walkthrough


The changes add alpha channel support to video frame decoding. The VideoComponents dataclass is extended with an optional alpha field to store alpha channel data as MaskInput. The video decoding logic now inspects decoded frames for an alpha plane and uses either 3-channel gbrpf32le or 4-channel gbrapf32le conversion accordingly. When alpha is present, the decoded ndarray is split into RGB and alpha tensors and both are stacked over frames; when absent, only the RGB tensor is returned. The returned VideoComponents object is populated with the new alpha field.
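The flow described in the walkthrough can be sketched roughly as follows. This is an illustration only, not the PR's code: `VideoComponentsSketch`, `pick_pixel_format`, and `build_components` are hypothetical names, and nested Python lists stand in for the decoded ndarrays and tensors. Only the two format strings are taken from the walkthrough.

```python
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical, simplified stand-in for VideoComponents; the real class
# stores tensors and has other fields not modelled here.
@dataclass
class VideoComponentsSketch:
    images: List[list]                  # one RGB entry per frame
    alpha: Optional[List[list]] = None  # one alpha entry per frame, or None


def pick_pixel_format(has_alpha: bool) -> str:
    """Choose the conversion format named in the walkthrough."""
    return "gbrapf32le" if has_alpha else "gbrpf32le"


def build_components(frames: List[list], has_alpha: bool) -> VideoComponentsSketch:
    """Split each frame (a list of [R, G, B(, A)] pixels) into RGB and alpha."""
    images: List[list] = []
    alphas: Optional[List[list]] = [] if has_alpha else None
    for frame in frames:
        if alphas is None:
            # No alpha plane: keep the pixels unchanged.
            images.append(frame)
        else:
            # Alpha is assumed to be the last value of each pixel.
            images.append([px[:-1] for px in frame])
            alphas.append([px[-1] for px in frame])
    return VideoComponentsSketch(images=images, alpha=alphas)
```

Leaving `alpha` as `None` for videos without an alpha plane lets callers tell "no alpha" apart from "alpha present but empty".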

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name | Status | Explanation
Title check | ✅ Passed | The title directly and clearly describes the main change: adding support for loading video alpha channels, which matches the core implementation across both modified files.
Description check | ✅ Passed | The description is brief but related to the changeset, noting that alpha channel loading is implemented but not yet exposed in nodes, which accurately reflects the scope of changes.
Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request.



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@comfy_api/latest/_input_impl/video_types.py`:
- Around line 264-269: The alpha channel is produced with a trailing channel dim
so VideoComponents.alpha ends up [B,H,W,1] instead of the MaskInput contract
[B,H,W]; change the per-frame alpha creation from torch.from_numpy(img[...,
-1:]) to torch.from_numpy(img[..., -1]) (or squeeze the last dim) so stacked
alphas are 3D, and update the empty fallback from torch.zeros(0, 0, 0, 1) to
torch.zeros(0, 0, 0) so the shape matches MaskInput; reference the
variables/fields alphas, frames, MaskInput, and VideoComponents.alpha when
applying the change.
- Around line 243-253: The alpha buffer is being reset inside the per-frame loop
causing only the last frame's alpha to survive; in the function handling frame
decoding (the container.decode(video_stream) loop) detect whether the stream has
an alpha channel once before iterating frames (inspect frame.format.components
or better the video_stream pixel format), set alphas = [] and image_format =
'gbrapf32le' if alpha is present before entering the loop, and then inside the
loop only append per-frame alpha data (don't reassign alphas); ensure
start_pts/end_pts and container.seek remain as-is and only the alpha
detection/initialization is hoisted out of the decode loop so alphas accumulates
one entry per frame.
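The second finding can be illustrated with a small, self-contained sketch. It does not reproduce the PR's code: the frames are hypothetical dictionaries, and `decode_buggy` only mimics the behaviour the comment describes (reassigning the list inside the loop), while `decode_hoisted` shows the suggested shape of the fix.

```python
def decode_buggy(frames):
    """Mimics the reported problem: the alpha list is reset on every frame."""
    alphas = None
    for frame in frames:
        if frame["has_alpha"]:
            alphas = []  # bug: reassigned per frame, discarding earlier entries
        if alphas is not None:
            alphas.append(frame["alpha"])
    return alphas


def decode_hoisted(frames, stream_has_alpha):
    """Detects alpha once, before the loop, so the list accumulates."""
    alphas = [] if stream_has_alpha else None
    for frame in frames:
        if alphas is not None:
            alphas.append(frame["alpha"])  # append only, never reassign
    return alphas


# Three hypothetical frames whose alpha payload is just the frame index.
frames = [{"has_alpha": True, "alpha": i} for i in range(3)]
print(decode_buggy(frames))          # [2]
print(decode_hoisted(frames, True))  # [0, 1, 2]
```

With the detection hoisted, the list holds one entry per frame, which is what a later stacking step over frames needs.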

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a0571af9-4633-4c2f-ad3d-beb1bdba52b1

📥 Commits

Reviewing files that changed from the base of the PR and between 5e3f15a and ae7434f.

📒 Files selected for processing (2)
  • comfy_api/latest/_input_impl/video_types.py
  • comfy_api/latest/_util/video_types.py

Comment thread comfy_api/latest/_input_impl/video_types.py Outdated
Comment on lines +264 to +269
                frames.append(torch.from_numpy(img[..., :-1]))
                alphas.append(torch.from_numpy(img[..., -1:]))

        images = torch.stack(frames) if len(frames) > 0 else torch.zeros(0, 0, 0, 3)
        if alphas is not None:
            alphas = torch.stack(alphas) if len(alphas) > 0 else torch.zeros(0, 0, 0, 1)

⚠️ Potential issue | 🟠 Major

Alpha tensor shape doesn't match the MaskInput contract.

MaskInput is documented as [B, H, W] (see comfy_api/latest/_input/basic_types.py:9-12), but the slice img[..., -1:] keeps the channel dimension, so the stacked alpha ends up as [B, H, W, 1]. The empty fallback on line 269 also produces a 4D tensor. Any node that consumes VideoComponents.alpha as a standard mask will get the wrong rank.
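The rank difference is easy to check in isolation. The sketch below uses NumPy arrays as a stand-in for the decoded frames (an assumption made so the example has no PyTorch dependency); slicing and stacking affect the shape the same way with `torch.stack`. The frame size and values are arbitrary.

```python
import numpy as np

# Two fake decoded RGBA frames of shape (H, W, 4) = (2, 3, 4).
frames = [np.zeros((2, 3, 4), dtype=np.float32) for _ in range(2)]

# A slice (-1:) keeps the channel axis; an integer index (-1) drops it.
kept = np.stack([img[..., -1:] for img in frames])
dropped = np.stack([img[..., -1] for img in frames])

print(kept.shape)     # (2, 2, 3, 1)  -> [B, H, W, 1]
print(dropped.shape)  # (2, 2, 3)     -> [B, H, W]
```

Squeezing the last axis of the 4D result (`kept.squeeze(-1)`) gives the same `[B, H, W]` shape, which is the alternative the review comment mentions.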

🛡️ Proposed fix
             img = frame.to_ndarray(format=image_format)  # shape: (H, W, 3) or (H, W, 4) when alpha
             if alphas is None:
                 frames.append(torch.from_numpy(img))
             else:
                 frames.append(torch.from_numpy(img[..., :-1]))
-                alphas.append(torch.from_numpy(img[..., -1:]))
+                alphas.append(torch.from_numpy(img[..., -1]))
 
         images = torch.stack(frames) if len(frames) > 0 else torch.zeros(0, 0, 0, 3)
         if alphas is not None:
-            alphas = torch.stack(alphas) if len(alphas) > 0 else torch.zeros(0, 0, 0, 1)
+            alphas = torch.stack(alphas) if len(alphas) > 0 else torch.zeros(0, 0, 0)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-                frames.append(torch.from_numpy(img[..., :-1]))
-                alphas.append(torch.from_numpy(img[..., -1:]))
-        images = torch.stack(frames) if len(frames) > 0 else torch.zeros(0, 0, 0, 3)
-        if alphas is not None:
-            alphas = torch.stack(alphas) if len(alphas) > 0 else torch.zeros(0, 0, 0, 1)
+            img = frame.to_ndarray(format=image_format)  # shape: (H, W, 3) or (H, W, 4) when alpha
+            if alphas is None:
+                frames.append(torch.from_numpy(img))
+            else:
+                frames.append(torch.from_numpy(img[..., :-1]))
+                alphas.append(torch.from_numpy(img[..., -1]))
+        images = torch.stack(frames) if len(frames) > 0 else torch.zeros(0, 0, 0, 3)
+        if alphas is not None:
+            alphas = torch.stack(alphas) if len(alphas) > 0 else torch.zeros(0, 0, 0)

@comfyanonymous comfyanonymous merged commit df22bcd into master Apr 26, 2026
16 checks passed
@comfyanonymous comfyanonymous deleted the temp_pr branch April 26, 2026 01:03
