Support loading the alpha channel of videos. #13564
Conversation
Not exposed in nodes yet.
📝 Walkthrough
The changes add alpha channel support to video frame decoding.

🚥 Pre-merge checks: ✅ 4 passed · ❌ 1 failed (1 warning)
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@comfy_api/latest/_input_impl/video_types.py`:
- Around lines 264-269: The alpha channel is produced with a trailing channel dim, so `VideoComponents.alpha` ends up `[B, H, W, 1]` instead of the `[B, H, W]` shape the `MaskInput` contract requires. Change the per-frame alpha creation from `torch.from_numpy(img[..., -1:])` to `torch.from_numpy(img[..., -1])` (or squeeze the last dim) so the stacked alphas are 3D, and update the empty fallback from `torch.zeros(0, 0, 0, 1)` to `torch.zeros(0, 0, 0)` so the shape matches `MaskInput`.
- Around lines 243-253: The alpha buffer is reset inside the per-frame loop, so only the last frame's alpha survives. In the `container.decode(video_stream)` loop, detect whether the stream has an alpha channel once before iterating frames (inspect `frame.format.components`, or better, the `video_stream` pixel format), set `alphas = []` and `image_format = 'gbrapf32le'` when alpha is present before entering the loop, and then inside the loop only append per-frame alpha data (don't reassign `alphas`). Keep `start_pts`/`end_pts` and `container.seek` as-is; only the alpha detection/initialization is hoisted out of the decode loop, so `alphas` accumulates one entry per frame.
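The reset-in-loop bug the second comment describes can be reproduced with a standalone toy sketch (made-up frame shapes, not the PR's actual decoder):

```python
import numpy as np
import torch

# Toy stand-ins for decoded RGBA frames: three (H, W, 4) float arrays.
decoded = [np.random.rand(4, 6, 4).astype(np.float32) for _ in range(3)]

# Buggy pattern: `alphas` is reassigned on every iteration, so only the
# last frame's alpha survives the loop.
alphas = None
for img in decoded:
    alphas = []
    alphas.append(torch.from_numpy(img[..., -1]))
print(len(alphas))  # 1

# Fixed pattern: initialize the buffer once before the loop, then only
# append per-frame data inside it.
alphas = []
for img in decoded:
    alphas.append(torch.from_numpy(img[..., -1]))
print(len(alphas))  # 3
```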
ℹ️ Review info: Configuration: .coderabbit.yaml · Review profile: CHILL · Plan: Pro · Run ID: a0571af9-4633-4c2f-ad3d-beb1bdba52b1
📒 Files selected for processing (2)
- comfy_api/latest/_input_impl/video_types.py
- comfy_api/latest/_util/video_types.py
```python
frames.append(torch.from_numpy(img[..., :-1]))
alphas.append(torch.from_numpy(img[..., -1:]))
# ...
images = torch.stack(frames) if len(frames) > 0 else torch.zeros(0, 0, 0, 3)
if alphas is not None:
    alphas = torch.stack(alphas) if len(alphas) > 0 else torch.zeros(0, 0, 0, 1)
```
Alpha tensor shape doesn't match the MaskInput contract.
`MaskInput` is documented as `[B, H, W]` (see comfy_api/latest/_input/basic_types.py:9-12), but the slice `img[..., -1:]` keeps the channel dimension, so the stacked alpha ends up as `[B, H, W, 1]`. The empty fallback on line 269 also produces a 4D tensor. Any node that consumes `VideoComponents.alpha` as a standard mask will get the wrong rank.
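The rank mismatch is easy to see in isolation; a toy sketch with made-up dimensions (not the PR's code):

```python
import numpy as np
import torch

# One fake decoded RGBA frame per batch entry: (H, W, 4) float32.
imgs = [np.random.rand(4, 6, 4).astype(np.float32) for _ in range(3)]

# `img[..., -1:]` keeps the trailing channel dim -> [B, H, W, 1].
bad = torch.stack([torch.from_numpy(img[..., -1:]) for img in imgs])
# `img[..., -1]` drops it -> [B, H, W], matching the MaskInput contract.
good = torch.stack([torch.from_numpy(img[..., -1]) for img in imgs])

print(tuple(bad.shape))   # (3, 4, 6, 1)
print(tuple(good.shape))  # (3, 4, 6)
```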
🛡️ Proposed fix

```diff
  img = frame.to_ndarray(format=image_format)  # shape: (H, W, 3) or (H, W, 4) when alpha
  if alphas is None:
      frames.append(torch.from_numpy(img))
  else:
      frames.append(torch.from_numpy(img[..., :-1]))
-     alphas.append(torch.from_numpy(img[..., -1:]))
+     alphas.append(torch.from_numpy(img[..., -1]))
  images = torch.stack(frames) if len(frames) > 0 else torch.zeros(0, 0, 0, 3)
  if alphas is not None:
-     alphas = torch.stack(alphas) if len(alphas) > 0 else torch.zeros(0, 0, 0, 1)
+     alphas = torch.stack(alphas) if len(alphas) > 0 else torch.zeros(0, 0, 0)
```