Fix MPS upscale/VAE failures: enforce contiguous tiled input and safer split-attention chunking#12463
Fix MPS upscale/VAE failures: enforce contiguous tiled input and safer split-attention chunking#12463fritzprix wants to merge 8 commits intoComfy-Org:masterfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR fixes Apple Silicon/MPS runtime failures in upscale and VAE encode workflows by addressing tensor contiguity issues and attention computation limits.
Changes:
- Added
.contiguous()calls to tiled upscaling to ensure memory layout compatibility with model view operations - Improved VAE attention slice sizing with proper ceil-based chunking (fixes incorrect logic for non-evenly-divisible cases)
- Added MPS-specific safeguards to prevent INT_MAX tensor size errors with automatic retry logic
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| comfy/utils.py | Added contiguous() calls before passing tiled tensors to model functions, fixing "view size is not compatible" errors on MPS |
| comfy/ldm/modules/diffusionmodules/model.py | Improved slice_attention chunking with ceil-based sizing, MPS INT_MAX safeguards, and retry logic for MPS-specific errors |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| del s2 | ||
| break | ||
| except RuntimeError as e: | ||
| if q.device.type == "mps" and ("INT_MAX" in str(e) or "MPSGraph" in str(e) or "MPSGaph" in str(e)): |
There was a problem hiding this comment.
There appears to be a typo in the error string check: "MPSGaph" should likely be "MPSGraph". This typo means the error handling won't catch errors containing "MPSGraph" misspelled in this way. Consider removing this misspelled variant unless it's intentionally included to handle a known typo in MPS error messages.
| if q.device.type == "mps" and ("INT_MAX" in str(e) or "MPSGraph" in str(e) or "MPSGaph" in str(e)): | |
| if q.device.type == "mps" and ("INT_MAX" in str(e) or "MPSGraph" in str(e)): |
| del s2 | ||
| break | ||
| except RuntimeError as e: | ||
| if q.device.type == "mps" and ("INT_MAX" in str(e) or "MPSGraph" in str(e) or "MPSGaph" in str(e)): |
There was a problem hiding this comment.
Did you consider moving this complex logic to a new function?
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review infoConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughReplaces integer-based attention slicing with ceil division and a minimum slice size; adds an MPS-specific cap on slice_size derived from q/k shapes to avoid hardware overflow. Introduces MPS-targeted RuntimeError recovery: on certain MPS errors it clears cache, doubles steps, and retries up to a cap (4096), re-raising non-MPS errors. Preserves existing OOM handling (soft cache clear and steps doubling). Also changes two utility call sites to pass tensors as contiguous (.contiguous()). Lines changed across files: ~+13/-1 (plus small util edits). 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Tip Try Coding Plans. Let us write the prompt for your AI agent so you can ship faster (with fewer bugs). Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Test Evidence CheckIf this PR changes user-facing behavior, visual proof (screen recording or screenshot) is required. PRs without applicable visual documentation may not be reviewed until provided. You can add it by:
|
Summary
This PR fixes two Apple Silicon/MPS runtime failures seen in upscale + VAE encode workflows.
Problem
On MPS, some workflows fail with:
view size is not compatible with input tensor's size and strideMPSGraph does not support tensor dims larger than INT_MAXRoot cause
view-compatible memory layout.Changes
s.contiguous(),s_in.contiguous()).slice_attention:Impact / compatibility
Validation
#11851