fix(directml): correct VRAM detection + make torchaudio imports optional#13898
fix(directml): correct VRAM detection + make torchaudio imports optional#13898Emiliooooo12 wants to merge 3 commits into
Conversation
…ional ## VRAM Detection (model_management.py) The DirectML code path had two hardcoded `1024 * 1024 * 1024 #TODO` values in `get_total_memory()` and `get_free_memory()`, causing ComfyUI to report only 1 GB of VRAM on any AMD/Intel GPU using the DirectML backend — regardless of actual hardware. This forced NORMAL_VRAM or LOW_VRAM calculations to be wildly wrong. Fix for `get_total_memory`: - On Windows, reads `HardwareInformation.qwMemorySize` from the GPU driver registry key via `winreg`. This is the 64-bit accurate value (unlike `Win32_VideoController.AdapterRAM` which overflows at 4 GB). - Allows override via `COMFYUI_DIRECTML_VRAM_MB` env var. - Falls back to 6 GB if registry query fails (safe default for modern dGPUs). Fix for `get_free_memory`: - Uses `torch_directml.gpu_memory(0)` to get per-tile usage fractions and derives free memory as `total * (1 - max_usage_fraction)`. ## torchaudio: optional import on AMD/DirectML torchaudio has a DLL incompatibility with torch-directml (which ships its own torch runtime). The following files had bare `import torchaudio` at module level, crashing ComfyUI startup entirely when torchaudio was absent: - comfy/ldm/lightricks/vae/audio_vae.py - comfy/audio_encoders/whisper.py - comfy/audio_encoders/audio_encoders.py - comfy_extras/nodes_audio.py - comfy_extras/nodes_lt.py - comfy_extras/nodes_wandancer.py Each import is wrapped in `try/except (ImportError, OSError): torchaudio = None`, matching the pattern already used in comfy/ldm/mmaudio/vae/autoencoder.py and comfy/ldm/ace/vae/music_dcae_pipeline.py. Audio nodes will degrade gracefully rather than preventing ComfyUI from starting. Tested on: AMD Radeon RX 5600 XT (6 GB VRAM, gfx1010, Windows 10) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
📝 WalkthroughWalkthroughThis PR adds guarded imports of torchaudio across several audio modules and updates DirectML handling: startup logging, total VRAM discovery (env → Windows registry → 6GB fallback), free-memory estimation via torch_directml.gpu_memory(0) with exception fallback, defensive handling for opaque DirectML tensors in mmap/residency and cast_to_gathered, and fixed-step attention-slicing fallbacks when reported free memory is non-positive. 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (8)
comfy/audio_encoders/whisper.py (1)
22-31:⚠️ Potential issue | 🟠 Major | ⚡ Quick winRuntime guard needed in WhisperFeatureExtractor initialization.
WhisperFeatureExtractor.__init__instantiatestorchaudio.transforms.MelSpectrogramwithout checking iftorchaudio is None. This will fail immediately when the class is instantiated, defeating the purpose of the optional import.Consider adding a guard:
🛡️ Suggested fix
class WhisperFeatureExtractor(nn.Module): def __init__(self, n_mels=128, device=None): super().__init__() + if torchaudio is None: + raise ImportError("torchaudio is required for WhisperFeatureExtractor but is not installed. " + "Please install torchaudio to use this feature.") self.sample_rate = 16000🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@comfy/audio_encoders/whisper.py` around lines 22 - 31, WhisperFeatureExtractor.__init__ currently constructs self.mel_spectrogram = torchaudio.transforms.MelSpectrogram(...) unconditionally; add a runtime guard that first checks torchaudio is not None (the optional import) and only instantiates the transform when torchaudio is available, otherwise set self.mel_spectrogram to None (or a safe no-op placeholder) and surface a clear error later when used; update any places that assume self.mel_spectrogram exists to raise a descriptive error if it is None.comfy/ldm/lightricks/vae/audio_vae.py (2)
82-105:⚠️ Potential issue | 🟠 Major | ⚡ Quick winAdd runtime guard for torchaudio in waveform_to_mel().
AudioPreprocessor.waveform_to_mel()instantiatestorchaudio.transforms.MelSpectrogramwithout checking iftorchaudio is None.🛡️ Suggested fix
def waveform_to_mel( self, waveform: torch.Tensor, waveform_sample_rate: int, device ) -> torch.Tensor: + if torchaudio is None: + raise ImportError("torchaudio is required for mel spectrogram computation but is not installed. " + "Please install torchaudio to use audio features.") waveform = self.resample(waveform, waveform_sample_rate)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@comfy/ldm/lightricks/vae/audio_vae.py` around lines 82 - 105, In waveform_to_mel (AudioPreprocessor.waveform_to_mel) add a runtime guard that checks torchaudio is available before instantiating torchaudio.transforms.MelSpectrogram: if torchaudio is None (or import fails), raise a clear RuntimeError indicating torchaudio is required for waveform_to_mel and advising how to install it; otherwise proceed to create MelSpectrogram on the given device and compute mel as before. Ensure the guard is at the top of waveform_to_mel so the function fails fast with a helpful message.
77-80:⚠️ Potential issue | 🟠 Major | ⚡ Quick winAdd runtime guard for torchaudio in resample().
AudioPreprocessor.resample()callstorchaudio.functional.resample()without checking iftorchaudio is None, which will cause anAttributeErrorat runtime.🛡️ Suggested fix
def resample(self, waveform: torch.Tensor, source_rate: int) -> torch.Tensor: if source_rate == self.target_sample_rate: return waveform + if torchaudio is None: + raise ImportError("torchaudio is required for audio resampling but is not installed. " + "Please install torchaudio to use audio features.") return torchaudio.functional.resample(waveform, source_rate, self.target_sample_rate)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@comfy/ldm/lightricks/vae/audio_vae.py` around lines 77 - 80, The resample method calls torchaudio.functional.resample without guarding for torchaudio being unavailable; update the resample(self, waveform: torch.Tensor, source_rate: int) method to first check if torchaudio is None and if so raise a clear RuntimeError (or ImportError) explaining torchaudio is required (e.g., "torchaudio not installed; please install torchaudio to use resample"), otherwise proceed to return torchaudio.functional.resample(waveform, source_rate, self.target_sample_rate); reference the resample function and torchaudio.functional.resample in your change.comfy_extras/nodes_audio.py (3)
773-815:⚠️ Potential issue | 🟠 Major | ⚡ Quick winAudioEqualizer3Band needs runtime guard for torchaudio.
AudioEqualizer3Band.execute()uses multipletorchaudio.functionalbiquad filter functions (bass_biquad, equalizer_biquad, treble_biquad) without checking iftorchaudio is None.🛡️ Suggested fix
`@classmethod` def execute(cls, audio, low_gain_dB, low_freq, mid_gain_dB, mid_freq, mid_q, high_gain_dB, high_freq) -> IO.NodeOutput: if audio is None: return IO.NodeOutput(None) + if torchaudio is None: + raise ImportError("torchaudio is required for AudioEqualizer3Band but is not installed. " + "Please install torchaudio to use audio equalization features.") waveform = audio["waveform"]🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@comfy_extras/nodes_audio.py` around lines 773 - 815, The execute method of AudioEqualizer3Band calls torchaudio.functional functions without verifying torchaudio is available; add a runtime guard at the top of AudioEqualizer3Band.execute to check if torchaudio is None and if so immediately return IO.NodeOutput(audio) (or a clear error via IO.NodeOutput(None) if that pattern is preferred), so that calls to bass_biquad, equalizer_biquad, and treble_biquad are only attempted when the torchaudio module is present; update the method to reference torchaudio only after this guard to avoid AttributeError at runtime.
528-540:⚠️ Potential issue | 🟠 Major | ⚡ Quick winHelper function match_audio_sample_rates needs torchaudio guard.
The
match_audio_sample_rates()helper function callstorchaudio.functional.resample()without checking iftorchaudio is None. This is called byJoinAudioChannels,AudioConcat, andAudioMergenodes.🛡️ Suggested fix
def match_audio_sample_rates(waveform_1, sample_rate_1, waveform_2, sample_rate_2): if sample_rate_1 != sample_rate_2: + if torchaudio is None: + raise ImportError("torchaudio is required for audio resampling but is not installed. " + "Please install torchaudio to use audio processing features.") if sample_rate_1 > sample_rate_2:🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@comfy_extras/nodes_audio.py` around lines 528 - 540, The helper function match_audio_sample_rates calls torchaudio.functional.resample without guarding for torchaudio being unavailable; modify match_audio_sample_rates to check that the global/module-level torchaudio is not None (or raise/handle a clear error) before calling torchaudio.functional.resample, and if torchaudio is None either raise an informative RuntimeError or return a sensible fallback; update logging to reflect the guarded path so callers like JoinAudioChannels, AudioConcat, and AudioMerge fail fast with a clear message when torchaudio is not installed.
86-98:⚠️ Potential issue | 🟠 Major | ⚡ Quick winVAEEncodeAudio needs runtime guard for torchaudio.
VAEEncodeAudio.execute()callstorchaudio.functional.resample()on line 93 without checking iftorchaudio is None. This will cause confusing runtime errors for users without torchaudio installed.🛡️ Suggested fix
`@classmethod` def execute(cls, vae, audio) -> IO.NodeOutput: if audio is None: raise ValueError("VAEEncodeAudio: input audio is None (source video may have no audio track).") + if torchaudio is None: + raise ImportError("torchaudio is required for VAEEncodeAudio but is not installed. " + "Please install torchaudio to use audio encoding features.") sample_rate = audio["sample_rate"]🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@comfy_extras/nodes_audio.py` around lines 86 - 98, VAEEncodeAudio.execute currently calls torchaudio.functional.resample without verifying torchaudio is available, which causes unclear runtime crashes; update VAEEncodeAudio.execute to check for the presence of the torchaudio module before using it (e.g., verify that a top-level torchaudio symbol or an injected dependency exists) and if missing raise a clear, actionable error message like "torchaudio is required for VAEEncodeAudio but is not installed" or provide a documented fallback path; ensure the resampling branch (where waveform = torchaudio.functional.resample(...)) is only executed when torchaudio is present and reference the method name VAEEncodeAudio.execute, the attribute vae.audio_sample_rate, and the variable waveform so reviewers can locate the exact change.comfy_extras/nodes_lt.py (1)
730-738:⚠️ Potential issue | 🟠 Major | ⚡ Quick winLTXVReferenceAudio needs runtime guard for torchaudio.
LTXVReferenceAudio.execute()callstorchaudio.functional.resample()on line 736 without checking iftorchaudio is None. This will cause anAttributeErrorat runtime when users try to use audio reference features without torchaudio installed.🛡️ Suggested fix
`@classmethod` def execute(cls, model, positive, negative, reference_audio, audio_vae, identity_guidance_scale, start_percent, end_percent) -> io.NodeOutput: + if torchaudio is None: + raise ImportError("torchaudio is required for LTXVReferenceAudio but is not installed. " + "Please install torchaudio to use audio reference features.") # Encode reference audio to latents and patchify sample_rate = reference_audio["sample_rate"]🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@comfy_extras/nodes_lt.py` around lines 730 - 738, The execute method of LTXVReferenceAudio calls torchaudio.functional.resample without ensuring torchaudio is available; add a runtime guard in the execute method to check that the module (torchaudio) is not None before calling torchaudio.functional.resample (or check hasattr(torchaudio, "functional")). If torchaudio is missing, raise a clear RuntimeError (or ValueError) explaining that audio reference features require torchaudio and instructing the user to install it, otherwise proceed with the resample call; reference the execute method and the use of torchaudio.functional.resample and the variables sample_rate, vae_sample_rate, and waveform when implementing the guard.comfy_extras/nodes_wandancer.py (1)
732-749:⚠️ Potential issue | 🟠 Major | ⚡ Quick winWanDancerEncodeAudio needs runtime guard for torchaudio.
WanDancerEncodeAudio.execute()callstorchaudio.functional.resample()on line 748 without checking iftorchaudio is None. This will cause anAttributeErrorat runtime.🛡️ Suggested fix
`@classmethod` def execute(cls, video_frames, audio_inject_scale, audio) -> io.NodeOutput: + if torchaudio is None: + raise ImportError("torchaudio is required for WanDancerEncodeAudio but is not installed. " + "Please install torchaudio to use audio encoding features.") waveform = audio["waveform"][0] sample_rate = audio["sample_rate"]🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@comfy_extras/nodes_wandancer.py` around lines 732 - 749, The execute method in WanDancerEncodeAudio currently calls torchaudio.functional.resample unguarded; add a runtime guard in WanDancerEncodeAudio.execute to check that the module variable torchaudio is not None (or importable) before calling torchaudio.functional.resample, and handle the missing dependency by raising a clear error or skipping/rescaling logic (e.g., fallback or informative exception). Locate the call to torchaudio.functional.resample in WanDancerEncodeAudio.execute and wrap it with a conditional that either uses the resample when torchaudio is present or returns/raises a helpful message when it is absent.
🧹 Nitpick comments (1)
comfy/model_management.py (1)
216-249: ⚡ Quick winAdd logging for VRAM detection path to aid debugging.
The current implementation silently falls through env var → registry → fallback with no indication of which succeeded. Adding a log statement when each path succeeds would help users and maintainers verify correct VRAM detection, especially given the multi-GPU concerns.
📝 Example logging additions
if _override: _dml_vram = int(_override) * 1024 * 1024 logging.info(f"DirectML VRAM override: {_dml_vram // (1024*1024)} MB") # ... later after registry query succeeds: _dml_vram = _mem logging.info(f"DirectML VRAM from registry: {_dml_vram // (1024*1024)} MB") break # ... at fallback: if _dml_vram <= 0: _dml_vram = 6 * 1024 * 1024 * 1024 logging.info("DirectML VRAM registry query failed, using 6GB default")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@comfy/model_management.py` around lines 216 - 249, The VRAM detection currently silently uses the env var, registry, or a 6GB fallback; add informative logging where each path sets _dml_vram so users can see which branch succeeded. Specifically, in the COMFYUI_DIRECTML_VRAM_MB override branch (when _override is truthy) log the override value in MB; in the registry branch where _mem is assigned to _dml_vram log the registry-detected VRAM in MB (inside the winreg loop where _dml_vram = _mem); and in the final fallback (where _dml_vram is set to 6 * 1024 * 1024 * 1024) log that the registry/env check failed and the 6GB default is being used; use the existing logging module and keep messages concise and contextual (reference symbols: _dml_vram, _override/COMFYUI_DIRECTML_VRAM_MB, _mem, mem_total).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@comfy/audio_encoders/audio_encoders.py`:
- Around line 7-10: Add a runtime guard at the start of the encode_audio method
to fail fast when the optional dependency is missing: in the function
encode_audio(self, audio, sample_rate) check if torchaudio is None and raise an
ImportError with a clear message like "torchaudio is required for audio encoding
but is not installed. Please install torchaudio to use this feature." This
prevents calling torchaudio.functional.resample() when torchaudio is missing and
matches the existing pattern used elsewhere (e.g., comfy_api/latest/_ui.py).
In `@comfy/model_management.py`:
- Around line 216-249: The current registry scan fills _dml_vram from the first
adapter >128MB which can mismatch the DirectML device selected by args.directml;
update the logic to locate the registry subkey corresponding to the chosen
DirectML device by comparing the adapter name string (use
torch_directml.device_name(args.directml) or the same name used at logging)
against the registry values (e.g., the subkey or DeviceDescription/DriverDesc
entries) when iterating winreg keys and only accept
HardwareInformation.qwMemorySize for that matched adapter; if no match, fall
back to COMFYUI_DIRECTML_VRAM_MB and then the 6GB default, and ensure
COMFYUI_DIRECTML_VRAM_MB is documented as the override for multi‑GPU setups.
- Around line 1540-1549: The code incorrectly hardcodes
torch_directml.gpu_memory(0) which queries GPU 0 instead of the selected
DirectML device; update the call to use the module-level device_index variable
(declared near the top of the module) so gpu_memory(device_index) is queried,
and ensure the mem_free_total calculation still uses total_vram and _usage_pct
(adjust _total and mem_free_total in the try/except block around the gpu_memory
call that computes _usage_fracs, _usage_pct, _total, and mem_free_total).
- Line 238: The boundary check for VRAM size uses a strict greater-than and will
exclude GPUs reporting exactly 128 MB; in the condition using _mem (inside the
function/method where GPUs are filtered) change the test from "isinstance(_mem,
int) and _mem > 128 * 1024 * 1024" to use ">=" so it reads "isinstance(_mem,
int) and _mem >= 128 * 1024 * 1024" to include devices with exactly 128 MB VRAM.
---
Outside diff comments:
In `@comfy_extras/nodes_audio.py`:
- Around line 773-815: The execute method of AudioEqualizer3Band calls
torchaudio.functional functions without verifying torchaudio is available; add a
runtime guard at the top of AudioEqualizer3Band.execute to check if torchaudio
is None and if so immediately return IO.NodeOutput(audio) (or a clear error via
IO.NodeOutput(None) if that pattern is preferred), so that calls to bass_biquad,
equalizer_biquad, and treble_biquad are only attempted when the torchaudio
module is present; update the method to reference torchaudio only after this
guard to avoid AttributeError at runtime.
- Around line 528-540: The helper function match_audio_sample_rates calls
torchaudio.functional.resample without guarding for torchaudio being
unavailable; modify match_audio_sample_rates to check that the
global/module-level torchaudio is not None (or raise/handle a clear error)
before calling torchaudio.functional.resample, and if torchaudio is None either
raise an informative RuntimeError or return a sensible fallback; update logging
to reflect the guarded path so callers like JoinAudioChannels, AudioConcat, and
AudioMerge fail fast with a clear message when torchaudio is not installed.
- Around line 86-98: VAEEncodeAudio.execute currently calls
torchaudio.functional.resample without verifying torchaudio is available, which
causes unclear runtime crashes; update VAEEncodeAudio.execute to check for the
presence of the torchaudio module before using it (e.g., verify that a top-level
torchaudio symbol or an injected dependency exists) and if missing raise a
clear, actionable error message like "torchaudio is required for VAEEncodeAudio
but is not installed" or provide a documented fallback path; ensure the
resampling branch (where waveform = torchaudio.functional.resample(...)) is only
executed when torchaudio is present and reference the method name
VAEEncodeAudio.execute, the attribute vae.audio_sample_rate, and the variable
waveform so reviewers can locate the exact change.
In `@comfy_extras/nodes_lt.py`:
- Around line 730-738: The execute method of LTXVReferenceAudio calls
torchaudio.functional.resample without ensuring torchaudio is available; add a
runtime guard in the execute method to check that the module (torchaudio) is not
None before calling torchaudio.functional.resample (or check hasattr(torchaudio,
"functional")). If torchaudio is missing, raise a clear RuntimeError (or
ValueError) explaining that audio reference features require torchaudio and
instructing the user to install it, otherwise proceed with the resample call;
reference the execute method and the use of torchaudio.functional.resample and
the variables sample_rate, vae_sample_rate, and waveform when implementing the
guard.
In `@comfy_extras/nodes_wandancer.py`:
- Around line 732-749: The execute method in WanDancerEncodeAudio currently
calls torchaudio.functional.resample unguarded; add a runtime guard in
WanDancerEncodeAudio.execute to check that the module variable torchaudio is not
None (or importable) before calling torchaudio.functional.resample, and handle
the missing dependency by raising a clear error or skipping/rescaling logic
(e.g., fallback or informative exception). Locate the call to
torchaudio.functional.resample in WanDancerEncodeAudio.execute and wrap it with
a conditional that either uses the resample when torchaudio is present or
returns/raises a helpful message when it is absent.
In `@comfy/audio_encoders/whisper.py`:
- Around line 22-31: WhisperFeatureExtractor.__init__ currently constructs
self.mel_spectrogram = torchaudio.transforms.MelSpectrogram(...)
unconditionally; add a runtime guard that first checks torchaudio is not None
(the optional import) and only instantiates the transform when torchaudio is
available, otherwise set self.mel_spectrogram to None (or a safe no-op
placeholder) and surface a clear error later when used; update any places that
assume self.mel_spectrogram exists to raise a descriptive error if it is None.
In `@comfy/ldm/lightricks/vae/audio_vae.py`:
- Around line 82-105: In waveform_to_mel (AudioPreprocessor.waveform_to_mel) add
a runtime guard that checks torchaudio is available before instantiating
torchaudio.transforms.MelSpectrogram: if torchaudio is None (or import fails),
raise a clear RuntimeError indicating torchaudio is required for waveform_to_mel
and advising how to install it; otherwise proceed to create MelSpectrogram on
the given device and compute mel as before. Ensure the guard is at the top of
waveform_to_mel so the function fails fast with a helpful message.
- Around line 77-80: The resample method calls torchaudio.functional.resample
without guarding for torchaudio being unavailable; update the resample(self,
waveform: torch.Tensor, source_rate: int) method to first check if torchaudio is
None and if so raise a clear RuntimeError (or ImportError) explaining torchaudio
is required (e.g., "torchaudio not installed; please install torchaudio to use
resample"), otherwise proceed to return torchaudio.functional.resample(waveform,
source_rate, self.target_sample_rate); reference the resample function and
torchaudio.functional.resample in your change.
---
Nitpick comments:
In `@comfy/model_management.py`:
- Around line 216-249: The VRAM detection currently silently uses the env var,
registry, or a 6GB fallback; add informative logging where each path sets
_dml_vram so users can see which branch succeeded. Specifically, in the
COMFYUI_DIRECTML_VRAM_MB override branch (when _override is truthy) log the
override value in MB; in the registry branch where _mem is assigned to _dml_vram
log the registry-detected VRAM in MB (inside the winreg loop where _dml_vram =
_mem); and in the final fallback (where _dml_vram is set to 6 * 1024 * 1024 *
1024) log that the registry/env check failed and the 6GB default is being used;
use the existing logging module and keep messages concise and contextual
(reference symbols: _dml_vram, _override/COMFYUI_DIRECTML_VRAM_MB, _mem,
mem_total).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 1592ae33-f4e6-4123-a046-ab7ea68b968b
📒 Files selected for processing (7)
comfy/audio_encoders/audio_encoders.pycomfy/audio_encoders/whisper.pycomfy/ldm/lightricks/vae/audio_vae.pycomfy/model_management.pycomfy_extras/nodes_audio.pycomfy_extras/nodes_lt.pycomfy_extras/nodes_wandancer.py
| try: | ||
| import torchaudio | ||
| except (ImportError, OSError): | ||
| torchaudio = None |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Search for torchaudio usage patterns in existing audio modules
rg -n -A5 -B5 'torchaudio\s*=\s*None' --type=pyRepository: Comfy-Org/ComfyUI
Length of output: 3875
🏁 Script executed:
#!/bin/bash
# Check the encode_audio method implementation around line 43
sed -n '35,55p' comfy/audio_encoders/audio_encoders.pyRepository: Comfy-Org/ComfyUI
Length of output: 927
🏁 Script executed:
#!/bin/bash
# Check if other audio modules have runtime guards for torchaudio usage
rg -n 'if\s+torchaudio\s+is\s+None' --type=py -A 2Repository: Comfy-Org/ComfyUI
Length of output: 43
🏁 Script executed:
#!/bin/bash
# Search for all torchaudio.functional.resample usage
rg -n 'torchaudio\.functional\.resample' --type=py -B 3 -A 3Repository: Comfy-Org/ComfyUI
Length of output: 7279
Add runtime guard for optional torchaudio dependency.
The conditional import at lines 7-10 allows the module to load when torchaudio is unavailable, but encode_audio() at line 43 calls torchaudio.functional.resample() without checking if torchaudio is None. This will raise an AttributeError: 'NoneType' object has no attribute 'functional' instead of a clear error message.
The codebase already demonstrates the correct pattern in comfy_api/latest/_ui.py, which validates availability before use. Add a guard at the start of encode_audio():
def encode_audio(self, audio, sample_rate):
if torchaudio is None:
raise ImportError("torchaudio is required for audio encoding but is not installed. "
"Please install torchaudio to use this feature.")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@comfy/audio_encoders/audio_encoders.py` around lines 7 - 10, Add a runtime
guard at the start of the encode_audio method to fail fast when the optional
dependency is missing: in the function encode_audio(self, audio, sample_rate)
check if torchaudio is None and raise an ImportError with a clear message like
"torchaudio is required for audio encoding but is not installed. Please install
torchaudio to use this feature." This prevents calling
torchaudio.functional.resample() when torchaudio is missing and matches the
existing pattern used elsewhere (e.g., comfy_api/latest/_ui.py).
| # Query real VRAM from Windows registry (qwMemorySize is 64-bit, AdapterRAM caps at 4GB) | ||
| # Falls back to COMFYUI_DIRECTML_VRAM_MB env var, then 6GB default | ||
| _dml_vram = 0 | ||
| try: | ||
| _override = os.environ.get("COMFYUI_DIRECTML_VRAM_MB") | ||
| if _override: | ||
| _dml_vram = int(_override) * 1024 * 1024 | ||
| except Exception: | ||
| pass | ||
| if _dml_vram <= 0: | ||
| try: | ||
| import winreg as _winreg | ||
| _base = r"SYSTEM\CurrentControlSet\Control\Class\{4d36e968-e325-11ce-bfc1-08002be10318}" | ||
| with _winreg.OpenKey(_winreg.HKEY_LOCAL_MACHINE, _base) as _hbase: | ||
| _i = 0 | ||
| while True: | ||
| try: | ||
| _sub = _winreg.EnumKey(_hbase, _i) | ||
| _i += 1 | ||
| try: | ||
| with _winreg.OpenKey(_hbase, _sub) as _hdev: | ||
| _mem, _ = _winreg.QueryValueEx(_hdev, "HardwareInformation.qwMemorySize") | ||
| if isinstance(_mem, int) and _mem > 128 * 1024 * 1024: | ||
| _dml_vram = _mem | ||
| break | ||
| except Exception: | ||
| pass | ||
| except OSError: | ||
| break | ||
| except Exception: | ||
| pass | ||
| if _dml_vram <= 0: | ||
| _dml_vram = 6 * 1024 * 1024 * 1024 # 6GB safe default for modern AMD cards | ||
| mem_total = _dml_vram |
There was a problem hiding this comment.
Multi-GPU correctness: Registry query ignores selected DirectML device.
The registry enumeration breaks on the first GPU with >128 MB VRAM (line 240), but this may not correspond to the DirectML device selected via args.directml (line 107). On multi-GPU systems, this will report the wrong GPU's VRAM if the selected device isn't the first enumerated adapter.
Consider querying the registry for the specific device by matching against the DirectML device name from torch_directml.device_name(device_index) (already logged at line 112), or document that the env var override COMFYUI_DIRECTML_VRAM_MB should be used on multi-GPU systems.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@comfy/model_management.py` around lines 216 - 249, The current registry scan
fills _dml_vram from the first adapter >128MB which can mismatch the DirectML
device selected by args.directml; update the logic to locate the registry subkey
corresponding to the chosen DirectML device by comparing the adapter name string
(use torch_directml.device_name(args.directml) or the same name used at logging)
against the registry values (e.g., the subkey or DeviceDescription/DriverDesc
entries) when iterating winreg keys and only accept
HardwareInformation.qwMemorySize for that matched adapter; if no match, fall
back to COMFYUI_DIRECTML_VRAM_MB and then the 6GB default, and ensure
COMFYUI_DIRECTML_VRAM_MB is documented as the override for multi‑GPU setups.
| try: | ||
| with _winreg.OpenKey(_hbase, _sub) as _hdev: | ||
| _mem, _ = _winreg.QueryValueEx(_hdev, "HardwareInformation.qwMemorySize") | ||
| if isinstance(_mem, int) and _mem > 128 * 1024 * 1024: |
There was a problem hiding this comment.
Off-by-one boundary check may skip GPUs with exactly 128 MB VRAM.
Line 238 uses > 128 * 1024 * 1024 which would incorrectly skip a GPU reporting exactly 128 MB. Should be >= for correctness.
🔧 Proposed fix
- if isinstance(_mem, int) and _mem > 128 * 1024 * 1024:
+ if isinstance(_mem, int) and _mem >= 128 * 1024 * 1024:📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if isinstance(_mem, int) and _mem > 128 * 1024 * 1024: | |
| if isinstance(_mem, int) and _mem >= 128 * 1024 * 1024: |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@comfy/model_management.py` at line 238, The boundary check for VRAM size uses
a strict greater-than and will exclude GPUs reporting exactly 128 MB; in the
condition using _mem (inside the function/method where GPUs are filtered) change
the test from "isinstance(_mem, int) and _mem > 128 * 1024 * 1024" to use ">="
so it reads "isinstance(_mem, int) and _mem >= 128 * 1024 * 1024" to include
devices with exactly 128 MB VRAM.
| # gpu_memory(0) returns a list of per-tile usage fractions [0.0–1.0] | ||
| # total_vram (module-level) is the registry-queried real VRAM in MB | ||
| try: | ||
| import torch_directml as _tdml | ||
| _usage_fracs = _tdml.gpu_memory(0) | ||
| _usage_pct = max(_usage_fracs) if _usage_fracs else 0.0 | ||
| _total = int(total_vram * 1024 * 1024) | ||
| mem_free_total = max(0, int(_total * (1.0 - _usage_pct))) | ||
| except Exception: | ||
| mem_free_total = int(total_vram * 1024 * 1024) |
There was a problem hiding this comment.
Hardcoded device index queries wrong GPU's free memory on multi-GPU systems.
Line 1544 always queries torch_directml.gpu_memory(0), but the user may have selected a different DirectML device via args.directml. On multi-GPU systems, this will report device 0's memory usage even when a different device is active.
Use the module-level device_index variable (set at line 107) to query the correct device.
🔧 Proposed fix
try:
import torch_directml as _tdml
- _usage_fracs = _tdml.gpu_memory(0)
+ _usage_fracs = _tdml.gpu_memory(device_index)
_usage_pct = max(_usage_fracs) if _usage_fracs else 0.0📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # gpu_memory(0) returns a list of per-tile usage fractions [0.0–1.0] | |
| # total_vram (module-level) is the registry-queried real VRAM in MB | |
| try: | |
| import torch_directml as _tdml | |
| _usage_fracs = _tdml.gpu_memory(0) | |
| _usage_pct = max(_usage_fracs) if _usage_fracs else 0.0 | |
| _total = int(total_vram * 1024 * 1024) | |
| mem_free_total = max(0, int(_total * (1.0 - _usage_pct))) | |
| except Exception: | |
| mem_free_total = int(total_vram * 1024 * 1024) | |
| # gpu_memory(0) returns a list of per-tile usage fractions [0.0–1.0] | |
| # total_vram (module-level) is the registry-queried real VRAM in MB | |
| try: | |
| import torch_directml as _tdml | |
| _usage_fracs = _tdml.gpu_memory(device_index) | |
| _usage_pct = max(_usage_fracs) if _usage_fracs else 0.0 | |
| _total = int(total_vram * 1024 * 1024) | |
| mem_free_total = max(0, int(_total * (1.0 - _usage_pct))) | |
| except Exception: | |
| mem_free_total = int(total_vram * 1024 * 1024) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@comfy/model_management.py` around lines 1540 - 1549, The code incorrectly
hardcodes torch_directml.gpu_memory(0) which queries GPU 0 instead of the
selected DirectML device; update the call to use the module-level device_index
variable (declared near the top of the module) so gpu_memory(device_index) is
queried, and ensure the mem_free_total calculation still uses total_vram and
_usage_pct (adjust _total and mem_free_total in the try/except block around the
gpu_memory call that computes _usage_fracs, _usage_pct, _total, and
mem_free_total).
|
This is an interesting PR, the only issue I see is that torch directml seems to be unmaintained and the last version is for pytorch 2.5 which should still work in current comfyui but we will likely stop supporting it at some point. |
Two runtime crashes affect AMD GPU users on Windows using torch-directml: 1. NotImplementedError in module_mmap_residency / cast_to_gathered DirectML tensors are opaque (OpaqueTensorImpl) and do not support untyped_storage(). Wrap both call sites in try/except so mmap tracking is skipped for DirectML tensors instead of crashing. 2. ZeroDivisionError in attention_split DirectML does not expose free VRAM via the standard query path, leaving mem_free_total as 0. Guard the math.log() call with a floor of 4 GB so split-attention steps are computed safely. Tested on AMD RX 5600 XT (6 GB VRAM), Windows 11, torch-directml 0.2.5, ComfyUI 0.21.1, DreamShaper 8 (SD 1.5). Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
There was a problem hiding this comment.
🧹 Nitpick comments (2)
comfy/model_management.py (1)
1335-1343: 💤 Low valueFold the duplicated
copy_call into a single tail statement.The two branches both end with
dest_view.copy_(tensor, non_blocking=non_blocking). Promoting the mmap-marking into thetryand letting a single trailingcopy_handle both cases reads a bit more cleanly and removes the risk of the two copies drifting apart later.♻️ Suggested tidy-up
- try: - storage = tensor._qdata.untyped_storage() if isinstance(tensor, comfy.quant_ops.QuantizedTensor) else tensor.untyped_storage() - except NotImplementedError: - # DirectML tensors are opaque — skip mmap marking, just copy - dest_view.copy_(tensor, non_blocking=non_blocking) - continue - if hasattr(storage, "_comfy_tensor_mmap_touched"): - storage._comfy_tensor_mmap_touched = True - dest_view.copy_(tensor, non_blocking=non_blocking) + try: + storage = tensor._qdata.untyped_storage() if isinstance(tensor, comfy.quant_ops.QuantizedTensor) else tensor.untyped_storage() + if hasattr(storage, "_comfy_tensor_mmap_touched"): + storage._comfy_tensor_mmap_touched = True + except NotImplementedError: + # DirectML tensors are opaque — skip mmap marking + pass + dest_view.copy_(tensor, non_blocking=non_blocking)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@comfy/model_management.py` around lines 1335 - 1343, The duplicated dest_view.copy_(tensor, non_blocking=non_blocking) should be unified: in the block that handles tensor storage (the try that obtains storage via tensor._qdata.untyped_storage() for comfy.quant_ops.QuantizedTensor or tensor.untyped_storage()), only perform the storage lookup and conditional mmap-marking (set storage._comfy_tensor_mmap_touched = True if present); in the except NotImplementedError branch skip marking (don't call copy_ there), then move a single trailing dest_view.copy_(tensor, non_blocking=non_blocking) after the try/except so both cases share that one copy operation. Ensure you reference the same names (storage, tensor, dest_view, _comfy_tensor_mmap_touched) and keep the NotImplementedError handling that skips mmap marking.comfy/ldm/modules/attention.py (1)
339-342: ⚡ Quick win4GB floor can over-promise on sub-4GB DirectML devices.
A flat 4 GB is fine for the 6 GB Radeon you tested on, but on smaller DirectML targets (older iGPUs, 2–4 GB discrete cards) this lies upward and the chunking math at line 343 will under-step, forcing the OOM-retry loop to do the work that this fallback was meant to avoid. Since
get_total_memory(q.device)already returns the registry-derived real total for DirectML, clamping the floor to that value is essentially free and keeps small cards honest.♻️ Suggested clamp
- if mem_free_total <= 0: - # DirectML doesn't expose free VRAM — assume 4GB free as a safe fallback for 6GB cards - mem_free_total = 4 * (1024 ** 3) + if mem_free_total <= 0: + # DirectML doesn't expose reliable free VRAM — assume up to 4GB, + # but never more than the device's actual total. + mem_free_total = min(4 * (1024 ** 3), model_management.get_total_memory(q.device))🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@comfy/ldm/modules/attention.py` around lines 339 - 342, When mem_free_total is non-positive for DirectML, don't unconditionally set a 4GB floor; instead clamp that fallback to the actual device total returned by get_total_memory(q.device). Replace the current assignment in the attention module so mem_free_total = min(4 * (1024 ** 3), get_total_memory(q.device)) (or use the registry-derived total if smaller) so small DirectML devices aren’t over-promised; keep using the existing mem_free_total symbol and get_total_memory(q.device) call to locate the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@comfy/ldm/modules/attention.py`:
- Around line 339-342: When mem_free_total is non-positive for DirectML, don't
unconditionally set a 4GB floor; instead clamp that fallback to the actual
device total returned by get_total_memory(q.device). Replace the current
assignment in the attention module so mem_free_total = min(4 * (1024 ** 3),
get_total_memory(q.device)) (or use the registry-derived total if smaller) so
small DirectML devices aren’t over-promised; keep using the existing
mem_free_total symbol and get_total_memory(q.device) call to locate the change.
In `@comfy/model_management.py`:
- Around line 1335-1343: The duplicated dest_view.copy_(tensor,
non_blocking=non_blocking) should be unified: in the block that handles tensor
storage (the try that obtains storage via tensor._qdata.untyped_storage() for
comfy.quant_ops.QuantizedTensor or tensor.untyped_storage()), only perform the
storage lookup and conditional mmap-marking (set
storage._comfy_tensor_mmap_touched = True if present); in the except
NotImplementedError branch skip marking (don't call copy_ there), then move a
single trailing dest_view.copy_(tensor, non_blocking=non_blocking) after the
try/except so both cases share that one copy operation. Ensure you reference the
same names (storage, tensor, dest_view, _comfy_tensor_mmap_touched) and keep the
NotImplementedError handling that skips mmap marking.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 268003a0-789c-4662-b42e-fbc5e0225339
📒 Files selected for processing (2)
comfy/ldm/modules/attention.pycomfy/model_management.py
…roDivisionError sites Improves on the previous directml commit with three research-based refinements: 1. model_management.py — module_mmap_residency() and cast_to_gathered() Replace broad try/except NotImplementedError with an explicit `t.device.type == 'privateuseone'` guard. Checking device type is faster in a hot loop and makes the intent self-documenting. Fixes: github.com/Comfy-Org/issues/8347 2. attention.py — attention_split() Replace the "assume 4 GB free" heuristic with `steps = 64`. 64-slice chunking is safe and correct regardless of card size; the 4 GB assumption was fragile on cards with less or more VRAM. 3. diffusionmodules/model.py — slice_attention() Apply the identical `steps = 64` guard to the second call site for the same ZeroDivisionError (was missed in the previous commit). Fixes: github.com/Comfy-Org/issues/1518 Tested end-to-end on AMD RX 5600 XT (6 GB VRAM), Windows 11, torch-directml 0.2.5, ComfyUI 0.21.1, DreamShaper 8 (SD 1.5). Full 20-step txt2img pipeline completes and returns a valid PNG. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@comfy/model_management.py`:
- Around line 247-249: The _dml_vram fallback is too optimistic: when registry
probing fails, set a conservative lower bound (e.g. 1 * 1024 * 1024 * 1024 for
1GB) instead of 6GB and use that to set mem_total so downstream total_vram and
vram_state/ budgeting use a safe low-VRAM mode; update the code paths that
assign _dml_vram and mem_total (symbols: _dml_vram and mem_total) to enforce
this minimum and ensure total_vram/vram_state logic will treat the GPU as
low-memory.
- Around line 1552-1559: The except block currently assigns full capacity to
mem_free_total on torch_directml probe failure; change it to mirror the
normal-path conservative calculation by assuming a safe default usage percentage
(e.g. usage_pct = 0.5) and computing mem_free_total = max(0, int(total_vram *
1024 * 1024 * (1.0 - usage_pct))). Update the exception handler in the same
try/except that imports torch_directml (symbols: torch_directml, _usage_fracs,
_usage_pct, mem_free_total, total_vram) so failures use a bounded conservative
free-memory estimate instead of full capacity.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: c64ed801-e1b8-4122-a8be-e008020fcb76
📒 Files selected for processing (3)
comfy/ldm/modules/attention.pycomfy/ldm/modules/diffusionmodules/model.pycomfy/model_management.py
| if _dml_vram <= 0: | ||
| _dml_vram = 6 * 1024 * 1024 * 1024 # 6GB safe default for modern AMD cards | ||
| mem_total = _dml_vram |
There was a problem hiding this comment.
Use a conservative VRAM fallback.
If the registry probe fails, hardcoding 6 GB can overestimate lower-memory DirectML GPUs and push total_vram, vram_state, and later load budgeting too high. A lower bound or forcing a low-VRAM mode on this path is safer than optimistic accounting. As per coding guidelines, comfy/** changes should focus on memory management and GPU resource handling.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@comfy/model_management.py` around lines 247 - 249, The _dml_vram fallback is
too optimistic: when registry probing fails, set a conservative lower bound
(e.g. 1 * 1024 * 1024 * 1024 for 1GB) instead of 6GB and use that to set
mem_total so downstream total_vram and vram_state/ budgeting use a safe low-VRAM
mode; update the code paths that assign _dml_vram and mem_total (symbols:
_dml_vram and mem_total) to enforce this minimum and ensure
total_vram/vram_state logic will treat the GPU as low-memory.
| try: | ||
| import torch_directml as _tdml | ||
| _usage_fracs = _tdml.gpu_memory(0) | ||
| _usage_pct = max(_usage_fracs) if _usage_fracs else 0.0 | ||
| _total = int(total_vram * 1024 * 1024) | ||
| mem_free_total = max(0, int(_total * (1.0 - _usage_pct))) | ||
| except Exception: | ||
| mem_free_total = int(total_vram * 1024 * 1024) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
git diff HEAD -- comfy/model_management.py | head -200Repository: Comfy-Org/ComfyUI
Length of output: 43
🏁 Script executed:
git log --oneline -n 1 -- comfy/model_management.pyRepository: Comfy-Org/ComfyUI
Length of output: 160
🏁 Script executed:
# Get broader context around lines 1552-1559 to understand how mem_free_total is used
sed -n '1540,1580p' comfy/model_management.pyRepository: Comfy-Org/ComfyUI
Length of output: 2105
🏁 Script executed:
# See the full context of the get_free_memory function including downstream usage
sed -n '1530,1650p' comfy/model_management.pyRepository: Comfy-Org/ComfyUI
Length of output: 4168
🏁 Script executed:
# Search for where mem_free_total is used after it's calculated
rg 'mem_free_total' comfy/model_management.py -A 2 -B 2Repository: Comfy-Org/ComfyUI
Length of output: 2055
Return conservative free memory on DirectML probe failure instead of assuming full capacity is available.
The exception handler currently reports the entire total_vram as free when torch_directml.gpu_memory() fails. This contradicts the normal path, which applies a conservative (1.0 - usage_pct) calculation with bounds checking. Returning full capacity on a telemetry failure can lead to memory overcommitment and OOM errors.
🔧 Proposed fix
except Exception:
- mem_free_total = int(total_vram * 1024 * 1024)
+ mem_free_total = 0📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| try: | |
| import torch_directml as _tdml | |
| _usage_fracs = _tdml.gpu_memory(0) | |
| _usage_pct = max(_usage_fracs) if _usage_fracs else 0.0 | |
| _total = int(total_vram * 1024 * 1024) | |
| mem_free_total = max(0, int(_total * (1.0 - _usage_pct))) | |
| except Exception: | |
| mem_free_total = int(total_vram * 1024 * 1024) | |
| try: | |
| import torch_directml as _tdml | |
| _usage_fracs = _tdml.gpu_memory(0) | |
| _usage_pct = max(_usage_fracs) if _usage_fracs else 0.0 | |
| _total = int(total_vram * 1024 * 1024) | |
| mem_free_total = max(0, int(_total * (1.0 - _usage_pct))) | |
| except Exception: | |
| mem_free_total = 0 |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@comfy/model_management.py` around lines 1552 - 1559, The except block
currently assigns full capacity to mem_free_total on torch_directml probe
failure; change it to mirror the normal-path conservative calculation by
assuming a safe default usage percentage (e.g. usage_pct = 0.5) and computing
mem_free_total = max(0, int(total_vram * 1024 * 1024 * (1.0 - usage_pct))).
Update the exception handler in the same try/except that imports torch_directml
(symbols: torch_directml, _usage_fracs, _usage_pct, mem_free_total, total_vram)
so failures use a bounded conservative free-memory estimate instead of full
capacity.
Summary
Two independent fixes for AMD GPU users running ComfyUI on Windows with the DirectML backend.
1. Fix hardcoded 1 GB VRAM detection for DirectML (
comfy/model_management.py)Both
get_total_memory()andget_free_memory()contained a hardcoded1024 * 1024 * 1024 #TODOfor the DirectML path. This caused ComfyUI to always report 1 GB of VRAM regardless of actual hardware, forcing incorrect memory management decisions (wrongvram_state, undersized model loads, premature offloading).Fix for
get_total_memory:HardwareInformation.qwMemorySizefrom the GPU driver registry key viawinreg. This is the 64-bit accurate VRAM value — unlikeWin32_VideoController.AdapterRAMwhich silently truncates at 4 GB for cards with more VRAM.COMFYUI_DIRECTML_VRAM_MBenvironment variable override for non-standard setups.Fix for
get_free_memory:torch_directml.gpu_memory(device_index)to obtain per-tile usage fractions and derives free memory astotal_vram * (1.0 - max_usage_fraction).Before:
Total VRAM 1024 MB ... Set vram state to: LOW_VRAMAfter:
Total VRAM 6128 MB ... Set vram state to: NORMAL_VRAMTested on AMD Radeon RX 5600 XT (6 GB VRAM, gfx1010, Windows 10, torch-directml 0.2.5).
2. Make
torchaudioimports optional in audio/video nodestorchaudiohas a DLL incompatibility withtorch-directml(which ships its own torch runtime). The following files had bareimport torchaudioat module level, causing a hard crash at ComfyUI startup when torchaudio is absent or incompatible:comfy/ldm/lightricks/vae/audio_vae.pycomfy/audio_encoders/whisper.pycomfy/audio_encoders/audio_encoders.pycomfy_extras/nodes_audio.pycomfy_extras/nodes_lt.pycomfy_extras/nodes_wandancer.pyEach import is wrapped in
try/except (ImportError, OSError): torchaudio = None, matching the pattern already used incomfy/ldm/mmaudio/vae/autoencoder.pyandcomfy/ldm/ace/vae/music_dcae_pipeline.py. Audio/video nodes degrade gracefully rather than preventing ComfyUI from starting entirely.Test plan
--directmlon an AMD GPU and confirmTotal VRAMmatches actual hardware (not 1024 MB)vram_stateis set correctly based on real VRAMdirectml_enabledbranch)