Skip to content

fix(directml): correct VRAM detection + make torchaudio imports optional#13898

Open
Emiliooooo12 wants to merge 3 commits into
Comfy-Org:masterfrom
Emiliooooo12:fix/directml-amd-windows
Open

fix(directml): correct VRAM detection + make torchaudio imports optional#13898
Emiliooooo12 wants to merge 3 commits into
Comfy-Org:masterfrom
Emiliooooo12:fix/directml-amd-windows

Conversation

@Emiliooooo12
Copy link
Copy Markdown

Summary

Two independent fixes for AMD GPU users running ComfyUI on Windows with the DirectML backend.

1. Fix hardcoded 1 GB VRAM detection for DirectML (comfy/model_management.py)

Both get_total_memory() and get_free_memory() contained a hardcoded 1024 * 1024 * 1024 #TODO for the DirectML path. This caused ComfyUI to always report 1 GB of VRAM regardless of actual hardware, forcing incorrect memory management decisions (wrong vram_state, undersized model loads, premature offloading).

Fix for get_total_memory:

  • On Windows, reads HardwareInformation.qwMemorySize from the GPU driver registry key via winreg. This is the 64-bit accurate VRAM value — unlike Win32_VideoController.AdapterRAM which silently truncates at 4 GB for cards with more VRAM.
  • Respects a COMFYUI_DIRECTML_VRAM_MB environment variable override for non-standard setups.
  • Falls back to 6 GB if the registry query fails (safe default for modern discrete GPUs).

Fix for get_free_memory:

  • Uses torch_directml.gpu_memory(device_index) to obtain per-tile usage fractions and derives free memory as total_vram * (1.0 - max_usage_fraction).

Before: Total VRAM 1024 MB ... Set vram state to: LOW_VRAM
After: Total VRAM 6128 MB ... Set vram state to: NORMAL_VRAM

Tested on AMD Radeon RX 5600 XT (6 GB VRAM, gfx1010, Windows 10, torch-directml 0.2.5).


2. Make torchaudio imports optional in audio/video nodes

torchaudio has a DLL incompatibility with torch-directml (which ships its own torch runtime). The following files had bare import torchaudio at module level, causing a hard crash at ComfyUI startup when torchaudio is absent or incompatible:

  • comfy/ldm/lightricks/vae/audio_vae.py
  • comfy/audio_encoders/whisper.py
  • comfy/audio_encoders/audio_encoders.py
  • comfy_extras/nodes_audio.py
  • comfy_extras/nodes_lt.py
  • comfy_extras/nodes_wandancer.py

Each import is wrapped in try/except (ImportError, OSError): torchaudio = None, matching the pattern already used in comfy/ldm/mmaudio/vae/autoencoder.py and comfy/ldm/ace/vae/music_dcae_pipeline.py. Audio/video nodes degrade gracefully rather than preventing ComfyUI from starting entirely.


Test plan

  • Start ComfyUI with --directml on an AMD GPU and confirm Total VRAM matches actual hardware (not 1024 MB)
  • Confirm vram_state is set correctly based on real VRAM
  • Uninstall torchaudio and confirm ComfyUI starts without error; audio nodes log a warning instead of crashing
  • Confirm standard CUDA/ROCm paths are unaffected (registry query only runs inside the directml_enabled branch)

…ional

## VRAM Detection (model_management.py)

The DirectML code path had two hardcoded `1024 * 1024 * 1024 #TODO` values
in `get_total_memory()` and `get_free_memory()`, causing ComfyUI to report
only 1 GB of VRAM on any AMD/Intel GPU using the DirectML backend — regardless
of actual hardware. This forced NORMAL_VRAM or LOW_VRAM calculations to be
wildly wrong.

Fix for `get_total_memory`:
- On Windows, reads `HardwareInformation.qwMemorySize` from the GPU driver
  registry key via `winreg`. This is the 64-bit accurate value (unlike
  `Win32_VideoController.AdapterRAM` which overflows at 4 GB).
- Allows override via `COMFYUI_DIRECTML_VRAM_MB` env var.
- Falls back to 6 GB if registry query fails (safe default for modern dGPUs).

Fix for `get_free_memory`:
- Uses `torch_directml.gpu_memory(0)` to get per-tile usage fractions and
  derives free memory as `total * (1 - max_usage_fraction)`.

## torchaudio: optional import on AMD/DirectML

torchaudio has a DLL incompatibility with torch-directml (which ships its own
torch runtime). The following files had bare `import torchaudio` at module
level, crashing ComfyUI startup entirely when torchaudio was absent:

- comfy/ldm/lightricks/vae/audio_vae.py
- comfy/audio_encoders/whisper.py
- comfy/audio_encoders/audio_encoders.py
- comfy_extras/nodes_audio.py
- comfy_extras/nodes_lt.py
- comfy_extras/nodes_wandancer.py

Each import is wrapped in `try/except (ImportError, OSError): torchaudio = None`,
matching the pattern already used in comfy/ldm/mmaudio/vae/autoencoder.py and
comfy/ldm/ace/vae/music_dcae_pipeline.py. Audio nodes will degrade gracefully
rather than preventing ComfyUI from starting.

Tested on: AMD Radeon RX 5600 XT (6 GB VRAM, gfx1010, Windows 10)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 14, 2026

Review Change Stack

📝 Walkthrough

Walkthrough

This PR adds guarded imports of torchaudio across several audio modules and updates DirectML handling: startup logging, total VRAM discovery (env → Windows registry → 6GB fallback), free-memory estimation via torch_directml.gpu_memory(0) with exception fallback, defensive handling for opaque DirectML tensors in mmap/residency and cast_to_gathered, and fixed-step attention-slicing fallbacks when reported free memory is non-positive.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the two main changes: DirectML VRAM detection fixes and making torchaudio imports optional.
Description check ✅ Passed The description comprehensively explains both fixes with technical details, testing guidance, and specific file changes.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (8)
comfy/audio_encoders/whisper.py (1)

22-31: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Runtime guard needed in WhisperFeatureExtractor initialization.

WhisperFeatureExtractor.__init__ instantiates torchaudio.transforms.MelSpectrogram without checking if torchaudio is None. This will fail immediately when the class is instantiated, defeating the purpose of the optional import.

Consider adding a guard:

🛡️ Suggested fix
 class WhisperFeatureExtractor(nn.Module):
     def __init__(self, n_mels=128, device=None):
         super().__init__()
+        if torchaudio is None:
+            raise ImportError("torchaudio is required for WhisperFeatureExtractor but is not installed. "
+                             "Please install torchaudio to use this feature.")
         self.sample_rate = 16000
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@comfy/audio_encoders/whisper.py` around lines 22 - 31,
WhisperFeatureExtractor.__init__ currently constructs self.mel_spectrogram =
torchaudio.transforms.MelSpectrogram(...) unconditionally; add a runtime guard
that first checks torchaudio is not None (the optional import) and only
instantiates the transform when torchaudio is available, otherwise set
self.mel_spectrogram to None (or a safe no-op placeholder) and surface a clear
error later when used; update any places that assume self.mel_spectrogram exists
to raise a descriptive error if it is None.
comfy/ldm/lightricks/vae/audio_vae.py (2)

82-105: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add runtime guard for torchaudio in waveform_to_mel().

AudioPreprocessor.waveform_to_mel() instantiates torchaudio.transforms.MelSpectrogram without checking if torchaudio is None.

🛡️ Suggested fix
 def waveform_to_mel(
     self, waveform: torch.Tensor, waveform_sample_rate: int, device
 ) -> torch.Tensor:
+    if torchaudio is None:
+        raise ImportError("torchaudio is required for mel spectrogram computation but is not installed. "
+                         "Please install torchaudio to use audio features.")
     waveform = self.resample(waveform, waveform_sample_rate)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@comfy/ldm/lightricks/vae/audio_vae.py` around lines 82 - 105, In
waveform_to_mel (AudioPreprocessor.waveform_to_mel) add a runtime guard that
checks torchaudio is available before instantiating
torchaudio.transforms.MelSpectrogram: if torchaudio is None (or import fails),
raise a clear RuntimeError indicating torchaudio is required for waveform_to_mel
and advising how to install it; otherwise proceed to create MelSpectrogram on
the given device and compute mel as before. Ensure the guard is at the top of
waveform_to_mel so the function fails fast with a helpful message.

77-80: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add runtime guard for torchaudio in resample().

AudioPreprocessor.resample() calls torchaudio.functional.resample() without checking if torchaudio is None, which will cause an AttributeError at runtime.

🛡️ Suggested fix
 def resample(self, waveform: torch.Tensor, source_rate: int) -> torch.Tensor:
     if source_rate == self.target_sample_rate:
         return waveform
+    if torchaudio is None:
+        raise ImportError("torchaudio is required for audio resampling but is not installed. "
+                         "Please install torchaudio to use audio features.")
     return torchaudio.functional.resample(waveform, source_rate, self.target_sample_rate)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@comfy/ldm/lightricks/vae/audio_vae.py` around lines 77 - 80, The resample
method calls torchaudio.functional.resample without guarding for torchaudio
being unavailable; update the resample(self, waveform: torch.Tensor,
source_rate: int) method to first check if torchaudio is None and if so raise a
clear RuntimeError (or ImportError) explaining torchaudio is required (e.g.,
"torchaudio not installed; please install torchaudio to use resample"),
otherwise proceed to return torchaudio.functional.resample(waveform,
source_rate, self.target_sample_rate); reference the resample function and
torchaudio.functional.resample in your change.
comfy_extras/nodes_audio.py (3)

773-815: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

AudioEqualizer3Band needs runtime guard for torchaudio.

AudioEqualizer3Band.execute() uses multiple torchaudio.functional biquad filter functions (bass_biquad, equalizer_biquad, treble_biquad) without checking if torchaudio is None.

🛡️ Suggested fix
 `@classmethod`
 def execute(cls, audio, low_gain_dB, low_freq, mid_gain_dB, mid_freq, mid_q, high_gain_dB, high_freq) -> IO.NodeOutput:
     if audio is None:
         return IO.NodeOutput(None)
+    if torchaudio is None:
+        raise ImportError("torchaudio is required for AudioEqualizer3Band but is not installed. "
+                         "Please install torchaudio to use audio equalization features.")
     waveform = audio["waveform"]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@comfy_extras/nodes_audio.py` around lines 773 - 815, The execute method of
AudioEqualizer3Band calls torchaudio.functional functions without verifying
torchaudio is available; add a runtime guard at the top of
AudioEqualizer3Band.execute to check if torchaudio is None and if so immediately
return IO.NodeOutput(audio) (or a clear error via IO.NodeOutput(None) if that
pattern is preferred), so that calls to bass_biquad, equalizer_biquad, and
treble_biquad are only attempted when the torchaudio module is present; update
the method to reference torchaudio only after this guard to avoid AttributeError
at runtime.

528-540: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Helper function match_audio_sample_rates needs torchaudio guard.

The match_audio_sample_rates() helper function calls torchaudio.functional.resample() without checking if torchaudio is None. This is called by JoinAudioChannels, AudioConcat, and AudioMerge nodes.

🛡️ Suggested fix
 def match_audio_sample_rates(waveform_1, sample_rate_1, waveform_2, sample_rate_2):
     if sample_rate_1 != sample_rate_2:
+        if torchaudio is None:
+            raise ImportError("torchaudio is required for audio resampling but is not installed. "
+                             "Please install torchaudio to use audio processing features.")
         if sample_rate_1 > sample_rate_2:
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@comfy_extras/nodes_audio.py` around lines 528 - 540, The helper function
match_audio_sample_rates calls torchaudio.functional.resample without guarding
for torchaudio being unavailable; modify match_audio_sample_rates to check that
the global/module-level torchaudio is not None (or raise/handle a clear error)
before calling torchaudio.functional.resample, and if torchaudio is None either
raise an informative RuntimeError or return a sensible fallback; update logging
to reflect the guarded path so callers like JoinAudioChannels, AudioConcat, and
AudioMerge fail fast with a clear message when torchaudio is not installed.

86-98: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

VAEEncodeAudio needs runtime guard for torchaudio.

VAEEncodeAudio.execute() calls torchaudio.functional.resample() on line 93 without checking if torchaudio is None. This will cause confusing runtime errors for users without torchaudio installed.

🛡️ Suggested fix
 `@classmethod`
 def execute(cls, vae, audio) -> IO.NodeOutput:
     if audio is None:
         raise ValueError("VAEEncodeAudio: input audio is None (source video may have no audio track).")
+    if torchaudio is None:
+        raise ImportError("torchaudio is required for VAEEncodeAudio but is not installed. "
+                         "Please install torchaudio to use audio encoding features.")
     sample_rate = audio["sample_rate"]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@comfy_extras/nodes_audio.py` around lines 86 - 98, VAEEncodeAudio.execute
currently calls torchaudio.functional.resample without verifying torchaudio is
available, which causes unclear runtime crashes; update VAEEncodeAudio.execute
to check for the presence of the torchaudio module before using it (e.g., verify
that a top-level torchaudio symbol or an injected dependency exists) and if
missing raise a clear, actionable error message like "torchaudio is required for
VAEEncodeAudio but is not installed" or provide a documented fallback path;
ensure the resampling branch (where waveform =
torchaudio.functional.resample(...)) is only executed when torchaudio is present
and reference the method name VAEEncodeAudio.execute, the attribute
vae.audio_sample_rate, and the variable waveform so reviewers can locate the
exact change.
comfy_extras/nodes_lt.py (1)

730-738: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

LTXVReferenceAudio needs runtime guard for torchaudio.

LTXVReferenceAudio.execute() calls torchaudio.functional.resample() on line 736 without checking if torchaudio is None. This will cause an AttributeError at runtime when users try to use audio reference features without torchaudio installed.

🛡️ Suggested fix
 `@classmethod`
 def execute(cls, model, positive, negative, reference_audio, audio_vae, identity_guidance_scale, start_percent, end_percent) -> io.NodeOutput:
+    if torchaudio is None:
+        raise ImportError("torchaudio is required for LTXVReferenceAudio but is not installed. "
+                         "Please install torchaudio to use audio reference features.")
     # Encode reference audio to latents and patchify
     sample_rate = reference_audio["sample_rate"]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@comfy_extras/nodes_lt.py` around lines 730 - 738, The execute method of
LTXVReferenceAudio calls torchaudio.functional.resample without ensuring
torchaudio is available; add a runtime guard in the execute method to check that
the module (torchaudio) is not None before calling
torchaudio.functional.resample (or check hasattr(torchaudio, "functional")). If
torchaudio is missing, raise a clear RuntimeError (or ValueError) explaining
that audio reference features require torchaudio and instructing the user to
install it, otherwise proceed with the resample call; reference the execute
method and the use of torchaudio.functional.resample and the variables
sample_rate, vae_sample_rate, and waveform when implementing the guard.
comfy_extras/nodes_wandancer.py (1)

732-749: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

WanDancerEncodeAudio needs runtime guard for torchaudio.

WanDancerEncodeAudio.execute() calls torchaudio.functional.resample() on line 748 without checking if torchaudio is None. This will cause an AttributeError at runtime.

🛡️ Suggested fix
 `@classmethod`
 def execute(cls, video_frames, audio_inject_scale, audio) -> io.NodeOutput:
+    if torchaudio is None:
+        raise ImportError("torchaudio is required for WanDancerEncodeAudio but is not installed. "
+                         "Please install torchaudio to use audio encoding features.")
     waveform = audio["waveform"][0]
     sample_rate = audio["sample_rate"]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@comfy_extras/nodes_wandancer.py` around lines 732 - 749, The execute method
in WanDancerEncodeAudio currently calls torchaudio.functional.resample
unguarded; add a runtime guard in WanDancerEncodeAudio.execute to check that the
module variable torchaudio is not None (or importable) before calling
torchaudio.functional.resample, and handle the missing dependency by raising a
clear error or skipping/rescaling logic (e.g., fallback or informative
exception). Locate the call to torchaudio.functional.resample in
WanDancerEncodeAudio.execute and wrap it with a conditional that either uses the
resample when torchaudio is present or returns/raises a helpful message when it
is absent.
🧹 Nitpick comments (1)
comfy/model_management.py (1)

216-249: ⚡ Quick win

Add logging for VRAM detection path to aid debugging.

The current implementation silently falls through env var → registry → fallback with no indication of which succeeded. Adding a log statement when each path succeeds would help users and maintainers verify correct VRAM detection, especially given the multi-GPU concerns.

📝 Example logging additions
if _override:
    _dml_vram = int(_override) * 1024 * 1024
    logging.info(f"DirectML VRAM override: {_dml_vram // (1024*1024)} MB")
# ... later after registry query succeeds:
    _dml_vram = _mem
    logging.info(f"DirectML VRAM from registry: {_dml_vram // (1024*1024)} MB")
    break
# ... at fallback:
if _dml_vram <= 0:
    _dml_vram = 6 * 1024 * 1024 * 1024
    logging.info("DirectML VRAM registry query failed, using 6GB default")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@comfy/model_management.py` around lines 216 - 249, The VRAM detection
currently silently uses the env var, registry, or a 6GB fallback; add
informative logging where each path sets _dml_vram so users can see which branch
succeeded. Specifically, in the COMFYUI_DIRECTML_VRAM_MB override branch (when
_override is truthy) log the override value in MB; in the registry branch where
_mem is assigned to _dml_vram log the registry-detected VRAM in MB (inside the
winreg loop where _dml_vram = _mem); and in the final fallback (where _dml_vram
is set to 6 * 1024 * 1024 * 1024) log that the registry/env check failed and the
6GB default is being used; use the existing logging module and keep messages
concise and contextual (reference symbols: _dml_vram,
_override/COMFYUI_DIRECTML_VRAM_MB, _mem, mem_total).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@comfy/audio_encoders/audio_encoders.py`:
- Around line 7-10: Add a runtime guard at the start of the encode_audio method
to fail fast when the optional dependency is missing: in the function
encode_audio(self, audio, sample_rate) check if torchaudio is None and raise an
ImportError with a clear message like "torchaudio is required for audio encoding
but is not installed. Please install torchaudio to use this feature." This
prevents calling torchaudio.functional.resample() when torchaudio is missing and
matches the existing pattern used elsewhere (e.g., comfy_api/latest/_ui.py).

In `@comfy/model_management.py`:
- Around line 216-249: The current registry scan fills _dml_vram from the first
adapter >128MB which can mismatch the DirectML device selected by args.directml;
update the logic to locate the registry subkey corresponding to the chosen
DirectML device by comparing the adapter name string (use
torch_directml.device_name(args.directml) or the same name used at logging)
against the registry values (e.g., the subkey or DeviceDescription/DriverDesc
entries) when iterating winreg keys and only accept
HardwareInformation.qwMemorySize for that matched adapter; if no match, fall
back to COMFYUI_DIRECTML_VRAM_MB and then the 6GB default, and ensure
COMFYUI_DIRECTML_VRAM_MB is documented as the override for multi‑GPU setups.
- Around line 1540-1549: The code incorrectly hardcodes
torch_directml.gpu_memory(0) which queries GPU 0 instead of the selected
DirectML device; update the call to use the module-level device_index variable
(declared near the top of the module) so gpu_memory(device_index) is queried,
and ensure the mem_free_total calculation still uses total_vram and _usage_pct
(adjust _total and mem_free_total in the try/except block around the gpu_memory
call that computes _usage_fracs, _usage_pct, _total, and mem_free_total).
- Line 238: The boundary check for VRAM size uses a strict greater-than and will
exclude GPUs reporting exactly 128 MB; in the condition using _mem (inside the
function/method where GPUs are filtered) change the test from "isinstance(_mem,
int) and _mem > 128 * 1024 * 1024" to use ">=" so it reads "isinstance(_mem,
int) and _mem >= 128 * 1024 * 1024" to include devices with exactly 128 MB VRAM.

---

Outside diff comments:
In `@comfy_extras/nodes_audio.py`:
- Around line 773-815: The execute method of AudioEqualizer3Band calls
torchaudio.functional functions without verifying torchaudio is available; add a
runtime guard at the top of AudioEqualizer3Band.execute to check if torchaudio
is None and if so immediately return IO.NodeOutput(audio) (or a clear error via
IO.NodeOutput(None) if that pattern is preferred), so that calls to bass_biquad,
equalizer_biquad, and treble_biquad are only attempted when the torchaudio
module is present; update the method to reference torchaudio only after this
guard to avoid AttributeError at runtime.
- Around line 528-540: The helper function match_audio_sample_rates calls
torchaudio.functional.resample without guarding for torchaudio being
unavailable; modify match_audio_sample_rates to check that the
global/module-level torchaudio is not None (or raise/handle a clear error)
before calling torchaudio.functional.resample, and if torchaudio is None either
raise an informative RuntimeError or return a sensible fallback; update logging
to reflect the guarded path so callers like JoinAudioChannels, AudioConcat, and
AudioMerge fail fast with a clear message when torchaudio is not installed.
- Around line 86-98: VAEEncodeAudio.execute currently calls
torchaudio.functional.resample without verifying torchaudio is available, which
causes unclear runtime crashes; update VAEEncodeAudio.execute to check for the
presence of the torchaudio module before using it (e.g., verify that a top-level
torchaudio symbol or an injected dependency exists) and if missing raise a
clear, actionable error message like "torchaudio is required for VAEEncodeAudio
but is not installed" or provide a documented fallback path; ensure the
resampling branch (where waveform = torchaudio.functional.resample(...)) is only
executed when torchaudio is present and reference the method name
VAEEncodeAudio.execute, the attribute vae.audio_sample_rate, and the variable
waveform so reviewers can locate the exact change.

In `@comfy_extras/nodes_lt.py`:
- Around line 730-738: The execute method of LTXVReferenceAudio calls
torchaudio.functional.resample without ensuring torchaudio is available; add a
runtime guard in the execute method to check that the module (torchaudio) is not
None before calling torchaudio.functional.resample (or check hasattr(torchaudio,
"functional")). If torchaudio is missing, raise a clear RuntimeError (or
ValueError) explaining that audio reference features require torchaudio and
instructing the user to install it, otherwise proceed with the resample call;
reference the execute method and the use of torchaudio.functional.resample and
the variables sample_rate, vae_sample_rate, and waveform when implementing the
guard.

In `@comfy_extras/nodes_wandancer.py`:
- Around line 732-749: The execute method in WanDancerEncodeAudio currently
calls torchaudio.functional.resample unguarded; add a runtime guard in
WanDancerEncodeAudio.execute to check that the module variable torchaudio is not
None (or importable) before calling torchaudio.functional.resample, and handle
the missing dependency by raising a clear error or skipping/rescaling logic
(e.g., fallback or informative exception). Locate the call to
torchaudio.functional.resample in WanDancerEncodeAudio.execute and wrap it with
a conditional that either uses the resample when torchaudio is present or
returns/raises a helpful message when it is absent.

In `@comfy/audio_encoders/whisper.py`:
- Around line 22-31: WhisperFeatureExtractor.__init__ currently constructs
self.mel_spectrogram = torchaudio.transforms.MelSpectrogram(...)
unconditionally; add a runtime guard that first checks torchaudio is not None
(the optional import) and only instantiates the transform when torchaudio is
available, otherwise set self.mel_spectrogram to None (or a safe no-op
placeholder) and surface a clear error later when used; update any places that
assume self.mel_spectrogram exists to raise a descriptive error if it is None.

In `@comfy/ldm/lightricks/vae/audio_vae.py`:
- Around line 82-105: In waveform_to_mel (AudioPreprocessor.waveform_to_mel) add
a runtime guard that checks torchaudio is available before instantiating
torchaudio.transforms.MelSpectrogram: if torchaudio is None (or import fails),
raise a clear RuntimeError indicating torchaudio is required for waveform_to_mel
and advising how to install it; otherwise proceed to create MelSpectrogram on
the given device and compute mel as before. Ensure the guard is at the top of
waveform_to_mel so the function fails fast with a helpful message.
- Around line 77-80: The resample method calls torchaudio.functional.resample
without guarding for torchaudio being unavailable; update the resample(self,
waveform: torch.Tensor, source_rate: int) method to first check if torchaudio is
None and if so raise a clear RuntimeError (or ImportError) explaining torchaudio
is required (e.g., "torchaudio not installed; please install torchaudio to use
resample"), otherwise proceed to return torchaudio.functional.resample(waveform,
source_rate, self.target_sample_rate); reference the resample function and
torchaudio.functional.resample in your change.

---

Nitpick comments:
In `@comfy/model_management.py`:
- Around line 216-249: The VRAM detection currently silently uses the env var,
registry, or a 6GB fallback; add informative logging where each path sets
_dml_vram so users can see which branch succeeded. Specifically, in the
COMFYUI_DIRECTML_VRAM_MB override branch (when _override is truthy) log the
override value in MB; in the registry branch where _mem is assigned to _dml_vram
log the registry-detected VRAM in MB (inside the winreg loop where _dml_vram =
_mem); and in the final fallback (where _dml_vram is set to 6 * 1024 * 1024 *
1024) log that the registry/env check failed and the 6GB default is being used;
use the existing logging module and keep messages concise and contextual
(reference symbols: _dml_vram, _override/COMFYUI_DIRECTML_VRAM_MB, _mem,
mem_total).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 1592ae33-f4e6-4123-a046-ab7ea68b968b

📥 Commits

Reviewing files that changed from the base of the PR and between 1f28908 and e860732.

📒 Files selected for processing (7)
  • comfy/audio_encoders/audio_encoders.py
  • comfy/audio_encoders/whisper.py
  • comfy/ldm/lightricks/vae/audio_vae.py
  • comfy/model_management.py
  • comfy_extras/nodes_audio.py
  • comfy_extras/nodes_lt.py
  • comfy_extras/nodes_wandancer.py

Comment on lines +7 to +10
try:
import torchaudio
except (ImportError, OSError):
torchaudio = None
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for torchaudio usage patterns in existing audio modules
rg -n -A5 -B5 'torchaudio\s*=\s*None' --type=py

Repository: Comfy-Org/ComfyUI

Length of output: 3875


🏁 Script executed:

#!/bin/bash
# Check the encode_audio method implementation around line 43
sed -n '35,55p' comfy/audio_encoders/audio_encoders.py

Repository: Comfy-Org/ComfyUI

Length of output: 927


🏁 Script executed:

#!/bin/bash
# Check if other audio modules have runtime guards for torchaudio usage
rg -n 'if\s+torchaudio\s+is\s+None' --type=py -A 2

Repository: Comfy-Org/ComfyUI

Length of output: 43


🏁 Script executed:

#!/bin/bash
# Search for all torchaudio.functional.resample usage
rg -n 'torchaudio\.functional\.resample' --type=py -B 3 -A 3

Repository: Comfy-Org/ComfyUI

Length of output: 7279


Add runtime guard for optional torchaudio dependency.

The conditional import at lines 7-10 allows the module to load when torchaudio is unavailable, but encode_audio() at line 43 calls torchaudio.functional.resample() without checking if torchaudio is None. This will raise an AttributeError: 'NoneType' object has no attribute 'functional' instead of a clear error message.

The codebase already demonstrates the correct pattern in comfy_api/latest/_ui.py, which validates availability before use. Add a guard at the start of encode_audio():

def encode_audio(self, audio, sample_rate):
    if torchaudio is None:
        raise ImportError("torchaudio is required for audio encoding but is not installed. "
                         "Please install torchaudio to use this feature.")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@comfy/audio_encoders/audio_encoders.py` around lines 7 - 10, Add a runtime
guard at the start of the encode_audio method to fail fast when the optional
dependency is missing: in the function encode_audio(self, audio, sample_rate)
check if torchaudio is None and raise an ImportError with a clear message like
"torchaudio is required for audio encoding but is not installed. Please install
torchaudio to use this feature." This prevents calling
torchaudio.functional.resample() when torchaudio is missing and matches the
existing pattern used elsewhere (e.g., comfy_api/latest/_ui.py).

Comment thread comfy/model_management.py
Comment on lines +216 to +249
# Query real VRAM from Windows registry (qwMemorySize is 64-bit, AdapterRAM caps at 4GB)
# Falls back to COMFYUI_DIRECTML_VRAM_MB env var, then 6GB default
_dml_vram = 0
try:
_override = os.environ.get("COMFYUI_DIRECTML_VRAM_MB")
if _override:
_dml_vram = int(_override) * 1024 * 1024
except Exception:
pass
if _dml_vram <= 0:
try:
import winreg as _winreg
_base = r"SYSTEM\CurrentControlSet\Control\Class\{4d36e968-e325-11ce-bfc1-08002be10318}"
with _winreg.OpenKey(_winreg.HKEY_LOCAL_MACHINE, _base) as _hbase:
_i = 0
while True:
try:
_sub = _winreg.EnumKey(_hbase, _i)
_i += 1
try:
with _winreg.OpenKey(_hbase, _sub) as _hdev:
_mem, _ = _winreg.QueryValueEx(_hdev, "HardwareInformation.qwMemorySize")
if isinstance(_mem, int) and _mem > 128 * 1024 * 1024:
_dml_vram = _mem
break
except Exception:
pass
except OSError:
break
except Exception:
pass
if _dml_vram <= 0:
_dml_vram = 6 * 1024 * 1024 * 1024 # 6GB safe default for modern AMD cards
mem_total = _dml_vram
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Multi-GPU correctness: Registry query ignores selected DirectML device.

The registry enumeration breaks on the first GPU with >128 MB VRAM (line 240), but this may not correspond to the DirectML device selected via args.directml (line 107). On multi-GPU systems, this will report the wrong GPU's VRAM if the selected device isn't the first enumerated adapter.

Consider querying the registry for the specific device by matching against the DirectML device name from torch_directml.device_name(device_index) (already logged at line 112), or document that the env var override COMFYUI_DIRECTML_VRAM_MB should be used on multi-GPU systems.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@comfy/model_management.py` around lines 216 - 249, The current registry scan
fills _dml_vram from the first adapter >128MB which can mismatch the DirectML
device selected by args.directml; update the logic to locate the registry subkey
corresponding to the chosen DirectML device by comparing the adapter name string
(use torch_directml.device_name(args.directml) or the same name used at logging)
against the registry values (e.g., the subkey or DeviceDescription/DriverDesc
entries) when iterating winreg keys and only accept
HardwareInformation.qwMemorySize for that matched adapter; if no match, fall
back to COMFYUI_DIRECTML_VRAM_MB and then the 6GB default, and ensure
COMFYUI_DIRECTML_VRAM_MB is documented as the override for multi‑GPU setups.

Comment thread comfy/model_management.py
try:
with _winreg.OpenKey(_hbase, _sub) as _hdev:
_mem, _ = _winreg.QueryValueEx(_hdev, "HardwareInformation.qwMemorySize")
if isinstance(_mem, int) and _mem > 128 * 1024 * 1024:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Off-by-one boundary check may skip GPUs with exactly 128 MB VRAM.

Line 238 uses > 128 * 1024 * 1024 which would incorrectly skip a GPU reporting exactly 128 MB. Should be >= for correctness.

🔧 Proposed fix
-                                        if isinstance(_mem, int) and _mem > 128 * 1024 * 1024:
+                                        if isinstance(_mem, int) and _mem >= 128 * 1024 * 1024:
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if isinstance(_mem, int) and _mem > 128 * 1024 * 1024:
if isinstance(_mem, int) and _mem >= 128 * 1024 * 1024:
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@comfy/model_management.py` at line 238, The boundary check for VRAM size uses
a strict greater-than and will exclude GPUs reporting exactly 128 MB; in the
condition using _mem (inside the function/method where GPUs are filtered) change
the test from "isinstance(_mem, int) and _mem > 128 * 1024 * 1024" to use ">="
so it reads "isinstance(_mem, int) and _mem >= 128 * 1024 * 1024" to include
devices with exactly 128 MB VRAM.

Comment thread comfy/model_management.py
Comment on lines +1540 to +1549
# gpu_memory(0) returns a list of per-tile usage fractions [0.0–1.0]
# total_vram (module-level) is the registry-queried real VRAM in MB
try:
import torch_directml as _tdml
_usage_fracs = _tdml.gpu_memory(0)
_usage_pct = max(_usage_fracs) if _usage_fracs else 0.0
_total = int(total_vram * 1024 * 1024)
mem_free_total = max(0, int(_total * (1.0 - _usage_pct)))
except Exception:
mem_free_total = int(total_vram * 1024 * 1024)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Hardcoded device index queries wrong GPU's free memory on multi-GPU systems.

Line 1544 always queries torch_directml.gpu_memory(0), but the user may have selected a different DirectML device via args.directml. On multi-GPU systems, this will report device 0's memory usage even when a different device is active.

Use the module-level device_index variable (set at line 107) to query the correct device.

🔧 Proposed fix
             try:
                 import torch_directml as _tdml
-                _usage_fracs = _tdml.gpu_memory(0)
+                _usage_fracs = _tdml.gpu_memory(device_index)
                 _usage_pct = max(_usage_fracs) if _usage_fracs else 0.0
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# gpu_memory(0) returns a list of per-tile usage fractions [0.0–1.0]
# total_vram (module-level) is the registry-queried real VRAM in MB
try:
import torch_directml as _tdml
_usage_fracs = _tdml.gpu_memory(0)
_usage_pct = max(_usage_fracs) if _usage_fracs else 0.0
_total = int(total_vram * 1024 * 1024)
mem_free_total = max(0, int(_total * (1.0 - _usage_pct)))
except Exception:
mem_free_total = int(total_vram * 1024 * 1024)
# gpu_memory(0) returns a list of per-tile usage fractions [0.0–1.0]
# total_vram (module-level) is the registry-queried real VRAM in MB
try:
import torch_directml as _tdml
_usage_fracs = _tdml.gpu_memory(device_index)
_usage_pct = max(_usage_fracs) if _usage_fracs else 0.0
_total = int(total_vram * 1024 * 1024)
mem_free_total = max(0, int(_total * (1.0 - _usage_pct)))
except Exception:
mem_free_total = int(total_vram * 1024 * 1024)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@comfy/model_management.py` around lines 1540 - 1549, The code incorrectly
hardcodes torch_directml.gpu_memory(0) which queries GPU 0 instead of the
selected DirectML device; update the call to use the module-level device_index
variable (declared near the top of the module) so gpu_memory(device_index) is
queried, and ensure the mem_free_total calculation still uses total_vram and
_usage_pct (adjust _total and mem_free_total in the try/except block around the
gpu_memory call that computes _usage_fracs, _usage_pct, _total, and
mem_free_total).

@comfyanonymous
Copy link
Copy Markdown
Member

This is an interesting PR, the only issue I see is that torch directml seems to be unmaintained and the last version is for pytorch 2.5 which should still work in current comfyui but we will likely stop supporting it at some point.

Two runtime crashes affect AMD GPU users on Windows using torch-directml:

1. NotImplementedError in module_mmap_residency / cast_to_gathered
   DirectML tensors are opaque (OpaqueTensorImpl) and do not support
   untyped_storage(). Wrap both call sites in try/except so mmap
   tracking is skipped for DirectML tensors instead of crashing.

2. ZeroDivisionError in attention_split
   DirectML does not expose free VRAM via the standard query path,
   leaving mem_free_total as 0. Guard the math.log() call with a
   floor of 4 GB so split-attention steps are computed safely.

Tested on AMD RX 5600 XT (6 GB VRAM), Windows 11, torch-directml 0.2.5,
ComfyUI 0.21.1, DreamShaper 8 (SD 1.5).

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
comfy/model_management.py (1)

1335-1343: 💤 Low value

Fold the duplicated copy_ call into a single tail statement.

The two branches both end with dest_view.copy_(tensor, non_blocking=non_blocking). Promoting the mmap-marking into the try and letting a single trailing copy_ handle both cases reads a bit more cleanly and removes the risk of the two copies drifting apart later.

♻️ Suggested tidy-up
-            try:
-                storage = tensor._qdata.untyped_storage() if isinstance(tensor, comfy.quant_ops.QuantizedTensor) else tensor.untyped_storage()
-            except NotImplementedError:
-                # DirectML tensors are opaque — skip mmap marking, just copy
-                dest_view.copy_(tensor, non_blocking=non_blocking)
-                continue
-            if hasattr(storage, "_comfy_tensor_mmap_touched"):
-                storage._comfy_tensor_mmap_touched = True
-            dest_view.copy_(tensor, non_blocking=non_blocking)
+            try:
+                storage = tensor._qdata.untyped_storage() if isinstance(tensor, comfy.quant_ops.QuantizedTensor) else tensor.untyped_storage()
+                if hasattr(storage, "_comfy_tensor_mmap_touched"):
+                    storage._comfy_tensor_mmap_touched = True
+            except NotImplementedError:
+                # DirectML tensors are opaque — skip mmap marking
+                pass
+            dest_view.copy_(tensor, non_blocking=non_blocking)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@comfy/model_management.py` around lines 1335 - 1343, The duplicated
dest_view.copy_(tensor, non_blocking=non_blocking) should be unified: in the
block that handles tensor storage (the try that obtains storage via
tensor._qdata.untyped_storage() for comfy.quant_ops.QuantizedTensor or
tensor.untyped_storage()), only perform the storage lookup and conditional
mmap-marking (set storage._comfy_tensor_mmap_touched = True if present); in the
except NotImplementedError branch skip marking (don't call copy_ there), then
move a single trailing dest_view.copy_(tensor, non_blocking=non_blocking) after
the try/except so both cases share that one copy operation. Ensure you reference
the same names (storage, tensor, dest_view, _comfy_tensor_mmap_touched) and keep
the NotImplementedError handling that skips mmap marking.
comfy/ldm/modules/attention.py (1)

339-342: ⚡ Quick win

4GB floor can over-promise on sub-4GB DirectML devices.

A flat 4 GB is fine for the 6 GB Radeon you tested on, but on smaller DirectML targets (older iGPUs, 2–4 GB discrete cards) this lies upward and the chunking math at line 343 will under-step, forcing the OOM-retry loop to do the work that this fallback was meant to avoid. Since get_total_memory(q.device) already returns the registry-derived real total for DirectML, clamping the floor to that value is essentially free and keeps small cards honest.

♻️ Suggested clamp
-    if mem_free_total <= 0:
-        # DirectML doesn't expose free VRAM — assume 4GB free as a safe fallback for 6GB cards
-        mem_free_total = 4 * (1024 ** 3)
+    if mem_free_total <= 0:
+        # DirectML doesn't expose reliable free VRAM — assume up to 4GB,
+        # but never more than the device's actual total.
+        mem_free_total = min(4 * (1024 ** 3), model_management.get_total_memory(q.device))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@comfy/ldm/modules/attention.py` around lines 339 - 342, When mem_free_total
is non-positive for DirectML, don't unconditionally set a 4GB floor; instead
clamp that fallback to the actual device total returned by
get_total_memory(q.device). Replace the current assignment in the attention
module so mem_free_total = min(4 * (1024 ** 3), get_total_memory(q.device)) (or
use the registry-derived total if smaller) so small DirectML devices aren’t
over-promised; keep using the existing mem_free_total symbol and
get_total_memory(q.device) call to locate the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@comfy/ldm/modules/attention.py`:
- Around line 339-342: When mem_free_total is non-positive for DirectML, don't
unconditionally set a 4GB floor; instead clamp that fallback to the actual
device total returned by get_total_memory(q.device). Replace the current
assignment in the attention module so mem_free_total = min(4 * (1024 ** 3),
get_total_memory(q.device)) (or use the registry-derived total if smaller) so
small DirectML devices aren’t over-promised; keep using the existing
mem_free_total symbol and get_total_memory(q.device) call to locate the change.

In `@comfy/model_management.py`:
- Around line 1335-1343: The duplicated dest_view.copy_(tensor,
non_blocking=non_blocking) should be unified: in the block that handles tensor
storage (the try that obtains storage via tensor._qdata.untyped_storage() for
comfy.quant_ops.QuantizedTensor or tensor.untyped_storage()), only perform the
storage lookup and conditional mmap-marking (set
storage._comfy_tensor_mmap_touched = True if present); in the except
NotImplementedError branch skip marking (don't call copy_ there), then move a
single trailing dest_view.copy_(tensor, non_blocking=non_blocking) after the
try/except so both cases share that one copy operation. Ensure you reference the
same names (storage, tensor, dest_view, _comfy_tensor_mmap_touched) and keep the
NotImplementedError handling that skips mmap marking.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 268003a0-789c-4662-b42e-fbc5e0225339

📥 Commits

Reviewing files that changed from the base of the PR and between e860732 and 93510fd.

📒 Files selected for processing (2)
  • comfy/ldm/modules/attention.py
  • comfy/model_management.py

…roDivisionError sites

Improves on the previous directml commit with three research-based refinements:

1. model_management.py — module_mmap_residency() and cast_to_gathered()
   Replace broad try/except NotImplementedError with an explicit
   `t.device.type == 'privateuseone'` guard. Checking device type is
   faster in a hot loop and makes the intent self-documenting.
   Fixes: github.com/Comfy-Org/issues/8347

2. attention.py — attention_split()
   Replace the "assume 4 GB free" heuristic with `steps = 64`.
   64-slice chunking is safe and correct regardless of card size;
   the 4 GB assumption was fragile on cards with less or more VRAM.

3. diffusionmodules/model.py — slice_attention()
   Apply the identical `steps = 64` guard to the second call site
   for the same ZeroDivisionError (was missed in the previous commit).
   Fixes: github.com/Comfy-Org/issues/1518

Tested end-to-end on AMD RX 5600 XT (6 GB VRAM), Windows 11,
torch-directml 0.2.5, ComfyUI 0.21.1, DreamShaper 8 (SD 1.5).
Full 20-step txt2img pipeline completes and returns a valid PNG.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@comfy/model_management.py`:
- Around line 247-249: The _dml_vram fallback is too optimistic: when registry
probing fails, set a conservative lower bound (e.g. 1 * 1024 * 1024 * 1024 for
1GB) instead of 6GB and use that to set mem_total so downstream total_vram and
vram_state/ budgeting use a safe low-VRAM mode; update the code paths that
assign _dml_vram and mem_total (symbols: _dml_vram and mem_total) to enforce
this minimum and ensure total_vram/vram_state logic will treat the GPU as
low-memory.
- Around line 1552-1559: The except block currently assigns full capacity to
mem_free_total on torch_directml probe failure; change it to mirror the
normal-path conservative calculation by assuming a safe default usage percentage
(e.g. usage_pct = 0.5) and computing mem_free_total = max(0, int(total_vram *
1024 * 1024 * (1.0 - usage_pct))). Update the exception handler in the same
try/except that imports torch_directml (symbols: torch_directml, _usage_fracs,
_usage_pct, mem_free_total, total_vram) so failures use a bounded conservative
free-memory estimate instead of full capacity.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c64ed801-e1b8-4122-a8be-e008020fcb76

📥 Commits

Reviewing files that changed from the base of the PR and between 93510fd and 61235fc.

📒 Files selected for processing (3)
  • comfy/ldm/modules/attention.py
  • comfy/ldm/modules/diffusionmodules/model.py
  • comfy/model_management.py

Comment thread comfy/model_management.py
Comment on lines +247 to +249
if _dml_vram <= 0:
_dml_vram = 6 * 1024 * 1024 * 1024 # 6GB safe default for modern AMD cards
mem_total = _dml_vram
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use a conservative VRAM fallback.

If the registry probe fails, hardcoding 6 GB can overestimate lower-memory DirectML GPUs and push total_vram, vram_state, and later load budgeting too high. A lower bound or forcing a low-VRAM mode on this path is safer than optimistic accounting. As per coding guidelines, comfy/** changes should focus on memory management and GPU resource handling.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@comfy/model_management.py` around lines 247 - 249, The _dml_vram fallback is
too optimistic: when registry probing fails, set a conservative lower bound
(e.g. 1 * 1024 * 1024 * 1024 for 1GB) instead of 6GB and use that to set
mem_total so downstream total_vram and vram_state/ budgeting use a safe low-VRAM
mode; update the code paths that assign _dml_vram and mem_total (symbols:
_dml_vram and mem_total) to enforce this minimum and ensure
total_vram/vram_state logic will treat the GPU as low-memory.

Comment thread comfy/model_management.py
Comment on lines +1552 to +1559
try:
import torch_directml as _tdml
_usage_fracs = _tdml.gpu_memory(0)
_usage_pct = max(_usage_fracs) if _usage_fracs else 0.0
_total = int(total_vram * 1024 * 1024)
mem_free_total = max(0, int(_total * (1.0 - _usage_pct)))
except Exception:
mem_free_total = int(total_vram * 1024 * 1024)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

git diff HEAD -- comfy/model_management.py | head -200

Repository: Comfy-Org/ComfyUI

Length of output: 43


🏁 Script executed:

git log --oneline -n 1 -- comfy/model_management.py

Repository: Comfy-Org/ComfyUI

Length of output: 160


🏁 Script executed:

# Get broader context around lines 1552-1559 to understand how mem_free_total is used
sed -n '1540,1580p' comfy/model_management.py

Repository: Comfy-Org/ComfyUI

Length of output: 2105


🏁 Script executed:

# See the full context of the get_free_memory function including downstream usage
sed -n '1530,1650p' comfy/model_management.py

Repository: Comfy-Org/ComfyUI

Length of output: 4168


🏁 Script executed:

# Search for where mem_free_total is used after it's calculated
rg 'mem_free_total' comfy/model_management.py -A 2 -B 2

Repository: Comfy-Org/ComfyUI

Length of output: 2055


Return conservative free memory on DirectML probe failure instead of assuming full capacity is available.

The exception handler currently reports the entire total_vram as free when torch_directml.gpu_memory() fails. This contradicts the normal path, which applies a conservative (1.0 - usage_pct) calculation with bounds checking. Returning full capacity on a telemetry failure can lead to memory overcommitment and OOM errors.

🔧 Proposed fix
            except Exception:
-                mem_free_total = int(total_vram * 1024 * 1024)
+                mem_free_total = 0
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
try:
import torch_directml as _tdml
_usage_fracs = _tdml.gpu_memory(0)
_usage_pct = max(_usage_fracs) if _usage_fracs else 0.0
_total = int(total_vram * 1024 * 1024)
mem_free_total = max(0, int(_total * (1.0 - _usage_pct)))
except Exception:
mem_free_total = int(total_vram * 1024 * 1024)
try:
import torch_directml as _tdml
_usage_fracs = _tdml.gpu_memory(0)
_usage_pct = max(_usage_fracs) if _usage_fracs else 0.0
_total = int(total_vram * 1024 * 1024)
mem_free_total = max(0, int(_total * (1.0 - _usage_pct)))
except Exception:
mem_free_total = 0
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@comfy/model_management.py` around lines 1552 - 1559, The except block
currently assigns full capacity to mem_free_total on torch_directml probe
failure; change it to mirror the normal-path conservative calculation by
assuming a safe default usage percentage (e.g. usage_pct = 0.5) and computing
mem_free_total = max(0, int(total_vram * 1024 * 1024 * (1.0 - usage_pct))).
Update the exception handler in the same try/except that imports torch_directml
(symbols: torch_directml, _usage_fracs, _usage_pct, mem_free_total, total_vram)
so failures use a bounded conservative free-memory estimate instead of full
capacity.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants