VRAM usage limited? (96 GB VRAM on Ryzen AI Max+ 395 with ROCm 7.2) #14297
Replies: 9 comments 4 replies
-
Howdy, amigo. It doesn’t look like ComfyUI itself is imposing a VRAM cap. The logs show the error comes from PyTorch’s HIPCachingAllocator under ROCm. Current PyTorch/ROCm builds often hit a ~50 GB per‑process allocation ceiling, even if the GPU has more VRAM available. That’s why you can reach 88 GB with other apps but ComfyUI (via PyTorch) stops around 58 GB. Possible workarounds:
Suggested reading:
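To measure the per-process ceiling described above on this machine, rather than rely on the ~50 GB figure, one option is to have PyTorch grab 1 GiB blocks until allocation fails. This is a minimal sketch that assumes a ROCm build of PyTorch (where the `torch.cuda` API is backed by HIP); `probe_total_gib` is a hypothetical helper name. The probe temporarily claims as much device memory as it can, so run it with ComfyUI and Ollama stopped.

```python
import sys
from typing import Callable, List

GIB = 1024 ** 3


def probe_total_gib(alloc_chunk: Callable[[], object], max_chunks: int) -> int:
    """Allocate chunks until allocation fails; return how many succeeded.

    Every chunk is kept alive in a list, so the count reflects how much the
    process could hold at once, not just the size of one allocation.
    """
    held: List[object] = []
    try:
        for _ in range(max_chunks):
            held.append(alloc_chunk())
    except (MemoryError, RuntimeError):
        # torch.OutOfMemoryError subclasses RuntimeError; stop at the first failure.
        pass
    count = len(held)
    held.clear()  # drop references so the memory can be released
    return count


def main() -> None:
    try:
        import torch  # assumption: a ROCm build of PyTorch
    except ImportError:
        print("PyTorch is not installed in this environment", file=sys.stderr)
        return
    if not torch.cuda.is_available():
        print("No GPU is visible to PyTorch", file=sys.stderr)
        return
    free_b, total_b = torch.cuda.mem_get_info()
    print(f"driver reports free={free_b / GIB:.1f} GiB, total={total_b / GIB:.1f} GiB")
    n = probe_total_gib(
        lambda: torch.empty(GIB, dtype=torch.uint8, device="cuda"),
        max_chunks=int(total_b // GIB),
    )
    torch.cuda.empty_cache()  # return cached blocks to the driver
    print(f"this process could hold {n} GiB at once")


if __name__ == "__main__":
    main()
```

If the count stops near 50 while the driver reports much more free memory, that points at a per-process limit in the PyTorch/ROCm stack rather than at a ComfyUI setting.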
-
Thanks for your reply, @gershu-ar. Do you still believe this is a PyTorch issue? (I built PyTorch for ROCm when I updated ROCm to 7.2.3.) Should I reinstall or rebuild xformers? That is something I don't know. Which attention backend do you suggest for ROCm and 96 GB of VRAM?
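Since the cap is being attributed to PyTorch, it is worth confirming that the venv ComfyUI runs in actually imports the ROCm build you compiled, and which GPU target it sees. A minimal sketch; `describe_build` is a hypothetical helper, and it relies on `torch.version.hip` being set only on ROCm builds (and `torch.version.cuda` only on CUDA builds). The `gcnArchName` property is read with a fallback in case your PyTorch version does not expose it.

```python
from typing import Optional


def describe_build(hip_version: Optional[str], cuda_version: Optional[str]) -> str:
    """Classify a PyTorch build from torch.version.hip / torch.version.cuda."""
    if hip_version:
        return f"ROCm build (HIP {hip_version})"
    if cuda_version:
        return f"CUDA build (CUDA {cuda_version})"
    return "CPU-only build"


def main() -> None:
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
        return
    print(torch.__version__, "->", describe_build(torch.version.hip, torch.version.cuda))
    print("GPU visible:", torch.cuda.is_available())
    if torch.cuda.is_available():
        props = torch.cuda.get_device_properties(0)
        # On ROCm builds torch.cuda.* is routed to HIP; gcnArchName is the gfx target.
        print("Device:", props.name, getattr(props, "gcnArchName", "n/a"))


if __name__ == "__main__":
    main()
```

Run it with the same `python3` that launches `main.py`; a "CPU-only" or "CUDA" result would mean ComfyUI is not using the build you compiled.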
-
Thanks, but I couldn't understand how the GTT approach can get past the 50 GB per-process limit. Also, with the current GTT approach, Ollama plus the ComfyUI models now push the system above 90 GB (and even 100 GB) of its 128 GB. What can I do? I also believe my xformers is fine. How can I use flash or sage attention? By the way, in the startup log I see:
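On this kind of APU the amdgpu driver reports the BIOS carve-out as VRAM and the system RAM it can additionally map for the GPU as GTT, so watching both counters while Ollama and ComfyUI are loaded shows where their allocations actually land. A minimal sketch, assuming the standard amdgpu sysfs files (`mem_info_vram_total`, `mem_info_vram_used`, `mem_info_gtt_total`, `mem_info_gtt_used`, in bytes) under `/sys/class/drm/cardN/device`; `read_amdgpu_mem` is a hypothetical helper name.

```python
import glob
import os
import re
from typing import Dict

GIB = 1024 ** 3
# Counter files the amdgpu driver exposes per device; values are in bytes.
FIELDS = ("vram_total", "vram_used", "gtt_total", "gtt_used")


def read_amdgpu_mem(device_dir: str) -> Dict[str, float]:
    """Return whichever mem_info_* counters exist under device_dir, converted to GiB."""
    out: Dict[str, float] = {}
    for field in FIELDS:
        path = os.path.join(device_dir, f"mem_info_{field}")
        if os.path.exists(path):
            with open(path) as f:
                out[field] = int(f.read().strip()) / GIB
    return out


def main() -> None:
    # Card numbering varies (card0, card1, ...); skip connector entries like card0-DP-1.
    for card in sorted(glob.glob("/sys/class/drm/card*")):
        if not re.fullmatch(r"card\d+", os.path.basename(card)):
            continue
        mem = read_amdgpu_mem(os.path.join(card, "device"))
        if mem:
            print(os.path.basename(card), {k: round(v, 1) for k, v in mem.items()})


if __name__ == "__main__":
    main()
```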
-
Thanks, that really is a good explanation of the attention modes. By default, PyTorch attention is used, and I have the problem I described above. When I tried flash attention, I could not install it:
-
OK. I built flash_attn and started Comfy with "python3 main.py --listen --highvram --use-flash-attention". Thanks.
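To confirm that the interpreter which launches `main.py` can actually see the attention packages discussed in this thread, a quick check like the following may help. It assumes the usual import names `flash_attn`, `xformers` and `sageattention`; `available_backends` is a hypothetical helper. It only checks importability, not whether a package was built for ROCm.

```python
import importlib.util
from typing import Dict, Iterable

# Assumed import names of the optional attention packages, mapped to display names.
BACKENDS = {"flash_attn": "flash-attn", "xformers": "xformers", "sageattention": "SageAttention"}


def available_backends(modules: Iterable[str]) -> Dict[str, bool]:
    """Report which top-level modules are importable, without importing them."""
    return {m: importlib.util.find_spec(m) is not None for m in modules}


def main() -> None:
    for mod, ok in available_backends(BACKENDS).items():
        print(f"{BACKENDS[mod]:<14} {'installed' if ok else 'missing'}")


if __name__ == "__main__":
    main()
```

Run it from the same venv you activate before starting ComfyUI; a package installed into a different Python will show up as missing here.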
-
@gershu-ar, my friend, you are a genius. I have just one problem remaining. It is not about ComfyUI but about Trellis 2. I installed every requirement, but now o_voxel compiled for ROCm is not producing a compatible tensor matrix. Do you have any suggestions?
-
Hi @ilker-aktuna,
The hipErrorLaunchFailure you are seeing during the final VAE Decode / sampling step is an asynchronous hardware exception caused by memory fragmentation inside the ROCm memory manager. When you forced the OS-level GTT workaround to bypass PyTorch's default per-process allocation ceiling, your models loaded, but as soon as the attention computation materialized intermediate activations and normalization states, the HIP caching allocator ran out of contiguous memory and the asynchronous kernel launch crashed.
Also, a standard pip install flash-attn will always fail on your setup, because that package targets NVIDIA/CUDA architectures and will not compile cleanly for your ROCm 7.2.3 / gfx1151 (Radeon 8060S) environment without complex hipify configuration.
You can get past this fragmentation wall and get the memory tracking you need by dropping in renorm-native. It uses specialized register-fused execution structures that keep intermediate activations from spilling past your physical hardware limits. It also has a fully decoupled fallback dispatcher that drops back to optimized PyTorch attention backends if local Triton compilation fails under ROCm.
How to deploy it in your ComfyUI environment:
Activate your local venvnew virtual environment and clone the layer architecture:
source /home/aadmin/ComfyUI/venvnew/bin/activate
Your setup.py and structural parameter initializations are fully optimized to interface with large-scale transformer backends (like your LTX Video/Audio configurations) without triggering variance drift. Give it a run through your pipeline; it should completely eliminate the VAE allocation crashes.
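The comment attributes the crash to fragmentation in the HIP caching allocator. One way to test that claim before adding any package is to compare how much memory PyTorch's caching allocator has reserved with how much live tensors actually use. This is a sketch under the assumption of a ROCm build of PyTorch; `cache_slack`, `report` and `snapshot` are hypothetical helper names, and a large slack is only a hint of fragmentation, not proof.

```python
from typing import Mapping

GIB = 1024 ** 3


def cache_slack(reserved: int, allocated: int) -> float:
    """Fraction of allocator-reserved memory that no live tensor is using.

    A large, persistent value can indicate fragmentation: the allocator holds
    blocks it cannot reuse for the next, larger request.
    """
    if reserved <= 0:
        return 0.0
    return (reserved - allocated) / reserved


def report(stats: Mapping[str, int]) -> str:
    r, a = stats["reserved"], stats["allocated"]
    return f"reserved {r / GIB:.1f} GiB, allocated {a / GIB:.1f} GiB, slack {cache_slack(r, a):.0%}"


def snapshot() -> dict:
    import torch  # assumption: ROCm build, where torch.cuda is backed by HIP

    return {"reserved": torch.cuda.memory_reserved(), "allocated": torch.cuda.memory_allocated()}
```

For example, printing `report(snapshot())` just before the VAE Decode step shows whether the allocator is sitting on a lot of unused cached memory when the crash happens.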
-
Hi @ilker-aktuna, let’s get this sorted out step-by-step.
The reason SageAttention and standard flash-attn wheels cause a
`hipErrorLaunchFailure` on your setup during the VAE Decode phase is that
they try to force hardcoded NVIDIA warp/register tiling strategies that
violate AMD’s native wave64 hardware execution sizes, throwing an
asynchronous memory exception.
Here is exactly how to integrate `renorm-native` into your ComfyUI workflow
to bypass this:
1. What to do with xformers / flash-attn / sage-attention
**Keep `xformers` and `flash-attn` installed:** You do not need to
uninstall or drop them. `renorm-native` acts as an upstream architectural
guard; if it detects an uncoalesced memory shape during decode, it will
automatically pad or route around them safely.
**Disable/remove `sage-attention`:** Do not use SageAttention for this
specific run. Its current kernel configuration cannot handle the shape
shifts of the VAE decode phase on your current ROCm backend.
2. How to Start ComfyUI with renorm-native
Once you have installed the repo via pip, you activate our layout router by
passing our flag directly into your ComfyUI startup script.
Modify your startup command line to look like this:
python main.py --use-flash-attention --use-renorm
Our framework hooks directly into the core execution tree. When ComfyUI
initializes a high-load tensor block, renorm-native will intercept the
incoming matrix strides, enforce standard 32-element pad tiling to align
with your AMD hardware cache sectors, and prevent the memory controller
from delaminating.
Try launching with that combined flag sequence and let me know if the VAE
decode block clears cleanly!
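The comment does not show how the "32-element pad tiling" is done. As a generic illustration only (not renorm-native's code, which is not shown here), rounding a dimension up to the next multiple of 32 and zero-padding to it looks like this; `round_up` and `pad_last_dim` are hypothetical names, and plain Python lists stand in for tensors.

```python
from typing import List


def round_up(n: int, multiple: int = 32) -> int:
    """Smallest multiple of `multiple` that is >= n."""
    if multiple <= 0:
        raise ValueError("multiple must be positive")
    return -(-n // multiple) * multiple  # ceiling division, then scale back up


def pad_last_dim(rows: List[List[float]], multiple: int = 32, fill: float = 0.0) -> List[List[float]]:
    """Right-pad every row with `fill` so its length is a multiple of `multiple`."""
    return [row + [fill] * (round_up(len(row), multiple) - len(row)) for row in rows]
```

With real tensors, `torch.nn.functional.pad(x, (0, round_up(x.shape[-1]) - x.shape[-1]))` pads the last dimension the same way.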
On Tue, Jun 9, 2026 at 8:52 PM ilker Aktuna ***@***.***> wrote:
Thanks for the solution, but I could not follow: how do I activate it after installing?
Right now I start Comfy with the --use-flash-attention flag. How should I start it for your renorm-native?
And should I drop any modules? xformers? flash-attn?
hipErrorLaunchFailure occurs if I start with sage-attention. So should I retry sage-attention with renorm-native?
-
Hi @ilker-aktuna,
I have just pushed a massive architectural update to the repository
(`v2.0.0`) that completely automates this setup and eliminates the
asynchronous hardware crashes (`hipErrorLaunchFailure`) you were seeing.
Here is exactly how you operate going forward:
1. Keep your existing modules: You do NOT need to uninstall or drop
`xformers` or `flash-attn`. Keep them exactly as they are. However, keep
`sage-attention` disabled for this workflow, as its internal layouts
conflict with the wave64 steps of your AMD card during VAE decode phases.
2. Launch ComfyUI: Simply append our active flag right next to your flash
attention flag when starting ComfyUI:
python main.py --use-flash-attention --use-renorm
*What happens now:* The engine is now entirely plug-and-play. On boot, it
automatically detects your AMD ROCm backend, scans your runtime flags, and
spins up an isolated layout guard. When the VAE decode phase hits, it
dynamically enforces a 32-element memory stride pad tiling matrix to match
your graphics card's cache sectors. This safely routes the data around the
hardware limits without throwing exceptions.
Pull the latest main branch and give it a spin!
On Tue, Jun 9, 2026 at 9:01 PM Tobi-Adesoye ***@***.***> wrote:
-
I am running ComfyUI on a GMKtec EVO-X2 with 96 GB of VRAM and a Ryzen AI Max+ 395.
The ROCm version is 7.2.3.
When I run ComfyUI with the command "python3 main.py --listen --highvram", it correctly reports "Total VRAM 98304 MB, total RAM 31724 MB".
Then I run a workflow with heavy models. An LLM is already loaded, using about 9 GB of the VRAM, so Comfy has about 87 GB of VRAM to use. But I notice that Comfy cannot go above 50 GB, and the total VRAM usage is about 58 GB in that case.
To test whether the OS can go above 58 GB of VRAM, I loaded a larger LLM AFTER the Comfy models were loaded, and I was able to use 88 GB of VRAM in total (both the LLM and the Comfy models ran fine).
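As a quick check of the numbers above: 98304 MB is exactly 96 GiB, so with about 9 GB taken by the LLM, roughly 87 GB should be left for ComfyUI, well above the ~50 GB it actually reaches. The sketch below parses the startup line quoted above and does that arithmetic; `parse_totals` and `headroom_gib` are hypothetical helpers, and the regex assumes the exact wording ComfyUI printed here.

```python
import re
from typing import Tuple

# Matches the startup line as quoted in this thread.
LINE_RE = re.compile(r"Total VRAM (\d+) MB, total RAM (\d+) MB")


def parse_totals(log_line: str) -> Tuple[int, int]:
    """Extract (vram_mb, ram_mb) from ComfyUI's startup line."""
    m = LINE_RE.search(log_line)
    if m is None:
        raise ValueError("no 'Total VRAM ... MB, total RAM ... MB' line found")
    return int(m.group(1)), int(m.group(2))


def headroom_gib(vram_mb: int, used_elsewhere_gib: float) -> float:
    """VRAM left for ComfyUI after other processes (e.g. an LLM) take their share."""
    return vram_mb / 1024 - used_elsewhere_gib


if __name__ == "__main__":
    vram_mb, ram_mb = parse_totals("Total VRAM 98304 MB, total RAM 31724 MB")
    print(f"{vram_mb / 1024:.0f} GiB VRAM, {headroom_gib(vram_mb, 9):.0f} GiB left after a 9 GiB LLM")
```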
So I believe there is a limit in Comfy, and I do believe it must be configurable. I just could not find how.
I tried the highvram and gpu-only parameters, but still...
If I use 20-30 GB of VRAM before loading Comfy's workflow models, it cannot load the large models into VRAM.
For example, VRAM is now 60% full:
and Comfy complains about VRAM allocation:
Please help me: what am I doing wrong?
Full startup log: