Skip to content

Comprehensive ROCm helper for centralized Windows package integration and installation support#1629

Merged
mohnjiles merged 45 commits into
LykosAI:mainfrom
NeuralFault:universal-rocm
Jun 3, 2026
Merged

Comprehensive ROCm helper for centralized Windows package integration and installation support#1629
mohnjiles merged 45 commits into
LykosAI:mainfrom
NeuralFault:universal-rocm

Conversation

@NeuralFault
Copy link
Copy Markdown
Contributor

@NeuralFault NeuralFault commented May 2, 2026

Introduces significant improvements to AMD GPU (ROCm) support and consolidates Windows ROCm support behind a shared helper and expands AMD GPU coverage across the current Windows-native ROCm path. The result is a more consistent install and launch flow for ROCm-capable packages, less duplicated package-specific logic, and broader support from Vega/GCN5 through the entire RDNA lineup. It also establishes the shared ROCm helper foundation that ComfyUI and Wan2GP now use directly, with the same model intended to be reused by other AMD/ROCm-capable packages going forward.

ROCm Support Integration and Refactoring:

  • Introduced the IRocmPackageHelper dependency to PackageFactory, ComfyUI, and Wan2GP, so ROCm compatibility checks, runtime/install context resolution, Windows-native package installation, and launch environment construction all flow through the same shared service instead of being reimplemented per package. This centralizes the Windows ROCm path and makes it easier to extend the same behavior to additional packages later. [1] [2] [3] [4] [5] [6] [7] [8]

  • Refactored the Windows ROCm path in ComfyUI around the shared helper and a package-owned WindowsRocmProfile, replacing hardcoded install/index handling with shared compatibility, install, and environment policy. This keeps the package-specific behavior limited to the pieces that actually need to remain package-specific while letting the helper own the common ROCm workflow. [1] [2] [3]

  • Updated ROCm support detection and launch environment injection to use the centralized helper throughout the package startup. This makes ROCm eligibility checks, EnVar defaults, and runtime-specific overrides consistent across packages, while still allowing package-owned extras such as ComfyUI-specific COMFYUI_ENABLE_MIOPEN. [1] [2] [3]
    This also fixes user-set EnVars in SM settings/Environment Variables not overriding the package configured EnVars due to immutability coded in comfyui.cs's original Windows-ROCm package specific handling. Environment Variable injection flow is as follows: Helper Defaults > Package config > User-set. So the user-set variables are added as last step pulling them from SettingsManager, before finally being set into the package's launch flow. Prioritizing user variables over any previously injected variable of the same key if the user wishes to disable or modify a default variable.

GPU Architecture and Compatibility Improvements:

  • Extended the AMD GPU architecture detection matrix in GpuInfo to cover a wider set of Vega/GCN5, RDNA1, RDNA2, RDNA3/3.5, related handheld/mobile variants, and adding handling support for R9600 RDNA4 Pro GPU that was previously absent. Improving Windows ROCm coverage across the supported lineup instead of limiting support to a narrower subset of cards, as TheRock ROCm Technical Preview PyTorch builds exist now for Vega dGPUs, RDNA1 dGPUs, and practically the entire RDNA2 family. From Vega 56 all the way to RX 9070/R9700. This still excludes Radeon Instinct Vega/CDNA HPC/Datacenter GPUs due no official driver support and/or needing custom hacked drivers for Windows. [1] [2]

  • Refactored ROCm compatibility checks in GpuInfo and HardwareHelper to use shared WindowsRocmSupport logic, removing duplicated support tables and consolidating both support detection and architecture-based policy decisions in one place as a single source of truth in the domain of the ROCm helper. Keeping GpuInfo and HardwareHelper specifically handling just GPU-name > gfxarch translation, and extracting of hardware information from the OS for the GPU installed on the user's system respectively.
    Simplifying further addition of future released GPU gfxarch translation, along with installation indexes and special environmental handling, to 2 dedicated files which can be updated and automatically apply globally to packages wired into the ROCm helper's call paths. [1] [2]

Other Installation and Launch Improvements:

  • Adjusted ComfyUI launch defaults to account for Windows ROCm architecture differences, including defaulting to legacy Windows ROCm GPUs to Quad Cross-Attention where that remains the better default. This keeps package launch behavior aligned with the shared architecture policy without hardcoding that policy in multiple places. [1] [2]

  • Updated package launch environment injection to use ROCm helper-generated variables and shared defaults, so ROCm-enabled installs consistently receive the expected runtime configuration on Windows while still leaving room for package-specific additions where needed. [1] [2] [3]


In Progress / Considerations for further development while still Draft PR status
  • [SD WebUI Reforge Implemented] Expand the shared Windows ROCm helper wiring into additional packages, particularly A3 WebUI and SDForge variants (Forge / reForge), so they can reuse the same compatibility checks, package selection, install flow, and launch environment defaults now used by ComfyUI and Wan2GP. Adding the expansive support provided by the ROCm helper to these most popular additional packages.
  • [Implemented] Improve SwarmUI Windows ROCm integration so the default ROCm launch environment is passed through when SwarmUI starts its self-launched ComfyUI backend, while also layering in user-set variables in SM Settings. This never happened previously, only user set variables were passed from SettingsManager, So variables that were hardcoded in original Win/ROCm handling in ComfyUI never got passed to the SwarmUI Comfy Self-Start backend leading to degraded performance unless the user specifically set them in SM Settings.
  • [Implemented] Evaluate adding Flash Attention 2 support for legacy-architecture installs through AMD AITER using custom builds by 0xDELUXA, either as a default path where stable enough or as an optional Package Command action. With consideration for gating to RDNA2 and older. (RDNA3+ gets its own FA2 via AOTriton which is enabled by default if using modern arch's)
  • [Implemented] Evaluate adding Sage Attention 1.0 as an optional Package Command install, including any required ops patching (sourced from ComfyUI-Zluda installation scripting needed for the Windows ROCm environment along with triton-windows. With potiential gating to RDNA2 and older. (AOTriton Flash Attn 2 in RDNA3+ is preferrable for modern arch's)
  • [ROCm runtime implemented by default] Consider defaulting Windows ROCm installs to the runtime-only ROCm module set rather than the full ROCm + SDK module set, with fuller SDK components installable later through Package Commands if needed. The full SDK stack modules add considerable footprint to the overall size of the venv, installing just the ROCm core runtime modules can be preferrable but needs testing to verify. Potentially making installation of the SDK modules optional if the user intends to compile/build modules needing it at any point.
  • [Implemented] Consider offering an optimized ROCm-aware bitsandbytes install path through the Package Command menu for packages and workflows that benefit from it. These would be prebuilt optimized wheels by 0xDELUXA that cover the entire Win-ROCm supported GPU lineup. Improves low quantization performance for GPUs that support it (needs verification of support variance)
Other future considerations
  • Refactoring and package integration for future Linux install support for passing the same default environment variables for better user experience and performance when using AMD GPUs in a Linux environment with Stability Matrix.
  • Expanding package integration to include other WebUI packages such as InvokeAI, AI Toolkit, Trainers, etc.
  • [Implemented] Refactor build index decision and handling to use new multi-arch url format from TheRock. Consolidating the ROCm/PyTorch installation URLs to a single universal index and have the gfxarch applied in the pip command for the current GPU.

NeuralFault and others added 8 commits April 21, 2026 19:02
- Add initial ROCm helper structure
- Set up ROCm helper foundation
Compile test sucessful.
- Add initial ROCm helper calls/config
- Removed pre-existing Windows ROCm blocks which will be obsolete following helper implementation
- Windows ROCm install/bootstrap logic into shared ROCm helper
- Add gfx-family mapping for Windows-native TheRock ROCm URLs
- Route ComfyUI Win Rocm installs through helper-resolved ROCm runtime, rocm-sdk, and pytorch setup
- Prevent requirements.txt from overwritting helper-installed ROCm torch packages
- Add helper-owned post-install torch verification and improve unsupported GPU failure handling
…ntime/install/environment API and simplify the ROCm profile/context models around the helper’s real responsibilities

- add a centralized Windows ROCm support map so GPU detection, architecture support checks, and package index resolution all use the same source of truth
- expand AMD architecture detection to cover additional RDNA4, Steam Deck, RDNA1, and Vega-class GPUs used by the Windows ROCm support path
- add a helper-managed Windows ROCm bootstrap flow that installs the ROCm runtime, initializes/reinitializes the SDK, aligns rocm-sdk-devel with the resolved torch build, and verifies both torch ROCm metadata and runtime availability
- centralize ROCm launch environment construction in the helper, including default MIOpen, allocator, flash-attention, and AOTriton settings plus legacy SDP fallback, RDNA1 overrides, and user env override layering
- switch ComfyUI to helper-driven Windows ROCm compatibility and launch env handling, and default legacy Windows ROCm GPUs to quad cross-attention while keeping Comfy-specific MIOpen enablement as a preset
- integrate Wan2GP with the shared Windows ROCm helper for install and launch flows, while updating its Linux ROCm path to use upstream rocm7.2 torch/vision/audio installs
- wire the ROCm helper through package construction and add focused test coverage for ROCm build/version parsing, runtime failure classification, and Windows ROCm support/index resolution
- centralize Windows ROCm architecture classification and legacy-attention fallback policy in WindowsRocmSupport
- move ComfyUI-specific MIOpen env handling out of the helper and into package-owned ROCm config
- reuse shared ROCm policy for ComfyUI quad-attention defaults and helper-managed AOTriton / math SDP / RDNA1 gates
- remove dead ROCm preset plumbing and trim unused RocmPackageProfile surface
- rename helper/package methods for clearer default-policy semantics
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request centralizes Windows ROCm support by introducing a shared IRocmPackageHelper service and associated models, refactoring ComfyUI and Wan2GP to use this new framework. The changes include expanded GPU architecture detection and standardized installation and environment configuration logic. Feedback identifies a missing dependency injection for Wan2GP in the factory and a configuration regression in ComfyUI's memory allocation settings. Further suggestions focus on optimizing the helper by reducing redundant hardware probing, avoiding unnecessary hardware refreshes, and using more appropriate exception types.

Comment thread StabilityMatrix.Core/Helper/Factory/PackageFactory.cs
Comment thread StabilityMatrix.Core/Models/Packages/ComfyUI.cs Outdated
Comment thread StabilityMatrix.Core/Services/Rocm/RocmPackageHelper.cs Outdated
Comment thread StabilityMatrix.Core/Services/Rocm/RocmPackageHelper.cs Outdated
Comment thread StabilityMatrix.Core/Services/Rocm/RocmPackageHelper.cs Outdated
Comment thread StabilityMatrix.Core/Services/Rocm/RocmPackageHelper.cs Outdated
NeuralFault and others added 9 commits May 2, 2026 18:32
Was used during debugging and was unintentionally left on.
Add helper-managed ROCm torch compatibility profiles for Windows packages

Enable ComfyUI to use shared ROCm dependency fallback behavior

Add shared Windows ROCm launch notice and experimental-support messaging

Align Wan2GP Windows ROCm disclaimer text with the shared helper messaging

Lower helper-managed MIOPEN_FIND_ENFORCE default from 3 to 1

Add gfx103x borrowed dependency fallback using ROCm-hosted setuptools and mpmath from gfx103x-dgpu due to -all index missing compatible versions

Add helper preinstall and supplemental dependency handling for ROCm torch installs

Add temporary environment override support for helper-owned install steps
- generalize the ROCm helper install path around a new multi-arch TheRock nightly repo index flow
- unify ROCm and PyTorch installation into a single command that automatically selects the correct PyTorch build for the user's GPU architecture
- centralize Windows ROCm GPU and architecture resolution behind one shared helper machine-state path
- trim the ROCm helper API and remove obsolete package-side ROCm install/runtime duplication
- deleted no longer used torch compatibility model since A3WebUI/reForge work was rolled back and abandoned to be visited again in the future for intergration
…upport.

- extract the shared Windows ComfyUI ROCm profile from package-local code
- reuse the shared ROCm helper/profile for both direct ComfyUI launch behavior and SwarmUI self-launch pass-through
- inject Windows ROCm ComfyUI env vars into SwarmUI launching so they propagate to the self-launched backend

- Modified Torch ROCm AOTriton activation EnVar to exclude gfx1152/1153 due to no support yet.
…lityMatrix into universal-rocm

Deleted stale models from previous implementation state.
@NeuralFault
Copy link
Copy Markdown
Contributor Author

@gemini-code-assist recheck

@gemini-code-assist
Copy link
Copy Markdown
Contributor

Thank you for the recheck request. I have reviewed the changes in this pull request, which introduces a centralized IRocmPackageHelper to streamline Windows ROCm support across multiple packages like ComfyUI, Wan2GP, and future integrations. The refactoring effectively consolidates compatibility checks, installation logic, and environment variable management, reducing code duplication and improving maintainability. The extension of the GPU architecture detection matrix to include RDNA4 and additional Vega/RDNA variants is a significant improvement for broad hardware support. The implementation appears robust and follows the established architectural patterns of the project. No further issues identified.

@NeuralFault
Copy link
Copy Markdown
Contributor Author

Refactored installation path to use the new multi-arch index, simplifying the torch install handling to a single command which the gfxarch for the user's GPU is pulled and integrated into the command used from resolved gfxarch, installing PyTorch modules and ROCm runtime in one go. "torch[device-gfx###]" "torchvision[device-gfx###]" torchaudio --index-url https://rocm.nightlies.amd.com/whl-staging-multi-arch/

Centralized most of ComfyUI’s Windows ROCm handling into the shared helper domain to make launch environment variable generation and reuse more consistent, and to support env-var passthrough for ComfyUI self-launch backends in SwarmUI. That Swarm-side integration is intentionally gated behind the expected operating environment and Windows ROCm compatibility checks, and is limited to launch-time passthrough rather than expanding Swarm’s install behavior. As part of that, a shared Windows ROCm Comfy profile was extracted so direct ComfyUI launches and Swarm-managed self-launches follow the same ROCm-specific behavior.

Added global notice message sent to console output of Comfy and future integrated packages notifying user of experimental state of the Windows ROCm configuration and on issue reporting for triage so unnecessary issue reports get submitted upstream for if the issue has nothing to do with the WebUI package itself. (Used package install disclaimer instead for Wan2GP due to not being able to successfully getting it to show in console)

Added a temporary exclusion for gfx1152 and gfx1153 from the AOTriton experimental EnVar passing since those architectures are not currently supported just yet from upstream, and added post-install torch verification so helper-managed installs confirm that the resulting torch build actually reports usable ROCm metadata after installation.

Integration with the helper for reForge (and A3WebUI since it inherits a lot from it) was worked on and got functional, but required rather large dedicated functions for override and fallbacks due to missing modules in the 103x-all index and incompatibility with certain extension that comes with reforge, and setuptools issue. So rolled all that back and gave up on it for the time being but can more easily revisit it due to switching to the multi-arch install handling as this should alleviate a lot of the bloating of the helper and package configs I had previously when they were part of the implementation.

Tested and verified compatibility and function with both R9700 (RDNA4) and 6900XT (RDNA2). Could not test older generation due to not having my Vega 64 anymore, but the special handling for legacy GPUs are already in place.

Last thing is to look over the consideration of the other attention mechanism builds and acceleration library (bnb), along with possible option for the user to install the ROCm SDK devel module for if they need or wish to compile modules or extensions against their installed ROCm, in the package command menu before marking ready for review.

…ific handling and patching sourced from ComfyUI-Zluda. Pulls VSBuild Tools prerequisite auto install path as a requirement.
…bitsandbytes package commands.

- replace the SageAttention-specific ROCm step with a generic Windows ROCm package-command step and shared command enum
- move rocm-sdk-devel resolution and install logic into the ROCm helper, including nightly-date matching with fallback to the latest multi-arch build
- gate Flash Attention to legacy ROCm architectures and bitsandbytes to Python 3.12 installs

Non-ROCm Helper changes
- Added an optional visibility predicate to extra package commands, enabling installed-package-instance-aware filtering so packages can declaratively hide commands that do not apply to the current install state (for example, the ComfyUI bitsandbytes command on non-Python-3.12 Windows ROCm installs).
@NeuralFault
Copy link
Copy Markdown
Contributor Author

@gemini-code-assist recheck

@gemini-code-assist
Copy link
Copy Markdown
Contributor

Thank you for the update and for providing such a comprehensive overview of the refactoring and improvements. The consolidation of the Windows ROCm logic into IRocmPackageHelper and the introduction of the multi-arch index handling are significant steps forward for maintainability and broad hardware support. The addition of the experimental notice and the post-install verification steps are also excellent additions for robustness. Since you have verified compatibility with RDNA2 and RDNA4 and addressed the environment variable passthrough for SwarmUI, this looks very solid. I have no further concerns at this time.

@NeuralFault NeuralFault marked this pull request as ready for review May 10, 2026 02:31
@NeuralFault
Copy link
Copy Markdown
Contributor Author

NeuralFault commented May 10, 2026

Would also like to note, and possibly good for changelog if merged and hits release, that PyTorch TunableOp is intentionally disabled by default. Any user who has a cache database for it and wishes to continue using it and compile further optimized GEMM kernels will need to add PYTORCH_TUNABLEOP_ENABLED with a value of 1 to their environment variables in SM settings.

@NeuralFault NeuralFault marked this pull request as draft May 10, 2026 22:23
PyVenvRunner/UvVenvRunner changes:
- treat pip show package-not-found results as a normal missing-package case instead of throwing from the venv runners
- treat pip index versions no-match results as empty lookups instead of surfacing low-level process failures
- add optional prerelease support to PipIndex so callers can query prerelease-only feeds

ROCm Helper Change:
- enable prerelease lookup for rocm-sdk-devel on the ROCm multi-arch index so nightly SDK builds are discoverable during Windows ROCm package commands.
@NeuralFault
Copy link
Copy Markdown
Contributor Author

The full ReadyToRun build exposed a latent runner bug that blocked installation of the ROCm development SDK module for the SageAttention and ROCm SDK package commands when it was not already installed in the ComfyUI venv, because pip show “not found” results were being raised as exceptions instead of being treated as a missing package state (null).

This change to pyrunner/uvrunner should be safe for other calls as the other call sites for it handle null intentionally and don't assume a non-null result without checking.

…install config and unified machine state

- Compose RocmPackageProfile around PipInstallConfig instead of duplicating pip-install fields in the ROCm profile model
- Add PostTorchInstallPipArgs to PipInstallConfig so ROCm-specific post-torch package pins fit the shared install pipeline
- Update BaseGitPackage to expose shared Windows ROCm compatibility helpers and to support optional requirements and optional torch stages in StandardPipInstallProcessAsync
- Refactor IRocmPackageHelper / RocmPackageHelper so the helper provides the Windows ROCm install config and owns only the ROCm torch install + verification path
- Replace the private resolved helper state with RocmMachineState as the shared ROCm hardware/install state shape
- Remove obsolete RocmInstallContext after moving its install-time fields into RocmMachineState
- Reduce compatibility/runtime/install translation churn by projecting those paths from the shared machine-state model instead of copying between overlapping ROCm context types
- Remove leftover `_ = profile;` no-op assignments from RocmPackageHelper now that the affected internal methods no longer use profile for compatibility/runtime resolution
- Switch ComfyUI Windows ROCm installs to the shared StandardPipInstallProcessAsync pipeline using helper-built install config
- Move the ROCm-specific torch step to InstallWindowsNativeTorchAsync so ComfyUI only owns package wiring, not a duplicate install pipeline
- Update ComfyWindowsRocmProfile to use composed InstallConfig settings from the refactored ROCm profile model
- Reuse the shared Windows ROCm compatibility helpers from BaseGitPackage.cs to avoid package-local compatibility drift
- Deduplicate ComfyUI Windows ROCm extra-package command handlers through a shared command runner helper
- Switch Wan2GP Windows ROCm installs to the shared StandardPipInstallProcessAsync pipeline using helper-built install config
- Move the ROCm-specific torch step to InstallWindowsNativeTorchAsync to match the refactored helper contract
- Extract the Windows ROCm package profile to Wan2GpWindowsRocmProfile.cs for consistency with other helper-managed packages
- Update Windows ROCm compatibility and launch environment wiring to use the shared base/helper profile path
- Keep Wan2GP-specific post-torch package pins in the composed InstallConfig
- Add IRocmPackageHelper wiring to reForge and factory construction
- Add ReforgeWindowsRocmProfile to define Windows ROCm install policy and package-specific pins
- Reuse SDWebForge requirements discovery for helper-managed Windows ROCm installs
- Route Windows ROCm installs through shared pip install handling plus helper-managed ROCm torch installation
- Apply helper-built Windows ROCm launch environment and launch notice lines for ROCm-enabled reForge runs
- Add reForge-specific cross-attention launch options so ROCm-aware defaults can be set in the UI: legacy AMD defaults to Quad Cross-Attention, modern AMD defaults to PyTorch Cross-Attention, and all other install states remain unset so reForge keeps its previous upstream behavior
- Pinned `accelerate` to `0.22.0` instead of reForge’s upstream `0.21.0` because `0.21.0` imports torch.distributed too early and breaks the Marigold Depth Controlnet Processor import.
Distributed Torch support is currently absent in ROCm builds for Windows.
- Add a post-torch reinstall of pinned upstream `setuptools` because the ROCm multi-arch torch install path currently resolves only `81.x` wheel from the stable index, which breaks upstream expectations and introduces pkg_resources deprecation noise

- Add the correct ReForge-upstream Stable Diffusion repo URL override  across launch states instead of inheriting the A3WebUI repo URL, preventing commit hash mismatches during launch checks that caused immediate startup failure

A3WebUI / SDWebForge dedupe support
- Extend A3WebUI with package-aware launch environment and launch notice hooks so reForge can customize launch behavior by overriding small helper methods instead of duplicating the entire `RunPackage` flow
- Extract reusable requirements discovery from SDWebForge as a dedupe measure for reForge, so the Windows ROCm install path can reuse the existing built-in requirements walk without copying that logic
- Switch Windows ROCm multi-arch index selection to use the nightly TheRock feed for gfx900 / gfx906 while keeping other supported architectures on the stable feed
- Update helper-managed Windows ROCm torch installs to resolve the multi-arch package index from the detected runtime architecture
- Update rocm-sdk-devel lookup and installation to use the same architecture-specific index selection as the ROCm torch install path

Vega-based consumer/pro cards don't have builds in the fixed release index of TheRock builds, but do on the nightly index.
NeuralFault and others added 7 commits May 26, 2026 20:46
- add missing closing brace
- remove stale GetEnvVars environment update call reintroduced during merge resolution
- add IRocmPackageHelper plumbing to InvokeAI construction in PackageFactory
- add InvokeAiWindowsRocmProfile.cs to model InvokeAI’s Windows ROCm compatibility checks, helper-managed launch environment, launch notices, and Python 3.12 bitsandbytes post-install requirement
- add supported Windows ROCm auto-selection in InvokeAI.cs so InvokeAI prefers TorchIndex.Rocm on compatible Windows AMD systems
-add a Windows ROCm-specific install path in InvokeAI.cs that skips the Linux ROCm wheel index, then installs helper-managed native ROCm torch after the base invokeai package install
- add Windows ROCm launch-time environment injection and shared startup notices in InvokeAI.cs when an InvokeAI install is launched with the ROCm backend selected
-  update the InvokeAI preview image URL to a current docs asset after the previous upstream preview path was removed, fixing the package link test failure
- Wan2GP.cs: add `selectedTorchIndex` check before injecting ROCm env
- ComfyUI.cs: same check in `GetEnvVars`
…Helper.cs`

- early return when `PreferredGpu` is non-AMD in `ResolveWindowsMachineState`
- `ReforgeWindowsRocmProfile.cs`: add `HasSupport`, `ShouldUseWindowsNativeInstall`, `ApplyWindowsRocmLaunchDefaults`, `GetPreferredCrossAttentionArgument`, `BuildLaunchEnvironment`, `GetLaunchNoticeLines`

- `Reforge.cs`: consume those new helpers. pure refactor, no behavior change
@NeuralFault
Copy link
Copy Markdown
Contributor Author

@gemini-code-assist recheck

@gemini-code-assist
Copy link
Copy Markdown
Contributor

Hi @mohnjiles, thank you for the detailed feedback. You've correctly identified that RocmPackageHelper.InstallWindowsNativePackageAsync and BaseGitPackage.StandardPipInstallProcessAsync share a significant amount of logic, which does introduce a risk of drift.

I agree that refactoring to keep the StandardPipInstallProcessAsync as the primary driver while allowing the RocmPackageHelper to own only the ROCm-specific installation steps (like the multi-arch torch install and verification) is a cleaner, more maintainable approach. This aligns with the existing SkipTorchInstall = true pattern and would definitely reduce the maintenance burden for future updates to the pip process.

I will look into refactoring the helper to provide a BuildWindowsPipInstallConfig method and a dedicated InstallWindowsRocmTorchAsync method, allowing packages to compose their installation flow as you suggested. This should resolve the duplication concern while keeping the ROCm-specific logic encapsulated.

@NeuralFault
Copy link
Copy Markdown
Contributor Author

Implementation is complete with the initial WebUI package support for ComfyUI, SD WebUI Reforge, InvokeAI, SwarmUI, and Wan2GP.
Tested various GPU install configurations at install and launch time ensuring expected behavior for AMD GPU and Nvidia environments.

It is now in an intended state of general use and for future package support expansion.

Unless something else stands out that can be changed, ready for merge.

NeuralFault and others added 3 commits May 29, 2026 06:07
is dead code when false. was for debugging purposes early on anyways
…from GetCompatibility

- Convert `RocmPackageHelper` from primary constructor to body constructor to support field initializers
- Cache resolved machine state via Lazy<RocmMachineState> to avoid repeated GPU enumeration across calls
- Remove unused profile parameter from `IRocmPackageHelper.GetCompatibility()` and all call sites
- Remove unused profile parameter from `BaseGitPackage.HasWindowsRocmSupport`/`GetWindowsRocmCompatibility` helpers and update package-local wrappers in ComfyUI and Wan2GP
- Delete dead `TryResolvePreferredAmdGfxArch` private method
- Inline `IsSupportedWindowsRocmGpu`, `IsSupportedWindowsRocmArchitecture`, and `GetUnsupportedGpuReason` pass-through wrappers directly to `WindowsRocmSupport` calls
- Replace discarded cancellationToken assignments with `cancellationToken.ThrowIfCancellationRequested()` in `VerifyWindowsNativeTorchInstallAsync`; remove dead discard in `EnsureWindowsSdkDevelAsync`
- Move constants and static data fields above instance fields and constructor in `RocmPackageHelper`
Copy link
Copy Markdown
Member

@ionite34 ionite34 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, so looking at the abstractions of the profiles and the creation of the static specific profiles, few things I want to note.

Swarm self-launching a Comfy backend means the profile has to be reachable outside ComfyUI class, and pulling COMFY_ENABLE_MIOPEN etc. out of the config, so ROCm stuff being in a designated place, not inlined per package seems fine, no pushback on that.

But the per-package static class pattern vs. just carrying the data or inheritance seems a bit like complexity and more code without any of the actual abstraction or inheritance benefits.

So a couple things:

  1. The forwarders are basically indirecting code with negative value since the caller already holds the helper objects they pass in
// InvokeAiWindowsRocmProfile

public static bool HasSupport(IRocmPackageHelper? rocmPackageHelper)
{
    return GetCompatibility(rocmPackageHelper).IsCompatible;
}

public static IReadOnlyDictionary<string, string> BuildLaunchEnvironment(
    IRocmPackageHelper? rocmPackageHelper
)
{
    return rocmPackageHelper?.BuildLaunchEnvironment(Profile) ?? new Dictionary<string, string>();
}

These take the helper as an arg only to call a method on that same arg. The caller in InvokeAI already has rocmPackageHelper to pass it in, so it could just write rocmPackageHelper.BuildLaunchEnvironment(InvokeAiProfile) without needing the wrapper. Same for HasSupport, BuildLaunchEnvironment, ShouldUseWindowsNativeInstall, GetLaunchNoticeLines across reForge + Invoke, they're the package's own static class re-exposing the helper's methods pre-bound to a Profile static property.

2. The nullable + the ?? new Dictionary<>() guarding against it is kind of self-inflicted. PackageFactory injects a non-null IRocmPackageHelper into every package (PackageFactory.cs:65–294). StableSwarm and Wan2GP ctors correctly take it non-nullable. But ComfyUI/InvokeAI/Reforge declare IRocmPackageHelper? rocmPackageHelper = null, and InvokeAI.cs:302 then throws an "internal configuration error" to handle a null the factory never passes. The ?./?? in every forwarder exists only to service that impossible null. Drop the = null default → the nullable goes away → the defensive coalescing goes away → the throw goes away.

Which leads to the bigger structural bit:

3. It seems we built an instance abstraction and then bolted a static façade onto the side of it instead of through it.

RocmPackageProfile is already an instance type with init props, designed to be instantiated and subclassed. Wrapping it in static class XWindowsRocmProfile { static Profile { get; } = new()... } plus loose static methods means the per-package layer can't be polymorphic, injected, overridden, or substituted in tests.

And looking at the 8 members on InvokeAiWindowsRocmProfile, the only thing InvokeAI-specific is CreateInstallProfile (the py3.12 bitsandbytes-wheel branch). GetCompatibility, HasSupport, ShouldApplyLaunchEnvironment, BuildLaunchEnvironment, GetLaunchNoticeLines are byte-for-byte the same logic as reForge's (ReforgeWindowsRocmProfile). 5 identical forwarders in two different package-named static classes. That's the copy and paste the static pattern produces.

So my suggestion - I think instance subclasses of RocmPackageProfile is the pattern this was reaching for. Profile as an instance, package-specific behavior as a virtual override, shared dispatch is on the helper:

// base carries data + virtuals for the bits that actually vary per package
public class RocmPackageProfile
{
    public PipInstallConfig InstallConfig { get; init; } = new();
    public RocmEnvironmentOptions EnvironmentOptions { get; init; } = new();

    // package-specific hooks; default = no-op / inherit
    public virtual IReadOnlyDictionary<string, string> BuildExtraEnvironment(RocmRuntimeContext ctx) =>
        new Dictionary<string, string>();
}

// InvokeAI's profile is an instance subclass, only the genuinely specific bit lives here
public class InvokeAiRocmProfile : RocmPackageProfile
{
    private const string TritonWindowsPackage = "triton-windows";
    private const string BitsAndBytesWheelUrl = "https://github.com/0xDELUXA/...whl";

    public InvokeAiRocmProfile(PyVersion pyVersion)
    {
        var postTorch = pyVersion is { Major: 3, Minor: 12 }
            ? new[] { TritonWindowsPackage, BitsAndBytesWheelUrl }
            : [TritonWindowsPackage];
        InstallConfig = new PipInstallConfig { PostTorchInstallPipArgs = [.. postTorch] };
    }
}

and all the forwarders that are identical across packages move onto the helper, taking the profile instance as an arg perhaps:

// IRocmPackageHelper

bool ShouldApplyLaunchEnvironment(TorchIndex selectedTorchIndex);

IReadOnlyList<string> GetLaunchNoticeLines(TorchIndex selectedTorchIndex);

// BuildLaunchEnvironment(RocmPackageProfile) already exists

Call site then collapses from InvokeAiWindowsRocmProfile.BuildLaunchEnvironment(rocmPackageHelper) to rocmPackageHelper.BuildLaunchEnvironment(profile).

For Comfy's COMFYUI_ENABLE_MIOPEN (the modern-arch env that Swarm needs to pick up), that becomes the BuildExtraEnvironment override on ComfyRocmProfile, and Swarm just holds/references the same ComfyRocmProfile instance, making that work without a behavior-bearing static class.

Actual package specific stuff can be overrides: reForge's ApplyWindowsRocmLaunchDefaults (CUDA Malloc/Stream disabling, Reforge.cs:61) and GetPreferredCrossAttentionArgument are virtual members on the base overridden in ReforgeRocmProfile. So we just need to move the five identical ones to the helper, make the package specific ones overrides.

- Change IRocmPackageHelper parameter from nullable with default null to
  non-nullable in ComfyUI, InvokeAI, and Reforge constructors
- Remove null-throw guard blocks in ComfyUI.InstallPackage,
  InvokeAI.InstallPackage, and Reforge.InstallPackage — PackageFactory
  always injects a non-null instance, making these unreachable
- Remove dead null-return guards in ComfyUI.EmitWindowsRocmLaunchNotice
  and ComfyUI.GetEnvVars
- Collapse InvokeAI null-throw pattern (?? throw) on torch install call
  to a direct rocmPackageHelper method call
- Remove nullHelperMessage parameter and ?? throw expression from ComfyUI.RunWindowsRocmPackageCommandAsync and all four call sites
- Update BaseGitPackage.HasWindowsRocmSupport and GetWindowsRocmCompatibility signatures to non-nullable; remove dead
  null branch from the compatibility check condition
- Remove null guards, ?? new Dictionary<>(), and ?? [] fallbacks from InvokeAiWindowsRocmProfile and ReforgeWindowsRocmProfile static methods
- Add IRocmPackageHelper parameter to ComfyZluda and forward to ComfyUI base constructor to satisfy the now-required parameter; ZLUDA does not use it
…LaunchArgMigrationTests

- Add IRocmPackageHelper parameter to TestComfyUI in ComfyLaunchArgMigrationTests and pass an NSubstitute substitute to satisfy the now-required base constructor parameter
… helper methods

- Remove static forwarder methods (GetCompatibility, HasSupport, ShouldApplyLaunchEnvironment/ShouldUseWindowsNativeInstall, BuildLaunchEnvironment, GetLaunchNoticeLines) from InvokeAiWindowsRocmProfile and ReforgeWindowsRocmProfile
- Add ShouldApplyWindowsLaunchEnvironment(TorchIndex) and GetWindowsLaunchNoticeLines(TorchIndex) to IRocmPackageHelper and RocmPackageHelper; make no-arg overload private
- Update InvokeAI and Reforge call sites to use IRocmPackageHelper directly
- Migrate ComfyUI.EmitWindowsRocmLaunchNotice to use GetWindowsLaunchNoticeLines(TorchIndex), removing ShouldShowWindowsRocmLaunchNotice
…ymorphic classes

- replace static *WindowsRocmProfile wrappers with concrete subclasses of RocmPackageProfile
- introduce Default singleton instances for Comfy, InvokeAI, reForge, and Wan2GP profile classes
- remove all remaining .Profile static call sites and switch package flows to profile instances
- add virtual hooks to RocmPackageProfile for package-specific launch defaults and cross-attention selection
- migrate reForge ROCm launch customization to profile overrides (ApplyWindowsRocmLaunchDefaults, GetPreferredCrossAttentionArgument)
- keep install-time dynamic profile factories (CreateProfile, CreateInstallProfile) for per-package/per-python install config
- preserve helper-centric ROCm orchestration by continuing to route env/install decisions through IRocmPackageHelper
@NeuralFault
Copy link
Copy Markdown
Contributor Author

@ionite34 The requested changes have been addressed. Let me know if there is anything further needed.

@NeuralFault NeuralFault requested a review from ionite34 June 2, 2026 03:03
Copy link
Copy Markdown
Contributor

@mohnjiles mohnjiles left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@mohnjiles mohnjiles merged commit 75733d4 into LykosAI:main Jun 3, 2026
3 checks passed
@github-actions github-actions Bot locked and limited conversation to collaborators Jun 3, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants