[AMDGPU] Add SubtargetFeature for dynamic VGPR mode #130030

rovka · 2025-03-06T09:34:08Z

This represents a hardware mode supported only for wave32 compute shaders. When enabled, we set the `.dynamic_vgpr_en` field of `.compute_registers` to true in the PAL metadata.
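
The behavior described above can be illustrated with a small standalone sketch. It is not the LLVM implementation: `ComputeRegisters` and `emitComputeRegisters` are hypothetical stand-ins for the PAL metadata writer, and only the gating logic is modeled, assuming the key is written solely when the mode is enabled.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for the `.compute_registers` map in the PAL metadata.
using ComputeRegisters = std::map<std::string, bool>;

// Models the gating only: `.dynamic_vgpr_en` is written when the mode is
// enabled and is left out entirely (rather than set to false) otherwise.
ComputeRegisters emitComputeRegisters(bool DynamicVGPREnabled) {
  ComputeRegisters Regs;
  Regs[".tg_size_en"] = true; // Unrelated field, present in both cases.
  if (DynamicVGPREnabled)
    Regs[".dynamic_vgpr_en"] = true;
  return Regs;
}
```

Under this model, `emitComputeRegisters(false)` contains no `.dynamic_vgpr_en` entry at all, which matches the test using a `DVGPR-NEXT` line only for the `-mattr=+dynamic-vgpr` run.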

llvmbot · 2025-03-06T09:34:30Z

@llvm/pr-subscribers-backend-amdgpu

Author: Diana Picus (rovka)

Changes

This represents a hardware mode supported only for wave32 compute shaders. When enabled, we set the `.dynamic_vgpr_en` field of `.compute_registers` to true in the PAL metadata.

Full diff: https://github.com/llvm/llvm-project/pull/130030.diff

5 Files Affected:

(modified) llvm/docs/AMDGPUUsage.rst (+6)
(modified) llvm/lib/Target/AMDGPU/AMDGPU.td (+6)
(modified) llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp (+3)
(modified) llvm/lib/Target/AMDGPU/GCNSubtarget.h (+3)
(modified) llvm/test/CodeGen/AMDGPU/pal-metadata-3.0.ll (+8-5)

diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index def6addd595e8..59cc08a59ed7c 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -758,6 +758,12 @@ For example:
                                                   enabled will execute correctly but may be less
                                                   performant than code generated for XNACK replay
                                                   disabled.
+
+     dynamic-vgpr    TODO                         Represents the "Dynamic VGPR" hardware mode, introduced in GFX12.
+                                                  Waves launched in this mode may allocate or deallocate the VGPRs
+                                                  using dedicated instructions, but may not send the DEALLOC_VGPRS
+                                                  message.
+
      =============== ============================ ==================================================
 
 .. _amdgpu-target-id:
diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.td b/llvm/lib/Target/AMDGPU/AMDGPU.td
index effc8d2ed6b49..31a98ee132bf6 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPU.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPU.td
@@ -1239,6 +1239,12 @@ def FeatureXF32Insts : SubtargetFeature<"xf32-insts",
    "v_mfma_f32_16x16x8_xf32 and v_mfma_f32_32x32x4_xf32"
  >;
 
+def FeatureDynamicVGPR : SubtargetFeature <"dynamic-vgpr",
+  "DynamicVGPR",
+  "true",
+  "Enable dynamic VGPR mode"
+>;
+
 // Dummy feature used to disable assembler instructions.
 def FeatureDisable : SubtargetFeature<"",
   "FeatureDisable","true",
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
index 31e0bd8d652bc..13e61756e3036 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
@@ -1414,6 +1414,9 @@ static void EmitPALMetadataCommon(AMDGPUPALMetadata *MD,
     MD->setHwStage(CC, ".trap_present",
                    (bool)CurrentProgramInfo.TrapHandlerEnable);
     MD->setHwStage(CC, ".excp_en", CurrentProgramInfo.EXCPEnable);
+
+    if (ST.isDynamicVGPREnabled())
+      MD->setComputeRegisters(".dynamic_vgpr_en", true);
   }
 
   MD->setHwStage(CC, ".lds_size",
diff --git a/llvm/lib/Target/AMDGPU/GCNSubtarget.h b/llvm/lib/Target/AMDGPU/GCNSubtarget.h
index 6664a70572ded..1254cbad83b60 100644
--- a/llvm/lib/Target/AMDGPU/GCNSubtarget.h
+++ b/llvm/lib/Target/AMDGPU/GCNSubtarget.h
@@ -190,6 +190,7 @@ class GCNSubtarget final : public AMDGPUGenSubtargetInfo,
   /// indicates a lack of S_CLAUSE support.
   unsigned MaxHardClauseLength = 0;
   bool SupportsSRAMECC = false;
+  bool DynamicVGPR = false;
 
   // This should not be used directly. 'TargetID' tracks the dynamic settings
   // for SRAMECC.
@@ -1647,6 +1648,8 @@ class GCNSubtarget final : public AMDGPUGenSubtargetInfo,
     return true;
   }
 
+  bool isDynamicVGPREnabled() const { return DynamicVGPR; }
+
   bool requiresDisjointEarlyClobberAndUndef() const override {
     // AMDGPU doesn't care if early-clobber and undef operands are allocated
     // to the same register.
diff --git a/llvm/test/CodeGen/AMDGPU/pal-metadata-3.0.ll b/llvm/test/CodeGen/AMDGPU/pal-metadata-3.0.ll
index 7536e83a9da6b..fa22089978c2e 100644
--- a/llvm/test/CodeGen/AMDGPU/pal-metadata-3.0.ll
+++ b/llvm/test/CodeGen/AMDGPU/pal-metadata-3.0.ll
@@ -1,4 +1,6 @@
-; RUN: llc -mtriple=amdgcn--amdpal -mcpu=gfx1100 <%s | FileCheck %s
+; RUN: llc -mtriple=amdgcn--amdpal -mcpu=gfx1100 <%s | FileCheck %s --check-prefixes=CHECK,GFX11
+; RUN: llc -mtriple=amdgcn--amdpal -mcpu=gfx1200 <%s | FileCheck %s --check-prefixes=CHECK
+; RUN: llc -mtriple=amdgcn--amdpal -mcpu=gfx1200 -mattr=+dynamic-vgpr <%s | FileCheck %s --check-prefixes=CHECK,DVGPR
 
 ; CHECK-LABEL: {{^}}_amdgpu_cs_main:
 ; CHECK: ; TotalNumSgprs: 4
@@ -8,6 +10,7 @@
 ; CHECK-NEXT: amdpal.pipelines:
 ; CHECK-NEXT:   - .api:            Vulkan
 ; CHECK-NEXT:     .compute_registers:
+; DVGPR-NEXT:       .dynamic_vgpr_en:   true
 ; CHECK-NEXT:       .tg_size_en:     true
 ; CHECK-NEXT:       .tgid_x_en:      false
 ; CHECK-NEXT:       .tgid_y_en:      false
@@ -57,7 +60,7 @@
 ; CHECK-NEXT:        .entry_point_symbol:    _amdgpu_cs_main
 ; CHECK-NEXT:        .excp_en:        0
 ; CHECK-NEXT:        .float_mode:     0xc0
-; CHECK-NEXT:        .ieee_mode:      false
+; GFX11-NEXT:        .ieee_mode:      false
 ; CHECK-NEXT:        .image_op:       false
 ; CHECK-NEXT:        .lds_size:       0
 ; CHECK-NEXT:        .mem_ordered:    true
@@ -112,7 +115,7 @@
 ; CHECK-NEXT:        .debug_mode:     false
 ; CHECK-NEXT:        .entry_point:    _amdgpu_gs
 ; CHECK-NEXT:        .entry_point_symbol:    gs_shader
-; CHECK-NEXT:        .ieee_mode:      false
+; GFX11-NEXT:        .ieee_mode:      false
 ; CHECK-NEXT:        .lds_size:       0x200
 ; CHECK-NEXT:        .mem_ordered:    true
 ; CHECK-NEXT:        .scratch_en:     false
@@ -124,7 +127,7 @@
 ; CHECK-NEXT:        .debug_mode:     false
 ; CHECK-NEXT:        .entry_point:    _amdgpu_hs
 ; CHECK-NEXT:        .entry_point_symbol:    hs_shader
-; CHECK-NEXT:        .ieee_mode:      false
+; GFX11-NEXT:        .ieee_mode:      false
 ; CHECK-NEXT:        .lds_size:       0x1000
 ; CHECK-NEXT:        .mem_ordered:    true
 ; CHECK-NEXT:        .scratch_en:     false
@@ -136,7 +139,7 @@
 ; CHECK-NEXT:        .debug_mode:     false
 ; CHECK-NEXT:        .entry_point:    _amdgpu_ps
 ; CHECK-NEXT:        .entry_point_symbol:    ps_shader
-; CHECK-NEXT:        .ieee_mode:      false
+; GFX11-NEXT:        .ieee_mode:      false
 ; CHECK-NEXT:        .lds_size:       0
 ; CHECK-NEXT:        .mem_ordered:    true
 ; CHECK-NEXT:        .scratch_en:     false

shiltian · 2025-03-06T23:47:17Z

llvm/lib/Target/AMDGPU/AMDGPU.td

@@ -1239,6 +1239,12 @@ def FeatureXF32Insts : SubtargetFeature<"xf32-insts",
   "v_mfma_f32_16x16x8_xf32 and v_mfma_f32_32x32x4_xf32"
 >;

+def FeatureDynamicVGPR : SubtargetFeature <"dynamic-vgpr",


Where is this target feature enabled?

perlfu · 2025-03-07

My understanding is that the graphics front-end adds this target feature as required. This is similar to cumode and xnack.

rovka · 2025-03-07

That's right, this is enabled from above the backend.
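
As a rough illustration of what "enabled from above the backend" could look like, the sketch below appends the feature to a comma-separated feature string of the kind passed via `-mattr` in the test. `addFeature` is a hypothetical helper, not an LLVM API, and how a real front-end stores or forwards the string is an assumption left out here.

```cpp
#include <string>

// Hypothetical helper: append "+<Name>" to a comma-separated feature string
// such as the one given to llc's -mattr option.
std::string addFeature(std::string Features, const std::string &Name) {
  if (!Features.empty())
    Features += ','; // Separate from the features already present.
  Features += '+';   // '+' enables a feature.
  Features += Name;
  return Features;
}
```

For example, `addFeature("+cumode", "dynamic-vgpr")` yields `+cumode,+dynamic-vgpr`.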

arsenm · 2025-03-07

It seems to me this should be a separate attribute; it's not really part of the target.

rovka · 2025-03-07T10:12:10Z

> It seems to me this should be a separate attribute; it's not really part of the target.

I don't really disagree with that. I think I made it a feature because it's kind of a hardware mode (i.e. it takes over a CU, and waves that use dynamic VGPRs can't be mixed with waves that don't). If you feel strongly about this, I can make it an attribute. We already have places downstream that use it as a target feature, so I would like to have the subtarget feature as well while we migrate away from it. Should I add an attribute too in this PR?
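
The two constraints mentioned in this thread (wave32 compute shaders only, and no mixing with waves that do not use dynamic VGPRs) can be written down as a toy model. This is only a sketch of the stated rules: `WaveInfo`, `isValidDynamicVGPRUse` and `canShareCU` are hypothetical names, and nothing in this PR adds such checks to the backend.

```cpp
// Hypothetical description of a wave; not an LLVM or PAL type.
struct WaveInfo {
  unsigned WaveSize;    // 32 or 64 lanes.
  bool IsCompute;       // True for compute shaders.
  bool UsesDynamicVGPR; // True if launched in dynamic VGPR mode.
};

// Per the PR description, the mode is supported only for wave32 compute
// shaders; waves that do not use it are always acceptable.
bool isValidDynamicVGPRUse(const WaveInfo &W) {
  return !W.UsesDynamicVGPR || (W.WaveSize == 32 && W.IsCompute);
}

// Per the comment above, waves that use dynamic VGPRs cannot be mixed with
// waves that do not, so both waves must agree on the mode.
bool canShareCU(const WaveInfo &A, const WaveInfo &B) {
  return A.UsesDynamicVGPR == B.UsesDynamicVGPR;
}
```

Because the mode applies to the whole CU in this model, it behaves like a hardware mode rather than a per-instruction capability, which is the motivation given above for making it a subtarget feature in the first place.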

rovka · 2025-03-17T10:44:45Z

Ping. Can the attribute wait until next week or so? :D

arsenm · 2025-03-18

It probably should be a separate attribute. Does this have any impact on non-entry functions?

rovka · 2025-03-18T10:46:22Z

> It probably should be a separate attribute. Does this have any impact on non-entry functions?

Yes, I updated the test for non-entry functions. Thanks!

rovka added the backend:AMDGPU label Mar 6, 2025

rovka requested review from jayfoad, arsenm, perlfu, shiltian, mbrkusanin and mariusz-sikora-at-amd March 6, 2025 09:34

shiltian reviewed Mar 6, 2025

arsenm reviewed Mar 7, 2025

rovka changed the base branch from users/rovka/dvgpr-1 to main March 10, 2025 04:09

[AMDGPU] Add SubtargetFeature for dynamic VGPR mode

5f73d9e

This represents a hardware mode supported only for wave32 compute shaders. When enabled, we set the `.dynamic_vgpr_en` field of `.compute_registers` to true in the PAL metadata.

rovka force-pushed the users/rovka/dvgpr-2 branch from b2a7bdc to 5f73d9e Compare March 10, 2025 04:32

qiaojbao added a commit to GPUOpen-Drivers/llvm-project that referenced this pull request Mar 13, 2025

[AMDGPU] Add SubtargetFeature for dynamic VGPR mode llvm#130030

7ca3f36

arsenm approved these changes Mar 18, 2025

Update test for non-entry func

3927762

rovka merged commit 0a21ef9 into main Mar 18, 2025
7 of 10 checks passed

rovka deleted the users/rovka/dvgpr-2 branch March 18, 2025 10:48

rovka added a commit to rovka/llvm-project that referenced this pull request Mar 28, 2025

[AMDGPU] Replace dynamic VGPR feature with attribute

ab3edd0

Use a function attribute (amdgpu-dynamic-vgpr) instead of a subtarget feature, as requested in llvm#130030.

rovka mentioned this pull request Mar 28, 2025

[AMDGPU] Replace dynamic VGPR feature with attribute #133444

Merged

rovka added a commit that referenced this pull request Jun 24, 2025

[AMDGPU] Replace dynamic VGPR feature with attribute (#133444)

a201f88

Use a function attribute (amdgpu-dynamic-vgpr) instead of a subtarget feature, as requested in #130030.

DrSergei pushed a commit to DrSergei/llvm-project that referenced this pull request Jun 24, 2025

[AMDGPU] Replace dynamic VGPR feature with attribute (llvm#133444)

012c508

Use a function attribute (amdgpu-dynamic-vgpr) instead of a subtarget feature, as requested in llvm#130030.

anthonyhatran pushed a commit to anthonyhatran/llvm-project that referenced this pull request Jun 26, 2025

[AMDGPU] Replace dynamic VGPR feature with attribute (llvm#133444)

07f2058

Use a function attribute (amdgpu-dynamic-vgpr) instead of a subtarget feature, as requested in llvm#130030.
