[AMDGPU] fix amdgpu_max_num_work_groups in templates #141633

yxsamliu · 2025-05-27T17:05:48Z

Clang does not instantiate amdgpu_max_num_work_groups attribute with one template argument, causing
assertion codegen.

Fixes: #139570

llvmbot · 2025-05-27T17:06:24Z

@llvm/pr-subscribers-backend-amdgpu

Author: Yaxun (Sam) Liu (yxsamliu)

Changes

Clang does not instantiate amdgpu_max_num_work_groups attribute with one template argument, causing
assertion codegen.

Fixes: #139570

Full diff: https://github.com/llvm/llvm-project/pull/141633.diff

2 Files Affected:

(modified) clang/lib/Sema/SemaTemplateInstantiateDecl.cpp (+22-13)
(modified) clang/test/CodeGenCUDA/amdgpu-kernel-attrs.cu (+6)

diff --git a/clang/lib/Sema/SemaTemplateInstantiateDecl.cpp b/clang/lib/Sema/SemaTemplateInstantiateDecl.cpp
index 44700a446dfac..174c8fc59e4fa 100644
--- a/clang/lib/Sema/SemaTemplateInstantiateDecl.cpp
+++ b/clang/lib/Sema/SemaTemplateInstantiateDecl.cpp
@@ -648,21 +648,30 @@ static void instantiateDependentAMDGPUMaxNumWorkGroupsAttr(
   EnterExpressionEvaluationContext Unevaluated(
       S, Sema::ExpressionEvaluationContext::ConstantEvaluated);
 
-  ExprResult ResultX = S.SubstExpr(Attr.getMaxNumWorkGroupsX(), TemplateArgs);
-  if (!ResultX.isUsable())
-    return;
-  ExprResult ResultY = S.SubstExpr(Attr.getMaxNumWorkGroupsY(), TemplateArgs);
-  if (!ResultY.isUsable())
-    return;
-  ExprResult ResultZ = S.SubstExpr(Attr.getMaxNumWorkGroupsZ(), TemplateArgs);
-  if (!ResultZ.isUsable())
-    return;
+  Expr *XExpr = nullptr;
+  Expr *YExpr = nullptr;
+  Expr *ZExpr = nullptr;
+
+  if (Attr.getMaxNumWorkGroupsX()) {
+    ExprResult ResultX = S.SubstExpr(Attr.getMaxNumWorkGroupsX(), TemplateArgs);
+    if (ResultX.isUsable())
+      XExpr = ResultX.getAs<Expr>();
+  }
+
+  if (Attr.getMaxNumWorkGroupsY()) {
+    ExprResult ResultY = S.SubstExpr(Attr.getMaxNumWorkGroupsY(), TemplateArgs);
+    if (ResultY.isUsable())
+      YExpr = ResultY.getAs<Expr>();
+  }
 
-  Expr *XExpr = ResultX.getAs<Expr>();
-  Expr *YExpr = ResultY.getAs<Expr>();
-  Expr *ZExpr = ResultZ.getAs<Expr>();
+  if (Attr.getMaxNumWorkGroupsZ()) {
+    ExprResult ResultZ = S.SubstExpr(Attr.getMaxNumWorkGroupsZ(), TemplateArgs);
+    if (ResultZ.isUsable())
+      ZExpr = ResultZ.getAs<Expr>();
+  }
 
-  S.AMDGPU().addAMDGPUMaxNumWorkGroupsAttr(New, Attr, XExpr, YExpr, ZExpr);
+  if (XExpr)
+    S.AMDGPU().addAMDGPUMaxNumWorkGroupsAttr(New, Attr, XExpr, YExpr, ZExpr);
 }
 
 // This doesn't take any template parameters, but we have a custom action that
diff --git a/clang/test/CodeGenCUDA/amdgpu-kernel-attrs.cu b/clang/test/CodeGenCUDA/amdgpu-kernel-attrs.cu
index 253ac0898f546..ced0059e69d9b 100644
--- a/clang/test/CodeGenCUDA/amdgpu-kernel-attrs.cu
+++ b/clang/test/CodeGenCUDA/amdgpu-kernel-attrs.cu
@@ -78,6 +78,12 @@ __global__ void template_32_4_a_max_num_work_groups() {}
 template __global__ void template_32_4_a_max_num_work_groups<2>();
 // CHECK: define{{.*}} amdgpu_kernel void @_Z35template_32_4_a_max_num_work_groupsILj2EEvv() [[MAX_NUM_WORK_GROUPS_32_4_2:#[0-9]+]]
 
+template<unsigned a>
+__attribute__((amdgpu_max_num_work_groups(a)))
+__global__ void template_a_max_num_work_groups() {}
+template __global__ void template_a_max_num_work_groups<32>();
+// CHECK: define{{.*}} amdgpu_kernel void @_Z30template_a_max_num_work_groupsILj32EEvv() [[MAX_NUM_WORK_GROUPS_32_1_1]]
+
 // Make sure this is silently accepted on other targets.
 // NAMD-NOT: "amdgpu-flat-work-group-size"
 // NAMD-NOT: "amdgpu-waves-per-eu"

llvmbot · 2025-05-27T17:06:25Z

@llvm/pr-subscribers-clang

Author: Yaxun (Sam) Liu (yxsamliu)

Changes

Clang does not instantiate amdgpu_max_num_work_groups attribute with one template argument, causing
assertion codegen.

Fixes: #139570

Full diff: https://github.com/llvm/llvm-project/pull/141633.diff

2 Files Affected:

(modified) clang/lib/Sema/SemaTemplateInstantiateDecl.cpp (+22-13)
(modified) clang/test/CodeGenCUDA/amdgpu-kernel-attrs.cu (+6)

diff --git a/clang/lib/Sema/SemaTemplateInstantiateDecl.cpp b/clang/lib/Sema/SemaTemplateInstantiateDecl.cpp
index 44700a446dfac..174c8fc59e4fa 100644
--- a/clang/lib/Sema/SemaTemplateInstantiateDecl.cpp
+++ b/clang/lib/Sema/SemaTemplateInstantiateDecl.cpp
@@ -648,21 +648,30 @@ static void instantiateDependentAMDGPUMaxNumWorkGroupsAttr(
   EnterExpressionEvaluationContext Unevaluated(
       S, Sema::ExpressionEvaluationContext::ConstantEvaluated);
 
-  ExprResult ResultX = S.SubstExpr(Attr.getMaxNumWorkGroupsX(), TemplateArgs);
-  if (!ResultX.isUsable())
-    return;
-  ExprResult ResultY = S.SubstExpr(Attr.getMaxNumWorkGroupsY(), TemplateArgs);
-  if (!ResultY.isUsable())
-    return;
-  ExprResult ResultZ = S.SubstExpr(Attr.getMaxNumWorkGroupsZ(), TemplateArgs);
-  if (!ResultZ.isUsable())
-    return;
+  Expr *XExpr = nullptr;
+  Expr *YExpr = nullptr;
+  Expr *ZExpr = nullptr;
+
+  if (Attr.getMaxNumWorkGroupsX()) {
+    ExprResult ResultX = S.SubstExpr(Attr.getMaxNumWorkGroupsX(), TemplateArgs);
+    if (ResultX.isUsable())
+      XExpr = ResultX.getAs<Expr>();
+  }
+
+  if (Attr.getMaxNumWorkGroupsY()) {
+    ExprResult ResultY = S.SubstExpr(Attr.getMaxNumWorkGroupsY(), TemplateArgs);
+    if (ResultY.isUsable())
+      YExpr = ResultY.getAs<Expr>();
+  }
 
-  Expr *XExpr = ResultX.getAs<Expr>();
-  Expr *YExpr = ResultY.getAs<Expr>();
-  Expr *ZExpr = ResultZ.getAs<Expr>();
+  if (Attr.getMaxNumWorkGroupsZ()) {
+    ExprResult ResultZ = S.SubstExpr(Attr.getMaxNumWorkGroupsZ(), TemplateArgs);
+    if (ResultZ.isUsable())
+      ZExpr = ResultZ.getAs<Expr>();
+  }
 
-  S.AMDGPU().addAMDGPUMaxNumWorkGroupsAttr(New, Attr, XExpr, YExpr, ZExpr);
+  if (XExpr)
+    S.AMDGPU().addAMDGPUMaxNumWorkGroupsAttr(New, Attr, XExpr, YExpr, ZExpr);
 }
 
 // This doesn't take any template parameters, but we have a custom action that
diff --git a/clang/test/CodeGenCUDA/amdgpu-kernel-attrs.cu b/clang/test/CodeGenCUDA/amdgpu-kernel-attrs.cu
index 253ac0898f546..ced0059e69d9b 100644
--- a/clang/test/CodeGenCUDA/amdgpu-kernel-attrs.cu
+++ b/clang/test/CodeGenCUDA/amdgpu-kernel-attrs.cu
@@ -78,6 +78,12 @@ __global__ void template_32_4_a_max_num_work_groups() {}
 template __global__ void template_32_4_a_max_num_work_groups<2>();
 // CHECK: define{{.*}} amdgpu_kernel void @_Z35template_32_4_a_max_num_work_groupsILj2EEvv() [[MAX_NUM_WORK_GROUPS_32_4_2:#[0-9]+]]
 
+template<unsigned a>
+__attribute__((amdgpu_max_num_work_groups(a)))
+__global__ void template_a_max_num_work_groups() {}
+template __global__ void template_a_max_num_work_groups<32>();
+// CHECK: define{{.*}} amdgpu_kernel void @_Z30template_a_max_num_work_groupsILj32EEvv() [[MAX_NUM_WORK_GROUPS_32_1_1]]
+
 // Make sure this is silently accepted on other targets.
 // NAMD-NOT: "amdgpu-flat-work-group-size"
 // NAMD-NOT: "amdgpu-waves-per-eu"

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

clang/test/CodeGenCUDA/amdgpu-kernel-attrs.cu

Clang does not instantiate amdgpu_max_num_work_groups attribute with one template argument, causing assertion codegen. Fixes: llvm#139570

yxsamliu requested review from jayfoad, arsenm, shiltian and Copilot May 27, 2025 17:05

llvmbot added clang Clang issues not falling into any other category backend:AMDGPU clang:frontend Language frontend issues, e.g. anything involving "Sema" labels May 27, 2025

Copilot AI reviewed May 27, 2025

View reviewed changes

arsenm reviewed May 27, 2025

View reviewed changes

clang/test/CodeGenCUDA/amdgpu-kernel-attrs.cu Show resolved Hide resolved

[AMDGPU] fix amdgpu_max_num_work_groups in templates

e8692fe

Clang does not instantiate amdgpu_max_num_work_groups attribute with one template argument, causing assertion codegen. Fixes: llvm#139570

yxsamliu force-pushed the max-wg-depend branch from f4849e0 to e8692fe Compare May 28, 2025 02:15

arsenm approved these changes May 28, 2025

View reviewed changes

yxsamliu merged commit 54ddbc6 into llvm:main May 28, 2025
11 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[AMDGPU] fix amdgpu_max_num_work_groups in templates #141633

[AMDGPU] fix amdgpu_max_num_work_groups in templates #141633

Uh oh!

yxsamliu commented May 27, 2025

Uh oh!

llvmbot commented May 27, 2025

Uh oh!

llvmbot commented May 27, 2025

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

[AMDGPU] fix amdgpu_max_num_work_groups in templates #141633

[AMDGPU] fix amdgpu_max_num_work_groups in templates #141633

Uh oh!

Conversation

yxsamliu commented May 27, 2025

Uh oh!

llvmbot commented May 27, 2025

Uh oh!

llvmbot commented May 27, 2025

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!