Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL] Basic code generation for SYCL kernel caller offload entry point functions. #133030

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

tahonermann
Copy link
Contributor

A function declared with the sycl_kernel_entry_point attribute, sometimes called a SYCL kernel entry point function, specifies a pattern from which the parameters and body of an offload entry point function, sometimes called a SYCL kernel caller function, are derived.

SYCL kernel caller functions are emitted during SYCL device compilation. Their parameters and body are derived from the SYCLKernelCallStmt statement and OutlinedFunctionDecl declaration associated with their corresponding SYCL kernel entry point function. A distinct SYCL kernel caller function is generated for each SYCL kernel entry point function defined as a non-inline function or ODR-used in the translation unit.

The name of each SYCL kernel caller function is parameterized by the SYCL kernel name type specified by the sycl_kernel_entry_point attribute attached to the corresponding SYCL kernel entry point function. For the moment, the Itanium ABI mangled name for typeinfo data (_ZTS<type>) is used to name these functions; a future change will switch to a more appropriate naming scheme.

The calling convention used for a SYCL kernel caller function is target dependent. Support for AMDGCN, NVPTX, and SPIR targets is currently provided. These functions are required to observe the language restrictions for SYCL devices as specified by the SYCL 2020 specification; this includes a forward progress guarantee and prohibits recursion.

Only SYCL kernel caller functions, functions declared as SYCL_EXTERNAL, and functions directly or indirectly referenced from those functions should be emitted during device compilation. Pruning of other declarations has not yet been implemented.

…nt functions.

A function declared with the `sycl_kernel_entry_point` attribute, sometimes
called a SYCL kernel entry point function, specifies a pattern from which the
parameters and body of an offload entry point function, sometimes called a
SYCL kernel caller function, are derived.

SYCL kernel caller functions are emitted during SYCL device compilation.
Their parameters and body are derived from the `SYCLKernelCallStmt` statement
and `OutlinedFunctionDecl` declaration associated with their corresponding
SYCL kernel entry point function. A distinct SYCL kernel caller function is
generated for each SYCL kernel entry point function defined as a non-inline
function or ODR-used in the translation unit.

The name of each SYCL kernel caller function is parameterized by the SYCL
kernel name type specified by the `sycl_kernel_entry_point` attribute attached
to the corresponding SYCL kernel entry point function. For the moment, the
Itanium ABI mangled name for typeinfo data (`_ZTS<type>`) is used to name
these functions; a future change will switch to a more appropriate naming
scheme.

The calling convention used for a SYCL kernel caller function is target
dependent. Support for AMDGCN, NVPTX, and SPIR targets is currently provided.
These functions are required to observe the language restrictions for SYCL
devices as specified by the SYCL 2020 specification; this includes a forward
progress guarantee and prohibits recursion.

Only SYCL kernel caller functions, functions declared as `SYCL_EXTERNAL`, and
functions directly or indirectly referenced from those functions should be
emitted during device compilation. Pruning of other declarations has not yet
been implemented.

Co-authored-by: Tom Honermann <tom.honermann@intel.com>
@tahonermann tahonermann added the SYCL https://registry.khronos.org/SYCL label Mar 26, 2025
@tahonermann tahonermann self-assigned this Mar 26, 2025
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:codegen IR generation bugs: mangling, exceptions, etc. labels Mar 26, 2025
@llvmbot
Copy link
Member

llvmbot commented Mar 26, 2025

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-codegen

Author: Tom Honermann (tahonermann)

Changes

A function declared with the sycl_kernel_entry_point attribute, sometimes called a SYCL kernel entry point function, specifies a pattern from which the parameters and body of an offload entry point function, sometimes called a SYCL kernel caller function, are derived.

SYCL kernel caller functions are emitted during SYCL device compilation. Their parameters and body are derived from the SYCLKernelCallStmt statement and OutlinedFunctionDecl declaration associated with their corresponding SYCL kernel entry point function. A distinct SYCL kernel caller function is generated for each SYCL kernel entry point function defined as a non-inline function or ODR-used in the translation unit.

The name of each SYCL kernel caller function is parameterized by the SYCL kernel name type specified by the sycl_kernel_entry_point attribute attached to the corresponding SYCL kernel entry point function. For the moment, the Itanium ABI mangled name for typeinfo data (_ZTS&lt;type&gt;) is used to name these functions; a future change will switch to a more appropriate naming scheme.

The calling convention used for a SYCL kernel caller function is target dependent. Support for AMDGCN, NVPTX, and SPIR targets is currently provided. These functions are required to observe the language restrictions for SYCL devices as specified by the SYCL 2020 specification; this includes a forward progress guarantee and prohibits recursion.

Only SYCL kernel caller functions, functions declared as SYCL_EXTERNAL, and functions directly or indirectly referenced from those functions should be emitted during device compilation. Pruning of other declarations has not yet been implemented.


Patch is 24.24 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/133030.diff

10 Files Affected:

  • (modified) clang/include/clang/AST/SYCLKernelInfo.h (+6-2)
  • (modified) clang/lib/AST/ASTContext.cpp (+40-4)
  • (modified) clang/lib/CodeGen/CGCall.cpp (+10)
  • (modified) clang/lib/CodeGen/CMakeLists.txt (+1)
  • (modified) clang/lib/CodeGen/CodeGenModule.cpp (+25)
  • (modified) clang/lib/CodeGen/CodeGenModule.h (+5)
  • (added) clang/lib/CodeGen/CodeGenSYCL.cpp (+73)
  • (modified) clang/lib/CodeGen/CodeGenTypes.h (+6)
  • (modified) clang/lib/CodeGen/Targets/NVPTX.cpp (+4)
  • (added) clang/test/CodeGenSYCL/kernel-caller-entry-point.cpp (+177)
diff --git a/clang/include/clang/AST/SYCLKernelInfo.h b/clang/include/clang/AST/SYCLKernelInfo.h
index 4a4827e601053..3825af86c14e3 100644
--- a/clang/include/clang/AST/SYCLKernelInfo.h
+++ b/clang/include/clang/AST/SYCLKernelInfo.h
@@ -22,9 +22,10 @@ namespace clang {
 class SYCLKernelInfo {
 public:
   SYCLKernelInfo(CanQualType KernelNameType,
-                 const FunctionDecl *KernelEntryPointDecl)
+                 const FunctionDecl *KernelEntryPointDecl,
+                 const std::string &KernelName)
       : KernelNameType(KernelNameType),
-        KernelEntryPointDecl(KernelEntryPointDecl) {}
+        KernelEntryPointDecl(KernelEntryPointDecl), KernelName(KernelName) {}
 
   CanQualType getKernelNameType() const { return KernelNameType; }
 
@@ -32,9 +33,12 @@ class SYCLKernelInfo {
     return KernelEntryPointDecl;
   }
 
+  const std::string &GetKernelName() const { return KernelName; }
+
 private:
   CanQualType KernelNameType;
   const FunctionDecl *KernelEntryPointDecl;
+  std::string KernelName;
 };
 
 } // namespace clang
diff --git a/clang/lib/AST/ASTContext.cpp b/clang/lib/AST/ASTContext.cpp
index c9d1bea4c623a..0945eeea4d0f7 100644
--- a/clang/lib/AST/ASTContext.cpp
+++ b/clang/lib/AST/ASTContext.cpp
@@ -12816,6 +12816,15 @@ bool ASTContext::DeclMustBeEmitted(const Decl *D) {
     if (!FD->doesThisDeclarationHaveABody())
       return FD->doesDeclarationForceExternallyVisibleDefinition();
 
+    // Function definitions with the sycl_kernel_entry_point attribute are
+    // required during device compilation so that SYCL kernel caller offload
+    // entry points are emitted.
+    if (LangOpts.SYCLIsDevice && FD->hasAttr<SYCLKernelEntryPointAttr>())
+      return true;
+
+    // FIXME: Functions declared with SYCL_EXTERNAL are required during
+    // FIXME: device compilation.
+
     // Constructors and destructors are required.
     if (FD->hasAttr<ConstructorAttr>() || FD->hasAttr<DestructorAttr>())
       return true;
@@ -14794,9 +14803,36 @@ void ASTContext::getFunctionFeatureMap(llvm::StringMap<bool> &FeatureMap,
   }
 }
 
-static SYCLKernelInfo BuildSYCLKernelInfo(CanQualType KernelNameType,
+static SYCLKernelInfo BuildSYCLKernelInfo(ASTContext &Context,
+                                          CanQualType KernelNameType,
                                           const FunctionDecl *FD) {
-  return {KernelNameType, FD};
+  // Host and device compilation may use different ABIs and different ABIs
+  // may allocate name mangling discriminators differently. A discriminator
+  // override is used to ensure consistent discriminator allocation across
+  // host and device compilation.
+  auto DeviceDiscriminatorOverrider =
+      [](ASTContext &Ctx, const NamedDecl *ND) -> std::optional<unsigned> {
+    if (const auto *RD = dyn_cast<CXXRecordDecl>(ND))
+      if (RD->isLambda())
+        return RD->getDeviceLambdaManglingNumber();
+    return std::nullopt;
+  };
+  std::unique_ptr<MangleContext> MC{ItaniumMangleContext::create(
+      Context, Context.getDiagnostics(), DeviceDiscriminatorOverrider)};
+
+  // Construct a mangled name for the SYCL kernel caller offload entry point.
+  // FIXME: The Itanium typeinfo mangling (_ZTS<type>) is currently used to
+  // FIXME: name the SYCL kernel caller offload entry point function. This
+  // FIXME: mangling does not suffice to clearly identify symbols that
+  // FIXME: correspond to SYCL kernel caller functions, nor is this mangling
+  // FIXME: natural for targets that use a non-Itanium ABI.
+  std::string Buffer;
+  Buffer.reserve(128);
+  llvm::raw_string_ostream Out(Buffer);
+  MC->mangleCanonicalTypeName(KernelNameType, Out);
+  std::string KernelName = Out.str();
+
+  return {KernelNameType, FD, KernelName};
 }
 
 void ASTContext::registerSYCLEntryPointFunction(FunctionDecl *FD) {
@@ -14817,8 +14853,8 @@ void ASTContext::registerSYCLEntryPointFunction(FunctionDecl *FD) {
           declaresSameEntity(FD, IT->second.getKernelEntryPointDecl())) &&
          "SYCL kernel name conflict");
   (void)IT;
-  SYCLKernels.insert(
-      std::make_pair(KernelNameType, BuildSYCLKernelInfo(KernelNameType, FD)));
+  SYCLKernels.insert(std::make_pair(
+      KernelNameType, BuildSYCLKernelInfo(*this, KernelNameType, FD)));
 }
 
 const SYCLKernelInfo &ASTContext::getSYCLKernelInfo(QualType T) const {
diff --git a/clang/lib/CodeGen/CGCall.cpp b/clang/lib/CodeGen/CGCall.cpp
index 7aa77e55dbfcc..8646ce02b85c4 100644
--- a/clang/lib/CodeGen/CGCall.cpp
+++ b/clang/lib/CodeGen/CGCall.cpp
@@ -732,6 +732,16 @@ CodeGenTypes::arrangeBuiltinFunctionDeclaration(CanQualType resultType,
                                  RequiredArgs::All);
 }
 
+const CGFunctionInfo &
+CodeGenTypes::arrangeSYCLKernelCallerDeclaration(QualType resultType,
+                                                 const FunctionArgList &args) {
+  auto argTypes = getArgTypesForDeclaration(Context, args);
+
+  return arrangeLLVMFunctionInfo(
+      GetReturnType(resultType), FnInfoOpts::None, argTypes,
+      FunctionType::ExtInfo(CC_OpenCLKernel), {}, RequiredArgs::All);
+}
+
 /// Arrange a call to a C++ method, passing the given arguments.
 ///
 /// numPrefixArgs is the number of ABI-specific prefix arguments we have. It
diff --git a/clang/lib/CodeGen/CMakeLists.txt b/clang/lib/CodeGen/CMakeLists.txt
index 94a908197d795..a6888ed955916 100644
--- a/clang/lib/CodeGen/CMakeLists.txt
+++ b/clang/lib/CodeGen/CMakeLists.txt
@@ -101,6 +101,7 @@ add_clang_library(clangCodeGen
   CodeGenFunction.cpp
   CodeGenModule.cpp
   CodeGenPGO.cpp
+  CodeGenSYCL.cpp
   CodeGenTBAA.cpp
   CodeGenTypes.cpp
   ConstantInitBuilder.cpp
diff --git a/clang/lib/CodeGen/CodeGenModule.cpp b/clang/lib/CodeGen/CodeGenModule.cpp
index 96ab4dd1837ce..88018a2c28382 100644
--- a/clang/lib/CodeGen/CodeGenModule.cpp
+++ b/clang/lib/CodeGen/CodeGenModule.cpp
@@ -3303,6 +3303,27 @@ void CodeGenModule::EmitDeferred() {
   CurDeclsToEmit.swap(DeferredDeclsToEmit);
 
   for (GlobalDecl &D : CurDeclsToEmit) {
+    // Functions declared with the sycl_kernel_entry_point attribute are
+    // emitted normally during host compilation. During device compilation,
+    // a SYCL kernel caller offload entry point function is generated and
+    // emitted in place of each of these functions.
+    if (const auto *FD = D.getDecl()->getAsFunction()) {
+      if (LangOpts.SYCLIsDevice && FD->hasAttr<SYCLKernelEntryPointAttr>() &&
+          FD->isDefined()) {
+        // Functions with an invalid sycl_kernel_entry_point attribute are
+        // ignored during device compilation.
+        if (!FD->getAttr<SYCLKernelEntryPointAttr>()->isInvalidAttr()) {
+          // Generate and emit the SYCL kernel caller function.
+          EmitSYCLKernelCaller(FD, getContext());
+          // Recurse to emit any symbols directly or indirectly referenced
+          // by the SYCL kernel caller function.
+          EmitDeferred();
+        }
+        // Do not emit the sycl_kernel_entry_point attributed function.
+        continue;
+      }
+    }
+
     // We should call GetAddrOfGlobal with IsForDefinition set to true in order
     // to get GlobalValue with exactly the type we need, not something that
     // might had been created for another decl with the same mangled name but
@@ -3638,6 +3659,10 @@ bool CodeGenModule::MayBeEmittedEagerly(const ValueDecl *Global) {
     // Defer until all versions have been semantically checked.
     if (FD->hasAttr<TargetVersionAttr>() && !FD->isMultiVersion())
       return false;
+    // Defer emission of SYCL kernel entry point functions during device
+    // compilation.
+    if (LangOpts.SYCLIsDevice && FD->hasAttr<SYCLKernelEntryPointAttr>())
+      return false;
   }
   if (const auto *VD = dyn_cast<VarDecl>(Global)) {
     if (Context.getInlineVariableDefinitionKind(VD) ==
diff --git a/clang/lib/CodeGen/CodeGenModule.h b/clang/lib/CodeGen/CodeGenModule.h
index 6deb467b2cc9f..45ccbdd4c44cf 100644
--- a/clang/lib/CodeGen/CodeGenModule.h
+++ b/clang/lib/CodeGen/CodeGenModule.h
@@ -1974,6 +1974,11 @@ class CodeGenModule : public CodeGenTypeCache {
   /// .gcda files in a way that persists in .bc files.
   void EmitCoverageFile();
 
+  /// Given a sycl_kernel_entry_point attributed function, emit the
+  /// corresponding SYCL kernel caller offload entry point function.
+  void EmitSYCLKernelCaller(const FunctionDecl *KernelEntryPointFn,
+                            ASTContext &Ctx);
+
   /// Determine whether the definition must be emitted; if this returns \c
   /// false, the definition can be emitted lazily if it's used.
   bool MustBeEmitted(const ValueDecl *D);
diff --git a/clang/lib/CodeGen/CodeGenSYCL.cpp b/clang/lib/CodeGen/CodeGenSYCL.cpp
new file mode 100644
index 0000000000000..fd332a804cf8d
--- /dev/null
+++ b/clang/lib/CodeGen/CodeGenSYCL.cpp
@@ -0,0 +1,73 @@
+//===--------- CodeGenSYCL.cpp - Code for SYCL kernel generation ----------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This contains code required for generation of SYCL kernel caller offload
+// entry point functions.
+//
+//===----------------------------------------------------------------------===//
+
+#include "CodeGenFunction.h"
+#include "CodeGenModule.h"
+
+using namespace clang;
+using namespace CodeGen;
+
+static void SetSYCLKernelAttributes(llvm::Function *Fn, CodeGenFunction &CGF) {
+  // SYCL 2020 device language restrictions require forward progress and
+  // disallow recursion.
+  Fn->setDoesNotRecurse();
+  if (CGF.checkIfFunctionMustProgress())
+    Fn->addFnAttr(llvm::Attribute::MustProgress);
+}
+
+void CodeGenModule::EmitSYCLKernelCaller(const FunctionDecl *KernelEntryPointFn,
+                                         ASTContext &Ctx) {
+  assert(Ctx.getLangOpts().SYCLIsDevice &&
+         "SYCL kernel caller offload entry point functions can only be emitted"
+         " during device compilation");
+
+  const auto *KernelEntryPointAttr =
+      KernelEntryPointFn->getAttr<SYCLKernelEntryPointAttr>();
+  assert(KernelEntryPointAttr && "Missing sycl_kernel_entry_point attribute");
+  assert(!KernelEntryPointAttr->isInvalidAttr() &&
+         "sycl_kernel_entry_point attribute is invalid");
+
+  // Find the SYCLKernelCallStmt.
+  SYCLKernelCallStmt *KernelCallStmt =
+      dyn_cast<SYCLKernelCallStmt>(KernelEntryPointFn->getBody());
+  assert(KernelCallStmt && "SYCLKernelCallStmt not found");
+
+  // Retrieve the SYCL kernel caller parameters from the OutlinedFunctionDecl.
+  FunctionArgList Args;
+  const OutlinedFunctionDecl *OutlinedFnDecl =
+      KernelCallStmt->getOutlinedFunctionDecl();
+  Args.append(OutlinedFnDecl->param_begin(), OutlinedFnDecl->param_end());
+
+  // Compute the function info and LLVM function type.
+  const CGFunctionInfo &FnInfo =
+      getTypes().arrangeSYCLKernelCallerDeclaration(Ctx.VoidTy, Args);
+  llvm::FunctionType *FnTy = getTypes().GetFunctionType(FnInfo);
+
+  // Retrieve the generated name for the SYCL kernel caller function.
+  CanQualType KernelNameType =
+      Ctx.getCanonicalType(KernelEntryPointAttr->getKernelName());
+  const SYCLKernelInfo &KernelInfo = Ctx.getSYCLKernelInfo(KernelNameType);
+  auto *Fn = llvm::Function::Create(FnTy, llvm::Function::ExternalLinkage,
+                                    KernelInfo.GetKernelName(), &getModule());
+
+  // Emit the SYCL kernel caller function.
+  CodeGenFunction CGF(*this);
+  SetLLVMFunctionAttributes(GlobalDecl(), FnInfo, Fn, false);
+  SetSYCLKernelAttributes(Fn, CGF);
+  CGF.StartFunction(GlobalDecl(), Ctx.VoidTy, Fn, FnInfo, Args,
+                    SourceLocation(), SourceLocation());
+  CGF.EmitFunctionBody(OutlinedFnDecl->getBody());
+  setDSOLocal(Fn);
+  SetLLVMFunctionAttributesForDefinition(cast<Decl>(OutlinedFnDecl), Fn);
+  CGF.FinishFunction();
+}
diff --git a/clang/lib/CodeGen/CodeGenTypes.h b/clang/lib/CodeGen/CodeGenTypes.h
index 5aebf9a212237..aac0c20aaa933 100644
--- a/clang/lib/CodeGen/CodeGenTypes.h
+++ b/clang/lib/CodeGen/CodeGenTypes.h
@@ -229,6 +229,12 @@ class CodeGenTypes {
   const CGFunctionInfo &arrangeBuiltinFunctionCall(QualType resultType,
                                                    const CallArgList &args);
 
+  /// A SYCL device kernel function is a free standing function with
+  /// spir_kernel calling convention
+  const CGFunctionInfo &
+  arrangeSYCLKernelCallerDeclaration(QualType resultType,
+                                     const FunctionArgList &args);
+
   /// Objective-C methods are C functions with some implicit parameters.
   const CGFunctionInfo &arrangeObjCMethodDeclaration(const ObjCMethodDecl *MD);
   const CGFunctionInfo &arrangeObjCMessageSendSignature(const ObjCMethodDecl *MD,
diff --git a/clang/lib/CodeGen/Targets/NVPTX.cpp b/clang/lib/CodeGen/Targets/NVPTX.cpp
index f617e645a9eaf..25ab28c54b659 100644
--- a/clang/lib/CodeGen/Targets/NVPTX.cpp
+++ b/clang/lib/CodeGen/Targets/NVPTX.cpp
@@ -77,6 +77,10 @@ class NVPTXTargetCodeGenInfo : public TargetCodeGenInfo {
     return true;
   }
 
+  unsigned getOpenCLKernelCallingConv() const override {
+    return llvm::CallingConv::PTX_Kernel;
+  }
+
   // Adds a NamedMDNode with GV, Name, and Operand as operands, and adds the
   // resulting MDNode to the nvvm.annotations MDNode.
   static void addNVVMMetadata(llvm::GlobalValue *GV, StringRef Name,
diff --git a/clang/test/CodeGenSYCL/kernel-caller-entry-point.cpp b/clang/test/CodeGenSYCL/kernel-caller-entry-point.cpp
new file mode 100644
index 0000000000000..2aca8dd8ea69b
--- /dev/null
+++ b/clang/test/CodeGenSYCL/kernel-caller-entry-point.cpp
@@ -0,0 +1,177 @@
+// RUN: %clang_cc1 -fsycl-is-host -emit-llvm -triple x86_64-unknown-linux-gnu %s -o - | FileCheck --check-prefixes=CHECK-HOST,CHECK-HOST-LINUX %s
+// RUN: %clang_cc1 -fsycl-is-device -emit-llvm -aux-triple x86_64-unknown-linux-gnu -triple amdgcn %s -o - | FileCheck --check-prefixes=CHECK-DEVICE,CHECK-AMDGCN %s
+// RUN: %clang_cc1 -fsycl-is-device -emit-llvm -aux-triple x86_64-unknown-linux-gnu -triple nvptx %s -o - | FileCheck --check-prefixes=CHECK-DEVICE,CHECK-NVPTX %s
+// RUN: %clang_cc1 -fsycl-is-device -emit-llvm -aux-triple x86_64-unknown-linux-gnu -triple spir64 %s -o - | FileCheck --check-prefixes=CHECK-DEVICE,CHECK-SPIR %s
+// RUN: %clang_cc1 -fsycl-is-host -emit-llvm -triple x86_64-pc-windows-msvc %s -o - | FileCheck --check-prefixes=CHECK-HOST,CHECK-HOST-WINDOWS %s
+// RUN: %clang_cc1 -fsycl-is-device -emit-llvm -aux-triple x86_64-pc-windows-msvc -triple amdgcn %s -o - | FileCheck --check-prefixes=CHECK-DEVICE,CHECK-AMDGCN %s
+// RUN: %clang_cc1 -fsycl-is-device -emit-llvm -aux-triple x86_64-pc-windows-msvc -triple nvptx %s -o - | FileCheck --check-prefixes=CHECK-DEVICE,CHECK-NVPTX %s
+// RUN: %clang_cc1 -fsycl-is-device -emit-llvm -aux-triple x86_64-pc-windows-msvc -triple spir64 %s -o - | FileCheck --check-prefixes=CHECK-DEVICE,CHECK-SPIR %s
+
+// Test the generation of SYCL kernel caller functions. These functions are
+// generated from functions declared with the sycl_kernel_entry_point attribute
+// and emited during device compilation. They are not emitted during device
+// compilation.
+
+template <typename name, typename Func>
+__attribute__((sycl_kernel_entry_point(name))) void kernel_single_task(const Func kernelFunc) {
+  kernelFunc();
+}
+
+struct single_purpose_kernel_name;
+struct single_purpose_kernel {
+  void operator()() const {}
+};
+
+__attribute__((sycl_kernel_entry_point(single_purpose_kernel_name)))
+void single_purpose_kernel_task(single_purpose_kernel kernelFunc) {
+  kernelFunc();
+}
+
+int main() {
+  int capture;
+  kernel_single_task<class lambda_kernel_name>(
+    [=]() {
+      (void) capture;
+    });
+  single_purpose_kernel obj;
+  single_purpose_kernel_task(obj);
+}
+
+// Verify that SYCL kernel caller functions are not emitted during host
+// compilation.
+//
+// CHECK-HOST-NOT: _ZTS26single_purpose_kernel_name
+// CHECK-HOST-NOT: _ZTSZ4mainE18lambda_kernel_name
+
+// Verify that sycl_kernel_entry_point attributed functions are not emitted
+// during device compilation.
+//
+// CHECK-DEVICE-NOT: single_purpose_kernel_task
+// CHECK-DEVICE-NOT: kernel_single_task
+
+// Verify that no code is generated for the bodies of sycl_kernel_entry_point
+// attributed functions during host compilation. ODR-use of these functions may
+// require them to be emitted, but they have no effect if called.
+//
+// CHECK-HOST-LINUX:      define dso_local void @_Z26single_purpose_kernel_task21single_purpose_kernel() #[[LINUX_ATTR0:[0-9]+]] {
+// CHECK-HOST-LINUX-NEXT: entry:
+// CHECK-HOST-LINUX-NEXT:   %kernelFunc = alloca %struct.single_purpose_kernel, align 1
+// CHECK-HOST-LINUX-NEXT:   ret void
+// CHECK-HOST-LINUX-NEXT: }
+//
+// CHECK-HOST-LINUX:      define internal void @_Z18kernel_single_taskIZ4mainE18lambda_kernel_nameZ4mainEUlvE_EvT0_(i32 %kernelFunc.coerce) #[[LINUX_ATTR0:[0-9]+]] {
+// CHECK-HOST-LINUX-NEXT: entry:
+// CHECK-HOST-LINUX-NEXT:   %kernelFunc = alloca %class.anon, align 4
+// CHECK-HOST-LINUX-NEXT:   %coerce.dive = getelementptr inbounds nuw %class.anon, ptr %kernelFunc, i32 0, i32 0
+// CHECK-HOST-LINUX-NEXT:   store i32 %kernelFunc.coerce, ptr %coerce.dive, align 4
+// CHECK-HOST-LINUX-NEXT:   ret void
+// CHECK-HOST-LINUX-NEXT: }
+//
+// CHECK-HOST-WINDOWS:      define dso_local void @"?single_purpose_kernel_task@@YAXUsingle_purpose_kernel@@@Z"(i8 %kernelFunc.coerce) #[[WINDOWS_ATTR0:[0-9]+]] {
+// CHECK-HOST-WINDOWS-NEXT: entry:
+// CHECK-HOST-WINDOWS-NEXT:   %kernelFunc = alloca %struct.single_purpose_kernel, align 1
+// CHECK-HOST-WINDOWS-NEXT:   %coerce.dive = getelementptr inbounds nuw %struct.single_purpose_kernel, ptr %kernelFunc, i32 0, i32 0
+// CHECK-HOST-WINDOWS-NEXT:   store i8 %kernelFunc.coerce, ptr %coerce.dive, align 1
+// CHECK-HOST-WINDOWS-NEXT:   ret void
+// CHECK-HOST-WINDOWS-NEXT: }
+//
+// CHECK-HOST-WINDOWS:      define internal void @"??$kernel_single_task@Vlambda_kernel_name@?1??main@@9@V<lambda_1>@?0??2@9@@@YAXV<lambda_1>@?0??main@@9@@Z"(i32 %kernelFunc.coerce) #[[WINDOWS_ATTR0:[0-9]+]] {
+// CHECK-HOST-WINDOWS-NEXT: entry:
+// CHECK-HOST-WINDOWS-NEXT:   %kernelFunc = alloca %class.anon, align 4
+// CHECK-HOST-WINDOWS-NEXT:   %coerce.dive = getelementptr inbounds nuw %class.anon, ptr %kernelFunc, i32 0, i32 0
+// CHECK-HOST-WINDOWS-NEXT:   store i32 %kernelFunc.coerce, ptr %coerce.dive, align 4
+// CHECK-HOST-WINDOWS-NEXT:   ret void
+// CHECK-HOST-WINDOWS-NEXT: }
+
+// Verify that SYCL kernel caller functions are emitted for each device target.
+//
+// FIXME: The following set of matches are used to skip over the declaration of
+// FIXME: main(). main() shouldn't be emitted in device code, but that pruning
+// FIXME: isn't performed yet.
+// CHECK-DEVICE:      Function Attrs: convergent mustprogress noinline norecurse nounwind optnone
+// CHECK-DEVICE-NEXT: define dso_local noundef i32 @main() #0
+
+// IR for the SYCL kernel caller function generated for
+// single_purpose_kernel_task with single_purpose_kernel_name as the SYCL kernel
+// name type.
+//
+// CHECK-AMDGCN:      Function Attrs: convergent mustprogress noinline norecurse nounwind optnone
+// CHECK-AMDGCN-NEXT: define dso_local amdgpu_kernel void @_ZTS26single_purpose_kernel_name
+// CHECK-AMDGCN-SAME:   (ptr addrspace(4) noundef byref(%struct.single_purpose_kernel) align 1 %0) #[[AMDGCN_ATTR0:[0-9]+]] {
+// CHECK-AMDGCN-NEXT: entry:
+// CHECK-AMDGCN-NEXT:   %coerce = alloca %struct.single_purpose_kernel, align 1, addrspace(5)
+// CHECK-AMDGCN-NEXT:   %kernelFunc = addrspacecast ptr addrspace(5) %coerce to ptr
+// CHECK-AMDGCN-NEXT:   call void @llvm.memcpy.p0.p4.i64(ptr align 1 %kernelFunc, ptr addrspace(4) align 1 %0, i64 1, i1 false)
+// CHECK-AMDGCN-NEXT:   call void @_ZNK21single_purpose_kernelclEv
+// CHECK-AMDGCN-SAME:     (ptr noundef nonnull align 1 dereferenceable(1) %kernelFunc) #[[AMDGCN_ATTR1:[0-9]+]]
+// CHECK-AMDGCN-NEXT:   ret void
+// CHECK-AMDGCN-NEXT: }
+// CHECK-AMDGCN:      define linkonce_odr void @_ZNK21single_purpose_kernelclEv
+//
+// CHECK-NVPTX:       Function Attrs: convergent mustprogress noinline norecurse nounwind optnone
+// CHECK-NVPTX-NEXT:  define dso_local ptx_kernel void @_ZTS26single_purpose_kernel_name
+// CHECK-NVPTX-SAME:    (ptr noundef byval(%struct.sin...
[truncated]

Copy link
Contributor Author

@tahonermann tahonermann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the review comments @Fznamznon and @bader! I added responses to some of the comments and will get updates applied tomorrow.

…d Alexey Bader.

- Corrected a misleading comment.
- Corrected inconsistent naming style in a test.
- Expanded a test to exercise 32-bit and 64-bit SPIR, SPIR-V, and NVPTX.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:codegen IR generation bugs: mangling, exceptions, etc. clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category SYCL https://registry.khronos.org/SYCL
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants