[PGO][Offload] Allow PGO flags to be used on GPU targets #94268

EthanLuisMcDonough · 2024-06-03T18:32:54Z

This pull request is the third part of an ongoing effort to extends PGO instrumentation to GPU device code and depends on #93365. This PR makes the following changes:

Allows PGO flags to be supplied to GPU targets
Pulls version global from device
Modifies __llvm_write_custom_profile and lprofWriteDataImpl to allow the PGO version to be overridden

github-actions · 2024-06-03T18:36:49Z

✅ With the latest revision this PR passed the C/C++ code formatter.

jhuber6

Seems to be lots of accidental clang-format changes. Why do we need new flags for this instead of just using the old ones and changing behavior when the target is a known GPU? I.e. SPIR-V, CUDA, or HSA.

EthanLuisMcDonough · 2024-06-24T05:03:31Z

@jhuber6 The clang format errors are mostly due to my local version of clang-format disagreeing with the buildbot's version. Its a bit annoying, but it shouldn't be too much of a problem given I plan on squashing and merging once this gets approved.

I added new flags for GPU PGO specifically because I didn't want to modify the PGO flags' existing behavior. PGO has a significant runtime cost, so I figured it would be best for the end user experience to only enable PGO on the GPU when it was specifically requested.

jhuber6 · 2024-06-24T10:58:48Z

@jhuber6 The clang format errors are mostly due to my local version of clang-format disagreeing with the buildbot's version. Its a bit annoying, but it shouldn't be too much of a problem given I plan on squashing and merging once this gets approved.

I added new flags for GPU PGO specifically because I didn't want to modify the PGO flags' existing behavior. PGO has a significant runtime cost, so I figured it would be best for the end user experience to only enable PGO on the GPU when it was specifically requested.

Is this something that specifically requires its own flag? Or could we just do -Xarch_device -fprofile-generate.

EthanLuisMcDonough · 2024-06-24T14:08:42Z

Is this something that specifically requires its own flag? Or could we just do -Xarch_device -fprofile-generate.

Right now the -fprofile-generate-gpu and -fprofile-instr-generate-gpu flags make sure that the compiler-rt profiling library is included even if host profiling isn't enabled, but your suggestion seems quite nice and compact. I'm open to looking into this. Thoughts @jdoerfert?

jdenny-ornl · 2024-10-02T15:40:46Z

offload/test/offloading/gpupgo/pgo1.c

@@ -0,0 +1,82 @@
+// RUN: %libomptarget-compile-generic -fprofile-generate-gpu


When targeting a V100, this command fails for me in both pgo1.c and pgo2.c. In the LTO case:

LLVM ERROR: Circular dependency found in global variable set

In the non-LTO case:

fatal error: error in backend: NVPTX aliasee must be a non-kernel function definition

I do not see this problem in PR #93365's pgo1.c.

This is a limitation of the PTX target, globals cannot reference themselves. Most likely whatever NVIDIA engineer wrote the PTX parser found it annoying to reference something that wasn't fully parsed yet so he just decided to make it an error and here we are. See https://godbolt.org/z/53PP5c5ve.

jdenny-ornl · 2024-10-03T23:58:19Z

For some codes, I get the following error for a gfx906:

LLVM ERROR: Relocation for CG Profile could not be created: unknown relocation name

I see it for OpenMC, but the following is a simpler example:

$ cat test.c
#include <stdio.h>
#include <stdlib.h>
__attribute__((noinline))
double test(double x, int n) {
  double res = 1;
  for (int i = 0; i < n; ++i)
    res *= x;
  return res;
}
int main(int argc, char *argv[]) {
  double x = atof(argv[1]);
  unsigned n = atoi(argv[2]);
  #pragma omp target map(tofrom:x)
  x = test(x, n);
  printf("%f\n", x);
  return 0;
}

$ clang -O2 -g -fopenmp --offload-arch=native test.c -o test \
      -fprofile-generate -fprofile-generate-gpu

$ LLVM_PROFILE_FILE=test.profraw ./test 2 4
16.000000

$ llvm-profdata merge -output=test.profdata *.profraw

$ clang -O2 -g -fopenmp --offload-arch=native test.c -foffload-lto \
      -fprofile-use-gpu=test.profdata

I can prevent the error by lowering the last -O2 to -O1 or by removing the __attribute__((noinline)). Am I doing something wrong?

EthanLuisMcDonough · 2024-11-01T22:23:16Z

For some codes, I get the following error for a gfx906:

LLVM ERROR: Relocation for CG Profile could not be created: unknown relocation name

I see it for OpenMC, but the following is a simpler example:

$ cat test.c
#include <stdio.h>
#include <stdlib.h>
__attribute__((noinline))
double test(double x, int n) {
  double res = 1;
  for (int i = 0; i < n; ++i)
    res *= x;
  return res;
}
int main(int argc, char *argv[]) {
  double x = atof(argv[1]);
  unsigned n = atoi(argv[2]);
  #pragma omp target map(tofrom:x)
  x = test(x, n);
  printf("%f\n", x);
  return 0;
}

$ clang -O2 -g -fopenmp --offload-arch=native test.c -o test \
      -fprofile-generate -fprofile-generate-gpu

$ LLVM_PROFILE_FILE=test.profraw ./test 2 4
16.000000

$ llvm-profdata merge -output=test.profdata *.profraw

$ clang -O2 -g -fopenmp --offload-arch=native test.c -foffload-lto \
      -fprofile-use-gpu=test.profdata

I can prevent the error by lowering the last -O2 to -O1 or by removing the __attribute__((noinline)). Am I doing something wrong?

Not sure if this is still relevant, but I think #114617 should fix this issue. The ASM builder was trying to create a reloc directive with the kind BFD_RELOC_NONE, which isn't defined in getFixupKind for AMDGPU.

jdenny-ornl · 2024-11-27T19:44:48Z

For some codes, I get the following error for a gfx906:
LLVM ERROR: Relocation for CG Profile could not be created: unknown relocation name 

Not sure if this is still relevant, but I think #114617 should fix this issue. The ASM builder was trying to create a reloc directive with the kind BFD_RELOC_NONE, which isn't defined in getFixupKind for AMDGPU.

Thanks, it does seem to fix it.

If possible, would you please keep this PR in sync with PR #93365? That should bring in the above fix and generally help me to review against my use cases.

EthanLuisMcDonough · 2024-11-27T21:49:28Z

For some codes, I get the following error for a gfx906:
LLVM ERROR: Relocation for CG Profile could not be created: unknown relocation name 
Not sure if this is still relevant, but I think #114617 should fix this issue. The ASM builder was trying to create a reloc directive with the kind BFD_RELOC_NONE, which isn't defined in getFixupKind for AMDGPU.

Thanks, it does seem to fix it.

If possible, would you please keep this PR in sync with PR #93365? That should bring in the above fix and generally help me to review against my use cases.

Sure, I'll sync this branch with my changes in the other PR.

jhuber6

My preference is that we do not introduce new flags, but use the ones we have already and use -Xarch as the canonical way to do it.

-fsanitize_instr_generate // Both host and all devices
-Xarch_host -fsanitize_instr_generate // Only on the host
-Xarch_device -fsanitize_instr_generate // Only on the devices
-Xarch_nvptx64 -fsanitize_instr_generate  // Only on NVPTX targets
-Xarch_gfx90a -fsanitize_instr_generate  // Only on gfx90a

All of this handling should be done for you by the getArgsForToolchain() interface that's called before we generate the arguments from the tools. So, all this should require is tests + documentation for this usage. You should just be able to replace existing checks on these new flags to the standard ones.

jhuber6 · 2025-03-04T19:33:29Z

offload/plugins-nextgen/common/src/GlobalHandler.cpp

@@ -273,7 +273,7 @@ void GPUProfGlobals::dump() const {
 }

 Error GPUProfGlobals::write() const {
-  if (!__llvm_write_custom_profile)
+  if (__llvm_write_custom_profile == NULL)


!ptr is canonical, but if you want to be explicit use `nullptr.

offload/test/offloading/gpupgo/pgo1.c

jdoerfert · 2025-02-11T21:59:55Z

clang/include/clang/Driver/Options.td

+    Group<f_Group>,
+    Visibility<[ClangOption, CLOption]>,
+    MetaVarName<"<pathname>">,
+    HelpText<"Use instrumentation data for profile-guided optimization targeting GPU">;


The existing non-gpu text reads different, on purpose?

jdoerfert · 2025-02-11T22:00:18Z

clang/lib/Driver/ToolChain.cpp

-                                      Multilib::flags_list &Result) {
+static void getARMMultilibFlags(const Driver &D, const llvm::Triple &Triple,
+                                const llvm::opt::ArgList &Args,
+                                Multilib::flags_list &Result) {


unrelated, more below. Don't format the file, use git-clang-format to format only your commit.

jdoerfert · 2025-02-11T22:02:02Z

clang/lib/Driver/ToolChains/Clang.cpp

@@ -575,6 +575,76 @@ static void addDashXForInput(const ArgList &Args, const InputInfo &Input,
  }
 }

+static void addPGOFlagsGPU(const ToolChain &TC, const ArgList &Args,
+                           ArgStringList &CmdArgs) {


Was this copied from somewhere? Can we merge it, e.g., in a template that takes the options enums (CPU vs GPU)?

jdoerfert · 2025-02-11T22:02:24Z

compiler-rt/include/profile/InstrProfData.inc

@@ -152,7 +152,7 @@ INSTR_PROF_VALUE_NODE(PtrToNodeT, llvm::PointerType::getUnqual(Ctx), Next, \
 #define INSTR_PROF_DATA_DEFINED
 #endif
 INSTR_PROF_RAW_HEADER(uint64_t, Magic, __llvm_profile_get_magic())
-INSTR_PROF_RAW_HEADER(uint64_t, Version, __llvm_profile_get_version())
+INSTR_PROF_RAW_HEADER(uint64_t, Version, Version)


Why did this change?

This ensures that a GPU profile is written with the version pulled from the GPU. Previously, if the host used LLVM-level instrumentation and the device has clang-level instrumentation, the GPU profile would use the host's format. This change ensures that is not the case.

This file is a part of public API and is used by a number of users that have their own runtime implementation (for example baremetal users, OS kernels) and this change is going to break all of them. Is there another way we could handle this case that wouldn't require this change and avoid all that churn?

I think the best solution would be to forbid the user from using different instrumentation levels (LLVM IR vs clang) on the host and device. I can replace the version replacement with a check that ensures the device version matches the host version.

I fixed the version detection and made sure to set the Version property the custom value instead of changing the default value.

llvmbot · 2025-03-18T21:22:22Z

@llvm/pr-subscribers-clang-codegen
@llvm/pr-subscribers-llvm-transforms
@llvm/pr-subscribers-offload
@llvm/pr-subscribers-pgo
@llvm/pr-subscribers-clang-driver

@llvm/pr-subscribers-clang

Author: Ethan Luis McDonough (EthanLuisMcDonough)

Changes

This pull request is the third part of an ongoing effort to extends PGO instrumentation to GPU device code and depends on #93365. This PR makes the following changes:

Allows PGO flags to be supplied to GPU targets
Pulls version global from device
Modifies __llvm_write_custom_profile and lprofWriteDataImpl to allow the PGO version to be overridden

Patch is 26.26 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/94268.diff

19 Files Affected:

(modified) clang/lib/Driver/ToolChains/Clang.cpp (+1-5)
(removed) clang/test/Driver/cuda-no-pgo-or-coverage.cu (-33)
(modified) compiler-rt/include/profile/InstrProfData.inc (+1-1)
(modified) compiler-rt/lib/profile/InstrProfiling.h (+2-1)
(modified) compiler-rt/lib/profile/InstrProfilingBuffer.c (+2-1)
(modified) compiler-rt/lib/profile/InstrProfilingFile.c (+15-7)
(modified) compiler-rt/lib/profile/InstrProfilingInternal.h (+2-1)
(modified) compiler-rt/lib/profile/InstrProfilingWriter.c (+10-10)
(modified) llvm/include/llvm/ProfileData/InstrProfData.inc (+1-1)
(modified) llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp (+4-1)
(modified) llvm/test/tools/llvm-profdata/binary-ids-padding.test (+1-1)
(modified) llvm/test/tools/llvm-profdata/malformed-not-space-for-another-header.test (+1-1)
(modified) llvm/test/tools/llvm-profdata/malformed-num-counters-zero.test (+1-1)
(modified) llvm/test/tools/llvm-profdata/malformed-ptr-to-counter-array.test (+1-1)
(modified) offload/plugins-nextgen/common/include/GlobalHandler.h (+9-4)
(modified) offload/plugins-nextgen/common/src/GlobalHandler.cpp (+14-4)
(added) offload/test/offloading/gpupgo/pgo1.c (+84)
(added) offload/test/offloading/gpupgo/pgo2.c (+76)
(removed) offload/test/offloading/pgo1.c (-66)

diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp
index 1012128085c7a..e0f1206496486 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -6387,11 +6387,7 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,
   Args.AddLastArg(CmdArgs, options::OPT_fconvergent_functions,
                   options::OPT_fno_convergent_functions);
 
-  // NVPTX/AMDGCN doesn't support PGO or coverage. There's no runtime support
-  // for sampling, overhead of call arc collection is way too high and there's
-  // no way to collect the output.
-  if (!Triple.isNVPTX() && !Triple.isAMDGCN())
-    addPGOAndCoverageFlags(TC, C, JA, Output, Args, SanitizeArgs, CmdArgs);
+  addPGOAndCoverageFlags(TC, C, JA, Output, Args, SanitizeArgs, CmdArgs);
 
   Args.AddLastArg(CmdArgs, options::OPT_fclang_abi_compat_EQ);
 
diff --git a/clang/test/Driver/cuda-no-pgo-or-coverage.cu b/clang/test/Driver/cuda-no-pgo-or-coverage.cu
deleted file mode 100644
index b84587e1e182b..0000000000000
--- a/clang/test/Driver/cuda-no-pgo-or-coverage.cu
+++ /dev/null
@@ -1,33 +0,0 @@
-// Check that profiling/coverage arguments doen't get passed down to device-side
-// compilation.
-//
-//
-// XRUN: not %clang -### --target=x86_64-linux-gnu -c --cuda-gpu-arch=sm_20 \
-// XRUN:   -fprofile-generate %s 2>&1 | \
-// XRUN:   FileCheck --check-prefixes=CHECK,PROF %s
-//
-// RUN: not %clang -### --target=x86_64-linux-gnu -c --cuda-gpu-arch=sm_20 \
-// RUN:   -fprofile-instr-generate %s 2>&1 | \
-// RUN:   FileCheck --check-prefixes=CHECK,PROF %s
-//
-// RUN: not %clang -### --target=x86_64-linux-gnu -c --cuda-gpu-arch=sm_20 \
-// RUN:   -coverage %s 2>&1 | \
-// RUN:   FileCheck --check-prefixes=CHECK,GCOV %s
-//
-// RUN: not %clang -### --target=x86_64-linux-gnu -c --cuda-gpu-arch=sm_20 \
-// RUN:   -ftest-coverage %s 2>&1 | \
-// RUN:   FileCheck --check-prefixes=CHECK,GCOV %s
-//
-// RUN: not %clang -### --target=x86_64-linux-gnu -c --cuda-gpu-arch=sm_20   \
-// RUN:   -fprofile-instr-generate -fcoverage-mapping %s 2>&1 | \
-// RUN:   FileCheck --check-prefixes=CHECK,PROF %s
-//
-//
-// CHECK-NOT: error: unsupported option '-fprofile
-// CHECK-NOT: error: invalid argument
-// CHECK-DAG: "-fcuda-is-device"
-// CHECK-NOT: "-f{{[^"/]*coverage.*}}"
-// CHECK-NOT: "-fprofile{{[^"]*}}"
-// CHECK: "-triple" "x86_64-unknown-linux-gnu"
-// PROF:      "-fprofile{{.*}}"
-// GCOV:      "-coverage-notes-file=
diff --git a/compiler-rt/include/profile/InstrProfData.inc b/compiler-rt/include/profile/InstrProfData.inc
index 2cdfea9a579a4..d51b58386f168 100644
--- a/compiler-rt/include/profile/InstrProfData.inc
+++ b/compiler-rt/include/profile/InstrProfData.inc
@@ -152,7 +152,7 @@ INSTR_PROF_VALUE_NODE(PtrToNodeT, llvm::PointerType::getUnqual(Ctx), Next, \
 #define INSTR_PROF_DATA_DEFINED
 #endif
 INSTR_PROF_RAW_HEADER(uint64_t, Magic, __llvm_profile_get_magic())
-INSTR_PROF_RAW_HEADER(uint64_t, Version, __llvm_profile_get_version())
+INSTR_PROF_RAW_HEADER(uint64_t, Version, Version)
 INSTR_PROF_RAW_HEADER(uint64_t, BinaryIdsSize, __llvm_write_binary_ids(NULL))
 INSTR_PROF_RAW_HEADER(uint64_t, NumData, NumData)
 INSTR_PROF_RAW_HEADER(uint64_t, PaddingBytesBeforeCounters, PaddingBytesBeforeCounters)
diff --git a/compiler-rt/lib/profile/InstrProfiling.h b/compiler-rt/lib/profile/InstrProfiling.h
index 77c8d6c79322d..a90558fdcfbbf 100644
--- a/compiler-rt/lib/profile/InstrProfiling.h
+++ b/compiler-rt/lib/profile/InstrProfiling.h
@@ -310,7 +310,8 @@ int __llvm_write_custom_profile(const char *Target,
                                 const __llvm_profile_data *DataEnd,
                                 const char *CountersBegin,
                                 const char *CountersEnd, const char *NamesBegin,
-                                const char *NamesEnd);
+                                const char *NamesEnd,
+                                const uint64_t *VersionOverride);
 
 /*!
  * This variable is defined in InstrProfilingRuntime.cpp as a hidden
diff --git a/compiler-rt/lib/profile/InstrProfilingBuffer.c b/compiler-rt/lib/profile/InstrProfilingBuffer.c
index 1c451d7ec7563..b406e8db74f3f 100644
--- a/compiler-rt/lib/profile/InstrProfilingBuffer.c
+++ b/compiler-rt/lib/profile/InstrProfilingBuffer.c
@@ -252,5 +252,6 @@ COMPILER_RT_VISIBILITY int __llvm_profile_write_buffer_internal(
       &BufferWriter, DataBegin, DataEnd, CountersBegin, CountersEnd,
       BitmapBegin, BitmapEnd, /*VPDataReader=*/0, NamesBegin, NamesEnd,
       /*VTableBegin=*/NULL, /*VTableEnd=*/NULL, /*VNamesBegin=*/NULL,
-      /*VNamesEnd=*/NULL, /*SkipNameDataWrite=*/0);
+      /*VNamesEnd=*/NULL, /*SkipNameDataWrite=*/0,
+      __llvm_profile_get_version());
 }
diff --git a/compiler-rt/lib/profile/InstrProfilingFile.c b/compiler-rt/lib/profile/InstrProfilingFile.c
index 4667c02892505..19467429cf4c3 100644
--- a/compiler-rt/lib/profile/InstrProfilingFile.c
+++ b/compiler-rt/lib/profile/InstrProfilingFile.c
@@ -1273,10 +1273,13 @@ COMPILER_RT_VISIBILITY int __llvm_profile_set_file_object(FILE *File,
   return 0;
 }
 
-COMPILER_RT_USED int __llvm_write_custom_profile(
-    const char *Target, const __llvm_profile_data *DataBegin,
-    const __llvm_profile_data *DataEnd, const char *CountersBegin,
-    const char *CountersEnd, const char *NamesBegin, const char *NamesEnd) {
+int __llvm_write_custom_profile(const char *Target,
+                                const __llvm_profile_data *DataBegin,
+                                const __llvm_profile_data *DataEnd,
+                                const char *CountersBegin,
+                                const char *CountersEnd, const char *NamesBegin,
+                                const char *NamesEnd,
+                                const uint64_t *VersionOverride) {
   int ReturnValue = 0, FilenameLength, TargetLength;
   char *FilenameBuf, *TargetFilename;
   const char *Filename;
@@ -1358,10 +1361,15 @@ COMPILER_RT_USED int __llvm_write_custom_profile(
   ProfDataWriter fileWriter;
   initFileWriter(&fileWriter, OutputFile);
 
+  uint64_t Version = __llvm_profile_get_version();
+  if (VersionOverride)
+    Version = *VersionOverride;
+
   /* Write custom data to the file */
-  ReturnValue = lprofWriteDataImpl(
-      &fileWriter, DataBegin, DataEnd, CountersBegin, CountersEnd, NULL, NULL,
-      lprofGetVPDataReader(), NULL, NULL, NULL, NULL, NamesBegin, NamesEnd, 0);
+  ReturnValue =
+      lprofWriteDataImpl(&fileWriter, DataBegin, DataEnd, CountersBegin,
+                         CountersEnd, NULL, NULL, lprofGetVPDataReader(), NULL,
+                         NULL, NULL, NULL, NamesBegin, NamesEnd, 0, Version);
   closeFileObject(OutputFile);
 
   // Restore SIGKILL.
diff --git a/compiler-rt/lib/profile/InstrProfilingInternal.h b/compiler-rt/lib/profile/InstrProfilingInternal.h
index b100343ca04f9..03df71828b91d 100644
--- a/compiler-rt/lib/profile/InstrProfilingInternal.h
+++ b/compiler-rt/lib/profile/InstrProfilingInternal.h
@@ -160,7 +160,8 @@ int lprofWriteDataImpl(ProfDataWriter *Writer,
                        VPDataReaderType *VPDataReader, const char *NamesBegin,
                        const char *NamesEnd, const VTableProfData *VTableBegin,
                        const VTableProfData *VTableEnd, const char *VNamesBegin,
-                       const char *VNamesEnd, int SkipNameDataWrite);
+                       const char *VNamesEnd, int SkipNameDataWrite,
+                       uint64_t Version);
 
 /* Merge value profile data pointed to by SrcValueProfData into
  * in-memory profile counters pointed by to DstData.  */
diff --git a/compiler-rt/lib/profile/InstrProfilingWriter.c b/compiler-rt/lib/profile/InstrProfilingWriter.c
index 8816a71155511..bcd88b30d050d 100644
--- a/compiler-rt/lib/profile/InstrProfilingWriter.c
+++ b/compiler-rt/lib/profile/InstrProfilingWriter.c
@@ -254,21 +254,21 @@ COMPILER_RT_VISIBILITY int lprofWriteData(ProfDataWriter *Writer,
   const VTableProfData *VTableEnd = __llvm_profile_end_vtables();
   const char *VNamesBegin = __llvm_profile_begin_vtabnames();
   const char *VNamesEnd = __llvm_profile_end_vtabnames();
+  uint64_t Version = __llvm_profile_get_version();
   return lprofWriteDataImpl(Writer, DataBegin, DataEnd, CountersBegin,
                             CountersEnd, BitmapBegin, BitmapEnd, VPDataReader,
                             NamesBegin, NamesEnd, VTableBegin, VTableEnd,
-                            VNamesBegin, VNamesEnd, SkipNameDataWrite);
+                            VNamesBegin, VNamesEnd, SkipNameDataWrite, Version);
 }
 
-COMPILER_RT_VISIBILITY int
-lprofWriteDataImpl(ProfDataWriter *Writer, const __llvm_profile_data *DataBegin,
-                   const __llvm_profile_data *DataEnd,
-                   const char *CountersBegin, const char *CountersEnd,
-                   const char *BitmapBegin, const char *BitmapEnd,
-                   VPDataReaderType *VPDataReader, const char *NamesBegin,
-                   const char *NamesEnd, const VTableProfData *VTableBegin,
-                   const VTableProfData *VTableEnd, const char *VNamesBegin,
-                   const char *VNamesEnd, int SkipNameDataWrite) {
+COMPILER_RT_VISIBILITY int lprofWriteDataImpl(
+    ProfDataWriter *Writer, const __llvm_profile_data *DataBegin,
+    const __llvm_profile_data *DataEnd, const char *CountersBegin,
+    const char *CountersEnd, const char *BitmapBegin, const char *BitmapEnd,
+    VPDataReaderType *VPDataReader, const char *NamesBegin,
+    const char *NamesEnd, const VTableProfData *VTableBegin,
+    const VTableProfData *VTableEnd, const char *VNamesBegin,
+    const char *VNamesEnd, int SkipNameDataWrite, uint64_t Version) {
   /* Calculate size of sections. */
   const uint64_t DataSectionSize =
       __llvm_profile_get_data_size(DataBegin, DataEnd);
diff --git a/llvm/include/llvm/ProfileData/InstrProfData.inc b/llvm/include/llvm/ProfileData/InstrProfData.inc
index 2cdfea9a579a4..d51b58386f168 100644
--- a/llvm/include/llvm/ProfileData/InstrProfData.inc
+++ b/llvm/include/llvm/ProfileData/InstrProfData.inc
@@ -152,7 +152,7 @@ INSTR_PROF_VALUE_NODE(PtrToNodeT, llvm::PointerType::getUnqual(Ctx), Next, \
 #define INSTR_PROF_DATA_DEFINED
 #endif
 INSTR_PROF_RAW_HEADER(uint64_t, Magic, __llvm_profile_get_magic())
-INSTR_PROF_RAW_HEADER(uint64_t, Version, __llvm_profile_get_version())
+INSTR_PROF_RAW_HEADER(uint64_t, Version, Version)
 INSTR_PROF_RAW_HEADER(uint64_t, BinaryIdsSize, __llvm_write_binary_ids(NULL))
 INSTR_PROF_RAW_HEADER(uint64_t, NumData, NumData)
 INSTR_PROF_RAW_HEADER(uint64_t, PaddingBytesBeforeCounters, PaddingBytesBeforeCounters)
diff --git a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
index a8055979acaa2..ea4be07d0c8c8 100644
--- a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
+++ b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
@@ -462,7 +462,10 @@ createIRLevelProfileFlagVar(Module &M,
   auto IRLevelVersionVariable = new GlobalVariable(
       M, IntTy64, true, GlobalValue::WeakAnyLinkage,
       Constant::getIntegerValue(IntTy64, APInt(64, ProfileVersion)), VarName);
-  IRLevelVersionVariable->setVisibility(GlobalValue::HiddenVisibility);
+  if (isGPUProfTarget(M))
+    IRLevelVersionVariable->setVisibility(GlobalValue::ProtectedVisibility);
+  else
+    IRLevelVersionVariable->setVisibility(GlobalValue::HiddenVisibility);
   Triple TT(M.getTargetTriple());
   if (TT.supportsCOMDAT()) {
     IRLevelVersionVariable->setLinkage(GlobalValue::ExternalLinkage);
diff --git a/llvm/test/tools/llvm-profdata/binary-ids-padding.test b/llvm/test/tools/llvm-profdata/binary-ids-padding.test
index 292c582b45c52..f31aa15bfe6c9 100644
--- a/llvm/test/tools/llvm-profdata/binary-ids-padding.test
+++ b/llvm/test/tools/llvm-profdata/binary-ids-padding.test
@@ -1,7 +1,7 @@
 // Header
 //
 // INSTR_PROF_RAW_HEADER(uint64_t, Magic, __llvm_profile_get_magic())
-// INSTR_PROF_RAW_HEADER(uint64_t, Version, __llvm_profile_get_version())
+// INSTR_PROF_RAW_HEADER(uint64_t, Version, Version)
 // INSTR_PROF_RAW_HEADER(uint64_t, BinaryIdsSize, __llvm_write_binary_ids(NULL))
 // INSTR_PROF_RAW_HEADER(uint64_t, DataSize, DataSize)
 // INSTR_PROF_RAW_HEADER(uint64_t, CountersSize, CountersSize)
diff --git a/llvm/test/tools/llvm-profdata/malformed-not-space-for-another-header.test b/llvm/test/tools/llvm-profdata/malformed-not-space-for-another-header.test
index 705e5efaf5875..44be2980bb2f2 100644
--- a/llvm/test/tools/llvm-profdata/malformed-not-space-for-another-header.test
+++ b/llvm/test/tools/llvm-profdata/malformed-not-space-for-another-header.test
@@ -1,7 +1,7 @@
 // Header
 //
 // INSTR_PROF_RAW_HEADER(uint64_t, Magic, __llvm_profile_get_magic())
-// INSTR_PROF_RAW_HEADER(uint64_t, Version, __llvm_profile_get_version())
+// INSTR_PROF_RAW_HEADER(uint64_t, Version, Version)
 // INSTR_PROF_RAW_HEADER(uint64_t, BinaryIdsSize, __llvm_write_binary_ids(NULL))
 // INSTR_PROF_RAW_HEADER(uint64_t, DataSize, DataSize)
 // INSTR_PROF_RAW_HEADER(uint64_t, CountersSize, CountersSize)
diff --git a/llvm/test/tools/llvm-profdata/malformed-num-counters-zero.test b/llvm/test/tools/llvm-profdata/malformed-num-counters-zero.test
index 157c13b926a7e..9af9d65a6bdba 100644
--- a/llvm/test/tools/llvm-profdata/malformed-num-counters-zero.test
+++ b/llvm/test/tools/llvm-profdata/malformed-num-counters-zero.test
@@ -1,7 +1,7 @@
 // Header
 //
 // INSTR_PROF_RAW_HEADER(uint64_t, Magic, __llvm_profile_get_magic())
-// INSTR_PROF_RAW_HEADER(uint64_t, Version, __llvm_profile_get_version())
+// INSTR_PROF_RAW_HEADER(uint64_t, Version, Version)
 // INSTR_PROF_RAW_HEADER(uint64_t, BinaryIdsSize, __llvm_write_binary_ids(NULL))
 // INSTR_PROF_RAW_HEADER(uint64_t, DataSize, DataSize)
 // INSTR_PROF_RAW_HEADER(uint64_t, CountersSize, CountersSize)
diff --git a/llvm/test/tools/llvm-profdata/malformed-ptr-to-counter-array.test b/llvm/test/tools/llvm-profdata/malformed-ptr-to-counter-array.test
index 83cf76f68fb63..49c5ae9b0931d 100644
--- a/llvm/test/tools/llvm-profdata/malformed-ptr-to-counter-array.test
+++ b/llvm/test/tools/llvm-profdata/malformed-ptr-to-counter-array.test
@@ -1,7 +1,7 @@
 // Header
 //
 // INSTR_PROF_RAW_HEADER(uint64_t, Magic, __llvm_profile_get_magic())
-// INSTR_PROF_RAW_HEADER(uint64_t, Version, __llvm_profile_get_version())
+// INSTR_PROF_RAW_HEADER(uint64_t, Version, Version)
 // INSTR_PROF_RAW_HEADER(uint64_t, BinaryIdsSize, __llvm_write_binary_ids(NULL))
 // INSTR_PROF_RAW_HEADER(uint64_t, DataSize, DataSize)
 // INSTR_PROF_RAW_HEADER(uint64_t, CountersSize, CountersSize)
diff --git a/offload/plugins-nextgen/common/include/GlobalHandler.h b/offload/plugins-nextgen/common/include/GlobalHandler.h
index e030ab9e6b61f..562bf5ec223bc 100644
--- a/offload/plugins-nextgen/common/include/GlobalHandler.h
+++ b/offload/plugins-nextgen/common/include/GlobalHandler.h
@@ -13,6 +13,7 @@
 #ifndef LLVM_OPENMP_LIBOMPTARGET_PLUGINS_NEXTGEN_COMMON_GLOBALHANDLER_H
 #define LLVM_OPENMP_LIBOMPTARGET_PLUGINS_NEXTGEN_COMMON_GLOBALHANDLER_H
 
+#include <optional>
 #include <type_traits>
 
 #include "llvm/ADT/DenseMap.h"
@@ -64,10 +65,13 @@ struct __llvm_profile_data {
 };
 
 extern "C" {
-extern int __attribute__((weak)) __llvm_write_custom_profile(
-    const char *Target, const __llvm_profile_data *DataBegin,
-    const __llvm_profile_data *DataEnd, const char *CountersBegin,
-    const char *CountersEnd, const char *NamesBegin, const char *NamesEnd);
+extern int __attribute__((weak))
+__llvm_write_custom_profile(const char *Target,
+                            const __llvm_profile_data *DataBegin,
+                            const __llvm_profile_data *DataEnd,
+                            const char *CountersBegin, const char *CountersEnd,
+                            const char *NamesBegin, const char *NamesEnd,
+                            const uint64_t *VersionOverride);
 }
 
 /// PGO profiling data extracted from a GPU device
@@ -76,6 +80,7 @@ struct GPUProfGlobals {
   SmallVector<__llvm_profile_data> Data;
   SmallVector<uint8_t> NamesData;
   Triple TargetTriple;
+  std::optional<uint64_t> Version;
 
   void dump() const;
   Error write() const;
diff --git a/offload/plugins-nextgen/common/src/GlobalHandler.cpp b/offload/plugins-nextgen/common/src/GlobalHandler.cpp
index 8783490831e25..9b9233c95e567 100644
--- a/offload/plugins-nextgen/common/src/GlobalHandler.cpp
+++ b/offload/plugins-nextgen/common/src/GlobalHandler.cpp
@@ -16,6 +16,7 @@
 
 #include "Shared/Utils.h"
 
+#include "llvm/ProfileData/InstrProfData.inc"
 #include "llvm/Support/Error.h"
 
 #include <cstring>
@@ -214,6 +215,13 @@ GenericGlobalHandlerTy::readProfilingGlobals(GenericDeviceTy &Device,
       if (auto Err = readGlobalFromDevice(Device, Image, DataGlobal))
         return Err;
       DeviceProfileData.Data.push_back(std::move(Data));
+    } else if (*NameOrErr == INSTR_PROF_QUOTE(INSTR_PROF_RAW_VERSION_VAR)) {
+      uint64_t RawVersionData;
+      GlobalTy RawVersionGlobal(NameOrErr->str(), Sym.getSize(),
+                                &RawVersionData);
+      if (auto Err = readGlobalFromDevice(Device, Image, RawVersionGlobal))
+        return Err;
+      DeviceProfileData.Version = RawVersionData;
     }
   }
   return DeviceProfileData;
@@ -265,7 +273,7 @@ void GPUProfGlobals::dump() const {
 }
 
 Error GPUProfGlobals::write() const {
-  if (!__llvm_write_custom_profile)
+  if (__llvm_write_custom_profile == nullptr)
     return Plugin::error("Could not find symbol __llvm_write_custom_profile. "
                          "The compiler-rt profiling library must be linked for "
                          "GPU PGO to work.");
@@ -274,6 +282,8 @@ Error GPUProfGlobals::write() const {
          CountsSize = Counts.size() * sizeof(int64_t);
   __llvm_profile_data *DataBegin, *DataEnd;
   char *CountersBegin, *CountersEnd, *NamesBegin, *NamesEnd;
+  const uint64_t *VersionOverride =
+      Version.has_value() ? &Version.value() : nullptr;
 
   // Initialize array of contiguous data. We need to make sure each section is
   // contiguous so that the PGO library can compute deltas properly
@@ -295,9 +305,9 @@ Error GPUProfGlobals::write() const {
   memcpy(NamesBegin, NamesData.data(), NamesData.size());
 
   // Invoke compiler-rt entrypoint
-  int result = __llvm_write_custom_profile(TargetTriple.str().c_str(),
-                                           DataBegin, DataEnd, CountersBegin,
-                                           CountersEnd, NamesBegin, NamesEnd);
+  int result = __llvm_write_custom_profile(
+      TargetTriple.str().c_str(), DataBegin, DataEnd, CountersBegin,
+      CountersEnd, NamesBegin, NamesEnd, VersionOverride);
   if (result != 0)
     return Plugin::error("Error writing GPU PGO data to file");
 
diff --git a/offload/test/offloading/gpupgo/pgo1.c b/offload/test/offloading/gpupgo/pgo1.c
new file mode 100644
index 0000000000000..c8011cbae83c0
--- /dev/null
+++ b/offload/test/offloading/gpupgo/pgo1.c
@@ -0,0 +1,84 @@
+// RUN: %libomptarget-compile-generic -fcreate-profile \
+// RUN:     -Xarch_device -fprofile-generate
+// RUN: env LLVM_PROFILE_FILE=%basename_t.llvm.profraw \
+// RUN:     %libomptarget-run-generic 2>&1
+// RUN: llvm-profdata show --all-functions --counts \
+// RUN:     %target_triple.%basename_t.llvm.profraw | \
+// RUN:     %fcheck-generic --check-prefix="LLVM-PGO"
+
+// RUN: %libomptarget-compile-generic -fcreate-profile \
+// RUN:     -Xarch_device -fprofile-instr-generate
+// RUN: env LLVM_PROFILE_FILE=%basename_t.clang.profraw \
+// RUN:     %libomptarget-run-generic 2>&1
+// RUN: llvm-profdata show --all-functions --counts \
+// RUN:     %target_triple.%basename_t.clang.profraw | \
+// RUN:     %fcheck-generic --check-prefix="CLANG-PGO"
+
+// REQUIRES: gpu
+// REQUIRES: pgo
+
+int test1(int a) { return a / 2; }
+int test2(int a) { return a * 2; }
+
+int main() {
+  int m = 2;
+#pragma omp target
+  for (int i = 0; i < 10; i++) {
+    m = test1(m);
+    for (int j = 0; j < 2; j++) {
+      m = test2(m);
+    }
+  }
+}
+
+// LLVM-PGO-LABEL: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}:
+// LLVM-PGO: Hash: {{0[xX][0-9a-fA-F]+}}
+// LLVM-PGO: Counters: 4
+// LLVM-PGO: Block counts: [20, 10, 2, 1]
+
+// LLVM-PGO-LABEL: test1:
+// LLVM-PGO: Hash: {{0[xX][0-9a-fA-F]+}}
+// LLVM-PGO: Counters: 1
+// LLVM-PGO: Block counts: [10]
+
+// LLVM-PGO-LABEL: test2:
+// LLVM-PGO: Hash...
[truncated]

jhuber6

LG with a nit

jhuber6 · 2025-03-11T01:29:22Z

clang/lib/Driver/ToolChain.cpp

+           getArch() != llvm::Triple::armeb &&
+           getArch() != llvm::Triple::thumbeb;


A lot of unrelated formatting changes, please undo those and just use git clang-format HEAD~1.

jhuber6 · 2025-03-18T21:36:21Z

offload/plugins-nextgen/common/src/GlobalHandler.cpp

@@ -265,7 +273,7 @@ void GPUProfGlobals::dump() const {
 }

 Error GPUProfGlobals::write() const {
-  if (!__llvm_write_custom_profile)
+  if (__llvm_write_custom_profile == nullptr)


Suggested change

if (__llvm_write_custom_profile == nullptr)

if (!__llvm_write_custom_profile)

nit. this is idiomatic and shouldn't have been changed.

llvm-ci · 2025-03-20T00:49:44Z

LLVM Buildbot has detected a new failure on builder arc-builder running on arc-worker while building clang,compiler-rt,llvm,offload at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/3/builds/13339

Here is the relevant piece of the build log for the reference

Step 6 (test-build-unified-tree-check-all) failure: 1200 seconds without output running [b'ninja', b'check-all'], attempting to kill
1.227 [59/18/1] Linking CXX executable tools/clang/unittests/Basic/BasicTests
1.355 [58/18/2] Linking CXX executable tools/clang/unittests/Format/FormatTests
5.573 [57/18/3] Linking CXX executable tools/clang/unittests/Lex/LexTests
6.054 [56/18/4] Linking CXX executable tools/clang/unittests/libclang/libclangTests
command timed out: 1200 seconds without output running [b'ninja', b'check-all'], attempting to kill
process killed by signal 9
program finished with exit code -1
elapsedTime=1299.445143

Artem-B · 2025-03-28T19:09:33Z

This is breaking CUDA/NVPTX.
Enabling PGO results in compiler generating PGO-related data which references itself, and NVPTX can't compile those.

E.g. we see data like this which includes a reference to itself:

@__profd__ZN12cuda_helpers13memcmp_kernelEPjS0_mPb = protected global { i64, i64, i64, i64, ptr, ptr, i32, [3 x i16], i32 } { i64 322482019142718985, i64 287486624882456055, i64 sub (i64 ptrtoint (ptr @__profc__ZN12cuda_helpers13memcmp_kernelEPjS0_mPb to i64), i64 ptr
toint (ptr @__profd__ZN12cuda_helpers13memcmp_kernelEPjS0_mPb to i64)), i64 0, ptr @_ZN12cuda_helpers13memcmp_kernelEPjS0_mPb, ptr null, i32 5, [3 x i16] zeroinitializer, i32 0 }, section "__llvm_prf_data", comdat($__profc__ZN12cuda_helpers13memcmp_kernelEPjS0_mPb), a
lign 8

When LLVM sees such self-reference, it crashes. https://godbolt.org/z/o6PTqr1ca

LLVM ERROR: Circular dependency found in global variable set
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.	Program arguments: /opt/compiler-explorer/clang-trunk/bin/llc -o /app/output.s -x86-asm-syntax=intel -march=nvptx64 -mcpu=sm_20 <source>
 #0 0x0000000003ab1a88 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/opt/compiler-explorer/clang-trunk/bin/llc+0x3ab1a88)
 #1 0x0000000003aaf644 SignalHandler(int, siginfo_t*, void*) Signals.cpp:0:0
 #2 0x00007a9fa6c42520 (/lib/x86_64-linux-gnu/libc.so.6+0x42520)
 #3 0x00007a9fa6c969fc pthread_kill (/lib/x86_64-linux-gnu/libc.so.6+0x969fc)
 #4 0x00007a9fa6c42476 gsignal (/lib/x86_64-linux-gnu/libc.so.6+0x42476)
 #5 0x00007a9fa6c287f3 abort (/lib/x86_64-linux-gnu/libc.so.6+0x287f3)
 #6 0x000000000077d6a2 llvm::UniqueStringSaver::save(llvm::Twine const&) (.cold) StringSaver.cpp:0:0
 #7 0x0000000003a079d8 (/opt/compiler-explorer/clang-trunk/bin/llc+0x3a079d8)
 #8 0x00000000017dc57e (/opt/compiler-explorer/clang-trunk/bin/llc+0x17dc57e)
 #9 0x00000000017dc3dd VisitGlobalVariableForEmission(llvm::GlobalVariable const*, llvm::SmallVectorImpl<llvm::GlobalVariable const*>&, llvm::DenseSet<llvm::GlobalVariable const*, llvm::DenseMapInfo<llvm::GlobalVariable const*, void>>&, llvm::DenseSet<llvm::GlobalVariable const*, llvm::DenseMapInfo<llvm::GlobalVariable const*, void>>&) NVPTXAsmPrinter.cpp:0:0
#10 0x00000000017dc6dc llvm::NVPTXAsmPrinter::emitGlobals(llvm::Module const&) (/opt/compiler-explorer/clang-trunk/bin/llc+0x17dc6dc)
#11 0x00000000017dcf7d llvm::NVPTXAsmPrinter::doFinalization(llvm::Module&) (/opt/compiler-explorer/clang-trunk/bin/llc+0x17dcf7d)
#12 0x00000000030a916d llvm::FPPassManager::doFinalization(llvm::Module&) (/opt/compiler-explorer/clang-trunk/bin/llc+0x30a916d)
#13 0x00000000030b48f0 llvm::legacy::PassManagerImpl::run(llvm::Module&) (/opt/compiler-explorer/clang-trunk/bin/llc+0x30b48f0)
#14 0x00000000008b4a08 compileModule(char**, llvm::LLVMContext&) llc.cpp:0:0
#15 0x000000000079237e main (/opt/compiler-explorer/clang-trunk/bin/llc+0x79237e)
#16 0x00007a9fa6c29d90 (/lib/x86_64-linux-gnu/libc.so.6+0x29d90)
#17 0x00007a9fa6c29e40 __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x29e40)
#18 0x00000000008ac9c5 _start (/opt/compiler-explorer/clang-trunk/bin/llc+0x8ac9c5)
Program terminated with signal: SIGSEGV
Compiler returned: 139

Artem-B · 2025-03-28T19:15:23Z

@jhuber6 @jdoerfert I propose reverting the change, unless it can be quickly fixed forward so it does not affect CUDA/NVPTX.

jhuber6 · 2025-03-28T19:17:29Z

I really wish someone from NVIDIA would just fix this stupid limitation already. @EthanLuisMcDonough This type of stuff shows up for things like this. Any way we can modify the source code to work around this? It's fine to put it in a separate global.

int arr[] = {1, arr[0]};

EthanLuisMcDonough · 2025-03-28T19:28:30Z

Thank you all for making me aware of this. I'll look into fixing this as soon as possible.

Artem-B · 2025-03-28T21:05:22Z

The crash is blocking our compiler updates. If nothing depends on this change yet, it would be great to revert the patch and re-land it once it's fixed.

jhuber6 · 2025-03-28T21:18:07Z

The crash is blocking our compiler updates. If nothing depends on this change yet, it would be great to revert the patch and re-land it once it's fixed.

Go ahead and revert it, but I think there were some follow-up patches. Also the pgo1.c and pgo2.c tests were flaky so I'm it's definitely the right move.

EthanLuisMcDonough · 2025-03-28T21:20:08Z

This patch should disable PGO on nvidia targets: #133522. I'll merge it after I'm done running some tests

EthanLuisMcDonough requested review from jdoerfert and jhuber6 June 3, 2024 18:32

jhuber6 reviewed Jun 23, 2024

View reviewed changes

jdenny-ornl mentioned this pull request Sep 30, 2024

Extend kernel-info to emit PGO-based FLOP count #110586

Draft

jdenny-ornl reviewed Oct 2, 2024

View reviewed changes

jhuber6 requested changes Feb 12, 2025

View reviewed changes

jhuber6 reviewed Mar 4, 2025

View reviewed changes

EthanLuisMcDonough requested a review from jhuber6 March 11, 2025 01:25

jdoerfert reviewed Mar 12, 2025

View reviewed changes

EthanLuisMcDonough changed the title ~~[PGO][Offload] Add GPU profiling flags to driver~~ [PGO][Offload] Allow PGO flags to be used on GPU targets Mar 18, 2025

EthanLuisMcDonough force-pushed the gpuprofdriver branch from 2789380 to 6aea96e Compare March 18, 2025 21:21

llvmbot added clang compiler-rt clang:driver PGO llvm:transforms offload labels Mar 18, 2025

[PGO][Offload] Allow PGO flags to be used on GPU targets

3a2047c

EthanLuisMcDonough force-pushed the gpuprofdriver branch from 6aea96e to 3a2047c Compare March 18, 2025 21:34

jhuber6 approved these changes Mar 18, 2025

View reviewed changes

Revert == nullptr check to !

3fcaded

Fix version extraction

298dafc

llvmbot added the clang:codegen label Mar 19, 2025

Manually set Version instead of changing instprof macros

0dd32c3

EthanLuisMcDonough requested a review from petrhosek March 19, 2025 07:33

Revert Version change in instrprofdata.inc

afd16b0

EthanLuisMcDonough merged commit c50d39f into llvm:main Mar 20, 2025
11 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[PGO][Offload] Allow PGO flags to be used on GPU targets #94268

[PGO][Offload] Allow PGO flags to be used on GPU targets #94268

EthanLuisMcDonough commented Jun 3, 2024 •

edited

Loading

github-actions bot commented Jun 3, 2024 •

edited

Loading

jhuber6 left a comment

EthanLuisMcDonough commented Jun 24, 2024

jhuber6 commented Jun 24, 2024

EthanLuisMcDonough commented Jun 24, 2024

jdenny-ornl Oct 2, 2024

jhuber6 Oct 2, 2024

jdenny-ornl commented Oct 3, 2024 •

edited

Loading

EthanLuisMcDonough commented Nov 1, 2024

jdenny-ornl commented Nov 27, 2024

EthanLuisMcDonough commented Nov 27, 2024

jhuber6 left a comment

jhuber6 Mar 4, 2025

jdoerfert Feb 11, 2025

jdoerfert Feb 11, 2025

jdoerfert Feb 11, 2025

jdoerfert Feb 11, 2025

EthanLuisMcDonough Mar 12, 2025

petrhosek Mar 18, 2025

EthanLuisMcDonough Mar 19, 2025

EthanLuisMcDonough Mar 19, 2025

llvmbot commented Mar 18, 2025 •

edited

Loading

jhuber6 left a comment

jhuber6 Mar 11, 2025

jhuber6 Mar 18, 2025

llvm-ci commented Mar 20, 2025

Artem-B commented Mar 28, 2025

Artem-B commented Mar 28, 2025

jhuber6 commented Mar 28, 2025

EthanLuisMcDonough commented Mar 28, 2025

Artem-B commented Mar 28, 2025

jhuber6 commented Mar 28, 2025

EthanLuisMcDonough commented Mar 28, 2025

		@@ -0,0 +1,82 @@
		// RUN: %libomptarget-compile-generic -fprofile-generate-gpu

		getArch() != llvm::Triple::armeb &&
		getArch() != llvm::Triple::thumbeb;

	if (__llvm_write_custom_profile == nullptr)
	if (!__llvm_write_custom_profile)

[PGO][Offload] Allow PGO flags to be used on GPU targets #94268

[PGO][Offload] Allow PGO flags to be used on GPU targets #94268

Conversation

EthanLuisMcDonough commented Jun 3, 2024 • edited Loading

github-actions bot commented Jun 3, 2024 • edited Loading

jhuber6 left a comment

Choose a reason for hiding this comment

EthanLuisMcDonough commented Jun 24, 2024

jhuber6 commented Jun 24, 2024

EthanLuisMcDonough commented Jun 24, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jdenny-ornl commented Oct 3, 2024 • edited Loading

EthanLuisMcDonough commented Nov 1, 2024

jdenny-ornl commented Nov 27, 2024

EthanLuisMcDonough commented Nov 27, 2024

jhuber6 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

llvmbot commented Mar 18, 2025 • edited Loading

jhuber6 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

llvm-ci commented Mar 20, 2025

Artem-B commented Mar 28, 2025

Artem-B commented Mar 28, 2025

jhuber6 commented Mar 28, 2025

EthanLuisMcDonough commented Mar 28, 2025

Artem-B commented Mar 28, 2025

jhuber6 commented Mar 28, 2025

EthanLuisMcDonough commented Mar 28, 2025

EthanLuisMcDonough commented Jun 3, 2024 •

edited

Loading

github-actions bot commented Jun 3, 2024 •

edited

Loading

jdenny-ornl commented Oct 3, 2024 •

edited

Loading

llvmbot commented Mar 18, 2025 •

edited

Loading