-
Notifications
You must be signed in to change notification settings - Fork 13.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[PGO][Offload] Allow PGO flags to be used on GPU targets #94268
[PGO][Offload] Allow PGO flags to be used on GPU targets #94268
Conversation
✅ With the latest revision this PR passed the C/C++ code formatter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems to be lots of accidental clang-format
changes. Why do we need new flags for this instead of just using the old ones and changing behavior when the target is a known GPU? I.e. SPIR-V, CUDA, or HSA.
@jhuber6 The clang format errors are mostly due to my local version of I added new flags for GPU PGO specifically because I didn't want to modify the PGO flags' existing behavior. PGO has a significant runtime cost, so I figured it would be best for the end user experience to only enable PGO on the GPU when it was specifically requested. |
Is this something that specifically requires its own flag? Or could we just do |
Right now the |
@@ -0,0 +1,82 @@ | |||
// RUN: %libomptarget-compile-generic -fprofile-generate-gpu |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When targeting a V100, this command fails for me in both pgo1.c and pgo2.c. In the LTO case:
LLVM ERROR: Circular dependency found in global variable set
In the non-LTO case:
fatal error: error in backend: NVPTX aliasee must be a non-kernel function definition
I do not see this problem in PR #93365's pgo1.c.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a limitation of the PTX target, globals cannot reference themselves. Most likely whatever NVIDIA engineer wrote the PTX parser found it annoying to reference something that wasn't fully parsed yet so he just decided to make it an error and here we are. See https://godbolt.org/z/53PP5c5ve.
For some codes, I get the following error for a gfx906:
I see it for OpenMC, but the following is a simpler example:
I can prevent the error by lowering the last -O2 to -O1 or by removing the |
Not sure if this is still relevant, but I think #114617 should fix this issue. The ASM builder was trying to create a reloc directive with the kind |
Thanks, it does seem to fix it. If possible, would you please keep this PR in sync with PR #93365? That should bring in the above fix and generally help me to review against my use cases. |
Sure, I'll sync this branch with my changes in the other PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My preference is that we do not introduce new flags, but use the ones we have already and use -Xarch
as the canonical way to do it.
-fsanitize_instr_generate // Both host and all devices
-Xarch_host -fsanitize_instr_generate // Only on the host
-Xarch_device -fsanitize_instr_generate // Only on the devices
-Xarch_nvptx64 -fsanitize_instr_generate // Only on NVPTX targets
-Xarch_gfx90a -fsanitize_instr_generate // Only on gfx90a
All of this handling should be done for you by the getArgsForToolchain()
interface that's called before we generate the arguments from the tools. So, all this should require is tests + documentation for this usage. You should just be able to replace existing checks on these new flags to the standard ones.
@@ -273,7 +273,7 @@ void GPUProfGlobals::dump() const { | |||
} | |||
|
|||
Error GPUProfGlobals::write() const { | |||
if (!__llvm_write_custom_profile) | |||
if (__llvm_write_custom_profile == NULL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
!ptr
is canonical, but if you want to be explicit use `nullptr.
Group<f_Group>, | ||
Visibility<[ClangOption, CLOption]>, | ||
MetaVarName<"<pathname>">, | ||
HelpText<"Use instrumentation data for profile-guided optimization targeting GPU">; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The existing non-gpu text reads different, on purpose?
clang/lib/Driver/ToolChain.cpp
Outdated
Multilib::flags_list &Result) { | ||
static void getARMMultilibFlags(const Driver &D, const llvm::Triple &Triple, | ||
const llvm::opt::ArgList &Args, | ||
Multilib::flags_list &Result) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unrelated, more below. Don't format the file, use git-clang-format to format only your commit.
@@ -575,6 +575,76 @@ static void addDashXForInput(const ArgList &Args, const InputInfo &Input, | |||
} | |||
} | |||
|
|||
static void addPGOFlagsGPU(const ToolChain &TC, const ArgList &Args, | |||
ArgStringList &CmdArgs) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this copied from somewhere? Can we merge it, e.g., in a template that takes the options enums (CPU vs GPU)?
@@ -152,7 +152,7 @@ INSTR_PROF_VALUE_NODE(PtrToNodeT, llvm::PointerType::getUnqual(Ctx), Next, \ | |||
#define INSTR_PROF_DATA_DEFINED | |||
#endif | |||
INSTR_PROF_RAW_HEADER(uint64_t, Magic, __llvm_profile_get_magic()) | |||
INSTR_PROF_RAW_HEADER(uint64_t, Version, __llvm_profile_get_version()) | |||
INSTR_PROF_RAW_HEADER(uint64_t, Version, Version) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This ensures that a GPU profile is written with the version pulled from the GPU. Previously, if the host used LLVM-level instrumentation and the device has clang-level instrumentation, the GPU profile would use the host's format. This change ensures that is not the case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file is a part of public API and is used by a number of users that have their own runtime implementation (for example baremetal users, OS kernels) and this change is going to break all of them. Is there another way we could handle this case that wouldn't require this change and avoid all that churn?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the best solution would be to forbid the user from using different instrumentation levels (LLVM IR vs clang) on the host and device. I can replace the version replacement with a check that ensures the device version matches the host version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fixed the version detection and made sure to set the Version
property the custom value instead of changing the default value.
2789380
to
6aea96e
Compare
@llvm/pr-subscribers-clang-codegen @llvm/pr-subscribers-clang Author: Ethan Luis McDonough (EthanLuisMcDonough) ChangesThis pull request is the third part of an ongoing effort to extends PGO instrumentation to GPU device code and depends on #93365. This PR makes the following changes:
Patch is 26.26 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/94268.diff 19 Files Affected:
diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp
index 1012128085c7a..e0f1206496486 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -6387,11 +6387,7 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,
Args.AddLastArg(CmdArgs, options::OPT_fconvergent_functions,
options::OPT_fno_convergent_functions);
- // NVPTX/AMDGCN doesn't support PGO or coverage. There's no runtime support
- // for sampling, overhead of call arc collection is way too high and there's
- // no way to collect the output.
- if (!Triple.isNVPTX() && !Triple.isAMDGCN())
- addPGOAndCoverageFlags(TC, C, JA, Output, Args, SanitizeArgs, CmdArgs);
+ addPGOAndCoverageFlags(TC, C, JA, Output, Args, SanitizeArgs, CmdArgs);
Args.AddLastArg(CmdArgs, options::OPT_fclang_abi_compat_EQ);
diff --git a/clang/test/Driver/cuda-no-pgo-or-coverage.cu b/clang/test/Driver/cuda-no-pgo-or-coverage.cu
deleted file mode 100644
index b84587e1e182b..0000000000000
--- a/clang/test/Driver/cuda-no-pgo-or-coverage.cu
+++ /dev/null
@@ -1,33 +0,0 @@
-// Check that profiling/coverage arguments doen't get passed down to device-side
-// compilation.
-//
-//
-// XRUN: not %clang -### --target=x86_64-linux-gnu -c --cuda-gpu-arch=sm_20 \
-// XRUN: -fprofile-generate %s 2>&1 | \
-// XRUN: FileCheck --check-prefixes=CHECK,PROF %s
-//
-// RUN: not %clang -### --target=x86_64-linux-gnu -c --cuda-gpu-arch=sm_20 \
-// RUN: -fprofile-instr-generate %s 2>&1 | \
-// RUN: FileCheck --check-prefixes=CHECK,PROF %s
-//
-// RUN: not %clang -### --target=x86_64-linux-gnu -c --cuda-gpu-arch=sm_20 \
-// RUN: -coverage %s 2>&1 | \
-// RUN: FileCheck --check-prefixes=CHECK,GCOV %s
-//
-// RUN: not %clang -### --target=x86_64-linux-gnu -c --cuda-gpu-arch=sm_20 \
-// RUN: -ftest-coverage %s 2>&1 | \
-// RUN: FileCheck --check-prefixes=CHECK,GCOV %s
-//
-// RUN: not %clang -### --target=x86_64-linux-gnu -c --cuda-gpu-arch=sm_20 \
-// RUN: -fprofile-instr-generate -fcoverage-mapping %s 2>&1 | \
-// RUN: FileCheck --check-prefixes=CHECK,PROF %s
-//
-//
-// CHECK-NOT: error: unsupported option '-fprofile
-// CHECK-NOT: error: invalid argument
-// CHECK-DAG: "-fcuda-is-device"
-// CHECK-NOT: "-f{{[^"/]*coverage.*}}"
-// CHECK-NOT: "-fprofile{{[^"]*}}"
-// CHECK: "-triple" "x86_64-unknown-linux-gnu"
-// PROF: "-fprofile{{.*}}"
-// GCOV: "-coverage-notes-file=
diff --git a/compiler-rt/include/profile/InstrProfData.inc b/compiler-rt/include/profile/InstrProfData.inc
index 2cdfea9a579a4..d51b58386f168 100644
--- a/compiler-rt/include/profile/InstrProfData.inc
+++ b/compiler-rt/include/profile/InstrProfData.inc
@@ -152,7 +152,7 @@ INSTR_PROF_VALUE_NODE(PtrToNodeT, llvm::PointerType::getUnqual(Ctx), Next, \
#define INSTR_PROF_DATA_DEFINED
#endif
INSTR_PROF_RAW_HEADER(uint64_t, Magic, __llvm_profile_get_magic())
-INSTR_PROF_RAW_HEADER(uint64_t, Version, __llvm_profile_get_version())
+INSTR_PROF_RAW_HEADER(uint64_t, Version, Version)
INSTR_PROF_RAW_HEADER(uint64_t, BinaryIdsSize, __llvm_write_binary_ids(NULL))
INSTR_PROF_RAW_HEADER(uint64_t, NumData, NumData)
INSTR_PROF_RAW_HEADER(uint64_t, PaddingBytesBeforeCounters, PaddingBytesBeforeCounters)
diff --git a/compiler-rt/lib/profile/InstrProfiling.h b/compiler-rt/lib/profile/InstrProfiling.h
index 77c8d6c79322d..a90558fdcfbbf 100644
--- a/compiler-rt/lib/profile/InstrProfiling.h
+++ b/compiler-rt/lib/profile/InstrProfiling.h
@@ -310,7 +310,8 @@ int __llvm_write_custom_profile(const char *Target,
const __llvm_profile_data *DataEnd,
const char *CountersBegin,
const char *CountersEnd, const char *NamesBegin,
- const char *NamesEnd);
+ const char *NamesEnd,
+ const uint64_t *VersionOverride);
/*!
* This variable is defined in InstrProfilingRuntime.cpp as a hidden
diff --git a/compiler-rt/lib/profile/InstrProfilingBuffer.c b/compiler-rt/lib/profile/InstrProfilingBuffer.c
index 1c451d7ec7563..b406e8db74f3f 100644
--- a/compiler-rt/lib/profile/InstrProfilingBuffer.c
+++ b/compiler-rt/lib/profile/InstrProfilingBuffer.c
@@ -252,5 +252,6 @@ COMPILER_RT_VISIBILITY int __llvm_profile_write_buffer_internal(
&BufferWriter, DataBegin, DataEnd, CountersBegin, CountersEnd,
BitmapBegin, BitmapEnd, /*VPDataReader=*/0, NamesBegin, NamesEnd,
/*VTableBegin=*/NULL, /*VTableEnd=*/NULL, /*VNamesBegin=*/NULL,
- /*VNamesEnd=*/NULL, /*SkipNameDataWrite=*/0);
+ /*VNamesEnd=*/NULL, /*SkipNameDataWrite=*/0,
+ __llvm_profile_get_version());
}
diff --git a/compiler-rt/lib/profile/InstrProfilingFile.c b/compiler-rt/lib/profile/InstrProfilingFile.c
index 4667c02892505..19467429cf4c3 100644
--- a/compiler-rt/lib/profile/InstrProfilingFile.c
+++ b/compiler-rt/lib/profile/InstrProfilingFile.c
@@ -1273,10 +1273,13 @@ COMPILER_RT_VISIBILITY int __llvm_profile_set_file_object(FILE *File,
return 0;
}
-COMPILER_RT_USED int __llvm_write_custom_profile(
- const char *Target, const __llvm_profile_data *DataBegin,
- const __llvm_profile_data *DataEnd, const char *CountersBegin,
- const char *CountersEnd, const char *NamesBegin, const char *NamesEnd) {
+int __llvm_write_custom_profile(const char *Target,
+ const __llvm_profile_data *DataBegin,
+ const __llvm_profile_data *DataEnd,
+ const char *CountersBegin,
+ const char *CountersEnd, const char *NamesBegin,
+ const char *NamesEnd,
+ const uint64_t *VersionOverride) {
int ReturnValue = 0, FilenameLength, TargetLength;
char *FilenameBuf, *TargetFilename;
const char *Filename;
@@ -1358,10 +1361,15 @@ COMPILER_RT_USED int __llvm_write_custom_profile(
ProfDataWriter fileWriter;
initFileWriter(&fileWriter, OutputFile);
+ uint64_t Version = __llvm_profile_get_version();
+ if (VersionOverride)
+ Version = *VersionOverride;
+
/* Write custom data to the file */
- ReturnValue = lprofWriteDataImpl(
- &fileWriter, DataBegin, DataEnd, CountersBegin, CountersEnd, NULL, NULL,
- lprofGetVPDataReader(), NULL, NULL, NULL, NULL, NamesBegin, NamesEnd, 0);
+ ReturnValue =
+ lprofWriteDataImpl(&fileWriter, DataBegin, DataEnd, CountersBegin,
+ CountersEnd, NULL, NULL, lprofGetVPDataReader(), NULL,
+ NULL, NULL, NULL, NamesBegin, NamesEnd, 0, Version);
closeFileObject(OutputFile);
// Restore SIGKILL.
diff --git a/compiler-rt/lib/profile/InstrProfilingInternal.h b/compiler-rt/lib/profile/InstrProfilingInternal.h
index b100343ca04f9..03df71828b91d 100644
--- a/compiler-rt/lib/profile/InstrProfilingInternal.h
+++ b/compiler-rt/lib/profile/InstrProfilingInternal.h
@@ -160,7 +160,8 @@ int lprofWriteDataImpl(ProfDataWriter *Writer,
VPDataReaderType *VPDataReader, const char *NamesBegin,
const char *NamesEnd, const VTableProfData *VTableBegin,
const VTableProfData *VTableEnd, const char *VNamesBegin,
- const char *VNamesEnd, int SkipNameDataWrite);
+ const char *VNamesEnd, int SkipNameDataWrite,
+ uint64_t Version);
/* Merge value profile data pointed to by SrcValueProfData into
* in-memory profile counters pointed by to DstData. */
diff --git a/compiler-rt/lib/profile/InstrProfilingWriter.c b/compiler-rt/lib/profile/InstrProfilingWriter.c
index 8816a71155511..bcd88b30d050d 100644
--- a/compiler-rt/lib/profile/InstrProfilingWriter.c
+++ b/compiler-rt/lib/profile/InstrProfilingWriter.c
@@ -254,21 +254,21 @@ COMPILER_RT_VISIBILITY int lprofWriteData(ProfDataWriter *Writer,
const VTableProfData *VTableEnd = __llvm_profile_end_vtables();
const char *VNamesBegin = __llvm_profile_begin_vtabnames();
const char *VNamesEnd = __llvm_profile_end_vtabnames();
+ uint64_t Version = __llvm_profile_get_version();
return lprofWriteDataImpl(Writer, DataBegin, DataEnd, CountersBegin,
CountersEnd, BitmapBegin, BitmapEnd, VPDataReader,
NamesBegin, NamesEnd, VTableBegin, VTableEnd,
- VNamesBegin, VNamesEnd, SkipNameDataWrite);
+ VNamesBegin, VNamesEnd, SkipNameDataWrite, Version);
}
-COMPILER_RT_VISIBILITY int
-lprofWriteDataImpl(ProfDataWriter *Writer, const __llvm_profile_data *DataBegin,
- const __llvm_profile_data *DataEnd,
- const char *CountersBegin, const char *CountersEnd,
- const char *BitmapBegin, const char *BitmapEnd,
- VPDataReaderType *VPDataReader, const char *NamesBegin,
- const char *NamesEnd, const VTableProfData *VTableBegin,
- const VTableProfData *VTableEnd, const char *VNamesBegin,
- const char *VNamesEnd, int SkipNameDataWrite) {
+COMPILER_RT_VISIBILITY int lprofWriteDataImpl(
+ ProfDataWriter *Writer, const __llvm_profile_data *DataBegin,
+ const __llvm_profile_data *DataEnd, const char *CountersBegin,
+ const char *CountersEnd, const char *BitmapBegin, const char *BitmapEnd,
+ VPDataReaderType *VPDataReader, const char *NamesBegin,
+ const char *NamesEnd, const VTableProfData *VTableBegin,
+ const VTableProfData *VTableEnd, const char *VNamesBegin,
+ const char *VNamesEnd, int SkipNameDataWrite, uint64_t Version) {
/* Calculate size of sections. */
const uint64_t DataSectionSize =
__llvm_profile_get_data_size(DataBegin, DataEnd);
diff --git a/llvm/include/llvm/ProfileData/InstrProfData.inc b/llvm/include/llvm/ProfileData/InstrProfData.inc
index 2cdfea9a579a4..d51b58386f168 100644
--- a/llvm/include/llvm/ProfileData/InstrProfData.inc
+++ b/llvm/include/llvm/ProfileData/InstrProfData.inc
@@ -152,7 +152,7 @@ INSTR_PROF_VALUE_NODE(PtrToNodeT, llvm::PointerType::getUnqual(Ctx), Next, \
#define INSTR_PROF_DATA_DEFINED
#endif
INSTR_PROF_RAW_HEADER(uint64_t, Magic, __llvm_profile_get_magic())
-INSTR_PROF_RAW_HEADER(uint64_t, Version, __llvm_profile_get_version())
+INSTR_PROF_RAW_HEADER(uint64_t, Version, Version)
INSTR_PROF_RAW_HEADER(uint64_t, BinaryIdsSize, __llvm_write_binary_ids(NULL))
INSTR_PROF_RAW_HEADER(uint64_t, NumData, NumData)
INSTR_PROF_RAW_HEADER(uint64_t, PaddingBytesBeforeCounters, PaddingBytesBeforeCounters)
diff --git a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
index a8055979acaa2..ea4be07d0c8c8 100644
--- a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
+++ b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
@@ -462,7 +462,10 @@ createIRLevelProfileFlagVar(Module &M,
auto IRLevelVersionVariable = new GlobalVariable(
M, IntTy64, true, GlobalValue::WeakAnyLinkage,
Constant::getIntegerValue(IntTy64, APInt(64, ProfileVersion)), VarName);
- IRLevelVersionVariable->setVisibility(GlobalValue::HiddenVisibility);
+ if (isGPUProfTarget(M))
+ IRLevelVersionVariable->setVisibility(GlobalValue::ProtectedVisibility);
+ else
+ IRLevelVersionVariable->setVisibility(GlobalValue::HiddenVisibility);
Triple TT(M.getTargetTriple());
if (TT.supportsCOMDAT()) {
IRLevelVersionVariable->setLinkage(GlobalValue::ExternalLinkage);
diff --git a/llvm/test/tools/llvm-profdata/binary-ids-padding.test b/llvm/test/tools/llvm-profdata/binary-ids-padding.test
index 292c582b45c52..f31aa15bfe6c9 100644
--- a/llvm/test/tools/llvm-profdata/binary-ids-padding.test
+++ b/llvm/test/tools/llvm-profdata/binary-ids-padding.test
@@ -1,7 +1,7 @@
// Header
//
// INSTR_PROF_RAW_HEADER(uint64_t, Magic, __llvm_profile_get_magic())
-// INSTR_PROF_RAW_HEADER(uint64_t, Version, __llvm_profile_get_version())
+// INSTR_PROF_RAW_HEADER(uint64_t, Version, Version)
// INSTR_PROF_RAW_HEADER(uint64_t, BinaryIdsSize, __llvm_write_binary_ids(NULL))
// INSTR_PROF_RAW_HEADER(uint64_t, DataSize, DataSize)
// INSTR_PROF_RAW_HEADER(uint64_t, CountersSize, CountersSize)
diff --git a/llvm/test/tools/llvm-profdata/malformed-not-space-for-another-header.test b/llvm/test/tools/llvm-profdata/malformed-not-space-for-another-header.test
index 705e5efaf5875..44be2980bb2f2 100644
--- a/llvm/test/tools/llvm-profdata/malformed-not-space-for-another-header.test
+++ b/llvm/test/tools/llvm-profdata/malformed-not-space-for-another-header.test
@@ -1,7 +1,7 @@
// Header
//
// INSTR_PROF_RAW_HEADER(uint64_t, Magic, __llvm_profile_get_magic())
-// INSTR_PROF_RAW_HEADER(uint64_t, Version, __llvm_profile_get_version())
+// INSTR_PROF_RAW_HEADER(uint64_t, Version, Version)
// INSTR_PROF_RAW_HEADER(uint64_t, BinaryIdsSize, __llvm_write_binary_ids(NULL))
// INSTR_PROF_RAW_HEADER(uint64_t, DataSize, DataSize)
// INSTR_PROF_RAW_HEADER(uint64_t, CountersSize, CountersSize)
diff --git a/llvm/test/tools/llvm-profdata/malformed-num-counters-zero.test b/llvm/test/tools/llvm-profdata/malformed-num-counters-zero.test
index 157c13b926a7e..9af9d65a6bdba 100644
--- a/llvm/test/tools/llvm-profdata/malformed-num-counters-zero.test
+++ b/llvm/test/tools/llvm-profdata/malformed-num-counters-zero.test
@@ -1,7 +1,7 @@
// Header
//
// INSTR_PROF_RAW_HEADER(uint64_t, Magic, __llvm_profile_get_magic())
-// INSTR_PROF_RAW_HEADER(uint64_t, Version, __llvm_profile_get_version())
+// INSTR_PROF_RAW_HEADER(uint64_t, Version, Version)
// INSTR_PROF_RAW_HEADER(uint64_t, BinaryIdsSize, __llvm_write_binary_ids(NULL))
// INSTR_PROF_RAW_HEADER(uint64_t, DataSize, DataSize)
// INSTR_PROF_RAW_HEADER(uint64_t, CountersSize, CountersSize)
diff --git a/llvm/test/tools/llvm-profdata/malformed-ptr-to-counter-array.test b/llvm/test/tools/llvm-profdata/malformed-ptr-to-counter-array.test
index 83cf76f68fb63..49c5ae9b0931d 100644
--- a/llvm/test/tools/llvm-profdata/malformed-ptr-to-counter-array.test
+++ b/llvm/test/tools/llvm-profdata/malformed-ptr-to-counter-array.test
@@ -1,7 +1,7 @@
// Header
//
// INSTR_PROF_RAW_HEADER(uint64_t, Magic, __llvm_profile_get_magic())
-// INSTR_PROF_RAW_HEADER(uint64_t, Version, __llvm_profile_get_version())
+// INSTR_PROF_RAW_HEADER(uint64_t, Version, Version)
// INSTR_PROF_RAW_HEADER(uint64_t, BinaryIdsSize, __llvm_write_binary_ids(NULL))
// INSTR_PROF_RAW_HEADER(uint64_t, DataSize, DataSize)
// INSTR_PROF_RAW_HEADER(uint64_t, CountersSize, CountersSize)
diff --git a/offload/plugins-nextgen/common/include/GlobalHandler.h b/offload/plugins-nextgen/common/include/GlobalHandler.h
index e030ab9e6b61f..562bf5ec223bc 100644
--- a/offload/plugins-nextgen/common/include/GlobalHandler.h
+++ b/offload/plugins-nextgen/common/include/GlobalHandler.h
@@ -13,6 +13,7 @@
#ifndef LLVM_OPENMP_LIBOMPTARGET_PLUGINS_NEXTGEN_COMMON_GLOBALHANDLER_H
#define LLVM_OPENMP_LIBOMPTARGET_PLUGINS_NEXTGEN_COMMON_GLOBALHANDLER_H
+#include <optional>
#include <type_traits>
#include "llvm/ADT/DenseMap.h"
@@ -64,10 +65,13 @@ struct __llvm_profile_data {
};
extern "C" {
-extern int __attribute__((weak)) __llvm_write_custom_profile(
- const char *Target, const __llvm_profile_data *DataBegin,
- const __llvm_profile_data *DataEnd, const char *CountersBegin,
- const char *CountersEnd, const char *NamesBegin, const char *NamesEnd);
+extern int __attribute__((weak))
+__llvm_write_custom_profile(const char *Target,
+ const __llvm_profile_data *DataBegin,
+ const __llvm_profile_data *DataEnd,
+ const char *CountersBegin, const char *CountersEnd,
+ const char *NamesBegin, const char *NamesEnd,
+ const uint64_t *VersionOverride);
}
/// PGO profiling data extracted from a GPU device
@@ -76,6 +80,7 @@ struct GPUProfGlobals {
SmallVector<__llvm_profile_data> Data;
SmallVector<uint8_t> NamesData;
Triple TargetTriple;
+ std::optional<uint64_t> Version;
void dump() const;
Error write() const;
diff --git a/offload/plugins-nextgen/common/src/GlobalHandler.cpp b/offload/plugins-nextgen/common/src/GlobalHandler.cpp
index 8783490831e25..9b9233c95e567 100644
--- a/offload/plugins-nextgen/common/src/GlobalHandler.cpp
+++ b/offload/plugins-nextgen/common/src/GlobalHandler.cpp
@@ -16,6 +16,7 @@
#include "Shared/Utils.h"
+#include "llvm/ProfileData/InstrProfData.inc"
#include "llvm/Support/Error.h"
#include <cstring>
@@ -214,6 +215,13 @@ GenericGlobalHandlerTy::readProfilingGlobals(GenericDeviceTy &Device,
if (auto Err = readGlobalFromDevice(Device, Image, DataGlobal))
return Err;
DeviceProfileData.Data.push_back(std::move(Data));
+ } else if (*NameOrErr == INSTR_PROF_QUOTE(INSTR_PROF_RAW_VERSION_VAR)) {
+ uint64_t RawVersionData;
+ GlobalTy RawVersionGlobal(NameOrErr->str(), Sym.getSize(),
+ &RawVersionData);
+ if (auto Err = readGlobalFromDevice(Device, Image, RawVersionGlobal))
+ return Err;
+ DeviceProfileData.Version = RawVersionData;
}
}
return DeviceProfileData;
@@ -265,7 +273,7 @@ void GPUProfGlobals::dump() const {
}
Error GPUProfGlobals::write() const {
- if (!__llvm_write_custom_profile)
+ if (__llvm_write_custom_profile == nullptr)
return Plugin::error("Could not find symbol __llvm_write_custom_profile. "
"The compiler-rt profiling library must be linked for "
"GPU PGO to work.");
@@ -274,6 +282,8 @@ Error GPUProfGlobals::write() const {
CountsSize = Counts.size() * sizeof(int64_t);
__llvm_profile_data *DataBegin, *DataEnd;
char *CountersBegin, *CountersEnd, *NamesBegin, *NamesEnd;
+ const uint64_t *VersionOverride =
+ Version.has_value() ? &Version.value() : nullptr;
// Initialize array of contiguous data. We need to make sure each section is
// contiguous so that the PGO library can compute deltas properly
@@ -295,9 +305,9 @@ Error GPUProfGlobals::write() const {
memcpy(NamesBegin, NamesData.data(), NamesData.size());
// Invoke compiler-rt entrypoint
- int result = __llvm_write_custom_profile(TargetTriple.str().c_str(),
- DataBegin, DataEnd, CountersBegin,
- CountersEnd, NamesBegin, NamesEnd);
+ int result = __llvm_write_custom_profile(
+ TargetTriple.str().c_str(), DataBegin, DataEnd, CountersBegin,
+ CountersEnd, NamesBegin, NamesEnd, VersionOverride);
if (result != 0)
return Plugin::error("Error writing GPU PGO data to file");
diff --git a/offload/test/offloading/gpupgo/pgo1.c b/offload/test/offloading/gpupgo/pgo1.c
new file mode 100644
index 0000000000000..c8011cbae83c0
--- /dev/null
+++ b/offload/test/offloading/gpupgo/pgo1.c
@@ -0,0 +1,84 @@
+// RUN: %libomptarget-compile-generic -fcreate-profile \
+// RUN: -Xarch_device -fprofile-generate
+// RUN: env LLVM_PROFILE_FILE=%basename_t.llvm.profraw \
+// RUN: %libomptarget-run-generic 2>&1
+// RUN: llvm-profdata show --all-functions --counts \
+// RUN: %target_triple.%basename_t.llvm.profraw | \
+// RUN: %fcheck-generic --check-prefix="LLVM-PGO"
+
+// RUN: %libomptarget-compile-generic -fcreate-profile \
+// RUN: -Xarch_device -fprofile-instr-generate
+// RUN: env LLVM_PROFILE_FILE=%basename_t.clang.profraw \
+// RUN: %libomptarget-run-generic 2>&1
+// RUN: llvm-profdata show --all-functions --counts \
+// RUN: %target_triple.%basename_t.clang.profraw | \
+// RUN: %fcheck-generic --check-prefix="CLANG-PGO"
+
+// REQUIRES: gpu
+// REQUIRES: pgo
+
+int test1(int a) { return a / 2; }
+int test2(int a) { return a * 2; }
+
+int main() {
+ int m = 2;
+#pragma omp target
+ for (int i = 0; i < 10; i++) {
+ m = test1(m);
+ for (int j = 0; j < 2; j++) {
+ m = test2(m);
+ }
+ }
+}
+
+// LLVM-PGO-LABEL: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}:
+// LLVM-PGO: Hash: {{0[xX][0-9a-fA-F]+}}
+// LLVM-PGO: Counters: 4
+// LLVM-PGO: Block counts: [20, 10, 2, 1]
+
+// LLVM-PGO-LABEL: test1:
+// LLVM-PGO: Hash: {{0[xX][0-9a-fA-F]+}}
+// LLVM-PGO: Counters: 1
+// LLVM-PGO: Block counts: [10]
+
+// LLVM-PGO-LABEL: test2:
+// LLVM-PGO: Hash...
[truncated]
|
6aea96e
to
3a2047c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LG with a nit
clang/lib/Driver/ToolChain.cpp
Outdated
getArch() != llvm::Triple::armeb && | ||
getArch() != llvm::Triple::thumbeb; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A lot of unrelated formatting changes, please undo those and just use git clang-format HEAD~1
.
@@ -265,7 +273,7 @@ void GPUProfGlobals::dump() const { | |||
} | |||
|
|||
Error GPUProfGlobals::write() const { | |||
if (!__llvm_write_custom_profile) | |||
if (__llvm_write_custom_profile == nullptr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (__llvm_write_custom_profile == nullptr) | |
if (!__llvm_write_custom_profile) |
nit. this is idiomatic and shouldn't have been changed.
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/3/builds/13339 Here is the relevant piece of the build log for the reference
|
This is breaking CUDA/NVPTX. E.g. we see data like this which includes a reference to itself:
When LLVM sees such self-reference, it crashes. https://godbolt.org/z/o6PTqr1ca
|
@jhuber6 @jdoerfert I propose reverting the change, unless it can be quickly fixed forward so it does not affect CUDA/NVPTX. |
I really wish someone from NVIDIA would just fix this stupid limitation already. @EthanLuisMcDonough This type of stuff shows up for things like this. Any way we can modify the source code to work around this? It's fine to put it in a separate global.
|
Thank you all for making me aware of this. I'll look into fixing this as soon as possible. |
The crash is blocking our compiler updates. If nothing depends on this change yet, it would be great to revert the patch and re-land it once it's fixed. |
Go ahead and revert it, but I think there were some follow-up patches. Also the |
This patch should disable PGO on nvidia targets: #133522. I'll merge it after I'm done running some tests |
This pull request is the third part of an ongoing effort to extends PGO instrumentation to GPU device code and depends on #93365. This PR makes the following changes:
__llvm_write_custom_profile
andlprofWriteDataImpl
to allow the PGO version to be overridden