merge main into amd-staging #488

ronlieb · 2025-11-04T18:17:05Z

No description provided.

…t if all extracts would lead to UB on poison. (llvm#164683) This change aims to avoid inserting a freeze instruction between the load and bitcast when scalarizing extend-extract. This is particularly useful in combination with llvm#164682, which can then potentially further scalarize, provided there is no freeze. alive2 proof: https://alive2.llvm.org/ce/z/W-GD88

AMDGCN flavoured SPIR-V has slightly different defaults from what the BE adopts: it assumes all extensions are enabled, and expects nonsemantic debug info to be generated. Furthermore, it is necessary to encode in the resulting SPIR-V binary that what was generated was AMDGCN flavoured, which we do by setting the Generator Version to `UINT16_MAX` (which matches what we expect to see at reverse translation). We will register this generator version at <https://github.com/KhronosGroup/SPIRV-Headers>. This is a preliminary patch out of a series of patches that are needed for adopting the BE for AMDGCN flavoured SPIR-V generation.
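The SPIR-V module header's Generator word packs a registered tool ID into its upper 16 bits and a tool-specific version into its lower 16 bits. Below is a minimal sketch of how a consumer might recognise the AMDGCN flavour. It assumes that the "Generator Version" mentioned above refers to that lower half. The helper names and the tool ID used in the usage lines are hypothetical and do not come from the backend or the translator.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helpers (not the backend's code). The SPIR-V header's Generator
// word holds a tool ID in the upper 16 bits and a version in the lower 16 bits.
constexpr uint32_t makeGeneratorWord(uint16_t ToolID, uint16_t Version) {
  return (static_cast<uint32_t>(ToolID) << 16) | Version;
}

constexpr uint16_t generatorVersion(uint32_t Word) {
  return static_cast<uint16_t>(Word & 0xFFFFu);
}

// Assumption: AMDGCN-flavoured modules carry a generator version of UINT16_MAX.
constexpr bool isAMDGCNFlavoured(uint32_t Word) {
  return generatorVersion(Word) == std::numeric_limits<uint16_t>::max();
}

// Usage with a made-up tool ID of 0x1234:
static_assert(makeGeneratorWord(0x1234, 0xFFFF) == 0x1234FFFFu, "packing");
static_assert(isAMDGCNFlavoured(makeGeneratorWord(0x1234, 0xFFFF)), "flavour");
```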

…ansfer_write to new offsets syntax (llvm#162095) Changes the `VectorToXeGPU` pass to generate `xegpu.load_nd/store_nd` ops using the new syntax, in which offsets are specified at the load/store op level.

```mlir
// from this
%desc = xegpu.create_nd_tdesc %src[%off1, %off2]: memref<8x16xf16> -> !xegpu.tensor_desc<8x16xf16>
%res = xegpu.load_nd %desc : !xegpu.tensor_desc<8x16xf16> -> vector<8x16xf16>

// to this
%desc = xegpu.create_nd_tdesc %src: memref<8x16xf16> -> !xegpu.tensor_desc<8x16xf16>
%res = xegpu.load_nd %desc[%off1, %off2] : !xegpu.tensor_desc<8x16xf16> -> vector<8x16xf16>
```

In order to support cases with dimension reduction at the `create_nd_tdesc` level (e.g. `memref<8x8x16xf16> -> tensor_desc<8x16xf16>`), it was decided to insert a `memref.subview` that collapses the source shape to 2D, for example:

```mlir
// input:
%0 = vector.load %source[%off0, %off1, %off2] : memref<8x16x32xf32>, vector<8x16xf32>

// --vector-to-xegpu (old)
%tdesc = xegpu.create_nd_tdesc %source[%off0, %off1, %off2] : memref<8x16x32xf32> -> tdesc<8x32xf32>
%vec = xegpu.load_nd %tdesc

// --vector-to-xegpu (new)
%collapsed = memref.subview %source[%off0, 0, 0] [1, 16, 32] [1, 1, 1] :
    memref<8x16x32xf32> -> memref<16x32xf32, strided<[32, 1], offset: ?>>
%tdesc = xegpu.create_nd_tdesc %collapsed : memref<16x32xf32, ...> -> tdesc<8x32xf32>
%vec = xegpu.load_nd %tdesc[%off1, %off2]
```

<details><summary>Why do we need this change?</summary>

```mlir
// reduce dim and apply all 3 offsets at load_nd
%desc = xegpu.create_nd_tdesc %source : memref<8x16x32xf32> -> !xegpu.tensor_desc<16x32xf32>
// error: xegpu.load_nd len(offsets) != desc.rank
%res = xegpu.load_nd %desc[%off, %off, %off] : !xegpu.tensor_desc<16x32xf32> -> vector<8x16xf32>
```

</details>

--------- Signed-off-by: dchigarev <dmitry.chigarev@intel.com>

…n of conditions (llvm#165748) In simplifycfg/cvp/sccp, we eliminate dead edges of switches according to the knownbits/range info of conditions. However, these approximations can be too coarse when the set of possible condition values is sparse. For example, if the condition can only be either -3 or 3, we cannot prove that the condition never evaluates to 1 (knownbits: ???????1, range: [-3, 4)). This patch adds a helper function `collectPossibleValues` to enumerate all the possible values of a value V. To fix the motivating issue, `eliminateDeadSwitchCases` will use the result to remove dead edges. Note: In https://discourse.llvm.org/t/missed-optimization-due-to-overflow-check/88700 I proposed a new value lattice kind to represent such values, but I found it hard to apply because the transitions become much more complicated. Compile-time impact looks neutral: https://llvm-compile-time-tracker.com/compare.php?from=32d6b2139a6c8f79e074e8c6cfe0cc9e79c4c0c8&to=e47c26e3f1bf9eb062684dda4fafce58438e994b&stat=instructions:u This patch removes a lot of dead error-handling code: dtcxzyw/llvm-opt-benchmark#3012 Closes llvm#165179.
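The gap described above can be reproduced in a few lines. This is a minimal sketch that assumes an 8-bit condition whose only possible values are -3 and 3. `KnownBits8`, `deadByKnownBits`, `deadByRange` and `deadByEnumeration` are hypothetical stand-ins. They are not LLVM's `KnownBits`/`ConstantRange` classes, and they do not reflect the actual `collectPossibleValues` signature.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Hypothetical sketch: three ways to approximate the possible values of an
// 8-bit switch condition, and whether each can prove a case value is dead.
struct KnownBits8 {
  uint8_t Zero = 0xFF; // bits that are 0 in every possible value
  uint8_t One = 0xFF;  // bits that are 1 in every possible value
};

// Intersect the bit patterns of all possible values (assumes Vals non-empty).
KnownBits8 knownBitsOf(const std::set<int8_t> &Vals) {
  KnownBits8 KB;
  for (int8_t V : Vals) {
    uint8_t U = static_cast<uint8_t>(V);
    KB.Zero &= static_cast<uint8_t>(~U);
    KB.One &= U;
  }
  return KB;
}

// Known bits prove a case dead only if it contradicts some known bit.
bool deadByKnownBits(const KnownBits8 &KB, int8_t Case) {
  uint8_t U = static_cast<uint8_t>(Case);
  return (U & KB.Zero) != 0 || (static_cast<uint8_t>(~U) & KB.One) != 0;
}

// A signed half-open range [Lo, Hi) proves a case dead only if it lies
// outside the range (assumes Vals non-empty).
bool deadByRange(const std::set<int8_t> &Vals, int8_t Case) {
  int Lo = *Vals.begin();
  int Hi = *Vals.rbegin() + 1;
  return Case < Lo || Case >= Hi;
}

// Enumerating the possible values (the idea behind collectPossibleValues)
// gives an exact answer.
bool deadByEnumeration(const std::set<int8_t> &Vals, int8_t Case) {
  return Vals.count(Case) == 0;
}
```

With `Vals = {-3, 3}`, the known bits come out as `???????1` and the range as `[-3, 4)`. As a result, `deadByKnownBits(KB, 1)` and `deadByRange(Vals, 1)` both return false, while `deadByEnumeration(Vals, 1)` returns true.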

In llvm#165720 we started using a DWARF API (`llvm::dwarf::getTag`) from `BinaryFormat`. This patch makes dwarfdump link against the necessary LLVM component. This fixes the following linker error that started occurring on some of the bots:

```
[7758/8172] Linking CXX executable bin/llvm-dwarfdump
FAILED: bin/llvm-dwarfdump
: && /usr/bin/c++ -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wno-missing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough -Wno-uninitialized -Wno-nonnull -Wno-class-memaccess -Wno-redundant-move -Wno-pessimizing-move -Wno-array-bounds -Wno-stringop-overread -Wno-noexcept-type -Wdelete-non-virtual-dtor -Wsuggest-override -Wno-comment -Wno-misleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG -Wl,-rpath-link,/home/botworker/bbot/amdgpu-offload-ubuntu-22-cmake-build-only/build/./lib -Wl,--gc-sections tools/llvm-dwarfdump/CMakeFiles/llvm-dwarfdump.dir/SectionSizes.cpp.o tools/llvm-dwarfdump/CMakeFiles/llvm-dwarfdump.dir/Statistics.cpp.o tools/llvm-dwarfdump/CMakeFiles/llvm-dwarfdump.dir/llvm-dwarfdump.cpp.o -o bin/llvm-dwarfdump -Wl,-rpath,"\$ORIGIN/../lib:/home/botworker/bbot/amdgpu-offload-ubuntu-22-cmake-build-only/build/lib:" lib/libLLVMAMDGPUDesc.so.22.0git lib/libLLVMSPIRVDesc.so.22.0git lib/libLLVMX86Desc.so.22.0git lib/libLLVMAMDGPUInfo.so.22.0git lib/libLLVMSPIRVInfo.so.22.0git lib/libLLVMX86Info.so.22.0git lib/libLLVMDebugInfoDWARF.so.22.0git lib/libLLVMObject.so.22.0git lib/libLLVMMC.so.22.0git lib/libLLVMDebugInfoDWARFLowLevel.so.22.0git lib/libLLVMTargetParser.so.22.0git lib/libLLVMSupport.so.22.0git -Wl,-rpath-link,/home/botworker/bbot/amdgpu-offload-ubuntu-22-cmake-build-only/build/lib && :
/usr/bin/ld: tools/llvm-dwarfdump/CMakeFiles/llvm-dwarfdump.dir/llvm-dwarfdump.cpp.o: undefined reference to symbol '_ZN4llvm5dwarf6getTagENS_9StringRefE'
/usr/bin/ld: /home/botworker/bbot/amdgpu-offload-ubuntu-22-cmake-build-only/build/./lib/libLLVMBinaryFormat.so.22.0git: error adding symbols: DSO missing from command line
collect2: error: ld returned 1 exit status
```

-MG is supposed to suppress "file not found" diagnostics and instead treat those as generated files for purposes of dependency scanning. Clang was previously emitting the diagnostic instead of emitting the name of the embedded file. Fixes llvm#165632

…ent (llvm#163816) This was left out of the original patch (llvm#142392) to simplify the initial implementation. However, after refactoring the SVE prologue/epilogue code in llvm#162253, it's not much of an extension to support this case. The main change here is when restoring the SP from the FP for the SVE restores, we may need an additional frame offset to move from the start of the ZPR callee-saves to the start of the PPR callee-saves. This patch also fixes a previously latent bug where we'd add the `RealignmentPadding` when allocating the PPR locals, then again for the ZPR locals. This was unnecessary as the stack only needs to be realigned after all SVE allocations.

…166263) Add a customizable `visitBlockTransfer` method to the dense forward and backward dataflow analyses, allowing subclasses to customize lattice propagation along control-flow edges between blocks. The default implementation preserves the existing join/meet semantics. This change mirrors the existing structure of both dense dataflow classes, where `RegionBranchOpInterface` and callables can already be customized by subclasses. The use case motivating this change is dense liveness analysis. Currently, without the customization hook, the block transfer function produces incorrect results: the current logic doesn't remove the successor block arguments from the live set, as it only meets the successor state with the predecessor state (i.e. set union). With this change, it is now possible to compute the correct result by specifying the correct logic in `visitBlockTransfer`. Signed-off-by: Fabian Mora <fabian.mora-cordero@amd.com>
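To make the liveness problem concrete, here is a minimal sketch using plain string sets. It assumes a single edge whose branch forwards `Operands[i]` to the successor's block argument `BlockArgs[i]`. The function names are hypothetical and do not reflect the MLIR dataflow API or the `visitBlockTransfer` signature.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

using LiveSet = std::set<std::string>;

// Hypothetical sketch of backward liveness across one CFG edge pred -> succ,
// where the branch forwards Operands[i] to succ's block argument BlockArgs[i].

// Default-style meet: plain union of succ's live-in into pred's live-out.
LiveSet unionTransfer(const LiveSet &PredLiveOut, const LiveSet &SuccLiveIn) {
  LiveSet Result = PredLiveOut;
  Result.insert(SuccLiveIn.begin(), SuccLiveIn.end());
  return Result;
}

// Edge-aware transfer: succ's block arguments are defined on entry to succ,
// so they do not flow back; the operands forwarded to live arguments do.
LiveSet edgeTransfer(const LiveSet &PredLiveOut, const LiveSet &SuccLiveIn,
                     const std::vector<std::string> &BlockArgs,
                     const std::vector<std::string> &Operands) {
  const LiveSet Args(BlockArgs.begin(), BlockArgs.end());
  LiveSet Result = PredLiveOut;
  for (const std::string &V : SuccLiveIn)
    if (Args.count(V) == 0)
      Result.insert(V);
  for (std::size_t I = 0; I < BlockArgs.size(); ++I)
    if (SuccLiveIn.count(BlockArgs[I]) != 0)
      Result.insert(Operands[I]);
  return Result;
}
```

Consider a successor whose live-in set is `{%arg0, %c}`. The plain union leaks that successor's block argument `%arg0` into the predecessor. The edge-aware version yields `{%c, %v}` when `%v` is the value passed for `%arg0`.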

The verifier of the `xegpu.{load/store/prefetch}_nd` ops fails if `offset` is a mix of static and dynamic values, e.g. `offset = [0, %c0]`. In this case the number of dynamic offsets is 1, so the check `offsetSize != tDescRank` (with `tDescRank` = 2) fails. Instead, we should check the length of `getMixedOffsets()`.
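The difference between the two checks can be sketched with a simplified model of a mixed offset list. `OffsetEntry` is a hypothetical stand-in for MLIR's `OpFoldResult`, and the two verify functions only illustrate the length comparison. They are not the actual XeGPU verifier code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <variant>
#include <vector>

// Hypothetical model of a mixed offset list such as [0, %c0]: each entry is
// either a static integer or the name of a dynamic SSA value.
using OffsetEntry = std::variant<int64_t, std::string>;

std::size_t numDynamic(const std::vector<OffsetEntry> &Mixed) {
  std::size_t N = 0;
  for (const OffsetEntry &E : Mixed)
    if (std::holds_alternative<std::string>(E))
      ++N;
  return N;
}

// Old check: only the dynamic offsets were compared against the rank.
bool verifyDynamicOnly(const std::vector<OffsetEntry> &Mixed,
                       std::size_t TDescRank) {
  return numDynamic(Mixed) == TDescRank;
}

// Fixed check: compare the full mixed (static + dynamic) offset list.
bool verifyMixed(const std::vector<OffsetEntry> &Mixed, std::size_t TDescRank) {
  return Mixed.size() == TDescRank;
}
```

For `[0, %c0]` against a rank-2 descriptor, the dynamic-only check sees one offset and rejects the op. The mixed check sees two offsets and accepts it.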

…m#165791) Update MMA tests to add a run line for `cpu=future` to ensure MMA functionality is not broken by the newly introduced `wacc` register classes. A previous commit added definitions for using the new `wacc` registers; this one adds testing and fixes a few patterns that were missing.

…kernel_attributes` (llvm#165891) This adds BE support for the [`SPV_INTEL_kernel_attributes`](https://github.khronos.org/SPIRV-Registry/extensions/INTEL/SPV_INTEL_kernel_attributes.html) extension. The extension is necessary to encode the rather useful `max_work_group_size` kernel attribute, via `OpExecutionMode MaxWorkgroupSizeINTEL`, which is the only Execution Mode added by the extension that this patch adds full processing for. Future patches will add the other Execution Modes and Capabilities. The test is adapted from the equivalent Translator test; it depends on llvm#165815.

…lvm#165606) `clang -x hip foo.c --offload-arch=amdgcnspirv --offload-new-driver -save-temps` was crashing with the following error:

```
/usr/bin/ld: input file 'foo-x86_64-unknown-linux-gnu.o' is the same as output file
build/bin/clang-linker-wrapper: error: 'ld' failed
```

The `LinkerWrapperJobAction` [is created](https://github.com/llvm/llvm-project/blob/957598f71bd8baa029d886e59ed9aed60e6e9bb9/clang/lib/Driver/Driver.cpp#L4888) with `types::TY_Object`, which makes `Driver::GetNamedOutputPath` assign it the same name as the assembler's output, thus causing the crash.

Replace the current `ADRRelaxationPass` with `AArch64RelaxationPass`, which, besides the existing ADR relaxation, will also run LDR relaxation that for now only handles these two forms of LDR instructions: `ldr Xt, [label]` and `ldr Wt, [label]`.

Following the discussion in llvm#153600 (comment), add LLVM_ABI to the getMemcmp function declaration.

NDD ADD is only supported on 64 bit, but `LEA32` has `Requires<[Not64BitMode]>`. The reason it doesn't fail upstream is that the predicate check is commented out in `X86MCInstLower.cpp`:

```
// FIXME: Enable feature predicate checks once all the test pass.
// X86_MC::verifyInstructionPredicates(MI->getOpcode(),
// Subtarget->getFeatureBits());
```

Introduced by: llvm#158254

…66217) Emit an empty line after a namespace scope is opened and before it is closed. Adjust the DirectiveEmitter code's empty-line emission in response, to avoid a lot of unit test changes.

This patch moves llvm::to_address to STLForwardCompat.h, a collection of backports from C++20 and beyond.
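To show what such a backport looks like, here is a minimal sketch of a C++20 `std::to_address` equivalent written in C++17. The `compat` namespace is hypothetical and this is not LLVM's implementation. Unlike the standard function, the sketch ignores `std::pointer_traits<Ptr>::to_address` and relies only on `operator->()`.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>

// Hypothetical C++17 backport of std::to_address (not LLVM's code).
namespace compat {

// Raw pointers are returned unchanged.
template <typename T> constexpr T *to_address(T *P) noexcept {
  static_assert(!std::is_function_v<T>, "to_address of a function pointer");
  return P;
}

// Fancy pointers (smart pointers, many iterators) are unwrapped through
// operator->() until a raw pointer is reached.
template <typename Ptr> constexpr auto to_address(const Ptr &P) {
  return compat::to_address(P.operator->());
}

} // namespace compat
```

For a raw pointer, partial ordering selects the `T *` overload. For a `std::unique_ptr<int>`, the second overload unwraps it to the same `int *` that `get()` returns.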

In C++17, static constexpr members are implicitly inline, so they no longer require an out-of-line definition. Once we remove the redundant declarations, Minidump.cpp becomes effectively empty, so this patch removes the file. Identified with readability-redundant-declaration.
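A minimal illustration of the C++17 rule follows. It uses a hypothetical `Header` struct and an example constant rather than the actual Minidump types.

```cpp
#include <cassert>
#include <cstdint>

// Illustration only (hypothetical type): since C++17 a static constexpr data
// member is implicitly inline, so the in-class declaration is its definition.
struct Header {
  static constexpr uint32_t Magic = 0x504d444d; // example value
};

// Before C++17, odr-using Magic (e.g. binding it to a reference) also needed
// an out-of-line definition in exactly one .cpp file:
//   constexpr uint32_t Header::Magic;
// In C++17 that line is a redundant (and deprecated) redeclaration.

// Binding to a const reference odr-uses the member; in C++17 this links
// without any out-of-line definition.
const uint32_t &magicRef() { return Header::Magic; }
```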

This patch replaces `using Foo = enum { A, B, C };` with the more conventional `enum Foo { A, B, C };`. These two enum declaration styles are not identical, but their difference does not matter in these .cpp files. With the `using Foo` style, the enum is unnamed and cannot be forward-declared, whereas the conventional style creates a named enum that can be. Since these changes are confined to .cpp files, this distinction has no practical impact here.
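The two styles can be compared side by side in a small example. All names here are hypothetical. Note that in C++ a named enum can only be forward-declared through an opaque declaration with a fixed underlying type. No spelling exists that forward-declares the unnamed enum behind an alias.

```cpp
#include <cassert>

// Style replaced by the patch: an alias for an *unnamed* enumeration.
using OldStyle = enum { A, B, C };

// Conventional style: a *named* enumeration (different enumerator names here
// only so that both declarations can coexist in one example).
enum NewStyle { X, Y, Z };

// A named enum with a fixed underlying type can be forward-declared
// (an opaque enum declaration) and used before its enumerators are listed.
enum Forwarded : int;                          // declaration, no enumerators
int scaled(Forwarded F) { return static_cast<int>(F) * 10; }
enum Forwarded : int { First = 1, Second };    // full definition later
```

Both `OldStyle` and `NewStyle` work the same way as types with unscoped enumerators. The only difference is whether the enumeration itself has a name that other declarations can refer to.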

Some follow-ups after llvm#131687 switched to the "runtimes build":

- The `check-sanitizer` build target doesn't exist in the runtimes build; use `check-runtimes` instead.
- ASan is not supported on 32-bit Windows; pass `-DCOMPILER_RT_BUILD_SANITIZERS=OFF`.
- `check-runtimes` includes the orcjit tests, which never passed on Windows; build with `-DCOMPILER_RT_BUILD_ORC=OFF`.
- Various asan and libfuzzer tests fail; suppress them with `LIT_FILTER_OUT`.

Enable the `SPV_INTEL_bfloat16_arithmetic` extension, which allows arithmetic, relational and `OpExtInst` instructions to take `bfloat16` arguments. This patch only adds support to arithmetic and relational ops. The extension itself is rather fresh, but `bfloat16` is ubiquitous at this point and not supporting these ops is limiting.

…ng (llvm#166268) Though we have a few code examples in our documentation that show how to *use* libclang, we never actually show how to *link* against it. I myself mostly figured this out through trial and error some time ago, and I’ve since had to explain it to others on several occasions, so I thought adding some very minimal CMake example code might be helpful.

…r ops (llvm#163414) As [suggested here](llvm#163071 (comment)), the PR adds an optional layout attribute for `LoadGather` and `StoreScatter` ops. For the load op, the attribute describes the layout of the result (e.g. `layout_result_0`); for the store op, it describes the layout of the vector-to-store operand (e.g. `layout_operand_0`). The PR also reworks the `propagate-layout` pass to consider perm layout attributes and back-propagate them accordingly. The helper utility function `getDistributeLayoutAttr` is reworked to return either `layout_operand/result_0` or `layout` for load/store ops (depending on which one is set). After an offline discussion, we decided that the overall layout utilities API is confusing, since it tries to mix permanent and temporary layouts; it will need to change in the future. --------- Signed-off-by: dchigarev <dmitry.chigarev@intel.com>

…vm#166392) This fixes a typo introduced in llvm#165606 which makes the test case fail. --------- Co-authored-by: Joseph Huber <huberjn@outlook.com>

… when navigating up/down (llvm#166394) When stopped in a hidden frame (either because we selected the hidden frame or hit a breakpoint inside it), a user is most likely interested in exploring the immediate frames around it. But currently, issuing `up`/`down` commands unconditionally skips over all hidden frames. This patch makes it so `up`/`down` commands don't skip hidden frames if the frame we started from was a hidden frame.
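The selection rule described above is sketched below. It assumes frames are numbered from 0 (innermost) and that `up` moves toward callers. `nextFrame` is a hypothetical helper that models the behaviour and is not LLDB's implementation. It returns no frame when nothing suitable remains in that direction.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical model of the new `up`/`down` behaviour (not LLDB's code).
// Frame 0 is innermost; `up` moves to higher indices (callers).
// IsHidden[i] says whether frame i is hidden.
std::optional<std::size_t> nextFrame(const std::vector<bool> &IsHidden,
                                     std::size_t Current, bool Up) {
  // Hidden frames are skipped only if we did not start in a hidden frame.
  const bool SkipHidden = !IsHidden[Current];
  std::size_t I = Current;
  while (true) {
    if (Up) {
      if (I + 1 >= IsHidden.size())
        return std::nullopt; // no outer frame left
      ++I;
    } else {
      if (I == 0)
        return std::nullopt; // already at the innermost frame
      --I;
    }
    if (!SkipHidden || !IsHidden[I])
      return I;
  }
}
```

Take frames `{visible, hidden, hidden, visible}`. Going `up` from frame 0 jumps to frame 3. Going `up` from hidden frame 1 stops at the adjacent hidden frame 2.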

This patch deprecates an APInt constructor that has been soft-deprecated via comments since: commit 7a16288 Author: Jeffrey Yasskin <jyasskin@google.com> Date: Mon Jul 18 21:45:40 2011 +0000 This patch updates a small number of remaining uses.

Reverts llvm#115917 and its follow up llvm#116840. Fixes llvm#153782 and introduces regression tests. Reopens llvm#114270.

Directives cannot be nested. A directive sentinel that appears within another directive should be ignored and instead treated as a line comment. Fixes: llvm#165874

This reverts commit df1d786.

z1-cciauto · 2025-11-04T18:17:30Z

PSDB Link: https://compiler-ci.amd.com/job/compiler-psdb-amd-staging/2652

ronlieb · 2025-11-04T18:21:10Z

ticket filed for reverted PR
Revert "[C2y] Support WG14 N3457, the __COUNTER__ macro (
https://ontrack-internal.amd.com/browse/SWDEV-564905

juliannagele and others added 30 commits November 4, 2025 12:39

[X86] bittest-big-integer.ll - add test showing multiple uses of the RMW store chain AND its stored value (llvm#166366)

89c2617

[NFC] add LLVM_ABI to function getMemcmp declaration (llvm#166192)

b258681

According to discussion of llvm#153600 (comment) add LLVM_ABI to function getMemcmp declaration

[NFC][TableGen] Use namespace qualifier to define RecordKeeperImpl (llvm#166220)

5ba746d

[NFC][TableGen] Emit empty lines after/before namespace scope (llvm#166217)

a2495ff

[ADT] Move llvm::to_address to STLForwardCompat.h (NFC) (llvm#166315)

c2269c8

This patch moves llvm::to_address to STLForwardCompat.h, a collection of backports from C++20 and beyond.

[llvm] Proofread MergeFunctions.rst (llvm#166317)

502742b

[NFC] [Build Fix] Fix failing test case due to missing host arch. (llvm#166392)

a50d036

[LLDB] Don't check for libcxx if LLDB_ENFORCE_STRICT_TEST_REQUIREMENTS is off

78769d5

marco-antognini-sonarsource and others added 4 commits November 4, 2025 17:39

[analyzer] Revert incorrect LazyCoumpoundVal changes (llvm#163461)

cc3ad20

Reverts llvm#115917 and its follow up llvm#116840. Fixes llvm#153782 and introduces regression tests. Reopens llvm#114270.

[Flang] Nested directives are comments (llvm#166348)

2dc0fa1

Directives cannot be nested. A directive sentinel that appears within another directive should be ignored, and instead fall back to be treated as a line comment. Fixes: llvm#165874

merge main into amd-staging

8d8e9eb

Revert "[C2y] Support WG14 N3457, the __COUNTER__ macro (llvm#162662)"

0e0ec98

This reverts commit df1d786.

ronlieb requested review from a team and dpalermo November 4, 2025 18:17

dpalermo approved these changes Nov 4, 2025

View reviewed changes

z1-cciauto merged commit 7f40630 into amd-staging Nov 4, 2025
8 checks passed

z1-cciauto deleted the amd/merge/upstream_merge_20251104105719 branch November 4, 2025 21:00

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

merge main into amd-staging #488

merge main into amd-staging #488

Uh oh!

ronlieb commented Nov 4, 2025

Uh oh!

z1-cciauto commented Nov 4, 2025

Uh oh!

ronlieb commented Nov 4, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

26 participants

merge main into amd-staging #488

merge main into amd-staging #488

Uh oh!

Conversation

ronlieb commented Nov 4, 2025

Uh oh!

z1-cciauto commented Nov 4, 2025

Uh oh!

ronlieb commented Nov 4, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

26 participants