forked from llvm/llvm-project
merge main into amd-staging #488
Merged
z1-cciauto
merged 34 commits into
amd-staging
from
amd/merge/upstream_merge_20251104105719
Nov 4, 2025
…t if all extracts would lead to UB on poison. (llvm#164683) This change aims to avoid inserting a freeze instruction between the load and bitcast when scalarizing extend-extract. This is particularly useful in combination with llvm#164682, which can then potentially further scalarize, provided there is no freeze. alive2 proof: https://alive2.llvm.org/ce/z/W-GD88
AMDGCN flavoured SPIR-V has slightly different defaults from what the BE adopts: it assumes all extensions are enabled, and expects nonsemantic debug info to be generated. Furthermore, it is necessary to encode in the resulting SPIR-V binary that what was generated was AMDGCN flavoured, which we do by setting the Generator Version to `UINT16_MAX` (which matches what we expect to see at reverse translation). We will register this generator version at <https://github.com/KhronosGroup/SPIRV-Headers>. This is a preliminary patch out of a series of patches that are needed for adopting the BE for AMDGCN flavoured SPIR-V generation.
…ansfer_write to new offsets syntax (llvm#162095)

Changes the `VectorToXeGPU` pass to generate `xegpu.load_nd/store_nd` ops using the new syntax, where offsets are specified at the load/store op level.

```mlir
// from this
%desc = xegpu.create_nd_tdesc %src[%off1, %off2]: memref<8x16xf16> -> !xegpu.tensor_desc<8x16xf16>
%res = xegpu.load_nd %desc : !xegpu.tensor_desc<8x16xf16> -> vector<8x16xf16>

// to this
%desc = xegpu.create_nd_tdesc %src: memref<8x16xf16> -> !xegpu.tensor_desc<8x16xf16>
%res = xegpu.load_nd %desc[%off1, %off2] : !xegpu.tensor_desc<8x16xf16> -> vector<8x16xf16>
```

In order to support cases with dimension reduction at the `create_nd_tdesc` level (e.g. `memref<8x8x16xf16> -> tensor_desc<8x16xf16>`), it was decided to insert a `memref.subview` that collapses the source shape to 2D, for example:

```mlir
// input:
%0 = vector.load %source[%off0, %off1, %off2] : memref<8x16x32xf32>, vector<8x16xf32>

// --vector-to-xegpu (old)
%tdesc = xegpu.create_nd_tdesc %source[%off0, %off1, %off2] : memref<8x16x32xf32> -> tdesc<8x32xf32>
%vec = xegpu.load_nd %tdesc

// --vector-to-xegpu (new)
%collapsed = memref.subview %source[%off0, 0, 0] [1, 16, 32] [1, 1, 1] : memref<8x16x32xf32> -> memref<16x32xf32, strided<[32, 1], offset: ?>>
%tdesc = xegpu.create_nd_tdesc %collapsed : memref<16x32xf32, ...> -> tdesc<8x32xf32>
%vec = xegpu.load_nd %tdesc[%off1, %off2]
```

<details><summary>Why we need to change that?</summary>

```mlir
// reduce dim and apply all 3 offsets at load_nd
%desc = xegpu.create_nd_tdesc %source : memref<8x16x32xf32> -> !xegpu.tensor_desc<16x32xf32>
// error: xegpu.load_nd len(offsets) != desc.rank
%res = xegpu.load_nd %desc[%off, %off, %off] : !xegpu.tensor_desc<16x32xf32> -> vector<8x16xf32>
```

</details>

---------

Signed-off-by: dchigarev <dmitry.chigarev@intel.com>
…n of conditions (llvm#165748)

In simplifycfg/cvp/sccp, we eliminate dead edges of switches according to the knownbits/range info of conditions. However, these approximations may not meet real-world needs when the domain of condition values is sparse. For example, if the condition can only be either -3 or 3, we cannot prove that the condition never evaluates to 1 (knownbits: ???????1, range: [-3, 4)).

This patch adds a helper function `collectPossibleValues` to enumerate all the possible values of V. To fix the motivating issue, `eliminateDeadSwitchCases` will use the result to remove dead edges.

Note: In https://discourse.llvm.org/t/missed-optimization-due-to-overflow-check/88700 I proposed a new value lattice kind to represent such values. But I find it hard to apply because the transition becomes much more complicated.

Compile-time impact looks neutral: https://llvm-compile-time-tracker.com/compare.php?from=32d6b2139a6c8f79e074e8c6cfe0cc9e79c4c0c8&to=e47c26e3f1bf9eb062684dda4fafce58438e994b&stat=instructions:u

This patch removes a lot of dead error-handling code: dtcxzyw/llvm-opt-benchmark#3012

Closes llvm#165179.
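The -3/3 example above can be checked with a small standalone sketch. It does not use LLVM's `KnownBits` or `ConstantRange` classes, and it is not the patch's `collectPossibleValues`; every helper name here is hypothetical, and 8-bit values are assumed to match the `???????1` pattern quoted in the commit message.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical helpers for illustration only; none of these are LLVM APIs.

// Known-bits approximation: only bits that agree across every candidate are
// "known". A value is rejected only if it contradicts one of those bits.
bool knownBitsAllow(const std::vector<int8_t> &candidates, int8_t v) {
  uint8_t knownOne = 0xFF, knownZero = 0xFF;
  for (int8_t c : candidates) {
    knownOne &= static_cast<uint8_t>(c);
    knownZero &= static_cast<uint8_t>(~c);
  }
  uint8_t bits = static_cast<uint8_t>(v);
  return (bits & knownZero) == 0 && (~bits & knownOne) == 0;
}

// Range approximation: the smallest signed interval covering all candidates.
// Assumes `candidates` is non-empty.
bool rangeAllows(const std::vector<int8_t> &candidates, int8_t v) {
  auto [lo, hi] = std::minmax_element(candidates.begin(), candidates.end());
  return *lo <= v && v <= *hi;
}

// Enumeration: keep only the case values the condition can actually take.
std::vector<int8_t> liveCases(const std::vector<int8_t> &caseValues,
                              const std::vector<int8_t> &candidates) {
  std::vector<int8_t> live;
  for (int8_t c : caseValues)
    if (std::find(candidates.begin(), candidates.end(), c) != candidates.end())
      live.push_back(c);
  return live;
}
```

With candidates `{-3, 3}`, both approximations still admit the value 1, while the enumeration drops a `case 1:` edge from a switch over `{-3, 1, 3}`.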
In llvm#165720 we started using a DWARF API (`llvm::dwarf::getTag`) from `BinaryFormat`. This patch makes dwarfdump link against the necessary LLVM component.

This fixes the following linker error that started occurring on some of the bots:

```
[7758/8172] Linking CXX executable bin/llvm-dwarfdump
FAILED: bin/llvm-dwarfdump
: && /usr/bin/c++ -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wno-missing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough -Wno-uninitialized -Wno-nonnull -Wno-class-memaccess -Wno-redundant-move -Wno-pessimizing-move -Wno-array-bounds -Wno-stringop-overread -Wno-noexcept-type -Wdelete-non-virtual-dtor -Wsuggest-override -Wno-comment -Wno-misleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG -Wl,-rpath-link,/home/botworker/bbot/amdgpu-offload-ubuntu-22-cmake-build-only/build/./lib -Wl,--gc-sections tools/llvm-dwarfdump/CMakeFiles/llvm-dwarfdump.dir/SectionSizes.cpp.o tools/llvm-dwarfdump/CMakeFiles/llvm-dwarfdump.dir/Statistics.cpp.o tools/llvm-dwarfdump/CMakeFiles/llvm-dwarfdump.dir/llvm-dwarfdump.cpp.o -o bin/llvm-dwarfdump -Wl,-rpath,"\$ORIGIN/../lib:/home/botworker/bbot/amdgpu-offload-ubuntu-22-cmake-build-only/build/lib:" lib/libLLVMAMDGPUDesc.so.22.0git lib/libLLVMSPIRVDesc.so.22.0git lib/libLLVMX86Desc.so.22.0git lib/libLLVMAMDGPUInfo.so.22.0git lib/libLLVMSPIRVInfo.so.22.0git lib/libLLVMX86Info.so.22.0git lib/libLLVMDebugInfoDWARF.so.22.0git lib/libLLVMObject.so.22.0git lib/libLLVMMC.so.22.0git lib/libLLVMDebugInfoDWARFLowLevel.so.22.0git lib/libLLVMTargetParser.so.22.0git lib/libLLVMSupport.so.22.0git -Wl,-rpath-link,/home/botworker/bbot/amdgpu-offload-ubuntu-22-cmake-build-only/build/lib && :
/usr/bin/ld: tools/llvm-dwarfdump/CMakeFiles/llvm-dwarfdump.dir/llvm-dwarfdump.cpp.o: undefined reference to symbol '_ZN4llvm5dwarf6getTagENS_9StringRefE'
/usr/bin/ld: /home/botworker/bbot/amdgpu-offload-ubuntu-22-cmake-build-only/build/./lib/libLLVMBinaryFormat.so.22.0git: error adding symbols: DSO missing from command line
collect2: error: ld returned 1 exit status
```
-MG is supposed to suppress "file not found" diagnostics and instead treat those as generated files for purposes of dependency scanning. Clang was previously emitting the diagnostic instead of emitting the name of the embedded file. Fixes llvm#165632
…RMW store chain AND its stored value (llvm#166366)
…ent (llvm#163816) This was left out of the original patch (llvm#142392) to simplify the initial implementation. However, after refactoring the SVE prologue/epilogue code in llvm#162253, it's not much of an extension to support this case. The main change here is when restoring the SP from the FP for the SVE restores, we may need an additional frame offset to move from the start of the ZPR callee-saves to the start of the PPR callee-saves. This patch also fixes a previously latent bug where we'd add the `RealignmentPadding` when allocating the PPR locals, then again for the ZPR locals. This was unnecessary as the stack only needs to be realigned after all SVE allocations.
…166263) Add a customizable `visitBlockTransfer` method to dense forward and backward dataflow analyses, allowing subclasses to customize lattice propagation behavior along control flow edges between blocks. The default implementation preserves the existing join/meet semantics.

This change mirrors the existing structure of both dense dataflow classes, where the handling of `RegionBranchOpInterface` and callables can be customized by subclasses.

The use case motivating this change is dense liveness analysis. Currently, without the customization hook, the block transfer function produces incorrect results. The issue is that the current logic doesn't remove the successor block arguments from the live set, as it only meets the successor state with the predecessor state (i.e. set union). With this change, it is now possible to compute the correct result by specifying the correct logic in `visitBlockTransfer`.

Signed-off-by: Fabian Mora <fabian.mora-cordero@amd.com>
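The liveness problem described above can be illustrated without any MLIR classes. The following is a minimal sketch, assuming live sets are plain sets of value names; `defaultBlockTransfer` and `livenessBlockTransfer` are hypothetical names and do not reflect the signature of the real `visitBlockTransfer` hook.

```cpp
#include <set>
#include <string>

using LiveSet = std::set<std::string>;

// Default edge transfer as described in the commit message: meet (set union)
// the successor's live-in state into the predecessor's live-out state.
LiveSet defaultBlockTransfer(LiveSet predLiveOut, const LiveSet &succLiveIn) {
  predLiveOut.insert(succLiveIn.begin(), succLiveIn.end());
  return predLiveOut;
}

// Customized edge transfer for liveness: the successor's block arguments are
// defined on entry to the successor, so they must not leak into the
// predecessor's live set.
LiveSet livenessBlockTransfer(LiveSet predLiveOut, const LiveSet &succLiveIn,
                              const LiveSet &succBlockArgs) {
  for (const std::string &value : succLiveIn)
    if (succBlockArgs.count(value) == 0)
      predLiveOut.insert(value);
  return predLiveOut;
}
```

If the successor's live-in set is `{%arg0, %b}` and `%arg0` is one of its block arguments, the default transfer reports `%arg0` as live in the predecessor, whereas the customized transfer only propagates `%b`.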
The verifier of `xegpu.{load/store/prefetch}_nd` ops fails if `offset` is a
mix of static and dynamic values, e.g. `offset = [0, %c0]`. In this case
the length of dynamic offsets is 1 and the check `offsetSize !=
tDescRank` (=2) fails. Instead, we should check the length of
`getMixedOffsets()`.
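To see why counting only the dynamic offsets is wrong, here is a rough standalone model of the check. It assumes offsets are stored as a per-dimension static list with a sentinel marking dynamic entries, plus a separate list of dynamic values; `kDynamic`, `Offset`, and `mixedOffsets` are hypothetical names and not the MLIR API.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <string>
#include <variant>
#include <vector>

// Hypothetical sentinel meaning "this dimension's offset is a dynamic value".
constexpr int64_t kDynamic = std::numeric_limits<int64_t>::min();

// An offset is either a compile-time constant or the name of an SSA value.
using Offset = std::variant<int64_t, std::string>;

// Merge the static and dynamic lists into one entry per dimension.
std::vector<Offset> mixedOffsets(const std::vector<int64_t> &staticOffsets,
                                 const std::vector<std::string> &dynamicOffsets) {
  std::vector<Offset> mixed;
  std::size_t nextDynamic = 0;
  for (int64_t s : staticOffsets) {
    if (s == kDynamic)
      mixed.emplace_back(dynamicOffsets.at(nextDynamic++));
    else
      mixed.emplace_back(s);
  }
  return mixed;
}

// Old check: compares only the number of dynamic offsets with the rank.
bool oldCheckPasses(const std::vector<std::string> &dynamicOffsets,
                    std::size_t rank) {
  return dynamicOffsets.size() == rank;
}

// New check: compares the number of mixed offsets with the rank.
bool newCheckPasses(const std::vector<Offset> &mixed, std::size_t rank) {
  return mixed.size() == rank;
}
```

For `offset = [0, %c0]` on a rank-2 descriptor, the dynamic list has one entry and the mixed list has two, so only the second check accepts the op.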
…m#165791) Update MMA tests to add a run line for `cpu=future` to ensure MMA functionality is not broken by the newly introduced `wacc` register classes. Previous commits added the definitions for using the new `wacc` registers; this one just adds testing and fixes a few patterns that were missing.
…kernel_attributes` (llvm#165891) This adds BE support for the [`SPV_INTEL_kernel_attributes`](https://github.khronos.org/SPIRV-Registry/extensions/INTEL/SPV_INTEL_kernel_attributes.html) extension. The extension is necessary to encode the rather useful `max_work_group_size` kernel attribute, via `OpExecutionMode MaxWorkgroupSizeINTEL`, which is the only Execution Mode added by the extension that this patch adds full processing for. Future patches will add the other Execution Modes and Capabilities. The test is adapted from the equivalent Translator test; it depends on llvm#165815.
…lvm#165606) `clang -x hip foo.c --offload-arch=amdgcnspirv --offload-new-driver -save-temps` was crashing with the following error:

```
/usr/bin/ld: input file 'foo-x86_64-unknown-linux-gnu.o' is the same as output file
build/bin/clang-linker-wrapper: error: 'ld' failed
```

The `LinkerWrapperJobAction` [is created](https://github.com/llvm/llvm-project/blob/957598f71bd8baa029d886e59ed9aed60e6e9bb9/clang/lib/Driver/Driver.cpp#L4888) with `types::TY_Object`, which makes `Driver::GetNamedOutputPath` assign the same name as the assembler's output, thus causing the crash.
Replace the current `ADRRelaxationPass` with `AArch64RelaxationPass`, which, besides the existing ADR relaxation, will also run LDR relaxation that for now only handles these two forms of LDR instructions: `ldr Xt, [label]` and `ldr Wt, [label]`.
Following the discussion in llvm#153600 (comment), add `LLVM_ABI` to the declaration of the function `getMemcmp`.
NDD ADD is only supported on 64 bit, but `LEA32` has `Requires<[Not64BitMode]>`.

The reason it doesn't fail upstream is that the predicates check is commented out in `X86MCInstLower.cpp`:

```
// FIXME: Enable feature predicate checks once all the test pass.
// X86_MC::verifyInstructionPredicates(MI->getOpcode(),
//                                     Subtarget->getFeatureBits());
```

Introduced by: llvm#158254
…66217) Emit an empty line after a namespace scope is opened and before it's closed. Adjust the empty line emission in the DirectiveEmitter code in response to this, to avoid a lot of unit test changes.
This patch moves llvm::to_address to STLForwardCompat.h, a collection of backports from C++20 and beyond.
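For readers unfamiliar with the C++20 facility being backported, the sketch below shows the general shape of a `to_address` helper. It is a simplified illustration in a hypothetical `sketch` namespace, not the code in `STLForwardCompat.h`, and it omits the `std::pointer_traits` customization point that `std::to_address` also consults.

```cpp
#include <memory>
#include <type_traits>

namespace sketch {

// Raw pointers are returned unchanged.
template <typename T> constexpr T *to_address(T *p) noexcept {
  static_assert(!std::is_function_v<T>, "T must not be a function type");
  return p;
}

// Pointer-like objects (smart pointers, some iterators) are unwrapped through
// operator->() without ever dereferencing them.
template <typename Ptr> constexpr auto to_address(const Ptr &p) noexcept {
  return p.operator->();
}

} // namespace sketch
```

Because the pointer is never dereferenced, this style of helper is typically used where forming a reference would be invalid, for example on a one-past-the-end position.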
In C++17, static constexpr members are implicitly inline, so they no longer require an out-of-line definition. Once we remove the redundant declarations, Minidump.cpp becomes effectively empty, so this patch removes the file. Identified with readability-redundant-declaration.
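A short sketch of the language rule, using a hypothetical `ExampleHeader` struct rather than the actual Minidump types:

```cpp
// Hypothetical type for illustration; not taken from the Minidump headers.
struct ExampleHeader {
  // In C++17 this in-class declaration is also a definition, because static
  // constexpr data members are implicitly inline.
  static constexpr unsigned Signature = 0x1234;
};

// Before C++17, odr-using the member (as below) also required this
// out-of-line definition in exactly one .cpp file:
//   constexpr unsigned ExampleHeader::Signature;
// In C++17 that line is a redundant redeclaration.

// Binding a reference odr-uses the member, so this links only if a
// definition exists.
const unsigned &signatureRef() { return ExampleHeader::Signature; }
```

This assumes the code is compiled as C++17 or later; under C++14 the same code may fail to link without the out-of-line definition.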
This patch replaces:
using Foo = enum { A, B, C };
with the more conventional:
enum Foo { A, B, C };
These two enum declaration styles are not identical, but their
difference does not matter in these .cpp files. With the "using Foo"
style, the enum is unnamed and cannot be forward-declared, whereas the
conventional style creates a named enum that can be. Since these
changes are confined to .cpp files, this distinction has no practical
impact here.
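The difference between the two styles can be sketched as follows. The enum and function names are hypothetical. Note that in standard C++ a forward declaration of an unscoped enum additionally needs a fixed underlying type, which this sketch adds for the named case.

```cpp
// Named enum: it can be declared before its enumerators are known, provided
// the underlying type is fixed.
enum Color : int;
int toInt(Color c);

enum Color : int { Red, Green, Blue };
int toInt(Color c) { return static_cast<int>(c); }

// "using" style: the enum itself is unnamed and Shape is only an alias for
// it, so there is no enum name that an earlier declaration could refer to.
using Shape = enum { Circle, Square };

int shapeToInt(Shape s) { return static_cast<int>(s); }
```

Both styles behave the same once the full definition is visible, which is why the distinction does not matter inside a single .cpp file.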
Some followups after llvm#131687 switched to the "runtimes build".

- The `check-sanitizer` build target doesn't exist in the runtimes build; use `check-runtimes` instead.
- ASan is not supported on 32-bit Windows. Pass `-DCOMPILER_RT_BUILD_SANITIZERS=OFF`
- `check-runtimes` includes the orcjit tests, which never passed on Windows; build with `-DCOMPILER_RT_BUILD_ORC=OFF`
- Various asan and libfuzzer tests fail; suppress them with `LIT_FILTER_OUT`
Enable the `SPV_INTEL_bfloat16_arithmetic` extension, which allows arithmetic, relational and `OpExtInst` instructions to take `bfloat16` arguments. This patch only adds support to arithmetic and relational ops. The extension itself is rather fresh, but `bfloat16` is ubiquitous at this point and not supporting these ops is limiting.
…ng (llvm#166268) Though we have a few code examples in our documentation that show how to *use* libclang, we never actually show how to *link* against it. I myself mostly figured this out through trial and error some time ago, and I’ve since had to explain it to others on several occasions, so I thought adding some very minimal CMake example code might be helpful.
…r ops (llvm#163414) As [suggested here](llvm#163071 (comment)), the PR adds an optional layout attribute for `LoadGather` and `StoreScatter` ops. For the load op, the attribute describes the layout of the result (e.g. `layout_result_0`), and for the store op it describes the layout of the vector-to-store operand (e.g. `layout_operand_0`).

The PR also reworks the `propagate-layout` pass to consider permanent layout attributes and back-propagate them accordingly. The helper utility function `getDistributeLayoutAttr` is reworked to return either `layout_operand/result_0` or `layout` for load/store ops (depending on which one is set).

After an offline discussion, it was decided that the overall layout utilities API is confusing, since it tries to mix permanent and temporary layouts. It would need to be changed in the future.

---------

Signed-off-by: dchigarev <dmitry.chigarev@intel.com>
…vm#166392) This fixes a typo introduced in llvm#165606 which makes the test case fail. --------- Co-authored-by: Joseph Huber <huberjn@outlook.com>
… when navigating up/down (llvm#166394) When stopped in a hidden frame (either because we selected the hidden frame or hit a breakpoint inside it), a user is most likely interested in exploring the immediate frames around it. But currently, issuing `up`/`down` commands will unconditionally skip over all hidden frames. This patch makes it so `up`/`down` commands don't skip hidden frames if the frame we started in was a hidden frame.
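The navigation rule can be sketched as a small standalone function. This is only a model of the behavior described above, assuming frames are indexed from 0 (innermost) and moved through one step at a time; `nextFrame` is a hypothetical name and not an LLDB API.

```cpp
#include <vector>

// Returns the index of the frame selected after one `up` (direction = +1) or
// `down` (direction = -1) step from `start`. Hidden frames are skipped only
// when the starting frame itself is not hidden. If no suitable frame exists
// in that direction, the selection stays at `start`.
int nextFrame(const std::vector<bool> &hidden, int start, int direction) {
  const int frameCount = static_cast<int>(hidden.size());
  const bool skipHidden = !hidden[start];
  for (int i = start + direction; i >= 0 && i < frameCount; i += direction)
    if (!skipHidden || !hidden[i])
      return i;
  return start;
}
```

With frames 1 and 2 hidden in a four-frame stack, `up` from frame 0 lands on frame 3, while `up` from the hidden frame 1 lands on the adjacent hidden frame 2.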
This patch deprecates an APInt constructor that has been soft-deprecated via comments since:

    commit 7a16288
    Author: Jeffrey Yasskin <jyasskin@google.com>
    Date: Mon Jul 18 21:45:40 2011 +0000

This patch updates a small number of remaining uses.
Reverts llvm#115917 and its follow up llvm#116840. Fixes llvm#153782 and introduces regression tests. Reopens llvm#114270.
Directives cannot be nested. A directive sentinel that appears within another directive should be ignored and instead fall back to being treated as a line comment. Fixes: llvm#165874
This reverts commit df1d786.
ticket filed for reverted PR
dpalermo
approved these changes
Nov 4, 2025
No description provided.