Skip to content

Conversation

@ronlieb
Copy link
Collaborator

@ronlieb ronlieb commented Nov 4, 2025

No description provided.

juliannagele and others added 30 commits November 4, 2025 12:39
…t if all extracts would lead to UB on poison. (llvm#164683)

This change aims to avoid inserting a freeze instruction between the
load and bitcast when scalarizing extend-extract. This is particularly
useful in combination with
llvm#164682, which can then
potentially further scalarize, provided there is no freeze.

alive2 proof: https://alive2.llvm.org/ce/z/W-GD88
AMDGCN flavoured SPIR-V has slightly different defaults from what the BE
adopts: it assumes all extensions are enabled, and expects nonsemantic
debug info to be generated. Furthermore, it is necessary to encode in
the resulting SPIR-V binary that what was generated was AMDGCN
flavoured, which we do by setting the Generator Version to `UINT16_MAX`
(which matches what we expect to see at reverse translation). We will
register this generator version at
<https://github.com/KhronosGroup/SPIRV-Headers>. This is a preliminary
patch out of a series of patches that are needed for adopting the BE for
AMDGCN flavoured SPIR-V generation.
…ansfer_write to new offsets syntax (llvm#162095)

Changes the `VectorToXeGPU` pass to generate `xegpu.load_nd/store_nd`
ops using new syntax with where offsets are specified at the load/store
ops level.
```mlir
// from this
%desc = xegpu.create_nd_tdesc %src[%off1, %off2]: memref<8x16xf16> -> !xegpu.tensor_desc<8x16xf16>
%res = xegpu.load_nd %desc : !xegpu.tensor_desc<8x16xf16> -> vector<8x16xf16>

// to this
%desc = xegpu.create_nd_tdesc %src: memref<8x16xf16> -> !xegpu.tensor_desc<8x16xf16>
%res = xegpu.load_nd %desc[%off1, %off2] : !xegpu.tensor_desc<8x16xf16> -> vector<8x16xf16>
```

In order to support cases with dimension reduction at the
`create_nd_tdesc` level (e.g. `memref<8x8x16xf16> ->
tensor_desc<8x16xf16>` it was decided to insert a memref.subview that
collapses the source shape to 2d, for example:

```mlir
// input:
%0 = vector.load %source[%off0, %off1, %off2] : memref<8x16x32xf32>, vector<8x16xf32>

// --vector-to-xegpu (old)
%tdesc = xegpu.create_nd_tdesc %source[%off0, %off1, %off2] : memref<8x16x32xf32> -> tdesc<8x32xf32>
%vec = xegpu.load_nd %tdesc

// --vector-to-xegpu (new)
%collapsed = memref.subview %source[%off0, 0, 0] [1, 16, 32] [1, 1, 1] :
    memref<8x16x32xf32> -> memref<16x32xf32, strided<[32, 1], offset: ?>>
%tdesc = xegpu.create_nd_tdesc %collapsed : memref<16x32xf32, ...> -> tdesc<8x32xf32>
%vec = xegpu.load_nd %tdesc[%off1, %off2]
```

<details><summary>Why we need to change that?</summary>

```mlir
// reduce dim and apply all 3 offsets at load_nd
%desc = xegpu.create_nd_tdesc %source : memref<8x16x32xf32> -> !xegpu.tensor_desc<16x32xf32>
// error: xegpu.load_nd len(offsets) != desc.rank
%res = xegpu.load_nd %desc[%off, %off, %off] : !xegpu.tensor_desc<16x32xf32> -> vector<8x16xf32>
```

</details>

---------

Signed-off-by: dchigarev <dmitry.chigarev@intel.com>
…n of conditions (llvm#165748)

In simplifycfg/cvp/sccp, we eliminate dead edges of switches according
to the knownbits/range info of conditions. However, these approximations
may not meet the real-world needs when the domain of condition values is
sparse. For example, if the condition can only be either -3 or 3, we
cannot prove that the condition never evaluates to 1 (knownbits:
???????1, range: [-3, 4)).
This patch adds a helper function `collectPossibleValues` to enumerate
all the possible values of V. To fix the motivating issue,
`eliminateDeadSwitchCases` will use the result to remove dead edges.

Note: In
https://discourse.llvm.org/t/missed-optimization-due-to-overflow-check/88700
I proposed a new value lattice kind to represent such values. But I find
it hard to apply because the transition becomes much complicated.

Compile-time impact looks neutral:
https://llvm-compile-time-tracker.com/compare.php?from=32d6b2139a6c8f79e074e8c6cfe0cc9e79c4c0c8&to=e47c26e3f1bf9eb062684dda4fafce58438e994b&stat=instructions:u
This patch removes many dead error-handling codes:
dtcxzyw/llvm-opt-benchmark#3012
Closes llvm#165179.
In llvm#165720 we started using a
DWARF API (`llvm::dwarf::getTag`) from `BinaryFormat`. This patch makes
dwarfdump link against the necessary LLVM component.

This fixes following linker error that started occurring on some of the
bots:
```
[7758/8172] Linking CXX executable bin/llvm-dwarfdump
FAILED: bin/llvm-dwarfdump
: && /usr/bin/c++ -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wno-missing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough -Wno-uninitialized -Wno-nonnull -Wno-class-memaccess -Wno-redundant-move -Wno-pessimizing-move -Wno-array-bounds -Wno-stringop-overread -Wno-noexcept-type -Wdelete-non-virtual-dtor -Wsuggest-override -Wno-comment -Wno-misleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG -Wl,-rpath-link,/home/botworker/bbot/amdgpu-offload-ubuntu-22-cmake-build-only/build/./lib  -Wl,--gc-sections tools/llvm-dwarfdump/CMakeFiles/llvm-dwarfdump.dir/SectionSizes.cpp.o tools/llvm-dwarfdump/CMakeFiles/llvm-dwarfdump.dir/Statistics.cpp.o tools/llvm-dwarfdump/CMakeFiles/llvm-dwarfdump.dir/llvm-dwarfdump.cpp.o -o bin/llvm-dwarfdump  -Wl,-rpath,"\$ORIGIN/../lib:/home/botworker/bbot/amdgpu-offload-ubuntu-22-cmake-build-only/build/lib:"  lib/libLLVMAMDGPUDesc.so.22.0git  lib/libLLVMSPIRVDesc.so.22.0git  lib/libLLVMX86Desc.so.22.0git  lib/libLLVMAMDGPUInfo.so.22.0git  lib/libLLVMSPIRVInfo.so.22.0git  lib/libLLVMX86Info.so.22.0git  lib/libLLVMDebugInfoDWARF.so.22.0git  lib/libLLVMObject.so.22.0git  lib/libLLVMMC.so.22.0git  lib/libLLVMDebugInfoDWARFLowLevel.so.22.0git  lib/libLLVMTargetParser.so.22.0git  lib/libLLVMSupport.so.22.0git  -Wl,-rpath-link,/home/botworker/bbot/amdgpu-offload-ubuntu-22-cmake-build-only/build/lib && :
/usr/bin/ld: tools/llvm-dwarfdump/CMakeFiles/llvm-dwarfdump.dir/llvm-dwarfdump.cpp.o: undefined reference to symbol '_ZN4llvm5dwarf6getTagENS_9StringRefE'
/usr/bin/ld: /home/botworker/bbot/amdgpu-offload-ubuntu-22-cmake-build-only/build/./lib/libLLVMBinaryFormat.so.22.0git: error adding symbols: DSO missing from command line
collect2: error: ld returned 1 exit status

```
-MG is supposed to suppress "file not found" diagnostics and instead
treat those as generated files for purposes of dependency scanning.
Clang was previously emitting the diagnostic instead of emitting the
name of the embedded file.

Fixes llvm#165632
…ent (llvm#163816)

This was left out of the original patch (llvm#142392) to simplify the
initial implementation. However, after refactoring the SVE
prologue/epilogue code in llvm#162253, it's not much of an extension to
support this case.

The main change here is when restoring the SP from the FP for the SVE
restores, we may need an additional frame offset to move from the start
of the ZPR callee-saves to the start of the PPR callee-saves.

This patch also fixes a previously latent bug where we'd add the
`RealignmentPadding` when allocating the PPR locals, then again for the
ZPR locals. This was unnecessary as the stack only needs to be realigned
after all SVE allocations.
…166263)

Add a customizable `visitBlockTransfer` method to dense forward and
backward dataflow analyses, allowing subclasses to customize lattice
propagation behavior along control flow edges between blocks. Default
implementation preserves existing join/meet semantics.

This change mirrors the exiting structure of both dense dataflow
classes, where `RegionBranchOpInterface` and callables are allowed to be
customized by subclasses.

The use case motivating this change is dense liveness analysis.
Currently, without the customization hook the block transfer function
produces incorrect results. The issue is the current logic doesn't
remove the successor block arguments from the live set, as it only meets
the successor state with the predecessor state (ie. set union).
With this change is now possible to compute the correct result by
specifying the correct logic in `visitBlockTransfer`.

Signed-off-by: Fabian Mora <fabian.mora-cordero@amd.com>
The verifier of `xegpu.{load/store/prefetch}_nd` op fails if `offset` a
mix of static and dynamic values, e.g. `offset = [0, %c0]`. In this case
the length of dynamic offsets is 1 and the check `offsetSize !=
tDescRank` (=2) fails. Instead, we should check the length of
`getMixedOffsets()`.
…m#165791)

Update MMA tests to add run line for `cpu=future` to ensure MMA
functionality is not broken with the new `wacc` register classes
introduced. Previous commit have added def for using the new `wacc`
registers, this just add in testing and fixes a few patterns that was
missing .
…kernel_attributes` (llvm#165891)

This adds BE support for the
[`SPV_INTEL_kernel_attributes`](https://github.khronos.org/SPIRV-Registry/extensions/INTEL/SPV_INTEL_kernel_attributes.html)
extension. The extension is necessary to encode the rather useful
`max_work_group_size` kernel attribute, via `OpExecutionMode
MaxWorkgroupSizeINTEL`, which is the only Execution Mode added by the
extension that this patch adds full processing for. Future patches will
add the other Execution Modes and Capabilities. The test is adapted from
the equivalent Translator test; it depends on llvm#165815.
…lvm#165606)

`clang -x hip foo.c --offload-arch=amdgcnspirv --offload-new-driver
-save-temps` was crashing with the following error:

```
/usr/bin/ld: input file 'foo-x86_64-unknown-linux-gnu.o' is the same as output file
build/bin/clang-linker-wrapper: error: 'ld' failed
```

The `LinkerWrapperJobAction` [is
created](https://github.com/llvm/llvm-project/blob/957598f71bd8baa029d886e59ed9aed60e6e9bb9/clang/lib/Driver/Driver.cpp#L4888)
with `types::TY_Object` which makes `Driver::GetNamedOutputPath` assign
the same name as the assembler's output and thus causing the crash.
Replace the current `ADRRelaxationPass` with `AArch64RelaxationPass`,
which, besides the existing ADR relaxation, will also run LDR relaxation
that for now only handles these two forms of LDR instructions:
`ldr Xt, [label]` and `ldr Wt, [label]`.
According to discussion of
llvm#153600 (comment)

add LLVM_ABI to function getMemcmp declaration
NDD ADD is only supported on 64 bit, but `LEA32` has
`Requires<[Not64BitMode]>`. The reason it doesnt fail upstream is that
the predicates check is commented out on `X86MCInstLower.cpp`:

```
  // FIXME: Enable feature predicate checks once all the test pass.
  // X86_MC::verifyInstructionPredicates(MI->getOpcode(),
  //                                     Subtarget->getFeatureBits());

```

Introduced by: llvm#158254
…66217)

Emit empty line after a namespace scope is opened and before its closed.
Adjust DirectiveEmitter code empty line emission in response to this to
avoid lot of unit test changes.
This patch moves llvm::to_address to STLForwardCompat.h, a collection
of backports from C++20 and beyond.
In C++17, static constexpr members are implicitly inline, so they no
longer require an out-of-line definition.

Once we remove the redundant declarations, Minidump.cpp becomes
effectively empty, so this patch removes the file.

Identified with readability-redundant-declaration.
This patch replaces:

  using Foo = enum { A, B, C };

with the more conventional:

  enum Foo { A, B, C };

These two enum declaration styles are not identical, but their
difference does not matter in these .cpp files.  With the "using Foo"
style, the enum is unnamed and cannot be forward-declared, whereas the
conventional style creates a named enum that can be.  Since these
changes are confined to .cpp files, this distinction has no practical
impact here.
Some followups after llvm#131687 switched to the "runtimes build".

- The `check-sanitizer` build target doesn't exist in the runtimes build; use `check-runtimes` instead.
- ASan is not supported on 32-bit windows. Pass `-DCOMPILER_RT_BUILD_SANITIZERS=OFF`
- `check-runtimes` includes the orcjit tests, which never passed on windows; build with `-DCOMPILER_RT_BUILD_ORC=OFF`
- Various asan and libfuzzer tests fail; suppress them with `LIT_FILTER_OUT`
Enable the `SPV_INTEL_bfloat16_arithmetic` extension, which allows arithmetic, relational and `OpExtInst` instructions to take `bfloat16` arguments. This patch only adds support to arithmetic and relational ops. The extension itself is rather fresh, but `bfloat16` is ubiquitous at this point and not supporting these ops is limiting.
…ng (llvm#166268)

Though we have a few code examples in our documentation that show how to
*use* libclang, we never actually show how to *link* against it. I
myself mostly figured this out through trial and error some time ago,
and I’ve since had to explain it to others on several occasions, so I
thought adding some very minimal CMake example code might be helpful.
…r ops (llvm#163414)

As [suggested
here](llvm#163071 (comment))
the PR adds an optional layout attribute for `LoadGather` and
`StoreScatter` ops.

For the load-op the attribute describes the layout of the result (ex
`layout_result_0`), and for store-op it describes the layout for the
vector-to-store operand (ex `layout_operand_0`).

The PR also reworks `propagate-layout` pass to consider perm layout
attributes and back-propagate them accordingly.

The helper utility function `getDistributeLayoutAttr` is reworked to
return either `layout_operand/result_0` or `layout` for load/store ops
(denepding on which one is set). After an offline discussion decided
that the overall utilities layouts API is confusing since it tries to
mix permament and temporary layouts. Would need to change it in the
future.

---------

Signed-off-by: dchigarev <dmitry.chigarev@intel.com>
…vm#166392)

This fixes a typo introduced in llvm#165606 which makes the test case fail.

---------

Co-authored-by: Joseph Huber <huberjn@outlook.com>
… when navigating up/down (llvm#166394)

When stopped in a hidden frame (either because we selected the hidden
frame or hit a breakpoint inside it), a user most likely is intersted in
exploring the immediate frames around it. But currently issuing
`up`/`down` commands will unconditionally skip over all hidden frames.

This patch makes it so `up`/`down` commands don't skip hidden frames if
the frame we started it was a hidden frame.
This patch deprecates an APInt constructor that has been
soft-deprecated via comments since:

  commit 7a16288
  Author: Jeffrey Yasskin <jyasskin@google.com>
  Date:   Mon Jul 18 21:45:40 2011 +0000

This patch updates a small number of remaining uses.
Reverts llvm#115917 and its follow up llvm#116840.
Fixes llvm#153782 and introduces regression tests.
Reopens llvm#114270.
Directives cannot be nested. A directive sentinel that appears within
another directive should be ignored, and instead fall back to be treated
as a line comment.

Fixes: llvm#165874
@ronlieb ronlieb requested review from a team and dpalermo November 4, 2025 18:17
@z1-cciauto
Copy link
Collaborator

@ronlieb
Copy link
Collaborator Author

ronlieb commented Nov 4, 2025

ticket filed for reverted PR
Revert "[C2y] Support WG14 N3457, the COUNTER macro (
https://ontrack-internal.amd.com/browse/SWDEV-564905

@z1-cciauto z1-cciauto merged commit 7f40630 into amd-staging Nov 4, 2025
8 checks passed
@z1-cciauto z1-cciauto deleted the amd/merge/upstream_merge_20251104105719 branch November 4, 2025 21:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.