merge main into amd-staging #665

z1-cciauto · 2025-11-24T12:06:26Z

No description provided.

Fix warning: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:1455:23: warning: variable 'Store' set but not used [-Wunused-but-set-variable]

…duct (llvm#168074) A `transform` pass to lower `vector.contract` to (a) `vector.fma` for `F32`, (b) `x86vector.avx512.dot` for `BF16`, (c) `x86vector.avx.dot.i8` for `Int8` packed types. The lowering applies only when the `m`, `batch`, and `k` dims are `one` and the `vnni` dim is `2` for `bf16` or `4` for `int8`. **The lowering pattern**: `batch_reduce.matmul` (input) -> register-tiling(M, N) -> Vectorization (to `vector.contract`) -> `unroll` vector.contract (`unit` dims) -> `hoisting` transformation (move `C` loads/store outside batch/k loop) -> apply `licm`, `canonicalization`, and `bufferize`.

Adds required dependency for `inferContractionDims`. Fixes llvm#168074

Identified with llvm-use-ranges.

Identified with modernize-loop-convert.

…lvm#169245) This patch simplifies iterator_range construction with the conversion constructor.

…es (llvm#169030) We used to create a scope for the true- and false expression of a conditional operator. This was done so e.g. in this example:

```c++
struct A {
  constexpr A(){};
  ~A();
  constexpr int get() { return 10; }
}; // all-note 2{{declared here}}
static_assert( (false ? A().get() : 1) == 1);
```

we did _not_ evaluate the true branch at all, meaning we did not register the local variable for the temporary of type `A`, which means we also didn't call its destructor.

However, this breaks the case where the temporary needs to outlive the conditional operator and instead be destroyed via the surrounding `ExprWithCleanups`:

```
constexpr bool test2(bool b) {
  unsigned long __ms = b ? (const unsigned long &)0 : __ms;
  return true;
}
static_assert(test2(true));
```

Before this patch, we diagnosed this example:

```console
./array.cpp:180:15: error: static assertion expression is not an integral constant expression
  180 | static_assert(test2(true));
      |               ^~~~~~~~~~~
./array.cpp:177:24: note: read of temporary whose lifetime has ended
  177 |   unsigned long __ms = b ? (const unsigned long &)0 : __ms;
      |                        ^
./array.cpp:180:15: note: in call to 'test2(true)'
  180 | static_assert(test2(true));
      |               ^~~~~~~~~~~
./array.cpp:177:51: note: temporary created here
  177 |   unsigned long __ms = b ? (const unsigned long &)0 : __ms;
      |                                                   ^
1 error generated.
```

because the temporary created for the true branch got immediately destroyed.

The problem in essence is that since the conditional operator doesn't create a scope at all, we register the local variables for both its branches, but we later only execute one of them, which means we should also only destroy the locals of one of the branches.

We fix this similarly to clang codegen's `is_active` flag: in the case of a conditional operator (which is so far the only case where this is problematic, and this also helps minimize the performance impact of this change), we mark local variables as disabled by default and then emit an `EnableLocal` opcode later, which marks them as enabled. The code calling their destructors checks whether the local was enabled at all.
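The disabled-by-default idea can be illustrated with a small standalone model. This is only a sketch under stated assumptions: `Local`, `Frame`, `registerLocal`, `enableLocal`, `destroyLocals`, and `evalConditional` are hypothetical names for illustration and are not the clang bytecode interpreter's types or opcodes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an interpreter local (not a clang type).
struct Local {
  std::string Name;
  bool Enabled = false; // disabled by default
};

struct Frame {
  std::vector<Local> Locals;
  std::vector<std::string> Destroyed; // names of locals whose "destructor" ran

  // Register a local up front, before knowing which branch executes.
  std::size_t registerLocal(std::string Name) {
    Locals.push_back(Local{std::move(Name), false});
    return Locals.size() - 1;
  }

  // Plays the role of an "enable" step emitted inside the branch that runs.
  void enableLocal(std::size_t Idx) {
    assert(Idx < Locals.size() && "unknown local");
    Locals[Idx].Enabled = true;
  }

  // Destroy in reverse order, skipping locals that were never enabled.
  void destroyLocals() {
    for (auto It = Locals.rbegin(); It != Locals.rend(); ++It)
      if (It->Enabled)
        Destroyed.push_back(It->Name);
  }
};

// Models `Cond ? <temporary> : <temporary>`: both temporaries are registered,
// but only the taken branch enables (and later destroys) its temporary.
std::vector<std::string> evalConditional(bool Cond) {
  Frame F;
  std::size_t TrueTmp = F.registerLocal("true-branch temporary");
  std::size_t FalseTmp = F.registerLocal("false-branch temporary");
  F.enableLocal(Cond ? TrueTmp : FalseTmp);
  F.destroyLocals();
  return F.Destroyed;
}
```

In this model, `evalConditional(true)` destroys only the true-branch temporary, which mirrors the requirement that only the locals of the executed branch are destroyed.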

…69281) WaitingOnGraphTests.Emit_SingleContainerSimpleCycle tests a pair of emit operations where the second completes a simple cycle (1: A -> B, 2: B -> A). We already had a test of WaitingOnGraph::simplify's behavior in this case, but did not have one for WaitingOnGraph::emit.

…lvm#169279) Fixes llvm#149960

…n auto type with typename (llvm#162514) When importing a template specialization with an auto return type, ASTImporter runs into a cycle if the return type is not a nested type but a typename built from template arguments and another template. Existing code already prevents such cycles for auto return types when a nested type is declared. The case solved here differs from the nested-type case but uses the same solution via UsedDifferentProtoType: determining the return type is delayed.

Supported Ops: `fadd`, `fsub`

To make it clear that the return value is immutable.

This method returns the current expression being emitted, but is only used for testing whether an expression is being emitted or not. This patch therefore replaces it with a boolean isEmittingExpression() method.

Building with GCC I got:

```
<...>/OnDiskGraphDB.cpp:624:18: warning: ‘static {anonymous}::DataRecordHandle {anonymous}::DataRecordHandle::construct(char*, const {anonymous}::DataRecordHandle::Input&)’ defined but not used [-Wunused-function]
  624 | DataRecordHandle DataRecordHandle::construct(char *Mem, const Input &I) {
      |                  ^~~~~~~~~~~~~~~~
<...>/OnDiskGraphDB.cpp:456:1: warning: ‘static {anonymous}::DataRecordHandle {anonymous}::DataRecordHandle::create(llvm::function_ref<char*(long unsigned int)>, const {anonymous}::DataRecordHandle::Input&)’ defined but not used [-Wunused-function]
  456 | DataRecordHandle::create(function_ref<char *(size_t Size)> Alloc,
      | ^~~~~~~~~~~~~~~~
```

These implement parts of a class that is defined in an anonymous namespace. All llvm tests passed with them removed.

…69294) This PR fixes the bazel build that went out of sync with the changes introduced in llvm#168074. Signed-off-by: Ingo Müller <ingomueller@google.com>

The implementation is based on the directive tree. Fixes clangd/clangd#1623

…encing blocks (llvm#169208) Objective-C blocks are like lambdas. They have captures, just like lambdas. However, unlike lambdas, they can also implicitly capture themselves. This means that when walking the captures of a block, we may end up in infinite recursion. This is not possible with lambdas, but happened in practice with blocks downstream. In this patch, I just use a set to keep track of the visited MemRegions. Note that theoretically, there is nothing preventing usual lambdas or functors from falling into the same trap, though it is probably slightly more difficult to do so. You would likely need a pointer to itself, etc. I'll not speculate here. This infinite recursion was likely caused by llvm#126620, released in clang-21. rdar://162215172
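The visited-set technique can be sketched in isolation as follows. This is a minimal model, not the analyzer's code: `Region`, `walk`, and `countReachable` are hypothetical names, and `Region` merely stands in for a captured memory region.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a captured region (not the analyzer's MemRegion).
struct Region {
  std::vector<const Region *> Captures;
};

// Recursively walk captures; the visited set stops the walk when a region
// is reached a second time, e.g. through a self-capture.
static void walk(const Region *R, std::unordered_set<const Region *> &Visited) {
  if (!R || !Visited.insert(R).second)
    return; // null or already visited
  for (const Region *C : R->Captures)
    walk(C, Visited);
}

// Number of distinct regions reachable from Root, including Root itself.
std::size_t countReachable(const Region &Root) {
  std::unordered_set<const Region *> Visited;
  walk(&Root, Visited);
  return Visited.size();
}
```

Without the `Visited.insert(R).second` check, a region that lists itself in `Captures` would recurse without end; with it, each region is processed once.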

…#167502) This lowers an SVE FMUL of bf16 using the BFMLAL top/bottom instructions rather than extending to an f32 mul. This does require zeroing the accumulator, but requires fewer extends/unpacking.

…lvm#162633) Introducing this utility makes the `__grow_by{,_and_replace}` significantly easier to understand and allows us to migrate away from these functions in the future.

…7670) This allows propagating optimizations to different algorithms by just optimizing the lowest one. This is especially relevant now that we start optimizing how we're iterating through ranges (e.g. the segmented iterator optimizations) and adding assumptions so the compiler can better leverage semantics guaranteed by the standard (e.g. `__builtin_assume_dereferenceable`).

In some workloads we see an argument passed on the stack where it is loaded, only for it to be immediately spilled to a different slot on the stack and then reloaded from that spill slot later on. We can avoid the unnecessary spill by marking loads as rematerializable and just directly loading from where the argument was originally passed on the stack. TargetTransformInfo::isReMaterializableImpl checks to make sure that any loads are `MI.isDereferenceableInvariantLoad()`, so we should be able to move the load down to the remat site. This gives a 14.8% reduction in spills in 544.nab_r on rva23u64 -O3, and a few other smaller reductions on llvm-test-suite. I didn't find any benchmarks where the number of spills/reloads increased. Related: llvm#165761

…#157819) This patch introduces the LASX and LSX conversion intrinsics:

- __m256 __lasx_cast_128_s (__m128)
- __m256d __lasx_cast_128_d (__m128d)
- __m256i __lasx_cast_128 (__m128i)
- __m256 __lasx_concat_128_s (__m128, __m128)
- __m256d __lasx_concat_128_d (__m128, __m128d)
- __m256i __lasx_concat_128 (__m128, __m128i)
- __m128 __lasx_extract_128_lo_s (__m256)
- __m128d __lasx_extract_128_lo_d (__m256d)
- __m128i __lasx_extract_128_lo (__m256i)
- __m128 __lasx_extract_128_hi_s (__m256)
- __m128d __lasx_extract_128_hi_d (__m256d)
- __m128i __lasx_extract_128_hi (__m256i)
- __m256 __lasx_insert_128_lo_s (__m256, __m128)
- __m256d __lasx_insert_128_lo_d (__m256d, __m128d)
- __m256i __lasx_insert_128_lo (__m256i, __m128i)
- __m256 __lasx_insert_128_hi_s (__m256, __m128)
- __m256d __lasx_insert_128_hi_d (__m256d, __m128d)
- __m256i __lasx_insert_128_hi (__m256i, __m128i)

Relevant GCC patch: https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=c2013267642fea4a6e89b826940c8aa80a76089d

`[[nodiscard]]` should be applied to functions where discarding the return value is most likely a correctness issue. - https://libcxx.llvm.org/CodingGuidelines.html#apply-nodiscard-where-relevant --------- Co-authored-by: Hristo Hristov <zingam@outlook.com>

`[[nodiscard]]` should be applied to functions where discarding the return value is most likely a correctness issue. - https://libcxx.llvm.org/CodingGuidelines.html#apply-nodiscard-where-relevant

…epi128/_mm256_bsrli_epi128 intrinsics (llvm#169309)

…9303) We didn't take `IntAP`/`IntAPS` into account when casting to and from the computation LHS type. This broke the `std/ranges/range.factories/range.iota.view/end.pass.cpp` test.

…llvm#166353) In combination with llvm#149470 this will introduce parallel accumulators when unrolling reductions with vector instructions. See also llvm#166630, which aims to introduce parallel accumulators for FP reductions.

Linaro is doing network maintenance and I don't have an estimated time these will be back online.

…llvm#169316) This is a second attempt to fix the bazel build (after the first in llvm#169294, which was accidentally merged before CI passed). In the first attempt, not all bazel dependencies had been added; this PR should add them all and make CI pass. Signed-off-by: Ingo Müller <ingomueller@google.com>

InstCombine phi scalarization would always create a new binary op with the phi as the first operand, which is not correct for non-commutable binary ops such as sub. This fix preserves the original binary op ordering in the new binary op and adds a test for this behavior. Currently, this transformation can produce silently incorrect IR, and in the case of the added test, would optimize it out entirely.
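Why operand order matters for a non-commutable op can be shown with plain integers. This is a scalar model under stated assumptions, not InstCombine code: `PhiSide`, `rebuildPhiFirst`, and `rebuildPreservingOrder` are hypothetical names that only mimic the two ways of rebuilding a `sub`.

```cpp
// Which side of the original binary op the phi value was on.
enum class PhiSide { LHS, RHS };

// Models the old behavior: the phi value is always placed first.
int rebuildPhiFirst(int Phi, int Other) { return Phi - Other; }

// Models the fixed behavior: the original operand order of the `sub` is kept.
int rebuildPreservingOrder(int Phi, int Other, PhiSide Side) {
  return Side == PhiSide::LHS ? Phi - Other : Other - Phi;
}
```

For an original expression `10 - phi` with `phi == 3`, `rebuildPreservingOrder(3, 10, PhiSide::RHS)` yields 7, while `rebuildPhiFirst(3, 10)` yields -7. The two agree only when the phi was already the left operand.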

Avoid adding any given SuperNode SN to its own SuperNode-deps set. This saves us from trying to redundantly merge its dependencies back into itself (a no-op, but a potentially expensive one).
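The guard can be sketched as below. This is an illustrative model only: the `SuperNode` struct and the `addDependence` helper are hypothetical and do not reflect the actual ORC `WaitingOnGraph` data structures.

```cpp
#include <set>

// Hypothetical stand-in for a graph node with a set of dependencies.
struct SuperNode {
  std::set<SuperNode *> Deps;
};

// Record that SN depends on Dep, skipping the self-dependence case so that
// later merging never has to fold a node's dependencies back into itself.
// Returns true if a new dependence was recorded.
bool addDependence(SuperNode &SN, SuperNode &Dep) {
  if (&SN == &Dep)
    return false; // never add a node to its own deps set
  return SN.Deps.insert(&Dep).second;
}
```

Here `addDependence(A, A)` leaves `A.Deps` empty, while `addDependence(A, B)` records `B` once.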

Show information about the signal when the user runs `process handle <unix-signal>`, i.e.

```sh
(lldb) process handle SIGWINCH
NAME         PASS     STOP     NOTIFY   DESCRIPTION
===========  =====    =====    ======   ===================
SIGWINCH     true     false    false    window size changes
```

Wanted to use the existing `GetSignalDescription`, but its expected behaviour is to return the signal name if no signal code is passed. It is used in stop info. https://github.com/llvm/llvm-project/blob/65c895dfe084860847e9e220ff9f1b283ebcb289/lldb/source/Target/StopInfo.cpp#L1192-L1195

…168745) Fix the dependency of `CodeGenDAGPatterns::ParseDefaultOperands()` on the particular order of SDNode definitions. Implicit usage of the first definition as a placeholder makes `llvm-tblgen -gen-dag-isel` fail if that SDNode is not usable as an output pattern operator and an instance of `OperandWithDefaultOps` is used in a pattern.

Presently, each `OperandWithDefaultOps` record is processed by constructing an instance of TreePattern from its `DefaultOps` argument that has the form `(ops ...)`. Even though the result of processing the root operator of that DAG is not inspected by the `ParseDefaultOperands()` function itself, that operator has to be supported by the underlying `TreePattern::ParseTreePattern()` function. For that reason, a temporary DAG is created by replacing the root operator of the `DefaultOps` argument with the first SDNode defined, which is usually `def imm : ...` defined in the `TargetSelectionDAG.td` file.

This results in misleading errors being reported when implementing new `SDNode` types, if the new definition happens to be added before the `def imm : ...` line. The error is reported by several test cases executed by the `check-llvm` target, as well as by the regular build, if one of the enabled targets inherits one of its operand types from `OperandWithDefaultOps`:

OptionalIntOperand: ../llvm/test/TableGen/DAGDefaultOps.td:28:5: error: In OptionalIntOperand: Cannot use 'unexpected_node' in an output pattern!
def OptionalIntOperand: OperandWithDefaultOps<i32, (ops (i32 0))>;

This commit implements a dedicated constructor of `TreePattern` to be used if the caller does not care about the particular root operator of the pattern being processed.

…nsfer_read' for PVC & BMG (llvm#168910) The PR changes the `TransferReadLowering` to always use `xegpu.load` (and not `xegpu.load_nd`) for 1D cases as it has a more developed interface (e.g. layout capabilities). Signed-off-by: dchigarev <dmitry.chigarev@intel.com>

z1-cciauto · 2025-11-24T12:07:44Z

PSDB Link: https://compiler-ci.amd.com/job/compiler-psdb-amd-staging/2948

hstk30-hw and others added 30 commits November 24, 2025 12:49

[Sema] Fix Wunused-but-set-variable warning(NFC) (llvm#169220)

13a39ea

Fix warning: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:1455:23: warning: variable 'Store' set but not used [-Wunused-but-set-variable]

[LoongArch][NFC] Add tests for combining vand(vnot) (llvm#160830)

76e7e9f

[mlir][x86vector] Add missing Linalg dependency (llvm#169280)

d124675

Adds required dependency for `inferContractionDims`. Fixes llvm#168074

[StaticAnalyzer] Use llvm::find_if (NFC) (llvm#169237)

54db657

Identified with llvm-use-ranges.

[mlir] Construct SmallVector with initial values (NFC) (llvm#169239)

67391fc

Identified with llvm-use-ranges.

[Orc] Use a range-based for loop (NFC) (llvm#169240)

2b81e9e

Identified with modernize-loop-convert.

[SPIRV] Use range-based for loops (NFC) (llvm#169241)

7dd531f

Identified with modernize-loop-convert.

[AST] Construct iterator_range with the conversion constructor (NFC) (l…

9ce6fad

…lvm#169245) This patch simplifies iterator_range construction with the conversion constructor.

[clang-format] Handle import when used as template function name (l…

02a997c

…lvm#169279) Fixes llvm#149960

[AMDGPU] Add wave reduce intrinsics for float types - 2 (llvm#168859)

e888cf8

Supported Ops: `fadd`, `fsub`

[IVDesc] Make getCastInsts return an ArrayRef (NFC) (llvm#169021)

1abb055

To make it clear that the return value is immutable.

[mlir][emitc] Refactor getEmittedExpression (NFC) (llvm#168361)

ce70d4b

This method returns the current expression being emitted, but is only used for testing whether an expression is being emitted or not. This patch therefore replaces it with a boolean isEmittingExpression() method.

[mlir:x86vector:transform] Fix bazel build after llvm#168074. (llvm#1…

c745a51

…69294) This PR fixes the bazel build that went out of sync with the changes introduced in llvm#168074. Signed-off-by: Ingo Müller <ingomueller@google.com>

[clangd] Implement fold range for #pragma region (llvm#168177)

6413e5a

The implementation is based on the directive tree. Fixes clangd/clangd#1623

[AMDGPU] Add builtins for wave reduction intrinsics (llvm#161816)

4604762

[AArch64][SVE] Add custom lowering for bfloat FMUL (with +bf16) (llvm…

4b65caf

…#167502) This lowers an SVE FMUL of bf16 using the BFMLAL top/bottom instructions rather than extending to an f32 mul. This does require zeroing the accumulator, but requires fewer extends/unpacking.

[libc++] Introduce basic_string::__allocate_long_buffer_for_growing (l…

121e2e9

…lvm#162633) Introducing this utility makes the `__grow_by{,_and_replace}` significantly easier to understand and allows us to migrate away from these functions in the future.

[libc++][list] Applied [[nodiscard]] (llvm#169015)

4c4cf71

`[[nodiscard]]` should be applied to functions where discarding the return value is most likely a correctness issue. - https://libcxx.llvm.org/CodingGuidelines.html#apply-nodiscard-where-relevant

[X86] avx2-builtins.c - add constexpr test coverage for _mm256_bslli_…

74a62b1

…epi128/_mm256_bsrli_epi128 intrinsics (llvm#169309)

[clang][bytecode] Fix compound assign operators for IntAP(S) (llvm#16…

d44d329

…9303) We didn't take `IntAP`/`IntAPS` into account when casting to and from the computation LHS type. This broke the `std/ranges/range.factories/range.iota.view/end.pass.cpp` test.

juliannagele and others added 9 commits November 24, 2025 11:12

[libcxx][ci] Temporarily disable ARM jobs (llvm#169318)

840a43b

Linaro is doing network maintenance and I don't have an estimated time these will be back online.

[ORC] Avoid self-dependence in SuperNode dependence graph. (llvm#169286)

d162c91

Avoid adding any given SuperNode SN to its own SuperNode-deps set. This saves us from trying to redundantly merge its dependencies back into itself (a no-op, but a potentially expensive one).

merge main into amd-staging

5adce54

z1-cciauto requested a review from nicolasvasilache as a code owner November 24, 2025 12:06

z1-cciauto requested a review from a team November 24, 2025 12:06

ronlieb removed the request for review from nicolasvasilache November 24, 2025 12:07

ronlieb approved these changes Nov 24, 2025

z1-cciauto merged commit 1f35e37 into amd-staging Nov 24, 2025
15 checks passed

z1-cciauto deleted the upstream_merge_202511240706 branch November 24, 2025 14:47
