merge main into amd-staging #699

z1-cciauto · 2025-11-27T13:19:28Z

No description provided.

This is a follow-up to llvm#168135

`[[nodiscard]]` should be applied to functions where discarding the return value is most likely a correctness issue. - https://libcxx.llvm.org/CodingGuidelines.html#apply-nodiscard-where-relevant
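
As a minimal sketch of the guideline, assuming a hypothetical `IntQueue` wrapper and `frontOr` helper (neither is libc++ code), the attribute goes on observers whose result the caller must look at:

```cpp
#include <cassert>
#include <vector>

// Hypothetical container wrapper, used only for illustration.
class IntQueue {
public:
  void push(int value) { items_.push_back(value); }

  // Discarding the result of empty() is almost certainly a bug: the caller
  // probably meant to clear the queue or forgot to branch on the answer.
  [[nodiscard]] bool empty() const noexcept { return items_.empty(); }

  // The front element is only useful if the caller reads it.
  [[nodiscard]] int front() const {
    assert(!items_.empty() && "front() called on an empty queue");
    return items_.front();
  }

private:
  std::vector<int> items_;
};

// Returns the front element, or `fallback` when the queue is empty.
inline int frontOr(const IntQueue &q, int fallback) {
  // A bare `q.empty();` statement here would draw a compiler warning.
  return q.empty() ? fallback : q.front();
}
```

With this in place, `frontOr(q, -1)` yields `-1` for an empty queue and the pushed value otherwise, while a statement that ignores `q.empty()` is diagnosed at compile time.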

…thIf` (llvm#169606) This commit adds support for `replaceUsesWithIf` (and variants such as `replaceAllUsesExcept`) to the `ConversionPatternRewriter`. This API is supported only in no-rollback mode. An assertion is triggered in rollback mode. (This missing assertion has been confusing for users because it seemed that the API was supported, while it was actually not working properly.) This commit brings us a bit closer towards removing [this](https://github.com/llvm/llvm-project/blob/76ec25f729fcc7ae576caf21293cc393e68e7cf7/mlir/lib/Transforms/Utils/DialectConversion.cpp#L1214) workaround. Additional changes are needed to support this API in rollback mode. In particular, no entries should be added to the `ConversionValueMapping` for conditional replacements. It's unclear at this point if this API can be supported in rollback mode, so this is deferred to later. This commit turns `replaceUsesWithIf` into a virtual function, so that the `ConversionPatternRewriter` can override it. All other API functions for conditional value replacements call that function. Note for LLVM integration: If you are seeing failed assertions due to this change, you are using an unsupported API in your dialect conversion. You have 3 options: (1) Migrate to the no-rollback driver. (2) Rewrite your patterns without the unsupported API. (3) Last resort: bypass the rewriter and call `replaceUsesWithIf` etc. directly on the `Value` object.

`[[nodiscard]]` should be applied to functions where discarding the return value is most likely a correctness issue. - https://libcxx.llvm.org/CodingGuidelines.html#apply-nodiscard-where-relevant

…lvm#169744)
# Summary
This is a forward fix for test errors from llvm#163653. The PR moved debugger initialization outside of InitializeRequestHandler, and into Launch/AttachRequestHandlers to support DAP sessions sharing debugger instances for dynamically created targets. However, DExTer's DAP class seemed to set breakpoints before the debugger was initialized, which caused the tests to hang waiting for a breakpoint to hit due to none of the breakpoints getting resolved.
# Tests
```
bin/llvm-lit -v /home/qxy11/llvm/llvm-project/cross-project-tests/debuginfo-tests/dexter-tests/
```

Fixes memory leak uncovered by llvm#133533.

…tiple times in the same basic block (llvm#169141) Fixes llvm#167500.

…69005) This change adds the `RN` and `RZ` rounding modes to the `convert.f32x2.to.f16x2` and `convert.f32x2.to.bf16x2` Ops. Tests are added in `convert_fp16x2.mlir` and `invalid_convert_fp16x2.mlir`. Tests with these Ops in `convert_stochastic_rounding.mlir` and `invalid-convert-stochastic-rounding.mlir` have been removed or modified. PTX spec reference: https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-cvt
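
To illustrate how the two rounding modes differ, here is a scalar sketch for the f32 to bf16 case only. It assumes finite inputs (NaN handling is deliberately left out), and the helper names are hypothetical; it models the arithmetic, not the NVVM op or the PTX instruction itself.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret a float as its IEEE-754 bit pattern and back.
inline std::uint32_t bitsOf(float value) {
  std::uint32_t bits;
  std::memcpy(&bits, &value, sizeof(bits));
  return bits;
}

inline float floatFromBits(std::uint32_t bits) {
  float value;
  std::memcpy(&value, &bits, sizeof(value));
  return value;
}

// RZ (round toward zero): drop the low 16 mantissa bits. Because floats are
// sign-magnitude, truncating the magnitude moves the value toward zero.
inline std::uint16_t f32ToBf16RZ(float value) {
  return static_cast<std::uint16_t>(bitsOf(value) >> 16);
}

// RN (round to nearest, ties to even): add a bias of 0x7FFF plus the lowest
// kept bit, then truncate. Assumes a finite input.
inline std::uint16_t f32ToBf16RN(float value) {
  const std::uint32_t bits = bitsOf(value);
  const std::uint32_t lowestKeptBit = (bits >> 16) & 1u;
  return static_cast<std::uint16_t>((bits + 0x7FFFu + lowestKeptBit) >> 16);
}
```

For an input exactly halfway between two bf16 values, such as the bit pattern `0x3F818000`, RZ keeps `0x3F81` while RN moves to the even neighbour `0x3F82`.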

… types (llvm#168188) Add a createIntrinsicCall overload that accepts the return type and arguments and automatically resolves the overload types rather than requiring manual computation. Simplifies NVVM_PrefetchOp by removing conditional overload logic.

… macro arguments (llvm#169757) When the start and end token are both spelled in macro arguments, we still want to reject the range if they come from two separate macro arguments, as the original specified range is not precisely spelled in a single sequence of characters in source.

…vm#169517) This commit extends the CGProfile module flags export with support for missing function references. Previously, this caused a crash; missing references are now exported as `null` values in the metadata node. Fixes: llvm#160717

We emit this diagnostic from CheckPointerToIntegralCast() already, so remove the emission from CastPointerIntegral().

- Mass-reformat tests in `std/utilities/optional/optional.object/optional.object.ctor` and rearrange header `#include`s - No functional changes - Prelude for llvm#169203

Use const& in range-based for loop to avoid unnecessary copies
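
The effect of this kind of change can be made visible with a small C++17 sketch. The `Tracked` type and both helper functions are hypothetical and exist only to count copies; they are not taken from the SampleFDO code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Counts copy constructions so the cost of by-value iteration is observable.
struct Tracked {
  static inline int copies = 0;
  std::string name;

  explicit Tracked(std::string n) : name(std::move(n)) {}
  Tracked(const Tracked &other) : name(other.name) { ++copies; }
  Tracked(Tracked &&) noexcept = default;
};

// Iterating by value copy-constructs every element once.
inline std::size_t totalLengthByValue(const std::vector<Tracked> &items) {
  std::size_t total = 0;
  for (auto item : items)
    total += item.name.size();
  return total;
}

// Iterating by const reference reads the elements in place.
inline std::size_t totalLengthByRef(const std::vector<Tracked> &items) {
  std::size_t total = 0;
  for (const auto &item : items)
    total += item.name.size();
  return total;
}
```

Both functions return the same total; only the by-value loop increments `Tracked::copies`, once per element.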

…specified layouts (llvm#169267) Introduce anchor layout for XeGPU anchor ops: load_nd, store_nd, prefetch_nd, dpas, load, store, prefetch, load_matrix, store_matrix, and atomic_rmw. Anchor layout is permanent, and is guaranteed to be honored by XeGPU distribution and lowerings once specified.
1. Add anchor_layout for XeGPU anchor ops: load_nd, store_nd, prefetch_nd, dpas, load, store, prefetch, load_matrix, store_matrix, and atomic_rmw.
2. Rename layout attributes to anchor_layout for these ops: load, store, load_matrix, store_matrix.
3. Update the layout propagation pass: only when the user doesn't specify an anchor layout does the pass compute a default layout, set it as the anchor op's permanent layout, and use it for propagation. If the user specified an anchor layout, the pass uses the user-specified anchor layout.

…ax_lanes (llvm#169293) On RISC-V, some loops that the loop vectorizer vectorizes pre-LTO may turn out to have the exact trip count exposed after LTO, see llvm#164762. If the trip count is small enough we can fold away the @llvm.experimental.get.vector.length intrinsic based on this corollary from the LangRef: > If %cnt is less than or equal to %max_lanes, the return value is equal to %cnt. This on its own doesn't remove the @llvm.experimental.get.vector.length in llvm#164762 since we also need to teach computeKnownBits about @llvm.experimental.get.vector.length and the sub recurrence, but this PR is a starting point. I've added this in InstCombine rather than InstSimplify since we may need to insert a truncation (@llvm.experimental.get.vector.length can take an i64 %cnt argument, the result is always i32). Note that there was something similar done in VPlan in llvm#167647 for when the loop vectorizer knows the trip count.
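
The quoted LangRef corollary can be modelled in a few lines. This is only a sketch of the reasoning, not the InstCombine code: `CountRange` is a hypothetical stand-in for whatever the compiler has proven about `%cnt`, and the fixed-width case is assumed (no `vscale` scaling of `%max_lanes`).

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical summary of what a range analysis knows about %cnt.
struct CountRange {
  std::uint64_t max; // proven upper bound on %cnt
};

// If every possible %cnt is <= %max_lanes, the intrinsic must return %cnt,
// so the call can be replaced by %cnt truncated to i32. The truncation is
// lossless because %cnt <= %max_lanes and %max_lanes is itself an i32.
// Returns std::nullopt when the bound is too weak to justify the fold.
inline std::optional<std::uint32_t>
foldGetVectorLength(std::uint64_t cnt, CountRange range,
                    std::uint32_t maxLanes) {
  assert(cnt <= range.max && "cnt must respect the proven bound");
  if (range.max > maxLanes)
    return std::nullopt;
  return static_cast<std::uint32_t>(cnt);
}
```

A loop whose trip count is known to be at most 4 with `%max_lanes` of 8 folds to the count itself, while a bound of 16 leaves the intrinsic in place.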

@c-rhodes

This patch adds recognition of high-half multiply by parts into a single larger multiply. Considering a multiply made up of high and low parts, we can split the multiply into:
    x * y == (xh*T + xl) * (yh*T + yl)
where `xh == x>>32` and `xl == x & 0xffffffff`. `T = 2^32`.
This expands to
    xh*yh*T*T + xh*yl*T + xl*yh*T + xl*yl
which I find helpful to draw as
    [  xh*yh  ]
         [  xh*yl  ]
         [  xl*yh  ]
              [  xl*yl  ]
We are looking for the "high" half, which is xh*yh + xh*yl>>32 + xl*yh>>32 + carries. The carry makes this difficult and there are multiple ways of representing it. The ones we attempt to support here are:
Carry:
    xh*yh + carry + lowsum
    carry = lowsum < xh*yl ? 0x100000000 : 0
    lowsum = xh*yl + xl*yh + (xl*yl>>32)
Ladder:
    xh*yh + c2>>32 + c3>>32
    c2 = xh*yl + (xl*yl >> 32); c3 = c2&0xffffffff + xl*yh
Carry4:
    xh*yh + carry + crosssum>>32 + (xl*yl + crosssum&0xffffffff) >> 32
    crosssum = xh*yl + xl*yh
    carry = crosssum < xh*yl ? 0x100000000 : 0
Ladder4:
    xh*yh + (xl*yh)>>32 + (xh*yl)>>32 + low>>32;
    low = (xl*yl)>>32 + (xl*yh)&0xffffffff + (xh*yl)&0xffffffff
They all start by matching `xh*yh` + 2 or 3 other operands. The bottom of the tree is `xh*yh`, `xh*yl`, `xl*yh` and `xl*yl`.
Based on llvm#156879 by @c-rhodes
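
For readers who want to check the arithmetic, here is a sketch of the Ladder and Carry forms as source-level C++ on 64-bit operands. The function names are hypothetical, the shift of `lowsum` and the precedence of `&` are written out explicitly, and the carry is taken as `2^32`; this shows the kind of input being matched, not the matcher itself.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kLowMask = 0xffffffffULL;

// "Ladder" form: neither intermediate sum can overflow 64 bits, so the
// carries fall out of the two >>32 terms.
constexpr std::uint64_t mulhiLadder(std::uint64_t x, std::uint64_t y) {
  const std::uint64_t xh = x >> 32, xl = x & kLowMask;
  const std::uint64_t yh = y >> 32, yl = y & kLowMask;
  const std::uint64_t c2 = xh * yl + ((xl * yl) >> 32);
  const std::uint64_t c3 = (c2 & kLowMask) + xl * yh;
  return xh * yh + (c2 >> 32) + (c3 >> 32);
}

// "Carry" form: the sum of the cross terms may wrap around 2^64. A wrap is
// detected with an unsigned compare and repaid as 2^32 in the high half.
constexpr std::uint64_t mulhiCarry(std::uint64_t x, std::uint64_t y) {
  const std::uint64_t xh = x >> 32, xl = x & kLowMask;
  const std::uint64_t yh = y >> 32, yl = y & kLowMask;
  const std::uint64_t lowsum = xh * yl + xl * yh + ((xl * yl) >> 32);
  const std::uint64_t carry = lowsum < xh * yl ? (1ULL << 32) : 0;
  return xh * yh + carry + (lowsum >> 32);
}

// Compile-time spot checks: (2^64 - 1)^2 has high half 2^64 - 2, and
// 2^32 * 2^32 == 2^64 has high half 1.
static_assert(mulhiLadder(~0ULL, ~0ULL) == 0xfffffffffffffffeULL);
static_assert(mulhiCarry(~0ULL, ~0ULL) == 0xfffffffffffffffeULL);
static_assert(mulhiLadder(1ULL << 32, 1ULL << 32) == 1);
static_assert(mulhiCarry(1ULL << 32, 1ULL << 32) == 1);
```

Both functions compute the upper 64 bits of the 128-bit product, which is what a single widening multiply produces directly.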

With EVL tail folding, the LastActiveLane can be computed with EVL - 1. This removes the need for a header mask and vfirst.m for loops with live outs on RISC-V:
      # %bb.5: # %for.cond.cleanup7
    - vsetvli zero, zero, e32, m2, ta, ma
    - vmv.v.x v8, s1
    - vmsleu.vv v10, v8, v22
    - vfirst.m a0, v10
    - srli a1, a0, 63
    - czero.nez a0, a0, a1
    - czero.eqz a1, s8, a1
    - or a0, a0, a1
    - addi a0, a0, -1
    - vsetvli zero, zero, e64, m4, ta, ma
    - vslidedown.vx v8, v12, a0
    + addi s1, s1, -1
    + vslidedown.vx v8, v12, s1
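
The equivalence behind this simplification can be sketched with a scalar model. The helpers below are hypothetical and assume the header mask activates exactly the first EVL lanes, with EVL at least 1; they are not VPlan code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Header mask for an iteration that processes `evl` of `vf` lanes:
// lanes [0, evl) are active, the rest are inactive.
inline std::vector<bool> makeHeaderMask(std::size_t vf, std::size_t evl) {
  assert(evl <= vf && "EVL cannot exceed the vector length");
  std::vector<bool> mask(vf, false);
  for (std::size_t lane = 0; lane < evl; ++lane)
    mask[lane] = true;
  return mask;
}

// Mask-based computation: find the first inactive lane (or the vector
// length if every lane is active) and step back by one.
inline std::size_t lastActiveLaneFromMask(const std::vector<bool> &mask) {
  std::size_t firstInactive = mask.size();
  for (std::size_t lane = 0; lane < mask.size(); ++lane) {
    if (!mask[lane]) {
      firstInactive = lane;
      break;
    }
  }
  assert(firstInactive > 0 && "at least one lane must be active");
  return firstInactive - 1;
}

// EVL-based computation: a single subtraction.
inline std::size_t lastActiveLaneFromEVL(std::size_t evl) {
  assert(evl > 0 && "at least one lane must be active");
  return evl - 1;
}
```

For any `evl` between 1 and `vf`, both functions agree, which is why the mask scan can be dropped.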

…#169776) This commit fixes the import of `branch_weights` metadata from LLVM IR to the LLVM dialect. Previously, `branch_weights` metadata containing the `!"expected"` field was rejected because the importer expected an integer weight at operand 1 but found a string.

As far as I can tell, the llvm.arm.mve.vminnm.m intrinsic used in these tests was the pre-upstream name of llvm.arm.mve.min.predicated. The tests should not need IR sections, so remove them and rely only on the MIR portions.

These tests show how type-checking is performed for `__builtin_amdgcn_load_to_lds`, but not for `__builtin_amdgcn_raw_ptr_buffer_load_lds`, `__builtin_amdgcn_struct_ptr_buffer_load_lds` and `__builtin_amdgcn_global_load_lds`, since they are declared with the 't' attribute. Stacked on top of: llvm#165387

…ed by no-return blocks (llvm#167548) At present, the shrink wrapping pass misses opportunities to shrink wrap in the presence of machine basic blocks which exit the function without returning. Such cases arise from C++ functions like the following:
```cxx
int foo(int err, void* ptr) {
    if (err == -1) {
        if (ptr == nullptr) {
            throw MyException("Received `nullptr`!", __FILE__, __LINE__);
        }

        handle(ptr);
    }

    return STATUS_OK;
}
```
In particular, assuming `MyException`'s constructor is not marked `noexcept`, the above code will generate a trivial EH landing pad calling `__cxa_free_exception()` and rethrowing the unhandled internal exception, exiting the function without returning. As such, the shrink wrapping pass refuses to touch the above function, spilling to the stack on every call, even though no CSRs are clobbered on the hot path. This patch tweaks the shrink wrapping logic to enable the pass to fire in this and similar cases.

… AVX512 VPMULTISHIFTQB intrinsics to be used in constexpr (llvm#168995) Resolves llvm#167477

The natural assumption is that there's some sort of order here and having people read the reference manual before the basic tutorial does not make sense to me.

…mentwise_sqrt versions (llvm#165682)

Currently LLVM fails to recognize a manual implementation of `phadd`: https://godbolt.org/z/zozrssaWb
```llvm
declare <8 x i16> @llvm.x86.ssse3.phadd.sw.128(<8 x i16>, <8 x i16>)
declare <8 x i16> @llvm.sadd.sat.v8i16(<8 x i16>, <8 x i16>)

define <8 x i16> @phaddsw_v8i16_intrinsic(<8 x i16> %a, <8 x i16> %b) {
entry:
  %res = call <8 x i16> @llvm.x86.ssse3.phadd.sw.128(<8 x i16> %a, <8 x i16> %b)
  ret <8 x i16> %res
}

define <8 x i16> @phaddsw_v8i16_generic(<8 x i16> %a, <8 x i16> %b) {
entry:
  %even = shufflevector <8 x i16> %a, <8 x i16> %b, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14>
  %odd = shufflevector <8 x i16> %a, <8 x i16> %b, <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15>
  %sum = call <8 x i16> @llvm.sadd.sat.v8i16(<8 x i16> %even, <8 x i16> %odd)
  ret <8 x i16> %sum
}
```
```asm
phaddsw_v8i16_intrinsic:                # @phaddsw_v8i16_intrinsic
        phaddsw xmm0, xmm1
        ret
phaddsw_v8i16_generic:                  # @phaddsw_v8i16_generic
        movdqa  xmm2, xmmword ptr [rip + .LCPI1_0] # xmm2 = [0,1,4,5,8,9,12,13,8,9,12,13,12,13,14,15]
        movdqa  xmm3, xmm1
        pshufb  xmm3, xmm2
        movdqa  xmm4, xmm0
        pshufb  xmm4, xmm2
        punpcklqdq      xmm4, xmm3      # xmm4 = xmm4[0],xmm3[0]
        psrad   xmm1, 16
        psrad   xmm0, 16
        packssdw        xmm0, xmm1
        paddsw  xmm0, xmm4
        ret
```
This PR does recognize the pattern.
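
Why the shuffle-plus-saturating-add form is equivalent to `phaddsw` can be checked with a scalar C++17 model. All names below are hypothetical; the model mirrors the IR above lane by lane and is not the backend's matching code.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using V8i16 = std::array<std::int16_t, 8>;

// Shuffle masks copied from the IR: even and odd lanes of concat(a, b).
constexpr std::array<int, 8> kEvenLanes = {0, 2, 4, 6, 8, 10, 12, 14};
constexpr std::array<int, 8> kOddLanes = {1, 3, 5, 7, 9, 11, 13, 15};

// Signed saturating 16-bit add, as llvm.sadd.sat.v8i16 does per lane.
inline std::int16_t saddSat(std::int16_t a, std::int16_t b) {
  const int sum = int{a} + int{b};
  return static_cast<std::int16_t>(std::clamp(sum, -32768, 32767));
}

// Models shufflevector: indices 0-7 select from a, 8-15 select from b.
inline V8i16 shuffle(const V8i16 &a, const V8i16 &b,
                     const std::array<int, 8> &lanes) {
  V8i16 out{};
  for (std::size_t i = 0; i < out.size(); ++i) {
    const auto idx = static_cast<std::size_t>(lanes[i]);
    out[i] = idx < 8 ? a[idx] : b[idx - 8];
  }
  return out;
}

// The generic pattern: gather even and odd lanes, then add with saturation.
inline V8i16 phaddswGeneric(const V8i16 &a, const V8i16 &b) {
  const V8i16 even = shuffle(a, b, kEvenLanes);
  const V8i16 odd = shuffle(a, b, kOddLanes);
  V8i16 sum{};
  for (std::size_t i = 0; i < sum.size(); ++i)
    sum[i] = saddSat(even[i], odd[i]);
  return sum;
}

// Direct definition of the horizontal add: adjacent pairs of a fill the
// low four lanes, adjacent pairs of b fill the high four lanes.
inline V8i16 phaddswReference(const V8i16 &a, const V8i16 &b) {
  V8i16 out{};
  for (std::size_t i = 0; i < 4; ++i) {
    out[i] = saddSat(a[2 * i], a[2 * i + 1]);
    out[i + 4] = saddSat(b[2 * i], b[2 * i + 1]);
  }
  return out;
}
```

The two functions return identical vectors for every input, including lanes that saturate, which is the property the new pattern match relies on.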

`[[nodiscard]]` should be applied to functions where discarding the return value is most likely a correctness issue. - https://libcxx.llvm.org/CodingGuidelines.html#apply-nodiscard-where-relevant

…set (llvm#169786) We can't access the RecordLayout of an invalid decl, so return failure if that happens. Fixes llvm#167076

Extends test coverage to include different start and step values, as well as interleaving.

`[[nodiscard]]` should be applied to functions where discarding the return value is most likely a correctness issue. - https://libcxx.llvm.org/CodingGuidelines.html#apply-nodiscard-where-relevant

This should unbreak that test on 32-bit builders, e.g. https://lab.llvm.org/buildbot/#/builders/154/builds/24509

…lvm#162295) Fixes llvm#160312

…169781) Let standard casting / `__builtin_convertvector` handle the conversions from BF16 to F32. My only query is how best to implement _mm_cvtpbh_ps: I went for the v8bf16 -> v8f32 conversion followed by subvector extraction in the end, but could just as easily extract a v4bf16 first. It makes no difference to the final optimized codegen. First part of llvm#154911
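
As background on why a plain cast suffices, a bf16 value is the upper 16 bits of the corresponding f32 bit pattern, so widening is exact. The scalar sketch below uses a hypothetical helper name and a raw `uint16_t` in place of a real bf16 type.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "this sketch assumes a 32-bit IEEE-754 float");

// Widen a bf16 bit pattern to f32 by placing it in the upper half of the
// 32-bit word; the low 16 mantissa bits become zero.
inline float bf16BitsToFloat(std::uint16_t bits) {
  const std::uint32_t wide = static_cast<std::uint32_t>(bits) << 16;
  float out;
  std::memcpy(&out, &wide, sizeof(out));
  return out;
}
```

For example, the bf16 pattern `0x3F80` widens to `1.0f` and `0xC000` widens to `-2.0f`.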

…9670) Remove explicit VT numbers from ValueTypes.td so that patches that add a new VT do not have to renumber the entire file. In TableGen VTs are now identified by ValueType.LLVMName instead of ValueType.Value. This is important for target-defined types (typically based on PtrValueType) which are not mentioned in ValueTypes.td itself.

Adding support for serializing the ada entry flags helps with MIR-based test cases. Without this change, the flags are simply displayed as being "unknown".

)

z1-cciauto · 2025-11-27T13:20:56Z

PSDB Link: https://compiler-ci.amd.com/job/compiler-psdb-amd-staging/3001

cmc-rep and others added 30 commits November 26, 2025 17:34

[LoadStoreVectorizer] Fix one-element vector handling (llvm#169671)

ceba82f

This is a follow-up to llvm#168135

[libc++][queue] Applied [[nodiscard]] (llvm#169469)

b028dac

`[[nodiscard]]` should be applied to functions where discarding the return value is most likely a correctness issue. - https://libcxx.llvm.org/CodingGuidelines.html#apply-nodiscard-where-relevant

[flang] Use default constructor for FIRToSCF pass (llvm#169741)

bd643bc

[libc++][mdspan] Applied [[nodiscard]] (llvm#169326)

bacca23

`[[nodiscard]]` should be applied to functions where discarding the return value is most likely a correctness issue. - https://libcxx.llvm.org/CodingGuidelines.html#apply-nodiscard-where-relevant

Add missing freeConstants() call for ConstantPtrAuths.

b3428bb

Fixes memory leak uncovered by llvm#133533.

[ReplaceConstant] Don't create instructions for the same constant mul…

e7dec23

…tiple times in the same basic block (llvm#169141) Fixes llvm#167500.

[clang][bytecode] Remove double diagnostic emission (llvm#169658)

6696e0c

We emit this diagnostic from CheckPointerToIntegralCast() already, so remove the emission from CastPointerIntegral().

[libc++] Reformat optional constructor tests (llvm#169231)

f6712b6

- Mass-reformat tests in `std/utilities/optional/optional.object/optional.object.ctor` and rearrange header `#include`s - No functional changes - Prelude for llvm#169203

[NFC][SampleFDO] Use const& to avoid copies (llvm#164584)

a9cc7fe

Use const& in range-based for loop to avoid unnecessary copies

[LoongArch][NFC] Pre-commit tests for vector rotl/rotr (llvm#161115)

f1ddb2f

[ARM] Remove IR from mve vpt mir tests. NFC

dc8311f

As far as I can tell, the llvm.arm.mve.vminnm.m intrinsic used in these tests was the pre-upstream name of llvm.arm.mve.min.predicated. The tests should not need IR sections, so remove them and rely only on the MIR portions.

[Clang] VectorExprEvaluator::VisitCallExpr / InterpretBuiltin - Allow…

c3c3d16

… AVX512 VPMULTISHIFTQB intrinsics to be used in constexpr (llvm#168995) Resolves llvm#167477

[llvm][Tablegen] Link to tutorial before programmer's reference

0b16512

The natural assumption is that there's some sort of order here and having people read the reference manual before the basic tutorial does not make sense to me.

[NFC][LLVM] Add bitcode tests for llvm.aarch64.sve.rev

8401a8d

[Clang] Replace some x86 sqrt builtins with the generic __builtin_ele…

0dbedd1

…mentwise_sqrt versions (llvm#165682)

[libc++][deque] Applied [[nodiscard]] (llvm#169745)

d6be9fc

`[[nodiscard]]` should be applied to functions where discarding the return value is most likely a correctness issue. - https://libcxx.llvm.org/CodingGuidelines.html#apply-nodiscard-where-relevant

tbaederr and others added 10 commits November 27, 2025 13:06

[clang][bytecode] Check for invalid record decls in IntPointer::atOff…

bd95a74

…set (llvm#169786) We can't access the RecordLayout of an invalid decl, so return failure if that happens. Fixes llvm#167076

[LV] Test more combinations of scalar stores using last lane of IV.

682f292

Extends test coverage to include different start and step values, as well as interleaving.

[libc++][flat_set] Applied [[nodiscard]] (llvm#169739)

df80612

`[[nodiscard]]` should be applied to functions where discarding the return value is most likely a correctness issue. - https://libcxx.llvm.org/CodingGuidelines.html#apply-nodiscard-where-relevant

[clang][bytecode][test] Specify triple for Invalid.cpp

7b813c3

This should unbreak that test on 32-bit builders, e.g. https://lab.llvm.org/buildbot/#/builders/154/builds/24509

[X86][Clang] Allow constexpr evaluation of F16C CVTPS2PH intrinsics (l…

eee09ca

…lvm#162295) Fixes llvm#160312

[SystemZ] Serialize ada entry flags (llvm#169395)

66ca3f1

Adding support for serializing the ada entry flags helps with MIR-based test cases. Without this change, the flags are simply displayed as being "unknown".

RuntimeLibcalls: Add macos unlocked IO functions to systems (llvm#167084

1d7d83d

)

merge main into amd-staging

120deff

z1-cciauto requested a review from a team November 27, 2025 13:19

ronlieb approved these changes Nov 27, 2025

z1-cciauto merged commit 2520866 into amd-staging Nov 27, 2025
13 checks passed

z1-cciauto deleted the upstream_merge_202511270819 branch November 27, 2025 15:58

35 participants
