merge main into amd-staging #724

ronlieb · 2025-12-01T12:48:01Z

No description provided.

@grypp

llvm#170001) Fixes: llvm#169113 Correctly propagate verification failure when `NVVM::RequiresSMInterface` check fails during `gpu.module` verification. Previously, the walk was interrupted but the function returned `success()`, causing a mismatch between the emitted diagnostic and the return status. This led to assertion failures in Python bindings which expect `failure()` when diagnostics are emitted. CC: @grypp

…fo (llvm#168474) Add a "shared_cache_path" key-value to the jGetSharedCacheInfo response, if we can fetch the shared cache path. If debugserver and the inferior process are running with the same shared cache UUID, there is a simple SPI to get debugserver's own shared cache filepath, and we will return that. On newer OSes, there are SPIs we can use to get the inferior process's shared cache filepath; we use those if necessary and if they are available. The response for the jGetSharedCacheInfo packet will now look like `{"shared_cache_base_address":6609256448,"shared_cache_uuid":"B69FF43C-DBFD-3FB1-B4FE-A8FE32EA1062","no_shared_cache":false,"shared_cache_private_cache":false,"shared_cache_path":"/System/Volumes/Preboot/Cryptexes/OS/System/Library/dyld/dyld_shared_cache_arm64e"}` when we have the full information about the shared cache in the inferior. There are three possible types of responses:

1. The inferior has not yet mapped in a shared cache (i.e., when stopped at dyld_start and dyld hasn't started executing yet). In this case, no "shared_cache_path" is listed ("shared_cache_base_address" will be 0, and "shared_cache_uuid" will be the all-zeroes UUID).
2. The inferior has a shared cache, but it is different from debugserver's and we do not have the new SPI to query the shared cache filepath. No "shared_cache_path" is listed.
3. We were able to find the shared cache filepath, and it is included in the response, as above.

I'm not using this information in lldb yet, but changes that build on this will be forthcoming. rdar://148939795
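The three response shapes described above can be told apart on the client side with only the keys shown in the example. The following is a hypothetical Python sketch, not lldb code: `classify_shared_cache_info` and its return labels are invented for illustration, and it assumes an "all-zeroes UUID" means every hex digit is `0`.

```python
import json


def classify_shared_cache_info(response: str) -> str:
    """Map a jGetSharedCacheInfo JSON reply onto the three documented cases (sketch)."""
    info = json.loads(response)
    if "shared_cache_path" in info:
        # Case 3: the filepath was found and included in the reply.
        return "path-known"
    uuid_digits = info.get("shared_cache_uuid", "").replace("-", "")
    if info.get("shared_cache_base_address", 0) == 0 and set(uuid_digits) <= {"0"}:
        # Case 1: no shared cache mapped yet (base address 0, all-zeroes UUID).
        return "not-yet-mapped"
    # Case 2: a shared cache is mapped, but its filepath could not be queried.
    return "mapped-path-unknown"
```

Checking for the presence of the key first keeps the classifier forward compatible: older debugservers simply never send "shared_cache_path".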

`cast<Constant>` is not guarded by a type check during canonicalization of predicates. This patch adds a type check in the outer if to avoid the crash. `dyn_cast` may introduce another nested if, so I just use `isa<Constant>` instead. Address the crash reported in llvm#153053 (comment).

The scalar loop doesn't exist anymore after 8907b6d

Not sure why that didn't exist yet, but we have quite a few places using the same `std::distance` pattern.

The major part of this PR is a commit implementing support for DT_INIT_ARRAY for BOLT runtime library initialization. It also adds a related hook-init test and fixes a couple of X86 instrumentation tests. This commit follows the implementation of the instrumentation hook via DT_FINI_ARRAY (llvm#67348) and extends it to hooking the initialization of BOLT runtime libraries (including the instrumentation library). Initialization differs from finalization in the following ways:

- Executables always use the ELF entry point address. The update code checks this and updates the init_array entry if the ELF file is a shared library (has no interp entry) and has no DT_INIT entry. This commit also introduces a "runtime-lib-init-hook" option to select the primary initialization hook (entry_point, init, init_array), falling back to the next available hook in the input binary. For example, for libc we can explicitly set it to init_array.
- Relocations for shared library init_array entries usually have type R_AARCH64_ABS64 in AArch64 binaries. We check the relocation type and adjust how init_array relocations are read in the discovery and update methods.

--------- Co-authored-by: Vasily Leonenko <vasily.leonenko@huawei.com>
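The "runtime-lib-init-hook" fallback described above can be modeled as a walk over an ordered list of hooks. This is a hypothetical Python sketch, not BOLT code: `select_init_hook` and `HOOK_ORDER` are invented names, and it assumes "next available hook" means moving forward through entry_point, init, init_array without wrapping around.

```python
HOOK_ORDER = ["entry_point", "init", "init_array"]  # order listed for runtime-lib-init-hook


def select_init_hook(preferred, available, is_executable):
    """Pick `preferred` if the binary has it, else the next available hook (sketch)."""
    if is_executable:
        # Executables always go through the ELF entry point.
        return "entry_point"
    start = HOOK_ORDER.index(preferred)
    for hook in HOOK_ORDER[start:]:
        if hook in available:
            return hook
    return None  # no usable hook in this binary
```

For a libc-like shared library with no DT_INIT, asking for `init` would fall through to `init_array`.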

Add support for `arith.cmpf`.

While attempting to remove the use of undef from more loop vectoriser tests, I discovered a bug where this assert was firing:

```
llvm::Constant* llvm::Constant::getSplatValue(bool) const: Assertion `this->getType()->isVectorTy() && "Only valid for vectors!"' failed.
...
#8 0x0000aaaab9e2fba4 llvm::Constant::getSplatValue
#9 0x0000aaaab9dfb844 llvm::ConstantFoldBinaryInstruction
```

This seems to be happening because we are incorrectly generating WidePtrAdd recipes for scalar VFs. The PR fixes this by checking, in legalizeAndOptimizeInductions, whether a plan has only a scalar VF. This PR also removes the use of undef from the test `both` in Transforms/LoopVectorize/iv_outside_user.ll, which is what started triggering the assert. Fixes llvm#169334

-- This commit addresses [follow-up review comments on 169704](llvm#169704 (review)). -- Contains NFC nit/minor changes. Signed-off-by: Abhishek Varma <abhvarma@amd.com>

Addresses a comment on the PR that introduces the ub.unreachable -> spirv.Unreachable lowering (llvm#169872 (comment)).

Add support for `arith.negf`.

As described in section 2.14.6 of the OpenMP spec, the patch implements support for iterator in motion clauses. --------- Co-authored-by: Shashwathi N <nshashwa@pe31.hpc.amslabs.hpecorp.net>

Add support for `arith.minnumf`, `arith.maxnumf`, `arith.minimumf`, `arith.maximumf`.

llvm#170088) …bounds-avoid-unchecked-container-access Missing a trailing underscore to render it as a link. Co-authored-by: Carlos Gálvez <carlos.galvez@zenseact.com>

…ilcalls" (llvm#169881) (llvm#169929) This reapplies commit 5d6d743. Fix: added assertions to the requirements of the test -------- Original commit message: In the Inliner pass, tailcalls are converted to calls in the inlined BasicBlock. If the tailcall is indirect, the `BR` is converted to `BLR`. These instructions require different BTI landing pads at their targets. As the targets of indirect tailcalls are unknown, inlining such blocks is unsound for BTI: they should be skipped instead.

This should ensure that the structurizer while loop is deterministic across runs. Use of `MapVector` addresses the source of the nondeterminism, which is the use of a `Block*` as a map key. Fixes llvm#128547
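LLVM's `MapVector` pairs a map from key to index with a vector of entries, so iteration follows insertion order rather than the order of the keys themselves; a plain hash map keyed on `Block*` iterates in an order that depends on pointer values, which can differ between runs. The Python sketch below only mirrors that layout for illustration (Python dicts already preserve insertion order); the class is a toy, not the LLVM API.

```python
class MapVector:
    """Map whose iteration order is insertion order, independent of the keys' values."""

    def __init__(self):
        self._index = {}  # key -> position in self._items
        self._items = []  # (key, value) pairs in insertion order

    def __setitem__(self, key, value):
        pos = self._index.get(key)
        if pos is None:
            self._index[key] = len(self._items)
            self._items.append((key, value))
        else:
            self._items[pos] = (key, value)  # overwrite in place, keep original position

    def __getitem__(self, key):
        return self._items[self._index[key]][1]

    def __contains__(self, key):
        return key in self._index

    def __len__(self):
        return len(self._items)

    def items(self):
        return list(self._items)
```

Because lookups still go through the index map, the ordering guarantee costs only the extra vector.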

…lvm#166362) This patch extends the MachineSMEABIPass to support ZT0. This is done with the addition of two new states: - `ACTIVE_ZT0_SAVED` * This is used when calling a function that shares ZA, but does not share ZT0 (i.e., no ZT0 attributes) * This state indicates ZT0 must be saved to the save slot, but ZA must remain on, with no lazy save setup - `LOCAL_COMMITTED` * This is used for saving ZT0 in functions without ZA state * This state indicates ZA is off and ZT0 has been saved * This state is general enough to support ZA, but the required transitions have not been implemented† To aid with readability, the state transitions have been reworked to a switch of `transitionFrom(<FromState>).to(<ToState>)`, rather than nested ifs, which helps manage more transitions. † This could be implemented to handle some cases of undefined behavior better.

Previously we were less specific for POINTER/TARGET: encoding that they could alias with (almost) anything. In the new system, the "target data" tree is now a sibling of the other trees (e.g. "global data"). POINTER variables go at the root of the "target data" tree, whereas TARGET variables get their own nodes under that tree. For example,

```
integer, pointer :: ip
real, pointer :: rp
integer, target :: it
integer, target :: it2(:)
real, target :: rt
integer :: i
real :: r
```

- `ip` and `rp` may alias with any variable except `i` and `r`.
- `it`, `it2`, and `rt` may alias only with `ip` or `rp`.
- `i` and `r` cannot alias with any other variable.

Fortran 2023 15.5.2.14 gives restrictions on entities associated with dummy arguments. These do not allow non-target globals to be modified through dummy arguments, and therefore I don't think we need to make all globals alias with dummy arguments.

I haven't implemented it in this patch, but I wonder whether it is ever possible for `ip` to alias with `rt` or even `it2`.

While I was updating the tests I fixed up some tests that still assumed that local alloc TBAA wasn't the default. I found no functional regressions in the gfortran test suite, fujitsu test suite, spec2017, or a selection of HPC apps we test internally.
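The aliasing rules listed above follow from one tree rule: two tags may alias when one is an ancestor of (or the same as) the other. The Python sketch below is a simplified model, not flang's TBAA builder; `TbaaNode` and `may_alias` are hypothetical names, access offsets are ignored, and it assumes `i` and `r` get separate nodes under a sibling "global data" tree.

```python
class TbaaNode:
    """A node in a simplified type-based alias analysis (TBAA) tree."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent

    def ancestors(self):
        """Yield this node and every node above it, up to the root."""
        node = self
        while node is not None:
            yield node
            node = node.parent


def may_alias(a, b):
    # Two tags may alias when one is an ancestor of (or the same as) the other.
    return a in set(b.ancestors()) or b in set(a.ancestors())


root = TbaaNode("root")
target_data = TbaaNode("target data", root)
global_data = TbaaNode("global data", root)  # sibling tree (assumed home of i and r)

# POINTER variables share the root of the "target data" tree.
ip = rp = target_data
# Each TARGET variable gets its own node under that tree.
it, it2, rt = (TbaaNode(n, target_data) for n in ("it", "it2", "rt"))
# The plain variables get their own nodes in a different tree.
i, r = (TbaaNode(n, global_data) for n in ("i", "r"))
```

Placing POINTER variables at the subtree root is what lets them alias every TARGET node while the TARGET nodes stay disjoint from each other.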

A barrier will pause execution until all threads reach it. If some go to a different barrier, then we deadlock. This manifests in that the finalization callback must only be run once. Fix by ensuring we always go through the same finalization block whether the thread is cancelled or not, and no matter which cancellation point causes the cancellation.

The old callback only affected PARALLEL, so it has been moved into the code generating PARALLEL. For this reason, we don't need similar changes for other cancellable constructs. We need to create the barrier on the shared exit from the outlined function instead of only on the cancelled branch, to make sure that threads exiting normally (without cancellation) meet the same barriers as those which were cancelled. For example, previously we might have generated code like

```
...
%ret = call i32 @__kmpc_cancel(...)
%cond = icmp eq i32 %ret, 0
br i1 %cond, label %continue, label %cancel
continue:
  ; do the rest of the callback, eventually branching to %fini
  br label %fini
cancel:
  ; Populated by the callback:
  ; unsafe: if any thread makes it to the end without being cancelled
  ; it won't reach this barrier and then the program will deadlock
  %unused = call i32 @__kmpc_cancel_barrier(...)
  br label %fini
fini:
  ; run destructors etc
  ret
```

In the new version the barrier is moved into fini. I generate it *after* the destructors because the standard describes the barrier as occurring after the end of the parallel region.

```
...
%ret = call i32 @__kmpc_cancel(...)
%cond = icmp eq i32 %ret, 0
br i1 %cond, label %continue, label %cancel
continue:
  ; do the rest of the callback, eventually branching to %fini
  br label %fini
cancel:
  br label %fini
fini:
  ; run destructors etc
  ; safe so long as every exit from the function happens via this block:
  %unused = call i32 @__kmpc_cancel_barrier(...)
  ret
```

To achieve this, the barrier is now generated alongside the finalization code instead of in the callback.
This is the reason for the changes to the unit test. I'm unsure whether I should keep the incorrect barrier generation callback (only on the cancellation branch) in clang with the OMPIRBuilder backend, because that would match clang's ordinary codegen. Right now I have opted to remove it entirely because it is a deadlock waiting to happen.

---

This re-lands llvm#164586 with a small fix for a failing buildbot running address sanitizer on clang lit tests. In the previous version of the patch I added an insertion point guard "just to be safe" and never removed it. There isn't insertion point guarding on the other route out of this function, and we do not preserve the insertion point around getFiniBB either, so it is not needed here. The problem flagged by the sanitizers was because the saved insertion point pointed to an instruction which was then removed inside the FiniCB for some clang codegen functions. The instruction was freed when it was removed. Then accessing it to restore the insertion point was a use-after-free bug.
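The deadlock argument in this description (every thread, cancelled or not, must meet the same barrier exactly once) can be illustrated with an analogy in Python threads. This is only a model using `threading.Barrier`, not the OpenMP runtime or OMPIRBuilder; `run_region` is a hypothetical name and "destructors" are omitted.

```python
import threading


def run_region(num_threads, cancelled):
    """Threads leave a region via a cancel or continue branch, but all exit through one fini."""
    barrier = threading.Barrier(num_threads)
    lock = threading.Lock()
    completed_work, passed_barrier = [], []

    def body(tid):
        if tid not in cancelled:
            with lock:
                completed_work.append(tid)  # "continue" branch: finish the region's work
        # The "cancel" branch skips the work above. Both branches fall through to fini:
        # run destructors (omitted here), then wait at the single shared barrier.
        barrier.wait(timeout=5)  # raises BrokenBarrierError rather than hanging forever
        with lock:
            passed_barrier.append(tid)

    threads = [threading.Thread(target=body, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(completed_work), sorted(passed_barrier)
```

If the barrier were placed only on the cancel branch, the non-cancelled threads would never arrive and the cancelled ones would wait forever (here, until the timeout breaks the barrier).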

As noted in the reproducer provided in llvm#164762 (comment), on RISC-V after LTO we sometimes have trip counts exposed to vectorized loops. The loop vectorizer will have generated calls to @llvm.experimental.get.vector.length, but there are [some properties](https://llvm.org/docs/LangRef.html#id2399) of the intrinsic we can use to simplify it:

- The result is never greater than Count or MaxLanes
- If Count <= MaxLanes, then the result is Count

This teaches SCCP to handle these cases with the intrinsic, which allows some loops that run a single iteration after LTO to be unfolded. llvm#169293 is related and also simplifies the intrinsic in InstCombine via computeKnownBits, but it can't fully remove the loop since computeKnownBits only does limited reasoning on recurrences.
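The two properties listed above translate directly into range reasoning. This Python sketch is an illustration, not SCCP's implementation: `evl_result_range` is a hypothetical name, it assumes MaxLanes (VF * vscale) is a known constant, and it keeps a conservative lower bound of 0 when Count might exceed MaxLanes.

```python
def evl_result_range(count_lo, count_hi, max_lanes):
    """Inclusive bounds on get.vector.length(Count, ...) for Count in [count_lo, count_hi]."""
    if count_hi <= max_lanes:
        # Every possible Count fits in MaxLanes, so the result is exactly Count.
        return (count_lo, count_hi)
    # Otherwise the result is only known to be at most min(Count, MaxLanes).
    return (0, min(count_hi, max_lanes))
```

When the first branch applies, the vector length equals the whole remaining trip count, which is why the loop can be reduced to a single iteration.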

They don't have side-effects, so this should be fine. Fixes llvm#170064

Similar to how getElementCount avoids the need to reason about fixed and scalable ElementCounts separately, this patch adds getTypeSize to do the same for TypeSize. It also goes through and replaces some of the manual uses of getVScale with getTypeSize/getElementCount where possible.

…ll. NFC.

…n into after region (llvm#169892) When a `scf.if` directly precedes a `scf.condition` in the before region of a `scf.while` and both share the same condition, move the if into the after region of the loop. This helps simplify the control flow to enable uplifting `scf.while` to `scf.for`.

During the InsertNegateRAState pass we check the annotations on instructions to decide where to generate the OpNegateRAState CFIs in the output binary. As only instructions in the input binary were annotated, we have to make a judgement on instructions generated by other BOLT passes. Incorrect placement may cause issues when an (async) unwind request is received during the new "unknown" instructions. This patch adds more logic to make a more informed decision by taking into account:

- Unknown instructions in a BasicBlock that also contains known instructions get the same RAState as those instructions. Previously, if the BasicBlock started with an unknown instruction, the RAState was copied from the preceding block. Now, the RAState is copied from the succeeding instructions in the same block.
- Some BasicBlocks may only contain instructions with unknown RAState. As explained in issue llvm#160989, these blocks already have incorrect unwind info. Because of this, the last known RAState based on the layout order is copied.

Updated bolt/docs/PacRetDesign.md to reflect the changes.
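The two rules above can be sketched over a list of blocks in layout order, where each instruction's RAState is a string or `None` for "unknown". This is a hypothetical Python model, not BOLT code: `fill_unknown_ra_states` and the state strings are invented, and the handling of unknown instructions at the *end* of a block (copying the preceding state) is an assumption the description does not cover.

```python
from typing import List, Optional

RAState = Optional[str]  # e.g. "signed" / "unsigned"; None marks an unknown instruction


def fill_unknown_ra_states(blocks: List[List[RAState]], initial: RAState = None) -> List[List[RAState]]:
    """Assign an RAState to every unknown instruction, block by block in layout order."""
    last_known = initial
    result = []
    for block in blocks:
        known = [s for s in block if s is not None]
        if not known:
            # Only unknown instructions: reuse the last known RAState in layout order.
            result.append([last_known] * len(block))
            continue
        filled = list(block)
        # Backward pass: an unknown instruction copies the next known one in the same block.
        next_known = None
        for k in range(len(filled) - 1, -1, -1):
            if filled[k] is None:
                filled[k] = next_known
            else:
                next_known = filled[k]
        # Forward pass (assumption): trailing unknowns copy the preceding state in the block.
        prev = None
        for k in range(len(filled)):
            if filled[k] is None:
                filled[k] = prev
            else:
                prev = filled[k]
        last_known = known[-1]
        result.append(filled)
    return result
```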

…er region in scf-uplift-while-to-for" (llvm#169888) Reverts llvm#165216. It is implemented in llvm#169892.

Reverts llvm#169544 [Regressed](https://lab.llvm.org/buildbot/#/builders/143/builds/12956) gfortran test suite

…70095) From OpenMP 4.0:

> When an if clause is present on a cancel construct and the if expression evaluates to false, the cancel construct does not activate cancellation. The cancellation point associated with the cancel construct is always encountered regardless of the value of the if expression.

This wording is retained unmodified in OpenMP 6.0. This re-opens the already approved PR llvm#164587, which was closed by accident. The only change is a rebase.

…lvm#170096) Similar to fdiv, we should be trying to concat these high latency instructions together

…together. (llvm#170098) Can only do this for 128->256 cases as we can't safely convert to the RCP14/RSQRT14 variants

…lvm#152397) Fixes llvm#71844

On Windows, Python multiprocessing is limited to at most 60 workers: https://github.com/python/cpython/blob/6bc65c30ff1fd0b581a2c93416496fc720bc442c/Lib/concurrent/futures/process.py#L669-L672. Since the limit applies per pool, we can work around it by using multiple pools on Windows when we actually want to use more workers.
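The multiple-pools workaround can be sketched with the standard `concurrent.futures` API. This is a hypothetical illustration, not lit's implementation: `split_workers` and `map_across_pools` are invented names, the cap of 60 is the figure cited above, and work is spread in proportion to pool size so a small leftover pool is not overloaded. As with any process pool on Windows, `func` must be a picklable module-level function and the call must sit under `if __name__ == "__main__":`.

```python
from concurrent.futures import ProcessPoolExecutor
from contextlib import ExitStack

MAX_WORKERS_PER_POOL = 60  # the per-pool limit cited above


def split_workers(total, cap=MAX_WORKERS_PER_POOL):
    """Split `total` workers into pool sizes that each respect `cap`."""
    if total <= 0:
        return []
    full, rest = divmod(total, cap)
    return [cap] * full + ([rest] if rest else [])


def map_across_pools(func, items, total_workers):
    """Run func over items on several pools, spreading items in proportion to pool size."""
    sizes = split_workers(total_workers)
    if not sizes:
        raise ValueError("total_workers must be positive")
    # One slot per worker; each slot belongs to the pool that owns that worker.
    slots = [p for p, n in enumerate(sizes) for _ in range(n)]
    with ExitStack() as stack:
        pools = [stack.enter_context(ProcessPoolExecutor(max_workers=n)) for n in sizes]
        futures = [pools[slots[i % len(slots)]].submit(func, item) for i, item in enumerate(items)]
        return [f.result() for f in futures]
```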

…lvm#170097) We can't read from those and will run into an assertion sooner or later. Fixes llvm#170031

From the review in llvm#169527 (comment), there are some users where we want to extend or truncate a ConstantRange only if it's not already the destination bitwidth. Previously this asserted, so this PR relaxes it to just be a no-op, similar to IRBuilder::CreateZExt and friends.
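The relaxed contract (same width in, same range out) can be shown with a toy model. This Python sketch is not the LLVM `ConstantRange` API: `URange` and `zext_or_trunc` are hypothetical names, and wrapped ranges are ignored so that extension and truncation stay easy to read.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class URange:
    """Toy non-wrapping unsigned range [lo, hi) over `bits`-bit integers."""
    lo: int
    hi: int
    bits: int

    def zext_or_trunc(self, dst_bits: int) -> "URange":
        if dst_bits == self.bits:
            # Already the destination width: return the range unchanged (no assertion).
            return self
        if dst_bits > self.bits:
            # Zero extension keeps every unsigned value, so the bounds carry over.
            return URange(self.lo, self.hi, dst_bits)
        # Truncation is exact only if every value fits in dst_bits; otherwise give up.
        if self.hi <= (1 << dst_bits):
            return URange(self.lo, self.hi, dst_bits)
        return URange(0, 1 << dst_bits, dst_bits)  # full set
```

Callers can then convert unconditionally instead of comparing widths first, which is the convenience the description is after.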

Follow-up for llvm#169047. The previous PR moved some functions from DA to Delinearization, but the member function declarations were not updated accordingly. This patch removes them.

…ther. (llvm#170108)

z1-cciauto · 2025-12-01T12:49:32Z

PSDB Link: https://compiler-ci.amd.com/job/compiler-psdb-amd-staging/3032

Men-cotton and others added 30 commits December 1, 2025 09:50

[LV] Regenerate some check lines. NFC

dc5ce79

The scalar loop doesn't exist anymore after 8907b6d

[clang][AST] Add RecordDecl::getNumFields() (llvm#170022)

bbb0dba

Not sure why that didn't exist yet, but we have quite a few places using the same `std::distance` pattern.

[mlir][arith] Add support for cmpf to ArithToAPFloat (llvm#169753)

4d7abe5

Add support for `arith.cmpf`.

[NFC][Linalg] Follow-up on ConvMatchBuilder (llvm#170080)

7ce7141

-- This commit addresses [follow-up review comments on 169704](llvm#169704 (review)). -- Contains NFC nit/minor changes. Signed-off-by: Abhishek Varma <abhvarma@amd.com>

[mlir][SPIRV] Improve ub.unreachable lowering test case (llvm#170083)

f67b018

Addresses a comment on the PR that introduces the ub.unreachable -> spirv.Unreachable lowering (llvm#169872 (comment)).

[mlir][arith] Add support for negf to ArithToAPFloat (llvm#169759)

05b1989

Add support for `arith.negf`.

Adding support for iterator in motion clauses. (llvm#159112)

9afb651

As described in section 2.14.6 of the OpenMP spec, the patch implements support for iterator in motion clauses. --------- Co-authored-by: Shashwathi N <nshashwa@pe31.hpc.amslabs.hpecorp.net>

[mlir][arith] Add support for min/max to ArithToAPFloat (llvm#169760)

147c466

Add support for `arith.minnumf`, `arith.maxnumf`, `arith.minimumf`, `arith.maximumf`.

[clang-tidy][doc] Fix incorrect link syntax in cppcoreguidelines-pro-… (

eb711d8

llvm#170088) …bounds-avoid-unchecked-container-access Missing a trailing underscore to render it as a link. Co-authored-by: Carlos Gálvez <carlos.galvez@zenseact.com>

[CAS] Temporarily skip tests on old windows version (llvm#170063)

8079d03

[mlir][spirv] Use MapVector for BlockMergeInfoMap (llvm#169636)

dda15ad

This should ensure that the structurizer while loop is deterministic across runs. Use of `MapVector` addresses the source of the nondeterminism, which is the use of a `Block*` as a map key. Fixes llvm#128547

[clang][bytecode] Fix discarding ImplicitValueInitExprs (llvm#170089)

b162099

They don't have side-effects, so this should be fine. Fixes llvm#170064

[RISCV] Remove the duplicate for RV32/RV64 in zicond-fp-select-zfinx.…

b7721c5

…ll. NFC.

Revert "[MLIR][SCF] Sink scf.if from scf.while before region into aft…

2c21790

…er region in scf-uplift-while-to-for" (llvm#169888) Reverts llvm#165216. It is implemented in llvm#169892.

Revert "[flang][TBAA] refine TARGET/POINTER encoding" (llvm#170105)

b60a84a

Reverts llvm#169544 [Regressed](https://lab.llvm.org/buildbot/#/builders/143/builds/12956) gfortran test suite

[X86] Add tests showing failure to concat sqrt intrinsics together. (l…

6c0a02f

…lvm#170096) Similar to fdiv, we should be trying to concat these high latency instructions together

RKSimon and others added 10 commits December 1, 2025 11:28

[X86] Add tests showing failure to concat RCPPS + RSQRTPS intrinsics …

0e721b7

…together. (llvm#170098) Can only do this for 128->256 cases as we can't safely convert to the RCP14/RSQRT14 variants

[WebAssembly] Optimize away mask of 63 for shl ( zext (and i32 63))) (l…

edd1856

…lvm#152397) Fixes llvm#71844
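Dropping the `and 63` in the title's pattern is sound because WebAssembly's `i64.shl` only uses the shift count modulo 64, and the zext preserves the low bits of the count. The Python sketch below models those semantics to make the equivalence concrete; the function names are hypothetical and this is not the WebAssembly backend code.

```python
MASK64 = (1 << 64) - 1


def wasm_i64_shl(x, count):
    """WebAssembly i64.shl: the shift count is taken modulo 64."""
    return (x << (count % 64)) & MASK64


def shl_with_mask(x, s32):
    # shl x, (zext (and s32, 63)): explicit mask before the shift
    return wasm_i64_shl(x, s32 & 63)


def shl_without_mask(x, s32):
    # shl x, (zext s32): rely on the instruction's own modulo-64 behavior
    return wasm_i64_shl(x, s32)
```

For any non-negative count, `s & 63 == s % 64`, so both versions compute the same value and the mask is redundant.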

[MLIR] Fix build after llvm#169982 (llvm#170107)

130746a

[clang][bytecode] Check memcmp builtin for one-past-the-end pointers (l…

48931e5

…lvm#170097) We can't read from those and will run into an assertion sooner or later. Fixes llvm#170031

[DA] Clean up unnecessary member function declarations (llvm#170106)

5877020

Follow-up for llvm#169047. The previous PR moved some functions from DA to Delinearization, but the member function declarations were not updated accordingly. This patch removes them.

[MLIR|BUILD]: Fix for 8ceeba8 (llvm#170110)

6157d46

[X86] Add tests showing failure to concat fp rounding intrinsics toge…

989ac4c

…ther. (llvm#170108)

merge main into amd-staging

efdebf8

ronlieb requested review from a team and dpalermo December 1, 2025 12:48

ronlieb requested review from antiagainst, kuhar and nicolasvasilache as code owners December 1, 2025 12:48

ronlieb removed request for antiagainst, kuhar and nicolasvasilache December 1, 2025 14:29

dpalermo approved these changes Dec 1, 2025


z1-cciauto merged commit 8f6a28b into amd-staging Dec 1, 2025
14 checks passed

z1-cciauto deleted the amd/merge/upstream_merge_20251201062109 branch December 1, 2025 15:25

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

merge main into amd-staging #724

merge main into amd-staging #724

Uh oh!

ronlieb commented Dec 1, 2025

Uh oh!

z1-cciauto commented Dec 1, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

28 participants

merge main into amd-staging #724

merge main into amd-staging #724

Uh oh!

Conversation

ronlieb commented Dec 1, 2025

Uh oh!

z1-cciauto commented Dec 1, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

28 participants