
Conversation

@z1-cciauto
Collaborator

No description provided.

charithaintc and others added 30 commits November 26, 2025 10:10
…extract/insert_strided_slice` (llvm#168626)

This PR adds general SIMT distribution support for
`vector.extract/insert_strided_slice`. Vector distribution already has
support for these operations, but with restrictions that avoid requiring
layouts in the distribution logic. For example, `extract_strided_slice`
requires that the distributed dimension is fully extracted. However, more
complex cases may require extracting partially from the distributed
dimension (e.g. an 8x16xf16 extraction from 8x32xf16). These cases need
the layouts to reason about how the data is spread across SIMT lanes.

Currently we don't have layout access in vector distribution, so these
new patterns are placed on the XeGPU side. They are given a higher pattern
benefit so that they are tried before the regular vector-distribution-based
patterns.
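
A minimal sketch of how such a higher-benefit pattern could be registered with MLIR's pattern infrastructure; the pattern and function names here are illustrative assumptions, not the actual upstream XeGPU code:
```cpp
#include "mlir/Dialect/Vector/IR/VectorOps.h"
#include "mlir/IR/PatternMatch.h"

// Illustrative pattern name; the real XeGPU patterns differ.
struct DistributeExtractStridedSlice final
    : public mlir::OpRewritePattern<mlir::vector::ExtractStridedSliceOp> {
  DistributeExtractStridedSlice(mlir::MLIRContext *ctx,
                                mlir::PatternBenefit benefit)
      : OpRewritePattern(ctx, benefit) {}

  mlir::LogicalResult
  matchAndRewrite(mlir::vector::ExtractStridedSliceOp op,
                  mlir::PatternRewriter &rewriter) const override {
    // Layout-aware SIMT distribution of the partially extracted dimension
    // would go here.
    return mlir::failure();
  }
};

void populateLayoutAwareDistributionPatterns(mlir::RewritePatternSet &patterns) {
  // Benefit > 1 so this pattern is tried before the generic
  // vector-distribution patterns, which use the default benefit of 1.
  patterns.add<DistributeExtractStridedSlice>(patterns.getContext(),
                                              /*benefit=*/10);
}
```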
Run:
```shell
build/bin/llvm-exegesis -mode=latency -mtriple=riscv64-unknown-linux-gnu --mcpu=generic --benchmark-phase=assemble-measured-code -opcode-index=-1
```

error:
```
---
mode:            latency
key:
  instructions:
    - 'NDS_FMV_BF16_X F2_H X11'
    - 'NDS_FMV_X_BF16 X26 F2_H'
  config:          ''
  register_initial_values:
    - 'X11=0x0'
cpu_name:        generic
llvm_triple:     riscv64-unknown-linux-gnu
min_instructions: 10000
measurements:    []
error:           actual measurements skipped.
info:            Repeating two instructions
assembled_snippet: 41116AE48145538105F0530D01E0538105F0530D01E0538105F0530D01E0538105F0530D01E0226D41018280
...
LLVM ERROR: Attempting to emit FMV_H_X instruction but the Feature_HasHalfFPLoadStoreMove predicate(s) are not met
```
Partial reductions can easily be represented by the VPReductionRecipe
class by setting their scale factor to something greater than 1. This PR
merges the two representations and gives VPReductionRecipe a VFScaleFactor
so that it can choose to generate the partial-reduction intrinsic at
execute time.
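
A rough sketch of that execute-time choice, heavily simplified and with assumed names (the real VPlan recipe code is more involved):
```cpp
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Intrinsics.h"

// Assumed helper, not the actual VPReductionRecipe::execute(): a scale factor
// greater than 1 means the accumulator is narrower than the input vector, so
// the partial-reduction intrinsic is emitted instead of a full reduction.
llvm::Value *emitReduction(llvm::IRBuilderBase &B, unsigned VFScaleFactor,
                           llvm::Value *Acc, llvm::Value *VecOp) {
  if (VFScaleFactor > 1)
    return B.CreateIntrinsic(
        Acc->getType(),
        llvm::Intrinsic::experimental_vector_partial_reduce_add, {Acc, VecOp});
  // Scale factor 1: an ordinary horizontal add reduction.
  return B.CreateAddReduce(VecOp);
}
```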

Stacked PRs:
1. llvm#147026
2. llvm#147255
3. llvm#156976
4. llvm#160154
5. llvm#147302
6. llvm#162503
7. -> llvm#147513

Replaces llvm#146073 .
…llvm#169653)

This patch reverts 80a4e6f

After the relevant patches, clang now supports DWARF fission with RISC-V
linker relaxations, so we can remove the related driver error.
This PR enables maximising scalable vector bandwidth for all AArch64
cores other than the V1 and N2. Those two have shown small regressions
that we'll investigate and fix before enabling it for them as well.
Checks whether an instruction is a BTI, and updates the immediate value to
the newly requested variant.

This can be used when the compiler has already inserted a BTI landing pad
at a location, but BOLT needs to update it to a different variant.
Example: a `br x0` to a location guarded by a `BTI c`.
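
For context, BTI variants are HINT aliases, so updating the variant amounts to rewriting the hint immediate. A small sketch of the mapping (the enum and helper are illustrative, not BOLT's interface):
```cpp
// BTI variants and their AArch64 HINT immediates.
enum class BTIVariant { Plain, C, J, JC };

unsigned btiHintImmediate(BTIVariant V) {
  switch (V) {
  case BTIVariant::Plain: return 32; // bti
  case BTIVariant::C:     return 34; // bti c  (targets of blr / br x16, x17)
  case BTIVariant::J:     return 36; // bti j  (targets of br)
  case BTIVariant::JC:    return 38; // bti jc (targets of both)
  }
  return 32;
}
```
In the example above, `br x0` is an indirect branch through a general register, so a compiler-inserted `BTI c` landing pad would need to be rewritten to `BTI jc`.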
Based on the RUN lines, there is actually no need for different versions
of these error files, since no CPU-specific option is needed. Combine them
to reduce confusion and maintenance, as these are not huge files.
…en the loop contains control convergence operations. (llvm#165643)

Skip constant folding the loop predicates if the loop contains control
convergence tokens referenced outside the loop.

Fixes llvm#164496.

Verified
[loop_peeling.test](llvm/offload-test-suite#473)
passes with the fix.

Similar control-convergence issues have been found in other passes; see
llvm#165642.

HLSL used for tests:
```hlsl
RWStructuredBuffer<uint> Out : register(u0);

[numthreads(8,1,1)]
void main(uint3 TID : SV_GroupThreadID) {
    for (uint i = 0; i < 8; i++) {
        if (i == TID.x) {
            Out[TID.x] = WaveActiveMax(TID.x);
            break;
        }
    }
}
```
With nested loop:
```hlsl
RWStructuredBuffer<uint> Out : register(u0);

[numthreads(8,8,1)]
void main(uint3 TID : SV_GroupThreadID) {
    for (uint i = 0; i < 8; i++) {
        for (uint j = 0; j < 8; j++) {
            if (i == TID.x && j == TID.y) {
                uint index = TID.x * 8 + TID.y;
                Out[index] = WaveActiveMax(index);
                break;
            }
        }
    }
}
```
…llvm#167011)

This is a branch off of
llvm#159856 and consists of
the runtime portion of the changes required to support indirect-function
and virtual-function calls on an `omp target` device when the virtual
class / indirect function is mapped to the device from the host.

Key Changes

- Introduced a new flag OMP_DECLARE_TARGET_INDIRECT_VTABLE to mark
VTable registrations
- Modified setupIndirectCallTable to support both VTable entries and
indirect function pointers

Details:
The setupIndirectCallTable implementation was modified to support this
registration type by retrieving the first address of the VTable and
inferring the remaining data needed to build the indirect call table.
Since the VTables / classes registered as indirect can be larger than 8
bytes, and the vtable may not be at the first address, we either need to
pass the size to `__llvm_omp_indirect_call_lookup` and add a check at
each step of the binary search, or add multiple entries to the indirect
table, one for each address registered. The latter was chosen.
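
A minimal sketch of that expansion under assumed types (one pointer-sized entry per VTable slot, so the existing binary search needs no size parameter):
```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Illustrative entry type; the real runtime table layout differs.
struct IndirectCallEntry {
  uintptr_t HostAddr;
  uintptr_t DeviceAddr;
};

void appendVTableEntries(std::vector<IndirectCallEntry> &Table,
                         uintptr_t HostBase, uintptr_t DeviceBase,
                         std::size_t SizeInBytes) {
  // One entry per pointer-sized slot of the registered VTable.
  for (std::size_t Off = 0; Off < SizeInBytes; Off += sizeof(void *))
    Table.push_back({HostBase + Off, DeviceBase + Off});
  // Keep the table sorted by host address so lookups can binary-search it.
  std::sort(Table.begin(), Table.end(),
            [](const IndirectCallEntry &A, const IndirectCallEntry &B) {
              return A.HostAddr < B.HostAddr;
            });
}
```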

Commit a00def3 is not part of this
PR and is handled / reviewed in
llvm#159856.

This is PR (2/3) 
Register Vtable PR (1/3):
llvm#159856,
Codegen / _llvm_omp_indirect_call_lookup PR (3/3):
llvm#159857
Resolves llvm#160514

Enables usage of the following x86 intrinsics in `constexpr`:

```
_mm256_shuffle_i64x2 _mm256_mask_shuffle_i64x2  _mm256_maskz_shuffle_i64x2 
_mm256_shuffle_f64x2 _mm256_mask_shuffle_f64x2  _mm256_maskz_shuffle_f64x2 
_mm512_shuffle_i64x2 _mm512_mask_shuffle_i64x2  _mm512_maskz_shuffle_i64x2 
_mm512_shuffle_f64x2 _mm512_mask_shuffle_f64x2  _mm512_maskz_shuffle_f64x2 

_mm256_shuffle_i32x4 _mm256_mask_shuffle_i32x4  _mm256_maskz_shuffle_i32x4 
_mm256_shuffle_f32x4 _mm256_mask_shuffle_f32x4  _mm256_maskz_shuffle_f32x4 
_mm512_shuffle_i32x4 _mm512_mask_shuffle_i32x4  _mm512_maskz_shuffle_i32x4 
_mm512_shuffle_f32x4 _mm512_mask_shuffle_f32x4  _mm512_maskz_shuffle_f32x4 
```
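
A small usage sketch, assuming the `_mm256_set_epi64x` / `_mm256_extract_epi64` helpers are also constexpr-enabled in this Clang build and that AVX-512F/VL are enabled at compile time:
```cpp
#include <immintrin.h>

// Build two vectors of 64-bit elements [0,1,2,3] and [4,5,6,7].
constexpr __m256i A = _mm256_set_epi64x(3, 2, 1, 0);
constexpr __m256i B = _mm256_set_epi64x(7, 6, 5, 4);

// imm = 0b01: low 128-bit lane taken from A's high half, high lane from B's
// low half, giving elements [2,3,4,5].
constexpr __m256i R = _mm256_shuffle_i64x2(A, B, 0b01);

static_assert(_mm256_extract_epi64(R, 0) == 2);
static_assert(_mm256_extract_epi64(R, 3) == 5);
```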
Add the cir::exp2 operation and handling for the related builtins.
Fixes -Wunused-variable when compiling without LLVM_ENABLE_THREADS
A deduced return type can be an object type, in which case `const` can
have an effect.
Delay the diagnostic to the point at which the type is deduced. 
Add tests for lambdas.

Fixes llvm#43054

Note that there is a discussion in llvm#43054 about adding a separate
warning for "const return types are weird" for the class-type cases, but
it would have to be a separate warning, one which currently exists in
clang-tidy as `readability-const-return-type`.
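
Illustrative cases of the behaviour described above (assumed examples, not taken from the patch's tests):
```cpp
// With the diagnostic delayed, whether 'const' is flagged depends on what the
// return type deduces to.
struct S { int x; };

const auto f() { return 42; }   // deduces const int: 'const' on a scalar
                                // return has no effect -> warn after deduction
const auto g() { return S{1}; } // deduces const S: 'const' is meaningful for
                                // a class prvalue, so no warning
```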
Implement CountOf on VariableArrayType with IntegerConstant SizeExpr
A couple of builtin helper functions were taking a clang::Expr argument
but only using it to build an MLIR location. This change updates these
functions to take a location directly.
…lvm#163653)

## Summary:
This change introduces a `DAPSessionManager` to enable multiple DAP
sessions to share debugger instances when needed, for things like child
process debugging and scripting hooks that dynamically create new
targets.

Changes include:
- Add `DAPSessionManager` singleton to track and coordinate all active DAP
sessions
- Support attaching to an existing target via its globally unique target
ID (targetId parameter)
- Share debugger instances across sessions when new targets are created
dynamically
- Refactor event thread management to allow sharing event threads
between sessions and move event thread and event thread handlers to `EventHelpers`
- Add `eBroadcastBitNewTargetCreated` event to notify when new targets are
created
- Extract session names from target creation events
- Defer debugger initialization from 'initialize' request to
'launch'/'attach' requests. The only time the debugger is used currently
in between its creation in `InitializeRequestHandler` and the `Launch`
or `Attach` requests is during the `TelemetryDispatcher` destruction
call at the end of the `DAP::HandleObject` call, so this is safe.

This enables scenarios in which new targets are created dynamically: the
debug adapter can automatically start a new debug session for the spawned
target while sharing the debugger instance.
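
A minimal sketch of the coordination idea, with placeholder types; the real `DAPSessionManager` has a richer interface:
```cpp
#include <map>
#include <memory>
#include <mutex>
#include <string>

struct Debugger {}; // stand-in for the shared debugger instance

// Process-wide singleton that maps globally unique target ids to the debugger
// that owns them, so a session spawned for a dynamically created target can
// reuse that debugger instead of creating its own.
class SessionManager {
public:
  static SessionManager &instance() {
    static SessionManager mgr;
    return mgr;
  }

  // Called when a session creates a target: remember which debugger owns it.
  void registerTarget(const std::string &targetId,
                      std::shared_ptr<Debugger> debugger) {
    std::lock_guard<std::mutex> lock(mutex_);
    owners_[targetId] = std::move(debugger);
  }

  // Called by a new session attaching via the targetId parameter: share the
  // existing debugger if the target is known, otherwise create a fresh one.
  std::shared_ptr<Debugger> debuggerForTarget(const std::string &targetId) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = owners_.find(targetId);
    if (it != owners_.end())
      return it->second;
    return std::make_shared<Debugger>();
  }

private:
  std::mutex mutex_;
  std::map<std::string, std::shared_ptr<Debugger>> owners_;
};
```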

## Tests:
The refactoring maintains backward compatibility. All existing DAP test
cases pass.

Also added a few basic unit tests for `DAPSessionManager`:
```
>> ninja DAPTests
>> ./tools/lldb/unittests/DAP/DAPTests
>> ./bin/llvm-lit -v ../llvm-project/lldb/test/API/tools/lldb-dap/
```
Both `Target::ReadSignedIntegerFromMemory()` and
`Process::ReadSignedIntegerFromMemory()` internally created an unsigned
scalar, so extending the value later did not duplicate the sign bit.
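
A toy illustration of this bug class, not the lldb `Scalar` API itself:
```cpp
#include <cassert>
#include <cstdint>

int main() {
  uint8_t byteFromMemory = 0xFF; // represents the signed value -1

  // Buggy path: widen as unsigned first, then treat the result as signed.
  uint64_t widenedUnsigned = byteFromMemory;              // 0x00000000000000FF
  int64_t wrong = static_cast<int64_t>(widenedUnsigned);  // 255, not -1

  // Intended path: sign-extend from the original width.
  int64_t right = static_cast<int8_t>(byteFromMemory);    // -1

  assert(wrong == 255 && right == -1);
  return 0;
}
```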
…169042)

* Added missing cluster.load ops with different sizes. Extended all
rocdl tests
…folding. (llvm#149042)"

This reverts commit a6edeed.

The following fixes have landed, addressing issues causing the original
revert:
* llvm#169298
* llvm#167897
* llvm#168949

Original message:
Building on top of llvm#148817,
introduce a new abstract LastActiveLane opcode that gets lowered to
Not(Mask) → FirstActiveLane(NotMask) → Sub(result, 1).

When folding the tail, update all extracts for uses outside the loop to
extract the value of the last active lane.

See also llvm#148603

PR: llvm#149042
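
A scalar model of that lowering, assuming the tail-folded mask has the form [1,…,1,0,…,0] with at least one active lane:
```cpp
#include <cassert>
#include <vector>

std::size_t firstActiveLane(const std::vector<bool> &m) {
  for (std::size_t i = 0; i < m.size(); ++i)
    if (m[i])
      return i;
  return m.size();
}

std::size_t lastActiveLane(const std::vector<bool> &m) {
  // Not(Mask)
  std::vector<bool> notM(m.size());
  for (std::size_t i = 0; i < m.size(); ++i)
    notM[i] = !m[i];
  // FirstActiveLane(NotMask) -> Sub(result, 1)
  return firstActiveLane(notM) - 1;
}

int main() {
  std::vector<bool> mask = {true, true, true, false}; // 3 active lanes
  assert(lastActiveLane(mask) == 2);
}
```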
While taking a look at the code of the lldb test-suite packages, I
noticed that in `get_triple_str` in `darwin.py`, `env` is added to a
`components` list, which is probably supposed to be `component` (defined
on line 61).

Signed-off-by: Nikita B <n2h9z4@gmail.com>
…or target (llvm#168273)

This PR fixes llvm#167388.

## Description

This PR adds a new method `GetArchName` to `SBTarget` so that client code
no longer needs to parse the triple to get the architecture name.
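
A minimal usage sketch, assuming `GetArchName` returns a C string like other `SBTarget` getters:
```cpp
#include <cstdio>

#include "lldb/API/SBDebugger.h"
#include "lldb/API/SBTarget.h"

void printArch(lldb::SBDebugger &debugger) {
  lldb::SBTarget target = debugger.CreateTarget("/path/to/a.out");
  // Previously the client had to call GetTriple() and split on '-' itself.
  const char *arch = target.GetArchName();
  if (arch)
    std::printf("arch: %s\n", arch);
}
```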

## Testing

### All from `TestTargetAPI.py`

run test with

```
./build/bin/lldb-dotest -v -p TestTargetAPI.py
```
<details>
<summary>existing tests (without newly added)</summary>
<img width="1425" height="804" alt="image"
src="https://github.com/user-attachments/assets/617e4c69-5c6b-44c4-9aeb-b751a47e253c"
/>
</details>

<details>
<summary>existing tests (with newly added)</summary>
<img width="1422" height="778" alt="image"
src="https://github.com/user-attachments/assets/746990a1-df88-4348-a090-224963d3c640"
/>

</details>

### Only `test_get_arch_name`

run test with 
```
./build/bin/lldb-dotest -v -p TestTargetAPI.py -f test_get_arch_name_dwarf -f test_get_arch_name_dwo -f test_get_arch_name_dsym lldb/test/API/python_api/target

```
<details>
<summary>only newly added</summary>
<img width="1422" height="778" alt="image"
src="https://github.com/user-attachments/assets/fcaafa5d-2622-4171-acee-e104ecee0652"
/>
</details>

---------

Signed-off-by: Nikita B <n2h9z4@gmail.com>
Co-authored-by: Jonas Devlieghere <jonas@devlieghere.com>
…en tail-folding. (llvm#149042)""

This reverts commit 72e51d3.

Missed some test updates.
A recent change introduced a failure in debug builds due to an incorrect
level of indirection inside an assert. This fixes that.
…folding. (llvm#149042)"

This reverts commit a6edeed.

The following fixes have landed, addressing issues causing the original
revert:
* llvm#169298
* llvm#167897
* llvm#168949

Original message:
Building on top of llvm#148817,
introduce a new abstract LastActiveLane opcode that gets lowered to
Not(Mask) → FirstActiveLane(NotMask) → Sub(result, 1).

When folding the tail, update all extracts for uses outside the loop to
extract the value of the last active lane.

See also llvm#148603

PR: llvm#149042
`[[nodiscard]]` should be applied to functions where discarding the
return value is most likely a correctness issue.

- https://libcxx.llvm.org/CodingGuidelines.html#apply-nodiscard-where-relevant
H-G-Hristov and others added 4 commits November 26, 2025 22:17
…m#169611)

https://wg21.link/#support

`[[nodiscard]]` should be applied to functions where discarding the
return value is most likely a correctness issue.

- https://libcxx.llvm.org/CodingGuidelines.html#apply-nodiscard-where-relevant

The following was implemented in this patch:

- [x] `<compare>`
- [x] `<coroutine>`
- [x] `<initializer_list>`
- [x] Integer comparisons (see the sketch below)
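
A minimal illustration of the integer-comparisons item (assuming the `std::cmp_*` helpers are among the annotated functions):
```cpp
#include <utility>

void check(long a, unsigned b) {
  std::cmp_less(a, b);           // warning: result of nodiscard call discarded
  bool ok = std::cmp_less(a, b); // fine: the result is used
  (void)ok;
}
```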

---------

Co-authored-by: Hristo Hristov <zingam@outlook.com>
Co-authored-by: A. Jiang <de34@live.cn>
This basically adds a Leave option for a specific range of literals.
This supports the following use cases:
- ConstantPtrAuth expressions that are unrepresentable using standard PAuth
  relocations such as expressions involving an integer operand or
  deactivation symbols.
- libc implementations that do not support PAuth relocations.

For more information see the RFC:
https://discourse.llvm.org/t/rfc-structure-protection-a-family-of-uaf-mitigation-techniques/85555

Reviewers: MaskRay, fmayer, smithp35, kovdan01

Reviewed By: fmayer

Pull Request: llvm#133533
@z1-cciauto z1-cciauto requested a review from a team November 26, 2025 20:30
@z1-cciauto
Collaborator Author

@ronlieb
Collaborator

ronlieb commented Nov 26, 2025

!PSDB

@z1-cciauto
Collaborator Author

@z1-cciauto z1-cciauto merged commit a470909 into amd-staging Nov 27, 2025
13 checks passed
@z1-cciauto z1-cciauto deleted the upstream_merge_202511261530 branch November 27, 2025 01:24