forked from llvm/llvm-project
merge main into amd-staging #695
Merged (+7,011 −2,749)
…extract/insert_strided_slice` (llvm#168626) This PR adds general SIMT distribution support for `vector.extract/insert_strided_slice`. Vector distribution already supports these operations, but with restrictions that avoid requiring layouts during distribution. For example, `extract_strided_slice` requires that the distributed dimension be fully extracted. More complex cases, however, may extract only part of the distributed dimension (e.g. an 8x16xf16 extraction from 8x32xf16), and these need the layouts to reason about how the data is spread across SIMT lanes. Because vector distribution currently has no access to layouts, the new patterns are placed on the XeGPU side. They have a higher pattern benefit, so they are tried before the regular vector-distribution patterns.
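To see why a layout is needed for partial extraction, here is a minimal sketch under a hypothetical layout: the distributed (column) dimension of an 8x32 vector is spread round-robin over 16 lanes, so lane `l` owns columns `l` and `l + 16` (an 8x2 fragment per lane). The struct and function names are illustrative assumptions, not XeGPU's layout attribute API. The function reports which of a lane's local columns fall inside an extracted column range.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical layout along the distributed dimension: column c is owned by
// lane (c % numLanes) and stored in that lane's local slot (c / numLanes).
struct RoundRobinLayout {
  int64_t numLanes;
};

// For one lane, return the local slots covered by the extracted column range
// [offset, offset + size). Rows are not distributed, so only columns matter.
std::vector<int64_t> localSlotsForSlice(const RoundRobinLayout &layout,
                                        int64_t lane, int64_t offset,
                                        int64_t size) {
  std::vector<int64_t> slots;
  for (int64_t col = offset; col < offset + size; ++col)
    if (col % layout.numLanes == lane)
      slots.push_back(col / layout.numLanes);
  return slots;
}
```

Under this assumed layout, extracting columns [16, 32) (the 8x16-from-8x32 case) leaves every lane with only its slot 1. A pattern that cannot see the layout has no way to derive this, which is why the restricted patterns require the distributed dimension to be extracted in full.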
Running
```shell
build/bin/llvm-exegesis -mode=latency -mtriple=riscv64-unknown-linux-gnu --mcpu=generic --benchmark-phase=assemble-measured-code -opcode-index=-1
```
fails with the following error:
```
---
mode: latency
key:
instructions:
- 'NDS_FMV_BF16_X F2_H X11'
- 'NDS_FMV_X_BF16 X26 F2_H'
config: ''
register_initial_values:
- 'X11=0x0'
cpu_name: generic
llvm_triple: riscv64-unknown-linux-gnu
min_instructions: 10000
measurements: []
error: actual measurements skipped.
info: Repeating two instructions
assembled_snippet: 41116AE48145538105F0530D01E0538105F0530D01E0538105F0530D01E0538105F0530D01E0226D41018280
...
LLVM ERROR: Attempting to emit FMV_H_X instruction but the Feature_HasHalfFPLoadStoreMove predicate(s) are not met
```
Partial reductions can easily be represented by the VPReductionRecipe class by setting their scale factor to something greater than 1. This PR merges the two and gives VPReductionRecipe a VFScaleFactor so that it can choose to generate the partial reduction intrinsic at execute time.

Stacked PRs:
1. llvm#147026
2. llvm#147255
3. llvm#156976
4. llvm#160154
5. llvm#147302
6. llvm#162503
7. -> llvm#147513

Replaces llvm#146073.
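As a rough model of what a scale factor greater than 1 means, the sketch below adds an input of `N` elements into an accumulator with only `N / scale` lanes. The function name is hypothetical, and the grouping it uses (input element `i` goes to accumulator lane `i % (N / scale)`) is one arbitrary choice made for this sketch. Code should rely only on the total across all accumulator lanes, never on which lane received which element.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Model of a partial add-reduction with VF scale factor `scale`:
// `input` has N elements, `acc` has N / scale lanes.
std::vector<int64_t> partialReduceAdd(std::vector<int64_t> acc,
                                      const std::vector<int64_t> &input,
                                      std::size_t scale) {
  if (scale == 0 || acc.size() * scale != input.size())
    throw std::invalid_argument("input size must equal acc size * scale");
  for (std::size_t i = 0; i < input.size(); ++i)
    acc[i % acc.size()] += input[i]; // one possible grouping of input lanes
  return acc;
}
```

With `scale == 1` this degenerates to a lane-wise add, i.e. one step of an ordinary vector reduction. That is why a single recipe parameterized by the scale factor can cover both kinds.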
…llvm#169653) This patch reverts 80a4e6f. After the relevant patches, clang now supports DWARF fission with RISC-V linker relaxations, so the related driver error can be removed.
This PR enables maximising scalable vector bandwidth for all AArch64 cores other than the V1 and N2. Those two have shown small regressions that we'll investigate, fix and then enable.
Checks if an instruction is BTI, and updates the immediate value to the newly requested variant. This can be used in situations when the compiler already inserted a BTI landing pad to a location, but BOLT needs to update it to a different variant. Example: br x0 to a location with a BTI c.
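For reference, AArch64 encodes the BTI variants in the `HINT` space (`HINT #32` for plain `BTI`, `#34` for `BTI c`, `#36` for `BTI j`, `#38` for `BTI jc`). The sketch below works directly on the 32-bit instruction word. Its function names are hypothetical and do not reflect BOLT's `MCPlusBuilder` interface, which operates on `MCInst`. Under the usual BTYPE rules, a `br x0` from a guarded page needs a `j`-compatible landing pad, so in the example above a `BTI c` would become `BTI jc`, which also keeps call sites working.

```cpp
#include <cstdint>
#include <optional>

// BTI variants as HINT immediates: HINT #(32 | (j << 2) | (c << 1)).
enum class BTIKind : uint32_t { None = 32, C = 34, J = 36, JC = 38 };

constexpr uint32_t kHintBase = 0xD503201F;      // HINT #0 (NOP)
constexpr uint32_t kHintImmMask = 0x7Fu << 5;   // CRm:op2 in bits [11:5]

// Returns the BTI variant if `insn` is a BTI instruction.
std::optional<BTIKind> decodeBTI(uint32_t insn) {
  if ((insn & ~kHintImmMask) != kHintBase)
    return std::nullopt; // not in the HINT space
  uint32_t imm = (insn & kHintImmMask) >> 5;
  if (imm != 32 && imm != 34 && imm != 36 && imm != 38)
    return std::nullopt; // another hint, e.g. NOP or PACIASP
  return static_cast<BTIKind>(imm);
}

constexpr uint32_t encodeBTI(BTIKind kind) {
  return kHintBase | (static_cast<uint32_t>(kind) << 5);
}

// Rewrite an existing BTI to `newKind`; non-BTI words are left untouched.
bool updateBTIVariant(uint32_t &insn, BTIKind newKind) {
  if (!decodeBTI(insn))
    return false;
  insn = encodeBTI(newKind);
  return true;
}
```

Because `decodeBTI` rejects every other HINT-space instruction, the update fires only on an existing landing pad.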
Based on the RUN lines, there is no need for separate versions of these error files, since no CPU-specific option is needed. Combine them to reduce confusion and maintenance; the files are not large.
…en the loop contains control convergence operations. (llvm#165643) Skip constant folding the loop predicates if the loop contains control convergence tokens referenced outside the loop.

Fixes llvm#164496. Verified that [loop_peeling.test](llvm/offload-test-suite#473) passes with the fix. Similar control convergence issues are found in other passes (llvm#165642).

HLSL used for tests:

```hlsl
RWStructuredBuffer<uint> Out : register(u0);

[numthreads(8,1,1)]
void main(uint3 TID : SV_GroupThreadID) {
  for (uint i = 0; i < 8; i++) {
    if (i == TID.x) {
      Out[TID.x] = WaveActiveMax(TID.x);
      break;
    }
  }
}
```

With a nested loop:

```hlsl
RWStructuredBuffer<uint> Out : register(u0);

[numthreads(8,8,1)]
void main(uint3 TID : SV_GroupThreadID) {
  for (uint i = 0; i < 8; i++) {
    for (uint j = 0; j < 8; j++) {
      if (i == TID.x && j == TID.y) {
        uint index = TID.x * 8 + TID.y;
        Out[index] = WaveActiveMax(index);
        break;
      }
    }
  }
}
```
…llvm#167011) This is a branch off of llvm#159856 and consists of the runtime portion of the changes required to support indirect function and virtual function calls on an `omp target device` when the virtual class / indirect function is mapped to the device from the host.

Key changes:
- Introduced a new flag OMP_DECLARE_TARGET_INDIRECT_VTABLE to mark VTable registrations.
- Modified setupIndirectCallTable to support both VTable entries and indirect function pointers.

Details: The setupIndirectCallTable implementation was modified to support this registration type by retrieving the first address of the VTable and inferring the remaining data needed to build the indirect call table. Since the VTables / classes registered as indirect can be larger than 8 bytes, and the VTables may not be at the first address, we either need to pass the size to __llvm_omp_indirect_call_lookup and check it at each step of the binary search, or add multiple entries to the indirect table for each registered address. The latter was chosen.

Commit a00def3 is not part of this PR; it is handled and reviewed in llvm#159856.

This is PR (2/3):
- Register VTable PR (1/3): llvm#159856
- Codegen / _llvm_omp_indirect_call_lookup PR (3/3): llvm#159857
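The chosen trade-off, one table entry per registered address instead of a size-aware binary search, can be sketched as follows. The class and method names, the sorted-vector layout, and the 8-byte slot size are assumptions for illustration only; they do not mirror the libomptarget data structures.

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical host-to-device address table, sorted by host address.
class IndirectCallTable {
public:
  // Register a single indirect function pointer.
  void registerFunction(uint64_t hostAddr, uint64_t deviceAddr) {
    entries.emplace_back(hostAddr, deviceAddr);
  }

  // Register a vtable of `numSlots` pointer-sized slots by adding one entry
  // per slot, so lookups never need to know the vtable's size.
  void registerVTable(uint64_t hostBase, uint64_t deviceBase,
                      uint64_t numSlots, uint64_t slotSize = 8) {
    for (uint64_t i = 0; i < numSlots; ++i)
      entries.emplace_back(hostBase + i * slotSize, deviceBase + i * slotSize);
  }

  // Call once after all registrations so lookups can binary-search.
  void finalize() { std::sort(entries.begin(), entries.end()); }

  // Exact-match binary search; returns 0 if the address is unknown.
  uint64_t lookup(uint64_t hostAddr) const {
    auto it = std::lower_bound(entries.begin(), entries.end(),
                               std::make_pair(hostAddr, uint64_t{0}));
    if (it != entries.end() && it->first == hostAddr)
      return it->second;
    return 0;
  }

private:
  std::vector<std::pair<uint64_t, uint64_t>> entries;
};
```

The table grows with the number of vtable slots, but each lookup stays a plain exact-match search with no per-step size check.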
Resolves llvm#160514. Enables usage of the following x86 intrinsics in `constexpr`:

```
_mm256_shuffle_i64x2
_mm256_mask_shuffle_i64x2
_mm256_maskz_shuffle_i64x2
_mm256_shuffle_f64x2
_mm256_mask_shuffle_f64x2
_mm256_maskz_shuffle_f64x2
_mm512_shuffle_i64x2
_mm512_mask_shuffle_i64x2
_mm512_maskz_shuffle_i64x2
_mm512_shuffle_f64x2
_mm512_mask_shuffle_f64x2
_mm512_maskz_shuffle_f64x2
_mm256_shuffle_i32x4
_mm256_mask_shuffle_i32x4
_mm256_maskz_shuffle_i32x4
_mm256_shuffle_f32x4
_mm256_mask_shuffle_f32x4
_mm256_maskz_shuffle_f32x4
_mm512_shuffle_i32x4
_mm512_mask_shuffle_i32x4
_mm512_maskz_shuffle_i32x4
_mm512_shuffle_f32x4
_mm512_mask_shuffle_f32x4
_mm512_maskz_shuffle_f32x4
```
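The operation being constant-folded is a selection of whole 128-bit lanes. Below is a `constexpr` reference model of the unmasked `_mm512_shuffle_i64x2(a, b, imm)`, using `std::array` as a stand-in for `__m512i`. It is meant for reasoning about the folding and is not clang's implementation. Each 2-bit field of `imm` selects one of the four 128-bit lanes of a source: the two low result lanes come from `a`, the two high ones from `b`.

```cpp
#include <array>
#include <cstdint>

using V8x64 = std::array<uint64_t, 8>; // stand-in for __m512i as 8 x i64

// Result lane k (a pair of i64 elements) is lane ((imm >> 2k) & 3) of `a`
// for k = 0, 1 and of `b` for k = 2, 3.
constexpr V8x64 shuffle_i64x2(const V8x64 &a, const V8x64 &b, unsigned imm) {
  V8x64 dst{};
  for (unsigned k = 0; k < 4; ++k) {
    const V8x64 &src = (k < 2) ? a : b;
    unsigned lane = (imm >> (2 * k)) & 3;
    dst[2 * k] = src[2 * lane];
    dst[2 * k + 1] = src[2 * lane + 1];
  }
  return dst;
}
```

The `mask`/`maskz` variants then blend this result per element with a pass-through operand or with zero under a write mask. That is a separate element-wise step on top of the same lane selection.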
Add the cir::exp2 operation and handling for the related builtins.
Fixes a -Wunused-variable warning when compiling without LLVM_ENABLE_THREADS.
A deduced return type can be an object type, in which case `const` can have an effect. Delay the diagnostic to the point at which the type is deduced, and add tests for lambdas. Fixes llvm#43054.

Note that llvm#43054 discusses adding a warning that "const return types are weird" for the class-type cases, but it would have to be a separate warning; such a warning already exists in clang-tidy as `readability-const-return-type`.
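The following example shows why the diagnostic has to wait for deduction. The same `const auto` spelling is meaningless for a scalar result but changes the type of a class-type result. The `static_assert`s rely on the rule that non-class prvalues are never cv-qualified, while class prvalues keep their qualifiers. The function names are illustrative.

```cpp
#include <string>
#include <type_traits>

// Deduces `const int`; a scalar prvalue drops the const, so the qualifier
// has no effect (the case the warning is meant for).
const auto makeInt() { return 42; }

// Deduces `const std::string`; the const is kept and matters, e.g. the
// result cannot be moved from (`s = makeString();` copy-assigns).
const auto makeString() { return std::string("hello"); }

static_assert(std::is_same<decltype(makeInt()), int>::value, "const dropped");
static_assert(std::is_same<decltype(makeString()), const std::string>::value,
              "const kept");

// Lambdas with a deduced trailing return type follow the same rules.
inline auto makeConstString = []() -> const auto { return std::string("x"); };
```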
Implement CountOf on VariableArrayType with IntegerConstant SizeExpr
A couple of builtin helper functions were taking a clang::Expr argument but only using it to build an MLIR location. This change updates these functions to take a location directly.
…lvm#163653)

## Summary:
This change introduces a `DAPSessionManager` that lets multiple DAP sessions share debugger instances when needed, for example for child process debugging and for scripting hooks that dynamically create new targets.

Changes include:
- Add a `DAPSessionManager` singleton to track and coordinate all active DAP sessions.
- Support attaching to an existing target via its globally unique target ID (`targetId` parameter).
- Share debugger instances across sessions when new targets are created dynamically.
- Refactor event thread management to allow sharing event threads between sessions, and move the event thread and event thread handlers to `EventHelpers`.
- Add an `eBroadcastBitNewTargetCreated` event to notify when new targets are created.
- Extract session names from target creation events.
- Defer debugger initialization from the 'initialize' request to the 'launch'/'attach' requests. Between its creation in `InitializeRequestHandler` and the `Launch` or `Attach` request, the debugger is currently used only during the `TelemetryDispatcher` destruction call at the end of `DAP::HandleObject`, so this is safe.

This enables scenarios in which new targets are created dynamically: the debug adapter can automatically start a new debug session for the spawned target while sharing the debugger instance.

## Tests:
The refactoring maintains backward compatibility, and all existing DAP test cases pass. A few basic unit tests for `DAPSessionManager` were also added.

```
>> ninja DAPTests
>> ./tools/lldb/unittests/DAP/DAPTests
>> ./bin/llvm-lit -v ../llvm-project/lldb/test/API/tools/lldb-dap/
```
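One way to picture the sharing scheme is a process-wide registry keyed by the globally unique target ID. This is an assumption-laden model: `Debugger`, `SessionRegistry`, and its methods are hypothetical stand-ins, not the types or API that `DAPSessionManager` actually uses. A session that owns a target registers it, and a session started for a spawned target looks up the debugger to share.

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <mutex>
#include <utility>

struct Debugger {}; // hypothetical stand-in for a shared debugger instance

// Hypothetical process-wide registry mapping target IDs to the debugger that
// owns them, so a new session can attach to a dynamically created target.
class SessionRegistry {
public:
  static SessionRegistry &instance() {
    static SessionRegistry registry; // initialization is thread-safe in C++11+
    return registry;
  }

  void registerTarget(uint64_t targetId, std::shared_ptr<Debugger> debugger) {
    std::lock_guard<std::mutex> lock(mutex);
    targets[targetId] = std::move(debugger);
  }

  // Returns the debugger that owns `targetId`, or nullptr if none does.
  std::shared_ptr<Debugger> findDebugger(uint64_t targetId) {
    std::lock_guard<std::mutex> lock(mutex);
    auto it = targets.find(targetId);
    if (it == targets.end())
      return nullptr;
    return it->second;
  }

private:
  SessionRegistry() = default;
  std::mutex mutex;
  std::map<uint64_t, std::shared_ptr<Debugger>> targets;
};
```

Holding the debugger through `std::shared_ptr` in this sketch keeps it alive for as long as any session still refers to it.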
Both `Target::ReadSignedIntegerFromMemory()` and `Process::ReadSignedIntegerFromMemory()` internally created an unsigned scalar, so extending the value later did not duplicate the sign bit.
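A one-byte read of `0xFF` shows the difference. The helpers below are hypothetical illustrations, not LLDB's `Scalar` API. `signExtend` replicates the value's sign bit into the upper bits, which a signed read needs. `zeroExtend` is what extending through an unsigned scalar amounts to.

```cpp
#include <cstdint>

// Sign-extend the low `byteSize` bytes (1..7) of `raw` to 64 bits; other
// sizes are returned unchanged. Relies on two's-complement conversion and
// arithmetic right shift (guaranteed since C++20, universal in practice).
int64_t signExtend(uint64_t raw, unsigned byteSize) {
  if (byteSize == 0 || byteSize >= 8)
    return static_cast<int64_t>(raw);
  unsigned shift = 64 - 8 * byteSize;
  // Move the value's sign bit into bit 63, then shift back arithmetically.
  return static_cast<int64_t>(raw << shift) >> shift;
}

// Zero-extend the low `byteSize` bytes, as an unsigned scalar would.
uint64_t zeroExtend(uint64_t raw, unsigned byteSize) {
  if (byteSize == 0 || byteSize >= 8)
    return raw;
  return raw & ((uint64_t{1} << (8 * byteSize)) - 1);
}
```

For the byte `0xFF`, `zeroExtend` yields 255 while `signExtend` yields -1. The former is the incorrect result a signed read produced when the scalar was created as unsigned.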
…169042)
* Added missing cluster.load ops with different sizes. Extended all rocdl tests.
…folding. (llvm#149042)" This reverts commit a6edeed. The following fixes have landed, addressing issues causing the original revert:
* llvm#169298
* llvm#167897
* llvm#168949

Original message: Building on top of llvm#148817, introduce a new abstract LastActiveLane opcode that gets lowered to Not(Mask) → FirstActiveLane(NotMask) → Sub(result, 1). When folding the tail, update all extracts for uses outside the loop to extract the value of the last active lane. See also llvm#148603.

PR: llvm#149042
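A scalar model of that lowering, with the mask represented as `std::vector<bool>`, is sketched below. It assumes, as in tail folding, that the active lanes form a prefix of the vector and that at least one lane is active. It also assumes `FirstActiveLane` returns the vector length when no lane is set, which the lowering needs for a fully active mask. The function names are illustrative, not VPlan's API.

```cpp
#include <cstddef>
#include <vector>

// Index of the first set lane, or mask.size() if no lane is set.
std::size_t firstActiveLane(const std::vector<bool> &mask) {
  for (std::size_t i = 0; i < mask.size(); ++i)
    if (mask[i])
      return i;
  return mask.size();
}

// LastActiveLane as Not(Mask) -> FirstActiveLane(NotMask) -> Sub(result, 1).
// Valid only for prefix masks (the tail-folding case) with an active lane.
std::size_t lastActiveLane(const std::vector<bool> &mask) {
  std::vector<bool> notMask(mask.size());
  for (std::size_t i = 0; i < mask.size(); ++i)
    notMask[i] = !mask[i];
  return firstActiveLane(notMask) - 1;
}
```

For a mask like `{1, 1, 1, 0}` the first inactive lane is 3, so the last active lane is 2. For a fully active mask the first inactive lane is the vector length, giving the final lane.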
While looking at the code of the lldb test-suite packages, I noticed that in `get_triple_str` in `darwin.py`, `env` is added inside a `components` list, which is probably supposed to be `component` (defined on line 61). Signed-off-by: Nikita B <n2h9z4@gmail.com>
…or target (llvm#168273) This PR fixes llvm#167388.

## Description
This PR adds a new method, `GetArchName`, to `SBTarget`, so that client code no longer needs to parse the triple to get the architecture name.

## Testing
### All tests from `TestTargetAPI.py`
Run with:
```
./build/bin/lldb-dotest -v -p TestTargetAPI.py
```
(Screenshots: results of the existing tests, without and with the newly added tests.)

### Only `test_get_arch_name`
Run with:
```
./build/bin/lldb-dotest -v -p TestTargetAPI.py -f test_get_arch_name_dwarf -f test_get_arch_name_dwo -f test_get_arch_name_dsym lldb/test/API/python_api/target
```
(Screenshot: results of only the newly added tests.)

---------

Signed-off-by: Nikita B <n2h9z4@gmail.com>
Co-authored-by: Jonas Devlieghere <jonas@devlieghere.com>
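For context, client code previously had to do triple parsing along the lines below to get the architecture name: take the first `-`-separated component of the triple string. The helper name is hypothetical. A real triple parser also handles corner cases this sketch ignores, such as normalizing architecture aliases.

```cpp
#include <string>

// Hypothetical client-side helper: "arm64-apple-macosx14.0.0" -> "arm64".
// If there is no '-', find() returns npos and the whole string is returned.
std::string archFromTriple(const std::string &triple) {
  return triple.substr(0, triple.find('-'));
}
```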
…en tail-folding. (llvm#149042)"" This reverts commit 72e51d3. Missed some test updates.
A recent change introduced a failure in debug builds due to an incorrect level of indirection inside an assert. This fixes that.
…folding. (llvm#149042)" This reverts commit a6edeed. The following fixes have landed, addressing issues causing the original revert:
* llvm#169298
* llvm#167897
* llvm#168949

Original message: Building on top of llvm#148817, introduce a new abstract LastActiveLane opcode that gets lowered to Not(Mask) → FirstActiveLane(NotMask) → Sub(result, 1). When folding the tail, update all extracts for uses outside the loop to extract the value of the last active lane. See also llvm#148603.

PR: llvm#149042
`[[nodiscard]]` should be applied to functions where discarding the return value is most likely a correctness issue. - https://libcxx.llvm.org/CodingGuidelines.html#apply-nodiscard-where-relevant
…m#169611) https://wg21.link/#support

`[[nodiscard]]` should be applied to functions where discarding the return value is most likely a correctness issue.
- https://libcxx.llvm.org/CodingGuidelines.html#apply-nodiscard-where-relevant

The following was implemented in this patch:
- [x] `<compare>`
- [x] `<coroutine>`
- [x] `<initializer_list>`
- [x] Integer comparisons

---------

Co-authored-by: Hristo Hristov <zingam@outlook.com>
Co-authored-by: A. Jiang <de34@live.cn>
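As an illustration of the guideline, consider a mixed-sign comparison helper in the spirit of the standard integer comparison functions such as `std::cmp_less`. Calling it and ignoring the result is almost certainly a bug, which is exactly the situation `[[nodiscard]]` targets. The function below is a hypothetical C++17 stand-in, not libc++'s implementation.

```cpp
#include <type_traits>

// Compare a signed and an unsigned value mathematically, avoiding the usual
// arithmetic conversions (plain `-1 < 0u` is false: -1 converts to UINT_MAX).
template <class S, class U>
[[nodiscard]] constexpr bool safeLess(S s, U u) noexcept {
  static_assert(std::is_signed<S>::value && std::is_unsigned<U>::value,
                "expects (signed, unsigned) arguments");
  return s < 0 || static_cast<std::make_unsigned_t<S>>(s) < u;
}

// safeLess(-1, 0u);  // would be diagnosed: result of a [[nodiscard]] call ignored
static_assert(safeLess(-1, 0u), "-1 < 0 mathematically");
static_assert(!safeLess(5, 3u), "5 >= 3");
```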
This adds a Leave option for a specific range of literals.
This supports the following use cases:
- ConstantPtrAuth expressions that are unrepresentable using standard PAuth relocations, such as expressions involving an integer operand or deactivation symbols.
- libc implementations that do not support PAuth relocations.

For more information see the RFC: https://discourse.llvm.org/t/rfc-structure-protection-a-family-of-uaf-mitigation-techniques/85555

Reviewers: MaskRay, fmayer, smithp35, kovdan01

Reviewed By: fmayer

Pull Request: llvm#133533
!PSDB
ronlieb approved these changes on Nov 27, 2025.