
Conversation

@z1-cciauto
Collaborator

No description provided.

charithaintc and others added 30 commits November 26, 2025 10:10
…extract/insert_strided_slice` (llvm#168626)

This PR adds general SIMT distribution support for
`vector.extract/insert_strided_slice`. Vector distribution already has
support for these operations, but with restrictions that avoid requiring
layouts in the distribution logic. For example, `extract_strided_slice`
requires that the distributed dimension is fully extracted. However, more
complex cases may require extracting partially from the distributed
dimension (e.g. an 8x16xf16 extraction from 8x32xf16). These cases need
the layouts to reason about how the data is spread across SIMT lanes.

Currently we don't have layout access in vector distribution, so these
new patterns are placed on the XeGPU side. They are given a higher pattern
benefit so that they are tried before the regular vector-distribution-based
patterns.
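
A minimal sketch of how such a higher-benefit pattern could be registered with MLIR's pattern infrastructure; the pattern and function names here are illustrative assumptions, not the actual upstream XeGPU code:
```cpp
#include "mlir/Dialect/Vector/IR/VectorOps.h"
#include "mlir/IR/PatternMatch.h"

// Illustrative pattern name; the real XeGPU patterns differ.
struct DistributeExtractStridedSlice final
    : public mlir::OpRewritePattern<mlir::vector::ExtractStridedSliceOp> {
  DistributeExtractStridedSlice(mlir::MLIRContext *ctx,
                                mlir::PatternBenefit benefit)
      : OpRewritePattern(ctx, benefit) {}

  mlir::LogicalResult
  matchAndRewrite(mlir::vector::ExtractStridedSliceOp op,
                  mlir::PatternRewriter &rewriter) const override {
    // Layout-aware SIMT distribution of the partially extracted dimension
    // would go here.
    return mlir::failure();
  }
};

void populateLayoutAwareDistributionPatterns(mlir::RewritePatternSet &patterns) {
  // Benefit > 1 so this pattern is tried before the generic
  // vector-distribution patterns, which use the default benefit of 1.
  patterns.add<DistributeExtractStridedSlice>(patterns.getContext(),
                                              /*benefit=*/10);
}
```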
Run:
```shell
build/bin/llvm-exegesis -mode=latency -mtriple=riscv64-unknown-linux-gnu --mcpu=generic --benchmark-phase=assemble-measured-code -opcode-index=-1
```

error:
```
---
mode:            latency
key:
  instructions:
    - 'NDS_FMV_BF16_X F2_H X11'
    - 'NDS_FMV_X_BF16 X26 F2_H'
  config:          ''
  register_initial_values:
    - 'X11=0x0'
cpu_name:        generic
llvm_triple:     riscv64-unknown-linux-gnu
min_instructions: 10000
measurements:    []
error:           actual measurements skipped.
info:            Repeating two instructions
assembled_snippet: 41116AE48145538105F0530D01E0538105F0530D01E0538105F0530D01E0538105F0530D01E0226D41018280
...
LLVM ERROR: Attempting to emit FMV_H_X instruction but the Feature_HasHalfFPLoadStoreMove predicate(s) are not met
```
Partial reductions can easily be represented by the VPReductionRecipe
class by setting their scale factor to something greater than 1. This PR
merges the two representations and gives VPReductionRecipe a VFScaleFactor
so that it can choose to generate the partial-reduction intrinsic at
execute time.
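
A rough sketch of that execute-time choice, heavily simplified and with assumed names (the real VPlan recipe code is more involved):
```cpp
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Intrinsics.h"

// Assumed helper, not the actual VPReductionRecipe::execute(): a scale factor
// greater than 1 means the accumulator is narrower than the input vector, so
// the partial-reduction intrinsic is emitted instead of a full reduction.
llvm::Value *emitReduction(llvm::IRBuilderBase &B, unsigned VFScaleFactor,
                           llvm::Value *Acc, llvm::Value *VecOp) {
  if (VFScaleFactor > 1)
    return B.CreateIntrinsic(
        Acc->getType(),
        llvm::Intrinsic::experimental_vector_partial_reduce_add, {Acc, VecOp});
  // Scale factor 1: an ordinary horizontal add reduction.
  return B.CreateAddReduce(VecOp);
}
```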

Stacked PRs:
1. llvm#147026
2. llvm#147255
3. llvm#156976
4. llvm#160154
5. llvm#147302
6. llvm#162503
7. -> llvm#147513

Replaces llvm#146073 .
…llvm#169653)

This patch reverts 80a4e6f

After the relevant patches, clang now supports DWARF fission with RISC-V
linker relaxations, so we can remove the related driver error.
This PR enables maximising scalable vector bandwidth for all AArch64
cores other than the V1 and N2. Those two have shown small regressions
that we'll investigate and fix before enabling it for them as well.
Checks whether an instruction is a BTI, and updates the immediate value to
the newly requested variant.

This can be used when the compiler has already inserted a BTI landing pad
at a location, but BOLT needs to update it to a different variant.
Example: a `br x0` to a location guarded by a `BTI c`.
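
For context, BTI variants are HINT aliases, so updating the variant amounts to rewriting the hint immediate. A small sketch of the mapping (the enum and helper are illustrative, not BOLT's interface):
```cpp
// BTI variants and their AArch64 HINT immediates.
enum class BTIVariant { Plain, C, J, JC };

unsigned btiHintImmediate(BTIVariant V) {
  switch (V) {
  case BTIVariant::Plain: return 32; // bti
  case BTIVariant::C:     return 34; // bti c  (targets of blr / br x16, x17)
  case BTIVariant::J:     return 36; // bti j  (targets of br)
  case BTIVariant::JC:    return 38; // bti jc (targets of both)
  }
  return 32;
}
```
In the example above, `br x0` is an indirect branch through a general register, so a compiler-inserted `BTI c` landing pad would need to be rewritten to `BTI jc`.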
Based on the RUN lines, there is actually no need for different versions
of these error files, since no CPU-specific option is needed. Combine them
to reduce confusion and maintenance, as these are not huge files.
…en the loop contains control convergence operations. (llvm#165643)

Skip constant folding the loop predicates if the loop contains control
convergence tokens referenced outside the loop.

Fixes llvm#164496.

Verified
[loop_peeling.test](llvm/offload-test-suite#473)
passes with the fix.

Similar control-convergence issues have been found in other passes; see
llvm#165642.

HLSL used for tests:
```hlsl
RWStructuredBuffer<uint> Out : register(u0);

[numthreads(8,1,1)]
void main(uint3 TID : SV_GroupThreadID) {
    for (uint i = 0; i < 8; i++) {
        if (i == TID.x) {
            Out[TID.x] = WaveActiveMax(TID.x);
            break;
        }
    }
}
```
With nested loop:
```hlsl
RWStructuredBuffer<uint> Out : register(u0);

[numthreads(8,8,1)]
void main(uint3 TID : SV_GroupThreadID) {
    for (uint i = 0; i < 8; i++) {
        for (uint j = 0; j < 8; j++) {
            if (i == TID.x && j == TID.y) {
                uint index = TID.x * 8 + TID.y;
                Out[index] = WaveActiveMax(index);
                break;
            }
        }
    }
}
```
…llvm#167011)

This is a branch off of
llvm#159856 and consists of
the runtime portion of the changes required to support indirect-function
and virtual-function calls on an `omp target` device when the virtual
class / indirect function is mapped to the device from the host.

Key Changes

- Introduced a new flag OMP_DECLARE_TARGET_INDIRECT_VTABLE to mark
VTable registrations
- Modified setupIndirectCallTable to support both VTable entries and
indirect function pointers

Details:
The setupIndirectCallTable implementation was modified to support this
registration type by retrieving the first address of the VTable and
inferring the remaining data needed to build the indirect call table.
Since the VTables / classes registered as indirect can be larger than 8
bytes, and the vtable may not be at the first address, we either need to
pass the size to `__llvm_omp_indirect_call_lookup` and add a check at
each step of the binary search, or add multiple entries to the indirect
table, one for each address registered. The latter was chosen.
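
A minimal sketch of that expansion under assumed types (one pointer-sized entry per VTable slot, so the existing binary search needs no size parameter):
```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Illustrative entry type; the real runtime table layout differs.
struct IndirectCallEntry {
  uintptr_t HostAddr;
  uintptr_t DeviceAddr;
};

void appendVTableEntries(std::vector<IndirectCallEntry> &Table,
                         uintptr_t HostBase, uintptr_t DeviceBase,
                         std::size_t SizeInBytes) {
  // One entry per pointer-sized slot of the registered VTable.
  for (std::size_t Off = 0; Off < SizeInBytes; Off += sizeof(void *))
    Table.push_back({HostBase + Off, DeviceBase + Off});
  // Keep the table sorted by host address so lookups can binary-search it.
  std::sort(Table.begin(), Table.end(),
            [](const IndirectCallEntry &A, const IndirectCallEntry &B) {
              return A.HostAddr < B.HostAddr;
            });
}
```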

Commit a00def3 is not part of this
PR and is handled / reviewed in
llvm#159856.

This is PR (2/3) 
Register Vtable PR (1/3):
llvm#159856,
Codegen / _llvm_omp_indirect_call_lookup PR (3/3):
llvm#159857
Resolves llvm#160514

Enables usage of the following x86 intrinsics in `constexpr`:

```
_mm256_shuffle_i64x2 _mm256_mask_shuffle_i64x2  _mm256_maskz_shuffle_i64x2 
_mm256_shuffle_f64x2 _mm256_mask_shuffle_f64x2  _mm256_maskz_shuffle_f64x2 
_mm512_shuffle_i64x2 _mm512_mask_shuffle_i64x2  _mm512_maskz_shuffle_i64x2 
_mm512_shuffle_f64x2 _mm512_mask_shuffle_f64x2  _mm512_maskz_shuffle_f64x2 

_mm256_shuffle_i32x4 _mm256_mask_shuffle_i32x4  _mm256_maskz_shuffle_i32x4 
_mm256_shuffle_f32x4 _mm256_mask_shuffle_f32x4  _mm256_maskz_shuffle_f32x4 
_mm512_shuffle_i32x4 _mm512_mask_shuffle_i32x4  _mm512_maskz_shuffle_i32x4 
_mm512_shuffle_f32x4 _mm512_mask_shuffle_f32x4  _mm512_maskz_shuffle_f32x4 
```
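
A small usage sketch, assuming the `_mm256_set_epi64x` / `_mm256_extract_epi64` helpers are also constexpr-enabled in this Clang build and that AVX-512F/VL are enabled at compile time:
```cpp
#include <immintrin.h>

// Build two vectors of 64-bit elements [0,1,2,3] and [4,5,6,7].
constexpr __m256i A = _mm256_set_epi64x(3, 2, 1, 0);
constexpr __m256i B = _mm256_set_epi64x(7, 6, 5, 4);

// imm = 0b01: low 128-bit lane taken from A's high half, high lane from B's
// low half, giving elements [2,3,4,5].
constexpr __m256i R = _mm256_shuffle_i64x2(A, B, 0b01);

static_assert(_mm256_extract_epi64(R, 0) == 2);
static_assert(_mm256_extract_epi64(R, 3) == 5);
```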
Add the cir::exp2 operation and handling for the related builtins.
Fixes -Wunused-variable when compiling without LLVM_ENABLE_THREADS
A deduced return type can be an object type, in which case `const` can
have an effect.
Delay the diagnostic to the point at which the type is deduced. 
Add tests for lambdas.

Fixes llvm#43054

Note that there is a discussion in llvm#43054 about adding a separate
warning for "const return types are weird" for the class-type cases, but
it would have to be a separate warning, one which currently exists in
clang-tidy as `readability-const-return-type`.
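
Illustrative cases of the behaviour described above (assumed examples, not taken from the patch's tests):
```cpp
// With the diagnostic delayed, whether 'const' is flagged depends on what the
// return type deduces to.
struct S { int x; };

const auto f() { return 42; }   // deduces const int: 'const' on a scalar
                                // return has no effect -> warn after deduction
const auto g() { return S{1}; } // deduces const S: 'const' is meaningful for
                                // a class prvalue, so no warning
```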
Implement CountOf on VariableArrayType with IntegerConstant SizeExpr
A couple of builtin helper functions were taking a clang::Expr argument
but only using it to build an MLIR location. This change updates these
functions to take a location directly.
…lvm#163653)

## Summary:
This change introduces a `DAPSessionManager` to enable multiple DAP
sessions to share debugger instances when needed, for things like child
process debugging and scripting hooks that dynamically create new
targets.

Changes include:
- Add `DAPSessionManager` singleton to track and coordinate all active DAP
sessions
- Support attaching to an existing target via its globally unique target
ID (targetId parameter)
- Share debugger instances across sessions when new targets are created
dynamically
- Refactor event thread management to allow sharing event threads
between sessions and move event thread and event thread handlers to `EventHelpers`
- Add `eBroadcastBitNewTargetCreated` event to notify when new targets are
created
- Extract session names from target creation events
- Defer debugger initialization from 'initialize' request to
'launch'/'attach' requests. The only time the debugger is used currently
in between its creation in `InitializeRequestHandler` and the `Launch`
or `Attach` requests is during the `TelemetryDispatcher` destruction
call at the end of the `DAP::HandleObject` call, so this is safe.

This enables scenarios in which new targets are created dynamically: the
debug adapter can automatically start a new debug session for the spawned
target while sharing the debugger instance.
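
A minimal sketch of the coordination idea, with placeholder types; the real `DAPSessionManager` has a richer interface:
```cpp
#include <map>
#include <memory>
#include <mutex>
#include <string>

struct Debugger {}; // stand-in for the shared debugger instance

// Process-wide singleton that maps globally unique target ids to the debugger
// that owns them, so a session spawned for a dynamically created target can
// reuse that debugger instead of creating its own.
class SessionManager {
public:
  static SessionManager &instance() {
    static SessionManager mgr;
    return mgr;
  }

  // Called when a session creates a target: remember which debugger owns it.
  void registerTarget(const std::string &targetId,
                      std::shared_ptr<Debugger> debugger) {
    std::lock_guard<std::mutex> lock(mutex_);
    owners_[targetId] = std::move(debugger);
  }

  // Called by a new session attaching via the targetId parameter: share the
  // existing debugger if the target is known, otherwise create a fresh one.
  std::shared_ptr<Debugger> debuggerForTarget(const std::string &targetId) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = owners_.find(targetId);
    if (it != owners_.end())
      return it->second;
    return std::make_shared<Debugger>();
  }

private:
  std::mutex mutex_;
  std::map<std::string, std::shared_ptr<Debugger>> owners_;
};
```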

## Tests:
The refactoring maintains backward compatibility. All existing DAP test
cases pass.

Also added a few basic unit tests for `DAPSessionManager`:
```
>> ninja DAPTests
>> ./tools/lldb/unittests/DAP/DAPTests
>> ./bin/llvm-lit -v ../llvm-project/lldb/test/API/tools/lldb-dap/
```
Both `Target::ReadSignedIntegerFromMemory()` and
`Process::ReadSignedIntegerFromMemory()` internally created an unsigned
scalar, so extending the value later did not duplicate the sign bit.
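
A toy illustration of this bug class, not the lldb `Scalar` API itself:
```cpp
#include <cassert>
#include <cstdint>

int main() {
  uint8_t byteFromMemory = 0xFF; // represents the signed value -1

  // Buggy path: widen as unsigned first, then treat the result as signed.
  uint64_t widenedUnsigned = byteFromMemory;              // 0x00000000000000FF
  int64_t wrong = static_cast<int64_t>(widenedUnsigned);  // 255, not -1

  // Intended path: sign-extend from the original width.
  int64_t right = static_cast<int8_t>(byteFromMemory);    // -1

  assert(wrong == 255 && right == -1);
  return 0;
}
```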
…169042)

* Added missing cluster.load ops with different sizes. Extended all
rocdl tests
…folding. (llvm#149042)"

This reverts commit a6edeed.

The following fixes have landed, addressing issues causing the original
revert:
* llvm#169298
* llvm#167897
* llvm#168949

Original message:
Building on top of llvm#148817,
introduce a new abstract LastActiveLane opcode that gets lowered to
Not(Mask) → FirstActiveLane(NotMask) → Sub(result, 1).

When folding the tail, update all extracts for uses outside the loop to
extract the value of the last active lane.

See also llvm#148603

PR: llvm#149042
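
A scalar model of that lowering, assuming the tail-folded mask has the form [1,…,1,0,…,0] with at least one active lane:
```cpp
#include <cassert>
#include <vector>

std::size_t firstActiveLane(const std::vector<bool> &m) {
  for (std::size_t i = 0; i < m.size(); ++i)
    if (m[i])
      return i;
  return m.size();
}

std::size_t lastActiveLane(const std::vector<bool> &m) {
  // Not(Mask)
  std::vector<bool> notM(m.size());
  for (std::size_t i = 0; i < m.size(); ++i)
    notM[i] = !m[i];
  // FirstActiveLane(NotMask) -> Sub(result, 1)
  return firstActiveLane(notM) - 1;
}

int main() {
  std::vector<bool> mask = {true, true, true, false}; // 3 active lanes
  assert(lastActiveLane(mask) == 2);
}
```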
While taking a look at the code of the lldb test-suite packages, I
noticed that in `get_triple_str` in `darwin.py`, `env` is added to a
`components` list, which is probably supposed to be `component` (defined
on line 61).

Signed-off-by: Nikita B <n2h9z4@gmail.com>
…or target (llvm#168273)

This PR fixes llvm#167388.

## Description

This PR adds a new method `GetArchName` to `SBTarget` so that client code
no longer needs to parse the triple to get the architecture name.
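
A minimal usage sketch, assuming `GetArchName` returns a C string like other `SBTarget` getters:
```cpp
#include <cstdio>

#include "lldb/API/SBDebugger.h"
#include "lldb/API/SBTarget.h"

void printArch(lldb::SBDebugger &debugger) {
  lldb::SBTarget target = debugger.CreateTarget("/path/to/a.out");
  // Previously the client had to call GetTriple() and split on '-' itself.
  const char *arch = target.GetArchName();
  if (arch)
    std::printf("arch: %s\n", arch);
}
```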

## Testing

### All from `TestTargetAPI.py`

run test with

```
./build/bin/lldb-dotest -v -p TestTargetAPI.py
```
<details>
<summary>existing tests (without newly added)</summary>
<img width="1425" height="804" alt="image"
src="https://github.com/user-attachments/assets/617e4c69-5c6b-44c4-9aeb-b751a47e253c"
/>
</details>

<details>
<summary>existing tests (with newly added)</summary>
<img width="1422" height="778" alt="image"
src="https://github.com/user-attachments/assets/746990a1-df88-4348-a090-224963d3c640"
/>

</details>

### Only `test_get_arch_name`

run test with 
```
./build/bin/lldb-dotest -v -p TestTargetAPI.py -f test_get_arch_name_dwarf -f test_get_arch_name_dwo -f test_get_arch_name_dsym lldb/test/API/python_api/target

```
<details>
<summary>only newly added</summary>
<img width="1422" height="778" alt="image"
src="https://github.com/user-attachments/assets/fcaafa5d-2622-4171-acee-e104ecee0652"
/>
</details>

---------

Signed-off-by: Nikita B <n2h9z4@gmail.com>
Co-authored-by: Jonas Devlieghere <jonas@devlieghere.com>
…en tail-folding. (llvm#149042)""

This reverts commit 72e51d3.

Missed some test updates.
A recent change introduced a failure in debug builds due to an incorrect
level of indirection inside an assert. This fixes that.
…folding. (llvm#149042)"

This reverts commit a6edeed.

The following fixes have landed, addressing issues causing the original
revert:
* llvm#169298
* llvm#167897
* llvm#168949

Original message:
Building on top of llvm#148817,
introduce a new abstract LastActiveLane opcode that gets lowered to
Not(Mask) → FirstActiveLane(NotMask) → Sub(result, 1).

When folding the tail, update all extracts for uses outside the loop to
extract the value of the last active lane.

See also llvm#148603

PR: llvm#149042
`[[nodiscard]]` should be applied to functions where discarding the
return value is most likely a correctness issue.

- https://libcxx.llvm.org/CodingGuidelines.html#apply-nodiscard-where-relevant
H-G-Hristov and others added 4 commits November 26, 2025 22:17
…m#169611)

https://wg21.link/#support

`[[nodiscard]]` should be applied to functions where discarding the
return value is most likely a correctness issue.

- https://libcxx.llvm.org/CodingGuidelines.html#apply-nodiscard-where-relevant

The following was implemented in this patch:

- [x] `<compare>`
- [x] `<coroutine>`
- [x] `<initializer_list>`
- [x] Integer comparisons (see the sketch below)
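
A minimal illustration of the integer-comparisons item (assuming the `std::cmp_*` helpers are among the annotated functions):
```cpp
#include <utility>

void check(long a, unsigned b) {
  std::cmp_less(a, b);           // warning: result of nodiscard call discarded
  bool ok = std::cmp_less(a, b); // fine: the result is used
  (void)ok;
}
```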

---------

Co-authored-by: Hristo Hristov <zingam@outlook.com>
Co-authored-by: A. Jiang <de34@live.cn>
This basically adds a Leave option for a specific range of literals.
This supports the following use cases:
- ConstantPtrAuth expressions that are unrepresentable using standard PAuth
  relocations such as expressions involving an integer operand or
  deactivation symbols.
- libc implementations that do not support PAuth relocations.

For more information see the RFC:
https://discourse.llvm.org/t/rfc-structure-protection-a-family-of-uaf-mitigation-techniques/85555

Reviewers: MaskRay, fmayer, smithp35, kovdan01

Reviewed By: fmayer

Pull Request: llvm#133533
@z1-cciauto z1-cciauto requested a review from a team November 26, 2025 20:30
@z1-cciauto
Collaborator Author

@ronlieb
Collaborator

ronlieb commented Nov 26, 2025

!PSDB

@z1-cciauto
Collaborator Author

@z1-cciauto z1-cciauto merged commit a470909 into amd-staging Nov 27, 2025
13 checks passed
@z1-cciauto z1-cciauto deleted the upstream_merge_202511261530 branch November 27, 2025 01:24