@ronlieb ronlieb commented Nov 12, 2025

No description provided.

arsenm and others added 30 commits November 11, 2025 15:30
This shadows the member in the base class, but differs slightly
in behavior. The base method doesn't check for the invalid case.
…memref subviews (llvm#166581)

This PR applies the same fix from llvm#166569, which addressed
`tensor.extract_slice`, to the identical problem in `memref.subview`.

The runtime verification for `memref.subview` incorrectly rejects valid
empty subviews (size=0) starting at the memref boundary.

**Example that demonstrates the issue:**

```mlir
func.func @subview_with_empty_slice(%memref: memref<10x4x1xf32, strided<[?, ?, ?], offset: ?>>, 
                                     %dim_0: index, 
                                     %dim_1: index, 
                                     %dim_2: index,
                                     %offset: index) {
    // When called with: offset=10, dim_0=0, dim_1=4, dim_2=1
    // Runtime verification fails: "offset 0 is out-of-bounds"
    %subview = memref.subview %memref[%offset, 0, 0] [%dim_0, %dim_1, %dim_2] [1, 1, 1] :
        memref<10x4x1xf32, strided<[?, ?, ?], offset: ?>> to
        memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>
    return
}
```

When `%offset=10` and `%dim_0=0`, we're creating an empty subview (zero
elements along dimension 0) starting at the boundary. The current
verification enforces `offset < dim_size`, which evaluates to `10 < 10`
and fails. I feel this should be valid since no memory is accessed.

**The fix:**

Same as llvm#166569 - make the offset check conditional on subview size:
- Empty subview (size == 0): allow `0 <= offset <= dim_size`
- Non-empty subview (size > 0): require `0 <= offset < dim_size`

Please see llvm#166569 for motivation and rationale.

---

Co-authored-by: Hanumanth Hanumantharayappa <hhanuman@ah-hhanuman-l.dhcp.mathworks.com>
…empty tensor slices (llvm#166569)

I hit another runtime verification issue (similar to
llvm#164878) while working with
TFLite models. The verifier is incorrectly rejecting
`tensor.extract_slice` operations when extracting an empty slice
(size=0) that starts exactly at the tensor boundary.

The current runtime verification unconditionally enforces `offset <
dim_size`. This makes sense for non-empty slices, but it's too strict
for empty slices, causing false positives that lead to spurious runtime
assertions.

**Simple example that demonstrates the issue:**

```mlir
func.func @extract_empty_slice(%tensor: tensor<?xf32>, %offset: index, %size: index) {
  // When called with: tensor size=10, offset=10, size=0
  // Runtime verification fails: "offset 0 is out-of-bounds"
  %slice = tensor.extract_slice %tensor[%offset] [%size] [1] 
    : tensor<?xf32> to tensor<?xf32>
  return
}
```

For the above example, the check evaluates `10 < 10` which is false, so
verification fails. However, I believe this operation should be valid -
we're extracting zero elements, so there's no actual out-of-bounds
access.

**Real-world repro from TensorFlow Lite models:**

This issue manifests while lowering TFLite models, and many of our
system tests fail because of it. Here's a simplified version
showing the problematic pattern:

In this code, `%extracted_slice_0` becomes an empty tensor when SSA
value `%15` reaches 10 (on the final loop iteration), making `%16 = 0`.
The operation extracts zero elements along dimension 0, which is
semantically valid but fails runtime verification.

```mlir
func.func @simplified_repro_from_tensorflowlite_model(%arg0: tensor<10x4x1xf32>) -> tensor<10x4x1xf32> {
  %c0 = arith.constant 0 : index
  %c1 = arith.constant 1 : index
  %c2 = arith.constant 2 : index
  %c10 = arith.constant 10 : index
  %c-1 = arith.constant -1 : index
  
  %0 = "tosa.const"() <{values = dense<0> : tensor<i32>}> : () -> tensor<i32>
  %1 = "tosa.const"() <{values = dense<1> : tensor<i32>}> : () -> tensor<i32>
  %2 = "tosa.const"() <{values = dense<10> : tensor<i32>}> : () -> tensor<i32>
  %3 = "tosa.const"() <{values = dense<-1> : tensor<2xi32>}> : () -> tensor<2xi32>
  %4 = "tosa.const"() <{values = dense<0> : tensor<2xi32>}> : () -> tensor<2xi32>
  %5 = "tosa.const"() <{values = dense<0.000000e+00> : tensor<1x4x1xf32>}> : () -> tensor<1x4x1xf32>
  %c4_1 = tosa.const_shape  {values = dense<1> : tensor<1xindex>} : () -> !tosa.shape<1>
  
  %6:2 = scf.while (%arg1 = %0, %arg2 = %arg0) 
    : (tensor<i32>, tensor<10x4x1xf32>) -> (tensor<i32>, tensor<10x4x1xf32>) {
    %7 = tosa.greater %2, %arg1 : (tensor<i32>, tensor<i32>) -> tensor<i1>
    %extracted = tensor.extract %7[] : tensor<i1>
    scf.condition(%extracted) %arg1, %arg2 : tensor<i32>, tensor<10x4x1xf32>
  } do {
  ^bb0(%arg1: tensor<i32>, %arg2: tensor<10x4x1xf32>):
    %7 = tosa.add %arg1, %1 : (tensor<i32>, tensor<i32>) -> tensor<i32>
    
    // First slice
    %8 = tosa.reshape %arg1, %c4_1 : (tensor<i32>, !tosa.shape<1>) -> tensor<1xi32>
    %9 = tosa.concat %8, %3 {axis = 0 : i32} : (tensor<1xi32>, tensor<2xi32>) -> tensor<3xi32>
    
    %extracted_0 = tensor.extract %9[%c0] : tensor<3xi32>
    %10 = index.casts %extracted_0 : i32 to index
    %11 = arith.cmpi eq, %10, %c-1 : index
    %12 = arith.select %11, %c10, %10 : index
    
    %extracted_slice = tensor.extract_slice %arg2[0, 0, 0] [%12, 4, 1] [1, 1, 1] 
      : tensor<10x4x1xf32> to tensor<?x4x1xf32>
    
    // Second slice - this is where the failure occurs
    %13 = tosa.reshape %7, %c4_1 : (tensor<i32>, !tosa.shape<1>) -> tensor<1xi32>
    %14 = tosa.concat %13, %4 {axis = 0 : i32} : (tensor<1xi32>, tensor<2xi32>) -> tensor<3xi32>
    
    %extracted_1 = tensor.extract %14[%c0] : tensor<3xi32>
    %15 = index.castu %extracted_1 : i32 to index
    %16 = arith.subi %c10, %15 : index  // size = 10 - offset
    
    %extracted_2 = tensor.extract %14[%c1] : tensor<3xi32>
    %17 = index.castu %extracted_2 : i32 to index
    
    %extracted_3 = tensor.extract %14[%c2] : tensor<3xi32>
    %18 = index.castu %extracted_3 : i32 to index
    
    // On the last loop iteration: %15=10, %16=0
    // %extracted_slice_0 becomes an empty tensor
    // Runtime verification fails: "offset 0 is out-of-bounds"
    %extracted_slice_0 = tensor.extract_slice %arg2[%15, %17, %18] [%16, 4, 1] [1, 1, 1] 
      : tensor<10x4x1xf32> to tensor<?x4x1xf32>
    
    %19 = tosa.concat %extracted_slice, %5, %extracted_slice_0 {axis = 0 : i32} 
      : (tensor<?x4x1xf32>, tensor<1x4x1xf32>, tensor<?x4x1xf32>) -> tensor<10x4x1xf32>
    
    scf.yield %7, %19 : tensor<i32>, tensor<10x4x1xf32>
  }
  
  return %6#1 : tensor<10x4x1xf32>
}
```
**The fix:**

Make the offset check conditional on slice size:
- Empty slice (size == 0): allow `0 <= offset <= dim_size`
- Non-empty slice (size > 0): require `0 <= offset < dim_size`
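
For reference, here is a minimal sketch of what the relaxed per-dimension
check amounts to when written out as IR. The value names are illustrative
(`%offset` and `%size` for one dimension of the slice, `%dim` for the
corresponding source dimension), and this is not the exact IR the
runtime-verification pass emits:

```mlir
// Hypothetical per-dimension check; not the pass's actual output.
%c0 = arith.constant 0 : index
%is_empty = arith.cmpi eq, %size, %c0 : index
%ge_zero = arith.cmpi sge, %offset, %c0 : index
%lt_dim = arith.cmpi slt, %offset, %dim : index   // bound for non-empty slices
%le_dim = arith.cmpi sle, %offset, %dim : index   // relaxed bound for empty slices
%upper = arith.select %is_empty, %le_dim, %lt_dim : i1
%ok = arith.andi %ge_zero, %upper : i1
cf.assert %ok, "offset is out-of-bounds"
```

The same condition would be evaluated for every dimension of the slice.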


**Question for reviewers:**
Should we also relax the static verifier to allow this edge case?
Currently, the static verifier rejects the following IR:

```mlir
%tensor = arith.constant dense<1.0> : tensor<10xf32>
%slice = tensor.extract_slice %tensor[10] [0] [1] : tensor<10xf32> to tensor<0xf32>
```
Since we're allowing it at runtime for dynamic shapes, it seems
inconsistent to reject it statically. However, I wanted to get feedback
before making that change - this PR focuses only on the runtime
verification fix for dynamic shapes.

P.S. We have a similar issue with `memref.subview`. I will send a
separate patch for the issue.

Co-authored-by: Hanumanth Hanumantharayappa <hhanuman@ah-hhanuman-l.dhcp.mathworks.com>
Removed the large-offset test. It caused an issue on 32-bit ARM because of
the large offset.

Co-authored-by: anoopkg6 <anoopkg6@github.com>
The C++ index switch op has utilities for `getCaseBlock(int i)` and
`getDefaultBlock()`, so these have been added.
Optional body builder args have been added: one for the default case and
one for the switch cases.
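For context, this presumably refers to `scf.index_switch`. A minimal example
of the op for readers unfamiliar with it (the case values and constants are
arbitrary):

```mlir
// %idx is an index value defined elsewhere.
%result = scf.index_switch %idx -> i32
case 2 {
  %a = arith.constant 10 : i32
  scf.yield %a : i32
}
default {
  %b = arith.constant 0 : i32
  scf.yield %b : i32
}
```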
Add a `REQUIRES` directive so the `invalid-section-index.s` test does not run
in environments without AArch64 support.
This should not be overridable, and the special-case hacks
have been replaced with RegClassByHwMode.
Summary
-------

While dogfooding lldb-dap, I observed that VSCode frequently displays
certain stack frames as greyed out. Although these frames have valid
debug information, double-clicking them shows disassembly instead of
source code. However, running `bt` from the LLDB command line correctly
displays source file and line information for these same frames,
indicating this is an lldb-dap specific issue.

Root Cause
----------

Investigation revealed that `DAP::ResolveSource()` incorrectly uses a
frame's PC address directly to determine whether valid source line
information exists. This approach works for leaf frames, but fails for
non-leaf (caller) frames where the PC points to the return address
immediately after a call instruction. This return address may fall into
compiler-generated code with no associated line information, even though
the actual call site has valid source location data.

The correct approach is to use the symbol context's line entry, which
LLDB resolves by effectively checking PC-1 for non-leaf frames, properly
identifying the line information for the call instruction rather than
the return address.

Testing
-------

Manually tested with VSCode debugging sessions on production workloads.
Verified that non-leaf frames now correctly display source code instead
of disassembly view.

Symptom before the change:
<img width="1013" height="216" alt="image"
src="https://github.com/user-attachments/assets/9487fbc0-f438-4892-a8d2-1437dc25399b"
/>


And here is after the fix:
<img width="1068" height="198" alt="image"
src="https://github.com/user-attachments/assets/0d2ebaa7-cca6-4983-a1d1-1a26ae62c86f"
/>

---------

Co-authored-by: Jeffrey Tan <jeffreytan@fb.com>
…tion motion" (llvm#167465)

This patch introduces a new virtual method
`TargetInstrInfo::isSafeToMove()` to allow backends to control whether a
machine instruction can be safely moved by optimization passes.

The `BranchFolder` pass now respects this hook when hoisting common
code. By default, all instructions are considered safe to move.

For LoongArch, `isSafeToMove()` is overridden to prevent
relocation-related instruction sequences (e.g. PC-relative addressing
and calls) from being broken by instruction motion. Correspondingly,
`isSchedulingBoundary()` is updated to reuse this logic for consistency.

Relands llvm#163725
Summary:
The OpenMP handling using an offload binary should be optional; it's
only used for extra metadata in llvm-objdump. Also, the triple was
completely wrong: it didn't let anyone correctly choose between ELF
and COFF handling.
…nnotations' crashes (llvm#167487)

Added handling for the case of a non-materialized module; also don't call
printInfoComment for immaterializable values.
…en (llvm#166987)

Only use RuntimeLibcallsInfo. Remove the helper functions used to
transition.
Resolves llvm#148131

- Unlock `std::optional<T&>` implementation
- Allow instantiations of `optional<T(&)(...)>` and `optional<T(&)[]>`,
but disable `value_or()`, `optional::iterator`, and all `iterator`-related
functions
- Update documentation
- Update tests
Makes linalg.reduce and linalg.map region_ops so they can be constructed
from functions and be called as decorators.
A full LTO link-time performance and memory regression was introduced
by llvm#137081 in cases where the modules contain large quantities of llvm.used
globals. This went unnoticed because it was not expected to be a typical
case, but it is exactly what coverage collection does. When that feature is
enabled together with full LTO, we end up with quadratic memory consumption
(from the unused constants) and quadratic complexity in
Verifier::visitGlobalValue (which visits all the unused constants in the use
list of each global value). This is a targeted fix that avoids reintroducing
the quadratic complexity from before llvm#137081, by having ValueMapper delete
the old initializer of an appending global if it is unused, instead of
visiting every global in the context after every link.

The repro-cfi-64 reproducer from llvm#167037 before and after this change:

```
        Elapsed time   Max RSS (KB)
Before   12:05.11        52537184
After     3:27.68         7520696
```

Fixes llvm#167037.

Reviewers: nikic, teresajohnson

Reviewed By: teresajohnson

Pull Request: llvm#167629
…#166988)

This kind of helper is higher level and not general enough to go
directly in SelectionDAG. Most similar utilities are in TargetLowering.
…nwind compatibility (llvm#160887)

As was explained to me in
https://discourse.llvm.org/t/libunwinds-raison-detre/88283/2, the LLVM
version of libunwind is mostly compatible with the nongnu one. This change
improves the compatibility a bit further.
lhames and others added 3 commits November 12, 2025 16:36
NFCI -- the deleted copy constructor already made this immovable. The
explicit operations just make clear that this was intentional.
…tions from other named module (llvm#167468)

Close llvm#166068

The cause of the problem is that we would import initializers and
pending implicit instantiations from other named modules. This is very
bad and may waste a lot of time.

We didn't observe it for a long time, since the weak symbols can live
together and the strong symbols get removed by other mechanisms. But it
does indeed waste compilation time.

@z1-cciauto z1-cciauto merged commit e34e00f into amd-staging Nov 12, 2025
13 checks passed
@z1-cciauto z1-cciauto deleted the amd/merge/upstream_merge_20251111233902 branch November 12, 2025 08:44