@ronlieb ronlieb commented Nov 12, 2025

No description provided.

arsenm and others added 30 commits November 11, 2025 15:30
This shadows the member in the base class, but differs slightly
in behavior. The base method doesn't check for the invalid case.
…memref subviews (llvm#166581)

This PR applies the same fix from llvm#166569, which addressed
`tensor.extract_slice`, to the identical problem in `memref.subview`.

The runtime verification for `memref.subview` incorrectly rejects valid
empty subviews (size=0) starting at the memref boundary.

**Example that demonstrates the issue:**

```mlir
func.func @subview_with_empty_slice(%memref: memref<10x4x1xf32, strided<[?, ?, ?], offset: ?>>, 
                                     %dim_0: index, 
                                     %dim_1: index, 
                                     %dim_2: index,
                                     %offset: index) {
    // When called with: offset=10, dim_0=0, dim_1=4, dim_2=1
    // Runtime verification fails: "offset 0 is out-of-bounds"
    %subview = memref.subview %memref[%offset, 0, 0] [%dim_0, %dim_1, %dim_2] [1, 1, 1] :
        memref<10x4x1xf32, strided<[?, ?, ?], offset: ?>> to
        memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>
    return
}
```

When `%offset=10` and `%dim_0=0`, we're creating an empty subview (zero
elements along dimension 0) starting at the boundary. The current
verification enforces `offset < dim_size`, which evaluates to `10 < 10`
and fails. I feel this should be valid since no memory is accessed.

**The fix:**

Same as llvm#166569 - make the offset check conditional on subview size:
- Empty subview (size == 0): allow `0 <= offset <= dim_size`
- Non-empty subview (size > 0): require `0 <= offset < dim_size`

Please see llvm#166569 for motivation and rationale.

---

Co-authored-by: Hanumanth Hanumantharayappa <hhanuman@ah-hhanuman-l.dhcp.mathworks.com>
…empty tensor slices (llvm#166569)

I hit another runtime verification issue (similar to
llvm#164878) while working with
TFLite models. The verifier is incorrectly rejecting
`tensor.extract_slice` operations when extracting an empty slice
(size=0) that starts exactly at the tensor boundary.

The current runtime verification unconditionally enforces `offset <
dim_size`. This makes sense for non-empty slices, but it's too strict
for empty slices, causing false positives that lead to spurious runtime
assertions.

**Simple example that demonstrates the issue:**

```mlir
func.func @extract_empty_slice(%tensor: tensor<?xf32>, %offset: index, %size: index) {
  // When called with: tensor size=10, offset=10, size=0
  // Runtime verification fails: "offset 0 is out-of-bounds"
  %slice = tensor.extract_slice %tensor[%offset] [%size] [1] 
    : tensor<?xf32> to tensor<?xf32>
  return
}
```

For the above example, the check evaluates `10 < 10` which is false, so
verification fails. However, I believe this operation should be valid -
we're extracting zero elements, so there's no actual out-of-bounds
access.

**Real-world repro from TensorFlow Lite models:**

This issue manifests while lowering TFLite models, and many of our
system tests fail because of it. Here's a simplified version
showing the problematic pattern:

In this code, `%extracted_slice_0` becomes an empty tensor when SSA
value `%15` reaches 10 (on the final loop iteration), making `%16 = 0`.
The operation extracts zero elements along dimension 0, which is
semantically valid but fails runtime verification.

```mlir
func.func @simplified_repro_from_tensorflowlite_model(%arg0: tensor<10x4x1xf32>) -> tensor<10x4x1xf32> {
  %c0 = arith.constant 0 : index
  %c1 = arith.constant 1 : index
  %c2 = arith.constant 2 : index
  %c10 = arith.constant 10 : index
  %c-1 = arith.constant -1 : index
  
  %0 = "tosa.const"() <{values = dense<0> : tensor<i32>}> : () -> tensor<i32>
  %1 = "tosa.const"() <{values = dense<1> : tensor<i32>}> : () -> tensor<i32>
  %2 = "tosa.const"() <{values = dense<10> : tensor<i32>}> : () -> tensor<i32>
  %3 = "tosa.const"() <{values = dense<-1> : tensor<2xi32>}> : () -> tensor<2xi32>
  %4 = "tosa.const"() <{values = dense<0> : tensor<2xi32>}> : () -> tensor<2xi32>
  %5 = "tosa.const"() <{values = dense<0.000000e+00> : tensor<1x4x1xf32>}> : () -> tensor<1x4x1xf32>
  %c4_1 = tosa.const_shape  {values = dense<1> : tensor<1xindex>} : () -> !tosa.shape<1>
  
  %6:2 = scf.while (%arg1 = %0, %arg2 = %arg0) 
    : (tensor<i32>, tensor<10x4x1xf32>) -> (tensor<i32>, tensor<10x4x1xf32>) {
    %7 = tosa.greater %2, %arg1 : (tensor<i32>, tensor<i32>) -> tensor<i1>
    %extracted = tensor.extract %7[] : tensor<i1>
    scf.condition(%extracted) %arg1, %arg2 : tensor<i32>, tensor<10x4x1xf32>
  } do {
  ^bb0(%arg1: tensor<i32>, %arg2: tensor<10x4x1xf32>):
    %7 = tosa.add %arg1, %1 : (tensor<i32>, tensor<i32>) -> tensor<i32>
    
    // First slice
    %8 = tosa.reshape %arg1, %c4_1 : (tensor<i32>, !tosa.shape<1>) -> tensor<1xi32>
    %9 = tosa.concat %8, %3 {axis = 0 : i32} : (tensor<1xi32>, tensor<2xi32>) -> tensor<3xi32>
    
    %extracted_0 = tensor.extract %9[%c0] : tensor<3xi32>
    %10 = index.casts %extracted_0 : i32 to index
    %11 = arith.cmpi eq, %10, %c-1 : index
    %12 = arith.select %11, %c10, %10 : index
    
    %extracted_slice = tensor.extract_slice %arg2[0, 0, 0] [%12, 4, 1] [1, 1, 1] 
      : tensor<10x4x1xf32> to tensor<?x4x1xf32>
    
    // Second slice - this is where the failure occurs
    %13 = tosa.reshape %7, %c4_1 : (tensor<i32>, !tosa.shape<1>) -> tensor<1xi32>
    %14 = tosa.concat %13, %4 {axis = 0 : i32} : (tensor<1xi32>, tensor<2xi32>) -> tensor<3xi32>
    
    %extracted_1 = tensor.extract %14[%c0] : tensor<3xi32>
    %15 = index.castu %extracted_1 : i32 to index
    %16 = arith.subi %c10, %15 : index  // size = 10 - offset
    
    %extracted_2 = tensor.extract %14[%c1] : tensor<3xi32>
    %17 = index.castu %extracted_2 : i32 to index
    
    %extracted_3 = tensor.extract %14[%c2] : tensor<3xi32>
    %18 = index.castu %extracted_3 : i32 to index
    
    // On the last loop iteration: %15=10, %16=0
    // %extracted_slice_0 becomes an empty tensor
    // Runtime verification fails: "offset 0 is out-of-bounds"
    %extracted_slice_0 = tensor.extract_slice %arg2[%15, %17, %18] [%16, 4, 1] [1, 1, 1] 
      : tensor<10x4x1xf32> to tensor<?x4x1xf32>
    
    %19 = tosa.concat %extracted_slice, %5, %extracted_slice_0 {axis = 0 : i32} 
      : (tensor<?x4x1xf32>, tensor<1x4x1xf32>, tensor<?x4x1xf32>) -> tensor<10x4x1xf32>
    
    scf.yield %7, %19 : tensor<i32>, tensor<10x4x1xf32>
  }
  
  return %6#1 : tensor<10x4x1xf32>
}
```
**The fix:**

Make the offset check conditional on slice size:
- Empty slice (size == 0): allow `0 <= offset <= dim_size`
- Non-empty slice (size > 0): require `0 <= offset < dim_size`
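
For reference, here is a minimal sketch of what the relaxed per-dimension
check amounts to when written out as IR. The value names are illustrative
(`%offset` and `%size` for one dimension of the slice, `%dim` for the
corresponding source dimension), and this is not the exact IR the
runtime-verification pass emits:

```mlir
// Hypothetical per-dimension check; not the pass's actual output.
%c0 = arith.constant 0 : index
%is_empty = arith.cmpi eq, %size, %c0 : index
%ge_zero = arith.cmpi sge, %offset, %c0 : index
%lt_dim = arith.cmpi slt, %offset, %dim : index   // bound for non-empty slices
%le_dim = arith.cmpi sle, %offset, %dim : index   // relaxed bound for empty slices
%upper = arith.select %is_empty, %le_dim, %lt_dim : i1
%ok = arith.andi %ge_zero, %upper : i1
cf.assert %ok, "offset is out-of-bounds"
```

The same condition would be evaluated for every dimension of the slice.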


**Question for reviewers:**
Should we also relax the static verifier to allow this edge case?
Currently, the static verifier rejects the following IR:

```mlir
%tensor = arith.constant dense<1.0> : tensor<10xf32>
%slice = tensor.extract_slice %tensor[10] [0] [1] : tensor<10xf32> to tensor<0xf32>
```
Since we're allowing it at runtime for dynamic shapes, it seems
inconsistent to reject it statically. However, I wanted to get feedback
before making that change - this PR focuses only on the runtime
verification fix for dynamic shapes.

P.S. We have a similar issue with `memref.subview`. I will send a
separate patch for the issue.

Co-authored-by: Hanumanth Hanumantharayappa <hhanuman@ah-hhanuman-l.dhcp.mathworks.com>
Removed the large-offset test. It caused an issue on 32-bit ARM because of
the large offset.

Co-authored-by: anoopkg6 <anoopkg6@github.com>
The C++ index switch op has utilities for `getCaseBlock(int i)` and
`getDefaultBlock()`, so these have been added.
Optional body builder args have been added: one for the default case and
one for the switch cases.
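For context, this presumably refers to `scf.index_switch`. A minimal example
of the op for readers unfamiliar with it (the case values and constants are
arbitrary):

```mlir
// %idx is an index value defined elsewhere.
%result = scf.index_switch %idx -> i32
case 2 {
  %a = arith.constant 10 : i32
  scf.yield %a : i32
}
default {
  %b = arith.constant 0 : i32
  scf.yield %b : i32
}
```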
Add a `REQUIRES` directive so the `invalid-section-index.s` test does not run
in environments without AArch64 support.
This should not be overridable, and the special-case hacks
have been replaced with RegClassByHwMode.
Summary
-------

While dogfooding lldb-dap, I observed that VSCode frequently displays
certain stack frames as greyed out. Although these frames have valid
debug information, double-clicking them shows disassembly instead of
source code. However, running `bt` from the LLDB command line correctly
displays source file and line information for these same frames,
indicating this is an lldb-dap specific issue.

Root Cause
----------

Investigation revealed that `DAP::ResolveSource()` incorrectly uses a
frame's PC address directly to determine whether valid source line
information exists. This approach works for leaf frames, but fails for
non-leaf (caller) frames where the PC points to the return address
immediately after a call instruction. This return address may fall into
compiler-generated code with no associated line information, even though
the actual call site has valid source location data.

The correct approach is to use the symbol context's line entry, which
LLDB resolves by effectively checking PC-1 for non-leaf frames, properly
identifying the line information for the call instruction rather than
the return address.

Testing
-------

Manually tested with VSCode debugging sessions on production workloads.
Verified that non-leaf frames now correctly display source code instead
of disassembly view.

Symptom before the change:
<img width="1013" height="216" alt="image"
src="https://github.com/user-attachments/assets/9487fbc0-f438-4892-a8d2-1437dc25399b"
/>


And here is after the fix:
<img width="1068" height="198" alt="image"
src="https://github.com/user-attachments/assets/0d2ebaa7-cca6-4983-a1d1-1a26ae62c86f"
/>

---------

Co-authored-by: Jeffrey Tan <jeffreytan@fb.com>
…tion motion" (llvm#167465)

This patch introduces a new virtual method
`TargetInstrInfo::isSafeToMove()` to allow backends to control whether a
machine instruction can be safely moved by optimization passes.

The `BranchFolder` pass now respects this hook when hoisting common
code. By default, all instructions are considered safe to move.

For LoongArch, `isSafeToMove()` is overridden to prevent
relocation-related instruction sequences (e.g. PC-relative addressing
and calls) from being broken by instruction motion. Correspondingly,
`isSchedulingBoundary()` is updated to reuse this logic for consistency.

Relands llvm#163725
Summary:
The OpenMP handling using an offload binary should be optional; it's
only used for extra metadata in llvm-objdump. Also, the triple was
completely wrong: it didn't let anyone correctly choose between ELF
and COFF handling.
…nnotations' crashes (llvm#167487)

Added handling for the case of a non-materialized module; also don't call
printInfoComment for immaterializable values.
…en (llvm#166987)

Only use RuntimeLibcallsInfo. Remove the helper functions used to
transition.
Resolves llvm#148131

- Unlock `std::optional<T&>` implementation
- Allow instantiations of `optional<T(&)(...)>` and `optional<T(&)[]>`,
but disable `value_or()`, `optional::iterator`, and all `iterator`-related
functions
- Update documentation
- Update tests
Makes linalg.reduce and linalg.map region_ops so they can be constructed
from functions and be called as decorators.
A full LTO link-time performance and memory regression was introduced
by llvm#137081 in cases where the modules contain large quantities of llvm.used
globals. This went unnoticed because it was not expected to be a typical
case, but it is exactly what coverage collection does. When that feature is
enabled together with full LTO, we end up with quadratic memory consumption
(from the unused constants) and quadratic complexity in
Verifier::visitGlobalValue (which visits all the unused constants in the use
list of each global value). This is a targeted fix that avoids reintroducing
the quadratic complexity from before llvm#137081, by having ValueMapper delete
the old initializer of an appending global if it is unused, instead of
visiting every global in the context after every link.

The repro-cfi-64 reproducer from llvm#167037 before and after this change:

```
        Elapsed time   Max RSS (KB)
Before   12:05.11        52537184
After     3:27.68         7520696
```

Fixes llvm#167037.

Reviewers: nikic, teresajohnson

Reviewed By: teresajohnson

Pull Request: llvm#167629
…#166988)

This kind of helper is higher level and not general enough to go
directly in SelectionDAG. Most similar utilities are in TargetLowering.
…nwind compatibility (llvm#160887)

As was explained to me in
https://discourse.llvm.org/t/libunwinds-raison-detre/88283/2, the LLVM
version of libunwind is mostly compatible with the nongnu one. This change
improves the compatibility a bit further.
lhames and others added 3 commits November 12, 2025 16:36
NFCI -- the deleted copy constructor already made this immovable. The
explicit operations just make clear that this was intentional.
…tions from other named module (llvm#167468)

Close llvm#166068

The cause of the problem is that we would import initializers and
pending implicit instantiations from other named modules. This is very
bad and may waste a lot of time.

We didn't observe it for a long time, since the weak symbols can live
together and the strong symbols get removed by other mechanisms. But it
does indeed waste compilation time.

@z1-cciauto z1-cciauto merged commit e34e00f into amd-staging Nov 12, 2025
13 checks passed
@z1-cciauto z1-cciauto deleted the amd/merge/upstream_merge_20251111233902 branch November 12, 2025 08:44