forked from llvm/llvm-project
-
Notifications
You must be signed in to change notification settings - Fork 76
merge main into amd-staging #569
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
z1-cciauto
merged 33 commits into
amd-staging
from
amd/merge/upstream_merge_20251111233902
Nov 12, 2025
Merged
merge main into amd-staging #569
z1-cciauto
merged 33 commits into
amd-staging
from
amd/merge/upstream_merge_20251111233902
Nov 12, 2025
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This shadows the member in the base class, but differs slightly in behavior. The base method doesn't check for the invalid case.
…memref subviews (llvm#166581) This PR applies the same fix from llvm#166569 to `memref.subview`. That PR fixed the issue for `tensor.extract_slice`, and this one addresses the identical problem for `memref.subview`. The runtime verification for `memref.subview` incorrectly rejects valid empty subviews (size=0) starting at the memref boundary. **Example that demonstrates the issue:** ```mlir func.func @subview_with_empty_slice(%memref: memref<10x4x1xf32, strided<[?, ?, ?], offset: ?>>, %dim_0: index, %dim_1: index, %dim_2: index, %offset: index) { // When called with: offset=10, dim_0=0, dim_1=4, dim_2=1 // Runtime verification fails: "offset 0 is out-of-bounds" %subview = memref.subview %memref[%offset, 0, 0] [%dim_0, %dim_1, %dim_2] [1, 1, 1] : memref<10x4x1xf32, strided<[?, ?, ?], offset: ?>> to memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>> return } ``` When `%offset=10` and `%dim_0=0`, we're creating an empty subview (zero elements along dimension 0) starting at the boundary. The current verification enforces `offset < dim_size`, which evaluates to `10 < 10` and fails. I feel this should be valid since no memory is accessed. **The fix:** Same as llvm#166569 - make the offset check conditional on subview size: - Empty subview (size == 0): allow `0 <= offset <= dim_size` - Non-empty subview (size > 0): require `0 <= offset < dim_size` Please see llvm#166569 for motivation and rationale. --- Co-authored-by: Hanumanth Hanumantharayappa <hhanuman@ah-hhanuman-l.dhcp.mathworks.com>
…empty tensor slices (llvm#166569) I hit another runtime verification issue (similar to llvm#164878) while working with TFLite models. The verifier is incorrectly rejecting `tensor.extract_slice` operations when extracting an empty slice (size=0) that starts exactly at the tensor boundary. The current runtime verification unconditionally enforces `offset < dim_size`. This makes sense for non-empty slices, but it's too strict for empty slices, causing false positives that lead to spurious runtime assertions. **Simple example that demonstrates the issue:** ```mlir func.func @extract_empty_slice(%tensor: tensor<?xf32>, %offset: index, %size: index) { // When called with: tensor size=10, offset=10, size=0 // Runtime verification fails: "offset 0 is out-of-bounds" %slice = tensor.extract_slice %tensor[%offset] [%size] [1] : tensor<?xf32> to tensor<?xf32> return } ``` For the above example, the check evaluates `10 < 10` which is false, so verification fails. However, I believe this operation should be valid - we're extracting zero elements, so there's no actual out-of-bounds access. **Real-world repro from the TensorFlow Lite models:** This issue manifests while lowering TFLite models and a lot of our system tests are failing due to this. Here's a simplified version showing the problematic pattern: In this code, `%extracted_slice_0` becomes an empty tensor when SSA value `%15` reaches 10 (on the final loop iteration), making `%16 = 0`. The operation extracts zero elements along dimension 0, which is semantically valid but fails runtime verification. ```mlir func.func @simplified_repro_from_tensorflowlite_model(%arg0: tensor<10x4x1xf32>) -> tensor<10x4x1xf32> { %c0 = arith.constant 0 : index %c1 = arith.constant 1 : index %c2 = arith.constant 2 : index %c10 = arith.constant 10 : index %c-1 = arith.constant -1 : index %0 = "tosa.const"() <{values = dense<0> : tensor<i32>}> : () -> tensor<i32> %1 = "tosa.const"() <{values = dense<1> : tensor<i32>}> : () -> tensor<i32> %2 = "tosa.const"() <{values = dense<10> : tensor<i32>}> : () -> tensor<i32> %3 = "tosa.const"() <{values = dense<-1> : tensor<2xi32>}> : () -> tensor<2xi32> %4 = "tosa.const"() <{values = dense<0> : tensor<2xi32>}> : () -> tensor<2xi32> %5 = "tosa.const"() <{values = dense<0.000000e+00> : tensor<1x4x1xf32>}> : () -> tensor<1x4x1xf32> %c4_1 = tosa.const_shape {values = dense<1> : tensor<1xindex>} : () -> !tosa.shape<1> %6:2 = scf.while (%arg1 = %0, %arg2 = %arg0) : (tensor<i32>, tensor<10x4x1xf32>) -> (tensor<i32>, tensor<10x4x1xf32>) { %7 = tosa.greater %2, %arg1 : (tensor<i32>, tensor<i32>) -> tensor<i1> %extracted = tensor.extract %7[] : tensor<i1> scf.condition(%extracted) %arg1, %arg2 : tensor<i32>, tensor<10x4x1xf32> } do { ^bb0(%arg1: tensor<i32>, %arg2: tensor<10x4x1xf32>): %7 = tosa.add %arg1, %1 : (tensor<i32>, tensor<i32>) -> tensor<i32> // First slice %8 = tosa.reshape %arg1, %c4_1 : (tensor<i32>, !tosa.shape<1>) -> tensor<1xi32> %9 = tosa.concat %8, %3 {axis = 0 : i32} : (tensor<1xi32>, tensor<2xi32>) -> tensor<3xi32> %extracted_0 = tensor.extract %9[%c0] : tensor<3xi32> %10 = index.casts %extracted_0 : i32 to index %11 = arith.cmpi eq, %10, %c-1 : index %12 = arith.select %11, %c10, %10 : index %extracted_slice = tensor.extract_slice %arg2[0, 0, 0] [%12, 4, 1] [1, 1, 1] : tensor<10x4x1xf32> to tensor<?x4x1xf32> // Second slice - this is where the failure occurs %13 = tosa.reshape %7, %c4_1 : (tensor<i32>, !tosa.shape<1>) -> tensor<1xi32> %14 = tosa.concat %13, %4 {axis = 0 : i32} : (tensor<1xi32>, tensor<2xi32>) -> tensor<3xi32> %extracted_1 = tensor.extract %14[%c0] : tensor<3xi32> %15 = index.castu %extracted_1 : i32 to index %16 = arith.subi %c10, %15 : index // size = 10 - offset %extracted_2 = tensor.extract %14[%c1] : tensor<3xi32> %17 = index.castu %extracted_2 : i32 to index %extracted_3 = tensor.extract %14[%c2] : tensor<3xi32> %18 = index.castu %extracted_3 : i32 to index // On the last loop iteration: %15=10, %16=0 // %extracted_slice_0 becomes an empty tensor // Runtime verification fails: "offset 0 is out-of-bounds" %extracted_slice_0 = tensor.extract_slice %arg2[%15, %17, %18] [%16, 4, 1] [1, 1, 1] : tensor<10x4x1xf32> to tensor<?x4x1xf32> %19 = tosa.concat %extracted_slice, %5, %extracted_slice_0 {axis = 0 : i32} : (tensor<?x4x1xf32>, tensor<1x4x1xf32>, tensor<?x4x1xf32>) -> tensor<10x4x1xf32> scf.yield %7, %19 : tensor<i32>, tensor<10x4x1xf32> } return %6#1 : tensor<10x4x1xf32> } ``` **The fix:** Make the offset check conditional on slice size: - Empty slice (size == 0): allow `0 <= offset <= dim_size` - Non-empty slice (size > 0): require `0 <= offset < dim_size` **Question for reviewers:** Should we also relax the static verifier to allow this edge case? Currently, the static verifier rejects the following IR: ```mlir %tensor = arith.constant dense<1.0> : tensor<10xf32> %slice = tensor.extract_slice %tensor[10] [0] [1] : tensor<10xf32> to tensor<0xf32> ``` Since we're allowing it at runtime for dynamic shapes, it seems inconsistent to reject it statically. However, I wanted to get feedback before making that change - this PR focuses only on the runtime verification fix for dynamic shapes. P.S. We have a similar issue with `memref.subview`. I will send a separate patch for the issue. Co-authored-by: Hanumanth Hanumantharayappa <hhanuman@ah-hhanuman-l.dhcp.mathworks.com>
Removed large offset test. It caused issue with ARM 32-bit because of large offset. Co-authored-by: anoopkg6 <anoopkg6@github.com>
The C++ index switch op has utilities for `getCaseBlock(int i)` and `getDefaultBlock()`, so these have been added. Optional body builder args have been added: one for the default case and one for the switch cases.
Add requires to not run `invalid-section-index.s` test in non aarch64 supported environments.
This should not be overridable and the special case hacks have been replaced with RegClassByHwMode
Summary ------- While dogfooding lldb-dap, I observed that VSCode frequently displays certain stack frames as greyed out. Although these frames have valid debug information, double-clicking them shows disassembly instead of source code. However, running `bt` from the LLDB command line correctly displays source file and line information for these same frames, indicating this is an lldb-dap specific issue. Root Cause ---------- Investigation revealed that `DAP::ResolveSource()` incorrectly uses a frame's PC address directly to determine whether valid source line information exists. This approach works for leaf frames, but fails for non-leaf (caller) frames where the PC points to the return address immediately after a call instruction. This return address may fall into compiler-generated code with no associated line information, even though the actual call site has valid source location data. The correct approach is to use the symbol context's line entry, which LLDB resolves by effectively checking PC-1 for non-leaf frames, properly identifying the line information for the call instruction rather than the return address. Testing ------- Manually tested with VSCode debugging sessions on production workloads. Verified that non-leaf frames now correctly display source code instead of disassembly view. Before the change symptom: <img width="1013" height="216" alt="image" src="https://github.com/user-attachments/assets/9487fbc0-f438-4892-a8d2-1437dc25399b" /> And here is after the fix: <img width="1068" height="198" alt="image" src="https://github.com/user-attachments/assets/0d2ebaa7-cca6-4983-a1d1-1a26ae62c86f" /> --------- Co-authored-by: Jeffrey Tan <jeffreytan@fb.com>
…tion motion" (llvm#167465) This patch introduces a new virtual method `TargetInstrInfo::isSafeToMove()` to allow backends to control whether a machine instruction can be safely moved by optimization passes. The `BranchFolder` pass now respects this hook when hoisting common code. By default, all instructions are considered safe to to move. For LoongArch, `isSafeToMove()` is overridden to prevent relocation-related instruction sequences (e.g. PC-relative addressing and calls) from being broken by instruction motion. Correspondingly, `isSchedulingBoundary()` is updated to reuse this logic for consistency. Relands llvm#163725
Summary: The OpenMP handling using an offload binary should be optional, it's only used for extra metadata for llvm-objdump. Also the triple was completely wrong, it didn't let anyone correctly choose between ELF and COFF handling.
Copy new process from sincos/sincospi
…nnotations' crashes (llvm#167487) Added handling the case of a non-materialized module, also don't call printInfoComment for immaterializable values
…en (llvm#166987) Only use RuntimeLibcallsInfo. Remove the helper functions used to transition.
Resolves llvm#148131 - Unlock `std::optional<T&>` implementation - Allow instantiations of `optional<T(&)(...)>` and `optional<T(&)[]>` but disables `value_or()` and `optional::iterator` + all `iterator` related functions - Update documentation - Update tests
Makes linalg.reduce and linalg.map region_ops so they can be constructed from functions and be called as decorators.
…tions in reduced BMI
A full LTO link time performance and memory regression was introduced by llvm#137081 in cases where the modules contain large quantities of llvm.used globals. This was unnoticed because it was not expected that this would be a typical case, but this is exactly what coverage collection does, and when this feature is enabled together with full LTO we end up with quadratic memory consumption (from the unused constants) and quadratic complexity in the function Verifier::visitGlobalValue (which visits all the unused constants in the use list of each global value). This is a targeted fix that avoids reintroducing the quadratic complexity from before llvm#137081, by having ValueMapper delete the old initializer of an appending global if it is unused, instead of visiting every global in the context after every link. The repro-cfi-64 reproducer from llvm#167037 before and after this change: ``` Elapsed time Max RSS (KB) Before 12:05.11 52537184 After 3:27.68 7520696 ``` Fixes llvm#167037. Reviewers: nikic, teresajohnson Reviewed By: teresajohnson Pull Request: llvm#167629
…#166988) This kind of helper is higher level and not general enough to go directly in SelectionDAG. Most similar utilities are in TargetLowering.
…nwind compatibility (llvm#160887) As it was explained to me in https://discourse.llvm.org/t/libunwinds-raison-detre/88283/2 the LLVM version of libunwind is mostly compatible with nongnu one. This change improves the compatibility a bit further.
NFCI -- the deleted copy constructor already made this immovable. The explicit operations just make clear that this was intentional.
…tions from other named module (llvm#167468) Close llvm#166068 The cause of the problem is that we would import initializers and pending implicit instantiations from other named module. This is very bad and it may waste a lot of time. And we didn't observe it as the weak symbols can live together and the strong symbols would be removed by other mechanism. So we didn't observe the bad behavior for a long time. But it indeeds waste compilation time.
Collaborator
dpalermo
approved these changes
Nov 12, 2025
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
No description provided.