merge main into amd-staging #611

ronlieb · 2025-11-17T23:48:07Z

No description provided.

The file extension was accidentally omitted from llvm#164794.

…cket list correctly when erasing the last bucket (llvm#167865) Fixes llvm#167820

)

) The mbarrier Ops also require access to the `mem_scope` and `shared_space` attributes. Hence, this patch moves their definitions to the beginning of the file alongside the other attribute definitions. Signed-off-by: Durgadoss R <durgadossr@nvidia.com>

…cSection (llvm#166323) A field named 'size' is already available and perfectly usable via inheritance from InputSection, and these variables shadow it for no good reason. The only interesting change here is in PaddingSection: because a parent's field cannot be initialized via a constructor initializer list, setting it needs to be done inside the constructor body.

…2629) This patch adds LLVM infrastructure to support pretty printing of the intrinsic arguments. The motivation is to improve the readability of LLVM intrinsics and facilitate easy modifications and debugging of LLVM IR. This feature adds a property `ArgInfo<ArgIndex, [ArgName<"argName">, ImmArgPrinter<"functionName">]>` to the intrinsic arguments to print self-explanatory inline comments for the arguments. The addition of pretty print support can provide a simple, low-overhead feature that enhances the usability of LLVM intrinsics without disrupting existing workflows. Link to the RFC, where this feature was discussed: https://discourse.llvm.org/t/rfc-pretty-printing-immediate-arguments-in-llvm-intrinsics/88536 --------- Signed-off-by: Dharuni R Acharya <dharunira@nvidia.com> Co-authored-by: Rahul Joshi <rjoshi@nvidia.com>

This PR builds upon the infrastructure set up for Sparse Tensor Loop Ordering Heuristics (llvm#154656) by adding a preference to have dense loops outer and sparse loops inner. As always I'd love to get feedback and know if there's any other direction to go with this work that might be better.

Upstream handling for BaseToDerived casts, adding the cir.base_class_addr operation and lowering to LLVM IR.

AMDGPUMCExpr lives in the MC layer, so it should not depend on Function.h or GCNSubtarget.h. Move the function that needed GCNSubtarget to the one file that called it.

This updates the CIR direct callee builder code to handle the case of calls to functions that were declared with an assembly label using `__asm`. The implementation doesn't actually have any explicit handling of the AsmLabelAttr. It is handled by the name mangler. See https://reviews.llvm.org/D137073 and https://reviews.llvm.org/D134362 for details on how this was implemented in classic codegen. The test here is copied from https://reviews.llvm.org/D134362 because the test in https://reviews.llvm.org/D134362 requires a target that isn't yet supported in CIR.

This PR fixes a Fortran syntax violation in the OpenMP default mapper naming convention. The suffix .omp.default.mapper contains dots which are invalid in Fortran identifiers, causing failures when mappers are written to and read from module files. The fix changes the suffix to _omp_default_mapper which uses underscores instead of dots, complying with Fortran syntax rules.

Key changes:
- Changed OmpDefaultMapperName constant from .omp.default.mapper to _omp_default_mapper
- Added GetUltimate() calls in mapper symbol resolution to properly handle symbols across module boundaries
- Added new test case verifying default mappers work correctly when defined in a module and used in consuming programs

This fixes llvm#168336.
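To illustrate why the dotted suffix breaks module files, here is a minimal sketch of a Fortran name check. It is not the flang implementation: `isFortranName` is a hypothetical helper, the rule is simplified (a letter first, then letters, digits, or underscores, with no length limit enforced), and the type name `mytype` is made up for the example.

```C++
#include <cctype>
#include <string>

// Hypothetical helper, simplified rule: a Fortran name starts with a letter
// and continues with letters, digits, or underscores.
bool isFortranName(const std::string &s) {
  if (s.empty() || !std::isalpha(static_cast<unsigned char>(s[0])))
    return false;
  for (char c : s)
    if (!std::isalnum(static_cast<unsigned char>(c)) && c != '_')
      return false;
  return true;
}
```

Under this simplified rule, `isFortranName("mytype.omp.default.mapper")` is false while `isFortranName("mytype_omp_default_mapper")` is true, which matches the motivation given above for switching to underscores.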

This commit refactors the SPIRV post-legalizer to use a worklist to process new instructions. Previously, the post-legalizer would iterate through all instructions and try to assign types. This could fail if a new instruction depended on another new instruction that had not been processed yet. The new implementation adds all new instructions that require a SPIR-V type to a worklist. It then iteratively processes the worklist until it is empty. This ensures that all dependencies are met before an instruction is processed. This change makes the post-legalizer more robust and fixes potential ordering issues with newly generated instructions. Existing tests cover existing functionality. More tests will be added as the legalizer is modified. Part of llvm#153091
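The worklist idea described above can be sketched as follows. This is an illustration only, not the SPIR-V post-legalizer code: `PendingInst` and `assignTypes` are hypothetical names, and instructions are modeled as plain strings with a list of dependencies.

```C++
#include <cstddef>
#include <deque>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a newly generated instruction: its name plus the
// names of the instructions whose types it depends on.
struct PendingInst {
  std::string name;
  std::vector<std::string> deps;
};

// Assigns "types" so that every dependency is typed before its user.
// `typed` holds the names that already have a type on entry.
// Returns the order in which the new instructions were typed.
std::vector<std::string> assignTypes(const std::vector<PendingInst> &newInsts,
                                     std::set<std::string> typed) {
  std::deque<PendingInst> worklist(newInsts.begin(), newInsts.end());
  std::vector<std::string> order;
  std::size_t stalled = 0; // consecutive deferrals without any progress
  while (!worklist.empty() && stalled < worklist.size()) {
    PendingInst inst = worklist.front();
    worklist.pop_front();
    bool ready = true;
    for (const std::string &dep : inst.deps) {
      if (!typed.count(dep)) {
        ready = false;
        break;
      }
    }
    if (ready) {
      typed.insert(inst.name);
      order.push_back(inst.name);
      stalled = 0;
    } else {
      worklist.push_back(inst); // retry after its dependencies are typed
      ++stalled;
    }
  }
  return order; // items on a dependency cycle are left untyped
}
```

The `stalled` counter is a design choice of this sketch: it stops the loop when a full pass over the worklist makes no progress, so a dependency cycle cannot spin forever.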

llvm#140307 added support for cstring hashes in the orderfile to layout cstrings in a specific order, but only when `--deduplicate-strings` is used. This PR supports cstring ordering when `--no-deduplicate-strings` is used.

1. Create `cStringPriorities`, separate from `priorities`, to hold only priorities for cstring pieces. This allows us to lookup by hash directly, instead of first converting to a string. It also fixes a contrived bug where we want to order a symbol named `CSTR;12345` rather than a cstring.
2. Rather than calling `buildCStringPriorities()` which always constructs and returns a vector, we use `forEachStringPiece()` to efficiently iterate over cstring pieces without creating a new vector if no cstring is ordered.
3. Create `SymbolPriorityEntry::{get,set}Priority()` helper functions to simplify code.

This only does this for Linux currently as the issue-write workflow currently does not support writing out multiple comments. This gets the ball rolling as the failures that most people see are common to both platforms. Ensuring we have coverage on Windows for comments will be done in a future patch.

…5923) This MR adds support for the `exact` flag to the `arith.shrui/shrsi/divsi/divui` operations. The semantics are identical to those of the LLVM dialect and the LLVM language reference.

This MR also modifies the mechanism for converting `arith` dialect **attributes** to corresponding **properties** in the `LLVM` dialect. (As a specific example, the integer overflow flags `nsw/nuw` are **properties** in the `LLVM` dialect, as opposed to attributes.)

Previously, attribute converter classes were required to have a specific method to support integer overflow flags:

```C++
template <typename SourceOp, typename TargetOp>
class AttrConvertPassThrough {
public:
  ...
  LLVM::IntegerOverflowFlags getOverflowFlags() const {
    return LLVM::IntegerOverflowFlags::none;
  }
};
```

This method was required, even for `arith` source operations that did not use integer overflow flags (e.g. `AttrConvertFastMathToLLVM`). This MR modifies the interface required by `arith` dialect attribute converters to instead provide a (possibly NULL) properties attribute:

```C++
template <typename SourceOp, typename TargetOp>
class AttrConvertPassThrough {
public:
  ...
  Attribute getPropAttr() const { return {}; }
};
```

For `arith` operations with attributes that map to `LLVM` dialect **properties**, the attribute converter can create a `DictionaryAttr` containing target properties and return that attribute from the attribute converter's `getPropAttr()` method. The `arith` attribute conversion framework will set the `propertiesAttr` of an `OperationState`, and the target operation's `setPropertiesFromAttr()` method will be invoked to set the properties when the target operation is created. The `AttrConvertOverflowToLLVM` class in this MR uses the new approach.

… marked as Pure (llvm#166648) This MR modifies side effect traits of some integer arithmetic operations in the LLVM dialect. Prior to this MR, the LLVM dialect `sdiv` and `udiv` operations were marked as `Pure` through `tblgen` inheritance of the `LLVM_ArithmeticOpBase` class. The `Pure` trait allowed incorrect hoisting of `sdiv`/`udiv` operations by the `loop-independent-code-motion` pass. This MR modifies the `sdiv` and `udiv` LLVM operations to have traits and code motion behavior identical to their counterparts in the `arith` dialect, which were established by the commit/review below. llvm@ed39825 https://reviews.llvm.org/D137814

Remove flag to specify "no semantic interposition" since this is the default for AIX.

This test was using subshells and then passing the results to diff. Write out the results to files before passing to diff as the internal shell does not support subshells.

This PR adds initial support for codegen of `blockAddressOp`. This is emitted when using the GNU extension labels as values. The operation is used together with `indirectBrOp`, which will be implemented in a future PR. Lowering will be added in a later PR.

As the PGPF effort has been turned down, there is no current way to generate profiles that will be used by these passes. Current efforts are also focused around inserting prefetches in PLO optimizers, which have a more accurate view of how the code looks.

20a22a4 was supposed to fully remove these, but left around the functionality to actually compute them and a unittest that ensured they worked. These are not development features in the sense of features used in development mode, but experimental features that have been superseded by MIR2Vec.

…#167495)

…68137) This test uses `ulimit -f 1` to test what libFuzzer does when trying to create a file > **_1KB_**. However, none of the input files used by this test are actually >= 1KB, so there's no reason to expect this test to pass. This test appears to be passing by accident since the "control file" happens to be > 1KB, but this is not always the case depending upon the length of the path where the test is run from. This modifies the test to ensure that one of the input files is actually >1KB.

Fixes llvm#167700 to support builds where TableGen's output file is specified as full path rather than just filename.

Required after llvm#167700. This adds yet another format for `tbl_outs` where you pass the list of opts, and a list of outputs (where previously you could only have 1 output). In that case all outputs must be produced, but the first is used for the `-o` arg since tblgen is generating the other names based on that single argument.

AIX out-of-box memory soft limits are often insufficient to run LLVM on reasonably sized inputs. Thus, we often encounter users who run into spurious out of memory errors. This change raises the memory soft limits to the hard limits at LLVM startup to prevent these types of issues.
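For readers unfamiliar with soft and hard limits, the general technique can be sketched with the POSIX `getrlimit`/`setrlimit` calls. This is a minimal sketch, not the code from the patch: `raiseSoftLimitToHard` is a hypothetical helper, and which resources the actual change adjusts is not stated here.

```C++
#include <sys/resource.h>

// Hypothetical helper: raises the soft limit of one resource to its hard
// limit. Returns true if the soft limit equals the hard limit afterwards.
bool raiseSoftLimitToHard(int resource) {
  struct rlimit lim;
  if (getrlimit(resource, &lim) != 0)
    return false;
  if (lim.rlim_cur == lim.rlim_max)
    return true; // already at the hard limit, nothing to do
  lim.rlim_cur = lim.rlim_max; // an unprivileged process may go up to the hard limit
  return setrlimit(resource, &lim) == 0;
}
```

A caller could invoke, for example, `raiseSoftLimitToHard(RLIMIT_DATA)` early in `main`; the choice of `RLIMIT_DATA` is an assumption made for illustration.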

Replace addMetadata with setMetadata, which sets metadata, updating existing entries or adding a new entry otherwise. This isn't strictly needed at the moment, but will be needed for follow-up patches.
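The "update an existing entry or add a new one" behavior can be sketched as below. This is not the VPIRMetadata implementation: `MetadataList` is a hypothetical stand-in that models metadata as (kind ID, payload string) pairs, and the free function `setMetadata` only mirrors the name used in the commit message.

```C++
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a metadata list: (kind ID, payload) pairs.
using MetadataList = std::vector<std::pair<unsigned, std::string>>;

// Updates the entry for `kind` if one exists, otherwise appends a new entry.
void setMetadata(MetadataList &list, unsigned kind, std::string node) {
  for (auto &entry : list) {
    if (entry.first == kind) {
      entry.second = std::move(node); // overwrite the existing entry
      return;
    }
  }
  list.emplace_back(kind, std::move(node)); // no entry of this kind yet
}
```

Setting the same kind twice leaves a single entry holding the latest payload, whereas an append-only `addMetadata` would presumably leave two.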

https://green.lab.llvm.org/job/llvm.org/view/LLDB/job/lldb-cmake-sanitized/2744/consoleText

…vm#167065) Otherwise, we end up using whatever system-provided compiler runtime is available, which doesn't work on macOS since compiler-rt is located inside the toolchain path, which can't be found by default. However, disable the tests for compiler-rt since those are linking against the system C++ standard library while using the just-built libc++ headers, which is nonsensical and leads to undefined references on macOS.

… tests (llvm#167346) We want to eliminate all .compile.fail.cpp tests since they are brittle: these tests pass regardless of the specific compilation error, which means that e.g. a missing include will render the test null. This is not an exhaustive pass, just a few tests I stumbled upon.

…#167253) Update VPlan to populate VPIRMetadata during VPInstruction construction and use it when creating widened recipes, instead of constructing VPIRMetadata from the underlying IR instruction each time. This centralizes VPIRMetadata in VPInstructions and ensures metadata is consistently available throughout VPlan transformations. PR: llvm#167253

These are simply implemented as specializations of strtofloatingpoint for double / long double and for wchar_t. The unit tests are copied from the strtod / strtold ones.

…cking safe patterns, if "cond" is a constant (llvm#167989) In `-Wunsafe-buffer-usage`, many safe pattern checks can benefit from constant folding. This commit improves null-terminated pointer checks by folding conditional expressions. rdar://159374822 --------- Co-authored-by: Balázs Benics <benicsbalazs@gmail.com>

* Adds lowerings for amdgpu.scaled_ext_packed816
* updates verifiers

…167661) The motivation is to allow passes such as MachineLICM to hoist trivial FMOV instructions out of loops, where previously it didn't do so even when the RHS is a constant. On most architectures, these expensive move instructions have a latency of 2-6 cycles, and are certainly not as cheap as a 0-1 cycle move.

Starting in version 15, GCC emits a `.base64` directive instead of `.string` or `.ascii` for char arrays of length `>= 3`. See [this godbolt link](https://godbolt.org/z/ebhe3oenv) for an example. This patch adds support for the .base64 directive to AsmParser.cpp, so tools like `llvm-mc` can process the output of GCC more effectively. This addresses llvm#165499.
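To show what a `.base64` payload stands for, here is a minimal sketch of decoding a base64 string back into raw bytes. It is not the AsmParser.cpp implementation: `decodeBase64` is a hypothetical helper, and it assumes the standard RFC 4648 alphabet with `=` padding.

```C++
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper: decodes standard base64 (RFC 4648 alphabet, '='
// padding). Returns std::nullopt on malformed input.
std::optional<std::string> decodeBase64(std::string_view in) {
  auto value = [](char c) -> int {
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
  };
  if (in.size() % 4 != 0)
    return std::nullopt;
  std::string out;
  for (std::size_t i = 0; i < in.size(); i += 4) {
    const bool last = (i + 4 == in.size());
    int pad = 0;
    unsigned bits = 0;
    for (int j = 0; j < 4; ++j) {
      const char c = in[i + j];
      if (c == '=' && last && j >= 2) { // padding may only end the final group
        ++pad;
        bits <<= 6;
        continue;
      }
      const int v = value(c);
      if (v < 0 || pad > 0) // invalid character, or data after padding
        return std::nullopt;
      bits = (bits << 6) | static_cast<unsigned>(v);
    }
    // Each group of four characters carries up to three bytes.
    out.push_back(static_cast<char>((bits >> 16) & 0xFF));
    if (pad < 2) out.push_back(static_cast<char>((bits >> 8) & 0xFF));
    if (pad < 1) out.push_back(static_cast<char>(bits & 0xFF));
  }
  return out;
}
```

For instance, `decodeBase64("aGVsbG8=")` yields the five bytes of `hello`, which is the kind of data an assembler would emit for such a payload.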

…lvm#167981) During the initialization sequence in our tests the first 'threads' response should only be kept if the process is actually stopped, otherwise we will have stale data. In VSCode, during the debug session startup sequence immediately after 'configurationDone' a 'threads' request is made. This initial request is to retrieve the main thread's name and id so the UI can be populated. However, in our tests we do not want to cache this value unless the process is actually stopped. We do need to make this initial request because lldb-dap is caching the initial thread list during configurationDone before the process is resumed. We need to make this call to ensure the cached initial threads are purged. I noticed this in a CI job for another review (https://github.com/llvm/llvm-project/actions/runs/19348261989/job/55353961798) where the tests incorrectly failed to fetch the threads prior to validating the thread names.

There is an extra underscore in build_type param in llvm#167583 patch. Fixing it in this PR.

…lvm#168433) This change adds the ACCImplicitRoutine pass which implements the OpenACC specification for implicit routine directives (OpenACC 3.4 spec, section 2.15.1). According to the specification: "If no explicit routine directive applies to a procedure whose definition appears in the program unit being compiled, then the implementation applies an implicit routine directive to that procedure if any of the following conditions holds: The procedure is called or its address is accessed in a compute region."

The pass automatically generates `acc.routine` operations for functions called within OpenACC compute constructs or within existing routine functions that do not already have explicit routine directives. It recursively applies implicit routine directives while avoiding infinite recursion when dependencies form cycles.

Key features:
- Walks through all OpenACC compute constructs (parallel, kernels, serial) to identify function calls
- Creates implicit `acc.routine` operations for functions without explicit routine declarations
- Recursively processes existing `acc.routine` operations to handle transitive dependencies
- Avoids infinite recursion through proper tracking of processed routines
- Respects device-type specific bind clauses to skip routines bound to different device types

Requirements:
- Function operations must implement `mlir::FunctionOpInterface` to be identified and associated with routine directives.
- Call operations must implement `mlir::CallOpInterface` to detect function calls and traverse the call graph.
- Optionally pre-register `acc::OpenACCSupport` if custom behavior is needed for determining if a symbol use is valid within GPU regions (such as functions which are already considered for offloading even without `acc routine` markings).

Co-authored-by: delaram-talaashrafi<dtalaashrafi@nvidia.com>
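The recursive traversal with cycle protection described above can be sketched on a plain call graph. This is an illustration, not the MLIR pass: `CallGraph`, `collectImplicit`, and `findImplicitRoutines` are hypothetical names, functions are modeled as strings, and device-type bind clauses are ignored.

```C++
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical call graph: function name -> names of the functions it calls.
using CallGraph = std::map<std::string, std::vector<std::string>>;

// Visits `fn` and its transitive callees. Functions without an explicit
// routine marking are recorded as implicit routines. The `visited` set stops
// the recursion when calls form a cycle.
void collectImplicit(const CallGraph &graph, const std::string &fn,
                     const std::set<std::string> &explicitRoutines,
                     std::set<std::string> &visited,
                     std::set<std::string> &implicitRoutines) {
  if (!visited.insert(fn).second)
    return; // already processed
  if (!explicitRoutines.count(fn))
    implicitRoutines.insert(fn);
  auto it = graph.find(fn);
  if (it == graph.end())
    return;
  for (const std::string &callee : it->second)
    collectImplicit(graph, callee, explicitRoutines, visited, implicitRoutines);
}

// Starts from the functions called inside compute regions.
std::set<std::string>
findImplicitRoutines(const CallGraph &graph,
                     const std::vector<std::string> &calledInComputeRegions,
                     const std::set<std::string> &explicitRoutines) {
  std::set<std::string> visited, implicitRoutines;
  for (const std::string &fn : calledInComputeRegions)
    collectImplicit(graph, fn, explicitRoutines, visited, implicitRoutines);
  return implicitRoutines;
}
```

Note that callees of explicitly marked routines are still traversed, mirroring the statement that existing `acc.routine` operations are processed for transitive dependencies.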

This allows SDNodes to be validated against their expected type profiles and reduces the number of changes required to add a new node. The validation functionality has detected several issues, see `PPCSelectionDAGInfo::verifyTargetNode()`. Most of the nodes have a description in `*.td` files and were successfully "imported". Those that don't have a description are listed in the enum in `PPCSelectionDAGInfo.td`. These nodes are not validated. Part of llvm#119709. Pull Request: llvm#168108

We build the callsite graph by first adding nodes and edges for all allocation contexts, then match the interior callsite nodes onto actual calls (IR or summary), which due to inlining may result in the generation of new nodes representing the inlined context sequence. We attempt to update edges correctly during this process, but in the case of recursion this becomes impossible to always get correct. Specifically, when creating new inlined sequence nodes for stack ids on recursive cycles we can't always update correctly, because we have lost the original ordering of the context. This PR introduces a mechanism, guarded by -memprof-top-n-important= flag, to keep track of extra information for the largest N cold contexts. Another flag -memprof-fixup-important (enabled by default) will perform more expensive fixup of the edges for those largest N cold contexts, by saving and walking the original ordered list of stack ids from the context.

z1-cciauto · 2025-11-17T23:48:53Z

PSDB Link: https://compiler-ci.amd.com/job/compiler-psdb-amd-staging/2842

Meinersbur and others added 30 commits November 17, 2025 17:09

[OpenMP][omptest] Fix missing source extention

e69d2bf

The file extension was accidentally omitted from llvm#164794.

[libc++] Fix __hash_table::erase(iterator, iterator) to update the bu…

a9633aa

…cket list correctly when erasing the last bucket (llvm#167865) Fixes llvm#167820

[VectorUtils] Use PatternMatch in findScalarElement (NFC) (llvm#168389)

7659cd4

[AMDGPU][GlobalISel] Add RegBankLegalize support for G_FMUL (llvm#167847

4dd2796

)

[MLIR][NVVM][Docs] Explain memory spaces (llvm#168059)

35ae515

[MC] Use MCRegister::id() to avoid implicit casts. NFC (llvm#168233)

c66f1fd

[CIR] Upstream handling for BaseToDerived casts (llvm#167769)

0c84643

Upstream handling for BaseToDerived casts, adding the cir.base_class_addr operation and lowering to LLVM IR.

[AMDGPU] Fix layering violations in AMDGPUMCExpr.cpp. NFC (llvm#168242)

72b02c7

AMDGPUMCExpr lives in the MC layer, so it should not depend on Function.h or GCNSubtarget.h. Move the function that needed GCNSubtarget to the one file that called it.

[RISCV] Fold Zba-expanded (mul (shr exact X, C1), C2) (llvm#168019)

be6296e

[PowerPC][AIX] Remove flag for no semantic interposition (llvm#168109)

a7579fd

Remove flag to specify "no semantic interposition" since this is the default for AIX.

[compiler-rt][Profile] Mark Darwin test work with internal shell

72059be

This test was using subshells and then passing the results to diff. Write out the results to files before passing to diff as the internal shell does not support subshells.

[flang][OpenMP] Undeprecate accidentally deprecated TARGET LOOP (llvm…

4be0ab6

…#167495)

[TableGen] Strip directories from filename prefixes. (llvm#168355)

cc304e5

Fixes llvm#167700 to support builds where TableGen's output file is specified as full path rather than just filename.

lialan and others added 20 commits November 17, 2025 20:46

Fix bazel dep caused by f5b7376 (llvm#168436)

b00588f

[VPlan] Replace VPIRMetadata::addMetadata with setMetadata. (NFC)

321b9d1

Replace addMetadata with setMetadata, which sets metadata, updating existing entries or adding a new entry otherwise. This isn't strictly needed at the moment, but will be needed for follow-up patches.

[LLDB] Fix test compilation errors under asan (NFC) (llvm#168408)

54c2c7c

https://green.lab.llvm.org/job/llvm.org/view/LLDB/job/lldb-cmake-sanitized/2744/consoleText

[mlir][bazel] Fix llvm#167957 (llvm#168441)

7693f12

Fixed typo in llvm-otool (llvm#168395)

0d8c294

[bazel][buildifier] reformat changes in llvm#168434 (llvm#168443)

ed617bd

[libc] Implement wcstod and wcstold. (llvm#168020)

92c8c87

These are simply implemented as specializations of strtofloatingpoint for double / long double and for wchar_t. The unit tests are copied from the strtod / strtold ones.

[mlir][amdgpu] Add lowerings for ScaledExtPacked816 (llvm#168123)

909c9aa

* Adds lowerings for amdgpu.scaled_ext_packed816
* updates verifiers

[libc]Github] Fix typo on build_type param (llvm#168453)

e89e359

There is an extra underscore in build_type param in llvm#167583 patch. Fixing it in this PR.

merge main into amd-staging

fc0b487

ronlieb requested review from a team and dpalermo November 17, 2025 23:48

ronlieb requested review from krzysz00 and kuhar as code owners November 17, 2025 23:48

ronlieb removed request for krzysz00 and kuhar November 17, 2025 23:48

dpalermo approved these changes Nov 18, 2025

z1-cciauto merged commit d6c2e8c into amd-staging Nov 18, 2025
17 checks passed

z1-cciauto deleted the amd/merge/upstream_merge_20251117172424 branch November 18, 2025 02:32
