Skip to content

Conversation

@ronlieb
Copy link
Collaborator

@ronlieb ronlieb commented Nov 17, 2025

No description provided.

Meinersbur and others added 30 commits November 17, 2025 17:09
The file extention was accidentally omitted from llvm#164794.
…cket list correctly when erasing the last bucket (llvm#167865)

Fixes llvm#167820
)

The mbarrier Ops also require access to the `mem_scope` and
`shared_space` attributes. Hence, this patch moves their definitions
to the beginning of the file alongside the other attribute definitions.

Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
…cSection (llvm#166323)

A field-named 'size' already available and perfectly usable via
inheritance from InputSection, and these variables shadow it for no good
reason.

The only interesting change here is in PaddingSection, because a
parent's field cannot be initialized via a constructor initializer list,
setting it needs to be done inside the constructor body.
…2629)

This patch adds LLVM infrastructure to support pretty printing of the
intrinsic arguments.
The motivation is to improve the readability of LLVM intrinsics and
facilitate easy
modifications and debugging of LLVM IR.

This feature adds a property `ArgInfo<ArgIndex, [ArgName<"argName">,
ImmArgPrinter<"functionName">]>`
to the intrinsic arguments to print self-explanatory inline comments for
the arguments.

The addition of pretty print support can provide a simple, low-overhead
feature that
enhances the usability of LLVM intrinsics without disrupting existing
workflows.

Link to the RFC, where this feature was discussed:

https://discourse.llvm.org/t/rfc-pretty-printing-immediate-arguments-in-llvm-intrinsics/88536

---------

Signed-off-by: Dharuni R Acharya <dharunira@nvidia.com>
Co-authored-by: Rahul Joshi <rjoshi@nvidia.com>
This PR builds upon the infrastructure set up for Sparse Tensor Loop
Ordering Heuristics (llvm#154656) by adding a preference to have dense loops
outer and sparse loops inner.

As always I'd love to get feedback and know if there's any other
direction to go with this work that might be better.
Upstream handling for BaseToDerived casts, adding the
cir.base_class_addr operation and lowering to LLVM IR.
AMDGPUMCExpr lives in the MC layer it should not depend on Function.h or
GCNSubtarget.h

Move the function that needed GCNSubtarget to the one file that called
it.
This updates the CIR direct callee builder code to handle the case of
calls to functions that were declared with an assembly label using
`__asm`. The implementation doesn't actually have any explicit handling
of the AsmLabelAttr. It is handled by the name mangler.

See https://reviews.llvm.org/D137073 and
https://reviews.llvm.org/D134362 for details on how this was implemented
in classic codegen. The test here is copied from
https://reviews.llvm.org/D134362 because the test in
https://reviews.llvm.org/D134362 requires a target that isn't yet
supported in CIR.
This PR fixes a Fortran syntax violation in the OpenMP default mapper
naming convention. The suffix .omp.default.mapper contains dots which
are invalid in Fortran identifiers, causing failures when mappers are
written to and read from module files. The fix changes the suffix to
_omp_default_mapper which uses underscores instead of dots, complying
with Fortran syntax rules.

Key changes:

- Changed OmpDefaultMapperName constant from .omp.default.mapper to
_omp_default_mapper
- Added GetUltimate() calls in mapper symbol resolution to properly
handle symbols across module boundaries
- Added new test case verifying default mappers work correctly when
defined in a module and used in consuming programs

This fixes llvm#168336.
This commit refactors the SPIRV post-legalizer to use a worklist to
process
new instructions. Previously, the post-legalizer would iterate through
all
instructions and try to assign types. This could fail if a new
instruction
depended on another new instruction that had not been processed yet.

The new implementation adds all new instructions that require a SPIR-V
type
to a worklist. It then iteratively processes the worklist until it is
empty.
This ensures that all dependencies are met before an instruction is
processed.

This change makes the post-legalizer more robust and fixes potential
ordering
issues with newly generated instructions.

Existing tests cover existing functionality. More tests will be added as
the legalizer is modified.

Part of llvm#153091
llvm#140307 added support for
cstring hashes in the orderfile to layout cstrings in a specific order,
but only when `--deduplicate-strings` is used. This PR supports cstring
ordering when `--no-deduplicate-strings` is used.

1. Create `cStringPriorities`, separate from `priorities`, to hold only
priorities for cstring pieces. This allows us to lookup by hash
directly, instead of first converting to a string. It also fixes a
contrived bug where we want to order a symbol named `CSTR;12345` rather
than a cstring.

2. Rather than calling `buildCStringPriorities()` which always
constructs and returns a vector, we use `forEachStringPiece()` to
efficiently iterate over cstring pieces without creating a new vector if
no cstring is ordered.

3. Create `SymbolPriorityEntry::{get,set}Priority()` helper functions to
simplify code.
This only does this for Linux currently as the issue-write workflow
currently does not support writing out multiple comments. This gets the
ball rolling as the failures that most people see are common to both
platforms. Ensuring we have coverage on Windows for comments will be
done in a future patch.
…5923)

This MR adds support for the `exact` flag to the
`arith.shrui/shrsi/divsi/divui` operations. The semantics are identical
to those of the LLVM dialect and the LLVM language reference.

This MR also modifies the mechanism for converting `arith` dialect
**attributes** to corresponding **properties** in the `LLVM` dialect.
(As a specific example, the integer overflow flags `nsw/nuw` are
**properties** in the `LLVM` dialect, as opposed to attributes.)

Previously, attribute converter classes were required to have a specific
method to support integer overflow flags:
```C++
template <typename SourceOp, typename TargetOp>
class AttrConvertPassThrough {
public:
  ...
  LLVM::IntegerOverflowFlags getOverflowFlags() const {
    return LLVM::IntegerOverflowFlags::none;
  }
};
```
This method was required, even for `arith` source operations that did
not use integer overflow flags (e.g. `AttrConvertFastMathToLLVM`).

This MR modifies the interface required by `arith` dialect attribute
converters to instead provide a (possibly NULL) properties attribute:
```C++
template <typename SourceOp, typename TargetOp>
class AttrConvertPassThrough {
public:
  ...
  Attribute getPropAttr() const { return {}; }
};
```
For `arith` operations with attributes that map to `LLVM` dialect
**properties**, the attribute converter can create a `DictionaryAttr`
containing target properties and return that attribute from the
attribute converter's `getPropAttr()` method. The `arith` attribute
conversion framework will set the `propertiesAttr` of an
`OperationState`, and the target operation's `setPropertiesFromAttr()`
method will be invoked to set the properties when the target operation
is created. The `AttrConvertOverflowToLLVM` class in this MR uses the
new approach.
… marked as Pure (llvm#166648)

This MR modifies side effect traits of some integer arithmetic
operations in the LLVM dialect.

Prior to this MR, the LLVM dialect `sdiv` and `udiv` operations were
marked as `Pure` through `tblgen` inheritance of the
`LLVM_ArithmeticOpBase` class. The `Pure` trait allowed incorrect
hoisting of `sdiv`/`udiv` operations by the
`loop-independent-code-motion` pass.

This MR modifies the `sdiv` and `udiv` LLVM operations to have traits
and code motion behavior identical to their counterparts in the `arith`
dialect, which were established by the commit/review below.


llvm@ed39825
https://reviews.llvm.org/D137814
Remove flag to sepecifcy "no semantic interposition" since this is the
default for AIX.
This test was using subshells and then passing the results to diff. Write out
the results to files before passing to diff as the internal shell does not
support subshells.
This PR adds initial support for codegen of `blockAddressOp`. This is
emitted when using the GNU extension labels as values. The operation is
used together with `indirectBrOp`, which will be implemented in a future
PR. Lowering will be added in a later PR.
As the PGPF effort has been turned down, there is no current way to
generate profiles that will be used by these passes. Current efforts are
also focused around inserting prefetches in PLO optimizers, which have a
more accurate view of how the code looks.
20a22a4 was supposed to fully remove
these, but left around the functionality to actually compute them and a
unittest that ensured they worked. These are not development features in
the sense of features used in development mode, but experimental
features that have been superseded by MIR2Vec.
…68137)

This test uses `ulimit -f 1` to test what libFuzzer does when trying to
create a file > **_1KB_**. However, the none of the input files used by
this test are actually >= 1KB, so there's no reason to expect this test
to pass.

This test appears to be passing on accident since the "control file"
happens to be > 1KB, but this is not always the case depending upon the
length of the path where the test is run from.

This modifies the test to ensure that one of the input file is actually
>1KB.
Fixes llvm#167700 to support
builds where TableGen's output file is specified as full path
rather than just filename.
Required after llvm#167700

This adds yet another format for `tbl_outs` where you pass the list of
opts, and a list of outputs (where previously you could only have 1
output). In that case all outputs must be produced, but the first is
used for the `-o` arg since tblgen is generating the other names based
on that single argument.
AIX out-of-box memory soft limits are often
insufficient to run LLVM on reasonably size
inputs. Thus, we often encounter users who run
into spurious out of memory errors.

This change raises the memory soft limits
to the hard limits at LLVM startup to prevent
these types of issues.
lialan and others added 20 commits November 17, 2025 20:46
Replace addMetadata with setMetadata, which sets metadata, updating
existing entries or adding a new entry otherwise.

This isn't strictly needed at the moment, but will be needed for
follow-up patches.
…vm#167065)

Otherwise, we end up using whatever system-provided compiler runtime is
available, which doesn't work on macOS since compiler-rt is located
inside the toolchain path, which can't be found by default.

However, disable the tests for compiler-rt since those are linking
against the system C++ standard library while using the just-built
libc++ headers, which is non-sensical and leads to undefined references
on macOS.
… tests (llvm#167346)

We want to eliminate all .compile.fail.cpp tests since they are brittle:
these tests pass regardless of the specific compilation error, which
means that e.g. a mising include will render the test null.

This is not an exhaustive pass, just a few tests I stumbled upon.
…#167253)

Update VPlan to populate VPIRMetadata during VPInstruction construction
and use it when creating widened recipes, instead of constructing
VPIRMetadata from the underlying IR instruction each time.

This centralizes VPIRMetadata in VPInstructions and ensures metadata is
consistently available throughout VPlan transformations.

PR: llvm#167253
These are simply implemented as specializations of strtofloatingpoint
for double / long double and for wchar_t. The unit tests are copied from
the strtod / strtold ones.
…cking safe patterns, if "cond" is a constant (llvm#167989)

In `-Wunsafe-buffer-usage`, many safe pattern checks can benefit from
constant folding. This commit improves null-terminated pointer checks by
folding conditional expressions.

rdar://159374822

---------

Co-authored-by: Balázs Benics <benicsbalazs@gmail.com>
* Adds lowerings for amdgpy.scaled_ext_packed816
* updates verifiers
…167661)

The motivation is to allow passes such as MachineLICM to hoist trivial
FMOV instructions out of loops, where previously it didn't do so even
when the RHS is a constant.
On most architectures, these expensive move instructions have a latency
of 2-6 cycles, and certainly not cheap as a 0-1 cycle move.
Starting in version 15, GCC emits a `.base64` directive instead of
`.string` or `.ascii` for char arrays of length `>= 3`.

See [this godbolt link](https://godbolt.org/z/ebhe3oenv) for an example.

This patch adds support for the .base64 directive to AsmParser.cpp, so
tools like `llvm-mc` can process the output of GCC more effectively.

This addresses llvm#165499.
…lvm#167981)

During the initialization sequence in our tests the first 'threads'
response sould only be kept if the process is actually stopped,
otherwise we will have stale data.

In VSCode, during the debug session startup sequence immediately after
'configurationDone' a 'threads' request is made. This initial request is
to retrieve the main threads name and id so the UI can be populated.
However, in our tests we do not want to cache this value unless the
process is actually stopped. We do need to make this initial request
because lldb-dap is caching the initial thread list during
configurationDone before the process is resumed. We need to make this
call to ensure the cached initial threads are purged.

I noticed this in a CI job for another review
(https://github.com/llvm/llvm-project/actions/runs/19348261989/job/55353961798)
where the tests incorrectly failed to fetch the threads prior to
validating the thread names.
There is an extra underscore in build_type param in llvm#167583 patch.
Fixing it in this PR.
…lvm#168433)

This change adds the ACCImplicitRoutine pass which implements the
OpenACC specification for implicit routine directives (OpenACC 3.4 spec,
section 2.15.1).

According to the specification: "If no explicit routine directive
applies to a procedure whose definition appears in the program unit
being compiled, then the implementation applies an implicit routine
directive to that procedure if any of the following conditions holds:
The procedure is called or its address is accessed in a compute region."

The pass automatically generates `acc.routine` operations for functions
called within OpenACC compute constructs or within existing routine
functions that do not already have explicit routine directives. It
recursively applies implicit routine directives while avoiding infinite
recursion when dependencies form cycles.

Key features:
- Walks through all OpenACC compute constructs (parallel, kernels,
serial) to identify function calls
- Creates implicit `acc.routine` operations for functions without
explicit routine declarations
- Recursively processes existing `acc.routine` operations to handle
transitive dependencies
- Avoids infinite recursion through proper tracking of processed
routines
- Respects device-type specific bind clauses to skip routines bound to
different device types

Requirements:
- Function operations must implement `mlir::FunctionOpInterface` to be
identified and associated with routine directives.
- Call operations must implement `mlir::CallOpInterface` to detect
function calls and traverse the call graph.
- Optionally pre-register `acc::OpenACCSupport` if custom behavior is
needed for determining if a symbol use is valid within GPU regions (such
as functions which are already considerations for offloading even
without `acc routine` markings)

Co-authored-by: delaram-talaashrafi<dtalaashrafi@nvidia.com>
This allows SDNodes to be validated against their expected type profiles
and reduces the number of changes required to add a new node.

The validation functionality has detected several issues, see
`PPCSelectionDAGInfo::verifyTargetNode()`.

Most of the nodes have a description in `*.td` files and were
successfully "imported". Those that don't have a description are listed
in the enum in `PPCSelectionDAGInfo.td`. These nodes are not validated.

Part of llvm#119709.

Pull Request: llvm#168108
We build the callsite graph by first adding nodes and edges for all
allocation contexts, then match the interior callsite nodes onto actual
calls (IR or summary), which due to inlining may result in the
generation of new nodes representing the inlined context sequence. We
attempt to update edges correctly during this process, but in the case
of recursion this becomes impossible to always get correct.
Specifically, when creating new inlined sequence nodes for stack ids on
recursive cycles we can't always update correctly, because we have lost
the original ordering of the context.

This PR introduces a mechanism, guarded by -memprof-top-n-important=
flag, to keep track of extra information for the largest N cold
contexts. Another flag -memprof-fixup-important (enabled by default)
will perform more expensive fixup of the edges for those largest N cold
contexts, by saving and walking the original ordered list of stack ids
from the context.
@ronlieb ronlieb requested review from a team and dpalermo November 17, 2025 23:48
@ronlieb ronlieb removed request for krzysz00 and kuhar November 17, 2025 23:48
@z1-cciauto
Copy link
Collaborator

@z1-cciauto z1-cciauto merged commit d6c2e8c into amd-staging Nov 18, 2025
17 checks passed
@z1-cciauto z1-cciauto deleted the amd/merge/upstream_merge_20251117172424 branch November 18, 2025 02:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.