merge main into amd-staging #554

ronlieb · 2025-11-10T23:16:01Z

No description provided.

… blocks (llvm#160449) Split off from llvm#158690. Currently, if an instruction needs to be predicated due to tail folding, it will also have a predicated discount applied to it in multiple places. This is likely inaccurate because we can expect a tail-folded instruction to be executed on every iteration bar the last. This fixes it by checking if the instruction/block was originally predicated, and in doing so prevents vectorization with tail folding where we would have had to scalarize the memory op anyway. On llvm-test-suite this causes 4 loops in total to no longer be vectorized with -O3 on arm64-apple-darwin, and there's no observable performance impact.
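The distinction drawn in this commit message can be illustrated with a small cost-model sketch. Everything here is an assumption made for illustration (the `InstrInfo` struct, its field names, and the divisor of 2); it is not the LoopVectorize implementation. The discount is applied only when the instruction was predicated in the original loop, and predication that exists solely because of tail folding is ignored.

```cpp
#include <cstdint>

// Hypothetical description of an instruction for costing purposes.
struct InstrInfo {
  std::uint64_t ScalarCost;      // cost of one scalar execution
  bool PredicatedInOriginalLoop; // guarded by a condition in the source loop
  bool PredicatedByTailFolding;  // masked only because the tail is folded
};

// Assumed discount: a conditionally executed block runs on roughly half of
// the iterations. The value 2 is an illustrative assumption.
constexpr std::uint64_t PredBlockCostDivisor = 2;

// Cost of scalarizing the instruction for a vectorization factor VF.
// PredicatedByTailFolding is deliberately not consulted: a tail-folded
// instruction is expected to execute on every iteration except the last.
std::uint64_t scalarizedCost(const InstrInfo &I, unsigned VF) {
  std::uint64_t Cost = I.ScalarCost * VF;
  if (I.PredicatedInOriginalLoop)
    Cost /= PredBlockCostDivisor;
  return Cost;
}
```

With these assumptions, an instruction that is masked only by tail folding keeps its full cost, which is what makes the scalarized memory op look as expensive as it is likely to be.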

…lvm#166855) These checks ensure that retained nodes of a DISubprogram belong to the subprogram. Tests with incorrect IR are fixed. We should not have variables of one subprogram present in retained nodes of other subprograms. Also, the interface for accessing DISubprogram's retained nodes is slightly refactored. `DISubprogram::visitRetainedNodes` and `DISubprogram::forEachRetainedNode` are added to avoid repeating checks like

```
if (const auto *LV = dyn_cast<DILocalVariable>(N))
  ...
else if (const auto *L = dyn_cast<DILabel>(N))
  ...
else if (const auto *IE = dyn_cast<DIImportedEntity>(N))
  ...
```
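A rough sketch of the idea behind such a visitor helper, using hypothetical stand-in types (`NodeKind`, `Node`, `visitNode`, `countByKind`) rather than the LLVM metadata classes or the actual signatures added by the patch: the kind dispatch is written once, and callers pass one callback per kind.

```cpp
#include <vector>

// Hypothetical stand-ins for the retained-node kinds; not the LLVM classes.
enum class NodeKind { LocalVariable, Label, ImportedEntity };
struct Node {
  NodeKind Kind;
};

// Dispatches on the node kind in one place, so callers do not have to
// repeat the if/else-if chain themselves.
template <typename VarFn, typename LabelFn, typename ImportFn>
void visitNode(const Node &N, VarFn OnVar, LabelFn OnLabel, ImportFn OnImport) {
  switch (N.Kind) {
  case NodeKind::LocalVariable:
    OnVar(N);
    break;
  case NodeKind::Label:
    OnLabel(N);
    break;
  case NodeKind::ImportedEntity:
    OnImport(N);
    break;
  }
}

struct KindCounts {
  int Vars = 0;
  int Labels = 0;
  int Imports = 0;
};

// Example caller: counts nodes of each kind without any casting logic.
KindCounts countByKind(const std::vector<Node> &Nodes) {
  KindCounts C;
  for (const Node &N : Nodes)
    visitNode(
        N, [&C](const Node &) { ++C.Vars; },
        [&C](const Node &) { ++C.Labels; },
        [&C](const Node &) { ++C.Imports; });
  return C;
}
```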

…le> (llvm#167232)

…#167307)

This paper allows use of * in a multidimensional array extent within a _Generic selection association, as a wildcard for any array extent. Clang does not currently support this feature, so this is just some initial test coverage along with an update to the conformance site.

…lvm#166202) `DISubprogram`s are attached to call sites to support various debug info features, including entry values and tail calls. Clang 9.0 (0f65168) was the first version to include this kind of call site `DISubprogram` attachment. This earlier work appears to visit only some call site variants, however. The call site attachment was added to a higher-level `EmitCall` path in Clang's code gen that is only used by some call variants. In particular, some C++ member calls use a different code gen path, which did not include this call site attachment step, and thus the debug info it triggers (e.g. call site entries) was not emitted for such calls. This moves `DISubprogram` attachment to a lower-level call emission path that is used by all call variants. Fixes llvm#161962

… mode (llvm#166576) Fixes a bug causing every conversion to fail fatally with "expected pattern to replace the root operation or modify it in place" when `MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS` is enabled and pattern rollback is disabled. When `allowPatternRollback` is disabled, the rewriter does not keep track of the rewrites it performs and can therefore not use that list to check whether the root op was replaced or updated in place.

Add an error test to check that a suitable error diagnostic is generated for the use of the GL::unpackhalf2x16 operation in invalid contexts. Fixes llvm#166965 Co-authored-by: Tim Corringham <tcorring@amd.com>

…lvm#166941)

…eger comparisons (llvm#166778) A generic alternative to llvm#166564: assume that expanding integer comparisons will be expensive if they are larger than the largest legal type, and so avoid sinking them if they are also used in the current BB + any phis. Fixes llvm#166534

…m#167226) It's supported together with the other spellings and results in the same attribute. Document it and prefer it in the documentation as the `asm()` spelling is C++ and GNU-only. See: llvm#167221 (comment)

Add `transform.xegpu.get_desc_op` transform op that finds a `xegpu.create_nd_tdesc` producer op of a `Value`.

Added CONSTEXPR macro and test for the following intrinsics:

- _mm_mask_adds_epi16 _mm_maskz_adds_epi16
- _mm_mask_adds_epi8 _mm_maskz_adds_epi8
- _mm_mask_adds_epu16 _mm_maskz_adds_epu16
- _mm_mask_adds_epu8 _mm_maskz_adds_epu8
- _mm_mask_broadcastb_epi8 _mm_maskz_broadcastb_epi8
- _mm_mask_broadcastw_epi16 _mm_maskz_broadcastw_epi16
- _mm_mask_cvtepi8_epi16 _mm_maskz_cvtepi8_epi16
- _mm_mask_cvtepu8_epi16 _mm_maskz_cvtepu8_epi16
- _mm_mask_packs_epi16 _mm_maskz_packs_epi16
- _mm_mask_packs_epi32 _mm_maskz_packs_epi32
- _mm_mask_packus_epi16 _mm_maskz_packus_epi16
- _mm_mask_packus_epi32 _mm_maskz_packus_epi32
- _mm_mask_set1_epi16 _mm_maskz_set1_epi16
- _mm_mask_set1_epi8 _mm_maskz_set1_epi8
- _mm_mask_slli_epi16 _mm_mask_slli_epi16
- _mm_mask_subs_epi16 _mm_maskz_subs_epi16
- _mm_mask_subs_epi8 _mm_maskz_subs_epi8
- _mm_mask_subs_epu16 _mm_maskz_subs_epu16
- _mm_mask_subs_epu8 _mm_maskz_subs_epu8
- _mm_mask_unpackhi_epi16 _mm_maskz_unpackhi_epi16
- _mm_mask_unpackhi_epi8 _mm_maskz_unpackhi_epi8
- _mm_mask_unpacklo_epi16 _mm_maskz_unpacklo_epi16
- _mm_mask_unpacklo_epi8 _mm_maskz_unpacklo_epi8
- _mm256_mask_adds_epi16 _mm256_maskz_adds_epi16
- _mm256_mask_adds_epi8 _mm256_maskz_adds_epi8
- _mm256_mask_adds_epu16 _mm256_maskz_adds_epu16
- _mm256_mask_adds_epu8 _mm256_maskz_adds_epu8
- _mm256_mask_broadcastb_epi8 _mm256_maskz_broadcastb_epi8
- _mm256_mask_broadcastw_epi16 _mm256_maskz_broadcastw_epi16
- _mm256_mask_cvtepi8_epi16 _mm256_maskz_cvtepi8_epi16
- _mm256_mask_cvtepu8_epi16 _mm256_maskz_cvtepu8_epi16
- _mm256_mask_packs_epi16 _mm256_maskz_packs_epi16
- _mm256_mask_packs_epi32 _mm256_maskz_packs_epi32
- _mm256_mask_packus_epi16 _mm256_maskz_packus_epi16
- _mm256_mask_packus_epi32 _mm256_maskz_packus_epi32
- _mm256_mask_set1_epi16 _mm256_maskz_set1_epi16
- _mm256_mask_set1_epi8 _mm256_maskz_set1_epi8
- _mm256_mask_slli_epi16 _mm256_mask_slli_epi16
- _mm256_mask_subs_epi16 _mm256_maskz_subs_epi16
- _mm256_mask_subs_epi8 _mm256_maskz_subs_epi8
- _mm256_mask_subs_epu16 _mm256_maskz_subs_epu16
- _mm256_mask_subs_epu8 _mm256_maskz_subs_epu8
- _mm256_mask_unpackhi_epi16 _mm256_maskz_unpackhi_epi16
- _mm256_mask_unpackhi_epi8 _mm256_maskz_unpackhi_epi8
- _mm256_mask_unpacklo_epi16 _mm256_maskz_unpacklo_epi16
- _mm256_mask_unpacklo_epi8 _mm256_maskz_unpacklo_epi8
- _mm512_mask_adds_epi16 _mm512_maskz_adds_epi16
- _mm512_mask_adds_epi8 _mm512_maskz_adds_epi8
- _mm512_mask_adds_epu16 _mm512_maskz_adds_epu16
- _mm512_mask_adds_epu8 _mm512_maskz_adds_epu8
- _mm512_mask_broadcastb_epi8 _mm512_maskz_broadcastb_epi8
- _mm512_mask_broadcastw_epi16 _mm512_maskz_broadcastw_epi16
- _mm512_mask_mov_epi16 _mm512_maskz_mov_epi16
- _mm512_mask_mov_epi8 _mm512_maskz_mov_epi8
- _mm512_mask_packs_epi16 _mm512_maskz_packs_epi16
- _mm512_mask_packs_epi32 _mm512_maskz_packs_epi32
- _mm512_mask_packus_epi16 _mm512_maskz_packus_epi16
- _mm512_mask_packus_epi32 _mm512_maskz_packus_epi32
- _mm512_mask_set1_epi16 _mm512_maskz_set1_epi16
- _mm512_mask_set1_epi8 _mm512_maskz_set1_epi8
- _mm512_mask_subs_epi16 _mm512_maskz_subs_epi16
- _mm512_mask_subs_epi8 _mm512_maskz_subs_epi8
- _mm512_mask_subs_epu16 _mm512_maskz_subs_epu16
- _mm512_mask_subs_epu8 _mm512_maskz_subs_epu8
- _mm512_mask_unpackhi_epi16 _mm512_maskz_unpackhi_epi16
- _mm512_mask_unpackhi_epi8 _mm512_maskz_unpackhi_epi8
- _mm512_mask_unpacklo_epi16 _mm512_maskz_unpacklo_epi16
- _mm512_mask_unpacklo_epi8 _mm512_maskz_unpacklo_epi8

Closes llvm#162070

Adds `transform.xegpu.set_op_layout_attr` transform op that attaches `xegpu.layout` attribute to the target op.

…llvm#166961) Add a test case to check that lib calls are used for the memmove millicode.

…166459) First, for internal variables, they are always global, so use the global AS by default unless specified otherwise. We can't really use `0` as a default like we do now because that has an actual meaning on some targets, so we really need specified vs unspecified, so I used `std::optional` which is already used in many places in OMPIRBuilder. Second, for the critical lock variable, add an addrspace cast if needed. Signed-off-by: Nick Sarnie <nick.sarnie@intel.com>
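The "specified vs unspecified" point can be sketched in a few lines. The function name `resolveAddressSpace` and the `GlobalAS` parameter are hypothetical, chosen for illustration, and are not part of OMPIRBuilder. The point is that an explicit `0` must stay distinguishable from "no address space given", which a plain `unsigned` default cannot express.

```cpp
#include <optional>

// Picks the address space for an internal variable.
// Requested == std::nullopt means "unspecified": fall back to the target's
// global address space. An explicit 0 is honored as a real request.
unsigned resolveAddressSpace(std::optional<unsigned> Requested,
                             unsigned GlobalAS) {
  return Requested.value_or(GlobalAS);
}
```

On a target whose global address space is, say, 1, an unspecified request yields 1, while an explicit request for 0 still yields 0.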

mlir-opt's registerAndParseCLIOptions() forces users to both register default MLIR options and parse the command line string. Custom mlir-opt implementations, however, may need to provide their own options or their own parsing. Separating the two functions makes it easier to achieve the necessary customizations. For example, one can register "default" options, then register custom options (not available in standard mlir-opt), then parse all of them. Other cases include two-stage parsing, where some additional options become available based on parsed information (e.g. the compilation target can allow additional options to be present).
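The register-then-parse flow described here can be sketched with a toy option registry. The `OptionRegistry` class, `registerDefaultOptions`, and the option names are all hypothetical and do not reflect the MLIR or LLVM command-line APIs; the sketch only shows why having registration and parsing as separate steps lets a tool add its own options in between.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy registry: options must be registered before they can be parsed.
class OptionRegistry {
  std::map<std::string, std::string> Values; // option name -> current value

public:
  void registerOption(const std::string &Name, const std::string &Default) {
    Values[Name] = Default;
  }

  // Parses arguments of the form "--name=value".
  // Returns false on a malformed argument or an unregistered option.
  bool parse(const std::vector<std::string> &Args) {
    for (const std::string &Arg : Args) {
      if (Arg.rfind("--", 0) != 0)
        return false;
      std::size_t Eq = Arg.find('=');
      if (Eq == std::string::npos)
        return false;
      auto It = Values.find(Arg.substr(2, Eq - 2));
      if (It == Values.end())
        return false;
      It->second = Arg.substr(Eq + 1);
    }
    return true;
  }

  std::string get(const std::string &Name) const {
    auto It = Values.find(Name);
    return It == Values.end() ? std::string() : It->second;
  }
};

// Step 1: the "default" options every tool gets (illustrative name).
void registerDefaultOptions(OptionRegistry &R) {
  R.registerOption("verify", "true");
}
```

A custom tool would call `registerDefaultOptions`, then `registerOption` for its own flags, and only then `parse`. A two-stage variant would parse once, register more options based on the result, and parse the remaining arguments.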

…#166782) According to SPIR-V spec: > It is invalid to decorate any given id or structure member more than one time with the same [decoration](https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#Decoration), unless explicitly allowed below for a specific decoration. `FuncParamAttr` explicitly allows multiple uses of the decoration on the same id, so this patch honors it.
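One way to picture the rule is a tracker that rejects a repeated (id, decoration) pair unless the decoration is on an allow-list. This is a sketch under assumptions: the `DecorationTracker` class and its `shouldEmit` method are hypothetical, only two decorations are modeled, and the patch itself may be structured quite differently.

```cpp
#include <set>
#include <utility>

// Only two decorations are modeled here for illustration.
enum class Decoration { FuncParamAttr, Alignment };

class DecorationTracker {
  std::set<std::pair<unsigned, Decoration>> Seen;

public:
  // Returns true if the decoration may be emitted for this id.
  // FuncParamAttr is explicitly allowed to repeat on the same id;
  // any other decoration is accepted only the first time.
  bool shouldEmit(unsigned Id, Decoration D) {
    if (D == Decoration::FuncParamAttr)
      return true;
    return Seen.insert(std::make_pair(Id, D)).second;
  }
};
```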

This PR adds all the missing doc strings in IRCore.cpp. It also 1. Normalizes all doc strings to have proper punctuation; 2. Inlines non-duplicated docstrings which are currently at the top of the source file (and thereby possibly out of sync). Follow-up PRs will do the same for the rest of the modules/source files. --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

- unify isRAStateSigned and isRAStateUnsigned to a common getRAState, - unify setRASigned and setRAUnsigned into setRAState(MCInst, bool), - update users of these to match the new implementations.
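A minimal sketch of what such a unification can look like, assuming a tri-state annotation (signed, unsigned, or not yet known). The `Inst` struct is a hypothetical stand-in for `MCInst`, and the `std::optional<bool>` return type of `getRAState` is an assumption; the commit message only names the functions.

```cpp
#include <optional>

// Hypothetical stand-in for an instruction carrying an RA-state annotation.
struct Inst {
  std::optional<bool> RASigned; // nullopt: no state recorded yet
};

// Replaces a pair of isRAStateSigned / isRAStateUnsigned queries with one
// accessor: true = signed, false = unsigned, nullopt = unknown.
std::optional<bool> getRAState(const Inst &I) { return I.RASigned; }

// Replaces a pair of setRASigned / setRAUnsigned setters with one setter.
void setRAState(Inst &I, bool Signed) { I.RASigned = Signed; }
```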

…ubstNonTypeTemplateParmPackExpr, PseudoObjectExpr (llvm#160904) Add new visit functions to ASTImporter for CXXParenListInitExpr, SubstNonTypeTemplateParmPackExpr and PseudoObjectExpr. In CTU analysis there are a lot of "cannot import unsupported AST node" errors for CXXParenListInitExpr, SubstNonTypeTemplateParmPackExpr and PseudoObjectExpr. The problem occurred after full support for Concepts was added to the importer.

llvm#167332) Fixes 1553f90

)

…vm#167265)

…4648) Flang on Windows added `-latomic` to the link line. This library does not exist on Windows and the linker gives a warning.

After llvm#163011 was merged, the tests in [`offload/test/offloading/gpupgo`](https://github.com/llvm/llvm-project/compare/main...EthanLuisMcDonough:llvm-project:gpupgo-names-fix-pr?expand=1#diff-f769f6cebd25fa527bd1c1150cc64eb585c41cb8a8b325c2bc80c690e47506a1) broke because the offload plugins were no longer able to find `__llvm_prf_nm`. This pull request explicitly makes `__llvm_prf_nm` visible to the host on GPU targets and reverses the changes made in f7e9968.

…ptions library (llvm#163659) This change moves option-related code from clangDriver into a new clangOptions library. This refactoring is part of a broader effort to support driver-managed builds for compilations using C++ named modules and/or Clang modules. It is required for linking the dependency scanning tooling against the driver without introducing cyclic dependencies, which would otherwise cause build failures when dynamic linking is enabled. In particular, clangFrontend must no longer depend on clangDriver for this to be possible. This PR is motivated by the following review comment: llvm#152770 (comment)

…ffset (llvm#167231)

) In `<__cxxabi_config.h>` there were a few things still around which aren't ever actually used. This removes some of that cruft.

…6982) sincospi/sincospif/sincospil do not appear to exist on common targets. Darwin targets have __sincospi and __sincospif, so define and use those implementations. I have no idea what version added those calls, so I'm just guessing it's the same conditions as __sincos_stret. Most of this patch is working to preserve codegen when a vector library is explicitly enabled. This only covers sleef and armpl, as those are the only cases tested. The multiple result libcalls have an aberrant process where the legalizer looks for the scalar type's libcall in RuntimeLibcalls, and then cross-references TargetLibraryInfo to find a matching vector call. This was unworkable in the sincospi case, since the common case is that there is no scalar call available. To preserve codegen if the call is available, first try to match a libcall with the vector type before falling back on the old scalar search. Eventually all of this logic should be contained in RuntimeLibcalls, without the link to TargetLibraryInfo. In principle we should perform the same legalization logic as for an ordinary operation, trying to find a matching subvector type with a libcall.
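The lookup order described here (vector type first, then the older scalar-based search) can be sketched with plain maps. The `Tables` struct, the function `findVectorCall`, and every call name used with it are hypothetical; none of this is the RuntimeLibcalls or TargetLibraryInfo API.

```cpp
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical lookup tables, keyed by element type name and lane count.
struct Tables {
  // (element type, lanes) -> vector libcall registered directly
  std::map<std::pair<std::string, unsigned>, std::string> VectorLibcalls;
  // element type -> scalar libcall
  std::map<std::string, std::string> ScalarLibcalls;
  // (scalar libcall, lanes) -> vector variant of that scalar call
  std::map<std::pair<std::string, unsigned>, std::string> VectorVariants;
};

std::optional<std::string> findVectorCall(const Tables &T,
                                          const std::string &Elt,
                                          unsigned Lanes) {
  // New first step: look for a libcall registered for the vector type.
  auto Direct = T.VectorLibcalls.find(std::make_pair(Elt, Lanes));
  if (Direct != T.VectorLibcalls.end())
    return Direct->second;

  // Old path: find the scalar libcall, then map it to a vector variant.
  // This fails whenever no scalar call exists for the element type.
  auto Scalar = T.ScalarLibcalls.find(Elt);
  if (Scalar == T.ScalarLibcalls.end())
    return std::nullopt;
  auto Variant = T.VectorVariants.find(std::make_pair(Scalar->second, Lanes));
  if (Variant == T.VectorVariants.end())
    return std::nullopt;
  return Variant->second;
}
```

Trying the direct table first means a vector call can still be found when the scalar table has no entry, which is the situation the message describes for sincospi.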

…lvm#166612) Complex part designators do not have their own symbols. A symbol obtained for the expression `x%re` will be the symbol for `x`, and in this case x is allowed to be allocatable. Fixes llvm#166278.

At the moment the behavior is no different from Version 3.

…vm#166995) There is a possible nullptr deref in BuildCXXNestedNameSpecifier when calling ExtendNestedNameSpecifier or using isa<>. This initially showed up as a crash in clangd that didn't manifest when compiling with clang. The reduced test case added in this patch, however, does expose the issue in clang. Testing locally shows that both this test case and the original clangd issue are fixed by checking the validity of the pointer before trying to dispatch. Since all code paths require the pointer to be valid (usually by virtue of a dyn_cast or isa<> check), there should be no functional difference. Fixes llvm#166843
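The shape of the fix, checking the pointer before any kind-based dispatch, can be shown with a tiny stand-alone example. The `Specifier` type and `isTypeSpecifier` function are hypothetical and are not Clang's classes; the real patch guards calls inside BuildCXXNestedNameSpecifier.

```cpp
// Hypothetical stand-in for a node whose kind is inspected before dispatch.
enum class SpecKind { Namespace, Type };
struct Specifier {
  SpecKind Kind;
};

// An isa<>-style query that tolerates a null pointer: the validity check
// comes first, so the kind is never read through a null pointer.
bool isTypeSpecifier(const Specifier *S) {
  return S != nullptr && S->Kind == SpecKind::Type;
}
```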

…3618) Fixes llvm#163256

…m#167373) Revert "[CIR][NFC] Add test for Complex imag with GUN extension" to fix the name This reverts commit 9f0c449.

…m#160232) On AArch64, when ADRP and its user instructions (LDR, ADD, etc.) reference a GOT symbol and are separated into different functions by the machine outliner, a correctness issue in the linker's ICF is exposed. In such cases, user instructions can end up pointing to a folded section (with its canonical folded symbol), while the ADRP instruction points to a GOT entry corresponding to the original symbol. This leads to loading from an incorrect memory address after ICF. llvm#129122 explains how this can happen in detail. This addresses llvm#131660, which should fix two things: 1. Hide the correctness issue described above in the LLVM linker. 2. Allow optimizations that could relax GOT addressing to PC-relative addressing.

) This reverts commit c0e4bce. This was causing issues on older python versions. They are fixed in the reland and have been tested as working.

…llvm#166983) These are the tested set of libcalls used for codegen of llvm.sincos and are needed to get the legalization to follow standard procedure.

llvm#166459)" This reverts commit c17a839.

z1-cciauto · 2025-11-10T23:17:37Z

PSDB Link: https://compiler-ci.amd.com/job/compiler-psdb-amd-staging/2759

lukel97 and others added 30 commits November 10, 2025 12:10

Remove unused standard headers: <string>, <optional>, <numeric>, <tup…

28d9f99

…le> (llvm#167232)

[X86] 2012-01-10-UndefExceptionEdge.ll - regenerate test checks (llvm…

2705951

…#167307)

[HLSL][SPIRV] Add error test for unpackhalf2x16 (llvm#166969)

be84705

Add an error test to check that a suitable error diagnostic is generated for the use of the GL::unpackhalf2x16 operation in invalid contexts. Fixes llvm#166965 Co-authored-by: Tim Corringham <tcorring@amd.com>

[MLIR][XeGPU] Decouple inst_data and lane_layout in propagation (l…

bba40ab

…lvm#166941)

[MLIR][XeGPU][TransformOps] Add get_desc_op (llvm#166801)

1553f90

Add `transform.xegpu.get_desc_op` transform op that finds a `xegpu.create_nd_tdesc` producer op of a `Value`.

[MLIR][XeGPU][TransformOps] Add set_op_layout_attr op (llvm#166854)

94a7006

Adds `transform.xegpu.set_op_layout_attr` transform op that attaches `xegpu.layout` attribute to the target op.

[NFC][PowerPC] Pre-commit adding test case: use millicode for memmove (…

69c8756

…llvm#166961) Add a test case to check that lib calls are used for the memmove millicode.

[BOLT] Simplify RAState helpers (NFCI) (llvm#162820)

cd68056

- unify isRAStateSigned and isRAStateUnsigned to a common getRAState, - unify setRASigned and setRAUnsigned into setRAState(MCInst, bool), - update users of these to match the new implementations.

[BAZEL] Add missing dependency on /llvm:Support from XeGPUTransformOps (

9625cf6

llvm#167332) Fixes 1553f90

AMDGPU: Add baseline test for known bits of AssertNoFPClass (llvm#167288

741ba82

)

AMDGPU: Add baseline test for nofpclass on call results (llvm#167263)

726c049

AMDGPU: Add baseline tests for copysign with known signmask input (ll…

54053cf

…vm#167265)

[Flang][driver] Do not emit -latomic on link line on Windows (llvm#16…

b9e22cc

…4648) Flang on Windows added `-latomic` to the link line. This library does not exist on Windows and the linker gives a warning.

[flang][cuda] Fix detection of assumed size arrays in shared memory o…

0bae337

…ffset (llvm#167231)

[SandboxIR] Fix typo in doc (llvm#167315)

61e5bc3

philnik777 and others added 12 commits November 10, 2025 20:04

[libc++abi][NFC] Remove some cruft from <__cxxabi_config.h> (llvm#164578

46a8ddb

) In `<__cxxabi_config.h>` there were a few things still around which aren't ever actually used. This removes some of that cruft.

[flang][OpenMP] Detect complex part designators in atomic variables (l…

89577e9

…lvm#166612) Complex part designators do not have their own symbols. A symbol obtained for the expression `x%re` will be the symbol for `x`, and in this case x is allowed to be allocatable. Fixes llvm#166278.

[NFC][SpecialCaseList] Precommit Version 4 tests (llvm#167282)

efc83cc

At the moment the behavior is no different from Version 3.

[WebAssembly] Enable musttail only when tail-call is enabled (llvm#16…

5c4083e

…3618) Fixes llvm#163256

Revert "[CIR][NFC] Add test for Complex imag with GUN extension" (llv…

70a6475

…m#167373) Revert "[CIR][NFC] Add test for Complex imag with GUN extension" to fix the name This reverts commit 9f0c449.

Reapply "[CI] Make premerge_advisor_explain write comments" (llvm#167198

eae817d

) This reverts commit c0e4bce. This was causing issues on older python versions. They are fixed in the reland and have been tested as working.

RuntimeLibcalls: Add call entries for sincos sleef and armpl libcalls (…

f2f04c3

…llvm#166983) These are the tested set of libcalls used for codegen of llvm.sincos and are needed to get the legalization to follow standard procedure.

Revert "[OMPIRBuilder] Fix addrspace of internal critical section lock (

26b4ac0

llvm#166459)" This reverts commit c17a839.

merge main into amd-staging

70559d4

ronlieb requested review from a team and 13524182838 November 10, 2025 23:16

ronlieb requested review from fabianmcg, krzysz00, kuhar, nicolasvasilache and stellaraccident as code owners November 10, 2025 23:16

ronlieb requested review from dpalermo and removed request for 13524182838, fabianmcg, krzysz00, kuhar, nicolasvasilache and stellaraccident November 10, 2025 23:16

dpalermo approved these changes Nov 11, 2025

z1-cciauto merged commit 12daccd into amd-staging Nov 11, 2025
17 checks passed

z1-cciauto deleted the amd/merge/upstream_merge_20251110162944 branch November 11, 2025 02:00

57 participants
