merge amd-staging into amd-feature/wave-transform #704

cdevadas · 2025-11-28T09:40:39Z

No description provided.

This is noted by the specification, and should save a dynamic instruction. Code size should be no worse than before, as the pairs of moves can usually be turned into two 16-bit moves, but `fmv.d` is always a 32-bit instruction. LLVM can look through a `FSGNJ_D_IN32X` in `RISCVInstrInfo::isCopyInstrImpl`, which helps copy propagation.

Automated with `sed -i 's/\.Value//g' lib/Target/AMDGPU/*.td` plus a tiny bit of manual reformatting.
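For readers less familiar with `sed`, the substitution above deletes every literal `.Value` occurrence in the matched files. A minimal Python sketch of the same rewrite follows; the sample TableGen-like line is hypothetical and not taken from the AMDGPU sources.

```python
import re


def strip_value_suffix(text: str) -> str:
    # Same effect as the sed expression s/\.Value//g:
    # remove every literal ".Value" from the text.
    return re.sub(r"\.Value", "", text)


# Hypothetical input line, for illustration only.
sample = "!eq(vt.Value, i32.Value)"
print(strip_value_suffix(sample))
```

As with the `sed` command, this is a purely textual rewrite, which is why some manual reformatting was still needed afterwards.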

This upstreams the code to handle member initialization for non-record arrays.

When individual elements of a vector are updated via vector swizzle, it needs to be handled as separate store operations to the individual vector elements. Clang treats vectors as one unit, so if a part of a vector needs to be updated, the whole vector is loaded, some elements modified, and then the whole vector is stored. In HLSL vector elements are handled separately. We need to avoid this load/modify/store sequence to prevent overwriting other vector elements that might be getting updated in parallel. Fixes llvm#152815
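The lost-update problem described above can be sketched with a small deterministic simulation. This is only a model of the memory behaviour, not the Clang codegen itself; the function names and the hand-written interleaving are assumptions made for illustration.

```python
def simulate_whole_vector_rmw():
    """Two writers each load the whole vector, change one lane, store it all back."""
    memory = [0.0, 0.0, 0.0, 0.0]

    # Both writers load the full vector before either one stores.
    copy_a = list(memory)
    copy_b = list(memory)

    copy_a[0] = 1.0          # writer A updates lane x
    copy_b[1] = 2.0          # writer B updates lane y

    memory[:] = copy_a       # A stores the whole vector
    memory[:] = copy_b       # B stores the whole vector, overwriting A's lane
    return memory


def simulate_per_element_store():
    """Each writer stores only the lane it updates."""
    memory = [0.0, 0.0, 0.0, 0.0]
    memory[0] = 1.0          # writer A touches lane x only
    memory[1] = 2.0          # writer B touches lane y only
    return memory


print(simulate_whole_vector_rmw())
print(simulate_per_element_store())
```

In the first function writer A's update to lane 0 is lost, while the per-element version keeps both updates, which is the behaviour the patch aims for.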

Flags are now passed on construction/cloning. Remove unnecessary transferFlags call, and make code independent of VPRecipeWithIRFlags, to support additional recipes in the future.

…e op, NFC

It is possible that a fork could occur while MutexTSDs is being held and then cause a deadlock in a forked process when something attempts to lock it again. Instead add it to the enable/disable list of mutexes.
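The general pattern behind this fix can be sketched in Python with `os.register_at_fork`: every lock on the list is taken before the fork and released afterwards in both processes, so the child never inherits a lock held by a thread that no longer exists. The lock names and the list below are hypothetical stand-ins, not scudo's actual data structures.

```python
import os
import threading

# Hypothetical stand-ins for allocator-internal mutexes.
primary_mutex = threading.Lock()
mutex_tsds = threading.Lock()

# Every mutex that must be quiescent across a fork goes on this list.
FORK_LOCKS = [primary_mutex, mutex_tsds]


def disable():
    # Acquire all locks in a fixed order before forking.
    for lock in FORK_LOCKS:
        lock.acquire()


def enable():
    # Release in reverse order once the fork has completed.
    for lock in reversed(FORK_LOCKS):
        lock.release()


# register_at_fork is only available on platforms that support fork.
if hasattr(os, "register_at_fork"):
    os.register_at_fork(before=disable,
                        after_in_parent=enable,
                        after_in_child=enable)
```

A mutex left off the list is exactly the failure mode described above: it can be held at fork time and stay locked forever in the child.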

Add test cases for canonicalizing AddRecs that may wrap.

…-vector-to-llvm`." (llvm#169570) Reverts llvm#166204 There was a build issue due to a missing dependency.

libcxx requires minimal macOS 11 to build. This patch bumps the minimal OS X target in Fuchsia's cmake cache file to 11.0 to satisfy this requirement.

…69569) Some globals (e.g., fir.global) have initialization regions that may transitively reference other globals or type descriptors. Add getInitRegion() to GlobalVariableOpInterface to retrieve these regions, returning Region* (nullptr if the global uses attributes for initialization, as with memref.global).

…m#166597) Currently, -gsplit-dwarf and -mrelax are incompatible options in Clang. The issue is that .dwo files should not contain any relocations, as they are not processed by the linker. However, relaxable code emits relocations in DWARF for debug ranges that reside in the .dwo file when DWARF fission is enabled. This patch makes DWARF fission compatible with RISC-V relaxations. It uses the StartxEndx DWARF forms in .debug_rnglists.dwo, which allow referencing addresses from .debug_addr instead of using absolute addresses. This approach eliminates relocations from .dwo files.
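To make the idea concrete, here is a minimal sketch of how a single range-list entry of this kind is laid out, assuming the DWARF v5 `DW_RLE_startx_endx` encoding (opcode 0x02 followed by two ULEB128 indices into `.debug_addr`). The helper names are illustrative and are not LLVM APIs.

```python
DW_RLE_END_OF_LIST = 0x00
DW_RLE_STARTX_ENDX = 0x02


def uleb128(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)   # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_startx_endx(start_index: int, end_index: int) -> bytes:
    """One range entry that refers to two slots of the address table."""
    return (bytes([DW_RLE_STARTX_ENDX])
            + uleb128(start_index)
            + uleb128(end_index))


# A range list with one entry, followed by the terminator.
range_list = encode_startx_endx(1, 300) + bytes([DW_RLE_END_OF_LIST])
print(range_list.hex())
```

Because the entry holds only table indices, the bytes in the `.dwo` file do not change when relaxation moves code; only the address table, which the linker does process, needs relocations.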

…#682) AMDGPU requires more complex CFI rules. Normally these would be expressed with .cfi_escape; however, this would make the CFI unreadable and make it difficult to update registers in CFI instructions (also something AMDGPU requires). Authored-by: Emma Pilkington <Emma.Pilkington@amd.com> (cherry picked from commit fd94b41)

This PR adds codegen for `cir.await` ready and suspend. One notable difference from the classic codegen is that, in the suspend branch, it emits an `AwaitSuspendWrapper`(`.__await_suspend_wrapper__init`) function that is always inlined. This function wraps the suspend logic inside an internal wrapper that gets inlined. Example here: https://godbolt.org/z/rWYGcaaG4

@fiigii

…" (llvm#169546) This reverts commit f67409c. cc @fiigii Including us, several separate groups are experiencing regressions with this change. This is the smallest reproducer pasted by @akuegel : llvm#162930 (comment)

Allows construction of ErrorAsOutParameters from Error references.

…m#169565) Use `getNestedDoConstruct` from Utils to get the nested DoConstructs. Fixes llvm#169532

Implementation similar to the clang one in `clang/lib/Headers/__clang_cuda_intrinsics.h`

This makes all of the clangd tests work with the internal shell. Modifications needed for each test are as follows:

1. system-include-extractor.test was using variable expansion, which is not supported in the internal shell. This patch rewrites it to use the readfile mechanism along with python. This isn't super pretty but is readily understandable and there are only two tests across the monorepo that use this construction, so making it prettier is hard to justify.
2. include-cleaner-batch-fix.test was using the $'' construction to create new lines in a string. Simply replace it with multiple echo commands to be canonical with the rest of the repository.
3. index-tools.test: just add IndexBenchmark to the clangd test depends, so the test now just works unconditionally. This should significantly increase test coverage at little cost.

Reviewers: ilovepi, HighCommander4, petrhosek, kadircet Reviewed By: ilovepi Pull Request: llvm#169539

Enable it now that all of the tests pass under the internal shell. The internal shell is slightly faster (10-15%) and also provides a better debugging experience. Reviewers: petrhosek, ilovepi, kadircet, HighCommander4 Reviewed By: ilovepi Pull Request: llvm#169540

Fixes smoke-dev mpi-reduce, mpi-allreduce. Fixes errors of the form EmissaryMPI.h:54:3: error: unknown type name 'MPI_Datatype'.

This reverts commit c51c382. This breaks at least one buildbot: 1. https://lab.llvm.org/buildbot/#/builders/134/builds/30460

This reverts commit 9c414c4. This one is causing buildbot failures too at CMake configure time: 1. https://lab.llvm.org/buildbot/#/builders/193/builds/12452

Split from llvm#158900 it adds a PerThreadContainer that can use STL-like indexed containers based on a slightly refactored PerThreadTable. --------- Co-authored-by: Joseph Huber <huberjn@outlook.com>

…uiltins (llvm#168666) This PR extends __scoped_atomic builtins with inc and dec functions. They map to LLVM IR `atomicrmw uinc_wrap` and `atomicrmw udec_wrap`. These enable implementation of OpenCL-style atomic_inc / atomic_dec with wrap semantics on targets supporting scoped atomics (e.g. GPUs). --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
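The wrap semantics of the two `atomicrmw` operations can be modelled as plain functions on unsigned values, following the definitions in the LLVM Language Reference. This sketch only shows the value computation; the function names are illustrative, and atomicity and memory scope are not modelled.

```python
def uinc_wrap(old: int, bound: int) -> int:
    """New value stored by `atomicrmw uinc_wrap`: increment, wrapping to 0
    once the old value has reached (or exceeds) the bound."""
    return 0 if old >= bound else old + 1


def udec_wrap(old: int, bound: int) -> int:
    """New value stored by `atomicrmw udec_wrap`: decrement, wrapping to the
    bound when the old value is 0 or is above the bound."""
    return bound if old == 0 or old > bound else old - 1


# Counting up with bound 3 cycles through 0, 1, 2, 3, 0, ...
value = 0
sequence = []
for _ in range(5):
    sequence.append(value)
    value = uinc_wrap(value, 3)
print(sequence)
```

Both operations treat their operands as unsigned, so the bound is inclusive: a counter with bound 3 takes four distinct values.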

…-uses.ll` due to changes to ReplaceConstant (llvm#169848) Fixes an LLVM DirectX codegen test after it broke due to llvm#169141 The CBuffer loads and GEPs are no longer duplicated when there are two or more accesses within the same basic block. This PR removes the duplicate check for CBuffer load and GEP from the original test function `@f` and adds a new test function `@g` which places duplicate CBuffer loads into separate basic blocks.

The indirect call instrumentation snippet uses the x16 register in the exit handler to jump to the destination target:

    __bolt_instr_ind_call_handler_func:
        msr  nzcv, x1
        ldp  x0, x1, [sp], #16
        ldr  x16, [sp], #16
        ldp  x0, x1, [sp], #16
        br   x16    <-----

This patch adds the instrumentation snippet by calling the instrumentation runtime library through an indirect call instruction, and adds a wrapper to store/load the target value and the register for the original indirect instruction.

Example:

        mov  x16, foo
    indirectCall:
        adrp x8, Label
        add  x8, x8, #:lo12:Label
        blr  x8

Before:

    Instrumented indirect call:
        stp  x0, x1, [sp, #-16]!
        mov  x0, x8
        movk x1, #0x0, lsl #48
        movk x1, #0x0, lsl #32
        movk x1, #0x0, lsl #16
        movk x1, #0x0
        stp  x0, x1, [sp, #-16]!
        adrp x0, __bolt_instr_ind_call_handler_func
        add  x0, x0, #:lo12:__bolt_instr_ind_call_handler_func
        blr  x0

    __bolt_instr_ind_call_handler: (exit snippet)
        msr  nzcv, x1
        ldp  x0, x1, [sp], #16
        ldr  x16, [sp], #16
        ldp  x0, x1, [sp], #16
        br   x16    <- overwrites the original value in X16

    __bolt_instr_ind_call_handler_func: (entry snippet)
        stp  x0, x1, [sp, #-16]!
        mrs  x1, nzcv
        adrp x0, __bolt_instr_ind_call_handler
        add  x0, x0, #:lo12:__bolt_instr_ind_call_handler
        ldr  x0, [x0]
        cmp  x0, #0x0
        b.eq __bolt_instr_ind_call_handler
        str  x30, [sp, #-16]!
        blr  x0     <--- runtime lib store/load all regs
        ldr  x30, [sp], #16
        b    __bolt_instr_ind_call_handler

_________________________________________________________________________

After:

        mov  x16, foo
    indirectCall:
        adrp x8, Label
        add  x8, x8, #:lo12:Label
        blr  x8

    Instrumented indirect call:
        stp  x0, x1, [sp, #-16]!
        mov  x0, x8
        movk x1, #0x0, lsl #48
        movk x1, #0x0, lsl #32
        movk x1, #0x0, lsl #16
        movk x1, #0x0
        stp  x0, x30, [sp, #-16]!
        adrp x8, __bolt_instr_ind_call_handler_func
        add  x8, x8, #:lo12:__bolt_instr_ind_call_handler_func
        blr  x8     <--- call trampoline instr lib
        ldp  x0, x30, [sp], #16
        mov  x8, x0 <---- restore original target
        ldp  x0, x1, [sp], #16
        blr  x8     <--- original indirect call instruction

    // don't touch regs besides x0, x1
    __bolt_instr_ind_call_handler: (exit snippet)
        ret         <---- return to original function with indirect call

    __bolt_instr_ind_call_handler_func: (entry snippet)
        adrp x0, __bolt_instr_ind_call_handler
        add  x0, x0, #:lo12:__bolt_instr_ind_call_handler
        ldr  x0, [x0]
        cmp  x0, #0x0
        b.eq __bolt_instr_ind_call_handler
        str  x30, [sp, #-16]!
        blr  x0     <--- runtime lib store/load all regs
        ldr  x30, [sp], #16
        b    __bolt_instr_ind_call_handler

Upstream TryCall Op as a prerequisite for Try Catch work Issue llvm#154992

Add a variant of m_Intrinsic that matches a variable runtime ID.

…169773)

…lvm#169338) In some cases, VPWidenPointerInductions become only used by scalars after legalizeAndOptimizeInductions was already run, for example due to some VPlan optimizations. Move the code to scalarize VPWidenPointerInductions to a helper and use it if needed. This fixes a crash after llvm#148274 in the added test case. Fixes llvm#169780

…ses.ll` more strict (llvm#169855) Continuation of PR llvm#169848 to address PR comments. This PR makes the test more strict by adding CHECKs to ensure the loads are indeed using the same or different GEPs.

passed https://compiler-ci.amd.com/blue/organizations/jenkins/compiler-psdb-amd-staging/detail/compiler-psdb-amd-staging/3005/pipeline/722/ which failed trying to land PR , since approval missing

Proof: https://alive2.llvm.org/ce/z/a5fzlJ Closes llvm#146642 --------- Co-authored-by: Yingwei Zheng <dtcxzyw@qq.com>

This pull request addresses an issue encountered when building **libcxx** with certain configurations (`-D_LIBCPP_HAS_MUSL_LIBC` & `-D__linux__`) that lack the `_GNU_SOURCE` definition. Specifically, this issue arises if the system **musl libc** is built with `_BSD_SOURCE` instead of `_GNU_SOURCE`. The resultant configuration leads to problems with the "Strtonum functions" in the file [libcxx/include/__locale_dir/support/linux.h](https://github.com/llvm/llvm-project/tree/master/libcxx/include/__locale_dir/support/linux.h), affecting the following functions:

- `__strtof`
- `__strtod`
- `__strtold`

**Error messages displayed include**:

```console
error: no member named 'strtof_l' in the global namespace
```

```console
error: no member named 'strtod_l' in the global namespace
```

```console
error: no member named 'strtold_l' in the global namespace
```

For more insight, relevant code can be accessed [here](https://github.com/llvm/llvm-project/blob/79cd1b7a25cdbf42c7234999ae9bc51db30af1f0/libcxx/include/__locale_dir/support/linux.h#L85-L95).

…69458)

Summary
======

This PR updates the schedule for online sync-ups and updates the link for past meeting slides.

Changes
======

* Remove the Wednesday schedule, since we did not have the meeting for Americas-friendly timezones.
* Use a single folder for past meeting slides instead of individual links.

Related Links
=========

* [Meeting materials for Qualification Working Group](https://llvm.org/docs/QualGroup.html#meeting-materials)
* [Online Sync-Ups](https://llvm.org/docs/GettingInvolved.html#online-sync-ups)

---------

Signed-off-by: ZakyHermawan <zaky.hermawan9615@gmail.com>

…vm#166684) The `arith.cmpf` lowering pattern used to generate invalid IR when an unsupported floating-point type was used.

…ations. (llvm#169273) This is achieved by using some of the bits of RelType to tag vendor namespaces. This change also adds a relocation iterator for RISCV that folds vendor namespaces into the RelType of the following relocation. This patch is extracted from the implementation of RISCV vendor-specific relocations in the CHERIoT LLVM downstream: CHERIoT-Platform/llvm-project@3d6d6f7
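The folding step can be sketched as a small generator over a relocation stream. This is a simplified model under stated assumptions: `R_RISCV_VENDOR` (191) and the 192-255 vendor-specific range come from the RISC-V psABI, while the bit position of the vendor tag, the vendor-ID table, and the symbol name `EXAMPLE_VENDOR` are hypothetical and do not reflect the actual layout used in LLVM.

```python
R_RISCV_VENDOR = 191          # psABI: names the vendor for the next relocation
VENDOR_RELOC_RANGE = range(192, 256)
VENDOR_SHIFT = 8              # assumption: tag bits sit above the 8-bit ELF type


def fold_vendor_relocs(relocs, vendor_ids):
    """Yield (tagged_type, symbol) pairs, hiding R_RISCV_VENDOR entries.

    relocs:     iterable of (elf_type, symbol_name) pairs
    vendor_ids: hypothetical map from vendor symbol name to a small integer tag
    """
    pending_vendor = 0
    for elf_type, symbol in relocs:
        if elf_type == R_RISCV_VENDOR:
            # Remember the namespace; it applies to the following relocation.
            pending_vendor = vendor_ids[symbol]
            continue
        if pending_vendor and elf_type in VENDOR_RELOC_RANGE:
            yield (pending_vendor << VENDOR_SHIFT) | elf_type, symbol
        else:
            yield elf_type, symbol
        pending_vendor = 0


stream = [(191, "EXAMPLE_VENDOR"), (192, "foo"), (1, "bar")]
print(list(fold_vendor_relocs(stream, {"EXAMPLE_VENDOR": 1})))
```

Consumers of the folded stream see one relocation type per entry, so two vendors can reuse the same number in the 192-255 range without ambiguity.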

github-actions · 2025-11-28T09:41:19Z

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

z1-cciauto · 2025-11-28T09:42:06Z

PSDB Build Link: http://mlse-bdc-20dd129:8065/#/builders/10/builds/30

lenary and others added 30 commits November 25, 2025 20:03

[AMDGPU] Simplify VT comparisons. NFC. (llvm#169526)

dbcf568

Automated with `sed -i 's/\.Value//g' lib/Target/AMDGPU/*.td` plus a tiny bit of manual reformatting.

[CIR] Upstream non-record array init handling (llvm#169429)

1c9368e

This upstreams the code to handle member initialization for non-record arrays.

[VPlan] Remove redundant transferFlags call from replicateByVF (NFC).

091aece

Flags are now passed on construction/cloning. Remove unnecessary transferFlags call, and make code independent of VPRecipeWithIRFlags, to support additional recipes in the future.

[SLP][NFC]Add a test with commutative instruction with non-commutativ…

00ffc70

…e op, NFC

[scudo] Lock/unlock MutexTSDs in disable/enable. (llvm#169440)

074d17e

It is possible that a fork could occur while MutexTSDs is being held and then cause a deadlock in a forked process when something attempts to lock it again. Instead add it to the enable/disable list of mutexes.

[SCEV] Add tests for UDiv canonicalization of AddRecs that may wrap.

e894654

Add test cases for canonicalizing AddRecs that may wrap.

merge main into amd-staging

3d0743b

Revert "[GPUToXeVMPipeline][Pipeline] Modify pipeline to add `convert…

9bf78ab

…-vector-to-llvm`." (llvm#169570) Reverts llvm#166204 There was a build issue due to a missing dependency.

[Fuchsia] Bump minimal OS X target to 11.0 (llvm#169568)

49828c2

libcxx requires minimal macOS 11 to build. This patch bumps the minimal OS X target in Fuchsia's cmake cache file to 11.0 to satisfy this requirement.

Regen llvm/test/Transforms/InstCombine/cast-mul-select.ll

4b512b4

[orc-rt] Add ErrorAsOutParameter convenience constructor. (llvm#169467)

9534ed9

Allows construction of ErrorAsOutParameters from Error references.

[flang][OpenMP] Skip compiler directives in getCollapsedLoopEval (llv…

fd22706

…m#169565) Use `getNestedDoConstruct` from Utils to get the nested DoConstructs. Fixes llvm#169532

[flang][cuda] Use PTX instruction for atomicAdd with 4xf32 (llvm#169581)

f7a9fca

Implementation similar to the clang one in `clang/lib/Headers/__clang_cuda_intrinsics.h`

[compiler-rt] [UBsan] precommit test (llvm#169579)

1c034a3

merge main into amd-staging (#685)

cef1d4a

[OpenMP] Added missing MPI include file in Emissary source. (#684)

8050157

Fixes smoke-dev mpi-reduce, mpi-allreduce. Fixes errors of the form EmissaryMPI.h:54:3: error: unknown type name 'MPI_Datatype'.

Revert "[clangd] Enable lit internal shell by default"

4cfbc44

This reverts commit c51c382. This breaks at least one buildbot: 1. https://lab.llvm.org/buildbot/#/builders/134/builds/30460

Revert "[clangd] Make lit tests work with the internal shell"

bd04ef6

This reverts commit 9c414c4. This one is causing buildbot failures too at CMake configure time: 1. https://lab.llvm.org/buildbot/#/builders/193/builds/12452

[OFFLOAD] Add support for indexed per-thread containers (llvm#164263)

3f22ed1

Split from llvm#158900 it adds a PerThreadContainer that can use STL-like indexed containers based on a slightly refactored PerThreadTable. --------- Co-authored-by: Joseph Huber <huberjn@outlook.com>

merge main into amd-staging

c8058ff

Icohedron and others added 18 commits November 27, 2025 12:02

[PowerPC] Implement paddis (llvm#161572)

d39f524

merge main into amd-staging

8ccc861

[CIR] Upstream TryCallOp (llvm#165303)

2e655c2

Upstream TryCall Op as a prerequisite for Try Catch work Issue llvm#154992

[VPlan] Add m_Intrinsic matcher that takes a variable intrinsic ID (NFC)

8f36135

Add a variant of m_Intrinsic that matches a variable runtime ID.

[clang-format][NFC] Remove the parameter of parseRequires...() (llvm#…

07d14cb

…169773)

[NFC] [DirectX] Make DirectX codegen test `CBufferAccess/gep-ce-two-u…

06c8ee6

…ses.ll` more strict (llvm#169855) Continuation of PR llvm#169848 to address PR comments. This PR makes the test more strict by adding CHECKs to ensure the loads are indeed using the same or different GEPs.

merge main into amd-staging (#701)

b617908

passed https://compiler-ci.amd.com/blue/organizations/jenkins/compiler-psdb-amd-staging/detail/compiler-psdb-amd-staging/3005/pipeline/722/ which failed trying to land PR , since approval missing

[InstCombine] fold icmp of select with invertible shl (llvm#147182)

583fba3

Proof: https://alive2.llvm.org/ce/z/a5fzlJ Closes llvm#146642 --------- Co-authored-by: Yingwei Zheng <dtcxzyw@qq.com>

[mlir][arith] Fix arith.cmpf lowering with unsupported FP types (ll…

b142912

…vm#166684) The `arith.cmpf` lowering pattern used to generate invalid IR when an unsupported floating-point type was used.

merge main into amd-staging

9a65678

merge main into amd-staging (#703)

2ae3ec9

merge amd-staging into amd-feature/wave-transform

c982655

cdevadas requested review from jmmartinez, lalaniket8 and vg0204 November 28, 2025 09:40

cdevadas requested review from b-sumner, david-salinas and lamb-j as code owners November 28, 2025 09:40

lalaniket8 approved these changes Nov 28, 2025


jmmartinez approved these changes Nov 28, 2025


cdevadas merged commit 91f1bb6 into amd-feature/wave-transform Nov 28, 2025
45 checks passed

cdevadas deleted the amd/dev/cdevadas/wave-transform/merge-from-stg-nov-28 branch November 28, 2025 11:11

20 participants
