forked from llvm/llvm-project
merge main into amd-staging #615
Merged
The index == 0 scenario has already been handled by the early return, so only the upper half scenario is relevant here.
llvm#168392)

Once llvm#149042 is relanded, we will start EVL tail folding vectorized loops that have live-outs, e.g.:

```c
int f(int *x, int n) {
  int y = 0;
  for (int i = 0; i < n; i++) {
    y = x[i] + 1;
    x[y] = y;
  }
  return y;
}
```

These are vectorized by extracting the last "active lane" in the loop's exit:

```llvm
loop:
  %vl = call i32 @llvm.experimental.get.vector.length(i64 %avl, i32 4, i1 true)
  ...
exit:
  %lastidx = sub i64 %vl, 1
  %lastelt = extractelement <vscale x 4 x i32> %y, i64 %lastidx
```

Which in RISC-V translates to a vslidedown.vx with a VL of 1:

```llvm
bb.loop:
  %vl:gprnox0 = PseudoVSETVLI ...
  %y:vr = PseudoVADD_VI_M1 $noreg, %x, 1, AVL=-1
  ...
bb.exit:
  %lastidx:gprnox0 = ADDI %vl, -1
  %w:vr = PseudoVSLIDEDOWN_VX_M1 $noreg, %y, %lastidx, AVL=1
```

However, today we fail to reduce the VL of %y in the loop and end up with two extra VL toggles. The reason is that RISCVVLOptimizer is currently conservative with vslidedown.vx, as it can read the lanes of %y past its own VL. So in `getMinimumVLForUser` we say that vslidedown.vx demands the entirety of %y.

One observation with the sequence above is that it only actually needs to read the first %vl lanes of %y, because the last lane of vs2 used is offset + 1. In this case, that's `%lastidx + 1 = %vl - 1 + 1 = %vl`.

This PR teaches RISCVVLOptimizer about this case in `getMinimumVLForVSLIDEDOWN_VX`, and in doing so removes the VL toggles.

The one case that I had to think about for a bit was when `ADDI %vl, -1` wraps, i.e. when %vl=0 and the resulting offset is all ones. This should always be larger than the largest VLMAX, so vs2 will be completely slid down and absent from the output. So we don't need to read anything from vs2.

This patch on its own has no observable effect on llvm-test-suite or SPEC CPU 2017 w/ rva23u64 today.
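The lane-counting argument above can be checked with a small scalar model. This is only a sketch of the reasoning, not the code in RISCVVLOptimizer: the function name `demandedSourceLanes` and its parameters are hypothetical, and it assumes the usual vslidedown.vx behaviour where destination lane `i` reads source lane `i + offset` only when that index is below VLMAX.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper: how many leading lanes of vs2 can a vslidedown.vx
// with scalar `offset` and vector length `vl` read, given `vlmax` lanes?
// Destination lane i (i < vl) reads source lane i + offset if it is < vlmax.
std::uint64_t demandedSourceLanes(std::uint64_t offset, std::uint64_t vl,
                                  std::uint64_t vlmax) {
  // Nothing is read if no lane is active, or if the slide amount already
  // moves every source lane out of range (this covers the wrapped offset).
  if (vl == 0 || offset >= vlmax)
    return 0;
  // offset < vlmax here, so vlmax - offset cannot underflow.
  const std::uint64_t lanesInRange = std::min(vl, vlmax - offset);
  // The highest lane read is offset + lanesInRange - 1.
  return offset + lanesInRange;
}
```

With a slidedown VL of 1 and `offset = %vl - 1`, the model returns `%vl`, matching the observation in the commit message. When `%vl` is 0, the unsigned subtraction wraps to all ones, which exceeds any VLMAX, and the model returns 0.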
### Summary

This PR resolves llvm#163895. It adds only the fcmp-sse part of the X86 vector builtins for CIR.

---------

Co-authored-by: liuzhenya <zyliu@siorigin.com>
We need to fall through here in case we're not jumping to the labels. This is only needed in expression contexts.
Add a pass option to `convert-scf-to-cf` to deactivate pattern rollback for better performance. The SCF->CF lowering patterns benefit a lot from this feature because `splitBlock` is expensive in the rollback driver.
…vm#168430) Updated the evaluate handler to check for DAP ErrorResponse bodies, which are used to display user errors if a request fails. This was updated in PR llvm#167720. This should fix https://lab.llvm.org/buildbot/#/builders/163
…ts (llvm#166851) `-fsanitize=address,fuzzer` should be rejected like `-fsanitize=fuzzer,address`. The address sanitizer enables the device sanitizer pipeline. The fuzzer implicitly turns on LLVM's SanitizerCoverage, which the driver then forwards to the device cc1. SanitizerCoverage is not supported on amdgcn.
…vm#168058) Also breaks the long inheritance chains by making both `SIGfx10CacheControl` and `SIGfx12CacheControl` inherit from `SICacheControl` directly. With this patch, we now just have 3 `SICacheControl` implementations that each do their own thing, and there is no more code hidden 3 superclasses above (which made this code harder to read and maintain than it needed to be).
1. Fixed 2 DTLTO cache tests that failed on macOS because the input to the grep command differs from that on Windows.
2. Removed unneeded comments from dtlto-cache.ll.
…m#166360) This will be used to support ZT0 in the MachineSMEABIPass.
…lvm#166247) This patch implements a transform to hoist single-scalar replicated loads with invariant addresses out of the vector loop to the preheader when scoped noalias metadata proves they cannot alias with any stores in the loop. This enables hoisting of loads we can prove do not alias any stores in the loop due to memory runtime checks added during vectorization. PR: llvm#166247
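To see why the no-alias proof matters, here is a source-level analogy of the transform. It is a hedged sketch only: the function names are hypothetical, and the real transform works on VPlan recipes and scoped noalias metadata rather than on C++ source.

```cpp
#include <cstddef>

// Loop as written: the invariant address `scale` is reloaded every iteration,
// which is required if a store to dst[i] might change *scale.
void scaleInLoop(int *dst, const int *src, const int *scale, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i)
    dst[i] = src[i] * *scale;
}

// Hoisted form: the load happens once before the loop (the "preheader").
// This matches scaleInLoop only when no store in the loop aliases *scale.
void scaleHoisted(int *dst, const int *src, const int *scale, std::size_t n) {
  const int s = *scale;
  for (std::size_t i = 0; i < n; ++i)
    dst[i] = src[i] * s;
}
```

When `scale` points into `dst`, the two functions produce different results, which is why the hoist has to be guarded by an alias proof such as the one the runtime checks provide.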
) This generates better codegen when using partial reductions with predication.

```
partial_reduce_*mla(acc, sel(p, mul(*ext(a), *ext(b)), splat(0)), splat(1))
  -> partial_reduce_*mla(acc, sel(p, a, splat(0)), b)
partial.reduce.*mla(acc, sel(p, *ext(op), splat(0)), splat(1))
  -> partial.reduce.*mla(acc, sel(p, op, splat(0)), splat(trunc(1)))
```
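The first rewrite relies on the select commuting with the extend and the multiply: masking the product to zero gives the same sum as masking one narrow operand to zero before extending. A scalar model can illustrate this. It is a sketch under assumptions: the function names are hypothetical, the element types (`int8_t` inputs, `int32_t` accumulator) and the lane-to-accumulator mapping `i % acc.size()` are chosen for illustration and are not taken from the patch.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using Acc = std::vector<std::int32_t>;
using Narrow = std::vector<std::int8_t>;

// Before: acc += sel(p, sext(a) * sext(b), 0) * 1
Acc reduceSelectOfMul(Acc acc, const std::vector<bool> &p, const Narrow &a,
                      const Narrow &b) {
  for (std::size_t i = 0; i < a.size(); ++i) {
    const std::int32_t prod =
        p[i] ? std::int32_t(a[i]) * std::int32_t(b[i]) : 0;
    acc[i % acc.size()] += prod * 1;
  }
  return acc;
}

// After: acc += sext(sel(p, a, 0)) * sext(b)
Acc reduceMulOfSelect(Acc acc, const std::vector<bool> &p, const Narrow &a,
                      const Narrow &b) {
  for (std::size_t i = 0; i < a.size(); ++i) {
    const std::int8_t masked = p[i] ? a[i] : std::int8_t(0);
    acc[i % acc.size()] += std::int32_t(masked) * std::int32_t(b[i]);
  }
  return acc;
}
```

Both forms agree lane by lane because an inactive lane contributes `0 * ext(b) = 0` in the second form, exactly what the select produces in the first.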
…lvm#168341) This is harmless due to the previous checks for > 0, but it is still confusing for the readers.
…154972) AsmLexer expects the buffer it's provided for lexing to be NULL-terminated, where the NULL terminator is pointed to by `CurBuf.end()`. However, this expectation isn't explicitly stated anywhere. This commit adds a couple of comments as well as an assert as a means of documenting this expectation.
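The kind of precondition being documented can be sketched as follows. This is an illustration only: the `Buffer` struct and `scanToTerminator` function are hypothetical stand-ins, not the AsmLexer interface, and the assert message is invented for the example.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical buffer view: [begin, end) holds the text, and the scanner
// additionally requires that *end is readable and equal to '\0'.
struct Buffer {
  const char *begin;
  const char *end;
};

// Scans until the terminator instead of comparing against `end`, which is
// only safe when the precondition holds, hence the assert.
std::size_t scanToTerminator(const Buffer &buf) {
  assert(*buf.end == '\0' && "buffer must be NUL-terminated at end()");
  std::size_t length = 0;
  for (const char *p = buf.begin; *p != '\0'; ++p)
    ++length;
  return length;
}
```

Stating the requirement in an assert means a caller that passes a non-terminated slice fails fast in debug builds rather than reading past the buffer.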
…#167705) Generally, to_tensor and to_buffer already perform sufficient verification. However, there are some unnecessarily strict constraints:

* builtin tensor requires its buffer counterpart to always be memref
* to_buffer on ranked tensor requires to always return memref

These checks are assertions (i.e. preconditions), however, they actually prevent an apparently useful bufferization where builtin tensors could become custom buffers. Lift these assertions, maintaining the verification procedure unchanged, to allow builtin -> custom bufferizations at operation boundary level.
…used in constexpr (llvm#162816) This PR resolves only the ss/sd part of the AVX512 masked arithmetic intrinsics of llvm#160559.
…s to be used in constexpr (llvm#168496)

### Summary

This PR resolves llvm#160559: the other pd/ps/epi/epu part of the AVX512 masked arithmetic intrinsics.
Add a few patterns for extadd pairwise.
…in (NFC) (llvm#168343) In 4 years the plugin wasn't adapted to other object formats. This patch makes it specific to ELF, which will allow removing some abstractions down the line. It also moves the plugin from LLVMOrcJIT into LLVMOrcDebugging, which didn't exist back then.
Nest arguments are supported by CC in X86CallingConv.td. Nothing special is required in GlobalISel as we reuse the code. The nest attribute is mostly generated by the Fortran frontend.
…167322) There was a minor oversight in commit 6836261; the AArch64 GICv5 instruction `GIC CDEOI` takes no operands, since the text of the specification says:

```
The Rt field should be set to 0b11111. If the Rt field is not set to 0b11111, it is CONSTRAINED UNPREDICTABLE whether:
* The instruction is UNDEFINED.
* The instruction behaves as if the Rt field is set to 0b11111.
```
This commit adds support for the tcgen05.mma family of instructions in the NVVM MLIR dialect and lowers them to LLVM intrinsics. Please refer to the [PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/#tcgen05-mma-instructions) for more information.
…m#148650) This patch introduces preliminary support for additional memory locations: target_mem0 and target_mem1. They model memory locations that cannot be represented with the existing memory locations. This solution was suggested in https://discourse.llvm.org/t/rfc-improving-fpmr-handling-for-fp8-intrinsics-in-llvm/86868/6. Currently, these locations are not yet target-specific. The goal is to enable the compiler to express read/write effects on these resources.
(Reland of llvm#161546, fixing three build and test issues) This commit adds optimized assembly versions of single-precision float multiplication and division. Both functions are implemented in a style that can be assembled as either Arm or Thumb2; for multiplication, a separate implementation is provided for Thumb1. Also, extensive new tests are added for multiplication and division. These implementations can be removed from the build by setting the CMake variable COMPILER_RT_ARM_OPTIMIZED_FP=OFF. Outlying parts of the functionality which are not on the fast path, such as NaN handling and underflow, are handled in helper functions written in C. These can be shared between the Arm/Thumb2 and Thumb1 implementations, and also reused by other optimized assembly functions we hope to add in future.
…llvm#166245) Implement CastInfo from VPRecipeBase to VPIRMetadata to support isa/dyn_cast. This is similar to CastInfoVPPhiAccessors, supporting dyn_cast by down-casting to the concrete recipe types inheriting from VPIRMetadata. Can be used for more generalized VPIRMetadata printing following llvm#165825. PR: llvm#166245
Fixes llvm#148354. Lower SPIR-V Tan/Tanh ops using the corresponding LLVM intrinsics to reduce instructions and prevent overflow caused by the previous `exp`-based expansion.
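The overflow concern can be reproduced with a scalar analogy. This is a sketch under a stated assumption: the commit message does not give the exact formula of the previous expansion, so `tanhViaExp` below uses the textbook identity `tanh(x) = (e^{2x} - 1) / (e^{2x} + 1)` purely for illustration, and both function names are hypothetical.

```cpp
#include <cmath>

// Assumed exp-based expansion (illustrative, not taken from the patch).
// For large x, exp(2x) overflows to +inf in float and inf / inf yields NaN.
float tanhViaExp(float x) {
  const float e2x = std::exp(2.0f * x);
  return (e2x - 1.0f) / (e2x + 1.0f);
}

// Direct call, standing in for lowering to a tanh intrinsic.
float tanhDirect(float x) { return std::tanh(x); }
```

For moderate inputs the two agree closely, while for an input such as `100.0f` the exp-based form produces NaN and the direct form saturates to 1.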
…7915) Exceptions include intrinsics that:

* take or return floating point data
* read or write FFR
* read or write memory
* read or write SME state
…m#168427) This adds handling for f16 and f128 lround/llround under LP64 targets, promoting the f16 where needed and using a libcall for f128. This codegen is now identical to the selection dag version.
ronlieb
approved these changes
Nov 18, 2025
No description provided.