forked from llvm/llvm-project
merge main into amd-staging #699
Merged
This is a follow-up to llvm#168135.
`[[nodiscard]]` should be applied to functions where discarding the return value is most likely a correctness issue. - https://libcxx.llvm.org/CodingGuidelines.html#apply-nodiscard-where-relevant
…thIf` (llvm#169606) This commit adds support for `replaceUsesWithIf` (and variants such as `replaceAllUsesExcept`) to the `ConversionPatternRewriter`. This API is supported only in no-rollback mode; an assertion is triggered in rollback mode. (The previously missing assertion has been confusing for users because it made the API appear supported when it was actually not working properly.) This commit brings us a bit closer to removing [this](https://github.com/llvm/llvm-project/blob/76ec25f729fcc7ae576caf21293cc393e68e7cf7/mlir/lib/Transforms/Utils/DialectConversion.cpp#L1214) workaround. Additional changes are needed to support this API in rollback mode; in particular, no entries should be added to the `ConversionValueMapping` for conditional replacements. It is unclear at this point whether this API can be supported in rollback mode, so that is deferred to later. This commit turns `replaceUsesWithIf` into a virtual function so that the `ConversionPatternRewriter` can override it; all other API functions for conditional value replacements call that function. Note for LLVM integration: if you are seeing failed assertions due to this change, you are using an unsupported API in your dialect conversion. You have three options: (1) migrate to the no-rollback driver; (2) rewrite your patterns without the unsupported API; (3) as a last resort, bypass the rewriter and call `replaceUsesWithIf` etc. directly on the `Value` object.
`[[nodiscard]]` should be applied to functions where discarding the return value is most likely a correctness issue. - https://libcxx.llvm.org/CodingGuidelines.html#apply-nodiscard-where-relevant
…lvm#169744) # Summary This is a forward fix for test errors from llvm#163653. That PR moved debugger initialization out of InitializeRequestHandler and into the Launch/AttachRequestHandlers, to support DAP sessions sharing debugger instances for dynamically created targets. However, DExTer's DAP class seemed to set breakpoints before the debugger was initialized, which caused the tests to hang waiting for a breakpoint to be hit because none of the breakpoints were resolved. # Tests ``` bin/llvm-lit -v /home/qxy11/llvm/llvm-project/cross-project-tests/debuginfo-tests/dexter-tests/ ```
Fixes memory leak uncovered by llvm#133533.
…tiple times in the same basic block (llvm#169141) Fixes llvm#167500.
…69005) This change adds the `RN` and `RZ` rounding modes to the `convert.f32x2.to.f16x2` and `convert.f32x2.to.bf16x2` Ops. Tests are added in `convert_fp16x2.mlir` and `invalid_convert_fp16x2.mlir`. Tests with these Ops in `convert_stochastic_rounding.mlir` and `invalid-convert-stochastic-rounding.mlir` have been removed or modified. PTX spec reference: https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-cvt
… types (llvm#168188) Add a createIntrinsicCall overload that accepts a return type and arguments and automatically resolves overload types rather than requiring manual computation. This simplifies NVVM_PrefetchOp by removing conditional overload logic.
… macro arguments (llvm#169757) When the start and end token are both spelled in macro arguments, we still want to reject the range if they come from two separate macro arguments, as the original specified range is not precisely spelled in a single sequence of characters in source.
…vm#169517) This commit extends the CGProfile module flags export with support for missing function references. Previously this caused a crash; now missing references are properly exported as `null` values in the metadata node. Fixes: llvm#160717
We emit this diagnostic from CheckPointerToIntegralCast() already, so remove the emission from CastPointerIntegral().
- Mass-reformat tests in `std/utilities/optional/optional.object/optional.object.ctor` and rearrange header `#include`s - No functional changes - Prelude for llvm#169203
Use const& in range-based for loop to avoid unnecessary copies
…specified layouts (llvm#169267) Introduce anchor layouts for the XeGPU anchor ops: load_nd, store_nd, prefetch_nd, dpas, load, store, prefetch, load_matrix, store_matrix, and atomic_rmw. An anchor layout is permanent and is guaranteed to be honored by XeGPU distribution and lowerings once specified. 1. Add anchor_layout for the XeGPU anchor ops listed above. 2. Rename the layout attributes to anchor_layout for these ops: load, store, load_matrix, store_matrix. 3. Update the layout propagation pass: only when the user doesn't specify an anchor layout does the pass compute a default layout, set it as the anchor op's permanent layout, and use it for propagation; if the user specified an anchor layout, the pass takes the user-specified one.
…ax_lanes (llvm#169293) On RISC-V, some loops that the loop vectorizer vectorizes pre-LTO may turn out to have the exact trip count exposed after LTO, see llvm#164762. If the trip count is small enough we can fold away the @llvm.experimental.get.vector.length intrinsic based on this corollary from the LangRef: > If %cnt is less than or equal to %max_lanes, the return value is equal to %cnt. This on its own doesn't remove the @llvm.experimental.get.vector.length in llvm#164762 since we also need to teach computeKnownBits about @llvm.experimental.get.vector.length and the sub recurrence, but this PR is a starting point. I've added this in InstCombine rather than InstSimplify since we may need to insert a truncation (@llvm.experimental.get.vector.length can take an i64 %cnt argument, while the result is always i32). Note that there was something similar done in VPlan in llvm#167647 for when the loop vectorizer knows the trip count.
This patch adds recognition of high-half multiply by parts into a single
larger multiply.
Considering a multiply made up of high and low parts, we can split the
multiply into:
x * y == (xh*T + xl) * (yh*T + yl)
where `xh == x>>32` and `xl == x & 0xffffffff`. `T = 2^32`.
This expands to
xh*yh*T*T + xh*yl*T + xl*yh*T + xl*yl
which I find helpful to draw as
[    xh*yh    ]
        [    xh*yl    ]
        [    xl*yh    ]
                [    xl*yl    ]
We are looking for the "high" half, which is xh*yh + xh*yl>>32 + xl*yh>>32 +
carries. The carries make this difficult, and there are multiple ways of
representing them. The ones we attempt to support here are:
Carry: xh*yh + carry + lowsum>>32
carry = lowsum < xh*yl ? 0x100000000 : 0
lowsum = xh*yl + xl*yh + (xl*yl>>32)
Ladder: xh*yh + c2>>32 + c3>>32
c2 = xh*yl + (xl*yl >> 32); c3 = c2&0xffffffff + xl*yh
Carry4: xh*yh + carry + crosssum>>32 + ((xl*yl>>32) + crosssum&0xffffffff) >> 32
crosssum = xh*yl + xl*yh
carry = crosssum < xh*yl ? 0x100000000 : 0
Ladder4: xh*yh + (xl*yh)>>32 + (xh*yl)>>32 + low>>32;
low = (xl*yl)>>32 + (xl*yh)&0xffffffff + (xh*yl)&0xffffffff
They all start by matching `xh*yh` + 2 or 3 other operands. The bottom of the
tree is `xh*yh`, `xh*yl`, `xl*yh` and `xl*yl`.
Based on llvm#156879 by @c-rhodes
With EVL tail folding, the LastActiveLane can be computed with EVL - 1.
This removes the need for a header mask and vfirst.m for loops with live
outs on RISC-V:
# %bb.5: # %for.cond.cleanup7
- vsetvli zero, zero, e32, m2, ta, ma
- vmv.v.x v8, s1
- vmsleu.vv v10, v8, v22
- vfirst.m a0, v10
- srli a1, a0, 63
- czero.nez a0, a0, a1
- czero.eqz a1, s8, a1
- or a0, a0, a1
- addi a0, a0, -1
- vsetvli zero, zero, e64, m4, ta, ma
- vslidedown.vx v8, v12, a0
+ addi s1, s1, -1
+ vslidedown.vx v8, v12, s1
…#169776) This commit fixes the import of `branch_weights` metadata from LLVM IR to the LLVM dialect. Previously, `branch_weights` metadata containing the `!"expected"` field was rejected because the importer expected an integer weight at operand 1 but found a string.
As far as I can tell, the llvm.arm.mve.vminnm.m intrinsic used in these tests was the pre-upstream name of llvm.arm.mve.min.predicated. The tests should not need IR sections, so remove them, relying on just the MIR portions.
These tests show how type-checking is performed for `__builtin_amdgcn_load_to_lds`, but not for `__builtin_amdgcn_raw_ptr_buffer_load_lds`, `__builtin_amdgcn_struct_ptr_buffer_load_lds` and `__builtin_amdgcn_global_load_lds`, since those are declared with the 't' attribute. Stacked on top of: llvm#165387
…ed by no-return blocks (llvm#167548) At present, the shrink wrapping pass misses opportunities to shrink wrap in the presence of machine basic blocks which exit the function without returning. Such cases arise from C++ functions like the following: ```cxx int foo(int err, void* ptr) { if (err == -1) { if (ptr == nullptr) { throw MyException("Received `nullptr`!", __FILE__, __LINE__); } handle(ptr); } return STATUS_OK; } ``` In particular, assuming `MyException`'s constructor is not marked `noexcept`, the above code will generate a trivial EH landing pad calling `__cxa_free_exception()` and rethrowing the unhandled internal exception, exiting the function without returning. As such, the shrink wrapping pass refuses to touch the above function, spilling to the stack on every call, even though no CSRs are clobbered on the hot path. This patch tweaks the shrink wrapping logic to enable the pass to fire in this and similar cases.
… AVX512 VPMULTISHIFTQB intrinsics to be used in constexpr (llvm#168995) Resolves llvm#167477
The natural assumption is that there's some sort of order here and having people read the reference manual before the basic tutorial does not make sense to me.
…mentwise_sqrt versions (llvm#165682)
Currently LLVM fails to recognize a manual implementation of `phadd` https://godbolt.org/z/zozrssaWb ```llvm declare <8 x i16> @llvm.x86.ssse3.phadd.sw.128(<8 x i16>, <8 x i16>) declare <8 x i16> @llvm.sadd.sat.v8i16(<8 x i16>, <8 x i16>) define <8 x i16> @phaddsw_v8i16_intrinsic(<8 x i16> %a, <8 x i16> %b) { entry: %res = call <8 x i16> @llvm.x86.ssse3.phadd.sw.128(<8 x i16> %a, <8 x i16> %b) ret <8 x i16> %res } define <8 x i16> @phaddsw_v8i16_generic(<8 x i16> %a, <8 x i16> %b) { entry: %even = shufflevector <8 x i16> %a, <8 x i16> %b, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14> %odd = shufflevector <8 x i16> %a, <8 x i16> %b, <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15> %sum = call <8 x i16> @llvm.sadd.sat.v8i16(<8 x i16> %even, <8 x i16> %odd) ret <8 x i16> %sum } ``` ```asm phaddsw_v8i16_intrinsic: # @phaddsw_v8i16_intrinsic phaddsw xmm0, xmm1 ret phaddsw_v8i16_generic: # @phaddsw_v8i16_generic movdqa xmm2, xmmword ptr [rip + .LCPI1_0] # xmm2 = [0,1,4,5,8,9,12,13,8,9,12,13,12,13,14,15] movdqa xmm3, xmm1 pshufb xmm3, xmm2 movdqa xmm4, xmm0 pshufb xmm4, xmm2 punpcklqdq xmm4, xmm3 # xmm4 = xmm4[0],xmm3[0] psrad xmm1, 16 psrad xmm0, 16 packssdw xmm0, xmm1 paddsw xmm0, xmm4 ret ``` This PR does recognize the pattern.
`[[nodiscard]]` should be applied to functions where discarding the return value is most likely a correctness issue. - https://libcxx.llvm.org/CodingGuidelines.html#apply-nodiscard-where-relevant
…set (llvm#169786) We can't access the RecordLayout of an invalid decl, so return failure if that happens. Fixes llvm#167076
Extends test coverage to include different start and step values, as well as interleaving.
`[[nodiscard]]` should be applied to functions where discarding the return value is most likely a correctness issue. - https://libcxx.llvm.org/CodingGuidelines.html#apply-nodiscard-where-relevant
This should unbreak that test on 32bit builders, e.g. https://lab.llvm.org/buildbot/#/builders/154/builds/24509
…169781) Let standard casting / builtin_convertvector handle the conversions from BF16 to F32 My only query is how to best implement _mm_cvtpbh_ps - I went for the v8bf16 -> v8f32 conversion followed by subvector extraction in the end, but could just as easily extract a v4bf16 first - makes no difference to final optimized codegen. First part of llvm#154911
…9670) Remove explicit VT numbers from ValueTypes.td so that patches that add a new VT do not have to renumber the entire file. In TableGen VTs are now identified by ValueType.LLVMName instead of ValueType.Value. This is important for target-defined types (typically based on PtrValueType) which are not mentioned in ValueTypes.td itself.
Adding support for serializing the ada entry flags helps with MIR-based test cases. Without this change, the flags are simply displayed as "unknown".
ronlieb approved these changes Nov 27, 2025