forked from llvm/llvm-project
merge main into amd-staging #665
Merged
+6,804
−716
Fix warning: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:1455:23: warning: variable 'Store' set but not used [-Wunused-but-set-variable]
…duct (llvm#168074) A `transform` pass to lower `vector.contract` to (a) `vector.fma` for `F32`, (b) `x86vector.avx512.dot` for `BF16`, (c) `x86vector.avx.dot.i8` for `Int8` packed types. The lowering applies only when the `m`, `batch`, and `k` dims are 1 and the `vnni` dim is `2` for `bf16` or `4` for `int8`.

**The lowering pattern**: `batch_reduce.matmul` (input) -> register-tiling(M, N) -> Vectorization (to `vector.contract`) -> `unroll` vector.contract (`unit` dims) -> `hoisting` transformation (move `C` loads/stores outside the batch/k loop) -> apply `licm`, `canonicalization`, and `bufferize`.
Adds required dependency for `inferContractionDims`. Fixes llvm#168074
Identified with llvm-use-ranges.
Identified with llvm-use-ranges.
Identified with modernize-loop-convert.
Identified with modernize-loop-convert.
…lvm#169245) This patch simplifies iterator_range construction with the conversion constructor.
…es (llvm#169030) We used to create a scope for the true and false expressions of a conditional operator. This was done so that, e.g., in this example:

```c++
struct A {
  constexpr A(){};
  ~A();
  constexpr int get() { return 10; }
}; // all-note 2{{declared here}}
static_assert( (false ? A().get() : 1) == 1);
```

we did _not_ evaluate the true branch at all, meaning we did not register the local variable for the temporary of type `A`, which means we also didn't call its destructor.

However, this breaks the case where the temporary needs to outlive the conditional operator and instead be destroyed via the surrounding `ExprWithCleanups`:

```
constexpr bool test2(bool b) {
  unsigned long __ms = b ? (const unsigned long &)0 : __ms;
  return true;
}
static_assert(test2(true));
```

Before this patch, we diagnosed this example:

```console
./array.cpp:180:15: error: static assertion expression is not an integral constant expression
  180 | static_assert(test2(true));
      |               ^~~~~~~~~~~
./array.cpp:177:24: note: read of temporary whose lifetime has ended
  177 |   unsigned long __ms = b ? (const unsigned long &)0 : __ms;
      |                        ^
./array.cpp:180:15: note: in call to 'test2(true)'
  180 | static_assert(test2(true));
      |               ^~~~~~~~~~~
./array.cpp:177:51: note: temporary created here
  177 |   unsigned long __ms = b ? (const unsigned long &)0 : __ms;
      |                                                   ^
1 error generated.
```

because the temporary created for the true branch got immediately destroyed.

The problem in essence is that since the conditional operator doesn't create a scope at all, we register the local variables for both its branches, but we later only execute one of them, which means we should also only destroy the locals of one of the branches.

We fix this similarly to clang codegen's `is_active` flag: in the case of a conditional operator (which is so far the only case where this is problematic, and this also helps minimize the performance impact of this change), we mark local variables as disabled-by-default and then emit an `EnableLocal` opcode later, which marks them as enabled. The code calling their destructors checks whether the local was enabled at all.
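The disabled-by-default idea can be illustrated with a small standalone model. This is only a sketch under stated assumptions: `Local`, `Frame`, `enableLocal`, and `evalConditional` are hypothetical names invented for illustration and do not reflect the interpreter's real data structures or opcodes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of a local-variable slot (not clang's real type).
struct Local {
  std::string Name;
  bool Enabled = false; // locals start out disabled
};

// Hypothetical model of a frame that registers locals up front.
struct Frame {
  std::vector<Local> Locals;
  std::vector<std::string> Destroyed; // records which "destructors" ran

  // Register a local without enabling it; returns its index.
  std::size_t addLocal(std::string Name) {
    Local L;
    L.Name = std::move(Name);
    Locals.push_back(std::move(L));
    return Locals.size() - 1;
  }

  // Models the effect of an EnableLocal-style opcode.
  void enableLocal(std::size_t Idx) { Locals[Idx].Enabled = true; }

  // Destroy locals in reverse order, skipping those never enabled.
  void destroyLocals() {
    for (auto It = Locals.rbegin(); It != Locals.rend(); ++It)
      if (It->Enabled)
        Destroyed.push_back(It->Name);
  }
};

// Both branches register a temporary, but only the executed branch
// enables (and therefore later destroys) its temporary.
std::vector<std::string> evalConditional(bool Cond) {
  Frame F;
  std::size_t TrueTmp = F.addLocal("true-branch temporary");
  std::size_t FalseTmp = F.addLocal("false-branch temporary");
  F.enableLocal(Cond ? TrueTmp : FalseTmp);
  F.destroyLocals();
  return F.Destroyed;
}
```

In this model, `evalConditional(true)` records only the true-branch temporary as destroyed, which mirrors the intent described above: registration happens for both branches, destruction only for the one that ran.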
…69281) WaitingOnGraphTests.Emit_SingleContainerSimpleCycle tests a pair of emit operations where the second completes a simple cycle (1: A -> B, 2: B -> A). We already had a test of WaitingOnGraph::simplify's behavior in this case, but did not have one for WaitingOnGraph::emit.
…n auto type with typename (llvm#162514) When importing a template specialization with an `auto` return type, ASTImporter runs into a cycle when the return type is not a nested type but a typename built from template arguments and another template. There is existing code that prevents cycles for `auto` return types when a nested type is declared. The case solved here differs somewhat from nested types, but it has the same solution via UsedDifferentProtoType: determining the return type later.
Supported Ops: `fadd`, `fsub`
To make it clear that the return value is immutable.
This method returns the current expression being emitted, but it is only used to test whether an expression is being emitted or not. This patch therefore replaces it with a boolean isEmittingExpression() method.
Building with GCC I got:
```
<...>/OnDiskGraphDB.cpp:624:18: warning: ‘static {anonymous}::DataRecordHandle {anonymous}::DataRecordHandle::construct(char*, const {anonymous}::DataRecordHandle::Input&)’ defined but not used [-Wunused-function]
624 | DataRecordHandle DataRecordHandle::construct(char *Mem, const Input &I) {
| ^~~~~~~~~~~~~~~~
<...>/OnDiskGraphDB.cpp:456:1: warning: ‘static {anonymous}::DataRecordHandle {anonymous}::DataRecordHandle::create(llvm::function_ref<char*(long unsigned int)>, const {anonymous}::DataRecordHandle::Input&)’ defined but not used [-Wunused-function]
456 | DataRecordHandle::create(function_ref<char *(size_t Size)> Alloc,
| ^~~~~~~~~~~~~~~~
```
These implement parts of a class that is defined in an anonymous
namespace. All llvm tests passed with them removed.
…69294) This PR fixes the bazel build that went out of sync with the changes introduced in llvm#168074. Signed-off-by: Ingo Müller <ingomueller@google.com>
The implementation is based on the directive tree. Fixes clangd/clangd#1623
…encing blocks (llvm#169208) Objective-C blocks are like lambdas. They have captures, just like lambdas. However, they can also implicitly capture themselves unlike lambdas. This means that when walking the captures of a block, we may end up in infinite recursion. This is not possible with lambdas, but happened in practice with blocks downstream. In this patch, I just use a set to keep track of the visited MemRegions. Note that theoretically, there is nothing preventing usual lambdas or functors from falling for the same trap, but probably slightly more difficult to do so. You would likely need a pointer to itself, etc. I'll not speculate here. This inf recursion was likely caused by llvm#126620, released in clang-21. rdar://162215172
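A visited set of this kind can be sketched in isolation as follows. The `Region` type and both functions are hypothetical stand-ins chosen for illustration; they are not the static analyzer's `MemRegion` API.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a region that captures other regions.
struct Region {
  std::vector<const Region *> Captures;
};

// Count the regions reachable through captures. The Visited set makes
// the walk terminate even when a region (directly or indirectly)
// captures itself.
std::size_t countReachable(const Region *R,
                           std::unordered_set<const Region *> &Visited) {
  if (!R || !Visited.insert(R).second)
    return 0; // null, or already seen: stop here instead of recursing
  std::size_t N = 1;
  for (const Region *C : R->Captures)
    N += countReachable(C, Visited);
  return N;
}

// Build a block that captures one variable and itself, then walk it.
std::size_t selfCapturingBlockSize() {
  Region Var;
  Region Block;
  Block.Captures = {&Var, &Block}; // self-reference, as a block may have
  std::unordered_set<const Region *> Visited;
  return countReachable(&Block, Visited);
}
```

Without the `Visited` check, the walk over `Block` would recurse into `Block` again indefinitely; with it, each region is counted once.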
…#167502) This lowers an SVE FMUL of bf16 using the BFMLAL top/bottom instructions rather than extending to an f32 mul. This does require zeroing the accumulator, but requires fewer extends/unpacking.
…lvm#162633) Introducing this utility makes the `__grow_by{,_and_replace}` significantly easier to understand and allows us to migrate away from these functions in the future.
…7670) This allows propagating optimizations to different algorithms by just optimizing the lowest one. This is especially relevant now that we start optimizing how we're iterating through ranges (e.g. the segmented iterator optimizations) and adding assumptions so the compiler can better leverage semantics guaranteed by the standard (e.g. `__builtin_assume_dereferenceable`).
In some workloads we see an argument passed on the stack where it is loaded, only for it to be immediately spilled to a different slot on the stack and then reloaded from that spill slot later on. We can avoid the unnecessary spill by marking loads as rematerializable and just directly loading from where the argument was originally passed on the stack. TargetTransformInfo::isReMaterializableImpl checks to make sure that any loads are `MI.isDereferenceableInvariantLoad()`, so we should be able to move the load down to the remat site. This gives a 14.8% reduction in spills in 544.nab_r on rva23u64 -O3, and a few other smaller reductions on llvm-test-suite. I didn't find any benchmarks where the number of spills/reloads increased. Related: llvm#165761
…#157819) This patch introduces the LASX and LSX conversion intrinsics:

- __m256 __lasx_cast_128_s (__m128)
- __m256d __lasx_cast_128_d (__m128d)
- __m256i __lasx_cast_128 (__m128i)
- __m256 __lasx_concat_128_s (__m128, __m128)
- __m256d __lasx_concat_128_d (__m128, __m128d)
- __m256i __lasx_concat_128 (__m128, __m128i)
- __m128 __lasx_extract_128_lo_s (__m256)
- __m128d __lasx_extract_128_lo_d (__m256d)
- __m128i __lasx_extract_128_lo (__m256i)
- __m128 __lasx_extract_128_hi_s (__m256)
- __m128d __lasx_extract_128_hi_d (__m256d)
- __m128i __lasx_extract_128_hi (__m256i)
- __m256 __lasx_insert_128_lo_s (__m256, __m128)
- __m256d __lasx_insert_128_lo_d (__m256d, __m128d)
- __m256i __lasx_insert_128_lo (__m256i, __m128i)
- __m256 __lasx_insert_128_hi_s (__m256, __m128)
- __m256d __lasx_insert_128_hi_d (__m256d, __m128d)
- __m256i __lasx_insert_128_hi (__m256i, __m128i)

Relevant GCC patch: https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=c2013267642fea4a6e89b826940c8aa80a76089d
`[[nodiscard]]` should be applied to functions where discarding the return value is most likely a correctness issue. - https://libcxx.llvm.org/CodingGuidelines.html#apply-nodiscard-where-relevant --------- Co-authored-by: Hristo Hristov <zingam@outlook.com>
`[[nodiscard]]` should be applied to functions where discarding the return value is most likely a correctness issue. - https://libcxx.llvm.org/CodingGuidelines.html#apply-nodiscard-where-relevant
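As an illustration of the guideline, here is a minimal sketch using a made-up `Buffer` class (hypothetical, not a libc++ type). It assumes the familiar pitfall of calling `empty()` when `clear()` was meant; the discarding call is left commented out so the snippet compiles without warnings.

```cpp
#include <cassert>
#include <vector>

// Hypothetical container wrapper used only for illustration.
class Buffer {
  std::vector<int> Data;

public:
  // Discarding this result is almost certainly a bug (the caller
  // probably meant clear()), so the query is marked [[nodiscard]].
  [[nodiscard]] bool empty() const noexcept { return Data.empty(); }

  void clear() noexcept { Data.clear(); }
  void push(int V) { Data.push_back(V); }
};

bool demo() {
  Buffer B;
  B.push(1);
  // B.empty();              // would trigger a nodiscard warning
  bool WasEmpty = B.empty(); // using the result is fine
  B.clear();
  return !WasEmpty && B.empty();
}
```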
…epi128/_mm256_bsrli_epi128 intrinsics (llvm#169309)
…9303) We didn't take `IntAP`/`IntAPS` into account when casting to and from the computation LHS type. This broke the `std/ranges/range.factories/range.iota.view/end.pass.cpp` test.
…llvm#166353) In combination with llvm#149470 this will introduce parallel accumulators when unrolling reductions with vector instructions. See also llvm#166630, which aims to introduce parallel accumulators for FP reductions.
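The general idea of parallel accumulators can be shown with a scalar sketch. This is not the vectorizer's output; `sumSerial` and `sumTwoAccumulators` are hypothetical functions that only illustrate how unrolling with independent accumulators breaks the single dependency chain of a reduction.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Single accumulator: every addition depends on the previous one.
std::uint64_t sumSerial(const std::vector<std::uint32_t> &V) {
  std::uint64_t Acc = 0;
  for (std::uint32_t X : V)
    Acc += X;
  return Acc;
}

// Unrolled by two with independent accumulators: the two additions in
// each iteration do not depend on each other and are combined once at
// the end. Integer addition is associative, so the result is identical.
std::uint64_t sumTwoAccumulators(const std::vector<std::uint32_t> &V) {
  std::uint64_t Acc0 = 0, Acc1 = 0;
  std::size_t I = 0;
  for (; I + 1 < V.size(); I += 2) {
    Acc0 += V[I];
    Acc1 += V[I + 1];
  }
  if (I < V.size()) // leftover element when the length is odd
    Acc0 += V[I];
  return Acc0 + Acc1;
}
```

For floating-point reductions the reassociation changes rounding, which is presumably why FP reductions are handled separately in llvm#166630.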
Linaro is doing network maintenance and I don't have an estimated time these will be back online.
…llvm#169316) This is a second attempt to fix the bazel build (after the first in llvm#169294, which was accidentally merged before CI passed). In the first attempt, not all bazel dependencies had been added; this PR should add them all and make CI pass. Signed-off-by: Ingo Müller <ingomueller@google.com>
InstCombine phi scalarization would always create a new binary op with the phi as the first operand, which is not correct for non-commutable binary ops such as sub. This fix preserves the original binary op ordering in the new binary op and adds a test for this behavior. Currently, this transformation can produce silently incorrect IR, and in the case of the added test, would optimize it out entirely.
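Why operand order matters for `sub` can be seen in a tiny model. The helpers below are hypothetical and operate on plain integers rather than LLVM IR; they only contrast "always put the phi first" with "preserve the original operand order".

```cpp
#include <cassert>

using BinOp = int (*)(int, int);

// A non-commutative operation: sub(A, B) != sub(B, A) in general.
int sub(int A, int B) { return A - B; }

// Models the old behavior: the phi value always becomes the first operand.
int rebuildPhiFirst(BinOp Op, int Phi, int Other) { return Op(Phi, Other); }

// Models the fixed behavior: keep the phi on whichever side it was.
int rebuildPreservingOrder(BinOp Op, int Phi, int Other, bool PhiIsLHS) {
  return PhiIsLHS ? Op(Phi, Other) : Op(Other, Phi);
}
```

With an original expression `10 - phi` and `phi = 3`, the correct value is 7; `rebuildPhiFirst(sub, 3, 10)` computes `3 - 10` instead, while `rebuildPreservingOrder(sub, 3, 10, false)` keeps the original order.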
Avoid adding any given SuperNode SN to its own SuperNode-deps set. This saves us from trying to redundantly merge its dependencies back into itself (a no-op, but a potentially expensive one).
Show information about the signal when the user runs `process handle <unix-signal>`, e.g.

```sh
(lldb) process handle SIGWINCH
NAME        PASS  STOP  NOTIFY DESCRIPTION
=========== ===== ===== ====== ===================
SIGWINCH    true  false false  window size changes
```

I wanted to use the existing `GetSignalDescription`, but its expected behaviour is to return the signal name if no signal code is passed. It is used in stop info.
https://github.com/llvm/llvm-project/blob/65c895dfe084860847e9e220ff9f1b283ebcb289/lldb/source/Target/StopInfo.cpp#L1192-L1195
…168745) Fix the dependency of `CodeGenDAGPatterns::ParseDefaultOperands()` on the particular order of SDNode definitions. Implicit usage of the first definition as a placeholder makes `llvm-tblgen -gen-dag-isel` fail if that SDNode is not usable as an output pattern operator and an instance of `OperandWithDefaultOps` is used in a pattern.

Presently, each `OperandWithDefaultOps` record is processed by constructing an instance of TreePattern from its `DefaultOps` argument that has the form `(ops ...)`. Even though the result of processing the root operator of that DAG is not inspected by the `ParseDefaultOperands()` function itself, that operator has to be supported by the underlying `TreePattern::ParseTreePattern()` function. For that reason, a temporary DAG is created by replacing the root operator of the `DefaultOps` argument with the first SDNode defined, which is usually `def imm : ...` defined in the `TargetSelectionDAG.td` file.

This results in misleading errors being reported when implementing new `SDNode` types, if the new definition happens to be added before the `def imm : ...` line. The error is reported by several test cases executed by the `check-llvm` target, as well as by the regular build, if one of the enabled targets inherits one of its operand types from `OperandWithDefaultOps`:

    OptionalIntOperand: ../llvm/test/TableGen/DAGDefaultOps.td:28:5: error: In OptionalIntOperand: Cannot use 'unexpected_node' in an output pattern!
    def OptionalIntOperand: OperandWithDefaultOps<i32, (ops (i32 0))>;

This commit implements a dedicated constructor of `TreePattern` to be used if the caller does not care about the particular root operator of the pattern being processed.
…nsfer_read' for PVC & BMG (llvm#168910) The PR changes the `TransferReadLowering` to always use `xegpu.load` (and not `xegpu.load_nd`) for 1D cases, as it has a more developed interface (e.g. layout capabilities). Signed-off-by: dchigarev <dmitry.chigarev@intel.com>
Collaborator
Author
ronlieb
approved these changes
Nov 24, 2025
No description provided.