Skip to content

Conversation

@z1-cciauto
Copy link
Collaborator

No description provided.

hstk30-hw and others added 30 commits November 24, 2025 12:49
Fix warning: 
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:1455:23: warning:
variable 'Store' set but not used [-Wunused-but-set-variable]
…duct (llvm#168074)

A `transform` pass to lower `vector.contract` to (a) `vector.fma` for
`F32`, (b) `x86vector.avx512.dot` for `BF16`, (c) `x86vector.avx.dot.i8`
for `Int8` packed types.

The lowering works on condition with `m`, `batch`, `k` dims to be `one`
and `vnni` dim should be `2` for `bf16`; `4` for `int8`.

**The lowering pattern**: `batch_reduce.matmul` (input) ->
register-tiling(M, N) -> Vectorization (to `vector.contract`) ->
`unroll` vector.contract (`unit` dims) -> `hoisting` transformation
(move `C` loads/store outside batch/k loop) -> apply `licm`,
`canonicalization`, and `bufferize`.
Adds required dependency for `inferContractionDims`.

Fixes llvm#168074
Identified with modernize-loop-convert.
Identified with modernize-loop-convert.
…lvm#169245)

This patch simplifies iterator_range construction with the conversion
constructor.
…es (llvm#169030)

We used to create a scope for the true- and false expression of a
conditional operator. This was done so e.g. in this example:

```c++
  struct A { constexpr A(){}; ~A(); constexpr int get() { return 10; } }; // all-note 2{{declared here}}
  static_assert( (false ? A().get() : 1) == 1);
```

we did _not_ evaluate the true branch at all, meaning we did not
register the local variable for the temporary of type `A`, which means
we also didn't call it destructor.

However, this breaks the case where the temporary needs to outlive the
conditional operator and instead be destroyed via the surrounding
`ExprWithCleanups`:
```
constexpr bool test2(bool b) {
  unsigned long __ms = b ? (const unsigned long &)0 : __ms;
  return true;
}
static_assert(test2(true));
```
Before this patch, we diagnosed this example:
```console
./array.cpp:180:15: error: static assertion expression is not an integral constant expression
  180 | static_assert(test2(true));
      |               ^~~~~~~~~~~
./array.cpp:177:24: note: read of temporary whose lifetime has ended
  177 |   unsigned long __ms = b ? (const unsigned long &)0 : __ms;
      |                        ^
./array.cpp:180:15: note: in call to 'test2(true)'
  180 | static_assert(test2(true));
      |               ^~~~~~~~~~~
./array.cpp:177:51: note: temporary created here
  177 |   unsigned long __ms = b ? (const unsigned long &)0 : __ms;
      |                                                   ^
1 error generated.
```
because the temporary created for the true branch got immediately
destroyed.

The problem in essence is that since the conditional operator doesn't
create a scope at all, we register the local variables for both its
branches, but we later only execute one of them, which means we should
also only destroy the locals of one of the branches.

We fix this similar to clang codgen's `is_active` flag: In the case of a
conditional operator (which is so far the only case where this is
problematic, and this also helps minimize the performance impact of this
change), we make local variables as disabled-by-default and then emit a
`EnableLocal` opcode later, which marks them as enabled. The code
calling their destructors checks whether the local was enabled at all.
…69281)

WaitingOnGraphTests.Emit_SingleContainerSimpleCycle tests a pair of emit
operations where the second completes a simple cycle (1: A -> B, 2: B ->
A).

We already had a test of WaitingOnGraph::simplify's behavior in this
case, but did not have one for WaitingOnGraph::emit.
…n auto type with typename (llvm#162514)

ASTImporter on importing template specialization with auto return type
faces cycle when return type is not nested one, but typename from
template arguments and other template.
There is code, that prevents cycle to auto return types when nested type
declared. Solved case differs somehow from nested types, but have same
solution with UsedDifferentProtoType - with delayed return type
determining.
To make it clear that the return value is immutable.
This method returns the current expression being emitted, but is only
used testing whether an expression is being emitted or not. This patch
therefore replaces it with a boolean isEmittingExpression() method.
Building with GCC I got:
```
<...>/OnDiskGraphDB.cpp:624:18: warning: ‘static {anonymous}::DataRecordHandle {anonymous}::DataRecordHandle::construct(char*, const {anonymous}::DataRecordHandle::Input&)’ defined but not used [-Wunused-function]
  624 | DataRecordHandle DataRecordHandle::construct(char *Mem, const Input &I) {
      |                  ^~~~~~~~~~~~~~~~
<...>/OnDiskGraphDB.cpp:456:1: warning: ‘static {anonymous}::DataRecordHandle {anonymous}::DataRecordHandle::create(llvm::function_ref<char*(long unsigned int)>, const {anonymous}::DataRecordHandle::Input&)’ defined but not used [-Wunused-function]
  456 | DataRecordHandle::create(function_ref<char *(size_t Size)> Alloc,
      | ^~~~~~~~~~~~~~~~
```

These implement parts of a class that is defined in an anonymous
namespace. All llvm tests passed with them removed.
…69294)

This PR fixes the bazel build that went out of sync with the changes
introduced in llvm#168074.

Signed-off-by: Ingo Müller <ingomueller@google.com>
The implementation is based on the directive tree.

Fixes clangd/clangd#1623
…encing blocks (llvm#169208)

Objective-C blocks are like lambdas. They have captures, just like lambdas.
However, they can also implicitly capture themselves unlike lambdas.

This means that when walking the captures of a block, we may end up in
infinite recursion. This is not possible with lambdas, but happened in
practice with blocks downstream.

In this patch, I just use a set to keep track of the visited MemRegions.

Note that theoretically, there is nothing preventing usual lambdas or
functors from falling for the same trap, but probably slightly more
difficult to do so. You would likely need a pointer to itself, etc. I'll
not speculate here.

This inf recursion was likely caused by llvm#126620, released in clang-21.

rdar://162215172
…#167502)

This lowers an SVE FMUL of bf16 using the BFMLAL top/bottom instructions
rather than extending to an f32 mul. This does require zeroing the
accumulator, but requires fewer extends/unpacking.
…lvm#162633)

Introducing this utility makes the `__grow_by{,_and_replace}`
significantly easier to understand and allows us to migrate away from
these functions in the future.
…7670)

This allows propagating optimizations to different algorithms by just
optimizing the lowest one. This is especially relevant now that we start
optimizing how we're iterating through ranges (e.g. the segmented
iterator optimizations) and adding assumptions so the compier can better
leverage semantics guaranteed by the standard (e.g.
`__builtin_assume_dereferenceable`).
In some workloads we see an argument passed on the stack where it is
loaded, only for it to be immediately spilled to a different slot on the
stack and then reloaded from that spill slot later on.

We can avoid the unnecessary spill by marking loads as rematerializable
and just directly loading from where the argument was originally passed
on the stack. TargetTransformInfo::isReMaterializableImpl checks to make
sure that any loads are `MI.isDereferenceableInvariantLoad()`, so we
should be able to move the load down to the remat site.

This gives a 14.8% reduction in spills in 544.nab_r on rva23u64 -O3, and
a few other smaller reductions on llvm-test-suite. I didn't find any
benchmarks where the number of spills/reloads increased.

Related: llvm#165761
…#157819)

This patch introduces the LASX and LSX conversion intrinsics:

- __m256 __lasx_cast_128_s (__m128)
- __m256d __lasx_cast_128_d (__m128d)
- __m256i __lasx_cast_128 (__m128i)
- __m256 __lasx_concat_128_s (__m128, __m128)
- __m256d __lasx_concat_128_d (__m128, __m128d)
- __m256i __lasx_concat_128 (__m128, __m128i)
- __m128 __lasx_extract_128_lo_s (__m256)
- __m128d __lasx_extract_128_lo_d (__m256d)
- __m128i __lasx_extract_128_lo (__m256i)
- __m128 __lasx_extract_128_hi_s (__m256)
- __m128d __lasx_extract_128_hi_d (__m256d)
- __m128i __lasx_extract_128_hi (__m256i)
- __m256 __lasx_insert_128_lo_s (__m256, __m128)
- __m256d __lasx_insert_128_lo_d (__m256d, __m128d)
- __m256i __lasx_insert_128_lo (__m256i, __m128i)
- __m256 __lasx_insert_128_hi_s (__m256, __m128)
- __m256d __lasx_insert_128_hi_d (__m256d, __m128d)
- __m256i __lasx_insert_128_hi (__m256i, __m128i)

Relevant GCC patch:

https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=c2013267642fea4a6e89b826940c8aa80a76089d
`[[nodiscard]]` should be applied to functions where discarding the
return value is most likely a correctness issue.

-
https://libcxx.llvm.org/CodingGuidelines.html#apply-nodiscard-where-relevant

---------

Co-authored-by: Hristo Hristov <zingam@outlook.com>
`[[nodiscard]]` should be applied to functions where discarding the
return value is most likely a correctness issue.
-
https://libcxx.llvm.org/CodingGuidelines.html#apply-nodiscard-where-relevant
…9303)

We didn't take `IntAP`/`IntAPS` into account when casting to and from
the computation LHS type. This broke the
`std/ranges/range.factories/range.iota.view/end.pass.cpp` test.
juliannagele and others added 9 commits November 24, 2025 11:12
…llvm#166353)

In combination with llvm#149470
this will introduce parallel accumulators when unrolling reductions with
vector instructions. See also
llvm#166630, which aims to
introduce parallel accumulators for FP reductions.
Linaro is doing network maintenance and I don't have an estimated time
these will be back online.
…llvm#169316)

This is a second attempt to fix the bazel build (after the first in
llvm#169294, which was accidentally merged before CI passed). In the first
attempt, not all bazel dependencies had been added; this PR should add
them all and make CI pass.

Signed-off-by: Ingo Müller <ingomueller@google.com>
InstCombine phi scalarization would always create a new binary op with
the phi as the first operand, which is not correct for non-commutable
binary ops such as sub. This fix preserves the original binary op
ordering in the new binary op and adds a test for this behavior.
Currently, this transformation can produce silently incorrect IR, and in
the case of the added test, would optimize it out entirely.
Avoid adding any given SuperNode SN to its own SuperNode-deps set. This
saves us from trying to redundantly merge its dependencies back into
itself (a no-op, but a potentially expensive one).
show information about the signal when the user presses `process handle
<unix-signal>` i.e

```sh
(lldb) process handle SIGWINCH 
NAME         PASS   STOP   NOTIFY  DESCRIPTION
===========  =====  =====  ======  ===================
SIGWINCH     true   false  false   window size changes
```

Wanted to use the existing `GetSignalDescription` but it is expected
behaviour to return the signal name if no signal code is passed. It is
used in stop info.


https://github.com/llvm/llvm-project/blob/65c895dfe084860847e9e220ff9f1b283ebcb289/lldb/source/Target/StopInfo.cpp#L1192-L1195
…168745)

Fix the dependency of `CodeGenDAGPatterns::ParseDefaultOperands()` on
the particular order of SDNode definitions. Implicit usage of the first
definition as a placeholder makes `llvm-tblgen -gen-dag-isel` fail if
that SDNode is not usable as an output pattern operator and an instance
of `OperandWithDefaultOps` is used in a pattern.

Presently, each `OperandWithDefaultOps` record is processed by
constructing an instance of TreePattern from its `DefaultOps` argument
that has the form `(ops ...)`. Even though the result of processing the
root operator of that DAG is not inspected by `ParseDefaultOperands()`
function itself, that operator has to be supported by the underlying
`TreePattern::ParseTreePattern()` function. For that reason, a temporary
DAG is created by replacing the root operator of `DefaultOps` argument
with the first SDNode defined, which is usually `def imm : ...` defined
in `TargetSelectionDAG.td` file.

This results in misleading errors being reported when implementing new
`SDNode` types, if the new definition happens to be added before the
`def imm : ...` line. The error is reported by several test cases
executed by `check-llvm` target, as well as by the regular build, if one
of the enabled targets inherit one of its operand types from
`OperandWithDefaultOps`:

    OptionalIntOperand: ../llvm/test/TableGen/DAGDefaultOps.td:28:5: error: In OptionalIntOperand: Cannot use 'unexpected_node' in an output pattern!
    def OptionalIntOperand: OperandWithDefaultOps<i32, (ops (i32 0))>;

This commit implements a dedicated constructor of `TreePattern` to be
used if the caller does not care about the particular root operator of
the pattern being processed.
…nsfer_read' for PVC & BMG (llvm#168910)

The PR changes the `TransferReadLowering` to always use `xegpu.load`
(and not `xegpu.load_nd`) for 1D cases as it has more developed
interface (e.g. layouts capabilites).

Signed-off-by: dchigarev <dmitry.chigarev@intel.com>
@z1-cciauto z1-cciauto requested a review from a team November 24, 2025 12:06
@z1-cciauto
Copy link
Collaborator Author

@ronlieb ronlieb removed the request for review from nicolasvasilache November 24, 2025 12:07
@z1-cciauto z1-cciauto merged commit 1f35e37 into amd-staging Nov 24, 2025
15 checks passed
@z1-cciauto z1-cciauto deleted the upstream_merge_202511240706 branch November 24, 2025 14:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.