Conversation

@ronlieb ronlieb commented Nov 24, 2025

No description provided.

RKSimon and others added 20 commits November 24, 2025 12:32
The current implementation of SV_Position was very basic, just enough to
allow implementing and testing some semantics. Now that semantic support is
more robust, I can move forward and implement the whole semantic logic.

The DX part is still somewhat of a placeholder.
The attributor can infer the alignment of %p at the call-site in this
example [1]:

```
  define void @f(ptr align 8 %p, i1 %c1, i1 %c2) {
  entry:
    br i1 %c1, label %bb.1, label %exit

  bb.1:
    call void (...) @llvm.fake.use(ptr %p)
    br label %exit

  exit:
    ret void
  }
```

but not when there's an additional conditional branch:

```
  define void @f(ptr align 8 %p, i1 %c1, i1 %c2) {
  entry:
    br i1 %c1, label %bb.1, label %exit

  bb.1:
    br i1 %c2, label %bb.2, label %exit

  bb.2:
    call void (...) @llvm.fake.use(ptr %p)
    br label %exit

  exit:
    ret void
  }
```

unless `-attributor-annotate-decl-cs` is enabled. This patch extends
`followUsesInMBEC` to handle such recursive branches.

n.b. admittedly I wrote this patch before discovering that inferring the
alignment in this example is already possible with
`-attributor-annotate-decl-cs`; I only realised this while writing the
tests. It still seems like a gap regardless, judging by the existing FIXMEs,
and the alignment can now be inferred in this particular example
without the flag.

[1] https://godbolt.org/z/aKoc75so5
)

This patch fixes a crash in Clang that occurs when the compiler
retrieves the element type of a complex type but receives a sugared
type. See example here: https://godbolt.org/z/cdbdeMcaT
Extend the load of an expand shape rewrite pattern to support folding a
`memref.expand_shape` and `vector.transfer_read` when the permutation
map on `vector.transfer_read` is a minor identity.
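
For readers unfamiliar with the term, a minor identity map selects the trailing (most minor) dimensions of its input, in order, for example `(d0, d1, d2) -> (d1, d2)`. The check below is only a minimal sketch of that definition in Python, not the MLIR implementation. The function name and the encoding of a map as a list of dimension indices are assumptions made for illustration, and maps with constant (broadcast) results are not modelled.

```python
def is_minor_identity(num_dims, results):
    """Return True if `results` picks the trailing dims of a num_dims-D space, in order.

    Hypothetical encoding: the map (d0, d1, d2) -> (d1, d2) is written as
    num_dims=3, results=[1, 2]. Only plain dimension results are modelled.
    """
    if len(results) > num_dims:
        return False
    first = num_dims - len(results)  # index of the first selected dimension
    return list(results) == list(range(first, num_dims))


# (d0, d1, d2) -> (d1, d2): trailing dims in order, so a minor identity.
print(is_minor_identity(3, [1, 2]))
# (d0, d1, d2) -> (d2, d1): a transposition, so not a minor identity.
print(is_minor_identity(3, [2, 1]))
```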

---------

Signed-off-by: Jack Frankland <jack.frankland@arm.com>
Introduce `AVX512_128_SETALLONES`, `AVX512_256_SETALLONES` pseudos to
generate all-ones vectors.

Post-RA expansion:

- Use VEX vpcmpeqd for XMM/YMM0–15 when available (matches current
codegen as `AVX512_128/256_SETALLONES` will be preferred over
`AVX1/2_SETALLONES` for AVX512VL target).
- Use EVEX `vpternlogd imm=0xFF` for high regs.

Includes MIR tests for both VEX and EVEX paths.
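
The choice between the two expansions can be illustrated with a small Python model. This is a sketch under stated assumptions, not the backend code: `pick_all_ones_expansion` is a hypothetical helper keyed only on the register number, and `ternlog32` models the per-bit truth-table lookup of `vpternlogd` on a single 32-bit element to show why `imm=0xFF` always yields all ones.

```python
def ternlog32(a, b, c, imm8):
    """Model vpternlogd on one 32-bit element: imm8 is an 8-entry truth table."""
    out = 0
    for bit in range(32):
        # The three input bits form an index into the truth table.
        idx = (((a >> bit) & 1) << 2) | (((b >> bit) & 1) << 1) | ((c >> bit) & 1)
        out |= ((imm8 >> idx) & 1) << bit
    return out


def pick_all_ones_expansion(reg_index):
    """Hypothetical helper: registers 0-15 can use VEX, 16-31 need EVEX."""
    if not 0 <= reg_index <= 31:
        raise ValueError("register index out of range")
    if reg_index <= 15:
        return "vpcmpeqd (VEX)"
    return "vpternlogd imm=0xFF (EVEX)"


# Every truth-table entry is 1, so the inputs do not matter.
assert ternlog32(0x1234, 0xABCD, 0, 0xFF) == 0xFFFFFFFF
print(pick_all_ones_expansion(3), "|", pick_all_ones_expansion(20))
```
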
…lates. (llvm#168946)

Reduces the pain of manually editing tests when applying the same
changes across multiple instructions and keeping them consistent.
This patch adds unary nodes plus and minus, introduces unary type
conversions, and adds integral promotion to the type system.
…wards branches (llvm#168398)

If we have a conditional branch, followed by an epilogue, followed by
more code, LLDB will incorrectly compute unwind information through
instruction emulation. Consider this:

```
// ...
<+16>: b.ne   ; <+52> DO_SOMETHING_AND_GOTO_AFTER_EPILOGUE

// epilogue start
<+20>: ldp    x29, x30, [sp, #0x20]
<+24>: add    sp, sp, #0x30
<+28>: ret
// epilogue end

AFTER_EPILOGUE:
<+32>: do something
// ...
<+48>: ret

DO_SOMETHING_AND_GOTO_AFTER_EPILOGUE:
<+52>: stp    x22, x23, [sp, #0x10]
<+56>: mov    x22, #0x1
<+64>: b      ; <+32> AFTER_EPILOGUE
```

LLDB will think that the unwind state of +32 is the same as +28. This is
false, as +32 _never_ executes after +28.

The root cause of the problem is the order in which instructions are
visited; they are visited in the order they appear in the text, with
unwind state always being forwarded to positive branch offsets, but
never to negative offsets.

In the example above, `AFTER_EPILOGUE` should inherit the state of the
branch in +64, but it doesn't because `AFTER_EPILOGUE` is visited right
after the `ret` in +28.

Fixing this should be simple: maintain a stack of instructions to visit.
While the stack is not empty, take the next instruction on the stack and
visit it.
* After visiting a non-branching instruction, push the next instruction
and forward unwind state to it.
* After visiting a branch with one or more known targets, push the known
branch targets and forward state to them.
* In all other cases (ret, or branch to register), neither push nor
forward anything.

Never push an instruction already on the stack. Like the algorithm
today, this new algorithm also assumes that, if two instructions branch
to the same target, the unwind state in both must be the same.

(Note: yes, branch to register is also handled incorrectly today, and
will still be incorrect).
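
The stack-based traversal described above can be sketched in Python as follows. This is a toy model, not LLDB's emulation code, and it rests on several assumptions: the unwind state is reduced to a single integer (the live frame size), the entry state of `0x30` at +16 is inferred from the `add sp, sp, #0x30` in the epilogue, the fall-through of the conditional branch is treated as one of its known targets, the elided instructions between +32 and +48 are collapsed into one edge, and "already on the stack" is interpreted as "ever pushed" so that loops terminate.

```python
def forward_unwind_state(insts, entry, entry_state):
    """Propagate unwind state along control flow instead of text order.

    insts maps an offset to (effect, successors): `effect` transforms the
    state, `successors` lists the offsets that can execute next (empty for
    ret or a branch to register).
    """
    state_at = {entry: entry_state}  # state on arrival at each offset
    stack = [entry]
    pushed = {entry}                 # assumption: never push an offset twice
    while stack:
        offset = stack.pop()
        effect, successors = insts[offset]
        state_after = effect(state_at[offset])
        for succ in successors:
            if succ not in pushed:
                pushed.add(succ)
                state_at[succ] = state_after
                stack.append(succ)
    return state_at


def keep(state):
    return state


# Offsets mirror the example above; states are the assumed frame size.
insts = {
    16: (keep, [52, 20]),        # b.ne: known target plus fall-through
    20: (keep, [24]),            # ldp
    24: (lambda s: 0, [28]),     # add sp, sp, #0x30 tears the frame down
    28: (keep, []),              # ret
    32: (keep, [48]),            # AFTER_EPILOGUE (elided body collapsed)
    48: (keep, []),              # ret
    52: (keep, [56]),            # stp
    56: (keep, [64]),            # mov
    64: (keep, [32]),            # b AFTER_EPILOGUE
}
states = forward_unwind_state(insts, 16, 0x30)
# +28 sees the torn-down frame, while +32 inherits the live frame from +64.
print(hex(states[28]), hex(states[32]))
```

In this model +32 receives its state from the branch at +64 rather than from the `ret` at +28, which is the behaviour the patch description asks for.
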
This patch implements the lowering for the 'copy' clause for a
function-local declare directive.

This is the first of the clauses that requires a 'cleanup' step, so it
also includes some basic infrastructure for that. Fortunately there are
only 8 clauses (only 6 of which require cleanup), so the if/else chain
won't get too long.

Also fortunately, we don't have to include any of the AST components, as
it is possible to tell all the required details from the entry operation
itself.
`[[nodiscard]]` should be applied to functions where discarding the
return value is most likely a correctness issue.
- https://libcxx.llvm.org/CodingGuidelines.html#apply-nodiscard-where-relevant
When interleaving a loop with an early exit, the parts before the active
lane will be all zero. Currently we emit @llvm.experimental.cttz.elts
with ZeroIsPoison=true for these parts, which means that they will
produce poison.

We don't see any miscompiles today on AArch64 because it has the same
lowering for cttz.elts regardless of ZeroIsPoison, but this may cause
issues on RISC-V when interleaving. This fixes it by setting
ZeroIsPoison=false.

The codegen is slightly worse on RISC-V when ZeroIsPoison=false and we
could potentially recover it by enabling it again when UF=1, but this is
left to another PR.

This is split off from llvm#168738, where LastActiveLane can get expanded to
a FirstActiveLane with an all-zeroes mask.
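
To see why the flag matters once a loop is interleaved, here is a toy Python model. It is a sketch only: `cttz_elts` mimics the documented behaviour of the intrinsic (the element count for an all-zero input, or poison when ZeroIsPoison is true), while `first_active_lane` and its way of combining the per-part results are assumptions made for illustration rather than the vectorizer's code.

```python
POISON = object()  # stand-in for an LLVM poison value


def cttz_elts(mask, zero_is_poison):
    """Count trailing zero elements, i.e. the index of the first set lane."""
    for i, bit in enumerate(mask):
        if bit:
            return i
    return POISON if zero_is_poison else len(mask)


def first_active_lane(parts, zero_is_poison):
    """Hypothetical combination of per-part results for an interleaved loop."""
    vf = len(parts[0])
    result = None
    for idx in range(len(parts) - 1, -1, -1):
        tz = cttz_elts(parts[idx], zero_is_poison)
        if tz is POISON:
            return POISON  # poison taints everything computed from it
        current = idx * vf + tz
        # Prefer an earlier part only if it actually has an active lane.
        if result is None or tz != vf:
            result = current
    return result


# UF=2, VF=4: the first part is all zero, the active lane is lane 2 of part 1.
parts = [[0, 0, 0, 0], [0, 0, 1, 0]]
print(first_active_lane(parts, zero_is_poison=False))           # well defined
print(first_active_lane(parts, zero_is_poison=True) is POISON)  # tainted
```

With a single part (UF=1) and a mask known to contain an active lane, both settings give the same answer, which is consistent with the note above about possibly re-enabling the flag in that case.
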
More missed target checks.

Signed-off-by: Nick Sarnie <nick.sarnie@intel.com>
This is exactly like the 'copy', except the exit operation is a 'delete'
instead of a 'copyout'. Also, creating the 'delete' op takes one fewer
argument, so we have to do some special handling when creating
that.
@ronlieb ronlieb requested review from a team and dpalermo November 24, 2025 15:47
@ronlieb ronlieb requested review from a team and removed request for Groverkss and nicolasvasilache November 24, 2025 17:18
@z1-cciauto z1-cciauto merged commit 32c812f into amd-staging Nov 24, 2025
14 checks passed
@z1-cciauto z1-cciauto deleted the amd/merge/upstream_merge_20251124093421 branch November 24, 2025 18:57