Skip to content

Conversation

@jorickert
Copy link

No description provided.

arsenm and others added 30 commits December 17, 2024 17:45
Do not remove S_CBRANCH_EXECZ if one of the following blocks contains an
unconditional branch to a block other than the one immediately following
it. This can cause unwanted behavior like infinite loops.
…m#118117)

This adds a new helper `canFoldStoreIntoLibCallOutputPointers()` to
check that it is safe to fold a store into a node that will expand to a
library call that takes output pointers. This requires checking for two
(independent) properties:

1. The store is not within a CALLSEQ_START..CALLSEQ_END pair
* If it is, the expansion would lead to nested call sequences (which is
invalid)
2. The node does not appear as a predecessor to the store
* If it does, attempting to merge the store into the call would result
in a cycle in the DAG

These two properties are checked as part of the same traversal in
`canFoldStoreIntoLibCallOutputPointers()`
This adds support for the loongarch64 architecture to the offload host
plugin.

Similar to llvm#115773

To fix some test issues, I've had to add the LoongArch64 target to:

- CompilerInvocation::ParseLangArgs
- linkDevice in ClangLinuxWrapper.cpp
- OMPContext::OMPContext (to set the device_kind_cpu trait)

Reviewed By: jhuber6

Pull Request: llvm#120173
…ethod (llvm#118951)

[llvm-debuginfo-analyzer] Fix crash due to un-checked error in LVReaderHandler::handleArchive
method.

- Added README describing how to generated the binary files used for the test.
- A follow up patch to add extra ASSERT_NE

Committed on behalf of @aurelien35
…0002)

Theses modules are deprecated and have trivial implementations in modern
cmake.
Multiple reentry points may be associated with a single key.
…#119546)

This patch implements the following intrinsics:

Multi-vector 8-bit floating-point multiply-add long (multiple vectors).

``` c
// Only if __ARM_FEATURE_SME_F8F16 != 0
void svmla_za16[_mf8]_vg2x2_fpm(uint32_t slice, svmfloat8x2_t zn, svmfloat8x2_t zm,
                                fpm_t fpm) __arm_streaming __arm_inout("za");

void svmla_za16[_mf8]_vg2x4_fpm(uint32_t slice, svmfloat8x4_t zn, svmfloat8x4_t zm,
                                fpm_t fpm) __arm_streaming __arm_inout("za");
// Only if __ARM_FEATURE_SME_F8F32 != 0
void svmla_za32[_mf8]_vg4x2_fpm(uint32_t slice, svmfloat8x2_t zn, svmfloat8x2_t zm,
                                fpm_t fpm) __arm_streaming __arm_inout("za");

void svmla_za32[_mf8]_vg4x4_fpm(uint32_t slice, svmfloat8x4_t zn, svmfloat8x4_t zm,
                                fpm_t fpm) __arm_streaming __arm_inout("za");                              
```

In accordance with ARM-software/acle#323
…E routines. (llvm#119864)"

Avoid issues caused by `.subsections_via_symbols` directive, by using
numbered labels instead of named labels for the branch locations.

This reverts commit 4032ce3.
Building on top of llvm#115489
extend support for binops with SExt operand.

PR: llvm#115879
This patch adds initial matchers for unary and binary SCEV expressions 
and specializes it for SExt, ZExt and binary add expressions.

Also adds matchers for SCEVConstant and SCEVUnknown.

This patch only converts a few instances to use the new matchers to make
sure everything builds as expected for now.

The goal of the matchers is to hopefully make it slightly easier to
write code matching SCEV patterns.

Depends on llvm#119389

PR: llvm#119390
We still need to check the input pointer, so let this go through
BitCastPrim.
Before this patch, there was a calling-convention mismatch between the
constructors and the actual call emitted for the entrypoint wrapper.

Such mismatch causes the InstCombine pass to replace this call with an
`unreachable`, breaking the whole function.

Signed-off-by: Nathan Gauër <brioche@google.com>
llvm#120218)

With all the recent versions of Clang that I tested, ObjC forward
declarations like
```
@Class ForwardObjcClass;
```
don't emit the kind of DWARF that this workaround was put in place for.

Also, zero-sized structures are valid in C (and thus Objective-C), so
this workaround makes things confusing to reason about when mixing the
two languages.

This workaround has been in place for at least a decade, and given that
recent compilers don't produce this anymore, we think it's a good time
to remove it.
Store GEPNoWrapFlags instead of only InBounds and propagate them.
The `llvm-gcc` front-end has been EOL'd at least since 2011 (based on
some `git` archeology). And Clang/LLVM has been removing references to
it ever since.

This patch removes the remaining references to it from LLDB. One benefit
of this is that it will allow us to remove the code checking for
`DW_AT_decl_file_attributes_are_invalid` and
`Supports_DW_AT_APPLE_objc_complete_type`.
…lete_type and DW_AT_decl_file_attributes_are_invalid (llvm#120226)

Depends on llvm#120225

With `llvm-gcc` support being removed from LLDB, these APIs
are now trivial and can be removed too.
Class BuiltinTypeMethodBuilder has a user-defined destructor so likely
compiler generated special functions may behave incorrectly. Delete
explicitly copy constructor and copy assignment operator to avoid
potential errors.
A necessary AddrSpaceCast was wrongfully deleted in
5c91b28 . Recover the AddrSpaceCast.

This fixes llvm#86791 .
Summary:
This message is only confusing and shouldn't have been added in the
first place.
Summary:
This installs the shared header to the users installation. I couldn't
decide if this should be a standalone thing or use the existing support
in `include/` mostly because this is completely separate from hdrgen
stuff and it's C++.
This patch introduces the LLVM components of a type sanitizer: a
sanitizer for type-based aliasing violations.

It is based on Hal Finkel's https://reviews.llvm.org/D32198.

C/C++ have type-based aliasing rules, and LLVM's optimizer can exploit
these given TBAA metadata added by Clang. Roughly, a pointer of given
type cannot be used to access an object of a different type (with, of
course, certain exceptions). Unfortunately, there's a lot of code in the
wild that violates these rules (e.g. for type punning), and such code
often must be built with -fno-strict-aliasing. Performance is often
sacrificed as a result. Part of the problem is the difficulty of finding
TBAA violations. Hopefully, this sanitizer will help.

For each TBAA type-access descriptor, encoded in LLVM's IR using
metadata, the corresponding instrumentation pass generates descriptor
tables. Thus, for each type (and access descriptor), we have a unique
pointer representation. Excepting anonymous-namespace types, these
tables are comdat, so the pointer values should be unique across the
program. The descriptors refer to other descriptors to form a type
aliasing tree (just like LLVM's TBAA metadata does). The instrumentation
handles the "fast path" (where the types match exactly and no
partial-overlaps are detected), and defers to the runtime to handle all
of the more-complicated cases. The runtime, of course, is also
responsible for reporting errors when those are detected.

The runtime uses essentially the same shadow memory region as tsan, and
we use 8 bytes of shadow memory, the size of the pointer to the type
descriptor, for every byte of accessed data in the program. The value 0
is used to represent an unknown type. The value -1 is used to represent
an interior byte (a byte that is part of a type, but not the first
byte). The instrumentation first checks for an exact match between the
type of the current access and the type for that address recorded in the
shadow memory. If it matches, it then checks the shadow for the
remainder of the bytes in the type to make sure that they're all -1. If
not, we call the runtime. If the exact match fails, we next check if the
value is 0 (i.e. unknown). If it is, then we check the shadow for the
remainder of the byes in the type (to make sure they're all 0). If
they're not, we call the runtime. We then set the shadow for the access
address and set the shadow for the remaining bytes in the type to -1
(i.e. marking them as interior bytes). If the type indicated by the
shadow memory for the access address is neither an exact match nor 0, we
call the runtime.

The instrumentation pass inserts calls to the memset intrinsic to set
the memory updated by memset, memcpy, and memmove, as well as
allocas/byval (and for lifetime.start/end) to reset the shadow memory to
reflect that the type is now unknown. The runtime intercepts memset,
memcpy, etc. to perform the same function for the library calls.

The runtime essentially repeats these checks, but uses the full TBAA
algorithm, just as the compiler does, to determine when two types are
permitted to alias. In a situation where access overlap has occurred and
aliasing is not permitted, an error is generated.

Clang's TBAA representation currently has a problem representing unions,
as demonstrated by the one XFAIL'd test in the runtime patch. We'll
update the TBAA representation to fix this, and at the same time, update
the sanitizer.

When the sanitizer is active, we disable actually using the TBAA
metadata for AA. This way we're less likely to use TBAA to remove memory
accesses that we'd like to verify.

As a note, this implementation does not use the compressed shadow-memory
scheme discussed previously
(http://lists.llvm.org/pipermail/llvm-dev/2017-April/111766.html). That
scheme would not handle the struct-path (i.e. structure offset)
information that our TBAA represents. I expect we'll want to further
work on compressing the shadow-memory representation, but I think it
makes sense to do that as follow-up work.

It goes together with the corresponding clang changes
(llvm#76260) and compiler-rt
changes (llvm#76261)

PR: llvm#76259
ptxas needs a proper triplet

for 133352f
This change has a long history. It was first attempted naively in
https://reviews.llvm.org/D131425, which didn't work because we broke the
ability for code to include e.g. <stdio.h> multiple times and get
different definitions based on the pre-defined macros.

However, in llvm#86843 we managed to simplify <stddef.h> by including the
underlying system header outside of any include guards, which worked.

This patch applies the same simplification we did to <stddef.h> to the
other headers that currently mention __need_FOO macros explicitly.
…ss (llvm#120135)

Add some comments that hopefully clarify a few things.

This was supposed to be NFC, but there is a difference in the inferred
register class for EXTRACT_SUBREG.

Pull Request: llvm#120135
This patch introduces the Clang components of type sanitizer: a
sanitizer for type-based aliasing violations.

It is based on Hal Finkel's https://reviews.llvm.org/D32198.

The Clang changes are mostly formulaic, the one specific change being
that when the TBAA sanitizer is enabled, TBAA is always generated, even
at -O0.

It goes together with the corresponding LLVM changes
(llvm#76259) and compiler-rt
changes (llvm#76261)

PR: llvm#76260
teresajohnson and others added 27 commits December 18, 2024 12:47
Don't unnecessarily clone for a caller that wasn't matched to a call
instruction.

This necessitated updated a couple of tests that were either
unnecessarily cloning or unnecessarily processing an allocation and
hinting it not cold.
Some bots are failing with 2916352,
likely due to the escapes in the FileCheck pattern. Add extra quotes to
try to fix this.
E.g. https://lab.llvm.org/buildbot/#/builders/46/builds/9442
Two bugs here. First calling `Inst->getFunction()` has undefined
behavior if the instruction is not tracked to a function. I suspect the
`replaceAllUsesWith` was leaving the GEPs in a weird ghost parent
situation. I switched up the visitor to be able to `eraseFromParent` as
part of visiting and then everything started working.

The second bug was in `DXILFlattenArrays.cpp`. I was unaware that you
can have multidimensional arrays of `zeroinitializer`, and `undef` so
fixed up the initializer to handle these two cases.

fixes llvm#117273
…lvm#118958)

In many cases the emptyTensorElimination can not transform or eliminate
the empty tensor which is being inserted into the
`SubsetInsertionOpInterface`.

Two major reasons for that:

1- Failing when trying to find a legal/suitable insertion point for the
`subsetExtract` which is about to replace the empty tensor. However, we
may try to handle this issue by moving the needed values which
responsible on building the `subsetExtract` nearby the empty tensor
(which is about to be eliminated). Thus increasing the probability to
find a legal insertion point.

2-The EmptyTensorElimination transform replaces the tensor.empty's uses
all at once in one apply, rather than replacing only the specific use
which was visited in the use-def chain (when traversing from the
tensor.insert_slice). This scenario of replacing all the uses of the
tensor.empty may lead into additional read effects after bufferization
of the specific subset extract/subview which should not be the case.

Both cases may result in many copies in the coming bufferization which
can not be canonicalized.

The first case can be noticed when having a `tensor.empty` followed by
`SubsetInsertionOpInterface` (or in simple words `tensor.insert_slice`),
which have been lowered from `tensor/tosa.concat`.

The second case can be noticed when having a `tensor.empty`, with many
uses and leading to applying the transformation only once, since the
whole uses have been replaced at once.

The first commit in the PR only adds the lit tests for the cases shown
above (NFC), to emphasize how the transform works, in the coming MRs
will upload a slight changes to handle these case.

The second commit in this PR, we want to replace only the specific use
which was visited in the `use-def` chain (when traversing from the
`tensor.insert_slice`'s source).
This sets up the initial blocks needed to initialize a VPlan directly
in the constructor. This will allow tracking of all created blocks
directly in VPlan, simplifying block deletion.
Give the properties from tablegen a `predicate` field that holds the
predicate that the property needs to satisfy, if one exists, and hook
that field up to verifier generation.
This patch undrifts source locations in MemProfRecord before readMemprof
starts the matching process.

The thoery of operation is as follows:

1. Collect the lists of direct calls, one from the IR and the other
   from the profile.

2. Compute the correspondence (called undrift map in the patch)
   between the two lists with longestCommonSequence.

3. Apply the undrift map just before readMemprof consumes
   MemProfRecord.

The new function gated by a flag that is off by default.
…lhs floordiv rhs` (llvm#119245)

Fixes an issue where the `SimpleAffineExprFlattener` would simplify
`lhs % rhs` to just `-(lhs floordiv rhs)` instead of 
`lhs - (lhs floordiv rhs)`
if `lhs` happened to be equal to `lhs floordiv rhs`.

The reported failure case was 
`(d0, d1) -> (((d1 - (d1 + 2)) floordiv 8) % 8)`
from llvm#114654.

Note that many paths that simplify AffineMaps (e.g. the AffineApplyOp
folder and canonicalization) would not observe this bug because of
of slightly different paths taken by the code. Slightly different
grouping of the terms could also result in avoiding the bug.

Resolves llvm#114654.
Found assertion failures when using EXPENSIVE_CHECKS and running lit
tests for APINotes:
Assertion `left.first != right.first && "two entries for the same
version"' failed.

It seems like std::is_sorted is verifying that the comparison function
is reflective (comp(a,a)=false) when using expensive checks. So we would
get callbacks to the lambda used for comparison, even for vectors with a
single element in APINotesReader::VersionedInfo<T>::VersionedInfo, with
"left" and "right" being the same object. Therefore the assert checking
that we never found equal values would fail.

Fix makes sure that we skip the check for equal values when "left" and
"right" is the same object.
This patch also makes following amendments to core exegesis:
* Added distinction between regular registers aliasing check and
registers used as memory address in instruction.
* Added scratch memory space pointer register.
* General exegesis options were amended:
        * mattr - new option to pass a list of enabled target features

Llvm-exegesis RISCV port is a result of team effort. Below everyone
involved listed.
Co-authored-by: Konstantin Vladimirov
<konstantin.vladimirov@syntacore.com>
Co-authored-by: Dmitrii Petrov <dmitrii.petrov@syntacore.com>
Co-authored-by: Dmitry Bushev <dmitry.bushev@syntacore.com>
Co-authored-by: Mark Goncharov <mark.goncharov@syntacore.com>
Co-authored-by: Anastasiya Chernikova
<anastasiya.chernikova@syntacore.com>

Original pr: llvm#89047

---------

Co-authored-by: Kazu Hirata <kazu@google.com>
Support true16 format for v_cvt_pknorm_i16/u16_f16 in MC.
Support true16 format for v_div_fixup_f16 in MC.
Support true16 format for v_minmax/maxmin_f16 in MC.

Since we are replacing `v_minmax/maxmin_f16` to `v_minmax/maxmin_f16_t16
/ v_minmax/maxmin_f16_fake16` in Post-GFX11, have to update the CodeGen
pattern for `v_minmax/maxmin_f16` to get CodeGen test passing.
The arguments to this are the same as for the 'wait' clause, so this
reuses all of that infrastructure. So all this has to do is support a
pair of clauses that are already implemented (if and async), plus create
an AST node.  This patch does so, and adds proper testing.
'-mllvm -ubsan-unique-traps'
(llvm#65972) applies to all UBSan
checks. This patch introduces -fsanitize-merge (defaults to on,
maintaining the status quo behavior) and -fno-sanitize-merge (equivalent
to '-mllvm -ubsan-unique-traps'), with the option to selectively
applying non-merged handlers to a subset of UBSan checks (e.g.,
-fno-sanitize-merge=bool,enum).

N.B. we do not use "trap" in the argument name since
llvm#119302 has generalized
-ubsan-unique-traps to work for non-trap modes (min-rt and regular rt).

This patch does not remove the -ubsan-unique-traps flag; that will
override -f(no-)sanitize-merge.
…lvm#112694 (llvm#120418)

I missed that FalseCnt for each Case was used to calculate percentage in
the SwitchStmt. At the moment I resurrect them.

In `!HasDefaultCase`, the pair of Counters shall be `[CaseCountSum,
FalseCnt]`. (Reversal of before llvm#112694)
I think it can be considered as the False count on SwitchStmt.

FalseCnt shall be folded (same as current impl) in the coming
SingleByteCoverage changes, since percentage would not make sense.
…llvm#118242)

Scalarize vector FPOWI instead of promoting the type. This allows the
scalar FPOWIs to be visited and converted to libcalls before promoting
the type.

FIXME: This should be done in LegalizeVectorOps/LegalizeDAG, but call
lowering needs the unpromoted EVT.

Without this patch, in some backends, such as RISCV64 and LoongArch64,
the i32 type is illegal and will be promoted. This causes exponent type
check to fail when ISD::FPOWI node generates a libcall.

Fix llvm#118079
This reverts commit 9af5de3.

Reason: buildbot breakage
(https://lab.llvm.org/buildbot/#/builders/24/builds/3394/steps/10/logs/stdio)
"Unexpectedly Passed Tests (1):
   llvm-libc++-shared.cfg.in :: libcxx/language.support/support.dynamic/libcpp_deallocate.sh.cpp"
…464)" (llvm#120511)

This reverts commit 2691b96. This
reapply fixes the buildbot breakage of the original patch, by updating
clang/test/CodeGen/ubsan-trap-debugloc.c to specify -fsanitize-merge
(the default, which is merge, is applied by the driver but not
clang_cc1).

This reapply also expands clang/test/CodeGen/ubsan-trap-merge.c.

----

Original commit message:
'-mllvm -ubsan-unique-traps'
(llvm#65972) applies to all UBSan
checks. This patch introduces -fsanitize-merge (defaults to on,
maintaining the status quo behavior) and -fno-sanitize-merge (equivalent
to '-mllvm -ubsan-unique-traps'), with the option to selectively
applying non-merged handlers to a subset of UBSan checks (e.g.,
-fno-sanitize-merge=bool,enum).

N.B. we do not use "trap" in the argument name since
llvm#119302 has generalized
-ubsan-unique-traps to work for non-trap modes (min-rt and regular rt).

This patch does not remove the -ubsan-unique-traps flag; that will
override -f(no-)sanitize-merge.
…ice memory (llvm#120485)

When emboxing memory that comes from CUFMemAlloc, we need to allocate
the descriptor in manage memory as it might be passed to a kernel.
@jorickert jorickert changed the title [AutoBump] Merge with 89b34ec9 (Dec 18) (23) [AutoBump] Merge with 89b34ec9 (Dec 18) (23) [Only tested MLIR] Mar 17, 2025
@jorickert jorickert merged commit bb98eec into bump_to_392622d0 Apr 14, 2025
51 checks passed
@jorickert jorickert deleted the bump_to_89b34ec9 branch April 14, 2025 07:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.