[AutoBump] Merge with ccd3defd (Feb 19) (58) #602

Open
wants to merge 77 commits into
base: bump_to_c4f8da94

jorickert

No description provided.

petrhosek and others added 30 commits February 18, 2025 23:54
This is a (no-op) locale version of strftime.
…#127640)

CaptureTracking considers insertions into aggregates and vectors as
captures. As such, extractions from aggregates and vectors are escape
sources. A non-escaping identified local cannot alias with the result of
an extractvalue/extractelement.

Fixes llvm#126670.
…27705)

This patch adds handling of the RISCVISD::VCPOP_VL node in
RISCVTargetLowering::computeKnownBitsForTargetNode. It eliminates
redundant zero-extension instructions.
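
The known-bits argument can be sketched in plain C++. This is an illustration only, not the code in `RISCVTargetLowering`: the helper names `bits_needed` and `known_zero_high_bits` and the example element bound are assumptions. The idea is that a population count over at most N mask bits can never exceed N, so all higher result bits are known to be zero and a later zero-extension of the result adds nothing.

```cpp
#include <cassert>
#include <cstdint>

// Number of bits needed to represent `value` (0 needs 0 bits).
unsigned bits_needed(std::uint64_t value) {
  unsigned bits = 0;
  while (value != 0) {
    ++bits;
    value >>= 1;
  }
  return bits;
}

// Hypothetical helper: a population count over at most `max_elements` mask
// bits is bounded by max_elements, so every result bit at or above
// bits_needed(max_elements) is known to be zero.
unsigned known_zero_high_bits(unsigned result_bits, std::uint64_t max_elements) {
  unsigned needed = bits_needed(max_elements);
  assert(needed <= result_bits && "count must fit in the result register");
  return result_bits - needed;
}
```

For example, with a 64-bit result and an assumed bound of 256 elements, the count fits in 9 bits, so the upper 55 bits are known zero.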
…atization (llvm#127754)

Issue: Compilation abnormally terminates in parallel default(private)

Documentation reference:
A threadprivate variable must not appear as the base variable of a list
item in any clause except for the copyin and copyprivate clauses

Explanation:
From the reference, threadprivate symbols cannot be used in the DSA
clauses, which in turn means that such symbols can be skipped for default
privatization.

Fixes llvm#123535
…lvm#125826)

gfx940 and gfx941 are no longer supported. This is one of a series of
PRs to remove them from the code base.

For SWDEV-512631 and SWDEV-512633
…m#127455)

Delete `equivalenceAnalysis`, which has been incorporated into the
`getAliasingValues` API. Also add an additional test case to ensure that
equivalence is properly propagated across function boundaries.
…5836)

gfx940 and gfx941 are no longer supported. This is one of a series of
PRs to remove them from the code base.

For SWDEV-512631
…patterns. (llvm#127643)

Handles both BWI and non-BWI cases (skips PMOV*XBW without BWI).

The vector-interleaved-store-i8-stride-8.ll VPTERNLOG diffs are due to
better value tracking now recognizing the zero-extension patterns where
before it was any-extension
…126762)

gfx940 and gfx941 are no longer supported. This is one of a series of
PRs to remove them from the code base.

This PR removes all occurrences of gfx940/gfx941 from clang that can be
removed without changes in the llvm directory. The
target-invalid-cpu-note/amdgcn.c test is not included here since it
tests a list of targets that is defined in
llvm/lib/TargetParser/TargetParser.cpp.

For SWDEV-512631
…ted constexprs in metadata (llvm#127665)

Metadata that references unsupported constant expressions can be
represented with `poison` metadata instead of `undef` metadata.
The standard libcalls for half to float and float to half conversion are
__extendhfsf2 and __truncsfhf2. However, LLVM currently uses
__gnu_h2f_ieee and __gnu_f2h_ieee instead. As far as I can tell, these
libcalls are an ARM-ism and only provided by libgcc on that platform.
compiler-rt always provides both libcalls.

Use the standard libcalls by default, and only use the __gnu libcalls on
ARM.
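
For readers unfamiliar with what these libcalls compute, here is a minimal sketch of the half-to-float direction (the job of `__extendhfsf2`). It is not the compiler-rt or libgcc implementation; the function name `half_bits_to_float` is hypothetical, and NaN payloads are not preserved.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helper: decode an IEEE 754 binary16 bit pattern into a float.
// Layout: 1 sign bit, 5 exponent bits (bias 15), 10 fraction bits.
float half_bits_to_float(std::uint16_t h) {
  const std::uint32_t sign = (h >> 15) & 0x1u;
  const std::uint32_t exponent = (h >> 10) & 0x1Fu;
  const std::uint32_t fraction = h & 0x3FFu;

  float magnitude;
  if (exponent == 0) {
    // Zero or subnormal: fraction * 2^-24.
    magnitude = std::ldexp(static_cast<float>(fraction), -24);
  } else if (exponent == 31) {
    // Infinity or NaN (payload is dropped in this sketch).
    magnitude = fraction != 0 ? std::numeric_limits<float>::quiet_NaN()
                              : std::numeric_limits<float>::infinity();
  } else {
    // Normal: (1024 + fraction) * 2^(exponent - 15 - 10).
    magnitude = std::ldexp(static_cast<float>(fraction | 0x400u),
                           static_cast<int>(exponent) - 25);
  }
  return sign != 0 ? -magnitude : magnitude;
}
```

Every binary16 value is exactly representable as a float, so this direction never rounds; the opposite direction (`__truncsfhf2`) does.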
gfx940 and gfx941 are no longer supported. This is one of a series of
PRs to remove them from the code base.

This PR removes all non-documentation occurrences of gfx940/gfx941 from
the llvm directory, and the remaining occurrences in clang.

Documentation changes will follow.

For SWDEV-512631
…lvm#127759)

This _can_ happen with non-pointers, but we shouldn't diagnose it in
that case.
This was only used for gfx940 and gfx941, which have since been removed.

For SWDEV-512631
…vm#126887)

gfx940 and gfx941 are no longer supported. This is one of a series of
PRs to remove them from the code base.

This PR removes all documentation occurrences of gfx940/gfx941 except
for the gfx940 ISA description, which will be the subject of a separate
PR.

For SWDEV-512631
… decoding (llvm#127630)

Similar to insert_subvector - limit this to vXi64 vector cases to make the most of cross lane shuffles (for now).
…lvm#126906)

gfx940 and gfx941 are no longer supported. This is the last one of a
series of PRs to remove them from the code base.

The ISA documentation still contains a lot of links and file names with
the "gfx940" identifier. Changing them to "gfx942" is probably not worth
the cost of breaking all URLs to these pages that users might have saved
in the past.

For SWDEV-512631
This also includes comparing the two ImpliedDo

Details
- For ArrayConstructor, check if x and y have the same
  elements and type
- For ImpliedDo, check if x and y have the same lower,
  upper, stride and values

Fixes: llvm#104526
…o-math-errno is set (llvm#121763)

This will allow vectorizing these calls (after a few more patches). This
should not change the codegen for targets that enable the use of AA
during the codegen (in `TargetSubtargetInfo::useAA()`). This includes
targets such as AArch64. This notably does not include x86 but can be
worked around by passing `-mllvm -combiner-global-alias-analysis=true`
to clang.

Follow up to llvm#114086.
This commit improves the behaviour of (__clc_)nextafter around zero.
Specifically, the nextafter value of very small negative numbers in the
positive direction is now negative zero. Previously we'd return positive
zero.

This behaviour is not required as far as OpenCL is concerned: at least,
the CTS isn't testing for it. However, this change does bring our
implementation into bit-equivalence with (libstdc++'s implementation of)
std::nextafter, tested on all possible values of 32-bit float towards
both positive and negative INFINITY.

Furthermore, since the implementation of libclc's floating-point 'rtp'
and 'rtz' conversions use __clc_nextafter, the previous behaviour was
resulting in CTS validation issues. For example, when converting float
-0x1.000002p-25 to half, rounding towards zero or positive infinity,
nextafter was returning +0.0, whereas the correct conversion requires us
to return -0.0.

We could work around this issue in the conversion functions, but since
the change to nextafter is small enough and the behaviour around zero
matches libstdc++, the fix feels at home there.

This commit also converts several variables to unsigned types to avoid
undefined behaviour surrounding signed underflow on the subtractions.
It also converts some variables to be kept in floating-point types, using
fabs to get the absolute value rather than by bit-hacking.
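
The reference behaviour being matched can be checked on the host with `std::nextafter`. The sketch below uses the host C++ library, not libclc's `__clc_nextafter`, and the helper name `steps_to_negative_zero` is an assumption made for illustration.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper: returns true when stepping `x` by one representable
// value towards +infinity lands exactly on negative zero.
bool steps_to_negative_zero(float x) {
  const float next = std::nextafter(x, std::numeric_limits<float>::infinity());
  return next == 0.0f && std::signbit(next);
}
```

On an IEEE 754 host, passing `-std::numeric_limits<float>::denorm_min()` (the negative float closest to zero) is expected to return true, which is the sign-preserving result the commit describes.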
Fix affine.parallel op verifier for missing check on zero result lower
or upper bound maps. lb/ub maps should have at least one result.

Fixes: llvm#120186
Part of the DECLARE REDUCTION was already supported by the parser, but
the semantics to add the reduction identifier wasn't implemented.

The semantics would not accept the name given by the reduction, so a few
lines were added to support that.

Some tests were in place but not quite working, so fixed those up too.
Adding new tests for unparsing and parse-tree, as well as checking the
symbolic name being generated.

Lowering of DECLARE REDUCTION is not supported in this patch, and a test
that it hits the relevant TODO is in this patch (most of this was
already existing, but not actually testing the TODO message).
Enable optional ISA extensions on Grace when mcpu=grace
is used: sve2-sm4, sve2-aes, sve2-sha3.
Grace is no longer an alias, but a separate CPU definition.
The motivation is llvm#123622 and the fact that it is hard to find the last
line entry in a given range. `FindLineEntryByAddress(range_end-1)` is
the best we have, but it's not ideal because it has a magic -1 and
because it relies on there being a line entry at that address. Generally, it
should be there, but if for some reason it isn't, we might end up ignoring
the entries that are there (or -- like my incorrect fix in llvm#123622 did
-- iterating through the entire line table).

What we really want is to get the last entry that exists in the given
range. Or, equivalently (and more STL-like) the first entry after that
range. This is what these functions do. I've used the STL names since
they do pretty much exactly what the standard functions do (the main
head-scratcher comes from the fact that our entries represent ranges
rather than single values).

The functions can also be used to simplify the maze of `if` statements
in `FindLineEntryByAddress`, but I'm keeping that as a separate patch.
For now, I'm just adding some unit testing for that function to gain
more confidence that the patch does not change the function behavior.
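
The "first entry after the range" idea can be sketched with the standard algorithms on a simplified table. This is not LLDB's `LineTable` API: the `Entry` struct and `first_entry_after` are hypothetical, and the sketch assumes entries are sorted by start address, with each entry covering the addresses up to the next entry's start.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical simplified line entry: covers [addr, next entry's addr).
struct Entry {
  std::uint64_t addr;
  int line;
};

// Index of the first entry that starts at or after `range_end`, i.e. the
// first entry past the half-open address range ending at `range_end`.
// Returns table.size() when no such entry exists.
std::size_t first_entry_after(const std::vector<Entry> &table,
                              std::uint64_t range_end) {
  auto it = std::lower_bound(
      table.begin(), table.end(), range_end,
      [](const Entry &entry, std::uint64_t addr) { return entry.addr < addr; });
  return static_cast<std::size_t>(it - table.begin());
}
```

With entries starting at 0x100, 0x104, 0x10c and 0x120, a range ending at 0x110 gives index 3, so the last entry inside the range is the one before it (index 2). No `-1` on the address is needed, and the lookup does not depend on an entry existing at any particular address.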

---------

Co-authored-by: Jonas Devlieghere <jonas@devlieghere.com>
)

This is per the style guide: make file-scope symbols static whenever possible.

Fix llvm#125983.
During a recent change, the build system accidentally dropped the
(theoretical) support for the CLC builtins library to build
target-specific builtins from the 'amdgpu' directory, due to a change in
variable names. This functionality wasn't being used but was spotted
during another code review.

This commit takes the opportunity to clean up and better document the
code that manages the list of directories to search for builtin
implementations.

While fixing this, some references to now-removed SOURCES files were
discovered which have been cleaned up.
This patch adds intrinsics for tcgen05.cp and
tcgen05.shift instructions.

lit tests are added and verified with a
ptxas-12.8 executable.

Docs are updated in the NVPTXUsage.rst file.

Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
Sirraide and others added 30 commits February 19, 2025 16:48
While reviewing llvm#127623, I missed that it didn’t have a release note.
Doing so provides stability when compiling the builtins in a mode in
which unqualified pointers may be interpreted as being in the generic
address space, such as in OpenCL 3.0.

We eventually want to provide 'generic' overloads of the builtins in
libclc so this prepares the ground a little better.

It could be argued that having the internal CLC helper functions be
unqualified is more flexible, in case it's better for a target to have
the pointers in the generic address space. This commits to the private
address space for more stability across different OpenCL environments.
…7682)

This makes GetOutputStreamSP and GetErrorStreamSP protected members of
Debugger. Users who want to print to the debugger's stream should use
GetAsyncOutputStreamSP and GetAsyncErrorStreamSP instead and the few
remaining stragglers have been migrated.
…ureInfo (llvm#125880)"

This reverts commit 0fab404.
Seems to break LTO builds of clang on Windows, see comments on
llvm#125880
A handful of minor improvements to StreamAsynchronousIO:

 - Document the class.
 - Use a named enum value to distinguish between stdout and stderr.
 - Add missing period to comment.
 - Clear the string instead of assigning to it.
 - Eliminate color argument.
…llvm#121109)

As a follow-up to llvm#121013 (which optimized `ranges::copy`) and llvm#121026
(which optimized `ranges::copy_backward`), this PR enhances the
performance of `std::ranges::{move, move_backward}` for
`vector<bool>::iterator`, addressing a subtask outlined in issue llvm#64038.

The optimizations bring performance improvements analogous to those
achieved for the `{copy, copy_backward}` algorithms: up to 2000x for
aligned moves and 60x for unaligned moves. Moreover, comprehensive
tests covering up to 4 storage words (256 bytes) with odd and even bit
sizes are provided, which validate the proposed optimizations in this
patch.
The previous PR, llvm#122950, was reverted because it hit a buildbot
failure. Another patch was merged while that PR was under review, which
left one test out of date.

This reopens the PR with the issue fixed.
For example, determine that the address in `obj%p` below cannot alias
the address of `v`:

```
module m
  type :: ty
    real, pointer :: p
  end type ty
end module m
subroutine test()
  use m
  real, target :: t
  real :: v
  type(ty) :: obj
  obj%p => t
  v = obj%p
end subroutine test
```
This patch allows using fpfeatures pragmas with __builtin_convertvector:
- added TrailingObjects with FPOptionsOverride and methods for handling
it to ConvertVectorExpr
- added support for codegen, node dumping, and serialization of
fpfeatures contained in ConvertVectorExpr
Add frontend actions to support emitting assembly, bitcode, and object
files when compiling with ClangIR. This change also correctly sets and
propagates the target triple in the MLIR and LLVM modules, which was a
necessary prerequisite for emitting assembly and object files.
)

Disable fold when it will result in more instructions.
When the script has executed `cd %t`, it is fine to use the output
file `a.out`.
(We don't want to rely on lit's default PWD to support lit compatible
runners. Therefore -o /dev/null is used when PWD has not been changed
to a %t derived path.)
For `trunc nuw` this saves an instruction; otherwise it only yields other
instructions without the select, the same behavior as for the bit test before.

proof: https://alive2.llvm.org/ce/z/a6QmyV
…lvm#120909)

This refactor includes the following changes:
- Refactor similar tests using `types::for_each` to remove redundant code;
- Explicitly include the missing header `type_algorithms.h` in some test files;
- Some tests scattered in different test functions with ad-hoc names
  (e.g., `test5()`, `test6()`) but belong to the same kind are now grouped
  into one function (`test_struct_array()`).
The mapping of IR ExitBB to a VPBB isn't used. It also sets an incorrect
VPBB for the ExitBB; the region's successor is the middle block, not the
exit block.

It also unnecessarily triggers an assertion after 38376de.
…, replace HasVariableMask bool arg. NFC. (llvm#127826)

Minor NFC refactor before making better variable mask combining decisions - isTargetShuffleVariableMask doesn't discriminate between fast (AND, PSHUFB etc.) and slow (VPERMV3 etc.) variable shuffles, so an opaque HasVariableMask is only of limited use.