Skip to content

Conversation

@mgehre-amd
Copy link
Collaborator

No description provided.

dpaoliello and others added 30 commits September 30, 2024 10:40
After llvm#109774 MSVC is failing to build LLVM with the error:

```
llvm\lib\Target\RISCV\RISCVInstrInfo.cpp(782): warning C4018: '<': signed/unsigned mismatch
```

Fix is ensure that the RHS is an unsigned integer.
Update the docs to mention that kernel argument preloading is not
supported on MI210.
Adding optimality test to `add`, `sub`, `avgCeilU` and `avgFloorU`
llvm#109252)

This fixes issue llvm#109250

The issue happens during the `MachineBlockPlacement` pass. The block,
whose address was previously not taken, is deemed redundant by the pass
and subsequently replaced using
`MachineBasicBlock::ReplaceUsesOfBlockWith` in `BranchFolding`.

ReplaceUsesOfBlockWith only replaces uses in the terminator. However,
`expandPostRAPseudo` introduces new block uses when expanding catchrets.
These uses do not get replaced, which results in undefined label errors
later on.

Marking the block addresss as taken prevents the replacement of the
block, without also replacing non-terminator uses.
…es (llvm#110501)

This is a re-application of bc6bd3b which was reverted in
f11abac because it broke the Clang pre-commit CI.

Original commit message:

This patch rewrites the modulemap to have fewer top-level modules.
Previously, our modulemap had one top level module for each header in
the library, including private headers. This had the well-known problem
of making compilation times terrible, in addition to being somewhat
against the design principles of Clang modules.

This patch provides almost an order of magnitude compilation time
improvement when building modularized code (certainly subject to
variations). For example, including <ccomplex> without a module cache
went from 22.4 seconds to 1.6 seconds, a 14x improvement.

To achieve this, one might be tempted to simply put all the headers in a
single top-level module. Unfortunately, this doesn't work because libc++
provides C compatibility headers (e.g. stdlib.h) which create cycles
when the C Standard Library headers are modularized too. This is
especially tricky since base systems are usually not modularized: as far
as I know, only Xcode 16 beta contains a modularized SDK that makes this
issue visible. To understand it, imagine we have the following setup:

   // in libc++'s include/c++/v1/module.modulemap
   module std {
      header stddef.h
      header stdlib.h
   }

   // in the C library's include/module.modulemap
   module clib {
      header stddef.h
      header stdlib.h
   }

Now, imagine that the C library's <stdlib.h> includes <stddef.h>,
perhaps as an implementation detail. When building the `std` module,
libc++'s <stdlib.h> header does `#include_next <stdlib.h>` to get the C
library's <stdlib.h>, so libc++ depends on the `clib` module.

However, remember that the C library's <stdlib.h> header includes
<stddef.h> as an implementation detail. Since the header search paths
for libc++ are (and must be) before the search paths for the C library,
the C library ends up including libc++'s <stddef.h>, which means it
depends on the `std` module. That's a cycle.

To solve this issue, this patch creates one top-level module for each C
compatibility header. The rest of the libc++ headers are located in a
single top-level `std` module, with two main exceptions. First, the
module containing configuration headers (e.g. <__config>) has its own
top-level module too, because those headers are included by the C
compatibility headers.

Second, we create a top-level std_core module that contains several
dependency-free utilities used (directly or indirectly) from the __math
subdirectory. This is needed because __math pulls in a bunch of stuff,
and __math is used from the C compatibility header <math.h>.

As a direct benefit of this change, we don't need to generate an
artificial __std_clang_module header anymore to provide a monolithic
`std` module, since our modulemap does it naturally by construction.

A next step after this change would be to look into whether math.h
really needs to include the contents of __math, and if so, whether
libc++'s math.h truly needs to include the C library's math.h header.
Removing either dependency would break this annoying cycle.

Thanks to Eric Fiselier for pointing out this approach during a recent
meeting. This wasn't viable before some recent refactoring, but wrapping
everything (except the C headers) in a large module is by far the
simplest and the most effective way of doing this.

Fixes llvm#86193
These files all use `strtod` - make sure to include a proper header for
this function. Otherwise, building MLIR fails on some systems after the
recent commit 1b5f691 which removed
inclusion of `<cmath>` and thus broke transitive inclusion of
`<stdlib.h>` in these headers.
Fill the regular delay-load IAT with x86_64 delay-load thunks. Similarly
to regular imports, create an auxiliary IAT and its copy for ARM64EC
calls. These are filled with the same `__impchk_` thunks used for
regular imports, which perform an indirect call with
`__icall_helper_arm64ec` on the regular delay-load IAT. These auxiliary
IATs are exposed via CHPE metadata starting from version 2.

The MSVC linker creates one more copy of the auxiliary IAT. `__imp_func`
symbols refer to that hidden IAT, while the `#func` thunk performs a
call with the public auxiliary IAT. If the public auxiliary IAT is fine
for `#func`, it should be fine for calls using the `__imp_func` symbol
as well. Therefore, I made `__imp_func` refer to that IAT too.
…nt padding in assignRVVStackObjectOffsets. (llvm#110312)

If we know vlen is a multiple of 16, we don't need any alignment
padding.

I wrote the code so that it would generate the minimum amount of padding
if the stack align was 32 or larger or if RVVBitsPerBlock was smaller
than half the stack alignment.
… fixed vectors in some cases. (llvm#109232)

Copy the same FSUB check from ExpandFNEG to avoid breaking AArch64 and
ARM.
Precommit tests for follow-up improvements to Clang's TBAA emission.

Also add variants with -pointer-tbaa to tbaa-reference.cpp.
…ues` (llvm#110414)

Simplify the nesting structure of "if" checks in `remapValues` and
update the code comments.

This is what the comments stated in case there is no type converter:
```
      // TODO: What we should do here is just set `desiredType` to `origType`
      // and then handle the necessary type conversions after the conversion
      // process has finished. Unfortunately a lot of patterns currently rely on
      // receiving the new operands even if the types change, so we keep the
      // original behavior here for now until all of the patterns relying on
      // this get updated.
```

However, without a type converter it is not possible to perform any
materializations. Furthermore, the absence of a type converter indicates
that the pattern does not care about type legality. Therefore, the
current implementation is correct and this TODO can be removed.

Note: Patterns that actually require a remapped type to match the
original operand type can be equipped with a type converter that maps
each type to itself.

This TODO is outdated:
```
      // TODO: There currently isn't any mechanism to do 1->N type conversion
      // via the PatternRewriter replacement API, so for now we just ignore it.
```
1->N type conversions are already possible as part of block signature
conversions. It is incorrect to just ignore such cases. However, there
is currently no better way to handle 1->N conversions in this function
because of infrastructure limitations. This is now clarified in the
comments.
Use `moduleOp.getBody()` instead of `moduleOp.getBodyRegion().front()`.
…s. (llvm#110302)

As mentioned in llvm#108633, we don't respect the lower bound of the assumed
shape arrays if those were specified. It happens in both cases:
1. When caller has non-default lower bound and callee has default
2. When callee has non-default lower bound and caller has default

This PR tries to fix this issue by improving our generation of lower
bound attribute on DICompositeTypeAttr. If we see a lower bound in the
declaration, we respect that. Note that same function is also used for
allocatable/pointer variables. We make sure that we get the lower bound
from descriptor in those cases. Please note that DWARF assumes a lower
bound of 1 so in many cases we don't need to generate the lower bound.

Fixes llvm#108633.
…#109698)

After a CONTAINS statement in a program unit, a statement that cannot
begin a subprogram will trigger catastrophic error recovery. But the
compiler is presently emitting multiple errors for the same location
about expected variations of END statements. Emit fewer messages.

Fixes llvm#109609.
…lvm#110031)

When an INTEGER conversion to a smaller kind overflows in constant
folding, report the truncated value so that it makes more sense later if
it shows up in other messages.
When a format is missing a comma between two edit descriptors, the
previous token was an integer, and the following item is a repeatable
edit descriptor or a parenthesized group, we emit an error, since it
can't be known where the digits of the integer should be split. But in
the case of a single digit, the situation is not ambiguous, and the
message should be a warning.

Fixes llvm#110261.
…110333)

Fortran INCLUDE lines have (until now) been treated like #include
directives. This isn't how things work with other Fortran compilers when
running under the -E option for preprocessing only, so stop doing it by
default, and add -fpreprocess-include-lines to turn it back on when
desired.
…78510)

This change contains following:
	- adds lowering of printf op to spirv.CL.printf op in GPUToSPIRV pass.
	- Fixes Constant decoration parsing for spirv GlobalVariable.
	- minor modification to spirv.CL.printf op assembly format.

---------

Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>
Alongside something like:
  vpternlogq      zmm0, zmm2, zmm1, 64

We will now have a comment on the right like:
  # zmm0 = zmm0 & zmm2 & ~zmm1

This makes it easy to tell at a glance what sort of truth table the
instruction will provide.
…#110559)

Currently, many attempts to lower loads and stores on buffer fat
pointers lower directly to intrinsic calls that will be unsupported by
or crash codegen (ex, storing a [2 x i32], a <6 x half>, or an i160).

Record the current behavior to make the effects of the fix more visible
in an upcoming PR.
This removes the need for macOS nodes in Buildkite. It also moves to the
proper way of testing backdeployment, which is to actually run on the
target OS itself, instead of using packaged dylibs from previous OS
versions and trying to emulate backdeployment with DYLD_LIBRARY_PATH.

As a drive-by change, also fix a few back-deployment annotations that
were incorrect and add support for minor versions in the Lit feature
determining availability from the target triple.
Move the helper to VPIRBasicBlock to allow easier re-use outside
VPlan.cpp
MALLOC and FREE are extensions provided by gfortran, Intel Fortran and
classic flang to allocate memory for Cray pointers. These are used in
some legacy codes such as libexodus.

All the above compilers accept using MALLOC and FREE with integers as
well, despite that this will often signify a bug in user code. We should
accept the same as the other compilers for compatibility.
There are some spurious libraries which can be removed.

I'm trying to bundle MLIR/LLVM library dependencies for our own
libraries. We're utilizing cmake function to recursively collect
MLIR/LLVM related dependencies. However, we identified certain library
dependencies as redundant and safe for removal.
…0544)

This replaces some of the most frequent offenders of using a DenseMap that
cause a malloc, where the typical element-count is small enough to fit in
an initial stack allocation.

Most of these are fairly obvious, one to highlight is the collectOffset
method of GEP instructions: if there's a GEP, of course it's going to have
at least one offset, but every time we've called collectOffset we end up
calling malloc as well for the DenseMap in the MapVector.
…ns (llvm#110549)

This patch implements a few set operations for the intervals. These
include:
- operator==() and operator!=() for comparing two intervals.
- disjoint()
- intersection()
- difference, which uses operator-()
dtcxzyw and others added 26 commits October 1, 2024 22:16
…icmp spred X, Y` (llvm#110630)

```
icmp spred (mul nsw X, Z), (mul nsw Y, Z) -> icmp spred X, Y iff Z > 0
icmp spred (mul nsw X, Z), (mul nsw Y, Z) -> icmp spred Y, X iff Z < 0
```
Alive2: https://alive2.llvm.org/ce/z/9fXFfn
…m#110108)

Most of PAuth-related code counts the instructions being inserted and
asserts that no more bytes are emitted than the size returned by the
getInstSizeInBytes(MI) method. This check seems useful not only for
PAuth-related instructions. Also, reimplementing it globally in
AArch64AsmPrinter makes it more robust and simplifies further
refactoring of PAuth-related code.
When combining two geps into one by adding the offsets, we have
to take some care when intersecting the flags, because nusw flags
cannot be straightforwardly preserved.

Add a helper for this on GEPNoWrapFlags so we won't have to repeat
this logic in various places.
…lvm#110672)

As a proxy criterion, mesa targets have unaligned-access-mode (which
determines whether the hardware allows unaligned memory accesses) not
set whereas amdhsa targets do. This PR changes tests to use amdhsa
instead of mesa and inserts additional checks with unaligned-access-mode
unset explicitly.

This is in preparation for PR llvm#110219, which will generate different
code depending on the unaligned-access-mode.
…erlapping Def/Use (llvm#109875)

The current RP handling for uses of an MI that overlap with defs is
confusing and unnecessary. Moreover, the lane masks do not accurately
model the liveness behavior of the subregs. This cleans things up a bit
and more accurately models subreg lane liveness by sinking the use
handling into subsent Uses loop.

The effect of this PR is to replace

A. `increaseRegPressure(Reg, LiveAfter, ~LiveAfter & LiveBefore)`

with 

B. `increaseRegPressure(Reg, LiveAfter, LiveBefore)`

Note that A (Defs loop) and B (Uses loop) have different definitions of
LiveBefore

A. `LiveBefore = (LiveAfter & ~DefLanes) | UseLanes`

and 

B. `LiveBefore =  LiveAfter | UseLanes`

Also note, `increaseRegPressure` will exit if `PrevMask` (`LiveAfter`
for both A/B) has any active lanes, thus these calls will only have an
effect if `LiveAfter` is 0.


A. NewMask = ~LiveAfter & ((LiveAfter & ~DefLanes) | UseLanes) => (1 &
UseLanes) => UseLanes = (0 | UseLanes) => (LiveAfter | UseLanes) =
NewMask B.
…lauses (llvm#109809)

This patch updates printing and parsing of operations including clauses
that define entry block arguments to the operation's region. This
impacts `in_reduction`, `map`, `private`, `reduction` and
`task_reduction`.

The proposed representation to be used by all such clauses is the
following:
```
<clause_name>([byref] [@<sym>] %value -> %block_arg [, ...] : <type>[, ...]) {
  ...
}
```

The `byref` tag is only allowed for reduction-like clauses and the
`@<sym>` is required and only allowed for the `private` and
reduction-like clauses. The `map` clause does not accept any of these
two.

This change fixes some currently broken op representations, like
`omp.teams` or `omp.sections` reduction:
```
omp.teams reduction([byref] @<sym> -> %value : <type>) {
^bb0(%block_arg : <type>):
  ...
}
```

Additionally, it addresses some redundancy in the representation of the
previously mentioned cases, as well as e.g. `map` in `omp.target`. The
problem is that the block argument name after the arrow is not checked
in any way, which makes some misleading representations legal:
```mlir
omp.target map_entries(%x -> %arg1, %y -> %arg0, %z -> %doesnt_exist : !llvm.ptr, !llvm.ptr, !llvm.ptr) {
^bb0(%arg0 : !llvm.ptr, %arg1 : !llvm.ptr, %arg2 : !llvm.ptr):
  ...
}
```

In that case, `%x` maps to `%arg0`, contrary to what the representation
states, and `%z` maps to `%arg2`. `%doesnt_exist` is not resolved, so it
would likely cause issues if used anywhere inside of the operation's
region.

The solution implemented in this patch makes it so that values
introduced after the arrow on the representation of these clauses
implicitly define the corresponding entry block arguments, removing the
potential for these problematic representations. This is what is already
implemented for the `private` and `reduction` clauses of `omp.parallel`.

There are a couple of consequences of this change:
- Entry block argument-defining clauses must come at the end of the
operation's representation and in alphabetical order. This is because
they are printed/parsed as part of the region and a standardized
ordering is needed to reliably match op arguments with their
corresponding entry block arguments via the `BlockArgOpenMPOpInterface`.
- We can no longer define per-clause assembly formats to be reused by
all operations that take these clauses, since they must be passed to a
custom printer including the region and arguments of all other entry
block argument-defining clauses. Code duplication and potential for
introducing issues is minimized by providing the generic
`{print,parse}BlockArgRegion` helpers and associated structures.

MLIR and Flang lowering unit tests are updated due to changes in the
order and formatting of impacted operations.
Since no passes compute DependenceAnalysis via the PassManager, there is
no value in preserving it here. Hence, strip the unnecessary dependency
on DependenceAnalysis.
…llvm#110562)

For setScore, the root function is setScoreByInterval with RegInterval
input
For determineWait, the root function is determineWait with RegInterval
input
…m#109810)

This patch updates the `omp.target_data` operation to use the same
formatting as `map` clauses on `omp.target` for `use_device_addr` and
`use_device_ptr`. This is done so the mapping that is being enforced
between op arguments and associated entry block arguments is explicit.

The way it is achieved is by marking these clauses as entry block
argument-defining and adjusting printer/parsers accordingly.

As a result of this change, block arguments for `use_device_addr` come
before those for `use_device_ptr`, which is the opposite of the previous
undocumented situation. Some unit tests are updated based on this
change, in addition to those updated because of the format change.
`Type::getPointerTo()` is to be deprecated & removed soon.
…lvm#109811)

This patch adds general information on the proposed approach to unify
the handling and representation of clauses that define entry block
arguments attached to operations that accept them.
None of these tested the case where the non-frame index operand
was a register.
)

The `omp.section` operation is an outlier in that the block arguments it
has are defined by clauses on the required parent `omp.sections`
operation.

This patch updates the definition of this operation introducing the
`BlockArgOpenMPOpInterface` to simplify the handling and verification of
these block arguments, implemented based on the parent `omp.sections`.
…vm#110573)

Decrease code size of `Intrinsic::getAttributes` function by uniquing
the function and argument attributes separately and using the
`IntrinsicsToAttributesMap` to store argument attribute ID in low 8 bits
and function attribute ID in upper 8 bits.

This reduces the number of cases to handle in the generated switch from
368 to 131, which is ~2.8x reduction in the number of switch cases.

Also eliminate the fixed size array `AS` and `NumAttrs` variable, and
instead call `AttributeList::get` directly from each case, with an
inline array of the <index, AttribueSet> pairs.
…m#109719)

Add support for taking the intersection of two AttributeLists s.t the
result list contains attributes that are valid in the context of both
inputs.

i.e if we have `nonnull align(32) noundef` intersected with `nonnull
align(16) dereferenceable(10)`, the result is `nonnull align(16)`.

Further it handles attributes that are not-droppable. For example
dropping `byval` can change the nature of a callsite/function so its
impossible to correct a correct intersection if its dropped from the
result. i.e `nonnull byval(i64)` intersected with `nonnull` is
invalid.

The motivation for the infrastructure is to enable sinking/hoisting
callsites with differing attributes.
@mgehre-amd mgehre-amd requested a review from cferry-AMD January 10, 2025 15:23
@mgehre-amd mgehre-amd merged commit 44962f7 into bump_to_357c1970 Jan 13, 2025
45 checks passed
@mgehre-amd mgehre-amd deleted the bump_to_afc0557a branch January 13, 2025 08:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.