[AutoBump] Merge with afc0557a (Oct 01) (5) #441

mgehre-amd · 2025-01-10T15:12:25Z

No description provided.

After llvm#109774 MSVC is failing to build LLVM with the error: ``` llvm\lib\Target\RISCV\RISCVInstrInfo.cpp(782): warning C4018: '<': signed/unsigned mismatch ``` The fix is to ensure that the RHS is an unsigned integer.
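C4018 is the warning MSVC emits when a signed value is compared against an unsigned one. The following is a minimal sketch of that kind of fix, not the actual change in RISCVInstrInfo.cpp; the helper name `isBelowLimit` and its parameters are hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper, not taken from RISCVInstrInfo.cpp.
// Writing `Count < Limit` with an unsigned Count and a signed Limit is the
// pattern that triggers C4018. Converting the right-hand side to an unsigned
// type gives both operands the same signedness.
constexpr bool isBelowLimit(uint32_t Count, int32_t Limit) {
  // Assumption: callers guarantee Limit is non-negative, so the cast is lossless.
  return Count < static_cast<uint32_t>(Limit);
}
```

When the right-hand side is a literal, an unsigned suffix (for example `2u`) has the same effect without a cast.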

https://lab.llvm.org/buildbot/#/builders/89/builds/7488

Update the docs to mention that kernel argument preloading is not supported on MI210.

Adding optimality test to `add`, `sub`, `avgCeilU` and `avgFloorU`

llvm#109252) This fixes issue llvm#109250. The issue happens during the `MachineBlockPlacement` pass. The block, whose address was previously not taken, is deemed redundant by the pass and subsequently replaced using `MachineBasicBlock::ReplaceUsesOfBlockWith` in `BranchFolding`. `ReplaceUsesOfBlockWith` only replaces uses in the terminator. However, `expandPostRAPseudo` introduces new block uses when expanding catchrets. These uses do not get replaced, which results in undefined label errors later on. Marking the block address as taken prevents the block from being replaced without its non-terminator uses also being replaced.

…es (llvm#110501) This is a re-application of bc6bd3b which was reverted in f11abac because it broke the Clang pre-commit CI. Original commit message: This patch rewrites the modulemap to have fewer top-level modules. Previously, our modulemap had one top level module for each header in the library, including private headers. This had the well-known problem of making compilation times terrible, in addition to being somewhat against the design principles of Clang modules. This patch provides almost an order of magnitude compilation time improvement when building modularized code (certainly subject to variations). For example, including <ccomplex> without a module cache went from 22.4 seconds to 1.6 seconds, a 14x improvement. To achieve this, one might be tempted to simply put all the headers in a single top-level module. Unfortunately, this doesn't work because libc++ provides C compatibility headers (e.g. stdlib.h) which create cycles when the C Standard Library headers are modularized too. This is especially tricky since base systems are usually not modularized: as far as I know, only Xcode 16 beta contains a modularized SDK that makes this issue visible. To understand it, imagine we have the following setup: // in libc++'s include/c++/v1/module.modulemap module std { header stddef.h header stdlib.h } // in the C library's include/module.modulemap module clib { header stddef.h header stdlib.h } Now, imagine that the C library's <stdlib.h> includes <stddef.h>, perhaps as an implementation detail. When building the `std` module, libc++'s <stdlib.h> header does `#include_next <stdlib.h>` to get the C library's <stdlib.h>, so libc++ depends on the `clib` module. However, remember that the C library's <stdlib.h> header includes <stddef.h> as an implementation detail. Since the header search paths for libc++ are (and must be) before the search paths for the C library, the C library ends up including libc++'s <stddef.h>, which means it depends on the `std` module. 
That's a cycle. To solve this issue, this patch creates one top-level module for each C compatibility header. The rest of the libc++ headers are located in a single top-level `std` module, with two main exceptions. First, the module containing configuration headers (e.g. <__config>) has its own top-level module too, because those headers are included by the C compatibility headers. Second, we create a top-level std_core module that contains several dependency-free utilities used (directly or indirectly) from the __math subdirectory. This is needed because __math pulls in a bunch of stuff, and __math is used from the C compatibility header <math.h>. As a direct benefit of this change, we don't need to generate an artificial __std_clang_module header anymore to provide a monolithic `std` module, since our modulemap does it naturally by construction. A next step after this change would be to look into whether math.h really needs to include the contents of __math, and if so, whether libc++'s math.h truly needs to include the C library's math.h header. Removing either dependency would break this annoying cycle. Thanks to Eric Fiselier for pointing out this approach during a recent meeting. This wasn't viable before some recent refactoring, but wrapping everything (except the C headers) in a large module is by far the simplest and the most effective way of doing this. Fixes llvm#86193

These files all use `strtod` - make sure to include a proper header for this function. Otherwise, building MLIR fails on some systems after the recent commit 1b5f691 which removed inclusion of `<cmath>` and thus broke transitive inclusion of `<stdlib.h>` in these headers.

Fill the regular delay-load IAT with x86_64 delay-load thunks. Similarly to regular imports, create an auxiliary IAT and its copy for ARM64EC calls. These are filled with the same `__impchk_` thunks used for regular imports, which perform an indirect call with `__icall_helper_arm64ec` on the regular delay-load IAT. These auxiliary IATs are exposed via CHPE metadata starting from version 2. The MSVC linker creates one more copy of the auxiliary IAT. `__imp_func` symbols refer to that hidden IAT, while the `#func` thunk performs a call with the public auxiliary IAT. If the public auxiliary IAT is fine for `#func`, it should be fine for calls using the `__imp_func` symbol as well. Therefore, I made `__imp_func` refer to that IAT too.

…nt padding in assignRVVStackObjectOffsets. (llvm#110312) If we know vlen is a multiple of 16, we don't need any alignment padding. I wrote the code so that it would generate the minimum amount of padding if the stack align was 32 or larger or if RVVBitsPerBlock was smaller than half the stack alignment.

… fixed vectors in some cases. (llvm#109232) Copy the same FSUB check from ExpandFNEG to avoid breaking AArch64 and ARM.

Precommit tests for follow-up improvements to Clang's TBAA emission. Also add variants with -pointer-tbaa to tbaa-reference.cpp.

…ues` (llvm#110414) Simplify the nesting structure of "if" checks in `remapValues` and update the code comments. This is what the comments stated in case there is no type converter: ``` // TODO: What we should do here is just set `desiredType` to `origType` // and then handle the necessary type conversions after the conversion // process has finished. Unfortunately a lot of patterns currently rely on // receiving the new operands even if the types change, so we keep the // original behavior here for now until all of the patterns relying on // this get updated. ``` However, without a type converter it is not possible to perform any materializations. Furthermore, the absence of a type converter indicates that the pattern does not care about type legality. Therefore, the current implementation is correct and this TODO can be removed. Note: Patterns that actually require a remapped type to match the original operand type can be equipped with a type converter that maps each type to itself. This TODO is outdated: ``` // TODO: There currently isn't any mechanism to do 1->N type conversion // via the PatternRewriter replacement API, so for now we just ignore it. ``` 1->N type conversions are already possible as part of block signature conversions. It is incorrect to just ignore such cases. However, there is currently no better way to handle 1->N conversions in this function because of infrastructure limitations. This is now clarified in the comments.

Use `moduleOp.getBody()` instead of `moduleOp.getBodyRegion().front()`.

…s. (llvm#110302) As mentioned in llvm#108633, we don't respect the lower bound of assumed-shape arrays if one was specified. It happens in both cases: 1. When the caller has a non-default lower bound and the callee has the default. 2. When the callee has a non-default lower bound and the caller has the default. This PR tries to fix this issue by improving our generation of the lower bound attribute on DICompositeTypeAttr. If we see a lower bound in the declaration, we respect that. Note that the same function is also used for allocatable/pointer variables. We make sure that we get the lower bound from the descriptor in those cases. Please note that DWARF assumes a lower bound of 1, so in many cases we don't need to generate the lower bound. Fixes llvm#108633.

…#109698) After a CONTAINS statement in a program unit, a statement that cannot begin a subprogram will trigger catastrophic error recovery. But the compiler is presently emitting multiple errors for the same location about expected variations of END statements. Emit fewer messages. Fixes llvm#109609.

…lvm#110031) When an INTEGER conversion to a smaller kind overflows in constant folding, report the truncated value so that it makes more sense later if it shows up in other messages.

When a format is missing a comma between two edit descriptors, the previous token was an integer, and the following item is a repeatable edit descriptor or a parenthesized group, we emit an error, since it can't be known where the digits of the integer should be split. But in the case of a single digit, the situation is not ambiguous, and the message should be a warning. Fixes llvm#110261.

…110333) Fortran INCLUDE lines have (until now) been treated like #include directives. This isn't how things work with other Fortran compilers when running under the -E option for preprocessing only, so stop doing it by default, and add -fpreprocess-include-lines to turn it back on when desired.

…78510) This change contains the following:
- Adds lowering of the printf op to the spirv.CL.printf op in the GPUToSPIRV pass.
- Fixes Constant decoration parsing for spirv GlobalVariable.
- Makes a minor modification to the spirv.CL.printf op assembly format.

---------

Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>

Test: AArch64/GlobalISel/irtranslator-subvector.ll Reference: https://llvm.org/docs/LangRef.html#llvm-vector-extract-intrinsic https://llvm.org/docs/LangRef.html#llvm-vector-insert-intrinsic

Alongside something like: vpternlogq zmm0, zmm2, zmm1, 64 We will now have a comment on the right like: # zmm0 = zmm0 & zmm2 & ~zmm1 This makes it easy to tell at a glance what sort of truth table the instruction will provide.
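The decoded comment follows from how the immediate is used: for every bit position, the three operand bits form an index into the 8-bit immediate, with the destination as the most significant index bit. The sketch below models that on 64-bit scalars; the function name `ternlog` is illustrative and is not the disassembler code from this commit.

```cpp
#include <cstdint>

// Bitwise model of VPTERNLOG on one 64-bit lane (illustrative only).
// For each bit position, Index = (a << 2) | (b << 1) | c selects one bit of Imm,
// where a comes from the destination, b from the first source and c from the
// second source.
constexpr uint64_t ternlog(uint64_t A, uint64_t B, uint64_t C, uint8_t Imm) {
  uint64_t Result = 0;
  for (unsigned Bit = 0; Bit < 64; ++Bit) {
    unsigned Index = static_cast<unsigned>((((A >> Bit) & 1) << 2) |
                                           (((B >> Bit) & 1) << 1) |
                                           ((C >> Bit) & 1));
    uint64_t Selected = (Imm >> Index) & 1;
    Result |= Selected << Bit;
  }
  return Result;
}
```

With an immediate of 64 (binary 01000000) only index 6 (a = 1, b = 1, c = 0) is set, so the result is `A & B & ~C`, which matches the `zmm0 & zmm2 & ~zmm1` comment in the example.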

…#110559) Currently, many attempts to lower loads and stores on buffer fat pointers lower directly to intrinsic calls that will be unsupported by or crash codegen (ex, storing a [2 x i32], a <6 x half>, or an i160). Record the current behavior to make the effects of the fix more visible in an upcoming PR.

This removes the need for macOS nodes in Buildkite. It also moves to the proper way of testing backdeployment, which is to actually run on the target OS itself, instead of using packaged dylibs from previous OS versions and trying to emulate backdeployment with DYLD_LIBRARY_PATH. As a drive-by change, also fix a few back-deployment annotations that were incorrect and add support for minor versions in the Lit feature determining availability from the target triple.

Move the helper to VPIRBasicBlock to allow easier re-use outside VPlan.cpp

MALLOC and FREE are extensions provided by gfortran, Intel Fortran and classic flang to allocate memory for Cray pointers. These are used in some legacy codes such as libexodus. All the above compilers also accept MALLOC and FREE with integers, even though this often signifies a bug in user code. We should accept the same as the other compilers for compatibility.

There are some spurious libraries which can be removed. I'm trying to bundle MLIR/LLVM library dependencies for our own libraries. We use a CMake function to recursively collect MLIR/LLVM-related dependencies, and we identified certain library dependencies as redundant and safe to remove.

…0544) This replaces some of the most frequent offenders of using a DenseMap that cause a malloc, where the typical element-count is small enough to fit in an initial stack allocation. Most of these are fairly obvious, one to highlight is the collectOffset method of GEP instructions: if there's a GEP, of course it's going to have at least one offset, but every time we've called collectOffset we end up calling malloc as well for the DenseMap in the MapVector.

…ns (llvm#110549) This patch implements a few set operations for the intervals. These include:
- operator==() and operator!=() for comparing two intervals.
- disjoint()
- intersection()
- difference, which uses operator-()
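To illustrate the semantics of these operations, here is a minimal sketch on closed integer intervals. It assumes a hypothetical `Interval` struct over `int` bounds, not the class from the patch, and it leaves out difference because that operation can produce two pieces.

```cpp
// Hypothetical closed integer interval [Lo, Hi]; Lo > Hi denotes an empty interval.
struct Interval {
  int Lo;
  int Hi;

  constexpr bool empty() const { return Lo > Hi; }

  constexpr bool operator==(const Interval &O) const {
    return Lo == O.Lo && Hi == O.Hi;
  }
  constexpr bool operator!=(const Interval &O) const { return !(*this == O); }

  // Two intervals are disjoint when they share no element.
  constexpr bool disjoint(const Interval &O) const {
    return empty() || O.empty() || Hi < O.Lo || O.Hi < Lo;
  }

  // The intersection keeps the larger lower bound and the smaller upper bound.
  constexpr Interval intersection(const Interval &O) const {
    if (disjoint(O))
      return Interval{1, 0}; // canonical empty interval in this sketch
    return Interval{Lo > O.Lo ? Lo : O.Lo, Hi < O.Hi ? Hi : O.Hi};
  }
};
```

For example, `Interval{1, 5}.intersection(Interval{3, 9})` yields `[3, 5]`, while `Interval{1, 2}` and `Interval{3, 4}` are disjoint.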

…icmp spred X, Y` (llvm#110630) ``` icmp spred (mul nsw X, Z), (mul nsw Y, Z) -> icmp spred X, Y iff Z > 0 icmp spred (mul nsw X, Z), (mul nsw Y, Z) -> icmp spred Y, X iff Z < 0 ``` Alive2: https://alive2.llvm.org/ce/z/9fXFfn
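The fold can be sanity-checked by brute force for the signed less-than predicate. The sketch below is an illustration rather than the InstCombine code or the Alive2 proof; it restricts the inputs to a small range so that the products cannot overflow, which stands in for the `nsw` precondition. The function name is hypothetical.

```cpp
// Exhaustively checks, over [Lo, Hi]^3, that
//   (X * Z) < (Y * Z)  ==  (X < Y)  when Z > 0
//   (X * Z) < (Y * Z)  ==  (Y < X)  when Z < 0
// Assumption: the range is small enough that X * Z and Y * Z never overflow int.
constexpr bool foldHoldsForSlt(int Lo, int Hi) {
  for (int X = Lo; X <= Hi; ++X)
    for (int Y = Lo; Y <= Hi; ++Y)
      for (int Z = Lo; Z <= Hi; ++Z) {
        if (Z == 0)
          continue; // the fold makes no claim for Z == 0
        bool Original = (X * Z) < (Y * Z);
        bool Folded = Z > 0 ? (X < Y) : (Y < X);
        if (Original != Folded)
          return false;
      }
  return true;
}
```

The operand swap for negative `Z` reflects that multiplying both sides of an inequality by a negative number reverses it.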

…m#110108) Most PAuth-related code counts the instructions being inserted and asserts that no more bytes are emitted than the size returned by the getInstSizeInBytes(MI) method. This check seems useful not only for PAuth-related instructions. Also, reimplementing it globally in AArch64AsmPrinter makes it more robust and simplifies further refactoring of PAuth-related code.

When combining two geps into one by adding the offsets, we have to take some care when intersecting the flags, because nusw flags cannot be straightforwardly preserved. Add a helper for this on GEPNoWrapFlags so we won't have to repeat this logic in various places.

…lvm#110672) As a proxy criterion, mesa targets have unaligned-access-mode (which determines whether the hardware allows unaligned memory accesses) not set whereas amdhsa targets do. This PR changes tests to use amdhsa instead of mesa and inserts additional checks with unaligned-access-mode unset explicitly. This is in preparation for PR llvm#110219, which will generate different code depending on the unaligned-access-mode.

…erlapping Def/Use (llvm#109875) The current RP handling for uses of an MI that overlap with defs is confusing and unnecessary. Moreover, the lane masks do not accurately model the liveness behavior of the subregs. This cleans things up a bit and more accurately models subreg lane liveness by sinking the use handling into the subsequent Uses loop. The effect of this PR is to replace A. `increaseRegPressure(Reg, LiveAfter, ~LiveAfter & LiveBefore)` with B. `increaseRegPressure(Reg, LiveAfter, LiveBefore)` Note that A (Defs loop) and B (Uses loop) have different definitions of LiveBefore: A. `LiveBefore = (LiveAfter & ~DefLanes) | UseLanes` and B. `LiveBefore = LiveAfter | UseLanes` Also note, `increaseRegPressure` will exit if `PrevMask` (`LiveAfter` for both A/B) has any active lanes, thus these calls will only have an effect if `LiveAfter` is 0. A. NewMask = ~LiveAfter & ((LiveAfter & ~DefLanes) | UseLanes) => (1 & UseLanes) => UseLanes = (0 | UseLanes) => (LiveAfter | UseLanes) = NewMask B.

…lauses (llvm#109809) This patch updates printing and parsing of operations including clauses that define entry block arguments to the operation's region. This impacts `in_reduction`, `map`, `private`, `reduction` and `task_reduction`. The proposed representation to be used by all such clauses is the following: ``` <clause_name>([byref] [@<sym>] %value -> %block_arg [, ...] : <type>[, ...]) { ... } ``` The `byref` tag is only allowed for reduction-like clauses and the `@<sym>` is required and only allowed for the `private` and reduction-like clauses. The `map` clause does not accept any of these two. This change fixes some currently broken op representations, like `omp.teams` or `omp.sections` reduction: ``` omp.teams reduction([byref] @<sym> -> %value : <type>) { ^bb0(%block_arg : <type>): ... } ``` Additionally, it addresses some redundancy in the representation of the previously mentioned cases, as well as e.g. `map` in `omp.target`. The problem is that the block argument name after the arrow is not checked in any way, which makes some misleading representations legal: ```mlir omp.target map_entries(%x -> %arg1, %y -> %arg0, %z -> %doesnt_exist : !llvm.ptr, !llvm.ptr, !llvm.ptr) { ^bb0(%arg0 : !llvm.ptr, %arg1 : !llvm.ptr, %arg2 : !llvm.ptr): ... } ``` In that case, `%x` maps to `%arg0`, contrary to what the representation states, and `%z` maps to `%arg2`. `%doesnt_exist` is not resolved, so it would likely cause issues if used anywhere inside of the operation's region. The solution implemented in this patch makes it so that values introduced after the arrow on the representation of these clauses implicitly define the corresponding entry block arguments, removing the potential for these problematic representations. This is what is already implemented for the `private` and `reduction` clauses of `omp.parallel`. 
There are a couple of consequences of this change: - Entry block argument-defining clauses must come at the end of the operation's representation and in alphabetical order. This is because they are printed/parsed as part of the region and a standardized ordering is needed to reliably match op arguments with their corresponding entry block arguments via the `BlockArgOpenMPOpInterface`. - We can no longer define per-clause assembly formats to be reused by all operations that take these clauses, since they must be passed to a custom printer including the region and arguments of all other entry block argument-defining clauses. Code duplication and potential for introducing issues is minimized by providing the generic `{print,parse}BlockArgRegion` helpers and associated structures. MLIR and Flang lowering unit tests are updated due to changes in the order and formatting of impacted operations.

Since no passes compute DependenceAnalysis via the PassManager, there is no value in preserving it here. Hence, strip the unnecessary dependency on DependenceAnalysis.

…llvm#110562) For setScore, the root function is setScoreByInterval with RegInterval input. For determineWait, the root function is determineWait with RegInterval input.

…m#109810) This patch updates the `omp.target_data` operation to use the same formatting as `map` clauses on `omp.target` for `use_device_addr` and `use_device_ptr`. This is done so the mapping that is being enforced between op arguments and associated entry block arguments is explicit. The way it is achieved is by marking these clauses as entry block argument-defining and adjusting printer/parsers accordingly. As a result of this change, block arguments for `use_device_addr` come before those for `use_device_ptr`, which is the opposite of the previous undocumented situation. Some unit tests are updated based on this change, in addition to those updated because of the format change.

`Type::getPointerTo()` is to be deprecated & removed soon.

…lvm#110067) llvm#60481

…lvm#109811) This patch adds general information on the proposed approach to unify the handling and representation of clauses that define entry block arguments attached to operations that accept them.

None of these tested the case where the non-frame index operand was a register.

) The `omp.section` operation is an outlier in that the block arguments it has are defined by clauses on the required parent `omp.sections` operation. This patch updates the definition of this operation introducing the `BlockArgOpenMPOpInterface` to simplify the handling and verification of these block arguments, implemented based on the parent `omp.sections`.

…lvm#109028)

…vm#110573) Decrease code size of `Intrinsic::getAttributes` function by uniquing the function and argument attributes separately and using the `IntrinsicsToAttributesMap` to store argument attribute ID in low 8 bits and function attribute ID in upper 8 bits. This reduces the number of cases to handle in the generated switch from 368 to 131, which is a ~2.8x reduction in the number of switch cases. Also eliminate the fixed size array `AS` and `NumAttrs` variable, and instead call `AttributeList::get` directly from each case, with an inline array of the <index, AttributeSet> pairs.
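The packing described above can be sketched as follows. The helper names are hypothetical and the generated table in the patch may be laid out differently; the sketch only shows two 8-bit IDs sharing one 16-bit entry.

```cpp
#include <cstdint>

// Hypothetical helpers, not the generated code from the patch.
// Low 8 bits: argument attribute ID. Upper 8 bits: function attribute ID.
constexpr uint16_t packAttrIDs(uint8_t ArgAttrID, uint8_t FnAttrID) {
  return static_cast<uint16_t>((FnAttrID << 8) | ArgAttrID);
}

constexpr uint8_t argAttrID(uint16_t Packed) {
  return static_cast<uint8_t>(Packed & 0xFF);
}

constexpr uint8_t fnAttrID(uint16_t Packed) {
  return static_cast<uint8_t>(Packed >> 8);
}
```

Because the two IDs are looked up independently, the generated switch only has to cover the distinct argument attribute sets rather than every combination with a function attribute set.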

…m#109719) Add support for taking the intersection of two AttributeLists such that the result list contains attributes that are valid in the context of both inputs. For example, if we have `nonnull align(32) noundef` intersected with `nonnull align(16) dereferenceable(10)`, the result is `nonnull align(16)`. Further, it handles attributes that are not droppable. For example, dropping `byval` can change the nature of a callsite/function, so it's impossible to compute a correct intersection if it's dropped from the result. For instance, `nonnull byval(i64)` intersected with `nonnull` is invalid. The motivation for the infrastructure is to enable sinking/hoisting callsites with differing attributes.
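The rules in this message can be modelled on a simplified attribute record. This is a sketch under stated assumptions, not the `AttributeList` API: `ParamAttrs`, `IntersectResult` and `intersect` are hypothetical names, only a few attributes are modelled (`dereferenceable` is left out), and `byval` stands in for any non-droppable attribute.

```cpp
// Simplified, hypothetical model of a parameter's attributes.
struct ParamAttrs {
  bool NonNull = false;
  bool NoUndef = false;
  unsigned Align = 0; // 0 means no align attribute is present
  bool ByVal = false; // stands in for a non-droppable attribute
};

struct IntersectResult {
  bool Valid;       // false when no correct intersection exists
  ParamAttrs Attrs; // meaningful only when Valid is true
};

constexpr IntersectResult intersect(const ParamAttrs &A, const ParamAttrs &B) {
  // A non-droppable attribute present on only one side makes the result invalid.
  if (A.ByVal != B.ByVal)
    return {false, {}};
  ParamAttrs R{};
  // Droppable flags survive only if both inputs carry them.
  R.NonNull = A.NonNull && B.NonNull;
  R.NoUndef = A.NoUndef && B.NoUndef;
  // Alignment keeps the weaker (smaller) guarantee when both sides have one.
  R.Align = (A.Align != 0 && B.Align != 0) ? (A.Align < B.Align ? A.Align : B.Align) : 0;
  R.ByVal = A.ByVal;
  return {true, R};
}
```

Intersecting `nonnull align(32) noundef` with `nonnull align(16)` in this model gives `nonnull align(16)`, and intersecting `nonnull byval` with `nonnull` is reported as invalid, matching the two examples in the message.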

dpaoliello and others added 30 commits September 30, 2024 10:40

[flang][cuda] Fix buildbot failure (llvm#110540)

3e5e48a

https://lab.llvm.org/buildbot/#/builders/89/builds/7488

[docs][amdgpu] Update kernarg documentation for gfx90a (llvm#109690)

5d45815

Update the docs to mention that kernel argument preloading is not supported on MI210.

[clang][bytecode] Implement ia32_bextr builtins (llvm#110513)

8518178

[KnownBitsTest] Add optimality tests to some optimal impls; NFC

bbdca53

Adding optimality test to `add`, `sub`, `avgCeilU` and `avgFloorU`

[SandboxIR][NFC] Rename SandboxIRValues.def to Values.def (llvm#110538)

9e85937

[LegalizeVectorOps] Enable ExpandFABS/COPYSIGN to use integer ops for…

92a8b81

… fixed vectors in some cases. (llvm#109232) Copy the same FSUB check from ExpandFNEG to avoid breaking AArch64 and ARM.

[TBAA] Add tests with pointers to structs to tbaa-pointers.c.

df3f291

Precommit tests for follow-up improvements to Clang's TBAA emission. Also add variants with -pointer-tbaa to tbaa-reference.cpp.

[mlir][NFC] Minor cleanup around ModuleOp usage (llvm#110498)

49df12c

Use `moduleOp.getBody()` instead of `moduleOp.getBodyRegion().front()`.

[flang] Improve error messages about overflowed integer conversions (l…

1759f3b

…lvm#110031) When an INTEGER conversion to a smaller kind overflows in constant folding, report the truncated value so that it makes more sense later if it shows up in other messages.

[GlobalISel] Import extract/insert subvector (llvm#110287)

53943de

Test: AArch64/GlobalISel/irtranslator-subvector.ll Reference: https://llvm.org/docs/LangRef.html#llvm-vector-extract-intrinsic https://llvm.org/docs/LangRef.html#llvm-vector-insert-intrinsic

[X86] Decode VPTERNLOG truth tables when disassembling

6c5277b

Alongside something like: vpternlogq zmm0, zmm2, zmm1, 64 We will now have a comment on the right like: # zmm0 = zmm0 & zmm2 & ~zmm1 This makes it easy to tell at a glance what sort of truth table the instruction will provide.

[VPlan] Move createVPIRBasicBlock helper to VPIRBasicBlock (NFC).

725eb6b

Move the helper to VPIRBasicBlock to allow easier re-use outside VPlan.cpp

dtcxzyw and others added 26 commits October 1, 2024 22:16

[gn build] Port 2469d7e

616d1d2

[DebugInfo] Avoid repeated hash lookups (NFC) (llvm#110620)

f01d45c

[ExecutionEngine] Avoid repeated hash lookups (NFC) (llvm#110621)

ede866d

AMDGPU: Mark scc dead when materialized frame base registers

8395b3f

AMDGPU: Fix executable permissions on file

dc98482

[clang][bytecode] Check GetPtrBase ops for null pointers (llvm#110673)

55c70f6

[clang][bytecode] Implement ia32_{pdep,pext} builtins (llvm#110675)

f3baa73

LoopSimplify: strip dependency on DA (NFC) (llvm#107379)

9f6f6af

Since no passes compute DependenceAnalysis via the PassManager, there is no value in preserving it here. Hence, strip the unnecessary dependency on DependenceAnalysis.

[AMDGPU] Refactor several functions for merging with downstream work. (…

c66dee4

…llvm#110562) For setScore, the root function is setScoreByInterval with RegInterval input. For determineWait, the root function is determineWait with RegInterval input.

[llvm][OMPIRBuilder] Avoid Type::getPointerTo() (NFC) (llvm#110678)

d071fda

`Type::getPointerTo()` is to be deprecated & removed soon.

[libc][stdio] Use proxy headers of stdio.h in src and test folders. (l…

c63112a

…lvm#110067) llvm#60481

[MLIR][OpenMP] Document entry block argument-defining clauses (NFC) (l…

4e52e6a

…lvm#109811) This patch adds general information on the proposed approach to unify the handling and representation of clauses that define entry block arguments attached to operations that accept them.

AMDGPU: Add missing tests for local stack alloc s_add_i32 handling

f61abee

None of these tested the case where the non-frame index operand was a register.

[libc++] Remove potential 0-sized array in __compressed_pair_padding (l…

0eb2602

…lvm#109028)

[SLP][NFC]Add a test with external cast and extracted operand, NFC

0dab022

[AutoBump] Merge with afc0557 (Oct 01)

a3b7199

mgehre-amd requested a review from cferry-AMD January 10, 2025 15:23

cferry-AMD approved these changes Jan 13, 2025


mgehre-amd merged commit 44962f7 into bump_to_357c1970 Jan 13, 2025
45 checks passed

mgehre-amd deleted the bump_to_afc0557a branch January 13, 2025 08:21


86 participants
