[AutoBump] Merge with 0195ec45 (Jan 15) (46) #517

jorickert · 2025-03-19T15:30:20Z

No description provided.

Add constant-folding for nvvm float/double fmin + fmax intrinsics, including all combinations of xorsign.abs, nan-propagation, and ftz.

…lvm#122471) This enables delayed privatization by default for `omp.wsloop` ops, with one caveat! I had to work around the "impure" alloc region issue that is being resolved at the moment. The workaround detects whether the alloc region's argument is used in the region and at the same time defined in a block that does not dominate the chosen alloca insertion point. If so, we move the alloca insertion point below the defining instruction of the alloc region argument. This basically reverts to the non-delayed-privatization behavior.

…EL to ACLE Q3 (llvm#123056)

SM80 has fma for bfloat16 but not add/mul/sub. Currently these ops incur a promotion to f32, but we can avoid this by writing them in terms of the fma:

```
FADD(a, b) -> FMA(a, 1.0, b)
FMUL(a, b) -> FMA(a, b, -0.0)
FSUB(a, b) -> FMA(b, -1.0, a)
```

Unfortunately there is no `fma.ftz` so when ftz is enabled, we still fall back to promotion.
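The three rewrites can be checked on the host with a small sketch. It uses `float` and `std::fma` as stand-ins for bfloat16 and the SM80 instruction, and the helper names are hypothetical, so it only illustrates the arithmetic identities, not the NVPTX lowering itself.

```cpp
#include <cassert>
#include <cmath>

// Illustrative stand-ins using float; the commit itself targets bfloat16 on SM80.

// a + b expressed as a * 1.0 + b.
inline float fadd_via_fma(float a, float b) { return std::fma(a, 1.0f, b); }

// a * b expressed as a * b + (-0.0). Adding -0.0 rather than +0.0 keeps a
// negative-zero product negative, so the sign of zero results is preserved.
inline float fmul_via_fma(float a, float b) { return std::fma(a, b, -0.0f); }

// a - b expressed as b * (-1.0) + a.
inline float fsub_via_fma(float a, float b) { return std::fma(b, -1.0f, a); }
```

The choice of `-0.0` as the addend in the multiply case matters only for zero products: under round-to-nearest, `(-0.0) + (+0.0)` would give `+0.0` and lose the sign.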

For .wv widening instructions, when checking if the operand is vs1 or vs2, we take into account whether or not it has a passthru. For tied pseudos though their passthru is the vs2, and we weren't taking this into account.

…ed in CallLowering. (llvm#122853) For "returned" attribute arguments, the physical register is really a virtual register which shouldn't be stored in an MCRegister. This patch moves the conversion from Register to MCRegister into the derived classes of IncomingArgHandler. The derived class ReturnedArgCallReturnHandler does not use the register so no MCRegister is created in that case. The function and argument have been renamed to remove "Phys".

Checking the remark message if interchange did or didn't happen is more straightforward than the full IR for these cases. This comment was also made when I moved some tests away from relying on debug builds in change llvm#116780, and this is a prep step for llvm#119345 that is going to change these test cases.

…python_extension` (llvm#122865) This PR allows users to specify the `NB_DOMAIN` for `add_mlir_python_extension`. This lets users avoid nanobind domain conflicts when Python bindings from multiple `mlir` projects are imported. (https://nanobind.readthedocs.io/en/latest/faq.html#how-can-i-avoid-conflicts-with-other-projects-using-nanobind)

…deprecation (llvm#123118) The release note did not clearly mention that std::uncaught_exception had been removed in C++20.

…116147) These functions weren't added until API 26 (Android 8.0), but libc++ is supported for API 21 and up. These APIs are undeclared as of r.android.com/3216959.

llvm#108961) Missing information about begin and end pointers of std::vector can lead to missed optimizations in LLVM. This patch adds alignment assumptions at the point where the begin and end pointers are loaded. If the pointers would not have the same alignment, end might never get hit when incrementing begin. See llvm#101372 for a discussion of missed range check optimizations in hardened mode. Once llvm#108958 lands, the created `llvm.assume` calls for the alignment should be folded into the `load` instructions, resulting in no extra instructions after InstCombine. Co-authored-by: Louis Dionne <ldionne.2@gmail.com>
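A minimal sketch of what an alignment assumption on a begin/end pair looks like, assuming a GCC- or Clang-compatible compiler. The function `sum_range` is hypothetical and is not the libc++ change; it only shows the kind of hint the patch describes.

```cpp
#include <cassert>

// Sums [begin, end) after telling the optimizer that both pointers are
// aligned for int. __builtin_assume_aligned is a GCC/Clang builtin; this is
// an illustration only, not how std::vector is implemented in libc++.
inline long sum_range(const int* begin, const int* end) {
  begin = static_cast<const int*>(__builtin_assume_aligned(begin, alignof(int)));
  end = static_cast<const int*>(__builtin_assume_aligned(end, alignof(int)));
  long total = 0;
  // With equal alignment known, stepping begin by sizeof(int) can reach end.
  for (; begin != end; ++begin) total += *begin;
  return total;
}
```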

This restores the functionality of AsmPrinterHandlers to what it was prior to llvm#96785. The attempted hack there of adding a duplicate DebugHandlerBase handling added a lot of hidden state and assumptions, which just segfaulted when we tried to continue using this API. Instead, this just goes back to the old design, but adds a separate array for the basic EH handles. The duplicate array is identical to the other array of handlers, except that its handlers don't get their begin/endInstruction callbacks called. This still saves the negligible but measurable amount of virtual function calls as was the goal of llvm#96785, while restoring the API to the pre-LLVM-19 status quo.

) We have two copies of the same code in clang-tidy and clang-reorder-fields, and those are extremely similar to `Lexer::findNextToken`, so just add an extra argument to the latter. --------- Co-authored-by: cor3ntin <corentinjabot@gmail.com>

…llvm#121806) Since there are no opcodes for atomic loads and stores, compared to SelectionDAG, we add a `CheckMMOIsNonAtomic` predicate immediately after the opcode predicate to make a logical combination of them. Otherwise, when `IPM_AtomicOrderingMMO` is inserted after `IPM_GenericPredicate`, the patterns without predicates get a higher priority, as `IPM_AtomicOrderingMMO` has higher priority than `IPM_GenericPredicate`. This is important to preserve the order of aligned/unaligned patterns on X86, because aligned memory operations have an additional alignment predicate and should be checked first according to their placement in the td file. Closes llvm#121446

…lvm#123197) The svluti4_lane intrinsic currently requires the tuple size to be specified in the intrinsic name when using a tuple type input. According to the ACLE specification, the svluti4_lane intrinsic with a tuple type input, such as: svint16_t svluti4_lane[_s16_x2](svint16x2_t table, svuint8_t indices, uint64_t imm_idx); should allow the tuple size of the input type to be optional.

Add Apple M4 host detection, which fixes rust-lang/rust#133414. Also add support for older ARM families (this is likely never going to get used, since only macOS is officially supported as host OS, but nice to have for completeness' sake). Error handling (checking `CPUFAMILY_UNKNOWN`) is also included here. Finally, add links to extra documentation to make it easier for others to update this in the future. NOTE: These values are taken from `mach/machine.h` in the Xcode 16.2 SDK, and have been confirmed on an M4 Max in rust-lang/rust#133414 (comment).

… -1, MASK). (llvm#123115) Co-authored-by: Brandon Wu <brandon.wu@sifive.com>

…23109) Some of this was needed to fix implicit conversions from MCRegister to unsigned when calling getReg() on MCOperand, for example. The majority was done by reviewing parts of the code that dealt with registers, converting them to MCRegister, and then seeing what new implicit conversions were created and fixing those. There were a few places where I used MCPhysReg instead of MCRegister for static arrays since it's uint16_t instead of unsigned.

Note that PointerUnion::dyn_cast has been soft deprecated in PointerUnion.h:

  // FIXME: Replace the uses of is(), get() and dyn_cast() with
  // isa<T>, cast<T> and the llvm::dyn_cast<T>

Literal migration would result in dyn_cast_if_present (see the definition of PointerUnion::dyn_cast), but this patch uses dyn_cast because we expect IntegerType to be nonnull.

Note that PointerUnion::dyn_cast has been soft deprecated in PointerUnion.h:

  // FIXME: Replace the uses of is(), get() and dyn_cast() with
  // isa<T>, cast<T> and the llvm::dyn_cast<T>

Literal migration would result in dyn_cast_if_present (see the definition of PointerUnion::dyn_cast), but this patch uses dyn_cast because we expect Data to be nonnull.

Note that PointerUnion::dyn_cast has been soft deprecated in PointerUnion.h:

  // FIXME: Replace the uses of is(), get() and dyn_cast() with
  // isa<T>, cast<T> and the llvm::dyn_cast<T>

Literal migration would result in dyn_cast_if_present (see the definition of PointerUnion::dyn_cast), but this patch uses dyn_cast because we expect AP to be nonnull.

A canonicalized pack indexing should refer to a canonicalized pattern Fixes llvm#123033

… DSA list" (llvm#123220) Reverts llvm#121028 Reverting due to CI failure (https://lab.llvm.org/buildbot/#/builders/89/builds/14474)

This patch uses new FMF interfaces introduced by llvm#121657 to simplify existing code with `andIRFlags` and `copyFastMathFlags`.

xxHash, inferior to xxh3, is discouraged. We try not to use xxhash in lld. Switch to read32le for content hash and xxh3/stable_hash_combine for relocation hash. Remove the intermediate std::string for relocation hash. Change the tail hashing scheme to consider individual bytes instead. This helps group 0102 and 0201 together. The benefit is negligible, though. Pull Request: llvm#121729

…e(VAL, NEW_ADDR, -1, MASK) (llvm#123123) Co-authored-by: Brandon Wu <brandon.wu@sifive.com>

) Because `c_devptr` has a `c_ptr` field, any assignment was done via the Assign runtime function. This leads to stack overflow on the device and excessive memory use. As we know the c_devptr can be directly copied on assignment, make it a special case.

If the operands of a CmpInst are constants then it gets folded into a constant. Therefore CmpInst::create() should return a Value*, not a Constant* and should handle the creation of the constant correctly.

This patch implements a helper ShuffleMask data structure that helps describe shuffles of elements across lanes.

Some tools (e.g. Rust tooling) produce element segment descriptors with neither elemkind nor element type descriptors, but with init exprs instead of func indices (this is with the flags value of 4 in https://webassembly.github.io/spec/core/binary/modules.html#element-section). LLVM doesn't fully model reference types or the various ways to initialize element segments, but we do want to correctly parse and skip over all type sections, so this change updates the object parser to handle that case, and refactors for more clarity. The test file is updated to include one additional elem segment with a flags value of 4, an initializer value of (i32.const 0) and an empty vector. Also support parsing files that export imported (undefined) functions.
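To make the flags value of 4 concrete, here is a rough sketch of how the three flag bits of an element segment map onto the fields that follow, based on the element section of the WebAssembly binary format. The struct and function names are hypothetical and are not taken from LLVM's object parser.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper; field names are illustrative, not LLVM's object parser.
struct ElemSegmentLayout {
  bool is_active;            // segment is copied into a table at instantiation
  bool has_table_index;      // an explicit table index precedes the offset
  bool has_offset_expr;      // active segments carry an offset expression
  bool has_type_descriptor;  // an elemkind or reftype byte is present
  bool uses_init_exprs;      // items are init exprs rather than func indices
};

inline ElemSegmentLayout describe_elem_flags(std::uint32_t flags) {
  assert(flags < 8 && "the element section defines flags 0 through 7");
  ElemSegmentLayout layout;
  // Bit 0 set means passive or declarative, so clear means active.
  layout.is_active = (flags & 0x1) == 0;
  // Bit 1 on an active segment means an explicit table index is encoded.
  layout.has_table_index = layout.is_active && (flags & 0x2) != 0;
  layout.has_offset_expr = layout.is_active;
  // Only flags 0 and 4 omit the elemkind/reftype descriptor.
  layout.has_type_descriptor = (flags & 0x3) != 0;
  // Bit 2 selects a vector of init expressions over function indices.
  layout.uses_init_exprs = (flags & 0x4) != 0;
  return layout;
}
```

For `flags == 4` this yields an active segment with an offset expression, no type descriptor, and init exprs as items, which is the case the commit message describes.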

llvm#123229) print-after-all is useful for diffing IR between two passes. When one of the two is a function pass, and the other is a loop pass, the diff becomes useless. Add an option which prints the entire function for loop passes.

…lvm#122443) Add TreeEntry::hasState. Add assert for getTreeEntry. Remove the OpValue parameter from the canReuseExtract function. Remove the Opcode parameter from the ComputeMaxBitWidth lambda function.

…#123415) Given the comment, I'd expected test coverage. There was none so let's do the simple thing which benefits the one thing we have tests for.

…23279) Fixes llvm#123179.

The test checks for a specific compiler version to determine the output. However, the compiler version string is always set to 15.0.0 for our local build. Remove this check and use a regex match instead.

## Test Plan

```
./bin/llvm-lit -sva /home/wanyi/llvm-sand/external/llvm-project/lldb/test/API/commands/expression/import-std-module/vector-of-vectors/TestVectorOfVectorsFromStdModule.py
...
Skipping the following test categories: ['dsym', 'gmodules', 'debugserver', 'objc']
--
Command Output (stderr):
--
UNSUPPORTED: LLDB (/home/wanyi/llvm-sand/build/Release+Distribution/fbcode-x86_64/toolchain/bin/clang-x86_64) :: test_dsym (TestVectorOfVectorsFromStdModule.TestVectorOfVectors) (test case does not fall in any category of interest for this run)
PASS: LLDB (/home/wanyi/llvm-sand/build/Release+Distribution/fbcode-x86_64/toolchain/bin/clang-x86_64) :: test_dwarf (TestVectorOfVectorsFromStdModule.TestVectorOfVectors)
PASS: LLDB (/home/wanyi/llvm-sand/build/Release+Distribution/fbcode-x86_64/toolchain/bin/clang-x86_64) :: test_dwo (TestVectorOfVectorsFromStdModule.TestVectorOfVectors)
----------------------------------------------------------------------
Ran 3 tests in 4.636s

OK (skipped=1)

--

********************
Testing Time: 4.97s

Total Discovered Tests: 1
  Passed: 1 (100.00%)
```

…lvm#122282) Fixes llvm#106228.

… runner (llvm#122920)" Revert as this caused LIT test to fail, due to some passes not being registered This reverts commit 7402521.

[AutoBump] Merge with eff6b64 (Jan 17) (50)

[AutoBump] Merge with 1181921 (Jan 17) (48)

[AutoBump] Merge with fixes of 0bd0765 (Jan 17) (49) [Only tested MLIR]

[AutoBump] Merge with e240261 (Jan 17) (52)

[AutoBump] Merge with fixes of d28a4f1 (Jan 17) (51) [Only tested MLIR]

[AutoBump] Merge with fixes of f9a8006 (Jan 15) (47) [Only tested MLIR]

LewisCrawford and others added 30 commits January 16, 2025 14:38

[NVPTX] Constant fold NVVM fmin and fmax (llvm#121966)

cea9244

Add constant-folding for nvvm float/double fmin + fmax intrinsics, including all combinations of xorsign.abs, nan-propagation, and ftz.
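As a rough host-side illustration of what folding such an intrinsic involves, the sketch below evaluates an fmin with the three modifiers. All names are hypothetical and this is not the code added by the commit. The modifier semantics are an assumed reading of the PTX `min` instruction: ftz flushes subnormals to sign-preserving zero, NaN propagation returns NaN if either input is NaN, and xorsign.abs takes the minimum of the magnitudes with the XOR of the input signs. The interaction between NaN inputs and xorsign.abs is not modelled.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical option bundle; the names are illustrative, not LLVM's API.
struct FMinOpts {
  bool ftz = false;            // flush subnormal inputs and results to signed zero
  bool propagate_nan = false;  // result is NaN if either input is NaN
  bool xorsign_abs = false;    // sign = XOR of input signs, magnitude = min(|a|, |b|)
};

inline float flush_subnormal(float x) {
  return std::fpclassify(x) == FP_SUBNORMAL ? std::copysign(0.0f, x) : x;
}

inline float fold_fmin(float a, float b, FMinOpts opts = {}) {
  if (opts.ftz) {
    a = flush_subnormal(a);
    b = flush_subnormal(b);
  }
  if (std::isnan(a) || std::isnan(b)) {
    if (opts.propagate_nan || (std::isnan(a) && std::isnan(b)))
      return std::numeric_limits<float>::quiet_NaN();
    // Without NaN propagation, the non-NaN operand wins.
    return std::isnan(a) ? b : a;
  }
  float result;
  if (opts.xorsign_abs) {
    const bool negative = std::signbit(a) != std::signbit(b);
    const float magnitude = std::fmin(std::fabs(a), std::fabs(b));
    result = negative ? -magnitude : magnitude;
  } else {
    result = std::fmin(a, b);
  }
  return opts.ftz ? flush_subnormal(result) : result;
}
```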

[FMV][AArch64][clang] Advance __FUNCTION_MULTI_VERSIONING_SUPPORT_LEV…

9033e0c

…EL to ACLE Q3 (llvm#123056)

[gn build] Port 2c75bda

25e5eb1

[gn build] Port 8fb29ba

da5ec78

[libc++] Clarify the release note for uncaught_exception removal and …

df3ba91

…deprecation (llvm#123118) The release note did not clearly mention that std::uncaught_exception had been removed in C++20.

[libc++][Android] XFAIL some tests for mblen/towctrans/wctrans (llvm#…

c281b12

…116147) These functions weren't added until API 26 (Android 8.0), but libc++ is supported for API 21 and up. These APIs are undeclared as of r.android.com/3216959.

[RISCV] Fold vp.reverse(vp.load(ADDR, MASK)) -> vp.strided.load(ADDR,…

fc7a1ed

… -1, MASK). (llvm#123115) Co-authored-by: Brandon Wu <brandon.wu@sifive.com>

[Analysis] Avoid repeated hash lookups (NFC) (llvm#123159)

5fa989b

[CodeGen] Avoid repeated hash lookups (NFC) (llvm#123160)

09bf5b0

[Clang] Fix canonicalization of pack indexing types (llvm#123209)

b311ab0

A canonicalized pack indexing should refer to a canonicalized pattern Fixes llvm#123033

Revert "[Flang OpenMP] Add semantics checks for cray pointer usage in…

ebc7efb

… DSA list" (llvm#123220) Reverts llvm#121028 Reverting due to CI failure (https://lab.llvm.org/buildbot/#/builders/89/builds/14474)

[FileCheck] Remove unneeded unique_ptr. NFC. (llvm#123216)

c10e826

[InstCombine] Simplify FMF propagation. NFC. (llvm#121899)

94fee13

This patch uses new FMF interfaces introduced by llvm#121657 to simplify existing code with `andIRFlags` and `copyFastMathFlags`.

topperc and others added 14 commits January 17, 2025 14:22

[RISCV] Fold vp.store(vp.reverse(VAL), ADDR, MASK) -> vp.strided.stor…

0c6e03e

…e(VAL, NEW_ADDR, -1, MASK) (llvm#123123) Co-authored-by: Brandon Wu <brandon.wu@sifive.com>
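The fold can be sanity-checked with a scalar model. The sketch below assumes an all-true mask, a stride counted in elements rather than bytes, and a `NEW_ADDR` that points at the last element of the destination; these are assumptions made for illustration, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference form: reverse the value vector, then store it contiguously at addr.
inline void store_reversed(const std::vector<int>& val, int* addr) {
  for (std::size_t i = 0; i < val.size(); ++i)
    addr[i] = val[val.size() - 1 - i];
}

// Folded form: store val unreversed, starting at new_addr and stepping by
// `stride` elements per lane (negative strides walk backwards).
inline void strided_store(const std::vector<int>& val, int* new_addr,
                          std::ptrdiff_t stride) {
  for (std::size_t i = 0; i < val.size(); ++i)
    new_addr[static_cast<std::ptrdiff_t>(i) * stride] = val[i];
}

// Returns true when both forms leave the same memory image behind.
inline bool fold_is_equivalent(const std::vector<int>& val) {
  if (val.empty()) return true;
  std::vector<int> expected(val.size(), 0), actual(val.size(), 0);
  store_reversed(val, expected.data());
  // Assumed NEW_ADDR: the address of the last destination element.
  strided_store(val, actual.data() + (val.size() - 1), -1);
  return expected == actual;
}
```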

[SandboxIR] Fix CmpInst::create() when it gets folded (llvm#123408)

22d4ff1

If the operands of a CmpInst are constants then it gets folded into a constant. Therefore CmpInst::create() should return a Value*, not a Constant* and should handle the creation of the constant correctly.

[SandboxVec][Legality] Implement ShuffleMask (llvm#123404)

87e4b68

This patch implements a helper ShuffleMask data structure that helps describe shuffles of elements across lanes.
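For readers unfamiliar with the idea, a shuffle mask can be modelled as a list of source-lane indices, one per result lane. The class below is a hypothetical stand-in written for illustration; it is not the SandboxVec `ShuffleMask`, whose interface may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical stand-in; not the SandboxVec class, whose interface may differ.
class LaneShuffle {
  std::vector<int> indices_;  // indices_[i] is the source lane feeding result lane i

public:
  explicit LaneShuffle(std::vector<int> indices) : indices_(std::move(indices)) {}

  // Builds the mask 0, 1, ..., lanes-1, which leaves every lane in place.
  static LaneShuffle identity(std::size_t lanes) {
    std::vector<int> idx(lanes);
    std::iota(idx.begin(), idx.end(), 0);
    return LaneShuffle(std::move(idx));
  }

  bool is_identity() const {
    for (std::size_t i = 0; i < indices_.size(); ++i)
      if (indices_[i] != static_cast<int>(i)) return false;
    return true;
  }

  // Produces a new vector whose lane i is taken from lanes[indices_[i]].
  template <typename T>
  std::vector<T> apply(const std::vector<T>& lanes) const {
    std::vector<T> out;
    out.reserve(indices_.size());
    for (int src : indices_) {
      assert(src >= 0 && static_cast<std::size_t>(src) < lanes.size());
      out.push_back(lanes[static_cast<std::size_t>(src)]);
    }
    return out;
  }
};
```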

[SLP] Replace MainOp and AltOp in TreeEntry with InstructionsState. (l…

07d4965

…lvm#122443) Add TreeEntry::hasState. Add assert for getTreeEntry. Remove the OpValue parameter from the canReuseExtract function. Remove the Opcode parameter from the ComputeMaxBitWidth lambda function.

[RISCV] Consider only legally typed splats to be legal shuffles (llvm…

143c33c

…#123415) Given the comment, I'd expected test coverage. There was none so let's do the simple thing which benefits the one thing we have tests for.

[clang-format] Correctly annotate braces in macro definitions (llvm#1…

a7bca18

…23279) Fixes llvm#123179.

[clang-format] Fix option BreakBinaryOperations for operator >> (l…

e240261

…lvm#122282) Fixes llvm#106228.

Revert "[mlir-cpu-runner] Pass --exclude-libs to linker when building…

7a03692

… runner (llvm#122920)" Revert as this caused LIT test to fail, due to some passes not being registered This reverts commit 7402521.

[AutoBump] Merge with 0195ec4 (Jan 15)

bced9b4

[AutoBump] Merge with fixes of f9a8006 (Jan 15)

f2e5af0

jorickert force-pushed the bump_to_74025216 branch from 7a03692 to 528b284 Compare March 20, 2025 07:22

jorickert and others added 11 commits March 20, 2025 01:46

[AutoBump] Merge with 1181921 (Jan 17)

cb34987

[AutoBump] Merge with fixes of 0bd0765 (Jan 17)

2836d49

[AutoBump] Merge with eff6b64 (Jan 17)

ca75a9f

[AutoBump] Merge with fixes of d28a4f1 (Jan 17)

ef04755

[AutoBump] Merge with e240261 (Jan 17)

c11d5c0

Merge pull request #521 from Xilinx/bump_to_eff6b642

0f6f0b4

[AutoBump] Merge with eff6b64 (Jan 17) (50)

Merge pull request #519 from Xilinx/bump_to_11819214

7f52cec

[AutoBump] Merge with 1181921 (Jan 17) (48)

Merge pull request #520 from Xilinx/bump_to_0bd07652

9d07d6b

[AutoBump] Merge with fixes of 0bd0765 (Jan 17) (49) [Only tested MLIR]

Merge pull request #523 from Xilinx/bump_to_e2402615

161b5c6

[AutoBump] Merge with e240261 (Jan 17) (52)

Merge pull request #522 from Xilinx/bump_to_d28a4f1f

8ac91d4

[AutoBump] Merge with fixes of d28a4f1 (Jan 17) (51) [Only tested MLIR]

Merge pull request #518 from Xilinx/bump_to_f9a80062

4ce17f1

[AutoBump] Merge with fixes of f9a8006 (Jan 15) (47) [Only tested MLIR]

Base automatically changed from bump_to_74025216 to bump_to_1b2c8f10 April 14, 2025 08:20

Merge branch 'bump_to_1b2c8f10' into bump_to_0195ec45

81e973d

jorickert merged commit f93946e into bump_to_1b2c8f10 Apr 14, 2025
4 checks passed

jorickert deleted the bump_to_0195ec45 branch April 14, 2025 08:37
