forked from llvm/llvm-project
-
Notifications
You must be signed in to change notification settings - Fork 4
[AutoBump] Merge with 0195ec45 (Jan 15) (46) #517
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add constant-folding for nvvm float/double fmin + fmax intrinsics, including all combinations of xorsign.abs, nan-propagation, and ftz.
…lvm#122471) This enable delayed privatization by default for `omp.wsloop` ops, with one caveat! I had to workaround the "impure" alloc region issue that being resolved at the moment. The workaround detects whether the alloc region's argument is used in the region and at the same time defined in block that does not dominate the chosen alloca insertion point. If so, we move the alloca insertion point below the defining instruction of the alloc region argument. This basically reverts to the non-delayed-privatizaiton behavior.
SM80 has fma for bfloat16 but not add/mul/sub. Currently these ops incur a promotion to f32, but we can avoid this by writing them in terms of the fma: ``` FADD(a, b) -> FMA(a, 1.0, b) FMUL(a, b) -> FMA(a, b, -0.0) FSUB(a, b) -> FMA(b, -1.0, a) ``` Unfortunately there is no `fma.ftz` so when ftz is enabled, we still fall back to promotion.
For .wv widening instructions when checking if the opperand is vs1 or vs2, we take into account whether or not it has a passthru. For tied pseudos though their passthru is the vs2, and we weren't taking this into account.
…ed in CallLowering. (llvm#122853) For "returned" attribute arguments, the physical register is really a virtual register which shouldn't be stored in an MCRegister. This patch moves the conversion from Register to MCRegister into the derived classes of IncomingArgHandler. The derived class ReturnedArgCallReturnHandler does not use the register so no MCRegister is created in that case. The function and argument have been renamed to remove "Phys".
Checking the remark message if interchange did or didn't happen is more straight forward than the full IR for these cases. This comment was also made when I moved some tests away from relying on debug builds in change llvm#116780, and this is a prep step for llvm#119345 that is going to change these test cases.
…python_extension` (llvm#122865) This PR allows the users to specify the `NB_DOMAIN` for `add_mlir_python_extension`. This allows users to avoid nanobind domain conflicts, when python bindings from multiple `mlir` projects were imported. (https://nanobind.readthedocs.io/en/latest/faq.html#how-can-i-avoid-conflicts-with-other-projects-using-nanobind)
…deprecation (llvm#123118) The release note did not clearly mention that std::uncaught_exception had been removed in C++20.
…116147) These functions weren't added until API 26 (Android 8.0), but libc++ is supported for API 21 and up. These APIs are undeclared as of r.android.com/3216959.
llvm#108961) Missing information about begin and end pointers of std::vector can lead to missed optimizations in LLVM. This patch adds alignment assumptions at the point where the begin and end pointers are loaded. If the pointers would not have the same alignment, end might never get hit when incrementing begin. See llvm#101372 for a discussion of missed range check optimizations in hardened mode. Once llvm#108958 lands, the created `llvm.assume` calls for the alignment should be folded into the `load` instructions, resulting in no extra instructions after InstCombine. Co-authored-by: Louis Dionne <ldionne.2@gmail.com>
This restores the functionality of AsmPrinterHandlers to what it was prior to llvm#96785. The attempted hack there of adding a duplicate DebugHandlerBase handling added a lot of hidden state and assumptions, which just segfaulted when we tried to continuing using this API. Instead, this just goes back to the old design, but adds a separate array for the basic EH handles. The duplicate array is identical to the other array of handler, but which doesn't get their begin/endInstruction callbacks called. This still saves the negligible but measurable amount of virtual function calls as was the goal of llvm#96785, while restoring the API to the pre-LLVM-19 status quo.
…llvm#121806) Since there are no opcodes for atomic loads and stores comparing to SelectionDAG, we add `CheckMMOIsNonAtomic` predicate immediately after the opcode predicate to make a logical combination of them. Otherwise when `IPM_AtomicOrderingMMO` is inserted after `IPM_GenericPredicate`, the patterns without predicates get a higher priority as `IPM_AtomicOrderingMMO` has higher priority than `IPM_GenericPredicate`. This is important to preserve an order of aligned/unaligned patterns on X86 because aligned memory operations have an additional alignment predicate and should be checked first according to their placement in td file. Closes llvm#121446
…lvm#123197) The svluti4_lane intrinsic currently requires the tuple size to be specified in the intrinsic name when using a tuple type input. According to the ACLE specification, the svluti4_lane intrinsic with a tuple type input, such as: svint16_t svluti4_lane[_s16_x2(svint16x2_t table, svuint8_t indices, uint64_t imm_idx); should allow the tuple size of the input type to be optional.
Add Apple M4 host detection, which fixes rust-lang/rust#133414. Also add support for older ARM families (this is likely never going to get used, since only macOS is officially supported as host OS, but nice to have for completeness sake). Error handling (checking `CPUFAMILY_UNKNOWN`) is also included here. Finally, add links to extra documentation to make it easier for others to update this in the future. NOTE: These values are taken from `mach/machine.h` the Xcode 16.2 SDK, and has been confirmed on an M4 Max in rust-lang/rust#133414 (comment).
… -1, MASK). (llvm#123115) Co-authored-by: Brandon Wu <brandon.wu@sifive.com>
…23109) Some of this was needed to fix implicit conversions from MCRegister to unsigned when calling getReg() on MCOperand for example. The majority was done by reviewing parts of the code that dealt with registers, converting them to MCRegister and then seeing what new implicit conversions were created and fixing those. There were a few places where I used MCPhysReg instead of MCRegiser for static arrays since its uint16_t instead of unsigned.
Note that PointerUnion::dyn_cast has been soft deprecated in PointerUnion.h: // FIXME: Replace the uses of is(), get() and dyn_cast() with // isa<T>, cast<T> and the llvm::dyn_cast<T> Literal migration would result in dyn_cast_if_present (see the definition of PointerUnion::dyn_cast), but this patch uses dyn_cast because we expect IntegerType to be nonnull.
Note that PointerUnion::dyn_cast has been soft deprecated in PointerUnion.h: // FIXME: Replace the uses of is(), get() and dyn_cast() with // isa<T>, cast<T> and the llvm::dyn_cast<T> Literal migration would result in dyn_cast_if_present (see the definition of PointerUnion::dyn_cast), but this patch uses dyn_cast because we expect Data to be nonnull.
Note that PointerUnion::dyn_cast has been soft deprecated in PointerUnion.h: // FIXME: Replace the uses of is(), get() and dyn_cast() with // isa<T>, cast<T> and the llvm::dyn_cast<T> Literal migration would result in dyn_cast_if_present (see the definition of PointerUnion::dyn_cast), but this patch uses dyn_cast because we expect AP to be nonnull.
A canonicalized pack indexing should refer to a canonicalized pattern Fixes llvm#123033
… DSA list" (llvm#123220) Reverts llvm#121028 Reverting due to CI failure (https://lab.llvm.org/buildbot/#/builders/89/builds/14474)
This patch uses new FMF interfaces introduced by llvm#121657 to simplify existing code with `andIRFlags` and `copyFastMathFlags`.
xxHash, inferior to xxh3, is discouraged. We try not to use xxhash in lld. Switch to read32le for content hash and xxh3/stable_hash_combine for relocation hash. Remove the intermediate std::string for relocation hash. Change the tail hashing scheme to consider individual bytes instead. This helps group 0102 and 0201 together. The benefit is negligible, though. Pull Request: llvm#121729
…e(VAL, NEW_ADDR, -1, MASK) (llvm#123123) Co-authored-by: Brandon Wu <brandon.wu@sifive.com>
If the operands of a CmpInst are constants then it gets folded into a constant. Therefore CmpInst::create() should return a Value*, not a Constant* and should handle the creation of the constant correctly.
This patch implements a helper ShuffleMask data structure that helps describe shuffles of elements across lanes.
Some tools (e.g. Rust tooling) produce element segment descriptors with neither elemkind or element type descriptors, but with init exprs instead of func indices (this is with the flags value of 4 in https://webassembly.github.io/spec/core/binary/modules.html#element-section). LLVM doesn't fully model reference types or the various ways to initialize element segments, but we do want to correctly parse and skip over all type sections, so this change updates the object parser to handle that case, and refactors for more clarity. The test file is updated to include one additional elem segment with a flags value of 4, an initializer value of (32.const 0) and an empty vector. Also support parsing files that export imported (undefined) functions.
llvm#123229) print-after-all is useful for diffing IR between two passes. When one of the two is a function pass, and the other is a loop pass, the diff becomes useless. Add an option which prints the entire function for loop passes.
…lvm#122443) Add TreeEntry::hasState. Add assert for getTreeEntry. Remove the OpValue parameter from the canReuseExtract function. Remove the Opcode parameter from the ComputeMaxBitWidth lambda function.
…#123415) Given the comment, I'd expected test coverage. There was none so let's do the simple thing which benefits the one thing we have tests for.
The test checks specific compiler version to determine the output. However, the compiler version string is always set to 15.0.0 for our local build. Remove this check and use regex match instead. ## Test Plan ``` ./bin/llvm-lit -sva /home/wanyi/llvm-sand/external/llvm-project/lldb/test/API/commands/expression/import-std-module/vector-of-vectors/TestVectorOfVectorsFromStdModule.py ... Skipping the following test categories: ['dsym', 'gmodules', 'debugserver', 'objc'] -- Command Output (stderr): -- UNSUPPORTED: LLDB (/home/wanyi/llvm-sand/build/Release+Distribution/fbcode-x86_64/toolchain/bin/clang-x86_64) :: test_dsym (TestVectorOfVectorsFromStdModule.TestVectorOfVectors) (test case does not fall in any category of interest for this run) PASS: LLDB (/home/wanyi/llvm-sand/build/Release+Distribution/fbcode-x86_64/toolchain/bin/clang-x86_64) :: test_dwarf (TestVectorOfVectorsFromStdModule.TestVectorOfVectors) PASS: LLDB (/home/wanyi/llvm-sand/build/Release+Distribution/fbcode-x86_64/toolchain/bin/clang-x86_64) :: test_dwo (TestVectorOfVectorsFromStdModule.TestVectorOfVectors) ---------------------------------------------------------------------- Ran 3 tests in 4.636s OK (skipped=1) -- ******************** Testing Time: 4.97s Total Discovered Tests: 1 Passed: 1 (100.00%) ```
… runner (llvm#122920)" Revert as this caused LIT test to fail, due to some passes not being registered This reverts commit 7402521.
7a03692 to
528b284
Compare
[AutoBump] Merge with eff6b64 (Jan 17) (50)
[AutoBump] Merge with 1181921 (Jan 17) (48)
[AutoBump] Merge with fixes of 0bd0765 (Jan 17) (49) [Only tested MLIR]
[AutoBump] Merge with e240261 (Jan 17) (52)
[AutoBump] Merge with fixes of d28a4f1 (Jan 17) (51) [Only tested MLIR]
[AutoBump] Merge with fixes of f9a8006 (Jan 15) (47) [Only tested MLIR]
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
No description provided.