Skip to content

Conversation

@jorickert
Copy link

No description provided.

LewisCrawford and others added 30 commits January 16, 2025 14:38
Add constant-folding for nvvm float/double fmin + fmax intrinsics,
including all combinations of xorsign.abs, nan-propagation, and ftz.
…lvm#122471)

This enable delayed privatization by default for `omp.wsloop` ops, with
one caveat! I had to workaround the "impure" alloc region issue that
being resolved at the moment. The workaround detects whether the alloc
region's argument is used in the region and at the same time defined in
block that does not dominate the chosen alloca insertion point. If so,
we move the alloca insertion point below the defining instruction of the
alloc region argument. This basically reverts to the
non-delayed-privatizaiton behavior.
SM80 has fma for bfloat16 but not add/mul/sub. Currently these ops incur
a promotion to f32, but we can avoid this by writing them in terms of
the fma:
```
FADD(a, b) -> FMA(a, 1.0, b)
FMUL(a, b) -> FMA(a, b, -0.0)
FSUB(a, b) -> FMA(b, -1.0, a)
```

Unfortunately there is no `fma.ftz` so when ftz is enabled, we still
fall back to promotion.
For .wv widening instructions when checking if the opperand is vs1 or
vs2, we take into account whether or not it has a passthru. For tied
pseudos though their passthru is the vs2, and we weren't taking this
into account.
…ed in CallLowering. (llvm#122853)

For "returned" attribute arguments, the physical register is really a
virtual register which shouldn't be stored in an MCRegister. This
patch moves the conversion from Register to MCRegister into the derived
classes of IncomingArgHandler. The derived class
ReturnedArgCallReturnHandler
does not use the register so no MCRegister is created in that case.
The function and argument have been renamed to remove "Phys".
Checking the remark message if interchange did or didn't happen is more
straight forward than the full IR for these cases. This comment was also
made when I moved some tests away from relying on debug builds in change
llvm#116780, and this is a prep step for llvm#119345 that is going to change
these test cases.
…python_extension` (llvm#122865)

This PR allows the users to specify the `NB_DOMAIN` for
`add_mlir_python_extension`. This allows users to avoid nanobind domain
conflicts, when python bindings from multiple `mlir` projects were
imported.
(https://nanobind.readthedocs.io/en/latest/faq.html#how-can-i-avoid-conflicts-with-other-projects-using-nanobind)
…deprecation (llvm#123118)

The release note did not clearly mention that std::uncaught_exception
had been removed in C++20.
…116147)

These functions weren't added until API 26 (Android 8.0), but libc++ is
supported for API 21 and up.

These APIs are undeclared as of r.android.com/3216959.
llvm#108961)

Missing information about begin and end pointers of std::vector can lead
to missed optimizations in LLVM.

This patch adds alignment assumptions at the point where the begin and
end pointers are loaded. If the pointers would not have the same
alignment, end might never get hit when incrementing begin.

See llvm#101372 for a discussion
of missed range check optimizations in hardened mode.

Once llvm#108958 lands, the created
`llvm.assume` calls for the alignment should be folded into the `load`
instructions, resulting in no extra instructions after InstCombine.

Co-authored-by: Louis Dionne <ldionne.2@gmail.com>
This restores the functionality of AsmPrinterHandlers to what it was
prior to llvm#96785. The attempted
hack there of adding a duplicate DebugHandlerBase handling added a lot
of hidden state and assumptions, which just segfaulted when we tried to
continuing using this API. Instead, this just goes back to the old
design, but adds a separate array for the basic EH handles. The
duplicate array is identical to the other array of handler, but which
doesn't get their begin/endInstruction callbacks called. This still
saves the negligible but measurable amount of virtual function calls as
was the goal of llvm#96785, while restoring the API to the pre-LLVM-19
status quo.
)

We have two copies of the same code in clang-tidy and
clang-reorder-fields, and those are extremenly similar to
`Lexer::findNextToken`, so just add an extra agument to the latter.

---------

Co-authored-by: cor3ntin <corentinjabot@gmail.com>
…llvm#121806)

Since there are no opcodes for atomic loads and stores comparing to
SelectionDAG, we add `CheckMMOIsNonAtomic` predicate immediately after
the opcode predicate to make a logical combination of them. Otherwise
when `IPM_AtomicOrderingMMO` is inserted after `IPM_GenericPredicate`,
the patterns without predicates get a higher priority as
`IPM_AtomicOrderingMMO` has higher priority than `IPM_GenericPredicate`.

This is important to preserve an order of aligned/unaligned patterns on
X86 because aligned memory operations have an additional alignment
predicate and should be checked first according to their placement in td
file.

Closes llvm#121446
…lvm#123197)

The svluti4_lane intrinsic currently requires the tuple size to be
specified in the intrinsic name when using a tuple type input.

According to the ACLE specification, the svluti4_lane intrinsic with a
tuple type input, such as:

svint16_t svluti4_lane[_s16_x2(svint16x2_t table, svuint8_t indices,
uint64_t imm_idx);

should allow the tuple size of the input type to be optional.
Add Apple M4 host detection, which fixes
rust-lang/rust#133414.

Also add support for older ARM families (this is likely never going to
get used, since only macOS is officially supported as host OS, but nice
to have for completeness sake). Error handling (checking
`CPUFAMILY_UNKNOWN`) is also included here.

Finally, add links to extra documentation to make it easier for others
to update this in the future.

NOTE: These values are taken from `mach/machine.h` the Xcode 16.2 SDK,
and has been confirmed on an M4 Max in
rust-lang/rust#133414 (comment).
… -1, MASK). (llvm#123115)

Co-authored-by: Brandon Wu <brandon.wu@sifive.com>
…23109)

Some of this was needed to fix implicit conversions from MCRegister to
unsigned when calling getReg() on MCOperand for example.

The majority was done by reviewing parts of the code that dealt with
registers, converting them to MCRegister and then seeing what new
implicit conversions were created and fixing those.

There were a few places where I used MCPhysReg instead of MCRegiser for
static arrays since its uint16_t instead of unsigned.
Note that PointerUnion::dyn_cast has been soft deprecated in
PointerUnion.h:

  // FIXME: Replace the uses of is(), get() and dyn_cast() with
  //        isa<T>, cast<T> and the llvm::dyn_cast<T>

Literal migration would result in dyn_cast_if_present (see the
definition of PointerUnion::dyn_cast), but this patch uses dyn_cast
because we expect IntegerType to be nonnull.
Note that PointerUnion::dyn_cast has been soft deprecated in
PointerUnion.h:

  // FIXME: Replace the uses of is(), get() and dyn_cast() with
  //        isa<T>, cast<T> and the llvm::dyn_cast<T>

Literal migration would result in dyn_cast_if_present (see the
definition of PointerUnion::dyn_cast), but this patch uses dyn_cast
because we expect Data to be nonnull.
Note that PointerUnion::dyn_cast has been soft deprecated in
PointerUnion.h:

  // FIXME: Replace the uses of is(), get() and dyn_cast() with
  //        isa<T>, cast<T> and the llvm::dyn_cast<T>

Literal migration would result in dyn_cast_if_present (see the
definition of PointerUnion::dyn_cast), but this patch uses dyn_cast
because we expect AP to be nonnull.
A canonicalized pack indexing should refer to a canonicalized pattern

Fixes llvm#123033
This patch uses new FMF interfaces introduced by
llvm#121657 to simplify existing
code with `andIRFlags` and `copyFastMathFlags`.
xxHash, inferior to xxh3, is discouraged. We try not to use xxhash in
lld.

Switch to read32le for content hash and xxh3/stable_hash_combine for
relocation hash. Remove the intermediate std::string for relocation
hash.

Change the tail hashing scheme to consider individual bytes instead.
This helps group 0102 and 0201 together. The benefit is negligible,
though.

Pull Request: llvm#121729
topperc and others added 14 commits January 17, 2025 14:22
…e(VAL, NEW_ADDR, -1, MASK) (llvm#123123)

Co-authored-by: Brandon Wu <brandon.wu@sifive.com>
)

Because `c_devptr` has a `c_ptr` field, any assignment were done via the
Assign runtime function. This leads to stack overflow on the device and
taking too much memory. As we know the c_devptr can be directly copied
on assignment, make it a special case.
If the operands of a CmpInst are constants then it gets folded into a
constant. Therefore CmpInst::create() should return a Value*, not a
Constant* and should handle the creation of the constant correctly.
This patch implements a helper ShuffleMask data structure that helps
describe shuffles of elements across lanes.
Some tools (e.g. Rust tooling) produce element segment descriptors with
neither
elemkind or element type descriptors, but with init exprs instead of
func indices
(this is with the flags value of 4 in

https://webassembly.github.io/spec/core/binary/modules.html#element-section).
LLVM doesn't fully model reference types or the various ways to
initialize element
segments, but we do want to correctly parse and skip over all type
sections, so
this change updates the object parser to handle that case, and refactors
for more
clarity.

The test file is updated to include one additional elem segment with a
flags value
of 4, an initializer value of (32.const 0) and an empty vector. 

Also support parsing files that export imported (undefined) functions.
llvm#123229)

print-after-all is useful for diffing IR between two passes. When one of
the two is a function pass, and the other is a loop pass, the diff
becomes useless. Add an option which prints the entire function for loop
passes.
…lvm#122443)

Add TreeEntry::hasState.
Add assert for getTreeEntry.
Remove the OpValue parameter from the canReuseExtract function.
Remove the Opcode parameter from the ComputeMaxBitWidth lambda function.
…#123415)

Given the comment, I'd expected test coverage. There was none so let's
do the simple thing which benefits the one thing we have tests for.
The test checks specific compiler version to determine the output.
However, the compiler version string is always set to 15.0.0 for our
local build. Remove this check and use regex match instead.

## Test Plan
```
./bin/llvm-lit -sva /home/wanyi/llvm-sand/external/llvm-project/lldb/test/API/commands/expression/import-std-module/vector-of-vectors/TestVectorOfVectorsFromStdModule.py
...
Skipping the following test categories: ['dsym', 'gmodules', 'debugserver', 'objc']

--
Command Output (stderr):
--
UNSUPPORTED: LLDB (/home/wanyi/llvm-sand/build/Release+Distribution/fbcode-x86_64/toolchain/bin/clang-x86_64) :: test_dsym (TestVectorOfVectorsFromStdModule.TestVectorOfVectors) (test case does not fall in any category of interest for this run) 
PASS: LLDB (/home/wanyi/llvm-sand/build/Release+Distribution/fbcode-x86_64/toolchain/bin/clang-x86_64) :: test_dwarf (TestVectorOfVectorsFromStdModule.TestVectorOfVectors)
PASS: LLDB (/home/wanyi/llvm-sand/build/Release+Distribution/fbcode-x86_64/toolchain/bin/clang-x86_64) :: test_dwo (TestVectorOfVectorsFromStdModule.TestVectorOfVectors)
----------------------------------------------------------------------
Ran 3 tests in 4.636s

OK (skipped=1)

--

********************

Testing Time: 4.97s

Total Discovered Tests: 1
  Passed: 1 (100.00%)
```
… runner (llvm#122920)"

Revert as this caused LIT test to fail, due to some passes not being registered

This reverts commit 7402521.
Base automatically changed from bump_to_74025216 to bump_to_1b2c8f10 April 14, 2025 08:20
@jorickert jorickert merged commit f93946e into bump_to_1b2c8f10 Apr 14, 2025
4 checks passed
@jorickert jorickert deleted the bump_to_0195ec45 branch April 14, 2025 08:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.