[pull] main from llvm:main #5540

The tests were a bit of a mess -- the testing coverage wasn't bad but it was extremely difficult to see what was being tested and where. I split up the tests to make them easier to audit for completeness and did such an audit, adding a few missing tests (e.g. the conditional noexcept-ness of std::cbegin and std::cend). I also audited the synopsis and adjusted it where it needed to be adjusted. This patch is in preparation for fixing #67471.

…sts (#67559) We don't need to handle both spellings anymore since we no longer support Clang 15.

#67630) The vectorization of the FindLastIV reduction does not depend on the nocapture and readonly attributes.

Avoid an unnecessary use of ConstantExpr::getZExt() when APInt::zext() is sufficient.

In preparation for removing these constant expressions.

This makes some tests robust against minor codegen differences that will be caused by PR #67038.

(This fails if the input is not writable)

We don't require a Constant here, so let IRBuilder fold this.

Let the IRBuilder constant fold instead.

From two aspects: - For function templates, emit additional template argument placeholders in the context where it can't be a call in order to specify an instantiation explicitly. - Consider expressions with base type specifier such as 'Derived().Base::foo^' a function call. Reviewed By: nridge Differential Revision: https://reviews.llvm.org/D156605

Use IRBuilder instead, which will either insert an instruction or constant fold.

Work on APInt instead.

Instead work on APInt.

This patch updates `transform.loop.peel` so that this Op returns two rather than one handle: * one for the peeled loop, and * one for the remainder loop. Also, following this change this Op will fail if peeling fails. This is consistent with other similar Ops that also fail if no transformation takes place. Relands #67482 with an extra fix for transform_loop_ext.py

…llvm.mlir test Fix mistyped syntax in omptarget-region-parallel-llvm.mlir test added by b05d436

This addresses missing cmake files needed to build some sub-projects like libstdcxx. Co-authored-by: René Rebe <rene@exactcode.de>

Define operations that wrap the gfx940's new operations for converting between f32 and registers containing packed sets of four 8-bit floats. Define rocdl operations for the intrinsics and an AMDGPU dialect wrapper around them (to account for the fact that MLIR distinguishes the two float formats at the type level but that the LLVM IR does not). Define an ArithToAMDGPU pass, meant to run before conversion to LLVM, that replaces relevant calls to arith.extf and arith.truncf with the packed operations in the AMDGPU dialect. Note that the conversion currently only handles scalars and vectors of rank <= 1, as we do not have a usecase for multi-dimensional vector support right now. Reviewed By: jsjodin Differential Revision: https://reviews.llvm.org/D152457

Use the constant folding API instead. In preparation for dropping zext constant expressions.

Summary: These wrapper headers need to work around things in the standard headers. The existing workarounds didn't correctly handle the macros for `isascii` and `toascii`. Additionally, `memrchr` can't be used because it has a different declaration for C++ mode. Fix this so it can be compiled.

Add helpers getLosslessUnsignedTrunc/getLosslessSignedTrunc for this common pattern.
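The pattern behind these helpers is presumably "truncate, extend back, and keep the result only if nothing changed". Below is a minimal sketch of that idea on plain integers rather than LLVM `Constant`s. The function names and the fixed 32-bit to 8-bit widths are illustrative assumptions, not the LLVM signatures.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical stand-in for an unsigned lossless truncation:
// truncate to 8 bits, zero-extend back, and accept only if the value survives.
std::optional<uint8_t> losslessUnsignedTrunc(uint32_t value) {
  uint8_t truncated = static_cast<uint8_t>(value);
  if (static_cast<uint32_t>(truncated) == value)
    return truncated;
  return std::nullopt;
}

// Hypothetical stand-in for a signed lossless truncation:
// truncate to 8 bits, sign-extend back, and accept only if the value survives.
// (The narrowing cast wraps modulo 2^8 on mainstream compilers and is
// guaranteed to do so from C++20 onward.)
std::optional<int8_t> losslessSignedTrunc(int32_t value) {
  int8_t truncated = static_cast<int8_t>(value);
  if (static_cast<int32_t>(truncated) == value)
    return truncated;
  return std::nullopt;
}
```

In this sketch 200 truncates losslessly as an unsigned value but not as a signed one, because sign-extending the 8-bit pattern yields -56.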

…7650) There is a crash before hitting the TODO when the length parameter kind depends on a KIND parameter. I do not want to fix it since I cannot test it because of the TODO, so I just moved the TODO up and added a comment.

Use the constant folding API instead, which should always succeed in this case.

Check the result of constant folding here, as I'm not confident that no constant expressions can make it in here.

Splitting up patches for #20571. I found these comments generally useful to add and not predicated on those changes. Hopefully they help future travelers.

… init anything (#67638) Close #56794. See #67582 for detailed background on the issue. As required by the Itanium ABI, module units have to generate the initialization function. However, importers are allowed to elide the call to the initialization function if they are sure it doesn't do anything. This patch implements these semantics.

Assumed shape arrays use descriptors and must be handled differently than known shape arrays. This patch adds support to generate the `init` and `combiner` regions for the reduction recipe operation with assumed shape arrays by using the descriptor and the HLFIR lowering path. The `createTempFromMold` function is moved from `flang/lib/Optimizer/HLFIR/Transforms/BufferizeHLFIR.cpp` to `flang/include/flang/Optimizer/Builder/HLFIRTools.h` so it can be reused to create the private copy.

This patch adds the OutlineableOpenMPOpInterface to omp.target. This prevents other operations inside the target region such as WSLoop from hoisting new allocas outside the region.

These tests were part of https://reviews.llvm.org/D140184, which is no longer necessary, but preserving the tests seems useful. Thanks to Richard Trieu for providing these tests and the work on this PR.

#67678) When a MODULE SUBROUTINE or MODULE FUNCTION is implemented in the same scope as its interface and appears in a generic with the same name, the parser::Name of the implementation was not correctly reset and remained the SubprogramNameDetails symbol after semantics, causing a crash in lowering that picks up the procedure symbols on the parser names. Reset the parser::Name symbol before the new symbol is created.

… with auto Reviewed By: tahonermann Differential Revision: https://reviews.llvm.org/D159474

This reverts commit 315a407. The new test added fails to link the unit tests correctly and breaks certain buildbots.

…ntally (#66935) HWAddressSanitizerPass::run sanitizes functions one by one. The sanitization of each function, which may split blocks via insertShadowTagCheck, may invalidate some cached analyses. This matters because sanitizeFunction(F', FAM) may indirectly call the global stack safety analysis, hence we need to make sure the analyses of F are up to date. Bug report: #66934

This is failing to link on a few buildbots due to an undefined reference to llvm::Triple::Triple from DataLayoutTest_UEFI_Test. Attempt to fix this by adding the TargetParser lib for IR unit tests.

Summary: We use these image wrappers to do runtime-specific registration of variables and to load the device image that was compiled. This was intended to support several of these running at the same time, e.g. you can have a CUDA instance running with OpenMP and they should both function so long as you do not share state between the two. However, because we did not use a unique name for this file it would cause conflicts when included. This patch names the image based on the language runtime it's using so that they remain separate. Fixes: #67583

…n PCM files (#67383) This patch adopts `FileEntryRef` in the `HeaderFileInfo`-writing part of `ASTWriter`. First, this patch removes the loop over `FileManager::VirtualFileEntries`. It's redundant, since all virtual file entries are also present in `SeenFileEntries` and thus already in `UIDToFiles`. Second, since we now no longer rely on `FileEntry::getLastRef()`/`FileEntry::getName()`, this patch takes care to establish which path gets used for each UID by picking the `FileEntryRef` with the most "`<`" name (instead of just relying on the `StringMap` iteration order). Note that which `FileEntry`/`FileEntryRef` objects we pick for each UID for serialization into the `llvm::OnDiskChainedHashTable` doesn't really matter. The hash function only includes the file size and modification time. The file name only plays a role during resolution of hash collisions, in which case it goes through `FileManager` and resolves to a `FileEntry` that gets pointer-compared with the queried `FileEntry`. (Reincarnation of [D143414](https://reviews.llvm.org/D143414) and [D142780](https://reviews.llvm.org/D142780).)

Need to consider the length of the original vector for extractelements, not the number of matched scalars. This fixes 2 issues: 1) improves cost estimation; 2) fixes crashes after D158449.

…ted instructions in the prologue and the epilogue. (#66967) This fixes an error from checkARM64Instructions() in MCWin64EH.cpp.

Previously we would crash with an assertion failure (unreachable code) whenever we had an error in JITLink. Change this to use JITLink API correctly and let it print the error to output, so we can read and more easily diagnose what's happening. Before this patch: unexpected abandoned allocation UNREACHABLE executed at... After this patch: BOLT-ERROR: JITLink failed: In graph in-memory object file, section .local.foo: relocation target .text + 0x1 at address 0xa7c00000 is out of range of BranchPCRel32 fixup at 0x132d40f1 (bar, 0x132d40f0 + 0x1)

…d HAS_DEVICE_ADDR clauses on OMP TARGET directive and add more semantic checks for OMP TARGET. (#67290) Summary of this patch:
- Add semantic support for HAS_DEVICE_ADDR and IS_DEVICE_PTR clauses.
- A list item that appears in an IS_DEVICE_PTR clause must be a valid device pointer for the device data environment.
- A list item may not be specified in both an IS_DEVICE_PTR clause and a HAS_DEVICE_ADDR clause on the directive.
- A list item that appears in an IS_DEVICE_PTR or a HAS_DEVICE_ADDR clause must not be specified in any data-sharing attribute clause on the same target construct.

Add the `BufferizableOpInterface` implementation of `scf.index_switch`.

The *Policy suffix came from the earlier MemAllocPolicy type, where it was included to distinguish the type from a memory-allocation operation. MemLifetime is a noun already, so the *Policy suffix is just dead weight now.

This was never doing anything ever since it was introduced.

This patch simplifies the overflow check of unsigned addition. `a + b <u a` implies `a + b <u b` `a + b >=u a` implies `a + b >=u b` Alive2: https://alive2.llvm.org/ce/z/H8oK8n Fixes #65863.
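The equivalence of the two comparisons can be checked exhaustively at a small bit width. This is a sketch on 8-bit values with illustrative function names; it is not the InstCombine code itself.

```cpp
#include <cstdint>

// Overflow test written against the first operand: (a + b) <u a.
constexpr bool overflowsComparedToA(uint8_t a, uint8_t b) {
  return static_cast<uint8_t>(a + b) < a;
}

// Overflow test written against the second operand: (a + b) <u b.
constexpr bool overflowsComparedToB(uint8_t a, uint8_t b) {
  return static_cast<uint8_t>(a + b) < b;
}

// Exhaustively confirms that both forms agree for every pair of 8-bit inputs.
bool overflowChecksAgreeForAllInputs() {
  for (unsigned a = 0; a < 256; ++a) {
    for (unsigned b = 0; b < 256; ++b) {
      const uint8_t x = static_cast<uint8_t>(a);
      const uint8_t y = static_cast<uint8_t>(b);
      if (overflowsComparedToA(x, y) != overflowsComparedToB(x, y))
        return false;
    }
  }
  return true;
}
```

The reasoning is that a wrapped sum equals `a + b - 256`, which is smaller than both operands, while a sum that does not wrap is at least as large as each operand.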

#67717) …semble Pack/Unpack are overridden in many other places, rename the operations to avoid confusion.

…Inst. Need to add NumSrcElts param to is..Mask functions in ShuffleVectorInstruction class for better mask analysis. Mask.size() does not always match the sizes of the permuted vector(s). This allows better cost estimation in SLP and fixes uses of the functions in other cases. Differential Revision: https://reviews.llvm.org/D158449

This NFC change was reverted as part of 880fa7f, but the change is really good regardless of the associated Clang patch.

Including select builtin headers in system modules is a workaround for module cycles, primarily in Apple's Darwin module that includes all of its C standard library headers. The workaround is problematic because it doesn't include all of the builtin headers (inttypes.h is notably absent), and it also doesn't include C++ headers. The straightforward fix for this is to make top level modules for all of the C standard library headers and unwind.h in C++, clang, and the OS. However, doing so in clang before the OS modules are ready re-introduces the module cycles. Add a -fbuiltin-headers-in-system-modules option to control if the special builtin headers belong to system modules or builtin modules. Pass the option by default for Apple. Reviewed By: ChuanqiXu, Bigcheese, benlangmuir Differential Revision: https://reviews.llvm.org/D159483

Adding support for X86_64 UEFI target to begin with. Reviewed By: phosek, MaskRay Differential Revision: https://reviews.llvm.org/D152206

… 0)` -> `(and (zext c), X)`; NFC

The middle end canonicalizes: `(and (zext c), X)` -> `(select c, (and X, 1), 0)` But the `and` + `zext` form gets better codegen.

…se (#67713) Add CSC, but also adds BSR as a future format. Coming soon!

#66740) This patch canonicalizes the pattern `and(zext(A), B)` into `select A, B & 1, 0`. Thus, we can reuse transforms `select B == even, B & 1, 0 -> 0` and `select B == odd, B & 1, 0 -> zext(B == odd)` in `InstCombine`. It is an alternative to #66676. Alive2: https://alive2.llvm.org/ce/z/598phE Fixes #66733. Fixes #66606. Fixes #28612.
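The two forms compute the same value because the zero-extended `i1` is either 0 or 1. A small sketch with plain integers, where the function names are illustrative and `bool` stands in for `i1`:

```cpp
#include <cstdint>

// and(zext(A), B): the boolean widens to 0 or 1 and masks B.
constexpr uint32_t andOfZext(bool a, uint32_t b) {
  return static_cast<uint32_t>(a) & b;
}

// select A, B & 1, 0: the canonical form described above.
constexpr uint32_t selectForm(bool a, uint32_t b) {
  return a ? (b & 1u) : 0u;
}

// Checks agreement on a handful of odd, even, and boundary values of B.
constexpr bool formsAgreeOnSamples() {
  const uint32_t samples[] = {0u, 1u, 2u, 7u, 0xFFFFFFFEu, 0xFFFFFFFFu};
  for (uint32_t b : samples) {
    if (andOfZext(false, b) != selectForm(false, b)) return false;
    if (andOfZext(true, b) != selectForm(true, b)) return false;
  }
  return true;
}
```

Once the expression is in select form, knowing that `B` is even makes the true arm `B & 1` fold to 0, which is the kind of reuse the message refers to.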

…leVectorInst." This reverts commit c88c281 to fix the crash revealed by https://lab.llvm.org/buildbot/#/builders/230/builds/19353.

…+fuchsia This uses a custom size class map and primary allocator arena size that allows us to run all bringup tests on riscv64 with asan instrumentation reliably. Differential Revision: https://reviews.llvm.org/D151157

…nstructor initializers (#66755) By default, OuterScope aligns lambdas to the beginning of the current line. This makes sense for most types of statements within code blocks but leads to unappealing and misleading indentation for lambdas within constructor initializers.

The new implementation was brought in with the gtest update in a866ce7, but it crashes when building with rpmalloc, see #65823 (comment). Commenting out the new implementation basically gives us the code from before the gtest update.

#67579) The LLVM implementation of DWARFDebugAbbrev does not have a way of listing all the DW_FORM values that have been parsed but are unsupported or otherwise unknown. AFAICT this functionality does not exist in LLVM at all. Since my primary goal is to unify the implementations and not judge the usefulness or completeness of this functionality, I decided to move it out of LLDB's implementation of DWARFDebugAbbrev for the time being.

#67514) CheckDefaultArgumentVisitor::Visit(...) assumes that the children of Expr will not be NULL. This is not a valid assumption and when we have a CXXFoldExpr the children can be NULL and this causes a crash. Fixes: #67395

) …ptzn. This changes performStackMoveOptzn to take a TypeSize instead of uint64_t to avoid an implicit conversion when called from processStoreOfLoad. performStackMoveOptzn has been updated to allow scalable types in the rest of its code.

My patch (b3b6ede) broke the build (https://lab.llvm.org/buildbot/#/builders/5/builds/37053) because it incorrectly assumed LoopInfo could not be null and used a reference. This fixes forward by replacing &LI with *LI.

… memref (#67714) The offset when converting type in emulating narrow types did not account for the offset in strided memrefs. This patch fixes this.

This patch adds correct support for assumed shape arrays in the privatization recipes. This follows the same IR generation as in #67610.

Summary: Previously this test hung indefinitely on NVPTX. This was due to an issue fixed previously where we would wait indefinitely inside the CUDA runtime waiting for the kernel to complete if it was blocked on the RPC server. This patch enables this test again now that it can run without deadlocking, at least on CUDA 12.2.

Fix: -amdgpu-disable-unclustred-high-rp-reschedule Now: -amdgpu-disable-unclustered-high-rp-reschedule

The official MLIR build is not quite at 12.1 yet, so until then we protect the Bsr method with a macro guard.

We used to update the deallocated block with atomic_compare_exchange_strong to ensure that a concurrent double-free would be detected. However, this operation incurs a huge performance overhead, taking over 50% of the execution time in deallocate(). Given that we already have the checksum to guard against most double-free cases, plus other block verifications in the primary allocator, use atomic-store instead.
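The trade-off can be sketched with `std::atomic`. The state values, function names, and memory order below are assumptions for illustration only; the real allocator keeps more in its block header and relies on its checksum for validation.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical block states used only for this sketch.
enum BlockState : uint32_t { Allocated = 1, Freed = 2 };

// Compare-exchange approach: succeeds only while the block is still
// Allocated, so a second free of the same block is reported as a failure.
bool freeWithCompareExchange(std::atomic<uint32_t> &state) {
  uint32_t expected = Allocated;
  return state.compare_exchange_strong(expected, Freed);
}

// Store approach: cheaper, but it cannot notice on its own that the block
// was already freed; other checks have to catch that.
void freeWithStore(std::atomic<uint32_t> &state) {
  state.store(Freed, std::memory_order_relaxed);
}

// Demo: the second compare-exchange free of one block is rejected.
bool compareExchangeDetectsDoubleFree() {
  std::atomic<uint32_t> state{Allocated};
  const bool first = freeWithCompareExchange(state);
  const bool second = freeWithCompareExchange(state);
  return first && !second;
}

// Demo: the store-based free silently accepts a repeated free.
bool storeAcceptsRepeatedFree() {
  std::atomic<uint32_t> state{Allocated};
  freeWithStore(state);
  freeWithStore(state);
  return state.load() == Freed;
}
```

The second demo shows what is given up: with a plain store, detecting the repeated free depends entirely on the separate header verification.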

…67719) This patch makes use of the bounds in the combiner region for known shape arrays. Until now the combiner region was iterating over the whole array. The lower bound, upper bound and step are passed as block arguments after the two values. A follow-up patch will make use of this information for the assumed shape arrays as well.

…ded for all rounding modes. (#67048) Implementing expm1 function for double precision based on exp function algorithm:
- Reduced x = log(2) * (hi + mid1 + mid2) + lo, where:
  * hi is an integer
  * mid1 * 2^6 is an integer
  * mid2 * 2^12 is an integer
  * |lo| < 2^-13 + 2^-30
- Then exp(x) - 1 = 2^hi * 2^mid1 * 2^mid2 * exp(lo) - 1 ~ 2^hi * (2^mid1 * 2^mid2 * (1 + lo * P(lo)) - 2^(-hi))
- We evaluate the fast path with P(lo), a degree-3 Taylor polynomial of (e^lo - 1) / lo, in double precision
- If the Ziv accuracy test fails, we use a degree-6 Taylor polynomial of (e^lo - 1) / lo in double-double precision
- If the Ziv accuracy test still fails, we re-evaluate everything in 128-bit precision.
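The fast-path shape of this reduction can be sketched in ordinary double arithmetic. This is not the libc implementation: it assumes the reduction is x = (hi + mid1 + mid2) * ln(2) + lo (the form under which the factorization of exp(x) holds), merges mid1 and mid2 into one term, calls `std::exp2` instead of table lookups, and omits the Ziv test, the higher-precision fallbacks, and overflow handling. The name `expm1Sketch` is hypothetical.

```cpp
#include <cmath>

// Illustrative only: accurate for moderate |x|, and it loses precision for
// very small |x| because of cancellation in the final subtraction.
double expm1Sketch(double x) {
  const double ln2 = 0.6931471805599453;

  // k / 2^12 == hi + mid1 + mid2, the nearest multiple of 2^-12 to x / ln(2).
  const double k = std::nearbyint(x * 4096.0 / ln2);
  const double scaled = k / 4096.0;
  const double hi = std::floor(scaled);  // integer part
  const double mid = scaled - hi;        // in [0, 1), a multiple of 2^-12
  const double lo = x - scaled * ln2;    // small remainder, below 2^-13

  // P(lo): degree-3 Taylor polynomial of (e^lo - 1) / lo.
  const double p = 1.0 + lo * (0.5 + lo * (1.0 / 6.0 + lo * (1.0 / 24.0)));

  // exp(x) - 1 ~ 2^hi * (2^mid * (1 + lo * P(lo)) - 2^(-hi)).
  const int e = static_cast<int>(hi);
  const double inner = std::exp2(mid) * (1.0 + lo * p) - std::ldexp(1.0, -e);
  return std::ldexp(inner, e);
}
```

Subtracting `2^(-hi)` inside the parentheses before scaling by `2^hi` is what turns an exp evaluation into an expm1 evaluation without a separate final subtraction of 1.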

Example:
```
module types
  type t
    real,allocatable :: c
  end type t
contains
  function h(x)
    class(t),allocatable :: h
    ...
  end function h
  subroutine test
    type(t),allocatable :: b(:)
    allocate(b(2),source=h(2.5))
  end subroutine test
end module types
```
`DoFromSourceAssign` creates two descriptors for initializing `b(1)` and `b(2)` from the result of `h`. The Create call creates a descriptor without a properly initialized addendum, so the Assign just does shallow copies of the descriptor representing the result of `h` into `b(1)` and `b(2)`. I modified the Create code to properly establish the descriptor for the derived type case. I had to keep the `addendum` argument to preserve the testing in `flang/unittests/Runtime/TemporaryStack.cpp`.

This patch fixes: flang/lib/Lower/OpenACC.cpp:876:14: error: unused variable 'nbRangeArgs' [-Werror,-Wunused-variable]

This implements the [[msvc::no_unique_address]] attribute. There is no ABI compatibility in this patch because the attribute is relatively new and there's still some uncertainty in the MSVC version. The recommit changes the attribute definitions so that instead of making two separate attributes for no_unique_address and msvc::no_unique_address, it modifies the attributes tablegen emitter to allow spellings to be target-specific. This reverts commit 71f9e76.

…t()`

… inlining; NFC

@foo

Poison generating return attributes can't be propagated the same as others, as they can change the behavior of other uses and/or create UB where it otherwise wouldn't have occurred. For example:
```
define nonnull ptr @foo() {
  %p = call ptr @bar()
  call void @use(ptr %p)
  ret ptr %p
}
```
If we inline `@foo` and propagate `nonnull` to `@bar`, it could change the behavior of `@use` as instead of taking `null`, `@use` will now be passed `poison`. This can be even worse in a case like:
```
define nonnull ptr @foo() {
  %p = call noundef ptr @bar()
  ret ptr %p
}
```
Where propagating `nonnull` to `@bar` will cause UB on `null` return of `@bar` (`noundef` + `poison`) where it previously wouldn't have occurred. To fix this, we only propagate poison generating return attributes if either 1) the only use of the callsite to propagate to is return and the callsite to propagate to doesn't have `noundef`, or 2) the callsite to be inlined has `noundef`. The former case ensures no new UB or `poison` values will be added. The latter is UB anyway if the value is `poison`, so we can go ahead without worrying about behavior changes.
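The rule stated above reduces to a small predicate. This sketch encodes it over three booleans; the function and parameter names are hypothetical and do not correspond to the inliner's actual code.

```cpp
// Decides whether a poison-generating return attribute (such as nonnull) may
// be copied from the call being inlined onto a call inside the callee.
//   onlyUseIsReturn:       the inner call's result is used only by the return
//   innerCallHasNoundef:   the inner call already carries noundef
//   inlinedCallHasNoundef: the call being inlined carries noundef
constexpr bool mayPropagatePoisonGeneratingAttr(bool onlyUseIsReturn,
                                                bool innerCallHasNoundef,
                                                bool inlinedCallHasNoundef) {
  // Case 1: no other user can observe the new poison, and no noundef on the
  // inner call can turn that poison into immediate UB.
  const bool safeByUses = onlyUseIsReturn && !innerCallHasNoundef;
  // Case 2: a poison result would already be UB at the call being inlined.
  return safeByUses || inlinedCallHasNoundef;
}
```

Applied to the two IR examples, the first fails case 1 because `@use` also consumes `%p`, and the second fails it because the inner call has `noundef`; both are allowed only if the call being inlined has `noundef`.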

This patch adds line numbers to perf jitdump records emitted by the PerfSupportPlugin, by parsing and using a DWARFContext from preserved debug sections. To avoid making the OrcJIT library depend on DebugInfoDWARF this patch introduces a new OrcDebugging library. Reviewed By: lhames Differential Revision: https://reviews.llvm.org/D146391

…rings (#67500) GPU targets can gather on non-default address spaces (e.g. global), so this removes the check for the default memory space.

The legalizer currently generates lots of G_AND artifacts. For example between boolean uses and defs there is always a G_AND with a mask of 1, but when the target uses ZeroOrOneBooleanContents, this is unnecessary. Currently these artifacts have to be removed using post-legalize combines. Omitting these artifacts at their source in the artifact combiner has a few advantages: - We know that the emitted G_AND is very likely to be useless, so our KnownBits call is likely worth it. - The G_AND and G_CONSTANT can interrupt e.g. G_UADDE/... sequences generated during legalization of wide adds which makes it harder to detect these sequences in the instruction selector (e.g. useful to prevent unnecessary reloading of AArch64 NZCV register). - This cleans up a lot of legalizer output and even improves compilation-times. AArch64 CTMark geomean: `O0` -5.6% size..text; `O0` and `O3` ~-0.9% compilation-time (instruction count). Since this introduces KnownBits into code-paths used by `O0`, I reduced the default recursion depth. This doesn't seem to make a difference in CTMark, but should prevent excessive recursive calls in the worst case. Reviewed By: aemerson Differential Revision: https://reviews.llvm.org/D159140

…67739) While a DecltypeType node itself is not uniqued, an instantiation dependent DecltypeType will have a DependentDecltypeType as an underlying type, which is uniqued. In that case, there can be non-identical non-sugar DecltypeTypes nodes which nonetheless represent the same type. Fixes #67603

/llvm-project/clang/lib/AST/ASTContext.cpp:12938:46: error: unused variable 'DY' [-Werror,-Wunused-variable] const auto *DX = cast<DecltypeType>(X), *DY = cast<DecltypeType>(Y); ^ 1 error generated.

…AND" This reverts commit 3686a0b. This seems to have broken some sanitizer tests: https://lab.llvm.org/buildbot/#/builders/184/builds/7721

…ith fixes. This re-applies db51e57, which was reverted in 05b1a2c due to bot failures. The DebuggerSupportPlugin now depends on DWARF, so it has been moved to the new OrcDebugging library (as has the enableDebuggerSupport API).

The OrcDebugging library depends on JITLink after b251897.

This revision ensures that unsupported DISubranges are properly skipped instead of being transformed into invalid metadata.

Use this flag to give more context to implicit def comments in assembly. Reviewed on phabricator: https://reviews.llvm.org/D153754

… (#67667) This reverts commit 0afbcb2.

) This is not standard but is widely expected by existing code. This was implemented by https://reviews.llvm.org/D149877 for simple scalars, but MLIR lacked a generic way to deal with aggregate types (arrays and derived types). Support was recently added in #65508. Leverage it to zero initialize all types.

The operators are defined in DwarfDebug.cpp but are referenced in the struct definitions of FrameIndexExpr and EntryValueInfo in DwarfDebug.h, and since they weren't declared before, gcc warned with [694/5646] Building CXX object lib/CodeGen/AsmPrinter/CMakeFiles/LLVMAsmPrinter.dir/DwarfDebug.cpp.o ../lib/CodeGen/AsmPrinter/DwarfDebug.cpp:273:6: warning: 'bool llvm::operator<(const llvm::FrameIndexExpr&, const llvm::FrameIndexExpr&)' has not been declared within 'llvm' 273 | bool llvm::operator<(const FrameIndexExpr &LHS, const FrameIndexExpr &RHS) { | ^~~~ In file included from ../lib/CodeGen/AsmPrinter/DwarfDebug.cpp:13: ../lib/CodeGen/AsmPrinter/DwarfDebug.h:112:15: note: only here as a 'friend' 112 | friend bool operator<(const FrameIndexExpr &LHS, const FrameIndexExpr &RHS); | ^~~~~~~~ ../lib/CodeGen/AsmPrinter/DwarfDebug.cpp:278:6: warning: 'bool llvm::operator<(const llvm::EntryValueInfo&, const llvm::EntryValueInfo&)' has not been declared within 'llvm' 278 | bool llvm::operator<(const EntryValueInfo &LHS, const EntryValueInfo &RHS) { | ^~~~ In file included from ../lib/CodeGen/AsmPrinter/DwarfDebug.cpp:13: ../lib/CodeGen/AsmPrinter/DwarfDebug.h:121:15: note: only here as a 'friend' 121 | friend bool operator<(const EntryValueInfo &LHS, const EntryValueInfo &RHS); | ^~~~~~~~
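The situation behind this warning can be reproduced in miniature. In the sketch below, the namespace `demo` and the type `FrameIndexLike` are hypothetical stand-ins. It shows one way to make the qualified out-of-line definition well-formed without a warning, namely declaring the operator at namespace scope before the `friend` declaration; this is presumably the shape of the fix, but the commit message does not spell it out.

```cpp
namespace demo {

struct FrameIndexLike;

// Namespace-scope declaration. A friend declaration alone does not make the
// name visible for a later qualified definition such as demo::operator<.
bool operator<(const FrameIndexLike &LHS, const FrameIndexLike &RHS);

struct FrameIndexLike {
  int FI;
  friend bool operator<(const FrameIndexLike &LHS, const FrameIndexLike &RHS);
};

} // namespace demo

// Qualified definition outside the namespace, mirroring how the operators
// are defined in a .cpp file.
bool demo::operator<(const FrameIndexLike &LHS, const FrameIndexLike &RHS) {
  return LHS.FI < RHS.FI;
}
```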

Fixes #45339

We expand aarch64_neon_rshrn intrinsics to trunc(srl(add)), having tablegen patterns to combine the results back into rshrn. See D140297. Unfortunately, but perhaps not surprisingly, other combines can happen that prevent us converting back. For example sext(rshrn) becomes sext(trunc(srl(add))) which will turn into sext_inreg(srl(add)). This patch just prevents the expansion of rshrn intrinsics, reinstating the old tablegen patterns for selecting them. This should allow us to still recognize the rshrn instructions from trunc+shift+add, without performing any negative optimizations for the intrinsics. Closes #67451

Use the constant folding API instead. In the second case using IR builder should also work, but the way the instructions are created and inserted there is very unusual, so I've left it alone.

Use the constant folding API instead. One of these uses actually improves results, because the bitcast expression gets folded away.

There were a couple of issues with maintaining register def/uses held in `MachineRegisterInfo`: * when an operand is changed from one register to another, the corresponding instruction must already be inserted into the function, or MRI won't be updated * when traversing the set of all uses of a register, that set must not change

#67775) …" (#67523) Discussion in https://reviews.llvm.org/D153132. This reverts commit f703774.

gcc warned about it: ../lib/Transforms/Utils/ScalarEvolutionExpander.cpp: In lambda function: ../lib/Transforms/Utils/ScalarEvolutionExpander.cpp:2104:22: warning: unused variable 'ARPtrTy' [-Wunused-variable] 2104 | if (PointerType *ARPtrTy = dyn_cast<PointerType>(ARTy)) { | ^~~~~~~ Fix the warning by removing the variable and turn dyn_cast into isa.

The types should always match here. Possibly this is a leftover from pre-opaque-pointers times.

…er D158449." This caused asserts: Assertion failed: NumElts > 1 && "Expected at least 2-element fixed length vector(s).", file C:\b\s\w\ir\cache\builder\src\third_party\llvm\llvm\lib\Transforms\Vectorize\SLPVectorizer.cpp, line 7096 see comment on 59a67ea > Need to consider the length of the original vector for extractelements, > not the length, matched number of the scalars. It fixes 2 issues: 1) > improves cost estimation; 2) Fixes crashes after D158449. This reverts commit 59a67ea.

Add rpath for libc++ libraries so that users do not have to specify it each time. Disable -frtlib-add-rpath by default for VE, similar to other architectures. Update regression tests to check modifications.

Modernize affine dialect ops: Define LB, UB, step and inits as operands in TableGen.

gcc warned about it: [232/4788] Building CXX object lib/Transforms/IPO/CMakeFiles/LLVMipo.dir/AttributorAttributes.cpp.o ../lib/Transforms/IPO/AttributorAttributes.cpp: In lambda function: ../lib/Transforms/IPO/AttributorAttributes.cpp:12555:17: warning: unused variable 'SI' [-Wunused-variable] 12555 | if (auto *SI = dyn_cast<StoreInst>(Inst)) { | ^~ Fix the warning by removing the variable and turn dyn_cast into isa.

Fixed #67604.

Use the constant folding API instead.

Work on APInt instead.

Instead of ConstantExpr::getCast() with a fixed opcode, use the corresponding getXYZ methods. For the one place creating a pointer bitcast, drop it entirely, as this is redundant with opaque pointers.

gcc warned with: [236/4788] Building CXX object lib/Analysis/CMakeFiles/LLVMAnalysis.dir/LazyValueInfo.cpp.o ../lib/Analysis/LazyValueInfo.cpp: In member function 'void llvm::LazyValueInfo::forgetValue(llvm::Value*)': ../lib/Analysis/LazyValueInfo.cpp:1978:13: warning: unused variable 'Impl' [-Wunused-variable] 1978 | if (auto *Impl = getImpl()) | ^~~~ ../lib/Analysis/LazyValueInfo.cpp: In member function 'void llvm::LazyValueInfo::eraseBlock(llvm::BasicBlock*)': ../lib/Analysis/LazyValueInfo.cpp:1983:13: warning: unused variable 'Impl' [-Wunused-variable] 1983 | if (auto *Impl = getImpl()) | ^~~~ ../lib/Analysis/LazyValueInfo.cpp: In member function 'void llvm::LazyValueInfo::clear()': ../lib/Analysis/LazyValueInfo.cpp:1988:13: warning: unused variable 'Impl' [-Wunused-variable] 1988 | if (auto *Impl = getImpl()) | ^~~~ ../lib/Analysis/LazyValueInfo.cpp: In member function 'void llvm::LazyValueInfo::printLVI(llvm::Function&, llvm::DominatorTree&, llvm::raw_ostream&)': ../lib/Analysis/LazyValueInfo.cpp:1993:13: warning: unused variable 'Impl' [-Wunused-variable] 1993 | if (auto *Impl = getImpl()) | ^~~~ Use the locals instead of calling getImpl() again.

Add a generalized getLosslessTrunc() helper to simplify this.

This adds a simple higher-level op for the tile slice to vector intrinsics (and updates the existing vector.print lowering to use it). This op will be used a few more times to implement vector.insert/extract lowerings in later patches.

This patch fixes an error where ASM with constraints cannot select SME instructions which use the top eight predicate-as-counter registers.

Call the constant folding API instead.

Use the constant folding API instead.

…rgets (#67461) In order to avoid duplicating every dpp pseudo opcode that has src1, we allow it for all opcodes and add manual checks on subtargets that do not support it.

gcc warned about it: ../lib/Transforms/Coroutines/CoroFrame.cpp:2785:15: warning: unused variable 'MD' [-Wunused-variable] 2785 | if (MDNode *MD = AI->getMetadata(LLVMContext::MD_coro_outside_frame)) | ^~ Fix the warning by removing the unused variable and change the call from getMetadata to hasMetadata.

/llvm-project/llvm/lib/Target/AMDGPU/GCNDPPCombine.cpp:194:17: error: unused function 'getOperandSize' [-Werror,-Wunused-function] static unsigned getOperandSize(MachineInstr &MI, unsigned Idx, ^ 1 error generated.

Migrate creation of most casts to use the FoldXYZ rather than CreateXYZ style APIs. This means that InstSimplifyFolder now works for these, which is what accounts for the AMDGPU test changes.

Without the fix gcc warned with ../lib/ObjCopy/ELF/ELFObjcopy.cpp: In function 'uint64_t getSectionFlagsPreserveMask(uint64_t, uint64_t, uint16_t)': ../lib/ObjCopy/ELF/ELFObjcopy.cpp:106:31: warning: enumeral and non-enumeral type in conditional expression [-Wextra] 106 | ~(EMachine == EM_X86_64 ? ELF::SHF_X86_64_LARGE : 0UL); | ~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~

...of guarded variables, when the function is not marked as requiring locks: ``` class Return { Mutex mu; Foo foo GUARDED_BY(mu); Foo &returns_ref_locked() { MutexLock lock(&mu); return foo; // BAD } Foo &returns_ref_locks_required() SHARED_LOCKS_REQUIRED(mu) { return foo; // OK } }; ``` Review on Phabricator: https://reviews.llvm.org/D153131

#67795) … (#67776)" This detects issues in `scudo`. Reverting until these are fixed. ``` /b/sanitizer-x86_64-linux-autoconf/build/llvm-project/compiler-rt/lib/scudo/standalone/tsd.h:74:12: error: returning variable 'QuarantineCache' by reference requires holding mutex 'Mutex' exclusively [-Werror,-Wthread-safety-reference] 74 | return QuarantineCache; | ^ /b/sanitizer-x86_64-linux-autoconf/build/llvm-project/compiler-rt/lib/scudo/standalone/combined.h:248:28: note: in instantiation of member function 'scudo::TSD<scudo::Allocator<scudo::DefaultConfig, &malloc_postinit>>::getQuarantineCache' requested here 248 | Quarantine.drain(&TSD->getQuarantineCache(), | ^ /b/sanitizer-x86_64-linux-autoconf/build/llvm-project/compiler-rt/lib/scudo/standalone/tsd.h:57:15: note: in instantiation of member function 'scudo::Allocator<scudo::DefaultConfig, &malloc_postinit>::commitBack' requested here 57 | Instance->commitBack(this); | ^ /b/sanitizer-x86_64-linux-autoconf/build/llvm-project/compiler-rt/lib/scudo/standalone/tsd_exclusive.h:172:27: note: in instantiation of member function 'scudo::TSD<scudo::Allocator<scudo::DefaultConfig, &malloc_postinit>>::commitBack' requested here 172 | TSDRegistryT::ThreadTSD.commitBack(Instance); | ^ /b/sanitizer-x86_64-linux-autoconf/build/llvm-project/compiler-rt/lib/scudo/standalone/tsd_exclusive.h:33:46: note: in instantiation of function template specialization 'scudo::teardownThread<scudo::Allocator<scudo::DefaultConfig, &malloc_postinit>>' requested here 33 | CHECK_EQ(pthread_key_create(&PThreadKey, teardownThread<Allocator>), 0); | ^ /b/sanitizer-x86_64-linux-autoconf/build/llvm-project/compiler-rt/lib/scudo/standalone/tsd_exclusive.h:42:5: note: in instantiation of member function 'scudo::TSDRegistryExT<scudo::Allocator<scudo::DefaultConfig, &malloc_postinit>>::init' requested here 42 | init(Instance); // Sets Initialized. 
| ^ /b/sanitizer-x86_64-linux-autoconf/build/llvm-project/compiler-rt/lib/scudo/standalone/tsd_exclusive.h:130:5: note: in instantiation of member function 'scudo::TSDRegistryExT<scudo::Allocator<scudo::DefaultConfig, &malloc_postinit>>::initOnceMaybe' requested here 130 | initOnceMaybe(Instance); | ^ /b/sanitizer-x86_64-linux-autoconf/build/llvm-project/compiler-rt/lib/scudo/standalone/tsd_exclusive.h:74:5: note: in instantiation of member function 'scudo::TSDRegistryExT<scudo::Allocator<scudo::DefaultConfig, &malloc_postinit>>::initThread' requested here 74 | initThread(Instance, MinimalInit); | ^ /b/sanitizer-x86_64-linux-autoconf/build/llvm-project/compiler-rt/lib/scudo/standalone/combined.h:221:17: note: in instantiation of member function 'scudo::TSDRegistryExT<scudo::Allocator<scudo::DefaultConfig, &malloc_postinit>>::initThreadMaybe' requested here 221 | TSDRegistry.initThreadMaybe(this, MinimalInit); | ^ /b/sanitizer-x86_64-linux-autoconf/build/llvm-project/compiler-rt/lib/scudo/standalone/combined.h:790:5: note: in instantiation of member function 'scudo::Allocator<scudo::DefaultConfig, &malloc_postinit>::initThreadMaybe' requested here 790 | initThreadMaybe(); | ^ /b/sanitizer-x86_64-linux-autoconf/build/llvm-project/compiler-rt/lib/scudo/standalone/wrappers_c.inc:36:25: note: in instantiation of member function 'scudo::Allocator<scudo::DefaultConfig, &malloc_postinit>::canReturnNull' requested here 36 | if (SCUDO_ALLOCATOR.canReturnNull()) { ``` This reverts commit 6dd96d6.

…exical block scopes (4/7)" This caused asserts: llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp:2331: virtual void llvm::DwarfDebug::endFunctionImpl(const llvm::MachineFunction *): Assertion `LScopes.getAbstractScopesList().size() == NumAbstractSubprograms && "getOrCreateAbstractScope() inserted an abstract subprogram scope"' failed. See comment on the code review for reproducer. > RFC https://discourse.llvm.org/t/rfc-dwarfdebug-fix-and-improve-handling-imported-entities-types-and-static-local-in-subprogram-and-lexical-block-scopes/68544 > > Similar to imported declarations, the patch tracks function-local types in > DISubprogram's 'retainedNodes' field. DwarfDebug is adjusted in accordance with > the aforementioned metadata change and provided a support of function-local > types scoped within a lexical block. > > The patch assumes that DICompileUnit's 'enums field' no longer tracks local > types and DwarfDebug would assert if any locally-scoped types get placed there. > > Reviewed By: jmmartinez > > Differential Revision: https://reviews.llvm.org/D144006 This reverts commit f8aab28.

Closes #67783.

…erloads that take `stop_token` - This is section 32.6.4 of P0660R10 - https://eel.is/c++draft/thread.condvarany.intwait Differential Revision: https://reviews.llvm.org/D153441

…oldICmpEquality This special case will be handled in foldICmpXorConstant later. See also commit e9cb50a.

Address comments in https://github.com/llvm/llvm-project/pull/67638/files#r1340342453 to rename the field variable.

… or PMF I just found that we didn't handle the imports in the GMF or PMF when generating the init functions for the current module unit. This looks like a simple oversight, and this patch fixes it directly.

…ses. (#67674)

…al Cores on AIX (#67683) The threading library does not recognize AIX and always returns `-1` for number of physical cores on AIX. This PR teaches the library to recognize AIX and obtain the correct value for the number of physical cores.

…re-commit CI (#67743) Since we moved to Github PRs, the workflow has changed a bit and folks often merge `main` back into their PR branch. This is fine, except the previous way of determining modified files for pre-commit CI would use the content modified just in the latest commit, whatever it is. This means that in case someone merged main back into their PR branch, we'd think that the files in the merge commit were modified by the PR, and we'd spuriously trigger a CI run. This should fix this issue. The downside is that the merge target is hardcoded to `main`, which might not always be what we want. I still think this is an improvement over the status quo.

…ctors of `vector` and `deque`." (#67753) This reverts commit 10edd5d and guards against older versions of GCC to work around the problem.

#66977) …oadcast functions

The file format on z/OS is called GOFF (Generalized Object File Format), not GCOFF.

#66930) At the moment, `hoistRedundantVectorTransfers` would hoist the `vector.transfer_read`/`vector.transfer_write` pair in this function: ```mlir func.func @no_hoisting_write_to_memref(%rhs: i32, %arg1: vector<1xi32>) { %c0_i32 = arith.constant 0 : i32 %c0 = arith.constant 0 : index %c1 = arith.constant 1 : index %c4 = arith.constant 4 : index %c20 = arith.constant 20 : index %alloca = memref.alloca() {alignment = 64 : i64} : memref<1x1x2xi32> %cast = memref.cast %alloca : memref<1x1x2xi32> to memref<1x1x2xi32> %collapsed_1 = memref.collapse_shape %alloca [[0, 1, 2]] : memref<1x1x2xi32> into memref<2xi32> scf.for %_ = %c0 to %c20 step %c4 { %collapsed_2 = memref.collapse_shape %alloca [[0, 1, 2]] : memref<1x1x2xi32> into memref<2xi32> %lhs = vector.transfer_read %collapsed_1[%c0], %c0_i32 {in_bounds = [true]} : memref<2xi32>, vector<1xi32> %acc = vector.transfer_read %collapsed_2[%c0], %c0_i32 {in_bounds = [true]} : memref<2xi32>, vector<1xi32> %op = vector.outerproduct %lhs, %rhs, %acc {kind = #vector.kind<add>} : vector<1xi32>, i32 vector.transfer_write %op, %collapsed_1[%c0] {in_bounds = [true]} : vector<1xi32>, memref<2xi32> } return } ``` as follows: ```mlir func.func @no_hoisting_write_to_memref(%arg0: i32, %arg1: vector<1xi32>) { %c0_i32 = arith.constant 0 : i32 %c0 = arith.constant 0 : index %c4 = arith.constant 4 : index %c20 = arith.constant 20 : index %alloca = memref.alloca() {alignment = 64 : i64} : memref<1x1x2xi32> %collapse_shape = memref.collapse_shape %alloca [[0, 1, 2]] : memref<1x1x2xi32> into memref<2xi32> %collapse_shape_0 = memref.collapse_shape %alloca [[0, 1, 2]] : memref<1x1x2xi32> into memref<2xi32> %0 = vector.transfer_read %collapse_shape[%c0], %c0_i32 {in_bounds = [true]} : memref<2xi32>, vector<1xi32> %1 = vector.transfer_read %collapse_shape_0[%c0], %c0_i32 {in_bounds = [true]} : memref<2xi32>, vector<1xi32> %2 = scf.for %arg2 = %c0 to %c20 step %c4 iter_args(%arg3 = %0) -> (vector<1xi32>) { %3 = vector.outerproduct %arg3, %arg0, 
%1 {kind = #vector.kind<add>} : vector<1xi32>, i32 scf.yield %3 : vector<1xi32> } vector.transfer_write %2, %collapse_shape[%c0] {in_bounds = [true]} : vector<1xi32>, memref<2xi32> return } ``` This is not safe. While one argument for `vector.outerproduct` (`%rhs` from the original loop) is correctly being forwarded via `iter_args`, the other one (`%acc` from the original loop) is not. This patch disables hoisting in cases where the source of "candidate" `vector.transfer_read` aliases with some other `memref`. A more generic approach would be to make sure that all values are correctly forwarded via `iter_args`, but that would require involving alias analysis. [1] Based on iree-org/iree#14994.

For extractelements, we need to consider the length of the original vector, not the number of matched scalars. This fixes 2 issues: 1) it improves cost estimation; 2) it fixes crashes after D158449.

And adjust an existing test to not be a simple reduction to preserve test intent.

This patch adds the PNR_3b regclass for predicate-as-counter registers 0-7 and allows the Upl ASM constraint to use this register class.

It's subsumed by the `vectorization` label, which is an order of magnitude more popular and is applied for the same path patterns.

Statistics (issues and PRs together):
- `vectorization`: 91 open, 91 closed
- `vectorizers`: 8 open, 5 closed

All `vectorizers` usages have occurred in just the past 2 weeks, and were likely applied by our bot.

Also default to disassembling a and m features. Some code taken from https://reviews.llvm.org/D62732, which hasn't been updated in a year. Tested with 32 and 64 bit Linux user space QEMU. Reviewed By: jasonmolenda Differential Revision: https://reviews.llvm.org/D159101

This patch adds a flag to LLVM such that the output generated by the `-print-(before|after|all)` family of flags is written to files in a directory rather than to stderr. This new flag is `-ir-dump-directory` and is used to specify where to write the files. No other flags are added, it just modifies the behavior of the print flags. This is a second simplified version of the changes proposed in #65179. This patch only adds support for the new pass manager. If this patch is accepted, similar support can be added to the legacy pass manager. Co-authored-by: Nuri Amari <nuriamari@fb.com>

This is the `ASTReader` counterpart to PR #67383.

Differential Revision: https://reviews.llvm.org/D159550

…ternal compiler-rt FMV functions. The patch fixes Function Multi Versioning features detection by ifunc resolver on Android API levels < 30. Ifunc hwcaps parameters are not supported on Android API levels 23-29, so all CPU features are set unsupported if they were not initialized before ifunc resolver call. There is no support for ifunc on Android API levels < 23, so Function Multi Versioning is disabled in this case. Also use two underscore prefix for FMV runtime support functions to avoid conflict with user program ones. Differential Revision: https://reviews.llvm.org/D158641

Following on from D135150, this patch fixes another crash caused by this DAG combine: fadd (fma A, B, (fmul C, D)), E --> fma A, B, (fma C, D, E) The combine calls ReplaceAllUsesOfValueWith to replace (fmul C, D) with (fma C, D, E). This can cause nodes to get CSEd. In D135150 the problem was that the (fma C, D, E) node got CSEd away. In this new case, the problem is that the outer fadd node gets CSEd away. To fix it we have to return SDValue(N, 0) from the combine and be careful not to add a deleted node to the worklist.
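As a rough illustration of what the combine above rewrites, here is a minimal sketch of the two expression shapes as ordinary C++ functions. The names `beforeCombine` and `afterCombine` are hypothetical, and this models only the arithmetic, not the SelectionDAG nodes or the CSE behaviour the patch deals with.

```cpp
#include <cmath>

// Shape before the combine: fadd (fma A, B, (fmul C, D)), E
double beforeCombine(double A, double B, double C, double D, double E) {
  return std::fma(A, B, C * D) + E;
}

// Shape after the combine: fma A, B, (fma C, D, E)
double afterCombine(double A, double B, double C, double D, double E) {
  return std::fma(A, B, std::fma(C, D, E));
}
```

Both compute A*B + C*D + E. For inputs where every intermediate is exactly representable the two agree exactly; in general the rewritten form performs fewer roundings, so the results can differ in the last bits.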

…l loads/stores This fixes up the generation of 128bit atomic, volatile and non-temporal loads/stores, under the assumption that they should usually be the same as standard versions. https://godbolt.org/z/xxc89eMKE Fixes #64580 Closes #67413

…nts()`

This is probably a copy-and-paste error and the variable 'more' was left unused.

Added extra comment that should clarify the need for an insertion guard when using `getLoopOverTileSlices`. Also removed some redundant calls to `setInsertionPointAfter` - the insertion guard would overwrite that on destruction anyway.

#67167 reports a potential memory overflow caused by the wrong size passed to the function `memcpy_s`. This patch fixes it. Fix #67167.

They're still non-standard in C++17.

…6844) This patch begins support for the OpenMP kernel language, which allows writing OpenMP target regions in SIMT style, similar to kernel languages such as CUDA. This first patch includes the `ompx_bare` clause for the `target teams` directive. When `ompx_bare` is present, globalization is disabled, so local variables are not globalized, and the runtime init/deinit function calls are not emitted. Consequently, almost all OpenMP executable directives, such as parallel and task, are not supported in the region. This patch doesn't include the Sema checks for that, so using them is UB. Simple directives, such as atomic, can be used. We provide a set of APIs (for C, they are prefixed with `ompx_`; for C++, they are in the `ompx` namespace) to get the thread id, block id, etc. For more details, refer to https://tianshilei.me/wp-content/uploads/llvm-hpc-2023.pdf.

This patch makes sure that everything is cleaned up properly when ExprConstant evaluates an ArrayInitLoopExpr. Fixes #57135

Similar to commit 806761a to avoid issues due to object file format differences. These tests are currently benign.

…rinting IR (#67730) These don't affect the IR, so we should ignore them.

…rFileOr{None,Fake}()`

LLDB_EXPORT_ALL_SYMBOLS is useful when building out-of-tree plugins and extensions that rely on LLDB's internal symbols. For example, this is how the Mojo language provides its REPL and debugger support. Supporting this on windows is kind of tricky because this is normally expected to be done using dllexport/dllimport, but lldb uses these with the public api. This PR takes an approach similar to what LLVM does with LLVM_EXPORT_SYMBOLS_FOR_PLUGINS, and what chromium does for [abseil](https://github.com/chromium/chromium/blob/253d14e20fdc0cab05e5516770dceca18f9bddaf/third_party/abseil-cpp/generate_def_files.py), and uses a python script to extract the necessary symbols by looking at the symbol table for the various lldb libraries.

Similar to 806761a

…dule_ctor (#67745)

On ELF platforms, when there is no global variable, COMDAT asan.module_ctor is created with no `__asan_register_elf_globals` calls. If this COMDAT is the prevailing copy selected by the linker, the linkage unit will have no `__asan_register_elf_globals` call: the redzone will not be poisoned and ODR violation checker will not work (#67677).

This behavior is benign for -fno-sanitize-address-globals-dead-stripping because asan.module_ctor functions that call `__asan_register_globals` (`InstrumentGlobalsWithMetadataArray`) do not use COMDAT.

To fix #67677:
* Use COMDAT for -fsanitize-address-globals-dead-stripping on ELF platforms.
* Call `__asan_register_elf_globals` even if there is no global variable.

Alternatively, when there is no global variable, asan.module_ctor is not COMDAT and does not call `__asan_register_elf_globals`. However, the asan.module_ctor function cannot be eliminated by the linker.

Tested the following script. Only ELF -fsanitize-address-globals-dead-stripping has changed behaviors.

```
echo > a.cc  # no global variable, empty uniqueModuleId
echo 'void f() {}' > b.cc  # with global variable, with uniqueModuleId
echo 'int g;' > c.cc  # with global variable
for t in x86_64-linux-gnu arm64-apple-macosx x86_64-windows-msvc; do
  for gc in -f{,no-}sanitize-address-globals-dead-stripping; do
    for f in a.cc b.cc c.cc; do
      echo /tmp/Rel/bin/clang -S --target=$t -fsanitize=address $gc $f -o -
      /tmp/Rel/bin/clang -S --target=$t -fsanitize=address $gc $f -o - | sed -n '/asan.module_ctor/,/ret/p'
    done
  done
done
```

-- SoftmaxOp's `reifyResultShapes` function was wrongly casting it as a `LinalgOp`. -- This commit thus adds a fix to SoftmaxOp's reify result shape calculation. Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>

Summary: Previously this code was applied to the integration tests but did not copy the logic that stopped this from being passed to the GPU build. Copy the full line to avoid the warnings and prevent any libraries from being included.

…66647) Previously, bitwise shifts with constant operands were validated by the checker `core.UndefinedBinaryOperatorResult`. However, this logic was unreliable, and commit 25b9696 added the dedicated checker `core.BitwiseShift`, which validates the preconditions of all bitwise shifts with more accurate logic (it uses the real types from the AST instead of the unreliable type information encoded in `APSInt` objects). This commit disables the inaccurate logic that could mark bitwise shifts as 'undefined' and removes the redundant shift-related warning messages from core.UndefinedBinaryOperatorResult. The tests that were validating this logic are also deleted by this commit, but I verified that those testcases trigger the expected bug reports from `core.BitwiseShift`. (I didn't convert them to tests of `core.BitwiseShift`, because that checker already has its own extensive test suite with many analogous testcases.) I hope that there will be a time when the constant folding will be reliable, but until then we need hacky solutions like this to improve the quality of results.
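To make the kind of precondition under discussion concrete, here is a minimal sketch of a shift-amount check. The helper `isShiftAmountValid` is hypothetical and is not code from either checker; it covers only the rule that the right operand must be non-negative and smaller than the bit width of the (promoted) left operand.

```cpp
// Hypothetical helper: returns true if shifting a value that is LhsBits wide
// by Rhs positions uses a well-defined shift amount.
bool isShiftAmountValid(long long Rhs, unsigned LhsBits) {
  // A negative amount, or an amount >= the operand width, is undefined.
  return Rhs >= 0 && static_cast<unsigned long long>(Rhs) < LhsBits;
}
```

The check depends on the bit width of the left operand's real type, which is why accurate type information matters for this kind of diagnostic.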

Summary: The NVPTX backend is picky about the definitions of functions. Because we call these functions with these arguments it can cause some problems when it goes through the backend. This was observed in a different test for `printf` that hasn't been landed yet. Also adjust the priority.

This change sets the debug compilation directory when generating debug information for PCH object containers. This allows for overriding the compilation directory in debug information in precompiled pcm files.

…tly (#67518)

There is only one constructor in use so the rest can be removed.

Sets up the infrastructure to create an empty GOFF file. Also adds a GOFF writer which writes only HDR/END records. Reviewed By: jhenderson, kpn Differential Revision: https://reviews.llvm.org/D111437

The GOFF file format is only supported on SystemZ.

…entable This patch fixes: clang/lib/StaticAnalyzer/Checkers/UndefResultChecker.cpp:61:13: error: unused function 'isShiftOverflow' [-Werror,-Wunused-function] clang/lib/StaticAnalyzer/Checkers/UndefResultChecker.cpp:66:13: error: unused function 'isLeftShiftResultUnrepresentable' [-Werror,-Wunused-function]

Otherwise they may mistakenly get the large section flag.

…e header files Extract included parts in the following tests to separate header files: - SemaCXX/warn-unsafe-buffer-usage-fixits-parm-span-overload.cpp - SemaCXX/warn-unsafe-buffer-usage-fixits-parm-span.cpp Removed the included part in the following tests as it is not useful: - SemaCXX/warn-unsafe-buffer-usage-warning-unevaluated-context.cpp

…Inst. Add a NumSrcElts param to the is..Mask functions in the ShuffleVectorInstruction class for better mask analysis, because Mask.size() does not always match the sizes of the permuted vector(s). This allows better cost estimation in SLP and fixes uses of the functions in other cases. Differential Revision: https://reviews.llvm.org/D158449

…er size" This reverts commit 7a80a5d.

…uage (#66844)" This reverts commit e997dca.

This initially just adds support for mangling.

… COMDAT asan.module_ctor (#67745)" This reverts commit 1a4b9b6. When getUniqueModuleId(&M) is empty, we may add comdat to internal constants like $.str, causing spurious `error: relocation refers to a symbol in a discarded section` lld errors.

…rwinLog

…eferenceListInitialization (#65918)

…dule_ctor (#67745)

On ELF platforms, when there is no global variable and the unique module ID is non-empty, COMDAT asan.module_ctor is created with no `__asan_register_elf_globals` calls. If this COMDAT is the prevailing copy selected by the linker, the linkage unit will have no `__asan_register_elf_globals` call: the redzone will not be poisoned and ODR violation checker will not work (#67677).

This behavior is benign for -fno-sanitize-address-globals-dead-stripping because asan.module_ctor functions that call `__asan_register_globals` (`InstrumentGlobalsWithMetadataArray`) do not use COMDAT.

To fix #67677:
* Use COMDAT for -fsanitize-address-globals-dead-stripping on ELF platforms.
* Call `__asan_register_elf_globals` even if there is no global variable.
* If the unique module ID is empty, don't call SetComdatForGlobalMetadata: placing `@.str` in a COMDAT would incorrectly discard internal COMDAT `@.str` in other compile units.

Alternatively, when there is no global variable, asan.module_ctor is not COMDAT and does not call `__asan_register_elf_globals`. However, the asan.module_ctor function cannot be eliminated by the linker.

Tested the following script. Only ELF -fsanitize-address-globals-dead-stripping has changed behaviors.

```
echo > a.cc  # no global variable, empty uniqueModuleId
echo 'void f() {}' > b.cc  # with global variable, with uniqueModuleId
echo 'int g;' > c.cc  # with global variable
for t in x86_64-linux-gnu arm64-apple-macosx x86_64-windows-msvc; do
  for gc in -f{,no-}sanitize-address-globals-dead-stripping; do
    for f in a.cc b.cc c.cc; do
      echo /tmp/Rel/bin/clang -S --target=$t -fsanitize=address $gc $f -o -
      /tmp/Rel/bin/clang -S --target=$t -fsanitize=address $gc $f -o - | sed -n '/asan.module_ctor/,/ret/p'
    done
  done
done
```

…leVectorInst." This reverts commit 9f5960e to fix buildbots reported here https://lab.llvm.org/buildbot/#/builders/230/builds/19412.

…7711) SerializeToHsaco, as currently implemented, leaks the file descriptor of the .hsaco temporary file, which causes issues in long-running parallel compilation setups. See also ROCm/rocMLIR#1257

Correct bad test in 1e00423. This affects clang-ppc64le-rhel.

Context

BoundsSanitizer is a mitigation that is part of UBSAN. It can be enabled in "trap" mode to crash on OOB array accesses.

Problem

BoundsSan has zero false positives, meaning every crash is an OOB array access. Unfortunately, optimizations make these crashes in production builds a bit useless, because we only know which function is crashing but not which line of code. Godbolt example of the optimization: https://godbolt.org/z/6qjax9z1b

This Diff

I wanted to provide a way to know exactly which LOC is responsible for the crash. What we do here is use the size of the basic block as an iterator to an immediate value for the ubsan trap.

Previous discussion: https://reviews.llvm.org/D148654

Summary: The POSIX standard expects the first argument to this function to be constant, e.g. https://man7.org/linux/man-pages/man2/nanosleep.2.html. This fixes that problem and also corrects an obvious problem with enabling this for offloading.

The new number of elements should be the original one divided by a scale factor computed from old and new bit width.
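The arithmetic described above can be sketched as follows. This is a minimal illustration under stated assumptions: the helper name `scaledNumElements` is hypothetical, and it assumes the new bit width is a whole multiple of the old one (for example, packing `i4` elements into `i8`) and that the element count divides evenly.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: element count after re-expressing a vector whose
// elements are OldBits wide as a vector whose elements are NewBits wide.
int64_t scaledNumElements(int64_t NumElements, unsigned OldBits,
                          unsigned NewBits) {
  assert(OldBits != 0 && NewBits % OldBits == 0 &&
         "assumed: new width is a multiple of the old width");
  int64_t Scale = NewBits / OldBits; // e.g. 8 / 4 == 2
  assert(NumElements % Scale == 0 && "assumed: count divides evenly");
  return NumElements / Scale;
}
```

For instance, 8 elements of 4 bits become 4 elements of 8 bits, since the scale factor is 2.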

The plan is to fix memcmp interceptor in HWASAN and remove the unsupported statement at that time. --------- Co-authored-by: Vitaly Buka <vitalybuka@gmail.com>

Summary: This was disabled on the GPU because it conflicted with the definition in `glibc`. According to information online and in the `glibc` implementation, the first argument should be a `const void *`. Fixing this resolves the problem when exporting this to offloading languages.

[MLIR] Add stage and effectOnFullRegion to side effect This patch adds stage and effectOnFullRegion to side effects so that optimization passes can obtain more accurate information. Stage uses numbering to track the stage at which a side effect occurs. EffectOnFullRegion indicates whether the effect acts on every single value of the resource. RFC discussion: https://discourse.llvm.org/t/rfc-add-effect-index-in-memroy-effect/72235 Reviewed By: mehdi_amini, Mogball Differential Revision: https://reviews.llvm.org/D156087

…quencies (#67826) The goal in #66818 was to capture function entry counts, but those are not the same as the frequency of the entry (machine) basic block. This fixes that, and adds explicit profiles to the test. We also increase the precision of `MachineBlockFrequencyInfo::getBlockFreqRelativeToEntryBlock` to double. Existing code uses it as float so should be unaffected.
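As a small illustration of why the wider return type matters, here is a sketch of a relative-frequency computation in `double`, plus a caller that narrows the result to `float`. Both helpers are hypothetical and are not the `MachineBlockFrequencyInfo` implementation; they assume a non-zero entry frequency.

```cpp
#include <cstdint>

// Hypothetical helper: block frequency relative to the entry block.
// Assumes EntryFreq is non-zero.
double relativeToEntry(uint64_t BlockFreq, uint64_t EntryFreq) {
  return static_cast<double>(BlockFreq) / static_cast<double>(EntryFreq);
}

// A caller that keeps using float simply narrows the double result.
float relativeToEntryAsFloat(uint64_t BlockFreq, uint64_t EntryFreq) {
  return static_cast<float>(relativeToEntry(BlockFreq, EntryFreq));
}
```

A `float` has a 24-bit significand, so a ratio such as 16777217 / 1 cannot be represented exactly and rounds to 16777216, while `double` keeps it intact. Callers that narrow to `float` see the same values as before.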

Replace some uses of `Type::getPointerTo` via 2 ways * Remove entirely if it's only used to support an unnecessary bitcast (remove the bitcast as well). * Replace with `PointerType::get`/`PointerType::getUnqual` NFC opaque pointer clean-up effort.

This partially reverts #66380. Specifically, it removes the assertion that the underlying buffer of an EncodingReader is aligned to any required alignments for resource sections. Resources know their own alignment and pad their buffers accordingly, but the bytecode reader doesn't know that ahead of time. Consequently, it cannot give the resource EncodingReader a base buffer aligned to the maximum required alignment. A simple example from the test fails without this:

```mlir
module @TestDialectResources attributes {
  bytecode.test = dense_resource<resource> : tensor<4xi32>
} {}

{-#
  dialect_resources: {
    builtin: {
      resource: "0x2000000001000000020000000300000004000000",
      resource_2: "0x2000000001000000020000000300000004000000"
    }
  }
#-}
```

… strided memref (#67714)" This reverts commit 35ec6ea. Breaks downstream narrow type execution tests.

This patch tentatively fixes the various test failures introduced following 0ea3d88: https://green.lab.llvm.org/green/view/LLDB/job/as-lldb-cmake/6316/

From my understanding, the main issue here is that we can't find some headers when evaluating C++ expressions since those headers have been promoted to be system modules, and to be shipped as part of the toolchain. Prior to 0ea3d88, the `BuiltinHeadersInSystemModules` flag in the clang `LangOpts` struct was always set; after it landed, the flag became opt-in, depending on the toolchain that is used with the compiler instance. This gets set in `clang::createInvocation` down to `Darwin::addClangTargetOptions`, as this is used mostly on Apple platforms. However, since `ClangExpressionParser` makes a dummy `CompilerInstance` and sets the various language options arbitrarily instead of using `clang::createInvocation`, the flag remains unset, which causes the various error messages:

```
AssertionError: 'error: module.modulemap:96:11: header 'stdarg.h' not found
   96 |   header "stdarg.h" // note: supplied by the compiler
      |           ^
```

Given that this flag was opt-out previously, this patch brings back that behavior by setting it in lldb's `ClangExpressionParser` constructor, until we actually decide to pull the language options from the compiler driver. Signed-off-by: Med Ismail Bennani <ismail@bennani.ma>

At line 191, `addSymbol` takes the name by reference but does not make an internal copy of the string, meaning the local `optional<std::string>` would get freed and leave Orc with a dangling pointer. Fix this by just using an `optional<StringRef>` instead.

…67871) Variables that point to physical storage buffer require aliasing decorations. This is specified by the `SPV_KHR_physical_storage_buffer` extension. Also add an example of a variable with a decoration attribute.

The goal of the class is to be an (almost) drop in replacement for SmallVector and std::vector when those are presized and filled later, as it happens in SourceManager and ASTReader. By doing so, sparsely accessed PagedVector can profit from reduced memory footprint.
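The idea of allocating storage only for the pages that are actually touched can be sketched as below. `TinyPagedVector` is a hypothetical, heavily simplified illustration and not the interface of the class this patch adds; the page size, method names, and the choice of `std::unique_ptr<T[]>` per page are all assumptions.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical sketch: a vector that is presized up front but allocates
// element storage lazily, one fixed-size page at a time.
template <typename T, std::size_t PageSize = 1024>
class TinyPagedVector {
  std::size_t Size = 0;
  std::vector<std::unique_ptr<T[]>> Pages; // null until first access

public:
  // Set the logical size; only the page table grows, not element storage.
  void resize(std::size_t NewSize) {
    Size = NewSize;
    Pages.resize((NewSize + PageSize - 1) / PageSize);
  }

  std::size_t size() const { return Size; }

  // Allocate and value-initialize the containing page on first access.
  T &operator[](std::size_t Index) {
    std::unique_ptr<T[]> &Page = Pages[Index / PageSize];
    if (!Page)
      Page = std::make_unique<T[]>(PageSize);
    return Page[Index % PageSize];
  }

  // Number of pages that currently own element storage.
  std::size_t allocatedPages() const {
    std::size_t Count = 0;
    for (const auto &Page : Pages)
      if (Page)
        ++Count;
    return Count;
  }
};
```

With this layout, a container resized to a large logical size but accessed at only a few indices pays for the page table plus the touched pages, which is the memory saving the description refers to.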

This patch replaces any_cast with llvm::any_cast. This in turn allows us to gracefully switch to std::any in future by forwarding llvm::Any and llvm::any_cast to: using Any = std::any; template <class T> T *any_cast(Any *Value) { return std::any_cast<T>(Value); } respectively. Without this patch, it's ambiguous whether any_cast refers to std::any_cast or llvm::any_cast. As an added bonus, this patch makes it easier to mechanically replace llvm::any_cast with std::any_cast without affecting other occurrences of any_cast (e.g. in libcxx).
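The forwarding quoted in the message can be tried out in isolation. The sketch below places it in a stand-in namespace, `sketch`, which is hypothetical and used here only so the example compiles without LLVM headers; it requires C++17 for `std::any`.

```cpp
#include <any>

namespace sketch { // hypothetical stand-in for the llvm namespace

using Any = std::any;

// Pointer form: returns a pointer to the held value, or nullptr when the
// stored type is not T.
template <class T> T *any_cast(Any *Value) {
  return std::any_cast<T>(Value);
}

} // namespace sketch
```

Because the call sites spell the namespace explicitly (`sketch::any_cast<int>(&A)`), there is no ambiguity with `std::any_cast`, which argument-dependent lookup would otherwise also find for a `std::any` argument.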

After patch #67288 landed, unfoldMemoryOperand would not return NewMIs whose size ==3. So the removed line is useless.

It always returns false in the unwrapped line parser!

Most of the print options are hidden; make them all hidden.

To do this:
1. Protect BC.Ctx with a mutex.
2. Don't call exit from a thread; see the comment near the PassFailed variable definition for the reason. The other option would be to call _Exit instead of exit, but I think we should call destructors properly.

The AArch64StorePairSuppress pass prevents the creation of STP under some heuristics. Unfortunately it often prevents the creation of STP in cases where it is obviously beneficial, and it doesn't match my understanding of scheduling/cpu pipelining to prevent the creation of STP. From some benchmarking, even on an in-order cpu where the scheduling is most important I don't see it giving better results. In general the lower instruction count for STP would be expected to give a slightly better cycle count. As the pass specifically mentions the cyclone cpu, this patch adds a target feature for FeatureStorePairSuppress, enabled for all the non-Arm cpus. This has the effect of disabling it for all Arm cpus. Differential Revision: https://reviews.llvm.org/D134646

* Remove if its sole use is to support an unnecessary ptr-to-ptr bitcast (remove the bitcast as well) * Replace with use of other APIs. NFC opaque pointer cleanup effort.

…ng modes The newly added tests check which rounding mode is used by default, and that it is printed when aliases are disabled but not otherwise. These tests will be used in #67555.

… instrs (#65228) Fixes #65227

LLVMGetOrdering previously did not support Fence instructions, and calling it on a fence would lead to a bad cast as it assumed a load/store, or an AtomicRMWInst. This would either read a garbage memory order or trigger an assertion.

LLVMIsAtomicSingleThread did not support Fence instructions, loads, or stores, and would similarly lead to a bad cast. It happened to work out since the relevant types all have their synch scope ID at the same offset, but it still should be fixed.

These cases are now fixed for the C API, and tests for these instructions are added. The echo test utility now also supports cloning Fence instructions, which it did not previously.

-----

From what I can tell, there's no unified API to pull `getOrdering`/`getSyncScopeID` from; instead, it requires casting to individual types. If there is a better way of handling this, I can switch to that.

…ntime LLVM integration See discourse thread https://discourse.llvm.org/t/rfc-support-cmake-option-to-control-link-type-built-for-flang-runtime-libraries/71602/18 for full details.

Flang-rt is the new library target for the flang runtime libraries. It builds the Flang-rt library (which contains the sources of FortranRuntime and FortranDecimal) and the Fortran_main library. See documentation in this patch for detailed description (flang-rt/docs/GettingStarted.md).

This patch aims to:
- integrate Flang's runtime into existing llvm infrastructure so that Flang's runtime can be built similarly to other runtimes via the runtimes target or via the llvm target as an enabled runtime
- decouple the FortranDecimal library sources that were used by both compiler and runtime so that different build configurations can be applied for compiler vs runtime
- add support for running flang-rt testsuites, which were created by migrating relevant tests from `flang/test` and `flang/unittest` to `flang-rt/test` and `flang-rt/unittest`, using a new `check-flang-rt` target.
- provide documentation on how to build and use the new FlangRT runtime

Reviewed By: DanielCChen Differential Revision: https://reviews.llvm.org/D154869

This adds `IntegralAP` backing the two new primtypes `IntAP` (unsigned arbitrary-precision int) and `IntAPS` (same but signed). We use this for `int128` support (which isn't available on all host systems we support AFAIK) and I think we can also use this for `_BitInt` later.

…rs (#65844)" This reverts commit 16b9e6f. This breaks buildbots.

These are not available in all build configurations. Originally introduced in: #66430

This seems to be an issue common to both GCC and LLVM. There are various RISC-V FCVT instructions where the frm field makes no difference to the output as the result is always exact (e.g. fcvt.d.s, fcvt.s.h, fcvt.d.h). As with GCC, we always generate a form of these fcvt instructions where frm=0b000. However, the ISA manual _doesn't_ state that frm values are invalid, and we should ensure we can accept them. This patch does so by adding the frm field to fcvt.d.s and adding an InstAlias so that if no frm is specified, it defaults to rne (0b000). This patch just corrects fcvt.d.s in order to allow the approach to be reviewed, before applying it to the other affected instructions. I haven't added tests to llvm/test/MC/Disassembler/RISCV, because it doesn't seem necessary to test there in addition to our usual round-trip tests in llvm/test/MC/RISCV. But feedback is welcome. Recently added tests ensure that the default `rne` rounding mode is printed as desired.
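The defaulting behaviour described above (no rounding-mode operand means `rne`, encoded as 0b000) can be sketched as a small lookup. The function `encodeFrm` is hypothetical and is not how the assembler implements the InstAlias; the encodings are the standard `frm` values from the RISC-V F extension.

```cpp
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical helper: map a rounding-mode operand to its 3-bit frm
// encoding. An empty operand models "no frm specified" and defaults to rne.
std::optional<uint8_t> encodeFrm(std::string_view Name) {
  if (Name.empty() || Name == "rne")
    return 0b000; // round to nearest, ties to even
  if (Name == "rtz")
    return 0b001; // round towards zero
  if (Name == "rdn")
    return 0b010; // round down
  if (Name == "rup")
    return 0b011; // round up
  if (Name == "rmm")
    return 0b100; // round to nearest, ties to max magnitude
  if (Name == "dyn")
    return 0b111; // use the dynamic rounding mode from the frm CSR
  return std::nullopt; // not a recognised rounding mode
}
```

Under this model, an instruction written without a rounding-mode operand and the same instruction written with an explicit `rne` operand produce the same encoding, while other explicit modes are accepted rather than rejected.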

Xcode `lipo` seems to support a non-documented `-fat64` option that creates Universal Mach-O archives using 64 bit versions of the `fat_arch` header, which allows offsets larger than 32 bits to be specified. Modify `llvm-lipo` to support the same flag, and use the value of the flag to use either 32 bits or 64 bits Mach-O headers. The Mach-O universal writer allows specifying a new option to write these 64 bits headers. The default is still using 32 bits. `dsymutil` implemented support for a similar flag in https://reviews.llvm.org/D146879.
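To illustrate why a 64-bit header variant is needed, here is a minimal sketch of the two per-architecture records and a check for whether a slice still fits the 32-bit form. The struct layouts are written from memory of the Mach-O `fat_arch` and `fat_arch_64` definitions and should be treated as an assumption, as should the helper name `fitsIn32BitHeader`; none of this is code from `llvm-lipo`.

```cpp
#include <cstdint>
#include <limits>

// Assumed layout of the 32-bit per-architecture record.
struct FatArch32 {
  uint32_t CpuType;
  uint32_t CpuSubtype;
  uint32_t Offset; // file offset of the slice
  uint32_t Size;   // size of the slice
  uint32_t Align;  // alignment as a power of two
};

// Assumed layout of the 64-bit record: offset and size are widened.
struct FatArch64 {
  uint32_t CpuType;
  uint32_t CpuSubtype;
  uint64_t Offset;
  uint64_t Size;
  uint32_t Align;
  uint32_t Reserved;
};

// Hypothetical helper: true if a slice can be described by FatArch32.
bool fitsIn32BitHeader(uint64_t Offset, uint64_t Size) {
  constexpr uint64_t Max = std::numeric_limits<uint32_t>::max();
  return Offset <= Max && Size <= Max;
}
```

A slice that starts at or beyond the 4 GiB mark cannot be described by the 32-bit record, which is the case the 64-bit headers address.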

/llvm-project/llvm/lib/Object/MachOUniversalWriter.cpp:351:3: error: default label in switch which covers all enumeration values [-Werror,-Wcovered-switch-default] default: ^ 1 error generated.

Followup to #67628 that relaxes the symbol regex a bit to cover more lldb_private symbols.

C:\buildbot\mlir-x64-windows-ninja\llvm-project\llvm\lib\Object\MachOUniversalWriter.cpp(352) : error C2220: the following warning is treated as an error C:\buildbot\mlir-x64-windows-ninja\llvm-project\llvm\lib\Object\MachOUniversalWriter.cpp(352) : warning C4715: 'llvm::object::writeUniversalBinaryToStream': not all control paths return a value


Commits on Sep 28, 2023

Commits on Sep 29, 2023

Commits on Sep 30, 2023

Commits on Oct 1, 2023